📘 Why Autoscaling Often Makes Things Worse

3 Apr 2022 · 3 min read

When “adding capacity” destabilizes the system

Autoscaling reacts to symptoms.
Real problems are causal.

1️⃣ The Dangerous Belief

“If traffic increases, autoscaling will save us.”

This assumes:

Load is linear
Metrics are immediate
New instances are instantly useful
Bottlenecks scale horizontally

None of these holds reliably in production.

2️⃣ Autoscaling Is a Feedback Loop (Control Theory)

Autoscaling is a delayed negative feedback loop.

Timeline:

Load increases
Latency rises
Metrics detect it
Scale-out happens
New instances start
Load redistributes

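Each step adds delay.

Delayed feedback systems tend to oscillate: by the time new capacity is serving traffic, the load it was sized for may already have changed.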
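To see why delay alone is enough to cause oscillation, here is a toy simulation. It is a minimal sketch, not a model of any real autoscaler: `simulate`, `delayTicks`, and all the numbers are assumptions made up for illustration. The load is constant; the only thing that changes between runs is how stale the utilization reading is.

```javascript
// Toy model: an autoscaler that acts on a utilization reading `delayTicks` old.
// All names and parameters are illustrative assumptions.
function simulate({ delayTicks, ticks = 40, load = 1000, perInstance = 100, target = 0.7 }) {
  let instances = 5;
  const utilHistory = [];
  const sizes = [];
  for (let t = 0; t < ticks; t++) {
    // True utilization right now, given the current fleet size.
    utilHistory.push(load / (instances * perInstance));
    // The controller only sees what the metrics pipeline has delivered so far.
    const observed = utilHistory[Math.max(0, t - delayTicks)];
    // Proportional rule: resize so the *observed* utilization would hit the target.
    instances = Math.max(1, Math.ceil(instances * (observed / target)));
    sizes.push(instances);
  }
  return sizes;
}

const steady = simulate({ delayTicks: 0 });
const delayed = simulate({ delayTicks: 3 });
console.log("no delay:", steady.slice(0, 10).join(" "));
console.log("3-tick delay:", delayed.slice(0, 10).join(" "));
```

With `delayTicks: 0` the fleet settles at 15 instances and stays there. With `delayTicks: 3` the same rule overshoots into the hundreds and then collapses to a single instance, even though the load never changed.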

3️⃣ Failure Mode #1 — Cold Start Amplification

What happens during scale-up

JVM warmup
Cache cold
Connection pools empty
JIT not optimized

New instances are slower, not faster.

Autoscaling often increases tail latency first.
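A back-of-the-envelope sketch of why the tail moves first. It assumes a round-robin balancer that gives every instance an equal share of requests, and two made-up latency figures (20 ms warm, 400 ms cold); `latencyPercentile` is a hypothetical helper, not part of any library.

```javascript
// Round-robin sends an equal share of requests to every instance, warm or cold.
// Simplification: every warm request takes warmMs, every cold request takes coldMs.
function latencyPercentile(p, warm, cold, warmMs, coldMs) {
  const coldShare = cold / (warm + cold);
  // Sorted by latency, warm requests come first and cold requests fill the tail.
  return p > 1 - coldShare ? coldMs : warmMs;
}

// 10 warm instances, then a scale-out adds 2 cold ones (illustrative numbers).
console.log(latencyPercentile(0.99, 10, 0, 20, 400)); // before scale-out
console.log(latencyPercentile(0.99, 10, 2, 20, 400)); // right after scale-out
console.log(latencyPercentile(0.5, 10, 2, 20, 400));  // the median barely notices
```

Under these assumptions two cold instances out of twelve take about 17% of the traffic, which is more than enough to own the P99 while the median stays flat.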

4️⃣ Failure Mode #2 — Scaling the Wrong Bottleneck

Example

App scales horizontally
Database does not

Result:

More app instances
More DB connections
DB melts faster

Autoscaling increases pressure on the bottleneck.
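The arithmetic is worth writing down. A minimal sketch, assuming each app instance keeps its own fixed-size connection pool and the database enforces a hard connection limit; the function names and the numbers (pool of 20, limit of 200) are illustrative assumptions.

```javascript
// Each app instance holds its own pool, so connection demand grows with the fleet.
function dbConnectionDemand(instances, poolSizePerInstance, dbMaxConnections) {
  const demanded = instances * poolSizePerInstance;
  return { demanded, overLimit: demanded > dbMaxConnections };
}

// Largest fleet the database can serve without exceeding its connection limit.
function maxSafeInstances(poolSizePerInstance, dbMaxConnections) {
  return Math.floor(dbMaxConnections / poolSizePerInstance);
}

console.log(dbConnectionDemand(8, 20, 200));  // within the limit
console.log(dbConnectionDemand(12, 20, 200)); // the autoscaler just exceeded it
console.log(maxSafeInstances(20, 200));
```

If the autoscaler's maximum replica count is higher than `maxSafeInstances`, scaling out is guaranteed to hit the database limit before it hits its own.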

5️⃣ Failure Mode #3 — Retry & Autoscaling Feedback Loop

Sequence

Latency increases
Clients retry
Load multiplies (a single retry per failed request is enough to double it)
Autoscaler scales up
Cold new instances time out, generating even more retries
System collapses

Autoscaling can turn temporary slowness into total failure.
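How much extra load retries add can be estimated directly. A minimal sketch, assuming every attempt fails independently with the same probability and clients retry immediately up to a fixed limit; `attemptsPerRequest` is a hypothetical helper.

```javascript
// Expected number of attempts a client makes for one logical request.
// Assumes each attempt fails independently with probability `failureRate`.
function attemptsPerRequest(failureRate, maxRetries) {
  let attempts = 0;
  let pReached = 1; // probability that attempt k is made at all
  for (let k = 0; k <= maxRetries; k++) {
    attempts += pReached;
    pReached *= failureRate; // the next attempt happens only if this one failed
  }
  return attempts;
}

console.log(attemptsPerRequest(0, 3));   // healthy system: no amplification
console.log(attemptsPerRequest(0.5, 2)); // half the attempts fail, 2 retries
console.log(attemptsPerRequest(1, 3));   // everything times out, 3 retries
```

The worst case is the one that matters: when the system is already failing every request, a client with 3 retries sends 4 times the traffic, exactly when the system can least afford it.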

6️⃣ Failure Mode #4 — Thrashing (Scale Up ↔ Scale Down)

Why it happens

Metrics lag (30–60s)
Bursty traffic
Aggressive scale rules

Instances are:

Created
Destroyed
Recreated

CPU caches, memory, and application caches are constantly cold.
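A small sketch of thrashing on bursty traffic. It compares a rule that resizes on every tick with one that refuses to scale down until demand has stayed low for a while. `countResizes` and `scaleDownHold` are hypothetical names, and the load series is invented for illustration.

```javascript
// Counts how many times the fleet is resized over a load series.
// scaleDownHold: ticks that demand must stay low before scaling down (0 = immediately).
function countResizes(loads, { perInstance = 100, scaleDownHold = 0 } = {}) {
  let instances = 1;
  let lastHighDemand = -Infinity;
  let resizes = 0;
  loads.forEach((load, t) => {
    const wanted = Math.max(1, Math.ceil(load / perInstance));
    if (wanted >= instances) {
      lastHighDemand = t; // demand still justifies the current size
      if (wanted > instances) {
        instances = wanted;
        resizes++;
      }
    } else if (t - lastHighDemand >= scaleDownHold) {
      instances = wanted; // demand has been low long enough to shrink
      resizes++;
    }
  });
  return resizes;
}

const bursty = [100, 900, 100, 900, 100, 900, 100, 900];
console.log(countResizes(bursty, { scaleDownHold: 0 })); // aggressive rule
console.log(countResizes(bursty, { scaleDownHold: 5 })); // patient rule
```

On this series the aggressive rule resizes 7 times in 8 ticks, so almost every burst lands on freshly created instances. The patient rule resizes once and keeps warm capacity for the next burst, at the price of paying for idle instances in between.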

7️⃣ Autoscaling Hides Capacity Planning Failures

Teams stop asking:

What is our real capacity?
Where is the bottleneck?
What is the knee point?

They rely on:

“Autoscaling will handle it.”

Until it doesn’t.
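The "knee point" can be made concrete with the simplest queueing model. This is a sketch under a strong assumption, a single server with random arrivals and service times (M/M/1), where mean response time is service time divided by (1 - utilization). Real systems are messier, but the shape of the curve is the point. `meanResponseMs` is a hypothetical helper.

```javascript
// Mean response time of an M/M/1 queue: serviceMs / (1 - utilization).
// Assumes one server, random arrivals, and random service times.
function meanResponseMs(serviceMs, utilization) {
  if (utilization >= 1) return Infinity; // the queue grows without bound
  return serviceMs / (1 - utilization);
}

for (const u of [0.5, 0.75, 0.9, 0.99]) {
  console.log(`utilization ${u}: ~${meanResponseMs(10, u).toFixed(0)} ms`);
}
```

Going from 50% to 75% utilization doubles latency; going from 90% to 99% multiplies it by ten. Load is not linear, and the knee is the region where a small increase in traffic starts producing a large increase in latency.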

8️⃣ Code Example — Naive Autoscaling Trigger ❌

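# Illustrative pseudo-config, not the syntax of any specific autoscaler
scaleUp:
  cpuUtilization: 70%   # fire once average CPU crosses 70%
  add: 2 pods           # fixed step, regardless of how far behind the fleet is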

Why this is bad

CPU is a lagging indicator
P99 latency has already exploded by the time CPU crosses the threshold
Scaling starts too late, and the new pods still need time to start
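One source of that lag is simply averaging. A minimal sketch, assuming the trigger evaluates a moving average of CPU samples (a common smoothing choice, but an assumption here); `firstCrossing` and the sample values are made up for illustration.

```javascript
// Index of the first sample at which the moving average exceeds the threshold.
// window = 1 means the raw, unsmoothed signal. Returns -1 if it never crosses.
function firstCrossing(samples, threshold, window) {
  for (let t = 0; t < samples.length; t++) {
    const slice = samples.slice(Math.max(0, t - window + 1), t + 1);
    const avg = slice.reduce((a, b) => a + b, 0) / slice.length;
    if (avg > threshold) return t;
  }
  return -1;
}

// CPU jumps from 40% to 95% at sample 4 (illustrative data).
const cpu = [40, 40, 40, 40, 95, 95, 95, 95, 95, 95];
console.log(firstCrossing(cpu, 70, 1)); // raw signal
console.log(firstCrossing(cpu, 70, 4)); // 4-sample average
```

The raw signal crosses 70% at sample 4; the 4-sample average does not cross until sample 6. If samples are 30 seconds apart, that is a full minute of overload before the trigger even fires, and pod startup has not begun yet.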

9️⃣ The Right Mental Model

Autoscaling is a cost optimization tool — not a reliability tool.

It is good for:

Diurnal traffic
Predictable growth
Cost efficiency

It is bad for:

Spikes
Cascading failures
Bottlenecked systems

🔑 What Actually Prevents Failure (Before Autoscaling)

In order of importance:

Hard concurrency limits
Backpressure
Load shedding
Timeouts
Circuit breakers
Autoscaling (last)
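As one example from this list, here is a minimal circuit breaker sketch. It is deliberately simplified: the class, its option names, and the defaults are assumptions for illustration, and production libraries add features this leaves out (for example, limiting the number of concurrent trial calls).

```javascript
// Minimal circuit breaker: stop calling a dependency after repeated failures.
// The caller checks allow() before the call and reports the outcome afterwards.
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetAfterMs = 5000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now; // injectable clock, which makes the breaker easy to test
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed (calls allowed)
  }

  // Should the caller attempt the dependency right now?
  allow() {
    if (this.openedAt === null) return true;
    // Open: fail fast until the cool-off has passed, then permit a trial call.
    return this.now() - this.openedAt >= this.resetAfterMs;
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    this.failures++;
    // Opens the circuit, or re-opens it if a trial call just failed.
    if (this.failures >= this.failureThreshold) this.openedAt = this.now();
  }
}
```

A caller would wrap each dependency call: if `allow()` returns false, return an error or a fallback immediately; otherwise make the call and report `recordSuccess()` or `recordFailure()`. The effect is that a struggling dependency stops receiving traffic within a few failures, instead of waiting minutes for new instances.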

1️⃣0️⃣ Autoscaling Done Right (When It Helps)

Autoscaling helps only if:

Bottleneck scales horizontally
Cold start cost is low
Load changes slowly
Limits already exist

✅ Better Pattern

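// inFlight: requests currently being handled on this instance
// MAX_CAPACITY: concurrency limit measured under load, not guessed
if (inFlight >= MAX_CAPACITY) {
  return res.status(503).send("Overloaded");
}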

Autoscaling then:

Handles long-term growth
Not short-term survival
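The in-flight check above leaves out the bookkeeping. Here is one way it could be fleshed out, as a sketch assuming an Express-style `(req, res, next)` middleware signature; `limitConcurrency` is a hypothetical name and the `Retry-After` value is an arbitrary choice.

```javascript
// Express-style middleware sketch: cap concurrent requests and shed the rest.
function limitConcurrency(maxInFlight) {
  let inFlight = 0;
  return function (req, res, next) {
    if (inFlight >= maxInFlight) {
      res.set("Retry-After", "1"); // hint for well-behaved clients
      return res.status(503).send("Overloaded");
    }
    inFlight++;
    // Release the slot exactly once, whether the response finished
    // normally or the connection was dropped.
    let released = false;
    const release = () => {
      if (!released) {
        released = true;
        inFlight--;
      }
    };
    res.once("finish", release);
    res.once("close", release);
    next();
  };
}
```

In an Express app this would be registered before the routes, for example `app.use(limitConcurrency(100))`, where 100 stands in for a limit you have measured. Rejecting early is cheap: a 503 costs almost nothing, while an accepted request that times out costs a full slot for its whole duration.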

1️⃣1️⃣ Real-World Examples

Netflix

Heavy load shedding
Autoscaling secondary

AWS APIs

Strict throttling
Autoscaling behind limits

Google

Explicit admission control
Autoscaling only after safety