Why Autoscaling Often Makes Things Worse
When "adding capacity" destabilizes the system
Autoscaling reacts to symptoms.
Real problems are causal.
1️⃣ The Dangerous Belief
"If traffic increases, autoscaling will save us."
This assumes:
Load is linear
Metrics are immediate
New instances are instantly useful
Bottlenecks scale horizontally
None of these holds reliably in production.
2️⃣ Autoscaling Is a Feedback Loop (Control Theory)
Autoscaling is a delayed negative feedback loop.
Timeline:
Load increases
Latency rises
Metrics detect it
Scale-out happens
New instances start
Load redistributes
Each step has delay.
Delayed feedback systems oscillate.
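The effect of those delays can be sketched with a toy simulation. Everything in it is an assumption made for illustration: the function name `simulateAutoscaler`, the 70% and 30% thresholds, the tick-based clock, and the step-shaped load. It is not a model of any particular autoscaler.

```javascript
// Toy model of an autoscaler acting on stale metrics. Every number here is
// an illustrative assumption, not a measurement from a real system.
function simulateAutoscaler({ load, perInstanceCapacity, lagTicks, startupTicks, ticks }) {
  let instances = 2;          // instances currently serving traffic
  const pending = [];         // ticks at which requested instances become ready
  const utilHistory = [];     // utilization as it actually was at each tick
  const instanceHistory = []; // serving instance count at each tick

  for (let t = 0; t < ticks; t++) {
    // Instances requested earlier come online once their startup delay has passed.
    while (pending.length > 0 && pending[0] <= t) {
      pending.shift();
      instances += 1;
    }

    utilHistory.push(load(t) / (instances * perInstanceCapacity));
    instanceHistory.push(instances);

    // The autoscaler only sees the utilization from lagTicks ago.
    const observed = utilHistory[Math.max(0, t - lagTicks)];
    if (observed > 0.7) {
      pending.push(t + startupTicks); // scale out, but it only helps later
    } else if (observed < 0.3 && instances > 1) {
      instances -= 1;                 // scale in takes effect immediately
    }
  }
  return instanceHistory;
}

// Hypothetical step load: 100 req/s, jumping to 400 req/s at tick 10.
const stepLoad = (t) => (t < 10 ? 100 : 400);
const delayedRun = simulateAutoscaler({
  load: stepLoad, perInstanceCapacity: 100, lagTicks: 3, startupTicks: 3, ticks: 40,
});
console.log("peak instances with delay:", Math.max(...delayedRun));
```

In this setup 6 instances would be enough to bring utilization under the 70% threshold, but the delayed controller keeps requesting more while it waits for stale metrics to catch up. That overshoot, followed by a later scale-in, is the first half of an oscillation.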
3️⃣ Failure Mode #1: Cold Start Amplification
What happens during scale-up
JVM warmup
Cache cold
Connection pools empty
JIT not optimized
New instances are slower, not faster.
Autoscaling often increases tail latency first.
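A rough way to reason about this is to treat a new instance as contributing only a fraction of its eventual throughput while it warms up. The sketch below assumes a linear ramp from `coldFraction` to full capacity over `warmupSeconds`. Both parameters and both function names are hypothetical, and real warm-up curves are rarely this tidy.

```javascript
// Assumption: throughput ramps linearly from coldFraction * fullCapacity
// at age 0 to fullCapacity once the instance is warmupSeconds old.
function instanceCapacity(ageSeconds, fullCapacity, warmupSeconds, coldFraction) {
  if (ageSeconds >= warmupSeconds) return fullCapacity;
  const progress = ageSeconds / warmupSeconds;
  return fullCapacity * (coldFraction + (1 - coldFraction) * progress);
}

// Total capacity of a fleet, given the age in seconds of every instance.
function fleetCapacity(instanceAges, fullCapacity, warmupSeconds, coldFraction) {
  return instanceAges.reduce(
    (sum, age) => sum + instanceCapacity(age, fullCapacity, warmupSeconds, coldFraction),
    0
  );
}

// Two warm instances plus two that just started (illustrative numbers).
console.log(fleetCapacity([600, 600, 0, 0], 100, 60, 0.25));
```

If the load balancer sends each instance an equal share of traffic, the two cold instances in this example receive the same load as the warm ones while being able to serve only a quarter of it.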
4️⃣ Failure Mode #2: Scaling the Wrong Bottleneck
Example
App scales horizontally
Database does not
Result:
More app instances
More DB connections
DB melts faster
Autoscaling increases pressure on the bottleneck.
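The arithmetic behind this is simple enough to check before enabling autoscaling. The sketch assumes each app instance holds a fixed-size connection pool and the database enforces a hard connection cap. The function names and the numbers are illustrative only.

```javascript
// Assumption: every app instance opens up to poolSizePerInstance connections,
// and the database rejects connections beyond dbMaxConnections.
function connectionDemand(instances, poolSizePerInstance) {
  return instances * poolSizePerInstance;
}

// Largest instance count whose pools still fit under the database cap.
function maxSafeInstances(dbMaxConnections, poolSizePerInstance) {
  return Math.floor(dbMaxConnections / poolSizePerInstance);
}

function exceedsDbLimit(instances, poolSizePerInstance, dbMaxConnections) {
  return connectionDemand(instances, poolSizePerInstance) > dbMaxConnections;
}

// Illustrative numbers: pools of 20 connections, database cap of 200.
console.log(maxSafeInstances(200, 20));   // upper bound for the autoscaler's max replicas
console.log(exceedsDbLimit(15, 20, 200)); // whether scaling to 15 instances overruns the cap
```

Under these assumptions, the autoscaler's maximum replica count should stay at or below `maxSafeInstances`; otherwise scale-out itself pushes the database past its limit.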
5️⃣ Failure Mode #3: Retry & Autoscaling Feedback Loop
Sequence
Latency increases
Clients retry
Load multiplies (a single retry per request already doubles it)
Autoscaler scales up
Cold new instances respond slowly and trigger more retries
System collapses
Autoscaling can turn temporary slowness into total failure.
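How much retries amplify load can be estimated with a small calculation. The sketch assumes every attempt fails independently with the same probability and that clients retry immediately up to a fixed count. The function name `expectedAttempts` is hypothetical.

```javascript
// Expected number of attempts per logical request when each attempt fails
// with probability failureProbability and clients retry up to maxRetries times.
// Assumption: failures are independent and equally likely on every attempt.
function expectedAttempts(failureProbability, maxRetries) {
  let attempts = 0;
  let reachProbability = 1; // probability that attempt k is made at all
  for (let k = 0; k <= maxRetries; k++) {
    attempts += reachProbability;
    reachProbability *= failureProbability;
  }
  return attempts;
}

// Offered load seen by the service for a given base request rate.
function offeredLoad(baseRate, failureProbability, maxRetries) {
  return baseRate * expectedAttempts(failureProbability, maxRetries);
}

console.log(offeredLoad(1000, 0.5, 2)); // half of all attempts timing out, 2 retries
console.log(offeredLoad(1000, 1, 3));   // total outage with 3 retries
```

When every attempt fails, the multiplier is simply `maxRetries + 1`, so one retry doubles the load and three retries quadruple it, at exactly the moment the system can least absorb it.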
6️⃣ Failure Mode #4: Thrashing (Scale Up ↔ Scale Down)
Why it happens
Metrics lag (30-60 s)
Bursty traffic
Aggressive scale rules
Instances are:
Created
Destroyed
Recreated
Caches, connection pools, and JIT-compiled code never stay warm.
7️⃣ Autoscaling Hides Capacity Planning Failures
Teams stop asking:
What is our real capacity?
Where is the bottleneck?
What is the knee point?
They rely on:
"Autoscaling will handle it."
Until it doesn't.
8️⃣ Code Example: Naive Autoscaling Trigger ❌
    scaleUp:
      cpuUtilization: 70%
      add: 2 pods
Why this is bad
CPU is a lagging indicator
P99 latency has already exploded by the time CPU reaches the threshold
Scaling happens too late
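One reason CPU lags is that the autoscaler usually sees an average over a window rather than the latest sample. The sketch below assumes a simple sliding-window mean with one sample per second. The function name and the window model are assumptions, not the behavior of any specific metrics pipeline.

```javascript
// How many samples after a step change does a sliding-window average need
// before it reaches the trigger threshold? Returns null if it never does.
// Assumption: the window is full of `before` samples when the step happens.
function samplesUntilTrigger(before, after, threshold, windowSize) {
  const window = new Array(windowSize).fill(before);
  for (let k = 1; k <= windowSize; k++) {
    window.shift();     // oldest sample leaves the window
    window.push(after); // newest sample reflects the new load level
    const average = window.reduce((sum, v) => sum + v, 0) / windowSize;
    if (average >= threshold) return k;
  }
  return null;
}

// CPU jumps from 40% to 100%, trigger at 70%, 60 one-second samples per window.
console.log(samplesUntilTrigger(40, 100, 70, 60));
```

With these illustrative numbers the instances are saturated for half the window before the trigger even fires, and startup time comes on top of that.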
9️⃣ The Right Mental Model
Autoscaling is a cost optimization tool, not a reliability tool.
It is good for:
Diurnal traffic
Predictable growth
Cost efficiency
It is bad for:
Spikes
Cascading failures
Bottlenecked systems
What Actually Prevents Failure (Before Autoscaling)
In order of importance:
Hard concurrency limits
Backpressure
Load shedding
Timeouts
Circuit breakers
Autoscaling (last)
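As one example from that list, a circuit breaker can be sketched in a few lines. This is a minimal illustration, not a production implementation: the class name, the option names (`failureThreshold`, `resetAfterMs`), and the injectable `now` clock are all assumptions introduced here.

```javascript
// Minimal circuit breaker: after failureThreshold consecutive failures it
// rejects calls until resetAfterMs has elapsed, then lets trial calls through.
class CircuitBreaker {
  constructor({ failureThreshold, resetAfterMs, now = Date.now }) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;       // injectable clock, mainly so tests can control time
    this.failures = 0;    // consecutive failures seen so far
    this.openedAt = null; // timestamp when the breaker opened, or null if closed
  }

  // True if the caller may send a request to the dependency right now.
  allowRequest() {
    if (this.openedAt === null) return true;
    return this.now() - this.openedAt >= this.resetAfterMs;
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = this.now(); // open, or re-open after a failed trial call
    }
  }
}
```

A caller would check `allowRequest()` before each downstream call, fail fast when it returns `false`, and report the outcome with `recordSuccess()` or `recordFailure()`. The point is that the failing dependency stops receiving traffic immediately, without waiting for any scaling decision.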
1️⃣0️⃣ Autoscaling Done Right (When It Helps)
Autoscaling helps only if:
Bottleneck scales horizontally
Cold start cost is low
Load changes slowly
Limits already exist
✅ Better Pattern
    // Shed load once the in-flight limit is exceeded, instead of queueing.
    if (inFlight > MAX_CAPACITY) {
      return res.status(503).send("Overloaded");
    }
Autoscaling then:
Handles long-term growth
Not short-term survival
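The in-flight check above leaves open how the counter is maintained. One possible sketch is an Express-style middleware that reserves a slot per request and releases it when the response ends. The names `createConcurrencyLimiter` and `limitMiddleware` are hypothetical, and the code only assumes that `res` offers `status()`, `send()`, and the `finish`/`close` events.

```javascript
// Tracks in-flight requests against a hard cap.
function createConcurrencyLimiter(maxInFlight) {
  let inFlight = 0;
  return {
    // Reserves a slot and returns true, or returns false if the cap is reached.
    tryAcquire() {
      if (inFlight >= maxInFlight) return false;
      inFlight += 1;
      return true;
    },
    release() {
      if (inFlight > 0) inFlight -= 1;
    },
    inFlight() {
      return inFlight;
    },
  };
}

// Express-style middleware: rejects with 503 when no slot is free.
function limitMiddleware(limiter) {
  return function (req, res, next) {
    if (!limiter.tryAcquire()) {
      res.status(503).send("Overloaded");
      return;
    }
    // Release exactly once, whether the response finishes or the client disconnects.
    let released = false;
    const releaseOnce = () => {
      if (!released) {
        released = true;
        limiter.release();
      }
    };
    res.on("finish", releaseOnce);
    res.on("close", releaseOnce);
    next();
  };
}
```

In an Express app this would be registered before the route handlers, for example `app.use(limitMiddleware(createConcurrencyLimiter(100)))`, where the cap of 100 is a placeholder that should come from measured capacity.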
1️⃣1️⃣ Real-World Examples
Netflix
Heavy load shedding
Autoscaling secondary
AWS APIs
Strict throttling
Autoscaling behind limits
Explicit admission control
Autoscaling only after safety
