📘 Capacity Planning Is Harder Than It Looks

3 Apr 2022 · 3 min read

Why “we’re under 50% CPU” is a dangerous lie

Capacity planning is not about how fast your system is.
It’s about how it fails under load.

1️⃣ The Most Common (Wrong) Mental Model

❌ What teams think

“Our service can handle 10k RPS.”

✅ Reality

Capacity is work per second, not requests per second.

Two requests are never equal.

2️⃣ The First Big Mistake: RPS ≠ Load

Example

Request          CPU cost
GET /profile     1 ms
GET /feed        40 ms
POST /checkout   120 ms

At 1,000 RPS:

  • All profiles → 1 CPU-second of work per second → fine

  • All checkout → 120 CPU-seconds of work per second → system dead

Traffic shape matters more than traffic volume.
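To make the arithmetic concrete, here is a minimal sketch that turns a traffic mix into CPU demand. The per-route costs come from the table above; the function name `cpuDemand` and the shape of the `mix` object are assumptions made for illustration.

```javascript
// CPU cost per request in milliseconds, taken from the table above.
const cpuCostMs = {
  "GET /profile": 1,
  "GET /feed": 40,
  "POST /checkout": 120,
};

// Returns the CPU-seconds of work generated per second of traffic.
// `mix` maps each route to its share of traffic (shares should sum to 1).
function cpuDemand(totalRps, mix) {
  let demandMs = 0;
  for (const [route, share] of Object.entries(mix)) {
    demandMs += totalRps * share * cpuCostMs[route];
  }
  return demandMs / 1000;
}

console.log(cpuDemand(1000, { "GET /profile": 1 }));   // 1
console.log(cpuDemand(1000, { "POST /checkout": 1 })); // 120
```

Same 1,000 RPS, a 120x difference in work. Any mix in between can be checked the same way before it reaches production.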

3️⃣ The Real Unit of Capacity: CPU-Seconds

Key Rule

A system has N CPU-seconds per second.

Example:

  • 8 cores

  • Each core ≈ 1 CPU-second / second

Total budget:

8 CPU-seconds / second

If one request costs:

50 ms CPU = 0.05 CPU-seconds

Max sustainable throughput:

8 / 0.05 = 160 RPS

No amount of async changes this: async frees threads that are waiting, but every request still needs its CPU time.
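The same calculation as a small helper, in case you want to plug in your own numbers. This is a sketch that assumes the workload is purely CPU-bound and that every core is fully usable; `maxSustainableRps` is a made-up name.

```javascript
// Upper bound on throughput for a CPU-bound workload.
// cores: CPU-seconds available per second.
// cpuMsPerRequest: CPU time one request consumes, in milliseconds.
function maxSustainableRps(cores, cpuMsPerRequest) {
  if (cpuMsPerRequest <= 0) throw new RangeError("cpuMsPerRequest must be positive");
  return (cores * 1000) / cpuMsPerRequest;
}

console.log(maxSustainableRps(8, 50)); // 160
```

Treat the result as a ceiling, not a target.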

4️⃣ Little’s Law (The Law Everyone Ignores)

Concurrency = Throughput × Latency

If:

  • Throughput = 200 RPS

  • Latency = 500 ms

Then:

Concurrency = 100 in-flight requests

Increase latency → concurrency explodes → memory explodes.
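A quick sketch of what that means at a fixed 200 RPS. The per-request memory figure `BYTES_PER_REQUEST` is a hypothetical number chosen only to show the trend; measure your own.

```javascript
// Little's Law: average in-flight requests = throughput x average latency.
function inFlight(throughputRps, latencyMs) {
  return (throughputRps * latencyMs) / 1000;
}

// Hypothetical figure: memory held by one in-flight request.
const BYTES_PER_REQUEST = 256 * 1024;

for (const latencyMs of [500, 1000, 2000]) {
  const n = inFlight(200, latencyMs);
  const mib = (n * BYTES_PER_REQUEST) / (1024 * 1024);
  console.log(`${latencyMs} ms -> ${n} in flight, ~${mib} MiB held`);
}
```

Throughput never changed, yet the number of requests held in memory quadrupled along with latency.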

5️⃣ Why “Headroom” Doesn’t Save You

❌ Common rule

“Keep CPU under 70%”

Why this fails

  • GC pauses

  • Kernel scheduling

  • Cache misses

  • Lock contention

At ~70–80% CPU:

  • Context switching skyrockets

  • Tail latency spikes

  • Throughput flattens or drops

Systems don’t degrade linearly.

6️⃣ The Knee of the Curve (Critical Insight)

Every system has a knee point:

  • Before knee → stable

  • After knee → latency explodes

  • Slight load increase → collapse
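One way to see the knee is a textbook single-server queue (M/M/1), where mean latency is service time divided by (1 - utilization). That model is an assumption brought in here for illustration, not something measured from a real system, and real services will bend at different points. The shape is what matters.

```javascript
// Mean time in system for an M/M/1 queue (assumed model):
// latency = serviceTime / (1 - utilization).
function meanLatencyMs(serviceMs, utilization) {
  if (utilization >= 1) return Infinity; // the queue grows without bound
  return serviceMs / (1 - utilization);
}

// 10 ms of service time at increasing utilization.
for (const u of [0.5, 0.7, 0.8, 0.9, 0.95, 0.99]) {
  const pct = Math.round(u * 100);
  console.log(`${pct}% busy -> ${meanLatencyMs(10, u).toFixed(0)} ms`);
}
```

Going from 50% to 70% busy adds a little latency. Going from 90% to 99% multiplies it roughly tenfold. That is the knee.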

7️⃣ Capacity Planning Mistake #2: Ignoring Variance

Even if average load is safe:

  • Expensive outlier requests (P95 and above)

  • Cache misses

  • Cold shards

  • Slow disks

…will dominate P99.

Capacity planning must consider worst-case paths, not averages.
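A small sketch of how far the average can sit from the tail. The sample data is invented (97 cheap requests and 3 expensive ones), and the percentile uses the simple nearest-rank method; monitoring tools may interpolate differently.

```javascript
function mean(samples) {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Nearest-rank percentile: the smallest sample with at least p% of
// all samples at or below it.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Hypothetical costs in ms: 97 cache hits and 3 cache misses.
const costsMs = [...Array(97).fill(5), ...Array(3).fill(400)];

console.log(mean(costsMs).toFixed(2)); // average cost
console.log(percentile(costsMs, 50));  // 5
console.log(percentile(costsMs, 99));  // 400
```

Planning around the average here would budget under 20 ms per request while one request in a hundred costs 400 ms.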

8️⃣ Async, Queues, and the Capacity Illusion

❌ “We added async + a queue”

What actually happened:

  • Requests stopped blocking

  • In-flight count increased

  • Memory usage spiked

  • Tail latency worsened
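A back-of-the-envelope sketch of why. It assumes constant arrival and service rates and an unbounded FIFO queue; both function names are made up for this example.

```javascript
// Requests waiting after `seconds` when work arrives faster than it is served.
function backlogAfter(arrivalRps, serviceRps, seconds) {
  return Math.max(0, (arrivalRps - serviceRps) * seconds);
}

// How long the newest queued request waits before it is served.
function queueWaitSeconds(backlog, serviceRps) {
  return backlog / serviceRps;
}

// 10% over capacity for one minute.
const backlog = backlogAfter(220, 200, 60);
console.log(backlog);                        // 1200
console.log(queueWaitSeconds(backlog, 200)); // 6
```

The queue did not add capacity. It converted a small overload into 1,200 held requests and a 6-second wait.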

9️⃣ The Only Capacity That Matters: Bottlenecks

Your system’s capacity is the minimum of:

  • CPU

  • Memory

  • DB connections

  • Disk IOPS

  • Network

  • External dependencies

You don’t scale a system.
You scale its tightest bottleneck.
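In code, "minimum of" is literal. A minimal sketch, assuming you have already expressed each resource as the request rate it can sustain; the `network` figure below is a placeholder, not a number from this article.

```javascript
// Returns the resource with the lowest sustainable request rate.
function bottleneck(limitsRps) {
  let worst = null;
  for (const [name, rps] of Object.entries(limitsRps)) {
    if (worst === null || rps < worst.rps) worst = { name, rps };
  }
  return worst;
}

const limits = {
  cpu: 160,            // 8 cores at 50 ms of CPU per request
  dbConnections: 2000, // 100 connections at 50 ms per query
  network: 5000,       // hypothetical placeholder
};

console.log(bottleneck(limits)); // { name: 'cpu', rps: 160 }
```

Doubling the DB pool here changes nothing. Only raising the lowest number moves the system limit.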

1️⃣0️⃣ Practical Capacity Model (Simple & Useful)

For each critical dependency, compute:

Capacity = (Resource Budget) / (Cost per request)

Example:

  • DB connections = 100

  • Avg query time = 50 ms

Max DB RPS (assuming one query per request):

100 / 0.05 = 2000 RPS

Now apply:

  • Cache miss rate

  • Retries

  • Fanout

Real capacity is much lower.
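Here is one way to apply those three factors. The multiplication model and the sample values are assumptions for illustration: each request fans out to `fanout` lookups, a fraction `cacheMissRate` of them reach the DB, and a fraction `retryRate` of DB queries get issued again.

```javascript
// Raw ceiling: connections divided by the time each query holds one.
function rawDbQps(connections, avgQueryMs) {
  return (connections * 1000) / avgQueryMs;
}

// Assumed model: DB queries generated by one incoming request.
function queriesPerRequest({ fanout, cacheMissRate, retryRate }) {
  return fanout * cacheMissRate * (1 + retryRate);
}

function effectiveRps(connections, avgQueryMs, shape) {
  return rawDbQps(connections, avgQueryMs) / queriesPerRequest(shape);
}

// Hypothetical request shape.
const shape = { fanout: 4, cacheMissRate: 0.5, retryRate: 0.25 };

console.log(rawDbQps(100, 50));            // 2000
console.log(effectiveRps(100, 50, shape)); // 800
```

With these made-up values the 2,000 figure shrinks to 800 requests per second, and that is before any tail latency on the queries themselves.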

1️⃣1️⃣ Why Load Tests Lie

Load tests often:

  • Use uniform traffic

  • Ignore cache warmup

  • Ignore retries

  • Ignore long tails

Result:

“It handled 5k RPS in staging!”
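Fixing the first item, uniform traffic, can be as simple as drawing routes from a weighted mix. A sketch under assumptions: the weights below are invented, and in practice they would come from production access logs. The picker takes the random number as an argument so runs can be replayed.

```javascript
// Picks a route given relative weights and a number r in [0, 1).
function pickRoute(weights, r) {
  const entries = Object.entries(weights);
  const total = entries.reduce((sum, [, w]) => sum + w, 0);
  let threshold = r * total;
  for (const [route, weight] of entries) {
    if (threshold < weight) return route;
    threshold -= weight;
  }
  // Guard against floating-point drift on the last bucket.
  return entries[entries.length - 1][0];
}

// Hypothetical production mix (percent of requests).
const weights = { "GET /profile": 70, "GET /feed": 25, "POST /checkout": 5 };

// Each simulated request draws its route from the mix.
const route = pickRoute(weights, Math.random());
console.log(route);
```

Retries, cold caches, and long tails need their own handling, but a realistic mix alone usually changes the result.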

1️⃣2️⃣ What Good Capacity Planning Actually Looks Like

✅ Principles

  1. Plan for P99 cost, not average

  2. Assume partial failure

  3. Model fanout

  4. Enforce hard limits

  5. Fail fast beyond capacity

1️⃣3️⃣ Capacity Without Limits Is Fiction

❌ No limit

app.get("/data", async (req, res) => {
  res.send(await work());
});

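✅ Capacity-aware

app.get("/data", async (req, res) => {
  // Reject once MAX requests are in flight (inFlight starts at 0).
  if (inFlight >= MAX) return res.status(503).send("Over capacity");
  inFlight++;
  try { res.send(await work()); } finally { inFlight--; }
});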

Capacity only exists if it’s enforced.
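The same idea without any web framework, as a sketch you can test in isolation. `createGate` and `runBounded` are hypothetical names, and this version rejects excess work immediately instead of queueing it, which is a design choice and not the only option.

```javascript
// A counter that hands out at most `max` slots at a time.
function createGate(max) {
  let inFlight = 0;
  return {
    // Takes a slot if one is free; returns false when at capacity.
    tryAcquire() {
      if (inFlight >= max) return false;
      inFlight++;
      return true;
    },
    // Returns a slot; never drops below zero.
    release() {
      if (inFlight > 0) inFlight--;
    },
    get inFlight() {
      return inFlight;
    },
  };
}

// Runs `work` only if a slot is free; otherwise fails fast.
async function runBounded(gate, work) {
  if (!gate.tryAcquire()) throw new Error("Over capacity");
  try {
    return await work();
  } finally {
    gate.release(); // always free the slot, even when work() throws
  }
}
```

A handler would call `runBounded(gate, work)` and map the "Over capacity" error to a 503. The `finally` block matters: without it, a single thrown error leaks a slot forever.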

1️⃣4️⃣ The Dark Truth

Capacity planning is not about preventing overload.
It’s about deciding how you fail.

Fail modes:

  • Slow failure ❌

  • Cascading failure ❌

  • Fast, bounded failure ✅