📘 Tail Latency (P99) & Amplification

3 Apr 2022 · 3 min read

Why Your System Is “Fast” but Users Still Complain

A system can be fast on average and still fail in production.


🧠 What Is Tail Latency?

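Metric   Meaning
P50      Median: half of all requests finish at or below this latency
P95      95% of requests finish at or below this latency
P99      99% of requests finish at or below this latency; the slowest 1% exceed it (users remember these)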

Users don’t experience averages.
They experience the slowest requests.
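
To make the percentiles concrete, here is a minimal sketch that computes them from raw samples using the nearest-rank method. The `percentile` helper and the sample latencies are made up for illustration; production systems usually rely on histograms from their metrics library instead of sorting raw samples.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical latencies in milliseconds for 10 requests (one straggler).
const latenciesMs = [12, 15, 11, 14, 13, 16, 12, 900, 14, 13];

console.log("P50:", percentile(latenciesMs, 50)); // 13
console.log("P95:", percentile(latenciesMs, 95)); // 900
console.log("P99:", percentile(latenciesMs, 99)); // 900
```

The median barely notices the 900 ms request; the high percentiles are defined by it.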


🧩 The Root Cause: Amplification

Tail latency explodes when one request fans out into many operations.

1 user request
→ N cache reads
→ M DB calls
→ K index lookups
→ L network hops

Even if each step is “fast”, the slowest sub-operation dominates.
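
A rough way to see this, assuming we already know how long each sub-operation took: with concurrent fan-out the request waits for the slowest call, and with chained calls every delay adds up. The function names and numbers below are illustrative only.

```javascript
// Latency of one request, given the latencies (ms) of its sub-operations.

function parallelLatency(subLatenciesMs) {
  // Concurrent fan-out: the request finishes when the slowest call returns.
  return Math.max(...subLatenciesMs);
}

function sequentialLatency(subLatenciesMs) {
  // Chained calls: every delay is paid in full.
  return subLatenciesMs.reduce((total, ms) => total + ms, 0);
}

// Hypothetical request: 19 fast calls at 5 ms and one straggler at 400 ms.
const calls = [...Array(19).fill(5), 400];

console.log(parallelLatency(calls));   // 400
console.log(sequentialLatency(calls)); // 495
```

Either way, the 19 fast calls do nothing to hide the one slow call.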

🧮 The Math (Simple but Deadly)

If:

  • Each call has 1% chance of being slow

  • Request fans out to 20 calls

Chance at least one is slow = 1 - (0.99)^20 ≈ 18% (assuming the calls are independent)

👉 A slowdown that is rare per call (1 in 100) now hits almost one in five user requests.
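
The same arithmetic as a small sketch, so you can plug in your own fan-out. `probAnySlow` is a hypothetical helper, and it assumes the calls are slow independently of each other, which real systems only approximate.

```javascript
// Probability that at least one of `fanout` independent calls is slow,
// when each call is slow with probability `pSlow`.
function probAnySlow(pSlow, fanout) {
  return 1 - Math.pow(1 - pSlow, fanout);
}

for (const fanout of [1, 5, 20, 100]) {
  const pct = (probAnySlow(0.01, fanout) * 100).toFixed(1);
  console.log(`fanout ${fanout}: ${pct}% of requests hit a slow call`);
}
```

At a fan-out of 100, roughly 63% of requests include at least one "1%" slow call.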

🧩 Example 1 — Read Amplification → Tail Latency

❌ Naive API

async function getFeed(userId) {
  const posts = await db.getPosts(userId);

  return Promise.all(
    posts.map(p => db.getComments(p.id)) // fanout
  );
}

What Happens

  • 1 request → 1 + N DB calls

  • One slow comment query → whole request slow

✅ Fix 1: Reduce Fanout

async function getFeed(userId) {
  const posts = await db.getPosts(userId);
  const ids = posts.map(p => p.id);

  const comments = await db.getCommentsBatch(ids);

  return merge(posts, comments);
}

✔ Fewer calls
✔ Lower P99
✔ Predictable latency
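
The snippet above leaves `merge` undefined. One possible sketch, assuming the batch call returns a flat array of comments that each carry a `postId` field (that shape is an assumption, not something the original code specifies):

```javascript
// Hypothetical shapes: posts are { id, ... }, comments are { postId, ... }.
function merge(posts, comments) {
  // Group comments by the post they belong to in a single pass.
  const byPost = new Map();
  for (const comment of comments) {
    const list = byPost.get(comment.postId) || [];
    list.push(comment);
    byPost.set(comment.postId, list);
  }
  // Attach a (possibly empty) comments array to each post, keeping post order.
  return posts.map(post => ({ ...post, comments: byPost.get(post.id) || [] }));
}

const feed = merge(
  [{ id: 1, title: "a" }, { id: 2, title: "b" }],
  [{ postId: 1, text: "hi" }, { postId: 1, text: "yo" }]
);
console.log(JSON.stringify(feed));
```

Grouping happens in memory, so the request still costs two database round trips no matter how many posts come back.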

🧩 Example 2 — Write Amplification → Tail Latency

❌ Synchronous Write Path

async function placeOrder(order) {
  await db.insert(order);
  await inventory.update(order);
  await redis.del("orders");
  await searchIndex.update(order);
}

Problem

  • User waits for slowest downstream

  • One spike → P99 explosion

✅ Fix 2: Async the Fanout

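async function placeOrder(order) {
  await db.insert(order); // the durable write is the only slow step left on the request path
  // Await the enqueue so a failed publish surfaces as an error instead of a lost event.
  await queue.publish("order_created", order);
}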

Workers handle:

  • Cache invalidation

  • Index updates

✔ Fast response
✔ Stable tail latency
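
A sketch of the worker side, using a tiny in-memory queue as a stand-in for a real broker. `InMemoryQueue`, `drain`, and the handler bodies are all hypothetical; a real broker would also give you persistence, redelivery, and separate worker processes.

```javascript
// Hypothetical in-memory stand-in for a message broker.
class InMemoryQueue {
  constructor() {
    this.handlers = new Map(); // topic -> array of handler functions
    this.pending = [];         // messages waiting for a worker
  }

  subscribe(topic, handler) {
    const list = this.handlers.get(topic) || [];
    list.push(handler);
    this.handlers.set(topic, list);
  }

  publish(topic, message) {
    // Only enqueue; handlers run later in drain(), off the request path.
    this.pending.push({ topic, message });
  }

  async drain() {
    while (this.pending.length > 0) {
      const { topic, message } = this.pending.shift();
      for (const handler of this.handlers.get(topic) || []) {
        try {
          await handler(message);
        } catch (err) {
          // One failing worker must not block the others.
          console.error(`handler for ${topic} failed:`, err.message);
        }
      }
    }
  }
}

const queue = new InMemoryQueue();
const sideEffects = []; // records what the workers did, for illustration

queue.subscribe("order_created", async order => {
  sideEffects.push(`cache invalidated for order ${order.id}`);
});
queue.subscribe("order_created", async order => {
  sideEffects.push(`search index updated for order ${order.id}`);
});

queue.publish("order_created", { id: 42 });
queue.drain().then(() => console.log(sideEffects));
```

A slow cache or search index now delays the worker, not the user.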

🧩 Example 3 — Locks & Tail Latency

Locks don’t slow everyone —
they slow someone.

await mutex.lock();
try {
  await criticalSection();
} finally {
  await mutex.unlock(); // always release, even if criticalSection throws
}

Problem

  • Queue builds

  • One slow holder → long tail

✅ Fix 3: Narrow or Remove Locks

  • Request coalescing

  • Lock-free reads

  • Sharded locks
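
As an illustration of the first option, here is a minimal request-coalescing sketch (sometimes called single-flight): concurrent callers asking for the same key share one in-flight promise instead of queueing behind a lock. `coalesce` and `loadProfile` are hypothetical names, and this version only deduplicates within a single process.

```javascript
const inFlight = new Map(); // key -> promise for the load currently running

function coalesce(key, loader) {
  // If a load for this key is already running, share its result.
  if (inFlight.has(key)) return inFlight.get(key);

  const promise = Promise.resolve()
    .then(loader)
    // Forget the entry once settled so later calls load fresh data.
    .finally(() => inFlight.delete(key));

  inFlight.set(key, promise);
  return promise;
}

// Hypothetical backend call that we want to run once per burst.
let dbCalls = 0;
async function loadProfile() {
  dbCalls += 1;
  return { name: "Ada" };
}

Promise.all([
  coalesce("user:1", loadProfile),
  coalesce("user:1", loadProfile),
  coalesce("user:1", loadProfile),
]).then(results => {
  console.log(results.length, "callers,", dbCalls, "backend call");
});
```

Note that a failed load is shared too: every waiting caller sees the same rejection, and the next call retries from scratch.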

🧠 Why Caching Alone Doesn’t Fix P99

Caching improves P50, not necessarily P99.

Why?

  • Cold misses

  • Cache eviction

  • Hot keys

  • Lock contention

  • Network hiccups

Tail latency is about worst-case paths, not averages.

🧩 Observability Mistake (Common)

Teams monitor:

avg latency = 50ms ✅

But ignore:

P99 = 2.5s ❌
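
Both numbers can be true at the same time. The sketch below uses a hypothetical `LatencyRecorder` and made-up traffic (985 requests at 13 ms, 15 at 2500 ms) to show an average near 50 ms sitting next to a 2.5 s P99.

```javascript
// Hypothetical recorder that keeps raw samples; fine for a demo,
// too memory-hungry for real traffic.
class LatencyRecorder {
  constructor() {
    this.samplesMs = [];
  }

  record(ms) {
    this.samplesMs.push(ms);
  }

  summary() {
    if (this.samplesMs.length === 0) throw new Error("no samples recorded");
    const sorted = [...this.samplesMs].sort((a, b) => a - b);
    // Nearest-rank percentile over the sorted samples.
    const at = p => sorted[Math.max(Math.ceil((p / 100) * sorted.length), 1) - 1];
    const avg = sorted.reduce((sum, ms) => sum + ms, 0) / sorted.length;
    return { avg, p50: at(50), p99: at(99) };
  }
}

const recorder = new LatencyRecorder();
for (let i = 0; i < 985; i++) recorder.record(13);   // the happy path
for (let i = 0; i < 15; i++) recorder.record(2500);  // the stragglers

console.log(recorder.summary());
```

A dashboard that shows only `avg` would report this service as healthy.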

🎯 Golden Rules to Kill Tail Latency

  1. Reduce fanout

  2. Avoid synchronous chains

  3. Cap retries

  4. Use timeouts aggressively

  5. Prefer stale data over waiting

  6. Measure P99 first, average later
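
Rules 3 to 5 fit together naturally. Here is a minimal sketch, with hypothetical helper names and arbitrary limits, that bounds each attempt with a timeout, caps the number of attempts, and falls back to a stale value instead of waiting. `withTimeout` only stops waiting; it does not cancel the underlying work.

```javascript
// Reject if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try fn at most maxAttempts times, each attempt bounded by timeoutMs.
async function callWithBudget(fn, { maxAttempts = 2, timeoutMs = 100 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await withTimeout(fn(), timeoutMs);
    } catch (err) {
      lastError = err; // remember the failure and try again, up to the cap
    }
  }
  throw lastError;
}

// Serve the last known value when the fresh call fails or is too slow.
async function freshOrStale(fn, staleValue, options) {
  try {
    return await callWithBudget(fn, options);
  } catch (err) {
    return staleValue;
  }
}

// Hypothetical dependency that hangs forever.
const neverResolves = () => new Promise(() => {});

freshOrStale(neverResolves, { cached: true }, { maxAttempts: 2, timeoutMs: 20 })
  .then(value => console.log(value)); // { cached: true }
```

The worst case is now bounded at roughly `maxAttempts × timeoutMs` (about 40 ms here), whatever the dependency does.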