📘 The Noisy Neighbor Problem

3 Apr 2022 · 3 min read

Designing Safe Multi-Tenant Systems

Multi-tenancy doesn’t fail because of traffic.
It fails because one customer behaves differently than the rest.

1ļøāƒ£ What Is the Noisy Neighbor Problem?

In a multi-tenant system, multiple customers (tenants) share the same infrastructure:

  • App servers
  • Databases
  • Caches
  • Queues

A noisy neighbor is a tenant whose behavior:

  • Consumes disproportionate resources
  • Degrades performance for others
  • Causes cascading failures

One bad tenant should never break a good one.

2ļøāƒ£ Why This Is a System Design Problem (Not Just Ops)

The noisy neighbor problem is caused by design choices, not traffic spikes.

It happens even when:

  • Total traffic is low
  • Infrastructure is healthy
  • SLAs are reasonable

3ļøāƒ£ The Naive Multi-Tenant Design (Guaranteed to Fail)

❌ Single Shared Everything

All Tenants
   ↓
Single App Pool
   ↓
Single DB
   ↓
Single Cache

Failure Mode

  • One tenant runs heavy queries
  • DB CPU spikes
  • Latency increases
  • All tenants suffer

4ļøāƒ£ Why ā€œMore Hardwareā€ Doesn’t Fix This

Adding capacity:

  • Helps every tenant equally, including the noisy one
  • Doesn’t isolate bad behavior
  • Delays failure rather than preventing it

Capacity without isolation just increases the blast radius.

5ļøāƒ£ Root Causes of Noisy Neighbors

Resource   How Noise Happens
CPU        Heavy computation
DB         Full scans, joins
Cache      Hot keys
Queue      Large messages
Network    Large payloads

The weakest shared resource fails first.

6ļøāƒ£ Isolation Strategy #1 — Per-Tenant Rate Limiting

Idea

Limit how much each tenant can consume.

✅ Code (Token Bucket)

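const limits = new Map();
const CAPACITY = 100;      // max tokens (burst size) per tenant
const REFILL_PER_SEC = 10; // tokens restored per second (illustrative value)

function allow(tenantId, now = Date.now()) {
  const bucket = limits.get(tenantId) ?? { tokens: CAPACITY, last: now };
  // Refill in proportion to the time elapsed since the last call.
  const elapsedSec = (now - bucket.last) / 1000;
  bucket.tokens = Math.min(CAPACITY, bucket.tokens + elapsedSec * REFILL_PER_SEC);
  bucket.last = now;
  limits.set(tenantId, bucket);
  if (bucket.tokens < 1) return false; // budget exhausted: reject
  bucket.tokens -= 1;
  return true;
}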

Why This Helps

✔ Simple
✔ Immediate protection
❌ Doesn’t account for backend cost: a cheap request and an expensive one each spend one token

7ļøāƒ£ Isolation Strategy #2 — Load-Based Limits (Better)

Instead of counting requests, count work.

function estimateCost(req) {
  return req.type === "heavy" ? 10 : 1;
}

Reject requests once a tenant exceeds its load budget.

This follows the principle that load ≠ traffic: with the costs above, one heavy request weighs as much as ten light ones.
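
A minimal sketch of how such a budget could be enforced, reusing estimateCost from above. The fixed one-minute window, the budget of 100 cost units, and the names admit and usage are assumptions for illustration, not part of the original design.

```javascript
// Hypothetical per-tenant load budget: cost units allowed per fixed window.
const BUDGET_PER_WINDOW = 100; // assumed value
const WINDOW_MS = 60 * 1000;   // assumed one-minute window

const usage = new Map(); // tenantId -> { windowStart, spent }

function estimateCost(req) {
  return req.type === "heavy" ? 10 : 1;
}

function admit(tenantId, req, now = Date.now()) {
  let entry = usage.get(tenantId);
  // Start a fresh window on first use or once the current one has expired.
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    entry = { windowStart: now, spent: 0 };
    usage.set(tenantId, entry);
  }
  const cost = estimateCost(req);
  if (entry.spent + cost > BUDGET_PER_WINDOW) return false; // over budget
  entry.spent += cost;
  return true;
}
```

With these numbers a tenant can run ten heavy requests or a hundred light ones per minute, and exhausting the budget affects only that tenant.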

8ļøāƒ£ Isolation Strategy #3 — Per-Tenant Queues

Idea

Each tenant gets its own queue.

Why This Works

✔ Noise is contained
✔ Backpressure per tenant
❌ More operational complexity
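
One way to picture this is an in-memory sketch with one FIFO queue per tenant, drained round-robin so a long backlog from one tenant cannot starve the others. The class name TenantQueues, the maxDepth cap, and the round-robin policy are assumptions; a production system would more likely use separate queues or partitions in its message broker.

```javascript
// Sketch: one FIFO queue per tenant, drained in round-robin order.
class TenantQueues {
  constructor(maxDepth = 1000) { // assumed per-tenant depth cap
    this.maxDepth = maxDepth;
    this.queues = new Map(); // tenantId -> array of pending jobs
  }

  // Returns false when this tenant's queue is full (per-tenant backpressure).
  enqueue(tenantId, job) {
    const q = this.queues.get(tenantId) ?? [];
    if (q.length >= this.maxDepth) return false;
    q.push(job);
    this.queues.set(tenantId, q);
    return true;
  }

  // Takes one job from the tenant at the front of the rotation,
  // then moves that tenant to the back if it still has work.
  next() {
    for (const [tenantId, q] of this.queues) {
      this.queues.delete(tenantId);
      if (q.length === 0) continue; // drop empty queues from the rotation
      const job = q.shift();
      if (q.length > 0) this.queues.set(tenantId, q); // re-insert at the back
      return { tenantId, job };
    }
    return null; // nothing pending
  }
}
```

If a noisy tenant enqueues three jobs and a quiet tenant enqueues one, the quiet tenant's job is served second rather than fourth.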

9ļøāƒ£ Isolation Strategy #4 — Database-Level Isolation

Option A: Shared DB, Tenant ID Column

SELECT * FROM orders WHERE tenant_id = ?

❌ Still noisy
❌ Shared indexes
❌ Lock contention

Option B: Schema Per Tenant

tenant_123.orders
tenant_456.orders

✔ Better logical isolation (CPU, memory, and I/O are still shared)
❌ Schema sprawl

Option C: Database Per Tenant (Strongest)

✔ Hard isolation
✔ Clean SLAs
❌ Expensive
❌ Hard to manage at scale
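
The three options can coexist, with each tenant placed at the isolation level it needs. The sketch below assumes a hypothetical placement table and made-up connection URLs. It only resolves where a tenant's orders table lives and does not open any connection.

```javascript
// Hypothetical placement table: which isolation level each tenant gets.
const placements = new Map([
  ["tenant_123", { mode: "shared" }],
  ["tenant_456", { mode: "schema" }],
  ["tenant_789", { mode: "database", url: "postgres://db-789.internal/app" }],
]);

const SHARED_URL = "postgres://shared.internal/app"; // assumed shared cluster

function resolveTarget(tenantId) {
  // Looking the tenant up first means only known IDs reach a table name.
  const p = placements.get(tenantId);
  if (!p) throw new Error(`unknown tenant: ${tenantId}`);
  switch (p.mode) {
    case "shared":   // Option A: shared table, filter by tenant_id
      return { url: SHARED_URL, table: "orders", filterByTenant: true };
    case "schema":   // Option B: schema per tenant on the shared server
      return { url: SHARED_URL, table: `${tenantId}.orders`, filterByTenant: false };
    case "database": // Option C: dedicated database
      return { url: p.url, table: "orders", filterByTenant: false };
    default:
      throw new Error(`unknown mode: ${p.mode}`);
  }
}
```

Keeping placement in data rather than code lets a tenant move from Option A to Option C without changing call sites.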

1ļøāƒ£0ļøāƒ£ Isolation Strategy #5 — Cache Partitioning

❌ Shared Cache Keyspace

cache_key = "post:42"

✅ Tenant-Aware Cache

cache_key = "tenant:123:post:42"

Add Per-Tenant Limits

  • Max keys
  • Max memory
  • Max TTL
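
As a rough illustration, the sketch below partitions an in-memory cache by tenant and enforces two of these limits: a key cap and a TTL cap. The class TenantCache and its option names are assumptions. A memory cap is left out because it needs per-entry size accounting, and a shared cache server would enforce limits differently.

```javascript
// Sketch of a tenant-partitioned in-memory cache with per-tenant limits.
class TenantCache {
  constructor({ maxKeys = 1000, maxTtlMs = 60000 } = {}) {
    this.maxKeys = maxKeys;   // assumed per-tenant key cap
    this.maxTtlMs = maxTtlMs; // assumed per-tenant TTL cap
    this.partitions = new Map(); // tenantId -> Map(key -> { value, expiresAt })
  }

  set(tenantId, key, value, ttlMs, now = Date.now()) {
    let part = this.partitions.get(tenantId);
    if (!part) {
      part = new Map();
      this.partitions.set(tenantId, part);
    }
    part.delete(key); // re-inserting moves the key to the newest position
    // At the key cap, evict this tenant's oldest-written entry,
    // never an entry that belongs to another tenant.
    if (part.size >= this.maxKeys) {
      part.delete(part.keys().next().value);
    }
    const ttl = Math.min(ttlMs, this.maxTtlMs); // clamp to the TTL cap
    part.set(key, { value, expiresAt: now + ttl });
  }

  get(tenantId, key, now = Date.now()) {
    const part = this.partitions.get(tenantId);
    const entry = part?.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      part.delete(key); // lazily drop expired entries
      return undefined;
    }
    return entry.value;
  }
}
```

Because eviction happens inside a tenant's own partition, a tenant that writes many keys can only push out its own entries.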

1ļøāƒ£1ļøāƒ£ Isolation Strategy #6 — Compute Isolation

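Option A: Thread Pools per Tenant

// createPool(size) is assumed to return a worker pool with `size` threads.
// One pool per tier is shown here; a dedicated pool per tenant has the same shape.
const pools = {
  free: createPool(10),
  paid: createPool(50)
};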
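
Where real thread pools are not available (for example on a single Node.js event loop), a similar effect can be approximated by capping in-flight work per tier. The Bulkhead class and runIsolated helper below are hypothetical names, and the limits mirror the pool sizes above.

```javascript
// Sketch: cap concurrent work per tier so one tier cannot occupy every slot.
class Bulkhead {
  constructor(limit) {
    this.limit = limit;
    this.inFlight = 0;
  }

  // Claims a slot if one is free; returns false instead of waiting.
  tryAcquire() {
    if (this.inFlight >= this.limit) return false;
    this.inFlight++;
    return true;
  }

  release() {
    if (this.inFlight > 0) this.inFlight--;
  }
}

const bulkheads = { free: new Bulkhead(10), paid: new Bulkhead(50) };

// Runs task() inside the tier's bulkhead and always frees the slot afterwards.
async function runIsolated(tier, task) {
  const bulkhead = bulkheads[tier];
  if (!bulkhead) throw new Error(`unknown tier: ${tier}`);
  if (!bulkhead.tryAcquire()) throw new Error(`${tier} pool is full`);
  try {
    return await task();
  } finally {
    bulkhead.release();
  }
}
```

Failing fast when a pool is full keeps a saturated free tier from queueing work behind paid traffic; waiting for a slot instead is an equally valid choice.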

Option B: Process / Pod Isolation

  • One tenant per pod
  • Horizontal isolation
  • Clear blast radius

This is the model high-end SaaS products typically use for their largest tenants.

1ļøāƒ£2ļøāƒ£ The Noisy Neighbor Killer: Admission Control

Central Gate (Critical Pattern)

if (!tenantHasBudget(tenantId)) {
  return res.status(429).send("Rate Limited");
}

Isolation without enforcement is an illusion.
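
To act as a central gate, the check has to run before any per-route work. A minimal Express-style middleware sketch follows. The x-tenant-id header, the injected hasBudget function, and the Retry-After value are assumptions rather than details from the original design.

```javascript
// Sketch of an Express-style admission middleware.
// hasBudget(tenantId) is injected so any limiter can sit behind the gate.
function admissionGate(hasBudget, retryAfterSec = 1) {
  return function gate(req, res, next) {
    const tenantId = req.headers["x-tenant-id"]; // assumed header name
    if (!tenantId) {
      return res.status(400).send("Missing tenant");
    }
    if (!hasBudget(tenantId)) {
      // Tell well-behaved clients when to retry instead of hammering the gate.
      res.set("Retry-After", String(retryAfterSec));
      return res.status(429).send("Rate Limited");
    }
    next(); // within budget: continue to the route handlers
  };
}
```

In an Express app this would typically be registered once, ahead of the routes, for example app.use(admissionGate(tenantHasBudget)).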

1ļøāƒ£3ļøāƒ£ Priority-Based Isolation (Business-Aware)

Tenant Tier   Treatment
Free          Aggressive limits
Pro           Higher budgets
Enterprise    Dedicated capacity

This aligns:

  • Revenue
  • SLAs
  • Architecture
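
Expressed as configuration, the tier table might look like the sketch below. Every number and field name here is an illustrative assumption; only the three tier names come from the table above.

```javascript
// Illustrative tier policy; all limits are made-up values.
const TIER_POLICY = new Map([
  ["free",       { loadBudgetPerMin: 100,   maxConcurrent: 2,  dedicated: false }],
  ["pro",        { loadBudgetPerMin: 1000,  maxConcurrent: 10, dedicated: false }],
  ["enterprise", { loadBudgetPerMin: 10000, maxConcurrent: 50, dedicated: true }],
]);

function policyFor(tenant) {
  // Unknown or missing tiers fall back to the most restrictive policy.
  return TIER_POLICY.get(tenant.tier) ?? TIER_POLICY.get("free");
}
```

Defaulting to the free policy means a misconfigured tenant is throttled rather than left unlimited.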

1ļøāƒ£4ļøāƒ£ Failure Story (Realistic)

  • One tenant exports data
  • The export triggers full-table scans
  • DB CPU spikes
  • Cache evictions follow
  • All tenants time out

Root cause:

No per-tenant DB or query isolation.