
📘 Schema Registry & Compatibility Rules

1 Feb 2022 · 4 min read

The Contract System Behind Event-Driven Architectures

Event streaming without schema governance is just distributed chaos.

A Schema Registry is not about serialization.
It is about controlling change over time.

1️⃣ Why You Need a Schema Registry

Without a registry:

  • Producers change fields silently

  • Consumers break in production

  • Replays fail

  • Historical data becomes unreadable

  • Rollbacks become impossible

In event systems, schema evolution is not optional — it is continuous.

2️⃣ What a Schema Registry Actually Does

At a high level:

  • Stores event schemas (Avro / Protobuf / JSON Schema)

  • Assigns version numbers (per subject) and schema IDs

  • Enforces compatibility rules

  • Prevents breaking changes

  • Provides schema lookup at runtime


The registry sits between developers and chaos.
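To make those responsibilities concrete, here is a toy, in-memory registry in Python. It is a sketch under stated assumptions, not how any real registry is built: the class and method names are hypothetical, schemas are plain dicts, and compatibility enforcement is deliberately left out. Loosely following Confluent's model, version numbers are counted per subject while schema IDs are global, so an identical schema registered under two subjects reuses one ID.

```python
import json
from dataclasses import dataclass, field


@dataclass
class InMemoryRegistry:
    """Toy schema registry: per-subject version history plus globally unique schema IDs.

    Hypothetical sketch; compatibility enforcement is deliberately left out.
    """

    next_id: int = 1
    schemas_by_id: dict = field(default_factory=dict)  # schema ID -> canonical schema text
    subjects: dict = field(default_factory=dict)       # subject -> [schema IDs]; version N is index N-1

    def register(self, subject: str, schema: dict) -> int:
        """Store a schema under a subject and return its global ID (reused if already known)."""
        text = json.dumps(schema, sort_keys=True)  # sorted keys so equal schemas compare equal
        schema_id = next((i for i, t in self.schemas_by_id.items() if t == text), None)
        if schema_id is None:
            schema_id = self.next_id
            self.schemas_by_id[schema_id] = text
            self.next_id += 1
        versions = self.subjects.setdefault(subject, [])
        if schema_id not in versions:
            versions.append(schema_id)  # a new version only when the schema actually changed
        return schema_id

    def lookup(self, schema_id: int) -> dict:
        """Runtime lookup by ID, as a consumer would do when it meets an unknown ID."""
        return json.loads(self.schemas_by_id[schema_id])

    def latest_version(self, subject: str) -> int:
        """Version numbers start at 1 and increase per subject."""
        return len(self.subjects[subject])
```

Registering the same schema twice returns the existing ID and does not create a new version; registering it under a second subject starts that subject at version 1 with the same ID.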

3️⃣ The Core Concept: Compatibility Modes

This is where depth begins.

There are four primary compatibility modes:

Mode       Who must survive the change?
Backward   Old data (upgraded consumers must still read it)
Forward    Old consumers (they must still read new data)
Full       Both
None       No guarantees

4️⃣ Backward Compatibility (Most Common)

New schema can read old data.

Used when:

  • Consumers upgrade before producers

  • Replay is common

Example

V1

{
  "orderId": "string",
  "total": "number"
}

V2 (Add optional field)

{
  "orderId": "string",
  "total": "number",
  "currency": "string?"
}

✔ Backward compatible
Old events still readable.
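The V1/V2 check above can be sketched as a simplified backward-compatibility test. This is an illustration, not Avro's full resolution algorithm: it assumes Avro-style record dicts, maps the pseudo-type "number" to "double" and "string?" to an optional ["null", "string"] union with a null default, and ignores type promotion, aliases, and nested records. The function name is hypothetical.

```python
def backward_problems(new_reader: dict, old_writer: dict) -> list:
    """List reasons a reader on the new schema could not decode data written with the old one.

    Simplified Avro-style resolution: every field the new schema expects must
    either exist in the old schema or declare a default. Type promotions,
    aliases and nested records are ignored here.
    """
    old_fields = {f["name"]: f for f in old_writer["fields"]}
    problems = []
    for f in new_reader["fields"]:
        old = old_fields.get(f["name"])
        if old is None and "default" not in f:
            problems.append(f"'{f['name']}' is new and has no default")
        elif old is not None and old["type"] != f["type"]:
            problems.append(f"'{f['name']}' changed type")
    return problems


# V1/V2 as Avro records: "number" -> "double", "string?" -> optional union with null default.
V1 = {"type": "record", "name": "Order", "fields": [
    {"name": "orderId", "type": "string"},
    {"name": "total", "type": "double"},
]}
V2 = {"type": "record", "name": "Order", "fields": V1["fields"] + [
    {"name": "currency", "type": ["null", "string"], "default": None},
]}
```

`backward_problems(V2, V1)` comes back empty, while a V2 that declares `currency` as a plain `"string"` with no default is reported as a problem.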

5️⃣ Forward Compatibility

Old schema can read new data.

Used when:

  • Producers upgrade first

  • Consumers lag behind

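Rule:

New fields may be added freely, because old readers skip fields they do not know. Fields may only be removed if they were optional or had a default, so old readers can fill in the missing value.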
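Whether a reader still on the old schema can decode data from the new one can be sketched the same way. Again this is a simplification of Avro schema resolution with hypothetical names: extra fields in new data are skipped by the old reader, and a field the old reader expects but no longer finds needs a default in the old schema.

```python
def forward_problems(old_reader: dict, new_writer: dict) -> list:
    """List reasons a reader still on the old schema could not decode data written with the new one.

    Simplified Avro-style resolution: fields that only the new writer has are
    skipped by the old reader, so additions are harmless. Every field the old
    reader expects must still be written, or the old schema must give it a default.
    """
    written = {f["name"] for f in new_writer["fields"]}
    return [
        f"old readers need '{f['name']}', which the new schema no longer writes"
        for f in old_reader["fields"]
        if f["name"] not in written and "default" not in f
    ]


OLD = {"type": "record", "name": "Order", "fields": [
    {"name": "orderId", "type": "string"},
    {"name": "total", "type": "double"},
    {"name": "note", "type": ["null", "string"], "default": None},  # optional: safe to drop later
]}
```

Dropping `note` or adding any new field passes; dropping `total`, which has no default, is reported.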

6️⃣ Full Compatibility

Both backward and forward compatible.

This is the safest mode, but also the most restrictive.

7️⃣ The Dangerous One: None

No compatibility checks.

This is how production outages happen.

8️⃣ Compatibility Deep Dive (Avro Example)

Let’s be precise.

Rule 1: Adding a field

✔ Allowed under BACKWARD if:

  • Field has a default

  • Or is optional (in Avro, a union with "null" that declares "default": null)


Rule 2: Removing a field

✔ Allowed under FORWARD only if:

  • Field was optional

  • Or had a default


Rule 3: Changing field type

❌ Usually breaking.

"total": "string" → "number"

Breaks deserialization. Avro only allows a few widening promotions, such as int → long or float → double.
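Avro's specification does permit a small set of primitive promotions when the writer's type differs from the reader's. The table below encodes the promotions listed in the spec's schema-resolution rules; the helper function is hypothetical and handles primitives only (no unions, records, or logical types). Note that "number" in the example above is not an Avro type; Avro's numeric primitives are int, long, float, and double.

```python
# Primitive promotions allowed by Avro schema resolution:
# writer type -> reader types that can read it. Anything else must match exactly.
AVRO_PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def can_resolve(writer_type: str, reader_type: str) -> bool:
    """True if data written as writer_type can be read as reader_type (primitives only)."""
    return writer_type == reader_type or reader_type in AVRO_PROMOTIONS.get(writer_type, set())
```

So `can_resolve("string", "double")` is false, `can_resolve("int", "long")` is true, and narrowing such as long to int is rejected.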

9️⃣ Why Type Systems Matter

Avro:

  • Strong schema enforcement

  • Supports evolution rules

JSON:

  • Looser

  • Riskier

  • Often breaks silently

Protobuf:

  • Field numbers matter

  • Never reuse field numbers

  • Reserve removed fields
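A pre-merge check for these Protobuf rules can be sketched by comparing, for one message, the field-name-to-number maps of the old and new versions. This is a hypothetical helper working on plain dicts rather than parsed .proto files. It flags a number that now belongs to a different name (a rename or a reuse), a new field sitting on a reserved number, and a removed field whose number was not reserved.

```python
def protobuf_number_problems(old: dict, new: dict, reserved: set) -> list:
    """Compare name -> field-number maps for two versions of one message (hypothetical helper)."""
    problems = []
    old_by_number = {num: name for name, num in old.items()}
    new_numbers = set(new.values())
    for name, num in new.items():
        if num in reserved:
            # A reserved number must never come back, under any name.
            problems.append(f"'{name}' uses reserved number {num}")
        elif num in old_by_number and old_by_number[num] != name:
            # Same number, different field: old data would be decoded as the wrong field.
            problems.append(f"number {num} was '{old_by_number[num]}', now '{name}'")
    for name, num in old.items():
        if num not in new_numbers and num not in reserved:
            # Removed without reserving: a later change could silently reuse the number.
            problems.append(f"removed '{name}' but number {num} is not reserved")
    return problems
```

For example, replacing a removed `note = 3` with `currency = 3` is flagged, while moving `currency` to 4 and reserving 3 passes.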

🔟 Hard Production Rules

NEVER:

  • Rename fields

  • Reuse field numbers (Protobuf)

  • Change meaning of a field

  • Remove required fields

  • Change enum values carelessly

1️⃣1️⃣ Compatibility vs Semantics

Compatibility rules check structure, not meaning.

This is the subtle danger.

Example:

"amount": number

Before: cents
After: dollars

Schema compatible.
Semantically catastrophic.

A schema registry protects structure, not intent.
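The cents-to-dollars trap can be reproduced in a few lines. In this hypothetical sketch the structural check compares only declared types, so it passes, while a consumer written for the old meaning reports $0.20 for a $20 order without raising any error.

```python
def declared_types_match(old_schema: dict, new_schema: dict) -> bool:
    """Registry-style structural check: same field names with the same declared types."""
    return old_schema == new_schema


def consumer_total_dollars(event: dict) -> float:
    """Consumer written against the ORIGINAL meaning, where 'amount' holds cents."""
    return event["amount"] / 100


schema_before = {"amount": "long"}  # producers meant: cents
schema_after = {"amount": "long"}   # producers now mean: dollars, but the schema cannot say so

event_before = {"amount": 2000}     # a $20.00 order, in cents
event_after = {"amount": 20}        # the same $20.00 order, now in dollars
```

`declared_types_match(schema_before, schema_after)` is true, yet `consumer_total_dollars(event_after)` returns 0.2 instead of 20.0.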

1️⃣2️⃣ Subject-Level Compatibility

In Confluent-style systems:

Each topic/subject has:

  • Its own compatibility mode

  • Its own version history

Example:

orders-value → BACKWARD
payments-value → FULL
audit-log → NONE
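With Confluent Schema Registry, a per-subject mode is set through its REST API with `PUT /config/{subject}` and a body such as `{"compatibility": "FULL"}`. The sketch below only builds those requests with Python's standard library and sends nothing; the registry address is an assumption, and the subject names are the examples above.

```python
import json
import urllib.request

# Assumption: a Confluent-compatible Schema Registry at this address (8081 is its usual port).
REGISTRY_URL = "http://localhost:8081"

# The per-subject modes from the example above.
SUBJECT_MODES = {
    "orders-value": "BACKWARD",
    "payments-value": "FULL",
    "audit-log": "NONE",
}


def set_compatibility_request(subject: str, mode: str) -> urllib.request.Request:
    """Build, without sending, a PUT /config/{subject} request that sets one subject's mode."""
    return urllib.request.Request(
        url=f"{REGISTRY_URL}/config/{subject}",
        data=json.dumps({"compatibility": mode}).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


config_requests = [set_compatibility_request(s, m) for s, m in SUBJECT_MODES.items()]
```

To apply them, pass each request to `urllib.request.urlopen`. Subjects without their own setting fall back to the registry-wide default, which is managed through `PUT /config`.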

1️⃣3️⃣ Rolling Deploy Safety Pattern

Safe rollout order depends on compatibility mode.

If BACKWARD:

  1. Deploy consumers

  2. Deploy producers

If FORWARD:

  1. Deploy producers

  2. Deploy consumers

Get this wrong → outage.

Compatibility determines safe order.
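The ordering rule can be captured as a lookup table, for example inside a deployment script. This is a hypothetical sketch: the FULL entry is an inference rather than something stated above (data can be read safely in both directions, so either order is structurally safe), transitive modes are assumed to share their base mode's order, and NONE raises because it offers no guarantee.

```python
# Safe deploy order per compatibility mode; transitive variants share their base mode's order.
SAFE_ORDER = {
    "BACKWARD": "consumers, then producers",  # new readers must already understand old data
    "FORWARD": "producers, then consumers",   # old readers can already handle new data
    "FULL": "either order",                   # both directions hold
}


def rollout_order(mode: str) -> str:
    """Return the safe deploy order for a mode name such as 'BACKWARD' or 'forward_transitive'."""
    base = mode.upper().replace("_TRANSITIVE", "")
    if base not in SAFE_ORDER:
        raise ValueError(f"mode {mode!r} gives no ordering guarantee")
    return SAFE_ORDER[base]
```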

1️⃣4️⃣ Schema ID Encoding (Wire Format)

Producer sends:

[magic byte 0x00][4-byte schema ID][serialized payload]

Consumer:

  • Reads schema ID

  • Fetches schema

  • Deserializes

This enables:

  • Multiple schema versions in same topic

  • Long-lived history
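Here is a minimal sketch of that framing using Python's `struct` module, assuming the Confluent layout for Avro: one zero magic byte followed by a 4-byte big-endian schema ID. (Confluent's Protobuf serializer also writes message indexes after the ID, which this sketch does not handle.) The cache helper and its `fetch` callable are hypothetical stand-ins for a registry client, which would typically call `GET /schemas/ids/{id}`.

```python
import struct

MAGIC_BYTE = 0  # Confluent framing always starts with a zero byte


def frame(schema_id: int, payload: bytes) -> bytes:
    """Producer side: prepend the magic byte and a 4-byte big-endian schema ID."""
    return struct.pack(">BI", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> tuple:
    """Consumer side: split a framed message into (schema_id, payload)."""
    if len(message) < 5:
        raise ValueError("message too short to carry a schema ID")
    magic, schema_id = struct.unpack(">BI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unknown magic byte {magic}")
    return schema_id, message[5:]


_schema_cache = {}


def schema_for(schema_id: int, fetch) -> dict:
    """Fetch each schema ID once, then serve it from a local cache.

    `fetch` is a hypothetical callable (schema_id -> schema); a real client
    would query the registry here.
    """
    if schema_id not in _schema_cache:
        _schema_cache[schema_id] = fetch(schema_id)
    return _schema_cache[schema_id]
```

Because every message carries its own ID, a consumer can decode old and new versions interleaved in one topic, fetching each schema only the first time it appears.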

1️⃣5️⃣ Multi-Team Governance Problem

Without registry:

  • Team A changes event

  • Team B deploys later

  • Production crash

With registry:

  • CI fails on incompatible schema

  • Change blocked before deploy
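In a Confluent setup, the CI gate can ask the registry directly: `POST /compatibility/subjects/{subject}/versions/latest` tests a candidate schema against the latest registered version under the subject's configured mode and returns `{"is_compatible": true}` or `false`. The sketch below builds that request and turns the answer into an exit code; the registry address and function names are assumptions.

```python
import json
import urllib.request

REGISTRY_URL = "http://localhost:8081"  # assumption: the registry is reachable from the CI runner


def compatibility_check_request(subject: str, candidate: dict) -> urllib.request.Request:
    """Build, without sending, a request asking whether `candidate` fits the latest version."""
    body = {"schema": json.dumps(candidate)}  # the API takes the schema itself as a JSON string
    return urllib.request.Request(
        url=f"{REGISTRY_URL}/compatibility/subjects/{subject}/versions/latest",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


def ci_exit_code(response_body: bytes) -> int:
    """Map the registry's answer to a process exit code: 0 passes the CI step, 1 blocks it."""
    return 0 if json.loads(response_body).get("is_compatible") is True else 1
```

In a pipeline step, open the request with `urllib.request.urlopen`, pass `resp.read()` to `ci_exit_code`, and call `sys.exit` with the result. An HTTP error (for example an unknown subject) makes `urlopen` raise, which also fails the step.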

1️⃣6️⃣ Advanced: Transitive Compatibility

Instead of comparing only to latest schema:

Compare against all historical schemas.

Why?
Because replay from 3 years ago must still work.

Modes:

  • BACKWARD_TRANSITIVE

  • FORWARD_TRANSITIVE

  • FULL_TRANSITIVE

Transitive compatibility is required for event sourcing systems.
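The difference between BACKWARD and BACKWARD_TRANSITIVE shows up with three versions. In this sketch (a simplified Avro-style rule, with hypothetical names and schemas), V2 adds "country" with a default and V3 drops that default. V3 can read V2 data, so a latest-only check accepts it, but V3 cannot read V1 events, which never had the field, so a transitive check rejects it.

```python
def backward_ok(reader: dict, writer: dict) -> bool:
    """Simplified Avro rule: each reader field exists in the writer's data or has a default."""
    written = {f["name"] for f in writer["fields"]}
    return all(f["name"] in written or "default" in f for f in reader["fields"])


def is_compatible(candidate: dict, history: list, transitive: bool) -> bool:
    """BACKWARD compares with the latest version only; BACKWARD_TRANSITIVE with every version."""
    targets = history if transitive else history[-1:]
    return all(backward_ok(candidate, old) for old in targets)


V1 = {"fields": [{"name": "orderId", "type": "string"}]}
V2 = {"fields": [{"name": "orderId", "type": "string"},
                 {"name": "country", "type": "string", "default": "unknown"}]}
V3 = {"fields": [{"name": "orderId", "type": "string"},
                 {"name": "country", "type": "string"}]}  # default removed
```

`is_compatible(V3, [V1, V2], transitive=False)` is true, while the same call with `transitive=True` is false: exactly the replay-from-years-ago failure that transitive modes catch.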

1️⃣7️⃣ Schema Evolution in Event Sourcing

Event sourcing demands:

  • Transitive backward compatibility

  • Infinite retention safety

  • Upcasting support

Otherwise:

  • You cannot rebuild projections.
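Upcasting can be sketched as a chain of small functions, each lifting a stored event exactly one version, applied on read so the stored events themselves never change. The event shape, the version field, and the defaults below (USD for pre-v2 orders, an unknown country) are hypothetical.

```python
# Hypothetical upcasters for an Order event: each lifts a payload exactly one version.
def order_v1_to_v2(event: dict) -> dict:
    # Assumption for this sketch: events written before v2 were all in USD.
    return {**event, "currency": "USD", "version": 2}


def order_v2_to_v3(event: dict) -> dict:
    # v3 added "country"; historical events never recorded it, so it stays unknown (None).
    return {**event, "country": None, "version": 3}


UPCASTERS = {1: order_v1_to_v2, 2: order_v2_to_v3}
LATEST_VERSION = 3


def upcast(event: dict) -> dict:
    """Chain upcasters until the event has the latest shape; the input dict is not modified."""
    version = event.get("version", 1)  # v1 events predate the explicit version field
    while version < LATEST_VERSION:
        event = UPCASTERS[version](event)
        version = event["version"]
    return event
```

During a projection rebuild, every stored event would pass through `upcast()` first, so projection code only ever handles the latest shape.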

1️⃣8️⃣ Versioning Strategy Matrix

Strategy          Pros              Cons
Additive fields   Simple            Limited flexibility
Versioned events  Clear             Proliferation
Upcasters         Clean consumers   Central complexity
Dual publish      Safe migration    Storage overhead

1️⃣9️⃣ Failure Story (Realistic)

Team adds required field:

"country": "string"

Without default.

Old consumer crashes.

Kafka lag spikes.

Autoscaling reacts.

Retry storm begins.

Outage.

Root cause:

No compatibility enforcement.

2️⃣0️⃣ Production-Grade Setup Checklist

If you run Kafka in production:

✔ Schema Registry enforced in CI
✔ Transitive backward compatibility
✔ Consumer contract tests
✔ Replay test environment
✔ Schema documentation
✔ Event semantic versioning