
Consistent Hashing: The Algorithm That Keeps Distributed Systems Sane

April 2026 · 8 min read · Srikanth Badavath


Imagine you're running a distributed cache with 4 nodes and millions of keys. Everything is working great. Then traffic spikes and you add a 5th node. With naive modulo hashing, almost every key gets remapped to a different node: you've just invalidated your entire cache in one shot. The resulting flood of cold-cache misses hits the database all at once (a cache stampede, often called the thundering herd problem), and consistent hashing was invented to prevent it.

The Problem with Naive Hashing

With modulo hashing, a key maps to node hash(key) % N where N is the number of nodes. When N changes, most keys change their target node:

Change                            | Keys Remapped | Impact
4 nodes → 5 nodes                 | ~80%          | Cache invalidated, DB flooded
4 nodes → 3 nodes (node failure)  | ~75%          | Cache cold, potential outage
10 nodes → 11 nodes               | ~91%          | Worse as cluster grows
Consistent hashing (any change)   | ~K/N only     | Only affected keys move

For a cache, remapped keys mean cache misses. Every miss hits the database — a sudden flood of queries that may overwhelm it. Systems have gone down because of this cascade.
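The remap percentages in the table are easy to check empirically; here is a quick sketch with synthetic keys (the key names are arbitrary):

```python
import hashlib

def node_for(key: str, n: int) -> int:
    # Naive placement: hash the key, take it modulo the node count
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"user:{i}" for i in range(100_000)]  # arbitrary synthetic keys
moved = sum(node_for(k, 4) != node_for(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped going 4 -> 5 nodes")  # ~80%
```

For uniformly distributed hashes, `h % 4 == h % 5` only when `h % 20 < 4`, so exactly 80% of keys move in expectation.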

The Ring: How Consistent Hashing Works

Consistent hashing maps both keys and nodes onto a circular ring of integers (typically 0 to 2³²−1 ≈ 4.3 billion). The algorithm:

  1. Hash each node to a position on the ring using its ID or IP address.
  2. To find which node owns a key, hash the key and walk clockwise until you hit a node.
  3. That node is responsible for storing the key.
[Figure: Hash ring spanning 0 → 2³²−1, with Node A at position 0, B at 90, C at 180, D at 270 (blue) and keys k1 → B, k2 → C, k3 → D, k4 → A (orange). Each key routes clockwise to the next node.]

When you add a node (say Node E at position 45), it only takes over the keys that hash between position 0 and 45, which previously belonged to Node B at position 90. All other keys are unaffected. On average, only K/N keys need to move (K = total keys, N = total nodes).

Example: 1,000,000 keys across 10 nodes. Add node 11: with modulo hashing, ~910,000 keys remap; with consistent hashing, only ~90,909 keys (K/N) move. Roughly 91% versus roughly 9%, an order of magnitude fewer cache misses.

Python Implementation

Here's a working consistent hash ring implementation, essentially what production libraries like uhashring and Memcached client libraries (ketama) do internally:

import hashlib
import bisect

class ConsistentHashRing:
    def __init__(self, nodes=None, replicas=150):
        """
        replicas: number of virtual nodes per physical node
        Higher = better distribution, more memory
        """
        self.replicas = replicas
        self.ring = {}          # hash position → node name
        self.sorted_keys = []   # sorted list of hash positions

        for node in (nodes or []):
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        for i in range(self.replicas):
            # Each virtual node gets a unique key: "node:0", "node:1" …
            vnode_key = f"{node}:{i}"
            h = self._hash(vnode_key)
            self.ring[h] = node
            bisect.insort(self.sorted_keys, h)

    def remove_node(self, node: str):
        for i in range(self.replicas):
            h = self._hash(f"{node}:{i}")
            del self.ring[h]
            self.sorted_keys.remove(h)

    def get_node(self, key: str) -> str | None:
        """Walk clockwise to find the node responsible for key."""
        if not self.ring:
            return None
        h = self._hash(key)
        # First position >= h (clockwise); bisect_left so an exact match
        # maps to that node rather than skipping past it
        idx = bisect.bisect_left(self.sorted_keys, h) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]

# Usage
ring = ConsistentHashRing(nodes=["node-1", "node-2", "node-3"], replicas=150)

# Route keys
print(ring.get_node("user:1001"))   # e.g. "node-2"
print(ring.get_node("session:abc")) # e.g. "node-1"

# Add a node — minimal remapping
ring.add_node("node-4")
print(ring.get_node("user:1001"))   # may or may not change

The key design decisions mirror production libraries: replicas=150 gives reasonable distribution without excessive memory; MD5 is chosen for its uniform output distribution, not its security properties (a non-cryptographic hash with good dispersion, such as MurmurHash, works too); and the bisect module gives O(log n) lookups on the sorted ring.
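It's worth verifying the K/N claim empirically. The sketch below rebuilds a minimal ring using the same hashing scheme (key and node names are arbitrary) and counts how many keys change owner when an 11th node joins:

```python
import hashlib
import bisect

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(nodes, replicas=150):
    # Map each virtual node's hash position to its physical node name
    ring = {_hash(f"{n}:{i}"): n for n in nodes for i in range(replicas)}
    return sorted(ring), ring

def owner(key, positions, ring):
    idx = bisect.bisect_left(positions, _hash(key))
    return ring[positions[idx % len(positions)]]

keys = [f"user:{i}" for i in range(50_000)]
before = build_ring([f"node-{i}" for i in range(10)])
after = build_ring([f"node-{i}" for i in range(11)])
moved = sum(owner(k, *before) != owner(k, *after) for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")  # close to 1/11 ≈ 9.1%
```

The only keys that move are those now claimed by the new node's virtual-node arcs, so the fraction lands near K/N rather than near 91%.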

Virtual Nodes: Solving the Hot Spot Problem

Basic consistent hashing has a load distribution flaw. If nodes land unevenly on the ring by chance, some nodes own much larger arcs than others:

[Figure: Without virtual nodes, load is uneven: Node A owns 55% of the ring, B 18%, C 27%. With 150 virtual nodes per physical node, each owns ~33%.]

Virtual nodes dramatically smooth out load distribution across physical nodes.

With 150 virtual nodes per physical node, the law of large numbers kicks in and each physical node ends up owning roughly equal arcs of the ring.
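The effect is easy to simulate. This self-contained sketch (node and key names arbitrary) compares per-node key shares with 1 versus 150 virtual nodes:

```python
import hashlib
import bisect
from collections import Counter

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(nodes, replicas):
    ring = {_hash(f"{n}:{i}"): n for n in nodes for i in range(replicas)}
    return sorted(ring), ring

def owner(key, positions, ring):
    idx = bisect.bisect_left(positions, _hash(key))
    return ring[positions[idx % len(positions)]]

nodes = ["node-1", "node-2", "node-3"]
for replicas in (1, 150):
    positions, ring = build_ring(nodes, replicas)
    load = Counter(owner(f"key:{i}", positions, ring) for i in range(30_000))
    shares = [load[n] / 30_000 for n in nodes]
    print(f"replicas={replicas:3d}: shares = {[f'{s:.0%}' for s in shares]}")
```

With one position per node the shares are whatever the three random arc lengths happen to be; with 150 positions each, every share lands near 33%.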

How Lookups Flow End-to-End

flowchart LR
    Client(["Client"])
    Hash["Hash the key: hash('user:1001')"]
    Ring["Walk ring clockwise to find next node position"]
    Node(["Node-2 (responsible node)"])
    Cache{{"Cache hit?"}}
    DB[("Database")]
    Return(["Return value"])
    Client -->|"get('user:1001')"| Hash
    Hash --> Ring
    Ring --> Node
    Node --> Cache
    Cache -->|"Yes"| Return
    Cache -->|"No"| DB
    DB --> Return

Replication with the Ring

For fault tolerance, data is replicated to the next N nodes clockwise (N = replication factor). In Cassandra with RF=3:

-- Cassandra replication (CQL)
CREATE KEYSPACE my_app WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'us-east': 3,   -- 3 replicas in US-East
    'eu-west': 2    -- 2 replicas in EU-West
};

# Consistency level per query (Python driver)
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

stmt = SimpleStatement(
    "SELECT * FROM users WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM  # wait for a majority of replicas
)
session.execute(stmt, [user_id])

When a node fails, its replicas cover the gap transparently. When it rejoins, nodetool repair syncs missed writes.
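The clockwise-replication rule can be sketched on top of the same ring structure. The helper below (`replica_nodes` is a name of my choosing, not a real library API) walks clockwise from the key's position, collecting distinct physical nodes:

```python
import hashlib
import bisect

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(nodes, replicas=150):
    ring = {_hash(f"{n}:{i}"): n for n in nodes for i in range(replicas)}
    return sorted(ring), ring

def replica_nodes(key, positions, ring, rf=3):
    """Walk clockwise, skipping virtual nodes that map to an
    already-chosen physical node, until rf distinct owners are found."""
    idx = bisect.bisect_left(positions, _hash(key))
    owners = []
    for step in range(len(positions)):
        node = ring[positions[(idx + step) % len(positions)]]
        if node not in owners:
            owners.append(node)
            if len(owners) == rf:
                break
    return owners

positions, ring = build_ring(["node-1", "node-2", "node-3", "node-4"])
print(replica_nodes("user:1001", positions, ring, rf=3))
```

Skipping duplicate physical nodes matters: with virtual nodes, consecutive ring positions often belong to the same machine, which would defeat the point of replication.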

Real-World Usage

Amazon DynamoDB

DynamoDB hashes the partition key to determine storage placement. This is why partition key choice is critical:

# Good: high cardinality → even distribution
partition_key = "user_id"          # millions of unique values

# Bad: low cardinality → hot partition
partition_key = "status"           # only "active" / "inactive"

# Bad: time-based → hot partition (all writes hit latest)
partition_key = "date"             # 2024-01-15
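A common mitigation for the time-based case is write sharding: append a bounded random suffix so one day's writes spread across several partitions. The function name and shard count below are illustrative, not a DynamoDB API:

```python
import random

def sharded_partition_key(date: str, shards: int = 10) -> str:
    # Spread one date's writes across `shards` partitions.
    # Reads must then query and merge all suffixes for that date.
    return f"{date}#{random.randrange(shards)}"

print(sharded_partition_key("2024-01-15"))  # e.g. "2024-01-15#7"
```

The trade-off is read-side fan-out: a query for one date now touches up to `shards` partitions.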

Apache Cassandra

Cassandra's token ring is visible with nodetool ring. Adding a node triggers streaming:

# Check ring layout
nodetool ring

# When adding node-5:
# 1. node-5 joins, gets token positions
# 2. Existing nodes stream ~K/N keys to node-5
# 3. nodetool repair syncs remaining state
nodetool repair

Redis Cluster

Redis Cluster uses 16,384 fixed hash slots instead of a continuous ring. Keys map to slots via CRC16(key) % 16384, and each node owns a range of slots. Redis's documentation is careful to say this is not consistent hashing, but it achieves the same goal: when slots migrate between nodes, only the keys in those slots move.
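Redis's slot function is the XMODEM/CCITT variant of CRC16. A minimal sketch of the slot calculation (ignoring Redis hash tags, the `{...}` syntax that forces related keys into the same slot):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0x0000."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Same mapping the article describes: CRC16(key) % 16384
    return crc16_xmodem(key.encode()) % 16384

print(key_slot("user:1001"))  # a slot in [0, 16383]
```

You can cross-check any key against a live cluster with `CLUSTER KEYSLOT <key>`.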

System                   | Approach                     | Virtual Nodes        | Default Replicas
Apache Cassandra         | Token ring + vnodes          | 256 per node         | RF=1 (configurable)
Amazon DynamoDB          | Internal ring (proprietary)  | Yes (details hidden) | 3 (across AZs)
Redis Cluster            | 16,384 hash slots            | Slot ranges          | 1 (+ replicas)
Memcached (client-side)  | Ring in client lib           | 100–200 typical      | None (cache only)

Limitations

Takeaways