Every computer science student learns the CAP theorem: a distributed data store can provide at most two of three guarantees, Consistency, Availability, and Partition Tolerance. But most explanations stop at the triangle diagram, leaving you with no sense of what the trade-off means when you're actually building systems.
This post is the one I wish I'd had before my first production distributed system. We'll go through what each property really means, walk through a real partition scenario, and see how Cassandra, DynamoDB, and Zookeeper each make the trade-off explicit — with code.
The CAP Triangle
The CAP triangle. In practice, P is always required in distributed systems — so the real choice is C vs A during partitions.
What the Three Properties Actually Mean
Consistency (C) — Linearizability
Every read receives the most recent write or an error. This is not ACID consistency. In CAP it means linearizability: if you write value V to node-1, any subsequent read from node-2 must return V. No stale reads, ever.
# CP system guarantee (Zookeeper style)
write(key="leader", value="node-3")
# Partition occurs...
read(key="leader") # → "node-3" ✓ OR → Error ✓
# NEVER → "node-1" (old value)
Availability (A) — Always Respond
Every request to a non-failing node receives a non-error response — but it doesn't have to be the latest data. Nodes that are up must respond: no timeouts, no errors, even during failures.
# AP system guarantee (Cassandra style)
write(key="cart", value="[item1,item2]", consistency=ONE)
# Partition occurs — replica C is unreachable
read(key="cart", consistency=ONE) # → "[item1]" (stale but responds)
# NEVER → Error
Partition Tolerance (P) — Survive Network Failures
The system continues operating even when network partitions drop messages between nodes. In any real distributed system (multiple machines or data centers), you must tolerate partitions — networks are unreliable. So the real choice is always C vs A when a partition occurs.
Key insight: "CA" systems (consistent + available, no partition tolerance) are just single-node databases. The moment you go multi-node across a network, you must tolerate partitions. CP or AP is the only real choice.
What a Network Partition Actually Looks Like
Node C is isolated. Client writes balance = 500 to Node A. Now what?
| Choice | Behavior | Trade-off |
|---|---|---|
| CP (consistency) | Refuse write until C confirms. Return error to client. | Correct data, but unavailable while partitioned |
| AP (availability) | Accept write on A+B. Sync C when partition heals. | Always responds, but C serves stale data temporarily |
Neither is wrong — it depends on your domain. A bank ledger demands CP. A shopping cart can tolerate AP.
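To make the fork in the road concrete, here is a minimal sketch of how a single replica might answer during a partition under each policy. The `Replica` class, its fields, and the error messages are all hypothetical, purely illustrative names, not any real database's API:

```python
class Replica:
    def __init__(self, name, policy):
        self.name = name
        self.policy = policy          # "CP" or "AP"
        self.data = {"balance": 300}  # last value seen before the partition
        self.peers_reachable = True

    def write(self, key, value):
        if self.policy == "CP" and not self.peers_reachable:
            # CP: refuse rather than risk divergence
            raise RuntimeError("unavailable: cannot reach quorum")
        # AP: accept locally, reconcile after the partition heals
        self.data[key] = value
        return "ok"

    def read(self, key):
        if self.policy == "CP" and not self.peers_reachable:
            raise RuntimeError("unavailable: cannot confirm latest value")
        return self.data[key]  # AP: may be stale

node_a = Replica("A", policy="AP")
node_a.peers_reachable = False       # partition isolates node C
print(node_a.write("balance", 500))  # → ok (accepted locally)

node_cp = Replica("A", policy="CP")
node_cp.peers_reachable = False
try:
    node_cp.write("balance", 500)
except RuntimeError as e:
    print(e)                         # → unavailable: cannot reach quorum
```

The AP replica now holds `balance = 500` while isolated node C still holds `300` — exactly the divergence the healing section below has to resolve.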
How Real Systems Choose
Apache Cassandra — Tunable AP
Cassandra defaults to AP but gives you a consistency dial per query:
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
cluster = Cluster(['node1', 'node2', 'node3'])
session = cluster.connect('my_keyspace')
# AP: fast, available, eventually consistent
session.execute(
"INSERT INTO events (id, data) VALUES (%s, %s)",
(event_id, data),
timeout=1.0
) # driver default is LOCAL_ONE: a single replica confirms
# Sliding toward CP: stronger guarantees
from cassandra import ConsistencyLevel
session.default_consistency_level = ConsistencyLevel.QUORUM
# Now majority of replicas must confirm — survives 1 node down
# but fails if partition isolates minority
# Full CP: all replicas must confirm
session.default_consistency_level = ConsistencyLevel.ALL
# Fails if ANY replica is unreachable
The rule: QUORUM writes + QUORUM reads give you strong consistency even in an eventually consistent system, because any read quorum and any write quorum must share at least one replica (R + W > N).
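The overlap guarantee is just arithmetic: with N replicas, a read is guaranteed to touch at least one replica that saw the latest write whenever R + W > N. A quick sanity check (a throwaway helper, not part of the Cassandra driver):

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """True when read and write quorums must overlap on >= 1 replica."""
    return r + w > n

N = 3
quorum = N // 2 + 1  # = 2 for a 3-node replica set

print(is_strongly_consistent(N, quorum, quorum))  # → True  (QUORUM + QUORUM)
print(is_strongly_consistent(N, 1, 1))            # → False (ONE + ONE: stale reads possible)
print(is_strongly_consistent(N, 1, N))            # → True  (ONE reads + ALL writes also works)
```

This is why ONE + ALL is another valid strong-consistency pairing — at the cost of writes failing whenever any replica is down.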
Amazon DynamoDB — AP with CP Opt-in
DynamoDB defaults to eventually consistent (AP) reads but lets you opt into strongly consistent (CP) reads per call:
import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')
# AP (default) — cheap, fast, eventually consistent
response = table.get_item(Key={'user_id': '1001'})
# CP opt-in — 2x read cost, guaranteed latest value
response = table.get_item(
Key={'user_id': '1001'},
ConsistentRead=True # ← strongly consistent
)
# For financial operations: use transactions (CP)
dynamodb.meta.client.transact_write_items(TransactItems=[
{
'Update': {
'TableName': 'Accounts',
'Key': {'id': {'S': 'A'}},
'UpdateExpression': 'SET balance = balance - :amt',
'ConditionExpression': 'balance >= :amt',
'ExpressionAttributeValues': {':amt': {'N': '100'}}
}
}
])
Apache Zookeeper — Strict CP
Zookeeper is built for distributed coordination (leader election, distributed locks, config management). Stale data here causes split-brain — two nodes both think they're the leader, writing conflicting data.
from kazoo.client import KazooClient
zk = KazooClient(hosts='zoo1:2181,zoo2:2181,zoo3:2181')
zk.start()
# Distributed lock — only one process holds it at a time
lock = zk.Lock('/locks/resource_A', 'my-process')
with lock:
# Only I am here. Zookeeper guarantees this.
# If partition occurs and quorum is lost:
# → Zookeeper STOPS serving (refuses all requests)
# → Lock is NOT granted to anyone
# This is CP: correct or nothing.
do_critical_work()
# Leader election — Zookeeper style
def my_leader_function():
    print("I am the leader")  # runs only while this process holds leadership

election = zk.Election('/elections/master')
election.run(my_leader_function)  # blocks until elected, then calls the function
If a Zookeeper ensemble of 5 nodes loses 3 (quorum = 3), it stops serving entirely. It won't risk serving inconsistent state. This is why Zookeeper is used for coordination rather than high-throughput data storage.
The Partition Healing Problem
When the partition repairs, AP systems must reconcile diverged state. Different systems handle this differently:
Last-Write-Wins is simple but loses data. CRDTs (Conflict-free Replicated Data Types) merge automatically but only work for certain data structures (counters, sets). Application-level merge is most powerful but requires you to design conflict resolution logic upfront.
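To make the CRDT idea concrete, here is a minimal grow-only counter (G-Counter) sketch: each node only ever increments its own slot, and merging takes the element-wise maximum, so replicas converge no matter how many times or in what order merges happen. This is an illustrative from-scratch implementation, not taken from any particular CRDT library:

```python
class GCounter:
    def __init__(self, node_id: int, num_nodes: int):
        self.node_id = node_id
        self.counts = [0] * num_nodes  # one slot per node

    def increment(self) -> None:
        self.counts[self.node_id] += 1  # only touch our own slot

    def value(self) -> int:
        return sum(self.counts)

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so merge order (and repetition) cannot change the result.
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

# Two replicas diverge during a partition...
a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment()  # node A observed 2 events
b.increment()                 # node B observed 1 event

# ...then the partition heals and they exchange state:
a.merge(b)
b.merge(a)
print(a.value(), b.value())   # → 3 3
```

Note the limitation the text mentions: this only works because "max of increments" is a meaningful merge for a counter; arbitrary overwrites (like a bank balance) have no such automatic merge, which is why they belong in CP systems.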
Beyond CAP: PACELC
CAP only describes behavior during partitions. But what about normal operation? The PACELC theorem extends this:
If Partition → choose C or A. Else (no partition) → choose Latency or Consistency.
| System | Partition choice | No-partition choice | PACELC |
|---|---|---|---|
| Cassandra (ONE) | Availability | Latency | PA/EL |
| Cassandra (QUORUM) | Consistency | Consistency | PC/EC |
| DynamoDB (eventually consistent) | Availability | Latency | PA/EL |
| DynamoDB (strong reads) | Consistency | Consistency | PC/EC |
| Zookeeper | Consistency | Consistency | PC/EC |
| MySQL (single node) | N/A | Consistency | —/EC |
Connection to My Thesis Research
My thesis on Decentralized Agent-Based Workflow Orchestration Using Peer-to-Peer Negotiation is fundamentally a CAP problem. Without a central orchestrator, each agent in the network must independently decide: when my peers are unreachable, do I halt (CP) or continue with potentially stale workflow state (AP)? Getting this right is the core engineering challenge — and it turns out "it depends on the task" is a real answer that has to be encoded into the agent's negotiation protocol.
Practical Decision Guide
- Financial transactions, inventory, locks, leader election → CP (Zookeeper, etcd, HBase, PostgreSQL with synchronous replication)
- User sessions, shopping carts, social feeds, metrics → AP (Cassandra, DynamoDB with eventual consistency)
- Mixed workloads → Tunable systems (Cassandra QUORUM, DynamoDB per-request consistency)
- Always design for partition healing — decide your conflict resolution strategy before you write the first line of data code.
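The guide above can be boiled down to a lookup you might embed in a service template. The workload names and the function are purely illustrative; the only real design choice encoded here is the fallback: when in doubt, default to CP, because refusing a request is usually easier to recover from than silently serving wrong data:

```python
# Illustrative mapping from workload type to CAP stance,
# mirroring the decision guide above.
CAP_CHOICE = {
    "financial_transaction": "CP",
    "inventory": "CP",
    "distributed_lock": "CP",
    "leader_election": "CP",
    "user_session": "AP",
    "shopping_cart": "AP",
    "social_feed": "AP",
    "metrics": "AP",
}

def pick_consistency(workload: str) -> str:
    # Unknown workloads fall back to CP: fail loudly, not wrongly.
    return CAP_CHOICE.get(workload, "CP")

print(pick_consistency("shopping_cart"))     # → AP
print(pick_consistency("distributed_lock"))  # → CP
```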