Quorums and Consensus: Untangling the Nuances of Agreement in Distributed Systems

Quorum and Consensus appear frequently in distributed-systems literature – especially in discussions around replication, fault tolerance and distributed databases. They often sound like close cousins – both involve replicas "agreeing" on something, both rely on majorities, and both aim to keep a distributed system from drifting apart.

Yet, despite these superficial similarities, they are fundamentally different in purpose and spirit.

In this blog post, we will explore the nuanced differences between these two agreement mechanisms, quorum and consensus, and how each shapes the design of distributed systems.

Quorums

In leaderless systems like Dynamo and Cassandra, quorums strive to keep the system fresh enough, not necessarily identical across replicas. In simple words, replicas are allowed to diverge temporarily. Total divergence is avoided, however, by ensuring that every pair of quorums has a non-empty intersection, a property known as the Quorum Intersection Property.

This is also formally expressed as: \( w + r > n \). Here, w = write quorum size, r = read quorum size, and n = total replicas.

| Operation | What the Quorum Achieves |
| --- | --- |
| Write quorum (w) | Ensures that enough replicas store the new value so that it won’t be lost if some nodes fail. |
| Read quorum (r) | Ensures that the read set overlaps with the last successful write quorum, so you’re likely to see the latest value. |
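
To make the intersection property concrete, here is a minimal sketch in Python (the replica names and helper functions are illustrative, not from any particular database) that checks whether an \( (n, w, r) \) configuration guarantees overlap, and brute-forces the guarantee for a five-replica cluster:

```python
from itertools import combinations

def guarantees_overlap(n: int, w: int, r: int) -> bool:
    """Quorum intersection property: every read quorum must
    share at least one replica with every write quorum."""
    return w + r > n

# By the pigeonhole principle, any write set of size w and read set
# of size r drawn from n replicas share at least w + r - n members.
def minimum_overlap(n: int, w: int, r: int) -> int:
    return max(0, w + r - n)

replicas = {"A", "B", "C", "D", "E"}
n, w, r = len(replicas), 3, 3

assert guarantees_overlap(n, w, r)        # 3 + 3 > 5

# Brute-force check: no write quorum / read quorum pair is disjoint.
for write_q in combinations(replicas, w):
    for read_q in combinations(replicas, r):
        assert set(write_q) & set(read_q)

print(minimum_overlap(n, w, r))           # -> 1 replica in common
```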

Because there is no coordination step (as in consensus) enforcing a global order, each replica can end up with its own version of history. These divergent versions must eventually be reconciled using one of the following mechanisms (see the sketch after this list):
1. LWW (Last-Write-Wins) timestamps,
2. Vector clocks to detect concurrent versions, or
3. CRDTs to merge them safely.
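
To make the first two strategies concrete, here is a minimal, hypothetical sketch: a Last-Write-Wins merge keyed on timestamps, and a vector-clock comparison that detects when two versions are concurrent rather than causally ordered (the Version class and node names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Version:
    value: str
    timestamp: float                            # wall-clock time, used by LWW
    vclock: dict = field(default_factory=dict)  # node -> event counter

def lww_merge(a: Version, b: Version) -> Version:
    # Last-Write-Wins: keep the version with the later timestamp.
    # Simple, but silently discards one of two concurrent updates.
    return a if a.timestamp >= b.timestamp else b

def vclock_compare(a: Version, b: Version) -> str:
    nodes = set(a.vclock) | set(b.vclock)
    a_le_b = all(a.vclock.get(n, 0) <= b.vclock.get(n, 0) for n in nodes)
    b_le_a = all(b.vclock.get(n, 0) <= a.vclock.get(n, 0) for n in nodes)
    if a_le_b and not b_le_a:
        return "a happened before b"   # b supersedes a
    if b_le_a and not a_le_b:
        return "b happened before a"   # a supersedes b
    return "concurrent"                # siblings: keep both for merging

x = Version("X", 100.0, {"client1": 1})
y = Version("Y", 101.0, {"client2": 1})

print(lww_merge(x, y).value)   # -> "Y" (later timestamp wins)
print(vclock_compare(x, y))    # -> "concurrent" (neither saw the other)
```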

In essence, leaderless quorum systems tolerate temporary conflict, aiming for eventual convergence, not immediate agreement.

Consensus

Consensus protocols go far beyond counting acknowledgements. Consensus ensures that all replicas agree on a single version of history, and therefore on a single order of operations, even in the presence of crashes, leader changes, or message delays. While the FLP impossibility result shows that no deterministic consensus protocol can guarantee termination in a fully asynchronous network with even one faulty process, practical systems work around this with timeouts, leader election, and failure detectors.

Algorithms like Paxos, Raft, and Zab guarantee three things:
1. Agreement: No two nodes ever decide differently.
2. Order: All nodes apply decisions in the same order.
3. Durability: Once decided, a value can’t be lost or overwritten.

Taken together, these guarantees give us total order broadcast and, when reads also go through the protocol, linearizability: the illusion of a single, up-to-date copy of the data that everyone interacts with on the same consistent timeline.
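
One way to picture this is state machine replication: the consensus layer decides the entries of a shared log, and every replica applies those entries in the same order, so every replica reaches the same state. A minimal sketch, with a hypothetical pre-decided log:

```python
# Hypothetical log whose entries the consensus layer has already decided.
decided_log = [("set", "x", 1), ("set", "y", 2), ("set", "x", 3)]

class Replica:
    def __init__(self, name: str):
        self.name = name
        self.state: dict = {}
        self.applied_upto = 0   # index of the next log slot to apply

    def apply_committed(self, log):
        # Apply only committed entries, strictly in log order.
        while self.applied_upto < len(log):
            op, key, value = log[self.applied_upto]
            if op == "set":
                self.state[key] = value
            self.applied_upto += 1

replicas = [Replica(n) for n in "ABCDE"]
for r in replicas:
    r.apply_committed(decided_log)

# Every replica converges on exactly the same state and history.
assert all(r.state == {"x": 3, "y": 2} for r in replicas)
```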

As you can see, this stands in stark contrast to quorum-based systems, which permit temporary divergence among replicas. In consensus, by design, replicas must never diverge.

Same Word, Different Worlds: Quorum in Replication vs Quorum in Consensus

Both quorum systems and consensus protocols use majorities — but they use them for completely different reasons.

To see how, let’s walk through both models step by step.

Quorum in Leaderless Replication

Goal: Keep reads and writes fresh enough, not necessarily identical across replicas.

Step 1. Cluster Setup
Imagine a system with five replicas: A, B, C, D, and E.
The system defines a write quorum (w) of 3 and a read quorum (r) of 3, satisfying the rule \( w + r > n \). This ensures that every successful write and every quorum read overlap on at least one replica.

Step 2. Two Concurrent Writes
Now, two clients perform writes around the same time.
• Client 1 writes the value X, which is stored on replicas A, B, and C.
• Client 2 writes the value Y, which is stored on replicas C, D, and E.

Both operations succeed because each write reached three replicas — enough to form a quorum. However, notice that replica C now has both versions: X and Y.
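
A toy simulation of this scenario makes the divergence visible (illustrative data structures only, not any real client API): both writes reach a valid quorum of three, and replica C ends up holding both versions as siblings:

```python
# Toy model: each replica stores a set of sibling versions per key.
replicas = {name: {} for name in "ABCDE"}

def quorum_write(targets, key, value, timestamp):
    # The coordinator reports success once w replicas stored the version.
    for name in targets:
        replicas[name].setdefault(key, set()).add((value, timestamp))

# Two concurrent writes, each reaching a different quorum of three.
quorum_write("ABC", "k", "X", 100)   # Client 1
quorum_write("CDE", "k", "Y", 101)   # Client 2

print(replicas["A"]["k"])   # {('X', 100)}               -> only X
print(replicas["E"]["k"])   # {('Y', 101)}               -> only Y
print(replicas["C"]["k"])   # {('X', 100), ('Y', 101)}   -> both siblings
```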

Step 3. The Overlap
The presence of C in both write quorums is what prevents total divergence. But unlike consensus systems, C has no special coordination role. It simply stores what it receives, perhaps tagging each value with a timestamp. There is no leader, no arbitration, and no rule that forces replicas to agree on which version should “win.”

Step 4. Reading the Data
Later, a client issues a read that queries three replicas — say A, D, and E.
• Replica A still has X.
• Replicas D and E have Y.

The client now sees conflicting versions of the same record.

The conflict can be detected and repaired in two ways:
1. Read repair: During the read itself, each replica returns its version. The coordinator compares them, notices the mismatch, and initiates a repair on the spot.
2. Anti-entropy repair: A background process (for example, using Merkle-tree comparisons) may later detect inconsistencies even if no client read exposed them.

At this point, the database applies some conflict-resolution strategy: Last-Write-Wins, vector clocks, or CRDT merges. But this resolution happens after the writes have already diverged, not as part of the write itself.

Once a conflict is detected and a winning version is chosen, the system propagates that decision across replicas.
For example, if Y wins under a Last-Write-Wins policy (sketched in code after this list):
• The coordinator (or read-repair process) writes Y back to all replicas.
• Any replica that held X replaces it with Y.
• Nodes like C, which had stored both versions, now keep only the surviving one. The obsolete version is deleted or marked with a tombstone until compaction removes it.
• After repair, all replicas converge to the same state, restoring consistency.
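
Continuing the same toy model, here is a minimal read-repair sketch: the coordinator reads from three replicas, resolves by Last-Write-Wins, and writes the winner back so the contacted replicas converge (again, illustrative code rather than any real database's repair path):

```python
# State after the two concurrent writes above: A and B hold X,
# D and E hold Y, and C holds both siblings.
replicas = {
    "A": {"k": {("X", 100)}},
    "B": {"k": {("X", 100)}},
    "C": {"k": {("X", 100), ("Y", 101)}},
    "D": {"k": {("Y", 101)}},
    "E": {"k": {("Y", 101)}},
}

def quorum_read_with_repair(targets, key):
    # Gather every sibling version from the r contacted replicas.
    versions = set()
    for name in targets:
        versions |= replicas[name].get(key, set())

    # Resolve by Last-Write-Wins: the highest timestamp survives.
    winner = max(versions, key=lambda v: v[1])

    # Read repair: write the winner back, replacing obsolete siblings
    # (a real system would tombstone them until compaction).
    for name in targets:
        replicas[name][key] = {winner}
    return winner

value, ts = quorum_read_with_repair("ADE", "k")
print(value)                # -> "Y" (timestamp 101 beats 100)
print(replicas["A"]["k"])   # -> {('Y', 101)}: A has been repaired
```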

Step 5. The Result
Leaderless quorum systems guarantee overlap but not agreement. They tolerate temporary divergence, ensuring that updates are not lost and that replicas will eventually converge after repair or reconciliation. However, at any given moment, different replicas can legitimately disagree about the latest value — and that’s considered acceptable design.

Quorum in Consensus

Goal: Ensure all replicas agree on the same value and on the same order of writes.

Step 1. Cluster Setup
Again, imagine five nodes: A, B, C, D, E. A majority quorum consists of any three nodes.

Step 2. The First Proposal
The leader proposes Value X for log slot #1 and sends it to all nodes.
Replicas A, B, and C acknowledge the proposal, forming a majority. At this point, X is chosen — it is safely committed and will appear in every replica’s history.
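
The commit rule at this step is easy to sketch (illustrative code, not any specific protocol's implementation): a value for a log slot is chosen once a majority of the cluster has acknowledged it:

```python
CLUSTER = ["A", "B", "C", "D", "E"]
MAJORITY = len(CLUSTER) // 2 + 1       # 3 out of 5

def is_committed(acks: set) -> bool:
    # A proposal is chosen once a majority of nodes has accepted it;
    # any later majority must overlap this one and will learn of it.
    return len(acks) >= MAJORITY

acks_for_x = {"A", "B", "C"}           # slot #1, value X
print(is_committed(acks_for_x))        # -> True: X is durably decided
```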

Step 3. A Leader Change
Suppose the leader crashes or becomes unreachable.
A new leader is elected and forms its own quorum — for example, C, D, and E — and now proposes Value Y for the same slot.

Step 4. Overlap Ensures Continuity
Because both quorums include replica C, the new leader learns that X was already accepted in that position. It must therefore preserve X and cannot replace it with Y. This overlap between majorities guarantees that no two conflicting values can ever both be committed.
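
This is essentially what a Paxos-style recovery round does: before proposing its own value, the new leader asks a quorum what each member has already accepted, and it must adopt the highest-numbered accepted value it hears back. A minimal, hypothetical sketch:

```python
# Each node reports the (proposal_number, value) it last accepted for
# the slot, or None if it has accepted nothing.
def choose_value_for_slot(quorum_responses, own_value):
    accepted = [r for r in quorum_responses if r is not None]
    if accepted:
        # Overlap with the old quorum forces the new leader to
        # preserve the previously accepted value.
        _, value = max(accepted, key=lambda r: r[0])
        return value
    # Only if no quorum member accepted anything may the new
    # leader propose its own value.
    return own_value

# New leader's quorum is C, D, E. C had accepted X under proposal #1.
responses = [(1, "X"), None, None]             # from C, D, E
print(choose_value_for_slot(responses, "Y"))   # -> "X", not "Y"
```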

Step 5. The Result
Consensus ensures that:
• All nodes eventually agree that X is the decided value for that log position.
• The global history remains single, ordered, and durable.
• Even if leaders change or messages arrive out of order, there is only one truth shared by all replicas.

Why Systems Still Choose Quorum Over Consensus

At this point, you might be wondering:

If consensus prevents conflicts altogether, why would anyone settle for quorum-based systems that allow divergence and require reconciliation later?

It does sound counter-intuitive: resolving conflicts after the fact seems unproductive and messy. Besides, this temporary divergence comes at a cost in the form of common consistency anomalies, such as:
1. Concurrent write conflicts: Two clients update the same key at the same time; different replicas accept different values, and a read may return either until reconciliation occurs.
2. Stale reads: A read overlaps with a write and queries replicas that haven’t yet applied the latest update.
3. Non-monotonic reads: A client might read a newer value once and an older value later if requests reach different replicas.
4. Lost updates: When conflict resolution (like Last-Write-Wins) overwrites one of two concurrent updates (see the sketch after this list).
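
The lost-update anomaly in particular falls straight out of the Last-Write-Wins rule sketched earlier; a tiny illustration:

```python
# Two clients both read counter = 10 and increment it concurrently.
# Each writes back 11, so the correct combined result would be 12.
versions = [("counter", 11, 100), ("counter", 11, 101)]   # (key, value, ts)

# Last-Write-Wins keeps only the version with the later timestamp.
winner = max(versions, key=lambda v: v[2])
print(winner[1])   # -> 11, not 12: one of the increments is lost
```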

But in reality, despite these anomalies, quorum-based designs thrive in scenarios where performance and availability matter more than perfect global agreement. The truth is that quorums are fast, scalable, and remarkably resilient — especially in workloads where conflicts are rare. They allow systems to accept writes locally, tolerate partitions gracefully, and keep latency low even across distant datacentres.

However, most large-scale systems end up using both: quorums for high-throughput data planes and consensus for critical control planes where correctness is non-negotiable. For example, Kafka replicates message data across brokers on its data plane while relying on Raft-based KRaft for metadata consensus, and Cassandra uses quorums for data replication while employing Paxos internally for lightweight transactions; many other systems delegate cluster coordination to ZooKeeper or etcd.

References

  1. Database Internals by Alex Petrov
  2. Designing Data-Intensive Applications by Martin Kleppmann
  3. Impossibility of Distributed Consensus with One Faulty Process (Fischer, Lynch, and Paterson)
  4. Two Models, Two Guarantees: Serializability for Isolation, Linearizability for Consistency
  5. Atomic broadcast (Wikipedia)