You Can Keep Data Safe Without Keeping It in Order
In distributed systems, it’s easy to blur the line between keeping data safe and keeping data in order. The two sound like the same thing until you look closely at how systems actually achieve each.
At the centre of this confusion lies a familiar piece of technology: the log. Specifically, the Write-Ahead Log (WAL). Everyone’s heard of it. Every database has one. But here’s the catch:
A WAL guarantees durability, not consistency.
And that distinction changes everything.
WAL Exists for Durability, Not Consistency
The WAL is a local durability mechanism. Before a database modifies any on-disk structures, it appends the operation to a WAL and flushes it to disk. If the process crashes, it can replay the WAL and recover its last consistent state.
That’s the WAL’s entire job description:
• remember what was committed
• guarantee the durability of a confirmed write
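To make the append, flush, and replay cycle concrete, here is a minimal sketch in Python. The class name TinyWAL, the one-JSON-record-per-line format, and the key/value operation shape are all assumptions made for illustration. Real WALs also deal with torn trailing records, checksums, and checkpoints, which this sketch leaves out.

```python
import json
import os
import tempfile


class TinyWAL:
    """Hypothetical append-only log: a write is acknowledged only after fsync returns."""

    def __init__(self, path):
        self.path = path
        # Append mode, so records from earlier runs are preserved.
        self.file = open(path, "a", encoding="utf-8")

    def append(self, op):
        # One JSON record per line. Flush Python's buffer, then ask the OS
        # to push the bytes to stable storage before returning.
        self.file.write(json.dumps(op) + "\n")
        self.file.flush()
        os.fsync(self.file.fileno())

    def close(self):
        self.file.close()


def replay(path):
    """Rebuild in-memory state by re-applying every logged operation in log order."""
    state = {}
    if not os.path.exists(path):
        return state
    with open(path, encoding="utf-8") as f:
        for line in f:
            op = json.loads(line)
            state[op["key"]] = op["value"]
    return state


# Usage: log two writes, "crash" (drop in-memory state), then recover.
wal_path = os.path.join(tempfile.mkdtemp(), "node.wal")
wal = TinyWAL(wal_path)
wal.append({"key": "x", "value": 1})
wal.append({"key": "x", "value": 2})
wal.close()

recovered = replay(wal_path)
print(recovered)  # {'x': 2}
```

Notice that nothing in this sketch knows other nodes exist. Everything it promises is local to one process and one disk.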
The job description doesn’t say anything about how other replicas see that write. If your cluster has multiple replicas that accept writes independently, each node’s WAL may contain the same entries in a different order, or even different entries altogether.
That’s harmless for durability but fatal for consistency: you can’t build a coherent state if everyone replays events in their own sequence.
Depending on their design goals, distributed systems solve this with either reconciliation or total order broadcast (also known as atomic broadcast).
One merges histories; the other decides history.
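The divergence problem is easy to reproduce. In the sketch below, two nodes have durably logged the same two operations in a different order. The operation format and the apply_log helper are hypothetical, chosen only to keep the example small.

```python
def apply_log(ops):
    """Apply operations to an empty key-value state, in the given order."""
    state = {}
    for kind, key, value in ops:
        if kind == "set":
            state[key] = value
        elif kind == "append":
            state[key] = state.get(key, "") + value
    return state


set_op = ("set", "greeting", "hi")
append_op = ("append", "greeting", "!")

# Both nodes durably logged both operations, just in a different order.
wal_node_a = [set_op, append_op]
wal_node_b = [append_op, set_op]

state_a = apply_log(wal_node_a)
state_b = apply_log(wal_node_b)

print(state_a)  # {'greeting': 'hi!'}
print(state_b)  # {'greeting': 'hi'}
```

Neither node lost anything, so durability held on both. They still disagree about the current state, and that disagreement is the consistency problem.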
Total Order Broadcast Is About Agreement, Not Recovery
For some systems, diverging histories aren't acceptable, because correctness depends on a single, agreed history and not just a matching final state.
The problem now isn't just crash recovery; it's also coordination.
To solve this, distributed systems introduce Total Order Broadcast (TOB), a coordination primitive that guarantees:
1. Every message (or operation) is delivered to all non-faulty replicas
2. All replicas deliver messages in the same order
3. No replica fabricates or skips messages
In simpler terms, TOB ensures a single shared sequence of events, even if those events were produced concurrently by different nodes.
Different architectures achieve this order in different ways:
• Single-leader systems rely on all writes funneling through one node, which defines the authoritative order for everyone else.
• Consensus protocols like Raft, Paxos, and Zab take a distributed route, running an agreement algorithm to decide both which operations to include and in what order they appear.
In both cases, the ordering doesn’t emerge from the WAL; it comes from coordination. The log only records what happened locally; TOB defines what happened globally.
WALs preserve each node’s past; Total Order Broadcast aligns them into one shared history.
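As a rough illustration of the single-leader approach, here is a sketch in which one node stamps operations with sequence numbers and each replica delivers strictly in that order, buffering anything that arrives early. Sequencer and Replica are hypothetical names. The sketch assumes the leader never fails, and it covers ordering only, not retransmission of lost messages. Surviving a leader failure is what consensus protocols add.

```python
class Sequencer:
    """Hypothetical single leader: stamps each operation with the next sequence number."""

    def __init__(self):
        self.next_seq = 0

    def order(self, op):
        stamped = (self.next_seq, op)
        self.next_seq += 1
        return stamped


class Replica:
    """Delivers operations strictly in sequence order, buffering any that arrive early."""

    def __init__(self):
        self.next_expected = 0
        self.pending = {}    # seq -> op, received but not yet deliverable
        self.delivered = []  # the shared history as seen by this replica

    def receive(self, seq, op):
        if seq < self.next_expected:
            return  # duplicate of something already delivered
        self.pending[seq] = op
        # Deliver for as long as the next expected sequence number is available.
        while self.next_expected in self.pending:
            self.delivered.append(self.pending.pop(self.next_expected))
            self.next_expected += 1


leader = Sequencer()
stamped = [leader.order(op) for op in ["debit A", "credit B", "close A"]]

r1, r2 = Replica(), Replica()
for seq, op in stamped:            # r1 receives messages in order
    r1.receive(seq, op)
for seq, op in reversed(stamped):  # r2 receives them reversed (network reordering)
    r2.receive(seq, op)

print(r1.delivered == r2.delivered)  # True
```

The order lives in the sequence numbers handed out by the leader, not in the order in which bytes happened to reach each replica’s disk.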
Reconciliation Ensures Eventual Convergence, Not Total Order
In a multi-leader setup, each region or datacentre elects its own leader. Each leader appends writes to its local WAL, then replicates them asynchronously to the other leaders. Conflicting writes are then reconciled through Last-Write-Wins, merge logic, or CRDT-based rules.
In contrast, leaderless systems, as discussed at length in my last post, use quorums to balance performance and availability. They allow temporary divergence in exchange for higher throughput and reconcile differences later to reach eventual convergence.
In both multi-leader and leaderless systems, reconciling conflicting writes does not amount to Total Order Broadcast: conflicts are resolved, but without enforcing a single, globally agreed sequence of operations.
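A grow-only counter (G-Counter) is one of the simplest CRDTs, and it shows how a merge rule can resolve concurrent writes without ever agreeing on their sequence. The sketch below is a minimal version. The node names "eu" and "us" are made up for the example.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per node, merged by element-wise max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def increment(self, amount=1):
        # A node only ever increases its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other):
        # Taking the max per slot is commutative, associative, and idempotent,
        # so replicas can merge in any order, any number of times.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())


eu, us = GCounter("eu"), GCounter("us")
eu.increment(3)  # three local writes in one region
us.increment(2)  # two concurrent writes in another

eu.merge(us)
us.merge(eu)

print(eu.value(), us.value())  # 5 5
```

Both replicas end at the same value, yet neither can say whether a given "eu" increment came before or after a given "us" increment.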
Reconciliation happens after the fact, when nodes detect they’ve diverged and try to merge differences. This could happen minutes or hours after the writes occurred. Once two nodes have diverged:
• There’s no longer a canonical sequence of events they can both reconstruct without coordination
• The act of merging (e.g., taking the latest timestamp or combining CRDT states) throws away order information by design
That’s why the result might be consistent in value, but not in order — meaning it’s convergent, not totally ordered.
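Last-Write-Wins makes the loss of order information easy to see. In this sketch, the Versioned record and the (timestamp, node_id) tie-break are assumptions rather than any particular database’s scheme. The same three writes are merged in every possible arrival order.

```python
from dataclasses import dataclass
from functools import reduce
from itertools import permutations


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # assumed logical or wall-clock timestamp
    node_id: str    # tie-breaker so equal timestamps resolve deterministically


def lww_merge(a, b):
    """Keep whichever write has the larger (timestamp, node_id) pair."""
    return a if (a.timestamp, a.node_id) >= (b.timestamp, b.node_id) else b


writes = [
    Versioned("blue", 10, "eu"),
    Versioned("green", 12, "us"),
    Versioned("red", 11, "ap"),
]

# Merge the same writes in every possible arrival order.
results = {reduce(lww_merge, order).value for order in permutations(writes)}
print(results)  # {'green'}
```

Every arrival order converges on the same winner, but the losing writes are simply discarded. Nothing in the merged state records that "blue" and "red" ever happened, let alone in which order.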
Reconciliation ensures eventual convergence, not total order. TOB guarantees global agreement, not just a matching end state.
WALs preserve each node’s past; reconciliation stitches those pasts into one convergent present.
Conclusion
Most systems can survive just by remembering what happened; others insist on agreeing on how it happened.
That’s the real divide:
• Reconciliation makes peace after divergence.
• Total Order Broadcast prevents divergence in the first place.
You don’t always need order to keep data safe. But when correctness depends on shared history, not just shared outcome, order becomes non-negotiable.