The Illusion of Time in Distributed Systems, Part II: How Systems Keep Order Without Real Time

In the last post, we saw how physical clocks betray us! They drift, skew, jump, and bend, creating logs where effects appear before causes and refunds happen before payments.

So what do we do when time itself can’t be trusted?
We stop measuring time, and start measuring causality.

This post, the second in The Illusion of Time in Distributed Systems series, looks at how distributed systems can recover order from chaos using logical clocks that capture causality.

From Time to Order

Here’s the realisation that changes everything: we don’t actually care about when something happened; we care about what happened before what.

If Server A sends a message to Server B, we know even without any shared clock that “send” happens before “receive”. That’s one invariant distributed systems can rely on.

This “happened-before” relation, usually written as A → B, means A could have influenced B, directly or through a chain of messages. It’s not measured in seconds or milliseconds; it’s measured in dependency.

Logical clocks capture this notion with simple event counters. Each process keeps its own counter that increases with each event. The count doesn’t track real time; it tracks progress.

Lamport Clocks: Order Without Time

Lamport clocks are the simplest way to replace timestamps with pure causality.

Each process keeps a single counter, its own sense of progress, and follows three simple rules:
1. Increment your counter for every local event.
2. Include your counter when sending a message.
3. Update your counter to max(local, received) + 1 when receiving a message.

That’s it. No synchronisation, no drift correction, just consistent ordering.
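Before walking through an example, here’s a minimal Python sketch of those three rules (the class and method names are illustrative, not from any particular library):

```python
class LamportClock:
    """A single per-process counter implementing Lamport's three rules."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        # Rule 1: increment for every local event.
        self.time += 1
        return self.time

    def send(self):
        # Rule 2: sending is itself an event; the incremented
        # counter travels with the outgoing message.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Rule 3: jump past everything the sender has seen.
        self.time = max(self.time, msg_time) + 1
        return self.time
```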

| Step | Event | P1’s clock | P2’s clock | Message | Explanation |
|------|-------|-----------|-----------|---------|-------------|
| 1 | P1 does event A | 1 | 0 | | P1 increments its counter. |
| 2 | P1 sends a message | 2 | 0 | Sends timestamp = 2 | Sending itself is another event, so P1 increments again. |
| 3 | P2 receives the message | 2 | 0 → max(0, 2) + 1 = 3 | Receives timestamp = 2 | P2 updates its counter so it’s ahead of the sender. |
| 4 | P2 does event B | 2 | 4 | | P2 increments again for its local event. |

Now we can say A → message → B, because P1’s event A (timestamp 1) happens before P2’s event B (timestamp 4). The numbers don’t represent milliseconds or wall time, only order. And for distributed systems, order is all that really matters.
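Replaying the table with the LamportClock sketch above reproduces the same numbers:

```python
p1, p2 = LamportClock(), LamportClock()

p1.local_event()     # Step 1: event A  -> P1's clock is 1
msg = p1.send()      # Step 2: send     -> P1's clock is 2; message carries 2
p2.receive(msg)      # Step 3: receive  -> max(0, 2) + 1 = 3
p2.local_event()     # Step 4: event B  -> P2's clock is 4

assert p1.time == 2 and p2.time == 4
```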

Vector Clocks: Seeing Concurrency

Lamport clocks are great for telling us what happened before what, but they can’t tell us what happened at the same time.

If two events occur independently, say on different machines with no messages exchanged, Lamport clocks still assign them some arbitrary order. That’s fine for ordering, but not for truth. Distributed systems need to know when things were truly concurrent, i.e., when two operations happened independently and neither could have influenced the other. That’s where vector clocks come in.

Instead of keeping just one counter, every process maintains a vector of counters, one for each process in the system. Each process tracks not only its own progress, but what it knows about everyone else.
1. When a process performs a local event, it increments its own counter.
2. When it sends a message, it attaches its entire vector.
3. When another process receives that message, it merges the two vectors component-wise (max of each position) and then increments its own counter.

This way, every process builds a partial view of global causality, and any two events A and B can be compared by their vectors (see the sketch after this list):
1. A happened before B (causal) if A < B, i.e., every element of A is ≤ the corresponding element of B, and at least one is strictly smaller.
2. A and B are concurrent if their vectors are incomparable, i.e., some elements are smaller and others are larger.
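Here’s a minimal Python sketch of those rules plus a comparison helper (the names are illustrative, and the number of processes n is assumed to be fixed and known up front):

```python
class VectorClock:
    """One vector of counters per process; index i belongs to process i."""

    def __init__(self, n, pid):
        self.v = [0] * n   # what this process knows about everyone's progress
        self.pid = pid     # this process's own slot in the vector

    def local_event(self):
        # Rule 1: a local event bumps only our own entry.
        self.v[self.pid] += 1
        return list(self.v)

    def send(self):
        # Rule 2: sending is an event; attach a copy of the whole vector.
        self.v[self.pid] += 1
        return list(self.v)

    def receive(self, msg_v):
        # Rule 3: merge component-wise max, then bump our own entry.
        self.v = [max(a, b) for a, b in zip(self.v, msg_v)]
        self.v[self.pid] += 1
        return list(self.v)


def compare(a, b):
    """Classify two event vectors: 'before', 'after', or 'concurrent'."""
    if a == b:
        return "equal"
    if all(x <= y for x, y in zip(a, b)):
        return "before"      # a happened before b
    if all(x >= y for x, y in zip(a, b)):
        return "after"       # b happened before a
    return "concurrent"      # incomparable: no causal link either way
```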

Example 1: Detecting Causal Order

| Step | Event | P1’s vector | P2’s vector | Message | Explanation |
|------|-------|------------|------------|---------|-------------|
| 1 | P1 does event A | [1,0] | [0,0] | | Local event: increment P1’s own entry. |
| 2 | P1 sends message to P2 | [2,0] | [0,0] | Sends [2,0] | Sending is an event: increment, then attach vector. |
| 3 | P2 receives the message | [2,0] | [0,0] → [2,0] (merge) → [2,1] (increment own) | Receives [2,0] | Merge component-wise max, then increment P2’s own entry. |
| 4 | P2 does event B | [2,0] | [2,2] | | Local event at P2: increment own entry. |

Now we can say A → B, because [1,0] < [2,2]: every element of A’s vector is ≤ the corresponding element of B’s, and at least one is strictly smaller. That means B causally depends on A: B happened after learning about A.

Example 2: Detecting Concurrency

| Step | Event | P1’s vector | P2’s vector | Explanation |
|------|-------|------------|------------|-------------|
| 1 | P1 does event a | [1,0] | [0,0] | Local event at P1. |
| 2 | P2 does event b | [1,0] | [0,1] | Independent local event at P2 (no messages exchanged). |

Here, the vectors [1,0] and [0,1] are incomparable: neither dominates the other. That tells us a and b are concurrent; they happened independently, without any causal link.
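Both examples replay directly against the VectorClock sketch above (n = 2, with P1 at index 0 and P2 at index 1):

```python
# Example 1: causal order through a message.
p1, p2 = VectorClock(2, pid=0), VectorClock(2, pid=1)
a = p1.local_event()   # A: [1, 0]
msg = p1.send()        # [2, 0], attached to the message
p2.receive(msg)        # merge -> [2, 0], increment own -> [2, 1]
b = p2.local_event()   # B: [2, 2]
print(compare(a, b))   # "before": A happened before B

# Example 2: concurrency, no messages exchanged.
p1, p2 = VectorClock(2, pid=0), VectorClock(2, pid=1)
a = p1.local_event()   # a: [1, 0]
b = p2.local_event()   # b: [0, 1]
print(compare(a, b))   # "concurrent": neither vector dominates the other
```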

TrueTime: Knowing That You Don’t Know

What if you could measure how uncertain your clock really is?
That’s the idea behind TrueTime, Google’s time API used in Spanner.

Instead of pretending time is a single number, TrueTime treats it as an interval of uncertainty, i.e., a range within which the true time is guaranteed to lie:

\(t \in [\text{earliest}, \text{latest}]\)

Every Spanner node constantly synchronises its physical clock using both GPS and atomic clocks. Even then, there’s always some uncertainty, maybe a few milliseconds, maybe tens, depending on network delay and hardware drift.

Rather than hiding that uncertainty, Spanner exposes it.

When a transaction commits, Spanner assigns it a timestamp and then deliberately waits out the clock uncertainty, until TrueTime guarantees that the timestamp is already in the past on every node. Only then can it safely declare that this transaction “happened before” the next.

Example
1. Transaction A commits at 100 ms ± 7 ms, so its uncertainty interval is [93, 107] ms. Spanner waits until real time is guaranteed to be past 107 ms.
2. Transaction B then commits at 115 ms ± 7 ms, giving the interval [108, 122] ms.
3. Because [93, 107] and [108, 122] don’t overlap, B is guaranteed to have happened after A globally.
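Here’s a toy sketch of commit wait in Python. It’s a deliberate simplification: the TTInterval type, the fixed epsilon, and the polling loop are invented for illustration and are not Spanner’s actual API.

```python
import time
from dataclasses import dataclass

@dataclass
class TTInterval:
    """An uncertainty interval: the true time is guaranteed to lie inside it."""
    earliest: float
    latest: float

def tt_now(epsilon=0.007):
    # Pretend the local clock is within +/- epsilon seconds of true time.
    t = time.time()
    return TTInterval(t - epsilon, t + epsilon)

def commit_wait(commit_timestamp):
    """Block until the true time is guaranteed to be past the commit timestamp.

    After this returns, any transaction starting anywhere in the system
    will be assigned a strictly later timestamp.
    """
    while tt_now().earliest <= commit_timestamp:
        time.sleep(0.001)

# Pick a timestamp at the upper bound of the current interval,
# then wait out the uncertainty before acknowledging the commit.
ts = tt_now().latest
commit_wait(ts)
print(f"commit at {ts:.3f} is now safely in the past everywhere")
```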

TrueTime doesn’t solve time’s unreliability; it quantifies it. By explicitly acknowledging uncertainty, systems like Spanner can make strong guarantees without pretending clocks are perfect.

