The Illusion of Time in Distributed Systems, Part III: Outsmarting Time With Tokens

In the previous post, we saw how distributed systems escape the chaos of unreliable clocks by using logical time to measure causality instead of seconds.

But there’s one domain where time keeps sneaking back in i.e., Coordination.

And as already explained in Part I of this series, coordination in distributed systems is built on leases, locks, and heartbeats, mechanisms that rely on measuring durations correctly. But this assumption falls apart when time doesn’t flow evenly i.e., when processes pause, VMs freeze, or code simply stops running while the kernel’s monotonic clock marches on.

The Temptation of Time-Based Leases

A familiar coordination pattern seems innocent enough:
"Grant this node exclusive access to resource X for 10 seconds."

The reasoning looks sound:
1. The node renews the lease periodically.
2. If it crashes, the lease expires.
3. Another node can safely take over.

But this safety hides a fragile assumption i.e., that every participant experiences ten seconds in roughly the same way.

When Time Runs at Different Speeds

Imagine two nodes sharing a resource, say, a distributed lock guarding a database row.

Real Time Node A Node B
0 s Acquires 10 s lease
+8 s Suspended for 2 s (GC pause) Still running
+10 s Resumes; believes 2 s remain Believes lease expired and acquires new lease
+10 s (real) Both think they hold the lock Dual ownership

A few lost seconds two nodes now share one resource. If they write to the same data, corruption follows.

Fencing Tokens: Outsmarting Time

The fix is elegantly simple: stop trusting the client’s perception of time, and let the server decide what’s current.

When a lease is granted, the coordinator (like ZooKeeper or etcd) attaches a fencing token, a monotonically increasing number. Each time a lease is issued or renewed, token is incremented.

Every request to the protected resource must include its token. The resource checks: Is this token newer than the last one I accepted? If not, it rejects the request, even if the client still believes its lease is valid.

Order is enforced not by time, but by issuance.

Step Action Fencing Token Outcome
1 Node A acquires lock 42
2 Node B acquires lock later 43
3 A (still paused or out-of-sync) sends write with token 42 Rejected — older token
4 B writes with token 43 Accepted

Conclusion

Time-based coordination feels simple but hides dangerous assumptions:
• Clocks may be correct, but execution isn’t continuous.
• Network delays turn renewals into false expiries.
• Crashed or paused nodes can replay outdated requests.

Fencing tokens shift authority from the clock to the coordinator.

And with this we conclude our series on The Illusion of Time in Distributed Systems.

In conclusion, distributed systems survive unreliable, uneven time not by trusting time, but by outsmarting it i.e., by preserving order through causality and fencing against discontinuities in time.


References

  1. Database Internals
  2. Designing Data-Intensive Applications
Show Comments