Understanding consistency in distributed systems

How do you get high-frequency, consistent reads/writes on a distributed system? In general, I'm not sure how to conceptualize consistency in large-scale systems.

Use case: Prevent a user from performing the same action within a specified time period. In the abuse case, this may be a high-frequency operation.
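
For concreteness, here is roughly what I have in mind, sketched against a made-up strongly consistent key-value client (the ConsistentKV class and its put_if compare-and-set call are hypothetical; the point is the atomic check-and-write):

    import time

    # Hypothetical client for a strongly consistent KV store; only the
    # compare-and-set semantics matter for this question.
    class ConsistentKV:
        def __init__(self):
            self._data = {}  # stand-in for a replicated store

        def put_if(self, key, new_value, expected=None):
            """Atomically write new_value only if the current value equals
            `expected` (None means "key absent"). Returns True on success."""
            if self._data.get(key) != expected:
                return False
            self._data[key] = new_value
            return True

        def get(self, key):
            return self._data.get(key)

    WINDOW_SECONDS = 60

    def try_perform_action(kv, user_id, action):
        key = f"rate:{user_id}:{action}"
        now = time.time()
        last = kv.get(key)
        if last is not None and now - last < WINDOW_SECONDS:
            return False  # same action already performed inside the window
        # The conditional write is what makes this safe under concurrency:
        # two racing requests cannot both succeed.
        return kv.put_if(key, now, expected=last)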

Extended questions: How would I scale this operation up? How do systems like Firestore provide high-availability while also providing consistency? What do Firestore quotas (such as 1 document write per second) tell us about how they may have built their system?

Thanks

There are 2 answers below.

Answer by Aleksi

GCP's Firestore uses the same technology as Cloud Spanner to ensure consistency at scale. To learn more about Cloud Spanner and its CAP implications, take a look at the introduction here:

In terms of CAP, Spanner claims to be both consistent and highly available despite operating over a wide area [...]. Does this mean that Spanner is a CA system as defined by CAP? [...] The purist answer is “no” because partitions can happen and in fact have happened at Google, and during some partitions, Spanner chooses C and forfeits A. It is technically a CP system. However, no system provides 100% availability, so the pragmatic question is whether or not Spanner delivers availability that is so high that most users don't worry about its outages.

Hence, while technically a CP system, Cloud Spanner (and thus also Firestore) is effectively CA, as its "5 or more nines" availability guarantee is high enough for most users to ignore outages.

How do systems like Firestore provide high-availability while also providing consistency?

First, Google runs its own private global network for services like these, which means they can provide much stronger guarantees than would be possible over public networks.

Second, these systems utilize synchronized clocks to ensure consistency. In Google's case that boils down to TrueTime, a globally synchronized, GPS- and atomic-clock-based clock that provides strong time semantics (bounded uncertainty of about 7 ms) even for transactions happening on opposite sides of the globe. Clocks make it possible to replace communication with local computation: instead of node N asking another node M whether some property holds, N can deduce the answer from information about M from the past together with the current time on N's clock (Liskov, 1991).
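
As a rough illustration of that idea (a sketch of the Liskov-style reasoning with a bounded-uncertainty clock, not Spanner's actual API; the names and the 7 ms constant are mine):

    import time

    # With a known bound on clock uncertainty, node N can decide locally
    # whether node M's lease is still in force, instead of asking M over
    # the network.
    CLOCK_UNCERTAINTY = 0.007  # assume clocks agree to within ~7 ms

    def lease_definitely_valid(lease_expiry, now=None):
        """True only if the lease holds even in the worst case of clock skew."""
        now = time.time() if now is None else now
        return now + CLOCK_UNCERTAINTY < lease_expiry

    def lease_definitely_expired(lease_expiry, now=None):
        """True only if the lease has lapsed even in the worst case of skew."""
        now = time.time() if now is None else now
        return now - CLOCK_UNCERTAINTY > lease_expiry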

Cloud Spanner depends on TrueTime to generate monotonically increasing timestamps. Cloud Spanner uses these timestamps in two ways. First, it uses them as proper timestamps for write transactions without the need for global communication. Second, it uses them as timestamps for strong reads, which enables strong reads to execute in one round of communication, even strong reads that span multiple servers. source
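
A toy sketch of the commit-wait idea behind those timestamps (the interval-returning clock below mimics TrueTime's now(), but the code and names are mine, not Google's):

    import time

    EPSILON = 0.007  # assumed bound on clock uncertainty

    def tt_now():
        """Return an interval (earliest, latest) guaranteed to contain real time."""
        t = time.time()
        return (t - EPSILON, t + EPSILON)

    def commit_write(apply_fn):
        """Pick a timestamp at least as large as 'latest', then wait until it is
        definitely in the past before acknowledging, so any later transaction
        is guaranteed to see a strictly larger timestamp."""
        _, latest = tt_now()
        commit_ts = latest
        apply_fn(commit_ts)               # make the write durable at commit_ts
        while tt_now()[0] <= commit_ts:   # "commit wait": sit out the uncertainty
            time.sleep(0.001)
        return commit_ts                  # now safe to acknowledge to the client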

For further theory on how clocks help distributed system design, see Liskov's paper here. For more on Cloud Spanner, I highly recommend these summaries on the original Spanner paper as well as the follow-up paper.

Update: The good news is that you don't need atomic clocks, GPS, and a private global network to ensure consistency and high availability. The open-source, Spanner-inspired CockroachDB achieves much the same as its predecessor, although in lieu of TrueTime's tight uncertainty bounds it has to rely on coarser-grained and less efficient synchronization, as outlined in this fantastic comparison:

A simple statement of the contrast between Spanner and CockroachDB would be: Spanner always waits on writes for a short interval, whereas CockroachDB sometimes waits on reads for a longer interval.
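
To make the read-side trade-off concrete, here is a simplified sketch of an uncertainty-interval read (my own simplification for illustration, not CockroachDB's real code or API):

    # A read at timestamp read_ts must treat any value committed within
    # (read_ts, read_ts + MAX_OFFSET] as ambiguous: the writer's clock may
    # have been ahead of ours, so the value may "really" precede our read.
    MAX_OFFSET = 0.5  # assumed maximum clock offset between nodes, in seconds

    class ReadRestartError(Exception):
        """The read must be retried at a higher timestamp."""

    def read_key(versions, key, read_ts):
        """versions: {key: [(commit_ts, value), ...]} sorted by commit_ts."""
        visible = None
        for commit_ts, value in versions.get(key, []):
            if commit_ts <= read_ts:
                visible = value
            elif commit_ts <= read_ts + MAX_OFFSET:
                raise ReadRestartError(commit_ts)  # retry at a higher timestamp
        return visible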

Answer by fionbio

How do you get high-frequency, consistent read/writes on a distributed system?

Some notions stem from unavoidable facts of life:

  • Your workload may exceed the capacity of a single machine, so you split up the work ("sharding").
  • A single machine may die, so writes must go to more than one place before being acknowledged.

Most strongly consistent systems are single-leader. Remember that Raft and (Multi-)Paxos are leader-based protocols.

What often happens is that your write goes to the leader, the leader synchronously replicates it to one or more replicas, and then responds to you upon success. Where your reads go depends on whether you are willing to accept out-of-date data: read from the leader for fresh data, or from a replica if staleness is acceptable.
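
A minimal sketch of that write path (the Leader/Replica classes are illustrative, not any particular system's API):

    # Writes go through the leader, which replicates synchronously and only
    # acknowledges once enough copies exist. Reads either go to the leader
    # (fresh) or to a replica (possibly stale).
    class Replica:
        def __init__(self):
            self.log = []

        def append(self, entry):
            self.log.append(entry)  # in reality: persist, then ack over the network
            return True

    class Leader:
        def __init__(self, replicas, quorum):
            self.replicas = replicas
            self.quorum = quorum    # copies required before acknowledging
            self.log = []

        def write(self, entry):
            self.log.append(entry)
            acks = 1                # the leader's own copy counts
            for r in self.replicas:
                if r.append(entry):
                    acks += 1
            if acks >= self.quorum:
                return "ok"
            raise RuntimeError("not enough replicas acknowledged")

        def read_fresh(self):
            return self.log[-1] if self.log else None    # ask the leader

    def read_possibly_stale(replica):
        return replica.log[-1] if replica.log else None  # may lag the leader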

Some examples of single-leader systems (or systems that are single-leader within a key range, i.e. a "shard"):

  • BigTable/HBase
  • DynamoDB (Paxos)
  • Datomic
  • Kafka
  • MySQL/Postgres
  • Cassandra LWT (Paxos)
  • Yugabyte (Raft)
  • Manhattan/MemoryDB (Ordered distributed logs (single machine))
  • TiDB (Raft)

The answer, then, is to split up the work and ask the single leader for each shard. Failover (read: "leader election") is pretty fast in the common case.
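
A tiny sketch of the routing part (shard count and names made up for illustration; the leaders are expected to expose write()/read_fresh() as in the sketch above):

    import hashlib

    NUM_SHARDS = 16

    def shard_for(key: str) -> int:
        digest = hashlib.sha256(key.encode()).digest()
        return digest[0] % NUM_SHARDS       # stable hash -> shard id

    class Cluster:
        def __init__(self, leaders):
            # leaders: {shard_id: leader object for that shard}
            self.leaders = leaders

        def write(self, key, value):
            return self.leaders[shard_for(key)].write((key, value))

        def read(self, key):
            return self.leaders[shard_for(key)].read_fresh()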

How do systems like Firestore provide high-availability while also providing consistency?

Single leader with sharding (auto sharded on BigTable or Colossus).

There is a lot more to say here. Perhaps I'll expand this answer later.