Can someone explain CAP simply?


#1

So I am familiar with the theorem, but every example I read feels vague about what exactly a partition is, or what it (practically) means for something to be available, etc. It's like everything I read uses the terms it's trying to explain to describe itself.

Specifically, I'm wondering about the difference between the following, in terms of the databases that fall into each category (for example, I'm not sure where Mnesia sits):

  • AP
  • AC
  • CP

(In real terms: what might I expect under different conditions? What's a real-world scenario where something can happen on one system but not on another?)


#2

I did a video on my channel about it last week https://youtu.be/4cLY8gNGzbo


#3

I studied systems, but I am by no means an expert. Still, I will give it a shot.

Regarding “what exactly a partition is”:

A partition in the CAP theorem does not refer to a database partition or a hard-drive partition; it is a network partition. Imagine you have a 10-node cluster, and 2 nodes are no longer able to connect to the other 8 for some reason: in simple terms, that is a partition.
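To make the 10-node example concrete, here is a minimal Python sketch under a toy model: the cluster is a list of node ids and the network is a list of links between them. The names `connected_groups` and `cut_off` are illustrative, not from any library. It splits the cluster into groups that can still reach each other and reports which side holds a majority, which is what majority-based (quorum) systems look at when deciding whether to keep serving requests.

```python
from itertools import combinations


def connected_groups(nodes, links):
    """Split nodes into groups that can still reach each other (connected components)."""
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search from `start` collects everything reachable from it.
        stack, group = [start], set()
        while stack:
            n = stack.pop()
            if n in group:
                continue
            group.add(n)
            stack.extend(neighbours[n] - group)
        seen |= group
        groups.append(group)
    return groups


nodes = list(range(10))
cut_off = {8, 9}  # hypothetical: these two nodes lose contact with the rest
# Every pair of nodes can talk, except pairs that cross the cut.
links = [(a, b) for a, b in combinations(nodes, 2)
         if (a in cut_off) == (b in cut_off)]

groups = connected_groups(nodes, links)
for g in groups:
    has_majority = len(g) > len(nodes) // 2
    print(sorted(g), "majority" if has_majority else "minority")
# [0, 1, 2, 3, 4, 5, 6, 7] majority
# [8, 9] minority
```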

  • AP -> a system that stays available in the presence of a network partition, but might not be consistent
  • AC -> a system that is consistent and available in the presence of a network partition as long as there is no network partition
  • CP -> a system that is consistent, but might not always be available

#4

Shouldn’t that be “consistent and available as long as there is no network partition”?
As far as I remember, the CAP theorem boils down to a choice between consistency and availability in the presence of a network partition.


#5

Yes, it should.

Maybe not very helpful, but I find this picture funny (from https://jvns.ca/blog/2016/11/19/a-critique-of-the-cap-theorem/)


#6

That is indeed funny! :smile:


#7

My explanation would be as follows:

Whenever a network partition occurs, you have to choose how to handle the situation:

  1. Do you wait until a connection with the majority of the nodes is re-established, to make sure everyone agrees on the state of the system? (This is consistency over availability, AKA CP, since the feature is not available until the connection with the majority of the network is re-established.)
  2. Do you ‘just do something’ and broadcast the results to the other nodes once the connection is re-established? (This is availability over consistency, AKA AP, since conflicting changes might happen in different parts (partitions) of the disconnected network, in which case only one of the conflicting changes will ‘win’ in the long run.)
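The two options can be sketched in Python as a single replica with a hypothetical `mode` switch. This is a toy under loud assumptions: `Replica`, `Unavailable` and the `reachable` count are invented for illustration, and a real CP system would also replicate each write to a quorum before acknowledging it, which is omitted here.

```python
class Unavailable(Exception):
    """Raised by a CP replica that cannot reach a majority of the cluster."""


class Replica:
    def __init__(self, name, mode, cluster_size):
        self.name = name
        self.mode = mode                # hypothetical switch: "CP" or "AP"
        self.cluster_size = cluster_size
        self.reachable = cluster_size   # nodes it can currently see, itself included
        self.data = {}

    def has_majority(self):
        return self.reachable > self.cluster_size // 2

    def write(self, key, value):
        if self.mode == "CP" and not self.has_majority():
            # Option 1 (CP): refuse rather than risk diverging from the majority.
            raise Unavailable(f"{self.name}: no majority reachable, write refused")
        # Reached by AP replicas always (option 2: accept now, reconcile later),
        # and by CP replicas that can see a majority. Quorum replication omitted.
        self.data[key] = value
        return "ok"


# Both replicas sit on the 2-node side of a 10-node cluster.
cp = Replica("cp-node", "CP", cluster_size=10)
ap = Replica("ap-node", "AP", cluster_size=10)
cp.reachable = ap.reachable = 2

print(ap.write("balance", 100))  # ok
try:
    cp.write("balance", 100)
except Unavailable as err:
    print(err)  # cp-node: no majority reachable, write refused
```

The AP replica now holds a value the other 8 nodes have never seen; the CP replica stays correct by being unavailable.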

There is no way to pick both, although you can decide, on a per-feature basis, which of the two you'd like to use. (For instance, payments need to be CP, but edits to your user profile could be AP, because it is not a problem if such a change is reverted.)

Side note: a ‘CA’ system does not really mean anything. It is a system that will completely fall over whenever a network partition happens, and both AP and CP systems are consistent and available as long as there is no network partition anyway.

Interestingly, CRDTs (conflict-free replicated data types) are a way to make an ‘AP’ system behave very close to a CP one, since you guarantee beforehand that it will always be possible to combine data without having to throw anything away (because there can never be a conflict).
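A grow-only counter (G-Counter) is one of the simplest CRDTs and shows the idea: each replica only increments its own entry, and merging takes the per-replica maximum, so increments made on both sides of a partition are never lost. A minimal Python sketch; the class and method names are my own, not from a particular CRDT library.

```python
class GCounter:
    """Grow-only counter CRDT: each replica only increments its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> total increments made by that replica

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # so replicas can merge in any order, any number of times.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self):
        return sum(self.counts.values())


# Two replicas on opposite sides of a partition both accept increments.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)

# After the partition heals they exchange state; nothing is thrown away.
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5
```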

Also, there is the notion of eventual consistency, a property many AP systems have: if at some point in the future the connection between all parts of the network is re-established (and no new changes are made), then everyone will again agree on a single consistent state. Many real-world systems work like this; for instance, the Bitcoin blockchain and many similar distributed ledgers are AP with eventual consistency.
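A minimal Python sketch of eventual consistency, assuming four hypothetical replicas that each store a set of event ids and reconcile by set union (so the merge itself can never conflict). While partitioned they diverge; once the partition heals and no new writes arrive, repeated pairwise anti-entropy syncs around a ring bring every replica to the same state. The ring gossip pattern is an illustrative choice, not how any particular system does it.

```python
def sync(a, b):
    """Anti-entropy step: two replicas exchange everything they have seen."""
    merged = a | b  # set union: order and repetition don't matter
    a.clear()
    a.update(merged)
    b.clear()
    b.update(merged)


def converged(rs):
    return all(r == rs[0] for r in rs)


# Hypothetical setup: 4 replicas, each holding the set of event ids it has seen.
replicas = [set() for _ in range(4)]

# While partitioned, different replicas accept different writes.
replicas[0].add("evt-1")
replicas[1].add("evt-2")
replicas[2].add("evt-3")
print(converged(replicas))  # False

# Partition heals and writes stop: gossip with the next replica in a ring until all agree.
rounds = 0
while not converged(replicas):
    for i in range(len(replicas)):
        sync(replicas[i], replicas[(i + 1) % len(replicas)])
    rounds += 1

print(rounds, sorted(replicas[0]))  # 1 ['evt-1', 'evt-2', 'evt-3']
```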


#8

Indeed, it should be “no network partition”; I made a typo.


#9

Also, note that you need to pay attention to the algorithm an eventually consistent implementation uses to reconcile conflicting changes. Often it is something very simple like “last write wins”, in which case some of your users' data is thrown away (maybe even silently).
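A minimal last-write-wins register in Python shows how that loss happens. Assumptions: timestamps are plain integers passed in by the caller (real systems often use wall clocks, which can be skewed), and the writer name breaks ties so every replica picks the same winner. `LWWRegister` is an illustrative name, not a library type.

```python
from dataclasses import dataclass


@dataclass
class LWWRegister:
    """Last-write-wins register: keeps the value with the highest (timestamp, writer)."""
    value: object = None
    timestamp: float = float("-inf")
    writer: str = ""  # tie-breaker so every replica picks the same winner

    def set(self, value, timestamp, writer):
        if (timestamp, writer) > (self.timestamp, self.writer):
            self.value, self.timestamp, self.writer = value, timestamp, writer

    def merge(self, other):
        self.set(other.value, other.timestamp, other.writer)


# Two replicas, cut off from each other, both update the same user's bio.
left, right = LWWRegister(), LWWRegister()
left.set("Likes tea", timestamp=100, writer="left")
right.set("Likes coffee", timestamp=101, writer="right")

left.merge(right)
right.merge(left)
print(left.value, right.value)  # Likes coffee Likes coffee
# "Likes tea" is gone, and nothing reported that it was overwritten.
```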


#10

Awesome explanations, guys, thank you. @Qqwy, very well said; that makes sense.