These are all questions I have from reading through the raft paper.
Once commands are properly replicated, each server’s
state machine processes them in log order, and the out-
puts are returned to clients. As a result, the servers appear
to form a single, highly reliable state machine
Does this mean that a state machine that is replicated by raft has to be deterministic. i.e. make no calls to random number generation etc?
This sounds like it could be a source of interesting bug in an Elixir implementation. Because Elixir has no guarantees about a given function being pure then command order could be agreed but each node would end up with a different state.
The leader accepts
log entries from clients, replicates them on other servers,
and tells servers when it is safe to apply log entries to
their state machines
So code the user of raft has to take care that any side effects that are a result of running the statemachine can safely be applied
n times where n is the number or nodes.
What if deriving the new state of the statemachine is work intensive? is there anywhere to share the new state without having each node recompute?
Yes, I believe you are correct: Commands are things that alter the ‘state machine’ that is present on all nodes, and these are not supposed to perform (side-) effects. You can definitely let them perform effects, but you need to keep in mind that these will be executed once on each node, and that the order of these executions (does node A perform the effect X before node B peforms effect X or after?) cannot be known. (And when you call a RNG on two nodes, you probably end up with two different numbers. So probably you want to come up with the RNG before you create the command. )
The idea of the commands (/log entries) is that these are small w. r. t. the state. If recomputing the new state is more intensive than sharing the new state, then you might create a command that just shares this new state.
But beware! How are you sure that new state is a correct result of the commands that have been put in by users calling the nodes? This is exactly the reason why every node applies the commands in order to the contained state; in this way every node can independently verify each of the computational steps.
But no verification actually happens because they do not actually check each others state. i.e. there’s no crash if two nodes do come up with different states?
All nodes that run the same software will agree upon the same ordering and contents of the command log. Because each does the same state transformations from the same empty ‘start’ state, they will all end up in the same state.
There is no crash when two nodes come up with a different state, but we do now have two different ‘views of reality’ which might be a problem when you try to use the result of this distributed state in some other system and connect to different nodes (with different states) at different times. Unless you are conscious that these states might be in conflict, the apps that use the output of this system might misbehave in peculiar ways.
An example would be a system that keeps track of payments: If there is a confict(AKA a fork in the network), in one state I might still have enough money to pay someone, while in another I might have already spent that money.
This stuff is not specific to Raft by the way, but appears in all Byzantine Consensus Algorithms.
I thought we were saying that they would not, if running code that made use of random
When grabbing a random number during the execution of a command, two nodes running the same software will indeed end up in different states. This is probably a bad idea because it would not be useful to have different random numbers on every node.
So to let me rephrase my earlier answer:
Performing output effects after updating the state should be fine, although when they execute cannot be known. But performing input effects (a random number, looking at the nodes environment like the filesystem, asking the user for input etc.) that affect how the state is updated is a really bad idea since the states stored in the nodes will no longer be the same. Input effects should/could be used to generate the commands, but applying the commands to the current state must be pure.
Ignoring Byzantine failures this is what the raft protocol is designed to solve. It provides consensus about the log that is replicated on all nodes. While the log can diverge during times of netsplits user state machines and thus clients will always see the agreed upon state of the world.