Durable, globally unique processes (Entities) are created by controlling side effects.
(In short, side effects are deferred until the entity's state change has been committed to storage.)
The purpose of an entity is to handle data that must be strongly consistent within a given execution context.
```elixir
defmodule MyApp.Counter do
  @behaviour Pachyderm.Entity

  alias MyApp.Counter.{Increment, ...}
  alias MyApp.Counter.{Increased, ...}

  def init() do
    %{count: 0}
  end

  def handle(%Increment{}, %{count: count}) do
    events = [%Increased{amount: 1}]

    if count == 9 do
      # This increment takes the count to the threshold of 10,
      # so emit a mail effect alongside the event.
      effects = [{MyApp.AdminMailer, %{threshold: 10}}]
      {:ok, {events, effects}}
    else
      {:ok, events}
    end
  end

  def update(%Increased{amount: amount}, state = %{count: current}) do
    %{state | count: current + amount}
  end
end
```
Calling Entities
```elixir
type = MyApp.Counter
id = UUID.uuid4()
reference = {type, id}

{:ok, state} = Pachyderm.call(reference, %Increment{})
# => {:ok, %{count: 1}}
```
A detailed discussion of why this API was chosen, along with further examples, is available in the project README.
Why
The single global process is very easy to reason about, but it has many pitfalls (see "The dangers of the single global process").
Pachyderm tries to rescue this simple model by offering a standard way of managing some of the pitfalls.
Prior Art
Describing this project is quite difficult because there are many related but distinct approaches to this problem.
What is your approach in the face of netsplits?
EDIT: I’ve read through the README, and if I interpret the explanation in the deferred side-effects section correctly, Pachyderm defers netsplit handling to the storage layer: the first event to be ‘committed to storage’ wins, and at that point other events that were based on a stale state are dropped (and can possibly be retried manually).
‘Committed to storage’ is not a very clear thing to say. What is the storage? What does it mean to be committed to it?
If the storage is distributed, then multiple nodes might both save things thinking they are the first. On the other hand, if the storage is only in one single place, you create a single point of failure and a bottleneck (since all events go through this storage).
It says it is based on Orleans, so it is hopefully a safe assumption that it handles storage similarly. If so, Orleans writes an etag along with a grain's state when it saves, and on a write it compares the etag it read at startup against the one currently in the store. If the etags don't match, the write fails.
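As a rough illustration of that etag check, here is a minimal in-memory sketch. `EtagStore` and its shape are hypothetical, not Pachyderm's or Orleans' actual API; the store simply maps an id to an `{etag, state}` pair.

```elixir
defmodule EtagStore do
  # Save succeeds only if the etag we read earlier is still the
  # current one; otherwise another writer committed first.
  def save(store, id, expected_etag, new_state) do
    case Map.fetch(store, id) do
      {:ok, {^expected_etag, _stale_state}} ->
        # Our snapshot is still current: commit and bump the etag.
        {:ok, Map.put(store, id, {expected_etag + 1, new_state})}

      _ ->
        # The etags no longer match: the write fails.
        {:error, :etag_mismatch}
    end
  end
end
```

A second writer that read the same snapshot would then fail its save with `{:error, :etag_mismatch}` and have to re-read before retrying.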
Committed to storage depends on the storage being used. At the moment it means that a database transaction has completed.
However, an append-only log can be implemented in multiple ways; it could be written to Kafka, for example.
> If the storage is distributed, then multiple nodes might both save things thinking they are the first
That would be a bug in the append-only log. The log must offer a strongly consistent API, and only consider a write to have succeeded once it is sure that that write is definitely the next entry in the log.
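That "write succeeds only if it is definitely the next entry" rule is a conditional append with an expected position. A minimal in-memory sketch (names are illustrative, not Pachyderm's API):

```elixir
defmodule StrictLog do
  # Append succeeds only if the caller's expected position really is
  # the next slot in the log; otherwise another writer got there first.
  def append(entries, expected_next, event) do
    if length(entries) == expected_next do
      {:ok, entries ++ [event]}
    else
      # Stale expectation: this write must fail, never silently land.
      {:error, :conflict}
    end
  end
end
```

Real implementations (a database transaction, Kafka with a single partition leader, etc.) enforce the same invariant; the point is that "success" is only reported after the position check holds.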
Would it be correct to say that ‘the database’, ‘Kafka’ or whatever other thing that manages the append-only log is a bottleneck and failure-point for the Pachyderm system? And that Pachyderm therefore scales as well as the technology used to manage the append-only log can scale?
Yes, insofar as any coordination point for strong consistency MUST (by definition) be a single point of failure. Pachyderm's goal is strong consistency that is easy to work with.
There are no consistency guarantees between entities. Pachyderm runs each entity concurrently, and so is concurrent to the maximum possible degree. There is also nothing stopping the log itself from being fully partitioned by entity.
https://github.com/rabbitmq/ra implements Raft for consensus and allows defining side effects that run on the leader only. It can be used with disk_log as the storage layer for an event-sourced system.
Ra looks interesting. At first I thought it might be cool to use it to build an alternative log storage for Pachyderm.
However, it could be even more useful than that, because it handles side effects as data in much the same way as Pachyderm does. I just finished watching this talk, which is really informative.
Instead of presenting virtual actors as actors, would it actually make more sense to create them the way dapr.io does, by treating actor calls essentially as web requests? That way even reentrancy could be supported for virtual actors in Elixir.