Domain-Driven Design: aggregate roots as Elixir processes/GenServers

knav_negi · March 30, 2017, 7:15am

Listening to an Eric Evans podcast on DDD, in which he mentions that the actor model is a great fit for modelling aggregates in domain-driven design, got me really curious. After googling, I realized that a lot of people favor this approach, for example https://www.infoq.com/news/2013/06/actor-model-ddd and http://pkaczor.blogspot.in/2014/04/reactive-ddd-with-akka.html.

However, there are a couple of things I couldn’t get my head around:

If we have one process for each aggregate, how does that work in a multi-node setup? We wouldn’t want to inundate the single node hosting the aggregate process with all the requests.
How do we manage the life cycle of all the processes, and how do we share the load?
I assume the general way to create a long-running process in Elixir is a GenServer, but my understanding is that we reach for a GenServer when we have some resource to manage, for example a database connection or an HTTP client.

hubertlepicki · March 30, 2017, 7:39am

Yep, aggregates as GenServers are the way to go.

When it comes to multi-node setups, I don’t really have much experience in that area (Elixir is too fast, one node was enough :P), but I assume the aggregates would be unique across the cluster. It does not really matter which node they are spawned on; they could register globally and be dispatched commands/events/messages accordingly. Elixir/Erlang has the building blocks for that built in.
Life cycle of all processes: I was using a supervisor with the :simple_one_for_one strategy to start new GenServer instances that were my aggregates. You can suspend those processes or terminate them after some period of inactivity.
GenServer’s description in the docs states: “A behaviour module for implementing the server of a client-server relation.” But this does not mean that it is meant only for things like connection clients. Rather, it says that you implement your own servers with it and interface with them from other parts of the code. Call this server an “actor” or a “process”. GenServers are perfect for managing state and responding to messages/events. I think they are the way to go here.
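A minimal sketch of the "aggregate as GenServer, started on demand by a supervisor" idea described above. Everything named here (`AggregateSketch.Account`, the `:deposit` command, the balance field) is hypothetical and only for illustration. It also assumes a newer Elixir (1.6+), where `DynamicSupervisor` is the usual replacement for the `:simple_one_for_one` strategy mentioned above.

```elixir
defmodule AggregateSketch.Account do
  # :temporary means the supervisor does not restart a stopped aggregate;
  # it would simply be started again the next time it is needed.
  use GenServer, restart: :temporary

  # Client API: callers never touch the state directly, they send commands.
  def start_link(id), do: GenServer.start_link(__MODULE__, id)
  def deposit(pid, amount), do: GenServer.call(pid, {:deposit, amount})
  def balance(pid), do: GenServer.call(pid, :balance)

  # Server callbacks: the aggregate's state lives in the process.
  @impl true
  def init(id), do: {:ok, %{id: id, balance: 0}}

  @impl true
  def handle_call({:deposit, amount}, _from, state)
      when is_integer(amount) and amount > 0 do
    new_state = %{state | balance: state.balance + amount}
    {:reply, {:ok, new_state.balance}, new_state}
  end

  # Business rule (hypothetical): reject anything that is not a positive integer.
  def handle_call({:deposit, _amount}, _from, state) do
    {:reply, {:error, :invalid_amount}, state}
  end

  def handle_call(:balance, _from, state), do: {:reply, state.balance, state}
end
```

With a supervisor started via `DynamicSupervisor.start_link(strategy: :one_for_one)`, a new aggregate instance can be started with `DynamicSupervisor.start_child(sup, {AggregateSketch.Account, "acc-1"})`, and commands are then sent through the client functions.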

OK, to give you some resources: I highly recommend having a look at “Functional Web Development with Elixir, OTP, and Phoenix”, which basically uses a DDD application style (without naming it DDD, which is OK). https://pragprog.com/book/lhelph/functional-web-development-with-elixir-otp-and-phoenix

Also, have a look at the framework and blog posts here: https://10consulting.com/

And finally, do browse this forum for “DDD” and “CQRS”. There have been quite a few related topics here already, so they can give you a broader perspective. Hope it’s helpful.

DianaOlympos · March 30, 2017, 8:28am

My 2 cents: do not try to carry your DDD design over by translating it directly into code/software constructs. It does not work that well.

Try to step back from DDD a bit, and see if you even need it to be a single process, or a process at all.

DDD is good, and there are a lot of nice things you can do with it, but do not drink too much Kool-Aid.

bryanhunter · March 30, 2017, 8:53am

Hi! I’ve been using Erlang VM processes as aggregates in my projects for a while (since ~2010), and I’m a fan of the approach. Here’s a talk I gave on the topic: “CQRS with Erlang” (https://vimeo.com/97318824)

Say you have a Patient aggregate. For each patient instance (e.g. patient #100020) we would spawn a process (GenServer) to represent that particular Patient. The repository you use to load the aggregate instances would be backed by a process registry (such as Ulf Wiger’s gproc https://github.com/uwiger/gproc or Elixir 1.4’s Registry https://hexdocs.pm/elixir/Registry.html).

If the requested aggregate instance (Patient ID 100020) was already running, we would simply use it. If it wasn’t loaded, we would spawn a new process, register the new process for that ID, load the aggregate’s state into it (from a document or an event store), and use it. The key here is that there is only one instance of Patient-100020 even if we load it many times. If we then load patient #500010, we would have two running processes (100020, 500010). We code our business logic in our aggregate. When multiple callers want to run commands against Patient-100020, the requests are processed sequentially, and concurrency bugs become much easier to avoid. When multiple callers want to run commands against different Patients, it can all happen in parallel. The Erlang VM is beautiful.
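The "use it if running, otherwise spawn and register it" flow above can be sketched with Elixir's `Registry` (1.4+) and `DynamicSupervisor` (1.6+). The module names (`PatientSketch.*`), the `:add_note` command, and the empty initial state are assumptions made for illustration; loading from a document or event store is left as a placeholder comment.

```elixir
defmodule PatientSketch.Aggregate do
  use GenServer, restart: :temporary

  # Register the process under its aggregate ID in a unique Registry,
  # so at most one process per ID can exist on this node.
  def start_link(id), do: GenServer.start_link(__MODULE__, id, name: via(id))

  def via(id), do: {:via, Registry, {PatientSketch.Registry, id}}

  @impl true
  def init(id) do
    # Placeholder: real code would load state from a document or event store.
    {:ok, %{id: id, notes: []}}
  end

  @impl true
  def handle_call({:add_note, note}, _from, state) do
    state = %{state | notes: [note | state.notes]}
    {:reply, {:ok, length(state.notes)}, state}
  end
end

defmodule PatientSketch.Repository do
  # Returns the pid of the running aggregate, starting it if necessary.
  def load(id) do
    case Registry.lookup(PatientSketch.Registry, id) do
      [{pid, _value}] -> {:ok, pid}
      [] -> start(id)
    end
  end

  defp start(id) do
    child = {PatientSketch.Aggregate, id}

    case DynamicSupervisor.start_child(PatientSketch.Supervisor, child) do
      {:ok, pid} -> {:ok, pid}
      # Another caller registered the same ID between lookup and start.
      {:error, {:already_started, pid}} -> {:ok, pid}
      other -> other
    end
  end
end
```

This assumes the application has already started `Registry.start_link(keys: :unique, name: PatientSketch.Registry)` and `DynamicSupervisor.start_link(strategy: :one_for_one, name: PatientSketch.Supervisor)`. Calling `PatientSketch.Repository.load(100_020)` twice then returns the same pid, while a different ID gets its own process.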

For life cycle, we could keep the loaded aggregates running in memory forever if memory wasn’t a concern. If memory is a concern, it’s simple to have aggregates time out if they don’t receive a command in X minutes. Another option is to use an MRU list (most recently used) and only keep the hottest 10,000 aggregate instances in memory.
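The "time out if no command arrives in X minutes" option maps onto the timeout value a GenServer callback may return. A minimal sketch, where the module name, the `:idle_ms` option, and the `:command` message are hypothetical:

```elixir
defmodule IdleSketch.Aggregate do
  use GenServer, restart: :temporary

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # Default of five minutes is an arbitrary choice for this sketch.
    idle_ms = Keyword.get(opts, :idle_ms, 300_000)
    # The third element arms the idle timer right after startup.
    {:ok, %{idle_ms: idle_ms, commands: 0}, idle_ms}
  end

  @impl true
  def handle_call(:command, _from, state) do
    state = %{state | commands: state.commands + 1}
    # Returning the timeout again re-arms the idle timer after every command.
    {:reply, state.commands, state, state.idle_ms}
  end

  @impl true
  def handle_info(:timeout, state) do
    # No message arrived within idle_ms: persist if needed, then stop normally.
    {:stop, :normal, state}
  end
end
```

Because the timer is reset by any incoming message, every callback that should keep the aggregate alive has to return the timeout again. The next load of that aggregate would simply start a fresh process.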

Like with any problem, things get trickier when we use multiple nodes (“fallacies of distributed computing”). If we are OK with the risks of distributed Erlang, then our process registry can work across nodes (via gproc) and we can have one instance of patient-100020 on the whole cluster. Pretty nice. Other ideas are to use a key ring or to piggyback on Riak core (https://github.com/basho/riak_core). Note: If you are considering gproc for this, you might want to read Christopher Meiklejohn’s “Erlang gproc Failure Semantics” (https://christophermeiklejohn.com/erlang/2013/06/05/erlang-gproc-failure-semantics.html)
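As a rough illustration of deciding which node should own a given aggregate, here is a sketch that hashes the aggregate ID onto the list of connected nodes. This is plain modulo hashing, not a real key ring or riak_core: when a node joins or leaves, most IDs map to a different node, which a consistent-hashing ring is designed to reduce. `PlacementSketch` and `node_for/2` are hypothetical names.

```elixir
defmodule PlacementSketch do
  # Picks the node responsible for an aggregate ID.
  # By default it considers this node plus every connected node.
  def node_for(aggregate_id, nodes \\ [node() | Node.list()]) do
    # Sort so that every node computes the same answer for the same membership.
    sorted = Enum.sort(nodes)
    # :erlang.phash2/2 returns an integer in 0..(length - 1).
    index = :erlang.phash2(aggregate_id, length(sorted))
    Enum.at(sorted, index)
  end
end
```

A caller could then start or look up the aggregate on the chosen node, for example through a supervisor registered by name on that node. Handling node failures and membership changes is left out of this sketch entirely.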

You will find additional resources by searching mixes of these terms Erlang, Elixir, actor, CQRS, Event sourcing, DDD, agent.

Love this thread! Thanks for pitching the question.

knav_negi · March 30, 2017, 9:53am

Thanks for such an elaborate answer. I was also thinking of using riak core in the multi-node scenario (making each aggregate a vnode). However, one thing I’m not sure of is where to handle persistence: should that logic go inside the aggregate, or should I have a repository for each aggregate root (which I think is the recommended way)?

knav_negi · April 10, 2017, 12:00pm

Thanks @bryanhunter for your help. I now have a working application that uses a process for each aggregate root (along the lines you suggested), and I am using the Elixir process Registry to maintain process registration.
Given that this is for fun and learning, I was wondering if I can distribute these actors across multiple nodes. So far I have been able to run my application on two nodes, but only one node is being used. How do I distribute these processes across different nodes?

If anyone is interested, the code is here: https://github.com/nav301186/rumuk/tree/develop/apps/bhaduli

bryanhunter · August 2, 2017, 3:41pm

Apologies @knav_negi, I just noticed your reply.

If you are still playing with the idea, Paul Schoenfelder’s excellent “swarm” library will help you move from local to distributed registration. https://github.com/bitwalker/swarm