CQRS with Commanded - Return directly from aggregate

Afternoon all, I’m back with another ES/CQRS question that’s perhaps more philosophical (for lack of a better word) than technical.

Event sourcing with Elixir and @slashdotdash 's Commanded seems a little different from how it works in other languages, thanks to the BEAM. Each aggregate in our app exists as an individual process, and it hangs around until it is stopped.

I have an aggregate in my app that builds workouts — I’m back on this CrossFit app — and whilst they are being created they exist as a process and my projection in the db is effectively a mirror, just with some computed fields added which aren’t relevant during the workout’s creation. Once the status of a workout changes from "draft" to "published" I’ll be stopping the aggregate as per Commanded’s docs and the latest post from Bruno, and the projection alone should suffice.

However, whilst I’m building the workout, it would be really handy to just return the current state of it from the aggregate, rather than the projection. My projection literally just copies out the aggregate’s state and I don’t need to rely on the projection for unique validations, so I don’t see a huge difference in just returning the aggregate rather than the projection. It should also be faster, as eventual consistency wouldn’t be an issue, and there should be no problems validating commands as I’m quite literally working from the canonical source of truth.

This is why my question is more philosophical than technical. What I’ve just described goes against CQRS as I understand it, but it seems to make more logical sense?

I’m going to go ahead and answer myself after half an hour playing around with this idea…I shouldn’t / can’t return the state of the aggregate directly. Commanded doesn’t seem to allow for it, which must be for good reason.

This is why I’ve enjoyed working with CQRS so far, whilst there’s a lot more procedure, it keeps me from shooting myself in the foot! :joy:

Commanded allows you to directly access an aggregate’s state via the undocumented Commanded.Aggregates.Aggregate.aggregate_state/2 function:

alias Commanded.Aggregates.Aggregate

%BankAccount{} = Aggregate.aggregate_state(BankAccount, account_number)

The reason why this isn’t exposed publicly is because an aggregate’s state should be an internal implementation detail. Instead, CQRS promotes the concept of two models: one for writes (aggregates) and one for reads (projections). Commands are handled by the write model and queries are handled by the read model.
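As a minimal sketch of those two paths (BankRouter, OpenAccount and AccountProjection are hypothetical names, and Repo stands for a standard Ecto repo):

```elixir
# Write model: state changes go through a command dispatched to the aggregate.
# BankRouter, OpenAccount and AccountProjection are illustrative names.
:ok =
  BankRouter.dispatch(%OpenAccount{
    account_number: "ACC123",
    initial_balance: 1_000
  })

# Read model: queries never touch the aggregate; they go to the projection.
account = Repo.get_by(AccountProjection, account_number: "ACC123")
```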

In your example you have noticed that the projection (read model) and aggregate state (write model) are very similar, so it might save you some time and effort to use the aggregate state instead of creating a separate read model. This works until the needs of the two models start to diverge due to differing data access patterns, at which point you either migrate to a separate read model or extend the aggregate state to store data which isn’t necessary for command handling. Investing the effort from the outset by implementing the two separate models should pay off as your model becomes richer in behaviour.

Hey Ben, thanks for explaining that. I figured there’d be an undocumented way of doing this, but when I couldn’t see anything obvious and documented I figured there was probably a good reason for it.

What I want to do is actually very similar to one of my first attempts at this project using just a plain GenServer. It worked well — which gives me confidence in this approach — but I struggled to map the in-memory structure to the DB. I don’t have that problem any more now that I’ve moved towards denormalisation and multiple read models rather than a cleanly normalised database, plus I understand the domain much better.

It makes sense that I should use the proper read model from the off — as I’ve already built it that way — and as you say it protects me from diverging needs.

I think @slashdotdash 's explanation does a good job of noting why this is at odds with the traditional CQRS mantra of having distinct read and write models.

The approach I took with https://github.com/cargosense/fable leans slightly more towards what you’re asking for by eschewing CQRS in favor of a simpler implementation of Event Sourcing.

Basically you have a database table, say “shipments”, and Fable guarantees that each event you emit for a given shipment row is handled serially by that shipment row. There may well be other database tables that are affected by that same event (containers or products in that shipment) and in that way the “shipments” table acts a lot like an aggregate. All changes to the shipments table and the containers table should happen because you cut an event, and then the event handler for that event makes changes to the various tables based on the event.

However with Fable as written today you still just read from the database normally when it comes to querying data, and it doesn’t push you to maintain a separate read vs write model. This is either a feature or a big deficiency depending on how wedded you are to CQRS.
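Since Fable’s own API is undocumented, here is only a generic sketch of the pattern described above — not Fable’s actual interface — where one event is applied to several tables in a single transaction (the `shipment_query`/`containers_query` helpers are hypothetical):

```elixir
# NOT Fable's actual API — a generic illustration of "the shipments row acts
# like an aggregate": one event, handled serially, updating several tables
# inside a single transaction.
def handle_event(%{type: "shipment_rerouted"} = event, repo) do
  Ecto.Multi.new()
  |> Ecto.Multi.update_all(:shipment, shipment_query(event.shipment_id),
    set: [route: event.data["route"]]
  )
  |> Ecto.Multi.update_all(:containers, containers_query(event.shipment_id),
    set: [eta: event.data["eta"]]
  )
  |> repo.transaction()
end
```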

Sadly Fable is still alpha stage and missing any useful documentation :frowning: Nonetheless I think there’s a “market” for a middle ground: a more minimalist library that supplies event-sourcing guarantees without requiring buy-in to the entire CQRS world view.

Thanks for the reply, I’ll certainly take a look at Fable. I do agree that there’s an opportunity for an ES-lite option that doesn’t require a lot of the additional complexity that CQRS brings.

In my case I have tried to navigate that middle ground, as there are certainly bits of my app that don’t benefit from CQRS and even those that do wouldn’t be impossible without it. However, I’ve relented and gone “full ES/CQRS” just because it’s been enough of a stretch to understand the concepts without trying to work out which parts I need and which I don’t. I know that’s a terrible reason to adopt an architecture, but in the interest of flattening out my learning curve I thought it was a good idea.

My interest in the whole topic stemmed from the functional nature of event sourcing and how it suited the later parts of my app which are very easy to think about as a series of events (athletes completing workouts and movements). I found that without ES the app started taking a distinctly OO turn, starting to rely on lots of relations in the DB and even things that looked like class inheritance.

Event sourcing is a very simple pattern to understand and use, but I found that it was the persistence of the resulting state that complicated things. I like the idea of Fable and it certainly looks like it would be a good middle ground where the two models don’t need to be drastically different.

@slashdotdash Just one further question if I may?

In my Workout aggregate I have a field called elements which is a list of maps, and in the projection it is of type {:array, :map} too.

At the moment I’ve been populating the list in the aggregate with a simple

workout.elements ++ [new_element]

to append new ones, and using

Ecto.Multi.update_all(multi, :element, workout_query(workout_uuid), push: [elements: new_element])

for the projector.

That works OK, but removing specific elements using an element_uuid key in the map is problematic in Ecto thanks to the {:array, :map} type. Removing the elements in the aggregate is simple: it’s just an Enum.reject/2.
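For reference, the aggregate-side removal mentioned above might look something like this (assuming the element maps use an atom element_uuid key):

```elixir
# Pure-function removal inside the aggregate: keep every element except the
# one being removed. Assumes each element map has an atom :element_uuid key.
def remove_element(%Workout{elements: elements} = workout, element_uuid) do
  %Workout{workout | elements: Enum.reject(elements, &(&1.element_uuid == element_uuid))}
end
```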

Getting to the point, would you consider removing the element in the command and storing the updated_elements in the event a pattern to avoid? My thinking is that the logic is isolated to one place, rather than implemented two different ways in the aggregate and the projector, and I could just use the JSONB field type for Ecto and forgo a lot of messing about as the projector update would simply become:

project %ElementRemoved{workout_uuid: workout_uuid, updated_elements: updated_elements} do
  Ecto.Multi.update_all(multi, :element_removal, workout_query(workout_uuid), set: [elements: updated_elements])
end

To my mind that is simpler, and it removes the potential for errors to creep in between two different implementations. The tradeoff is loading each ElementRemoved event with additional data that isn’t strictly required to record the change in state.

Am I just trying to shoot myself in the foot yet again?

It really depends on how many elements are going to be stored for each workout.

If it’s not too many then the solution you propose would be acceptable. Whereas if you expect many elements per workout then I’d suggest storing them in a separate table, which would allow you to easily delete a removed element by its identity using Ecto.Multi.delete_all/4.
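With a separate table, the projector-side removal could be sketched like this (the table and event field names are illustrative):

```elixir
import Ecto.Query

# Sketch: elements live in their own "workout_elements" table, so removal
# becomes a delete by identity rather than rewriting a whole array column.
project %ElementRemoved{element_uuid: element_uuid} do
  query = from(e in "workout_elements", where: e.element_uuid == ^element_uuid)
  Ecto.Multi.delete_all(multi, :element_removal, query)
end
```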

Hmm, it’s never going to be a huge number, but the volume of data contained could be significant if everything inside is heavily denormalised.

I just moved the elements out of their own table and into the {:array, :map} because not all elements shared the same fields. Having everything nicely normalised is what prompted my descent into class inheritance previously…

That said, there’s nothing to stop me from keeping a separate table where the actual data is still a blob but I’ve got the element_uuid and workout_uuid fields to help with joins. I guess this is a perfect example of your above point where the read and write models can diverge and be structured differently!

I’ll have a think on how I want to proceed. I did like the large amount of denormalisation I had planned, but in this case it seems some normalisation would really help, even if it introduces more complexity in the querying.

Using a separate elements table with a jsonb column would be a hybrid approach that might work (basically using Postgres as a document DB!).
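A sketch of that hybrid schema, assuming illustrative module and table names, might be:

```elixir
defmodule MyApp.Projections.WorkoutElement do
  use Ecto.Schema

  # Identity columns support joins and targeted deletes; the element body
  # itself stays denormalised in a single :map field, which the Postgres
  # adapter stores as jsonb.
  @primary_key {:element_uuid, :binary_id, autogenerate: false}
  schema "workout_elements" do
    field :workout_uuid, :binary_id
    field :data, :map
  end
end
```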