What concurrency issues does using Event Sourcing expose me to?

makeitrein · April 18, 2020, 9:15pm

Happy Saturday, Elixir Forum - I’m exploring the Commanded event sourcing library for a potential weekend project. A coworker of mine mentioned that a common concurrency issue in event sourcing is having command handlers write multiple events to a stream when only one written event is valid.

For example, two “close bank account” commands are fired off at the same time, and the handler processes both. Thus, the handler writes two “closed bank accounts” events to the event stream. Only one “closed bank accounts” event should have been written.

Our discussion made me realize that I might not know what I’m getting into when using Event Sourcing - are there other potential concurrency problems that I should be wary of? I see that Commanded has a section related to “Command dispatch consistency guarantee” - does strong consistency protect against common concurrency problems?

bottlenecked · April 18, 2020, 10:03pm

Hi, not sure how exactly Commanded is implementing aggregates, but I’d expect they are using processes, which means that barring some grievous design error (either on their part or yours) that should not be a situation you will ever find yourself in, since a process can only ever handle one message at a time. So in your example, even if two ‘close account’ commands reach your process, the second one should fail validation because the first should have been fully processed (command -> validation -> domain event -> apply -> store and publish)* before the next message is ever consumed from the process’s inbox.

Having said that, be sure to understand the tradeoffs when going the ES way for some part of your system- unless you really need it, you may be trading too much development effort for benefits you’re not using

*I may be getting the order wrong because I’m not familiar with Commanded’s flavor of ES
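The flow described above (command -> validation -> domain event -> apply -> store) can be sketched as follows. This is a library-neutral illustration in Python, not Commanded's API: `BankAccount`, `run_aggregate`, and the command and event names are all hypothetical, and a queue drained by a single worker thread stands in for a process inbox.

```python
import queue
import threading
from dataclasses import dataclass


class AccountAlreadyClosed(Exception):
    pass


@dataclass
class BankAccount:
    """Hypothetical aggregate: its state changes only by applying events."""
    closed: bool = False

    def execute(self, command: str) -> str:
        # Validation: decide whether the command is allowed in the current state.
        if command == "close_account":
            if self.closed:
                raise AccountAlreadyClosed()
            return "account_closed"
        raise ValueError(f"unknown command: {command}")

    def apply(self, event: str) -> None:
        if event == "account_closed":
            self.closed = True


def run_aggregate(inbox, stream, results):
    """Single worker: takes one command at a time from the inbox."""
    account = BankAccount()
    while True:
        command = inbox.get()
        if command is None:  # sentinel used here to stop the worker
            break
        try:
            event = account.execute(command)  # validate, produce event
            account.apply(event)              # update in-memory state
            stream.append(event)              # "store" the event
            results.append("ok")
        except AccountAlreadyClosed:
            results.append("rejected")


inbox = queue.Queue()
stream, results = [], []
worker = threading.Thread(target=run_aggregate, args=(inbox, stream, results))
worker.start()
inbox.put("close_account")
inbox.put("close_account")  # sent "at the same time", but handled second
inbox.put(None)
worker.join()
print(stream)   # ['account_closed']
print(results)  # ['ok', 'rejected']
```

Because the second command is only taken from the inbox after the first has been applied, the validation step sees the closed account and no duplicate event is written.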

benwilson512 · April 18, 2020, 11:02pm

Command handlers don’t get to write to the event stream, they aren’t allowed. Command handlers have to ask an aggregate to actually do the command, and those aggregates serialize commands, meaning they run just one at a time. If a particular event sourcing library can’t guarantee serialized command handling for an aggregate, then it is a seriously flawed library. To my knowledge, Commanded does all of the right things there.

This does of course mean you need to choose your aggregates wisely, and understand how to run operations that involve multiple aggregates. To use your example, a bank account is a fine choice of aggregate, and the aggregate’s job will be to ensure that things like double closes don’t happen. Where things get tricky is something like a transfer, wherein you want to move some funds from one aggregate to another. This tends to push things in a “TransferRequested” “TransferAccepted” style event log, which I actually think is pretty solid.

I’m not sure that these concurrency problems are common. If a library says it’s doing event sourcing, but it doesn’t provide a way to do serialized event handling around specific topics or aggregates, I’m not really sure it’s doing event sourcing. Certainly not CQRS.
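A rough sketch of the “TransferRequested” / “TransferAccepted” style for a transfer between two account aggregates might look like this. Only the two event names come from the post; the `Account` class, the `run_transfers` coordinator, the decision to debit the source when the request is recorded, and the idempotency check are assumptions made for illustration, and it is written in Python rather than against any real library.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical aggregate; each account only ever changes its own state."""
    account_id: str
    balance: int
    events: list = field(default_factory=list)

    def request_transfer(self, transfer_id, to_id, amount):
        if amount <= 0 or amount > self.balance:
            raise ValueError("invalid amount or insufficient funds")
        self._record({"type": "TransferRequested", "transfer_id": transfer_id,
                      "to": to_id, "amount": amount})

    def accept_transfer(self, transfer_id, amount):
        # Idempotent: accepting the same transfer twice records nothing new.
        if any(e["type"] == "TransferAccepted" and e["transfer_id"] == transfer_id
               for e in self.events):
            return
        self._record({"type": "TransferAccepted", "transfer_id": transfer_id,
                      "amount": amount})

    def _record(self, event):
        self.events.append(event)
        if event["type"] == "TransferRequested":
            self.balance -= event["amount"]  # assumption: debit on request
        elif event["type"] == "TransferAccepted":
            self.balance += event["amount"]


def run_transfers(accounts):
    """Hypothetical coordinator: reacts to requests by commanding the target."""
    for account in list(accounts.values()):
        for event in list(account.events):
            if event["type"] == "TransferRequested":
                accounts[event["to"]].accept_transfer(
                    event["transfer_id"], event["amount"])


accounts = {"a": Account("a", 100), "b": Account("b", 0)}
accounts["a"].request_transfer("t-1", to_id="b", amount=30)
run_transfers(accounts)
run_transfers(accounts)  # re-delivery is harmless because accept is idempotent
print(accounts["a"].balance, accounts["b"].balance)  # 70 30
```

The point of the design is that no single operation touches both aggregates: each account validates and records its own event, and the event log shows every step of the transfer.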

madlep · April 19, 2020, 9:33pm

The serialisation of data writes you raise should be fine, as others have mentioned.

The big concurrency issue that trips people up though is eventual consistency. If you’re writing a web front end for a todo list (as an example), and the user adds a new todo, there may be a lag between the time the event is written and the time an event handler runs to build the projection the front end queries. That means the user might not see the new todo appear when they click “save” - which can be weird UX. Commanded has ways to specify how that is handled, but it means you need to think about that up front to understand the consistency/performance trade-offs.
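The lag described above can be shown with a small sketch. It is plain Python rather than Commanded: `TodoProjection`, `add_todo`, and the idea of comparing a projection position against the position of the write are hypothetical names and one possible approach, not the library's mechanism.

```python
class TodoProjection:
    """Hypothetical read model built by an event handler."""

    def __init__(self):
        self.todos = []
        self.position = 0  # number of events processed so far

    def catch_up(self, log):
        # In a real system this runs asynchronously, some time after the write.
        for event in log[self.position:]:
            if event["type"] == "todo_added":
                self.todos.append(event["title"])
            self.position += 1


log = []  # append-only event log
projection = TodoProjection()


def add_todo(title):
    log.append({"type": "todo_added", "title": title})
    return len(log)  # position of the event that was just written


written_at = add_todo("buy milk")

# The write has succeeded, but the handler has not run yet: the read is stale.
stale_read = list(projection.todos)
was_fresh = projection.position >= written_at

projection.catch_up(log)
fresh_read = list(projection.todos)
is_fresh = projection.position >= written_at

print(stale_read, was_fresh)  # [] False
print(fresh_read, is_fresh)   # ['buy milk'] True
```

A front end can either tolerate the stale read (for example by showing the new todo optimistically) or wait until the projection has reached the position of its own write, which trades latency for consistency.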

madlep · April 19, 2020, 9:48pm

Where “should be fine” is dependent on how you’ve modelled your aggregates and validation and events and things.

Event sourcing gives you a lot of tools for handling those scenarios, but it means you need to understand your data and your business logic. It often means more work and more moving pieces than an equivalent application written as a regular CRUD app. For a lot of apps though, that trade off makes sense.

Particularly anything that needs an audit trail or guarantees around what is changed and how and when. If you’re dealing with money, you might want event sourcing. If writing a blog or a todo list, you might not.

makeitrein · April 21, 2020, 2:49am

Thx @bottlenecked, @benwilson512, and @madlep - good orientation before I dive too deep into code land.

Definitely recognize that ES requires a fairly high degree of data modeling and coding precision that punishes slip-ups… but I think I’m willing to pay that cost (famous last words).

I’d like to build undo and history functionality for a particular model, and event sourcing seems like a reasonable paradigm to support this. Most of the other attempts that I’ve had in the past coding this functionality always felt a bit janky - here’s hoping that ES will provide a better path forward.
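As a rough illustration of why event sourcing suits undo and history, the sketch below rebuilds state by replaying events. Everything in it is hypothetical (the event shapes, `apply_event`, `replay`, `undo_last`), and treating undo as a new "reverted" event rather than deleting the last one is an assumption about how one might model it.

```python
def apply_event(state, event):
    """Return the new state after one event; never mutates the input."""
    kind, value = event
    if kind == "title_set":
        return {**state, "title": value}
    if kind == "reverted":
        return dict(value)  # value is the full state being restored
    return state


def replay(events, upto=None):
    """Rebuild state from the first `upto` events (all of them if None)."""
    state = {}
    for event in events[:upto]:
        state = apply_event(state, event)
    return state


def undo_last(events):
    """Undo by appending a compensating event, so history is preserved."""
    previous = replay(events, len(events) - 1)
    events.append(("reverted", previous))


events = [("title_set", "Draft"), ("title_set", "Final")]

print(replay(events))     # {'title': 'Final'}
print(replay(events, 1))  # {'title': 'Draft'}

undo_last(events)
print(replay(events))     # {'title': 'Draft'}
print(len(events))        # 3
```

History comes for free from `replay(events, n)`, and the undo itself is recorded as an event, so it can be audited or undone again.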

Still deciding on whether to ES the entire application, or just the areas that need it. It feels like an all-or-nothing proposition, and it’d be a bit awkward to mix multiple styles at once.

bottlenecked · April 21, 2020, 6:14am

Hey there, if you have a good case for it then by all means go ahead- the warning was to make someone that thinks ES ‘is cool’ rethink their decision carefully.

As for the all in- there are many Greg Young videos talking about ES, and in one or two of those he warns ‘please don’t tell me you’ve built an event sourced system’. His argument is that there is probably just one part of a system that needs it, and it’s OK to pay the cost of admission there- otherwise you will be entering a world of unnecessary complexity.

So… be careful

domvas · April 21, 2020, 7:18am

I asked myself the same question a few months ago about building an all-ES application, because I wanted one style for the whole system and I really wanted to do some ES/CQRS stuff with an append-only store and everything. The more I thought about my project and the more I learned about ES/CQRS, the more I started to feel that having ES everywhere was not a good solution. But I wanted to do it anyway! Then I followed the Commanded tutorial (which I highly recommend you do) and, well, it opened my eyes: for a lot of parts of my project, the amount of work would be huge for not much benefit, with eventual consistency on top.
In the end, most of the project is classic and one part is some kind of homemade ES, homemade because I don’t need the full power of Commanded for now.
And I learned another thing: RGPD destroys ES (and statistics) (if you really comply with RGPD…)

Good luck with your project, ES is great!

andrejsm · April 21, 2020, 7:51am

Did you mean GDPR?

domvas · April 21, 2020, 8:31am

Yep, I mean GDPR but I used the French term, sorry for that…

saverio-kantox · April 24, 2020, 11:38am

GDPR does not destroy EventSourcing. You just need to source separately the business data (which is not strictly tied to a person) and the personal data (which you can put in separate silos and clean up when needed).

I’m not saying it’s easy, but it’s doable without affecting the event-sourced aggregations.

domvas · April 25, 2020, 8:35am

I think you’re right. I’m just extreme in my view of what personal data is.
The law states that:

‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

Just with the economic, cultural or social identity, almost everything on the internet related to persons is personal data. This means your cart is personal data, a statistic in which you can be found is personal data, etc.
Most of the time, companies say that just removing name, email and such things is enough. I don’t think so, because you can also identify someone by habits, contacts, etc.
Again, I may be a bit extreme, but it would be really nice if, when you want to disappear from somewhere, you disappear completely, without any trail. With ES, where you don’t delete events, that can be harder because there will always be a link to someone at some point, whether you want it or not.
But I may be mistaken as well.

Qqwy · April 25, 2020, 8:50am

Yes, if you are following Event Sourcing to the letter, then you are not allowed to ever remove data, which means you cannot adhere to the GDPR.

In practice this is however mainly a problem when you are using a system whose events depend on the previous events (such as a distributed ledger/blockchain which uses a merkle tree to store events, making it impossible to alter or destroy any intermediate events).
If you use event sourcing outside of these systems, you will be fine, as long as you keep in mind that certain parts of earlier events might be altered to shield the privacy of individuals.

wolf4earth · April 26, 2020, 7:47am

I’ve worked on an event sourced system in the times of GDPR. In our case we referenced a user in events by their user id (which was a randomly generated UUID) and kept personal data (email, name, address etc.) in a classic CRUD-like table which was not event sourced.

When a user requested to be deleted we simply deleted the relevant row in said table, which made it impossible to relate the user’s actions to them.

This was enough to fulfill the GDPR requirements, at least according to our data protection officer who really knew their stuff.
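The approach described above can be sketched roughly as follows. The names (`events`, `user_profiles`, `register_user`, `forget_user`, `describe`) and the in-memory dict standing in for the CRUD table are hypothetical; whether this is sufficient for GDPR in a given system is a legal question the code cannot answer.

```python
import uuid

events = []         # append-only event log; holds no personal data
user_profiles = {}  # mutable CRUD-style table: user_id -> personal data


def register_user(name, email):
    user_id = str(uuid.uuid4())  # random id, reveals nothing by itself
    user_profiles[user_id] = {"name": name, "email": email}
    events.append({"type": "user_registered", "user_id": user_id})
    return user_id


def place_order(user_id, total_cents):
    events.append({"type": "order_placed", "user_id": user_id,
                   "total_cents": total_cents})


def forget_user(user_id):
    # Deletion request: remove the row, leave the event log untouched.
    user_profiles.pop(user_id, None)


def describe(event):
    profile = user_profiles.get(event["user_id"])
    who = profile["name"] if profile else "<deleted user>"
    return f'{event["type"]} by {who}'


alice = register_user("Alice", "alice@example.com")
place_order(alice, 1999)
print(describe(events[1]))  # order_placed by Alice

forget_user(alice)
print(describe(events[1]))  # order_placed by <deleted user>
print(len(events))          # 2
```

The event stream stays intact for aggregates and projections, while the only link from the UUID back to a person is gone.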

dimitarvp · May 3, 2020, 10:11am

Well, you can just use ex_audit only for the table where you want history of changes.

slashdotdash · May 3, 2020, 7:40pm

Commanded uses a GenServer process for each aggregate instance so that commands for the same bank account will be handled serially. As long as you guard against closing an already closed account in the command handler you will be fine.

One caveat is if you have multiple nodes hosting the application and you do not use distributed Erlang. In this scenario you could have two instances of the same bank account process running on two different nodes. If the same command was being processed concurrently on both nodes then they would attempt to write the same account closed event to the event store. To protect against this issue the event store uses optimistic concurrency when appending events to each stream. This ensures that the first write will succeed and the second write will be rejected since there is a new event in the stream. Commanded will apply the new event to the aggregate and retry the command which will now fail since the account has already been closed. You could also include the current version of the aggregate in every dispatched command and have it be rejected if the aggregate’s actual version when processing the command doesn’t match.
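The optimistic concurrency check and retry described above can be sketched like this. It is a library-neutral Python illustration: `EventStore`, `WrongExpectedVersion`, and `handle_close` are hypothetical names, not the API of Commanded or its event store, and the two "nodes" are simulated by two stale copies of the same stream.

```python
class WrongExpectedVersion(Exception):
    pass


class EventStore:
    """Hypothetical in-memory store with an expected-version check on append."""

    def __init__(self):
        self.streams = {}

    def read(self, stream_id):
        return list(self.streams.get(stream_id, []))

    def append(self, stream_id, expected_version, new_events):
        stream = self.streams.setdefault(stream_id, [])
        if len(stream) != expected_version:
            # Someone else appended since the caller last read the stream.
            raise WrongExpectedVersion(
                f"expected {expected_version}, stream is at {len(stream)}")
        stream.extend(new_events)


def handle_close(store, stream_id, known_events):
    """Run the close command against what this node believes the stream holds."""
    while True:
        if "account_closed" in known_events:
            return "already_closed"
        try:
            store.append(stream_id, len(known_events), ["account_closed"])
            return "closed"
        except WrongExpectedVersion:
            known_events = store.read(stream_id)  # catch up, then retry


store = EventStore()
store.append("account-1", 0, ["account_opened"])

# Two nodes load the same aggregate state before either one writes.
seen_by_a = store.read("account-1")
seen_by_b = store.read("account-1")

result_a = handle_close(store, "account-1", seen_by_a)
result_b = handle_close(store, "account-1", seen_by_b)

print(result_a)                 # closed
print(result_b)                 # already_closed
print(store.read("account-1"))  # ['account_opened', 'account_closed']
```

The second node's write is rejected because the stream has moved past the version it expected; after reloading, its own validation refuses the command, so only one closed event is ever stored.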

GDPR compliance with regards to PII (personally identifiable information) data can usually be solved in one of three different ways in an event sourced system:

1. “Crypto-shredding” where PII data is encrypted in events and the encryption key is thrown away to prevent read access to the data.
2. Store PII in a separate mutable data store which allows modification and deletion.
3. Allow events and/or streams to be modified (so not an immutable event store).

See https://github.com/commanded/recipes/issues/4
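To make the crypto-shredding option concrete, here is a minimal sketch in Python. All names (`keys`, `encrypt_pii`, `decrypt_pii`, `shred`) are hypothetical, and the cipher is a deliberately simple stand-in (a SHAKE-256 keystream XORed with the data, with no authentication) so the example runs with the standard library only; a real system would use a vetted authenticated cipher from a cryptography library.

```python
import hashlib
import secrets

keys = {}  # user_id -> per-user key, kept outside the event store


def _keystream(key, nonce, length):
    # Stand-in keystream for illustration only; not a production cipher.
    return hashlib.shake_256(key + nonce).digest(length)


def encrypt_pii(user_id, plaintext):
    key = keys.setdefault(user_id, secrets.token_bytes(32))
    nonce = secrets.token_bytes(16)  # fresh nonce for every encrypted value
    data = plaintext.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    return {"nonce": nonce, "cipher": bytes(a ^ b for a, b in zip(data, stream))}


def decrypt_pii(user_id, blob):
    key = keys.get(user_id)
    if key is None:
        return None  # key was shredded, the data is unreadable
    stream = _keystream(key, blob["nonce"], len(blob["cipher"]))
    return bytes(a ^ b for a, b in zip(blob["cipher"], stream)).decode("utf-8")


def shred(user_id):
    keys.pop(user_id, None)  # "delete" the user by throwing the key away


# The stored event contains only ciphertext for the personal field.
event = {"type": "user_registered", "user_id": "u-1",
         "email": encrypt_pii("u-1", "alice@example.com")}

print(decrypt_pii("u-1", event["email"]))  # alice@example.com

shred("u-1")
print(decrypt_pii("u-1", event["email"]))  # None
```

The event itself is never modified or removed, so the stream stays immutable; only the key store needs to support deletion.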

slashdotdash · May 3, 2020, 7:50pm

Building an application which is fully event sourced is usually a bad idea, unless your intention is to learn the concepts involved. Event sourcing comes with some trade-offs such as accepting eventual consistency and requires more investment in modelling your domain over a typical CRUD based application. Therefore it’s better to use event sourcing where it is well suited: temporal models, complex business rules, auditability, etc.

It’s perfectly acceptable to mix CRUD and event sourcing within an application, but I’d recommend using a single style within each context.