Designing for Failure

Hello.

I’m currently reading the beta version of Functional Web Development with OTP and Phoenix and I love the way the book uses OTP to store state (using Agents and GenServers). I’m thinking about how I could use this technique in my next project, but I’m also thinking about failure and recovery (which isn’t in the book yet, as that part hasn’t been written :smile:).

As an example of the approach I’m thinking of - assume I have an Order which I’d like to model using a process BUT also when details of the order change I’d also want to persist to the database. If one of the attributes I persist to the database is the PID of the process that models the order then IF that process crashes the Supervisor for that process can simply read the last state from the database and recreate the process.

Does this make sense, or am I overcomplicating my design? I appreciate that you shouldn’t overuse processes and that the standard way would be to simply use the database for everything, but I’m intrigued by the use of processes to model state.

I’d love to hear people’s views on the above approach.

many thanks (and have a great day).

Dave

Might not be particularly relevant, but have you seen this talk? https://www.youtube.com/watch?v=fkDhU-2NWJ8 I remember them using something similar to the approach you are describing.

I did watch that talk, which is what got me thinking about this issue in the first place. Then I came across the Functional Web Development book and that REALLY got me thinking :smile:

It seems I’m not the only one as there is this thread that I’ve just come across…

I guess my question is more about IF I did follow this approach would the “store PID” approach be a reasonable one.

cheers

Dave

I don’t know. I would probably store the last good state of the crashed process in an ETS table (in the terminate callback if it’s a GenServer) and have the supervisor read it from there.

Thanks for the suggestion. I thought ETS tables “disappeared” when the associated process died or am I completely wrong?

You are right. But I would suggest that the ETS table keeping the state of the crashed process not be owned by that process itself, but by its supervisor or maybe even by a completely separate process.
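
As a rough sketch of that idea (not a definitive implementation): the table below is created by a separate, longer-lived process, and the GenServer saves its state in `terminate/2` and reads it back in `init/1`. The module name `CachedOrder`, the table name `:order_last_state`, and the shape of the state are all made up for illustration. Keep in mind that `terminate/2` is not guaranteed to run (for example on a brutal kill), so this is best effort only.

```elixir
defmodule CachedOrder do
  use GenServer

  # Hypothetical table name; the table must be created by another process.
  @table :order_last_state

  def start_link(order_id), do: GenServer.start_link(__MODULE__, order_id)

  @impl true
  def init(order_id) do
    # Restore the last saved state if there is one, otherwise start fresh.
    state =
      case :ets.lookup(@table, order_id) do
        [{^order_id, saved}] -> saved
        [] -> %{id: order_id, items: []}
      end

    {:ok, state}
  end

  @impl true
  def handle_call({:add_item, item}, _from, state) do
    {:reply, :ok, %{state | items: [item | state.items]}}
  end

  def handle_call(:get, _from, state), do: {:reply, state, state}

  @impl true
  def terminate(_reason, state) do
    # Best-effort save of the last state into the externally owned table.
    :ets.insert(@table, {state.id, state})
    :ok
  end
end

# The calling process owns the table here; in a real system this would be
# a supervisor or a dedicated process that outlives the order processes.
:ets.new(:order_last_state, [:named_table, :public, :set])

{:ok, pid} = CachedOrder.start_link(42)
:ok = GenServer.call(pid, {:add_item, "widget"})
:ok = GenServer.stop(pid)

# A fresh process for the same order picks up the saved state.
{:ok, new_pid} = CachedOrder.start_link(42)
restored = GenServer.call(new_pid, :get)
```

Because the table is `:public` and named, any process can read and write it, and it survives the order process going away.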

I would use a :unique Elixir Registry for keeping track of my entities based on their ID.
This way I don’t have to juggle PIDs around.

I would not store the PID into the database, only the ID and state of the entity.
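
Something along these lines, as a minimal sketch: the names `OrderRegistry` and `RegisteredOrder` and the shape of the state are assumptions for the example, and loading the state from the database is left as a comment.

```elixir
defmodule RegisteredOrder do
  use GenServer

  # Callers address the process by order ID; the Registry resolves the PID.
  def via(order_id), do: {:via, Registry, {OrderRegistry, order_id}}

  def start_link(order_id) do
    GenServer.start_link(__MODULE__, order_id, name: via(order_id))
  end

  def get(order_id), do: GenServer.call(via(order_id), :get)

  @impl true
  def init(order_id) do
    # Hypothetical: this is where the last persisted state for order_id
    # would be loaded from the database.
    {:ok, %{id: order_id, items: []}}
  end

  @impl true
  def handle_call(:get, _from, state), do: {:reply, state, state}
end

{:ok, _registry} = Registry.start_link(keys: :unique, name: OrderRegistry)
{:ok, _pid} = RegisteredOrder.start_link(42)

order = RegisteredOrder.get(42)

# With :unique keys, a second process for the same ID is rejected.
duplicate = RegisteredOrder.start_link(42)
```

When a registered process dies, the Registry drops its entry automatically, so a restarted process can register under the same ID again and nothing outside needs to know the new PID.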

Only if there is no heir process. You can tell ETS that when the table’s owner process dies, the table should be passed to an heir process, which gets a message alerting it (so that it can, for example, re-create its sibling and give the table back, ready to repeat the process again).
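
A small sketch of the heir mechanism, using plain processes to keep it short. The table name, the key, and the `:order_state_backup` tag are invented for the example; the `{:heir, pid, data}` option and the `:"ETS-TRANSFER"` message are the standard ETS ones.

```elixir
# The current process plays the role of the heir.
heir = self()

owner =
  spawn(fn ->
    # The owner names its heir when it creates the table.
    table = :ets.new(:order_state, [:set, {:heir, heir, :order_state_backup}])
    :ets.insert(table, {42, %{status: :paid}})

    # Simulate a crash of the owning process.
    exit(:boom)
  end)

recovered =
  receive do
    # ETS hands the table to the heir and tells it who the old owner was.
    {:"ETS-TRANSFER", table, ^owner, :order_state_backup} ->
      :ets.lookup(table, 42)
  after
    1_000 -> :timeout
  end

IO.inspect(recovered)
```

From here the heir could start a replacement process and hand the table back with `:ets.give_away/3`, which delivers the same kind of `:"ETS-TRANSFER"` message to the new owner.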

Thanks kwando - I hadn’t thought about the Elixir Registry - great idea.

Ah thank you OvermindDL1.