In-memory SaaS user accounts system - design questions

hubertlepicki · September 13, 2017, 10:09am

As an exercise in OTP for myself & my team, I was building an in-memory user accounts system. I am roughly following @lance’s path laid out in his book.

Each Account is a GenServer. The primary reason is to ensure that operations and transitions between the “unregistered”, “unconfirmed”, “active” and “deactivated” states are wrapped in an application-level transaction, so that there are no race conditions, such as two users registering with the same e-mail because of a double form submit. This is enforced by a simple state machine module that guards the transition rules.
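To make the state machine idea concrete, here is a minimal sketch of such a guard module. The module name AccountState and the exact set of allowed transitions are assumptions for illustration; the actual rules are not spelled out here.

```elixir
defmodule AccountState do
  # Hypothetical transition table: each state maps to the states it may move to.
  @transitions %{
    unregistered: [:unconfirmed],
    unconfirmed: [:active],
    active: [:deactivated],
    deactivated: [:active]
  }

  # Returns {:ok, to} when the move is allowed, an error tuple otherwise.
  def transition(from, to) do
    if to in Map.get(@transitions, from, []) do
      {:ok, to}
    else
      {:error, {:invalid_transition, from, to}}
    end
  end
end
```

Because each Account GenServer handles one message at a time, calling a function like this inside its callbacks means two concurrent requests cannot both pass the same transition check.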

I start an AccountsSupervisor with the :simple_one_for_one strategy. It is responsible for starting new Account servers.

I want to be able to find each account either by ID or by e-mail. Both are unique, so I created two unique registries: AccountsByEmail and AccountsById. Whenever I start a server for a particular account, its init function tries to register in both registries. If this fails at any point, it most likely means a server is already running for that account. When I log a user in, I look the account up in AccountsByEmail; when I have the account ID, I use the other registry.
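A rough sketch of that double registration, assuming both registries were started with keys: :unique and that the server is started with an {id, email} tuple (the argument shape and the whereis_by_email helper are made up for the example):

```elixir
defmodule Account do
  use GenServer

  # The {id, email} argument shape is an assumption for this sketch.
  def start_link({id, email}), do: GenServer.start_link(__MODULE__, {id, email})

  # Hypothetical lookup helper used at login time.
  def whereis_by_email(email) do
    case Registry.lookup(AccountsByEmail, email) do
      [{pid, _value}] -> {:ok, pid}
      [] -> :error
    end
  end

  def init({id, email}) do
    # Register under both keys; stop if either key is already taken.
    with {:ok, _} <- Registry.register(AccountsById, id, nil),
         {:ok, _} <- Registry.register(AccountsByEmail, email, nil) do
      {:ok, %{id: id, email: email}}
    else
      {:error, {:already_registered, pid}} -> {:stop, {:already_started, pid}}
    end
  end
end
```

If the first registration succeeds and the second fails, the process stops and Registry drops its entries automatically, so no half-registered account is left behind.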

I have a few questions about the above:

1) Is my use of Registry as a node-wide unique index a good idea? I am especially concerned that I had to create two indices because I look Accounts up in two different ways.
2) I am designing the persistence mechanism now. My idea is that alongside each Account I will start an AccountDiskWriter or something similar: another GenServer that accepts async casts whenever the Account changes. It would serialize these changes and write them to disk (most likely just an insert into DETS).
3) I need a way to bring everything up on server start-up. I was thinking about simply going through all my Accounts in the DETS tables and restoring them one by one when the system starts. Does this make sense?
4) Where do I load the saved state of the GenServer when it is restarted by the supervisor, or in the case of 3)? I am having some trouble figuring this out. As far as I understand, the init function blocks the parent Supervisor until it returns, so it will become a bottleneck if I put the disk reads there. As an alternative, I could send myself a message from the init function, then handle it in handle_cast, where I would read the state from DETS and only then register in both of my registries. However, I want the parent process (such as a web worker) to know whether the Account was started and restored properly. With that solution I can no longer rely on the return value of init, because the account will almost always start properly and will only fail later in its life cycle, when it restores its state. So I would have to block, waiting to see whether it appears in the Registry, possibly with a timeout. Is this the correct solution? I suspect there may be a simpler one.
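For questions 2) and 3), a minimal sketch of what the writer could look like. It simplifies the idea to a single named writer process instead of one per Account, and the table name :accounts, the path argument and the all/0 function are assumptions made up for the example:

```elixir
defmodule AccountDiskWriter do
  use GenServer

  # `path` is a hypothetical location for the DETS file.
  def start_link(path), do: GenServer.start_link(__MODULE__, path, name: __MODULE__)

  # Fire-and-forget: the Account process does not wait for the disk write.
  def persist(account), do: GenServer.cast(__MODULE__, {:persist, account})

  # Synchronous read of every stored account, e.g. for restoring at boot.
  def all, do: GenServer.call(__MODULE__, :all)

  def init(path) do
    # :dets expects the file name as a charlist.
    {:ok, table} = :dets.open_file(:accounts, file: String.to_charlist(path), type: :set)
    {:ok, table}
  end

  def handle_cast({:persist, %{id: id} = account}, table) do
    # With type :set, a later insert for the same id overwrites the old record.
    :ok = :dets.insert(table, {id, account})
    {:noreply, table}
  end

  def handle_call(:all, _from, table) do
    accounts = :dets.foldl(fn {_id, account}, acc -> [account | acc] end, [], table)
    {:reply, accounts, table}
  end
end
```

At start-up, something like Enum.each(AccountDiskWriter.all(), &start_account/1) could then restore the accounts one by one, where start_account/1 stands in for whatever starts a child under AccountsSupervisor.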

mkaszubowski · September 13, 2017, 10:52am

Here are my thoughts on 4):

Why do you want to read the state from DETS and only then register the process? I guess you could reverse the order, but maybe I am missing something.

For the rest, you’re correct. init will block the supervisor, so it may slow the system down.

One solution would be to send a message to self() from init and load the state in a separate handle_info/handle_cast callback. To be able to detect that the process is ready, you can use a function like this:

def start(...) do
  {:ok, pid} = AccountsSupervisor.start_child(...)
  Accounts.ensure_started(pid)
end

where Accounts.ensure_started(pid) makes a GenServer.call() whose callback just replies with :ok. The message from GenServer.call is guaranteed to be delivered after the message sent from init, so it blocks the caller until the process state is restored, but it does not block the supervisor.
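A minimal sketch of the server side of this pattern. The module name LazyAccount, the :restore message and the restored? flag are invented for the example, and the actual DETS read is left as a placeholder:

```elixir
defmodule LazyAccount do
  use GenServer

  def start_link(id), do: GenServer.start_link(__MODULE__, id)

  # Blocks the caller (not the supervisor) until :restore has been handled.
  def ensure_started(pid), do: GenServer.call(pid, :ensure_started)

  def init(id) do
    # Sent before start_link returns, so it is queued ahead of any call
    # made by a process that only learns the pid from start_link.
    send(self(), :restore)
    {:ok, %{id: id, restored?: false}}
  end

  def handle_info(:restore, state) do
    # Placeholder: the slow read from DETS would happen here.
    {:noreply, %{state | restored?: true}}
  end

  def handle_call(:ensure_started, _from, state) do
    {:reply, :ok, state}
  end
end
```

The ordering argument only holds while nobody else can reach the process before init sends the message. If the process registers itself under a name or in a Registry before calling send/2, another process could in principle get a message in first. If the restore step crashes the server, the pending GenServer.call exits in the caller, which is how the caller learns that the restore failed.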

On the other hand, one of the responsibilities of supervisors is to ensure a predictable start sequence and to give you some guarantees once the start has completed. Maybe it is a good idea to wait a bit longer when starting the node, but have the data available before any other action in the system can be performed? You might also want to consider whether it is possible for this server to crash and be restarted. If it is, consider keeping the state in one process and moving all the risky activities to another.

Here are some resources on that topic that might be valuable (including a bit of self-promotion):

http://mkaszubowski.pl/2017/09/02/On-Restoring-Process-State.html

https://ferd.ca/it-s-about-the-guarantees.html

hubertlepicki · September 13, 2017, 12:01pm

@mkaszubowski thanks!

I think this boils down to 1) as well. I am keeping the e-mail address in the state. It can change too, in which case I update it in the AccountsByEmail registry. So in order to register properly in both registries, I need to read the state first.

I think I will rework this to avoid doing so altogether. I will keep the e-mail to ID mapping in a separate in-memory store or ETS table. Then my lookups will always be by ID, and I will be able to postpone state restoration from disk. This should sort out my two biggest problems, 1) and 4).
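Roughly what I have in mind for the e-mail to ID mapping, as a sketch only. The module name EmailIndex, the table name and the function names are placeholders:

```elixir
defmodule EmailIndex do
  # Hypothetical table name.
  @table :email_to_account_id

  # The calling process becomes the owner of the table.
  def init do
    :ets.new(@table, [:set, :public, :named_table])
    :ok
  end

  # :ets.insert_new/2 is atomic and returns false if the key already exists,
  # so two concurrent claims for the same e-mail cannot both succeed.
  def claim(email, id) do
    if :ets.insert_new(@table, {email, id}), do: :ok, else: {:error, :email_taken}
  end

  def lookup(email) do
    case :ets.lookup(@table, email) do
      [{^email, id}] -> {:ok, id}
      [] -> :error
    end
  end

  # Claim the new address first, then release the old one.
  def change(old_email, new_email, id) do
    with :ok <- claim(new_email, id) do
      :ets.delete(@table, old_email)
      :ok
    end
  end
end
```

An ETS table disappears when its owner process dies, so in a real system init/0 would need to be called from a long-lived process, and the table would have to be rebuilt from disk on start-up.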

PS. @mkaszubowski your website is throwing an invalid SSL certificate error in Chrome 60 on Linux. I think it serves the *.github.com SSL certificate instead of a custom one for your domain.