Eventually persisting in-memory state?

Suppose I have a GenServer or Agent that holds some state and serves immediate responses.

What are some strategies to asynchronously persist this state to a database, without slowing down the GenServer, but also handling failures that result in a portion of the state not being written? In a way, this is sort of a “write cache” or buffer/queue for high-concurrency write situations…

My specific example: I have a limited-capacity course with 30 seats, and 10,000 people trying to enroll within a single second. I want to put them in a queue and give them an immediate response on whether or not they made it among the first 30, but in the next step I also need to write it to the database, so that the initial response is “set in stone”, so to speak. It would be fine to send them a kind of “pre-approval” immediately in the first step, and then a “full confirmation” a few seconds later once it’s written in the database.

I didn’t study computer science, so maybe this is a common problem with a simple solution.

Thank you!


How many instances of the web server are you going to have? Is a single instance going to sustain the 10,000 requests in a second?

If one server instance is fine, then a single GenServer per course sounds like a good solution. It looks like your constraint is just a counter (30 seats). Assuming you have a reasonable DB (an INSERT takes just a few milliseconds), you can write the record immediately and keep all the consistency guarantees. So the GenServer would be something like:

  1. Start the GenServer with the course id,
  2. On init (or handle_continue) fetch the already booked seats,
  3. On each “book a seat” message, check the counter, write to the DB if a seat is available, and return the result.

Be sure to bump the timeout on the GenServer.call when sending the “book a seat” message. I think this should be good enough, but it’s hard to tell without real performance tests.
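
A minimal sketch of what I mean, assuming one process per course registered in a Registry (here called CourseRegistry) and a hypothetical Enrollments module wrapping the actual DB queries:

```elixir
defmodule CourseBooking do
  use GenServer

  @capacity 30

  ## Client API

  def start_link(course_id) do
    GenServer.start_link(__MODULE__, course_id, name: via(course_id))
  end

  def book_seat(course_id, student_id) do
    # Bump the timeout: under a spike the calls queue up in the mailbox.
    GenServer.call(via(course_id), {:book_seat, student_id}, 15_000)
  end

  # Assumes a Registry started elsewhere as {Registry, keys: :unique, name: CourseRegistry}.
  defp via(course_id), do: {:via, Registry, {CourseRegistry, course_id}}

  ## Server callbacks

  @impl true
  def init(course_id) do
    # Defer the DB read so that start_link/1 returns immediately.
    {:ok, %{course_id: course_id, booked: 0}, {:continue, :load}}
  end

  @impl true
  def handle_continue(:load, state) do
    # Enrollments.count_booked/1 stands in for your own "count booked seats" query.
    {:noreply, %{state | booked: Enrollments.count_booked(state.course_id)}}
  end

  @impl true
  def handle_call({:book_seat, student_id}, _from, %{booked: booked} = state)
      when booked < @capacity do
    # Write before replying, so the answer the student sees is already durable.
    case Enrollments.insert(state.course_id, student_id) do
      {:ok, _enrollment} -> {:reply, {:ok, :booked}, %{state | booked: booked + 1}}
      {:error, reason} -> {:reply, {:error, reason}, state}
    end
  end

  def handle_call({:book_seat, _student_id}, _from, state) do
    {:reply, {:error, :course_full}, state}
  end
end
```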

If it turns out that you need multiple instances of the web server, then things get a bit more hairy. You could go with Erlang distribution or use an out-of-process cache like Redis.


Yep, I mostly agree with what @stefanchrobot said.

You could theoretically go with a singleton GenServer (a single global process), but that would make horizontally scaling your app to several instances, and ensuring you don’t lose data in case of restarts, a bit more difficult.

Honestly, I would rather:

  • Implement it efficiently with standard database queries. You can go a long way with a database like Postgres before you hit such a bottleneck. How likely is it that you would actually get 10,000 requests per second? Remember that you can always optimize your solution when traffic actually grows to the point where you start to see scalability problems.

  • If you need more performance, I would implement this with Redis rather than in a GenServer. You could use a simple integer counter, or even a set, so you can efficiently check whether the course is full and only write to the database if it is not (in cases where the sets are much bigger and an approximate count is fine, Redis even provides HyperLogLog). With Redis you would get performance, but also durability if you enable the append-only file (AOF), so you don’t risk losing bookings if your server crashes or restarts. A rough sketch of the counter variant follows below.
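
For illustration, here is the counter idea using the Redix client; the key name, the :redix connection name and the Enrollments module are just placeholders:

```elixir
defmodule SeatCounter do
  @capacity 30

  # INCR is atomic, so out of any number of concurrent callers only the first
  # @capacity of them will ever see a value <= @capacity.
  def claim_seat(course_id, student_id) do
    case Redix.command(:redix, ["INCR", "course:#{course_id}:seats"]) do
      {:ok, n} when n <= @capacity ->
        # Only the winners hit Postgres; Enrollments.insert/2 is a placeholder.
        Enrollments.insert(course_id, student_id)

      {:ok, _n} ->
        {:error, :course_full}

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```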

Of course, it is possible in Elixir to run a single global GenServer per counter in your cluster, but you would need to take special care in your logic to make it work properly in all cases (think about restarts, deployments, etc.).


Why not keep things in the OTP world, for example with Mnesia, which supports persistence, or just ETS if you don’t need it?

And Redis will never be faster than ETS or Mnesia; plus, Redis can need up to twice the memory it is using when it triggers the background save that persists the data to disk.
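
For the plain ETS flavour, a minimal sketch of an atomic per-course counter (single node only; the table name is made up and the table would be created somewhere at application start):

```elixir
defmodule EtsSeatCounter do
  @capacity 30

  # Assumes a table created once at application start, e.g.:
  #   :ets.new(:course_seats, [:named_table, :public, write_concurrency: true])
  def claim_seat(course_id) do
    # :ets.update_counter/4 is atomic; the default tuple initialises unseen courses to 0.
    n = :ets.update_counter(:course_seats, course_id, {2, 1}, {course_id, 0})

    if n <= @capacity, do: {:ok, n}, else: {:error, :course_full}
  end
end
```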

Of course, that’s an option too. I personally feel that implementing something like that properly with Mnesia in a distributed setup would be a bit harder than with Redis and stateless app instances.

Admittedly though, there is a bit of familiarity bias here, as I personally have more experience implementing high throughput solutions with Redis than with Mnesia. At the end of the day, if one is familiar with Mnesia and with the ins and outs of deploying the app in a distributed setup, that would be a good solution too.

As for the performance, it really depends on the case. Sure, if you hit a local Mnesia node, the latency would be hard to beat, but when you add distribution, disk persistence, etc. your mileage may vary. Regarding memory, this does not sound like a memory intensive case to me. What I am saying is that a statement like “Redis will never be faster than ETS or Mnesia” might be a bit too absolute, but I would not derail the discussion on that point.

Thank you. I was thinking of Redis sorted sets (ordered by time of signing up) but was waiting to see whether someone would bring them up instead of (D)ETS and Mnesia - they do seem to offer similar features, but it feels like a few bits and pieces need to be added manually, while Redis is more up to date with today’s needs and has it all out of the box… So I would just ZADD everyone into a sorted set, then save the first 30 in bulk to Postgres, but keep the set around as a “waiting list”, maybe just using the Redis persistence options…
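
If I’ve got the commands right, it would be roughly this (sketch with Redix; the key name and the :redix connection name are placeholders):

```elixir
defmodule WaitingList do
  @capacity 30

  def join(course_id, student_id) do
    key = "course:#{course_id}:waiting_list"
    score = System.system_time(:millisecond)

    # ZADD NX keeps the first signup time if someone clicks twice;
    # ZRANK then returns the student's 0-based position in the queue.
    with {:ok, _added} <-
           Redix.command(:redix, ["ZADD", key, "NX", Integer.to_string(score), student_id]),
         {:ok, rank} <- Redix.command(:redix, ["ZRANK", key, student_id]) do
      if rank < @capacity do
        {:ok, :pre_approved, rank}
      else
        {:ok, :waiting_list, rank}
      end
    end
  end
end
```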

Then I was also thinking about not scaling prematurely, because I believe Postgres is in fact much more powerful than most people realize :slight_smile: However, I am mostly worried about atomicity - an INSERT with RETURNING the row’s “rank” in the table, which might not be so performant anymore…?
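
If I understand the atomic-UPDATE pattern correctly, something like this would sidestep computing a rank entirely (rough sketch; table and column names are made up):

```elixir
defmodule AtomicEnrollment do
  # Concurrent transactions queue on the course row's lock, so at most `capacity`
  # of them ever see seats_taken < capacity. In a real app both statements would
  # go inside one Repo.transaction/1.
  def claim_seat(repo, course_id, student_id) do
    claim = """
    UPDATE courses
       SET seats_taken = seats_taken + 1
     WHERE id = $1 AND seats_taken < capacity
    """

    case repo.query(claim, [course_id]) do
      {:ok, %{num_rows: 1}} ->
        repo.query(
          "INSERT INTO enrollments (course_id, student_id) VALUES ($1, $2)",
          [course_id, student_id]
        )

      {:ok, %{num_rows: 0}} ->
        {:error, :course_full}

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```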

This is a university system and there are going to be dozens, maybe hundreds, of courses - I was thinking of each having its own GenServer taking care of its queue. And they open the floodgates at the same moment for all the students, which is 40k people – so it’s a huge sudden spike in traffic, but it’s also true that the atomic writes, which I’m afraid of the most, would likely be spread over at least several seconds, so they probably won’t reach the rate of thousands per second.

I’m just trying to prevent the recurring scenario where the system goes down within seconds of opening and then they spend the whole day putting out fires while people at home frantically hit the refresh button, desperately needing to enroll before it’s all full :smiley:
