Guidelines for stateful web apps with persistence?

Hey there, my first Elixir app in production was a re-write of a Rails app using Phoenix and therefore it’s still organized too much like a Rails app.

Now that I’ve learned more, I’m planning a rewrite of it to make it more Elixirish. I’ve followed the excellent Elixir for Programmers course and also the beta Functional Web Development with Elixir, OTP and Phoenix book.

I really like the idea of first building the Elixir app(s) and then putting a Phoenix interface around them, and also of placing persistence in the Elixir app rather than in the Phoenix layer. The whole idea of having everything in memory as running processes seems awesome, and I’m all for avoiding the DB bottleneck.

But in both cases we’re building simple games (hangman and battleships) that don’t need much memory. Do you think this approach can be applied to real-world examples where we have more data?

As an example, in my app, we have users that submit sites for web crawling. A site is basically a starting URL, that also holds up to 5,000 web pages (a web page is basically a URL). But then each web page can store the HTML and A11Y issues found in it. An issue is basically a JSON struct with its details. So for a single site you could easily have hundreds of thousands of issues stored.

So, my concerns are:

  • Should we aim to put everything into memory? Or store it in the DB and keep only IDs in memory?
  • Should the DB be used just as secondary storage, as a backup?
  • When new data is generated and we update it in memory, at what point do we also save it to the DB? As it happens, or periodically?
  • How do we decide when to unload something from memory? My guess is that we need to keep track of the latest usage and then stop the GenServer for that site after a reasonable time has passed.

In short, can you recommend readings, talks, or source code that would help me understand how a real-world stateful web app with persistence is organized?

Thanks!

Given the criteria that you specify, I’d aim for a hybrid approach.

  • “Recent data” remains in memory. You could use something as simple as Process.send_after/4 to schedule a timeout message before the process “waits” for the next thing to do, canceling it with Process.cancel_timer/2 when that next thing arrives, or wrapping up (shutting the process down) when the timeout message arrives instead.
  • While the data is in memory, after every update make sure that your state is “sane”, then wrap the relevant parts of it into an event and dispatch that to a service that queues it up and performs the necessary persistent storage updates.
  • When something arrives for a process that doesn’t currently exist in memory, create a new one based on the information in persistent storage.
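
The three points above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not a production design: the `SiteSketch` modules, the `:site_sketch` / `:idle_ms` config key, and the event shape `{:page_updated, site_id, url, issues}` are all made-up names, and an in-memory `Agent` stands in for the real database and the queueing service.

```elixir
defmodule SiteSketch do
  # Hypothetical entry point: starts the fake store, a Registry, and a DynamicSupervisor.
  def start do
    children = [
      SiteSketch.Store,
      {Registry, keys: :unique, name: SiteSketch.Registry},
      {DynamicSupervisor, name: SiteSketch.Sup, strategy: :one_for_one}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end

defmodule SiteSketch.Store do
  # Stand-in for persistent storage (assumption): a map of site_id => %{url => issues}.
  use Agent

  def start_link(_arg), do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  def load(site_id), do: Agent.get(__MODULE__, &Map.get(&1, site_id, %{}))

  # Asynchronous: the Agent's mailbox plays the role of the event queue.
  def persist({:page_updated, site_id, url, issues}) do
    Agent.cast(__MODULE__, fn db ->
      Map.update(db, site_id, %{url => issues}, &Map.put(&1, url, issues))
    end)
  end
end

defmodule SiteSketch.SiteServer do
  # :temporary so that a normal idle shutdown is not restarted by the supervisor.
  use GenServer, restart: :temporary

  def put_issues(site_id, url, issues),
    do: GenServer.call(ensure_started(site_id), {:put_issues, url, issues})

  def issues(site_id, url),
    do: GenServer.call(ensure_started(site_id), {:issues, url})

  def start_link(site_id),
    do: GenServer.start_link(__MODULE__, site_id, name: via(site_id))

  defp via(site_id), do: {:via, Registry, {SiteSketch.Registry, site_id}}

  # Point 3: start the process on demand if it is not currently in memory.
  defp ensure_started(site_id) do
    case DynamicSupervisor.start_child(SiteSketch.Sup, {__MODULE__, site_id}) do
      {:ok, pid} -> pid
      {:error, {:already_started, pid}} -> pid
    end
  end

  @impl true
  def init(site_id) do
    # Rebuild the in-memory state from persistent storage.
    state = %{site_id: site_id, pages: SiteSketch.Store.load(site_id), timer: nil, token: nil}
    {:ok, schedule_idle(state)}
  end

  @impl true
  def handle_call({:put_issues, url, issues}, _from, state) do
    state = %{state | pages: Map.put(state.pages, url, issues)}
    # Point 2: hand the change to the storage service as an event.
    SiteSketch.Store.persist({:page_updated, state.site_id, url, issues})
    {:reply, :ok, schedule_idle(state)}
  end

  def handle_call({:issues, url}, _from, state) do
    {:reply, Map.get(state.pages, url, []), schedule_idle(state)}
  end

  @impl true
  # Point 1: stop only if the timeout belongs to the most recent timer.
  def handle_info({:idle_timeout, token}, %{token: token} = state), do: {:stop, :normal, state}
  def handle_info({:idle_timeout, _stale}, state), do: {:noreply, state}

  # Cancel the previous timer (if any) and start a fresh one.
  defp schedule_idle(state) do
    if state.timer, do: Process.cancel_timer(state.timer)
    token = make_ref()
    idle_ms = Application.get_env(:site_sketch, :idle_ms, 15 * 60 * 1000)
    timer = Process.send_after(self(), {:idle_timeout, token}, idle_ms)
    %{state | timer: timer, token: token}
  end
end
```

After `SiteSketch.start()`, calling `SiteSketch.SiteServer.put_issues("site-1", url, issues)` starts the site process on first use, and a later `issues/2` call after the idle timeout reloads it from the store. The token in the timeout message guards against a timer that fired just before it was canceled. Two caveats: a call can still race with an idle shutdown between `ensure_started/1` and `GenServer.call/2`, so real code would want a retry, and GenServer's built-in timeout return value is a simpler alternative to managing the timer by hand.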

Depending on activity and the average number of issues, one process per URL may be justifiable.
