Options for process state backup other than Postgres

Phillipp · May 6, 2019, 9:03am

Hey,

I wonder what would be some valuable options for process state backups if one does not want to add a database like Postgres to the setup.

In my particular case, I have a bunch of “tracking” processes that should survive an application restart (e.g. from a deployment). Hot code reloading is not an option in our Kubernetes setup.

The state I need to back up does not contain Elixir-specific data structures (e.g. atoms), so it can easily be encoded to JSON and decoded back into lists and maps.

What I have thought of so far:

- Writing JSON files (e.g. {identifier}.json) to disk and loading them on startup
- Writing state to Mnesia (maybe useful if we add clustering later)

Are there any other options I am not aware of? What would you guys do in my case?

tty · May 6, 2019, 9:35am

I would favor mnesia and not bother with JSON since you can stuff the entire map/struct into mnesia without worrying about marshalling.

LostKobrakai · May 6, 2019, 9:37am

If you can store stuff on the filesystem, I’d rather use :erlang.term_to_binary than JSON.
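
A minimal sketch of that approach, assuming one file per tracked process. The StateBackup module, the file name, and the sample state are made up for illustration; only :erlang.term_to_binary/1, :erlang.binary_to_term/2 and the File functions are real APIs.

```elixir
defmodule StateBackup do
  # Hypothetical helper module; names and paths are illustrative only.

  # Serialize any Erlang/Elixir term and write it to `path`.
  def save(path, state) do
    File.write(path, :erlang.term_to_binary(state))
  end

  # Read the file back and decode it. The :safe option makes decoding
  # raise instead of creating atoms that do not already exist.
  def load(path) do
    with {:ok, binary} <- File.read(path) do
      {:ok, :erlang.binary_to_term(binary, [:safe])}
    end
  end
end

# Usage: round-trip a state map through a file in the temp directory.
path = Path.join(System.tmp_dir!(), "tracker-42.bin")
state = %{"id" => 42, "events" => [1, 2, 3]}

:ok = StateBackup.save(path, state)
{:ok, restored} = StateBackup.load(path)
```

Unlike JSON, the round trip keeps tuples, atoms and structs as they were, so no separate encoding step is needed. If a crash in the middle of a write is a concern, writing to a temporary file and renaming it afterwards is one common way to guard against half-written files.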

Phillipp · May 6, 2019, 9:37am

That is also what I thought. Never worked with Mnesia. I hope I can specify a custom filesystem location for the Mnesia data so I can simply mount it to the container.

Phillipp · May 6, 2019, 9:38am

Good point. I always forget about that.

Phillipp · May 6, 2019, 10:02am

I think I’ll go for :erlang.term_to_binary in my case.

Good read:

dimitarvp · May 6, 2019, 10:31am

If you don’t need dynamic querying and joins, both mnesia and plain files are fine.

engineeringdept · May 6, 2019, 10:46am

Redis can be good for storing binary blobs for retrieval later, depending on the size.

tty · May 6, 2019, 12:15pm

You can, but it is a global setting, i.e. every Mnesia table for that node would be stored there. You can specify it in sys.config or on the command line with -mnesia dir Directory.
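
In an Elixir project the same application environment value can be set from the config files. This is only a sketch: the path /data/mnesia is an assumed mount point, not something from this thread.

```elixir
# config/config.exs
import Config

# Mnesia reads its directory from the :dir key of its application
# environment. It expects a charlist, not an Elixir binary string.
# "/data/mnesia" is a hypothetical path where the volume is mounted.
config :mnesia, dir: ~c"/data/mnesia"
```

The value has to be in place before Mnesia is started, and it applies to every disc-based table on that node.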

tty · May 6, 2019, 12:17pm

If you are using mnesia or ets or dets there is no need for term_to_binary.

al2o3cr · May 6, 2019, 12:55pm

If you’re using it as a straightforward “tracker ID -> stored state” persistent mapping, maybe DETS would be sufficient? It lacks the general transactional behavior of Mnesia, but reads of a single key are guaranteed not to see partially written data.
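
A rough sketch of such a mapping with DETS. The table name :tracker_state, the file location, and the sample key and state are assumptions for illustration.

```elixir
# DETS takes the file name as a charlist.
file =
  System.tmp_dir!()
  |> Path.join("tracker_state.dets")
  |> String.to_charlist()

# Open the table, creating the file if it does not exist yet.
# With type: :set there is at most one object per key.
{:ok, table} = :dets.open_file(:tracker_state, file: file, type: :set)

# Store the state under the tracker ID. The value can be any term,
# so no term_to_binary or JSON encoding is needed.
:ok = :dets.insert(table, {"tracker-42", %{"events" => [1, 2, 3]}})

# Read a single key back, e.g. when the process restarts.
[{"tracker-42", state}] = :dets.lookup(table, "tracker-42")

# Close the table properly so it does not need a repair on next open.
:ok = :dets.close(table)
```

Keep in mind that a single DETS file is limited to 2 GB.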

tristan · May 6, 2019, 4:03pm

Postgres is likely the best way to go and is what I have used before: https://github.com/erleans/erleans

Mnesia can be fine on a single node but, particularly if running in Kubernetes, don’t expect to be able to use it as a distributed store.

A StatefulSet, which ensures the same volume is mounted for a specific node name, could be useful for storing a local copy. But it is far better to have a stateless app deployment and use Postgres for data.

keathley · May 6, 2019, 4:56pm

This point really can’t be emphasized enough. If you’re going to use mnesia you should be aware of its use cases and its limitations. Mnesia might be fine for your use case but you should really prove that first.

I like to use jsonb columns in Postgres for these sorts of things, but since you specifically called that out as a non-option, I might consider something like DynamoDB. Dynamo is a reasonable choice if you just want to store binary blobs somewhere. It’s hard to tell from your example, but if you only need to read in the files once on app start, then you might be able to store the files on S3 and pull them from there when the app is booting. There are a lot of options here. Most importantly, I would highly recommend avoiding creating an ad-hoc database if you can help it.