How to properly do backups with Mnesia?

Hey everyone!

As you might have heard previously, I’m working on a chat application. We use Mnesia as our database of choice. For safety, we’d like to back up all messages in the system every hour, so that if something goes badly wrong, we can restore an older version.

However, as the system grows, backing up every hour will duplicate a large amount of data. So my questions:

  • Is there a way to perform incremental backups with Mnesia?
  • Or alternatively, is it smarter to back up the data in another, non-Mnesia (e.g. some kind of serialized) format?
  • Are there other people doing backups with Mnesia? If so, what is your strategy?

I’ve heard of a way but it is not documented.

I found one mention of it here: http://erlang.org/documentation/doc-5.0.1/lib/mnesia-3.9.2/doc/html/notes.html (section 1.10.1)

According to this document you can pass [{incremental, PrevName}] to mnesia:backup/1,2 and mnesia:backup_checkpoint/2,3 to do incremental backups.

It is not quite clear how to get this started, though. I know I did some experiments with this back in 2013/2014, but I can’t find much code or notes any longer, just that I had it working but there was some caveat (with Mnesia there always is :smiley: )

My guess is that it would be something like this:


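% First backup: activate a checkpoint covering all tables, then back it up
{ok, Name, _Nodes} = mnesia:activate_checkpoint([
    {max, mnesia:system_info(tables)}
]),
ok = mnesia:backup_checkpoint(Name, "mybackup0.bak"),

% Second backup: incremental relative to the first checkpoint
% (uses the undocumented incremental option mentioned above)
ok = mnesia:backup("mybackup1.bak", [{incremental, Name}]).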

It would likely be more consistent to always do the activate_checkpoint and backup_checkpoint steps, and in case PrevName is undefined you simply don’t pass the incremental option as an argument.

If I remember correctly, one problem here is that checkpoints are gone when a node restarts. If that happens, you must start with a full backup again before doing incremental ones.
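To make the restart caveat concrete, here is a minimal sketch of how one might check whether an earlier checkpoint is still active and fall back to a full backup when it is not. It only uses documented calls (mnesia:system_info(checkpoints), mnesia:activate_checkpoint/1, mnesia:backup_checkpoint/2, mnesia:deactivate_checkpoint/1). The module and function names are made up for illustration, and the undocumented incremental option is deliberately left out because its behaviour has not been verified here.

```erlang
-module(backup_sketch).
-export([checkpoint_alive/1, full_backup/1]).

%% True if the checkpoint Name is still active on this node.
%% Checkpoints do not survive a node restart, so false means the
%% next backup has to be a full one.
checkpoint_alive(Name) ->
    lists:member(Name, mnesia:system_info(checkpoints)).

%% Full backup of every table into File via a fresh checkpoint.
%% Returns {ok, CheckpointName} so the caller can remember the name.
%% Assumes Mnesia is already running on this node.
full_backup(File) ->
    Tables = mnesia:system_info(tables),
    case mnesia:activate_checkpoint([{max, Tables}]) of
        {ok, Name, _Nodes} ->
            case mnesia:backup_checkpoint(Name, File) of
                ok ->
                    {ok, Name};
                {error, Reason} ->
                    %% Do not leave a useless checkpoint behind.
                    mnesia:deactivate_checkpoint(Name),
                    {error, Reason}
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```

The checkpoint is intentionally left active after a successful backup so that its name can be reused later. If you only ever take full backups, call mnesia:deactivate_checkpoint(Name) once the backup is written.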

How big are the backups? If they are less than a few GB I would probably opt for a full backup, though maybe not every hour.

Can you put two drives in a RAID array so that they are mirrored? That way you will have an exact copy of your DB/app on two drives at all times, and the backups will serve as a ‘backup’ in the event of both drives going down at the same time or some other failure. You’ll also get some performance benefits.

Currently the backups are not that big yet, but since it is a chat application, the size of the database will grow rapidly once we get more users.
Furthermore, a lot more data is added to the database than is ever removed, so full-system backups will contain a lot of duplicate information.

RAID is great for hardware failure, but of course less so for software failure. Nevertheless, definitely something to look into :+1:! :smiley:

@cmkarlsson Very interesting! I’ll test this out when I start working on it, and will share what I find.

Doesn’t Mnesia still have the 2 GB limit?

Yes and no. disc_only_copies tables on the default backend do have a 2 GB limit for each table partition. disc_copies tables are limited by the RAM on the machine.

Recently, other Mnesia backends have been released (by third parties) that do not have the 2 GB limit or the RAM limitation.

The table size limits can be circumvented using fragmented tables.
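As a rough illustration of fragmented tables, the sketch below creates a table split into several disc_only_copies fragments and writes to it through the mnesia_frag access module. The message record, the module name and the fragment count of 8 are assumptions made up for this example; it also assumes a disc schema already exists on the given nodes.

```erlang
-module(frag_sketch).
-export([create_messages_table/1, write_message/2]).

%% Hypothetical record, only for illustration.
-record(message, {id, body}).

%% Create a table spread over 8 fragments, each stored as
%% disc_only_copies on one node from the pool.
create_messages_table(Nodes) ->
    mnesia:create_table(message, [
        {attributes, record_info(fields, message)},
        {frag_properties, [
            {node_pool, Nodes},
            {n_fragments, 8},
            {n_disc_only_copies, 1}
        ]}
    ]).

%% Access must go through mnesia_frag so the key is hashed
%% to the right fragment.
write_message(Id, Body) ->
    Write = fun() -> mnesia:write(#message{id = Id, body = Body}) end,
    mnesia:activity(transaction, Write, [], mnesia_frag).
```

Each fragment is a table of its own under the hood, so the per-table size limit applies per fragment rather than to the data set as a whole.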

In terms of backups, using checkpoints is the way to go.

How long do you have to, or want to, store the chat data? Deleting old chats is a possibility (I’m looking at you, Kik).

Offloading old data to another database is a common technique. This is especially true if you want to run billing or reporting queries.

Another option is to create tables by day, i.e. create a new chat table each day. That means having 365 unique tables per year, but the data within each one is bounded.
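A table-per-day scheme could look roughly like the sketch below. The chat_YYYY_MM_DD naming, the [id, body] attributes and the module name are assumptions for illustration, not something prescribed by Mnesia.

```erlang
-module(daily_tables_sketch).
-export([table_name/1, ensure_table/1]).

%% Build a table name such as chat_2024_03_07 from a {Year, Month, Day} date.
%% One atom per day is created, which stays well within the atom table limit.
table_name({Year, Month, Day}) ->
    Name = io_lib:format("chat_~4..0B_~2..0B_~2..0B", [Year, Month, Day]),
    list_to_atom(lists:flatten(Name)).

%% Create the table for Date unless it already exists.
%% All daily tables share the same (hypothetical) message record layout.
ensure_table(Date) ->
    Tab = table_name(Date),
    Opts = [{attributes, [id, body]}, {record_name, message}],
    case mnesia:create_table(Tab, Opts) of
        {atomic, ok} -> {ok, Tab};
        {aborted, {already_exists, Tab}} -> {ok, Tab};
        {aborted, Reason} -> {error, Reason}
    end.
```

Calling daily_tables_sketch:ensure_table(date()) before writing gives the table for the current day. Old days can then be backed up once and dropped or offloaded as whole tables.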

I recommend having 3 Mnesia nodes, 2 for your application and the 3rd for administration tasks like backups.
