Pow's mnesia cache with dynamic nodes (containers in Kubernetes)

Thanks for the awesome work on your library, @danschultzer 🙂 I'm planning to use Pow in my next project, which will rely on distributed nodes that dynamically scale up and down. They will discover one another using libcluster's DNS-based discovery mechanism.
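
For context, the discovery side would be configured roughly like this (a minimal sketch; `Cluster.Strategy.Kubernetes.DNS` is libcluster's strategy module, but the service and application names are placeholders for my setup):

```elixir
# config/prod.exs -- a minimal sketch; "myapp-headless" and "my_app"
# are placeholders for the k8s headless service and the node name prefix
config :libcluster,
  topologies: [
    k8s_dns: [
      strategy: Cluster.Strategy.Kubernetes.DNS,
      config: [
        service: "myapp-headless",  # headless service exposing pod IPs over DNS
        application_name: "my_app"  # nodes are named my_app@<pod-ip>
      ]
    ]
  ]
```

The `Cluster.Supervisor` child is then started in the application supervision tree with these topologies.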

It is not immediately clear to me whether this will work properly with Pow's mnesia caching mechanism. In the docs (https://hexdocs.pm/pow/Pow.Store.Backend.MnesiaCache.html) it seems that the output of Node.list/0 is passed to the MnesiaCache worker as an initial parameter. It is possible, however, that this list is not fully populated yet (the MnesiaCache will effectively race libcluster), or that the list changes at some later point in time, when nodes are added or removed.
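
Concretely, I mean the supervision setup from those docs, which as I understand it looks roughly like this (MyApp/MyAppWeb are placeholders for my own app):

```elixir
# In MyApp.Application, based on the MnesiaCache docs example;
# Node.list/0 is evaluated once, when the child spec is built
def start(_type, _args) do
  children = [
    MyAppWeb.Endpoint,
    {Pow.Store.Backend.MnesiaCache, extra_db_nodes: Node.list()},
    Pow.Store.Backend.MnesiaCache.Unsplit # recovers the cache after netsplits
  ]

  Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
end
```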

So the question is: will the MnesiaCache (or the cluster-healing Unsplit worker) deal with such situations by itself? If not, do you have any advice for dealing with them? Or would you recommend using the Redis cache in this case?

@sensiblearts has been working on a guide for libcluster setup (that will be put up on powauth.com). Here’s a way to handle it: https://github.com/pow-auth/pow_site/issues/10#issuecomment-533845299

I’ve tested it locally with a k8s cluster, and it worked perfectly scaling up and down. If you push this to a production environment, I would be really happy to hear some feedback, since I don’t have a production setup using libcluster myself.

Also, you are more than welcome to join in and help write the guide, adding examples, caveats, etc. 😄

Thanks, that is interesting! Especially the bit in the guide that says:

As long as at least one node in the :extra_db_nodes list is connected to the cluster, the MnesiaCache instances for all other nodes will automatically be connected and replicated. This makes it very easy to join clusters, since you won’t have to update the config on the old nodes at all. And as long as the old nodes connect to at least one node that’s in the cluster when restarting, it’ll automatically connect to the new node as well, even without updating the :extra_db_nodes setting.

I’m still not completely clear on whether the potential for a race condition is gone, though. Suppose I have two containers, my_app@x.x.x.1 and my_app@x.x.x.2, and node 1 is brought up first. It polls the DNS service and, at that moment, receives a list of peers containing only itself. It starts MnesiaCache with an :extra_db_nodes list containing only itself. Node 2 is brought up later, receives both itself and node 1 from the DNS request, and passes this along to MnesiaCache.

For node 1, does this fulfill the stated requirement that “at least one node in the :extra_db_nodes list is connected to the cluster”? Or will node 1 have its own separate mnesia cache because it only found itself initially?

I intend to contribute any feedback / guide improvements / PRs that I have to offer 🙂

The replication starts as soon as another node connects, so node 1 will automatically be in the cluster when node 2 connects to it.
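
You can verify this with standard :mnesia introspection from a remote shell on either node; once they’ve joined, both should show up (the output below is illustrative for your two-node example):

```elixir
# From a remote IEx shell on either node:
:mnesia.system_info(:running_db_nodes)
#=> [:"my_app@x.x.x.1", :"my_app@x.x.x.2"]
```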

Awesome, thanks for your help!
