Horde (and other dist. super.): Managing module versions

(This question is phrased in an abstract way, because it seems to me to be a general problem. But I can create a small example on request.)

So with Horde, a supervised process can be restarted on any node.

Say I have a cluster with three nodes: A, B and C.

I also have a number of GenServers running using the Foo.V_1 module (all under a Horde.DynamicSupervisor). They each start up other workers under another Supervisor using the Bar.V_1 module.

But at no point do I know whether a process has been restarted before, or on which node it is currently running.

Say I deploy changes to Node A, creating a Foo.V_2 module that starts up Bar.V_2 workers. I create a GenServer with Foo.V_2 called MyWorker. But, before I can deploy changes to the other two nodes, Node A crashes. So now, the Horde.DynamicSupervisor will restart MyWorker on either Node B or Node C.

The trouble is, neither Node B nor Node C has the Foo.V_2 or Bar.V_2 modules.

If I understand correctly (and please feel free to demonstrate my ignorance), then Horde will be able to use Foo.V_2 to create the worker, but what about Bar.V_2?

Wouldn’t I now get errors from Foo.V_2 because it is trying to use Bar.V_2 to create workers but Bar.V_2 is undefined?

If I understand the situation correctly, what is the best practice for dealing with this?

A simple solution can be having distinct cookies between deployed versions, so clustering only happens between nodes of the same version.
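
As a rough illustration of the per-version cookie idea, here is a minimal sketch. The `ClusterCookie` module, the `"my_app"` prefix and the version strings are all hypothetical names chosen for the example; only `Node.set_cookie/1` (mentioned in the comment) is a real function.

```elixir
defmodule ClusterCookie do
  # Hypothetical helper: builds a cookie atom that differs per release version,
  # so nodes running different versions refuse to connect to each other.
  def for_version(prefix, version) when is_binary(prefix) and is_binary(version) do
    String.to_atom(prefix <> "_" <> version)
  end
end

# On a distributed node you would then call something like:
#   Node.set_cookie(ClusterCookie.for_version("my_app", "2.0.0"))
```

In a release, the cookie is more commonly set through the release configuration than at runtime, so treat this only as a way to picture the approach.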

What if it is important that all nodes be able to cluster because they need to share an :mnesia database, for example?

One idea I had that is independent of the deployment strategy:

  1. Always use semantic versioning in module names
  2. Have a mechanism by which we can keep track of the last updated node (i.e. the node with the most up to date code available, and hence widest range of versions). Perhaps using Horde.Registry.put_meta (Horde.Registry — Horde v0.8.7).
  3. Explicitly specify (in the start_link and init functions) for workers created from Foo, which module versions they need.
  4. If the worker detects that the module version it needs is not present, it makes an rpc call to the latest updated node, gets the bytecode for the module with :code.get_object_code (Erlang -- code), and then loads that into memory.
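
Step 4 above could be sketched roughly as follows. This is only a sketch under assumptions: `VersionLoader` and `ensure_module/2` are hypothetical names, and how the worker learns which node to ask (step 2) is left out. The calls to `Code.ensure_loaded?/1`, `:rpc.call/4`, `:code.get_object_code/1` and `:code.load_binary/3` are standard Elixir/OTP functions.

```elixir
defmodule VersionLoader do
  # Hypothetical helper, not part of Horde.
  # Returns :ok if `module` is usable on this node, fetching its bytecode
  # from `source_node` when it is not available locally.
  def ensure_module(module, source_node) when is_atom(module) do
    if Code.ensure_loaded?(module) do
      :ok
    else
      fetch_and_load(module, source_node)
    end
  end

  defp fetch_and_load(module, source_node) do
    # Ask the remote node for the compiled object code of the module.
    case :rpc.call(source_node, :code, :get_object_code, [module]) do
      {^module, binary, filename} ->
        # Load the bytecode into this node's memory only; nothing is written to disk.
        case :code.load_binary(module, filename, binary) do
          {:module, ^module} -> :ok
          {:error, reason} -> {:error, {:load_failed, reason}}
        end

      :error ->
        {:error, :not_found_on_source}

      {:badrpc, reason} ->
        {:error, {:badrpc, reason}}
    end
  end
end
```

A worker built from Foo.V_2 could call `VersionLoader.ensure_module(Bar.V_2, source_node)` in its `init` before starting Bar.V_2 workers. Note that a module loaded this way lives only in memory, so it is gone after the node restarts, and any further modules that Bar.V_2 itself depends on would have to be fetched the same way.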

This looks like rolling releases for complex database migrations.

First you need to deploy all your nodes with the v1 and v2 modules, but only v1 is used.

Then you can deploy nodes using the v2 modules only.

Finally, a third deployment removes the v1 code.
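
A minimal sketch of what the first two deployments might look like in code, assuming a hypothetical `Foo.Versions` module and a hypothetical `:foo_module` key under the `:my_app` application environment. Both module versions ship together, and a single setting decides which one new workers use.

```elixir
defmodule Foo.V_1 do
  def describe, do: "foo v1"
end

defmodule Foo.V_2 do
  def describe, do: "foo v2"
end

defmodule Foo.Versions do
  # Hypothetical switch. Deployment 1 ships both modules and leaves the
  # default in place; deployment 2 sets :foo_module to Foo.V_2;
  # deployment 3 deletes Foo.V_1 once no process uses it anymore.
  @default Foo.V_1

  def current do
    Application.get_env(:my_app, :foo_module, @default)
  end
end
```

Whatever starts the workers would then use `Foo.Versions.current()` instead of a hard-coded module name, so every node in the cluster always has the code for whichever version it may be asked to restart.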