Calling Horde.Registry.put_meta inside terminate/2 doesn't persist state

csokun · December 28, 2023, 6:00am

Hi, I’m trying to understand how to properly implement GenServer state handoff in a cluster environment.

I searched around and managed to get a simple GenServer up and running using libcluster and Horde: lib/mix_app1/greeting.ex in the csokun/elixir_libcluster_horde_demo repository on GitHub (main branch).

Based on my reading, I should be able to perform state handoff by simply:

Add Process.flag(:trap_exit, true) in init/1
Persist state via terminate/2, using Horde.Registry.put_meta/3
Reload existing state in init/1 or handle_continue/2 using Horde.Registry.meta/2
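A minimal sketch of the three steps above, not the code from the linked repository. It assumes a Horde.Registry is already running under the hypothetical name MyApp.Registry; the metadata key :greeting_state and the %{count: 0} state shape are also made up for illustration.

```elixir
defmodule Greeting do
  use GenServer

  # Hypothetical names: MyApp.Registry is assumed to be a running Horde.Registry,
  # and :greeting_state is an arbitrary metadata key chosen for this sketch.
  @registry MyApp.Registry
  @meta_key :greeting_state

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts)
  end

  @impl true
  def init(_opts) do
    # Step 1: trap exits so terminate/2 runs when the supervisor shuts us down.
    Process.flag(:trap_exit, true)
    {:ok, %{count: 0}, {:continue, :restore}}
  end

  @impl true
  def handle_continue(:restore, state) do
    # Step 3: meta/2 returns {:ok, value}, or :error if the key is not known here.
    case Horde.Registry.meta(@registry, @meta_key) do
      {:ok, saved} -> {:noreply, saved}
      :error -> {:noreply, state}
    end
  end

  @impl true
  def terminate(_reason, state) do
    # Step 2: store the state as registry metadata before the process exits.
    Horde.Registry.put_meta(@registry, @meta_key, state)
  end
end
```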

Each time I kill the old process using :init.stop(), a new process is successfully spawned on another node. However, the newly spawned process is unable to find the existing state: calling Horde.Registry.meta(...) always returns :error.

But if I manually invoke Horde.Registry.put_meta(...) and then call :init.stop(), the newly spawned process can restore the state.

Is there something wrong with my implementation? Why does calling Horde.Registry.put_meta(...) within terminate/2 not save the GenServer state?

hubertlepicki · December 28, 2023, 9:15am

I strongly suspect what is happening here is that you’re shutting down the whole node before Horde.Registry manages to propagate its state to the other nodes in the cluster. CRDTs are eventually consistent, so the changes you put into the registry are not visible everywhere immediately; they have to be synced to the rest of the cluster.

I have never done what you’re trying to do, using Horde to pass a process’s state on to its own replacement, so I’m not sure how to handle it or whether it’s a good idea at all.

What I found to be a reliable way to pass the state in this situation, and also to ensure that a required process does start on the other nodes, was to use PostgreSQL to store the state. In my case there were two scenarios. In one, I scheduled an Oban job to start a corresponding process on other nodes and passed the state in its payload. In the other, I just saved the state to a database, and the process would be started on demand on other nodes when required and would pick it up.

csokun · December 28, 2023, 12:01pm

Thanks @hubertlepicki, I think you nailed it. I added Process.sleep(1000) after calling Horde.Registry.put_meta/3 and the state gets persisted.

anuarsaeed · December 28, 2023, 12:46pm

I believe using Process.sleep is not a very reliable option, since how long the state takes to propagate depends on the underlying network and may be more or less than the sleep interval. I typically address similar cases using message passing.
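One possible reading of "message passing" here, sketched under assumptions: instead of sleeping, the terminating process makes a synchronous GenServer.call to a receiver process and only continues once it gets a reply. HandoffReceiver and its hand_off/3 and take/2 functions are hypothetical names invented for this sketch, not part of Horde.

```elixir
defmodule HandoffReceiver do
  # Hypothetical process that would run on a surviving node and hold
  # handed-off state until the replacement process asks for it.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)

  # Both functions block until the receiver replies, so the caller gets an
  # explicit acknowledgement rather than guessing with a sleep.
  def hand_off(server, key, state), do: GenServer.call(server, {:hand_off, key, state})
  def take(server, key), do: GenServer.call(server, {:take, key})

  @impl true
  def init(store), do: {:ok, store}

  @impl true
  def handle_call({:hand_off, key, state}, _from, store) do
    {:reply, :ok, Map.put(store, key, state)}
  end

  def handle_call({:take, key}, _from, store) do
    # Replies {:ok, state} or :error, and forgets the entry either way.
    {:reply, Map.fetch(store, key), Map.delete(store, key)}
  end
end
```

In a cluster, server could be a {name, node} tuple pointing at another node, with hand_off/3 called from terminate/2. This only helps if the receiver lives on a node that outlasts the one being stopped.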

hubertlepicki · December 28, 2023, 1:23pm

I agree, but at a glance Horde.Registry doesn’t seem to expose any public function or mechanism to check whether locally applied changes have been synced. Also, the sleep may be good enough.

@derekkraan any recommendations?

derekkraan · February 12, 2024, 7:52am

Hi, sorry for taking so long to respond. Horde’s backing store (DeltaCRDT) is eventually consistent, and there is no mechanism for checking whether locally applied changes have been synced.

Using Horde.Registry.put_meta/3 during shutdown is unlikely to work well, as @hubertlepicki has pointed out, since syncing is performed asynchronously, and since you’re in the process of shutting down the node, it’s highly likely that the node will stop before any syncing can happen.

My recommendation for now, as Horde is not persisted, would be to only use Horde for data that you can afford to lose. Process state handoff would be better implemented using a database like Postgres or Redis, or perhaps something else.
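A rough sketch of that shape, under assumptions: the GenServer writes its state to an external store synchronously in terminate/2 and reads it back in handle_continue/2. StateStore and Counter are hypothetical modules made up for this example. StateStore is backed by an Agent only so the sketch runs on its own; an Agent does not survive a node stop, so a real version would put Postgres or Redis behind the same put/2 and fetch/1 functions.

```elixir
defmodule StateStore do
  # Hypothetical stand-in for an external store such as Postgres or Redis.
  use Agent

  def start_link(_opts \\ []), do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  # Agent.update/2 is a synchronous call: it returns after the value is stored.
  def put(key, value), do: Agent.update(__MODULE__, &Map.put(&1, key, value))

  # Returns {:ok, value} or :error.
  def fetch(key), do: Agent.get(__MODULE__, &Map.fetch(&1, key))
end

defmodule Counter do
  use GenServer

  def start_link(id), do: GenServer.start_link(__MODULE__, id)
  def increment(pid), do: GenServer.call(pid, :increment)

  @impl true
  def init(id) do
    # Trap exits so terminate/2 also runs on supervisor shutdown.
    Process.flag(:trap_exit, true)
    {:ok, %{id: id, count: 0}, {:continue, :restore}}
  end

  @impl true
  def handle_continue(:restore, %{id: id} = state) do
    # Pick up state left behind by a previous instance, if there is any.
    case StateStore.fetch(id) do
      {:ok, saved} -> {:noreply, saved}
      :error -> {:noreply, state}
    end
  end

  @impl true
  def handle_call(:increment, _from, state) do
    new_state = %{state | count: state.count + 1}
    {:reply, new_state.count, new_state}
  end

  @impl true
  def terminate(_reason, %{id: id} = state) do
    # Unlike put_meta/3, this does not rely on asynchronous replication.
    StateStore.put(id, state)
  end
end
```

terminate/2 is not guaranteed to run (for example on a brutal kill), so writing to the store on every state change rather than only at shutdown would be the safer variant.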