Using Registry as a counter? (or better alternatives?)

Also, be sure you’re not starting linked tasks, or, if you do, that you’re trapping exits. Likewise, if the process is registered at some registry, but not trapping exits, terminate will not be invoked if the registry is taken down. The same holds for other non-parent processes which are linked to the process where terminate should be invoked.

Also, the process might be brutally killed by its parent if :shutdown is not set to :infinity.

The problem I have with this kind of thinking is that a lot of preconditions must be satisfied to ensure that terminate is invoked. You need to keep the entire code of the GenServer in your mind to ensure that the cleanup code is always executed. Every time you make a change to that GenServer, you need to keep in mind that the change must not lead to a potential bypass of terminate.

I find it hard to reason in such terms, and I think this is very error prone, so I’d usually do the cleanup in an external process. I’d mostly opt for terminate only if synchronous cleanup is required (we need to clean up before the process terminates).
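To make those preconditions concrete, here is a minimal sketch (the module name and options are made up, not from any of the posts above) of a GenServer that traps exits in init/1 and asks for :shutdown to be :infinity in its child spec, so that an exit from the parent or another linked process reaches terminate/2. Even then, terminate is still bypassed by a brutal kill or a crash inside a callback, which is exactly why monitoring from a separate process is often the safer option.

defmodule CleanupServer do
  use GenServer, shutdown: :infinity

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # Trap exits so an exit signal from the parent (or any other linked
    # process, e.g. a registry going down) invokes terminate/2 instead of
    # killing the process immediately.
    Process.flag(:trap_exit, true)
    {:ok, opts}
  end

  @impl true
  def terminate(_reason, state) do
    # Synchronous cleanup goes here. Still not guaranteed: a :kill signal
    # or a crash inside a callback can bypass this function.
    {:ok, state}
  end
end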


I think that what I would like to understand is whether in “simpler” situations you can rely on using terminate, because if you can reliably write a gen_server that only does some simple processing, writing a single terminate callback is far less involved than spinning up additional processes and setting up monitoring. Basically it goes back to: when can we rely on terminate? I think you’ve given a lot of info on when it might be appropriate to do so, but you end the previous post by saying that “a lot of preconditions must be satisfied” and that you find it hard to reason in those terms. Hence it seems that you’re advocating that monitoring the process is indeed the best way to go about it, so back to “don’t rely on terminate”. I created a new topic with an actual sample gen_server.

A distributed counter is one of the use cases we’re thinking about when introducing the Firenest.ReplicatedState abstraction in the firenest project. As an example, with the interface we have planned right now, a distributed counter could look like the following.

Each process can register itself for tracking and increment/decrement the counter. When the process goes down, its data is removed (when a node goes down other nodes remove data for all processes from the dead node).

defmodule DistributedCounter do
  alias Firenest.ReplicatedState

  @behaviour ReplicatedState

  def child_spec(opts) do
    Firenest.ReplicatedState.child_spec(
      topology: MyApp.FirenestTopology,
      name: opts[:name],
      handler: __MODULE__
    )
  end

  def track(server, key) do
    ReplicatedState.put(server, key, self(), 0)
    # calls local_put as the callback inside the server
  end

  def increment(server, key, by) when is_integer(by) do
    ReplicatedState.update(server, key, self(), {:increment, by})
    # calls local_update as the callback inside the server
  end

  def untrack(server, key) do
    ReplicatedState.delete(server, key, self())
    # calls local_delete as the callback inside the server
  end

  def get(server, key) do
    # list returns a value for each process tracking the state - both local and remote, 
    # we just sum them to get the final value
    Enum.sum(ReplicatedState.list(server, key))
  end

  @impl true
  def init(_opts) do
    {0, _config = %{}}
  end

  @impl true
  def local_put(state, _config) do
    {:ok, state}
  end

  @impl true
  def local_update({:increment, by}, _delta, state, _config) do
    # we don't use precise data tracking, so we just use the new state as our delta, which will be
    # propagated to remote servers in the handle_remote_delta callback
    {_state = state + by, _delta = state + by}
  end

  @impl true
  def handle_remote_delta(remote_delta, _old_state, _config) do
    # since the remote delta is just the remote state, we don't need to do any 
    # state mutation and the remote delta is the new state
    remote_delta
  end
end
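
For a feel of how this would be used, here is a hypothetical call sequence (the server name and key are made up), assuming the DistributedCounter above is running under your supervision tree:

# inside any process that wants to contribute to the counter
DistributedCounter.track(MyApp.Counter, :page_views)
DistributedCounter.increment(MyApp.Counter, :page_views, 1)

# from any process, on any node in the topology
DistributedCounter.get(MyApp.Counter, :page_views)
#=> sum of the counts from every tracking process, local and remote

# stop contributing explicitly; the entry is also removed if the process dies
DistributedCounter.untrack(MyApp.Counter, :page_views)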

The same abstraction will be used to re-implement Phoenix.Presence and possibly other things - it seems quite flexible. I’d recommend reading the docs in the linked PR - while the implementation is not ready, the docs should be close to the final thing we want. There’s also a mechanism for precise tracking of state changes (with the observe_remote_deltas callback which is not shown here).


Awesome, I’ll check it out! Hope to use it when ready 🙂

In the meantime, for those curious, I ended up implementing distributed counters/stats using a Registry pair – one :duplicate Registry to hold groups of actor processes, one :unique to update values associated with each process. Using them in tandem, I can get local node stats pretty quickly.

For stats across all servers, I just hit a GenServer on every box and merge the results (each node’s result is already reduced to per-node stats, so it only syncs summaries, not all of the data).
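
Here is a minimal sketch of that Registry pair (the module, registry names, and group key are hypothetical, and the cross-node merge below uses :erpc rather than the per-node GenServers mentioned above, purely to keep the example self-contained):

# Start both registries somewhere in the supervision tree, e.g.:
#   {Registry, keys: :duplicate, name: StatsGroups},
#   {Registry, keys: :unique, name: StatsValues}
defmodule LocalStats do
  # Called from the actor process itself; its entries disappear when it dies.
  def track(group) do
    {:ok, _} = Registry.register(StatsGroups, group, nil)
    {:ok, _} = Registry.register(StatsValues, {group, self()}, 0)
    :ok
  end

  # Only the owning process may update its value in a :unique registry.
  def increment(group, by \\ 1) do
    Registry.update_value(StatsValues, {group, self()}, &(&1 + by))
  end

  # Local-node total: enumerate the group's members, then sum their counters.
  def local_total(group) do
    StatsGroups
    |> Registry.lookup(group)
    |> Enum.map(fn {pid, _} ->
      case Registry.lookup(StatsValues, {group, pid}) do
        [{^pid, value}] -> value
        [] -> 0
      end
    end)
    |> Enum.sum()
  end

  # Cluster-wide total: ask every node for its local summary and merge.
  def cluster_total(group) do
    [Node.self() | Node.list()]
    |> :erpc.multicall(__MODULE__, :local_total, [group])
    |> Enum.reduce(0, fn
      {:ok, value}, acc -> acc + value
      _error, acc -> acc
    end)
  end
end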


Firenest seems to be doing a lot of things that we’ve been trying to build in parallel on our current system (especially with regards to topology).

Do you have a sense yet of when you’ll have a first release ready?

I haven’t looked through all of it yet, but was wondering if it is going to have a concept of an ‘ideal’ topology of whitelisted nodes that could be updated over time? Or any quorum/raft type implementations?

Thanks,
Scott S.

The current plan is to have things ready and tested by the end of summer.

Could you expand on that a bit? What would be the use case? Are you thinking of a wider cluster with some subset of nodes forming this “ideal group”?

Probably not directly, but at some point, I’d like to look at porting some of the existing implementations over to use the topology provided by firenest and/or maybe build on the SyncedServer abstraction.


The reason for an ‘ideal’ is that if we have an optimal ‘everything is up and running’ topology on which there is strong consensus, we could use it as the basis to determine whether our current network has a majority quorum or should, for instance, act as a read-only data store.

We’re currently experimenting with the idea of an in-memory only system spread across multiple data centers.

From what I can tell of the current Firenest, it looks like a good foundation upon which we could layer additional constraints 🙂

I’d love to test it out and provide feedback whenever you think it might be at a good alpha/beta stage!
