How do you ensure that only one “copy” of a worker will be active in a multi-node OTP application?

How can I ensure that only one “copy” of a worker is active in a multi-node OTP application?
We created some tasks that update KPIs on a fixed interval (e.g. every 60 seconds).
To avoid duplicated work and extra resource usage, only one instance of each worker type should be running across all nodes.

defmodule MyApp.Application do
  @moduledoc """
  Application Settings.
  """

  use Application

  def start(_type, _args) do
    import Supervisor.Spec

    children = [
      supervisor(MyApp.Repo, []),
      supervisor(MyApp.Endpoint, []),
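      # Every node that boots this application starts its own copy of each
      # worker below, so a cluster of N nodes runs N copies of each one.
      worker(Guardian.DB.Token.SweeperServer, []),
      worker(MyApp.Services.UserAgents.Server, []),
      worker(MyApp.Services.IPs.Server, [])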
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

Would Raft be applicable to your use case?


I’ll also throw this in; I’ve made use of it. You can solve your use case logically on your own, but you will probably find you want a library to handle all the various situations that can arise.

You can probably get started with OTP’s :global.
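As a rough illustration of that idea, here is a minimal sketch of a periodic worker registered under a :global name, so that a second start attempt anywhere in the cluster is turned away. The module name KpiWorker, the :interval option, and the runs counter are all hypothetical and only there for illustration; this is not the original poster's code.

```elixir
defmodule KpiWorker do
  use GenServer

  # Default tick interval, matching the 60 seconds mentioned in the question.
  @default_interval :timer.seconds(60)

  # Registers the process under a cluster-wide :global name. If another node
  # already holds that name, return :ignore so the local supervisor moves on.
  def start_link(opts \\ []) do
    case GenServer.start_link(__MODULE__, opts, name: {:global, __MODULE__}) do
      {:ok, pid} -> {:ok, pid}
      {:error, {:already_started, _pid}} -> :ignore
    end
  end

  # Hypothetical helper: how many ticks the single active worker has handled.
  def runs, do: GenServer.call({:global, __MODULE__}, :runs)

  @impl true
  def init(opts) do
    interval = Keyword.get(opts, :interval, @default_interval)
    schedule(interval)
    {:ok, %{interval: interval, runs: 0}}
  end

  @impl true
  def handle_call(:runs, _from, state), do: {:reply, state.runs, state}

  @impl true
  def handle_info(:tick, state) do
    # The actual KPI update would go here.
    schedule(state.interval)
    {:noreply, %{state | runs: state.runs + 1}}
  end

  defp schedule(interval), do: Process.send_after(self(), :tick, interval)
end
```

Two caveats with this sketch: because the losing nodes return :ignore, nothing takes over automatically if the node holding the worker goes down, so a real version would need to monitor the registered pid and retry. And during a network partition each side can register its own copy.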

Note this relies on :global so it isn’t resistant to netsplits. If you already have a DB that supports locks you could use that in addition to this to get a robust combo. Or, in fact, just have the task run on all nodes and rely entirely on locks + a “last updated” timestamp to avoid duplicate updates.
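The “run everywhere, but gate on a lock plus a last-updated timestamp” idea can be sketched without a real database. In the example below an Agent stands in for the locked DB row (an assumption made purely to keep the sketch self-contained); the module name KpiGate and the function claim/3 are hypothetical.

```elixir
defmodule KpiGate do
  # Stand-in for a DB row protected by a lock: the Agent serializes access,
  # so only one caller at a time can inspect and update the timestamp.
  def start_link, do: Agent.start_link(fn -> nil end)

  # Returns :run if the caller should perform the update now (and records the
  # new timestamp), or :skip if the update already ran within the interval.
  def claim(gate, now_ms, interval_ms) do
    Agent.get_and_update(gate, fn
      last when is_integer(last) and now_ms - last < interval_ms ->
        {:skip, last}

      _last ->
        {:run, now_ms}
    end)
  end
end
```

With a real database, the same check-and-set would happen inside a transaction that holds the row lock, so every node can fire its timer and only the first one per interval does the work.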

This really depends on what you mean when you say “ensure”. How important is it that you only ever have one copy of a worker? There are loads of libraries out there to help solve this problem: swarm, syn, gproc, or even :global. Each of these libraries has different tradeoffs and guarantees, and they handle things like network partitions and node failures differently. It’ll really come down to the kinds of guarantees you need.

If you have a small and relatively stable set of nodes and can tolerate occasionally having “duplicate” workers, then I’d look at swarm. If you need somewhat stricter guarantees, you might look at gproc. If you need even stricter guarantees, you might want to use Raft or, better yet, an external data store like Redis. My intuition (which should be taken with a massive grain of salt since I don’t know your exact use case) is that you’re probably best off using something like swarm. You can use “at least once” messaging guarantees over whatever transport you’re using and work to make your downstream service idempotent.
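To make the idempotency point concrete, here is a tiny sketch in which a KPI update is keyed by the KPI name and the start of its time window, so a duplicate delivery from a second worker leaves the stored state unchanged. The module KpiStore, the function record/4, and the sample values are hypothetical, and a plain map stands in for the downstream service.

```elixir
defmodule KpiStore do
  # Writes are keyed on {kpi, window_start}, so replaying the same update
  # (for example from a duplicate worker) overwrites with the same value.
  def record(store, kpi, window_start, value) do
    Map.put(store, {kpi, window_start}, value)
  end
end

once = KpiStore.record(%{}, :active_users, 1_700_000_000, 42)
twice = KpiStore.record(once, :active_users, 1_700_000_000, 42)
```

The design choice here is to store the computed value for a window rather than apply increments, since increments would be counted twice if two workers ran at once.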
