How do you ensure that only one “copy” of a worker will be active in a multi-node OTP application?

pedromvieira · April 30, 2018, 3:46pm

How can I ensure that only one “copy” of a worker is active in a multi-node OTP application?
We created some tasks that update KPIs and trigger on a fixed interval (e.g. every 60 seconds).
To avoid duplicated work and extra resource usage, only one instance of each worker type needs to be running across all nodes.

defmodule MyApp.Application do
  @moduledoc """
  Application Settings.
  """

  use Application

  def start(_type, _args) do
    import Supervisor.Spec

    children = [
      supervisor(MyApp.Repo, []),
      supervisor(MyApp.Endpoint, []),
      worker(Guardian.DB.Token.SweeperServer, []),
      worker(MyApp.Services.UserAgents.Server, []),
      worker(MyApp.Services.IPs.Server, [])
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

peerreynders · April 30, 2018, 6:12pm

Would Raft be applicable to your use case?

subetei · April 30, 2018, 6:40pm

github.com

ostinelli/syn/blob/master/README.md

# Syn
**Syn** (short for _synonym_) is a global Process Registry and Process Group manager for Erlang.

## Introduction

##### What is a Process Registry?
A global Process Registry allows registering a process on all the nodes of a cluster with a single Key. Consider this the process equivalent of a DNS server: in the same way you can retrieve an IP address from a domain name, you can retrieve a process from its Key.

Typical Use Case: registering on a system a process that handles a physical device (using its serial number).

##### What is a Process Group?
A global Process Group is a named group which contains many processes, possibly running on different nodes. With the group Name, you can retrieve on any cluster node the list of these processes, or publish a message to all of them. This mechanism allows for Publish / Subscribe patterns.

Typical Use Case: a chatroom.

##### What is Syn?

This file has been truncated.

I’ll also throw this in; I’ve made use of it. You can solve your use case with your own logic, but you’ll probably find you want a library to handle all the various situations that can arise.

mbuhot · April 30, 2018, 10:32pm

You can probably get started with OTP’s :global.
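A minimal sketch of what that could look like, assuming a hypothetical `MyApp.SingletonWorker` module (not from the original post) and the 60-second interval mentioned in the question. It registers the GenServer under a `{:global, name}` tuple, so a second `start_link/1` anywhere in the connected cluster finds the name taken and is ignored:

```elixir
defmodule MyApp.SingletonWorker do
  # Hypothetical module name, used only for illustration.
  use GenServer

  # Interval taken from the question (60 seconds).
  @interval :timer.seconds(60)

  def start_link(opts \\ []) do
    # {:global, name} registers the process cluster-wide via :global.
    case GenServer.start_link(__MODULE__, opts, name: {:global, __MODULE__}) do
      {:ok, pid} ->
        {:ok, pid}

      {:error, {:already_started, _pid}} ->
        # Another node (or this one) already runs the worker; tell the
        # supervisor there is nothing to start here.
        :ignore
    end
  end

  @impl true
  def init(_opts) do
    schedule()
    {:ok, %{runs: 0}}
  end

  @impl true
  def handle_info(:tick, state) do
    # The KPI update itself would go here (assumption: omitted in this sketch).
    schedule()
    {:noreply, %{state | runs: state.runs + 1}}
  end

  defp schedule, do: Process.send_after(self(), :tick, @interval)
end
```

Note that returning `:ignore` means the other nodes will not take over if the node holding the worker goes down; handling that failover is part of what the libraries mentioned in this thread add.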

dom · April 30, 2018, 11:01pm

Note this relies on :global so it isn’t resistant to netsplits. If you already have a DB that supports locks you could use that in addition to this to get a robust combo. Or, in fact, just have the task run on all nodes and rely entirely on locks + a “last updated” timestamp to avoid duplicate updates.
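To illustrate the “last updated” timestamp idea, here is a rough sketch. Everything in it is an assumption for illustration: `MyApp.KpiLease` is a hypothetical name, and an `Agent` stands in for the shared database row. In a real deployment the check-and-set would have to happen in the database itself (for example as one conditional update or under a lock), since an `Agent` lives on a single node.

```elixir
defmodule MyApp.KpiLease do
  # Hypothetical module. The Agent is a stand-in for a row in a shared
  # database; it is NOT shared across nodes by itself.
  use Agent

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{} end)
  end

  # Returns :ok if the caller may run the task now (and records now_ms as
  # the last run), or :skip if the task already ran within interval_ms.
  def claim(store, key, now_ms, interval_ms) do
    Agent.get_and_update(store, fn rows ->
      case Map.get(rows, key) do
        last when is_integer(last) and now_ms - last < interval_ms ->
          # Someone updated recently enough; leave the row untouched.
          {:skip, rows}

        _ ->
          # Never run, or the last run is old enough: claim this run.
          {:ok, Map.put(rows, key, now_ms)}
      end
    end)
  end
end
```

With this shape, every node can run the timer, and only the node whose `claim/4` returns `:ok` performs the update for that interval.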

keathley · May 1, 2018, 7:15pm

This really depends on what you mean when you say “ensure”. How important is it that you only ever have one copy of a worker? There are loads of libraries out there to help solve this problem: swarm, syn, gproc, or even :global. Each of these has different tradeoffs and guarantees, and they handle things like network partitions and node failures differently. It’ll really come down to the kinds of guarantees you need.

If you have a small and relatively stable set of nodes and can tolerate occasionally having “duplicate” workers, then I’d look at swarm. If you need somewhat stricter guarantees, then you might look at gproc. If you need even stricter guarantees, you might want to use Raft or, better yet, an external data store like Redis.

My intuition (which should be taken with a massive grain of salt since I don’t know your exact use case) is that you’re probably best off using something like swarm. You can use “at least once” messaging guarantees over whatever transport you’re using and work to make your downstream service idempotent.
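As one possible reading of “make your downstream service idempotent”, the sketch below deduplicates KPI updates by an idempotency key made of the KPI name and the time bucket. The module name `MyApp.IdempotentKpi`, the table name, and the use of a local ETS table are all assumptions for illustration; in practice the same effect would come from a unique constraint in whatever store the downstream service uses.

```elixir
defmodule MyApp.IdempotentKpi do
  # Hypothetical module; the ETS table stands in for the downstream store.
  @table :kpi_updates

  # Creates the named table. Call once before record/4.
  def init do
    :ets.new(@table, [:set, :public, :named_table])
    :ok
  end

  # Records a KPI value for the interval containing timestamp_s (seconds).
  # A second update for the same KPI and interval is reported as a
  # duplicate and does not overwrite the first value.
  def record(kpi, timestamp_s, value, interval_s \\ 60) do
    bucket = div(timestamp_s, interval_s)

    # :ets.insert_new/2 returns false when the key already exists.
    if :ets.insert_new(@table, {{kpi, bucket}, value}) do
      :recorded
    else
      :duplicate
    end
  end
end
```

If two “duplicate” workers both fire in the same interval, the second write is dropped, so occasional duplicates become harmless.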