We are relatively new to Elixir. So far we have a couple of standalone services running in production, and they are performing exceptionally well.
For our next service we have a use case that we have not been able to figure out the right way to implement in Elixir.
Our service acts as a communicator between a third-party source and our Kafka stream: we receive data from the third party via TCP and push it to Kafka to be consumed by other services. The third-party source has 2 instances, a primary and a backup, and we can use the backup if we don't receive data from the primary. The communicator service will be deployed on 2 different instances (i.e. the 1st instance will receive from the primary and the 2nd from the backup). If we start both communicator machines, they will both emit the same data to Kafka, and we don't want those duplicates. We want only one instance to push to Kafka, and the second to take over on failure.
We want something like a master/slave cluster setup for our service, where the master pushes data to Kafka and the slave stays paused; as soon as the master goes down, the slave starts pushing data to Kafka.
Possible solutions we have thought of:
- keepalived - to have an active-passive setup
- Keep state in Redis and use it to decide whether an instance should push data further or not (this is a very bad solution, though).
We think there should be a better way to handle this clustering use case using Erlang's distributed architecture.
I can think of two possible solutions. Both are variations on your keepalived idea:
- Distributed Applications. IMHO this is misnamed, as it describes a classic active/passive failover.
- A global GenServer, i.e. registering the GenServer under a global name.
With a distributed application, you can pattern match on the {:failover, node} start type in Application.start/2 to determine which source to pull from.
Likewise, for the global GenServer solution you can determine the source via sys.config or the node name.
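The start-type idea could look roughly like this. It is only a minimal sketch under assumptions: the module names Communicator.Application and Communicator.Pusher and the :primary / :backup atoms are made up for illustration, and the mapping from start type to source is a guess at the setup described in the question. It also assumes the kernel parameters for distributed applications (distributed, sync_nodes_optional, sync_nodes_timeout) are configured separately for each node.

```elixir
defmodule Communicator.Pusher do
  # Hypothetical worker: a real one would open the TCP connection to
  # `source` and forward whatever it receives to Kafka.
  use GenServer

  def start_link(source), do: GenServer.start_link(__MODULE__, source, name: __MODULE__)

  @impl true
  def init(source), do: {:ok, %{source: source}}
end

defmodule Communicator.Application do
  use Application

  @impl true
  def start(start_type, _args) do
    # Only the node that currently runs the distributed application gets here.
    children = [{Communicator.Pusher, source_for(start_type)}]
    Supervisor.start_link(children, strategy: :one_for_one, name: Communicator.Supervisor)
  end

  # Assumed mapping: the preferred node reads from the primary source,
  # and a node that starts because of a failover reads from the backup.
  def source_for(:normal), do: :primary
  def source_for({:takeover, _node}), do: :primary
  def source_for({:failover, _node}), do: :backup
end
```

As far as I know, the second node can also be started with :normal if the first node never came up at boot, so choosing the source from a per-node setting (sys.config or the node name) may be the more robust option.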
Nodes don’t usually send out an orderly notification of their demise; what would the system use to detect “liveness”? What happens to messages that arrive before the master is declared “down”?
What happens if the system goes split-brain (due to weird local networking failures, for instance) and both instances are pushing events into Kafka?
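To make these questions concrete, here is a minimal sketch of a global-name standby, assuming the two nodes are connected through Erlang distribution. The module name Communicator.Standby and the global name Communicator.ActivePusher are hypothetical. Liveness here is nothing more than a process monitor on whoever holds the global name, so detection is only as fast as the distribution layer notices a dead node.

```elixir
defmodule Communicator.Standby do
  use GenServer

  # Hypothetical cluster-wide name; whoever holds it is the active pusher.
  @global_name Communicator.ActivePusher

  def start_link(source), do: GenServer.start_link(__MODULE__, source)

  def role(pid), do: GenServer.call(pid, :role)

  @impl true
  def init(source), do: {:ok, claim(%{source: source, role: :standby})}

  @impl true
  def handle_call(:role, _from, state), do: {:reply, state.role, state}

  @impl true
  def handle_info({:DOWN, _ref, :process, _pid, _reason}, state) do
    # The holder died, or its node became unreachable (reason :noconnection).
    {:noreply, claim(state)}
  end

  def handle_info(_other, state), do: {:noreply, state}

  # Try to take the global name; if someone else holds it, monitor the holder.
  defp claim(state) do
    case :global.register_name(@global_name, self()) do
      :yes ->
        %{state | role: :active}

      :no ->
        case :global.whereis_name(@global_name) do
          :undefined ->
            # The holder vanished between the two calls; try again.
            claim(state)

          pid when pid == self() ->
            %{state | role: :active}

          holder ->
            Process.monitor(holder)
            %{state | role: :standby}
        end
    end
  end
end
```

This does not prevent split-brain: during a network partition each side sees the other as down, so both processes can hold the name and push to Kafka. When the nodes reconnect, the default :global conflict resolution keeps one registration and kills the other process, so duplicates last roughly as long as the partition does.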
Deciding when and how to handle failover is the hard part; the core “receive TCP and put in Kafka” logic is straightforward. You’ll want to think carefully about exactly what the goal is - keep the data flowing when the upstream primary goes down? Keep the data flowing when the primary instance of the communicator service goes down? Both? - and what guarantees that goal requires from the system.
I was not aware of Erlang Distributed Applications and the global GenServer. Will give them a try.
You have a valid point. Will think on that front as well.
For us, when a switch-over happens, it is fine if both the master and slave instances send data to Kafka for some duration (maybe 3-5 min, but not more than that). So we want a mechanism that reliably completes the switch-over within 5 min at most.
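One way to express that upper bound, sketched under assumptions: the standby keeps a deadline timer and promotes itself when it has not heard from the master within the allowed window. The module name Communicator.SilenceWatchdog, the heartbeat/1 call, and the 5-minute default are all hypothetical; how the heartbeat reaches the standby (a message between connected nodes, a record observed on the Kafka topic, or something else) is left open.

```elixir
defmodule Communicator.SilenceWatchdog do
  use GenServer

  # Hypothetical default: promote after 5 minutes without a heartbeat.
  @default_timeout_ms 5 * 60 * 1000

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  # Called whenever the master shows a sign of life.
  def heartbeat(pid), do: GenServer.cast(pid, :heartbeat)

  # True once the deadline has passed without a heartbeat.
  def active?(pid), do: GenServer.call(pid, :active?)

  @impl true
  def init(opts) do
    timeout_ms = Keyword.get(opts, :timeout_ms, @default_timeout_ms)
    {:ok, arm(%{timeout_ms: timeout_ms, active: false, timer: nil, token: nil})}
  end

  @impl true
  def handle_cast(:heartbeat, state) do
    # The master is alive: step back to standby and restart the deadline.
    {:noreply, arm(%{state | active: false})}
  end

  @impl true
  def handle_call(:active?, _from, state), do: {:reply, state.active, state}

  @impl true
  def handle_info({:deadline, token}, %{token: token} = state) do
    {:noreply, %{state | active: true}}
  end

  # A deadline from a timer that was re-armed in the meantime is ignored.
  def handle_info(_stale, state), do: {:noreply, state}

  defp arm(state) do
    if state.timer, do: Process.cancel_timer(state.timer)
    token = make_ref()
    timer = Process.send_after(self(), {:deadline, token}, state.timeout_ms)
    %{state | timer: timer, token: token}
  end
end
```

Because a late heartbeat simply flips the standby back to passive, a short overlap where both instances push is possible, which matches the tolerance described above.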