Questions about distributing an Elixir app across multiple nodes

ali · March 23, 2018, 4:56pm

Background

I am writing a backend for IoT devices using Elixir. Every device continuously reports its measurements and technical logs to the backend, which are then aggregated and persisted at different time intervals for reporting. Support staff can troubleshoot and fix issues on these devices remotely. The total number of devices in the field will increase significantly in the future, and it's obvious that a single server cannot handle all of them.

I am thinking of creating a process per device, which receives measurements, maintains running counters, computes the aggregates, and saves them in the db for that device. Each support staff member is also modeled as a user process that sends remote commands to a device via the device process. Devices report the success or failure of remote commands back to the user process via the device process. Devices and the support staff app maintain persistent connections with the backend using Phoenix channels, which talk to their corresponding (device or user) processes.

To distribute device and user processes across multiple servers, I am thinking of using consistent hashing as the load-balancing strategy. It distributes the load roughly uniformly across the cluster. But I hit a problem when a user process is on machine A and wants to send remote commands to two devices whose processes are running on machines B and C respectively.
I know I need some kind of distributed process registry to solve this problem.
I have looked at https://hex.pm/packages/swarm and couldn't get my head around it. Internally it uses consistent hashing as well.
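
To make the consistent-hashing idea concrete, here is a minimal sketch of a hash ring with virtual nodes. It is written in Python purely as a language-neutral illustration and is not how Swarm or any load balancer implements it. The class name `HashRing`, the node names, the `device-N` key format, the choice of SHA-256, and the 128 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring (SHA-256 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; each node owns `vnodes` points on the ring."""

    def __init__(self, nodes, vnodes=128):
        self._vnodes = vnodes
        self._points = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Insert the node's virtual points, keeping the list sorted.
        for i in range(self._vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # Walk clockwise: pick the first point at or after the key's position,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._points, (_hash(key),)) % len(self._points)
        return self._points[idx][1]


ring = HashRing(["node_a", "node_b", "node_c"])
owner = ring.node_for("device-42")  # always the same node for the same ring
```

Two rings built from the same node list and the same hash function agree on every key, and adding a server only moves keys onto the new server. The placement is only predictable if every component that routes by device ID uses the same ring definition.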

Questions

Is the approach described above the right way to approach the problem?
If I go with consistent hashing in HAProxy to distribute the client connections across the cluster, does Swarm's consistent hashing need to be the same as HAProxy's? Otherwise it may result in excessive node-to-node communication if a channel process and its corresponding device process land on different nodes.
Is it reasonable to just spawn the device process on the same node where the device connects and use Phoenix Presence as the distributed process registry?
Is there any better way to approach this problem?

dom · March 24, 2018, 1:10am

Is there a reason you can’t have the HTTP request handlers or Phoenix channels directly interact with device processes, rather than go through a separate user process?

ali · March 24, 2018, 3:41am

The reason is that when a user sends a remote command to a device, I need to keep track of the command state (received at device, executed, failed, etc.) and report it back to the user. If I go with HTTP, I need to track this state in the db and poll for it periodically.
I need a user process alongside the Phoenix channel to track the states of remote commands. If I go with the channel only, I may lose remote command states during intermittent connectivity.
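
As a rough illustration of the state the user process would hold, here is a small sketch of a command tracker that survives channel reconnects because it lives outside the connection. It is written in Python only as a language-neutral sketch. The `CommandTracker` name, the `SENT` state, and the allowed transitions are assumptions; the thread only names "received at device", "executed", and "failed".

```python
from enum import Enum


class CommandState(Enum):
    SENT = "sent"            # assumed initial state, not named in the thread
    RECEIVED = "received"    # received at device
    EXECUTED = "executed"
    FAILED = "failed"


# Assumed transition rules: EXECUTED and FAILED are terminal.
_ALLOWED = {
    CommandState.SENT: {CommandState.RECEIVED, CommandState.FAILED},
    CommandState.RECEIVED: {CommandState.EXECUTED, CommandState.FAILED},
    CommandState.EXECUTED: set(),
    CommandState.FAILED: set(),
}


class CommandTracker:
    """Hypothetical per-user tracker; plays the role of the user process state."""

    def __init__(self):
        self._commands = {}  # command_id -> (device_id, CommandState)

    def send(self, command_id, device_id):
        self._commands[command_id] = (device_id, CommandState.SENT)

    def report(self, command_id, new_state):
        """Apply a status report from the device; return False if the transition is not allowed."""
        device_id, current = self._commands[command_id]
        if new_state not in _ALLOWED[current]:
            return False
        self._commands[command_id] = (device_id, new_state)
        return True

    def state_of(self, command_id):
        return self._commands[command_id][1]

    def pending(self):
        """Commands that have not reached a terminal state, e.g. to replay to a reconnecting client."""
        return {cid: state for cid, (_, state) in self._commands.items() if _ALLOWED[state]}
```

When the channel reconnects after a dropped connection, the client could ask the tracker for `pending()` and the latest `state_of` each command instead of relying on messages it may have missed.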

kokolegorille · March 24, 2018, 5:25am