Elixir distributed processing - IoT

Hi,

At the startup where I work, we have been using Elixir in production for over a year. We are really happy with our decision back then, but now we are facing a different kind of challenge.

To explain a little bit: our product is a Connected Car IoT platform. Long story short, the focus of our current problem is how to approach distribution, load, and scaling of processing. To understand this, keep in mind that we use processes in such a way that a process is created when you start your trip and is more or less terminated after you stop the car. Of course, there are other phases as well. So far the node responsible for this has been fine, but now there is a chance that we might need to scale our solution by a factor of 500x.

The main issue is how to make sure each car is processed on only one node. For instance: when you start a trip, a uniquely named process is started, and during that trip all data for that unique ID should be delivered to that process. And we need to handle all the cases: draining the node, restarting the node, terminating the node… preferably moving that processing to another node. But scaling that much…

The ecosystem around scale-out in Erlang/Elixir is quite vibrant. Between the more established approaches and the more recent projects, it is quite hard to say what is here to stay and what is good enough with enough dev power behind it. I can read a lot about Riak Core, but at the same time, if I am not mistaken, that project has had long pauses in its development. Then you have Swarm, libcluster, Partisan, native distribution… all the way to AWS auto scaling groups. So try to understand my “frustration” :slight_smile: Should we go in the direction of keeping “a node” as a single processing unit (which has a lot of issues of its own), or should we lean on the Elixir/Erlang ecosystem for this? And if the latter, which tool? What we don’t want is to over-complicate things, since we might create more advanced and harder problems that we might not be able to solve, and Google might not have that many answers.

So… currently we use a server like an AWS EC2 c4.2xlarge, which is 8 vCPUs and 15 GB RAM, and let’s say the numbers we are handling now are around 5,000. We would like to go to 1 million, for instance (in reality probably less).
And of course there is a lot of infrastructure around this (AWS services, Kafka, Elasticsearch, ScyllaDB, etc.).

What would you suggest? :slight_smile: I would really appreciate your feedback.

Kind regards,
Tomaz


Hi @tomazbracic, interestingly I am working in a similar environment (IoT), though we are just finishing a PoC, so no scaling problems are involved so far.

Anyway, rather than ideas, for now I have a couple of questions:

  1. Why are you moving towards one process handling a whole car trip? What are the pros/cons in your opinion?
  2. Are you currently creating (I suppose sequentially) multiple processes to handle a single car trip?

Without thinking too deeply, I would expect you to have multiple Erlang nodes (to transparently migrate processes) and to spawn (or kill) nodes based on some metrics (I know this is quite obvious, but it’s just a starting point for the discussion :smiley:).

Bruno

Hi @brunoripa, @tomazbracic’s colleague here.

  1.) We are not trying to keep the data for the whole trip in one single process, only the latest value of the device state: location, notifications, and so on. We must always show the latest data to the users, and we must not send a notification twice when two similar messages arrive.
    The pro of this (my opinion, I don’t know if it is the correct one) is that we know whether the business logic has already been run for an earlier event, and when a user asks (via the website) for the device data, they get the latest data that was processed. A minimal sketch of such a process is below.
    The con is that processing can be slower, since each message first needs to find the process it has to be sent to.
  2.) No, for now we use only one node. On that node we start a process when new data comes in and leave it running until the device stops sending. We only use the process to hold the live data; everything else we save in the NoSQL DB. In the end the process is only responsible for handing a range of data to the other processes that compute trips and similar things.
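
A minimal sketch of what such a “latest state only” process could look like, purely for illustration: the `DeviceState` module, the `DeviceRegistry` (a `Registry` started elsewhere with unique keys), and the message shapes are all assumptions, not our actual code.

```elixir
# Hypothetical sketch of a per-device process that only keeps the latest
# reported state in memory; historical data is persisted elsewhere.
defmodule DeviceState do
  use GenServer

  # One process per device id, registered locally (single node for now).
  def start_link(device_id),
    do: GenServer.start_link(__MODULE__, device_id, name: via(device_id))

  def put(device_id, reading), do: GenServer.cast(via(device_id), {:put, reading})
  def latest(device_id), do: GenServer.call(via(device_id), :latest)

  # Assumes a Registry named DeviceRegistry with `keys: :unique` is running.
  defp via(device_id), do: {:via, Registry, {DeviceRegistry, device_id}}

  @impl true
  def init(device_id), do: {:ok, %{device_id: device_id, latest: nil}}

  @impl true
  def handle_cast({:put, reading}, state) do
    # Keep only the most recent reading; the full history goes to the NoSQL DB.
    {:noreply, %{state | latest: reading}}
  end

  @impl true
  def handle_call(:latest, _from, state), do: {:reply, state.latest, state}
end
```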

The setup works for now, but when it comes to distribution there will be a few problems this kind of setup will need to deal with (netsplits, upgrades of the system/business logic, and so on). Now, as @tomazbracic wrote, we will need to scale up big time in the next few months, and we were wondering whether the current solution can be tweaked or whether there is another suggestion or implementation for how to do the distribution.

There is no one-size-fits-all solution to this. My first suggestion is always to try not to have a system that requires a uniquely named process in a cluster of nodes. There are built-in tools like :global, but they do not scale because of the need to communicate between all nodes in the cluster.
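
To make that trade-off concrete, here is a minimal sketch of what a cluster-wide unique registration with :global looks like (the trip id and the use of an Agent are just for illustration):

```elixir
# Sketch only: one process per trip, registered under a cluster-wide
# unique name with :global (an Agent keeps the example small).
trip_id = "trip-123"

{:ok, _pid} = Agent.start_link(fn -> %{} end, name: {:global, {:trip, trip_id}})

# Any connected node can now reach the same process by name...
Agent.update({:global, {:trip, trip_id}}, &Map.put(&1, :last_gps, {46.05, 14.51}))

# ...but every registration is synchronized with every node in the cluster,
# which is why :global becomes a bottleneck as the cluster grows.
```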

Swarm might work for you; I don’t know how mature it is. It is a hard problem, which is why it is so important to consider options that don’t require solving it. I built https://github.com/erleans/erleans to provide a possible solution (all solutions have their caveats), based on the https://dotnet.github.io/orleans/ model, which is mature in its own right. The company that Erleans was built for is no more, so it never saw production. I bring it up in threads like this in the hope that one day someone will want to put resources into taking it further :slight_smile:


I don’t know your system, so there is a chance that what I am about to say does not apply to you, but I would really try to avoid associating a globally unique, long-lived process with each car. It sounds to me like an architecture that will cause more headaches than it solves, and it makes high availability and horizontal scaling especially hard to manage.

Would it make sense instead to keep this car state in some store, like Redis, or some other data store fast enough for your use case?
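
As a hedged illustration of that idea, the latest state could be written to Redis from whichever node receives the message, e.g. with the Redix client (connection details and key layout are assumptions):

```elixir
# Sketch: keep only the latest device state in Redis instead of in a
# long-lived per-car process. Keys and fields are made up.
{:ok, conn} = Redix.start_link(host: "localhost", port: 6379)

device_id = "car-42"

# Overwrite the latest known state; any node can do this.
{:ok, _} =
  Redix.command(conn, ["HSET", "device:#{device_id}", "lat", "46.05", "lon", "14.51"])

# Any node can read it back when the website asks for it.
{:ok, [_lat, _lon]} = Redix.command(conn, ["HMGET", "device:#{device_id}", "lat", "lon"])
```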


Congrats on the success! I’ve come to view Elixir/Erlang as providing a nice toolbox for distributed applications, but you’ve still got to do the crafting yourself. We build an embedded IoT sensor at my company, and I was super happy with how simple it was to move a process from one node to two. I can give some thoughts on Riak Core.

Regarding Riak Core, it is likely stable enough despite the hiatus; in my (somewhat limited) experience, once an Erlang/Elixir project is stable it doesn’t require much babysitting. Also, Riak was/is run by companies with large data stores, and most reports were that Riak (like Erlang) rarely “broke” but rather flexed and might slow down.

I played around with Riak Core a bit for a potential KV data processor and store. It takes a little while to wrap your head around the Riak Core model, but at its core it’s relatively straightforward. Its main benefit is a “virtual hash ring” that maps your request ID/trip ID to a hash, which is then mapped to specific Erlang node(s), and that mapping is stable. I can’t recall whether Riak Core handles long-lived processes itself, but you could easily look up or create local processes or ETS entries on each node. Riak Core also provides hooks for transferring data (or trips) to new nodes (draining, etc.).
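
To make the ring idea concrete, here is a tiny stand-in (this is not the riak_core API; riak_core uses a SHA-1 based ring plus handoff, but the principle is similar): hash the trip ID to a partition and map partitions to nodes, so every node agrees on the owner of a given trip.

```elixir
# Illustration of the hash-ring idea only; not the riak_core API.
defmodule TinyRing do
  @partitions 64

  # Map an id to a stable partition, then the partition to a node.
  # As long as every node sees the same node list, they all agree
  # on which node owns a given trip id.
  def owner(trip_id, nodes) when nodes != [] do
    partition = :erlang.phash2(trip_id, @partitions)
    Enum.at(nodes, rem(partition, length(nodes)))
  end
end

TinyRing.owner("trip-123", [:"a@host1", :"b@host2", :"c@host3"])
#=> always the same node for the same trip id and node list
```

The naive modulo mapping above reshuffles almost everything when the node list changes; that is exactly the part riak_core’s ring ownership and handoff machinery handles for you.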

Riak Core allows you to specify how many nodes to duplicate a request to, and the Riak client handles missing nodes and their responses. You can specify just one node per call, but I could see using 2 or 3 redundant nodes as a way to provide redundancy for high availability. Essentially something like “riak_call(:gps_update, 1234, nodes: 3)”, where the call would replicate the action across three nodes. Then if a specific node goes down you wouldn’t lose your (cached?) data. I’d only go that duplicated route if you don’t need to run expensive calculations or don’t mind the extra server overhead… And depending on your data, storing it with RocksDB wouldn’t be infeasible either.


PS: Another option that might fit your needs really well is to check whether Phoenix Presence (Using Phoenix Presence | What did I learn) could work for you. I’m only presuming it handles node draining and syncing, but it does handle duplicate messages and synchronizing presence data, which seems very similar to your case, i.e. one “chat room” per device or similar. It scales really well and is “first class” in the Elixir/Phoenix world.
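
If someone explored that route, tracking a device through Presence might look roughly like this; the module names, topic layout and metadata below are assumptions, not a recommendation:

```elixir
# Sketch only: names and metadata are made up. Assumes MyApp.PubSub and
# MyApp.Presence are started in the application's supervision tree.
defmodule MyApp.Presence do
  use Phoenix.Presence,
    otp_app: :my_app,
    pubsub_server: MyApp.PubSub
end

# In the process handling a device's connection:
{:ok, _ref} =
  MyApp.Presence.track(self(), "device:car-42", "car-42", %{
    connected_at: System.system_time(:second)
  })

# Any node in the cluster can then see which devices are online for a topic:
MyApp.Presence.list("device:car-42")
```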

If you went that route, it would make more sense to use ETS or Mnesia, as they’re native to Erlang and can run in the same cluster; Redis would require another piece of the stack. Mnesia has some nice features that would work well for this kind of case, and with Mnesia it seems you could drop the one-process-per-trip model.
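
A very rough sketch of how the “latest state” could live in a RAM-based Mnesia table instead of a per-trip process (the table name and attributes are made up):

```elixir
# Sketch: a RAM-copy Mnesia table holding only the latest state per device.
:ok = :mnesia.start()

{:atomic, :ok} =
  :mnesia.create_table(:device_state,
    attributes: [:device_id, :latest],
    ram_copies: [node()]
  )

# Dirty writes/reads are fast and acceptable for "latest value wins" data.
:ok = :mnesia.dirty_write({:device_state, "car-42", %{lat: 46.05, lon: 14.51}})
[{:device_state, "car-42", _latest}] = :mnesia.dirty_read(:device_state, "car-42")
```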

P.S. @tomazbracic, you might talk to someone like Erlang Solutions, who could help put you on a good architecture path if you want to go 500x quickly. :slight_smile:


So, I promised a longer answer, but I see that @tristan has already answered much as I was going to.

I will add my voice to the chorus of “avoid a single process if possible”. It seems you want both “always show the latest data” and “never send the same message twice”. Now the question is: how many messages can you afford not to notify on? And how much can you afford to lose messages because a node went down?

Depending on the answers to these, you will have different needs. If you can afford to lose a node, all the live data on it, and possibly some messages, then it is easy: you can simply use your “global” process with something like Swarm.
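
For reference, registering such a process with Swarm looks roughly like this (TripWorker is a hypothetical module; Swarm picks the node from its ring and re-registers the name if that node goes away, with the data-loss caveats above):

```elixir
# Sketch only: TripWorker is a made-up GenServer module.
{:ok, _pid} =
  Swarm.register_name({:trip, "trip-123"}, TripWorker, :start_link, ["trip-123"])

# Messages can then be sent through the registry from any node in the cluster:
GenServer.cast({:via, :swarm, {:trip, "trip-123"}}, {:gps, 46.05, 14.51})
```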

If you cannot accept those two losses, then you enter the realm of trade-offs, because you will need some redundancy in your data, which means you get consistency problems. There is no “simple” answer there; it will be up to you to work out how to do it.

In your case, you also have an ordering problem. I will assume you already have a function that lets you order events, and I will guess it is timestamp-based. Beware of that: in a distributed system, the clocks of your machines will not be aligned. You may accept some errors there and decide it is good enough if you do not have high-tempo data. It depends on how much ordering error you can afford.

Otherwise, you will need to define a distributed order. That is far more complex and will also inform your design.
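
As one concrete example of an order that does not rely on wall clocks, a Lamport-style logical clock can be sketched in a few lines; this is only an illustration of the idea, not a recommendation for this particular system:

```elixir
# Minimal Lamport clock: each event carries a counter, and a receiver
# always moves its own counter past anything it has seen.
defmodule LamportClock do
  def new, do: 0

  # Call before emitting a local event; stamp the event with the result.
  def tick(clock), do: clock + 1

  # Call when a message stamped with remote_clock arrives.
  def merge(clock, remote_clock), do: max(clock, remote_clock) + 1
end

clock = LamportClock.new()        #=> 0
clock = LamportClock.tick(clock)  #=> 1
LamportClock.merge(clock, 7)      #=> 8
```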

I think at that point you will need to reach out to people with experience handling this kind of thing, because all the answers will depend on your particular business logic. Someone suggested Erlang Solutions; there are others. Disclaimer: I may be biased here, as I offer these services too.

PS: As a final note on the problem of the “single global process”, @keathley has a good blog post on its dangers: https://keathley.io/blog/sgp.html
