I am a cloud solution architect for Microsoft. My focus is on OSS technologies in Azure - MS hired me because I know nothing about their tech ;). One of my clients is a vehicle tracking company which generates roughly one and a half billion events per month. They currently host their “Kafka-Akka-Spark in a Kubernetes cluster with a Cassandra back end” application in Azure. Their target market is Southern Africa (SA, Botswana, Namibia, Mozambique, Zimbabwe, Zambia and Malawi).
They are looking at ways to consolidate or simplify their technology stack. Having dabbled in Erlang and Elixir on small projects, I suggested that they look at implementing the solution in a single technology: Elixir, Erlang, or a combination of the two. They agreed! I think I may have bitten off more than I can chew…
I need guidance with a basic architecture showing how to implement an IoT gateway, an event processing engine, a storage facility, and a way of exposing the back end to advanced analytics engines. If my proposal goes well, there will most probably be opportunities to help the organisation implement the solution. My goal is to start out with a stock Elixir/Erlang solution and to plug in more appropriate components (RabbitMQ/Riak/CouchDB etc.) if they are needed at a later stage.
I have some ideas floating around in my head, but I thought it prudent to ask advice and let experienced hands guide my thought process.
The forum slows down on weekends, real-life and such. ^.^
Either way, Elixir/Erlang is perfectly suited to this, however we need more information. You say you want to handle IoT, so what kind of interface is it? Do the devices connect over standard web interfaces, a raw TCP socket, etc.? More details please.
The IoT solution will be for new products so the current protocol is not really important (there are a couple of them and this is an opportunity to do some unification work there as well). The IoT devices pump telemetry, GPS and other information over data networks. Some networks may be extremely constrained, like EDGE or GPRS. Compact packaging of payloads is important, so I thought of ASN.1 or Protobuf instead of JSON to package information. The back end should be able to send update payloads to the IoT devices to update their firmware. The connections will probably be raw data (packed/unpacked with ASN.1 or Protobuf codecs) over TCP, but I am coming to the community for advice, so I really want to hear suggestions and keep an open mind. I just want to do what is best for the customer.
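For the raw-TCP route, it is worth noting how far the BEAM's binary pattern matching gets you even before committing to ASN.1 or Protobuf. A minimal sketch, assuming a hypothetical fixed 13-byte frame (the module name and field layout are invented for illustration; a real design would drive this from a schema):

```elixir
# Pack/unpack a hypothetical compact telemetry frame with binary pattern
# matching: 4-byte device id, two 32-bit floats for GPS, 1 byte speed (km/h).
defmodule Telemetry.Frame do
  def encode(device_id, lat, lon, speed) do
    <<device_id::unsigned-32, lat::float-32, lon::float-32, speed::unsigned-8>>
  end

  def decode(<<device_id::unsigned-32, lat::float-32, lon::float-32, speed::unsigned-8>>) do
    {:ok, %{device_id: device_id, lat: lat, lon: lon, speed: speed}}
  end

  # any frame that does not match the 13-byte layout is rejected
  def decode(_), do: {:error, :bad_frame}
end
```

At 13 bytes per report this is far friendlier to GPRS/EDGE links than JSON, and the decode clause doubles as input validation.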
I am glad that you think a BEAM-based back end solution will work well. I even thought of having a FreeBSD/Linux device (like a Raspberry Pi or Beagle Board) with Erlang/Elixir on the vehicle, for high availability and updates without downtime, but let’s not get ahead of ourselves.
As for one and a half billion events per month: if they are requests rather than fat computations, and are evenly distributed, that works out to around 600 per second, which should be fine. Here’s one benchmark that tried to spam Plug and Phoenix systems: https://gist.github.com/omnibs/e5e72b31e6bd25caf39a
The distribution of events is not uniform for motor vehicles; the bulk of the events are generated during morning and afternoon rush hours. For trucks, the events are more uniformly distributed because they tend to be on the move for more hours during the day. I am not sure about the volumes, but I will guess that 80% of the events are generated during 40 peak periods per month. The total duration is about 120 hours, which puts us at a cool 10_000_000 events per hour during peak times. I have to get the real numbers.
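For what it's worth, the back-of-envelope numbers above check out; a quick sanity check in Elixir (the 80% share is my guess, as stated):

```elixir
# Peak-rate estimate: 80% of the monthly volume squeezed into ~120 peak hours.
total_events = 1_500_000_000   # events per month
peak_share   = 0.8             # guessed share generated during peaks
peak_hours   = 120             # ~40 peak periods per month

per_hour = trunc(total_events * peak_share / peak_hours)
per_sec  = div(per_hour, 3600)

IO.puts("peak: #{per_hour} events/hour, ~#{per_sec} events/second")
# → peak: 10000000 events/hour, ~2777 events/second
```

So the peak load is roughly 5x the uniform ~600/s figure mentioned earlier, still well within reach of a small BEAM cluster.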
There are alerts generated for some products, like geo-fencing triggers or detecting abnormal driving behaviour. These can be quite computation intensive. Then there are products that suggest alternative routes for trucks based on road conditions in some pretty remote places.
So there are many small things that have to be handled concurrently - sounds like a fun task for Elixir / the BEAM. You’ll probably not get away with a single server, though.
I’d cover the basics, stress test the whole thing and then decide if I’m OK with the performance of the more expensive tasks without resorting to ports and something like Go or C to handle them. There might be someone able to give you more details on how to handle that.
Just out of curiosity: the current stack that you describe is actually not that bad for this kind of job, is there a problem with it that would justify a complete rewrite?
The event engine sounds like a perfect fit for GenStage/Flow. Here is an article about Discord using it for Overwatch with 25,000 concurrent users and 15k-message bursts. They had a small buffer problem where they had to limit their in-progress runs to 100, and they were still able to handle it with minimal shedding while taking advantage of all of the back pressure mechanisms that GenStage offers. I think with a different use case you should be able to do it “easily” with 100% durability.
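For anyone unfamiliar with GenStage, the core idea is demand-driven (pull-based) flow. A sketch with plain GenServers, using only the standard library (the module names are made up), illustrates the backpressure that GenStage formalizes:

```elixir
# The consumer *asks* the producer for at most `demand` events, so a slow
# consumer naturally throttles the producer instead of being flooded by a
# push-based pipeline. GenStage/Flow give you this, plus dispatchers,
# batching and partitioning, out of the box.
defmodule Demo.Producer do
  use GenServer

  def start_link(events), do: GenServer.start_link(__MODULE__, events)
  def ask(pid, demand), do: GenServer.call(pid, {:ask, demand})

  @impl true
  def init(events), do: {:ok, events}

  @impl true
  def handle_call({:ask, demand}, _from, events) do
    # hand over at most `demand` events; keep the rest buffered
    {batch, rest} = Enum.split(events, demand)
    {:reply, batch, rest}
  end
end

defmodule Demo.Consumer do
  # pull at most 100 events at a time (cf. Discord's in-progress limit)
  def run(producer, demand \\ 100, acc \\ []) do
    case Demo.Producer.ask(producer, demand) do
      [] -> Enum.reverse(acc)
      batch -> run(producer, demand, Enum.reverse(batch, acc))
    end
  end
end
```

Usage: `{:ok, p} = Demo.Producer.start_link(Enum.to_list(1..250)); Demo.Consumer.run(p)` drains the producer in batches of 100, 100 and 50 without ever holding more than 100 in flight.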
Also I think “exposing the back end to advanced analytics engines” is an easy and perfect fit for Elixir apps in an umbrella project. This is one of the real strengths IMO of Elixir/Erlang. It’s like “microservices” but a more natural fit with OTP and supervision (which makes me usually avoid using the loaded term microservices).
OK, so here is how I would look at it. I work on some similar solutions for a car manufacturer. Sadly we do not use Elixir, but I do use Elixir a lot at home.
Gateway: Elixir will shine here. It will mainly depend on your transport layer and your goals for the service, but it should be completely doable and not too hard.
Event processing: that is the part GenStage could do, but some work would need to be done on top of it. I have been working on that type of thing recently, to see if we need to update GenStage to make it better. Doable, but GenStage is still a bit of a “low level API” for that, so it will need a bit of work.
Storage facility: that depends a lot on your needs. We need more data here.
Exposing the back end: not really hard either.
Something I would try, especially if you can deal with the latency it adds, is to keep Kafka between some of these back ends. Kafka adds resiliency and can absorb load when shedding may not be acceptable. It is not always the right solution, but it helps… and it makes it easy to plug other back ends onto the firehose later.
Elixir will be perfect. I am 50% of the way into setting up a Nerves project for handling the events from a smart city project.
Basically it’s a network of LoRa gateways, each with a Nerves/Raspberry Pi to handle the events of some 13,000 units, passing them to a CentOS Elixir/Riak KV system. Then a Phoenix app to show the visuals.
Look into Nerves if you need gateways/substations that need to interact with hardware. It’s brilliant.
Riak KV was chosen because we have a lot of small data that we can’t afford to lose.
Thanks for the advice so far. I will propose a PoC along the following lines:
Use gen_coap for the data plane.
GenStage to do event processing.
Store raw data in an Azure Data Lake for advanced analytics. It is a strategy of the organisation to do all big data this way going forward. This is a REST call to store information in the data lake.
Simulate telemetry from 10,000 trucks/haulers/lorries that report GPS co-ordinates, speed, engine temperature, engine RPM, pneumatic system pressure and electrical system data at 1 second intervals — that is, we do 10,000 tps.
The primary goal is to use only technologies one gets with the standard install, and only packages from hex.pm. That is, I want to install stock Elixir and use only it and whatever I stick in my mix deps function to make it work, no need to install a database and a queuing system — unless it is absolutely necessary.
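A minimal sketch of the simulator side of that PoC, using only stock Elixir (the module name and reading shape are my own invention; the `send/2` would later be swapped for the gen_coap or TCP client):

```elixir
# Each simulated "truck" is a process that emits one telemetry reading per
# interval to a collector process. 10,000 trucks at 1s intervals is just
# 10,000 lightweight processes on the BEAM.
defmodule Sim.Truck do
  def start(collector, id, interval_ms \\ 1_000) do
    spawn(fn -> loop(collector, id, interval_ms) end)
  end

  defp loop(collector, id, interval_ms) do
    reading = %{
      id: id,
      # rough Southern Africa lat/lon box, purely illustrative
      gps: {:rand.uniform() * 10 - 30, :rand.uniform() * 10 + 20},
      speed: :rand.uniform(120),
      engine_temp: 70 + :rand.uniform(40),
      rpm: 800 + :rand.uniform(2200),
      ts: System.system_time(:millisecond)
    }

    send(collector, {:reading, reading})
    Process.sleep(interval_ms)
    loop(collector, id, interval_ms)
  end
end
```

Spinning up the full fleet is then `for id <- 1..10_000, do: Sim.Truck.start(collector, id)`, and the collector (or a GenStage producer wrapping it) becomes the entry point for the event pipeline.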
One question I have is which OS is better suited to the PoC? Linux or FreeBSD? I am an old Solaris kernel developer and am very familiar with Zones (Jails), DTrace and ZFS, but is that a good fit for “the masses”?
I come from Solaris as well and you’ll feel at home using FreeBSD instead of GNU/Linux. You can compile Erlang on FreeBSD with DTrace support enabled. In GNU/Linux you’ll need to enable SystemTap tracing instead. Also, kernel polling (kpoll) is faster in FreeBSD and you can use it with Erlang (--erl "+K true").
PS: I work in transportation and logistics (regular cars) and we use Elixir to index all GPS reports (among other things) on FreeBSD.
Sounds awesome! Just chiming in here: another happy FreeBSD user and also a CouchDB user. It may not be immediately obvious, but you can run Elixir apps directly inside the CouchDB VM, which gives superb performance. All you need to do is add -s <app> to your /etc/rc.conf.d/couchdb file per https://github.com/freebsd/freebsd-ports/blob/master/databases/couchdb/files/couchdb.in#L23 and it will be started & maintained along with CouchDB. I ship the code today as a custom package that deploys into /usr/local/lib/couchdb/plugins/, which is a delightfully simple solution.
I would consider your database setup early on, as the need for handling network partitions from Africa to Azure is really important. CouchDB on a Raspberry Pi would make an excellent local store until the network is available again to sync through to the main repo. RabbitMQ has some nice IoT-flavoured transports like MQTT, but you would then need to create a local store first, before sync.
I would just like to say that I think you picked the right technologies. I work for a startup where we deal with data from cars (a connected car platform). The load you mention is similar to mine. Elixir/Erlang/OTP is a great and really capable technology. Though I am using MQTT (a broker written in Erlang) for the 3G/4G traffic part. On the devices in cars we use C-based software. Then we use 2-way TLS, there is some Golang code for small services, Kafka for keeping everything together, and currently still Postgres, but we have things prepared for Cassandra. Elixir is great. The whole ecosystem is great. Just believe in yourself.