I still remember Jose’s talk about GenRouter; I am guessing that GenBroker will be its actual name. If I remember correctly, one of the goals behind GenBroker is to make it possible to build pipeline parallelism and systems with worker pools.
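To make concrete what I mean by pipeline parallelism with worker pools, here is a minimal sketch in Python using only the standard library. It is not GenBroker's API and not Elixir; the names `run_stage` and `run_pipeline` are made up for illustration. Each stage is a pool of threads that reads from one queue and writes to the next, so different stages can work on different items at the same time.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_stage(func, inbox, outbox, workers):
    """Start a pool of `workers` threads applying `func` to items from inbox."""

    def worker():
        while True:
            item = inbox.get()
            if item is _DONE:
                inbox.put(_DONE)  # put it back so sibling workers also stop
                break
            outbox.put(func(item))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    def closer():
        # Once every worker in this stage has finished, tell the next stage.
        for t in threads:
            t.join()
        outbox.put(_DONE)

    threading.Thread(target=closer).start()


def run_pipeline(items, stages, workers=4):
    """Push `items` through `stages` (a list of functions), one pool per stage."""
    source = queue.Queue()
    current = source
    for func in stages:
        nxt = queue.Queue()
        run_stage(func, current, nxt, workers)
        current = nxt

    for item in items:
        source.put(item)
    source.put(_DONE)

    results = []
    while True:
        item = current.get()
        if item is _DONE:
            break
        results.append(item)
    return results  # order is not guaranteed across workers


if __name__ == "__main__":
    out = run_pipeline(range(10), [lambda x: x + 1, lambda x: x * 2])
    print(sorted(out))
```

Because several workers run per stage, results can arrive out of order, which is why the usage example sorts them. In Elixir each worker would presumably be a process rather than a thread, but the shape of the idea is the same.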
The reason I am interested is that I am currently using Spark for big data. One of the main reasons Spark exists is to solve the distributed big data problem, that is, how to process and analyze data in a distributed manner without losing the ability to do e.g. machine learning and other tasks that go beyond map and reduce.
Setting up and using Spark feels very heavy: installing it and getting started is painful. Thankfully Docker is there to help, but you hit a lot of Java/Scala specifics even if you just want to use it from Python. So I began to wonder about Elixir in this context. The first thing you notice is that it is missing a numerics library similar to NumPy. I knew about NIFs but had never used any; then I found Rustler, a way to write NIFs in Rust, which seems like a perfect match for Elixir since it is safer than C/C++ (is it always safe unless you use unsafe?).
Questions
- What is the status of GenBroker?
- Is there a fundamental reason why it could solve the distributed big data problem in a way similar to Spark?
Opinions
- I believe that a numeric library can and should exist for Elixir using NIFs. Given that Python leverages low-level routines in C/C++/Fortran through NumPy, Elixir should be able to do the same.
- Spark’s RDDs are very powerful, but I wonder whether ETS tables could play a similar role.
- Elixir/Erlang is in a unique position in the sense that concurrency, distribution and parallelism are “native”, well understood and well behaved. The Erlang/Elixir community should start looking into how to leverage this power for big data, which is becoming increasingly important.