Using Elixir for Data Science and Machine Learning

dyowee23 · June 13, 2016, 8:47am

Will future versions of Elixir be appropriate for data science/machine learning projects, like Scala or F#?

AstonJ · June 13, 2016, 10:59am

There are a few threads here that might be of interest to you: https://elixirforum.com/search?q=machine%20learning

swennemans · June 13, 2016, 2:07pm

There is also the Handbook of Neuroevolution Through Erlang.

murphy · June 14, 2016, 8:06am

Can someone tell me about their experience with this book?
(It’s kinda expensive, but it looks really interesting to me.)

Hoegbo · June 14, 2016, 1:39pm

It’s worth the money.

It’s been some time since I plowed through it, but overall it’s a great read and should get you up and running with the basics. I’d recommend knowing more than the basics of Erlang before reading it, though.

If AI and Elixir/Erlang is what you want to do, then this is a mandatory book to read through, in my opinion.

Hoegbo · June 14, 2016, 1:51pm

We are using Elixir AI systems in production. Elixir/Erlang is fantastic for it. We could use more native libraries; there are 4-5 that I know of, and there is a guy who has 3 libraries on Hex. We rolled our own stuff, though, and are hoping to open source the backbone of it eventually.

The only downside is the lack of libraries compared to certain other languages. But since the BEAM is awesomeness incarnate, making your own stuff is fairly easy. Break everything into small enough pieces and calculation performance will not hinder you at all. I could write wall to wall about how awesome Elixir/Erlang is for AI, especially neural networks.

cgarciae · June 14, 2016, 11:05pm

Here is my experience so far:

Two months ago I had to process a big CSV file, transform it, and write the output to another file. Python with Pandas was running out of memory because some operations required pure Python code, so I gave Elixir a shot. The good thing is that the system didn’t freeze due to lack of memory, the whole process was a single pipeline (parallel as much as it could be), and it was very easy to program. The bad thing is that it was SLOWWWWWW. I switched to Spark; Scala + RDDs made the problem look easy.
This month I’ve been playing around with TensorFlow. TensorFlow runs in C++, so it helps with performance even if you are on Python; I take it that it also uses all the CPU cores, which helps. But last week I implemented a CNN for MNIST and started to feel the weight of the model, so I am now looking to run the computation on the GPU to gain some performance.

My thoughts on the matter: Elixir would be an awesome language for Machine Learning and Big Data, but it still lacks: 1) libraries with C / C++ / Rust? bindings through NIFs or Ports for high-performance computation (we need the equivalent of NumPy, Pandas, and TensorFlow), and 2) the attention of the scientific community, which is currently in Python, with some migrating to Julia.

I’d be interested to hear concrete examples from @Hoegbo, because the Erlang VM is not good at number crunching. You can have each layer as a process if you like, but that isn’t going to get you anywhere if you don’t have BLAS or CUDA. Hopefully one day it does.

marcusjwhelan · June 4, 2017, 12:07am

@Hoegbo @cgarciae has there been any advance in ML with Elixir? Is there a way to have Elixir pass variables to and from Python functions, so that Python does the number crunching but is controlled by Elixir’s concurrency model?

WolfDan · June 4, 2017, 1:36am

I’m looking for the same, as mentioned in this amazing talk

We need more libraries in Elixir for ML. I’m pretty sure that ML in Elixir has a lot of potential, but making libraries for ML is not easy work.

marcusjwhelan · June 4, 2017, 1:55am

Well, that is what I am saying. Why can’t we use something like ErlPort or Apache Thrift to let Python crunch the numbers, and simply use Elixir as the controller of the Python methods, getting method-level concurrency through Elixir’s message system? Can this not be done?

WolfDan · June 4, 2017, 3:27am

Can it be done? Yes.

Is it the right way to do it? I don’t think so, because ports in Erlang can have a high impact if not used correctly and can crash the BEAM. For plain operations it’s another story, though; I don’t think those will crash the BEAM. I think with some research we can build a good library for Elixir ML.

michalmuskala · June 4, 2017, 7:10am

Ports are extremely safe. They are, in principle, a mechanism that enables you to treat an OS process as an Erlang process. This means you can monitor them, link, etc just like an Erlang/Elixir process.

I know of at least one team using Python to do ML that way: they have Python scripts doing the ML and call them from Elixir using ports.

You might also find this talk from ElixirConf EU interesting; it covers controlling Python, Julia and R from Elixir.
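To make the port idea concrete, here is a minimal sketch of what the Python side of such a setup could look like. It assumes the Elixir side opens the port with something like `Port.open({:spawn, "python3 worker.py"}, [:binary, {:packet, 4}])`, so every message is prefixed with a 4-byte big-endian length. The file name `worker.py`, the JSON payload format, and the `mean_handler` function are all illustrative assumptions, not anything a particular library prescribes (ErlPort, for one, has its own wire format).

```python
import io
import json
import struct


def read_packet(stream):
    """Return the next length-prefixed payload, or None at end of stream."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (size,) = struct.unpack(">I", header)  # 4-byte big-endian length
    payload = stream.read(size)
    if len(payload) < size:
        return None
    return payload


def write_packet(stream, payload):
    """Write one payload with the same 4-byte length prefix and flush it."""
    stream.write(struct.pack(">I", len(payload)) + payload)
    stream.flush()


def serve(inp, out, handler):
    """Answer requests until the input stream closes (the port was closed)."""
    while True:
        payload = read_packet(inp)
        if payload is None:
            break
        request = json.loads(payload)
        write_packet(out, json.dumps(handler(request)).encode("utf-8"))


def mean_handler(request):
    """Hypothetical 'number crunching' step: the mean of a list of numbers."""
    values = request["values"]
    return {"mean": sum(values) / len(values)}


def frame(obj):
    """Build one framed request, as the Elixir side would send it."""
    body = json.dumps(obj).encode("utf-8")
    return struct.pack(">I", len(body)) + body


# In-process demo with in-memory streams standing in for stdin/stdout.
inp = io.BytesIO(frame({"values": [1, 2, 3, 6]}))
out = io.BytesIO()
serve(inp, out, mean_handler)
out.seek(0)
reply = json.loads(read_packet(out))
print(reply)
```

In a real worker script you would call `serve(sys.stdin.buffer, sys.stdout.buffer, mean_handler)` instead of the in-memory demo. When the Elixir side closes the port, stdin reaches end of file and the loop exits, which is part of why ports behave so much like ordinary linked processes.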

marcusjwhelan · June 4, 2017, 9:27pm

@michalmuskala so using ErlPort would work fine, or is there a better way? Or would it have to be scripts that you pass variables to and from? I ask because I am trying to do this exact thing. I want Elixir to be the master controlling async processes on a bus, so that Python processes could block other Python processes through the Elixir message bus. Or at least something like this.

I would be using Phoenix with a React front end and Ecto with PostgreSQL, with Python doing all the ML dirty work. Since Python already has TensorFlow, scikit-learn, NumPy, and pandas, it is at least better at mathematical computations than Elixir.

Qqwy · June 5, 2017, 8:59am

@marcusjwhelan A port is treated from the Elixir/Erlang side as a process. This means you can send messages to it, and also receive messages from it. There are slightly higher-level wrappers like ErlPort that expose an API to call any exposed functions directly from the other language.

Setting up a connection between Elixir and Python using ErlPort is very easy (I did it before to call into some NLTK-stuff in Python from Elixir), and probably the direction you’d want to take.

marcusjwhelan · June 5, 2017, 1:12pm

@Qqwy Thank you. I have been asking so many people what the best approach is; either they don’t know you can do this or they don’t know a good way to do it. I have seen people recommend ErlPort but also say that it doesn’t work well all the time: something about data loss, or variables not matching up.

But here is my final question. Can I have a python process working on an elixir thread block another python process on another thread through elixir’s messaging system?

idi527 · June 5, 2017, 1:26pm

Doesn’t ErlPort start a Python interpreter for every process? I don’t quite remember. If that’s true, it might not be the most efficient way.

Qqwy · June 5, 2017, 1:41pm

@idi527 Rather, ErlPort starts one python interpreter, which is one process (and which can be interacted with like any other process). You can indeed run multiple instances of these side-by-side but usually I don’t think there is a need to do this.

@marcusjwhelan

Can I have a python process working on an elixir thread block another python process on another thread through elixir’s messaging system?

I think this is only possible if the to-be-blocked Python port listens for messages, i.e. using cooperative scheduling. In-VM processes are managed by the scheduler(s) and are therefore scheduled preemptively. This is not the case for a port (which is a connected operating-system-level process, meaning that the operating system also handles how its CPU time is scheduled).

I do wonder where this question comes from, as it seems like a weird and complex thing that might be avoided by structuring your application differently.
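For what it’s worth, the cooperative part could be sketched roughly like this on the Python side. This is only an illustration under assumptions: the `"pause"` / `"resume"` message names and the `run_cooperatively` helper are made up, and in a real port the control queue would be fed by a thread reading messages that Elixir sends over stdin.

```python
import queue


def run_cooperatively(steps, control):
    """Run steps in order, checking for control messages between steps.

    The worker can only be blocked at these check points, which is what
    makes the scheme cooperative rather than preemptive.
    """
    results = []
    paused = False
    for step in steps:
        while True:
            try:
                # While paused, wait for the next message; otherwise just poll.
                message = control.get(block=paused)
            except queue.Empty:
                break
            if message == "pause":
                paused = True
            elif message == "resume":
                paused = False
        results.append(step())
    return results


# Demo: a pause immediately followed by a resume, queued before the run.
control = queue.Queue()
control.put("pause")
control.put("resume")
results = run_cooperatively([lambda: 1, lambda: 2], control)
print(results)
```

A long-running step that never returns to the check point cannot be blocked this way, which is the practical difference from processes scheduled inside the VM.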

marcusjwhelan · June 5, 2017, 2:05pm

@Qqwy
I was simply curious about the possibility of doing this. But I see now that yes, this can just be avoided. You shouldn’t need Python processes blocking each other: if you need another language that doesn’t support the same concurrency model, forcing it to would be illogical.

I just had this crazy idea to have Python processes create models, with nodes of Python scripts that each work on a single model, so that you could add classifiers and build a larger, dynamic neural network. Something like the TensorFlow Playground, where each column’s Python scripts would not start to run until all of the previous column’s scripts had finished. But that means it is synchronous anyway. So I was wrong to think I needed multiple concurrent processes that block one another.
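The column-by-column idea above can be sketched in a few lines: units inside one column run in parallel, but a column only starts once the previous one has completely finished. The `run_columns` helper and the toy units are hypothetical stand-ins for real model scripts.

```python
from concurrent.futures import ThreadPoolExecutor


def run_columns(columns, inputs):
    """Feed inputs through the columns, one column at a time.

    Each unit in a column receives the full output list of the previous
    column. Collecting the results into a list acts as the barrier: the
    next column does not start until every unit in this one has returned.
    """
    with ThreadPoolExecutor() as pool:
        for column in columns:
            current = inputs
            inputs = list(pool.map(lambda unit: unit(current), column))
    return inputs


# Toy network: two units in the first column, one in the second.
columns = [
    [sum, max],                       # outputs [6, 3] for the inputs below
    [lambda xs: xs[0] + xs[1]],       # adds the two outputs together
]
result = run_columns(columns, [1, 2, 3])
print(result)
```

As noted above, the barrier between columns makes the overall flow synchronous; the only concurrency is among units of the same column.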

AstonJ · May 17, 2018, 11:15am

3 posts were split to a new topic: Matrex - Fast matrix manipulation library for Elixir (Machine Learning)

IRLeif · October 9, 2018, 5:45pm

I’m a data scientist who loves Elixir.

There is an incredible amount of potential for Elixir in data science and machine learning, since the language provides excellent facilities for data transformation through pattern matching, piping, etc.

Also, the functional programming paradigm is a much more natural fit for data science than object-oriented and imperative programming, being conceptually closer to the problem space.

At my workplace, we primarily use Python (but also some R and Scala) for data science; these are the de facto standards in this domain. The reason, I think, is their extensive and useful libraries, such as pandas, scikit-learn, TensorFlow (by Google), Keras and many others.

Elixir has a lot of catching up to do in the library department with respect to Python.

One other aspect of Elixir (and the BEAM in general) that is of interest to data science is the way it allows for easy distribution and scaling, at least on the CPU side. However, I’m not sure how to interface with GPUs from Elixir (e.g. via CUDA), which is essential for effective machine learning.

At the moment, I’m looking into using Elixir for hosting pre-trained machine learning models (trained using TensorFlow and Keras) and exposing them to consuming applications via APIs, etc. @anshuman23 has already created Tensorflex for that purpose, which looks very promising.

It is also sensible to use Elixir for the aspects of data science which have to do with gathering and preparing the data set you need to train your models (aka “tidy data”).