Using Elixir for Data Science and Machine Learning

@marcusjwhelan From the Elixir/Erlang side, a port is treated as a process. This means you can send messages to it and receive messages from it. There are also slightly higher-level wrappers, such as ErlPort, that expose an API for calling exposed functions in the other language directly.

Setting up a connection between Elixir and Python using ErlPort is very easy (I did it before to call into some NLTK code in Python from Elixir), and it is probably the direction you'd want to take. :slight_smile:
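To make the ErlPort route a bit more concrete, here is a minimal sketch of what the Python side might look like. It assumes a hypothetical module `nlp_helpers.py` on the interpreter's path, with plain whitespace splitting standing in for real NLTK calls. From Elixir you would start the interpreter with something like `{:ok, py} = :python.start(python_path: ~c"path/to/dir")` and call `:python.call(py, :nlp_helpers, :tokenize, ["some text"])`. Elixir strings are binaries, which ErlPort typically hands to Python 3 as `bytes`; returning `bytes` should give Elixir back ordinary binaries. Check the ErlPort type-mapping docs for your version before relying on this.

```python
"""nlp_helpers.py: a hypothetical module meant to be called from Elixir via ErlPort."""


def _to_text(value):
    # Elixir strings are binaries; ErlPort typically delivers them as bytes.
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


def tokenize(text):
    """Split text on whitespace and return the tokens as UTF-8 bytes.

    Returning bytes (rather than str) is intended to give Elixir a list of
    binaries, i.e. ordinary Elixir strings.
    """
    return [token.encode("utf-8") for token in _to_text(text).split()]


def word_count(text):
    """Return the number of whitespace-separated tokens as a plain int."""
    return len(tokenize(text))
```

Keeping the Python functions pure (input in, value out) makes them easy to test on their own, independently of the port.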

@Qqwy Thank you, I have been asking so many people what the best approach is. Either they don't know you can do this or they don't know a good way to do it. I have seen people recommend ErlPort, but also say that it doesn't always work well: something about data loss, or variables not matching up.

But here is my final question. Can I have a Python process working on an Elixir thread block another Python process on another thread through Elixir's messaging system?

Doesn't ErlPort start a Python interpreter for every process? I don't quite remember. If that's true, it might not be the most efficient approach.

@idi527 Rather, ErlPort starts one Python interpreter, which is a single process (and which can be interacted with like any other process). You can indeed run multiple instances side by side, but I don't think there is usually a need to do so.

@marcusjwhelan

Can I have a Python process working on an Elixir thread block another Python process on another thread through Elixir's messaging system?

I think this is only possible if the to-be-blocked Python port listens for messages, i.e. uses cooperative scheduling. In-VM processes are managed by the BEAM scheduler(s) and are therefore scheduled preemptively. That is not the case for a port, which is connected to an operating-system-level process, so the operating system decides how its CPU time is scheduled.
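A rough pure-Python analogy of "only blockable if it listens for messages" (this is not ErlPort or BEAM code): in a cooperative scheduler, tasks are generators that run until they voluntarily `yield`, and another task can only influence a worker at the points where the worker checks its mailbox. `Scheduler`, `worker`, `controller` and the `"block"`/`"unblock"` messages are all hypothetical names made up for this sketch.

```python
from collections import deque


class Scheduler:
    """A tiny cooperative round-robin scheduler: tasks are generators that must yield."""

    def __init__(self):
        self.tasks = deque()
        self.mailboxes = {}
        self.log = []

    def spawn(self, name, func):
        self.mailboxes[name] = deque()
        self.tasks.append(func(self, name))

    def send(self, name, message):
        self.mailboxes[name].append(message)

    def receive(self, name):
        # Non-blocking receive: returns None when the mailbox is empty.
        box = self.mailboxes[name]
        return box.popleft() if box else None

    def run(self):
        while self.tasks:
            task = self.tasks.popleft()
            try:
                next(task)  # run the task until its next voluntary yield
            except StopIteration:
                continue  # finished tasks are dropped
            self.tasks.append(task)


def worker(sched, name):
    blocked = False
    done = 0
    while done < 3:
        # The mailbox check is the ONLY point where another task can block us.
        msg = sched.receive(name)
        if msg == "block":
            blocked = True
        elif msg == "unblock":
            blocked = False
        if blocked:
            sched.log.append(f"{name} waiting")
        else:
            done += 1
            sched.log.append(f"{name} step {done}")
        yield  # hand control back voluntarily


def controller(sched, name):
    sched.send("worker", "block")
    yield
    yield
    sched.send("worker", "unblock")
    yield


def demo():
    sched = Scheduler()
    sched.spawn("worker", worker)
    sched.spawn("controller", controller)
    sched.run()
    return sched.log
```

Running `demo()` shows the worker finishing step 1 before it ever sees the `"block"` message, waiting for two turns, and then resuming. A worker that never called `receive` could not be blocked at all, which is the situation with an OS-scheduled port that doesn't listen for messages.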

I do wonder where this question comes from, as it seems like a weird and complex thing that might be avoided by structuring your application differently.

@Qqwy
I was simply curious about the possibility of doing this, but I see now that it can be avoided. You shouldn't need Python processes blocking one another: if you bring in another language that doesn't support the same concurrency model, forcing that model onto it would be illogical.

I just had this crazy idea: Python processes would create models, with nodes of Python scripts that each worked on a single model, so that if you wanted to add classifiers and build a larger neural network, it could grow dynamically. Something like the TensorFlow Playground, where each column's Python scripts would not start running until all of the previous columns had finished. But that means it is synchronous anyway, so I was wrong to think I needed multiple concurrent processes that block one another.
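The "each column waits for the previous one" idea is essentially a barrier between layers. A minimal sketch, assuming each node is a plain Python callable that receives the previous column's full output list (a hypothetical wiring chosen for illustration, like a fully connected layer). Nodes within a column run concurrently, but the column as a whole is still synchronous, which matches the conclusion above.

```python
from concurrent.futures import ThreadPoolExecutor


def run_columns(columns, inputs):
    """Evaluate columns of nodes left to right.

    Nodes inside one column run concurrently; the next column starts only after
    every node in the current column has returned (a barrier between columns).
    Each node is a callable receiving the previous column's full output list.
    """
    values = list(inputs)
    with ThreadPoolExecutor() as pool:
        for column in columns:
            # Submit every node in the column against the same input snapshot.
            futures = [pool.submit(node, values) for node in column]
            # Collecting all results is the barrier: nothing moves on until all finish.
            values = [future.result() for future in futures]
    return values


def multiply_first_two(xs):
    return xs[0] * xs[1]
```

For example, `run_columns([[sum, max], [multiply_first_two]], [1, 2, 3])` computes `[6, 3]` in the first column and `[18]` in the second. With CPU-bound Python nodes, threads are limited by the GIL, so `ProcessPoolExecutor` (or separate interpreters, as with ErlPort) would be the more realistic choice.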

3 posts were split to a new topic: Matrex - Fast matrix manipulation library for Elixir (Machine Learning)

I’m a data scientist who loves Elixir.

There is an incredible amount of potential for Elixir in data science and machine learning, since the language provides excellent facilities for data transformation through pattern matching, piping, etc.

Also, the functional programming paradigm is a much more natural fit for data science than object-oriented and imperative programming, being conceptually closer to the problem space.

At my workplace, we primarily use Python (but also some R and Scala) for data science; these are the de facto standards in this domain. I think the reason is their extensive and useful libraries, such as pandas, scikit-learn, TensorFlow (by Google), Keras and many others.

Elixir has a lot of catching up to do in the library department with respect to Python.

Another aspect of Elixir (and the BEAM in general) that is of interest to data science is how easily it allows for distribution and scaling, at least on the CPU side. However, I'm not sure how to interface with GPUs from Elixir (e.g. via CUDA), which is essential for effective machine learning.

At the moment, I'm looking into using Elixir to host pre-trained machine learning models (trained using TensorFlow and Keras) and expose them to consuming applications via APIs, etc. @anshuman23 has already created Tensorflex for that purpose, which looks very promising.

It is also sensible to use Elixir for the aspects of data science which have to do with gathering and preparing the data set you need to train your models (aka “tidy data”).

I see that this post is from 2016 and quite a lot has probably changed since then. I’m curious to hear your experiences with using Elixir/Erlang for AI and neural networks since then, if you’re still at it.

I stumbled upon these amazing blog posts by @TheQuengineer:

Perhaps they will be useful to others who follow this thread.

This is going to sound harsh, but those posts seem quite naive: a three-layer neural network is not “deep learning”, and using Erlang/Elixir processes as individual neurons would scale terribly. DL networks are essentially a pipeline of tensor operations that transform input tensors into output tensors. Horizontal scaling via distributed computing is horribly inefficient for this; you need massive hardware parallelism, e.g. GPUs via CUDA. Elixir might have some promise as a front end to TensorFlow or another C/C++/Rust library, but IMO implementing things in pure Elixir is a non-starter for anything but toy problems.
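To illustrate the "pipeline of tensor operations" view: each dense layer is one matrix-vector product plus a bias and an activation, applied to the whole vector at once, rather than one process per neuron. In this sketch, plain Python lists stand in for tensors and the weights are made-up toy values; a real framework would dispatch the same products to optimized CPU or GPU kernels.

```python
def matvec(weights, vector):
    """Matrix-vector product: one row of weights per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def relu(vector):
    return [max(0.0, v) for v in vector]


def identity(vector):
    return vector


def dense(weights, bias, activation):
    """Return a layer function computing activation(W @ x + b) for a whole vector at once."""
    def layer(x):
        return activation([z + b for z, b in zip(matvec(weights, x), bias)])
    return layer


def forward(layers, x):
    """A network is just function composition: each layer transforms the previous output."""
    for layer in layers:
        x = layer(x)
    return x


network = [
    dense([[1, 0], [0, 1], [1, 1]], [0, 0, -1], relu),  # 2 inputs -> 3 hidden units
    dense([[1, 1, 1]], [0.5], identity),  # 3 hidden units -> 1 output
]
```

Here `forward(network, [2, 3])` returns `[9.5]`. Because every layer is a bulk linear-algebra step, the work parallelizes across the elements of the matrices (what GPUs excel at) rather than across many small message-passing processes.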

Yes, I agree with what you’re saying, @jamesnorton. I found those articles to be interesting nonetheless, despite the limited practical utility of the proposed approach.

I don’t see many blog posts and articles about using Elixir for machine learning and deep learning in particular, likely due to the limitations you mentioned. I’m easily excited.

I welcome any writings on the subject, if only as food for thought :slight_smile:

As you can see in my post above from a couple of days ago, I share your concerns about the computational aspects of machine learning in pure Elixir, accessing GPUs via CUDA, etc.

Personally, I’m more hopeful about the possibilities of using Elixir for the non-computational and operational aspects of data science, such as data gathering and wrangling, model serving/exposure, model distribution for federated/collaborative learning, monitoring model behaviour, etc.

I am not a data scientist, but I'm interested in the subject and stumbled upon this library. I haven't tried or tested it, but since it was mentioned that any serious machine learning needs CUDA access for GPU processing, I thought it could be of interest to you.

The library has not been updated in a while but it seems someone already looked into CUDA bindings for Elixir/Erlang.

OpenCL seems better than CUDA, since it would let you use other stream processors, FPGAs, etc. CUDA locks you into Nvidia hardware when there is so much other hardware available. Even if you only wanted to constrain yourself to GPUs, Vulkan's compute layer would be the way to go, not CUDA.

This Erlang library for OpenCL might be worth checking out, then. It seems to have been around for a while, but it has been updated recently: https://github.com/tonyrog/cl. It should be semi-compatible with Elixir.

Disclaimer: I’m not a GPGPU expert. Please be skeptical about what I’m about to say.


The issue seems to be that major projects such as Google's TensorFlow only support CUDA and (unfortunately) not OpenCL (yet). To my limited knowledge in this area, I believe the primary reason is that TensorFlow depends on Eigen, which currently only supports CUDA.

It would be awesome to have a vendor-agnostic platform layer of sorts, independent of OpenCL, CUDA and any future GPGPU interface. Back when I was working in the games industry, I led a rendering team which developed an analogous proprietary platform layer for OpenGL and DirectX. Building something like that is a massive and time-consuming undertaking.
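The shape of such a platform layer can be sketched as one interface that application code targets, with one implementation per vendor API. In this minimal sketch, `ComputeBackend`, `register` and `get_backend` are hypothetical names, and only a pure-Python CPU reference backend is implemented; CUDA, OpenCL or Vulkan backends would be further subclasses wrapping those APIs.

```python
from abc import ABC, abstractmethod


class ComputeBackend(ABC):
    """Hypothetical vendor-agnostic interface; real GPU backends would subclass it."""

    name = "abstract"

    @abstractmethod
    def matmul(self, a, b):
        """Multiply two matrices given as lists of rows."""


class CpuBackend(ComputeBackend):
    """Pure-Python reference implementation, always available as a fallback."""

    name = "cpu"

    def matmul(self, a, b):
        cols = list(zip(*b))
        return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


_BACKENDS = {}


def register(backend):
    _BACKENDS[backend.name] = backend


def get_backend(preferred):
    """Return the first registered backend from the preference list, else the CPU one."""
    for name in preferred:
        if name in _BACKENDS:
            return _BACKENDS[name]
    return _BACKENDS["cpu"]


register(CpuBackend())
```

With only the CPU backend registered, `get_backend(["cuda", "opencl"])` falls back to `"cpu"`. The interface itself is the easy part; the massive undertaking lies in the per-vendor memory management, kernel compilation and performance tuning behind each `matmul`.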

It seems to me like Nvidia has an edge with CUDA, simply because that is what most established frameworks, libraries, tools and applications have chosen to target. Data scientists who make use of said products rarely have the know-how, time and interest to develop low-level GPU interfaces.

You might be interested in Intel’s nGraph:

nGraph Library is an open-source C++ library and runtime / compiler suite for Deep Learning ecosystems. With nGraph Library, data scientists can use their preferred deep learning framework on any number of hardware architectures, for both training and inference.

Interesting, thanks for the tip! I’ll check it out.

Let's keep in mind that supervised learning is only one approach to ML, and backpropagation (the heaviest number-crunching part of deep learning) is falling out of favor. DL is not very scalable, and approaches beyond supervised learning, particularly Reinforcement Learning, seem to be expected to be the key path to AGI (artificial general intelligence), according to Richard Sutton and others.

With that in mind, I have been successfully using Matrex for high-dimensional vectorized computation for Multi-Armed Bandits (elementary Reinforcement Learning). You can check it out here. It's part of The Automata Project. Down the road I may need to use Python or Julia via ErlPort (or maybe even Docker containers, as used here) for the vectorized parts, but for prototyping, things are going well so far.
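For readers new to bandits, here is a minimal epsilon-greedy sketch of the core loop: explore a random arm with probability epsilon, otherwise exploit the arm with the best estimated value, and update that estimate as an incremental mean. This is one standard bandit strategy, not necessarily what the Automata Project uses; it uses plain Python lists rather than Matrex, and the Bernoulli arm probabilities are made-up illustration values.

```python
import random


def epsilon_greedy(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on Bernoulli arms; return (estimates, counts)."""
    rng = random.Random(seed)
    n = len(arm_probs)
    estimates = [0.0] * n  # estimated mean reward per arm
    counts = [0] * n  # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: pick a random arm
        else:
            arm = max(range(n), key=lambda i: estimates[i])  # exploit: best estimate
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean update: Q <- Q + (r - Q) / N
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts
```

With arms `[0.2, 0.5, 0.8]`, the agent should end up pulling the last arm most often. In a vectorized implementation, `estimates` and `counts` become arrays, so many agents or arms can be updated in one operation.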

In my view, neuroevolution with Topology and Weight Evolving Artificial Neural Networks (TWEANNs) and Novelty Search is one of the most promising alternative approaches, and the BEAM (and therefore Elixir) has a head start in this subfield of ML thanks to Gene Sher's book, Handbook of Neuroevolution Through Erlang.

As far as python interop goes, something like this looks pretty appealing for scaling ML as well.

The Automata Project is seeking contributors if anyone here is interested.

I wrote a comment on a separate thread with some more relevant info on this topic:

I’m leaving a reference here for future adventurers to discover.

Hey @ericsteen, what is the landscape now that we have Nx at our disposal?
