I work with render farms at work, a form of grid computing. Think SETI@home, or Oracle Grid Engine.
I was wondering if Elixir, and specifically OTP, would be a good candidate for such a system? I ask because I have found very little material on this topic related to Elixir, or even Erlang. Unlike the distributed computing most Elixir/OTP articles talk about, where multiple nodes carry out the same task, grid computing passes a task to a specific node based on criteria for that job (RAM, CPU, etc.), and that node runs that specific task alone.
I’m wondering initially about the best approach to connecting worker nodes to a central scheduler process. Is that best done with OTP communication, HTTP (REST), RPC, sockets, etc.?
Nodes would have to register with the system with their capabilities (CPU, RAM, etc.) and jobs would be dispatched to each node. I would imagine Mnesia would work as a central store for available nodes?
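To make the register-then-match idea concrete, here is a minimal sketch of a capability registry; all module names, field names, and node names are made up for illustration, and in a real system the map would live in Mnesia or an ETS table owned by a supervised process:

```elixir
defmodule GridRegistry do
  @moduledoc """
  Minimal in-memory node registry sketch. A real system would persist
  this in Mnesia or ETS rather than threading a plain map around.
  """

  # Register a node under its name with a capability map.
  def register(registry, node_name, caps) when is_map(caps) do
    Map.put(registry, node_name, caps)
  end

  # Return the names of nodes whose capabilities meet every requirement.
  def capable_nodes(registry, requirements) do
    registry
    |> Enum.filter(fn {_name, caps} ->
      Enum.all?(requirements, fn {key, min} ->
        Map.get(caps, key, 0) >= min
      end)
    end)
    |> Enum.map(fn {name, _caps} -> name end)
  end
end

registry =
  %{}
  |> GridRegistry.register(:render01, %{cpu_cores: 32, ram_gb: 128})
  |> GridRegistry.register(:render02, %{cpu_cores: 8, ram_gb: 16})

GridRegistry.capable_nodes(registry, %{cpu_cores: 16, ram_gb: 64})
# => [:render01]
```

The scheduler then only ever dispatches to the nodes that come back from `capable_nodes/2`.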
As for a job, it could be anything: rendering, writing out geometry, shell commands, etc. I’ve looked at Port, which seems like the best option for this.
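Port is indeed the standard way to run external commands from the BEAM. A minimal sketch of a worker-side runner (a real one would stream stdout to a log and report the exit status back to the scheduler):

```elixir
defmodule JobRunner do
  # Run an external executable through a Port and collect its output.
  # Returns {exit_status, output}.
  def run(executable, args) do
    port =
      Port.open({:spawn_executable, executable}, [
        :binary,
        :exit_status,
        args: args
      ])

    collect(port, "")
  end

  defp collect(port, acc) do
    receive do
      {^port, {:data, data}} -> collect(port, acc <> data)
      {^port, {:exit_status, status}} -> {status, acc}
    end
  end
end

# JobRunner.run(System.find_executable("echo"), ["hello"])
# => {0, "hello\n"}
```

The `:exit_status` option is the important part for a farm: it is how the worker knows whether the render succeeded or should be marked failed.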
Initially I’m looking at some approaches/advice for setting up the “grid” part. How best to bring multiple nodes together to leverage the best of what Elixir has to offer, so that a single process/server can pass jobs to these nodes to execute.
If I had to design this I would probably first look at an Elixir client that phones home over a Phoenix channel to a central job scheduler server before I started a full-fledged node mesh.
It’s easier to get started, and the web layer forces you to think through the structure and formats of the commands to fan out, etc. If you encounter limitations you can look at doing microclusters and so on.
Such a thing could be made in any kind of language, but I feel the realtime aspects of Phoenix hold your hand enough that you can spend more time on your problem domain and not on the connectivity.
Node meshing is great, but it lends itself a lot more to homogeneous node groups than heterogeneous ones, imo.
This is more for my education than an answer to the OP. Based on the description of the overall architecture this seems like something where you would have nested supervisors. The main system supervisor would manage several sub-supervisors where each of these is dedicated to a particular task.
Is this also a use case for something like RabbitMQ?
I’ve done this sort of thing. At two companies I wrote task orchestration software with node-matching capabilities in Elixir. First place was containers, second place was VMs. The software was great. Elixir was a perfect match. The problem was political. The first company wanted golang. The second company wanted python, then switched over to golang.
So I left. At the first company, the product died. At the second company, they hired a very expensive programmer with AWS EXPERIENCE that I am not convinced is very good. He was already writing insanely complicated microservices on my way out (the CTO was undermining my authority on the project I owned).
Now I’m working at a place that is writing a CRUD app and I’m pretty happy.
To answer your architecture question, the way I have done it is to build out a protocol for the central server(s) to reach into the nodes. I suggest starting with a simple protocol (literally, I started with running ssh commands over the wire), then abstracting those calls into a common control module with a defined module behaviour/API. Then graduate to using agents (not the Agent module, just the general concept) on each node, and write an F ton of tests to make sure the agents fulfill the behaviour contract. When you’re ready, flip the switch and it can be a smooth transition.
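As a sketch of that "common control module with a defined behaviour" idea (module names and the shell transport are illustrative; the point is that the scheduler codes against the contract, not the transport):

```elixir
defmodule NodeControl do
  @moduledoc "Contract the scheduler codes against; transports are swappable."
  @callback run_command(node_addr :: String.t(), cmd :: String.t()) ::
              {:ok, output :: String.t()} | {:error, term()}
end

defmodule NodeControl.Shell do
  @behaviour NodeControl

  # Crude first transport: run the command through a local shell.
  # An ssh-based version would pass [node_addr, cmd] to "ssh" instead,
  # and a later agent-based version just implements the same behaviour.
  @impl NodeControl
  def run_command(_node_addr, cmd) do
    case System.cmd("sh", ["-c", cmd], stderr_to_stdout: true) do
      {output, 0} -> {:ok, output}
      {output, code} -> {:error, {code, output}}
    end
  end
end
```

Swapping ssh for richer agents later then only means writing a new module that implements `NodeControl` and passes the same test suite.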
I wound up writing my own protocols (check out the pony_express and erps libraries on hex.pm), but you could use RabbitMQ or Kafka if you don’t mind more infra burden, or you could also use Erlang mesh networking. I didn’t use mesh networking because my architecture was a recursive tree, and one of my node connections was literally cross-continental: commands needed to flow over the open internet, so they needed to be encrypted with TLS.
To be fair, you can connect Erlang meshes over TLS, but… probably you don’t want the huge latencies, and I hear “problems can happen”.
That’s what I was initially thinking. The clients (worker nodes) will register to a central REST API on a server, letting it know they are available and what they are capable of. The information on these clients can then be stored in a DB or similar store and when a job needs to be dispatched, the scheduler can query this store and send the jobs to the client.
I guess the clients will also be running a sort of REST API so they can receive these job details, or be told what to do?
Our current system works on the basis that we submit the job details to a node, and it will know what scripts to run to execute the commands. So essentially we are not passing in the full command, but the type and arguments required. This isn’t so much a project for work, but more of a personal interest project. The current system we have is written in C and communicates via sockets.
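That "type and arguments, not the full command" format could be sketched like this; the job types and script paths are invented for illustration, and the point is that the worker, not the scheduler, owns the mapping from type to script:

```elixir
defmodule JobSpec do
  # Worker-side table mapping job types to the scripts that run them.
  # Paths are hypothetical; a real worker would load these from config.
  @scripts %{
    render: "/opt/farm/bin/render.sh",
    geometry: "/opt/farm/bin/write_geo.sh"
  }

  # Turn a job spec (type + args) into the {script, args} to execute.
  def to_command(%{type: type, args: args}) do
    case Map.fetch(@scripts, type) do
      {:ok, script} -> {:ok, {script, args}}
      :error -> {:error, {:unknown_job_type, type}}
    end
  end
end

# JobSpec.to_command(%{type: :render, args: ["--frame", "42"]})
# => {:ok, {"/opt/farm/bin/render.sh", ["--frame", "42"]}}
```

This keeps the wire format small and means a compromised or buggy scheduler can only ever ask a node to run scripts the node already trusts.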
For commercially available equivalent systems you could look at Pixar’s Tractor or Thinkbox (now AWS) Deadline, amongst others.
Usually the clients are not always up, so they would need to register with the main controller. The clients would run as their own processes, outside of the main server’s supervision tree.
As for RabbitMQ, I’m not sure; I don’t know a lot about it as such, but AFAIK you give it some task(s) and it will send them to connected clients. A grid system is a little different in that you can allocate resources (clients), limit your job to a number of clients, target specific clients, etc.
My main reasons for choosing Elixir for this project are:
- I like Elixir and want to use it for something more than some toy projects
- I want to learn more about grid computing and how I can leverage Elixir for it
- Elixir’s (OTP’s) stability, fault tolerance and other features make it a good candidate for long running processes that also need to communicate with a central process.
I just wanted some experienced user input on which approach is most suitable/appropriate for connecting the nodes to a central scheduler and communicating with them (telling them what work to do).
The devil is in the details. How does someone debug a failed job? How do you fetch input data and store output data? That data can be huge. If you can sandbox everything, including data, you can probably use something similar to a CI/CD pipeline. However, it may not be the case. Some institutions use huge, expensive NFS servers for data, and they are a PITA to maintain.
Jobs would log to a location; Tractor has this configurable to an NFS mount, a local store, or wherever you like, tbh. Our current system writes to a local file which is served via Apache for viewing; we have engineering maintain that, so we don’t need to worry about it. For my purposes, writing a log to a local file is enough.
Input data is passed via job scripts, or HTTP. There are a number of possible ways to run a job; Tractor, for example, keeps it simple: you pass the command line to the client and it will run it.
You should use a DB to keep a log, but for low latency and consistent dispatch I recommend one GenServer, or better gen_statem (I wrote a library called StateServer to wrap gen_statem), per node to track its existence, availability, and state. Then you can write a scatter/gather function that asynchronously calls all of the registered node “avatars” with a request, at which point each state server responds with a thumbs up/down on availability for the job, and thus the job gets scheduled. You’ll want a durable queue of jobs (I recommend Oban), since if you are at capacity you’ll have to queue them.
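A minimal sketch of that per-node avatar plus scatter/gather idea, using a plain GenServer to keep it short (gen_statem would give you proper state transitions; all names here are illustrative):

```elixir
defmodule NodeAvatar do
  use GenServer

  # One avatar process per worker node, holding its capability map.
  def start_link(caps), do: GenServer.start_link(__MODULE__, caps)

  # Ask the avatar: can your node take a job with these requirements?
  def offer(pid, requirements), do: GenServer.call(pid, {:offer, requirements})

  @impl true
  def init(caps), do: {:ok, caps}

  @impl true
  def handle_call({:offer, reqs}, _from, caps) do
    ok? = Enum.all?(reqs, fn {k, min} -> Map.get(caps, k, 0) >= min end)
    {:reply, ok?, caps}
  end
end

defmodule Scheduler do
  # Scatter the job offer to all avatars concurrently, gather the
  # pids that answered thumbs-up.
  def candidates(avatars, requirements) do
    avatars
    |> Enum.map(fn pid ->
      Task.async(fn -> {pid, NodeAvatar.offer(pid, requirements)} end)
    end)
    |> Task.await_many()
    |> Enum.filter(fn {_pid, ok?} -> ok? end)
    |> Enum.map(fn {pid, _ok?} -> pid end)
  end
end
```

From there the scheduler picks one candidate, marks it busy in the avatar's state, and pops the next job from the durable queue when it frees up.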
I happen to think this is a fantastic use case for Elixir. Something like this with a basic one-node controller setup would probably be about 1000 lines of code for the core logic (you will have a lot of config code/use-case-specific logic). The great thing is that scaling this to multi-node for HA will be almost trivial: maybe 50 lines of code. Maybe. Hit me up if you want more tips!
When I referred to logs I meant the logs of the tasks, that is, their stdout and stderr. These aren’t required to be kept around forever and are viewed while the tasks run or shortly after they finish.
I’ll have to take a look into gen_statem as it’s new to me. Thanks for the information though, seems as though I should do some planning and thinking about this and see what I can come up with.