ProcessHub - process distribution library

Good catch! Thanks for the PR!
First pull request for the project :tada:

2 Likes

Released v0.2.5-alpha (Release v0.2.5-alpha · alfetahe/process-hub · GitHub)

This release includes bug fixes and documentation improvements.

2 Likes

This is a bit of a naive question, but how does this compare to using the built-in pg module for keeping track of processes in a cluster and doing service discovery that way?

3 Likes

This is a very good question! My initial plan was to use the pg module to track the registered processes.

Using pg would solve some of the issues, the biggest probably being keeping track of registered processes across the cluster. However, I would still need to figure out the underlying mechanisms to distribute processes, react to cluster changes, migrate processes to keep the load balanced, handle process state handovers, manage process replications, deal with network splits, and more.

My initial requirements kept growing, and I realized I needed all of that. At this stage, using pg to track processes was not an option anymore because I needed a way to store much more information regarding the registered processes. I could have used pg along with my custom ETS tables, but I decided to store all the data in the ETS tables and synchronize the table myself.

In fact, ProcessHub relies on another library I built which is based on pg. This same mechanism is used to synchronize the ETS tables between the nodes. Basically, I use the pg module to emit process registration/unregistration events to keep the ETS tables in sync. I use the same library for other pub/sub related jobs as well.
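The event-emission idea can be sketched in a few lines of Elixir. This is an illustrative sketch only (the module, group, and table names here are made up for the example, not ProcessHub internals): each node runs one process that joins a pg group, registration events are broadcast to all group members, and each member applies the event to its local ETS table.

```elixir
defmodule RegistrySync do
  use GenServer

  @group :registry_sync
  @table :proc_table

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    # :pg (OTP 23+) would normally be supervised in the application tree;
    # started here only to keep the sketch self-contained.
    case :pg.start_link() do
      {:ok, _} -> :ok
      {:error, {:already_started, _}} -> :ok
    end

    :ets.new(@table, [:named_table, :set, :protected])
    :ok = :pg.join(@group, self())
    {:ok, nil}
  end

  # Broadcast a registration event to every group member
  # (one RegistrySync process per connected node, including this one).
  def register(child_id, pid) do
    for member <- :pg.get_members(@group), do: send(member, {:registered, child_id, pid})
    :ok
  end

  @impl true
  def handle_info({:registered, child_id, pid}, state) do
    # Each node applies the event to its local ETS copy.
    :ets.insert(@table, {child_id, pid})
    {:noreply, state}
  end
end
```

On a real cluster, `:pg.get_members/1` returns members from every connected node, so one `register/2` call updates every node's table; unregistration, conflict resolution, and netsplit handling are left out for brevity.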

I would recommend others to use the pg module when starting out if the distribution requirements are not that strict because it is a really simple and reliable option. Once the distribution requirements become more stringent, there are other mature libraries that can help, and ProcessHub aims to be one in the future.

A small fact: I wasn't initially planning to build such a library. I am actually writing software that needs this functionality, and at some point, I decided to extract the code into separate libraries. That's how ProcessHub was born. :slight_smile:

2 Likes

Version 0.2.6-alpha released

This version includes bug fixes, new API functions, and minor improvements to the documentation.

Changed

  • Replaced Cachex with our custom implementation to enhance performance.
  • Updated the default values for max_restarts and max_seconds to 100 and 4, respectively.
  • Storage module now accepts the table identifier as the first parameter, allowing it to be used with multiple tables.

Added

  • Introduced new API functions get_pids/2 and get_pid/2 to get the pid(s) by child_id.
  • New guide page for interacting with the process registry.
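As a usage sketch of the new lookup functions (the hub id `:my_hub` and child id `"worker_1"` are placeholders for this example, assuming a running hub with that child registered):

```elixir
# get_pid/2 returns a single pid for the child_id;
# get_pids/2 returns all pids, useful when a replication
# strategy runs copies of the process on several nodes.
pid  = ProcessHub.get_pid(:my_hub, "worker_1")
pids = ProcessHub.get_pids(:my_hub, "worker_1")
```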

Fixed

  • Corrected an issue where local supervisor restarts were not properly updating the global registry.
  • Fixed various typos and errors in the documentation.
1 Like

Thanks for another update!

In my current project I use :syn (GitHub - ostinelli/syn: A scalable global Process Registry and Process Group manager for Erlang and Elixir.) as the process registry behind a pub/sub mechanism, :libcluster (GitHub - bitwalker/libcluster: Automatic cluster formation/healing for Elixir applications) as an automated clustering mechanism, and ProcessHub for process distribution and control. It does seem like there is overlap (there is!), but each does a good job for what I use it for, i.e. I am using selected bits of each one, as they all have their strengths and weaknesses.

I was thinking: is it worth making the process tracking method selectable so you can use/make different implementations if you wish? I.e. yours, pg, syn, custom, etc., as each has trade-offs.

My project is just a hobby project, but it's intended to make a changing cluster of machines look like a single machine with a single memory space, and have the system behave as if it's a single machine, with automated handling of nodes joining and leaving the cluster and auto-migration of processes to keep the system complete and running.

It's a plaything, and speed and memory constraints are not of any concern. I intend to use a lot of processes like objects (similar to GitHub - wojtekmach/oop: OOP in Elixir!) but with all the GenServer stuff hidden behind a macro DSL. You could then submit a program to the cluster without knowing or caring about where it is running.

Well, enough rambling (sorry!); the joys of retirement are that you can play with all these interesting techs!!

Best wishes

Shifters

4 Likes

I was thinking: is it worth making the process tracking method selectable so you can use/make different implementations if you wish? I.e. yours, pg, syn, custom, etc., as each has trade-offs.

While I wouldn't rule out the possibility of making it configurable in the future, given that the project is still evolving, the process tracking and process registry are currently very coupled with other sections of the code, which makes them difficult to make configurable at this moment.
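If the tracking layer were ever decoupled, one way to make it selectable would be an Elixir behaviour that pg-, syn-, ETS-, or custom-backed adapters implement. This is purely a hypothetical sketch of such a contract, not part of ProcessHub's API:

```elixir
defmodule ProcessTracker do
  @moduledoc """
  Hypothetical pluggable process-tracking contract (not part of ProcessHub).
  An adapter backed by :pg, :syn, or ETS would implement these callbacks.
  """

  @callback register(hub_id :: atom(), child_id :: term(), pid()) :: :ok | {:error, term()}
  @callback unregister(hub_id :: atom(), child_id :: term()) :: :ok
  @callback whereis(hub_id :: atom(), child_id :: term()) :: pid() | nil
end
```

The hub would then dispatch to whichever adapter module is configured, at the cost of restricting itself to the lowest common denominator of what each backend can store.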

Well, enough rambling (sorry!); the joys of retirement are that you can play with all these interesting techs!!

I wish I could get there myself too :slight_smile:

1 Like

Good evening, version 0.2.9-alpha is out! :slight_smile:

It includes some bug fixes related to binary child_ids.

3 Likes

Hello,

Any idea when your library will be ready for production use? Is there any other library you would suggest in the meantime?

Can you open-source this? I'd love to take a look!

Hi, I plan to put the library through a testing phase in a real system within 6 months, but there are no guarantees.

The Horde library is very popular, and you might find it useful. Another option I recommend is building your own solution using Erlang's :pg module.

Well, I'm kind of a beginner in Elixir. I come from Node.js, I don't know Erlang, and I have 6 months ahead of me to release an MVP.
I don't think I'll have enough time in that timeframe to learn Erlang and develop my own distributed solution…

That's why I'm looking for a better solution. I could use Horde, but I read that some people had a lot of problems with it. Maybe the easy way out for now is just to save data to a DB or use something like Redis?

I can when (if!) it's finished/usable. I have restarted 3 times over 5 years, changing the tooling each time.

I have been taking a break from it recently (it's more a winter project), so there's not much to actually share at the moment.

The DSL is very much an 'in my head' design rather than something I have coded up yet! I'm still reading the Metaprogramming Elixir book and looking at other coding examples.

It's very much a plaything, so it has no schedule for development, etc.!

Shifters

I donā€™t know your specific requirements, which makes it hard to suggest anything.

Horde has been around for a while and is widely used, so I believe it performs well for its intended purpose.

You can also try this library, though it hasn't been battle-tested yet. It's still in development, but reported bugs will be addressed.

New version 0.3.0-alpha released! :muscle:
This release focuses on performance and stability improvements.

Changelog:

### Breaking changes
- `Hook.post_children_start()` now returns: `[{child_id(), result(), pid(), [node()]}]`
- Fixed error handling on process startup and shutdown, returning more information about failures and partial successes. Example: `ProcessHub.start_children/3` and `ProcessHub.stop_children/3` with `async_wait: true` now return `{:error, list_of_failures(), list_of_partial_successes()}` on failure. This does not affect successful operations.
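A caller awaiting results might branch on the new return shape like this (a sketch; the hub id `:my_hub` and `child_specs` are placeholders):

```elixir
result =
  ProcessHub.start_children(:my_hub, child_specs, async_wait: true)
  |> ProcessHub.await()

case result do
  {:ok, results} ->
    # every child started
    results

  {:error, failures, partial_successes} ->
    # decide whether to retry the failures or stop the partial successes
    IO.puts("#{length(failures)} failed, #{length(partial_successes)} started")
    {:error, failures}
end
```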

### Added
- Added new option `on_failure: :continue | :rollback` to the `ProcessHub.start_children/3`, `ProcessHub.stop_children/3` functions. This option allows the user to specify what should happen if the operation fails.
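With the new option, a caller that wants all-or-nothing semantics could request a rollback (a sketch; the hub id `:my_hub` and `child_specs` are placeholders):

```elixir
ProcessHub.start_children(:my_hub, child_specs,
  async_wait: true,
  # stop any children that did start if one of them fails
  on_failure: :rollback
)
|> ProcessHub.await()
```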

### Fixed
- `ProcessHub.Strategy.Redundancy.Replication` was not properly updating the redundancy_signal value on some occasions due to a race condition.
- Fixed issue with `ProcessHub.Janitor` not purging the cache properly when using Gossip protocol.
- State passing on hotswap migration with graceful shutdown fixed.
- Timeout option was not properly used in some cases.

### Changed
- Improved typespecs across the codebase.
- Improved overall performance of start/stop/hotswap migration operations that involved synchronization and large amounts of message passing, by using bulk operations.

I also conducted performance tests, and the results show that synchronous operations and migrations are now an order of magnitude or more faster.
Many potential issues have also been resolved.

There's still a lot to improve, but here are the results for spawning X distributed processes across a 10-node cluster and stopping them synchronously:

This includes all distribution logic, registry updates, hooks, event dispatching, and result messages for startups and shutdowns etc.

Up until 10k distributed processes in a 10-node cluster, the library does okay, imo: on average 2.6 seconds to start and stop everything synchronously. After 20k processes, the jump is pretty big.

Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
Number of Available Cores: 8
Available memory: 31.23 GB
Elixir 1.17.1
Erlang 27.0
JIT enabled: true
  • 10 nodes, 10 children
Name                                         ips        average  deviation         median         99th %
start_&_stop_processes-0.2.10-alpha        27.43       36.46 ms    ±67.09%       33.20 ms       84.64 ms
start_&_stop_processes-0.3.0-alpha        290.82        3.44 ms    ±46.01%        3.26 ms        6.51 ms


Memory usage statistics:

Name                                       average  deviation         median         99th %
start_&_stop_processes-0.2.10-alpha        2.31 MB    ±31.30%        2.83 MB        2.83 MB
start_&_stop_processes-0.3.0-alpha         2.08 MB    ±36.49%        2.82 MB        2.82 MB
  • 10 nodes, 100 children
Name                                         ips        average  deviation         median         99th %
start_&_stop_processes-0.2.10-alpha         2.50      400.57 ms    ±62.90%      421.35 ms      863.76 ms
start_&_stop_processes-0.3.0-alpha         57.59       17.36 ms    ±47.88%       18.34 ms       35.52 ms

Memory usage statistics:

Name                                       average  deviation         median         99th %
start_&_stop_processes-0.2.10-alpha       31.32 MB    ±12.02%       32.24 MB       32.24 MB
start_&_stop_processes-0.3.0-alpha        24.02 MB    ±33.93%       32.13 MB       32.13 MB
  • 10 nodes, 1000 children
Name                                         ips        average  deviation         median         99th %
start_&_stop_processes-0.2.10-alpha         0.22         4.49 s    ±27.37%         4.42 s         5.76 s
start_&_stop_processes-0.3.0-alpha          7.00      142.89 ms    ±50.88%      172.62 ms      297.17 ms

Memory usage statistics:

Name                                       average  deviation         median         99th %
start_&_stop_processes-0.2.10-alpha      326.14 MB     ±0.01%      326.14 MB      326.34 MB
start_&_stop_processes-0.3.0-alpha       248.12 MB    ±32.89%      324.90 MB      324.94 MB
  • 10 nodes, 10000 children
Name                                         ips        average  deviation         median         99th %
start_&_stop_processes-0.2.10-alpha       0.0116       1.44 min     ±0.00%       1.44 min       1.44 min
start_&_stop_processes-0.3.0-alpha          0.38         2.60 s    ±40.86%         3.09 s         3.11 s

Memory usage statistics:

Name                                       average  deviation         median         99th %
start_&_stop_processes-0.2.10-alpha        3.48 GB
start_&_stop_processes-0.3.0-alpha         3.04 GB    ±28.03%        3.47 GB        3.47 GB

At this point, testing the old version took too much time, so I only tested the newer version.

  • 10 nodes, 20000 children
Name                                         ips        average  deviation         median         99th %
start_&_stop_processes-0.3.0-alpha         0.193         5.18 s   ±111.63%         5.18 s         9.27 s

Memory usage statistics:

Name                                       average  deviation         median         99th %
start_&_stop_processes-0.3.0-alpha         5.75 GB    ±46.80%        5.75 GB        7.65 GB
  • 10 nodes, 30000 children
Name                                         ips        average  deviation         median         99th %
start_&_stop_processes-0.3.0-alpha        0.0277        36.09 s     ±0.00%        36.09 s        36.09 s

Memory usage statistics:

Name                                  Memory usage
start_&_stop_processes-0.3.0-alpha        11.47 GB
  • 10 nodes, 40000 children
Name                                         ips        average  deviation         median         99th %
start_&_stop_processes-0.3.0-alpha       0.00925       1.80 min     ±0.00%       1.80 min       1.80 min

Memory usage statistics:

Name                                  Memory usage
start_&_stop_processes-0.3.0-alpha        15.95 GB

Future releases will address the memory allocations.

4 Likes

Hi, we are currently running on a single node and due to scaling challenges (good problem to have!) we are looking at ProcessHub to move to a multi-node setup.

Here is the meat of our code that uses DynamicSupervisor and Registry:

  @doc """
  Start a child process or return an existing one by the given key
  - If a process already exists for the key, it is returned
  - If the process doesn't exist for the key, it will be created with config in `{mod, opts}`
  """
  @spec start_child(
          supervisor(),
          term(),
          {module(), keyword()}
        ) :: pid()
  def start_child(supervisor, key, {mod, opts}) do
    # the registry is actually the name of the supervisor
    registry = supervisor

    case lookup_child(registry, key) do
      # Process already exists, just return it
      pid when is_pid(pid) ->
        pid

      nil ->
        {:ok, dynamic_supervisor} = Registry.meta(registry, :dynamic_supervisor)
        via = {:via, Registry, {registry, key}}

        # The child will need to be registered with a via tuple so that we
        # have a stable process name to reference
        opts = Keyword.merge(opts, name: via, restart: :temporary)

        case DynamicSupervisor.start_child(dynamic_supervisor, {mod, opts}) do
          {:ok, pid} -> pid
          {:error, {:already_started, pid}} -> pid
        end
    end
  end

If I wanted to swap to using ProcessHub, what things would I use to replace DynamicSupervisor and Registry?

2 Likes

Hi, something like this should work:

  @spec start_child(
          atom() | binary(),
          {module(), function(), list()}
        ) :: pid() | {:error, term()}
  def start_child(key, {mod, func, args}) do
    hub_id = :my_hub

    case ProcessHub.get_pid(hub_id, key) do
      pid when is_pid(pid) ->
        pid

      nil ->
        child_spec = %{
          id: key,
          start: {mod, func, args},
          restart: :temporary
        }

        start_res =
          ProcessHub.start_child(hub_id, child_spec, async_wait: true)
          |> ProcessHub.await()

        case start_res do
          {:ok, res} ->
            # Should be in this format: {"somekey", ["node1@127.0.0.1": #PID<0.225.0>]}
            # Or get the pid from the registry
            # `ProcessHub.get_pid(hub_id, key)`
            elem(res, 1) |> List.first() |> elem(1)

          {:error, msg} ->
            {:error, msg}
        end
    end
  end

I also changed a few things:

  • No need to pass registry and supervisor references. You may want to pass the hub_id if you would like it to be dynamic or you're running multiple hubs on the same node.
  • We need to construct a Supervisor.child_spec() before sending it to ProcessHub (maybe in a future release {mod, args} will also be supported).
  • We no longer return only the pid; we may also return {:error, msg}, because this operation may involve message passing between nodes, and at that point you never know what might go wrong.

Let me know if anything's unclear.

2 Likes

Thanks! I will give this a try :slight_smile: