ProcessHub - process distribution library

anuarsaeed · September 29, 2023, 10:42pm

Hello

Published a new library - ProcessHub!

ProcessHub is a library designed to manage process distribution within an Elixir cluster. Each node starts its own hub under its supervision tree, and together these hubs form a cluster.

Drawing inspiration from frameworks like Horde and Swarm, one of the primary motivations behind developing this library was the necessity to replicate processes in order to enhance reliability.

This library was originally a component of another project and was tightly integrated with it until it grew and I made the decision to open source it and extract it from the codebase.

Please be cautious when using this library, as it is currently in an alpha release phase.

Key Features:

Cluster Distribution: ProcessHub allows you to effortlessly distribute processes across a cluster of nodes.
Configurable Strategies: Tailor your distribution strategies to your specific needs. ProcessHub offers a range of strategies for redundancy handling, process replication, network failure mitigation, and more.
Scalability and Availability: Designed with scalability and availability in mind, ProcessHub’s operations are predominantly asynchronous and non-blocking. It’s eventually consistent.
Decentralized Architecture
React to events: ships with a set of events that you can hook into to trigger your own code.

There's actually much more. Please read the documentation.

Link to the documentation: Hex
Github: Repo

Thanks

anuarsaeed · October 7, 2023, 2:06pm

New version released.

v0.1.1-alpha - 2023-10-07

Elixir 1.13-1.15 support added.
Includes minor bugfixes, test fixes and documentation updates.

Added

Added GitHub Actions for automated testing.
Made sure that ProcessHub is compatible with Elixir 1.13-1.15.
Added example usage section to the documentation.

Changed

Updated ProcessHub documentation by adding a list of all available strategies.
Removed unnecessary file .tool-versions generated by asdf.

Fixed

README.md table of contents links fixed.
Fixed ProcessHub await/1 function example code formatting.
Fixed tests for Elixir 1.15 and OTP 26.
Fixed a test case that failed intermittently because an async call was executed too early.

The library is still in the alpha phase; feedback and suggestions are welcome. If you run into any problems, please open an issue or a pull request on GitHub.

shifters98 · October 9, 2023, 6:25pm

Thanks for sharing this!! It resembles a system I designed over the last few years (as a hobby project), but it is much better documented and more professionally written (and not entangled with the rest of my code the way mine is; I just wanted something working without worrying too much about support, reuse, etc.).

I know that at the moment the distribution strategy is not a configurable item, but could that be added in the future?

Some additional examples (as well as your hash ring):

:local (to force creation on the local node that the creating code is running on)
:notlocal (so must not be on the local node the code creating it is running on)
:roundrobin (one for this node, one for the next node etc etc)
:foreach (create one process on each node in the current cluster)
etc
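
To make the four suggestions concrete, here is a minimal sketch of them as pure node-selection functions. `PlacementSketch` and `pick/4` are hypothetical names used only for illustration and are not part of ProcessHub; the sketch assumes a non-empty node list and a caller-supplied counter for round robin.

```elixir
defmodule PlacementSketch do
  @moduledoc "Hypothetical helper for illustration; not part of ProcessHub."

  # :local -> always the node that is starting the process.
  def pick(:local, local, _nodes, _index), do: [local]

  # :notlocal -> some cluster node other than the local one (none if alone).
  def pick(:notlocal, local, nodes, index) do
    case nodes -- [local] do
      [] -> []
      others -> [Enum.at(others, rem(index, length(others)))]
    end
  end

  # :roundrobin -> the index-th start goes to the index-th node, wrapping around.
  def pick(:roundrobin, _local, nodes, index) do
    [Enum.at(nodes, rem(index, length(nodes)))]
  end

  # :foreach -> one process on every node currently in the cluster.
  def pick(:foreach, _local, nodes, _index), do: nodes
end

# In a running cluster the node list could come from [node() | Node.list()].
nodes = [:a, :b, :c]
placement = PlacementSketch.pick(:roundrobin, :a, nodes, 4)
```

Each function returns the list of nodes on which the process should be started, so `:foreach` and the single-node strategies share one return shape.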

thanks

shifters

anuarsaeed · October 10, 2023, 7:26am

Thank you for the feedback!

Yes, it’s possible that the distribution will be configurable too in the future.
The current hash ring implementation works great, but providing an option to configure the distribution would make the library much more versatile.

New features will probably be added as it’s still in the early development stage.
Currently, I’m thinking of adding more failure-handling mechanisms, as there are so many things that can go wrong when designing distributed systems.

I’m hoping this library could become one of the go-to options when building distributed systems using Elixir, like Horde and Swarm are at the moment.
By gaining more traction, others could help report bugs and make the library more bulletproof. It’s currently the heart of the application I’m building.

Also, thank you for the great ideas. I really like the idea of letting the user pick the node for the process to start on.

shifters98 · October 11, 2023, 9:35am

An additional feature may be to be able to lock a process in place so it can’t be moved to another node once it starts running on its distributed node (at the risk of that node falling off the cluster etc). That would tie in with the :foreach distribution strategy (or maybe that strategy always has the processes locked to their deployed node when deployed??).

I have looked at Swarm and Horde as options for my hobby project in the past but ended up doing my own for various reasons. That said, I love the simplicity of this project's API and its ease of use.

As we are deploying processes with it, I was thinking of having a simple macro to remove all the process template cruft from the code, something similar to what Dave Thomas does in pragdave/component on GitHub ("Experiment in moving towards higher-level Elixir components"), but working with ProcessHub of course. Your thoughts?

Just some random suggestions

Shifters

anuarsaeed · October 13, 2023, 5:38am

:local (to force creation on the local node that the creating code is running on)
:notlocal (so must not be on the local node the code creating it is running on)

A custom distribution strategy, something like StaticDistribution, could handle these two cases. Maybe it could also offer a third option that lets the user define the node for that process. Handling node failures could be tricky, though. For example, if we have a 2-node cluster (an odd number of nodes is better, but this is just an example) and we lose one node, we either end up starting the :notlocal processes locally or choose to ignore them. This could be a user-defined option.
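
The user-defined fallback described here could be sketched as follows. `NotLocalSketch`, `place/3`, and the `:start_locally` / `:ignore` option values are hypothetical names chosen for this example; nothing here reflects an existing ProcessHub option.

```elixir
defmodule NotLocalSketch do
  # on_empty is a hypothetical user-defined option: :start_locally or :ignore.
  # It only matters when no node other than the local one is left.
  def place(local, nodes, on_empty) do
    case {nodes -- [local], on_empty} do
      {[], :start_locally} -> {:ok, local}
      {[], :ignore} -> :ignore
      {[first | _rest], _on_empty} -> {:ok, first}
    end
  end
end

# A 2-node cluster that has just lost :b, leaving only the local node :a.
after_failure = NotLocalSketch.place(:a, [:a], :start_locally)
```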

:roundrobin (one for this node, one for the next node etc etc)
:foreach (create one process on each node in the current cluster)

These two could be distribution strategies as well. We could handle failures using the hash ring to achieve consensus.

Anyways, creating a new distribution strategy opens up the option to define custom ones, so it’s a pretty good idea.

I also had a thought of letting the user pick the distribution strategy when starting processes, but in the long run, I think mixing different distribution strategies will backfire when handling failures. Perhaps it’s better to just start multiple ProcessHub instances with their different distribution strategies.

I was thinking of having a simple macro to remove all the process template cruft from the code, something similar to what Dave Thomas does in pragdave/component on GitHub ("Experiment in moving towards higher-level Elixir components"), but working with ProcessHub of course. Your thoughts?

I was thinking of something similar. The process handover steps could be implemented as a macro.

use GenServer

  def handle_info({:process_hub, :handover_start, startup_resp, from}, state) do
    case startup_resp do
      {:ok, pid} ->
        # Send the state to the remote process.
        Process.send(pid, {:process_hub, :handover, state}, [])

        # Signal the handler process that the state handover has been handled.
        Process.send(from, {:process_hub, :retention_handled}, [])

      _ ->
        nil
    end

    {:noreply, state}
  end

  def handle_info({:process_hub, :handover, handover_state}, _state) do
    {:noreply, handover_state}
  end

But I’m not sure if Elixir will complain about it: if a user defines other functions below the use statement and then their own handle_info/2 callbacks, the handle_info/2 clauses would no longer be grouped together. Maybe I’m just overthinking here.
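
As a rough sketch of the macro idea, the two handover clauses can be injected through `__using__/1`. `HandoverSketch` and `CounterSketch` are hypothetical modules written for this example, not ProcessHub code, and the example module deliberately defines no `handle_info/2` of its own, so it does not settle the clause-grouping question raised above.

```elixir
defmodule HandoverSketch do
  # Hypothetical helper module; not part of ProcessHub.
  defmacro __using__(_opts) do
    quote do
      use GenServer

      # Injected: send our state to the newly started replacement process.
      def handle_info({:process_hub, :handover_start, startup_resp, from}, state) do
        case startup_resp do
          {:ok, pid} ->
            send(pid, {:process_hub, :handover, state})
            send(from, {:process_hub, :retention_handled})

          _ ->
            nil
        end

        {:noreply, state}
      end

      # Injected: adopt the state handed over by the old process.
      def handle_info({:process_hub, :handover, handover_state}, _state) do
        {:noreply, handover_state}
      end
    end
  end
end

defmodule CounterSketch do
  use HandoverSketch

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)

  def init(initial), do: {:ok, initial}

  def handle_call(:get, _from, state), do: {:reply, state, state}
end

# Simulate a handover between two local processes.
{:ok, old} = CounterSketch.start_link(41)
{:ok, new} = CounterSketch.start_link(0)
send(old, {:process_hub, :handover_start, {:ok, new}, self()})
```

Because the injected clauses replace the default `handle_info/2` from `use GenServer` and include no catch-all, any other message sent to such a process would raise a `FunctionClauseError`; a real macro would need to decide how to handle that.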

At the moment I’m working on a system that is using ProcessHub. Going to test the library there and maybe add some more failure handling mechanisms.

Once I see the system has been running with no issues for some time I will probably implement custom distribution strategies.

Thank you for your thoughts, really good ideas.

shifters98 · October 13, 2023, 7:53am

Yes, I think you are absolutely correct there in terms of clarity about what each process hub would do for distribution and maybe also failure handling (after all, not all processes need to be restarted on failure).

Good luck with your ongoing project!!

Shifters

anuarsaeed · February 9, 2024, 6:47pm

Just released version 0.2.0-alpha.

Documentation

This version introduces guide pages and several new features, the most significant being the configurable distribution strategy. This means that we can now replace the default consistent hashing implementation with our own. By default, ProcessHub still uses consistent hashing, but it now also supports a guided distribution strategy out of the box. This strategy essentially requires the user or system to specify which processes should reside on which node.
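
For readers unfamiliar with consistent hashing, the following is a textbook-style sketch of the idea, not ProcessHub's actual implementation. `RingSketch`, the virtual-node count, and the use of `:erlang.phash2/2` are assumptions made for this example, and the ring is assumed to be non-empty.

```elixir
defmodule RingSketch do
  # Illustrative only; the real hash ring used by ProcessHub may differ.
  @vnodes 64
  # Upper bound accepted by :erlang.phash2/2 (2^32).
  @space 4_294_967_296

  # Place @vnodes points per node on the ring and sort them by position.
  def build(nodes) do
    points =
      for node <- nodes, i <- 1..@vnodes do
        {:erlang.phash2({node, i}, @space), node}
      end

    Enum.sort(points)
  end

  # The owner of a key is the first point at or after the key's hash,
  # wrapping around to the first point on the ring.
  def owner(ring, key) do
    point = :erlang.phash2(key, @space)

    case Enum.find(ring, fn {pos, _node} -> pos >= point end) do
      {_pos, node} -> node
      nil -> ring |> hd() |> elem(1)
    end
  end
end

nodes = [:node_a, :node_b, :node_c]
ring = RingSketch.build(nodes)
owner = RingSketch.owner(ring, "child_1")
```

The useful property is that removing a node only moves the keys that node owned; keys owned by the remaining nodes stay where they are.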

Additionally, I conducted some performance profiling to identify the biggest bottlenecks. It turns out that process handover during migration was slow because Supervisor.which_children/1 was called once per spawned process. This function struggles with a large number of children, as its documentation also notes. I identified and addressed some other issues related to performance, reliability, and failure handling.
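
The post does not say how the bottleneck was fixed. One common way to avoid repeated `Supervisor.which_children/1` calls, shown here purely as an assumption, is to call it once and index the result by child id. The Agent children and `{:worker, i}` ids are made up for the example.

```elixir
# Start a small supervisor with five children to look up.
children =
  for i <- 1..5 do
    %{id: {:worker, i}, start: {Agent, :start_link, [fn -> i end]}}
  end

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# One which_children/1 call, then constant-time lookups per child,
# instead of one full child-list traversal per process being migrated.
pid_by_id =
  sup
  |> Supervisor.which_children()
  |> Map.new(fn {id, pid, _type, _modules} -> {id, pid} end)

pid = Map.fetch!(pid_by_id, {:worker, 3})
value = Agent.get(pid, & &1)
```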

For the next release, my focus will be on performance and reliability improvements while also adding smaller features such as configurable timeouts etc.

Happy coding!

shifters98 · February 14, 2024, 10:01am

Many thanks for the update!!

Looking forward to having a play.

cheers

Shifters

egze · February 14, 2024, 12:33pm

I don’t have a project for it now, but this is a very welcome library. We had distributed processes in our last project and it wasn’t easy.

anuarsaeed · February 15, 2024, 4:51pm

Thank you both for the feedback; it means a lot.

I have pushed a minor upgrade that includes some additional guides to help you get started more easily. There’s still lots of functionality to cover with documentation, but this will be done step by step in the future.

shifters98 · April 26, 2024, 9:06am

FYI,

There have been several updates to this library on GitHub recently by the author.

Not sure why he isn’t posting them here as they are useful!

Shifters

anuarsaeed · April 26, 2024, 11:08pm

I’m pleased to know that there’s interest in the development progress. I’ll continue to update any future releases here.

I’d like to share some plans regarding the library. One of the primary upcoming goals is to optimize the library by identifying bottlenecks and areas where improvements can be made. The most recent release introduced configurable timeouts, which are essential for larger systems with a significant number of nodes and processes, and which can tolerate higher latencies. This enhancement has also laid the groundwork for performance testing and fine-tuning. I’ll update the documentation later by including general guidelines for configuring timeouts and other parameters based on the cluster size and the number of processes.

Additionally, I’ll need to expand the documentation and provide more guides because various configurations and methods are not yet covered in the current documentation.

Prior to optimizing and improving the documentation, I’ll enhance the hook system to make it more robust. This will provide additional ways to extend or modify the system’s behavior. At this point, it should be relatively straightforward to replace built-in strategies with custom implementations that can hook into the system at different points as needed and execute code.

Hopefully 0.x.x-beta soon

shifters98 · April 27, 2024, 9:43am

Thanks for the update!

Look forward to the updates especially the docs!

cheers

Shifters

nulltree · April 27, 2024, 11:57am

Yes, while I don’t have any immediate use cases for this, I’m kind of silently following along since your first post as the topic deeply fascinates me.

Thanks for the docs especially, I am learning a thing or two from them!

anuarsaeed · June 2, 2024, 4:21pm

New Release 0.2.4-alpha

This release focuses on the hook system. The new hook system includes:

New hooks
Struct-based hook handlers
Hook handler priorities
New functions in the ProcessHub.Service.HookManager module to register and remove hook handlers

The base behaviors for different strategies have also been simplified by removing many callback functions. These callbacks are now dynamically attached to the running system using hooks. This means we only need to implement the callbacks that are truly necessary, primarily pure functions where a return value is expected.

Additionally, this enhancement allows us to extend the system further by hooking into different parts of it.
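
To illustrate what registering, removing, and prioritising hook handlers can look like in general, here is a small self-contained sketch. It does not use ProcessHub.Service.HookManager; `HookRegistrySketch`, the handler map shape, the `:child_started` key, and the rule that higher priorities run first are all assumptions for this example.

```elixir
defmodule HookRegistrySketch do
  # Registry shape (hypothetical): %{hook_key => [%{id, priority, fun}]}

  # Add a handler and keep the list sorted, highest priority first.
  def register(registry, hook_key, %{id: _, priority: _, fun: _} = handler) do
    Map.update(registry, hook_key, [handler], fn handlers ->
      Enum.sort_by([handler | handlers], & &1.priority, :desc)
    end)
  end

  # Remove the handler with the given id from one hook.
  def remove(registry, hook_key, id) do
    Map.update(registry, hook_key, [], fn handlers ->
      Enum.reject(handlers, &(&1.id == id))
    end)
  end

  # Run every handler for the hook, in priority order, and collect results.
  def dispatch(registry, hook_key, data) do
    registry
    |> Map.get(hook_key, [])
    |> Enum.map(fn handler -> handler.fun.(data) end)
  end
end

registry =
  %{}
  |> HookRegistrySketch.register(:child_started, %{id: :low, priority: 0, fun: &{:low, &1}})
  |> HookRegistrySketch.register(:child_started, %{id: :high, priority: 10, fun: &{:high, &1}})

results = HookRegistrySketch.dispatch(registry, :child_started, :some_child)
```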

Happy coding!

shifters98 · June 2, 2024, 4:47pm

Sounds interesting!

Will have to have a look through the code for that!

cheers

Shifters

slouchpie · June 19, 2024, 4:11am

I started exploring this library a few hours ago and it is extremely good. I have used horde in the past and pogo more recently. Both of those libs are great but process_hub is truly a joy to use.

There is a lot to love and I have not even started using the hot-swapping and hooks stuff. Reading the list of hook events blew my mind. My imagination was piqued.

I love the simplicity of the hub_id atom. This plays well with Fly because the fly region can be used to build the hub_id, for simple per-region clustering.
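
A minimal sketch of that idea, assuming the app runs on Fly.io where the `FLY_REGION` environment variable is set. `HubIdSketch`, the `hub_` prefix, and the `"local"` fallback are hypothetical choices for this example.

```elixir
defmodule HubIdSketch do
  # Builds a per-region hub id atom; falls back to "local" outside Fly.io.
  # The set of regions is small and fixed, so creating atoms here is bounded.
  def hub_id(region \\ System.get_env("FLY_REGION", "local")) do
    String.to_atom("hub_" <> region)
  end
end

ams_hub = HubIdSketch.hub_id("ams")
```

Nodes in the same region would then derive the same hub id, while nodes in other regions would form separate hubs.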

I also love the way you use local ETS tables for fast lookup. Perfect use-case for ets.

I am still doing a refactor on a feature branch to see if my own hacky :pogo-based attempts at distributed process management can be replaced with :process_hub. So far I refactored a cache for oauth “state” and a per-kitchen “global singleton” order number generator process. I have way less LOC already and it is far simpler to read and understand.

I am in awe. Congratulations.

anuarsaeed · June 19, 2024, 7:44am

Hi,

Thank you for the feedback!

In the next release, I will improve the documentation to make it easier for others to use the library. Additionally, I plan to stress test the library with different configuration combinations and post the results. This will be helpful later when deciding on values for timeouts and other settings.

I will keep you all posted on the progress.

I also noticed an error in the documentation regarding hooks. The example for dynamic hook registration using the ProcessHub.t/0 configuration struct was still showing the deprecated method for defining hook handlers.

The correct method uses the new HookManager struct, as shown below:

%HookManager{
  id: :hook_id_1,
  m: MyModule,
  f: :my_function,
  a: [:some_data, :_]
}
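
The sketch below shows one way an MFA-style handler like this could be invoked. It assumes that `:_` acts as a placeholder that gets replaced with the hook's data before the function is called; that reading, the `HookCallSketch` module, and the `MyModule.my_function/2` body are assumptions for illustration and should be checked against the ProcessHub documentation.

```elixir
defmodule HookCallSketch do
  # Mirrors the id/m/f/a fields above; not the real HookManager struct.
  defstruct [:id, :m, :f, :a]

  # Replace every :_ in the argument list with the hook data, then apply the MFA.
  def call(%__MODULE__{m: m, f: f, a: a}, hook_data) do
    args =
      Enum.map(a, fn
        :_ -> hook_data
        other -> other
      end)

    apply(m, f, args)
  end
end

defmodule MyModule do
  # Example handler: receives the static argument and the hook data.
  def my_function(tag, hook_data), do: {tag, hook_data}
end

handler =
  struct!(HookCallSketch, id: :hook_id_1, m: MyModule, f: :my_function, a: [:some_data, :_])

result = HookCallSketch.call(handler, %{child_id: "child_1"})
```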

I will review the documentation and upload the corrections.

The library is still in the alpha stage, so there are likely some issues. If you find any problems, please report them on GitHub or in this forum, and I will gladly fix them all.

Thank you again for the awesome feedback!

slouchpie · June 19, 2024, 2:55pm

I think I found a bug when trying to stop a child that has a string ID.