thiagomajesk

thiagomajesk

PartitionSupervisor architecture/ optimizations

Hey folks! :waving_hand:

I think I have an interesting optimization challenge, let me know what you think…

A few months ago, I created my own ECS (Entity Component System) implementation based on a different interpretation of the pattern (because why not!?). Here’s the link to the repo in case you are curious. It’s ok if you haven’t heard of the ECS pattern before. I will try to describe the problem more generically:

I have a list of “objects” (which are just numeric identifiers), and those “objects” have “behaviors” attached to them. The only thing that those “behaviors” do is handle events sent to “objects” they are attached to. A really basic example

defmodule Ping do
  def handle_event(:play, object, _args), 
    do: IO.puts("Ping for #{object}")
end

Those “behaviors” are just a module that implements a handle_event callback, not a GenServer. To make things easier to visualize, imagine you have two “behaviors”: Ping and Pong:

object1 = System.unique_integer([:positive])  # 1314
object2 = System.unique_integer([:positive])  # 1416

Ping.attach(object1)
Pong.attach(object1)

Ping.attach(object2)
Pong.attach(object2)

World.send(object1, :play) 
# Ping for 1314
# Pong for 1314

World.send(object2, :play) 
# Ping for 1416
# Pong for 1416

There are a few interesting properties about this problem:

  • A “behavior” can also send additional events

  • The handling event logic is synchronous and can potentially block.

  • Events sent to the same “object” should always be processed in order:

    Enum.each([Ping, Pong, Other], &(&1.handle_event(object, event, args))
    
  • If you send another event to the same “object”, it should be queued up and processed after the current event finishes being processed by all its “behaviors”:

    (object1) :play -> [Ping] -> [Pong] -> :forfit -> [Other]
    
  • Events sent to different “objects” can be parallelized as they don’t affect each other:

    (object1) :play   -> [Ping] -> [Pong]
    (object2) :play   -> [Ping] -> [Pong]
    (object3) :forfit -> [Other]
    

The current solution: There’s a single GenServer called World that is responsible for dispatching messages to registered objects. This GenServer uses a PartitionSupervisor, which ensures that “objects” are correctly “sharded”, causing events for the same object to always be processed sequentially in the same places, while events sent to other “objects” are processed in parallel:

[World] 
| ---(send :play)-----> [Router 1] (object1, Ping -> Pong)
|----(send :play)-----> [Router 2] (object2, Ping -> Pong)
|----(send :forfit)---> [Router 3] (object3, Ping -> Pong -> Other)

The challenge with the current solution is that we only have a fixed number of partitions, and those processes can quickly become a bottleneck if, let’s say, you have something like multiple LiveViews sending messages at the same time, which increases the load, the process reductions increase, and they slow to a halt. I could potentially work on resizing the partitions, but that would require some additional logic to check when those processed are being overwhelmed (which might be too late).

I briefly explored GenStage, which would change the topology to something like:

[World] 
| ---(send :play)-----> [Ping] -> [Pong] -> [Other]
|----(send :play)-----> [Ping] -> [Pong] -> [Other]
|----(send :forfit)---> [Ping] -> [Pong] -> [Other]

This problem almost looks like the perfect use case for GenStage; however, there’s one additional aspect of this problem that makes things a little trickier: “behaviors” and events are completely dynamic! They can be attached/ removed from an object at any point in time, and this affects which stages need to execute.

So, even though the order of the stages is always consistent, not all objects need to go through all the stages, for instance, “object1” might need ot handle: Ping → Pong → Other, while “object2” needs to handle just: Pong → Other or even Ping → Other.

I haven’t given up on GenStage, yet, but I still have to figure out if a demand of 1 makes sense to keep an influx of events, and how to properly handle partitions (GenStage.PartitionDispatcher perhaps) that can grow/ shrink.


To finish this up, I think an ideal solution (in my mind at least) would be something like a dynamic pool of partitions that grow to fit the demand and shrink when there’s a lull in events. So:

  1. An object gets a new event
  2. Checks if there’s any available process to handle the event for this object
    2.1. If yes, send the event to be processed (potentially blocking)
    2.2 If not, spawns a new process and registers that process as the object’s “handler”
  3. Processes periodically send their queue length somewhere so we can decide if the demand is too high and needs the pool to grow or shrink (because we need to ensure that processes are always processed by the same partition, we might need some logic to shed/ redirect messages between partitions if one particular object it taking too long).

This is all theoretical; I haven’t tried it yet. So, what are your thoughts? How would you solve this?

Most Liked

thiagomajesk

thiagomajesk

Ok, it took me a while to try different approaches, but I believe I have figured out the optimal solution for my use case. Just sharing here in case anyone is curious…

The first wall that I hit early on was trying to dispatch and process as many events as possible (at 60 FPS), which, of course, caused huge spikes in memory and ever-growing reductions as processes couldn’t keep up with the load. With that in mind, it became quite clear that I had to add backpressure to the system, so I started considering GenStage again.

However, because the spikes were unpredictable, I couldn’t rely on GenStage’s internal buffer size, as it was causing too many events to be dropped (one object could receive 1000 events while others just 100 at a time, and shedding those events was hurting the consistency requirement I have).

So I decided to try different approaches with GenStage until I found something that worked for me. It turns out that combining GenStage.PartitionDispatcher with GenStage.ConsumerSupervisor almost exactly matches the design I had in mind, which is basically having a dynamic pool of workers that can drain different queues of events. The topology looks a little bit like this:

            |--- [Producer Consumer] --- [Consumer Supervisor] ---|--- [Worker]
            |                                                     |--- [Worker]
            |
[Producer] -|--- [Producer Consumer] --- [Consumer Supervisor] ---|--- [Worker]
            |                                                     |--- [Worker]
            |
            |--- [Producer Consumer] --- [Consumer Supervisor] ---|--- [Worker]
                                                                  |--- [Worker]

This is the TLRD, but there was quite a lot of experimentation, so I documented the journey in more detail here in case this might be interesting to folks: Processing millions of events with elixir.

Where Next?

Popular in Discussions Top

sashaafm
Piggy backing a bit on @dvcrn topic BEAM optimization for functions with static return type?, I’ve been trying to understand in a deeper ...
New
griffinbyatt
Sobelow Sobelow is a security-focused static analysis tool for the Phoenix framework. For security researchers, it is a useful tool for g...
New
lorenzo
Hey everone! I created a prototype for my app using Nodejs for the api. But the framework I chose wasnt great (in general theresnt any g...
New
tomekowal
Hey guys! I want to create a toy project that shows a chart of temperature over time and updates every 5 seconds. I feel LiveView is per...
New
Nvim
Elixir appears to be a superior language to Python. I don’t see any advantage of Python over Elixir. Are there any?
New
Fl4m3Ph03n1x
Background A few days ago I was listening to The future of Elixir from Elixir Talks, with Dave Thomas (@pragdave ) and Brian Mitchell. I...
New
opsb
We’re considering our architecture from a viewpoint of scaling our traffic heavily over the next 6 months. Our current deployment is runn...
New
arpan
Hello everyone :wave: Today I am very excited to announce a project that I have been working on for almost 3 months now. The project is...
New
laiboonh
Hi all, I am trying to convince my team to use liveview over the current react. What are some of the points where one should consider us...
New
arcanemachine
https://nitter.net/josevalim/status/1744395345872683471 https://twitter.com/josevalim/status/1744395345872683471
New

Other popular topics Top

rms.mrcs
Hi, I need to transform a list of numbers into a map where the keys are the indexes and the values are the original values of the list. ...
New
JeremM34
Hello, how can I check the Phoenix version ? Thanks !
New
jononomo
I am trying to figure out how Mix knows whether the environment is test, dev, or prod – where is this set? Thanks.
New
vonH
When I run the Plug and I recompile I wind up having to use Ctrl C to quit iex and start again. Witht the help of rlwrap I can use the cu...
New
AstonJ
We’ve put together this wiki for Phoenix LiveView - please feel free to add any info you feel is worth including. What is Phoenix LiveV...
New
gausby
I asked this very same question on twitter and got some interesting feedback, but I thought it would be a good question to ask here as we...
1207 39297 209
New
shahryarjb
Hello, I have map which I want to convert it to string like this: the map: %{last_name: "tavakkoli", name: "shahryar"} the string I ne...
New
belgoros
I’m not a pro in using Regex and can’t figure out why the following behaviour happens, especially if we take into account the difference ...
New
AngeloChecked
What learn first? Rust or Elixir Hi Elixir community! I’m here because i want learn a new language. I’m a junior developer and mainly i ...
New
TunkShif
This post is an instruction guide to help you setup your Neovim for Elixir development from scratch. It includes general information on h...
274 41539 114
New

We're in Beta

About us Mission Statement