Comparing two GenStage Designs

After hearing @josevalim’s keynote at ElixirConf, I was inspired to rewrite our application’s data-processing pipeline (which works OK but is pretty inefficient) to use GenStage. I initially imagined I’d build it something like this:
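
A rough sketch of that topology, reconstructed from the parts listed below (sub-task results also flow back to the ShardWorker stages):

[StaleShardIDProducer] -> [ShardWorker xN] -> [SubTaskWorker xM]
                                ^                     |
                                +------ results ------+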

Here are the parts:

  • The StaleShardIDProducer periodically reads from an external datasource to see what shards are stale due to newly available data snapshots, and produces stale shard IDs.
  • The ShardWorker stages are producer-consumers that receive stale shard IDs from the StaleShardIDProducer and produce a list of sub-tasks that must be performed to finish building the shard.
  • The SubTaskWorker stages perform the sub-tasks, and send the results back to the ShardWorker to be ingested into the shard. When all sub-tasks have completed, the built shard is persisted.

I got started prototyping this and quickly realized there’s a problem: when a producer-consumer (such as my ShardWorker) emits events from handle_events/3, as I was planning to have it do, GenStage treats that as “finishing” the events it was given, and the stage turns around and sends more demand upstream even though it hasn’t finished building the shard (it needs to wait for all sub-tasks to complete for that). The effect would be that it “moves on” to the next shard, when we really want it to wait until the shard is complete before requesting a new one from the StaleShardIDProducer.

From my understanding of GenStage at the time, I couldn’t figure out how to make this design work, so I tweaked it slightly:
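
Again roughly sketched from the description below, there are now two separate flows:

Flow 1: [StaleShardIDProducer] -> [ShardWorker xN (consumers)]
Flow 2: [SubTaskBuffer] -> [SubTaskWorker xM]
        (the ShardWorkers push sub-tasks into the SubTaskBuffer
         and receive the results back directly)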

This design has a couple differences:

  • The ShardWorker stages are consumers, which allows each of them to hold off on finishing an event until all of the sub-tasks are complete and it can persist the shard.
  • There are actually two GenStage flows here instead of just one. The SubTaskBuffer is a new producer that the SubTaskWorker stages subscribe to. The ShardWorker stages send sub-tasks directly to the buffer so they will get worked on. Essentially, the SubTaskBuffer and SubTaskWorker stages form a worker pool, with the buffer as the entry point for clients to send work to (see the sketch below).
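
A minimal sketch of what such a SubTaskBuffer could look like, following the queue-backed producer pattern from the GenStage documentation (the enqueue/1 API and its internals are assumptions; the thread doesn’t show the real buffer):

defmodule SubTaskBuffer do
  use GenStage

  def start_link(_opts) do
    GenStage.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # ShardWorkers call this to push a sub-task into the pool.
  def enqueue(sub_task) do
    GenStage.cast(__MODULE__, {:enqueue, sub_task})
  end

  def init(:ok) do
    # State: the queue of buffered sub-tasks plus unsatisfied demand.
    {:producer, {:queue.new(), 0}}
  end

  def handle_cast({:enqueue, sub_task}, {queue, pending_demand}) do
    dispatch(:queue.in(sub_task, queue), pending_demand, [])
  end

  def handle_demand(demand, {queue, pending_demand}) do
    dispatch(queue, pending_demand + demand, [])
  end

  # Emit queued sub-tasks until demand is satisfied or the queue is empty.
  defp dispatch(queue, 0, events) do
    {:noreply, Enum.reverse(events), {queue, 0}}
  end

  defp dispatch(queue, demand, events) do
    case :queue.out(queue) do
      {{:value, event}, queue} -> dispatch(queue, demand - 1, [event | events])
      {:empty, queue} -> {:noreply, Enum.reverse(events), {queue, demand}}
    end
  end
end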

I prototyped this, got it working, and was pretty happy with it. I haven’t gotten around to implementing a production version of it yet, though. And today I was re-reading the GenStage docs and noticed something I hadn’t noticed before: the optional handle_subscribe/4 callback allows you to implement a :manual mode, where demand is not automatically sent upstream. Instead, you send demand upstream when appropriate by manually calling GenStage.ask/3. If I’m understanding this correctly, this means my initial design is possible: I just have to make the ShardWorker stages use :manual mode, so that they only demand a new stale shard ID after finishing and persisting a shard.
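
For reference, a minimal sketch of those :manual mode mechanics, written as if ShardWorker handled a whole shard inline (build_and_persist_shard/1 is a hypothetical stand-in; it doesn’t capture the full design, where ShardWorker would also emit sub-task events and collect the results):

def handle_subscribe(:producer, _opts, from, state) do
  # Switch to manual demand: GenStage will no longer ask upstream for us.
  GenStage.ask(from, 1) # kick things off by demanding the first shard ID
  {:manual, Map.put(state, :producer_from, from)}
end

def handle_events([stale_shard_id], _from, state) do
  build_and_persist_shard(stale_shard_id) # hypothetical; blocks until done
  GenStage.ask(state.producer_from, 1)    # only now demand the next shard
  {:noreply, [], state}
end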

So now I’m wondering which direction to go. What are the tradeoffs between these designs? I easily understand how the worker pool works (in fact, I’ve built a productionized version of it) but I’m a lot fuzzier on how things work when you have N consumers subscribed to M producers. I’m hoping @josevalim can weigh in with a recommendation :).

4 Likes

@myronmarston first let me clarify that, although the producer_consumer sends demand as soon as handle_events is done, those events are not consumed until demand is received from downstream. This is a form of pre-fetching to ensure we always have data in flux.

That said, the issue with your second design is that you no longer have back-pressure all the way. It won’t be a problem if the ShardWorker is the slowest layer in your pipeline, but it also means you are not gaining anything by having two layers of GenStage.

For example, instead of a second pipeline, you could start the shard worker children directly in a supervisor:

          /[consumer]\
[producer]-[consumer]-[supervisor]
          \[consumer]/

It would work similarly to what you have designed: the shard worker will start multiple children in the supervisor and wait for those children to reply back.

However, my preferred solution would be to simply not do any asynchronous work in the stages. The question is: does the stale shard id producer provide enough events to make all shard workers busy, using 100% of your machine’s resources, without the need to start subworkers?

Imagine the stale shard id producer can provide events faster than they can be consumed. In this case, you have enough work on each shard worker to use all cores without needing to break each shard worker into subtask workers. In this scenario, if you have 4 cores and 4 shard workers, that will be enough. Breaking the work into smaller tasks won’t buy you anything, because all of the tasks will still be working towards the same 4 shard workers.

However, this may not be the case. You may process events faster than the producer can emit them, or maybe the subtask workers are IO-bound. In this case, you can still keep a single stale shard id producer and multiple workers, except each worker starts multiple tasks (Task.async or Task.Supervisor.async) and awaits them inside handle_events/3. This way you keep the synchronicity and can still break the work apart.
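
A sketch of that approach (subtasks_for/1, run_subtask/1, and persist_shard/2 are hypothetical helpers):

def handle_events([stale_shard_id], _from, state) do
  results =
    stale_shard_id
    |> subtasks_for() # hypothetical: list the sub-tasks for this shard
    |> Enum.map(fn subtask -> Task.async(fn -> run_subtask(subtask) end) end)
    |> Enum.map(&Task.await(&1, :timer.minutes(10))) # block until all done

  persist_shard(stale_shard_id, results) # hypothetical persistence step
  {:noreply, [], state}
end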

3 Likes

@myronmarston first let me clarify that, although the producer_consumer sends demand as soon as handle_events is done, those events are not consumed until demand is received from downstream. This is a form of pre-fetching to ensure we always have data in flux.

Thanks, that’s a nice clarification. That said, I’m not sure it changes anything about my design, because the downstream stages will be triggered to send more demand based on min_demand/max_demand, rather than waiting until the shard has finished being built. (At least, that’s my understanding.)

That said, the issue with your second design is that you no longer have back-pressure all the way. It won’t be a problem if the ShardWorker is the slowest layer in your pipeline

It is the slowest layer; after all, it waits until all N sub-tasks are complete before finishing.

The question is: does the stale shard id producer provide enough events to make all shard workers busy, using 100% of your machine’s resources, without the need to start subworkers?

Sometimes, but not always. I think there’s some important context here I forgot to mention that would help explain why I was considering these two designs instead of something simpler like what you suggest.

  • The sub-tasks fetch data snapshots from S3 and process them. Therefore, they are IO-bound and there is a significant benefit to having more sub-tasks running than the number of cores.
  • The stale shard ID producer is very “bursty” due to the scheduled nature of the external system that collects new S3 snapshots and notifies (via the “external data source” in the diagrams above) the stale shard ID producer. There are portions of the day where there are hundreds of stale shards to build. At other times of day, there is only a trickle of individual stale shards.
  • The S3 snapshots can get quite large (say, 1-5 GB for the largest ones).
  • Oftentimes, building a stale shard only involves fetching and processing 1 S3 snapshot (as when there is an existing persisted shard to start from and we have been notified of one new S3 snapshot). At other times, building a stale shard can involve fetching and processing 200+ snapshots (as when a user activates an old archived campaign). Sometimes, all shard builds must be done from scratch due to schema changes in the shard data structure, and those builds will each involve fetching and processing N S3 snapshots (where N is the age in weeks of the campaign).

Given the size of the S3 snapshots and the number of them involved in building some shards, we have to carefully limit the total number of in-flight snapshots being processed to prevent the BEAM from crashing due to memory exhaustion. As a point of comparison, the first version of our current build pipeline (which builds only one shard at a time, but uses a naive parallel map to fetch and process S3 snapshots in parallel) was unable to build shards for the largest, oldest campaigns. Once we put some concurrency limits in place (limiting it to process 20 S3 snapshots at a time), it was able to successfully build all shards.

When I set out to build a new version of our build pipeline using GenStage, my goal was to build a system that would concurrently build up to N shards and concurrently process up to M S3 snapshots. For example, my prototype built 10 shards concurrently and processed 20 S3 snapshots concurrently. With the approach you’ve suggested (using Task.async for the sub-tasks), I don’t see a way to achieve the goal of processing 20 S3 snapshots at a time. I can either allow each ShardWorker to process a small number of concurrent snapshots (say, 2) to ensure that N * 2 <= M, in which case builds for shards that have 200+ snapshots will take a very long time (very undesirable in those situations where it’s the only shard being built), or I can allow each to process a larger number of concurrent snapshots and risk crashing the BEAM due to memory exhaustion. Having a set number of ShardWorker stages and SubTaskWorker stages allows me to carefully limit the concurrency of each, while ensuring that they are generally saturated.

Given what I’ve said above, does one of my two original designs make sense? Or is there something I’m not seeing that would make you still recommend what you outlined above?

I have one last question then: in your second design, how do you avoid the same issue you had in the first one, but this time in the consumer? The consumer will also request more items as soon as handle_events/3 is done. Are you waiting in handle_events/3, then? Because otherwise you could use the :manual mode in either the producer_consumer or the consumer.

Given everything you said, it seems both designs are fine. You can also consider a third and a fourth design, which are similar to the two you proposed, except the last step is a DynamicSupervisor. The DynamicSupervisor will allow you to spawn one child process per subtask, which may give better results (unless you are connecting to S3 in the init function of each SubTaskWorker). The amount of currently running tasks in this case will be, on average, (max_demand - min_demand) / 2. I would recommend using master, though, since we have recently fixed a bug to make sure min_demand is respected (instead of always asking). 0.6.0 should be out this week.
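
A minimal sketch of that last step (in today’s gen_stage this stage is called ConsumerSupervisor; it was DynamicSupervisor in the 0.x releases discussed here). SubTaskWorker.start_link/1 is assumed to run a single sub-task and exit normally when done:

defmodule SubTaskConsumer do
  use ConsumerSupervisor

  def start_link(_opts) do
    ConsumerSupervisor.start_link(__MODULE__, :ok)
  end

  def init(:ok) do
    # One :transient child per event; a child exiting normally tells the
    # supervisor the event is done, which feeds demand back upstream.
    children = [
      %{
        id: SubTaskWorker,
        start: {SubTaskWorker, :start_link, []},
        restart: :transient
      }
    ]

    opts = [
      strategy: :one_for_one,
      subscribe_to: [{SubTaskBuffer, min_demand: 10, max_demand: 20}]
    ]

    ConsumerSupervisor.init(children, opts)
  end
end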

I am still studying the GenStage mechanism, but from my still-limited understanding, each stage is its own unidirectional transformation, yes? I understand that signaling still occurs backwards for demand, kind of like back-propagation in a neural network, but the actual transformation flows from the producer >> consumer. So the “send the results back to the ShardWorker to be ingested into the shard” part seems to me to indicate that the “second stage” (see Note 1) is not complete. This would seem to me to be the fundamental reason the initial design acts as it does. Is this not the case?

This thought, I believe, is a symptom exposing an underlying misinterpretation of the system. The ShardWorker as first conceived is not the “slowest stage”, as it is only compiling the list of work that needs to be done (since it isn’t actually waiting on the subtask workers’ completion; see Note 2). Conceptually to us, yes, it seems like the slowest “stage” in the general sense of the term, but within the GenStage definition of “stage”, it is not the slowest…did I word that correctly? :thinking:

So you could then have the naive parallelization in the implementation of the ShardWorker so that it blocks, but this would only give you back-pressure on the N shards, without addressing your overall goal of also limiting M concurrent snapshots. This is how I read what @josevalim is saying about not having back-pressure “all the way”.

So I would think overall that it’s the age-old problem of “naming things” here with the GenStage paradigm, and we would have to be very careful about our naming of the stages.

One GenStage To Rule Them All…

Rename ShardWorker to ShardWorkJobCompiler (or whatever) to indicate that its job is to create the list of work to consume.

This “list” information is the “job state” event that this stage produces. It would contain both the overall list of jobs to be completed and the target subtask’s list item to be executed. But it doesn’t produce a single event; it produces multiple events, one for each subtask to be executed.

Create a ShardWorkerSubTask stage that processes each of these events. It knows from the job state which particular subtask is its responsibility, but it also contains the global job’s information. It does its designated snapshot subtask, and it produces the same job state event it received, except modified to indicate that its subtask is complete. It then emits the modified job state event to the next stage; call it ShardWorkerAggregator.

The ShardWorkerAggregator is the consumer. It consumes the events, and it knows whether the job is complete or not, because it receives a completion notice for each subtask. It is called an aggregator because it reduces all of the subtask events to a single “job done” event.
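
To make that concrete, the job state event might be shaped something like this (field names are purely illustrative, not from the original post):

# Purely illustrative shape for the "job state" event described above:
def job_state(shard_id, all_subtasks, my_subtask, completed) do
  %{
    shard_id: shard_id,     # which shard this job is building
    subtasks: all_subtasks, # the overall list of work to complete
    subtask: my_subtask,    # the single sub-task this event targets
    completed: completed    # MapSet of sub-tasks finished so far
  }
end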

Or…I could just be talking out of my ::cough::…

So, that’s my take on things, and I’m going ahead and posting this in case it either helps you, or I (and anyone who reads this) could learn what I’m totally missing here. :smile:

Notes

*1) Does the producer count as a stage when saying stage 1, 2, …, n?
*2) This is akin, in my experience, to spawning a thread and continuing execution, rather than blocking for the spawned thread to complete.

Yep. In my prototype, my handle_events/3 was defined as:

def handle_events([stale_shard_id], _from, state) do
  # Blocks until the shard is fully built and persisted, so no new
  # demand is sent upstream until then.
  ShardProcess.perform_build(stale_shard_id)
  {:noreply, [], state}
end

Where ShardProcess implements a GenServer. Its perform_build function is defined like so:

def perform_build(stale_shard_id) do
  {:ok, pid} = start_link(stale_shard_id)
  # Wait (up to 10 minutes) for the build process to finish.
  GenServer.call(pid, :await_build_completion, 600_000)
end

Besides the fact that the ShardProcess made it easy to have ShardWorker.handle_events/3 block while it waits for the shard build to complete, I thought this would avoid some of the garbage collection problems I’ve read about when you have large ref-counted binaries touched by multiple processes. As we all know, the most efficient garbage collector is a process exiting, and I thought it would work best to limit the data touched by the long-running processes (such as the ShardWorker stages) to only the build pipeline metadata, and not any of the data used by an individual shard.

The DynamicSupervisor will allow you to spawn one child process per subtask, which may give better results (unless you are connecting to S3 in the init function of each SubTaskWorker). The amount of currently running tasks in this case will be, on average, (max_demand - min_demand) / 2. I would recommend using master, though, since we have recently fixed a bug to make sure min_demand is respected (instead of always asking). 0.6.0 should be out this week.

I hadn’t really considered the DynamicSupervisor yet; until I re-read the docs, I thought of it just as a replacement for :simple_one_for_one, but now that I’ve read the current docs, it sounds like exactly what I want. I was actually worried about one thing with the worker pool design I had prototyped: if the min_demand/max_demand used by the SubTaskWorker consumers is 10/20, then they will ask for 10 to 20 subtasks at a time to work on. If there is only one shard build being worked on by a ShardWorker (due to the lack of stale shards at a particular time of day), we could then end up with the sub-tasks for the only in-flight shard build processing serially in a single SubTaskWorker. Since DynamicSupervisor takes care of spawning a separate process for each event, such a problem should not occur with it, if I’m understanding it correctly.

So I think I will look into using DynamicSupervisor. Thanks for the suggestion!

@ibgib – thanks for the reply. I don’t have time to respond right now but will try to reply later.

I would say that GenStage, as a library, only provides a unidirectional flow, but ultimately, each stage is a process and can of course do normal Elixir/Erlang stuff like send messages. While it’s perhaps a bit non-standard to send results back, there are certainly valid reasons for doing so, as I’ll explain below.

While I can see why you would think this, the ShardWorker stage does in fact wait on the sub task workers. In my prototype, it did so by calling this, as I showed in my answer to @josevalim above:

ShardProcess.perform_build(stale_shard_id)

ShardProcess here is a gen server, so this did the following:

  • Started the gen server
  • Waited for the gen server to completely finish building the shard and exit

The ShardProcess was responsible for sending sub-tasks into the worker pool, keeping track of pending tasks, and receiving results until it got all of them back, at which point it would persist the shard and exit.
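
A condensed sketch of how such a ShardProcess might be put together (subtasks_for/1, ingest/3, and persist_shard/1 are hypothetical helpers, SubTaskBuffer.enqueue/1 is the assumed pool entry point from earlier, and a real version would need to handle results arriving before the :await_build_completion call):

defmodule ShardProcess do
  use GenServer

  def start_link(stale_shard_id) do
    GenServer.start_link(__MODULE__, stale_shard_id)
  end

  def perform_build(stale_shard_id) do
    {:ok, pid} = start_link(stale_shard_id)
    GenServer.call(pid, :await_build_completion, 600_000)
  end

  def init(stale_shard_id) do
    subtasks = subtasks_for(stale_shard_id) # hypothetical
    # Tag each sub-task with our pid so the SubTaskWorkers know where
    # to send the results.
    Enum.each(subtasks, &SubTaskBuffer.enqueue({&1, self()}))
    {:ok, %{shard: %{}, pending: MapSet.new(subtasks), caller: nil}}
  end

  def handle_call(:await_build_completion, from, state) do
    # Don't reply yet; the reply happens when the last result arrives.
    {:noreply, %{state | caller: from}}
  end

  def handle_info({:subtask_result, subtask, result}, state) do
    shard = ingest(state.shard, subtask, result) # hypothetical: compress + merge
    pending = MapSet.delete(state.pending, subtask)

    if MapSet.size(pending) == 0 do
      persist_shard(shard) # hypothetical
      GenServer.reply(state.caller, :ok)
      {:stop, :normal, state}
    else
      {:noreply, %{state | shard: shard, pending: pending}}
    end
  end
end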

The design you’ve proposed would probably work, and might be conceptually simpler, but I don’t think it’s the one we will go with. The shard data structure we are building can get quite large (in the worst cases, multiple GB), and as such, we really only want it to exist in one process. Including it in a message to another process would involve copying the entire data structure, which would be quite slow.

In addition, we have designed the shard so that individual pieces (such as individual values in a map within the shard data structure) can be compressed individually. We get a pretty good compression ratio (about 20:1), so by having each sub-task send its results back to the ShardProcess, the ShardProcess can compress the results and put them into the shard as sub-tasks complete, which works out nicely to keep things from using more memory than needed. I don’t think we could get the same kind of benefits out of the design you have proposed.
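
For instance, each result could be compressed as it lands; a sketch using :erlang.term_to_binary/2 with the :compressed option (the thread doesn’t say which compression is actually used, and ingest/3 is a hypothetical name):

defp ingest(shard, subtask_key, result) do
  # Compress the result before merging it in, so only the compressed
  # form is retained in the (potentially multi-GB) shard.
  Map.put(shard, subtask_key, :erlang.term_to_binary(result, compressed: 9))
end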

1 Like

Thanks for taking the time to explain your feedback to me! :smile:

I’m using your use case as something to test GenStage and your additional info should be very helpful. Thanks again!