How to use GenStage properly?

vahuja4 · April 15, 2019, 11:28am

In the Usage Guidelines section of the GenStage doc (https://hexdocs.pm/gen_stage/GenStage.html#content), the following paragraph got me thinking:

As you get familiar with GenStage, you may want to organize your stages according to your business domain. For example, stage A does step 1 in your company workflow, stage B does step 2 and so forth. That’s an anti-pattern.

The same guideline that applies to processes also applies to GenStage: use processes/stages to model runtime properties, such as concurrency and data-transfer, and not for code organization or domain design purposes. For the latter, you should use modules and functions.

Can someone please tell me if I am thinking about the following properly:

I have a stream of events coming in (the producer of the events is an external application). Each event is a JSON object, and there are multiple types of events. Depending on the type of event, certain checks need to be performed on the fields within the JSON, for example ‘quantity’ > 1000. Other checks need to be performed regardless of the event type. In addition, some of the checks are based on a time window, for example counting the number of orders with ‘quantity’ > 1000 in the last 10 minutes.

After reading about GenStage, it felt like the right approach to use, and I was thinking of the pipeline like this:

A (gets the events from an HTTP stream) -> B (splits the stream based on event type) -> C1, C2, C3 (these subscribe to the stream based on the event type and process the events).

Does this sound like the right approach? Or is this the anti-pattern that the GenStage documentation describes? Please help me see this properly.

kokolegorille · April 15, 2019, 12:02pm

It’s ok to use GenStage (or Flow, or Broadway) to import events, but I don’t think it’s a good idea to use GenStage for event checking.

Event checking, with all the rules you can have, is part of your business model, and plain modules and functions can handle it.
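As a minimal sketch of what that could look like, here are the kinds of checks described in the question written as plain functions, with no processes involved. The module name `EventChecks`, the `"type"`, `"quantity"` and `"timestamp"` keys, and the assumption that timestamps are Unix seconds are all hypothetical, chosen only for illustration.

```elixir
defmodule EventChecks do
  @moduledoc "Hypothetical business rules as plain functions (illustration only)."

  @large_quantity 1_000

  # Check applied to every event regardless of its type:
  # the decoded JSON must carry a string "type" field.
  def valid?(%{"type" => type}) when is_binary(type), do: true
  def valid?(_event), do: false

  # Type-specific check: an order whose quantity exceeds the threshold.
  def large_order?(%{"type" => "order", "quantity" => q}) when is_number(q),
    do: q > @large_quantity

  def large_order?(_event), do: false

  # Time-window check: count large orders seen in the last `window_seconds`
  # (600 seconds = 10 minutes). `now` is passed in, which keeps the function
  # pure and easy to test. Events without a timestamp are never counted.
  def large_orders_in_window(events, now, window_seconds \\ 600) do
    Enum.count(events, fn event ->
      large_order?(event) and now - Map.get(event, "timestamp", 0) <= window_seconds
    end)
  end
end
```

Because nothing here depends on GenStage, the same functions can be called from a stage callback, a Flow, or a unit test without changes.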

Also, you could add a stream filter, based on these business checks, before events enter the pipelines. You could then route events by type to the appropriate pipeline.
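A rough sketch of that idea using only the standard library: filter out malformed events lazily, then group what is left by type so each group can be handed to its own pipeline. `EventRouter`, `well_formed?/1` and the `"type"` key are hypothetical names, and `Enum.group_by/2` is eager, so this form only fits a finite batch; an unbounded stream would need to dispatch event by event instead.

```elixir
defmodule EventRouter do
  # Hypothetical pre-filter: keep only events that carry a string "type".
  def well_formed?(%{"type" => type}) when is_binary(type), do: true
  def well_formed?(_event), do: false

  # Drop malformed events lazily, then group the remaining ones by type.
  # Returns a map of type => list of events, in original order per type.
  def route(events) do
    events
    |> Stream.filter(&well_formed?/1)
    |> Enum.group_by(& &1["type"])
  end
end
```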

It’s not clear what you do with those events, but I would say anything related to business checks should go inside modules, while the processing flow of events should go into pipelines.

Fl4m3Ph03n1x · April 15, 2019, 12:11pm

Perhaps the question should be rephrased to:

Which runtime benefits will you obtain from dividing your stage into a stream that gets events and one that filters them?

Can you filter in parallel and thus obtain a runtime benefit?
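One inexpensive way to explore that question before building stages is to run a check concurrently with `Task.async_stream/3` and compare it against a plain `Enum.filter/2`. This is only a sketch: `ParallelFilter` and `large_order?/1` are hypothetical names, and for a check as cheap as a numeric comparison the per-task overhead may well outweigh any gain.

```elixir
defmodule ParallelFilter do
  # Hypothetical check, standing in for any business rule.
  def large_order?(%{"quantity" => q}) when is_number(q), do: q > 1_000
  def large_order?(_event), do: false

  # Run `check` on each event in its own task, with concurrency bounded by
  # the number of online schedulers. Task.async_stream/3 preserves input
  # order by default, so the result matches a sequential filter.
  def filter(events, check) do
    events
    |> Task.async_stream(fn event -> {event, check.(event)} end,
      max_concurrency: System.schedulers_online()
    )
    |> Enum.flat_map(fn
      {:ok, {event, true}} -> [event]
      {:ok, {_event, false}} -> []
    end)
  end
end
```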

vahuja4 · April 15, 2019, 4:11pm

Thank you for your reply! Could you point me to a good example please - something that illustrates the separation of the business logic and genstage functionality.

vahuja4 · April 15, 2019, 4:13pm

Yes, the filters all apply different logical tests, so they can be executed in parallel. Any pointers to good examples would be very helpful. Thank you!

Fl4m3Ph03n1x · April 16, 2019, 7:51am

Finding good examples of GenStage is hard, at least for me. Most people I know skip it and use Flow directly. Given that Broadway has been released and Hastega will be out soon as well, I am not sure people will still have a reason to use GenStage directly.

I do know of a book that @kokolegorille recommended, with a chapter dedicated to GenStage, which may help you:

Hope it helps!