Suggestions for workflow & automation sequencing

Hey all, looking for suggestions.

I’m trying to tackle a problem where I have an arbitrary sequence of operations, determined at run time, to execute for a workflow. Because these are defined at run time (possibly loaded from an external definition), I don’t see how to use GenStage.

A structure (map, list, stack) with functions assigned to variables at runtime would work, but (I think) it would only be as durable as the module definitions for the functions or the runtime duration of the application. Something like
workflow = {{name1, [fun, args]}, {name2, [fun, args]}…}
…so this doesn’t seem a great choice.

I’ve looked at eval and eval_bind to extract the AST, with the thought that it could be more durable, but (I think) that is applied at compile time, not runtime. Also, it seems there has to be a much better way to do this than macros, which would violate the first rule of macros!

Alternatively, I could have an arbitrarily ordered set of descriptors for each step (atoms, literals or tuples) which are attached as attributes to the “object” (message or Ecto record) of the workflow, and then dispatch / pattern match with a custom state evaluator. Something like
workflow_object = {name, …, %{current_state: something, next_state: something_else}}

While I am leaning towards the last choice because I can see how it would work, as an Elixir newbie I’m likely to be oblivious to much better ways to approach this, and I’d welcome any suggestions from the community on those.

(FWIW, I’ve spent quite a number of hours looking through the forums here and other sites for configurable workflows in Elixir and Erlang and haven’t seen much coverage.)

What still remains unclear from your post is: are all possible operations already compiled and ready to be invoked via runtime reflection? Or are you aiming at code generation at runtime?

Apologies for the ambiguity! Let me try to clarify…but in short, all possible operations would be defined at compile time. E.g. static modules & functions with arguments…all very normal.

  • transform({something, some_modifier})
  • alert({someone, something})
  • persist({something, somewhere})

…while the order in which these are called could be determined at runtime in order to implement a specific workflow. Runtime reflection, as you described. No (obvious to me) need for code generation at runtime.

The more that I thought about this, the more it seemed like I should be matching first on an event type, then attaching it to a list or map that describes the expected flow (e.g. [{:op_a, true}, {:op_b, false}, …]), which could be defined at runtime (or loaded from a configuration). If I do that, then (I think) I should be able to just iterate through the elements in the list or have a custom state machine that traverses a map.
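Something like this minimal sketch is what I have in mind; the module name, step names and step implementations are just placeholders:

defmodule WorkflowRunner do
  # Placeholder step implementations; in practice these would be
  # ordinary, already-compiled module functions.
  def transform(acc), do: Map.update(acc, :value, 0, &(&1 * 2))
  def alert(acc), do: Map.put(acc, :alerted, true)

  # Resolve a step name (runtime data) to an already-compiled function.
  defp step(:op_a), do: &__MODULE__.transform/1
  defp step(:op_b), do: &__MODULE__.alert/1

  # `flow` is runtime data such as [op_a: true, op_b: false];
  # steps flagged false are skipped, the rest run in order.
  def run(flow, input) do
    Enum.reduce(flow, input, fn
      {_name, false}, acc -> acc
      {name, true}, acc -> step(name).(acc)
    end)
  end
end

# WorkflowRunner.run([op_a: true, op_b: false], %{value: 21})
# #=> %{value: 42}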

Does this seem reasonable? I wondered if there were a more natural way of doing this in Elixir, with functions as first-class citizens…

(The broader context is that I’m contemplating using this for modelling business processes, where the processes would be per-organization, and I want to be able to accommodate these process definitions with a single workflow application.)

While I can’t exactly grasp your requirements, the fact that all code that would need to be executed would already be compiled is extremely good news and makes your life much easier.

You need a way to resolve your runtime data to an MFA (Module, Function, Arguments) tuple.

Once you create that resolve function (or functions) you can just do:

apply(module, function, arguments)

And this gets you your runtime reflection.

As for how you would resolve input parameters to an MFA tuple, that depends very much on you and the shape of the data. If you post several examples of input data and the expected output MFA, then I should be able to give you an idea or two (very likely involving Function.capture so you can even skip the apply call and just use captured_function.(arguments)).
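For example, here is a minimal sketch of that approach, with an invented operation name and module standing in for your already-compiled code:

# Stand-in for an already-compiled operation.
defmodule MyApp.Alert do
  def send(args), do: {:alerted, args}
end

# Resolve runtime data (e.g. loaded from an external definition)
# to a {Module, function, arguments} tuple.
defmodule MyApp.Resolver do
  def resolve({"alert", args}), do: {MyApp.Alert, :send, [args]}
end

{mod, fun, args} = MyApp.Resolver.resolve({"alert", %{to: :quality_manager}})
apply(mod, fun, args)
#=> {:alerted, %{to: :quality_manager}}

# Or capture once and call the function directly:
alert = Function.capture(mod, fun, length(args))
alert.(%{to: :quality_manager})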

(Thank you for your perspectives & suggestions!)

Let me attempt an illustration…using a quality testing workflow in a pharmaceutical setting where vaccines are being produced.

  • we have a machine “M” that measures a characteristic of our product and records that measurement for each batch of product it processes
  • the machine requires inspection at regular intervals (perhaps every 30 days). An inspection flow might look like
    • schedule an inspection (“calibration” in this setting) by creating an event in 30 days
    • when the machine is calibrated, there are three possible outcomes
    1. if the machine is operating nominally, record the calibration results and schedule the next calibration
    2. if the machine is operating outside of expected, but can be recalibrated to be within norms, then
      * record the deviation
      * schedule a secondary sample check of any product measured by “M” in last 30 days
      * if second sample check passes, record the calibration and schedule the next one
      * if second sample check identifies out-of-spec product
      -> disable machine from further production
      -> generate alert to quality manager
      -> schedule maintenance on the machine
    3. if the machine is operating outside of expected, and can NOT be recalibrated to be within norms, then
      * record the deviation
      * disable machine from further production
      * generate alert to quality manager
      * schedule a secondary sample check of any product measured by “M” in last 30 days

(simplistic) input data for the inspection might be something like:
%{
  inspection_type: :scheduled,
  machine: machine_type,            # machine type
  identifier: machine_id,           # machine ID
  location: "some helpful location description",
  date: datetime,
  spec: performance_spec            # performance spec tuple or map
}

An inspection of an air filter might have a similar (but different) workflow to the quality control machine, as might a fire alarm, while a water leak detector in a garage might be very different. A failure report from a shipped product might trigger a root cause analysis, which then causes a machine inspection to be scheduled.

Other machines (or processes) might have different workflows or different orderings based on their type or triggering event, but they use variants of the same operations:
- schedule an event (inspection, maintenance)
- record an inspection (result)
- transform an input (perhaps to generate a trend, a moving average or std dev)
- record a maintenance activity (action by someone to change something)
- test against an expected performance level (inspect or calibrate)
- release & reschedule if the machine is within an expected performance level (conformant)
- escalate & alert if the machine fails to perform as expected (deviation), perhaps triggering a further workflow

What I’d like to be able to do is have these workflows, i.e. how each machine or process uses the common operations, be configurable at runtime.
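For illustration, the calibration flow above might be expressed as plain runtime data along these lines (the exact shape here is just a strawman):

# One possible runtime representation of the calibration workflow:
# each step names a common operation plus its parameters, and branches
# key off the outcome of the previous step. Purely illustrative.
calibration_workflow = [
  {:schedule, %{event: :calibration, in_days: 30}},
  {:test, %{against: :performance_spec}},
  {:branch, [
    nominal: [
      {:record, :calibration},
      {:schedule, %{event: :calibration, in_days: 30}}
    ],
    deviation: [
      {:record, :deviation},
      {:alert, :quality_manager},
      {:schedule, %{event: :maintenance, in_days: 0}}
    ]
  ]}
]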

Is this helpful?

I would recommend reading up on data flow graph models, behaviour trees, and state machine specifications. The idea of a data flow graph is that each step is a node and the edges connecting nodes represent data flow dependencies. I.e. given a piece of data fed into this “workflow” model, the output of one step is fed in as the input to the next step.

For workflows like you describe, you also need control flow constructs, rules (if this pattern, then do this) being the primitive. There are a lot of research rabbit holes here worth investigating, like RETE rules engines that allow for performant evaluation of user-defined rules. RETE is pretty complicated, but the basic idea is worth mentioning:

A rule in a data flow model is really two steps:

  • The Pattern Matching Function/Step/Conditional Expression: Tell me when a piece of data matches some pattern (if this).
  • And the actual “work” function (then do this), e.g. a Step/Job.

The basic idea behind most of these rules engines is that the pattern matching function is attached to the top of the data flow graph and the “work” functions are attached as nodes dependent on the pattern functions.

Further logical constructs such as AND / OR can be managed as dependent patterns. So if you have a logical expression such as IF X AND Y THEN DO Z we could model that as something like:

+-------+    +-------+      +-------+
|       |    |       |      |       |
|   X   +---->   Y   +----->+   Z   |
|       |    |       |      |       |
+-------+    +-------+      +-------+

Where for step Z to be run, both X and Y have to be true.

This doesn’t really get interesting until we add other rules such as IF X AND F THEN DO B, so we now have a shared dependency on condition X and some new dependencies. So were we to add this rule to our graph we’d get something like:

             +-------+      +-------+
             |       |      |       |
    +-------->   F   +------>   B   |
    |        |       |      |       |
    |        +-------+      +-------+
+---+---+    +-------+      +-------+
|       |    |       |      |       |
|   X   +---->   Y   +----->+   Z   |
|       |    |       |      |       |
+-------+    +-------+      +-------+

What’s even more interesting is that our data model of data flow dependencies also reveals potential parallelism between steps. F and Y are both dependent on the result of X but not on each other, so assuming step X produces a result, F and Y can then be run in parallel. The concurrency opportunities of modeling computations in a DAG are why these structures are used extensively in a lot of domains.
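To make that concrete, here is a tiny vanilla-Elixir sketch of the two rules drawn above (not a RETE implementation; the field names and thresholds are made up): condition nodes are predicates, work nodes are plain functions, and the edges are the data flow dependencies.

defmodule TinyDag do
  # Walk the graph from a root: a condition node forwards the data to its
  # children only when its predicate matches; a work node just runs.
  def run(node_id, data, nodes, edges) do
    case Map.fetch!(nodes, node_id) do
      {:cond, pred} ->
        if pred.(data) do
          Enum.each(Map.get(edges, node_id, []), &run(&1, data, nodes, edges))
        end

      {:work, work} ->
        work.(data)
    end
  end
end

nodes = %{
  x: {:cond, fn data -> data.pressure > 10 end},
  y: {:cond, fn data -> data.temp < 80 end},
  f: {:cond, fn data -> data.temp >= 80 end},
  z: {:work, fn data -> IO.inspect(data, label: "Z ran") end},
  b: {:work, fn data -> IO.inspect(data, label: "B ran") end}
}

edges = %{x: [:y, :f], y: [:z], f: [:b]}

TinyDag.run(:x, %{pressure: 12, temp: 70}, nodes, edges)
# Only Y matches here, so only Z runs. Since Y and F both depend only on X,
# they could also be evaluated in parallel (e.g. with Task.async_stream).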

You can find Directed Acyclic Graphs (DAGs) just about everywhere once you start looking. Git, Tensorflow, Apache Beam (not our BEAM, but still a cool Beam anyhow), and Apache Airflow all use DAGs in some capacity. There are a lot of fun papers and a lot of diverse domains they’re used in, like expert systems, dynamic workflow modeling (more of what you’re looking for), game agent AI (usually called behaviour trees, but they’re essentially the same thing if you squint a bit) and signal processing.

Like an AST, a data flow graph is a data structure that represents some computation that can be run given some input. The difference is that something like the Elixir AST is manipulated at compile time, whereas a graph or nested map data structure of some kind is something you can manage at runtime. This also exposes a fairly difficult problem: how do we verify the correctness of a program we’re throwing together at runtime? Compilers and type systems are non-trivial. Is step X flowing into step Y producing an incompatible input? Doing these sorts of checks at compile time, let alone runtime, is not easy. My own research has run into very large state spaces, and at this point I’ll need to learn lots of dynamic property-based/generative testing based on layers of properties and an uncomfortable amount of category theory jargon.

Anyway, it’s a super fun research area, and one where I think the rich capabilities of the Elixir/Erlang runtime can really shine and make new strides.

I’d recommend watching these lectures by the late, great Patrick Winston.

This talk is a great overview of dataflow models, and the paper the talk references is a great source of more papers.

Here’s a nice doc describing RETE: https://cis.temple.edu/~giorgio/cis587/readings/rete.html

Here are some Elixir libraries worth checking out in no particular order:

That said, you can do all of this with vanilla Elixir, structs, and a bit of cleverness passing functions around or using protocols/behaviours. The idea is that you build a model of the computation as a data structure separate from the execution at runtime; how you implement this can be optimized depending on your domain.
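For instance, a brief sketch of the behaviour-based variant of that idea (the module and callback names are invented):

defmodule Workflow.Step do
  # Every step implements the same contract, so a workflow is just an
  # ordered list of step modules chosen at runtime.
  @callback run(map()) :: map()
end

defmodule Steps.Record do
  @behaviour Workflow.Step
  @impl true
  def run(data), do: Map.put(data, :recorded, true)
end

defmodule Steps.Notify do
  @behaviour Workflow.Step
  @impl true
  def run(data), do: Map.put(data, :notified, true)
end

# The "model" of the computation: plain data, buildable at runtime.
steps = [Steps.Record, Steps.Notify]

# The execution: fold the input through the model.
Enum.reduce(steps, %{batch: "B-17"}, fn step, acc -> step.run(acc) end)
#=> %{batch: "B-17", recorded: true, notified: true}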

Hope this helps!

Zounds! No small amount of opportunity to learn here :slightly_smiling_face:

What you are describing sounds very much like what I’m looking for. I’ll definitely spend some quality time researching this before asking any further questions!