Idiomatic way to track work execution via multiple means simultaneously (pid, monitor ref, struct id, etc.)

david_ex · January 13, 2020, 8:19am

[Question summary] How best to track work being performed, when this information must be looked up by:

worker pid (workers send messages to a GenServer manager)
monitor ref (workers are monitored and work needs to be enqueued again on worker crash)
struct id (multiple work attempts per struct need to be tracked across worker restarts to prevent infinite loops)

This must be a common situation, so I’d love to know how best to handle it, both for my limited-concurrency case and for the more general case with “typical” concurrency.

Let’s say I have a list of Person structs containing a birth date, and I want to process all of them (e.g. compute each person’s current age). I have a GenServer that monitors and tracks the success or failure of each Person that gets processed.
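For concreteness, a minimal sketch of the Person struct and the per-item work, assuming a hypothetical `Person` module with `id` and `birth_date` fields (the real struct may look different). Age is counted in whole years, subtracting one if this year’s birthday has not happened yet:

```elixir
defmodule Person do
  # Hypothetical struct for the example: an id plus a birth date.
  @enforce_keys [:id, :birth_date]
  defstruct [:id, :birth_date]

  @doc "Age in whole years on `today` (defaults to the current UTC date)."
  def age(%__MODULE__{birth_date: birth}, today \\ Date.utc_today()) do
    years = today.year - birth.year

    # Tuples compare element by element, so this checks month, then day.
    if {today.month, today.day} < {birth.month, birth.day}, do: years - 1, else: years
  end
end
```

For example, `Person.age(%Person{id: 1, birth_date: ~D[1990-06-15]}, ~D[2020-01-13])` gives 29, since the June birthday has not yet come around in January.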

A new worker process is started to handle each Person that needs to be processed. The GenServer is notified when the processing of a Person starts, as well as when it ends successfully.

The GenServer will Process.monitor the pid of the process working on a Person struct, and if the process crashes, will enqueue the Person again. If a given Person fails to be processed multiple times, we log an error and bail.

The above setup means that the “processing state” for a given Person struct needs to be accessed by:

worker pid (so the managing GenServer state can be updated when the worker reports it was successful)
monitor ref (if a worker crashes, the work needs to be rescheduled)
struct id (we want to track the number of processing attempts to prevent infinite loops)

What’s the best approach here? Simply use multiple maps (e.g. struct_id => all_info, monitor_ref => struct_id, and worker_pid => struct_id) and make sure to keep them in sync?
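A minimal sketch of the multiple-maps approach, assuming workers call the hypothetical `PersonManager.work_started/2` and `work_done/1` from their own process, so the caller pid in `from` identifies the worker. `@max_attempts` and the `retry` list (standing in for re-emission from the producer) are placeholders. Each callback updates all three maps together, which is what keeps them in sync:

```elixir
defmodule PersonManager do
  # Hypothetical manager: names, @max_attempts and the `retry` list are
  # placeholders. Every callback updates all three maps together.
  use GenServer
  require Logger

  @max_attempts 3

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Called from inside the worker process, so `from` carries the worker pid.
  def work_started(manager, person_id), do: GenServer.call(manager, {:started, person_id})
  def work_done(manager), do: GenServer.call(manager, :done)
  def state(manager), do: GenServer.call(manager, :state)

  @impl true
  def init(:ok) do
    # by_id:  person_id   => %{attempts: n, pid: pid | nil, ref: ref | nil}
    # by_ref: monitor ref => person_id
    # by_pid: worker pid  => person_id
    {:ok, %{by_id: %{}, by_ref: %{}, by_pid: %{}, retry: []}}
  end

  @impl true
  def handle_call({:started, id}, {pid, _tag}, s) do
    ref = Process.monitor(pid)
    attempts = (get_in(s.by_id, [id, :attempts]) || 0) + 1

    new_state = %{
      s
      | by_id: Map.put(s.by_id, id, %{attempts: attempts, pid: pid, ref: ref}),
        by_ref: Map.put(s.by_ref, ref, id),
        by_pid: Map.put(s.by_pid, pid, id)
    }

    {:reply, :ok, new_state}
  end

  def handle_call(:done, {pid, _tag}, s) do
    case Map.pop(s.by_pid, pid) do
      {nil, _} ->
        {:reply, {:error, :unknown_worker}, s}

      {id, by_pid} ->
        %{ref: ref} = Map.fetch!(s.by_id, id)
        # Stop monitoring; :flush drops a :DOWN that may already be queued.
        Process.demonitor(ref, [:flush])
        new_state = %{s | by_id: Map.delete(s.by_id, id), by_ref: Map.delete(s.by_ref, ref), by_pid: by_pid}
        {:reply, :ok, new_state}
    end
  end

  def handle_call(:state, _from, s), do: {:reply, s, s}

  @impl true
  # Any exit before work_done/1 (even :normal) counts as a failed attempt.
  def handle_info({:DOWN, ref, :process, pid, reason}, s) do
    {id, by_ref} = Map.pop(s.by_ref, ref)
    s = %{s | by_ref: by_ref, by_pid: Map.delete(s.by_pid, pid)}
    %{attempts: n} = Map.fetch!(s.by_id, id)

    if n >= @max_attempts do
      Logger.error("Giving up on #{inspect(id)} after #{n} attempts: #{inspect(reason)}")
      {:noreply, %{s | by_id: Map.delete(s.by_id, id)}}
    else
      # Keep the attempt count, clear worker fields, queue for re-emission.
      by_id = Map.put(s.by_id, id, %{attempts: n, pid: nil, ref: nil})
      {:noreply, %{s | by_id: by_id, retry: s.retry ++ [id]}}
    end
  end
end
```

The attempt count lives in the `by_id` entry and survives a crash, while the pid and ref entries are dropped as soon as the worker is gone, so the three maps never point at a dead worker.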

For what it’s worth, the concurrency is going to be extremely limited (say, a dozen concurrent processes) because the work involves third-party resources that shouldn’t be overwhelmed. Given that, should ETS be used with the struct id as the main key, accepting that a full table scan will happen when looking up by worker pid or monitor ref?
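A sketch of the ETS variant, assuming a hypothetical `WorkTable` module and an assumed row shape of `{person_id, attempts, worker_pid, monitor_ref}` in a `:set` table keyed by person id. Lookup by id is a direct `:ets.lookup/2`; lookups by pid or ref use `:ets.match/2` with the key left unbound, which scans the whole table. With around a dozen rows that scan is cheap:

```elixir
defmodule WorkTable do
  # Hypothetical ETS layout: {person_id, attempts, worker_pid, monitor_ref}
  def new, do: :ets.new(:work_table, [:set])

  def put(table, id, attempts, pid, ref), do: :ets.insert(table, {id, attempts, pid, ref})

  # Keyed lookup: no scan needed.
  def by_id(table, id) do
    case :ets.lookup(table, id) do
      [row] -> row
      [] -> nil
    end
  end

  # Key position is unbound in these patterns, so ETS scans every row.
  def id_by_pid(table, pid) do
    case :ets.match(table, {:"$1", :_, pid, :_}) do
      [[id]] -> id
      [] -> nil
    end
  end

  def id_by_ref(table, ref) do
    case :ets.match(table, {:"$1", :_, :_, ref}) do
      [[id]] -> id
      [] -> nil
    end
  end
end
```

Since only the managing GenServer touches this data, a table with the default `:protected` access owned by that process behaves much like the maps version; the main difference is that secondary lookups become linear scans instead of extra map entries.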

(I’m trying not to muddy the waters too much with specifics, but here’s some more context on what I’ve got going: the structs get processed within a GenStage pipeline with limited concurrency. Each struct gets processed via a GenStage.ConsumerSupervisor. If a struct fails to be processed, I want to re-emit it from the GenStage producer so it can be tried again. If the same struct fails multiple attempts, an error is logged and the struct will not be re-emitted by the producer.)

LostKobrakai · January 13, 2020, 8:40am

As you’re already using GenStage, I’d suggest taking a look at Broadway, which has built-in message acknowledgement.

david_ex · January 13, 2020, 1:59pm

I’d rather not add Broadway at this point. I don’t need its throughput, and I can’t find documentation or examples on acknowledging messages in the simple case (success/failure), let alone a more involved scenario (re-enqueue if there have been fewer than x failures, otherwise drop the message). I’d rather avoid the extra complexity until I’ve wrapped my head around the other pipeline components.

That said, I’m curious how this is managed in general: it seems to me that the “manager-workers” pattern (e.g. https://zxq9.com/archives/1311), and therefore the need to track relationships between structs, worker pids and monitor refs, is quite common in Elixir and Erlang, so surely there’s a typical “canonical” way to handle it?

(By the way, if the idiomatic way to do it is “use several maps and keep them in sync”, that’s fine; I just want to check that I’m not missing some obvious pattern that would improve my code.)