Had a little task today, and the quick solution felt a bit like a stupid pet trick … thought I’d share it here as something a little different from the continual stream of questions
The task was this: we needed all the pair-wise combinations from a set of entries, and (for performance reasons) these needed to be batched up. So we have data like this:

1..10_000

… which gets turned into batches like this using Enum.chunk_every/2:

[[1, 2, …, 500], [501, 502, …, 1000], [1001, 1002, …, 1500], …]

We then want to do some computation with the next value (e.g. 1) against the rest of the values in its batch (e.g. [2, …, 500]) and then subsequently against each further batch (e.g. [501, …, 1000], then [1001, …, 1500], and so on). Pair-wise combinations. Fun.
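Just to make the shape concrete (this is only the standard library call with toy numbers, not the real data set):

iex> Enum.chunk_every(1..10, 3)
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]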
The computation is done async and the batches are generated on request by a GenServer for consumption by workers. So we need to keep track of where we are in the batching, which means keeping state. Yuck! State! Amiright?
Instead of keeping the chunked data around in a state term, I opted to keep a function there which captured that data:
batches = Enum.chunk_every(sequence, batch_size)
state = fn -> next_batch(batches) end
It is then used like this from a message handler in the GenServer:
def next_job(state) do
  case state.() do
    {:done, _} = done -> done
    {{subject, batch}, next} -> {create_job(subject, batch), next}
  end
end
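For context, here is roughly how that could be wired into the GenServer. This is my own sketch rather than code from the post: the module name, the start_link/init shape and the :next_job call are all invented for illustration, and create_job/2 is whatever builds the actual work item.

defmodule BatchServer do
  use GenServer

  def start_link({sequence, batch_size}) do
    GenServer.start_link(__MODULE__, {sequence, batch_size}, name: __MODULE__)
  end

  @impl true
  def init({sequence, batch_size}) do
    batches = Enum.chunk_every(sequence, batch_size)
    # The entire state is a closure that knows how to produce the next pairing.
    {:ok, fn -> next_batch(batches) end}
  end

  @impl true
  def handle_call(:next_job, _from, state) do
    # Both branches of next_job/1 return {reply, next_state_function}.
    {reply, next} = next_job(state)
    {:reply, reply, next}
  end

  # ... next_job/1, next_batch/1, next_batch/4 and create_job/2 as in the post ...
end

A worker then just calls the server for :next_job until it gets :done back.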
What is that next_batch call in the initial state term, you ask? (Ok, you probably didn't … but, then again, maybe you did since you have read this far!)
# Entry point: take the head of the first batch as the subject and pair it
# with the rest of its own batch; the continuation handles the further batches.
defp next_batch([[current | rest] | batches]) do
  {{current, rest}, fn -> next_batch(current, rest, batches, []) end}
end

# Nothing left anywhere: we are done.
defp next_batch(_current, [], [], []) do
  done_tuple()
end

# The whole current batch has been dealt with; the accumulator holds the
# batches that come after it, so start over with the first of those.
defp next_batch(_current, [], [], [[next | rest] | batches]) do
  {{next, rest}, fn -> next_batch(next, rest, batches, []) end}
end

# The current subject has seen every further batch; advance to the next
# subject in the same batch, pairing it with the rest of that batch, and
# replay the accumulated further batches for it.
defp next_batch(_current, [next | rest], [], acc) do
  {{next, rest}, fn -> next_batch(next, rest, acc, []) end}
end

# Pair the current subject with the next of the further batches, remembering
# that batch in the accumulator so later subjects can be paired with it too.
defp next_batch(current, rest, [batch | batches], acc) do
  {{current, batch}, fn -> next_batch(current, rest, batches, [batch | acc]) end}
end

defp done_tuple(), do: {:done, fn -> done_tuple() end}
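To see the generator in action, a little helper (again my own sketch, not part of the original module) can just keep calling the state function and collect every {subject, batch} pair until it reaches the :done tuple:

# Illustrative helper (not from the original code): walk the generator to the
# end, collecting every {subject, batch} pair it yields along the way.
defp all_pairs(state_fun, acc \\ []) do
  case state_fun.() do
    {:done, _} -> Enum.reverse(acc)
    {pair, next} -> all_pairs(next, [pair | acc])
  end
end

# all_pairs(fn -> next_batch(Enum.chunk_every(1..6, 2)) end)
# => [{1, [2]}, {1, [3, 4]}, {1, [5, 6]}, ...], i.e. each subject first against
#    the rest of its own batch, then against the remaining batches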
Generators! Ignoring the moderately ugly function headers, the useful bit is that the functions return a tuple containing the result of the calculation (the next {subject, batch} pair) as well as an anonymous function that, when called, produces the next job … this allows the “detail” of how next_batch/4 works to be entirely opaque to the calling code.
The calling code just calls the function (which is its state!) until it gets a :done tuple. It doesn't matter how many times it is called, as the :done tuple carries a function which, when called, itself returns the same :done tuple. As this is used in a distributed application where we cannot know the order or number of calls in advance, that's a necessary attribute to have, and the above manages it elegantly.
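In code form, that property is just a restatement of done_tuple/0 above:

{:done, again} = done_tuple()
{:done, _still_done} = again.()  # the same shape, however many times you call it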
It was just a nice way to encapsulate the actual iteration through the batches so it could be “hidden” from the GenServer using it, without cluttering up its own message handlers. Performance was just fine, so the code cleanliness this approach offered was considered to offset the overhead.
During code review it came up as an out-of-the-ordinary approach (though certainly not novel), so I thought I'd share it here. Yay, stupid pet tricks!