Can anyone help explain this code by Jose in Broadway to me?

I am a novice Elixir programmer, and I am reading the source code of Broadway to learn the best way of writing Elixir code.

While reading the part about Broadway’s Processor, I found one part that I do not quite understand.

The messages are not passed through the map (for example, in place of the true value) but through the process dictionary. I checked the git history and found that this change was made by Jose in a branch named jv-speed-up-ack. Is it faster? Why?

  defp group_by_acknowledger(ackers, messages, key) do
    Enum.reduce(messages, ackers, fn %{acknowledger: {acknowledger, ack_ref, _}} = msg, acc ->
      ack_info = {acknowledger, ack_ref}
      pdict_key = {ack_info, key}
      Process.put(pdict_key, [msg | Process.get(pdict_key, [])])
      Map.put(acc, ack_info, true)
    end)
  end

  defp call_ack({{acknowledger, ack_ref} = ack_info, true}) do
    successful = Process.delete({ack_info, :successful}) || []
    failed = Process.delete({ack_info, :failed}) || []
    acknowledger.ack(ack_ref, Enum.reverse(successful), Enum.reverse(failed))
  end

The code above could be replaced by a map. The only reason we are using the process dictionary is that we sometimes handle tens of thousands of events per second, and at that rate the map operations start showing up in benchmarks, so we use the mutable process dictionary instead.
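For comparison, here is a minimal sketch of what a map-based version of the grouping step might look like. It is not Broadway's code: the module name MapGroupingSketch, the FakeAcker atom, and the sample messages are hypothetical and only there to make the example runnable.

```elixir
defmodule MapGroupingSketch do
  # Hypothetical module; illustrates the map-only alternative, not Broadway's implementation.
  def group_by_acknowledger(ackers, messages, key) do
    Enum.reduce(messages, ackers, fn %{acknowledger: {acknowledger, ack_ref, _}} = msg, acc ->
      ack_info = {acknowledger, ack_ref}
      # Map.update/4 returns a new map on every call; the previous map is left untouched.
      Map.update(acc, {ack_info, key}, [msg], &[msg | &1])
    end)
  end
end

# Made-up messages that only carry the fields the sketch needs.
messages = [
  %{id: 1, acknowledger: {FakeAcker, :ref_a, nil}},
  %{id: 2, acknowledger: {FakeAcker, :ref_b, nil}},
  %{id: 3, acknowledger: {FakeAcker, :ref_a, nil}}
]

grouped = MapGroupingSketch.group_by_acknowledger(%{}, messages, :successful)

# Messages are prepended, so each list is newest first and would need Enum.reverse/1 before acking.
ids = grouped |> Map.fetch!({{FakeAcker, :ref_a}, :successful}) |> Enum.map(& &1.id)
IO.inspect(ids)
```

The behaviour is the same as the process dictionary version; the difference is only that every message goes through an immutable map update.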

I want to make it clear that people shouldn’t really be doing this in practice unless they benchmark and see an order of magnitude difference. In all my time with Erlang, I had to do this trick only twice, here and on GenStage, and both for the same reason.
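A rough way to check this on your own workload is to time both approaches before deciding. The sketch below uses :timer.tc/1 on made-up data; the GroupingBench module, the key range, and the item count are assumptions for illustration, and the numbers it prints will vary by machine and OTP version. A dedicated tool such as Benchee would give more reliable measurements.

```elixir
defmodule GroupingBench do
  # Hypothetical module for illustration; not part of Broadway.

  # Immutable version: every step builds an updated map.
  def with_map(items) do
    Enum.reduce(items, %{}, fn {key, value}, acc ->
      Map.update(acc, key, [value], &[value | &1])
    end)
  end

  # Mutable version: accumulate in the process dictionary, then collect and clean up.
  def with_pdict(items, keys) do
    Enum.each(items, fn {key, value} ->
      Process.put({:bench, key}, [value | Process.get({:bench, key}, [])])
    end)

    Map.new(keys, fn key -> {key, Process.delete({:bench, key})} end)
  end
end

# Made-up workload: 100_000 values spread over 100 keys.
keys = 0..99
items = for i <- 1..100_000, do: {rem(i, 100), i}

{map_us, map_result} = :timer.tc(fn -> GroupingBench.with_map(items) end)
{pdict_us, pdict_result} = :timer.tc(fn -> GroupingBench.with_pdict(items, keys) end)

IO.puts("map: #{map_us} us, pdict: #{pdict_us} us")
IO.puts("same result: #{map_result == pdict_result}")
```

If the two timings are close, the map version is the one to keep.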


Just curious whether you tried ETS, and how it fared compared to maps and the process dictionary?


I have not. I think ETS would be more complex than the process dictionary in this case and would offer no benefits.
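To give a feel for the extra moving parts, here is a rough sketch of the same grouping done with ETS. It is only an illustration under assumptions: the table name :ack_groups, the FakeAcker atom, and the sample messages are hypothetical. Note that the table has to be created and deleted explicitly, and that ETS copies terms into the table on insert and back out on lookup.

```elixir
# A private table owned by the current process; :duplicate_bag allows many objects per key.
table = :ets.new(:ack_groups, [:duplicate_bag, :private])

# Made-up messages that only carry the fields the sketch needs.
messages = [
  %{id: 1, acknowledger: {FakeAcker, :ref_a, nil}},
  %{id: 2, acknowledger: {FakeAcker, :ref_b, nil}},
  %{id: 3, acknowledger: {FakeAcker, :ref_a, nil}}
]

Enum.each(messages, fn %{acknowledger: {acknowledger, ack_ref, _}} = msg ->
  # Each insert copies the message into the table.
  :ets.insert(table, {{{acknowledger, ack_ref}, :successful}, msg})
end)

key = {{FakeAcker, :ref_a}, :successful}

# Lookup copies the stored tuples back onto the process heap.
successful = for {_key, msg} <- :ets.lookup(table, key), do: msg
:ets.delete(table, key)

ids = successful |> Enum.map(& &1.id) |> Enum.sort()
IO.inspect(ids)

# The table must be cleaned up explicitly (or it lives until the owner process exits).
:ets.delete(table)
```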


Thank you very much for the explanation and advice. :+1:

If you do not mind, may I ask why the process dictionary would be faster? You mentioned that the process dictionary is mutable. Is a map not?

Exactly. The process dictionary root key structure is mutable.
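A small sketch of the difference, using made-up keys and values: updating a map returns a new map and leaves the original as it was, while Process.put/2 overwrites the entry in the current process's dictionary and returns the previous value.

```elixir
# Maps are immutable: Map.put/3 returns a new map.
map = %{count: 1}
updated = Map.put(map, :count, 2)
IO.inspect(map)      # still holds count: 1
IO.inspect(updated)  # holds count: 2

# The process dictionary is mutable: Process.put/2 replaces the stored value in place.
Process.put(:count, 1)
previous = Process.put(:count, 2)  # returns the value that was there before
current = Process.get(:count)
IO.inspect({previous, current})

# Clean up so the key does not linger in the process.
Process.delete(:count)
```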
