Enum.chunk_while/4 - am I using it right, and if so, is this an interesting example for the Elixir documentation?

Hey fellow Elixir people,

I am still an Elixir noob, and some problems cost me a lot of time.

To nest a flat list into Elixir structs / maps (e.g. from CSV or a relational DB), I wrote a small parser.

Most of that time went into not understanding Enum.chunk_while/4. To check whether I am using it correctly now, I wrote this small example (in the form of a test). See below.

My question up front: is there a more idiomatic Elixir way to do this? (Keep in mind that I convolute multiple such chunkers and use a dynamic schema to chunk into, so I guess filtering or grouping alone would not do…)

If this is fine in principle: do you think this example (maybe modified) would be a worthwhile addition to the Elixir documentation? And if the answer is again “yes”, how would that work? Just a pull request to the Elixir repo? The specific part that took me so much time is annotated in the code below.

Many thanks for your time and consideration!

defmodule Example_chunk_while do
  @moduledoc """
  Proposition for additional Example in
  Documentation for `Enum.chunk_while`.
  """

  use ExUnit.Case

  test "chunker_test" do
    list_of_maps = [
      %{a: 5, b: 9},
      %{a: 5, b: 9},
      %{a: 7, b: 15},
      %{a: 360, b: 15},
      %{a: 360, b: 15}
    ]

    expected_result = [
      [%{a: 5, b: 9}, %{a: 5, b: 9}],
      [%{a: 7, b: 15}],
      [%{a: 360, b: 15}, %{a: 360, b: 15}]
    ]

    chunk_fun = fn element, acc ->
      # check *initial* case
      if acc == [] do
        {:cont, [element]}
      else
        # If not empty: compare with last element
        [previous | _] = acc

        previous_code = Map.get(previous, :a)

        case element.a do
          ^previous_code -> {:cont, Enum.reverse([element | acc])}
          # the following line did cost me some time to figure out!
          # In case you want to group by some features but also allow
          # entries which result in a group of "one entry", you need
          # to return the element as the acc for the next processing step.
          _ -> {:cont, acc, [element]}
        end
      end
    end

    after_fun = fn
      [] -> {:cont, []}
      acc -> {:cont, Enum.reverse(acc), []}
    end

    result = Enum.chunk_while(list_of_maps, [], chunk_fun, after_fun)

    assert result == expected_result
  end
end

PS: I could maybe “open source” my “parser”, but it is still quite messy and changes hourly :wink:
I guess you all already have your libraries for such things (which I didn’t really find, to be honest). I am glad for pointers! I don’t currently use Ecto; I am trying to build a completely DB-agnostic app to begin with and add a persistence layer later on. The parser is going to be used on file streams (with Stream.chunk_while), on “complete” smaller files, and on results from outgoing DB requests. The data is then sent further down the “pipeline” and eventually gets added to the state.


I think the main thing you could do here to make this more idiomatic is to use more pattern matching in your chunk_fun. The first step is to extract your if into a match in the function head:

chunk_fun = fn
  element, [] ->
    {:cont, [element]}

  element, [previous | _] = acc ->
    previous_code = Map.get(previous, :a)

    case element.a do
      ^previous_code -> {:cont, Enum.reverse([element | acc])}
      # the following line did cost me some time to figure out!
      # In case you want to group by some features but also allow
      # entries which result in a group of "one entry", you need
      # to return the element as the acc for the next processing step.
      _ -> {:cont, acc, [element]}
    end
end

From there, it’s somewhat a matter of personal taste, but you can also move the comparison into the function head, resulting in:

chunk_fun = fn
  element, [] ->
    {:cont, [element]}

  %{a: a} = element, [%{a: a} | _] = acc ->
    {:cont, Enum.reverse([element | acc])}

  element, acc ->
    {:cont, acc, [element]}
end

What I like about this is that it highlights that there are really 3 outcomes: you’re either initializing things, comparing an inner attribute for equality, or passing things along. Depending on the complexity of your comparison, you may not be able to do the equality check in the function head, but it’s usually good to at least pull out stuff like [prev | _] = acc.
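For the case where the equality check does not fit into a pattern, the comparison can move into the clause body while the `[prev | _] = acc` match stays in the head. This is only a minimal sketch: the `:name` field, the `key` function and the data are made up for illustration and are not from the original post. It already reverses the accumulator once, when a chunk is emitted.

```elixir
# Hypothetical data: rows should be grouped case-insensitively by :name.
rows = [%{name: "Ann"}, %{name: "ann"}, %{name: "Bob"}]

# Hypothetical key function; String.downcase/1 cannot be called inside a pattern.
key = fn row -> String.downcase(row.name) end

chunk_fun = fn
  element, [] ->
    # nothing collected yet
    {:cont, [element]}

  element, [prev | _] = acc ->
    if key.(element) == key.(prev) do
      # same group: keep collecting (newest element first)
      {:cont, [element | acc]}
    else
      # new group: emit the finished chunk, start the next one with `element`
      {:cont, Enum.reverse(acc), [element]}
    end
end

after_fun = fn
  [] -> {:cont, []}
  acc -> {:cont, Enum.reverse(acc), []}
end

result = Enum.chunk_while(rows, [], chunk_fun, after_fun)
# result == [[%{name: "Ann"}, %{name: "ann"}], [%{name: "Bob"}]]
```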

As a tiny point, I’m not sure that the Enum.reverse is correct; it seems to me that it would constantly flip-flop the accumulator. Instead, it seems more like you’d want:

chunk_fun = fn
  element, [] ->
    {:cont, [element]}

  %{a: a} = element, [%{a: a} | _] = acc ->
    {:cont, [element | acc]}

  element, acc ->
    {:cont, Enum.reverse(acc), [element]}
end

Here, you build up the acc back to front as usual, and then reverse when you emit it as a chunk.
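To make the difference visible, here is a small sketch with a group of three. The data is hypothetical (`:b` just numbers the rows so the order inside a chunk can be read off), and `flip_flop`, `fixed` and `order` are names I made up; the `after_fun` is the one from the original post.

```elixir
# Hypothetical data: :b numbers the rows so the order inside a chunk is visible.
rows = [%{a: 1, b: 1}, %{a: 1, b: 2}, %{a: 1, b: 3}, %{a: 2, b: 4}]

# Reverses on every step, as in the original post.
flip_flop = fn
  element, [] -> {:cont, [element]}
  %{a: a} = element, [%{a: a} | _] = acc -> {:cont, Enum.reverse([element | acc])}
  element, acc -> {:cont, acc, [element]}
end

# Prepends while collecting and reverses once, when the chunk is emitted.
fixed = fn
  element, [] -> {:cont, [element]}
  %{a: a} = element, [%{a: a} | _] = acc -> {:cont, [element | acc]}
  element, acc -> {:cont, Enum.reverse(acc), [element]}
end

after_fun = fn
  [] -> {:cont, []}
  acc -> {:cont, Enum.reverse(acc), []}
end

# Keep only :b so the ordering is easy to read.
order = fn chunks -> Enum.map(chunks, fn chunk -> Enum.map(chunk, & &1.b) end) end

scrambled = rows |> Enum.chunk_while([], flip_flop, after_fun) |> order.()
in_order = rows |> Enum.chunk_while([], fixed, after_fun) |> order.()
# scrambled == [[2, 1, 3], [4]]
# in_order  == [[1, 2, 3], [4]]
```

With groups of at most two identical maps, as in the test from the first post, the scrambling cannot be observed, which is probably why that test passes.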

PRs to the Elixir repo for docs are always welcome; just be willing, as always, to iterate with the repo owners on wording and clarity.


Hey thank you very much for your detailed answer!

This helps me a lot, especially the “match” in the function head. I saw something similar somewhere but did not really understand it. Now it makes sense! Such anonymous functions still trip me up, so I mostly use them for simple inline stuff. I see that I missed out on something there :D.

It is getting late here in central Europe :wink: - so I’ll try to adapt your approach to my actual (and more complex) problem tomorrow. Especially your second “comparison part” needs some fresh brainpower for me to wrap my head around ;). It looks very promising!

About my issue with the documentation: what I missed there was the pattern:

{:cont, Enum.reverse(acc), [element]}

Instead, only these patterns are shown:

{:cont, Enum.reverse(acc), []}

or

{:cont, Enum.reverse(acc)}

This might be obvious to most people, but I thought the acc was something special. Of course, one can also “preload” the acc with the current element, as we do here.
So if you or someone else agrees that this might help people understand things, I’ll be happy to work out a proposal for a PR (using your answer of course :wink: ).
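To spell out the “preload” case for myself, here is a minimal sketch on plain integers. The task (splitting into strictly increasing runs) and the data are made up for illustration; only the return shapes are the point.

```elixir
# Hypothetical example: split integers into strictly increasing runs.
chunk_fun = fn
  # nothing collected yet: start the first run
  n, [] -> {:cont, [n]}
  # still increasing: extend the current run (newest element first)
  n, [prev | _] = acc when n > prev -> {:cont, [n | acc]}
  # run is over: emit it and preload the next acc with the current element
  n, acc -> {:cont, Enum.reverse(acc), [n]}
end

after_fun = fn
  [] -> {:cont, []}
  # emit whatever is left over as the last chunk
  acc -> {:cont, Enum.reverse(acc), []}
end

runs = Enum.chunk_while([1, 2, 5, 3, 4, 1], [], chunk_fun, after_fun)
# runs == [[1, 2, 5], [3, 4], [1]]
```

Returning `{:cont, Enum.reverse(acc), []}` in the last clause of `chunk_fun` instead would drop `n`, because the current element would not end up in any chunk.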

Concerning the “reverse”, I’ll need to check my code too. I changed this many times while trying to debug my code… but at the moment the “first level ids” of my nested maps do actually land at the bottom… Of course, for getting at the data it doesn’t matter, but as you say, maybe this is completely unnecessary :smiley:

Thanks again and best regards!

I think you may have oversimplified the example, because it can be spelled Enum.chunk_by/2:

iex(1)>     list_of_maps = [
...(1)>       %{a: 5, b: 9},
...(1)>       %{a: 5, b: 9},
...(1)>       %{a: 7, b: 15},
...(1)>       %{a: 360, b: 15},
...(1)>       %{a: 360, b: 15}
...(1)>     ]
[
  %{a: 5, b: 9},
  %{a: 5, b: 9},
  %{a: 7, b: 15},
  %{a: 360, b: 15},
  %{a: 360, b: 15}
]

iex(2)> Enum.chunk_by(list_of_maps, & &1.a)
[
  [%{a: 5, b: 9}, %{a: 5, b: 9}],
  [%{a: 7, b: 15}], 
  [%{a: 360, b: 15}, %{a: 360, b: 15}]
]

The implementation from Stream.Reducers looks familiar:


I think you may have oversimplified the example, because it can be spelled Enum.chunk_by/2:

I sure did :smiley: - Thank you for thinking through!

As soon as I have modified my existing code I’ll try to make a more complex example - at the moment it is not in a state that I could present it to a critical audience :wink:

In the function I am building, at the moment I actually just switch out Stream.chunk_by and Enum.chunk_by depending on the “mode”. I am really glad that this works :smiley:
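Roughly, the switch looks like the sketch below. The `chunk` helper and the `:eager` / `:lazy` mode names are made up for this post and are not my real code; the data is the list from the first post.

```elixir
list_of_maps = [
  %{a: 5, b: 9},
  %{a: 5, b: 9},
  %{a: 7, b: 15},
  %{a: 360, b: 15},
  %{a: 360, b: 15}
]

# Hypothetical helper: picks the eager or the lazy variant depending on the mode.
chunk = fn
  rows, :eager -> Enum.chunk_by(rows, & &1.a)
  rows, :lazy -> Stream.chunk_by(rows, & &1.a)
end

eager = chunk.(list_of_maps, :eager)

# The lazy variant only produces chunks once the stream is consumed.
lazy = list_of_maps |> chunk.(:lazy) |> Enum.to_list()

# eager == lazy
```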

Edit: I don’t really get these Stream.Reducers. Is that just what gets called when I use Stream.chunk_by, or is this something separate again? It does indeed look familiar!
Edit2: Ah, I found it, so Stream.chunk_by calls this implementation from the Reducers “collection”. I am not 100% sure what that implies, but it means that the logic underneath is quite similar to my convolution. The recurring pattern also reassures me that this is an appropriate way to do it.
Edit3: :rofl: and Enum.chunk_by calls this Reducer thing too, I’m learning… :wink:

I’ll try to post again here; it may take a few days…

The source for Enum and Stream is a really good read if you’re getting used to Elixir idioms, though some parts (:eyes: Stream.zip for instance) can be very challenging to follow :slight_smile:
