Group within a list

Starting with this

list = [
  ["a", "01.01.-02.01."],
  ["b", "04.03."],
  ["b", "06.03."],
  ["b", "08.03."],
  ["c", "09.09. - 01.10."]
]

I would like to convert it to

[
  ["a", "01.01.-02.01."],
  ["b", "04.03., 06.03. and 08.03."],
  ["c", "09.09. - 01.10."]
]

I want to group everything with the same first element (in this case “b”) of the list and comma separate the second part (and put an additional “and” between before the last part).

The list is in alphabetical order. So I know that all “b” sublists follow each other.

I have the feeling that this can be solved very elegantly with Elixir but the solutions I come up are all clumsy. What is the best way to tackle this?

Have a look here Enum — Elixir v1.13.4

Especially map, group_by, slice and join.

1 Like
def group([ [prefix, left], [prefix, right] | other_lists ]) do
  group([ [prefix,  [left, right]] | other_lists ])
end
def group([ [prefix | list] | other_lists ]) do
  [join(List.flatten maybelist) | group(other_lists)]
end
def group([]), do: []

def join(list, acc \\ "")
def join([single_element], "") do
  to_string single_element
end
def join([prelast, last], acc) do
   "#{acc}, #{prelast} and #{last}"
end
def join([head | tail], acc) do
  join(tail, "#{acc}, #{head}")
end
1 Like

I’d go for stdlib (EDIT: I left - on purpose :grin: - a little error in the code as an exercise for the reader)

  test "test" do
    [
      ["a", "01.01.-02.01."],
      ["b", "04.03."],
      ["b", "06.03."],
      ["b", "08.03."],
      ["c", "09.09. - 01.10."]
    ]
    |> Enum.group_by(fn [h | _] -> h end, fn [_ | t] -> t end)
    |> Enum.map(fn {k, v} -> [k, join_last_special(v, ",", " and ")] end)
    |> dbg
  end

  defp join_last_special([single], _, _) do
    single
  end

  defp join_last_special(list, joiner, last_joiner) do
    head_slice = Enum.slice(list, 0..-2)
    last = List.last(list)
    head_joined = Enum.join(head_slice, joiner)
    Enum.join([head_joined, last], last_joiner)
  end

so we have no recursion (visible) and we can follow the code easily.

[
  ["a", "01.01.-02.01."],
  ["b", "04.03."],
  ["b", "06.03."],
  ["b", "08.03."],
  ["c", "09.09. - 01.10."]
] #=> [
  ["a", "01.01.-02.01."],
  ["b", "04.03."],
  ["b", "06.03."],
  ["b", "08.03."],
  ["c", "09.09. - 01.10."]
]
|> Enum.group_by(fn [h | _] -> h end, fn [_ | t] -> t end) #=> %{
  "a" => [["01.01.-02.01."]],
  "b" => [["04.03."], ["06.03."], ["08.03."]],
  "c" => [["09.09. - 01.10."]]
}
|> Enum.map(fn {k, v} -> [k, join_last_special(v, ",", " and ")] end) #=> [
  ["a", ["01.01.-02.01."]],
  ["b", "04.03.,06.03. and 08.03."],
  ["c", ["09.09. - 01.10."]]
]

I always try solutions in this order:

stdlib > handmade*
comprehension** (if applicable) > map > reduce > recursion

*) not true if you are learning. Implementing Enum functions by hand is a good (but hard) exercise.
**) with the arrival of dbg map became more sexy.

2 Likes

I feel you might get better understanding and feel more rewarded if you posted your own try at this and we helped you correct it.

It’s not shameful to make mistakes while learning, we’re all doing it every day. :+1: (You should see some of the completely nonsensical async Rust code that I’ve written in the last several months, lol.)

3 Likes

That’s a nice solution
However, you’re having several suboptimal problems here

  1. Here the list gets copied, while it doesn’t have to
head_slice = Enum.slice(list, 0..-2)
last = List.last(list)
head_joined = Enum.join(head_slice, joiner)

It is better to use List.pop_at

  1. Enum.group_by creates a map from list which is bad, because it loses the order and creates an extra copy of the structure

  2. Enum.join([head_joined, last], last_joiner) can just be string concatenation using <>

2 Likes

That’s fun. Here’s a conceptual one-liner that I’ve expanded for readability

iex(24)> list = [
...(24)>   ["a", "01.01.-02.01."],
...(24)>   ["b", "04.03."],
...(24)>   ["b", "06.03."],
...(24)>   ["b", "08.03."],
...(24)>   ["c", "09.09. - 01.10."]
...(24)> ]

iex(25)> for {k, vs} <- Enum.group_by(list, &hd(&1), &List.last(&1)),
...(26)>     {firsts, final} = Enum.split(vs, -1),
...(26)>     first = Enum.join(firsts, ", "),
...(26)>     v = (if first == "", do: hd(final), else: Enum.join([first, final], ", and ")),
...(26)>  do: [k, v]
[
  ["a", "01.01.-02.01."],
  ["b", "04.03., 06.03., and 08.03."],
  ["c", "09.09. - 01.10."]
]

3 Likes

why do you use Enum.join([lefr, right], joiner) instead of just "#{left}#{joiner}#{right}"

Yeah, that’s better since we know it’s 2 elements. I was just having a fun for comprehension diversion on a Sunday afternoon and hadn’t gone that far.

Edit: edited version

for {k, vs} <- Enum.group_by(list, &hd(&1), &List.last(&1)),
     {firsts, final} = Enum.split(vs, -1),
     first = Enum.join(firsts, ", "),
     v = (if first == "", do: hd(final), else: "#{first} and #{final}"),
 do: [k, v]
4 Likes

I would write a custom join function:

def custom_join(items, acc \\ [])
def custom_join([], _acc), do: ""
def custom_join([z], _acc), do: z
def custom_join([y, z], acc), do: [z, " and ", y | acc] |> Enum.reverse() |> to_string()
def custom_join([h | rest], acc), do: custom_join(rest, [", ", h | acc])

And use it like this:

list
|> Enum.group_by(fn [k, _v] -> k end, fn [_k, v] -> v end)
|> Enum.map(fn {k, v} -> [k, custom_join(v)] end)

Result

[
  ["a", "01.01.-02.01."],
  ["b", "04.03., 06.03. and 08.03."],
  ["c", "09.09. - 01.10."]
]

Note that the result isn’t necessarily sorted. If you want the result ordered by key, you need to add Enum.sort_by(fn {k, _v} -> k end) between the group_by and map.

3 Likes

That’s a good point. Way better than slice and last.

iex(1)> l = [0,1,2]
[0, 1, 2]
iex(2)> List.pop_at(l, -1)
{2, [0, 1]}
defp join_last_special(list, joiner, last_joiner) do
  {[last], head_slice} = List.pop_at(list, -1)
  Enum.join(head_slice, joiner) <> last_joiner <> last
end

Always happy about your comprehension solutions. :+1:

@gregvaughn @adamu @wintermeyer @Sebb
Solution with group_by is incorrect, because it loses the order. Maps in Elixir are unordered

2 Likes

You don’t need to accumulate a result and reverse it in custom_join. See this article for explanation
https://ferd.ca/erlang-s-tail-recursion-is-not-a-silver-bullet.html

I agree that one has to be aware of the fact, but I do not see the problem here. It is easier to accept that drawback and sort later if needed. Benefit: readable and easy code. I think this is a natural application for a group_by.

What should order be with this input:

[
  ["b", "04.03."],
  ["a", "01.01.-02.01."],
  ["b", "06.03."],
  ["c", "09.09. - 01.10."]
  ["b", "08.03."],
]

So, no need to group_by. Recursive function groups them in one traversal

IMO that is not important. More important is, that the code is easy to read and maybe applicable in other situations. Compare Greg’s solution to yours. It took me seconds to understand what he is doing and I had to think about your solution. If I add a |> sort to the comprehension its still easy to follow. May not be fast (runtime-wise) but fast code-read-time-wise. (more important until performance is important which is less often the case as one might think)

1 Like

I would like to compare my answer to yours, but unfortunately your answer does not compile.

Here’s a fixed version

def group([ [prefix, left], [prefix, right] | other_lists ]) do
  group([ [prefix,  [left, right]] | other_lists ])
end
def group([ [prefix | list] | other_lists ]) do
  [[prefix, join(List.flatten list)] | group(other_lists)]
end
def group([]), do: []

def join([head]) do
  to_string head
end
def join([head | tail]) do
  join(tail, to_string head)
end
def join([prelast, last], acc) do
   "#{acc}, #{prelast} and #{last}"
end
def join([head | tail], acc) do
  join(tail, "#{acc}, #{head}")
end

However, calls to to_string can be removed

The task outlined in the topic is really two problems dressed as one and the recursive answer of @hst337 actually already shows this. It’s a problem of “how to find consecutive keys and group by them” and it’s the problem of how to join a list of elements with , and and. For the latter I’d usually suggest to use some library, which can do natural language constructs like formatting a list correctly (e.g. ex_cldr), but the english version of that is not too hard to implement manually.

Spliting the responsibilities makes this quite a bit less awkward.

# This will never receive an empty list, so no need to handle it
format = fn
  [el] ->
    "#{el}"

  list ->
    {last, rest} = List.pop_at(list, -1)
    "#{Enum.join(rest, ", ")} and #{last}"
end

list
|> Enum.chunk_by(fn [key, _] -> key end)
|> Enum.map(fn [[key, _] | _] = list ->
  values = Enum.map(list, fn [_, value] -> value end)
  [key, format.(values)]
end)
4 Likes

Indeed, if the input is known to be ordered (as the OP specified), avoiding the group by + sort is much more efficient.

Obligatory benchmarks.

Name                   ips        average  deviation         median         99th %
hissssst            4.58 M      218.43 ns  ±4236.57%         208 ns         334 ns
adamu               1.39 M      721.54 ns  ±3155.34%         583 ns         792 ns
gregvaughn          1.16 M      864.17 ns  ±2572.42%         709 ns        1667 ns
LostKobrakai        0.99 M     1013.13 ns  ±3407.05%         459 ns         666 ns

Comparison:
hissssst            4.58 M
adamu               1.39 M - 3.30x slower +503.11 ns
gregvaughn          1.16 M - 3.96x slower +645.75 ns
LostKobrakai        0.99 M - 4.64x slower +794.70 ns

Operating System: macOS
CPU Information: Apple M1 Pro
Number of Available Cores: 10
Available memory: 16 GB
Elixir 1.14.0
Erlang 25.1
2 Likes