Enum.reduce and Map.update functionality

tio407 · October 4, 2019, 1:56am

I’m failing to understand what Elixir is doing in this function. The objective is to create a map with each word as a key and how many times that word appears in the sentence as its value.

For example:
sentence = "one fish two fish red fish blue fish"

First the sentence is split into a list and then I get lost around the functionality. How does Elixir pass in arguments into increment/2?

  @spec count(String.t()) :: map
  def count(sentence) do
    sentence
    |> String.split(" ")
    |> Enum.reduce(%{}, &increment/2)
  end

  defp increment(word, map) do
    Map.update(map, word, 1, &(&1 + 1))
  end
end

Also, regarding the Map.update function, I see nowhere in the docs where it says it loops and accumulates; so confused as to how that is happening.

peerreynders · October 4, 2019, 2:38am

That is happening in the recursion that is inherent to Enum.reduce/3 .

Enum.reduce(list, %{}, &increment/2)

starts with an empty map as an accumulator - increment/2 then adds or updates the count for every word it encounters in the list.

iex(1)> plus_one = fn x -> x + 1 end
#Function<7.126501267/1 in :erl_eval.expr/5>
iex(2)> increment = fn word, map -> Map.update(map, word, 1, plus_one) end
#Function<13.126501267/2 in :erl_eval.expr/5>
iex(3)> sentence = "one fish two fish red fish blue fish"
"one fish two fish red fish blue fish"
iex(4)> list = String.split(sentence, " ")
["one", "fish", "two", "fish", "red", "fish", "blue", "fish"]
iex(5)> result = Enum.reduce(list, %{}, increment)
%{"blue" => 1, "fish" => 4, "one" => 1, "red" => 1, "two" => 1}
iex(6)>

mindok · October 4, 2019, 2:42am

I think if you can get your head around what Enum.reduce does, the example will become clear. There is a reasonable explanation here: https://robdor.com/2015/01/22/elixir-enum-reduce/

The three inputs into Enum.reduce are:

A list / enumerable to “loop” over (in this example it is the output of String.split)
An accumulator to capture output (in this example, an empty map is the starting point - %{})
A function to apply to each item in the list. (in this example, increment). This function takes two inputs - a value and the accumulator and returns an updated accumulator. Map.update isn’t managing the loop - Enum.reduce is managing the loop. Map.update is simply updating the word count for each word.

Enum.reduce walks through the items in the list and pushes each item into the function provided along with the current value of the accumulator. The function returns an updated accumulator each time.

Essentially you are doing:

acc1 = increment("one", %{})  # acc1 will be %{"one" => 1}
acc2 = increment("fish", acc1) # acc2 will be %{"one" => 1, "fish" => 1}
acc3 = increment("two", acc2) # acc3 will be %{"one" => 1, "fish" => 1, "two" => 1}
acc4 = increment("fish", acc3 ) # acc4 will be %{"one" => 1, "fish" => 2, "two" => 1}
...

Edit: It is worthwhile to put the effort in to understanding this - it unlocks a few key Elixir concepts.

tio407 · October 4, 2019, 3:06am

This helps so much. I think I understand it except one piece.

What is the &1 argument refer to here:
Map.update(map, word, 1, &(&1 + 1))

mindok · October 4, 2019, 3:31am

Map.update/4 is documented here:
https://hexdocs.pm/elixir/Map.html#update/4

The final argument is a function that updates the map value if it already exists. We want a function that just adds 1 to the old value each time we come across the same word, so we could write it as:

Map.update(map, word, 1, fn oldval -> oldval + 1 end)

But Elixir has shortcut syntax - anonymous functions can be defined with value placeholders rather than named parameters. See https://hexdocs.pm/elixir/Kernel.SpecialForms.html#&/1-anonymous-functions

So &(&1 + 1) is equivalent to the fn oldval -> oldval + 1 end above

noomz · August 13, 2021, 2:34pm

I just doubt about the performance. So, with this map updating, it will allocate a new variable every iteration? When will GC clear it? What happens if we have a million words to process, will it consume all our memory?

sabiwara · August 13, 2021, 11:34pm

The reason why Elixir (and other FP languages) can be efficient despite immutable data structures is because they use different data structures from imperative languages, called persistent data structures. Maps are not hash tables, but some kind of trie called HAMT (Hashed Array Map Tree). The big difference is that you don’t need to copy the full structure to create a new one, you can reuse most of the old structure.

You can see this property, called structural sharing, in action here:

map1 = Map.new(1..10_000, fn i -> {to_string(i), i} end)
map2 = Map.put(map1, "42", nil)

67770 = :erts_debug.size(map1)                  
67773 = :erts_debug.size(map2)
67835 = :erts_debug.size({map1, map2})

The memory used by map1 and map2 together is basically the same as the memory used by map1 alone, because map1 was reused to make map2. Of course, there is still some garbage, but the GC will have considerably less work

PS: Since you mention performance, implementing this word count would be more efficient using the dedicated Enum.frequencies/1 function which is optimized for this.