Wrapping a pipeline in Enum.map vs. a pipeline of Enum.map

wfgilman · August 23, 2016, 3:11am

I am working with data from an external source that I transform into my internal schema and then store in my database. The external data is a list of maps that I turn into structs. I’m transforming the data like this:

```elixir
list = [%ExternalData{a: 1, b: 2, c: 3, d: 4}]

list
|> Enum.map(&Map.drop(&1, [:a, :b, :c]))
|> Enum.map(&Map.put_new(&1, :e, lookup_e_from_d(Map.get(&1, :d))))
```
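To make the starting point concrete, here is a self-contained sketch of this two-pass pipeline. It assumes plain maps instead of the `%ExternalData{}` struct and a hypothetical stub `lookup_e_from_d/1` that just multiplies by 10; the real lookup queries the database.

```elixir
defmodule TwoPass do
  # Hypothetical stand-in for the real database lookup.
  def lookup_e_from_d(d), do: d * 10

  # Two separate passes: the first Enum.map builds a whole intermediate list
  # of trimmed maps, and the second walks that list again to add :e.
  def transform_all(list) do
    list
    |> Enum.map(&Map.drop(&1, [:a, :b, :c]))
    |> Enum.map(&Map.put_new(&1, :e, lookup_e_from_d(Map.get(&1, :d))))
  end
end

TwoPass.transform_all([%{a: 1, b: 2, c: 3, d: 4}])
# => [%{d: 4, e: 40}]
```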

Naturally this is inefficient, because it loops over the entire list once per step, which seems unnecessary. It seemed better to put the transformations on a single map into a separate function and call that function through a capture within `Enum.map`:

```elixir
list
|> Enum.map(&transform(&1))

def transform(map) do
  map
  |> Map.drop([:a, :b, :c])
  |> Map.put_new(:e, lookup_e_from_d(&Map.get(&1, :d)))
end
```

However, this second version does not work. When I remove the `Enum.map` wrapper from each transformation, I start getting `BadMapError` exceptions, and any custom function I call with a function capture in the pipeline doesn’t execute correctly. For example, the argument I pass to the lookup function shows up in the query as the function itself, not as its return value. This is one such error:

```
from a in MySchema.E
  where a.d = ^#Function<6.50752066/1 in :erl_eval.expr/5>
  select a
```

This `where` clause received the value returned by `lookup_e_from_d/1` when the call was inside the `Enum.map` wrapper, but now it doesn’t.
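A minimal sketch of what the lookup actually receives in the second version. The stub `lookup_e_from_d/1` here is hypothetical: instead of querying, it just returns its argument tagged, so the argument can be examined. `&Map.get(&1, :d)` builds an anonymous function and is never applied to `map`, which matches the `#Function<...>` seen in the query.

```elixir
defmodule CaptureDemo do
  # Hypothetical stand-in for the real lookup: it reports what it was given.
  def lookup_e_from_d(arg), do: {:looked_up, arg}

  # Same shape as transform/1 from the question.
  def transform(map) do
    map
    |> Map.drop([:a, :b, :c])
    # `&Map.get(&1, :d)` is a one-argument fun, not the value under :d,
    # so the fun itself is what lookup_e_from_d/1 receives.
    |> Map.put_new(:e, lookup_e_from_d(&Map.get(&1, :d)))
  end
end

%{e: {:looked_up, arg}} = CaptureDemo.transform(%{a: 1, b: 2, c: 3, d: 4})
is_function(arg, 1)
# => true
```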

Does anyone know what’s happening here and what I might be missing?

Thanks!

NobbZ · August 23, 2016, 5:38am

[quote="wfgilman, post:1, topic:1433"]

```elixir
list = [%ExternalData{a: 1, b: 2, c: 3, d: 4}]

list
|> Enum.map(&Map.drop(&1, [:a, :b, :c]))
|> Enum.map(&Map.put_new(&1, :e, lookup_e_from_d(Map.get(&1, :d))))
```
[/quote]

Instead of this, you could try using `Stream`s, but benchmark first: they have some overhead.

```exs
list
|> Stream.map(&Map.drop(&1, [:a, :b, :c]))
|> Stream.map(&Map.put_new(&1, :e, lookup_e_from_d(Map.get(&1, :d))))
|> Enum.into([])
```

You might even pass the stream around instead of forcing it right away, which defers the work until it is really needed. Depending on how your code is structured, that would require some adjustments elsewhere. Also, if your functions have side effects, you should force the stream early so the side effects happen when you expect them to.
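A small sketch of the difference in evaluation order. The steps are toy arithmetic rather than the thread's `Map` calls, and each step sends a message to `self()` purely to record the order in which it ran. With `Enum.map`, every element finishes step 1 (and an intermediate list is built) before any element reaches step 2; with `Stream.map`, nothing runs until the stream is forced, and then each element passes through both steps before the next one starts. Whether that is actually faster depends on the data, so the benchmarking advice still applies.

```elixir
defmodule EvalOrder do
  # Runs `fun`, then drains the {:step, n, x} messages it sent to this
  # process, returning {result, messages_in_send_order}.
  def trace(fun) do
    result = fun.()
    {result, collect([])}
  end

  defp collect(acc) do
    receive do
      {:step, _, _} = msg -> collect([msg | acc])
    after
      0 -> Enum.reverse(acc)
    end
  end

  # Eager: two full passes over the list.
  def eager(list) do
    list
    |> Enum.map(fn x -> send(self(), {:step, 1, x}); x + 1 end)
    |> Enum.map(fn x -> send(self(), {:step, 2, x}); x * 2 end)
  end

  # Lazy: the steps are composed and only run when Enum.into/2 forces them.
  def lazy(list) do
    list
    |> Stream.map(fn x -> send(self(), {:step, 1, x}); x + 1 end)
    |> Stream.map(fn x -> send(self(), {:step, 2, x}); x * 2 end)
    |> Enum.into([])
  end
end

EvalOrder.trace(fn -> EvalOrder.eager([1, 2]) end)
# => {[4, 6], [{:step, 1, 1}, {:step, 1, 2}, {:step, 2, 2}, {:step, 2, 3}]}

EvalOrder.trace(fn -> EvalOrder.lazy([1, 2]) end)
# => {[4, 6], [{:step, 1, 1}, {:step, 2, 2}, {:step, 1, 2}, {:step, 2, 3}]}
```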

riverrun · August 23, 2016, 7:01am

Is it `Map.drop` or `Map.put_new` that is failing?
It would also help to see the input to the failing function, which you can print by adding `|> IO.inspect` right before it.
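`IO.inspect/2` prints its first argument and returns it unchanged, so it can be slotted between any two pipeline steps without changing the result. A sketch inside a single-map `transform/1`, assuming a hypothetical stub `lookup_e_from_d/1`; reading `:d` from the original `map` argument is an assumption of this sketch, used so the lookup receives a value.

```elixir
defmodule InspectDemo do
  # Hypothetical stand-in for the real database lookup.
  def lookup_e_from_d(d), do: d * 10

  def transform(map) do
    map
    |> Map.drop([:a, :b, :c])
    # Prints `after Map.drop: %{d: 4}` for the usage below and passes the
    # same map on to Map.put_new/3.
    |> IO.inspect(label: "after Map.drop")
    |> Map.put_new(:e, lookup_e_from_d(Map.get(map, :d)))
  end
end

InspectDemo.transform(%{a: 1, b: 2, c: 3, d: 4})
# => %{d: 4, e: 40}
```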

michalmuskala · August 23, 2016, 7:34am

The problem is that the two calls are not equivalent. In the first one you pass a value to `lookup_e_from_d/1`, while in the second one you pass a function.

```elixir
|> Enum.map(&Map.put_new(&1, :e, lookup_e_from_d(Map.get(&1, :d))))
```

Is not the same as:

```elixir
|> Map.put_new(:e, lookup_e_from_d(&Map.get(&1, :d)))
```

Here you’re passing a fun (the capture `&Map.get(&1, :d)`) down to the lookup function instead of the value stored under `:d`. I believe the most idiomatic way to write it would be:

```elixir
def transform(%{d: d} = map) do
  map
  |> Map.drop([:a, :b, :c])
  |> Map.put_new(:e, lookup_e_from_d(d))
end
```
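This version can be dropped into a runnable module roughly as follows. As before, `lookup_e_from_d/1` is a hypothetical stub and the input uses plain maps instead of `%ExternalData{}` structs. The list is traversed once, with each map going through the whole transform.

```elixir
defmodule Transformer do
  # Hypothetical stand-in for the real database lookup.
  def lookup_e_from_d(d), do: d * 10

  # Single pass; &transform/1 is equivalent to &transform(&1).
  def transform_all(list), do: Enum.map(list, &transform/1)

  # Matching %{d: d} in the head binds the value under :d before the
  # pipeline runs, so the lookup receives that value, never a function.
  def transform(%{d: d} = map) do
    map
    |> Map.drop([:a, :b, :c])
    |> Map.put_new(:e, lookup_e_from_d(d))
  end
end

Transformer.transform_all([%{a: 1, b: 2, c: 3, d: 4}, %{a: 5, d: 6}])
# => [%{d: 4, e: 40}, %{d: 6, e: 60}]
```

One consequence of matching in the head: a map without a `:d` key now raises `FunctionClauseError` instead of silently passing `nil` to the lookup, which is usually the earlier and clearer failure.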

wfgilman · August 24, 2016, 5:41pm

Thanks for the feedback, everyone! The solution that worked best for my problem was to redesign my transformation functions to derive the variables used in lookups through pattern matching.