Appending to a list inside of Enum.reduce pipeline

Hi, I am parsing data from a csv. Everything is going pretty well except for the fact that my csv has a column for each state associated with a sales territory. It is set up this way so a group of states can all be in the same row as the territory id. Here is my function:

  def parse_data(csv_object) do
    csv_object
    |> Enum.map( fn x -> x["territory_id"] end)
    |> Enum.uniq()
    |> Enum.map(fn territory_id -> 
      csv_object
      |> Enum.filter(fn x -> x["territory_id"] == territory_id end)
      |> Enum.reduce(%{territory_id: territory_id ,ownerids: [] ,name: nil, sub_vertical: nil, states: [], number_rows_to_get: nil}, fn row, acc -> 
          acc
          |> Map.put(:ownerids, acc.ownerids++ [row["ownerid"]])
          |> Map.put(:territory_name, row["territory_name"])
          |> Map.put(:sub_vertical_name, row["sub_vertical_id"])
          |> Map.put(:number_rows_to_get, row["number_rows_to_get"])
          |> Map.put(:states, acc.states++ [row["state_1"]])
          |> Map.put(:states, acc.states++ [row["state_2"]])
          |> Map.put(:states, acc.states++ [row["state_3"]])
          |> Map.put(:states, acc.states++ [row["state_4"]])
          |> Map.put(:states, acc.states++ [row["state_5"]])
          |> Map.put(:states, acc.states++ [row["state_6"]])
          |> Map.put(:states, acc.states++ [row["state_7"]])
          |> Map.put(:states, acc.states++ [row["state_8"]])
          |> Map.put(:states, acc.states++ [row["state_9"]])
          |> Map.put(:states, acc.states++ [row["state_10"]])
          |> Map.put(:states, acc.states++ [row["state_11"]])
          |> Map.put(:states, acc.states++ [row["state_12"]])
          |> Map.put(:states, acc.states++ [row["state_13"]])
          |> Map.put(:states, acc.states++ [row["state_14"]])
          |> Map.put(:states, acc.states++ [row["state_15"]])
          |> Map.put(:states, acc.states++ [row["state_16"]])
          |> Map.put(:states, acc.states++ [row["state_17"]])
          |> Map.put(:states, acc.states++ [row["state_18"]])
          |> Map.put(:states, acc.states++ [row["state_19"]])
          |> Map.put(:states, acc.states++ [row["state_20"]])
          |> Map.put(:states, acc.states++ [row["state_21"]])
          |> Map.put(:states, acc.states++ [row["state_22"]])
          |> Map.put(:states, acc.states++ [row["state_23"]])
          |> Map.put(:states, acc.states++ [row["state_24"]])
          |> Map.put(:states, acc.states++ [row["state_25"]])
          |> Map.put(:states, acc.states++ [row["state_26"]])
          |> Map.put(:states, acc.states++ [row["state_27"]])
          |> Map.put(:states, acc.states++ [row["state_28"]])
          |> Map.put(:states, acc.states++ [row["state_29"]])
      end)
    end)
  end 

Obviously the problem is that each call to Map.put overwrites the previous one. Since I need to append values to a specific key in a map at each step of the pipeline, I am not sure how to achieve this. I thought I could use Enum.concat, but that seems to only work when appending one enumerable to another and has no option for specifying a key to keep appending to. Any help appreciated.

example of csv before being input:

example of desired output:

[
  %{
    territory_id: "24",
    territory_name: "west_exteriors",
    rep_name: "Coleman Baker",
    ownerids: ["0051P000003BJ5t"],
    states: ["Alaska", "British Columbia", "Alberta"],
    sub_vertical_name: "Exteriors"
  },
  ...
]

So basically I want to be able to append the state in column `state_1` to a list inside the map then append the state in column `state_2` to the same list etc. Hope that gives more context. Thanks!
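
To make the overwriting concrete, here is a minimal sketch with a made-up two-column row (the `row` and `acc` values are illustrative, not taken from the real CSV). Each `Map.put` step reads `acc.states` from the original accumulator rather than from the map flowing through the pipe, so only the last step survives. `Map.update!/3` is one way to build on the current value instead:

```elixir
row = %{"state_1" => "Alaska", "state_2" => "Alberta"}
acc = %{states: []}

# Every step reads `acc.states` from the original accumulator (still []),
# not from the map flowing through the pipe, so the last put wins.
overwritten =
  acc
  |> Map.put(:states, acc.states ++ [row["state_1"]])
  |> Map.put(:states, acc.states ++ [row["state_2"]])

# Map.update!/3 passes the current value of the key to the function,
# so each step builds on the result of the previous one.
accumulated =
  acc
  |> Map.update!(:states, &(&1 ++ [row["state_1"]]))
  |> Map.update!(:states, &(&1 ++ [row["state_2"]]))

IO.inspect(overwritten)  # %{states: ["Alberta"]}
IO.inspect(accumulated)  # %{states: ["Alaska", "Alberta"]}
```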

Hi @Justinbenfit23. If you could provide an example of simplified input and the corresponding desired output, it would help us understand the problem better.

Hi @RudManusachi! Thank you for responding! I have added what you requested. Does that make it easier to understand what I’m stuck on?

Using a loop:

states = for i <- 1..29, do: row["state_#{i}"]
acc = %{acc | states: states ++ acc.states}

One by one:

|> Map.update!(:states, &[row["state_1"] | &1])

This prepends rather than appends. Appending or concatenating at the end of a list requires copying the whole list, so doing it in a loop is very inefficient. It is better to prepend in the loop and reverse at the end if you need to.
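
A small sketch of the prepend-then-reverse idea, using a made-up row with three state columns (the sample values are assumptions for illustration):

```elixir
row = %{"state_1" => "Alaska", "state_2" => "British Columbia", "state_3" => "Alberta"}

# Prepending is O(1) per element; the list comes out in reverse column order.
reversed =
  Enum.reduce(1..3, [], fn i, states ->
    [row["state_#{i}"] | states]
  end)

# One O(n) pass at the end restores the original column order.
states = Enum.reverse(reversed)

IO.inspect(reversed)  # ["Alberta", "British Columbia", "Alaska"]
IO.inspect(states)    # ["Alaska", "British Columbia", "Alberta"]
```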

+1 to what @dom said: prepending with [state | acc.states] is preferable if order doesn’t really matter.

@Justinbenfit23 can’t we just build up the acc like:

...
|> Enum.reduce(%{territory_id: territory_id ,ownerids: [] ,name: nil, sub_vertical: nil, states: [], number_rows_to_get: nil}, fn row, acc -> 
  # list of states from the row
  row_states = [
    row["state_1"],
    row["state_2"],
    row["state_3"],
    ...
  ]
  
  %{
    states: acc.states ++ row_states,
    ownerids: acc.ownerids ++ [row["ownerid"]],
    territory_name: row["territory_name"],
    ...
  }
end)

I would also recommend looking at Enum.group_by/2.

csv_object
|> Enum.group_by(fn row -> row["territory_id"] end)
|> Enum.map(fn {territory_id, territory_rows} ->
  initial_state = %{territory_id: territory_id, ownerids: [] ,name: nil, sub_vertical: nil, states: [], number_rows_to_get: nil}

  Enum.reduce(territory_rows, initial_state, fn row, acc ->
    states = [
      row["state_1"],
      row["state_2"],
      row["state_3"],
      ...
    ]
  
    %{
      territory_id: territory_id,
      states: acc.states ++ states,
      ownerids: acc.ownerids ++ [row["ownerid"]],
      territory_name: row["territory_name"],
      rep_name: row["rep_name"],
      sub_vertical_name: row["sub_vertical_id"],
      ...
    }
  end)
end)

It’s a good case for List.foldr :)
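
One possible reading of this suggestion, sketched with two made-up rows: `List.foldr/3` walks the rows from last to first, so placing each row's states in front of the accumulator keeps the original order without a final reverse. The `row_states` helper is hypothetical and only covers two columns here.

```elixir
rows = [
  %{"state_1" => "Alaska", "state_2" => "Alberta"},
  %{"state_1" => "Oregon", "state_2" => "Washington"}
]

# Hypothetical helper: collects the state columns of one row, in order.
row_states = fn row -> for i <- 1..2, do: row["state_#{i}"] end

# The last row is processed first. Only the short per-row list on the
# left of ++ is copied at each step, never the growing accumulator.
states = List.foldr(rows, [], fn row, acc -> row_states.(row) ++ acc end)

IO.inspect(states)  # ["Alaska", "Alberta", "Oregon", "Washington"]
```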

@dom Thank you for this! I am starting to see the utility of the & operator with these sorts of anonymous functions. Very helpful for understanding the way elixir would incorporate for loops as well. Thanks!

@RudManusachi Thank you so much for this! I have read about prepending but didn’t understand it could be used with reverse like this. This is also very helpful for me to start wrapping my head better around how you would have variable declaration inside of pipelines. I have had this view of pipelines being more rigid than your response has shown me. So big thank you to you. As far as using group_by I looked up the docs on it and I’m still a little fuzzy on what the function inside the groupby is actually doing. Is it just taking all the rest of the columns in the table and grouping them to the corresponding territory_id? If so aren’t they already in a map by this point and therefore grouped already in a sense? Sorry for the newbie questions. Thanks again!

what the function inside the groupby is actually doing. Is it just taking all the rest of the columns in the table and grouping them to the corresponding territory_id?

Kinda, except it takes all of the columns, not “the rest”.

If so aren’t they already in a map by this point and therefore grouped already in a sense?

Not really. If I understand correctly, the table data is “denormalized”. I’m guessing that from your usage of

csv_object
|> Enum.map( fn x -> x["territory_id"] end)
|> Enum.uniq()

followed by Enum.filter inside the map, which suggests that there are multiple rows with the same territory_id but different ownerid.

I recommend experimenting in iex shell with those functions… for example:

data = [
  %{"territory_id" => "24", "owner_id" => "1", "state_1" => "Alabama", "state_2" => "Alaska"},
  %{"territory_id" => "24", "owner_id" => "2", "state_1" => "Alabama", "state_2" => "Alaska"},
  %{"territory_id" => "25", "owner_id" => "5", "state_1" => "California", "state_2" => "Texas"},
  %{"territory_id" => "25", "owner_id" => "6", "state_1" => "California", "state_2" => "Texas"},
]

Enum.group_by(data, fn row -> row["territory_id"] end)

would give us a map with territory_ids as keys and lists of the corresponding rows as values (already filtered by territory_id).

%{
  "24" => [
    %{"territory_id" => "24", "owner_id" => "1", "state_1" => "Alabama", "state_2" => "Alaska"},
    %{"territory_id" => "24", "owner_id" => "2", "state_1" => "Alabama", "state_2" => "Alaska"}
  ],
  "25" => [
    %{"territory_id" => "25", "owner_id" => "5", "state_1" => "California", "state_2" => "Texas"},
    %{"territory_id" => "25", "owner_id" => "6", "state_1" => "California", "state_2" => "Texas"}
  ]
}

That would basically replace the manual call to uniq and the nested filter.
Then we can pipe that map into Enum.map and handle each item as a tuple of {territory_id, list_of_rows}.
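
As a rough sketch of that last step, assuming a trimmed version of the sample `data` above (the `summaries` name and the final sort are only for illustration):

```elixir
data = [
  %{"territory_id" => "24", "owner_id" => "1", "state_1" => "Alabama", "state_2" => "Alaska"},
  %{"territory_id" => "24", "owner_id" => "2", "state_1" => "Alabama", "state_2" => "Alaska"},
  %{"territory_id" => "25", "owner_id" => "5", "state_1" => "California", "state_2" => "Texas"}
]

summaries =
  data
  |> Enum.group_by(fn row -> row["territory_id"] end)
  |> Enum.map(fn {territory_id, rows} ->
    # `rows` is a plain list of the original row maps for this territory.
    %{
      territory_id: territory_id,
      ownerids: Enum.map(rows, fn row -> row["owner_id"] end)
    }
  end)
  # Map iteration order is not guaranteed, so sort for a stable result.
  |> Enum.sort_by(fn summary -> summary.territory_id end)

IO.inspect(summaries)
```

Each element of `summaries` is a map with the territory id and the owner ids of its rows, so there is no extra level of nesting to dig through after the `Enum.map`.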

Oh I see what you’re saying. So basically use group_by instead of uniq and filter. That makes a lot of sense! Thanks!

What solution did you arrive at?

Hi @dimitarvp! Sorry for not updating this sooner. I took a little from everyone actually. Here is the updated Enum.reduce function:

def parse_data(csv_object) do
    csv_object
    |> Enum.map( fn x -> x["territory_id"] end)
    |> Enum.uniq()
    |> Enum.map(fn territory_id -> 
      csv_object
      |> Enum.filter(fn x -> x["territory_id"] == territory_id end)
      |> Enum.reduce(%{territory_id: territory_id,ownerids: [] ,name: nil, sub_vertical: [], states: []}, fn row, acc ->
        
          states = [
            row["state_1"],
            row["state_2"],
            row["state_3"],
            row["state_4"],
            row["state_5"],
            row["state_6"],
            row["state_7"],
            row["state_8"],
            row["state_9"],
            row["state_10"],
            row["state_11"],
            row["state_12"],
            row["state_13"],
            row["state_14"],
            row["state_15"],
            row["state_16"],
            row["state_17"],
            row["state_18"],
            row["state_19"],
            row["state_20"],
            row["state_21"],
            row["state_22"],
            row["state_23"],
            row["state_24"],
            row["state_25"],
            row["state_26"],
            row["state_27"],
            row["state_28"],
            row["state_29"],
          ]
          |> Enum.filter(fn x -> x !== "" end)

          acc
          |> Map.put(:ownerids, acc.ownerids++ [row["ownerid"]])
          |> Map.put(:territory_name, row["territory_name"])
          |> Map.put(:sub_vertical, row["sub_vertical_name"])
          |> Map.put(:number_rows_to_get, row["number_rows_to_get"])
          |> Map.put(:states, Enum.uniq(acc.states++ states))
      end) 
    end)
  end 

But why didn’t you use Enum.group_by as suggested earlier? It still seems like you’re reinventing it.

@dimitarvp I was afraid of straying too far from what I already knew worked. It seemed like using group by would further nest the data and I wasn’t confident enough in my ability to access nested data to try it. I also needed to move on from the project so time constraints played a role as well.

Fair enough. Still, always keep in mind that shorter and more standardized code is usually easier to maintain.
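
For illustration, a shorter variant along the lines suggested in this thread might look like the sketch below. It is not the code anyone here actually ran. The module name `TerritorySketch` is hypothetical, and it assumes that territory_name and sub_vertical_name are the same on every row of a territory, so they are read from the first row.

```elixir
defmodule TerritorySketch do
  # Assumption: state columns are named "state_1".."state_29",
  # as in the original post. Missing or empty cells are skipped.
  @state_columns Enum.map(1..29, fn i -> "state_#{i}" end)

  def parse_data(rows) do
    rows
    |> Enum.group_by(fn row -> row["territory_id"] end)
    |> Enum.map(fn {territory_id, territory_rows} ->
      # Assumption: these fields are constant within a territory.
      first = hd(territory_rows)

      %{
        territory_id: territory_id,
        territory_name: first["territory_name"],
        sub_vertical_name: first["sub_vertical_name"],
        ownerids:
          territory_rows
          |> Enum.map(fn row -> row["ownerid"] end)
          |> Enum.uniq(),
        # Collect every non-empty state cell once, in first-seen order.
        states:
          for row <- territory_rows,
              column <- @state_columns,
              state = row[column],
              state not in [nil, ""],
              uniq: true,
              do: state
      }
    end)
  end
end

rows = [
  %{"territory_id" => "44", "territory_name" => "west_home security",
    "sub_vertical_name" => "Home Security", "ownerid" => "0051P000003mpFL",
    "state_1" => "British Columbia", "state_2" => "Alberta", "state_3" => ""},
  %{"territory_id" => "44", "territory_name" => "west_home security",
    "sub_vertical_name" => "Home Security", "ownerid" => "0055G000007HfD8",
    "state_1" => "British Columbia", "state_2" => "Alberta", "state_3" => ""}
]

IO.inspect(TerritorySketch.parse_data(rows))
```

The result is a flat list of maps, one per territory. The order of territories follows the grouped map and is not guaranteed, so sort the list if the order matters.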

Can you post a sample input in textual form, with the desired output (you posted a screenshot with the input earlier)? I’m willing to try my hand at this when I get a free 20-minute slot.

@dimitarvp Thanks! Here is a text input:

territory_name	territory_id	rep_name	ownerid	sub_vertical_name	state_1	state_2	state_3	state_4	state_5	state_6
west_home security	44	Matt Meiling	0051P000003mpFL	Home Security	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home security	44	Payton Black	0055G000007HfD8	Home Security	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home security	44	Matt Meiling	0051P000003mpFL	Home Security	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home security	44	Payton Black	0055G000007HfD8	Home Security	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home security	44	Matt Meiling	0051P000003mpFL	Home Security	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home security	44	Payton Black	0055G000007HfD8	Home Security	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home automation	46	Coleman Bakker	0051a0000034hCD	Home Automation	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home automation	46	Riley Marlowe	0055G000007G9yX	Home Automation	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home automation	46	Jacob Pace	0055G000007HGcb	Home Automation	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home automation	46	Kale Smith	0051P000003mZaJ	Home Automation	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home automation	46	Kaleb Polk	0051a00000364Pb	Home Automation	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home automation	46	Olivia Tarlow	0055G000007H4SP	Home Automation	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home automation	46	Aaron Price	0051P000003mm4f	Home Automation	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home automation	46	Matt Meiling	0051P000003mpFL	Home Automation	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home automation	46	Payton Black	0055G000007HfD8	Home Automation	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home automation	46	Matt Meiling	0051P000003mpFL	Home Automation	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home automation	46	Payton Black	0055G000007HfD8	Home Automation	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home automation	46	Matt Meiling	0051P000003mpFL	Home Automation	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_home automation	46	Payton Black	0055G000007HfD8	Home Automation	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_locksmith	48	Coleman Bakker	0051a0000034hCD	Locksmith	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_locksmith	48	Riley Marlowe	0055G000007G9yX	Locksmith	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_locksmith	48	Jacob Pace	0055G000007HGcb	Locksmith	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_locksmith	48	Kale Smith	0051P000003mZaJ	Locksmith	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_locksmith	48	Kaleb Polk	0051a00000364Pb	Locksmith	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_locksmith	48	Olivia Tarlow	0055G000007H4SP	Locksmith	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_locksmith	48	Aaron Price	0051P000003mm4f	Locksmith	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_locksmith	48	Matt Meiling	0051P000003mpFL	Locksmith	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_locksmith	48	Payton Black	0055G000007HfD8	Locksmith	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_locksmith	48	Matt Meiling	0051P000003mpFL	Locksmith	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_locksmith	48	Payton Black	0055G000007HfD8	Locksmith	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_locksmith	48	Matt Meiling	0051P000003mpFL	Locksmith	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
west_locksmith	48	Payton Black	0055G000007HfD8	Locksmith	British Columbia	Alberta	Saskatchewan	Manitoba	Oregon	Washington
east_pest control	52	Tremain Petersen	0051a000002FOa7	Pest Control	Quebec	Wisconsin	Illinois	Kentucky	Mississippi	Tennessee
east_pest control	52	Joshua Jenks	0055G000007G6GY	Pest Control	Quebec	Wisconsin	Illinois	Kentucky	Mississippi	Tennessee
east_pest control	52	Joshua Jenks	0055G000007G6GY	Pest Control	Quebec	Wisconsin	Illinois	Kentucky	Mississippi	Tennessee

Does this work? I appreciate the help!

And the expected output is the one you showed in your original post?

Correct