Reduce list of lists to a map where keys come from the first element (like csv file)

dkuku · December 26, 2020, 12:04pm

I think it’s a fairly common problem.
Let’s say I have a CSV file and I want to create a list of maps:

a,b,c
1,2,3
4,5,6

and I want to get:

[%{"a" => "1", "b" => "2", "c" => "3"}, %{"a" => "4", "b" => "5", "c" => "6"}]

I came up with a solution, but I’m wondering if there is a simpler and more performant way?

[keys | values] =
    "a,b,c\n1,2,3\n4,5,6" 
    |> String.split("\n") 
    |> Enum.map(&String.split(&1, ","))   

values
    |> Enum.map(&Enum.zip(keys, &1))
    |> Enum.map(&Map.new/1)

Nicd · December 26, 2020, 6:17pm

I would use a CSV library for this purpose to avoid issues with escaping and different formats. There are plenty on Hex.pm.
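
To illustrate the escaping concern, here is a minimal sketch in plain Elixir. The sample row is made up; it only shows how a quoted field containing a comma gets cut in two by a naive split.

```elixir
# A made-up CSV row whose first field is quoted and contains a comma.
row = ~s("Smith, John",42)

# Naive splitting ignores the quotes, so we get three pieces instead of two.
naive_fields = String.split(row, ",")

IO.inspect(naive_fields)
```

A CSV parser would treat the quoted part as a single field, which is why a library is the safer choice for real files.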

ericgray · December 26, 2020, 6:20pm

NimbleCSV is good for that.

alias NimbleCSV.RFC4180, as: CSV

File.read!("data/myfile.csv")
|> CSV.parse_string()
|> Enum.map(fn [a, b, c] -> %{a: a, b: b, c: c} end)

dkuku · December 26, 2020, 10:22pm

Thanks. I know I can parse it as CSV, but my real problem is a list of lists where the first one is the header. I’m interested in the design pattern for this problem. I ended up merging the last two Enum.map calls: https://dev.to/dkuku/phoenix-live-dashboard-custom-page-4chj
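
A sketch of what merging the two passes might look like, using the same sample data as the first post (the exact code in the linked article may differ):

```elixir
[keys | values] =
  "a,b,c\n1,2,3\n4,5,6"
  |> String.split("\n")
  |> Enum.map(&String.split(&1, ","))

# One pass over the rows: zip each row with the header, then build the map.
rows = Enum.map(values, fn row -> keys |> Enum.zip(row) |> Map.new() end)

IO.inspect(rows)
```

This walks the list of rows once instead of twice and skips the intermediate list of zipped pairs.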

krp · December 27, 2020, 12:36am

Looks good to me. Although I would likely write the last line as

Enum.into(%{})
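
For comparison, a small sketch assuming the suggestion is applied to each zipped row. Both forms build the same map from a list of key-value pairs, so the choice is mostly a matter of taste:

```elixir
# Key-value pairs for a single row, as produced by Enum.zip/2.
pairs = Enum.zip(["a", "b", "c"], ["1", "2", "3"])

via_into = Enum.into(pairs, %{})
via_new = Map.new(pairs)

IO.inspect(via_into == via_new)
```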

Eiji · December 27, 2020, 5:16am

Regarding charlist you can write code like:

# split by any horizontal whitespace character (regex: \h)
# do it globally i.e. for all horizontal whitespaces occurrences
# return binary i.e. Elixir string (Erlang string is charlist)
'your charlist' |> :string.trim() |> :re.replace("\\h+", ',', [:global, return: :binary])

and use it as a normal csv.
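
A worked instance of the line above, on a made-up charlist with mixed spaces and a tab:

```elixir
# Made-up whitespace-separated input as a charlist (the type :os.cmd/1 returns).
line = String.to_charlist("  a  b\tc ")

csv_line =
  line
  # strip leading and trailing whitespace
  |> :string.trim()
  # collapse every run of horizontal whitespace into a single comma
  |> :re.replace("\\h+", ",", [:global, return: :binary])

IO.inspect(csv_line)
```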

Regarding ps aux command output you can create a custom parser using NimbleCSV:

defmodule Example do
  # [head | tail] pattern for fetching headers and rows
  def sample([headers | rows]) do
    Enum.map(rows, &sample(headers, &1, %{}))
  end

  # again [head | tail] pattern for fetching key-value pairs from 2 lists
  # we are updating accumulator
  defp sample([key | rest_keys], [value | rest_values], acc) do
    sample(rest_keys, rest_values, Map.put(acc, key, value))
  end

  # which is returned when we are at end of headers or row cells
  defp sample([], _, acc), do: acc
  defp sample(_, [], acc), do: acc
  # or just
  # defp sample([], [], acc), do: acc
  # in case we are sure that every row length is equal to headers length
end

# for a different number of spaces between columns
# for me it's 11 columns, so I used 20 as safe value
separator = Enum.map 1..20, &String.duplicate(" ", &1)
# simple custom parser with multiple separators
NimbleCSV.define(MyParser, separator: separator)
# a csv string
csv = 'ps aux' |> :os.cmd() |> List.to_string() |> String.trim()
# parse a csv and transform it to map
result = csv |> MyParser.parse_string(skip_headers: false) |> Example.sample()
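
As a side note on the two terminating clauses in Example.sample/3: the Enum.zip/2 approach from the first post handles ragged rows the same way, since zipping stops at the shorter list. A small sketch with made-up rows:

```elixir
headers = ["a", "b", "c"]

# A row with a missing cell and a row with an extra cell.
short_row = ["1", "2"]
long_row = ["1", "2", "3", "4"]

# Enum.zip/2 stops as soon as either list runs out.
short_map = headers |> Enum.zip(short_row) |> Map.new()
long_map = headers |> Enum.zip(long_row) |> Map.new()

IO.inspect({short_map, long_map})
```

So a header without a value is dropped, and a value without a header is ignored, in both versions.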

Here is how I found my highest separator length:

:os.cmd('ps aux')
|> List.to_string()
# split by everything which is not space character
|> String.split(~r/[^ ]+/)
# get highest length
|> Enum.reduce(1, fn elem, acc ->
  length = String.length(elem)
  if length > acc, do: length, else: acc
end)
# this is faster than
# list |> Enum.max_by(&String.length/1) |> String.length()
# as we do not call String.length/1 twice for separator with highest length
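
Another way to call String.length/1 only once per element, sketched on a made-up string instead of real ps aux output: map to lengths first, then take the maximum.

```elixir
# Made-up input with space runs of length 2, 4 and 1.
sample = "a  b    c d"

longest_run =
  sample
  # split on everything that is not a space, leaving only the runs of spaces
  |> String.split(~r/[^ ]+/)
  |> Enum.map(&String.length/1)
  |> Enum.max()

IO.inspect(longest_run)
```

This builds an intermediate list of lengths, so the Enum.reduce/3 version above still does less work; the trade-off is readability.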

Helpful resources:

:re Erlang module
NimbleCSV.define/2 documentation

dkuku · December 27, 2020, 11:06am

Thanks, I’ll look into nimble_csv. I have some ideas for parsing the output of other commands, and this looks like exactly what I need.