Could someone review this CSV to data structure code?

As an exercise, I decided to convert a CSV, similar in format to the one located here, to a map. Please note that the file is subject to a non-free license, in case you plan to use the data commercially.

The idea is that I convert the three categories (provided as columns in the CSV file) into a map of maps of lists.

%{
  "Group1" => %{
    "Classification1" => ["Domain1", "Domain2", "Domain3"],
    "Classification2" => ["Domain4", "Domain5", "Domain6"],
    "Classification3" => []
  },
  "Group2" => %{"Classification4" => ["Domain7"]}
}

You’ll notice that a classification can have a missing domain (in other words, the tuple would be {grouping, classification, ""}).
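To make the target shape concrete, here is a sketch (not from the original post; the module name `TaxonomyShape` and function `build/1` are hypothetical) that builds the same nested map from `{grouping, classification, domain}` tuples with `Enum.group_by/3`, including the case where a blank domain leaves the classification with an empty list:

```elixir
defmodule TaxonomyShape do
  # Hypothetical helper, not from the original post: builds
  # %{grouping => %{classification => [domain, ...]}} from
  # {grouping, classification, domain} tuples.
  def build(rows) do
    rows
    |> Enum.group_by(
      fn {group, _class, _domain} -> group end,
      fn {_group, class, domain} -> {class, domain} end
    )
    |> Map.new(fn {group, pairs} -> {group, classes(pairs)} end)
  end

  defp classes(pairs) do
    pairs
    |> Enum.group_by(
      fn {class, _domain} -> class end,
      fn {_class, domain} -> domain end
    )
    # A blank domain ("") still registers the classification,
    # but contributes nothing to its list.
    |> Map.new(fn {class, domains} -> {class, Enum.reject(domains, &(&1 == ""))} end)
  end
end

rows = [
  {"Group1", "Classification1", "Domain1"},
  {"Group1", "Classification1", "Domain2"},
  {"Group1", "Classification3", ""},
  {"Group2", "Classification4", "Domain7"}
]

TaxonomyShape.build(rows)
# => %{"Group1" => %{"Classification1" => ["Domain1", "Domain2"], "Classification3" => []},
#      "Group2" => %{"Classification4" => ["Domain7"]}}
```

`Enum.group_by/3` keeps the original row order inside each group, so domains come out in file order.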

I am using NimbleCSV to parse the file before converting it to the map.

defmodule TaxonomyMap do
  @doc """
  Open the target file, parse it, and create a list of tuples of
  {grouping, classification, specialization} for each row.
  """
  def get_taxonomies(file) do
    file
    |> File.stream!(read_ahead: 1000)
    |> NimbleCSV.RFC4180.parse_stream()
    |> Enum.map(fn [_, grouping, classification, specialization, _, _] ->
      {grouping, classification, specialization}
    end)
  end

  # A blank specialization becomes an empty list, so the classification
  # is still recorded without a "" entry.
  defp maybe_blank_list(value) do
    case value do
      "" -> []
      _ -> [value]
    end
  end

  defp classification_map(classification, specialization) do
    %{classification => maybe_blank_list(specialization)}
  end

  defp nil?(acc, key_or_keys) do
    get_in(acc, key_or_keys) |> is_nil()
  end

  def run(taxonomies) do
    taxonomies
    |> Enum.reduce(%{}, fn {g, c, s}, acc ->
      cond do
        nil?(acc, [g]) ->
          # The grouping is not in the map
          # Add the grouping, classification, and specialization for this row
          put_in(acc, [g], classification_map(c, s))

        nil?(acc, [g, c]) ->
          # The classification is not in the grouping
          # Add the classification and specialization to the grouping
          put_in(acc, [g], Map.merge(get_in(acc, [g]), classification_map(c, s)))

        !nil?(acc, [g, c]) ->
          # The classification and grouping both exist
          # Append the specialization to the classification's list
          put_in(acc, [g, c], get_in(acc, [g, c]) ++ maybe_blank_list(s))
      end
    end)
  end
end

The way you’d use this is by running TaxonomyMap.get_taxonomies("taxonomy.csv") |> TaxonomyMap.run().

Is there anything that can or should be improved? I’d love to hear your thoughts on how I can improve clarity or make my code more “elixiric”.


It all looks reasonably good to me. The one place that might be more “elixiry” is the cond switch inside the reduce.

I found that part somewhat hard to reason about without flipping back and forth in the file. You’re basically doing a test and then a transformation. It’s more “elixiry” to simply write the transformations as pattern-matching function heads or case statements and let the computer sort out which one to use.
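As a small illustration of the function-head idea (applied here to `maybe_blank_list/1` from the original post rather than to the reduce; the module name `TaxonomyHelpers` is hypothetical), the `case` can be replaced by two clauses that Elixir tries in order:

```elixir
defmodule TaxonomyHelpers do
  # Pattern matching in the function head replaces the case expression:
  # the clause for "" is tried first, then the catch-all clause.
  def maybe_blank_list(""), do: []
  def maybe_blank_list(value), do: [value]
end

TaxonomyHelpers.maybe_blank_list("")         # => []
TaxonomyHelpers.maybe_blank_list("Domain1")  # => ["Domain1"]
```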

|> Enum.reduce(%{}, fn {g, c, s}, acc -> inject(acc, [g, c], s) end)

def inject(acc, [group, class], spec) do
  case get_in(acc, [group, class]) do
    nil -> inject_class(acc, [group, class], spec)
    found -> put_in(acc, [group, class], found ++ maybe_blank_list(spec))
  end
end

def inject_class(acc, [group, class], spec) do
  case get_in(acc, [group]) do
    nil -> put_in(acc, [group], classification_map(class, spec))
    found -> put_in(acc, [group], Map.merge(found, classification_map(class, spec)))
  end
end

I’m not entirely happy with that, but hope it shows the idea. I feel like there is probably an additional refactoring involving pulling out the case statements from inject that might make the code even more straightforward.
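One possible take on that further refactoring (a sketch, not from the thread; the module name `TaxonomyReducer` is hypothetical) leans on `Map.update/4`, which takes a default value for a missing key and a function for an existing one. That covers both `case` expressions at once:

```elixir
defmodule TaxonomyReducer do
  # Hypothetical module, not from the thread. Map.update/4 inserts the
  # default when the key is missing and applies the function otherwise,
  # so neither level needs its own case expression.
  def run(taxonomies) do
    Enum.reduce(taxonomies, %{}, &inject/2)
  end

  defp inject({group, class, spec}, acc) do
    domains = maybe_blank_list(spec)

    Map.update(acc, group, %{class => domains}, fn classes ->
      Map.update(classes, class, domains, &(&1 ++ domains))
    end)
  end

  defp maybe_blank_list(""), do: []
  defp maybe_blank_list(value), do: [value]
end

TaxonomyReducer.run([
  {"Group1", "Classification1", "Domain1"},
  {"Group1", "Classification1", "Domain2"},
  {"Group1", "Classification3", ""},
  {"Group2", "Classification4", "Domain7"}
])
# => %{"Group1" => %{"Classification1" => ["Domain1", "Domain2"], "Classification3" => []},
#      "Group2" => %{"Classification4" => ["Domain7"]}}
```

Like the original, this appends with `++`, which copies the accumulated list on every row; for large files, prepending and reversing each list once at the end would be cheaper.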


Noice! Thanks for the pattern. This is exactly what I needed to hear.

I can’t wait to get back to my workstation to refactor this.
