Could someone review this CSV to data structure code?

As an exercise, I decided to convert a CSV, similar in format to the one located here, to a map. Please note that the file is subject to a non-free license, in case you plan to use the data commercially.

The idea is that I convert the three categories (provided as columns in the CSV file) into a map of maps of lists.

%{
  "Group1" => %{
    "Classification1" => ["Domain1", "Domain2", "Domain3"],
    "Classification2" => ["Domain4", "Domain5", "Domain6"],
    "Classification3" => []
  },
  "Group2" => %{"Classification4" => ["Domain7"]}
}

You’ll notice that a classification can have a missing domain (in other words, the tuple for that row would be {grouping, classification, ""}).
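To make the empty-list case concrete, here is a minimal sketch. The taxonomy value is hand-written from the example above rather than produced from the CSV, and the "NoSuch..." keys are made up for illustration. It shows that a classification with no domains is still present in the map, while a key that never appeared is simply absent, so get_in/2 returns nil for it.

```elixir
taxonomy = %{
  "Group1" => %{
    "Classification1" => ["Domain1", "Domain2", "Domain3"],
    "Classification3" => []
  },
  "Group2" => %{"Classification4" => ["Domain7"]}
}

# A classification with no domains is present and holds an empty list.
[] = get_in(taxonomy, ["Group1", "Classification3"])

# A classification that never appeared is absent, so get_in/2 returns nil.
nil = get_in(taxonomy, ["Group1", "NoSuchClassification"])

# A missing grouping also falls through to nil instead of raising.
nil = get_in(taxonomy, ["NoSuchGroup", "Classification1"])
```

The difference between [] and nil is what lets the code tell "seen, but no domains" apart from "not seen yet".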

I am using NimbleCSV to parse the file, and then I convert the rows to the map.

defmodule TaxonomyMap do
  @doc """
  Open the target file, parse it, and create a list of tuples of
  {grouping, classification, specialization} for each row.
  """
  def get_taxonomies(file) do
    file
    |> File.stream!(read_ahead: 1000)
    |> NimbleCSV.RFC4180.parse_stream
    |> Stream.map(fn [_, grouping, classification, specialization, _, _] ->
      {grouping, classification, specialization}
    end)
  end

  defp maybe_blank_list(value) do
    case value do
      "" -> []
      _ -> [value]
    end
  end

  defp classification_map(classification, specialization) do
    %{classification => maybe_blank_list(specialization)}
  end

  defp nil?(acc, key_or_keys) do
    get_in(acc, key_or_keys) |> is_nil
  end

  def run(taxonomies) do
    taxonomies
    |> Enum.reduce(%{}, fn({g, c, s}, acc) ->
      cond do
        nil?(acc, [g]) ->
          # The grouping is not in the map
          # Add the grouping, classification, and specialization for this row
          put_in(acc, [g], classification_map(c, s))
        nil?(acc, [g, c]) ->
          # The classification is not in the grouping
          # Add the classification and specialization to the grouping
          put_in(acc, [g], Map.merge(get_in(acc, [g]), classification_map(c, s)))
        !nil?(acc, [g, c]) ->
          # The classification and grouping both exist
          # Add the specialization to the grouping
          put_in(acc, [g, c], get_in(acc, [g, c]) ++ maybe_blank_list(s))
      end
    end)
  end
end

The way you’d use this is by running TaxonomyMap.get_taxonomies("taxonomy.csv") |> TaxonomyMap.run().

Is there anything that can or should be improved? I’d love to hear your thoughts on how I can improve for clarity or to make my code more “elixiric”.

It all looks reasonably good to me. The one place that might be more “elixiry” is the cond switch inside the reduce.

I found that part somewhat hard to reason about without flipping back and forth in the file. You’re basically doing a test and then a transformation. It’s more “elixiry” to write the transformations as pattern-matching function heads or case statements and let the computer sort out which one to use.

|> Enum.reduce(%{}, fn {g, c, s}, acc -> inject(acc, [g, c], s) end)

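def inject(acc, [group, class], spec) do
  case get_in(acc, [group, class]) do
    # The classification is not there yet (the grouping may be missing too)
    nil -> inject_class(acc, [group, class], spec)
    # The classification exists: append the specialization
    found -> put_in(acc, [group, class], found ++ maybe_blank_list(spec))
  end
end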

def inject_class(acc, [group, class], spec) do
  case get_in(acc, [group]) do
    nil -> put_in(acc, [group], classification_map(class, spec))
    found -> put_in(acc, [group], Map.merge(found, classification_map(class, spec)))
  end
end

I’m not entirely happy with that, but hope it shows the idea. I feel like there is probably an additional refactoring involving pulling out the case statements from inject that might make the code even more straightforward.
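One possible direction for that refactoring, offered as a sketch rather than something either poster wrote: move the blank check into function heads and let Map.update/4 choose between inserting and updating at each level, which removes both case statements. The module name TaxonomySketch and the functions build/1 and to_list/1 are made up for illustration, and the sample rows are hand-written from the example in the first post.

```elixir
defmodule TaxonomySketch do
  # Hypothetical module; build/1 plays the role of TaxonomyMap.run/1.
  def build(rows) do
    Enum.reduce(rows, %{}, fn {group, class, spec}, acc ->
      specs = to_list(spec)

      # Map.update/4 inserts the default when the key is missing,
      # otherwise it applies the function to the existing value.
      Map.update(acc, group, %{class => specs}, fn classes ->
        Map.update(classes, class, specs, &(&1 ++ specs))
      end)
    end)
  end

  # Function heads stand in for the case in maybe_blank_list/1.
  defp to_list(""), do: []
  defp to_list(value), do: [value]
end

rows = [
  {"Group1", "Classification1", "Domain1"},
  {"Group1", "Classification1", "Domain2"},
  {"Group1", "Classification3", ""},
  {"Group2", "Classification4", "Domain7"}
]

taxonomy = TaxonomySketch.build(rows)
IO.inspect(taxonomy)
```

Appending with ++ keeps the domains in file order but copies the existing list on every row; for a large file, prepending and reversing once at the end would be the usual trade-off.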

Noice! Thanks for the pattern. This is exactly what I needed to hear.

I can’t wait to get back to my workstation to refactor this.
