As an exercise, I decided to convert a CSV, similar in format to the one located here, to a map. Please note that the file is under a non-free license, in case you plan to use the data commercially.
The idea is to convert the three categories (provided as columns in the CSV file) into a map of maps of lists:
%{
  "Group1" => %{
    "Classification1" => ["Domain1", "Domain2", "Domain3"],
    "Classification2" => ["Domain4", "Domain5", "Domain6"],
    "Classification3" => []
  },
  "Group2" => %{"Classification4" => ["Domain7"]}
}
You'll notice that a classification can have a missing domain (in other words, the tuple would be {grouping, classification, ""}).
I am using NimbleCSV to get the file and convert it to the map.
defmodule TaxonomyMap do
  @doc """
  Open the target file, parse it, and build a stream of tuples of
  {grouping, classification, specialization}, one for each row.
  """
  def get_taxonomies(file) do
    file
    |> File.stream!(read_ahead: 1000)
    |> NimbleCSV.RFC4180.parse_stream()
    |> Stream.map(fn [_, grouping, classification, specialization, _, _] ->
      {grouping, classification, specialization}
    end)
  end

  # A blank specialization becomes an empty list instead of [""]
  defp maybe_blank_list(value) do
    case value do
      "" -> []
      _ -> [value]
    end
  end

  defp classification_map(classification, specialization) do
    %{classification => maybe_blank_list(specialization)}
  end

  defp nil?(acc, key_or_keys) do
    get_in(acc, key_or_keys) |> is_nil()
  end

  def run(taxonomies) do
    taxonomies
    |> Enum.reduce(%{}, fn({g, c, s}, acc) ->
      cond do
        nil?(acc, [g]) ->
          # The grouping is not in the map
          # Add the grouping, classification, and specialization for this row
          put_in(acc, [g], classification_map(c, s))
        nil?(acc, [g, c]) ->
          # The classification is not in the grouping
          # Add the classification and specialization to the grouping
          put_in(acc, [g], Map.merge(get_in(acc, [g]), classification_map(c, s)))
        !nil?(acc, [g, c]) ->
          # The classification and grouping both exist
          # Add the specialization to the classification
          put_in(acc, [g, c], get_in(acc, [g, c]) ++ maybe_blank_list(s))
      end
    end)
  end
end
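For anyone without the file at hand, here is a minimal sketch of what the row pattern in get_taxonomies/1 expects. The column names and values are made up (the real file's headers may differ). The only assumptions carried over from the code above are six columns per row, with the three categories in columns two to four. As far as I understand, NimbleCSV.RFC4180 skips the header row by default, which is why the pattern never sees it.

```elixir
# Assumes the nimble_csv package can be fetched; the version constraint is a guess.
Mix.install([{:nimble_csv, "~> 1.2"}])

# Hypothetical sample data: six columns, categories in columns 2-4.
csv = """
code,grouping,classification,specialization,definition,notes
001,Group1,Classification1,Domain1,some text,
002,Group1,Classification3,,some text,
"""

rows =
  csv
  |> NimbleCSV.RFC4180.parse_string()
  |> Enum.map(fn [_, grouping, classification, specialization, _, _] ->
    # Keep only the three category columns, as get_taxonomies/1 does
    {grouping, classification, specialization}
  end)

IO.inspect(rows)
```

The second row has an empty fourth column, so its tuple ends in "", which is the missing-domain case described earlier.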
The way you'd use this is by running TaxonomyMap.get_taxonomies("taxonomy.csv") |> TaxonomyMap.run.
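To check the resulting shape without a CSV file, the reduction can also be tried on in-memory tuples. The snippet below is only a sketch for comparison, not the module above: the sample rows are hypothetical, and it swaps the cond for nested Map.update/4 calls, which I believe gives the same result.

```elixir
# Hypothetical sample rows in the {grouping, classification, specialization} shape.
rows = [
  {"Group1", "Classification1", "Domain1"},
  {"Group1", "Classification1", "Domain2"},
  {"Group1", "Classification3", ""},
  {"Group2", "Classification4", "Domain7"}
]

# A blank specialization contributes an empty list rather than [""].
to_list = fn
  "" -> []
  value -> [value]
end

result =
  Enum.reduce(rows, %{}, fn {g, c, s}, acc ->
    new = to_list.(s)

    # Insert the grouping if it is missing, otherwise update its classifications.
    Map.update(acc, g, %{c => new}, fn classifications ->
      # Insert the classification if it is missing, otherwise append to its list.
      Map.update(classifications, c, new, &(&1 ++ new))
    end)
  end)

IO.inspect(result)
```

With these rows, "Group1" ends up with "Classification1" => ["Domain1", "Domain2"] and "Classification3" => [], and "Group2" with "Classification4" => ["Domain7"].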
Is there anything that can or should be improved? I'd love to hear your thoughts on how I can improve for clarity or to make my code more "elixiric".