I’m coming over from the Ruby world and have decided to play with Elixir and Phoenix. I have started to read some books, and some online tutorials. However, for this issue, I haven’t found a clear, and concise answer. I truly appreciate any help.
I have a CSV file in the format of:
artist,album
SomeBand,SomeAlbum
SomeBand,AnotherAlbum
...
SomeBand,YetAnotherAlbum
And so on - perhaps up to 50 of SomeBand, and maybe another 50 for SomeOtherBand.
I have a script that iterates over this CSV file, and instead of inserting the Artist each time into the table artists, I want it to first check a Map to see if the artist has been inserted prior, and what the id is.
I have this so far in lib/populate_data.ex:
...
alias NimbleCSV.RFC4180, as: CSV
  
def parse_and_import do
 artists_map = %{}
 "priv/data/arists_and_albums.csv"
  |> File.stream!(read_ahead: 100_000)
  |> CSV.parse_stream
  |> Enum.map(fn row ->
     [artist, _album, _type, _release_date, _catalog, _label, _main_release] = row
      artists_map = check_or_insert_artist(artist, artists_map)
  end)
end
  def check_or_insert_artist(artist, artists_map) do
    case Map.has_key?(artists_map, artist) do
      true ->
        artists_map
      false ->
        insert_artist(artist, artists_map)
    end
  end
  def insert_artist(artist, artists_map) do
    case Repo.insert(%Artist{artist: artist}) do
      {:ok, struct}       ->
        Map.put(artists_map, artist, struct.artist_id)
      {:error, _} ->
        artists_map
    end
  end
What I am doing above is iterating over the CSV file, and on each row, checking if the artist already exists in the Map variable artists_map. I am sure many are seeing the gotcha…as artists_map is within a Enum.map loop in the parse_and_import function, it will not be recognized on the second run, or any subsequent run. I think it’s because I have:
 |> Enum.map(fn row ->
   ....
   artists_map = check_or_insert_artist(artist, artists_map)
I do not think artists_map is within scope on the next run - I am guessing this is the problem. If I debug:
Before the call to check_or_insert_artist:
%{}
After the call to check_or_insert_artist:
[debug] QUERY OK db=1.6ms decode=0.9ms queue=0.4ms idle=14.7ms
INSERT INTO "artists" ("artist") VALUES ($1) RETURNING "artist_id" ["Nine Inch Nails"]
%{"Nine Inch Nails" => 162}
Second run:
Before the value of check_or_insert_artist:
%{}
# error for trying to re-insert the same artist into the unique key field
Essentially, coming from the Ruby world, I’d do:
if artist_hash.key?(artist)
  artist_hash[artist]
else
  # insert into artists table and then:
  artist_hash[artist] = new_id
end
I have found conflicting answers on the best approach to this - I know I can use Agent but I’m wondering if it’s overkill for this. Also, there are other variables (label, type) that I would want to follow a similar pattern.
Thank you for any help in this matter.




















