I’m coming over from the Ruby world and have decided to play with Elixir and Phoenix. I have started to read some books, and some online tutorials. However, for this issue, I haven’t found a clear, and concise answer. I truly appreciate any help.
I have a CSV file in the format of:
artist,album
SomeBand,SomeAlbum
SomeBand,AnotherAlbum
...
SomeBand,YetAnotherAlbum
And so on - perhaps up to 50 of SomeBand
, and maybe another 50 for SomeOtherBand
.
I have a script that iterates over this CSV file, and instead of inserting the Artist
each time into the table artists
, I want it to first check a Map
to see if the artist
has been inserted prior, and what the id
is.
I have this so far in lib/populate_data.ex
:
...
alias NimbleCSV.RFC4180, as: CSV
def parse_and_import do
artists_map = %{}
"priv/data/arists_and_albums.csv"
|> File.stream!(read_ahead: 100_000)
|> CSV.parse_stream
|> Enum.map(fn row ->
[artist, _album, _type, _release_date, _catalog, _label, _main_release] = row
artists_map = check_or_insert_artist(artist, artists_map)
end)
end
def check_or_insert_artist(artist, artists_map) do
case Map.has_key?(artists_map, artist) do
true ->
artists_map
false ->
insert_artist(artist, artists_map)
end
end
def insert_artist(artist, artists_map) do
case Repo.insert(%Artist{artist: artist}) do
{:ok, struct} ->
Map.put(artists_map, artist, struct.artist_id)
{:error, _} ->
artists_map
end
end
What I am doing above is iterating over the CSV file, and on each row, checking if the artist
already exists in the Map
variable artists_map
. I am sure many are seeing the gotcha…as artists_map
is within a Enum.map
loop in the parse_and_import
function, it will not be recognized on the second run, or any subsequent run. I think it’s because I have:
|> Enum.map(fn row ->
....
artists_map = check_or_insert_artist(artist, artists_map)
I do not think artists_map
is within scope on the next run - I am guessing this is the problem. If I debug:
Before the call to check_or_insert_artist:
%{}
After the call to check_or_insert_artist:
[debug] QUERY OK db=1.6ms decode=0.9ms queue=0.4ms idle=14.7ms
INSERT INTO "artists" ("artist") VALUES ($1) RETURNING "artist_id" ["Nine Inch Nails"]
%{"Nine Inch Nails" => 162}
Second run:
Before the value of check_or_insert_artist:
%{}
# error for trying to re-insert the same artist into the unique key field
Essentially, coming from the Ruby world, I’d do:
if artist_hash.key?(artist)
artist_hash[artist]
else
# insert into artists table and then:
artist_hash[artist] = new_id
end
I have found conflicting answers on the best approach to this - I know I can use Agent
but I’m wondering if it’s overkill for this. Also, there are other variables (label
, type
) that I would want to follow a similar pattern.
Thank you for any help in this matter.