Here is what I am trying to do. I have a text file (movies.csv) with 40k lines. I wrote a function to parse each line using Stream/Enum. It parses the data I want, and I am able to view the parsed data from iex (though I can only see a partial list, say 200+ lines).
I am using the parser to insert the data into the DB. What is interesting is that it only inserts a small subset of the entire file into the DB. I am unable to insert all the records (even though inserting the small set takes only 10 ms).
How can I insert all 40k records into the database?
Inserting a .csv file into Postgres will be much, much faster if you use the Postgres COPY statement. That said, it’s very difficult to give guidance without seeing the code you are using to do the inserting.
You have -imdb_id, not _imdb_id, which I suspect means you probably end up with 239 rows in total (not tested).
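For reference, a COPY-based load from psql might look like the following. This is a sketch: the table name movies and its column list are assumptions, since the actual schema isn’t shown in this thread.

\copy movies (name, type, genres, homepage, imdb_id) FROM 'data/movies.csv' WITH (FORMAT csv, HEADER true)

The \copy meta-command streams the file from the client machine; a server-side COPY ... FROM would need the file to be readable by the Postgres server process.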
Longer version
Here’s a slightly cleaned up version, which I did just to understand what you’re trying to do. It’s how I saw that you had mistyped _imdb_id. I also added {:ok, movie} = Repo.insert(movie), because if an insert fails you’ll get a match error and therefore know about it — a helpful debugging approach. I also changed the Stream functions to Enum functions: since you are already reading the whole file into memory, there’s nothing much to be saved and some performance to be gained.
defmodule Movie do
  @name_file "data/movies.csv"

  def movies do
    [_header | data_rows] =
      File.read!(@name_file)
      |> String.split("\n")

    data_rows
    |> Enum.map(&String.split(&1, ","))
    |> Enum.map(fn
      [name, type, genres, _homepage, _imdb_id] ->
        %Movie{name: name, type: type, genres: genres}

      _ ->
        nil
    end)
    |> Enum.reject(&is_nil/1)
  end

  def load_database do
    Enum.each(movies(), fn movie ->
      {:ok, movie} = Repo.insert(movie)
    end)
  end
end
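As a side note, for bulk loads Ecto’s Repo.insert_all/3 is usually much faster than one Repo.insert per row, since it issues one multi-row INSERT per batch. A sketch, assuming the movies/0 function above, a "movies" table, and that the struct fields match the column names (insert_all bypasses changesets, so timestamps would have to be supplied manually if the table requires them):

movies()
|> Enum.map(&Map.take(Map.from_struct(&1), [:name, :type, :genres]))
|> Enum.chunk_every(1_000)
|> Enum.each(fn batch -> Repo.insert_all("movies", batch) end)

Chunking keeps each INSERT under Postgres’s parameter limit (65535 bound parameters per query).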
@kip. Thank you very much. I made the changes you suggested, but still no luck: it only inserts 239. I also noticed that it is inserting the rows randomly, i.e., it is not inserting them sequentially.
I want to learn what exactly is happening when it inserts only a small subset of records (instead of the entire file), but due to time constraints I am going to use the PostgreSQL COPY statement.
If anyone has suggestions or pointers, they will be highly appreciated.
If you’re not getting match errors on the insert (with the code example I suggested) then I suspect something else is going on.
I’d be interested to know what you get with:
def load_database do
  IO.inspect(Enum.count(movies()), label: "Number of movies")

  Enum.each(movies(), fn movie ->
    {:ok, movie} = Repo.insert(movie)
  end)
end
@kip. Thank you. The count helped find the issue: there were records whose homepage and genres fields contained "," characters. I used the CSV library and resolved it.
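To illustrate why a comma inside a quoted field breaks the original approach, here is a small sketch (the sample line is made up for illustration). Naive String.split produces an extra field, so the five-element clause in the parser never matches and the row is silently dropped:

```elixir
# A line where the genres field is quoted and contains a comma:
line = ~s(Heat,movie,"Action, Crime",http://example.com,tt0113277)

# Naive splitting yields 6 fields instead of 5, so a
# [name, type, genres, _homepage, _imdb_id] clause never matches:
String.split(line, ",") |> length()
# => 6
```

A real CSV parser such as the NimbleCSV hex package handles the quoting, e.g. NimbleCSV.RFC4180.parse_string(line, skip_headers: false) would keep "Action, Crime" as a single field.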
I really love the quick support.