Is there no way to insert multiple entries in one go using Ecto.insert?

theagilecoder · January 13, 2020, 5:30pm

Given a list of structs based on a schema

[ %App.Car{model: “E”}, %App.Car{model: “X”}, %App.Car{model: “Y”} ]

Is there no way to insert them in one go using Repo.insert ?

Don’t want to use insert_all because first i wil have to convert my structs into maps and then manualy set the timestamps so that I dont get the “null value in column “inserted_at” violates not-null constraint” error. And also according to the docs, its a lower-level function.
Dont need to use Multi, becuase its not a transaction that I am dealing with.

So, Using Enum.each(list_of_structs, &Repo.insert) my only option ?

dimitarvp · January 13, 2020, 6:37pm

Have you checked Repo.insert_all in the docs?

wolfiton · January 14, 2020, 12:52am

Have you tried ecto multi?
https://elixirschool.com/blog/ecto-multi/

theagilecoder · January 14, 2020, 2:59am

Yes. I talk about it in the 1st point mentioned in my post.

theagilecoder · January 14, 2020, 2:59am

Yes. I talk about it in the 2nd point of my post.

dimitarvp · January 14, 2020, 4:03am

I guess I misunderstood your disclaimer then. Sorry.

Can you expand on this point? Transactions are just an added bonus. Not sure why having it is a deal-breaker for you.

theagilecoder · January 14, 2020, 4:17am

Multi transactions are required for the scenario where all the inserts are needed to go to DB or none at all. I dont have that scenario. The inserts I am trying to do are independent of each other. If half of the inserts are successful and half are not, I am still fine.

dimitarvp · January 14, 2020, 4:35am

In that case you can just use Task.async_stream and pass it a function that inserts one record. It will transparently parallelize the inserts over all your CPU cores and every operation is independent and isolated.

kip · January 14, 2020, 4:45am

If this is a pattern that’s important for you (insert several structs but in one database call) then perhaps worth investing in a helper function that converts to maps and adds the time stamps so you can use insert_all/2? Another approach might be to modify the database schema to have a default timestamp so you don’t have to think about it.

theagilecoder · January 14, 2020, 5:29am

When you say schema to have default timestamp, i already have it. I have timestamps() in the schema definition. Do you mean something else?

kip · January 14, 2020, 5:30am

Ah, I meant in the actual database itself (ie the migration in Ecto terms)

theagilecoder · January 14, 2020, 5:30am

Yes, using Task.async_stream seems to be the most efficient solution to my problem. Thanks.

theagilecoder · January 14, 2020, 5:35am

Yes, I do have timestamps() in the schema as well as in the migration file and as well as in the DB. and thats why I have to set the timestamps manually in the map if I were to use insert_all. This is what I have mentioned in the point 1 of my post. Do you mean something else ?

kip · January 14, 2020, 5:39am

I’m really doing a bad job of explaining myself today

The timestamps() call in the Ecto schema defines the columns on the DB. My alternative suggestion was to define a default value for those timestamps so that even for an insert_all/2 the timestamps would be set by the database.

An explanation of what I’m describing is at https://www.postgresqltutorial.com/postgresql-current_timestamp/

Sounds like Task.async_stream/3 is working for you so probably not much point in pursuing this angle.

kokolegorille · January 14, 2020, 6:24am

to insert multiple entries in one go

insert_all has been done for this. I would rather write a function to transform data into something suitable for insert_all, as mentionned by @kip, this can be done in one pass, with stream if working on large collection.

It might not change a lot for small dataset, but when You run batches of 10k elements, it’s better to use one insert_all, than 10k independant inserts.