PS: the following situation describes an hypothetical scenario, where I own a company that sells things to customers.
I have an Ecto query that is so big, that my machine cannot handle it. With billions of results returned, there is probably not enough RAM in the world that can handle it.
The solution here (or so my research indicates) is to use streams. Streams were made for potentially infinite sets of results, which would fit my use case.
So lets imagine that I want to delete All users that bought a given item. Maybe that item was not really legal in their country, and now me, the poor guy in IT, has to fix things so the world doesn’t come down crashing.
item_id = "123asdasd123" purchase_ids = Purchases |> where([p], p.item_id == ^item_id) |> select([p], p.id) |> Repo.all() Users |> where([u], u.purchase_id in ^purchase_ids) |> Repo.delete_all()
This is the naive way. I call it naive, because of 2 issues:
- We have so many purchases, that the machine’s memory will overflow (looking at
purchase_idswill likely have more than 100K ids, so the second query (where we delete things) will fail as it hits Postgres parameters limit of 32K: PostgreSQL - max number of parameters in "IN" clause? - Stack Overflow
What can I say, our product is highly addictive and very well priced!
Our customers simply cant get enough of it. Don’t know why. Nope. No reason comes to mind. None at all.
With these problems in mind, I cannot help my customers and grow my empire, I mean, little home owned business.
I did find this possible solution:
item_id = "123asdasd123" purchase_ids = Purchases |> where([p], p.item_id == ^item_id) |> select([p], p.id) stream = Repo.stream(purchase_ids) Repo.transacion(fn -> ids = Enum.to_list(stream) Users |> where([u], u.purchase_id in ^ids) |> Repo.delete_all() end)
However, I am not convinced this will work:
- I am using
Enum.to_listand saving everything into a variable, placing everything into memory again. So I am not gaining any advantage by using
- I still have too many
Repo.delete_allto work without blowing up
I guess the one advantage here is that this now a transaction, so either everything goes or nothing goes.
So, the following questions arise:
- How do I properly make use of
streamsin this scenario?
- Can I delete items by streaming parameters (
ids) or do I have to manually batch them?
- Can I stream ids to