Repo.stream with preload: new warning

With a recent update to Ecto, I’m seeing this warning:

passing a query with preloads to MyApp.Repo.stream/2 leads to erratic behaviour and will raise in future Ecto versions

Looking at the code for Ecto 3, this will raise an exception, so I’d like to take care of it now.

I’m using Repo.stream to export a large amount of data as CSV. Streaming lets me pipe the results through other Stream functions and ultimately write them to a CSV file.
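For context, here is a minimal sketch of that kind of export pipeline in plain Elixir. The `CsvExportSketch` module and its functions are hypothetical, and the escaping is a simple RFC 4180-style approach; a dedicated library such as NimbleCSV is the more robust choice in practice. `rows` can be any enumerable, including the stream from `Repo.stream/2` after mapping each struct to a list of values.

```elixir
defmodule CsvExportSketch do
  # Hypothetical helper module; not part of Ecto or any CSV library.

  # Quote a field only when it contains a comma, double quote or line break,
  # doubling any embedded double quotes (RFC 4180-style escaping).
  def escape(value) do
    s = to_string(value)

    if String.contains?(s, [",", "\"", "\n", "\r"]) do
      "\"" <> String.replace(s, "\"", "\"\"") <> "\""
    else
      s
    end
  end

  # Turn one row (a list of values) into one CSV line.
  def to_line(row), do: Enum.map_join(row, ",", &escape/1) <> "\n"

  # Lazily write `rows` to `path`, one line at a time, without
  # building the whole file in memory.
  def export(rows, path) do
    rows
    |> Stream.map(&to_line/1)
    |> Stream.into(File.stream!(path))
    |> Stream.run()
  end
end
```

With the SQL adapters, the stream returned by `Repo.stream/2` has to be enumerated inside a transaction, so a real export would call `export/2` from within `MyApp.Repo.transaction/2`.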

Does anyone have any advice about how to work around this? It seems like a pretty big limitation.
Thanks.

Out of curiosity, is the query doing a join and a preload, or just a preload? My hunch is that, because a preload without a join runs separate queries, this can lead to the “erratic behaviour”. I’m just shooting in the dark here…


It was just a plain preload. But Repo.stream seems to flag any preload, even one that comes from a join (which, as you note, I suspect wouldn’t really be a problem).

The best workaround I’ve come up with is to write the query so it returns results in exactly the structure I need, instead of loading Ecto schema structs and mapping over them to format the data.

I see. Good to know. Maybe someone with more knowledge will chime in.

Here’s the issue that led to the deprecation (and removal in Ecto 3.0) of preloads with Repo.stream: https://github.com/elixir-ecto/ecto/issues/2424


Cool, thanks for the link.

Do you have any idea what Jose means by the chunk comment?

He was most likely talking about the Stream.chunk_* functions.
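For reference, a minimal sketch of how `Stream.chunk_every/2` behaves (plain Elixir, no Ecto; the `ChunkSketch` module name is hypothetical). The property that matters for streaming is that it is lazy: each chunk is emitted as soon as it fills up, so it works even on an unbounded stream, and any per-batch work runs once per chunk rather than once per element.

```elixir
defmodule ChunkSketch do
  # Regroup any enumerable into lists of at most `size` elements.
  # Stream.chunk_every/2 is lazy: a chunk is emitted as soon as it is full
  # (or the input ends), without reading the rest of the input.
  def chunks(enum, size), do: Stream.chunk_every(enum, size)
end

# Finite input: the last chunk may be shorter than `size`.
ChunkSketch.chunks(1..7, 3) |> Enum.to_list()
# => [[1, 2, 3], [4, 5, 6], [7]]

# Infinite input: only as many elements as needed are consumed.
Stream.iterate(1, &(&1 + 1)) |> ChunkSketch.chunks(3) |> Enum.take(2)
# => [[1, 2, 3], [4, 5, 6]]
```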

I’m a little stumped by this. Shouldn’t preloads with joins work OK, since each row already contains the joined data?

Has anyone found a workaround for this?

The problem with JOINs is that the parent’s data is duplicated across rows, and you cannot reliably deduplicate the stream without losing the benefits of streaming. Imagine your first result has 2000 associated records to be preloaded, but you ask the stream to lazily give you only the first result. Should the stream return just the first row, or the first entry together with all 1999 additional rows for its associations? The stream would also need a way to determine which rows still belong to that first entry. The problem gets even more complex with multiple preloads, or with sorting that scatters an entry’s rows across the result set.
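To make the deduplication problem concrete, here is a sketch in plain Elixir that regroups hypothetical flat JOIN rows (one row per comment, with the post columns repeated) back into one entry per post using `Stream.chunk_by/2`. The row shape and the `JoinRowsSketch` module are invented for illustration; this only demonstrates the reasoning above and is not how Ecto implements preloading.

```elixir
defmodule JoinRowsSketch do
  # Regroup flat JOIN rows (one row per comment, post columns repeated in
  # every row) into one entry per post.
  #
  # Stream.chunk_by/2 starts a new group whenever post_id changes, so:
  #   * it is only correct when all rows of a post are adjacent
  #     (e.g. the query is ordered by the post id), and
  #   * it cannot emit a post until it has read the first row of the next
  #     post (or reached the end), so a post with 2000 comments means
  #     buffering 2000 rows before that entry comes out.
  def regroup(rows) do
    rows
    |> Stream.chunk_by(& &1.post_id)
    |> Stream.map(fn [first | _] = group ->
      %{post_id: first.post_id, title: first.title, comments: Enum.map(group, & &1.comment)}
    end)
  end
end

sorted = [
  %{post_id: 1, title: "a", comment: "c1"},
  %{post_id: 1, title: "a", comment: "c2"},
  %{post_id: 2, title: "b", comment: "c3"}
]

JoinRowsSketch.regroup(sorted) |> Enum.map(&{&1.post_id, &1.comments})
# => [{1, ["c1", "c2"]}, {2, ["c3"]}]

# If sorting scatters a post's rows, the same post is split into two entries.
scattered = [
  %{post_id: 1, title: "a", comment: "c1"},
  %{post_id: 2, title: "b", comment: "c3"},
  %{post_id: 1, title: "a", comment: "c2"}
]

JoinRowsSketch.regroup(scattered) |> Enum.map(& &1.post_id)
# => [1, 2, 1]
```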

The simple workaround is to call Repo.preload on the entries you get from the stream itself. It costs an additional query, but it’s much simpler to understand and handle.
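A minimal sketch of this workaround, batched with `Stream.chunk_every/2` so that the preload runs once per chunk rather than once per entry. The `ChunkedPreloadSketch` module, the default chunk size of 500, and the `MyApp.Repo`/`Post`/`:comments` names in the usage comment are hypothetical. The helper takes the preload function as an argument, so it can be exercised without a database.

```elixir
defmodule ChunkedPreloadSketch do
  # Preload associations for entries coming from a stream, one batch at a
  # time. `preload_fun` receives a list of entries and must return the same
  # entries with associations loaded, e.g. &MyApp.Repo.preload(&1, :comments).
  # Each chunk costs one call to `preload_fun`; Stream.flat_map/2 then
  # flattens the chunks back into a stream of individual entries.
  def preload_in_chunks(stream, preload_fun, chunk_size \\ 500) do
    stream
    |> Stream.chunk_every(chunk_size)
    |> Stream.flat_map(preload_fun)
  end
end

# Sketch of real usage (assumes a MyApp.Repo and a Post schema with a
# :comments association; Repo.stream/2 must be enumerated inside a transaction):
#
#   MyApp.Repo.transaction(fn ->
#     Post
#     |> MyApp.Repo.stream()
#     |> ChunkedPreloadSketch.preload_in_chunks(&MyApp.Repo.preload(&1, :comments))
#     |> Enum.each(&IO.inspect/1)
#   end, timeout: :infinity)
```

The chunk size is a trade-off: larger chunks mean fewer preload queries but more entries held in memory at once.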
