How to make 3 queries become one

tio407 · January 1, 2020, 7:38am

Hello, I’m trying to join 3 different tables with the only field in common as account_id with no success after 3 days of failure. I have no knowledge of SQL, but I have the following function being looped through 3x and merged to get the result I need. I’d like to compress it into one call.

I’m essentially running the function below on 3 different modules (read schemas). I’m sure there’s a better way.

  def get_data(module) do
    module
    |> order_by(desc: :id)
    |> preload(account: :users])
    |> Repo.all()
  end

Any help would be immensely appreciated.

mindok · January 1, 2020, 9:01am

Is your current approach working, just a bit messy, or is it not working?

A typical pure SQL approach would be something like this (depending on your SQL dialect):

SELECT combined.f1, combined.f2, combined.table_name, accounts.* FROM
  (SELECT f1, f2, 'table1' AS table_name, account_id from table1
   UNION ALL SELECT f1, f2, 'table2' AS table_name, account_id from table2
   UNION ALL SELECT f1, f2, 'table3' AS table_name, account_id from table3
  ) AS combined
 LEFT JOIN accounts on combined.account_id = account.id

tio407 · January 1, 2020, 9:02am

It’s working; but it’s making 3 Repo.all calls. I’d like to learn how to do this correctly with Elixir

I’ve read the relevant parts of the Programming Ecto and still can’t figure it out.

mindok · January 1, 2020, 9:10am

What you are looking for is a UNION query either with a preload, or pre-joined at the database. The above is in raw SQL. I had a quick hunt around the Ecto repository and it looks like there is some Ecto support for Unions - see https://github.com/elixir-ecto/ecto/pull/2678 and https://github.com/elixir-ecto/ecto/issues/1549. I am not familiar with the implementation, but hopefully that points you in the right direction.

PS: docs are here: https://hexdocs.pm/ecto/Ecto.Query.html#union_all/2

sfusato · January 1, 2020, 9:29am

Did you tried the Ecto.Query.join/5 examples?

In your case, you would start with one model/table and join the other two on account_id.

mindok · January 1, 2020, 9:34am

Hi @sfusato - I’m not sure join would work here - I think @tio407 is trying to combine the tables by adding extra rows, where as joining them adds the data in extra columns. @tio407 - can you confirm?

tio407 · January 1, 2020, 9:48am

I’m trying to create a CSV file with the exact same 4 or so columns, which exist on all the tables but I want them as rows on the CSV file if that makes sense.

Let’s say I had 3 columns: nyc_taxes, sf_taxes, la_taxes. I want the same 3 columns on these tables as headers on my CSV file; and the data from all 3 columns…

All 3 columns belong to account_id in their schemas which is why I mentioned it; not sure how relevant that is.

I was confused about doing join or union.

tio407 · January 1, 2020, 9:52am

As an update, the looping above is not completely working; please ignore my solution of using a comprehension. Bad, bad idea.

mindok · January 1, 2020, 10:34am

Maybe it would be best if you provided the following:
database schema (or migration code used to create schema)
some sample data
ecto schema code
sample csv output

obviously don’t provide anything sensitive…

egze · January 1, 2020, 10:53am

Maybe not the answer you are looking for, but I don’t see anything wrong with 3 separate SQL queries. Your DB will handle it no problem and your code will be cleaner, without hand-written SQL.

What you should consider, regardless if you go with unions or not - is the table size. You don’t want to load all records from the table into memory. It is fine for a small amount, but if it’s millions of records then it will get ugly.

Look into streaming data from DB directly into a file. Something like here http://joeellis.la/streaming-with-ecto/

dimitarvp · January 1, 2020, 2:02pm

This question lacks details. Please paste schema definitions, migrations and your code that attempts to unite 3 queries into one.

And if you post a small GitHub project I’ll personally make sure to help you!

al2o3cr · January 1, 2020, 3:15pm

General comment: I recommend you address that before trying to grapple with Ecto. It’s close enough to the underlying SQL that you really need to understand what you’re trying to send to the database server when you build queries.

To echo others in this thread, I’d recommend getting a working solution that’s built in a straightforward way with three queries. It’s entirely possible that’s “fast enough” for amount of data in your database. If it isn’t fast enough, then you probably don’t want to be doing Repo.all against giant tables anyways - consider something like Repo.stream instead.

tio407 · January 2, 2020, 7:17pm

Union was the correct solution however my tables do not have the same # of columns (and Ecto does not allow this). On the second table I’m selecting one additional column (and setting it as a virtual field for the others)…

what are some workarounds / routes I could take?

mindok · January 2, 2020, 11:34pm

You can “dynamically” add virtual fields (rather than use a schema) in the select part of the subqueries that needs it, for example:

Repo.all(User |> Ecto.Query.select([u], {u.email, "Dynamically added bit"}))

There is no avoiding having the same number of fields for each contributor to the UNION though, that’s just how they work