Ecto IN clauses with tuples

Hello people!

I have a requirement to perform a select query with an IN clause using tuples. I am using PostgreSQL. Example:

select * from my_table where (col1, col2, col3) in ((1, 2, 3), (4, 5, 6), (7, 8, 9));

This is valid SQL that we currently can’t express with Ecto. I tried several interpolation techniques but wasn’t successful. The list of values is dynamic in both content and length, so I can’t use a fragment here.

We are falling back to a Repo.query! (making sure this is not a public query filled with user-provided data) but would like to know if anyone can think of a way to keep it in the Ecto.Query API (mostly for the added readability and security).

Thanks in advance!

2 Likes

I suspect part of the challenge may be that a row type like (a, b, c) is an anonymous type in Postgres, and therefore interpolating may require typing on both the Postgres and Elixir sides (this is speculation on my part). I don’t have a repo handy right now to even see how Ecto handles select * from table where (a,b,c) in (1,2,3) type queries.

Another angle might be to try an = ANY query, which in most cases is semantically the same as an IN query. It would let you provide an Elixir list as a parameter, which may help make some progress.
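For what it’s worth, Postgres does accept a row value on the left of = ANY over an array of records, at least with literal rows. A sketch (I haven’t checked how to bind that array as a single parameter; it may require a named composite type rather than anonymous records):

```sql
-- Same result as the tuple IN form, written with = ANY over a record array:
SELECT *
  FROM my_table
 WHERE (col1, col2, col3) = ANY (ARRAY[(1, 2, 3), (4, 5, 6), (7, 8, 9)]);
```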

I appreciate these are more questions, not answers.

1 Like

Is it possible to write the query differently? where col1 in (1,2,3) and col2 in (4,5,6) ?

I thought about that but I can’t convince myself it is the exact same result. Besides that, I want to use a composite index on col1, col2 and col3, and the rewritten query would not use that index.

I thought that a tuple like (col1, col2, col3) is standard SQL (the standard calls these row value constructors), though I may be mistaken. It is incredibly difficult to find a SQL specification.

I’ll look into any! Thanks a lot :slight_smile:

That isn’t logically equivalent. Suppose you have WHERE (col1, col2) IN ((1, 10), (2, 20)). If you do WHERE col1 in (1, 2) AND col2 IN (10, 20) that would allow a row that had col1 as 1 and col2 as 20, which would not have been allowed by the first query.
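To make the difference concrete, the two filters can be compared on plain in-memory Elixir data (a toy check, nothing Ecto-specific):

```elixir
# A row with col1 = 1 and col2 = 20, which should NOT match (1, 10) or (2, 20).
rows = [%{col1: 1, col2: 20}]

# Tuple-style IN: compares both columns together, as one pair.
tuple_in = Enum.filter(rows, fn r -> {r.col1, r.col2} in [{1, 10}, {2, 20}] end)

# Per-column IN: checks each column independently.
per_column = Enum.filter(rows, fn r -> r.col1 in [1, 2] and r.col2 in [10, 20] end)

tuple_in   # => [] (the pair {1, 20} matches neither tuple)
per_column # => [%{col1: 1, col2: 20}] (the row slips through)
```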

@victorolinasc I’ve had some success using unnest and JOIN. Here’s an example:

    SELECT p.result_order as result_order, si.*
      FROM unnest($1::text[], $2::timestamp[], $3::timestamp[], $4::integer[])
      AS p(slug, starts_at, ends_at, result_order)
    LEFT JOIN sensors AS s ON s.slug = p.slug
    LEFT JOIN sensor_installations AS si
      ON s.id = si.sensor_id
      AND (si.activated_at, COALESCE(si.deactivated_at, p.ends_at + INTERVAL '5 minutes')) OVERLAPS
          (p.starts_at, p.ends_at)
    ORDER BY p.result_order ASC

The nice thing is that you get to at least use parameterized queries instead of interpolation, and if you join on multiple columns it should be perfectly able to use compound indices. The downside of course is that you’re still writing raw SQL. This is a pretty old query for us, you might be able to write the unnest as raw SQL but then use that as a subquery in a regular ecto query. Not sure.

2 Likes

Wow! Didn’t know unnest… Will try it out too.

Thanks!

I am about to open an issue on Ecto because I think tuples are a standard type in SQL and, therefore, should be supported. I know this might be tricky and huge to implement, and it might be used only once in a lifetime, but it is really one of those things where any workaround seems quite troublesome.

I would love Ecto to support tuples as well, too many times have I needed workarounds for lacking them…

You can express this in Ecto easily:

from row in "my_table",
  where: {row.col1, row.col2, row.col3} in [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}],
  select: row.col1

Will do exactly what you want.

1 Like

Hmm, it works now? It didn’t work when I first wrote this code (admittedly quite a while ago). I should update it, would make it faster than my workaround. ^.^;

I haven’t marked this as the solution because it only works with literal tuples. So, if you build a list of tuples dynamically (with values not known at compile time), it will not work.

@josevalim wrote on the Ecto mailing list:

We already support {m.col1, m.col2} in [{1, 2}, …] but the biggest issue is that users want to encode the value on the right-side of IN dynamically, and that is a much harder problem to solve.

The error from Ecto when you try to compile a dynamic list is like so:

list_of_tuples = load_from_some_place()
from row in "my_table",
  where: {row.col1, row.col2, row.col3} in ^list_of_tuples,
  select: row.col1
** (Ecto.Query.CompileError) Tuples can only be used in comparisons with literal tuples of the same size
    expanding macro: Ecto.Query.where/3
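One way to keep the values parameterized despite that limitation (a sketch, untested here) is to build an OR of per-row comparisons with Ecto.Query.dynamic/2 instead of a single IN. Whether Postgres plans this as efficiently as the tuple IN against a composite index is a separate question worth checking with EXPLAIN:

```elixir
import Ecto.Query

list_of_tuples = load_from_some_place()

# Fold the runtime list into one dynamic condition:
# (c1 AND c2 AND c3) OR (previous rows) OR ... OR false
conditions =
  Enum.reduce(list_of_tuples, dynamic(false), fn {a, b, c}, acc ->
    dynamic([row], (row.col1 == ^a and row.col2 == ^b and row.col3 == ^c) or ^acc)
  end)

from row in "my_table",
  where: ^conditions,
  select: row.col1
```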
1 Like

That’s not necessarily a bad way to do it, but it means you have to maintain four parallel lists.

I’m going to be implementing this as:

SELECT *
  FROM jsonb_to_recordset($1::jsonb) AS p(slug text, starts_at timestamp, ends_at timestamp, result_order integer)
  LEFT JOIN sensors AS s ON s.slug = p.slug
  LEFT JOIN sensor_installations AS si ON s.id = si.sensor_id
    AND (si.activated_at, COALESCE(si.deactivated_at, p.ends_at + INTERVAL '5 minutes')) OVERLAPS (p.starts_at, p.ends_at)
  ORDER BY p.result_order ASC;

I need to do this to perform the equivalent of delete_all for a table with composite keys. This should be easier to do with a list of json elements because of jsonb_to_recordset.

1 Like

Oh excellent I hadn’t seen that before! Were you able to come up with any Ecto syntax that would let you use the jsonb to record set within an ordinary ecto query? I figure the actual record set call has to be in a fragment, but it would be nice to compose that fragment inside a normal ecto query.

No. If/when it’s possible to specify raw SQL or a fragment as a source for from, then it should be pretty easy. I’m not sure how that would work, though.

The jsonb_to_recordset is something that I’ve only been looking at in the last 3–4 weeks, but it really does work well.

1 Like

I wanted to highlight how I’m using these (the code is going to production tomorrow). Essentially, I’m getting my data records like this:

activity =
  PendingDataExchange
  |> where(exchange_type: ^config.exchange)
  |> where([pde], pde.updated_at <= ^config.timestamp)
  |> select([pde], map(pde, ~w(user_id business_id exchange_type update_type)a))
  |> limit(^config.size)
  |> Repo.all(timeout: config.timeout)

Later, I’m loading records based on the activity in question:

records =
  load_records_sql()
  |> Repo.query([activity])
  |> case do
    {:ok, %{rows: rows}} -> {:ok, Enum.map(rows, &hd/1)}
    error -> error
  end

Finally, I’m purging my records:

Repo.query(purge_activity_sql(), [activity])

The SQL statements are complex, so I’ve simplified them some, but the load_records_sql/0 and purge_activity_sql/0 functions are just functions that return a triple-quoted (""") string.

load_records_sql/0:

SELECT jsonb_build_object(
         'update_type', e.update_type,
         'customer_id', c.id,
         'owner_id', bo.id,
         'customer', jsonb_build_object(
           'first_name', c.first_name,
           'last_name', c.last_name,
           'role', buc.role
         ),
         'business', jsonb_build_object(
           'name', b.name
         ),
         'membership', jsonb_build_object(
           'code', m.code,
           'assigned_at', m.assigned_at
         )
       ) AS event
  FROM jsonb_to_recordset($1::jsonb) AS e(
         user_id uuid, business_id uuid, update_type update_type
       )
 INNER JOIN users AS c ON c.id = e.user_id
 INNER JOIN businesses AS b ON b.id = e.business_id
 INNER JOIN memberships AS m ON m.user_id = c.id
 INNER JOIN business_users AS buc ON buc.user_id = c.id AND buc.business_id = b.id
 INNER JOIN business_users AS buo ON buo.business_id = b.id AND buo.role = 'manager'
 INNER JOIN users AS bo ON bo.id = buo.user_id;

Note that this returns a jsonb object for each row, which will simplify a lot of the processing moving forward. jsonb_build_object is almost as awesome as jsonb_to_recordset.

purge_activity_sql/0:

DELETE FROM pending_data_exchange
 USING jsonb_to_recordset($1::jsonb) AS e(
         user_id uuid, business_id uuid, exchange_type citext
       )
 WHERE e.user_id = pending_data_exchange.user_id
   AND e.business_id = pending_data_exchange.business_id
   AND e.exchange_type = pending_data_exchange.exchange_type;

I just wish that I could somehow build something from a Repo.query or fragment that would let me use it as an Ecto source (e.g., just the jsonb_to_recordset($1::jsonb) AS … bit).
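If I remember correctly, more recent Ecto versions (3.x) do allow a fragment as the from source, which would cover exactly that wish. A sketch, assuming the feature exists in your version (note that an explicit select is required, since Ecto cannot infer the columns of a fragment source):

```elixir
import Ecto.Query

from e in fragment(
       "SELECT * FROM jsonb_to_recordset(?::jsonb) AS e(user_id uuid, business_id uuid)",
       ^activity
     ),
  select: %{user_id: e.user_id, business_id: e.business_id}
```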

1 Like

Here is an example of both methods, the array per field and the jsonb_to_recordset, using Ecto.

Array per field + unnest:

ids  = [ 1,  2,  1]
ages = [10, 20, 30]

from x in Friends.Person,
  inner_join: j in fragment("SELECT DISTINCT * FROM unnest(?::int[], ?::int[]) AS j(id, age)", ^ids, ^ages),
  on: x.id == j.id and x.age == j.age,
  select: [:name]

jsonb_to_recordset:

list = [%{id: 1, age: 10},
        %{id: 2, age: 20},
        %{id: 1, age: 30}]

from x in Friends.Person,
  inner_join: j in fragment("SELECT DISTINCT * FROM jsonb_to_recordset(?) AS j(id int, age int)", ^list),
  on: x.id == j.id and x.age == j.age,
  select: [:name]
6 Likes