Ecto IN clauses with tuples

Hello people!

I have a requirement to perform a select query with an IN clause using tuples. I am using PostgreSQL. Example:

select * from my_table where (col1, col2, col3) in ((1, 2, 3), (4, 5, 6), (7, 8, 9));

This is valid SQL that we currently can’t express with Ecto. I tried several interpolation techniques but wasn’t successful. The list of values is dynamic in both content and length, so I can’t use a fragment here.

We are falling back to a Repo.query! (making sure this is not a public query filled with user-provided data) but would like to know if anyone can think of a way to keep it in the Ecto.Query API (mostly for the added readability and security).

Thanks in advance!

2 Likes

I suspect part of the challenge may be that a row type like (a, b, c) is an anonymous type in Postgres, and therefore interpolating may require typing on both the Postgres and Elixir sides (this is speculation on my part). I don’t have a repo handy right now to even see how Ecto handles select * from table where (a,b,c) in (1,2,3) type queries.

Another angle might be to try an = ANY query, which in most cases is semantically the same as an IN query. It would let you provide an Elixir list as a parameter, which may help make some progress.
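For what it’s worth, Postgres does accept a row value on the left of = ANY over an array of records, at least with literal rows. A sketch (I haven’t checked how to bind that array as a single parameter; it may require a named composite type rather than anonymous records):

```sql
-- Same result as the tuple IN form, written with = ANY over a record array:
SELECT *
  FROM my_table
 WHERE (col1, col2, col3) = ANY (ARRAY[(1, 2, 3), (4, 5, 6), (7, 8, 9)]);
```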

I appreciate these are more questions, not answers.

1 Like

Is it possible to write the query differently? where col1 in (1,2,3) and col2 in (4,5,6) ?

I thought about that but I can’t convince myself it is the exact same result. Besides that, I want to use a composite index on col1, col2 and col3, and the rewritten query would not use that index.

I thought that a tuple like (col1, col2, col3) is standard SQL (the standard calls these row value constructors), though I may be mistaken. It is incredibly difficult to find a SQL specification.

I’ll look into any! Thanks a lot :slight_smile:

That isn’t logically equivalent. Suppose you have WHERE (col1, col2) IN ((1, 10), (2, 20)). If you do WHERE col1 in (1, 2) AND col2 IN (10, 20) that would allow a row that had col1 as 1 and col2 as 20, which would not have been allowed by the first query.
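To make the difference concrete, the two filters can be compared on plain in-memory Elixir data (a toy check, nothing Ecto-specific):

```elixir
# A row with col1 = 1 and col2 = 20, which should NOT match (1, 10) or (2, 20).
rows = [%{col1: 1, col2: 20}]

# Tuple-style IN: compares both columns together, as one pair.
tuple_in = Enum.filter(rows, fn r -> {r.col1, r.col2} in [{1, 10}, {2, 20}] end)

# Per-column IN: checks each column independently.
per_column = Enum.filter(rows, fn r -> r.col1 in [1, 2] and r.col2 in [10, 20] end)

tuple_in   # => [] (the pair {1, 20} matches neither tuple)
per_column # => [%{col1: 1, col2: 20}] (the row slips through)
```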

@victorolinasc I’ve had some success using unnest and JOIN. Here’s an example:

    SELECT p.result_order as result_order, si.*
      FROM unnest($1::text[], $2::timestamp[], $3::timestamp[], $4::integer[])
      AS p(slug, starts_at, ends_at, result_order)
    LEFT JOIN sensors AS s ON s.slug = p.slug
    LEFT JOIN sensor_installations AS si
      ON s.id = si.sensor_id
      AND (si.activated_at, COALESCE(si.deactivated_at, p.ends_at + INTERVAL '5 minutes')) OVERLAPS
          (p.starts_at, p.ends_at)
    ORDER BY p.result_order ASC

The nice thing is that you get to at least use parameterized queries instead of interpolation, and if you join on multiple columns it should be perfectly able to use compound indices. The downside of course is that you’re still writing raw SQL. This is a pretty old query for us, you might be able to write the unnest as raw SQL but then use that as a subquery in a regular ecto query. Not sure.

2 Likes

Wow! Didn’t know unnest… Will try it out too.

Thanks!

I am about to open an issue on Ecto because I think tuples are a standard type in SQL and, therefore, should be supported. I know this might be tricky and huge to implement, and it might be used only once in a lifetime, but it is really one of those things where any workaround seems quite troublesome.

I would love Ecto to support tuples as well, too many times have I needed workarounds for lacking them…

You can express this in Ecto easily:

from row in "my_table",
  where: {row.col1, row.col2, row.col3} in [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}],
  select: row.col1

Will do exactly what you want.

1 Like

Hmm, it works now? It didn’t work when I first wrote this code (admittedly quite a while ago). I should update it, would make it faster than my workaround. ^.^;

I haven’t marked this as the solution because it only works with literal tuples. So, if you build a list of tuples dynamically (with values not known at compile time), it will not work.

@josevalim wrote on the Ecto mailing list:

We already support {m.col1, m.col2} in [{1, 2}, …] but the biggest issue is that users want to encode the value on the right-side of IN dynamically, and that is a much harder problem to solve.

The error from Ecto when you try to compile a dynamic list is like so:

list_of_tuples = load_from_some_place()
from row in "my_table",
  where: {row.col1, row.col2, row.col3} in ^list_of_tuples,
  select: row.col1
** (Ecto.Query.CompileError) Tuples can only be used in comparisons with literal tuples of the same size
    expanding macro: Ecto.Query.where/3
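One way to keep the values parameterized despite that limitation (a sketch, untested here) is to build an OR of per-row comparisons with Ecto.Query.dynamic/2 instead of a single IN. Whether Postgres plans this as efficiently as the tuple IN against a composite index is a separate question worth checking with EXPLAIN:

```elixir
import Ecto.Query

list_of_tuples = load_from_some_place()

# Fold the runtime list into one dynamic condition:
# (c1 AND c2 AND c3) OR (previous rows) OR ... OR false
conditions =
  Enum.reduce(list_of_tuples, dynamic(false), fn {a, b, c}, acc ->
    dynamic([row], (row.col1 == ^a and row.col2 == ^b and row.col3 == ^c) or ^acc)
  end)

from row in "my_table",
  where: ^conditions,
  select: row.col1
```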
1 Like

That’s not necessarily a bad way to do it, but it means you have to maintain four parallel lists.

I’m going to be implementing this as:

SELECT *
  FROM jsonb_to_recordset($1::jsonb) AS p(slug text, starts_at timestamp, ends_at timestamp, result_order integer)
  LEFT JOIN sensors AS s ON s.slug = p.slug
  LEFT JOIN sensor_installations AS si ON s.id = si.sensor_id
    AND (si.activated_at, COALESCE(si.deactivated_at, p.ends_at + INTERVAL '5 minutes')) OVERLAPS (p.starts_at, p.ends_at)
  ORDER BY p.result_order ASC;

I need to do this to perform the equivalent of delete_all for a table with composite keys. This should be easier to do with a list of json elements because of jsonb_to_recordset.

1 Like

Oh excellent I hadn’t seen that before! Were you able to come up with any Ecto syntax that would let you use the jsonb to record set within an ordinary ecto query? I figure the actual record set call has to be in a fragment, but it would be nice to compose that fragment inside a normal ecto query.

No. If/when it’s possible to specify raw SQL or a fragment as a source for from, then it should be pretty easy. I’m not sure how that would work, though.

The jsonb_to_recordset is something that I’ve only been looking at in the last 3–4 weeks, but it really does work well.

1 Like

I wanted to highlight how I’m using these (the code is going to production tomorrow). Essentially, I’m getting my data records like this:

activity =
  PendingDataExchange
  |> where(exchange_type: ^config.exchange)
  |> where([pde], pde.updated_at <= ^config.timestamp)
  |> select([pde], map(pde, ~w(user_id business_id exchange_type update_type)a))
  |> limit(^config.size)
  |> Repo.all(timeout: config.timeout)

Later, I’m loading records based on the activity in question:

records =
  load_records_sql()
  |> Repo.query([activity])
  |> case do
    {:ok, %{rows: rows}} -> {:ok, Enum.map(rows, &hd/1)}
    error -> error
  end

Finally, I’m purging my records:

Repo.query(purge_activity_sql(), [activity])

The SQL statements are complex, so I’ve simplified them some, but the load_records_sql/0 and purge_activity_sql/0 functions are just functions that return a triple-quoted (""") string.

load_records_sql/0:

SELECT jsonb_build_object(
         'update_type', e.update_type,
         'customer_id', c.id,
         'owner_id', bo.id,
         'customer', jsonb_build_object(
           'first_name', c.first_name,
           'last_name', c.last_name,
           'role', buc.role
         ),
         'business', jsonb_build_object(
           'name', b.name
         ),
         'membership', jsonb_build_object(
           'code', m.code,
           'assigned_at', m.assigned_at
         )
       ) AS event
  FROM jsonb_to_recordset($1::jsonb) AS e(
         user_id uuid, business_id uuid, update_type update_type
       )
 INNER JOIN users AS c ON c.id = e.user_id
 INNER JOIN businesses AS b ON b.id = e.business_id
 INNER JOIN memberships AS m ON m.user_id = c.id
 INNER JOIN business_users AS buc ON buc.user_id = c.id AND buc.business_id = b.id
 INNER JOIN business_users AS buo ON buo.business_id = b.id AND buo.role = 'manager'
 INNER JOIN users AS bo ON bo.id = buo.user_id;

Note that this returns a jsonb object for each row, which will simplify a lot of the processing moving forward. jsonb_build_object is almost as awesome as jsonb_to_recordset.

purge_activity_sql/0:

DELETE FROM pending_data_exchange
 USING jsonb_to_recordset($1::jsonb) AS e(
         user_id uuid, business_id uuid, exchange_type citext
       )
 WHERE e.user_id = pending_data_exchange.user_id
   AND e.business_id = pending_data_exchange.business_id
   AND e.exchange_type = pending_data_exchange.exchange_type;

I just wish that I could somehow build something from a Repo.query or fragment that would let me use it as an Ecto source (e.g., just the jsonb_to_recordset($1::jsonb) AS … bit).
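If I remember correctly, more recent Ecto versions (3.x) do allow a fragment as the from source, which would cover exactly that wish. A sketch, assuming the feature exists in your version (note that an explicit select is required, since Ecto cannot infer the columns of a fragment source):

```elixir
import Ecto.Query

from e in fragment(
       "SELECT * FROM jsonb_to_recordset(?::jsonb) AS e(user_id uuid, business_id uuid)",
       ^activity
     ),
  select: %{user_id: e.user_id, business_id: e.business_id}
```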

1 Like

Here is an example of both methods, the array per field and the jsonb_to_recordset, using Ecto.

Array per field + unnest:

ids  = [ 1,  2,  1]
ages = [10, 20, 30]

from x in Friends.Person,
  inner_join: j in fragment("SELECT DISTINCT * FROM unnest(?::int[], ?::int[]) AS j(id, age)", ^ids, ^ages),
  on: x.id == j.id and x.age == j.age,
  select: [:name]

jsonb_to_recordset:

list = [%{id: 1, age: 10},
        %{id: 2, age: 20},
        %{id: 1, age: 30}]

from x in Friends.Person,
  inner_join: j in fragment("SELECT DISTINCT * FROM jsonb_to_recordset(?) AS j(id int, age int)", ^list),
  on: x.id == j.id and x.age == j.age,
  select: [:name]
6 Likes