Trouble with `row_number/0` and `with_cte/3`

I have a table of events, where some of the rows pertain to devices. I’m looking to select/delete the first/last N events per device in a single statement. The device ID is not itself a column of the table, but computed from one of the other columns.

So, first of all I managed to do what I want in SQLite using ROW_NUMBER(). These two queries should be more or less equivalent, and do what I expect:

Using nested queries
SELECT *
FROM (
	SELECT *, ROW_NUMBER() OVER (PARTITION BY device_id ORDER BY emitted_at) AS rn
	FROM (
		SELECT *, COALESCE(JSON_EXTRACT(arguments, '$.context.device_id'), JSON_EXTRACT(arguments, '$.arguments.device_id')) AS device_id
		FROM events
	)
);
Using WITH
WITH
	events_with_device_id AS (
			SELECT *, COALESCE(JSON_EXTRACT(arguments, '$.context.device_id'), JSON_EXTRACT(arguments, '$.arguments.device_id')) AS device_id
			FROM events
	),
	events_with_device_id_and_row_number AS (
		SELECT *, ROW_NUMBER() OVER (PARTITION BY device_id ORDER BY emitted_at) AS rn
		FROM events_with_device_id
	)
SELECT * FROM events_with_device_id_and_row_number;

Now I’m trying and failing to accomplish the same thing with Ecto.

First I tried the most straightforward option, putting everything in a single from/2:

from e in Event,
  windows: [device_id: [partition_by: e.device_id, order_by: :emitted_at]],
  select_merge: %{
    device_id: device_id_column(e),
    rn: row_number() |> over(:device_id)
  }

(device_id_column/1 is the COALESCE(...) you see in SQL)

This failed as I'd expected
[debug] QUERY ERROR source="events" db=0.0ms queue=0.1ms idle=1036.2ms
SELECT e0."id", e0."type", e0."arguments", e0."emitted_at", e0."inserted_at", e0."updated_at", coalesce(json_extract(e0."arguments", '$.context.device_id'), json_extract(e0."arguments", '$.arguments.device_id')), row_number() OVER "device_id" FROM "events" AS e0 WINDOW "device_id" AS (PARTITION BY e0."device_id" ORDER BY e0."emitted_at") LIMIT 1 []
↳ :elixir.eval_external_handler/3, at: src/elixir.erl:371
** (Exqlite.Error) no such column: e0.device_id
SELECT e0."id", e0."type", e0."arguments", e0."emitted_at", e0."inserted_at", e0."updated_at", coalesce(json_extract(e0."arguments", '$.context.device_id'), json_extract(e0."arguments", '$.arguments.device_id')), row_number() OVER "device_id" FROM "events" AS e0 WINDOW "device_id" AS (PARTITION BY e0."device_id" ORDER BY e0."emitted_at") LIMIT 1
    (ecto_sql 3.13.4) lib/ecto/adapters/sql.ex:1113: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql 3.13.4) lib/ecto/adapters/sql.ex:1011: Ecto.Adapters.SQL.execute/6
    (ecto 3.13.5) lib/ecto/repo/queryable.ex:241: Ecto.Repo.Queryable.execute/4
    (ecto 3.13.5) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
    iex:36: (file)

Then I tried the second most straightforward option:

events_with_device_id = from e in Event,
  select_merge: %{device_id: device_id_column(e)}

from e in events_with_device_id,
  windows: [device_id: [partition_by: e.device_id, order_by: :emitted_at]],
  select_merge: %{rn: row_number() |> over(:device_id)}
This also failed; it's how Ecto queries are composed after all
[debug] QUERY ERROR source="events" db=0.0ms idle=1531.7ms
SELECT e0."id", e0."type", e0."arguments", e0."emitted_at", e0."inserted_at", e0."updated_at", coalesce(json_extract(e0."arguments", '$.context.device_id'), json_extract(e0."arguments", '$.arguments.device_id')), row_number() OVER "device_id" FROM "events" AS e0 WINDOW "device_id" AS (PARTITION BY e0."device_id" ORDER BY e0."emitted_at") LIMIT 1 []
↳ :elixir.eval_external_handler/3, at: src/elixir.erl:371
** (Exqlite.Error) no such column: e0.device_id
SELECT e0."id", e0."type", e0."arguments", e0."emitted_at", e0."inserted_at", e0."updated_at", coalesce(json_extract(e0."arguments", '$.context.device_id'), json_extract(e0."arguments", '$.arguments.device_id')), row_number() OVER "device_id" FROM "events" AS e0 WINDOW "device_id" AS (PARTITION BY e0."device_id" ORDER BY e0."emitted_at") LIMIT 1
    (ecto_sql 3.13.4) lib/ecto/adapters/sql.ex:1113: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql 3.13.4) lib/ecto/adapters/sql.ex:1011: Ecto.Adapters.SQL.execute/6
    (ecto 3.13.5) lib/ecto/repo/queryable.ex:241: Ecto.Repo.Queryable.execute/4
    (ecto 3.13.5) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
    iex:35: (file)

So I turned to with_cte/3:

events_with_device_id =
  from e in Event,
    select_merge: %{device_id: device_id_column(e)}

{"events_with_device_id", Event}
|> with_cte("events_with_device_id", as: ^events_with_device_id)
|> windows(device_id_window: [partition_by: :device_id, order_by: :emitted_at])
|> select_merge(%{rn: row_number() |> over(:device_id_window)})
And of course it also failed, otherwise I wouldn't be here ^^
[debug] QUERY OK source="events_with_device_id" db=0.9ms idle=1269.0ms
WITH "events_with_device_id" AS (SELECT se0."id" AS "id", se0."type" AS "type", se0."arguments" AS "arguments", se0."emitted_at" AS "emitted_at", se0."inserted_at" AS "inserted_at", se0."updated_at" AS "updated_at", coalesce(json_extract(se0."arguments", '$.context.device_id'), json_extract(se0."arguments", '$.arguments.device_id')) AS "device_id" FROM "events" AS se0) SELECT e0."id", e0."type", e0."arguments", e0."emitted_at", e0."inserted_at", e0."updated_at", row_number() OVER "device_id_window" FROM "events_with_device_id" AS e0 WINDOW "device_id_window" AS (PARTITION BY e0."device_id" ORDER BY e0."inserted_at") LIMIT 1 []
↳ :elixir.eval_external_handler/3, at: src/elixir.erl:371
** (ArgumentError) struct Event does not have the key :rn
    (ecto 3.13.5) lib/ecto/repo/queryable.ex:327: anonymous fn/4 in Ecto.Repo.Queryable.process/4
    (stdlib 7.2) maps.erl:894: :maps.fold_1/4
    (ecto 3.13.5) lib/ecto/repo/queryable.ex:326: Ecto.Repo.Queryable.process/4
    (ecto 3.13.5) lib/ecto/repo/queryable.ex:278: Ecto.Repo.Queryable.preprocess/4
    (elixir 1.19.5) lib/enum.ex:1688: Enum."-map/2-lists^map/1-1-"/2
    (ecto 3.13.5) lib/ecto/repo/queryable.ex:246: Ecto.Repo.Queryable.execute/4
    (ecto 3.13.5) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
    iex:43: (file)

From the generated query I can see that there’s no AS rn next to the ROW_NUMBER() ... result expression, which could explain the error. Why is it missing, though?! Am I holding it wrong?

Thanks in advance!

I don’t use Ecto much, but if you have an SQL query that works, can you use

Ecto.Adapters.SQL.query()

or one of it’s friends? That way you don’t need to rely on query generation.

I think there are a few things here. What you call “nested” queries are subqueries. While subqueries in sql are only denoted by (…) – so barely visible – in ecto you need to be more explict and use subquery(query). This should help you get to where you want to go here.

WITH (when not used for recursive CTE) is mostly a more readable syntax for using subqueries, because you do not need to nest, but you can list queries one after the other. A bit like elixir pipelines and not using them. Given you have the expressiveness of elixir even with subqueries using with/cte should only be necessary for recursive cte.

But for the issue you get: The AS rn is not the problem. Ecto probably maps by column index, so the column label shouldn’t matter. But you’re selecting Event structs by using {"events_with_device_id", Event} as the queryable. This one means selecting Event schema structs, but using "events_with_device_id" as the source table over the default defined on the schema. With select_merge(%{rn: row_number() |> over(:device_id_window)}) you add an additional field :rn to results. Event struct doesn’t have a key :rn though. Ecto hence cannot give you Event structs including that additional key you select_merged – struct have a fixed set of keys after all.

You can either change what you select, e.g. returning plain maps over Event structs, or you can make the Event struct have that additional key.

2 Likes

TIL! This was indeed enough to get what I wanted.

EDIT: might as well add the working code:

events_with_device_id =
  from e in Event,
    select_merge: %{device_id: device_id_column(e)}

events_with_device_id_and_row_number =
  from e in subquery(events_with_device_id),
    windows: [device_id_window: [partition_by: :device_id, order_by: :inserted_at]],
    select_merge: %{row_number: row_number() |> over(:device_id_window)}

from e in subquery(events_with_device_id_and_row_number),
  where: [row_number: 1]

Correct again. I raised my eye-brow a little when I read it, because I got the device_id into the result the same way, but then I remembered I added the device_id virtual field to the Events schema while playing along the way.

Since structs are just maps, is there really no way to select %Event{}s that happen to have extra fields? In this case adding the device_id and rn virtual fields to the schema would be fine, but I wouldn’t like to do that generally for every possible returned column.

The whole reason for structs to exist is to limit a map to a static known set. You’re free to have many different structs for different purposes or drop using a struct in favor of plain maps for truely dynamic keys.

2 Likes

Yeah, I could do that, but past experience makes me want to avoid that at all costs. Plus flexing the Ecto muscle etc.

While refining the code, I removed the virtual fields (device_id and row_number), and when running the working query I get this in the list of results:

%{
    id: 12001,
    type: "...",
    arguments: %{...},
    __struct__: Event,
    device_id: "device_id_13",
    emitted_at: ~U[2026-02-04 10:28:30Z],
    inserted_at: ~U[2026-03-03 08:33:43Z],
    updated_at: ~U[2026-03-03 08:33:43Z],
    row_number: 1,
    __meta__: #Ecto.Schema.Metadata<:loaded, "events">
  }

Notice the %{...} in place of %Event{...}), but more importantly, no errors about columns that don’t exist… :thinking:

Yes. Elixir cannot prevent you from using map operations or recompilations to result in a struct, which doesn’t match its defined fields. But generally you want to avoid those. These’s are a fallback not a feature.

That was straight out of Ecto, though, I didn’t do any map operations, if that’s what you mean.