Ecto Postgres select ARRAY constructor

Hello,

We recently did some optimizations on a slow query in our system, and found the following article from dba exchange that has a nice syntax for selecting an array of items per row

This ended up being a more performant solution from what we were using previously. Below is a simple example of what we’re trying to do in raw postgresql

select *
from calls c
  left join lateral (
    select array (
        select cr.id
        from call_recordings cr
        where cr.call_id = c.id
          and cr.s3_key is not null) as call_recording_ids) as call_recordings on true
limit 10;

As far as I can tell there isn’t an easy way to do this in Ecto. The closest translation to Ecto I’ve got so far is this

  def test_select_array do
    call_recordings_query =
      from cr in Recording,
        select: %{
          call_recording_ids:
            fragment(
              "(SELECT ARRAY(SELECT cr.id FROM call_recordings cr WHERE cr.call_id = ? AND cr.s3_key IS NOT NULL))",
              parent_as(:call).id
            )
        }

    from(c in Call,
      as: :call,
      left_lateral_join: cr in subquery(call_recordings_query),
      limit: 10,
      select: %{
        call_id: c.id,
        call_recording_ids: cr
      }
    )
  end

But even that isn’t quite right, and it’s janky. Looking for any ideas/guidance on this. Maybe this is something Ecto supports without a fragment and I’m not aware of it, or could support?

– Will

2 Likes

You could move the fragment to a macro to make the query cleaner, here is an example on how that looks like:

defmacro st_transform(wkt, srid) do
  quote do: fragment("ST_Transform(?, ?)", unquote(wkt), unquote(srid))
end
1 Like

This is where Ecto start to show it’s brittleness, as one of the best ORM I have ever worked with, it still falls short in a lot of real world cases and especially on a enterprise level where you usually have dedicated DBAs that don’t know Elixir, GitHub - elixir-dbvisor/sql: Brings an extensible SQL parser and sigil to Elixir, confidently write SQL with automatic parameterized queries. could help you out here, still fresh but the idea is simple, lower the barrier between Elixir and SQL.

1 Like

I appreciate the help on this. I tried the macro route, but I ran into some compilation issues with how I was using it. It almost feels like there should be an array concept in Ecto that expects a subquery so something like this could be written:

    call_recording_query =
      from(cr in CallRecording,
        where: parent_as(:call).id == cr.call_id,
        select: cr.id
      )

    from(c in Call,
      as: :call,
      left_lateral_join: call_recording_ids in array(call_recording_query),
      select: %{
        id: c.id,
        call_recording_ids: call_recording_ids
      }
    )

Maybe something Ecto could support only for lateral joins? I could see this being useful in situations where preloads aren’t an option. I’d be happy to work on a PR to Ecto so.

Fortunately we were able to refactor this query to use preloads where we only selected the fields that we needed. Like so:

    from(c in Call,
      as: :call,
      join: u in assoc(c, :user),
      as: :user,
      left_join: cp in assoc(c, :contact_phone),
      as: :contact_phone,
      left_join: con in assoc(cp, :contact),
      as: :contact,
      left_join: pc in PatientContact,
      on: pc.patient_id == c.patient_id and pc.contact_id == con.id,
      left_join: ca in assoc(c, :call_audit),
      as: :call_audit,
      left_join: au in assoc(ca, :auditor),
      as: :auditor,
      left_join: p in assoc(c, :patient),
      as: :patient,
      left_join: d in assoc(c, :disposition),
      as: :disposition,
      left_join: cr in assoc(c, :call_recordings),
      preload: [
        user: u,
        patient: p,
        disposition: d,
        call_recordings: cr,
        call_audit: {ca, auditor: au},
        contact_phone: {cp, contact: {con, patient_contacts: pc}}
      ],
      order_by: [desc: c.started_at],
      select:
        map(c, [
          :id,
          :zoom_call_id,
          :started_at,
          :ended_at,
          :call_type,
          :purpose,
          :call_uuid,
          :notes,
          :patient_id,
          :thrio_workitem_id,
          :zoom_external_phone_number,
          user: [:id, :first_name, :last_name],
          patient: [:id, :first_name, :last_name, :preferred_language],
          contact_phone: [
            :id,
            :phone_number,
            :status,
            contact: [:id, :name, patient_contacts: [:id, :status]]
          ],
          call_audit: [:id, :status, auditor: [:full_name]],
          call_recordings: [:id, :call_id, :s3_key, :transcript_raw]
        ])
    )
    |> Call.with_duration()

Those two queries are very different, and over time different user might experience different performance.

But that said, if you are a young startup, then you can easily kick this can down the road.

Bare selects are not supported by ecto queries. That’s your issue, not necessarily the array part.

import Ecto.Query

array_query = 
  from cr in "call_recordings",
    where: cr.call_id == parent_as(:call).id,
    where: not is_nil(cr.s3_key),
    select: %{
      id: cr.id
    }

call_recordings_query =
      from cr in subquery(array_query),
        select: %{
          call_recording_ids: fragment("array_agg(?)", cr.id)
        }
query = 
  from(c in "calls",
    as: :call,
    left_lateral_join: cr in subquery(call_recordings_query), on: true,
    limit: 10,
    select: %{
      call_id: c.id,
      call_recording_ids: cr
    }
  )

{sql, _} = Repo.to_sql(:all, query)
IO.puts(sql)
# SELECT c0."id", s1."call_recording_ids" 
# FROM "calls" AS c0 
# LEFT OUTER JOIN LATERAL (
#   SELECT array_agg(ss0."id") AS "call_recording_ids" 
#   FROM (
#     SELECT ssc0."id" AS "id" 
#     FROM "call_recordings" AS ssc0 
#     WHERE (ssc0."call_id" = c0."id") AND (NOT (ssc0."s3_key" IS NULL))
#   ) AS ss0
# ) AS s1 ON TRUE 
# LIMIT 10

Functionally this should yield the same result as the requested query