How to merge duplicate records in subquery

Jedas · March 12, 2024, 6:45pm

I initially had this raw SQL query that I needed to translate

select * from entities e
   join users on users.id = e.user_id
   left join (select user_countries.* from user_countries
   join countries on user_countries.country_id = countries.id and countries.real_estate_id = ?) as a on a.user_id = users.id
   where
   e.assurance_id = ?
   and e.state in (?, ?)
   order by e.created_at desc;

And I was able to translate this to Ecto query

query =
      from entity in Entity,
        join: u in assoc(entity, :user),
        left_join:
          a in subquery(
            from user_country in UserCountry,
              join: country in Country,
              on: user_country.country_id == country.id,
              where: country.real_estate_id == ^real_estate_id,
              select: %{
                country_assoc_user_id: user_country.user_id,
                id: country.id,
                name: country.name,
                created_at: country.created_at,
                updated_at: country.updated_at
              }
          ),
        on: a.country_assoc_user_id == u.id,
        where:
        entity.assurance_id == ^assurance_id and
        entity.state in ^states
        order_by: [desc: entity.created_at],
        limit: ^limit,
        select: %{
          entity
          | user: %{
              u
              | countries: [
                  %{
                    id: a.id,
                    name: a.name,
                    created_at: a.created_at,
                    updated_at: a.updated_at
                  }
                ]
            }
        }

    query
    |> Repo.all()
    |> merge_countries()

Is this is the most elegant way to do this? Also it seems to me like it’s a lot of work to merge countries under user using custom code in Elixir inside merge_countries()

al2o3cr · March 12, 2024, 7:57pm

A few (mostly-independent) questions in no particular order:

Jedas:

        left_join:
          a in subquery(
            from user_country in UserCountry,
              join: country in Country,
              on: user_country.country_id == country.id,
              where: country.real_estate_id == ^real_estate_id,
              select: %{
                country_assoc_user_id: user_country.user_id,
                id: country.id,
                name: country.name,
                created_at: country.created_at,
                updated_at: country.updated_at
              }

This puts rows from countries into the results, but the original SQL only returns user_countries.* from the subquery. Was that intentional?

One optimization / simplification that may help: consider moving the filter for countries by real_estate_id out of the query entirely.

For instance, if you’ve already fetched the relevant country IDs your query could look like:

from entity in Entity,
join: u in assoc(entity, :user),
left_join: a in assoc(u, :user_countries), on: a.country_id in ^relevant_ids,
where:
  entity.assurance_id == ^assurance_id and
  entity.state in ^states,
order_by: [desc: entity.created_at],
limit: ^limit,
preload: [user: [countries: from(c in Country, where: c.id in ^relevant_ids)]]

What specifically is going on in merge_countries?

Jedas · March 12, 2024, 9:30pm

Thanks!

Yes, I’m only putting countries rows in the result, because I need it for the API json response.

Well I don’t have the relevant country IDs, so I can’t give up on

left join (select user_countries.* from user_countries
   join countries on user_countries.country_id = countries.id and countries.real_estate_id = ?) as a on a.user_id = users.id

merge_countries() merges the countries under the user struct. Because this query returns flat data from database, so there will be entities with the same user, but different countries, for those, I merge countries together.

 defp merge_countries(entities) do
    entities
    |> Enum.group_by(& &1.user_id)
    |> Enum.map(fn {_user_id, user_list} ->
     // this merges user and its countries
      merge_entities_with_same_id(user_list)
    end)
  end