Query in preload produces different result than by itself

Hey

I have a users - teams many to many setup through a users_teams table.

In the team schema I have the following fields mapping to the users:

    has_many :active_users_teams, UserTeam
    has_many :active_users, through: [:active_users_teams, :user]
    field :active_member_count, :integer, virtual: true, default: nil

In the users table I have these fields that I fill in queries based on the given team:

    field :role_in_selected_team, :string, virtual: true
    field :status_in_selected_team, :string, virtual: true

These work great separately.

Now I am trying to combine the two, so I can preload the users under a team and have these fields on the user loaded as well.

This code is just from messing around:

    user_with_role =
      from(
        ut in UserTeam,
        join: u in assoc(ut, :user),
        select: %User{u | role_in_selected_team: ut.role, status_in_selected_team: ut.status}
      )

    team_with_member_count =
      from(
        t in Team,
        join: ut in assoc(t, :active_users_teams),
        group_by: t.id,
        preload: [:owner, active_users: ^user_with_role],
        select: %Team{t | active_member_count: count(ut.id)}
      )

    teams = Repo.all(team_with_member_count)
    team = Enum.find(teams, fn team -> team.id == 3 end)

    IO.inspect(team.active_users |> length())
    IO.inspect(team.active_users_teams |> length())
    members = Repo.all(user_with_role)
    IO.inspect(members |> length())

I observed this strange thing, that as you see I print out 3 numbers.

The team in question has 25 members.

The output is somehow this:

5
25
25

As you see the latter 2 numbers are 25, the last is the query executed by itself that is used in the active_users field preload.

The middle one is the active_users_teams field form the team schema that gets loaded when the active_users automatically.

Now as you se the first number is not 25, when I use the user_with_role query in the preload of the active_users field.
Note: For running the first query by itself I supplied a static team_id(that matches the incoming one) in the query, else it returns all of them.

Just selecting u in the first query does not change the result.

If I don’t do the custom preload query for active_users it returns 25 users as well.

What am I messing up here?

Any ideas?

I’d suggest turning on logging of queries and compare the actual queries issued to the db.

OK here are the logs, I think only this part is relevant from each, only this is different:
1 - custom select in preload:

[debug] QUERY OK source="users_teams" db=0.3ms idle=6.6ms
SELECT u0."id", u1."id", u1."email", u1."name", u0."role", u0."status" 
FROM "users_teams" AS u0 
INNER JOIN "users" AS u1 ON u1."id" = u0."user_id" 
WHERE (u0."team_id" = 3) AND (u0."id" = ANY($1)) 
[[1, 2, 46, 81, 136, 44, 45, 135, 123, 82, 43, 42, 139, 138, 137, 89, 90, 133, 92, 93, 95, 96, 114, 115, 122, 134, 132, 129]]

2 - regular preload:

[debug] QUERY OK source="users" db=0.3ms idle=6.7ms
SELECT u0."id", u0."email", u0."name", u0."id" 
FROM "users" AS u0 WHERE (u0."id" = ANY($1)) 
[[1, 2, 46, 81, 136, 44, 45, 135, 123, 82, 43, 42, 139, 138, 137, 89, 90, 133, 92, 93, 95, 96, 114, 115, 122, 134, 132, 129]]

3 - preload query by itself:

[debug] QUERY OK source="users_teams" db=0.4ms idle=6.8ms
SELECT u1."id", u1."email", u1."name", u0."role", u0."status" 
FROM "users_teams" AS u0 
INNER JOIN "users" AS u1 ON u1."id" = u0."user_id" WHERE (u0."team_id" = 3) []

Ok so the problem seems to lie in the fact that in the 1st one, even though u0 refers to the users_teams table, at the end of the query the list of ids it is being checked again is in fact the users ids.

Which explains why in the regular preload we have the same list of ids at the end of the log, and why that works.

I checked against the db and it confirms this.

Now as to why this happens I don’t really have an idea yet.
Could have something to do with the through in the team schema:

    has_many :active_users, through: [:active_users_teams, :user]

In the docs here: Ecto.Query — Ecto v3.11.1

There is this part:

For has_many :through - it behaves similarly to a regular has_many but note that the IDs received are of the last association. Imagine, for example, a post has many comments and each comment has an author. Therefore, a post may have many comments_authors, written as has_many :comments_authors, through: [:comments, :author]. When preloading authors with a custom function via :comments_authors, the function will receive the IDs of the authors as the last step

It seems to be reasonable to assume that this is what is happening in my query as well.
The user ids are being passed.

What I am trying to do here might not be an indented use case.