Counting multiple foreign key associations

Hello everyone,

I am wondering how to make a composable query which returns counts from multiple foreign keys.

For example,

I have an accounts table, a users table and a devices table. There are more tables with associations, but for example’s sake just using these 2.

Users and devices have a fk to accounts via account_id
devices has an additional fk to users by user_id

What I want to get is.

user and device counts per the Fk to accounts in a summary like

Additionally I want to be able to pass a start and end date as well to each foreign key. Basically ‘retrieve all users and devices created between 09-01-2019 and 09-12-2019’

I have one a query to filter for dates with

def provisioned_during(query, %{start: start, finish: finish}) do
  from q in query, where: fragment("provisioned_on BETWEEN ? AND ?", ^start, ^finish)

I think the SQL would be something like this

  count(u.user_id) as user_count, 
  count(d.device_id) as device_count
FROM account a
INNER JOIN users u ON a.account_id = u.account_id AND provisioned_on BETWEEN ? AND ?
INNER JOIN devices d ON a.account_id = d.account_id AND provisioned_on BETWEEN ? AND ?
GROUP BY a.account_id;

Which translates to ecto something like

from a in "accounts",
join: u in "users",
on: a.account_id == u.account_id and fragment("provisioned_on BETWEEN ? AND ?", ^start, ^finish),
join: d in "devices",
on: a.account_id == d.account_id and fragment("provisioned_on BETWEEN ? AND ?", ^start, ^finish),
group_by: a.account_id,
select: %{account_name: a.account_name, 
          user_count: count(u.user_id),
         device_count: count(d.device_id)}


I had to use subqueries to get correct counts. I was not able to get correct counts with just the joins.

The query which seems to work properly is:

SELECT name, user_count, device_count FROM (
    (SELECT COUNT(*) from users u where u.account_id = AND u.provisioned_on BETWEEN '2019-09-01 00:00:00' AND '2019-09-30 23:59:59') as user_count,
    (SELECT COUNT(*) from devices d where d.account_id = AND d.provisioned_on BETWEEN '2019-09-01 00:00:00' AND '2019-09-30 23:59:59') as device_count
FROM accounts a
WHERE r1.user_count > 0 or r1.device_count > 0;

I am sure there is a better, more efficient/elegant way to get the count than how I have done it.

The main subquery r1 is so that I filter by accounts having those changes in that time period. Without it, I get 0 for accounts which have no users or devices added, which is fine

1 Like

What if you try with a left_join instead of a join?

Otherwise I guess you will only get results back where an account has both a device and a user.

EDIT: Actually. Let me think. There should be a way to do it without subqueries.

1 Like

The problem is that the counts are incorrect with join and left_join

With subqueries:


With join or left_join


The columns seem to be multiplied with each other.

They would be, you need to aggregate/group the results.

What is the relationship between user and device?

Does a device always belong to a user? and the account_id is only there for convenience?

So you have: Accounts -> User -> Device?

Or can devices hang off accounts free-standing?

If a device always belong to a user you can do the join like this:

FROM  account a 
LEFT JOIN user u ON = u.account_id and u.added_on BETWEEN ? AND ?
LEFT JOIN device d ON = d.user_id AND d.added_on BETWEEN ? AND ? 

Which seems to work better on the test tables I setup.

@cmkarlsson and @OvermindDL1

Thanks for your responses!

So the structure is like this.

Accounts have many users, devices.
Users can have many devices.

The associations are:
user -> references accounts on == user.account_id
device -> references accounts on == device.account_id
device -> references users on == device.account_id

Device can be orphaned without having a corresponding user and null is a valid value for user_id in devices and the other tables in the db.

All tables have the following date columns
provisioned_on and updated on which are UTC Datetime fields.

And adding a distinct count works if they are separate relationships:

FROM account a 
LEFT JOIN user u ON = u.account_id and u.added_on BETWEEN ? AND ?
LEFT JOIN device d on = d.account_id and d.added_on BETWEEN ? AND ? 
HAVING count( > 0 OR count( > 0

which in ecto should be:

def counts(s, e) do:

q = from(a in "account", 
left_join: u in "user", 
on: u.account_id == and (u.added_on >= ^s and u.added_on <= ^e), 
left_join: d in "device", 
on: d.account_id == and (d.added_on >= ^s and d.added_on <= ^e), 
select: %{account:, 
          user_count: count(, :distinct), 
           device_count: count(, :distinct)}, 

So this is almost as the initial one with the exception of:

  • left_join instead of join
  • count with (:distinct) option rather than just count

I skipped the fragment because I couldn’t figure out how to use table aliases in the ecto fragements.
I didn’t include the having clause in ecto but that should hopefully be simple.

With my little test data this seems to work fine


@cmkarlsson Thank you!!

That query works. The Ecto query also works.

I have a function which provides date ranges for me to do this with using Timex, so I can just call that function to get the entries for start and finish timestamps.

I added an Enum.reject after returning to results to filter out the accounts where all the counts are zero.


You could add that filtering to the query in the having section as I recall (I just got to work, need my coffee still so I hope this is legible). ^.^;

1 Like