Counting multiple foreign key associations

Hello everyone,

I am wondering how to make a composable query which returns counts from multiple foreign keys.

For example,

I have an accounts table, a users table and a devices table. There are more tables with associations, but for example’s sake just using these 2.

Users and devices have a fk to accounts via account_id
devices has an additional fk to users by user_id

What I want to get is.

user and device counts per the Fk to accounts in a summary like
“account_name”,“user_count”,“device_count”
Account1,10,4
Account2,20,6

Additionally I want to be able to pass a start and end date as well to each foreign key. Basically ‘retrieve all users and devices created between 09-01-2019 and 09-12-2019’

I have one a query to filter for dates with

def provisioned_during(query, %{start: start, finish: finish}) do
  from q in query, where: fragment("provisioned_on BETWEEN ? AND ?", ^start, ^finish)
end

I think the SQL would be something like this


SELECT 
  account_name, 
  count(u.user_id) as user_count, 
  count(d.device_id) as device_count
FROM account a
INNER JOIN users u ON a.account_id = u.account_id AND provisioned_on BETWEEN ? AND ?
INNER JOIN devices d ON a.account_id = d.account_id AND provisioned_on BETWEEN ? AND ?
GROUP BY a.account_id;

Which translates to ecto something like

from a in "accounts",
join: u in "users",
on: a.account_id == u.account_id and fragment("provisioned_on BETWEEN ? AND ?", ^start, ^finish),
join: d in "devices",
on: a.account_id == d.account_id and fragment("provisioned_on BETWEEN ? AND ?", ^start, ^finish),
group_by: a.account_id,
select: %{account_name: a.account_name, 
          user_count: count(u.user_id),
         device_count: count(d.device_id)}

@cmkarlsson

I had to use subqueries to get correct counts. I was not able to get correct counts with just the joins.

The query which seems to work properly is:

SELECT name, user_count, device_count FROM (
SELECT
    a.name,
    (SELECT COUNT(*) from users u where u.account_id = a.id AND u.provisioned_on BETWEEN '2019-09-01 00:00:00' AND '2019-09-30 23:59:59') as user_count,
    (SELECT COUNT(*) from devices d where d.account_id = a.id AND d.provisioned_on BETWEEN '2019-09-01 00:00:00' AND '2019-09-30 23:59:59') as device_count
FROM accounts a
GROUP BY a.id) r1
WHERE r1.user_count > 0 or r1.device_count > 0;

I am sure there is a better, more efficient/elegant way to get the count than how I have done it.

The main subquery r1 is so that I filter by accounts having those changes in that time period. Without it, I get 0 for accounts which have no users or devices added, which is fine

1 Like

What if you try with a left_join instead of a join?

Otherwise I guess you will only get results back where an account has both a device and a user.

EDIT: Actually. Let me think. There should be a way to do it without subqueries.

1 Like

The problem is that the counts are incorrect with join and left_join

With subqueries:

name,user_count,device_count
"Account1","2","1"
"Account2","1","1"
"Account3","1","2"
"Account4","5","9"
"Account5","4","4"
"Account6","6","2"

With join or left_join

name,user_count,device_count
"Account1","2","2"
"Account2","1","1"
"Account3","2","2"
"Account4","45","45"
"Account5","16","16"
"Account6","12","12"

The columns seem to be multiplied with each other.

They would be, you need to aggregate/group the results.

What is the relationship between user and device?

Does a device always belong to a user? and the account_id is only there for convenience?

So you have: Accounts -> User -> Device?

Or can devices hang off accounts free-standing?

If a device always belong to a user you can do the join like this:

SELECT 
   a.id, 
   count(u.id), 
   count(d.id) 
FROM  account a 
LEFT JOIN user u ON a.id = u.account_id and u.added_on BETWEEN ? AND ?
LEFT JOIN device d ON u.id = d.user_id AND d.added_on BETWEEN ? AND ? 
GROUP BY a.id;

Which seems to work better on the test tables I setup.

@cmkarlsson and @OvermindDL1

Thanks for your responses!

So the structure is like this.

Accounts have many users, devices.
Users can have many devices.

The associations are:
user -> references accounts on account.id == user.account_id
device -> references accounts on account.id == device.account_id
device -> references users on user.id == device.account_id

Device can be orphaned without having a corresponding user and null is a valid value for user_id in devices and the other tables in the db.

All tables have the following date columns
provisioned_on and updated on which are UTC Datetime fields.

And adding a distinct count works if they are separate relationships:

SELECT 
 a.name, 
 count(distinct u.id), 
 count(distinct d.id) 
FROM account a 
LEFT JOIN user u ON a.id = u.account_id and u.added_on BETWEEN ? AND ?
LEFT JOIN device d on a.id = d.account_id and d.added_on BETWEEN ? AND ? 
GROUP BY a.id
HAVING count(u.id) > 0 OR count(d.id) > 0

which in ecto should be:

def counts(s, e) do:

q = from(a in "account", 
left_join: u in "user", 
on: u.account_id == a.id and (u.added_on >= ^s and u.added_on <= ^e), 
left_join: d in "device", 
on: d.account_id == a.id and (d.added_on >= ^s and d.added_on <= ^e), 
select: %{account: a.name, 
          user_count: count(u.id, :distinct), 
           device_count: count(d.id, :distinct)}, 
group_by: a.id)

So this is almost as the initial one with the exception of:

  • left_join instead of join
  • count with (:distinct) option rather than just count

I skipped the fragment because I couldn’t figure out how to use table aliases in the ecto fragements.
I didn’t include the having clause in ecto but that should hopefully be simple.

With my little test data this seems to work fine

2 Likes

@cmkarlsson Thank you!!

That query works. The Ecto query also works.

I have a function which provides date ranges for me to do this with using Timex, so I can just call that function to get the entries for start and finish timestamps.

I added an Enum.reject after returning to results to filter out the accounts where all the counts are zero.

2 Likes

You could add that filtering to the query in the having section as I recall (I just got to work, need my coffee still so I hope this is legible). ^.^;

1 Like