I am wondering how to make a composable query which returns counts from multiple foreign keys.
For example,
I have an accounts table, a users table and a devices table. There are more tables with associations, but for example’s sake just using these 2.
Users and devices have a fk to accounts via account_id
devices has an additional fk to users by user_id
What I want to get is.
user and device counts per the Fk to accounts in a summary like
“account_name”,“user_count”,“device_count”
Account1,10,4
Account2,20,6
Additionally I want to be able to pass a start and end date as well to each foreign key. Basically ‘retrieve all users and devices created between 09-01-2019 and 09-12-2019’
I have one a query to filter for dates with
def provisioned_during(query, %{start: start, finish: finish}) do
from q in query, where: fragment("provisioned_on BETWEEN ? AND ?", ^start, ^finish)
end
SELECT
account_name,
count(u.user_id) as user_count,
count(d.device_id) as device_count
FROM account a
INNER JOIN users u ON a.account_id = u.account_id AND provisioned_on BETWEEN ? AND ?
INNER JOIN devices d ON a.account_id = d.account_id AND provisioned_on BETWEEN ? AND ?
GROUP BY a.account_id;
Which translates to ecto something like
from a in "accounts",
join: u in "users",
on: a.account_id == u.account_id and fragment("provisioned_on BETWEEN ? AND ?", ^start, ^finish),
join: d in "devices",
on: a.account_id == d.account_id and fragment("provisioned_on BETWEEN ? AND ?", ^start, ^finish),
group_by: a.account_id,
select: %{account_name: a.account_name,
user_count: count(u.user_id),
device_count: count(d.device_id)}
I had to use subqueries to get correct counts. I was not able to get correct counts with just the joins.
The query which seems to work properly is:
SELECT name, user_count, device_count FROM (
SELECT
a.name,
(SELECT COUNT(*) from users u where u.account_id = a.id AND u.provisioned_on BETWEEN '2019-09-01 00:00:00' AND '2019-09-30 23:59:59') as user_count,
(SELECT COUNT(*) from devices d where d.account_id = a.id AND d.provisioned_on BETWEEN '2019-09-01 00:00:00' AND '2019-09-30 23:59:59') as device_count
FROM accounts a
GROUP BY a.id) r1
WHERE r1.user_count > 0 or r1.device_count > 0;
I am sure there is a better, more efficient/elegant way to get the count than how I have done it.
The main subquery r1 is so that I filter by accounts having those changes in that time period. Without it, I get 0 for accounts which have no users or devices added, which is fine
Does a device always belong to a user? and the account_id is only there for convenience?
So you have: Accounts -> User -> Device?
Or can devices hang off accounts free-standing?
If a device always belong to a user you can do the join like this:
SELECT
a.id,
count(u.id),
count(d.id)
FROM account a
LEFT JOIN user u ON a.id = u.account_id and u.added_on BETWEEN ? AND ?
LEFT JOIN device d ON u.id = d.user_id AND d.added_on BETWEEN ? AND ?
GROUP BY a.id;
Which seems to work better on the test tables I setup.
And adding a distinct count works if they are separate relationships:
SELECT
a.name,
count(distinct u.id),
count(distinct d.id)
FROM account a
LEFT JOIN user u ON a.id = u.account_id and u.added_on BETWEEN ? AND ?
LEFT JOIN device d on a.id = d.account_id and d.added_on BETWEEN ? AND ?
GROUP BY a.id
HAVING count(u.id) > 0 OR count(d.id) > 0
which in ecto should be:
def counts(s, e) do:
q = from(a in "account",
left_join: u in "user",
on: u.account_id == a.id and (u.added_on >= ^s and u.added_on <= ^e),
left_join: d in "device",
on: d.account_id == a.id and (d.added_on >= ^s and d.added_on <= ^e),
select: %{account: a.name,
user_count: count(u.id, :distinct),
device_count: count(d.id, :distinct)},
group_by: a.id)
So this is almost as the initial one with the exception of:
left_join instead of join
count with (:distinct) option rather than just count
I skipped the fragment because I couldn’t figure out how to use table aliases in the ecto fragements.
I didn’t include the having clause in ecto but that should hopefully be simple.
I have a function which provides date ranges for me to do this with using Timex, so I can just call that function to get the entries for start and finish timestamps.
I added an Enum.reject after returning to results to filter out the accounts where all the counts are zero.