Help understanding difference between 2 queries

So, I am trying to query how many Things I have each hour in a given day. I want to get 0 values for hours I have no Things so I reached for generate_series with a right_join and I have this query:

[start, finish] = Helpers.generate_date_range("Today", DateTime.to_naive(DateTime.utc_now()))

SomeThing
|> join(:right, [st], t in fragment("select generate_series(?::timestamp, ?, '1 hour')::time AS hour", ^start, ^finish),
  on: fragment("date_part('hour', ?.hour)", t) == fragment("date_part('hour', ?.inserted_at)", st))
|> group_by([st, t], t.hour)
|> order_by([st, t], t.hour)
|> select([st, t], {t.hour, count(st.id)})

and it returns:

[
  {~T[00:00:00.000000], 1},
  {~T[14:00:00.000000], 1},
  {~T[16:00:00.000000], 3},
  {~T[18:00:00.000000], 1},
  {~T[19:00:00.000000], 1},
  {~T[20:00:00.000000], 2}
]

Not what I am looking for. However, when I write it like this:

from st in SomeThing,
right_join: t in fragment("select generate_series(?::timestamp, ?, '1 hour')::time AS hour", ^start, ^finish),
on: fragment("date_part('hour', ?.hour)", t) == fragment("date_part('hour', ?.inserted_at)", st),
group_by: t.hour,
order_by: t.hour,
select: {t.hour, count(st.id)}

I get exactly what I want:

[
  {~T[00:00:00.000000], 1},
  {~T[01:00:00.000000], 0},
  {~T[02:00:00.000000], 0},
  {~T[03:00:00.000000], 0},
  {~T[04:00:00.000000], 0},
  {~T[05:00:00.000000], 0},
  {~T[06:00:00.000000], 0},
  {~T[07:00:00.000000], 0},
  {~T[08:00:00.000000], 0},
  {~T[09:00:00.000000], 0},
  {~T[10:00:00.000000], 0},
  {~T[11:00:00.000000], 0},
  {~T[12:00:00.000000], 0},
  {~T[13:00:00.000000], 0},
  {~T[14:00:00.000000], 1},
  {~T[15:00:00.000000], 0},
  {~T[16:00:00.000000], 3},
  {~T[17:00:00.000000], 0},
  {~T[18:00:00.000000], 1},
  {~T[19:00:00.000000], 1},
  {~T[20:00:00.000000], 2},
  {~T[21:00:00.000000], 0},
  {~T[22:00:00.000000], 0},
  {~T[23:00:00.000000], 0}
]

I’m failing to understand why one returns the whole range and the other only returns times that have values > 0 since to me it seems like the same query. Can someone explain why? Btw, I am using Postgres. Thanks!

Check out Repo.to_sql(:all, query), but it seems that the produced query should be the same.

1 Like

OK, it actually has something to do with a couple of where clauses, the first query actually looks like:

SomeThing
|> where([sr], ^begin_date <= sr.inserted_at)
|> where([sr], ^end_date >= sr.inserted_at)
|> join(:right, [st], t in fragment("select generate_series(?::timestamp, ?, '1 hour')::time AS hour", ^start, ^finish),
  on: fragment("date_part('hour', ?.hour)", t) == fragment("date_part('hour', ?.inserted_at)", st))
|> group_by([st, t], t.hour)
|> order_by([st, t], t.hour)
|> select([st, t], {t.hour, count(st.id)}

But I’m still not sure why those where clauses are is limiting my result.

Figured it out. I needed to move the where clauses into the on: portion of the join:

SomeRequest
    |> join(
      :right,
      [sr],
      t in fragment(
        "select generate_series(?::timestamp, ?, '1 hour')::time AS step",
        ^start,
        ^finish
      ),
      on:
        fragment("date_part('hour', ?.step)", t) ==
          fragment("date_part('hour', ?.inserted_at)", sr) and
          sr.inserted_at >= ^start and
          sr.inserted_at <= ^finish
    )
    |> group_by([sr, t], t.step)
    |> order_by([sr, t], t.step)
    |> select([sr, t], {t.step, count(sr.inserted_at)})
    |> Repo.all()
1 Like