Postgres streak / run query convert to Ecto

jtomchak · November 13, 2019, 3:10am

I’ve got a side project to help me learn Elixir and I am trying to get a user’s streak of concurrent daily posts in a row. I was able to put together the postgres query and successfully understand it and run it. Now I’m stumped on how to translate that in Elixir using Ecto.

Here is the sql query that I’m able to run, and below that my attempt at the Elixir function to do the same.

WITH
	-- This table contains all the distinct date 
  -- instances in the data set
  dates(date) AS (
    SELECT DISTINCT CAST(updated_at AS DATE)
    FROM public.posts
    WHERE user_id = 20
  ),
  -- Generate "groups" of dates by subtracting the
  -- date's row number (no gaps) from the date itself
  -- (with potential gaps). Whenever there is a gap,
  -- there will be a new group
  groups AS (
    SELECT
      ROW_NUMBER() OVER (ORDER BY date) AS rn,
	  date + -ROW_NUMBER() OVER (ORDER BY date) * INTERVAL'1 day' AS grp,
      date
    FROM dates
  )
SELECT
  COUNT(*) AS consecutive_dates,
  MIN(date) AS start_date,
  MAX(date) AS end_date,
  current_date < MAX(date) + 1 AS active 
FROM groups
GROUP BY grp HAVING COUNT(*) > 1 -- must be more than 1 to be a streak
ORDER BY 1 DESC, 2 DESC

and the struggle into Elixir

  def streaks_of_post(user_id) do
    dates =
      from(p in Post,
        distinct: [desc: p.updated_at],
        order_by: [p.updated_at]
      )

    group_dates =
      from d in dates,
        select:
          {row_number() |> over(order_by: d.updated_at),
           type(fragment(date_add(d.updated_at, -1, "day")), :naive_datetime), d.updated_at}

    query = Repo.all(group_dates)
  end

I have been able to piece together that the WITH queries need to be separate in Ecto, so I’ve made those. I’m currently stuck on date + -ROW_NUMBER() OVER (ORDER BY date) * INTERVAL'1 day' AS grp, Haven’t gotten to the actual query of consecutive dates with min / max yet. I’ve been using sql consecutive streak as a rough guide for the sql bits.
Any help or direction is very much appreciated.

jtomchak · November 13, 2019, 3:12am

How did I end up with a type(fragment <> :naive_datetime) ? I got this error and fell into a deep rabbit hole.

iex(3)> Postit.Posting.streaks_of_post(20)
[debug] QUERY OK source="posts" db=17.5ms queue=1.7ms
SELECT DISTINCT ON (p0."updated_at") row_number() OVER (ORDER BY p0."updated_at"), (p0."updated_at"::date + (-1::decimal::numeric * interval '1 day'))::date, p0."updated_at" FROM "posts" AS p0 ORDER BY p0."updated_at" DESC, p0."updated_at" []
** (ArgumentError) cannot load `~D[2019-10-28]` as type :naive_datetime

OvermindDL1 · November 13, 2019, 3:50pm

Do the types in your schema and the database tables themselves have a time and/or timezone component or not? Basically a Date is trying to be passed to a NaiveDateTime.

type(fragment(date_add(d.updated_at, -1, "day")), :naive_datetime)

Specifically here you are doing a pure date operation, time is stripped, but then you are trying to type cast it to a :naive_datetime, which the database can’t do as there is no time component.

jtomchak · November 13, 2019, 6:18pm

the updated_at is just the default Timestamp, and that’s :naive_datetime, right?
I ended up trying to cast it in a shotgun approach to see what the output would be. What I’m trying to do in this query is dateadd(day, -ROW_NUMBER() OVER (ORDER BY date), date) as grp is subtract the row number from the date, producing a new date. This should show me in grp column groups of consecutive entries. i think

but when i try this

from(d in dates,
        select:
          {row_number() |> over(order_by: d.updated_at),
           date_add(d.updated_at, row_number() |> over(order_by: d.updated_at), "day"),
           d.updated_at}
      )

i get

[debug] QUERY OK source="posts" db=1.1ms queue=0.2ms
SELECT DISTINCT ON (p0."updated_at") row_number() OVER (ORDER BY p0."updated_at"), (p0."updated_at"::date + (row_number() OVER (ORDER BY p0."updated_at")::numeric * interval '1 day'))::date, p0."updated_at" FROM "posts" AS p0 ORDER BY p0."updated_at" DESC, p0."updated_at" []
** (ArgumentError) cannot load `~D[2019-11-30]` as type :naive_datetime

jtomchak · November 13, 2019, 6:33pm

I think I got it ?

   from(d in dates,
        select:
          {row_number() |> over(order_by: d.updated_at),
           datetime_add(d.updated_at, (row_number() |> over(order_by: d.updated_at)) * -1, "day"),
           d.updated_at}
      )

datetime not date because it’s a timestamp. and then multiple -1 to create the subtraction.

jtomchak · November 18, 2019, 7:33pm

I wanted to share the final solution that I have in converting a streak query in postgres to Ecto.

def streaks_of_post(user_id) do
    dates =
      from(p in Post,
        distinct: [desc: fragment("?::date", p.updated_at)],
        where: p.user_id == ^user_id
      )

    group_dates =
      from(d in subquery(dates),
        select: %{
          rn: row_number() |> over(order_by: fragment("?::date", d.updated_at)),
          # datetime_add(d.updated_at, (row_number() |> over(order_by: d.updated_at)) * -1, "day"),
          grp:
            fragment(
              "? + -ROW_NUMBER() OVER (ORDER BY ?) * INTERVAL'1 day'",
              fragment("?::date", d.updated_at),
              fragment("?::date", d.updated_at)
            ),
          date: fragment("?::date", d.updated_at)
        }
      )

    streaks =
      from(g in subquery(group_dates),
        select: %{
          consecutive_days: count(),
          start_date: min(g.date),
          end_date: max(g.date),
          active: fragment("current_date") < max(g.date) + 1
        },
        group_by: fragment("grp HAVING COUNT(*) > 1"),
        order_by: fragment("1 DESC, 2 DESC")
      )

    Repo.all(streaks)

I ended up going with fragments in places where I needed the update_at cast as a date type for comparison. I’m not sure if there’s a better way. Whenever I removed the "?::date" from the distinct query or the grp query I would be some strange 1 off errors like this:

SELECT DISTINCT ON (p0."updated_at"::date) + -ROW_NUMBER() OVER (ORDER BY p0."updated_at"::date) * INTERVAL'1 day', 
p0."updated_at"::date + -ROW_NUMBER() OVER (ORDER BY p0."updated_at"::date) * INTERVAL'1 day',
p0."updated_at"::date FROM "posts" AS p0 
WHERE (p0."user_id" = 20)

The final Repo.all(streaks) gives me nice list of streaks

[
  %{
    active: false,
    consecutive_days: 3,
    end_date: ~D[2019-10-30],
    start_date: ~D[2019-10-28]
  },
  %{
    active: false,
    consecutive_days: 2,
    end_date: ~D[2019-10-18],
    start_date: ~D[2019-10-17]
  }
]

I hope this helps someone else. And any feedback is definitely welcomed.

enkr1 · September 16, 2022, 3:16am

Thank you so much you saved me hours.

jtomchak · September 16, 2022, 3:45am

Great to hear. Totally made my day to see if was helpful for you.

enkr1 · September 16, 2022, 5:58am

Adding on,

I also modified/improved these few lines:

# ...
date_group:
  fragment(
    # Same thing, just that I bracketed it to a way I can understand better
    "?::date - ((ROW_NUMBER() OVER (ORDER BY ?::date)) * INTERVAL '1 day')"
# ...

# ...
select: %{
  active: fragment("CURRENT_DATE = MAX(?)::date", g.date), # You can just check if it is today like this too
  consecutive_day_nb: count(),
  start_date: min(g.date),
  end_date: max(g.date)
},
# ...

Just sharing!