Postgres streak / run query convert to Ecto

I’ve got a side project to help me learn Elixir and I am trying to get a user’s streak of concurrent daily posts in a row. I was able to put together the postgres query and successfully understand it and run it. Now I’m stumped on how to translate that in Elixir using Ecto.

Here is the sql query that I’m able to run, and below that my attempt at the Elixir function to do the same.

WITH
	-- This table contains all the distinct date 
  -- instances in the data set
  dates(date) AS (
    SELECT DISTINCT CAST(updated_at AS DATE)
    FROM public.posts
    WHERE user_id = 20
  ),
  -- Generate "groups" of dates by subtracting the
  -- date's row number (no gaps) from the date itself
  -- (with potential gaps). Whenever there is a gap,
  -- there will be a new group
  groups AS (
    SELECT
      ROW_NUMBER() OVER (ORDER BY date) AS rn,
	  date + -ROW_NUMBER() OVER (ORDER BY date) * INTERVAL'1 day' AS grp,
      date
    FROM dates
  )
SELECT
  COUNT(*) AS consecutive_dates,
  MIN(date) AS start_date,
  MAX(date) AS end_date,
  current_date < MAX(date) + 1 AS active 
FROM groups
GROUP BY grp HAVING COUNT(*) > 1 -- must be more than 1 to be a streak
ORDER BY 1 DESC, 2 DESC

and the struggle into Elixir :slightly_frowning_face:

  def streaks_of_post(user_id) do
    dates =
      from(p in Post,
        distinct: [desc: p.updated_at],
        order_by: [p.updated_at]
      )

    group_dates =
      from d in dates,
        select:
          {row_number() |> over(order_by: d.updated_at),
           type(fragment(date_add(d.updated_at, -1, "day")), :naive_datetime), d.updated_at}

    query = Repo.all(group_dates)
  end

I have been able to piece together that the WITH queries need to be separate in Ecto, so I’ve made those. I’m currently stuck on date + -ROW_NUMBER() OVER (ORDER BY date) * INTERVAL'1 day' AS grp, Haven’t gotten to the actual query of consecutive dates with min / max yet. I’ve been using sql consecutive streak as a rough guide for the sql bits.
Any help or direction is very much appreciated.

1 Like

How did I end up with a type(fragment <> :naive_datetime) ? I got this error and fell into a deep rabbit hole.

iex(3)> Postit.Posting.streaks_of_post(20)
[debug] QUERY OK source="posts" db=17.5ms queue=1.7ms
SELECT DISTINCT ON (p0."updated_at") row_number() OVER (ORDER BY p0."updated_at"), (p0."updated_at"::date + (-1::decimal::numeric * interval '1 day'))::date, p0."updated_at" FROM "posts" AS p0 ORDER BY p0."updated_at" DESC, p0."updated_at" []
** (ArgumentError) cannot load `~D[2019-10-28]` as type :naive_datetime

Do the types in your schema and the database tables themselves have a time and/or timezone component or not? Basically a Date is trying to be passed to a NaiveDateTime.

type(fragment(date_add(d.updated_at, -1, "day")), :naive_datetime)

Specifically here you are doing a pure date operation, time is stripped, but then you are trying to type cast it to a :naive_datetime, which the database can’t do as there is no time component.

1 Like

the updated_at is just the default Timestamp, and that’s :naive_datetime, right?
I ended up trying to cast it in a shotgun approach to see what the output would be. What I’m trying to do in this query is dateadd(day, -ROW_NUMBER() OVER (ORDER BY date), date) as grp is subtract the row number from the date, producing a new date. This should show me in grp column groups of consecutive entries. i think

but when i try this

from(d in dates,
        select:
          {row_number() |> over(order_by: d.updated_at),
           date_add(d.updated_at, row_number() |> over(order_by: d.updated_at), "day"),
           d.updated_at}
      )

i get

[debug] QUERY OK source="posts" db=1.1ms queue=0.2ms
SELECT DISTINCT ON (p0."updated_at") row_number() OVER (ORDER BY p0."updated_at"), (p0."updated_at"::date + (row_number() OVER (ORDER BY p0."updated_at")::numeric * interval '1 day'))::date, p0."updated_at" FROM "posts" AS p0 ORDER BY p0."updated_at" DESC, p0."updated_at" []
** (ArgumentError) cannot load `~D[2019-11-30]` as type :naive_datetime

I think I got it ?

   from(d in dates,
        select:
          {row_number() |> over(order_by: d.updated_at),
           datetime_add(d.updated_at, (row_number() |> over(order_by: d.updated_at)) * -1, "day"),
           d.updated_at}
      )

datetime not date because it’s a timestamp. :woman_facepalming: and then multiple -1 to create the subtraction.

1 Like

I wanted to share the final solution that I have in converting a streak query in postgres to Ecto.

def streaks_of_post(user_id) do
    dates =
      from(p in Post,
        distinct: [desc: fragment("?::date", p.updated_at)],
        where: p.user_id == ^user_id
      )

    group_dates =
      from(d in subquery(dates),
        select: %{
          rn: row_number() |> over(order_by: fragment("?::date", d.updated_at)),
          # datetime_add(d.updated_at, (row_number() |> over(order_by: d.updated_at)) * -1, "day"),
          grp:
            fragment(
              "? + -ROW_NUMBER() OVER (ORDER BY ?) * INTERVAL'1 day'",
              fragment("?::date", d.updated_at),
              fragment("?::date", d.updated_at)
            ),
          date: fragment("?::date", d.updated_at)
        }
      )

    streaks =
      from(g in subquery(group_dates),
        select: %{
          consecutive_days: count(),
          start_date: min(g.date),
          end_date: max(g.date),
          active: fragment("current_date") < max(g.date) + 1
        },
        group_by: fragment("grp HAVING COUNT(*) > 1"),
        order_by: fragment("1 DESC, 2 DESC")
      )

    Repo.all(streaks)

I ended up going with fragments in places where I needed the update_at cast as a date type for comparison. I’m not sure if there’s a better way. Whenever I removed the "?::date" from the distinct query or the grp query I would be some strange 1 off errors like this:

SELECT DISTINCT ON (p0."updated_at"::date) + -ROW_NUMBER() OVER (ORDER BY p0."updated_at"::date) * INTERVAL'1 day', 
p0."updated_at"::date + -ROW_NUMBER() OVER (ORDER BY p0."updated_at"::date) * INTERVAL'1 day',
p0."updated_at"::date FROM "posts" AS p0 
WHERE (p0."user_id" = 20)

The final Repo.all(streaks) gives me nice list of streaks

[
  %{
    active: false,
    consecutive_days: 3,
    end_date: ~D[2019-10-30],
    start_date: ~D[2019-10-28]
  },
  %{
    active: false,
    consecutive_days: 2,
    end_date: ~D[2019-10-18],
    start_date: ~D[2019-10-17]
  }
]

I hope this helps someone else. And any feedback is definitely welcomed.

4 Likes

Thank you so much you saved me hours.

1 Like

Great to hear. Totally made my day to see if was helpful for you. :+1:

1 Like

Adding on,

I also modified/improved these few lines:

# ...
date_group:
  fragment(
    # Same thing, just that I bracketed it to a way I can understand better
    "?::date - ((ROW_NUMBER() OVER (ORDER BY ?::date)) * INTERVAL '1 day')"
# ...

# ...
select: %{
  active: fragment("CURRENT_DATE = MAX(?)::date", g.date), # You can just check if it is today like this too
  consecutive_day_nb: count(),
  start_date: min(g.date),
  end_date: max(g.date)
},
# ...


Just sharing!

1 Like