Help constucting an Ecto Query

Hi,

So I think a left join would help a bit…

I have a PostgreSQL table storing metrics…Looks something like:

defmodule Metrics do
  use Ecto.Schema

  import Ecto.Changeset

  @primary_key false
  schema "metrics" do
    field :entity_id, Types.UUID, primary_key: true, default: Types.UUID.null()
    field :activity_date, :date, primary_key: true

    field :logins_established, :integer
    field :files_uploaded, :integer
    field :files_downloaded, :integer

An entity can be anything. A business, university, human, car etc. A record is created per day IF there is any interaction by the entity with the system. Once a record is created then the counters are incremented within that record until the next day rolls over.

So 1 record per day, but only if there is any interaction with the system.

My query, in SQLEnglish, is:

select all entity_ids and their most recent activity date from metrics for a specific counter had a value > 0 before a certain date and has no records at all or no counter with a value > 0 after that date.

Doing the values before and no value > 0 after part is easy. Where I’m hitting a mental wall is adding the no records at all part.

Any suggestions?

Thanks

This feels window-functiony, but I’m not sure exactly how to put the pieces together.

One domain-related question: what’s the precise definition for “most recent activity date”?

My naive guess (“the last activity_date with a value > 0”) suggests that “has no records at all or no counter with a value > 0 after that date” means the same thing as “the most recent activity is before that date”, but maybe a 0-value means something special.

“most recent activity date” means the entity_id and most recent timestamp of the record that had any activity (record exists and counter val > 0) for a specific counter.

I’m not averse to splitting up the query and use Elixir code to assist. But I’d prefer an a 100% Ecto way if possible since the table will get quite large

I think you could do the selection logic using a combination of WHERE EXISTS (...) AND NOT EXISTS (...), where ... are subqueries with your conditions before and after that certain date, respectively.

Then ORDER BY activity_date DESC and get only one row per entity_id using SELECT DISTINCT ON (entity_id).

So with variables activity_date and counter_field (atom) passed in, the Ecto code could be:

activity_before =
  from(m in Metrics,
    where:
      m.entity_id == parent_as(:metrics).entity_id and m.activity_date < ^activity_date and
        field(m, ^counter_field) > 0
  )

activity_after =
  from(m in Metrics,
    where:
      m.entity_id == parent_as(:metrics).entity_id and m.activity_date >= ^activity_date and
        field(m, ^counter_field) > 0
  )

from(m in Metrics,
  as: :metrics,
  distinct: m.entity_id,
  order_by: [desc: m.activity_date],
  where:
    exists(subquery(activity_before)) and not exists(subquery(activity_after)) and
      field(m, ^counter_field) > 0
)

Actually, I think you can even simplify this, because the main query is similar to the activity_before subquery and they can be merged, leaving only the activity_after subquery:

activity_after =
  from(m in Metrics,
    where:
      m.entity_id == parent_as(:metrics).entity_id and m.activity_date >= ^activity_date and
        field(m, ^counter_field) > 0
  )

from(m in Metrics,
  as: :metrics,
  distinct: m.entity_id,
  order_by: [desc: m.activity_date],
  where:
    m.activity_date < ^activity_date and field(m, ^counter_field) > 0 and
      not exists(subquery(activity_after))
)
1 Like

It looks like that works. Thank you very much.

1 Like