Deconstructing CTE into an Ecto query

Hello,

I’m trying to convert a SQL query to the Ecto query, but I’m having hard time figuring out how to do that. I know I could simply execute it using SQL adapter, but I thought I might give it a try using Ecto.

The query is supposed to compare dates in historical_reports table to all non-holiday, non-weekend dates (generated by generate_sequence CTE and then filtered against country_holidays table) since the first date found in the historical_reports table. For example, assuming I have records for a whole year in my historical_reports table and missed yesterday (non-holiday and non-weekend day), the result of executing the following query should be a single record of 2020-04-06.

WITH company_report_date AS
(
	select report_date
	from historical_reports
	where company_id = 1
),
year_without_weekends AS
(
	SELECT date(d)
	FROM generate_series((SELECT min(report_date) FROM company_report_date), current_date, interval '1 day') d
	WHERE EXTRACT('ISODOW' FROM d) < 6
)
SELECT date
FROM year_without_weekends yww
WHERE yww.date NOT IN (SELECT report_date FROM company_report_date) AND
yww.date NOT IN (SELECT holiday_date FROM country_holidays) AND
yww.date <> current_date;

I started deconstructing that CTE into smaller Ecto queries, like so:

company_report_date = from hr in HistoricalReport, where: hr.company_id == ^company_id
first_date_record = from q in company_report_date, select: min(q.report_date)

At this point, I need to use a fragment to generate a sequence of dates, but I’m not sure how to proceed:

from d in fragment("select date(d) from generate_series((?), current_date, '1 day'::interval) d where extract('ISODOW' from d) < 6", first_date_record)

I guess my question is how to construct the third intermediary query when I’m not using any defined schemas / tables and then continue filtering / deconstructing it further. Also, does that approach make sense at all or should I simply use the SQL adapter and run my query as is?

Thanks!

1 Like

I guess the problem is you’re using a subquery in the FROM part with generate_series, but ecto queries only support subqueries as from and join sources. Trying to compose subqueries and fragments will quickly become unwieldy, so I’d suggest sticking with the raw SQL unless you can refactor the query to have the generate_series be called differently, so ecto’s subquery handling can apply.

I had to do it for a few slightly complex CTEs, your best friend is Ecto.Query.subquery that transforms query composition into subquery expressions.

For example this CTE


with fast as (
	SELECT c2.city, st_distance(st_transform(c2."geom", 4326), point.geom) d 
	FROM "shapes"."cities" AS c2, point 
	WHERE st_intersects(st_transform(c2."geom", 4326), point.geom)
),
slow as (
	SELECT c2.citiy, st_transform(c2."geom", 4326) <-> point.geom d 
	FROM "shapes"."cities" AS c2, point
	WHERE st_transform(c2."geom", 4326) <-> point.geom <= 50.0/110574
)
select * from fast
UNION all
select * from slow
WHERE NOT EXISTS (
	select * from fast
)  
order by d
limit 1

Can be converted to

fast_query =
  from(m in module,
    where: st_intersects(st_transform(m.geom, 4326), st_makepoint(^lon, ^lat)),
    select: map(m, ^fields),
    select_merge: %{
      distance: st_distance(st_transform(m.geom, 4326), st_makepoint(^lon, ^lat))
    }
  )

count_query = from(f in subquery(fast_query), select: %{count: count()})

slow_query =
  from(m in module,
    join: c in subquery(count_query),
    where: in_range(st_transform(m.geom, 4326), st_makepoint(^lon, ^lat), 50),
    select: map(m, ^fields),
    select_merge: %{
      distance: st_distance(st_transform(m.geom, 4326), st_makepoint(^lon, ^lat))
    },
    where: c.count == 0
  )

fast = from(fast in subquery(fast_query))
slow = from(slow in subquery(slow_query))

from(
  q in subquery(
    from(fast in fast,
      union: ^slow,
      select: fast
    )
  ),
  order_by: q.distance,
  limit: 1,
  select: q
)

to print the generated SQL you can use

Ecto.Adapters.SQL.to_sql(:all, RepoModule, query)

EDIT: Ecto 3.2 already added support for CTE, https://github.com/elixir-ecto/ecto/blob/master/CHANGELOG.md but I find subqueries more readable in code, while CTE are usually better in SQL

5 Likes

Heh, TIL that Ecto finally allows subqueries in the source. Great, I was missing that for some time now.

And to update @kminevskiy - the approach by @massimo will be better as in PostgreSQL 9-11 the CTE was optimisation fence. Now it still is, but it can be lifted sometimes. So this approach will be better as it allows to bubble the filters up and down which allows for better optimisations.

3 Likes

Thanks for your detailed example, will definitely give mine a try later today!

OK, here’s an update:

The query I managed to compose is this:

from c in company_report_date, 
  right_join: dates in fragment("select date(d) from generate_series((?)::date, current_date, interval '1 day') d where extract('ISODOW' from d) < 6", ^min_date), 
  on: c.report_date == dates.date, 
  where: is_nil(c.report_date),
  select: %{missing_date: dates.date}

However, the resulting execution returns an empty set. To test it, I converted my original SQL query to the one matching that new Ecto query and it returns the same result. So there has to be something wrong with my new Ecto query.

Just a quick update. Here’s how the final query looks like in my case. Not sure if that’s the best / most efficient way of doing it, but hey, it works and makes sense when I read the code :slight_smile:

scoped_historical_reports_query =
  from hr in HistoricalReport, where: hr.company_id == ^company_id, select: [:report_date]

final_query =
      from hr in subquery(scoped_historical_reports_query),
        right_join:
          dates in fragment(
            "select date(d) from generate_series((?)::date, current_date, interval '1 day') d where extract('ISODOW' from d) < 6",
            ^min_date
          ),
        on: hr.report_date == dates.date,
        where: is_nil(hr.report_date),
        where: dates.date not in ^holiday_days,
        where: dates.date != fragment("current_date"),
        select: %{missing_date: dates.date}