How to work best with ugly SQL

I have one table which store information about school vacation periods and an other one which stores information about locations which are recursively nested. Think schools → cities → counties → federal_states → countries (each of them having their own vacation periods).

defmodule Feriendaten.Calendars.Period do
  use Ecto.Schema
  import Ecto.Changeset
  alias Feriendaten.Geography.Location

  schema "periods" do
    field :ends_on, :date
    field :holiday_or_vacation_type_id, :integer
    field :memo, :string
    field :starts_on, :date

    belongs_to :location, Location

    timestamps()
  end
end

defmodule Feriendaten.Geography.Location do
  use Ecto.Schema
  import Ecto.Changeset
  alias Feriendaten.EctoSlug.LocationNameSlug
  alias Feriendaten.Calendars.Period

  schema "locations" do
    field :name, :string
    field :parent_id, :integer
    field :slug, LocationNameSlug.Type

    has_many :periods, Period, on_delete: :delete_all

    timestamps()
  end
end

Because of better performance I use this SQL code to query:

SELECT
	periods.id,
	periods.starts_on,
	periods.ends_on,
	holiday_or_vacation_types.name,
	holiday_or_vacation_types.colloquial,
	periods.ends_on - periods.starts_on + 1 AS days,
	holiday_or_vacation_types.slug
FROM
	periods
	INNER JOIN holiday_or_vacation_types ON periods.holiday_or_vacation_type_id = holiday_or_vacation_types.id
WHERE
	periods.ends_on >= '2022-09-20'
	AND periods.starts_on <= '2023-12-31'
	AND holiday_or_vacation_types.school_vacation = TRUE
	AND location_id IN(WITH RECURSIVE cte_location AS (
			SELECT
				id,
				parent_id FROM locations
			WHERE
				id = 325
			UNION ALL
			SELECT
				e.id,
				e.parent_id FROM locations e
				INNER JOIN cte_location o ON o.parent_id = e.id
)
SELECT
	id FROM cte_location)
ORDER BY
	periods.starts_on;

Which results in this:

To make this work within my Phoenix application I use this function:

  def school_vacation_periods(%Location{} = location, starts_on, ends_on) do
    query =
      "SELECT periods.id,periods.starts_on,periods.ends_on,holiday_or_vacation_types.name,holiday_or_vacation_types.colloquial,periods.ends_on-periods.starts_on+1 AS days,holiday_or_vacation_types.slug FROM periods INNER JOIN holiday_or_vacation_types ON periods.holiday_or_vacation_type_id=holiday_or_vacation_types.id WHERE periods.ends_on>='" <>
        Date.to_string(starts_on) <>
        "' AND periods.starts_on<='" <>
        Date.to_string(ends_on) <>
        "' AND holiday_or_vacation_types.school_vacation=TRUE AND location_id IN(WITH RECURSIVE cte_location AS(SELECT id,parent_id FROM locations WHERE id=" <>
        Integer.to_string(location.id) <>
        " UNION ALL SELECT e.id,e.parent_id FROM locations e INNER JOIN cte_location o ON o.parent_id=e.id)SELECT id FROM cte_location)ORDER BY periods.starts_on;
      "

    result = Ecto.Adapters.SQL.query!(Repo, query, [])

    result.rows
  end

And the result is a list of lists:

[
  [1115, ~D[2022-10-17], ~D[2022-10-31], "Herbst", "Herbstferien", 15, "herbst"],
  [885, ~D[2022-12-23], ~D[2023-01-02], "Weihnachten", "Weihnachtsferien", 11,
   "weihnachten"],
  [944, ~D[2023-04-03], ~D[2023-04-06], "Ostern/Frühjahr", "Osterferien", 4,
   "ostern-fruehjahr"],
  [1002, ~D[2023-05-30], ~D[2023-06-07], "Himmelfahrt/Pfingsten",
   "Pfingstferien", 9, "himmelfahrt-pfingsten"],
  [1059, ~D[2023-07-24], ~D[2023-09-01], "Sommer", "Sommerferien", 40, "sommer"],
  [1390, ~D[2023-10-16], ~D[2023-10-27], "Herbst", "Herbstferien", 12, "herbst"],
  [1413, ~D[2023-12-27], ~D[2024-01-05], "Weihnachten", "Weihnachtsferien", 10,
   "weihnachten"]
]

Needless to say that it is ugly and hard to maintain code. It works but it is hard to look at and work with. I don’t get a struct which I can use within a HEEx template the easy way.

Is there a better approach for this problem? A way to get cleaner code which is better to maintain? And a result which contains a list of structs or some sort of schema.

Ecto has support for recursive cte as well as unions. I don’t see why you couldn’t build an ecto query for the SQL you got there.

1 Like

This is not tested, but should be close to what you’d want:

import Ecto.Query

location_tree_initial_query = from l in Location, where: l.id == ^325

location_tree_recursion_query = 
  from l in Location, 
  join: lt in "cte_location", 
    on: lt.parent_id == l.id

location_tree_query =
  location_tree_initial_query
  |> union_all(^location_tree_recursion_query)

cte_location = 
  {"cte_location", Location}
  |> recursive_ctes(true)
  |> with_cte("cte_location", as: ^location_tree_query)

query = 
  from p in Period, 
    join: type in "holiday_or_vacation_types", 
      on: p.holiday_or_vacation_type_id == type.id,
    where: p.ends_on >= ^~D[2022-09-20],
    where: p.starts_on <= ^~D[2023-12-31],
    where: type.school_vacation,
    where: p.location_id in subquery(cte_location),
    order_by: p.starts_on,
    select: %{
      id: p.id,
      starts_on: p.starts_on,
      ends_on: p.ends_on,
      name: type.name,
      colloquial: type.colloquial,
      days: p.ends_on - p.starts_on + 1,
      slug: type.slug
    }
8 Likes

Thanks @LostKobrakai !

This error pops up:

** (Ecto.QueryError) subquery must return a single field in order to be used on the right-side of in in query:

Repo.all(cte_location) results in

[
  %Feriendaten.Geography.Location{
    __meta__: #Ecto.Schema.Metadata<:loaded, "cte_location">,
    id: 325,
    name: "Koblenz",
    parent_id: 12,
    slug: "koblenz",
    periods: #Ecto.Association.NotLoaded<association :periods is not loaded>,
    inserted_at: ~N[2022-09-20 06:20:08],
    updated_at: ~N[2022-09-20 06:20:08]
  },
  %Feriendaten.Geography.Location{
    __meta__: #Ecto.Schema.Metadata<:loaded, "cte_location">,
    id: 12,
    name: "Rheinland-Pfalz",
    parent_id: 1,
    slug: "rheinland-pfalz",
    periods: #Ecto.Association.NotLoaded<association :periods is not loaded>,
    inserted_at: ~N[2022-09-20 06:20:08],
    updated_at: ~N[2022-09-20 06:20:08]
  },
  %Feriendaten.Geography.Location{
    __meta__: #Ecto.Schema.Metadata<:loaded, "cte_location">,
    id: 1,
    name: "Deutschland",
    parent_id: nil,
    slug: "deutschland",
    periods: #Ecto.Association.NotLoaded<association :periods is not loaded>,
    inserted_at: ~N[2022-09-20 06:20:08],
    updated_at: ~N[2022-09-20 06:20:08]
  }
]

I guess the problem is where: p.location_id in subquery(cte_location), which returns a list of %Location{} but the in part needs a list of ids. Where and how can I tell Ecto to only return the ids?

1 Like
cte_location = 
  {"cte_location", Location}
  |> recursive_ctes(true)
  |> with_cte("cte_location", as: ^location_tree_query)
  |> select([l], l.id)

That should do.

4 Likes

@LostKobrakai 's solution is excellent. In other scenarios where the SQL is impractical to replicate in Ecto’s query DSL keep in mind that Repo.query! supports postgres parameters which allows you to not need to break up the query string. In fact I generally try to have my query strings only as module attributes to guarantee that there’s no runtime interpolation.

3 Likes

Also Repo.load can load raw results into schemas.

2 Likes

Works like a charm! Thank you @LostKobrakai !

For future readers: I added a select for id and parent_id to keep the memory footprint minimal (I do wonder if this is needed or if PostgreSQL would do this automatically). Here’s the code:

    location_tree_initial_query =
      from l in Location,
        where: l.id == ^location.id,
        select: %{id: l.id, parent_id: l.parent_id}

    location_tree_recursion_query =
      from l in Location,
        join: lt in "cte_location",
        on: lt.parent_id == l.id,
        select: %{id: l.id, parent_id: l.parent_id}

    location_tree_query =
      location_tree_initial_query
      |> union_all(^location_tree_recursion_query)

    cte_location =
      {"cte_location", Location}
      |> recursive_ctes(true)
      |> with_cte("cte_location", as: ^location_tree_query)
      |> select([l], l.id)
3 Likes