Selecting parent records based on state of latest child record

I have two schemas which look similar to the example below:

defmodule MyApp.Delivery do
  use Ecto.Schema

  schema "deliveries" do
    # Define your fields here
    has_many :delivery_statuses, MyApp.DeliveryStatus
    timestamps()
  end
end

defmodule MyApp.DeliveryStatus do
  use Ecto.Schema

  schema "delivery_status" do
    field :insert_date, :naive_datetime
    field :status, :string
    field :some_attribute, :string
    belongs_to :delivery, MyApp.Delivery
    timestamps()
  end
end

I’m attempting to write a queries:

  • Fetch a delivery by id and preload the latest status
  • Fetch a list of deliveries where the latest status matches a status parameter
  • Fetch a list of deliveries where the latest status is in a list or not in a list (give me all deliveries where status is A, B or C)

The first one I have working I believe:

# Define the subquery to get the latest status for each delivery
latest_statuses = from ds in MyApp.DeliveryStatus,
  select: %{delivery_id: ds.delivery_id, latest_date: max(ds.insert_date)},
  group_by: ds.delivery_id

# Define the main query to join with deliveries and filter based on the status of the latest status
query = from d in MyApp.Delivery,
  join: ds_max in subquery(latest_statuses), on: d.id == ds_max.delivery_id,
  join: ds in MyApp.DeliveryStatus, on: ds.delivery_id == ds_max.delivery_id and ds.insert_date == ds_max.latest_date,
  preload: [delivery_statuses: ds],
  select: {d, ds}

# Execute the query
result = MyApp.Repo.all(query)

However trying to filter by ds.status doesn’t work:

query = from d in MyApp.Delivery,
  join: ds_max in subquery(latest_statuses), on: d.id == ds_max.delivery_id,
  join: ds in MyApp.DeliveryStatus, on: ds.delivery_id == ds_max.delivery_id and ds.insert_date == ds_max.latest_date,
    where: ds.status == "desired_status",
  preload: [delivery_statuses: ds],
  select: {d, ds}

Rather than returning no results it returns the delivery with the DeliveryStatus that matches a parameter which is not always the latest one.

To summarize I am attempting to select objects based upon conditions in both the parent and child record but wish to return nothing if both conditions are not met.

You probably have multiple delivery statuses on the same date. Try inserted_at instead of insert_date.

1 Like

What about this alternative query:

latest_statuses = from ds in MyApp.DeliveryStatus,
  select: %{delivery_id: ds.delivery_id, delivery_status_id: ds.id},
  order_by: [desc: ds.insert_date],
  limit: 1

query = from d in MyApp.Delivery,
  join: ds_max in subquery(latest_statuses), on: d.id == ds_max.delivery_id,
  join: ds in MyApp.DeliveryStatus, on: ds.delivery_id == ds_max.delivery_status_id,
  where: ds.status == "desired_status",
  preload: [delivery_statuses: ds],
  select: {d, ds}

This will produce Delivery structs with only the “latest” DeliveryStatus loaded into delivery_statuses.

FWIW, if your application has a similar balance of reads vs writes on the status as “deliveries” - lots of reads, a small number of writes - it may be more efficient to cache some data on deliveries:

  • if filtering / showing delivery_status.status is the main thing, a latest_status column on deliveries would avoid the above subquery entirely.

  • if DeliveryStatus has many relevant fields, a back-referencing latest_delivery_status_id on deliveries would let ordinary join and preload clauses produce the same result

1 Like

My mistake that should be inserted_at from the timestamps function call. The insert_date was replaced because I needed the extra precision as you correctly mentioned.

@al2o3cr that looks similar to something I’ve tried already. I did take another run at the query and came up with this:

latest_status = from ds in MyApp.DeliveryStatus, select: %{ds | row_number: row_number() |> over(order_by: [desc: ds.inserted_at])}, where: ds.status == ^status;

query = from d in MyApp.Delivery,
  join: ds in subquery(latest_status), on: d.id == ds.delivery_id and ds.row_number == 1,
  select: d

My test was still failing but as @fuelen mentioned it was likely caused by my factories inserting records with identical timestamps.

I edited the factory call to adjust the insertion time and the test passes. Alternatively I guess I could try and research if the timestamps call has an option for sub-second precision.

Use the :naive_datetime_usec type.

1 Like

Cool.

Looks like I can set that on a per schema basis or as a default across migrations:

 timestamps(type: :naive_datetime_usec)

OR

config :app, App.Repo, migration_timestamps: [type: :utc_datetime]
2 Likes