Ecto SQL query does not return the expected value

Hi all,

Could anyone give me a hint on why is this query not returning the result I expect?

I’m following this blog post on how to use PostgreSQL with Ecto.

I have the following function in a Car module :

  def adjacent_prices do
    query = """
      WITH base AS (
        SELECT lag(price, 2) OVER (ORDER BY price) lag,
        lead(price, 2) OVER (ORDER BY price) lead,
        c.*
        FROM cars c
      )
      SELECT base.* FROM base
      JOIN(
        SELECT lag, lead
        FROM base WHERE price = 5000
      ) sub ON base.price BETWEEN sub.lag AND sub.lead
    """

    Repo.query!(query)
  end

When I call Car.adjacent_prices(), I expect the following five rows:

  • one row with the car costing 5000
  • two rows with the first two cars cheaper than 5000
  • two rows with the first two cars more expensive than 5000

However, when I run the query, I get this result back:

iex(4)> Car.adjacent_prices()
[debug] QUERY OK db=1.3ms queue=1.4ms idle=1776.4ms
  WITH base AS (
    SELECT lag(price, 2) OVER (ORDER BY price) lag,
    lead(price, 2) OVER (ORDER BY price) lead,
    c.*
    FROM cars c 
  )
  SELECT base.* FROM base
  JOIN(
    SELECT lag, lead
    FROM base WHERE price = 5000
  ) sub ON base.price BETWEEN sub.lag AND sub.lead
 []
%Postgrex.Result{
  columns: ["lag", "lead", "id", "engine_type", "mpg", "kwh", "price", "make",
   "model", "vin", "inserted_at", "updated_at"],
  command: :select,
  connection_id: 19391,
  messages: [],
  num_rows: 0,
  rows: []
}

I have this data in the database:

And I’m using PostgreSQL 12.4 (Ubuntu 12.4-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit.

I also tried to run the query in PGadmin4, but it also did not return anything as a result.

It seems that I am missing something, but I can’t point out what exactly, even after reading the documentation on Ecto.Adapter.SQL.query/4 :thinking:

What is the content of the cars table?

This makes it seem like it’s a generic SQL issue and not related to Ecto. You should play with the raw SQL and the data to get it to work in PGadmin4 first, then translate that to Ecto.

2 Likes

Looks like this is the data.

(@ryanzidago, your screenshot of the data wasn’t showing for some reason; still hard to make out the details, though.)

This is just a table with the following fields: https://github.com/ryanzidago/dryve/blob/adjacent_prices/lib/dryve/cars/car.ex

You can find more on the app itself on GitHub: https://github.com/ryanzidago/dryve/tree/adjacent_prices

1 Like

Thanks, I edited my post to correct the formatting.

1 Like

lag(price, 2) is going to return NULL for the first and second results when sorting the window by price. Feeding NULLs to BETWEEN is not going to match any rows.

Try lag(price, 2, 0) - that will eliminate the NULLs.

2 Likes

The query from the blog post assumes that at least two cars are cheaper than $5,000.

Your seed data doesn’t meet this expectation.

Adding two more records for cheaper cars will give you the result you expect.

Otherwise, sub.lag is null and BETWEEN won’t behave as expected.

expand for my verson of your sample data
create table cars (
  id          serial PRIMARY KEY,
  engine_type VARCHAR(255),
  mpg         INT,
  kwh         INT,
  price       INT,
  make        VARCHAR(255),
  model       VARCHAR(255),
  vin         VARCHAR(255)
);

insert into cars values
( 1, 'electric', null,  230, 79039, 'Nissan' , 'Cube'      , '99W'),
( 2, 'gasoline',   79, null, 97028, 'Ford'   , 'Fiesta'    , '9EK'),
( 3, 'electric', null,  342, 22591, 'Lincoln', 'MKZ'       , '3B5'),
( 4, 'gasoline',  100, null, 59441, 'Dodge'  , 'Challenger', '92G'),
( 5, 'electric', null,  105, 73515, 'Lincoln', 'Navigator' , 'KZ6'),
( 6, 'gasoline',   67, null, 61586, 'Toyota' , 'Camry'     , 'W4B'),
( 7, 'electric', null,  437, 44025, 'Nissan' , 'Rogue'     , 'ZA1'),
( 8, 'gasoline',   88, null, 89244, 'BMW'    , 'M5'        , '585'),
( 9, 'electric', null,  194, 17778, 'Honda'  , 'Accord'    , '8M4'),
(10, 'gasoline',   50, null, 16804, 'Lincoln', 'Navigator' , 'P0G'),
(11, 'electric', null,  189, 84935, 'BMW'    , 'X5'        , 'YD5'),
(12, 'gasoline',   84, null, 94898, 'BMW'    , 'M5'        , 'EE7'),
(13, 'electric', null,  435, 23952, 'Chevy'  , 'Malibu'    , '174'),
(14, 'gasoline',   50, null, 44354, 'Buick'  , 'Verano'    , '78F'),
(15, 'electric', null,  225, 15177, 'Chevy'  , 'Camero'    , '76D'),
(16, 'gasoline',   64, null, 10110, 'Chevy'  , 'Malibu'    , '5P3'),
(17, 'electric', null,  221, 61772, 'Ford'   , 'Fiesta'    , 'B4T'),
(18, 'gasoline',   67, null, 83779, 'Toyota' , 'Camry'     , '8UJ'),
(19, 'electric', null,  459, 21043, 'Honda'  , 'CR-V'      , '69C'),
(20, 'gasoline',   74, null, 15451, 'Dodge'  , 'Charger'   , '6LD'),
(21, 'electric', null,  207,  5000, 'Ford'   , 'Fiesta'    , 'TVK'),
(22, 'gasoline',   99, null, 17750, 'Toyota' , 'Camry'     , '408');
insert into cars values
(23, 'electric', null,  121,  3000, 'DMC'    , 'DeLorean'  , 'B2F'),
(24, 'gasoline',   13, null,  1000, 'AMC'    , 'Gremlin'   , 'JNK');

EDIT: ninja’d by @al2o3cr, and with a better answer. :rocket:

1 Like

Works for me:

WITH base AS (
  SELECT lag( price, 2, 0)          OVER (ORDER BY price) lag,
         lead(price, 2, 2147483647) OVER (ORDER BY price) lead,
         c.*
    FROM cars c
)
SELECT base.*
  FROM base
 INNER JOIN
      (SELECT lag, lead
         FROM base
        WHERE price = 5000
      ) sub
   ON (base.price BETWEEN sub.lag AND sub.lead)

It returns three rows with @ryanzidago’s data set, which makes sense.

Adding one cheaper car bumps the results to four rows, which looks good.

1 Like

Thank you very much @al2o3cr and @Ted for the detailled explanation! It works for me too now :pray:

1 Like