PRQL vs/with Ecto (PRQL is a modern language for transforming data)

Just stumbled upon https://prql-lang.org/ which has an Elixir binding https://github.com/PRQL/prql/pull/1500 by @kasvith.

Not having worked with Ecto, but planning to start a project soon I wonder: what does PRQL that Ecto does not?Will PRQL ever have the tight integration that Ecto does (eg changesets and phoenix-forms).


Pipelined Relational Query Language, pronounced “Prequel”

— a simple, powerful, pipelined SQL replacement

Why PRQL?

For data engineers

  • PRQL is concise, with abstractions such as variables & functions
  • PRQL is database agnostic, compiling to many dialects of SQL
  • PRQL isn’t limiting — it can contain embedded SQL where necessary
  • PRQL has bindings to most major languages (and more are in progress)
  • PRQL allows for column lineage and type inspection (in progress)

For analysts

  • PRQL is ergonomic for data exploration — for example, commenting out a filter, or a column in a list, maintains a valid query
  • PRQL is simple, and easy to understand, with a small number of powerful concepts
  • PRQL allows for powerful autocomplete, type-checking, and helpful error messages (in progress)

For tools

  • PRQL is a stable foundation to build on; we’re open-source and will never have a commercial product
  • PRQL is a single secular standard which tools can target
  • PRQL is easy for machines to read & write
4 Likes

Having a NIF to compile queries feels a little heavy, but besides a few ecto query specific features like preloading and a bit of additional boilerplate loading data (see Repo.load) you should be fine using PRQL for querying the database even with using ecto for all the rest.

For insert queries ecto would still build the sql internally, but you‘re not touching that sql anyways, so I‘m not really sure what use you‘d get out of using PRQL on the write side of things in the first place.

I saw that on HN the other week, got to this ambiguous syntax sort [-amount] and closed the tab :joy:

Worse:

WITH table_0 AS (
  SELECT
    employees.*,
    ROW_NUMBER() OVER (
      PARTITION BY role
      ORDER BY
        join_date
    ) AS _rn
  FROM
    employees
)
SELECT
  table_0.*
FROM
  table_0
WHERE
  _rn <= 1

If only modern SQL would have some syntax to fetch N rows with ties, oh wait, it does:

SELECT
  employees.*
FROM employees
ORDER BY join_date
FETCH FIRST 1 ROWS WITH TIES
5 Likes

But the first query is valid SQL that they should support, no?

The point was that they use the first example as something they improved on (their top n items example on the website), while there’s already ways to write it more consicely in sql itself.

3 Likes

Yes, it is valid, but SQL standard already have solution for this problem. No need for CTE there. So the PSQL example:

from employees
group role (
  sort join_date
  take 1
)

Can be written in “standard SQL” using:

SELECT
  employees.*
FROM employees
ORDER BY join_date
FETCH FIRST 1 ROWS WITH TIES

Which is not much harder to read.

PS
Ecto.Query currently do not support WITH TIES but I have sent proposal to add support for it.

4 Likes

I’m wondering where the role column of the example went though. How is your SQL showing the lastest joined employee per role?

1 Like

Ahh, I missed the GROUP BY. I would need to check that out.

Oh I see. I didn’t realize they were claiming it was easier than SQL. That is pretty silly.

From what I see, I’d choose Ecto.

From my perspective, for PRQL to be a viable option in an Elixir project it would have to offer pretty clear and substantial benefits over and above Ecto This bar is high if only because the moment I want to bring in someone else to the project, as a team member or even just to ask a question to, I’m now limited to their knowledge of the tool whereas with Ecto there’s a reasonable chance someone familiar with Elixir may already come with Ecto knowledge; the expected knowledge of communities like this would also be to have a broader knowledge of Ecto as well. Appreciating that it sounds like you have a learning curve either way, PRQL will likely force that learning curve on anyone engaging with you concerning the project. Also, without substantial improvements over Ecto, it will be less likely that Elixir types are going to take the time to learn it.

PRQL does not seem to meet the bar of substantial improvement over the Ecto query DSL. I’ve only glossed it, but so far in the best case it looks like it only achieves some similar benefits such as being composable. After that I think I see possibility of it being more problematic than Ecto query DSL; if only because Ecto is much, much more than a query language.

In Ecto, I can usually tell what SQL will be sent to the database and can pick and chose between some different optimizations; for example using Ecto schema defined associations I can write an Ecto query retrieving joined data to either run in a single database query using the join function and a preload or I can actually have my single Ecto query run multiple database queries to reduce network time over the wire if my runtime processing is swamped by that network time. This can be done without having to write multiple Ecto queries and the result I get back from either choice is identical with most of the processing of getting it that way handled by Ecto. With PRQL such choices seem like they’d be out of reach; whereas Ecto can have a prior understanding of the information architecture (Ecto Schema definitions w/associations) as well as it can make assumptions about reasonable data structures with which to return data, PRQL appears to just be an alternative language for querying and won’t be able to assume that it can return data in ways that Ecto can since it has to support being called from many languages.

I guess the in the end Ecto does provide a query DSL, but then a lot of data validation and data handling conveniences beyond just the query DSL which are probably more the reason to use Ecto anyway. PRQL would therefore appear to compete more with a Postgrex/raw SQL queries solution than with Ecto anyway.

3 Likes

Does PRQL make it easier to compose runtime queries? In Ecto it can be difficult/not possible in certain situations.

It’s still a textual format as it seems, so the same issue around injection attacks apply.

Yep, “composability” is a spectrum and there are cases in which Ecto fails the absolutely composable test. The biggest issues that I run into with Ecto’s query DSL regarding composability come from the fact that ordering can be important in SQL and Ecto doesn’t try to abstract that away… so ordering can also be important in Ecto; controlling for that can cascade into other issues as far as things like, for example, controlling for the presence of prerequisite joins being added without being duplicated, etc.

I’ve looked a bit more at the docs and I can’t see that PRQL would solve that (their examples are too simple and they don’t address the question from what I saw) and frankly if it did, I might be more concerned about the correctness or efficiencies of the created queries. The more introduced magic and abstraction becomes greater assumptions (and hidden assumptions) which become a lowered likelihood to get a good query at the end. This is the problems that many ORMS suffer and which also means that to use this is the real world you’d need to know PRQL and SQL since once the PRQL produces a poor or inefficient result you need SQL chops to troubleshoot it. I think that’s a flaw in their approach: the only way you’re really going to “replace SQL” in any meaningful way is not just to build the right language/abstractions but also replace the parser (at least) and query optimizer in the database itself to make your new language first class.

Now that I’ve looked more closely at PRQL, I don’t get it… I just don’t understand what they’re really trying to achieve. Actual composability seems like it would be substantially worse than Ecto. In Ecto I’m treating my query as a part of Elixir; sure some of that comes from macros and such, but as the client developer I don’t need to care so much… I can pass my “query” (query description really) from function to function and safely add to it or inspect it a bit. In PRQL, as @LostKobrakai points out, I’m still just munging around strings… sure I have to worry less about the ordering for some elements, but I’ve turned my problem of managing a description of a query into the much more nit-picky problem of managing a string which will later become a description of a query… ack. Aside from relaxing some of SQL’s ordering requirements, the language is at least as complex as SQL and you’re still producing a declarative statement of what you want at the end which then gets compiled to SQL… so you never really escape it’s limitations and quirks. Finally my understanding of SQL history, which could be wrong, is that SQL itself was created for the same audience and purposes: a simpler way to retrieve data for data analysists and other non-programmer power user types. You could take most of the PRQL marketing blurbs back several decades and just replace PRQL with SQL with the irony being that in the SQL/historical context there would be more to talk about.

2 Likes

With recent ecto versions there have been many things added which make those less complicated, but it’s still not trivial (or without issues).

https://hexdocs.pm/ecto/3.9.4/Ecto.Query.html#dynamic/1
https://hexdocs.pm/ecto/3.9.4/Ecto.Query.html#with_named_binding/3
https://hexdocs.pm/ecto/3.9.4/Ecto.Query.API.html#parent_as/1
https://hexdocs.pm/ecto/3.9.4/Ecto.Query.API.html#selected_as/1

3 Likes

Agreed, I’m not seeing a reason to use it over Ecto either. And from their roadmap it seems a lot of the things Ecto does are not even on their radar.

I’m curious to dig through their codebase though to see if they have any ideas that Ecto could use.

2 Likes

PRQL was never meant to replace Ecto.
They are two different tools. You can use it with Ecto as well.

Also, it’s still under heavy development(not even v1.0 yet).

The main target audience for PRQL are data scientists and data analysts.
I added PRQL elixir integration because it would be helpful for data analysts to use it in the future with elixir(with tools like Livebook).

The reason it’s using a NIF is that the PRQL compiler is written in Rust, and all other bindings do the same, calling the native implementation.

A lot of issues mentioned here are being addressed by the core team(sort syntax, CTE).

2 Likes

I would propose a title change

PRQL with Ecto

You can pass SQL-like bindings($1, $2) and then use SQL driver to safely replace these bindings when executing a query

I’d imagine so, but given it works the same than with sql it can’t be any improvement over how ecto handles parameterized queries using sql right now.