SQL - brings an extensible SQL parser and sigil to Elixir, so you can confidently write SQL with automatic parameterized queries.

Hey there, I recently wrote this low-level library. Its goal is simply to lower the barrier between Elixir and SQL, and it does that by providing an SQL sigil.

I’ll list a few highlights compared to raw SQL and Ecto.Query, although the goal of this library is not to replace either, but to provide fundamental, safe building blocks for Ecto.SQL adapters.

  • Lowers the barrier for DBAs to contribute to your codebase, without having to translate SQL into Ecto.Query.
  • Composable queries: no need to remember whether to start with select or from.
  • Interpolated queries: no fiddling with fragments and ?.
  • Prevents SQL injection, by design.
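To make the last two points concrete, here is a hedged sketch of usage, pieced together from examples later in this thread (the sigil syntax and SQL.to_sql/1 appear in later posts; how the sigil is brought into scope may differ):

```elixir
# Sketch based on examples later in this thread; the exact import
# mechanism (e.g. `use SQL`) and interpolation syntax may differ.
min_age = 18

# The interpolated value does not end up in the SQL text; it becomes a
# bound parameter instead, which is what prevents SQL injection.
query = ~SQL"from users select id, name where age >= #{min_age}"

# SQL.to_sql/1 (shown later in the thread) returns the query string and
# its parameter list, ready to hand to Repo.query!/2.
{sql, params} = SQL.to_sql(query)
```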

You might have noticed that I listed preventing SQL injection, and you might be thinking: wait a minute, Ecto.Query already prevents SQL injection by design.

You would be correct. However, Ecto.SQL also provides the query/2 and query!/2 functions, which are vulnerable to SQL injection since they take a string and a list of params; the intention with this API is to give you an escape hatch for executing parameterized queries.
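To illustrate the distinction, a sketch (not from the library; Repo and the table name are placeholders, and $1 is the PostgreSQL placeholder style):

```elixir
name = "O'Malley"

# Vulnerable: the value is spliced into the SQL text itself, so a
# crafted input can change the meaning of the statement.
Repo.query!("SELECT * FROM users WHERE name = '#{name}'")

# Parameterized: the value travels separately from the SQL text and is
# never parsed as SQL.
Repo.query!("SELECT * FROM users WHERE name = $1", [name])
```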

This library is a result of a thread on the mailing list: https://groups.google.com/g/elixir-ecto/c/8MOkRFAdLZc

As this is an MVP, I’m looking for your feedback.

27 Likes

Very interesting work! It’s on my list of things to try out with EctoFDB.

My use case: FDB has no SQL, so I’m thinking I could make a CLI tool to query EctoFDB (similar to psql for Postgres) that uses the Ecto adapter and your SQL library.

1 Like

Pure serendipity, I was looking at a very similar challenge with NimbleParsec just yesterday.

Namely, splitting a file full of AWS Athena DDL/SQL statements (it only accepts one statement at a time) while interpolating some variables in the DDL statements.

I’m eager to read your implementation. There are some challenging edge cases, such as the user ‘O\’Malley’ and the ubiquitous ‘Little Bobby;drop tables’.
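For reference, here is the kind of naive single-pass splitter I have in mind; a sketch of my own (the module and all code are mine, not from the library), assuming standard SQL '' escaping rather than \'. It tracks whether we are inside a single-quoted string and only splits on top-level semicolons:

```elixir
# Naive sketch: split a string of SQL statements on semicolons while
# respecting single-quoted string literals, where a doubled '' keeps
# the quote inside the literal ('O''Malley').
defmodule NaiveSplitter do
  def split(sql) do
    do_split(String.graphemes(sql), false, [], [])
  end

  # End of input: flush the current statement if it is non-empty.
  defp do_split([], _in_string?, current, acc) do
    statement = current |> Enum.reverse() |> Enum.join() |> String.trim()
    Enum.reverse(if statement == "", do: acc, else: [statement | acc])
  end

  # A semicolon outside a string terminates the current statement.
  defp do_split([";" | rest], false, current, acc) do
    statement = current |> Enum.reverse() |> Enum.join() |> String.trim()
    do_split(rest, false, [], if(statement == "", do: acc, else: [statement | acc]))
  end

  # A doubled quote inside a string is an escaped quote, not a close.
  defp do_split(["'", "'" | rest], true, current, acc) do
    do_split(rest, true, ["''" | current], acc)
  end

  # A lone quote toggles string mode.
  defp do_split(["'" | rest], in_string?, current, acc) do
    do_split(rest, not in_string?, ["'" | current], acc)
  end

  # Any other grapheme is kept as-is.
  defp do_split([g | rest], in_string?, current, acc) do
    do_split(rest, in_string?, [g | current], acc)
  end
end

NaiveSplitter.split("select 'O''Malley'; select 'Little Bobby;drop tables'")
# => ["select 'O''Malley'", "select 'Little Bobby;drop tables'"]
```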

2 Likes

I would love to see that file! I did start out with a version in NimbleParsec, but I was pushing the limits of how I wanted to write the MVP, and I didn’t want to try to retrofit the grammar to make it composable.

So I ended up hand writing it. (this could change in the future though)

If I understand parser terminology right, then what I have written is a recursive descent parser, single pass: the AST is built as I read the string. NimbleParsec does not fit that model well, but it would be great for spitting out tokens, and then you could write some logic to analyze them and build your AST.

Are the edge cases you listed interpolated values, or string literals?

If the latter, then I’m not sure my parser could handle those cases, as I only built it with the sigil in mind, not for migrating files of exotic SQL statements. Furthermore, I have yet to implement DDL keywords in the parser, and any keyword not tokenized by the parser would end up as an ident token and be passed through.

2 Likes

Did you try with Erlang’s yecc and leex?

1 Like

Yes, I’ve used those in the past, and I might use them for this in the future. But generated parsers do not come without their flaws.

For this project I wanted to see if we could get all the benefits of Ecto.Query and Elixir with its composability but with less abstraction and more control.

5 Likes

I looked closely at this; my parser didn’t support backticks yet, but it was pretty simple to add:

iex(18)> ~SQL[select `Little Bobby;drop tables`].query
[
  {:select, [line: 0, column: 6, end_line: 0, end_column: 32],
   [
     {:"``", [line: 0, column: 8, end_line: 0, end_column: 32],
      [
        {:ident, [line: 0, column: 8, end_line: 0, end_column: 32],
         [~c"Little Bobby;drop tables"]}
      ]}
   ]}
]

For what it’s worth, it looks like your examples got botched. I believe that only quotes and backticks are valid SQL syntax, but with that in mind, my parser would have trouble with O\Malley, but that also smells like invalid SQL, and should be: ‘OMalley’:

iex(26)> ~SQL[select  'O`Malley'].query
[
  {:select, [line: 0, column: 6, end_line: 0, end_column: 17],
   [
     {:"''", [line: 0, column: 9, end_line: 0, end_column: 17],
      [
        {:ident, [line: 0, column: 9, end_line: 0, end_column: 17],
         [~c"O`Malley"]}
      ]}
   ]}
]
1 Like

Awesome library, thank you for sharing this!

After a cursory look, my 2c:

Currently we have:

iex> ~SQL"SELECT 42"
select 42

I’d consider using ~SQL[...] or #SQL[...] in the inspect implementation:

iex> ~SQL"SELECT 42"
~SQL[select 42]

Currently we have:

x = 1 ; ~SQL"SELECT #{x}"

By convention we use interpolation with lowercase sigils, but (to my dismay) we can’t have a lowercase multi-letter ~sql, so another idea would be to have a slightly different interpolation syntax, say:

x = 1 ; ~SQL"SELECT {x}"

Besides following convention I think doing this would actually be a security feature, never ever allow users to use string interpolations anywhere near SQL. Once users feel comfortable using this library that does the right thing they will inevitably (ok, I’m projecting, I definitely will) call sql = "select #{x}" ; Repo.query!(sql) and that can turn into a bug pretty fast. Using a distinct syntax like {...} prevents this. In hindsight, Repo.query! accepting sanitized strings (akin to, say, Phoenix.HTML.Safe) might have been a better default.

4 Likes

Thanks for the feedback! I’m currently working my way through the SQL grammar. Once I can generate a parser from the BNF file, we can make a decision based on our options. From what I can tell, braces are valid SQL syntax, but I couldn’t find any use of #.

I’m polishing off the next version, which is currently scheduled for release in the coming weeks, and I have two exciting sneak peeks: we’re now more or less conformant with SQL:2016, with over 900 generated tests, and SQL generation is 400-650x faster than Ecto’s.

9 Likes

This is amazing!

1 Like

I’m very excited for the new release of sql, which comes with best-in-class performance and memory usage, at a minimum 50x better compared to Ecto.

The test suite has also gotten an overhaul, with over 900 tests checking conformance with SQL:2016.

sql now also supports prepared queries.

Check out sql/CHANGELOG.md at main · elixir-dbvisor/sql · GitHub for more!

7 Likes

The time has come to marry Ecto and SQL, so I’m asking for your feedback. What do you think of this:

iex(1)> Enum.into(~SQL[from users select *], User.new())

16:31:00.046 [debug] QUERY OK db=0.4ms decode=0.7ms queue=1.6ms
select * from users []
[
  #User<
    __meta__: #Ecto.Schema.Metadata<:built, "users">,
    id: 1,
    name: "john",
    age: 18,
    ...
  >
]
defmodule User do
  use Ecto.Schema

  schema "users" do
    field :name, :string
    field :age, :integer, default: 0
    field :password, :string, redact: true
  end

  def new, do: %User{}

  defimpl Collectable do
    def into(%struct{__meta__: %Ecto.Schema.Metadata{}} = schema) do
      # Cast each row map into the schema struct, raising on invalid
      # data via apply_action!/2.
      collector = fn rows, _acc ->
        Enum.map(rows, fn row ->
          schema
          |> Ecto.Changeset.cast(row, struct.__schema__(:fields))
          |> Ecto.Changeset.apply_action!(:insert)
        end)
      end

      {[], collector}
    end
  end
end
defmodule SQL.Repo do
  use Ecto.Repo, otp_app: :sql, adapter: Ecto.Adapters.Postgres

  defimpl Enumerable, for: SQL do
    def count(_enumerable) do
      {:error, __MODULE__}
    end
    def member?(_enumerable, _element) do
      {:error, __MODULE__}
    end
    def reduce(%SQL{} = enumerable, _acc, _fun) do
      {sql, params} = SQL.to_sql(enumerable)
      result = SQL.Repo.query!(sql, params)
      {:done, Enum.map(result.rows, &Map.new(Enum.zip(result.columns, &1)))}
    end
    def slice(_enumerable) do
      {:error, __MODULE__}
    end
  end
end

Obviously, the above is a minimal and unoptimized example that leverages Ecto.Changeset for the heavy lifting.

3 Likes

My 2 cents follow..

It’s neat but I’m afraid it asks too much of the reader. At first glance it appears to be a pure function but there are side effects deep in the protocol. I usually prefer to be able to point to obvious code where the DB work happens.

Yeah, the Enumerable and Collectable protocols are definitely among the more advanced concepts in Elixir, and do require a lot of knowledge from new users. The same issue you point out here is true for any IO work, even file streams.

2 Likes

Protocol implementations are of course arbitrary by design, but it strikes me as odd that Enum.into(..., %User{}) returns anything other than a %User{}, the same type of thing we’re collecting into. Even the following, which we tend to use only for side effects, returns the collectable struct:

iex> Enum.into(~w[foo bar baz], IO.stream())
foobarbaz%IO.Stream{...}

That being said, at the moment I don’t have any alternative suggestions. :slight_smile: One thing about your implementation though, instead of Ecto.Changeset.apply_action!, I think you can use Repo.load/2. It should perform the same type “casting” (actually Ecto.Type.load/2) as when receiving the data from the usual Repo.all(struct):

iex> Repo.all(from u in "users", select: %{id: u.id, email: u.email}) |> Enum.map(&Repo.load(User, &1))
[
  %User{
    __meta__: #Ecto.Schema.Metadata<:loaded, "users">,
    id: 1,
    email: "alice@example.com",
    inserted_at: nil,
    updated_at: nil
  }
]
3 Likes

Enum.each(~SQL"...", fun) is really cool, I love stuff like that. By implementing Enumerable for %SQL{} ourselves, we need to pick a particular repo. Honestly that’s not that big of a deal, we tend to only have one anyway, but it still feels limiting.

WDYT about this?

  sql = ~SQL"..."
- Enum.each(sql, fun)
+ Repo.all(sql) |> Enum.each(fun)

i.e. instead of users implementing Enumerable themselves, SQL implements Ecto.Queryable!

4 Likes

The only alternative I could think of would be a callback for the reduce in the SQL Enumerable protocol.

iex(1)> Enum.to_list(~SQL[from users select *])

16:31:00.046 [debug] QUERY OK db=0.4ms decode=0.7ms queue=1.6ms
select * from users []
[
  #User<
    __meta__: #Ecto.Schema.Metadata<:built, "users">,
    id: 1,
    name: "john",
    age: 18,
    ...
  >
]

But that could only work if Ecto.Queryable would produce the SQL struct, so we know what structure it should be cast to.

oops, implementing Ecto.Queryable won’t work because it’s, well, generating SQL off Ecto.Query but the whole point is we already have SQL. Nevermind!

Agree, the superpower here is the Enum protocol. And I definitely see a path forward for this; it’s just about finding the sweet spot of where to store and retrieve the essential information to query and load the result set.

Today SQL already leverages use to set adapters, and this could be leveraged for Repo etc.

But I’m still not sure what the best approach is for a clean API.