Ecto not allowing string interpolation in fragments?

I am trying to use where exists (), which requires a fragment in ecto. I need to use a different table name for the subquery in different situations, so I thought a variable that gets interpolated would be ideal. No, the value does not come from an unsafe source. I am aware of the dangers of SQL injection attacks and how to avoid them. I am selecting the value of the variable from another ecto schema inside a case expression. Surely there must be some way of doing this, without having to entirely disconnect from the Ecto abstractions we already depend on?

Here is an example:

  def thing(schema) do
    other_table = Other.Schema.__schema__(:source)
    from(a in My.Schema,
      where: fragment("exists (
        SELECT 1
        FROM #{other_table} o
        WHERE o.column_name = ?)", ^a.my_field)
    )
    |> Repo.all()
  end

And here is the error:

(Ecto.Query.CompileError) to prevent SQL injection attacks, fragment(...) does not allow strings to be interpolated as the first argument via the `^` operator, got: `"exists (\n        SELECT 1\n        FROM #{other_table} o\n        WHERE o.column_name = ?)"

We are trying to use exists because it is significantly more performant than the alternatives that ecto provides for our use-case. We are reaching the scale where inefficient queries that can’t use indexes properly are becoming critical problems.

2 Likes

Have you tried do the string interpolation ahead of building the query.
e.g.

 other_table = ...
 frag = "exists(SELECT 1 FROM #{other_table} WHERE o.column_name = ?)"
 from (a in My.Scheme,
  where: fragment(frag, ^a.my_field)

I haven’t tested it myself, but it’s the first thing I’d try

Are you by chance using PostgreSQL? If so then SELECT 1 FROM table WHERE pattern LIMIT 1 will be equally performant. Unfortunately there is no way to do string interpolation in fragment as IIRC it is compile-time checked for correctness. There was fragment_unsafe for brief moment but IIRC it was removed.

2 Likes

No, this does not seem to work either.

Thanks, but unfortunately this is actually meant to operate on an existing query that is being passed in, kind of like so (though this is greatly simplified):

  def apply_grant_scoping(query, schema) do
    other_table = Other.Schema.__schema__(:source)
    query
    |> where([resource: a], fragment("exists (
        SELECT 1
        FROM #{other_table} o
        WHERE o.column_name = ?)", ^a.my_field)
    )
  end

It is basically an implementation of a grant-based authorization system. It runs on a lot of queries we make so performance is paramount. Honestly I guess I can find some way to get all the different possible strings constructed at compile time, but this is pretty frustrating. I feel like as an API it should simply accept the string I give it, or at least have some sort of escape hatch available.

The more we try to optimize our system for performance, the more we have become frustrated with Ecto. It is an excellent library, but it appears geared towards smaller-scale applications that treat the database more like simple data store rather than an equal and essential part of solving complex engineering problems. I of course knew that hand-writing SQL would be necessary eventually, but I just assumed ecto wouldn’t do things like this to prevent me from constructing queries manually, otherwise I would have gone for bare postgrex from the beginning.

Ecto supports subqueries natively in joins. Is there anything preventing you from refactoring the query to be joined instead of being a where condition?

1 Like

Is there anything preventing you from refactoring the query to be joined instead of being a where condition?

Yes, unfortunately, in order to accomplish this with joins we have to use joins, conditions, and even sometimes unions that end up having significantly worse performance. Initially this logic was all correctly implemented using joins without any fragments. I am only now trying to rewrite things in order to save performance. I have tried a lot of different combinations to try to get the query planner to use various indexes but so far using where exists and where not exists is the only strategy that has actually worked. The problem seems to come up a lot with left joins and conditions that exclude some of the resulting null values, along with a few other things. I want to avoid fragment usage wherever possible so I always try to use joins first, but sometimes I am forced to turn to fragments.

Bottom line seems to be that sometimes using exists with a subquery is significantly more performant than equivalent logic implemented with joins, though I can’t say I know exactly why. I’m sure some of it depends on what version of Postgres, but I’ve tried updating to the latest and still run into the same issues.

1 Like

For instance, this blog explains one case where using exists gives a 20x speedup over the equivalent non-exists SQL: https://danmartensen.svbtle.com/sql-performance-of-join-and-where-exists

There are a lot of other situations that arise like that one that are hard to generalize. You just sometimes run into them and need to switch over to using exists.

1 Like

Have you benchmarked it on your own with newer PSQL? You can also use LATERAL in JOIN to simulate EXISTS with current Ecto.

1 Like

Hi @Frogglet! We don’t allow interpolation because it can be unsafe and lead to SQL injection attacks.

Have you tried passing the table name as fragment interpolation: fragment(“... FROM ? ....”, ^table_name)?

3 Likes

Hey, Jose, thanks for taking the time to help!

Have you tried passing the table name as fragment interpolation: fragment(“... FROM ? ....”, ^table_name) ?

Yeah, unfortunately that doesn’t seem to work. I get this syntax error: (Postgrex.Error) ERROR 42601 (syntax_error) syntax error at or near "$1"

The sql in question is:

WHERE (exists (
        SELECT 1
        FROM $1
        WHERE 1 = 1))

It doesn’t seem to like using an argument in the FROM clause.

1 Like

I seem to remember having read somewhere that table names cannot be parameterised.

1 Like

Yeah, unfortunately then I think you will have to hard code the table name. If you have multiple table names, you will need a case and a clause for each:

  def thing(schema) do
    other_table = Other.Schema.__schema__(:source)
    from(a in My.Schema) |> add_exists(other_table) |> Repo.all()
  end

  defp add_exists(query, "omg")
    from q in query,
      where: fragment("exists (
        SELECT 1
        FROM omg o
        WHERE o.column_name = ?)", ^q.my_field)
      )
  end

Sorry. :confused:

5 Likes

Which can luckily be macro’ed away… Roughly:

[
  {"omg", "o"}
]
|> Enum.each(fn {tbl, t} ->
  defp add_exists(qry, unquote(tbl)) do
    from q in qry,
      where: fragment(unquote("exists (
        SELECT 1
        FROM #{tlb} #{t}
        WHERE #{t}.column_name = ?)"), ^q.my_field)
      )
  end
end

This is a very rough draft though… And I’m not exactly sure about unquoteing the string passed to fragment. But this way it should be evaluated at compiletime and “look like” a static string to fragment.

8 Likes

Have you benchmarked it on your own with newer PSQL? You can also use LATERAL in JOIN to simulate EXISTS with current Ecto.

I haven’t benchmarked the blog post’s example, but I have definitely benchmarked my code and the equivalent logic implemented in joins is significantly slower. That is a good point about the lateral join though. If I can’t do it with macros I’ll try that, although converting a complex series of exists and not exists might be difficult.

So this appears to work:

  defmacro exists_frag(table) do
    arguments = ["EXISTS (
      SELECT 1
      FROM #{table} a
      WHERE a.my_column = ?
    )", (quote do: x.id)]
    quote do: fragment(unquote_splicing(arguments))
  end

  def foo() do
    from(x in My.Schema,
      where: exists_frag("my_table_name")
    )
  end

I used this blog post as well: https://dreamconception.com/tech/ecto-how-to-build-dynamic-fragments-and-case-statements/

Nevermind, that doesn’t really appear to work if the tablename isn’t supplied as a literal, which makes sense I guess. Don’t know if there’s a way for macros to actually make this easier or not.

Which can luckily be macro’ed away… Roughly:

Oh I see! I didn’t understand what you were doing at first, but actually yes that is great! I’ll need to tweak it but that should work thanks! :grinning:

Hi all, I’ve spent a lot of time with this thread and with Ecto fragments - how to regex match dynamically? , but I still can’t quite tell if I’ve got the same problem. If not, I’ll move this over to a new question. Here’s the context:

We’re using Postgres 12 and the jsonb_path_exists function to find records with a specific value in a json column. (https://www.postgresql.org/docs/12/functions-json.html#FUNCTIONS-JSON-PROCESSING-TABLE)

def search_text(query, %{"search_string" => _string_not_used}) do
  from(r in query,
    where:
      fragment(
        "jsonb_path_exists(?, '$.** \\? (@ == \"some_text\")')",
        r.body
      )
  )
end

The above code works like a charm, but when I plug in the variable interpolation like so:

def search_text(query, %{"search_string" => search_string}) do
  from(r in query,
    where:
      fragment(
        "jsonb_path_exists(?, '$.** \\? (@ == \"?\")')",
        r.body,
        ^search_string
      )
  )
end

the resulting query is {"SELECT r0.\"id\", r0.\"project_id\", r0.\"inserted_at\" FROM \"receipts\" AS r0 WHERE (r0.\"project_id\" = $1) AND (r0.\"inserted_at\" >= $2) AND (r0.\"inserted_at\" < $3) AND (jsonb_path_exists(r0.\"body\", '$.** ? (@ == \"$4\")')) ORDER BY r0.\"id\" DESC", [1, ~U[2020-03-31 23:17:04Z], ~U[2020-04-30 23:17:28Z], "variable_text"]} and I get:

%ArgumentError{
  message: "parameters must be of length 3 for query %Postgrex.Query{
    cache: :reference, columns: [\"count\"],
    name: \"ecto_7266\",
    param_formats: [:binary, :binary, :binary],
    param_oids: [23, 1114, 1114],
    param_types: [Postgrex.Extensions.Int4, Postgrex.Extensions.Timestamp, Postgrex.Extensions.Timestamp],
    ref: #Reference<0.1575397138.1124859907.232543>, result_formats: [:binary], result_oids: [20],
    result_types: [Postgrex.Extensions.Int8],
    statement: \"SELECT count('*') FROM \\\"receipts\\\" AS r0 WHERE (r0.\\\"project_id\\\" = $1) AND (r0.\\\"inserted_at\\\" >= $2) AND (r0.\\\"inserted_at\\\" < $3) AND (jsonb_path_exists(r0.\\\"body\\\", '$.** ? (@ == \\\"$4\\\")'))\",
    types: {Postgrex.DefaultTypes, #Reference<0.1575397138.1124990979.26616>}
  }"
}

I gather this has to do with Postgrex (or Ecto) passing in my variable as 'variable_text', resulting in "'variable_text'" in Postgres, and I’ve tried to use unquote(…) or the like, but can’t quite figure it out. Is there something big that I’m missing, and am I reading the right thread?

Would custom Ecto types be a better place to look?

Thanks so much in advance. @Frogglet, if that final @NobbZ post worked for you, would you mind sharing your code?

1 Like

@taylordowns2000
working example:

def search_text(query, %{"search_string" => search_string}) do
  from(r in query,
    where:
      fragment(
        "jsonb_path_exists(?, '$.** \\? (@ == $search_string)', jsonb_build_object('search_string', ?::text))",
        r.body,
        ^search_string
      )
  )
end
2 Likes