Are you by chance using PostgreSQL? If so then SELECT 1 FROM table WHERE pattern LIMIT 1
will be equally performant. Unfortunately there is no way to do string interpolation in fragment
as IIRC it is compile-time checked for correctness. There was fragment_unsafe
for brief moment but IIRC it was removed.
No, this does not seem to work either.
Thanks, but unfortunately this is actually meant to operate on an existing query that is being passed in, kind of like so (though this is greatly simplified):
def apply_grant_scoping(query, schema) do
other_table = Other.Schema.__schema__(:source)
query
|> where([resource: a], fragment("exists (
SELECT 1
FROM #{other_table} o
WHERE o.column_name = ?)", ^a.my_field)
)
end
It is basically an implementation of a grant-based authorization system. It runs on a lot of queries we make so performance is paramount. Honestly I guess I can find some way to get all the different possible strings constructed at compile time, but this is pretty frustrating. I feel like as an API it should simply accept the string I give it, or at least have some sort of escape hatch available.
The more we try to optimize our system for performance, the more we have become frustrated with Ecto. It is an excellent library, but it appears geared towards smaller-scale applications that treat the database more like simple data store rather than an equal and essential part of solving complex engineering problems. I of course knew that hand-writing SQL would be necessary eventually, but I just assumed ecto wouldn’t do things like this to prevent me from constructing queries manually, otherwise I would have gone for bare postgrex from the beginning.
Ecto supports subqueries natively in joins. Is there anything preventing you from refactoring the query to be joined instead of being a where condition?
Is there anything preventing you from refactoring the query to be joined instead of being a where condition?
Yes, unfortunately, in order to accomplish this with joins we have to use joins, conditions, and even sometimes unions that end up having significantly worse performance. Initially this logic was all correctly implemented using joins without any fragments. I am only now trying to rewrite things in order to save performance. I have tried a lot of different combinations to try to get the query planner to use various indexes but so far using where exists
and where not exists
is the only strategy that has actually worked. The problem seems to come up a lot with left joins and conditions that exclude some of the resulting null values, along with a few other things. I want to avoid fragment usage wherever possible so I always try to use joins first, but sometimes I am forced to turn to fragments.
Bottom line seems to be that sometimes using exists
with a subquery is significantly more performant than equivalent logic implemented with joins, though I can’t say I know exactly why. I’m sure some of it depends on what version of Postgres, but I’ve tried updating to the latest and still run into the same issues.
For instance, this blog explains one case where using exists
gives a 20x speedup over the equivalent non-exists SQL: https://danmartensen.svbtle.com/sql-performance-of-join-and-where-exists
There are a lot of other situations that arise like that one that are hard to generalize. You just sometimes run into them and need to switch over to using exists
.
Have you benchmarked it on your own with newer PSQL? You can also use LATERAL
in JOIN
to simulate EXISTS
with current Ecto.
Hi @Frogglet! We don’t allow interpolation because it can be unsafe and lead to SQL injection attacks.
Have you tried passing the table name as fragment interpolation: fragment(“... FROM ? ....”, ^table_name)
?
Hey, Jose, thanks for taking the time to help!
Have you tried passing the table name as fragment interpolation:
fragment(“... FROM ? ....”, ^table_name)
?
Yeah, unfortunately that doesn’t seem to work. I get this syntax error: (Postgrex.Error) ERROR 42601 (syntax_error) syntax error at or near "$1"
The sql in question is:
WHERE (exists (
SELECT 1
FROM $1
WHERE 1 = 1))
It doesn’t seem to like using an argument in the FROM
clause.
I seem to remember having read somewhere that table names cannot be parameterised.
Yeah, unfortunately then I think you will have to hard code the table name. If you have multiple table names, you will need a case and a clause for each:
def thing(schema) do
other_table = Other.Schema.__schema__(:source)
from(a in My.Schema) |> add_exists(other_table) |> Repo.all()
end
defp add_exists(query, "omg")
from q in query,
where: fragment("exists (
SELECT 1
FROM omg o
WHERE o.column_name = ?)", ^q.my_field)
)
end
Sorry.
Which can luckily be macro’ed away… Roughly:
[
{"omg", "o"}
]
|> Enum.each(fn {tbl, t} ->
defp add_exists(qry, unquote(tbl)) do
from q in qry,
where: fragment(unquote("exists (
SELECT 1
FROM #{tlb} #{t}
WHERE #{t}.column_name = ?)"), ^q.my_field)
)
end
end
This is a very rough draft though… And I’m not exactly sure about unquote
ing the string passed to fragment
. But this way it should be evaluated at compiletime and “look like” a static string to fragment
.
Have you benchmarked it on your own with newer PSQL? You can also use
LATERAL
inJOIN
to simulateEXISTS
with current Ecto.
I haven’t benchmarked the blog post’s example, but I have definitely benchmarked my code and the equivalent logic implemented in joins is significantly slower. That is a good point about the lateral join though. If I can’t do it with macros I’ll try that, although converting a complex series of exists
and not exists
might be difficult.
So this appears to work:
defmacro exists_frag(table) do
arguments = ["EXISTS (
SELECT 1
FROM #{table} a
WHERE a.my_column = ?
)", (quote do: x.id)]
quote do: fragment(unquote_splicing(arguments))
end
def foo() do
from(x in My.Schema,
where: exists_frag("my_table_name")
)
end
I used this blog post as well: https://dreamconception.com/tech/ecto-how-to-build-dynamic-fragments-and-case-statements/
Nevermind, that doesn’t really appear to work if the tablename isn’t supplied as a literal, which makes sense I guess. Don’t know if there’s a way for macros to actually make this easier or not.
Which can luckily be macro’ed away… Roughly:
Oh I see! I didn’t understand what you were doing at first, but actually yes that is great! I’ll need to tweak it but that should work thanks!
Hi all, I’ve spent a lot of time with this thread and with Ecto fragments - how to regex match dynamically? , but I still can’t quite tell if I’ve got the same problem. If not, I’ll move this over to a new question. Here’s the context:
We’re using Postgres 12 and the jsonb_path_exists
function to find records with a specific value in a json column. (https://www.postgresql.org/docs/12/functions-json.html#FUNCTIONS-JSON-PROCESSING-TABLE)
def search_text(query, %{"search_string" => _string_not_used}) do
from(r in query,
where:
fragment(
"jsonb_path_exists(?, '$.** \\? (@ == \"some_text\")')",
r.body
)
)
end
The above code works like a charm, but when I plug in the variable interpolation like so:
def search_text(query, %{"search_string" => search_string}) do
from(r in query,
where:
fragment(
"jsonb_path_exists(?, '$.** \\? (@ == \"?\")')",
r.body,
^search_string
)
)
end
the resulting query is {"SELECT r0.\"id\", r0.\"project_id\", r0.\"inserted_at\" FROM \"receipts\" AS r0 WHERE (r0.\"project_id\" = $1) AND (r0.\"inserted_at\" >= $2) AND (r0.\"inserted_at\" < $3) AND (jsonb_path_exists(r0.\"body\", '$.** ? (@ == \"$4\")')) ORDER BY r0.\"id\" DESC", [1, ~U[2020-03-31 23:17:04Z], ~U[2020-04-30 23:17:28Z], "variable_text"]}
and I get:
%ArgumentError{
message: "parameters must be of length 3 for query %Postgrex.Query{
cache: :reference, columns: [\"count\"],
name: \"ecto_7266\",
param_formats: [:binary, :binary, :binary],
param_oids: [23, 1114, 1114],
param_types: [Postgrex.Extensions.Int4, Postgrex.Extensions.Timestamp, Postgrex.Extensions.Timestamp],
ref: #Reference<0.1575397138.1124859907.232543>, result_formats: [:binary], result_oids: [20],
result_types: [Postgrex.Extensions.Int8],
statement: \"SELECT count('*') FROM \\\"receipts\\\" AS r0 WHERE (r0.\\\"project_id\\\" = $1) AND (r0.\\\"inserted_at\\\" >= $2) AND (r0.\\\"inserted_at\\\" < $3) AND (jsonb_path_exists(r0.\\\"body\\\", '$.** ? (@ == \\\"$4\\\")'))\",
types: {Postgrex.DefaultTypes, #Reference<0.1575397138.1124990979.26616>}
}"
}
I gather this has to do with Postgrex (or Ecto) passing in my variable as 'variable_text'
, resulting in "'variable_text'"
in Postgres, and I’ve tried to use unquote(…) or the like, but can’t quite figure it out. Is there something big that I’m missing, and am I reading the right thread?
Would custom Ecto types be a better place to look?
Thanks so much in advance. @Frogglet, if that final @NobbZ post worked for you, would you mind sharing your code?
@taylordowns2000
working example:
def search_text(query, %{"search_string" => search_string}) do
from(r in query,
where:
fragment(
"jsonb_path_exists(?, '$.** \\? (@ == $search_string)', jsonb_build_object('search_string', ?::text))",
r.body,
^search_string
)
)
end
I had a similar issue trying to dynamically build fragment strings (from a trusted source) and it seems like a general escape hatch for these kind of issues, when trying to use third party macros with dynamic inputs where not explicitly supported is to use Code.eval_quoted
. So, for example, you can wrap an entire select statement and fool Elixir/Ecto into thinking the variables are in fact literals:
Code.eval_quoted(
quote do
select(unquote(escaped_q), [table], %{
dynamic_select: fragment(unquote(some_var), table.field))
})
end
)
I am pretty positive this is the last tool you should reach for and maybe it would be better to use postgrex directly for these cases. I haven’t tried that so I can’t speak to the pros/cons but I’d love to hear people’s thoughts about the practice, and maybe whether Ecto could adopt some “blessed” way of circumventing protections like this one because I certainly feel dirty doing this.
Searched long and hard for this solution! Worked like a charm—thank you!