Ecto - raw query of array of nested-jsonb, parameter issues. Fragments?

Hey, I am trying to figure out how to make a raw query in ecto.
The query shall search in an array of nested jsonb elements. and I had troubles with the parameter in ecto.

I specifically looked at these issues:
My question is a “mixture” of these I guess:

and this issue:

The working solution at the moment is: variable is the parameter, it comes in form of a "username" string.

raw_sql =  ~s{
SELECT * FROM
  (SELECT *,
  generate_subscripts(docs_map, 1) AS s
  FROM docassoc where archived_timestamp = '9999-01-01 01:01:00') AS foo
WHERE jsonb_path_exists(docs_map[s], '$.ad_name[*] ? (@ == "#{variable}" )') }

  Ecto.Adapters.SQL.query!(
    AwardDB.Repo, raw_sql , []
  )

because I was not able to pass the parameter via $1 when the sql variable is wrapped like this. (as in the first issue above)

Can someone help my by pointing out how this should/could be improved. I guess like this it is not really “safe” (or is it?).
The ecto learning-curve is quite steep for me, I have not managed by using fragments (of course starting with jsonb adds to the steepness :wink: ). Can every query, thus also mine be represented using fragments? Then I’ll try this again. On the other hand if it gets much more complicated, I think readability would also be an issue somewhen :expressionless:

Many thanks for thinking through and for possible ideas!

“Safety” depends on where variable comes from in the raw SQL; if an attacker can get "') OR TRUE -- in there, the WHERE clause is always true. How big of a problem this is depends on what’s in that table and who can access this query.

jsonb_path_exists takes a map of named parameters as a third argument, maybe that would be easier to bind in a fragment?

I can’t find any specific information regarding how operators like jsonb_path_exists use indexes - depending on the size of your table, that could make this query very very very slow. Consider storing the data in a shape that’s easier to query.

1 Like

Could you share your migration file that creates the table + jsonb[] field?

If you are using {:array, :map}, or {:array, :jsonb} as the underlying type, it can be replaced with :jsonb, default: "[]" - which might allow simple JSON querying as apposed to pg array + JSON querying

1 Like

Thank you for your thoughts!

Indeed, safety is maybe not really an issue. This is not exposed to “users” or the “internet” or such… I guess I was asking for sort of “malpractice” but also then, this question maybe is just not really relevant :wink:

It could be that jsonb_path_exists doesn’t support indexes, I have to digg into this, first I am planning to do a small “experiment” without further optimization. I am doing this “on purpose” to test if such datastorage could be a replacement for my ageing mongodb solution. I am also open to innovative ideas (graphdb or such), but I thought I give the “normal” solution with postgres a chance but try to still keep the good parts from previous solutions. As I am integrating different datasources (files, differend DB’s, later kafka) I try to find a principle which is very modular and does not need too complicated db specific “migrations” or modeling. But I know that this “dream” is often destroyed by reality (or at least I think I have read that this is an antipattern quite often already).

I’ll do the performance part first, because now I am sort of “afraid” but then this with the “named parameters” sounds interesting. It is good to know that you don’t see a reason why this “cannot work” with fragments.

Many thanks!

Hey, I don’t quite understand yet, but this sounds interesting!

This is my experimental setup:

    create table(:docassoc, primary_key: false) do
      add :archived_timestamp,  :utc_datetime, primary_key: true, default:  "9999-01-01 01:01:00Z"
      add :fid, :integer, primary_key: true
      add :source, :string, primary_key: true
      add :origin_timestamp, :utc_datetime
      add :docs_map, {:array, :map}, default: []
      add :source_file,  :string
   end

introducing a three part primary key…

is this "default: [] " of mine already like you mean it, or would that actually be with the quoted "[ ]"?

edit: I think the difference is to have the “array” in jsonb instead of having an array of jsonb? correct?

edit2: in case I got the difference right, would this solution also allow for indexing and be preferable also from this point of view?

edit3: and many thanks also to you of course! :smiley:

edit4: i just run the adapted migration and made the insertion test, I was affraid, that my embeds_many is going to break… but seems just to work :smiley: that’s nice!

edit5: the simplicity is overwhelming SELECT * FROM docassoc WHERE docs_map @> '[{"ad_name":"testad"}]' no to fragments… :wink: I am really glad I asked! (waisted a few hours, but worth it :wink: )

edit6:
the json variable part in ecto/postgrex is a struggle on its own, but thanks to other people, this is what I ended up doing:

    ad_name = %{"ad_name" => ad_name}

    query =
    from doc in AwardDB.Docassoc,
    where: doc.archived_timestamp ==  ^DateTime.truncate(~U[9999-01-01 01:01:00Z], :second) and
          fragment(~s(? @>  ? ) , doc.docs_map, ^[ad_name] )

    AwardDB.Repo.all(query)
1 Like