Ecto Fragment with dynamic JSON fields

steffan_levet · February 28, 2019, 7:36pm

Hi all,

I have an interesting situation where I am trying to build dynamic queries which includes the need to group the query by some dynamic json fields.
To make this a little more concrete, the table in question is called Event, with one column called topic, and one called data which is jsonb. Depending on the value of topic, the contents of data will be different.

What I want to do is this:

json_field = build_json_field(params) #interpret stuff that came in from a request say 

query
|> group_by([e], fragment(^json_field))

Unfortunately, this doesn’t work to prevent sql injection.

I know that fragment supports a list of fragments eg.

query
|> group_by([e], fragment("data->'?'", ^json_field))

however, the json_field can be any arbitrary value, that can go to any arbitrary depth ie.
“data->‘changes’->‘items’->0->‘plan’”

How could I do this using fragment?
I am afraid I may need to just make the raw sql query, but that feels like a hack.

Thanks for any guidance

OvermindDL1 · March 5, 2019, 9:12pm

Hmm, this would be in the area of unsafe_fragment but that feature was removed before its full release. If you know the possible depth range you can just setup a case to encode all variations perhaps?

axelson · March 5, 2019, 10:17pm

I don’t think unsafe_fragment is needed for this use-case. It sounds similar to my Building an Ecto macro to generate a row value question.

The core of creating that macro was:

    quote do
      fragment(unquote(frag_text), unquote_splicing(frag_arguments))
    end

Where frag_text is a binding that contains a variable amount of "?" and frag_arguments is a list with a value for each question mark.

@steffan_levet A few more examples of input and manual query would make it easier to help out.

OvermindDL1 · March 5, 2019, 10:19pm

Except from what I gathered the number of ? is based on user input, so a quote won’t work there, thus I’d think you’d have to generate a set of matches for it and pick them at compile-time based on the count, which does necessarily impose a max size of course… Hmm, might be a way to do nested where’s by binding a value, but that would make the query more costly…

axelson · March 5, 2019, 10:42pm

Ah yeah, that’s true. I guess I forgot that you have to expand the macro at compile time!

steffan_levet · March 5, 2019, 11:57pm

Yeah so I have tried to solve this with macros but as @OvermindDL1 correctly notes it would impose a max size.

Regarding more concrete examples, the column in question is just a blob of JSON that can be anything, so concrete examples are not that helpful I don’t think. I can explain what I am trying to do a little more, and they may help move the discussion.

The api in question lets you provide a couple of date ranges, a segment value, along with a query on a topic value, and get back a list of segmented results for the date range with a count and a total sum. eg:

-topic=account_created
-before=2019-02-22T00:00:00Z
-after=2019-02-19T00:00:00Z
-segment[unit]=day
-segment[value]=1

will return:

{
  "total": 5,
  "range": [
    {
      "value": 0,
      "total": 0,
      "before": "2019-02-20T00:00:00+00:00",
      "after": "2019-02-19T00:00:00+00:00"
    },
    {
      "value": 2,
      "total": 2,
      "before": "2019-02-21T00:00:00+00:00",
      "after": "2019-02-20T00:00:00+00:00"
    },
    {
      "value": 3,
      "total": 5,
      "before": "2019-02-22T00:00:00+00:00",
      "after": "2019-02-21T00:00:00+00:00"
    }
  ]
}

That works fine, and is not that hard to do. No though, we want to break up the value field of each segment by some arbitrary group_by. It is arbitrary because different topics have different chunks of JSON associated.

The example I provided in my opening question “data->‘changes’->‘items’->0->‘plan’” should result in something like:

{
      "value": {
          plan_1: 2, 
          plan_2: 1
      },
      "total": 5,
      "before": "2019-02-22T00:00:00+00:00",
      "after": "2019-02-21T00:00:00+00:00"
    }

Hopefully that is clear. Here is the query I would want to make in the above example:

query
    |> group_by([e], fragment("data->'changes'->'items'->0->'plan'"))
    |> select(
      [e],
      {fragment("data->'changes'->'items'->0->'plan'"),
       fragment("count(data->'changes'->'items'->0->'plan')")}
    )

pretty simple, we are just adding in the group_by, and instead of selecting the count, we are selecting on the fragments. I have written this in SQL, and got the results I want.

unsafe_fragment is what I want. I like that Ecto is protecting young players, but in this case it is really limiting me.
Do you know where I might be able to find that source, maybe I need to write my own unsafe_fragment?

Thanks for the help here