Parsing a custom query language and querying an Ecto database

I’m rewriting a query parser I have already written in Python twice, with pyparsing/sqlalchemy, and with ply/django. I am now interested in seeing it at work within Elixir. I am VERY new to Elixir, so I’m afraid I’m not yet in the right mindset.

this is the target:

and this is the corresponding ply/Python code:

I am unsure about quite a few things.

when tokenizing words, I have reserved words, too. in my ply grammar, I let the user write strings quoted or unquoted, but I think I will drop this, to make things easier. or what would you suggest?

are there guidelines / better styles to follow when speaking of Terminals and Nonterminals? I would put Terminals in ALL CAPS, but what’s the impact on the code?

to make an example, is the form ‘[’ preferable to LBRACKET ?

coming from Python, I realise I have the inclination to think I’m producing an object when parsing the query string, and in the end I would evaluate the object, which would be a query. but I guess this is not the way I should think here. I would be building a data structure, which I would then feed to one or more functions (as many as the methods of my python class), defined by pattern-match.

let alone when we come to Ecto, where I will need to compute unions and intersections and negations of query sets… and navigate relations between tables… and implement aggregating functions.

just as an example, these are two legal queries:
taxon where rank.id>=17 and count(verifications)>0
accession where id in [1 5 111] and count(plants.images)>0

it would be of great help getting: code contributions and reviews, reading suggestions, related GPL software sources.

I’d highly suggest checking out https://github.com/plataformatec/nimble_csv as far as parsing goes.

Hmmmmm I think maybe @benwilson512 meant nimble_parsec. Which would be good for this project, more expressive and easier to debug than leex and yecc.

Oops, I did!

Wouldn’t leex/yecc mean writing less code and maybe be faster?

having seen the tiny examples in nimble_parsec, and being comfortable with the lex+yacc combination in C and Python, yes, I have this same impression.

Consider the leex and yecc packages in the OTP standard library:

Less code: yes, I would agree with that (I’ve done a reasonable amount of work in leex/yecc and with nimble_parsec).

Faster: Not sure I agree. There’s some strong sub-binary optimization in nimble_parsec too, but for sure it would require proper like-for-like testing to decide. Or someone more qualified in both approaches than me.

If one has already climbed the learning curve of leex/yecc or their even more ancient cousins, then they are certainly straightforward to use (although I do find removing shift/reduce errors/warnings less than obvious sometimes).

All said, that’s why I hoped to indicate that if one is getting into parsing for the first time then parser combinators are more approachable and, in my opinion, easier to debug.

so, I’m a few steps further. keep in mind that I already had the grammar, I only need the elixir/erlang code associated to the productions.

“a few steps further”, meaning I can parse stuff like "accession where taxon.rank.id=4", into {:where, {:domain, 'accession'}, {{:operator, :cmp_eq}, ['taxon', 'rank', 'id'], 4}}

and if you want to criticize my code to pieces, you’re most welcome! it’s as said on github.

now my question was, I come from object oriented C++ / Python, and my result of parsing would be an object which can execute tasks. in this case, it would be a database query, to which I would ask, please, the count, or again please, all the matching database records.

how to proceed now, I’m not so sure. possibly just two functions, count and add, defined by pattern matching, and then I only need to find out how to manage the Ecto functions.
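To make the "functions defined by pattern matching" idea concrete, here is a minimal sketch, not a real implementation. It assumes the tuple shape shown earlier in this thread, and it filters a hard-coded in-memory list instead of going through Ecto. The module name QueryEval, the functions all/1 and count/1, the sample data, and the :atom_and / :atom_or node names are all assumptions made for illustration.

```elixir
defmodule QueryEval do
  # Hypothetical in-memory "tables"; a real version would build Ecto queries instead.
  @tables %{
    "accession" => [
      %{id: 1, taxon: %{rank: %{id: 4}}},
      %{id: 2, taxon: %{rank: %{id: 7}}},
      %{id: 3, taxon: %{rank: %{id: 4}}}
    ]
  }

  # "all the matching records": look up the domain, keep the rows that match.
  def all({:where, {:domain, domain}, expr}) do
    @tables
    |> Map.fetch!(List.to_string(domain))
    |> Enum.filter(&matches?(expr, &1))
  end

  # "the count": derived from all/1 in this sketch.
  def count(query), do: query |> all() |> length()

  # One clause per node type of the parse result.
  defp matches?({{:operator, :cmp_eq}, path, value}, record) do
    # The path arrives as a list of charlists, e.g. [~c"taxon", ~c"rank", ~c"id"].
    get_in(record, Enum.map(path, &List.to_existing_atom/1)) == value
  end

  defp matches?({:atom_and, left, right}, record),
    do: matches?(left, record) and matches?(right, record)

  defp matches?({:atom_or, left, right}, record),
    do: matches?(left, record) or matches?(right, record)
end

# ~c"accession" is the same charlist the lexer produces as 'accession'.
query =
  {:where, {:domain, ~c"accession"},
   {{:operator, :cmp_eq}, [~c"taxon", ~c"rank", ~c"id"], 4}}

IO.inspect(QueryEval.count(query))
IO.inspect(QueryEval.all(query))
```

The parse result stays plain data, and the "methods" of the Python class become ordinary functions that take it as an argument. Swapping the in-memory filter for Ecto would change the function bodies, not this overall shape.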

You should look at Filtrex, it runs ecto queries from a map of filters.
I used it in one of my projects. You could also look at Forage (Mandarin + Forage, an admin tool for Phoenix). It seems to be more powerful. There is no documentation, but you could ask the author for help. He answered my questions quickly.

thank you for the hint, I had an extremely quick look (at Filtrex) … is it so that these filters only have intersect? and not union - exclusion? I’ll check with more time tomorrow or so. anyway looks like an interesting point, thank you.

Yes, shift-reduce errors can be quite difficult to find and fix. From the very start I sort of dived in at the deep end by implementing leex :wink:, and yecc is actually a very old erlang tool.

but I’m wondering … should I be scared of the quoted format? I’m not sure why I should not just produce it from my parser and put it into a macro.

iex(162)> quote do
...(162)> from c in City, where: (c.country == "Sweden") or
...(162)>                        (c.country == "USA" and c.name == "New York")
...(162)> end
{:from, [context: Elixir, import: Ecto.Query],
 [{:in, [context: Elixir, import: Kernel],
   [{:c, [], Elixir}, {:__aliases__, [alias: false], [:City]}]},
  [where: {:or, [context: Elixir, import: Kernel],
           [{:==, [context: Elixir, import: Kernel],
             [{{:., [], [{:c, [], Elixir}, :country]}, [], []}, "Sweden"]},
            {:and, [context: Elixir, import: Kernel],
             [{:==, [context: Elixir, import: Kernel],
               [{{:., [], [{:c, [], Elixir}, :country]}, [], []}, "USA"]},
              {:==, [context: Elixir, import: Kernel],
               [{{:., [], [{:c, [], Elixir}, :name]}, [], []}, "New York"]}
             ]}
           ]}
  ]
 ]}
iex(164)> "city where country='Sweden' or (country='USA' and name='New York')" |>
...(164)> to_charlist() |> :lexer.string() |>                           
...(164)> (fn {_, x, _} -> x end).() |> :parser.parse() |>              
...(164)> (fn {_, x} -> x end).()
{:where, {:domain, 'city'}, 
 {:atom_or, 
  {{:operator, :cmp_eq}, ['country'], 'Sweden'},
  {:atom_and, 
   {{:operator, :cmp_eq}, ['country'], 'USA'},
   {{:operator, :cmp_eq}, ['name'], 'New York'}}}}
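A rough sketch of how the second structure could be rewritten into the first kind of structure, done on the Elixir side rather than inside yecc. It only builds the boolean expression of the where clause, not the whole from call, and it assumes single-element field paths. The module name QuotedSketch and its helpers are made up for illustration. The variable c uses a nil context here (the iex output above shows Elixir) so that the result can be checked with Code.eval_quoted against a plain map. Whether handing such a tree to Ecto is better done through a macro or through something like Ecto's dynamic expressions is a separate question that this sketch does not settle.

```elixir
defmodule QuotedSketch do
  # Boolean connectives map directly onto the :or / :and nodes of Elixir's AST.
  def to_quoted({:atom_or, l, r}), do: {:or, [], [to_quoted(l), to_quoted(r)]}
  def to_quoted({:atom_and, l, r}), do: {:and, [], [to_quoted(l), to_quoted(r)]}

  # Comparison: prepend the binding variable `c` to the field name.
  def to_quoted({{:operator, :cmp_eq}, [field], value}) do
    {:==, [], [field_access(field), literal(value)]}
  end

  # Builds the AST for `c.field`; no_parens marks it as field access, not a call.
  # List.to_atom/1 is fine for a sketch; user input deserves more care.
  defp field_access(field) do
    {{:., [], [{:c, [], nil}, List.to_atom(field)]}, [no_parens: true], []}
  end

  # Charlists coming from the Erlang lexer become Elixir binaries.
  defp literal(value) when is_list(value), do: List.to_string(value)
  defp literal(value), do: value
end

parsed =
  {:atom_or, {{:operator, :cmp_eq}, [~c"country"], ~c"Sweden"},
   {:atom_and, {{:operator, :cmp_eq}, [~c"country"], ~c"USA"},
    {{:operator, :cmp_eq}, [~c"name"], ~c"New York"}}}

ast = QuotedSketch.to_quoted(parsed)

# Show the expression as source code, then evaluate it against a sample record.
IO.puts(Macro.to_string(ast))
{result, _binding} = Code.eval_quoted(ast, c: %{country: "USA", name: "New York"})
IO.inspect(result)
```

Evaluating the generated expression against a map is only a way to check that the tree is well formed; it says nothing yet about how Ecto would accept it.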

Currently forage only supports intersection, not union. It’s easy to add support for unions, though.

Well my first unix was pre-System III and its friends lex and yacc. I think they qualify as ancient :slight_smile:

And thanks for writing them as part of Erlang, it feels like they should belong in any build system and it’s great they are standard issue.

Yes, pre-System III lex and yacc qualify as ancient :wink:

While I did implement leex, I cannot take credit for yecc. The first yecc versions were implemented by another guy at the Ericsson Computer Science Lab, Carl Wilhelm Welin.

since I’m here to learn, I would like to go through this macro idea.

I met two difficulties producing that Elixir quote directly from yecc:

  • I’m in Erlang, which I know even less than Elixir,

    so for example the <expression> ::= <expression> or <bterm> production:
    I would write the corresponding action as
    {or, [context: Elixir, import: Kernel], '$1', '$3'}.
    but I get a syntax error before: 'or',
    and once I replace the atom with the string "or", just to see what other problems there are, I get two "illegal expression" errors.

    This one works, but is obviously not what I need:
    {"or", [context, "Elixir", import, "Kernel"], '$1', '$3'}.

  • I miss the leading ‘c.’ (and would not want to ask the user to add it).

    I guess that a function in the "Erlang code." section can solve this one.

Some quick comments:

  • in Erlang or is a reserved word, hence the syntax error, so to get the atom you need to write 'or'.
  • the syntax [context: Elixir, import: Kernel] is illegal in Erlang, so you would have to write [{context,'Elixir'},{import,'Elixir.Kernel'}] to get the corresponding structure. Erlang has very few special syntax cases like Elixir’s keyword lists.
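The second point can be checked from the Elixir side. This is just an illustration, with a made-up module name: the keyword-list sugar and the explicit list of two-element tuples are the same term, and the aliases Elixir and Kernel are the atoms that Erlang spells 'Elixir' and 'Elixir.Kernel'.

```elixir
defmodule KeywordSketch do
  # Elixir's keyword-list sugar, as printed in quoted expressions.
  def sugar, do: [context: Elixir, import: Kernel]

  # The same term written out as tuples with fully spelled atoms, which is
  # what [{context,'Elixir'},{import,'Elixir.Kernel'}] denotes in Erlang.
  def plain, do: [{:context, :Elixir}, {:import, :"Elixir.Kernel"}]
end

IO.inspect(KeywordSketch.sugar() == KeywordSketch.plain())
```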

c. ?

single quoting the or works, thank you. and yes I remembered that the [a: b] was a reduced representation of something else. I just could not remember what.

the leading c. is the part from c in <table-name> of the Ecto query I’m reconstructing.

in Erlang I’m working with single quoted strings, and in Elixir I need double quoted ones, so I will need a conversion function. but I also need to convert single quoted strings to the corresponding atom. I will review in the light of your hint, and hope to be more specific, but it has to do with the production <query> ::= <domain> where <expression>. I have the name of the table in an Erlang single-quotes string, and I need the atom by that name. like 'City' and I need :City. see above, the third line in the quoted form of the Ecto query.

the leading domain (or c.), I can do easily, and I also found the names of the functions that convert a string to a binary and a string to an atom.
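For readers who land here looking for those conversions, a small sketch of what they look like when done from Elixir. The module and function names are invented for the example. Inside the yecc file itself, the Erlang BIFs list_to_binary/1 and list_to_atom/1 play the same role.

```elixir
defmodule ConvertSketch do
  # Erlang string (charlist) -> Elixir string (binary).
  def to_binary(charlist), do: List.to_string(charlist)

  # Erlang string -> atom, e.g. for field names.
  # With untrusted input, List.to_existing_atom/1 avoids creating new atoms.
  def to_field(charlist), do: List.to_atom(charlist)

  # Erlang string -> the alias node seen in the quoted Ecto query above.
  def to_alias(charlist) do
    {:__aliases__, [alias: false], [List.to_atom(charlist)]}
  end
end

# ~c"City" is the charlist the lexer hands over as 'City'.
IO.inspect(ConvertSketch.to_binary(~c"City"))
IO.inspect(ConvertSketch.to_field(~c"City"))
IO.inspect(ConvertSketch.to_alias(~c"City"))
```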

so I’m all set I guess,
the single quotes to produce atoms, and the syntax for associative lists, …,
I’ll report here if I manage to get anything working, or at least looking like something that could work.

thank you all!