Is it possible to define recursive/inter-dependent parse combinators in NimbleParsec?

I’m trying to parse Zig source with NimbleParsec and have come across cases like these:

  • containerdeclarations refers to itself in its own definition
containerdeclarations = choice([
  testdecl |> concat(containerdeclarations),
  toplevelcomptime |> concat(containerdeclarations),
  optional(doc_comment) |> optional(keyword_pub) |> concat(topleveldecl) |> concat(containerdeclarations)
])
  • expr uses asmexpr indirectly while asmexpr uses expr
asmexpr =
  keyword_asm
  |> optional(keyword_volatile)
  |> concat(lparen)
  |> concat(expr)
  |> optional(asmoutput)
  |> concat(rparen)
primaryexpr =
  choice([
    asmexpr,
    ifexpr,
    keyword_break |> optional(breaklabel) |> optional(expr),
    keyword_comptime |> concat(expr),
    keyword_nosuspend |> concat(expr),
    keyword_continue |> optional(breaklabel),
    keyword_resume |> concat(expr),
    keyword_return |> optional(expr),
    optional(blocklabel) |> concat(loopexpr),
    block,
    curlysuffixexpr
  ])

prefixexpr = repeat(prefixop) |> concat(primaryexpr)

multiplyexpr = prefixexpr |> optional(multiplyop |> concat(prefixexpr))

additionexpr = multiplyexpr |> optional(additionop |> concat(multiplyexpr))

bitshiftexpr = additionexpr |> optional(bitshiftop |> concat(additionexpr))

bitwiseexpr = bitshiftexpr |> optional(bitwiseop |> concat(bitshiftexpr))

compareexpr = bitwiseexpr |> optional(compareop |> concat(bitwiseexpr))

boolandexpr = compareexpr |> optional(keyword_and |> concat(compareexpr))

boolorexpr = boolandexpr |> optional(keyword_or |> concat(boolandexpr))

expr = boolorexpr

Yes, you can have recursive combinators. But each one needs to be defined with defparsec so that it gets its own context, and it is then referenced with parsec(:name) instead of through a variable. Because they become functions, they need to be defined in their own module. Something like this (not complete):

defmodule Combinators do
  import NimbleParsec
  
  defparsec :asmexpr, 
    keyword_asm
    |> optional(keyword_volatile)
    |> concat(lparen)
    |> parsec(:expr)
    |> optional(asmoutput)
    |> concat(rparen)
    
  defparsec :primaryexpr,
    choice([
      parsec(:asmexpr),
      ifexpr,
      keyword_break |> optional(breaklabel) |> optional(parsec(:expr)),
      keyword_comptime |> parsec(:expr),
      keyword_nosuspend |> parsec(:expr),
      keyword_continue |> optional(breaklabel),
      keyword_resume |> parsec(:expr),
      keyword_return |> optional(parsec(:expr)),
      optional(blocklabel) |> concat(loopexpr),
      block,
      curlysuffixexpr
    ]) 

  # Other combinators
end
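
To make the defparsec/parsec pattern above concrete, here is a minimal, self-contained sketch on a toy grammar rather than the Zig one. The module name RecursiveSketch and the grammar itself are assumptions chosen for illustration; only defparsec, parsec, choice, repeat, ignore, string, integer and concat come from NimbleParsec. It assumes it is run as a standalone script, hence the Mix.install call.

```elixir
Mix.install([{:nimble_parsec, "~> 1.2"}])

defmodule RecursiveSketch do
  import NimbleParsec

  # Non-recursive pieces can stay in ordinary variables.
  lparen = ignore(string("("))
  rparen = ignore(string(")"))
  plus = ignore(string("+"))

  # primary <- integer / "(" expr ")"
  # parsec(:expr) is a call into the compiled :expr function, so :expr
  # does not have to be defined before this point in the module.
  defparsec(
    :primary,
    choice([
      integer(min: 1),
      lparen |> parsec(:expr) |> concat(rparen)
    ])
  )

  # expr <- primary ("+" primary)*
  defparsec(
    :expr,
    parsec(:primary) |> repeat(plus |> parsec(:primary))
  )
end

# Nested parentheses recurse through :primary back into :expr.
{:ok, [1, 2, 3], "", _context, _line, _offset} = RecursiveSketch.expr("1+(2+3)")
```

The mutual recursion works because neither definition embeds the other at compile time; each only stores the name of the function to call.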

They will need to be defined in their own module since they become functions.

I am trying to keep the structure and naming convention of Zig’s official PEG grammar, so I’m using this kind of workaround to have a variable and a function for the same rule. Not sure if this will work.

expr = parsec(:expr_parsec)
.... others
boolorexpr = boolandexpr |> optional(keyword_or |> concat(boolandexpr))
expr = boolorexpr
.... others
defcombinatorp(:expr_parsec, expr, export_metadata: true)
  • After around 30 s of compiling, I got this error:
==> kinda
Compiling 1 file (.ex)
Compiling lib/parser.ex (it's taking more than 10s)

== Compilation error in file lib/parser.ex ==
** (FunctionClauseError) no function clause matching in NimbleParsec.Compiler.label/1    
    
    The following arguments were given to NimbleParsec.Compiler.label/1:
    
        # 1
        10
    
    Attempted function clauses (showing 10 out of 12):
    
        defp label({:string, binary})
        defp label({:label, _combinator, label})
        defp label({:bin_segment, inclusive, exclusive, modifier})
        defp label(:eos)
        defp label({:lookahead, combinators, _})
        defp label({:repeat, combinators, _, _})
        defp label({:eventually, combinators})
        defp label({:times, combinators, _})
        defp label({:choice, choices, _})
        defp label({:traverse, combinators, _, _})
        ...
        (2 clauses not shown)
    
    (nimble_parsec 1.2.3) lib/nimble_parsec/compiler.ex:960: NimbleParsec.Compiler.label/1
    (elixir 1.14.2) lib/enum.ex:1755: anonymous fn/2 in Enum.map_join/3
    (elixir 1.14.2) lib/enum.ex:4292: Enum.map_intersperse_list/3
    (elixir 1.14.2) lib/enum.ex:1755: Enum.map_join/3
    (elixir 1.14.2) lib/enum.ex:1755: anonymous fn/2 in Enum.map_join/3
    (elixir 1.14.2) lib/enum.ex:4292: Enum.map_intersperse_list/3
    (elixir 1.14.2) lib/enum.ex:4292: Enum.map_intersperse_list/3
    (elixir 1.14.2) lib/enum.ex:1755: Enum.map_join/3
could not compile dependency :kinda, "mix compile" failed. Errors may have been logged above. You can recompile this dependency with "mix deps.compile kinda", update it with "mix deps.update kinda" or clean it with "mix deps.clean kinda"
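
For comparison, the forward-declaration workaround can be tried in isolation on a toy grammar. The sketch below is an assumption-laden illustration, not a diagnosis of the compilation error above: the module name PegStyleSketch, the tiny grammar and the :parse entry point are all hypothetical, and it assumes it is run as a standalone script. It keeps the PEG-style variable names and rebinds expr once the real rule is known.

```elixir
Mix.install([{:nimble_parsec, "~> 1.2"}])

defmodule PegStyleSketch do
  import NimbleParsec

  # Forward declaration: until the real rule exists, expr is only a
  # reference to the function that defcombinatorp defines further down.
  expr = parsec(:expr_parsec)

  # primaryexpr <- integer / "(" expr ")"
  primaryexpr =
    choice([
      integer(min: 1),
      ignore(string("(")) |> concat(expr) |> ignore(string(")"))
    ])

  # additionexpr <- primaryexpr ("+" primaryexpr)*
  additionexpr =
    primaryexpr |> repeat(ignore(string("+")) |> concat(primaryexpr))

  # Rebind the variable to the real rule, keeping the PEG name.
  expr = additionexpr

  # Compile the real rule under the name the forward reference points at.
  defcombinatorp(:expr_parsec, expr)

  # Public entry point (hypothetical name) that requires the whole input
  # to be consumed.
  defparsec(:parse, parsec(:expr_parsec) |> eos())
end

{:ok, [1, 2, 3], "", _context, _line, _offset} = PegStyleSketch.parse("1+(2+3)")
```

In this small case the pattern compiles and parses, which suggests the naming workaround itself is viable; whether it scales to the full Zig grammar is a separate question.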


update:

  • As expected, there are many bugs.
  • One approach I found for debugging combinators with NimbleParsec is to export every sub-combinator with defparsec, which makes it possible to run small tests against each one and see whether it works.

Adding a tag to each combinator also helps. Often the input does match, but not through the combinator you expected.
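
A minimal sketch of both debugging ideas, assuming a toy grammar: the module name DebugSketch, the parse_* function names and the two rules are hypothetical and not taken from the Zig grammar. Each sub-combinator is tagged and also exported with defparsec so it can be exercised on its own. It assumes it is run as a standalone script.

```elixir
Mix.install([{:nimble_parsec, "~> 1.2"}])

defmodule DebugSketch do
  import NimbleParsec

  # Tag each rule so the output shows which combinator produced a match.
  keyword_return = string("return") |> tag(:keyword_return)
  identifier = ascii_string([?a..?z, ?_], min: 1) |> tag(:identifier)

  # Export the sub-combinators so they can be tested individually.
  defparsec(:parse_keyword_return, keyword_return)
  defparsec(:parse_identifier, identifier)

  # statement <- "return" " " identifier / identifier
  defparsec(
    :statement,
    choice([
      keyword_return |> ignore(string(" ")) |> concat(identifier),
      identifier
    ])
  )
end

# A sub-combinator can be checked without going through the full grammar.
{:ok, [identifier: ["foo"]], "", _, _, _} = DebugSketch.parse_identifier("foo")

# The tags show that this input took the first alternative.
{:ok, [keyword_return: ["return"], identifier: ["x"]], "", _, _, _} =
  DebugSketch.statement("return x")

# This input also parses, but through the second alternative.
{:ok, [identifier: ["returns"]], "", _, _, _} = DebugSketch.statement("returns")
```

Without the tags, the last two results would just be lists of strings, and it would be much harder to tell which alternative of the choice had matched.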

This may be of interest to you, though I’m not certain of its status: ityonemo/zig_parser (Zig Parser for Elixir) on GitHub.

It is created with ityonemo/pegasus (peg -> nimbleparsec), also on GitHub.

An unrelated thought: I’m curious about the advantages and disadvantages of the PEG route versus a traditional lexer + parser.


It works (well enough for my zigler 0.10.x development branch) and is currently designed to parse zig 0.10.x


It works!
Do you have any plan for source code generation from a Zig AST in Elixir?

I don’t. Actually in the long run I want to deprecate using the zig parser because there will ideally be compiler hooks in zig that will let us see this information without having to do it again

@ityonemo could you have a look at this PR? Fix ex_doc by jackalcooper · Pull Request #1 · ityonemo/zig_parser · GitHub

done, and updated on hex.pm as 0.1.4. Sorry, you sent the PR in a chaotic time (I got fired – but a good thing, I was trying to quit) so I missed a lot of things!

Sorry to hear that. I hope you find a good new job and get to work on something you really enjoy!

Here is another issue: Fail to parse float of exponential notation with 4-digit exponent · Issue #2 · ityonemo/zig_parser · GitHub

No need to feel sorry. Getting fired in the US is strictly better than quitting, and I’m not looking for a new job!

@ityonemo
The problematic expression does not seem to be covered by the fix in the last commit:

pub const __LDBL_MAX__ = @as(c_longdouble, 1.18973149535723176502e+4932);