Help with NimbleParsec

taguniversalmachine · May 21, 2022, 7:39pm

Hi,

I am trying to define a parser for a simple grammar and I am getting hung up on something that I think is simple but I can’t figure out.

The grammar has two main structures, a definition and an invocation, of the form:

invocationname($a$b)(c<>d<>)
definitionname[(a<>b<>)($c$d)somestuff:someotherstuff]

I have the invocation one working, but for some reason even though the definition structure is very similar, I get an error that it is expecting the opening bracket, even though I am sending it.

The whole parser definition is here, and I will copy my interaction with it:

defmodule Atomix.Invocation.Parser do
  import NimbleParsec

  name = ascii_string([?A..?z], min: 1)

  definition_name = ascii_string([?A..?z], min: 1)

  invocation_name = ascii_string([?a..?z], min: 1)

  content_name = ascii_string([?A..?Z], min: 1)

  destination_place =
    string("$")
    |> concat(name)

  destination_list =
    destination_place
    |> repeat(destination_place)

  source_place =
    name
    |> string("<>")

  source_list =
    source_place
    |> repeat(source_place)

  place_of_resolution =
    string("+")

  contained_definitions =
    string("=")


  defparsec(
    :invocation,
    invocation_name
    |> string("(")
    |> concat(destination_list)
    |> string(")")
    |> string("(")
    |> concat(source_list)
    |> string(")")
  )

  defparsec(
    :definition,
    definition_name
    |> string("[")
    |> concat(source_list)
    |> concat(destination_list)
    |> concat(place_of_resolution)
    |> string(":")
    |> concat(contained_definitions)
    |> string("]")
  )
end

iex(2)> Parser.invocation("dfsd($a$b)(a<>b<>)")
{:ok, ["dfsd", "(", "$", "a", "$", "b", ")", "(", "a", "<>", "b", "<>", ")"],
 "", %{}, {1, 0}, 18}
iex(3)> Parser.definition("fdafa[(a<>b<>)fdsaf:fadf]")
{:error, "expected string \"[\"", "(a<>b<>)fdsaf:fadf]", %{}, {1, 0}, 6}

Why am I getting the error about expecting the bracket when it is clearly there?
Thanks for any pointers.

kip · May 21, 2022, 9:55pm

The problem is that you are defining character ranges that encompass the [ character:

  name = ascii_string([?A..?z], min: 1)
  definition_name = ascii_string([?A..?z], min: 1)

In the ASCII character set, ?a..?z and ?A..?Z are not contiguous with each other. For example:

iex> ?A..?z
65..122
iex> ?[
91

So you can see that [ fits in the range ?A..?z and therefore your [ is being consumed by definition_name. You can add a call to debug() in your combinator pipeline which will output some information that can often help with tracking down these issues.

I think you probably meant:

  name = ascii_string([?A..?Z, ?a..?z], min: 1)
  definition_name = ascii_string([?A..?Z, ?a..?z], min: 1)
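
To see exactly which characters sneak into ?A..?z, here is a quick check in plain Elixir (no NimbleParsec needed). It is only a sketch; the variable names `letters` and `extras` are mine and not part of the parser:

```elixir
# Code points covered by ?A..?z that are not ASCII letters.
letters = Enum.concat(?A..?Z, ?a..?z)
extras = Enum.reject(?A..?z, &(&1 in letters))

# Print the raw code points, then the same list as characters.
IO.inspect(extras, charlists: :as_lists)
IO.puts(List.to_string(extras))

# The opening bracket is one of them, which is why a name built on
# ?A..?z can swallow it.
IO.inspect(?[ in ?A..?z)
```

Six punctuation characters sit between Z and a, and [ is the first of them, so any name built on the single range ?A..?z will happily consume the bracket that was meant to end it.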

kip · May 21, 2022, 10:15pm

I think your definition parser still needs some development but I took a stab at a version of invocation that is a bit more idiomatic:

defmodule Atomix.Invocation.Parser do
  import NimbleParsec

  # name is left untagged so that the places built from it unwrap to a bare string
  name =
    ascii_string([?A..?Z, ?a..?z], min: 1)

  definition_name =
    ascii_string([?a..?z, ?A..?Z], min: 1)
    |> unwrap_and_tag(:definition_name)

  invocation_name =
    ascii_string([?a..?z], min: 1)
    |> unwrap_and_tag(:invocation_name)

  content_name =
    ascii_string([?A..?Z], min: 1)
    |> unwrap_and_tag(:content_name)

  destination_place =
    ignore(string("$"))
    |> concat(name)
    |> unwrap_and_tag(:destination_place)

  destination_list =
    destination_place
    |> repeat(destination_place)
    |> tag(:destination_list)

  source_place =
    name
    |> ignore(string("<>"))
    |> unwrap_and_tag(:source_place)

  source_list =
    source_place
    |> repeat(source_place)
    |> tag(:source_list)

  place_of_resolution =
    string("+")

  contained_definitions =
    string("=")


  defparsec(
    :invocation,
    invocation_name
    |> ignore(string("("))
    |> concat(destination_list)
    |> ignore(string(")"))
    |> ignore(string("("))
    |> concat(source_list)
    |> ignore(string(")"))
  )

  defparsec(
    :definition,
    definition_name
    |> ignore(string("["))
    |> concat(source_list)
    |> concat(destination_list)
    |> concat(place_of_resolution)
    |> string(":")
    |> concat(contained_definitions)
    |> string("]")
  )
end

In use:

iex> Parser.invocation("dfsd($a$b)(a<>b<>)")
{:ok,
 [
   invocation_name: "dfsd",
   destination_list: [destination_place: "a", destination_place: "b"],
   source_list: [source_place: "a", source_place: "b"]
 ], "", %{}, {1, 0}, 18}

taguniversalmachine · May 22, 2022, 12:40am

Ooh, thank you very much! I don’t know how many years I’ve been looking at ASCII charts, and I didn’t even think to check. Thanks also for the tip about debug() and for showing how to use tags properly!

taguniversalmachine · May 22, 2022, 5:57pm

Actually, I do have one follow-up question, if you don’t mind: what is the purpose of using ignore(string()) as opposed to just string()? I find it works both ways, and the docs don’t really explain what ignore does. I had assumed that string() would require the string to be present while ignore would merely tolerate it, but that doesn’t seem to be how it works: ignore actually requires the string.

fuelen · May 22, 2022, 6:46pm

ignore simply tells the parser not to put the data matched by the wrapped combinator into the output; the input still has to match.
Let’s say we have the following parser:

defmodule MyRange do
  import NimbleParsec
  range = integer(min: 1) |> ignore(string("..")) |> integer(min: 1)
  defparsec(:parse, range)
end

iex> MyRange.parse("200..300")
{:ok, [200, 300], "", %{}, {1, 0}, 8}

But when you remove the ignore combinator and use a plain string(".."), the output is this:

iex> MyRange.parse("200..300")
{:ok, [200, "..", 300], "", %{}, {1, 0}, 8}
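
To make the "ignore still requires the string" point concrete, here is a small sketch that can be run as a standalone script. It assumes Mix.install can fetch nimble_parsec (the "~> 1.0" version requirement is my assumption), and MyRangeCheck is a hypothetical module name chosen so it does not clash with MyRange above:

```elixir
# Assumption: run as a script with network access so Mix.install can
# fetch the dependency.
Mix.install([{:nimble_parsec, "~> 1.0"}])

defmodule MyRangeCheck do
  import NimbleParsec

  # Same combinator as MyRange above: ".." must match but is not emitted.
  range = integer(min: 1) |> ignore(string("..")) |> integer(min: 1)
  defparsec(:parse, range)
end

# Separator present: the parse succeeds and ".." is left out of the result.
{:ok, [200, 300], "", _, _, _} = MyRangeCheck.parse("200..300")

# Separator missing: ignore/1 does not make it optional, so this is an error.
{:error, _reason, _rest, _, _, _} = MyRangeCheck.parse("200 300")
```

If the goal really is to tolerate a missing separator, NimbleParsec's optional combinator is the one to reach for, for example wrapping the separator as optional(string("..")); ignore only controls what ends up in the output.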

taguniversalmachine · May 22, 2022, 6:57pm

oh gotcha, thank you very much for the explanation!