Elixir has, well, way too many parsing libraries. I tried a lot of them to parse a little experiment of mine, but I kept running into issues: everything from near-useless error reporting (leex/yecc), to a lot of missing primitives (Combine), to all of them seeming to carry a bit of overhead from a programming perspective. After going through no fewer than six parsing libraries from both the Elixir and Erlang worlds, I decided to port an old project I used to work on (rightfully proclaimed the fastest parser in the world, for good reason, though this new one is not being built for speed).
So I'm making ExSpirit. Thus far it only has the parsing portion, named ExSpirit.Parser (lexing is less useful with this style of parser, though I plan to support it later if I see a need or anyone gives me one).
I'll describe it after this paragraph, but first: is it worth putting this on hex.pm considering how many other parsing libraries there are? I really hate having lots of competing libraries where none is really dominant, and mine would just add to that problem…
To use mine, you have to define a module where your parsers will live. Here is mine from one of my tests:
```elixir
defmodule ExSpirit.Tests.Parser do
  use ExSpirit.Parser

  defrule testrule(
    seq([ uint(), lit(?\s), uint() ])
  )

  defrule testrule_map(
    seq([ uint(), lit(?\s), uint() ])
  ), map: Enum.map(fn i -> i-40 end)

  defrule testrule_fun(
    seq([ uint(), lit(?\s), uint() ])
  ), fun: (fn context -> %{context | result: {"altered", context.result}} end).()

  defrule testrule_context(context) do
    %{context | result: "always success"}
  end
end
```
This is not how it would traditionally be written; it is showing off the advanced usage features. The `use ExSpirit.Parser` at the top sets up everything (yeesh, the syntax coloring on these forums needs help…). The doctests that use this specific test module show most of the currently existing capabilities:
```elixir
# `lit` matches a specific string or character
iex> import ExSpirit.Tests.Parser
iex> context = parse("Test 42", lit("Test"))
iex> {context.error, context.result, context.rest}
{nil, nil, " 42"}

# `lit` matches a specific string or character
iex> import ExSpirit.Tests.Parser
iex> context = parse("Test 42", lit(?T))
iex> {context.error, context.result, context.rest}
{nil, nil, "est 42"}

# `uint` parses out an unsigned integer, default radix of 10 with a min size of 1 and max of unlimited
iex> import ExSpirit.Tests.Parser
iex> context = parse("42", uint())
iex> {context.error, context.result, context.rest}
{nil, 42, ""}

# `|>` returns the result of the last parser in the pipe chain,
# `lit` always returns nil for example
iex> import ExSpirit.Tests.Parser
iex> context = parse("42Test", uint() |> lit("Test"))
iex> {context.error, context.result, context.rest}
{nil, nil, ""}

# `|>` returns the result of the last parser in the pipe chain
iex> import ExSpirit.Tests.Parser
iex> context = parse("42Test64", uint() |> lit("Test") |> uint())
iex> {context.error, context.result, context.rest}
{nil, 64, ""}

# `uint` parsing out base-2
iex> import ExSpirit.Tests.Parser
iex> context = parse("101", uint(2))
iex> {context.error, context.result, context.rest}
{nil, 5, ""}

# `uint` parsing out base-16 lower-case, can be mixed too
iex> import ExSpirit.Tests.Parser
iex> context = parse("ff", uint(16))
iex> {context.error, context.result, context.rest}
{nil, 255, ""}

# `uint` parsing out base-16 upper-case, can be mixed too
iex> import ExSpirit.Tests.Parser
iex> context = parse("FF", uint(16))
iex> {context.error, context.result, context.rest}
{nil, 255, ""}

# `seq` parses a sequence returning the return of all of them, removing nils,
# as a list if more than one or the raw value if only one, if any fail then
# all fail.
iex> import ExSpirit.Tests.Parser
iex> contexts = parse("42 64", seq([uint(), lit(" "), uint()]))
iex> {contexts.error, contexts.result, contexts.rest}
{nil, [42, 64], ""}

# `seq` here is a sequence only returning a single value
iex> import ExSpirit.Tests.Parser
iex> contexts = parse("42Test", seq([uint(), lit("Test")]))
iex> {contexts.error, contexts.result, contexts.rest}
{nil, 42, ""}

# `alt` parses a set of alternatives in order and returns the first success
iex> import ExSpirit.Tests.Parser
iex> contexts = parse("FF", alt([uint(16), lit("Test")]))
iex> {contexts.error, contexts.result, contexts.rest}
{nil, 255, ""}

# `alt` parses a set of alternatives in order and returns the first success
iex> import ExSpirit.Tests.Parser
iex> contexts = parse("Test", alt([uint(16), lit("Test")]))
iex> {contexts.error, contexts.result, contexts.rest}
{nil, nil, ""}

# You can use `defrule`s as any other terminal parser
iex> import ExSpirit.Tests.Parser
iex> contexts = parse("42 64", testrule())
iex> {contexts.error, contexts.result, contexts.rest}
{nil, [42, 64], ""}

# `defrule`s also set up a stack of calls down a context so you know
# 'where' an error occurred, so name the rules descriptively
iex> import ExSpirit.Tests.Parser
iex> contexts = parse("42 fail", testrule())
iex> {contexts.error.context.rulestack, contexts.result, contexts.rest}
{[:testrule], nil, "fail"}

# `defrule`s can map the result to return a different one:
iex> import ExSpirit.Tests.Parser
iex> contexts = parse("42 64", testrule_map())
iex> {contexts.error, contexts.result, contexts.rest}
{nil, [2, 24], ""}

# `defrule`s can also operate over the context itself to do anything
iex> import ExSpirit.Tests.Parser
iex> contexts = parse("42 64", testrule_fun())
iex> {contexts.error, contexts.result, contexts.rest}
{nil, {"altered", [42, 64]}, ""}

# `defrule`s can also be a context function by only passing in `context`
iex> import ExSpirit.Tests.Parser
iex> contexts = parse("42 64", testrule_context())
iex> {contexts.error, contexts.result, contexts.rest}
{nil, "always success", "42 64"}

# `tag` can tag the output from a parser
iex> import ExSpirit.Tests.Parser
iex> context = parse("ff", tag(:integer, uint(16)))
iex> {context.error, context.result, context.rest}
{nil, {:integer, 255}, ""}

# You can have a skipper too, skippers should be run at the start of any
# terminal parser, it runs only once per pass, if you want it to repeat then
# set the skipper up so it repeats, a good one is `repeat(lit(?\s))` for
# example
iex> import ExSpirit.Tests.Parser
iex> context = parse(" 42 ", uint(), skipper: lit(?\s))
iex> {context.error, context.result, context.rest}
{nil, 42, " "}

# You can turn off a skipper for a parser as well with `no_skip`
iex> import ExSpirit.Tests.Parser
iex> context = parse(" Test:42 ", lit("Test:") |> no_skip(uint()), skipper: lit(?\s))
iex> {context.error, context.result, context.rest}
{nil, 42, " "}

# You can change a skipper for a parser as well with `skip`
iex> import ExSpirit.Tests.Parser
iex> context = parse(" Test:\t42 ", lit("Test:") |> skip(uint(), lit(?\t)), skipper: lit(?\s))
iex> {context.error, context.result, context.rest}
{nil, 42, " "}

# `char` can parse out any single character
iex> import ExSpirit.Tests.Parser
iex> context = parse("Test", char())
iex> {context.error, context.result, context.rest}
{nil, ?T, "est"}

# `char` can parse out any 'specific' single character as well
iex> import ExSpirit.Tests.Parser
iex> context = parse("Test", char(?T))
iex> {context.error, context.result, context.rest}
{nil, ?T, "est"}

# `char` can parse out any 'specific' single character from a range too
iex> import ExSpirit.Tests.Parser
iex> context = parse("Test", char(?A..?Z))
iex> {context.error, context.result, context.rest}
{nil, ?T, "est"}

# `char` can parse out any 'specific' single character from a list of
# characters or ranges too
iex> import ExSpirit.Tests.Parser
iex> context = parse("Test", char([?a..?z, ?T]))
iex> {context.error, context.result, context.rest}
{nil, ?T, "est"}
```
So unlike the old C++ library that I worked on years ago, this one is not an operator explosion. Operators really would help make it readable, I think, but I decided I wanted descriptive names for everything.
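For anyone wondering why `|>` hands back the result of the last parser while `seq` collects all of them, here is a tiny dependency-free sketch of the general idea. To be clear, this is not ExSpirit's code: `MiniSpirit`, `start/1`, and the error terms are names made up for illustration, the parsers are plain functions rather than whatever `use ExSpirit.Parser` sets up, and things like the rule stack, radix options, and skippers are left out. It only assumes a context with the `error`, `result`, and `rest` fields the doctests inspect.

```elixir
# Hypothetical illustration only: MiniSpirit is NOT ExSpirit and shares no code with it.
defmodule MiniSpirit do
  defmodule Context do
    # The three fields the doctests inspect; everything else is left out.
    defstruct rest: "", result: nil, error: nil
  end

  # Wrap the input in a fresh context so parsers can be piped onto it.
  def start(input), do: %Context{rest: input}

  # A context that already carries an error passes through untouched.
  def lit(%Context{error: error} = context, _string) when error != nil, do: context

  # Match a literal string; success consumes it and yields a nil result.
  def lit(%Context{rest: rest} = context, string) do
    if String.starts_with?(rest, string) do
      size = byte_size(string)
      tail = binary_part(rest, size, byte_size(rest) - size)
      %{context | rest: tail, result: nil}
    else
      %{context | result: nil, error: {:expected, string}}
    end
  end

  def uint(%Context{error: error} = context) when error != nil, do: context

  # Parse a base-10 unsigned integer; the leading digit check rejects signs.
  def uint(%Context{rest: <<c, _::binary>> = rest} = context) when c in ?0..?9 do
    {value, tail} = Integer.parse(rest)
    %{context | rest: tail, result: value}
  end

  def uint(%Context{} = context), do: %{context | result: nil, error: :expected_uint}

  def seq(%Context{error: error} = context, _parsers) when error != nil, do: context

  # Run each parser in order, keeping every non-nil result along the way.
  def seq(%Context{} = context, parsers) do
    {final, results} =
      Enum.reduce(parsers, {context, []}, fn parser, {ctx, acc} ->
        next = parser.(ctx)
        keep? = next.error == nil and next.result != nil
        {next, if(keep?, do: [next.result | acc], else: acc)}
      end)

    case {final.error, Enum.reverse(results)} do
      {nil, []} -> %{final | result: nil}
      {nil, [single]} -> %{final | result: single}
      {nil, many} -> %{final | result: many}
      {_error, _} -> %{final | result: nil}
    end
  end
end

# Piping: each parser overwrites `result`, so only the last one survives.
piped =
  MiniSpirit.start("42Test64")
  |> MiniSpirit.uint()
  |> MiniSpirit.lit("Test")
  |> MiniSpirit.uint()

IO.inspect({piped.error, piped.result, piped.rest})
# prints {nil, 64, ""}

# Sequencing: non-nil results are gathered into a list instead.
sequenced =
  MiniSpirit.seq(MiniSpirit.start("42 64"), [
    &MiniSpirit.uint/1,
    &MiniSpirit.lit(&1, " "),
    &MiniSpirit.uint/1
  ])

IO.inspect({sequenced.error, sequenced.result, sequenced.rest})
# prints {nil, [42, 64], ""}
```

Because every parser takes the context as its first argument and returns a context, the ordinary Elixir pipe is enough to chain them, and a failure simply rides along in `error` until the end.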
I made a benchmark of simple integer parsing and of datetime parsing (specifically the datetime parsing example from the Combine parsing library) and benchmarked them with benchee in a quick setup. The results:
```
##### With input parse_datetime #####
Name                ips        average  deviation         median
ex_spirit      204.55 K        4.89 μs    ±12.16%        4.70 μs
combine         76.59 K       13.06 μs    ±44.50%       16.00 μs

Comparison:
ex_spirit      204.55 K
combine         76.59 K - 2.67x slower

##### With input parse_int_10 #####
Name                ips        average  deviation         median
ex_spirit      626.87 K        1.60 μs    ±31.93%        1.60 μs
combine        162.52 K        6.15 μs    ±12.90%        6.20 μs

Comparison:
ex_spirit      626.87 K
combine        162.52 K - 3.86x slower
```
So my style does not seem slower at least; in these two quick benchmarks it came out roughly 2.7x to 3.9x faster than Combine. Plus it is built for the style that I expect, which is not necessarily the style everyone expects, but it works well for me. ^.^