Erlang/Elixir and Leex

Leex regexes are greedy: Leex always returns the longest possible match, and it has no non-greedy (lazy) operators. That is why you are getting {code,50,"[code]\nzz\nzzz\n[/code]\n\ntest\n\n[code]\nzz\nyyy\n[/code]"}
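To illustrate the difference between a greedy and a lazy match, here is a small sketch. It uses Erlang's re module (PCRE), not Leex. Leex has no lazy quantifier, so the lazy version cannot be written in an .xrl file. The module name greedy_demo is hypothetical.

```erlang
%% Hypothetical demo module: greedy vs. lazy matching with the re module.
-module(greedy_demo).
-export([greedy/1, lazy/1]).

%% Greedy: ".*" runs to the LAST "[/code]" in the input, swallowing the
%% text between blocks (similar in effect to Leex's longest match).
greedy(Text) ->
    blocks(Text, "\\[code\\](.*)\\[/code\\]").

%% Lazy: ".*?" stops at the FIRST "[/code]" after each "[code]".
lazy(Text) ->
    blocks(Text, "\\[code\\](.*?)\\[/code\\]").

%% dotall lets "." match newlines; global returns every match;
%% all_but_first keeps only the captured block body.
blocks(Text, Pattern) ->
    case re:run(Text, Pattern, [global, dotall, {capture, all_but_first, list}]) of
        {match, Matches} -> [Body || [Body] <- Matches];
        nomatch -> []
    end.
```

With the input from your example, greedy/1 returns a single string running from the first [code] to the last [/code], while lazy/1 returns the two bodies "\nzz\nzzz\n" and "\nzz\nyyy\n" separately.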

Part of the problem is that you are trying to tokenise and parse at the same time with a tool designed only for tokenising. You would probably be better off using ex_spirit, which @OvermindDL1 described above.

If you choose to continue with Leex, simplify your tokenising and use Yecc for parsing. It will save you a lot of grief compared with trying to coerce Leex into something it's not designed for.

I think your basic tokens are [, ], \\, :, [a-zA-Z]+ and \n. From those, a simple Yecc parser can handle the format you're using.
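A minimal sketch of that tokenise-then-parse split, under stated assumptions. Plain Erlang stands in for both stages here. In a real setup, a Leex .xrl file would generate the tokeniser and a Yecc .yrl grammar would replace parse/1, and Leex tokens would also carry line numbers.

The following names and shapes are hypothetical:

* the module name bb_sketch
* the token shapes
* the '/' token for closing tags
* the {char, C} catch-all

The \\ token from the list above simply falls into the catch-all.

```erlang
%% Hypothetical module: plain Erlang standing in for a Leex lexer
%% (tokenize/1) and a Yecc parser (parse/1).
-module(bb_sketch).
-export([tokenize/1, parse/1]).

%% Stage 1: a deliberately simple tokeniser. It knows nothing about
%% [code] blocks. The '/' token and {char, C} are assumptions.
tokenize([]) -> [];
tokenize([$[ | Rest]) -> ['[' | tokenize(Rest)];
tokenize([$] | Rest]) -> [']' | tokenize(Rest)];
tokenize([$/ | Rest]) -> ['/' | tokenize(Rest)];
tokenize([$: | Rest]) -> [':' | tokenize(Rest)];
tokenize([$\n | Rest]) -> [nl | tokenize(Rest)];
tokenize([C | Rest] = S) ->
    case is_letter(C) of
        true ->
            %% Equivalent of [a-zA-Z]+: take the whole run of letters.
            {Word, After} = lists:splitwith(fun is_letter/1, S),
            [{word, Word} | tokenize(After)];
        false ->
            [{char, C} | tokenize(Rest)]
    end.

is_letter(C) -> (C >= $a andalso C =< $z) orelse (C >= $A andalso C =< $Z).

%% Stage 2: the parser decides where a block starts and ends, so each
%% [code] is closed by the NEAREST [/code], not the last one.
%% Result: a list of {text, String} and {code, String} chunks.
parse(Tokens) -> parse(Tokens, text, [], []).

%% parse(Tokens, Mode, CurrentChunkReversed, ChunksReversed)
parse([], Mode, Acc, Out) ->
    lists:reverse(emit(Mode, Acc, Out));
parse(['[', {word, "code"}, ']' | Rest], text, Acc, Out) ->
    parse(Rest, code, [], emit(text, Acc, Out));
parse(['[', '/', {word, "code"}, ']' | Rest], code, Acc, Out) ->
    parse(Rest, text, [], emit(code, Acc, Out));
parse([Tok | Rest], Mode, Acc, Out) ->
    parse(Rest, Mode, [Tok | Acc], Out).

%% Close the current chunk; empty chunks are dropped.
emit(_Mode, [], Out) -> Out;
emit(Mode, Acc, Out) -> [{Mode, untokenize(lists:reverse(Acc))} | Out].

%% Turn tokens back into the original characters.
untokenize(Tokens) -> lists:flatmap(fun to_chars/1, Tokens).

to_chars('[') -> "[";
to_chars(']') -> "]";
to_chars('/') -> "/";
to_chars(':') -> ":";
to_chars(nl) -> "\n";
to_chars({word, W}) -> W;
to_chars({char, C}) -> [C].
```

On the input from the first example, bb_sketch:parse(bb_sketch:tokenize(Input)) returns the following list, with each block closed at its own [/code]:

[{code,"\nzz\nzzz\n"},{text,"\n\ntest\n\n"},{code,"\nzz\nyyy\n"}]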


I have a partial fix for the greediness:

{C1}(([^({C1})|({C2})])+([\[\]#])+?(.|\n))+?{C2} : {token, {code, TokenLen, TokenChars}}.

But I have one problem. My text consists of:

“some text, with some [b] inside”
“[code]…[/code]”
“some text again”
“some [code]…[/code] again”

All of those [code]…[/code] blocks are matched correctly, but the next one is not:

[code]
17> Z = #{a => 1, b => 3}.
#{a => 1,b => 3}
18> Z2 = maps:put(a, 42, Z).
#{a => 42,b => 3}
19> Z3 = maps:remove(b, Z2).
#{a => 42}
20> Z4 = maps:put(c, 77, Z3).
#{a => 42,c => 77}
21> maps:size(Z4).
2
[/code]

Why? How can I fix it?