Programmatically editing a Regex

tmbb · August 26, 2017, 5:47pm

Is there a way of programmatically manipulating regexes without parsing them myself? I need to do things like replacing a match group of one regex with a literal string. For example:

# take this regex with a match group
r/\1abc/
# and return this regex:
r/XXXabc/
# where the match group has been replaced by the literal "XXX"

Is there any way of doing this except parsing the string representation myself and compiling the regex dynamically?

Ideally I’d like an API that could replace the group with the literal in an already compiled expression, but I can’t find anything.

NobbZ · August 26, 2017, 6:43pm

In the case of ~r/\1abc/ you’ll need to parse it yourself. The sigil will refuse it because it references a nonexistent group:

iex(1)> ~r/\1abc/
** (Regex.CompileError) reference to non-existent subpattern at position 5
    (elixir) lib/regex.ex:170: Regex.compile!/2
    (elixir) expanding macro: Kernel.sigil_r/2
    iex:1: (file)

In a lowercase ~r sigil, though, you can use string interpolation: ~r/#{"XXX"}abc/:

iex(1)> ~r/#{"XXX"}abc/
~r/XXXabc/

Perhaps this helps you?

And if you get your input as a string "\\1abc"[1], you can simply use String.replace/4:

iex(3)> "\\1abc" |> String.replace("\\1", "XXX") |> Regex.compile!()
~r/XXXabc/

[1] It seems as if something is wrong here: it should be 2 backslashes, a 1 and then abc… @AstonJ can you take a look at what’s happening here?
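One caveat with plain String.replace/4 is that the injected text is treated as regex syntax, so a literal such as "a.b" would match more than intended. A minimal sketch of a slightly safer variant, assuming the literal should always be matched verbatim; the module and function names (BackrefLiteral, inject/3) are hypothetical and only for illustration:

```elixir
defmodule BackrefLiteral do
  # Hypothetical helper: replace the backreference \n in a regex source
  # string with an escaped literal, then compile the result.
  def inject(source, n, literal)
      when is_binary(source) and is_integer(n) and is_binary(literal) do
    source
    # With a string pattern, the replacement is inserted as-is.
    |> String.replace("\\#{n}", Regex.escape(literal))
    |> Regex.compile!()
  end
end

BackrefLiteral.inject("\\1abc", 1, "XXX")
# ~r/XXXabc/

re = BackrefLiteral.inject("\\1abc", 1, "a.b")
Regex.match?(re, "a.babc")  # true
Regex.match?(re, "axbabc")  # false, the dot was escaped
```

This is still a textual substitution: it would also rewrite the start of \10 or a \1 that follows an escaped backslash, so it is only suitable for simple patterns.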

AstonJ · August 26, 2017, 6:57pm

Looks like it’s a bug and the fix should be rolled out in their new engine.

For now, it’s prob best to do as you did, and add a note to say there should be two backslashes and not one

tmbb · August 26, 2017, 11:19pm

Yes, I’ll probably use something like String.replace/4 but maybe more robust.

This won’t work… I have to work with pairs of regexes (say reA and reB) that will be matched in sequence. The regex reB can have capture groups that refer to groups matched by reA. The problem is that between reA and reB there will be some input that must be consumed. So I have to store the groups in reA and dynamically inject them in reB at runtime (to replace the capture groups). This seems to be the only way of doing this short of implementing a regex engine of my own… (The end goal is to parse and interpret TextMate language files)

But thanks, I’ll try to do it by replacing the capture group as you suggested
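The reA/reB idea described above could be sketched roughly as follows. This assumes reB is kept as a source string (a "template") until the captures from reA are known, and that every \N in it refers to a group of reA. DeferredBackrefs and compile_with_captures/2 are hypothetical names, not an existing API:

```elixir
defmodule DeferredBackrefs do
  # Hypothetical helper, not part of any library.
  # `template` is the source of reB as a string, where \1, \2, ... refer to
  # groups captured earlier by reA (not to groups inside reB itself).
  # `captures` is the list returned by Regex.run/2 on reA.
  def compile_with_captures(template, captures) when is_list(captures) do
    source =
      Regex.replace(~r/\\(\d+)/, template, fn whole, digits ->
        case Enum.at(captures, String.to_integer(digits)) do
          # Unknown group: leave the backreference untouched.
          nil -> whole
          # Known group: inject it as an escaped literal.
          capture -> Regex.escape(capture)
        end
      end)

    Regex.compile(source)
  end
end

re_a = ~r/<(\w+)>/
captures = Regex.run(re_a, "<div>")
# captures is ["<div>", "div"]; index 0 is the whole match

{:ok, re_b} = DeferredBackrefs.compile_with_captures("</\\1>", captures)
Regex.match?(re_b, "some text </div>")   # true
Regex.match?(re_b, "some text </span>")  # false
```

The sketch does not distinguish a real backreference from a digit that follows an escaped backslash (`\\1` in the regex source), so a real TextMate grammar interpreter would likely need a small tokenizer for the pattern instead of a single Regex.replace/3 call.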

axelson · August 27, 2017, 12:32am

While I’m not familiar with that file format you may be better off implementing a simple parser. @OvermindDL1 has a parser library that he likes to recommend called ExSpirit

tmbb · August 27, 2017, 12:40am

I’m veeery familiar with ExSpirit and it’s great. In fact, I’m the author of the only hex package that uses ExSpirit xD, but parsing isn’t the hard part here. It’s just a subset of JSON. The problem is that the file is executable and the hard part is writing an interpreter or a compiler for said format…

axelson · August 27, 2017, 2:45am

Errr, you appear to be much more knowledgeable about this subject than I am. So I rest my case. Sounds like a tricky problem though.

NobbZ · August 27, 2017, 6:23am

iex(1)> [_, grp1] = Regex.run(~r/(abc)abc/, "abcabc")
["abcabc", "abc"]
iex(2)> ~r/#{grp1}/
~r/abc/

So you can use the matches and groups from one match in the second match.

OvermindDL1 · August 28, 2017, 2:44pm

Well, the brute-force method of editing an existing regex (assuming you cannot use sigil variable interpolation or so) is just to stringify the regex, run another regex (or substitution or whatever) on it, then recompile it with Regex.compile/2. ^.^;
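A minimal sketch of that stringify, edit, recompile round trip, using Regex.source/1 and Regex.opts/1 so the original modifiers survive. RegexEdit and edit/2 are hypothetical names chosen for this example:

```elixir
defmodule RegexEdit do
  # Hypothetical helper: apply `fun` to the regex source string and
  # recompile the result with the options of the original regex.
  def edit(%Regex{} = regex, fun) when is_function(fun, 1) do
    regex
    |> Regex.source()
    |> fun.()
    |> Regex.compile!(Regex.opts(regex))
  end
end

edited = RegexEdit.edit(~r/foo(\d+)/i, &String.replace(&1, "foo", "bar"))
Regex.source(edited)           # "bar(\\d+)"
Regex.match?(edited, "BAR42")  # true, the i modifier was kept
```

Regex.compile!/2 raises if the edited source is not a valid pattern; Regex.compile/2 returns {:error, reason} instead, which may be preferable when the edit is driven by runtime input.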

tmbb · August 28, 2017, 3:00pm

Yes, if I ever go down that road, I’ll probably do that…