Simulate Regex match guards in functions

rvirding · May 24, 2016, 9:46am

Yes, compiling the regexps with a tool like leex[*] will generally produce a much faster program for doing this type of thing because:

It will only make one pass over the string you are testing irrespective of how many regexps you are testing it against. The other alternatives here will test your string against each possible regexp one at a time, this irrespective of whether they are hard-wired in a cond or defined in some nicer way.
That the leex version can do this partially depends on the regexps it allows are more restricted, amongst other things they never need backtracking which Perl and PCRE regexps may need.
For this type of usage we don’t have to actually generate a “token” as such just some tag indicating what we found.

It is honestly quite easy to write a tool which generates an input file for leex from a set of regexps and return values. It is compile time but it is not that difficult to fix it so you could handle changing the regexp set dynamically and recompile your “scanner”, though you wouldn’t want to do it too often.[**]

Robert

[*] Leex is based on the same principles as other scanner generating tools like lex and flex. Where did we get the name from? It leaks tokens.

[**] There are programs which handle configuration data in this way, they dynamically compile a new config module containing the config data instead of keeping it in a database. Quite cool actually.