Hello.
I need to create a Regex from an arbitrary string. I tried strings with special characters, e.g. "\d".
But when I compile such a string, the symbols get escaped and the result is something like ~r/\x7F/
The code to try:
That is because using " as the string delimiter turns on escape sequences like that one.
There are a few ways to turn that off. The usual one is the ~S (capital S) sigil, which disables both interpolation and escape sequences, thus:
iex> "\d"
"\d"
iex> ~s"\d"
"\d"
iex> ~S"\d"
"\\d"
EDIT1: Note that " is basically like having an implied ~s before it. You can change the delimiter if you want: ", ', / and a half dozen others are all valid, so ~s/blah/ == "blah" is true.
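To make the difference concrete, here is a small sketch comparing the three forms. The variable names are only for illustration:

```elixir
# A double-quoted string is equivalent to the lowercase ~s sigil,
# whatever delimiter the sigil uses.
same = ~s/blah/ == "blah"

# With escape sequences on, \d is a single byte: DEL (0x7F).
escaped = "\d"

# With ~S, the backslash and the d are kept as two literal bytes.
raw = ~S"\d"

IO.inspect(same)                # true
IO.inspect(byte_size(escaped))  # 1
IO.inspect(byte_size(raw))      # 2
IO.inspect(raw == "\\d")        # true
```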
EDIT2: Also, the r sigil both defines a string and passes it to regex compile all in the same step, so your original example could be done like:
The Inspect protocol for compiled regexes just converts the compiled regex into sigil form for easy copy/pasting into the shell, but that is not how it is stored internally. The Inspect protocol is there to make it easy for you to read, not to show the internal representation.
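As an illustration only (the pattern is borrowed from the regex_str example discussed in this thread, and this is not necessarily the snippet originally posted), the ~r sigil and the inspected form can be tried like this:

```elixir
# ~r builds the compiled regex directly, so no string escaping is involved.
regex = ~r/^\d+,$/

# Regex.source/1 returns the pattern text the regex was compiled from.
IO.puts(Regex.source(regex))

# inspect/1 renders the regex in sigil form for readability;
# that is a display format, not the internal structure.
IO.puts(inspect(regex))

IO.inspect(Regex.match?(regex, "123,"))  # true
```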
EDIT3: There are lots of sigils, you can even make your own, all documented at:
In my case I get a string from user input, and the users expect to write a "normal" regexp.
So I can't use sigils like ~S, because I can't pass a variable to the macro.
I have the string "^\d" as input from users (or a record from a DB) and I need to convert it to a regexp.
Well doing regex_str = "^\d+,$" is most certainly not user input. ^.^
It would be more like regex_str = get_user_blah() or so. Strings are only escaped in "source code", not when they come from anywhere else. So if it is user input then it is already fine.
The issue is that the content bound to regex_str was already escaped. It was not Regex.compile/1 doing the escaping, it was the " quotes in the assignment above it.
It is exactly the same in JavaScript, C, C++, OCaml, etc., in almost every language out there.
If you get \d from an external source, it will be represented as "\\d" as a string, and this will work fine with Regex.compile.
The Elixir string "\d" does not represent two characters, but a single one. A user entering text into, say, a text field does not use Elixir syntax, though, so when they type \d, the result is a two-byte string, written as "\\d" in Elixir. When you use a string in a test, you are writing Elixir syntax, so the additional escaping is necessary there.
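A rough sketch of that situation, with the user's text simulated from its individual characters (user_input is a hypothetical stand-in for whatever a form field or DB record would return):

```elixir
# Hypothetical stand-in for text typed by a user: the four characters ^ \ d +
user_input = List.to_string([?^, ?\\, ?d, ?+])

# Written as an Elixir literal, the same text needs a doubled backslash.
IO.inspect(user_input == "^\\d+")  # true

# Regex.compile/1 receives the backslash and the d exactly as typed.
{:ok, regex} = Regex.compile(user_input)

IO.inspect(Regex.match?(regex, "123"))  # true
IO.inspect(Regex.match?(regex, "abc"))  # false
```

Since the pattern comes from outside, it is probably worth handling the {:error, reason} tuple that Regex.compile/1 returns for an invalid pattern, rather than matching on {:ok, regex} as this sketch does.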
You know, I just got to thinking: is it possible to create a function that does a rough calculation of the "cost" of a compiled regex? That way we could reject user-supplied regexes that go above a certain "cost". Sounds like a hard problem to catch all the detrimental cases... I wonder...