String.replace and escaping weirdness

I have some code that finds all instances of the characters \, %, and _ and inserts a backslash in front of each one to escape it in the resulting SQL string. I’m a little confused about how many \ characters I need to use to do this. I would think that String.replace(string, ~r/([\\%_])/, "\\\\1") would do it, since I put in "\\" for a single backslash, then "\\1" for the backslash-one syntax to get my first capture. However, this substitutes the characters backslash and 1, e.g., a_b -> a\1b (on IO.puts).

It seems like this is because "\\\\" is the intended syntax for substituting an actual, literal backslash, and "\\\\\\1" does the trick, but to be honest I’m confused about how the escaping actually works in this case.
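For reference, wrapping the working replacement string in a helper for the original use case (escaping LIKE wildcards before they reach SQL) might look like the sketch below. The module and function names are hypothetical; the regex and the six-backslash replacement are the ones from the post.

```elixir
defmodule LikeEscape do
  # Hypothetical helper: put a backslash in front of every \, % and _ so the
  # string can be used literally inside a SQL LIKE pattern.
  def escape(s) do
    # ~r/([\\%_])/ captures one backslash, percent sign or underscore as group 1.
    # "\\\\\\1" is the 4-character string \\\1, which the replacement language
    # reads as a literal backslash (\\) followed by capture group 1 (\1).
    String.replace(s, ~r/([\\%_])/, "\\\\\\1")
  end
end

# The input below is the string a_b%c\d (the literal needs "\\" for one backslash).
IO.puts(LikeEscape.escape("a_b%c\\d"))
```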

Does anyone have any insights?

In the replacement language \ has a special meaning. So if you want it literally, you need to escape it.

Your string \\\\1 is seen by the replacement language as \\1, which results in a replacement of \1 (as printed) or \\1 (as inspected).

To actually get a single backslash followed by the content of the capture, you need 3 backslashes followed by a 1 in the replacement language, and in a string literal each of those has to be doubled, so you end up with 6 of them.
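The two layers can be checked separately: the string literal is unescaped first, and only the resulting characters reach the replacement language. A minimal sketch, using only `String.graphemes/1`, `String.duplicate/2` and `Regex.replace/3` on the a_b example:

```elixir
# Layer 1: the string literal. Each "\\" in source code is one backslash,
# so this six-backslash literal holds just four characters: \ \ \ 1
six = "\\\\\\1"
IO.inspect(String.graphemes(six))

# Layer 2: the replacement language. Build replacements with 1, 2 and 3
# backslashes (as the replacement language sees them) before the digit 1:
#   1 -> \1    capture group 1 only
#   2 -> \\1   a literal backslash, then a literal "1"
#   3 -> \\\1  a literal backslash, then capture group 1
for n <- 1..3 do
  replacement = String.duplicate("\\", n) <> "1"
  IO.puts("#{n}: " <> Regex.replace(~r/(_)/, "a_b", replacement))
end
```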

When I write replacements, I usually use ~S to avoid the doubling; then I can write ~S"\\\1".
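A small sketch of the ~S approach applied to the original problem; the sample input is made up. Because ~S does no escape processing, the source text is exactly what the replacement language receives.

```elixir
# ~S disables escape processing: these are the 4 characters \\\1, i.e. a
# literal backslash (\\) followed by capture group 1 (\1).
replacement = ~S"\\\1"

# Same characters as the doubled-up regular string literal.
IO.inspect(replacement == "\\\\\\1", label: "same as the six-backslash literal")

"100%_done"
|> String.replace(~r/([\\%_])/, replacement)
|> IO.puts()
```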


Thanks, @NobbZ! I think I see what you mean by “replacement language”. This seems to be a special case when a regex is passed as the pattern to replace/4, as illustrated below:

$ iex
Erlang/OTP 22 [erts-10.4] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe]

Interactive Elixir (1.9.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> "abc" |> String.replace("a", "z\\0") |> IO.puts()
z\0bc
:ok
iex(2)> "abc" |> String.replace(~r/a/, "z\\0") |> IO.puts()
zabc
:ok
iex(3)> "abc" |> String.replace("a", "\\") |> IO.puts()    
\bc
:ok
iex(4)> "abc" |> String.replace("a", "\\\\") |> IO.puts()
\\bc
:ok
iex(5)> "abc" |> String.replace(~r/a/, "\\") |> IO.puts()
\bc
:ok
iex(6)> "abc" |> String.replace(~r/a/, "\\\\") |> IO.puts()
\bc
:ok

Is this behavior documented anywhere? I know the docs mention using “\1”, etc. to do capture substitution, and that implies that \ is being treated specially, but I’m still surprised that even without capture substitution, regex patterns cause replacements to behave differently.

Edit: For simple regexes, one can avoid the replacement language either by 1) supplying a list of strings as the pattern argument, or 2) supplying a function as the replacement argument.
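A sketch of both workarounds on a made-up input. With a list-of-strings pattern, a replacement string is used verbatim; to keep the matched character while escaping it, a function replacement is used here in both options, since its return value is inserted as-is.

```elixir
input = "50%_off\\now"   # i.e. the characters 50%_off\now

# With a list of plain strings as the pattern, "\\0" is just a backslash and
# a zero, not "the whole match", because no replacement language is involved.
IO.puts(String.replace("a_b", ["_"], "\\0"))

# 1) List-of-strings pattern plus a function that receives the matched string.
IO.puts(String.replace(input, ["\\", "%", "_"], fn c -> "\\" <> c end))

# 2) Regex pattern plus a function: the function's return value is not run
#    through the replacement language either, so "\\" is one backslash.
IO.puts(String.replace(input, ~r/[\\%_]/, fn c -> "\\" <> c end))
```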

Ah, and when the replacement is a function in Regex.replace/4, it behaves a little differently than in String.replace/4. For Regex, the function can take n + 1 arguments, where n is the number of capture groups in the regex. The first argument is the whole match and the next n correspond to each capture, whereas String.replace/4's replacement function only ever takes a single argument (the whole match).
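A small sketch of that difference, assuming a made-up key=value input. As far as I can tell from the Regex docs, the arity of the function decides how many arguments it gets: the whole match first, then the captures in order.

```elixir
# Made-up regex with two capture groups: a key and a value.
regex = ~r/(\w+)=(\w+)/

# Regex.replace/3 with an arity-3 function: whole match, capture 1, capture 2.
swapped = Regex.replace(regex, "a=1 b=2", fn _whole, key, value -> "#{value}=#{key}" end)
IO.puts(swapped)

# String.replace/3 with a regex only accepts an arity-1 function that gets
# the whole match, so the parts have to be recovered by hand.
swapped_again =
  String.replace("a=1 b=2", regex, fn whole ->
    [key, value] = String.split(whole, "=")
    "#{value}=#{key}"
  end)

IO.puts(swapped_again)
```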