Change of behaviour in Elixir 1.18 macros: (ArgumentError) tried to unquote invalid AST

Hi!

I have a piece of code that parses some JSON and creates validations from it. In one place, I need to create a regular expression from a string.

In Elixir 1.17 this worked:

iex(1)> regex_string = "[0-9]{4}"
"[0-9]{4}"
iex(2)> quote do unquote(~r/#{regex_string}/) end |> Macro.to_string
"~r/[0-9]{4}/"

In Elixir 1.18, I get an error:

iex(1)> regex_string = "[0-9]{4}"
"[0-9]{4}"
iex(2)> quote do unquote(~r/#{regex_string}/) end |> Macro.to_string
** (ArgumentError) tried to unquote invalid AST: ~r/[0-9]{4}/
Did you forget to escape term using Macro.escape/1?
    (elixir 1.18.2) src/elixir_quote.erl:542: :elixir_quote.argument_error/1
    iex:3: (file)

I thought it might be because of the interpolation, so I tried interpolating outside of quote:

iex(1)> regex_string = "[0-9]{4}"
"[0-9]{4}"
iex(2)> regex = ~r/#{regex_string}/
~r/[0-9]{4}/
iex(3)> quote do unquote(regex) end |> Macro.to_string
** (ArgumentError) tried to unquote invalid AST: ~r/[0-9]{4}/
Did you forget to escape term using Macro.escape/1?
    (elixir 1.18.2) src/elixir_quote.erl:542: :elixir_quote.argument_error/1
    iex:7: (file)

The error suggests using Macro.escape, but that doesn’t make sense to me. I expected unquote to handle the escaping already. And indeed, if I escape the regex myself, the generated code contains the expanded struct instead of the sigil:

iex(1)> quote do ~r/[0-9]{4}/ end
{:sigil_r, [delimiter: "/", context: Elixir, imports: [{2, Kernel}]],
 [{:<<>>, [], ["[0-9]{4}"]}, []]}
iex(2)> regex = ~r/#{regex_string}/
~r/[0-9]{4}/
iex(3)> escaped = Macro.escape(regex)
{:%{}, [],
 [
   __struct__: Regex,
   opts: [],
   re_pattern: {:{}, [],
    [
      :re_pattern,
      0,
      0,
      0,
      <<69, 82, 67, 80, 109, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 255, 255, 255,
        255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 0, ...>>
    ]},
   re_version: {"8.44 2020-02-12", :little},
   source: "[0-9]{4}"
 ]}
iex(4)> quote do unquote(escaped) end |> Macro.to_string
"%{\n  __struct__: Regex,\n  opts: [],\n  re_pattern:\n    {:re_pattern, 0, 0, 0,\n     \"ERCPm\\0\\0\\0\\0\\0\\0\\0\\x01\\0\\0\\0\\xFF\\xFF\\xFF\\xFF\\xFF\\xFF\\xFF\\xFF\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0@\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\x83\\0)n\\0\\0\\0\\0\\0\\0\\xFF\\x03\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0m\\0\\x04\\0\\x04x\\0)\\0\"},\n  re_version: {\"8.44 2020-02-12\", :little},\n  source: \"[0-9]{4}\"\n}"

I am starting to wonder whether Elixir 1.18 became stricter, or whether this is a bug in Elixir.

Regular expressions are just structs and, like all structs, they should be escaped before unquoting. What’s more interesting is that you can’t unquote a 3-element tuple, but you can unquote each of its elements without escaping the tuple.

Examples

Structs

quote do
  unquote(URI.new!("this.is/not/allowed"))
end

quote do
  URI.new!(unquote("but-this.is/allowed"))
end

3-element tuple

quote do
  unquote({:this, :is_not, :allowed})
end

quote do
  {unquote(:but_this), unquote(:is), unquote(:allowed)}
end
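
If you want to check up front whether a term is valid AST, Macro.validate/1 can help: it returns :ok for a valid quoted expression and an error tuple otherwise. Below is a small sketch using the terms from the examples above. That the check performed by unquote in 1.18 accepts and rejects exactly the same terms is my assumption, not something I verified in the compiler source.

```elixir
# A struct is not a valid quoted expression on its own ...
uri = URI.new!("this.is/not/allowed")
IO.inspect(match?({:error, _}, Macro.validate(uri)))
# => true

# ... but its escaped form is.
IO.inspect(Macro.validate(Macro.escape(uri)))
# => :ok

# A 3-element tuple is only valid AST when it has the {name, meta, args} shape.
tuple = {:this, :is_not, :allowed}
IO.inspect(match?({:error, _}, Macro.validate(tuple)))
# => true

# Atoms are literals in the AST, so each element is fine by itself.
IO.inspect(Enum.map(Tuple.to_list(tuple), &Macro.validate/1))
# => [:ok, :ok, :ok]
```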

Note: I just tried it on 1.17.3 and it looks like all of those examples work there, which is indeed interesting. I’m so used to 1.18.x that I had completely forgotten about that behaviour. :sweat_smile:

Thanks @Eiji
That partially solves my problem.
In the last of my snippets, the returned string is the code representation of the regex struct, so technically it should do what I want. If I do:

iex(2)> quote do unquote(escaped) end |> Macro.to_string |> IO.puts
%{
  __struct__: Regex,
  opts: [],
  re_pattern:
    {:re_pattern, 0, 0, 0,
     "ERCPh\0\0\0\0\0\0\0\x01\0\0\0\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\0\0\0\0\0\0\0\0\0\0@\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x83\0$n\0\0\0\0\0\0\xFF\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0x\0$\0"},
  re_version: {"8.44 2020-02-12", :little},
  source: "[0-9]"
}
:ok

and then copy-paste that value, I get the regex back:

iex(3)> %{
...(3)>   __struct__: Regex,
...(3)>   opts: [],
...(3)>   re_pattern:
...(3)>     {:re_pattern, 0, 0, 0,
...(3)>      "ERCPh\0\0\0\0\0\0\0\x01\0\0\0\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\0\0\0\0\0\0\0\0\0\0@\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x83\0$n\0\0\0\0\0\0\xFF\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0x\0$\0"},
...(3)>   re_version: {"8.44 2020-02-12", :little},
...(3)>   source: "[0-9]"
...(3)> }
~r/[0-9]/

However, I am using the output to generate code. When I paste it into iex, it is displayed as ~r/[0-9]/ again, but Macro.to_string keeps the verbose struct syntax. Do you know a way to force Macro.to_string to return the more readable representation?

The point is that you escape a term so that you are able to unquote it.

escaped == quote do unquote(escaped) end
# => true

so I’m not sure what you are trying to do at the end …

Please keep in mind that Macro.escape/1 and quote/1 work differently. quote may return a different AST for the same data, for example:

  1. AST for the sigil
  2. AST for the struct (when %MyStruct{…} is used)
  3. Escaped map

map_escape = quote do
  %{__struct__: MyModule, …}
end

struct_escape = quote do
  %MyModule{…}
end

map_escape == struct_escape
# => false

Macro.escape/1 escapes all maps (including structs) as maps, so its result is equal to map_escape in the above example.
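
To make the difference concrete, here is a small runnable sketch. It uses URI only because it is a struct that ships with Elixir; the variable names are made up for illustration.

```elixir
struct_quote = quote do: %URI{path: "/docs"}
map_escape = Macro.escape(%URI{path: "/docs"})

# quote keeps the %URI{...} syntax as a {:%, meta, [alias, map]} node
# that contains only the fields written in the source.
IO.inspect(match?({:%, _, [_alias, {:%{}, _, [path: "/docs"]}]}, struct_quote))
# => true

# Macro.escape/1 produces a plain map node that lists every field,
# including the :__struct__ key.
{:%{}, [], fields} = map_escape
IO.inspect(Keyword.fetch!(fields, :__struct__))
# => URI

# This is why Macro.to_string/1 prints the two forms differently.
IO.puts(Macro.to_string(struct_quote))
IO.puts(Macro.to_string(map_escape))
```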


In this case we deal with an escaped map. The AST of it is pretty simple:

{:%{}, [], Map.to_list(map)}

However, it’s not the only escaped part. The second one is the re_pattern tuple, and its AST is also very simple:

{:{}, [], Tuple.to_list(tuple)}
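
Both shapes can be checked quickly in iex or a script. This is only a sketch with made-up sample values:

```elixir
# Maps become a {:%{}, [], keyword} node.
IO.inspect(Macro.escape(%{a: 1}) == {:%{}, [], [a: 1]})
# => true

# Tuples with three elements become a {:{}, [], list} node.
IO.inspect(Macro.escape({1, 2, 3}) == {:{}, [], [1, 2, 3]})
# => true

# Two-element tuples are already valid AST, so they stay as they are.
IO.inspect(Macro.escape({1, 2}) == {1, 2})
# => true
```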

Let’s pattern-match!

regex = ~r/[0-9]{4}/
escaped = Macro.escape(regex)
{:%{}, [], keyword} = quote do unquote(escaped) end

Map.new(keyword) == regex
# => false (re_pattern tuple is still escaped)

unescape_tuple = fn {:{}, [], list} -> List.to_tuple(list) end
regex2 = keyword |> Keyword.update!(:re_pattern, unescape_tuple) |> Map.new()
regex == regex2
# => true

Note: Please keep in mind that re_pattern is documented as term, so you may need more code to handle different cases. Consider the above code a simple example.


However, there is a simpler way. You can evaluate the quoted map:

regex = ~r/[0-9]{4}/
escaped = Macro.escape(regex)
{regex2, []} = Code.eval_quoted(escaped)
regex == regex2
# => true

Personally, I try to avoid any kind of eval in the code I write, especially if the input source is untrusted.


It would be helpful if you gave us some more context. I still don’t understand why you need to quote regular expressions and work on the AST. Most probably there is a much simpler solution …

Thanks for your answer. Indeed, more context could help you understand what I am trying to achieve.

We’ve built a custom code generator that works from API specs. Among other things, it can generate structs for request params plus changesets that validate those structs. E.g. we can have a JSON object defined like this:

title: My Object
type: object

properties:
  four_digit_code:
    description: |
      Unlocks pinpad
    type: string
    minLength: 4
    maxLength: 4
    pattern: "[0-9]{4}"
    example: 2137

From that, we generate a struct like this:

defmodule MyObject do
  use Ecto.Schema
  import Ecto.Changeset

  embedded_schema do
    # Unlocks pinpad
    field(:four_digit_code, :string)
  end

  def changeset(struct, params) do
    struct
    |> cast(params, [
      :four_digit_code
    ])
    |> validate_format(:four_digit_code, ~r/[0-9]{4}/)
    |> validate_length(:four_digit_code, is: 4)
  end
end

In the beginning, we used string interpolation to generate the code, but the development feedback loop was slow. E.g. if you messed up some code in the template, you wouldn’t know until you generated the struct and tried to compile it.

Thankfully, we figured out that if we use macros instead, we get helpful errors immediately when we compile the generator (instead of after we run it). So, in the generator, we iterate over the property constraints and add code based on them.

At the end, we use Macro.to_string to generate the final module.

Generating those modules from specs saves a lot of time (especially for more complex schemas).

The code that stopped working generates the validate_format call from a regex:

        {:pattern, pattern} ->
          ast =
            quote do
              validate_format(unquote(field), unquote(~r/#{pattern}/))
            end

The pattern variable is a string from the OpenAPI spec: [0-9]{4}

That worked up to and including Elixir 1.17 because it was possible to pass a regex to unquote. Since Elixir 1.18 this is no longer possible, and the error helpfully suggests trying Macro.escape. That would probably work most of the time, but not for code generation.

I need the most readable version of the code for the generator even if the escaped one is equivalent.

E.g. this is super readable:

iex(12)> quote do ~r/[0-9]{4}/ end |> Macro.to_string
"~r/[0-9]{4}/"

But I need to insert the inner part of the pattern from string:

iex(1)> regex_string = "[0-9]{4}"
"[0-9]{4}"
iex(2)> regex = ~r/#{regex_string}/
~r/[0-9]{4}/
iex(3)> escaped = Macro.escape(regex)
{:%{}, [], [ struct representation omitted for clarity ]}
iex(4)> quote do unquote(escaped) end |> Macro.to_string
"%{\n  __struct__: Regex,\n  opts: \"\",\n  re_pattern:\n    {:re_pattern, 0, 0, 0,\n     \"ERCPm\\0\\0\\0\\0\\0\\0\\0\\x01\\0\\0\\0\\xFF\\xFF\\xFF\\xFF\\xFF\\xFF\\xFF\\xFF\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0@\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\x83\\0)n\\0\\0\\0\\0\\0\\0\\xFF\\x03\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0m\\0\\x04\\0\\x04x\\0)\\0\"},\n  re_version: {\"8.44 2020-02-12\", :little},\n  source: \"[0-9]{4}\"\n}"

As you can see, that representation is far less readable.

It seems the only way to achieve what I want is to edit the sigil AST directly.
E.g.

{:sigil_r, meta, [_tuple, list]} = quote do ~r// end
sigil_ast = {:sigil_r, meta, [{:<<>>, [], [regex_string]}, list]}
quote do unquote(sigil_ast) end |> Macro.to_string

"~r/[0-9]{4}/"

I wonder whether there is a better way to construct the required AST without digging into the tuple representation.

In that case, I guess you can compile the regular expression at compile time, then inspect it to get a string containing the sigil, and finally convert that string to AST.

Here is a complete script I have prepared:

Mix.install([:ecto])

defmodule MyLib.Schema do
  @doc """
  Generates the changeset function.

  The `name` is the changeset function name. Defaults to `:changeset`.
  The `properties` is an `Elixir` map with `atom` keys.
  The `func` is an optional function that the developer can use to add other validations.
  """
  defmacro changeset(name \\ :changeset, properties, func \\ nil) do
    quote bind_quoted: [func: func, module: __MODULE__, name: name, properties: properties] do
      pipe =
        properties
        |> module.from_properties(__MODULE__)
        |> module.pipe_func(func)

      # pipe
      # |> Macro.to_string()
      # |> Code.format_string!()
      # =>
      # struct
      # |> cast(params, [:four_digit_code])
      # |> validate_format(:four_digit_code, ~r/[0-9]{4}/)

      struct = Macro.var(:struct, __MODULE__)
      params = Macro.var(:params, __MODULE__)

      def unquote(name)(unquote(struct), unquote(params)) do
        unquote(pipe)
      end
    end
  end

  @doc false
  def from_properties(properties, module) do
    fields = Map.keys(properties)
    struct = Macro.var(:struct, module)
    params = Macro.var(:params, module)

    cast =
      quote do
        cast(unquote(params), unquote(fields))
      end

    pipe = ast_pipe(struct, cast)
    Enum.reduce(properties, pipe, &from_field_properties/2)
  end

  @supported_validators ~w[pattern]a

  defp from_field_properties({field, properties}, acc) do
    properties
    |> Map.take(@supported_validators)
    |> Enum.reduce(acc, fn {key, value}, acc ->
      ast_pipe(acc, validator(field, key, value))
    end)
  end

  defp validator(field, :pattern, value) do
    quote do
      validate_format(unquote(field), unquote(quoted_regex_sigil(value)))
    end
  end

  defp quoted_regex_sigil(source) do
    source
    # creates regular expression from source
    |> Regex.compile!()
    # inspect returns a string with a sigil
    |> inspect()
    # parses the sigil string into its quoted (AST) form
    |> Code.string_to_quoted!()
  end

  def pipe_func(left, nil), do: left

  def pipe_func(left, func) do
    ast_pipe(
      left,
      quote do
        then(unquote(func))
      end
    )
  end

  defp ast_pipe(left, right) do
    quote do
      unquote(left) |> unquote(right)
    end
  end
end

defmodule MyApp.Schema do
  use Ecto.Schema

  import Ecto.Changeset
  import MyLib.Schema

  embedded_schema do
    field(:four_digit_code, :string)
  end

  changeset(%{four_digit_code: %{pattern: "[0-9]{4}"}})
end

defmodule Example do
  def sample do
    MyApp.Schema.changeset(%MyApp.Schema{}, %{four_digit_code: "0007"})
    # => %Ecto.Changeset{valid?: true}
  end
end

You should be able to easily adapt the example code to your needs.
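
If you only need the regex part, the pipeline from quoted_regex_sigil/1 can be tried on its own. The sketch below assumes the pattern string from your spec; validate_format is only quoted here, never called, so Ecto is not required to run it.

```elixir
source = "[0-9]{4}"

# Compile the pattern, render it as a sigil string, then parse that string to AST.
ast =
  source
  |> Regex.compile!()
  |> inspect()
  |> Code.string_to_quoted!()

# The sigil AST can be unquoted directly and prints in its readable form.
call = quote do: validate_format(:four_digit_code, unquote(ast))
IO.puts(Macro.to_string(call))
# prints: validate_format(:four_digit_code, ~r/[0-9]{4}/)

# Evaluating the AST (for this check only) gives back a working regex.
{regex, _binding} = Code.eval_quoted(ast)
IO.inspect(Regex.match?(regex, "2137"))
# => true
```

Note that Regex.compile!/1 raises on an invalid pattern, so a broken pattern in the spec fails in the generator instead of in the generated module.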


Thank you!

This part is exactly what I was missing!

  defp quoted_regex_sigil(source) do
    source
    # creates regular expression from source
    |> Regex.compile!()
    # inspect returns a string with a sigil
    |> inspect()
    # parses the sigil string into its quoted (AST) form
    |> Code.string_to_quoted!()
  end

I didn’t realise that Code has string_to_quoted!. That will ensure the correct representation in the final generated code.

Brilliant answer!
