Testing whether a string contains valid Elixir code?

This is admittedly a strange use-case, but I’ve been working on writing a Handlebars parser in Elixir and one of the gotchas is that you don’t want clever users injecting random Elixir code into your templates.

For example, this is a nice and benign Handlebars template:

{{#if some_variable}}
   Hello there!
{{/if}}

– it would get safely converted to EEx and no harm is done.

However, if some clever nefarious user provided a template like this:

{{#if File.write!("/path/to/webroot/index.html", "All your base belong to us!")}}
   Hello there!
{{/if}}

then I want to be able to catch it. It’s tough, however: parentheses are optional, and legitimate values may or may not be quoted. The only thing I can think of is checking the input (File.write!("/path/to/webroot/index.html", "All your base belong to us!") in this case) to see if it
a) begins with a capital letter (i.e. if it might be a module name) or
b) begins with a colon (which would denote an atom, and possibly a call into an Erlang module)
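
For illustration, that heuristic could be sketched roughly as follows. NaiveCheck and suspicious?/1 are made-up names, and this is only the check described above, not a recommended defence:

```elixir
defmodule NaiveCheck do
  # Hypothetical helper: flags input whose first non-blank character is an
  # uppercase ASCII letter (possible module name) or a colon (atom, possibly
  # an Erlang module).
  def suspicious?(input) when is_binary(input) do
    String.match?(String.trim_leading(input), ~r/^[A-Z:]/)
  end
end

NaiveCheck.suspicious?(~S|File.write!("/x", "y")|)   # => true
NaiveCheck.suspicious?(~S|:os.cmd("ls")|)            # => true
NaiveCheck.suspicious?("some_variable")              # => false
NaiveCheck.suspicious?(~S|apply(File, :rm, ["x"])|)  # => false
```

The last call shows the weakness: input that starts with a lowercase function name, such as apply/3, is not flagged.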

Does anyone have any other ideas? Many thanks!

Checking if a string contains “safe” Elixir code is a tough problem, but there are some tools in the language to play with.

If you have access to the some_variable part (the string containing the Elixir code to check), you can use Code.string_to_quoted/2 to get the AST representation of the code. The function returns an error tuple when the string is not syntactically valid Elixir. It only parses the code and does not execute it, so it is safe to run. One caveat to know beforehand: the nodes of the AST contain atoms, and there is a limit on the number of atoms the VM can create.
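
As a minimal sketch of that parsing step (ParseDemo and classify/1 are names invented for this example), something along these lines shows what the function tells you and what it does not:

```elixir
defmodule ParseDemo do
  # Hypothetical helper: reports whether a string parses as Elixir.
  # Code.string_to_quoted/1 only builds the AST; nothing is executed.
  def classify(source) when is_binary(source) do
    case Code.string_to_quoted(source) do
      {:ok, _ast} -> :valid
      {:error, _reason} -> :invalid
    end
  end
end

ParseDemo.classify(~S|File.write!("/tmp/x", "hi")|)  # => :valid
ParseDemo.classify("some_variable")                   # => :valid
ParseDemo.classify("if (")                            # => :invalid
```

Note that a plain variable name and a File.write! call both parse successfully, so the :ok/:error result alone cannot separate harmless input from dangerous input; the AST itself has to be inspected.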

You can use the AST form of the code to check which modules and functions are being called in the user-provided code. Unfortunately, this is not an easy process, because you also have to consider tricks where modules are resolved at runtime (and so are not present in the static AST representation). In fact, it is hard to get a fully secure solution unless you restrict much of the Elixir language (which may apply to your use case, though).

I don’t want to use this reply to promote a personal project, but it is closely related, because the execution of untrusted Elixir code is at the heart of it. You can read more about the decisions I made here: https://github.com/wyeworks/elixir_console#where-my-elixir-code-is-executed. I’m happy to keep discussing it if you think it would make sense for your use case.

Thank you, I wasn’t familiar with that function!

To be clear, I don’t need to check if the string contains safe Elixir code, I want to check if it contains ANY Elixir code – this is definitely a use case where restriction is desired (and yes, the heart of it sounds very relevant to your project).

As long as the Handlebars to EEx translation can detect safe (i.e. not code) inputs, the translation will work as expected. Continuing my previous example:

This Handlebars template

{{#if some_variable}}
   Hello there!
{{/if}}

is converted to this EEx:

<%= if some_variable do %> 
  Hello there!
<% end %>

Whereas this Handlebars template

{{#if File.write!("/path/to/webroot/index.html", "All your base belong to us!")}}
   Hello there!
{{/if}}

must not be allowed to be converted to this (valid, but dangerous) EEx code:

<%= if File.write!("/path/to/webroot/index.html", "All your base belong to us!") do %> 
  Hello there!
<% end %>

Maybe you can elaborate more on how to interpret the results of the Code.string_to_quoted/2 function? I don’t think I follow its output:

iex> x = ~S|File.write!("/path/to/webroot/index.html", "All your base belong to us!")|
"File.write!(\"/path/to/webroot/index.html\", \"All your base belong to us!\")"
iex> Code.string_to_quoted(x)
{:ok,
 {{:., [line: 1], [{:__aliases__, [line: 1], [:File]}, :write!]}, [line: 1],
  ["/path/to/webroot/index.html", "All your base belong to us!"]}}

iex> y = "nothing"
"nothing"
iex> Code.string_to_quoted(y)
{:ok, {:nothing, [line: 1], nil}}

Thanks for any guidance!

This looks extremely hard and very error-prone.

You might be better off searching for a tool that converts your input to another format that is supported by Elixir.

To understand the output of Code.string_to_quoted/2, you may find it useful to learn more about “quoted expressions”, which are the AST representation of Elixir source code. See https://elixir-lang.org/getting-started/meta/quote-and-unquote.html. It is an advanced topic that takes some time to master; at least that was my personal experience.

If I follow correctly, you want to filter out vast parts of Elixir, such as function invocations (., |>, among other variants) and maybe more. I would recommend thinking in terms of a whitelist instead. You could start by allowing only code that consists of identifier names (represented in the AST as three-element tuples such as {:nothing, [line: 1], nil}, as in your second example) and some basic operators such as and, or, >, and so on. You can play with iex to get familiar with the AST associated with elementary forms of Elixir code (the ones you could start supporting, if possible).
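
To make the whitelist idea a bit more concrete, here is a rough sketch. SafeExpr, safe?/1 and the particular operator list are assumptions made for this example, and it should not be treated as a complete security boundary:

```elixir
defmodule SafeExpr do
  # Assumed whitelist of operators; adjust to what templates really need.
  @allowed_ops [:and, :or, :not, :==, :!=, :>, :<, :>=, :<=]

  # Returns true only if the string parses and every AST node is whitelisted.
  def safe?(source) when is_binary(source) do
    case Code.string_to_quoted(source) do
      {:ok, ast} -> allowed?(ast)
      {:error, _reason} -> false
    end
  end

  # Literals: numbers, strings, true/false/nil.
  defp allowed?(lit)
       when is_number(lit) or is_binary(lit) or is_boolean(lit) or is_nil(lit),
       do: true

  # Bare identifiers look like {:name, meta, nil}. Names starting with "__"
  # (such as __ENV__ or __MODULE__) are special forms, so they are rejected.
  defp allowed?({name, _meta, ctx}) when is_atom(name) and is_atom(ctx),
    do: not String.starts_with?(Atom.to_string(name), "__")

  # Whitelisted operators, provided all of their operands are allowed too.
  defp allowed?({op, _meta, args}) when op in @allowed_ops and is_list(args),
    do: Enum.all?(args, &allowed?/1)

  # Everything else (remote calls, local calls, aliases, pipes, ...) is refused.
  defp allowed?(_other), do: false
end

SafeExpr.safe?("some_variable")                    # => true
SafeExpr.safe?("count > 1 and active")             # => true
SafeExpr.safe?(~S|File.write!("/x", "y")|)         # => false
SafeExpr.safe?("__ENV__")                          # => false
```

The final catch-all clause is the important design choice: anything not explicitly recognised is refused, rather than trying to enumerate everything that might be dangerous.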

In any case, it is not easy to work with this stuff, and I agree that it is very error-prone.

Converting anything from untrusted users into Elixir code that you then execute is doomed to be a security issue. There will always be some fancy combination of stuff that isn’t technically Elixir code but that, when the template comes together, creates valid Elixir code that is then executed. I’d use an Elixir library that evaluates Mustache templates, instead of converting them to Elixir.
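
To illustrate the difference between evaluating a template and converting it to code, here is a deliberately tiny sketch. TinyIf and render/2 are invented for this example, it handles only a single non-nested {{#if name}} form, and it is not a Handlebars or Mustache implementation. The point is that the condition is only ever used as a map key and never becomes Elixir code:

```elixir
defmodule TinyIf do
  # Matches {{#if name}}body{{/if}} where name is a plain lowercase identifier.
  defp block_regex do
    ~r/\{\{#if\s+([a-z_][a-zA-Z0-9_]*)\s*\}\}(.*?)\{\{\/if\}\}/s
  end

  # Renders the template by looking each condition up in the assigns map.
  # Nothing from the template is compiled or evaluated as Elixir.
  def render(template, assigns) when is_binary(template) and is_map(assigns) do
    Regex.replace(block_regex(), template, fn _whole, name, body ->
      if Map.get(assigns, name), do: body, else: ""
    end)
  end
end

TinyIf.render("{{#if show}}Hello there!{{/if}}", %{"show" => true})
# => "Hello there!"
TinyIf.render("{{#if show}}Hello there!{{/if}}", %{})
# => ""
```

A block whose condition is not a plain identifier, such as one containing File.write!(...), simply does not match and is left in the output as literal text.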

There are 2 approaches to a parser (that I can think of):

  • Convert to EEx
  • Parse everything from soup to nuts

What I have is the former… and to clarify, the entire point of building a Handlebars parser was because Mustache does not offer the features I require.

This is always a challenge, but I tend to look at templating as a user-land environment, targeted mostly at designers and publishers rather than developers. I would therefore look at enhancing Handlebars in this case, modelling the bits that are missing for you as a conceptual model that a designer or publisher could use.