Can you break this "safe" interpreter?

gavid · August 1, 2023, 2:24pm

I wanted to create a way to “safely” evaluate a string containing Elixir code. The code in the string should not be allowed to use any functions or modules (not even those from the Kernel e.g. defmodule/2).

Have I succeeded, or are there still ways to create a string that will allow the function to succeed while violating my criteria above?

  def safe_eval(code) when is_binary(code) do
    quoted_form = Code.string_to_quoted!(code, existing_atoms_only: true)

    # Reject any AST node that is a module alias (e.g. String, IO).
    Macro.postwalk(quoted_form, fn node ->
      if is_tuple(node) and Kernel.elem(node, 0) in [:__aliases__] do
        raise "Cannot use aliases or references to other modules"
      end

      # postwalk expects the callback to return the (possibly rewritten) node.
      node
    end)

    {evaluated, _bindings} = Code.eval_string(code, [], %Macro.Env{})
    {"ok", evaluated}
  end

jhogberg · August 1, 2023, 2:40pm

Try "<<0::size(123456789123456789)>>"

Why does it have to be Elixir code? If you can’t call any functions or modules then it sounds restricted enough that you could probably do what you want with a simple DSL instead, and writing an interpreter for a (say) small Lisp-like DSL doesn’t take very long.

gavid · August 1, 2023, 2:52pm

I agree with you.

What I want, for product-specific reasons, is a restricted subset of Elixir. That way, the input can be easily used to generate Elixir code via certain meta-programming systems. I am trying to determine if that can be done by filtering input to Code.eval_string. If that can’t be done safely, then I will have to write a custom interpreter for the restricted subset of Elixir that I want, or else an entirely different DSL.

Either way, this is an interesting experiment.

adamu · August 1, 2023, 3:10pm

Have you seen Dune - Sandbox for Elixir?

hst337 · August 1, 2023, 3:17pm

binary = <<131, 104, 3, 100, 0, 9, 69, 108, 105, 120, 105, 114, 46, 73, 79, 100, 0, 4,
  112, 117, 116, 115, 108, 0, 0, 0, 1, 109, 0, 0, 0, 4, 102, 117, 99, 107, 106>>
{module, function, args} = apply(:erlang, :binary_to_term, [binary])
apply(module, function, args)
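
To see why the alias check does not catch this, here is a minimal sketch. The module name `AliasCheckDemo` and the `"hi"` message are placeholders of mine, not part of the original post. It walks the AST the same way the first `safe_eval` does and reports whether any `:__aliases__` node is present. A module reference that only appears inside an encoded binary, or as a plain atom, never produces such a node.

```elixir
defmodule AliasCheckDemo do
  # Returns true if the parsed code contains at least one :__aliases__ node.
  def has_alias?(code) when is_binary(code) do
    ast = Code.string_to_quoted!(code)

    {_ast, found?} =
      Macro.prewalk(ast, false, fn
        {:__aliases__, _meta, _parts} = node, _acc -> {node, true}
        node, acc -> {node, acc}
      end)

    found?
  end

  # A payload like the one above can be produced with term_to_binary/1.
  # The module name survives the round trip as an ordinary atom.
  def round_trip do
    {IO, :puts, ["hi"]}
    |> :erlang.term_to_binary()
    |> :erlang.binary_to_term()
  end
end
```

`AliasCheckDemo.has_alias?("IO.puts(1)")` is true, while the same check on a string that calls `apply(:erlang, :binary_to_term, [...])` is false, so the latter passes the filter and is evaluated.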

You should consider using Dune. But I think you’d be better off giving up on this idea; I suggest using a dedicated embedded language such as Lua instead.

jhogberg · August 1, 2023, 3:20pm

I guess it depends on the threat model. The underlying :erl_eval module wasn’t designed to be safe against a determined attacker: at the very least, it’ll be possible for them to exhaust system resources.

If it’s just about guarding against accidental calls to other modules and the like, then I think a better approach would be to use the local/non-local function handler functionality in :erl_eval to filter calls, as nothing can sneak through that (as @hst337 pointed out, dynamic calls still sneak through yours). Code.eval_string/3 doesn’t expose this functionality (yet?) so it’ll be a bit of a slog to copy and modify Code.eval_string/3 + :elixir.eval_forms/4, but it can be done.
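
As a rough illustration of the function handler idea, here is a sketch that calls `:erl_eval.exprs/4` directly on Erlang source rather than going through Elixir's evaluator (wiring this into a copy of `Code.eval_string/3` is the "slog" part and is not shown). The module name `FilteredErlEval` and the allow-list contents are assumptions made up for the example.

```elixir
defmodule FilteredErlEval do
  # Hypothetical allow-list of {module, function} pairs.
  @allowed [{:lists, :reverse}, {:lists, :sum}]

  def eval(erlang_source) when is_binary(erlang_source) do
    {:ok, tokens, _end_location} = :erl_scan.string(String.to_charlist(erlang_source))
    {:ok, exprs} = :erl_parse.parse_exprs(tokens)

    # :none means no local function handler is installed.
    # The non-local handler is invoked for every remote (Mod:Fun) call.
    {:value, value, _bindings} =
      :erl_eval.exprs(exprs, :erl_eval.new_bindings(), :none, {:value, &non_local/2})

    value
  end

  # Remote calls arrive as {module, function} plus the evaluated arguments.
  defp non_local({mod, fun} = mf, args) when is_atom(mod) and is_atom(fun) do
    if mf in @allowed do
      apply(mod, fun, args)
    else
      raise ArgumentError, "call to #{inspect(mod)}.#{fun}/#{length(args)} is not allowed"
    end
  end

  # Calls on fun values are refused outright in this sketch.
  defp non_local(_fun_value, _args) do
    raise(ArgumentError, "calling fun values is not allowed")
  end
end
```

With this in place, `FilteredErlEval.eval("lists:reverse([1, 2, 3]).")` goes through, while a call such as `os:cmd("ls").` raises before anything runs. The filter sees the resolved module and function at call time, not the shape of the source, which is what makes it harder to bypass than an AST check.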

… but you might not have to if Dune is good enough, as pointed out by @adamu

al2o3cr · August 1, 2023, 4:20pm

+1 to @hst337’s point - you don’t even need binary_to_term:

quote do
  apply(:"Elixir.Foo", :bar, [])
end

gives the AST:

{:apply, [context: Elixir, imports: [{2, Kernel}, {3, Kernel}]],
 [Foo, :bar, []]}

gavid · August 2, 2023, 10:35am

What about this? I know that as it stands, it is quite heavily restricted, but at least it’s a start. Are there any loopholes here like your too-large bitstring example? I’m looking for anything that would allow someone to easily exhaust system resources, write to files, send process messages etc. but that would still pass safe_eval.

defmodule MyCompanyLibrary.Core.CodeExpert do
  @function_white_list [
                       :%{},
                       :__aliases__
                     ] ++
                       Enum.map(
                         MyCompanySpaceWeb.Step.Input.V_0_0_1.Functions.__info__(:functions),
                         fn {function_name, _arity} -> function_name end
                       )

  @module_funcs_white_list Enum.map(
                             MyCompanySpaceWeb.Step.Input.V_0_0_1.Functions.__info__(:functions),
                             fn {function_name, _arity} ->
                               {[:MyCompanySpaceWeb, :Step, :Input, :V_0_0_1, :Functions],
                                function_name}
                             end
                           )

  def function_white_list(term) when is_atom(term) and term in @function_white_list do
    term
  end

  def module_funcs_white_list({alias_list, function_name} = module_func)
      when is_list(alias_list) and is_atom(function_name) and
             module_func in @module_funcs_white_list do
    {alias_list, function_name}
  end

  # Elixir syntax was designed to have a straightforward conversion to an abstract syntax tree (AST).
  # Elixir's AST is a regular Elixir data structure composed of the following elements:
  #   atoms - such as :foo (btw, this includes booleans and nil)
  #   integers - such as 42
  #   floats - such as 13.1
  #   strings - such as "hello"
  def safe_ast(term) when is_atom(term) or is_number(term) or is_binary(term) do
    term
  end

  #   lists - such as [1, 2, 3]
  def safe_ast(term) when is_list(term) do
    term
    |> Enum.map(&safe_ast/1)
  end

  #   tuples with two elements - such as {"hello", :world}
  def safe_ast({elem1, elem2}) do
    {safe_ast(elem1), safe_ast(elem2)}
  end

  # {:., [], [{:__aliases__, [alias: false], [:String]}, :downcase]}
  # Note, this disallows compound function calls of the form MyModule.hello().world()
  def safe_ast(
        {:., metadata, [{:__aliases__, _alias_metadata, alias_args}, function_name] = args}
      ) do
    {"ok", true} = MyCompanyLibrary.Core.TestExpert.assert!(Keyword.keyword?(metadata))

    module_funcs_white_list({alias_args, function_name})

    {:., safe_ast(metadata), safe_ast(args)}
  end

  #   tuples with three elements, representing calls or variables, as explained next
  def safe_ast({func, metadata, args})
      when (is_atom(func) or is_tuple(func)) and is_list(metadata) and is_list(args) do
    {"ok", true} = MyCompanyLibrary.Core.TestExpert.assert!(Keyword.keyword?(metadata))

    if is_atom(func) do
      function_white_list(func)
    end

    {safe_ast(func), safe_ast(metadata), safe_ast(args)}
  end

  def safe_eval(code, opts_for_string_to_quoted \\ [existing_atoms_only: true]) when is_binary(code) do
    try do
      # Ideally existing_atoms_only would always stay on, but it breaks things such as
      # seeding flows, which fail with "unsafe atom does not exist: ConnectToSpace".
      quoted_form = Code.string_to_quoted!(code, opts_for_string_to_quoted)

      MyCompanyLibrary.Core.TestExpert.assert_eq!(safe_ast(quoted_form), quoted_form)

      {evaluated, _bindings} = Code.eval_string(code, [], %Macro.Env{
        functions: [
          {MyCompanySpaceWeb.Step.Input.V_0_0_1.Functions,
           MyCompanySpaceWeb.Step.Input.V_0_0_1.Functions.__info__(:functions)}
        ]
      })
      {"ok", evaluated}
    rescue
      err -> {"error", err |> MyCompanyLibrary.Core.JsonExpert.json_friendly()}
    end
  end
end

jhogberg · August 2, 2023, 11:21am

Like I said, it depends on your threat model. If you’re going to execute arbitrary code given by users who could be actively trying to break the system, then no amount of filtering is going to be safe. Trying to fill all the gaps is a pretty Sisyphean task: there’s always a risk you’ll miss something.

a = 1
a = %{ {a,1} => 1, {a,2} => 2, {a,3} => 3, {a,4} => 4, {a,5} => 5, {a,6} => 6, {a,7} => 7, {a,8} => 8,
            {a,9} => 9, {a,10} => 10, {a,11} => 11, {a,12} => 12, {a,13} => 13, {a,14} => 14, {a,15} => 15, {a,16} => 16 }
a = %{ {a,1} => 1, {a,2} => 2, {a,3} => 3, {a,4} => 4, {a,5} => 5, {a,6} => 6, {a,7} => 7, {a,8} => 8,
            {a,9} => 9, {a,10} => 10, {a,11} => 11, {a,12} => 12, {a,13} => 13, {a,14} => 14, {a,15} => 15, {a,16} => 16 }
# ... Repeat the above a few times, and it'll make the system run like molasses.

gavid · August 2, 2023, 12:05pm

Fair enough. I suppose there is no way to completely prevent denial-of-service attacks from determined users trying to break the system. I am simply trying to close off as many attack paths as I can.

By the way, my code above rejects your last example because := is not among the white-listed atoms.

iex(2)> quote do a = 1 end
{:=, [], [{:a, [], Elixir}, 1]}

Also, I can improve the DoS resistance by running the evaluation in a task (Task.async/Task.await) with a timeout.
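
A minimal sketch of the timeout idea, assuming the evaluation is wrapped in a zero-arity function. The module name `TimedEval` and the 1000 ms default are placeholders. It uses `Task.yield/2` plus `Task.shutdown/2` rather than `Task.await/2`, because `Task.await/2` exits the caller on timeout instead of returning a value.

```elixir
defmodule TimedEval do
  # `fun` stands in for whatever evaluation is being guarded,
  # e.g. fn -> safe_eval(code) end.
  def run(fun, timeout_ms \\ 1_000) when is_function(fun, 0) do
    task = Task.async(fun)

    # Task.yield/2 returns nil if no reply arrived in time;
    # Task.shutdown/2 then kills the task and returns nil unless
    # a reply slipped in at the last moment.
    case Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> {"ok", result}
      {:exit, reason} -> {"error", {:exit, reason}}
      nil -> {"error", :timeout}
    end
  end
end
```

Note that a timeout only bounds wall-clock time. It does nothing about memory, and because `Task.async/1` links the task to the caller, a crash inside `fun` still propagates unless the caller traps exits.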

aiwaiwa · August 2, 2023, 12:10pm

LostKobrakai · August 2, 2023, 12:26pm

If you don’t even allow variable assignment, what is actually allowed by this? That constraint suggests even more that there are simpler ways to accomplish things.

Onor.io · August 7, 2023, 10:33pm

May I humbly suggest that you incorporate LFE code? I mean Lisp is way better at that whole macro expansion thing anyway. I mean rather than trying to write a “safe” interpreter in Elixir, call out to LFE code and use that to generate the BEAM code.

Just a suggestion, of course.

gavid · August 8, 2023, 5:49am

The purpose here is essentially to create a subset of Elixir that can only be used for declaring values and cannot be used for any kind of computation or logic (or anything that would break or crash the system). Basically, Elixir as a configuration language or data format, like JSON.

So then why not use JSON (or some other language that is already inherently declarative)? We already do. It’s just that for certain components of the system, because it makes such extensive use of Elixir meta-programming, it is easier to use a language that is directly compatible with Elixir and does not need additional parsing and interpretation.

With regard to the suggestions here to use Lua, Dune, LFE, etc.: I do appreciate them, but they would not help me here. I would instead be stuck trying to create a declarative subset of Lua, etc.

jhogberg · August 8, 2023, 8:32am

You already have the AST, so a simple tree-walking interpreter for that tiny subset shouldn’t take very long to write. I think you could’ve finished one in the time you’ve spent on this thread; only having to deal with literals makes things very easy.
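
For what it's worth, a literals-only tree walker could look roughly like the sketch below. The module name `LiteralEval` and the exact set of accepted node types are assumptions; the point is that nothing is ever passed to `Code.eval_string`, so anything that is not plain data (calls, variables, aliases, bitstrings, interpolation) simply has no clause and is rejected.

```elixir
defmodule LiteralEval do
  # Parses the string and builds the value directly from the AST.
  def eval_string(code) when is_binary(code) do
    with {:ok, ast} <- Code.string_to_quoted(code, existing_atoms_only: true) do
      try do
        {:ok, eval(ast)}
      catch
        {:unsupported, node} -> {:error, {:unsupported, node}}
      end
    end
  end

  # Atoms (including true/false/nil), numbers and strings are their own AST.
  defp eval(term) when is_atom(term) or is_number(term) or is_binary(term), do: term
  # Lists and two-element tuples are also represented literally.
  defp eval(list) when is_list(list), do: Enum.map(list, &eval/1)
  defp eval({left, right}), do: {eval(left), eval(right)}
  # Tuples of any other size appear as {:{}, meta, elements}.
  defp eval({:{}, _meta, elems}) when is_list(elems) do
    elems |> Enum.map(&eval/1) |> List.to_tuple()
  end
  # Maps appear as {:%{}, meta, key_value_pairs}.
  defp eval({:%{}, _meta, pairs}) when is_list(pairs), do: Map.new(pairs, &eval_pair/1)
  # Negative number literals parse as a unary minus applied to a number.
  defp eval({:-, _meta, [n]}) when is_number(n), do: -n
  # Everything else (calls, variables, aliases, operators) is refused.
  defp eval(other), do: throw({:unsupported, other})

  defp eval_pair({key, value}), do: {eval(key), eval(value)}
  defp eval_pair(other), do: throw({:unsupported, other})
end
```

Because the result is constructed by the walker itself, the allowed language is exactly the set of clauses above, and extending it means adding a clause rather than hunting for gaps in a filter.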