Formular - A tiny DSL engine / code evaluator (configuration as code)

Hi, all,

I just published the Formular package. It is a tiny library that evaluates a piece of Elixir code.

Online documentation

On the shoulders of Elixir’s Code module

Given a piece of Elixir code (as a string, or AST), Formular runs it with Elixir’s Code module under some security limitations.

So far, the limitations are:

  • No calling module functions;
  • No calling exit;
  • No sending messages.

Indeed, the whole library is just one thin module, thanks to what Elixir already ships out of the box.
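To make the parse, check, evaluate idea concrete, here is a minimal sketch using only the standard library. The module name `RestrictedEval` and its error tuples are hypothetical, not Formular's actual API or source. It rejects the three constructs listed above (remote calls, `exit`, `send`) by walking the parsed AST before calling `Code.eval_quoted/2`. Note that a denylist walk like this is not a real sandbox.

```elixir
defmodule RestrictedEval do
  # Hypothetical sketch, not Formular's source: parse the code, reject a few
  # forbidden constructs, then evaluate the AST with Code.eval_quoted/2.
  # A denylist like this is NOT a security sandbox.

  # Local calls rejected outright (illustrative, incomplete list).
  @forbidden_locals [:exit, :send]

  def eval(code, bindings) when is_binary(code) do
    with {:ok, ast} <- Code.string_to_quoted(code),
         :ok <- check(ast) do
      {value, _bindings} = Code.eval_quoted(ast, bindings)
      {:ok, value}
    end
  end

  # Walks the AST and returns :ok or the first violation found.
  defp check(ast) do
    {_ast, result} =
      Macro.prewalk(ast, :ok, fn
        # Remote call on an alias, e.g. IO.puts("hi") or Kernel.exit(:x)
        {{:., _, [{:__aliases__, _, _}, fun]}, _, _} = node, :ok ->
          {node, {:error, {:remote_call, fun}}}

        # Remote call on an atom module, e.g. :erlang.halt()
        {{:., _, [mod, fun]}, _, _} = node, :ok when is_atom(mod) ->
          {node, {:error, {:remote_call, fun}}}

        # Forbidden local call such as exit(:x) or send(pid, msg)
        {name, _, args} = node, :ok when name in @forbidden_locals and is_list(args) ->
          {node, {:error, {:forbidden, name}}}

        # Anything else, or a violation was already recorded: keep walking
        node, acc ->
          {node, acc}
      end)

    result
  end
end

RestrictedEval.eval("a + b * 2", a: 1, b: 3)
# => {:ok, 7}
RestrictedEval.eval(~s{IO.puts("hi")}, [])
# => {:error, {:remote_call, :puts}}
```

Field access such as `order.book` still passes, because the left side of the dot is a variable rather than a module.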

Motivation

Formular was developed to support some dynamic configuration scenarios. For example, in an online bookstore, the discount on a book can be configured dynamically as a piece of code and then evaluated by Formular:

iex> discount_formula = ~s"
...>   case order do
...>     # old books get a big promotion
...>     %{book: %{year: year}} when year < 2000 ->
...>       0.5
...>   
...>     %{book: %{tags: tags}} ->
...>       # Elixir books!
...>       if ~s{elixir} in tags do
...>         0.9
...>       else
...>         1.0
...>       end
...>
...>     _ ->
...>       1.0
...>   end
...> "
...>
...> book_order = %{
...>   book: %{
...>     title: "Elixir in Action", year: 2019, tags: ["elixir"]
...>   }
...> }
...>
...> Formular.eval(discount_formula, [order: book_order])
{:ok, 0.9}

This way, the discount calculation code, which changes frequently, is kept separate from the stable business flow, and the main code can be more generic and flexible.

I’ve been using it in production for a while, so I’m publishing it today in case others find it useful too.

Cheers!


Nice, though I’m curious: have you tried using Sand?


It is a very unsafe implementation. Here is a super simple example of how you can run arbitrary code with it:

Formular.eval(~S{
  import Kernel
  apply(IO, :puts, ["Hi"])
}, [])

And as soon as you have access to apply/3 (or any of the spawn_*/3 family), you can run any code you want. In general, as soon as you have access to import you can do anything, and you do not prevent import in any way (it is available by default, as it is part of Kernel.SpecialForms).
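A small sketch of the point about apply/3, using only the standard library: the hypothetical `RemoteCallScan` helper lists the literal `Alias.function(...)` calls in the parsed AST. `IO.puts("Hi")` shows up, but `apply(IO, :puts, ["Hi"])` does not, because apply/3 is a local Kernel call and `IO` is just one of its arguments.

```elixir
defmodule RemoteCallScan do
  # Hypothetical helper: list {Module, function} pairs for every literal
  # Alias.function(...) call found in the parsed code.
  def remote_calls(code) do
    {_ast, calls} =
      code
      |> Code.string_to_quoted!()
      |> Macro.prewalk([], fn
        {{:., _, [{:__aliases__, _, parts}, fun]}, _, _} = node, acc ->
          {node, [{Module.concat(parts), fun} | acc]}

        node, acc ->
          {node, acc}
      end)

    Enum.reverse(calls)
  end
end

# A direct remote call is visible in the AST...
RemoteCallScan.remote_calls(~s{IO.puts("Hi")})
# => [{IO, :puts}]

# ...but apply/3 receives IO only as data, so nothing is reported.
RemoteCallScan.remote_calls(~s{apply(IO, :puts, ["Hi"])})
# => []
```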

If you want something like that, it is better to use an embedded language that is distinct from Elixir and give it access only to the primitives it needs. You can take a look at Luerl or Erlog, for example.

I suppose it wasn’t meant to be safe, meaning resistant to malicious input, but rather to impose some restrictions on frequently changing code to limit its impact on the system. I have doubts about whether that’s the correct approach, though, which is why I mentioned Sand: it aims to be an actual sandbox.

Nice! I forgot that import is in Kernel.SpecialForms. It could be prevented after parsing.
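A minimal sketch of rejecting import and require right after parsing, assuming the check runs before anything is evaluated. `NoImports` is a hypothetical name, not Formular's code; it walks the whole AST, so nested occurrences are caught too.

```elixir
defmodule NoImports do
  # Hypothetical sketch: after parsing, refuse any import/require
  # (including nested ones) before the AST is ever evaluated.
  @banned [:import, :require]

  def check(code) do
    ast = Code.string_to_quoted!(code)

    {_ast, found} =
      Macro.prewalk(ast, [], fn
        # A call to import/require (a variable would have nil, not a list, as args)
        {name, _, args} = node, acc when name in @banned and is_list(args) ->
          {node, [name | acc]}

        node, acc ->
          {node, acc}
      end)

    case Enum.reverse(found) do
      [] -> {:ok, ast}
      names -> {:error, {:forbidden, names}}
    end
  end
end

NoImports.check("import Kernel\napply(IO, :puts, [\"Hi\"])")
# => {:error, {:forbidden, [:import]}}
```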

Yes, the purpose was to separate complex configurations from code. Thanks for sharing Sand. I’ll give it a try!

Thanks for pointing out Luerl and Erlog, which are solid references. However, what I want to achieve is to compile the config into Elixir code that can be sent to and used in some Elixir applications.

Ideally, there would be a service with a UI to manage the configuration rules. Whenever a rule is updated, the change is synchronized to the services interested in that config. The configuration would be compiled into BEAM code so that it can be called directly from application code.

Lua or Prolog would also work in such a scenario, but I prefer Elixir because:

a) Elixir has a friendlier syntax IMHO (personal taste?)
b) I think compiling to BEAM code is more performant than running in a sandbox, but I haven’t benchmarked it yet. I will try to see how the different approaches compare.

I built a configuration management system in Elixir years ago, but its data format was a somewhat Lisp-like JSON. It worked very well, but I think compiling rules to Elixir code would be more fun! :smiley:

Formular 0.2.1 released

  • import and require are now disallowed in the code. Thanks to @hauleth for pointing it out. :slight_smile:

DoS (atom exhaustion):

Formular.eval(~S|for a <- %Range{first: 0, last: 100_000, step: 1}, do: :"#{a}"|, [])

I needed to create the range manually, as you do not allow the ../2 operator.
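For context on why this is a DoS: atoms are never garbage collected, and the atom table has a fixed size (1,048,576 entries by default, adjustable with the `+t` emulator flag). When it fills up, the whole VM crashes. The hypothetical `AtomGrowth` module below only measures how the table grows when atoms are built by interpolation, as the snippet above does.

```elixir
defmodule AtomGrowth do
  # Hypothetical demo: create `n` brand-new atoms via :"..." interpolation
  # and return how many entries the atom table grew by.
  def demo(n) do
    # A unique prefix so every call really creates n new atoms.
    prefix = "atom_growth_#{System.unique_integer([:positive])}_"
    before = :erlang.system_info(:atom_count)
    Enum.each(1..n, fn i -> :"#{prefix}#{i}" end)
    :erlang.system_info(:atom_count) - before
  end
end

# These atoms stay in the table until the VM stops.
IO.inspect(:erlang.system_info(:atom_limit), label: "atom limit")
IO.inspect(AtomGrowth.demo(10_000), label: "new atoms")
```

Code that has to turn untrusted strings into atoms usually uses `String.to_existing_atom/1` instead, which raises rather than growing the table.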


I played with Sand, which @mat-hek shared, and it does what I wanted. Contrary to what the name suggests, under the hood Sand also runs the code with Code.eval_quoted/3, only in a separate process whose reductions and memory usage can be limited. I think that is the right way to go.
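A rough sketch of the "separate process with limited reductions" idea, assuming a simple polling approach. `ReductionGuard` is hypothetical and not necessarily how Sand does it: it runs a function in its own monitored process, checks `Process.info(pid, :reductions)` every few milliseconds, and kills the process once it goes over budget.

```elixir
defmodule ReductionGuard do
  # Hypothetical sketch (not Sand's source): run `fun` in its own process and
  # kill it once it has used more than `max_reductions`, checking every
  # `interval_ms` milliseconds.
  def run(fun, max_reductions, interval_ms \\ 5) do
    parent = self()
    ref = make_ref()
    {pid, mon} = spawn_monitor(fn -> send(parent, {ref, fun.()}) end)
    watch(pid, mon, ref, max_reductions, interval_ms)
  end

  defp watch(pid, mon, ref, max, interval_ms) do
    receive do
      {^ref, result} ->
        Process.demonitor(mon, [:flush])
        {:ok, result}

      # The function raised or exited before producing a result.
      {:DOWN, ^mon, :process, ^pid, reason} ->
        {:error, reason}
    after
      interval_ms ->
        case Process.info(pid, :reductions) do
          {:reductions, used} when used > max ->
            Process.exit(pid, :kill)
            Process.demonitor(mon, [:flush])
            {:error, :too_many_reductions}

          # Still under budget (or already exited): keep waiting.
          _ ->
            watch(pid, mon, ref, max, interval_ms)
        end
    end
  end
end

ReductionGuard.run(fn -> 1 + 2 end, 1_000_000)
# => {:ok, 3}
```

Polling means a process can overshoot the budget by whatever it manages to do within one interval, so the limit is approximate.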

Also, I did some benchmarks, and the result was surprising at first glance.

Code:

code = """
  squares = %{3 => 9, 4 => 16, 5 => 25}
  squares[3]
"""

ast = Code.string_to_quoted!(code)

Benchee.run(%{
  eval: fn -> {:ok, 9} = Formular.eval(code, []) end,
  eval_ast: fn -> {:ok, 9} = Formular.eval(ast, []) end,
  sand_run: fn -> {:ok, 9, _} = Sand.run(code) end,
  sand_run_without_cpu_memory_monitoring: fn ->
    {9, _} = Sand.run_without_cpu_memory_monitoring(code)
  end
})

(Sand doesn’t accept AST at this moment)

Result:

Name                                             ips        average  deviation         median         99th %
sand_run_without_cpu_memory_monitoring       23.91 K       41.82 μs    ±20.35%       39.04 μs       70.77 μs
sand_run                                      3.50 K      285.99 μs    ±28.04%      269.01 μs      509.55 μs
eval_ast                                      3.35 K      298.88 μs     ±9.51%      293.91 μs      437.12 μs
eval                                          3.09 K      323.70 μs     ±9.67%      317.00 μs      467.09 μs

Comparison: 
sand_run_without_cpu_memory_monitoring       23.91 K
sand_run                                      3.50 K - 6.84x slower +244.18 μs
eval_ast                                      3.35 K - 7.15x slower +257.06 μs
eval                                          3.09 K - 7.74x slower +281.88 μs

I figured out the reason after some research and it is very interesting. I’m excited about the work ahead.

Thank you @hauleth, I really appreciate it. I know the next direction now.


We use it in production now :slight_smile:
Thanks a lot for this package. We replaced Expreso with Formular and so far it’s been great.

I’m glad to see it is helpful to you too! And I really appreciate that you gave it a try.

Please be aware that, as discussed here, Formular is not a safe sandbox right now. Its design purpose is more about compiling your configuration into runnable code inside the application. So if the code comes from untrusted user input, it could potentially damage the system. Improvements are on the way, though.


I don’t get code from user input.
So safety is sorted for us.

The big benefit of Formular for us is that, since it’s easy to understand, the business side can modify the rules.
That way we can keep the logic as Formular rules, away from the codebase.

Thanks a lot for Formular.


Two new versions have been released:

  • v0.2.2

    • brings performance improvements to 0.2.x
  • v0.3.0

    • allows limiting the execution time and heap size
    • allows compiling a code string into an Elixir module, which brings further performance improvements
    • adds a new API, Formular.used_vars/2, to extract the variable names used in a formula

Highlights:

Performance improvements:

A benchmark on a simple formula gives the following results:

Name                      ips        average  deviation         median         99th %
compiled_module      947.84 K        1.06 μs  ±5536.01%        0.83 μs        1.45 μs
eval                  14.53 K       68.82 μs     ±8.93%       68.69 μs       87.93 μs
eval_ast               4.86 K      205.58 μs    ±18.43%      193.89 μs      333.89 μs

Comparison: 
compiled_module      947.84 K
eval                  14.53 K - 65.23x slower +67.76 μs
eval_ast               4.86 K - 194.85x slower +204.52 μs

The compilation approach is about 80x faster than the original eval approach in v0.2.2, and 300x faster than versions prior to v0.2.2.
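The speedup presumably comes from paying the parse and expansion cost once, after which each evaluation is an ordinary function call on BEAM code. Below is a minimal sketch of compiling a formula string into a module with `Module.create/3`. `FormulaCompiler`, the generated `run/1`, and passing the variable names explicitly are all assumptions for illustration, not Formular's API.

```elixir
defmodule FormulaCompiler do
  # Hypothetical sketch: compile `code` once into `module`, exposing
  # run/1, which takes the formula's variables as a keyword list.
  def compile(module, code, vars) do
    body = Code.string_to_quoted!(code)

    # One `var = Keyword.fetch!(bindings, :var)` line per expected variable.
    # Macro.var(name, nil) matches the nil context of parser-produced vars.
    assigns =
      for var <- vars do
        quote do
          unquote(Macro.var(var, nil)) = Keyword.fetch!(bindings, unquote(var))
        end
      end

    contents =
      quote do
        def run(bindings) do
          unquote_splicing(assigns)
          unquote(body)
        end
      end

    {:module, ^module, _beam, _} =
      Module.create(module, contents, Macro.Env.location(__ENV__))

    module
  end
end

mod = FormulaCompiler.compile(DemoFormula, "a + b * 2", [:a, :b])
mod.run(a: 1, b: 3)
# => 7
```

Compilation happens once (for example when a rule is updated); afterwards callers only pay for `mod.run/1`.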

Restricted evaluation:

  • limited execution time:

    Formular.eval(code, binding, timeout: :timer.seconds(5))

  • limited max heap size (in words):

    Formular.eval(code, binding, max_heap_size: 15_000)


When a limit is set, the evaluation runs in a separate process.
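A minimal sketch of how such limits can be enforced with plain OTP primitives, assuming a spawned, monitored process: `max_heap_size` is set as a process flag (in words) so the VM kills the process if its heap grows too large, and `receive ... after` enforces the timeout. `LimitedEval.eval/4` is hypothetical, not Formular's implementation.

```elixir
defmodule LimitedEval do
  # Hypothetical sketch (not Formular's source): evaluate `code` in a separate,
  # monitored process with a timeout (ms) and a max heap size (in words).
  def eval(code, bindings, timeout, max_heap_words) do
    parent = self()
    ref = make_ref()

    {pid, mon} =
      spawn_monitor(fn ->
        # The VM kills this process if its heap grows past the limit.
        Process.flag(:max_heap_size, %{size: max_heap_words, kill: true, error_logger: false})
        {value, _bindings} = Code.eval_string(code, bindings)
        send(parent, {ref, value})
      end)

    receive do
      {^ref, value} ->
        Process.demonitor(mon, [:flush])
        {:ok, value}

      # Crashed, or killed for exceeding max_heap_size (reason :killed).
      {:DOWN, ^mon, :process, ^pid, reason} ->
        {:error, reason}
    after
      timeout ->
        Process.exit(pid, :kill)
        Process.demonitor(mon, [:flush])
        {:error, :timeout}
    end
  end
end

LimitedEval.eval("a + b", [a: 1, b: 2], 5_000, 5_000_000)
LimitedEval.eval(":timer.sleep(1_000)", [], 50, 5_000_000)
```

Note that the heap limit also covers the memory the evaluator itself needs, so very small limits can kill even trivial formulas.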

Extract used vars API:

This can be helpful if you need to build a UI around the formula.

iex> Formular.used_vars("a + b - 1")
[:a, :b]
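A naive sketch of how variable names could be collected from the parsed AST, assuming that any `{name, meta, context}` node with an atom context is a variable. This hypothetical `UsedVars` is simpler than a careful implementation would be: it also reports variables bound inside the formula itself, such as `year` and `tags` in the discount example.

```elixir
defmodule UsedVars do
  # Hypothetical sketch: collect the names of all variables in a formula.
  def used_vars(code) do
    {_ast, vars} =
      code
      |> Code.string_to_quoted!()
      |> Macro.prewalk(MapSet.new(), fn
        # Variables are {name, meta, context} with an atom (often nil) context;
        # calls have a list of arguments in that position instead.
        {name, _meta, ctx} = node, acc when is_atom(name) and is_atom(ctx) ->
          {node, MapSet.put(acc, name)}

        node, acc ->
          {node, acc}
      end)

    vars |> MapSet.to_list() |> Enum.sort()
  end
end

UsedVars.used_vars("a + b - 1")
# => [:a, :b]
```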