Dynamically generating many long-lived but temporary functions in Elixir

This is mostly a question for anyone who knows the Elixir engine better than I do, but general conversation is welcome as well. :slight_smile:

I need a little scripting language that generates functions I can call; there will be a lot of them and they will be generated often.

Consequently I’m torn between the :erlua library, which does not match the message-passing semantics as well as I want (plus I need integers, not floats), and parsing something into the Elixir AST and compiling it via Code.eval_quoted/3. However, Code.eval_quoted/3 would be called a lot to create functions from that AST, and those functions would be held in memory for an extended time before finally being ‘forgotten’ somehow. So I’m curious: where does the code produced by Code.eval_quoted/3 live? And more importantly, how can functions that I no longer care about be removed so I do not blow the memory? Or should I manually create a module on each update and basically ‘pause the world’ when I update its contents? Any other ideas?

I have the feeling you might be jumping to a possible solution too fast. What exactly is the problem you have? Maybe we can side-step the issue altogether.

Here are some questions to get you started:

  • Where on the Turing-completeness scale should this scripting language lie?
  • How performant does it need to be?
  • In what ways does the scripting language need to interact with the rest of the system? And with the outside world?
  • How often are snippets in the scripting language created? How often are they invoked?

For instance, it might be possible to create a DSL-interpreter on top of Elixir, rather than compiling down to Elixir code (and then having to worry about, for instance, code safety).

I want users (untrusted!) to submit bits of code that do bits of work. For example, this could be some user code:

input + 42

I wrap it in the Elixir AST of a function, which basically produces this:

fn(input, other, named, args) -> user_code(here) end

Then I eval that AST to get the function.
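The wrapping step described above can be sketched roughly as follows. This is only an illustration under assumptions: the module name `UserFun` and the argument names `input` and `other` are made up for the example, and no sandboxing or whitelisting is done here.

```elixir
defmodule UserFun do
  # Hypothetical argument names that user code may refer to.
  @arg_names [:input, :other]

  # Parse `source` and wrap it as `fn input, other -> <user code> end`,
  # then evaluate that AST to obtain a callable function.
  # NOTE: nothing here restricts what the user code can do.
  def build(source) do
    body = Code.string_to_quoted!(source)

    # Variables produced by the parser have context `nil`, so the
    # argument variables must use the same context to bind to them.
    args = Enum.map(@arg_names, &Macro.var(&1, nil))

    # This is the AST shape of `fn arg1, arg2 -> body end`.
    fn_ast = {:fn, [], [{:->, [], [args, body]}]}

    {fun, _bindings} = Code.eval_quoted(fn_ast)
    fun
  end
end
```

Calling `UserFun.build("input + 42")` returns a two-argument function; the evaluator may warn about the unused `other` argument.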

About once a second I will then call this function on a set of data to map it, quite literally doing something like Enum.map(datas, fn data -> userFunc.(data, and, other, stuff) end). If their process takes too much time, raises an error, or otherwise misbehaves, it is killed; they get a report and lose some of their things because of how their code behaved.

Fully Turing complete, need to be able to define functions, structures, pass messages, basically needs the power of Elixir.
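One possible way to run such a function with a time limit is to do the mapping in a separate, monitored process and kill it on timeout. The sketch below is an assumption about how this could look, not what the game actually does; `TickRunner`, its function name, and the default timeout are hypothetical.

```elixir
defmodule TickRunner do
  # Map `user_fun` over `datas` in a separate process.
  # Returns {:ok, results}, {:error, :timeout}, or {:error, crash_reason}.
  def run(user_fun, datas, timeout_ms \\ 100) do
    parent = self()
    ref = make_ref()

    # spawn_monitor (not spawn_link), so a crash in user code
    # does not take the caller down with it.
    {pid, mon} =
      spawn_monitor(fn ->
        send(parent, {ref, Enum.map(datas, user_fun)})
      end)

    receive do
      {^ref, results} ->
        Process.demonitor(mon, [:flush])
        {:ok, results}

      {:DOWN, ^mon, :process, ^pid, reason} ->
        {:error, reason}
    after
      timeout_ms ->
        # The user code ran too long: kill it unconditionally.
        Process.exit(pid, :kill)
        Process.demonitor(mon, [:flush])
        {:error, :timeout}
    end
  end
end
```

A time limit alone does not bound memory use; spawning with a `:max_heap_size` option would be one way to address that, but it is left out here.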

As fast as they really can possibly be, on the BEAM. Shelling out to another process would incur sizable overhead as 99% of the functions are short enough for the transfer time to dominate, and there will be a LOT of user functions being called (limited number per user).

Optimally I’d prefer to pass in, say, an Elixir module and they could only call the things on that (which is what I’ve managed to do so far). I’d like it to be as much like Elixir syntax as possible. It needs the full range of the BEAM types.

Per user, snippets are created in rapid succession as they test code, followed by long bouts of no changes, and there may be many users. They are invoked essentially all the time: basically every function will likely be invoked at least once per ‘tick’, if not many times. A tick is one full set of user code invocations (plus book-keeping and such), after which another ‘tick’ is run, thus ‘as fast as possible’ (in reality I’ll likely slow it down to no more than once a second).

Thought of that, but it is a lot more work.

To be more descriptive, I’m basically making something Screeps-like, but using Elixir code as a fun teaching tool slash game to learn and get better with Elixir. Entirely doing it for fun and it likely will not amount to anything as it is purely a challenge to myself to figure out how to safely sandbox Elixir/Erlang code (as this question comes up many times over the years). I doubt I will succeed to be honest, the VM is just not built for it.

I do have one main alternative to compilation, but it is significantly slower. That slowness I could probably work into the game as a computational limiting factor (thus throwing the slowness of the code back into the game dynamics). Instead of eval’ing the code into fast callable native functions, I’d evaluate the AST manually, running a certain amount of code per ‘tick’ (maybe even only one node per call), so I’d do something like this:

def run_user_ast(env, ast)
...
def run_user_ast(env, {:+, meta, [a0, a1]}) do
  {v0, env} = run_user_ast(env, a0)
  {v1, env} = run_user_ast(env, a1)
  sleep_tick()
  {v0 + v1, env}
end
...
def run_user_ast(env, int) when is_integer(int), do: {int, env}
...
def run_user_ast(env, {binding, meta, scope}) when is_atom(binding) and is_atom(scope) do
  sleep_tick()
  value = Env.get_binding_value(env, binding)
  {value, env}
end

And so forth, which of course would be a perfect whitelist then but also means I need to re-implement a lot of stuff.
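As a self-contained illustration of this interpreter idea, here is a minimal sketch that handles only integers, variables, and three arithmetic operators, and charges one "step" per AST node instead of sleeping. The module name `MiniInterp`, the step budget, and the error tuples are assumptions made for the example.

```elixir
defmodule MiniInterp do
  # Whitelisted binary operators; anything else is rejected.
  @ops [:+, :-, :*]

  # Evaluate `ast` with `env` (a map of variable name => value), spending
  # at most `budget` steps. Returns {:ok, value, steps_left} or {:error, reason}.
  def run(ast, env, budget) when is_integer(budget) and budget >= 0 do
    try do
      {value, left} = eval(ast, env, budget)
      {:ok, value, left}
    catch
      {:interp_error, reason} -> {:error, reason}
    end
  end

  # No steps left: stop before evaluating this node.
  defp eval(_ast, _env, 0), do: throw({:interp_error, :out_of_steps})

  # Integer literal: costs one step.
  defp eval(int, _env, budget) when is_integer(int), do: {int, budget - 1}

  # Whitelisted operator: one step for the node, then both operands.
  defp eval({op, _meta, [a, b]}, env, budget) when op in @ops do
    {va, budget} = eval(a, env, budget - 1)
    {vb, budget} = eval(b, env, budget)
    {apply(Kernel, op, [va, vb]), budget}
  end

  # Variable reference: looked up in the environment, costs one step.
  defp eval({name, _meta, ctx}, env, budget) when is_atom(name) and is_atom(ctx) do
    case Map.fetch(env, name) do
      {:ok, value} -> {value, budget - 1}
      :error -> throw({:interp_error, {:unbound, name}})
    end
  end

  # Everything else (remote calls, calls on bindings, atoms, ...) is refused.
  defp eval(other, _env, _budget), do: throw({:interp_error, {:not_allowed, other}})
end
```

Because only the listed node shapes are ever executed, the whitelist is implicit in the clauses. Runtime errors such as adding a non-number are not handled in this sketch.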

I would like it to be a fun little community teaching tool/game for Elixir, basically have them write code that controls all of their objects, perhaps having to manage their own time-sharing and all as well instead of a code per unit (as screeps does it, though I would prefer that method).

To be honest I’m probably going to end up using another language just to get something complete, because I know how limited the BEAM is in this way. Forth would be fun but is obviously not well known, though it would be oh so easy to implement safely. :luerl would be easy to add in, but I’d have to segment each thing into different processes to detect loops and kill them and such. Etc… etc… with outside Ports, etc… etc…

Making a Forth-style Screeps would be so fun, but being Forth no one would ever use it even though it could be perfectly segmented, each unit running its own code, passing messages between everything, etc… :frowning:

I could fairly easily do that by manually parsing Elixir ast too though, more grunt work there, but still…

:heart: Forth! :stuck_out_tongue_winking_eye:

In this case I think your best bet is to actually let users write Elixir code, compile that down to AST using Code.string_to_quoted, and then run this through a function that whitelists a subset of all possible operations (it is very important to use a whitelist rather than a blacklist here!).

However, even this will be extremely hard to make somewhat safe.
For instance, how to prevent this:

x = :erlang
x.apply(EvilModule, :function, [1,2,3])
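A whitelist pass over the parsed AST could look roughly like the sketch below. It is deliberately strict and only an assumption about one possible shape: `Whitelist`, the allowed call list, and the violation tuples are invented for the example. Calls whose target is not a plain atom (remote calls and calls on a binding such as `x.apply(...)`) are always rejected.

```elixir
defmodule Whitelist do
  # Hypothetical whitelist of local calls/operators and special forms.
  @allowed_calls [:+, :-, :*, :=, :__block__]

  # Returns {:ok, ast} when every node is allowed, else {:error, details}.
  # `parser_opts` is passed straight to Code.string_to_quoted/2.
  def check(source, parser_opts \\ []) do
    with {:ok, ast} <- Code.string_to_quoted(source, parser_opts) do
      {_ast, violations} = Macro.prewalk(ast, [], &visit/2)

      case violations do
        [] -> {:ok, ast}
        _ -> {:error, Enum.reverse(violations)}
      end
    end
  end

  # Variables: {name, meta, context} where the context is an atom.
  defp visit({name, _meta, ctx} = node, acc) when is_atom(name) and is_atom(ctx),
    do: {node, acc}

  # Local calls and operators: allowed only when whitelisted.
  defp visit({name, _meta, args} = node, acc) when is_atom(name) and is_list(args) do
    if name in @allowed_calls, do: {node, acc}, else: {node, [{:call, name} | acc]}
  end

  # Any other call form, e.g. Mod.fun(...) or binding.fun(...).
  defp visit({_target, _meta, _args} = node, acc),
    do: {node, [{:dynamic_or_remote_call, Macro.to_string(node)} | acc]}

  # Integer literals are fine; every other literal is refused in this sketch.
  defp visit(int, acc) when is_integer(int), do: {int, acc}
  defp visit(other, acc), do: {other, [{:not_allowed, other} | acc]}
end
```

For untrusted input, `existing_atoms_only: true` can be passed as `parser_opts`, in which case the parser itself returns an error for unknown atoms.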

Hehe, likewise, such an awesome and fantastic and a bit mind-bending language. ^.^

Already done, though only as a first version (you can supply your own whitelist detector, as the default list is very simple, though it will grow later as I use it). Just remember to pass existing_atoms_only: true in the opts argument when the input comes from users rather than just being dynamically generated (hmm, maybe I should make that the default…).

Indeed, right now I have that construct entirely not allowed, and I’m fine with that for my purpose even if it does diverge a bit from Elixir.

If I went the path of interpreting each AST element then I could keep that construct no problem though as there’d be a whitelist of the actual calls and it would know the binding value. So many decisions…

Mostly I’m curious how the system handles having, say, 5 million functions being loaded as I doubt it garbage collects ones that are no longer referenced… That could be an immediate stopper for the initial form and I’d have to fall back to interpreting the AST manually.

This is how you remove modules from the runtime:

:code.purge(MyModule)
:code.delete(MyModule)
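To make the module route concrete, here is a hedged sketch that compiles a snippet into its own module with Module.create/3 and later unloads it. The names `UserCode`, `load/2`, `unload/1`, and `run/1` are hypothetical. As far as the OTP `code` documentation describes it, `:code.delete/1` marks the current code as old and `:code.purge/1` removes old code, so this sketch calls delete first and purge second to drop the module completely.

```elixir
defmodule UserCode do
  # Compile `source` as the body of `run(input)` inside module `mod`.
  # NOTE: the source is not validated here; whitelisting must happen first.
  def load(mod, source) when is_atom(mod) do
    body = Code.string_to_quoted!(source)
    # Same `nil` context as the parser uses, so `input` in the body binds.
    input = Macro.var(:input, nil)

    contents =
      quote do
        def run(unquote(input)), do: unquote(body)
      end

    {:module, ^mod, _binary, _result} =
      Module.create(mod, contents, Macro.Env.location(__ENV__))

    mod
  end

  # Remove the module from the runtime once it is no longer needed.
  def unload(mod) when is_atom(mod) do
    :code.delete(mod) # current code becomes "old"
    :code.purge(mod)  # old code is removed (lingering processes are killed)
    :ok
  end
end
```

Module names are atoms, and atoms are never garbage collected, so a scheme like this would need to reuse a bounded set of names rather than mint a new one per snippet.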

It is still tricky to whitelist. For example, you still allow calling any local function, including imported ones, and you allow importing, so that way you can call any function. You can also call apply and that way call any function in the system. You also allow atoms, so even though you disallow aliases you still allow :"Elixir.Kernel".

Yep, modules yes, but all I’m doing is just compiling the equivalent of:

iex(38)> {f, _} = Code.eval_string("fn x -> x + 1 end")
{#Function<6.52032458/1 in :erl_eval.expr/5>, []}
iex(39)> f.(42)
43

Where does that f anonymous function reside? How can I clear just it when I am done with it in, oh, 40 hours?

Hence one of my ideas above: use a container module and just recompile it any time a single function in it changes. But that becomes far heftier, and even on a user-by-user basis I might not be able to do that (there could be millions of ‘users’/bots).

Yep yep, lots and lots of various things (apply is disallowed in my default set; in my testing library even everything in Kernel itself is disallowed). Hence the thought of disallowing calls on a binding like that, or transforming the AST into one that first checks that it is an allowed module. That is why I disallow it by default: you have to whitelist individual cases, or transform the code to test for and allow them. I’m being really strict about what is allowed currently.

Is this the default set https://github.com/OvermindDL1/safe_script/blob/master/lib/safe_script.ex#L177? All of the things I mentioned are allowed here.

It creates a closure in the erl_eval module that will be garbage collected like any other closure. It doesn’t actually create a new function. Because of this, one thing to keep in mind is that even though you can call the function from compiled code, it will always be a slow function that is evaled every time you call it. Check here for internals: https://github.com/erlang/otp/blob/master/lib/stdlib/src/erl_eval.erl#L285.
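The point about erl_eval can be checked from Elixir with Function.info/2, which reports the module a function belongs to. The helper module `EvalOrigin` below is made up for illustration.

```elixir
defmodule EvalOrigin do
  # Evaluate `source` (expected to produce an anonymous function) and
  # return the module that owns the resulting closure.
  def owner(source) do
    {fun, _bindings} = Code.eval_string(source)
    {:module, mod} = Function.info(fun, :module)
    mod
  end
end
```

For `EvalOrigin.owner("fn x -> x + 1 end")` this should report `:erl_eval`, matching the `in :erl_eval.expr/5` part of the iex output earlier in the thread. Such a closure is ordinary heap data, so it is collected once nothing references it.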

I’m using this from elsewhere and the defaults there have not been kept up to date (I should do that, and make atoms not created default, and make requires default to []).

But yeah, the Elixir AST setup is not friendly to auto-generation like that, which is why transforming the AST first would likely help a lot: make it so every call can only go to certain places.

That is what I was curious about. In that case making my own interpreter over the AST really would be better: it would not be any more costly than what Elixir is doing now, I can add features like only allowing a set number of instructions per ‘tick’, and it is much easier to make safe.

(EDIT: Well either that or compiling modules full of functions directly…)

You think it is worth making something Elixir’ish as a game teaching tool? Or should I toss it and go to Forth or so instead? ^.^