How do online Elixir interpreters work? Do you think I can do it?

To tell you about my level: I am extremely serious about mastering Elixir. I have read through nearly all of the Elixir documentation, worked through a tutorial series cover to cover while doing the exercises, and read one introductory book cover to cover, trying many things along the way. For example, I just put together a simple GenServer and supervisor example. I am committed to fully mastering Elixir and making it my specialization.

Given my current level of progress, do you think I would be able to put up an online interpreter where I execute user-submitted code?

I am thinking of these examples, basically the top results for “online elixir interpreter”:

I would like to make my own Elixir learning game that is online and uses some similar functionality.

How do you think these are done at the architecture level? Since there are several different online interpreters it seems like something that is pretty easy to do in Elixir (otherwise there wouldn’t be so many sites doing it), but I’m not sure of the approach that these online interpreters take.

How do you think they’re structured architecturally? How would you go about making an online interpreter like this?

Obviously I can’t just write out whatever the user submits to a file and execute it with Elixir right on the server as the user could include lines to read or modify my server settings.

I am thinking the way the above sites must do it is either that they use a subset of allowed Elixir (things that don’t include file access) and control how long processes can run, or perhaps they run it in a read-only VM without network access and just reset it after each request.
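
To make the first idea (a restricted subset) concrete, here is a minimal sketch of an allowlist check over the parsed AST, run before anything is evaluated. It is illustrative only: `TinySandbox`, its module allowlist, and its list of denied local calls are hypothetical names chosen for this example, it is not a description of how any existing site works, and a check like this is not a complete sandbox by itself.

```elixir
defmodule TinySandbox do
  # Hypothetical allowlist: only these modules may be referenced at all.
  @allowed_modules [Enum, List, Map, String, Integer]
  # Hypothetical denylist of local calls that could escape the check.
  @denied_locals ~w(apply spawn spawn_link spawn_monitor send exit import require alias use defmodule)a

  # Returns :ok, or {:error, violations} without evaluating anything.
  def check(source) when is_binary(source) do
    case Code.string_to_quoted(source) do
      {:ok, ast} ->
        {_ast, violations} = Macro.prewalk(ast, [], &visit/2)
        if violations == [], do: :ok, else: {:error, Enum.reverse(violations)}

      {:error, _reason} ->
        {:error, [:syntax_error]}
    end
  end

  # Any module alias (File, System, ...) must be on the allowlist.
  defp visit({:__aliases__, _meta, parts} = node, acc) do
    if Enum.all?(parts, &is_atom/1) and Module.concat(parts) in @allowed_modules do
      {node, acc}
    else
      {node, [{:module_not_allowed, parts} | acc]}
    end
  end

  # Remote calls whose target is not an alias (:os.getpid(), var.fun()) are rejected.
  # Note that this also rejects map.field access, which keeps the subset very small.
  defp visit({{:., _, [target, fun]}, _meta, _args} = node, acc) when is_atom(fun) do
    case target do
      {:__aliases__, _, _} -> {node, acc}
      _other -> {node, [{:remote_call_not_allowed, fun} | acc]}
    end
  end

  # Local calls such as spawn/1 or apply/3 are rejected by name.
  defp visit({name, _meta, args} = node, acc) when is_atom(name) and is_list(args) do
    if name in @denied_locals do
      {node, [{:local_call_not_allowed, name} | acc]}
    else
      {node, acc}
    end
  end

  defp visit(node, acc), do: {node, acc}
end
```

A call like `TinySandbox.check(~s|File.read!("foo")|)` is rejected because `File` is not on the allowlist. Running time and memory still have to be limited separately, since code such as an endless `Enum` loop passes this check.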

I think it would be a good demonstration of mastery of the basics of the Elixir language, and I am thinking of making an Elixir game out of it where users can complete small challenges to help learn Elixir. Let me know how you think the above sites do it, and whether you think it is something I might be able to do given my level.

In short, they all suck. You either limit the functionality to a straight minimum, or you have potential leaks and security vulnerabilities. This is due to the fact that concurrency is a vital part of the ecosystem, and that naturally doesn’t sit well with random people creating Elixir processes on your server.

The only sane way to make a fully isolated sandbox is to create a VM for every user, which is not an affordable approach even at the small scale, maybe if you have a couple of huge servers in your basement :sob: .

You might be able to run isolated processes if you compile Erlang to WASM and verify that all the security holes are filled. Should be a fun weekend project :wink:.

Based on how some services are priced, that may be exactly what’s happening: for instance, repl.it’s free plan limits code to a slow CPU and 512MB of RAM, while paying for the service lets you upgrade both.

An option to side-step the whole issue would be to run the code on the browser side with something like Firefly.

You either limit the functionality to a straight minimum, or you have potential leaks and security vulnerabilities.

I don’t mind “limiting the functionality to a straight minimum”, I think it could work okay. After all I am interested in making a game to teach certain concepts and syntax, so a minimum of functionality would be okay. I can just teach whatever my limited version supports.

How do you think I could limit the functionality?

There are other discussions on the topic. Search the forum for sandbox.

That is an interesting idea, but if it can only compile Erlang (with a lot of extra steps), I am sure compiling Elixir itself on top of it is not going to be viable. Elixir makes a lot of assumptions and has machine-specific code, and I am sure a wasm build of Erlang on Firefly isn’t one of the supported platforms.

Plus isn’t Erlang like 300 megabytes itself in a native build? My Erlang OTP installation folder is 475 megabytes, so presumably we would need something like that size to support Elixir.

I don’t think I can expect my users to want to devote 500 megabytes of RAM to a WebAssembly version of Erlang and Elixir, and there is nothing in the Firefly documentation about speed, so I’m not sure it could be expected to be fast enough in the end.

It seems doing whatever the three online services I linked are doing would be the best way.

Great, this seems exactly what is needed. Do you think any of the three online compilers I linked in my original comment use Dune?

Thank you @cmo for mentioning Dune :heart:

This is actually exactly the use case I had in mind when writing Dune.
I planned to write such a learning game myself but never got to it (yet?) :sweat_smile:

It is doing everything you’re mentioning: limiting functionality, running time and memory for a process…
You can give it a spin at https://playground.functional-rewire.com/.
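
For readers wondering what limiting running time and memory for a process can look like with only the standard library, here is a minimal sketch. It is not Dune's implementation. `LimitedRun` and its option names are hypothetical, and it relies on the `max_heap_size` option of `:erlang.spawn_opt/2` plus a plain `receive ... after` timeout.

```elixir
defmodule LimitedRun do
  # Hypothetical helper: runs a zero-arity function in a separate process
  # with a wall-clock timeout and a maximum heap size (in words).
  def run(fun, opts \\ []) when is_function(fun, 0) do
    timeout = Keyword.get(opts, :timeout, 1_000)
    max_heap_words = Keyword.get(opts, :max_heap_words, 500_000)
    parent = self()
    ref = make_ref()

    # The runtime kills the process if its heap grows past the limit.
    heap_limit = %{size: max_heap_words, kill: true, error_logger: false}

    {pid, mon} =
      :erlang.spawn_opt(
        fn -> send(parent, {ref, fun.()}) end,
        [:monitor, {:max_heap_size, heap_limit}]
      )

    receive do
      {^ref, result} ->
        Process.demonitor(mon, [:flush])
        {:ok, result}

      {:DOWN, ^mon, :process, ^pid, reason} ->
        # Crashed, or was killed for exceeding the heap limit.
        {:error, reason}
    after
      timeout ->
        Process.exit(pid, :kill)
        Process.demonitor(mon, [:flush])
        {:error, :timeout}
    end
  end
end
```

For example, `LimitedRun.run(fn -> Process.sleep(:infinity) end, timeout: 50)` returns `{:error, :timeout}`. Limits like these only bound resource use; they do nothing to stop file or network access, which is why a functionality check is needed as well.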

Great, this seems exactly what is needed. Do you think any of the three online compilers I linked in my original comment use Dune?

It doesn’t seem to be the case. Dune would fail with a DuneRestrictedError if you try to run File.read!("foo") for instance.

If you are prepared to host your own VMs on a cloud such as Vultr, then you could run a FreeBSD VM on their plans and provide the Elixir environment using FreeBSD jails, which provide isolation without VM overhead and which can also be read-only and ephemeral if you need that. Jails are similar to containers, but they don’t have the security weaknesses that require running them in VMs (which is what cloud providers like fly.io and Amazon have to do).

Using thin jails (a base ZFS image root shared by each jail instance) means you have very minimal additional disk overhead for each instance. The FreeBSD kernel will ensure that only one copy of anything read from disk is mapped into a page in RAM, so you will have good memory utilisation across all jail instances. Whether each BEAM instance still ends up with its own copy in memory, however, depends on whether the BEAM maps modules into RAM or reads them into buffers.

You can also limit each jail’s memory and CPU usage using rctl.

Unless you have a solid revenue model or money to burn you would not want to be providing any kind of execution environment to anyone as it can easily be abused for crypto mining, botnets and other nefarious purposes.

Supporting and operating such a service, where users can execute code on your own server, will cost money and it will need to be monitored for abuse, so make sure it’s more than just a hobby curiosity.

It is different when a REPL is running in a user’s own browser, as it’s running on their resources, and that is really what you need. There is currently no port of the Erlang BEAM to WASM (which would allow running in the browser). It is definitely doable if you wanted to go down that path and port Erlang to WASM, but it will require some lower-level systems programming experience. The easiest path today is most likely porting to WASM using WASIX, as it provides the core POSIX library routines such as threading that the BEAM can leverage to make the port more straightforward. Again, this is a significant undertaking, and whether it’s feasible for you to pursue will depend on your experience.

Well, my thoughts exactly when I mentioned that existing solutions suck. They limit functionality so much that they are no longer useful; you might as well spin up a new project to try something out.

From what I remember, just by the nature of how the runtime bootstraps its other functionality, it is not possible to port it to systems like WASM. This is one of the reasons people started working on projects like Firefly.

Thanks for your post, sabiwara. andrewh mentions "Unless you have a solid revenue model or money to burn you would not want to be providing any kind of execution environment to anyone" and "will cost money and it will need to be monitored for abuse".

What were your experiences running https://playground.functional-rewire.com/ so far?

Also, since you already have experience running this, perhaps we could collaborate so I could focus on the front-end of my game and use your infrastructure on the back-end, if you are interested please send me an email and I will reply with a proposal: rviragh at gmail is my email address.

How do you mean here? If you look at the size of the BEAM executable, beam.smp, for OTP 26.2.1, it is 3956264 bytes, just under 4 MB. How much memory is used when you run it depends completely on what you do. I have run Erlang on boards with 32 MB of memory and it could do things, in this case running a small, very simple Lego robot.

The size of the directory containing ALL the standard Erlang executables (including beam.smp) included in the complete release is 10 MB.

The size of an Erlang/OTP installation also depends what you decide to include in it. What you mention is the complete release which basically contains ALL the standard applications. When you build a release you decide which applications you want/need to include and that determines its size.

The same applies to Elixir.

Just to wind up: if you are deciding to do a new release just to make it small, then you have to keep track of where the memory goes and how much you can expect to save. Of course, if you are doing an implementation just for the fun of it because you like implementing languages, that is a completely different reason, one which I wholeheartedly condone. :laughing: :wink: :laughing:

Thanks for all this, Robert, but I am not expert enough to package something like that on a platform that Elixir doesn’t even target. Currently my Elixir installation on my base Amazon image of Ubuntu is broken, Elixir segfaults, and I couldn’t resolve it after trying for several hours, so now I just run a docker image of Elixir on my Linux server, which works fine. So if I can’t even successfully install Elixir on my base image of Ubuntu on an Amazon ec2 instance, an architecture that Elixir 100% absolutely supports, then I doubt very much that I would get anywhere near successfully compiling Elixir to target a wasm build of Erlang which I make myself.

If it were so easy someone would have done it already; I don’t think it is easy at all. Think of it this way: if an Elixir consultancy were given a product requirement, “We need a fully functional Elixir running on Erlang built for Web Assembly and able to run on the front-end and take less than 50 megabytes of space while supporting most standard Elixir functionality fast enough for a fast-paced front-end game”, how much do you think their quote would be for and how long do you think it would take? I think they would quote hundreds of hours at a senior developer’s rate, perhaps thousands of hours. And also, I’m not sure they would succeed in the end. They might have to rewrite parts of Elixir to support such a target architecture.

If someone else wants to do it that’s great but this is completely outside of my current level of expertise. I literally started learning Elixir in December. The last thing I’m thinking is, “Yep I think it’s time to compile Erlang for Web Assembly, target just the parts I need including the parts of beam.smp that Elixir depends on, patch Elixir successfully to target this webapp platform, and bundle it all into a complete web assembly application.”

I’ve shaved my fair share of yaks over the years but this is beyond my skills or abilities and may not even be possible.

Also, at the end of it it may be extremely slow, take a look at this web assembly project, start it up and try some of the built-in interpreters (like ruby and python) and tell me how you find the speed: https://webvm.io/

That one runs a special kernel for webasm and actually supports, out of the box, GCC / Clang / Python / Node.js / Ruby. It doesn’t support Elixir/Erlang at the moment but it might be possible to build special versions of them. However, I’m not sure the speed would be fast enough for a game.

If anyone has any link to any front-end demonstration of anything like an Erlang or Elixir build done in web assembly I’d love to have a look at it, especially to see the performance and whether it might be fast enough for an Elixir learning game.

I just benchmarked webvm versus my AWS micro instance, and the difference was 72 seconds versus 2.77 seconds for the benchmark I ran. That is about 26x as long. So I’m really not sure the result will be fast enough for my game.

Of course security-wise a front-end only solution would be a dream as there is nowhere for the users to escape their sandbox to but I am not sure a webassembly solution can ever be fast enough.

jjcarstens/extty on GitHub (terminal shell emulation as a process) could be a helpful example. It’s IEx as a process, and you send IO back and forth. We use it for remote shells and consoles over various transports. It is not completely on the frontend, but a simple websocket gets your IO back and forth to a dedicated process.
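
To illustrate the general idea of a dedicated process that receives code and sends back results and output, here is a standard-library sketch. It does not use or reproduce ExTTY's API. `EvalSession` is a hypothetical name, it captures output by pointing the process group leader at a `StringIO` device, and it evaluates code with no sandboxing at all, so it only shows the IO plumbing.

```elixir
defmodule EvalSession do
  # Hypothetical session process: keeps variable bindings between evaluations,
  # a little like an IEx session, and captures anything printed.
  def start, do: spawn(fn -> loop([]) end)

  # Sends source code to the session and waits for {status, value, output}.
  def eval(session, source, timeout \\ 5_000) do
    ref = make_ref()
    send(session, {:eval, self(), ref, source})

    receive do
      {^ref, reply} -> reply
    after
      timeout -> {:error, :timeout, ""}
    end
  end

  defp loop(binding) do
    receive do
      {:eval, from, ref, source} ->
        # Redirect IO for this process into an in-memory device.
        {:ok, io} = StringIO.open("")
        original_gl = Process.group_leader()
        Process.group_leader(self(), io)

        {status, payload, binding} =
          try do
            {value, new_binding} = Code.eval_string(source, binding)
            {:ok, value, new_binding}
          rescue
            error -> {:error, Exception.message(error), binding}
          after
            Process.group_leader(self(), original_gl)
          end

        {:ok, {_input, output}} = StringIO.close(io)
        send(from, {ref, {status, payload, output}})
        loop(binding)
    end
  end
end
```

In a web setting, the caller of `EvalSession.eval/3` would be whatever process handles the websocket, forwarding the returned value and captured output to the browser.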

Slightly off topic, but somewhat relevant, is the product I work on. The backend is an Elixir cluster, hosted on AWS via k8s. It’s set up with very limited access for security (no ssh or IEx). Debugging was a pain.

So I implemented the following process:

  1. Write the Elixir code you want to execute on the backend (normally short snippets of code)
  2. Merge into a branch and send out a PR
  3. Once the PR is approved GitHub Actions kicks in to:
    3.1 Package the snippet into an ephemeral k8s container that contains a special Elixir app.
    3.2 The app parses the code snippet to make sure no one has done anything like System.halt
    3.3 Expands special helper functions that we allow in the snippet of code
    3.4 Wraps the snippet in an anonymous function with some extra code (see below)
    3.5 Connects to one of the backend nodes and uses :erpc to call Code.eval_string that turns the code into an actual function
    3.6 Calls Kernel.apply/2 (via :erpc.call) to execute the code.

So the snippet ends up looking like:

"fn -> 
  snippet = fn -> "your code" end
  p_node = :cs_apply@some_ip
  p_pid = self()

  # Runs the snippet only after the monitoring code sends it a message
  s_pid =
    spawn(fn ->
      receive do
        _ -> :ok
      end

      send(p_pid, snippet.())
    end)

  spawn(fn -> "monitoring code" end)

  receive do
    m -> m
  after
    5000 -> Process.exit(s_pid, :kill)
  end
end"

The :erpc.call has an additional timeout.
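
Steps 3.2, 3.4, 3.5 and 3.6 above can be sketched on a single node as follows. This is an assumption-laden illustration, not the actual app: `SnippetRunner` and its denylist are hypothetical, the `:erpc` hop and the helper expansion of step 3.3 are left out, and a denylist like this is easy to bypass (for example through `:erlang.halt/0`), so it only shows the shape of the pipeline.

```elixir
defmodule SnippetRunner do
  # Hypothetical denylist of {module, function} pairs to reject.
  @forbidden [{System, :halt}, {System, :stop}, {System, :cmd}]

  # Parses and checks the snippet, wraps it in an anonymous function,
  # turns that into a real function with Code.eval_string/1, then applies it.
  def run(source) when is_binary(source) do
    with {:ok, ast} <- Code.string_to_quoted(source),
         :ok <- check(ast) do
      wrapped = "fn -> " <> source <> "\nend"
      {fun, _bindings} = Code.eval_string(wrapped)
      {:ok, apply(fun, [])}
    end
  end

  # Walks the AST and collects calls that appear on the denylist.
  defp check(ast) do
    {_ast, bad} =
      Macro.prewalk(ast, [], fn
        {{:., _, [{:__aliases__, _, parts}, name]}, _, _args} = node, acc when is_atom(name) ->
          if Enum.all?(parts, &is_atom/1) and {Module.concat(parts), name} in @forbidden do
            {node, [{Module.concat(parts), name} | acc]}
          else
            {node, acc}
          end

        node, acc ->
          {node, acc}
      end)

    if bad == [], do: :ok, else: {:error, {:forbidden, Enum.reverse(bad)}}
  end
end
```

In the setup described above, the last two calls would go through `:erpc` to a backend node instead of running locally, with the timeout and monitoring wrapped around them.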

The monitoring code checks to ensure the process running the snippet behaves (not using too much memory, too many reductions etc) and also monitors the remote node.
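
A monitoring loop of this kind might look roughly like the sketch below. The module name `SnippetMonitor`, the thresholds, and the polling interval are all hypothetical, it only watches a process on the local node through `Process.info/2`, and the real monitoring code described here also watches the remote node, which this sketch does not.

```elixir
defmodule SnippetMonitor do
  # Hypothetical watchdog: polls a local process and kills it when it
  # uses too much memory or too many reductions.
  def watch(pid, opts \\ []) when is_pid(pid) do
    max_memory = Keyword.get(opts, :max_memory_bytes, 10_000_000)
    max_reductions = Keyword.get(opts, :max_reductions, 50_000_000)
    interval = Keyword.get(opts, :interval_ms, 50)
    loop(pid, max_memory, max_reductions, interval)
  end

  defp loop(pid, max_memory, max_reductions, interval) do
    case Process.info(pid, [:memory, :reductions]) do
      nil ->
        # The process has already exited.
        :finished

      [memory: memory, reductions: reductions] ->
        cond do
          memory > max_memory ->
            kill(pid, :memory_limit)

          reductions > max_reductions ->
            kill(pid, :reduction_limit)

          true ->
            Process.sleep(interval)
            loop(pid, max_memory, max_reductions, interval)
        end
    end
  end

  defp kill(pid, reason) do
    Process.exit(pid, :kill)
    {:killed, reason}
  end
end
```

Polling can miss short spikes between checks, so a hard cap such as the `max_heap_size` process flag may be a useful complement.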

Of course we could’ve also written a GenServer and have that do all steps 3.2 onwards, but we really wanted no extraneous code on the core product. It could, however, work here?

The main points are the code to execute has been parsed to ensure no one does anything nasty, it has a timeout and something monitors the code as it is running to ensure it’s not being used for bitcoin mining or similar.

He’s right to mention these disclaimers. They would apply to anything more serious than a hobby project, but a hobby project is all I had: there is no business model behind the playground, no user database to hack, no big infra…

On the other hand, Dune provides a snappy enough experience on a Gigalixir free instance for what I needed. Anyone trying to hack it to mine bitcoin would be much better off just getting one of these themselves :slight_smile:

The only infrastructure here is Dune and Gigalixir :wink:

I sent you an email.

Cheers!

You can also do it just for the fun of doing it. :smile: It is actually quite fun to do one.