Enigma - A simple implementation of the Erlang VM in Rust

archSeer · January 8, 2019, 3:19pm

Hi! I’m usually a lurker, but I wanted to show you what I’ve been working on for a while. About a month ago, I started working on an Erlang VM in Rust. My goal was to learn more about OTP internals so I’d be able to work on some contributions to BEAM. It’s been a ton of fun, and I’ve quickly progressed to implementing most of the opcodes (but not all of the native functions) – I’m currently working on getting the application bootstrap to run (:init.start).

It’s an interesting look into how BEAM might look like if it was built from scratch today. I’m also planning to experiment with a few different ideas (alternative GC implementations, cross compiling to WebAssembly, different ETS backends, map structures, etc.), as well as writing a bit more information on how BEAM works internally (The BEAM Book is a wonderful resource, but incomplete in some parts).

Because of the scale of work, I’m looking for more contributors! If all this sounds exciting, or you’d like to learn more about OTP or low-level programming, please reach out

Also, a shoutout to @bettio ! I’ve seen the AtomVM announcement around the time I started working on Enigma, hopefully we can share notes on a few things

Link to repository: https://github.com/archSeer/enigma

OvermindDL1 · January 8, 2019, 4:19pm

Ooooo, this looks fascinating!

hauleth · January 8, 2019, 4:30pm

For supporting “GC-ish” functionality you can use per-proces arena. As Erlang always copy data between processes and most of the processes are short-lived then you do not need highly sophisticated algorithms and just free whole arena at once when process dies.

Project seems very interesting and when I will have some time I will try to contribute to it.

hauleth · January 8, 2019, 5:04pm

Also as binary pattern matches can match only beginning of the binary it is better to use something like fst or just finite automata, as even in multi pattern case you can build such machine and it should then work in linear time. It will be much easier and simpler rather than using Aho-Corasick. Bit patterns can be a little PITA, but even then you can just build N nodes for each combination that will match given bit pattern and match against it.

OvermindDL1 · January 8, 2019, 5:14pm

I wonder if this is worth posting at https://groups.google.com/forum/#!forum/beam-community or if not now perhaps when a bit more developed as it would gain a lot more visibility especially from core devs who might find this interesting.

bettio · January 8, 2019, 7:52pm

Sure, I would be glad to answer to any question Also let me know if you have any doubt about undocumented stuff

archSeer · January 9, 2019, 10:53am

So I sort of do that at the moment, unintentionally so Using a global allocator, I allocate a block of memory for each process to then bump allocate into (a process can always request more blocks). When the process dies, the block list is dropped, which then mass drops everything (in theory, I think the block just drops at the moment which means reference counts will be borked). So I definitely think arenas will be able to carry us for a while; given that I’m not at a point where we can continuously run larger programs.

Please do! (Also hi from /r/vim, I’ve copied a bunch of settings from your vimrc in the past Most things are in flux now (for example value/term implementation will change further, I don’t do much in terms of immediate vs boxed value packing. I brought it down today to 2 words/128 bits – enum discriminant + pointer/immediate, but I got a couple of interesting ideas from @michalmuskala yesterday if we tried JVM-style 32 bit sized pointers), but it’s always good to get more contributors with fresh ideas into the mix.

Yeah sorry, that README section is mislabeled, s/matching/searching/: Erlang uses aho-corasick + boyer-moore for binary:match/2,3 otp/erts/emulator/beam/erl_bif_binary.c at 9009934d56625249503210298e102d8e5744f546 · erlang/otp · GitHub

fst looks promising for binary matching/parsing though, I was also considering using nom since I already use it for the loader, and it can do bit level matching. (It’s also macro based so should be pretty efficient).

I think using rust’s regex which is basically PCRE without lookahead and some other features to make it linear time would be an interesting experiment – since it’s linear time, we’d be able to estimate reductions based on the string & pattern length. (OTP at the moment manually patches PCRE to be re-entrant, which is a very complicated mess to update to a new PCRE version. We’d drop some compatibility to make the implementation a bit simpler.

Looks promising once I get things a bit more sketched out! It seems to mainly be related to GSoC though?

Joined your slack channel! Maybe we could start a general #low-level or #vm/beam Slack channel?

hauleth · January 9, 2019, 12:45pm

I would <3 to see unikernel built on top of that, as I am still trying to write my app in that form, so I will just upload AMI and run that directly. It is sad that LING died and this can be basis for building similar solution without worrying about the VM itself (as this is both - binary and library at once).

archSeer · March 5, 2019, 2:48pm

Quick update: I’m making progress, currently about 30k instructions into the OTP boot-up. There’s binary and map support, process links and monitors, some limited file I/O, and I’m currently working on a minimal ETS implementation. It doesn’t work just out of the box yet, but I’ll have some preview builds soon I hope

I’ve also set up a channel on the Elixir Slack, #enigma for anyone interested.