Greetings fellow alchemists!
I started writing an open-source interpreter in Elixir (https://github.com/nicolasdilley/dwarf-interpreter) a week ago, and I decided to write a tutorial on each stage of the development of my new language. I have finished the first one, on the lexer. If you are interested in interpreters or even compilers, these tutorials will give you a good practical approach to writing your own!
You can find these tutorials on my website (http://nicolasdilley.com/?p=1). I would love to hear your opinion!
The language is called “Dwarf” and already has a lot of functionality. The next steps are implementing loops, a type system, and a good stdlib. If you want to contribute to it, feel free to PM me.
Yay, another person making interpreters on the BEAM! We are getting quite a collection, from a couple of Lisps to a multitude of typed languages to a BEAM-native Lua to others. I love seeing blog posts about the actual design, though. Looking forward to updates!
The simplest strategy to decode the source program would be to look at each character individually. However, this would result in a huge number of “if then else” statements, and the algorithm would need to backtrack and look ahead a lot. For example, for the keyword “print” we would have to check for a “p”, then for an “r” (printing an error if it is missing), then for an “i”, and so on. This results in inefficient code that is hard to read. This is why we use regular expressions (regexps). A regular expression lets us specify how words are laid out. For example, here is a regular expression that checks whether the input contains the keyword “print” in Elixir:
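A minimal sketch of such a check (my own illustration; the regex used in the actual article may differ):

```elixir
# Does the input start with the keyword "print"?
# ~r/^print\b/ anchors at the start of the string and requires a
# word boundary, so an identifier like "printer" is not mistaken
# for the keyword.
Regex.match?(~r/^print\b/, "print 5")   # true
Regex.match?(~r/^print\b/, "printer 5") # false
```

A real lexer would try a list of such patterns in order, emitting a token for the longest match and advancing past it.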
Not really. PEG parsers are often quite a bit faster than separate lexing/parsing steps, without any slow regexes or nasty corner cases. Take a look at NimbleParsec for a simple context-less PEG library, or my own ExSpirit PEG library if you need context information while parsing (slower but significantly more powerful; I’m hoping NimbleParsec gains this ability so I can deprecate ExSpirit, but the way it’s currently designed makes that very difficult, so no one has done it yet).
Thanks for the tips. Your library is very interesting! I gave you a star. However, I was talking about lexical analyzers in that article, not parsers.
I know, but PEGs remove the need for the lexical analyzer entirely while making everything far more readable and faster as well.
I did not know about them. I am doing it the old way, then. I am using a recursive descent parser for the parsing.
Yep, splitting the lexing step from recursive descent parsing is very classical. PEGs became more popular about 15 years ago, but they are not taught in schools or anything yet, so few people know about them, even though they are significantly better designs in my opinion. They can act as a lexer, a parser, both, an AST builder, an AST runner, or any mix thereof, all with nearly the same code, and if implemented properly they will run circles around the classical methods in speed while being far easier to read.
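To illustrate the idea, here is a toy sketch of PEG-style ordered choice and sequencing in plain Elixir (these helpers are my own hypothetical illustration, not NimbleParsec’s or ExSpirit’s actual API). Each parser is a function from the remaining input to `{:ok, value, rest}` or `:error`:

```elixir
defmodule TinyPeg do
  # Match a literal string at the head of the input.
  def lit(expected) do
    fn input ->
      case String.split_at(input, String.length(expected)) do
        {^expected, rest} -> {:ok, expected, rest}
        _ -> :error
      end
    end
  end

  # Ordered choice: try `a`, and only if it fails, try `b`.
  def choice(a, b) do
    fn input ->
      case a.(input) do
        :error -> b.(input)
        ok -> ok
      end
    end
  end

  # Sequence: run `a`, then run `b` on what `a` left over.
  def seq(a, b) do
    fn input ->
      with {:ok, va, rest1} <- a.(input),
           {:ok, vb, rest2} <- b.(rest1) do
        {:ok, [va, vb], rest2}
      end
    end
  end
end

# keyword = ("print" / "let") " "
keyword =
  TinyPeg.seq(
    TinyPeg.choice(TinyPeg.lit("print"), TinyPeg.lit("let")),
    TinyPeg.lit(" ")
  )

keyword.("print 5") # {:ok, ["print", " "], "5"}
```

Note how the same combinators recognize characters (lexing) and structure (parsing) in one pass, which is the point being made above.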
Thanks, I will definitely look more into it! Do you know of any good references?
What kind of reference are you wanting? The Wikipedia article covers the basics well enough (I’m on my phone, or I’d be giving substantially more direct links), but actual implementations differ a bit. C++'s Spirit PEG library is easily the fastest one in the world and is unlikely to be beaten, due to its unique code generation, and it is easily the most powerful one out there, with the most abilities short of streaming parsing (though you can implement that yourself via a custom iterator or a thread anyway; it can even reverse a custom format back to strings, or parse binary, or whatever: its PEG engine can parse back and forth between streams of anything and custom data structures of anything). There is one standalone generator out there somewhere that generates PEGs from a custom PEG language and compiles them to a whole host of languages; I forget its name, but it has been around for a long, long time and is decently fast. There are many blog posts about it as well, and a few papers. What do you prefer? Or you can Google for “Parsing Expression Grammars”.
You are right, there are lots of resources actually. “Parsing Expression Grammars: A Recognition-Based Syntactic Foundation” should be a good start, I think.
If you find good references, you should put them in a blog post too so others can see them as well!
It’s really nice that you are writing an interpreter in Elixir! Welcome to the club!
I really hope to be able to continue working on Jux, my minimalist actor-model functional Forth-like language, at some point… so much other stuff takes priority first, though.
Could you tell us a little bit more about the design ideas behind Dwarf?
Thanks @Qqwy !
I am actually writing this language to be a very simple and small language (hence the name). The aim is not to make anything new or fancy, but to show a practical way of writing an interpreter. When I wanted to learn how to build one, I could not find much practical advice or many examples. In addition, I wanted to put into practice the theory that I am learning. My area of interest is concurrency, so I might use the language to test some ideas in the future. As for the syntax rules, you can see a BNF definition of the language in the README. There are also examples of source programs in the benchmarks folder.
I have written a new article called “How to write a recursive descent parser using Elixir” on my website.
I have also added while loops to the Dwarf language!
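For readers curious what recursive descent looks like in practice, here is a minimal sketch in Elixir (a hypothetical token-list grammar of my own, not the article’s actual code). The key idea is that each grammar rule becomes one function that consumes tokens and returns the AST built so far plus the remaining tokens:

```elixir
defmodule Descent do
  # Grammar:
  #   expr := term ("+" term)*
  #   term := integer

  # Parse an expression: a term followed by zero or more "+ term".
  def expr(tokens) do
    {left, rest} = term(tokens)
    expr_tail(left, rest)
  end

  # Left-fold any remaining "+ term" pairs into an {:add, l, r} node.
  defp expr_tail(left, [:plus | rest]) do
    {right, rest2} = term(rest)
    expr_tail({:add, left, right}, rest2)
  end

  defp expr_tail(left, rest), do: {left, rest}

  # A term is just an integer literal token.
  defp term([{:int, n} | rest]), do: {{:num, n}, rest}
end

Descent.expr([{:int, 1}, :plus, {:int, 2}, :plus, {:int, 3}])
# {{:add, {:add, {:num, 1}, {:num, 2}}, {:num, 3}}, []}
```

Because `expr_tail` loops on its own output, `1 + 2 + 3` parses left-associatively, which is the usual choice for addition.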
Why not an eDSL? DSLs are one of Elixir’s strengths, including for compiler-like tasks.
Is recursive descent parsing different from parser combinators?