Elixir alternative to pygments

tmbb · July 13, 2017, 10:19am

Is there any elixir alternative to pygments? I can’t find anything on hex. The core is obviously very easy to port, but the real work is in porting all the lexers. It’s probably very easy too, but tedious and repetitive, and to be properly done it probably requires at least a cursory knowledge of the programming language the lexer is highlighting.

tmbb · July 13, 2017, 3:34pm

An alternative is to copy what Haskell has done. They’ve got a library that parses Kate’s XML syntax highlighting files and generates lexers from them at compile time. The format of the XML files is quite simple, but most of the XML files are LGPL, which is probably too restrictive for most people.

Another alternative is to parse TextMate’s syntax highlighting files, but their grammar is more confusing because they do funny things with regexes…

cjk · August 16, 2017, 7:23am

Depending on your use case, you could do what pygments.rb does: keep long-running Python processes around and pipe the code you want to highlight to them.
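To make the idea concrete, here is a minimal sketch of what the Python side of such a long-running worker could look like. The line-delimited JSON protocol, the `serve` function, and the `fake_highlight` stand-in are all assumptions made for illustration; this is not how pygments.rb actually talks to its Python process.

```python
import io
import json


def serve(stdin, stdout, highlight):
    """Read one JSON request per line and write one JSON response per line.

    `highlight` is any callable (code, language) -> str.
    """
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between requests
        request = json.loads(line)
        html = highlight(request["code"], request["lang"])
        stdout.write(json.dumps({"html": html}) + "\n")
        stdout.flush()  # the caller is blocked waiting for this line


def fake_highlight(code, lang):
    # Hypothetical stand-in so the sketch runs without Pygments installed.
    return '<pre class="%s">%s</pre>' % (lang, code)


# Simulate the parent process writing two requests down the pipe.
requests = io.StringIO(
    json.dumps({"lang": "elixir", "code": "IO.puts(1)"}) + "\n"
    + json.dumps({"lang": "html", "code": "hi"}) + "\n"
)
responses = io.StringIO()
serve(requests, responses, fake_highlight)
replies = [json.loads(line) for line in responses.getvalue().splitlines()]
```

In a real worker you would pass `sys.stdin` and `sys.stdout` and replace `fake_highlight` with a small wrapper around `pygments.highlight`, `get_lexer_by_name` and `HtmlFormatter`. The process starts once, so you only pay the Python startup cost on the first request.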

tmbb · August 16, 2017, 10:31am

Thanks, but I’ve already written my pure elixir version of pygments:

It only has two lexers right now (Elixir and HTML), but I’ll keep adding them in the future and write some docs so that you can contribute your own.

My approach is technically superior to Pygments, and can do more interesting things (like highlighting matching delimiters or do ... end blocks). This is because I use a real PEG parser instead of the regex approach taken by Pygments.
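As a rough illustration of why matching delimiters needs more than per-token regexes, here is a small sketch that pairs openers with closers using a stack over an already-tokenized stream. It is not how the library does it (the library uses a PEG parser); the `PAIRS` table and `match_delimiters` are hypothetical names, and treating `do` as the only opener for `end` is a deliberate simplification.

```python
# Hypothetical delimiter table; real Elixir has more openers for `end` (e.g. `fn`).
PAIRS = {"(": ")", "[": "]", "{": "}", "do": "end"}
CLOSERS = set(PAIRS.values())


def match_delimiters(tokens):
    """Return (open_index, close_index) pairs for properly nested delimiters."""
    stack = []  # indices of openers still waiting for their closer
    pairs = []
    for index, token in enumerate(tokens):
        if token in PAIRS:
            stack.append(index)
        elif token in CLOSERS:
            # Only pair when the closer matches the innermost open delimiter.
            if stack and PAIRS[tokens[stack[-1]]] == token:
                pairs.append((stack.pop(), index))
    return pairs


tokens = ["if", "x", "do", "f", "(", "[", "1", "]", ")", "end"]
matched = match_delimiters(tokens)
```

Each pair says which two tokens belong together, which is the information a formatter needs to highlight both ends of a block when you hover over one of them.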

cjk · August 17, 2017, 5:07am

Neat! Will have a look, thanks!

ShalokShalom · August 17, 2017, 2:10pm

Is it possible to use the stack of Pygments? Simply porting it to Makeup?

tmbb · August 17, 2017, 2:20pm

I don’t understand. Are you asking if one can port the Lexers from Pygments to makeup?

ShalokShalom · August 17, 2017, 2:55pm

Yep. See? You understand

tmbb · August 17, 2017, 2:58pm

The obvious answer is that yes, you can port the lexers as long as you port the Python code to Elixir.

The practical answer is: yes, but you have to rewrite Pygments’ weird state machines into a grammar for a PEG parser (which is way simpler than the original state machines).

The crazy answer is that the Pygments lexers are somewhat data-driven, if you squint a little (a lot), so it’s probably possible to write a Python program that compiles most of the lexers into Elixir. Writing such a program would be highly non-trivial, and my library is technically superior to (and more powerful than) Pygments in a fundamental way, so I think that it’s better to just port the lexers by hand.
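For readers who haven’t seen one, a Pygments `RegexLexer` is essentially a table mapping state names to lists of (regex, token type, optional state change) rules. The sketch below is a heavily simplified interpreter for a table in that style, written only to show the "data-driven" part; the `TOKENS` table and `lex` function are made up for illustration and are not Pygments code.

```python
import re

# state name -> list of (regex, token type, action).
# action is None, "#pop", or the name of a state to push.
TOKENS = {
    "root": [
        (r"\s+", "Whitespace", None),
        (r'"', "String", "string"),
        (r"\d+", "Number", None),
        (r"[A-Za-z_]\w*", "Name", None),
    ],
    "string": [
        (r'[^"\\]+', "String", None),
        (r"\\.", "String.Escape", None),
        (r'"', "String", "#pop"),
    ],
}


def lex(text, tokens=TOKENS):
    """Tokenize `text` by trying the rules of the current state in order."""
    stack = ["root"]
    pos = 0
    out = []
    while pos < len(text):
        for pattern, token_type, action in tokens[stack[-1]]:
            match = re.compile(pattern).match(text, pos)
            if match:
                out.append((token_type, match.group()))
                pos = match.end()
                if action == "#pop":
                    stack.pop()
                elif action is not None:
                    stack.append(action)
                break
        else:
            # No rule matched: emit one error character and move on.
            out.append(("Error", text[pos]))
            pos += 1
    return out


result = lex('x "a\\n" 42')
```

Because the rules are plain data, a program could in principle walk a table like `TOKENS` and emit Elixir source for each state. The hard part is everything the real lexers do beyond this: callbacks, inheritance between lexers, and rules that compute their next state.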

I’d like to write a guide on how to write the lexers and crowdsource it to the polyglot members of the Elixir community, but I have no time for that now…

ShalokShalom · August 17, 2017, 3:02pm

Some Perl 6 guys tried Pygments yesterday and they say it’s not very correct.

tmbb · August 17, 2017, 3:03pm

Pygments’ Elixir lexer had some problems too. I’ve corrected them in my implementation. Pygments in general is not a very good piece of software. The real killer feature of Pygments is the enormous number of man-hours put into writing lexers for all those languages. The lexers themselves are not necessarily very good.

ShalokShalom · August 17, 2017, 3:07pm

How can someone invest so much into a project which is based on such weak bones?

tmbb · August 17, 2017, 3:19pm

Hm… I don’t want to criticise Pygments too much. Pygments is very versatile, and even though I don’t like the architecture of its lexers, it’s a fact that people have been writing new lexers. Some people might even prefer their way of writing lexers. If I were to start Pygments today, I’d use a PEG parser like Arpeggio, but they surely had their reasons to start with a state-machine-based approach… Maybe there were no good PEG parsers for Python in 2006? Maybe, despite my opinion, regexes + state machines are easier to use than PEG parsers? Maybe people started writing things that way and then there was already too much code to change it to a different architecture? I don’t know.

My own design borrows a lot from Pygments: it has lexers, formatters and styles. The styles themselves were shamelessly copied from Pygments (by writing a Python program that introspects the styles and turns them into Elixir modules), and my HTML formatter is not as feature-rich as theirs. I can copy all of their features given enough time, but I’m only one person and have limited time… Have you looked at their list of contributors? They mention 212 people… My project is a joke in comparison xD

tmbb · August 17, 2017, 3:25pm

Just added my own post as a solution.

ShalokShalom · August 17, 2017, 4:01pm

I guess this is fundamentally supported by the huge Python community itself.