In my humble opinion, benchmarking is not very useful here, since each library has a different goal and a different main strength. The tested methods are also not really comparable, because the overhead each library adds is very different. It is safe to say that all of the libraries are very fast.
All in all, I would say the focused nature of the tools makes it easy for the user to pick the right tool for the job.
However, the ecosystem of tools is still quite young. There is room for improvement.
Please feel free to discuss missing features or differences from libraries in other languages.
Edit: Also, it’s probably beyond the scope of an overview of HTML tools, but I kind of wish there was an explicit indicator of whether a library is intended to be used with XML. I see a fair number of people reaching for Floki to work with XML when it just doesn’t have parsers intended for it.
As far as I can tell it only works for changing an element’s tag or its attributes.
Yeah, a lot of people use the mochiweb_html parser, which does an ok job with XML, but I’ve even seen people using the html5ever parser and that’s just a bad idea.
People using html5ever for parsing XML was one of the main motivations I had for adding an XML parser to Meeseeks: if they were going to try to do it anyway, I wanted to provide a good parser for them to use.
I’m pretty hesitant to at the moment. Adding nodes in the right place as per the HTML5 spec, etc., is a pretty complicated topic, as is figuring out efficient ways to update what I currently treat as an immutable structure (the Document).
Meeseeks has from the first been designed as a tool to search for and extract data from HTML (and now XML), which is the purpose I use it for, and for the foreseeable future I plan to limit it to that.
But what about drab, EEx and Phoenix.HTML?
Strictly speaking these are also “html tools”.
I would love to see more HTML processing on the backend side, rather than pushing half-baked data to the client and letting it figure out the rest.
I mean, website performance (and with that, user experience) has gotten so much worse since everything is being done “on load” or “async”.
IMO those projects are born out of the unique connection of Phoenix and websockets. It just makes it much more obvious that you can do something like that. Or am I missing something? I am not a UI or even frontend guy, so maybe I don’t see the full picture here.
IMO 90% of the JS devs just have no idea how to use them. And keep in mind I am a JS hater, so I ain’t gonna be one of these guys that tell you “just use it right”, but in this case it’s partially true. I’ve seen some rare JS website gems that are incredibly fast and smooth even on spotty 3G on an iPhone 5c.
That being said, I fully agree with you. And I am going back to server-side rendering more and more with time.
All parsers except ModestEx return html encoded into a list of tuples.
Meeseeks returns it as a Document, which is a flat map of node id to node struct.
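To make the difference between the two shapes concrete, here is a rough sketch in Python (not Elixir, and not either library's real types). It contrasts a nested `(tag, attributes, children)` tuple tree with a flat map from node id to node record. The `Node` class, its fields, and the `flatten` helper are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Nested shape: (tag, attributes, children); children are tuples or text strings.
tuple_tree = ("div", [("class", "main")], [
    ("p", [], ["Hello"]),
    ("p", [], ["World"]),
])


@dataclass
class Node:
    """Hypothetical node record for illustration; not Meeseeks' actual struct."""
    id: int
    parent: Optional[int]
    tag: Optional[str] = None                      # None for text nodes
    attributes: list = field(default_factory=list)
    text: Optional[str] = None                     # set only for text nodes
    children: list = field(default_factory=list)   # ids of child nodes


def flatten(tree, nodes=None, parent=None):
    """Walk the nested tree depth-first and return (node_id, {id: Node})."""
    if nodes is None:
        nodes = {}
    node_id = len(nodes) + 1  # ids are assigned in document order
    if isinstance(tree, str):
        nodes[node_id] = Node(id=node_id, parent=parent, text=tree)
        return node_id, nodes
    tag, attrs, children = tree
    node = Node(id=node_id, parent=parent, tag=tag, attributes=list(attrs))
    nodes[node_id] = node
    for child in children:
        child_id, _ = flatten(child, nodes, node_id)
        node.children.append(child_id)
    return node_id, nodes


root_id, flat_doc = flatten(tuple_tree)
```

In the flat form, finding a node's parent or siblings is a lookup by id, whereas the nested form needs a walk down from the root. That is presumably part of why a search-oriented tool would prefer it, though that reasoning is my assumption rather than something stated in this thread.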
You also appear to be using the :mochiweb_html parser for Floki, which is the non-HTML5-compliant one, so you’re comparing apples to the HTML5-compliant oranges of the other parsers. Of course, AFAIK it’s impossible to run Floki’s HTML5 parser on the latest version of OTP, but that’s a different problem altogether.
Also, I’m curious: when you’re benchmarking, are you disabling CPU throttling (as mentioned here)? I’ve found that can reduce variation in run times when benchmarking.
Finally, I’m interested in why the averages shown in the text results don’t appear to be reflected in the images. For instance, looking at the images it appears that 50k Floki is faster than 50k Meeseeks, but according to the text 50k Floki averaged 16633.17 µs/op while 50k Meeseeks averaged 12018.79 µs/op.
Hardware variation aside, it seems like the included graphs might just be a little wonky.
The graphs I generated were more in line with the textual output and clearly showed Meeseeks parsing smaller input slower and larger input faster than Floki (which is not surprising when Floki is using the :mochiweb parser).
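As a quick sanity check on the 50k averages quoted from the text results, the gap can be worked out directly. This is just arithmetic on those two published numbers, written in Python for convenience; the variable names are mine.

```python
# Averages quoted from the text results, in microseconds per operation.
floki_50k_us = 16633.17
meeseeks_50k_us = 12018.79

# How many times longer Floki took than Meeseeks on the 50k input.
ratio = floki_50k_us / meeseeks_50k_us
# Absolute gap between the two averages.
difference_us = floki_50k_us - meeseeks_50k_us

print(f"Floki / Meeseeks: {ratio:.2f}x")
print(f"Difference: {difference_us:.2f} µs/op")
```

That works out to roughly 1.38x, a gap of about 4.6 ms per parse, which seems too large for a graph to show the opposite ordering by accident.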