The landscape of available Elixir packages for HTML tooling is small, but also very focused: each library serves a distinct use case.
My tentative conclusion is that benchmarking is not very useful, since each library has a different goal and a different main strength. The tested functions are also not really comparable, because each library adds a very different amount of overhead. It is safe to say that all of the libraries are very fast.
All in all, I would say the focused nature of the tools makes it easy to pick the right one for the job.
However, the ecosystem of tools is still quite young. There is room for improvement.
Please feel free to discuss missing features or differences from comparable libraries in other languages.
I think it might be worthwhile to indicate whether a parser is HTML5 compliant (so Floki with html5ever parser but not default parser, Meeseeks, Myhtmlex, and ModestEx).
Edit: Also, it’s probably beyond the scope of an overview of HTML tools, but I kind of wish there was an explicit indicator of whether a library is intended to be used with XML. I see a fair number of people reaching for Floki to work with XML when it just doesn’t have parsers intended for it.
Also true. I forgot. It is a feature that is easy to overlook, however. I will add this info to the table as well. Is Floki.map/2 limited to changing attributes?
Yes, I agree here too. But how would you like to see this information represented?
Really? That seems odd. XML and HTML are completely different languages. Maybe they use the default parser and hope for the best.
Also popular resources like awesome-elixir only have a section for “XML” but not explicitly “HTML”. I think this is misleading!
As far as I can tell it only works for changing an element’s tag or its attributes.
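To make that limit concrete, here is a minimal sketch in plain Elixir. It does not call Floki: `TupleTreeSketch.map_elements/2` is a hypothetical helper, and it assumes the `{tag, attributes, children}` tuple tree plus a callback that only ever sees `{tag, attributes}`. Under those assumptions the callback can rename a tag or edit attributes, but it has no way to add, remove, or reorder child nodes.

```elixir
defmodule TupleTreeSketch do
  # Hypothetical helper, not part of Floki: applies `fun` to the
  # {tag, attributes} pair of every element in a tuple tree and
  # leaves text nodes and the tree shape untouched.
  def map_elements(nodes, fun) when is_list(nodes) do
    Enum.map(nodes, &map_elements(&1, fun))
  end

  def map_elements({tag, attrs, children}, fun) do
    {new_tag, new_attrs} = fun.({tag, attrs})
    {new_tag, new_attrs, map_elements(children, fun)}
  end

  # Text nodes (binaries) and anything else pass through unchanged.
  def map_elements(other, _fun), do: other
end

tree = [{"div", [{"class", "box"}], ["Hello ", {"b", [], ["world"]}]}]

renamed =
  TupleTreeSketch.map_elements(tree, fn
    # Rename <b> to <strong>, keeping its attributes.
    {"b", attrs} -> {"strong", attrs}
    # Tag every other element with an extra attribute.
    {tag, attrs} -> {tag, [{"data-seen", "true"} | attrs]}
  end)

IO.inspect(renamed)
```

Inserting or deleting nodes would need a callback that also receives and returns the children, which is a different kind of operation.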
Yeah, a lot of people use the mochiweb_html parser, which does an ok job with XML, but I’ve even seen people using the html5ever parser and that’s just a bad idea.
People using html5ever for parsing XML was one of the main motivations I had for adding an XML parser to Meeseeks: if they were going to try to do it anyway, I wanted to provide a good parser for them to use.
Do you plan to add functions that let you manipulate nodes to Meeseeks?
Actually that was the motivation for me to start ModestEx, because I missed those features in other libraries. Or is it just me and people only need to parse HTML and not change it?
I’m pretty hesitant to at the moment. Adding nodes in the right place as per the HTML5 spec is a pretty complicated topic, as is figuring out efficient ways to update what I currently treat as an immutable structure (the Document).
Meeseeks has from the first been designed as a tool to search for and extract data from HTML (and now XML), which is the purpose I use it for, and for the foreseeable future I plan to limit it to that.
Hey @f34nk, I have only used Floki in my projects and it suits my needs.
I’m taking a look at your project https://github.com/f34nk/modest_ex and it seems awesome!
Good work dude, if I find something useful or missing in the HTML tools I’ll post it here!
Thanks man.
But what about Drab, EEx and Phoenix.HTML?
Strictly speaking these are also “html tools”.
I would love to see more HTML processing on the backend side, rather than pushing half-baked data to the client and letting it figure out the rest.
I mean, website performance (and with that user experience) has gotten so much worse since everything is being done “on load” or “async”.
IMO those projects are born out of the unique connection of Phoenix and websockets. It just makes it much more obvious that you can do something like that. Or am I missing something? I am not a UI or even frontend guy, so maybe I don’t see the full picture here.
Drab is a framework that manipulates the DOM over websockets. It has its own kind of controller, called a “commander”, and it uses Phoenix as a base, so it is a complement to Phoenix.
IMO 90% of the JS devs just have no idea how to use them. And keep in mind I am a JS hater, so I ain’t gonna be one of these guys that tell you “just use it right” – but in this case it’s partially true. I’ve seen some rare JS website gems that are incredibly fast and smooth even on spotty 3G on an iPhone 5c.
That being said, I fully agree with you. And I am going back to server-side rendering more and more with time.
All parsers except ModestEx return HTML encoded as a list of tuples.
Meeseeks returns it as a Document, which is a flat map of node id to node struct.
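For readers who have not seen both shapes, here is a rough sketch of the difference in plain Elixir. It is an illustration only: `FlatDocSketch` and its map keys (`:type`, `:tag`, `:attrs`, `:parent`, `:children`, `:content`) are hypothetical, and Meeseeks’ real structs and field names may differ. The sketch turns a nested tuple tree into a flat map of node id to node, where parent and child links are stored as ids.

```elixir
defmodule FlatDocSketch do
  # Hypothetical illustration, not Meeseeks' implementation:
  # converts a nested tuple tree into a flat map of id => node.
  def flatten(tree) when is_list(tree) do
    {_next_id, nodes} =
      Enum.reduce(tree, {1, %{}}, fn node, {id, acc} ->
        add(node, nil, id, acc)
      end)

    nodes
  end

  # Each add/4 clause returns {next_free_id, nodes}.
  defp add({tag, attrs, children}, parent, id, acc) do
    {next_id, acc, child_ids} =
      Enum.reduce(children, {id + 1, acc, []}, fn child, {cid, acc, ids} ->
        {next, acc} = add(child, id, cid, acc)
        {next, acc, [cid | ids]}
      end)

    node = %{
      type: :element,
      tag: tag,
      attrs: attrs,
      parent: parent,
      children: Enum.reverse(child_ids)
    }

    {next_id, Map.put(acc, id, node)}
  end

  defp add(text, parent, id, acc) when is_binary(text) do
    {id + 1, Map.put(acc, id, %{type: :text, content: text, parent: parent})}
  end
end

tree = [{"div", [{"class", "box"}], ["Hello ", {"b", [], ["world"]}]}]
nodes = FlatDocSketch.flatten(tree)

IO.inspect(nodes[1].children)
IO.inspect(nodes[4])
```

With the flat form, any node can be looked up by id and its parent reached directly, whereas the tuple tree has to be walked from the root to find a node’s ancestors.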
You also appear to be using the :mochiweb_html parser for Floki, which is the non-HTML5 compliant one, so you’re comparing apples to the HTML5 compliant oranges of the other parsers. Of course, AFAIK it’s impossible to run Floki’s HTML5 parser on the latest version of OTP, but that’s a different problem altogether.
I’m also curious: when you’re benchmarking, are you disabling CPU throttling (as mentioned here)? I’ve found that can reduce variation in run times when benchmarking.
Finally, I’m interested in why the averages shown in the text results don’t appear to be reflected in the images: for instance, it appears when looking at the images that 50k Floki is faster than 50k Meeseeks, but according to the text 50k Floki averaged 16633.17 µs/op while 50k Meeseeks averaged 12018.79 µs/op.
That’s odd.
One is the output of benchfella, the other benchee.
Do you have time to clone and repeat the bench yourself?
I actually did some benchmarking in C for my package and came out with different results too. I will have to investigate further to be sure what’s going on.
Hardware variation aside, it seems like the included graphs might just be a little wonky.
The graphs I generated were more in line with the textual output and clearly showed Meeseeks parsing smaller input slower and larger input faster than Floki (which is not surprising when Floki is using the :mochiweb_html parser).