Meeseeks - A library for extracting data from HTML and XML with CSS or XPath selectors

Cool, I’ll keep that in mind when writing stuff for the app I need to write.

Release v0.9.0

Meeseeks.Error

Prior to v0.9.0, errors in Meeseeks were all over the place: sometimes functions returned {:error, string} or :error, and sometimes they raised RuntimeErrors, ArgumentErrors, or one of an assortment of custom Meeseeks exceptions.

To combat this, I have added a Meeseeks.Error struct that implements the Exception behaviour and used it throughout the library.

I go into more detail about the rationale and implementation in this issue, but the quick takeaway is that this kind of error struct is flexible, plays nicely with pattern matching in places like case and with, and makes it easier to provide useful errors to users.

This is a breaking change because it modifies the returned or raised type of errors. If your Meeseeks-related code handles {:error, ???} or catches one of the old Meeseeks exception types, you will need to make changes.

I apologize for the inconvenience, but this change should lead to safer, more friendly code in the future.
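As a rough illustration of why an error struct like this matches well in case and with, here is a minimal, self-contained sketch. Demo.Error and the Demo functions are hypothetical stand-ins written for this example; they are not Meeseeks's actual implementation, and only the type and reason values are taken from the release notes.

```elixir
defmodule Demo.Error do
  # Hypothetical error struct; defexception makes it an Exception.
  defexception [:type, :reason, metadata: %{}]

  @impl true
  def message(%__MODULE__{type: type, reason: reason}) do
    "#{type} error: #{inspect(reason)}"
  end
end

defmodule Demo do
  # Returns {:ok, value} or {:error, %Demo.Error{}} instead of a bare :error.
  def find(map, key) do
    case Map.fetch(map, key) do
      {:ok, value} ->
        {:ok, value}

      :error ->
        {:error, %Demo.Error{type: :select, reason: :no_match, metadata: %{key: key}}}
    end
  end

  # The same struct can be raised, because it is also an exception.
  def find!(map, key) do
    case find(map, key) do
      {:ok, value} -> value
      {:error, error} -> raise error
    end
  end

  # Callers can match on as much or as little of the struct as they need.
  def describe(map, key) do
    with {:ok, value} <- find(map, key) do
      "found #{inspect(value)}"
    else
      {:error, %Demo.Error{type: :select, reason: :no_match}} -> "no match"
      {:error, %Demo.Error{} = error} -> Exception.message(error)
    end
  end
end
```

The point of the sketch is that one struct serves both the tagged-tuple style and the raising style, so callers only ever have one error shape to handle.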

Meeseeks.fetch_all and Meeseeks.fetch_one

The more that I use Elixir in anger, the more I appreciate functions that return {:ok, ...} or {:error, ...}.

In light of the feedback I received on this issue I decided to add Meeseeks.fetch_all and Meeseeks.fetch_one which work like Meeseeks.all and Meeseeks.one respectively, but wrap the result in {:ok, ...} if there is a match or return {:error, %Meeseeks.Error{type: :select, reason: :no_match}} if there is not.

Now it’s easier to write code like:

with {:ok, qt} <- Meeseeks.fetch_one(doc, css(".qt")) do
  ...
else
  {:error, %Meeseeks.Error{type: :select, reason: :no_match}} ->
    ...
end

My thanks to those who provided feedback.

Other

A bug related to Meeseeks.html was fixed; see this issue for more details.

Just popping in to say that I’m still loving Meeseeks. I think I’ve used every selector it has (and made one as well) for parsing large amounts of both HTML and XML. ^.^

Release v0.9.1

A small release fixing a couple of bugs and some typespec problems, primarily thanks to work by @asonge.

The first bug fix is that Document.get_nodes/1 now raises instead of adding a nil to the returned nodes if a node is - impossibly - not found in the document.

The second bug fix is that Document.get_nodes/2 now works correctly.

Release v0.9.2

Super tiny update to allow the css and xpath macros to accept vars.

iex> import Meeseeks.XPath
Meeseeks.XPath
iex> path = "//li[last()]"
"//li[last()]"
iex> xpath(path)
%Meeseeks.Selector.Element{...}

It is worth noting, however, that using a var (or string interpolation) in the css or xpath macros moves the creation of the selector to run time, while using a static string literal allows it to be created at compile time. If your use case permits, prefer xpath("//li[last()]").
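To make the compile-time versus run-time distinction concrete, here is a small sketch of a macro that behaves this way. Everything in it (Demo.Selector, sel, and the toy compile/1 function) is hypothetical and written for illustration; I have not checked it against how the css and xpath macros are actually implemented.

```elixir
defmodule Demo.Selector do
  # Toy stand-in for real selector compilation (an assumption for this sketch).
  def compile(string) when is_binary(string) do
    {:selector, String.split(string, "/", trim: true)}
  end

  # A string literal reaches the macro as a plain binary, so the work can be
  # done during macro expansion and the result embedded in the caller's code.
  defmacro sel(string) when is_binary(string) do
    string |> compile() |> Macro.escape()
  end

  # A variable or an interpolated string reaches the macro as AST, so the
  # macro can only emit a call that does the work at run time.
  defmacro sel(other) do
    quote do
      Demo.Selector.compile(unquote(other))
    end
  end
end

defmodule Demo.Use do
  import Demo.Selector, only: [sel: 1]

  # Built once, when this module is compiled.
  def static, do: sel("//li")

  # Built on every call.
  def dynamic(path), do: sel(path)
end
```

Both functions return the same value; the difference is only when the work happens, which is why a static literal is preferable when the use case allows it.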

Release v0.9.3

This release fixes a Dialyzer-related problem identified by @sztosz and correctly diagnosed by @NobbZ. Thanks for your help.

Release v0.9.4

This release fixes some XPath selection bugs discovered by anulman.

Release v0.9.5

This release fixes another selection bug, again discovered by anulman.

Lol, awesome finds by @anulman. Great update as always; this explains why my bot got update notifications. Thanks much!

Release v0.10.0

This release adds support for OTP 21.

Release v0.10.1

This is a very minor release adding “support” for Elixir 1.7. In truth it’s been working fine with 1.7 this whole time, but now Travis CI ensures that fact.

In addition to that, I added a bunch of older Elixir+OTP combinations to be tested by Travis CI. Meeseeks started out on Elixir 1.3 and OTP 19 (a combination on which it still runs fine, thanks to the awesome Elixir team), and rather than just testing that and the latest combination, I now also test several combinations in between those two.

Release v0.11.0

As of this release Elixir 1.3 is no longer tested or otherwise supported, and Elixir 1.8 is tested and supported. The minimum tested combination is now Elixir 1.4.0 and Erlang/OTP 19.3, and the maximum tested combination is now Elixir 1.8.1 and Erlang/OTP 21.0. Sorry Elixir 1.3 users, it was time.

This release also pulls in the recently released Meeseeks_Html5ever v0.11.0, which makes parsing faster and more memory efficient on Erlang/OTP 21.

Finally, I’d like to note that as of a couple of days ago Meeseeks is over two years old!

I want to give a big “Thank you!” to all the people who have helped me design, implement, and test Meeseeks over the past couple years. Your collaboration and feedback have made the headaches worthwhile.

In honor of Meeseeks’s 2nd birthday, the first post in this thread has been updated and now includes an examination of why Meeseeks was created when Floki already existed in the space, and why you might (or might not) want to use Meeseeks instead of Floki.

I’ve added an update to the memory use issue discussing the positive impact of Erlang/OTP 21.

I’ve updated the Meeseeks vs. Floki benchmark to the latest versions of both, as well as the latest Benchee, which means there are now memory numbers.

Meeseeks’s edge in speed is even smaller than before (Meeseeks does a little more work now, and Floki got faster on OTP 21), though the memory measurements seem to clearly favor Meeseeks.

Release v0.11.1

This release makes a couple of small improvements.

Firstly, Meeseeks now returns a better error when trying to parse a string of non-UTF-8 encoded content. Thank you to anulman for reporting that problem.

Secondly, parse/2 now accepts :tuple_tree as a type, and parsing tuple trees with parse/1 has been soft deprecated and will emit a warning. Additionally, parsing tuple trees with unexpected node structure will now return a parse error instead of silently ignoring the node. Thank you to @axelson for reporting the parsing issue and discussing whether parse/1 should accept tuple trees.

Release v0.11.2

This release includes several fixes to CSS selectors.

Previously, names, idents, and strings in CSS selectors would not accept escaped or Unicode characters. Now CSS selectors accept both escaped characters and Elixir-style Unicode code points. A big thank you to @Ripster for creating the issue that led to these fixes.

iex(1)> import Meeseeks.CSS
Meeseeks.CSS

iex(2)> Meeseeks.one("<div id=\"~\"></div>", css("#\\~"))
#Meeseeks.Result<{ <div id="~"></div> }>

iex(3)> Meeseeks.one("<div class=\"❤\"></div>", css(".❤"))  
#Meeseeks.Result<{ <div class="❤"></div> }>

This release also includes some minor improvements to the CSS selector parser errors.

Release v0.12.0

This release fixes Meeseeks.html/1 so that, when encoding attribute values, double quotes are always used and & and " are escaped as character entities, and, when encoding text, <, >, and & are escaped as character entities.

This should be considered a breaking change since the output of Meeseeks.html/1 may be slightly different, but it means, for instance, that round-tripping

"<span>&lt;script&gt;Hello&lt;/script&gt;</span>"

through Meeseeks.parse then Meeseeks.html produces

"<span>&lt;script&gt;Hello&lt;/script&gt;</span>"

instead of

"<span><script>Hello</script></span>"
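The escaping rules described above can be sketched in a few lines of plain Elixir. Demo.Escape and its functions are hypothetical helpers written for this example and are not how Meeseeks.html/1 is implemented; the choice of &amp;, &lt;, &gt;, and &quot; as the entity names is also an assumption of the sketch.

```elixir
defmodule Demo.Escape do
  # Text nodes: escape &, <, and >. The ampersand goes first so the
  # entities produced by the later replacements are not escaped again.
  def text(string) do
    string
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
  end

  # Attribute values: always wrap in double quotes, escape & and ".
  def attribute(name, value) do
    escaped =
      value
      |> String.replace("&", "&amp;")
      |> String.replace("\"", "&quot;")

    name <> "=\"" <> escaped <> "\""
  end
end
```

With text escaped this way, a span whose text content is the literal string <script>Hello</script> is written back out as entities rather than as a live script element, which is the behavior the round-trip example describes.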

A big thanks to @ericlathrop for bringing the issues to my attention and proposing a solution.

Release v0.13.0

This release adds support for Erlang/OTP 22 (and Elixir 1.9, though that was working fine before), and removes support for Elixir 1.4, Elixir 1.5, and OTP 19. I had only planned on removing support for Elixir 1.4 and OTP 19, but a recent change in Rustler made Elixir 1.6 its minimum version. Sorry for any inconvenience that this might cause.

Release v0.13.1

Since the minimum supported version of Erlang/OTP is now 20, it was possible to switch the NIF from working asynchronously to using a dirty scheduler, which simplified the NIF’s implementation and may speed things up (I’ll get an updated version of the Meeseeks vs. Floki bench out soon).