Meeseeks - A library for extracting data from HTML and XML with CSS or XPath selectors



Release v0.8.0

This release:

  • Ensures Elixir 1.6 compatibility
  • Adds a .formatter.exs and formats the project
  • Fixes some typespec errors, thanks to @OvermindDL1 for raising the issue
  • Adds Document.delete_node/2, thanks to @willbarrett for the contribution
  • Adds get_root_ids/1, get_node_ids/1, and fetch_node/2 to Document
  • Improves the safety of many Document functions by raising when node_id does not exist in the Document (previously they might have raised or might have handled the problem gracefully)

Special thanks to Will, who becomes the first code contributor other than myself.


Looking for feedback as to whether I should add Meeseeks.fetch_one and Meeseeks.fetch_all, and what the :error value should look like if I do.


@mischov Is there an easy way to get an XPath for a found node based on that node’s node_id? I simply need to get the path for a given text, then get the element at that path from another page to check whether they match. All the info I need is there, but to be honest I don’t want to reinvent the wheel :wink:

I was unable to find any such functionality, so I’m curious whether you would accept a PR for it: getting an XPath based on the id of a node. I’m not saying I will find time to write it soon, just exploring options here :wink:


Ha! That’s an interesting one!

No, that functionality doesn’t exist in Meeseeks.

I would be open to a PR for that functionality (probably as a Document function, but perhaps as an extractor too).


Cool, I’ll keep that in mind when writing stuff for the app I need to write.


Release v0.9.0


Prior to v0.9.0, errors in Meeseeks were all over the place: sometimes functions returned {:error, string} or :error, sometimes they raised RuntimeErrors or ArgumentErrors or one of an assortment of custom Meeseeks exceptions.

To combat this, I have added a Meeseeks.Error struct that implements the Exception behaviour and used it throughout the library.

I go into more details about the rationale and implementation in this issue, but the quick takeaway is that this kind of error struct is flexible, plays nicely with pattern matching in places like case and with, and makes it easier to provide useful errors to users.
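As a rough illustration of why this kind of struct is convenient, here is a minimal sketch. SketchError and SketchLookup are hypothetical stand-ins written for this example, not the actual Meeseeks.Error implementation; only the :type and :reason field names are borrowed from this thread.

```elixir
defmodule SketchError do
  # Hypothetical stand-in for a library error struct (NOT Meeseeks.Error itself).
  # defexception makes the struct implement the Exception behaviour.
  defexception [:type, :reason]

  @impl true
  def message(%__MODULE__{type: type, reason: reason}) do
    "#{type} error: #{inspect(reason)}"
  end
end

defmodule SketchLookup do
  # Returns {:ok, value} or {:error, %SketchError{}} instead of a bare :error.
  def fetch(map, key) do
    case Map.fetch(map, key) do
      {:ok, value} -> {:ok, value}
      :error -> {:error, %SketchError{type: :lookup, reason: :no_match}}
    end
  end

  # The very same struct can be raised, because it is an exception.
  def fetch!(map, key) do
    case fetch(map, key) do
      {:ok, value} -> value
      {:error, error} -> raise error
    end
  end
end

# Pattern matching can be as specific or as general as the caller needs.
result =
  case SketchLookup.fetch(%{a: 1}, :b) do
    {:ok, value} -> value
    {:error, %SketchError{type: :lookup, reason: :no_match}} -> :nothing_found
    {:error, %SketchError{} = error} -> raise error
  end

IO.inspect(result)
```

The point of the sketch is that one struct serves both styles: callers who prefer tuples match on its fields, and callers who prefer exceptions can raise or rescue it.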

This is a breaking change because it modifies the returned or raised type of errors. If your Meeseeks-related code handles {:error, ???} or catches one of the old Meeseeks exception types, you will need to make changes.

I apologize for the inconvenience, but this change should lead to safer, more friendly code in the future.

Meeseeks.fetch_all and Meeseeks.fetch_one

The more that I use Elixir in anger, the more I appreciate functions that return {:ok, ...} or {:error, ...}.

In light of the feedback I received on this issue I decided to add Meeseeks.fetch_all and Meeseeks.fetch_one, which work like Meeseeks.all and Meeseeks.one respectively, but wrap the result in {:ok, ...} if there is a match or return {:error, %Meeseeks.Error{type: :select, reason: :no_match}} if there is not.

Now it’s easier to write code like:

with {:ok, qt} <- Meeseeks.fetch_one(doc, css(".qt")) do
  Meeseeks.text(qt)
else
  {:error, %Meeseeks.Error{type: :select, reason: :no_match}} -> nil
end

My thanks to those who provided feedback.


A bug related to Meeseeks.html was fixed; see this issue for more details.


Just popping in to say that I’m still loving Meeseeks. I think I’ve used every selector it has (and made one as well) for parsing large amounts of both HTML and XML. ^.^


Release v0.9.1

A small release fixing a couple bugs and some typespec problems, primarily thanks to work by @asonge.

The first bug fix is that Document.get_nodes/1 now raises instead of adding a nil to the returned nodes if a node is - impossibly - not found in the document.

The second bug fix is that Document.get_nodes/2 now actually works right.


Release v0.9.2

Super tiny update to allow the css and xpath macros to accept vars.

iex> import Meeseeks.XPath
iex> path = "//li[last()]"
iex> xpath(path)

It is worth noting, however, that using a var (or string interpolation) in the css or xpath macros moves the creation of the selector to run time, while using a static string literal allows it to be created at compile time. If your use case permits, prefer xpath("//li[last()]").
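To make the compile time versus run time distinction concrete, here is a minimal sketch of how a macro can treat a string literal differently from a variable. SketchSelector, its sel macro, and its compile function are hypothetical and written only for this example; this is an assumption about the general technique, not the actual implementation of the css or xpath macros.

```elixir
defmodule SketchSelector do
  # Hypothetical stand-in for an expensive selector compilation step.
  def compile(string) do
    String.split(string, "/", trim: true)
  end

  # A string literal reaches the macro as a plain binary, so the work can be
  # done during macro expansion and the result embedded in the caller's code.
  defmacro sel(literal) when is_binary(literal) do
    Macro.escape(compile(literal))
  end

  # Anything else (a variable, an interpolated string) is still an AST at this
  # point, so the macro can only emit a call that will run later, at run time.
  defmacro sel(expr) do
    quote do
      SketchSelector.compile(unquote(expr))
    end
  end
end

defmodule SketchUse do
  require SketchSelector

  # Compiled once, when SketchUse itself is compiled.
  def static, do: SketchSelector.sel("//li/a")

  # Compiled on every call, because the value is only known at run time.
  def dynamic(path), do: SketchSelector.sel(path)
end

IO.inspect(SketchUse.static())
IO.inspect(SketchUse.dynamic("//li/a"))
```

Both calls produce the same value; the difference is only in when the compilation work happens.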


Release v0.9.3

This release fixes a Dialyzer-related problem identified by @sztosz and correctly diagnosed by @NobbZ. Thanks for your help.


Release v0.9.4

This release fixes some XPath selection bugs discovered by @anulman.


Release v0.9.5

This release fixes another selection bug, again discovered by @anulman.


Lol, awesome finds by @anulman, and a great update rate as always. This explains why my bot got update notifications, thanks much!


Release v0.10.0

This release adds support for OTP 21.


Release v0.10.1

This is a very minor release adding “support” for Elixir 1.7. In truth it’s been working fine with 1.7 this whole time, but now Travis CI ensures that fact.

In addition to that, I added a bunch of older Elixir+OTP combinations to be tested by Travis CI. Meeseeks started out on Elixir 1.3 and OTP 19 (a combination on which it still runs fine, thanks to the awesome Elixir team), and rather than just testing that and the latest combination I now also test some of the past combinations between those two.