Meeseeks - A library for extracting data from HTML and XML with CSS or XPath selectors


#41

Release v0.8.0

This release:

  • Ensures Elixir 1.6 compatibility
  • Adds a .formatter.exs and formats the project
  • Fixes some typespec errors, thanks to @OvermindDL1 for raising the issue
  • Adds Document.delete_node/2, thanks to @willbarrett for the contribution
  • Adds get_root_ids/1, get_node_ids/1, and fetch_node/2 to Document
  • Improves the safety of many Document functions by raising when node_id does not exist in the Document (previously, they might have raised or might have handled the problem gracefully)
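
As a rough illustration of the new Document functions listed above, here is a minimal sketch. It assumes :meeseeks is already a dependency of your Mix project; the HTML snippet and variable names are made up for the example, and the exact shape of the non-:ok result from fetch_node/2 is not assumed, so it is matched with a catch-all.

```elixir
# Assumes :meeseeks is a dependency of the surrounding Mix project.
alias Meeseeks.Document

# Illustrative input only.
html = "<div id=\"main\"><p>Hello</p><p>World</p></div>"
document = Meeseeks.parse(html)

# IDs of the top-level nodes and of every node in the document.
root_ids = Document.get_root_ids(document)
node_ids = Document.get_node_ids(document)

# fetch_node/2 returns a tagged tuple instead of raising,
# so a missing node_id can be handled in a case or with.
case Document.fetch_node(document, hd(node_ids)) do
  {:ok, node} -> IO.inspect(node, label: "found")
  _other -> IO.puts("no such node")
end

# delete_node/2 returns a new document without the given node.
smaller = Document.delete_node(document, List.last(node_ids))
IO.inspect(length(Document.get_node_ids(smaller)), label: "node count after delete")
```

The other Document functions that take a node_id now raise for an unknown ID, so fetch_node/2 is the one to reach for when the ID may not exist.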

Special thanks to Will, who becomes the first code contributor other than myself.


#42

Looking for feedback as to whether I should add Meeseeks.fetch_one and Meeseeks.fetch_all, and what the :error value should look like if I do.


#43

@mischov Is there an easy way to get the XPath of a found node based on its node_id? I need to get the path for a given piece of text, then fetch the element at that path on another page to check whether the two match. All the info I need is there, but to be honest I don’t want to reinvent the wheel :wink:

I was unable to find any such functionality, though. If it doesn’t exist, I’m curious whether you would accept a PR that gets the XPath of a node based on its ID. I’m not saying I will find the time to write it soon, just exploring options here :wink:


#44

Ha! That’s an interesting one!

No, that functionality doesn’t exist in Meeseeks.

I would be open to a PR for that functionality (probably as a Document function, but perhaps as an extractor too).
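
For anyone exploring the idea, one possible approach is sketched below. It is only a sketch: it runs on a hypothetical, simplified map of nodes (id => tag, parent, children), not on Meeseeks’ actual Document struct, and the XPathSketch module and path_for/2 are made-up names. It walks from the node up to the root and records each element’s position among its same-tag siblings.

```elixir
defmodule XPathSketch do
  # Hypothetical helper, not part of Meeseeks.
  # `nodes` is a simplified map: id => %{tag: ..., parent: id | nil, children: [id]}.

  # Builds an absolute, position-indexed XPath for the element `id`.
  def path_for(nodes, id) do
    "/" <> (nodes |> steps(id, []) |> Enum.join("/"))
  end

  # Walk from the node up to the root, prepending one step per ancestor.
  defp steps(_nodes, nil, acc), do: acc

  defp steps(nodes, id, acc) do
    node = Map.fetch!(nodes, id)
    steps(nodes, node.parent, [step(nodes, id, node) | acc])
  end

  # The root element has no siblings, so it needs no position.
  defp step(_nodes, _id, %{parent: nil, tag: tag}), do: tag

  # Otherwise a step is the tag plus its 1-based position among
  # the parent's children that share the same tag.
  defp step(nodes, id, %{parent: parent_id, tag: tag}) do
    same_tag_siblings =
      nodes
      |> Map.fetch!(parent_id)
      |> Map.fetch!(:children)
      |> Enum.filter(fn child_id -> Map.fetch!(nodes, child_id).tag == tag end)

    position = Enum.find_index(same_tag_siblings, &(&1 == id)) + 1
    "#{tag}[#{position}]"
  end
end

# Illustrative data only.
nodes = %{
  1 => %{tag: "html", parent: nil, children: [2]},
  2 => %{tag: "body", parent: 1, children: [3, 4, 5]},
  3 => %{tag: "p", parent: 2, children: []},
  4 => %{tag: "div", parent: 2, children: []},
  5 => %{tag: "p", parent: 2, children: []}
}

IO.puts(XPathSketch.path_for(nodes, 5))
# prints "/html/body[1]/p[2]"
```

A real implementation would also have to skip text and comment nodes when counting siblings and decide how to handle namespaces.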


#45

Cool, I’ll keep that in mind when writing stuff for the app I need to write.


#46

Release v0.9.0

Meeseeks.Error

Prior to v0.9.0, errors in Meeseeks were all over the place: sometimes functions returned {:error, string} or :error, and sometimes they raised RuntimeErrors, ArgumentErrors, or one of an assortment of custom Meeseeks exceptions.

To combat this, I have added a Meeseeks.Error struct that implements the Exception behaviour and used it throughout the library.

I go into more details about the rationale and implementation in this issue, but the quick takeaway is that this kind of error struct is flexible, plays nicely with pattern matching in places like case and with, and makes it easier to provide useful errors to users.
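
To make the pattern concrete, here is a minimal, self-contained sketch of an error struct that doubles as an exception. Demo.Error and the Demo module are hypothetical stand-ins that only mirror the :type and :reason shape described here; they are not Meeseeks.Error or its implementation.

```elixir
defmodule Demo.Error do
  # Hypothetical stand-in, not Meeseeks.Error itself.
  # defexception makes the struct usable with raise/rescue.
  defexception [:type, :reason]

  # Required because the struct has no :message field.
  def message(%__MODULE__{type: type, reason: reason}) do
    "#{type} error: #{inspect(reason)}"
  end
end

defmodule Demo do
  alias Demo.Error

  # Tagged-tuple variant: callers can match on the struct's fields.
  def fetch(map, key) do
    case Map.fetch(map, key) do
      {:ok, value} -> {:ok, value}
      :error -> {:error, %Error{type: :select, reason: :no_match}}
    end
  end

  # Raising variant: the same struct is raised as an exception.
  def fetch!(map, key) do
    case fetch(map, key) do
      {:ok, value} -> value
      {:error, error} -> raise error
    end
  end
end

# Matching on the error's fields in a case.
result =
  case Demo.fetch(%{a: 1}, :b) do
    {:ok, value} -> value
    {:error, %Demo.Error{reason: :no_match}} -> :nothing_found
  end

# Rescuing the same struct when it is raised.
rescued =
  try do
    Demo.fetch!(%{a: 1}, :b)
  rescue
    e in Demo.Error -> Exception.message(e)
  end

IO.inspect(result)   # prints :nothing_found
IO.puts(rescued)     # prints "select error: :no_match"
```

One struct serves both styles, so callers can match on :type and :reason whether the error was returned or raised.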

This is a breaking change because it modifies the returned or raised type of errors. If your Meeseeks-related code handles {:error, ???} or catches one of the old Meeseeks exception types, you will need to make changes.

I apologize for the inconvenience, but this change should lead to safer, more friendly code in the future.

Meeseeks.fetch_all and Meeseeks.fetch_one

The more that I use Elixir in anger, the more I appreciate functions that return {:ok, ...} or {:error, ...}.

In light of the feedback I received on this issue, I decided to add Meeseeks.fetch_all and Meeseeks.fetch_one, which work like Meeseeks.all and Meeseeks.one respectively, but wrap the result in {:ok, ...} if there is a match or return {:error, %Meeseeks.Error{type: :select, reason: :no_match}} if there is not.

Now it’s easier to write code like:

with {:ok, qt} <- Meeseeks.fetch_one(doc, css(".qt")) do
  ...
else
  {:error, %Meeseeks.Error{type: :select, reason: :no_match}} ->
    ...
end

My thanks to those who provided feedback.

Other

A bug related to Meeseeks.html was fixed; see this issue for more details.


#47

Just popping in to say that I’m still loving meeseeks. I think I’ve used every selector it has (and made one as well) for parsing large amounts of both HTML and XML. ^.^