XML Parsing, Building, and Validating

xml

#1

I have been looking at the available Elixir/Erlang libraries for working with XML and cannot seem to find something that can do everything Nokogiri (Ruby) does.

I have narrowed it down to sweet_xml for parsing XML and erlsom for validating XML. I could use xml_builder or even erlsom to build XML. I have not found a library other than erlsom that does all three: parsing, building, and validating. I am also an Elixir newbie, so I could be missing something.

What would you all suggest? Should I use a combination of libraries as mentioned above, or is there something else out there I’m missing?


#2

I had a similar issue. I need to parse complicated XML into a hierarchy of structs (to provide a good abstraction and functions on them), and I also need to update the XML (which really means creating a new copy with the changes, you know).

I’m using sweet_xml for parsing into the hierarchy, but I think I will have to deal with the underlying Erlang library anyway. That would probably be the xmerl format rather than erlsom, since xmerl looks like a built-in Erlang app and is probably accepted by more libraries.

But yes, I miss Nokogiri-style DOM manipulation and easy output (to_xml).


#3

@chulkilee

Does xmerl do XML schema validation? I briefly looked at it, but did not see that it did.


#4

Oh, I don’t need XML validation, so I didn’t check that out.

However, a quick Google search turns up xmerl_xsd, which should accept the xmerl format :slight_smile:
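
For anyone landing here later, a minimal sketch of what XSD validation through xmerl_xsd could look like from Elixir. The module name `XsdCheck`, the tiny `note` schema, and the temp-file path are all made up for illustration; only the `:xmerl_scan` and `:xmerl_xsd` calls come from OTP. I have not tried this against a large real-world schema, so treat it as a starting point rather than a recipe.

```elixir
defmodule XsdCheck do
  # Hypothetical schema, only here to keep the sketch self-contained.
  @xsd """
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="note">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="to" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>
  """

  # Returns :ok or {:error, reasons}.
  def validate(xml) when is_binary(xml) do
    # :xmerl_xsd.process_schema/1 reads the schema from a file,
    # so the sketch writes it to a temp file first.
    path = Path.join(System.tmp_dir!(), "xsd_check_sketch.xsd")
    File.write!(path, @xsd)

    # xmerl works on charlists, not Elixir binaries.
    {:ok, schema} = :xmerl_xsd.process_schema(String.to_charlist(path))
    {doc, _rest} = :xmerl_scan.string(String.to_charlist(xml))

    case :xmerl_xsd.validate(doc, schema) do
      {:error, reasons} -> {:error, reasons}
      {_validated_doc, _state} -> :ok
    end
  end
end
```

Usage would be something like `XsdCheck.validate("<note><to>Bob</to></note>")`. In a Mix project you may also need `:xmerl` listed under `extra_applications`.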


#5

If you don’t need validation and the XML can all fit in memory, then the Meeseeks library has a FANTASTIC query interface into XML that puts anything else I’ve used so far to shame. :slight_smile:


#6

So would that be even better than sweet_xml? I will still need validation and the ability to build XML, but I could just use xmerl for that part.
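
As a rough sketch of the "build XML with xmerl" part: xmerl has a "simple form" of nested `{tag, attributes, content}` tuples that `:xmerl.export_simple/2` can serialize. The module name `BuildSketch` and the `person` document are invented for the example, and text and attribute values have to be charlists, not binaries.

```elixir
defmodule BuildSketch do
  # Builds a small document from xmerl's "simple form" tuples.
  def person_xml do
    content = [
      {:person, [id: ~c"1"],
       [
         {:name, [], [~c"Ada"]},
         {:city, [], [~c"London"]}
       ]}
    ]

    content
    # :xmerl_xml is the stock callback module for plain XML output.
    |> :xmerl.export_simple(:xmerl_xml)
    # The result is nested chardata, so flatten it into a binary.
    |> IO.chardata_to_string()
  end
end
```

It is nowhere near as pleasant as Nokogiri’s builder, but it needs no extra dependencies.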


#7

I’ll keep looking into this and post back if I find some new info.


#8

Yes, Meeseeks is very nice! I recently used it for a small scraper and it was faster and easier to use than anything else I looked at.

Note that it’s fast because it uses a NIF to wrap the Rust library html5ever. I installed Rust (using asdf), added Meeseeks to my deps, and it just worked. But it’s good to be aware of the extra dependency.


#9

This is also the case with Scrape, and it actually suffers from some compilation issues because it has to compile Rust in a rather brittle way. The only way I could make it work was to go into the deps manually and run a cargo command.

Still good, though.


#10

How did you set up sweet_xml to parse XML into a hierarchy of structs?

I’m trying to do it using the @schema and I don’t know how to nest more complicated structs.


#11

Check out https://github.com/chulkilee/ex_scems.

Basically, I use SweetXml.transform_by/2 and then pass the resulting map into struct!/2.

I wrote it a while ago, so I might do it a different way now, though :slight_smile:
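
A stripped-down sketch of the idea, not the actual code from that repo. The `Person` and `Address` structs, the `PeopleParser` module, and the XML shape are all made up here, it assumes `:sweet_xml` is in your deps, and it assumes every `person` element has an `address` child.

```elixir
defmodule Address do
  defstruct [:city]
end

defmodule Person do
  defstruct [:name, :age, :address]
end

defmodule PeopleParser do
  import SweetXml

  # Turns each <person> element into a %Person{} with a nested %Address{}.
  def parse(xml) do
    xml
    |> xpath(
      ~x"//person"l,
      name: ~x"./name/text()"s,
      # transform_by/2 post-processes the extracted value.
      age: ~x"./age/text()"s |> transform_by(&String.to_integer/1),
      # A nested keyword list yields a nested map.
      address: [~x"./address", city: ~x"./city/text()"s]
    )
    |> Enum.map(fn attrs ->
      # struct!/2 raises if the map has a key the struct does not define.
      struct!(Person, %{attrs | address: struct!(Address, attrs.address)})
    end)
  end
end
```

The inner structs are built first and swapped into the map before the outer `struct!/2` call, and the same pattern repeats for deeper nesting.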


#12

Thanks!