XML Parsing, Building, and Validating

I have been looking at the available Elixir/Erlang libraries for working with XML and cannot seem to find something that can do everything Nokogiri (Ruby) does.

I have narrowed it down to sweet_xml for parsing XML and erlsom for validating XML. I could use xml_builder to build XML, or even erlsom. I have not found a library other than erlsom that does all three: parsing, building, and validating. I am also an Elixir newbie, so I could be missing something.

What would you all suggest? Should I use a combination of libraries as mentioned above, or is there something else out there I’m missing?

I had a similar issue. I need to parse complicated XML into many hierarchical structs (to provide a good abstraction and functions on them), and I also need to update XML (which really means creating a new copy with the changes).

I’m using sweet_xml to parse into a hierarchy, but I think I will have to deal with the underlying Erlang library anyway, probably using the xmerl format rather than erlsom, since xmerl appears to be a built-in Erlang app and is probably accepted by more libraries.

But yes, I miss Nokogiri-style DOM manipulation and easy output (to_xml).

@chulkilee

Does xmerl do XML schema validation? I briefly looked at it, but did not see that it did.

Oh, I don’t need XML validation, so I didn’t check that out.

However, a quick Google search turns up xmerl_xsd, which should accept the xmerl format :slight_smile:
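
For anyone landing here, this is a rough sketch of how validation with xmerl_xsd might look from Elixir. The schema and the two sample documents are made up for illustration, and the temp file name is arbitrary. As far as I can tell, `:xmerl_xsd.process_schema/1` expects a file name rather than a string, so the sketch writes the schema to a temp file first.

```elixir
# Sketch only: the schema and sample documents are invented for illustration.
xsd = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# process_schema/1 reads the schema from a file, so write it to a temp file.
# An absolute path is used, and xmerl wants charlists rather than binaries.
xsd_path = Path.join(System.tmp_dir!(), "note_example.xsd")
File.write!(xsd_path, xsd)
{:ok, schema} = :xmerl_xsd.process_schema(String.to_charlist(xsd_path))

validate = fn xml ->
  # Parse the document into xmerl records, then validate against the schema.
  {doc, _rest} = :xmerl_scan.string(String.to_charlist(xml))

  case :xmerl_xsd.validate(doc, schema) do
    # The error clause must come first: success is also a 2-tuple.
    {:error, reasons} -> {:error, reasons}
    {_valid_doc, _state} -> :ok
  end
end

IO.inspect(validate.("<note><to>Ann</to></note>"))
IO.inspect(validate.("<note><from>Bob</from></note>"))
```

In a Mix project you would probably also need `:xmerl` under `extra_applications` so it is included in releases.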

If you don’t need validation and the XML can all fit in memory, then the Meeseeks library has a FANTASTIC query interface into XML that puts anything else I’ve used so far to shame. :slight_smile:

So would that be even better than sweet_xml? I will still need validation and the ability to build XML, but I could just use xmerl for that part.
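
For the building part, a minimal sketch of what xmerl's "simple" export might look like from Elixir. The element and attribute names are made up; the data is the `{tag, attributes, content}` tuple form, with text given as charlists.

```elixir
# Sketch only: element names and values are invented for illustration.
# Each element is {tag, attributes, content}; text nodes are charlists.
simple = [
  {:note, [id: ~c"1"],
   [
     {:to, [], [~c"Ann"]},
     {:body, [], [~c"Hello"]}
   ]}
]

# export_simple/2 returns nested charlists, so flatten it into a binary.
xml =
  simple
  |> :xmerl.export_simple(:xmerl_xml)
  |> IO.chardata_to_string()

IO.puts(xml)
```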

I’ll keep looking into this and post back if I find some new info.

Yes, Meeseeks is very nice! I recently used it for a small scraper and it was faster and easier to use than anything else I looked at.

Note that it’s fast because it uses a NIF to wrap the Rust library html5ever. I installed Rust (using asdf), added Meeseeks to my deps and it just worked. But it’s good to be aware of the extra dependency.

This is also the case with Scrape, and it actually suffers from some compilation issues because it has to compile Rust in a rather brittle way. The only way I could make it work was to go into the deps manually and run a cargo command.

Still good, though.

How did you set up sweet_xml to parse XML into a hierarchy of structs?

I’m trying to do it using the @schema, and I don’t know how to nest more complicated structs.

Check out https://github.com/chulkilee/ex_scems.

Basically, I use SweetXml.transform_by/2 and then pass the resulting map into struct!/2.

I wrote it a while ago, so I might do it a different way now though :slight_smile:
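
Here is a rough sketch of that idea, not the actual ex_scems code. `Person`, `Address`, `PersonParser`, and the XML shape are hypothetical, and the `Mix.install` line (with an assumed version requirement) is only there so it runs as a standalone script; in a project you would add sweet_xml to your deps instead.

```elixir
# Standalone-script setup only; the version requirement is an assumption.
Mix.install([{:sweet_xml, "~> 0.7"}])

# Hypothetical structs for illustration.
defmodule Address do
  defstruct [:city]
end

defmodule Person do
  defstruct [:name, :age, :address]
end

defmodule PersonParser do
  import SweetXml

  # Returns a list of %Person{} structs, each with a nested %Address{}.
  def parse(xml) do
    xml
    |> xpath(
      ~x"//person"l,
      # transform_by/2 post-processes the extracted leaf value.
      name: ~x"./name/text()"s |> transform_by(&String.trim/1),
      age: ~x"./age/text()"i,
      # A nested keyword list yields a nested map.
      address: [~x"./address", city: ~x"./city/text()"s]
    )
    |> Enum.map(fn attrs ->
      # struct!/2 raises if the map has keys the struct does not define.
      struct!(Person, %{attrs | address: struct!(Address, attrs.address)})
    end)
  end
end

xml = """
<people>
  <person><name> Ann </name><age>30</age><address><city>Oslo</city></address></person>
</people>
"""

people = PersonParser.parse(xml)
IO.inspect(people)
```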

Thanks!

So… it’s been some time since the last post here :slight_smile: I have some XML to process in both directions, with XSD support, and I have a problem with external resources. The XSD uses type definitions referenced by external URL (https:// …). When I try to process the XSD (using :xmerl_xsd.process_schema/1), I get an :enoent error related to those external resources.

Is there a way to make xmerl[_xsd] fetch them as needed?

I suspect you may need to supply your own fetch_fun option to xmerl_xsd - the default one only matches http: URIs:

Thank you, although I am wondering whether that’s feasible. I mean w/o reimplementing the thing. Is there a place where I could simply provide the fun?

Most public functions in that file take an optional options_list() argument, which can include a fetch_fun:
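
A hedged sketch of what passing a fetch_fun could look like. Everything named here is hypothetical (the `LocalSchemaFetch` module, the URL, the local paths), and the return shapes are my reading of the xmerl_xsd source rather than documented behaviour, so treat them as assumptions and check them against your OTP version. The idea is to download the referenced schemas once and map their URLs to local copies.

```elixir
defmodule LocalSchemaFetch do
  # Hypothetical mapping from remote schema locations to local copies.
  @local_copies %{
    "https://example.com/schemas/types.xsd" => "priv/xsd/types.xsd"
  }

  # Assumption: xmerl_xsd calls the fun as fun(uri, state) and accepts
  # {:ok, {:file, path}, state} (or {:ok, {:string, xml}, state}).
  def fetch(uri, state) do
    case Map.fetch(@local_copies, to_string(uri)) do
      {:ok, path} ->
        {:ok, {:file, path |> Path.expand() |> String.to_charlist()}, state}

      :error ->
        # Assumption: any other 3-tuple is reported as a fetch failure.
        {:error, {:unknown_schema_location, uri}, state}
    end
  end

  # Processes the main schema with the custom fetch_fun option.
  def load(main_xsd_path) do
    main_xsd_path
    |> Path.expand()
    |> String.to_charlist()
    |> :xmerl_xsd.process_schema(fetch_fun: &__MODULE__.fetch/2)
  end
end
```

Usage would be something like `{:ok, schema} = LocalSchemaFetch.load("priv/xsd/main.xsd")`. Fetching over HTTPS inside the fun (for example with `:httpc`) and returning `{:string, body}` should also be possible under the same assumptions, but I have not tried it.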

TNX! Shall explore this path