I recently ran into this atom exhaustion vulnerability. In my case, it would have been a major issue because I needed to parse SAML assertions received via a publicly accessible endpoint.
I ended up going a similar route to @Adzz and others by using Saxy and its SimpleForm output.
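For anyone who hasn't seen SimpleForm output before, here is a minimal sketch of what it looks like. The XML payload is a made-up stand-in rather than a real SAML assertion, and the version constraint passed to `Mix.install` is only an assumption; adjust it to whatever you actually use.

```elixir
# Assumption: Saxy is fetched from Hex; the version constraint is illustrative.
Mix.install([{:saxy, "~> 1.5"}])

# Hypothetical payload standing in for an untrusted document.
xml = ~s(<Assertion ID="abc123"><Issuer>https://idp.example.com</Issuer></Assertion>)

# SimpleForm yields {tag, attributes, children}. Tag and attribute names are
# binaries, so attacker-chosen names never turn into atoms.
{:ok, {tag, attrs, children}} = Saxy.SimpleForm.parse_string(xml)

# Attributes are a list of {name, value} tuples keyed by binary name.
{"ID", assertion_id} = List.keyfind(attrs, "ID", 0)

IO.inspect(tag, label: "tag")
IO.inspect(assertion_id, label: "ID attribute")
IO.inspect(children, label: "children")
```

Since everything stays a binary, walking the tree is just pattern matching on tuples and lists, at the cost of having no XPath.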
To help avoid this repeated pain for others, I’ve created the following library, which avoids this atom exhaustion problem and includes the ability to verify XML signatures.
I welcome any community contributions or feedback.
Just wanted to add to this old discussion for anybody coming across it on their search for XML tooling.
I have the case where XPath (including XML Namespaces) is a must (because I already have lots of them, which I am merely porting), and accessing attributes with XPath is also a must (which Meeseeks refuses to do, according to its documentation).
I found a young Rustler NIF binding by @jgwmaxwell (kudos!), which so far seems to tick all my boxes: expath on Hex.
It’s young and at version 0.2, so it might not sound production-ready. On the other hand, it’s just a Rustler binding to a mature library (even though that one is also not 1.0 yet, but that’s quite common in Rust, I believe). Judge for yourself.
In case you are interested: I went the route of compiling our XSD files with :erlsom at compile time (this generates atoms, but we are in control of those files). I then parse the untrusted XML against the resulting model, which feels extremely fast and never creates new atoms. Only then, knowing the XML is good, do I feed it into :xmerl (which is otherwise unsafe regarding atom creation) for XPath evaluation, which supports context nodes, unlike Expath.
Of course, the idea of upfront schema validation before using an arbitrary library only works for predefined shapes of XML data.
P.S.: You should also make sure that your XML Schema Definitions don’t allow xs:any, or at least allow it only with strict checking; the same goes for attributes.
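To illustrate the last step of that pipeline, here is a rough sketch of XPath evaluation with :xmerl, including attribute access relative to a context node. It uses only OTP, and the sample document is invented. The schema-validation step is not shown: the sketch simply assumes the input has already passed it.

```elixir
# Assumption: this XML has already passed schema validation. :xmerl_scan
# creates atoms for element and attribute names, so unchecked input does not
# belong here.
xml = ~c(<order id="42"><item sku="a1">Tea</item><item sku="b2">Milk</item></order>)

# :xmerl works on charlists, hence the ~c sigils throughout.
{doc, _rest} = :xmerl_scan.string(xml)

# An absolute path evaluated against the whole document returns a node list.
items = :xmerl_xpath.string(~c"/order/item", doc)

# A relative path evaluated with each <item> as the context node. Wrapping
# the attribute in string() makes :xmerl return an xmlObj record, which is a
# plain {:xmlObj, type, value} tuple.
skus =
  for item <- items do
    {:xmlObj, :string, sku} = :xmerl_xpath.string(~c"string(@sku)", item)
    List.to_string(sku)
  end

# Attribute access from the document root works the same way.
{:xmlObj, :string, order_id} = :xmerl_xpath.string(~c"string(/order/@id)", doc)

IO.inspect(skus, label: "skus")
IO.inspect(List.to_string(order_id), label: "order id")
```

Matching on the xmlObj tuple avoids pulling in the record definitions from xmerl.hrl; if you need the attribute nodes themselves, `Record.extract/2` on that header is the usual way to get at their fields.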