Any well established methods of handling XML data from / to PostgreSQL xml columns using Ecto? How do Elixir veterans do it? If they do, of course…
AFAICT, nothing in the ecosystem today (neither Postgrex nor its extensions/peers) supports Postgres xml types, functions, or queries.
For types, you can use a custom Ecto.Type to handle serialization between PG and Elixir. For DDL, I’d just write my migrations as straight SQL strings executed via Repo.query!, rather than through the DSL.
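A minimal sketch of such a type, assuming the column can already reach Elixir as a plain binary (the module name is hypothetical):

```elixir
Mix.install([{:ecto, "~> 3.0"}])

defmodule MyApp.Types.XML do
  # Hypothetical Ecto.Type treating the xml column as an opaque binary.
  use Ecto.Type

  # Underlying database representation; the xml column itself still
  # needs driver support (or a Postgrex extension) to arrive as text.
  def type, do: :string

  # Accept anything already in binary form, reject the rest.
  def cast(xml) when is_binary(xml), do: {:ok, xml}
  def cast(_), do: :error

  # Loading from and dumping to the database are pass-throughs.
  def load(xml) when is_binary(xml), do: {:ok, xml}

  def dump(xml) when is_binary(xml), do: {:ok, xml}
  def dump(_), do: :error
end
```

In a schema you’d then declare `field :body, MyApp.Types.XML`.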
For functions and queries, you’re going to want to use fragments. Essentially, escaping the pure data-modelling of Ecto to insert raw SQL into an otherwise safely Ecto-built query.
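For example (a sketch against a hypothetical "documents" table with an xml column "body"; `xpath()` here is the Postgres function, not anything Elixir-side):

```elixir
Mix.install([{:ecto, "~> 3.0"}])

import Ecto.Query

# Schemaless query: the fragment injects raw SQL (the Postgres
# xpath() call) into an otherwise Ecto-built, parameterized query.
query =
  from d in "documents",
    where: fragment("cardinality(xpath(?, ?)) > 0", "/order/item", d.body),
    select: d.id
```

The query still binds `"/order/item"` as a parameter, so the usual injection safety is preserved; only the SQL shape inside the fragment is opaque to Ecto.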
This is similar to what I do in my python projects that use XML columns, and it works fairly well. Our use-case is write-once query-occasionally so it’s fairly trivial: we rarely need to read the full document into memory once inserted, and never need to update it.
More integrated functionality would require more work. Perhaps a library calling out to be made?
As the efficiency maniac that I will always be, I’d use Erlang’s :xmerl even if I know no XML content would ever be more than 20KB.
But I am not aware of a library that directly adds a custom Ecto.Type for it, no.
Yeah, I was afraid that might be the answer. I looked around a bit but didn’t find anything. I guess the option is to somehow treat the column as text and then do some string parsing, etc. Although even the choice of parsers/builders (especially the latter) seems limited: either a wrapper around :xmerl (sweet_xml, was it?) without schema support, or SAX parsers with other limitations. Nothing really full-fledged so far.
I don’t know how expansive the choice needs to be, really: parsing and building are pretty straightforward until you need SAX, and even then you just use a SAX parser if you’re memory constrained. I’ve built a few XML-handling things in Elixir just fine with sweet_xml and xml_builder. I usually end up with something like this as my main entrypoint, with application-specific helpers for common workflows, and submodules for specific domains (e.g. XML.SOAP).
The XML module:

```elixir
defmodule XML do
  defmacro __using__(_options \\ []) do
    quote do
      import XML, only: [sigil_x: 1, sigil_x: 2]
    end
  end

  @type document() :: SweetXml.doc()
  @type element() :: SweetXml.xmlElement()

  ####
  # XML builder functions
  ##

  defdelegate document(elements), to: XmlBuilder
  defdelegate document(name, attrs_or_content), to: XmlBuilder
  defdelegate document(name, attrs, content), to: XmlBuilder
  defdelegate element(name), to: XmlBuilder
  defdelegate element(name, attrs), to: XmlBuilder
  defdelegate element(name, attrs, content), to: XmlBuilder

  ####
  # XML parser functions
  ##

  defdelegate generate(any, options \\ []), to: XmlBuilder
  defdelegate parse(document, options \\ []), to: SweetXml
  defdelegate sigil_x(path, modifiers \\ []), to: SweetXml
  defdelegate xpath(parent, spec, subspec \\ []), to: SweetXml
end
```
That really depends on what you’re doing with the XML. If you’re looking for feedback, share some details about the problem you’re trying to solve with us!
In short the case at hand is:
The columns are of xml type and I cannot change it. I can imagine reading and processing / parsing out data from strings. I can also imagine “shelling out” to xmllint or something to do the validations, etc. But it would be “nice to have” it all in some Elixir lib or so.
P.S. No, it is not SOAP I am doing there
Yup, in that case I’d make a custom Ecto.Type for serializing/deserializing the xml column to and from an Elixir binary, and look into the stdlib’s :xmerl_xsd for XSD validation. You could then reach for sweet_xml if you want to use XPath to extract data, or just work with the stdlib’s :xmerl parsing. The xml_builder library should help you build responses, which you can pass back through :xmerl_xsd for validation, and a properly implemented Ecto.Type should be capable of serializing the resulting binaries back into Postgres xml columns.
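The :xmerl_scan / :xmerl_xsd part of that, sketched (the schema path is a hypothetical stand-in; :xmerl ships with OTP, so no extra dependency):

```elixir
xml = "<greeting>hello</greeting>"

# :xmerl works on charlists; parse the binary first.
{doc, _rest} = :xmerl_scan.string(String.to_charlist(xml))

# With a real XSD on disk, you would then validate the parsed element:
#
#   {:ok, state} = :xmerl_xsd.process_schema(~c"priv/schemas/greeting.xsd")
#
#   case :xmerl_xsd.validate(doc, state) do
#     {:error, reasons}    -> {:error, reasons}
#     {validated, _global} -> {:ok, validated}
#   end

# The parsed document is an :xmlElement record (a plain tuple
# whose second element is the tag name).
:xmlElement = elem(doc, 0)
```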
Truly, you are blessed.
Roger, thank you. I’ll have to sculpt something out of it. I didn’t say it’s not possible, but that nothing I found is “full-fledged” as in “handles all, both ways”. Things are scattered, and there are bits and pieces here and there. Even with your suggestions – where you surely have done this type of thing before – it’s still three different libs, including an Erlang one. I’d dream of something like e.g. JSON.decode, JSON.encode, as XML.de/encode, augmented with XML.validate and an XML Ecto type. Anyway - it is what it is.
Hehe… Yes, I know. Been there, seen that. Not in Elixir though
> I didn’t say it’s not possible but that nothing I found is “full-fledged” as in “handles all both ways”. Things are scattered, and there are bits and pieces here and there – it’s still three different libs, including an Erlang one. I’d dream of something like e.g. JSON.decode, JSON.encode, as XML.de/encode, augmented with XML.validate and an XML Ecto type.
I agree it’s scattered, but in practice those bits and pieces are well battle-tested, so it becomes hard to rationalize a thin wrapper around them just for convenience’s sake (when you can be production ready with a handful of function calls, or defdelegates if you want to make your own holistic interface).
Not saying we shouldn’t make a nice convenient wrapper—see, for example, the widely popular ets package—just encouraging you to articulate where you could see such a library substantially adding value, enough to rationalize a new dependency.
That kind of shopping list is what motivates people to break ground and create a new library.
I myself have thought about it before, for example, but ultimately the only two compelling differentiating features I’ve been able to come up with would be an Ecto.Type and streamlining SOAP handling specifically, both of which work just fine as their own stand-alone libraries and do not benefit substantially from being tightly integrated with the other bits and pieces of validation, streaming, building, etc.
The other attractive proposition for a holistic wrapper library would be to improve upon the Elixir ergonomics of working with the erlang libraries’ records. However, since polymorphism isn’t particularly useful in this domain outside of tapping into existing protocols—specifically Inspect and Enumerable—I’ve found that motivation to be sufficiently lacking to build out a library for that purpose alone.
One day something may tip that scale, and conversations like these are what cause me to mix new in my ~/code/oss/elixir folder more than anything.
I am joining @christhekeele here – f.ex. using :xmerl is maddening because, while it has a nice low-level streaming interface (among others!), 95% of the code is exactly the same every time… but the other 5% varies every frakkin time. This really makes it very difficult to extract libraries out of it. I entertained the idea but ultimately concluded it would add too little value: it would basically be just one instance of a SAX parser with the ability to pass “callbacks” to it… something everyone can roll in literally one hour, maximum.
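For the record, that boilerplate-plus-callback shape with OTP’s :xmerl_sax_parser looks roughly like this (collecting element names as a stand-in for real work):

```elixir
xml = "<order><item/><item/></order>"

# The "callback": pattern-match the events you care about, thread an
# accumulator through everything else unchanged.
handler = fn
  {:startElement, _uri, local_name, _qname, _attrs}, _location, acc ->
    [List.to_string(local_name) | acc]

  _event, _location, acc ->
    acc
end

{:ok, names, _rest} =
  :xmerl_sax_parser.stream(xml, event_fun: handler, event_state: [])

Enum.reverse(names)
# => ["order", "item", "item"]
```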
FWIW - in relation to working with PostgreSQL XML columns’ records - this comment is a “saviour”
Is there a way to just get the binary data for XML from postgres (that is, treat…
I’m about to deal with XML in my postgres driver, currently it just returns as a string automatically, but I wonder if you preferred it to be structured.
> I wonder if you preferred it to be structured
I am not 100% sure I understand correctly, but in the end XML being a string is of little value. It’s only a string representation of some structure, so at some point I always need it “structured” in the sense I wrote above some time ago:
- I need to retrieve it and validate it against an XSD schema
- IF it is valid given the schema, then:
  - decompose (parse) it into Elixir terms
  - process some data from the parsed document
  - […]
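Strung together, those steps could look something like this (purely a sketch; the module name is hypothetical, extract_data is a placeholder for the application-specific processing, and xsd_state comes from :xmerl_xsd.process_schema/1):

```elixir
defmodule MyApp.XMLPipeline do
  # Hypothetical pipeline: parse, validate against an already-processed
  # XSD state, then pull data out of the validated document.
  def handle(xml_binary, xsd_state) do
    {doc, _rest} = :xmerl_scan.string(String.to_charlist(xml_binary))

    case :xmerl_xsd.validate(doc, xsd_state) do
      {:error, reasons} -> {:error, reasons}
      {validated, _global} -> {:ok, extract_data(validated)}
    end
  end

  # Placeholder for the "process some data" step.
  defp extract_data(element), do: element
end
```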
Yeah, I think I misunderstood your comment and the issue.
I understood the issue as just wanting the XML to be decoded as a string. There is not a lot of value in that if you want to do further validation in your application code; that was what I was trying to refer to with “structured”, where I’m thinking Elixir terms. It could be an XML AST.
The post about the “saviour” comment on GitHub was about the fact that even if you create a custom Ecto type, you still cannot access XML data (even as a string) from a PostgreSQL column of xml type: you’ll get “type ‘xml’ can not be handled” errors. The solution is to add a Postgrex.Extension implementation that treats it as text, I presume. Not something exceptionally elegant, I’d say (if I understood all of it).
And - TBH - the quote of the author there:
> I don’t grok some of the quote stuff yet or matching on the encoded binaries, so I’m not sure if this is right, but it works
applies directly to me too
IOW - it was about getting the XML data out of the column “at all” and hence as string.
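For completeness, the Postgrex.Extension approach discussed above can be sketched like this, modeled on the ltree example in Postgrex’s own docs. The wire-format details are assumptions on my part; verify against the linked issue before relying on them:

```elixir
Mix.install([{:postgrex, "~> 0.17"}])

defmodule MyApp.PostgrexXML do
  @behaviour Postgrex.Extension

  def init(opts), do: opts

  # Match the Postgres "xml" type so it is no longer unhandled.
  def matching(_state), do: [type: "xml"]

  # Treat the value as text on the wire.
  def format(_state), do: :text

  # Quoted encode/decode clauses, as the Postgrex.Extension behaviour
  # expects: a length-prefixed binary carrying the XML text itself.
  def encode(_state) do
    quote do
      bin when is_binary(bin) -> [<<byte_size(bin)::signed-size(32)>> | bin]
    end
  end

  def decode(_state) do
    quote do
      <<len::signed-size(32), bin::binary-size(len)>> -> bin
    end
  end
end

# Registering the extension (done at compile time in a real app):
#
#   Postgrex.Types.define(MyApp.PostgrexTypes, [MyApp.PostgrexXML], [])
#
# then configure the Repo with `types: MyApp.PostgrexTypes`.
```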