Saxaboom - a port of SaxMachine from Ruby for mapping XML to Elixir data types

ducharmemp · October 9, 2023, 4:59pm

Hi all!

I’m still fairly new to Elixir (my background is mostly Ruby/C#/Rust) and this is my first stab at a semi-useful library, so feedback would be much appreciated!

I created a declarative library for building data mappers that consume XML via strings or streams and return well-defined Elixir structs, optionally with default values, attribute extraction, text extraction, and type casting. It’s also fairly fast at what it does, based on my benchmarks on some large-ish XML documents. If you’ve used SaxMachine with Ruby, it’s almost a 1:1 port, written to see if I could define a similar interface, since I really like the semantics of the SaxMachine DSL. The general gist is that you describe your expected structure and the library extracts that information for you, without you having to specify XPaths or write SAX handlers yourself. This library also tries to be compatible with the 3 (currently) XML parsers that support SAX: erlsom, xmerl_sax_parser, and saxy.

Repo: GitHub - ducharmemp/saxaboom
Hex: saxaboom | Hex

Example:

You can turn:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
</catalog>

Into:

{:ok,
 %Catalog{
   books: [
     %Book{
       id: "bk101",
       author: "Gambardella, Matthew",
       title: "XML Developer's Guide",
       genre: "Computer",
       price: 44.95,
       publish_date: {:ok, ~D[2000-10-01]},
       description: "An in-depth look at creating applications\n      with XML."
     }
   ]
 }}

with the following Saxaboom definition:

defmodule Book do
  use Saxaboom.Mapper

  document do
    element :book, as: :id, value: :id
    element :author
    element :title
    element :genre
    element :price, cast: :float
    element :publish_date, cast: &__MODULE__.parse_date/1
    element :description
  end

  def parse_date(value), do: Date.from_iso8601(value)
end


defmodule Catalog do
  use Saxaboom.Mapper

  document do
    elements :book, as: :books, into: %Book{}
  end
end
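A note on the `publish_date` field in the output above: it holds an `{:ok, date}` tuple because `Date.from_iso8601/1` returns a tagged tuple and the cast result appears to be stored as returned. The sketch below uses only the Elixir standard library to show the difference between the tuple-returning function and its raising counterpart. The module name `DateCastSketch` and the `parse_date!/1` helper are hypothetical and not part of Saxaboom.

```elixir
defmodule DateCastSketch do
  # Same body as Book.parse_date/1 in the post: returns a tagged tuple.
  def parse_date(value), do: Date.from_iso8601(value)

  # Hypothetical alternative: returns the bare Date, raises on bad input.
  def parse_date!(value), do: Date.from_iso8601!(value)
end

tagged = DateCastSketch.parse_date("2000-10-01")
# tagged is {:ok, ~D[2000-10-01]}

bare = DateCastSketch.parse_date!("2000-10-01")
# bare is ~D[2000-10-01]

invalid = DateCastSketch.parse_date("not a date")
# invalid is {:error, :invalid_format}
```

Assuming a cast function's return value is stored unchanged, pointing `cast:` at a raising variant would presumably leave a plain `Date` in the struct, at the cost of an exception on malformed dates.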

cblavier · October 9, 2023, 6:05pm

I don’t know anything about your use case, I’m just stopping by to say I love your project name

Good luck with your project!

ducharmemp · October 9, 2023, 8:43pm

Appreciate it very much! I updated the description because on reflection I did realize that I was a bit lacking on the “what the heck does this do” part of the library. Hopefully my examples clear it up for folks!

re: the name, I’m actually blown away that I was able to grab it

ducharmemp · October 16, 2023, 4:39pm

Happy to announce that v0.2.1 has been released!

hex.pm
docs

This release introduces a new attribute directive for extracting attributes from document nodes. This macro is helpful to extract multiple attributes from a single node, while the element/2 and elements/2 macros only allow for the extraction of a single attribute from a node.

This release also includes improvements to documentation to clarify parsing/data mapping semantics.

Adzz · October 16, 2023, 5:28pm

Nice, I made a very similar thing before too:

Always interesting to see how others approach the same problem

dimitarvp · October 16, 2023, 6:23pm

I love how your latest commit message is just “sure”.

Adzz · October 16, 2023, 6:48pm

it be like that sometimes…

ducharmemp · October 17, 2023, 12:36am

Definitely very similar, nice looking library! I actually read through your blog post on AppSignal, I think? I think that Saxaboom can fill a specific niche that you called out in there, where you don’t want to hold the whole DOM/document in memory at once. Technically speaking, Saxaboom does single-shot parsing with support for streams, with no ability to access contextual information (as in, no ability to query siblings), so whatever doesn’t match up against a parser can be easily thrown away.

That said, the more libs the merrier I think, I’ll have to read through the data schema code because I definitely agree with respect to the overall similarities! Plus it was a fun lib to tinker with
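To illustrate the single-shot idea described above (keep what you declared, drop everything else, never build a DOM), here is a minimal sketch using `:xmerl_sax_parser`, which ships with Erlang/OTP. It is not Saxaboom's implementation. The `TitleCollector` module and its state shape are hypothetical, and it only collects the text of `<title>` elements.

```elixir
defmodule TitleCollector do
  # Hypothetical helper: collects the text of every <title> element and
  # ignores all other SAX events, so no document tree is ever built.
  def titles(xml) when is_binary(xml) do
    {:ok, state, _rest} =
      :xmerl_sax_parser.stream(xml,
        event_fun: &handle_event/3,
        event_state: %{in_title?: false, buffer: [], titles: []}
      )

    Enum.reverse(state.titles)
  end

  # xmerl reports element names as charlists, hence the ~c sigil.
  defp handle_event({:startElement, _uri, ~c"title", _qname, _attrs}, _loc, state),
    do: %{state | in_title?: true, buffer: []}

  # Text may arrive in several chunks, so accumulate until the end tag.
  defp handle_event({:characters, chars}, _loc, %{in_title?: true} = state),
    do: %{state | buffer: [state.buffer, chars]}

  defp handle_event({:endElement, _uri, ~c"title", _qname}, _loc, state) do
    title = List.to_string(state.buffer)
    %{state | in_title?: false, buffer: [], titles: [title | state.titles]}
  end

  # Anything not matched above is simply thrown away.
  defp handle_event(_event, _loc, state), do: state
end

xml = """
<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
  </book>
</catalog>
"""

titles = TitleCollector.titles(xml)
```

The handler only ever sees one event at a time, which is why sibling or parent lookups are not available in this style of parsing.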

Adzz · October 17, 2023, 1:54am

oh absolutely, and you never really learn anything until you try it for yourself. Data schema probably needs a refactor so apologies in advance for anything you see there haha.

And you are correct, data schema entirely sidesteps the process of parsing XML; it really picks up once you have created a DOM. (This is mostly an accident of how the library got developed, and for flexibility on the input data.)

In a past job I wrote some proprietary code that hooked into Saxy such that you would ignore all events for nodes in the XML that were not defined in a given schema. I need to rewrite and open source it really, because it can help if the document is large but you only need a few things from it. (I’m just not sure how often that is the case.) But I agree that if you know you are using Saxy and XML, then casting the data as you create the DOM is nice