XmlSchema - Parse and generate XML with a schema DSL

New! Parse (and generate) XML with a DSL that is built on top of Ecto.Schema. Makes handling XML easy if you want structs out of XML input.

On hex: XmlSchema

hexdocs: XmlSchema

An example:

defmodule Simple do
  use XmlSchema, xml_name: "a"
  xml do
    xml_tag :x, :string
    xml_tag :y, :boolean
    xml_one :z, Z do
      xml_tag :a, :string
      xml_tag :b, :string
    end
    xml_many :j, J do
      xml_tag :q, :string
    end
    xml_tag :g, {:array, :string}
  end
end

This example is illustrated in the module doc

For context, let's use your example XML from the documentation:

Example file
xml = """
<?xml version="1.0" encoding="utf-8" ?>
<a someattr="blue" otherattr="red">
  <x>hill</x>
  <y>false</y>
  <z>
    <a>tree</a>
    <b>bush</b>
  </z>
  <j>
    <q>cat</q>
  </j>
  <j>
    <q>dog</q>
  </j>
  <g>hippo</g>
  <g>elephant</g>
  <g>rhino</g>
</a>
"""

First of all, your code is inspired by Ecto, but it uses different naming, and way too much logic lives in one file.

Updated example schema definition
defmodule Example do
  use YourLibName.Schema

  schema "a" do
    field :x, :string
    field :y, :boolean

    embeds_one :z, Z do
      field :a, :string
      field :b, :string
    end

    embeds_many :j, J do
      field :q, :string
    end

    field :g, :string
  end
end

While I understand that using an existing Ecto schema may even be impossible in some cases, I still recommend supporting such schemas, so that developers can reuse existing schemas where possible and define their own only when needed.

Unfortunately, _attributes is a valid tag name, so even if it's an edge case we should still support it. It is therefore much easier to deal with attributes and contents fully separately, i.e. we should use a map with two keys, one for each.
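
To make the collision concrete, here is a minimal sketch in plain Elixir. No library is involved, and both map shapes are assumptions made up for illustration. If attributes were stored under a reserved _attributes key next to the child tags, a child element that is really named _attributes would clash with it, whereas two separate keys keep both:

```elixir
# Hypothetical decoded shapes for:
#   <a someattr="blue"><_attributes>oops</_attributes></a>

# Flat shape: attributes share a namespace with child tags, so the
# second put overwrites the first and the attributes are lost.
flat =
  %{}
  |> Map.put("_attributes", %{"someattr" => "blue"})
  |> Map.put("_attributes", "oops")

# Separated shape: attributes and children can never collide.
separated = %{
  attrs: %{"someattr" => "blue"},
  children: [{"_attributes", "oops"}]
}

IO.inspect(flat)
IO.inspect(separated)
```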

Aggregation is really helpful, but not always desired. Regardless of your defaults (if any), I would recommend adding an option to disable or enable it. Supporting non-aggregated output, however, prevents us from generating maps, because the same key can then appear more than once. Since we have keyword lists, that is really not a big deal, and they also preserve the order, which is really important in a few cases, especially when we want to re-encode the XML document.
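
A rough sketch of what such an aggregation step could look like on a keyword list. The module and function names are hypothetical and not taken from any library. Only runs of adjacent entries with the same key are merged, so document order is kept:

```elixir
defmodule AggregateSketch do
  # Hypothetical helper, not part of any library: merges runs of adjacent
  # entries that share a key into one entry holding a list of values.
  def aggregate_adjacent(children) do
    children
    |> Enum.chunk_by(fn {key, _value} -> key end)
    |> Enum.map(fn
      [single] -> single
      [{key, _} | _] = run -> {key, Enum.map(run, fn {_key, value} -> value end)}
    end)
  end
end

children = [x: "hill", g: "hippo", g: "elephant", g: "rhino", x: "again"]

# Adjacent :g entries are merged; the second :x stays where it was.
IO.inspect(AggregateSketch.aggregate_adjacent(children))

# A map cannot represent the input: duplicate keys collapse and the
# document order is lost.
IO.inspect(Map.new(children))
```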

The same goes for handling whitespace characters. I even gave a real-world example for the Floki library in issue #75, "Floki removes blank text nodes without option to avoid this".
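
As a minimal sketch, skipping blank text nodes could be a simple filter. The helper below is hypothetical; it assumes text nodes are stored under the :_ key, which is the convention used in the examples in this post:

```elixir
defmodule WhitespaceSketch do
  # Hypothetical helper: drops text nodes (stored under the :_ key) that
  # contain only whitespace and keeps everything else untouched.
  def skip_empty_text_nodes(children) do
    Enum.reject(children, fn
      {:_, text} when is_binary(text) -> String.trim(text) == ""
      _other -> false
    end)
  end
end

children = [_: "\n  ", x: "hill", _: "\n  ", y: false, _: " kept text ", _: "\n"]

IO.inspect(WhitespaceSketch.skip_empty_text_nodes(children))
```

Making this an option rather than fixed behaviour matters for round trips: once the blank nodes are dropped, the original formatting can no longer be reproduced when encoding.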

Here are some examples I have prepared:

YourLibName.decode!(xml, aggregate_adjacent_siblings: false, skip_empty_text_nodes: false)
%Example{
  __meta__: %LibName.Schema.Metadata{schema: Example, source: "inline"},
  attrs: %{"otherattr" => "red", "someattr" => "blue"},
  children: [
    _: "\n  ",
    x: "hill",
    _: "\n  ",
    y: false,
    _: "\n  ",
    z: %Example.Z{
      __meta__: %LibName.Schema.Metadata{schema: Example.Z, source: "inline"},
      attrs: [],
      children: [_: "\n    ", a: "tree", _: "\n    ", b: "bush", _: "\n  "]
    },
    _: "\n  ",
    j: [
      _: "\n    ",
      %Example.J{
        __meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
        attrs: [],
        children: [q: "cat"]
      },
      _: "\n    ",
      %Example.J{
        __meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
        attrs: [],
        children: [q: "dog"]
      },
      _: "\n  "
    ],
    _: "\n  ",
    g: "hippo",
    _: "\n  ",
    g: "elephant",
    _: "\n  ",
    g: "rhino",
    _: "\n"
  ]
}
YourLibName.decode!(xml, aggregate_adjacent_siblings: false, skip_empty_text_nodes: true)
%Example{
  __meta__: %LibName.Schema.Metadata{schema: Example, source: "inline"},
  attrs: %{"otherattr" => "red", "someattr" => "blue"},
  children: [
    x: "hill",
    y: false,
    z: %Example.Z{
      __meta__: %LibName.Schema.Metadata{schema: Example.Z, source: "inline"},
      attrs: [],
      children: [a: "tree", b: "bush"]
    },
    j: [
      %Example.J{
        __meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
        attrs: [],
        children: [q: "cat"]
      },
      %Example.J{
        __meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
        attrs: [],
        children: [q: "dog"]
      }
    ],
    g: "hippo",
    g: "elephant",
    g: "rhino"
  ]
}
YourLibName.decode!(xml, aggregate_adjacent_siblings: true, skip_empty_text_nodes: false)
%Example{
  __meta__: %LibName.Schema.Metadata{schema: Example, source: "inline"},
  attrs: %{"otherattr" => "red", "someattr" => "blue"},
  children: [
    _: "\n  ",
    x: "hill",
    _: "\n  ",
    y: false,
    _: "\n  ",
    z: %Example.Z{
      __meta__: %LibName.Schema.Metadata{schema: Example.Z, source: "inline"},
      attrs: [],
      children: [_: "\n    ", a: "tree", _: "\n    ", b: "bush", _: "\n  "]
    },
    _: "\n  ",
    j: [
      _: "\n    ",
      %Example.J{
        __meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
        attrs: [],
        children: [q: "cat"]
      },
      _: "\n    ",
      %Example.J{
        __meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
        attrs: [],
        children: [q: "dog"]
      },
      _: "\n  "
    ],
    _: "\n  ",
    g: "hippo",
    _: "\n  ",
    g: "elephant",
    _: "\n  ",
    g: "rhino",
    _: "\n"
  ]
}
YourLibName.decode!(xml, aggregate_adjacent_siblings: true, skip_empty_text_nodes: true)
%Example{
  __meta__: %LibName.Schema.Metadata{schema: Example, source: "inline"},
  attrs: %{"otherattr" => "red", "someattr" => "blue"},
  children: [
    x: "hill",
    y: false,
    z: %Example.Z{
      __meta__: %LibName.Schema.Metadata{schema: Example.Z, source: "inline"},
      attrs: [],
      children: [a: "tree", b: "bush"]
    },
    j: [
      %Example.J{
        __meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
        attrs: [],
        children: [q: "cat"]
      },
      %Example.J{
        __meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
        attrs: [],
        children: [q: "dog"]
      }
    ],
    g: ["hippo", "elephant", "rhino"]
  ]
}

Those are the simplest examples, but there are other things to cover:

  1. What to do when the XML document has tags we don't declare in the schema (for example, because we don't need the data from them). You can add a strict_document_structure boolean option, so that, say, documents updated for a newer version of some standard (like a newer XML API) can still be parsed when strictness is disabled.

  2. How to properly deal with attributes. What if we expect a URL that is supposed to be in some attribute? Should strict_document_structure also apply to attributes? You definitely need some attr DSL.

  3. There is no information about comments. Since we can encode the XML back, we most probably want the re-encoded document to be complete, with nothing missing.
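
For the first point, a strictness check could look roughly like the sketch below. Everything in it is hypothetical: the module, the function, the shape of the arguments, and the choice of false as the default are assumptions, and only the option name comes from the suggestion above.

```elixir
defmodule StrictSketch do
  # Hypothetical helper: `declared` is the list of tag names a schema knows
  # about, `children` is a decoded keyword list of {tag, value} pairs.
  def check(children, declared, opts \\ []) do
    # Assumed default: lenient, i.e. undeclared tags are silently dropped.
    strict? = Keyword.get(opts, :strict_document_structure, false)

    {known, unknown} =
      Enum.split_with(children, fn {tag, _value} -> tag in declared end)

    if strict? and unknown != [] do
      {:error, {:undeclared_tags, Enum.map(unknown, fn {tag, _value} -> tag end)}}
    else
      {:ok, known}
    end
  end
end

children = [x: "hill", extra: "new in v2", y: false]

IO.inspect(StrictSketch.check(children, [:x, :y]))
IO.inspect(StrictSketch.check(children, [:x, :y], strict_document_structure: true))
```

The same check could be run over attribute names if the option is meant to cover attributes as well.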

Finally, some ideas/questions about your code:

  1. The links in the documentation do not work. ex_doc falls back to the default branch, which is main. They should point to a specific version (like a git tag).

  2. I don't know much about Erlang's XML parsers. It's obvious why you didn't write your own, but why did you choose erlsom over Erlang's xmerl?

  1. The above code can be written much more simply: module |> Module.split() |> List.last() 😉

  2. Every public function should be documented. Many developers may give up at this point; some may try to check the links, but oh, we're back at the 1st point.

  3. mix format is your friend. If you are still lonely, credo is another one. Even if you want to do everything yourself, it even has its own style guide.

  4. support is not a bad directory name, but fixtures is more common for your case. The former is generic, and when a developer sees it, the first thing that comes to mind is Phoenix stuff. fixtures is more explicit.

  5. File.read calls are not ideal if you can read the files at compile time instead.

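for name <- ~w[first second third] do
  path = Path.join([__DIR__, "fixtures", name <> ".xml"])
  # Recompile this module whenever the fixture file changes.
  @external_resource path
  # Read once at compile time and embed the contents in the function body.
  xml = File.read!(path)
  def get_xml(unquote(name)), do: unquote(xml)
end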
  1. You can keep both your fixtures and the XML in the same directory, or even in the same file. That's even better than an extra File.read!/1 call:
defmodule MyAppTest.MyFixture do
  # DSL comes here

  def get_xml do
    """
    <?xml version="1.0" encoding="UTF-8" ?>
    <!-- Employee Information-->
    """
  end
end
  1. You can extend the idea above and add a function with the expected data, i.e. the output of parsing the XML document. Then 99% of your tests look like:
defmodule MyAppTest do
  use ExUnit.Case

  alias MyAppTest.Fixtures

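  for fixture <- [Fixtures.First, Fixtures.Second, Fixtures.Third] do
    # The test name is evaluated in the module body, so the loop variable is
    # used directly; unquote/1 is only needed inside the test body.
    test "parses #{inspect(fixture)}" do
      fixture = unquote(fixture)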
      xml = fixture.get_xml()
      assert parse(xml) == fixture.get_expected_data()
    end
  end

  # the rest are
  # edge cases
  # error handling
  # and so on …
end
  1. Search for inspiration. What I have written above does not come from my own head. Both metaprogramming and naming are well covered in the Floki, Jason, Ecto and Elixir documentation.
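
To illustrate the Module.split/1 suggestion from the list above, a tiny sketch (Example.Z is only a sample alias taken from the schema earlier in this post; the module does not need to exist for this to work):

```elixir
# Module.split/1 breaks an alias into its parts as strings.
parts = Module.split(Example.Z)

# The last part is the short name of the nested module.
short_name = Example.Z |> Module.split() |> List.last()

IO.inspect(parts)
IO.inspect(short_name)
```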

Released to hex, Version 1.3, improved docs, some attribute handling fixes, better generation support for custom types and arrays, more tests and generation of document examples from tests.

Update to 1.3.0 with improved docs and some refactoring. XML can be easy! But it isn't; this only makes it easier.
