To have some context, let's use your example XML from the documentation:
Example file
xml = """
<?xml encoding="utf-8" ?>
<a someattr="blue" otherattr="red">
<x>hill</x>
<y>false</y>
<z>
<a>tree</a>
<b>bush</b>
</z>
<j>
<q>cat</q>
</j>
<j>
<q>dog</q>
</j>
<g>hippo</g>
<g>elephant</g>
<g>rhino</g>
</a>
"""
First of all, your code is inspired by ecto, but it uses different naming, and far too much logic lives in a single file.
Updated example schema definition
defmodule Example do
use YourLibName.Schema
schema "a" do
field :x, :string
field :y, :boolean
embeds_one :z, Z do
field :a, :string
field :b, :string
end
embeds_many :j, J do
field :q, :string
end
field :g, :string
end
end
While I understand that reusing an existing ecto schema may even be impossible in some cases, I still recommend supporting such schemas, so that developers can use their existing schemas where possible and define new ones only when needed.
Unfortunately, _attributes is a valid tag name, so even if it's an edge case we still have to support it. It is therefore much easier to keep attrs and contents fully separate, i.e. we should use a map with those 2 keys.
Aggregation is really helpful, but not always desired. Regardless of your defaults (if any), I would recommend adding an option to enable or disable it. Disabling it, however, prevents us from generating maps, because the same key may then appear more than once. Since we have Keyword lists that's really not a big deal, and they also let us preserve the order, which is really important in a few cases, especially when we want to re-encode said xml document.
The same goes for whitespace characters. I even gave a real-world example for the floki library in the issue "Floki removes blank text nodes without option to avoid this" (#75).
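To illustrate the point about Keyword lists versus maps, here is a minimal sketch in plain Elixir. The children list is made up for illustration; only the standard Keyword and Map functions are used.

```elixir
# Hypothetical decoded children: the key :g appears twice and order matters.
children = [g: "hippo", x: "hill", g: "elephant"]

# A keyword list keeps both the duplicates and the document order.
g_values = Keyword.get_values(children, :g)
keys_in_order = Keyword.keys(children)

# A map keeps one value per key (the last one wins) and has no document order.
as_map = Map.new(children)

IO.inspect(g_values)      # ["hippo", "elephant"]
IO.inspect(keys_in_order) # [:g, :x, :g]
IO.inspect(as_map)        # %{g: "elephant", x: "hill"}
```

Re-encoding from the map would silently lose "hippo", which is why the non-aggregated output has to stay a list.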
Here are some examples I have prepared:
YourLibName.decode!(xml, aggregate_adjacent_siblings: false, skip_empty_text_nodes: false)
%Example{
__meta__: %LibName.Schema.Metadata{schema: Example, source: "inline"},
attrs: %{"otherattr" => "red", "someattr" => "blue"},
children: [
_: "\n ",
x: "hill",
_: "\n ",
y: false,
_: "\n ",
z: %Example.Z{
__meta__: %LibName.Schema.Metadata{schema: Example.Z, source: "inline"},
attrs: [],
children: [_: "\n ", a: "tree", _: "\n ", b: "bush", _: "\n "]
},
_: "\n ",
j: [
_: "\n ",
%Example.J{
__meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
attrs: [],
children: [q: "cat"]
},
_: "\n ",
%Example.J{
__meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
attrs: [],
children: [q: "dog"]
},
_: "\n "
],
_: "\n ",
g: "hippo",
_: "\n ",
g: "elephant",
_: "\n ",
g: "rhino",
_: "\n"
]
}
YourLibName.decode!(xml, aggregate_adjacent_siblings: false, skip_empty_text_nodes: true)
%Example{
__meta__: %LibName.Schema.Metadata{schema: Example, source: "inline"},
attrs: %{"otherattr" => "red", "someattr" => "blue"},
children: [
x: "hill",
y: false,
z: %Example.Z{
__meta__: %LibName.Schema.Metadata{schema: Example.Z, source: "inline"},
attrs: [],
children: [a: "tree", b: "bush"]
},
j: [
%Example.J{
__meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
attrs: [],
children: [q: "cat"]
},
%Example.J{
__meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
attrs: [],
children: [q: "dog"]
}
],
g: "hippo",
g: "elephant",
g: "rhino"
]
}
YourLibName.decode!(xml, aggregate_adjacent_siblings: true, skip_empty_text_nodes: false)
%Example{
__meta__: %LibName.Schema.Metadata{schema: Example, source: "inline"},
attrs: %{"otherattr" => "red", "someattr" => "blue"},
children: [
_: "\n ",
x: "hill",
_: "\n ",
y: false,
_: "\n ",
z: %Example.Z{
__meta__: %LibName.Schema.Metadata{schema: Example.Z, source: "inline"},
attrs: [],
children: [_: "\n ", a: "tree", _: "\n ", b: "bush", _: "\n "]
},
_: "\n ",
j: [
_: "\n ",
%Example.J{
__meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
attrs: [],
children: [q: "cat"]
},
_: "\n ",
%Example.J{
__meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
attrs: [],
children: [q: "dog"]
},
_: "\n "
],
_: "\n ",
g: "hippo",
_: "\n ",
g: "elephant",
_: "\n ",
g: "rhino",
_: "\n"
]
}
YourLibName.decode!(xml, aggregate_adjacent_siblings: true, skip_empty_text_nodes: true)
%Example{
__meta__: %LibName.Schema.Metadata{schema: Example, source: "inline"},
attrs: %{"otherattr" => "red", "someattr" => "blue"},
children: [
x: "hill",
y: false,
z: %Example.Z{
__meta__: %LibName.Schema.Metadata{schema: Example.Z, source: "inline"},
attrs: [],
children: [a: "tree", b: "bush"]
},
j: [
%Example.J{
__meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
attrs: [],
children: [q: "cat"]
},
%Example.J{
__meta__: %LibName.Schema.Metadata{schema: Example.J, source: "inline"},
attrs: [],
children: [q: "dog"]
}
],
g: ["hippo", "elephant", "rhino"]
]
}
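The two options used in the examples above could be sketched as plain transformations over a children keyword list. This is only an illustration under my own assumptions: the module and function names are hypothetical, text nodes are assumed to be stored under the `:_` key as in the outputs above, and nothing here is taken from the library's actual code.

```elixir
defmodule ChildrenSketch do
  # Hypothetical helpers, not part of any real library.

  # Drops `_:` text entries that consist only of whitespace.
  def skip_empty_text_nodes(children) do
    Enum.reject(children, fn
      {:_, text} when is_binary(text) -> String.trim(text) == ""
      _other -> false
    end)
  end

  # Merges each run of adjacent entries sharing a key into one entry
  # whose value is the list of the original values, in order.
  def aggregate_adjacent_siblings(children) do
    children
    |> Enum.chunk_by(fn {key, _value} -> key end)
    |> Enum.map(fn
      [single] -> single
      [{key, _} | _] = run -> {key, Enum.map(run, fn {_key, value} -> value end)}
    end)
  end
end

children = [_: "\n ", x: "hill", _: "\n ", g: "hippo", _: "\n ", g: "elephant", g: "rhino"]

children
|> ChildrenSketch.skip_empty_text_nodes()
|> ChildrenSketch.aggregate_adjacent_siblings()
|> IO.inspect()
# [x: "hill", g: ["hippo", "elephant", "rhino"]]
```

Note that without skipping whitespace first, the text node between "hippo" and "elephant" keeps them from being adjacent, so only "elephant" and "rhino" would be merged in this made-up list.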
Those are the simplest examples, but there are other things to cover:
- What to do when said xml document has tags we don't declare in the schema (because, for example, we don't need the data from them)? You could add a boolean strict_document_structure option, so that documents updated for, let's say, a newer version of some standard (like a newer XML API) can still be decoded when it is disabled.
- How to deal with attributes properly. What if we expect some url which is supposed to be in some attribute? Should strict_document_structure also apply to attributes? You definitely need some attr DSL.
- There is no information about comments. Since we can encode xml back, we most probably want said document to be written back properly, without anything missing.
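As a rough idea of how the suggested strict_document_structure option might behave for tags, here is a small sketch. Everything in it is an assumption made for illustration: the module name, the function, the error shape and the choice to silently drop undeclared tags in non-strict mode are mine, not the library's.

```elixir
defmodule StrictSketch do
  # Hypothetical: `declared` is the list of field names the schema declares,
  # `children` is a decoded keyword list with text nodes under the :_ key.
  def check(children, declared, opts \\ []) do
    strict? = Keyword.get(opts, :strict_document_structure, false)

    # Collect every tag name that the schema does not know about.
    unknown =
      for {key, _value} <- children, key != :_, key not in declared, uniq: true, do: key

    cond do
      unknown == [] ->
        {:ok, children}

      strict? ->
        {:error, {:undeclared_tags, unknown}}

      true ->
        # Non-strict mode: keep text nodes and declared tags, drop the rest.
        {:ok, Enum.filter(children, fn {key, _value} -> key == :_ or key in declared end)}
    end
  end
end

StrictSketch.check([x: "hill", extra: "1"], [:x])
# {:ok, [x: "hill"]}

StrictSketch.check([x: "hill", extra: "1"], [:x], strict_document_structure: true)
# {:error, {:undeclared_tags, [:extra]}}
```

The same kind of check could be applied to attributes once an attr DSL declares which ones are expected.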
Finally, some ideas/questions about your code:
- The links in the documentation do not work. ex_doc falls back to the default branch, which is main. They should point to a specific version (such as a git tag).
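For the source links, a mix.exs along these lines is one way to pin them to a tag. This is a sketch: the app name, version and URL are placeholders, and it assumes releases are tagged as "v" followed by the version. The source_url and source_ref options themselves are the ones ex_doc documents.

```elixir
# mix.exs (sketch; app name, version and URL are placeholders)
defmodule YourLibName.MixProject do
  use Mix.Project

  @version "0.1.0"
  @source_url "https://github.com/your_name/your_lib_name"

  def project do
    [
      app: :your_lib_name,
      version: @version,
      # ex_doc uses the project's source_url to build "view source" links
      source_url: @source_url,
      deps: deps(),
      docs: [
        # Tag the links point at; ex_doc uses the default branch when this is unset.
        # Assumes a git tag like "v0.1.0" exists for each release.
        source_ref: "v#{@version}"
      ]
    ]
  end

  defp deps do
    [{:ex_doc, ">= 0.0.0", only: :dev, runtime: false}]
  end
end
```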
- I have no idea about Erlang's XML parsers. It's obvious why you didn't write your own, but why did you choose erlsom over Erlang's xmerl?
- The above code can be written much more simply: module |> Module.split() |> List.last()
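A quick check of that pipeline, using the Example.J name from the schema above purely as an input value (Module.split/1 works on the alias atom and does not need the module to be compiled):

```elixir
# Module.split/1 turns an alias into its name segments as strings.
parts = Module.split(Example.J)

# The last segment is the "short" module name.
short_name = Example.J |> Module.split() |> List.last()

IO.inspect(parts)      # ["Example", "J"]
IO.inspect(short_name) # "J"
```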

- Every public function should be documented. Many developers may give up at this point; some may try to check the links, but oh, we're back at the 1st point.
- mix format is your friend. If you are still lonely, credo is another one. Even if you want to do everything yourself, it even has its own style guide.
- The support directory name is not bad, but fixtures is more common for your case. The first one is generic, and when a developer sees it, the first thing that comes to mind is phoenix stuff. fixtures is a more explicit name.
- File.read calls are not the best choice if you can do the same at compile time:
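# Runs in a module body: each fixture is read once, at compile time.
for name <- ~w[first second third] do
  path = Path.join([__DIR__, "fixtures", name <> ".xml"])
  # Recompile this module whenever the fixture file changes.
  @external_resource path
  xml = File.read!(path)
  # Defines one get_xml/1 clause per fixture name, with the contents inlined.
  def get_xml(unquote(name)), do: unquote(xml)
end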
- You can keep both your fixtures and xml in the same directory, or even in the same file. That's even better than an extra File.read!/1 call:
defmodule MyAppTest.MyFixture do
# DSL comes here
def get_xml do
"""
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Employee Information-->
"""
end
end
- You can extend the idea above and add a function returning the expected data, i.e. the output of parsing the XML document. Then 99% of your tests look like:
defmodule MyAppTest do
use ExUnit.Case
alias MyAppTest.Fixtures
for fixture <- [Fixtures.First, Fixtures.Second, Fixtures.Third] do
test "parses #{inspect(fixture)}" do
fixture = unquote(fixture)
xml = fixture.get_xml()
assert parse(xml) == fixture.get_expected_data()
end
end
# the rest are
# edge cases
# error handling
# and so on …
end
- Look for inspiration. What I have written above did not come out of my own head. Both metaprogramming and naming are well covered in the floki, jason, ecto and elixir documentation.