Modifying an XML doc - parsing and writing back out to file

I would like to modify an XML file, meaning parse it, change it, and write it back out to a file.

Let's keep it simple: I want to uppercase the name attribute of every b element that has one.

ORIGINAL

<a>
   <b name="sam">text</b>
   <c name="sal">text</c>
   <b title="bob">text</b>
</a>

UPDATED

<a>
   <b name="SAM">text</b>
   <c name="sal">text</c>
   <b title="bob">text</b>
</a>

I've spent a reasonable amount of time searching for examples of reading and writing (and thus updating) an XML string/file, but I just cannot seem to find anything.

sweet_xml has been great, but I can't see how to use it to modify and write an XML string (file).

What am I missing in the Elixir ecosystem to read/transform/write XML?

thx!!

I’d recommend looking at saxy, which is both a parser and encoder that should enable what you’re looking for.

thx! any links to examples?

OK, to answer my own question, here is what I have for my simple example. I hope it works with minimal changes on a complex one (with Unicode characters in it too!)

Mix.install([
  {:saxy, "~> 1.4.0"}
])

defmodule ExampleHandler do
  @behaviour Saxy.Handler

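  # Matches only <b> elements whose sole attribute is "name".
  def handle_event(:start_element, {"b", [{"name", name}]}, state) do
    {:ok, state <> ~s|<b name="#{String.upcase(name)}">|}
  end

  # Every other start tag is re-emitted unchanged. Attribute values are not
  # re-escaped, and a space always follows the tag name, even with no attributes.
  def handle_event(:start_element, {tag, attrs}, state) do
    attrs =
      attrs
      |> Enum.map(fn {key, value} -> ~s|#{key}="#{value}"| end)
      # Separate multiple attributes with a space so they do not run together.
      |> Enum.join(" ")

    {:ok, state <> "<#{tag} #{attrs}>"}
  end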

  def handle_event(:end_element, tag, state) do
    {:ok, state <> "</#{tag}>"}
  end

  def handle_event(:characters, cdata, state) do
    {:ok, state <> cdata}
  end

  def handle_event(_, _, state), do: {:ok, state}
end

[xmlfile|_] = System.argv()

IO.puts("Processing #{xmlfile}")

{:ok, result} =
  Saxy.parse_stream(File.stream!(xmlfile), ExampleHandler, "")

result
|> IO.puts

execution:

$ cat example.xml
<a>
   <b name="sam">text</b>
   <c name="sal">text</c>
   <b title="bob">text</b>
</a>

$ elixir xml.exs example.xml
Processing example.xml
<a >
   <b name="SAM">text</b>
   <c name="sal">text</c>
   <b title="bob">text</b>
</a>
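
One caveat for anyone adapting this: the clause matching `{"b", [{"name", name}]}` only fires when name is the element's sole attribute, so something like `<b id="1" name="sam">` would fall through to the generic clause untouched. Below is a minimal sketch of a more tolerant approach that maps over the attribute list instead. The `AttrUpcase` module and `upcase_name/2` are hypothetical names for illustration, not part of Saxy.

```elixir
defmodule AttrUpcase do
  # Hypothetical helper: uppercases the value of "name" on <b> elements,
  # leaving every other attribute (and every other tag) unchanged.
  def upcase_name("b", attrs) do
    Enum.map(attrs, fn
      {"name", value} -> {"name", String.upcase(value)}
      other -> other
    end)
  end

  def upcase_name(_tag, attrs), do: attrs
end

IO.inspect(AttrUpcase.upcase_name("b", [{"id", "1"}, {"name", "sam"}]))
# prints: [{"id", "1"}, {"name", "SAM"}]
```

In the handler, this could replace the b-specific clause: call it from the generic :start_element clause before serialising the attributes.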

Awesome! Glad you got it figured out.

A minor suggestion: use IO data instead of concatenating strings directly. IO data is a sort of composite data type meant for this exact use-case, where the result is modeled as an arbitrarily nested list of strings/characters/etc. In the example below, I’m using IO.chardata_to_string/1, but if you’re just writing it back out to a file and have no use for further string processing, you can actually pass the chardata directly to most (all?) IO functions!

For a super small example the runtime will be essentially the same, but using IO data will definitely be faster for large files (and I think the resulting code is a bit cleaner).

Mix.install([
  {:saxy, "~> 1.4.0"}
])

defmodule ExampleHandler do
  @behaviour Saxy.Handler

  def parse_stream!(xml_stream) do
    {:ok, rev_chardata} = Saxy.parse_stream(xml_stream, __MODULE__, [])

    rev_chardata
    |> Enum.reverse()
    |> IO.chardata_to_string()
  end

  def build(:open, tag, attrs) do
    encoded_attrs = Enum.map(attrs, fn {name, val} -> [" ", name, "=\"", val, "\""] end)
    ["<", tag, encoded_attrs, ">"]
  end

  def build(:close, tag) do
    ["</", tag, ">"]
  end

  def handle_event(:start_element, {"b", [{"name", name}]}, state) do
    {:ok, [build(:open, "b", [{"name", String.upcase(name)}]) | state]}
  end

  def handle_event(:start_element, {tag, attrs}, state) do
    {:ok, [build(:open, tag, attrs) | state]}
  end

  def handle_event(:end_element, tag, state) do
    {:ok, [build(:close, tag) | state]}
  end

  def handle_event(:characters, cdata, state) do
    {:ok, [cdata | state]}
  end

  def handle_event(_, _, state), do: {:ok, state}
end

[xmlfile | _] = System.argv()

IO.puts("Processing #{xmlfile}")

ExampleHandler.parse_stream!(File.stream!(xmlfile))
|> IO.puts()

Works fine with unicode too =)

> elixir saxy_example.exs example.xml
<a>
   <b name="SAM">π</b>
   <c name="sal">text</c>
   <b title="bob">text</b>
</a>
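
Since the original goal was to write the result back out to a file, here is a minimal sketch of passing the nested list straight to a file without converting it to a string first. The chardata literal and the temp-file path are made up for illustration. It relies on every leaf being a UTF-8 binary, as in the handler above; a list holding raw integer codepoints above 255 would not be valid iodata.

```elixir
# Chardata shaped like the handler's (already reversed) state:
# nested lists of binaries.
chardata = [
  ["<", "b", [[" ", "name", "=\"", "SAM", "\""]], ">"],
  "π",
  ["</", "b", ">"]
]

# Hypothetical output path, used for illustration only.
path = Path.join(System.tmp_dir!(), "iodata_example.xml")

# File.write!/2 accepts iodata, so the nested list is written as-is.
File.write!(path, chardata)

IO.puts(File.read!(path))
# prints: <b name="SAM">π</b>
```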

Don’t make XML by gluing together strings - it’s too easy to create bugs.

For instance, ExampleHandler will produce invalid XML when given this document:

<something>
  <b name="foo">blargh</b>
  <b name="foo&quot;bar">baz</b>
  <z nothing="nope" />
</something>

The output fails to re-escape the double-quote character in the second b element:

# result from ExampleHandler
<something >
  <b name="FOO">blargh</b>
  <b name="FOO"BAR">baz</b>
  <z nothing="nope"></z>
</something>

(It also adds a stray blank before the closing bracket of tags like something, and expands the self-closing z into an explicit open/close pair, but IIRC neither of those is a semantic change.)

You can use something like XmlBuilder as well. Here’s how it handles escaping.

Edit: also relevant is a StackOverflow thread about XML escaping requirements.
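
For anyone who wants to keep the hand-rolled handler, a minimal sketch of the missing escaping step follows. `XmlEscape`, `escape/1`, and `open_tag/2` are hypothetical names, not part of Saxy or XmlBuilder. It escapes all five predefined XML entities, which is more than strictly required in every position but is safe for both text and double-quoted attribute values.

```elixir
defmodule XmlEscape do
  # Replaces the five characters XML reserves. "&" must go first so the
  # entities produced by the later replacements are not escaped again.
  def escape(text) when is_binary(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
    |> String.replace("'", "&apos;")
  end

  # Same shape as build(:open, tag, attrs) above, with escaped values.
  def open_tag(tag, attrs) do
    encoded = Enum.map(attrs, fn {name, val} -> [" ", name, "=\"", escape(val), "\""] end)
    ["<", tag, encoded, ">"]
  end
end

XmlEscape.open_tag("b", [{"name", "FOO\"BAR"}])
|> IO.iodata_to_binary()
|> IO.puts()
# prints: <b name="FOO&quot;BAR">
```

The :characters clause would likely need the same treatment, for example `[XmlEscape.escape(cdata) | state]`, assuming text reaches the handler already decoded the way attribute values do.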

thx everyone and awesome community!

:grin:

I’m trying to do something similar. What does your final solution for this problem look like?