What is your best trick to pretty-print an XML string with Elixir or Erlang?

Hi!

I am creating a Livebook notebook that deals with XML responses, and I would like to display extracts of them in indented form, for clarity.

My favorite way to pretty-print XML outside the Elixir world is xmllint --format file.xml, which is robust and unlikely to alter the content while reformatting it.

I am looking for a robust but more “built-in” solution to convert an XML binary into an indented binary.

What is your preferred trick for this?

Thanks!

– Thibaut

Floki also works with XML. You can do parse_document then raw_html
https://hexdocs.pm/floki/Floki.html#raw_html/2


It is a good trick, thanks! The code is:

body
|> Floki.parse_document!()
|> Floki.raw_html(pretty: true)
|> IO.puts()

There is one drawback, though: it puts the text of each element on its own indented line, which is more verbose than typical XML:

<siri:location>
    <siri:longitude>
        -2.4645
    </siri:longitude>
    <siri:latitude>
        48.6289
    </siri:latitude>
</siri:location>

Ideally, elements that contain only a plain value would stay on one line, which is more natural and compact:

<siri:location>
    <siri:longitude>-2.4645</siri:longitude>
    <siri:latitude>48.6289</siri:latitude>
</siri:location>

It is a good starting point though, and maybe it could be tweaked in Floki itself (I’ll dig deeper). Thanks!

If anyone has other options, please share them :)


Is your original XML text with or without those extra newlines?

XML serializers usually don’t strip whitespace that was already present when the document was parsed. I tried parsing the more verbose XML fragment, and both Floki and xml_builder serialized it exactly as it was before, because the whitespace was already there. Here’s what the parsed form looks like when parsing the verbose XML:

{"siri:location", [],
 [
   {"siri:longitude", [], ["\n        -2.4645\n    "]},
   {"siri:latitude", [], ["\n        48.6289\n    "]}
 ]}

Basically, those libraries don’t improvise when it comes to whitespace.

But when I fed them an XML text without any whitespace they happily spat out exactly the same, byte-for-byte, compressed XML when asked to format it.

text = "<siri:location><siri:longitude>-2.4645</siri:longitude><siri:latitude>48.6289</siri:latitude></siri:location>"
doc = Floki.parse_document!(text)
doc |> Floki.raw_html()
doc |> XmlBuilder.generate(format: :none)

XmlBuilder.generate/2 can produce some whitespace for you when generating textual XML, but to do so it requires that the contents of the elements contain no whitespace. If it finds any whitespace, it just prints it as it found it.
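
To illustrate that kind of output, here is a minimal hand-rolled sketch. It is not how Floki or XmlBuilder work internally. The module name PrettyXml and its format/2 function are made up for this example. It assumes a Floki-style tree of {tag, attrs, children} tuples and binaries, keeps text-only elements on one line, and does not handle comments or processing instructions.

```elixir
defmodule PrettyXml do
  # Hypothetical helper, not part of Floki or XmlBuilder.
  @indent "    "

  # Renders a {tag, attrs, children} tree as indented XML.
  # Elements whose only child is a text node stay on a single line.
  def format(node, depth \\ 0)

  def format({tag, attrs, children}, depth) do
    pad = String.duplicate(@indent, depth)
    open = "<" <> tag <> format_attrs(attrs) <> ">"
    close = "</" <> tag <> ">"

    case children do
      [] ->
        pad <> "<" <> tag <> format_attrs(attrs) <> "/>"

      [text] when is_binary(text) ->
        pad <> open <> escape(String.trim(text)) <> close

      _ ->
        inner =
          children
          # Drop whitespace-only text nodes between elements.
          |> Enum.reject(&(is_binary(&1) and String.trim(&1) == ""))
          |> Enum.map_join("\n", &format(&1, depth + 1))

        pad <> open <> "\n" <> inner <> "\n" <> pad <> close
    end
  end

  # Text mixed in with sibling elements goes on its own indented line.
  def format(text, depth) when is_binary(text) do
    String.duplicate(@indent, depth) <> escape(String.trim(text))
  end

  defp format_attrs(attrs) do
    Enum.map_join(attrs, fn {key, value} -> ~s( #{key}="#{escape(value)}") end)
  end

  # Re-escapes the characters that are special in XML text and attributes.
  defp escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
  end
end

tree =
  {"siri:location", [],
   [
     {"siri:longitude", [], ["\n        -2.4645\n    "]},
     {"siri:latitude", [], ["\n        48.6289\n    "]}
   ]}

IO.puts(PrettyXml.format(tree))
```

Because it trims every text node, this is only safe for data-style XML where leading and trailing whitespace carries no meaning.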


I worked extensively with :xmerl (Erlang’s lower-level XML parser) about 2 years ago, but I have forgotten the details. It likely allows for more fine-grained parsing (where whitespace is discarded), and then you can print whatever XML format you like (no whitespace / partial whitespace / full whitespace).

Another alternative would be to run your input XML through a processing tool (maybe xmlstarlet?) that can trim whitespace from the content and then feed that result to my above code which will then happily produce minimal XML without whitespace.
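
The trimming step can also be sketched directly in Elixir on the parsed tree, without an external tool. This is only an illustration: the TrimText module is a made-up name, and it assumes the Floki-style {tag, attrs, children} shape shown earlier. It trims every text node and drops the ones that end up empty, so it is not suitable for mixed content where whitespace matters.

```elixir
defmodule TrimText do
  # Hypothetical helper, not part of Floki or XmlBuilder.

  # A list of nodes: trim each one, then drop text nodes that became empty.
  def trim(nodes) when is_list(nodes) do
    nodes
    |> Enum.map(&trim/1)
    |> Enum.reject(&(&1 == ""))
  end

  # An element: keep tag and attributes, recurse into the children.
  def trim({tag, attrs, children}), do: {tag, attrs, trim(children)}

  # A text node: strip leading and trailing whitespace.
  def trim(text) when is_binary(text), do: String.trim(text)

  # Anything else (comments, doctype, ...) is passed through untouched.
  def trim(other), do: other
end

tree =
  {"siri:location", [],
   [
     {"siri:longitude", [], ["\n        -2.4645\n    "]},
     {"siri:latitude", [], ["\n        48.6289\n    "]}
   ]}

IO.inspect(TrimText.trim(tree))
```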

You likely don’t need this but I got curious and dug deeper. Here’s a complete (non-Elixir) example that can help in case you have a verbose XML and want to slim it down.

Imagine this is your input XML file (verbose.xml):

<siri:location xmlns:siri="http://test.siri.com/">
    <siri:longitude>
        -2.4645
    </siri:longitude>
    <siri:latitude>
        48.6289
    </siri:latitude>
</siri:location>

And this is an XSLT command file (normalize_whitespace.xsl):

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <xsl:apply-templates />
  </xsl:template>

  <xsl:template match="*">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:attribute name="{name()}">
          <xsl:value-of select="normalize-space()"/>
        </xsl:attribute>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:value-of select="normalize-space()"/>
  </xsl:template>

</xsl:stylesheet>

Now install XMLStarlet and run it like so (on some distributions the binary is named xmlstarlet rather than xml):

xml tr --omit-decl normalize_whitespace.xsl verbose.xml

And this should be the result:

<siri:location xmlns:siri="http://test.siri.com/"><siri:longitude>-2.4645</siri:longitude><siri:latitude>48.6289</siri:latitude></siri:location>

Sadly I found no way to do it without skipping the XML namespace definition at the start of the XML file. But that was the best I could do in 20 minutes.

Not sure if this is useful to you but leaving it here for posterity.


Thanks for sharing your experiments. They will definitely help at some point, either me or someone else :)