Search and replace text in XML

Hello, I have a nested value that I’d like to search and replace. I’ve read the README.md of sweet_xml and I’m unsure how to go about it. Any help or pointers in the right direction would be immensely appreciated. Thank you.

For example:

<response>
  <record>
    <id>123</id>
  </record>
</response>

How do I replace 123 with a different value?

It depends on how strict you want the XML manipulation to be. If you are feeling brave, you can just use regexes (strongly discouraged); otherwise you have to use :xmerl to stream-parse the document and replace nodes as you go.

I’ve done the latter before. The code isn’t very short, but it works. I can show you if you like.

Yes, that would be awesome. I have been using :xmerl but don’t feel confident with it.

It’s a bit long:

defmodule ReplaceXml do
  @type path :: nonempty_list(atom)
  @type mutator :: (charlist -> charlist)
  @type xmerl_doc_or_element :: tuple
  @type xmerl_result :: {xmerl_doc_or_element, list}
  @type stream_item :: tuple
  @type stream_accumulator :: list
  @type stream_state :: term
  @type stream_result :: {stream_accumulator, stream_state}
  @type export_result :: list

  @ansi_whitespace ~c"\t\n\v\r "

  @doc """
  Exports an XML structure to a (possibly nested) charlist. Pass the result to
  `to_string/1` if you need a binary.
  """
  @spec export(stream_item) :: export_result
  def export(el) do
    :xmerl.export([el], :xmerl_xml,
      prolog: ~c"<?xml version='1.0' encoding='UTF-8'?>\n"
    )
  end

  @doc """
  Accumulates parsed XML in a streaming fashion and modifies text nodes in accordance
  with functions corresponding to XPath-like paths (which are lists of element names
  as atoms). When this function detects a path that corresponds to one of the paths
  passed as a parameter to it, then it calls the function associated with the path.
  The resulting modified text is appended to the resulting XML in the place of the
  original text.
  """
  @spec stream_text_mutator(stream_item, stream_accumulator, stream_state) ::
          stream_result
  def stream_text_mutator(
        {:xmlText, parents, pos, language, value, type} = parsed,
        acc,
        global_state
      ) do
    mutators = :xmerl_scan.user_state(global_state)

    current_path =
      parents
      |> Keyword.keys()
      |> Enum.reverse()

    value_without_whitespaces = Enum.reject(value, &(&1 in @ansi_whitespace))

    case Map.get(mutators, current_path) do
      mutator when is_function(mutator, 1) and value_without_whitespaces != [] ->
        modified_text = {:xmlText, parents, pos, language, mutator.(value), type}
        {[modified_text | acc], global_state}

      _ ->
        {[parsed | acc], global_state}
    end
  end

  # Anything that is not a text node is accumulated unchanged.
  def stream_text_mutator(parsed, acc, global_state), do: {[parsed | acc], global_state}

  @doc """
  Finds and replaces XML text nodes. The `paths_to_replacers` parameter is expected to
  be a map whose keys are paths (lists of XML element names as atoms) and whose values
  are functions that modify the text passed to them. The result is an XML structure
  that can optionally be serialised back to XML text through the `export/1` function
  in this module.
  """
  @spec find_and_replace_xml_text_nodes(charlist, %{required(path) => mutator}) ::
          xmerl_result
  def find_and_replace_xml_text_nodes(xml, %{} = paths_to_replacers) do
    :xmerl_scan.string(
      trim_leading(xml),
      acc_fun: &stream_text_mutator/3,
      user_state: paths_to_replacers
    )
  end

  defp trim_leading([?\t | t]), do: trim_leading(t)
  defp trim_leading([?\n | t]), do: trim_leading(t)
  defp trim_leading([?\v | t]), do: trim_leading(t)
  defp trim_leading([?\r | t]), do: trim_leading(t)
  defp trim_leading([?\s | t]), do: trim_leading(t)
  defp trim_leading(l), do: l

  def test1() do
    xml = """
    <response>
      <record>
        <id>123</id>
      </record>
    </response>
    """

    # Replacers receive and return charlists, hence Enum.reverse/1.
    replacer1 = &Enum.reverse/1

    paths_to_replacers = %{[:response, :record, :id] => replacer1}

    {resulting_xml_structure, []} =
      xml
      |> to_charlist()
      |> find_and_replace_xml_text_nodes(paths_to_replacers)

    resulting_xml_structure
    |> export()
    |> to_string()
  end
end

Just run ReplaceXml.test1 in your iex console for a demonstration of how to reverse the contents of your desired XML node.
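
To answer the original question more directly (replacing 123 with a different value rather than reversing it), here is a condensed sketch of the same acc_fun technique that does not depend on the module above. The new value 456 is a placeholder picked for illustration. Note that this version matches only on the innermost parent, so it would change the text of every <id> element in the document, unlike the full path matching above.

```elixir
# Input must be a charlist, because :xmerl does not work with binaries.
xml = ~c"<response><record><id>123</id></record></response>"

# Accumulator function: swap the value of any text node whose direct
# parent is <id>, and pass everything else through unchanged.
acc_fun = fn
  {:xmlText, [{:id, _pos} | _] = parents, pos, language, _old, type}, acc, state ->
    # ~c"456" is a placeholder replacement value (assumption).
    {[{:xmlText, parents, pos, language, ~c"456", type} | acc], state}

  other, acc, state ->
    {[other | acc], state}
end

{doc, _rest} = :xmerl_scan.string(xml, acc_fun: acc_fun)

# An empty prolog keeps the XML header out of the output.
result =
  [doc]
  |> :xmerl.export(:xmerl_xml, prolog: ~c"")
  |> to_string()

IO.puts(result)
```

For anything beyond a one-off, the path-based module above is the safer choice, since it only touches the exact element hierarchy you name.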

Some notes:

  • Replace the prolog: contents in the export function with an empty charlist if you don’t want the resulting XML to have the standard prolog/header (<?xml version='1.0' encoding='UTF-8'?>).
  • :xmerl works with charlists, not strings. Check the places in the code where conversions are made. Also, the replacing function uses Enum.reverse and not String.reverse because it receives a charlist, not a string.
  • The meat and potatoes of the code is the stream_text_mutator function. If you read closely, you’ll notice that it accumulates parsed XML data (in the format that the :xmerl parser feeds to it) and only changes the XML text nodes whose path matches one you want changed. The path is encoded as a list of atoms that represents the hierarchy of the element names. In your case that hierarchy is [:response, :record, :id], and that’s exactly what is visible in the test1 function.
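
The path matching described in the last note can be tried in isolation. This is a small sketch, not part of the module: the parents value is a hand-written stand-in for what :xmerl puts in an xmlText record (the position numbers are assumed for illustration), and the replacer returning 456 is a made-up example.

```elixir
# Hand-written stand-in for the `parents` field of an xmlText record:
# innermost element first, each paired with its position (positions assumed).
parents = [id: 2, record: 2, response: 1]

# Same steps as in stream_text_mutator: keep the names, outermost first.
current_path = parents |> Keyword.keys() |> Enum.reverse()

# Replacers take and return charlists, so a fixed replacement is a charlist too.
paths_to_replacers = %{[:response, :record, :id] => fn _old -> ~c"456" end}

# Look up the replacer for the current path and apply it to the old text.
replacer = Map.get(paths_to_replacers, current_path)
new_value = replacer.(~c"123")

IO.inspect(current_path, label: "path")
IO.puts(to_string(new_value))
```

A map like paths_to_replacers is what find_and_replace_xml_text_nodes expects as its second argument.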

If you have any other questions, feel free to ask. :xmerl isn’t the easiest thing to work with, but it’s very powerful.
