Parsing XML key/value pairs with \"key\":\"value\" syntax

I am new to Elixir, but love what I am seeing so far! I would think Elixir would be a great fit for my question below, given its pattern matching capabilities, but at my current level I am stuck.

My issue is as follows: I am trying to parse a list of key:value pairs from an XML file. I have included the first couple of pairs below.

["\"ClassNotationDocumentUrl\":\"\",\"VesselId\":\"40775\",\"IMONo\":\"9866718\",\"Currentname\":\"LOWLANDS CRIMSON\",\"VesselGuid\":\"a5379088-2c7d-42d8-bdf5-c5677ce5c1d2\",

I am struggling to figure out how to parse this into an Elixir struct, could anyone point me in the right direction?

Many thanks for your help!

This is one approach to get started; see below for notes about why it’s brittle:

iex(20)> Regex.scan(~r/"([^"]*)":"([^"]*)",?/, s, capture: :all_but_first)
[
  ["ClassNotationDocumentUrl", ""],
  ["VesselId", "40775"],
  ["IMONo", "9866718"],
  ["Currentname", "LOWLANDS CRIMSON"],
  ["VesselGuid", "a5379088-2c7d-42d8-bdf5-c5677ce5c1d2"]
]

The brittleness of this solution comes from subexpressions like [^"]*: if the data ever contains " characters inside a key or a value, this parser will break, as it doesn’t implement any kind of escaping.
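To make the escaping point concrete, here is a minimal sketch of a somewhat more tolerant pattern. It assumes that embedded quotes are backslash-escaped (JSON-style), which the sample data does not confirm. The module name PairParser and the function parse/1 are made up for illustration. The sketch also collects the pairs into a map.

```elixir
defmodule PairParser do
  # Hypothetical helper: parses "key":"value" pairs into a map.
  # Assumes backslash escaping, so \" inside a key or value does not
  # end the match. Escape sequences are kept as-is, not decoded.
  def parse(string) do
    ~r/"((?:[^"\\]|\\.)*)":"((?:[^"\\]|\\.)*)"/
    |> Regex.scan(string, capture: :all_but_first)
    |> Map.new(fn [key, value] -> {key, value} end)
  end
end

# Usage with a shortened version of the sample data:
sample = ~S("VesselId":"40775","Currentname":"LOWLANDS CRIMSON")
PairParser.parse(sample)
```

If a key appears more than once, Map.new/2 keeps the last value, so this only suits input where keys are unique.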

I usually do something like this:

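# Assumes each item in input_list is a single "key":"value" pair.
Enum.reduce(input_list, %{}, fn item, acc ->
  [key, value] =
    item
    |> String.replace("\"", "")
    # Split on the first colon only, so values that contain ":" stay intact.
    |> String.split(":", parts: 2)

  Map.put(acc, key, value)
end)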

For each item in the list, get rid of the quotation marks, split at the colon, and then put them into a map (or struct if the keys are always the same). Note how Elixir’s pattern matching allows us to easily assign key and value variables from the result of the pipeline.
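For the struct option mentioned above, a minimal sketch might look like the following. The Vessel module, its field names, and from_map/1 are hypothetical, chosen only to mirror a few keys from the sample; it assumes the pairs have already been collected into a map with string keys.

```elixir
defmodule Vessel do
  # Hypothetical struct covering three of the keys from the sample data.
  defstruct [:vessel_id, :imo_no, :current_name]

  # Builds a struct from a string-keyed map; missing keys become nil.
  def from_map(map) do
    %__MODULE__{
      vessel_id: Map.get(map, "VesselId"),
      imo_no: Map.get(map, "IMONo"),
      current_name: Map.get(map, "Currentname")
    }
  end
end

pairs = %{
  "VesselId" => "40775",
  "IMONo" => "9866718",
  "Currentname" => "LOWLANDS CRIMSON"
}

Vessel.from_map(pairs)
```

Mapping each key explicitly avoids turning arbitrary input strings into atoms, at the cost of listing every field by hand.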

I can’t provide a good example right now as I’m on my phone, but take a look at the NimbleParsec package on Hex. It’s brilliant for parsing all sorts of input.

Many thanks for the quick responses, super helpful!
I ended up going with the regex, as it does the job well enough for my purpose.