Convert amazon ion objects into elixir types

After querying database tables from amazon ledger database via python/erlport I get a list of amazon ion objects like

     [
  '"{subject:\\"Elixir\\",state:\\"in use\\",url:\\"https://elixirforum.com",retrieved_on:2019-12-21T00:00:00.000000-00:00}"',
  '"{subject:\\"Phoenix\\",state:\\"in use\\",url:\\"https://www.phoenixframework.org",retrieved_on:2019-12-21T00:00:00.000000-00:00}"'
]

The objects represent neither elixir types nor valid json.

How can I convert such objects or the list of objects into a list of maps or into json to use in elixir.

Looks like they are just escaped json. Unescape it, then Jason.decode

You are right. They call it a superset of json.

I have already tryed to use Jason.decode but all I got get was the same list without the leading and closing single quote like:

[
  "{subject:\\"Elixir\\",state:\\"in use\\",url:\\"https://elixirforum.com",retrieved_on:2019-12-21T00:00:00.000000-00:00}",
  "{subject:\\"Phoenix\\",state:\\"in use\\",url:\\"https://www.phoenixframework.org",retrieved_on:2019-12-21T00:00:00.000000-00:00}"
]

How can I unescape it?

It is not a json, it contains own data structures, like datetime in UTC retrieved_on:2019-12-21T00:00:00.000000-00:00, so Jason isn’t able to parse this.

@joges you need your own parser or just do deserialization into JSON on python side.

ha, I was on mobile so didn’t see that far into the data structure!

+1 on own parser

How exactly do the objects arrive, in which form?

Do they start with a single quote, or a double quote? It’s not very clear from the snippet you posted in your OP.

Ideally, please post exactly how you see them in iex. Show us an assignment to a variable and the output in the console.

Here’s a quick and very dirty and unsafe implementation that works on your both samples:

defmodule CustomParser do
  def sample1(), do: "{subject:\"Elixir\",state:\"in use\",url:\"https://elixirforum.com\",retrieved_on:2019-12-21T00:00:00.000000-00:00}"

  def sample2(), do: "{subject:\"Phoenix\",state:\"in use\",url:\"https://www.phoenixframework.org\",retrieved_on:2019-12-21T00:00:00.000000-00:00}"

  def parse(t) when is_binary(t) do
    t
    |> String.trim_leading("{")
    |> String.trim_trailing("}")
    |> String.split(",")
    |> Enum.map(fn(x) -> String.split(x, ":", parts: 2) end)
    |> Enum.map(fn([key, value]) -> {key, String.replace(value, "\"", "")} end)
    |> Map.new()
  end
end

Now test it in iex:

iex> CustomParser.parse(CustomParser.sample1())
%{
  "retrieved_on" => "2019-12-21T00:00:00.000000-00:00",
  "state" => "in use",
  "subject" => "Elixir",
  "url" => "https://elixirforum.com"
}
iex> CustomParser.parse(CustomParser.sample2())
%{
  "retrieved_on" => "2019-12-21T00:00:00.000000-00:00",
  "state" => "in use",
  "subject" => "Phoenix",
  "url" => "https://www.phoenixframework.org"
}

I am not sure if this code properly assumes the shape of the input data though. The assumptions are shown in the hardcoded text samples.

From then on, you can additionally parse the date/times into proper Elixir objects. It’s pretty easy when you use timex.

I’d strongly recommend using something more proper like nimble_parsec for this task though!

The looks like a solution! Many thanks!!

But maybe there is potential for improvements, so in order to answer your questions and in order to explain better what I do on python side:

I have a function on python side, which converts an ion object into python:

def get_result(cursor):
    """
    Pretty print the result set. Returns the number of documents in the result set.

    :type cursor: :py:class:`pyqldb.cursor.stream_cursor.StreamCursor`/
                  :py:class:`pyqldb.cursor.buffered_cursor.BufferedCursor`
    :param cursor: An instance of the StreamCursor or BufferedCursor class.

    :rtype: int
    :return: Number of documents in the result set.
    """

    result = []
    for row in cursor:

        # Each row would be in Ion format.
        result.append(dumps(row, binary=False,
                            omit_version_marker=True))

    return result

When I print this function in python or call it via erlport in elixir and make an IO.inspect() I get the following result in iex (this time with original and not example data):

['{version:"BGBl. II Nr. 163/2007",valid:true,url:"https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=20005279",subject:"Einweisungsbest\\xe4tigung",state:"in use",retrieved_on:null,name:"Medizinproduktebetreiberverordnung (MPBV)",id:null,country:"AT",confirmation_template:[]}', '{subject:"Einweisungsbest\\xe4tigung",name:"Medizinproduktebetreiberverordnung (MPBV)",country:"AT",url:"https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=20005279",retrieved_on:2019-12-21T00:00:00.000000-00:00,version:"BGBl. II Nr. 163/2007",state:"in use",valid:true}']

So in this case, I get a single quoted result.

I also tried to convert this object on python side to json:

def get_result(cursor):
    """
    Pretty print the result set. Returns the number of documents in the result set.

    :type cursor: :py:class:`pyqldb.cursor.stream_cursor.StreamCursor`/
                  :py:class:`pyqldb.cursor.buffered_cursor.BufferedCursor`
    :param cursor: An instance of the StreamCursor or BufferedCursor class.

    :rtype: int
    :return: Number of documents in the result set.
    """

    result = []
    for row in cursor:

        # Each row would be in Ion format.
        result.append(json.dumps(dumps(row, binary=False,
                                       omit_version_marker=True)))

    return result

The result of “already converted to json on python side” is as I posted before single and double quoted:

['"{version:\\"BGBl. II Nr. 163/2007\\",valid:true,url:\\"https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=20005279\\",subject:\\"Einweisungsbest\\\\xe4tigung\\",state:\\"in use\\",retrieved_on:null,name:\\"Medizinproduktebetreiberverordnung (MPBV)\\",id:null,country:\\"AT\\",confirmation_template:[]}"', '"{subject:\\"Einweisungsbest\\\\xe4tigung\\",name:\\"Medizinproduktebetreiberverordnung (MPBV)\\",country:\\"AT\\",url:\\"https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=20005279\\",retrieved_on:2019-12-21T00:00:00.000000-00:00,version:\\"BGBl. II Nr. 163/2007\\",state:\\"in use\\",valid:true}"']

I don’t know Python but this doesn’t look right; it does not even attempt to format the keys to JSON – namely enclose them in double quotes. If you want to just pass a valid JSON to the Elixir code – which you should IMO, and you should ignore my code because it’s unsafe and makes too many assumptions – then fight with your Python code and make it format your key/value pairs as JSON. Shouldn’t be that hard I reckon.

You also have your double quotes strangely escaped: you have the backslash itself escaped. Pretty weird.

Thanks for your help and advise. I am going to play around on python side again. By the way the python side code is an example code snippet directly catched from the amazon python driver example code for the ledger database.

Eh, I didn’t help much.

I’d additionally advise you to go to StackOverflow and ask how can you format Amazon Ions to valid JSON in Python. Once you have that solved, the Elixir part is a piece of cake – just a matter of Jason.decode. :slight_smile:

Good luck!

thx dimitarvp. StackOverflow is a good idea!

it’s inside of a charlist.

iex(1)> bs = ?\\
92
iex(2)> q = ?"
34
iex(3)> IO.inspect [bs, q]
'\\"'
'\\"'

charlists: the one footgun of elixir :smiling_imp:

Have you tried to use json.dump here instead of the unqualified dumps noone knows where it is defined or imported from?

Perhaps you are lucky and you can already safely make them JSON that way.

Unfortunately, using just json.dumps does not work. json.dumps comes from python and “dumps” comes from amazon.ion.simpleion library which is needed to convert a bufferedCursor (result of query) object to an Amazon Ion.