Decimal to ASCII conversion

Hi all,

I am aware that this is an old topic but I just cannot seem to wrap my head around this. I am reading data sent to me through a socket connection. The data is just this simple thing:

{“timestamp”: “data”}

Over the wire (wireshark) I can see I am getting something like this:

{“2019-11-22 16:14:12”:“00450”}… (dots included)

As expected (well not really but I can understand the why) Elixir displays it as:

<<32, 123, 34, 50, 48, 49, 57, 45, 49, 49, 45, 50, 50, 32, 49, 54, 58, 50, 57, etc…

Is there a function or library to extract the ASCII?

I am borrowing this piece of code: https://gist.github.com/cblavier/9ff624d9afd3cdc5671786cfeefeb6ae

Which I am trying to use as:

def handle_info({:tcp, socket, packet}, state) do
  packet_str = StringUtil.raw_binary_to_string(packet)
  Logger.info("#{inspect(packet_str)}\n")
  {:noreply, state}
end

But so far no success. Am I doing something wrong? I know I can always map the binary to the ASCII equivalent but I really do not want to do that if I can avoid it.

Thanks,

Well, the part you posted is just the string " {\"2019-11-22 16:29", so there are probably some trailing null bytes or similar further along. Change your inspect(packet_str) to inspect(packet_str, limit: :infinity) to print the entire message. I bet it’s just a string with trailing nulls or so (a very common pattern). :slight_smile:

A useful debugging technique for these issues is to call String.codepoints/1 on the result. Using a bit of your data, to which I appended a <<0>>:

iex> x = <<32, 123, 34, 50, 48, 49, 57, 45, 49, 49, 45, 50, 50, 32, 49, 54, 58, 50, 0>>
<<32, 123, 34, 50, 48, 49, 57, 45, 49, 49, 45, 50, 50, 32, 49, 54, 58, 50, 0>>
iex> String.codepoints x
[
  " ",
  "{",
  "\"",
  "2",
  "0",
  "1",
  "9",
  "-",
  "1",
  "1",
  "-",
  "2",
  "2",
  " ",
  "1",
  "6",
  ":",
  "2",
  <<0>>
]

You can see it’s a lot easier to find where the non-printable characters are lurking! (A null byte is technically valid UTF-8, but it is not printable, so it shows up as <<0>>.)
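
Building on that, here is a small sketch that lists only the suspicious codepoints together with their position. The module and function names (BinDebug.non_printable/1) are made up for illustration, and the index is a codepoint index, not necessarily a byte offset:

```elixir
defmodule BinDebug do
  # Hypothetical helper: returns {codepoint, index} pairs for every
  # codepoint that String.printable?/1 rejects.
  def non_printable(bin) when is_binary(bin) do
    bin
    |> String.codepoints()
    |> Enum.with_index()
    |> Enum.reject(fn {cp, _index} -> String.printable?(cp) end)
  end
end

BinDebug.non_printable(<<32, 123, 34, 50, 0>>)
# => [{<<0>>, 4}]
```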

Thanks for the tip on inspect. I am now getting the full decimal output:

123 34 50 48 49 57 45 49 49 45 50 50 32 49 54 58 53 52 58 49 51 34 58 34 48 48 48 48 48 34 125 0 195 191 195 191 195 191 195 191

Which is:
{“2019-11-22 16:54:13”:“00000”}ÿÿÿÿ

I am not sure where the ÿÿÿÿ are coming from. I have to figure it out on the microcontroller that is sending me the data. Btw, there was a leading space at the beginning, you were right, I removed it.

So, is the idea that once I remove all that garbage I will get a nice clean ASCII formatted string, or would I still have a decimal which I need to convert to ASCII?

Thanks for the reply. Yes, I thought that was what the function I borrowed did to “decode” to ASCII.

def raw_binary_to_string(raw) do
  raw
  |> String.codepoints()
  |> Enum.reduce(fn w, result ->
    if String.valid?(w) do
      result <> w
    else
      <<parsed::8>> = w
      result <> <<parsed::utf8>>
    end
  end)
end
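
A possible explanation for the ÿÿÿÿ, assuming the microcontroller actually sent raw 255 (0xFF) bytes after the null: a lone 255 byte is not valid UTF-8, so the else branch above re-encodes it as the code point U+00FF (ÿ), which takes two bytes in UTF-8, 195 and 191. A quick check of that re-encoding:

```elixir
# A lone 255 byte is not valid UTF-8 ...
String.valid?(<<255>>)
# => false

# ... but the integer 255 encoded as a UTF-8 code point is two bytes.
<<255::utf8>> == <<195, 191>>
# => true

<<255::utf8>> == "ÿ"
# => true
```

If that assumption holds, the 195 191 pairs are produced by the conversion function, not by the device.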

So I would guess that the 0 marks the end of the string and the stuff after it is one of:

  • padding to some standard block length, or a multiple
  • binary control messages

I’m thinking the first is more likely, and if so, yeah, you just find the 0, truncate before it, and now you have a binary that is the string you’re looking for.
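
A minimal sketch of that truncation, assuming everything after the first 0 byte can be thrown away. It uses Erlang's :binary.split/2, which splits on the first occurrence of the pattern; the variable names and the sample packet are illustrative:

```elixir
packet = <<123, 34, 97, 34, 58, 34, 98, 34, 125, 0, 255, 255>>

# Everything before the first null byte is the payload.
[payload | _garbage] = :binary.split(packet, <<0>>)

payload
# => "{\"a\":\"b\"}"
```

If the packet contains no null byte at all, :binary.split/2 returns a one-element list, so payload is simply the whole packet.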

Yep, very likely. In your data of 123 34 50 48 49 57 45 49 49 45 50 50 32 49 54 58 53 52 58 49 51 34 58 34 48 48 48 48 48 34 125 0 195 191 195 191 195 191 195 191, that 0 is most definitely the end-of-string character, i.e. a standard null-terminated string. The ‘space’ at the beginning is fine and is allowed in JSON. So what you should probably do is just take your packet and parse out a null-terminated string. Here’s a function (entirely untested and typed in-post, but it ‘should’ work well):

def parse_null_terminated_utf8_string(bin), do: parse_null_terminated_utf8_string(bin, 0, byte_size(bin))
defp parse_null_terminated_utf8_string(bin, count, size) do
  case bin do
    <<_::size(count)-binary, 0, _::binary>> ->
      <<string::size(count)-binary, 0, rest::binary>> = bin
      {string, rest}
    _ when size > count ->
      parse_null_terminated_utf8_string(bin, count + 1, size)
    _ ->
      {bin, ""}
  end
end

It returns a tuple of the parsed string and the ‘rest’ of the binary, so use it like:

{packet_str, _rest} = parse_null_terminated_utf8_string(packet)

Or however you want to use it, can add it to your StringUtil module or something. ^.^

Thanks very much @OvermindDL1. I will try this out. If I need to make any changes to the function I will post them here for future reference.

@Asimov Did it work for you? I’m curious. Any changes to it needed? :slight_smile:

Hi @OvermindDL1,

Sorry I didn’t reply before. I was just able to come back to this today. It actually worked beautifully. It was a straightforward cut and paste. Thanks so much!

Awesome! Great to hear! :smile:

Actually I made a very small change that suits me better. Since I know that anything after the string termination is pure garbage (in my case), I just changed the return like this:

def parse_null_terminated_utf8_string(bin), do: parse_null_terminated_utf8_string(bin, 0, byte_size(bin))
defp parse_null_terminated_utf8_string(bin, count, size) do
  case bin do
    <<_::size(count)-binary, 0, _::binary>> ->
      <<string::size(count)-binary, 0, _rest::binary>> = bin
      string
    _ when size > count ->
      parse_null_terminated_utf8_string(bin, count + 1, size)
    _ ->
      bin
  end
end

Just replying here to clarify some things that look like sources of misunderstandings here.

In Elixir, strings are binaries. This is to say, strings are blobs of UTF-8 encoded binary data. This also means that whenever a binary “looks like” a string (i.e. UTF-8), IEx and inspect will print that binary as a readable string by default.

What happened here was the inverse of that. You got binary data and Elixir determined that it was not printable (because it contained a null byte for example), and printed out the binary representation. What you saw (<<32, 123, 34, 50, ...>>) is how Elixir pretty prints binaries (that are not printable as strings). They are not decimals, but a representation of binary data, with a number (0–255) representing the value of each byte.
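
To see this for yourself, here is a short sketch. String.printable?/1 and String.valid?/1 show why the null byte changes how the binary is displayed, and inspect/2 accepts a binaries: option to force one form or the other (handy for logging). The variable names are only for illustration:

```elixir
clean = "{key:value}"
dirty = clean <> <<0>>

String.printable?(clean)
# => true
String.printable?(dirty)
# => false
String.valid?(dirty)
# => true (a null byte is still valid UTF-8, just not printable)

# Same bytes, two ways of showing them:
inspect(dirty, binaries: :as_binaries)
# => "<<123, 107, 101, 121, 58, 118, 97, 108, 117, 101, 125, 0>>"
inspect(dirty, binaries: :as_strings)
```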

Hi @Nicd,

You are quite right. I am still trying to wrap my head around most of Elixir. What you described is exactly my case: I have a microcontroller that is sending a simple JSON {key: value}. It happens that it appends a NULL at the end. That is just how it is. I come from a Python world where I would do something like:

myContent = "{key:value}\x00"
myCleanContent = myContent.rstrip("\x00")

I know that pattern matching would give me what I want but I just cannot come up with what it is that I have to do (I know it must be terribly simple). Something like this?

{myCleanContent, nil} = {myContent, _}

Thanks!

Well in Elixir you could also simply do this:

iex(1)> myContent = "{key:value}\0"
<<123, 107, 101, 121, 58, 118, 97, 108, 117, 101, 125, 0>>
iex(2)> String.trim_trailing(myContent, <<0>>)
"{key:value}"

The issue with pattern matching is that it can’t deal with dynamic-length content at the start of a match. So you can’t do, for example, <<data::binary, 0>> = myContent. If you knew that your data was always a certain length, you could do

iex(4)> <<data::binary-size(11), _rest::binary>> = myContent
<<123, 107, 101, 121, 58, 118, 97, 108, 117, 101, 125, 0>>
iex(5)> data
"{key:value}"

But you don’t have to pattern match everything. It’s perfectly fine to do the first thing I showed you if the length is unknown.

Thanks so much @Nicd. Yes, it was that simple! And I was banging my head against the wall with it!