:erlang.term_to_binary/1 returns unexpected/incorrect binary for integer lists

Hey folks,

I have an Elixir application that communicates with a Java application through ports.
For communication, I convert values into binary format using :erlang.term_to_binary/1.

However, I’ve noticed that when :erlang.term_to_binary/1 receives a list of integers, it returns an incorrect binary.

For example:
Input: [2]
Expected: <<131, 108, 0, 0, 0, 1, 97, 2, 106>>
Result: <<131, 107, 0, 1, 2>>

Can anyone help me avoid this issue?

Thank you :smiley:

Best regards,
Kalle

Hey @KalleJoP I also get the same result, but Erlang at least seems to have no problem converting it back to a list:

iex(1)> [2] |> :erlang.term_to_binary |> :erlang.binary_to_term
[2]

Can you elaborate why you expect <<131, 108, 0, 0, 0, 1, 97, 2, 106>>?

Out of curiosity, I just tried:

iex(1)> :erlang.binary_to_term <<131, 108, 0, 0, 0, 1, 97, 2, 106>>                                                
[2]
iex(2)> :erlang.binary_to_term <<131, 107, 0, 1, 2>>                                                               
[2]
iex(3)> (:erlang.binary_to_term <<131, 108, 0, 0, 0, 1, 97, 2, 106>>) == (:erlang.binary_to_term <<131, 107, 0, 1, 2>>)
true

:thinking:

That’s the missing parentheses. Without parentheses around your function calls, I think that line is being interpreted as

:erlang.binary_to_term(<<131, 108, 0, 0, 0, 1, 97, 2, 106>> == :erlang.binary_to_term(<<131, 107, 0, 1, 2>>))

which is the same as

:erlang.binary_to_term(false)

With the parentheses in place:

iex(1)> :erlang.binary_to_term(<<131, 108, 0, 0, 0, 1, 97, 2, 106>>) == :erlang.binary_to_term(<<131, 107, 0, 1, 2>>)
true
true

I don’t know off the top of my head why the two binary forms are equivalent. I could guess but I am trying to avoid being further nerd sniped by this question right now. :-/

Seems like there’s a list and string type:

https://www.erlang.org/doc/apps/erts/erl_ext_dist.html#list_ext
https://www.erlang.org/doc/apps/erts/erl_ext_dist.html#string_ext

Sounds like there’s overlap for charlists.
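To make the overlap concrete, here is a small sketch that looks only at the tag byte following the 131 version byte. The module name `TagPeek` is made up for illustration. It suggests that a list is encoded as STRING_EXT (107) only when every element is an integer in 0..255, and as LIST_EXT (108) otherwise.

```elixir
defmodule TagPeek do
  # Hypothetical helper: returns the tag byte that follows the 131 version byte.
  def tag(term) do
    <<131, tag, _rest::binary>> = :erlang.term_to_binary(term)
    tag
  end
end

IO.inspect(TagPeek.tag([2]))       # 107, STRING_EXT: every element fits in a byte
IO.inspect(TagPeek.tag([1, 2, 3])) # 107
IO.inspect(TagPeek.tag([256]))     # 108, LIST_EXT: 256 does not fit in a byte
IO.inspect(TagPeek.tag([-1]))      # 108, LIST_EXT: negative integers are not bytes
IO.inspect(TagPeek.tag([2, :a]))   # 108, LIST_EXT: not every element is an integer
```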

Yes, I just found I was missing the parentheses! Puzzled and intrigued too… :slight_smile:

There are no parentheses missing.

I edited my previous post without making it evident :frowning: my bad

Thank you… I don’t know why I’m always so scared to go and look into the Erlang docs … :grimacing: :japanese_ogre:

I found out that the problem is the decoding on the Java side. I use the OtpErlang package, and for some reason it decodes lists of small integers as strings.

You can view it here:

Heh

iex(1)> :erlang.term_to_binary('hello') 
<<131, 107, 0, 5, 104, 101, 108, 108, 111>>
iex(2)> :erlang.term_to_binary('🧐')   
<<131, 108, 0, 0, 0, 1, 98, 0, 1, 249, 208, 106>>
iex(3)> '🧐'
[129488]
iex(4)> :erlang.term_to_binary([129488])
<<131, 108, 0, 0, 0, 1, 98, 0, 1, 249, 208, 106>>

[Edit] - For a minute I’d fooled myself into thinking that Erlang was intentionally supporting unicode in charlists but I wasn’t thinking straight. It’s just an artefact of Elixir’s charlist support

iex(1)> cs = 'hello 🧐'
[104, 101, 108, 108, 111, 32, 129488]
iex(2)> :erlang.iolist_to_binary(cs)
** (ArgumentError) errors were found at the given arguments:

  * 1st argument: not an iodata term

    :erlang.iolist_to_binary([104, 101, 108, 108, 111, 32, 129488])
    iex:2: (file)

Also 

> Only if a string contains code points < 256, can it be directly converted to a binary 

https://www.erlang.org/doc/apps/stdlib/unicode_usage.html#standard-unicode-representation

Lists of integers are charlists, which are the default string type in Erlang. A decoding library has no way to know whether your intent was to transfer a string encoded as a charlist or an actual list of integers.

Okay, I understand.

In my use case, there is no need to decode charlists, so I may be able to fix the decoding “issue”.

Another idea I have is to add an empty string or something similar as the first item in the list to prevent it from containing only integers.

What would be the best approach to work around this behavior?
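One possible approach on the Elixir side, as a minimal sketch: build the LIST_EXT form by hand so the bytes on the wire match the layout expected in the original post. `ForceListExt` is a hypothetical helper, not part of OTP or Elixir. It assumes the Java decoder maps LIST_EXT to a list type, and it only affects the top-level list; nested lists of bytes would still be encoded as STRING_EXT.

```elixir
defmodule ForceListExt do
  # Hypothetical helper, not part of OTP or Elixir.
  # Encodes a proper list as LIST_EXT (tag 108) even when every element is a
  # byte, where :erlang.term_to_binary/1 would pick STRING_EXT (tag 107).
  def encode([]), do: :erlang.term_to_binary([])

  def encode(list) when is_list(list) do
    elements =
      for item <- list, into: <<>> do
        # Encode each element on its own and drop the leading 131 version byte.
        <<131, body::binary>> = :erlang.term_to_binary(item)
        body
      end

    # 131 = version, 108 = LIST_EXT, 32-bit big-endian element count,
    # the encoded elements, then 106 = NIL_EXT as the tail of a proper list.
    <<131, 108, length(list)::32, elements::binary, 106>>
  end
end

IO.inspect(ForceListExt.encode([2]))
# <<131, 108, 0, 0, 0, 1, 97, 2, 106>>
```

Erlang still decodes the result with :erlang.binary_to_term/1, so the Elixir side can keep reading its own messages.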

Not sure what you mean by this, as even the link you gave points out:

> A Unicode string in Erlang is a list containing integers, where each integer is a valid Unicode code point and represents one character in the Unicode character set.

That’s what charlists are. You need to use the unicode module to convert to binary as you must choose which binary encoding to use (typically you want UTF-8).

Well

iex(1)> :unicode.characters_to_binary(['hello matey ', [129335, 8205, 9792, 65039]])
"hello matey 🤷‍♀️"

ill_get_my_coat

I’m sorry, I don’t understand what you mean by this.

The 107 in the binary indicates that it is STRING_EXT, a string, which is an optimisation for a list of bytes, while 108 indicates LIST_EXT, which is for a list where you specify each element and a tail.
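The two layouts can be pulled apart with binary pattern matching. This is a rough sketch that handles only these two tags, and `ExtPeek` is a made-up module name for illustration:

```elixir
defmodule ExtPeek do
  # Hypothetical helper; handles only the two layouts discussed in this thread.

  # STRING_EXT: tag 107, 16-bit length, then that many raw bytes.
  def describe(<<131, 107, len::16, bytes::binary-size(len)>>) do
    {:string_ext, :binary.bin_to_list(bytes)}
  end

  # LIST_EXT: tag 108, 32-bit element count, then the encoded elements and the tail.
  def describe(<<131, 108, count::32, rest::binary>>) do
    {:list_ext, count, rest}
  end
end

IO.inspect(ExtPeek.describe(<<131, 107, 0, 1, 2>>))
# {:string_ext, [2]}
IO.inspect(ExtPeek.describe(<<131, 108, 0, 0, 0, 1, 97, 2, 106>>))
# {:list_ext, 1, <<97, 2, 106>>}
```

In the second result, 97 is SMALL_INTEGER_EXT followed by the value 2, and 106 is NIL_EXT, the empty-list tail.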
