IO.binstream is not unicode safe. What does it mean?

Hello,
I have read the IO module documentation, and I must say I am confused by it and by all the string-related types.
In particular, the documentation for the binstream function says that it is “Unicode unsafe”, and that “Finally, do not use this function on IO devices in Unicode mode as it will return the wrong result.”

Now if I try:

```elixir
iex> {:ok, pid} = "àéb€%$\nùç*µ" |> StringIO.open()
{:ok, #PID<0.114.0>}
iex> pid |> IO.binstream(:line) |> Enum.to_list()
["àéb€%$\n", "ùç*µ"]
```

It seems to work. Am I just lucky?

If I create an IO device with StringIO.open(), I understand it is a Unicode IO device. Am I supposed to stream it with IO.stream instead of IO.binstream? Does it make a difference?

Thanks!

PS: The binstream doc says that “The device is iterated by the given number of bytes or line by line if :line is given. This reads from the IO device as a raw binary.” What is a raw binary? Binary data in a non-Unicode encoding, for example? But how can binstream detect new lines if the encoding is unknown?

\n might work by accident because in almost all encodings it is byte 10. The issue with binstream shows up when the device supports multiple encodings, such as files. Try this:

  1. Write Unicode to a file
  2. Open the file with :utf8 flag
  3. Read the file with binread
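The three steps above can be sketched in Elixir as follows. The scratch file name is hypothetical, and since the exact value IO.binread/2 returns here may depend on the Erlang/OTP version, the sketch only inspects it rather than claiming a specific result.

```elixir
# Hypothetical scratch file used only for this experiment.
path = Path.join(System.tmp_dir!(), "binstream_demo.txt")

# 1. Write Unicode text to a file. File.write!/2 stores the UTF-8 bytes as-is.
File.write!(path, "àéb€%$\nùç*µ")

# 2. Open the file with the :utf8 flag, i.e. in Unicode mode.
file = File.open!(path, [:read, :utf8])

# 3. Read a line with IO.binread/2, which treats the device as a source of raw bytes.
result = IO.binread(file, :line)

File.close(file)
File.rm!(path)

# This is not the UTF-8 line that was written to the file.
IO.inspect(result, label: "binread on a :utf8 file")
```

Because € has no Latin-1 representation, one would expect an error tuple such as {:error, {:no_translation, :unicode, :latin1}} rather than mis-decoded text, but treat that as an expectation, not a guarantee.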

In a nutshell: if a file was opened without the :utf8 flag, use binread; otherwise, use read.
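A minimal sketch of that rule, assuming a hypothetical scratch file: each way of opening the file is paired with the matching read function, and both pairings return the same UTF-8 line.

```elixir
# Hypothetical scratch file used only for this example.
path = Path.join(System.tmp_dir!(), "read_vs_binread.txt")
File.write!(path, "àéb€%$\nùç*µ")

# Opened WITHOUT :utf8: the device hands out raw bytes, so use binread.
raw = File.open!(path, [:read])
binread_line = IO.binread(raw, :line)
File.close(raw)

# Opened WITH :utf8: the device decodes characters, so use read.
unicode = File.open!(path, [:read, :utf8])
read_line = IO.read(unicode, :line)
File.close(unicode)

File.rm!(path)
IO.inspect({binread_line, read_line})
```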

PS: yes, a raw binary is meant to be a string that is not in utf8 encoding. Improvements to the docs are welcome!
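To illustrate what a raw binary is, and why a byte-oriented reader can still find newlines, here is a small sketch with hand-picked bytes (chosen for illustration): “àé” encoded as Latin-1, then a newline, then “b”. It is not valid UTF-8, yet byte 10 can be located without knowing the encoding. It also checks that every byte of a multi-byte UTF-8 character is at least 0x80, which is why splitting UTF-8 text on byte 10 never cuts a character in half.

```elixir
# "àé\nb" encoded as Latin-1: one byte per character, so NOT valid UTF-8.
raw = <<224, 233, 10, ?b>>
valid_utf8 = String.valid?(raw)

# The newline is still the single byte 10, so it can be found
# without knowing which encoding the surrounding bytes use.
[first, second] = :binary.split(raw, "\n")

# Once the encoding is known, the bytes can be converted to a UTF-8 string.
decoded = :unicode.characters_to_binary(first, :latin1, :unicode)

# In UTF-8, all bytes of multi-byte characters such as "à" and "€" are >= 0x80,
# so byte 10 inside UTF-8 text is always a real newline.
all_high = "à€" |> :binary.bin_to_list() |> Enum.all?(&(&1 >= 0x80))

IO.inspect({valid_utf8, first, second, decoded, all_high})
```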


Thanks for the reply José.

I did the test you suggested, and using binstream on a file opened with :utf8 indeed gives wrong results.

From what I understand, the binstream doc implicitly refers to the File.open doc. But in my case, the input did not come from a file but from a string, so I was reading the documentation of StringIO.open, which also has an encoding option.

When the IO device comes from StringIO.open, no matter which encoding (:unicode or :latin1) is chosen or which function is used to stream (IO.binstream or IO.stream), everything seems to work fine.
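Even so, following the binstream doc's advice (do not use it on devices in Unicode mode), one consistent way to pair StringIO devices with the stream functions might look like the sketch below. This pairing is an inference from the docs quoted in this thread, not an official rule, and the encoding: option of StringIO.open is only available in reasonably recent Elixir versions.

```elixir
text = "àéb€%$\nùç*µ"

# Default StringIO device (:unicode mode): stream it with IO.stream/2,
# which reads characters rather than raw bytes.
{:ok, unicode_dev} = StringIO.open(text)
unicode_lines = unicode_dev |> IO.stream(:line) |> Enum.to_list()
StringIO.close(unicode_dev)

# StringIO device in :latin1 mode: its contents are treated as raw bytes,
# which is what IO.binstream/2 expects.
{:ok, latin1_dev} = StringIO.open(text, encoding: :latin1)
latin1_lines = latin1_dev |> IO.binstream(:line) |> Enum.to_list()
StringIO.close(latin1_dev)

IO.inspect({unicode_lines, latin1_lines})
```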

Thanks for your help!
