Weird behavior of File.stream! and UTF-8 BOM

Hi there,

I am parsing a UTF-8 file and ran into weird issues with the first line (I read line by line). I create the stream like this: File.stream!("data/foo_bar_file_name_utf8.txt", [:utf8])

The first line contains three extra non-printable characters, <<239, 187, 191>>, which I learned (the hard way) are the byte order mark. Isn’t the File module supposed to remove these characters from the stream? Or am I doing something wrong?



We do not strip those by default because they may be important to whatever is consuming the stream. We could possibly support an option for this, though, such as :strip_bom. If that is desired, please open an issue in the issue tracker.
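
Until such an option exists, the BOM can be dropped by hand on the first line only. The following is a minimal sketch, not anything provided by the File module: the module name BomExample and the functions strip_bom/1 and stream_without_bom/1 are hypothetical names made up for illustration, and it assumes the file is read as raw bytes, so the BOM shows up as <<239, 187, 191>>.

```elixir
defmodule BomExample do
  # Hypothetical helper module, not part of Elixir's standard library.

  # Drops a leading UTF-8 BOM (bytes 0xEF 0xBB 0xBF) if the binary starts with one.
  def strip_bom(<<0xEF, 0xBB, 0xBF, rest::binary>>), do: rest
  def strip_bom(line) when is_binary(line), do: line

  # Streams the file line by line and strips the BOM from the first line only.
  # All other lines are passed through untouched.
  def stream_without_bom(path) do
    path
    |> File.stream!()
    |> Stream.with_index()
    |> Stream.map(fn
      {line, 0} -> strip_bom(line)
      {line, _index} -> line
    end)
  end
end
```

It could then be used as BomExample.stream_without_bom("data/foo_bar_file_name_utf8.txt") |> Enum.each(&IO.write/1). Only the first line is checked, so three matching bytes that happen to appear at the start of a later line are left alone.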


When reading line by line, I cannot imagine ever wanting the BOM in the first line, but having an option for it is fine as well. I filed #5695 in the issue tracker.

