Weird behavior of File.stream! and UTF-8 BOM

Hi there,

I am parsing a UTF-8 file and ran into weird issues with the first line (I read the file line by line). I create the stream like this:

File.stream!("data/foo_bar_file_name_utf8.txt", [:utf8])

The first line contains three extra non-printable bytes, <<239, 187, 191>>, which I learned (the hard way) are the UTF-8 byte order mark (BOM). Isn’t the File module supposed to remove these bytes from the stream? Or am I doing something wrong?

Cheers
Marcus


We do not strip the BOM by default because it may be important to whatever is consuming the stream. We could possibly support an option for it, though, such as :strip_bom. If you would like that, please open an issue in the issue tracker.
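
In the meantime, here is a minimal sketch of stripping the BOM by hand. It assumes the stream yields binaries and that only a UTF-8 BOM at the very start needs handling. The module name BomSketch and the function strip_bom/1 are made up for illustration and are not part of the File module.

```elixir
defmodule BomSketch do
  # Hypothetical helper, not part of Elixir's standard library.
  # Removes a leading UTF-8 byte order mark (bytes 239, 187, 191)
  # from the first element of an enumerable of binaries.
  def strip_bom(lines) do
    lines
    |> Stream.with_index()
    |> Stream.map(fn
      # Only the first element (index 0) can carry the BOM.
      {<<0xEF, 0xBB, 0xBF, rest::binary>>, 0} -> rest
      {line, _index} -> line
    end)
  end
end
```

It could then be piped after the stream, for example File.stream!("data/foo_bar_file_name_utf8.txt", [:utf8]) |> BomSketch.strip_bom(). The result is still lazy, so the file is not read until the stream is consumed.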


When reading line by line, I cannot imagine ever wanting the BOM in the first line, but having an extra option is fine as well. I filed #5695 in the issue tracker.

Thx
