File.steam! ignores weird Mac line breaks

I have to write an csv import for our app, knowing that this is always a master pain in the back, but…

It seams that importing CSV files that were exportet with OSX Excel version will not generate valid lines in File.stream!. Instead on big line is returned.

The line breaks are \r only. Is there a way to tell steam! to check for these particular line breaks or do I have to build the line reading from scratch?

It seams that importing CSV files that were exportet with OSX Excel version will not generate valid lines in File.stream!. Instead on big line is returned.

The line breaks are \r only. Is there a way to tell steam! to check for these particular line breaks or do I have to build the line reading from scratch?

Can you provide a reproducible test case?

I just tried using File.stream! on a very simple self-generated text file
(macOS 10.12.5) with \r line endings and they were handled correctly.

I don’t have Excel though.

sure:

iex(1)> File.write("test.txt", "first\rsecond")
:ok
iex(2)> File.stream!("test.txt") |> Enum.into([])
["first\rsecond"]
iex(3)>

my env is:
Erlang/OTP 19 [erts-8.3] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]
Elixir 1.4.2

iex(1)> File.write(“test.txt”, “first\rsecond”)
:ok
iex(2)> File.stream!(“test.txt”) |> Enum.into([])
[“first\rsecond”]
iex(3)>

You’re right, this doesn’t work. (My test file had both CRs and NLs.)

A “\r” on its own is not a line break supported by Elixir. You can define your own parser with NimbleCSV though: https://hexdocs.pm/nimble_csv/NimbleCSV.html

Note you won’t be able to use the stream variants though and you will have to load the whole csv into memory.

Well, it was a long time ago, and might still be used in some aged software (as we see here).

It was the official lineending in Mac OS before X and also in the Commodore OS’s (Amiga, C64, C+4, etc).

The english wikipedia isn’t clear on this, but the german one shows a small table:

Terminology and layout of the table should be clear even without translation.

The simplest solution might be to just normalise files before consuming them. It should be enough to treat it with tr '\r' '\n' < raw > normalised in the shell.

@michalmuskala: Yes, that’s exacly my solution for this issue. It is just a good thing to improve the platform by addressing these edge cases. This is my hotfix solution, not nice bit works: https://gist.github.com/mlankenau/784e8bc2e0a7ba1050715e68a03e5303
@NobbZ: It is Excel 2011 on OSX. I think that is not an small part of these business guys producing csv files.

At least for the Microsoft Office Version on Windows I used some years ago, I were able to specify some things when exporting or importing CSV, like line-endings and field separator. Perhaps this is possible on the MacOS version as well?

@NobbZ I am quite sure there are workarounds even for the customer. I could go and run sed to fix files as well. But CSV interfaces are a pain by definition. I had a customer that produced CSV with SQL that he never checked (quoting when delimeter is part of the text). So no point to make things more painful :slight_smile:

I filed an issue.

Wtf? Is this Mac OS9 or really OSX? Because the ending line character for OSX is \n, not \r.

Given that you could always transform the stream when you load it into the BEAM to convert all \r to \n, then streaming that to the csv parser, no third-party tools needed then.

But still, wtf? \r is not OSX’s linebreak character…

You are probably right. German Wikipedia says: Mac OS Classic, Apple II, C64 uses CR as line break.
But it seems to be an issue with Excel 2008/2011 that still uses this line breaks.
And yes - you are right - it is really a f*ck with CSV and Excel :slight_smile:

Do you expect much else from Microsoft products? :wink:

The WTF’s I get at work with IIS alone would fill multiple books… >.>

I feel blessed since 2010 when I returned my last Windows notebook to my former employer :slight_smile:

1 Like

Lol, yeah I have a Win10 notebook, all it does is gather dust sitting in my locked cabinet. I can get more work done using my phone and bluetooth keyboard than I can that thing. ^.^

Well, Desk IT (they manage the local desktops, build desks, etc… etc… I work in the Server IT area) comes around every 3 months freaking out because it has not been turned on in 3 months, their tracking system freaks out when something does not ‘phone home’ at least once every 3 months, which of course it cannot do if it is not even turned on… So they just turn it on for a moment after I unlock the cabinet, then turn it off and leave again for 3 months, so I lock the cabinet again… >.>

Yeah, its what Mac OS 9 used, support for which is basically the kernel of Excel for Mac. I’ve been banging my head against their version of CSV export for years now.

I’ll have to go dig up some code but i think what I do is a regex replacement on \r\n? to \r\n. It looks like yours is maybe better if you’re trying to stream it? I know I’m reading in the whole file to deal with the issue.

You might want to also verify your code still works with UTF8 characters—i’ve had weird issues crop up there as well when receiving input from users with international character sets. Its kind of hard to confirm because I’m like 3 stops down the line from the source, but its like something non-UTF8 gets stored by an upstream processor as UTF8 which apparently Works On Their Machine™ but causes headaches downstream.

Yes, i was afraid to blow up mem by loading the whole thing. In our case the csvs are rather small, but you never know.

I can confirm the issue. In project where we had csv import we always were struggling with our German Umlaute and stuff. But move to xls does not make it any better, even worse.

I was having the same issue. I solved it with the next code:

File.stream!( "test.txt")
|> Stream.flat_map(fn linea -> linea |> String.split("\r") end)
|> Enum.count()
2 Likes