I have to write an csv import for our app, knowing that this is always a master pain in the back, but…
It seams that importing CSV files that were exportet with OSX Excel version will not generate valid lines in File.stream!. Instead on big line is returned.
The line breaks are \r only. Is there a way to tell steam! to check for these particular line breaks or do I have to build the line reading from scratch?
It seams that importing CSV files that were exportet with OSX Excel version will not generate valid lines in File.stream!. Instead on big line is returned.
The line breaks are \r only. Is there a way to tell steam! to check for these particular line breaks or do I have to build the line reading from scratch?
Can you provide a reproducible test case?
I just tried using File.stream! on a very simple self-generated text file
(macOS 10.12.5) with \r line endings and they were handled correctly.
The simplest solution might be to just normalise files before consuming them. It should be enough to treat it with tr '\r' '\n' < raw > normalised in the shell.
@michalmuskala: Yes, that’s exacly my solution for this issue. It is just a good thing to improve the platform by addressing these edge cases. This is my hotfix solution, not nice bit works: https://gist.github.com/mlankenau/784e8bc2e0a7ba1050715e68a03e5303 @NobbZ: It is Excel 2011 on OSX. I think that is not an small part of these business guys producing csv files.
At least for the Microsoft Office Version on Windows I used some years ago, I were able to specify some things when exporting or importing CSV, like line-endings and field separator. Perhaps this is possible on the MacOS version as well?
@NobbZ I am quite sure there are workarounds even for the customer. I could go and run sed to fix files as well. But CSV interfaces are a pain by definition. I had a customer that produced CSV with SQL that he never checked (quoting when delimeter is part of the text). So no point to make things more painful
Wtf? Is this Mac OS9 or really OSX? Because the ending line character for OSX is \n, not \r.
Given that you could always transform the stream when you load it into the BEAM to convert all \r to \n, then streaming that to the csv parser, no third-party tools needed then.
But still, wtf? \r is not OSX’s linebreak character…
You are probably right. German Wikipedia says: Mac OS Classic, Apple II, C64 uses CR as line break.
But it seems to be an issue with Excel 2008/2011 that still uses this line breaks.
And yes - you are right - it is really a f*ck with CSV and Excel
Lol, yeah I have a Win10 notebook, all it does is gather dust sitting in my locked cabinet. I can get more work done using my phone and bluetooth keyboard than I can that thing. ^.^
Well, Desk IT (they manage the local desktops, build desks, etc… etc… I work in the Server IT area) comes around every 3 months freaking out because it has not been turned on in 3 months, their tracking system freaks out when something does not ‘phone home’ at least once every 3 months, which of course it cannot do if it is not even turned on… So they just turn it on for a moment after I unlock the cabinet, then turn it off and leave again for 3 months, so I lock the cabinet again… >.>
Yeah, its what Mac OS 9 used, support for which is basically the kernel of Excel for Mac. I’ve been banging my head against their version of CSV export for years now.
I’ll have to go dig up some code but i think what I do is a regex replacement on \r\n? to \r\n. It looks like yours is maybe better if you’re trying to stream it? I know I’m reading in the whole file to deal with the issue.
You might want to also verify your code still works with UTF8 characters—i’ve had weird issues crop up there as well when receiving input from users with international character sets. Its kind of hard to confirm because I’m like 3 stops down the line from the source, but its like something non-UTF8 gets stored by an upstream processor as UTF8 which apparently Works On Their Machine™ but causes headaches downstream.
Yes, i was afraid to blow up mem by loading the whole thing. In our case the csvs are rather small, but you never know.
I can confirm the issue. In project where we had csv import we always were struggling with our German Umlaute and stuff. But move to xls does not make it any better, even worse.