We do, but all libraries are dealing with the File.read/File.stream! output, which doesn’t have an option to specify a line-separation sequence, as Python does for example (see “universal newlines” in the “Glossary — Python 2.7.18 documentation”).
This is described in the second link of original message:
File.Stream uses IO.each_stream, which uses :io.get_line. LF and CRLF are handled, but not CR, which is used in classic Mac OS and other legacy formats.
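One workaround, sketched here under the assumption that the file fits in memory: read the whole binary and split on all three conventions at once, instead of relying on File.stream!/:io.get_line.

```elixir
# Sketch: split on CRLF, LF and CR in one pass. :binary matching takes the
# longest pattern at a position, so "\r\n" wins over a lone "\r" and CRLF
# files don't produce spurious empty lines.
defmodule CRSafeLines do
  def read_lines(path) do
    path
    |> File.read!()
    |> String.split(["\r\n", "\n", "\r"])
  end
end
```

For very large files you’d want a streaming variant, but for typical CSV uploads this is usually enough.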
I use nimble_csv myself, and the thing it (and just about every other CSV library I’ve ever used, in any language) breaks down on is the weird encodings you often get from Excel users. I’ve basically given up on parsing those documents, which I’m pretty sure are broken but also very common, so I go with the regex replacement above.
The problem with setting the line endings in nimble is that you have to know ahead of time what’s in the file, and if you have a wide variety of these files coming in, you can’t just set it once and forget it.
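One way around that is to define a parser per convention at compile time and pick one per file at runtime. This is only a sketch: the module names are made up, and the `:newlines` option is what nimble_csv documents for accepted line endings, so double-check it against the version you use.

```elixir
# Hypothetical parser modules, one per line-ending convention.
NimbleCSV.define(MyCSV.Unix, separator: ",", newlines: ["\r\n", "\n"])
NimbleCSV.define(MyCSV.OldMac, separator: ",", newlines: ["\r"])

defmodule MyCSV do
  # Pick the parser by sniffing the binary: a file with bare CRs and no
  # CRLFs is assumed to be an old-Mac file.
  def parse(binary) do
    parser =
      if String.contains?(binary, "\r") and not String.contains?(binary, "\r\n"),
        do: MyCSV.OldMac,
        else: MyCSV.Unix

    parser.parse_string(binary, skip_headers: false)
  end
end
```

You still pay the cost of one extra scan over the file, but you only have to define the parsers once.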
I ran into this problem and I cleaned the file up on the command line using this Unix command.
tr -d '\r' < file1.csv > file2.csv
It removes all of the carriage returns, leaving the file with \n endings. I found this to be the easiest solution for me to work with. I often just use simple Unix commands to clean files up, which makes them much easier to work with in Elixir.
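Note the -d form above suits CRLF files, where a newline remains after the CR is deleted. For CR-only (classic Mac) files you’d translate instead of delete, so the line breaks survive:

```shell
# Turn bare CRs into LFs instead of removing them.
printf 'a,b\rc,d\r' | tr '\r' '\n'
```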
I have the same problem, but I wrote a thin CSV-to-JSON converter; it’s not generic though. I simply scan each document from the beginning and check which line ending I encounter first and which delimiter is there, comma or semicolon. Then I pass it to a proper CSV-reading library with the proper line ending and delimiter.
Not the fastest approach, far from perfect, but so far in production had only one not parsed/erroring file and it was really XLS with csv file extension.
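That sniffing step can be sketched roughly like this, assuming the first 4 KB of the file are representative (module name and sample size are my own choices, not from the post):

```elixir
defmodule Sniff do
  # Report whichever line ending and delimiter show up first in a sample.
  def detect(binary) do
    sample = binary_part(binary, 0, min(byte_size(binary), 4096))

    newline =
      cond do
        String.contains?(sample, "\r\n") -> "\r\n"
        String.contains?(sample, "\r") -> "\r"
        true -> "\n"
      end

    delimiter = if first_at(sample, ";") < first_at(sample, ","), do: ";", else: ","
    {newline, delimiter}
  end

  # Position of the first occurrence; :infinity (an atom, which sorts after
  # every integer in Erlang term order) when the pattern is absent.
  defp first_at(sample, pattern) do
    case :binary.match(sample, pattern) do
      {pos, _len} -> pos
      :nomatch -> :infinity
    end
  end
end
```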
PS. That Rust program is run through the CLI, piping stdout back to the application.
Actually I make converters that they have to use. ^.^
Like the banking reports are… random. I have SO many fixups to get it into a defined format.
Like, theirs is something kind of like CSV, except it doesn’t escape, so, say, commas in text (which happens often) are just ‘there’, unescaped and unquoted… It is horrifying. But my cleanup scripts fix them up. ^.^
Dang, I really don’t understand how you dealt with unescaped commas in a Comma Separated Value (CSV) file. How do you even know where the commas are supposed to go? I guess if you have a description field next to a field with known values you can say that any extra commas on the row go to the description field, but that sounds quite error-prone.
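That heuristic can at least be sketched. Everything here is an assumption you’d have to know about the feed: the expected column count, and which single column is free text and may swallow the surplus commas.

```elixir
defmodule UnescapedCSV do
  # Fold surplus commas back into the one known free-text column at
  # desc_index (0-based). Only works if exactly one column is free text.
  def fix_row(line, expected_cols, desc_index) do
    parts = String.split(line, ",")
    extra = length(parts) - expected_cols

    if extra <= 0 do
      parts
    else
      {head, rest} = Enum.split(parts, desc_index)
      {desc, tail} = Enum.split(rest, extra + 1)
      head ++ [Enum.join(desc, ",")] ++ tail
    end
  end
end
```

As noted above, this is error-prone the moment two free-text columns exist, or a row is short rather than long.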
Sometimes I wish there were a file format similar to CSV but complicated enough that everyone would realize they need an actual parser/writer. CSV isn’t as simple as sticking commas between fields. (JSON isn’t a good alternative due to all of the key duplication if you have lots of rows.)
Anything that is able to distinguish between at least 5 basic types: Integers, Floats, Strings, Composites, and an ordered collection of items (List/Array).
It also needs to support a Schema.
Therefore, whenever a client yells at me that he can provide a JSON or CSV dump of his database, I ask for an SQL export instead.
If he wants to provide a CSV export of some OOo, LO or Excel spreadsheet, I ask for a copy of the spreadsheet.
If he insists on giving us only CSV or JSON, my boss doubles the price per hour…
We’ve been through many hoops that could have been avoided if we had had access to the raw data right away… So we came up with the solution mentioned above: after adjusting the pay rate, most clients are suddenly able to provide raw data after all, and they no longer care about all the corporate policies they mentioned earlier in the discussion…