Can´t generate file (csv) with special chars

how-to-question

#1

Hi,

i use the following code to generate a csv file. The code itself works. The problem are special chars / umlauts.

  • if i use :utf8 i get the file, but special chars like €, öäü are written like €, ü, …

  • if i don´t use :utf8 it works as long as there are no special chars in a row like €, –, … . Then it breaks with the error erlang error: :no_translation

      file = File.open!(System.tmp_dir()<>"/"<>filename, [:write, :utf8])
      users = get_users(conn, params)
               |> Repo.all
               |> CSV.encode(separator: ?;)
               |> Enum.each(&IO.write(file, &1))
    

Thanks!


#2

When you write using utf8 make sure you use it for reading as well. As you describe it, it seems as if your reader uses latin1 aka iso-8859-1


#3

I just realize that you get data from db, maybe it’s not utf8 in there, elixir will not convert.


#4

Are you using MySQL or MariaDB? They use latin1 as their default apparently.


#5

None of them. Postgres 9.6 on AWS RDS. Default settings.


#6

AWS RDS with postgres 9.6 is set to utf8. Not sure what to change in code.


#7

So what do you use to display the output?


#8

Excel 2016. I know you can set encoding on import. But at the end i need a file which can be opened without doing anything. Our users don´t know how do import a csv in excel.


#9

Then you will have 3 options:

  • Deliver an excel spreadsheet with correctly set up encoding,
  • Restrict yourself to US-ASCII
  • Write proper documentation how to import your data into Excel

If the customer doesn’t know how to do, it is not your fault. Also it is not your fault, that excel doesn’t use utf8 as default import encoding. Even worse, depending on the system locale, excel may assume different default encodings (thats why I told you to restrict yourself to US-ASCII, which is a subset of all encodings used by MS I already encountered in the wild).

So two things that aren’t even your fault, that you can’t do anything about, but your customer tries to blame you?


#10

Mhm, i know how to generate a csv. But as fas as i know i can´t set encoding in the file(?). I don´t know how to generate a xls file. But thanks for your input and time.

And yes, the customers blame me :frowning: We have strange customers.


#11

Maybe you can use elixlsx?


#12

I´ve seen it. But it seems more suited for xlsx file build from the ground up, not from a database query. But thanks.


#13

Well, even the csv isn’t build from a db query but from data.

Of course when spitting out excel you need to have more to do to bootstrap the file. Also excel spreadsheets are so much more powerful than plain csv. To be able to use this power in your export as well, of course the library needs to have a slightly more complex API than a CSV export.


#14

I have encountered this problem at a prior job. You’ll also find different behavior for different version of Excel and whether it is the Windows or Mac version. Here’s a very good summary of the problem and possible approaches: https://stackoverflow.com/questions/6002256/is-it-possible-to-force-excel-recognize-utf-8-csv-files-automatically

If I recall, what we did was offer the user to export as either .csv or .xls. The .xls was really html table markup filled with the data, but Excel recognized that and imported it as utf8. This approach is discussed in the link above, though not the accepted answer.


#15

This may be too much of a tangent, but since this question is really about Excel and not about Elixir, I don’t feel too bad. If you know Ruby, this is the code my co-worker extracted for doing that “html table inside a .xls file” approach. Even if you don’t know Ruby, the README has more background too. https://github.com/livingsocial/excelinator


#16

Check out my library codepagex on hex.pm. You shoul be able to convert between iso-___- and utf8 easily. Also in the readme of that package, mentioned some simpler built in erlang functions for the same functionality


#17

Just to mention what I think in general avout this stuff: convert to utf8 at the boundaries of your program. Ie. when reading or writing to csv files. Then stick to utf8 inside your system


#18

No need to convert anything. The problem is that the customers reader (Excel) does default to some iso-encoding and therefore can’t read utf8 encoded CSV properly. The OPs system does already use utf8 internally completely as he describes.

Converting UTF8 to iso before writing isn’t a solution either, due to the possible loss of information.


#19

In my experience Excel doesnt generate utf8 csv files

Sendt fra min telefon


#20

Excel isn’t used as Writer/Generator, but as a Reader in this case…