:zip.zip_list_dir not happy with "crazy" filenames (omit chars)

I have two errors to report. The Elixir forum cannot handle the chars I’m trying to submit, and Erlang’s :zip doesn’t appear to handle them either. I got an internal server error when I tried to save my question, so I’ve had to redact the chars from it.

The sequence of chars from a hexdump are:
0000000 2d20 85e3 e38b 8b85 85e3 e38b 8b85 000a
The sequence from okteta:
20 2D E3 85 8B E3 85 8B E3 85 8B E3 85
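The two dumps look different only because plain hexdump prints 16-bit words in host byte order, so each pair of bytes appears swapped. A minimal sketch of undoing that, assuming a little-endian host and that the trailing 00 is padding added for odd-length input:

```elixir
# Words exactly as printed in the hexdump line above.
words = [0x2D20, 0x85E3, 0xE38B, 0x8B85, 0x85E3, 0xE38B, 0x8B85, 0x000A]

# Emit each word low byte first to recover the on-disk byte order.
bytes = for w <- words, into: <<>>, do: <<w::little-16>>

# Drop the final zero byte (assumed to be hexdump padding).
<<text::binary-size(15), 0>> = bytes

# The remaining 15 bytes are valid UTF-8; E3 85 8B encodes U+314B.
IO.inspect(String.valid?(text))
IO.inspect(text)
```

If those assumptions hold, the redacted chars are plain UTF-8, which matches the okteta byte order.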

I am trying to read the contents of a zip file. The .zip file contains a file called “cloud/1533930101293-(omitted chars).docx”.

When I call :zip.zip_list_dir(zip_handle) I get a list of the files. However, the file with the “(omitted chars)” gets printed as “cloud/1533930101293-ãããã.docx”.

Here is some of the iex output of :zip.zip_list_dir:
{:zip_file, 'cloud/1533930179386-test.docx',
{:file_info, 6368, :regular, :read_write, {{2018, 8, 10}, {19, 50, 10}},
{{2018, 8, 10}, {19, 50, 10}}, {{2018, 8, 10}, {19, 50, 10}}, 54, 1, 0, 0,
0, 0, 0}, [], 2621, 5676},
{:zip_file,
[99, 108, 111, 117, 100, 47, 49, 53, 51, 51, 57, 51, 48, 49, 48, 49, 50, 57,
51, 45, 227, 133, 139, 227, 133, 139, 227, 133, 139, 227, 133, 139, 46,
100, 111, 99, 120],
{:file_info, 6470, :regular, :read_write, {{2018, 8, 10}, {19, 50, 10}},
{{2018, 8, 10}, {19, 50, 10}}, {{2018, 8, 10}, {19, 50, 10}}, 54, 1, 0, 0,
0, 0, 0}, [], 8356, 5779}

Note the first :zip_file “cloud/1533930179386-test.docx” displays nicely. However, the “crazy” file displays as a list of integers. I haven’t been able to get the filename returned by :zip to display correctly:

tmp = [99, 108, 111, 117, 100, 47, 49, 53, 51, 51, 57, 51, 48, 49, 48, 49, 50, 57,
51, 45, 227, 133, 139, 227, 133, 139, 227, 133, 139, 227, 133, 139, 46,
100, 111, 99, 120]

Before I go crazy with my hex editor, I’d like to know if I’m missing something silly, or if :zip doesn’t support this type of encoding. The zip 3.0 application on Ubuntu and Arch Linux is able to unzip the file correctly. Chrome can copy and paste the chars into the text window. (I just get errors when I try to save it.)

So I think part of the problem is the type of encoding used for the filename.

:erlang.iolist_to_binary(tmp) |> String.graphemes
["c", "l", "o", "u", "d", "/", "1", "5", "3", "3", "9", "3", "0", "1", "0", "1",
"2", "9", "3", "-", "ㅋ", "ㅋ", "ㅋ", "ㅋ", ".", "d", "o", "c", "x"]

So then I tried to_string(tmp) |> String.codepoints
["c", "l", "o", "u", "d", "/", "1", "5", "3", "3", "9", "3", "0", "1", "0", "1",
"2", "9", "3", "-", "ã", <<194, 133>>, <<194, 139>>, "ã", <<194, 133>>,
<<194, 139>>, "ã", <<194, 133>>, <<194, 139>>, "ã", <<194, 133>>, <<194, 139>>,
".", "d", "o", "c", "x"]

If you notice, where the funny chars show up you get a separate binary for each byte. I tried several different encodings: utf8, utf16 (big and little), utf32 (big and little), latin1, and a few others. If you find out what string encoding the file name uses, you should then be able to decode it.
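The difference between the two results comes down to how the integers in the list are read: to_string treats each one as a Unicode code point, while the :erlang conversions treat each one as a raw byte. A small sketch using just the three values from the list that make up one of the “crazy” chars (the variable names are only for illustration):

```elixir
# One "crazy" char as it appears in the list returned by :zip.
raw = [227, 133, 139]

# Integers read as code points U+00E3, U+0085, U+008B, then encoded as UTF-8.
as_codepoints = List.to_string(raw)

# Integers read as bytes, giving the original 3-byte UTF-8 sequence.
as_bytes = :erlang.list_to_binary(raw)

# An already mis-decoded string can be mapped back by re-encoding it as latin1,
# which turns each code point below 256 back into a single byte.
repaired = :unicode.characters_to_binary(as_codepoints, :utf8, :latin1)

IO.inspect(as_codepoints, binaries: :as_binaries)
IO.inspect(as_bytes)
IO.inspect(repaired == as_bytes)
```

This suggests the name in the archive is UTF-8 that is being handed back one byte per list element, rather than some other encoding.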

You might also be able to get by with

iex(6)> :erlang.iolist_to_binary(tmp)
"cloud/1533930101293-ㅋㅋㅋㅋ.docx"

iex(10)> tmp == :erlang.iolist_to_binary(tmp) |> :binary.bin_to_list
true

What operating system are you using and what encoding does your terminal use?

@crazymevt, thanks for showing me iolist_to_binary. That appears to be getting the correct chars. (How did you get them to print in the web page? I keep getting internal errors).

@NobbZ, they are created on Ubuntu 18.04 LTS using the Ruby 'zip' gem.

require 'zip'
...
buffer = Zip::OutputStream.write_buffer do |out|
...
  files.each do | file |
    next if file[:response_code] != 200
    out.put_next_entry("cloud/#{file[:file_name_base]}.#{file[:file_ext]}")
    out.write file[:response_body]
  end
end

All I did was copy and paste from the terminal. My guess is it’s because I pasted the console results from :erlang.iolist_to_binary which probably encoded the chars as utf8 in the terminal.

This does not answer my question about the encoding of your terminal.

All the systems are:
en_US.UTF-8

(Sorry, I misread.)

Passing my string to :erlang.binary_to_list before sending it to :zip allows me to get the files from the zip.

:zip.zip_get(:erlang.binary_to_list(filename), zip)

And passing the filename returned by zip via :erlang.iolist_to_binary(name) displays the name correctly.
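The two conversions can be wrapped up as a pair of helpers. This is only a sketch of what worked in this thread: ZipNames and its function names are made up for illustration and are not part of :zip, and other OTP releases may hand back filenames differently.

```elixir
defmodule ZipNames do
  # Hypothetical helper module, not part of :zip.

  # UTF-8 string -> list of bytes, the form passed to :zip.zip_get above.
  def to_zip(name) when is_binary(name), do: :erlang.binary_to_list(name)

  # List of bytes returned by :zip -> UTF-8 string for display.
  # Raises ArgumentError if the list contains integers above 255.
  def from_zip(name) when is_list(name), do: :erlang.iolist_to_binary(name)
end

name = "cloud/1533930101293-ㅋㅋㅋㅋ.docx"
raw = ZipNames.to_zip(name)

IO.inspect(raw, charlists: :as_lists)
IO.inspect(ZipNames.from_zip(raw) == name)
```

With these, the lookup from earlier would read :zip.zip_get(ZipNames.to_zip(filename), zip).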
