Zip file name encoding

Hi,

I’m trying to unzip an uploaded archive and store the files it contains in my database. I have a working process, but I’ve just discovered that files with non-ASCII characters in their names are stored incorrectly in my Postgres database.

Here is how I unzip and save the files:

```elixir
with {:ok, handle} <- :zip.zip_open(path, [{:cwd, path_name}]),
      {:ok, file_names} <- :zip.zip_get(handle) do
  try do
    filter_hidden_files(file_names)
    |> to_plug_upload()
    |> to_multi(user, kind, Multi.new())
    |> Repo.transaction()
  after
    :zip.zip_close(handle)
  end
end

defp filter_hidden_files(_, files \\ [])

defp filter_hidden_files([], files) do
  files
end

defp filter_hidden_files([file | tail], files) do
  filename = Path.basename(file)

  case String.first(filename) === "." do
    true -> filter_hidden_files(tail, files)
    false -> filter_hidden_files(tail, files ++ [file])
  end
end

defp to_plug_upload(_, uploads \\ [])

defp to_plug_upload([], uploads) do
  uploads
end

defp to_plug_upload([file | tail], uploads) do
  upload = %Plug.Upload{
    content_type: MIME.from_path(file),
    filename: Path.basename(file),
    path: to_string(file)
  }

  to_plug_upload(tail, uploads ++ [upload])
end

defp to_multi([], %User{}, _kind, multi) do
  multi
end

defp to_multi(
        [file | tail],
        %User{entity_name: entity_name} = user,
        kind,
        multi
      ) do
  attrs = %{
    brand: entity_name,
    name: file,
    kind: kind,
    file_hash: file_hash(file),
    file_name: file.filename
  }

  m =
    %Media{}
    |> Media.changeset(attrs)

  to_multi(tail, user, kind, multi |> Multi.insert(file.filename, m))
end
```

When I upload and save a file called `2809575_xxléàî.jpg`, the following is saved in the database: `2809575_xxléàî.jpg?63728256310`

I suspect an encoding problem when reading the file names with Erlang’s `:zip` module.
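To illustrate the suspected problem, here is a minimal sketch assuming that `:zip` hands each name back as a charlist of raw UTF-8 bytes, one integer per byte (whether it does depends on the OTP version and on whether the archive marks its names as UTF-8). As far as I can tell, both `to_string/1` and `Path.basename/1` in `to_plug_upload/2` convert a charlist by treating every integer as a Unicode code point, so each non-ASCII byte gets UTF-8 encoded a second time; `:erlang.list_to_binary/1` keeps the bytes unchanged.

```elixir
# Assumption: :zip returned the name as raw UTF-8 bytes, one integer per byte,
# rather than one integer per character.
original = "2809575_xxléàî.jpg"
raw_bytes = :binary.bin_to_list(original)

# to_string/1 reads each integer as a code point, so every byte >= 128 is
# UTF-8 encoded again: "é" (bytes 195, 169) turns into "Ã©".
double_encoded = to_string(raw_bytes)

# :erlang.list_to_binary/1 copies the integers back as bytes, unchanged.
recovered = :erlang.list_to_binary(raw_bytes)

IO.inspect(double_encoded == original, label: "to_string round-trips")
IO.inspect(recovered == original, label: "list_to_binary round-trips")
```

If the stored names show this kind of “Ã”-style mojibake, the fix belongs at the point where the charlist from `:zip` is first turned into a string.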

Thanks.

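As far as I know, ZIP does not enforce any encoding for file names; they are stored as raw bytes, just as they come from the operating system. The format originally assumed IBM code page 437 and later added a flag (bit 11 of the general-purpose bit field) to mark names as UTF-8, but not every tool sets it.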
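To check what a particular archive actually contains, the first local file header can be decoded with binary pattern matching. This is a sketch following the layout in the ZIP specification (all fields little-endian); the module name `ZipPeek` is hypothetical, and it only looks at the first entry, ignoring encryption and ZIP64.

```elixir
defmodule ZipPeek do
  import Bitwise

  # Bit 11 of the general-purpose flag marks the file name as UTF-8;
  # without it, the name bytes are whatever the zipping tool wrote.
  @utf8_flag 0x0800

  # Decodes the first local file header ("PK\x03\x04") of a ZIP archive.
  def first_entry(<<0x50, 0x4B, 0x03, 0x04, _version::little-16, flags::little-16,
                    _method::little-16, _mtime::little-16, _mdate::little-16,
                    _crc32::little-32, _csize::little-32, _usize::little-32,
                    name_len::little-16, extra_len::little-16,
                    name::binary-size(name_len), _extra::binary-size(extra_len),
                    _rest::binary>>) do
    {:ok, %{name_bytes: name, utf8_flag?: band(flags, @utf8_flag) != 0}}
  end

  def first_entry(_other), do: {:error, :not_a_local_file_header}
end
```

For example, calling `ZipPeek.first_entry(File.read!(path))` on the uploaded archive shows whether the flag is set, and `String.valid?/1` on `name_bytes` shows whether the raw name happens to be valid UTF-8.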

So how well non-ASCII names survive depends on where you zip and where you unzip, but since one does not use non-ASCII in file names, this is not a problem at all :wink:

If you know the encoding used by the tool that created the archive (the “zipper”), you should be able to use iconv-like tools to convert the names into any encoding you like.
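Within Elixir, part of this is possible without iconv: `String.valid?/1` tells whether the raw name bytes are already UTF-8, and `:unicode.characters_to_binary/2` converts Latin-1 bytes to UTF-8. The helper `ZipName.decode_name/1` below is hypothetical, and its Latin-1 fallback is an assumption: ZIP’s historical default, code page 437, differs from Latin-1 above 0x7F and has no built-in converter in OTP, so names from such archives would need a lookup table or an external iconv call instead.

```elixir
defmodule ZipName do
  # Hypothetical helper: turns a name as returned by :zip (a charlist, or
  # already a binary) into a UTF-8 string suitable for the database.
  def decode_name(name) when is_list(name) do
    if Enum.all?(name, &(&1 in 0..255)) do
      # Every integer fits in a byte: rebuild the raw bytes first. (A decoded
      # charlist made only of Latin-1 code points also lands here and is then
      # handled by the Latin-1 fallback below.)
      name |> :erlang.list_to_binary() |> decode_name()
    else
      # Integers above 255 can only be code points, so the list is already decoded.
      List.to_string(name)
    end
  end

  def decode_name(name) when is_binary(name) do
    if String.valid?(name) do
      name
    else
      # Assumption: bytes that are not valid UTF-8 are Latin-1.
      :unicode.characters_to_binary(name, :latin1)
    end
  end
end
```

In the question’s `to_plug_upload/2`, the conversion would have to run on the raw charlist before `Path.basename/1` touches it, e.g. `filename: file |> ZipName.decode_name() |> Path.basename()`. `path: to_string(file)` is probably best left alone, since it has to match whatever name `:zip` used when writing the file to disk.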

Also remember that you get double trouble if the database encoding differs from the encoding of the data you write to it.
