Unable to recreate file after downloading it in chunks

I am trying to download files from Azure Blob Storage and process them after my users upload them through a separate client app. For small files I have no issue downloading the data and writing it to a file; however, larger ones time out, so I am using the HTTP Range header to download the file in smaller parts and then write it to a file. My issue is that after downloading all the data I get this error: (ErlangError) Erlang error: :no_translation

If I download a small file that fits entirely inside a single range, the same code works perfectly. This leads me to believe I may be doing something wrong when I combine the bitstrings.

Here is the code:

    # Ranges used to chunk the file
    ranges = blob_ranges(get_blob_length!(blob_name, player_id))

    {_last_byte, data} =
      Enum.reduce(ranges, {0, ""}, fn end_byte, {start_byte, acc} ->
        case get_blob!(blob_name, player_id, [timeout: 120], [{"x-ms-range", "bytes=#{start_byte}-#{end_byte}"}]) do
          # The error clause must come before the catch-all, or it can never match
          {:error, err} ->
            IO.puts("Request failed: #{Exception.message(err)}")
            {start_byte, acc}

          blob ->
            {end_byte + 1, acc <> blob}
        end
      end)

    create_temp_file(blob_name, data)
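
For context, blob_ranges is a small helper (not shown here) that returns the inclusive end offset of each chunk; a minimal version might look something like this, assuming a fixed chunk size:

    # Illustrative sketch only - returns the inclusive end offset of each chunk,
    # e.g. blob_ranges(2500, 1024) => [1023, 2047, 2499]
    defp blob_ranges(length, chunk_size \\ 1024) do
      Enum.map(chunk_size..(length + chunk_size - 1)//chunk_size, fn upper ->
        min(upper - 1, length - 1)
      end)
    end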

I think the best bet would be to download the larger file locally in one piece, load it in, and dump what the bytes look like compared to your algorithm's streamed parts, then modify the algorithm until they match :thinking:
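
For example, something along these lines, assuming your get_blob!/4 can fetch the whole blob in one request when no range header is passed and the file is small enough not to time out:

    # Illustrative only: fetch the blob in a single request and diff it
    # byte-by-byte against the chunked result written to disk
    whole = get_blob!(blob_name, player_id, [timeout: 120], [])
    chunked = File.read!("./#{blob_name}")

    whole
    |> :binary.bin_to_list()
    |> Enum.zip(:binary.bin_to_list(chunked))
    |> Enum.find_index(fn {a, b} -> a != b end)
    |> IO.inspect(label: "first differing byte offset")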

The :no_translation code is returned from a couple of spots in the BEAM's file server, all related to Unicode translation. For instance, writing bytes that are not valid UTF-8 to a device opened in :utf8 mode triggers it:
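
    # Minimal reproduction (the exact error shape can vary by OTP version):
    {:ok, file} = File.open("demo.bin", [:write, :utf8])
    # <<0xFF>> is never valid UTF-8, so the io server cannot translate it
    IO.write(file, <<0xFF>>)
    # => ** (ErlangError) Erlang error: :no_translation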

Does the error still happen with larger text-only files?

Yes, I’m trying to download a large CSV file. I’ve now been able to get data written to the file; however, any file that I fetch via multiple requests ends up corrupted, with just random letters and characters written to it. Here is the updated code:


    def download_blob_in_chunks(blob_name, player_id) do
      chunk_size = 1024
      length = get_blob_length!(blob_name, player_id)
      ranges = blob_ranges(length, chunk_size)
      IO.inspect(ranges)

      # Open a temporary file for writing (note the :utf8 mode)
      {:ok, file} = File.open("./#{blob_name}", [:write, :utf8])

      # Iterate over the ranges, downloading chunks and writing them to the file
      Enum.reduce_while(ranges, 0, fn end_byte, acc ->
        if acc < length do
          case get_blob!(blob_name, player_id, [timeout: 120], [
                 {"x-ms-range", "bytes=#{acc}-#{acc + chunk_size}"},
                 {"x-ms-range-get-content-md5", "true"}
               ]) do
            {:error, err} ->
              IO.puts("Request failed: #{Exception.message(err)}")
              {:halt, acc}

            blob ->
              # Convert the binary data to a UTF-8 string before writing
              case :unicode.characters_to_binary(blob, :utf8, :utf8) do
                binary_chunk when is_binary(binary_chunk) ->
                  # Write the chunk to the file as UTF-8 text
                  IO.write(file, binary_chunk)
                  {:cont, end_byte + 1}

                _ ->
                  IO.puts("Failed to convert binary data to UTF-8")
                  {:cont, acc}
              end
          end
        else
          {:halt, acc}
        end
      end)
    end

This will not work. Your chunk may straddle a UTF-8 multi-byte sequence, in which case the partial character at the chunk boundary cannot be converted. You need to concatenate and write the raw binary instead.
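
A minimal sketch of that fix, reusing your (hypothetical) get_blob!/4 and blob_ranges/2 helpers: open the file without :utf8 and write raw bytes with IO.binwrite/2, so a character split across two chunks is reassembled intact on disk.

    def download_blob_in_chunks(blob_name, player_id) do
      chunk_size = 1024
      length = get_blob_length!(blob_name, player_id)
      {:ok, file} = File.open("./#{blob_name}", [:write, :binary])

      Enum.reduce_while(blob_ranges(length, chunk_size), 0, fn end_byte, start_byte ->
        case get_blob!(blob_name, player_id, [timeout: 120], [
               {"x-ms-range", "bytes=#{start_byte}-#{end_byte}"}
             ]) do
          {:error, err} ->
            IO.puts("Request failed: #{Exception.message(err)}")
            {:halt, start_byte}

          blob ->
            # IO.binwrite/2 passes the bytes through with no encoding translation
            IO.binwrite(file, blob)
            {:cont, end_byte + 1}
        end
      end)

      File.close(file)
    end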
