Binaries and Strings Length Confusion

I am reading a video file and want to create an Nx tensor for each frame:

frames =
  file_contents # list of binaries
  |> Enum.map(fn frame ->
    frame
    |> Nx.from_binary({:u, 8})
    |> Nx.reshape({h, w, 3})
  end)

The tensor creation works, but the reshaping fails.
The binaries I am reading all have different sizes according to byte_size, but String.length returns the expected length (width * height * 3 bytes) for every frame.
The largest difference in byte_size between frames is about 7% (roughly 42 kB).

Questions

  • Why do two seemingly equivalent measures differ so much?
  • How should I create the tensor so that I can reshape it?

byte_size/1 returns the number of bytes in a binary, whereas String.length/1 returns the number of Unicode graphemes in a UTF-8 string. I’m not sure what your reshaping function is doing, but this difference is likely related to the cause.
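To make the difference concrete, here is a minimal sketch using only the standard library. The dimensions and pixel values are made up for illustration, and the UTF-8 re-encoding shown is just one way such a mismatch can arise; it is an assumption, not a diagnosis of the code in the question.

```elixir
# "h\u00E9llo" contains U+00E9, which UTF-8 encodes as two bytes.
text = "h\u00E9llo"
IO.inspect(byte_size(text), label: "byte_size")          # 6
IO.inspect(String.length(text), label: "String.length")  # 5

# Hypothetical frame dimensions and pixel values, chosen only for illustration.
{h, w} = {2, 2}
expected = h * w * 3
pixels = [65, 66, 200, 210, 67, 68, 233, 250, 69, 70, 71, 72]

# One byte per value: the layout a u8 tensor of shape {h, w, 3} needs.
raw = :erlang.list_to_binary(pixels)

# One UTF-8 code point per value: values above 127 take two bytes each.
as_text = List.to_string(pixels)

IO.inspect(byte_size(raw) == expected, label: "raw has h * w * 3 bytes")   # true
IO.inspect(byte_size(as_text), label: "byte_size of UTF-8 version")         # 16
IO.inspect(String.length(as_text), label: "String.length of UTF-8 version") # 12
```

If the frames show this pattern (String.length equal to h * w * 3, byte_size larger), it may be worth checking whether the data is being converted to UTF-8 text somewhere before it reaches Nx.from_binary. A reshape to {h, w, 3} with a u8 type can only succeed when byte_size(frame) == h * w * 3.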
