Binaries and Strings Length Confusion

I am reading a video file and want to create an Nx tensor for each frame:

frames = file_contents # list of binaries
  |> Enum.map(fn frame -> 
     Nx.from_binary(frame, {:u, 8}) |> Nx.reshape({h, w, 3})
  end)

The tensor creation works, but the reshaping fails.
The binaries I am reading all have a different size according to byte_size, but String.length returns the expected length (width * height * 3 bytes) for every frame.
The largest difference in byte_size between frames is about 7% (ca. 42 kB).

Questions

  • Why is there such a big difference between something seemingly equal?
  • How do I need to create the tensor, such that I can reshape it?

byte_size/1 returns the number of bytes. String.length/1 returns the number of Unicode graphemes in a UTF-8 string. I’m not sure what your reshaping function is doing, but this is likely related to the cause.
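
To make the difference concrete, here is a minimal sketch using only the standard library. The `raw` bytes are made up for illustration, and the Latin-1 to UTF-8 step is only a guess at how frames could end up with a matching String.length but a larger, varying byte_size; nothing in the question confirms that this is what happened to the video data.

```elixir
# "\u00E9" (e with acute accent) is one grapheme but two bytes in UTF-8.
text = "h\u00E9llo"
IO.inspect(byte_size(text))      # 6
IO.inspect(String.length(text))  # 5

# Hypothetical raw pixel data: three bytes, two of them >= 128.
raw = <<200, 10, 255>>

# Assumption: somewhere upstream the bytes were treated as Latin-1 text and
# re-encoded as UTF-8. Every byte >= 128 then becomes a two-byte sequence.
inflated = :unicode.characters_to_binary(raw, :latin1, :utf8)
IO.inspect(byte_size(inflated))      # 5
IO.inspect(String.length(inflated))  # 3

# If that is what happened, the conversion can be reversed.
restored = :unicode.characters_to_binary(inflated, :utf8, :latin1)
IO.inspect(restored == raw)  # true
```

If the frames really were re-encoded like this, the fix belongs where the file is read: get the raw bytes without any encoding conversion, and check that byte_size(frame) == h * w * 3 before calling Nx.reshape.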

What is the difference between binaries and strings?

Like <<34,34,23,55>> and “asbvc”.
How do I convert one to the other?

iex(1)> <<34,34,23,55>>
<<34, 34, 23, 55>>
iex(2)> "asbvc" |> inspect(binaries: :as_binaries)
"<<97, 115, 98, 118, 99>>"

They just aren’t the same thing at all? What does “convert” mean here?

EDIT: Re-reading your question, are you asking just in general about binaries and strings or those specific binaries?

If it’s about them in general, “Binaries in Elixir” on Exercism talks through things, as does “Binaries, strings, and charlists” in the Elixir v1.17.1 docs.
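
As a rough illustration of the general case, the sketch below uses the two values from the question. It assumes nothing beyond the standard library: a string is a binary whose bytes are valid UTF-8, so "converting" is mostly a matter of how the bytes are viewed or printed.

```elixir
raw = <<34, 34, 23, 55>>
text = "asbvc"

# Both values are binaries.
IO.inspect(is_binary(raw))   # true
IO.inspect(is_binary(text))  # true

# The same bytes written two ways compare equal.
IO.inspect(<<97, 115, 98, 118, 99>> == text)  # true

# All four bytes of raw are ASCII, so it is valid UTF-8, but byte 23 is a
# non-printable control character, which is why IEx shows it as <<...>>.
IO.inspect(String.valid?(raw))      # true
IO.inspect(String.printable?(raw))  # false

# Not every binary is a string: a lone 255 byte is not valid UTF-8.
IO.inspect(String.valid?(<<255>>))  # false

# View a string as its list of bytes.
bytes = :binary.bin_to_list(text)
IO.inspect(bytes, charlists: :as_lists)  # [97, 115, 98, 118, 99]
```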