Converting a smart cell to a one-file Elixir script

I am trying to convert a Livebook smart cell into a one-file Elixir script, mostly for educational reasons. I can successfully Mix.install the dependencies, load the model (“Salesforce/blip-image-captioning-base”), create a featurizer and a tokenizer, run the configure step and create a serving.

serving =
  Bumblebee.Vision.image_to_text(model_info, featurizer, tokenizer, generation_config,
    compile: [batch_size: 1],
    defn_options: [compiler: EXLA]
  )
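
For reference, the setup steps described above might look roughly like this in a standalone script. This is only a sketch: the unversioned dependency list and the max_new_tokens value are assumptions and are not taken from the original smart cell.

```elixir
# Assumption: unversioned deps resolve to the latest releases; pin versions in real use.
Mix.install([:bumblebee, :exla])

repo = {:hf, "Salesforce/blip-image-captioning-base"}

# Download (or load from the local cache) the model and its companions.
{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, featurizer} = Bumblebee.load_featurizer(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
{:ok, generation_config} = Bumblebee.load_generation_config(repo)

# The "configure step"; the value 100 is an assumed example.
generation_config = Bumblebee.configure(generation_config, max_new_tokens: 100)

serving =
  Bumblebee.Vision.image_to_text(model_info, featurizer, tokenizer, generation_config,
    compile: [batch_size: 1],
    defn_options: [compiler: EXLA]
  )
```

The first run downloads the model files, so it needs network access.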

And then I load an image like so:

image = 
  |> Nx.from_binary(:u8)
  |> ....

I cannot figure out what the last step in the image processing pipeline should be. The original smart cell does the following:

    image =
      image.file_ref
      |> Kino.Input.file_path()
      |> File.read!()
      |> Nx.from_binary(:u8)
      |> Nx.reshape({image.height, image.width, 3})

The height and width are 3468, 4624 but when I hard-code the values and do

      |> Nx.reshape({3468, 4624, 3})

I get the following error:

  cannot reshape, current shape {4190582} is not compatible with new shape {3468, 4624, 3}

The very last step needs to be Nx.Serving.run(serving, image).

This is my first look at Livebook - I don’t yet understand where image in the smart cell comes from.

Oh… Am I missing JPEG decoding machinery? How does Kino do it?
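
The numbers in the error message support that guess. A quick check, using only the dimensions and the flat size quoted above, shows that the file holds far fewer bytes than one u8 value per pixel channel would require, which is what you would expect from a compressed file rather than raw pixels:

```elixir
# Dimensions of the photo and the flat tensor size from the error message.
{height, width, channels} = {3468, 4624, 3}
file_bytes = 4_190_582

# Nx.reshape/2 needs exactly one element per pixel channel.
expected_bytes = height * width * channels

IO.puts("raw pixel bytes needed: #{expected_bytes}")
IO.puts("bytes read from file:   #{file_bytes}")
IO.puts("sizes match? #{expected_bytes == file_bytes}")
```

The reshape needs 48,108,096 elements but only 4,190,582 bytes were read, so the bytes have to be decoded into pixels before they can be reshaped.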

Kino does it on the client. You can check the examples folder in Bumblebee. You can however use libraries like StbImage to read the file for you and NxImage for doing the resizing.
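
A minimal sketch of the StbImage route, assuming the stb_image and nx packages. So that it runs without any photo on disk, it first writes a tiny generated PNG to a temporary path; in a real script, path would point at your own image file.

```elixir
Mix.install([:stb_image, :nx])

# Hypothetical demo file; replace with the path to a real JPEG or PNG.
path = Path.join(System.tmp_dir!(), "stb_image_demo.png")

# Write a 4x6 grey RGB image so there is something to decode.
Nx.broadcast(Nx.tensor(128, type: :u8), {4, 6, 3})
|> StbImage.from_nx()
|> StbImage.write_file!(path)

# Decode the file into pixels, forcing 3 channels, then convert to a tensor.
tensor =
  path
  |> StbImage.read_file!(channels: 3)
  |> StbImage.to_nx()

IO.inspect(File.stat!(path).size, label: "bytes on disk")
IO.inspect(Nx.shape(tensor), label: "decoded shape")
```

The decoded tensor has the height, width, channels layout that the smart cell builds with Nx.reshape, so no manual reshape is needed.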

Yes! I have figured it out today!

For my batch job I used the Image module and then all I needed to do was

    {:ok, image} = Image.open(path)  # path is a placeholder for the image file location
    {:ok, tensor} = Image.to_nx(image)

    %{results: [%{text: text}]} = Nx.Serving.run(serving, tensor)
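
For anyone wanting to do the same over a whole folder, a batch loop could be sketched like this. The module name CaptionBatch, the function caption_dir and the "*.jpg" pattern are hypothetical, and the serving is assumed to have been built elsewhere as shown at the top of the thread.

```elixir
Mix.install([:image, :nx])

defmodule CaptionBatch do
  # Hypothetical helper: returns a list of {path, caption} tuples
  # for every file in `dir` matching the assumed "*.jpg" pattern.
  def caption_dir(serving, dir) do
    dir
    |> Path.join("*.jpg")
    |> Path.wildcard()
    |> Enum.map(fn path ->
      # Decode the file, convert it to a tensor, and run the serving on it.
      {:ok, image} = Image.open(path)
      {:ok, tensor} = Image.to_nx(image)
      %{results: [%{text: text}]} = Nx.Serving.run(serving, tensor)
      {path, text}
    end)
  end
end
```

Calling CaptionBatch.caption_dir(serving, "/some/folder") would then return one caption per image; a folder without matching files yields an empty list.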

Livebook is awesome for exploring ML. I have very little experience with ML and am only just starting to learn Elixir, yet after two days I will have classified all the images on disk.

Thank you!