Convert an image to idx3-ubyte to work with Nx

arpan · February 20, 2021, 9:38am

Hi, I was very excited after Jose's latest talk about Nx, the new ML library.

The example neural network he showed in the talk worked fine on my machine, but I wanted to test it with my own images.

The problem is that the network expects its input in the idx3-ubyte format, which is the format of the MNIST dataset.

ubyte is an unsigned 8-bit integer type, with values from 0 to 255, so each byte is simply one greyscale pixel value.

I was able to convert an image to greyscale and resize it to the required 28x28 pixels with ImageMagick:
convert input.png -set colorspace Gray -separate -average -resize 28x28! output.png

However, I cannot figure out how to create the idx3-ubyte binary that the neural network expects. I found some existing tools for this, but I wish there were a way to do it in Elixir.

I have seen Elixir implementations that parse PNG images with binary pattern matching, but I have no idea how to get idx3-ubyte data out of a PNG.
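For context, binary pattern matching works on a PNG's header but not on its pixels. Per the PNG specification, the pixel data sits in IDAT chunks that are zlib-compressed and filtered row by row, so it has to be inflated and unfiltered before there is one value per pixel. A minimal sketch of the part that does pattern match, using a hypothetical PngHeader module (my own name, not from any library):

```elixir
defmodule PngHeader do
  # Hypothetical helper, not part of any library.
  # A PNG starts with a fixed 8-byte signature, followed by the IHDR chunk:
  # a 4-byte length, the ASCII type "IHDR", then width and height
  # (32-bit big-endian), bit depth and colour type (1 byte each).
  def read(
        <<137, 80, 78, 71, 13, 10, 26, 10, _len::32, "IHDR", width::32, height::32,
          bit_depth, color_type, _rest::binary>>
      ) do
    {:ok, %{width: width, height: height, bit_depth: bit_depth, color_type: color_type}}
  end

  # Anything without a PNG signature and IHDR chunk is rejected.
  def read(_other), do: {:error, :not_png}
end
```

A color_type of 0 means greyscale and 6 means RGBA. Decoding the compressed IDAT data itself is the job a PNG library has to do.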

I have very little knowledge of ML and image processing, so I would love some help with this.

hauleth · February 20, 2021, 9:59am

I assume this is the IDX format used by the MNIST files. In that case it is quite a simple format to parse:

<<0, 0, data_type, 3, count::32-unsigned-big, height::32-unsigned-big, width::32-unsigned-big, data::binary>> = file

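The sizes are big-endian, and the fourth byte is the number of dimensions (3 for idx3: image count, height, width). In your case the interesting part is data, which is just the pixels in row-major order, one byte per pixel (for an idx3 file, count images of height * width bytes each, back to back).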
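The original question needs the reverse direction: writing pixels into that layout. A minimal sketch, assuming a hypothetical IDX module (not part of Nx or any library) and the layout from the MNIST page: two zero bytes, the type code 0x08 for unsigned bytes, the number of dimensions, one 32-bit big-endian size per dimension, then the data.

```elixir
defmodule IDX do
  # Hypothetical helper module, not part of Nx or any library.
  # 0x08 is the IDX type code for unsigned bytes.
  @ubyte 0x08

  # Wrap a list of equally sized greyscale images (each a binary of
  # rows * cols bytes, row-major) into an idx3-ubyte binary.
  def encode_idx3(images, rows, cols) when is_list(images) do
    unless Enum.all?(images, &(byte_size(&1) == rows * cols)) do
      raise(ArgumentError, "every image must be rows * cols bytes")
    end

    header = <<0, 0, @ubyte, 3, length(images)::32-big, rows::32-big, cols::32-big>>
    IO.iodata_to_binary([header | images])
  end

  # Parse an idx3-ubyte binary back into {count, rows, cols, images}.
  def decode_idx3(<<0, 0, @ubyte, 3, count::32-big, rows::32-big, cols::32-big, data::binary>>) do
    size = rows * cols
    images = for <<img::binary-size(size) <- data>>, do: img
    {count, rows, cols, images}
  end
end
```

For a single 28x28 image, IDX.encode_idx3([pixels], 28, 28) would give a 16-byte header followed by the 784 pixel bytes.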

arpan · February 21, 2021, 9:05am

I was not able to extract the raw pixel information from the PNG images directly using binary pattern matching.

However, I found the Pixels library, which solved the problem and gave me the pixel data from the PNG image.

I resized the image and converted it to monochrome with ImageMagick, and I was then able to load it in the required MNIST format with the following:

defmodule ImageParser do
  # Load pixel values from a PNG image.
  # Uses the Pixels library, which returns the pixel data as 8-bit RGBA (4 bytes per pixel).
  def load_img(path) do
    {:ok, %Pixels{data: data}} = Pixels.read_file(path)
    data
  end

  # Use ImageMagick to resize the image to exactly 28x28 ("!" ignores the aspect ratio)
  # and convert it to monochrome (black and white).
  # Raises if `convert` exits with a non-zero status; returns the output path.
  def resize(input_file_path, output_path \\ "/tmp/output.png") do
    {_output, 0} =
      System.cmd("convert", [
        input_file_path,
        "-resize",
        "28x28!",
        "-monochrome",
        output_path
      ])

    output_path
  end

  # Turn the RGBA pixel data into one unsigned byte per pixel,
  # the same layout as a single image in the MNIST idx3-ubyte files.
  # Returns an Nx tensor created from that data.
  def parse_and_make_tensor(image_data) do
    image_data
    |> :binary.bin_to_list()                     # Convert the image binary to a list of bytes
    |> Enum.chunk_every(4)                       # Group every 4 values into a single RGBA pixel
    |> Enum.map(fn
      # A monochrome image contains only two colours:
      # black rgba(0, 0, 0, 255) maps to 0 and white rgba(255, 255, 255, 255) maps to 255.
      [0, 0, 0, 255] -> 0
      [255, 255, 255, 255] -> 255
    end)
    |> :binary.list_to_bin()                     # Convert the list back to a binary
    |> Nx.from_binary({:u, 8})                   # Create an Nx tensor of unsigned bytes
  end
end
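A possible variation on parse_and_make_tensor/1, sketched as a hypothetical RgbaToGrey module (not part of Pixels or Nx), assuming the data is 8-bit RGBA as above. For a greyscale or monochrome image R, G and B are equal, so the red byte alone is the grey value; a bitstring comprehension reads it directly and also keeps intermediate grey levels instead of raising on them. The optional inversion exists because the MNIST page describes 0 as background and 255 as foreground, so dark ink on a white background may need flipping to match.

```elixir
defmodule RgbaToGrey do
  # Hypothetical helper, not part of Pixels or Nx.
  # Assumes `rgba` holds 4 bytes per pixel: R, G, B, A.
  def to_grey(rgba, opts \\ []) do
    invert? = Keyword.get(opts, :invert, false)

    # Take the red byte of each pixel; G and B are ignored because they
    # equal R in a greyscale image, and alpha is ignored entirely.
    for <<r, _g, _b, _a <- rgba>>, into: <<>> do
      # Optionally flip the value so background becomes 0 and ink 255.
      if invert?, do: <<255 - r>>, else: <<r>>
    end
  end
end
```

The resulting 784-byte binary can go through Nx.from_binary(grey, {:u, 8}) as before. Depending on how the network was defined, it may also need something like Nx.reshape(tensor, {1, 784}) and Nx.divide(tensor, 255) to match the shape and scale of the training data.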