Perspective distortion with Vix

TL;DR: Is there an Elixir lib out there that implements a version of Evision’s getPerspectiveTransform where I can just pass two lists of simple coordinates?

The X in my XY problem is that I recently started a new job that relies on a lot of image processing. Their current implementation is a little old and slow running on ImageMagick. I’ve been doing some spikes with Image and Vix (kudos and many thanks, Kip and Akash—very good work) to try and move it over to libvips but, as I’ve learned is sort of a thing, libvips does not support perspective distortions which we do need.

Thanks to a couple of good ol’ stack overflow posts and github comments, I’ve been able to get halfway there (code below for those who are interested) but now stuck on actually creating a transformation matrix.

I’ve read a bunch on the topic and will prob eventually figure it out, but I’m just wondering if anyone else has done this and has an easy answer :upside_down_face: Even if I do figure out how to use Evision in the way I want, it’s a pretty heavy dependency to bring in to call one function. Also just interested in general image discussion :slight_smile:

Anyway, thanks!

Here is the perspective code—sorry for the shitty variable names:

  def skew(%Vix.Vips.Image{} = image, {t0, t1, t2, t3, t4, t5, t6, t7} = _this_is_the_transform_matrix_i_need_to_generate) do
    alias Vix.Vips.Operation, as: O

    {width, height} = {Image.width(image), Image.height(image)}

    with {:ok, index_image} <- O.xyz(width, height),
         {:ok, copy_image} <- O.copy(index_image),
         {:ok, band_0} <- O.extract_band(index_image, 0),
         {:ok, band_1} <- O.extract_band(copy_image, 1),
         {:ok, x_1} <- O.linear(band_0, [t0], [0]),
         {:ok, x_2} <- O.linear(band_1, [t1], [0]),
         {:ok, x_3} <- O.add(x_1, x_2),
         {:ok, x_a} <- O.linear(x_3, [1], [t2]),
         {:ok, x_4} <- O.linear(band_0, [t6], [0]),
         {:ok, x_5} <- O.linear(band_1, [t7], [0]),
         {:ok, x_6} <- O.add(x_4, x_5),
         {:ok, x_b} <- O.linear(x_6, [1], [1]),
         {:ok, x} <- O.divide(x_a, x_b),
         {:ok, y_1} <- O.linear(band_0, [t3], [0]),
         {:ok, y_2} <- O.linear(band_1, [t4], [0]),
         {:ok, y_3} <- O.add(y_1, y_2),
         {:ok, y_a} <- O.linear(y_3, [1], [t5]),
         {:ok, y_4} <- O.linear(band_0, [t6], [0]),
         {:ok, y_5} <- O.linear(band_1, [t7], [0]),
         {:ok, y_6} <- O.add(y_4, y_5),
         {:ok, y_b} <- O.linear(y_6, [1], [1]),
         {:ok, y} <- O.divide(y_a, y_b),
         {:ok, mapimage} <- O.bandjoin([x, y]),
         {:ok, image} <- O.mapim(image, mapimage) do
      Image.write!(image, "is_it_working_yet.jpg")
    end
  end
2 Likes

So after a good sleep the matrix math stuff I was reading seemed less scary.

There are still some things happening that I don’t understand but for all intents and purposes, I got it working!

Here’s the result for anyone interested. If I can get this fully working for my use-case, I’ll write up a post about it.

So this function takes an image and a list of destination points to skew it to. The destination points are just given as a flat list of repeated x, ys in the order: [bottom_left_x, bottom_right_y, bottom_right_x, bottom_right_y, top_right_x, top_right_y, top_left_x, top_left_y]. This implementation always uses the four corners of the original image as the source points.

This is obviously super spikey code that shouldn’t have that very specific point-order requirement:

defmodule Foo do
  def skew(%Vix.Vips.Image{} = image, points) do
    {width, height} = {Image.width(image), Image.height(image)}

    [sx1, sy1, sx2, sy2, sx3, sy3, sx4, sy4] = [0, 0, width, 0, width, height, 0, height]
    [dx1, dy1, dx2, dy2, dx3, dy3, dx4, dy4] = points

    src =
      [
        [sx1, sy1, 1, 0, 0, 0, -sx1 * dx1, -sy1 * dx1],
        [sx2, sy2, 1, 0, 0, 0, -sx2 * dx2, -sy2 * dx2],
        [sx3, sy3, 1, 0, 0, 0, -sx3 * dx3, -sy3 * dx3],
        [sx4, sy4, 1, 0, 0, 0, -sx4 * dx4, -sy4 * dx4],
        [0, 0, 0, sx1, sy1, 1, -sx1 * dy1, -sy1 * dy1],
        [0, 0, 0, sx2, sy2, 1, -sx2 * dy2, -sy2 * dy2],
        [0, 0, 0, sx3, sy3, 1, -sx3 * dy3, -sy3 * dy3],
        [0, 0, 0, sx4, sy4, 1, -sx4 * dy4, -sy4 * dy4],
      ]

    dest = [dx1, dx2, dx3, dx4, dy1, dy2, dy3, dy4]

    result = Nx.LinAlg.solve(Nx.tensor(src), Nx.tensor(dest))

    [t0, t1, t2, t3, t4, t5, t6, t7] = Nx.to_list(result)

    alias Vix.Vips.Operation, as: O

    with {:ok, index_image} <- O.xyz(width, height),
         {:ok, copy_image} <- O.copy(index_image),
         {:ok, band_0} <- O.extract_band(index_image, 0),
         {:ok, band_1} <- O.extract_band(copy_image, 1),
         {:ok, x_1} <- O.linear(band_0, [t0], [0]),
         {:ok, x_2} <- O.linear(band_1, [t1], [0]),
         {:ok, x_3} <- O.add(x_1, x_2),
         {:ok, x_a} <- O.linear(x_3, [1], [t2]),
         {:ok, x_4} <- O.linear(band_0, [t6], [0]),
         {:ok, x_5} <- O.linear(band_1, [t7], [0]),
         {:ok, x_6} <- O.add(x_4, x_5),
         {:ok, x_b} <- O.linear(x_6, [1], [1]),
         {:ok, x} <- O.divide(x_a, x_b),
         {:ok, y_1} <- O.linear(band_0, [t3], [0]),
         {:ok, y_2} <- O.linear(band_1, [t4], [0]),
         {:ok, y_3} <- O.add(y_1, y_2),
         {:ok, y_a} <- O.linear(y_3, [1], [t5]),
         {:ok, y_4} <- O.linear(band_0, [t6], [0]),
         {:ok, y_5} <- O.linear(band_1, [t7], [0]),
         {:ok, y_6} <- O.add(y_4, y_5),
         {:ok, y_b} <- O.linear(y_6, [1], [1]),
         {:ok, y} <- O.divide(y_a, y_b),
         {:ok, mapimage} <- O.bandjoin([x, y]) do
      O.mapim(image, mapimage)
    end
  end
end

Usage:

iex> {:ok, image} = Image.open("path/to/image.jpg")
iex> {:ok, result} = Foo.skew(image, [0, 0, Image.width(image), 0, Image.width(image), Image.height(image), 0, Image.height(image)]
iex> Image.write(result, "foo.jpg")

A bit of a trolling example but if it works you should get a completely unaltered image :slight_smile: Otherwise, play around with the point values.

2 Likes
         {:ok, x_1} <- O.linear(band_0, [t0], [0]),
         {:ok, x_2} <- O.linear(band_1, [t1], [t2]),
         {:ok, x_a} <- O.add(x_1, x_2),
         {:ok, x_4} <- O.linear(band_0, [t6], [0]),
         {:ok, x_5} <- O.linear(band_1, [t7], [1]),
         {:ok, x_b} <- O.add(x_4, x_5),
         {:ok, x} <- O.divide(x_a, x_b),
         {:ok, y_1} <- O.linear(band_0, [t3], [0]),
         {:ok, y_2} <- O.linear(band_1, [t4], [t5]),
         {:ok, y_a} <- O.add(y_1, y_2),
         {:ok, y_4} <- O.linear(band_0, [t6], [0]),
         {:ok, y_5} <- O.linear(band_1, [t7], [1]),
         {:ok, y_b} <- O.add(y_4, y_5),
         {:ok, y} <- O.divide(y_a, y_b),

Slight simplification, rather than multiplying by 1 and adding a new value, you can add the constant in the previous call to linear (instead of adding 0).

2 Likes

Oh great, thank you! I saw something like this in one of the many examples I saw, but honestly I hardly understand what these functions are doing. All I get is that we’re extraction colour channels and, uh, doing stuff with them and composing them back together… and stuff :upside_down_face: I’m learning, though!

I’m also not sure the copy is necessary. I saw it in one example but not another.

I don’t know libvips specifics but when I see numerical work I want to optimize it, old habits die hard. Like how I want to see:

    [w, h] = [-width, -height]
    [_, _, dx2, dy2, dx3, dy3, dx4, dy4] = points

    src =
      [
        [0, 0, 1, 0, 0, 0, 0, 0],
        [w, 0, 1, 0, 0, 0, w * dx2, 0],
        [w, h, 1, 0, 0, 0, w * dx3, h * dx3],
        [0, h, 1, 0, 0, 0, 0, h * dx4],
        [0, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, w, 0, 1, w * dy2, 0],
        [0, 0, 0, w, h, 1, w * dy3, h * dy3],
        [0, 0, 0, 0, h, 1, 0, h * dy4],
      ]

just to save some cycles but optimization is often at odds with writing easily understood code.

You can import image operators for solving the equation, that should make it read better.

  defp generate_map(width, height, tensor) do
    alias Vix.Vips.Operation, as: O
    import Kernel, except: [+: 2, *: 2, /: 2]
    import Image.Math, only: [+: 2, *: 2, /: 2]

    [t0, t1, t2, t3, t4, t5, t6, t7] = Nx.to_list(tensor)
    index = O.xyz!(width, height)

    x =
      (
        t = index * [t0, t1]
        x_a = t[0] + t[1] + t2

        t = index * [t6, t7]
        x_b = t[0] + t[1] + 1

        x_a / x_b
      )

    y =
      (
        t = index * [t3, t4]
        y_a = t[0] + t[1] + t5

        t = index * [t6, t7]
        y_b = t[0] + t[1] + 1

        y_a / y_b
      )

    O.bandjoin!([x, y])
  end

But, maybe Nx is better suited for these types of array operations. You can switch between Vix and Nx, back and forth, easily. Using Vix for reading, and writing image in different formats and using Nx for something like this.

And yes, making a copy is not necessary.

1 Like

Oh crazy! I totally missed those operators., thank you! The Ruby version did look much nicer to read. Readability isn’t my primary concern as this sort of code is sorta (hopefully) write once and forget (and maybe tweak once) and I would rather it be fast, but I’m definitely going to rewrite it your way as it is much cleaner.

Do you mind my asking what the ranged access is doing on %Vix.Vips.Image{} in this case? It seems to just return a different reference?

PS, thanks for Vix! It along with Image is really making my life easier.

Oh that’s not needed in this case. I was testing something else and forgot to remove.

It’s creating an image with first 2 bands (0…1), which is already is the case here.

Updated the post now.

Oh nice. That’s sorta what I thought but wasn’t sure but means it’s a good little shortcut :slight_smile:

Thanks again!

@sodapopcan this is great to see. I was planning to add a function to Image that wraps Evision.warpPerspective but it would be amazing to add this to Image without that dependency for this case. A PR would be warmly welcomed (doesn’t have to be perfect - if you get the math sorted out I can do the docs and ergonomics).

1 Like

That’s great to hear—I would love to have this as a part of Image!

I’ll put together a PR either this evening or tomorrow evening. Or possibly even during the day tomorrow as this is very relevant to my job :smiley:

1 Like

Excellent news. I was already planning to make Nx a non-optional dependency to Image given the many use cases and the very efficient interop @akash-akya implemented so it should be possible to have a functional and performant implementation I believe.

1 Like

Simple range access like my_image[0..1] interprets the range as the bands to be returned. More complex ranges (designed to be compatible with Nx range definitions), return slices. The access protocol definitions are at here.