Scholar.Preprocessing.one_hot_encode does not work?

Hey all. I’m porting some of my old Torch code to Nx, and I ran into a problem that I’m hoping someone can shed some light on.

I’m attempting to one-hot encode an Nx tensor, but with Scholar I get an error. Here’s the code, followed by the error:

tensor = Nx.tensor(5)
num_classes = 27
Scholar.Preprocessing.one_hot_encode(tensor, num_classes: num_classes)

This gives me this error:

** (ArgumentError) given axis (0) invalid for shape with rank 0
    (nx 0.7.3) lib/nx/shape.ex:1121: Nx.Shape.normalize_axis/4
    (nx 0.7.3) lib/nx.ex:14975: anonymous fn/3 in Nx.sort/2
    (nx 0.7.3) lib/nx.ex:5368: Nx.apply_vectorized/2
    (scholar 0.3.1) lib/scholar/preprocessing/ordinal_encoder.ex:53: Scholar.Preprocessing.OrdinalEncoder."__defn:fit_n__"/2
    (nx 0.7.3) lib/nx/defn/compiler.ex:218: Nx.Defn.Compiler.__remote__/4
    (scholar 0.3.1) lib/scholar/preprocessing/one_hot_encoder.ex:62: Scholar.Preprocessing.OneHotEncoder."__defn:fit_n__"/2
    (scholar 0.3.1) lib/scholar/preprocessing/one_hot_encoder.ex:133: Scholar.Preprocessing.OneHotEncoder."__defn:fit_transform__"/2
    #cell:bdn3o6ty5cug3rcb:8: (file)

I dug a bit into the Scholar code that originally added one-hot encoding: Add ordinal and one-hot encodings by msluszniak · Pull Request #26 · elixir-nx/scholar · GitHub. It looks like that is no longer the code that does one-hot encoding.

I tested the approach from that PR by hand, and it seems to do what I expect. For example:

tensor = Nx.tensor(5)
num_classes = 27
Nx.equal(
  Nx.new_axis(tensor, -1),
  Nx.iota({1, num_classes})
)

This looks correct to me:

#Nx.Tensor<
  u8[1][27]
  EXLA.Backend<host:0, 0.1032028734.2104360976.142186>
  [
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
  ]
>
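The same broadcasting trick should extend to a batch of class indices. Here is a minimal sketch, assuming Nx 0.7 is available through Mix.install; the module OneHotSketch and its one_hot/2 function are hypothetical names for illustration, not part of Nx or Scholar:

```elixir
Mix.install([{:nx, "~> 0.7"}])

defmodule OneHotSketch do
  # Hypothetical helper (not part of Nx or Scholar).
  # Adds a trailing axis to the indices and compares them against
  # a {1, num_classes} row of 0..num_classes-1, so broadcasting
  # yields one row per input index.
  def one_hot(tensor, num_classes) do
    tensor
    |> Nx.new_axis(-1)
    |> Nx.equal(Nx.iota({1, num_classes}))
  end
end

# A scalar gives shape {1, 27}; a 1D tensor of n indices gives {n, num_classes}.
OneHotSketch.one_hot(Nx.tensor(5), 27)
OneHotSketch.one_hot(Nx.tensor([0, 2, 1]), 3)
```

Note that this treats each value directly as a column index, which is what I want here, and may differ from how Scholar assigns columns.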

Am I doing something wrong or is there some sort of bug in the one_hot_encode function?

I started investigating this myself. It looks like the issue is with the Nx.sort call here: scholar/lib/scholar/preprocessing/ordinal_encoder.ex at 8712e96189983e56f32050ba91874e115ed70e1d · elixir-nx/scholar · GitHub

Nx.sort doesn’t work on scalar (rank-0) tensors:

Nx.sort(Nx.tensor(5))

Returns the same error:

** (ArgumentError) given axis (0) invalid for shape with rank 0
    (nx 0.7.3) lib/nx/shape.ex:1121: Nx.Shape.normalize_axis/4
    (nx 0.7.3) lib/nx.ex:14975: anonymous fn/3 in Nx.sort/2
    (nx 0.7.3) lib/nx.ex:5368: Nx.apply_vectorized/2
    #cell:wjcoh5dp2qo2qlha:4: (file)

So this seems like a bug (or feature?) in Nx itself.

However, Nx.sort does work with 1D tensors, and doesn’t error out. So attempting the original with a 1D tensor, I get a different error.

Scholar.Preprocessing.one_hot_encode(
  Nx.tensor([5]),
  num_classes: 27
)

** (ArgumentError) index -2 is out of bounds for axis 0 in shape {1}
    (nx 0.7.3) lib/nx/tensor.ex:196: Nx.Tensor.normalize_index/3
    (nx 0.7.3) lib/nx/tensor.ex:145: Nx.Tensor.fetch_axes/7
    (nx 0.7.3) lib/nx/tensor.ex:92: Nx.Tensor.fetch_axes/2
    (nx 0.7.3) lib/nx/tensor.ex:56: Nx.Tensor.fetch/2
    (elixir 1.17.1) lib/access.ex:322: Access.get/3
    (scholar 0.3.1) lib/scholar/preprocessing/ordinal_encoder.ex:59: Scholar.Preprocessing.OrdinalEncoder."__defn:fit_n__"/2
    (nx 0.7.3) lib/nx/defn/compiler.ex:218: Nx.Defn.Compiler.__remote__/4
    #cell:wjcoh5dp2qo2qlha:2: (file)

So there are probably two different issues going on here. I’m going to keep digging to see if I can find a resolution to this problem.

The ordinal encoder is expecting a tensor with at least two elements. Once you fulfill that requirement, it should work. Can you please open up an issue on Scholar? We should check for the shape and “hardcode” the answer if a tensor of size 1 is given. We should probably check the shapes in general.
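As a rough illustration of the kind of shape check meant here, this is a minimal sketch and not Scholar's actual code. The module ShapeCheckSketch and the function validate_encodable!/1 are hypothetical names, and the two conditions are taken only from the two failures reported in this thread (a rank-0 tensor and a tensor of size 1):

```elixir
Mix.install([{:nx, "~> 0.7"}])

defmodule ShapeCheckSketch do
  # Hypothetical guard (not part of Scholar): raises a descriptive
  # error up front instead of failing deep inside Nx.sort or indexing.
  def validate_encodable!(tensor) do
    cond do
      Nx.rank(tensor) == 0 ->
        raise ArgumentError,
              "expected a tensor of rank 1 or higher, got a scalar"

      Nx.size(tensor) < 2 ->
        raise ArgumentError,
              "expected at least 2 elements, got shape #{inspect(Nx.shape(tensor))}"

      true ->
        tensor
    end
  end
end

# Passes through unchanged when the input has at least two elements.
ShapeCheckSketch.validate_encodable!(Nx.tensor([5, 3]))
```

A real fix could return a hardcoded encoding for the size-1 case instead of raising; that part is left out here because the exact expected output is for the maintainers to decide.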


Good to know - thank you. I’ve submitted the issue here: `one_hot_encode` errors with tensor of size 1 · Issue #290 · elixir-nx/scholar · GitHub