[Book Question] Machine Learning in Elixir: Poor accuracy for Chapter 1's example

Hello,

I am working through the examples in the new book, Machine Learning in Elixir and I am having an issue with poor accuracy in Chapter 1’s example. You can find the Livebook I created here: https://github.com/danieljaouen/machine-learning-in-elixir/blob/main/machine-learning-in-elixir-chapter-1.livemd

And here is the accuracy I am getting on my machine:

Batch: 0, accuracy: 0.0666667
%{
  0 => %{
    "accuracy" => #Nx.Tensor<
      f32
      0.06666667014360428
    >
  }
}

However, the training accuracy seems fine:

Epoch: 0, Batch: 450, accuracy: 0.8331868 loss: 0.5048826
Epoch: 1, Batch: 450, accuracy: 0.8779556 loss: 0.4173653
Epoch: 2, Batch: 450, accuracy: 0.9101056 loss: 0.3732252
Epoch: 3, Batch: 450, accuracy: 0.9288760 loss: 0.3434850
Epoch: 4, Batch: 450, accuracy: 0.9367946 loss: 0.3209158
Epoch: 5, Batch: 450, accuracy: 0.9416718 loss: 0.3026979
Epoch: 6, Batch: 450, accuracy: 0.9494675 loss: 0.2874412
Epoch: 7, Batch: 450, accuracy: 0.9583363 loss: 0.2743504
Epoch: 8, Batch: 450, accuracy: 0.9583363 loss: 0.2629215
Epoch: 9, Batch: 450, accuracy: 0.9626405 loss: 0.2528131

Not sure what I am doing wrong here. Any help? Thanks in advance!

96% accuracy is not bad. IMHO.

Your code is correct (it matches the book).
In some runs I also noticed low accuracy. This is because of the small dataset (150 samples).

If you re-run everything from the shuffle step onward, you will get different results every time; in some runs it can easily reach 96% accuracy. Just rerun the experiment.
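If you want runs to be comparable while you experiment, you can also fix the shuffle seed so the 120/30 split is identical every time. A minimal sketch, assuming the chapter's normalized_iris data frame and that your Explorer version's DF.shuffle/2 accepts a :seed option:

require Explorer.DataFrame, as: DF

# Fixing the seed makes the shuffle (and therefore the train/test split)
# reproducible across runs, so evaluation numbers can be compared directly.
shuffled_normalized_iris = DF.shuffle(normalized_iris, seed: 42)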


Am I looking at the accuracy score wrong? Is it 1 - 0.0667 and not 0.0667? Sorry for being so confused lol.

I tried re-running it, and now it’s even worse:

Batch: 0, accuracy: 0.0000000
%{
  0 => %{
    "accuracy" => #Nx.Tensor<
      f32
      0.0
    >
  }
}

Is there a way to pull the actual predictions from Axon.Loop.evaluator? I tried removing the accuracy metric, but that just returns an empty map. How can I compare the predicted values with y_test?

I ran into this too and initially decided it had to be some kind of typo in how the test set is set up. After a lot of head scratching, I think there's a more subtle error in the setup of the test data. I believe that when the species are assigned their positions in the one-hot encoding vector, that order is determined by the order in which the species are first encountered, and this happens separately for the training data and the test data.

For instance, if the species of the first 3 rows of the training set are "Iris-virginica", "Iris-setosa", "Iris-versicolor", then those entries in the y_train data will look like [1, 0, 0], [0, 1, 0], [0, 0, 1], and the model will learn to predict [1, 0, 0] when the features match what it has learned about "Iris-virginica".

If the species are encountered in a different order in the test data, then we may end up with "Iris-virginica" in the 2nd position instead of the 1st in the y_test data, so the model will predict [1, 0, 0] but the scoring logic will compare it against [0, 1, 0].
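To make the mismatch concrete, here is a toy sketch (a hypothetical encode_one helper, not code from the book) showing how the one-hot vector a species receives depends on the order in which species are first encountered in the labels being encoded:

# Hypothetical helper: index each species by first-encounter order,
# then build a one-hot vector for a single species.
encode_one = fn labels, species ->
  index = labels |> Enum.uniq() |> Enum.with_index() |> Map.new()
  for i <- 0..2, do: if(index[species] == i, do: 1, else: 0)
end

train_species = ["Iris-virginica", "Iris-setosa", "Iris-versicolor"]
test_species = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

encode_one.(train_species, "Iris-virginica")
# => [1, 0, 0]  (first species encountered in the training labels)

encode_one.(test_species, "Iris-virginica")
# => [0, 0, 1]  (third species encountered in the test labels)

Each encoding is internally consistent, but a model trained against the first one and scored against the second will look wrong on almost every row.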


You can see what the model is predicting for the test data with Axon.predict/4:

Axon.predict(model, trained_model_state, x_test)
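To answer the earlier question about comparing the predicted values with y_test, here is a minimal sketch, assuming the model, trained_model_state, x_test, and y_test variables from the chapter, that collapses the one-hot rows back to class indices and measures how often they match:

preds = Axon.predict(model, trained_model_state, x_test)

# Turn probability / one-hot rows into class indices (0, 1, or 2)
predicted_classes = Nx.argmax(preds, axis: -1)
actual_classes = Nx.argmax(y_test, axis: -1)

# Fraction of rows where prediction and label agree, i.e. the accuracy
Nx.equal(predicted_classes, actual_classes) |> Nx.mean()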

I had a few more minutes to play with this, and so far it looks like we can get better results by converting the x and y data into tensors before splitting them into training and test sets:

feature_columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
label_column = "species"

x_all = Nx.stack(shuffled_normalized_iris[feature_columns], axis: 1)

y_all =
  shuffled_normalized_iris[label_column]
  |> Explorer.Series.cast(:category)
  |> Nx.stack(axis: -1)
  |> Nx.equal(Nx.iota({1, 3}, axis: -1))

x_train = x_all[0..119]
x_test = x_all[120..149]

y_train = y_all[0..119]
y_test = y_all[120..149]
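As a quick sanity check (just a sketch using the variables above): because y_all is one-hot encoded in a single pass, the class order is the same in both slices, and the shapes line up with the 120/30 split.

# Every row of y_all should be a valid one-hot vector (each row sums to 1)
Nx.sum(y_all, axes: [1])

# Shapes of the split tensors
{Nx.shape(x_train), Nx.shape(y_train)}  # {{120, 4}, {120, 3}}
{Nx.shape(x_test), Nx.shape(y_test)}    # {{30, 4}, {30, 3}}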

Thanks, I will try this. Could it be that the one-hot encoding assigns different category values because the training and test labels are encoded separately? I think that would explain the low accuracy.

Edit: Yep, I just tried it, and this seems to have fixed the problem. Thanks, @grossvogel!

I ran into the same problem.

The first place I looked was the accompanying Livebooks from PragProg, which clued me in to the ordering.

Here’s the code from the accompanying Livebook:

feature_columns = [
  "sepal_length",
  "sepal_width",
  "petal_length",
  "petal_width"
]

label_column = "species"

x_train = Nx.stack(train_df[feature_columns], axis: 1)

y_train =
  train_df
  |> DF.pull(label_column)
  |> Explorer.Series.to_list()
  |> Enum.map(fn
    "Iris-setosa" -> 0
    "Iris-versicolor" -> 1
    "Iris-virginica" -> 2
  end)
  |> Nx.tensor(type: :u8)
  |> Nx.new_axis(-1)
  |> Nx.equal(Nx.iota({1, 3}, axis: -1))

x_test = Nx.stack(test_df[feature_columns], axis: 1)

y_test =
  test_df
  |> DF.pull(label_column)
  |> Explorer.Series.to_list()
  |> Enum.map(fn
    "Iris-setosa" -> 0
    "Iris-versicolor" -> 1
    "Iris-virginica" -> 2
  end)
  |> Nx.tensor(type: :u8)
  |> Nx.new_axis(-1)
  |> Nx.equal(Nx.iota({1, 3}, axis: -1))

Notice that it manually maps the categorical labels to 0, 1, and 2 before converting them to a tensor for one-hot encoding.

I have to say I really prefer @grossvogel’s alternative code that uses Explorer.Series.cast(:category), since Explorer already provides a convenient way to handle the categorical encoding.

I’ve reported this issue to @seanmor5 as an erratum on the official DevTalk forum, referencing this thread.

Edit: Added the link to download the accompanying Livebooks from PragProg.