`Scholar.Linear.LogisticRegression.predict(model, test_X)` returns only zeroes

Pistrie · September 10, 2024, 4:55pm

Here is the relevant livebook snippet:


train_test_split = fn data_X, data_y, train_size ->
  {train_X, test_X} = Nx.split(data_X, train_size, axis: 0)
  {train_y, test_y} = Nx.split(data_y, train_size, axis: 0)

  {train_X, test_X, train_y, test_y}
end

#Function<40.3316493/3 in :erl_eval.expr/6>

matrix_X = Nx.stack(df_X, axis: 1)
matrix_y = Nx.stack(df_y) |> Nx.flatten()

{train_X, test_X, train_y, test_y} = train_test_split.(matrix_X, matrix_y, 0.7) |> dbg()

model = LogisticRegression.fit(train_X, train_y, num_classes: 2)
predictions = LogisticRegression.predict(model, test_X)
|> IO.inspect(limit: :infinity)

The train test split output:

{#Nx.Tensor<
   f64[500][7]
   EXLA.Backend<host:0, 0.3224788307.3268018198.174312>
   [
     [22.0, 7.25, 1.0, 0.0, 1.0, 0.0, 0.0],
     [38.0, 71.2833, 0.0, 1.0, 0.0, 1.0, 0.0],
     [26.0, 7.925, 0.0, 1.0, 1.0, 0.0, 0.0],
     [35.0, 53.1, 0.0, 1.0, 0.0, 1.0, 0.0],
     [35.0, 8.05, 1.0, 0.0, 1.0, 0.0, 0.0],
     [54.0, 51.8625, 1.0, 0.0, 0.0, 1.0, 0.0],
     [2.0, 21.075, 1.0, 0.0, 1.0, 0.0, 0.0],
     ...
   ]
 >,
 #Nx.Tensor<
   f64[214][7]
   EXLA.Backend<host:0, 0.3224788307.3268018198.174313>
   [
     [32.0, 30.5, 1.0, 0.0, 0.0, 1.0, 0.0],
     [9.0, 27.9, 0.0, 1.0, 1.0, 0.0, 0.0],
     [28.0, 13.0, 0.0, 1.0, 0.0, 0.0, 1.0],
     [32.0, 7.925, 1.0, 0.0, 1.0, 0.0, 0.0],
     [31.0, 26.25, 1.0, 0.0, 0.0, 0.0, 1.0],
     [41.0, 39.6875, 0.0, 1.0, 1.0, 0.0, 0.0],
     [20.0, 7.8542, 1.0, 0.0, 1.0, 0.0, ...],
     ...
   ]
 >,
 #Nx.Tensor<
   s64[500]
   EXLA.Backend<host:0, 0.3224788307.3268018198.174314>
   [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, ...]
 >,
 #Nx.Tensor<
   s64[214]
   EXLA.Backend<host:0, 0.3224788307.3268018198.174315>
   [1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, ...]
 >}

The IO.inspect() output:

#Nx.Tensor<
  s64[214]
  EXLA.Backend<host:0, 0.3224788307.3268018198.174324>
  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>

#Nx.Tensor<
  s64[214]
  EXLA.Backend<host:0, 0.3224788307.3268018198.174324>
  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]
>

I’m pretty sure I did it correctly. Can someone spot a mistake of mine, or is this a problem with Scholar? Thanks
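
A quick sanity check that may help narrow this down is to compare the accuracy of the all-zero predictions against the share of positive labels in the test set. The sketch below uses only Nx. The `test_y` values are just the first ten labels from the split output above, and `predictions` is a hypothetical all-zero stand-in for the model output, so the numbers are illustrative and say nothing about the cause.

```elixir
Mix.install([{:nx, "~> 0.7"}])

# Assumption: first ten test labels copied from the posted output.
test_y = Nx.tensor([1, 0, 1, 0, 0, 0, 0, 1, 0, 1])

# Assumption: stand-in for the all-zero result of LogisticRegression.predict/2.
predictions = Nx.broadcast(0, {10})

# Share of positive labels in the test set.
positive_rate = test_y |> Nx.mean() |> Nx.to_number()

# Accuracy of the constant prediction: fraction of positions where they agree.
accuracy = predictions |> Nx.equal(test_y) |> Nx.mean() |> Nx.to_number()

IO.puts("positive rate: #{positive_rate}")
IO.puts("accuracy of all-zero predictions: #{accuracy}")
```

If the accuracy is simply one minus the positive rate, the model is behaving like a constant classifier, which points toward training not moving the weights rather than toward the split itself.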

krstopro · September 11, 2024, 12:48am

Hi @Pistrie! I am not able to reproduce your result as I don’t have df_X and df_y. Could you maybe share those? Also, it is a good idea to set a random key in Nx.split/2 to keep the results reproducible.
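
As far as I know, Nx.split takes rows in their existing order and has no key option of its own, so one possible reading of this advice is to shuffle the rows with a seeded key before splitting. The sketch below is written under that assumption. The toy `x` and `y` tensors and the seed 42 are hypothetical stand-ins for `matrix_X` and `matrix_y`; shuffling a shared index tensor keeps each feature row paired with its label.

```elixir
Mix.install([{:nx, "~> 0.7"}])

# Assumption: toy data standing in for matrix_X / matrix_y from the post.
x = Nx.iota({10, 2}, type: :f32)
y = Nx.tensor([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# A fixed seed makes the permutation the same on every run.
key = Nx.Random.key(42)

# Shuffle row indices once, then apply the same permutation to x and y.
{idx, _new_key} = Nx.Random.shuffle(key, Nx.iota({10}))
shuffled_x = Nx.take(x, idx, axis: 0)
shuffled_y = Nx.take(y, idx)

# Same fractional split as in the original train_test_split function.
{train_x, test_x} = Nx.split(shuffled_x, 0.7, axis: 0)
{train_y, test_y} = Nx.split(shuffled_y, 0.7, axis: 0)

IO.inspect({Nx.shape(train_x), Nx.shape(test_x), Nx.shape(train_y), Nx.shape(test_y)})
```

Shuffling matters mostly when the source rows are ordered in some way; with a fixed key the split stays identical between runs, which makes results easier to compare.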

Pistrie · September 11, 2024, 7:57am

I’ve DMed you the notebook and the CSV file