How do I get a history of the loss from Axon?

I also posted the question on Stack Overflow: “How to get the loss history using Elixir’s Axon library?”

Basically I can see how I can view the loss over time, but I don’t see how I can extract it after the loop is run.

something = fn state ->
  IO.inspect(state, label: "state_is")
  {:continue, state}
end

Axon.Loop.trainer(model, loss, optimizer)
|> Axon.Loop.handle_event(:epoch_completed, something)
# The pipe already passes the loop, so run/4 takes the data next:
|> Axon.Loop.run(train_data, %{}, epochs: epochs)

With Keras it’s done like so:

history = model.fit(X_train, y_train, epochs=2)
loss_history = history.history["loss"]

You can accumulate the loss (or any other value you want) in a separate process, using send/2 or something similar, instead of just IO.inspect-ing it.
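Something like this, for example; a minimal sketch using an Agent as the separate process (I'm assuming the trainer keeps a running "loss" metric under that key in state.metrics, so double-check the key name):

# An Agent is just a separate process we can push values to.
{:ok, history} = Agent.start_link(fn -> [] end)

record_loss = fn %Axon.Loop.State{metrics: metrics} = state ->
  Agent.update(history, &[metrics["loss"] | &1])
  {:continue, state}
end

Axon.Loop.trainer(model, loss, optimizer)
|> Axon.Loop.handle_event(:epoch_completed, record_loss)
|> Axon.Loop.run(train_data, %{}, epochs: epochs)

# Losses were prepended, so reverse to get chronological order:
loss_history = history |> Agent.get(& &1) |> Enum.reverse()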

Did you give Axon.Loop.metric/5 a try?

@krasenyp
I’m not sure metric/5 will work. As I read it, it accepts values and applies either a running average or a running sum over them.

From the docs: “By default, metrics keep a running average of the metric calculation. You can override this behavior by changing accumulate.”
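If I'm reading the signature right, overriding the accumulator would look something like this (a sketch; the argument order is my guess from the metric/5 docs):

model
|> Axon.Loop.trainer(:mean_squared_error, :adam)
# Keep a running sum instead of the default running average:
|> Axon.Loop.metric(:mean_absolute_error, "mae", :running_sum)
|> Axon.Loop.run(train_data, %{}, epochs: 2)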

My search boils down to: how do I get Axon.Loop.run to return the accumulated state?

Adding a metric does not change the output, which for a single dense layer looks like:

%{
  "dense_0" => %{
    "bias" => #Nx.Tensor<
      f32[1]
      [0.036935850977897644]
    >,
    "kernel" => #Nx.Tensor<
      f32[1][1]
      [
        [1.0034809112548828]
      ]
    >
  }
}

So no metrics available here.

I had thought of that, but it felt like I was working around Axon rather than with it. On that same note, I could also (I think) write the state to disk :smile:
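For the disk route, I believe Axon ships a checkpoint handler; a sketch, with the option name worth verifying against the docs:

Axon.Loop.trainer(model, loss, optimizer)
# Serialize the loop state to disk after every epoch:
|> Axon.Loop.checkpoint(event: :epoch_completed)
|> Axon.Loop.run(train_data, %{}, epochs: epochs)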

Was really hoping there would be something built in like

Axon.Loop.run(loop, data, %{}, [return_history: [filter: :epoch]]) # 😅 

which would return the history (values, metrics, etc.) from every epoch.

Right now I feel like I have to run training multiple times to pull out the data I want if I make a mistake. On a larger dataset with a more complex model, that could be too time-consuming :thinking:


While trying to build my first-timer intuition about hyperparameters, I also wished that the training loop run could return accuracy or other aggregate metrics in the final data structure, so that I could compare runs more readily.

All of the loop metrics and metadata are kept in the loop state struct. However, they are discarded at the end of the loop if you leave the default output_transform as-is, because the default output_transform for a training loop just returns the model state. I have an open issue to remove this behavior.

For now, you can simply take your training loop (before running it) and do:

loop = %{loop | output_transform: & &1}

That will have the loop return the entire loop state, which includes the per-epoch metrics.
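Putting that together, a minimal sketch (assuming the trainer's default running "loss" metric; the field and key names are worth double-checking):

loop = Axon.Loop.trainer(model, :mean_squared_error, :adam)

# Return the whole %Axon.Loop.State{} instead of just the model state:
loop = %{loop | output_transform: & &1}

state = Axon.Loop.run(loop, train_data, %{}, epochs: epochs)

# The accumulated metrics (e.g. "loss") are now available here:
IO.inspect(state.metrics, label: "metrics")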
