How is the community currently evaluating AI applications?

A pluggable backend sounds a bit strange though, because you’d have a prescribed schema/API required. Do you mean allowing users to swap a storage layer, as opposed to the entire backend?

Yes, the idea is that there is a core Store interface (link) that can be implemented by any storage layer. I went ahead and pushed ash_ex_eval for you to see. I one-shotted it with Claude Code, so it needs a lot of work, but I’m hoping it’ll demonstrate what I’m thinking.
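To make the idea concrete, here is a minimal sketch of what a core Store behaviour could look like, plus an in-memory implementation. The callback names, argument shapes, and the `MyApp.InMemoryStore` module are all assumptions for illustration, not the actual ExEval API:

  # Hypothetical sketch of a core Store behaviour; callback names and
  # return shapes are assumed, not taken from the real ex_eval library.
  defmodule ExEval.Store do
    @callback save_run(run :: map()) :: {:ok, map()} | {:error, term()}
    @callback get_run(id :: term()) :: {:ok, map()} | {:error, :not_found}
    @callback list_runs(opts :: keyword()) :: {:ok, [map()]}
  end

  # An in-memory implementation backed by an Agent, useful for tests
  # or demos; a Postgres- or Ash-backed module would implement the
  # same callbacks against its own storage.
  defmodule MyApp.InMemoryStore do
    @behaviour ExEval.Store

    def start_link, do: Agent.start_link(fn -> %{} end, name: __MODULE__)

    @impl true
    def save_run(run) do
      id = Map.get(run, :id, System.unique_integer([:positive]))
      run = Map.put(run, :id, id)
      Agent.update(__MODULE__, &Map.put(&1, id, run))
      {:ok, run}
    end

    @impl true
    def get_run(id) do
      case Agent.get(__MODULE__, &Map.get(&1, id)) do
        nil -> {:error, :not_found}
        run -> {:ok, run}
      end
    end

    @impl true
    def list_runs(_opts), do: {:ok, Agent.get(__MODULE__, &Map.values(&1))}
  end

The point of the behaviour is that the pipeline code only ever talks to the callbacks, so any storage layer that implements them can be plugged in.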

I don’t see any code there.

If the goal is to allow people to bring their own database, then you’d want to look at allowing people to swap either the Ash data layer or the Ecto repo that powers some piece of generic code IMO, as opposed to having entirely different implementations of the same app.
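For what the "swap the Ecto repo" variant might look like: the generic code reads the repo module from application config, so users point it at their own database rather than shipping a parallel implementation. The `:ex_eval` config key and `insert_run/1` helper here are hypothetical, just to illustrate the shape:

  # Hedged sketch: generic persistence code parameterized by a
  # user-supplied Ecto repo. Key and function names are assumptions.
  defmodule ExEval.Persistence do
    # The host app configures which repo powers the generic code:
    #   config :ex_eval, repo: MyApp.Repo
    defp repo, do: Application.fetch_env!(:ex_eval, :repo)

    def insert_run(changeset), do: repo().insert(changeset)
    def all_runs(queryable), do: repo().all(queryable)
  end

The same idea applies one level up with Ash: swap the data layer on otherwise shared resources instead of maintaining entirely different apps.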

The repo is entirely a mix task that generates all of the resources needed for evaluations. The code also generates a Store implementation (link), which then allows users to use the pipeline syntax below for doing evals:


  ExEval.new()
  |> ExEval.put_store(MyApp.Evaluations.Store)
  |> ExEval.put_dataset(dataset)
  |> ExEval.put_response_fn(response_fn)
  |> ExEval.put_experiment(:my_experiment)
  |> ExEval.run()

The intention is to be able to bring my own database resources and then modify them as needed, kind of like your approach with mix ash_ai.gen.chat.

Later on, there’s no reason some web layer couldn’t implement the pipeline syntax above as well, along with some of the other things ex_eval provides for realtime/LiveViews.

Ah, sorry I didn’t look at the mix tasks :laughing: