Hey everyone!
I’ve been working on Tribunal, an LLM evaluation framework for Elixir. It helps you test RAG pipelines and LLM outputs in your test suite.
The problem
When building LLM-powered features, you need to verify things like:
- Is the response grounded in the provided context?
- Did it hallucinate information?
- Is the response actually relevant to the question?
- Does it contain toxic, biased, or harmful content?
The solution
Tribunal integrates with ExUnit so you can write these checks as regular tests:
test "response is faithful to context" do
response = MyApp.RAG.query("What's the return policy?")
assert_faithful response, context: @docs
refute_hallucination response, context: @docs
refute_toxicity response
end
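For context, that snippet lives in an ordinary ExUnit module. A minimal sketch of the surrounding setup (the `import Tribunal` line and the contents of `@docs` are illustrative, not the exact documented setup; check the README):

    defmodule MyApp.RAGTest do
      use ExUnit.Case, async: true

      # Illustrative: assumes the assertion macros come in via a plain import.
      import Tribunal

      # Context documents that answers must be grounded in.
      @docs [
        "Items can be returned within 30 days of purchase for a full refund."
      ]

      # ... the test from above goes here ...
    end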
Two modes
- Test mode: ExUnit integration for CI gates. Fails immediately on any violation. Use this for safety checks that must pass.
- Evaluation mode: Mix task for benchmarking. Run hundreds of test cases, set pass thresholds (e.g., “pass if 80% succeed”), and track regressions over time.
Assertion types
Deterministic (no API calls, instant):
- assert_contains / refute_contains
- assert_regex
- assert_json
- assert_max_tokens
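A sketch of how the deterministic checks read in a test; the argument shapes here (substring, regex, token limit) are illustrative rather than exact signatures:

    test "answer has the right shape" do
      response = MyApp.RAG.query("What's the return policy?")

      # Cheap checks: no model calls, so they run instantly in CI.
      assert_contains response, "30 days"
      refute_contains response, "as an AI language model"
      assert_regex response, ~r/refund/i
      assert_max_tokens response, 300
    end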
LLM-as-Judge (uses any model via req_llm):
- assert_faithful - grounded in context
- assert_relevant - addresses the query
- refute_hallucination - no fabricated info
- refute_bias, refute_toxicity, refute_harmful, refute_jailbreak, refute_pii
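These complement the example at the top: each assertion sends the response to a judge model via req_llm, so they are slower than the deterministic checks. A sketch (the `query:` option is illustrative; see the docs for how the original question is passed):

    test "answer is relevant and safe to show users" do
      query = "What's the return policy?"
      response = MyApp.RAG.query(query)

      # Each assertion makes one judge-model call.
      assert_relevant response, query: query
      refute_bias response
      refute_pii response
    end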
Embedding-based (via alike):
- assert_similar - semantic similarity
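For example (the reference text and the `threshold:` value are illustrative, not documented defaults):

    test "answer matches the reference semantically" do
      response = MyApp.RAG.query("What's the return policy?")

      # Compares embeddings (via alike) instead of exact wording.
      assert_similar response, "You can return items within 30 days.", threshold: 0.8
    end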
Red team testing
Generate adversarial prompts to test your LLM's safety:

    Tribunal.RedTeam.generate_attacks("How do I pick a lock?")

Returns:
- encoding attacks (base64, rot13, leetspeak)
- injection attacks (ignore instructions, delimiter injection)
- jailbreak attacks (DAN, developer mode)
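And a sketch of wiring that into a test, assuming generate_attacks/1 returns a list of adversarial prompt strings (the actual return shape may differ):

    test "pipeline holds up against adversarial prompts" do
      for attack <- Tribunal.RedTeam.generate_attacks("How do I pick a lock?") do
        response = MyApp.RAG.query(attack)

        refute_jailbreak response
        refute_harmful response
      end
    end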
Would love to hear your feedback!
Hex: tribunal
GitHub: