ExUnitJSON - AI-friendly JSON test output for ExUnit

I’ve been building an Elixir library that I think could be useful to others working with AI coding assistants like Claude Code, Cursor, or similar tools.

The Problem

When you’re pair-programming with an AI assistant and tests fail, the AI needs to parse ExUnit’s output to understand what went wrong. The default CLI output is designed for humans - colorful, formatted, with helpful messages. But AI assistants end up doing gymnastics with grep, tail, and regex to extract the actual failure information.

This creates friction. The AI might miss important context, truncate stacktraces, or misinterpret assertion values. And when you have 50+ test failures with the same root cause, the AI has to wade through walls of text to understand the pattern.

The Solution: mix test.json

ex_unit_json (GitHub: ZenHive/ex_unit_json) provides structured JSON output from ExUnit:

mix test.json --quiet --summary-only

{
  "version": 1,
  "seed": 12345,
  "summary": {
    "total": 150,
    "passed": 148,
    "failed": 2,
    "skipped": 0,
    "result": "failed"
  }
}

It’s a drop-in addition to your project - all standard mix test options work (--only, --exclude, file paths, line numbers).
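For example, the usual targeting options pass straight through (the file paths and tag names below are just illustrative):

# Run a single file, a specific line, or a tagged subset - same as mix test
mix test.json test/my_app/math_test.exs --quiet
mix test.json test/my_app/math_test.exs:42 --quiet
mix test.json --only integration --exclude slow --quiet --summary-only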

Installation

# mix.exs

def deps do
  [{:ex_unit_json, "~> 0.1.0", only: [:dev, :test], runtime: false}]
end

def cli do
  [preferred_envs: ["test.json": :test]]
end

Requires Elixir 1.18+ (uses the built-in :json module - zero runtime dependencies).

Key Features

Fast Iteration on Failures

The --failed flag re-runs only previously failed tests (using .mix_test_failures). Combined with --first-failure, you can fix tests one at a time:

mix test.json --quiet --failed --first-failure

This outputs just the first failure with full assertion details and stacktrace. Fix it, run again, repeat until green.
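A minimal sketch of that loop (assuming mix test.json mirrors mix test and exits nonzero while failures remain):

# Re-run only the previously failed tests, one failure at a time,
# until everything is green.
while ! mix test.json --quiet --failed --first-failure; do
  echo "Fix the failure above, then press enter to re-run" >&2
  read -r _
done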

Group Failures by Root Cause

When 47 tests fail because a service is down, you don’t want to see 47 identical error messages:

mix test.json --quiet --group-by-error --summary-only

{
  "error_groups": [
    {"pattern": "Connection refused", "count": 47, "example": {…}},
    {"pattern": "Invalid token", "count": 3, "example": {…}}
  ]
}

Now you know there are really just 2 issues, not 50.

Filter Expected Failures

Working on a project with integration tests that need API credentials? Filter those out to see real bugs:

mix test.json --quiet --group-by-error --filter-out "credentials" --filter-out "rate limit"

The filtered count in the summary shows how many matched your patterns.

Full Failure Details

Every failed test includes:

  • Assertion expression, left value, right value
  • Structured stacktrace with file/line/module/function
  • Test tags and duration

{
  "failures": [{
    "kind": "assertion",
    "message": "Assertion with == failed",
    "assertion": {
      "expr": "1 + 1 == 3",
      "left": "2",
      "right": "3"
    },
    "stacktrace": [
      {"file": "test/math_test.exs", "line": 15, "module": "MathTest", "function": "test addition"}
    ]
  }]
}

The AGENT.md Pattern

One thing I’m experimenting with is including an AGENT.md file in the package. This is documentation specifically for AI assistants - explaining the optimal workflows, flag combinations, and common patterns.

From the file:

Start Here (Default Workflow)

Most common pattern - fast iteration on failures:

# First run or after code changes
mix test.json --quiet --summary-only

# Iterating on failures (ALWAYS use --failed for speed)
mix test.json --quiet --failed --first-failure

The idea is that when an AI assistant (Claude Code, Cursor, etc.) reads your project, it finds AGENT.md and immediately knows how to use the tool effectively. No guessing, no trial and error.

I’d be curious if others are doing similar things - documenting their libraries specifically for AI consumption.

Schema Stability

The JSON output follows a documented v1 schema (in the README). The version field allows for future evolution without breaking existing integrations. Breaking changes would bump the version number.
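As a consumer, that means you can guard on the version field before reading anything else. A minimal sketch (not part of the library), assuming --quiet leaves only the JSON document on stdout and using Elixir 1.18's built-in JSON module:

# Hypothetical consumer-side check, run from the project root
{output, _exit_status} = System.cmd("mix", ["test.json", "--quiet", "--summary-only"])

case output |> String.trim() |> JSON.decode!() do
  %{"version" => 1} = report ->
    IO.inspect(report["summary"], label: "summary")

  %{"version" => other} ->
    raise "unsupported ex_unit_json schema version: #{inspect(other)}"
end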

Recommended Workflow for AI Assistants

  1. Initial assessment: mix test.json --quiet --group-by-error --summary-only
  2. Filter noise: mix test.json --quiet --group-by-error --filter-out "credentials" --summary-only
  3. Fix one at a time: mix test.json --quiet --failed --first-failure
  4. Verify fix: mix test.json --quiet --failed --summary-only

Try It Out

GitHub: https://github.com/ZenHive/ex_unit_json

The library is feature-complete for v0.1.0 (150 tests, 90%+ coverage). I’m planning to publish to Hex.pm shortly.

Feedback welcome - especially on:

  • Missing features that would help your AI-assisted workflow
  • Edge cases in test output parsing
  • The AGENT.md concept - useful or unnecessary?

Built this while working with Claude Code and realizing how much time was being lost to parsing text output. Sometimes the best tools come from scratching your own itch.

Update: Now available on Hex.pm

The package is published and ready to use:

# mix.exs

def deps do
  [{:ex_unit_json, "~> 0.1.2", only: [:dev, :test], runtime: false}]
end

def cli do
  [preferred_envs: ["test.json": :test]]
end

The cli/0 function is required - without it you'll get an error like "mix test" is running in the "dev" environment.

Then:

mix deps.get
mix test.json --quiet --summary-only

Quick start workflow:

# Initial assessment
mix test.json --quiet --summary-only

# Iterate on failures (fast - only reruns failed tests)
mix test.json --quiet --failed --first-failure

Hex: https://hex.pm/packages/ex_unit_json
Docs: https://hexdocs.pm/ex_unit_json

Using Claude Code Hooks

To ensure Claude (via claude-code) always uses the JSON output format instead of the standard human-readable output, you can configure a “PreToolUse” hook. This intercepts mix test commands and blocks them, forcing the use of mix test.json which provides the machine-parseable output Claude needs.

This hook acts as a guardrail, preventing the AI from accidentally running the standard test command and struggling to parse the results.
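A sketch of what that can look like, assuming the current Claude Code hooks format (settings in .claude/settings.json, a PreToolUse matcher on the Bash tool, exit code 2 to block the call); the script path and the jq dependency are my own choices, not part of ex_unit_json:

#!/usr/bin/env bash
# .claude/hooks/block-mix-test.sh (hypothetical path)
# Reads the PreToolUse payload on stdin; if the Bash tool is about to run
# plain `mix test`, exit 2 so Claude Code blocks it and shows the message.
cmd="$(jq -r '.tool_input.command // empty')"
if printf '%s\n' "$cmd" | grep -Eq '(^|[;&|] *)mix test( |$)'; then
  echo "Use 'mix test.json --quiet ...' instead of 'mix test' (see AGENT.md)" >&2
  exit 2
fi
exit 0

And the wiring in .claude/settings.json:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/block-mix-test.sh" }
        ]
      }
    ]
  }
}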

You can also add a hook that runs the corresponding test after every file edit:
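For example (again a sketch: the PostToolUse event and payload fields are from the Claude Code hooks docs, while the script path and lib-to-test mapping are assumptions):

#!/usr/bin/env bash
# .claude/hooks/test-after-edit.sh (hypothetical path)
# Reads the PostToolUse payload on stdin and re-runs the matching test file.
file="$(jq -r '.tool_input.file_path // empty')"
case "$file" in
  *_test.exs)
    mix test.json --quiet --summary-only "$file"
    ;;
  *lib/*.ex)
    # Assumed convention: lib/my_app/foo.ex maps to test/my_app/foo_test.exs
    candidate="$(printf '%s\n' "$file" | sed -e 's|lib/|test/|' -e 's|\.ex$|_test.exs|')"
    [ -f "$candidate" ] && mix test.json --quiet --summary-only "$candidate"
    ;;
esac
exit 0

Register it under "PostToolUse" with a matcher like "Edit|Write" in the same .claude/settings.json.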

Happy coding everyone.
