I’ve been building an Elixir library that I think could be useful to others working with AI coding assistants like Claude Code, Cursor, or similar tools.
The Problem
When you’re pair-programming with an AI assistant and tests fail, the AI needs to parse ExUnit’s output to understand what went wrong. The default CLI output is designed for humans - colorful, formatted, with helpful messages. But AI assistants end up doing gymnastics with grep, tail, and regex to extract the actual failure information.
This creates friction. The AI might miss important context, truncate stacktraces, or misinterpret assertion values. And when you have 50+ test failures with the same root cause, the AI has to wade through walls of text to understand the pattern.
The Solution: mix test.json
ex_unit_json (GitHub: ZenHive/ex_unit_json) provides AI-friendly, structured JSON output from ExUnit:
mix test.json --quiet --summary-only
{
  "version": 1,
  "seed": 12345,
  "summary": {
    "total": 150,
    "passed": 148,
    "failed": 2,
    "skipped": 0,
    "result": "failed"
  }
}
It’s a drop-in addition to your project - all standard mix test options work (--only, --exclude, file paths, line numbers).
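For example (the tag and file path below are just placeholders, not part of the library):

mix test.json --quiet --only integration
mix test.json --quiet test/my_app/parser_test.exs:42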
Installation
# mix.exs
def deps do
  [
    {:ex_unit_json, "~> 0.1.0", only: [:dev, :test], runtime: false}
  ]
end

def cli do
  [preferred_envs: ["test.json": :test]]
end
Requires Elixir 1.18+ (uses the built-in :json module - zero runtime dependencies).
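Because the output is plain JSON, any consumer can read it. A minimal sketch, assuming --quiet leaves only the JSON report on stdout and using Elixir 1.18's built-in JSON module:

# Sketch: read the summary from a script.
# Assumes --quiet emits only the JSON report on stdout.
{output, _exit_status} = System.cmd("mix", ["test.json", "--quiet", "--summary-only"])
%{"summary" => %{"failed" => failed, "total" => total}} = JSON.decode!(output)
IO.puts("#{failed}/#{total} tests failed")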
Key Features
Fast Iteration on Failures
The --failed flag re-runs only previously failed tests (using .mix_test_failures). Combined with --first-failure, you can fix tests one at a time:
mix test.json --quiet --failed --first-failure
This outputs just the first failure with full assertion details and stacktrace. Fix it, run again, repeat until green.
Group Failures by Root Cause
When 47 tests fail because a service is down, you don’t want to see 47 identical error messages:
mix test.json --quiet --group-by-error --summary-only
{
  "error_groups": [
    {"pattern": "Connection refused", "count": 47, "example": {…}},
    {"pattern": "Invalid token", "count": 3, "example": {…}}
  ]
}
Now you know there are really just 2 issues, not 50.
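If you post-process the report yourself, the groups are easy to walk. A rough sketch using the field names from the example output above:

# Sketch: print one line per root cause from a decoded report.
{output, _status} = System.cmd("mix", ["test.json", "--quiet", "--group-by-error", "--summary-only"])
report = JSON.decode!(output)

for %{"pattern" => pattern, "count" => count} <- report["error_groups"] do
  IO.puts("#{count} failures matching: #{pattern}")
end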
Filter Expected Failures
Working on a project with integration tests that need API credentials? Filter those out to see real bugs:
mix test.json --quiet --group-by-error --filter-out "credentials" --filter-out "rate limit"
The filtered count in the summary shows how many matched your patterns.
Full Failure Details
Every failed test includes:
- Assertion expression, left value, right value
- Structured stacktrace with file/line/module/function
- Test tags and duration
{
  "failures": [{
    "kind": "assertion",
    "message": "Assertion with == failed",
    "assertion": {
      "expr": "1 + 1 == 3",
      "left": "2",
      "right": "3"
    },
    "stacktrace": [
      {"file": "test/math_test.exs", "line": 15, "module": "MathTest", "function": "test addition"}
    ]
  }]
}
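For reference, a trivial failing test like this is the kind of thing that produces the entry above (line numbers will differ):

# test/math_test.exs
defmodule MathTest do
  use ExUnit.Case

  test "addition" do
    assert 1 + 1 == 3
  end
end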
The AGENT.md Pattern
One thing I’m experimenting with is including an AGENT.md file in the package. This is documentation specifically for AI assistants - explaining the optimal workflows, flag combinations, and common patterns.
From the file:
Start Here (Default Workflow)

Most common pattern - fast iteration on failures:

# First run or after code changes
mix test.json --quiet --summary-only

# Iterating on failures (ALWAYS use --failed for speed)
mix test.json --quiet --failed --first-failure
The idea is that when an AI assistant (Claude Code, Cursor, etc.) reads your project, it finds AGENT.md and immediately knows how to use the tool effectively. No guessing, no trial and error.
I’d be curious if others are doing similar things - documenting their libraries specifically for AI consumption.
Schema Stability
The JSON output follows a documented v1 schema (in the README). The version field allows for future evolution without breaking existing integrations. Breaking changes would bump the version number.
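A consumer can pin itself to the schema version it understands. A small sketch (handle_v1/1 is a hypothetical function in your own tooling, not part of the library):

{output, _status} = System.cmd("mix", ["test.json", "--quiet", "--summary-only"])

case JSON.decode!(output) do
  %{"version" => 1} = report -> handle_v1(report)
  %{"version" => other} -> raise "unsupported ex_unit_json schema version: #{other}"
end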
Recommended Workflow for AI Assistants
- Initial assessment: mix test.json --quiet --group-by-error --summary-only
- Filter noise: mix test.json --quiet --group-by-error --filter-out “credentials” --summary-only
- Fix one at a time: mix test.json --quiet --failed --first-failure
- Verify fix: mix test.json --quiet --failed --summary-only
Try It Out
GitHub: ZenHive/ex_unit_json
The library is feature-complete for v0.1.0 (150 tests, 90%+ coverage). I’m planning to publish to Hex.pm shortly.
Feedback welcome - especially on:
- Missing features that would help your AI-assisted workflow
- Edge cases in test output parsing
- The AGENT.md concept - useful or unnecessary?
Built this while working with Claude Code and realizing how much time was being lost to parsing text output. Sometimes the best tools come from scratching your own itch.






















