I’ve been building an Elixir library that I think could be useful to others working with AI coding assistants like Claude Code, Cursor, or similar tools.
The Problem
When you’re pair-programming with an AI assistant and tests fail, the AI needs to parse ExUnit’s output to understand what went wrong. The default CLI output is designed for humans - colorful, formatted, with helpful messages. But AI assistants end up doing gymnastics with grep, tail, and regex to extract the actual failure information.
This creates friction. The AI might miss important context, truncate stacktraces, or misinterpret assertion values. And when you have 50+ test failures with the same root cause, the AI has to wade through walls of text to understand the pattern.
The Solution: mix test.json
ex_unit_json (GitHub: ZenHive/ex_unit_json) provides AI-friendly, structured JSON output from ExUnit:
mix test.json --quiet --summary-only
{
  "version": 1,
  "seed": 12345,
  "summary": {
    "total": 150,
    "passed": 148,
    "failed": 2,
    "skipped": 0,
    "result": "failed"
  }
}
It’s a drop-in addition to your project - all standard mix test options work (--only, --exclude, file paths, line numbers).
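For example (the tag and file path below are just placeholders, not part of the library):

mix test.json --quiet --only integration
mix test.json --quiet test/my_app/parser_test.exs:42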
Installation
# mix.exs
def deps do
  [
    {:ex_unit_json, "~> 0.1.0", only: [:dev, :test], runtime: false}
  ]
end

def cli do
  [preferred_envs: ["test.json": :test]]
end
Requires Elixir 1.18+ (uses the built-in :json module - zero runtime dependencies).
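Because the output is plain JSON, any consumer can read it. A minimal sketch, assuming --quiet leaves only the JSON report on stdout and using Elixir 1.18's built-in JSON module:

# Sketch: read the summary from a script.
# Assumes --quiet emits only the JSON report on stdout.
{output, _exit_status} = System.cmd("mix", ["test.json", "--quiet", "--summary-only"])
%{"summary" => %{"failed" => failed, "total" => total}} = JSON.decode!(output)
IO.puts("#{failed}/#{total} tests failed")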
Key Features
Fast Iteration on Failures
The --failed flag re-runs only previously failed tests (using .mix_test_failures). Combined with --first-failure, you can fix tests one at a time:
mix test.json --quiet --failed --first-failure
This outputs just the first failure with full assertion details and stacktrace. Fix it, run again, repeat until green.
Group Failures by Root Cause
When 47 tests fail because a service is down, you don’t want to see 47 identical error messages:
mix test.json --quiet --group-by-error --summary-only
{
  "error_groups": [
    {"pattern": "Connection refused", "count": 47, "example": {…}},
    {"pattern": "Invalid token", "count": 3, "example": {…}}
  ]
}
Now you know there are really just 2 issues, not 50.
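If you post-process the report yourself, the groups are easy to walk. A rough sketch using the field names from the example output above:

# Sketch: print one line per root cause from a decoded report.
{output, _status} = System.cmd("mix", ["test.json", "--quiet", "--group-by-error", "--summary-only"])
report = JSON.decode!(output)

for %{"pattern" => pattern, "count" => count} <- report["error_groups"] do
  IO.puts("#{count} failures matching: #{pattern}")
end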
Filter Expected Failures
Working on a project with integration tests that need API credentials? Filter those out to see real bugs:
mix test.json --quiet --group-by-error --filter-out "credentials" --filter-out "rate limit"
The filtered count in the summary shows how many matched your patterns.
Full Failure Details
Every failed test includes:
- Assertion expression, left value, right value
- Structured stacktrace with file/line/module/function
- Test tags and duration
{
  "failures": [{
    "kind": "assertion",
    "message": "Assertion with == failed",
    "assertion": {
      "expr": "1 + 1 == 3",
      "left": "2",
      "right": "3"
    },
    "stacktrace": [
      {"file": "test/math_test.exs", "line": 15, "module": "MathTest", "function": "test addition"}
    ]
  }]
}
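For reference, a trivial failing test like this is the kind of thing that produces the entry above (line numbers will differ):

# test/math_test.exs
defmodule MathTest do
  use ExUnit.Case

  test "addition" do
    assert 1 + 1 == 3
  end
end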
The AGENT.md Pattern
One thing I’m experimenting with is including an AGENT.md file in the package. This is documentation specifically for AI assistants - explaining the optimal workflows, flag combinations, and common patterns.
From the file:
Start Here (Default Workflow)

Most common pattern - fast iteration on failures:

# First run or after code changes
mix test.json --quiet --summary-only

# Iterating on failures (ALWAYS use --failed for speed)
mix test.json --quiet --failed --first-failure
The idea is that when an AI assistant (Claude Code, Cursor, etc.) reads your project, it finds AGENT.md and immediately knows how to use the tool effectively. No guessing, no trial and error.
I’d be curious if others are doing similar things - documenting their libraries specifically for AI consumption.
Schema Stability
The JSON output follows a documented v1 schema (in the README). The version field allows for future evolution without breaking existing integrations. Breaking changes would bump the version number.
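A consumer can pin itself to the schema version it understands. A small sketch (handle_v1/1 is a hypothetical function in your own tooling, not part of the library):

{output, _status} = System.cmd("mix", ["test.json", "--quiet", "--summary-only"])

case JSON.decode!(output) do
  %{"version" => 1} = report -> handle_v1(report)
  %{"version" => other} -> raise "unsupported ex_unit_json schema version: #{other}"
end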
Recommended Workflow for AI Assistants
- Initial assessment: mix test.json --quiet --group-by-error --summary-only
- Filter noise: mix test.json --quiet --group-by-error --filter-out “credentials” --summary-only
- Fix one at a time: mix test.json --quiet --failed --first-failure
- Verify fix: mix test.json --quiet --failed --summary-only
Try It Out
GitHub: ZenHive/ex_unit_json
The library is feature-complete for v0.1.0 (150 tests, 90%+ coverage). I’m planning to publish to Hex.pm shortly.
Feedback welcome - especially on:
- Missing features that would help your AI-assisted workflow
- Edge cases in test output parsing
- The AGENT.md concept - useful or unnecessary?
Built this while working with Claude Code and realizing how much time was being lost to parsing text output. Sometimes the best tools come from scratching your own itch.






















