Elixir Skills for Claude, Cursor, Codex

Hey there,

I haven’t seen a list of useful Elixir-specific skills for the popular AI tools yet, so I’m starting one here with my first.

This first entry is a skill that automatically removes cyclic dependencies from your codebase. Copy the skill directory into your tool’s skills directory (see the README) and kick it off to reduce dependencies.

Please share the useful skills you have found and developed so far. And if you try this one out, I’d be happy to get your feedback and make changes to improve it further.

Cheers!

13 Likes

I use these Claude skills → GitHub - j-morgan6/elixir-claude-optimization

3 Likes

I think you may have misunderstood skills. From my perspective, a skill is an action that you describe how to perform, and then you just trigger that action. Your skills are mostly general knowledge about how to write better Elixir code, which should already be known by a good model (or enforced by Credo).

1 Like

That is not at all my understanding of skills, and I have personally written several (although not the ones mentioned in my post above). Allow me to walk you through the evolution of providing direction to agents.

For a while now, generating a new Phoenix application has produced an AGENTS.md file. This is not an Elixir-specific thing, but it was an early attempt to provide agents with guidance. It had two problems: it was generic (not focused on a specific topic or action), and it was loaded into the agent’s context. That made it a nice suggestion, but the agent was not really compelled to follow it. If the context became cluttered, it was very possible for this information to be pushed out entirely, almost as if it did not exist. Still, it was the best that was available until this past October, when Claude Code introduced Skills.

Skills were different in two very important ways. First, they were contextual, meaning they were only loaded when the agent determined it needed the additional knowledge provided by the skill; second, they were not part of the base agent context. Think of them as a much stronger suggestion that is invoked just in time. They are a great improvement over the basic AGENTS file.

What sort of things go in Skills? The skills I have built relate to the workflow for a task coordinator I use. I also use a brainstorming skill all the time, and the skills I mentioned above are invoked constantly during my development. For example, the ecto-database skill that is part of the collection largely contains the information Phoenix places in the AGENTS.md file (with some additions), but in a more useful form: Skills over an AGENTS file.

You might also notice that the plugin I mentioned also contains a few SubAgents; there is one for testing. SubAgents combined with hooks add more rigour to the agents producing our code.

IMO agents by themselves do not produce exactly what we want. Sometimes that “general knowledge” we expect does not appear. Skills, SubAgents, and Hooks are there to help us direct the agent better.

4 Likes

It seems for Claude it can be either:
a) background knowledge
b) task descriptions

but for Cursor we only have b) task descriptions, for the time being.

Initially I was confused too (full disclosure: I’m a Cursor user), but after reading a bit more, namely Optimizing Claude Code: Skills, Plugins, and the Art of Teaching Your AI to Code Like You — mays.co and Extend Claude with skills - Claude Code Docs, I’m not sure…

The Cursor documentation (Agent Skills | Cursor Docs) describes “skills” as strictly task definitions, but reading the Claude definition, they do have both:

From the Claude docs: Extend Claude with skills - Claude Code Docs
disable-model-invocation: true: Only you can invoke the skill. Use this for workflows.
user-invocable: false: Only Claude can invoke the skill. Use this for background knowledge that isn’t actionable as a command.
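Concretely, a skill is a markdown file with YAML frontmatter containing those keys. A minimal sketch, with a made-up name, path, and description purely for illustration:

```yaml
# .claude/skills/elixir-testing/SKILL.md (hypothetical skill and path)
---
name: elixir-testing
description: Conventions for writing idiomatic ExUnit assertions
# Background knowledge only: the agent loads it when relevant,
# but the user cannot invoke it as a command.
user-invocable: false
---
```

The markdown body below the frontmatter holds the actual guidance the agent reads when the skill is triggered.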

TIL - thanks @Cheezy

This is all very interesting and I am not sure how all of this will shake out but Copilot, Cursor, and OpenCode all claim they will use Claude Code Skills installed in the .claude directory. Gemini also claims it will use Claude Code Skills although it wants them installed in the .gemini/skills directory. Even Windsurf Cascade supports Skills now. This week I have been testing the same four skills with each of these agents to confirm behaviour and I can say that it works. At this point Kimi is an outlier.

1 Like

Amazing! I’ll try that; I’m especially curious to see Cursor using those background-knowledge pieces…

To add to this: I also have a skill called something like “prepare for handoff”. It’s designed to help with context management.

Whenever I finish a task — or the context starts running out for a big task — I tell Claude to “prepare for handoff (with whatever)”. It basically creates or adds to a HANDOFF.md doc and writes up a handoff mini prompt in the CLI. Then I clear context and start fresh with a new task or continue the last task that was running low on context.

Anecdotally, Claude Code seems to go off the rails less when I aggressively manage context this way.

2 Likes

Nice topic!

I’ve been building Giulia, an Elixir daemon that provides AST-level code intelligence via REST API. It parses your codebase with Sourceror, builds a dependency graph with libgraph, runs semantic search with Bumblebee/Nx (all-MiniLM-L6-v2, 80MB, on CPU), and stores everything in ETS. It runs on OTP, serves any client over HTTP, and responds in under 300ms.

The idea is simple: give AI coding agents (or any tool) structural understanding of Elixir codebases instead of letting them grep around like it’s 1985.

To test it on something I didn’t write, I cloned Commanded and ran a full analysis. Here’s what Giulia found.

The Headlines

  • 66 modules, 438 functions, 500 graph vertices, 520 dependency edges

  • Zero red-zone modules in the heatmap — speaks to Commanded’s maturity

  • Zero behaviour fractures — every @behaviour contract is fully satisfied

  • Zero orphan specs — every @spec matches its function

  • Only 4 dead functions out of 438 (0.9%)

  • 2 circular dependencies — one is a 10-module cycle through the entire command execution pipeline

  • 7.3% spec coverage — only 32 specs for 438 functions across a public framework

The Insight That Static Analysis Misses

Most tools rank modules by complexity alone. Giulia combines complexity with topology — and the results tell a very different story.

Event.Handler has the highest complexity score in the project at 93. It has 36 functions across 1,549 lines and wears five hats: behaviour definition, macro, GenServer, event processor, and telemetry emitter.

Sounds dangerous, right? Except Giulia’s knowledge graph shows it has only 1 downstream dependent. It’s a leaf node. Refactoring it is low-risk.

Aggregates.Aggregate scores lower at 63 complexity. Only 716 lines, 28 functions, clean code with zero structural redundancy.

But it has degree 17 in the dependency graph — the highest in the project. Fan-in 6, fan-out 11, part of a 10-module circular dependency cycle. It’s the #1 change risk module (score 844) because modifying it affects the entire command execution pipeline from dispatch through persistence.

Half the code, lower complexity, but double the danger.

Any tool that ranks by one dimension misses this. Giulia gives both.

Three Core Modules, Side by Side

| Aspect | Aggregates.Aggregate | Event.Handler | ProcessManagerInstance |
| --- | --- | --- | --- |
| Lines | 716 | 1,549 | 649 |
| Functions | 28 | 36 | 30 |
| Complexity | 63 | 93 | 53 |
| Change risk rank | #1 (844) | #3 (462) | #2 (537) |
| Dependency degree | 17 | 9 | 11 |
| Downstream dependents | 6 | 1 | 3 |
| State persistence | Event stream | None | Snapshots |
| Error recovery paths | 1 | 1 | 2 |
| Typespecs | 0 | 0 | 0 |

The three most important modules in the framework. Zero typespecs across all of them.

What Giulia Exposes via REST

Every endpoint responds in under 300ms and is project-scoped with a ?path= parameter. Multi-project support means one daemon can index multiple codebases simultaneously.

Code Understanding: modules, functions, specs, types, callbacks, structs, module details (one call, full profile)

Knowledge Graph: impact maps (blast radius at depth N), dependents, dependencies, centrality, dependency path tracing, cycle detection
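The impact-map idea is easy to illustrate in plain Elixir. This is not Giulia’s actual implementation (which is not public), just a sketch of the underlying idea: given each module’s dependencies, the blast radius of a change is the set of transitive dependents.

```elixir
defmodule ImpactSketch do
  # Hypothetical sketch: deps is a map of `module => [modules it depends on]`.
  # Returns every module affected, directly or transitively, by a change
  # to `target` (i.e. the reverse-reachability set), sorted.
  def blast_radius(deps, target) do
    # Invert the graph: for each module, who depends on it?
    dependents =
      for {mod, ds} <- deps, dep <- ds, reduce: %{} do
        acc -> Map.update(acc, dep, [mod], &[mod | &1])
      end

    walk(dependents, [target], MapSet.new())
  end

  defp walk(_dependents, [], seen), do: Enum.sort(MapSet.to_list(seen))

  defp walk(dependents, [mod | rest], seen) do
    # Enqueue dependents we have not visited yet
    new =
      dependents
      |> Map.get(mod, [])
      |> Enum.reject(&MapSet.member?(seen, &1))

    walk(dependents, new ++ rest, MapSet.union(seen, MapSet.new(new)))
  end
end
```

For example, with `deps = %{A => [B], B => [C], D => [C], C => []}`, `ImpactSketch.blast_radius(deps, C)` returns `[A, B, D]`: everything upstream of C. A real tool would also weight this by depth and centrality, as described above.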

Health Metrics: change risk scores, god modules, fan-in/fan-out, coupling analysis, API surface ratios, heatmap (red/yellow/green zones), dead code detection, orphan specs, behaviour integrity

Semantic Search: two-stage retrieval — Bumblebee embeds module docs + function signatures, cosine similarity finds code by intent (“entity movement physics” finds set_velocity/2 without keyword match), surgical briefings combine semantic results with knowledge graph data
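The cosine-similarity stage is standard. Giulia does this over Bumblebee/Nx tensors; the list-based sketch below just shows the math that “finds code by intent” boils down to once everything is embedded.

```elixir
defmodule CosineSketch do
  # Cosine similarity between two embedding vectors, here as plain lists.
  # With Nx you would do the same on tensors: dot(a, b) / (||a|| * ||b||)
  def similarity(a, b) do
    dot = Enum.zip(a, b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
    dot / (norm(a) * norm(b))
  end

  defp norm(v), do: :math.sqrt(Enum.sum(Enum.map(v, &(&1 * &1))))

  # The retrieval step: rank stored {id, vector} pairs against an
  # embedded query, highest similarity first.
  def rank(query_vec, entries) do
    entries
    |> Enum.map(fn {id, vec} -> {id, similarity(query_vec, vec)} end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
  end
end
```

Identical vectors score 1.0 and orthogonal ones score 0.0, which is why a query embedding for “entity movement physics” can land near a `set_velocity/2` doc embedding without sharing any keyword.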

The Stack

All Elixir. All on the BEAM.

  • Sourceror for AST parsing and code analysis

  • libgraph for dependency topology, Dijkstra pathfinding, cycle detection

  • Bumblebee + Nx (EXLA) for semantic embeddings (all-MiniLM-L6-v2)

  • ETS for the artifact store (modules, functions, ASTs, vectors)

  • Bandit + Plug for the REST API

  • OTP supervision for the whole thing

No Python. No external vector database. No sidecar services. One supervised Elixir application.
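Cycle detection itself is a classic strongly-connected-components problem, and OTP ships everything needed. Giulia uses libgraph for this, per the list above; the sketch below only illustrates the idea using the built-in :digraph module instead.

```elixir
defmodule CycleSketch do
  # Sketch only, not Giulia's implementation. deps is a list of
  # {caller_module, callee_module} edges (in a real tool these would
  # come from AST analysis). Returns each circular dependency as a
  # sorted list of modules: any strongly connected component
  # containing a cycle.
  def cycles(deps) do
    g = :digraph.new()

    for {from, to} <- deps do
      :digraph.add_vertex(g, from)
      :digraph.add_vertex(g, to)
      :digraph.add_edge(g, from, to)
    end

    result =
      g
      |> :digraph_utils.cyclic_strong_components()
      |> Enum.map(&Enum.sort/1)

    # :digraph is ETS-backed, so free it explicitly
    :digraph.delete(g)
    result
  end
end
```

Given edges `[{A, B}, {B, C}, {C, A}, {C, D}]`, this reports the single cycle `[A, B, C]` while leaving the acyclic D branch alone.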

I work with AI coding agents daily. They’re powerful but architecturally blind — they understand code as text, not as structure. They grep for function definitions instead of querying a graph. They can’t tell you the blast radius of a change or whether a module is a hub or a leaf.

Giulia gives them eyes. One API call returns what would take 10+ grep/read cycles and thousands of context window tokens. The AI agent I use reports ~80-90% token savings and calls the impact/centrality endpoints mandatory before planning any modification.

Giulia is a personal project, not open source yet. Happy to discuss the architecture, the analysis results, or the approach. Feedback welcome.

If anyone wants a full analysis, just contact me for the document.

Best,

4 Likes

I am interested in an analysis of my most complex libraries, if you have the time: jsv, oaskit and gen_mcp.

Regarding skills as tasks vs. knowledge, to me it makes no difference: at the technical level the agent will match a topic and ingest the related text, whatever that is. A “how do I do X” task description is still knowledge. Prompts that you invoke manually I call “slash commands”, but once it’s in the context it does not make much difference, except that in VS Code Copilot the path to the file is prefixed with “follow the instructions in”, which directs the LLM to explicitly follow what’s inside instead of just absorbing some knowledge; that makes it more suited for tasks, in my opinion.

Also, LLMs are not Elixir specialists and will follow the patterns most used across all languages. I have stopped telling them to write tests like this:

assert %{
  foo: 123,
  bar: "hello"
} = stuff()

because they just don’t listen. They want to write this instead:

assert x = stuff()
assert x.foo == 123
assert x.bar == "hello"

This is just what TypeScript and Python do. I’m now willing to try writing a “testing with Elixir” skill to see if this can be improved.

@SyntaxSorcerer this pattern is interesting! I sometimes use SDD tools like openspec, but I always find those frameworks a bit too rigid. Thanks to the nature of LLMs it’s always possible to steer away from the basic workflow, but I believe for smaller tasks it could be more flexible to have “handoff to impl”, “handoff to fix”, “handoff to tests” kinds of prompts.

Ah, but saying that, I realize I am trying to adapt AI-as-a-tool to my natural workflow (which is mostly TDD), and that there are two paradigms actually clashing: do you use the AI as support, a tool to generate parts of the code faster, or do you want it to be an independent developer working on its own, with a coworker relationship? Or is it a spectrum between the “10x senior” and the “vibe coder”? I think that if you want an autonomous agent you have to fully embrace it (which I’m not ready to do) and provide many skills describing the “AI developer” as a profession, on top of the “competent programmer” skills.

Well sorry this was mostly off-topic :smiley:

Yes, of course. I ran the analysis for jsv; keep in mind the software is still in development. Send me a DM so I can send you the report.

Best,

I implemented some corrections to the report:

  1. Behaviour integrity: now returns enriched fracture data with 4 categories: missing (real gaps), injected (MacroMap-detected), optional_omitted (legal), and heuristic_injected (ghost-detected). Only missing triggers fracture status.
  2. Preflight behaviour contract: now has a 4-level integrity status: consistent, consistent_with_optionals, heuristic_match, fractured. Plus new fields: optional_omitted and heuristic_injected.

Here is the SKILL file: SKILL.md

@lud , good questions. I think the easiest way for me to answer is with a high-level overview.

First, I like TDD, DDD, and SDD, and try to incorporate them into most of my projects.

I think building with AI is a different skill set than traditional software development. As others have noted in this forum, AI has the tendency (as of now) to try to force non-idiomatic patterns on Elixir code, so I maintain a healthy skepticism with AI-generated code.

Eventually, I think there’s a significant chance the majority of code may be AI-generated, so I want to be prepared for this, but I don’t think it’s going to happen as fast as some of the big proponents would have you believe. So I’m kind of in this intermediate state where I try to push AI as much as possible with a “trust but verify” approach.

I’m also working on my own harnesses for spec-driven development and multi-agent coordination and task management. My thought is this exercise is a good way to learn about what the symbiosis between AI and developer could look like, and maybe will be a better way to produce quality code with a very specific toolset.

2 Likes

Would love to get your review of packages across the Jido ecosystem, starting with jido_action, jido_signal and jido - I’ll DM you

1 Like

Well keep me posted!

I’m evaluating Claude Code in depth for my company right now, so I’m writing a feature with mostly long, documented prompts. Well, that was the plan, but I have to intervene all the time. The code is sometimes OK, sometimes crap.

Consistency is very important in development, but AI is anything but consistent, and this is what makes it so unreliable in my eyes. You will get good results with minor problems, which builds your trust and slightly lowers your standards at the same time, which feels fine since you are delegating to a junior. But it’s a trap.