Successful development with local AI setup

Hi guys,

I’ve been running Qwen locally for the last two weeks using an opencode + Ollama + qwen3.6:35b-A3B setup with q4_k_m quantization on a MacBook Pro M2 Max (32GB RAM).

My development workflow is heavily GitHub-centric: GitHub Issues and PRs are the source of truth and, in many cases, effectively act as the issue/PR state store.
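As a rough illustration of what "issues as state store" means in practice (not the actual skill code), here is a minimal Elixir sketch that reads and writes an issue body through the GitHub REST API using the Req client; the repo path and token variable are placeholders.

```elixir
# Hypothetical helper: treat a GitHub issue body as the state store for a spec.
# Uses the Req HTTP client; repo path and GITHUB_TOKEN env var are placeholders.
defmodule IssueStore do
  @api "https://api.github.com/repos/owner/repo/issues"

  defp headers do
    [
      {"authorization", "Bearer " <> System.fetch_env!("GITHUB_TOKEN")},
      {"accept", "application/vnd.github+json"}
    ]
  end

  # Read the current issue body (where the enriched spec lives).
  def fetch_body(number) do
    Req.get!("#{@api}/#{number}", headers: headers()).body["body"]
  end

  # Persist an updated spec back into the issue body.
  def update_body(number, new_body) do
    Req.patch!("#{@api}/#{number}", headers: headers(), json: %{body: new_body})
  end
end
```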

To improve reliability and maintain consistently high code quality, I’ve been building a set of “skills” that act as boundary setters, orchestration layers, and quality gates.

Current skill set (still evolving):

Issue Lifecycle Skills

| Skill | Target Model | Description |
| --- | --- | --- |
| improve-issue | opus | Enriches a raw GitHub issue into a precise, implementation-ready specification persisted directly in the issue body. This is always the first step before coding. The generated spec includes explicit Acceptance Criteria (AC). |
| evaluate-issue | sonnet or qwen | Sizes an enriched issue and recommends KEEP, COMPLETED, or SPLIT, including the recommended model tier for execution. |
| orchestrate-issue | sonnet or qwen; opus for complex tasks | Runs the full implementation → review → correction loop for an enriched issue. When the issue is considered done, it automatically invokes pr-from-issue. |
| pr-from-issue | qwen or haiku | Opens a PR from a completed issue, validates the “ready to be closed” marker, and executes the full test gate beforehand (see the sketch after this table). |
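As a hedged sketch of what the pr-from-issue gate does: check for the “ready to be closed” marker, run the test gate, and only then open the PR. The marker string, the exact gate commands, and the IssueStore helper (from the sketch above) are assumptions, not the skill's literal implementation.

```elixir
# Hypothetical sketch of the pr-from-issue gate; the marker format and the
# gate commands are assumptions. IssueStore is the placeholder helper above.
defmodule PrGate do
  @marker "<!-- ready-to-be-closed -->"

  def open_pr(issue_number, branch) do
    body = IssueStore.fetch_body(issue_number)

    with true <- String.contains?(body, @marker) || {:error, :marker_missing},
         {_out, 0} <- System.cmd("mix", ["precommit"]),
         {_out, 0} <- System.cmd("mix", ["test"]) do
      # Only reached when the marker is present and the gate is green.
      System.cmd("gh", ["pr", "create", "--head", branch, "--fill"])
    end
  end
end
```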

The most interesting skill is probably orchestrate-issue, since it behaves as a controlled execution loop coordinating multiple sub-skills (a rough sketch of that loop follows the table below):

Implementation & Review Skills

| Skill | Target Model | Description |
| --- | --- | --- |
| code-issue | sonnet or qwen; haiku for trivial tasks; opus for complex ones | Implements every Acceptance Criterion from the enriched specification (or unresolved review gaps) in an Elixir/Phoenix/Ash codebase, including tests. |
| review-issue | sonnet or qwen; opus for complex reviews | Performs a senior-reviewer pass over the implementation. Runs mix precommit and mix test, evaluates implementation coverage against the specification, and writes identified gaps back into the GitHub issue using a structured review marker block. |
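In rough Elixir terms, the orchestrate-issue loop looks like the sketch below. Skills.invoke/2 is a hypothetical stand-in for however the sub-skills are actually invoked, and the iteration cap is an assumption.

```elixir
# Rough sketch of the implementation → review → correction loop.
# Skills.invoke/2 is a hypothetical stand-in for invoking a sub-skill.
defmodule Orchestrator do
  @max_iterations 5

  def run(issue), do: loop(issue, 1)

  defp loop(_issue, iteration) when iteration > @max_iterations,
    do: {:error, :max_iterations_reached}

  defp loop(issue, iteration) do
    Skills.invoke("code-issue", issue)
    # review-issue writes gaps back to the issue; assume it also returns them.
    gaps = Skills.invoke("review-issue", issue)

    if gaps == [] do
      # Issue considered done: hand off to pr-from-issue.
      Skills.invoke("pr-from-issue", issue)
    else
      loop(issue, iteration + 1)
    end
  end
end
```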

After the PR is generated, I manually review it and add comments for anything that should be discussed, improved, or refactored. If further work is needed before merging, I invoke an additional skill:

PR Review Resolution

| Skill | Target Model | Description |
| --- | --- | --- |
| address-pr-review | sonnet or qwen | Resolves PR review comments, updates tests when necessary, and iterates until the review state is acceptable for merge (sketched below). |
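For context, the raw input here is just the PR's review comments. A minimal sketch of collecting them via the GitHub REST API follows; the repo path, token variable, and Skills.invoke/2 are placeholders, not the skill's actual mechanics.

```elixir
# Hypothetical sketch: gather PR review comments and hand them to the skill.
defmodule PrReview do
  @api "https://api.github.com/repos/owner/repo"

  def address(pr_number) do
    comments =
      Req.get!(
        "#{@api}/pulls/#{pr_number}/comments",
        headers: [{"authorization", "Bearer " <> System.fetch_env!("GITHUB_TOKEN")}]
      ).body

    case comments do
      [] -> :nothing_to_address
      _ -> Skills.invoke("address-pr-review", %{pr: pr_number, comments: comments})
    end
  end
end
```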

The whole system is still evolving, but at this point it’s already producing surprisingly high-quality code with a fairly reliable workflow.

Performance is actually very good locally, although I do need to keep background activity to a minimum — typically just one Chrome tab open, while Docker and PostgreSQL are always running.

The next step is to separate test generation from the coding itself; I’ll probably use an agent swarm…

I’m curious whether others here are experimenting with similar AI-assisted development workflows around Elixir/Phoenix/Ash projects, especially with local-first setups.


I am certainly experimenting with all things Elixir and Rust, and to some degree other languages and areas (C, Zig, microcontrollers, Nerves, various UI interfaces, industrial network protocols, camera controls, video streaming, and using/tuning AI).

The Elixir and Rust work has come the furthest. The setup is basically various skills for planning and implementing, with hooks forcing their use, plus a static code analyzer and a review skill to find, interpret, and fix any issues. I do this with TDD active and reiterate until done, then run a feedback loop over the skills, hooks, and analyzer. This works surprisingly well. It could use some outer framing for the rest of the environment, but so far it is looking good and steadily improving.
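In Elixir terms, the inner loop is roughly the sketch below; it is a simplification, Credo as the static analyzer is only an assumption, and the actual fixing is done by the review skill rather than this code.

```elixir
# Simplified sketch of the analyze/test/fix loop; Credo as the analyzer is an
# assumption, and the fixing itself is done by the review skill, not this code.
defmodule QualityLoop do
  @max_rounds 10

  def run(round \\ 1)

  def run(round) when round > @max_rounds, do: {:error, :not_converging}

  def run(round) do
    {_credo_out, credo_status} = System.cmd("mix", ["credo", "--strict"])
    {_test_out, test_status} = System.cmd("mix", ["test"])

    if credo_status == 0 and test_status == 0 do
      :ok
    else
      # In the real workflow the review skill gets the analyzer and test output
      # here, fixes the issues, and the loop repeats until everything is green.
      run(round + 1)
    end
  end
end
```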

On the local level I have an old NVidia Orin 64GB just sitting about for the most part, so I play a bit with local ideas on that one.


@Vidar, do you let AI do part of your coding? If so, what do you use (Copilot, Claude, Cursor, …)?

Just out of curiosity :slightly_smiling_face:

Yes, the AI does part of the coding. I don’t do life support or safety-critical work, nor anything with millions of users, and I am supposed to experiment. So in the worst case mistakes get expensive, but it works the other way around too. So far I’m well ahead, though it is not without frustrations. It is a conscious choice, though, to jump into the water even if it is not all cozy yet.

My long-term goal is for the AI to do most of the coding required to meet requirements and specifications, and to get there fairly automatically with reliably good results. Hence I run these as tests as such, no editor touched, and the size of the projects and the areas where that seems achievable keeps getting larger. (If you are curious, I did an automated test with Elixir and the RealWorld Medium blog clone project: two prompts, and apart from those just granting permissions as asked. GitHub: BadBeta/Automated-Elixir-Code-Test-RealWorld.)

I use Claude, usually on high, and with 1M context. The latter is a blessing!

I tried Qwen running on the Orin 64GB some months ago; I think it was a 14b (?) q4 model of some version. I didn’t try it for coding, though, just kicked the tires to see if the answers were sensible. But Claude is working for me, for the most part, and nobody seems to have something much better, so I haven’t bothered shopping around.

Nice, I am wondering how you do the requirements and specs. I exclusively use Claude Opus for those tasks. The specs are GitHub-issue-centric, so every issue has an enriched spec. The requirements are kept in the repository under docs (so I can publish them via GitHub Pages). My flow is: requirements in docs (create/review/update) → issues → issue enrichment → code issue (write/review loop) → close issue PR. Not automatically, though; I still need to feel some control over the project :sweat_smile:

I did that too; previous Qwen models were worthless, they were very unreliable. But a couple of weeks ago qwen3.6 was released and, honestly, it is another story. I keep Claude because of Opus, but qwen quality is similar to sonnet. I actually did several runs starting from the same spec in different branches (one for qwen and one for sonnet), and both performed so similarly that some functions even had the same name, arity, and arguments (crazy). I also assessed how they do on tasks that were too complex for their capabilities, and both failed in similar ways; in the end I have nothing that differentiates Sonnet from Qwen performance. Therefore, for the past couple of weeks I have been resolving most (if not all) of the issues with qwen3.6:35b-A3b (q4_k_m), with very good quality.

I use Claude even for making requirements. Instead of trying to nail all the requirements up front, I’d rather go in early and just document what I have. Then comes a stepwise plan, and during implementation I use Claude and the process to fill in the weak spots, actively explore alternatives, run comparison tests and metrics, etc., and document it all.

This ends up with a working prototype and with both me and Claude knowing more about the problem space. It clarifies what works well, what doesn’t, and any unclear areas or wrong assumptions from the initial requirements.

Then I iterate and start a new version based on that. The second time is always better: it is based on better information, so the project documents are more specific and known to work. LLMs make it easy to iterate over even large problem spaces, so this workflow has become more viable. And the results are better for it.

That is interesting. Sonnet is quite capable, so I might have to look again at Qwen one of these days. Even if slower, a capable local solution would be useful.
