vkryukov

Opus 4.5 vs GPT 5.2 vs Gemini 3 Pro for Elixir development

People constantly debate which LLM is better at writing Elixir code, so I decided to compare the three state-of-the-art models from Google, OpenAI, and Anthropic to see which one is better at designing a medium-size feature for a medium-size project.

The project is ReqLLM, a wonderful new LLM library by @mikehostetler, and the feature is image generation support, the first part of Add Image Generation and Audio Transcription Support · Issue #14 · agentjido/req_llm · GitHub.

I used Gemini 3 Pro, GPT 5.2, and Claude Opus 4.5 in the Gemini, Codex, and Claude Code CLIs, respectively, with the same prompt. After each model wrote a plan, I asked each one (in a separate session) to compare the three plans. The results are here: ReqLLM image support plans · GitHub

Bottom line:

  • My ranking of the plans is GPT 5.2 > Opus 4.5 > Gemini 3 Pro, and each of the three models agreed with this assessment.
  • Arguably, GPT’s is the only correct plan. Opus’s plan works, but it essentially introduces a parallel response parsing infrastructure, which would make it hard or impossible to extend image support going forward, add streaming, etc.
  • For some reason, Claude likes to write big implementation chunks as part of its plan.
  • Gemini’s is the least concrete and least accurate plan (and also uses the wrong image generation endpoints, for some reason).
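
To illustrate the "parallel parsing" concern in the second point, here is a minimal hypothetical sketch. The `Sketch.*` modules and the raw map shape (including the `"b64"` key) are made up for illustration; they are not ReqLLM's actual API and are not taken from any of the three plans. The idea is that images become one more content-part type inside a single response decoder, so anything built on top of the response (streaming, for example) would not need a second parser:

```elixir
# Hypothetical sketch: none of these modules exist in ReqLLM.
defmodule Sketch.ContentPart do
  # One tagged struct covers every kind of output a model can return.
  defstruct [:type, :text, :data, :media_type]

  def text(text), do: %__MODULE__{type: :text, text: text}

  def image(data, media_type),
    do: %__MODULE__{type: :image, data: data, media_type: media_type}
end

defmodule Sketch.Response do
  alias Sketch.ContentPart

  defstruct parts: []

  # Single decoding path: supporting images means adding one more
  # decode_part/1 clause here rather than a separate image-only parser.
  def decode(raw_parts) when is_list(raw_parts) do
    %__MODULE__{parts: Enum.map(raw_parts, &decode_part/1)}
  end

  # Convenience accessor that filters the shared parts list.
  def images(%__MODULE__{parts: parts}) do
    Enum.filter(parts, &(&1.type == :image))
  end

  # The raw map shape below is an assumption, not a real provider format.
  defp decode_part(%{"type" => "text", "text" => text}), do: ContentPart.text(text)

  defp decode_part(%{"type" => "image", "b64" => b64, "media_type" => media_type}) do
    ContentPart.image(Base.decode64!(b64), media_type)
  end
end

# Usage: text and image parts flow through the same decoder.
response =
  Sketch.Response.decode([
    %{"type" => "text", "text" => "Here is your image"},
    %{"type" => "image", "b64" => Base.encode64("fake-bytes"), "media_type" => "image/png"}
  ])

[image] = Sketch.Response.images(response)
```

A parallel design would instead have a second module with its own decoder and its own response struct for images, which is where the extensibility problems would come from.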

This matches my experience working with Claude Code and Codex daily: while Claude Code has nicer output, more features (like parallel/background execution), and works faster, Codex is much, much more thorough and most often generates higher-quality code.

Also, the “/review” function in Codex is underrated. My current workflow is to always run a “/review” on code written by me, Claude, or another Codex session. It excels at finding very subtle edge cases and bugs introduced by the latest patch.

Most Liked

egeersoz

GPT is really bad with Elixir in my experience. I regularly run experiments where I ask multiple models the same question (about design or troubleshooting a bug) and GPT is consistently bottom tier. It’s also slow as hell. Not sure why people like it as a coding assistant.

I used to use it for product management to build domain expertise but Gemini 3 is better at that now.

FlyingNoodle

I’m going to have to agree to disagree on this.

LLMs and brains don’t work the same way at all.

FlyingNoodle

each of the three models agreed with this assessment.

This should really say “each of these models generated text which said that they agreed.”

If you worded your question slightly differently the models would write something else. They are just text generators, they can’t “agree”.
