For those who haven’t seen it, here is a research paper from August that found that LLMs perform twice as well with Elixir as with Python or JavaScript.
Sounds exciting. Unfortunately, the researchers themselves say in the article that this is “likely because DeepSeek-Coder-V2-Lite struggles to filter out simple problems in these scenarios due to its limited capability in handling low-resource languages”. In their experiment design they wanted to run the tests on the generated problems, but only on the complex ones, and they possibly used a “complexity filter” that was unable to tell whether an Elixir problem was too simple (thus possibly making the Elixir score artificially high).
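To make the suspected effect concrete, here is a toy sketch in Python. All numbers and names (`pass_rate`, `working_filter`, `broken_filter`) are made up for illustration and are not taken from the paper; the only point is that if a filter fails to drop easy problems, the measured pass rate goes up even though the model is no better.

```python
def pass_rate(problems, keep):
    """Fraction of the kept problems that the model solved."""
    kept = [p for p in problems if keep(p)]
    return sum(p["solved"] for p in kept) / len(kept)

# Hypothetical benchmark: 4 easy problems (all solved) and 6 hard ones (2 solved).
problems = (
    [{"hard": False, "solved": True}] * 4
    + [{"hard": True, "solved": True}] * 2
    + [{"hard": True, "solved": False}] * 4
)

def working_filter(p):
    # Drops easy problems, as the experiment design intends.
    return p["hard"]

def broken_filter(p):
    # Cannot recognize easy problems, so it keeps everything.
    return True

rate_filtered = pass_rate(problems, working_filter)    # 2 solved out of 6 kept
rate_unfiltered = pass_rate(problems, broken_filter)   # 6 solved out of 10 kept
print(f"working filter: {rate_filtered:.2f}, broken filter: {rate_unfiltered:.2f}")
```

With these invented numbers the same model scores 0.33 when easy problems are filtered out and 0.60 when they slip through, which is the kind of inflation the quoted sentence hints at.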
Do we still like Elixir? Oh, yes. Does this article mean that LLMs are better with Elixir? Probably not so much, especially since the research goal was quite orthogonal: they were evaluating models against other models (on complex problems), not languages against each other.






















