For those who are not aware, “AI agents” are, for the most part, commodity LLMs which are given access to “tools” and prompted to complete tasks, possibly in some sort of loop.
The tool use is facilitated by a program that scans the LLM’s output text for a “tool call” request (in some standard format) and then executes that call. For example, you might give the model access to a “calculator” tool which enables it to do math, or a “weather API” tool to check the weather. And so on. The model is given a prompt which tells it what tools it has access to, and I believe most models coming out nowadays are trained to some degree on tool use, so they get the general idea.
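To make that concrete, here’s a rough sketch of the dispatch mechanic in Elixir. The `%{"tool" => ..., "args" => ...}` JSON format and the two toy tools are just assumptions for illustration; real providers each define their own schema:

```elixir
defmodule ToolDispatch do
  # Scan a model completion for a tool-call request and execute it.
  def maybe_execute(model_output) do
    case Jason.decode(model_output) do
      {:ok, %{"tool" => name, "args" => args}} ->
        {:tool_result, run_tool(name, args)}

      _ ->
        # No recognizable tool call; treat it as a plain-text answer.
        {:text, model_output}
    end
  end

  # Each "tool" is just a function the host program exposes to the model.
  defp run_tool("calculator", %{"expr" => expr}) do
    # Stand-in only: never eval untrusted model output in a real app.
    {result, _binding} = Code.eval_string(expr)
    to_string(result)
  end

  defp run_tool("weather", %{"city" => city}) do
    # Stand-in for a real weather API call.
    "Sunny in #{city}, 22°C"
  end

  defp run_tool(name, _args), do: "error: unknown tool #{name}"
end
```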
What counts as “agentic” behavior here is somewhat arbitrary, but the idea is that you have some sort of feedback loop: the model generates a tool call, receives the result, and then perhaps generates more calls based on that result. People have been using this to write code, for example, with (so far) limited success.
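Sketched out (building on `ToolDispatch` above; `complete_fn` stands in for whatever LLM API client you’re using):

```elixir
defmodule AgentLoop do
  @max_turns 10

  # The feedback loop: ask the model, execute any tool call it makes,
  # feed the result back in, repeat until it answers in plain text.
  def run(messages, complete_fn, turn \\ 0)

  def run(messages, complete_fn, turn) when turn < @max_turns do
    output = complete_fn.(messages)

    case ToolDispatch.maybe_execute(output) do
      {:tool_result, result} ->
        # The model sees the tool's result and decides what to do next.
        run(messages ++ [%{role: "tool", content: result}], complete_fn, turn + 1)

      {:text, answer} ->
        answer
    end
  end

  def run(_messages, _complete_fn, _turn), do: {:error, :too_many_turns}
end
```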
The current emerging “killer app” for agents is the “deep research” model, which has been adopted by Google, OpenAI, Perplexity, Twitter (lol), and so on. The basic idea here is that you give the model a “search engine” tool and then just prompt it to run in a loop: searching, reading results, and coming up with more searches. Then it generates a nice summary (“report”) at the end for human consumption. It goes without saying that this task is a lot easier than writing code, and as a result agents seem to be actually “catching on” for the first time.
Due to the autoregressive nature of current LLMs, which has proved to be quite sticky thus far, they perform extremely poorly for “local” use. Autoregressive models require every weight to be read from GPU memory on every forward pass just to generate one token. As a result, single-user “local” inference is completely bottlenecked by memory bandwidth. If you have a 30GB model (on the low end of “useful”) and a GPU with 600GB/s of memory bandwidth (that’s pretty good), the best you can expect is about 20 tokens/sec (fairly usable). Unfortunately, GPU memory bandwidth is expensive, and 30GB is not enough for a top-tier model.
However, this problem largely vanishes with batching. GPUs are built for parallel compute, and deep nets are built to exploit it. If you batch, say, 10 requests at a time, the same weight reads are shared across all of them, and all of a sudden you are getting 200 tokens/sec in aggregate on the same hardware (until you hit compute limits). The point being: there is a forcing function towards multitenancy. This is why everyone is using cloud APIs instead of running their own models: the cost reduction is enormous.
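Spelling out the arithmetic with the same illustrative numbers (this ignores FLOP limits and KV-cache traffic, which eventually bite):

```elixir
model_bytes = 30.0e9   # 30 GB of weights
bandwidth   = 600.0e9  # 600 GB/s of GPU memory bandwidth

# Single stream: every token requires reading all the weights once,
# so decode speed is capped by memory bandwidth.
tokens_per_sec = bandwidth / model_bytes
# => 20.0

# Batching 10 requests shares each weight read across all of them,
# so aggregate throughput scales with batch size.
batch_size = 10
batch_size * tokens_per_sec
# => 200.0
```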
What this means is that “AI agents” are really just glue code between LLM APIs and “tool” APIs. And that’s where Elixir comes in: we are very good at soft-realtime, and this kind of concurrent, IO-bound orchestration is exactly what Elixir and the BEAM are built for. LiveView is the perfect tool for server-side realtime UI. If you were going to build some sort of “agentic” app, this would be the platform.
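For a (hypothetical) sketch of what that glue looks like on the BEAM: one lightweight process per agent run, with the result pushed to any subscribed LiveView over PubSub. I’m using `Req` for HTTP, and the endpoint/payload shape is an assumption, not any particular provider’s actual API; `MyApp.TaskSupervisor` is assumed to be in your supervision tree, and `MyApp.PubSub` is the stock Phoenix PubSub:

```elixir
defmodule MyApp.Agents do
  # Each run is its own BEAM process: thousands can run concurrently,
  # and one crashing run can't take the others down.
  def start_run(prompt, run_id) do
    Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
      answer = AgentLoop.run([%{role: "user", content: prompt}], &complete/1)
      Phoenix.PubSub.broadcast(MyApp.PubSub, "agent:#{run_id}", {:agent_answer, answer})
    end)
  end

  defp complete(messages) do
    # Hypothetical OpenAI-style endpoint; swap in your provider of choice.
    resp =
      Req.post!("https://api.example.com/v1/chat/completions",
        json: %{model: "some-model", messages: messages}
      )

    get_in(resp.body, ["choices", Access.at(0), "message", "content"])
  end
end
```

On the UI side, a LiveView just subscribes to `"agent:#{run_id}"` in `mount/3` and handles `{:agent_answer, answer}` in `handle_info/2`, and the result lands in the browser over the existing websocket, no polling required.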
So I’m curious: is anyone doing something in this space?