jonator

Using FLAME with many nodes

FLAME is basically the dream for running AI agents that need system command access, such as coding agents. I can just write application Elixir code and scalability is batteries included; incredible devx. However, I have concerns about scaling to a very high number of nodes (many thousands).

With distributed Erlang, the default is for nodes not to run in “hidden” mode and instead to create many-to-many connections to all other nodes, which severely increases the overhead of each additional node added to the cluster. This default seems to extend to FLAME as well, as I did not see an option to support hidden node detection. (I may be wrong; it was a fairly quick scan.)

Would the solution, in the case of the Fly backend, be to run each node using a separate Docker image that passes the “-hidden” flag, then fork/update FLAME to support usage of nodes(hidden) for spawned workers? (Or maybe this is already achieved somehow.) My ideal is that the main cluster, which runs the web servers and the Oban workers for each agent, is the “non hidden” cluster, and each of those nodes only knows about some number of dedicated connections to the nodes running each agent (a hub-and-spoke architecture).
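To put rough numbers on the difference, here is a small back-of-the-envelope sketch in Python (plain arithmetic, not FLAME or Erlang API calls). It assumes a fixed number of fully meshed "hub" nodes and that each hidden worker keeps exactly one connection to a single hub; the function names and the figures of 10 hubs and 10,000 workers are made up for illustration.

```python
def full_mesh_connections(nodes: int) -> int:
    """Connections when every node is connected to every other node."""
    return nodes * (nodes - 1) // 2


def hub_and_spoke_connections(hubs: int, workers: int) -> int:
    """Connections when only the hubs are fully meshed and each worker
    holds a single connection to one hub (assumption for this sketch)."""
    return full_mesh_connections(hubs) + workers


# Hypothetical cluster sizes, chosen only for illustration.
hubs, workers = 10, 10_000

print("full mesh:    ", full_mesh_connections(hubs + workers))
print("hub and spoke:", hub_and_spoke_connections(hubs, workers))
```

Under these assumptions the full mesh grows quadratically with the node count, while the hub-and-spoke layout grows roughly linearly with the number of workers.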

Regardless, I am curious to see what you think. Cheers.

paulsabou

Maybe it’s worth considering libcluster with Partisan and a custom partitioning mechanism. This way you could avoid the mesh and decide how to split your nodes into small clusters, then run FLAME in each small cluster.

This should scale well I believe

cevado

It’s not a problem of pg but actually a problem of how disterl works… It is explained in the Partisan docs:

Erlang/OTP, specifically distributed erlang (a.k.a. disterl), uses a full-mesh overlay network. This means that in the worst case scenario all nodes are connected-to and communicate-with all other nodes in the system.
Failure detector. These nodes send periodic heartbeat messages to their connected nodes and deem a node “failed” or “unreachable” when it misses a certain number of heartbeat messages i.e. the net_tick_time setting in disterl.

But it is worth reading the full section in the docs.
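As a rough illustration of the heartbeat cost described in that quote, here is a Python sketch of the arithmetic (not Erlang code). It assumes the disterl defaults of a 60 second net_ticktime with a tick sent every quarter of that interval, and it assumes every connection is otherwise idle so that both ends send a tick each interval. Real traffic suppresses ticks, so treat the result as an idle-cluster estimate; the function name is hypothetical.

```python
DEFAULT_NET_TICKTIME_S = 60   # disterl default net_ticktime, in seconds
TICKS_PER_TICKTIME = 4        # a tick is sent every net_ticktime / 4


def idle_ticks_per_second(connections: int,
                          net_ticktime_s: float = DEFAULT_NET_TICKTIME_S) -> float:
    """Estimate cluster-wide tick messages per second, assuming each
    connection is idle and both of its endpoints tick once per interval."""
    tick_interval_s = net_ticktime_s / TICKS_PER_TICKTIME
    return 2 * connections / tick_interval_s


# Hypothetical example: 5,000 nodes in a full mesh.
nodes = 5_000
connections = nodes * (nodes - 1) // 2
print(f"{connections} connections, "
      f"about {idle_ticks_per_second(connections):,.0f} ticks per second")
```

The point is only that the failure detector's background traffic scales with the number of connections, so it inherits the quadratic growth of the full mesh.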

nulltree

Hijacking with a noob question here: partisan solves a different problem than pg’s process group scopes, correct?

I assume partisan avoids the full mesh at the network level and process group scopes create an overlay network within the full mesh used for group membership propagation?
