Please don’t thank me, I just submitted a quick doc fix this morning and didn’t do any of the hard stuff. This was all @chrismccord, @jeregrine, and @mcrumm.
This looks lit! I am sure this simplifies certain use cases.
Regarding CPU-intensive workloads, how does it compare against vertical auto-scaling? Unlike horizontal auto-scaling, where you end up running multiple unneeded instances of your web servers, with vertical auto-scaling resource allocation is more granular, and the allotted CPU is used by whichever process needs it. And unlike FLAME, we don’t have to run another instance of our application or set up distribution. Theoretically, we should be able to provide enough CPU based on actual usage (up to the machine limit).
On a minor note, if the workload spawns an OS process (like ffmpeg in the example), then we can go even more granular and cover the safety aspect by limiting its CPU using cgroups, so the web server always stays up.
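To make that concrete, here is a minimal sketch (module and function names are hypothetical, and it assumes a systemd host) of wrapping an ffmpeg invocation in a transient cgroup scope via `systemd-run`, so the transcoding process is CPU-capped while the web server is untouched:

```elixir
defmodule CgroupExec do
  @moduledoc """
  Hypothetical helper: wraps a command in a transient systemd scope so
  the kernel caps its CPU via cgroups. Sketch only -- requires systemd.
  """

  # Builds the argv for `systemd-run --scope`, capping CPU at `quota`
  # percent (50 means half of one core).
  def wrap(cmd, args, quota) do
    {"systemd-run",
     ["--scope", "--quiet", "-p", "CPUQuota=#{quota}%", cmd | args]}
  end

  # Runs the wrapped command synchronously; the ffmpeg flags used by a
  # caller would be whatever the job needs.
  def run(cmd, args, quota) do
    {exe, argv} = wrap(cmd, args, quota)
    System.cmd(exe, argv, stderr_to_stdout: true)
  end
end

# Example: a CPU-capped thumbnail job (argv only, not executed here).
{exe, argv} = CgroupExec.wrap("ffmpeg", ["-i", "in.mp4", "out.jpg"], 50)
```

Memory can be capped the same way with `-p MemoryMax=...`, which covers the "web server always stays up" guarantee for runaway jobs.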
That said, I can see that doing auto-scaling plus cgroups can get complex, it does not address all the use cases FLAME covers, and the developer experience and tooling will probably be much better with FLAME. I just want to hear others’ thoughts on such approaches, in case I am missing some details.
Thinking about cgroups and FLAME, I guess they can work together too. We could create a FLAME backend adapter that starts applications locally with safety guarantees (CPU/memory limits) set using cgroups. And it could be made to work with or without Erlang distribution.
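A rough sketch of what that adapter could look like. The callback names follow the `FLAME.Backend` behaviour (`init/1`, `remote_boot/1`, `remote_spawn_monitor/2`, `system_shutdown/0`); check the flame docs for the current contract, and everything else here is hypothetical and unimplemented:

```elixir
defmodule MyApp.CgroupBackend do
  # @behaviour FLAME.Backend  # uncomment once the :flame dep is present

  # Keep pool options (e.g. a hypothetical :cpu_quota) as backend state.
  def init(opts), do: {:ok, Map.new(opts)}

  # A real implementation would boot a local copy of the app inside a
  # cgroup, e.g. via:
  #   systemd-run --scope -p CPUQuota=50% -p MemoryMax=1G elixir ...
  # and return the pid of the runner's terminator process.
  def remote_boot(_state), do: {:error, :not_implemented}

  # With Erlang distribution this is roughly a spawn_monitor on the
  # child node; without it, calls would need to go over a pipe/socket.
  def remote_spawn_monitor(_state, _func), do: {:error, :not_implemented}

  def system_shutdown, do: System.stop()
end
```

The interesting part is that the "remote" here is just another OS process on the same machine, so the cgroup limits give you the FLAME safety story without any extra hardware.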
I appreciate the tag, but if you check the contributions you’ll note that FLAME is almost entirely Chris; as of this morning the rest of us all have one commit each.
Firstly, this is cool as hell, congrats to everyone that worked on it.
Secondly, I have a problem that this looks like it might help with; however, I have a stumbling block.
I have an application where I want to call bits of it remotely, the way FLAME seems to be designed for, but the method is orchestrated via an Elixir script file. It acts like a DSL for the whole thing.
I’d also need to customise the image that actually runs remotely; say I wanted one FLAME job to run on Alpine and another on Ubuntu.
If File.stream! returns a struct with a :path key containing the path to the file on the local machine, how can Enum.into(parent_stream, flame_stream) start a stream and read from that file on the remote machine?
Not to be confused with a small project I’ve had on Hex for 7 years already called “Flames”, which is a sort of simplistic version of an error aggregation service.
This project reminded me that it was time to publish my liveview rewrite I’ve been running off a branch for the past year.
File IO on the BEAM is process based. Streaming here will Just Work™ and chunk 2048 bytes at a time. It’s just part of the BEAM and Elixir’s File interface, so you’ll need to go spelunking if you want the impl details
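You can see the laziness and the fixed-size chunking locally (the temp filename below is just for illustration):

```elixir
# File.stream! returns a lazy %File.Stream{} holding the path; nothing
# is read until the stream is enumerated -- which is why the struct can
# be shipped inside a FLAME closure and consumed on the other side.
path = Path.join(System.tmp_dir!(), "flame_stream_demo.bin")
File.write!(path, :binary.copy(<<0>>, 5_000))

stream = File.stream!(path, [], 2048)
%File.Stream{path: ^path, line_or_bytes: 2048} = stream

# Enumerating reads the file in fixed-size binary chunks.
chunk_sizes = Enum.map(stream, &byte_size/1)
IO.inspect(chunk_sizes) # => [2048, 2048, 904]
File.rm!(path)
```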
It really blows your mind just how many ridiculous things we get for free like this
I’d be interested to see a Membrane video + AI + Fly + FLAME demo app
if I am reading this correctly, also interested in how existing monoliths could FLAME out expensive services like video or AI
some folks with bandwidth and commercial skin in the game would benefit the Elixir Phoenix ecosystem if they could do a side-by-side $ comparison of FLAME fly vs Lambda AWS etc
with tech landscape facing budget constraints right now, focus on ROI with FLAME
I really like the just-in-time or on demand or drop in Lambda, the ability to make your monolith fragment into ephemeral microservices
certainly another genuine game changer from @chrismccord
Cost benefits will come back to the flag-fall metering model for on-demand workloads vs a commitment tier.
At the very low end it may be beneficial, but at some point you will be better off with a commitment tier, as that always provides the largest discount: cloud service providers are guaranteed ROI when you commit. There may be some benefit to using demand-based capacity between increments in the number of nodes on a commitment tier, and to handle spikes that exceed any reserve capacity in your commitments.
Where FLAME is different is that you can delegate specialised work to a pool that scales to zero without a lot of complexity so your primary nodes continue serving the non-specialised work.
This also affords targeting work to run on specialised VMs, with very little change to the code to achieve a very different architecture.
You’ll need to consider model load time on cold start, which could be 10-20s to get things loaded into memory for sizable models. Any kind of dynamically provisioned ML setup pays this price, but it’s something to keep in mind. You’ll also want to bake the model/XLA/Bumblebee artifacts into your build step for as long as is feasible, so all the cached artifacts are in the Docker container without needing to be pulled from elsewhere.
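One way to do that baking, as a hedged sketch: a warm-up script run during `docker build` (e.g. `RUN mix run priv/warmup.exs`) with `BUMBLEBEE_CACHE_DIR` pointed inside the image, so the weights land in an image layer. The script path and model repo below are just example choices:

```elixir
# priv/warmup.exs (hypothetical) -- executed at image build time, after
# deps are compiled, with e.g.:
#   ENV BUMBLEBEE_CACHE_DIR=/app/.bumblebee
#   RUN mix run priv/warmup.exs
# so a FLAME cold start only pays the load-into-memory cost, not a
# network download.
repo = {:hf, "openai/whisper-tiny"}
{:ok, _model_info} = Bumblebee.load_model(repo)
{:ok, _featurizer} = Bumblebee.load_featurizer(repo)
```

This is a build-time deployment fragment, not something to run in tests; it needs network access the first time and the `:bumblebee` dep.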
Just out of curiosity: did you manage to run a 7B model (like Mistral) via Bumblebee right within a phx application and chat with it? I mean, “just like” a dependency? I am still a bit unsure whether this stuff will “just work” when deploying to Fly as usual.