Please don’t thank me, I just submitted a quick doc fix this morning and didn’t do any of the hard stuff. This was all @chrismccord, @jeregrine, and @mcrumm.
This looks lit! I am sure this simplifies certain use cases.
Regarding CPU-intensive workloads, how does it compare against vertical auto-scaling? Unlike horizontal auto-scaling, where you end up running multiple unneeded instances of your web servers, with vertical auto-scaling resource allocation is more granular, and the allotted CPU is used by whichever process needs it. And unlike FLAME, we don’t have to run another instance of our application or set up distribution. Theoretically, we should be able to provide enough CPU based on actual usage (up to the machine limit).
On a minor note, if the workload spawns an OS process (like ffmpeg in the example), then we can go even more granular and cover the safety aspect by limiting its CPU using cgroups, so the web server always stays up.
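To make that concrete, here is a minimal sketch (module and function names are hypothetical, and it assumes a systemd host) of wrapping an ffmpeg invocation in a transient cgroup scope via `systemd-run`, so the transcoding process is CPU-capped while the web server is untouched:

```elixir
defmodule CgroupExec do
  @moduledoc """
  Hypothetical helper: wraps a command in a transient systemd scope so
  the kernel caps its CPU via cgroups. Sketch only -- requires systemd.
  """

  # Builds the argv for `systemd-run --scope`, capping CPU at `quota`
  # percent (50 means half of one core).
  def wrap(cmd, args, quota) do
    {"systemd-run",
     ["--scope", "--quiet", "-p", "CPUQuota=#{quota}%", cmd | args]}
  end

  # Runs the wrapped command synchronously; the ffmpeg flags used by a
  # caller would be whatever the job needs.
  def run(cmd, args, quota) do
    {exe, argv} = wrap(cmd, args, quota)
    System.cmd(exe, argv, stderr_to_stdout: true)
  end
end

# Example: a CPU-capped thumbnail job (argv only, not executed here).
{exe, argv} = CgroupExec.wrap("ffmpeg", ["-i", "in.mp4", "out.jpg"], 50)
```

Memory can be capped the same way with `-p MemoryMax=...`, which covers the "web server always stays up" guarantee for runaway jobs.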
That said, I can see that doing auto-scaling plus cgroups can get complex, it does not address all the use cases FLAME covers, and the developer experience and tooling will probably be much better with FLAME. I just want to hear others’ thoughts on such approaches, in case I am missing some details.
Thinking about cgroups and FLAME, I guess they can work together too. We could create a FLAME backend adapter that starts applications locally with safety guarantees (CPU/memory limits) set using cgroups. And it could be made to work with or without Erlang distribution.
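A rough sketch of what that adapter could look like. The callback names follow the `FLAME.Backend` behaviour (`init/1`, `remote_boot/1`, `remote_spawn_monitor/2`, `system_shutdown/0`); check the flame docs for the current contract, and everything else here is hypothetical and unimplemented:

```elixir
defmodule MyApp.CgroupBackend do
  # @behaviour FLAME.Backend  # uncomment once the :flame dep is present

  # Keep pool options (e.g. a hypothetical :cpu_quota) as backend state.
  def init(opts), do: {:ok, Map.new(opts)}

  # A real implementation would boot a local copy of the app inside a
  # cgroup, e.g. via:
  #   systemd-run --scope -p CPUQuota=50% -p MemoryMax=1G elixir ...
  # and return the pid of the runner's terminator process.
  def remote_boot(_state), do: {:error, :not_implemented}

  # With Erlang distribution this is roughly a spawn_monitor on the
  # child node; without it, calls would need to go over a pipe/socket.
  def remote_spawn_monitor(_state, _func), do: {:error, :not_implemented}

  def system_shutdown, do: System.stop()
end
```

The interesting part is that the "remote" here is just another OS process on the same machine, so the cgroup limits give you the FLAME safety story without any extra hardware.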
I appreciate the tag, but if you check the contributions you’ll note that FLAME is almost entirely Chris; as of this morning the rest of us all have one commit each.
Firstly, this is cool as hell, congrats to everyone that worked on it.
Secondly, I have a problem that this looks like it might help with; however, I have a stumbling block.
I have an application where I want to call bits of it remotely, the way FLAME seems to be designed for, but the method is orchestrated via an Elixir script file. It acts like a DSL for the whole thing.
I’d also need to customise the image that actually runs remotely; say I wanted one FLAME job to run on Alpine and another on Ubuntu.
If File.stream! returns a struct with a :path key containing the path to the file on the local machine, how can Enum.into(parent_stream, flame_stream) start a stream and read from that file on the remote machine?
Not to be confused with a small project I’ve had on Hex for 7 years already called “Flames”, which is a sort of simplistic version of an error aggregation service.
This project reminded me that it was time to publish my liveview rewrite I’ve been running off a branch for the past year.
File IO on the BEAM is process based. Streaming here will Just Work™ and chunk 2048 bytes at a time. It’s just part of the BEAM and Elixir’s File interface, so you’ll need to go spelunking if you want the impl details
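You can see the laziness and the fixed-size chunking locally (the temp filename below is just for illustration):

```elixir
# File.stream! returns a lazy %File.Stream{} holding the path; nothing
# is read until the stream is enumerated -- which is why the struct can
# be shipped inside a FLAME closure and consumed on the other side.
path = Path.join(System.tmp_dir!(), "flame_stream_demo.bin")
File.write!(path, :binary.copy(<<0>>, 5_000))

stream = File.stream!(path, [], 2048)
%File.Stream{path: ^path, line_or_bytes: 2048} = stream

# Enumerating reads the file in fixed-size binary chunks.
chunk_sizes = Enum.map(stream, &byte_size/1)
IO.inspect(chunk_sizes) # => [2048, 2048, 904]
File.rm!(path)
```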
It really blows your mind just how many ridiculous things we get for free like this
I’d be interested to see a Membrane video + AI + Fly + FLAME demo app
if I am reading this correctly, also interested in how existing monoliths could FLAME out expensive services like video or AI
some folks with bandwidth and commercial skin in the game would benefit the Elixir Phoenix ecosystem if they could do a side-by-side $ comparison of FLAME fly vs Lambda AWS etc
with tech landscape facing budget constraints right now, focus on ROI with FLAME
I really like the just-in-time or on demand or drop in Lambda, the ability to make your monolith fragment into ephemeral microservices
certainly another genuine game changer from @chrismccord
Cost benefits will come back to the flag-fall metering model for on-demand workloads vs a commitment tier.
At the very low end it may be beneficial, but at some point you will be better off with a commitment tier, as that always provides the largest discount: cloud service providers are guaranteed ROI when you commit. There may be some benefit to using demand-based capacity between increments in the number of nodes on a commitment tier, and to handle spikes that exceed any reserve capacity in your commitments.
Where FLAME is different is that you can delegate specialised work to a pool that scales to zero without a lot of complexity so your primary nodes continue serving the non-specialised work.
This also affords targeting work to run on specialised VMs, with very little change to the code to achieve a very different architecture.
You’ll need to consider model load time on cold start, which could be 10-20s to get things loaded into memory for sizable models. Any kind of dynamically provisioned ML setup pays this price, but it’s something to keep in mind. You’ll also want to bake the model/XLA/Bumblebee artifacts into your build step for as long as is feasible, so all the cached artifacts are in the Docker container without needing to be pulled from elsewhere.
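One way to do that baking, as a hedged sketch: a warm-up script run during `docker build` (e.g. `RUN mix run priv/warmup.exs`) with `BUMBLEBEE_CACHE_DIR` pointed inside the image, so the weights land in an image layer. The script path and model repo below are just example choices:

```elixir
# priv/warmup.exs (hypothetical) -- executed at image build time, after
# deps are compiled, with e.g.:
#   ENV BUMBLEBEE_CACHE_DIR=/app/.bumblebee
#   RUN mix run priv/warmup.exs
# so a FLAME cold start only pays the load-into-memory cost, not a
# network download.
repo = {:hf, "openai/whisper-tiny"}
{:ok, _model_info} = Bumblebee.load_model(repo)
{:ok, _featurizer} = Bumblebee.load_featurizer(repo)
```

This is a build-time deployment fragment, not something to run in tests; it needs network access the first time and the `:bumblebee` dep.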
Just out of curiosity: did you manage to run a 7B model (like Mistral) via Bumblebee right within a phx application and chat with it? I mean, “just like” a dependency? I am still a bit unsure whether this stuff will “just work” when deploying to Fly as usual.