I saw with the initial announcement that, on Fly at least, a cold start is ~3 seconds, which is pretty good!
I’m wondering if there’s anything we can do as a community or individually to get that as low as possible. To that end, I have a few questions I hope people more knowledgeable than me can answer:
Is that startup time more a Fly machines thing, or more due to Elixir’s startup time?
Are there startup optimisations for Elixir that haven’t been worked on yet because it’s been more than good enough so far?
What about changes to our individual setups, is there anything in configuration of our apps or beam settings that might lower startup times?
I assume most of the cost in the Fly example was Fly booting the machine with the container in the same DC from cold.
You are going to struggle to get faster than that for a freshly provisioned machine.
As for booting the application, FLAME exposes whether you are a child or the parent, and if you are the child you can opt out of booting components of your application, as shown in the example code.
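Concretely, that opt-out follows the pattern from the FLAME README; here's a minimal sketch assuming a typical generated Phoenix app (the `MyApp` module names and pool numbers are placeholders):

```elixir
# In MyApp.Application.start/2: skip the web endpoint (and anything else
# only the parent needs) when this node was started as a FLAME child.
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    flame_parent = FLAME.Parent.get()

    children =
      [
        MyApp.Repo,
        {FLAME.Pool, name: MyApp.Runner, min: 0, max: 10, max_concurrency: 20},
        # Only boot the endpoint on the parent node
        !flame_parent && MyAppWeb.Endpoint
      ]
      |> Enum.filter(& &1)

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```

`FLAME.Parent.get/0` returns `nil` on the parent, so the `Enum.filter` drops the falsy entries from the child list on runner nodes.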
I have personally never encountered slow startup times with the BEAM.
Better to say that it meets your tolerances - and those of many developers and organizations. But let’s not discuss it as if there’s no room for improvement, or for criticism, or that “slow startup” is not a matter of perspective. Dismissing the OP’s concern like that isn’t cool IMO.
Elixir’s a VM-based language with a runtime that takes measurable time to start, which is why we’re still suboptimal for things like ad-hoc scripting and CLI applications, and wouldn’t have been a natural choice for actual Lambdas either. The BEAM is probably competitive with something like Python or Node for cold-start time, but probably not quite at the level something like Golang or Rust + Tokio is capable of. It’s a little better than the non-Graal JVM, IIRC.
Anecdata on my 2023 16" M2 Max:
```shell
hyperfine --warmup 2 -S none --export-markdown results.md \
  "elixir -e 'System.halt(0)'" \
  "elixir -e 'IO.puts(\"Hello world\")'"
```

| Command | Mean [ms] | Relative |
|:---|---:|---:|
| `elixir -e 'System.halt(0)'` | 95.7 ± 0.9 | 1.00 |
| `elixir -e 'IO.puts("Hello world")'` | 97.2 ± 1.0 | 1.02 ± 0.01 |
That’s not loading an application or starting an Ecto pool or anything a lot of projects would need to be actually useful, just the most trivial things the runtime is capable of. So IMO it’s not a great floor to start from.
This Rust example starts a Tokio runtime (with a WAY smaller scope than OTP, so not particularly apples-to-apples in that regard), then does a TCP dial and writes to the conn before shutting down:
```shell
gh repo clone tokio-rs/tokio
cd tokio
cargo build --release --example hello_world
ncat -kl 6142
# in a new shell:
hyperfine --warmup 3 -S none --export-markdown rust-results.md \
  './target/release/examples/hello_world'
```

| Command | Mean [ms] |
|:---|---:|
| `./target/release/examples/hello_world` | 1.6 ± 0.2 |
On a platform like Fly or EC2 any of that is dwarfed by provisioning time, of course. And for “traditional” Elixir applications, no one much cares because the ratio between “time spent serving a purpose” and “time spent starting/restarting” is extreme.
Just as a side note, I didn’t dismiss anyone’s concern; I was giving a personal perspective.
Nothing to do with Elixir in this case. I haven’t measured BEAM release startup for a typical Phoenix app in a while, but it’s not a concern in the context of the original question. The three seconds is for Fly to pull the image, place it, connect it to our proxy, and for the child to `Node.connect` back to the parent. The release start is in there somewhere, but my complete unmeasured guess is less than 0.5s for a Phoenix app to be up serving traffic. So I don’t think anything will move the needle much here with any BEAM/Elixir optimizations.
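For anyone who wants to put a number on that guess, one rough way to measure it is to poll the endpoint from the moment the release launches. This is just a sketch — the release path and port are assumptions for a typical Phoenix deployment:

```shell
# Rough sketch: milliseconds from release launch to first successful HTTP
# response. Assumes a release at _build/prod/rel/my_app serving on port 4000.
start=$(date +%s%N)
_build/prod/rel/my_app/bin/my_app start &
until curl -sf http://localhost:4000/ > /dev/null; do sleep 0.01; done
echo "up in $(( ($(date +%s%N) - start) / 1000000 )) ms"
```

It measures through the full HTTP stack rather than just BEAM start, which is the number that actually matters for serving traffic.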
Having said all that, the current 3s baseline is not a concern for me. Faster is always better of course, and it’s possible the Fly team can work some optimizations on our side. For example, waking up idle (already created) machines is usually 500ms. But there is some nuance to comparing cold starts to typical FaaS. I know AWS also keeps their functions “hot” underneath, but it’s much less 1:1 for us. Runners in the pool will almost always be serving many concurrent operations (the pool’s `:max_concurrency`) vs thinking of them as cold starting to serve 1 user. Also, the way Elixir supervision trees start up guarantees that `:min` runners will be hot before the app starts serving traffic, so you aren’t bouncing servers and then hitting a thundering herd of cold starts. You can also use `min: 0` but still warm up a minimum number of runners in your sup tree, to get scale-to-zero with warm startup. The runners would idle down after boot if no traffic is active.
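As a concrete sketch of that pool configuration (the options shown are documented `FLAME.Pool` options; the pool name and numbers are illustrative):

```elixir
# In your supervision tree: a scale-to-zero pool that sheds idle runners.
# :min, :max, :max_concurrency, and :idle_shutdown_after are FLAME.Pool
# options; MyApp.Runner and the numbers here are placeholders.
{FLAME.Pool,
 name: MyApp.Runner,
 min: 0,
 max: 10,
 max_concurrency: 20,
 idle_shutdown_after: :timer.minutes(1)}
```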
The missing piece currently is that our pool growth strategy waits for the pool to be at capacity before growing, so cold starts can be hit as the pool grows. Next steps are to mitigate these cases with more sophisticated strategies like observing growth rate, percentage-based early starts, etc.
So I hope that helps give some insight. Faster is always better, but current baseline is not a concern for me, and I don’t think beam/elixir startup has much to move the needle on the FLAME startup side.
I accept that there is a different setup process in how workloads get spun up on Fly vs AWS.
Fly runs our workloads on Firecracker, a KVM-based VM technology that AWS developed, which “transmogrifies” containers into VMs.
Firecracker is awesome because it provides VM-level isolation that containers can’t, but the VM abstraction is also stripped down to allow incredibly fast boot times (e.g. no ACPI).
Fundamentally a Linux VM takes about 75ms to start on Firecracker.
In contrast, FreeBSD boots in 25ms, a saving of 50ms in start time.
Phoenix server startup time for releases is quite fast, but I haven’t timed it precisely either. It may be as high as 50-100ms, and I’d be interested if anyone has a precise measurement of the time taken to start Phoenix and listen on the socket.
So, accounting for VM start, OS boot, and BEAM/Phoenix start, well under 100ms should be achievable. I remember the Erlang Ling VM could boot on Xen and serve a request in about 50ms.
A 50ms target is certainly a possibility on FreeBSD, but not on Linux, whose boot time on Firecracker is 75ms before even starting the BEAM. Fly doesn’t support FreeBSD currently, so right now it’s more likely that integrating FLAME on AWS offers the path to much lower startup latency.
The 3+ second start time is almost all provisioning overhead, and a further opportunity for Fly to optimise. Certainly, scheduling work and copying images to the target instance add some unavoidable overhead with that approach.
I guess it’s a fly business decision for approaching this differently from a warm start service tier. I can imagine low latency startup for serving dynamic workloads is something people would pay for, but getting the balance right between a dynamic on demand service pricing vs commitment tier pricing is something that needs consideration also.
Yes, good insights. One approach the FlyBackend could take is creating the max pool machines on start, then idling them down to min. Then as the pool grows it wakes them up, which should be more like 500ms. Idled machines aren’t free, but they’re much less expensive than running machines. Still more expensive than not existing at all, so it’s a tradeoff.
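For anyone prototyping that in their own backend, idling and waking a machine are plain Machines API calls. A sketch — the app name, machine ID, and token env vars are placeholders you’d supply:

```shell
# Sketch: stop (idle) a Fly machine, then wake it later, via the Machines API.
# FLY_API_TOKEN, APP, and MACHINE_ID are placeholders.
curl -s -X POST -H "Authorization: Bearer $FLY_API_TOKEN" \
  "https://api.machines.dev/v1/apps/$APP/machines/$MACHINE_ID/stop"

curl -s -X POST -H "Authorization: Bearer $FLY_API_TOKEN" \
  "https://api.machines.dev/v1/apps/$APP/machines/$MACHINE_ID/start"
```

The ~500ms wake time quoted above is for the `start` call on an already-created machine, vs the full provision-from-cold path.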
The caveats here are that you risk orphaning idled machines, since the parent could go down without cleaning them up, and they’d also never clean themselves up if they aren’t running. The backend could try to handle cleanup the next time it starts by finding “old” runners, but then we’re in orchestration land and a lot more complexity. So I’m pretty happy with the tradeoffs we currently have: creating resources on demand and fresh, in a way that requires no coordination and no chance of orphaning.
Some ideas for folks if they want to experiment with their own fly backend
That sounds quite encouraging.
Even if Fly can’t kill off orphaned runners because it can’t know about internal Erlang process state, it sounds feasible that from Elixir land our cluster could discover this orphaned-runner situation itself and clean up orphaned idle machines with a bit of bookkeeping, perhaps leveraging Fly machine metadata and abstracting this concern as a cluster-wide “fly runner supervisor”.
yup exactly. Each app instance could also race cleanup because it doesn’t matter
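A rough sketch of what that bookkeeping could look like. Everything here is hypothetical: `FlyMachines.list/1` and `FlyMachines.destroy/1` stand in for a Machines API client, and the `"flame_parent"` metadata key is an invented convention for tagging runners with the node that created them:

```elixir
# Hypothetical periodic orphan sweep. Any app instance can run it; racing
# with other instances is harmless because destroying an already-destroyed
# machine is a no-op.
defmodule MyApp.RunnerJanitor do
  use GenServer

  @sweep_interval :timer.minutes(5)

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    Process.send_after(self(), :sweep, @sweep_interval)
    {:ok, opts}
  end

  @impl true
  def handle_info(:sweep, state) do
    # Nodes currently in the cluster are live parents; anything tagged with
    # a parent not in this set is presumed orphaned.
    live_parents = MapSet.new([node() | Node.list()], &to_string/1)

    for machine <- FlyMachines.list(metadata: "flame_parent"),
        machine.metadata["flame_parent"] not in live_parents do
      FlyMachines.destroy(machine.id)
    end

    Process.send_after(self(), :sweep, @sweep_interval)
    {:noreply, state}
  end
end
```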