Blog Post: How I use Erlang Hot Code Updates

One of the Erlang ecosystem’s spiciest nerd snipes is hot code updates, because the runtime can do them in ways that almost no other runtime can.

16 Likes

I mean this in the most supportive way: I wish this was more of a deep dive. It kind of feels like a “this is a cool feature, I promise!”

I have found HMR very useful in my experiments with desktop video game development in Elixir, mainly with exsync, although it had some problems. Those problems could’ve been solved with code_change/3 (kinda - because wx_object isn’t actually a gen_server, I’d need a bit more plumbing to get it to work, I think), but it’s something most people seem to consider not worth the effort, as far as I can tell.

Using r in IEx did definitely help with some of that. But there are a lot of things that need to be set on startup that recompiling doesn’t fix - things like window title, size, OpenGL attributes, etc. Think of games that require a restart for certain setting changes - one of the interesting aspects of this ecosystem is that the window and renderer can be supervised, crash, and restart. But there’s not really any writing I’ve found about these things.
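To make the "set on startup" problem concrete, here is a minimal sketch. It does not use wx at all: FakeWindow is a hypothetical stand-in GenServer, and the :my_game application name and :window_title key are made up for illustration. It only shows that a value read in init/1 stays stale until the supervised process is restarted.

```elixir
defmodule FakeWindow do
  # Hypothetical stand-in for a real window process; no wx involved.
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  def title, do: GenServer.call(__MODULE__, :title)

  @impl true
  def init(:ok) do
    # Read once at startup, like a window title or OpenGL attributes.
    {:ok, %{title: Application.get_env(:my_game, :window_title, "untitled")}}
  end

  @impl true
  def handle_call(:title, _from, state), do: {:reply, state.title, state}
end

{:ok, sup} = Supervisor.start_link([FakeWindow], strategy: :one_for_one)
before_title = FakeWindow.title()

# Changing the setting does nothing for the running process...
Application.put_env(:my_game, :window_title, "New Title")
still_old = FakeWindow.title()

# ...but restarting just that child re-runs init/1 and picks it up.
:ok = Supervisor.terminate_child(sup, FakeWindow)
{:ok, _pid} = Supervisor.restart_child(sup, FakeWindow)
after_title = FakeWindow.title()
```

The rest of the supervision tree keeps running while the one child is restarted, which is the "restart only the renderer" idea in miniature.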

Alternatively, since you mention Nerves, shouldn’t it be possible to use code_change/3 when you change the size of some static values (buffer size or whatever) that are set on init, so a change to them doesn’t need a C-c C-c; iex -S mix?
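A rough sketch of what that could look like, under some assumptions: BufferServer and its buffer_size field are hypothetical, and instead of a real release upgrade the callback is triggered by hand with :sys.suspend/1, :sys.change_code/4 and :sys.resume/1. In a proper release upgrade the release handler would drive this from an appup file.

```elixir
defmodule BufferServer do
  # Hypothetical server whose buffer size is normally fixed in init/1.
  use GenServer

  @buffer_size 8

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)
  def buffer_size(server), do: GenServer.call(server, :buffer_size)

  @impl true
  def init(:ok), do: {:ok, %{buffer_size: @buffer_size}}

  @impl true
  def handle_call(:buffer_size, _from, state), do: {:reply, state.buffer_size, state}

  # Called during a code change; `extra` carries the new value here.
  @impl true
  def code_change(_old_vsn, state, %{buffer_size: new_size}) do
    {:ok, %{state | buffer_size: new_size}}
  end

  def code_change(_old_vsn, state, _extra), do: {:ok, state}
end

{:ok, pid} = BufferServer.start_link()

# The process must be suspended before its state can be migrated.
:ok = :sys.suspend(pid)
:ok = :sys.change_code(pid, BufferServer, :undefined, %{buffer_size: 16})
:ok = :sys.resume(pid)

new_size = BufferServer.buffer_size(pid)
```

The process keeps its pid and mailbox throughout; only the state is rewritten, which is the part a plain recompile does not give you.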

I know this is a really niche view of things, and I appreciate any writing about this. I just want more! :slight_smile:

1 Like

Yeah, this seems like a can of worms.

If I’m not mistaken, a wx window is a separate runtime entity that the Erlang VM communicates with (at least this is how it is implemented in scenic), which limits the potential interactions by a lot. This kinda makes hot code reloading inapplicable to it, as the initialization arguments are passed when opening a new window.

1 Like

It’s definitely a can of worms :slight_smile: My point was more that there are complex cases where code_change/3 can do some impressive stuff, which is not really discussed much, as far as I know. I didn’t mean to get bogged down in wx stuff.

Mainly the point was, code_change/3 can be capable of pretty complex stuff, but those things are difficult, require careful design, and may not be worth it. But there isn’t much written about what to do when it is worth it. Nerves is a good example of when it may possibly be worth the effort.

One of the issues, I think, is that things are pretty speedy, so killing and re-running is quick enough, generally.

The only upside of hot code reloads that my brain could think of was the evolution of a DB schema without having to do the whole N steps dance for it to be truly safe in the classic deployment scenarios.

But when I sat down and tried to sketch a solution, it turned out the steps were just 1 or 2 fewer and still mandated code that seemed more complex.

Not to rain on anybody’s parade obviously, if you have a truly useful and money- and time-saving scenario for it, I’d like to hear it!

But in 8.5 years with Elixir I have not found a compelling use-case for it.

1 Like

I’ve never used it in production systems either; however, the fact that you can deploy to a system in less than a second in most cases seems like a compelling enough argument, especially when we do CD by default these days.

I think what makes it complex is the number of variables involved. I would be more than happy to do hot-code deploys where the beam files are replaced and the application tree is restarted; it is many times faster than replacing entire containers/instances.

I agree, it’s just that in a world of containers and blue-green deployments this is also not as compelling as it was 10+ years ago. And Phoenix’s team took special care to handle draining of connections on shutdown, so Phoenix seems more or less perfect for blue-green deployments.

Agreed as well, if hot code reloading was somehow simpler I’d give it an honest chance. But when I last researched it 2-3 years ago it just looked way too involved given the alternatives.

I could be missing out and not realizing it, but it’s also not one of the things that seems so crushingly valuable. If it was, we would have very visible blog posts and videos about it. So I made a half-informed decision not to pursue it.

2 Likes

Sorry to half-derail your thread @lawik. On topic and after reading your blog post I’ll join @harrisi in saying that I wish it was a bit deeper. I trust your assessment that it’s useful for you but as mentioned above, I never worked on projects where it seemed that it could add visible value. Thanks for writing this up, I’m looking forward to a future, more detailed blog post.

1 Like

I think one of the things is that, for server-side work, there are better options (such as blue-green, rolling updates, etc.) which don’t impact users. However, for client-side applications (a game, in my case, or scenic, or nerves), it seems like there’s definitely opportunity for situations where you want to have live updates. LiveView Native is another instance, potentially, where it could be neat.

I think the biggest barrier for either case, though, is that it’s usually not a big deal to just have a popup that says “update now?” and restart the app, which takes a second or two. The “excellent” behavior would be to update without user interaction, but that’s not necessarily even correct, because most uses should allow the user to decide to update. Or, it can update when no users are using it overnight, or whatever.

My point, I guess, is it’s an extraordinarily unique scenario where things make sense. I still haven’t figured out the unique set of circumstances where it does, even, as much as I’ve tried. Currently, here’s my list:

  • client-side
  • updates don’t impact work
  • a stop-everything-to-update scenario is unacceptable, or at least not desired
    • maybe because the initial startup is prohibitively expensive, but you can do it in the background?
  • or, it’s just neat

Again, Nerves development (specifically not actual user usage), and game development and potentially game playing. Weird instances.

2 Likes

I use hot code loading a lot in a monorepo with a baseline app and client-specific isolated features.

There are a few rules:

  • Client features depend on the baseline app API with tight but unidirectional coupling, so they can live in the same codebase and be easily tested.
  • The base app can never call client code and has no knowledge of it.
  • Two client features cannot have knowledge of each other.
  • Of course, client A code has no knowledge of client B code either.

So in dev I can load some, or all client specific features. In testing too, everything stays in sync because it is just a single codebase.

In CI, after the tests, both the client namespace & tests are wiped before the release gets built.

Then the specific features are injected into specific instances and persisted to be reloaded at startup in case of an app restart. There is some plumbing that allows removing a feature, upgrading it, making it available to the end user, or hiding it.
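The post doesn't show the plumbing, so the following is only a guess at its general shape, not the actual implementation. FeatureLoader and ClientA.Greeting are hypothetical names; the sketch assumes features arrive as compiled bytecode and are loaded with :code.load_binary/3, then removed again with :code.delete/1 and :code.purge/1.

```elixir
defmodule FeatureLoader do
  # Hypothetical helper: installs and removes feature modules at runtime.

  def install(module, beam) when is_atom(module) and is_binary(beam) do
    {:module, ^module} = :code.load_binary(module, ~c"#{module}.beam", beam)
    :ok
  end

  def uninstall(module) when is_atom(module) do
    :code.purge(module)   # drop any leftover old version first
    :code.delete(module)  # mark the current version as old
    :code.purge(module)   # and remove it too
    :ok
  end

  def installed?(module), do: :code.is_loaded(module) != false
end

# Stand-in for a client feature that was compiled elsewhere (e.g. in CI).
source = """
defmodule ClientA.Greeting do
  def hello, do: "hello from client A"
end
"""

# compile_string/1 also loads the module, so unload it to simulate
# an instance that has never seen this feature.
[{feature, beam}] = Code.compile_string(source)
FeatureLoader.uninstall(feature)
absent = FeatureLoader.installed?(feature)

:ok = FeatureLoader.install(feature, beam)
present = FeatureLoader.installed?(feature)
greeting = feature.hello()
```

Persisting the bytecode so it can be reloaded after a restart, and deciding which instance gets which feature, would sit on top of something like this.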

What I like with this setup is that I keep it simple but client specific code never makes it into the release. Also, a breaking change in the main app instantly pops up by breaking the unit tests of client features since it’s just a single codebase with parts that happen to be selectively loaded later.

Of course there are a few specifics, like how specific parts are then delivered to a live system, but after reading F. Cesarini’s Erlang books this kind of thing did not seem too weird anymore to me.

I’m at a very small scale and every client runs a separate instance, so what works here for me is not universal. I use hot code loading a bit like a feature flag system built into the runtime, but the features are not idle, they just aren’t there to begin with.

3 Likes

Happy to read about people actually using hot code re/loading in the article’s comments on HN.

4 Likes

Me too. But I didn’t have deeper things to say that weren’t covered by Learn You Some Erlang and the AppSignal guide.

Best part of this post is what it lured out in the hacker news comments. Love when a Discord dev pops up.

Nice to see a bunch of people using it.

If you want a deeper look at it, Bryan Hunter’s first episode on BEAM Radio, “Punking the Servers”, and his 2023 talk at GigCity Elixir were both fascinating. Not code-level typically but a fair amount of detail. I think the HCA talk at Code BEAM NYC was a deeper look but it isn’t out yet.

4 Likes

We used hot code reloading in production occasionally at my previous position. We only used it when things were dire, because we were nervous something bad might happen, but it never did. It worked every single time.

It was mainly useful if something went out slightly broken, a missing where clause or a faulty comparison, a team member needed a tweak in validation to finish onboarding a customer, or if we just desperately needed an extra log message. Or maybe the CI job was not working or Kafka was looping on a single malformed event and crashing, causing a backup.

Check out the current tag, modify that module, run a script that would upload that file to each k8s pod and run a Code.compile on it. Worked every time, just takes a couple seconds. Then get code committed and the next deploy would have that code. We never had to modify state.
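For anyone who hasn't seen this in action, here is a small local sketch of the "compile the modified file into the running VM" step. It assumes Code.compile_file/1 is the kind of call meant above; HotPatchExample and its off-by-one bug are invented for illustration, and the upload-to-each-pod part is left out entirely.

```elixir
# Recompiling an already loaded module is intentional here, so silence the warning.
Code.put_compiler_option(:ignore_module_conflict, true)

path = Path.join(System.tmp_dir!(), "hot_patch_example.ex")

# The version that "went out slightly broken": a faulty comparison.
File.write!(path, """
defmodule HotPatchExample do
  def adult?(age), do: age > 18
end
""")

[{mod, _bytecode}] = Code.compile_file(path)
before_patch = mod.adult?(18)

# Fix the module on disk, then compile it into the same running VM.
File.write!(path, """
defmodule HotPatchExample do
  def adult?(age), do: age >= 18
end
""")

[{^mod, _bytecode}] = Code.compile_file(path)
after_patch = mod.adult?(18)

File.rm!(path)
```

Nothing is restarted: the next call to the module simply runs the new version. Because no process state changes shape, there is no need for code_change/3, which matches the "we never had to modify state" experience.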

4 Likes