Elixir Crashes in Virtualbox with monotonic time message

I’m running Elixir/Phoenix in a VirtualBox instance. After a short period of time (typically 5 minutes or so), Elixir quits with this output:

Monotonic time stepped backwards!
Previous time: 2140883699436
Current time:  2140883692221
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
[os_mon] memory supervisor port (memsup): Erlang has closed
zsh: IOT instruction (core dumped)  mix phx.server

I can’t find any sort of erlang crash log. What should I do to troubleshoot this crash? At the time of the crash I have plenty of memory and swap available.

VM OS: Lubuntu 22.04.1 LTS
VM Memory: 2GB
VM Swap: 2GB
Elixir: 1.14.0
Erlang: 25.0.4

I think you might need to report this upstream to the Erlang issue tracker. Monotonic time going backwards is definitely a case for shutting down! At a quick glance I didn’t see this issue already reported.

Are you able to try it in a different virtual environment to rule out this being a virtualbox time issue rather than an Erlang VM issue?
Based on this article

  • Erlang monotonic time. This is Erlang’s view of the OS monotonic time if available, or the VM’s own monotonic version of the system time if not available. This is the adjusted clock that is used for events, timers, and so on. Its stability makes it ideal to count a time interval. Do note that this time is monotonic, but not strictly monotonic, meaning that the clock can’t go backwards, but it can return the same value many times!

I don’t know if it is possible for VB to report a monotonic time that goes backwards, but if it can, then the VM that is setting its own monotonic time based on that “OS” monotonic time could crash if the time is not actually monotonic.

1 Like

When searching for virtualbox and monotonic time you find some issues. For example https://www.virtualbox.org/ticket/7915

It’s a very old issue. Although there might still be issues around?

2 Likes

Switching VM’s would be a pretty big pain. I tried exporting my existing setup to VMWare but couldn’t get the image to work, and really don’t want to build this environment back up again.

I think you’re right that this is a VM issue, not Erlang. I’m guessing there isn’t a way to make Erlang more fault-tolerant to this issue? I’m not experiencing any other issues in my environment with other tooling or apps.

One strange issue is I’ve found this doesn’t affect all Elixir instances. I have my server running and fired up my tests. The tests crashed during compilation due to the time, but the server kept on running no problem. Then I reran the tests and they compiled and ran ok, but the server crashed shortly afterwards. Guessing Elixir/Erlang have to be actively working on something to detect the time error?

Did you look at the Time Warp flags in that post I linked about Erlang time? In particular this seems maybe applicable:

+C single_time_warp This is a special hybrid mode to be used on embedded hardware when you know Erlang boots before the OS clock is synchronized. It works in two phases:

  1. a.(with +c true) When the system boots, the monotonic clock is kept as stable as possible, but no system time adjustments are made
    b. (with +c false) Same as no_time_warp
  2. The user calls erlang:system_flag(time_offset, finalize), the Erlang system time warps once to match the OS system time, and then the clocks become equivalent to those under no_time_warp.
2 Likes

Aah, I’m sorry. I read your comment and quote but didn’t follow the link. That sounds exactly like what I want. I have it running right now and no crashes so far.

To summarize what I’m trying, it’s elixir --erl "+c false" -S mix phx.server. The description sounds like exactly what I want:

If the OS system time jumps backwards, the Erlang Monotonic clock stalls until the OS system time jumps back forward, which can take a while.

Erlang’s clock stalling for a bit is much better (in my dev environment) than the whole thing crashing.

Thank you! I’ll report back here if it ends up dying eventually, but it looks promising.

2 Likes

Just curious if the fix was persistent. If so, I’m wondering if someone smarter than me can explain what could be the potential pitfalls of having the BEAM clock stall at random times as dictated by the OS clock. I’m guessing log messages might end up with incorrect time stamps that could make debugging harder. Staying synchronous across multiple nodes I think might be another that is potentially only relevant for certain types of applications. Anything else?

Yeah, still good. It looks like this solved it (usually it would have crashed a hundred times by now).

I’m curious about the implications as well. The clock rolling back is probably a huge red flag normally. I wonder what happens internally in Erlang’s VM when the clock is paused waiting for a successful forward tick.

1 Like

Might be worth reporting to VirtualBox as a bug, assuming expected behavior for monotonic time in VB is similar to BEAM with no backwards ticks allowed.

At its core the BEAM is a preemptive multi-threaded system. So if the clock is funky then the scheduling will not be fair. If the clock warp too much then I guess some GenServer calls could even timeout.

3 Likes