Guardian generating an expired token because System.system_time != DateTime. Is this time warp?

adw632 · September 12, 2023, 3:12am

Things did change in OTP 26 with time warping as described here.

If you want time that is immune to warping, then you possibly want Erlang monotonic time. If your system is able to record the time of token issuance, it can measure the relative time between token issuance and the current instant and forget about the wall clock.
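A minimal sketch of that idea, assuming the issuer and the checker run inside the same VM instance (monotonic time has a locally defined start, so it cannot be shared with clients and does not survive a restart). The module and function names are hypothetical and not part of Guardian:

```elixir
defmodule ElapsedSketch do
  # Hypothetical helper for illustration only.
  # Records issuance on the Erlang monotonic clock instead of the wall clock.
  def issue(ttl_ms) when is_integer(ttl_ms) and ttl_ms >= 0 do
    %{issued_at: System.monotonic_time(:millisecond), ttl_ms: ttl_ms}
  end

  # Expiry is decided purely from elapsed monotonic time,
  # so changes to system time do not affect the result.
  def expired?(%{issued_at: issued_at, ttl_ms: ttl_ms}) do
    System.monotonic_time(:millisecond) - issued_at >= ttl_ms
  end
end
```

This only helps for internal deadlines; anything a client has to interpret still needs a wall clock timestamp.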

You should also understand the time warp mode you are using.

There is no absolute time in the universe, so agreeing on the time will always be difficult. Overlay politically defined timezone rules on wall clock time and another layer of hell is introduced.

There is a good discussion of the monotonic time system in the first link I provided, which I have extracted here:

New Erlang Monotonic Time

Erlang monotonic time as such is new as from ERTS 7.0. It is introduced to detach time measurements, such as elapsed time from calendar time. In many use cases there is a need to measure elapsed time or specify a time relative to another point in time without the need to know the involved times in UTC or any other globally defined time scale. By introducing a time scale with a local definition of where it starts, time that do not concern calendar time can be managed on that time scale. Erlang monotonic time uses such a time scale with a locally defined start.

The introduction of Erlang monotonic time allows us to adjust the two Erlang times (Erlang monotonic time and Erlang system time) separately. By doing this, the accuracy of elapsed time does not have to suffer just because the system time happened to be wrong at some point in time. Separate adjustments of the two times are only performed in the time warp modes, and only fully separated in the multi-time warp mode. All other modes than the multi-time warp mode are for backward compatibility reasons. When using these modes, the accuracy of Erlang monotonic time suffer, as the adjustments of Erlang monotonic time in these modes are more or less tied to Erlang system time.

The adjustment of system time could have been made smoother than using a time warp approach, but we think that would be a bad choice. As we can express and measure time that is not connected to calendar time by the use of Erlang monotonic time, it is better to expose the change in Erlang system time immediately. This as the Erlang applications executing on the system can react on the change in system time as soon as possible. This is also more or less exactly how most operating systems handle this (OS monotonic time and OS system time). By adjusting system time smoothly, we would just hide the fact that system time changed and make it harder for the Erlang applications to react to the change in a sensible way.

pejrich · September 12, 2023, 5:52am

Yes, I understand that two things cannot ever truly be synced regarding time, but there is still a major difference between (say the “true” time is 14:14:14) computer A saying 14:14:15 and computer B saying 14:14:16, versus computer A saying 14:14:15 and computer B saying 23:48:12. In both cases they are out of sync with each other and with the real time, but that doesn’t mean the two cases are equal. I’m not suggesting Guardian needs to be within a millionth of a second of the atomic clock, but I’m not sure why, for a publicly facing timestamp, one would elect to use a time that is, at least sometimes, many hours away from any notion of “the correct time”.

adw632 · September 12, 2023, 6:03am

If you are at the mercy of the wall clock for determining an elapsed time, or an agreed future time at which a token is considered valid or expired across different systems, then you need to make sure your systems’ wall clocks are synchronised using a reliable time source, and ideally the same time source.

Check timezone configuration. Are they all in the same timezone or in different zones? Is the timezone respected when parsing times?

If it’s not timezone and the operating systems disagree on their version of UTC time then it could be NTP. This could be network related, system configuration related, or even DNS related. Check system logs for NTP errors.

However if the operating systems are in agreement on UTC time and the Erlang VM is in disagreement then check the time warp configuration.
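One way to check this from a running node is sketched below. It relies on `:erlang.system_info/1` with the `:time_warp_mode`, `:time_correction` and `:time_offset` items, plus a direct comparison of `:erlang.system_time/1` against `:os.system_time/1`. The module name and the shape of the returned map are assumptions made for illustration:

```elixir
defmodule ClockCheck do
  # Hypothetical diagnostic helper, not part of any library.
  def report do
    os_ms = :os.system_time(:millisecond)
    vm_ms = :erlang.system_time(:millisecond)

    %{
      # :no_time_warp | :single_time_warp | :multi_time_warp
      time_warp_mode: :erlang.system_info(:time_warp_mode),
      # true when time correction is enabled
      time_correction: :erlang.system_info(:time_correction),
      # :preliminary | :final | :volatile
      time_offset_state: :erlang.system_info(:time_offset),
      # How far Erlang system time is from OS system time right now
      vm_minus_os_ms: vm_ms - os_ms
    }
  end
end

IO.inspect(ClockCheck.report())
```

A large `vm_minus_os_ms` alongside `:no_time_warp` would be consistent with the drift described in this thread. The mode itself is chosen at VM start with the `+C` emulator flag, for example `elixir --erl "+C multi_time_warp"`.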

Excluding any coding errors, it could be a possible bug in the BEAM on the architecture you’re running on, but I would eliminate all other possible sources of error first.

pejrich · September 12, 2023, 6:15am

That may all very well be true. I’m not stating that time isn’t complicated, and it’s possible I have the wrong time warp settings, but I’ve also never set or changed any time warp settings, so if they’re wrong for me, they’re likely wrong for many people. But I have on many occasions noticed large time drifts with system_time that do not occur in os_time. My argument isn’t about whether or not system_time is a worthy timestamp under any circumstances, but merely: why is it considered better here? To me it’s like we have an API returning color names, and our only options for “black” are “blacg” and “eriirijf”, and the argument seems to be: they’re both wrong, so let’s go with “eriirijf”. Under what circumstances is os_time a worse choice for a publicly facing timestamp than system_time? Because while I understand my evidence is anecdotal, I’ve seen system_time drift from “the generally agreed-upon real time” (however fluid that definition is) by many, many hours, on dozens of occasions (this is running OTP 25, so not exactly ancient). I’ve yet to ever see os_time not be “correct”, at least on the scale of “within a minute”. system_time is sometimes not even “within half a day”. I’m not trying to time particle collisions at the Large Hadron Collider here, but it would be nice if my 1hr TTL token didn’t cause issues because it was already expired when it was issued, as far as the client is concerned.

adw632 · September 12, 2023, 7:42am

I don’t think you read the resources I provided, as they explain why certain things need to be done.

You can’t actually make time jump backwards in a system, as everything will break. You can only warp time forward instantly (which can still cause timeouts to fire prematurely and break lots of stuff, but at least you don’t get negative numbers!) or, when system time is ahead of the OS time, slow the advancement of time until the OS time catches up with the system time (jumping back instantly and reversing time creates negative relative time and will generally break all we hold true about time moving in one direction).

Erlang needs to provide both a frequency-stable, low-variance precision time for accurate deadlines, timeouts and scheduling (elapsed time calculations) and a system time source / wall clock that smooths out any time warps by slewing time. Not all runtimes even care; as a rule they don’t care much about latency, fair scheduling or time warps anyway.

The new default time handling only landed in OTP 26 (the time API it builds on dates back to OTP 18 / ERTS 7.0). It has a better approach to providing guarantees no matter what time warping happens, and can still provide precision clocks whilst smoothing the transition of the system time without taking a long time to get there.

Prior to OTP 26 this was the default behaviour:

No Time Warp Mode

The time offset is determined at runtime system start and does not change later. This is the same behavior as was default prior to OTP 26 (ERTS 14.0), and the only behavior prior to OTP 18 (ERTS 7.0).

As the time offset is not allowed to change, time correction must adjust the frequency of the Erlang monotonic clock to align Erlang system time with OS system time smoothly. A significant downside of this approach is that we on purpose will use a faulty frequency on the Erlang monotonic clock if adjustments are needed. This error can be as large as 1%. This error will show up in all time measurements in the runtime system.

If time correction is not enabled, Erlang monotonic time freezes when OS system time leaps backwards. The freeze of monotonic time continues until OS system time catches up. The freeze can continue for a long time. When OS system time leaps forwards, Erlang monotonic time also leaps forward.
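The relationship between the two clocks in the quoted text can be observed directly: Erlang system time is Erlang monotonic time plus the current time offset. The snippet below is only an illustration using `:erlang.monotonic_time/1`, `:erlang.time_offset/1` and `:erlang.system_time/1`. The module name is hypothetical, and the three reads are not atomic, so a small difference between them is expected:

```elixir
defmodule OffsetSketch do
  # Hypothetical helper for illustration only.
  # Returns the gap (in ms) between Erlang system time and
  # monotonic time + time offset, which should be close to zero.
  def gap_ms do
    mono = :erlang.monotonic_time(:millisecond)
    offset = :erlang.time_offset(:millisecond)
    sys = :erlang.system_time(:millisecond)
    sys - (mono + offset)
  end
end

IO.inspect(OffsetSketch.gap_ms(), label: "system - (monotonic + offset), ms")
```

In no time warp mode the offset is fixed at start, so any correction has to come from adjusting the monotonic clock frequency. In multi-time warp mode the offset itself can change instead.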

The new OTP 26 behavior using multi time warp mode improves the accuracy, performance, frequency stability and the time to synchronise the system wall clock after any time warps.

What also might be happening is the OS time has not fully synced before Erlang starts tracking the offset and frequency drift and locks it in.

Single time warp mode is used to solve a problem that often occurs in embedded systems that have no clock or battery and need to sync the time from an external source first:

If no time warp mode is used, and the Erlang runtime system is started before OS system time has been corrected, Erlang system time can be wrong for a long time, centuries or even longer.

If you need to use Erlang code that is not time warp safe, and you need to start the Erlang runtime system before OS system time has been corrected, you may want to use the single time warp mode.

Maybe your server is starting with some default time provided by the hardware or hypervisor, and it hasn’t synced its clocks before the Erlang VM starts, so the offset gets locked in as per above.

There are also some notorious problems with clock drift in hypervisors, which can mess with time synchronisation at both the OS level and in any runtime that is attempting to provide sane clocks.

Are you running on a cloud? Have you installed all hypervisor extensions and relevant drivers for the VM, and is NTP working?

D4no0 · September 12, 2023, 7:55am

I can confirm that after upgrading to OTP 26 the time warp now synchronizes on problematic devices like the Mac M1; you even get logs in the console confirming this. I guess this is the whole point of the discussion because, as I posted above, on x64 devices everything works as expected and there is no noticeable time warp.

Schultzer · September 13, 2023, 6:15pm

There is a lot of back and forth in this thread, so I’m just leaving this here for anyone in the future:

When it comes to authentication, a reliable timestamp is essential. One of the bigger user authentication libraries out there uses :os.system_time/1 (https://github.com/pow-auth/pow/blob/7c5775818b799d590d441d52510ae6deeaf6f074/lib/pow/store/credentials_cache.ex#L116).

As people have correctly pointed out :erlang.system_time can drift from the :os.system_time, and furthermore you want to make sure your operating system time is synced up with NTP for it to be reliable. I have no idea why someone would use :erlang.system_time for anything authentication related. It is highly discouraged to use an unreliable timestamp, it is a security risk!
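As a rough sketch of what that looks like for token expiry, assuming JWT-style `"iat"` and `"exp"` claims in seconds: the module and function names below are hypothetical, and this is neither Guardian’s nor Pow’s actual implementation.

```elixir
defmodule OsClockClaims do
  # Hypothetical helper for illustration only.
  # Builds issued-at / expiry claims from OS system time (seconds since epoch).
  def claims(ttl_seconds) when is_integer(ttl_seconds) and ttl_seconds > 0 do
    now = :os.system_time(:second)
    %{"iat" => now, "exp" => now + ttl_seconds}
  end

  # `now` can be passed in explicitly, which makes the check easy to test.
  def expired?(%{"exp" => exp}, now \\ :os.system_time(:second)) do
    now >= exp
  end
end
```

The timestamps are still only as good as the OS clock, so this assumes the host is kept in sync via NTP.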

adw632 · September 13, 2023, 9:24pm

I certainly agree that authentication needs to adopt an external time perspective when sharing timestamps in protocols. Ultimately we want UTC time there (or a local time with the offset from UTC).

For internal (elapsed time) use cases where the wall clock time is irrelevant, the Erlang system’s monotonic time is the safest. It is immune to all manner of poor system administration and configuration, NTP failures and manual abrupt setting of the OS time.

The reason Erlang system time exists is to compensate for poor system administration and configuration, or NTP failures/outages. There are cases where this will provide better resilience than accepting the OS time, and there are cases where slewing/smoothing time because of time warps will create problems.

That said, I do stand on the side of adopting the external world view of the time (the OS time) when we are dealing with sharing timestamps in protocols used between different systems, such as authentication protocols and audit logs. This also has a critical security prerequisite of a correctly configured operating system clock that is synchronised to a reliable and agreed stable time source, to ensure correct behavior. This is a typical security control in most security control frameworks.

D4no0 · September 13, 2023, 9:38pm

Could you elaborate on cases when this might happen? I’ve deployed many times on Linux, but never thought about it. Do distributions like Debian use some kind of NTP by default to synchronize the time? Nowadays pretty much 95% of servers have a connection to the internet.

adw632 · September 14, 2023, 2:17am

Yes, most distributions enable NTP by default, but it always pays to check.

It is only since OTP 26 that we got, by default, separation of the clock used for elapsed time / precision time requirements from the wall clock, and it relies on the new time API.

I would read the docs, which explain some of the system use cases they are defending against. These can include belligerent system administrators doing forced and manual time changes, systems without battery-backed clocks or with a flat/failed battery, or starting the Erlang VM before time is in a reasonable state (e.g. NTP may not be syncing yet).

D4no0 · September 14, 2023, 8:29am

Makes sense, thanks for shedding light on this use-case. When this thread was started I was baffled on why such a system exists in the first place.

adw632 · September 15, 2023, 1:31am

Yeah it’s a thing in Erlang because most runtimes are relatively ignorant and are not trying to provide quite the same guarantees around scheduling.

I think in this case the original time abstraction was simply insufficient, and maintaining that abstraction in the face of precision time and wall clock warping led to some slow syncing / unwanted consequences.

But things are good now with the new time API AFAICT.