This was posted in the #nerves-dev Slack channel (by @fhunleth) and I found it to be a fascinating read:
It looks like a protocol that could be very useful for an embedded system that needs accurate-enough time (especially for validating SSL certificates). It’s also a much more lightweight protocol than NTP.
Could anyone shed some light on how this can be used as a replacement for NTP?
In Roughtime or similar network time update implementation? Frank alludes to the possibility of replacing NTP with Roughtime. I’m not familiar with how it’s done in practice, but as far as I understand, a client uses the time obtained from a few NTP servers to set its system clock. In the case of Nerves, the client would be some device deployed in the field somewhere.
What I don’t understand is how important the monotonicity of the system clock (or lack thereof) is for a Nerves device.
First, the example code stores the current system time in t0. Then it sends a request to a Roughtime server, gets back the server’s time t1, and stores the difference between the server and client system times in delta. The delta is then used to define a new function, now, which is supposed to be used as a replacement for time.Now().
My concern is that delta is itself only a rough difference between the client’s time and the server’s time. In addition to the actual (objective) time difference, it also includes the time it took to send a request to the server, get back a response, and decode it. What if a subsequent request ends up being served faster? In that case, the client’s system clock may have to be shifted backwards.
How much of a problem is that in practice? Maybe that already happens routinely when using NTP, I just don’t know.
There are a lot of considerations, so I don’t want to say that replacing NTP is what everyone should do with Nerves. There’s some backstory on the issue:
The pain that brought this up is that many Nerves devices require a TLS connection to a server (AWS IoT, NervesHub, kiosk display webserver, etc.) to do anything. Unlike a web browser, a Nerves device can’t ask a user if it’s OK to accept an invalid cert. The most commonly reported failure here is that the Nerves device’s clock is set incorrectly and it’s outside of the server’s cert validity window.
Nerves devices are being built without battery-backed real-time clocks, so the devices start up at Jan 1, 1970. These devices experience a time warp with any time synchronization protocol (both to the OS system time and to Erlang’s system time, since Nerves’ default is to enable multi-time warp mode). Nerves and the nerves_time library can preemptively warp the time closer to the real time without network connectivity, but there’s usually still a little warp when NTP kicks in (it should be forward). Per your question about whether Nerves devices can deal with system time warps, the answer is that they need to already.
Back to TLS: the desire is to get that TLS connection working as soon as possible after boot. That problem turned into how quickly the device can get a trusted timestamp that’s within the server cert’s validity window. Depending on the network, NTP takes about 10 seconds or more to get enough samples to set the time. Relatedly, Nerves uses BusyBox’s NTP client, which doesn’t support authenticated time. All of this makes Roughtime’s quick authenticated timestamps interesting.
One last thing is that your point about how naively Roughtime determines time is good to know. I don’t think we’re giving up on NTP and I’m pretty sure that given enough time and thought NTP could address this as well. If Roughtime were only used for TLS server cert verification, that would suffice, but it’s also “good enough” for many Nerves applications.