Synchronizing time between two processes NOT within a cluster

Hello,

I am writing an app where I need to synchronize time between two processes running on two nodes that are not within a cluster (and cannot be, by design). I can’t use an NTP daemon, as I cannot assume anything about the host OS configuration. So basically I need something like NTP-per-process, or just a generic NTP client/server implementation that I can wrap. Any recommendations on how to do this properly the Erlang way, or on battle-proven, ready-to-use libraries?

The problem I need to resolve is that both apps “tick” every 10 ms and send data to each other. There’s a certain time window during which the receiver waits for the data. I have found that when using :timer.send_interval, the timer skews, the window moves, and the receiver starts dropping data because it considers it to have arrived too late.

2 Likes

What is the reason you are using timers here (especially with strict time windows)?

Attempting to pretend that the laws of nature can be broken (here: that communication between two nodes can always happen instantly) will result in an uphill battle when building algorithms.

If it is possible to use conflict-free replicated data types (CRDTs) for your application, I would definitely take a look at that instead.

Can you tell us more about your problem domain? There might be other approaches to circumvent the problems you’re having.

4 Likes

Yeah, 100 Hz real-time sync guarantees across a cloud network sounds like a problem I would want to avoid having.

3 Likes

Couldn’t you read the time after each tick, or after a number of ticks, and use that to adjust the interval length instead of relying on :timer.send_interval?
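For instance, here’s a minimal sketch of that idea, assuming a GenServer-based ticker and a fixed 10 ms nominal interval (the names are made up); each tick is scheduled relative to a monotonic start time so lateness never accumulates:

```elixir
defmodule Ticker do
  @moduledoc """
  Sketch of a drift-compensating 10 ms ticker: instead of a fixed
  :timer.send_interval, every tick is scheduled relative to a monotonic
  start time, so scheduling latency does not accumulate.
  """
  use GenServer

  @interval_ms 10

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(_opts) do
    start = System.monotonic_time(:millisecond)
    schedule_tick(start, 1)
    {:ok, %{start: start, tick: 1}}
  end

  @impl true
  def handle_info(:tick, %{start: start, tick: tick} = state) do
    # ... send the frame for this tick here ...
    schedule_tick(start, tick + 1)
    {:noreply, %{state | tick: tick + 1}}
  end

  # The delay for tick N is computed from the original start time, so any
  # lateness of the previous tick is absorbed instead of accumulating.
  defp schedule_tick(start, next_tick) do
    target = start + next_tick * @interval_ms
    delay = max(target - System.monotonic_time(:millisecond), 0)
    Process.send_after(self(), :tick, delay)
  end
end
```

The point is that the next delay is recomputed from the original start time on every tick, rather than blindly re-adding 10 ms.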

1 Like

You answered your own question in that you need to sync ticks, not time. Similar to how RTS games such as StarCraft work under the hood, being turn-based. Only, you won’t need the fixed-point deterministic simulation part. :slight_smile:

You need to weight your timeout window in order to adjust for changing network latency. So you just need the delta between the last two ticks.
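As a rough sketch of what that weighting could look like (the module name, smoothing factor and 2x jitter margin are just made up for illustration):

```elixir
defmodule ReceiveWindow do
  @moduledoc """
  Rough sketch: widen or narrow the "too late" window based on how much the
  delta between the last two tick arrivals deviates from the nominal 10 ms
  period (all times in milliseconds).
  """

  @nominal_ms 10
  # Hypothetical smoothing factor for the running jitter estimate.
  @alpha 0.2

  defstruct last_arrival: nil, jitter_ms: 0.0

  def new, do: %__MODULE__{}

  @doc "Call on every received tick with a monotonic timestamp in ms."
  def observe(%__MODULE__{last_arrival: nil} = win, now_ms),
    do: %{win | last_arrival: now_ms}

  def observe(%__MODULE__{last_arrival: prev, jitter_ms: jitter} = win, now_ms) do
    # Deviation of the inter-arrival delta from the nominal tick period.
    deviation = abs(now_ms - prev - @nominal_ms)
    %{win | last_arrival: now_ms, jitter_ms: (1 - @alpha) * jitter + @alpha * deviation}
  end

  @doc "How long to wait before declaring a frame late: nominal period plus a jitter margin."
  def timeout_ms(%__MODULE__{jitter_ms: jitter}), do: @nominal_ms + round(2 * jitter)
end
```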

3 Likes

I am writing a low-latency audio link. The jitter buffer at the receiver side has to know when packets are too late, and it waits for them for a certain, predictable amount of time. So it is a window that holds e.g. 10 frames and advances e.g. every 10 ms, if we use 10 ms audio frames. The sender is supposed to send a frame every 10 ms and the receiver is supposed to advance every 10 ms.

It loses sync after some time, even on a localhost link where there’s no network jitter. The receiver moves one step too far into the future and starts considering all incoming frames to be late. It seems that the clock at the receiver advances faster than the clock at the sender, especially since the clock at the sender is sound-card based (so it behaves differently from an Erlang VM-based clock) while the clock in the jitter buffer is Erlang VM-based. I started to wonder whether time sync, and sticking to one of the clocks, would be a way to overcome this. I now use a System.monotonic_time/0-based diff to compute the real time that has passed between ticks, and it still happens; unless there’s a bug in the code (and we believe there’s none after a few reviews), it’s just clock skew.

I’ll take a look at @Mandemus’ suggestion; it seems that gaming has similar issues.

2 Likes

That smells pretty much like you want to have a timecode in your frames: either a monotonically increasing number or, probably simpler, a straight hh:mm:ss:nnn (where “nnn” is the frame number) timestamp in 4 bytes. The receiver can then place frames exactly where it wants to and confidently drop frames that arrive too late.
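Packing such a 4-byte timecode is trivial with Elixir binaries; this is just one possible layout (one byte per field, which is enough for 10 ms frames since the per-second frame number stays below 100):

```elixir
# Sketch: pack an hh:mm:ss:nnn timecode into 4 bytes and read it back.
defmodule Timecode do
  def encode(hh, mm, ss, frame) when frame in 0..99,
    do: <<hh::8, mm::8, ss::8, frame::8>>

  def decode(<<hh::8, mm::8, ss::8, frame::8>>),
    do: {hh, mm, ss, frame}
end

# iex> Timecode.encode(12, 34, 56, 7)
# <<12, 34, 56, 7>>
```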

2 Likes

I already do this. The issue is deciding what “too late” means.

1 Like

In this scenario, do you have multiple senders, or a single one?

In any case, you can just work with an incremented number instead of an hh:mm:ss:ms clock here. It might be interesting to look up Lamport timestamps, even though it sounds like, in your system, you don’t need to create a total ordering of events.

Wouldn’t a system like this work?

  • The sender keeps track of a counter.

  • Every message that is sent contains the value of the counter. The counter is then incremented. Thus, all sent messages can be ordered by the receiver.

  • The receiver waits for any message whose counter is higher than the latest message it received. Messages with a lower counter are simply discarded.

You will now be able to reconstruct the audio signal at the receiver’s side. In case two messages arrive out of order, there will be a small jump in the audio.

If you want to reduce the number of packets that are discarded when they arrive out of order, you could add a small buffer and say that you wait N timesteps for messages between the previously observed counter and the highest currently observed counter to arrive, before advancing to the new highest observed value. In the best case, N messages will arrive during that time, regardless of the order they arrive in. In the worst case, some of these messages never arrive. But after these N timesteps the receiver will always re-sync itself with the current highest observed message counter, meaning that it can never get too far out of sync with the sender.
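If it helps, here is a rough Elixir sketch of that re-sync idea (the module name, API and the N value are invented for illustration):

```elixir
defmodule ResyncBuffer do
  @moduledoc """
  Sketch of the counter-based scheme above: the receiver tracks the highest
  counter seen so far, waits up to N ticks for gaps to fill, and then jumps
  forward to the highest observed counter.
  """

  # How many ticks to wait for out-of-order packets before jumping ahead.
  @n 3

  defstruct next: 1, highest: 0, waited: 0, pending: %{}

  def new, do: %__MODULE__{}

  @doc "Store an incoming {counter, frame}; counters we have already passed are discarded."
  def push(%__MODULE__{next: next} = buf, counter, _frame) when counter < next, do: buf

  def push(%__MODULE__{} = buf, counter, frame) do
    %{buf | highest: max(buf.highest, counter), pending: Map.put(buf.pending, counter, frame)}
  end

  @doc "Call once per tick; returns {frames_to_play, buffer}."
  def tick(%__MODULE__{next: next, pending: pending, waited: waited, highest: highest} = buf) do
    case Map.pop(pending, next) do
      {nil, _} when waited >= @n and highest > next ->
        # Gave up waiting: jump forward to the highest counter seen so far.
        tick(%{buf | next: highest, waited: 0})

      {nil, _} ->
        {[], %{buf | waited: waited + 1}}

      {frame, rest} ->
        {[frame], %{buf | next: next + 1, waited: 0, pending: rest}}
    end
  end
end
```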

1 Like

The low latency is an illusion, as you should be intentionally processing frames late. You never want to be hungry for data. That’s touched on in the article, where you are processing, say, simulation tick 3 while tick 5 has already been received. For audio you would probably want a bigger buffer than a few frames.

2 Likes

Open Sound Control is a protocol for network-based music systems. It aims to solve this exact problem. http://opensoundcontrol.org/introduction-osc

1 Like

If for some reason you really want to rely on a clock:

1 Like

OSC has two problems:

  • If I recall correctly, it is geared more towards sending control messages (similar to MIDI) than towards sending audio streams over the network.
  • From the bottom of the spec page:

An OSC server must have access to a representation of the correct current absolute time. OSC does not provide any mechanism for clock synchronization.

It then goes on to explain that the timestamps that OSC uses for its messages are exactly the format that is used in the NTP protocol.
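For reference, that NTP time tag is a 64-bit value: 32 bits of seconds since 1900-01-01 plus 32 bits of fractional seconds. A small sketch of building one from a Unix time in microseconds (the helper name is made up):

```elixir
defmodule NtpTimestamp do
  # Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
  @epoch_offset 2_208_988_800

  @doc "Encode a Unix time in microseconds as a 64-bit NTP-style timestamp."
  def encode(unix_microseconds) do
    seconds = div(unix_microseconds, 1_000_000) + @epoch_offset
    # Fraction of a second scaled to 2^32.
    fraction = div(rem(unix_microseconds, 1_000_000) * 0x1_0000_0000, 1_000_000)
    <<seconds::32, fraction::32>>
  end
end
```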

1 Like

Yes, you are correct, it’s definitely for control. I’m not sure about your exact use case, but if the local system could generate the sound, OSC might still be applicable.

1 Like

You buffer a couple of seconds, start your local playback timer, and then basically place received packets in their designated spot on the timeline based on both their timecode and the current state of the playback timer. “Too late” quite simply means: later than the current playback timer. That’s the easy bit; the hard bit is deciding when too many packets are too late and you need to pause and buffer more, etcetera :wink:
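A tiny sketch of that placement logic, assuming 10 ms frames, a monotonic clock value captured when playback starts, and a monotonically increasing frame number as the timecode (the names are illustrative):

```elixir
defmodule Playback do
  @frame_ms 10

  @doc "Index of the slot currently being played, given the playback start time in ms."
  def current_slot(started_at_ms) do
    div(System.monotonic_time(:millisecond) - started_at_ms, @frame_ms)
  end

  @doc "A frame is too late if its slot is already behind the playback position."
  def too_late?(frame_number, started_at_ms) do
    frame_number < current_slot(started_at_ms)
  end
end
```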

3 Likes