Writing a timer that is durable across a deployment or crash

jsmestad · November 20, 2020, 1:10am

I am wondering how you would structure a simple kitchen timer. For example, you set a timer for 15 minutes and need to count down to zero. When it reaches zero, you print “DONE” to the terminal.

The solution I have uses a GenServer to start the timer and then calls Process.send_after(self(), :times_up, 900000) (15 minutes in milliseconds), which works in the happy path.
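For reference, here is a minimal sketch of the happy-path timer described above. It assumes a plain GenServer whose only state is the timer reference, and the module name KitchenTimer is hypothetical. The countdown exists only as a pending message in the process mailbox, so a crash or redeploy loses it.

```elixir
defmodule KitchenTimer do
  # Hypothetical module name; a sketch of the non-durable, happy-path timer.
  use GenServer

  def start_link(ms), do: GenServer.start_link(__MODULE__, ms)

  @impl true
  def init(ms) do
    # Schedule a :times_up message to this process after `ms` milliseconds.
    # This pending message is the only record of the countdown.
    timer_ref = Process.send_after(self(), :times_up, ms)
    {:ok, timer_ref}
  end

  @impl true
  def handle_info(:times_up, _timer_ref) do
    IO.puts("DONE")
    {:stop, :normal, nil}
  end
end
```

Calling KitchenTimer.start_link(900_000) prints DONE after 15 minutes, but only if nothing restarts the process in the meantime. A restarted process would call init/1 again and start a fresh 15-minute countdown.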

My question is: how do you make this resilient to a crash or a restart? If the timer restarts back at 15 minutes, you burn the food. Would you have to lean on a database? What would you store so you don’t skip a beat when the timer recovers?

srowley · November 20, 2020, 1:25am

I’ll bite: if I had a kitchen timer that might randomly fail, I would just set as many timers as I thought were needed and ignore the ones that ring after the first one.

jsmestad · November 20, 2020, 1:27am

Yeah, it’s a contrived example, but it has the same core problem as a larger one I am facing.

In this case a false alarm is just as bad as an alarm that never arrives. These kitchen timers are started and stopped at the user’s request, so I need to build in some fault tolerance that lets me recover gracefully if something goes wrong.

cmo · November 20, 2020, 2:12am

If you don’t want to use a DB, you could write the expiration time to a file and poll it to check whether an alarm needs to be raised. Delete the file when the alarm is raised.
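A minimal sketch of the file-based idea, assuming a single node and a file path chosen by the caller (the module name FileTimer and the poll interval are hypothetical). The key detail is storing the absolute expiration time rather than the remaining duration, so a restarted poller picks up where the old one left off. It uses System.os_time/1 (wall-clock time) because Erlang monotonic time is only meaningful within one running VM and does not carry over a restart.

```elixir
defmodule FileTimer do
  # Hypothetical module; the caller supplies the file path.

  # Persist the absolute deadline (epoch milliseconds), not the duration.
  def set(path, duration_ms) do
    expires_at = System.os_time(:millisecond) + duration_ms
    File.write!(path, Integer.to_string(expires_at))
    expires_at
  end

  # Returns :fired if the alarm went off on this check, :pending if the
  # deadline is still in the future, or :none if no timer file exists.
  def check(path) do
    case File.read(path) do
      {:ok, contents} ->
        expires_at = String.to_integer(String.trim(contents))

        if System.os_time(:millisecond) >= expires_at do
          IO.puts("DONE")
          # Deleting after printing: a crash between these two lines
          # re-fires the alarm on restart (at-least-once).
          File.rm!(path)
          :fired
        else
          :pending
        end

      {:error, :enoent} ->
        :none
    end
  end

  # Poll until the alarm fires or the file disappears. Safe to restart at
  # any point, because the deadline lives on disk rather than in memory.
  def poll(path, interval_ms) do
    case check(path) do
      :pending ->
        Process.sleep(interval_ms)
        poll(path, interval_ms)

      other ->
        other
    end
  end
end
```

The order of IO.puts and File.rm! is a deliberate choice. Printing first can produce a duplicate alarm after a crash, while deleting first can produce a missed one. Since both are equally bad here, neither ordering on its own gives exactly-once delivery.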

Or, if you’re happy to use a DB, something like Oban?

jsmestad · November 20, 2020, 3:46am

Oh, interesting. Yeah, something like Oban may work.

akash-akya · November 20, 2020, 12:57pm

Edit: Eh, I just realized from the title that you meant across deployments.

Old reply

Similar to what @cmo suggested, you can use ets.

Write the expiration time to ets, keyed by the process identity. Every time the process starts, it checks whether a timer is already present in ets and, if so, reads it:

- if it has expired, do the action
- otherwise, call send_after with the remaining time as the timeout

Delete the key after the action. I’m assuming an at-least-once guarantee is acceptable.
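A sketch of the ets approach as a GenServer. It assumes the timer identity is passed in as id and that the named table (the hypothetical :kitchen_timers) is created by a process that outlives the timer, such as a dedicated table owner under a supervisor. An ets table is destroyed when its owner exits, so this survives a crash of the timer process but not a restart of the whole node.

```elixir
defmodule EtsTimer do
  # Hypothetical module and table name; the table must be owned by a
  # longer-lived process that calls create_table/0.
  use GenServer

  @table :kitchen_timers

  def create_table, do: :ets.new(@table, [:named_table, :public, :set])

  def start_link({id, duration_ms}), do: GenServer.start_link(__MODULE__, {id, duration_ms})

  @impl true
  def init({id, duration_ms}) do
    now = System.os_time(:millisecond)

    expires_at =
      case :ets.lookup(@table, id) do
        # Recovering: reuse the deadline written by a previous incarnation.
        [{^id, stored}] ->
          stored

        # First start: record the absolute deadline before scheduling.
        [] ->
          deadline = now + duration_ms
          :ets.insert(@table, {id, deadline})
          deadline
      end

    # An already-expired deadline clamps to 0, so :times_up fires at once.
    Process.send_after(self(), :times_up, max(expires_at - now, 0))
    {:ok, id}
  end

  @impl true
  def handle_info(:times_up, id) do
    IO.puts("DONE (#{inspect(id)})")
    # Delete the key after the action (at-least-once, as described above).
    :ets.delete(@table, id)
    {:stop, :normal, id}
  end
end
```

If the supervisor restarts a crashed timer with the same {id, duration_ms}, init/1 finds the stored deadline and schedules only the remaining time instead of starting over.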

cmo · November 20, 2020, 6:27pm

That’ll work across restarts if you use dets for persistent storage.
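Because :dets mirrors much of the :ets API, moving the deadlines to disk is mostly a matter of swapping calls. Below is a minimal sketch of a storage module, with a hypothetical table name and a file path chosen by the caller. A timer process would call get/1 in its init/1 and delete/1 after the action, following the ets approach described above.

```elixir
defmodule DetsTimerStore do
  # Hypothetical module and table name; deadlines persist in a dets file.
  @table :kitchen_timers_disk

  def open(path) do
    # Creates the file if it does not exist yet.
    {:ok, @table} = :dets.open_file(@table, file: String.to_charlist(path), type: :set)
    :ok
  end

  def put(id, expires_at) do
    :ok = :dets.insert(@table, {id, expires_at})
    # Flush now instead of waiting for the periodic auto-save.
    :ok = :dets.sync(@table)
  end

  def get(id) do
    case :dets.lookup(@table, id) do
      [{^id, expires_at}] -> {:ok, expires_at}
      [] -> :error
    end
  end

  def delete(id), do: :dets.delete(@table, id)

  def close, do: :dets.close(@table)
end
```

Calling :dets.sync/1 after every insert trades write throughput for not losing the latest deadline on a crash. A dets file that was not closed cleanly is repaired when it is next opened, which can take a while for large tables.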

mpope · November 20, 2020, 7:14pm

If you’re designing a ‘highly scalable fault tolerant kitchen timer as a service’ type deal, I’ll assume you have an external database. A simple table with two columns, an id and an end time, could work. You can spawn a process that uses Process.send_after to wake itself up and read from the DB every x milliseconds, or less often if you cache the timers. It gets tricky with multiple nodes reading from the same DB.
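A sketch of the polling approach, assuming a single node. To keep it runnable without a database, an Agent holding a map of id => end_time_ms stands in for the two-column table; the Agent, the module name TimerPoller and the table name timers in the comments are all hypothetical. The comments show roughly which SQL each step would correspond to.

```elixir
defmodule TimerPoller do
  # Hypothetical module; `store` is an Agent standing in for the DB table.
  use GenServer

  def start_link({store, interval_ms}), do: GenServer.start_link(__MODULE__, {store, interval_ms})

  @impl true
  def init({store, interval_ms}) do
    # Poll immediately on (re)start so anything that expired while the
    # poller was down fires right away.
    send(self(), :poll)
    {:ok, {store, interval_ms}}
  end

  @impl true
  def handle_info(:poll, {store, interval_ms} = state) do
    now = System.os_time(:millisecond)

    # Roughly: SELECT id FROM timers WHERE end_time <= now
    due =
      Agent.get(store, fn timers ->
        for {id, end_time} <- timers, end_time <= now, do: id
      end)

    for id <- due do
      IO.puts("DONE (#{inspect(id)})")
      # Roughly: DELETE FROM timers WHERE id = ...
      Agent.update(store, &Map.delete(&1, id))
    end

    # Schedule the next poll in `interval_ms` milliseconds.
    Process.send_after(self(), :poll, interval_ms)
    {:noreply, state}
  end
end
```

With several nodes polling the same table, two pollers can select the same row and fire it twice. On PostgreSQL, one common remedy is to claim due rows with SELECT ... FOR UPDATE SKIP LOCKED inside a transaction, which is the kind of bookkeeping a job library such as Oban takes care of for you.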

shanesveller · November 20, 2020, 10:49pm

> That’ll work if you use dets

You’ll almost certainly inherit a different set of problems instead, though. I wouldn’t use dets in almost any case where an external state store, such as a DB, was already available, and I’d be hesitant to use it in an embedded use case as well.