Oban update from 2.17.4 to 2.17.10 introduced latency

Hello, my team recently updated our Phoenix application from Oban 2.17.4 to 2.17.10. Immediately after, we noticed a ~30% increase in latency on a metric that tracks the average delta between when a job is enqueued and when it begins execution. The increase was around 1-1.5s.

“Enqueue requests” are buffered, so there is a chance this issue was not introduced by the Oban update. However, the timing lines up perfectly with the deploy of the Oban update, and no changes went out for the buffering system.

We are using the Phoenix PubSub notifier and we have Postgres insert triggers disabled.

Here’s a screenshot of the metric

Questions:

  1. Is it safe to roll back from 2.17.10 to 2.17.4 (or some version in between)? We want to make sure there will be no problem consuming jobs created by newer versions from nodes running older versions.
  2. Any ideas what might have caused the increase? I noticed this commit in particular: “Emit insert notification directly from Engine” (oban-bg/oban@a3e8a99).

Thanks a lot for your help!

From the increased wait period, it sounds like the trigger isn’t working. Did you happen to disable triggers in the config as well, e.g. insert_trigger: false? Also, are you sure that the notifier is working? Check the status with Oban.Notifier.status/1 to see.

Rolling back is safe. There aren’t any backward incompatible changes. That said, I truly don’t believe there’s a performance regression that would necessitate rolling back.

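This is most likely due to a lack of insert triggers. Without insert notifications, queues only pick up new jobs on the next staging poll, which runs once per second by default.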

Thanks for the reply!

“Did you happen to disable triggers in the config as well, e.g. insert_trigger: false?”

We do have insert_trigger: false configured:

config :ourapp, :oban,
  repo: Ourapp.Repo.ObanWrapper,
  engine: Oban.Pro.Engines.Smart,
  insert_trigger: false,
  notifier: {Oban.Notifiers.Phoenix, pubsub: Ourapp.PubSub}

We explicitly disabled insert_trigger because we’ve had performance issues with Postgres triggers in the past. But we’ve had insert_trigger set to false and Oban.Notifiers.Phoenix in place for the past 5 months without issue.

“Also, are you sure that the notifier is working? Check the status with Oban.Notifier.status/1 to see.”

iex> Oban.Notifier.status()
:clustered

I just got some historical context, and I think we misunderstood how insert_trigger works: we mistakenly believed that it only affects the Postgres notifier.

When we initially updated from 2.16.3 to 2.17.2 (with no config changes), we saw pg_notify activity begin again, even though we had previously deleted the Postgres triggers via a migration. We realized we had missed the new insert_trigger setting, so we set it to false and saw the Postgres triggers stop.

A month later we updated our notifier to Oban.Notifiers.Phoenix but left insert_trigger: false in place. It seems clear now that we’ve just never been using the Phoenix notifier :sweat_smile:.

We’re going to remove insert_trigger: false and see what happens. However I’m still confused about the latency increase. If triggers have been disabled this whole time, what caused the latency increase?

That’s the mysterious part. There’s nothing else I can think of that would cause a latency increase, especially not something that’s consistently 1-1.5s.
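If it helps narrow things down, one way to separate the buffering delay from time spent waiting inside Oban might be to log the queue_time measurement from Oban's job telemetry. The following is only a sketch: the module name Ourapp.ObanQueueTime and the handler id are made up for illustration, and it assumes the [:oban, :job, :stop] event reports :queue_time in native time units.

```elixir
defmodule Ourapp.ObanQueueTime do
  @moduledoc "Hypothetical helper: logs how long each job waited in Oban before it started."
  require Logger

  # Attach once at application start, e.g. from Application.start/2.
  def attach do
    :telemetry.attach(
      "ourapp-oban-queue-time",
      [:oban, :job, :stop],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Assumption: the :stop event carries :queue_time in native time units.
  def handle_event([:oban, :job, :stop], %{queue_time: queue_time}, %{job: job}, _config) do
    Logger.info(
      "oban queue=#{job.queue} worker=#{job.worker} queue_time_ms=#{to_ms(queue_time)}"
    )
  end

  # Converts a native-unit duration to whole milliseconds.
  def to_ms(native), do: System.convert_time_unit(native, :native, :millisecond)
end
```

Comparing this number with the end-to-end “enqueue to perform” metric should show how much of the delay comes from Oban and how much from the buffering in front of it. Note that this only covers jobs that finish successfully; failed jobs emit a separate exception event.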

Enabling the Phoenix pubsub trigger (by removing insert_trigger: false) dropped our “enqueue to perform” latency significantly! It’s now down to ~1.3s. It was ~3.5s after the update to 2.17.10, and ~2.5s for months on 2.17.4 (with insert_trigger: false). The remaining latency is explained by enqueue buffering and other operations that happen between the timing measurement points.
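For anyone following along, the change amounts to dropping one line from the config posted earlier. This is a sketch that assumes the other options stay as they were and that insert_trigger defaults to true when omitted:

```elixir
import Config

# Same options as before, with `insert_trigger: false` removed.
# Assumption: when the option is omitted, Oban falls back to its default
# (true), so insert notifications go out through the configured notifier.
config :ourapp, :oban,
  repo: Ourapp.Repo.ObanWrapper,
  engine: Oban.Pro.Engines.Smart,
  notifier: {Oban.Notifiers.Phoenix, pubsub: Ourapp.PubSub}
```

With the Phoenix notifier in place, these notifications travel over PubSub rather than pg_notify, so this should not bring back the Postgres load we were originally trying to avoid.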

Still curious about the mystery latency in the .4 → .10 update, but we’re pleased with the current performance so will likely leave the investigation alone for now. Thanks for the help :slight_smile: