Slow recurring insert Oban query caused by too many cancelled scheduled jobs

DaemonSnake · June 26, 2024, 4:38pm

Hello,

We were having some issues with Oban Pro (1.4.2) in our AWS production environment (Aurora PG 16.1).

Fairly often during the day, the following Oban-internal query would take first place
in the performance analysis view in AWS, with around:

3+ AAS (average active sessions),
~3s+ average latency,
1+ calls/s.

The query is as follows:

SELECT o0."state", o0."scheduled_at" FROM "public"."oban_jobs" AS o0 WHERE (o0."meta" @> $1) AND (o0."state" != 'completed') AND (o0."id" < $2) ORDER BY o0."id" DESC LIMIT 1

We realized that we had many scheduled jobs that had been cancelled (around 200k) and were not getting pruned.
We decided to delete them, and the query above now behaves well.
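
For anyone checking for the same situation, here is a rough sketch of how the backlog could be inspected and cleared. It assumes the default "public"."oban_jobs" table shown in the query above. The batch size of 10000 is an arbitrary illustrative value, not something from our setup, and the statements are not necessarily the exact ones we ran.

```sql
-- Count jobs per state to see which states dominate the table.
SELECT state, count(*) AS jobs
FROM public.oban_jobs
GROUP BY state
ORDER BY jobs DESC;

-- Delete cancelled jobs in batches so that each transaction stays short.
-- Re-run the statement until it reports 0 deleted rows.
-- The LIMIT of 10000 is an illustrative assumption; tune it for your database.
DELETE FROM public.oban_jobs
WHERE id IN (
  SELECT id
  FROM public.oban_jobs
  WHERE state = 'cancelled'
  LIMIT 10000
);
```

Deleting in batches is only a precaution against long locks on a busy table. A single DELETE would also work if the table is small enough.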

We believe the slowdown was caused by the state != 'completed' condition, since cancelled jobs still match it.

What is the purpose of that query, and why is 'completed' the only state that is excluded?

sorenone · June 26, 2024, 5:12pm

Welcome, @DaemonSnake! Thank you for bringing your question to the forum.

Definitely upgrade. There are major bug fixes and performance improvements.

How did you cancel those jobs? What is your pruning config? Cancelled jobs are normally prunable.

Are you using chains or workflows? That’s going to help us narrow it down.

DaemonSnake · June 28, 2024, 3:43pm

Thanks for your quick reply, @sorenone.

Definitely upgrade. There are major bug fixes and performance improvements.

Thanks for the info, done ^^
Indeed, we are seeing a bunch of performance improvements.

How did you cancel those jobs? What is your pruning config? Cancelled jobs are normally prunable.

We used Oban.cancel_all_jobs, but those jobs had scheduled_at dates far in the future.
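
As a hedged illustration, a query along these lines could show how many cancelled jobs are still scheduled in the future. It assumes the default "public"."oban_jobs" table and that scheduled_at holds UTC timestamps without a time zone, which is why now() is converted to UTC before comparing.

```sql
-- Count cancelled jobs whose scheduled_at is still in the future,
-- along with the range of their scheduled dates.
-- Assumption: scheduled_at is stored as a UTC timestamp without time zone.
SELECT count(*)          AS cancelled_future_jobs,
       min(scheduled_at) AS earliest_scheduled_at,
       max(scheduled_at) AS latest_scheduled_at
FROM public.oban_jobs
WHERE state = 'cancelled'
  AND scheduled_at > timezone('UTC', now());
```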

Are you using chains or workflows? That’s going to help us narrow it down.

We are using the Chunk worker