Yes, with this particular Oban DB migration you should be careful and take some steps before proceeding.
Essentially, we decided to split the Oban Pro Ecto migration into multiple steps and run most of them as manual migrations, following the pattern suggested in the article *Automatic and manual Ecto migrations* written by the folks from Dashbit.
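As a rough sketch of that pattern (the directory name `priv/repo/manual_migrations` and the module below are our own conventions, not something Ecto mandates), manual migrations live outside the default directory so deploys never run them, and you invoke them explicitly:

```elixir
# priv/repo/manual_migrations/20240601120000_add_uniq_key.exs
#
# Automatic migrations stay in priv/repo/migrations and run on deploy.
# Manual ones are only ever run by hand, when you decide the time is right:
#
#     mix ecto.migrate --migrations-path priv/repo/manual_migrations
#
defmodule MyApp.Repo.Migrations.AddUniqKey do
  use Ecto.Migration

  def change do
    # The actual DDL for each step comes from Oban Pro's
    # lib/oban/pro/migrations/v1_5_0.ex -- intentionally not reproduced here.
  end
end
```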
But first, you have to understand what these DB schema migrations entail. I will not post the code here out of respect for the Oban authors, and also because you can easily find it at `lib/oban/pro/migrations/v1_5_0.ex`. In broad terms, there are 5 big steps:
- Creates a UDF, or user-defined function, that essentially translates Oban states into integers. For example, the `scheduled` state is 0, the `available` state is 1, and so on. This is safe to apply even as an automatic Ecto migration.
- An `ALTER TABLE` on the `oban_jobs` table that adds a generated column, `uniq_key`; this column is special because its value is always computed from other columns or expressions. This step must be applied in a manual migration after reducing the number of rows as much as possible, because adding a stored generated column rewrites the whole table and takes an `ACCESS EXCLUSIVE` lock (the strictest one), as do many `ALTER TABLE` operations (see the generic sketch after this list). If you want to inspect all the locks for this query, check the wonderful Safe Ecto Migrations series by David Bernheisel.
- Adds a new partial `UNIQUE INDEX` on the new `uniq_key` column; this index is not created concurrently.
- Adds a new partial index for batches; this index is not created concurrently.
- Adds a new partial index for chains; this index is not created concurrently.
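To make steps 2 and 3 concrete, here is a generic illustration of a stored generated column plus a partial unique index. This is deliberately *not* the Oban Pro code; the `example_jobs` table and every column in it are made up, and only the shape of the operations matters:

```elixir
defmodule MyApp.Repo.Migrations.GeneratedColumnExample do
  use Ecto.Migration

  def up do
    # A stored generated column is always computed from other columns.
    # Postgres rewrites the whole table to backfill it, holding an
    # ACCESS EXCLUSIVE lock for the duration -- hence the advice to
    # shrink the table first.
    execute """
    ALTER TABLE example_jobs
    ADD COLUMN dedup_key text
    GENERATED ALWAYS AS (worker || ':' || queue) STORED
    """

    # A partial unique index only enforces uniqueness for rows matching
    # the WHERE clause; without concurrently: true it blocks writes.
    create unique_index(:example_jobs, [:dedup_key],
             where: "state IN ('available', 'scheduled')"
           )
  end

  def down do
    drop index(:example_jobs, [:dedup_key])
    execute "ALTER TABLE example_jobs DROP COLUMN dedup_key"
  end
end
```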
Once you understand these 5 steps and the problems that could arise when you have millions of rows in the `oban_jobs` table, you can take some actions. The first thing we did was take advantage of the `DynamicPruner` to tighten our retention policy and prune more aggressively. Like you, we have to keep the records around; thankfully, the `DynamicPruner` provides an option to set up a callback before deleting the records. In that callback, you can copy the jobs that are about to be deleted into cold storage or another table, your choice.
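Our setup looked roughly like the sketch below. The `mode` and `before_delete` option names are how we remember Oban Pro's `DynamicPruner` docs (verify the exact options and callback signature against your version), and `MyApp.JobArchiver` with its `oban_jobs_archive` table is entirely hypothetical:

```elixir
# config/config.exs -- tighten retention and hook in an archiving callback.
config :my_app, Oban,
  plugins: [
    {Oban.Pro.Plugins.DynamicPruner,
     # Be more aggressive: jobs older than 7 days are eligible for pruning.
     mode: {:max_age, {7, :days}},
     # Invoked with the batch of jobs about to be deleted.
     before_delete: {MyApp.JobArchiver, :archive, []}}
  ]
```

```elixir
defmodule MyApp.JobArchiver do
  @moduledoc "Copies soon-to-be-pruned jobs into a hypothetical cold-storage table."

  def archive(jobs) do
    rows =
      Enum.map(jobs, fn job ->
        %{
          worker: job.worker,
          args: job.args,
          state: job.state,
          inserted_at: job.inserted_at,
          completed_at: job.completed_at
        }
      end)

    # oban_jobs_archive is a table you create yourself; any cold storage works.
    MyApp.Repo.insert_all("oban_jobs_archive", rows)
  end
end
```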
But while we were on this pruning crusade, we discovered, and then confirmed with the Oban authors, that the `DynamicPruner` plugin uses the `scheduled_at` field as its cutoff, which is fine. The “problem” is that we use a lot of Oban Workflows, and there is a workaround that sets the `scheduled_at` field far in the future as a way to put workflow dependencies “on hold” without introducing a new state alongside `available`, `scheduled`, etc. So you can opt to create your own plugin that considers this case too (see the sketch below), allowing you to prune even more records. This bug in the `DynamicPruner` is fixed in the most recent stable release (`v1.5.2`), but at that point we were using `v1.4.14`.
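A minimal sketch of such a plugin, assuming the public `Oban.Plugin` behaviour. The interval, the 7-day cutoff, and the decision to key off `completed_at` instead of `scheduled_at` are all our own choices here, not Oban's:

```elixir
defmodule MyApp.Plugins.WorkflowAwarePruner do
  @moduledoc """
  Prunes finished jobs by completion time rather than scheduled_at, so
  workflow jobs "held" with a far-future scheduled_at don't escape pruning.
  A sketch only: error handling, telemetry, and leadership checks omitted.
  """

  @behaviour Oban.Plugin

  use GenServer
  import Ecto.Query

  @interval :timer.minutes(5)

  @impl Oban.Plugin
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: opts[:name])

  @impl Oban.Plugin
  def validate(_opts), do: :ok

  @impl GenServer
  def init(opts) do
    Process.send_after(self(), :prune, @interval)
    {:ok, opts}
  end

  @impl GenServer
  def handle_info(:prune, opts) do
    # Delete jobs that *completed* more than 7 days ago, no matter how far
    # in the future the workflow trick pushed their scheduled_at.
    from(j in "oban_jobs",
      where: j.state == "completed" and j.completed_at < ago(7, "day")
    )
    |> MyApp.Repo.delete_all()

    Process.send_after(self(), :prune, @interval)
    {:noreply, opts}
  end
end
```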
Once you have reduced the size of the `oban_jobs` table, it should be safe to apply the manual migration for step 2. Preferably, apply it when your system is under light load; a weekend would probably be a good time.
While I understand why the Oban authors decided to avoid creating these indexes concurrently, you must understand that creating these 3 indexes will lock the table for writes (not reads), preventing concurrent inserts, updates, or deletes until each index is done. So, we decided to wrap these index creations in their own manual migration and set the option `concurrently: true`, keeping in mind that you must set the module attributes that disable the DDL transaction and the migration lock.
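Each of those migrations looked roughly like this; the index definition shown is a placeholder, so copy the real column list and partial `WHERE` clause from Oban Pro's `v1_5_0.ex` and add `concurrently: true`:

```elixir
defmodule MyApp.Repo.Migrations.AddObanUniqKeyIndexConcurrently do
  use Ecto.Migration

  # CREATE INDEX CONCURRENTLY cannot run inside a transaction, and Ecto's
  # migration lock would otherwise wrap one around us -- disable both.
  @disable_ddl_transaction true
  @disable_migration_lock true

  def change do
    # Placeholder: take the exact columns and WHERE condition from
    # lib/oban/pro/migrations/v1_5_0.ex.
    create unique_index(:oban_jobs, [:uniq_key],
             where: "uniq_key IS NOT NULL",
             concurrently: true
           )
  end
end
```

A concurrent build is slower and can leave an invalid index behind if it fails, but it never blocks writes to the table while it runs.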
Keep in mind that applying the `UNIQUE INDEX` on the `uniq_key` column could fail. That means you have duplicate entries for this column; you have to resolve those duplicates first and try again.
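A quick way to look for those duplicates from an IEx session before retrying (keep in mind the index is partial, so only duplicates matching its `WHERE` clause actually conflict):

```elixir
import Ecto.Query

# Group rows by uniq_key and keep only values that appear more than once;
# each returned map is a conflict to resolve before the index can be built.
from(j in "oban_jobs",
  where: not is_nil(j.uniq_key),
  group_by: j.uniq_key,
  having: count(j.id) > 1,
  select: %{uniq_key: j.uniq_key, count: count(j.id)}
)
|> MyApp.Repo.all()
```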
Once you have applied all these migrations, it should be safe to proceed with the Oban Pro upgrade to v1.5.