Oban: High bloat on oban_prouducers table

hubertlepicki · January 10, 2025, 1:47pm

I have a situation where I can see very high Bloat on some of the tables in the system, and the biggest one by factor of 10 or more is always oban_producers (see attached LiveDashboard screenshot)

I suspect this is causing issues: I just had several hours of 100% CPU usage, and autovacuum process was failing to delete any tuples whatsoever from the system.

Oban may not be the root cause of the problem, and it may be an overall misconfigured vacuum settings, that I did not tweak at all (uses whatever Google sets up on their hosted databases).

I have a question: how do I fix that bloat, in least interruptive way? The system is having a lot of users, and a lot of activity, so I would prefer not taking it down for maintenance but could do that over the weekend if needed.

I have limited knowledge of tuning PostgreSQL, and would appreciate any tips.

It may also be something is misconfigured in Oban itself? I am using the Reindexer plugin, but I think the problem is not the indexes but actual bloat on the tables… @sorentwo can you advise?

sorentwo · January 11, 2025, 2:46pm

The oban_producers table is ephemeral and may be written to frequently, depending on your configuration. Vacuuming can be tweaked per-table, and is rather conservative by default, but is probably worth tuning if you’re using a lot of global or rate limits because those centralize through the producer records.

You can run VACUUM FULL oban_producers even with an active system. The table is ephemeral and should be very small (nodes * queues), and recent Oban/Pro versions are resilient to timeouts and will retry with backoff if the table is locked due to vacuuming.

Another option, if you’d like to be very cautious, is to do a table swap:

begin;
create table oban_producers_new (like oban_producers including all);
alter table oban_producers rename to oban_producers_old;
alter table oban_producers_new rename to oban_producers;
drop table oban_producers_old;
commit;

That won’t have the same impact as a vacuum full though, because it won’t clean out dead tuples on disk.

That’s correct—table bloat is separate from index bloat (and I don’t think the live dashboard view combines them). The producers table doesn’t have secondary indexes anyhow, as there are so few tables normally that Postgres wouldn’t choose to use an index for frequent queries.

michallepicki · January 11, 2025, 3:41pm

Hi brother

Edit: after re-reading @sorentwo 's message now I see that this is probably unrelated and doesn’t happen on the oban_producers table, but I’ll leave the message here, maybe it helps someone else

Not sure if those count into bloat, but maybe it’s related to the fact that reindexing can sometimes fail and leave _ccnew or _ccold or similarly named invalid indexes? It is documented that it is safe to drop them, we do that from time to time. So we run e.g.

drop index oban_jobs_pkey_ccnew;
drop index oban_jobs_pkey_ccnew1;

On top of that we’re sometimes seeing pg_toast.pg_toast_XXXXXX_index_ccnew (where XXXXXX is some ID number) and similarly named indexes in Phoenix LiveDashboard in the Ecto Stats → Duplicate indexes tab. To verify that they were from oban_jobs we ran select relname from pg_class where oid = XXXXXX; (by that ID number from the pg_toast index name). I think it’s safe to just drop those as well.

sorentwo · January 11, 2025, 4:14pm

Those should be cleaned up by the reindexer automatically.

Interesting! Those aren’t explicitly handled by the reindexer cleanup query, but it seems like they should be.