How to measure execution time for Oban Chunks?

Hi!

I’m running some benchmarks and trying to see how long a single job takes to complete when it runs as part of a chunk. I’m running this query:

select AVG(completed_at - attempted_at) AS average_duration, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY completed_at - attempted_at) AS median_duration
from oban_jobs 
where worker = 'MyWorker'

which I think would accurately show the execution time of a single job, but it’s showing some surprising results (55s - 62s when the expectation is around a second). Do you have any suggestions on how to find the execution time of a single job in a chunk?

Thanks!
Eli

Hello,

Measuring the time for a single job won’t be meaningful for a chunk. Instead, we group by chunk.

Each chunk has a leader and that’s recorded in the attempted_by column. So, here is how you could do it based on the query you sent:

with chunk_durations as (
  select max(completed_at) - min(attempted_at) as duration, attempted_by as chunk
  from oban_jobs
  where state = 'completed' and worker = 'MyWorker'
  group by attempted_by
)

select avg(duration) from chunk_durations;

The state is set to completed to prevent measuring the in-progress chunks.
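
If you also want the median from your original query, a sketch along the same lines could look like the following. It assumes PostgreSQL and, as described above, that attempted_by identifies one chunk; the column aliases are only illustrative.

```sql
-- One row per chunk: time from the first attempt to the last completion.
with chunk_durations as (
  select max(completed_at) - min(attempted_at) as duration
  from oban_jobs
  where state = 'completed' and worker = 'MyWorker'
  group by attempted_by
)

-- Aggregate across chunks instead of across individual jobs.
select
  count(*) as chunks,
  avg(duration) as average_duration,
  percentile_cont(0.5) within group (order by duration) as median_duration
from chunk_durations;
```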

Hope this helps!


Hi! Thank you, that definitely helps. I am seeing a bit of variance in the query results and want to understand it more deeply. Is attempted_at set when Oban first attempts a job, regardless of whether the chunk is run or not?

Would the flow be something like this?

Oban queries for job
Oban gets job and sets attempted_at
Oban checks to see if the job fits in a chunk
Chunk can be run or not

That would totally explain the variance to me as we set the chunk timeout to 10 minutes!

You have a great point.

That’s mostly right. The chunk will always run, even if it isn’t “full”, and the first job will always have an attempted_at.

The rest of the chunk is all fetched together when it reaches the size or timeout. You could get a more accurate value by using max(attempted_at) instead. That’s when the chunk actually starts processing.
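
As a rough sketch of that adjustment, the query below splits each chunk into the time spent waiting to fill and the time spent processing. It carries the same assumptions as the earlier query (PostgreSQL, and attempted_by grouping one chunk), and the names wait_duration and run_duration are made up for illustration.

```sql
with chunk_times as (
  select
    -- First job attempted until the rest of the chunk was fetched.
    max(attempted_at) - min(attempted_at) as wait_duration,
    -- Rest of the chunk fetched until the last job completed.
    max(completed_at) - max(attempted_at) as run_duration
  from oban_jobs
  where state = 'completed' and worker = 'MyWorker'
  group by attempted_by
)

select
  avg(wait_duration) as average_wait,
  avg(run_duration) as average_run
from chunk_times;
```

With a 10 minute timeout, a large average_wait next to a small average_run would be consistent with the variance coming from chunks waiting to fill rather than from the work itself.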

Thanks @sorenone!

I noticed that the first job is executed immediately, without waiting for the timeout, but the rest of the jobs are fetched and executed together. Is it possible to execute the first job as part of the group as well?