Trying to understand gc_interval in Nebulex

Hey folks! I’m trying to understand gc_interval in Nebulex.

```elixir
# GC interval for pushing new generation: 12 hrs
gc_interval: :timer.hours(12),
```

I know how generational GC works in .NET and the JVM, but I'm struggling to understand what this parameter means and how to choose the optimal interval.

Let’s say I have a cache of short-lived objects with a TTL of 5 minutes: what is the best gc_interval?
Another case: I cache an HTML file with a TTL of 1 day. What is the best gc_interval then?

I checked the Nebulex source code, and it's purely a garbage collection concept. One way to do garbage collection (i.e. deleting stale values from a cache backed by an ETS table) is to create a timer for each new entry and delete the entry when its timer fires. Another is to run a cleanup process that deletes all stale values in one big scan once every interval. And then there is the approach Nebulex uses: keep multiple ETS tables and drop whole tables when their time comes. So every gc_interval, the following happens:
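For contrast, the second strategy (one big periodic scan) can be sketched over a single ETS table. This is a minimal illustration, not Nebulex code: the module name `SweepCache`, the `{key, value, expires_at}` row layout and the use of monotonic milliseconds are all assumptions. Something like a timer in a GenServer (not shown) would call `sweep/1` once per interval; the optional `now` argument exists only to make the example deterministic.

```elixir
defmodule SweepCache do
  # Hypothetical sketch of "one big scan every interval": a single ETS
  # table whose rows are {key, value, expires_at_ms}. Not Nebulex code.

  def new, do: :ets.new(:sweep_cache, [:set, :public])

  # Stores the value together with its absolute expiry time.
  def put(tab, key, value, ttl_ms, now \\ now_ms()) do
    :ets.insert(tab, {key, value, now + ttl_ms})
  end

  # Expired rows are treated as misses even before a sweep removes them.
  def get(tab, key, now \\ now_ms()) do
    case :ets.lookup(tab, key) do
      [{^key, value, expires_at}] when expires_at > now -> {:ok, value}
      _ -> :miss
    end
  end

  # Scans the whole table and deletes every row whose expiry has passed;
  # returns the number of deleted rows.
  def sweep(tab, now \\ now_ms()) do
    match_spec = [{{:_, :_, :"$1"}, [{:"=<", :"$1", now}], [true]}]
    :ets.select_delete(tab, match_spec)
  end

  defp now_ms, do: System.monotonic_time(:millisecond)
end
```

Note that each sweep has to evaluate the match spec against every row, however few of them have actually expired.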

  • tables before garbage collection: [ets1, ets0]
  • the gc_interval timer fires and a new ETS table is created: [ets2, ets1, ets0]
  • the oldest table is dropped, leaving [ets2, ets1]; since Nebulex seems to keep at most two tables at a time, all entries in ets0 are now “garbage collected”
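The rotation described in these steps can be sketched with plain ETS calls. This is a hypothetical `GenCache` module, not Nebulex's implementation: it keeps a list of tables (newest first) capped at two, writes go to the newest table, reads try newest to oldest, and per-entry TTL checks are deliberately left out so that dropping a generation is the only cleanup.

```elixir
defmodule GenCache do
  # Hypothetical sketch of generational GC over ETS tables (newest first).
  # Not Nebulex's code; per-entry TTL checks are intentionally omitted.
  @max_generations 2

  def new, do: [new_table()]

  # Called on every gc_interval tick: push a fresh table and drop
  # whatever falls beyond @max_generations, freeing all of its entries at once.
  def new_generation(gens) do
    {keep, drop} = Enum.split([new_table() | gens], @max_generations)
    Enum.each(drop, &:ets.delete/1)
    keep
  end

  # Writes always go to the newest generation.
  def put([newest | _] = gens, key, value) do
    :ets.insert(newest, {key, value})
    gens
  end

  # Reads check generations from newest to oldest.
  def get(gens, key) do
    Enum.find_value(gens, :miss, fn tab ->
      case :ets.lookup(tab, key) do
        [{^key, value}] -> {:ok, value}
        [] -> nil
      end
    end)
  end

  defp new_table, do: :ets.new(:gen, [:set, :public])
end
```

An entry written into `ets0` is still readable after the first tick (it sits in the older table) and disappears on the second, when `ets0` is deleted.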

So choosing gc_interval is a trade-off. On one hand, it depends on how long you can tolerate stale values occupying memory after their TTL expires, so it should be as small as possible. On the other hand, a table shouldn't be dropped while it still holds live entries, so gc_interval >= ttl: a value can be cached right before the gc_interval timer fires, and its table then survives only one more interval as the older generation, so lifetime(old_ets) should be at least ttl.
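To put numbers on that trade-off: under the reasoning above, an entry written just before a tick survives roughly one gc_interval, and one written just after a tick survives up to two, so an expired entry can linger for up to `2 * gc_interval - ttl`. The hypothetical `GcBounds` helper below (not part of Nebulex) just encodes those two rules; all values are in milliseconds, as returned by `:timer`.

```elixir
defmodule GcBounds do
  # Hypothetical helper for the two-generation reasoning; not a Nebulex API.

  # No live entry is dropped early only if one interval covers the whole TTL.
  def safe?(gc_interval, ttl), do: gc_interval >= ttl

  # Worst case: written right after a tick, dropped two ticks later,
  # so it stays in memory for 2 * gc_interval, of which ttl is live.
  def max_stale(gc_interval, ttl), do: 2 * gc_interval - ttl
end
```

For the 5-minute TTL, the smallest safe choice is `gc_interval: :timer.minutes(5)`, and `GcBounds.max_stale(:timer.minutes(5), :timer.minutes(5))` gives `300000`, i.e. up to 5 extra minutes of stale memory. For the 1-day HTML cache, `:timer.hours(24)` is the smallest safe value, while `GcBounds.safe?(:timer.hours(12), :timer.hours(24))` is `false`: by this reasoning, the 12-hour setting from the question could drop a 1-day entry early.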


IMHO this approach might be overkill in some scenarios. Personally, I’ve never had issues with “naive” cleanup techniques.


Thank you @ruslandoga!