Named pooled processes in shards, LRU

Hi,
I’m currently experimenting with Elixir. My use case is the following:

  • have many millions of aggregates (in DDD terms), and they do not depend on each other
  • keeping all of them in a live or hibernated process does not seem to be a valid option

Plan A (seems better to me):

  • they can be grouped in shards by an attribute
  • my plan is to have a worker pool per shard, which implies multiple pools
  • have only a single instance of a given aggregate
  • since the pool size is limited, the least recently used aggregate in the given shard should be replaced by the most recently referenced one
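A minimal sketch of the LRU bookkeeping that Plan A needs for one shard. The module name `ShardLRU` and its functions are hypothetical, not an existing library. It only tracks ids with a logical clock and reports which aggregate should be evicted; stopping the evicted process and starting the new one is left to the caller.

```elixir
defmodule ShardLRU do
  @moduledoc """
  Hypothetical bookkeeping for one shard: remembers when each loaded
  aggregate id was last referenced and picks an eviction victim when full.
  """

  defstruct capacity: 0, clock: 0, last_used: %{}

  # Create an empty tracker that holds at most `capacity` aggregate ids.
  def new(capacity) when is_integer(capacity) and capacity > 0 do
    %__MODULE__{capacity: capacity}
  end

  # Record a reference to `id`.
  # Returns {evicted_id_or_nil, updated_tracker}.
  def touch(%__MODULE__{} = lru, id) do
    clock = lru.clock + 1

    if Map.has_key?(lru.last_used, id) or map_size(lru.last_used) < lru.capacity do
      # Already loaded, or there is still room: only refresh the timestamp.
      {nil, %{lru | clock: clock, last_used: Map.put(lru.last_used, id, clock)}}
    else
      # Shard is full: evict the id with the smallest logical timestamp.
      # This scan is O(n) in the pool size, acceptable for a sketch.
      {victim, _tick} = Enum.min_by(lru.last_used, fn {_id, tick} -> tick end)

      last_used =
        lru.last_used
        |> Map.delete(victim)
        |> Map.put(id, clock)

      {victim, %{lru | clock: clock, last_used: last_used}}
    end
  end
end

# Usage: a shard with room for two aggregates.
lru = ShardLRU.new(2)
{nil, lru} = ShardLRU.touch(lru, "a")
{nil, lru} = ShardLRU.touch(lru, "b")
{nil, lru} = ShardLRU.touch(lru, "a")
# "b" is now the least recently used id, so loading "c" evicts it.
{"b", _lru} = ShardLRU.touch(lru, "c")
```

One process per shard could own this struct, which would also serialize the eviction decisions for that shard.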

Plan B:

  • I can think of aggregate processes with a limited lifetime (e.g. if a process is not called within 5 minutes, it should die automatically)
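A minimal sketch of Plan B using the GenServer idle timeout. The module name `Aggregate`, the `:idle_timeout` option and the placeholder state are assumptions made for illustration. The mechanism itself is standard GenServer behaviour: when a callback returns a timeout and no message arrives within it, the process receives `:timeout` in `handle_info/2`.

```elixir
defmodule Aggregate do
  # :transient means a supervisor would not restart it after a normal stop.
  use GenServer, restart: :transient

  # Assumed idle limit: the 5 minutes from the post, in milliseconds.
  @idle_timeout :timer.minutes(5)

  def start_link(opts) do
    id = Keyword.fetch!(opts, :id)
    timeout = Keyword.get(opts, :idle_timeout, @idle_timeout)
    GenServer.start_link(__MODULE__, {id, timeout})
  end

  def get(pid), do: GenServer.call(pid, :get)

  @impl true
  def init({id, timeout}) do
    # Placeholder state; a real aggregate would be rebuilt from storage here.
    {:ok, %{id: id, data: %{}, timeout: timeout}, timeout}
  end

  @impl true
  def handle_call(:get, _from, state) do
    # Returning the timeout again restarts the idle countdown on every call.
    {:reply, state.data, state, state.timeout}
  end

  @impl true
  def handle_info(:timeout, state) do
    # No message arrived within the idle window: stop normally.
    {:stop, :normal, state}
  end
end
```

Any state that must survive would need to be persisted before the process stops, for example in `terminate/2` or after each command.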

Questions:

  1. Could someone validate these approaches?
  2. Could you please give me some hints regarding the implementation?

Thanks in advance


Forgot to mention that I’m open to suggestions

  • have many millions of aggregates (in DDD terms), and they do not depend on each other
  • keeping all of them in a live or hibernated process does not seem to be a valid option

Why do you want to use processes? And why is keeping them alive not valid? I am trying to understand your constraints. Also, will you store the aggregate data somewhere so you can rebuild it when you need it again?

I would try Plan B, if only because it is simpler. That said, you will indeed need something like LRU if bursts of load cause too many aggregates to be in memory during those 5 minutes (unlikely, though?).


Hi, thanks for getting back to me. Please note that I may be making a lot of false assumptions, since I’m new to Elixir.

Using named processes seemed a good way to serialize the requests for a specific aggregate.
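A minimal sketch of what that could look like, assuming one process per aggregate registered under its id in a `Registry` and started on demand by a `DynamicSupervisor`. The names `NamedAggregate`, `AggregateRegistry`, `AggregateSupervisor` and the `add/2` command are hypothetical. Because every call for an id goes through the same process mailbox, requests for that aggregate are handled one at a time.

```elixir
defmodule NamedAggregate do
  use GenServer, restart: :temporary

  # Hypothetical names for the registry and supervisor used in this sketch.
  @registry AggregateRegistry
  @supervisor AggregateSupervisor

  # Start the two supporting processes; in an application these would be
  # children of the supervision tree instead.
  def start_infrastructure do
    {:ok, _} = Registry.start_link(keys: :unique, name: @registry)
    {:ok, _} = DynamicSupervisor.start_link(strategy: :one_for_one, name: @supervisor)
    :ok
  end

  def start_link(id) do
    GenServer.start_link(__MODULE__, id, name: via(id))
  end

  # Send a command to the aggregate, starting it first if it is not running.
  def add(id, amount) do
    ensure_started(id)
    GenServer.call(via(id), {:add, amount})
  end

  defp via(id), do: {:via, Registry, {@registry, id}}

  defp ensure_started(id) do
    case Registry.lookup(@registry, id) do
      [{_pid, _value}] -> :ok
      [] -> start_child(id)
    end
  end

  defp start_child(id) do
    case DynamicSupervisor.start_child(@supervisor, {__MODULE__, id}) do
      {:ok, _pid} -> :ok
      # Another caller won the race; the unique key guarantees one instance.
      {:error, {:already_started, _pid}} -> :ok
    end
  end

  @impl true
  def init(id), do: {:ok, %{id: id, total: 0}}

  @impl true
  def handle_call({:add, amount}, _from, state) do
    total = state.total + amount
    {:reply, total, %{state | total: total}}
  end
end
```

Calling `NamedAggregate.add("order-1", 5)` twice reaches the same process, while `"order-2"` gets its own. A process that stops between the lookup and the call would make the call exit, so a production version would need to handle that case.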

Keeping millions of processes alive could put huge pressure on memory.

Yes, the aggregate data will be stored so it can be rebuilt.

In general, only approximately 10k of the aggregates are active simultaneously. They are accessed randomly.

Regarding Plan B, it appears to be less efficient when response time matters (I haven’t done any benchmarking).

I hope there is some kind of best practice for handling similar tasks.

It depends. With Plan A, if you are triggering the LRU eviction frequently, it will lead to worse response times. Plan B should theoretically be fine: only the first load after five minutes without access will be slow. If you are doing other things in your system, say background work, Plan B will help you relieve memory pressure when no aggregates are in use. That’s why I suggest starting with Plan B and adding Plan A if a timeout is not enough to control memory pressure.

Thank you for the hints