Hello forum,
This question follows in the footsteps of the “Back on your feet” talk by @cloud8421: https://www.youtube.com/watch?v=kWYgrA2YshE
Context:
- there is an application that reports the status of transactions for one particular account, i.e. monitoring 1k accounts requires spawning 1k such applications. Accounts are created and destroyed dynamically, at a scale of > 1 million a week.
Problem:
- what is the most scalable and reliable OTP architecture for monitoring > 1 million transactions simultaneously?
The first and simplest option is to run one process per account: simple and easy.
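For reference, the process-per-account option can be sketched with a `DynamicSupervisor` plus a `Registry` (module names like `AccountMonitor` and `AccountRegistry` are illustrative, not from the talk):

```elixir
defmodule AccountMonitor do
  use GenServer

  # One process per account, registered under the account id via a
  # Registry, so callers address accounts by id instead of PID.
  def start_link(account_id) do
    GenServer.start_link(__MODULE__, account_id, name: via(account_id))
  end

  def status(account_id), do: GenServer.call(via(account_id), :status)

  defp via(account_id), do: {:via, Registry, {AccountRegistry, account_id}}

  @impl true
  def init(account_id), do: {:ok, %{account: account_id, status: :unknown}}

  @impl true
  def handle_call(:status, _from, state), do: {:reply, state.status, state}
end

# In the supervision tree:
#   {Registry, keys: :unique, name: AccountRegistry}
#   {DynamicSupervisor, name: AccountSupervisor, strategy: :one_for_one}
#
# When an account is created:
#   DynamicSupervisor.start_child(AccountSupervisor, {AccountMonitor, account_id})
```

At > 1 million accounts this is still well within BEAM's process limits, but each process carries its own state and scheduling overhead, which is presumably what motivates the pooled variant below.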
The second is to create a pool of processes, each of which controls N accounts/processes and polls them round-robin (RR).
        +----> ScannerWorker +---> N account RPC processes
        |
Scanner +----> ScannerWorker +---> N account RPC processes
        |
        +----> ScannerWorker +---> N account RPC processes
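Routing an account to one of the workers in this pool can be done with `:erlang.phash2/2`, which maps a term deterministically onto `0..n-1` (a sketch; the shard count is an assumption):

```elixir
defmodule Scanner do
  # Illustrative number of ScannerWorkers in the pool.
  @workers 64

  # Deterministically maps an account id to a worker index, so the
  # same account always lands on the same shard.
  def worker_for(account_id) do
    :erlang.phash2(account_id, @workers)
  end
end
```

Note that `phash2` gives a fixed partitioning: changing `@workers` later reshuffles almost every account to a different shard, which matters if assignments are persisted anywhere.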
Now there are a couple of problems with restoring a failed ScannerWorker:
- If I use `phash2` to distribute accounts between workers, I can’t address a ScannerWorker by its PID, because after a restart it gets a new PID and requests for its accounts can no longer be routed to it. Instead I need a pool of stable IDs, with the accounts attached to each ID, and have a newly created ScannerWorker pick up the first free ID from that pool and spawn processes for the associated accounts.
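One way around the PID problem is to never address workers by PID at all: register each ScannerWorker under its stable shard ID in a `Registry`, so a restarted worker re-registers under the same ID and re-reads its account assignments from shared storage. A sketch under those assumptions (`ScannerRegistry` and the `ShardTable` lookup are hypothetical names):

```elixir
defmodule ScannerWorker do
  use GenServer

  # Named by stable shard id, not PID: after a crash and restart the
  # worker is reachable under the same {:via, Registry, ...} name.
  def start_link(shard_id) do
    GenServer.start_link(__MODULE__, shard_id, name: via(shard_id))
  end

  def poll(shard_id), do: GenServer.cast(via(shard_id), :poll)

  defp via(shard_id), do: {:via, Registry, {ScannerRegistry, shard_id}}

  @impl true
  def init(shard_id) do
    # On (re)start, recover this shard's account list from a shared
    # table (ETS, database, ...) and respawn the account RPC processes.
    accounts = ShardTable.accounts_for(shard_id)
    {:ok, %{shard: shard_id, accounts: accounts}}
  end

  @impl true
  def handle_cast(:poll, state) do
    # Round-robin polling of this shard's accounts would go here.
    {:noreply, state}
  end
end
```

With this layout the “stack of IDs” becomes implicit: the supervisor restarts the worker with its original `shard_id` argument, and the shared table is the single source of truth for which accounts belong to which shard.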
Does that make sense? Where and how could that possibly go wrong? What might I be missing?