I have a concept for a distributed, cluster-wide task system, and I’m looking for advice on implementation strategies I could use to build it.
The high-level case:
I want to spin up an “infinite” number of workers that perform headless browser tasks using PhantomJS/Wallaby.
They all report back to a main supervisor, where I can aggregate and parse the results.
My real use case:
I want to determine the impact that a large email blast linking to one of our servers would have from a “full asset” standpoint (loading the JS, images, and other assets with a headless browser).
For example:
- 10,000 users opening a page within the same minute
- 1,000 authenticated users performing the same action, etc.
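To turn scenarios like these into numbers a cluster can be sized against, the arrival rate plus Little's law (average concurrency = arrival rate × average time each session stays open) gives a first estimate. The sketch is in Python for brevity; `load_profile` is a hypothetical helper, and the 5-second full-asset page load is an assumed figure, not a measurement.

```python
import math


def load_profile(users: int, window_s: float, avg_session_s: float) -> dict:
    """Convert "N users within a time window" into an arrival rate and an
    estimate of how many browser sessions are open at once, using Little's
    law: average concurrency = arrival rate x average session duration."""
    rate = users / window_s              # new sessions started per second
    concurrency = rate * avg_session_s   # sessions in flight on average
    return {
        "arrivals_per_s": rate,
        "avg_concurrent_sessions": math.ceil(concurrency),
    }


# 10k users within the same minute; the 5 s full-asset load time is a
# hypothetical placeholder, so measure the real value first.
print(load_profile(users=10_000, window_s=60, avg_session_s=5))
```

With those inputs it works out to roughly 167 new sessions per second and about 834 browsers open at once on average. Arrivals are rarely perfectly even within the minute, so treat the average as a floor for the peak.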
My thought process:
- There is a central supervisor with a known hostname that can “distribute” method calls to its children.
- There is an unbounded number of child nodes, all listening to the supervisor.
- When the supervisor sends a function (e.g. fn -> visit the page and return the results I’m looking for), the child that receives it reports the result back up the chain.
- The supervisor receives the result and adds it to some form of log (a database, Redis, a text file, or something else).
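The supervisor/child flow in the list above can be sketched locally as a supervisor that submits the same task function to a pool of workers, collects each result as it completes, and appends it to a JSON-lines log. This Python sketch assumes threads can stand in for the separate child nodes; `visit_page` is a hypothetical placeholder that only sleeps and returns fake timings instead of driving PhantomJS/Wallaby, and `supervise` and `results.jsonl` are illustrative names.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def visit_page(url: str) -> Dict:
    """Hypothetical stand-in for a headless-browser task. It sleeps briefly
    and returns fake timing data instead of loading a real page."""
    start = time.perf_counter()
    time.sleep(0.01)  # pretend to load the page and all of its assets
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {"url": url, "ok": True, "load_ms": elapsed_ms}


def supervise(task: Callable[[str], Dict], urls: List[str],
              workers: int, log_path: str) -> List[Dict]:
    """Supervisor: hand the task to a pool of workers, collect each result
    as it comes back, and append it to a JSON-lines log file."""
    results = []
    with open(log_path, "a", encoding="utf-8") as log:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {pool.submit(task, url): url for url in urls}
            for fut in as_completed(futures):
                try:
                    result = fut.result()
                except Exception as exc:  # a failed visit is a result too
                    result = {"url": futures[fut], "ok": False,
                              "error": repr(exc)}
                log.write(json.dumps(result) + "\n")
                results.append(result)
    return results


if __name__ == "__main__":
    out = supervise(visit_page, ["https://example.com/landing"] * 20,
                    workers=5, log_path="results.jsonl")
    print(len(out), "results logged")
```

Recording failures as ordinary results keeps the log complete, which matters when the question is how the server behaved under load. Across real machines, the submit/collect loop would sit behind a message queue or RPC layer rather than an in-process pool.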
Some notes:
When running PhantomJS/Wallaby, each instance is limited by memory, which is why I want to distribute the work across a number of instances that all watch a common supervisor.
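The memory constraint can be turned into a rough instance count. The sketch assumes every browser session costs a fixed amount of memory, which is a simplification; `instances_needed` is a hypothetical helper, and the 150 MB per session and 3,000 MB usable per instance are placeholder figures to be replaced with measurements.

```python
import math


def instances_needed(concurrent_sessions: int, mb_per_session: float,
                     usable_mb_per_instance: float) -> int:
    """Rough count of machines/containers needed when each headless-browser
    session is assumed to use a fixed amount of memory."""
    sessions_per_instance = int(usable_mb_per_instance // mb_per_session)
    if sessions_per_instance < 1:
        raise ValueError("a single session does not fit in one instance")
    # Round up: a partially filled instance still has to exist.
    return math.ceil(concurrent_sessions / sessions_per_instance)


# Placeholder figures; measure real per-session memory before relying on this.
print(instances_needed(834, mb_per_session=150, usable_mb_per_instance=3000))
```

With these placeholder numbers, 20 sessions fit per instance, so 834 concurrent sessions would need 42 instances.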
I’m interested in architecture suggestions for implementing this. Should I use Flow? Is this approach Docker-compatible? What are some options for logging and generating a report based on the activity of all the child nodes?
I appreciate your time and look forward to your suggestions.