Distributed headless browser load tester

dezmathio · May 25, 2017, 11:24pm

I have a concept for a distributed, cluster-wide task system, and I’m looking for advice on which implementation strategies I could build it with.

The high-level case:

I want to spin up an “infinite” number of workers to perform some headless browser tasks using PhantomJS/Wallaby.
They will all report back to a main supervisor, where I can aggregate/parse the results.

My real use case:

I want to determine the impact that a large email blast linking to one of our servers would have from a “full asset standpoint”
(loading JS, images, and other assets using a headless browser),
e.g.:
10k users opening a page within the same minute
1k authenticated users performing the same action, etc.
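
For a rough sense of scale, the 10k-in-a-minute case can be turned into a concurrency number with Little's law (average loads in flight = arrival rate × time per load). This is only a back-of-the-envelope sketch: the 3 second page-load time and the 10 browsers per node are made-up assumptions, not measurements, and the function names are hypothetical.

```python
import math


def required_concurrency(page_loads: int, window_s: float, load_time_s: float) -> int:
    """Little's law: average loads in flight = arrival rate * time per load."""
    # Multiply before dividing so round numbers stay exact in floating point.
    return math.ceil(page_loads * load_time_s / window_s)


def required_nodes(concurrency: int, browsers_per_node: int) -> int:
    """How many nodes are needed if each can host a fixed number of browsers."""
    return math.ceil(concurrency / browsers_per_node)


# ASSUMED values: 3 s per full page load, 10 headless browsers per node.
concurrent = required_concurrency(page_loads=10_000, window_s=60, load_time_s=3.0)
nodes = required_nodes(concurrent, browsers_per_node=10)
print(concurrent, nodes)  # prints: 500 50
```

Under those assumptions you would need about 500 browsers running at once, which is where the per-instance memory limit starts to dictate the node count.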

My thought process:

There is a central supervisor that has a known hostname and can “distribute” method calls to its children.
There is an unlimited/indefinite number of child nodes that all listen to the supervisor.
Once the supervisor sends a function (e.g. fn -> visit the page and return the results you are looking for), the child that receives the function reports the result back up the chain.
The supervisor receives the result and adds it to some form of log (db/redis/txt file/???).
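
The flow described above (send one task to many children, gather what comes back, summarize it) can be sketched in a single process. This is a minimal illustration, not a distributed implementation: a thread pool stands in for the child nodes, and `fake_visit`, `Result`, `run_test`, and `summarize` are hypothetical names. `fake_visit` returns deterministic made-up timings instead of driving a real headless browser.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    worker_id: int
    ok: bool
    elapsed_ms: float


def fake_visit(worker_id: int) -> Result:
    """Hypothetical stand-in for 'load the page in a headless browser and time it'."""
    # Deterministic fake data: every 10th worker "fails", timings grow with the id.
    return Result(worker_id=worker_id, ok=worker_id % 10 != 0, elapsed_ms=100.0 + worker_id)


def run_test(task: Callable[[int], Result], workers: int) -> List[Result]:
    """Coordinator: hand the same task to every worker and gather the reports."""
    results: List[Result] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task, i) for i in range(workers)]
        for future in as_completed(futures):
            results.append(future.result())  # the "report back up the chain" step
    return results


def summarize(results: List[Result]) -> dict:
    """Aggregate the reports into the kind of numbers a final report would show."""
    ok = [r for r in results if r.ok]
    return {
        "total": len(results),
        "failed": len(results) - len(ok),
        "avg_ms": sum(r.elapsed_ms for r in ok) / len(ok) if ok else None,
    }


summary = summarize(run_test(fake_visit, workers=20))
print(summary)  # prints: {'total': 20, 'failed': 2, 'avg_ms': 110.0}
```

In a real cluster the pool would be replaced by processes on remote nodes, and the summary would be written to whatever log or datastore you settle on.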

Some notes:

When running PhantomJS/Wallaby there are memory limitations per instance, hence the idea of distributing this across a number of instances that all watch a common supervisor.

I am interested in architecture suggestions for implementing this. Do I use Flow? Is this Docker compatible? What are some options for logging/generating a report based on the activity of all the child nodes?

Appreciate your time & look forward to your suggestions.

Azolo · May 26, 2017, 1:33am

One supervisor per node; make that supervisor a simple supervisor.

Use a pubsub that is optimized to be distributed. phoenix_pubsub would work fine.

Alternatively, you can set up a GenStage pipeline where each process registers a demand of 1 with the producer.

The pubsub approach is going to execute faster, but each process runs the job only once per message. With the GenStage solution, the dispatch isn’t going to be distributed, but it is going to be more tunable. Or just mix and match as you please.
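
To make the difference between the two dispatch styles concrete, here is a single-process sketch. It is not phoenix_pubsub or GenStage, only an illustration of the two ideas; `Broadcast`, `DemandQueue`, `run_broadcast`, and `run_demand` are hypothetical names. With broadcast, one published job runs once on every subscriber. With demand, jobs stay with the producer until a worker asks for one.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


class Broadcast:
    """Pubsub style: a published job reaches every subscriber exactly once."""

    def __init__(self) -> None:
        self.subscribers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, job: str) -> None:
        for handler in self.subscribers:
            handler(job)


class DemandQueue:
    """Demand style: jobs stay with the producer until a worker asks for some."""

    def __init__(self, jobs: Iterable[str]) -> None:
        self.jobs = deque(jobs)

    def ask(self, demand: int = 1) -> List[str]:
        batch: List[str] = []
        while self.jobs and len(batch) < demand:
            batch.append(self.jobs.popleft())
        return batch


def run_broadcast(worker_ids: List[str], job: str) -> Dict[str, List[str]]:
    ran: Dict[str, List[str]] = {w: [] for w in worker_ids}
    bus = Broadcast()
    for w in worker_ids:
        # Bind w as a default argument so each handler records under its own id.
        bus.subscribe(lambda j, w=w: ran[w].append(j))
    bus.publish(job)
    return ran


def run_demand(worker_ids: List[str], jobs: Iterable[str]) -> Dict[str, List[str]]:
    ran: Dict[str, List[str]] = {w: [] for w in worker_ids}
    queue = DemandQueue(jobs)
    progress = True
    while progress:  # stop after a full round in which nobody received a job
        progress = False
        for w in worker_ids:
            got = queue.ask(1)  # each worker asks for a demand of 1
            ran[w].extend(got)
            progress = progress or bool(got)
    return ran


print(run_broadcast(["a", "b", "c"], "visit /landing"))
print(run_demand(["a", "b", "c"], [f"job{i}" for i in range(7)]))
```

The workers here take turns only to keep the output deterministic. In a demand-driven system a worker asks again as soon as it finishes, so faster workers naturally pull more jobs, which is the "more tunable" part.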

You’ll have a listener of some kind to receive your input for the test. Just have that process emit a pubsub message or register events with the GenStage producer.

Have another process, I would do one per node, that receives the data. If you can, just write to whatever datastore you use directly in that process. If you have a lot of backpressure, use a GenStage pipeline to accumulate and batch write.
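
The "accumulate and batch write" idea can be sketched as a small buffer that flushes whenever it fills up. This is only an illustration under assumptions: `BatchingCollector` and `write_batch` are hypothetical names, `write_batch` stands in for whatever datastore call you end up using, and the batch size of 100 is arbitrary. A real collector would probably also flush on a timer so a slow trickle of results doesn't sit in memory.

```python
from typing import Callable, List


class BatchingCollector:
    """Buffers incoming results and writes them out in fixed-size batches."""

    def __init__(self, write_batch: Callable[[List[dict]], None], batch_size: int = 100) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.write_batch = write_batch
        self.batch_size = batch_size
        self.buffer: List[dict] = []

    def add(self, result: dict) -> None:
        self.buffer.append(result)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write out whatever is buffered; call once more at the end of a run."""
        if self.buffer:
            batch, self.buffer = self.buffer, []  # hand off the list, start a fresh one
            self.write_batch(batch)


# Usage: collect the size of each batch instead of hitting a real datastore.
written_sizes: List[int] = []
collector = BatchingCollector(lambda batch: written_sizes.append(len(batch)), batch_size=100)
for i in range(250):
    collector.add({"worker_id": i, "elapsed_ms": 100.0})
collector.flush()  # push out the final partial batch
print(written_sizes)  # prints: [100, 100, 50]
```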

I don’t think you would want to use Flow. You’re going to need your processes spun up beforehand.

What isn’t Docker compatible? Just depends on how you want to architect things.

As for logging/reporting options: no idea, but I’m sure there are tools that will do all of this for you. If you want to write your own, it probably means that none of those options work for you and you decided to make something that does.