We are building an application that handles a lot of import requests, each of which spawns one or more processes: one per resource to be imported.
The concern of a resource-worker is:
- fetch data from the external API,
- transform it into our internal data format,
- return it to the batch process (which bundles all resources of a request together before returning them to the requester in one go).
The actual resources come from an external API, and a single API account is only allowed n requests per second, m queries every 15 minutes, and so on.
My idea was to create a pool of n HTTP-client workers and have all resource-workers request their data through this pool.
But here is the catch: What is the best way to communicate between the resource-workers and the pool?
- Should all resource-workers continuously try to open a pool connection (i.e. each resource-worker loops until one can be established)? This seems to produce an insane amount of message passing, since every resource-worker keeps asking ‘can I be helped yet?’ all the time.
- Should resource-workers be placed in a queue inside the pool server, and receive a worker PID once they are next in line? Caveat: what should a resource-worker do while it waits? Should it monitor the pool server to make sure it will still be helped eventually? (When the pool server crashes, the queue is gone.)
- Or is there another alternative entirely?