I’m struggling to understand a problem I’m facing, and I’m not sure if there’s an existing solution.
In the project I’m working on, we need to create embeddings using Bumblebee in conjunction with pgvector. Currently, all nodes in the cluster are running the same release. However, with the new requirements for embeddings generation, I want some nodes to have more powerful GPUs. These nodes will detect the presence of a GPU using an environment variable and will start the Bumblebee Serving.
The core idea is that all nodes can request embeddings generation, but the actual worker processes will only run on the GPU nodes.
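To make the setup concrete, here is a minimal sketch of how I imagine the conditional startup. The module name `MyApp.EmbeddingNode`, the env var `EMBEDDINGS_GPU`, and the serving name `MyApp.EmbeddingServing` are placeholders I made up for illustration, and `build_serving` stands in for whatever function builds the Bumblebee serving. The batch options are arbitrary example values.

```elixir
defmodule MyApp.EmbeddingNode do
  # Hypothetical env var name; any flag set only on the GPU machines would do.
  @gpu_env "EMBEDDINGS_GPU"

  # True when the (assumed) GPU flag is set in the given environment map.
  def gpu_node?(env \\ System.get_env()) do
    Map.get(env, @gpu_env) in ["1", "true"]
  end

  # Child specs to add to the supervision tree.
  # `build_serving` is a zero-arity function returning the serving; it is only
  # called on GPU nodes, so other nodes never load the model.
  def children(build_serving, env \\ System.get_env()) do
    if gpu_node?(env) do
      [
        {Nx.Serving,
         serving: build_serving.(),
         name: MyApp.EmbeddingServing,
         batch_size: 8,
         batch_timeout: 100}
      ]
    else
      []
    end
  end
end
```

On a GPU node the returned child spec would go into the application's supervision tree. On every other node the list is empty, which is exactly why callers there need some way to find a worker running elsewhere in the cluster.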
I’m looking for a solution that combines a distributed registry with something like Poolboy, which would allow for round-robin or other load balancing on the registered processes. I’ve looked into HordeRegistry, but it seems to be designed for single processes only.
I have some ideas for building a custom solution from scratch, but I have a feeling that my requirements—such as service discovery and load distribution—might be met with an off-the-shelf solution.
Do you have any pointers?
Thanks,
Udo