I am going to use GenStage and Flow to build a bioinformatics pipeline for analyzing tumor samples. The pipeline takes millions of reads from several samples and needs to launch external programs, built in separate Docker images, to process those reads. One of the most important is bwa, which aligns the reads to the reference genome. It is very computationally intensive and needs to run with multiple threads. Could someone please tell me how to achieve this?
This depends on multiple things…
- Does your application itself run in Docker?
- Is the other application started once and then communicates with your app over the network or by other means, or is it started once per job and input?
If your app runs in Docker and the other application is daemon-like, let Docker Compose manage both and send data between them; you’ll probably need to implement the protocol yourself.
If your app does not run in Docker, and the other app does and runs as single jobs, just start it via System.cmd/3, as you would from the command line.
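A minimal sketch of that approach, assuming a host-side Elixir app that launches one container per job. The module name, the image name example/bwa:latest, and the /data mount point are hypothetical placeholders, not anything from your setup; the only bwa option used is -t, which sets the thread count for bwa mem.

```elixir
defmodule BwaJob do
  # Hypothetical image name and container mount point; substitute your own.
  @image "example/bwa:latest"
  @mount "/data"

  # Builds the argument list for `docker run`, kept separate from the call
  # itself so it can be inspected without Docker being installed.
  def docker_args(data_dir, ref, reads, threads) do
    [
      "run", "--rm",
      # Mount the host directory holding the reference and the reads.
      "-v", "#{data_dir}:#{@mount}",
      @image,
      # `-t` tells bwa mem how many threads to use.
      "bwa", "mem", "-t", Integer.to_string(threads),
      "#{@mount}/#{ref}", "#{@mount}/#{reads}"
    ]
  end

  # Runs one alignment job and streams the container's stdout (the SAM
  # output) into `out_path` instead of collecting it in memory.
  def run(data_dir, ref, reads, out_path, threads \\ System.schedulers_online()) do
    args = docker_args(data_dir, ref, reads, threads)

    case System.cmd("docker", args, into: File.stream!(out_path)) do
      {_stream, 0} -> {:ok, out_path}
      {_stream, status} -> {:error, status}
    end
  end
end
```

Something like BwaJob.run("/tmp/samples", "ref.fa", "reads.fq", "/tmp/samples/out.sam", 8) would then block until the container exits. Since each job already uses several threads, you would probably want only a few of these calls running at once in your pipeline.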
If your app does run in Docker and the other application is run once per job, you can either use Docker-in-Docker to start the other app in a container inside your container (not a good idea), or use an external provisioning service to manage that.