I want to use GenStage to rewrite bioinformatics pipelines. Processing BAM files is one of the most important tasks in these pipelines. BAM files are compressed using BGZF. Is there an Elixir package for dealing with BAM files? If not, could you please give me some hints about how to work with them?
I suspect you would need to write the relevant code to handle reading/writing BAM files yourself. I just started looking into The SAM/BAM Format Specification, which describes the binary structure of both BGZF and BAM. It seems it is all in there… but… well… there is so much of it.
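To give a feel for the BGZF part of that specification, here is a minimal sketch of splitting a binary into BGZF blocks and inflating each one. It assumes the layout the spec describes: every block is a gzip member whose extra field carries a `BC` subfield holding BSIZE (total block size minus 1). `BgzfSketch` is a made-up module name, not an existing package, and the code only relies on `:zlib.unzip/1` and `:erlang.crc32/1` from the Erlang standard library. It does not parse the BAM records themselves.

```elixir
defmodule BgzfSketch do
  @moduledoc "Hypothetical helper for illustration only; not an existing package."

  # Inflate every BGZF block in `data` and return the concatenated payload.
  def decompress(data) when is_binary(data), do: decompress(data, [])

  defp decompress(<<>>, acc), do: acc |> Enum.reverse() |> IO.iodata_to_binary()

  # gzip header with FLG = 4 (extra field present), then XLEN bytes of subfields.
  defp decompress(
         <<31, 139, 8, 4, _mtime::little-32, _xfl, _os, xlen::little-16,
           extra::binary-size(xlen), rest::binary>>,
         acc
       ) do
    # Compressed data length is BSIZE - XLEN - 19 (12-byte header, 8-byte trailer).
    cdata_len = find_bsize(extra) - xlen - 19

    <<cdata::binary-size(cdata_len), crc::little-32, isize::little-32, next::binary>> = rest

    # The payload is a raw DEFLATE stream, so use unzip/1 rather than gunzip/1.
    inflated = :zlib.unzip(cdata)

    # Check the trailer: CRC32 and size of the uncompressed data.
    ^crc = :erlang.crc32(inflated)
    ^isize = byte_size(inflated)

    decompress(next, [inflated | acc])
  end

  # Walk the gzip extra subfields until the "BC" one that holds BSIZE.
  defp find_bsize(<<?B, ?C, 2::little-16, bsize::little-16, _::binary>>), do: bsize

  defp find_bsize(<<_si1, _si2, slen::little-16, _skip::binary-size(slen), rest::binary>>),
    do: find_bsize(rest)
end
```

Something like `File.read!("sample.bam") |> BgzfSketch.decompress()` (the file name is just a placeholder) should then return the uncompressed BAM stream, which per the spec starts with the magic bytes `BAM\1`. This reads the whole file into memory; a real pipeline would rather read block by block.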
Have you started working on it? If so, do you have a GitHub repo?
@zhangzhen, maybe you can call into some C code via a port to decode it, until a library in Erlang or Elixir appears?
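A rough sketch of that idea, under the assumption that a C tool such as `samtools` is installed and on the PATH (the thread does not name a specific tool). It uses `System.cmd/3`, which runs the external program through a port under the hood, and returns the decoded SAM text as a list of lines. `ExternalBam` and its function are hypothetical names.

```elixir
defmodule ExternalBam do
  @moduledoc "Hypothetical wrapper for illustration; assumes an external decoder on PATH."

  # Runs `cmd args... bam_path` and splits its stdout into lines.
  # The defaults assume `samtools view -h`, which prints header and records as SAM text.
  def sam_lines(bam_path, cmd \\ "samtools", args \\ ["view", "-h"]) do
    case System.cmd(cmd, args ++ [bam_path]) do
      {out, 0} -> {:ok, String.split(out, "\n", trim: true)}
      {_out, status} -> {:error, {:exit_status, status}}
    end
  end
end
```

`System.cmd/3` collects the whole output before returning, so this only suits small files. For a GenStage producer you would probably open the port yourself with `Port.open/2` and consume the output incrementally.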
No, I have not; I have only skimmed the specification document so far…
I got intrigued by the question because, a long time ago, I wrote some glue scripts for processing sequencing data, multiple alignments, primer search…
Anyway, I feel it is a big effort for what would (unfortunately for me) be just a pet project, even though I understand Elixir is very well suited to the task and I would learn a lot from it!
Moreover, I feel that Elixir has a lot of potential in bioinformatics, but this still seems to be an almost unexplored domain.