Scalable Genotyping for Large Cohorts #162

ytt01 · 2025-03-05T09:32:56Z

I aligned ONT reads with Minimap2 and called structural variants (SVs) for the ONT cohort with Sniffles. These SVs serve as a reference panel for genotyping short-read data. However, because of the large number of short-read samples, my server cannot jointly genotype all samples at once.
Graphtyper’s documentation suggests providing all BAMs together for optimal genotyping, but splitting samples into sub-cohorts seems necessary for computational feasibility.
My questions are:
1. Does Graphtyper support merging genotype results from separately processed sub-cohorts (e.g., population-specific batches) into a unified VCF?
2. Are there recommended tools or built-in functions in Graphtyper to merge sub-cohort VCFs while resolving potential conflicts (e.g., duplicate variants, inconsistent FORMAT/INFO fields)?
3. If merging is possible, what steps or precautions should be taken to ensure consistency (e.g., handling reference panels, avoiding batch effects)?
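To make the split-and-merge idea concrete, here is a minimal sketch of what I have in mind. It is not a confirmed Graphtyper workflow. It assumes that every sub-cohort is genotyped against the same SV site list, that sample names are unique across batches, and that the per-batch VCFs are bgzipped and indexed so that `bcftools merge` can combine them. The helper names, file names, and batch size are all hypothetical.

```python
from typing import List


def split_into_batches(bam_paths: List[str], batch_size: int) -> List[List[str]]:
    """Split a list of BAM paths into consecutive sub-cohorts of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [bam_paths[i:i + batch_size] for i in range(0, len(bam_paths), batch_size)]


def bcftools_merge_command(batch_vcfs: List[str], out_vcf: str) -> List[str]:
    """Build (but do not run) a bcftools merge command for per-batch VCFs.

    Assumption: each input is bgzipped and indexed, and all batches were
    genotyped at the same sites, so records line up across files.
    """
    if len(batch_vcfs) < 2:
        raise ValueError("need at least two VCFs to merge")
    return ["bcftools", "merge", "-O", "z", "-o", out_vcf, *batch_vcfs]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    bams = [f"sample{i}.bam" for i in range(1, 6)]
    for n, batch in enumerate(split_into_batches(bams, batch_size=2), start=1):
        print(f"batch {n}: {batch}")
    print(" ".join(bcftools_merge_command(
        ["batch1.vcf.gz", "batch2.vcf.gz", "batch3.vcf.gz"], "merged.vcf.gz")))
```

Each batch would be genotyped separately between the two steps. Whether genotypes produced this way are comparable to a single joint run is exactly what I am unsure about.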
This workflow is critical for scaling to large cohorts. Any guidance would be greatly appreciated!
