
Scalable Genotyping for Large Cohorts #162


Open
ytt01 opened this issue Mar 5, 2025 · 0 comments


ytt01 commented Mar 5, 2025

I used Minimap2 to align ONT data and called structural variants (SVs) for an ONT cohort using Sniffles. These SVs are used as a reference panel to genotype short-read data. However, due to the large number of short-read samples, my server cannot handle joint genotyping of all samples simultaneously.
Graphtyper’s documentation suggests providing all BAMs together for optimal genotyping, but splitting samples into sub-cohorts seems necessary for computational feasibility.
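For context, the split I have in mind can be sketched roughly as follows. This is only an illustration of the batching step, not of any Graphtyper feature: the function names, the `subcohort_NNN.bamlist` file naming, and the batch size are all hypothetical, and each resulting list would be genotyped in its own run against the same SV panel.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterator, Sequence


def split_into_batches(bam_paths: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive sub-cohorts of at most `batch_size` BAM paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(bam_paths), batch_size):
        yield list(bam_paths[start:start + batch_size])


def write_batch_lists(bam_paths: Sequence[str], batch_size: int, out_dir: str) -> list[Path]:
    """Write one plain-text list of BAM paths per sub-cohort (hypothetical naming)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, batch in enumerate(split_into_batches(bam_paths, batch_size)):
        list_file = out / f"subcohort_{index:03d}.bamlist"
        # One BAM path per line, so each file describes one independent run.
        list_file.write_text("\n".join(batch) + "\n")
        written.append(list_file)
    return written
```

Batching by a fixed size is just the simplest option; grouping by population would work the same way, with one list file per group.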
My questions are:

1. Does Graphtyper support merging genotype results from separately processed sub-cohorts (e.g., population-specific batches) into a unified VCF?
2. Are there recommended tools or built-in functions in Graphtyper to merge sub-cohort VCFs while resolving potential conflicts (e.g., duplicate variants, inconsistent FORMAT/INFO fields)?
3. If merging is possible, what steps or precautions should be taken to ensure consistency (e.g., handling reference panels, avoiding batch effects)?

This workflow is critical for scaling to large cohorts. Any guidance would be greatly appreciated!
