
[hifiasm_0.25.0-r726] Single k-mer peak (16X) contradicts high heterozygosity ratio (289:1) in assembly #800


Open
darkxin12 opened this issue Mar 21, 2025 · 4 comments


@darkxin12

I am seeking guidance to resolve an apparent inconsistency in my analysis. While learning to use hifiasm for a genome assembly project, I noticed two results that I might be misunderstanding:

1. k-mer profile: The k-mer coverage distribution shows a single peak at ~16X ([M::ha_ft_gen] peak_hom: 16), which initially suggested a homozygous genome based on standard k-mer theory.
2. Heterozygosity statistics: However, the log reports a high heterozygosity ratio ([M::stat] heterozygous:homozygous bases ≈289:1), which typically implies a highly heterozygous genome with bimodal k-mer peaks.

[M::ha_analyze_count] lowest: count[5] = 11210960
[M::ha_analyze_count] highest: count[16] = 73404125
[M::ha_hist_line] 2: ********** 7517307
[M::ha_hist_line] 3: ******** 5950814
[M::ha_hist_line] 4: *********** 7923837
[M::ha_hist_line] 5: *************** 11210960
[M::ha_hist_line] 6: ********************* 15651122
[M::ha_hist_line] 7: ***************************** 21069458
[M::ha_hist_line] 8: ************************************** 27882266
[M::ha_hist_line] 9: ************************************************ 35166958
[M::ha_hist_line] 10: *********************************************************** 43039027
[M::ha_hist_line] 11: ********************************************************************* 50968341
[M::ha_hist_line] 12: ******************************************************************************** 58504231
[M::ha_hist_line] 13: ***************************************************************************************** 65161626
[M::ha_hist_line] 14: ************************************************************************************************ 70143067
[M::ha_hist_line] 15: *************************************************************************************************** 72966931
[M::ha_hist_line] 16: **************************************************************************************************** 73404125
[M::ha_hist_line] 17: ************************************************************************************************* 71437531
[M::ha_hist_line] 18: ******************************************************************************************** 67359635
[M::ha_hist_line] 19: ************************************************************************************ 61349348
[M::ha_hist_line] 20: ************************************************************************** 54563791
[M::ha_hist_line] 21: ***************************************************************** 47861182
[M::ha_hist_line] 22: ********************************************************* 41575691
[M::ha_hist_line] 23: ************************************************* 36265920
[M::ha_hist_line] 24: ******************************************** 32037788
[M::ha_hist_line] 25: *************************************** 28721116
[M::ha_hist_line] 26: ************************************ 26486023
[M::ha_hist_line] 27: ********************************** 25044742
[M::ha_hist_line] 28: ********************************* 24079502
[M::ha_hist_line] 29: ******************************** 23632477
[M::ha_hist_line] 30: ******************************** 23467603
[M::ha_hist_line] 31: ******************************** 23153048
[M::ha_hist_line] 32: ******************************* 23015716
[M::ha_hist_line] 33: ****************************** 22380247
[M::ha_hist_line] 34: ***************************** 21389297
[M::ha_hist_line] 35: **************************** 20376062
[M::ha_hist_line] 36: ************************** 19146936
[M::ha_hist_line] 37: ************************ 17626954
[M::ha_hist_line] 38: ********************** 15984342
[M::ha_hist_line] 39: ******************** 14378007
[M::ha_hist_line] 40: ***************** 12787803
[M::ha_hist_line] 41: *************** 11132014
[M::ha_hist_line] 42: ************* 9605896
[M::ha_hist_line] 43: *********** 8143294
[M::ha_hist_line] 44: ********* 6926763
[M::ha_hist_line] 45: ******** 5801225
[M::ha_hist_line] 46: ******* 4850012
[M::ha_hist_line] 47: ****** 4048750
[M::ha_hist_line] 48: ***** 3410010
[M::ha_hist_line] 49: **** 2845662
[M::ha_hist_line] 50: *** 2418978
[M::ha_hist_line] 51: *** 2045373
[M::ha_hist_line] 52: ** 1798807
[M::ha_hist_line] 53: ** 1573588
[M::ha_hist_line] 54: ** 1425472
[M::ha_hist_line] 55: ** 1288114
[M::ha_hist_line] 56: ** 1208866
[M::ha_hist_line] 57: ** 1117988
[M::ha_hist_line] 58: * 1034859
[M::ha_hist_line] 59: * 973983
[M::ha_hist_line] 60: * 928617
[M::ha_hist_line] 61: * 856342
[M::ha_hist_line] 62: * 814749
[M::ha_hist_line] 63: * 773728
[M::ha_hist_line] 64: * 733380
[M::ha_hist_line] 65: * 684491
[M::ha_hist_line] 66: * 661488
[M::ha_hist_line] 67: * 630514
[M::ha_hist_line] 68: * 609039
[M::ha_hist_line] 69: * 575526
[M::ha_hist_line] 70: * 555132
[M::ha_hist_line] 71: * 535845
[M::ha_hist_line] 72: * 517110
[M::ha_hist_line] 73: * 497096
[M::ha_hist_line] 74: * 473363
[M::ha_hist_line] 75: * 449207
[M::ha_hist_line] 76: * 429007
[M::ha_hist_line] 77: * 410004
[M::ha_hist_line] 78: * 400178
[M::ha_hist_line] 79: * 385339
[M::ha_hist_line] 80: * 375382
[M::ha_hist_line] rest: ************************** 19199508
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: none
[M::ha_ft_gen] peak_hom: 16; peak_het: -1
[M::ha_ct_shrink::930.6807.59] ==> counted 19574890 distinct minimizer k-mers
[M::ha_ft_gen::943.417…] ==> filtered out 19574890 k-mers occurring 80 or more times
[M::ha_opt_update_cov] updated max_n_chain to 100
[M::yak_count] collected 1440731874 minimizers
[M::ha_pt_gen::1164.312*11.07] ==> counted 83289253 distinct minimizer k-mers
[M::ha_pt_gen] count[4095] = 0 (for sanity check)
……
[M::stat] # heterozygous bases: 3820503823; # homozygous bases: 13212505
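
The ≈289:1 ratio follows directly from the two counts in the `[M::stat]` line. A minimal Python check, using only the numbers printed in the log (the variable names are illustrative):

```python
# Base counts copied from the [M::stat] line in the log above.
het_bases = 3_820_503_823
hom_bases = 13_212_505

# Ratio of heterozygous to homozygous bases, as reported by hifiasm.
ratio = het_bases / hom_bases

# Share of all counted bases that fall in the heterozygous category.
het_fraction = het_bases / (het_bases + hom_bases)

print(f"het:hom = {ratio:.0f}:1")
print(f"heterozygous fraction = {het_fraction:.2%}")
```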

Could you help me understand:

1. How these observations might coexist under hifiasm’s internal model?
2. Whether adjustments to parameters (e.g., --hom-cov) could reconcile this discrepancy?

I would greatly appreciate any insights to improve my understanding and ensure proper assembly configuration.

hifiasm2.log

@darkxin12 darkxin12 changed the title [hifiasm_0.25.0-r726] Inconsistent homozygous coverage threshold (29X) vs k-mer peak (16X) in highly heterozygous genome [hifiasm_0.25.0-r726] Single k-mer peak (16X) contradicts high heterozygosity ratio (289:1) in assembly Mar 21, 2025
@olekto

olekto commented Mar 21, 2025

Hi,
you have a nice shoulder around 30-32x coverage, so I would not classify this distribution as "single peak". It is rather quite heterozygous (with the homozygous k-mers at 32x or so) as you say.
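
One rough way to check for such a shoulder numerically is to look, past the main peak, for the depth where the histogram declines most slowly. The sketch below uses counts copied from the `[M::ha_hist_line]` output in this issue. The 1.5x to 2.5x search window and the helper names are assumptions made for illustration, and this is not how hifiasm itself chooses peak_hom or peak_het.

```python
# Counts copied from the [M::ha_hist_line] lines in this issue
# (k-mer depth -> number of k-mers), depths 12 to 40 only.
hist = {
    12: 58504231, 13: 65161626, 14: 70143067, 15: 72966931, 16: 73404125,
    17: 71437531, 18: 67359635, 19: 61349348, 20: 54563791, 21: 47861182,
    22: 41575691, 23: 36265920, 24: 32037788, 25: 28721116, 26: 26486023,
    27: 25044742, 28: 24079502, 29: 23632477, 30: 23467603, 31: 23153048,
    32: 23015716, 33: 22380247, 34: 21389297, 35: 20376062, 36: 19146936,
    37: 17626954, 38: 15984342, 39: 14378007, 40: 12787803,
}


def main_peak(hist):
    """Depth with the largest k-mer count."""
    return max(hist, key=hist.get)


def flattest_step(hist, lo, hi):
    """Depth d in [lo, hi) where the drop hist[d] - hist[d + 1] is smallest."""
    return min(range(lo, hi), key=lambda d: hist[d] - hist[d + 1])


peak = main_peak(hist)

# Assumed heuristic: a second (homozygous) peak is expected near 2x the
# first one, so search between 1.5x and 2.5x of the main peak depth.
lo, hi = int(1.5 * peak), int(2.5 * peak)
shoulder = flattest_step(hist, lo, hi)

print(f"main peak at depth {peak}")
print(f"slowest decline between depth {shoulder} and {shoulder + 1}")
```

On these counts the main peak is at depth 16 and the slowest decline falls between depths 31 and 32, consistent with the shoulder described above.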

@darkxin12
Author

Hi,
Thank you for your insights. I noticed that hifiasm printed:

[M::purge_dups] homozygous read coverage threshold: 29

Since this is close to the homozygous coverage we discussed, do I need to set --hom-cov explicitly and rerun hifiasm, or is the current setting sufficient?

Best regards!

@olekto

olekto commented Mar 21, 2025

Hi,
I don't think you need to do anything unless you get an assembly that is different to what you would expect (whatever that could be).

Ole

@darkxin12
Author

Thanks for the suggestion!
In our current phased (haplotype-resolved) assembly, the primary contigs (p_ctg) are about 20% larger than the latest version of the reference genome (Ref: 1.6 Gb vs. Assembly: 2.0 Gb).
Specifically, haplotype 1 (hap1) approximates the reference size at 1.6 Gb, whereas haplotype 2 (hap2) reaches 2.0 Gb.
I am testing smaller values for -s.
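
For reference, the relative size difference can be computed from the rounded figures quoted above. This is plain arithmetic on 1.6 Gb and 2.0 Gb. The unrounded assembly sizes are not given in the thread, so the exact percentage may differ.

```python
# Rounded sizes in Gb, as quoted in the comment above.
ref_gb = 1.6
hap1_gb = 1.6
hap2_gb = 2.0

# Relative increase of the larger haplotype over the reference.
increase = (hap2_gb - ref_gb) / ref_gb

# Absolute size gap between the two haplotypes.
hap_gap_gb = hap2_gb - hap1_gb

print(f"hap2 vs reference: +{increase:.0%}")
print(f"hap2 - hap1: {hap_gap_gb:.1f} Gb")
```

With these rounded values the increase comes out at 25% rather than 20%, which may simply reflect rounding of the sizes.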
