Are there recommended pipelines/methods for dereplicating LARGE custom databases? #43
I was hoping to find the process for dereplicating the provided sylph databases (e.g. from the GTDB). In lieu of that, are there any recommended pipelines/methods for dereplicating large custom databases? I'm not a bioinformatician, but I can't make use of the segmented genomes after profiling with gtdb-r220-c200-dbv1.syldb.
@bryantmurphy The GTDB-R220 database consists of the species-level dereplicated genomes from GTDB. GTDB has its own pipeline for dereplication; see https://academic.oup.com/nar/article/50/D1/D785/6370255

For dereplicating large custom databases, possible tools include https://github.com/MrOlm/drep (quite popular) and https://github.com/raufs/skDER.
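If it helps to see what "dereplication" means conceptually, here is a minimal Python sketch of greedy representative selection. It is only an illustration under stated assumptions, not the algorithm used by GTDB, dRep, or skDER: the function name `greedy_dereplicate`, the genome names, and the ANI values are all hypothetical, and the pairwise ANI table is assumed to have been computed beforehand by a separate tool. The 95% default mirrors the commonly used species-level ANI cutoff.

```python
from typing import Dict, FrozenSet, List, Tuple


def greedy_dereplicate(
    genomes: List[str],
    ani: Dict[FrozenSet[str], float],
    threshold: float = 95.0,
) -> Tuple[List[str], Dict[str, str]]:
    """Pick one representative per cluster of near-identical genomes.

    genomes:   genome names, ordered from most to least preferred
               (e.g. by assembly quality); hypothetical input.
    ani:       precomputed pairwise ANI in percent, keyed by an
               unordered pair of genome names; missing pairs are
               treated as 0 (no detectable similarity).
    threshold: ANI at or above which two genomes count as redundant.
    """
    representatives: List[str] = []
    assignment: Dict[str, str] = {}
    for genome in genomes:
        for rep in representatives:
            # Join the first existing representative that is close enough.
            if ani.get(frozenset((genome, rep)), 0.0) >= threshold:
                assignment[genome] = rep
                break
        else:
            # No representative is close enough: start a new cluster.
            representatives.append(genome)
            assignment[genome] = genome
    return representatives, assignment


if __name__ == "__main__":
    # Hypothetical ANI values for three made-up genomes.
    example_ani = {
        frozenset(("genome_A", "genome_B")): 98.5,
        frozenset(("genome_A", "genome_C")): 80.0,
        frozenset(("genome_B", "genome_C")): 81.0,
    }
    reps, clusters = greedy_dereplicate(
        ["genome_A", "genome_B", "genome_C"], example_ani
    )
    print(reps)
    print(clusters)
```

The dedicated tools linked above do considerably more than this (fast ANI estimation, genome quality scoring, clustering at scale), so for a large database they are the practical route; the sketch only shows the core idea of keeping one genome per cluster.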