Commit ae44cab

chore: add reference to lrge for genome size estimation
1 parent dadbbd1 commit ae44cab

1 file changed

+54
-14
lines changed

README.md

@@ -14,26 +14,52 @@
1414

1515
## Table of Contents
1616

17+
- [Table of Contents](#table-of-contents)
1718
- [Motivation](#motivation)
1819
- [Install](#install)
19-
- [`cargo`](#cargo)
20-
- [`conda`](#conda)
21-
- [Container](#container)
22-
- [`homebrew`](#homebrew)
23-
- [Release binaries](#release-binaries)
24-
- [Build locally](#build-locally)
20+
- [`cargo`](#cargo)
21+
- [`conda`](#conda)
22+
- [Container](#container)
23+
- [`singularity`](#singularity)
24+
- [`docker`](#docker)
25+
- [Build locally](#build-locally)
2526
- [Usage](#usage)
26-
- [Basic usage - reads](#basic-usage---reads)
27-
- [Basic usage - alignments](#basic-usage---alignments)
28-
- [Required parameters](#required-parameters)
29-
- [Optional parameters](#optional-parameters)
30-
- [Full usage](#full-usage)
27+
- [Basic usage - reads](#basic-usage---reads)
28+
- [Basic usage - alignments](#basic-usage---alignments)
29+
- [Required parameters](#required-parameters)
30+
- [Input](#input)
31+
- [Coverage](#coverage)
32+
- [`-c`, `--coverage`](#-c---coverage)
33+
- [Genome size](#genome-size)
34+
- [`-g`, `--genome-size`](#-g---genome-size)
35+
- [Optional parameters](#optional-parameters)
36+
- [Output](#output)
37+
- [`-o`, `--output`](#-o---output)
38+
- [Output compression/format](#output-compressionformat)
39+
- [`-O`, `--output-type`](#-o---output-type)
40+
- [Compression level](#compresion-level)
41+
- [`-l`, `--compress-level`](#-l---compress-level)
42+
- [Target number of bases](#target-number-of-bases)
43+
- [`-b`, `--bases`](#-b---bases)
44+
- [Number of reads](#number-of-reads)
45+
- [`-n`, `--num`](#-n---num)
46+
- [Fraction of reads](#fraction-of-reads)
47+
- [`-f`, `--frac`](#-f---frac)
48+
- [Random seed](#random-seed)
49+
- [`-s`, `--seed`](#-s---seed)
50+
- [Verbosity](#verbosity)
51+
- [`-v`](#-v)
52+
- [Full usage](#full-usage)
53+
- [`reads` command](#reads-command)
54+
- [`aln` command](#aln-command)
3155
- [Benchmark](#benchmark)
32-
- [Single long read input](#single-long-read-input)
33-
- [Paired-end input](#paired-end-input)
56+
- [Single long read input](#single-long-read-input)
57+
- [Results](#results)
58+
- [Paired-end input](#paired-end-input)
59+
- [Results](#results-1)
3460
- [Contributing](#contributing)
3561
- [Citing](#citing)
36-
- [Bibtex](#bibtex)
62+
- [Bibtex](#bibtex)
3763

3864
## Motivation
3965

@@ -281,6 +307,20 @@ suffixes include:
281307
Alternatively, a [FASTA/Q index file][faidx] can be given and the genome size will be
282308
set to the sum of all reference sequences in it.
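
As a rough illustration of what "the sum of all reference sequences" means, the sketch below totals the length column of a faidx-style index by hand. The index contents and sequence names are made up for the example, and this is only an approximation of the described behaviour, not how `rasusa` implements it.

```shell
# Build a tiny, hypothetical index so the sketch is self-contained.
# faidx columns: name, length, offset, linebases, linewidth.
fai=$(mktemp)
printf 'chr1\t4600000\t6\t60\t61\n' > "$fai"
printf 'plasmid1\t50000\t4676679\t60\t61\n' >> "$fai"

# Sum the second column (sequence length) across all records.
gsize=$(awk '{ total += $2 } END { printf "%.0f\n", total }' "$fai")
echo "$gsize"

rm -f "$fai"
```

The resulting value could then be passed as `-g "$gsize"`, although giving the index file itself to `-g` achieves the same thing without the extra step.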
283309

310+
> [!TIP]
311+
> If you want to use `rasusa` in a scenario where you don't know what the genome size is,
312+
> such as in an automated pipeline that can take in any kind of organism, you could estimate
313+
> the genome size with something like [`lrge`](https://github.com/mbhall88/lrge) (#shamelessplug).
314+
>
315+
> ```
316+
> $ gsize=$(lrge reads.fq)
317+
> $ rasusa reads -g "$gsize" -c 10 reads.fq
318+
> ```
319+
> `lrge` is designed for long reads. If you want to estimate the genome size from short
320+
> reads, you could use something like [Mash](https://github.com/marbl/Mash) or
321+
> [GenomeScope2](https://github.com/tbenavi1/genomescope2.0). See [the `lrge` docs](https://github.com/mbhall88/lrge)
322+
> for examples of how Mash/GenomeScope2 can be used for this task.
323+
284324
[faidx]: https://www.htslib.org/doc/faidx.html
285325
286326
### Optional parameters
