
Commit a116754

Update docs for metadata

1 parent 6f4c3a0, commit a116754

1 file changed (+64 -62 lines changed)

docs/tools/metadata.md

+64 -62
@@ -1,12 +1,11 @@
-
 # seqfu metadata
 
-Given one (or more) directories containing sequencing reads,
-will produce a metadata file extracting the ID from the filename
-and optionally adding the file paths or read counts.
+Given one (or more) directories containing sequencing reads, this tool produces a metadata file by extracting the ID from the filename and optionally adding file paths or read counts.
 
+## Usage
 ```
 Usage: metadata [options] [<dir>...]
+metadata formats
 
 Prepare mapping files from directory containing FASTQ files
 
@@ -16,11 +15,19 @@ Options:
 -s, --split STR Separator used in filename to identify the sample ID [default: _]
 --pos INT... Which part of the filename is the Sample ID [default: 1]
 
--f, --format TYPE Output format: dadaist, irida, manifest, metaphage, qiime1, qiime2 [default: manifest]
---pe Enforce paired-end reads (not supported)
+-f, --format TYPE Output format: dadaist, irida, manifest, metaphage, qiime1, qiime2, lotus, ampliseq, rnaseq, bactopia, mag [default: manifest]
 -p, --add-path Add the reads absolute path as column
--c, --counts Add the number of reads as a property column
--t, --threads INT Number of simultaneously opened files [default: 2]
+-c, --counts Add the number of reads as a property column (experimental)
+-t, --threads INT Number of simultaneously opened files (legacy: ignored)
+--pe Enforce paired-end reads (not supported)
+--ont Long reads (Oxford Nanopore) [default: false]
+
+GLOBAL OPTIONS
+--abs Force absolute path
+--basename Use basename instead of full path
+--force-tsv Force '\t' separator, otherwise selected by the format
+--force-csv Force ',' separator, otherwise selected by the format
+-R, --rand-meta INT Add a random metadata column with INT categories
 
 FORMAT SPECIFIC OPTIONS
 -P, --project INT Project ID (only for irida)
@@ -29,89 +36,66 @@ Options:
 --meta-default STR Default value for metadata, used in MetaPhage [default: Cond]
 
 -v, --verbose Verbose output
+--debug Debug output
 -h, --help Show this help
 ```
 
 ## Output formats
 
-* manifest (used as import manifest for [Qiime2](https://qiime2.org/) artifacts)
-* qiime1, qiime2 (forward-compatible [qiime1](http://qiime.org/) mapping file; a dedicated [Qiime2](https://qiime2.org/) metadata file is under development)
-* dadaist ([Dadaist2](quadram-institute-bioscience.github.io/dadaist2) compatible metadata)
-* lotus ([Lotus](http://lotus2.earlham.ac.uk/) mapping file - tested with Lotus1)
-* irida ([IRIDA uploader](https://github.com/phac-nml/irida-uploader) sample sheet. Requires `-P PROJECTID`)
-* metaphage ([MetaPhage](https://mattiapandolfovr.github.io/MetaPhage), use `--meta-split`, `--meta-part` and `--meta-default` to customize a Treatment column)
+SeqFu metadata now supports the following output formats:
+
+1. **manifest**: Used as import manifest for [Qiime2](https://qiime2.org/) artifacts.
+2. **qiime1**: Forward-compatible [Qiime1](http://qiime.org/) mapping file.
+3. **qiime2**: [Qiime2](https://qiime2.org/) metadata file.
+4. **dadaist**: [Dadaist2](https://quadram-institute-bioscience.github.io/dadaist2) compatible metadata.
+5. **lotus**: [Lotus](http://lotus2.earlham.ac.uk/) mapping file (tested with Lotus1).
+6. **irida**: [IRIDA uploader](https://github.com/phac-nml/irida-uploader) sample sheet. Requires `-P PROJECTID`.
+7. **metaphage**: [MetaPhage](https://mattiapandolfovr.github.io/MetaPhage) metadata file. Use `--meta-split`, `--meta-part`, and `--meta-default` to customize a Treatment column.
+8. **ampliseq**: [nf-core/ampliseq](https://nf-co.re/ampliseq) metadata file.
+9. **rnaseq**: [nf-core/rnaseq](https://nf-co.re/rnaseq) metadata file.
+10. **bactopia**: [Bactopia](https://bactopia.github.io/) FOFN (File of File Names) file.
+11. **mag**: [nf-core/mag](https://nf-co.re/mag) metadata file.
+
+## New Features
+
+- Support for `--format bactopia` to generate Bactopia FOFN files.
+- Added `--ont` option for long reads (Oxford Nanopore Technology).
+- Enhanced support for various bioinformatics pipelines (ampliseq, rnaseq, mag).
 
 ## Examples
 
-### Manifest
+### Manifest (default)
 
-```
+```bash
 seqfu metadata ./MiSeq_SOP/
 ```
 
-Will produce this output:
+Output:
 ```
 sample-id forward-absolute-filepath reverse-absolute-filepath
 F3D0 /Users/telatin/MiSeq_SOP/F3D0_S188_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D0_S188_L001_R2_001.fastq.gz
 F3D1 /Users/telatin/MiSeq_SOP/F3D1_S189_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D1_S189_L001_R2_001.fastq.gz
-F3D141 /Users/telatin/MiSeq_SOP/F3D141_S207_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D141_S207_L001_R2_001.fastq.gz
-F3D142 /Users/telatin/MiSeq_SOP/F3D142_S208_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D142_S208_L001_R2_001.fastq.gz
-F3D143 /Users/telatin/MiSeq_SOP/F3D143_S209_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D143_S209_L001_R2_001.fastq.gz
-F3D144 /Users/telatin/MiSeq_SOP/F3D144_S210_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D144_S210_L001_R2_001.fastq.gz
-F3D145 /Users/telatin/MiSeq_SOP/F3D145_S211_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D145_S211_L001_R2_001.fastq.gz
-F3D146 /Users/telatin/MiSeq_SOP/F3D146_S212_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D146_S212_L001_R2_001.fastq.gz
-F3D147 /Users/telatin/MiSeq_SOP/F3D147_S213_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D147_S213_L001_R2_001.fastq.gz
-F3D148 /Users/telatin/MiSeq_SOP/F3D148_S214_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D148_S214_L001_R2_001.fastq.gz
-F3D149 /Users/telatin/MiSeq_SOP/F3D149_S215_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D149_S215_L001_R2_001.fastq.gz
-F3D150 /Users/telatin/MiSeq_SOP/F3D150_S216_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D150_S216_L001_R2_001.fastq.gz
-F3D2 /Users/telatin/MiSeq_SOP/F3D2_S190_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D2_S190_L001_R2_001.fastq.gz
-F3D3 /Users/telatin/MiSeq_SOP/F3D3_S191_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D3_S191_L001_R2_001.fastq.gz
-F3D5 /Users/telatin/MiSeq_SOP/F3D5_S193_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D5_S193_L001_R2_001.fastq.gz
-F3D6 /Users/telatin/MiSeq_SOP/F3D6_S194_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D6_S194_L001_R2_001.fastq.gz
-F3D7 /Users/telatin/MiSeq_SOP/F3D7_S195_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D7_S195_L001_R2_001.fastq.gz
-F3D8 /Users/telatin/MiSeq_SOP/F3D8_S196_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D8_S196_L001_R2_001.fastq.gz
-F3D9 /Users/telatin/MiSeq_SOP/F3D9_S197_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/F3D9_S197_L001_R2_001.fastq.gz
-Mock /Users/telatin/MiSeq_SOP/Mock_S280_L001_R1_001.fastq.gz /Users/telatin/MiSeq_SOP/Mock_S280_L001_R2_001.fastq.gz
+...
 ```
 
-### Qiime mapping file
-
-Note that `-f qiime2` will add a second header line.
+### Qiime1 mapping file
 
-```
+```bash
 seqfu metadata MiSeq_SOP -f qiime1 --add-path --counts
 ```
 
 Output:
-
 ```
 #SampleID Counts Paths
 F3D0 7793 F3D0_S188_L001_R1_001.fastq.gz,F3D0_S188_L001_R2_001.fastq.gz
 F3D1 5869 F3D1_S189_L001_R1_001.fastq.gz,F3D1_S189_L001_R2_001.fastq.gz
-F3D141 5958 F3D141_S207_L001_R1_001.fastq.gz,F3D141_S207_L001_R2_001.fastq.gz
-F3D142 3183 F3D142_S208_L001_R1_001.fastq.gz,F3D142_S208_L001_R2_001.fastq.gz
-F3D143 3178 F3D143_S209_L001_R1_001.fastq.gz,F3D143_S209_L001_R2_001.fastq.gz
-F3D144 4827 F3D144_S210_L001_R1_001.fastq.gz,F3D144_S210_L001_R2_001.fastq.gz
-F3D145 7377 F3D145_S211_L001_R1_001.fastq.gz,F3D145_S211_L001_R2_001.fastq.gz
-F3D146 5021 F3D146_S212_L001_R1_001.fastq.gz,F3D146_S212_L001_R2_001.fastq.gz
-F3D147 17070 F3D147_S213_L001_R1_001.fastq.gz,F3D147_S213_L001_R2_001.fastq.gz
-F3D148 12405 F3D148_S214_L001_R1_001.fastq.gz,F3D148_S214_L001_R2_001.fastq.gz
-F3D149 13083 F3D149_S215_L001_R1_001.fastq.gz,F3D149_S215_L001_R2_001.fastq.gz
-F3D150 5509 F3D150_S216_L001_R1_001.fastq.gz,F3D150_S216_L001_R2_001.fastq.gz
-F3D2 19620 F3D2_S190_L001_R1_001.fastq.gz,F3D2_S190_L001_R2_001.fastq.gz
-F3D3 6758 F3D3_S191_L001_R1_001.fastq.gz,F3D3_S191_L001_R2_001.fastq.gz
-F3D5 4448 F3D5_S193_L001_R1_001.fastq.gz,F3D5_S193_L001_R2_001.fastq.gz
-F3D6 7989 F3D6_S194_L001_R1_001.fastq.gz,F3D6_S194_L001_R2_001.fastq.gz
-F3D7 5129 F3D7_S195_L001_R1_001.fastq.gz,F3D7_S195_L001_R2_001.fastq.gz
-F3D8 5294 F3D8_S196_L001_R1_001.fastq.gz,F3D8_S196_L001_R2_001.fastq.gz
-F3D9 7070 F3D9_S197_L001_R1_001.fastq.gz,F3D9_S197_L001_R2_001.fastq.gz
-Mock 4779 Mock_S280_L001_R1_001.fastq.gz,Mock_S280_L001_R2_001.fastq.gz
+...
 ```
 
 ### IRIDA uploader
 
-```
-seqfu metadata -f irida -P 123 data/pe/
+```bash
+seqfu metadata -f irida -P 123 data/pe/
 ```
 
 Output:
@@ -121,7 +105,25 @@ sample1,123,sample1_R1.fq.gz,sample1_R2.fq.gz
 sample2,123,sample2_R1.fq.gz,sample2_R2.fq.gz
 ```
 
+### Bactopia FOFN
+
+```bash
+seqfu metadata -f bactopia data/pe/
+```
+
+Output:
+```
+sample runtype r1 r2
+sample1 paired-end /path/to/data/pe/sample1_R1.fq.gz /path/to/data/pe/sample1_R2.fq.gz
+sample2 paired-end /path/to/data/pe/sample2_R1.fq.gz /path/to/data/pe/sample2_R2.fq.gz
+```
+
+## Notes
 
-## Screenshot
+- The `--ont` option is useful for projects involving Oxford Nanopore long reads.
+- Use `--add-path` to include full file paths in the output (when supported by the format).
+- The `--counts` option adds read counts to the output (experimental feature, not supported by all formats).
+- Format-specific options (like `--project` for IRIDA) are required for certain output types.
+- Use `--verbose` for detailed processing information and `--debug` for troubleshooting.
 
-![Screenshot of "seqfu metadata"]({{site.baseurl}}/img/screenshot-metadata.svg "SeqFu metadata")
+For more information on each format and its specific options, please refer to the respective tool's documentation.
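
The help text in this diff says the sample ID is cut out of each filename using `-s, --split` (separator) and `--pos` (field). As an illustration only, the POSIX shell function below, `sample_id`, is a hypothetical helper and not SeqFu code; it assumes `--pos` is a 1-based field index and handles a single position, which is enough to reproduce the `F3D0` ID from the MiSeq SOP example.

```bash
# Hypothetical helper (not part of SeqFu): extract a sample ID from a read
# filename the way `-s/--split` and `--pos` describe it.
# Assumptions: POS is a 1-based field index and only one position is used.
# The ( ) body runs in a subshell, so the variables below stay local.
sample_id() (
  filename=$(basename "$1")   # ignore the directory part of the path
  sep=${2:-_}                 # separator, "_" like the documented default
  pos=${3:-1}                 # field number, 1 like the documented default
  # cut -d accepts a single-character separator only
  printf '%s\n' "$filename" | cut -d "$sep" -f "$pos"
)

sample_id /Users/telatin/MiSeq_SOP/F3D0_S188_L001_R1_001.fastq.gz   # prints F3D0
```

Because the sketch relies on `cut`, the separator must be one character; IDs assembled from several `--pos` values are outside its scope.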

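The default `manifest` format in this diff pairs forward and reverse files per sample, as in the MiSeq SOP output. A minimal sketch of that pairing is the hypothetical `make_manifest` function below; it is not taken from SeqFu's source and assumes mates are tagged `_R1`/`_R2` and that the ID is the first `_`-separated field (the documented `--split`/`--pos` defaults). It writes the tab-separated `sample-id`, `forward-absolute-filepath`, `reverse-absolute-filepath` header used by QIIME 2 paired-end manifests.

```bash
# Hypothetical sketch (not SeqFu's implementation): print a QIIME 2
# paired-end manifest for one directory.
# Assumptions: mates differ only by "_R1"/"_R2"; ID = text before first "_".
make_manifest() (
  dir=${1:-.}
  abs=$(cd "$dir" && pwd) || exit 1   # absolute path of the input directory
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
  for r1 in "$abs"/*_R1*.fastq.gz "$abs"/*_R1*.fq.gz; do
    [ -e "$r1" ] || continue          # pattern matched no file
    name=$(basename "$r1")
    # expected mate: first "_R1" in the filename replaced by "_R2"
    r2="$abs/$(printf '%s\n' "$name" | sed 's/_R1/_R2/')"
    if [ ! -e "$r2" ]; then
      echo "skipping $name: no matching R2 file" >&2
      continue
    fi
    id=${name%%_*}                    # first "_"-separated field
    printf '%s\t%s\t%s\n' "$id" "$r1" "$r2"
  done
)
```

Example: `make_manifest ./MiSeq_SOP/ > manifest.tsv`. Files without a mate are reported on stderr and skipped; the sketch ignores the other options listed above, such as `--counts`, `--ont`, `--basename` and `--force-csv`.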