
Strain metadata field missing from datasets download (for oropouche) #471


Open
anna-parker opened this issue Apr 3, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@anna-parker

Describe the bug

We are using NCBI Datasets to download data from NCBI. We recently realized that the strain name is not included when we download all Oropouche assemblies:

datasets download virus genome taxon 118655

For example, AF164550.1 has strain="MD023", but this is not included as a field in ncbi_dataset/data/data_report.jsonl.

To Reproduce

  1. Download all Oropouche sequences with metadata using
    datasets download virus genome taxon 118655
  2. Unzip the file using /usr/bin/unzip ncbi_dataset.zip
  3. Search ncbi_dataset/data/data_report.jsonl for strain and for MD023 -> neither is found.
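
The search in step 3 can also be scripted. The following is a minimal sketch, not part of the datasets tooling: it walks every JSON record in the report and counts records in which a key or string value contains a given search term. The helper names find_matches and scan_report are made up for illustration, and no particular report schema is assumed.

```python
import json
from pathlib import Path


def find_matches(record, needle, path=""):
    """Yield (json_path, value) pairs where a key or a string value
    contains `needle` (case-insensitive)."""
    needle = needle.lower()
    if isinstance(record, dict):
        for key, value in record.items():
            child = f"{path}.{key}" if path else key
            if needle in key.lower():
                yield child, value
            yield from find_matches(value, needle, child)
    elif isinstance(record, list):
        for index, item in enumerate(record):
            yield from find_matches(item, needle, f"{path}[{index}]")
    elif isinstance(record, str) and needle in record.lower():
        yield path, record


def scan_report(report_path, needles):
    """Count, per search term, how many JSONL records contain a match."""
    counts = {needle: 0 for needle in needles}
    with open(report_path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            for needle in needles:
                if next(find_matches(record, needle), None) is not None:
                    counts[needle] += 1
    return counts


if __name__ == "__main__":
    # Path taken from the issue text; run from the directory where the zip was extracted.
    report = Path("ncbi_dataset/data/data_report.jsonl")
    if report.exists():
        print(scan_report(report, ["strain", "MD023"]))
```

With the download described above, both counts would be expected to be 0 if the field is indeed absent.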

Expected behavior
We would expect to find the strain field when we download the data (it is not a problem if the field is renamed). This would be quite useful, as some submitters use it as a sample identifier. Thanks in advance for any help!
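
To illustrate where the value lives today: in a GenBank flat-file record, the strain appears as a /strain="..." qualifier on the source feature. The sketch below pulls that qualifier out of flat-file text that has already been obtained by some other means. The function name strain_from_flatfile and the excerpt are made up for illustration; the excerpt is not the real AF164550.1 record, and only the strain value is taken from this issue.

```python
import re

# Matches a /strain="..." qualifier at the start of an indented feature-table line.
# The value may wrap across lines, so everything up to the closing quote is captured.
STRAIN_QUALIFIER = re.compile(r'^\s+/strain="([^"]*)"', re.MULTILINE)


def strain_from_flatfile(text):
    """Return the first /strain qualifier value in GenBank flat-file text, or None."""
    match = STRAIN_QUALIFIER.search(text)
    if match is None:
        return None
    # Collapse line wraps and indentation inside the quoted value.
    return " ".join(match.group(1).split())


# Hypothetical excerpt for illustration only (location and organism are invented).
EXCERPT = '''\
FEATURES             Location/Qualifiers
     source          1..100
                     /organism="Oropouche virus"
                     /strain="MD023"
'''

if __name__ == "__main__":
    print(strain_from_flatfile(EXCERPT))
```

This is only a cross-check against the source record, not a substitute for having the field in data_report.jsonl.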

@olearyna
Contributor

olearyna commented Apr 4, 2025

Hi anna-parker,

Thanks for opening this issue. I've passed your request along to Eneida Hatcher and the NCBI Virus team, as they are the group responsible for the underlying data. They're looking into it, and any updates they make will flow through to the Datasets report.

Thanks,
Nuala
