Some disgenet data lacking disgenet.xrefs #66

Open
colleenXu opened this issue Jan 3, 2025 · 0 comments
colleenXu commented Jan 3, 2025

(EDIT: found while working on biothings/biothings_explorer#900)

Currently, BTE uses the field disgenet.xrefs.umls to retrieve disgenet data because it is the single-namespace disease-ID field with the widest coverage: 22574 of the 27431 documents that have disgenet.

But that still leaves a good chunk of data inaccessible: 4857 documents (about 18%). Today I dug a little more and realized those documents don't have the parent field disgenet.xrefs either. Instead, the disease ID seems to be in _id only, with no prefix to say which ID namespace it belongs to.
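To reproduce the count, something like the sketch below might work. It assumes the data is served by a BioThings-style API that accepts Elasticsearch query-string syntax (`_exists_:`); the base URL and the helper names (`missing_xrefs_query`, `build_url`, `lacks_xrefs`, `fetch_json`) are illustrative assumptions, not something confirmed in this issue.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a BioThings-style query endpoint; adjust to the real API.
BASE_URL = "https://mydisease.info/v1/query"


def missing_xrefs_query() -> str:
    """Query string for documents that have disgenet but no disgenet.xrefs."""
    return "_exists_:disgenet AND NOT _exists_:disgenet.xrefs"


def build_url(query: str, size: int = 5) -> str:
    """Build the request URL, returning only _id for each hit."""
    params = {"q": query, "fields": "_id", "size": size}
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def lacks_xrefs(doc: dict) -> bool:
    """Local check on a full document: disgenet present, disgenet.xrefs absent or empty."""
    disgenet = doc.get("disgenet")
    if disgenet is None:
        return False
    # Handle both a single record and a list of records.
    records = disgenet if isinstance(disgenet, list) else [disgenet]
    return not any(isinstance(r, dict) and r.get("xrefs") for r in records)


def fetch_json(url: str) -> dict:
    """Perform the HTTP request (not called automatically in this sketch)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Calling `fetch_json(build_url(missing_xrefs_query()))` and reading the total hit count from the response should show whether the number matches the 4857 above, if the assumptions about the endpoint hold.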

I think this may be worth digging into more, and perhaps:

  • adding the disgenet.xrefs field to every document that has disgenet. It would probably include at least the disgenet.xrefs.umls and disgenet.xrefs.disease_name fields (I think disgenet data is based on disease UMLS IDs?)
  • adding the ID-prefix to _id
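
A rough sketch of what both changes could look like for a single document is below. It rests on an unverified assumption: that the bare _id values are UMLS CUIs (a "C" followed by seven digits). The `UMLS:` prefix, the function name `backfill_xrefs`, and the `other_field` key in the usage note are hypothetical; disgenet.xrefs.disease_name is left out because the name is not recoverable from _id alone.

```python
import re

# UMLS concept identifiers (CUIs) are "C" followed by seven digits.
CUI_PATTERN = re.compile(r"^C\d{7}$")


def backfill_xrefs(doc: dict, prefix: str = "UMLS") -> dict:
    """Return a copy with disgenet.xrefs.umls set and _id prefixed.

    Only acts when _id looks like a bare CUI and disgenet is a dict;
    anything else is returned unchanged so ambiguous IDs are not mislabeled.
    """
    raw_id = doc.get("_id", "")
    disgenet = doc.get("disgenet")
    if not isinstance(disgenet, dict) or not CUI_PATTERN.match(raw_id):
        return doc

    # Copy the nested dicts so the input document is not mutated.
    xrefs = dict(disgenet.get("xrefs") or {})
    xrefs.setdefault("umls", raw_id)  # keep an existing value if present

    updated = dict(doc)
    updated["disgenet"] = {**disgenet, "xrefs": xrefs}
    updated["_id"] = f"{prefix}:{raw_id}"
    return updated
```

For example, `backfill_xrefs({"_id": "C0000737", "disgenet": {"other_field": []}})` would return a document whose _id is `UMLS:C0000737` and whose disgenet.xrefs.umls is `C0000737`. Whether the unprefixed IDs really are all CUIs would need checking first.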