New slot subject_lineage, object_lineage, predicate_lineage #1549
Comments
Provenance of node normalization is something we’ve talked about needing in the NCATS Translator project as well, but don’t yet have a model for. It would be nice to work on a shared solution here.
Absolutely! Most of the data we use is NCATS KG data anyway. How would you like to proceed?
I think it's important. Pretty sure we added this into the DOGSLED proposal, but I don't remember the when/where of it. But I think that this would be a great feature to add into biolink. Nodenorm probably doesn't much care, because the interface there is just CURIE in / stuff out, and while it returns e.g. biolink categories and biolink prefixes, the output structure is not biolink itself (maybe a problem?). And then the caller smushes that into whatever biolink message/dataset they are making.
TLDR: I don't think this is relevant to NodeNorm itself, but I think it could be useful to users of NodeNorm, so I agree that it would be useful.
NodeNorm groups identifiers using "glomming", where we obtain ID-ID pairs from different sources and then combine them into a single clique. For example, if source 1 asserts
We did propose some NodeNorm provenance work for DOGSLED, but I believe that's been pushed past year one. But I've sketched up a plan to implement provenance in Babel with minimal extra work, which won't be able to tell you exactly where a particular ID-ID pair came from, but would tell you all the provenance that went into a particular compendium. So when looking at the
So, on the one hand, NodeNorm wouldn't be able to produce something that looks like
Thank you @gaurav! Does anyone have an opinion on the modelling?
@matentzn - are there parallels here that we can leverage from SSSOM?
I believe we need a better name for the concepts you call
One thing I'm not sure how to handle is that often a grounding or normalization will be to a clique, even if it is represented by a clique leader. Is that something important to track?
I don't think this is needed - all that is needed is the fact that the
What I am proposing are lineage fields specifically on the
The idea is good, but the final model would look quite verbose:
Of course I personally like this, but I don't think anyone would ever go to the trouble of implementing this?
Is your feature request related to a problem? Please describe.
I would like to request slots for subject_lineage, object_lineage, and predicate_lineage to complement the original_subject slots. The idea is to be able to capture the transforms that edge data undergoes every step of the way, from source extraction to final KG (aggregator or otherwise). For example,
Alzheimer
DOID:123
MONDO:123
ICD10:EXX999
What working group (or team) did this request originate from?
Monarch Initiative, Every Cure
Describe the solution you'd like
I don't know yet what this would look like. I can see two obvious ways: using complex lists or simple lists.
Simple example:
The advantage would be that we could express this easily in a KGX TSV file like:
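As a rough illustration of the simple-list idea, the sketch below writes an ordered lineage list into a single TSV cell. Everything here is an assumption made for illustration: the subject_lineage column is the slot requested in this issue (it does not exist in Biolink today), the pipe delimiter for multivalued cells is assumed, the to_tsv helper is hypothetical, and the identifiers are the placeholder values from the example above, read as ordered from earliest to latest.

```python
import csv
import io

# Assumption: multivalued cells are joined with a pipe character.
LIST_DELIMITER = "|"

# Hypothetical edge record. "subject_lineage" is the slot requested in this
# issue; the predicate and object are placeholders, not real data.
edge = {
    "subject": "ICD10:EXX999",
    "predicate": "biolink:related_to",
    "object": "EXAMPLE:object",
    "subject_lineage": ["Alzheimer", "DOID:123", "MONDO:123", "ICD10:EXX999"],
}


def to_tsv(edges, columns):
    """Serialize edge dicts to TSV text, joining list values into one cell."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for record in edges:
        row = []
        for col in columns:
            value = record.get(col, "")
            if isinstance(value, list):
                # Order is preserved: earliest form first, final form last.
                value = LIST_DELIMITER.join(value)
            row.append(value)
        writer.writerow(row)
    return buf.getvalue()


columns = ["subject", "predicate", "object", "subject_lineage"]
tsv_text = to_tsv([edge], columns)
print(tsv_text)
```

A flat list like this records only the sequence of values, not which tool or mapping produced each step.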
Complex example:
The simple example lacks a lot of detail and, most importantly, extensibility (making it much less future-proof). A more complex solution would look like this:
The advantage is that this could be modelled with arbitrary depth, like this:
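One way to picture a structure with arbitrary depth is a recursive record in which each step points back to the step(s) it was derived from. This is only a sketch under stated assumptions, not a proposed Biolink model: the LineageStep class, its field names (value, transformed_by, derived_from), and the tool names are all hypothetical, and the identifiers are the placeholders from the example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LineageStep:
    """Hypothetical lineage record; field names are illustrative only."""
    value: str                             # label or CURIE at this step
    transformed_by: Optional[str] = None   # tool/mapping that produced it
    derived_from: List["LineageStep"] = field(default_factory=list)


def flatten(step: LineageStep) -> List[str]:
    """Return values from the earliest source up to this step (depth-first)."""
    values: List[str] = []
    for parent in step.derived_from:
        values.extend(flatten(parent))
    values.append(step.value)
    return values


def depth(step: LineageStep) -> int:
    """Number of steps on the longest chain ending at this step."""
    return 1 + max((depth(p) for p in step.derived_from), default=0)


# Placeholder chain; the tool names are made up for the example.
label = LineageStep("Alzheimer")
doid = LineageStep("DOID:123", "hypothetical-grounder", [label])
mondo = LineageStep("MONDO:123", "hypothetical-normalizer", [doid])
icd = LineageStep("ICD10:EXX999", "hypothetical-mapper", [mondo])

print(flatten(icd))
print(depth(icd))
```

Flattening the nested form recovers the simple list, so a structure like this could still be exported to a flat column while keeping the per-step detail elsewhere.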
Additional information to support this request (optional)
I don't know what the best way to do this is, but I am certain this request would greatly help elevate Biolink to its role in making AI outputs explainable and transparent. We often wonder how "good" our data is, but if we do not record the normalisation lineage, we lose information every time the edges are integrated into a new context.
Tag relevant members for discussion
@kevinschaper @sierra-moxon @cmungall @cbizon