-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support gds var length label pruning #4868
base: master
Are you sure you want to change the base?
Conversation
auto satisfyForwardPruning = forwardPruningFunc(srcTableID, dstTableID); | ||
if (directionType == RelDirectionType::BOTH) { | ||
if (satisfyForwardPruning || | ||
(dstTableIDSet.contains(srcTableID) && srcTableIDSet.contains(dstTableID))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have a backwardPruningFunc
in order to be consistent.
if (rel.isRecursive()) { | ||
auto nodeTableIDs = getNodeTableIDs(); | ||
// there is no label on both sides | ||
if (rel.getUpperBound() == 0 || (srcTableIDSet.size() == dstTableIDSet.size() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check srcTableIDSet.size() == dstTableIDSet.size() && dstTableIDSet.size() == nodeTableIDs.size()
necessary? Seems your logic would still work in this case. So we would we return early?
if (currentLength >= upperBound) { | ||
return; | ||
} | ||
if (graph.contains(curTableID)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be KU_ASSERT. Is there a case where graph
not contains curTableID
?
@@ -102,6 +106,7 @@ class RelExpression final : public NodeOrRelExpression { | |||
recursiveInfo = std::move(recursiveInfo_); | |||
} | |||
const RecursiveInfo* getRecursiveInfo() const { return recursiveInfo.get(); } | |||
RecursiveInfo* getRecursiveInfo() { return recursiveInfo.get(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename -> getRecursiveInfoUnsafe
src/function/gds/rec_joins.cpp
Outdated
@@ -45,6 +47,14 @@ PathsOutputWriterInfo RJBindData::getPathWriterInfo() const { | |||
return info; | |||
} | |||
|
|||
std::vector<table_id_set_t> RJBindData::getStepActiveRelTableIDs() const { | |||
if (flipPath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good time to rename flipPath
to extendLeftToRight
. Otherwise reading this part is a bit confusing.
} | ||
} | ||
rel.setEntries(prunedEntries); | ||
return stepActiveTableIDs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me try to summarize the logic here and pls correct me if I'm wrong.
Consider topology Person - knows -> Person
and Person - livesIn -> City
and query
MATCH (a:Person) - [*2..2] -> (b:City)
WLOG, we assume going from left to right.
The dfs
part will first generate all size 2 path
<<knows>, <knows>>
<<knows>, <livesIn>>
and because <<knows>, <knows>>
dst is not b:City
it will be pruned.
So eventually we only compute <<knows>, <livesIn>>
.
If my understanding is correct, can u also give a benchmark number in the PR description to make sure we actually get performance benefit.
|
||
// There may be multiple rel type between src and dst | ||
using PATH = std::vector<table_id_vector_t>; | ||
std::vector<std::vector<PATH>> paths(upperBound + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to comment
std::vector<PATH>
are paths of the same length.
And the outer-most std::vector are paths of different length.
In fact, it might be easier to read if just use std::vector<PATH>
here.
const table_id_set_t srcTableIDSet, const table_id_set_t dstTableIDSet, size_t lowerBound, | ||
size_t upperBound, RelDirectionType relDirectionType) const { | ||
// src-->[dst,[rels]] | ||
std::unordered_map<table_id_t, std::unordered_map<table_id_t, table_id_vector_t>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to use but I would use std::unrdered_map<table_id_t, STRUCT>
and STRUCT contains
table_id_t dstTableID
std::vector<tabel_id_t> relTableIDs
because we always access by srcTableID instead of src&dstTableID
// todo we need reset rel entries? | ||
auto temp = mergeTableIDs(stepFromLeftTableIDs, stepFromRightTableIDs); | ||
std::vector<TableCatalogEntry*> newRelEntries{temp.begin(), temp.end()}; | ||
if (!isSameTableCatalogEntryVector(newRelEntries, rel.getEntries())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we do l284-l312 as a separate PR because
- it can be viewed as another optimization after pruning recursive rel labels; and
- we will also need to get a benchmark number to make sure there are performance improvements for node pruning.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4868 +/- ##
==========================================
+ Coverage 85.80% 85.83% +0.02%
==========================================
Files 1397 1397
Lines 60504 60614 +110
Branches 7451 7473 +22
==========================================
+ Hits 51917 52026 +109
- Misses 8406 8407 +1
Partials 181 181 ☔ View full report in Codecov by Sentry. |
I found gds join performance worse than before, there are so many memory minor fault. This opt does not provide any performance improvement in current code. Under recursive join ,this opt can get 900ms -->300ms in sf100,"match (a:person {id:24189255912498})-[f*2]-(b) return count(1)" (Previous test results); I will provide performance benchmark, when i resolve the current performance problem. |
c5130de
to
78f3840
Compare
78f3840
to
2ae6a39
Compare
Description
Fixes #4396
Contributor agreement