Support gds var length label pruning #4868

ted-wq-x · 2025-02-07T07:32:21Z

Description

Contributor agreement

I have read and agree to the Contributor Agreement.

andyfengHKU · 2025-02-08T03:05:26Z

src/binder/query/query_graph_label_analyzer.cpp

+        auto satisfyForwardPruning = forwardPruningFunc(srcTableID, dstTableID);
+        if (directionType == RelDirectionType::BOTH) {
+            if (satisfyForwardPruning ||
+                (dstTableIDSet.contains(srcTableID) && srcTableIDSet.contains(dstTableID))) {


I would have a backwardPruningFunc in order to be consistent.

andyfengHKU · 2025-02-08T03:12:29Z

src/binder/query/query_graph_label_analyzer.cpp

+    if (rel.isRecursive()) {
+        auto nodeTableIDs = getNodeTableIDs();
+        // there is no label on both sides
+        if (rel.getUpperBound() == 0 || (srcTableIDSet.size() == dstTableIDSet.size() &&


Is this check srcTableIDSet.size() == dstTableIDSet.size() && dstTableIDSet.size() == nodeTableIDs.size() necessary? Seems your logic would still work in this case. So we would we return early?

andyfengHKU · 2025-02-10T05:16:33Z

src/binder/query/query_graph_label_analyzer.cpp

+        if (currentLength >= upperBound) {
+            return;
+        }
+        if (graph.contains(curTableID)) {


I think this should be KU_ASSERT. Is there a case where graph not contains curTableID?

andyfengHKU · 2025-02-10T05:46:18Z

src/include/binder/expression/rel_expression.h

@@ -102,6 +106,7 @@ class RelExpression final : public NodeOrRelExpression {
        recursiveInfo = std::move(recursiveInfo_);
    }
    const RecursiveInfo* getRecursiveInfo() const { return recursiveInfo.get(); }
+    RecursiveInfo* getRecursiveInfo() { return recursiveInfo.get(); }


rename -> getRecursiveInfoUnsafe

andyfengHKU · 2025-02-10T05:49:53Z

src/function/gds/rec_joins.cpp

@@ -45,6 +47,14 @@ PathsOutputWriterInfo RJBindData::getPathWriterInfo() const {
    return info;
 }

+std::vector<table_id_set_t> RJBindData::getStepActiveRelTableIDs() const {
+    if (flipPath) {


I think it's good time to rename flipPath to extendLeftToRight. Otherwise reading this part is a bit confusing.

andyfengHKU · 2025-02-10T06:01:35Z

src/binder/query/query_graph_label_analyzer.cpp

        }
    }
-    rel.setEntries(prunedEntries);
+    return stepActiveTableIDs;


Let me try to summarize the logic here and pls correct me if I'm wrong.

Consider topology Person - knows -> Person and Person - livesIn -> City and query

MATCH (a:Person) - [*2..2] -> (b:City)

WLOG, we assume going from left to right.

The dfs part will first generate all size 2 path

<<knows>, <knows>> <<knows>, <livesIn>>

and because <<knows>, <knows>> dst is not b:City it will be pruned.

So eventually we only compute <<knows>, <livesIn>>.

If my understanding is correct, can u also give a benchmark number in the PR description to make sure we actually get performance benefit.

andyfengHKU · 2025-02-10T06:18:39Z

src/binder/query/query_graph_label_analyzer.cpp

+
+    // There may be multiple rel type between src and dst
+    using PATH = std::vector<table_id_vector_t>;
+    std::vector<std::vector<PATH>> paths(upperBound + 1);


It would be good to comment

std::vector<PATH> are paths of the same length.

And the outer-most std::vector are paths of different length.

In fact, it might be easier to read if just use std::vector<PATH> here.

andyfengHKU · 2025-02-10T06:20:27Z

src/binder/query/query_graph_label_analyzer.cpp

+    const table_id_set_t srcTableIDSet, const table_id_set_t dstTableIDSet, size_t lowerBound,
+    size_t upperBound, RelDirectionType relDirectionType) const {
+    // src-->[dst,[rels]]
+    std::unordered_map<table_id_t, std::unordered_map<table_id_t, table_id_vector_t>>


Up to use but I would use std::unrdered_map<table_id_t, STRUCT>

and STRUCT contains

table_id_t dstTableID std::vector<tabel_id_t> relTableIDs

because we always access by srcTableID instead of src&dstTableID

andyfengHKU · 2025-02-10T06:23:26Z

src/binder/query/query_graph_label_analyzer.cpp

+        // todo we need reset rel entries?
+        auto temp = mergeTableIDs(stepFromLeftTableIDs, stepFromRightTableIDs);
+        std::vector<TableCatalogEntry*> newRelEntries{temp.begin(), temp.end()};
+        if (!isSameTableCatalogEntryVector(newRelEntries, rel.getEntries())) {


Shall we do l284-l312 as a separate PR because

it can be viewed as another optimization after pruning recursive rel labels; and

we will also need to get a benchmark number to make sure there are performance improvements for node pruning.

codecov · 2025-02-10T08:00:43Z

Codecov Report

Attention: Patch coverage is 99.30070% with 1 line in your changes missing coverage. Please review.

Project coverage is 85.83%. Comparing base (b89dd36) to head (c5130de).
Report is 6 commits behind head on master.

Files with missing lines	Patch %	Lines
src/binder/query/query_graph_label_analyzer.cpp	99.11%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4868      +/-   ##
==========================================
+ Coverage   85.80%   85.83%   +0.02%     
==========================================
  Files        1397     1397              
  Lines       60504    60614     +110     
  Branches     7451     7473      +22     
==========================================
+ Hits        51917    52026     +109     
- Misses       8406     8407       +1     
  Partials      181      181

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ted-wq-x · 2025-02-18T13:01:45Z

I found gds join performance worse than before, there are so many memory minor fault. This opt does not provide any performance improvement in current code.

Under recursive join ,this opt can get 900ms -->300ms in sf100,"match (a:person {id:24189255912498})-[f*2]-(b) return count(1)" (Previous test results);

I will provide performance benchmark, when i resolve the current performance problem.

ray6080 requested a review from andyfengHKU February 7, 2025 07:42

andyfengHKU approved these changes Feb 10, 2025

View reviewed changes

ted-wq-x force-pushed the recurse-label-pruning branch from c5130de to 78f3840 Compare February 18, 2025 13:02

Support gds var length label pruning

2ae6a39

ted-wq-x force-pushed the recurse-label-pruning branch from 78f3840 to 2ae6a39 Compare February 19, 2025 12:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support gds var length label pruning #4868

Support gds var length label pruning #4868

ted-wq-x commented Feb 7, 2025

andyfengHKU Feb 8, 2025

andyfengHKU Feb 8, 2025

andyfengHKU Feb 10, 2025

andyfengHKU Feb 10, 2025

andyfengHKU Feb 10, 2025

andyfengHKU Feb 10, 2025

andyfengHKU Feb 10, 2025

andyfengHKU Feb 10, 2025

andyfengHKU Feb 10, 2025

codecov bot commented Feb 10, 2025

ted-wq-x commented Feb 18, 2025

Support gds var length label pruning #4868

Are you sure you want to change the base?

Support gds var length label pruning #4868

Conversation

ted-wq-x commented Feb 7, 2025

Description

Contributor agreement

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Feb 10, 2025

Codecov Report

ted-wq-x commented Feb 18, 2025