[Enhancement] Estimate cardinality of multiple equivalent predicates using multi-column combined stats #56836

Open
wants to merge 1 commit into
base: main

stephen-shelby
Contributor

@stephen-shelby stephen-shelby commented Mar 12, 2025

Why I'm doing:

What I'm doing:

Major changes:

  1. Adjust how cardinality is estimated for multiple equivalent predicates when multi-column stats are not available.
  2. Estimate the cardinality of multiple equivalent predicates using multi-column combined stats.

Without multi-column stats:

  • Use at most 4 equivalent predicates, picking the ones that filter best (lowest selectivity).
  • To account for correlation between columns, use the decreasing-exponent formula, following SQL Server:
    Sel(p1 ∧ p2 ∧ p3) = Sel(p1) × Sel(p2)^{1/2} × Sel(p3)^{1/4}
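
For illustration only, the decreasing formula can be sketched in Python as below. This is not the code in this PR. The function name `backoff_selectivity` and its parameters are hypothetical, and two things are assumed rather than stated above: predicates are ordered most selective first, and a fourth predicate continues the pattern with exponent 1/8.

```python
def backoff_selectivity(selectivities, max_predicates=4):
    """Combine per-predicate selectivities with decreasing exponents.

    Hypothetical helper for illustration; not part of StarRocks.
    """
    # Assumption: keep the most selective predicates (smallest values first).
    chosen = sorted(selectivities)[:max_predicates]
    combined = 1.0
    for i, sel in enumerate(chosen):
        # Exponents 1, 1/2, 1/4, 1/8: each further predicate counts for less.
        combined *= sel ** (1.0 / (2 ** i))
    return combined


if __name__ == "__main__":
    # Three predicates with selectivities 0.01, 0.0625 and 0.25.
    print(backoff_selectivity([0.25, 0.01, 0.0625]))
```

Compared with the plain product (full independence), this gives a larger estimate, since every predicate after the first is damped.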

With multi-column combined stats:
S_mc = max(min(1/NDV, min_sel), prod_sel)
Where:
1/NDV is the selectivity derived from the multi-column NDV (number of distinct values)
min_sel is the minimum selectivity among the correlated columns
prod_sel is the product of the individual column selectivities
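
A minimal Python sketch of the S_mc formula, for illustration only. It is not the code in this PR, and the function and parameter names are hypothetical.

```python
import math


def multi_column_selectivity(multi_column_ndv, column_selectivities):
    """Evaluate S_mc = max(min(1/NDV, min_sel), prod_sel).

    Hypothetical helper for illustration; not part of StarRocks.
    """
    if multi_column_ndv <= 0 or not column_selectivities:
        raise ValueError("need a positive NDV and at least one selectivity")
    ndv_sel = 1.0 / multi_column_ndv           # selectivity from the combined NDV
    min_sel = min(column_selectivities)        # most selective single column
    prod_sel = math.prod(column_selectivities) # independence assumption
    # The NDV-based value is capped by min_sel; prod_sel is the lower bound.
    return max(min(ndv_sel, min_sel), prod_sel)


if __name__ == "__main__":
    # Column selectivities 0.1 and 0.05; combined NDV of 50 gives 1/NDV = 0.02.
    print(multi_column_selectivity(50, [0.1, 0.05]))
```

With these inputs 1/NDV (0.02) lies between prod_sel (0.005) and min_sel (0.05), so it is the value returned. With a combined NDV of 1000 the result falls back to prod_sel, and with a combined NDV of 2 it is capped at min_sel.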

Added session variable:
cbo_use_correlated_predicate_estimate
Controls whether the decreasing formula is used for cardinality estimation (CE). The default value is true.

Fixes #56358

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.4
    • 3.3
    • 3.2
    • 3.1
    • 3.0

@stephen-shelby stephen-shelby requested a review from a team as a code owner March 12, 2025 06:28
@stephen-shelby stephen-shelby changed the title [Enhancement] estimate cardinality of multiple equivalent predicates using multi-column combined stats [UT] estimate cardinality of multiple equivalent predicates using multi-column combined stats Mar 12, 2025
@stephen-shelby stephen-shelby changed the title [UT] estimate cardinality of multiple equivalent predicates using multi-column combined stats [Enhancement] estimate cardinality of multiple equivalent predicates using multi-column combined stats Mar 12, 2025
@stephen-shelby stephen-shelby force-pushed the estimate_multi_eq_pred_with_mcstats branch from 41e4a88 to 2d8b9cf Compare March 13, 2025 08:14
@stephen-shelby stephen-shelby changed the title [Enhancement] estimate cardinality of multiple equivalent predicates using multi-column combined stats [UT] estimate cardinality of multiple equivalent predicates using multi-column combined stats Mar 13, 2025
@stephen-shelby stephen-shelby changed the title [UT] estimate cardinality of multiple equivalent predicates using multi-column combined stats [UT] Estimate cardinality of multiple equivalent predicates using multi-column combined stats Mar 13, 2025
@stephen-shelby stephen-shelby changed the title [UT] Estimate cardinality of multiple equivalent predicates using multi-column combined stats [Enhancement] Estimate cardinality of multiple equivalent predicates using multi-column combined stats Mar 13, 2025
@stephen-shelby stephen-shelby force-pushed the estimate_multi_eq_pred_with_mcstats branch 2 times, most recently from 8a1ee55 to 4d6cd6c Compare March 13, 2025 12:26
…using multi-column combined stats

Signed-off-by: stephen <[email protected]>
@stephen-shelby stephen-shelby force-pushed the estimate_multi_eq_pred_with_mcstats branch from 4d6cd6c to c6fb081 Compare March 13, 2025 13:24

Quality Gate failed

Failed conditions
B Reliability Rating on New Code (required ≥ A)



[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)


[FE Incremental Coverage Report]

pass : 112 / 114 (98.25%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| 🔵 com/starrocks/sql/optimizer/statistics/StatisticsEstimateUtils.java | 86 | 88 | 97.73% | [93, 129] |
| 🔵 com/starrocks/qe/SessionVariable.java | 4 | 4 | 100.00% | [] |
| 🔵 com/starrocks/sql/optimizer/Utils.java | 13 | 13 | 100.00% | [] |
| 🔵 com/starrocks/sql/optimizer/statistics/BinaryPredicateStatisticCalculator.java | 1 | 1 | 100.00% | [] |
| 🔵 com/starrocks/sql/optimizer/statistics/PredicateStatisticsCalculator.java | 8 | 8 | 100.00% | [] |


[BE Incremental Coverage Report]

pass : 0 / 0 (0%)
