Poc for adaptive parquet predicate pushdown(bitmap/range) with page cache(3 data pages) #7454
@alamb ./bench.sh compare main test_default_parquet_push_down
Comparing main and test_default_parquet_push_down
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main ┃ test_default_parquet_push_down ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │ 0.32ms │ 0.29ms │ +1.13x faster │
│ QQuery 1 │ 46.96ms │ 48.09ms │ no change │
│ QQuery 2 │ 75.59ms │ 71.36ms │ +1.06x faster │
│ QQuery 3 │ 74.12ms │ 83.81ms │ 1.13x slower │
│ QQuery 4 │ 556.03ms │ 528.82ms │ no change │
│ QQuery 5 │ 563.52ms │ 557.09ms │ no change │
│ QQuery 6 │ 0.31ms │ 0.32ms │ no change │
│ QQuery 7 │ 52.23ms │ 55.85ms │ 1.07x slower │
│ QQuery 8 │ 720.15ms │ 646.68ms │ +1.11x faster │
│ QQuery 9 │ 741.10ms │ 752.54ms │ no change │
│ QQuery 10 │ 171.95ms │ 175.20ms │ no change │
│ QQuery 11 │ 187.66ms │ 201.39ms │ 1.07x slower │
│ QQuery 12 │ 597.16ms │ 616.24ms │ no change │
│ QQuery 13 │ 877.71ms │ 832.41ms │ +1.05x faster │
│ QQuery 14 │ 605.11ms │ 619.46ms │ no change │
│ QQuery 15 │ 630.66ms │ 618.32ms │ no change │
│ QQuery 16 │ 1422.47ms │ 1333.25ms │ +1.07x faster │
│ QQuery 17 │ 1221.90ms │ 1189.71ms │ no change │
│ QQuery 18 │ 2773.23ms │ 2762.43ms │ no change │
│ QQuery 19 │ 66.30ms │ 65.92ms │ no change │
│ QQuery 20 │ 682.62ms │ 692.95ms │ no change │
│ QQuery 21 │ 800.86ms │ 718.37ms │ +1.11x faster │
│ QQuery 22 │ 1521.09ms │ 1252.25ms │ +1.21x faster │
│ QQuery 23 │ 4223.95ms │ 2965.58ms │ +1.42x faster │
│ QQuery 24 │ 286.83ms │ 323.07ms │ 1.13x slower │
│ QQuery 25 │ 274.47ms │ 304.48ms │ 1.11x slower │
│ QQuery 26 │ 320.45ms │ 353.38ms │ 1.10x slower │
│ QQuery 27 │ 945.72ms │ 927.91ms │ no change │
│ QQuery 28 │ 8206.32ms │ 8225.21ms │ no change │
│ QQuery 29 │ 459.59ms │ 448.50ms │ no change │
│ QQuery 30 │ 493.35ms │ 489.28ms │ no change │
│ QQuery 31 │ 585.82ms │ 580.06ms │ no change │
│ QQuery 32 │ 2436.43ms │ 2556.73ms │ no change │
│ QQuery 33 │ 2916.52ms │ 2586.42ms │ +1.13x faster │
│ QQuery 34 │ 2975.16ms │ 3235.47ms │ 1.09x slower │
│ QQuery 35 │ 866.75ms │ 880.71ms │ no change │
│ QQuery 36 │ 104.22ms │ 49.53ms │ +2.10x faster │
│ QQuery 37 │ 62.50ms │ 38.01ms │ +1.64x faster │
│ QQuery 38 │ 107.57ms │ 44.28ms │ +2.43x faster │
│ QQuery 39 │ 167.64ms │ 52.52ms │ +3.19x faster │
│ QQuery 40 │ 46.49ms │ 38.20ms │ +1.22x faster │
│ QQuery 41 │ 45.49ms │ 37.30ms │ +1.22x faster │
│ QQuery 42 │ 42.51ms │ 36.32ms │ +1.17x faster │
└──────────────┴───────────┴────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main) │ 39956.84ms │
│ Total Time (test_default_parquet_push_down) │ 37995.72ms │
│ Average Time (main) │ 929.23ms │
│ Average Time (test_default_parquet_push_down) │ 883.62ms │
│ Queries Faster │ 16 │
│ Queries Slower │ 7 │
│ Queries with No Change │ 20 │
└───────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main ┃ test_default_parquet_push_down ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │ 1.10ms │ 1.23ms │ 1.12x slower │
│ QQuery 1 │ 24.27ms │ 27.97ms │ 1.15x slower │
│ QQuery 2 │ 58.03ms │ 53.24ms │ +1.09x faster │
│ QQuery 3 │ 58.91ms │ 62.60ms │ 1.06x slower │
│ QQuery 4 │ 478.56ms │ 481.69ms │ no change │
│ QQuery 5 │ 549.38ms │ 526.84ms │ no change │
│ QQuery 6 │ 1.18ms │ 1.19ms │ no change │
│ QQuery 7 │ 40.05ms │ 41.35ms │ no change │
│ QQuery 8 │ 645.79ms │ 615.65ms │ no change │
│ QQuery 9 │ 671.10ms │ 663.80ms │ no change │
│ QQuery 10 │ 133.89ms │ 154.96ms │ 1.16x slower │
│ QQuery 11 │ 159.81ms │ 172.84ms │ 1.08x slower │
│ QQuery 12 │ 561.74ms │ 614.07ms │ 1.09x slower │
│ QQuery 13 │ 750.14ms │ 783.97ms │ no change │
│ QQuery 14 │ 525.03ms │ 548.79ms │ no change │
│ QQuery 15 │ 553.88ms │ 557.18ms │ no change │
│ QQuery 16 │ 1417.77ms │ 1373.96ms │ no change │
│ QQuery 17 │ 1104.70ms │ 1170.50ms │ 1.06x slower │
│ QQuery 18 │ 3037.46ms │ 2778.68ms │ +1.09x faster │
│ QQuery 19 │ 45.81ms │ 47.01ms │ no change │
│ QQuery 20 │ 733.71ms │ 733.04ms │ no change │
│ QQuery 21 │ 789.70ms │ 730.95ms │ +1.08x faster │
│ QQuery 22 │ 1299.84ms │ 1172.80ms │ +1.11x faster │
│ QQuery 23 │ 3952.24ms │ 2819.91ms │ +1.40x faster │
│ QQuery 24 │ 273.40ms │ 317.58ms │ 1.16x slower │
│ QQuery 25 │ 274.14ms │ 276.03ms │ no change │
│ QQuery 26 │ 320.12ms │ 322.94ms │ no change │
│ QQuery 27 │ 900.06ms │ 909.25ms │ no change │
│ QQuery 28 │ 7812.82ms │ 7976.90ms │ no change │
│ QQuery 29 │ 390.07ms │ 396.34ms │ no change │
│ QQuery 30 │ 420.68ms │ 468.25ms │ 1.11x slower │
│ QQuery 31 │ 571.58ms │ 591.13ms │ no change │
│ QQuery 32 │ 2585.80ms │ 2684.39ms │ no change │
│ QQuery 33 │ 2621.59ms │ 2831.80ms │ 1.08x slower │
│ QQuery 34 │ 3144.83ms │ 3258.26ms │ no change │
│ QQuery 35 │ 855.71ms │ 808.77ms │ +1.06x faster │
│ QQuery 36 │ 80.10ms │ 33.07ms │ +2.42x faster │
│ QQuery 37 │ 35.71ms │ 26.33ms │ +1.36x faster │
│ QQuery 38 │ 79.06ms │ 34.07ms │ +2.32x faster │
│ QQuery 39 │ 123.78ms │ 39.47ms │ +3.14x faster │
│ QQuery 40 │ 29.03ms │ 27.64ms │ no change │
│ QQuery 41 │ 27.67ms │ 25.72ms │ +1.08x faster │
│ QQuery 42 │ 25.98ms │ 24.90ms │ no change │
└──────────────┴───────────┴────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main) │ 38166.21ms │
│ Total Time (test_default_parquet_push_down) │ 37187.06ms │
│ Average Time (main) │ 887.59ms │
│ Average Time (test_default_parquet_push_down) │ 864.82ms │
│ Queries Faster │ 11 │
│ Queries Slower │ 10 │
│ Queries with No Change │ 22 │
└───────────────────────────────────────────────┴────────────┘
These results look great -- thank you @zhuqi-lucas -- I am very close to finishing the more targeted benchmark. Hopefully we can use that to confirm your results as well as to optimize further.
I am testing this PR using the benchmark in And I will report back
Testing this PR using my benchmark in Shows measurable improvements. The only thing that looks like it may be slower is Q1, which is roughly what you observed as well.
Great work @alamb, it's a big improvement for us to be able to mock the ClickBench results now. In the meantime, I will investigate whether we can also improve the Q1 case.
Thanks @zhuqi-lucas -- this is pretty epic work. I am rerunning the benchmarks in #7470 to get one last measurement.
Then I think we should move on to "productionizing" this code in some smaller pieces.
/// Unlike intersection, the `other` [`BooleanRowSelection`] must have exactly as many set bits as `self`.
/// This method will keep only the bits in `self` that are also set in `other`
/// at the positions corresponding to `self`'s set bits.
pub fn and_then(&self, other: &Self) -> Self {
I think the other thing we can and should do here is change the signature to take an owned self
-- also for intersection
-- as written, the API forces a new memory allocation on every call.
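To make the suggestion concrete, here is a minimal sketch of an `and_then` that takes an owned `self` and rewrites its buffer in place. It is an illustration under stated assumptions, not the PR's code: `BoolSelection` and `demo` are hypothetical names, the selection is a plain `Vec<bool>` rather than the buffer type behind `BooleanRowSelection`, and the doc comment is read as meaning that `other` carries one bit for each set bit of `self`.

```rust
/// Hypothetical stand-in for a boolean row selection: one `bool` per row.
/// (Assumption: the real type in the PR wraps a different buffer.)
#[derive(Debug, Clone, PartialEq)]
pub struct BoolSelection(pub Vec<bool>);

impl BoolSelection {
    /// Refines `self` with `other`, where `other` has one entry for every
    /// row that `self` currently selects.
    ///
    /// Taking `self` by value lets the existing buffer be overwritten in
    /// place instead of allocating a new one for the result.
    pub fn and_then(mut self, other: &BoolSelection) -> BoolSelection {
        let selected = self.0.iter().filter(|bit| **bit).count();
        assert_eq!(
            selected,
            other.0.len(),
            "`other` must have one entry per selected row of `self`"
        );

        let mut refinements = other.0.iter();
        for bit in self.0.iter_mut() {
            if *bit {
                // Keep this row only if the matching entry of `other` is set.
                *bit = *refinements.next().expect("length checked above");
            }
        }
        self
    }
}

/// Small usage example: rows 0, 2 and 3 are selected first, then the
/// second predicate drops the middle one of those three.
pub fn demo() -> BoolSelection {
    let first = BoolSelection(vec![true, false, true, true, false]);
    let second = BoolSelection(vec![true, false, true]);
    first.and_then(&second)
}
```

In `demo`, the first selection keeps rows 0, 2 and 3, and the second selection then drops row 2, leaving rows 0 and 3. A caller that still needs the original selection would have to clone it before the call, which is the trade-off of the owned signature.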
Thank you @alamb, I agree with you. As a first step I am planning to remove unused code and make the code clearer, while keeping performance similar to the results above. Here is my plan for merging the PR:
The final PR will not be huge; I believe it will be fewer than 1000 lines of code, and I will do this soon! I will still use this PR to submit the changes, thanks!
Thank you -- this sounds great. I'll hope to get the benchmarks merged soon so we can incrementally evaluate our progress.
I completed another benchmark run now and got these results. The major difference is slowdowns in the async reader for Q38-Q40:
Thank you @alamb, the result is also reasonable, because my results here compare the unified select PR against the main branch (with no Parquet filter pushdown). So while we remove most of the regression of filter pushdown relative to no pushdown, the change may also cause some regression relative to the original default pushdown, which we can improve further. The sync reader shows no change because we have not yet implemented the sync version in the improvement PR; the improvement is for async only.
Which issue does this PR close?
and Parquet decoder / decoded page Cache #7363
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?