-
Notifications
You must be signed in to change notification settings - Fork 925
arrow-select: add support for optimized concatenation of struct arrays #7517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks @asubiotto -- I suspect the CI failure will be fixed by (not related to your PR) |
Thanks @alamb, will rebase once that's in. |
I wonder would it be possible to get some benchmarks (ideally added as another PR) for concatenating structs in Also, note to myself and other reviewers, I verified there are already functional tests for concatenating structs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @asubiotto -- I think the only thing that is needed is another test for concatenating arrays without nulls.
A benchmark would be really nice too
ce9e90e
to
1c779e8
Compare
Thanks for the review @alamb. Added a test for struct concatenation with no nulls and will work on adding a benchmark in a separate PR. |
Added the benchmark in: #7520. Result of this PR:
This was around a 20% improvement without the primitive dictionary field, but I thought I might add that to the benchmark to see the improvements once #7519 is also in. |
🤖 |
🤖: Benchmark completed Details
|
has_nulls |= a.null_count() > 0; | ||
a.as_struct() | ||
}) | ||
.collect::<Vec<_>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collecting a.as_struct()
here into vec is not strictly needed I think?
We can use arrays.iter()
in the later loops?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to have these as StructArrays to call column(i)
below so we might as well downcast here I think. Happy to change this if I'm missing something.
Can you also please merge/rebase this PR against main so it contains the latest benchmark? THe run that just finished doesn't seem to include the new struct concat kernel |
(I would be happy normally to do the merging / rebase but I don't have write access to the polarsignals repo) |
Struct array concatenation previously fell back to concat_fallback, which uses MutableArrayData. This is an issue because any nested types within the struct will not benefit from optimized concatenation. Dictionaries within structs, for example, would not be merged and could result in excessive memory usage. This commit adds no tests because there are already existing tests for struct concatenation.
1c779e8
to
f76014e
Compare
Thanks for the additional review and apologies for the delay. I didn't have the bandwidth to work on this until now. Comments addressed + rebased. |
Struct array concatenation previously fell back to concat_fallback, which uses MutableArrayData. This is an issue because any nested types within the struct will not benefit from optimized concatenation. Dictionaries within structs, for example, would not be merged and could result in excessive memory usage.
This commit adds no tests because there are already existing tests for struct concatenation.
Which issue does this PR close?
Closes #7516
What changes are included in this PR?
This PR adds
concat_structs
and uses it in theconcat
match statement.Are there any user-facing changes?
No