-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Saner handling of nulls inside arrays #15149
base: main
Are you sure you want to change the base?
Conversation
5fe5bf3
to
57d105b
Compare
Thank you @joroKr21 ! Perhaps @Dandandan or @thinkharderdev or @avantgardnerio could help review this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable to me. I'm unclear on the coercion to variable length arrays though
ArrayFunctionArgument::Element, | ||
ArrayFunctionArgument::Array, | ||
], | ||
array_coercion: Some(ListCoercion::FixedSizedListToList), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh I'm not entirely clear on how the coercion comes into play but an append to a fixed size list would still be a fixed size list right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a type signature PoV yes, it's possible to infer that the return type should also be a fixed-size list with size n + 1
but the implementation itself is not able to handle it:
pub(crate) fn array_append_inner(args: &[ArrayRef]) -> Result<ArrayRef> {
let [array, _] = take_function_args("array_append", args)?;
match array.data_type() {
DataType::LargeList(_) => general_append_and_prepend::<i64>(args, true),
_ => general_append_and_prepend::<i32>(args, true),
}
}
It's unclear to me how an implementation for FixedSizeList
would look like and I would say probably it's out of scope for this PR, it should be separate.
57d105b
to
0d49f0d
Compare
Also added a commit to fix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Which issue does this PR close?
NULL
as an element #7142Rationale for this change
#10790 changed empty arrays to default to
Int64
but that is just a band-aid which creates other problems downstream, e.g. type errors when nesting expressions. It also introduced a bug (and possible panic) inarray_union
andarray_intersect
which are fixed here.Apart from that, from a type-theoretical PoV empty arrays should indeed be typed as
Array<Null>
becauseNull
is coercible to any other type and by extension (immutable arrays are covariant)Array<Null>
is coercible to any other array type.The crux of the problem is that Arrow is incapable of dealing with the mismatching array types that arise from this encoding. To workaround this limitation of Arrow, we unify array function argument types in advance and coerce them to the common type. In addition, we handle
DataType::Null
explicitly in Array functions for the case when all arguments are empty arrays or nulls.What changes are included in this PR?
We use the existing
type_union_resolution
type unification algorithm to handle array function arguments. This is a more robust and consistent approach as opposed to implementing custom coercion rules. Depending on the function we unify either the base types (because some functions work on arrays of arbitrary dimensions) or the array / element types. Then we coerce all arguments to the common type. We still need to handle the edge case when all arguments are nulls or empty arrays explicitly in the array function implementations.Are these changes tested?
Yes, I have amended the
array.slt
tests accordingly.Are there any user-facing changes?
Yes, there are some changes in array function semantics that can be seen in the updated tests:
array_union
andarray_intersect
(which are now also commutative)array_position(Array<Int64>, Utf8)
)I also extended the public API of
Signature
with more helper methods and by default it coerces fixed-size lists to non-fixed-size lists because most array functions don't preserve the arity of lists (e.g. append, prepend, concat, union, intersection, etc.)