-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix invalid schema for unions in ViewTables #15135
base: main
Are you sure you want to change the base?
Conversation
I'm not sure if there are still valid uses of pub fn coerce_union_schema(inputs: &[Arc<LogicalPlan>]) -> Result<DFSchema> {
coerce_union_schema_with_schema(inputs, inputs[0].schema())
}
fn coerce_union_schema_with_schema(inputs: &[Arc<LogicalPlan>], base_schema: &DFSchemaRef) -> Result<DFSchema> {
...
} |
There seems to have been quite a few cracks at this particular situation:
The trouble seems to stem from the decision to maintain |
Maybe @jonahgao can offer an opinion on this suggestion |
@@ -776,8 +777,32 @@ impl LogicalPlanBuilder { | |||
&missing_cols, | |||
is_distinct, | |||
)?; | |||
|
|||
let mut sort_exprs = normalize_sorts(sorts, &plan)?; | |||
if matches!(&plan, LogicalPlan::Union(_)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? It seems to make the following invalid query executable.
DataFusion CLI v46.0.0
> create table t1(a int);
0 row(s) fetched.
Elapsed 0.007 seconds.
> create table t2(a int);
0 row(s) fetched.
Elapsed 0.007 seconds.
> select t1.a from t1 union all select t2.a from t2 order by not_exist_table.a;
+---+
| a |
+---+
+---+
0 row(s) fetched.
Elapsed 0.008 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it feels strange and I wouldn't mind getting rid of this, but I put it in for backwards compatibility, because currently if you ran:
select t1.a from t1 union all select t2.a from t2 order by t1.a
it would work. But after this PR, it would fail without this pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we stripped qualifiers, it should be reasonable that order by t1.a
fails. I tested this behavior in PostgreSQL, and it works that way.
psql=> select a from t1 union all select a from t2 order by t1.a;
ERROR: missing FROM-clause entry for table "t1"
LINE 1: select a from t1 union all select a from t2 order by t1.a;
^
psql=> select a from t1 union all select a from t2 order by a;
a
---
(0 rows)
Moreover, it seems like a bug to me if order by not_exist_table.a
executes successfully.
Which issue does this PR close?
Rationale for this change
At the core, if a
Union
goes through theTypeCoercion
analyzer twice, the schema can drop the qualifier information for fields that were cast in the first pass ofTypeCoercion
. To preserve this information, this PR changescoerce_union_schema
to start from the schema of theUnion
instead of the schema of the first child.Are these changes tested?
Yes
Are there any user-facing changes?
Yes, this changes the signature of
coerce_union_schema
so that the schema carried by theUnion
is available.Before:
After: