-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix invalid schema for unions in ViewTables #15135
base: main
Are you sure you want to change the base?
Conversation
I'm not sure if there are still valid uses of pub fn coerce_union_schema(inputs: &[Arc<LogicalPlan>]) -> Result<DFSchema> {
coerce_union_schema_with_schema(inputs, inputs[0].schema())
}
fn coerce_union_schema_with_schema(inputs: &[Arc<LogicalPlan>], base_schema: &DFSchemaRef) -> Result<DFSchema> {
...
} |
There seems to have been quite a few cracks at this particular situation:
The trouble seems to stem from the decision to maintain |
Maybe @jonahgao can offer an opinion on this suggestion |
@@ -776,8 +777,32 @@ impl LogicalPlanBuilder { | |||
&missing_cols, | |||
is_distinct, | |||
)?; | |||
|
|||
let mut sort_exprs = normalize_sorts(sorts, &plan)?; | |||
if matches!(&plan, LogicalPlan::Union(_)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? It seems to make the following invalid query executable.
DataFusion CLI v46.0.0
> create table t1(a int);
0 row(s) fetched.
Elapsed 0.007 seconds.
> create table t2(a int);
0 row(s) fetched.
Elapsed 0.007 seconds.
> select t1.a from t1 union all select t2.a from t2 order by not_exist_table.a;
+---+
| a |
+---+
+---+
0 row(s) fetched.
Elapsed 0.008 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it feels strange and I wouldn't mind getting rid of this, but I put it in for backwards compatibility, because currently if you ran:
select t1.a from t1 union all select t2.a from t2 order by t1.a
it would work. But after this PR, it would fail without this pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we stripped qualifiers, it should be reasonable that order by t1.a
fails. I tested this behavior in PostgreSQL, and it works that way.
psql=> select a from t1 union all select a from t2 order by t1.a;
ERROR: missing FROM-clause entry for table "t1"
LINE 1: select a from t1 union all select a from t2 order by t1.a;
^
psql=> select a from t1 union all select a from t2 order by a;
a
---
(0 rows)
Moreover, it seems like a bug to me if order by not_exist_table.a
executes successfully.
|
||
// Generate unique field name | ||
let name = if let Some(count) = name_counts.get_mut(&base_name) { | ||
*count += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense to me👍. But it doesn't work for this case:
> create table t1(a int);
0 row(s) fetched.
Elapsed 0.005 seconds.
> create table t2(a int);
0 row(s) fetched.
Elapsed 0.007 seconds.
> select * from t1,t2 union all select * from t1,t2;
type_coercion
caused by
Schema error: Schema contains duplicate unqualified field name
The above query works on the main branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like TypeCoercion
undoes the renaming efforts here.
Maybe TypeCoercion
should only modify the data types of union inputs, rather than field names 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh right, because the union schema is discarded during coercion. My initial attempt to fix this issue propagated the union schema through, but I removed it when I decided not to maintain qualifiers, but I think adding that back would resolve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 612c217
Which issue does this PR close?
Rationale for this change
At the core, if a
Union
goes through theTypeCoercion
analyzer twice, the schema can drop the qualifier information for fields that were cast in the first pass ofTypeCoercion
. To preserve this information, this PR changescoerce_union_schema
to start from the schema of theUnion
instead of the schema of the first child.Are these changes tested?
Yes
Are there any user-facing changes?
Yes, this changes the signature of
coerce_union_schema
so that the schema carried by theUnion
is available.Before:
After: