Implement tree explain for ArrowFileSink #15206

Open: wants to merge 3 commits into main

@irenjj (Contributor) commented Mar 13, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) datasource Changes to the datasource crate labels Mar 13, 2025
@@ -1711,35 +1711,58 @@ physical_plan

query TT
explain COPY (VALUES (1, 'foo', 1, '2023-01-01'), (2, 'bar', 2, '2023-01-02'), (3, 'baz', 3, '2023-01-03'))
-TO 'test_files/scratch/explain_tree/1.json';
+TO '/tmp/1.json';
irenjj (Contributor Author):

If I use a relative path, the output also prints my local directory. 👀
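The leak can be illustrated with a minimal, std-only Rust sketch. It assumes that a relative output path is turned into an absolute one by joining it onto the process's working directory; the helper `resolve_output_path` and the directory `/home/user/datafusion` are hypothetical stand-ins, not DataFusion's actual resolution code. Unix-style paths are assumed.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve a user-supplied output path against a base
/// directory (a stand-in for the process's current working directory).
/// Absolute paths are kept as written; relative ones are joined onto `base`.
/// Unix-style paths are assumed ("/tmp/1.json" is not absolute on Windows).
fn resolve_output_path(user_path: &str, base: &Path) -> PathBuf {
    let path = Path::new(user_path);
    if path.is_absolute() {
        path.to_path_buf()
    } else {
        base.join(path)
    }
}

fn main() {
    // Hypothetical working directory; a real process would typically get
    // this from std::env::current_dir().
    let cwd = Path::new("/home/user/datafusion");

    // A relative path picks up the machine-specific directory...
    let relative = resolve_output_path("test_files/scratch/explain_tree/1.json", cwd);
    // ...while an absolute path is reproduced unchanged.
    let absolute = resolve_output_path("/tmp/1.json", cwd);

    println!("{}", relative.display());
    println!("{}", absolute.display());
}
```

Any EXPLAIN output that prints the resolved value therefore depends on where the test happens to run.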

Contributor:

Yeah, we need to avoid printing the entire path. Let me see if I can find a way.

Contributor:

Maybe we can save the original path for display; perhaps we could save the output_url here as a field on FileSinkConfig 🤔

https://github.com/apache/datafusion/blob/db45ff3eea33c0e3ad607ce1abff266a9956ab22/datafusion/core/src/physical_planner.rs#L500-L499

pub struct FileSinkConfig {
    // ... existing fields ...
    /// The unresolved URL specified by the user
    pub original_url: String,
    // ... existing fields ...
}
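The suggestion can be sketched in plain Rust: keep the unresolved URL next to the resolved one, and have the EXPLAIN rendering read only the former. `SinkConfigSketch`, its `resolved_url` field, `explain_file_entry`, and `write_target` are hypothetical simplifications for illustration; the real `FileSinkConfig` carries many more fields and renders through DataFusion's own display machinery.

```rust
/// Hypothetical, heavily simplified stand-in for a file sink config.
/// Only the two URL-related fields that matter here are modeled.
struct SinkConfigSketch {
    /// The unresolved URL specified by the user, e.g. in `COPY ... TO '...'`.
    original_url: String,
    /// The fully resolved location the sink actually writes to.
    resolved_url: String,
}

impl SinkConfigSketch {
    /// Hypothetical rendering hook for tree EXPLAIN: reports the URL exactly
    /// as the user wrote it, so the output does not depend on the machine.
    fn explain_file_entry(&self) -> String {
        format!("file: {}", self.original_url)
    }

    /// The resolved location stays available for the code that writes data.
    fn write_target(&self) -> &str {
        &self.resolved_url
    }
}

fn main() {
    let config = SinkConfigSketch {
        original_url: "test_files/scratch/explain_tree/1.json".to_string(),
        // Hypothetical resolved value on one developer's machine.
        resolved_url: "file:///home/user/datafusion/test_files/scratch/explain_tree/1.json"
            .to_string(),
    };

    println!("{}", config.explain_file_entry()); // what EXPLAIN shows
    println!("{}", config.write_target()); // where data is actually written
}
```

Because the resolved URL is still kept, where the sink writes does not change; only the displayed text becomes stable across machines.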

Contributor Author:

Thanks @alamb, that's a good idea!

09)│ rows: 1 │
10)└───────────────────────────┘
03)│ -------------------- │
04)│ file:///tmp/1.json │
Contributor:

Can we also add format: csv?
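Showing the format amounts to one more key/value line in the node's box. The sketch below uses a hypothetical, simplified renderer, `render_node_box`, and a hypothetical node title `SinkNode`; it is not DataFusion's tree renderer (which, for example, centers text and handles long values differently) and only shows how a `format` entry would sit next to the `file` entry.

```rust
/// Hypothetical renderer that draws one plan node as a box, loosely in the
/// style of tree EXPLAIN: a title, a separator, then one `key: value` line
/// per entry. Lines are left-aligned and padded to a common width.
fn render_node_box(title: &str, entries: &[(&str, &str)]) -> Vec<String> {
    let mut body = vec![title.to_string(), "-".repeat(20)];
    for (key, value) in entries {
        body.push(format!("{key}: {value}"));
    }

    // Measure in chars, not bytes, so non-ASCII content pads correctly.
    let width = body.iter().map(|line| line.chars().count()).max().unwrap_or(0);

    let mut out = Vec::with_capacity(body.len() + 2);
    out.push(format!("┌{}┐", "─".repeat(width + 2)));
    for line in &body {
        let pad = width - line.chars().count();
        out.push(format!("│ {}{} │", line, " ".repeat(pad)));
    }
    out.push(format!("└{}┘", "─".repeat(width + 2)));
    out
}

fn main() {
    // Hypothetical entries: the user-facing path plus the requested format.
    let lines = render_node_box("SinkNode", &[("file", "/tmp/1.json"), ("format", "csv")]);
    for line in lines {
        println!("{line}");
    }
}
```

Where the format value would come from (for instance, the sink's configured file type) is left open in this sketch.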

@alamb (Contributor) left a comment

🙏

@alamb (Contributor) commented Mar 13, 2025

This is so close, thank you @irenjj!

@github-actions github-actions bot added the proto Related to proto crate label Mar 14, 2025
Successfully merging this pull request may close this issue: Implement tree explain for ArrowFileSink