Skip to content

GH-46222: [Python] Allow to specify footer when opening IPC file for writing #46354

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

chaubold
Copy link

@chaubold chaubold commented May 7, 2025

Rationale for this change

The Arrow APIs for other languages offer to set the footer metadata of an IPC file, but pyarrow was lacking this. I opened #46222 because we need this feature at KNIME, and took a shot at suggesting an implementation. Happy to get your feedback on the change!

What changes are included in this PR?

A new keyword argument metadata in RecordBatchFileWriter.__init__ as well as in ipc.new_file. The value is None by default to not break backwards compatibility.

Also added a metadata property to the RecordBatchFileReader to be able to extract the metadata easily.

Are these changes tested?

Yes, by a unit test.

Are there any user-facing changes?

Yes, see above :)

@chaubold chaubold requested review from AlenkaF, raulcd and rok as code owners May 7, 2025 16:12
Copy link

github-actions bot commented May 7, 2025

⚠️ GitHub issue #46222 has been automatically assigned in GitHub to PR creator.

@chaubold chaubold changed the title GH-46222: [Python] Allow to specify footer when opening IPC file for … GH-46222: [Python] Allow to specify footer when opening IPC file for writing May 7, 2025
@chaubold chaubold force-pushed the GH-46222-allow-set-ipc-footer-pyarrow branch 2 times, most recently from 17c2586 to af34416 Compare May 7, 2025 16:16
@rok
Copy link
Member

rok commented May 7, 2025

Hey @chaubold ! Thanks for doing this, PR looks pretty straightforward, I can take a look at it tomorrow.

Copy link
Member

@rok rok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for an issue with the docstring (see comment) this looks good to me. It would be nice to get another quick look @raulcd ?
The gdb issue (AMD64 Conda Python 3.10 Without Pandas) seems unrelated.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels May 8, 2025
@chaubold chaubold force-pushed the GH-46222-allow-set-ipc-footer-pyarrow branch from af34416 to e139f66 Compare May 9, 2025 07:25
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 9, 2025
@chaubold chaubold force-pushed the GH-46222-allow-set-ipc-footer-pyarrow branch 2 times, most recently from b23205b to 990ab60 Compare May 9, 2025 09:45
@chaubold chaubold force-pushed the GH-46222-allow-set-ipc-footer-pyarrow branch from 990ab60 to 0a62797 Compare May 9, 2025 10:12
@chaubold
Copy link
Author

chaubold commented May 9, 2025

Sorry for all the iterations here. But now the docstring complains are resolved and only the gdb issue remains.

@rok
Copy link
Member

rok commented May 9, 2025

Thanks for iterating @chaubold ! I think this is done, I'd just like to @AlenkaF or @raulcd to do another pass before merging.

Copy link
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels May 9, 2025
@chaubold chaubold force-pushed the GH-46222-allow-set-ipc-footer-pyarrow branch from 56c834c to bd7c065 Compare May 9, 2025 13:45
Copy link
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor linting issue

buffer = sink.getvalue()
with pa.ipc.open_file(buffer) as r:
assert r.metadata == meta

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add two empty lines to fix the linter failure?

Suggested change

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels May 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants