ValueError when calling from_arrow on iterable with empty pyarrow.Table #22634

Open
2 tasks done
mateuszpoleski opened this issue May 6, 2025 · 0 comments
Labels: bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

mateuszpoleski commented May 6, 2025

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
import pyarrow as pa

# works, single empty pa.table
pl.from_arrow(
    pa.table({
        "id": pa.array([], type=pa.int64())
    })
)

# works, iterable with non empty pa.table
pl.from_arrow(
    (pa.table({
        "id": pa.array([1], type=pa.int64())
    }),)
)

# fails, iterable with empty pa.table
# ValueError: Must pass schema, or at least one RecordBatch
pl.from_arrow(
    (pa.table({
        "id": pa.array([], type=pa.int64())
    }),)
)

Log output

Traceback (most recent call last):
  File "C:\Users\mateu\Desktop\file.py", line 22, in <module>
    pl.from_arrow(
  File "C:\Users\mateu\anaconda3\Lib\site-packages\polars\convert\general.py", line 568, in from_arrow
    pa_table = pa.Table.from_batches(
               ^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow\table.pxi", line 3848, in pyarrow.lib.Table.from_batches
ValueError: Must pass schema, or at least one RecordBatch

Issue description

pl.from_arrow appears to mishandle input provided as an iterable: it raises a ValueError when the iterable contains only an empty pyarrow.Table.

I think the error comes from this code in polars/convert/general.py:

pa_table = pa.Table.from_batches(
    itertools.chain.from_iterable(
        (b.to_batches() if isinstance(b, pa.Table) else [b]) for b in data
    )
)

If we call .to_batches() on an empty pyarrow.Table, it returns an empty list, so the schema/type information is lost: there are no batches left for pa.Table.from_batches to infer it from.

Expected behavior

Return empty DataFrame with preserved schema.

Installed versions

--------Version info---------
Polars:              1.29.0
Index type:          UInt32
Platform:            Windows-10-10.0.26100-SP0
Python:              3.11.3 | packaged by Anaconda, Inc. | (main, Apr 19 2023, 23:46:34) [MSC v.1916 64 bit (AMD64)]
LTS CPU:             False

----Optional dependencies----
Azure CLI            'az' is not recognized as an internal or external command,
operable program or batch file.
<not installed>
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                1.24.28
cloudpickle          2.2.1
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2023.3.0
gevent               <not installed>
google.auth          2.22.0
great_tables         <not installed>
matplotlib           3.7.1
numpy                1.24.3
openpyxl             3.0.10
pandas               1.5.3
polars_cloud         <not installed>
pyarrow              11.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           1.4.39
torch                2.3.0
xlsx2csv             <not installed>
xlsxwriter           <not installed>

mateuszpoleski added the bug, python, and needs triage labels May 6, 2025