
Dispatching eager to lazy causes errors to be cryptic #22596

Open
2 tasks done
Juan-132 opened this issue May 4, 2025 · 1 comment
Labels
A-exceptions Area: exception handling accepted Ready for implementation bug Something isn't working P-medium Priority: medium python Related to Python Polars

Comments


Juan-132 commented May 4, 2025

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame({
    "Column1": [1, 2],
    "Column2": ["A", "B"]
})
df.select('Collumn1')
---------------------------------------------------------------------------
ColumnNotFoundError                       Traceback (most recent call last)
Cell In[12], line 5
      1 df = pl.DataFrame({
      2     "Column1": [1, 2],
      3     "Column2": ["A", "B"]
      4 })
----> 5 df.select('Collumn1')

File c:\Users\name\AppData\Local\Programs\Python\Python313\Lib\site-packages\polars\dataframe\frame.py:9657, in DataFrame.select(self, *exprs, **named_exprs)
   9557 def select(
   9558     self, *exprs: IntoExpr | Iterable[IntoExpr], **named_exprs: IntoExpr
   9559 ) -> DataFrame:
   9560     """
   9561     Select columns from this DataFrame.
   9562 
   (...)
   9655     └──────────────┘
   9656     """
-> 9657     return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)

File c:\Users\name\AppData\Local\Programs\Python\Python313\Lib\site-packages\polars\_utils\deprecation.py:93, in deprecate_streaming_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
     89         kwargs["engine"] = "in-memory"
     91     del kwargs["streaming"]
---> 93 return function(*args, **kwargs)

File c:\Users\name\AppData\Local\Programs\Python\Python313\Lib\site-packages\polars\lazyframe\frame.py:2224, in LazyFrame.collect(self, type_coercion, _type_check, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, collapse_joins, no_optimization, engine, background, _check_order, _eager, **_kwargs)
   2222 # Only for testing purposes
   2223 callback = _kwargs.get("post_opt_callback", callback)
-> 2224 return wrap_df(ldf.collect(engine, callback))

ColumnNotFoundError: Collumn1

Resolved plan until failure:

	---> FAILED HERE RESOLVING 'sink' <---
DF ["Column1", "Column2"]; PROJECT */2 COLUMNS

This is probably the most common error to make.

When working in an .ipynb notebook in VS Code, the traceback output is so long that it gets truncated.
The line "ColumnNotFoundError: Collumn1" is therefore hidden by default for me.

Is it possible to reduce the size of this traceback? A couple of observations:

  • Lines 2222-2224 are marked "Only for testing purposes"; can these be removed?
  • The "Resolved plan until failure" section doesn't help me much in resolving the issue. Does it provide any useful information? Can it be removed?

Log output

Issue description

The traceback for a column-not-found error is not very helpful for resolving the issue.

Expected behavior

Provide a concise, helpful traceback for the most common error.

Installed versions

--------Version info---------
Polars:              1.29.0
Index type:          UInt32
Platform:            Windows-11-10.0.26100-SP0
Python:              3.13.1 (tags/v3.13.1:0671451, Dec  3 2024, 19:06:28) [MSC v.1942 64 bit (AMD64)]
LTS CPU:             False

----Optional dependencies----
Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               5.5.0
azure.identity       <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           <not installed>
numpy                2.2.2
openpyxl             <not installed>
pandas               2.2.3
polars_cloud         <not installed>
pyarrow              20.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>
Juan-132 added the labels bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer) and python (Related to Python Polars) May 4, 2025
coastalwhite changed the title from "Traceback - ColumnNotFoundError is fuzzy" to "Dispatching eager to lazy causes errors to be cryptic" May 4, 2025
@coastalwhite
Collaborator

The problem is that we dispatch many eager operations to .lazy().operation(), which makes the error messages quite cryptic. We should probably present errors differently when an operation was dispatched like this.
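One way to act on this, sketched here as a self-contained toy model rather than Polars' actual internals: the eager method is wrapped in a hypothetical eager_dispatch decorator that catches the error raised by the lazy engine and re-raises it with only the first line of the message, using "from None" so the internal lazy frames are not chained into the traceback. DataFrame, LazyFrame and ColumnNotFoundError below are hypothetical stand-ins that only mimic the traceback in this issue.

```python
import functools


class ColumnNotFoundError(Exception):
    """Hypothetical stand-in for polars.exceptions.ColumnNotFoundError."""


def eager_dispatch(method):
    """Hypothetical decorator for eager methods that delegate to the lazy engine.

    Errors are re-raised with only the first line of their message, and
    `from None` hides the chained traceback of the lazy internals.
    """

    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except ColumnNotFoundError as exc:
            short_message = str(exc).splitlines()[0]
            raise type(exc)(short_message) from None

    return wrapper


class LazyFrame:
    """Toy lazy frame: column names are only resolved at collect() time."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.selected = None

    def select(self, name):
        self.selected = name
        return self

    def collect(self):
        if self.selected not in self.columns:
            plan = f"DF {self.columns}; PROJECT */{len(self.columns)} COLUMNS"
            # Mimic the long message format seen in the issue.
            raise ColumnNotFoundError(
                f"{self.selected}\n\nResolved plan until failure:\n\n{plan}"
            )
        return [self.selected]


class DataFrame:
    """Toy eager frame whose select() dispatches to the lazy engine."""

    def __init__(self, columns):
        self.columns = list(columns)

    def lazy(self):
        return LazyFrame(self.columns)

    @eager_dispatch
    def select(self, name):
        return self.lazy().select(name).collect()


df = DataFrame(["Column1", "Column2"])
try:
    df.select("Collumn1")
except ColumnNotFoundError as exc:
    print(f"{type(exc).__name__}: {exc}")  # ColumnNotFoundError: Collumn1
```

The plan dump can still be useful when debugging a long lazy query, so tying the trimming to the eager wrapper (rather than to the lazy engine itself) keeps the full message for users who wrote a lazy query explicitly.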

coastalwhite added the labels accepted (Ready for implementation) and P-medium (Priority: medium) and removed needs triage (Awaiting prioritization by a maintainer) May 4, 2025
MarcoGorelli added the label A-exceptions (Area: exception handling) May 4, 2025
coastalwhite added a commit to coastalwhite/polars that referenced this issue May 4, 2025
Projects
None yet
Development

No branches or pull requests

3 participants