feat: Change SQL-Explode/UNNEST to Dataframe.explode method #22546

Felix-Blom · 2025-05-01T12:03:20Z

Summary

Currently, UNNEST (the SQL counterpart of explode in Polars) in SQLContext relies on Expr.explode(), which does not preserve the row-wise mapping between the exploded list and the other columns in the DataFrame. As a result, selecting UNNEST of a list column alongside another column (e.g., sort_key) raises a shape mismatch error instead of returning the exploded rows, because the non-list columns keep their original length.

Example

import polars as pl

df = pl.DataFrame(
    {
        "list_long": [[1, 2, 3], [4, 5, 6]],
        "sort_key": [2, 1],
    }
)

print(df.sql("SELECT UNNEST(list_long), sort_key FROM self"))

Old behaviour

polars.exceptions.ShapeError: Series length 2 doesn't match the DataFrame height of 6

New behaviour


shape: (6, 2)
┌───────────┬──────────┐
│ list_long ┆ sort_key │
│ ---       ┆ ---      │
│ i64       ┆ i64      │
╞═══════════╪══════════╡
│ 1         ┆ 2        │
│ 2         ┆ 2        │
│ 3         ┆ 2        │
│ 4         ┆ 1        │
│ 5         ┆ 1        │
│ 6         ┆ 1        │
└───────────┴──────────┘

Solution

Changed the SQL UNNEST implementation to use DataFrame.explode() instead of Expr.explode()/List.explode().

codecov · 2025-05-01T12:16:39Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.89%. Comparing base (716c902) to head (d1a0fbc).
Report is 10 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #22546      +/-   ##
==========================================
- Coverage   80.93%   80.89%   -0.04%     
==========================================
  Files        1651     1656       +5     
  Lines      233014   234059    +1045     
  Branches     2752     2752              
==========================================
+ Hits       188599   189352     +753     
- Misses      43754    44047     +293     
+ Partials      661      660       -1



Felix-Blom · 2025-05-09T18:29:36Z

@alexander-beedie Based on previous commits, this seems like something right up your alley. Still learning, so I wanted to check a couple of my assumptions and lay them out to you (and others):

Regarding the ORDER BY tests in Python: the ordering now happens after the EXPLODE/UNNEST step, which seems logically correct to me. However, I'm not entirely confident about the downstream implications. Does anything come to mind? The previous NULL checks now seem redundant, and the new behavior is more in line with what I would expect, but my assumptions may be wrong.

I did test some of my assumptions in other SQL tools, which gave results matching my expectations!

import duckdb
import pandas as pd

conn = duckdb.connect(database=":memory:")
df = pd.DataFrame.from_dict({"list_long": [[1, 2, 3], [4, 5, 6]], "sort_key": [2, 1]})

results = duckdb.sql("SELECT UNNEST(list_long) FROM df").df()
results2 = duckdb.sql("SELECT UNNEST(list_long), sort_key FROM df").df()
results3 = duckdb.sql(
    "SELECT UNNEST(list_long), sort_key FROM df ORDER BY sort_key"
).df()
print(results3)

Curious to hear your thoughts! Thanks in advance!

Felix-Blom requested review from ritchie46, c-peters, alexander-beedie, MarcoGorelli, reswqa and orlp as code owners May 1, 2025 12:03

github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels May 1, 2025

Felix-Blom force-pushed the main branch 6 times, most recently from 37b7864 to 56abf91 Compare May 3, 2025 18:17

feat: Change SQL-Explode to Dataframe.explode method

d1a0fbc

Felix-Blom force-pushed the main branch from 56abf91 to d1a0fbc Compare May 3, 2025 18:20

Felix-Blom changed the title ~~feat: Change SQL-Explode to Dataframe.explode method~~ feat: Change SQL-Explode/UNNEST to Dataframe.explode method May 7, 2025
