Skip to content

Clarify which column is kept in join with left_on and right_on #22659

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jxu opened this issue May 8, 2025 · 2 comments
Open

Clarify which column is kept in join with left_on and right_on #22659

jxu opened this issue May 8, 2025 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

@jxu
Copy link

jxu commented May 8, 2025

Description

import polars as pl
df1 = pl.DataFrame({"a": ["x","y","z"], "lk": [1,2,3]})
df2 = pl.DataFrame({"b": ["x","y","z"], "rk": [1,2,3]})

print(df1.join(df2, how="inner", left_on="lk", right_on="rk"))

Result: only lk column is kept. IIRC in SQL, in a SELECT with a join, both columns are kept around and you can choose which ones to keep with columns in SELECT. But doing a polars .select("rk") won't work because it's already gone.

Link

https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.join.html

@jxu jxu added the documentation Improvements or additions to documentation label May 8, 2025
@alexander-beedie
Copy link
Collaborator

alexander-beedie commented May 8, 2025

FYI, you can return both columns by setting coalesce=False, though we could probably make the "coalesce" docstring even more explicit about what it means to coalesce the join columns (eg: keeps the left-hand column) 🤔

df1.join(df2, how="inner", left_on="lk", right_on="rk", coalesce=False)
# shape: (3, 4)
# ┌─────┬─────┬─────┬─────┐
# │ a   ┆ lk  ┆ b   ┆ rk  │
# │ --- ┆ --- ┆ --- ┆ --- │
# │ str ┆ i64 ┆ str ┆ i64 │
# ╞═════╪═════╪═════╪═════╡
# │ x   ┆ 1   ┆ x   ┆ 1   │
# │ y   ┆ 2   ┆ y   ┆ 2   │
# │ z   ┆ 3   ┆ z   ┆ 3   │
# └─────┴─────┴─────┴─────┘

@jxu
Copy link
Author

jxu commented May 8, 2025

I see. There's no indication that an inner join will only keep the left key, especially since inner joins are symmetric.
Incidentally, SQL has a poorly names COALESCE function that does something completely different.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

2 participants