feat: Improve on the restriction on X and y not to have same columns #179

Open
vspinu opened this issue Mar 6, 2025 · 0 comments

Currently, if X and y share any column names, fitting fails with ValueError: `X` and `y` must not share column names.

Would it be possible to check for common columns in X and y after the recipe has been applied?

Since a recipe can contain Drop and Select steps, it would make more sense to enforce the no-common-columns restriction after the pipeline has processed the data, not before. In the example below, the recipe drops `value` from X, so the transformed X would no longer overlap with y, yet the check still fails:

    import pandas as pd
    import ibis
    import ibis_ml as ml
    con = ibis.duckdb.connect()
    df = pd.DataFrame({
        'cat1': ['AA', 'BBB', 'AA', 'BBB', 'CCC'],
        'cat2': ['X', 'Y', 'Y', 'X', 'Z'],
        'value': [10, 20, 30, 40, 50]
    })
    tbl = con.create_table("tmp", df, overwrite=True)

    tr_oe = ml.Recipe(
        ml.OrdinalEncode(ml.string(), min_frequency=2),
        ml.Drop("value")
    ).fit(tbl, tbl.value)
    # ValueError: `X` and `y` must not share column names
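
To make the requested ordering concrete, here is a minimal sketch in plain Python. It does not use ibis_ml and is not how the library implements the check. The helper names `shared_columns` and `check_after_steps`, and the `dropped` argument standing in for the effect of a Drop step, are all hypothetical.

```python
def shared_columns(x_columns, y_columns):
    """Return the column names present in both X and y, in X's order."""
    y_names = set(y_columns)
    return [name for name in x_columns if name in y_names]


def check_after_steps(x_columns, y_columns, dropped=()):
    """Hypothetical check: compare names only after drops have been applied.

    `dropped` stands in for whatever columns the recipe's steps remove;
    it is an assumption for illustration, not part of the ibis_ml API.
    """
    dropped_names = set(dropped)
    remaining = [name for name in x_columns if name not in dropped_names]
    overlap = shared_columns(remaining, y_columns)
    if overlap:
        raise ValueError(f"`X` and `y` must not share column names: {overlap}")
    return remaining


x_cols = ["cat1", "cat2", "value"]
y_cols = ["value"]

# Checking before any step runs finds the overlap (the current behaviour).
print(shared_columns(x_cols, y_cols))  # ['value']

# Checking after the drop finds no overlap, so no error is raised.
print(check_after_steps(x_cols, y_cols, dropped=["value"]))  # ['cat1', 'cat2']
```

With the check deferred like this, the error would only be raised when an overlapping column actually survives the pipeline.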

Projects
Status: backlog