Instead of a full SQL translation of the one-hot encoding algorithm, he was envisioning more of a backend-registered function, which would probably be more performant.
That requires SQL metaprogramming: first enumerating the unique values in the columns, then crafting the CREATE statement.
Additionally, it pigeonholes you into one-hot encoding having to return a separate column per value, whereas many frameworks (e.g. Spark MLlib, if I remember correctly) return something like a 2D vector in lieu of those columns, which can be handled much more efficiently from a memory/hardware perspective.
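For illustration, the two-step metaprogramming described above (enumerate the distinct values, then generate one expression per value) might look roughly like the sketch below. It uses Python's built-in `sqlite3` purely as a stand-in backend; the helper names `distinct_values` and `one_hot_query`, the table `t`, and the column `color` are all hypothetical and not taken from Ibis or ibisml.

```python
import sqlite3


def distinct_values(conn, table, column):
    # Step 1: enumerate the categories present in the column.
    query = f'SELECT DISTINCT "{column}" FROM "{table}" ORDER BY 1'
    return [row[0] for row in conn.execute(query).fetchall()]


def one_hot_query(table, column, categories):
    # Step 2: craft one CASE expression (one output column) per category.
    # Assumes string categories; NULL handling is left out of this sketch.
    parts = []
    for value in categories:
        literal = "'" + str(value).replace("'", "''") + "'"
        parts.append(
            f'CASE WHEN "{column}" = {literal} THEN 1 ELSE 0 END '
            f'AS "{column}_{value}"'
        )
    return f'SELECT {", ".join(parts)} FROM "{table}"'


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (color TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("red",), ("blue",), ("red",)])

cats = distinct_values(conn, "t", "color")
rows = conn.execute(one_hot_query("t", "color", cats)).fetchall()
```

The number of generated columns grows with the number of distinct values, which is the drawback being pointed out here.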
Some response:
When fitting a one-hot-encoder we already have to collect all the cases so they're consistent across all applications of transform.
The only difference here would be whether a one-hot encoder should return a column per case or a single column holding an array of cases. Since the consuming tooling will want a flat array, I'd argue that returning a column per case, without special-casing one-hot encoding (for now), is the correct approach.
Also note - ibisml already has a OneHotEncode step that does all this.
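To make the two output shapes concrete, here is a minimal pure-Python sketch. It is not ibisml's `OneHotEncode` implementation; the class `OneHotSketch` and its method names are hypothetical. The point it illustrates is that the categories are collected once at fit time, so either layout stays consistent across later calls to transform.

```python
class OneHotSketch:
    """Toy encoder: categories are fixed at fit time (hypothetical API)."""

    def fit(self, values):
        # Collect all cases once so every later transform uses the same set.
        self.categories_ = sorted(set(values))
        return self

    def transform_columns(self, values):
        # Column-per-case layout: one named 0/1 column per fitted category.
        return {
            f"is_{cat}": [int(v == cat) for v in values]
            for cat in self.categories_
        }

    def transform_vector(self, values):
        # Array layout: one 0/1 vector per row, ordered like categories_.
        return [[int(v == cat) for cat in self.categories_] for v in values]


enc = OneHotSketch().fit(["red", "blue", "red"])
cols = enc.transform_columns(["blue", "green"])
vecs = enc.transform_vector(["blue", "green"])
```

In this sketch a value that was not seen during fit ("green") encodes to all zeros in both layouts, and the two layouts carry the same information; they differ only in how the consumer receives it.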
There are pros and cons of representing as separate columns vs. vectors. For now, we will wait for additional signal before switching to further leverage vector abstractions. In the short term, more backends cleanly support standard column operations than vector operations, and the aforementioned SQL metaprogramming to generate CREATE statements with a lot of columns is not as big of a drawback when using Python/Ibis.
Performance considerations should be evaluated more thoroughly at a later point.
deepyaman changed the title from "feat: Alternative OHE using vectors" to "feat: alternative OHE using vectors" on Jul 1, 2024