Add a new loss in the `cross_entropy_loss.py` file that inherits from the SFT loss but calls Liger's `fused_linear_cross_entropy` loss. It will need to detect whether the input is a `DTensor` and convert it to a plain local tensor before calling the Liger loss.
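A minimal sketch of the dispatch logic described above. The SFT base, the Liger kernel, and `DTensor` are all stubbed here with stdlib-only stand-ins so the unwrap-then-call pattern can run standalone; in the real implementation these would be the actual `torch.distributed.tensor.DTensor` (which exposes `full_tensor()`) and `liger_kernel`'s fused loss.

```python
class DTensor:
    """Stub standing in for torch.distributed.tensor.DTensor."""

    def __init__(self, local):
        self._local = local

    def full_tensor(self):
        # Real DTensor gathers the full unsharded tensor here.
        return self._local


def fused_linear_cross_entropy(hidden, weight, targets):
    # Stub for the Liger fused linear + cross-entropy kernel;
    # returns a placeholder scalar instead of a real loss.
    return 0.0


class LigerFusedLinearCrossEntropyLoss:
    """Hypothetical loss that unwraps DTensor inputs before the Liger call."""

    def __call__(self, hidden, weight, targets):
        # The fused kernel (assumed) cannot consume sharded tensors,
        # so convert any DTensor to a regular tensor first.
        if isinstance(hidden, DTensor):
            hidden = hidden.full_tensor()
        if isinstance(weight, DTensor):
            weight = weight.full_tensor()
        return fused_linear_cross_entropy(hidden, weight, targets)
```

In the real class the same check-and-convert step would run on each forward, then delegate to the parent SFT loss's bookkeeping with the Liger kernel swapped in for the compiled cross-entropy.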
Edge case: the output weight may be a tied embedding that is TP-sharded (a `DTensor`). In that case we would either have to unshard and then reshard the weight every step, or raise an error for that configuration. (This assumes the Liger losses don't work with sharded weights.)
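One way to handle the edge case above is to fail fast rather than silently unshard every step. The check below is a sketch with hypothetical names; `Shard` and `DTensor` are stdlib stubs standing in for the torch placements API, and `tied` would come from the model config.

```python
class Shard:
    """Stub for torch.distributed.tensor.placement_types.Shard."""


class DTensor:
    """Stub for torch.distributed.tensor.DTensor, keeping only placements."""

    def __init__(self, placements):
        self.placements = placements


def validate_output_weight(weight, tied: bool):
    """Raise if a tied embedding weight is TP-sharded.

    Unsharding and resharding the weight on every step would be
    wasteful, so (under the assumption that Liger losses cannot
    consume sharded weights) this configuration is rejected.
    """
    is_sharded = isinstance(weight, DTensor) and any(
        isinstance(p, Shard) for p in weight.placements
    )
    if tied and is_sharded:
        raise ValueError(
            "Liger fused_linear_cross_entropy does not support a "
            "TP-sharded tied embedding weight; use the compiled "
            "linear cross-entropy loss for this configuration."
        )
    return weight
```

The alternative (gathering the full weight each step via `full_tensor()` and discarding it afterwards) keeps the feature working but trades memory and a per-step all-gather for it, so erroring out seems like the safer default.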
A good validation of this feature would be to check whether this loss improves the numbers here beyond the compiled linear cross-entropy loss.