Integrate Muon optimizer #2725

Open
joecummings opened this issue May 13, 2025 · 0 comments
Labels: community help wanted, enhancement

The Muon optimizer has been shown to be highly efficient, potentially outpacing AdamW for LLM training. To quote Essential AI: “Muon requires 10–15% fewer tokens than AdamW to reach an identical loss and converts these savings into faster wall-clock convergence, with the advantage staying constant or growing as the batch size increases… These results establish Muon as a drop-in successor to AdamW for second-order optimization at scale.”

We'd love to accept a contribution of a canonical example of Muon in the torchtune library, specifically for our full SFT recipes (single-device and multi-GPU).

Artifacts

  • An implementation of the Muon optimizer as a PyTorch Optimizer (a rough sketch is included after this list)
  • Any recipe changes needed to support Muon across our feature set
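
For reference, here is a minimal sketch of what such an implementation might look like, adapted from Keller Jordan's reference implementation (https://github.com/KellerJordan/Muon). The quintic Newton-Schulz coefficients and hyperparameter defaults follow that reference; the class structure, names, and the shape-based learning-rate scaling are illustrative assumptions, not an existing torchtune API.

```python
# Minimal sketch of Muon as a torch.optim.Optimizer, adapted from Keller
# Jordan's reference implementation (https://github.com/KellerJordan/Muon).
# Newton-Schulz coefficients and defaults follow that reference; everything
# else (names, scaling choice) is illustrative only.
import torch


def zeropower_via_newtonschulz5(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize G with a quintic Newton-Schulz iteration."""
    assert G.ndim == 2
    a, b, c = (3.4445, -4.7750, 2.0315)
    X = G.bfloat16()
    X /= X.norm() + 1e-7  # bound the spectral norm so the iteration converges
    transposed = G.size(0) > G.size(1)
    if transposed:  # iterate on the "wide" orientation for efficiency
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X  # X <- aX + bAX + cA^2 X
    if transposed:
        X = X.T
    return X.to(G.dtype)


class Muon(torch.optim.Optimizer):
    """Momentum SGD whose updates are orthogonalized via Newton-Schulz.

    Only pass 2D weight matrices; embeddings, norms, and biases are
    usually kept on AdamW.
    """

    def __init__(self, params, lr=0.02, momentum=0.95, nesterov=True, ns_steps=5):
        defaults = dict(lr=lr, momentum=momentum, nesterov=nesterov, ns_steps=ns_steps)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                state = self.state[p]
                if "momentum_buffer" not in state:
                    state["momentum_buffer"] = torch.zeros_like(g)
                buf = state["momentum_buffer"]
                buf.mul_(group["momentum"]).add_(g)
                g = g.add(buf, alpha=group["momentum"]) if group["nesterov"] else buf
                update = zeropower_via_newtonschulz5(g, steps=group["ns_steps"])
                # Heuristic scaling so update magnitude is roughly shape-independent.
                scale = max(1.0, p.size(0) / p.size(1)) ** 0.5
                p.add_(update, alpha=-group["lr"] * scale)
```

Since Muon only applies to 2D weight matrices, a recipe integration would likely need to route parameters between two optimizers. A hypothetical split (`model` is a placeholder) might look like:

```python
muon_params = [p for p in model.parameters() if p.ndim == 2]
other_params = [p for p in model.parameters() if p.ndim != 2]
opts = [Muon(muon_params, lr=0.02), torch.optim.AdamW(other_params, lr=3e-4)]
```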

Acceptance Criteria

  • Clean, well-documented code with proper citations
  • Tests
  • Logs comparing Muon to AdamW for text training
  • Logs comparing Muon to AdamW for multimodal (image + text) training

Resources
