Inconsistency: DeepSeek-R1 defines 61 layers but model weights contain 62 #643


Open
lichangshiwhu opened this issue Apr 20, 2025 · 0 comments

I've noticed a mismatch between the model architecture and the saved weights in DeepSeek-R1 on Hugging Face. Specifically, the code defines a 61-layer model, but the checkpoint appears to include 62 layers.

Could you please clarify why 62 layers are saved? Is this intentional (e.g., for an embedding or auxiliary layer), or could it be an off-by-one issue?
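For reference, here is a minimal sketch (my own, not from the repo) of how the mismatch can be observed: it compares `num_hidden_layers` in `config.json` against the layer indices named in the checkpoint's `model.safetensors.index.json`. It assumes `huggingface_hub` is installed and that `deepseek-ai/DeepSeek-R1` is the repo in question.

```python
import json
import re

from huggingface_hub import hf_hub_download

REPO_ID = "deepseek-ai/DeepSeek-R1"  # assumed to be the repo in question

# Fetch just the config and the safetensors index, not the full weights.
config_path = hf_hub_download(REPO_ID, "config.json")
index_path = hf_hub_download(REPO_ID, "model.safetensors.index.json")

with open(config_path) as f:
    num_layers = json.load(f)["num_hidden_layers"]

with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# Collect every distinct layer index appearing in weight names such as
# "model.layers.61.input_layernorm.weight".
layer_ids = sorted(
    {int(m.group(1)) for name in weight_map
     if (m := re.match(r"model\.layers\.(\d+)\.", name))}
)

print(f"config num_hidden_layers: {num_layers}")
print(f"layer indices in checkpoint: {layer_ids[0]}..{layer_ids[-1]} "
      f"({len(layer_ids)} distinct layers)")
```

If my observation is accurate, this prints 61 from the config but 62 distinct layer indices (0 through 61) from the checkpoint.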

Thanks in advance!
