I've noticed a mismatch between the model architecture and the saved weights in DeepSeek-R1 on Hugging Face. Specifically, the code defines a 61-layer model, but the checkpoint appears to include 62 layers.
Could you please clarify why 62 layers are saved? Is this intentional (e.g., for an embedding or auxiliary layer), or could it be an off-by-one issue?
Thanks in advance!
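For reference, here is a minimal sketch of how the layer count could be checked directly from the checkpoint's weight index, without downloading the full weights. It assumes the repo ships a standard sharded-safetensors index file named `model.safetensors.index.json` and that transformer-block parameters follow the usual `model.layers.<i>.` naming pattern; adjust if the actual file or prefix differs.

```python
import json
import re

from huggingface_hub import hf_hub_download

# Assumption: the checkpoint is sharded safetensors, so layer names can be
# read from the lightweight weight index instead of the full shards.
index_path = hf_hub_download(
    repo_id="deepseek-ai/DeepSeek-R1",
    filename="model.safetensors.index.json",
)

with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# Collect the distinct layer indices from parameter names such as
# "model.layers.61.mlp.gate.weight".
layer_ids = {
    int(m.group(1))
    for name in weight_map
    if (m := re.match(r"model\.layers\.(\d+)\.", name))
}

print(f"distinct layer indices in checkpoint: {len(layer_ids)}")
print(f"highest layer index: {max(layer_ids)}")
```

Comparing the printed count against `num_hidden_layers` in the repo's `config.json` would show whether the extra entry is a genuine 62nd transformer block or a separately named auxiliary module.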