I've noticed a mismatch between the model architecture and the saved weights in DeepSeek-R1 on Hugging Face. Specifically, the code defines a 61-layer model, but the checkpoint appears to include 62 layers.
Could you please clarify why 62 layers are saved? Is this intentional (e.g., for an embedding or auxiliary layer), or could it be an off-by-one issue?
Thanks in advance!
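For reference, here is a minimal sketch of how the layer count could be checked directly from the checkpoint's weight index, without downloading the full weights. It assumes the repo ships a standard sharded-safetensors index file named `model.safetensors.index.json` and that transformer-block parameters follow the usual `model.layers.<i>.` naming pattern; adjust if the actual file or prefix differs.

```python
import json
import re

from huggingface_hub import hf_hub_download

# Assumption: the checkpoint is sharded safetensors, so layer names can be
# read from the lightweight weight index instead of the full shards.
index_path = hf_hub_download(
    repo_id="deepseek-ai/DeepSeek-R1",
    filename="model.safetensors.index.json",
)

with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# Collect the distinct layer indices from parameter names such as
# "model.layers.61.mlp.gate.weight".
layer_ids = {
    int(m.group(1))
    for name in weight_map
    if (m := re.match(r"model\.layers\.(\d+)\.", name))
}

print(f"distinct layer indices in checkpoint: {len(layer_ids)}")
print(f"highest layer index: {max(layer_ids)}")
```

Comparing the printed count against `num_hidden_layers` in the repo's `config.json` would show whether the extra entry is a genuine 62nd transformer block or a separately named auxiliary module.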