The goal of this project is to explore small NN architectures that excel at chess. Stockfish, by contrast, uses an NN architecture (NNUE) that sacrifices model capacity for extreme evaluation speed, while Leela uses a powerful but slow NN that is infeasible to train on consumer hardware.
With FishBrain, I explore the middle ground: fast NNs with no architectural sacrifices, just a small model size. I believe this is a viable approach to building a competitive chess engine at home once tree search is also implemented. I hope to release a write-up about FishBrain's architecture and training soon.
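The tree-search component could be as simple as negamax with alpha-beta pruning on top of the NN evaluation. A minimal sketch, not FishBrain's actual search; `evaluate`, `legal_moves`, and `apply_move` are hypothetical stand-ins for the real engine:

```python
def negamax(pos, depth, alpha, beta, evaluate, legal_moves, apply_move):
    """Return the best score for the side to move, searching `depth` plies."""
    moves = legal_moves(pos)
    if depth == 0 or not moves:
        return evaluate(pos)  # NN evaluation, from the side to move's point of view
    best = float("-inf")
    for move in moves:
        score = -negamax(apply_move(pos, move), depth - 1, -beta, -alpha,
                         evaluate, legal_moves, apply_move)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:  # beta cutoff: the opponent would never allow this line
            break
    return best
```

Even a shallow search like this multiplies the practical strength of a fast evaluator, which is why inference speed matters so much.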
I finished the first NN (see v0_legacy) in 2024, and it works reasonably well. The NN is much smaller than DeepMind's and achieves a Blitz Elo of about 1800. Some of the code is "research quality", and there is no user-facing interface yet, so you may want to wait for the next version.
I am in the process of updating and reworking the dataset. The next version will be about 2.5x bigger and use a friendlier format.
Old dataset: HuggingFace dataset. The data is extracted from the lichess.org open database and contains all games from 2023 for which Stockfish evaluations were available. It is easy to use with the HuggingFace dataloader, but I am unhappy with the dependency on zstd; the reworked dataset will improve on this.
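To illustrate the kind of training record involved, here is a sketch that parses positions paired with Stockfish centipawn evaluations from a CSV-like layout. The `fen,cp_eval` schema is a hypothetical example; the actual dataset format may differ:

```python
import csv
import io

# Hypothetical record layout: one position per row as "fen,cp_eval".
# This only illustrates the idea of (position, evaluation) training pairs.
raw = io.StringIO(
    "fen,cp_eval\n"
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1,20\n"
)

# Each training example is a (FEN string, centipawn score) pair.
positions = [(row["fen"], int(row["cp_eval"])) for row in csv.DictReader(raw)]
```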
- Leela has produced enormous amounts of very high-quality data. Ideally, I want to extract as much of it as I can into a deduplicated dataset of FEN positions.
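  Deduplication could key on the first four FEN fields (piece placement, side to move, castling rights, en passant square) while ignoring the halfmove and fullmove counters, so that the same position reached via different move orders counts once. A sketch:

  ```python
  def dedup_fens(fens):
      """Keep one FEN per unique position, ignoring halfmove/fullmove counters."""
      seen = set()
      unique = []
      for fen in fens:
          key = " ".join(fen.split()[:4])  # placement, side, castling, en passant
          if key not in seen:
              seen.add(key)
              unique.append(fen)
      return unique
  ```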
- FishBrain should become a Mixture of Experts (MoE): because the base model is small, its memory footprint is low, so scaling it up with additional experts is virtually free.
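  The appeal of MoE here is that per-token compute stays roughly constant while parameter count grows with the number of experts. A minimal top-1 routing sketch in NumPy (the sizes and the single linear layer per expert are illustrative, not FishBrain's architecture):

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  d, n_experts = 8, 4
  gate_w = rng.normal(size=(d, n_experts))                       # router weights
  experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # one matrix per expert

  def moe_forward(x):
      """Route each token to its top-1 expert; only that expert's weights are used."""
      logits = x @ gate_w                 # router scores, shape (tokens, n_experts)
      out = np.empty_like(x)
      for i, tok in enumerate(x):
          e = int(np.argmax(logits[i]))   # chosen expert index
          out[i] = tok @ experts[e]       # only one expert runs per token
      return out
  ```

  Adding experts multiplies the stored parameters by `n_experts` but leaves the per-token matmul cost unchanged, which is exactly the trade-off that favors a small, memory-cheap base model.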
- Quantization: maybe FP4 for acceleration on NVIDIA 50xx-series GPUs?
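  For intuition, FP4 in the E2M1 format can represent only the magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6} (plus sign). A round-to-nearest sketch; real kernels would also apply a per-tensor or per-block scale before rounding, which is omitted here:

  ```python
  # Representable E2M1 magnitudes (sign handled separately).
  FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

  def quantize_fp4(x):
      """Round |x| to the nearest FP4 (E2M1) magnitude, preserving sign."""
      mag = min(FP4_GRID, key=lambda g: abs(abs(x) - g))
      return mag if x >= 0 else -mag
  ```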
- Pruning: train a very deep model and then prune it depth-wise?
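  Depth pruning amounts to dropping whole layers rather than individual weights. A sketch of the selection step, assuming each layer already has an importance score (for example, the evaluation-loss increase when that layer is skipped; the scoring method is hypothetical here):

  ```python
  def prune_depth(layers, importances, keep):
      """Drop the least important layers, keeping the original order of the rest."""
      ranked = sorted(range(len(layers)), key=lambda i: importances[i], reverse=True)
      kept = sorted(ranked[:keep])      # indices of surviving layers, in order
      return [layers[i] for i in kept]
  ```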