Pivotal Token Search
Updated May 17, 2025 - Python
Pivotal Token Search
Adversarial Manipulation of CoT
Analysed determinism, faithfulness, reasoning patterns, and steering. Developed and tested methods to enhance control and fail-safes.
Implementation and analysis of Sparse Autoencoders for neural network interpretability research. Features interactive visualization dashboard and W&B integration.
My AI interpretability research journey