I’m Lucia, a neural network interpretability researcher interested in unsupervised methods and the aspects of human intelligence that evade description. I became interested in machine learning research after reading a blog post about superintelligence during high school.
I like to train strange sparse autoencoders, most recently binary TopK autoencoders (BAEs) and TopK SAEs trained on backward-pass gradients. I’m interested in using the board game Diplomacy as a testbed for studying how much strategic value interpretability tools provide in multi-agent environments. I also have active projects in data attribution and tamper-resistant unlearning.
GitHub: https://github.com/luciaquirke
Twitter: https://twitter.com/lucia_quirke