Research Engineer, Interpretability
Anthropic
External Application | San Francisco, CA | Hybrid | Full Time
$315,000 - $560,000 / year
Posted 3 weeks ago
About the Role
The Interpretability team at Anthropic is working to reverse-engineer how trained models work, believing a mechanistic understanding is the most robust way to make advanced systems safe. As a Research Engineer, you'll build and maintain the specialized inference and training infrastructure that powers interpretability research, including instrumented forward/backward passes, activation extraction, and steering vector application. Salary: $315,000–$560,000 USD.
Requirements
You May Be a Good Fit If You
• Have 5–10+ years of experience building software
• Are highly proficient in at least one programming language (e.g., Python, Rust, Go, Java) and productive with Python
• Are extremely curious about unfamiliar domains and can quickly learn and put new knowledge to work
• Have a strong ability to prioritize the most impactful work and operate with ambiguity
• Are curious about interpretability research and its role in AI safety
Strong Candidates May Also Have Experience With
• Optimizing the performance of large-scale distributed systems
• Language modeling fundamentals with transformers
• High-performance LLM optimization: memory management, compute efficiency, parallelism strategies
• Working hands-on in a mainstream ML stack (PyTorch/CUDA on GPUs or JAX/XLA on TPUs)