Research Engineer, Interpretability
Anthropic
External Application | San Francisco, CA | Hybrid | Full Time
$315,000 - $560,000 / year
Posted 3 hours ago
Applying takes you to the company's website. Udyra tracks the click but can't confirm whether you completed the application.
About the Role
The Interpretability team at Anthropic is working to reverse-engineer how trained models work, believing a mechanistic understanding is the most robust way to make advanced systems safe. As a Research Engineer, you'll build and maintain the specialized inference and training infrastructure that powers interpretability research, including instrumented forward/backward passes, activation extraction, and steering vector application. Salary: $315,000–$560,000 USD.
Requirements
You May Be a Good Fit If You
• Have 5–10+ years of experience building software
• Are highly proficient in at least one programming language (e.g., Python, Rust, Go, Java) and productive with Python
• Are extremely curious about unfamiliar domains and can quickly learn new material and put it to work
• Have a strong ability to prioritize the most impactful work and operate with ambiguity
• Are curious about interpretability research and its role in AI safety
Strong Candidates May Also Have Experience With
• Optimizing the performance of large-scale distributed systems
• Language modeling fundamentals with transformers
• High-performance LLM optimization: memory management, compute efficiency, parallelism strategies
• Working hands-on in a mainstream ML stack (PyTorch/CUDA on GPUs or JAX/XLA on TPUs)