Research Engineer, Interpretability

Anthropic

External Application | San Francisco, CA | Hybrid | Full Time

$315,000 - $560,000 / year

Posted 3 hours ago

About the Role

The Interpretability team at Anthropic works to reverse-engineer how trained models work, on the view that a mechanistic understanding is the most robust way to make advanced systems safe. As a Research Engineer, you'll build and maintain the specialized inference and training infrastructure that powers interpretability research, including instrumented forward/backward passes, activation extraction, and steering vector application. Salary: $315,000–$560,000 USD.

Requirements

You May Be a Good Fit If You

• Have 5–10+ years of experience building software
• Are highly proficient in at least one programming language (Python, Rust, Go, Java) and productive with Python
• Are extremely curious about unfamiliar domains and can quickly learn and put knowledge to work
• Have a strong ability to prioritize the most impactful work and operate with ambiguity
• Are curious about interpretability research and its role in AI safety

Strong Candidates May Also Have Experience With

• Optimizing performance of large-scale distributed systems
• Language modeling fundamentals with transformers
• High-performance LLM optimization: memory management, compute efficiency, parallelism strategies
• Working hands-on in a mainstream ML stack (PyTorch/CUDA on GPUs or JAX/XLA on TPUs)
