Research Engineer, Interpretability

Anthropic

External Application | San Francisco, CA | Hybrid | Full Time

$315,000 - $560,000 / year

About the Role

The Interpretability team at Anthropic is working to reverse-engineer how trained models work, on the belief that a mechanistic understanding is the most robust way to make advanced systems safe. As a Research Engineer, you'll build and maintain the specialized inference and training infrastructure that powers interpretability research, including instrumented forward/backward passes, activation extraction, and steering vector application. Salary: $315,000–$560,000 USD.

Requirements

You May Be a Good Fit If You

• Have 5–10+ years of experience building software
• Are highly proficient in at least one programming language (Python, Rust, Go, Java) and productive with Python
• Are extremely curious about unfamiliar domains and can quickly learn and put knowledge to work
• Have a strong ability to prioritize the most impactful work and operate with ambiguity
• Are curious about interpretability research and its role in AI safety

Strong Candidates May Also Have Experience With

• Optimizing the performance of large-scale distributed systems
• Language modeling fundamentals with transformers
• High-performance LLM optimization: memory management, compute efficiency, parallelism strategies
• Working hands-on in a mainstream ML stack (PyTorch/CUDA on GPUs or JAX/XLA on TPUs)
