Full-Stack Software Engineer, Reinforcement Learning

Anthropic

External Application | San Francisco, CA | Hybrid | Full Time

$300,000 - $405,000 / year

Posted 13 hours ago

About Anthropic

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the Role

As a Full-Stack Software Engineer in RL, you'll build the platforms, tools, and interfaces that power environment creation, data collection, and training observability. The quality of Claude's next generation depends on the quality of the data we train it on, and the systems you build are what make that data possible.

You'll own product surfaces end-to-end, from backend services and APIs to the web UIs that researchers, external vendors, and thousands of data labelers use every day. You don't need a background in ML research. What matters is that you can take an ambiguous, high-stakes problem and ship a polished, reliable product against it, fast.

This team moves very quickly. Claude writes a lot of the code we commit, which means the bottleneck isn't typing; it's judgment, taste, and the ability to react to what researchers need next. You'll iterate on data collection strategies to distill the knowledge of thousands of human experts around the world into our models, and you'll do it in a loop that closes in hours and days, not quarters or months.

What You'll Do

• Build and extend web platforms for RL environment creation, management, and quality review, including environment configuration, versioning, and validation workflows
• Develop vendor-facing interfaces and tooling that let external partners create, submit, and iterate on training environments with minimal friction
• Design and implement platforms for human data collection at scale, including labeling workflows, quality assurance systems, and feedback mechanisms
• Build evaluation dashboards and observability UIs that give researchers real-time insight into environment quality, training run health, and reward hacking
• Create backend services and APIs that connect environment authoring tools, data collection systems, and RL training infrastructure
• Build and expand scalable code data generation pipelines across languages and difficulty levels
• Develop onboarding automation and documentation tooling so new vendors and internal users ramp up in hours, not weeks
• Partner closely with RL researchers, data operations, and vendor management to translate ambiguous requirements into well-scoped, well-designed products

Compensation: $300,000 – $405,000 USD annually

Anthropic offers competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space.

Hybrid policy: staff are expected in the office at least 25% of the time.

Requirements

You May Be a Good Fit If You

• Have strong software engineering fundamentals and real full-stack range: comfortable owning a surface from database schema to frontend
• Are proficient in Python and a modern web stack (React, TypeScript, or similar)
• Have a track record of shipping systems that solved a hard problem, e.g. built the thing that made your team 10x faster
• Operate with high agency: identify what needs to be done and drive it forward without waiting for a ticket
• Care about UX and can build interfaces that are intuitive for both technical researchers and non-technical labelers
• Communicate clearly with researchers, operations teams, and engineers
• Thrive in a fast-moving environment where priorities shift
• Care about Anthropic's mission to build safe, beneficial AI

Strong Candidates May Also Have

• Built data collection, labeling, or annotation platforms
• Background building multi-tenant platforms with role-based access, audit trails, and vendor management workflows
• Experience with cloud infrastructure (GCP or AWS), Docker, and CI/CD pipelines
• Familiarity with LLM training, fine-tuning, or evaluation workflows
• Experience with async Python (Trio, asyncio) or high-throughput API design
• Background in dashboards, monitoring, or observability tooling

Minimum education: Bachelor's degree or equivalent combination of education, training, and/or experience.
