How to use both evals and experiments in your AI development lifecycle to avoid costly mistakes and ship more successful projects.
AI initiatives are anything but a slam dunk: projects are expensive, hard to measure, and prone to failure along the way. The key to success is building the capability to learn quickly and fail cheaply. By going beyond upstream, qualitative evals and adding downstream, quantitative experiments, teams can shorten feedback loops and course-correct quickly. In this talk, Datadog Senior Technical Advocate Ryan Lucht shows teams how to leverage both approaches and connect model performance to business metrics to ship more successful AI initiatives.
Key takeaways
- Running experiments at scale with basic and advanced A/B testing approaches for AI development
- How to handle the low success rate of AI initiatives by failing cheaply
- The different purposes of evals and experiments – and why you need both
- Using error analysis to define and measure evaluators