Lessons from 100 P0 incidents

Hard-won patterns for designing systems and leading teams to detect failures sooner, respond faster, and limit the blast radius.

Speakers: Dileep Kumar Pandiya

June 03, 2026

Practical lessons from handling 100 P0 production incidents over 20 years, focusing on system design, incident response, observability, and leadership decisions that reduce impact and recovery time.

Over the past 20 years, I have been directly involved in handling more than 100 P0 production incidents across large-scale, distributed systems. These incidents included full outages, severe performance degradation, and data integrity failures, often handled under significant time pressure and with real business impact.

This talk shares the most important lessons learned from being on the front line of those incidents. Rather than walking through individual war stories, it focuses on the recurring technical and organizational patterns that consistently shaped outcomes, both good and bad.

I will explain how early system design decisions, alerting quality, ownership boundaries, and operational practices influence incident severity long before anything breaks. The talk also examines what actually helps teams stay effective during high-stress incidents and what tends to increase confusion, delay recovery, or lead to burnout.

Drawing on real experience as a hands-on engineer and technical leader, this session offers practical guidance on improving incident response, building more resilient systems, and leading teams through P0 situations with clarity and confidence. The goal is not to eliminate incidents entirely, but to reduce their impact and help teams respond better when failures inevitably occur.

Key takeaways

Recognize recurring patterns behind high-severity production incidents.
Design systems and alerts that surface problems earlier and more clearly.
Improve incident response by reducing cognitive load during P0 events.
Run incident reviews that lead to meaningful system and process changes.
Lead teams through production failures without blame or burnout.

About the speaker

Dileep Kumar Pandiya

Dileep Pandiya is a Principal Engineer and AI strategist with over 20 years of experience building scalable, enterprise-grade systems at Fortune 500 companies including ZoomInfo, Wayfair, Walmart, and IBM.
