Reducing the burden of incident response on your teams

Unlock system reliability through a culture of shared responsibility and collaboration

Moderated by Kristie Baker

Speakers: Mirella Batista, Cheng Leong, Chris Evans, Spencer Norman

November 01, 2023

Achieve higher availability with leaner teams by alleviating the pressure of incident management on your developers

Imagine it’s 3 am, and your phone jolts you from deep sleep with an outage alert. You fumble to decipher the urgency – is it a minor hiccup or a full-blown catastrophe? You groggily search for that runbook and deliberate over who else to wake up.

While this scenario may sound familiar, it is neither a sustainable nor an optimal approach to incident response. With stripped-back teams and fewer hands on deck, relying on a single person in the dead of night only creates stress and adds to developers' already-too-high cognitive load. Human judgement is imperfect at the best of times, and expecting a handful of people to shoulder all the blame for any mishap is unrealistic.

This panel of engineering leaders share how they reduce the burden of incident response for their teams. They advocate for a culture of shared responsibility across the board, offering practical strategies to educate the business about engineering practices during the chaos of an outage. Emphasizing transparency and teamwork to bolster system reliability not only eases the load on developers but also puts an end to the blame game. The result? Improved predictability, system reliability, and a boost in confidence for individual developers when the inevitable incident comes knocking.

Key takeaways

How to instil a culture of incident awareness within the wider org
Understand what a well-documented, established incident response process looks like
Learn how to adopt DevOps practices that actually break down silos
Learn how to minimize the impact of incidents on productivity and performance