Automation and closer collaboration with customer service teams can help your organization absorb any new shocks 2023 brings.
The past year has been a rollercoaster for developers and their employers. Just as the world appeared to be exiting one crisis, another emerged, with the global economy teetering on the brink of recession. As potential redundancies and burnout add to the workload of already stretched developer teams, organizations must find a way to empower technical teams to work smarter.
For companies that have gone all-in on digital, stress levels are rising as incident volumes mount and customer loyalty gets harder to gain and retain. Automation will help technical teams get ahead of incidents before customers are impacted, in some cases without even needing to get hands-on. Combined with closer collaboration with customer service teams, it could herald an agile, productive start to 2023.
Feeling the pressure
Downtime is a death sentence for digital-first companies. Research reveals that a single hour offline can cost anywhere from $100,000 to $5 million. That puts tremendous pressure on incident responders to fix issues before customers realize something is wrong. The pandemic threw an uncompromising spotlight on the pressure that digital operations teams, including developers, were under to keep their organizations afloat. As more digital services launched, more failures inevitably started impacting customers. Additional research highlights a 19% increase in critical incidents between 2019 and 2020.
In many ways, that pressure has not receded. From banking to retail, digital is the new normal, with profound implications for IT ops teams. Add the Great Resignation into the mix, alongside team members lost to burnout, and the dangers of a vicious circle in incident response start to build. Over half (54%) of responders were interrupted outside of normal hours in 2021, and over two-fifths (42%) said they worked more hours than the previous year. Those remaining are forced to pick up extra work, but then burn out themselves, and the cycle repeats. As a global recession looms, potential job losses will only accelerate and deepen the cycle, but not all hope is lost.
The automation challenge
Against this volatile backdrop, automation holds the key to optimizing developer productivity, whilst ensuring customer expectations are met and incidents are resolved rapidly and proactively. Automation enhances digital operations by reducing manual toil, enabling repetitive incidents to be resolved without human intervention, and empowering developer subject matter experts (SMEs) to work efficiently to fix issues.
Event orchestration platforms apply logic and automation to figure out what to do with each alert in real time. Features like intelligent alert grouping, alert suppression, and noise reduction enable developers to focus only on high-priority incidents. Runbook automation ensures the right job gets delivered to the right SME at the right time, along with any diagnostic information, so that they can hit the ground running and streamline post-mortem investigations. Automation not only drives productivity and customer satisfaction; it can also support compliance by ensuring processes are worked through in the same consistent manner every time, in line with best practices.
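To make alert suppression and grouping concrete, here is a minimal sketch in Python. It is not how any particular platform works: the Alert fields, the choice to suppress "info" alerts, and the five-minute grouping window are all assumptions chosen for illustration. It drops low-severity noise, then folds alerts from the same service into one incident if they arrive within the window of that incident's first alert.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert shape, not taken from any real platform.
    service: str
    summary: str
    severity: str      # "info", "warning" or "critical"
    timestamp: float   # seconds since some reference point


# Assumed policy values for this sketch only.
SUPPRESSED_SEVERITIES = {"info"}
GROUP_WINDOW_SECONDS = 300


def orchestrate(alerts: List[Alert]) -> List[List[Alert]]:
    """Suppress noise, then group alerts per service within a time window."""
    incidents: List[List[Alert]] = []
    open_incident: Dict[str, List[Alert]] = {}  # service -> latest incident

    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if alert.severity in SUPPRESSED_SEVERITIES:
            continue  # suppressed: never reaches a responder

        current = open_incident.get(alert.service)
        within_window = (
            current is not None
            and alert.timestamp - current[0].timestamp <= GROUP_WINDOW_SECONDS
        )
        if within_window:
            current.append(alert)  # grouped into the existing incident
        else:
            group = [alert]        # start a new incident for this service
            open_incident[alert.service] = group
            incidents.append(group)

    return incidents


sample_alerts = [
    Alert("checkout", "5xx spike", "critical", 0),
    Alert("checkout", "latency high", "warning", 120),
    Alert("checkout", "disk usage note", "info", 130),
    Alert("search", "timeouts", "critical", 200),
    Alert("checkout", "5xx spike", "critical", 1000),
]
sample_incidents = orchestrate(sample_alerts)
```

With the sample data, five raw alerts collapse into three incidents, so a responder sees three notifications rather than five. Real platforms typically use richer signals than service name and time, but the effect on responder workload is the same in kind.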
Yet there are also challenges. Automation capabilities are often the preserve of only a few SMEs. This is self-defeating if it means that first responders from customer service teams are forced to escalate tickets to developers, even for recurring issues. These SMEs must then start from scratch running diagnostics – taking up more of their valuable time that should be spent elsewhere.
Another challenge is that a lot of automation is fragmented and built on siloed expertise. This can be hard to unify and leverage without ripping and replacing it. A lack of intelligent automation can also lead to alert overload, which threatens to further impact developer productivity. In some cases, organizations are forced to bring in a second responder simply to acknowledge notifications, because the first one is too busy fixing problems.
Digital maturity and automation
Getting to a place where automation is second nature will require organizations to move through the gears during 2023 and beyond. That means getting away from a “manual” phase where incident handling is slow, workflows are queued, and ticketing systems dominate, with no efficient way to reach SMEs. The next stage up on the maturity curve is a “reactive” mode where distributed teams struggle to share knowledge and there are still no established, defined processes for managing issues.
Half of organizations (50%) currently fall into the “responsive” category, which means they’re starting to use automation to help with issue identification and resolution, and to mobilize the right experts. However, the goal should be achieving either a “proactive” or “preventative” state. Proactive organizations find issues before their customers do and use automation to aid in response actions and communication. Preventative teams take things one stage further, using automated, machine learning-powered workflows to pre-emptively find and remediate issues. Automated processes are everywhere in these companies, eliminating escalations to developers and manual toil.
Reaching out across the organization
Achieving digital maturity shouldn’t just be about putting advanced capabilities in the hands of developers. It should democratize intelligent automation throughout the organization, to ultimately take the strain off technical teams.
Empowering customer service agents is a great example. Runbook automation can be deployed so that agents can trigger diagnostic and remediation actions, like a server restart, for repetitive incidents. Automated workflows should then enable agents to easily escalate more complex issues to the right SME. With this approach, fewer incidents will be referred to SMEs, and those that are will come with contextual customer info automatically added, shortening response times further.
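The agent-first flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real runbook tool: the Incident fields, the RUNBOOKS mapping, and the restart_server function are hypothetical, and the "restart" only records a log entry rather than touching any server. Known, repetitive incident types run their runbook directly; anything else is escalated with the customer context attached.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Incident:
    # Hypothetical incident record for this sketch.
    kind: str
    customer_context: str
    log: List[str] = field(default_factory=list)


def restart_server(incident: Incident) -> bool:
    # Stand-in for a real remediation job: it only records the steps taken.
    incident.log.append("diagnostics collected")
    incident.log.append("server restarted")
    return True


# Assumed mapping of repetitive incident types to pre-approved runbooks.
RUNBOOKS: Dict[str, Callable[[Incident], bool]] = {
    "server_unresponsive": restart_server,
}


def handle(incident: Incident) -> str:
    """Run a known runbook if one exists, otherwise escalate with context."""
    runbook = RUNBOOKS.get(incident.kind)
    if runbook is not None and runbook(incident):
        return "resolved_by_agent"

    # No runbook, or the runbook failed: hand over to an SME, carrying the
    # customer context so they do not start from scratch.
    incident.log.append(
        f"escalated to SME with context: {incident.customer_context}"
    )
    return "escalated_to_sme"


routine = Incident("server_unresponsive", "account 123, checkout page down")
novel = Incident("data_corruption", "account 456, missing order history")
routine_outcome = handle(routine)
novel_outcome = handle(novel)
```

The design choice worth noting is that escalation is the fallback rather than the default, and that the escalation path carries the context the agent already gathered.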
Bi-directional communication between developers and customer service operations (CSOps) teams is important here. This ensures SMEs can easily share updates with customer service agents, who in turn can keep customers informed about resolution timeframes. The latter point is critical to overall satisfaction levels: customers will often be understanding of a service outage as long as they are regularly updated.
New automation trends for 2023
That’s why we’re likely to see AIOps take giant strides in 2023. It will become a force multiplier for enhanced decision making that helps businesses automate away mundane, time-consuming work and allow their developer teams to shine.
To make this a reality next year, focus first on consolidating islands of automation – standardizing and optimizing incident response by breaking down silos of data held across third-party solutions.
In doing so, organizations will also find the barriers between their CSOps, developers and other teams start to dissolve. That will be priceless at a time when operational efficiency emerges as a key competitive advantage.