The practice of LLM-assisted coding is on the rise. How is it affecting site reliability engineering (SRE) teams?
AI-assisted coding is booming. Microsoft’s CEO revealed that AI now writes up to 30% of their code, comparable to Google’s reported 25% back in 2024. Among younger companies, the numbers can be even higher; Y Combinator shared that a quarter of the winter 2025 startup batch had 95% of their codebases generated by AI. The rapid rise of Cursor – an AI-powered code editor – shows how quickly the tech industry is adopting these products.
Developers can now write and ship code faster than ever, but acceleration does not equate to reliability – in some cases, it’s quite the opposite.
The problem with LLM-assisted coding and its impact on SRE
LLM-assisted coding can be a great driver of developer productivity, helping developers write and deploy code faster. However, increased deployment frequency can lead to instability unless strong automation workflows and monitoring are in place. Reliability teams should use metrics like the DORA change failure rate (CFR) to catch problems early.
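To make that concrete, CFR is simply the share of deployments that cause a failure in production. A minimal sketch in Python (the `Deployment` record shape and field names are illustrative, not from any particular tool):

```python
from dataclasses import dataclass

@dataclass
class Deployment:
    """A single production deployment (hypothetical record shape)."""
    id: str
    caused_incident: bool  # did this change trigger a failure in production?

def change_failure_rate(deployments: list[Deployment]) -> float:
    """DORA change failure rate: failed deployments / total deployments."""
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d.caused_incident)
    return failures / len(deployments)

# Example: 3 of 20 deployments this sprint caused incidents -> CFR = 15%
history = [Deployment(f"deploy-{i}", caused_incident=i < 3) for i in range(20)]
print(f"CFR: {change_failure_rate(history):.0%}")  # CFR: 15%
```

Tracked over time, a rising CFR is an early signal that faster shipping is outpacing the team's safety nets.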
AI-assisted coding also makes it easy for developers to produce more code than necessary. This extra, often verbose, code increases batch size, which hurts reliability and makes debugging harder when an incident arises.
What’s more, code generated by LLMs is typically functional, yet often flaky and prone to errors. While CI/CD pipelines are supposed to catch issues through unit tests, integration tests, static analysis, and peer reviews, none achieve exhaustive code coverage, so some issues inevitably slip through.
As developers increasingly rely on LLMs, they will spend less time reasoning through the code and more time approving AI suggestions, reducing their familiarity with the codebase. This shift will also shrink the pool of subject matter experts, those essential team members who know system internals, dependencies, and behavior very well.
The implications are even more pronounced for junior developers, who may not yet know what reliable code looks like. Unlike senior engineers, they can't draw on experience to bake best practices into LLM-suggested code, and they may even be deprived of the chance to acquire that experience through hands-on learning.
While developers may celebrate shipping faster, fewer domain experts combined with an increased number of production incidents spells trouble for SRE teams. More incidents will strain these teams, and fewer experts means less deep knowledge available for quick diagnosis. In an era where engineering teams are expected to accomplish more with less due to AI adoption, there's a significant chance there won't be budget for additional incident responder hiring, or, even worse, the existing workforce will be reduced.
Is AI the solution to AI-related issues?
LLM-assisted development is pushing software code writing beyond the limits of human speed toward machine speed – or at the very least, a hybrid pace where engineers and AI work side by side. There’s no turning back from this shift, putting SRE teams in a position where “if you can’t beat them, join them.” Simply put, only AI-powered tools can effectively handle the scale and pace of issues introduced by AI-driven development.
Modern incident management platforms increasingly use LLMs to automate time-consuming manual tasks. By ingesting incident-related Slack channels and Zoom call transcripts, LLM-powered chatbots can quickly bring first responders up to speed or provide succinct executive updates. The same data can be used to generate initial incident reports and analyses.
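A rough sketch of what that summarization step might look like, assuming the OpenAI Python client (the model choice, prompt, function name, and channel export are all illustrative, not any vendor's actual implementation):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize_incident_channel(messages: list[str]) -> str:
    """Condense raw incident-channel chatter into a responder briefing."""
    transcript = "\n".join(messages)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Summarize this incident channel for an on-call "
                        "engineer joining late: current status, suspected "
                        "cause, and actions already taken."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

# Hypothetical channel export
briefing = summarize_incident_channel([
    "09:02 alice: checkout latency spiking since the 08:55 deploy",
    "09:05 bob: rolling back payments-service to v1.4.2",
    "09:11 alice: error rate recovering, watching dashboards",
])
print(briefing)
```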
But a new generation of LLM-powered tools is emerging that can significantly enhance SRE teams – AI SREs. AI SREs can quickly process thousands of datapoints, including system metrics, code changes, logs, and traces, to identify potential issues. They can react to incident alerts, evaluate impact, pinpoint the issue, and recommend actions to resolve it.
AI SRE tools aren’t here to replace human engineers – they exist to augment your team, empowering them to resolve incidents faster and more efficiently.
Putting the necessary precautions in place is important, because LLMs, like humans, will make mistakes. That's why AI SRE tools should be used in a "read-only" mode at first, letting them surface issues and recommend fixes without being able to act on those suggestions. An AI SRE that provides a confidence percentage alongside its recommendations lets responders quickly assess the robustness of a proposed solution. For example, if the system suggests that a misconfiguration has a 92% likelihood of causing an outage, the diagnosis should be taken seriously. If the percentage is too low, ignore it and fall back to your traditional investigation methods.
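In practice, that gating logic can be as simple as a threshold check. A minimal sketch (the `Recommendation` shape and the 0.8 cutoff are assumptions for illustration, not any vendor's API):

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8  # below this, fall back to manual investigation

@dataclass
class Recommendation:
    """Hypothetical output from an AI SRE running in read-only mode."""
    diagnosis: str
    suggested_fix: str
    confidence: float  # 0.0 - 1.0, as reported by the tool

def triage(rec: Recommendation) -> str:
    """Surface the suggestion to responders; never apply it automatically."""
    if rec.confidence >= CONFIDENCE_THRESHOLD:
        return (f"HIGH CONFIDENCE ({rec.confidence:.0%}): {rec.diagnosis}. "
                f"Proposed fix for human review: {rec.suggested_fix}")
    return (f"Low confidence ({rec.confidence:.0%}); "
            "ignore and investigate with your usual methods.")

print(triage(Recommendation(
    diagnosis="misconfigured connection pool is causing the outage",
    suggested_fix="raise max_connections from 10 to 100",
    confidence=0.92,
)))
```

The key design choice is that even high-confidence output only produces a message for a human, never an action.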
When you are ready to trust it with more active responsibilities, such as deploying fixes, ensure its actions go through the same established guardrails used for human-written code: code review, CI/CD, and canary deployments.
What to watch for when using an AI SRE
With great power comes great responsibility. Because of how deeply integrated these tools need to be and the impact they can have on an engineering organization, there are a few things to watch out for when adopting them.
First, the security of company data. AI SREs need access to critical information, including your codebase, knowledge base, and monitoring and logging tools. Technical leaders must understand how their data is used – where it is transferred, where it is stored, and how securely it is handled – and ensure all of this complies with their security standards.
The more data LLMs are trained on, the better they become, so software vendors are eager to use any available data to improve their generative AI solutions. This is cause for concern: just as Samsung reportedly experienced data leaks involving ChatGPT, an AI SRE could be trained on your proprietary data, which could then leak to competitors or compromise your security.
Alongside security, cost is another critical consideration. Generative AI systems are resource-intensive and expensive to operate, and unchecked usage can quickly lead to significant expenses. Applications built on LLMs, such as AI SRE tools, typically charge by tokens consumed: small chunks of the input data the model analyzes and of the output it generates. The more data sent for processing and the more output the model produces, the higher the overall cost. SRE teams working within tight budgets should carefully evaluate when to invoke the AI, especially for trivial incidents that don't need analysis, or complex incidents where repeated AI analyses fail to yield meaningful insights.
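Back-of-the-envelope math helps when budgeting. A sketch of the cost model (the per-token prices below are placeholders; substitute your vendor's actual rates):

```python
# Hypothetical per-token prices; check your vendor's real pricing.
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000    # $3 per 1M input tokens
PRICE_PER_OUTPUT_TOKEN = 15.00 / 1_000_000  # $15 per 1M output tokens

def incident_analysis_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one AI analysis pass over incident data."""
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)

# One pass over 200k tokens of logs/metrics producing a 2k-token report:
per_run = incident_analysis_cost(200_000, 2_000)
print(f"Per analysis: ${per_run:.2f}")                   # $0.63
print(f"At 500 analyses/month: ${per_run * 500:,.2f}")   # $315.00
```

Small per-run numbers compound quickly, which is why gating trivial incidents out of the AI pipeline matters.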
The final risk to consider is the potential for SRE teams to develop an over-reliance on these tools. If engineers lean on them too much, they may lose touch with their systems; when a major incident hits and the AI falls short, they may struggle to respond quickly because they no longer know the systems they're working with.
Ultimately, engineers need to strike the right balance between staying current with what is happening in their systems and taking advantage of the AI capabilities that let them resolve incidents quickly. Here is my recommendation: just as "vibe coding" isn't suitable for enterprise-grade software development, "incident vibing" – where SREs mindlessly trust AI SRE suggestions and apply fixes without review – should also be avoided.
Final thoughts
As LLM-assisted coding reshapes software engineering, SRE teams must prepare for more frequent and increasingly complex incidents. And with the rising enthusiasm around vibe coding, which deepens developers' reliance on LLMs even further, we can expect this trend to accelerate, especially as companies have already started hiring engineers specifically focused on vibe coding.
Fortunately for SREs, the same AI revolution fueling these challenges introduces powerful new tools to address them. Instead of resisting the AI-driven tide, teams can lean in, integrating these tools to meet new reliability demands. This will empower them not only to handle what's coming, but also to reach the elusive six nines of reliability, without spending Google's SRE budget.