AI coding tools with broad permissions and limited oversight can quickly turn access decisions into production-level risk.
According to the Financial Times, changes made by Amazon’s Kiro AI coding agent triggered multiple outages of AWS services, including AWS Cost Explorer, after engineers allowed it to make changes with minimal oversight.
The incident, which was described by one engineer as “entirely foreseeable”, offers an early glimpse into what can go wrong as companies hand more control of core systems to autonomous agents.
One episode, a roughly 13-hour interruption in December, allegedly began when the tool decided the best fix was to delete and recreate an environment. That kind of decisive, context-light action will sound familiar to anyone who has watched automated systems confidently bulldoze the wrong thing because, according to the inputs, it made sense.
Addressing the FT report, Amazon says Kiro requests authorization before taking action and maintains that the issue stemmed from the agent having broader permissions than engineers expected. The company has framed the agent’s involvement as “coincidental” and said the incidents were down to user error rather than AI error, adding that its tools are no more prone to mistakes than people.
Regardless of who is to blame, the incident highlights some emerging issues that engineering leaders need to be aware of.
The permission problem is the real story
For most engineers, the AWS incidents will feel familiar, as production problems often come down to changes made by someone – or something – that had more permissions than people realized. The difference now is that AI tools are making some of those changes autonomously, putting the spotlight firmly on the level of permissions and visibility they are being afforded.
“Kiro is just the tip of the iceberg,” Kashif Nazir, senior technical architect at Cloudhouse, told LeadDev. “Everyone’s focused on whether AI will flood codebases with insecure code, but the scarier problem is agents being handed permissions no one properly thought through.”
The pace of new agent tooling across cloud platforms is relentless, with pressure on teams to adopt quickly, said Nazir. “When AI can write, test, and ship autonomously, a sloppy access control decision isn’t just an ops issue, it’s an unlocked door straight to production,” he said. “Agents will make that problem significantly worse before anyone properly solves it.”
What this really highlights is not whether AI writes bad code, which engineers have been dealing with forever, but whether teams fully understand the level of access they are handing over. With agentic tools, the real risk tends to come down to permissions.
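In AWS terms, "thinking through" an agent's access usually means scoping its IAM policy to the narrowest set of actions and resources it genuinely needs, and explicitly denying destructive operations. The sketch below is illustrative only, not Kiro's actual configuration; the bucket and resource names are hypothetical, but the structure follows standard IAM policy grammar, where an explicit Deny overrides any Allow.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAgentWorkspaceOnly",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::example-agent-workspace/*"
    },
    {
      "Sid": "DenyDestructiveActions",
      "Effect": "Deny",
      "Action": [
        "cloudformation:DeleteStack",
        "ec2:TerminateInstances",
        "rds:DeleteDBInstance"
      ],
      "Resource": "*"
    }
  ]
}
```

The point is less the specific actions listed than the posture: an agent's credentials should fail closed, so that "delete and recreate the environment" is impossible rather than merely discouraged.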
Speed is the multiplier
The underlying risk is not rogue intent: it’s velocity. Autonomous systems do not hesitate, get tired, or stop to ask whether a change feels right. If something is technically permissible, it can happen instantly and everywhere.
“This is unlikely to be the last incident of its kind,” said Martin Neale, CEO and founder of ICS.AI. “If companies allow coding agents to operate without meaningful oversight, they’re effectively introducing a high-speed contributor that can replicate mistakes at scale.”
Neale argues organizations need what he calls an “agentic dome,” comprising strict guardrails, scoped permissions, enforced review layers, and continuous monitoring to keep autonomy within defined limits.
“Autonomy without control isn’t innovation, it’s operational risk,” he told LeadDev.
None of this is especially surprising, as new tooling tends to arrive faster than the processes around it. We saw the same thing with cloud, containers, and CI/CD, where teams moved quickly and then figured out the guardrails afterwards.
QA was not built for this
Experts say the shift is not just operational but cultural too, as traditional assurance models were built around the idea that humans were the bottleneck. Increasingly, they are becoming the supervisors instead.
“Incidents like this highlight both the perils and promise of AI as it becomes more embedded in software delivery,” said Roman Zednik, field CTO for EMEA at Tricentis. “The real risk isn’t just insecure code; it’s tools operating without shared context, governance, or the right control frameworks, often at speeds humans struggle to match.”
Quality engineering is moving toward continuous validation and shared oversight to ensure AI agents stay within clear boundaries, Zednik said, adding that the real question is whether teams can evolve their practices quickly enough to keep control of what is being shipped.
This is probably the easy phase of AI coding
So far, the AWS incidents have been relatively mild: short outages, internal disruptions, and the kind of thing companies can chalk up to lessons learned. The real test comes when these tools are woven more deeply into large production environments, where a bad change could mean more than a bit of downtime.
The pattern will be familiar to anyone who has watched major outages over the years. Changes propagate too fast, assumptions go unchallenged, and visibility lags behind reality. Agentic systems do not create those dynamics so much as accelerate them.
Amazon may insist the incidents were down to human error, and technically, that is probably true. However, as engineers increasingly act more like supervisors than authors, the line between human and machine responsibility starts to blur.
If this is the early warning, the real test will be whether teams tighten controls now – or wait until a much bigger outage forces the issue.
