AI coding assistants have turbo-charged software development, shrinking release cycles to days and becoming the norm inside companies. But that speed boost comes with fresh trade-offs.
A new survey of more than 2,700 devops and quality assurance leaders, published today by Tricentis, suggests nearly two-thirds of teams push code without fully testing it in order to hit deadlines. At the same time, more than four in five IT leaders think AI will deliver both speed and quality, and nearly 90% claim they already see a return on generative AI investments.
“We’re seeing speed and innovation soar,” says David Colwell, Tricentis’ vice president of AI and machine learning. “But without robust testing, we risk building on shaky foundations.”
According to Tricentis’ survey, nine in ten tech leaders now trust AI to green-light releases, yet 66% of organizations still expect a major outage in the next 12 months. That figure has barely moved, suggesting AI is not yet making releases more reliable. The disconnect, Colwell argues, means “storing up security risks we’ll be patching for years.”
Lighting a long fuse
The fear is of a slow-burn crisis where generative AI engines spew reams of code, stitched from web-scraped snippets with dubious provenance.
When that code leaps from prompt to production without being vetted, the potential attack surface balloons. The bill for defective code is already sizable: 40% of firms say malfunctioning or miscoded software costs them at least $1 million a year through staff churn, increased technical debt, and escalating maintenance costs, and almost half of large US firms put their losses above $5 million.
Friction within organizations compounds the risk. A third of respondents report weak feedback loops between developers and testers, and 29% say leadership has never spelled out clear quality metrics – leaving teams unable to pinpoint where things are going wrong.
Unresolved technical debt, named the top obstacle by 34% of survey respondents, also adds to the risk that vulnerabilities slip through. AI itself can make this worse: biased or hallucinated outputs can embed flaws that traditional unit tests miss, and engineers who lean heavily on AI are often unable to troubleshoot and fix them.
Green shoots?
Despite the pessimism, the same survey outlines a potential path forward. Nearly a year after Devin’s much-hyped debut proved a false dawn for autonomous AI coding, excitement about agentic AI remains high within the industry: 82% of teams want autonomous agents to take over repetitive chores, and virtually all believe AI-driven testing will raise both speed and quality.
Colwell says that optimism must be matched by governance to keep teams safe. “Leaders need to define what quality means, decide what level of risk is acceptable, and bake that into testing from day one,” he explains.
Practically, that means security-focused, AI-augmented checks at every commit, from coverage tools flagging blind spots and scanners probing generated code to policy gates halting releases when risk spikes. Grasping how AI makes decisions, spotting bias, and interpreting complex toolchains are fast becoming critical skills for engineering managers.
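To make that concrete, here is a minimal sketch of what such a policy gate could look like in a CI pipeline: a small script that reads reports from a coverage tool and a security scanner, then blocks the release when agreed thresholds are breached. The file names, report formats, and thresholds are illustrative assumptions, not drawn from the survey or any particular product.

```python
#!/usr/bin/env python3
"""Hypothetical CI policy gate: halt the pipeline when release risk exceeds agreed thresholds.

The report file names, formats, and thresholds below are illustrative assumptions,
standing in for whatever coverage tool and scanner a team actually runs.
"""
import json
import sys

# Thresholds a leadership team might set as its definition of acceptable risk.
MIN_COVERAGE_PCT = 80           # minimum line coverage for AI-generated and human code alike
MAX_HIGH_SEVERITY_FINDINGS = 0  # no unresolved high-severity scanner findings at release time


def load_report(path: str) -> dict:
    """Load a JSON report produced earlier in the pipeline."""
    with open(path) as f:
        return json.load(f)


def main() -> int:
    coverage = load_report("coverage-summary.json")  # assumed shape: {"line_coverage_pct": 74.2}
    scan = load_report("security-scan.json")         # assumed shape: {"findings": [{"severity": "high"}, ...]}

    failures = []

    if coverage.get("line_coverage_pct", 0) < MIN_COVERAGE_PCT:
        failures.append(
            f"coverage {coverage.get('line_coverage_pct')}% is below the {MIN_COVERAGE_PCT}% floor"
        )

    high_findings = [f for f in scan.get("findings", []) if f.get("severity") == "high"]
    if len(high_findings) > MAX_HIGH_SEVERITY_FINDINGS:
        failures.append(f"{len(high_findings)} unresolved high-severity scanner findings")

    if failures:
        print("Policy gate: blocking release:")
        for reason in failures:
            print(f"  - {reason}")
        return 1  # a non-zero exit code halts the CI pipeline

    print("Policy gate: release criteria met.")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

The point of a gate like this is that the thresholds encode leadership’s definition of quality and acceptable risk, so a release can only ship fast if it also clears the bar the organization has agreed on.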
The stakes are stark. Two-thirds of organizations are expecting outages this year. For software engineering teams and the businesses that employ them, the question is no longer whether AI will write your code, but whether defenses can keep pace – and what the business can do about it.