AI doesn't make devs as productive as they think, study finds

A new study raises serious questions about how perceptions of productivity translate into results.
July 17, 2025

Despite software engineers believing that AI helps them work faster, a new study suggests those perceptions of productivity gains may be overblown.

Researchers from the Model Evaluation & Threat Research (METR) organization recruited contributors to large, mature GitHub projects and randomly tagged 246 of their issues “AI allowed” or “AI disallowed”.

When permitted, participants mainly used the Cursor Pro IDE and Anthropic’s Claude 3.5/3.7 models to support their work, logging their time and screen activity over roughly two months between February and June 2025.

Before starting each task, the 16 developers were asked to estimate how much time using AI would save them. Despite predicting they would work 24% faster, in reality the tasks took them 19% longer than those completed without AI assistance.
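
To make the headline numbers concrete, here is a minimal sketch in Python of how a forecast-versus-measurement gap like this can be computed from per-task completion times. The figures are invented for illustration and are not METR's data.

    from statistics import mean

    # Hypothetical per-task completion times in minutes; illustrative
    # numbers only, not drawn from the study.
    ai_allowed = [95, 120, 80, 150, 110]     # issues tagged "AI allowed"
    ai_disallowed = [80, 100, 70, 125, 95]   # issues tagged "AI disallowed"

    # A positive value means AI-allowed tasks took longer on average.
    measured_change = mean(ai_allowed) / mean(ai_disallowed) - 1
    print(f"Measured change with AI: {measured_change:+.0%}")

    # Developers forecast the opposite sign: roughly a 24% speed-up.
    forecast_change = -0.24
    print(f"Forecast change: {forecast_change:+.0%}")

Run as written, the sketch prints a measured slowdown of about +18% against a forecast of -24%, mirroring the shape, though not the exact figures, of the study's result.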

“Our primary motivation in doing this study was to figure out the methods we would use to understand if a certain set of software developers were being sped up by AI,” explains Nate Rush, a researcher at METR and one of the paper’s co-authors. “We thought we’re probably going to get a very obvious result: a speed up of let’s say 20% or 50%, or two times, and of course, that’s not what we found.”

Unbelievable results

Predictably, both AI boosters and skeptics have taken the study to reinforce their views. For those bullish on AI, the study is so implausible that it must be wrong. For AI skeptics, it’s evidence that the hype doesn’t match reality.

Steve Newman, the co-founder of Writely, which became Google Docs, admitted he initially thought the results were “too bad to be true”, before digging into how the study was conducted and concluding that its findings were legitimate.

“The response to the paper to us has really demonstrated that people are very desperate and hungry for good information about the impact of AI in the wild,” says Rush.

“It feels to me very credible research,” says Simon Willison, the co-creator of the Django web framework and a regular user of AI coding tools. While he acknowledges the study’s small sample size, he believes it can still inform how developers adopt AI. “I think that the absolute truth in it is the thing about how people are terrible at estimating their productivity performance,” he says.

“The AI productivity myth just got some real data behind it,” says Milan Milanović, a chief technology officer at 3MD with more than 20 years’ experience in various industries. Milanović says that the findings highlight one of the conundrums that more experienced developers may encounter – they know their codebase better than the AI ever could. These developers “knew their codebases inside out, working on million-line repositories with years of accumulated complexity,” he says. “In that environment, AI becomes a liability rather than an asset.”

What can developers learn from the paper?

The researchers are keen to highlight that the results don’t show that AI assistance inherently hinders developer productivity. “We certainly would discourage anyone from interpreting these results as: ‘AI slows down developers’,” says Rush. Instead, the authors say the findings could help people make better-informed decisions about how to use AI.

The team probed 20 possible explanations for the disparity between perceptions and reality. Over-optimistic trust in AI, the sheer size and idiosyncrasies of mature repositories, and the need to correct unreliable model suggestions came out on top.

But co-author Joel Becker also accepts that their findings may only hold for the AI models the researchers tested. “It’s possible, and indeed, many people suggest that the frontier AI models and tooling has progressed very rapidly since then, such that these developers in this setting today, let alone in the near future, might likely be sped up,” says Becker.

Either way, the study has highlighted the need for more rigorous measurement and research. “If this inspired a bunch of other organizations that are well-funded to do that sort of level of meticulous study, that would be good,” he says.