
The struggle to prove AI productivity gains

Measuring developer productivity is still difficult!
March 25, 2026


Key takeaways:

  • Nearly all teams are adopting AI, yet most can’t clearly measure its impact despite growing pressure from leadership to prove results.
  • Engineering productivity is a moving target: traditional metrics like lines of code break down with AI.
  • Measure broadly, not perfectly. Use simple, consistent metrics across usage, customer value, quality, and human impact.

Organizations are still struggling to measure AI’s impact on engineering productivity, as board-level expectations shift from teams simply adopting AI tools to delivering tangible output with them.

A new report from the engineering intelligence platform Multitudes paints a paradoxical picture of AI-coding tool adoption. For the study, Multitudes surveyed more than 700 engineering professionals and found that 75% of respondents struggle to measure AI’s impact.

These findings chime with LeadDev’s own research. According to the AI Impact Report 2025, 98% of respondents are exploring the use of AI tools and models, yet the most common challenge, cited by 60% of respondents, is a lack of clear metrics to evaluate their impact.

We’re measuring productivity, again

Engineering productivity is often the byword for success when it comes to working with AI-coding tools, but it is proving the most difficult factor to gauge, cited by 26% of Multitudes’ respondents. That’s ahead of other important factors such as tech debt (13%) and the quality of outputs (11%).

The survey also found that 40% of those polled feel board-level pressure just to adopt and use AI tools, and 39% feel under pressure to deliver and demonstrate improved outcomes with them. When it comes to what those improved outcomes look like, 60% cited productivity specifically.

Respondents are actively investing more resources and effort into AI learning and development (22%) to help in this regard, the survey found. This is followed by a focus on transparent communication (18%) and resetting board expectations via education (18%).

Far fewer are planning to push back on those expectations (7%) or define usage policies (4%).

The challenge of measuring engineering productivity

Measuring engineering productivity has proved to be a stubborn challenge. Just ask McKinsey. Software development is complex, collaborative knowledge work, making it difficult to gauge with simple metrics.

“Companies have never been meaningfully able to measure engineering productivity. Literally every metric that has ever been tried has failed, badly,” Mark Robinson, staff infrastructure engineer at Plaid, told LeadDev. Most leaders are looking for retrospective metrics that they can point to in order to justify their investments, he added.

Throwing AI into the mix complicates matters significantly, because effective measurement isn’t as simple as tallying lines of code generated. AI might increase code output while decreasing quality, or reduce code volume by generating more efficient solutions, meaning traditional metrics can mislead. 

It’s also difficult to isolate AI’s contribution, since many other factors like team experience, tools, and task complexity influence productivity.

A lack of AI-specific metrics adds to the challenge. Only 31% of firms have Objectives and Key Results (OKRs) in place to specifically measure the impact of AI tooling, according to Multitudes’ survey. What’s more, while 86% of respondents want to use AI to move faster, only 15% have OKRs in place to track associated speed and velocity impacts.

“Even before AI, it was difficult for the industry to agree on one definition of engineering productivity or how to measure it,” Krishnan Sridhar, VP of engineering, DevX and platforms, at LinkedIn, told LeadDev. “AI makes that harder by changing how work happens across the full path from idea to production, not just how fast tasks get completed.”

Previous engineering productivity frameworks generally focused on when an engineer picked up the ticket or started writing code, said Lauren Peate, founder and CEO of Multitudes. “Now, the engineer might be picking up an idea or problem, talking to users, building prototypes, and then taking it through to implementation and production.”

In some organizations, AI has sped up development so much that now the hardest part is figuring out what to build next, and preparing it properly. Any productivity metrics that just focus on writing code will miss that new bottleneck, Peate said.

How to measure AI’s impact on engineering productivity

As AI moves from experimentation into core engineering workflows, it’s becoming increasingly important for engineers to accurately gauge its impact on productivity. 

“Leaders need to understand whether AI is improving velocity, quality, and decision‑making in meaningful ways, or adding complexity,” said Sridhar. “Without that clarity, it’s difficult to scale adoption responsibly or make informed investment decisions.”

So how should organizations approach measuring AI’s impact on productivity? Peate recommends a three-tiered approach, illustrated with a short sketch after the list:

  1. “Start with the goal you want to achieve with AI – it’s usually productivity, but how do you define that? How much do you care about quality alongside that?”
  2. Then find metrics that are good enough. “Done is better than perfect on this; it’s better to have a metric that’s feasible to track consistently than one that’s more accurate but hard to get.”
  3. Last, focus on improvement, not the current state. “Run experiments and track progress over time. Then use the metrics to prompt discussion about why things are getting better or worse, and what the next experiment is that the team should run.”
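
To make the third step concrete, here is a minimal sketch, in Python, of tracking one feasible-to-track metric (median pull request cycle time) before and after an AI rollout. The data shape, numbers, and rollout date are all hypothetical; the point is consistent tracking around a known intervention date, not statistical rigor.

```python
from datetime import date
from statistics import median

# Hypothetical records: (merge date, cycle time in hours) per pull request.
# In practice these would come from your Git host's API or an engineering
# intelligence platform.
pull_requests = [
    (date(2026, 1, 12), 30.0),
    (date(2026, 1, 26), 26.5),
    (date(2026, 2, 9), 41.0),
    (date(2026, 3, 2), 22.0),
    (date(2026, 3, 16), 18.5),
    (date(2026, 3, 30), 21.0),
]

AI_ROLLOUT = date(2026, 2, 20)  # hypothetical date of the AI intervention

before = [hours for merged, hours in pull_requests if merged < AI_ROLLOUT]
after = [hours for merged, hours in pull_requests if merged >= AI_ROLLOUT]

# Median is robust to the occasional outlier PR; "done is better than perfect".
print(f"Median cycle time before rollout: {median(before):.1f}h")
print(f"Median cycle time after rollout:  {median(after):.1f}h")
```

The output then becomes a prompt for the discussion Peate describes: why did the number move, and what experiment should the team run next?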

Productivity should be treated as a system‑level outcome rather than a single metric, which means looking at how AI affects flow, collaboration, and judgment across the software development lifecycle, Sridhar added.

“The real test is whether those signals help leaders understand if teams are shipping more effectively with the same resources.”


Which metrics are key?

There are four metric buckets that are important now, Peate said. These are AI usage data, customer value delivered, customer pain created, and the impact on human engineers.

“Specifically, track AI usage data like tokens used, lines accepted, and cost, plus the dates of AI interventions. Combined, these help leaders better tease out the impact of AI.”

By comparing metrics before and after AI experiments, teams can net out differences that existed before the AI intervention. By comparing high AI adopters with lighter users, teams can see how outcomes differ between the people using AI the most and those using it less.
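
As a rough illustration of that high-adopter comparison, the sketch below splits engineers into cohorts by a hypothetical usage threshold and compares a simple output metric. The field names and cutoff are assumptions, not Multitudes’ methodology, and any gap between cohorts is correlation, not proof of causation.

```python
# Hypothetical per-engineer records; real usage data (tokens, lines
# accepted, cost) would come from your AI tool's admin or export APIs.
engineers = [
    {"name": "ana", "tokens_per_week": 900_000, "prs_shipped": 9},
    {"name": "ben", "tokens_per_week": 40_000, "prs_shipped": 6},
    {"name": "cam", "tokens_per_week": 650_000, "prs_shipped": 7},
    {"name": "dee", "tokens_per_week": 120_000, "prs_shipped": 5},
]

HIGH_USAGE_TOKENS = 500_000  # hypothetical cutoff for "high adopter"

high = [e for e in engineers if e["tokens_per_week"] >= HIGH_USAGE_TOKENS]
low = [e for e in engineers if e["tokens_per_week"] < HIGH_USAGE_TOKENS]

def avg_prs(cohort: list[dict]) -> float:
    return sum(e["prs_shipped"] for e in cohort) / len(cohort)

# Team experience, tooling, and task complexity also differ between
# cohorts, so treat any gap as a discussion prompt, not a verdict.
print(f"High adopters: {avg_prs(high):.1f} PRs per period")
print(f"Lighter users: {avg_prs(low):.1f} PRs per period")
```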

As for customer value delivered, this should link to a unit of customer value, like a feature, but fallbacks might be pull requests shipped or tickets completed, Peate added. 

Meanwhile, the number of incidents is the obvious metric to use for gauging customer pain created, but bug reports are also a good way to track the broader impact. “I also recommend having leading indicators for this, as no one wants their first indicator of AI quality issues to come from an AI incident at 2 am,” Peate said. 
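
One possible leading indicator, sketched below under the assumption that weekly bug report counts are already being tracked, is a simple alert when the latest week drifts well above its trailing baseline. The numbers and threshold are hypothetical.

```python
# Hypothetical weekly bug report counts, most recent week last.
weekly_bug_reports = [4, 5, 3, 4, 9]

baseline = sum(weekly_bug_reports[:-1]) / len(weekly_bug_reports[:-1])
latest = weekly_bug_reports[-1]

# Hypothetical threshold: flag a 50% jump over the trailing average,
# ideally well before anyone is paged at 2 a.m.
if latest > 1.5 * baseline:
    print(f"Bug reports rising: {latest} this week vs. ~{baseline:.1f} baseline")
```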

Finally, AI rollouts are bringing more out-of-hours work, so teams should track this as an indicator of the human impact of AI changes, Peate said.
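
A minimal sketch of tracking that signal, assuming commit timestamps in each author’s local time and a nominal 9:00–18:00, Monday-to-Friday working window (both assumptions):

```python
from datetime import datetime

# Hypothetical commit timestamps; real ones would come from `git log`
# or your Git host's API, normalized to each author's local time zone.
commit_times = [
    datetime(2026, 3, 23, 14, 30),  # Monday afternoon: in hours
    datetime(2026, 3, 23, 22, 15),  # Monday night: out of hours
    datetime(2026, 3, 28, 11, 0),   # Saturday morning: out of hours
]

def out_of_hours(ts: datetime) -> bool:
    # weekday() >= 5 means Saturday or Sunday.
    return ts.weekday() >= 5 or not (9 <= ts.hour < 18)

share = sum(out_of_hours(t) for t in commit_times) / len(commit_times)
print(f"Out-of-hours share of commits: {share:.0%}")
```

A rising out-of-hours share after an AI rollout is a signal to investigate, not a conclusion on its own.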

“Outcome‑oriented metrics tend to be more meaningful than raw activity counts, because they reflect whether AI is actually helping teams build and ship software more effectively,” said Sridhar.

These kinds of signals also avoid incentivizing speed at the expense of long‑term quality and resilience.

“Ultimately, the most important question is whether teams are delivering on their roadmaps faster and with confidence,” Sridhar added. “Metrics are most impactful when they tie directly to the outcomes the organization is trying to drive.”