AI coding trackers are here. Proceed with caution

Companies are finally starting to track AI usage within their engineering orgs. Should we be worried or remain cautiously optimistic?
June 18, 2025

Estimated reading time: 15 minutes

It’s finally happened. Engineering orgs have taken to the dashboards to measure AI usage. But is this such a bad thing? 

The rise of AI coding assistants like GitHub Copilot, Cursor, and Windsurf has ignited a new challenge in the field of software engineering: how to understand the impact of these tools.

Acceptance rate (how often developers accept the code suggestions an AI tool makes) has taken center stage as the de facto metric for AI coding tools. But a new crop of dashboards designed to track developer usage offers even more granular detail about how teams and individuals are using these tools.

While these metrics can be helpful for validating whether tools are worth their cost and for identifying power users who can educate others inside an organization, some engineering managers warn against over-relying on “vanity” metrics and remain wary of using them to set goals or KPIs.

“Introducing a new piece of technology and moreover, a new or updated development paradigm for our engineers, really was the impetus to find ways to understand the impact of the change,” said Matt Fisher, VP of product engineering at video company Vimeo. “Also, these tools cost money and we’re always looking for what the return on investment is – not just the productivity improvements, but also the quality improvements.”

What are the AI dashboards tracking?

About a year and a half ago, Todd Willms, director of engineering at digital asset management company Bynder, started using the GitHub Copilot Dashboard from Jellyfish. He had made Copilot available for his developers without mandating it, and wanted to know whether his teams were actually integrating it into their workflows, in order to make sense of the conflicting experiences he was hearing from across the company.

“I would get feedback that the tool isn’t doing anything and we’re not using it. And then I would get feedback that, no, we really use it a lot. So it was hard to square that and know if it was worth paying for,” Willms said. “At the end of the day, that was really what I was trying to figure out. And so that’s where this tool, Jellyfish, became really useful for me personally and to justify it.”

The Jellyfish dashboard dives into acceptance rate, showing the percentage of Copilot-written code suggestions accepted each day, the rate of accepted and rejected lines of code broken down by coding language, and each developer’s individual acceptance rate and total number of accepted suggestions. 

Fig.1. Sample Jellyfish dashboard view. The data shown is for demonstration purposes only.
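For leaders curious about what a rollup like this boils down to, here is a minimal sketch of how an acceptance-rate view could be computed from raw suggestion events. The event fields and grouping are illustrative assumptions, not Jellyfish’s actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class SuggestionEvent:
    """One AI code suggestion shown to a developer (hypothetical fields)."""
    developer: str
    language: str
    day: str        # e.g. "2025-06-01"
    accepted: bool  # True if the developer accepted the suggestion

def acceptance_rate_by(events: list[SuggestionEvent], field: str) -> dict[str, float]:
    """Percent of suggestions accepted, grouped by 'day', 'developer', or 'language'."""
    shown: dict[str, int] = defaultdict(int)
    accepted: dict[str, int] = defaultdict(int)
    for e in events:
        key = getattr(e, field)
        shown[key] += 1
        accepted[key] += int(e.accepted)
    return {k: 100 * accepted[k] / shown[k] for k in shown}

# Example: the kind of daily and per-language rollups a dashboard charts
events = [
    SuggestionEvent("ana", "python", "2025-06-01", True),
    SuggestionEvent("ana", "python", "2025-06-01", False),
    SuggestionEvent("ben", "go", "2025-06-01", True),
]
print(acceptance_rate_by(events, "day"))       # {'2025-06-01': 66.66...}
print(acceptance_rate_by(events, "language"))  # {'python': 50.0, 'go': 100.0}
```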

Acceptance rate is just the beginning, however. A pie chart breaks down how many engineers are “power users,” “casual users,” “idle users,” “new users,” and “unlicensed.” The dashboard also displays the total adoption rate among all Bynder developers, the adoption rate by team, and the adoption rate by role, comparing the usage of software engineers versus senior engineers versus DevOps engineers and so on. It also shows how timelines for different parts of the engineering workflow, like refinement, review, and deployment, compare when Copilot is used versus when it isn’t. Finally, there are metrics showing usage over time, which were “up and to the right” and certainly helped Willms feel confident about the investment.
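As a rough illustration of the kind of bucketing such a dashboard implies, the sketch below segments licensed engineers by how many days they were active in a window and derives a simple adoption rate. The thresholds and labels are invented for illustration; each vendor defines its own.

```python
from collections import Counter

def segment(active_days: int, licensed: bool, tenure_days: int) -> str:
    """Bucket one engineer for a 30-day window (illustrative thresholds)."""
    if not licensed:
        return "unlicensed"
    if tenure_days < 14:
        return "new"
    if active_days == 0:
        return "idle"
    return "power" if active_days >= 15 else "casual"

def adoption_rate(buckets: list[str]) -> float:
    """Share of licensed engineers who used the tool at all in the window."""
    licensed = [b for b in buckets if b != "unlicensed"]
    active = [b for b in licensed if b != "idle"]
    return 100 * len(active) / len(licensed) if licensed else 0.0

# Example: a small team's month
buckets = [segment(d, True, 90) for d in (22, 3, 0, 17)]
print(Counter(buckets))        # Counter({'power': 2, 'casual': 1, 'idle': 1})
print(adoption_rate(buckets))  # 75.0
```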

Jellyfish is just one offering, with similar products from companies like Opsera and Faros AI on the market. Some coding tools themselves also offer limited metrics.

Simon Lau, engineering manager at electric vehicle charging software company ChargeLab, said his team uses Windsurf and the tool shows each user’s total completions (or instances of accepting a code suggestion) over the past eight days, total completions over time, a streak of how many days they used it in a row, and a breakdown of the coding languages used with the tool. Lau can’t see each developer’s individual metrics, but has them self-report to him for an overview. 
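Figures like that streak are simple to reason about: it is just consecutive calendar days of use, counted back from today. The sketch below assumes a hypothetical set of active days rather than any particular tool’s export.

```python
from datetime import date, timedelta

def current_streak(days_used: set[date], today: date) -> int:
    """Consecutive calendar days of tool use, counting back from today."""
    streak, day = 0, today
    while day in days_used:
        streak += 1
        day -= timedelta(days=1)
    return streak

used = {date(2025, 6, 16), date(2025, 6, 17), date(2025, 6, 18)}
print(current_streak(used, date(2025, 6, 18)))  # 3
```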

Overall, it’s a lot of data, packaged up neatly and available to monitor regularly. With such intricate data on developer AI usage so easy to track, the real challenge for engineering managers is interpreting it and recognizing what it isn’t showing.

Separating “vanity” metrics from true impact

Lau is an example of an engineering leader who took direct action based on the usage metrics. “We used the reported data to shape our OKRs, including a goal last quarter to reach 6,000 completions – which we successfully achieved,” he said.

Others, however, are more cautious – or at least hyper-aware that the data paints an incomplete picture. It represents usage, not actual impact on the product or the business.

One engineering manager at a fintech startup said they feel lucky their boss listened to their recommendation to focus AI ROI discussions on deployment frequency, not acceptance rates or the number of lines of code written by AI. 

Citing Goodhart’s Law, they stressed that AI code-generation metrics are a “terrible choice” because they incentivize more code where less would do. Instead, they suggest evaluating AI coding tools through qualitative self-assessments from engineers and throughput metrics attached to the wider engineering organization rather than to individuals, such as on-time delivery and measures drawn from the SPACE and DORA frameworks.

“It’s outcomes, not activities, that get results,” they said.
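To make that contrast concrete, an outcome metric like DORA’s deployment frequency is derived from delivery events rather than from anything the AI tool reports about itself. The sketch below assumes a plain list of deploy dates; a real pipeline would pull these from CI/CD tooling.

```python
from collections import Counter
from datetime import date

def deployments_per_week(deploys: list[date]) -> dict[str, int]:
    """Count production deployments per ISO week (a basic DORA-style rollup)."""
    weeks = Counter(f"{d.isocalendar().year}-W{d.isocalendar().week:02d}" for d in deploys)
    return dict(sorted(weeks.items()))

# Example: compare this rollup before and after an AI tool rollout
print(deployments_per_week([date(2025, 6, 2), date(2025, 6, 4), date(2025, 6, 11)]))
# {'2025-W23': 2, '2025-W24': 1}
```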

Fisher, who uses the Faros AI coding assistant dashboard, and Willms both said that the value of AI copilot usage data, especially acceptance rates, is limited, and that they aren’t yet comfortable using it to set goals or KPIs. The data gave them insight into whether engineers were actually adopting the AI coding tools, providing a “30,000-foot view,” but it didn’t clarify whether the lines of code were satisfactory, how many errors the code carried at deployment, or how many outages it created.

What’s more, Willms and Fisher said the most important benefits aren’t reflected in the data at all. Both pointed to faster acquisition of context as the biggest gain: AI coding tools let developers jump in and work in codebases they’ve never seen, a huge upside for new hires in particular.

“It’s not just about slamming out code. It’s about understanding where to put the code. How to write the code, sure, but also how to test the code and how to deploy the code,” said Fisher.

For him, the Faros AI dashboard has been most useful where it goes beyond the “vanity” metrics of pure coding-tool usage and shows information about the entire development workflow – from working through ticketed backlogs to how long it takes to build, implement, and deploy changes. That context is important, he said, because AI coding assistants are impacting multiple steps in these workflows.

“It’s nice to know how it’s being used, but then correlating that back to ‘do we see expedited throughput for ticketing now?’ And understanding, for individuals using the service this much, does it equate to something else on the backend? And it may or may not, but having all the data and then visualizations and tooling under one roof allows us to at least start understanding the attribution from impact to downstream,” said Fisher. 

Power users and open dialogues 

With the rise of “bossware” designed to track employees’ every move and measure their productivity, especially in the age of remote work, developers might understandably be wary of intricate data being tied to how they use these tools.

Some companies have introduced AI coding assistant mandates, and this data could act as fuel for misinformed non-technical leaders: without properly understanding what the tools can do, they risk miscalculating what success and productivity look like in the developer workflow. But some engineering leaders argue that, when approached thoughtfully as part of an open dialogue, the data can also be useful to developers.

Data around usage and acceptance rate per coding language, for example, can help developers determine when the tools are most beneficial. All developers at Bynder have access to the dashboard and their own metrics, with Willms saying it’s meant to be a collaborative resource rather than something he is tracking as a manager. 

“It’s for them to look at it and for them to try to understand what’s going on with their delivery and productivity,” he said. 

Having mutual access to the dashboard has also sparked enlightening conversations between Willms and some developers. For example, one senior developer refused to use any AI tools in his work. Then, after a new capability that went beyond coding was released, the developer started experimenting. The dashboard showed that his usage shot up from zero to “power user,” igniting a productive discussion around what tasks the developer and Willms find it useful for. 

The ability to easily identify power users has been another huge gain from these metrics, said Fisher. In turn, power users have taught others how to get value from AI coding tools based on first-hand experience. Fisher said this has been extremely effective because the guidance comes from someone developers know and trust, and who is familiar with their work, rather than from the hype cycle.

“It’s about understanding across the aggregate how different people are using it to build that inspiration, to build that understanding. And generally speaking, when creative minds have that spark, it ignites something,” said Fisher. “It ignites enthusiasm and energy, and it builds on the excitement that will drive somebody to want to dig deeper. Now that’s a flywheel.”