Estimated reading time: 9 minutes
Key takeaways:
- Cognitive debt is the technical debt nobody is measuring. When AI generates code faster than engineers can understand it, teams lose track of what their system does and why.
- The tool isn’t the problem – the relationship with it is: two interns used AI at the same frequency. The difference? Whether they engaged with what it produced.
- Your satisfaction scores are hiding the problem! The most reliable signal isn’t how engineers feel, it’s whether their skills are actually developing.
Dr. Margaret-Anne Storey, professor of computer science at the University of Victoria, first used the term ‘cognitive debt’ in October 2025 while teaching an entrepreneurship startup course. She had encouraged her students to use AI to move faster, and that worked well: the students were putting products into the hands of users and getting feedback. However, they were struggling to work that feedback back into their product.
“At first I thought they had technical debt,” Dr. Storey tells LeadDev, “but as I started talking to them, I realized it was not that. It was about the fact that they had lost the plot. They had lost track of what features they were trying to build and why. They didn’t even really know who knew what on the team.”
Worse, on one of the teams, one person knew all the code and understood it because they were supervising the AI that generated it, while the rest of the team was unable to do that. The person who had generated the code didn’t really understand what was generated either. So when a change was requested, they admitted, “No, I can’t really make that change. Every time I try to change something, something else breaks.”
The invisible kind of debt
Cognitive debt, as Storey defines it, is the accumulated loss of understanding of what a system is doing, why it is doing it, and who on the team knows what. It is distinct from technical debt.
You can have clean, well-tested code and still have cognitive debt if the people who built it can no longer fully account for it, or if that understanding is concentrated in one person rather than distributed across the team, she explains.
The concept draws on an old idea from computer scientist Peter Naur: programming as theory building. Naur argued decades ago that software is not really the code; it is the theory in the heads of the people who built it. What matters is the shared mental model: how the system is designed, how the architecture maps to user needs, and what the capacity for change actually is.
When you generate code faster than people can build that theory, cognitive debt forms. The challenge for engineering leaders is that standard metrics don’t surface cognitive debt.
When you know that you have a blind spot, you check it when you’re in the car, Dr. Storey says, but it’s the blind spots you don’t know about that you don’t check.
With the focus on acceleration and productivity that comes with AI tools, engineering teams don’t think to pause and ask questions such as: Who’s the go-to when we need to understand a particular part of the system? Do I know what you know? Do you know what I don’t know?
Dr. Storey says that cognitive debt is a property of the system or of the project; there is no ‘my cognitive debt.’
“I heard one person say that sometimes it scares him that he created this system and it works and it’s running, and he’s forgotten what it was he was even trying to create in the first place. It’s that loss of control, that loss of memory across the system about what questions to even ask anymore.”
Same internship program, less technical growth
Tereza Jurić, principal engineer at Infobip, has been watching this happen firsthand. She has been co-running an engineering internship program at Infobip since 2022. Over eight cohorts and 225 interns, the team tracked every selection round and every internship, and knew almost every intern individually.
In January 2026, Anthropic published a controlled study: a randomized trial with 52 junior developers showing that those who used AI assistants scored 17% lower on code comprehension tests, the equivalent of nearly two letter grades. The study inspired Jurić and her colleagues to run their own.
“We realized we had something Anthropic didn’t,” she says. “Longitudinal data on real people across real cohorts, from 2022 to the present. They measured a short-term effect in a lab setting; we could look at what this looks like across years in a real program, on real careers.”
They designed an alumni survey, dividing the cohorts into a pre-AI group (interned up to summer 2023) and an AI-era group (from winter 2024 onward). The survey measured self-assessed skill across four dimensions (technical skills, teamwork, problem-solving, and personal excellence) at three points: as rated at the start of the internship, that same starting level as rated retrospectively today, and the current level. It also covered:
- AI tool usage patterns.
- The Dunning-Kruger gap (the difference between how they assessed themselves then and how they retrospectively assess themselves today).
- FOBO (fear of becoming obsolete) indicators.
- What they do when they get stuck, whether they understand AI output, mentor support, and challenge calibration.
The results were consistent across all respondents:
- Pre-AI cohorts grew an average of +2.50 in technical skills; AI-era cohorts +1.56.
- Teamwork: +2.51 versus +1.69.
- Problem-solving: +2.21 versus +1.52.
- Personal excellence: +2.67 versus +1.85.
The starkest comparison was between Winter 2023 and Summer 2025, the same program, 2.5 years apart, where technical growth was +2.89 versus +0.69. The most significant difference between those cohorts: 0% heavy AI usage in Winter 2023, 54% in Summer 2025.
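To put the gaps above on a common footing, the arithmetic can be sketched in a few lines of Python. This is an illustration only: the figures are the averages reported above, while the `growth_gap` helper and the idea of expressing the gap as a percentage of pre-AI growth are assumptions made here, not part of Infobip’s analysis.

```python
# Average self-assessed growth per dimension: (pre-AI cohorts, AI-era cohorts).
# Figures are taken from the survey results quoted in the article.
growth = {
    "technical skills": (2.50, 1.56),
    "teamwork": (2.51, 1.69),
    "problem-solving": (2.21, 1.52),
    "personal excellence": (2.67, 1.85),
}


def growth_gap(pre_ai, ai_era):
    """Hypothetical helper: absolute gap in points and gap as a % of pre-AI growth."""
    absolute = round(pre_ai - ai_era, 2)
    relative = round(100 * (pre_ai - ai_era) / pre_ai, 1)
    return absolute, relative


gaps = {dim: growth_gap(pre, ai) for dim, (pre, ai) in growth.items()}
for dim, (absolute, relative) in gaps.items():
    print(f"{dim}: {absolute:.2f} points lower ({relative:.1f}% less growth)")

# Winter 2023 versus Summer 2025, technical growth only.
stark_gap = growth_gap(2.89, 0.69)
print(f"Winter 2023 vs Summer 2025: {stark_gap[0]:.2f} points lower "
      f"({stark_gap[1]:.1f}% less growth)")
```

On these averages, technical growth in the AI-era cohorts is 0.94 points lower, or roughly 38% less than in the pre-AI cohorts, and the Winter 2023 to Summer 2025 comparison shows a drop of about 76%. The percentages are derived from self-assessed averages, so they are best read as a rough scale for the gap rather than a precise measurement.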
“What makes the finding particularly notable is how the program’s quality metrics moved over the same period. Net promoter score (NPS) was rising and 91% of interns rated the challenge level as well-calibrated. Mentor scores held stable at around 7.6 out of 10. By every visible measure, the program had improved,” Jurić says.
This confirmed that the problem wasn’t the quality of the program. AI was removing the productive struggle – the ‘wrestling with a problem’ that is the learning.
Passive delegation vs. cognitive engagement
The Anthropic study points to the mechanism. Participants who scored poorly used AI for code generation and debugging without trying to understand what was produced. Those who scored well used it to generate code and then asked follow-up questions or requested both code and an explanation of the logic. The tool wasn’t the determining factor; the cognitive relationship to the tool was.
Infobip’s current cohort data shows the same. Among the ten interns in the Winter 2026 cohort, Jurić observed two, R10 and R8, who both went to AI first when stuck and used it daily. Their outcomes could not have been more different.
R10 understood the reasoning behind every piece of AI-generated code, asked follow-up questions, and could fully explain what had been produced. Their growth across the internship: +3 points, the highest in the cohort. R8 used the output with minimal review, accepting it as a black box. Their growth: 0.
“They use the same tool, at the same frequency, with the same initial pattern,” says Jurić, “but with opposite cognitive engagement and correspondingly opposite growth. That is what Anthropic’s study calls the difference between passive delegation and cognitively engaged AI use.”
There is another dimension to the findings of Infobip’s study. Of the 96 alumni surveyed, only one person explicitly identified the mechanism, noting in a written comment that if interns have no productive struggle, they will grow more and more reliant on AI, which is not good for someone just starting out. The other 95 didn’t articulate it.
“The problem may be operating below the level of awareness,” says Jurić. AI-era cohorts also showed a larger Dunning-Kruger gap: +1.30 versus +0.88 in the pre-AI cohorts. They didn’t know what they didn’t know. Their satisfaction with the program was, if anything, higher.
Building back the mental model
Junior engineers who previously would have asked a senior colleague when stuck are now, understandably, asking an AI assistant. The answer is often better: faster, more patient, more tailored.
However, the senior now doesn’t know what the junior was struggling with. The relationship doesn’t form. The mental map of who knows what, which used to emerge organically through code reviews, pair programming, and the friction of collaborative work, doesn’t emerge at all.
Infobip’s research led to concrete strategies, now being implemented, that aim for a middle ground between banning AI tools for juniors (which does not make sense) and accepting the impaired technical growth that comes with overreliance on AI. The rules for interns are now:
- “Try first, check with AI second”: the goal is to form a habit of independent thinking before AI becomes the default.
- AI sequencing: build a mental model first, then use AI to accelerate. AI after understanding, not before.
- AI-free checkpoints: occasional checks to see whether an intern can solve a problem without AI. Not as punishment, but as calibration so both the intern and the mentor know where they actually stand.
- Mentor as coach, not answer provider: mentors should track the process, not just review pull requests. The right mentor is someone who can create an environment in which new people grow, not just someone who is senior enough.
- Visibility of professional standards: 20% of Winter 2026 interns didn’t know what’s expected of a junior engineer. Without a reference frame, you can’t assess your own progress. Make visible what a junior engineer’s level actually looks like.
- Distinguish enhancement from replacement: the problem isn’t how much AI is used, but how. R10 and R8 used AI equally often, but R10 used it as an extension, while R8 used it as a replacement for their own thinking.
- Measure trajectory, not satisfaction: Infobip’s NPS and program satisfaction scores were rising while growth was falling. The most robust early indicator is self-assessed skills trajectory, not experience ratings.
At the team level, Storey advocates for retrospectives with questions pointed specifically at assessing the level of shared understanding:
“Put up an architecture diagram and ask who can explain each component. Map features to user needs and ask when the team last checked whether those features are actually serving those needs. Ask explicitly who on the team understands each major subsystem. These questions surface cognitive debt in a way that no deployment metric will.”
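One way to keep the answers from such a retrospective is a simple map of who said they could explain each subsystem. The sketch below is a note-taking aid under stated assumptions, not a method Storey prescribes: the subsystem names, the people, the `knowledge_risks` helper, and the threshold of two people are all hypothetical.

```python
# Hypothetical retrospective notes: for each subsystem, the teammates who
# said they could explain it. Names and subsystems are invented for illustration.
can_explain = {
    "billing": ["ana"],
    "auth": ["ana", "ben"],
    "notifications": [],
    "search": ["ben", "chen", "dara"],
}


def knowledge_risks(can_explain, min_people=2):
    """Return the subsystems that fewer than `min_people` teammates can explain.

    The threshold of two is an assumption: it flags understanding that is
    missing entirely or concentrated in a single person.
    """
    return {
        subsystem: sorted(set(people))
        for subsystem, people in can_explain.items()
        if len(set(people)) < min_people
    }


risks = knowledge_risks(can_explain)
for subsystem, people in risks.items():
    holders = ", ".join(people) if people else "nobody"
    print(f"{subsystem}: explained by {holders}")
```

With the invented data above, `billing` (one person) and `notifications` (nobody) are flagged. The output is a prompt for conversation about where understanding is thin, not a score to optimize.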
At the start, carefully developing signals that show when a team has cognitive debt matters more than finding another number to track, Storey adds. Technical debt is a tradeoff engineering leaders are well aware of, and they know how to repay it: find it and refactor it.
Cognitive debt has no equivalent. You cannot run a script to recover what engineers never learned or restore the shared understanding a team never built. The only way to manage it is to prevent it from accumulating in the first place, and that requires looking for it before it shows up in the work.