Development metrics you should use (but don’t)

How to sensitize your teams to factors that positively impact results

By Cat Swetel

October 12, 2020

When was the last time you physically bumped into someone?

Even when not framed as a target, metrics have an unusual way of sensitizing us to things previously invisible and/or unnoticed. Take, for example, the guidance we have all received during this pandemic: six feet (two meters) is a safe distance to prevent the spread of COVID-19. Having this metric constantly repeated to us and stressed as important has made everyone more conscious of where they are in a physical space. Now it is impossible not to notice someone standing just a little too close in the grocery line, and it is impossible not to appreciate physical closeness when safe.

So how do we, as technology leaders, help our teams choose metrics that will sensitize them to factors that can actually impact results in a positive way? I have found a few metrics and frames that help teams feel empowered when seeking continuous improvement and when interacting with customers.

Responsiveness: time in process

The first metric I recommend collecting is time in process[1], a.k.a. the units of time per one unit of work. All that’s needed to calculate time in process is the start and end date for each work item.
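Here is a minimal sketch of that calculation. It assumes work items are exported as (id, start date, end date) tuples and computes time in process in calendar days. The item IDs and dates are hypothetical. Counting both the start and end day, so that a same-day item takes 1 day, is an assumed convention. Whichever convention you pick, apply it consistently.

```python
from datetime import date


def time_in_process(start: date, end: date) -> int:
    """Calendar days from start to end, counting both the start and end day.

    Assumed convention: an item started and finished on the same day counts
    as 1 day. Teams may prefer a different rule; just apply it consistently.
    """
    if end < start:
        raise ValueError("end date precedes start date")
    return (end - start).days + 1


# Hypothetical work items: (id, start date, end date)
items = [
    ("STORY-1", date(2020, 9, 1), date(2020, 9, 1)),
    ("STORY-2", date(2020, 9, 1), date(2020, 9, 10)),
    ("STORY-3", date(2020, 9, 3), date(2020, 10, 2)),
]

for item_id, start, end in items:
    print(item_id, time_in_process(start, end))
```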

These data can easily be plotted to examine trends in time in process. Once you have the time in process for each work item, you can also begin to answer questions like, ‘How long do you think this work item will take?’ using simple probability based on your own history, for example, ‘80% of work items are delivered in 31 days or less.’ (See Figure 1.)
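The sketch below shows the percentile calculation behind a statement like that. It uses the nearest-rank method on a hypothetical list of times in process, measured in days. The function name `percentile_nearest_rank` is illustrative and does not come from any particular tool:

```python
import math


def percentile_nearest_rank(values, pct):
    """Smallest observed value such that at least pct% of values are <= it."""
    if not values:
        raise ValueError("need at least one observation")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


# Hypothetical times in process (days) for ten completed work items
times_in_process = [3, 5, 6, 8, 9, 12, 14, 31, 40, 58]

for pct in (50, 80, 95):
    days = percentile_nearest_rank(times_in_process, pct)
    print(f"{pct}% of work items were delivered in {days} days or less")
```

Nearest rank always returns a value that was actually observed. That keeps the forecast in the same terms as the history it came from.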

Many work-tracking tools promote forecasts built on averages and standard deviations, but probability read directly from your historical data tends to be more accurate (and is easier to calculate). The reason is the shape of the distribution of times in process. Every work item has a minimum time in process: the least amount of time it would take to actually do the work if all the needed resources and knowledge were readily available. There is, however, no maximum. Whether waiting for reviews and sign-offs or stalled by shifting priorities, work items can be delayed indefinitely in an endless variety of ways. Because the minimum is fixed but the maximum is not, the distribution of times in process is not a symmetrical bell curve; it is lopsided, with a long tail of slow items. Standard deviations only translate into reliable probabilities when the data follow that bell curve, so probability based on your actual data (the easier-to-calculate option) gives a much better delivery forecast for customers and stakeholders.
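The sketch below shows the difference on lopsided data. It fits a bell curve from the mean and standard deviation of a hypothetical sample, using Python's standard `statistics.NormalDist`, and compares that curve's percentiles with the observed ones. On right-skewed data like this, the bell-curve model overstates the median and understates the slow tail. It even assigns a noticeable probability to negative durations.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical, right-skewed times in process (days): a hard minimum, a long tail
times_in_process = [3, 5, 6, 8, 9, 12, 14, 31, 40, 58]


def percentile_nearest_rank(values, pct):
    """Smallest observed value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Model that assumes a symmetric bell curve built from mean and standard deviation
normal = NormalDist(mean(times_in_process), stdev(times_in_process))

for pct in (50, 80, 95):
    assumed = normal.inv_cdf(pct / 100)
    observed = percentile_nearest_rank(times_in_process, pct)
    print(f"{pct}th percentile: bell curve {assumed:.1f} days, observed {observed} days")

# The bell curve even predicts some items finishing in less than zero days
print(f"P(time in process < 0) under the bell curve: {normal.cdf(0):.0%}")
```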

[Figure 1]

Productivity: throughput

Throughput is the number of units of work[2] per one unit of time, e.g. twelve stories delivered every two weeks. If you have the end date for each unit of work, it is easy to see how many work items were delivered for each unit of time, e.g. sprint, month, etc.
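The sketch below buckets end dates into fixed-length periods and counts the items finished in each. It assumes two-week periods and uses hypothetical dates. One design choice is worth keeping: periods in which nothing was delivered stay in the result as zeros. Dropping them would make throughput look better than it really is.

```python
from collections import Counter
from datetime import date


def throughput_per_period(end_dates, period_start, period_days, num_periods):
    """Count work items finished in each fixed-length period.

    Periods with no deliveries are kept as zeros so they still show up
    in the throughput distribution.
    """
    counts = Counter()
    for end in end_dates:
        index = (end - period_start).days // period_days
        if 0 <= index < num_periods:
            counts[index] += 1
    return [counts[i] for i in range(num_periods)]


# Hypothetical end dates, bucketed into four two-week periods starting 2020-08-03
end_dates = [
    date(2020, 8, 4), date(2020, 8, 7), date(2020, 8, 14),
    date(2020, 8, 18), date(2020, 8, 20),
    date(2020, 9, 15), date(2020, 9, 25), date(2020, 9, 27),
]
print(throughput_per_period(end_dates, date(2020, 8, 3), 14, 4))
```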

With the throughput distribution, you can use probability to forecast and begin to answer questions like, ‘How many stories are going to be done in time for the next release?’ or ‘When will this release ship?’
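One common way to turn a throughput distribution into a forecast is Monte Carlo simulation. You repeatedly draw past periods at random and add them up. The text does not prescribe a specific method, so treat this as one assumed approach. The function `forecast_items_done`, its parameters, and the throughput history below are all hypothetical.

```python
import math
import random


def forecast_items_done(throughput_history, periods_ahead, confidence=0.85,
                        trials=10_000, seed=42):
    """Monte Carlo sketch: how many items will be finished in the next periods?

    Each trial adds up `periods_ahead` throughput values drawn at random
    (with replacement) from past periods. The result is a count that at
    least `confidence` of the trials reached or exceeded.
    """
    if not throughput_history:
        raise ValueError("need at least one past period")
    if not 0 < confidence <= 1:
        raise ValueError("confidence must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the forecast is repeatable
    totals = sorted(
        sum(rng.choice(throughput_history) for _ in range(periods_ahead))
        for _ in range(trials)
    )
    # Every total from this index upward is >= totals[index], and those
    # entries make up at least `confidence` of all trials.
    index = math.floor((1 - confidence) * trials)
    return totals[index]


# Hypothetical stories delivered per two-week sprint, with three sprints to go
history = [3, 2, 0, 3, 5, 4]
likely = forecast_items_done(history, periods_ahead=3)
print(f"85% of simulations finished at least {likely} stories in three sprints")
```

The result reads as "in 85% of simulated futures, at least this many stories were done," which answers the first question. To answer the second, you could instead simulate until the release's remaining story count is reached and record how many periods that took.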

[Figure 2]

Improvement: where to start

In my experience, the secret to driving improvement in business outcomes is simple: experience the flow of work from the point of view of a work item. When every person (or team) is busy and productive, it can be difficult to understand why outcomes are not improving. However, if you experience the flow from the point of view of the work (rather than the worker), opportunities for improvement become clear. Is work building up between teams? Is there an arduous approval or deployment process? Does every work item need sign-off from marketing or legal before being deployed? Is one work center processing work in big batches?

While many of our techniques for managing development work aim to make developers type faster or more efficiently, the sad truth is that the majority of a work item’s time in process is spent waiting. The ratio of touch time (time when someone is actively working on the item) to total time in process is called flow efficiency, and even world-class value streams often achieve only 15% flow efficiency. This means that 85% of the total duration, from customer request to feature in use, is wait time.
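The arithmetic is simple enough to sketch directly. The work item below is hypothetical: three days of touch time spread across twenty days in process, numbers chosen to reproduce the 15% figure.

```python
def flow_efficiency(touch_days, time_in_process_days):
    """Share of total time in process during which the item was actively worked."""
    if time_in_process_days <= 0:
        raise ValueError("time in process must be positive")
    if not 0 <= touch_days <= time_in_process_days:
        raise ValueError("touch time must be between 0 and the time in process")
    return touch_days / time_in_process_days


# Hypothetical work item: 3 days of hands-on work spread over 20 days in process
efficiency = flow_efficiency(3, 20)
print(f"flow efficiency {efficiency:.0%}, wait time {1 - efficiency:.0%}")
```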

If we measure the time in process and total work in process for each stage in a value stream, we can begin to understand where the bottlenecks are in the flow. Improvement at the bottleneck will drive improvement in outcomes, whereas improvement outside of it can actually exacerbate existing problems by overwhelming bottlenecks. If you need an example of how improvements outside the bottleneck can go wrong, I highly recommend watching the chocolate factory clip from I Love Lucy.
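Here is a rough sketch of that measurement. It assumes you can export two things for each stage: the days each finished item spent there, and a daily count of items in the stage. Ranking stages by average time in process is a heuristic starting point, not a definitive bottleneck test. The stage names and numbers are hypothetical. A waiting stage near the top of the ranking usually marks the queue in front of the constrained step rather than the constraint itself.

```python
from statistics import mean


def rank_stages(days_in_stage, daily_wip):
    """Rank value-stream stages by average time in process, longest first.

    days_in_stage: stage name -> days each completed item spent in that stage
    daily_wip: stage name -> number of items in that stage on each sampled day
    Returns (stage, avg days, avg WIP) tuples. A stage near the top with high
    WIP is a candidate bottleneck, or the queue in front of one.
    """
    summary = [
        (stage, mean(days), mean(daily_wip[stage]))
        for stage, days in days_in_stage.items()
    ]
    return sorted(summary, key=lambda row: row[1], reverse=True)


# Hypothetical measurements for a four-stage value stream
days_in_stage = {
    "develop": [2, 3, 4, 3],
    "awaiting review": [6, 9, 12, 7],
    "review": [1, 1, 2, 1],
    "deploy": [1, 2, 1, 1],
}
daily_wip = {
    "develop": [4, 5, 4],
    "awaiting review": [9, 11, 13],
    "review": [1, 2, 1],
    "deploy": [1, 1, 2],
}
for stage, avg_days, avg_wip in rank_stages(days_in_stage, daily_wip):
    print(f"{stage:16} {avg_days:5.1f} days {avg_wip:5.1f} items in process")
```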

Assumptions: quality and value

Quality

I hope it goes without saying that it is extremely important to have a metric for quality to balance out the metrics for responsiveness (time in process) and productivity (throughput) detailed above. However, I have found quality to be extremely dependent on context. There are lots of metrics that could be used (escaped defects, defect density, release rollbacks, etc.), depending on what is meaningful to your users and your business.

In my experience, the most important consideration for quality metrics is to preserve a sense of safety. Metrics that have a ‘per team member’ element (e.g. defects reported per unit of time per person or per code owner) can, in some team cultures, lead to finger pointing and blame. This makes it less likely that future defects are reported properly. The user doesn’t care who wrote the bug. The user cares that their expectations are met and trust is maintained.
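As one context-dependent example, here is a sketch of an escaped-defect rate. It is defined here as defects found in production, as a share of all defects found for a release. Both the definition and the data are assumptions. The point is the shape of the input: it deliberately carries no author or code-owner field, so the metric stays at team level.

```python
from collections import defaultdict


def escaped_defect_rate(defects):
    """Share of each release's defects that were found in production.

    defects: iterable of (release, found_in_production) pairs. There is
    deliberately no author or code-owner field: the metric stays at
    release/team level.
    """
    found = defaultdict(int)
    escaped = defaultdict(int)
    for release, found_in_production in defects:
        found[release] += 1
        if found_in_production:
            escaped[release] += 1
    return {release: escaped[release] / found[release] for release in found}


# Hypothetical defect log
defects = [
    ("2020.09", False), ("2020.09", False), ("2020.09", True),
    ("2020.10", False), ("2020.10", True), ("2020.10", True), ("2020.10", False),
]
print(escaped_defect_rate(defects))
```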

Value

All of the above-mentioned process metrics assume you have some measure of value and that you are building the right thing. Having a value-oriented goal can help folks make informed tradeoffs between process metrics. For example, if the value-driven goal is to exploit a first-mover advantage, it may be prudent to favor responsiveness over quality. In the absence of a consistent value context, each person will need to make personal value judgments on tradeoffs. This can lead to confusion and incoherence in the product.

Years ago, I was talking with a business unit CIO (Chief Information Officer) at a large company. He explained that he had given all teams in his business unit the single goal of reducing story time in process by half, and they had achieved that goal in a matter of a few months. I thought this was amazing and couldn’t understand why he seemed so disappointed, but he went on to tell me that reducing story duration had done nothing for the business metrics he was hoping to move. The company’s existing market was shrinking, and they needed to branch out or risk steady decline. He had asked for shorter story times in process so that the teams could rapidly experiment with new products and services to tap new markets.

Unfortunately, none of the teams had that context. They had done all sorts of creative things to achieve the target of a 50% reduction in time in process, e.g. making stories so granular that they were actually tasks, stopping the story duration clock every time the story was not actively being worked, etc.

The folks on these teams loved their jobs and wanted to meet the target, but in the absence of context and a balanced set of metrics, they made their own best judgments.

The moral of this story: metrics require balance and are only meaningful in context. If you don’t know why you’re measuring something, just don’t.

Footnotes

[1] Some people call this cycle time, but cycle time has a very different meaning in manufacturing and queuing theory. I have found this leads to a lot of confusion, so I use the phrase ‘time in process’.

[2] At this point, you may be wondering why I recommend throughput of work items (or stories) versus velocity in story points. I’ll leave that explanation to Ron Jeffries who has this to say: ‘I may have invented story points, and if I did, I’m sorry.’