Updated on February 28, 2021 to clarify some material and make it consistent with materials written at a later date.
It should be stated from the outset that metrics, in any form, have more potential to do great harm than to do the slightest good. To be clear, we’re talking about metrics that measure operations and performance, not numbers that measure the size of something. For example, when we express the complexity of a user story, we discuss “story points.” This is a measurement of size, not a performance metric. On the other hand, when we add together the story points of all the stories completed in a sprint, we have “velocity,” a questionable metric for measuring performance.
You should create performance metrics under two conditions:
- The metrics are used to improve or monitor something that actually needs to be improved or is mission-critical. Many metrics are created for the sole purpose of creating metrics. In other words, having detailed and exhaustive metrics is an end in itself. It keeps people busy collecting and regurgitating the information in the form of PowerPoint slides and posters and is a huge waste of time. Performance metrics should be created when performance is so critical to the mission of the organization that any change in performance must be immediately addressed. For example, if your organization is required to address issues or requests within a specific period of time, monitoring performance could be mission-critical.
- The metrics are used by the team for forecasting purposes (that is, understanding approximately how much work can be done in a single Sprint).
No matter what your performance metrics tell you, always be careful not to jump to conclusions when measurements change. Likewise, when teams are punished (i.e., made to work longer hours, demeaned or shamed by others in the organization, or even fired if productivity drops) it is not uncommon for teams to knowingly or unknowingly “game” the metrics to simulate a desirable outcome. For example, if you measure a Scrum team’s productivity solely through their velocity, the team can artificially improve velocity by simply over-estimating stories (teams can even “lean” toward larger sizes without realizing it — the desire to protect one’s job is powerful). For this reason, I suggest the metrics NEVER stand alone. In other words, if you want to measure the productivity or performance of something (a Scrum team, a program, a process), create duos or trios of metrics that are closely related but tend to show a realistic and more complete picture.
A Suggested Scrum Team Metric Triad
Measuring Scrum Team productivity is something that nearly every organization wants to do. Having said that, I firmly believe that you don’t need to measure Scrum Team productivity unless one of the earlier conditions holds: 1) team productivity needs to be improved, or 2) the metrics are primarily intended to support forecasting. If your organization is already happy with Scrum Team productivity (the team or teams are producing at least as much as management expects, at a high level of quality), find a better use of your time than creating valueless metrics.
If, however, the organization feels strongly that team productivity could be improved (and this might simply be because the team is new and there’s no way to know what the team’s potential might be), here’s a trio of metrics that can be used to measure productivity:
- Achieved DONEness – this metric is a ratio of the sum of the story points of the stories that the Product Owner accepts at Sprint Review over the sum of the story points of the stories that the Scrum Team said were DONE at Sprint Review. In other words, if everything the team says is DONE at Sprint Review is accepted by the Product Owner, this metric would be equal to 1. If the team completed the equivalent of 18 story points and the Product Owner accepted those same stories, the ratio would be 18/18 = 100%. On the other hand, should the Product Owner accept only 14 points, the metric would be 14/18, or about 78%. This metric focuses the team on DONEness, rather than speed. This is important, as it is the improved quality of the software that the team produces that often plays the largest role in productivity gains. Teams reach 100% by internalizing DONEness, listening to the Product Owner, working more closely with the Product Owner, and giving the Product Owner more transparency regarding what is being built during the Sprint.
- Velocity – this is the well-known metric that is calculated as a three-Sprint moving average of the sum of the story points of the user stories completed during one Sprint. Velocity cannot be compared across teams, but it can be used to determine if a team is getting more and more points DONE during a Sprint. Once a team has reached and is maintaining an “Achieved DONEness” of 100%, they should begin to focus on improving velocity (while continuing to maintain Achieved DONEness at 100%). This is accomplished through continuous training, improving team skill sets, improving practices and processes, and implementing tools that reduce or eliminate the effort of common tasks.
- Defects – this is a measure of the total number of defects open against the software changed by the team or for which the team is responsible. This metric balances against velocity to ensure that improved velocity is not gained at the cost of decreased quality. This metric is sometimes modified to count only highly critical defects. However, I caution against it — modifying the metric to count only certain defects introduces (or sustains) an organizational dysfunction: the belief that if we count only certain defects, the numbers will look better. This modification is usually suggested in situations where the defect count is quite high, indicative of a culture that has long allowed defects to accumulate rather than holding a hard line against their accumulation.
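The triad above is simple enough to compute directly. Here is a minimal Python sketch; the data structure, field names, and sample numbers are my own illustrative assumptions, not part of any Scrum standard:

```python
from dataclasses import dataclass

@dataclass
class SprintResult:
    points_done: int      # story points the team declared DONE at Sprint Review
    points_accepted: int  # story points the Product Owner actually accepted
    open_defects: int     # defects open against the team's software

def achieved_doneness(s: SprintResult) -> float:
    """Ratio of accepted points to points declared DONE (1.0 = 100%)."""
    return s.points_accepted / s.points_done if s.points_done else 1.0

def velocity(history: list) -> float:
    """Three-Sprint moving average of points completed."""
    recent = history[-3:]
    return sum(s.points_done for s in recent) / len(recent)

history = [
    SprintResult(points_done=18, points_accepted=14, open_defects=7),
    SprintResult(points_done=20, points_accepted=20, open_defects=6),
    SprintResult(points_done=21, points_accepted=21, open_defects=6),
]

print(round(achieved_doneness(history[0]), 2))  # 0.78, i.e., 14/18
print(round(velocity(history), 1))              # 19.7, i.e., (18 + 20 + 21) / 3
```

The defect count needs no calculation; it is simply tracked per Sprint (the `open_defects` field above) and watched for trends alongside the other two values.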
This then gives you a triad of values that can provide a somewhat clear picture of team performance (again, I caution against using the metrics for anything except an early-warning system — do not use metrics to say if something is good or bad. If you want information, look more closely). I recommend using the triad in this manner:
During the Sprint Retrospective, review the state of the metric values over the past three or four Sprints. Look at each number and make some decisions.
- If “Achieved DONEness” is less than 100%, the team should take steps to improve the involvement of the Product Owner in the Sprint and to improve communication with the Product Owner during the Sprint.
- If “Velocity” is decreasing, the team should start asking questions about what is causing the increased effort. Why are fewer and fewer story points being completed during the Sprint? Causes and follow-up actions should be discussed.
- If “Defects” is increasing, the team should start reviewing root causes for some of the defects. The question that should be asked is, “how did this defect make it past the team’s testing?” Causes and follow-up actions should be discussed; if necessary, the team’s DONEness definition may need to be updated.
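The retrospective checks above amount to a simple early-warning scan over the last few Sprints. This sketch assumes the metric history is kept as a list of per-Sprint records (oldest first); the dictionary keys and the warning wording are my own, and the output is only a prompt for discussion, never a verdict:

```python
def review_metrics(history):
    """Return early-warning strings for the three trends described above.

    `history` is a list of dicts, oldest Sprint first, each with keys
    'doneness' (0..1), 'velocity', and 'defects'.
    """
    warnings = []
    latest = history[-1]
    if latest["doneness"] < 1.0:
        warnings.append("Achieved DONEness below 100%: improve Product Owner "
                        "involvement and communication during the Sprint.")
    if len(history) >= 2 and latest["velocity"] < history[-2]["velocity"]:
        warnings.append("Velocity decreasing: ask what is causing the increased effort.")
    if len(history) >= 2 and latest["defects"] > history[-2]["defects"]:
        warnings.append("Defects increasing: review root causes; the DONEness "
                        "definition may need updating.")
    return warnings

history = [
    {"doneness": 1.0, "velocity": 20, "defects": 5},
    {"doneness": 0.9, "velocity": 18, "defects": 7},
]
for w in review_metrics(history):
    print(w)
```

In this sample, all three checks fire, so the retrospective discussion would cover all three questions.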
No matter what you do, be careful with metrics. They are at least as likely to cause harm as to do good. Metrics should always be connected to goals. In other words, the goal comes first, then the metric. Make sure your metrics make sense and ensure that the metric values are open and accessible to your Scrum Team. Don’t keep them in the dark. Lastly, use the metrics to monitor and improve — don’t use metrics as a stick (threatening poor evaluations) or a carrot (rewarding good performance). It just doesn’t work, and it makes everyone frustrated.
You can read more about metrics and how to use them in my book, Enterprise Agile Development, in the chapter entitled Determine Transition Goals. If you want to learn hundreds of tips and advice like this, consider Artisan Agility’s The Leadership Edge training system.