Let me state from the outset that metrics, in any form, have more potential to do great harm than to do the slightest good. To be clear, we’re talking about metrics that measure operations and performance, not numbers that measure the size of something. For example, when we express the complexity of a user story, we discuss “story points.” This is a measurement of size, not an operational metric of performance. On the other hand, when we add all those story points together for the stories completed in a sprint, we have “velocity,” a (questionable) metric for dealing with performance.
One type of number measures something static: effort, size, length. The other type is a metric that measures economy (efficient use of inputs), efficiency (process performance), and effectiveness (comparison of actual outcomes against desired outcomes).
Going further, let’s also be clear that performance metrics should be created under two conditions (and, as far as I can see, ONLY two conditions). Those conditions are:
- The metrics are used to improve or monitor something that actually needs to be improved or is mission-critical. Lots and lots of metrics are often created for the sole purpose of creating metrics. In other words, having detailed and exhaustive metrics becomes an end in itself. It keeps certain people busy collecting information and regurgitating the information in the form of PowerPoint slides and posters. Ideally, metrics should only be applied to aspects of economy, efficiency, or effectiveness that actually need to be measured and improved OR are so critical to the mission of the organization that any change in performance must be addressed immediately.
- The metrics are used with some kind of formal or informal statistical control that indicates when closer human attention is required. No matter how hard you try, metrics will never tell the entire story. For example, imagine a Scrum team with a fairly regular velocity of 18 story points per Sprint (in some Sprints it drops to 16; in others, it exceeds 20). All of a sudden, the velocity increases to 30 story points for two months, and then returns to 18. What happened? There’s no way to tell just by looking at the numbers. Perhaps the team got lucky and hit a series of stories that they were able to complete much more quickly than they anticipated. Perhaps they overestimated a bunch of related stories and did them all over a two-Sprint period. Maybe they got a new team member for two Sprints. The point is, the only thing the metric tells you is that something changed. The metric doesn’t tell you why the change occurred, nor does it tell you if something good happened or if something questionable occurred. If you want to understand what happened, you have to look closer and ask questions.
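As a rough illustration of the informal statistical control described in the second condition, here is a minimal Python sketch. It flags a Sprint only when its velocity falls outside a band around the team's historical average. The function name needs_attention, the three-standard-deviation band, and the baseline numbers are all my own assumptions for illustration; any reasonable control rule would serve the same purpose.

```python
from statistics import mean, stdev


def needs_attention(history, latest, k=3.0):
    """Return True when `latest` lies more than k sample standard
    deviations away from the mean of `history`.

    Assumption: `history` holds at least two past velocities, and a
    simple mean +/- k*sigma band is an acceptable control rule.
    """
    center = mean(history)
    spread = stdev(history)
    return abs(latest - center) > k * spread


# Hypothetical baseline: a team that hovers around 18 points per Sprint.
baseline = [18, 16, 20, 18, 17, 19, 18, 21]

print(needs_attention(baseline, 19))  # ordinary Sprint, inside the band
print(needs_attention(baseline, 30))  # the jump to 30, outside the band
```

Note that the check only says "look closer." It says nothing about whether the jump to 30 was good or bad, which is exactly the limit of what the number can tell you.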
In addition to measuring what needs measuring and being careful not to jump to conclusions when measurements change, it’s also important to understand that most metrics, by themselves, can be easily “gamed” to increase or decrease. For example, if you measure a Scrum team’s productivity solely through their velocity, the team can artificially improve velocity by simply over-estimating stories. While story points can be very powerful, because they are relative, it is quite possible to “up-shift” the scale slowly to create the effect of improving velocity in the absence of any real improvement. For this reason, I suggest that metrics NEVER stand alone. In other words, if you want to measure the productivity or performance of something (a Scrum team, a program, a process), create duos or trios of metrics that are closely related but make it much harder to work the numbers.
The Scrum Team Metric Triad
Measuring Scrum Team productivity is something that nearly every organization wants to do. Having said that, I still firmly believe that you don’t need to measure Scrum Team productivity unless both of the earlier guidelines are true, that is, 1) team productivity needs to be improved and 2) the metrics are supplemental to first-person observation and coaching. If your organization is already happy with Scrum team productivity (that is, the team or teams are producing at least as much as management expects at a high level of quality), find something else to do with your time.
If, however, the organization feels strongly that team productivity could be improved (and this might simply be because the team is new and there’s no way to know what the team’s potential might be), here’s a trio of metrics that can be used to “safely” measure productivity:
- Achieved DONEness – this metric is the ratio of the sum of the story points of the stories that the Product Owner accepts at Sprint Review to the sum of the story points of the stories that the Scrum Team said were DONE at Sprint Review. In other words, if everything the team says is DONE at Sprint Review is accepted by the Product Owner, this metric would be equal to 1 (100%). If the team completed the equivalent of 18 story points and the Product Owner accepted those same stories, the ratio would be 18/18 = 100%. On the other hand, should the Product Owner accept only 14 points, the metric would be 14/18 or 78%. This metric focuses the team on DONEness, rather than speed. This is important, as it is the improved quality of the software that the team produces that often plays the largest role in productivity gains. Teams reach 100% by internalizing DONEness, listening to the Product Owner, working more closely with the Product Owner, and giving the Product Owner more transparency regarding what is being built during the Sprint.
- Velocity – this is the well-known metric that is calculated as a three-Sprint moving average of the sum of the story points of the user stories completed during one Sprint. Velocity cannot be compared across teams, but it can be used to determine if a team is getting more and more points DONE during a Sprint. Once a team has reached and is maintaining an “Achieved DONEness” of 100%, they should begin to focus on improving velocity (while continuing to maintain Achieved DONEness at 100%). This is accomplished through continuous training, improving team skill sets, improving practices and processes, and implementing tools that reduce or eliminate the effort of common tasks.
- Defects – this is a measure of the total number of defects open against the software changed by the team or for which the team is responsible. This metric balances against velocity to ensure that improved velocity is not gained at the cost of decreased quality. This metric is sometimes modified to count only highly critical defects. However, I caution against it: modifying the metric to count only certain defects introduces (or sustains) an organizational dysfunction, namely the belief that if we only count certain defects, the number will look better. This modification is usually suggested in situations where the defect count is quite high and indicative of a culture that has long allowed defects to accumulate rather than holding a hard line against their accumulation.
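To make the arithmetic behind the first two metrics concrete, here is a small Python sketch. The function names achieved_doneness and velocity are my own, and returning None when nothing was claimed as DONE is an assumption (the ratio is simply undefined in that case). Which points count as "completed" for velocity is also left to you; the sketch just averages whatever per-Sprint totals it is given. The Defects metric is a plain count of open defects, so it needs no formula.

```python
def achieved_doneness(accepted_points, claimed_done_points):
    """Ratio of points accepted by the Product Owner to points the
    team said were DONE at Sprint Review.

    Assumption: returns None when the team claimed nothing as DONE,
    because the ratio is undefined.
    """
    if claimed_done_points == 0:
        return None
    return accepted_points / claimed_done_points


def velocity(completed_points_per_sprint, window=3):
    """Moving average of completed story points over the most recent
    `window` Sprints (three by default). Expects a non-empty list,
    oldest Sprint first."""
    recent = completed_points_per_sprint[-window:]
    return sum(recent) / len(recent)


# The example from the text: 14 of 18 claimed points accepted.
print(f"{achieved_doneness(14, 18):.0%}")  # 78%
print(f"{achieved_doneness(18, 18):.0%}")  # 100%

# Hypothetical Sprint totals, oldest first.
print(velocity([16, 18, 20]))  # 18.0
```

Because the three numbers are read together, inflating estimates to raise velocity does nothing for Achieved DONEness or the defect count, which is the point of using a triad.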
This then gives you a triad of values that can provide a somewhat clear picture of team performance (again, I caution against using the metrics for anything except an early-warning system — do not use metrics to say if something is good or bad. If you want information, look more closely). I recommend using the triad in this manner:
During the Sprint Retrospective, review the state of the metric values over the past three or four Sprints. Look at each number and make some decisions.
- If “Achieved DONEness” is less than 100%, the team should take steps to improve the involvement of the Product Owner in the Sprint and to improve communication with the Product Owner during the Sprint.
- If “Velocity” is decreasing, the team should start asking questions about what is causing the increased effort. Why are fewer and fewer story points being completed during the Sprint? Causes and follow-up actions should be discussed.
- If “Defects” is increasing, the team should start reviewing root causes for some of the defects. The question that should be asked is, “How did this defect make it past the team’s testing?” Causes and follow-up actions should be discussed; if necessary, the team’s DONEness definition should be updated.
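The three retrospective checks above can be sketched as a small Python helper. Everything here is illustrative: the name retrospective_flags, the flag labels, and my reading of "decreasing" and "increasing" as a strict change across every consecutive pair of Sprints in the review window are assumptions, not a prescribed rule. Each list holds one value per Sprint, oldest first.

```python
def _strictly_falling(values):
    # Needs at least two points to call something a trend.
    return len(values) >= 2 and all(b < a for a, b in zip(values, values[1:]))


def _strictly_rising(values):
    return len(values) >= 2 and all(b > a for a, b in zip(values, values[1:]))


def retrospective_flags(doneness, velocity, defects):
    """Return the topics the team should discuss, based on the last
    three or four Sprints of each metric (oldest first)."""
    flags = []
    if doneness and doneness[-1] < 1.0:
        flags.append("doneness")  # improve Product Owner involvement
    if _strictly_falling(velocity):
        flags.append("velocity")  # ask what is causing the extra effort
    if _strictly_rising(defects):
        flags.append("defects")   # review root causes, revisit DONEness
    return flags


# Hypothetical values for the last three Sprints.
print(retrospective_flags([1.0, 1.0, 0.78], [20, 19, 17], [5, 5, 5]))
# ['doneness', 'velocity']
```

The output is a list of conversation starters for the Retrospective, not a verdict on the team.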
No matter what you do, be careful with metrics. They are at least as likely to cause harm as to do good. Make sure your metrics make sense and ensure that the metric values are open and accessible to your Scrum team. Don’t keep the team in the dark. Lastly, use the metrics to monitor and improve. Don’t use metrics as a threat of a poor evaluation or as a carrot to encourage good performance. It just doesn’t work, and it makes everyone frustrated.
You can read more about metrics and how to use them in my book, Enterprise Agile Development, in the chapter entitled Determine Transition Goals.