“Measurements are made of the things which are easy to measure, leading to strong pressure for improved performance along these dimensions. At the same time, no measurements are made of the things harder to measure, and no pressure applied there. However, for any given dimension of performance, "ease of measurement" does not necessarily correlate well with "criticality to business results". In fact, the correlation usually goes the other way.”
“Measurement efforts therefore tend to have adverse effects, because people respond to the differential pressure by slacking off on the aspects not measured, even if they are critical to business results.”
We are all familiar with this effect. The canonical example in our industry is coverage. Coverage data gives us insight into how complete our tests are but, as soon as you start to set coverage targets, the targets take on a life of their own and exert a powerful force on the testers. Coverage becomes the goal rather than a proxy for the true goal of having more complete tests. This is a widespread phenomenon but I would hate to give up on the excellent benefits that coverage data provides because some people might misuse that data. I'd rather educate people on how not to misuse coverage data. Brian Marick has a lot more to say on this topic [PDF].
But I am more interested in Laurent's next remark...
“An excellent example is software quality. "Productivity", roughly defined as the number of features (or worse, lines of code) delivered per unit time, is all too easy to measure. "Quality" is much harder to measure, because even at its simplest it consists of several distinct dimensions, such as customer satisfaction with the product delivered, programming defects (i.e. "bugs") detected during the development process or after deployment of the product, and various "ilities" such as maintainability.”
One of the core ideas of Extreme Programming is that high quality makes you more productive in the long run. Unit testing doesn't slow you down - it's an investment in the future. Same story with refactoring and acceptance testing and pair programming and several other of the XP practices. If your timescale is measured in days they all appear to slow you down but, if it's measured in months and years, the higher quality should enable you to turn out features faster over the long term. So how do we measure that? Isn't that just Project Velocity? Can't we use velocity as a proxy for quality? If a project's velocity drops, poor quality is often the culprit and, after all, it's the rate of feature delivery that we really care about. If your velocity drops, look for ways to improve the quality. Better still, don't let the quality drop in the first place. Coverage targets can help (but see above).
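As a rough illustration of watching velocity for signs of a quality problem, here is a minimal sketch. It assumes velocity is recorded as story points completed per iteration; the function name `velocity_drops`, the three-iteration window and the 80% threshold are all hypothetical choices for the example, not part of XP or of any tool.

```python
from statistics import mean


def velocity_drops(points_per_iteration, window=3, threshold=0.8):
    """Return the indices of iterations whose velocity fell below
    `threshold` times the average of the preceding `window` iterations.

    The window size and threshold are illustrative assumptions.
    """
    drops = []
    for i in range(window, len(points_per_iteration)):
        # Baseline is the mean velocity of the iterations just before this one.
        baseline = mean(points_per_iteration[i - window:i])
        if points_per_iteration[i] < threshold * baseline:
            drops.append(i)
    return drops


# Hypothetical history: story points completed in six iterations.
history = [21, 23, 22, 20, 14, 15]
print(velocity_drops(history))  # prints [4]
```

A flagged iteration is only a prompt to go and look for a cause such as mounting defects or skipped refactoring. The numbers mean something only within one team, since story points are not comparable across projects.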
This solves the problem of how we measure productivity within a particular project but my story points are not the same size as your story points. How do I compare my productivity with your productivity? That's a problem I have struggled with for a long time. If someone out there has the answer, I'd love to hear it.
Posted by Kevin Lawrence at April 16, 2004 11:17 AM
I don't think we can measure "quality" from within the development organization. In fact, "quality", like "simplicity", is a word that I don't think has a useful place in a choice-making discussion. Evaluation of either of these pseudo-criteria depends crucially on the assumptions you make and the boundaries of your discourse. As an example on the "simplicity" side, see the note on "Fight Complexity with Complexity" -- by drawing the boundary of discourse larger than "just this test code", superficial complication became actual simplification.
Apply this argument to quality. There is a kernel of quality that has to do with craft, and can be recognized and evaluated from within the development framework; but that's an aesthetic judgement, and we should cherish it as an intrinsic reward of our work. The other part of "quality", which can be measured and managed as a business process, could more properly be called "fitness to purpose". Using this label makes it obvious that asking "how much X do we have" only raises the further question "what is the purpose". So we can't measure "quality" as a process driver; we have to discover what the business needs from development, and then work jointly to measure our performance on those goals.
I think that this will often be a joint process of discovery; the things that people really need are often so fundamental that they don't articulate them. Particularly, when evaluating process, think about *-ities (flexibility, predictability, time-to-market, maintainability) as well as features.
Posted by: Roger Hayes on October 11, 2004 07:32 AM