Developer Testing: Is It Wise to Aim for 100% NTF?

February 18, 2005 - Is It Wise to Aim for 100% NTF?

10 steps on the journey to the perfect metric.

Measuring NTF

The FMA recently issued a study that showed a high correlation between a project's Notional Temperance Factor (NTF) and the number of broken builds. The whole industry is abuzz with discussion of whether you should try to achieve 100% NTF, and indeed whether it is even possible.

We decided to run a trial to investigate the effects of NTF. The first thing we did - as you should do for any metric - was measure our NTF. Bob Martin has built a Fitnesse module for measuring NTF, so we downloaded it and found that our NTF was only 43%. Everyone agreed that we needed to improve, but how should we go about it?

Setting Targets

"Be careful what you ask a Marine to do, because he'll die trying."
US Marine Corps, via Semper Fi Consulting

We tried setting targets. We modified Uncle Bob's Fitnesse module so that you could enter a target value; the result turns red if you don't meet it. The industry seems to have settled on a target NTF of around 75%, so that's where we set our target. Within a week, we had achieved an NTF of 76%, but there was no change in the frequency of broken builds. We set the target higher: 95%.

NTF: 95.0 expected, 82.0 actual

By the very next week, our NTF had increased to 80%. The week after that - 82%.

But then it dropped precipitously: to 50%, then 45%, and finally it settled at around 28%. The number of broken builds, though, went up dramatically. Lesson one - if you set targets, make sure they are achievable (especially if there are Marines involved).

Lesson 1 - If you set targets, make sure they are achievable

"Good tests fail"
The Developer Testing Paradox, Alberto Savoia

If your red lava lamp is always on, the target is too high. If the red one never comes on the target is too low. You have found the sweet spot when your red light flashes on from time to time but you are mostly bathed in a pleasant green glow. Alberto captures this idea more succinctly - "Good tests fail".
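The lava-lamp rule of thumb above can be turned into a rough check. This is a minimal Python sketch, assuming you keep a history of NTF readings and a single target; the function name `assess_target` and the "more red than green means too high" cut-off are hypothetical, not part of the Fitnesse module described earlier.

```python
from typing import Sequence


def assess_target(readings: Sequence[float], target: float) -> str:
    """Classify a target by how often the red lamp would be lit.

    Hypothetical rule: never red -> too low; red more often than
    green -> too high; otherwise the target is in the sweet spot.
    """
    if not readings:
        raise ValueError("need at least one reading")
    # A reading below the target is a miss, i.e. the red lamp is on.
    misses = sum(1 for r in readings if r < target)
    if misses == 0:
        return "too low"      # red lamp never comes on
    if misses * 2 > len(readings):
        return "too high"     # red lamp on more often than not
    return "sweet spot"       # mostly green, occasional red
```

With the weekly readings from the 95% experiment (80, 82, 50, 45, 28), every reading misses the target, so the sketch reports the target as too high.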

Monitor it More Closely

Alberto found a paper suggesting that we could improve our NTF by using the Hawthorne Effect, so we tried a little experiment. We had Mark and Rob keep track of all activities that might affect NTF and, sure enough, their NTF increased. We tried the same thing with Scott and the same thing happened. Matt suggested that we use the Hawthorne Effect on everybody, and after just three days NTF soared to the dizzy heights of 81%. There was still no change in the frequency of broken builds, and within a week our NTF was back down to 28%. Lesson two - you can improve performance in a certain area just by measuring it, but the improvement won't be sustainable unless you change fundamental behaviors. Time to try a new approach.

Lesson 2 - Beware the Hawthorne Effect

NTF Over Time

We tried plotting NTF and broken builds against time.

The chart shows that NTF is steady all week but drops noticeably on Friday afternoons, accompanied by an increase in the frequency of broken builds. "What happens on Friday afternoons that could be causing that?" we wondered. We put a big, visible version of the chart up on the wall, and by the end of the month there was a distinct improvement.
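One way to spot a Friday-afternoon pattern without a wall chart is to bucket builds by day of the week. The sketch below assumes build results are available as (timestamp, broke) pairs; that record format and the function name `broken_rate_by_weekday` are hypothetical. Weekday names come from a fixed tuple rather than `strftime("%A")` so the output does not depend on the locale.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def broken_rate_by_weekday(
        builds: Iterable[Tuple[datetime, bool]]) -> Dict[str, float]:
    """Return {weekday name: fraction of that day's builds that broke}."""
    totals: Dict[str, int] = defaultdict(int)
    broken: Dict[str, int] = defaultdict(int)
    for when, did_break in builds:
        day = WEEKDAYS[when.weekday()]
        totals[day] += 1
        if did_break:
            broken[day] += 1
    # Only days that actually had builds appear in the result.
    return {day: broken[day] / totals[day] for day in totals}
```

A day whose rate stands well above the others is the one worth asking questions about.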

Just having the chart around made people more aware of both positive and negative changes in NTF, and they made a conscious effort to keep the trend positive. This led us to speculate on a third lesson: trends are often more interesting than isolated values.

Lesson 3 - Trends are more interesting than isolated values.

We decided to break down the NTF number by developer and put the resulting chart up on the wall without comment.

Martha - 93
David - 51
CP - 47
Mark - 47
Kevin - 47
Scott - 43
Ken - 41
Roongko - 31
Ashish - 31
Rob - 29
Mark - 23
Jeff - 21
Dan - 12

Of all the things we had tried so far, this was by far the most effective. Dan couldn't bear the fact that he was at the bottom of any table, and he set himself to the task of moving up.

Martha - 93
David - 87
CP - 79
Mark - 75
Mark - 73
Jeff - 66
Dan - 60
Scott - 65
Ken - 63
Roongko - 59
Ashish - 52
Rob - 52
Kevin - 47

Whoa! Now I was at the bottom, and I had some work to do! In no time at all, everyone had improved their numbers. Peer pressure had achieved what management pressure could not. Give the team the information they need to do their jobs, and trust that they'll do the right thing.

Lesson 4 - Peer pressure is more powerful than management pressure

That magic number of 100% was now within reach. The boss tried to rally the team: "One more big push and we can make it!" It was a very inspiring speech, and by the end of it we were ready to storm the gates of Harfleur. Improving NTF was our focus for the whole of the next week, and by the end of it we were at 99.3%... but all other work had stopped. No bugs were fixed, no features developed. Even the previously mighty flow of documentation slowed to a trickle. Everyone was grumpy.

Lesson 5 - The best is the enemy of the good - Voltaire

and

Lesson 6 - You get what you measure

The worst of it was that the frequency of broken builds had not changed one iota. The whole thing reminded me of the company that noticed the correlation between running very fast and sweating. Having observed that the faster you run, the more you sweat, they introduced a product that made you sweat more. You didn't run any faster - you just had to change your t-shirt more often.

Lesson 6 - Choose your metrics wisely

and more specifically

Lesson 7 - Are you measuring a cause or an effect ?

By now we were ready to drop the whole thing, and things soon went back to normal. We modified our CruiseControl job to send out a trend chart and a ranking by developer if there was a sudden change in the value or a sustained downward trend. But we stopped stressing over NTF.
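The alert rule described above (a sudden change or a sustained downward trend) might look something like this. It is a plain Python sketch, not CruiseControl configuration; the function name `should_alert`, the 10-point jump, and the three-consecutive-decline thresholds are assumptions chosen for illustration.

```python
from typing import Sequence


def should_alert(history: Sequence[float],
                 jump: float = 10.0, run: int = 3) -> bool:
    """Decide whether to mail out the trend chart and ranking.

    `jump` (percentage points) and `run` (consecutive declines) are
    hypothetical thresholds, not values from the original setup.
    """
    if len(history) < 2:
        return False
    # Sudden change in either direction between the last two readings.
    if abs(history[-1] - history[-2]) >= jump:
        return True
    # Sustained downward trend: each of the last `run` steps was a decline.
    if len(history) > run:
        recent = history[-(run + 1):]
        return all(b < a for a, b in zip(recent, recent[1:]))
    return False
```

Checking both conditions means a one-off crash, such as the drop from 82% to 50%, triggers a mail immediately, while a slow slide only does so once it has persisted for several readings.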

A couple of weeks later, Jeff and I were chatting about the whole experience at our Friday beer bash. Our build process was actually fine; we had wasted a lot of time trying to improve something that didn't really need improving. Jeff proposed a new lesson in the form of a toast: "If it's not a problem - don't measure it", and we all raised our glasses to that one.

Lesson 8 - If it's not a problem - don't measure it

We never did find out why there were more broken builds on Friday evenings - but it didn't really matter because everything was back to normal by Monday morning.

Posted by Kevin Lawrence at February 18, 2005 02:08 PM


Comments

Have you tried measuring the number of broken builds?!? ;->

Posted by: Jeff Grigg on July 19, 2005 11:08 AM