Setting targets… the wrong ones
As we initiated OPLA we decided to put not just our product but our approach to the test. What we wanted was to have our riskiest classes well tested, on the theory that this effort would yield the highest ROI. We set ourselves a difficult but inspiring target: a race to 10,000 test points.
This ambitious target put a premium on "easy agitation," where you agitate a class and then focus on converting observations into assertions, without a lot of effort at improving coverage or considering the outcomes. While easy agitation provides value quickly — just "think a little, click a little" — our early success created a mentality of "don't think, just click," and we ended up with an unfortunately high number of assertions that were either useless or incorrect.
Despite the flaws of our approach there was some good news:
And of course we were able to achieve the usual dog food goal of surfacing the kinds of bugs that only come up through real use.
Out with the bad, in with the good
At the end of our 10,000-point sprint, we used the dashboard to view what we had wrought. Because we could review runs from multiple nights on multiple platforms, it was clear that we had gone astray with our emphasis on just counting; in any given run we had dozens of failing assertions. After a quick look we realized that most of these failures came from bogus assertions, and that this was preventing us from seeing the assertions that were flagging real bugs. So, with the power of the dashboard to the rescue, our next target became removing these invalid assertions.
Reviewing the results of nightly runs, we identified the classes with the most failing assertions and focused on those first. We quickly reached a point where we had few enough classes with failures that we could track down the source of each failure. We put the failures into one of four buckets:
To help identify who was responsible for the assertions in a given class, we added the owner field, making it clear who should do the follow-up.
Immediately after we started tracking these numbers we made the finding we had hoped for — our assertions were finding real bugs! The months of toil had paid off, the product worked, and we could use it to test itself. That meant it was time to take it to the next level …
Continuous agitation
The value proposition of developer testing is that the earlier you find bugs, the cheaper they are to fix. So now that our assertions were catching real bugs, how could we get that feedback as quickly as possible? Our solution was to use CruiseControl and the approach of continuous integration to give us "continuous agitation".
To make this work we started by expanding our command-line interface to give us more control over which classes to agitate. To use this interface from our build, we created Ant tasks to wrap it. We already had an Ant script for all the project tasks (pull source, compile, jar, javadoc, etc.), so it was a simple matter to add an agitation target as the final link in the chain. Integrating with CruiseControl was straightforward, with a couple of new .xsl files giving us the reporting we wanted.
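To give a flavor of that chain, here is a minimal sketch of what such an agitation target might look like. The class name, arguments, file paths, and classpath reference are hypothetical placeholders for illustration, not the product's actual command-line interface:

    <!-- Final link in the nightly build chain: agitate the freshly built classes. -->
    <target name="agitate" depends="jar">
        <!-- Wraps a (hypothetical) command-line interface in a plain java call. -->
        <java classname="com.example.agitator.CommandLine" fork="true" failonerror="false">
            <classpath refid="project.classpath"/>
            <arg value="-classlist"/>
            <arg value="${basedir}/conf/classes-to-agitate.txt"/>
            <arg value="-results"/>
            <arg value="${basedir}/build/agitation-results"/>
        </java>
    </target>

Running it through Ant's java task rather than exec keeps the classpath handling inside the build file.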
Having put these pieces together, we get immediate feedback on any regression bugs. Within minutes of the code being committed, the developers get an email with the results, including a custom dashboard of any failing assertions.
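The CruiseControl side is just an ordinary project entry that runs the agitate target and mails the transformed results. The sketch below assumes CVS and our custom .xsl files; the project name, paths, hostnames, and addresses are placeholders:

    <cruisecontrol>
        <project name="opla">
            <!-- Check for new commits, with a quiet period so we build a consistent tree. -->
            <modificationset quietperiod="60">
                <cvs localworkingcopy="checkout/opla"/>
            </modificationset>
            <!-- Every five minutes, build if anything changed; the Ant chain ends in the agitate target. -->
            <schedule interval="300">
                <ant buildfile="checkout/opla/build.xml" target="agitate"/>
            </schedule>
            <log dir="logs/opla"/>
            <publishers>
                <!-- The custom .xsl files in xsldir shape the emailed report. -->
                <htmlemail mailhost="mail.example.com"
                           returnaddress="build@example.com"
                           xsldir="webapps/cruisecontrol/xsl"
                           css="webapps/cruisecontrol/css/cruisecontrol.css">
                    <always address="dev-team@example.com"/>
                </htmlemail>
            </publishers>
        </project>
    </cruisecontrol>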
Conclusions
We feel OPLA has been an extremely valuable exercise from its inception. In addition to surfacing product bugs through use, it provided a good validation of the Managed Developer Testing approach, demonstrated both the value of easy agitation and the danger of a naive target metric, and suggested several important features that have now made their way into the product.
The experience also reinforced a key insight into developer testing: the value of the tests is not only in the bugs you find while writing them, but also in the bugs you catch by executing them. Any developer testing effort should consider the automatic reporting of test feedback to developers an essential part of the infrastructure.
The system continues to evolve, but this is how we use Agitator day-to-day. Dog food indeed!
Posted by Jeffrey Fredrick at January 8, 2004 03:26 PM