In this article, I describe my team's dog-food-eating experience with Agitator, a very advanced unit test generator for Java (if I may say so myself), and our company's flagship product. The creation of Agitator was motivated by my experience and my belief that the best way to improve software quality and reduce development costs is to implement a developer/unit testing program. The idea of having a software project where every individual component has its own set of unit tests, contributed by the developers themselves (the people who should know that code best), and run by the developers before system integration has always been very appealing to me. The unit tests would not negate the need for integration or system tests, but would make those activities more effective and more efficient because most unit-level defects would be caught early in the development cycle and fixed before integration.
The challenge with this approach is that hand-coding a thorough set of unit tests can be very time consuming and somewhat tedious. Furthermore, time required to develop, maintain, and run unit tests is often the first thing sacrificed when the schedule gets tight. As a result, most developer-testing initiatives fail in the long term, are only adopted by a fraction of the developers, or achieve an unsatisfying subset of their original objectives.
We designed Agitator to automate the majority of the effort required to develop, maintain, and run a thorough suite of unit tests. In fact, Agitator is particularly effective at automatically generating the portions of test code that developers find most monotonous and unrewarding.
Our hypothesis was that by using Agitator, a development team can have their cake and eat it too: they can reap all the benefits of having a thorough suite of unit tests without taking an excessive amount of time and resources from other core development activities.
By using Agitator on itself (which we refer to as self-agitation) we had an opportunity to test our hypothesis first-hand.
We were eager and prepared to eat our own dog food; unfortunately, we had to wait until development of Agitator reached a major milestone: we had to implement enough functionality to enable Agitator to agitate itself. Because Agitator is a large and complex program (with almost 2,000 Java classes at the time of this writing), in addition to basic features, we had to meet non-trivial scalability, robustness, and performance objectives.
During this period we kept busy by applying whatever Agitator functionality was available at any one time to a growing collection of test and demo applications, including several open source applications and portions of Agitator itself.
The lessons we learned during this period were invaluable, and led to several changes in the design and architecture of Agitator, but I cannot consider it a proper effort to eat our own dog food, because the food was still cooking and we could not use it with the depth and breadth we had intended.
This was just an appetizer.
After a few more months of development, Agitator reached the point where it could reliably generate tests for itself. There were still some rough edges both in the user interface and the engine, but we had enough functionality and reliability to start self-agitation. We labeled the release pre-alpha, and I asked developers to start using it to generate unit tests for their own code.
In the process, we encountered another obstacle to overcome: we could not run a regular version of Agitator on itself because of self-references and name-space conflicts (sort of like a snake eating its own tail). We solved the problem by modifying our build cycle so that now every time we build Agitator we create two executable versions: Agitator and EvilTwin. EvilTwin has the same functionality as Agitator, but uses a different name space to prevent conflicts, and when we test we run EvilTwin against the Agitator code.
This was an exciting – and exceedingly busy – time. Every day we experienced nice surprises as well as unexpected problems. For a while it seemed as if every positive surprise was accompanied by a problem:
As a manager, I tried to balance the team's mood between enthusiasm for Agitator's successes and concern for its shortcomings. We celebrated the former and took action on the latter. We discovered, for example, that Agitator and JUnit complemented each other beautifully. JUnit worked particularly well for test-driven development and for crafting test cases that require a very specific sequence of operations and outcomes. Agitator, on the other hand, excelled at pushing developers to think about corner cases and exceptions, making sure that all partitions were exercised and tested, alerting them when code changes required test updates, and giving feedback on code coverage and overall quality of the tests. We therefore augmented Agitator by providing full support for analyzing and running JUnit tests and integrating JUnit results and coverage information with those from Agitator.
Over the next few weeks, our collection of tests kept growing and we had mounting evidence that we had the foundation for a useful and usable tool. For me, the most pleasant surprise was that developers did not complain as much as I had expected about having to deliver tests along with their code. All in all, things were looking pretty darn good. Because it was new, exciting, and somewhat fun to use, Agitator managed to hold the developers' attention and commitment to unit testing longer than I had experienced with approaches where the development of unit tests was a purely manual effort. The tests were also more thorough in terms of coverage, especially in the areas where manual tests fall short (for example, testing exceptions and corner cases).
Little by little, however, the novelty wore off and I noticed that some developers were drifting back to their old ways. In some respects unit testing is like physical exercise; it's one of those activities that we know is good for us, we feel good when we do it, everyone agrees that it's highly beneficial and that if we don't do it we'll pay a heavy price sooner or later, but somehow it's difficult to stick with it. Could it be that, as with diets and exercise, it's much easier to stick with a program if you have a clear set of objectives, regular feedback on progress, as well as coaching and, maybe, some peer pressure? Of course it is. As a matter of fact, we had always planned to write a developer-testing dashboard to collect and report on the overall coverage and status of our unit-testing effort. However, I now realized that such a dashboard would be much more than a pretty reporting mechanism. The dashboard would be a key component for ensuring the long-term success and effectiveness of any developer-testing effort.
The developer-testing information I wanted to collect and present in the Management Dashboard was pretty straightforward: code coverage information, test assertions coverage, percentage of classes and methods with tests, etc. But I also wanted something extra. I wanted a metric that I could use to rally the team, something exciting, something simple to understand and report, preferably with big numbers. I knew that it would not be much fun to rally around the cry of: "80% code coverage or bust!" That's when I thought of test points. The Management Dashboard would count one test point for every Agitator or JUnit assertion in the body of tests. I announced the first big, bold target for our unit testing effort: 10,000 test points.
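To make the metric concrete, here is a minimal sketch of how test points could be tallied for JUnit-style tests. It assumes that a simple text scan for assertion calls is good enough; the class name TestPointCounter and its methods are hypothetical, and this is not how the Management Dashboard itself counts.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: counts one test point per JUnit-style assertion call. */
final class TestPointCounter {
    // Matches calls such as assertEquals(...), assertTrue(...), or Assert.assertNull(...).
    // Assumption: a plain text scan is acceptable, so assertions that appear
    // inside comments or string literals are counted as well.
    private static final Pattern ASSERT_CALL =
            Pattern.compile("\\bassert[A-Z]\\w*\\s*\\(");

    /** Returns the number of assertion calls found in one test source file. */
    static int countInSource(String testSource) {
        Matcher matcher = ASSERT_CALL.matcher(testSource);
        int points = 0;
        while (matcher.find()) {
            points++;
        }
        return points;
    }

    /** Returns the total number of test points across a whole suite. */
    static int countInSuite(List<String> testSources) {
        int total = 0;
        for (String source : testSources) {
            total += countInSource(source);
        }
        return total;
    }
}
```

Comparing the result of countInSuite against a target such as 10,000 gives the kind of single, big number that is easy to chart on a whiteboard.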
When the first prototype of the developer testing dashboard was completed and I looked at the initial set of results, I was absolutely thrilled. I felt like Antony van Leeuwenhoek (the Dutch inventor and father of microscopy; the first man to see bacteria, yeast, blood cells, and many other micro-organisms). The combination of Agitator and the new Management Dashboard allowed me to see and analyze unit-testing information in a way, and with a level of detail, that I had never seen before.
I had used test coverage analyzers before but this dashboard, customized for unit testing, gave me so much more. Code coverage is only part of the equation. I wanted to know if the code that was covered was also tested (that is, whether any assertions were associated with the coverage). I wanted to know if all expected exceptions were triggered and tested. I wanted to be able to correlate code complexity and risk to both code and assertion coverage. I wanted to be able to set specific unit-testing targets and see how we were doing against them. And I wanted to know how well individual developers were meeting their own testing objectives. The Management Dashboard gave me all that. If you believe that every developer in your organization should contribute to the unit-testing effort, that every unit of code in your system should have a test associated with it, and that the test should provide adequate code and assertion coverage, you need such a dashboard to help you achieve those objectives.
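The difference between code that is covered and code that is tested can be illustrated with a small sketch. It assumes a simplified per-method report with a coverage fraction and an assertion count; the MethodReport and CoverageAudit classes are hypothetical and do not reflect the dashboard's actual data model.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-method summary of a test run. */
final class MethodReport {
    final String name;
    final double lineCoverage; // fraction of lines executed by tests, 0.0 to 1.0
    final int assertions;      // number of assertions that check this method

    MethodReport(String name, double lineCoverage, int assertions) {
        this.name = name;
        this.lineCoverage = lineCoverage;
        this.assertions = assertions;
    }
}

/** Hypothetical audit that separates "covered" from "tested". */
final class CoverageAudit {
    /** Returns the names of methods that tests execute but never check. */
    static List<String> coveredButUntested(List<MethodReport> reports) {
        List<String> flagged = new ArrayList<>();
        for (MethodReport report : reports) {
            if (report.lineCoverage > 0.0 && report.assertions == 0) {
                flagged.add(report.name);
            }
        }
        return flagged;
    }
}
```

A method with 90% line coverage and zero assertions would be flagged here, even though a plain coverage analyzer would report it as well covered.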
This section shows some sample reports from the Management Dashboard with actual data for our core Agitator classes. For these classes we set a very ambitious target: to have test assertions for every class and every method.
The Management Dashboard gives an overview that shows both the state of tests for the entire project and the complexity of the classes and methods in the project. This report shows the test targets we set and our progress against each target. For this project we agreed to work toward having test points on every class and every method.
As you can see from this mid-project snapshot, the Test Targets table reveals that we've made significant progress, but there's still some work for us to do:
The overview report also shows the distribution of code coverage for the classes in the project. As before, the Management Dashboard lets me know that some of the classes need additional attention:
And the following table summarizes the results of the last test run so we can see how we did:
You can click on one of the numbered links to see which assertions failed:
Since I believe that overly complex code leads to poor testability and a higher incidence of bugs, the Management Dashboard also shows you the complexity distribution for both classes and methods. The following graph shows method complexity. The ten methods with red bars might be candidates for refactoring:
In addition to the overview report, the Project Dashboard provides details about the test status of each package:
And another report shows detailed information about each class:
You can also see a list of developers on the project and the overall status of the classes assigned to each developer:
You can click the name of a developer for details about the classes for that developer:
And if you really like detail, you can drill down to the method level and get precise information on the code coverage, assertion coverage, etc., for each method in a class.
Hopefully, you can now understand why I was so excited when I first saw our dashboard results. For the first time in my software engineering management career, my team and I have a way to really know the thoroughness of our unit tests. If your team believes in developer testing, and is investing time to develop and maintain unit tests, you need a way to manage that effort. To me that means being able to set specific, objective targets for both individuals and for the entire team, and then to be able to monitor and review progress against those targets with enough information to take action.
Armed with Agitator and the dashboard, our team of a dozen developers charged toward the ambitious goal of 10,000 test points in only a couple of weeks. Everybody started writing tests left and right and our test point counts started going up dramatically. I charted the daily progress in test points on a big whiteboard in our main development area. The chart looked like the NASDAQ index during the Internet bubble.
The good news is that we achieved our objective of 10,000 test points and we celebrated with an offsite team lunch at a nice restaurant. The bad news is that we got a bit of indigestion (from the tests, not the lunch). In the rush to score test points, we had sacrificed test quality for test quantity. Some tests, for example, had unnecessary dependencies on a specific OS or directory structure and failed when executed on another system. And some of the tests, which were designed to catch unit-level bugs in the application, had unit-level bugs of their own. In the end, it took each developer several tedious hours to clean up their tests. The lesson was clear: like any other programming task, the creation of high-quality (robust, portable, etc.) tests requires time and thought. Agitator can greatly accelerate the development and thoroughness of unit tests by automating most of the activities that don't require human understanding, intelligence, and creativity, but you still need to invest time and thought to direct the automation and to make sure the results are correct, robust, and maintainable.
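As an illustration of the portability problem, the sketch below contrasts a hard-coded, machine-specific path with a fixture that lets the JVM pick a temporary directory. The class name PortableFileFixture and the example path are hypothetical; they are not taken from our test suite.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of a test fixture with no OS or directory assumptions. */
final class PortableFileFixture {
    // Brittle alternative (hypothetical path): it only works on a machine
    // that has this exact drive letter and directory layout.
    // static final String CONFIG = "C:\\dev\\project\\test\\config.txt";

    /** Writes content to a temporary file, reads it back, and cleans up. */
    static String roundTrip(String content) {
        try {
            // The JVM chooses a location that is valid on the current OS.
            Path dir = Files.createTempDirectory("fixture");
            Path file = dir.resolve("config.txt");
            try {
                Files.write(file, content.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                // Remove the file first so the directory is empty when deleted.
                Files.deleteIfExists(file);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test built on a fixture like this should behave the same on a developer's laptop and on the nightly build machine, because nothing in it names a drive, a user directory, or a path separator.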
We also had to worry about another type of indigestion: when you have very thorough code coverage, matched with tens of thousands of test assertions, and the code under test is changing daily, you are guaranteed to find a lot of issues when you run the tests. The test suite will detect proper bugs, but it will also work as a very sensitive change detector — it will show you when you changed a method or class name, or added a new exception, and forgot to update the tests. This is the price you have to pay for having great test assets; they won't let you get away with much.
I believe this is a small price to pay for the comfort of knowing that your code is thoroughly exercised. You can make the price even smaller by running the tests as frequently as possible. At a minimum, I recommend nightly test runs; running the tests several times a day is much better. The situation you want to avoid is running the tests for the first time after several days of development and finding out that you have to address dozens or hundreds of bugs and/or changes all at once.
After the initial excitement of using Agitator to create unit tests more quickly than we ever thought possible, and using the Management Dashboard to set and monitor developer-testing targets with the level of detail I had always wanted, things settled down. Developers had learned how to tame the power of Agitator by being more careful and deliberate in the creation of tests. I had learned to be more careful and deliberate in using the Management Dashboard to set objectives that would not cause us another case of testing indigestion.
Agitator and the Management Dashboard graduated from being cool new toys to essential components in our development organization's toolkit. After having used our new technology for a few months, I can't imagine living without it. Thanks to Agitator we now have over 20,000 test points in our Agitator suite and are adding more each day. Thanks to the Management Dashboard, we know exactly which portions of the code have solid unit tests and which portions need more work.
Every night we do a full build and run the entire set of tests; knowing that each build goes through more than 20,000 test points and several hours of testing gives me an unprecedented level of confidence in our code. We also practice continuous integration, and use Agitator on several intra-day builds to test any classes we have added or modified, which has helped us catch and fix many regressions in a matter of minutes. Without this unit-level regression testing, I am convinced that dozens of unit-level bugs would have gone undiscovered until system testing (or worse, customer testing), where tracking them down and fixing them would have cost us orders of magnitude more in terms of effort, schedule, and headaches.
The transition from initial excitement to routine usage is the sweet dessert I was hoping for in this dog-food-eating experience. Unit testing has become a fact of life in our organization. Every developer understands that it's his or her responsibility to deliver thorough unit tests along with their code, and they can do it without taking excessive amounts of time from other development activities. Based on my own previous experiences, and from having talked to dozens of organizations that are trying to achieve the same results, I have no doubt that we could not have accomplished this without the help of Agitator and the Management Dashboard.
Before we leave the table, I would like to share one last realization. The focus and measure of success of a developer-testing program should not revolve around how many bugs are found by the tests. You cannot control how many bugs are in the code. Furthermore, as developers gain practice with unit testing and Agitation, they will introduce fewer bugs to start with. If a developer knows ahead of time that Agitator will automatically test a particular piece of code by invoking it with, let's say, a very long string, they will make sure (consciously or subconsciously) that their code will handle the anticipated test. Over time, they will become sensitive to, and stop introducing, entire categories of bugs.
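Here is a minimal sketch of what writing code with the anticipated test in mind might look like. The Labels class and its methods are hypothetical, and the sketch does not show how Agitator actually chooses its inputs; it only shows a method that stays well behaved for null, empty, and very long strings.

```java
/** Hypothetical utility written with extreme inputs in mind. */
final class Labels {
    /** Returns s shortened to at most max characters; null is treated as empty. */
    static String abbreviate(String s, int max) {
        if (max < 0) {
            throw new IllegalArgumentException("max must be non-negative");
        }
        if (s == null) {
            return "";
        }
        return s.length() <= max ? s : s.substring(0, max);
    }

    /** Builds a string of n copies of c, handy for very-long-string inputs. */
    static String repeat(char c, int n) {
        StringBuilder builder = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            builder.append(c);
        }
        return builder.toString();
    }
}
```

Calling abbreviate(repeat('x', 100000), 10) returns a ten-character string instead of failing, which is the kind of outcome a developer starts designing for once they expect such inputs to be tried.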
Another great benefit of unit testing is that if developers know that they will have to create tests for the code they are writing, they will write more testable code. And, generally speaking, code testability goes hand in hand with maintainability and overall quality.
So, instead of focusing on bugs and using bug-related metrics as a measure of success, you should focus on building solid unit-test assets, making sure that each developer delivers high-quality tests along with their code, and that those tests are run as frequently as possible.
Did Agitator allow us to have our cake and eat it too? Our hypothesis was that Agitator would allow us to create and reap the benefits of a thorough set of unit tests without taking excessive amounts of time out of development. Did our eat-your-own-dog-food experience confirm the hypothesis? For me the answer is a clear and unambiguous yes — but only after we added the Management Dashboard to the equation. Before the Management Dashboard, our use of Agitator was uneven and unfocused, and we had no overarching team objective. We had no doubt that Agitator was helping us to prevent a lot of bugs and catch a significant number of regressions, but it was clear that we were not getting the most out of it.
By making the results of unit testing highly visible at the project level, and enabling us to set specific goals, the Management Dashboard ensured even and steady growth in our unit tests. In addition, project-wide objectives (such as reaching 10,000 test points) created a sense of mission and team spirit and provided some very healthy and friendly peer-pressure.
The need for a Management Dashboard was probably the major lesson we learned, but there were many other useful and related ones:
I believe that we are at a major crossroads in software engineering. Thanks in part to the growing popularity of eXtreme Programming (where unit testing is a key component) and the popularity of unit testing frameworks like JUnit, developer testing is finally gaining broad acceptance (or perhaps it would be more appropriate to say that it's encountering less resistance).
This is great news, and we should jump at this opportunity, because I am convinced that the best way to improve both software quality and the economics of software development in general is to give developers the responsibility for delivering high-quality, reusable unit tests along with their code.
My mission, and the mission of my company, Agitar Software, is to leverage the current interest, enthusiasm, and activity around developer testing, and to play a key role in promoting, supporting, and advancing developer testing ideas and technology until it becomes a fully established, accepted, and broadly adopted practice. My dream, and my expectation, is that a few years from now it will be unthinkable to have a commercial software project that does not require unit testing, and does not leverage some form of unit-test automation as well as some form of developer-testing dashboard to manage the effort.
Posted by Alberto Savoia at March 15, 2004 02:47 PM