August 24, 2006 - Build Failures Policy
I just wrote a page on our internal wiki with our policy for dealing with build failures. We thought others might find it interesting so I am sharing it here (the links will be broken for obvious reasons).
Executive Summary
If the build fails, fix it.
directly from the wiki ...
Red lights attract bugs
Developers have a natural aversion to breaking the build and most will do everything they can to avoid it. The red lava lamp is our early warning signal that the build is broken.
Unfortunately, if the red light is already on, we tend to ignore all the symptoms of a broken build because, after all, it's already broken. If the red light is on for more than an hour, build-is-broken becomes the natural state of affairs.
To address this problem, we are introducing a new policy. It's the same as the old one.
What to do if the red light comes on
Step 1 - See if it is your fault
Check your email. If it was your checkin that broke the build, you will almost certainly have an email from xxxxxxx@agitar.com with "Build Failed" in the subject.
If it was not your fault, look at the modifications list in the CruiseControl email to find the culprit. Guide them gently towards Step 1. If it is not obvious who the culprit is - or if the culprit is unavailable - fix it anyway.
Step 2 - Fix it
If it was your fault, fix it quickly. If you don't understand why it broke, ask for help.
If there is a reason that you cannot fix it immediately, make the red light go out by other means (see below).
Either way, send an email to xxx@agitar.com and explain briefly that you are/are not fixing it.
If it was not your fault and the light is still on after several minutes, inquire gently whether anyone is dealing with the broken build and when it will be fixed.
I can't fix it right now
For acceptance level tests (system test, tiger acceptance tests, dashboard acceptance tests), it might be a while before the test can be made to pass - the feature being tested might not even be written yet. Just make a call to bug() in the failing test.
The build will not fail for a failing test that has an open bug. The build generates a report that separates known issues from new failures. Bugs that are scheduled for the current release will be shown in red.
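For example, a not-yet-passing test might look something like this (a minimal sketch, assuming bug() takes the Bugzilla number - the exact signature isn't spelled out here):

    public void testShouldSupportMultibyteInClassNames() throws Exception {
        bug(9020); // open bug - build stays green, failure reported as a known issue
        // ... assertions that will fail until the feature is implemented ...
    }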
Unit tests should be fixed immediately.
What exactly causes the red light to come on?
Unit tests
All the unit tests (plus the smoke tests) run on http://cc-unit-test:8080/cruisecontrol/ and a failure in any of them will light the light.
Product build
The entire product (eclipse client and tiger server) is built on http://cc-builder:8080/cruisecontrol/
Acceptance tests
Once the product is built, the server is installed and the acceptance tests run at http://cc-builder:8080/cruisecontrol/
A failure in any of them will light the light - unless there is an open bug (see above).
Antelope tests
Antelope tests will only break the build if they don't compile. Failing tests should still be fixed quickly, of course. You have three options:
- fix the test
- fix the bug in the product that causes the test to fail
- regenerate the test
Is there anything that doesn't make the red light come on?
System tests
System tests for agitator and dashboard run on http://cc-system-test:8080/cruisecontrol/ and respect the open-bug rule. They don't light the light, though, because the cycle time is too long.
Nightly agitator
The nightly agitator runs on http://cc-agitation:8080/cruisecontrol/
The build will only fail for assertion failures (not outcome failures). It will not light the light.
Agitation for com.agitar.common
Runs on http://dragon:8080/cruisecontrol/ (until it finds a permanent home).
The build will only fail for assertion failures (not outcome failures). It will not light the light.
There are too many emails!
Set up two mail filters:
- send emails from xxxx@agitar.com to another folder
- mark emails as read if they have "Build Successful" in the subject
Posted by Kevin Lawrence at August 24, 2006 04:20 PM
Interesting policy - something I think we should adopt more rigidly too.
We have recently had the unfortunate problem of failing acceptance tests: we have rapidly built a suite of tests for our main product and some are a bit unstable. This can sometimes mask genuine unit test failures in that product or in other products built by cruisecontrol. Or, worse, it can mask a behavioral change in the software that the acceptance tests have highlighted but that is ignored because people just think "the acceptance tests are always failing...".
How does your bug() call work? I am interested in implementing something similar. At the moment we have a suite called "failingTests" that bad acceptance tests are added to and that is not run as part of the CC build. I'd much prefer the simpler mechanism of a call to bug() in a failing test that is still logged and tracked in the CC build reports...
Posted by: Patrick Myles on September 2, 2006 05:44 AM
I have bug() write the bug number and test name to a file, bugs.txt:

    7720 test.harness.rabbit.selftest.ShouldLogBugNumbers.testShouldAssociateBugNumbersWithTest
    9020 test.generation.environment.CharacterEncodings.testShouldSupportMultibyteInClassNames
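A minimal sketch of what such a helper might look like, assuming it derives the test name from the call stack and appends to a bugs.txt in the working directory (the real implementation is internal, so those details are guesses):

    import java.io.FileWriter;
    import java.io.IOException;

    public class TestBugs {
        // Appends "<bug number> <test name>" to bugs.txt so the report
        // generator can match failing tests to open bugs.
        public static void bug(int bugNumber) throws IOException {
            // Assumption: the caller one frame up is the test method itself
            StackTraceElement test = new Throwable().getStackTrace()[1];
            String testName = test.getClassName() + "." + test.getMethodName();
            FileWriter out = new FileWriter("bugs.txt", true); // append mode
            try {
                out.write(bugNumber + " " + testName + "\n");
            } finally {
                out.close();
            }
        }
    }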
Then I have a custom ant task that generates an xml report by merging bugs.txt with the TEST-xxxx.xml files that the ant JUnit runner generates. It annotates the bugs with data from bugzilla. I have an XSLT file that generates an HTML report with separate sections for passing tests, failures with open bugs, and new failures.
Finally I have another ant task that reads the XML file and causes the build to fail iff there is a failure with no open bug.
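Something like this rough sketch, assuming the merged report marks each testcase element with a "bug" attribute when there is an open bug (the element and attribute names here are illustrative, not the actual format):

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.apache.tools.ant.BuildException;
    import org.apache.tools.ant.Task;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class VerifyKnownFailuresTask extends Task {
        private File report;

        public void setReport(File report) { this.report = report; }

        public void execute() throws BuildException {
            try {
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder().parse(report);
                NodeList cases = doc.getElementsByTagName("testcase");
                for (int i = 0; i < cases.getLength(); i++) {
                    Element testcase = (Element) cases.item(i);
                    boolean failed =
                            testcase.getElementsByTagName("failure").getLength() > 0;
                    // Fail the build only for failures with no open bug
                    if (failed && !testcase.hasAttribute("bug")) {
                        throw new BuildException("New failure with no open bug: "
                                + testcase.getAttribute("name"));
                    }
                }
            } catch (BuildException e) {
                throw e;
            } catch (Exception e) {
                throw new BuildException(e);
            }
        }
    }

Declared with a taskdef and pointed at the merged report, a task like that fails the build exactly when there is a failure with no open bug.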
Posted by: Kevin Lawrence on September 3, 2006 09:35 AM
Neat.
Does the annotated XML report that you create and your XSLT file work as part of the cruisecontrol transforms, or is that HTML report independent of cruisecontrol's output?
So I guess you don't use the Ant JUnit runner to fail the build, but rely on your last ant task to do that instead. I hadn't thought of doing that...
I don't suppose you want to share some of your code do you? ;-)
Posted by: Patrick Myles on September 5, 2006 09:17 AM
The xml report generator and verifier are separate ant tasks. The XSLT runs in ant and generates a report that I copy into CC's archive directory.
The verifier also writes a summary to ant's logger with a link to the report.
I have an index card on my desk that says "integrate the xslt transform into CC's email" and another that says "write an ant task that does all the above in a single task".
I'd be delighted to share the code when I get a chance to disentangle it from the rest of my build code. Any day now ;-)
Posted by: Kevin Lawrence on September 5, 2006 12:02 PM