The entire agile testing philosophy is based on the goal of having ship-ready code at the end of each iteration, where ‘ship-ready code’ means: 100% feature tested, 100% system/performance/stress tested, and zero open bugs.
This sounds like a very tall order, and it requires a fundamentally different approach from the traditional practice of handing off half-baked code from development to the test organization. There is simply not enough time for that approach to work if we accept the above-stated objective.
Feature and system testing must happen concurrently with development, and for this to work, the development team must be making available nothing but clean code to the testers.
This chart underscores the fundamental difference between agile and waterfall in terms of the test and bug-fix approach. In agile there are no hand-offs, no phases and no large backlogs of bugs to fix as a release nears the end of its development life-cycle. Instead, each iteration strives to deliver an increment of bug-free, production-ready code as its output.
Let’s break this down to see what is required for this to work in practice. Agile testing starts as soon as the first User Story is declared done (not at the end of the sprint!). But for this approach to have any chance of success, re-work must be minimized. By re-work we mean the traditional test and bug-fixing cycle, characteristic of waterfall development, that starts with the hand-off from development to the test organization. Thus, the definition of done must include:
- Code compiles cleanly, with all static analysis warnings removed
- Code reviewed, with all review issues resolved
- Story has been unit tested, and ideally unit tests are automated
- Test coverage based on unit testing meets some minimum threshold
- Code and automated unit tests checked into build system, and system builds and passes all unit tests
- Build passes all predefined build tests
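The definition of done above is mechanical enough that a build system can enforce it automatically. Here is a minimal, tool-agnostic sketch of such a gate; the field names and the 80% coverage threshold are illustrative assumptions, not taken from any particular tool.

```python
# Hypothetical 'definition of done' gate. The keys of the `build` dict and
# the coverage threshold are assumptions for illustration only.

COVERAGE_THRESHOLD = 0.80  # assumed minimum unit-test coverage

def story_is_done(build):
    """Return (done, reasons) for a dict of build/check results."""
    checks = [
        (build["static_analysis_warnings"] == 0, "static analysis warnings remain"),
        (build["review_issues_open"] == 0, "unresolved review issues"),
        (build["unit_tests_failed"] == 0, "failing unit tests"),
        (build["unit_test_coverage"] >= COVERAGE_THRESHOLD, "coverage below threshold"),
        (build["build_tests_passed"], "build sanity tests failed"),
    ]
    reasons = [msg for ok, msg in checks if not ok]
    return (not reasons, reasons)
```

The point of encoding the checklist this way is that "done" stops being a matter of opinion: a story either passes the gate on every build or it does not.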
Next, the test team verifies the user story based on its defined acceptance criteria. The majority of stories should be passing at this point. The manufacturing analogy is the production ‘yield’, and we should be striving for the highest possible yield, say > 90%. If the yield is low (and the corresponding re-work high), then we need to dig into the reasons for this, identify root causes, and apply corrective actions to drive the yield higher. Clearly, this will not happen overnight, and may require multiple iterations, if not releases, to get there. There are a couple of additional prerequisites that go along with getting to a high first-pass success rate:
- A continuous integration environment with a high degree of automation of both the unit test and build sanity level
- A high degree of system test automation
- A continuous improvement mindset where the team routinely dissects test failures and institutes actions to push the bar higher for first-pass test success.
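The 'yield' metric described above is simple to compute and worth tracking per iteration. A minimal sketch, assuming each story records whether it passed acceptance testing without re-work:

```python
# First-pass yield: the fraction of stories that pass acceptance testing
# with no re-work. The 'passed_first_time' field is an assumed convention.

def first_pass_yield(stories):
    """stories: list of dicts with a boolean 'passed_first_time' field."""
    if not stories:
        return 0.0
    passed = sum(1 for s in stories if s["passed_first_time"])
    return passed / len(stories)
```

Plotting this number iteration over iteration makes it easy to see whether the corrective actions from the team's root-cause analysis are actually driving the yield toward the >90% target.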
One of the fundamental goals of agile development is to have deployable code at the end of every iteration. Working backwards from that challenge implies that a number of technical practices need to be in place. These technical practices need to support the organization’s definition of ‘done’ at both the story and sprint level. For example:
User Story Done Criteria:
- Story designed/coded/unit tested
- Unit tests automated (Why? See below)
- Tested code checked in and built without errors:
- Static analysis tests run and passed
- Automated unit tests run and passed
- (Unit) test coverage measured and meets acceptable threshold
- Independent validation of user story by QA team
- User story acceptance criteria met
- Zero open bugs
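To make the 'unit tests automated' criterion concrete, here is an illustrative example: a tiny unit under test together with the kind of automated test the build system would run on every check-in. Both the function and the test data are invented for illustration; a Java team would write the equivalent in JUnit.

```python
import unittest

# Illustrative unit under test (hypothetical business rule).
def apply_discount(price, percent):
    """Return price reduced by a percentage, rounded to cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (100 - percent) / 100, 2)

# Automated unit tests that the CI build runs on every check-in.
class ApplyDiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_discount(self):
        self.assertEqual(apply_discount(99.99, 0), 99.99)

    def test_invalid_percent_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(10.0, 150)
```

Tests like these run in milliseconds, which is what makes it feasible to execute the full suite on every single build.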
Sprint Done Criteria:
- All user stories done
- All system tests executed and passed
- All performance/stress tests executed and passed
- All regression tests executed and passed
- Zero open bugs
How on earth are we expected to accomplish all of this in an iteration lasting a maximum of 2-4 weeks? To make all of this happen, a number of practices must be in place:
- There is no ‘hand-off’ from the developers to the testers. Story acceptance testing runs concurrently with development. The QA team can begin testing as soon as the first user story has been delivered cleanly through the build system.
- Re-work must be absolutely minimized. There is simply no time for the classical back-and-forth between QA and development. The vast majority of user stories must work the first time. This can only be accomplished by rigorous unit testing.
- System-level regression and performance testing must be running continuously throughout the iteration
- Test cases for new user stories must be automated. This requires resources and planning.
- All changed code must be checked in, built and tested as frequently as possible. The goal is to re-build the system upon every change.
- Fixing of broken builds must be given the highest priority.
When all of the above is in place we have something referred to as ‘Continuous Integration’. A typical continuous integration configuration is summarized in the following diagram.
In this system we have set up a CI server such as Hudson, an open-source CI tool. Hudson integrates with CI-related tools from multiple vendors, such as:
- SCM Systems: Perforce, Git
- Build Tools: Maven, Ant
- Unit Testing Frameworks: JUnit, xUnit, Selenium
- Code Coverage Tools: Clover, Cobertura
Hudson orchestrates all of the individual sub-systems of the CI system, and can run any additional tools that have been integrated. Here is a step-by-step summary of how the system works:
- Developers check code changes into the SCM system
- Hudson constantly polls the SCM system, and initiates a build when new check-ins are detected. Automated unit tests, static analysis tests and build sanity tests are run on the new build
- Successful builds are copied to an internal release server, from where they can be loaded into the QA test environment. The QA automated regression and performance tests are run
- Test results are reported back to the team
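The steps above can be sketched as a single poll-build-test-publish cycle. This is a simplified illustration of what a tool like Hudson does internally; the function parameters (`poll_scm`, `run_build`, and so on) are placeholders for whatever SCM, build and test integrations the CI tool provides.

```python
# Tool-agnostic sketch of one CI cycle. All callables are placeholders
# standing in for real SCM/build/test integrations.

def ci_cycle(last_seen_revision, poll_scm, run_build, run_unit_tests,
             run_static_analysis, publish, notify):
    """Run one CI cycle; return the newest revision processed."""
    revision = poll_scm()
    if revision == last_seen_revision:
        return last_seen_revision          # nothing new checked in

    artifact = run_build(revision)
    results = {
        "unit_tests": run_unit_tests(artifact),
        "static_analysis": run_static_analysis(artifact),
    }
    if all(results.values()):
        publish(artifact)                  # copy to internal release server
    notify(revision, results)              # report results back to the team
    return revision
```

A real CI server runs this loop continuously (or on commit hooks), so the team learns within minutes, not weeks, when a change breaks the build.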
Knowing that every change made to an evolving code-base resulted in a correctly built and defect-free image is invaluable to a development team. Inevitably, defects do get created from time to time. However, identifying and correcting these early means that the team will not be confronted with the risk of a large defect backlog near the end of a release cycle, and can be confident in delivering a high quality release on-time.
Setting up a continuous integration system is not a huge investment: a rudimentary system can be set up fairly quickly and then enhanced over time. The payback from early detection and elimination of integration problems and software defects dramatically outweighs the costs. Having the confidence that they are building on a solid foundation frees development teams to devote their energies to adding new features, as opposed to debugging and correcting mistakes in new code.
So far we have discussed a framework for a continuous integration system that includes executing a suite of automated unit tests on every build and using the results of that testing to determine whether the build is of sufficient quality to proceed with further development activities. Ultimately though, we need to have a test environment that assures us that at the end of every iteration we have a production quality product increment.
If we go back to our simple VOD system example from the last chapter we may realize at this point that we could be facing some significant challenges. The goal is to deliver an increment of defect-free functionality with each iteration. To accomplish this requires:
- Mature software engineering practices that produce clean, reliable code at the unit and individual user story level.
- An automated unit test framework that grows with every code check-in and can be run as part of a build validation process.
- Build metrics such as test coverage that help ensure that the growing code base is comprehensively tested.
- The ability to create and execute a set of new feature test cases at system level within the boundaries of a single iteration.
- A suite of automated regression test cases that can be run continuously or at least on the final build of an iteration to ensure that work in creating new functionality has not broken existing system behavior, performance or stability.
For systems of even moderate complexity, and those being engineered by multiple teams, this may be neither practical nor economically feasible. In this case the question becomes: what are the priorities, and what trade-offs must be made? In the VOD system example we might have an independent System Integration and Test (SIT) team that takes the output of each development iteration and subjects it to a combination of feature and regression testing. In this structure the output of a development iteration gives us an increment which is:
- 100% unit tested, unit tests automated
- 100% user story tested at the subsystem level
The increment is then picked up by the SIT team, who subject it to:
- 100% new feature validation, end-to-end on a fully integrated system
- X% regression test coverage, where X represents the highest priority, highest risk aspects of the system.
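Choosing which X% of the regression suite to run is itself a prioritization exercise. One simple approach, sketched below under the assumption that each test carries a priority rating and a historical failure rate, is to rank tests by a risk score and run the top slice:

```python
# Hypothetical risk-based selection of a partial regression suite.
# Each test is (name, priority, historical_failure_rate); the scoring
# formula (priority * failure_rate) is an illustrative assumption.

def select_regression_subset(tests, fraction):
    """Return the names of the top-risk fraction of the suite."""
    ranked = sorted(tests, key=lambda t: t[1] * t[2], reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    return [name for name, _, _ in ranked[:cutoff]]
```

Real teams would refine the score with factors like code churn in the areas a test covers, but the principle is the same: when only X% of the suite can run, make sure it is the X% that retires the most risk.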
The result: a ‘high quality’ increment as the output of each SIT cycle. If the teams are using, say, a 3-week iteration cycle, this gets us a high-quality increment every 3 weeks. If the team cannot produce something they can call ‘high quality’ with every iteration, then they should consider adjusting the iteration length accordingly.

If intermediate increments are being delivered to customer labs for trials of new functionality, or if a decision is made to ship an increment before the end of the planned release, this can still be done at relatively low risk: the SIT cycle for that increment is simply extended to provide 100% regression test coverage.

Over time, the SIT team should be working aggressively (backed, of course, by appropriate resourcing and funding from their management) to maximize the automation of their regression test suites. Some categories of testing by their nature require lengthy test cycles, for example stability testing or testing of high-availability system configurations. Other types of tests, such as those requiring direct human involvement like video quality validation, are not easily or inexpensively automated. Nonetheless, delivering a high-quality product increment is a goal that should be within reach of most organizations.