Dealing with specs and bugs
When you complete features before you start new ones, you significantly shorten the time between writing specs and implementing them and between creating bugs and fixing them. In practice, that shorter gap simplifies how Kanban teams handle specs and bugs.
Many traditional Waterfall teams write detailed design specification documents (“specs”) for every feature, all of which are reviewed and approved before implementation starts. Detailed specs are important because in traditional Waterfall a feature may not be implemented until months after specification, and not be validated until months after implementation. If you don’t clearly document the feature in detail, developers won’t remember what to implement and testers won’t remember what to validate.
Because Kanban uses small batches, the time between specification, implementation, and validation is measured in days instead of months. At first, traditional Waterfall teams might choose to continue writing detailed specs for every feature. However, teams may find that level of formality to be unnecessary because the design is still fresh in everyone’s minds.
It’s still important to document, review, and approve designs. However, it’s often not necessary to transcribe a design and a list of considerations drawn on a whiteboard into a formal document, with detailed explanations of every point so that people remember them months later. Instead, teams often use electronic notebooks, wikis, or other quick authoring tools to capture photos of whiteboards and key points of design discussions. (My teams love OneNote for this purpose.) Those informal documents capture enough of the design for everyone to remember the main attributes, issues, and agreements. Then, as the feature is implemented, details of the design emerge and can be discussed in real time instead of being speculated upon months in advance.
As I point out in the “Troubleshooting” section of the Kanban quick-start guide (Chapter 2), some feature areas or individual features are still so complex that a detailed design document is necessary. Choosing to write a detailed spec should not be a matter of dogma or habit. It should be a decision based on the needs of the team (and the customer, should the customer require it). If a feature or feature area is unclear, or the tradeoffs and architecture are in question, you should write a detailed spec and flesh out the design. Otherwise, a quick, informal electronic notebook or wiki should be sufficient. (My teams use Word for detailed specs and OneNote for other documentation.)
In traditional Waterfall, features might not be validated until months after implementation. On a large project with hundreds of engineers, validation may find thousands of bugs. Often, validation takes as long as or longer than implementation.
Over the years, traditional Waterfall teams have devised a variety of ways to handle large bug counts:
- With limited time to fix so many bugs, each bug must be prioritized and duplicate bug reports removed. At Microsoft, we call this process “bug triage.” Team leaders representing each job role meet for roughly an hour each day and review every new or updated active bug. They discuss the impact of the bug (severity, frequency, and percentage of customers affected) and set an appropriate priority (fix now, fix before release, fix if time, or fix in subsequent release). They can also decide that the bug is a duplicate of a prior reported issue or that the bug isn’t worth fixing (usually because a trivial workaround is available, most customers won’t encounter the bug, or the fix would cause more havoc than the bug).
- Some teams have “bug jail,” in which individual engineers must stop further development and only resolve bugs until their individual bug counts drop below a reasonable level (such as below five bugs each).
- Some teams have something like “workaholic Wednesdays”—one day a week when the team doesn’t go home until all bugs are resolved, or at least brought below a reasonable level (such as below five bugs per engineer).
- Every traditional Waterfall team I’ve encountered has substantial stabilization periods at the end of each milestone or at the end of each release. During stabilization, the team (and often the entire organization) focuses on nothing but fixing bugs, doing various forms of system validation, and logging any new or recurring bugs they find. Stabilization for a large project can sometimes last longer than all the specification, implementation, and prestabilization validation times put together.
- Some progressive development teams might employ extensive code reviews, inspections, unit testing, static analysis, pair programming, and even test-driven development (TDD) during implementation to reduce the number of bugs found during validation. These methods make a big difference, but you are usually still left with a substantial number of bugs at the end, partially because system validation happens well after implementation, and partially because adherence to these practices varies widely among developers.
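As a rough illustration of the triage and bug jail practices in the list above, here is a minimal Python sketch. The severity scale, the impact formula, and every threshold are assumptions made up for the example; the text names only the inputs (severity, frequency, percentage of customers affected), the possible outcomes, and the "below five bugs" level for bug jail.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Priority(Enum):
    FIX_NOW = "fix now"
    FIX_BEFORE_RELEASE = "fix before release"
    FIX_IF_TIME = "fix if time"
    FIX_IN_SUBSEQUENT_RELEASE = "fix in subsequent release"
    DUPLICATE = "duplicate"
    WONT_FIX = "won't fix"


@dataclass
class Bug:
    title: str
    severity: int              # assumed scale: 1 (cosmetic) to 4 (crash or data loss)
    frequency: float           # assumed: fraction of sessions that hit the bug, 0.0 to 1.0
    customers_affected: float  # fraction of customers affected, 0.0 to 1.0
    duplicate_of: Optional[str] = None
    trivial_workaround: bool = False
    risky_fix: bool = False    # the fix would cause more havoc than the bug


def triage(bug: Bug) -> Priority:
    """Assign a triage outcome; all thresholds here are hypothetical."""
    if bug.duplicate_of is not None:
        return Priority.DUPLICATE
    # Reasons the text gives for not fixing a bug at all.
    if bug.trivial_workaround or bug.risky_fix or bug.customers_affected < 0.01:
        return Priority.WONT_FIX
    # Hypothetical impact score combining the three inputs the text names.
    impact = bug.severity * bug.frequency * bug.customers_affected
    if impact >= 1.0:
        return Priority.FIX_NOW
    if impact >= 0.25:
        return Priority.FIX_BEFORE_RELEASE
    if impact >= 0.05:
        return Priority.FIX_IF_TIME
    return Priority.FIX_IN_SUBSEQUENT_RELEASE


def in_bug_jail(active_bug_count: int, limit: int = 5) -> bool:
    """An engineer stays in bug jail until their count drops below the limit."""
    return active_bug_count >= limit
```

In a real triage meeting these calls are made by people weighing context, so a scoring function like this would at most suggest a starting priority for discussion.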
In Kanban, a bug’s life is a bit different:
- Kanban’s small batches ensure that validation happens only days after implementation, so bug backlogs are small and fixes are readily apparent.
- Kanban ensures that every implemented task has been code reviewed, inspected, unit tested, statically analyzed, pair programmed, or designed and verified using TDD, according to whichever implementation done rule the team has imposed on itself. Even reckless developers are kept in line by their own teams. (No one likes cleaning up after a lazy slob.) This further reduces bug counts.
- Likewise, the validation done rule ensures that every task has gone through integration testing and all its issues are resolved. Thus, by the time a task is done with validation, it’s ready for production use. I talk about taking advantage of this fact in Chapter 6, “Deploying components, apps, and services.”
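One way to picture the done rules in the list above is as a checklist that a task must satisfy before it leaves a step. The Python sketch below is only an illustration: the specific checks in each rule and the names `Task`, `missing_checks`, and `is_done` are hypothetical, because each team defines its own done rules.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set

# Example done rules; a real team would choose its own checks.
IMPLEMENTATION_DONE_RULE: FrozenSet[str] = frozenset(
    {"code reviewed", "unit tested", "static analysis clean"}
)
VALIDATION_DONE_RULE: FrozenSet[str] = frozenset(
    {"integration tested", "all issues resolved"}
)


@dataclass
class Task:
    name: str
    completed_checks: Set[str] = field(default_factory=set)


def missing_checks(task: Task, done_rule: FrozenSet[str]) -> List[str]:
    """Return the checks in the done rule that the task has not yet passed."""
    return sorted(done_rule - task.completed_checks)


def is_done(task: Task, done_rule: FrozenSet[str]) -> bool:
    """A task is done with a step only when every check in the rule is complete."""
    return not missing_checks(task, done_rule)


task = Task("export to CSV", {"code reviewed", "unit tested"})
print(missing_checks(task, IMPLEMENTATION_DONE_RULE))  # ['static analysis clean']

task.completed_checks.add("static analysis clean")
print(is_done(task, IMPLEMENTATION_DONE_RULE))  # True
```

The point of the sketch is that the rule applies to every task without exception, which is what keeps adherence from varying among developers.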
Even though every work item completing validation each day has gone through integration testing and all its issues are resolved, bugs can still be discovered in production use, stress testing, security testing, usability and beta testing, and a variety of other product-wide system testing. However, the stabilization period required to fix those issues is far shorter, and in my experience, the number of stabilization bugs opened per feature team drops from hundreds to between 10 and 20 (a couple of weeks of effort to resolve).
After adapting to Kanban, traditional Waterfall teams might choose to continue bug triage, bug jail, workaholic Wednesdays, and long stabilization periods. However, teams may soon find some or all of those practices unnecessary. In a large organization, a cross-team, product-wide triage and stabilization period might still be needed (see Chapter 7), but individual teams using Kanban won’t have enough bugs to make team triage, bug jail, or workaholic Wednesdays useful.