ABSTRACT
Peer reviewing of software engineering work-products has received much
attention over the past two decades. While the importance of reviews is
amply demonstrated, we lack a consensus on one best process. This
stems from the observation that what is optimal varies across organizations,
as a function of their developers, managers, and process background. Key
processes, therefore, should not be viewed as having ideal, static solutions;
they must be flexible and mature over time. In this paper, a progression
of review processes is delineated, each appropriate for organizations of
different maturities. Variations influenced by management styles, and implications
for process design and ongoing technology transfer are also discussed.
Keywords
Process Maturity, Formal Technical Reviews, Process Evolution
INTRODUCTION
In observing peer technical review processes over the past two decades, it has become obvious that no one review process is best for all. There are many variations on the general theme, which may be appropriate in some organizations and not in others [CRA]. This observation is in direct contrast to the SEI's (Software Engineering Institute) CMM (Capability Maturity Model), which basically prescribes a style of review process to be implemented at a particular point in an organization's overall process maturity. Characteristics of the process outlined follow the lines of the IEEE's standard for reviews [IEE]. In our deliberations over that standard in the early 1980's, only the most generic parts of a review process were retained, from the many successful variations with which the standards committee was familiar. The process defined basically elaborates on: "You plan, prepare, review, and follow-up". Little guidance is provided on exactly how to do those activities. The real action of a review process, however, is in those details.
Regardless of an organization's maturity level, however, there is a review process appropriate for it. In an organization with absolutely no formally defined development process, people can sit down together, look over one another's work, and productively find flaws in their acts of software design and construction. In such an organization, that is probably the single most important thing to do to improve the quality of a delivered system.
As an organization matures (by whatever scale chosen), however, so should its technical review process. As an organization documents its processes, defining roles & responsibilities for people and entry & exit criteria for activities, so too should this be done for the review process. As measurement and baselines are established, the review process should be updated to track data. As root cause analysis becomes understood, it should be integrated with reviews. And as the overall development process becomes better understood and more finely tuned, the standard review process has been seen to evolve into several processes, each appropriate to the very different types of workproducts within the development cycle.
Another dimension of variation among review processes is influenced by an organization's management style. Just as the varying cognitive styles of people influence how best they learn and mature, the management style of an organization influences the maturing of its review process. In an autocratic organization, a process, once somehow defined or selected, is imposed and changed by executive fiat. Management with a participative or consensual focus will have processes designed (at least in part) by the technical staff, thus opening the door to nearly unlimited process variations.
In the body of this paper, review processes for differing maturity levels are delineated. The basic outline of the CMM's process plateaus is used here, providing a common framework with the SEI's work. The influence of management style is then highlighted. The closing section notes the impact of this thinking on Software Process design and ongoing process Technology Transfer.
My particular expertise in technical review processes comes from a long consulting relationship across a diversity of development organizations. My intuition tells me that a multi-level model of a review process, as discussed herein, has analogues in other software development practices. I hope others with a breadth of experience across these other practices can profit from this model and apply it within their realms.
REVIEW PROCESSES, BY MATURITY
What follows is a typical, generic story of a review process, as it matures along with the overall set of an organization's development practices. The basic outline of the CMM is preserved, as a common framework. For any one development organization, the details may vary, but the general outline of the levels remains.
The Initial Plateau
In an ad hoc development, peer reviews can prove particularly important.
Any professional realizes that mistakes are likely in developing complex
systems, and the independent perspective of another person is key in discovering
oversights and blind spots. Babbage and von Neumann, long before "software
development" was a phrase, reviewed with their peers the logic they attempted
to implement on the earliest of computers.
An early model of software peer reviews was that familiar from refereed journals: circulate a draft and consolidate returned commentary. To be effective for software, this model relies on the software artefact being reasonably stand-alone and self-explanatory. Another model, the buddy-check or desk-check, has roots in the multi-person office, where one could turn to a peer and ask them to "look this over, could you?" In this case, as questions arose, the author was close at hand to fill in missing context and detail. Too close, some would argue. Often these practices were "recommended" to be followed during software systems development, but the general lack of fine-grained project scheduling and tracking typical of organizations at this level precluded strict tracking of what actually occurred.
Some of the problems noted in the ad hoc practices of this stage were inconsistent practice both within and across teams, and a feeling among some that the time spent reviewing was wasted, since it took away from time which could otherwise be spent coding (which is the only real productive activity at this level, of course). Generally, the code's author runs the review, since the product is their responsibility.
Up to the Repeatable Plateau
At the repeatable level, the techniques with which people experimented
at the "initial" level were improved based on experience, and generally
documented. The processes were written down in their basic form to establish
a common set of expectations across the team. The advocates of reviews
then began to make headway in achieving an initial degree of acceptance
by looking at the success stories and qualitative feedback from the "initial"
process efforts. With this process now a generally "recommended practice",
there may be some degree of planning to help integrate review activities
into the rest of the work to be done.
The review process, at this level, generally has this structure: first, material for review is distributed somewhat in advance of when the comments are needed; then, depending on the process used by the development team, the comments are consolidated by the author or by the group of reviewers as a team; the author then adjourns to independently correct the identified items. Typical processes at this level are the basic group peer reviews (of Weinberg [WEI], for instance) and walkthroughs (see Yourdon [YOU], for instance).
New problems appear. Along with the structure added to the review process, other processes in the development lifecycle are being similarly examined and recommended, and software people are now asked to write down requirements, document designs, and develop test plans, in addition to the "real work" of cutting code. Reviews, along with these other activities, are put into detailed project plans, which developers are held to follow. Under severe resource constraints, reviews begin to buckle: reviewers are given little advance review notice (since everything's running late); they often don't prepare adequately; they are given massive work products to review in a single sitting; and some identified items never get addressed because the author's deadlines are yesterday. "Quality", an attribute far less visible than cost, schedule, and functionality, is left behind as a good intention.
Up to the Defined Plateau
By now, in the overall development process itself, there is enough
data on development effort to show that maintenance is a nightmare sucking
up project resources. Because of their ability to catch problems early
and avoid potentially large downstream costs, reviewers must get serious
about this review process, the thinking goes. This is usually where the
term "formal" is introduced, as in "Formal Peer Reviews". Guidelines
are established, such as lines-of-code/hour or pages/hour to help size
a review to a realistic task. This also helps managers quantify the overall
effort involved in reviews, which turns out to be large, but not as large
as the maintenance numbers.
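As a rough illustration of how such a rate guideline sizes a review, the sketch below converts a work-product's size into meeting hours, sessions, and staff effort. The function names, the 150 lines-per-hour rate, the two-hour session cap, and the team size are all hypothetical values chosen for the example; they are not figures from this paper or from any cited standard.

```python
import math


def plan_review_sessions(size, rate_per_hour, max_session_hours=2.0):
    """Return (total_hours, sessions) needed to cover `size` units.

    `size` and `rate_per_hour` must use the same unit (lines of code or pages).
    `max_session_hours` is an assumed cap on the length of one sitting.
    """
    if size < 0 or rate_per_hour <= 0 or max_session_hours <= 0:
        raise ValueError("size must be >= 0; rate and session cap must be > 0")
    total_hours = size / rate_per_hour
    sessions = math.ceil(total_hours / max_session_hours)
    return total_hours, sessions


def review_effort_hours(total_hours, team_size):
    """Staff-hours spent in meetings: every team member attends every hour."""
    return total_hours * team_size


if __name__ == "__main__":
    # Hypothetical example: 1200 lines of code at an assumed 150 lines/hour.
    hours, sessions = plan_review_sessions(1200, 150)
    effort = review_effort_hours(hours, team_size=4)
    print(f"{hours:.1f} meeting hours in {sessions} sessions, {effort:.1f} staff-hours")
```

A calculation of this kind lets a manager set the meeting effort beside the maintenance numbers; preparation and follow-up time would have to be added separately.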
A common review process is once again improved to address the problems of the past. The process at this level looks much like that defined by the IEEE [IEE] or by Fagan [FAG]. Entry criteria are established (with specifics for different review types, such as design, code, test) defining amount of material to be covered in a review, materials to be distributed to reviewers, etc. Roles and responsibilities (such as reader, scribe, moderator) are assigned to review team members to improve the meeting effectiveness. Since the author has emotion and schedule vested in the workproduct, the running of the review is often given to a more independent person to act as leader. Since historically people have never prepared well, the meeting is run as a group walkthrough, with issues identified and delineated by the group as a reader summarizes the material sequentially. Follow-up activities are established to assure that identified items are resolved. Data are collected to support the cost-benefit of the reviews.
Since reviews have proven effective in avoiding problems, all the work-products of the lifecycle are expected to be reviewed. The cost of reviewing everything becomes enormous. Teams try to follow the prescribed process activities, but its overall effectiveness is lacking. In monitoring code reviews at this level, I have seen meetings where the most serious item discussed was basic syntax or structure a compiler or quick test would have identified, but the process said: "No compiling or testing before the review". A well-intended process has been over-generalized, and now its details stand in need of evaluation.
To Manage & Optimize - Quantitatively
The large scale tradeoff of reduced maintenance efforts used to finance
solid review methodologies has been made at this point. The question now
becomes how to manage and optimize within a process, the review
process in this case. An organization having been through all this now
has a base of experience and data which can drive either further changes
or the status quo, depending on how it is used. Some high-level data (see
[HOO], for example), along with probable process inferences, lead to refocusing
review efforts at different levels. Across the software profession, there
is less experience of reviews at this level, though some studies are being
published.
As an example of process managing, Christenson [CHR] uses a baseline of data and a classical statistical paradigm to establish control limits for reviews. Using test and field data as driving quality measures, he relates these back to the data from individual reviews to characterize why some were better than others, and then establishes control limits and monitors ongoing reviews to these limits. There are several problems in this approach, however. First and foremost is that development processes are human-based, not machine-based, and the humans are conscious of the expectations on them; numbers can then be easily manipulated to fall within the desired norms. Another is the long-term nature of the feedback from a review, when one ultimately finds out how many of the total defects in a work-product the review found or which got through; long-term feedback admits many intervening events, which may contaminate any results. Machines have a rather short-term feedback from control to output, permitting quicker loops for isolating and understanding events, and for controlling them.
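To make the idea of control limits concrete, here is a minimal sketch of a generic mean-plus-or-minus-three-standard-deviations calculation over a baseline of past reviews. It is not claimed to be the procedure used in [CHR]; the metric (for example, defects found per review), the multiplier `k`, the clamping of the lower limit at zero, and all data values are assumptions made for illustration.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) as mean +/- k sample standard deviations.

    `baseline` is a list of one metric taken from past reviews.
    The lower limit is clamped at zero on the assumption that the
    metric is a count or rate that cannot be negative.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline reviews")
    center = mean(baseline)
    spread = stdev(baseline)
    return max(0.0, center - k * spread), center, center + k * spread


def out_of_control(observations, limits):
    """Return (index, value) pairs for reviews falling outside the limits."""
    lower, _, upper = limits
    return [(i, x) for i, x in enumerate(observations) if x < lower or x > upper]


if __name__ == "__main__":
    # Hypothetical baseline: defects found in each of eight earlier reviews.
    limits = control_limits([10, 12, 11, 9, 13, 10, 12, 11])
    print("limits:", limits)
    print("flagged:", out_of_control([11, 16, 6, 12], limits))
```

A flagged review is only a prompt to ask why it differed; as noted above, people who know the limits can steer their numbers inside them.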
As an example of process optimizing, Votta [VOT] takes a somewhat different course, using more immediate effectiveness measures to tune the process itself. In the environment studied, reviewers came to code review meetings having thoroughly prepared, with issues identified in advance, and the meetings were used to discuss, evaluate, and consolidate their findings. Over a series of many reviews, he studied the union of all reviewers' issues before the meeting, and compared it to the list of issues documented during the meeting. The effect of synergy in finding new errors as a group (an advertised selling point of the process) was shown to counterbalance the effect of items which were on someone's list at the start, but were never discussed, and thus lost in the final reporting. The bottom line in this work was that for code reviews, group meetings added little to review effectiveness. One possible conclusion is that the staff cost of the meeting time outweighed its marginal benefit, and thus that the meeting should be abandoned under the circumstances studied. Another option may be to change the meeting's discussion protocol so that all issues get due consideration.
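The comparison described above reduces to set arithmetic: issues recorded in the meeting but on no one's preparation list are gains, and issues on someone's list but missing from the meeting record are losses. The sketch below illustrates only that bookkeeping, under the assumption that issues can be matched by a shared identifier; the function name and the sample identifiers are hypothetical and do not come from [VOT].

```python
def meeting_effect(pre_meeting_lists, meeting_issues):
    """Compare reviewers' preparation lists with the meeting record.

    `pre_meeting_lists` is an iterable of per-reviewer collections of
    issue identifiers; `meeting_issues` is the collection documented
    during the meeting.
    """
    pre_union = set().union(*pre_meeting_lists)
    recorded = set(meeting_issues)
    gains = recorded - pre_union    # found only through group discussion
    losses = pre_union - recorded   # prepared but never recorded
    return {
        "pre_meeting": len(pre_union),
        "gains": sorted(gains),
        "losses": sorted(losses),
        "net": len(gains) - len(losses),
    }


if __name__ == "__main__":
    # Hypothetical issue identifiers from three reviewers and one meeting.
    prepared = [{"A", "B"}, {"B", "C"}, {"D"}]
    documented = {"A", "B", "C", "E"}
    print(meeting_effect(prepared, documented))
```

A net value near zero across many reviews corresponds to the situation described above, where synergy gains roughly offset the items lost in the meeting.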
What seems to be happening as we gain more insight into review processes is that the one-process-for-all workproducts becomes less applicable as we tune the process. In Votta's environment, code reviews should probably not have a meeting. But what about reviews at other levels? Why should we expect a review of software requirements, where inputs come from a wide diversity of human sources, to be run anything like a review of code, where a tight team has been working together for some time and the outline of what is to be accomplished is straightforward?
To Manage & Optimize - Qualitatively
We are enamored of metrics because they give an air of objectivity
to our work. Measurements of software development are human-based, however,
and will always have a subjective element. Much can be gained and learned
from qualitative information, as well. The developers themselves, those
doing the work, can often give insights which data could never illuminate.
This is one of the articles of faith of the current-day quality thrust,
as propounded by Juran.
Work with current clients has focused not on the steps of the review process, which are rather straightforward and generic, but on the details of how best to implement each. For example, all review processes have a box labeled "Preparation", where a reviewer studies a workproduct before the group meeting. But how is that best accomplished? Rifkin [RIF] argues that software people can well benefit from the current work and insights developed in the field of program comprehension.
INFLUENCE OF MANAGEMENT STYLE ON PROCESS
Autocratic organizations are often thought to be easiest when technology transfer is addressed: import or create a process, train people, and demand it. Drawbacks though are compliance and subterfuge. Will everyone follow the process, and if policed, will people report compliance accurately? Looking at process variations in this light is not productive, since once established, there is little room for conscious change.
For quantitatively driven managers, a process must be well-instrumented to demonstrate its value, or it will not survive. One reason for Fagan's success [FAG] in implementing a software inspection process at IBM is the integration of data collection into the process. Reviews are expensive in staff effort, and to justify their ongoing existence, data-driven managers need to see review costs displayed against the alternatively postulated maintenance savings to keep reviews in place. Process designers under data-driven managers must be sure to integrate ongoing data capture, collection, analysis, and reporting into any process to justify its continuance. Of course this information is useful to know, but once established, the physical and intellectual expenses will be better directed toward understanding the next generation of process changes necessary. For example, dollars-per-defect is certainly important in balancing review effort with maintenance effort, but defects-per-page is a more valuable statistic in exploring process variations which result in better defect removal.
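The two statistics contrasted above can be computed from the same small record of a review. The sketch below is illustrative only: the function name, the field names, and the sample figures (staff hours, hourly cost, defect and page counts) are hypothetical and are not data from any cited study.

```python
def review_metrics(staff_hours, hourly_cost, defects_found, pages_reviewed):
    """Return cost, dollars-per-defect, and defects-per-page for one review.

    Dollars-per-defect is None when no defects were found, because the
    ratio is undefined in that case.
    """
    if pages_reviewed <= 0:
        raise ValueError("pages_reviewed must be positive")
    cost = staff_hours * hourly_cost
    dollars_per_defect = cost / defects_found if defects_found else None
    defects_per_page = defects_found / pages_reviewed
    return {
        "cost": cost,
        "dollars_per_defect": dollars_per_defect,
        "defects_per_page": defects_per_page,
    }


if __name__ == "__main__":
    # Hypothetical review: 20 staff-hours at $75/hour, 12 defects in 40 pages.
    print(review_metrics(20, 75.0, 12, 40))
```

Dollars-per-defect supports the cost comparison against maintenance, while defects-per-page can be compared across process variants reviewing similar material.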
Managers with a participative focus will rely on people more than numbers to deliver an effective review process to the overall development environment. One way to view this perspective is that a somewhat suboptimal process without compliance problems (since the developers who created it are vested in making it work) is preferable to a theoretically better process which no one is seriously motivated to follow. Given that varying sets of people are now responsible for the ultimate shape of a process, there are many process variations which may be implemented. The implications for process evolution are:
IMPACT ON PROCESS ENGINEERING AND TECHNOLOGY TRAINING
As an overall software development process matures, so do its individual key component processes. This paper has pulled out one well-understood key process threaded among the many of software development, and separately examined its story.
Each Process An Unending Task
The maturing of an overall software development organization is depicted
by the CMM incrementally: a few basic key processes are established, and
then on that foundation others are introduced, much as in human development,
where a child masters a task like reading and then moves on to other things.
Process engineers therefore focus on the introduction and integration of
rather statically-defined processes, each at its appropriate place in the
maturity timeline. But the review process story, revealed here, is not
that of a static process. It is the story of a process that begins in the
infancy of software development, and matures on a parallel track with development
processes as a whole. (Experts in reading tell us those skills also
can be improved as we mature, and we should be focusing on that, too, though
we rarely do...)
An immediate lesson to all "process engineers": your job is never done. Discussions of overall software engineering often center around selecting and implementing the proper combination of key processes. However, none works in isolation. Initially, they must be orchestrated as a whole. Then, on an ongoing basis, each key process must continually be fed, monitored, and allowed to mature in its own right.
Process Experimentation
Key to ongoing process improvement is experimentation, to understand
and verify improvement opportunities. In a manufacturing environment, a
line can often be run under varying conditions to isolate improvements.
Software development processes rarely give us that opportunity. Software
development projects are large and costly, and rarely is there an opportunity
to develop something more than once under controlled conditions.
Here are some rather basic issues which must be addressed for adequate software process experimentation:
Within the CMM, optimization is at the highest of maturity plateaus. Unfortunately, this can lead process engineers working in organizations below level 5 to forgo experimentation. While experimentation with instrumented processes is certainly a long-term goal, there are also experiments appropriate to all maturity levels.
At the earliest of maturities, small-scale, controlled studies can be useful to justify initial process design and establish basic ranges for process parameters. As an example, Buck [BUC] documented initial parameters for IBM's inspection process, using data from early process training classes. Operating ranges for material coverage rates and team size were indicated in the early work. These early parameters must be viewed cautiously, though, as they become used in the very different environment of day-to-day practice. Process studies are ultimately needed to confirm and tune initial estimates.
As an example of middle-maturity process structuring, Weller [WEL] consolidated three years of experience with review processes. He pulls out several large scale themes from the data, which are likely to be relevant in his environment. There are many other possible explanations that fit the data, however, which remain undiscussed; there is little basis to judge how portable the results are to other organizations.
Speciation
The evolution of species teaches that from variation and succession,
the more fit can come to dominate. Thus in developing processes, designers
should value variation and learn from it. A less obvious lesson parallels
the evolutionary creation of several species from one, when there are varying
environmental niches to fill. I posit that this is what we are seeing in
higher maturity projects, as review processes for requirements, design,
and maintenance differentiate themselves to better meet their tasks. Hawks
and sparrows are both recognized as birds, though feeding by hunting vs.
gathering have given them very different styles and appearances.
The lesson for process engineering is to allow this to happen. Many organizations are pushing for a "common process", and the underlying standardization that implies. Consider an extreme case: should the review process for a newly developed subsystem be the same as that for a 5-line patch immediately required by a customer? It is certainly important to review both. But developing one process for both will likely lead to a process ill-suited to either.
Given that process speciation is valued and allowed, the focus of process engineers then becomes the support of many process variants, each in its own niche. Monolithic beasts are out, diversity is in.
Technology Training
In attempting to improve a review process, one must be cognizant of
how mature the reviews themselves are, and focus efforts from that baseline.
Rudimentary, industry-wide cost-benefit studies are not what's needed in
working to improve an organization already at a CMM level three baseline.
Similarly, thorough root-cause analysis will be lost on a project just
starting to collect defect data. Training is needed on an ongoing, and
updated, basis as processes evolve, to establish a common framework. But
training must focus on the delta: many organizations have at least some
exposure to reviews; few are starting from scratch. This is an ideal opportunity
for trainers to capitalize on the existing experience base. Developers
are often cognizant of needed improvements, and more experientially-based
instruction can lead them constructively onward.
REFERENCES
[BUC] F. O. Buck, "Indicators of Quality Inspections", IBM Corporation, Technical Report, Number 21.802, September, 1981.
[CHR] Dennis A. Christenson and Steel T. Huang, "Code Inspection Management Using Statistical Control Limits", National Communications Forum, Volume 41, Number 2, pages 1095-1100, 1987.
[CRA] Stewart Crawford-Hines, "Software Inspections & Technical Reviews: Transcending the Dogma", Fifth International Conference on Software Quality, ASQC, October 1995, pp. 73-81.
[FAG] Michael E. Fagan, "Design and code inspections to reduce errors in program development", IBM Systems Journal, Volume 15, Number 3, pages 182-211, 1976.
[HOO] Hooczko, "Taking Inspections to the Limit", STAR'95 Conference, San Diego, CA, 1995.
[IEE] IEEE Computer Society, "IEEE Standard for Software Reviews and Audits", ANSI/IEEE STD, Number 1028-1988, 1988.
[RIF] Stan Rifkin and Lionel E. Deimel, "Applying Program Comprehension Techniques to Improve Software Inspections", Proceedings of the 19th Annual NASA Software Engineering Workshop, Greenbelt, MD., December, 1994.
[VOT] Lawrence G. Votta, "Does every inspection need a meeting?", Proceedings of the ACM SIGSOFT 1993 Symposium on Foundations of Software Engineering, December, 1993.
[WEI] G. Weinberg, The Psychology of Computer Programming, Van Nostrand Reinhold Co., 1971.
[WEL] Edward F. Weller, "Lessons Learned from Three Years of Inspection Data", IEEE Software, pages 38-45, September, 1993.
[YOU] Edward Yourdon, Structured Walkthroughs, Prentice-Hall, 1979.