Rubric: Software Engineering : Factual Claims : Defect Cost Increase : Wolverton Ratios
See previous note on the IBM Systems Sciences Institute
In absolute numbers, the Wolverton ratios are as follows: 139:455:977:7136:14102, the claimed dollar costs of fixing an "average" defect. (Itself an absurd claim, see Leprechauns; I should perhaps write more on that.)
Normalizing to "if it costs one unit to fix at the requirements stage", these work out to 1:3:7:50:100 (requirements, design, coding, testing, maintenance)
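A quick arithmetic check of that normalization (just a sketch reproducing the dollar figures quoted above, nothing new):

costs = [139, 455, 977, 7136, 14102]  # requirements, design, coding, testing, maintenance
base = costs[0]
print([round(c / base, 1) for c in costs])
# -> [1.0, 3.3, 7.0, 51.3, 101.5], commonly rounded to 1:3:7:50:100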
It pops up in many books and articles and in various forms, for instance a very 1990s-looking Excel 3D bar chart.
The Big Puzzle is that a bunch of later articles and books attribute these ratios to a paper by Boehm and Basili, "Software Defect Reduction Top 10 List", which, it is easy to verify, does not contain these numbers. (It's a whole two pages long.)
Ergo, these later authors who cite Boehm and Basili actually HAVE NOT READ that paper and have just copied and pasted a citation which flattered their existing biases.
What is investigated here is "what exactly happened", a forensic investigation. The crime is how little attention we are paying, as a profession, to the question "what process of empirical investigation generated the data we are looking at, and how reliable was that process".
Listing the works chronologically kind of spoils it as storytelling, since the investigation actually happened in reverse: coming across the claim in relatively recent articles, asking "where did this come from" - and asking it again, and again, and again. This document is a reference, not a write-up.
PDFs of "Top 10 List" paper: https://www.cs.umd.edu/projects/SoftEng/ESEG/papers/82.78.pdf http://www.cs.cmu.edu/afs/cs/academic/class/17654-f01/www/refs/BB.pdf
"Tutorial, Quantitative Management: Software Cost Estimating"
This was a tutorial given at the inaugural COMPSAC conference. A valued friend with access to a university library holding a copy of the book was kind enough to send me a photo of the relevant page:
Apparently the origin of the "data" is some software portion of the Safeguard anti-missile program: https://en.wikipedia.org/wiki/Safeguard_Program
The text credits a "W. E. Stephenson" as having collected data on Safeguard, but the chart itself cites a "R. O. Lewis" as the source of the error cost data specifically. These should therefore more properly be called the "Lewis ratios" - Wolverton's name was the one I found in an initial and slightly less tenacious investigation.
We know this is the source because it next appears cited by:
https://apps.dtic.mil/dtic/tr/fulltext/u2/a104249.pdf
"Analysis of IV&V data"
Emphasizing the importance of early detection, Wolverton (Reference 30) cites figures stating that a design change costs, on the average, $977 to correct during code and checkout and $7136 during test and integration (1975 figures).
p87: $195:$489:$977:$7136 or 1:2.5:5:36.5
p89 extrapolates this to add a phase @ $14655
Exact same method of calculating a ROI for investment in IV&V that we'll see later on in NASA docs (which I've blasted as being "Flaubert math", cf https://www.lesswrong.com/posts/tggnLEXxrTDWQwDL3/rocket-science-and-big-money-a-cautionary-tale-of-math-gone)
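For reference, the p87 figures normalize the same way (a sketch using only the dollar amounts quoted above; the $14655 extrapolation on p89 is the report's own and I won't guess at its method):

radatz = [195, 489, 977, 7136]  # p87 per-phase defect costs
print([round(c / radatz[0], 1) for c in radatz])
# -> [1.0, 2.5, 5.0, 36.6], close to the report's stated 1:2.5:5:36.5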
The name "Safeguard" appears (without citation or explanation, it seems) in Boehm's "Software Engineering Economics", providing some of the data points on this chart:
The Safeguard data and some of the process for collecting it are described (over 15 years later!) in Robert O. Lewis' book "Independent Verification and Validation: A Life Cycle Engineering Process for Quality Software"
http://web.archive.org/web/20030404212123/http://www.cigital.com/solutions/roi-cs2.php
Case Study: Finding Defects Earlier Yields Enormous Savings
Seems to be "patient 0" for the numbers and the Excel chart and attributing the costs to Boehm & Basili (or possibly, Capers Jones, or Humphrey)
The attribution to Basili & Boehm is obviously bogus.
The attribution to Jones is bogus when you know the slightest thing about Jones, in particular that he's virulently opposed to cost per defect metrics.
The attribution to Humphrey is quickly disproven by looking inside the book (which admittedly people would have needed to buy the book for, today I can use Google Books).
The numbers are off from the Radatz numbers but clearly not a coincidence; I suspect they were fudged a bit to avoid the appearance of "round" ratios
"Best Practices for the Formal Software Testing Process"
Page 21/22: "data from a study by J.W. Radatz", this is how I found Radatz in the first place
The numbers are a bit off, 194:489:997:7136, possibly honest transcription errors
"We did nothing wrong"
https://simson.net/ref/2006/csci_e-180/ref/Baseline0304-DissectionNEW.pdf
Influential article? Redrawn version of the Cigital chart
https://elib.uni-stuttgart.de/bitstream/11682/8977/1/main.pdf
A negative result: comprehensive survey of cost factors in the literature, no mention of the Wolverton ratios
Software Testing: Testing Across the Entire Software Development Life Cycle
p14 - ugly curve "The numbers first published in 1996 were revalidated in 2001"
No chart but a table of the "cost factors"; same Everett as the 2006 book
https://www.mddionline.com/adopting-static-analysis-tools
References Capers Jones "Software Assessments, Benchmarks, and Best Practices", Humphrey, "Introduction to the Personal Software Process" in addition to the usual Top 10 - a semi-honest, "covering all bases" way of citing Cigital indirectly
https://agileelements.wordpress.com/2008/04/22/cost-of-software-defects/
Blog, crediting Capers Jones "Software Assessments, Benchmarks, and Best Practices"
http://2008.secrus.org/en/-pageid=4548&submissionid=5480.htm http://2008.secrus.org/en/etc/secr2008_andreas_golze_professional_testing.ppt
"Reduce Project risk through early defect detection", conference presentation
Excel-style chart
https://info.kpmg.us/content/dam/institutes/en/government/pdfs/2009/gov-it-projects-need-qa-iv-v.pdf
More modern-looking chart
Figure 14.2, based on data collected by Boehm and Basili [Boe01b] and illustrated by Cigital Inc. [Cig07], illustrates this phenomenon. The industry average cost to correct a defect during code generation is approximately $977 per error.
This is the Pressman of the "Pressman ratios", now in its seventh edition. Boe01b is the "Top 10" article.
It is baffling that the editorial process for possibly the foremost book in the field let this through for the 7th edition. It is apparently gone from the 8th edition, without a retraction as far as I can tell.
ftp://ftp.software.ibm.com/software/sk/pdf/SystemsEngineeringforDummies.pdf
p 49. "for dummies" means we round them out…
This is notable for corrupting the $977 into $937. So if someone is quoting "$937 during coding" at you, they're most likely referencing this Typemock infographic.
https://slideplayer.com/slide/1526223/
slide 11, the Excel-style chart
https://pdfs.semanticscholar.org/8b1f/4f33d8a6c39489a47a58f305fdbe25e1a14b.pdf
https://arxiv.org/pdf/1609.04886.pdf
"Are Delayed Issues Harder to Resolve?"
Solid negative result, still largely ignored (*)
We found no evidence for the delayed issue effect; i.e. the effort to resolve issues in a later phase was not consistently or substantially greater than when issues were resolved soon after their introduction.
(*) detailed citation analysis needed, but early results not hopeful, see below
"Source: me."
https://www.securityweek.com/how-reduce-risk-while-saving-cost-resolving-security-defects
Haha. Jim 2 quotes Jim 1 and adds "If anyone is credible on this, he is… we didn't have empirical data, but now we do."
https://slcontrols.com/justify-early-extra-investment-reduce-late-budget-overruns/
Excel style chart
"Actual data from Routh, Aetna"
Cites the "for dummies" book
https://blackfire.io/docs/book/01-introduction
PHP Code Performance Explained (book)
Classic example of quoting the Typemock $937 figure but attributing it to Boehm and Basili.
https://repository.lib.ncsu.edu/bitstream/handle/1840.20/36633/etd.pdf?sequence=1
On the Nature of Software Engineering Data (Implications of ε-Dominance in Software Engineering)
It is also useful to be able to predict issue lifetime specifically when the issue is created, since it is found earlier that delaying to resolve issues can become harder and costlier [Men17].
So here we have a PhD student working under the direction of the author of the one negative result, claiming it as his source for the positive version! I despair.
"Automatização de testes para plataformas Oracle - Xstore"
Through Figure 1.1, it is possible to verify that the cost of correcting an error grows quite sharply as a project progresses through its various phases. With this in mind, effective and efficient quality control is essential from the earliest stages, and above all in the phase before the start of production. [Translated from Portuguese by Google Translate]
Yet another. Feels awful to work in an industry where someone can disprove a result yet be cited a few years later as having proved it.
https://shodhganga.inflibnet.ac.in/bitstream/10603/53250/10/10_chapter%201.pdf
Thank you for this writeup - it was a fun read. I thought you might be interested to hear about a similar oft-repeated software engineering figure which is nonsense when traced to its roots.
Sometimes people will claim an exponentially growing cost to fix defects based on the development stage where they are detected:
The origin of these figures is table 5-1 on page 5-4 of the 2002 NIST Planning Report 02-3, "The Economic Impacts of Inadequate Infrastructure for Software Testing", which is an example table with completely made-up data, included purely to illustrate how costs will be attributed later in the report.
This made-up data was cited in a number of places, including in this paper from IBM / Rational Software, which uses it in its introduction as a source, apparently without understanding that the numbers were not supposed to be indicative of reality.
It shows up frequently in various blog posts etc., often without citation: 1, 2, 3