Note: this content is reposted from my old Google Plus blog, which disappeared when Google took Plus down. It was originally published on 2016-05-18. My views and the way I express them may have evolved in the meantime. If you like this gist, though, take a look at Leprechauns of Software Engineering. (I have edited minor parts of this post for accuracy after having a few mistakes pointed out in the comments.)
Degrees of intellectual dishonesty
In the previous post, I said something along the lines of wanting to crawl into a hole when I encounter bullshit masquerading as empirical support for a claim, such as "defects cost more to fix the later you fix them".
It's fair to wonder why I should feel shame for my profession, and fair to ask whom exactly I feel ashamed for. So let's drill a little deeper, and dig into cases.
Before we do that, a disclaimer: I am not in the habit of judging people. In what follows, I only mean to condemn behaviours. Also, I gathered most of the examples by random selection from the larger results of a Google search. I'm not picking on anyone in particular.
The originator of this most recent Leprechaun is Roger S Pressman, author of the 1982 book "Software Engineering: a Practitioner's Approach", now in its 8th edition and being sold as "the world's leading textbook in software engineering".
Here is, in extenso, the relevant passage. (I quote from the 5th edition; the first edition, to which I do not have access, reportedly stated "67 units", a figure that later became "between 60 and 100 units". The rationale for this change is unclear.)
To illustrate the cost impact of early error detection, we consider a series of relative costs that are based on actual cost data collected for large software projects [IBM81]. Assume that an error uncovered during design will cost 1.0 monetary unit to correct. Relative to this cost, the same error uncovered just before testing commences will cost 6.5 units; during testing, 15 units; and after release, between 60 and 100 units.
This [IBM81] is expanded, in the References section of the book, into a citation: "Implementing Software Inspections", course notes, IBM Systems Sciences Institute, IBM Corporation, 1981.
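For concreteness, here is the arithmetic the passage implies, as a minimal sketch: the phase multipliers are the ones quoted above, while the defect counts are invented purely for illustration.

```python
# Relative cost-to-fix multipliers as quoted above from Pressman (5th ed.).
# The defect counts are hypothetical, purely for illustration.
multipliers = {
    "design": 1.0,
    "just before testing": 6.5,
    "during testing": 15.0,
    "after release": 60.0,  # quoted as "between 60 and 100"
}
defects_found = {  # invented numbers
    "design": 40,
    "just before testing": 30,
    "during testing": 20,
    "after release": 10,
}
total = sum(n * multipliers[phase] for phase, n in defects_found.items())
print(total)  # 40*1.0 + 30*6.5 + 20*15 + 10*60 = 1135.0
```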
Am I embarrassed for Pressman, that is, do I think he's being intellectually dishonest? Yes, but at worst mildly so.
It's bothersome that for the first edition Pressman had no better source to point to than "course notes" - that is, material presented in a commercial training course, and as such not part of the "constitutive forum" of the software engineering discipline.
We can't be very harsh on 1982-Pressman, as software engineering was back then a discipline in its infancy; but it becomes increasingly problematic as edition after edition of this "bible" lets the claim stand without increasing the quality of the backing.
Moving on, consider this 1995 article:
"Costs and benefits of early defect detection: experiences from developing client server and host applications", Van Megen et al.
This article doesn't refer to the cost increase factors. It says only this:
"To analyse the costs of early and late defect removal one has to consider the meaning and effect of late detection. IBM developed a defect amplification model (IBM, 1981)."
The citation is as follows:
"IBM (1981) Implementing Software Inspections, course notes (IBM Systems Sciences Institute, IBM Corporation) (summarised in Pressman 1992.)"
This is the exact same citation as Pressman's, with the added "back link" to the intermediate source. The "chain of data custody" is intact. I give Van Megen et al. a complete pass as far as their use of Pressman is concerned.
Let's look at a blog post by my colleague Johanna Rothman: http://www.jrothman.com/articles/2000/10/what-does-it-cost-you-to-fix-a-defect-and-why-should-you-care/
Johanna refers, quite honestly, to "hypothetical examples". This means "I made up this data", and she's being up front about it. She says:
"According to Pressman, the expected cost to fix defects increases during the product's lifecycle. [...] even though the cost ratios don't match the generally accepted ratios according to Pressman, one trend is clear: The later in the project you fix the defects, the more it costs to fix the defects."
I'm almost totally OK with that. It bothers me a bit that one would say "one trend is clear" about data that was just made up; we could have made the trend go the other way, too. But the article is fairly clear that we are looking at a hypothetical example based on data that only has a "theoretical" basis.
The citation:
Pressman, Roger S., Software Engineering, A Practitioner's Approach, 3rd Edition, McGraw Hill, New York, 1992. p.559.
This is fine. It's a complete citation with page number, still rather easy to check.
I am starting to feel queasy with this 2007 StickyMinds article by Joe Marasco:
https://www.stickyminds.com/article/what-cost-requirement-error
"The cost to fix a software defect varies according to how far along you are in the cycle, according to authors Roger S. Pressman and Robert B. Grady. These costs are presented in a relative manner, as shown in figure 1."
What Grady? Who's that? Exactly what work is being cited here? There's no way to tell, because no citation is given. Also, the data is presented as fact, and a chart, "Figure 1", is provided that was not present in the original.
This is shady. Not quite outright dishonest, but I'd be hard pressed to describe it more generously than as "inaccurate and misleading".
A different kind of shady is this paper by April Ritscher at Microsoft.
http://www.uploads.pnsqc.org/2010/papers/Ritscher_Incorporating_User_Scenarios_in_Test_Design.pdf
The problem here is a (relatively mild) case of plagiarism. The words "the cost to fix software defects varies according to how far along you are in the cycle" are lifted straight from the Marasco article, with the "according to" clause in a different order. But the article doesn't give Marasco credit for those words.
There's also the distinct possibility that Ritscher never actually read "Pressman and Grady". Do I have proof of that? No, but it is a theorem of sorts that you can figure out the lineage of texts by "commonality of error". If you copy an accurate citation without having read the original, nobody's the wiser. But why would you go to the trouble of reproducing the same mistake that some random person made if you had actually read the original source?
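This kind of detective work can even be mechanized, crudely. Below is a toy sketch: the citation strings are abridged from the examples in this post, and the grouping logic, not the data, is the point. Texts that share the same idiosyncratic variant of a citation (say, "Science" where Pressman wrote "Sciences") cluster together, hinting at who copied from whom.

```python
from collections import defaultdict

# Toy illustration of "commonality of error": group texts by the exact
# citation string they carry. A shared idiosyncratic variant suggests a
# shared lineage. Strings abridged from the examples quoted in this post.
citations = {
    "Pressman, 5th ed.": "IBM Systems Sciences Institute, 1981",
    "Van Megen et al., 1995": "IBM Systems Sciences Institute, 1981",
    "Ardi, 2008": "IBM Systems Science Institute",
    "ZDLC paper, 2014": "IBM Systems Science Institute",
}
lineages = defaultdict(list)
for source, cited in citations.items():
    lineages[cited].append(source)
for variant, sources in sorted(lineages.items()):
    print(f"{variant!r}: {sources}")
```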
So we're entering the domain of intellectual laziness here. (Again, to stave off the Fundamental Attribution Error: I am not calling the person intellectually lazy; I am judging the behaviour. The most industrious among us get intellectually lazy on occasion, that's why the profession of tester exists.)
Next is this 2008 article by Mukesh Soni:
"The Systems Sciences Institute at IBM has reported that the cost to fix an error found after product release was four to five times as much as one uncovered during design, and up to 100 times more than one identified in the maintenance phase (Figure 1)."
We find the same level of deceit in a 2008 thesis, "A Model and Implementation of a Security Plug-in for the Software Life Cycle" by Shanai Ardi.
http://www.diva-portal.org/smash/get/diva2:17553/FULLTEXT01.pdf
"According to IBM Systems Science Institute, fixing software defects in the testing and maintenance phases of software development increases the cost by factors of 15 and 60, respectively, compared to the cost of fixing them during design phase [50]."
The citation is missing, but that's not really what's important here. We've crossed over into the land of bullshit. Both authors presumably found the claim in the same place everyone else found it: Pressman. (If you're tempted to argue "they might have found it somewhere else", you're forgetting my earlier point about "commonality of error". The only thing the "IBM Systems Science Institute" is known for is Pressman quoting them; it was a training outfit that stopped doing business under that name in the late 1970's.)
But instead of attributing the claim to "IBM, as summarized by Pressman", which is only drawing attention to the weakness of the chain of data custody in the first place, it sounds a lot more authoritative to delete the middle link.
I could go on and on, so instead I'll stop at one which I think takes the cake: "ZDLC for the Early Stages of the Software Development Life Cycle", 2014:
"In 2001, Boehm and Basili claimed that the cost of fixing a software defect in a production environment can be as high as 100 times the cost of fixing the same defect in the requirements phase. In 2009, researchers at the IBM Systems Science Institute state that the ratio is more likely to be 200 to 1 [7], as shown in Figure 2".
The entire sentence starting "In 2009" is a layer cake of fabrication upon mendacity upon affabulation, but it gets worse with the citation.
Citation [7] is this: "Reducing rework through effective requirements management", a 2009 white paper from IBM Rational.
Yes, on the scale of a century IBM Rational is a contemporary of the defunct IBM Systems Science Institute, but that's a little like attributing a Victor Hugo quote to Napoleon.
While Figure 2 comes straight out of the IBM paper, the reference to "IBM Systems Science Institute" comes out of thin air. And in any case the data does not come from "researchers at IBM", since the IBM paper attributes the data to Boehm and Papaccio's classic paper "Understanding and Controlling Software Costs", which was published not in 2009 but in 1988. (Both of them worked at Defense consultancy TRW.)
We've left mere "bullshit" some miles behind here. This isn't a blog post; this is a paper at an official peer-reviewed conference, with proceedings published by the IEEE, and yet right on the first page we run into stuff that a competent reviewer would have red-flagged several times. (I'm glad I let my IEEE membership lapse a while ago.)
Garden-variety plagiarism and bullshit (of which there is no short supply) make me feel icky about being associated with "software engineering", but I want to distance myself from that last kind of stuff as strongly as I possibly can. I cannot be content to merely ignore academic software engineering, as most software developers do anyway; I believe I have an active duty to disavow it.
Hi - I have a few serious problems with what you write here (I was pointed to this circuitously from a Register article). I think it may well be you who is spouting bullshit, rather than Pressman ... do read on ...
First, Pressman's book was first published in 1982, not 1987. The source of information was cited as being from the IBM course notes. These notes were contemporary at the time of publication, being from the previous year (1981).
The first edition copy, which is on my desk right now, actually gives the after-release figure as "67 units", where the later edition you quote has "between 60 and 100 units". So, the text did change between the first edition and the later revision you cited. I am not sure why the numbers moved up - perhaps there was later data available?
Next, in 1982, Software Engineering was a discipline that was very much in vogue. It was a mandatory module in the second year of my undergraduate degree course (1983-1984). I graduated in 1986 and Pressman's book was seen as very much state of the art, based as it was and is on actual experience gained to date in real and significant software projects. Most of these projects were large even by today's standards, and many were dealing with significant complexity. For example, IBM's OS/360 (much covered by F. P. Brooks) and ICL's VME/B - both large projects spanning from the early 1960's into the 1970's and beyond. In the case of ICL's VME/B, version control and what we today would call normal software engineering practice were already in place. I can point you at the relevant reports from the 1970's if you like. What did happen in the meantime was that the industry was filled out with amateurs with no formal training in computing science, and this has led to "inventions" that predated the birth of the wannabe "inventors" by decades. Moving on ...
You omit to mention that Pressman also develops his thinking to consider "defect amplification", which was even then a well-known problem. Pressman was simply reflecting real-world data that he had available. A long time ago I had a series of reports, including the IBM reports, where defect cost was measured. These reports aligned, at least in order-of-magnitude terms, with what Pressman's book was pointing out. Contrary to your thinking that all this is "bullshit", Pressman's words were based on real data. You should realise that even in the 1970s, senior managers, even in competing firms, would collaborate to find better ways to engineer software. I know that these discussions happened between the major mainframe companies - including IBM and ICL (who were dominant or large in the UK and Commonwealth). I know this because I knew some of the ICL people involved.
As far as my own experience goes, in my first post-graduate job (at ICL), we used the same inspection methodology as IBM to reduce bug counts, leading to many very high-quality outcomes. This was done both with designs (yes, we actually wrote them in those days) and with code. We didn't use the "click compile and see if the unit test catches anything" methods of today. Oh, and the first time I had to write unit tests was 1986, not "after 2000" when you think this technique was invented.
My own experience since then has been that many engineers of today do not understand the impact of bugs, the need to design them out, nor much else about how to actually engineer good software. They are uninterested in others' experience and somehow think their "agile" methods (we used to call this "incremental development" in the 1980's) are somehow new and good. What I have come to know is that bugs that get to production are way more expensive to fix than those found ahead of deployment. Why is this? Just add up the costs to users of dealing with the bug, plus the overheads of a support team, plus the eventual fixing and (not that this happens much in the new click-happy world) validation and verification of the fix. All of this extra work costs more than ensuring that the bug was removed ahead of release - this is obvious by inspection. Real-world data has told the truth for decades.
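To put rough numbers on that sum (every figure below is invented; only the cost categories follow the argument above):

```python
# Back-of-envelope comparison; all figures are hypothetical.
cost_fix_before_release = 1_000   # find and fix during development

cost_fix_in_production = (
    5_000    # users' lost time dealing with the bug
    + 2_000  # support team overhead (triage, workarounds)
    + 1_500  # the eventual fix itself
    + 1_500  # validation and verification of the fix
)
print(cost_fix_in_production / cost_fix_before_release)  # 10.0 with these figures
```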
Last, my son just graduated from a Master's course in Computing from Imperial College (one of the best in the world). Imagine my surprise and delight that Pressman is still required reading. Long may it continue.
To close, sorry for my venting, but hey, you got a counter view from someone with around 35 years of experience!