Did I ever tell you the story about when I was Jimmy Savile's quality assurance manager? No?
Well, it’s not surprising really. Because, even if it were true, I would hardly be likely to boast about it. Let’s face it, the monitoring of Jimmy’s non-conformities was hardly QA’s finest hour. I guess the problem is that, when it comes to the ‘creative’ disciplines, the quality control man is the last person to be let in on the secret. The Church is another area where one might imagine that quality control should know its place. The confessional might be a good way to keep the masses in check but it didn’t do a particularly good job of stopping abuse in the sacristy.
So clearly, having a good quality management system in place isn’t the answer to all of the world’s problems. I get that. There is a limit to which an independent quality control function can deal with matters of ethics and morality, particularly where mutual scrutiny amongst expert authorities is the order of the day. So what chance would quality management have in identifying and dealing with any systemic problems that may exist within the ultimate bastion of creative expertise: scientific research and development? For example, if one feared that experimenter bias had become institutionalised within a particular field, what, if anything, could be achieved by having a quality control engineer on the case?
I ask this question because I recently had an interesting discussion on this website regarding this very matter with a certain Professor Ken Rice (aka ATTP). As it happens, the subject under the microscope was paleoclimatology, but I don’t want to make too much of that because I don’t want to get drawn into a discussion of the specifics of Climategate, hidden declines or PAGES2k. What matters is whether the generic system currently in place to ensure integrity within research (i.e. peer review and mutual scrutiny) could be improved upon by adopting practices that are commonplace outside of academia.
Here’s to having a go
I am, of course, not the first one to consider the application of quality assurance within a research environment. The idea is well enough established to have even received its own acronym: QAR. This has been defined by the University of Reading as:
“…all the techniques, systems and resources that are deployed to give assurance about the care and control with which research has been conducted.”
Accordingly, QAR addresses matters such as:
- The responsibilities of those involved in the research;
- transparent project planning;
- the training and competence of research staff;
- facilities and equipment;
- documentation of procedures and methods;
- research records;
- and the handling of samples and materials.
The university even has a Code of Good Practice in Research, itself based upon the ‘UK Research Integrity Office’s Code of Practice for Research (UKRIO Code)’. It covers, amongst many other things, matters of honesty, integrity, cooperation, openness and research design. In particular, it states that when designing a research project one must ensure that:
“…the design of the study is appropriate for the question(s) being asked and addresses the most important potential sources of bias”
Helpfully, the University of Reading’s codes of practice add that:
“Within the University, advice on all aspects of statistical design and analysis is available from the Maths and Statistics Department.”
All good stuff. But it’s one thing to remind staff of their responsibilities and quite another to ensure that they are fulfilled. Apart from taking the advice of the Maths and Statistics Department, what other ‘techniques, systems and resources’ might one bring to bear to ensure that one does not fall prey to potential sources of bias when designing a research project? Is this something that can be readily audited, for example?
One of the major difficulties here, as I see it, is the less than obvious manner in which many of the biases can sometimes operate. This is especially true when focusing upon interpretive biases, i.e. those that can compromise the assessment of research evidence. If you talk to anyone defending the climate science consensus they will tell you that interpretive bias hasn’t played any major role. Talk to anyone outside of climate science and they will tell you that it is a perennial problem within research. In fact, biases may be present even in the most rigorous of sciences and may be obvious only in retrospect. It would be nice to think that a good dose of quality assurance would solve the problem but there are a number of reasons for being pessimistic. The first is that interpretive bias can actually be made worse through the application of quality assurance.
The QAR Paradox
To explain this paradox, I need to first list the various forms that interpretive bias may take. These have been defined as follows:
- Confirmation Bias – evaluating evidence that supports one’s preconceptions differently from evidence that challenges these convictions.
- Rescue Bias – discounting data by finding selective faults in the experiment.
- Auxiliary Hypothesis Bias – introducing ad hoc modifications to imply that an unanticipated finding would have been otherwise had the experimental conditions been different.
- Mechanism Bias – being less sceptical when underlying science furnishes credibility for the data.
- “Time will tell” bias – the phenomenon that different scientists need different amounts of confirmatory evidence.
- Orientation Bias – the possibility that the hypothesis itself introduces prejudices and errors and becomes a determinant of experimental outcomes.
The problem here is that any quality assessment that one may introduce is likely to be a victim of the above biases rather than a panacea. Take confirmation bias, for example. Behavioural experiments have been conducted that demonstrate that evidence that seems to contradict the favoured hypothesis will receive greater quality assurance scrutiny, and will be required to pass more stringent criteria for acceptance, than evidence that confirms preconceptions. That’s how it should be, you might say, if the preconceptions are well-founded. But that is just a way of saying that there is nothing wrong with confirmation bias – when there is.
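This asymmetry of scrutiny is easy to demonstrate with a toy simulation. The sketch below (Python, with entirely hypothetical thresholds and quality scores of my own devising) models a QA step that demands a higher quality bar of disconfirming results than of confirming ones. Even when the underlying evidence is perfectly neutral, the evidence that survives the filter ends up predominantly supportive:

```python
import random

random.seed(42)

def qa_filter(results, confirm_threshold, disconfirm_threshold):
    """Keep a result only if its quality score clears the threshold
    assigned to its direction. Setting a stricter threshold for
    disconfirming evidence models confirmation bias in the QA step."""
    kept = []
    for supports_hypothesis, quality in results:
        threshold = confirm_threshold if supports_hypothesis else disconfirm_threshold
        if quality >= threshold:
            kept.append(supports_hypothesis)
    return kept

# Neutral evidence: half supports the favoured hypothesis, half does not,
# with quality scores drawn from the same distribution either way.
results = [(i % 2 == 0, random.random()) for i in range(10_000)]

unbiased = qa_filter(results, 0.5, 0.5)
biased = qa_filter(results, 0.5, 0.8)  # disconfirming evidence scrutinised harder

print(f"unbiased QA: {sum(unbiased) / len(unbiased):.2f} of surviving evidence is supportive")
print(f"biased QA:   {sum(biased) / len(biased):.2f} of surviving evidence is supportive")
```

The unbiased filter leaves roughly an even split; the biased one leaves a clear supportive majority, despite the input being fifty-fifty. The point is that nothing in the QA step itself looks dishonest – each individual rejection can be defended on "quality" grounds.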
Similarly with other biases; given the presence of a bias, quality assurance is likely to be enrolled within its service rather than serve to expose it. That’s just human nature. Total objectivity is hard to achieve and that is as true within quality assurance as it is within the area of research it is intended to assure.
The expertise problem
An obvious solution to the above problem may seem to be the employment of quality assurance personnel who are skilled in the subject area but not invested in the outcome of the assessment. The parallel in industry might be, for example, a software quality assurance specialist who is fully conversant with the technicalities and processes involved in software development but who is not on the project team of software development practitioners. An analogous situation may be difficult to imagine within the context of a research establishment, but it is not impossible to see it working. Take, for example, the role that could be served by the Maths and Statistics Department of the University of Reading if they were to go beyond providing advice. If, instead, they were to be authorized to audit the practices of the University’s various research departments, then some degree of expert yet independent scrutiny might be brought to bear. The problem here is that it would be quite counter-cultural to expect one university department to have the authority to dictate the manner in which another department designed its research projects. Counter-cultural, perhaps, but this is the sort of imposition that industry invites every time it signs up to the scrutiny of an internal quality management system that is certified and audited by a UKAS accredited certification body. Nobody likes it, but the necessity is acknowledged.
Nobody Likes the QA Guy
Which brings me on to the next barrier against the introduction of an effective QAR function. As a former software quality assurance manager, I can speak from bitter experience of just how resentful and aggressive can be the response from a professional who has been formally criticised for any aspect of their work. Even with the most supportive of management, quality assurance staff can feel like the apocryphal plover bird picking the teeth of a Nile crocodile. And when the defects highlighted by quality assurance have their roots in the behaviour of senior management, the problem becomes even more acute. Senior management have particularly powerful jaws.
When dealing with proud and dedicated software professionals (who have a not-so-secret suspicion that quality assurance staff are just second-rate practitioners who only know how to stop the creative process), I had to be highly diplomatic in my approach. I can only guess at how difficult the role would be when dealing with someone who has dedicated a life to the advancement of a particular hypothesis and whose whole career depended upon the establishment and maintenance of prestige amongst their peers. Furthermore, this problem is bad enough when dealing with a straightforward case of failure to follow written procedure, or the production of something that is clearly defective. But it is in the nature of experimenter or interpretive biases that accusations of such may be misinterpreted as a highly personal attack on the integrity of the individual. These are not the sort of attacks that go unpunished. For that reason, anyone operating within QAR needs to be fully sanctioned in their work and protected from the worst excesses of an academic’s umbrage. There are prominent climate scientists, I am told, who would not hesitate to sue the moment that anyone challenged their position.
So what is so wrong with peer review?
Fortunately for academia, when it comes to the big boy stuff, it has no need to defer to a supposedly second-rate jobsworth to keep the errant in line, since it has a system by which publication of results requires submission to peer review. Scientists have to publish in reputable journals that protect their reputation by ensuring that all submissions have received expert scrutiny. It’s a simple system of self-policing that embodies the scientific method. So what is wrong with that?
Well, where do I start? With my quality assurance expert’s hat on, I have to say that if I set out to design a dysfunctional process by which the quality of work may be assured, I couldn’t do a better job than peer review. It is a system in which individuals are selected to act as experts, using a selection process that does not itself adhere to any quality control precepts. It is also often the case that those appointed to critique work do so under the cloak of anonymity – not a characteristic normally associated with effective quality assurance. There appears to be nothing in place to assure the thoroughness, completeness, or impartiality of the scrutiny applied. The process appears to have no built-in safeguards to guard against partisan reviewing or, indeed, hostile reviewing borne of professional rivalry. And all of this for the publication of an article that will appear in a journal that is under editorial control. So, far from being a means of preventing interpretive bias, it seems to be a system guaranteed to nurture it into full bloom. If there had not been sufficient quality assurance applied within the relevant research establishment prior to submission of its work, I see little in peer review prior to publication that adds anything to the assurance equation.
This is not to say that peer review should be scrapped. It is, after all, the best that is on offer, and it does at least provide the opportunity for a degree of quality assurance, even if it may be difficult to judge the extent to which the opportunity has been taken.
Where we stand
It is often argued that scientific research is a special case in which conventional quality assurance standards have limited applicability. They can be used to address peripheral issues, but when it comes down to creativity within an arcane subject area, there is no alternative but to place trust in the expert community’s integrity and ability to apply peer-group judgement. There again, the same argument was applied to the arts and those who held imperious positions within them, such as Mr Savile. Similar pleas for special treatment would no doubt have been used with regard to the clergy in the Catholic Church. There are some lines of work, it seems, where mutual scrutiny amongst peers is de rigueur, because there is no one who could gain the required abilities without becoming a member of the club. But be that as it may, there can be no doubt that it is a system that can, from time to time, lose its way.
There is, of course, nothing happening within academic research that can compare with the egregious misdemeanours of the likes of Savile and co. When all is said and done, experimenter and interpretive biases are largely innocent and subconscious failings committed in the spirit of an over-eagerness to determine the truth. Nevertheless, to the extent that truth is a rare and valuable commodity, one should not underestimate the damage that such biases may cause. And, given the insidious manner in which they operate, we should not underestimate the extent to which even the most circumspect of fields of research may be susceptible.
Several important initiatives have been taken in the arena of academic research to try to address the quality assurance challenge. I have already cited the codes of practice developed by the University of Reading, but this is just one of many establishments that have taken the subject seriously. Add to that the existence of standards and guidance such as the UKRIO Code, supplemented by more specialised directives such as ISO Guide 25, the EURACHEM/CITAC Guide CG2 and the OECD Principles of Good Laboratory Practice (GLP), together with a variety of published codes of practice for statistical analysis, and one can see that this is not a problem taken lightly. You can also see such levels of interest reflected in the British Computer Society’s recent intervention to call for better standards of software development in the production of safety-related academic code, such as the epidemiological models developed by Imperial College London (an intervention, incidentally, that was resisted with some pretty traditional negativity by the punters at Professor Rice’s website).
Nevertheless, there will always be an extent to which QAR remains little more than that intrepid little bird, perilously pecking at the teeth of the basking monster – tolerated only insofar as relatively cosmetic housekeeping is undertaken. Because, when it comes to matters of subconsciously applied or institutionalised bias, scientists are nothing if not individuals who think they can take care of it themselves. At the end of the day, despite the plover’s best efforts, the crocodile is still the crocodile.
A good selection of resources on best practice in statistics can be found here.
John, I was involved in the peer review process for my specialist subject for about four decades, for several journals, and for shorter periods I organised the process for two of them. I have found, over the years, that many people have great expectations of the peer review process that are rarely met. Ideally you want from a reviewer someone who knows well the topic of a submitted paper yet is not involved in the paper’s research. This commonly is impossible to achieve. So the person commissioning a peer review either has to request it from someone who may well be biased either for or against the researcher(s), or from someone who might be unable to spot any deficiencies in the research or its write-up.
Essentially the purpose of peer review has commonly devolved down to the simple question “is the submitted paper appropriate for the journal”, or in different words “will publishing the paper cause the journal any regrets?” Commonly papers are not sent to known opponents of the paper’s author(s). The last thing a journal wants is to become embroiled in a dispute.
I always refused to be anonymous when reviewing papers. I always tried to be conscientious when writing a review, never got any monetary reward for my efforts, yet rarely received any feedback. I know, from receiving reviews of my own work, just how angry or irritated one can become. I developed the practice of quickly reading the review and then sleeping on it, before picking it up again. Questions like “how could that stupid cretinous reviewer not understand what I did or was attempting to do?” changed subtly into “how can I convey with greater clarity what that….reviewer was unable to grasp?”
My impression is that peer review, as I was involved in it, commonly had nothing to do with assessment except in its most basic sense – is the work complete, is it sufficiently well written, is it novel?
In the face of such a well-reasoned and erudite article, my own thoughts are limited and perhaps trite.
Given the overwhelming media, political and scientific community investment in CAGW as a theory, it strikes me as almost inevitable that confirmation and other biases will be inherent in much of the work that is carried out in this area.
It ought to be obvious that some form of quality control, over and above such as does exist, is of vital importance, given that almost every new piece of climate research is used to add to the demands that we “do more” to achieve “net zero” to “deal with” the “climate crisis”. The financial and other consequences of such demands are so enormous that they, and the scientific evidence on which they are based, should be subject to the most rigorous scrutiny. However, I fear they are not.
That was a most interesting insight and a good counterpoint to what was something of an outsider’s view on my part. I was particularly interested to see the extent to which you seemed to be essentially setting standards for yourself. This would imply that the integrity of the process is perhaps a little too dependent upon the individuals involved. Did you find that there was much editorial input into the process? By that, I mean to ask if different journals had their in-house rules, or were you entirely at liberty to do it your own way, as it were? All of my peer reviewing was performed in the context of a commercial enterprise, for which there were written codes of practice in place.
Also, I fully agree regarding the hurt felt when on the receiving end of a review. We had something called a Black Hat review in which everyone on the review team was encouraged to be as negative as they could be. And this was all done face to face. I have seen grown men cry.
Peer review is not up to the job. Academic publishing is not up to the job. Grant allocation procedures are not up to the job.
There is no question that a peer review is ever going to audit a paper in anything other than a cursory way. The reviewer is not going to replicate the work. The reviewer is going to skim over and look for obvious deficiencies. Quickly check that the percentages add up to 100. Check that the degrees of freedom reported are correct. But it would be impossible to check significance levels. And if it were possible to check significance levels… it would be impossible to check that the reported measurements were the “true” ones. Or that the experimental setup was as described. Everything is taken on trust – it has to be – and the peer review is little more than a sanity check.
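The mechanical end of that "sanity check" is, at least, automatable. As a sketch of what a reviewer or journal could script (the function names and field layout here are my own invention, not any journal's actual tooling), the two examples above – percentages that should total 100, and degrees of freedom that should match the sample sizes – look like this in Python:

```python
def check_percentages(percentages, tolerance=0.1):
    """A reported percentage breakdown of a whole sample should sum
    to ~100, allowing a little slack for rounding in the table."""
    return abs(sum(percentages) - 100.0) <= tolerance

def check_t_test_df(n_group1, n_group2, reported_df):
    """For a standard two-sample t-test (pooled variance), the
    degrees of freedom should equal n1 + n2 - 2."""
    return reported_df == n_group1 + n_group2 - 2

# A reviewer skimming a results table might run:
print(check_percentages([34.2, 41.5, 24.3]))  # sums to 100.0 -> True
print(check_t_test_df(30, 28, 56))            # 30 + 28 - 2 = 56 -> True
print(check_t_test_df(30, 28, 57))            # off by one -> False
```

Checks like these catch transcription slips, nothing more – which rather underlines the point that everything of substance is taken on trust.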
A huge problem is that there is now a pressure to publish – publish or perish they used to call it, but it really is that now. The result is that fakery is growing. My impression is that, a long time ago in the golden age, no-one much cared which way the result went. Now it is different. The journals are complicit, demanding eye-catching stories. Everyone watches impact factor.
These days you can get away with copy-pasting images of the same gels, or repeating photographs of the same fish, or just copy-pasting rows of measurement data, or hell, just typing out the values that work. I hope and believe that this is still very much the minority pursuit of the most ambitious. But it casts a shadow on us all.
There has never been anything as good as hostile review in a blog like Climate Audit. Science should not be bullet proof. But as I have said before, for any good scientist, being corrected is not, or should not be, a badge of shame. Accepting a correction is a badge of honour. It is not, nor has it ever been, about winning an argument. It is all about finding the right answer.
In O level physics the teacher next door to mine* used to offer a Mars bar to anyone who spotted a mistake in his maths. A prize was claimed a few times a year. Perhaps journals should be compelled to offer something similar: 500 quid to anyone who could spot a refutation-sized error in one of its articles. I think that might get postgrads’ pencils out.
*Mine had flown Spitfires. We wasted many a lesson dragging an anecdote or six out of him rather than learning actual physics. But next door had Mars bars.
Nice article. I’m familiar with the corporate and software (embedded version) worlds, and indeed all you relate is an issue therein and a constant challenge. I have no solution to offer. But there is a sense of even worse problems, in that writ small (say corporate group-think) or writ large (full blown cultural bias), in bad cases emergent social process can get to the point of subconsciously marshalling all the biases that you mention (and many more) *to the same end*. And should the process spread far enough, there may not be anywhere left ‘solid’ enough so to speak, within an organisation or a government or whatever else, which might serve as a platform for fighting back. This is what happened at Enron, for instance. Some claim it has happened within climate science. Yet while such a worst case scenario would seem to be beyond help, it does have one saving grace. The main characteristics of group-think / cultural bias are completely independent of whatever area of expertise has become ‘infected’ with same. Hence outside expertise only in such characteristics is needed to highlight the issue, in theory. NOT expertise in software or financial markets or climate-science or whatever. So perhaps any area of endeavour large enough or important enough should have an audit related to such characteristics. Although there’s still the problem of… what happens when the field of cultural expertise itself also believes en-masse in a cultural narrative!
John, I strongly suspect that the differences we expose between academic peer review and QAR result from two main factors. First, no-one is paid to conduct peer review (or at least I wasn’t) so there are no professional standards, no supervision, no training. Second, I suspect that in industry there is a high degree of focus; what is being monitored or checked is limited. In peer review the material (research papers) can arrive for judgement upon a bewildering variety of topics. As an unpaid journal editor I commonly had great difficulty in finding suitably qualified reviewers and I still remember reserving one particular reviewer for manuscripts upon unusual topics. She was one of the very few who would tackle such topics and consistently would do a good job giving a fair and often helpful assessment to the authors.
I was constantly amazed at the isolation of university and academic researchers. I suppose that with the internet, much has changed. The best example I know of this concerned sequence stratigraphy – developed by Esso in the 1970s, an exploration of the history of the Earth that revealed sea-level changes. When this was first revealed at a presentation at a meeting, the presenter was asked why Esso was revealing these valuable ideas, because other organisations (other oil companies, most academia) had not a clue as to what Esso had been doing. It was immediately obvious that the techniques were immensely valuable to finding hydrocarbon traps. The Esso geologists were amazed that sequence stratigraphy was not widely known. They had discussed the concepts with Houston and other nearby geology departments, but clearly there it stayed.
There were no standards for peer review. You were chosen as a reviewer because you had previously published, and so would be expected to know what was expected of you. Poor reviewers were dropped and new ones sought. An editor would select three reviewers and make recommendations based upon these. Conflicting decisions from reviewers would be judged by the editor. So everything was done on a hit or miss basis, dependent upon the good will and fairness of all concerned. Easy though to see how the system could be subverted and degenerate into “pal review”.
>”These days you can get away with copy-pasting images…”
Indeed you can, and I wonder how many of the readers of this article spotted that the bird in the above picture had been photoshopped. We accept what we expect and we trust those who seem trustworthy. But the fact is that the crocodile bird is just a meme that we have all grown up to believe. I could have written an article on how this meme has developed, and why. Instead, I chose to go with it because it was very useful to me. It makes you think.
Finance is indeed an important issue here. In many respects it is unfair to make comparisons with industry, in which contractual arrangements can be used to secure the funding for QA. And if we insisted on scientific research always paying its way, what would happen to the important principle of pursuing knowledge for its own sake? On reflection, this may be the most important barrier to the introduction of effective QAR – a completely different financial model is needed that may just be completely at odds with what academic research is set up to achieve. We have moved on a long way from the days of the gentleman scientist, but there are, perforce, vestiges of amateurism in modern day science.
When my company performed Black Hat reviews of bid documents, it could afford twelve people in a room to spend at least a day listening to someone read from their proposed bid document, each person butting in with criticisms whenever they wanted to, and each trying to out-do the other in the ferocity of their onslaught. It sure as hell was effective in detecting weaknesses in the bid, but it was a very expensive process that needed to be included in the financial planning of the sales and marketing department. But we did it because the cost of the customer discovering the weaknesses in the bid was a lot more than the cost of removing them before submission. I’m not convinced that the same equation applies in scientific publication.
All of that said, even in industry it always seemed to be the QA budget that was cut first when money was tight.
I’m not surprised the alleged symbiotic relationship between Egyptian plovers and crocodiles is mythical. Crocodiles swallow small prey whole and tear the larger into chunks by holding on, rotating their own bodies, and swallowing those torn-off body parts whole. Teeth are designed to penetrate and hold on. They are widely spaced. If any prey parts did become wedged they would have to be large enough that a small bird like the plover would not be able to lift them out.
Secondly, Herodotus (the originator of the myth) has an epitaph that begins “Herodotus, the son of Sphynx, lies”.
John. The more I read about your work experiences, the more I recognise. For example, your Black Hat reviews had an analog at UEA for a time. Research proposals were presented in front of a School panel, ostensibly to aid in their improvement and to enhance their chances for success. This sometimes became a free-for-all, with some senior academics flexing their muscles. Tears were also shed, but bloody rows were more common.
Round about this time I stopped submitting proposals. My experience was that my best proposals (as judged by me) were repeatedly criticised by external reviewers and rejected, only later for me to discover those same proposals being researched by those self-same reviewers. I just refused to play those games and in my last few academic years I would work together with undergraduates who were doing their dissertations and either published with them, or if they didn’t wish to publish themselves, on my own acknowledging their input.
My experience working in oil companies as a technical expert was different again. I suppose I was responsible for imparting some form of quality control. I had perhaps been working with exploration geologists on different plays, feeding them ideas and previewing their presentations. At other times I sat the other side of the table, with the manager, chief geologist and others like him. Then I listened to proposals, and if asked gave my opinion. Decisions were entirely made by the Manager. He (for there never was a she in my time) was judged by results. He might disappear overnight, either booted upstairs to even bigger things, or was fired without warning. In one company I worked closely with a supreme being, the Chief Geologist of the entire company, and not just my Canadian bit. He disappeared without warning. It was a cruel world for some. Apart from technical advisors, like myself, I cannot recall my company doing any QA, but then that was back in the 1980s. Successful managers (and underlings) were lucky ones.
A logical question is what is the Crocodile planning?
Here are some insights on just that:
In engineering/aerospace we have PDRs and CDRs (Preliminary and Critical Design Reviews) – https://acqnotes.com/acqnote/acquisitions/critical-design-review
The difference from academia is that your company can lose money big time if someone (you) gets things wrong.
Having been through a few in my working life, they make peer review sound like a doddle (I was going to say “cake walk” until I looked it up).
DF. It depends upon your message. Try getting some climate scepticism past peer review for inclusion in Nature, Science or some consensus journal. Even with non-climate science, the unconventional will commonly be a difficult sell. Reviewers like what they are comfortable with and what they can believe in.
Amateur science has almost disappeared from published journals, even those that were specifically set up for that category. I lost count of papers originating from the Third World that were rejected by First World reviewers for not incorporating data from laboratories: laboratories not available to the authors. Our job as editors was to actively encourage participation from all quarters, but it often felt as if peer review was designed to discourage or deny such wider participation.