“Immaturity is the inability to use one’s own understanding without direction from another. This immaturity is self-incurred if its cause is not lack of understanding, but lack of resolve and courage to use it without another’s guidance.” — Immanuel Kant
What do you believe gives you confidence in your confidence? For a lot of people, the answer lies in the number of individuals who share their confidence. But can you be confident that this is a good basis for confidence in confidence? Would you be more confident that this was a good basis if enough people agreed with you that it was? Well, if you were an author for one of the IPCC’s Assessment Reports, you certainly would be – or you would if you took any notice of the IPCC’s guidance on this matter, the confidently titled “IPCC AR5 guidance note on consistent treatment of uncertainties: a common approach across the working groups”.
The IPCC’s guidance note is a document of central importance since it explains the process by which the AR authors are to arrive at the attributions of uncertainty that you read in the IPCC’s various proclamations – attributions such as: “Past emissions alone are unlikely to raise global-mean temperature to 1.5°C above pre-industrial levels but past emissions do commit to other changes, such as further sea level rise (high confidence)”. And, given how important the note is, I thought it might be worthwhile for me to take you through its main points, commenting along the way upon its various deficiencies and errors, and thereby explain to you why you should never be confident in the IPCC’s confidence.
The train not arriving at platform one
So let us start, as one always should, by defining one’s terms. Except, here we have a big problem, because the one thing that the AR5 guidance note signally fails to do is define its terminology. Yes, there are plenty of references to variables such as ‘confidence’, ‘risk’, ‘likelihood’ and ‘uncertainty’ but nowhere are these terms defined – even though the guidance note has the sole purpose of standardising upon how levels of said variables are to be described. But let us not allow ourselves to be put off by a little detail such as total lack of definition. Let’s crack on regardless. You have a long way to go yet, dear reader, so please don’t give up on me now when we haven’t even left the station.
In the absence of clear definition, we must instead start with what the guidance note has to say regarding the metrics for uncertainty:
“The AR5 will rely on two metrics for communicating the degree of certainty in key findings:
• Confidence in the validity of a finding, based on the type, amount, quality, and consistency of evidence (e.g., mechanistic understanding, theory, data, models, expert judgment) and the degree of agreement. Confidence is expressed qualitatively.
• Quantified measures of uncertainty in a finding expressed probabilistically (based on statistical analysis of observations or model results, or expert judgment).”
This may seem innocuous enough, but let me say straight away that there is no reason why confidence in the validity of a finding cannot be quantitatively assessed (I’ll explain how later) and there are plenty of non-probabilistic methods available to evaluate uncertainty (although I say this whilst labouring under the disadvantage of not knowing what the guidance note means exactly by ‘uncertainty’). Furthermore, there is here the first hint of a big problem: Why is the ‘degree of agreement’ deemed important as a metric for uncertainty?
We must agree to agree
After a short diversion into risk management issues, such as the importance of focusing upon high impact consequences that have low probabilities, the note gets down to the business of describing the process by which uncertainties are to be considered, evaluated and communicated. It is a guidance that includes a good deal of sensible advice, such as:
“Determine the areas in your chapter where a range of views may need to be described, and those where the author team may need to develop a finding representing a collective view. Agree on a moderated and balanced process for doing this in advance of confronting these issues in a specific context.”
Just how this fits in with the IPCC’s determination to establish consensus at all costs [1] is unclear but, rather than get bogged down by such matters, let us quickly move on to the heart of the matter: the definition of a so-called ‘calibrated language’ and how it should be used to convey levels of confidence. The following guidance is provided:
“Use the following dimensions to evaluate the validity of a finding: the type, amount, quality, and consistency of evidence (summary terms: ‘limited’, ‘medium’, or ‘robust’), and the degree of agreement (summary terms: ‘low’, ‘medium’, or ‘high’).”
These two dimensions (evidential weight and levels of agreement) are then used to construct a 3×3 matrix, with the bottom left cell representing the combination of ‘limited evidence’ and ‘low agreement’ – this is the cell corresponding to minimum confidence. The top right cell (‘robust evidence’ combined with ‘high agreement’) corresponds to maximum confidence. To standardise upon the expression of confidence, the guidance note says:
“A level of confidence is expressed using five qualifiers: ‘very low’, ‘low’, ‘medium’, ‘high’, and ‘very high’. It synthesizes the author teams’ judgments about the validity of findings as determined through evaluation of evidence and agreement.”
By this stage, if not before, you should definitely be asking why the “type, amount, quality, and consistency of evidence” is insufficient in itself to determine levels of confidence. Why is the extra dimension (degree of agreement) required?
Well, there is a very good answer to this – it isn’t, and one only has to consider two implications of the IPCC’s uncertainty matrix to recognize that something has gone horribly wrong. Firstly, one has to wonder how one can trust high levels of agreement when the evidence is limited. Secondly, one has to question what legitimacy there is to having low levels of agreement in the face of robust evidence.
In reality, the level of agreement is not an orthogonal variable that can be treated separately from the robustness of evidence. If the evidence is robust then disagreement should be low, or something very odd is happening. Moreover, it is only when the data is sparse, and expert opinion starts to serve as a substitute, that levels of agreement even become relevant. That said, even when the experts do agree, and this agreement is factored into the assessment of evidential weight, it still can’t be treated as a second dimension in the assessment of uncertainty.
Except, we keep coming back to a fundamental problem – the guidance note fails abjectly to define what it means by ‘uncertainty’. And as failures go, this is a humdinger. It leaves one with the nagging thought that perhaps the IPCC has dredged up a definition from somewhere, for which consensus can be used as a metric applying independently of evidential weight. Actually, that is precisely what I think it has done. For my next piece of evidence, m’lud, I offer you Chapter 2 of AR5: “Integrated Risk and Uncertainty Assessment of Climate Change Response Policies”. In section 2.6.2 we are offered definitions of uncertainty that include ‘paradigmatic uncertainty’ and ‘translational uncertainty’, such that:
“Paradigmatic uncertainty results from the absence of prior agreement on the framing of problems, on methods for scientifically investigating them, and on how to combine knowledge from disparate research traditions. Such uncertainties are especially common in cross-disciplinary, application-oriented research and assessment for meeting policy objectives (Gibbons, 1994; Nowotny et al, 2001).”
“Translational uncertainty results from scientific findings that are incomplete or conflicting, so that they can be invoked to support divergent policy positions (Sarewitz, 2010). In such circumstances, protracted controversy often occurs, as each side challenges the methodological foundations of the other’s claims in a process called ‘experimenters’ regress’ (Collins, 1985).”
You’ll not be surprised to learn that both of the above are rather niche definitions loitering in the bowels of the sociology of science. In both instances, emphasis is placed upon the extent to which dispute exists between social, political or ideological groups, and so to call them types of uncertainty is stretching the point somewhat. If it is uncertainty, then it is of a kind that can be reduced simply by eradicating or discrediting one of the groups concerned, and that should be obvious to the IPCC. In treating such dispute as a legitimate class of uncertainty, it is unsurprising that the IPCC should then treat consensus as an appropriate metric, thereby inviting the involvement of factors that have much more to do with politics, sociology and cultural bias than they do the objective evaluation of data. The IPCC matrix of uncertainty smacks of self-incurred immaturity, in which there is too much focus upon social cohesion and not enough focus upon the evidence.
Confidence measured objectively
But, if we are not to include consensus in our calculation of uncertainty, how do we then calculate confidence values based upon evidential weight alone? I’m glad you asked. Here is how: [2]
The first thing to appreciate is that the world is full of possibilities, and current evidence may simultaneously support belief in any or all of them. This ambivalence leads to uncertainty, but it is not the strength of evidence for a particular possibility that matters, it is the relative strength compared to the alternative possibilities. A possibility that is supported by several sets of evidence will be greater than a possibility that is only supported by one. As evidence is collected, uncertainty may be reduced if it further supports a promising possibility, but uncertainty will be increased if it supports a previously unsubstantiated idea.
This discordance can be analysed formally using a branch of statistics referred to as possibility theory. The range of possibilities supported by the available evidence is represented by a possibility distribution function (π), constructed by accruing evidential weighting. A possibility distribution is not to be confused with a probability density function (pdf), for which the probability for a given outcome, p(x is A), is determined by the area under the pdf for which the variable x lies within the proposed set of alternatives A. In contrast, the Possibility(x is A) is given by the highest value attained by the possibility distribution, as it covers the range for which x would lie within A. Furthermore, there is a complementarity in probability theory that does not exist in possibility theory. Whereas, in probability theory, p(x is A) + p(x is not A) = 1, the area under a possibility distribution may exceed 1 (although no single possibility value may do so). Nevertheless, a form of complementarity does exist in possibility theory, albeit by introducing the concept of necessity, i.e. an evidential weighting that is calculated by only taking into account the evidence that exclusively supports the proposition. Complementarity in possibility theory is then given by:
Possibility(x is A) = 1 – Necessity(x is not A)
Possibility(x is not A) = 1 – Necessity(x is A)
Finally, a measure of confidence can be calculated by considering the extent to which Possibility(x is A) differs from Possibility(x is not A), i.e.:
Confidence(x is A) = Possibility(x is A) – (1 – Necessity(x is A))
Confidence(x is A) = Possibility(x is A) + Necessity(x is A) – 1
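For those who prefer to see the arithmetic run, here is a minimal Python sketch of the calculation for a discrete set of rival hypotheses. It assumes that necessity is obtained from the complementarity relation given above, Necessity(x is A) = 1 – Possibility(x is not A). The hypothesis names and possibility values are invented purely for illustration, and the function names are my own rather than anything taken from the literature.

```python
# A sketch only: the hypotheses and their possibility values are invented.

def possibility(pi, subset):
    """Possibility(x is A): the highest value of pi attained within A."""
    return max((pi[x] for x in subset), default=0.0)

def necessity(pi, subset):
    """Necessity(x is A) = 1 - Possibility(x is not A)."""
    complement = set(pi) - set(subset)
    return 1.0 - possibility(pi, complement)

def confidence(pi, subset):
    """Confidence(x is A) = Possibility(x is A) + Necessity(x is A) - 1."""
    return possibility(pi, subset) + necessity(pi, subset) - 1.0

# Hypothetical possibility distributions over three rival explanations.
isolated_peak = {"h1": 1.0, "h2": 0.2, "h3": 0.1}    # h1 stands alone
crowded_terrain = {"h1": 1.0, "h2": 0.9, "h3": 0.1}  # h2 is nearly as well supported

for name, pi in [("isolated peak", isolated_peak), ("crowded terrain", crowded_terrain)]:
    print(name, round(confidence(pi, {"h1"}), 3))
```

Under these assumed values, confidence in h1 is high when no rival comes close and falls away sharply once a second hypothesis attracts almost as much evidential support, even though the support for h1 itself has not changed.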
So there you go. Confidence calculated without a single opinion poll in sight.
When reflecting upon the above, I like to think of a possibility distribution as evidential terrain, in which one would like one’s proposition to be sat upon a majestically isolated peak surrounded by a flat expanse. But a word of warning here. The evidential terrain is not the territory – it is the map. Furthermore, it is a map that has been drawn up by explorers who may not have visited all areas, and so vital high-ground may be missing. It is easy to be confident in a homespun proposition if one steadfastly stays at home.
Alternatively, if you don’t feel comfortable abandoning probability theory, you may be interested to learn that you can calculate the uncertainty (H) represented by a given probability distribution using the formula:
H = – Σ p × ln(p)
It is no coincidence that this is Shannon’s equation for the calculation of entropy, since the concept of entropy is based upon the number of possible configurations that a system may adopt.
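As a quick illustration, the following Python sketch evaluates the formula using natural logarithms, so that H is expressed in nats. The function names and the sample probabilities are my own choices for the purpose of the example. For a proposition with only two outcomes (it happens or it does not), H is greatest when both are equally probable and shrinks towards zero as either outcome approaches certainty.

```python
import math

def shannon_entropy(probabilities):
    """H = -sum(p * ln(p)) in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

def binary_entropy(p):
    """Entropy of a two-outcome proposition that is true with probability p."""
    return shannon_entropy([p, 1.0 - p])

# Illustrative probabilities only.
for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(f"p = {p:.2f}  H = {binary_entropy(p):.3f} nats")
```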
So, whether one is focused upon probability or possibility, one still has a means of quantifying confidence simply by analysing the pattern of evidence. There is absolutely no need to introduce a measure that captures the level of dispute between competing political, social or ideological groups. It isn’t the agreement of competing ideologues that matters, it is the agreement of data. If you attempt a calculation of confidence that introduces the former, you will only obscure what the latter is already ably telling you. Needless to say, none of the above methods for calculating confidence (i.e. methods based purely upon evaluation of evidence) feature in either AR5 or its guidance note.
Probability to the rescue?
I have written thus far, at some length, of the IPCC’s mishandling of the concept of confidence when it is used as a metric for uncertainty. However, you may recall that the guidance note had referred to the existence of two available metrics, the second being a quantified, probabilistic measure “based on statistical analysis of observations or model results, or expert judgment.” Unfortunately, at this point the guidance simply introduces more confusion, in which the concepts of uncertainty and likelihood are conflated. Consequently, rather than probability being related to uncertainty, as per Shannon’s equation, it becomes its synonym. Furthermore, in seeking a ‘calibrated language’ for the expression of probability, quantified levels become arbitrarily defined, thus:
- 99–100% probability = Virtually certain
- 90–100% probability = Very Likely
- 66–100% probability = Likely
- 33–66% probability = About as likely as not
- 0–33% probability = Unlikely
- 0–10% probability = Very Unlikely
- 0–1% probability = Exceptionally Unlikely
Nowhere in the accompanying text is there any attempt to explain how levels of likelihood are related to uncertainty. For example, there is nothing to explain that the point of maximum probabilistic discord (‘About as likely as not’) corresponds to maximum epistemic uncertainty. [3] In fact, since the table refers to the likelihood of something happening, it is risk that becomes the more relevant concept – not uncertainty. To add to the confusion, the likelihood categories are defined using extended ranges of probability (thereby introducing imprecision) that overlap (thereby introducing ambiguity). The result is a conceptual dog’s dinner that leaves the reader unable to discern whether the IPCC is advocating risk aversion or uncertainty aversion, or indeed appreciates the distinction. I’m afraid that, in the hands of the IPCC, there is no redemption to be found in quantified probability. [4]
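The ambiguity created by the overlapping ranges is easy to demonstrate. The short Python sketch below encodes the seven categories exactly as listed above and returns every term whose range contains a given probability. Treating both ends of each range as inclusive is my own assumption, as are the names used in the code.

```python
# The IPCC likelihood scale as listed above: (term, lower bound, upper bound).
# Treating both bounds as inclusive is an assumption made for this sketch.
LIKELIHOOD_SCALE = [
    ("Virtually certain", 0.99, 1.00),
    ("Very Likely", 0.90, 1.00),
    ("Likely", 0.66, 1.00),
    ("About as likely as not", 0.33, 0.66),
    ("Unlikely", 0.00, 0.33),
    ("Very Unlikely", 0.00, 0.10),
    ("Exceptionally Unlikely", 0.00, 0.01),
]

def matching_terms(probability):
    """Return every calibrated term whose range contains the given probability."""
    return [term for term, low, high in LIKELIHOOD_SCALE if low <= probability <= high]

# Illustrative probabilities only.
for p in (0.995, 0.95, 0.5, 0.05):
    print(p, matching_terms(p))
```

Only the middle category is unambiguous. A probability of 0.95 qualifies as both ‘Very Likely’ and ‘Likely’, so the term chosen by an author tells the reader less than the table implies.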
Authority has the final word
It is disconcerting enough that a document that sets the standard by which uncertainty shall be evaluated and communicated by the IPCC should attribute such importance to consensus. But it is no less disconcerting to see that it even fails to make a clear distinction between the concepts of risk and uncertainty and how they are related. It is tempting to speculate that, if the authors had taken the time to define their terminology, perhaps such confusion and sleight of hand could have been avoided. The irony is that, given the IPCC’s misplaced reverence towards consensus, the guidance note has been readily adopted, complete with its illogic and conceptual confusion.
The BBC in its magnificence would do well to invite an independent expert on to one of its self-esteemed programmes to discuss the IPCC’s treatment of uncertainty, placing it under the scrutiny I have applied here. But I can’t see that happening. The very fact that the expert would be an outsider, i.e. somebody that Evan Davis knows isn’t one of the IPCC scientists, would mean that (in the eyes of the BBC) the individual speaks with no authority and so couldn’t possibly know what they were talking about. I would obviously prefer to think that I do know what I am talking about, but not because Mr Davis might agree so – may I be so immodest as to propose that it could be because the evidence suggests that I do? It’s just a shame that evidence carries so little weight nowadays.
[1] See paragraph 10 of “Procedures Guiding IPCC Work”, which states that “In taking decisions, and approving, adopting and accepting reports, the Panel, its Working Groups and any Task Forces shall use all best endeavors to reach consensus.”
[2] A full explanation, containing several diagrams to help you visualize what I am saying, may be found in this excellent paper.
[3] In fact, the guidance note warns against treating the probabilistic discord as a measure of epistemic uncertainty, leaving the reader to speculate that perhaps the IPCC considers uncertainty in model predictions to be simply a matter of inherent variability.
[4] To save the reader from further grief, I will not pontificate upon the linguistic vagueness one always invites when using degree adjectives such as ‘likely’, and how arbitrary boundaries are just a futile attempt to guard against the sorites paradox.