One of the central ideas underpinning climate change projection is that of the climate model ensemble. The thinking behind an ensemble is very simple. If one takes a number of models, each of which cannot be assumed to be a faithful and complete representation of the system being modelled, then the average of their projections is likely to be closer to the truth than any single one. Furthermore, the spread of projections can be used to represent the uncertainty associated with that average. This entails a well-established statistical approach in which the spread is assumed to take the form of a probability density function.
However, nobody should underestimate the difficulties to be overcome by the climate modellers, since the models concerned will be subject to a number of parametric and structural uncertainties. Furthermore, the tuning of the models, and the basis for their inclusion and weighting within the ensemble, can be more of a black art than a science. Despite this, it is assumed that the resulting uncertainties are nevertheless captured by the properties of the probability distribution. The purpose of this article is to convince you that the problems are too deep-seated for that to be the case, since the uncertainties can actually invalidate the statistical methodology employed to analyse the ensemble. In reality, the distribution of projections is highly unlikely to form a true probability density function and so should not be treated as such.
To make my case I will be dividing my argument into two parts. In the first, I will introduce some basics of uncertainty analysis and explain how their consideration has led to the development of guidelines for the handling of uncertainty in system modelling. There will be no reference to climate science in this first part since the guidelines have general application. In the second part, I will examine the extent to which the guidelines have been embraced by the climate science community and explain the implications for evaluation of climate model ensembles. Finally, having made my case, I will briefly discuss the origins and significance of the climate science community’s mishandling of uncertainty analysis and what this may mean for the validity of climate change policy. The central premise is that no policy, least of all one associated with something as hugely important as climate change management, should be based upon the misapplication of statistical technique.
The background issues
Fundamental to an understanding of uncertainty is the ability to answer the following question: Does the uncertainty regarding the likely future state of a system have as its basis the inherent variability of that system, or is it due to gaps in our understanding of the causation behind the variability? To the extent that variability by itself can lead to uncertainty, one will be in the province of stochastic physics. Here the mathematics of randomness will apply, with the caveat that one may be dealing with a chaotic system in which the randomness stems from deterministic processes that are critically sensitive to initial and boundary conditions. If the processes involved are fully understood, and those conditions have been quantified with maximum precision, then the residual uncertainty will be both objective and irreducible. Such uncertainty is sometimes known as aleatory uncertainty, from the Latin for a gambler’s die, alea.[1]
In practice, however, the basis for the variability may not be fully understood. These gaps in knowledge lead to a fundamentally different form of uncertainty called epistemic uncertainty. Such uncertainty is subjective, since different parties may have differing perspectives that bear upon their understanding of the variability. It can be reduced simply by filling the gaps in knowledge. As a side-effect of the reduction of epistemic uncertainty, a consensus will emerge. However, care has to be taken when using consensus as a metric to measure levels of epistemic uncertainty, since there are also non-epistemic processes that can lead to the development of consensus.[2]
A simple example that can be used to illustrate the distinction between aleatory and epistemic uncertainty would be a system in which variability results from the random outcome of a thrown die. For a die of a known number of sides, the average score over time is determined by well-understood probability theory and it is a simple matter to determine the appropriate probability distribution. Furthermore, the uncertainty of outcome is both objective and irreducible – it is, after all, the titular aleatory uncertainty. However, now consider a situation in which the number of sides of the die is unknown. In this example the range of possibilities for the average score over time is greater, and there is both an epistemic and an aleatory component to the uncertainty. It remains the case that the aleatory component can be defined using a probability density function, but the same cannot be said of the epistemic component since it represents an entirely deterministic situation (i.e. it is a fixed number that is simply unknown). A set of probabilities can be provided representing the likelihoods for the various possible numbers of sides, but there can be no guarantee that the full state space is represented by such a set and there is no basis for assuming that stochastic logic applies to their allocation (as it would for the aleatory). For that reason the amalgamation of the epistemic and aleatory uncertainties cannot be represented by a true probability density function, or at the very least, such a function would not be as meaningful as that used for the aleatory component on its own.
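The die example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate side counts and the equal weights given to them are hypothetical choices made for the sketch, and the function names are my own.

```python
import random
import statistics


def long_run_mean(sides: int) -> float:
    """Exact long-run average score of a fair die with `sides` faces."""
    return (sides + 1) / 2


def simulated_mean(sides: int, throws: int, rng: random.Random) -> float:
    """Average score over simulated throws (aleatory variability only)."""
    return statistics.fmean(rng.randint(1, sides) for _ in range(throws))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Case 1: a known six-sided die. The only uncertainty is aleatory, and the
# simulated average settles close to the theoretical value of 3.5.
known_mean = simulated_mean(6, 100_000, rng)

# Case 2: the number of sides is a fixed but unknown fact. The candidate list
# is an assumption of the analyst, and nothing guarantees it is complete.
candidate_sides = [4, 6, 8, 12, 20]  # hypothetical candidates
conditional_means = {n: long_run_mean(n) for n in candidate_sides}
mean_range = (min(conditional_means.values()), max(conditional_means.values()))

# Blending the candidates with equal subjective weights gives a single number,
# but it is an average over opinions, not over states of the die.
blended_mean = statistics.fmean(conditional_means.values())

print(f"known die, simulated average: {known_mean:.3f}")
print(f"unknown die, possible long-run averages: {conditional_means}")
print(f"range of possibilities: {mean_range}")
print(f"equal-weight blend: {blended_mean}")
```

With these candidates the blend comes out at 5.5, which is the long-run average of none of the dice under consideration. The range of conditional averages is the more honest summary of the epistemic component.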
You will also note that in both situations the likelihood (i.e. for the average score of the known die, and for the number of sides of the unknown die) has been characterised by the allocation of probabilities. This is a fundamental difficulty that lies at the heart of uncertainty analysis, since probability serves the dual purpose of quantifying both variability and incertitude (between which it does not discern). Consequently, any mathematical technique for analysing uncertainty that is basically probabilistic will suffer from the same drawback, i.e. it runs the risk of conflating variability and incertitude. Yet, as I have just pointed out, only the former lends itself to a stochastically determined probability distribution. It is highly likely, therefore, that the application of such a probabilistic technique will lead to erroneous results if the epistemic and aleatory components have not been isolated prior to the positing of a probability distribution representing the overall uncertainty. As Professors Armen Der Kiureghian and Ove Ditlevsen have noted in the context of time-variant, structural reliability modelling:
The distinction between aleatory and epistemic uncertainties is determined by our modeling choices. The distinction is useful for identifying sources of uncertainty that can be reduced, and in developing sound risk and reliability models. It is shown that for proper formulation of reliability, careful attention should be paid to the categorization (epistemic, aleatory, ergodic or non-ergodic) of uncertainties. Failure to do so may result in underestimation or overestimation of failure probability, which can be quite significant (orders of magnitude) in certain cases.
‘Aleatory or epistemic? Does it matter?’, 2007
In real life, most systems are a lot more complicated than the throwing of a single n-sided die. Typically, a system’s mathematical model will have many free variables (i.e. parameters and boundary conditions) of uncertain value that will determine any projection based upon it. To handle this complexity, modellers employ a technique referred to as Monte Carlo simulation in which multiple runs of the model are aggregated to create a statistical spread of projections. Each run is based upon a random sampling of the possible parametric or boundary condition values allowed for by the model. When performing the sampling, the likelihood of selecting any particular value for a given variable is determined by the probability distribution that applies to that variable’s uncertainty. Note, however, that I used the term ‘random sampling’ here. This is because Monte Carlo simulation is a technique originally developed for the analysis of aleatory uncertainty (it finds widespread use in the field of stochastic physics). When the variable’s uncertainty is epistemic, however, random sampling is inappropriate because the actual value is subject to deterministic incertitude and not stochastic variability. For such variables, one would be sampling possible opinions of experts rather than variable states of the system. But Monte Carlo simulation doesn’t care. Give it a set of probabilities expressing a variable’s uncertainty and it will crunch those numbers merrily and provide you with a very aleatoric-looking output. That doesn’t mean that it can be relied upon, however.
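A toy sketch may help to show what ‘doesn’t care’ means in practice. Everything here is an assumption made for illustration: the model is a simple product of two inputs, `forcing` stands in for a genuinely variable quantity, `gain` stands in for a fixed but unknown parameter, and the uniform range given to `gain` represents nothing more than the analyst’s opinion.

```python
import random
import statistics


def toy_model(gain: float, forcing: float) -> float:
    """Hypothetical system model: the output is simply gain times forcing."""
    return gain * forcing


def pooled_monte_carlo(runs: int, rng: random.Random) -> list[float]:
    """Single-loop Monte Carlo that samples both inputs as if both were random."""
    outputs = []
    for _ in range(runs):
        forcing = rng.gauss(1.0, 0.1)  # aleatory: really does vary from run to run
        gain = rng.uniform(1.0, 3.0)   # epistemic: one true value, range is opinion
        outputs.append(toy_model(gain, forcing))
    return outputs


rng = random.Random(42)  # fixed seed so the sketch is repeatable

pooled = pooled_monte_carlo(50_000, rng)
pooled_mean = statistics.fmean(pooled)
pooled_sd = statistics.stdev(pooled)

# Suppose the true (but unknown to the analyst) gain is 3.0. The real system
# then varies only because the forcing varies.
true_gain = 3.0
actual = [toy_model(true_gain, rng.gauss(1.0, 0.1)) for _ in range(50_000)]
actual_mean = statistics.fmean(actual)
actual_sd = statistics.stdev(actual)

print(f"pooled simulation: mean {pooled_mean:.2f}, sd {pooled_sd:.2f}")
print(f"actual system:     mean {actual_mean:.2f}, sd {actual_sd:.2f}")
```

The pooled output is a perfectly smooth-looking distribution centred near 2, whereas the system in this sketch actually behaves as a much narrower distribution centred near 3. The simulation has reported a blend of the analyst’s ignorance and the system’s variability without giving any indication of which is which.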
The restrictions on the applicability of Monte Carlo simulation are well-known and have led to written guidelines to be followed by mathematical modellers working within the field of risk assessment. Here, for example, is how the US Environmental Protection Agency (EPA) has put it:
Monte Carlo simulation also has important limitations, which have restrained EPA from accepting it as a preferred risk assessment tool. Available software cannot distinguish between variability and uncertainty. Some factors, such as body weight and tap water ingestion, show well-described differences among individuals. These differences are called ‘variability’. Other factors, such as frequency and duration of trespassing, are simply unknown. This lack of knowledge is called ‘uncertainty’.[3] Current Monte Carlo software treats uncertainty as if it were variability, which may produce misleading results.
‘Use of Monte Carlo simulation in risk assessments’, 1994
And here is what Professor Susan R. Poulter has to say regarding the guidelines issued by the US National Academy of Sciences (NAS):
In ‘Science and Judgment in Risk Assessment’, the NAS noted the problems with incorporation of subjective assessments of model uncertainty into probability distribution functions, suggesting that such quantitative analyses of model uncertainty be reserved for priority setting and risk trading. For standard setting, residual risk determinations and risk communication, however, the NAS recommended that separate analyses of parameter uncertainty be conducted for each relevant model (rather than a single hybrid distribution), with additional reporting of the subjective probability that each model is correct.[4]
‘Monte Carlo simulation in environmental risk assessment – Science, policy and legal issues’, 1998
None of this is to say that Monte Carlo simulation is a bad idea or a technique that should never be used. On the contrary, it can be an invaluable tool when analysing the effects of variability in essentially stochastic systems. The problems only emerge when epistemic uncertainty features in the system’s mathematical model and this is not properly isolated, and allowed for, when assessing the extent to which the probability distributions produced by the simulation fully capture the uncertainty.
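One way of keeping the two kinds of uncertainty apart, in the spirit of the separate per-model analyses described above, is to nest the simulation: an outer loop steps through candidate values of the epistemic variable, and an inner loop samples only the aleatory one. This arrangement is sometimes called two-dimensional or second-order Monte Carlo. The sketch below is a minimal illustration rather than a recommended implementation; the toy model, the candidate `gain` values and the helper names are all hypothetical.

```python
import random


def toy_model(gain: float, forcing: float) -> float:
    """Hypothetical system model: the output is simply gain times forcing."""
    return gain * forcing


def aleatory_runs(gain: float, runs: int, rng: random.Random) -> list[float]:
    """Inner loop: sample only the genuinely variable input for a fixed gain."""
    return [toy_model(gain, rng.gauss(1.0, 0.1)) for _ in range(runs)]


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank style percentile, adequate for a sketch."""
    ordered = sorted(values)
    return ordered[round(q * (len(ordered) - 1))]


rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Outer loop: candidate values for the fixed-but-unknown parameter. The list is
# the analyst's assumption and may not cover the true value.
candidate_gains = [1.0, 1.5, 2.0, 2.5, 3.0]

per_candidate = {}
for gain in candidate_gains:
    runs = aleatory_runs(gain, 20_000, rng)
    per_candidate[gain] = (percentile(runs, 0.05), percentile(runs, 0.95))

# Report a family of 90% ranges, one per candidate, plus the envelope that
# contains them all, instead of a single hybrid distribution.
envelope = (
    min(low for low, _ in per_candidate.values()),
    max(high for _, high in per_candidate.values()),
)

for gain, (low, high) in per_candidate.items():
    print(f"gain {gain}: 90% range {low:.2f} to {high:.2f}")
print(f"envelope: {envelope[0]:.2f} to {envelope[1]:.2f}")
```

Each 90% range here is a statement about variability conditional on one assumption about the unknown parameter. The envelope makes no claim about which assumption is more probable, which is the point: the epistemic component is reported as a set of possibilities rather than being folded into a probability distribution.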
So, the key questions are these: Is there any indication that these issues and the guidelines arising have been taken on board by the climate scientists? Have they understood that ensemble statistics can be inappropriate or very misleading when epistemic uncertainties are present? And if they haven’t, can we trust the uncertainties they attribute to climate model projections, particularly when model ensembles are involved?
The relevance to climate science
In fact, there is plenty of evidence that the issues are familiar to the community of climate modellers. For example, there is the following from Professor Jeroen Pieter van der Sluijs, of the University of Bergen, regarding the uncertainties associated with climate sensitivity:
Being the product of deterministic models, the 1.5°C to 4.5°C range is not a probability distribution. There have, nevertheless, been attempts to provide a ‘best guess’ from the range. This has been regarded as a further useful simplification for policy-makers. However, non-specialists – such as policy-makers, journalists and other scientists – may have interpreted the range of climate sensitivity values as a virtual simulacrum of a probability distribution, the ‘best guess’ becoming the ‘most likely’ value.
‘Anchoring amid uncertainty’, 1997
The above quote makes it clear that the problem may lie not with the modellers but with how various stakeholders and pundits choose to simplify the situation and read into it a scientific rigour that simply is not there. A similar point is made by Gavin Schmidt, climate modeller and Director of the NASA Goddard Institute for Space Studies (GISS) in New York:
Collections of the data from the different groups, called multi-model ensembles, have some interesting properties. Most notably the average of all the models is frequently closer to the observations than any individual model. But does this mean that the average of all the model projections into the future is in fact the best projection? And does the variability in the model projections truly measure the uncertainty? These are unanswerable questions.
‘Climate models produce projections, not probabilities’, 2007
Actually, the questions are not unanswerable, and the answer is ‘no’. It is Schmidt himself who provides the reasons when he goes on to say:
Model agreements (or spreads) are therefore not equivalent to probability statements. Since we cannot hope to span the full range of possible models (including all possible parameterizations) or to assess the uncertainty of physics about which we so far have no knowledge, hope that any ensemble range can ever be used as a surrogate for a full probability density function of future climate is futile.
As with Professor van der Sluijs, Schmidt implies that the pressure to ignore the issue comes from outside the community of climate modellers. Schmidt continues:
Yet demands from policy makers for scientific-looking probability distributions for regional climate changes are mounting, and while there are a number of ways to provide them, all, in my opinion, are equally unverifiable. Therefore, while it is seductive to attempt to corner our ignorance with the seeming certainty of 95-percent confidence intervals, the comfort it gives is likely to be an illusion.
Professor Eric Winsberg of the University of South Florida, a philosopher who specialises in the treatment of uncertainty in mathematical models, could not have made the problem with climate model ensembles any clearer when he said:
Ensemble methods assume that, in some relevant respect, the set of available models represent something like a sample of independent draws from the space of possible model structures. This is surely the greatest problem with ensemble statistical methods. The average and standard deviation of a set of trials is only meaningful if those trials represent a random sample of independent draws from the relevant space – in this case the space of possible model structures. Many commentators have noted that this assumption is not met by the set of climate models on the market… Perhaps we are meant to assume, instead, that the existing models are randomly distributed around the ideal model, in some kind of normal distribution, on analogy to measurement theory. But modeling isn’t measurement, and so there is very little reason to think this assumption holds.
‘Values and uncertainties in the predictions of global climate models’, 2012
This quote alludes to a problem that goes well beyond that of basic Monte Carlo simulation and its applicability. The ensembles to which Winsberg refers are not just ensembles of models in which differing parameters and boundary values are sampled from the range of possibilities — they are models that do not even have the same structure! To put this state space problem in perspective, consider the following quote from Professor Ken Carslaw:
But when it comes to understanding and reducing uncertainty, we need to bear in mind that a typical set of, say, 15 tuned models is essentially like selecting 15 points from our million-cornered space, except now we have 15 hopefully overlapping, million-cornered spaces to select from.
‘Climate models are uncertain, but we can do something about it’, 2018
If that does not invalidate the interpretation of the ensemble output as a probability density function, then I do not know what could.[5]
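Winsberg’s objection about independent draws can be illustrated with a toy ensemble. The sketch assumes, purely for illustration, that 15 hypothetical models share a common structural bias and differ from one another only by small individual offsets; none of the numbers are taken from real climate models.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

truth = 3.0        # hypothetical true value of the quantity being projected
shared_bias = 0.8  # hypothetical error common to every model in the ensemble
members = 15

# Each model's projection: truth, plus the shared bias, plus its own small offset.
projections = [truth + shared_bias + rng.gauss(0.0, 0.3) for _ in range(members)]

ensemble_mean = statistics.fmean(projections)
standard_error = statistics.stdev(projections) / members ** 0.5

# Naive 95% interval that treats the models as independent measurements of the
# truth (normal approximation).
naive_interval = (
    ensemble_mean - 1.96 * standard_error,
    ensemble_mean + 1.96 * standard_error,
)
truth_is_covered = naive_interval[0] <= truth <= naive_interval[1]

print(f"ensemble mean: {ensemble_mean:.2f}")
print(f"naive 95% interval: {naive_interval[0]:.2f} to {naive_interval[1]:.2f}")
print(f"interval contains the truth: {truth_is_covered}")
```

The spread of the ensemble measures only how much the models disagree with each other. It carries no information about the error they have in common, so the interval is narrow, confident and wrong.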
And yet, despite the guidance provided by the likes of the EPA and NAS, and despite the fact that the problem is known even to the climate modelling community, the practice of treating the spread of outputs from a climate model ensemble as if it were a probability density function resulting from a measurement exercise is widespread and firmly established. It seems the “demands from policy makers for scientific-looking probability distributions” have been enthusiastically fulfilled by those who should, and quite possibly do, know better. Take, for example, the following figure to be found in the IPCC’s AR5, WG1, ‘The Physical Science Basis’:
[Figure from IPCC AR5 WG1 not reproduced here.]
Note that both variability and uncertainty are treated equivalently here, as are the uncertainties relating to historical records and model-based projections. Furthermore, the claim regarding the portrayal of ‘90% uncertainty ranges’ indicates that everything is being treated as a probability distribution that adequately captures and quantifies the uncertainty. Also note that this is a diagram that appears in the report’s introduction and so serves the political purpose that was once the preserve of the hockey stick graph. As an eye-catching graphic, it is very successful. As a scientific representation of uncertainty, it falls well short. It does indeed appear that the policy makers have got what they wanted – a scientific-looking portrayal of uncertainty that appeals to their schoolday training in statistics.
Origins and implications
To fully appreciate what is going on here one has to understand that the idea of climate modelling and the use of model ensembles has its genesis in stochastic physics and the pioneering work of the likes of Suki Manabe (recent recipient of the Nobel Prize) and Professor Tim Palmer, Royal Society Research Professor in the Department of Physics, Oxford University, and author of the recently published ‘The Primacy of Doubt’. Both are pre-eminent in their field, both are experts in physical variability (Palmer explains that an alternative book title he had considered was ‘The Geometry of Chaos’). Yet neither has shown any signs of acknowledging that not every uncertainty in life can be characterised by a probability density function.
In his book, Palmer explains at some length the physical and mathematical basis for our climate’s variability and how it can be modelled, but not once does he mention the words ‘epistemic’ or ‘aleatory’ in the context of model uncertainty.[6] He also sets out how Monte Carlo simulation can be used to analyse climate model ensembles, but not once does he acknowledge its limitations or refer to any of the published guidelines for its applicability. And then, on page 121, he reproduces a histogram[7] illustrating the numbers of climate models that are predicting the various possible values for climate sensitivity (ECS), and for good measure he superimposes a probability density function to make clear the uncertainty indicated by the histogram! Why would a brilliant physicist commit such a basic gaffe? Why would he resort to an aleatoric representation of epistemic uncertainty?
Well, maybe because he actually is a brilliant physicist and not an expert in uncertainty analysis. He knows so much about the mathematics of stochastic physics and how it affects climate variability that he just couldn’t resist reaching for an aleatoric representation. It would have seemed to him the most natural thing to do.
And herein lies the rub. It isn’t just the policy makers and their demands for simplistic and familiar representations of uncertainty that are to blame. In many instances it is the scientists themselves who are making the mistake. And who is going to challenge someone such as Professor Palmer – someone who can lay claim to having turned down an offer to work alongside Stephen Hawking? When he suggests in his book that it is always okay to calculate probabilities by counting the number of climate models making a particular prediction and then dividing by the total number of models, you have a seemingly massive expert authority to challenge. Yes, he is a brilliant scientist, but does that mean he is fully conversant with the philosophy of uncertainty and how it bears upon the validity of methodology?[8]
Palmer makes a very good point in his book: It is vital in matters such as climate change that the uncertainties are not underplayed or overplayed. Unfortunately, however, underplaying the uncertainty is exactly what he does when he presents figures such as that on page 121. He superimposes a probability density function to portray the uncertainty but he fails to acknowledge that such a curve can only be a simulacrum of a probability distribution. It couldn’t be otherwise, because it doesn’t come anywhere close to reflecting the epistemic uncertainties affecting the models and the process of selection for the ensemble.
An understated uncertainty, of course, contributes to the narrative that the science is settled, and a settled science is what the policy makers demand. Scientists who treat everything as though the uncertainty is physically inherent rather than characteristic of gaps in their own knowledge are either wittingly or unwittingly playing into the hands of those who think that objectivity is what the science is all about. How this bears upon policy is very much a matter of how one views the importance of uncertainty. Those who would advocate the precautionary principle would argue that knowing the uncertainty has been understated is all the more reason for drastic action. Those who advocate that risk-based decisions should not be made when the uncertainty is high would argue that knowing the uncertainty has been understated is all the more reason instead to employ Robust Decision Making (RDM); in which case, a multi-trillion dollar commitment founded on a 15-point sample of a ‘million-cornered state space’ might not seem such a bright idea. All I will say is that an uncertainty analysis cannot possibly benefit from the misapplication of statistical technique, and so I tend to view the IPCC’s quantification of climate model ensemble uncertainties with a very healthy dose of empirical scepticism.
As a final word of warning, the fact that the uncertainty has been underestimated doesn’t automatically mean that the spread of projections is too narrow; it simply means that no reliable conclusion can be drawn from it because no probabilities can be inferred. This is not a view shared by the majority of climate scientists, however. Climate model ensembles are here to stay, and the illusion of objectivity that a probability density function gives the scientists goes hand in hand with the misconception that they can be certain how uncertain they are.
[1] Hence the phrase attributed to Julius Caesar when crossing the Rubicon, ‘alea iacta est’, meaning ‘the die is cast’.
[2] This problem is discussed further in my article ‘The Confidence of Living in the Matrix’.
[3] Note that the EPA uses the term ‘uncertainty’ for ‘incertitude’. They are actually referring to epistemic uncertainty as opposed to aleatory uncertainty (i.e. variability).
[4] That said, the NAS is quite critical of the way in which the EPA analyses risk, accusing it of being too qualitative in its approach. On the other hand, in its own devotion to quantification, the NAS quite happily advocates the combination of subjective and objective probability into a hybrid distribution using Monte Carlo simulation, albeit for limited purposes. The obvious drawbacks of this are treated as acceptable and manageable difficulties.
[5] The distinction being made here is between two basic types of ensemble. The first (the Perturbed Physics Ensemble) is not as problematic as the second (the Multi-Model Ensemble).
[6] He does, however, use the term ‘epistemic uncertainty’ in the context of quantum mechanics when he refers to the debate as to whether quantum uncertainty is subjective or inherent. This is, of course, the famous Einstein/Bohr debate that was resolved with Alain Aspect’s experimental confirmation of the violation of Bell’s Inequality. Interestingly, Palmer does not use the term ‘aleatory’ anywhere, despite Einstein’s famous quote about God not playing dice.
[7] My apologies for not being able to reproduce the figure in my own article.
[8] If Professor Palmer does understand that there is a distinction to be made between incertitude and variability, he doesn’t seem to make much of it. The most revealing passage in his book is an attempt to explain what is meant by predicting an 80% probability for a future weather event. Amongst other things, he stresses that it does not mean that 80% of weather forecast providers think that the event will happen. And yet when it comes to the climate modellers’ multi-model ensembles, the calculus of consensus is very much the order of the day! Ignoring this, Palmer offers a purely frequentist interpretation of the probability, suggesting that he really does believe that modelling is measurement.
Referenced material and further reading
Distinguishing two dimensions of uncertainty, Craig R. Fox and Gülden Ülkümen, Perspectives on Thinking, Judging, and Decision Making, Oslo, Norway: Universitetsforlaget, 2011.
Uncertainty Management in Reliability/Safety Assessment. In: Reliability and Safety Engineering. Springer Series in Reliability Engineering, vol 0. Springer, London. https://doi.org/10.1007/978-1-84996-232-2_11
Different methods are needed to propagate ignorance and variability, Scott Ferson and Lev R. Ginzburg, Reliability Engineering and System Safety 54, 1996.
Aleatory or epistemic? Does it matter?, Armen Der Kiureghian and Ove Ditlevsen, Special Workshop on Risk Acceptance and Risk Communication, March 26-27, 2007, Stanford University.
Understanding aleatory and epistemic parameter uncertainty in statistical models, N.W. Porter and V. A. Mousseau, Best Estimate Plus Uncertainty (BEPU) International Conference, 2020.
The role of epistemic uncertainty in risk analysis, Didier Dubois, Lecture Notes in Computer Science, LNAI Volume 6379.
What Monte Carlo Methods cannot do, Scott Ferson, Human and Ecological Risk Assessment, Volume 2, 1996 – Issue 4.
Guiding principles for Monte Carlo Analysis, EPA/630/R-97/001, March 1997.
Science Policy Council Handbook: Risk Characterization EPA Office of Science Policy, Office of Research and Development, EPA 100-B-00-002, 2000.
Monte Carlo simulation in environmental risk assessment – Science, policy and legal issues, Susan R. Poulter, Risk: Health, Safety and Environment, Volume 9 Number 1 Article 4, 1998.
Science and judgment in risk assessment: Chapter 9 Uncertainty, National Academies Press, 1994.
Uncertainty analysis methods, S. Mahadevan and S. Sarkar, Consortium for Risk Evaluation, 2009.
Anchoring amid uncertainty, Jeroen Pieter van der Sluijs, 1997.
Climate models produce projections, not probabilities, Gavin Schmidt, Bulletin of the Atomic Scientists, 2007.
Values and uncertainties in the predictions of global climate models, Eric Winsberg, Kennedy Institute of Ethics Journal, Volume 22, Number 2, June 2012.
Climate models are uncertain, but we can do something about it, K.S. Carslaw et al, Eos.org, 2018.
‘The Primacy of Doubt: From Quantum Physics to Climate Change, How the Science of Uncertainty Can Help Us Understand Our Chaotic World’, Tim Palmer, ISBN-10: 1541619714, 2022.
Uncertainty in regional climate modelling: A review, A. M. Foley, National University of Ireland, Maynooth, Republic of Ireland, Progress in Physical Geography 34(5) 647–670, 2010.
Statistical analysis in climate research: Introduction, Hans von Storch and Francis W. Zwiers, Cambridge University Press, 1999.
Multi-model ensembles in climate science: mathematical structures and expert judgements, Julie Jebeile and Michel Crucifix, Studies in History and Philosophy of Science Part A, 2020.
An antidote for hawkmoths: On the prevalence of structural chaos in non-linear modeling, Lukas Nabergall et al, August 2018.
The art and science of climate model tuning, Frederic Hourdin et al, Bulletin of the American Meteorological Society, Volume 98, Issue 3.
Evaluation, characterization and communication of uncertainty by the Intergovernmental Panel on Climate Change – An introductory essay, Gary Yohe, Climatic Change, 2011.
Robust Decision Making, Wikipedia.