## Appendix 12.5: Determining the Likelihood of Outcomes (p-values)

Traditional statistical hypothesis tests are performed by comparing the value
of a detection statistic with an estimate of its natural internal variability
in the unperturbed climate. This estimate must be obtained from control climate
simulations because detection statistics typically measure change on time-scales
that are a substantial fraction of the length of the available instrumental
record (see Appendix 12.4). Most "optimal" detection studies
use two data sets from control climate simulations, one that is used to develop
the optimal detection statistic and the other to independently estimate its
natural variability. This is necessary to avoid underestimating natural variability.
The *p*-value that is used in testing the no-signal null hypothesis is
often computed by assuming that both the observed and simulated projections
on signal patterns are normally distributed. This is convenient, and is thought
to be a reasonable assumption given the variables and the time and space scales
used for detection and attribution. However, it raises the concern that very small
*p*-values may be unreliable, because they correspond to events that have
not been explored by the model in the available control integrations. Allen
and Tett (1999) therefore recommend that *p*-values be limited to
values that are consistent with the range visited in the available control integrations.
A non-parametric approach is to estimate the *p*-value by comparing the
value of the detection statistic with an empirical estimate of its distribution
obtained from the second control simulation data set. If parametric methods
are used to estimate the *p*-value, then very small values should be reported
as being less than 1/*n*_p, where *n*_p represents the equivalent
number of independent realisations of the detection statistic that are contained
in the second control integration.
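The non-parametric approach and the 1/*n*_p floor can be sketched as follows. This is an illustrative sketch only, not a method prescribed by the text: the function name `empirical_p_value` and the synthetic control sample are assumptions introduced here for illustration.

```python
import numpy as np

def empirical_p_value(d_obs, control_stats):
    """Non-parametric one-sided p-value for a detection statistic.

    d_obs         -- detection statistic computed from observations
    control_stats -- values of the same statistic from n_p independent
                     realisations drawn from the second control integration
    Returns a p-value no smaller than 1/n_p.
    """
    control_stats = np.asarray(control_stats)
    n_p = len(control_stats)
    # Fraction of control realisations at least as extreme as the observed value
    p = np.mean(control_stats >= d_obs)
    # Floor at 1/n_p: events rarer than this have not been explored
    # by the available control integrations
    return max(p, 1.0 / n_p)

# Example: 200 control realisations standing in for internal variability
rng = np.random.default_rng(0)
control = rng.standard_normal(200)
print(empirical_p_value(2.5, control))
```

The floor ensures that a detection statistic more extreme than anything in the control sample is reported as *p* < 1/*n*_p rather than as an unreliably small parametric value.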