## Appendix 12.4: Dimension Reduction

Estimation of the signal amplitudes, as well as the detection and attribution
consistency tests on the amplitudes, requires an estimate of the covariance
matrix **C**_{uu} of the residual noise field. However, as **y**
typically represents climate variation on time-scales similar to the length
of the observed instrumental record, it is difficult to estimate the covariance
matrix reliably. Thus the covariance matrix is often estimated from a long control
simulation. Even so, the number of independent realisations of **u** that
are available from a typical 1,000 to 2,000-year control simulation is substantially
smaller than the dimension of the field, and thus it is not possible to estimate
the full covariance matrix. The solution is to replace the full fields **y**,
**g**_{1},...,**g**_{m} and **u** with vectors
of dimension *k*, where *m* < *k* << *n*, containing indices of their
projections onto the dominant patterns of variability **f**_{1},...,**f**_{k}
of **u**. These patterns are usually taken to be the *k* highest variance
EOFs of a control run (North and Stevens, 1998; Allen and Tett, 1999; Tett et
al., 1999) or a forced simulation (Hegerl et al., 1996, 1997; Schnur, 2001).
Stott and Tett (1998) showed with a "perfect model" study that climate change
in surface air temperature can only be detected at very large spatial scales.
Thus Tett et al. (1999) reduce the spatial resolution to a few spherical harmonics
prior to EOF truncation. Kim et al. (1996) and Zwiers and Shen (1997) examine
the sampling properties of spherical harmonic coefficients when they are estimated
from sparse observing networks.
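
The projection step described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the procedure of any of the cited studies: the function names `leading_eofs` and `project`, the synthetic "control run", and the sizes `n`, `n_samples` and `k` are all hypothetical, and the EOFs are obtained from a singular value decomposition of the control-run anomalies.

```python
import numpy as np

def leading_eofs(control, k):
    """Return the k highest-variance EOFs of a control run and their variances.

    control: array of shape (n_samples, n), one row per realisation of u.
    """
    anomalies = control - control.mean(axis=0)
    # SVD of the anomaly matrix: rows of vt are the EOFs, ordered by variance.
    _, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    variances = s**2 / (anomalies.shape[0] - 1)
    return vt[:k].T, variances[:k]  # shapes (n, k) and (k,)

def project(field, eofs):
    """Replace an n-dimensional field by its k projection indices."""
    return eofs.T @ field

# Synthetic stand-in for a control simulation (assumption, for illustration only).
rng = np.random.default_rng(0)
n, n_samples, k = 50, 20, 5
control = rng.standard_normal((n_samples, n)) * np.linspace(3.0, 0.1, n)

eofs, variances = leading_eofs(control, k)
y = rng.standard_normal(n)      # stand-in for the observed field y
y_reduced = project(y, eofs)    # k-dimensional vector used in place of y
```

The same `project` call would be applied to each signal pattern **g**_{i}, so that the regression is carried out entirely in the *k*-dimensional space.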

An important decision, therefore, is the choice of *k*. A key consideration
in the choice is that the variability of the residuals should be consistent
with the variability of the control simulation in the dimensions that are retained.
Allen and Tett (1999) describe a simple test on the residuals that makes this
consistency check. Rejection implies that the model-simulated variability is
significantly different from that of the residuals. This may happen when the
number of retained dimensions, *k*, is too large because higher order EOFs
may contain unrealistically low variance due to sampling deficiencies or scales
that are not well represented. In this situation, the use of a smaller value
of *k* can still provide consistent results: there is no need to require
that model-simulated variability is perfect on all spatio-temporal scales for
it to be adequate on the very large scales used for detection and attribution
studies. However, failing the residual check of Allen and Tett (1999) could
also indicate that the model does not have the correct timing or pattern of
response (in which case the residuals will contain forced variability that is
not present in the control regardless of the choice of *k*) or that the
model does not simulate the correct amount of internal variability, even at
the largest scales represented by the low order EOFs. In this case, there is
no satisfactory choice of *k*. Previous authors (e.g., Hegerl et al., 1996,
1997; Stevens and North, 1996; North and Stevens, 1998) have made this choice
subjectively. Nonetheless, experience in recent studies (Tett et al., 1999; Hegerl
et al., 2000, 2001; Stott et al., 2001) indicates that their choices were appropriate.
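
A residual check of the kind discussed above can be sketched as follows. This is a simplified illustration in the spirit of the Allen and Tett (1999) test, not a reproduction of it: it assumes the data have already been projected onto the *k* retained EOFs, that the noise covariance is diagonal in that basis with the control-run variances on the diagonal, and that the weighted residual sum of squares is then compared with a reference distribution with *k* - *m* degrees of freedom. The function name `residual_statistic` and all numerical values are hypothetical.

```python
import numpy as np

def residual_statistic(y_k, G_k, variances):
    """Weighted residual sum of squares after a least-squares fit in EOF space.

    y_k: (k,) observations projected onto the k retained EOFs.
    G_k: (k, m) signal patterns projected onto the same EOFs.
    variances: (k,) control-run variances of the retained EOFs.
    """
    w = 1.0 / np.sqrt(variances)      # prewhitening weights
    yw = y_k * w
    Gw = G_k * w[:, None]
    # Amplitude estimates from ordinary least squares on prewhitened data.
    a, *_ = np.linalg.lstsq(Gw, yw, rcond=None)
    resid = yw - Gw @ a
    k, m = G_k.shape
    return float(resid @ resid), k - m, a

# Synthetic example (assumption): noise drawn with exactly the control variances.
rng = np.random.default_rng(1)
k, m = 10, 2
variances = np.linspace(4.0, 0.5, k)
G_k = rng.standard_normal((k, m))
a_true = np.array([1.0, 0.5])
y_k = G_k @ a_true + rng.standard_normal(k) * np.sqrt(variances)

stat, dof, a_hat = residual_statistic(y_k, G_k, variances)
```

If the control variances are realistic, `stat` should be of the order of `dof`; a value far in the upper tail of the reference distribution corresponds to rejection. Repeating the calculation over a range of *k* is one way to see at which truncation the residuals stop being consistent with the control variability.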