There have been many attempts to obtain a "figure of merit" for climate models. Such quantification is usually attempted only for well-observed atmospheric variables. The measures used range from simple root mean square (r.m.s.) errors between a model variable and an observation to more complex multivariate calculations. Among the most promising attempts to generate skill scores better suited to climate models are two approaches. The first is the normalised mean square error approach of Williamson (1995), which follows in part from Murphy (1988). The second categorises models by a combination of the error in the time mean and the error in temporal variability, along the lines suggested by Wigley and Santer (1990) (see Chapter 5 of the SAR for an example). Other, less widely used non-dimensional measures have also been devised (e.g., Watterson, 1996). A number of skill scoring methods have been devised and used for seasonal prediction (e.g., the linear error in probability space (LEPS) score; Potts et al., 1996), but these have not found general application to climate models. Attempts to measure the goodness of fit between model results and data containing large uncertainties have been partially successful in the oceanographic community for a limited number of variables (Frankignoul et al., 1989; Braconnot and Frankignoul, 1993). Fuzzy logic techniques have been trialled by the palaeoclimatology community (Guiot et al., 1999). It is important to remember that the error measures discussed here are restricted to relatively few variables. A fully comprehensive, multi-dimensional "figure of merit" for climate models has proved elusive.
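As an illustration only, the sketch below computes an r.m.s. error and a normalised mean square error for paired model and observed values. It then splits the mean square error into a time-mean (bias) term and two variability terms, in the spirit of separating errors in the mean from errors in variability. The normalisation by observed variance is an assumption, not necessarily the exact definition used by Williamson (1995). The function names are hypothetical, and all points are given equal weight.

```python
import math
from typing import Dict, Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def rms_error(model: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error between paired model and observed values."""
    if not model or len(model) != len(obs):
        raise ValueError("model and obs must be non-empty and of equal length")
    return math.sqrt(_mean([(m - o) ** 2 for m, o in zip(model, obs)]))


def normalised_mse(model: Sequence[float], obs: Sequence[float]) -> float:
    """Mean square error divided by the observed variance (assumed normalisation)."""
    mo = _mean(obs)
    var_obs = _mean([(o - mo) ** 2 for o in obs])
    if var_obs == 0:
        raise ValueError("observed variance is zero; normalisation undefined")
    return rms_error(model, obs) ** 2 / var_obs


def decompose_mse(model: Sequence[float], obs: Sequence[float]) -> Dict[str, float]:
    """Split the MSE into three non-negative terms that sum to it exactly:

    MSE = (mean_m - mean_o)**2           error in the mean
        + (sd_m - sd_o)**2               error in amplitude of variability
        + 2 * sd_m * sd_o * (1 - r)      error in phase/pattern of variability
    """
    mm, mo = _mean(model), _mean(obs)
    sd_m = math.sqrt(_mean([(m - mm) ** 2 for m in model]))
    sd_o = math.sqrt(_mean([(o - mo) ** 2 for o in obs]))
    if sd_m == 0 or sd_o == 0:
        raise ValueError("correlation undefined for a constant series")
    cov = _mean([(m - mm) * (o - mo) for m, o in zip(model, obs)])
    r = cov / (sd_m * sd_o)
    return {
        "bias_sq": (mm - mo) ** 2,
        "amplitude": (sd_m - sd_o) ** 2,
        "pattern": 2 * sd_m * sd_o * (1 - r),
    }
```

For example, `decompose_mse([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 5.0])` returns three terms whose sum equals `rms_error(...) ** 2` for the same inputs. This shows how a single r.m.s. score can hide quite different kinds of error.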
Since the SAR, Taylor (2000) has devised a very useful diagram (termed a "Taylor diagram") for conveying information about the pattern similarity between a model and observations. The same type of diagram can be used to illustrate the relative accuracy of a number of model variables or of different observational data sets (see Section 8.5.1). An additional advantage of the Taylor diagram is that no restriction is placed on the time or space domain considered.
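A Taylor diagram rests on a geometric relation between four statistics of a test field and a reference field. These are the two standard deviations $\sigma_f$ and $\sigma_r$, their correlation $R$, and the centred r.m.s. difference $E'$. By the law of cosines these satisfy $E'^2 = \sigma_f^2 + \sigma_r^2 - 2\sigma_f\sigma_r R$. A test field is therefore plotted at radius $\sigma_f$ and angle $\arccos R$, and its distance from the reference point at $(\sigma_r, 0)$ equals $E'$. The minimal sketch below computes these quantities for any flattened set of paired values from any time or space domain. The function names are hypothetical, area or time weighting is omitted (all points are assumed equally weighted), and plotting is left out.

```python
import math
from typing import Sequence, Tuple


def taylor_stats(test: Sequence[float], ref: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (sd_test, sd_ref, correlation, centred_rms_difference)."""
    n = len(test)
    if n == 0 or n != len(ref):
        raise ValueError("test and ref must be non-empty and of equal length")
    mt, mr = sum(test) / n, sum(ref) / n
    dt = [t - mt for t in test]   # anomalies about the test-field mean
    dr = [x - mr for x in ref]    # anomalies about the reference-field mean
    sd_t = math.sqrt(sum(d * d for d in dt) / n)
    sd_r = math.sqrt(sum(d * d for d in dr) / n)
    if sd_t == 0 or sd_r == 0:
        raise ValueError("correlation undefined for a constant field")
    r = sum(a * b for a, b in zip(dt, dr)) / (n * sd_t * sd_r)
    # Centred r.m.s. difference: overall mean bias is removed before differencing.
    e_prime = math.sqrt(sum((a - b) ** 2 for a, b in zip(dt, dr)) / n)
    return sd_t, sd_r, r, e_prime


def taylor_point(sd_test: float, r: float) -> Tuple[float, float]:
    """Cartesian position on the diagram: radius sd_test, angle arccos(r) from the x-axis."""
    theta = math.acos(max(-1.0, min(1.0, r)))  # clamp guards against rounding just outside [-1, 1]
    return sd_test * math.cos(theta), sd_test * math.sin(theta)
```

Because the reference field sits at `(sd_ref, 0)`, the straight-line distance from `taylor_point(sd_test, r)` to that point reproduces `e_prime`. This is why one diagram can show correlation, amplitude of variability and centred r.m.s. error at once. A second observational data set can be passed as `test` in exactly the same way as a model field.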
While we at times use a figure of merit to intercompare models for some selected variables, we usually apply more subjective assessments in our overall evaluations. We do not believe it is objectively possible to state which model is "best overall" for climate projection, since models differ amongst themselves (and from the available observations) in many different ways. Even if a model is assessed as simulating the present climate credibly, we cannot be sure that its response to a perturbation will remain credible. Hence we also evaluate models on their performance in simulating individual processes (see Chapter 7) and past climates (see Section 8.5.5).