Methodology

Extrapolation of Extreme Values

Methodological approach for extrapolating extreme climate values.

Methodological approach and reference

Climatology is generally based on 30-year periods during which the climate is assumed to be stationary. Low-probability events (typically those with a return period of 10 years or more) cannot be computed empirically from a 30-year sample, so they must be extrapolated. Many extrapolation methods are in use, and none is universally accepted in industry^xi.

In this project we used a statistical method based on Extreme Value Theory^xii. The idea is to take the extreme values from a weather data sample and fit the tail of the probability distribution with an appropriate extreme value distribution. More specifically, we use the block-maxima approach to select the extremes and maximum likelihood estimation to fit a generalized extreme value (GEV) distribution to them.

This method is similar to the "historical method" recommended in ISO 19901, except that the input data are derived from bias-corrected climate projections instead of in-situ measurements or model hindcasts. It is frequently used in climate research^xiii.

A key advantage of extreme value analysis is its flexibility: it relies on the mathematical properties of extreme value distributions rather than on knowledge of the underlying physical process. The same method can therefore be used to study heatwaves, cold spells, extreme rainfall, etc.

The definition of the blocks is adapted to the variable and location. For example, for maximum temperature the block is a calendar year (January-December) in the northern hemisphere and a July-June year in the southern hemisphere.
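As a minimal sketch of this block definition, the function below assigns a date to its annual block. The function name and the convention of labelling a July-June block by the year in which it starts are assumptions made for illustration; the source does not specify how blocks are labelled.

```python
from datetime import date


def block_label(day: date, southern_hemisphere: bool = False) -> int:
    """Return the label of the annual block that contains `day`.

    Northern hemisphere: calendar year (January-December).
    Southern hemisphere: July-June, labelled (by assumption) with the
    year in which the block starts.
    """
    if southern_hemisphere and day.month < 7:
        # January-June belongs to the block that started the previous July.
        return day.year - 1
    return day.year
```

With this convention, 15 March 2001 falls in block 2001 in the northern hemisphere and in block 2000 (July 2000 to June 2001) in the southern hemisphere.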

Illustration

The extrapolation is based on a sample of 30 years, either from reanalysis for current and past climate or from bias-corrected projections for future climate.

The first step is to select the extreme values from the sample using the block-maxima method: we divide the sample into blocks of similar duration (1 year) and take the highest value in each block:

Figure 1 — Extreme selection using the block maxima method
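The selection step can be sketched in a few lines. This is an illustrative example only, assuming calendar-year blocks and a sample given as (date, value) pairs; the function name and the sample values are hypothetical.

```python
from datetime import date


def block_maxima(records):
    """Return {year: highest value} for an iterable of (date, value) pairs."""
    maxima = {}
    for day, value in records:
        year = day.year  # calendar-year block
        if year not in maxima or value > maxima[year]:
            maxima[year] = value
    return maxima


# Hypothetical daily maximum temperatures (degrees Celsius)
sample = [
    (date(2000, 7, 14), 31.2),
    (date(2000, 8, 2), 33.5),
    (date(2001, 6, 30), 29.9),
    (date(2001, 7, 21), 34.1),
]
annual_maxima = block_maxima(sample)
```

A 30-year sample processed this way yields 30 block maxima, which are the only values passed to the fitting step.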

Assuming that the underlying values are independent and identically distributed, the distribution of the suitably normalized block maxima, if it converges to a non-degenerate limit, converges to a generalized extreme value (GEV) distribution, whose probability density function is:

f(x) = \frac{1}{\sigma}\left(1 + \xi(\frac{x - \mu}{\sigma})\right)^{-\frac{1}{\xi} - 1} e^{-\left(1 + \xi(\frac{x - \mu}{\sigma})\right)^{-\frac{1}{\xi}}}

This mathematical property is known as the Fisher-Tippett theorem. Other distributions can be used, such as the Fréchet, Gumbel or Weibull distributions, but they are in fact special cases of the more general GEV distribution. ClimateVision uses the general form as it does not require assumptions about the physical properties of the extremes.
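For reference, the special cases can be read off the cumulative form of the GEV density given above. This is the standard result, written with the same symbols and valid where $1 + \xi\frac{x - \mu}{\sigma} > 0$:

F(x) = \exp\left\{-\left(1 + \xi\frac{x - \mu}{\sigma}\right)^{-\frac{1}{\xi}}\right\}

Taking the limit of a vanishing shape parameter gives the Gumbel distribution:

\lim_{\xi \to 0} F(x) = \exp\left\{-\exp\left(-\frac{x - \mu}{\sigma}\right)\right\}

A positive $\xi$ corresponds to the Fréchet type (heavy upper tail), and a negative $\xi$ to the (reversed) Weibull type, which has a finite upper bound at $\mu - \sigma/\xi$.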

The trade-off is that the GEV has three parameters, $\mu$ (location), $\sigma$ (scale) and $\xi$ (shape), while other distributions may have fewer degrees of freedom. This means that fitting our maxima to the GEV (i.e. choosing the parameter values that bring the theoretical distribution closest to the empirical one) is more complex. Various fitting methods exist and can yield slightly different results, especially for long return periods and/or small samples^xiv^,^xv.

ClimateVision uses the maximum likelihood method. The idea is to express the probability of obtaining the observed data as a function of the parameters, then numerically choose the set of parameters that maximizes this function. This is a flexible method, widely adopted in statistics because it can be used to fit any theoretical distribution to observed data, and it usually performs well for extreme value analysis.
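To make the objective concrete, the sketch below writes the negative log-likelihood of a set of block maxima under the GEV density given earlier. It is an illustration under stated assumptions, not ClimateVision's implementation: the function name is hypothetical, and the numerical optimizer that would minimize this function is left out.

```python
import math


def gev_neg_log_likelihood(mu, sigma, xi, maxima):
    """Negative log-likelihood of block maxima under GEV(mu, sigma, xi).

    Returns math.inf when the parameters are invalid or when any
    observation lies outside the support of the distribution.
    """
    if sigma <= 0:
        return math.inf
    total = 0.0
    for x in maxima:
        z = (x - mu) / sigma
        if abs(xi) < 1e-8:
            # Gumbel limit (xi -> 0): -log f = log(sigma) + z + exp(-z)
            total += math.log(sigma) + z + math.exp(-z)
        else:
            t = 1.0 + xi * z
            if t <= 0:
                return math.inf  # x is outside the GEV support
            total += (
                math.log(sigma)
                + (1.0 + 1.0 / xi) * math.log(t)
                + t ** (-1.0 / xi)
            )
    return total
```

Maximizing the likelihood is equivalent to minimizing this function over $(\mu, \sigma, \xi)$. Statistical libraries also offer ready-made fits, for example `scipy.stats.genextreme.fit` in SciPy, whose shape parameter `c` has the opposite sign to $\xi$.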

Figure 2 — GEV fitted to an empirical distribution of block maxima

The cumulative distribution function of the fitted GEV gives the probability that a value is not exceeded. In other words, the long-term frequency of the event "the value $x$ is exceeded" is:

\mathfrak{f}(x) = P(X \geq x) = 1 - F(x)

With:

F(x) = \int_{-\infty}^{x} f(u) \, du

Accordingly, the return period of a given value $x$, expressed in blocks (here, years), is:

T(x) = \frac{1}{\mathfrak{f}(x)} = \frac{1}{1 - F(x)}

This formula and the fitted GEV distribution can be used to estimate the return level associated with any specified return period:

Figure 3 — Theoretical and empirical return levels as a function of return periods
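The relations above can be sketched as follows. The return level is obtained by inverting $T(x)$ in closed form, which gives $x_T = \mu + \frac{\sigma}{\xi}\left[\left(-\ln\left(1 - \frac{1}{T}\right)\right)^{-\xi} - 1\right]$ for $\xi \neq 0$. The function names and the parameter values are hypothetical and serve only as an illustration.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """F(x): probability that a block maximum does not exceed x."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-8:
        return math.exp(-math.exp(-z))  # Gumbel limit (xi -> 0)
    t = 1.0 + xi * z
    if t <= 0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))


def return_period(x, mu, sigma, xi):
    """T(x) = 1 / (1 - F(x)), in blocks (years for annual maxima)."""
    exceedance = 1.0 - gev_cdf(x, mu, sigma, xi)
    return math.inf if exceedance == 0.0 else 1.0 / exceedance


def return_level(period, mu, sigma, xi):
    """Level exceeded on average once every `period` blocks."""
    if period <= 1:
        raise ValueError("period must be greater than 1 block")
    y = -math.log(1.0 - 1.0 / period)
    if abs(xi) < 1e-8:
        return mu - sigma * math.log(y)  # Gumbel limit
    return mu + sigma / xi * (y ** (-xi) - 1.0)


# Hypothetical fitted parameters, for illustration only
mu, sigma, xi = 30.0, 1.5, -0.1
level_100 = return_level(100, mu, sigma, xi)
```

Because the two functions are exact inverses of each other, `return_period(level_100, mu, sigma, xi)` recovers 100 up to floating-point error.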

Minima are computed with the same method, simply by negating the sample data and then negating the results:

\min(f) = -\max(-f)
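A minimal illustration of this identity, with a hypothetical function name and made-up values:

```python
def block_minimum(values):
    """Minimum of a block, obtained as -max(-x)."""
    return -max(-v for v in values)


# Hypothetical daily minimum temperatures (degrees Celsius)
coldest = block_minimum([-3.5, 2.0, -7.25, 0.0])
```

The same sign change applies at the end of the analysis: return levels estimated from the negated sample are negated again to give the return levels of the minima.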