Methodology

Statistical Downscaling

Methodology for statistical downscaling including CDF-t principle, R2D2, and precipitation correction.

Global and regional climate models often have a coarser resolution than impact studies require, and their outputs are generally biased. As a result, bias correction and downscaling are widely used in climate impact modelling.

Bias correction is the process of adjusting climate model outputs to account for their systematic errors. Numerous statistical bias correction methods exist. Their common principle is to build a transformation that maps the simulations over a past reference period onto the observations over the same period, and then to apply this transformation to future climate simulations, under the fundamental assumption that it remains valid in the future.
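As a minimal illustration of this fit-then-apply principle, the sketch below implements plain empirical quantile mapping with NumPy. It is not the ClimateVision implementation; the function name `quantile_mapping` and its arguments are hypothetical, and the transformation fitted on the calibration period is simply assumed to stay valid for the projections.

```python
import numpy as np

def quantile_mapping(records_calib, model_calib, model_proj):
    """Empirical quantile mapping fitted on the calibration period (illustrative)."""
    mc = np.sort(np.asarray(model_calib, dtype=float))
    # Probability of each projected value under the calibration-period model CDF
    p = np.searchsorted(mc, np.asarray(model_proj, dtype=float), side="right") / mc.size
    # Observed value at the same probability (transformation assumed stationary)
    return np.quantile(np.asarray(records_calib, dtype=float), p)
```

Projected values outside the calibration range are mapped to the observed extremes. CDF-t, described below, instead accounts for the change in the model distribution between the two periods.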

No method can be considered entirely reliable^viii. The choice of a suitable method depends on the use case, but also on the needs and technical constraints of each project (e.g., explicability, computational efficiency).

At this stage, ClimateVision uses a single method for all variables: the Cumulative Distribution Function transform (CDF-t).

The CDF-t method^ix is a well-established variant of the quantile-quantile method. It corrects biases while preserving the trends of future scenarios, and it has been used in numerous research and adaptation projects.

CDF-t principle

The CDF-transform method shares the philosophy of quantile-quantile bias correction methods. It differs from typical quantile-quantile methods by taking into account the change in the large-scale (model) CDF between the calibration period and the future period. This change between past and future CDFs reflects the evolution over time of the distribution of the variable of interest. CDF-t assumes that the relationship between the large-scale and the local-scale distributions stays the same over time, so that the local-scale distribution evolves in the same way as the large-scale one.

Let $V$ be the number of variables and $N$ the number of time steps. $X_{M_p}$ is the $V \times N$ matrix containing the model projections, and $X_{M_p}^d(n)$ is the value of the model simulation for variable $d$ at the $n$-th time step of the projection period. Similarly, $X_{M_c}$ and $X_{R_c}$ are the matrices containing the model simulations and the observation records, respectively, over the calibration period.

Let $F_{M_p}^d$, $F_{M_c}^d$ and $F_{R_c}^d$ be the univariate cumulative distribution functions (CDFs) of the model projections, the model calibration data and the records calibration data, respectively, for variable $d$. These distributions can be estimated from the data.

The objective is to estimate $F_{R_p}^d$, the cumulative distribution function of the records (observations) over the future projection period.

	Calibration (Past reference period)	Projection (Future period)
Model (Simulated weather)	$F_{M_c}^d$	$F_{M_p}^d$
Records (Actual weather)	$F_{R_c}^d$	$F_{R_p}^d$ ?

Let $T$ be the transfer function that maps the CDF of the model simulations onto the CDF of the records over the calibration period. For any value $q$ of the variable:

$F_{R_c}^d(q) = T\left(F_{M_c}^d(q)\right)$

Setting $u = F_{M_c}^d(q)$, i.e. $q = {F_{M_c}^d}^{-1}(u)$, we get:

$T(u) = F_{R_c}^d\left({F_{M_c}^d}^{-1}(u)\right)$ for $u$ in $[0, 1]$
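The transfer function $T$ can be estimated from data by plugging in empirical distributions. The sketch below is one possible estimator, assuming empirical CDFs and NumPy's default (linearly interpolated) quantiles; `ecdf` and `transfer_function` are hypothetical helper names, not part of ClimateVision.

```python
import numpy as np

def ecdf(sample, x):
    """Empirical CDF of `sample` at `x`: fraction of sample values <= x."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, x, side="right") / s.size

def transfer_function(records_calib, model_calib, u):
    """Empirical estimate of T(u) = F_Rc(F_Mc^{-1}(u)) for u in [0, 1]."""
    # F_Mc^{-1}(u): empirical quantile of the model over the calibration period
    q = np.quantile(np.asarray(model_calib, dtype=float), u)
    # F_Rc evaluated at those model quantiles
    return ecdf(records_calib, q)
```

A value $T(u) < u$ means the records are less likely than the model to fall below the model's $u$-quantile, i.e. the model is biased low at that quantile.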

Assuming time stationarity of the transfer function, the first relation can be applied to the projection period as well:

$F_{R_p}^d(q) = T\left(F_{M_p}^d(q)\right)$

By combining the two previous equations, we get:

$F_{R_p}^d(q) = F_{R_c}^d\left({F_{M_c}^d}^{-1}\left(F_{M_p}^d(q)\right)\right)$

Once $F_{R_p}^d$ has been estimated, a standard quantile-quantile mapping between $F_{M_p}^d$ and $F_{R_p}^d$ gives the bias-corrected series $\widehat{X_{M_p}^d}$:

$\widehat{X_{M_p}^d}(n) = {F_{R_p}^d}^{-1}\left(F_{M_p}^d\left(X_{M_p}^d(n)\right)\right)$

Figure — Bias correction using the quantile-quantile method.
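Putting the pieces together, a minimal NumPy sketch of CDF-t for one variable can evaluate $F_{R_p}^d$ on a regular grid of values and then invert it numerically for the quantile-quantile step. The grid size, the use of empirical CDFs and the name `cdft_correct` are assumptions made for illustration; the sketch omits refinements such as extrapolation beyond the range of the input samples.

```python
import numpy as np

def ecdf(sample, x):
    """Empirical CDF of `sample` at `x`: fraction of sample values <= x."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, x, side="right") / s.size

def cdft_correct(records_calib, model_calib, model_proj, n_grid=1000):
    """Sketch of CDF-t for one variable using empirical CDFs on a value grid."""
    rc = np.asarray(records_calib, dtype=float)
    mc = np.asarray(model_calib, dtype=float)
    mp = np.asarray(model_proj, dtype=float)
    # Grid of values spanning the three samples
    grid = np.linspace(min(rc.min(), mc.min(), mp.min()),
                       max(rc.max(), mc.max(), mp.max()), n_grid)
    # F_Rp(q) = F_Rc(F_Mc^{-1}(F_Mp(q))) on the grid (non-decreasing in q)
    f_rp = ecdf(rc, np.quantile(mc, ecdf(mp, grid)))
    # F_Mp(X_Mp(n)) for every projected time step
    p = ecdf(mp, mp)
    # Generalised inverse F_Rp^{-1}(p): first grid value where F_Rp >= p
    idx = np.searchsorted(f_rp, p, side="left")
    return grid[np.minimum(idx, n_grid - 1)]
```

With `model_calib = 0..99`, records biased by +5 and projections shifted by +10, the corrected projections come out close to the projections plus 5 (the model's +10 change is kept on top of the observed climate), apart from the highest values, which this simple grid cannot extend beyond.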

The method is applied separately to each calendar month, so that values from different seasons, whose distributions differ, are not mixed within a single correction.
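One way to organise the month-by-month application is a small wrapper that selects each calendar month and calls a univariate correction on it. In the sketch below, `apply_monthly` is a hypothetical helper, month labels (1 to 12) are assumed to be supplied alongside each series, and `mean_shift` is a deliberately simple stand-in used only to keep the example short; in practice the correction would be CDF-t.

```python
import numpy as np

def apply_monthly(correct, records_calib, months_rc, model_calib, months_mc,
                  model_proj, months_mp):
    """Apply correct(records_calib, model_calib, model_proj) per calendar month."""
    rc, mc, mp = (np.asarray(a, dtype=float) for a in (records_calib, model_calib, model_proj))
    months_rc, months_mc, months_mp = (np.asarray(m) for m in (months_rc, months_mc, months_mp))
    out = np.full_like(mp, np.nan)
    for month in range(1, 13):
        sel = months_mp == month
        if sel.any():
            # Fit and apply the correction using only data from this calendar month
            out[sel] = correct(rc[months_rc == month], mc[months_mc == month], mp[sel])
    return out

def mean_shift(records_calib, model_calib, model_proj):
    # Stand-in correction (NOT CDF-t): removes the mean calibration bias
    return model_proj + (records_calib.mean() - model_calib.mean())
```

Each month gets its own transformation, so a bias that differs between, say, January and July is corrected separately.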

R2D2

ClimateVision employs a multivariate bias correction technique for high-dimensional climate simulations, known as Rank Resampling for Distributions and Dependences (R2D2). The method aims to adjust not only the univariate marginal distributions of climate model outputs but also their multivariate dependence structures across variables and sites, which traditional univariate bias correction methods do not account for. R2D2 operates in a marginal/dependence correction framework:

Each statistical dimension (variable at a location) is first corrected independently using a univariate bias correction method, such as the CDF-t approach in ClimateVision.
A "reference dimension" is selected, and its bias-corrected rank sequence is preserved to drive the multivariate reconstruction. By default, ClimateVision employs the primary variable; for instance, when using R2D2 to correct wind and temperature data for the purpose of computing wind power production, the bias-corrected time series of wind speed serves as the reference.
For each time step of a projection period (e.g., future climate), ranks of the reference dimension are matched to ranks in a historical calibration period to identify analogous reference states.
For each match, the corresponding inter-variable and inter-site rank structures from the reference dataset are imposed by shuffling the corrected values, thereby reproducing the empirical copula of the reference.
The procedure is repeated with each dimension serving as reference to create multiple stochastic corrected outputs.
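The rank-resampling idea behind these steps can be sketched in a few lines of NumPy for a single reference dimension. This is a simplified illustration, not the published R2D2 algorithm or ClimateVision's code: it assumes the calibration and projection periods have the same number of time steps, breaks ties by order of appearance, and uses the hypothetical name `r2d2_single_ref`. Rows are time steps and columns are statistical dimensions (variable at a location).

```python
import numpy as np

def ranks(a):
    """Column-wise ranks 0..N-1; ties broken by order of appearance."""
    return np.argsort(np.argsort(a, axis=0, kind="stable"), axis=0, kind="stable")

def r2d2_single_ref(corrected, reference, ref_dim=0):
    """Impose the observed rank structure on univariately corrected data."""
    corrected = np.asarray(corrected, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if corrected.shape != reference.shape:
        raise ValueError("this sketch assumes equal-length periods and dimensions")
    n, d = corrected.shape
    r_corr, r_ref = ranks(corrected), ranks(reference)
    # time_of_rank[r] = calibration time step whose reference-dimension rank is r
    time_of_rank = np.empty(n, dtype=int)
    time_of_rank[r_ref[:, ref_dim]] = np.arange(n)
    # Analogue calibration time step for each projection time step (rank matching)
    analog = time_of_rank[r_corr[:, ref_dim]]
    sorted_corr = np.sort(corrected, axis=0)
    out = np.empty_like(corrected)
    for j in range(d):
        # Give dimension j the rank it had in the observations at the analogue time
        out[:, j] = sorted_corr[r_ref[analog, j], j]
    return out
```

The reference column is returned unchanged, each column keeps exactly its corrected values (only their order changes), and the rank pattern across columns at each time step is copied from the observations. Calling the function with different `ref_dim` values gives the multiple outputs mentioned in the last step.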

Crucially, R2D2 assumes stationarity of the multivariate dependence (copula) structure over time, enabling its application in high-dimensional contexts (e.g., thousands of grid cells and variables). The approach relaxes the deterministic temporal constraints of previous copula-based bias correction methods, allowing the climate model to retain its inherent temporal dynamics while preserving inter-site and inter-variable dependencies.

Note, however, that R2D2 substantially degrades the temporal autocorrelation of the corrected series. The resulting data should therefore be used with caution, particularly in analyses that depend on day-to-day persistence.

Precipitation special case

To correct precipitation, the CDF-t method is combined with the Singularity Stochastic Removal (SSR) approach^x.

Because precipitation is intermittent (many values are exactly zero) and its distribution is strongly skewed, additional steps are required to bias-correct this variable. The correction procedure follows these steps:

Application of the SSR
Transformation using the Log1x function
Bias correction using the CDF-t method
Transformation using the Exp1x function
Application of the inverse SSR

First, a threshold $th$ is set to $1 \times 10^{-4}$ mm day⁻¹. In each time series to be corrected, all values below $th$ (including zeros) are replaced by random values drawn from a uniform distribution over the interval $[0, th)$. As a result, the series no longer contains any zero-precipitation values. It is then transformed using the Log1x function:

$\text{log1x}(x) = \begin{cases} x - 1 & \text{if } x \geq 1 \\ \ln(x) & \text{if } 0 < x < 1 \\ x & \text{otherwise} \end{cases}$
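A vectorised NumPy version of Log1x might look like the sketch below (the function name mirrors the text; the vectorised form is an illustrative choice). The two positive-input branches meet at $x = 1$, where $\ln(1) = 1 - 1 = 0$, so the transform is continuous.

```python
import numpy as np

def log1x(x):
    """x - 1 for x >= 1, ln(x) for 0 < x < 1, identity otherwise."""
    x = np.asarray(x, dtype=float)
    # Evaluate the log only on (0, 1); elsewhere a dummy 1.0 avoids warnings
    safe = np.where((x > 0) & (x < 1), x, 1.0)
    return np.where(x >= 1, x - 1.0, np.where(x > 0, np.log(safe), x))
```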

The CDF-t bias correction is subsequently applied to this series. After correction, the Exp1x function is applied to the bias-corrected precipitation values:

$\text{exp1x}(x) = \begin{cases} e^x & \text{if } x < 0 \\ x + 1 & \text{otherwise} \end{cases}$
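Exp1x inverts Log1x on positive values: Log1x sends $(0, 1)$ to negative numbers and $[1, \infty)$ to non-negative ones, and Exp1x undoes each branch. A matching NumPy sketch (illustrative, not ClimateVision's code):

```python
import numpy as np

def exp1x(y):
    """e**y for y < 0, y + 1 otherwise."""
    y = np.asarray(y, dtype=float)
    # np.minimum keeps np.exp from overflowing on the branch that is not selected
    return np.where(y < 0, np.exp(np.minimum(y, 0.0)), y + 1.0)
```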

Finally, the inverse SSR is applied by setting values below the threshold $th$ to zero.
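Both ends of the SSR can be sketched in NumPy. `ssr_forward`, `ssr_inverse` and the optional seeded generator are illustrative names and choices, not ClimateVision's API; the default threshold is the $th$ given above. In the full procedure, the Log1x transform, CDF-t and Exp1x run between the two calls.

```python
import numpy as np

TH = 1e-4  # SSR threshold th, in mm/day

def ssr_forward(precip, th=TH, rng=None):
    """Replace values below th (including zeros) with uniform draws in [0, th)."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.asarray(precip, dtype=float).copy()
    dry = out < th
    out[dry] = rng.uniform(0.0, th, size=int(dry.sum()))
    return out

def ssr_inverse(precip, th=TH):
    """Set every value below th back to zero."""
    precip = np.asarray(precip, dtype=float)
    return np.where(precip < th, 0.0, precip)
```

After correction, the number of dry days is simply the number of corrected values that fall below $th$.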

In addition, a safeguard has been implemented for precipitation extremes. To limit the overestimation of extreme precipitation that the CDF-t method sometimes introduces, an adaptive smoothing of the distribution tails is applied. It relies on a maximum precipitation threshold $S$, determined as follows:

$S = \max\{\text{ERA5}\} \times \min\left(2.0, \frac{Q95(\text{GCM}_{\text{proj}})}{Q95(\text{GCM}_{\text{calib}})}\right)$

This threshold helps constrain the excessive amplification of extremes during the projection phase. It depends on the change in the GCM's 95th percentile between the calibration and projection periods, with the scaling factor capped at 2.0: $S$ can exceed the observed (ERA5) maximum by at most +100%.
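The threshold $S$ translates directly into code. In the sketch below, `max_precip_threshold` is a hypothetical name, the ERA5 maximum is assumed to be taken over the calibration period, and Q95 is computed with NumPy's default linear interpolation. The calibration Q95 is assumed to be strictly positive; otherwise the ratio is undefined.

```python
import numpy as np

def max_precip_threshold(era5_calib, gcm_calib, gcm_proj, cap=2.0):
    """S = max(ERA5) * min(cap, Q95(GCM_proj) / Q95(GCM_calib))."""
    # Assumes Q95(GCM_calib) > 0; very dry cells would need special handling
    ratio = np.quantile(gcm_proj, 0.95) / np.quantile(gcm_calib, 0.95)
    return float(np.max(era5_calib) * min(cap, ratio))
```

For an observed maximum of 50 mm day⁻¹, a GCM Q95 rising from 10 to 12 gives $S = 60$, while a rise from 10 to 30 is capped at $S = 100$.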

In practical terms, for each corrected time series, if the maximum corrected precipitation value $M$ exceeds this threshold ($M > S$), the upper tail of the distribution is adjusted: the interval $[Q95(\text{corrected}), M]$ is remapped onto $[Q95(\text{corrected}), S]$, which ensures a smooth transition while preventing unrealistic extreme values.
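The text does not specify the form of the remapping. The sketch below assumes a linear rescaling of $[Q95, M]$ onto $[Q95, S]$, which leaves values at or below Q95 unchanged, is continuous at Q95 and preserves the order of values. `cap_upper_tail` is a hypothetical name.

```python
import numpy as np

def cap_upper_tail(corrected, s):
    """Linearly remap [Q95, M] onto [Q95, S] when the maximum M exceeds S (assumed form)."""
    corrected = np.asarray(corrected, dtype=float)
    m = corrected.max()
    if m <= s:
        return corrected.copy()  # M <= S: nothing to adjust
    q95 = np.quantile(corrected, 0.95)
    if s <= q95:
        raise ValueError("linear remapping needs Q95 < S")
    # Assumed linear map of [Q95, M] onto [Q95, S]
    scale = (s - q95) / (m - q95)
    return np.where(corrected > q95, q95 + (corrected - q95) * scale, corrected)
```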