A simple (unconstrained) ordination analysis, such as PCA, on a single data matrix **\(X\)** helps to reveal its major structure (Borcard, Gillet, and Legendre 2011). There is no notion of **explanatory** or **response** variables. In contrast, **canonical ordination** such as RDA explicitly explores the relationships between two matrices: a **response** matrix and an **explanatory** matrix.

RDA is the multivariate (meaning *multiresponse*) analogue of regression. The method combines linear regression and principal component analysis (PCA).

Conceptually, RDA is a multivariate (meaning multiresponse) multiple linear regression followed by a PCA of the table of fitted values.

Let's define:

- \(X\), a matrix of **explanatory variables**
- \(Y\), a matrix of **response variables**

The RDA procedure works on **centered** versions of both matrices. This simply means that the mean of each variable (column) is subtracted from every observation of that variable, so that each column mean becomes zero.

\[ \bar{X}_j = \frac{1}{n} \sum_{i = 1}^{n} X_{ij} = 0 \]

\[ \bar{Y}_j = \frac{1}{n} \sum_{i = 1}^{n} Y_{ij} = 0 \]
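Centering as described above can be checked on a small example. This is a minimal sketch on a made-up matrix (the values and the name `X_centered` are illustrative assumptions, not part of the original analysis), using base R's `scale()`:

```r
# Hypothetical toy matrix: 5 observations (rows) of 3 variables (columns).
set.seed(1)
X <- matrix(rnorm(15, mean = 10, sd = 2), nrow = 5, ncol = 3)

# Subtract each column mean from its column.
# scale = FALSE centers only; the variances are left untouched.
X_centered <- scale(X, center = TRUE, scale = FALSE)

# Every column mean is now zero, up to floating-point error.
colMeans(X_centered)
```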

The steps below follow Borcard, Gillet, and Legendre (2011), which I **highly** recommend.

1. Regress each (centered) \(y\) variable on the explanatory matrix \(X\) and compute the vectors of fitted values (\(\hat{y}\)) and residuals (\(y_{res}\)).
2. Create a new matrix (\(\hat{Y}\)) containing all the fitted vectors (\(\hat{y}\)).
3. Compute a PCA on \(\hat{Y}\). This produces a vector of canonical eigenvalues and a matrix \(U\) of canonical eigenvectors (principal components).
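The three steps above can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated, and the names `Xc`, `Yc`, `B`, and `n_axes` are hypothetical. The PCA is done here by an eigen-decomposition of the covariance matrix of \(\hat{Y}\), which is one of several equivalent ways to compute it.

```r
set.seed(42)
n <- 30

# Hypothetical data: 2 explanatory variables and 4 response variables.
X <- matrix(rnorm(n * 2), nrow = n)
B <- matrix(c(1, 0.5, -1,  0,
              0, 1,    0.5, -0.5), nrow = 2, byrow = TRUE)
Y <- X %*% B + matrix(rnorm(n * 4, sd = 0.5), nrow = n)

# Center both matrices (column means become zero).
Xc <- scale(X, center = TRUE, scale = FALSE)
Yc <- scale(Y, center = TRUE, scale = FALSE)

# Step 1: regress every column of Yc on Xc in a single call.
# No intercept is needed because both matrices are centered.
fit   <- lm.fit(Xc, Yc)
Y_res <- fit$residuals

# Step 2: the matrix of fitted values, one column per response variable.
Y_hat <- fit$fitted.values

# Step 3: PCA of Y_hat through the eigen-decomposition of its covariance matrix.
eig <- eigen(cov(Y_hat), symmetric = TRUE)

# Y_hat lies in the column space of Xc, so at most rank(Xc) eigenvalues are non-zero.
n_axes      <- qr(Xc)$rank
eigenvalues <- eig$values[seq_len(n_axes)]                 # canonical eigenvalues
U           <- eig$vectors[, seq_len(n_axes), drop = FALSE] # canonical eigenvectors
```

Because \(\hat{Y}\) is built from only two explanatory variables in this toy example, only two canonical axes carry variance, even though there are four response variables.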

\(\hat{Y}\) is produced by multiple linear regression of each \(y_i\) on \(X\).

A PCA is then performed on \(\hat{Y}\), which gives a set of principal component vectors \(U\).
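In matrix form, these two statements can be written compactly. The notation below is a sketch that assumes \(X\) and \(Y\) are centered, that \(X\) has full column rank, and that \(n\) is the number of sites; \(S_{\hat{Y}}\) and \(\Lambda\) are symbols introduced here for illustration.

\[ \hat{Y} = X (X^\top X)^{-1} X^\top Y \]

\[ S_{\hat{Y}} = \frac{1}{n - 1} \hat{Y}^\top \hat{Y}, \qquad S_{\hat{Y}} U = U \Lambda \]

Here \(S_{\hat{Y}}\) is the covariance matrix of the fitted values, the columns of \(U\) are the canonical eigenvectors, and \(\Lambda\) is the diagonal matrix of canonical eigenvalues.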

PCA and RDA are very similar:

- PCA is performed on a matrix of observed variables.
- RDA is performed on a matrix of **predicted** (fitted) response variables, \(\hat{Y}\).

The resulting ordination depends on how the site scores are calculated (**two possibilities**):

- \(Y \times U\) to obtain ordination in the space of the original variables \(Y\).
- \(\hat{Y} \times U\) to obtain ordination in the space of the variables \(X\).

Site scores calculated using \(Y \times U\) are simply called **site scores**, whereas scores calculated using \(\hat{Y} \times U\) are called **site constraints**, since they are linear combinations of the constraining variables \(X\).
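The two kinds of scores can be compared on a small simulated example. This is a sketch under stated assumptions: the data and the names `site_scores` and `site_constraints` are illustrative, and the scores are left unscaled, so they will not match the values reported by a package that applies its own scaling conventions.

```r
set.seed(7)
n <- 20

# Hypothetical centered data: 2 explanatory and 3 response variables.
X <- scale(matrix(rnorm(n * 2), nrow = n), center = TRUE, scale = FALSE)
B <- matrix(c(1, -1, 0.5,
              0,  1, 1), nrow = 2, byrow = TRUE)
Y <- scale(X %*% B + matrix(rnorm(n * 3, sd = 0.3), nrow = n),
           center = TRUE, scale = FALSE)

# Fitted values and canonical eigenvectors (2 axes, since X has 2 columns).
Y_hat <- lm.fit(X, Y)$fitted.values
U     <- eigen(cov(Y_hat), symmetric = TRUE)$vectors[, 1:2, drop = FALSE]

# Ordination in the space of the original variables Y.
site_scores <- Y %*% U

# Ordination in the space of the explanatory variables X.
site_constraints <- Y_hat %*% U

# Site constraints are exact linear combinations of the columns of X:
# regressing them on X leaves residuals that are zero up to rounding error.
max(abs(lm.fit(X, site_constraints)$residuals))
```

The site scores keep the residual variation of \(Y\), so they are generally noisier than the site constraints.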

The `vegan` package makes it very easy to perform RDA in R using the `rda()` function.
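As a minimal sketch of the call, the example below uses the `varespec` and `varechem` data sets that ship with `vegan` as stand-ins, rather than the Doubs river data this document describes. The choice of the three constraining variables (`N`, `P`, `K`) is an arbitrary assumption made only to keep the example small.

```r
library(vegan)

# Stand-in data bundled with vegan (NOT the Doubs data).
data(varespec)  # response matrix: species cover at 24 sites
data(varechem)  # explanatory matrix: soil chemistry at the same sites

# Formula interface: response matrix on the left, constraints on the right.
mod <- rda(varespec ~ N + P + K, data = varechem)

# Partition of the total variance into constrained and unconstrained parts.
print(mod)

# Canonical eigenvalues, one per constrained (RDA) axis.
mod$CCA$eig

# The two kinds of site scores on the first two axes.
wa <- scores(mod, display = "sites", choices = 1:2)  # site scores
lc <- scores(mod, display = "lc", choices = 1:2)     # site constraints
```

With three constraining variables, the model has three canonical axes; `scores()` returns scaled values, so they differ from raw matrix products by a scaling factor.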

The data come from the Ph.D. thesis of Verneaux (1973), who proposed using fish species to characterize ecological zones along European rivers and streams. He showed that fish communities were good biological indicators of these water bodies. The data were collected at 30 localities along the Doubs river.

The first matrix (**Y**) contains coded abundances of 27 fish species.

The second matrix (**X**) contains 11 environmental variables related to the hydrology, geomorphology and chemistry of the river.

*Reference:* Verneaux, J. (1973) Cours d’eau de Franche-Comté (Massif du Jura). Recherches écologiques sur le réseau hydrographique du Doubs. Essai de biotypologie. Thèse d’état, Besançon. 1–257.