Simple (unconstrained) ordination analysis (such as PCA) of a single data matrix \(X\) helps to reveal its major structure (Borcard, Gillet, and Legendre 2011).
There is no notion of explanatory or response variables.
In contrast, canonical ordination such as RDA explicitly explores the relationships between two matrices: a response matrix and an explanatory matrix.
RDA is the multivariate (meaning multiresponse) analogue of regression.
The method uses a mix of linear regression and principal components analysis (PCA).
Conceptually, RDA is a multivariate (meaning multiresponse) multiple linear regression followed by a PCA of the table of fitted values.
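Let's define \(Y\) as the response matrix and \(X\) as the explanatory matrix, both with \(n\) rows (one per observation).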
The RDA procedure works on the centered versions of both matrices. Centering simply means that the mean of each variable is subtracted from every observation of that variable, so that each column has a mean of zero:
\[ \bar{X}_j = \frac{1}{n} \sum_{i = 1}^{n} X_{ij} = 0 \]
\[ \bar{Y}_j = \frac{1}{n} \sum_{i = 1}^{n} Y_{ij} = 0 \]
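As a quick illustration of centering, here is a minimal sketch in base R. The data are simulated and the object names (X, X_centered, col_means) are only illustrative; nothing here comes from the Doubs data set.

```r
# Simulated data: 6 observations of 3 variables (illustrative only)
set.seed(1)
X <- matrix(rnorm(18, mean = 10, sd = 2), nrow = 6, ncol = 3)

# Center each column: subtract the column mean, without rescaling
X_centered <- scale(X, center = TRUE, scale = FALSE)

# After centering, every column mean is zero up to floating-point error
col_means <- colMeans(X_centered)
print(round(col_means, 12))
```

The same call applies to \(Y\). Setting scale = FALSE keeps the original units, so only the means change.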
The following steps are from Borcard, Gillet, and Legendre (2011), which I highly recommend.
Regress each (centered) \(y\) variable on the explanatory matrix \(X\) and compute the fitted (\(\hat{y}\)) and residual (\(y_{res}\)) vectors.
Create a new matrix (\(\hat{Y}\)) containing all the fitted vectors (\(\hat{y}\)).
Compute a PCA on \(\hat{Y}\). This produces a vector of canonical eigenvalues and a matrix \(U\) of canonical eigenvectors (principal components).
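The steps above can be sketched by hand in base R. This is only a rough illustration on simulated data: the matrix sizes, the coefficient matrix B and all object names are assumptions made for the example, and the PCA is done through an eigen decomposition of the covariance matrix of the fitted values rather than through a dedicated PCA function.

```r
# Simulated data (illustrative only): 30 sites, 3 explanatory and 4 response variables
set.seed(42)
n <- 30
X <- matrix(rnorm(n * 3), nrow = n)
B <- matrix(rnorm(3 * 4), nrow = 3)             # hypothetical true coefficients
Y <- X %*% B + matrix(rnorm(n * 4), nrow = n)

# Center both matrices
Xc <- scale(X, center = TRUE, scale = FALSE)
Yc <- scale(Y, center = TRUE, scale = FALSE)

# Step 1: regress every response column on X (lm accepts a matrix response)
fit <- lm(Yc ~ Xc)

# Step 2: collect fitted values and residuals as matrices
Y_hat <- fitted(fit)
Y_res <- residuals(fit)

# Step 3: PCA of the fitted values via the eigen decomposition of their covariance
eig <- eigen(cov(Y_hat))
canonical_eigenvalues <- eig$values
U <- eig$vectors

print(round(canonical_eigenvalues, 4))
```

With 3 explanatory variables and 4 responses, at most 3 eigenvalues can be non-zero, because the fitted values live in the space spanned by the columns of \(X\).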
\(\hat{Y}\) is produced by a multiple linear regression of each \(y_i\) on \(X\).
A PCA is then performed on \(\hat{Y}\), which gives a set of principal component vectors \(U\).
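In matrix form, and assuming that the centered \(X\) has full column rank (an assumption not stated above), these two operations can be written as follows. The symbols \(S_{\hat{Y}}\) (covariance matrix of the fitted values) and \(\Lambda\) (diagonal matrix of canonical eigenvalues) are introduced here only for illustration.

\[ \hat{Y} = X (X^\top X)^{-1} X^\top Y \]
\[ S_{\hat{Y}} = \frac{1}{n - 1} \hat{Y}^\top \hat{Y}, \qquad S_{\hat{Y}} U = U \Lambda \]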
PCA and RDA are very similar:
PCA is performed directly on a matrix of observed variables.
RDA is performed on a matrix of fitted (predicted) response values, \(\hat{Y}\).
Site scores can be calculated in two ways.
Scores calculated as \(Y U\) are simply called site scores, whereas scores calculated as \(\hat{Y} U\) are called site constraints, since they are linear combinations of the constraining variables \(X\).
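The two kinds of scores can be sketched in base R as follows. The data are simulated, the object names are illustrative, and the scores are left unscaled; packaged implementations usually apply an additional scaling, so the numbers will generally not match their output directly.

```r
# Simulated data (illustrative only): 20 sites, 2 explanatory and 3 response variables
set.seed(7)
n <- 20
X <- matrix(rnorm(n * 2), nrow = n)
Y <- X %*% matrix(rnorm(2 * 3), nrow = 2) + matrix(rnorm(n * 3), nrow = n)

Xc <- scale(X, center = TRUE, scale = FALSE)
Yc <- scale(Y, center = TRUE, scale = FALSE)

# Fitted values from the multiresponse regression
Y_hat <- fitted(lm(Yc ~ Xc))

# Canonical eigenvectors; only k = 2 axes are meaningful with 2 explanatory variables
eig <- eigen(cov(Y_hat))
k <- 2
U <- eig$vectors[, 1:k, drop = FALSE]

# Site scores: observed responses projected on the canonical axes
site_scores <- Yc %*% U

# Site constraints: fitted responses projected on the canonical axes
site_constraints <- Y_hat %*% U
```

Because the site constraints are built from \(\hat{Y}\), they are exact linear combinations of the columns of \(X\), while the site scores also carry the residual variation of \(Y\).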
The vegan package makes it very easy to perform RDA in R using the rda() function.
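A minimal sketch of a call to rda() is shown below. It uses the varespec and varechem data sets that ship with vegan as a stand-in (they are not the data analysed in this post), and the choice of N, P and K as constraints is arbitrary.

```r
library(vegan)

# Example data bundled with vegan (stand-in data, not the Doubs data)
data(varespec)   # response matrix: species cover values
data(varechem)   # explanatory matrix: soil chemistry

# RDA of the species data constrained by three arbitrarily chosen soil variables
mod <- rda(varespec ~ N + P + K, data = varechem)
print(mod)

# Canonical eigenvalues (one per constrained axis)
canonical_eig <- mod$CCA$eig

# Site scores ("wa") and site constraints ("lc") on the first two axes
site_wa <- scores(mod, display = "wa", choices = 1:2)
site_lc <- scores(mod, display = "lc", choices = 1:2)
```

The formula interface keeps the response matrix on the left-hand side and the constraining variables on the right, which mirrors the regression step of the method.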
The data come from the thesis of Verneaux (1973), who proposed using fish species to characterize ecological zones along European rivers and streams. He showed that fish communities were good biological indicators of these water bodies. The data were collected at 30 localities along the Doubs river.
The first matrix (Y) contains coded abundances of 27 fish species.
The second matrix (X) contains 11 environmental variables related to the hydrology, geomorphology and chemistry of the river.
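One possible way to obtain these two matrices is the copy of the Doubs data distributed with the ade4 package. This is an assumption: the text does not say where the data were loaded from, and the ade4 version may be coded differently from the one used here. The object names fish, env and doubs_rda are illustrative.

```r
library(ade4)
library(vegan)

# Doubs data as distributed with ade4 (assumed to match the data described above)
data(doubs)
fish <- doubs$fish   # response matrix Y: fish abundances
env  <- doubs$env    # explanatory matrix X: environmental variables

# Check that the dimensions agree with the description (sites x variables)
print(dim(fish))
print(dim(env))

# RDA of the fish data constrained by all environmental variables
doubs_rda <- rda(fish ~ ., data = env)
print(doubs_rda)
```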
Reference: Verneaux, J. (1973) Cours d’eau de Franche-Comté (Massif du Jura). Recherches écologiques sur le réseau hydrographique du Doubs. Essai de biotypologie. Thèse d’état, Besançon. 1–257.