Simple (unconstrained) ordination analyses (such as PCA) on a single data matrix \(X\) helps to reveal its major structure (Borcard, Gillet, and Legendre 2011).
There are not notions of explanatory or response variables.
On contrary, canonical ordination such as RDA explicitly explores the relationships between two matrices: a response matrix and an explanatory matrix.
RDA is the multivariate (meaning multiresponse) technique analogue of regression.
The method uses a mix of linear regression and principal components analysis (PCA).
Conceptually, RDA is a multivariate (meaning multiresponse) multiple linear regression followed by a PCA of the table of fitted values.
Lets define :
RDA procedure works on both centered matrices. This simply means that that the average of the variable is subtracted from each observation.
\[ \bar{X}_j = \sum_{i = 1}^{n} X_{ij} = 0 \]
\[ \bar{Y}_j = \sum_{i = 1}^{n} Y_{ij} = 0 \]
These steps are from Borcard, Gillet, and Legendre (2011) which I highly recommend.
Regress each (centered) \(y\) variable on explanatory matrix \(X\) and compute the fitted (\(\hat{y}\)) and residuals (\(y_{res}\)) vectors.
Create a new matrix (\(\hat{Y}\)) containing all the fitted vectors (\(\hat{y}\)).
Compute a PCA on \(\hat{Y}\). This will produces a vector of canonical eigenvalues and a matrix \(U\) of canonical eigenvectors (principal components).
\(\hat{Y}\) is produced using multiple linear regression between \(X\) and each \(y_i\).
A PCA is performed on \(\hat{Y}\) which gives a set of principal component vectors \(U\).
PCA and RDA are very similar:
PCA is performed on a matrix with explanatory variables.
RDA is performed on a matrix of predicted explanatory variables.
Depends on how site scores are calculated (two possibilities).
Site scores calculated using \(Y \times U\) are simply called site scores where as scores calculates using \(\hat{Y} \times U\) are called site constraints since they are calculated using linear combinations of constraining variables \(X\).
The vegan
package makes it very easy to perform RDA in R using the RDA()
function.
The data come from a Ph.D. thesis (Verneaux, 1973) who proposed to use fish species to characterize ecological zones along European rivers and streams. He showed that fish communities were good biological indicators of these water bodies. Data have been collected at 30 localities along Doubs river.
The first matrix (Y) contains coded abundances of 27 fish species.
The second matrix (X) contains 11 environmental variables related to the hydrology, geomorphology and chemistry of the river.
Reference: Verneaux, J. (1973) Cours d’eau de Franche-Comté (Massif du Jura). Recherches écologiques sur le réseau hydrographique du Doubs. Essai de biotypologie. Thèse d’état, Besançon. 1–257.
Matrix \(Y\) contains fish species abundance.
CHA | TRU | VAI | LOC | OMB | BLA | HOT | TOX | VAN | CHE | |
---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 5 | 4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 5 | 5 | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 4 | 5 | 5 | 0 | 0 | 0 | 0 | 0 | 1 |
5 | 0 | 2 | 3 | 2 | 0 | 0 | 0 | 0 | 5 | 2 |
6 | 0 | 3 | 4 | 5 | 0 | 0 | 0 | 0 | 1 | 2 |
7 | 0 | 5 | 4 | 5 | 0 | 0 | 0 | 0 | 1 | 1 |
8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9 | 0 | 0 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 5 |
10 | 0 | 1 | 4 | 4 | 0 | 0 | 0 | 0 | 2 | 2 |
Lets transform abundance data (\(Y\)) using the Hellinger-transform method (Borcard, Gillet, and Legendre 2011; Legendre and Gallagher 2001).
Particularly suited to species abundance data, this transformation gives low weights to variables with low counts and many zeros. The transformation itself comprises dividing each value in a data matrix by its row sum, and taking the square root of the quotient.
Source: http://mb3is.megx.net/gustame/reference/transformations
In R, the Hellinger-transform is performed using the decostand()
function from vegan
package.
CHA | TRU | VAI | LOC | OMB | BLA | HOT | TOX | VAN | CHE | |
---|---|---|---|---|---|---|---|---|---|---|
1 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2 | 0.00 | 0.65 | 0.58 | 0.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
3 | 0.00 | 0.56 | 0.56 | 0.56 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
4 | 0.00 | 0.44 | 0.49 | 0.49 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.22 |
5 | 0.00 | 0.24 | 0.30 | 0.24 | 0.00 | 0.00 | 0.00 | 0.00 | 0.38 | 0.24 |
6 | 0.00 | 0.38 | 0.44 | 0.49 | 0.00 | 0.00 | 0.00 | 0.00 | 0.22 | 0.31 |
7 | 0.00 | 0.56 | 0.50 | 0.56 | 0.00 | 0.00 | 0.00 | 0.00 | 0.25 | 0.25 |
8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
9 | 0.00 | 0.00 | 0.27 | 0.46 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.60 |
10 | 0.00 | 0.27 | 0.53 | 0.53 | 0.00 | 0.00 | 0.00 | 0.00 | 0.38 | 0.38 |
Environemental data (\(X\))
das | alt | pen | deb | pH | dur | pho | nit | amm | oxy | dbo | |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.30 | 934 | 48.00 | 0.84 | 7.90 | 45 | 0.01 | 0.20 | 0.00 | 12.20 | 2.70 |
2 | 2.20 | 932 | 3.00 | 1.00 | 8.00 | 40 | 0.02 | 0.20 | 0.10 | 10.30 | 1.90 |
3 | 10.20 | 914 | 3.70 | 1.80 | 8.30 | 52 | 0.05 | 0.22 | 0.05 | 10.50 | 3.50 |
4 | 18.50 | 854 | 3.20 | 2.53 | 8.00 | 72 | 0.10 | 0.21 | 0.00 | 11.00 | 1.30 |
5 | 21.50 | 849 | 2.30 | 2.64 | 8.10 | 84 | 0.38 | 0.52 | 0.20 | 8.00 | 6.20 |
6 | 32.40 | 846 | 3.20 | 2.86 | 7.90 | 60 | 0.20 | 0.15 | 0.00 | 10.20 | 5.30 |
7 | 36.80 | 841 | 6.60 | 4.00 | 8.10 | 88 | 0.07 | 0.15 | 0.00 | 11.10 | 2.20 |
8 | 49.10 | 792 | 2.50 | 1.30 | 8.10 | 94 | 0.20 | 0.41 | 0.12 | 7.00 | 8.10 |
9 | 70.50 | 752 | 1.20 | 4.80 | 8.00 | 90 | 0.30 | 0.82 | 0.12 | 7.20 | 5.20 |
10 | 99.00 | 617 | 9.90 | 10.00 | 7.70 | 82 | 0.06 | 0.75 | 0.01 | 10.00 | 4.30 |
Environmental data (X)
Code | Description of the variable |
---|---|
das | Distance from the source [km] |
alt | Altitude [m a.s.l.] |
pen | Slope [per thousand] |
deb | Mean minimum discharge [m3s-1] |
pH | pH of water |
dur | Calcium concentration (hardness) [mgL-1] |
pho | Phosphate concentration [mgL-1] |
nit | Nitrate concentration [mgL-1] |
amn | Ammonium concentration [mgL-1] |
oxy | Dissolved oxygen [mgL-1] |
dbo | Biological oxygen demand [mgL-1] |
Can you spot potential problems?
To make the syntax easier, you can use the formulation
y ~ ., data = x
which means: y as a function of all variables in x.
First part -> contribution to explained variance
Second part -> contribution to total variance
For RDA, the visualization plot is called triplot since there are three different entities in the plot: sites, response variables and explanatory variables (Borcard, Gillet, and Legendre 2011).
Attention: \(R^2\) as the relative contribution of each eigenvectors are unadjusted and are therefore biased. For a proper computation of unbiased, adjusted \(R^2\) one should use the RsquareAdj()
function.
## $r.squared
## [1] 0.7269421
##
## $adj.r.squared
## [1] 0.5600733
\[ R^2_{adj} = 1 - \frac{n - 1}{n - m - 1}(1 - R^2) \]
Most of the time, ecological data are not normally-distributed: parametric tests are not appropriate.
For canonical analysis such as RDA, model significance relies on permutation tests.
The principle of a permutation test is to generate a reference distribution of the chosen statistic under the null hypothesis \(H0\) by randomly permuting appropriate elements of the data a large number of times and recomputing the statistic each time. Then, one compares the true value of the statistic to this reference distribution. The \(p\) value is computed as the proportion of the permuted values equal to or larger than the true (unpermuted) value of the statistic for a one-tailed test in the upper tail, like the \(F\) test used in RDA (Borcard, Gillet, and Legendre 2011).
\(H0\): Observed results can be produced by random chance.
The null hypothesis is rejected if this \(p\) value is equal to or smaller than the predefined significance level \(\alpha\) (ex.: \(\alpha = 0.05\)).
For an RDA, you have to test for three different things:
Global RDA significance
Axis significance
Terms (explanatory variables) significance
We use the function anova.cca()
to perform the permutation test. Do not get confused with the name, is has nothing to do with the classical ANOVA test.
Argument
permutations
gives the minimal number of permutations requested to assess if the F value of a test is obviously significant or not (Borcard, Gillet, and Legendre 2011).
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spe_hellinger ~ das + alt + pen + deb + pH + dur + pho + nit + amm + oxy + dbo, data = env)
## Df Variance F Pr(>F)
## Model 11 0.36517 4.3564 0.001 ***
## Residual 18 0.13717
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since \(p\) = 0.001 we reject \(H0\) meaning that the RDA model is significant.
Axis significance can be done by adding by = "axis"
.
## Permutation test for rda under reduced model
## Forward tests for axes
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spe_hellinger ~ das + alt + pen + deb + pH + dur + pho + nit + amm + oxy + dbo, data = env)
## Df Variance F Pr(>F)
## RDA1 1 0.221790 29.1044 0.001 ***
## RDA2 1 0.056112 7.3633 0.001 ***
## RDA3 1 0.032462 4.2599 0.090 .
## RDA4 1 0.027880 3.6585 0.259
## RDA5 1 0.009756 1.2802 0.998
## RDA6 1 0.005466 0.7173 1.000
## RDA7 1 0.005127 0.6728 0.999
## RDA8 1 0.003182 0.4176 1.000
## RDA9 1 0.001958 0.2569 1.000
## RDA10 1 0.000880 0.1155 1.000
## RDA11 1 0.000562 0.0737 1.000
## Residual 18 0.137169
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Terms significance can be done by adding by = "terms"
.
## Permutation test for rda under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spe_hellinger ~ das + alt + pen + deb + pH + dur + pho + nit + amm + oxy + dbo, data = env)
## Df Variance F Pr(>F)
## das 1 0.183727 24.1096 0.001 ***
## alt 1 0.031610 4.1481 0.009 **
## pen 1 0.023637 3.1018 0.007 **
## deb 1 0.039763 5.2179 0.002 **
## pH 1 0.007411 0.9725 0.389
## dur 1 0.014727 1.9326 0.114
## pho 1 0.029551 3.8778 0.004 **
## nit 1 0.010425 1.3680 0.229
## amm 1 0.008019 1.0523 0.362
## oxy 1 0.011292 1.4819 0.180
## dbo 1 0.005013 0.6578 0.619
## Residual 18 0.137169
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As with multiple linear regressions, there are chances to have collinearity among predictors (\(X\)).
This needs to be carefully verified.
Lets look at the environmental matrix (\(X\)) used for the RDA.
Multicollinearity can be verified using the vif.cca()
function.
## das alt pen deb pH dur pho
## 10.198473 6.292054 1.422669 6.004052 1.127849 1.954401 5.032821
## nit amm oxy dbo
## 4.157438 5.576650 3.600462 4.203407
As a rule of the thumb, if \(\sqrt{VIF} > 2\) multicollinearity is considered high.
Here, it seems that some of explanatory variables are redundant.
Should we reduce the number of explanatory variables?
Like we saw for the multiple regression analysis, we can use the forward.sel()
function to select the best explanatory variables.
You have to install the packfor
package.
To prevent problem with inflation of type I error: compute the global model (i.e. using all explanatory variables) and calculate p value. If the model is significant, perform the forward selection.
To prevent adding too many explanatory variables: compute the global \(R^2_{adj}\) of the model. Use this value as the second stopping criteria.
library(packfor)
# Global model with all environmental variables
my_rda <- rda(spe_hellinger ~ ., data = env)
# global R2
global_r2 <- RsquareAdj(my_rda)$adj.r.squared
global_r2
## [1] 0.5600733
Perform the forward selection with \(R^2_{adj} = 0.56\).
# Forward selection
fs <- forward.sel(spe_hellinger, # Y matrix
env, # X matrix
adjR2thresh = global_r2, # Set the adj.R2 threshold
alpha = 0.001, # Set alpha level
nperm = 999 # Number of permutations
)
## Testing variable 1
## Testing variable 2
## Testing variable 3
## Testing variable 4
## Procedure stopped (alpha criteria): pvalue for variable 4 is 0.005000 (superior to 0.001000)
The forward selection has determined that fish species gradient was best modeled using 3 environmental variables.
## variables order R2 R2Cum AdjR2Cum F pval
## 1 das 1 0.3657395 0.3657395 0.3430874 16.145902 0.001
## 2 oxy 10 0.1235899 0.4893295 0.4515020 6.534406 0.001
## 3 dbo 11 0.0746889 0.5640184 0.5137128 4.454113 0.001
Based on the forward selection, it seems that das
, oxy
and dbo
are good predictors.
Lets create another RDA using the selected variables.
## [1] 0.5600733
## [1] 0.5137128
The RDA using only 3 explanatory variables explains almost as much as the RDA using 11 explanatory variables.
## das alt pen deb pH dur pho
## 10.198473 6.292054 1.422669 6.004052 1.127849 1.954401 5.032821
## nit amm oxy dbo
## 4.157438 5.576650 3.600462 4.203407
## das oxy dbo
## 1.165765 1.991699 1.865008
Remember: \(\sqrt{VIF} > 2\) means high multicollinearity among predictor.
Axis significance can be done by adding by = "axis"
.
## Permutation test for rda under reduced model
## Forward tests for axes
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spe_hellinger ~ das + oxy + dbo, data = env)
## Df Variance F Pr(>F)
## RDA1 1 0.210795 25.0245 0.001 ***
## RDA2 1 0.050414 5.9849 0.001 ***
## RDA3 1 0.022121 2.6261 0.007 **
## Residual 26 0.219012
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Terms significance can be done by adding by = "terms"
.
## Permutation test for rda under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spe_hellinger ~ das + oxy + dbo, data = env)
## Df Variance F Pr(>F)
## das 1 0.183727 21.8111 0.001 ***
## oxy 1 0.062085 7.3704 0.001 ***
## dbo 1 0.037519 4.4541 0.004 **
## Residual 26 0.219012
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
It is a good idea to look at the residual variation in the data. We can use the Kaiser-Guttman criterion (Kaiser 1960).
## PC1 PC2 PC3 PC4 PC5 PC6
## 0.044375541 0.023551441 0.018339110 0.012026190 0.010230639 0.007679747
Explore our data for your using an RDA.
If you do not have data, use the following data and create an RDA to predict vascular plant abundance with chemical variables.
It is possible to run an RDA of a matrix Y, explained by a matrix variables X, in the presence of co-variable(s) W.
This analysis allows to display the patterns of the response data (Y) uniquely explained by a linear model of the explanatory variables (X) when the effect of other covariates (W) is held constant (Borcard, Gillet, and Legendre 2011).
The analysis that follows determines whether water chemistry (X) significantly explains the fish species patterns (Y) when the effect of the topographical gradient (W) is held constant.
In other words, we want to model the effect of water chemistry variables on fish abundance once the effect of topographic variables is removed.
The construction of a partial RDA is pretty much the same as for a standard RDA except that we are using the Condition(...)
function to specify which covariate (\(W\)) we want to remove the effect.
##
## Call:
## rda(formula = spe_hellinger ~ pH + dur + pho + nit + amm + oxy + dbo + Condition(das + alt + pen + deb), data = env)
##
## Partitioning of variance:
## Inertia Proportion
## Total 0.50234 1.0000
## Conditioned 0.27874 0.5549
## Constrained 0.08644 0.1721
## Unconstrained 0.13717 0.2731
##
## Eigenvalues, and their contribution to the variance
## after removing the contribution of conditiniong variables
##
## Importance of components:
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## Eigenvalue 0.04636 0.01732 0.01080 0.005924 0.003477 0.001806
## Proportion Explained 0.20734 0.07747 0.04829 0.026493 0.015549 0.008077
## Cumulative Proportion 0.20734 0.28481 0.33311 0.359600 0.375149 0.383225
## RDA7 PC1 PC2 PC3 PC4 PC5
## Eigenvalue 0.0007458 0.04438 0.02355 0.01834 0.01203 0.01023
## Proportion Explained 0.0033352 0.19845 0.10533 0.08202 0.05378 0.04575
## Cumulative Proportion 0.3865606 0.58501 0.69034 0.77236 0.82614 0.87189
## PC6 PC7 PC8 PC9 PC10 PC11
## Eigenvalue 0.00768 0.005359 0.004465 0.003185 0.002903 0.001800
## Proportion Explained 0.03435 0.023966 0.019970 0.014242 0.012982 0.008051
## Cumulative Proportion 0.90624 0.930203 0.950173 0.964415 0.977397 0.985449
## PC12 PC13 PC14 PC15 PC16
## Eigenvalue 0.001160 0.0008835 0.0004229 0.0003872 0.0002722
## Proportion Explained 0.005189 0.0039509 0.0018914 0.0017318 0.0012175
## Cumulative Proportion 0.990638 0.9945887 0.9964801 0.9982119 0.9994294
## PC17 PC18
## Eigenvalue 9.722e-05 3.036e-05
## Proportion Explained 4.348e-04 1.358e-04
## Cumulative Proportion 9.999e-01 1.000e+00
##
## Accumulated constrained eigenvalues
## Importance of components:
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## Eigenvalue 0.04636 0.01732 0.0108 0.005924 0.003477 0.001806
## Proportion Explained 0.53638 0.20041 0.1249 0.068536 0.040223 0.020894
## Cumulative Proportion 0.53638 0.73679 0.8617 0.930255 0.970479 0.991372
## RDA7
## Eigenvalue 0.0007458
## Proportion Explained 0.0086278
## Cumulative Proportion 1.0000000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 1.953663
##
##
## Species scores
##
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## CHA -0.006736 0.056075 -0.067508 -0.031867 0.0328150 -0.003809
## TRU 0.056411 0.215314 0.048162 0.042760 0.0245275 0.013154
## VAI 0.113326 0.120592 0.082918 -0.011714 -0.0447019 -0.010303
## LOC 0.178107 0.121882 0.076960 -0.052258 -0.0432833 0.013544
## OMB -0.032569 0.062566 -0.072818 -0.010169 0.0033421 -0.026404
## BLA 0.021748 0.047012 -0.068045 -0.023806 0.0620808 0.020182
## HOT 0.079670 -0.046183 -0.014754 0.043955 -0.0268121 0.033513
## TOX 0.147921 -0.059793 0.019036 0.059762 -0.0006972 -0.027146
## VAN 0.061168 -0.067139 0.069243 -0.058602 0.0234703 -0.018282
## CHE -0.107322 -0.044838 0.017223 -0.097562 -0.0522028 -0.017556
## BAR 0.176698 -0.006498 -0.038188 0.022431 0.0532397 0.010502
## SPI 0.137367 -0.061973 0.019658 0.043529 -0.0029242 -0.019346
## GOU 0.047367 -0.044509 0.102003 -0.053955 0.0363768 0.041214
## BRO 0.068898 0.006641 0.091837 -0.003211 0.0526976 -0.043015
## PER 0.136899 -0.041390 0.024280 0.001626 -0.0360536 -0.011459
## BOU 0.152817 -0.027459 0.002190 0.026119 -0.0020938 -0.012330
## PSO 0.118847 -0.045988 -0.025997 0.024803 -0.0251135 0.010376
## ROT 0.005053 -0.052845 0.066721 -0.011614 0.0502122 -0.010447
## CAR 0.139290 -0.018109 -0.011819 0.010822 0.0002110 -0.014721
## TAN 0.167530 -0.048528 -0.012838 -0.040782 0.0154087 0.026609
## BCO 0.119418 0.010340 -0.023971 -0.005092 -0.0178581 0.018706
## PCH 0.075090 0.011810 -0.039215 -0.013490 -0.0157576 0.002078
## GRE 0.030458 -0.052458 0.050348 -0.016292 0.0118461 0.044864
## GAR -0.082276 -0.120778 0.032477 0.018500 -0.0112214 0.013140
## BBO 0.103534 -0.010786 -0.049588 0.017441 -0.0227933 0.028896
## ABL -0.205858 0.015811 0.108859 0.100722 -0.0031471 0.018825
## ANG 0.134038 -0.021178 -0.006907 0.019326 0.0024788 -0.021222
##
##
## Site scores (weighted sums of species scores)
##
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## 1 -0.01065 -0.16652 -0.14794 0.264469 0.32820 0.12189
## 2 0.06446 0.78394 -0.09872 1.083078 -0.03281 0.71002
## 3 0.12808 0.73956 0.18636 0.853163 0.26171 -0.14300
## 4 0.23832 0.03298 0.49519 -0.068726 -0.04972 0.15323
## 5 0.05334 -1.33539 0.68190 -0.459798 1.22152 -0.42072
## 6 0.18560 -0.21669 0.49966 -0.695084 -0.05519 0.24449
## 7 0.05283 0.60479 0.20124 -0.225805 -0.64900 -0.17305
## 8 -0.40692 -0.70449 -1.51926 0.915941 1.03301 -0.47344
## 9 -0.29966 -0.74228 -0.51274 -1.021561 -1.47758 0.04004
## 10 0.03389 0.09186 0.78500 -1.599635 -1.30992 -0.22610
## 11 -0.32573 0.36537 -0.16671 0.007798 -1.25477 -0.33065
## 12 -0.26473 0.65465 -0.28707 -0.009253 -1.00531 -0.15052
## 13 -0.23858 0.82346 -0.73758 0.397032 0.31015 0.24887
## 14 -0.14521 0.64434 -0.45194 -0.308003 0.97783 0.10133
## 15 -0.07509 0.18484 -0.36303 -1.175132 0.96867 0.95440
## 16 0.45081 0.19081 -0.05680 -0.610718 1.08501 -0.94728
## 17 0.34019 -0.49636 0.17749 0.846064 0.08994 -0.96113
## 18 0.33976 -0.60291 0.26129 0.555080 -0.03882 -0.55441
## 19 0.38051 -0.88714 0.55807 0.614652 -0.14071 0.93982
## 20 0.34744 -0.88344 0.41751 0.596592 -0.01302 0.39399
## 21 0.46064 -0.27667 0.14040 0.326027 0.16471 0.04939
## 22 0.43425 -0.19827 -0.14184 0.040092 0.14864 0.06692
## 23 -1.13447 0.44175 -0.09791 0.534633 -1.15993 -0.98447
## 24 -0.69985 0.30306 -0.13651 0.504795 -0.98194 2.12108
## 25 -0.85845 -0.15106 1.17533 -0.544402 1.32867 -0.10600
## 26 0.20536 0.29058 -0.06739 -0.209412 0.14905 0.44500
## 27 0.35017 0.50452 -0.25649 -0.352568 0.33618 -0.32536
## 28 0.33524 0.64514 -0.37172 -0.416249 0.08103 -0.67630
## 29 0.03591 -0.33363 0.00736 0.145693 -0.25999 0.29330
## 30 0.02253 -0.30681 -0.17314 0.011237 -0.05560 -0.41134
##
##
## Site constraints (linear combinations of constraining variables)
##
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## 1 -0.02391 -0.036707 -0.03327 0.07930 0.16118 0.075277
## 2 0.33516 0.006732 0.58753 0.68365 0.18160 0.551285
## 3 0.07580 0.382694 -0.04659 0.55873 0.52004 0.115742
## 4 0.20289 0.239507 0.35641 -0.40527 -0.14871 0.005679
## 5 -0.31485 -0.563519 -0.25858 0.02004 0.17970 -0.097047
## 6 0.06779 0.585164 0.26344 0.03847 -0.39751 0.005655
## 7 -0.02030 0.465622 -0.07491 -0.58245 -0.24644 -0.534697
## 8 -0.40872 -0.553370 -0.70505 -0.04871 -0.06311 -0.276797
## 9 -0.04102 -0.760693 -0.17589 -0.42206 0.01551 0.073980
## 10 0.06557 -0.264006 0.16238 -0.16112 -0.66799 -0.049270
## 11 0.02127 -0.202447 -0.03652 -0.38875 0.27358 -0.297116
## 12 -0.07716 0.191338 0.11839 0.23000 -0.56646 0.108175
## 13 -0.18417 0.231607 -0.20020 -0.01579 -0.32286 -0.014005
## 14 -0.16408 0.573558 -0.39262 -0.14808 0.15340 -0.401418
## 15 -0.08290 0.239785 -0.49693 -0.14732 0.93083 0.762921
## 16 0.37084 -0.076852 0.09606 -0.28534 -0.07623 0.037788
## 17 0.10555 -0.408005 -0.03129 0.60953 0.08629 -0.606067
## 18 0.21528 -0.292934 0.26194 -0.07232 0.06156 0.180256
## 19 0.25110 0.118684 0.20929 -0.15871 0.19487 0.336569
## 20 0.39265 -0.244266 0.27743 0.58776 0.09885 -0.428379
## 21 0.43229 -0.108845 0.08768 0.13806 -0.32203 0.191321
## 22 0.21640 0.218611 -0.32197 0.13614 -0.26136 0.418909
## 23 -1.03978 0.716385 -0.04610 0.42770 -0.21453 -0.102241
## 24 -0.49025 -0.331472 -0.37860 0.20922 -0.24626 0.461350
## 25 -0.94699 -0.265809 1.22095 -0.24306 0.56368 -0.164705
## 26 0.17052 0.100774 0.19412 -0.88716 -0.11180 0.441012
## 27 0.34452 -0.045063 -0.19822 0.26767 0.12702 -0.086841
## 28 0.54104 0.331874 -0.14415 -0.15916 0.58080 -0.889494
## 29 0.01374 -0.233864 0.07644 0.18198 -0.58074 -0.132751
## 30 -0.02828 -0.014485 -0.37119 -0.04297 0.09712 0.314907
##
##
## Biplot scores for constraining variables
##
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## pH -0.009684 0.35319 -0.5252 0.01125 0.713943 0.04296
## dur -0.270682 -0.17691 -0.2616 -0.31065 -0.111336 -0.27432
## pho -0.672975 0.01490 0.4494 -0.10956 0.225350 -0.00608
## nit -0.209209 -0.08541 0.3855 -0.01586 0.285438 -0.21167
## amm -0.670851 -0.05230 0.4774 0.06538 0.212485 -0.09336
## oxy 0.436621 0.33888 -0.1078 0.07314 -0.042087 -0.03581
## dbo -0.767570 -0.05931 0.1696 0.01937 0.008685 -0.01508
According to Borcard, Gillet, and Legendre (2011):
Conditioned: gives the amount of variance that has been explained by the covariables and removed (55.49%).
Constrained: gives the amount of variance uniquely explained by the explanatory variables (17.21%).
Unconstrained: gives the residual variance (27.31%).
Caution: the values are unadjusted and are therefore biased.
## $r.squared
## [1] 0.1720683
##
## $adj.r.squared
## [1] 0.07641971
As for RDA without covariates we can use the anova.cca()
function to test the significance of partial RDA.
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spe_hellinger ~ pH + dur + pho + nit + amm + oxy + dbo + Condition(das + alt + pen + deb), data = env)
## Df Variance F Pr(>F)
## Model 7 0.086437 1.6204 0.021 *
## Residual 18 0.137169
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since \(p \leq 0.05\) we reject \(H0\) and can conclude that the result was not produced randomly.
Run anova.cca(partial_rda)
couple of times. What are you observing?
## [1] 0.5600733
# Adj-R2 after removing effect of topography
RsquareAdj(rda(spe_hellinger ~ pH + dur + pho + nit + amm + oxy +
dbo + Condition(das + alt + pen + deb), data = env))$adj.r.squared
## [1] 0.07641971
# Adj-R2 after removing effect of water chemistry
RsquareAdj(rda(spe_hellinger ~ das + alt + pen + deb + Condition(pH +
dur + pho + nit + amm + oxy + dbo), data = env))$adj.r.squared
## [1] 0.1007639
Why 0.08% + 0.1% is not equal to 0.56%?
We just saw that the \(R^2_{adj}\) of each partial RDA does not sum up to the \(R^2_{adj}\) of the global model.
56% - 8% - 10 = 38%
Where is the missing 38%?
There must have confounded variance explained by both partial RDA.
In R, variance partitioning among partial RDA is done using the varpart
function from the vegan
package.
env_topograhpy <- env[, 1:4] # Extract topography variables
env_water_chemistry <- env[, 5:11] # Extract water chem variables
vp <- varpart(spe_hellinger, env_water_chemistry, env_topograhpy)
vp
##
## Partition of variance in RDA
##
## Call: varpart(Y = spe_hellinger, X = env_water_chemistry,
## env_topograhpy)
##
## Explanatory tables:
## X1: env_water_chemistry
## X2: env_topograhpy
##
## No. of explanatory tables: 2
## Total variation (SS): 14.568
## Variance: 0.50234
## No. of observations: 30
##
## Partition table:
## Df R.squared Adj.R.squared Testable
## [a+b] = X1 7 0.58982 0.45931 TRUE
## [b+c] = X2 4 0.55487 0.48365 TRUE
## [a+b+c] = X1+X2 11 0.72694 0.56007 TRUE
## Individual fractions
## [a] = X1|X2 7 0.07642 TRUE
## [b] 0 0.38289 FALSE
## [c] = X2|X1 4 0.10076 TRUE
## [d] = Residuals 0.43993 FALSE
## ---
## Use function 'rda' to test significance of fractions of interest
## $r.squared
## [1] 0.1720683
##
## $adj.r.squared
## [1] 0.07641971
We have tested the global significance of the RDA. We also need to do it for each fraction: (1) [a + b], (2) [b + c], (3) [a + b + c], (4) [a] and (5) [c]
Same as a normal RDA using permutation test.
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(X = spe_hellinger, Y = env_water_chemistry)
## Df Variance F Pr(>F)
## Model 7 0.29629 4.5193 0.001 ***
## Residual 22 0.20605
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(X = spe_hellinger, Y = env_topograhpy)
## Df Variance F Pr(>F)
## Model 4 0.27874 7.791 0.001 ***
## Residual 25 0.22361
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(X = spe_hellinger, Y = env)
## Df Variance F Pr(>F)
## Model 11 0.36517 4.3564 0.001 ***
## Residual 18 0.13717
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(X = spe_hellinger, Y = env_water_chemistry, Z = env_topograhpy)
## Df Variance F Pr(>F)
## Model 7 0.086437 1.6204 0.019 *
## Residual 18 0.137169
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(X = spe_hellinger, Y = env_topograhpy, Z = env_water_chemistry)
## Df Variance F Pr(>F)
## Model 4 0.068882 2.2598 0.003 **
## Residual 18 0.137169
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With varpart
it is possible to have up to 4 explanatory matrices.
Variable selection is kind of powerful statistical tool to select best explanatory variables. However, it should not replace good scientific knowledge and good scientific hypothesis! Don’t go fishing on your explanatory variables.
Perform a RDA on fish species data using water chemistry variables as X and topography variables as W.
Perform variable selection on X using \(\alpha = 0.01\).
Perform variable selection on W using \(\alpha = 0.01\).
Keep selected variables in both X and W and perform variance partitioning.
Test each fraction significance.
What are the differences with the previous varpart we performed?
Use env
and create three data frames and perform a RDA using three sets of covariates and perform variance partitioning.
## # A tibble: 6 x 11
## das alt pen deb pH dur pho nit amm oxy dbo
## <dbl> <int> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.3 934 48 0.84 7.9 45 0.01 0.2 0 12.2 2.7
## 2 2.2 932 3 1 8 40 0.02 0.2 0.1 10.3 1.9
## 3 10.2 914 3.7 1.8 8.3 52 0.05 0.22 0.05 10.5 3.5
## 4 18.5 854 3.2 2.53 8 72 0.1 0.21 0 11 1.3
## 5 21.5 849 2.3 2.64 8.1 84 0.38 0.52 0.2 8 6.2
## 6 32.4 846 3.2 2.86 7.9 60 0.2 0.15 0 10.2 5.3
Same amount of explained variance, just divided differently.
http://evol.bio.lmu.de/_statgen/Multivariate/11SS/rda.pdf
Borcard, Daniel, Francois Gillet, and Pierre Legendre. 2011. Numerical Ecology with R. New York, NY: Springer New York. https://doi.org/10.1007/978-1-4419-7976-6.
Kaiser, H. F. 1960. “The application of electronic computers to factor analysis.” Educ. Psychol. Meas. 20 (1): 141–51. https://doi.org/10.1177/001316446002000116.
Legendre, Pierre, and Eugene Gallagher. 2001. “Ecologically meaningful transformations for ordination of species data.” Oecologia 129 (2): 271–80. https://doi.org/10.1007/s004420100716.