Spatial structures play important roles in the analysis of ecological data. Living communities are spatially structured at many spatial scales (Borcard, Gillet, and Legendre (2011)).
According to the geographer Waldo R. Tobler, the first law of geography is:
Everything is related to everything else, but near things are more related than distant things.
Spatial autocorrelation can causes problems for statistical methods that make assumptions about the independence of residuals.
Spatial data can be positively spatially autocorrelated, negatively spatially autocorrelated, or not (or randomly) spatially autocorrelated.
A positive spatial autocorrelation means that similar values are close to each other.
A negative spatial autocorrelation means that similar values are distant from each other.
A random spatial autocorrelation means that, in general, similar values are neither close nor distant from each other.
The Moran’s index (Moran’s I) is widely used to measure spatial autocorrelation based on feature locations and feature values simultaneously.
\[\begin{equation} I = \frac{n}{S_0} \frac{\displaystyle\sum_{i=1}^n \sum_{j=1}^n w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\displaystyle\sum_{i=1}^n (x_i - \bar{x})^2} \label{eq:morani} \end{equation}\] where \(w_{ij}\) is the weight between observation \(i\) and \(j\), and \(S_0\) is the sum of all \(w_{ij}\)’s.
Moran’s I can vary between -1 and 1 (like a normal correlation index).
There are two types of Moran’s I:
We will use bird diversity data (https://bit.ly/2BLOwdd) to learn how to deal with spatial autocorrelation.
df <- read_table2("data/bird.diversity.txt") # Load tidyverse first
df <- janitor::clean_names(df) # Clean column names
df
## # A tibble: 64 x 5
## site bird_diversity tree_diversity lon_x lat_y
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 7.18 4.12 -118. 33.7
## 2 2 7.54 6.12 -117. 34.1
## 3 3 4.89 4.1 -118. 34.0
## 4 4 4.15 4.8 -118. 33.8
## 5 5 5.90 3.8 -118. 33.9
## 6 6 4.72 3.7 -118. 33.7
## 7 7 3.16 4.07 -119. 33.7
## 8 8 4.05 5 -118. 33.7
## 9 9 7.27 4.2 -118. 34.1
## 10 10 6.53 4.9 -118. 33.9
## # … with 54 more rows
There is indeed a positive influence of tree diversity on bird diversity.
Spatial pattern in the data?
Relationships between geographical coordinates and bird diversity.
To calculate Moran’s I, we first need to generate a matrix of inverse distance weights. We also need to set the diagonal of the matrix to 0.
sample_distances <- as.matrix(dist(cbind(df$lon_x, df$lat_y)))
sample_distances_inv <- 1 / sample_distances
diag(sample_distances_inv) <- 0
sample_distances_inv[1:6, 1:6] # Show a subset of the data
## 1 2 3 4 5 6
## 1 0.0000000 0.8993377 2.410577 1.780454 2.464641 6.926088
## 2 0.8993377 0.0000000 1.268405 1.789176 1.372575 1.032375
## 3 2.4105769 1.2684053 0.000000 2.849182 8.657011 3.184743
## 4 1.7804543 1.7891758 2.849182 0.000000 4.124330 2.396467
## 5 2.4646408 1.3725750 8.657011 4.124330 0.000000 3.609218
## 6 6.9260877 1.0323754 3.184743 2.396467 3.609218 0.000000
Spatial analysis, in general, should not use lat/long when spatial weights are calculated. This is because observations from distant locations can cause distortion. It is better to calculate distances based on spherical coordinates (projection based coordinates). This can be done using the geosphere package.
We need to install the ape
package to calculate global Moran’s I.
The global Moran’s I is then calculated using the Moran.I()
function.
## $observed
## [1] 0.460358
##
## $expected
## [1] -0.01587302
##
## $sd
## [1] 0.03753738
##
## $p.value
## [1] 0
Note that under the null hypothesis that there is no spatial autocorrelation, I is calculated as:
\[ I = -1/(N-1) \] where N is the number of observations.
## [1] -0.01587302
Here, the observed global Moran’s I has a value of 0.46. Based on these results and the low p-value, we can reject the null hypothesis that there is zero spatial autocorrelation present in the bird diversity data.
Now that we know that there is positive spatial autocorrelation, we can verify at which spatial scales this is occurring. This can be done using the correlog()
function from the pgirmess
package.
There are many approaches to account for spatial autocorrelation (we will see two today).
Create the null model.
# Simple lm model to predict bird diversity
mod_null <- lm(bird_diversity ~ tree_diversity, data = df)
 | bird diversity | ||
---|---|---|---|
Predictors | Estimates | CI | p |
(Intercept) | 1.09 | -1.28 – 3.46 | 0.370 |
tree diversity | 1.14 | 0.73 – 1.55 | <0.001 |
Observations | 64 | ||
R2 / adjusted R2 | 0.324 / 0.313 |
Based on this model, tree_diversity
explains 32 % of the bird_diversity
variance.
Create the model with the lat/lon interaction.
 | bird diversity | ||
---|---|---|---|
Predictors | Estimates | CI | p |
(Intercept) | 10035.70 | 3010.44 – 17060.96 | 0.007 |
tree diversity | 0.30 | -0.05 – 0.65 | 0.097 |
lon x | 85.46 | 25.77 – 145.14 | 0.007 |
lat y | -284.51 | -489.69 – -79.33 | 0.009 |
lon_x:lat_y | -2.42 | -4.17 – -0.68 | 0.008 |
Observations | 64 | ||
R2 / adjusted R2 | 0.699 / 0.678 |
Note the p-value of tree_diversity.
We can compare the models using Akaike’s Information Criterion (AIC).
## df AIC
## mod_null 3 265.1464
## mod_lon_lat 6 219.3481
The results suggest that the mod_lon_lat
model is a better choice. This further suggests that tree_diversity
has a limited influence on bird_diversity
.
Using a variogram (also called semivariogram) is another useful way at looking for spatial autocorrelation. A variogram shows how the total variance in the data is related to the distance between observations.
Image from http://www.flutterbys.com.au/stats/tut/tut8.4a.html
A variogram can take different shapes.
Image from Pinheiro and Bates 2004, p. 233
Variograms can be calculated using the Variogram()
function from the nlme
package. First, we need to create a Generalized Least Squares model using the gls()
.
We will now test different spatial correlation structures and see which one is best for our data. This is done by setting the correlation
argument in the gls()
function.
# Model without spatial structure
mod1 <- gls(bird_diversity ~ tree_diversity, data = df)
# Model with Gaussian structure
mod_gaus <-
gls(
bird_diversity ~ tree_diversity,
correlation = corGaus(form = ~ lon_x + lat_y, nugget = TRUE),
data = df
)
# Model with a Linear structure
mod_lin <-
gls(
bird_diversity ~ tree_diversity,
correlation = corLin(form = ~ lon_x + lat_y, nugget = TRUE),
data = df
)
# Model with Exponential structure
# mod_exp <-
# gls(
# bird_diversity ~ tree_diversity,
# correlation = corExp(form = ~ lon_x + lat_y, nugget = TRUE),
# data = df
# )
# Model with Spherical structure
mod_spher <-
gls(
bird_diversity ~ tree_diversity,
correlation = corSpher(form = ~ lon_x + lat_y, nugget = TRUE),
data = df
)
# Model with Rational Quadratic Correlation Structureo
mod_ratio <-
gls(
bird_diversity ~ tree_diversity,
correlation = corRatio(form = ~ lon_x + lat_y, nugget = TRUE),
data = df
)
## df AIC
## mod1 3 267.5524
## mod_gaus 5 219.8499
## mod_lin 5 221.7109
## mod_spher 5 221.7109
## mod_ratio 5 219.8787
Based on the AIC, we can choose between the Gaussian Correlation and the Rational Quadratic Correlation structures.
The quick inspection of the variogram calculated on the residuals confirms that the spatial structure is gone.
 | NULL model | Gaussian model | ||||
---|---|---|---|---|---|---|
Predictors | Estimates | CI | p | Estimates | CI | p |
(Intercept) | 1.09 | -1.32 – 3.50 | 0.370 | 7.00 | 2.28 – 11.72 | 0.004 |
tree diversity | 1.14 | 0.72 – 1.55 | <0.001 | 0.13 | -0.19 – 0.45 | 0.419 |
Observations | 64 | 64 |
After modeling the spatial structure in the data, tree_diversity
is not significant.
tree_diversity
seemed to be an important predictor of bird_diversity
.lon_x
and lat_y
interaction in a simple linear model suggested that autocorrelation was present in the data.bird_diversity
and tree_diversity
(we first identified a false positive in the data).Borcard, Daniel, Francois Gillet, and Pierre Legendre. 2011. Numerical Ecology with R. New York, NY: Springer New York. https://doi.org/10.1007/978-1-4419-7976-6.