A brief introduction to spatial autocorrelation

What is spatial autocorrelation?

Spatial structures play important roles in the analysis of ecological data. Living communities are spatially structured at many spatial scales (Borcard, Gillet, and Legendre (2011)).
The environmental elements such as climatic, physical, chemical forces control living communities. If these factors are spatially structured, their patterns will reflect on the living communities.
- Example: patches in the desert where the soil is humid enough to support vegetation.

Spatial autocorrelation

According to the geographer Waldo R. Tobler, the first law of geography is:

Everything is related to everything else, but near things are more related than distant things.

Spatial autocorrelation can causes problems for statistical methods that make assumptions about the independence of residuals.

Spatial autocorrelation

Spatial data can be positively spatially autocorrelated, negatively spatially autocorrelated, or not (or randomly) spatially autocorrelated.

A positive spatial autocorrelation means that similar values are close to each other.

A negative spatial autocorrelation means that similar values are distant from each other.

A random spatial autocorrelation means that, in general, similar values are neither close nor distant from each other.

Moran’s index

The Moran’s index (Moran’s I) is widely used to measure spatial autocorrelation based on feature locations and feature values simultaneously.

\[\begin{equation} I = \frac{n}{S_0} \frac{\displaystyle\sum_{i=1}^n \sum_{j=1}^n w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\displaystyle\sum_{i=1}^n (x_i - \bar{x})^2} \label{eq:morani} \end{equation}\] where \(w_{ij}\) is the weight between observation \(i\) and \(j\), and \(S_0\) is the sum of all \(w_{ij}\)’s.

Moran’s index

Moran’s I can vary between -1 and 1 (like a normal correlation index).

Moran’s I

There are two types of Moran’s I:

Global Moran’s I is a measure of the overall spatial autocorrelation.
Local Moran’s I is a measure of the local spatial autocorrelation (e.x.: locally at each station).

A working example

We will use bird diversity data (https://bit.ly/2BLOwdd) to learn how to deal with spatial autocorrelation.

df <- read_table2("data/bird.diversity.txt")  # Load tidyverse first
df <- janitor::clean_names(df)  # Clean column names

df

## # A tibble: 64 x 5
##     site bird_diversity tree_diversity lon_x lat_y
##    <dbl>          <dbl>          <dbl> <dbl> <dbl>
##  1     1           7.18           4.12 -118.  33.7
##  2     2           7.54           6.12 -117.  34.1
##  3     3           4.89           4.1  -118.  34.0
##  4     4           4.15           4.8  -118.  33.8
##  5     5           5.90           3.8  -118.  33.9
##  6     6           4.72           3.7  -118.  33.7
##  7     7           3.16           4.07 -119.  33.7
##  8     8           4.05           5    -118.  33.7
##  9     9           7.27           4.2  -118.  34.1
## 10    10           6.53           4.9  -118.  33.9
## # … with 54 more rows

Tree diversity explains bird diversity

There is indeed a positive influence of tree diversity on bird diversity.

ggplot(df, aes(x = tree_diversity, y = bird_diversity)) + 
    geom_point(size = 3) + geom_smooth(method = "lm")

Spatial distribution of the data

Spatial pattern in the data?

ggplot(df, aes(x = lon_x, y = lat_y, size = bird_diversity)) + 
    geom_point()

Spatial distribution of the data

Relationships between geographical coordinates and bird diversity.

Global Moran’s I

To calculate Moran’s I, we first need to generate a matrix of inverse distance weights. We also need to set the diagonal of the matrix to 0.

sample_distances <- as.matrix(dist(cbind(df$lon_x, df$lat_y)))
sample_distances_inv <- 1 / sample_distances
diag(sample_distances_inv) <- 0

sample_distances_inv[1:6, 1:6] # Show a subset of the data

##           1         2        3        4        5        6
## 1 0.0000000 0.8993377 2.410577 1.780454 2.464641 6.926088
## 2 0.8993377 0.0000000 1.268405 1.789176 1.372575 1.032375
## 3 2.4105769 1.2684053 0.000000 2.849182 8.657011 3.184743
## 4 1.7804543 1.7891758 2.849182 0.000000 4.124330 2.396467
## 5 2.4646408 1.3725750 8.657011 4.124330 0.000000 3.609218
## 6 6.9260877 1.0323754 3.184743 2.396467 3.609218 0.000000

Notes on the calculation of spatial weights

Spatial analysis, in general, should not use lat/long when spatial weights are calculated. This is because observations from distant locations can cause distortion. It is better to calculate distances based on spherical coordinates (projection based coordinates). This can be done using the geosphere package.

Global Moran’s I

We need to install the ape package to calculate global Moran’s I.

install.packages("ape")
library(ape)

Global Moran’s I

The global Moran’s I is then calculated using the Moran.I() function.

moran <- Moran.I(df$bird_diversity, sample_distances_inv)
moran

## $observed
## [1] 0.460358
## 
## $expected
## [1] -0.01587302
## 
## $sd
## [1] 0.03753738
## 
## $p.value
## [1] 0

Global Moran’s I

Note that under the null hypothesis that there is no spatial autocorrelation, I is calculated as:

\[ I = -1/(N-1) \] where N is the number of observations.

-1/(nrow(df) - 1)  # Expected Moran's I

## [1] -0.01587302

Here, the observed global Moran’s I has a value of 0.46. Based on these results and the low p-value, we can reject the null hypothesis that there is zero spatial autocorrelation present in the bird diversity data.

Local Moran’s I

Now that we know that there is positive spatial autocorrelation, we can verify at which spatial scales this is occurring. This can be done using the correlog() function from the pgirmess package.

install.packages("pgirmess")
library(pgirmess)

Local Moran’s I

clog <- correlog(df[, c("lon_x", "lat_y")], z = df$bird_diversity)
plot(clog)

A graphical view

Taking care of spatial autocorrelation

There are many approaches to account for spatial autocorrelation (we will see two today).

Include the geographical positions, i.e. the latitude-longitude interaction term in the analyses.
Explicitly provide a spatial correlation structure that describes the nature of the autocorrelation.

Method 1: Include the geographical positions

First, we create the null model where we do not control for spatial autocorrelation.

Then, to account for potential spatial autocorrelation we include the geographical position, i.e. the latitude-longitude interaction term in the analyses.

Finally, we compare both models to see if spatial autocorrelation is present in the data.

Method 1: Include the geographical positions

Create the null model.

# Simple lm model to predict bird diversity
mod_null <- lm(bird_diversity ~ tree_diversity, data = df)

	bird diversity
Predictors	Estimates	CI	p
(Intercept)	1.09	-1.28 – 3.46	0.370
tree diversity	1.14	0.73 – 1.55	<0.001
Observations	64
R² / adjusted R²	0.324 / 0.313

Based on this model, tree_diversity explains 32 % of the bird_diversity variance.

Method 1: Include the geographical positions

Create the model with the lat/lon interaction.

mod_lon_lat <- lm(bird_diversity ~ tree_diversity + 
    lon_x * lat_y, data = df)

	bird diversity
Predictors	Estimates	CI	p
(Intercept)	10035.70	3010.44 – 17060.96	0.007
tree diversity	0.30	-0.05 – 0.65	0.097
lon x	85.46	25.77 – 145.14	0.007
lat y	-284.51	-489.69 – -79.33	0.009
lon_x:lat_y	-2.42	-4.17 – -0.68	0.008
Observations	64
R² / adjusted R²	0.699 / 0.678

Note the p-value of tree_diversity.

Comparing models

We can compare the models using Akaike’s Information Criterion (AIC).

AIC(mod_null, mod_lon_lat)

##             df      AIC
## mod_null     3 265.1464
## mod_lon_lat  6 219.3481

The results suggest that the mod_lon_lat model is a better choice. This further suggests that tree_diversity has a limited influence on bird_diversity.

Method 2: provide a spatial correlation structure

Using a variogram (also called semivariogram) is another useful way at looking for spatial autocorrelation. A variogram shows how the total variance in the data is related to the distance between observations.

Image from http://www.flutterbys.com.au/stats/tut/tut8.4a.html

Method 2: provide a spatial correlation structure

A variogram can take different shapes.

Image from Pinheiro and Bates 2004, p. 233

Method 2: provide a spatial correlation structure

Variograms can be calculated using the Variogram() function from the nlme package. First, we need to create a Generalized Least Squares model using the gls().

install.packages("nlme")
library(nlme)

# Create a null model
mod <- gls(bird_diversity ~ 1, data = df)

# Calculate the variogram on the model
semivario <-
  Variogram(mod, form = ~ lon_x + lat_y, resType = "normalized")

plot(semivario, smooth = TRUE)

Method 2: provide a spatial correlation structure

We will now test different spatial correlation structures and see which one is best for our data. This is done by setting the correlation argument in the gls() function.

# Model without spatial structure
mod1 <- gls(bird_diversity ~ tree_diversity, data = df)

# Model with Gaussian structure
mod_gaus <-
  gls(
    bird_diversity ~ tree_diversity,
    correlation = corGaus(form = ~ lon_x + lat_y, nugget = TRUE),
    data = df
  )

# Model with a Linear structure
mod_lin <-
  gls(
    bird_diversity ~ tree_diversity,
    correlation = corLin(form = ~ lon_x + lat_y, nugget = TRUE),
    data = df
  )

# Model with Exponential structure
# mod_exp <-
#   gls(
#     bird_diversity ~ tree_diversity,
#     correlation = corExp(form = ~ lon_x + lat_y, nugget = TRUE),
#     data = df
#   )

# Model with Spherical structure
mod_spher <-
  gls(
    bird_diversity ~ tree_diversity,
    correlation = corSpher(form = ~ lon_x + lat_y, nugget = TRUE),
    data = df
  )

# Model with Rational Quadratic Correlation Structureo
mod_ratio <-
  gls(
    bird_diversity ~ tree_diversity,
    correlation = corRatio(form = ~ lon_x + lat_y, nugget = TRUE),
    data = df
  )

Comparing models

AIC(mod1, mod_gaus, mod_lin, mod_spher, mod_ratio)

##           df      AIC
## mod1       3 267.5524
## mod_gaus   5 219.8499
## mod_lin    5 221.7109
## mod_spher  5 221.7109
## mod_ratio  5 219.8787

Based on the AIC, we can choose between the Gaussian Correlation and the Rational Quadratic Correlation structures.

Variogram on the residuals

The quick inspection of the variogram calculated on the residuals confirms that the spatial structure is gone.

semivario <- Variogram(mod_gaus, form = ~lon_x + lat_y, 
    resType = "normalized")
plot(semivario, smooth = TRUE)

Comparing models

	NULL model			Gaussian model
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	1.09	-1.32 – 3.50	0.370	7.00	2.28 – 11.72	0.004
tree diversity	1.14	0.72 – 1.55	<0.001	0.13	-0.19 – 0.45	0.419
Observations	64			64

After modeling the spatial structure in the data, tree_diversity is not significant.

Take home messages

At first, tree_diversity seemed to be an important predictor of bird_diversity.
Including lon_x and lat_y interaction in a simple linear model suggested that autocorrelation was present in the data.
Moran’s I confirmed that there was positive spatial autocorrelation in the data.
Including a Gaussian spatial structure, based on the results of the variogram, modelled adequately the spatial autocorrelation.
Conclusion: there is no relationship between bird_diversity and tree_diversity (we first identified a false positive in the data).

References

Borcard, Daniel, Francois Gillet, and Pierre Legendre. 2011. Numerical Ecology with R. New York, NY: Springer New York. https://doi.org/10.1007/978-1-4419-7976-6.