R/mem.select.R
mem.select.Rd
mem.select
computes the spatial eigenvectors (MEM) of the spatial
weighting matrix (SWM) provided (listw
) and optimizes the selection of
a subset of MEM variables relative to response variable(s) stored in
x
. The optimization is done either by maximizing the adjusted
R-squared (R2) of all (method = "global"
) or a subset (method =
"FWD"
) of MEM variables or by minimizing the residual spatial
autocorrelation (method = "MIR"
) (see details in Bauman et al. 2018a).
A vector, matrix, or dataframe of response variable(s). The
method = "MIR"
is only implemented for a vector response. Note that
x
can also contain the residuals of a model when other (e.g.,
environmental) variables should be considered in the model.
A spatial weighting matrix of class listw
; can be created
with functions of the package spdep
, or with the user-friendly
function listw.candidates
. Note that, the function listw.candidates
returns a list
of listw
and subselection by [[]]
should be
performed in this case (see Example
)
Sign of the spatial eigenvectors to generate; "positive", "negative", or "all", for positively, negatively autocorrelated eigenvectors, or both, respectively; default is "positive"
Criterion to select the best subset of MEM variables. Either
"FWD"
(default option), "MIR"
(for univariate x
only), or "global"
(see Details
)
A logical indicating if the complete set of MEM variables should be returned
Number of permutations to perform the tests in the selection procedure; Default is 999
Number of permutations to perform the tests in the global test; Default is 9999
Significance threshold value for the tests; Default is 0.05
If 'TRUE' more diagnostics are printed. The default setting is FALSE
Other parameters (for internal use with listw.select
)
The function returns a list with:
An
object of class randtest
containing the result of the global test
associated to all MEM (adjusted R2 and p-value). If MEM.autocor = "all"
,
a list with two elements (positive
and negative
) corresponding to
the results of the global tests performed on positive and negative MEM respectively.
An object of class orthobasisSp
containing the
complete set of generated MEM variables (generated by
scores.listw
). Only returned if MEM.all = TRUE
.
A dataframe summarizing the results of the selection procedure
An object of class orthobasisSp
containing the subset of significant MEM variables.
The function provides three different methods to select a subset of
MEM variables. For all methods, a global test is firstly performed. If
MEM.autocor = "all"
, two global tests are performed and p-values are
corrected for multiple comparison (Sidak correction).
If the MEM variables are to be further used in a model including actual
predictors (e.g. environmental), then a subset of spatial eigenvectors
needs to be selected before proceeding to further analyses to avoid model
overfitting and/or a loss of statistical power to detect the contribution of
the environment to the variability of the response data (Griffith 2003, Dray
et al. 2006, Blanchet et al. 2008, Peres-Neto and Legendre 2010, Diniz-Filho
et al. 2012). Although several eigenvector selection approaches have been
proposed to select a best subset of eigenvectors, Bauman et al. (2018b) showed
that two main procedures should be preferred, depending on the underlying
objective: the forward selection with double stopping criterion (Blanchet et
al. 2008; method = "FWD"
) or the minimization of the residual spatial
autocorrelation (Griffith and Peres-Neto 2006; MIR selection in Bauman et al.
2018a,b, method = "MIR"
).
The most powerful and accurate selection method, in terms of R2 estimation,
is the forward selection. This method should be preferred when the objective
is to capture as accurately as possible the spatial patterns of x
. If
the objective is to optimize the detection of the spatial patterns in the
residuals of a model of the response variable(s) against a set of environmental
predictors, for instance, then x
should be the model residuals, and
method = "FWD"
. This allows optimizing the detection of residual
spatial patterns once the effect of the environmental predictors has been
removed.
If however the objective is only to remove the spatial autocorrelation from the
residuals of a model of x
against a set of actual predictors (e.g.
environmental) with a small number of spatial predictors, then accuracy is
not as important and one should focus mainly on the number of spatial
predictors (Bauman et al. 2018b). In this case, method = "MIR"
is more
adapted, as it has the advantage to maintain the standard errors of the actual
predictor coefficients as low as possible. Note that method = "MIR"
can
only be used for a univariate x
, as the Moran's I is a univariate index.
If x
is multivariate, then the best criterion is the forward selection
(see Bauman et al. 2018b).
A third option is to not perform any selection of MEM variables
(method = "global"
). This option may be interesting when the complete set
of MEM variables will be used, like in Moran spectral randomizations (Wagner and
Dray 2015, Bauman et al. 2018c) or when using smoothed MEM (Munoz 2009).
For method = "MIR"
, the global test consists in computing the
Moran's I of x
(e.g. residuals of the model of the response variable
against environmental variables) and tests it by permutation (results
stored in global.test
). If the Moran's I is significant, the
function performs a selection procedure that searches among the set of
generated spatial predictors the one that best minimizes the value of the
Moran's I. A model of x
against the selected eigenvector is built,
and the significance of the Moran's I of the model residuals is tested
again. The procedure goes on until the Moran's I of the model residuals is
not significant anymore, hence the name of Minimization of moran's I in the
Residuals (MIR).
For method = "global"
and method = "FWD"
, the global test
consists in computing the adjusted global R2, that is, the R2 of the model
of x
against the whole set of generated MEM variables and tests it
by permutation (results stored in global.test
).
For method = "global"
, if the adjusted global R2 is significant, the
functions returns the whole set of generated MEM variables in MEM.select
.
For method = "FWD"
, if the adjusted global R2 is significant, the
function performs a forward selection with double stopping criterion that
searches among the set of generated spatial predictors the one that best maximizes
the R2 of the model. The procedure is repeated untill one of the two stopping
criterion is reached (see Blanchet et al. 2008). Note that in a few cases, the forward
selection does not select any variable even though the global model is significant. This
can happen for example when a single variable has a strong relation with the response
variable(s), because the integration of the variable alone yields an adjusted R2
slightly higher than the global adjusted R2. In this case, we recommend checking that this
is indeed the reason why the first selected variable was rejected, and rerun the analysis
with a second stopping criterion equal to the global adjusted R2 plus a small amount
allowing avoiding this issue (e.g. 5
done through the argument adjR2thresh
of function forward.sel
, until the
solution is implemented in mem.select
.
For the method = "FWD"
and method = "MIR"
, the MEM selected
by the procedure are returned in MEM.select
and a summary of the
results is provided in summary
. If no MEM are selected, then
MEM.select
and summary
are not returned.
Bauman D., Fortin M-J, Drouet T. and Dray S. (2018a) Optimizing the choice of a spatial weighting matrix in eigenvector-based methods. Ecology, 99, 2159-2166
Bauman D., Drouet T., Dray S. and Vleminckx J. (2018b) Disentangling good from bad practices in the selection of spatial or phylogenetic eigenvectors. Ecography, 41, 1–12
Bauman D., Vleminckx J., Hardy O., Drouet T. (2018c) Testing and interpreting the shared space-environment fraction in variation partitioning analyses of ecological data. Oikos
Blanchet G., Legendre P. and Borcard D. (2008) Forward selection of explanatory variables. Ecology, 89(9), 2623–2632
Diniz-Filho J.A.F., Bini L.M., Rangel T.F., Morales-Castilla I. et al. (2012) On the selection of phylogenetic eigenvectors for ecological analyses. Ecography, 35, 239–249
Dray S., Legendre P. and Peres-Neto P. R. (2006) Spatial modeling: a comprehensive framework for principal coordinate analysis of neighbor matrices (PCNM). Ecological Modelling, 196, 483–493
Griffith D. (2003) Spatial autocorrelation and spatial filtering: gaining understanding through theory and scientific visualization. Springer, Berlin
Griffith D. and Peres-Neto P. (2006) Spatial modeling in Ecology: the flexibility of eigenfunction spatial analyses. Ecology, 87, 2603–2613
Munoz, F. 2009. Distance-based eigenvector maps (DBEM) to analyse metapopulation structure with irregular sampling. Ecological Modelling, 220, 2683–2689
Peres-Neto P. and Legendre P. (2010) Estimating and controlling for spatial structure in the study of ecological communities. Global Ecology and Biogeography, 19, 174–184
Wagner H., Dray S. (2015). Generating spatially constrained null models for irregularly spaced data using Moran spectral randomization methods. Methods in Ecology and Evolution, 6, 1169–1178
if(require(vegan)){
# Illustration of the MIR selection on the oribatid mite data
# (Borcard et al. 1992, 1994 for details on the dataset):
# *******************************************************
# Community data (response matrix):
data(mite)
# We will compute the example on a single species:
spe <- mite[, 2]
# Environmental explanatory dataset:
data(mite.env)
# We only use two numerical explanatory variables:
env <- mite.env[, 1:2]
dim(env)
# Coordinates of the 70 sites:
data(mite.xy)
coord <- mite.xy
# We build the model we are interested in:
mod <- lm(spe ~ ., data = env)
# In order to avoid possible type I error rate inflation issues, we check
# whether the model residuals are independent, and if they are spatially
# autocorrelated, we select a small subset of MEM variables to add to the
# model as covariables with the MIR selection:
# 1) We build a spatial weighting matrix based on Gabriel graph with a
# weighting function decreasing linearly with the distance:
w <- listw.candidates(coord, nb = "gab", weights = "flin")
# 2) We test the spatial autocorrelation of the model residuals and, if
# necessary, select a subset of spatial predictors:
y <- residuals(mod)
MEM <- mem.select(x = y, listw = w[[1]], method = "MIR", MEM.autocor = "positive",
nperm = 999, alpha = 0.05)
dim(MEM$MEM.select)
# The residuals of the model presented spatial autocorrelation. The selection
# of MEM variables is thus performed to remove residual autocorrelation.
# 3) We can reconstruct our model adding the selected MEM variable as covariables:
env2 <- cbind(env, MEM$MEM.select)
mod_complete <- lm(spe ~ ., data = env2)
summary(mod_complete)$coefficient[, 1] # Coefficient estimates
summary(mod_complete)$coefficient[, 2] # Standard errors
}
#> (Intercept) SubsDens WatrCont MEM6 MEM5
#> 0.763474835 0.017891034 0.001448202 0.198312047 0.191102035