R/listw.select.R
listw.select.Rd
listw.select
computes MEM variables (i.e., eigenvectors of a doubly
centered spatial weighting matrix) for various definitions of spatial
weighting matrices (SWM) and optimizes the selection of the SWM and of a
subset of MEM variables. The optimization is done by maximizing the adjusted
R-squared (R2) or by minimizing the residual spatial autocorrelation. The
function controls the type I error rate by accounting for the number of tests
performed. This function combine calls to the functions scores.listw
and
mem.select
. The list of candidate SWMs
can easily be generated using listw.candidates
.
Vector, matrix, or dataframe of the response variable(s)
A list of SWMs of the class listw
;
candidates
can be created by listw.candidates
Sign of the spatial eigenvectors to generate; "positive"
,
"negative"
, or "all"
, for positively, negatively autocorrelated
eigenvectors, or both, respectively; default is "positive"
Criterion to select the best subset of MEM variables. Either
forward
(default option), "MIR"
(for univariate x
only), or "global"
(see Details
)
A logical indicating if the complete set of MEM variables for the best model should be returned
Number of permutations to perform the tests in the selection procedure; Default is 999
Number of permutations to perform the tests in the global test; Default is 9999
Significance threshold value for the tests; Default is 0.05
A logical indicating wheter the p-value of the global test performed on each SWM
should be corrected for multiple tests (TRUE) or not (FALSE); default is
TRUE
If 'TRUE' more diagnostics are printed. The default setting is FALSE
listw.select
returns a list that contains:
A data.frame that summarizes the results on all SWMs
The index and name of the best SWM
The results
for the best SWM as returned by mem.select
While the selection of the SWM is the most critical step of the
spatial eigenvector-based methods (Dray et al. 2006), Bauman et al. (2018)
showed that optimizing the choice of the SWM led to inflated type I error
rates if an explicit control of the number of SWMs tested was not applied.
The function listw.select
therefore applies a Sidak correction (Sidak
1967) for multiple tests to the p-value of the global test of each SWM
(i.e., the model integrating the whole set of spatial predictors). The Sidak
correction is computed as: \(P_corrected = 1 - (1 - P)^n\), where \(n\)
is the number of tests performed, \(P\) is the observed p-value, and
\(P_corrected\) is the new p-value after the correction. The p-value is
first computed using nperm
permutations and then corrected according to
the total number of SWMs tested (if p.adjust = TRUE
). Although the
function can be run without this correction, using the default value is strongly
recommended to avoid inflated type I error rates (Bauman et al. 2018).
As a consequence of the p-value correction, the significance threshold decreases as the number of SWMs increases, hence leading to a trade-off between the gain of accuracy and the power loss.
The optimization criterion of the SWM performed by listw.select
is
either based on the maximization of the significant adjusted R2 of all the generated
spatial eigenvectors (also referred to as spatial predictors or MEM
variables) (method = "global"
), or is based on an optimized subset of
eigenvectors (method = "FWD"
and "MIR"
).
If the objective is only to optimize the selection of the SWM, without the
intervention of the selection of a subset of predictors within each SWM
(method = "global"
), then the best SWM is the one maximizing the significant adjusted
global R2, that is, the R2 of the model of x
against the whole set of
generated MEM variables which must be significant for the global test (method = "global"
).
The optimization of the SWM depends on the choosen method
. See
mem.select
for a description of the situations in which
method = "FWD"
, "MIR"
, and "global"
should be preferred.
If a subset of MEM variables is needed, then the optimization of the subset
of spatial predictors guides the optimization of the selection of SWM
(method = "FWD"
or "MIR"
).
If method = "FWD"
, listw.select
performs the forward
selection on the significant SWMs and selects among these the SWM for which
the forward-selected subset of spatial eigenvectors yields the highest
adjusted R2. If method = "MIR"
, listw.select
performs the MIR
selection on all the significant candidate SWMs, and selects the best SWM as
the one with the smallest number of MIR-selected spatial eigenvectors. If
two or more SWMs present the same smallest number of predictors, then the
selection is made among them on the basis of the residual Moran's I.
If MEM.autocor = "all"
, the optimization criteria described above are
applied on the sum of the adjusted R2 or number of selected spatial eigenvectors,
for method = "FWD"
and "MIR"
, respectively.
If no subset of MEM variable is required, then the optimization of the SWM is
based on the maximization of the adjusted R2 of all the generated MEM variables
(method = "global"
).
If MEM.autocor = "all"
, n-1 MEM variables are generated. In this case, if
method = "global"
or method = "FWD"
, the adjusted R2 is computed
separately on the MEM associated to positive and negative eigenvalues (hereafter
positive and negative MEM variables, respectively), and the SWM yielding the
highest sum of the the two significant R2 values is selected. If method = "MIR"
, the
MIR selection is performed separately on the positive and negative MEM variables,
and the SWM is selected based on the sum of the number of positive and
negative spatial predictors.
Bauman D., Fortin M-J, Drouet T. and Dray S. (2018) Optimizing the choice of a spatial weighting matrix in eigenvector-based methods. Ecology
Blanchet G., Legendre P. and Borcard D. (2008) Forward selection of explanatory variables. Ecology, 89(9), 2623–2632
Dray S., Legendre P. and Peres-Neto P. R. (2006) Spatial modeling: a comprehensive framework for principal coordinate analysis of neighbor matrices (PCNM). Ecological Modelling, 196, 483–493
Sidak Z. (1967) Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62(318), 626–633
# \donttest{
if(require(spdep)) {
### Create a grid of 15 x 15:
grid <- expand.grid(x = seq(1, 15, 1), y = seq(1, 15, 1))
### Generate a response variable Y structured at broad scale by linear combination of
### the first three MEM variables to which a normal noise is added:
nb <- cell2nb(nrow = 15, ncol = 15, "queen")
lw <- nb2listw(nb, style = "B")
MEM <- scores.listw(lw, MEM.autocor = "positive")
# Degree of spatial autocorrelation:
intensity <- 0.8
Y_space <- scale(MEM[, 1] + MEM[, 2] + MEM[, 3]) * intensity
Y_noise <- scale(rnorm(n = nrow(MEM), mean = 0, sd = 1)) * (1 - intensity)
Y <- Y_space + Y_noise
### Y is sampled in 100 randomly-chosen sites of the grid:
idx.sample <- sample(c(1:nrow(grid)), 100, replace = FALSE)
xy <- grid[idx.sample, ]
Y_sampled <- Y[idx.sample]
### The function listw.candidates is used to build the spatial weighting matrices that
### we want to test and compare (with the listw.select function). We test a Gabriel's graph,
### a minimum spanning tree, and a distance-based connectivity defined by a threshold
### distance corresponding to the smallest distance keeping all sites connected (i.e.,
### the defaut value of d2; see help of function listw.candidates).
### These connectivity matrices are then either not weighted (binary weighting), or
### weighted by the linearly decreasing function (see help of the function listw.candidates):
candidates <- listw.candidates(coord = xy, nb = c("gab", "mst"), weights = c("binary", "flin"))
### Number of candidate W matrices generated:
nbw <- length(candidates)
### Significance threshold value after p-value correction (Sidak correction):
1 - (1 - 0.05)^(1/nbw)
### Optimization of the selection of the SWM among the candidates generated above,
### using the corrected significance threshold calculated above for the global tests:
W_sel <- listw.select(Y_sampled, candidates, MEM.autocor = "positive", method = "FWD",
p.adjust = TRUE, nperm = 299)
### Some characteristics of the best spatial model:
# Best SWM:
W_sel$best.id
# Selected subset of spatial predictor within the best SWM:
W_sel$best$MEM.select
nrow(W_sel$best$summary)
# Corrected p-value of the global test of the best SWM:
W_sel$best$global.test$Pvalue
# Adjusted R2 of the subset of spatial predictors selected within the chosen SWM:
max(W_sel$best$summary$R2Adj)
# p-values of all the tested W matrices:
W_sel$candidates$Pvalue
# Adjusted R2 of the subset of spatial predictors selected for all the significant
# W matrices:
W_sel$candidates$R2Adj.select
# See Appendix S3 of Bauman et al. 2018 for more extensive examples and illustrations.
}
#> Warning: zero sum general weights
#> Procedure stopped (adjR2thresh criteria) adjR2cum = 0.891150 with 6 variables (> 0.885452)
#> Procedure stopped (adjR2thresh criteria) adjR2cum = 0.923635 with 10 variables (> 0.920299)
#> Procedure stopped (adjR2thresh criteria) adjR2cum = 0.911589 with 17 variables (> 0.909435)
#> Procedure stopped (alpha criteria): pvalue for variable 15 is 0.076667 (> 0.050000)
#> Warning: no non-missing arguments to max; returning -Inf
#> [1] 0.8756144 0.9185877 0.9084881 0.8688357
# }