Function to optimize the selection of a spatial weighting matrix and select the best subset of eigenvectors (MEM, Moran's Eigenvector Maps)

listw.select computes MEM variables (i.e., eigenvectors of a doubly centered spatial weighting matrix) for various definitions of spatial weighting matrices (SWM) and optimizes the selection of the SWM and of a subset of MEM variables. The optimization is done by maximizing the adjusted R-squared (R2) or by minimizing the residual spatial autocorrelation. The function controls the type I error rate by accounting for the number of tests performed. This function combine calls to the functions scores.listw and mem.select. The list of candidate SWMs can easily be generated using listw.candidates.

listw.select(
  x,
  candidates,
  MEM.autocor = c("positive", "negative", "all"),
  method = c("FWD", "MIR", "global"),
  MEM.all = FALSE,
  nperm = 999,
  nperm.global = 9999,
  alpha = 0.05,
  p.adjust = TRUE,
  verbose = FALSE
)

Arguments

x: Vector, matrix, or dataframe of the response variable(s)
candidates: A list of SWMs of the class listw; candidates can be created by listw.candidates
MEM.autocor: Sign of the spatial eigenvectors to generate; "positive", "negative", or "all", for positively, negatively autocorrelated eigenvectors, or both, respectively; default is "positive"
method: Criterion to select the best subset of MEM variables. Either forward (default option), "MIR" (for univariate x only), or "global" (see Details)
MEM.all: A logical indicating if the complete set of MEM variables for the best model should be returned
nperm: Number of permutations to perform the tests in the selection procedure; Default is 999
nperm.global: Number of permutations to perform the tests in the global test; Default is 9999
alpha: Significance threshold value for the tests; Default is 0.05
p.adjust: A logical indicating wheter the p-value of the global test performed on each SWM should be corrected for multiple tests (TRUE) or not (FALSE); default is TRUE
verbose: If 'TRUE' more diagnostics are printed. The default setting is FALSE

Value

listw.select returns a list that contains:

candidates: A data.frame that summarizes the results on all SWMs
best.id: The index and name of the best SWM
best: The results for the best SWM as returned by mem.select

Details

While the selection of the SWM is the most critical step of the spatial eigenvector-based methods (Dray et al. 2006), Bauman et al. (2018) showed that optimizing the choice of the SWM led to inflated type I error rates if an explicit control of the number of SWMs tested was not applied. The function listw.select therefore applies a Sidak correction (Sidak 1967) for multiple tests to the p-value of the global test of each SWM (i.e., the model integrating the whole set of spatial predictors). The Sidak correction is computed as: \(P_corrected = 1 - (1 - P)^n\), where \(n\) is the number of tests performed, \(P\) is the observed p-value, and \(P_corrected\) is the new p-value after the correction. The p-value is first computed using nperm permutations and then corrected according to the total number of SWMs tested (if p.adjust = TRUE). Although the function can be run without this correction, using the default value is strongly recommended to avoid inflated type I error rates (Bauman et al. 2018).

As a consequence of the p-value correction, the significance threshold decreases as the number of SWMs increases, hence leading to a trade-off between the gain of accuracy and the power loss.

The optimization criterion of the SWM performed by listw.select is either based on the maximization of the significant adjusted R2 of all the generated spatial eigenvectors (also referred to as spatial predictors or MEM variables) (method = "global"), or is based on an optimized subset of eigenvectors (method = "FWD" and "MIR").

If the objective is only to optimize the selection of the SWM, without the intervention of the selection of a subset of predictors within each SWM (method = "global"), then the best SWM is the one maximizing the significant adjusted global R2, that is, the R2 of the model of x against the whole set of generated MEM variables which must be significant for the global test (method = "global").

The optimization of the SWM depends on the choosen method. See mem.select for a description of the situations in which method = "FWD", "MIR", and "global" should be preferred.

If a subset of MEM variables is needed, then the optimization of the subset of spatial predictors guides the optimization of the selection of SWM (method = "FWD" or "MIR"). If method = "FWD", listw.select performs the forward selection on the significant SWMs and selects among these the SWM for which the forward-selected subset of spatial eigenvectors yields the highest adjusted R2. If method = "MIR", listw.select performs the MIR selection on all the significant candidate SWMs, and selects the best SWM as the one with the smallest number of MIR-selected spatial eigenvectors. If two or more SWMs present the same smallest number of predictors, then the selection is made among them on the basis of the residual Moran's I. If MEM.autocor = "all", the optimization criteria described above are applied on the sum of the adjusted R2 or number of selected spatial eigenvectors, for method = "FWD" and "MIR", respectively. If no subset of MEM variable is required, then the optimization of the SWM is based on the maximization of the adjusted R2 of all the generated MEM variables (method = "global").

If MEM.autocor = "all", n-1 MEM variables are generated. In this case, if method = "global" or method = "FWD", the adjusted R2 is computed separately on the MEM associated to positive and negative eigenvalues (hereafter positive and negative MEM variables, respectively), and the SWM yielding the highest sum of the the two significant R2 values is selected. If method = "MIR", the MIR selection is performed separately on the positive and negative MEM variables, and the SWM is selected based on the sum of the number of positive and negative spatial predictors.

References

Bauman D., Fortin M-J, Drouet T. and Dray S. (2018) Optimizing the choice of a spatial weighting matrix in eigenvector-based methods. Ecology

Blanchet G., Legendre P. and Borcard D. (2008) Forward selection of explanatory variables. Ecology, 89(9), 2623–2632

Dray S., Legendre P. and Peres-Neto P. R. (2006) Spatial modeling: a comprehensive framework for principal coordinate analysis of neighbor matrices (PCNM). Ecological Modelling, 196, 483–493

Sidak Z. (1967) Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62(318), 626–633

Author

Bauman David (dbauman@ulb.ac.be or davbauman@gmail.com) and Stéphane Dray

Examples

# \donttest{
if(require(spdep)) {
### Create a grid of 15 x 15:
grid <- expand.grid(x = seq(1, 15, 1), y = seq(1, 15, 1))
### Generate a response variable Y structured at broad scale by linear combination of
### the first three MEM variables to which a normal noise is added:
nb <- cell2nb(nrow = 15, ncol = 15, "queen")
lw <- nb2listw(nb, style = "B")
MEM <- scores.listw(lw, MEM.autocor = "positive")
# Degree of spatial autocorrelation:
intensity <- 0.8
Y_space <- scale(MEM[, 1] + MEM[, 2] + MEM[, 3]) * intensity
Y_noise <- scale(rnorm(n = nrow(MEM), mean = 0, sd = 1)) * (1 - intensity)
Y <- Y_space + Y_noise
### Y is sampled in 100 randomly-chosen sites of the grid:
idx.sample <- sample(c(1:nrow(grid)), 100, replace = FALSE)
xy <- grid[idx.sample, ]
Y_sampled <- Y[idx.sample]
### The function listw.candidates is used to build the spatial weighting matrices that
### we want to test and compare (with the listw.select function). We test a Gabriel's graph,
### a minimum spanning tree, and a distance-based connectivity defined by a threshold
### distance corresponding to the smallest distance keeping all sites connected (i.e.,
### the defaut value of d2; see help of function listw.candidates).
### These connectivity matrices are then either not weighted (binary weighting), or
### weighted by the linearly decreasing function (see help of the function listw.candidates):
candidates <- listw.candidates(coord = xy, nb = c("gab", "mst"), weights = c("binary", "flin"))
### Number of candidate W matrices generated:
nbw <- length(candidates)
### Significance threshold value after p-value correction (Sidak correction):
1 - (1 - 0.05)^(1/nbw)
### Optimization of the selection of the SWM among the candidates generated above,
### using the corrected significance threshold calculated above for the global tests:
W_sel <- listw.select(Y_sampled, candidates, MEM.autocor = "positive", method = "FWD",
                    p.adjust = TRUE, nperm = 299)
### Some characteristics of the best spatial model:
# Best SWM:
W_sel$best.id
# Selected subset of spatial predictor within the best SWM:
W_sel$best$MEM.select
nrow(W_sel$best$summary)
# Corrected p-value of the global test of the best SWM:
W_sel$best$global.test$Pvalue
# Adjusted R2 of the subset of spatial predictors selected within the chosen SWM:
max(W_sel$best$summary$R2Adj)
# p-values of all the tested W matrices:
W_sel$candidates$Pvalue
# Adjusted R2 of the subset of spatial predictors selected for all the significant
# W matrices:
W_sel$candidates$R2Adj.select

# See Appendix S3 of Bauman et al. 2018 for more extensive examples and illustrations.
}
#> Warning: zero sum general weights
#> Procedure stopped (adjR2thresh criteria) adjR2cum = 0.915923 with 13 variables (> 0.914374)
#> Procedure stopped (alpha criteria): pvalue for variable 13 is 0.053333 (> 0.050000)
#> Procedure stopped (alpha criteria): pvalue for variable 18 is 0.053333 (> 0.050000)
#> Procedure stopped (alpha criteria): pvalue for variable 17 is 0.070000 (> 0.050000)
#> Warning: no non-missing arguments to max; returning -Inf
#> [1] 0.9131575 0.9232227 0.9017483 0.8685984
# }