Title: | Stacked Species Distribution Modelling |
---|---|
Description: | Maps species richness and endemism based on stacked species distribution models (SSDM). Individual SDMs can be created using a single algorithm or multiple algorithms (ensemble SDMs). For each species, an SDM can yield a habitat suitability map, a binary map, and a between-algorithm variance map, and can assess variable importance, algorithm accuracy, and between-algorithm correlation. Methods to stack individual SDMs include summing individual probabilities and thresholding then summing. Thresholding can be based on a specific evaluation metric or on repeated draws from a Bernoulli distribution. The SSDM package also provides a user-friendly interface. |
Authors: | Sylvain Schmitt, Robin Pouteau, Dimitri Justeau, Florian de Boissieu, Lukas Baumbach, Philippe Birnbaum |
Maintainer: | Sylvain Schmitt <[email protected]> |
License: | GPL (>=3) | file LICENSE |
Version: | 0.2.9 |
Built: | 2025-01-10 04:49:20 UTC |
Source: | https://github.com/sylvainschmitt/ssdm |
This is an S4 class to represent an SDM based on a single algorithm (including
generalized linear model, generalized additive model, multivariate adaptive
regression splines, generalized boosted regression model, classification tree
analysis, random forest, maximum entropy, artificial neural network, and
support vector machines). This S4 class is returned by modelling.
name
character. Name of the SDM (by default Species.SDM).
projection
raster. Habitat suitability map produced by the SDM.
binary
raster. Presence/Absence binary map produced by the SDM.
evaluation
data frame. Evaluation of the SDM (available metrics include AUC, Kappa, sensitivity, specificity and proportion of correctly predicted occurrences) and identification of the optimal threshold to convert the habitat suitability map into a binary presence/absence map.
variable.importance
data frame. Relative importance of each variable in the SDM.
data
data frame. Data used to build the SDM.
parameters
data frame. Parameters used to build the SDM.
Ensemble.SDM an S4 class for ensemble SDMs, and Stacked.SDM an S4 class for SSDMs.
This is a method to assemble several algorithms in an ensemble SDM. The
function takes as inputs several S4 Algorithm.SDM class objects returned by
the modelling function. The function returns an S4 Ensemble.SDM class object
containing the habitat suitability map, the binary map, and the uncertainty
map (based on the between-algorithm variance) and the associated evaluation
tables (model evaluation, algorithm evaluation, algorithm correlation matrix
and variable importance).
    ensemble(x, ..., name = NULL, ensemble.metric = c("AUC"),
      ensemble.thresh = c(0.75), weight = TRUE, thresh = 1001,
      uncertainty = TRUE, SDM.projections = FALSE, cores = 0,
      verbose = TRUE, GUI = FALSE)

    ## S4 method for signature 'Algorithm.SDM'
    ensemble(x, ..., name = NULL, ensemble.metric = c("AUC"),
      ensemble.thresh = c(0.75), weight = TRUE, thresh = 1001,
      uncertainty = TRUE, SDM.projections = FALSE, cores = 0,
      verbose = TRUE, GUI = FALSE)

    ## S4 method for signature 'Algorithm.SDM'
    sum(x, ..., name = NULL, ensemble.metric = c("AUC"),
      ensemble.thresh = c(0.75), weight = TRUE, thresh = 1001,
      format = TRUE, verbose = TRUE, na.rm = TRUE)
x , ...
|
SDMs. SDMs to be assembled. |
name |
character. Optional name given to the final Ensemble.SDM produced (by default 'Ensemble.SDM'). |
ensemble.metric |
character. Metric(s) used to select the best SDMs that will be included in the ensemble SDM (see details below). |
ensemble.thresh |
numeric. Threshold(s) associated with the metric(s) used to compute the selection. |
weight |
logical. If TRUE, SDMs are weighted using the ensemble metric or, alternatively, the mean of the selection metrics. |
thresh |
numeric. An integer value specifying the number of equal-interval threshold values between 0 and 1. |
uncertainty |
logical. If TRUE, generates an uncertainty map and an algorithm correlation matrix. |
SDM.projections |
logical. If FALSE (default), the Algorithm.SDMs inside the 'sdms' slot will not contain projections (for memory saving purposes). |
cores |
integer. Specify the number of CPU cores used to do the
computing. You can use |
verbose |
logical. If set to true, allows the function to print text in the console. |
GUI , format , na.rm
|
logical. Do not take those arguments into account (parameters for the user interface and sum function). |
ensemble.metric (metric(s) used to select the best SDMs that will be included in the ensemble SDM) can be chosen from among:
AUC: Area under the receiver operating characteristic (ROC) curve
Kappa: Kappa from the confusion matrix
sensitivity: Sensitivity from the confusion matrix
specificity: Specificity from the confusion matrix
prop.correct: Proportion of correctly predicted occurrences from the confusion matrix
calibration: Calibration metric (Naimi & Araujo 2016)
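The selection and weighting controlled by ensemble.metric, ensemble.thresh and weight can be sketched in base R. This is a minimal illustration and not the package's internal code: the scores and suitability values are made up, and it assumes that an SDM is kept when its metric is greater than or equal to the threshold and that the retained metric values are used directly as weights.

```r
# Hypothetical AUC scores for three single-algorithm SDMs
auc <- c(CTA = 0.82, SVM = 0.70, MARS = 0.90)

# Hypothetical habitat suitability values (rows = cells, columns = SDMs)
suitability <- cbind(
  CTA  = c(0.2, 0.8, 0.5),
  SVM  = c(0.9, 0.1, 0.4),
  MARS = c(0.4, 0.6, 0.7)
)

ensemble.thresh <- 0.75

# Assumption: keep the SDMs whose metric reaches the threshold
keep <- auc >= ensemble.thresh

# Weighted mean of the retained SDMs, using the metric as weight
w <- auc[keep] / sum(auc[keep])
ensemble_suitability <- as.vector(suitability[, keep, drop = FALSE] %*% w)

# Between-algorithm variance per cell as a simple uncertainty measure
uncertainty <- apply(suitability[, keep, drop = FALSE], 1, var)
```

With these made-up scores the SVM model falls below the threshold, so only CTA and MARS contribute to the ensemble.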
An S4 Ensemble.SDM class object viewable with the plot.model function.
ensemble_modelling to build an ensemble SDM from multiple algorithms.
    ## Not run:
    # Loading data
    data(Env)
    data(Occurrences)
    Occurrences <- subset(Occurrences, Occurrences$SPECIES == 'elliptica')

    # ensemble SDM building
    CTA <- modelling('CTA', Occurrences, Env, Xcol = 'LONGITUDE', Ycol = 'LATITUDE')
    SVM <- modelling('SVM', Occurrences, Env, Xcol = 'LONGITUDE', Ycol = 'LATITUDE')
    ESDM <- ensemble(CTA, SVM, ensemble.thresh = c(0.6))

    # Results plotting
    plot(ESDM)
    ## End(Not run)
Build an ensemble SDM that assembles multiple algorithms for a single species. The function takes as inputs an occurrence data frame made of presence/absence or presence-only records and a raster object for data extraction and projection. The function returns an S4 Ensemble.SDM class object containing the habitat suitability map, the binary map, the between-algorithm variance map and the associated evaluation tables (model evaluation, algorithm evaluation, algorithm correlation matrix and variable importance).
    ensemble_modelling(algorithms, Occurrences, Env,
      Xcol = "Longitude", Ycol = "Latitude", Pcol = NULL,
      rep = 10, name = NULL, save = FALSE, path = getwd(),
      cores = 0, parmode = "replicates", PA = NULL,
      cv = "holdout", cv.param = c(0.7, 1), final.fit.data = "all",
      bin.thresh = "SES", metric = NULL, thresh = 1001,
      axes.metric = "Pearson", uncertainty = TRUE, tmp = FALSE,
      SDM.projections = FALSE, ensemble.metric = c("AUC"),
      ensemble.thresh = c(0.75), weight = TRUE,
      verbose = TRUE, GUI = FALSE, ...)
algorithms |
character. A character vector specifying the algorithm name(s) to be run (see details below). |
Occurrences |
data frame. Occurrences table (can be processed first by
|
Env |
raster object. RasterStack object of environmental variables (can
be processed first by |
Xcol |
character. Name of the column in the occurrence table containing Longitude or X coordinates. |
Ycol |
character. Name of the column in the occurrence table containing Latitude or Y coordinates. |
Pcol |
character. Name of the column in the occurrence table specifying whether a line is a presence or an absence. A value of 1 is presence and value of 0 is absence. If NULL presence-only dataset is assumed. |
rep |
integer. Number of repetitions for each algorithm. |
name |
character. Optional name given to the final Ensemble.SDM produced (by default 'Ensemble.SDM'). |
save |
logical. If |
path |
character. If save is If |
cores |
integer. Specify the number of CPU cores used to do the
computing. You can use |
parmode |
character. Parallelization mode: along 'algorithms' or 'replicates'. Defaults to 'replicates'. |
PA |
list(nb, strat) defining the pseudo-absence selection strategy used in case of presence-only dataset. If PA is NULL, recommended PA selection strategy is used depending on the algorithm (see details below). |
cv |
character. Method of cross-validation used to evaluate the ensemble SDM (see details below). |
cv.param |
numeric. Parameters associated to the method of cross-validation used to evaluate the ensemble SDM (see details below). |
final.fit.data |
strategy used for fitting the final/evaluated Algorithm.SDMs: 'holdout'= use same train and test data as in (last) evaluation, 'all'= train model with all data (i.e. no test data) or numeric (0-1)= sample a custom training fraction (left out fraction is set aside as test data) |
bin.thresh |
character. Classification threshold ( |
metric |
(deprecated) character. Classification threshold ( |
thresh |
(deprecated) integer. Number of equally spaced thresholds in the interval 0-1 ( |
axes.metric |
Metric used to evaluate variable relative importance (see details below). |
uncertainty |
logical. If |
tmp |
logical or character. If |
SDM.projections |
logical. If FALSE (default), the Algorithm.SDMs inside the 'sdms' slot will not contain projections (for memory saving purposes). |
ensemble.metric |
character. Metric(s) used to select the best SDMs that will be included in the ensemble SDM (see details below). |
ensemble.thresh |
numeric. Threshold(s) associated with the metric(s) used to compute the selection. |
weight |
logical. If |
verbose |
logical. If |
GUI |
logical. Do not take this argument into account (parameter for the user interface). |
... |
additional parameters for the algorithm modelling function (see details below). |
algorithms: 'all' calls all the following algorithms. Algorithms include Generalized linear model (GLM), Generalized additive model (GAM), Multivariate adaptive regression splines (MARS), Generalized boosted regression model (GBM), Classification tree analysis (CTA), Random forest (RF), Maximum entropy (MAXENT), Artificial neural network (ANN), and Support vector machines (SVM). Each algorithm has its own parameters, settable with the ... argument (see each algorithm section below to set their parameters).
PA: list with two values: nb, the number of pseudo-absences selected, and strat, the strategy used to select pseudo-absences (either random selection or disk selection). The defaults follow the recommendations of Barbet-Massin et al. (2012) (see reference).
cv: Cross-validation method used to split the occurrence dataset for evaluation. holdout: data are partitioned into a training set and an evaluation set using a fraction (cv.param[1]), and the operation can be repeated cv.param[2] times. k-fold: data are partitioned into k (cv.param[1]) folds, each fold being used k-1 times in the training set and once as the evaluation set, and the operation can be repeated cv.param[2] times. LOO (Leave One Out): each point is successively taken as evaluation data.
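The three partitioning schemes can be illustrated with base R index vectors. This is only a sketch of the splitting logic under the description above, not the package's implementation; n, idx and the object names are hypothetical.

```r
set.seed(42)           # reproducible illustration
n <- 20                # hypothetical number of occurrence records
idx <- seq_len(n)

# Holdout: cv.param[1] is the training fraction
cv.param <- c(0.7, 1)
train <- sample(idx, size = round(cv.param[1] * n))
test  <- setdiff(idx, train)

# k-fold: cv.param[1] is k; each record is used for evaluation exactly once
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))
kfold_test <- split(idx, fold)

# Leave One Out: every record is its own evaluation set
loo_test <- as.list(idx)
```

Repeating the holdout or k-fold operation cv.param[2] times would amount to rerunning the sampling step with a new random draw.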
bin.thresh (formerly metric): Choice of the metric used to compute the binary map threshold and the confusion matrix (by default SES, as recommended by Liu et al. (2005), see reference below): Kappa maximizes the Kappa, CCR maximizes the proportion of correctly predicted observations, TSS (True Skill Statistic) maximizes the sum of sensitivity and specificity, SES uses the sensitivity-specificity equality, LW uses the lowest occurrence prediction probability, ROC minimizes the distance between the ROC plot (receiver operating characteristic curve) and the upper left corner (0,1).
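As an illustration of how a threshold can be picked from a grid of candidate values, the sketch below scans 1001 equally spaced thresholds and applies the sensitivity-specificity equality (SES) and sum-maximizing (TSS) criteria. The observations and predictions are invented, and the rule "predict presence when suitability is at or above the threshold" is an assumption of this sketch.

```r
# Hypothetical observations (1 = presence, 0 = absence) and predicted suitability
obs  <- c(1, 1, 1, 1, 0, 0, 0, 0)
pred <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1)

# 1001 equally spaced candidate thresholds between 0 and 1
thresholds <- seq(0, 1, length.out = 1001)

# Sensitivity and specificity when presence is predicted for pred >= threshold
sens <- sapply(thresholds, function(t) mean(pred[obs == 1] >= t))
spec <- sapply(thresholds, function(t) mean(pred[obs == 0] <  t))

# SES: threshold where sensitivity and specificity are closest to equal
ses <- thresholds[which.min(abs(sens - spec))]

# TSS-style criterion: threshold maximizing sensitivity + specificity
tss <- thresholds[which.max(sens + spec)]
```

When several thresholds tie, which.min and which.max return the first one; a real implementation may break ties differently.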
axes.metric: Metric used to evaluate the variable relative importance (difference between a full model and one with each variable successively omitted): Pearson (computes a simple Pearson's correlation r between predictions of the full model and the one without a variable, and returns the score 1-r: the higher the value, the more influence the variable has on the model), AUC, Kappa, sensitivity, specificity, and prop.correct (proportion of correctly predicted occurrences).
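The Pearson score can be reproduced by hand on simulated data. The sketch below is an assumption-laden illustration: it uses a binomial glm from 'stats' as a stand-in for any algorithm, invented variables named rain and temp, and a final rescaling to percentages that is not stated in the text above.

```r
set.seed(1)
n <- 200
# Hypothetical data: presence is driven mostly by 'rain'
dat <- data.frame(rain = runif(n), temp = runif(n))
dat$presence <- rbinom(n, 1, plogis(-3 + 6 * dat$rain + 0.5 * dat$temp))
vars <- c("rain", "temp")

# Full model and its predictions
full <- glm(presence ~ rain + temp, data = dat, family = binomial)
pred_full <- predict(full, type = "response")

# Refit with each variable omitted and score it as 1 - r
importance <- sapply(vars, function(v) {
  reduced <- glm(reformulate(setdiff(vars, v), response = "presence"),
                 data = dat, family = binomial)
  1 - cor(pred_full, predict(reduced, type = "response"))
})

# Assumption: express the scores as relative importance in percent
relative_importance <- round(100 * importance / sum(importance), 1)
```

Omitting the dominant variable changes the predictions most, so it receives the larger score.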
ensemble.metric: Ensemble metric(s) used to select SDMs: AUC, Kappa, sensitivity, specificity, and prop.correct (proportion of correctly predicted occurrences).
...: See the algorithm sections below.
An S4 Ensemble.SDM class object viewable with the plot.model function.
Generalized linear model (GLM)
Uses the glm function from the package 'stats'. You can set parameters by supplying glm.args=list(arg1=val1,arg2=val2) (see glm for all settable arguments).
The following parameters have defaults:
test
character. Test used to evaluate the SDM, default 'AIC'.
control
list (created with glm.control). Contains parameters for controlling the fitting process. Default is glm.control(epsilon = 1e-08, maxit = 500).
'epsilon' is a numeric and defines the positive convergence tolerance (eps). The iterations converge when |dev - dev_old|/(|dev| + 0.1) < eps.
'maxit' is an integer giving the maximal number of IWLS (Iteratively Weighted Least Squares) iterations.
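To see what the control defaults do outside the package, the sketch below calls glm from 'stats' directly with the same glm.control values. The data are simulated and the binomial family is an assumption of this example; within SSDM the same settings would be supplied through glm.args as described above.

```r
set.seed(1)
# Hypothetical presence/absence data with a single predictor
dat <- data.frame(x = runif(100))
dat$presence <- rbinom(100, 1, plogis(-1 + 3 * dat$x))

# Same control values as the documented default
ctrl <- glm.control(epsilon = 1e-08, maxit = 500)
fit <- glm(presence ~ x, data = dat, family = binomial, control = ctrl)

# Fitting stops once |dev - dev_old| / (|dev| + 0.1) < epsilon,
# or after maxit IWLS iterations, whichever comes first
fit$converged
fit$iter
```

A well-behaved fit converges in far fewer iterations than maxit, so the high limit mainly matters for difficult datasets.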
Generalized additive model (GAM)
Uses the gam function from the package 'mgcv'. You can set parameters by supplying gam.args=list(arg1=val1,arg2=val2) (see gam for all settable arguments).
The following parameters have defaults:
test
character. Test used to evaluate the model, default 'AIC'.
control
list (created with gam.control). Contains parameters for controlling the fitting process. Default is gam.control(epsilon = 1e-08, maxit = 500).
'epsilon' is a numeric used for judging the convergence of the GLM IRLS (Iteratively Reweighted Least Squares) loop. 'maxit' is an integer giving the maximum number of IRLS iterations to perform.
Multivariate adaptive regression splines (MARS)
Uses the earth function from the package 'earth'. You can set parameters by supplying mars.args=list(arg1=val1,arg2=val2) (see earth for all settable arguments).
The following parameters have defaults:
degree
integer. Maximum degree of interaction (Friedman's mi); 1 means building an additive model (i.e., no interaction terms). By default, set to 2.
Generalized boosted regression model (GBM)
Uses the gbm function from the package 'gbm'. You can set parameters by supplying gbm.args=list(arg1=val1,arg2=val2) (see gbm for all settable arguments).
The following parameters have defaults:
distribution
character. Automatically detected from the format of the presence column in the occurrence dataset.
n.trees
integer. The total number of trees to fit. This is equivalent to the number of iterations and the number of basis functions in the additive expansion. By default, set to 2500.
n.minobsinnode
integer. Minimum number of observations in the terminal nodes of the trees. Note that this is the actual number of observations, not the total weight. By default, set to 1.
cv.folds
integer. Number of cross-validation folds to perform. If cv.folds>1 then gbm, in addition to the usual fit, will perform a cross-validation. By default, set to 3.
shrinkage
numeric. A shrinkage parameter applied to each tree in the expansion (also known as learning rate or step-size reduction). By default, set to 0.001.
bag.fraction
numeric. Fraction of the training set observations randomly selected to propose the next tree in the expansion.
train.fraction
numeric. Training fraction used to fit the first gbm. The remainder is used to compute out-of-sample estimates of the loss function. By default, set to 1 (since evaluation/holdout is done with SSDM::evaluate).
n.cores
integer. Number of cores to use for parallel computation of the CV folds. By default, set to 1. If you intend to use this, please set cores=0 in the modelling function to avoid conflicts.
Classification tree analysis (CTA)
Uses the rpart function from the package 'rpart'. You can set parameters by supplying cta.args=list(arg1=val1,arg2=val2) (see rpart for all settable arguments).
The following parameters have defaults:
control
list (created with rpart.control). Contains parameters for controlling the rpart fit. The default is rpart.control(minbucket=1, xval=3).
'minbucket' is an integer giving the minimum number of observations in any terminal node. 'xval' is an integer defining the number of cross-validations.
Random forest (RF)
Uses the randomForest function from the package 'randomForest'. You can set parameters by supplying rf.args=list(arg1=val1,arg2=val2) (see randomForest for all settable arguments).
The following parameters have defaults:
ntree
integer. Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. By default, set to 2500.
nodesize
integer. Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time). By default, set to 1.
Maximum entropy (MAXENT)
Uses the maxent function from the package 'dismo'. Make sure that you have correctly installed the maxent.jar file, available at https://biodiversityinformatics.amnh.org/open_source/maxent/, in the folder ~\R\library\version\dismo\java. As with the other algorithms, you can set parameters by supplying maxent.args=list(arg1=val1,arg2=val2). Mind that arguments are passed from dismo to the MAXENT software again as an argument list (see maxent for more details).
No specific defaults are set with this method.
Artificial neural network (ANN)
Uses the nnet function from the package 'nnet'. You can set parameters by supplying ann.args=list(arg1=val1,arg2=val2) (see nnet for all settable arguments).
The following parameters have defaults:
size
integer. Number of units in the hidden layer. By default, set to 6.
maxit
integer. Maximum number of iterations, default 500.
Support vector machines (SVM)
Uses the svm function from the package 'e1071'. You can set parameters by supplying svm.args=list(arg1=val1,arg2=val2) (see svm for all settable arguments).
The following parameters have defaults:
type
character. Regression/classification type SVM should be used with. By default, set to "eps-regression".
epsilon
float. Epsilon parameter in the insensitive loss function, default 1e-08.
cross
integer. If an integer value k>0 is specified, a k-fold cross-validation on the training data is performed to assess the quality of the model: the accuracy rate for classification and the Mean Squared Error for regression. By default, set to 3.
kernel
character. The kernel used in training and predicting. By default, set to "radial".
gamma
numeric. Parameter needed for all kernels, default 1/(length(data) -1).
Depending on the resolution of the raster object, the process can take considerable time and memory.
M. Barbet-Massin, F. Jiguet, C. H. Albert, & W. Thuiller (2012) "Selecting pseudo-absences for species distribution models: how, where and how many?" Methods in Ecology and Evolution 3:327-338 http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00172.x/full
C. Liu, P. M. Berry, T. P. Dawson, & R. G. Pearson (2005) "Selecting thresholds of occurrence in the prediction of species distributions." Ecography 28:385-393 http://www.researchgate.net/publication/230246974_Selecting_Thresholds_of_Occurrence_in_the_Prediction_of_Species_Distributions
modelling to build SDMs with a single algorithm, stack_modelling to build SSDMs.
    ## Not run:
    # Loading data
    data(Env)
    data(Occurrences)
    Occurrences <- subset(Occurrences, Occurrences$SPECIES == 'elliptica')

    # ensemble SDM building
    ESDM <- ensemble_modelling(c('CTA', 'MARS'), Occurrences, Env, rep = 1,
                               Xcol = 'LONGITUDE', Ycol = 'LATITUDE',
                               ensemble.thresh = c(0.6))

    # Results plotting
    plot(ESDM)
    ## End(Not run)
This is an S4 class to represent an ensemble SDM from multiple algorithms
(including generalized linear model, generalized additive model, multivariate
adaptive regression splines, generalized boosted regression model,
classification tree analysis, random forest, maximum entropy, artificial
neural network, and support vector machines). This S4 class is returned by
ensemble_modelling or ensemble.
uncertainty
raster. Between-algorithm variance map.
algorithm.correlation
data frame. Between-algorithm correlation matrix.
algorithm.evaluation
data frame. Evaluation of the ensemble SDM (available metrics include AUC, Kappa, sensitivity, specificity and proportion of correctly predicted occurrences) and identification of the optimal threshold to convert the habitat suitability map into a binary presence/absence map.
sdms
list. Individual SDMs used to create the ESDM.
Algorithm.SDM an S4 class to represent an SDM based on a single algorithm, and Stacked.SDM an S4 class for SSDMs.
A stack of three 30 arcsec-resolution rasters covering the northern part of 'Grande Terre', the main island of New Caledonia. The RAINFALL and TEMPERATURE rasters are climatic variables from the WorldClim database, and the SUBSTRATE raster is from the IRD Atlas of New Caledonia (2012) (see reference below).
Env
A stack of three rasters:
RAINFALL: Annual mean rainfall (mm)
TEMPERATURE: Annual mean temperature (x10 degree Celsius)
SUBSTRATE: Substrate type (categorical variable)
R.J. Hijmans & C.H. Graham (2006) "The ability of climate envelope models to predict the effect of climate change on species distributions." Global Change Biology 12:2272-2281 http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2486.2006.01256.x/full
E. Fritsch (2012) "Les sols. Atlas de la Nouvelle-Caledonie (ed. by J. Bonvallot, J.-C. Gay and E. Habert)" IRD-Congres de la Nouvelle-Caledonie, Marseille. 73-76
Evaluation of SDM or ESDM habitat suitability predictions, or evaluation of SSDM floristic composition with the method of Pottier et al. (2013) (see reference below).
    evaluate(obj, ...)

    ## S4 method for signature 'Algorithm.SDM'
    evaluate(obj, cv, cv.param, final.fit.data = "all", bin.thresh = "SES",
      metric = NULL, thresh = 1001, Env, ...)

    ## S4 method for signature 'MAXENT.SDM'
    evaluate(obj, cv, cv.param, final.fit.data = "all", bin.thresh = "SES",
      metric = NULL, thresh = 1001, Env, ...)

    ## S4 method for signature 'Stacked.SDM'
    evaluate(obj, ...)
obj |
Stacked.SDM. SSDM to evaluate |
... |
arguments for internal use (get_model), such as argument lists to be passed to the source functions (e.g. glm.args=list(test="AIC",singular.ok=FALSE)) |
cv |
character. Method of cross-validation used to evaluate the SDM (see details below). |
cv.param |
numeric. Parameters associated to the method of cross-validation used to evaluate the SDM (see details below). |
final.fit.data |
strategy used for fitting the final model to be returned: 'holdout'= use same train and test data as in (last) evaluation, 'all'= train model with all data (i.e. no test data) or numeric (0-1)= sample a custom training fraction (left out fraction is set aside as test data) |
bin.thresh |
character. Classification threshold ( |
metric |
(deprecated) character. Classification threshold ( |
thresh |
(deprecated) integer. Number of equally spaced thresholds in the interval 0-1 ( |
Env |
raster object. Stacked raster object of environmental variables
(can be processed first by |
SDM/ESDM/SSDM evaluation in a data.frame
Pottier, J., Dubuis, A., Pellissier, L., Maiorano, L., Rossier, L., Randin, C. F., Guisan, A. (2013). The accuracy of plant assemblage prediction from species distribution models varies along environmental gradients. Global Ecology and Biogeography, 22(1), 52-63. https://doi.org/10.1111/j.1466-8238.2012.00790.x
    ## Not run:
    # Loading data
    data(Env)
    data(Occurrences)

    # SSDM building
    SSDM <- stack_modelling(c('CTA', 'SVM'), Occurrences, Env, rep = 1,
                            Xcol = 'LONGITUDE', Ycol = 'LATITUDE',
                            Spcol = 'SPECIES')

    # Evaluation
    evaluate(SSDM)
    ## End(Not run)
User interface of the SSDM package.
    gui(port = getOption("shiny.port"),
      host = getOption("shiny.host", "127.0.0.1"),
      working.directory = getwd())
port |
char. The TCP port that the application should listen on (see
|
host |
char. The IPv4 address that the application should listen on (see
|
working.directory |
char. Directory in which the application will run. |
If your environmental variables are large, you should give the interface
enough memory with the maxmem parameter.
Note that only one instance of gui can be run at a time.
Opens a window with a shiny app to use the SSDM package with a user-friendly interface.
    ## Not run:
    gui()
    ## End(Not run)
Load occurrence data from a CSV file to perform modelling, ensemble_modelling or stack_modelling.
    load_occ(path = getwd(), Env, file = NULL, ...,
      Xcol = "Longitude", Ycol = "Latitude", Spcol = NULL,
      GeoRes = TRUE, reso = max(res(Env@layers[[1]])),
      verbose = TRUE, GUI = FALSE)
path |
character. Path to the directory that contains the occurrence table. |
Env |
raster stack. Environmental variables in the form of a raster stack used to
perform spatial thinning (can be the result of the
|
file |
character. File containing the occurrence table, if NULL (default) the .csv file located in the path will be loaded. |
... |
additional parameters given to |
Xcol |
character. Name of the Longitude or X coordinate variable. |
Ycol |
character. Name of the Latitude or Y coordinate variable. |
Spcol |
character. Name of the column containing species names or IDs. |
GeoRes |
logical. If |
reso |
numeric. Resolution used to perform the geographical thinning,
default is the resolution of |
verbose |
logical. If |
GUI |
logical. Parameter reserved for graphical interface. |
A data frame containing the occurrence dataset (spatially thinned or not).
load_var to load environmental variables.
    ## Not run:
    load_occ(path = system.file('extdata', package = 'SSDM'), Env,
             Xcol = 'LONGITUDE', Ycol = 'LATITUDE',
             file = 'Occurrences.csv', sep = ',')
    ## End(Not run)
Function to load environmental variables in the form of rasters to perform
modelling, ensemble_modelling or stack_modelling.
    load_var(path = getwd(), files = NULL,
      format = c(".grd", ".tif", ".asc", ".sdat", ".rst", ".nc", ".envi",
                 ".bil", ".img"),
      categorical = NULL, Norm = FALSE, tmp = TRUE,
      verbose = TRUE, GUI = FALSE)
path |
character. Path to the directory that contains the environmental variables files. |
files |
character. Files containing the environmental variables. If NULL (default), all files present in the path in the selected format will be loaded. |
format |
character. Format of environmental variables files (including .grd, .tif, .asc, .sdat, .rst, .nc, .envi, .bil, .img). |
categorical |
character. Specify environmental variables that are categorical. |
Norm |
logical. Default FALSE. If set to TRUE, normalizes environmental variables into a range between 0 and 1. |
tmp |
logical. If set to TRUE, rasters are read from temporary files to avoid overloading the random access memory. But beware: if you close R, temporary files will be deleted. |
verbose |
logical. If set to TRUE, allows the function to print text in the console. |
GUI |
logical. Do not take that argument into account (parameter for the user interface). |
A stack containing the environmental rasters (normalized or not).
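The effect of Norm = TRUE can be pictured as a rescaling of each variable to the range 0 to 1. The sketch below assumes a plain min-max rescaling applied variable by variable, which the description suggests but does not spell out, and uses a small matrix of invented values in place of raster layers.

```r
# Hypothetical raster values stored as a matrix (one column per variable)
env <- cbind(RAINFALL    = c(1200, 1800, 2400, NA),
             TEMPERATURE = c(180, 210, 240, 225))

# Assumption: min-max rescaling to [0, 1], ignoring missing cells
normalise <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
env_norm <- apply(env, 2, normalise)
```

Missing cells stay missing after rescaling, and each variable's minimum and maximum map to 0 and 1 respectively.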
load_occ to load occurrences.
    ## Not run:
    load_var(system.file('extdata', package = 'SSDM'))
    ## End(Not run)
Load S4 Ensemble.SDM and Stacked.SDM objects saved with their respective save function.
    load_esdm(name, path = getwd())

    load_stack(name = "Stack", path = getwd(), GUI = FALSE)
name |
character. Name of the folder containing the model to be loaded. |
path |
character. Path to the directory containing the model to be loaded, by default the path to the current directory. |
GUI |
logical. Do not take this argument into account (parameter for the user interface). |
The corresponding SDM object.
Methods for Stacked.SDM or SSDM to map diversity and community composition.
    mapDiversity(obj, ...)

    ## S4 method for signature 'Stacked.SDM'
    mapDiversity(obj, method, rep.B = 1000, verbose = TRUE, Env = NULL, ...)
obj |
Stacked.SDM. SSDM to map diversity with. |
... |
other arguments passed to the method. |
method |
character. Define the method used to create the local species richness map (see details below). |
rep.B |
integer. If the method used to create the local species richness is the random Bernoulli (Bernoulli), rep.B parameter defines the number of repetitions used to create binary maps for each species. |
verbose |
logical. If set to true, allows the function to print text in the console. |
Env |
raster object. Stacked raster object of environmental variables
(can be processed first by |
Methods: Choice of the method used to compute the local species richness map (see Calabrese et al. (2014) and D'Amen et al. (2015) for more information, see references below):
pSSDM: sum probabilities of habitat suitability maps
Bernoulli: draw repeatedly from a Bernoulli distribution
bSSDM: sum the binary maps obtained with the thresholding (depending on the metric of the ESDM)
MaximumLikelihood: adjust species richness of the model by linear regression
PRR.MEM: model richness with a macroecological model (MEM) and adjust each ESDM binary map by ranking habitat suitability and keeping as many species as the richness predicted by the MEM
PRR.pSSDM: model richness with a pSSDM and adjust each ESDM binary map by ranking habitat suitability and keeping as many species as the richness predicted by the pSSDM
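The first three methods listed above (summing probabilities, repeated Bernoulli draws, and summing thresholded maps) reduce to simple matrix operations. The sketch below is an illustration with an invented suitability matrix and invented per-species thresholds, not the package's implementation; it assumes that the Bernoulli draws are averaged over the rep.B repetitions.

```r
set.seed(123)
# Hypothetical habitat suitability: rows = cells, columns = species
p <- matrix(c(0.9, 0.2, 0.6,
              0.1, 0.7, 0.5,
              0.8, 0.8, 0.3),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("sp1", "sp2", "sp3")))
# Hypothetical per-species binarization thresholds
thr <- c(sp1 = 0.5, sp2 = 0.6, sp3 = 0.4)

# Sum of probabilities
richness_prob <- rowSums(p)

# Threshold each species, then sum the binary maps
richness_bin <- rowSums(sweep(p, 2, thr, ">="))

# Repeated Bernoulli draws: binarize at random rep.B times, then average
rep.B <- 1000
draws <- replicate(rep.B,
                   rowSums(matrix(rbinom(length(p), 1, p), nrow = nrow(p))))
richness_bern <- rowMeans(draws)
```

With enough repetitions the Bernoulli estimate approaches the sum of probabilities, while the thresholded sum can differ markedly because it discards how far each probability lies from its threshold.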
A list with a diversity map and, where applicable, the ESDMs for stacking methods that use probability ranking from richness (PRR).
M. D'Amen, A. Dubuis, R. F. Fernandes, J. Pottier, L. Pellissier, & A. Guisan (2015) "Using species richness and functional traits predictions to constrain assemblage predictions from stacked species distribution models" Journal of Biogeography 42(7):1255-1266 http://doc.rero.ch/record/235561/files/pel_usr.pdf
J.M. Calabrese, G. Certain, C. Kraan, & C.F. Dormann (2014) "Stacking species distribution models and adjusting bias by linking them to macroecological models." Global Ecology and Biogeography 23:99-112 https://onlinelibrary.wiley.com/doi/full/10.1111/geb.12102
stacking
to build SSDMs.
## Not run:
# Loading data
data(Env)
data(Occurrences)
# SSDM building
SSDM <- stack_modelling(c('CTA', 'SVM'), Occurrences, Env, rep = 1,
                        Xcol = 'LONGITUDE', Ycol = 'LATITUDE',
                        Spcol = 'SPECIES')
# Diversity mapping
mapDiversity(SSDM, method = 'pSSDM')
## End(Not run)
This is a function to build an SDM with one algorithm for a single species. The function takes as inputs an occurrence data frame made of presence/absence or presence-only records and a raster object for data extraction and projection. The function returns an S4 Algorithm.SDM class object containing the habitat suitability map, the binary map and the evaluation table.
modelling( algorithm, Occurrences, Env, Xcol = "Longitude", Ycol = "Latitude", Pcol = NULL, name = NULL, PA = NULL, cv = "holdout", cv.param = c(0.7, 2), final.fit.data = "all", bin.thresh = "SES", metric = NULL, thresh = 1001, axes.metric = "Pearson", select = FALSE, select.metric = c("AUC"), select.thresh = c(0.75), verbose = TRUE, GUI = FALSE, ... )
algorithm |
character. Choice of the algorithm to be run (see details below). |
Occurrences |
data frame. Occurrence table (can be processed first by
|
Env |
raster object. Raster object of environmental variable (can be
processed first by |
Xcol |
character. Name of the column in the occurrence table containing Longitude or X coordinates. |
Ycol |
character. Name of the column in the occurrence table containing Latitude or Y coordinates. |
Pcol |
character. Name of the column in the occurrence table specifying whether a line is a presence or an absence. A value of 1 is presence and value of 0 is absence. If NULL presence-only dataset is assumed. |
name |
character. Optional name given to the final SDM produced (by default 'Algorithm.SDM'). |
PA |
list(nb, strat) defining the pseudo-absence selection strategy used in case of presence-only dataset. If PA is NULL, recommended PA selection strategy is used depending on the algorithms (see details below). |
cv |
character. Method of cross-validation used to evaluate the SDM (see details below). |
cv.param |
numeric. Parameters associated to the method of cross-validation used to evaluate the SDM (see details below). |
final.fit.data |
strategy used for fitting the final model to be returned: 'holdout'= use same train and test data as in (last) evaluation, 'all'= train model with all data (i.e. no test data) or numeric (0-1)= sample a custom training fraction (left out fraction is set aside as test data) |
bin.thresh |
character. Classification threshold ( |
metric |
(deprecated) character. Classification threshold ( |
thresh |
(deprecated) integer. Number of equally spaced thresholds in the interval 0-1 ( |
axes.metric |
Metric used to evaluate variable relative importance (see details below). |
select |
logical. If set to true, models are evaluated before being projected, and not kept if they don't meet selection criteria (see details below). |
select.metric |
character. Metric(s) used to pre-select SDMs that reach a sufficient quality (see details below). |
select.thresh |
numeric. Threshold(s) associated with the metric(s) used to compute the selection. |
verbose |
logical. If set to true, allows the function to print text in the console. |
GUI |
logical. Don't take that argument into account (parameter for the user interface). |
... |
additional parameters, e.g. argument lists for the source algorithm modelling functions (see details below). |
'all' calls all available algorithms directly. Currently, available algorithms include Generalized linear model (GLM), Generalized additive model (GAM), Multivariate adaptive regression splines (MARS), Generalized boosted regressions model (GBM), Classification tree analysis (CTA), Random forest (RF), Maximum entropy (MAXENT), Artificial neural network (ANN), and Support vector machines (SVM). Each algorithm has its own parameters, which can be set through ... by supplying argument lists (see each algorithm section below to set their parameters).
list with two values: nb number of pseudo-absences selected, and strat strategy used to select pseudo-absences: either random selection or disk selection. We set default recommendation from Barbet-Massin et al. (2012) (see reference).
Cross-validation method used to split the occurrence dataset for evaluation: holdout, data are partitioned into a training set and an evaluation set using a fraction (cv.param[1]), and the operation can be repeated cv.param[2] times; k-fold, data are partitioned into k (cv.param[1]) folds, each fold being used once as the evaluation set and k-1 times in the training set, and the operation can be repeated cv.param[2] times; LOO (Leave One Out), each point is successively taken as evaluation data.
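The three partitioning schemes can be sketched in base R as follows. This is only an illustration of the idea and not the package's internal code; `n`, `cv_param`, `k` and the list layout are assumptions made for the example.

```r
set.seed(1)
n <- 20                  # number of occurrence records (toy value)
cv_param <- c(0.7, 2)    # holdout: training fraction, number of repetitions

# Holdout: each repetition draws a fresh training/evaluation partition
holdout <- lapply(seq_len(cv_param[2]), function(i) {
  train <- sort(sample(n, size = round(cv_param[1] * n)))
  list(train = train, test = setdiff(seq_len(n), train))
})

# k-fold: every record is used for evaluation exactly once
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = n))
kfold <- lapply(seq_len(k), function(f) {
  list(train = which(fold_id != f), test = which(fold_id == f))
})

# LOO: each record is left out in turn (k-fold with k = n)
loo <- lapply(seq_len(n), function(i) {
  list(train = setdiff(seq_len(n), i), test = i)
})
```

Holdout is cheap but its evaluation depends on the random split, which is why repetitions are offered; LOO uses the data most fully but fits one model per record.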
Choice of the metric used to binarize model predictions and compute the confusion matrix (by default SES as recommended by Liu et al. (2005), see reference below): Kappa maximizes the Kappa, NOM highest threshold without omission, TSS (True Skill Statistic) maximizes the sum of sensitivity and specificity, SES uses the sensitivity-specificity equality, EP threshold where modeled prevalence is closest to observed prevalence.
Choice of the metric used to compute the binary map threshold and the confusion matrix (by default SES as recommended by Liu et al. (2005), see reference below): Kappa maximizes the Kappa, CCR maximizes the proportion of correctly predicted observations, TSS (True Skill Statistic) maximizes the sum of sensitivity and specificity, SES uses the sensitivity-specificity equality, LW uses the lowest occurrence prediction probability, ROC minimizes the distance between the ROC (receiver operating characteristic) curve and the upper left corner (1,1).
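To make the SES and TSS criteria concrete, here is a minimal base R sketch on toy data. The observations, predictions and variable names are hypothetical, and the grid of 1001 equally spaced thresholds simply mirrors the default of the thresh argument; the package may resolve ties differently.

```r
# Toy observations (1 = presence, 0 = absence) and predicted suitability
obs  <- c(1, 1, 1, 1, 0, 0, 0, 0)
pred <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1)

# Candidate thresholds, equally spaced in the interval 0-1
thresholds <- seq(0, 1, length.out = 1001)

# Sensitivity and specificity at each candidate threshold
sens <- sapply(thresholds, function(t) mean(pred[obs == 1] >= t))
spec <- sapply(thresholds, function(t) mean(pred[obs == 0] <  t))

# SES: threshold where sensitivity and specificity are closest to equal
# (which.min returns the first such threshold when several tie)
ses_threshold <- thresholds[which.min(abs(sens - spec))]

# TSS: threshold maximizing sensitivity + specificity (first one on ties)
tss_threshold <- thresholds[which.max(sens + spec)]

# Binary presence/absence prediction under the SES threshold
binary_ses <- as.integer(pred >= ses_threshold)
```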
Choice of the metric used to evaluate the variable relative importance (difference between a full model and one with each variable successively omitted): Pearson (computes a simple Pearson's correlation r between predictions of the full model and the one without a variable, and returns the score 1-r: the higher the value, the more influence the variable has on the model), AUC, Kappa, sensitivity, specificity, and prop.correct (proportion of correctly predicted occurrences).
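The Pearson-based score can be sketched with a plain logistic regression in base R. The simulated data, variable names and the use of glm are assumptions for illustration; the package applies the same idea to whichever algorithm was fitted.

```r
set.seed(7)
n <- 200
# Simulated environmental variables (hypothetical names)
env <- data.frame(temp = rnorm(n), rain = rnorm(n), noise = rnorm(n))
# Simulated presence/absence driven mostly by temp and weakly by rain
presence <- rbinom(n, 1, plogis(2 * env$temp + 0.5 * env$rain))
dat <- cbind(presence = presence, env)

# Full model and its predictions
full <- glm(presence ~ ., data = dat, family = binomial)
pred_full <- predict(full, type = "response")

# Refit without each variable in turn and score it as 1 - r
importance <- sapply(names(env), function(v) {
  reduced <- glm(reformulate(setdiff(names(env), v), response = "presence"),
                 data = dat, family = binomial)
  1 - cor(pred_full, predict(reduced, type = "response"))
})
```

A variable whose removal barely changes the predictions gets a score near 0, so here `noise` should score far below `temp`.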
Selection metric(s) used to select SDMs: AUC, Kappa, sensitivity, specificity, and prop.correct (proportion of correctly predicted occurrences), calibration (calibration statistic as used by Naimi & Araujo 2016).
See algorithm in detail section
an S4 Algorithm.SDM Class object viewable with the
plot.model
method.
Uses the glm
function from the package 'stats'. You can set parameters by supplying glm.args=list(arg1=val1,arg2=val2)
(see glm
for all settable arguments).
The following parameters have defaults:
character. Test used to evaluate the SDM, default 'AIC'.
list (created with glm.control
).
Contains parameters for controlling the fitting process. Default is glm.control(epsilon = 1e-08, maxit = 500)
.
'epsilon' is a numeric and defines the positive convergence tolerance (eps).
'maxit' is an integer giving the maximal number of IWLS (Iteratively Weighted Least Squares) iterations.
Uses the gam
function from the package 'mgcv'. You can set parameters by supplying gam.args=list(arg1=val1,arg2=val2)
(see gam
for all settable arguments).
The following parameters have defaults:
character. Test used to evaluate the model, default 'AIC'.
list (created with gam.control
).
Contains parameters for controlling the fitting process. Default is gam.control(epsilon = 1e-08, maxit = 500)
.
'epsilon' is a numeric used for judging the convergence of the GLM IRLS (Iteratively Reweighted Least Squares) loop. 'maxit' is an integer giving the maximum number of IRLS iterations to perform.
Uses the
earth
function from the package 'earth'. You can set parameters by supplying mars.args=list(arg1=val1,arg2=val2)
(see earth
for all settable arguments).
The following parameters have defaults:
integer. Maximum degree of interaction (Friedman's mi) ; 1 meaning build an additive model (i.e., no interaction terms). By default, set to 2.
Uses the
gbm
function from the package 'gbm'. You can set parameters by supplying gbm.args=list(arg1=val1,arg2=val2)
(see gbm
for all settable arguments).
The following parameters have defaults:
character. Automatically detected from the format of the presence column in the occurrence dataset.
integer. The total number of trees to fit. This is equivalent to the number of iterations and the number of basis functions in the additive expansion. By default, set to 2500.
integer. minimum number of observations in the trees terminal nodes. Note that this is the actual number of observations, not the total weight. By default, set to 1.
integer. Number of cross-validation folds to perform. If cv.folds>1 then gbm - in addition to the usual fit - will perform a cross-validation. By default, set to 3.
numeric. A shrinkage parameter applied to each tree in the expansion (also known as learning rate or step-size reduction). By default, set to 0.001.
numeric. Fraction of the training set observations randomly selected to propose the next tree in the expansion.
numeric. Training fraction used to fit the first gbm. The remainder is used to compute out-of-sample estimates of the loss function. By default, set to 1 (since evaluation/holdout is done with SSDM::evaluate
).
integer. Number of cores to use for parallel computation of the CV folds. By default, set to 1. If you intend to use this, please set ncores=0
to avoid conflicts.
Uses the rpart
function from the package 'rpart'. You can set parameters by supplying cta.args=list(arg1=val1,arg2=val2)
(see rpart
for all settable arguments).
The following parameters have defaults:
list (created with rpart.control
).
Contains parameters for controlling the rpart fit. The default is rpart.control(minbucket=1, xval=3)
.
'minbucket' is an integer giving the minimum number of observations in any
terminal node. 'xval' is an integer defining the number of
cross-validations.
Uses the randomForest
function
from the package 'randomForest'. You can set parameters by supplying rf.args=list(arg1=val1,arg2=val2)
(see randomForest
for all settable arguments).
The following parameters have defaults:
integer. Number of trees to grow. This should not be set to a too small number, to ensure that every input row gets predicted at least a few times. By default, set to 2500.
integer. Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time). By default, set to 1.
Uses the maxent
function
from the package 'dismo'. Make sure that you have correctly installed the
maxent.jar file in the folder ~\R\library\version\dismo\java available
at https://biodiversityinformatics.amnh.org/open_source/maxent/. As with the other algorithms, you can set parameters by supplying maxent.args=list(arg1=val1,arg2=val2)
. Mind that arguments are passed from dismo to the MAXENT software again as an argument list (see maxent
for more details).
No specific defaults are set with this method.
Uses the nnet
function from the package 'nnet'. You can set parameters by supplying ann.args=list(arg1=val1,arg2=val2)
(see nnet
for all settable arguments).
The following parameters have defaults:
integer. Number of units in the hidden layer. By default, set to 6.
integer. Maximum number of iterations, default 500.
Uses the svm
function
from the package 'e1071'. You can set parameters by supplying svm.args=list(arg1=val1,arg2=val2)
(see svm
for all settable arguments).
The following parameters have defaults:
character. Regression/classification type SVM should be used with. By default, set to "eps-regression".
float. Epsilon parameter in the insensitive loss function, default 1e-08.
integer. If an integer value k>0 is specified, a k-fold cross-validation on the training data is performed to assess the quality of the model: the accuracy rate for classification and the Mean Squared Error for regression. By default, set to 3.
character. The kernel used in training and predicting. By default, set to "radial".
numeric. Parameter needed for all kernels, default 1/(length(data) -1)
.
Depending on the raster object resolution the process can be more or less time and memory consuming.
M. Barbet-Massin, F. Jiguet, C. H. Albert, & W. Thuiller (2012) 'Selecting pseudo-absences for species distribution models: how, where and how many?' Methods in Ecology and Evolution 3:327-338 http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00172.x/full
C. Liu, P. M. Berry, T. P. Dawson, & R. G. Pearson (2005) 'Selecting thresholds of occurrence in the prediction of species distributions.' Ecography 28:385-393 http://www.researchgate.net/publication/230246974_Selecting_Thresholds_of_Occurrence_in_the_Prediction_of_Species_Distributions
ensemble_modelling
to build ensemble SDMs,
stack_modelling
to build SSDMs.
# Loading data
data(Env)
data(Occurrences)
Occurrences <- subset(Occurrences, Occurrences$SPECIES == 'elliptica')
# SDM building
SDM <- modelling('GLM', Occurrences, Env, Xcol = 'LONGITUDE', Ycol = 'LATITUDE')
# Results plotting
## Not run:
plot(SDM)
## End(Not run)
A dataset containing plant occurrences of five Cryptocarya species native to New Caledonia. Occurrence data come from the Noumea Herbarium (NOU) and NC-PIPPN network (see Ibanez et al (2014) in reference below).
Occurrences
A data frame with 57 rows and 3 variables:
Species of the occurrence
Longitude of the occurrence
Latitude of the occurrence
T. Ibanez, J. Munzinger, G. Dagostini, V. Hequet, F. Rigault, T. Jaffre, & P. Birnbaum (2014) "Structural and floristic characteristics of mixed rainforest in New Caledonia: new data from the New Caledonian Plant Inventory and Permanent Plot Network (NC-PIPPN)." Applied Vegetation Science 17:386-397
Plots S4 Algorithm.SDM, Ensemble.SDM and Stacked.SDM class objects.
## S4 method for signature 'Stacked.SDM,ANY' plot(x, y, ...) ## S4 method for signature 'SDM,ANY' plot(x, y, ...)
x |
Object to be plotted (S4 Algorithm.SDM, Ensemble.SDM or Stacked.SDM object). |
y , ...
|
Plot-based parameter not used. |
Open a window with a shiny app rendering all the results (habitat suitability map, binary map, evaluation table, variable importance and/or between-algorithm variance map, and/or algorithm evaluation, and/or algorithm correlation matrix and/or local species richness map) in a user-friendly interface.
This is a collection of methods to project SDMs, ESDMs or SSDMs into the supplied environment. The function is used internally to calculate the input for the projection slot of .SDM classes but can also be used to project existing .SDM objects (see Details).
project(obj, Env, ...) ## S4 method for signature 'Algorithm.SDM' project(obj, Env, output.format = "model", ...) ## S4 method for signature 'MAXENT.SDM' project(obj, Env, output.format = "model", ...) ## S4 method for signature 'Ensemble.SDM' project( obj, Env, uncertainty = TRUE, output.format = "model", SDM.projections = FALSE, cores = 0, minimal.memory = FALSE, tmp = FALSE, ... ) ## S4 method for signature 'Stacked.SDM' project( obj, Env, method = NULL, uncertainty = TRUE, output.format = "model", SDM.projections = FALSE, cores = 0, minimal.memory = FALSE, tmp = FALSE, ... )
obj |
Object of class Algorithm.SDM, Ensemble.SDM or Stacked.SDM. Model(s) to be projected. |
Env |
Raster stack. Updated environmental rasters to be used for projection. |
... |
arguments for internal use (get_model), such as argument lists to be passed to the source functions (e.g. glm.args=list(test="AIC",singular.ok=FALSE)). See |
output.format |
character. If 'model' (default), the original .SDM object will be returned with updated projection slots. If 'rasters', the projected rasters will be returned as a list of rasters. |
uncertainty |
logical. If set to TRUE, generates an uncertainty map. If output.format is 'model' an algorithm correlation matrix is additionally returned. |
SDM.projections |
logical. If FALSE (default), the projections of the Algorithm.SDMs will not be returned (only applies to Ensemble.SDMs and Stacked.SDMs). |
cores |
integer. Specify the number of CPU cores used to do the
computing. You can use |
minimal.memory |
logical. Only relevant if cores >1. If TRUE, only one model will be sent to each worker at a time, reducing used working memory. |
tmp |
logical or character. If FALSE, no temporary rasters are written. If TRUE, temporary rasters are written to the "tmp" directory of your R environment. If character, temporary rasters are written to a custom path. Very useful to reduce working memory consumption (use together with minimal.memory=TRUE for maximal effect). But beware: depending on the number, resolution and extent of models, temporary files can take a lot of disk space. |
method |
character. Define the method used to create the local species
richness map (for details see |
The function uses any S4 .SDM class object and a raster stack of environmental layers of the variables the model was trained with.
Either returns the original .SDM object with updated projection slots (default) or if output.format = 'rasters' only returns the projections as Raster* objects or a list thereof.
Saves S4 Ensemble.SDM and Stacked.SDM class objects.
save.esdm( esdm, name = strsplit(esdm@name, ".", fixed = TRUE)[[1]][1], path = getwd(), verbose = TRUE, GUI = FALSE ) ## S4 method for signature 'Ensemble.SDM' save.esdm( esdm, name = strsplit(esdm@name, ".Ensemble.SDM", fixed = TRUE)[[1]][1], path = getwd(), verbose = TRUE, GUI = FALSE ) save.stack(stack, name = "Stack", path = getwd(), verbose = TRUE, GUI = FALSE) ## S4 method for signature 'Stacked.SDM' save.stack(stack, name = "Stack", path = getwd(), verbose = TRUE, GUI = FALSE)
esdm |
Ensemble.SDM. Ensemble SDM to be saved. |
name |
character. Folder name of the model to save. |
path |
character. Path to the directory chosen to save the SDM, by default the path to the current directory. |
verbose |
logical. If set to true, allows the function to print text in the console. |
GUI |
logical. Don't take that argument into account (parameter for the user interface). |
stack |
Stacked.SDM. SSDM to be saved. |
Returns nothing in the R environment. Creates the folders, tables and rasters associated with the SDM. Tables are saved as .csv and rasters as .grd/.gri.
SSDM is a package to map species richness and endemism based on Stacked Species Distribution Models (SSDM). It provides tools to build
SDMs (a single species fitted with a single algorithm), Ensemble SDMs (ESDMs; a single species fitted with multiple algorithms), and
SSDMs (several species fitted with one or more algorithms). The package includes numerous modelling algorithms, and specifiable ensemble aggregating and stacking methods.
This package also provides tools to evaluate and explore models such as variable importance, algorithm accuracy, and between-algorithm
correlation, and tools to map results such as habitat suitability maps, binary maps, between-algorithm variance maps.
For ease of use, the SSDM package provides a user-friendly graphical interface (gui
).
The SSDM package provides five categories of functions (that you can find in details below): Data preparation, Modelling main functions, Model main methods, Model classes, and Miscellaneous.
modelling
Build an SDM using a single algorithm
ensemble_modelling
Build an SDM that assembles multiple algorithms
stack_modelling
Build an SSDM that assembles multiple algorithms and species
ensemble,Algorithm.SDM-method
Build an ensemble SDM
stacking,Ensemble.SDM-method
Build an SSDM
update,Stacked.SDM-method
Update a previous SSDM with new occurrence data
Algorithm.SDM
S4 class to represent SDMs
Ensemble.SDM
S4 class to represent ensemble SDMs
Stacked.SDM
S4 class to represent SSDMs
gui
User-friendly interface for SSDM package
plot.model
Plot SDMs
save.model
Save SDMs
load.model
Load SDMs
This is a function to build an SSDM that assembles multiple algorithms and
species. The function takes as inputs an occurrence data frame made of
presence/absence or presence-only records and a raster object for data
extraction and projection. The function returns an S4
Stacked.SDM class object containing the local species richness
map, the between-algorithm variance map, all associated evaluation tables
(model evaluation, algorithm evaluation, algorithm correlation matrix and
variable importance), and a list of ensemble SDMs for each species (see
ensemble_modelling
).
stack_modelling( algorithms, Occurrences, Env, Xcol = "Longitude", Ycol = "Latitude", Pcol = NULL, Spcol = "SpeciesID", rep = 10, name = NULL, save = FALSE, path = getwd(), PA = NULL, cv = "holdout", cv.param = c(0.7, 1), final.fit.data = "all", bin.thresh = "SES", metric = NULL, thresh = 1001, axes.metric = "Pearson", uncertainty = TRUE, tmp = FALSE, SDM.projections = FALSE, ensemble.metric = c("AUC"), ensemble.thresh = c(0.75), weight = TRUE, method = "pSSDM", rep.B = 1000, range = NULL, endemism = c("WEI", "Binary"), verbose = TRUE, GUI = FALSE, cores = 0, parmode = "species", ... )
algorithms |
character. Choice of the algorithm(s) to be run (see details below). |
Occurrences |
data frame. Occurrence table (can be processed first by
|
Env |
raster object. Raster object of environmental variables (can be
processed first by |
Xcol |
character. Name of the column in the occurrence table containing Longitude or X coordinates. |
Ycol |
character. Name of the column in the occurrence table containing Latitude or Y coordinates. |
Pcol |
character. Name of the column in the occurrence table specifying whether a line is a presence or an absence. A value of 1 is presence and value of 0 is absence. If NULL presence-only dataset is assumed. |
Spcol |
character. Name of the column containing species names or IDs. |
rep |
integer. Number of repetitions for each algorithm. |
name |
character. Optional name given to the final Ensemble.SDM produced. |
save |
logical. If set to true, the SSDM is automatically saved. |
path |
character. If save is true, the path to the directory in which the ensemble SDM will be saved. |
PA |
list(nb, strat) defining the pseudo-absence selection strategy used in case of presence-only dataset. If PA is NULL, recommended PA selection strategy is used depending on the algorithm (see details below). |
cv |
character. Method of cross-validation used to evaluate the ensemble SDM (see details below). |
cv.param |
numeric. Parameters associated with the method of cross-validation used to evaluate the ensemble SDM (see details below). |
final.fit.data |
strategy used for fitting the final/evaluated Algorithm.SDMs: 'holdout'= use same train and test data as in (last) evaluation, 'all'= train model with all data (i.e. no test data) or numeric (0-1)= sample a custom training fraction (left out fraction is set aside as test data) |
bin.thresh |
character. Classification threshold ( |
metric |
(deprecated) character. Classification threshold ( |
thresh |
(deprecated) integer. Number of equally spaced thresholds in the interval 0-1 ( |
axes.metric |
Metric used to evaluate variable relative importance (see details below). |
uncertainty |
logical. If set to true, generates an uncertainty map and an algorithm correlation matrix. |
tmp |
logical. If set to true, the habitat suitability map of each
algorithm is saved in a temporary file to release memory. But beware: if
you close R, temporary files will be deleted. To avoid any loss you can
save your SSDM with |
SDM.projections |
logical. If FALSE (default), the Algorithm.SDMs inside the 'sdms' slot will not contain projections (for memory saving purposes). |
ensemble.metric |
character. Metric(s) used to select the best SDMs that will be included in the ensemble SDM (see details below). |
ensemble.thresh |
numeric. Threshold(s) associated with the metric(s) used to compute the selection. |
weight |
logical. Choose whether or not you want the SDMs to be weighted using the selection metric or, alternatively, the mean of the selection metrics. |
method |
character. Define the method used to create the local species richness map (see details below). |
rep.B |
integer. If the method used to create the local species richness is the random Bernoulli (Bernoulli), rep.B parameter defines the number of repetitions used to create binary maps for each species. |
range |
integer. Set a value of range restriction (in pixels) around presence occurrences on habitat suitability maps (all points further away will have a null probability, see Crisp et al. (2001) in references). If NULL, no range restriction will be applied. |
endemism |
character. Define the method used to create an endemism map (see details below). |
verbose |
logical. If set to true, allows the function to print text in the console. |
GUI |
logical. Don't take that argument into account (parameter for the user interface). |
cores |
integer. Specify the number of CPU cores used to do the
computing. You can use |
parmode |
character. Parallelization mode: along 'species', 'algorithms' or 'replicates'. Defaults to 'species'. |
... |
additional parameters for the algorithm modelling function (see details below). |
'all' allows you to call directly all available algorithms. Currently, available algorithms include Generalized linear model (GLM), Generalized additive model (GAM), Multivariate adaptive regression splines (MARS), Generalized boosted regressions model (GBM), Classification tree analysis (CTA), Random forest (RF), Maximum entropy (MAXENT), Artificial neural network (ANN), and Support vector machines (SVM). Each algorithm has its own parameters settable with the ... (see each algorithm section below to set their parameters).
list with two values: nb number of pseudo-absences selected, and strat strategy used to select pseudo-absences: either random selection or disk selection. We set default recommendation from Barbet-Massin et al. (2012) (see reference).
Cross-validation method used to split the occurrence dataset for evaluation: holdout, data are partitioned into a training set and an evaluation set using a fraction (cv.param[1]), and the operation can be repeated cv.param[2] times; k-fold, data are partitioned into k (cv.param[1]) folds, each fold being used once as the evaluation set and k-1 times in the training set, and the operation can be repeated cv.param[2] times; LOO (Leave One Out), each point is successively taken as evaluation data.
Choice of the metric used to compute the binary map threshold and the confusion matrix (by default SES as recommended by Liu et al. (2005), see reference below): Kappa maximizes the Kappa, CCR maximizes the proportion of correctly predicted observations, TSS (True Skill Statistic) maximizes the sum of sensitivity and specificity, SES uses the sensitivity-specificity equality, LW uses the lowest occurrence prediction probability, ROC minimizes the distance between the ROC (receiver operating characteristic) curve and the upper left corner (1,1).
Choice of the metric used to evaluate the variable relative importance (difference between a full model and one with each variable successively omitted): Pearson (computes a simple Pearson's correlation r between predictions of the full model and the one without a variable, and returns the score 1-r: the higher the value, the more influence the variable has on the model), AUC, Kappa, sensitivity, specificity, and prop.correct (proportion of correctly predicted occurrences).
Ensemble metric(s) used to select SDMs: AUC, Kappa, sensitivity, specificity, and prop.correct (proportion of correctly predicted occurrences).
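Selection and weighting of SDMs by an ensemble metric can be sketched in base R as below. The suitability values, AUC scores and variable names are made up, and it is an assumption of this sketch that a model is kept when its metric is at or above the threshold and that weights are proportional to the metric.

```r
# Toy habitat suitability per algorithm (3 cells each) and evaluation AUC
projections <- list(
  GLM = c(0.2, 0.6, 0.9),
  RF  = c(0.1, 0.8, 0.7),
  SVM = c(0.5, 0.5, 0.5)
)
auc <- c(GLM = 0.80, RF = 0.90, SVM = 0.60)
ensemble_thresh <- 0.75

# Keep only the SDMs reaching the threshold (SVM is dropped here)
keep <- auc >= ensemble_thresh

# Weights proportional to the selection metric, normalized to sum to 1
weights <- auc[keep] / sum(auc[keep])

# Weighted mean of the retained habitat suitability maps
ensemble <- Reduce(`+`, Map(`*`, projections[keep], weights))
```

Setting all weights equal would correspond to an unweighted mean of the retained maps.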
Choice of the method used to compute the local species richness map (see Calabrese et al. (2014) and D'Amen et al. (2015) for more information; references below): pSSDM sums the probabilities of the habitat suitability maps; Bernoulli draws repeatedly from a Bernoulli distribution; bSSDM sums the binary maps obtained with the thresholding (depending on the metric, see metric parameter); MaximumLikelihood adjusts species richness using maximum likelihood parameter estimates on the logit-transformed occurrence probabilities (see Calabrese et al. (2014)); PRR.MEM models richness with a macroecological model (MEM) and adjusts each ESDM binary map by ranking habitat suitability and keeping as many species as the richness predicted by the MEM; PRR.pSSDM models richness with a pSSDM and adjusts each ESDM binary map by ranking habitat suitability and keeping as many species as the richness predicted by the pSSDM.
Choice of the method used to compute the endemism map (see Crisp et al. (2001) for more information, see reference below): NULL, no endemism map; WEI (Weighted Endemism Index), endemism map built by counting all species in each cell and weighting each by the inverse of its range; CWEI (Corrected Weighted Endemism Index), endemism map built by dividing the weighted endemism index by the total count of species in the cell. The first string of the character vector is the method, either WEI or CWEI; in those cases, the second string specifies how the range is calculated: either the total number of occurrences ('NbOcc') or the surface of the species' binary distribution map ('Binary').
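The two indices can be worked through on a toy example in base R. The binary maps and variable names are hypothetical, a matrix stands in for the rasters, and the range is taken as the number of occupied cells (the 'Binary' option); this is a sketch of the definitions above, not the package's code.

```r
# Toy binary maps: 4 cells (rows) x 3 species (columns)
binary <- matrix(
  c(1, 1, 0,
    1, 0, 0,
    1, 1, 1,
    1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cell", 1:4), c("sp1", "sp2", "sp3"))
)

# Range of each species: number of cells it occupies
range_size <- colSums(binary)

# WEI: each species present in a cell contributes 1 / range
wei <- as.vector(binary %*% (1 / range_size))

# CWEI: WEI divided by the number of species in the cell
richness <- rowSums(binary)
cwei <- ifelse(richness > 0, wei / richness, 0)
```

Cell 3 scores highest on both indices because it is the only cell holding the narrow-ranged sp3, whereas cells 2 and 4 contain only the widespread sp1.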
See algorithm in detail section
an S4 Stacked.SDM class object viewable with the
plot.model
function.
Uses the glm
function from the package 'stats'. You can set parameters by supplying glm.args=list(arg1=val1,arg2=val2)
(see glm
for all settable arguments).
The following parameters have defaults:
character. Test used to evaluate the SDM, default 'AIC'.
list (created with glm.control). Contains parameters for controlling the fitting process. Default is glm.control(epsilon = 1e-08, maxit = 500). 'epsilon' is a numeric and defines the positive convergence tolerance (eps). 'maxit' is an integer giving the maximal number of IWLS (Iteratively Weighted Least Squares) iterations.
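The sketch below shows what the glm.args list described above might look like, and applies the same control settings directly to stats::glm on simulated data. The data frame occ and its columns are hypothetical, and the commented modelling() call is only an assumption about how the list would be passed; check the modelling help page for the exact signature.

```r
set.seed(1)

# Hypothetical presence/absence table with two environmental predictors
occ <- data.frame(
  Presence = rbinom(200, 1, 0.5),
  temp = rnorm(200),
  rain = rnorm(200)
)

# Argument list in the glm.args = list(arg1 = val1, ...) form,
# here reproducing the documented default control settings
glm_args <- list(control = glm.control(epsilon = 1e-08, maxit = 500))

# Equivalent direct call to stats::glm with those arguments
fit <- do.call(
  glm,
  c(list(formula = Presence ~ temp + rain, family = binomial, data = occ),
    glm_args)
)

# The control settings are stored on the fitted object
print(fit$control$maxit)
print(fit$control$epsilon)

# Assumed usage with the package (not run here):
# SDM <- modelling('GLM', Occurrences, Env, Xcol = 'LONGITUDE',
#                  Ycol = 'LATITUDE', glm.args = glm_args)
```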
Uses the gam function from the package 'mgcv'. You can set parameters by supplying gam.args=list(arg1=val1,arg2=val2) (see gam for all settable arguments).
The following parameters have defaults:
character. Test used to evaluate the model, default 'AIC'.
list (created with gam.control). Contains parameters for controlling the fitting process. Default is gam.control(epsilon = 1e-08, maxit = 500). 'epsilon' is a numeric used for judging the convergence of the GLM IRLS (Iteratively Reweighted Least Squares) loop. 'maxit' is an integer giving the maximum number of IRLS iterations to perform.
Uses the earth function from the package 'earth'. You can set parameters by supplying mars.args=list(arg1=val1,arg2=val2) (see earth for all settable arguments).
The following parameters have defaults:
integer. Maximum degree of interaction (Friedman's mi); 1 means build an additive model (i.e., no interaction terms). By default, set to 2.
Uses the gbm function from the package 'gbm'. You can set parameters by supplying gbm.args=list(arg1=val1,arg2=val2) (see gbm for all settable arguments).
The following parameters have defaults:
character. Automatically detected from the format of the presence column in the occurrence dataset.
integer. The total number of trees to fit. This is equivalent to the number of iterations and the number of basis functions in the additive expansion. By default, set to 2500.
integer. Minimum number of observations in the trees' terminal nodes. Note that this is the actual number of observations, not the total weight. By default, set to 1.
integer. Number of cross-validation folds to perform. If cv.folds>1 then gbm - in addition to the usual fit - will perform a cross-validation. By default, set to 3.
numeric. A shrinkage parameter applied to each tree in the expansion (also known as learning rate or step-size reduction). By default, set to 0.001.
numeric. Fraction of the training set observations randomly selected to propose the next tree in the expansion.
numeric. Training fraction used to fit the first gbm. The remainder is used to compute out-of-sample estimates of the loss function. By default, set to 1 (since evaluation/holdout is done with SSDM::evaluate).
integer. Number of cores to use for parallel computation of the CV folds. By default, set to 1. If you intend to use this, please set ncores=0 to avoid conflicts.
Uses the rpart function from the package 'rpart'. You can set parameters by supplying cta.args=list(arg1=val1,arg2=val2) (see rpart for all settable arguments).
The following parameters have defaults:
list (created with rpart.control). Contains parameters for controlling the rpart fit. The default is rpart.control(minbucket=1, xval=3). 'minbucket' is an integer giving the minimum number of observations in any terminal node. 'xval' is an integer defining the number of cross-validations.
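To make the two control parameters above concrete, the following sketch fits a classification tree with rpart.control(minbucket = 1, xval = 3) on simulated data. The data frame and its columns are hypothetical, and the example calls rpart directly rather than going through the package.

```r
library(rpart)
set.seed(1)

# Hypothetical presence/absence data where presence depends on temp
occ <- data.frame(temp = rnorm(300), rain = rnorm(300))
occ$Presence <- as.integer(occ$temp + rnorm(300, sd = 0.5) > 0)

# Documented defaults: terminal nodes may hold a single observation,
# and the tree is cross-validated 3 times
ctrl <- rpart.control(minbucket = 1, xval = 3)

fit <- rpart(Presence ~ temp + rain, data = occ,
             method = "class", control = ctrl)

# The control values are kept on the fitted object
print(fit$control$minbucket)
print(fit$control$xval)
```

A larger minbucket would force coarser terminal nodes, which usually gives a smaller tree.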
Uses the randomForest function from the package 'randomForest'. You can set parameters by supplying rf.args=list(arg1=val1,arg2=val2) (see randomForest for all settable arguments).
The following parameters have defaults:
integer. Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. By default, set to 2500.
integer. Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time). By default, set to 1.
Uses the maxent function from the package 'dismo'. Make sure that you have correctly installed the maxent.jar file in the folder ~\R\library\version\dismo\java, available at https://biodiversityinformatics.amnh.org/open_source/maxent/. As with the other algorithms, you can set parameters by supplying maxent.args=list(arg1=val1,arg2=val2). Mind that arguments are passed from dismo to the MAXENT software again as an argument list (see maxent for more details).
No specific defaults are set with this method.
Uses the nnet function from the package 'nnet'. You can set parameters by supplying ann.args=list(arg1=val1,arg2=val2) (see nnet for all settable arguments).
The following parameters have defaults:
integer. Number of units in the hidden layer. By default, set to 6.
integer. Maximum number of iterations, default 500.
Uses the svm function from the package 'e1071'. You can set parameters by supplying svm.args=list(arg1=val1,arg2=val2) (see svm for all settable arguments).
The following parameters have defaults:
character. Regression/classification type SVM should be used with. By default, set to "eps-regression".
float. Epsilon parameter in the insensitive loss function, default 1e-08.
integer. If an integer value k>0 is specified, a k-fold cross-validation on the training data is performed to assess the quality of the model: the accuracy rate for classification and the Mean Squared Error for regression. By default, set to 3.
character. The kernel used in training and predicting. By default, set to "radial".
numeric. Parameter needed for all kernels, default 1/(length(data) -1).
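The default expression 1/(length(data) -1) above can be read as follows: for a data frame, length() returns the number of columns. The sketch assumes that data holds one response column plus the predictors, which is an assumption about the package's internal table rather than something stated here; under that assumption the default equals one over the number of predictors.

```r
# Hypothetical modelling table: one response column and three predictors
data <- data.frame(
  Presence = c(1, 0, 1, 0),
  temp = c(21.5, 18.2, 23.1, 17.9),
  rain = c(1200, 800, 1500, 760),
  elev = c(150, 900, 80, 1100)
)

# length() of a data frame is its number of columns (4 here)
n_columns <- length(data)

# Default kernel parameter as written in the documentation
gamma_default <- 1 / (length(data) - 1)
print(gamma_default)
```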
Depending on the raster object resolution the process can be more or less time and memory consuming.
M. D'Amen, A. Dubuis, R. F. Fernandes, J. Pottier, L. Pellissier, & A. Guisan (2015) "Using species richness and functional traits prediction to constrain assemblage predictions from stacked species distribution models" Journal of Biogeography 42(7):1255-1266 http://doc.rero.ch/record/235561/files/pel_usr.pdf
M. Barbet-Massin, F. Jiguet, C. H. Albert, & W. Thuiller (2012) "Selecting pseudo-absences for species distribution models: how, where and how many?" Methods in Ecology and Evolution 3:327-338 http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00172.x/full
J.M. Calabrese, G. Certain, C. Kraan, & C.F. Dormann (2014) "Stacking species distribution models and adjusting bias by linking them to macroecological models." Global Ecology and Biogeography 23:99-112 https://onlinelibrary.wiley.com/doi/full/10.1111/geb.12102
M. D. Crisp, S. Laffan, H. P. Linder & A. Monro (2001) "Endemism in the Australian flora" Journal of Biogeography 28:183-198 http://biology-assets.anu.edu.au/hosted_sites/Crisp/pdfs/Crisp2001_endemism.pdf
C. Liu, P. M. Berry, T. P. Dawson, & R. G. Pearson (2005) "Selecting thresholds of occurrence in the prediction of species distributions." Ecography 28:385-393 http://www.researchgate.net/publication/230246974_Selecting_Thresholds_of_Occurrence_in_the_Prediction_of_Species_Distributions
modelling to build simple SDMs.
    ## Not run:
    # Loading data
    data(Env)
    data(Occurrences)
    # SSDM building
    SSDM <- stack_modelling(c('CTA', 'SVM'), Occurrences, Env, rep = 1,
                            Xcol = 'LONGITUDE', Ycol = 'LATITUDE',
                            Spcol = 'SPECIES')
    # Results plotting
    plot(SSDM)
    ## End(Not run)
This is an S4 class to represent SSDMs that assemble multiple algorithms (including generalized linear model, generalized additive model, multivariate adaptive splines, generalized boosted regression model, classification tree analysis, random forest, maximum entropy, artificial neural network, and support vector machines) built for multiple species. It is obtained with stack_modelling or stacking.
name
character. Name of the SSDM (by default 'Species.SSDM').
diversity.map
raster. Local species richness map produced by the SSDM.
endemism.map
raster. Endemism map produced by the SSDM (see Crisp et al. (2001) in references).
uncertainty
raster. Between-algorithm variance map.
evaluation
data frame. Evaluation of the SSDM (AUC, Kappa, omission rate, sensitivity, specificity, proportion of correctly predicted occurrences).
variable.importance
data frame. Relative importance of each variable in the SSDM.
algorithm.correlation
data frame. Between-algorithm correlation matrix.
esdms
list. List of ensemble SDMs used in the SSDM.
parameters
data frame. Parameters used to build the SSDM.
algorithm.evaluation
data frame. Evaluation of the algorithm averaging the metrics of all SDMs (AUC, Kappa, omission rate, sensitivity, specificity, proportion of correctly predicted occurrences).
M. D. Crisp, S. Laffan, H. P. Linder & A. Monro (2001) "Endemism in the Australian flora" Journal of Biogeography 28:183-198 http://biology-assets.anu.edu.au/hosted_sites/Crisp/pdfs/Crisp2001_endemism.pdf
Ensemble.SDM an S4 class to represent ensemble SDMs, and Algorithm.SDM an S4 class to represent SDMs.
This is a function to stack several ensemble SDMs in an SSDM. The function takes as inputs several S4 Ensemble.SDM class objects produced with the ensemble_modelling or ensemble functions. The function returns an S4 Stacked.SDM class object containing the local species richness map, the between-algorithm variance map, the associated evaluation tables (model evaluation, algorithm evaluation, algorithm correlation matrix and variable importance), and a list of ensemble SDMs for each species (see ensemble_modelling).
    stacking(
      esdm, ...,
      name = NULL, method = "pSSDM", rep.B = 1000,
      Env = NULL, range = NULL, endemism = c("WEI", "Binary"),
      eval = TRUE, uncertainty = TRUE, verbose = TRUE, GUI = FALSE
    )

    ## S4 method for signature 'Ensemble.SDM'
    stacking(
      esdm, ...,
      name = NULL, method = "pSSDM", rep.B = 1000,
      Env = NULL, range = NULL, endemism = c("WEI", "Binary"),
      eval = TRUE, uncertainty = TRUE, verbose = TRUE, GUI = FALSE
    )
esdm, ... | character. Ensemble SDMs to be stacked. |
name | character. Optional name given to the final SSDM produced (by default 'Species.SDM'). |
method | character. Define the method used to create the local species richness map (see details below). |
rep.B | integer. If the method used to create the local species richness map is random Bernoulli drawing (Bernoulli), rep.B defines the number of repetitions used to create binary maps for each species. |
Env | raster object. Stacked raster object of environmental variables (can be processed first by |
range | integer. Set a value of range restriction (in pixels) around presence occurrences on habitat suitability maps (all points further away will have a null probability, see Crisp et al. (2001) in references). If NULL, no range restriction will be applied. |
endemism | character. Define the method used to create an endemism map (see details below). |
eval | logical. If set to FALSE, disable stack evaluation. |
uncertainty | logical. If set to TRUE, generates an uncertainty map and an algorithm correlation matrix. |
verbose | logical. If set to TRUE, allows the function to print text in the console. |
GUI | logical. Ignore this argument (it is a parameter for the user interface). |
Methods: Choice of the method used to compute the local species richness map (see Calabrese et al. (2014) and D'Amen et al. (2015) in the references below for more information):
pSSDM: sum probabilities of habitat suitability maps
Bernoulli: draw repeatedly from a Bernoulli distribution
bSSDM: sum the binary maps obtained with the thresholding (depending on the metric of the ESDM)
MaximumLikelihood: adjust species richness of the model by linear regression
PRR.MEM: model richness with a macroecological model (MEM) and adjust each ESDM binary map by ranking habitat suitability and keeping as many species as the richness predicted by the MEM
PRR.pSSDM: model richness with a pSSDM and adjust each ESDM binary map by ranking habitat suitability and keeping as many species as the richness predicted by the pSSDM
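The stacking rules listed above can be sketched on a toy table of habitat suitability values (rows are cells, columns are species). All values, the 0.5 thresholds, and the choice to round the pSSDM richness before ranking are assumptions made for the illustration; the package works on raster maps and may differ in its details.

```r
set.seed(123)

# Hypothetical habitat suitability: 5 cells x 3 species
suit <- matrix(c(0.9, 0.2, 0.6,
                 0.8, 0.7, 0.1,
                 0.3, 0.4, 0.5,
                 0.1, 0.1, 0.2,
                 0.7, 0.9, 0.8),
               nrow = 5, byrow = TRUE,
               dimnames = list(paste0("cell", 1:5), c("sp1", "sp2", "sp3")))

# pSSDM: sum the suitability values of all species in each cell
pssdm <- rowSums(suit)

# bSSDM: threshold each species map, then sum the binary maps
thresholds <- c(sp1 = 0.5, sp2 = 0.5, sp3 = 0.5)  # assumed per-species thresholds
binary <- sweep(suit, 2, thresholds, ">=") * 1
bssdm <- rowSums(binary)

# Bernoulli: draw presences with probability = suitability, rep_B times,
# and average the resulting richness
rep_B <- 1000
draws <- replicate(rep_B, {
  presence <- matrix(rbinom(length(suit), 1, suit), nrow = nrow(suit))
  rowSums(presence)
})
bernoulli <- rowMeans(draws)

# PRR.pSSDM: in each cell keep the k most suitable species,
# where k is the (rounded) richness predicted by the pSSDM
prr <- t(sapply(seq_len(nrow(suit)), function(i) {
  k <- round(pssdm[i])
  as.integer(rank(-suit[i, ], ties.method = "first") <= k)
}))
dimnames(prr) <- dimnames(suit)

print(rbind(pSSDM = pssdm, bSSDM = bssdm, Bernoulli = round(bernoulli, 2)))
print(prr)
```

With many repetitions the Bernoulli richness converges towards the pSSDM sum, while bSSDM depends on the chosen thresholds.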
Endemism: Choice of the method used to compute the endemism map (see Crisp et al. (2001) in the references below for more information):
NULL: No endemism map
WEI: (Weighted Endemism Index) Endemism map built by counting all species in each cell and weighting each by the inverse of its range
CWEI: (Corrected Weighted Endemism Index) Endemism map built by dividing the weighted endemism index by the total count of species in the cell.
The first string of the character vector is the method, either WEI or CWEI; in those cases the second string specifies how the range is calculated, either from the total number of occurrences ('NbOcc') or from the surface of the species' binary distribution map ('Binary').
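A minimal sketch of the two endemism indices described above, computed on a toy binary presence table (rows are cells, columns are species). The data are hypothetical, the range is taken as the number of cells predicted present (the 'Binary' option), and empty cells are given a CWEI of 0 by assumption; the package itself works on raster maps and may scale the result differently.

```r
# Hypothetical binary presence maps: 4 cells x 3 species
binary <- matrix(c(1, 1, 0,
                   1, 0, 0,
                   1, 0, 1,
                   0, 0, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("cell", 1:4), c("sp1", "sp2", "sp3")))

# 'Binary' range: number of cells where each species is predicted present
range_binary <- colSums(binary)

# WEI: count species in each cell, each weighted by the inverse of its range
wei <- as.vector(binary %*% (1 / range_binary))

# CWEI: WEI divided by the species count of the cell
richness <- rowSums(binary)
cwei <- ifelse(richness > 0, wei / richness, 0)  # assumption: 0 for empty cells

print(data.frame(richness = richness, WEI = wei, CWEI = cwei))

# With 'NbOcc', the range would instead come from occurrence counts,
# e.g. a hypothetical vector c(sp1 = 40, sp2 = 5, sp3 = 10)
```

Here sp2 and sp3 each occupy a single cell, so the cells containing them score higher than a cell holding only the widespread sp1.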
an S4 Stacked.SDM class object viewable with the plot.model function.
M. D'Amen, A. Dubuis, R. F. Fernandes, J. Pottier, L. Pellissier, & A. Guisan (2015) "Using species richness and functional traits prediction to constrain assemblage predictions from stacked species distribution models" Journal of Biogeography 42(7):1255-1266 http://doc.rero.ch/record/235561/files/pel_usr.pdf
J.M. Calabrese, G. Certain, C. Kraan, & C.F. Dormann (2014) "Stacking species distribution models and adjusting bias by linking them to macroecological models." Global Ecology and Biogeography 23:99-112 https://onlinelibrary.wiley.com/doi/full/10.1111/geb.12102
M. D. Crisp, S. Laffan, H. P. Linder & A. Monro (2001) "Endemism in the Australian flora" Journal of Biogeography 28:183-198 http://biology-assets.anu.edu.au/hosted_sites/Crisp/pdfs/Crisp2001_endemism.pdf
C. Liu, P. M. Berry, T. P. Dawson, & R. G. Pearson (2005) "Selecting thresholds of occurrence in the prediction of species distributions." Ecography 28:385-393 http://www.researchgate.net/publication/230246974_Selecting_Thresholds_of_Occurrence_in_the_Prediction_of_Species_Distributions
stack_modelling to build SSDMs.
    ## Not run:
    # Loading data
    data(Env)
    data(Occurrences)
    Occ1 <- subset(Occurrences, Occurrences$SPECIES == 'elliptica')
    Occ2 <- subset(Occurrences, Occurrences$SPECIES == 'gracilis')
    # SSDM building
    ESDM1 <- ensemble_modelling(c('CTA', 'SVM'), Occ1, Env, rep = 1,
                                Xcol = 'LONGITUDE', Ycol = 'LATITUDE',
                                name = 'elliptica', ensemble.thresh = c(0.6))
    ESDM2 <- ensemble_modelling(c('CTA', 'SVM'), Occ2, Env, rep = 1,
                                Xcol = 'LONGITUDE', Ycol = 'LATITUDE',
                                name = 'gracilis', ensemble.thresh = c(0.6))
    SSDM <- stacking(ESDM1, ESDM2)
    # Results plotting
    plot(SSDM)
    ## End(Not run)
Update a previous SSDM with new occurrence data. The function takes as inputs updated or new occurrence data from one species, previous environmental variables, and an S4 Stacked.SDM class object containing a previously built SSDM.
    ## S4 method for signature 'Stacked.SDM'
    update(
      object, Occurrences, Env,
      Xcol = "Longitude", Ycol = "Latitude", Pcol = NULL,
      Spname = NULL, name = stack@name,
      save = FALSE, path = getwd(), thresh = 1001, tmp = FALSE,
      verbose = TRUE, GUI = FALSE,
      ...
    )
object | Stacked.SDM. The previously built SSDM. |
Occurrences | data frame. New or updated occurrence table (can be processed first by |
Env | raster object. Environment raster object (can be processed first by |
Xcol | character. Name of the column in the occurrence table containing Longitude or X coordinates. |
Ycol | character. Name of the column in the occurrence table containing Latitude or Y coordinates. |
Pcol | character. Name of the column in the occurrence table specifying whether a line is a presence or an absence. A value of 1 is a presence and a value of 0 is an absence. If NULL, a presence-only dataset is assumed. |
Spname | character. Name of the new or updated species. |
name | character. Optional name given to the final SSDM produced; by default, the name of the previous SSDM. |
save | logical. If set to TRUE, the model is automatically saved. |
path | character. Path to the directory that will contain the saved SSDM. |
thresh | numeric. A single integer value representing the number of equal interval threshold values between 0 and 1. |
tmp | logical. If set to TRUE, the habitat suitability map of each algorithm is saved in a temporary file to release memory. But beware: if you close R, temporary files will be deleted. To avoid any loss you can save your model with |
verbose | logical. If set to TRUE, allows the function to print text in the console. |
GUI | logical. Ignore this argument (it is a parameter for the user interface). |
... | additional parameters for the algorithm modelling function (see details below). |
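The thresh argument above sets how many equally spaced candidate thresholds between 0 and 1 are considered. The sketch below builds that grid for the default value of 1001 and scans it on simulated data. The simulated observations and the selection rule (maximizing sensitivity plus specificity) are assumptions chosen for illustration, not necessarily the criterion the package applies.

```r
set.seed(7)

# Grid of equal-interval thresholds, as defined by thresh
thresh <- 1001
thresholds <- seq(0, 1, length.out = thresh)

# Hypothetical observations and habitat suitability predictions
obs <- rbinom(200, 1, 0.4)
suit <- ifelse(obs == 1, rbeta(200, 4, 2), rbeta(200, 2, 4))

# Sensitivity and specificity of the binary map at each candidate threshold
scores <- t(sapply(thresholds, function(th) {
  pred <- as.integer(suit >= th)
  c(threshold = th,
    sensitivity = sum(pred == 1 & obs == 1) / sum(obs == 1),
    specificity = sum(pred == 0 & obs == 0) / sum(obs == 0))
}))

# Assumed selection rule: maximize sensitivity + specificity
best <- scores[which.max(scores[, "sensitivity"] + scores[, "specificity"]), ]
print(best)
```

With thresh = 1001 the candidate thresholds are spaced 0.001 apart; a smaller value gives a coarser but faster scan.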
an S4 Stacked.SDM class object viewable with the plot.model function.
stack_modelling to build SSDMs.
    ## Not run:
    update(stack, Occurrences, Env, Spname = 'NewSpecie')
    ## End(Not run)