Start with the necessary packages and seed for the vignette.
loadedPackages <- c("envi", "RStoolbox", "sf", "spatstat", "terra")
invisible(lapply(loadedPackages, library, character.only = TRUE))
set.seed(1234) # for reproducibilityWe use the gorillas data and the accompanying covariate
data in gorillas.extra from the spatstat.data
package on
CRAN. These data are locations of nesting sites of gorillas in the
Kagwene Gorilla Sanctuary in Cameroon. A detailed description and
analysis of the data are reported in Funwi-Gabga and Mateu
(2012). The authors used a kernel density-based smoothing technique
to detect hot spots of nesting in the park. Here, we use another kernel
density-based smoothing technique to detect hot spots of nesting within
the covariate information (i.e., the gorilla ecological niche) and then
predict where these hot spots are located within the park.
We start by importing the two covariate data of class
im:
Center and scale the covariate data.
Convert the covariate data to class SpatRaster.
Add appropriate marks to the gorillas data from
spatstat.data package. These points are considered our
“presence” locations.
presence <- unmark(gorillas)
marks(presence) <- data.frame(
  "presence" = rep(1, presence$n),
  "lon" = presence$x,
  "lat" = presence$y
)
marks(presence)$slopeangle <- slopeangle[presence]
marks(presence)$waterdist <- waterdist[presence]Randomly draw points from the study area and add the appropriate marks. These points are considered our “(pseudo-)absence” locations.
absence <- rpoispp(0.00004, win = slopeangle)
marks(absence) <- data.frame(
  "presence" = rep(0, absence$n),
  "lon" = absence$x,
  "lat" = absence$y
)
marks(absence)$slopeangle <- slopeangle[absence]
marks(absence)$waterdist <- waterdist[absence]Combine the presence (n = 647) and absence (769) locations into one
object of class data.frame and reorder the features
required for the lrren function in the envi
package:
obs_locs <- superimpose(absence, presence, check = FALSE)
marks(obs_locs)$presence <- as.factor(marks(obs_locs)$presence)
plot.ppp(
  obs_locs,
  which.marks = "presence",
  main = "Gorilla nesting sites (red-colored)\nPseudo-absence locations (blue-colored)",
  cols = c("#0000CD","#8B3A3A"),
  pch = 1,
  axes = TRUE,
  ann = TRUE
)
obs_locs <- marks(obs_locs)
obs_locs$id <- seq(1, nrow(obs_locs), 1)
obs_locs <- obs_locs[ , c(6, 2, 3, 1, 4, 5)]Extract the prediction locations within the study area from one of the covariates.
Run the lrren function within the envi
package. We use the default settings, except we want to predict the
ecological niche within the study area (predict = TRUE), so
we conduct k-fold cross-validation model fit diagnostics
(cv = TRUE) by undersampling absence locations to balance
the prevalence (0.5) within all testing data sets
(balance = TRUE).
start_time <- Sys.time() # record start time
test_lrren <- lrren(
  obs_locs = obs_locs,
  predict_locs = predict_locs,
  predict = TRUE,
  cv = TRUE,
  balance = TRUE
)
end_time <- Sys.time() # record end time
lrren_time <- end_time - start_time # calculate duration of lrren() exampleA single run of the lrren function took approximately
4.3 seconds on a machine with the features listed at the end of the
vignette.
We display the estimated ecological niche within a space of Covariate
1 by Covariate 2 using the plot_obs function. We use the
default two-tailed alpha level (alpha = 0.05) and the
default colors where the yellow color denotes areas with covariate data
combinations where we have sparse observations. As expected, extreme
values of the log relative risk surface are located near the edges of
the surface; however, these areas are highly variable and are not
statistically significant based on an asymptotic normal assumption. The
default color key for the log relative risk surface hides the
heterogeneity closer to the null value (zero). Therefore, we limit the
color key for the log relative risk surface to (-1, 1).
We observe two areas of positive log relative risk, and the center of these areas are statistically significant, suggesting two ecological niches of gorilla nesting sites compared to pseudo-absence points drawn randomly from within the park (based on only two covariates, slope gradient and distance from water). One niche is a combination of flat to moderate (about 10 - 30 degrees) slope gradient and moderate to far (about 200 - 400 meters) distance from water, which may correspond to the top of ridges. The second niche is a combination of moderate (30 - 40 degrees) slope gradient and moderate (about 100 - 200 meters) distance from water, which may correspond to within valleys.
We display the estimated ecological niche predicted to the study area
within geographic space using the plot_predict function. We
use the default two-tailed alpha level (alpha = 0.05) and
the default colors where the yellow color denotes areas with covariate
data combinations where we have sparse observations. We limit the color
key for the log relative risk prediction to (-1, 1).
plot_predict(
  test_lrren, 
  cref0 = "EPSG:32632",
  cref1 = "EPSG:4326",
  lower_lrr = -1,
  upper_lrr = 1
)The two estimated ecological niches are located in many small regions throughout the park, reflected by the large spatial heterogeneity in the two covariates. For example, the tops of ridges are the farthest distance from water and are located in many areas throughout the park. Importantly, gorilla nesting sites were not observed in many of these areas, but this prediction represents areas with combinations of covariates that match (or are similar) to the combinations occurring at observed locations.
This is an example of a scale mismatch. The scale of the gorilla nesting site presence is a broad, elliptical-shaped area in the northwest region of the park. Spatial interpolation of the gorilla nesting sites (i.e., not considering covariate information) can be seen in Figure 5a within the original study by Funwi-Gabga and Mateu (2012). The two covariates (slope gradient and distance from water) vary considerably within this area, and we observe a full range of covariate values within the larger gorilla presence area. Our approach is a version of environmental interpolation and considers covariate information when predicting the spatial distribution of gorilla nesting sites. When the gorilla nesting sites are arranged in covariate space, the smoothing bandwidth of our kernel density-based approach creates a smoother relative risk surface than the covariate surfaces themselves because the gorilla nesting sites are clustering in particular combinations of the two covariates (i.e., ecological niche) that are scattered throughout the park in geographic space.
We display the internal 10-fold cross-validation diagnostics using
the plot_cv function. We use the default two-tailed alpha
level (alpha = 0.05) and our prevalence is fairly balanced
at 0.4569209.
The log relative risk estimate accurately predicts about 60% of the gorilla nesting sites, which is better than chance (50%) but not a large improvement. The pseudo-absence locations are drawn at random throughout the park and are located in areas estimated with higher log relative risk values, which reduces the prediction performance of the model. Using observed absences instead of pseudo-absences may improve the prediction of the spatial distribution of gorilla nesting sites.
The choice in covariates is critical for ecological niche models and
especially for lrren because we are limited to two
covariates. The goal, if possible, is to discover covariates that
separate presence locations (i.e., nesting sites) from absence
locations. Because our pseudo-absence locations are randomly drawn
within the park it will be challenging to completely separate presence
and absence locations in covariate space. One approach to include more
than two covariates is to conduct an orthogonal linear transformation on
multiple covariates, such as a Principal Component Analysis (PCA). Here,
we can use the first two components of a PCA of all seven available
covariates in the gorillas.extra data in the
spatstat.data package, which include:
We can use the rasterPCA function within the
RStoolbox package to conduct a PCA of multiple spatial data
layers. We start by centering and scaling the numeric-valued layers. We
also center the factor-valued layer ‘aspect’ at “North” and we group
flatter slope types together in the factor-valued layer ‘slope type.’
NOTE: Using categorical (discrete) variables in PCA is requires more
consideration than demonstrated in this vignette (see: Kolenikov &
Angeles (2009) for more details).
aspect <- gorillas.extra$aspect # class factor
elevation <- gorillas.extra$elevation # class integer
heat <- gorillas.extra$heat # class factor
slopeangle <- gorillas.extra$slopeangle # class numeric
slopetype <- gorillas.extra$slopetype # class factor
vegetation <- gorillas.extra$vegetation # class factor
waterdist <- gorillas.extra$waterdist # class numeric
# Center and scale numeric
elevation$v <- scale(elevation$v)
slopeangle$v <- scale(slopeangle$v)
waterdist$v <- scale(waterdist$v)
# Create rasters
aspect <- rast(aspect)
names(aspect) <- 'aspect'
elevation <- rast(elevation)
names(elevation) <- 'elevation'
heat <- rast(heat)
names(heat) <- 'heat'
slopeangle <- rast(slopeangle)
names(slopeangle) <- 'slopeangle'
slopetype <- rast(slopetype)
names(slopetype) <- 'slopetype'
vegetation <- rast(vegetation)
names(vegetation) <- 'vegetation'
waterdist <- rast(waterdist)
names(waterdist) <- 'waterdist'
# Reorder aspect to center by "N" instead of "S"
values(aspect) <- factor(
  values(aspect),
  levels = c("5", "6", "7", "8", "1", "2", "3", "4")
) %>% as.numeric()
# Reorder slope types to order flatter types next to each other
values(slopetype) <- factor(
  values(slopetype),
  levels = c("1", "6", "3", "2", "4", "5")
) %>% as.numeric()
# Convert factors to numeric
values(heat) <- as.numeric(values(heat))
values(vegetation) <- as.numeric(values(vegetation))
# Stack of SpatRasters
park <- c(aspect, elevation, heat, slopeangle, slopetype, vegetation, waterdist)
# Principal Component Analysis
pca_park <- rasterPCA(park)
summary(pca_park$model) # PCA components## Importance of components:
##                           Comp.1    Comp.2    Comp.3     Comp.4     Comp.5
## Standard deviation     2.3124709 1.2328961 1.1067706 0.99305828 0.93245847
## Proportion of Variance 0.4952146 0.1407647 0.1134374 0.09132515 0.08051929
## Cumulative Proportion  0.4952146 0.6359794 0.7494168 0.84074191 0.92126120
##                           Comp.6     Comp.7
## Standard deviation     0.7603668 0.52162697
## Proportion of Variance 0.0535411 0.02519771
## Cumulative Proportion  0.9748023 1.00000000The first two components of the PCA explain almost 64% of the variation across the seven covariates.
## 
## Loadings:
##            Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
## aspect      0.995                                          
## elevation          0.654               -0.139  0.741       
## heat                                                  0.997
## slopeangle                0.225  0.959        -0.137       
## slopetype          0.134 -0.967  0.207                     
## vegetation         0.634        -0.166 -0.406 -0.631       
## waterdist          0.382                0.898 -0.173       
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
## SS loadings     1.000  1.000  1.000  1.000  1.000  1.000  1.000
## Proportion Var  0.143  0.143  0.143  0.143  0.143  0.143  0.143
## Cumulative Var  0.143  0.286  0.429  0.571  0.714  0.857  1.000The loadings of the first component is almost entirely the centered ‘aspect’ variable. The loadings of the second component is a combination of centered ‘elevation,’ ‘vegetation type,’ centered ‘distance from water,’ and centered ‘slope type.’
# Temporary stopgap for converting 'SpatRaster' to 'im'
## Modification of `as.im.RasterLayer` from `maptools` because `maptools` is retiring in 2023
as.im.SpatRaster <- function(from, factor.col.name = NULL) {
  rs <- terra::res(from)
  orig <- sf::st_bbox(from)[1:2] + 0.5 * rs
  dm <- dim(from)[2:1]
  xx <- unname(orig[1] + cumsum(c(0, rep(rs[1], dm[1] - 1))))
  yy <- unname(orig[2] + cumsum(c(0, rep(rs[2], dm[2] - 1))))
  val <- terra::values(from)
  if (is.factor(from)) {
    lev <- levels(from)[[1]]
    if (!is.null(factor.col.name)) {
      if (factor.col.name %in% colnames(lev)) {
        factor.col <- which(colnames(lev) == factor.col.name)
      }
      else {
        stop("'factor.col.name' is not a column name of the SpatRaster 'from'")
      }
    }
    else {
      factor.col <- length(lev)
    }
    val <- factor(val, levels = lev$ID, labels = lev[[factor.col]])
  }
  dim(val) <- dm
  val <- spatstat.geom::transmat(val, from = list(x = "-i", 
                                                  y = "j"), to = "spatstat")
  im <- spatstat.geom::im(val, xcol = xx, yrow = yy)
  return(im)
}
# Extract Bands from PCA
pca_bands <- pca_park$map
pc1 <- pca_bands[[1]] # PC1
pc2 <- pca_bands[[2]] # PC2
pc1 <- as.im.SpatRaster(pc1) # convert to class 'im'
pc2 <- as.im.SpatRaster(pc2) # convert to class 'im'
plot.im(
  pc1,
  main = 'Principal Component 1\nprimarily aspect (centered at "North")',
  ann = TRUE,
  axes = TRUE
)
plot.im(
  pc2,
  main = 'Principal Component 2\ncombination of elevation, vegetation type,\ndistance from water, and slope type',
  ann = TRUE,
  axes = TRUE
)For the first component (centered aspect), we can observe northern (i.e., North, North East, North West) aspects are primarily located in the northern section of the park. For the second component (combination), we can see a large cluster of middle loading values (between 0 and 2) in the center of the park surrounded by lower loading values (< 0) and a few smaller clusters of high loading values (> 2) and low loading values (< -2) within the large middle loading value cluster.
We prepare inputs for a new lrren run similar to the
first example above. For consistency, we also reset the random number
generator to select similar pseudo-absence locations as the example
above.
set.seed(1234) # for similar locations as above example
presence <- unmark(gorillas)
marks(presence) <- data.frame(
  "presence" = rep(1, presence$n),
  "lon" = presence$x,
  "lat" = presence$y
)
marks(presence)$pc1 <- pc1[presence]
marks(presence)$pc2 <- pc2[presence]
absence <- rpoispp(0.00004, win = gorillas.extra$aspect)
marks(absence) <- data.frame(
  "presence" = rep(0, absence$n),
  "lon" = absence$x,
  "lat" = absence$y
)
marks(absence)$pc1 <- pc1[absence]
marks(absence)$pc2 <- pc2[absence]
obs_locs <- superimpose(absence, presence, check = FALSE)
marks(obs_locs)$presence <- as.factor(marks(obs_locs)$presence)
obs_locs <- marks(obs_locs)
obs_locs$id <- seq(1, nrow(obs_locs), 1)
obs_locs <- obs_locs[ , c(6, 2, 3, 1, 4, 5)]
predict_xy <- crds(pca_bands[[1]])
predict_locs <- as.data.frame(predict_xy)
predict_locs$pc1 <- extract(pca_bands[[1]], predict_xy)[ , 1]
predict_locs$pc2 <- extract(pca_bands[[2]], predict_xy)[ , 1]With the two principal components, we re-run the lrren
function.
start_time <- Sys.time() # record start time
pca_lrren <- lrren(
  obs_locs = obs_locs,
  predict_locs = predict_locs,
  predict = TRUE,
  cv = TRUE,
  balance = TRUE
)
end_time <- Sys.time() # record end time
pca_time <- end_time - start_time # calculate duration of lrren() exampleA single run of the lrren function took approximately
5.6 seconds on a machine with the features listed at the end of the
vignette.
We display the estimated ecological niche within a space of Principal
Component 1 by Principal Component 2 using the plot_obs
function. We use the default two-tailed alpha level
(alpha = 0.05) and the default colors where the yellow
color denotes areas with covariate data combinations where we have
sparse observations. We limit the color key for the log relative risk
surface to (-1, 1).
We display the estimated ecological niche predicted to the study area
within geographic space using the plot_predict function. We
use the default two-tailed alpha level (alpha = 0.05), and
we limit the color key for the log relative risk prediction to (-1,
1).
We display the internal 10-fold cross-validation diagnostics using
the plot_cv function. We use the default two-tailed alpha
level (alpha = 0.05).
Based on only the first two components of the Principal Component Analysis of seven covariates, we detected one ecological niche of gorillas compared to pseudo-absence points drawn randomly from within the park. Presence and absences appear separated more by Principal Component 2 (loading values between 1 and 4) than Principal Component 1. This is reflected in geographic space where the gorilla niche is located in an area similar to the large cluster of middle loading values of Principal Component 2, which is located in the central, northwestern section of the park. The log relative risk estimate accurately predicted about 75% of the gorilla nesting sites, which is markedly improved from our first example above (60%).
The two components captured about 64% of the variation across the seven covariates, which may not be important in separating presence locations from absence locations within covariate space. Further studies can assess different combinations of principal components (e.g., Principal Component 2 and Principal Component 3) or include another yet unavailable covariate that may be ecologically important for gorillas.
The original study by Funwi-Gabga and Mateu (2012) used kernel density estimation techniques in geographic space (see: Figure 5a in reference). Our method applies a kernel density estimation technique in covariate space and shows results similar to the inhomogeneous spatial point process model results of the original study (see: Figure 10a in reference), which incorporates the same covariate information to predict the spatial distribution of gorilla nesting sites. Our approach predicted more nesting sites in the western section of the park than in the original study.
Now we perform a sensitivity analysis of our ecological niche model.
For example, let us assume the investigators observed nesting sites at a
distance, and there is some spatial uncertainty in the exact coordinates
of the nesting sites. Let us also assume there is more uncertainty about
where the nesting sites of the ‘major’ gorilla group were located
(within 100 meters) than the ‘minor’ gorilla group (within 50 meters).
We can examine the influence of this type of uncertainty on our
estimated ecological niche model and its predicted spatial distribution
in the park. We assume there is no spatial uncertainty in the spatial
location of pseudo-absence points (0.1 meters). The three groups must be
categorized within a new feature named levels within the
observation data.
We start by preparing the observation data. Here, the data must be a marked planar point pattern of class ‘ppp’.
# Preserve 'group' feature from 'gorillas' data within {spatstat.data}
## Assign as a new mark named 'level'
marks(presence)$levels <- marks(gorillas)$group
# Assign a third 'level' for the pseudo-absences
marks(absence)$levels <- "none"
# Combine
obs_locs <- superimpose(absence, presence, check = FALSE)
# Set variables as class 'factor'
marks(obs_locs)$presence <- as.factor(marks(obs_locs)$presence)
marks(obs_locs)$levels <- as.factor(marks(obs_locs)$levels)
# Create 'id' feature
marks(obs_locs)$id <- seq(1, obs_locs$n, 1)
# Reorder and drop the two covariate features
marks(obs_locs) <- marks(obs_locs)[ , c(7, 2, 3, 1, 6)]The two covariate values will be assigned to all points in every iteration and are drawn from a list of objects of class ‘im’. Here, we use the same variables as above (slope gradient and distance from water), centered and scaled.
Run the perlrren function within the envi
package. We use the default settings and 10 simulated iterations. The
radii argument is a string of numeric values in the order
of the levels feature of class ‘factor’ within the
obs_locs object.
n_sim <- 100
start_time <- Sys.time() # record start time
test_perlrren <- perlrren(
  obs_ppp = obs_locs,
  covariates = ims,
  radii = c(100, 50, 0.1),
  n_sim = n_sim
)
end_time <- Sys.time() # record end time
perlrren_time <- end_time - start_time # calculate duration of perlrren() exampleA single (non-parallel) run of the perlrren function
with 100 iterations took approximately 1.9 minutes on a machine with the
features listed at the end of the vignette. In practice, a larger number
of simulated iterations would be recommended (e.g., n = 10,000).
We display the summary statistics from the sensitivity analysis with
the plot_perturb function. Here, we use the default
settings, except we limit the color key for the log relative risk
surface and log relative risk prediction to (-1, 1). We only display the
plots pertaining to the proportion of iterations with a significant
p-value, both in covariate space and predicted into geographic space
(i.e., the park).
plot_perturb(
  test_perlrren,
  cov_labs = c("Component 1", "Component 2"),
  cref0 = "EPSG:32632",
  cref1 = "EPSG:4326",
  lower_lrr = -1,
  upper_lrr = 1
)The uncertainty in the spatial coordinates of gorilla nesting sites greatly influences the estimated ecological niche of gorillas (based on the two principal components) compared to pseudo-absence points drawn randomly from within the park. The significant hot spot (i.e., ecological niche) within the covariate space diminished in size when adding this type of spatial uncertainty to the model. The geographic location of areas the model cannot distinguish as suitable or unsuitable for gorilla nesting locations is in the center of the park or the southern edge of the secondary forest with a southern facing aspect.
## R version 4.5.1 (2025-06-13 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
## 
## Matrix products: default
##   LAPACK version 3.12.1
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] terra_1.8-60           spatstat_3.4-0         spatstat.linnet_3.3-1 
##  [4] spatstat.model_3.4-0   rpart_4.1.24           spatstat.explore_3.5-2
##  [7] nlme_3.1-168           spatstat.random_3.4-1  spatstat.geom_3.5-0   
## [10] spatstat.univar_3.1-4  spatstat.data_3.1-8    sf_1.0-21             
## [13] RStoolbox_1.0.2.1      envi_1.0.1             knitr_1.50            
## 
## loaded via a namespace (and not attached):
##   [1] DBI_1.2.3             pROC_1.18.5           deldir_2.0-4         
##   [4] tcltk_4.5.1           rlang_1.1.6           magrittr_2.0.3       
##   [7] e1071_1.7-16          compiler_4.5.1        mgcv_1.9-3           
##  [10] vctrs_0.6.5           maps_3.4.3            reshape2_1.4.4       
##  [13] cvAUC_1.1.4           stringr_1.5.1         pkgconfig_2.0.3      
##  [16] fastmap_1.2.0         rmarkdown_2.29        prodlim_2025.04.28   
##  [19] purrr_1.1.0           xfun_0.52             cachem_1.1.0         
##  [22] jsonlite_2.0.0        goftest_1.2-3         recipes_1.3.1        
##  [25] spatstat.utils_3.1-5  parallel_4.5.1        R6_2.6.1             
##  [28] bslib_0.9.0           stringi_1.8.7         RColorBrewer_1.1-3   
##  [31] parallelly_1.45.1     lubridate_1.9.4       jquerylib_0.1.4      
##  [34] Rcpp_1.1.0            iterators_1.0.14      tensor_1.5.1         
##  [37] future.apply_1.20.0   fields_16.3.1         Matrix_1.7-3         
##  [40] splines_4.5.1         nnet_7.3-20           timechange_0.3.0     
##  [43] tidyselect_1.2.1      rstudioapi_0.17.1     abind_1.4-8          
##  [46] yaml_2.3.10           timeDate_4041.110     doParallel_1.0.17    
##  [49] codetools_0.2-20      curl_6.4.0            misc3d_0.9-1         
##  [52] listenv_0.9.1         doRNG_1.8.6.2         lattice_0.22-7       
##  [55] tibble_3.3.0          plyr_1.8.9            withr_3.0.2          
##  [58] ROCR_1.0-11           evaluate_1.0.4        future_1.67.0        
##  [61] survival_3.8-3        units_0.8-7           proxy_0.4-27         
##  [64] polyclip_1.10-7       exactextractr_0.10.0  pillar_1.11.0        
##  [67] rngtools_1.5.2        KernSmooth_2.23-26    stats4_4.5.1         
##  [70] foreach_1.5.2         generics_0.1.4        sp_2.2-0             
##  [73] ggplot2_3.5.2         scales_1.4.0          globals_0.18.0       
##  [76] class_7.3-23          glue_1.8.0            tools_4.5.1          
##  [79] data.table_1.17.8     ModelMetrics_1.2.2.2  gower_1.0.2          
##  [82] sparr_2.3-16          dotCall64_1.2         XML_3.99-0.18        
##  [85] grid_4.5.1            Cairo_1.6-2           tidyr_1.3.1          
##  [88] ipred_0.9-15          raster_3.6-32         cli_3.6.5            
##  [91] spatstat.sparse_3.1-0 spam_2.11-1           viridisLite_0.4.2    
##  [94] lava_1.8.1            doFuture_1.1.2        concaveman_1.1.0     
##  [97] dplyr_1.1.4           V8_6.0.6              gtable_0.3.6         
## [100] pls_2.8-5             sass_0.4.10           digest_0.6.37        
## [103] classInt_0.4-11       caret_7.0-1           farver_2.1.2         
## [106] htmltools_0.5.8.1     lifecycle_1.0.4       hardhat_1.4.1        
## [109] MASS_7.3-65