SpaDES simulations
As part of a reproducible workflow, caching of function calls is a
critical component. Down the road, it is likely that an
entire work flow from raw data to publication, decision support, report
writing, presentation building etc., could be built and be reproducible
anywhere, on demand. The reproducible::Cache function is
built to work with any R function. However, it becomes very powerful in
a SpaDES context because we can build large, powerful
applications that are transparent and tied to the raw data that may be
many conceptual steps upstream in the workflow. To do this, we
have built several customizations within the SpaDES
package. Central to this is handling the simList correctly,
because it is an object with a slot that is an environment. More
important still are the various tools that can be used at higher
levels, i.e., not just for “standard” functions.
SpaDES
Some of the details of the simList-specific features of
this Cache function include:
The function converts all elements that have an environment as
part of their attributes into a format that has no unique environment
attribute, using format if a function, and
as.list in the case of the simList
environment.
When used within SpaDES modules, Cache
(capital C) does not require that the argument cachePath be
specified. If called from inside a SpaDES module, Cache
will use the cachePath argument from a call to
cachePath(sim), taking the sim from the call
stack. Similarly, if no cachePath argument is specified,
then it will use getOption("spades.cachePath"), which will,
by default, be a temporary location with no persistence between R
sessions! To persist between sessions, use
SpaDES::setPaths() every session.
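The first point above, replacing environment-bearing elements before they are compared, can be illustrated with base R alone. The sketch below is not the code used inside reproducible or SpaDES.core; makeScaler, f1, f2, e1 and e2 are hypothetical names chosen for illustration. It shows why closures and environments cannot be compared directly, and how format and as.list give an environment-free representation that can.

```r
# Two closures with the same code but different enclosing environments.
# makeScaler is a hypothetical helper used only for this illustration.
makeScaler <- function() {
  function(x) x * 2
}
f1 <- makeScaler()
f2 <- makeScaler()

# Direct comparison also compares the (unique) enclosing environments.
sameClosure <- identical(f1, f2)

# format() returns the function's code as text, with no environment attached.
sameCode <- identical(format(f1), format(f2))

# Two distinct environments holding the same values.
e1 <- new.env()
e2 <- new.env()
assign("a", 1:3, envir = e1)
assign("a", 1:3, envir = e2)

# Environments are compared by reference, so these differ ...
sameEnv <- identical(e1, e2)

# ... but their contents, converted to (sorted) lists, are the same.
sameContents <- identical(as.list(e1, sorted = TRUE), as.list(e2, sorted = TRUE))

print(c(sameClosure = sameClosure, sameCode = sameCode,
        sameEnv = sameEnv, sameContents = sameContents))
```

A cache key built from the raw objects would change on every run even when nothing meaningful had changed, whereas a key built from the converted representations stays stable.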
In a SpaDES context, there are several levels of caching
that can be used as part of a reproducible workflow. Each level can be
used to a modeller’s advantage; and, all can be – and are often – used
concurrently.
spades level
An entire call to spades can be cached. This will have
the effect of eliminating any stochasticity in the model as the output
will simply be the cached version of the simList. This is
likely most useful in situations where reproducibility is more important
than “new” stochasticity (e.g., building decision support
systems, apps, final version of a manuscript).
library(SpaDES.core)
mySim <- simInit(
  times = list(start = 0.0, end = 3.0),
  params = list(
    .globals = list(stackName = "landscape", burnStats = "testStats"),
    randomLandscapes = list(.plotInitialTime = NA),
    fireSpread = list(.plotInitialTime = NA)
  ),
  modules = list("randomLandscapes", "fireSpread"),
  paths = list(modulePath = getSampleModules(tempdir()))
)
This functionality can be achieved within a spades
call.
# compare caching ... run once to create cache
system.time({
  outSim <- spades(Copy(mySim), cache = TRUE, notOlderThan = Sys.time())
})
## Sep17 08:48:59 simInit Using setDTthreads(1). To change: 'options(spades.DTthreads = X)'.
## Sep17 08:48:59 chckpn:init total elpsd: 0.3 secs | 0 checkpoint init 0
## Sep17 08:48:59 save  :init total elpsd: 0.31 secs | 0 save init 0
## Sep17 08:48:59 prgrss:init total elpsd: 0.31 secs | 0 progress init 0
## Sep17 08:48:59 load  :init total elpsd: 0.31 secs | 0 load init 0
## Sep17 08:48:59 rndmLn:init total elpsd: 0.32 secs | 0 randomLandscapes init 1
## Sep17 08:49:01 rndmLn:init New objects created:
## Sep17 08:49:01 rndmLn:init <char>
## Sep17 08:49:01 rndmLn:init  1:  landscape
## Sep17 08:49:01 frSprd:init total elpsd: 2.8 secs | 0 fireSpread init 1
## Sep17 08:49:01 frSprd:init fireSpread
## Sep17 08:49:01 frSprd:init New objects created:
## Sep17 08:49:01 frSprd:init <char>
## Sep17 08:49:01 frSprd:init  1:  testStats
## Sep17 08:49:01 frSprd:burn total elpsd: 2.8 secs | 1 fireSpread burn 5
## Sep17 08:49:01 frSprd:stats total elpsd: 2.9 secs | 1 fireSpread stats 5
## Sep17 08:49:01 frSprd:stats fireSpread
## Sep17 08:49:01 frSprd:burn total elpsd: 2.9 secs | 2 fireSpread burn 5
## Sep17 08:49:01 frSprd:stats total elpsd: 2.9 secs | 2 fireSpread stats 5
## Sep17 08:49:01 frSprd:stats fireSpread
## Sep17 08:49:01 frSprd:burn total elpsd: 2.9 secs | 3 fireSpread burn 5
## Sep17 08:49:01 frSprd:stats total elpsd: 3 secs | 3 fireSpread stats 5
## Sep17 08:49:01 frSprd:stats fireSpread
## simList saved in
## SpaDES.core:::savedSimEnv()$.sim
## It will be deleted at next spades() call.
## Saving large object (fn: spades , cacheId: f0413b6e4b5b05a7 ) to Cache : 10.3
##   Mb
##  Done!
## Saved! Cache file: f0413b6e4b5b05a7.rds; fn: spades
##    user  system elapsed
##    0.59    0.02    5.11
Note that if there were any visualizations (here we turned them off
with .plotInitialTime = NA above) they will happen the
first time through, but not the cached times.
## Object to retrieve (fn: spades, f0413b6e4b5b05a7.rds) ...
## Loaded! Cached result from previous spades call
## from  module
##    user  system elapsed
##    0.19    0.00    0.87
## [1] "Names: 2 string mismatches"
## [2] "Length mismatch: comparison on first 3 components"
## [3] "Component 2: Modes: numeric, NULL"
## [4] "Component 2: Lengths: 4, 0"
## [5] "Component 2: target is numeric, current is NULL"
## [6] "Component 3: target is NULL, current is PackedSpatRaster"
If the parameter .useCache in the module’s metadata is
set to TRUE, then every event in the module will
be cached. That means that every time that module is called from within
a spades() call, Cache will be called. Only
the objects inside the simList that correspond to the
inputObjects or the outputObjects from the
module metadata will be assessed for caching. For general use,
module-level caching would be mostly useful for modules that have no
stochasticity, such as data-preparation modules, GIS modules etc.
In this example, we will use the cache on the
randomLandscapes module. This means that each subsequent
call to spades will result in identical outputs from the
randomLandscapes module (only!). This would be useful when
only one random landscape is needed simply for trying something out, or
putting into production code (e.g., publication, decision
support, etc.).
# Module-level
params(mySim)$randomLandscapes$.useCache <- TRUE
system.time({
  randomSim <- spades(Copy(mySim), .plotInitialTime = NA,
                      notOlderThan = Sys.time(), debug = TRUE)
})
## Sep17 08:49:05 simInit Using setDTthreads(1). To change: 'options(spades.DTthreads = X)'.
## Sep17 08:49:05 chckpn:init eventTime moduleName eventType eventPriority
## Sep17 08:49:05 chckpn:init 0         checkpoint init      0
## Sep17 08:49:05 save  :init 0         save       init      0
## Sep17 08:49:05 prgrss:init 0         progress   init      0
## Sep17 08:49:05 load  :init 0         load       init      0
## Sep17 08:49:05 rndmLn:init 0         randomLandscapes init      1
## Sep17 08:49:05 rndmLn:init Saving large object (fn: doEvent.randomLandscapes , cacheId: 668f5a92969f2fff )
## Sep17 08:49:05 rndmLn:init   to Cache : 10.2 Mb
##  Done!
##
## Sep17 08:49:06 rndmLn:init Saved! Cache file: 668f5a92969f2fff.rds; fn: doEvent.randomLandscapes
## Sep17 08:49:06 rndmLn:init New objects created:
## Sep17 08:49:06 rndmLn:init <char>
## Sep17 08:49:06 rndmLn:init  1:  landscape
## Sep17 08:49:06 frSprd:init 0         fireSpread       init      1
## Sep17 08:49:06 frSprd:init fireSpread
## Sep17 08:49:06 frSprd:init New objects created:
## Sep17 08:49:06 frSprd:init <char>
## Sep17 08:49:06 frSprd:init  1:  testStats
## Sep17 08:49:06 frSprd:burn 1         fireSpread       burn      5
## Sep17 08:49:06 frSprd:stats 1         fireSpread       stats     5
## Sep17 08:49:06 frSprd:stats fireSpread
## Sep17 08:49:06 frSprd:burn 2         fireSpread       burn      5
## Sep17 08:49:06 frSprd:stats 2         fireSpread       stats     5
## Sep17 08:49:06 frSprd:stats fireSpread
## Sep17 08:49:06 frSprd:burn 3         fireSpread       burn      5
## Sep17 08:49:06 frSprd:stats 3         fireSpread       stats     5
## Sep17 08:49:06 frSprd:stats fireSpread
## simList saved in
## SpaDES.core:::savedSimEnv()$.sim
## It will be deleted at next spades() call.
##    user  system elapsed
##    0.19    0.00    1.50
# faster the second time
system.time({
  randomSimCached <- spades(Copy(mySim), .plotInitialTime = NA, debug = TRUE)
})
## Sep17 08:49:07 simInit Using setDTthreads(1). To change: 'options(spades.DTthreads = X)'.
## Sep17 08:49:07 chckpn:init eventTime moduleName eventType eventPriority
## Sep17 08:49:07 chckpn:init 0         checkpoint init      0
## Sep17 08:49:07 save  :init 0         save       init      0
## Sep17 08:49:07 prgrss:init 0         progress   init      0
## Sep17 08:49:07 load  :init 0         load       init      0
## Sep17 08:49:07 rndmLn:init 0         randomLandscapes init      1
## Sep17 08:49:07 rndmLn:init Object to retrieve (fn: doEvent.randomLandscapes, 668f5a92969f2fff.rds) ...
## Sep17 08:49:07 rndmLn:init Loaded! Cached result from previous doEvent.randomLandscapes call
## Sep17 08:49:07 rndmLn:init for init event in randomLandscapes module
## Sep17 08:49:07 rndmLn:init randomLandscapes
## Sep17 08:49:07 rndmLn:init New objects created:
## Sep17 08:49:07 rndmLn:init <char>
## Sep17 08:49:07 rndmLn:init  1:  landscape
## Sep17 08:49:07 frSprd:init 0         fireSpread       init      1
## Sep17 08:49:07 frSprd:init fireSpread
## Sep17 08:49:07 frSprd:init New objects created:
## Sep17 08:49:07 frSprd:init <char>
## Sep17 08:49:07 frSprd:init  1:  testStats
## Sep17 08:49:07 frSprd:burn 1         fireSpread       burn      5
## Sep17 08:49:07 frSprd:stats 1         fireSpread       stats     5
## Sep17 08:49:07 frSprd:stats fireSpread
## Sep17 08:49:07 frSprd:burn 2         fireSpread       burn      5
## Sep17 08:49:07 frSprd:stats 2         fireSpread       stats     5
## Sep17 08:49:07 frSprd:stats fireSpread
## Sep17 08:49:07 frSprd:burn 3         fireSpread       burn      5
## Sep17 08:49:07 frSprd:stats 3         fireSpread       stats     5
## Sep17 08:49:07 frSprd:stats fireSpread
## simList saved in
## SpaDES.core:::savedSimEnv()$.sim
## It will be deleted at next spades() call.
##    user  system elapsed
##    0.18    0.00    0.77
Test that only layers produced in randomLandscapes are
identical, not fireSpread.
layers <- list("DEM", "forestAge", "habitatQuality", "percentPine", "Fires")
same <- lapply(layers, function(l) {
  identical(randomSim$landscape[[l]], randomSimCached$landscape[[l]])
})
names(same) <- layers
print(same) # Fires is not the same because the non-init events in fireSpread are not cached
## $DEM
## [1] TRUE
## 
## $forestAge
## [1] TRUE
## 
## $habitatQuality
## [1] TRUE
## 
## $percentPine
## [1] TRUE
## 
## $Fires
## [1] FALSE
If the parameter .useCache in the module’s metadata is
set to a character or character vector, then that or those
event(s), identified by their name, will be cached. That means that
every time the event is called from within a spades call,
Cache will be called. Only the objects inside the
simList that correspond to the inputObjects or
the outputObjects as defined in the module metadata will be
assessed for caching inputs or outputs, respectively. The fact that all
and only the named inputObjects and
outputObjects are cached and returned may be inefficient
(i.e., it may cache more objects than are necessary) for
individual events.
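The rule described so far for .useCache (TRUE caches every event in the module, a character vector caches only the named events) can be written as a small decision function. This is a minimal sketch of the rule as stated in the text, not the logic SpaDES.core actually runs; isEventCached is a hypothetical helper and not part of any package.

```r
# Hypothetical helper, not part of SpaDES.core: returns TRUE when an event
# of type `eventType` would be cached under a given `.useCache` value.
isEventCached <- function(useCache, eventType) {
  if (is.logical(useCache)) {
    # TRUE caches every event in the module; FALSE (or NA) caches none
    isTRUE(useCache)
  } else if (is.character(useCache)) {
    # a character vector caches only the named events
    eventType %in% useCache
  } else {
    FALSE
  }
}

isEventCached(TRUE, "burn")                # module-level: all events
isEventCached("init", "init")              # event-level: named event
isEventCached("init", "burn")              # event-level: other events run normally
isEventCached(c("init", "stats"), "stats") # several named events
```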
Similar to module-level caching, event-level caching would be mostly
useful for events that have no stochasticity, such as data-preparation
events, GIS events etc. Here, we don’t change the module-level caching
for randomLandscapes, but we add to it a cache for only the
“init” event for fireSpread.
params(mySim)$fireSpread$.useCache <- "init"
system.time({
  randomSim <- spades(Copy(mySim), .plotInitialTime = NA,
                      notOlderThan = Sys.time(), debug = TRUE)
})
## Sep17 08:49:08 simInit Using setDTthreads(1). To change: 'options(spades.DTthreads = X)'.
## Sep17 08:49:08 chckpn:init eventTime moduleName eventType eventPriority
## Sep17 08:49:08 chckpn:init 0         checkpoint init      0
## Sep17 08:49:08 save  :init 0         save       init      0
## Sep17 08:49:08 prgrss:init 0         progress   init      0
## Sep17 08:49:08 load  :init 0         load       init      0
## Sep17 08:49:08 rndmLn:init 0         randomLandscapes init      1
## Sep17 08:49:08 rndmLn:init Saving large object (fn: doEvent.randomLandscapes , cacheId: 668f5a92969f2fff )
## Sep17 08:49:08 rndmLn:init   to Cache : 10.2 Mb
##  Done!
##
## Sep17 08:49:09 rndmLn:init Saved! Cache file: 668f5a92969f2fff.rds; fn: doEvent.randomLandscapes
## Sep17 08:49:09 rndmLn:init New objects created:
## Sep17 08:49:09 rndmLn:init <char>
## Sep17 08:49:09 rndmLn:init  1:  landscape
## Sep17 08:49:09 frSprd:init 0         fireSpread       init      1
## Sep17 08:49:09 frSprd:init Saving large object (fn: doEvent.fireSpread , cacheId: d140bfb5b6dad8f2 ) to
## Sep17 08:49:09 frSprd:init   Cache : 10.3 Mb
##  Done!
##
## Sep17 08:49:10 frSprd:init Saved! Cache file: d140bfb5b6dad8f2.rds; fn: doEvent.fireSpread
## Sep17 08:49:10 frSprd:init New objects created:
## Sep17 08:49:10 frSprd:init <char>
## Sep17 08:49:10 frSprd:init  1:  testStats
## Sep17 08:49:10 frSprd:burn 1         fireSpread       burn      5
## Sep17 08:49:10 frSprd:stats 1         fireSpread       stats     5
## Sep17 08:49:10 frSprd:stats fireSpread
## Sep17 08:49:10 frSprd:burn 2         fireSpread       burn      5
## Sep17 08:49:10 frSprd:stats 2         fireSpread       stats     5
## Sep17 08:49:10 frSprd:stats fireSpread
## Sep17 08:49:10 frSprd:burn 3         fireSpread       burn      5
## Sep17 08:49:10 frSprd:stats 3         fireSpread       stats     5
## Sep17 08:49:10 frSprd:stats fireSpread
## simList saved in
## SpaDES.core:::savedSimEnv()$.sim
## It will be deleted at next spades() call.
##    user  system elapsed
##    0.05    0.00    2.51
# faster the second time
system.time({
  randomSimCached <- spades(Copy(mySim), .plotInitialTime = NA, debug = TRUE)
})
## Sep17 08:49:11 simInit Using setDTthreads(1). To change: 'options(spades.DTthreads = X)'.
## Sep17 08:49:11 chckpn:init eventTime moduleName eventType eventPriority
## Sep17 08:49:11 chckpn:init 0         checkpoint init      0
## Sep17 08:49:11 save  :init 0         save       init      0
## Sep17 08:49:11 prgrss:init 0         progress   init      0
## Sep17 08:49:11 load  :init 0         load       init      0
## Sep17 08:49:11 rndmLn:init 0         randomLandscapes init      1
## Sep17 08:49:11 rndmLn:init Object to retrieve (fn: doEvent.randomLandscapes, 668f5a92969f2fff.rds) ...
## Sep17 08:49:11 rndmLn:init Loaded! Cached result from previous doEvent.randomLandscapes call
## Sep17 08:49:11 rndmLn:init for init event in randomLandscapes module
## Sep17 08:49:11 rndmLn:init randomLandscapes
## Sep17 08:49:11 rndmLn:init New objects created:
## Sep17 08:49:11 rndmLn:init <char>
## Sep17 08:49:11 rndmLn:init  1:  landscape
## Sep17 08:49:11 frSprd:init 0         fireSpread       init      1
## Sep17 08:49:11 frSprd:init Object to retrieve (fn: doEvent.fireSpread, d140bfb5b6dad8f2.rds) ...
## Sep17 08:49:11 frSprd:init Loaded! Cached result from previous doEvent.fireSpread call
## Sep17 08:49:11 frSprd:init for init event in fireSpread module
## Sep17 08:49:11 frSprd:init fireSpread
## Sep17 08:49:11 frSprd:burn 1         fireSpread       burn      5
## Sep17 08:49:11 frSprd:stats 1         fireSpread       stats     5
## Sep17 08:49:11 frSprd:stats fireSpread
## Sep17 08:49:11 frSprd:stats New objects created:
## Sep17 08:49:11 frSprd:stats <char>
## Sep17 08:49:11 frSprd:stats  1:  testStats
## Sep17 08:49:11 frSprd:burn 2         fireSpread       burn      5
## Sep17 08:49:11 frSprd:stats 2         fireSpread       stats     5
## Sep17 08:49:12 frSprd:stats fireSpread
## Sep17 08:49:12 frSprd:burn 3         fireSpread       burn      5
## Sep17 08:49:12 frSprd:stats 3         fireSpread       stats     5
## Sep17 08:49:12 frSprd:stats fireSpread
## simList saved in
## SpaDES.core:::savedSimEnv()$.sim
## It will be deleted at next spades() call.
##    user  system elapsed
##    0.06    0.02    1.04
Any function can be cached using:
Cache(FUN = functionName, ...).
This will be a slight change to a function call, such as:
projectRaster(raster, crs = crs(newRaster)) to
Cache(projectRaster, raster, crs = crs(newRaster)).
ras <- terra::rast(terra::ext(0, 1e3, 0, 1e3), res = 1, vals = 1)
system.time({
  map <- Cache(SpaDES.tools::neutralLandscapeMap(ras),
               cachePath = cachePath(mySim),
               userTags = "neutralLandscapeMap",
               notOlderThan = Sys.time())
})
## Warning: In (SpaDES.tools::neutralLandscapeMap(ras))(): nlm_mpd changes the
## dimensions of the RasterLayer if even ncols/nrows are choosen.
## Saving large object (fn: SpaDES.tools::neutralLandscapeMap , cacheId:
##   94d035af43fc613d ) to Cache : 17.1 Mb
##  Done!
## Saved! Cache file: 94d035af43fc613d.rds; fn: SpaDES.tools::neutralLandscapeMap
##    user  system elapsed
##    0.22    0.06    2.53
# faster the second time
system.time({
  mapCached <- Cache(SpaDES.tools::neutralLandscapeMap(ras),
                     cachePath = cachePath(mySim),
                     userTags = "neutralLandscapeMap")
})
## Object to retrieve (fn: SpaDES.tools::neutralLandscapeMap,
##   94d035af43fc613d.rds) ...
## Loaded! Cached result from previous SpaDES.tools::neutralLandscapeMap call
##    user  system elapsed
##    0.19    0.00    0.56
## NOTE: can't use all.equal on SpatRaster (they are pointers); use compareGeom()
all.equal(map[], mapCached[])
## [1] TRUE
Since the cache is simply a DBI database table, all
DBI functions will work as is. In addition, there are
several helpers in the reproducible package, including
showCache, keepCache and
clearCache, and the more advanced createCache,
loadFromCache, rmFromCache, and
saveToCache that may be useful. Also, one can access cached
items manually (rather than simply rerunning the same Cache
function again).
cacheDB <- showCache(cachePath(mySim), userTags = "neutralLandscapeMap")
## Cache size:
## Total (including Rasters): 4.3 Mb
## Selected objects (not including Rasters): 4.3 Mb
## get the RasterLayer that was produced with neutralLandscapeMap()
map <- loadFromCache(cacheId = cacheDB$cacheId, cachePath = cachePath(mySim))
## Loaded! Cached result from previous  call
In general, we feel that liberal use of Cache will
make a workflow reusable and reproducible. shiny apps can
also take advantage of Cache. Indeed, much of the
difficulty in managing data sets and saving them for future use can be
handled by caching.
simInit() --> many .inputObjects calls
spades() call --> many module calls --> many event calls --> many function calls
Let's say we start to introduce caching to this structure. We start
from the innermost functions where we could imagine caching would be
useful. Let's say there are some GIS operations, like
raster::projectRaster, which operates on an input
raster. We can Cache the projectRaster call to make this
much faster, since it will always return the same result for a given input
raster.
If we look back at our structure above, we see that we still have
many places that are not cached. That means that the
spades() call will still spawn many module calls and many
event calls just to reach the one Cache(projectRaster)
call that is cached. This function will likely be called many times.
Caching it is good, but Cache itself takes some
time. So, even if Cache(projectRaster) takes only
0.02 seconds, calling it 200 times adds about 4 seconds. If we
are doing this for many functions, then this will be too slow for some
purposes.
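The cost of repeated lookups can be made concrete with a toy memoiser that only counts how often the cache is consulted. This is a sketch under stated assumptions: toyCache is not reproducible::Cache, and innerStep, runEvent and runModule are hypothetical stand-ins for a slow GIS function, an event and a module. With inner caching only, a repeated run still performs one lookup per inner call; once the outer call is cached too, a repeated run needs a single lookup.

```r
# Toy memoiser to count cache lookups. This is NOT reproducible::Cache;
# toyCache, innerStep, runEvent and runModule are hypothetical names.
store <- new.env()   # holds cached results, keyed by the deparsed call
counter <- new.env() # holds the number of cache lookups performed
counter$lookups <- 0L

toyCache <- function(FUN, ...) {
  counter$lookups <- counter$lookups + 1L
  key <- paste(deparse(list(FUN, ...)), collapse = "")
  if (is.null(store[[key]])) {
    store[[key]] <- FUN(...)
  }
  store[[key]]
}

innerStep <- function(x) x^2                       # stands in for a slow GIS call
runEvent <- function(x) toyCache(innerStep, x) + 1 # an "event" with a cached inner call
runModule <- function(xs) sum(vapply(xs, runEvent, numeric(1)))

# Inner caching only: a repeated run still does one lookup per inner call.
invisible(runModule(1:200))
counter$lookups <- 0L
invisible(runModule(1:200))
lookupsInnerOnly <- counter$lookups

# Caching the outer call as well: a repeated run needs a single lookup.
invisible(toyCache(runModule, 1:200))
counter$lookups <- 0L
invisible(toyCache(runModule, 1:200))
lookupsNested <- counter$lookups

print(c(innerOnly = lookupsInnerOnly, nested = lookupsNested))
```

At 0.02 seconds per lookup, the 200 inner lookups correspond to the roughly 4 seconds mentioned above, while the nested version pays for one lookup only.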
We can start putting Cache all the way up the sequence of calls.
Unfortunately, the way we use Cache at each of these levels is a bit
different, so we need a slightly different approach for each.
spades call
spades(cache = TRUE)
This will cache the spades call, causing
stochasticity/randomness to be frozen.
Pass .useCache = TRUE as a parameter to the module,
during the simInit
Some modules are inherently non-random, such as GIS modules, or parameter fitting statistical modules. We expect these to be identical results each time, so we can safely cache the entire module.
parameters = list(
  FireModule = list(.useCache = TRUE)
)
mySim <- simInit(..., params = parameters)
mySimOut <- spades(mySim)
The messaging should indicate that caching is happening on every event in that module.
Note: This option REQUIRES that the metadata in inputs
and outputs be exactly correct, i.e., all inputObjects and
outputObjects must be correctly identified and listed in
the defineModule metadata.
If the module is cached and there are errors when it is
run, the cause is almost certainly that the
inputObjects and outputObjects are incorrectly
specified.
Cache(<functionName>, <other arguments>)
This will allow fine scale control of individual function calls.
Once nested Caching is used all the way up to the
experiment (see SpaDES.experiment package)
level and even further up (e.g., if there is a shiny
module), then even very complex models can be put into a complete
workflow.
The current vision for SpaDES is that it will allow this
type of “data to decisions” complete workflow that allows for deep,
robust models, across disciplines, with easily accessible front ends,
that are quick and responsive to users, yet can handle data changes,
module changes, etc.