| Version: | 0.5.4 |
| Title: | Simulation Framework |
| Date: | 2021-10-11 |
| Depends: | R (≥ 3.0.0), Rcpp (≥ 0.8.6), lattice, parallel |
| Imports: | methods, stats4 |
| LinkingTo: | Rcpp |
| Description: | A general framework for statistical simulation, which allows researchers to make use of a wide range of simulation designs with minimal programming effort. The package provides functionality for drawing samples from a distribution or a finite population, for adding outliers and missing values, as well as for visualization of the simulation results. It follows a clear object-oriented design and supports parallel computing to increase computational performance. |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| LazyLoad: | yes |
| Author: | Andreas Alfons [aut, cre], Yves Tille [ctb] (original R code of certain sampling algorithms), Alina Matei [ctb] (original R code of certain sampling algorithms) |
| Maintainer: | Andreas Alfons <alfons@ese.eur.nl> |
| Encoding: | UTF-8 |
| NeedsCompilation: | yes |
| Packaged: | 2021-10-11 10:19:15 UTC; andreas |
| Repository: | CRAN |
| Date/Publication: | 2021-10-14 11:10:02 UTC |
Simulation Framework
Description
A general framework for statistical simulation, which allows researchers to make use of a wide range of simulation designs with minimal programming effort. The package provides functionality for drawing samples from a distribution or a finite population, for adding outliers and missing values, as well as for visualization of the simulation results. It follows a clear object-oriented design and supports parallel computing to increase computational performance.
Details
The DESCRIPTION file:
| Package: | simFrame |
| Version: | 0.5.4 |
| Title: | Simulation Framework |
| Date: | 2021-10-11 |
| Depends: | R (>= 3.0.0), Rcpp (>= 0.8.6), lattice, parallel |
| Imports: | methods, stats4 |
| LinkingTo: | Rcpp |
| Description: | A general framework for statistical simulation, which allows researchers to make use of a wide range of simulation designs with minimal programming effort. The package provides functionality for drawing samples from a distribution or a finite population, for adding outliers and missing values, as well as for visualization of the simulation results. It follows a clear object-oriented design and supports parallel computing to increase computational performance. |
| License: | GPL (>= 2) |
| LazyLoad: | yes |
| Authors@R: | c(person("Andreas", "Alfons", email = "alfons@ese.eur.nl", role = c("aut", "cre")), person("Yves", "Tille", role = "ctb", comment = "original R code of certain sampling algorithms"), person("Alina", "Matei", role = "ctb", comment = "original R code of certain sampling algorithms")) |
| Author: | Andreas Alfons [aut, cre], Yves Tille [ctb] (original R code of certain sampling algorithms), Alina Matei [ctb] (original R code of certain sampling algorithms) |
| Maintainer: | Andreas Alfons <alfons@ese.eur.nl> |
| Encoding: | UTF-8 |
Index of help topics:
BasicVector-class Class "BasicVector"
ContControl Create contamination control objects
ContControl-class Class "ContControl"
DARContControl-class Class "DARContControl"
DCARContControl-class Class "DCARContControl"
DataControl-class Class "DataControl"
NAControl-class Class "NAControl"
NumericMatrix-class Class "NumericMatrix"
OptBasicVector-class Class "OptBasicVector"
OptCall-class Class "OptCall"
OptCharacter-class Class "OptCharacter"
OptContControl-class Class "OptContControl"
OptDataControl-class Class "OptDataControl"
OptNAControl-class Class "OptNAControl"
OptNumeric-class Class "OptNumeric"
OptSampleControl-class
Class "OptSampleControl"
SampleControl-class Class "SampleControl"
SampleSetup-class Class "SampleSetup"
SimControl-class Class "SimControl"
SimResults-class Class "SimResults"
Strata-class Class "Strata"
SummarySampleSetup-class
Class "SummarySampleSetup"
TwoStageControl-class Class "TwoStageControl"
VirtualContControl-class
Class "VirtualContControl"
VirtualDataControl-class
Class "VirtualDataControl"
VirtualNAControl-class
Class "VirtualNAControl"
VirtualSampleControl-class
Class "VirtualSampleControl"
aggregate-methods Method for aggregating simulation results
clusterRunSimulation Run a simulation experiment on a cluster
clusterSetup Set up multiple samples on a cluster
contaminate Contaminate data
draw Draw a sample
eusilcP Synthetic EU-SILC data
generate Generate data
getAdd Accessor and mutator functions for objects
getStrataLegend Utility functions for stratifying data
head-methods Methods for returning the first parts of an
object
inclusionProb Inclusion probabilities
length-methods Methods for getting the length of an object
plot-methods Plot simulation results
runSimulation Run a simulation experiment
setNA Set missing values
setup Set up multiple samples
simApply Apply a function to subsets
simBwplot Box-and-whisker plots
simDensityplot Kernel density plots
simFrame-package Simulation Framework
simSample Set up multiple samples
simXyplot X-Y plots
srs Random sampling
stratify Stratify data
summary-methods Methods for producing a summary of an object
tail-methods Methods for returning the last parts of an
object
Author(s)
Andreas Alfons [aut, cre]; C++ implementations of certain sampling algorithms are based on R code by Yves Tille and Alina Matei.
Maintainer: Andreas Alfons <alfons@ese.eur.nl>
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
Class "BasicVector"
Description
Virtual class used internally for convenience.
Objects from the Class
A virtual Class: No objects may be created from it.
Extends
Class "OptBasicVector", directly.
Methods
getStrataLegend signature(x = "data.frame", design = "BasicVector"): get a data.frame describing the strata.
getStrataSplit signature(x = "data.frame", design = "BasicVector"): get a list in which each element contains the indices of the observations belonging to the corresponding stratum.
getStrataTable signature(x = "data.frame", design = "BasicVector"): get a data.frame describing the strata and containing the stratum sizes.
getStratumSizes signature(x = "data.frame", design = "BasicVector"): get the stratum sizes.
getStratumValues signature(x = "data.frame", design = "BasicVector", split = "missing"): get the stratum number for each observation.
getStratumValues signature(x = "data.frame", design = "BasicVector", split = "list"): get the stratum number for each observation.
simApply signature(x = "data.frame", design = "BasicVector", fun = "function"): apply a function to subsets.
simSapply signature(x = "data.frame", design = "BasicVector", fun = "function"): apply a function to subsets.
stratify signature(x = "data.frame", design = "BasicVector"): stratify data.
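The quantities returned by the stratum utilities listed above can be illustrated with a small base R sketch. This is not the package's implementation. The toy data frame and the variable names (strata_split, strata_sizes, strata_values) are assumptions made for illustration only.

```r
# Base R illustration of stratum indices, sizes and numbers.
# Hypothetical toy data; not simFrame code.
x <- data.frame(region = c("A", "B", "A", "C", "B", "A"), value = 1:6)

# List with the indices of the observations in each stratum
# (compare the description of getStrataSplit)
strata_split <- split(seq_len(nrow(x)), x$region)

# Number of observations per stratum (compare getStratumSizes)
strata_sizes <- lengths(strata_split)

# Stratum number for each observation (compare getStratumValues)
strata_values <- as.integer(factor(x$region))
```

The strata are numbered here in the sorted order of the values of region; whether the package uses the same ordering is not stated in this section.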
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
Examples
showClass("BasicVector")
Create contamination control objects
Description
Create objects of a class inheriting from "ContControl".
Usage
ContControl(..., type = c("DCAR", "DAR"))
Arguments
... |
arguments passed to |
type |
a character string specifying whether a control object of class
|
Value
If type = "DCAR", an object of class "DCARContControl".
If type = "DAR", an object of class "DARContControl".
Note
This constructor exists mainly for backward compatibility with early draft
versions of simFrame.
Author(s)
Andreas Alfons
See Also
"DCARContControl", "DARContControl",
"ContControl"
Examples
## distributed completely at random
data(eusilcP)
sam <- draw(eusilcP[, c("id", "eqIncome")], size = 20)
dcarc <- ContControl(target = "eqIncome", epsilon = 0.05,
    dots = list(mean = 5e+05, sd = 10000), type = "DCAR")
contaminate(sam, dcarc)
## distributed at random
foo <- generate(size = 10, distribution = rnorm,
    dots = list(mean = 0, sd = 2))
darc <- ContControl(target = "V1", epsilon = 0.2,
    fun = function(x) x * 100, type = "DAR")
contaminate(foo, darc)
Class "ContControl"
Description
Virtual class for controlling contamination in a simulation experiment (used internally).
Objects from the Class
A virtual Class: No objects may be created from it.
Slots
target: Object of class "OptCharacter"; a character vector specifying the variables (columns) to be contaminated, or NULL to contaminate all variables (except the additional ones generated internally).
epsilon: Object of class "numeric" giving the contamination levels.
grouping: Object of class "character" specifying a grouping variable (column) to be used for contaminating whole groups rather than individual observations.
aux: Object of class "character" specifying an auxiliary variable (column) whose values are used as probability weights for selecting the items (observations or groups) to be contaminated.
Extends
Class "VirtualContControl", directly.
Class "OptContControl", by class "VirtualContControl",
distance 2.
Accessor and mutator methods
In addition to the accessor and mutator methods for the slots inherited from
"VirtualContControl", the following are available:
getGrouping signature(x = "ContControl"): get slot grouping.
setGrouping signature(x = "ContControl"): set slot grouping.
getAux signature(x = "ContControl"): get slot aux.
setAux signature(x = "ContControl"): set slot aux.
Methods
In addition to the methods inherited from
"VirtualContControl", the following are available:
contaminate signature(x = "data.frame", control = "ContControl"): contaminate data.
show signature(object = "ContControl"): print the object on the R console.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Note
The slot grouping was named group prior to version 0.2.
Renaming the slot was necessary since accessor and mutator functions were
introduced in this version and a function named getGroup already
exists.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
"DCARContControl", "DARContControl",
"VirtualContControl", contaminate
Examples
showClass("ContControl")
Class "DARContControl"
Description
Class for controlling contamination in a simulation experiment. The values of the contaminated observations will be distributed at random (DAR), i.e., they will depend on the original values.
Objects from the Class
Objects can be created by calls of the form
new("DARContControl", ...), DARContControl(...) or
ContControl(..., type="DAR").
Slots
target: Object of class "OptCharacter"; a character vector specifying the variables (columns) to be contaminated, or NULL to contaminate all variables (except the additional ones generated internally).
epsilon: Object of class "numeric" giving the contamination levels.
grouping: Object of class "character" specifying a grouping variable (column) to be used for contaminating whole groups rather than individual observations.
aux: Object of class "character" specifying an auxiliary variable (column) whose values are used as probability weights for selecting the items (observations or groups) to be contaminated.
fun: Object of class "function" generating the values of the contamination data. The original values of the observations to be contaminated will be passed as its first argument. Furthermore, it should return an object that can be coerced to a data.frame, containing the contamination data.
dots: Object of class "list" containing additional arguments to be passed to fun.
Extends
Class "ContControl", directly.
Class "VirtualContControl", by class "ContControl", distance 2.
Class "OptContControl", by class "ContControl", distance 3.
Details
With this control class, contamination is modeled as a two-step process. The
first step is to select observations to be contaminated, the second is to
model the distribution of the outliers. In this case, the original values
will be modified by the function given by slot fun, i.e., values of
the contaminated observations will depend on the original values.
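The two-step process described above can be sketched in base R as follows. This is an illustration only, not the package's internal code. The rounding rule for the number of contaminated observations and the names n_cont, idx and contaminated are assumptions.

```r
# Base R sketch of DAR contamination; not simFrame's implementation.
set.seed(1)
x <- data.frame(V1 = rnorm(10, mean = 0, sd = 2))
epsilon <- 0.2                  # contamination level
fun <- function(x) x * 100      # plays the role of slot 'fun'

# Step 1: select the observations to be contaminated
# (assumption: the count is epsilon * n, rounded)
n_cont <- round(epsilon * nrow(x))
idx <- sample(nrow(x), n_cont)

# Step 2: the outliers are a function of the original values
contaminated <- x
contaminated$V1[idx] <- fun(x$V1[idx])
```

Because the new values are computed from x$V1[idx], they depend on the original values, which is what distinguishes DAR from DCAR contamination.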
Accessor and mutator methods
In addition to the accessor and mutator methods for the slots inherited from
"ContControl", the following are available:
getFun signature(x = "DARContControl"): get slot fun.
setFun signature(x = "DARContControl"): set slot fun.
getDots signature(x = "DARContControl"): get slot dots.
setDots signature(x = "DARContControl"): set slot dots.
Methods
Methods are inherited from "ContControl".
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Note
The slot grouping was named group prior to version 0.2.
Renaming the slot was necessary since accessor and mutator functions were
introduced in this version and a function named getGroup already
exists.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
Alfons, A., Templ, M. and Filzmoser, P. (2010) Contamination Models in the R Package simFrame for Statistical Simulation. In Aivazian, S., Filzmoser, P. and Kharin, Y. (editors) Computer Data Analysis and Modeling: Complex Stochastic Data and Systems, volume 2, 178–181. Minsk. ISBN 978-985-476-848-9.
Béguin, C. and Hulliger, B. (2008) The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology, 34(1), 91–103.
Hulliger, B. and Schoch, T. (2009) Robust Multivariate Imputation with Survey Data. 57th Session of the International Statistical Institute, Durban.
See Also
"DCARContControl", "ContControl",
"VirtualContControl", contaminate
Examples
foo <- generate(size = 10, distribution = rnorm,
    dots = list(mean = 0, sd = 2))
cc <- DARContControl(target = "V1",
    epsilon = 0.2, fun = function(x) x * 100)
contaminate(foo, cc)
Class "DCARContControl"
Description
Class for controlling contamination in a simulation experiment. The values of the contaminated observations will be distributed completely at random (DCAR), i.e., they will not depend on the original values.
Objects from the Class
Objects can be created by calls of the form
new("DCARContControl", ...), DCARContControl(...) or
ContControl(..., type="DCAR") (the latter exists mainly for backward
compatibility with early draft versions of simFrame).
Slots
target: Object of class "OptCharacter"; a character vector specifying the variables (columns) to be contaminated, or NULL to contaminate all variables (except the additional ones generated internally).
epsilon: Object of class "numeric" giving the contamination levels.
grouping: Object of class "character" specifying a grouping variable (column) to be used for contaminating whole groups rather than individual observations (the same values are used for all observations in the same group).
aux: Object of class "character" specifying an auxiliary variable (column) whose values are used as probability weights for selecting the items (observations or groups) to be contaminated.
distribution: Object of class "function" generating the values of the contamination data, e.g., rnorm (the default) or rmvnorm from package mvtnorm. It should take a non-negative integer as its first argument, giving the number of items to be created, and return an object that can be coerced to a data.frame, containing the contamination data.
dots: Object of class "list" containing additional arguments to be passed to distribution.
Extends
Class "ContControl", directly.
Class "VirtualContControl", by class "ContControl", distance 2.
Class "OptContControl", by class "ContControl", distance 3.
Details
With this control class, contamination is modeled as a two-step process. The
first step is to select observations to be contaminated, the second is to
model the distribution of the outliers. In this case, the values of the
contaminated observations will be generated by the function given by slot
distribution and will not depend on the original values.
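A base R sketch of this two-step process is given below. It is an illustration under stated assumptions rather than the package's internal code: the toy income data, the rounding rule for the number of contaminated observations and the names n_cont, idx and new_values are hypothetical.

```r
# Base R sketch of DCAR contamination; not simFrame's implementation.
set.seed(2)
x <- data.frame(eqIncome = rlnorm(20, meanlog = 10))
epsilon <- 0.05                             # contamination level
distribution <- rnorm                       # role of slot 'distribution'
dots <- list(mean = 5e+05, sd = 10000)      # role of slot 'dots'

# Step 1: select the observations to be contaminated
# (assumption: the count is epsilon * n, rounded)
n_cont <- round(epsilon * nrow(x))
idx <- sample(nrow(x), n_cont)

# Step 2: draw the outliers without looking at the original values;
# the number of items is passed as the first argument
new_values <- do.call(distribution, c(list(n_cont), dots))
contaminated <- x
contaminated$eqIncome[idx] <- new_values
```

Unlike in the DAR case, x$eqIncome[idx] is never read in the second step.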
Accessor and mutator methods
In addition to the accessor and mutator methods for the slots inherited from
"ContControl", the following are available:
getDistribution signature(x = "DCARContControl"): get slot distribution.
setDistribution signature(x = "DCARContControl"): set slot distribution.
getDots signature(x = "DCARContControl"): get slot dots.
setDots signature(x = "DCARContControl"): set slot dots.
Methods
Methods are inherited from "ContControl".
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Note
The slot grouping was named group prior to version 0.2.
Renaming the slot was necessary since accessor and mutator functions were
introduced in this version and a function named getGroup already
exists.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
Alfons, A., Templ, M. and Filzmoser, P. (2010) Contamination Models in the R Package simFrame for Statistical Simulation. In Aivazian, S., Filzmoser, P. and Kharin, Y. (editors) Computer Data Analysis and Modeling: Complex Stochastic Data and Systems, volume 2, 178–181. Minsk. ISBN 978-985-476-848-9.
Béguin, C. and Hulliger, B. (2008) The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology, 34(1), 91–103.
Hulliger, B. and Schoch, T. (2009) Robust Multivariate Imputation with Survey Data. 57th Session of the International Statistical Institute, Durban.
See Also
"DARContControl", "ContControl",
"VirtualContControl", contaminate
Examples
data(eusilcP)
sam <- draw(eusilcP[, c("id", "eqIncome")], size = 20)
cc <- DCARContControl(target = "eqIncome", epsilon = 0.05,
    dots = list(mean = 5e+05, sd = 10000))
contaminate(sam, cc)
Class "DataControl"
Description
Class for controlling model-based generation of data.
Objects from the Class
Objects can be created by calls of the form new("DataControl", ...) or
DataControl(...).
Slots
size: Object of class "numeric" giving the number of observations to be generated.
distribution: Object of class "function" generating the data, e.g., rnorm (the default) or rmvnorm from package mvtnorm. It should take a positive integer as its first argument, giving the number of observations to be generated, and return an object that can be coerced to a data.frame.
dots: Object of class "list" containing additional arguments to be passed to distribution.
colnames: Object of class "OptCharacter"; a character vector to be used as column names for the generated data.frame, or NULL.
Extends
Class "VirtualDataControl", directly.
Class "OptDataControl", by class "VirtualDataControl", distance 2.
Accessor and mutator methods
getSize signature(x = "DataControl"): get slot size.
setSize signature(x = "DataControl"): set slot size.
getDistribution signature(x = "DataControl"): get slot distribution.
setDistribution signature(x = "DataControl"): set slot distribution.
getDots signature(x = "DataControl"): get slot dots.
setDots signature(x = "DataControl"): set slot dots.
getColnames signature(x = "DataControl"): get slot colnames.
setColnames signature(x = "DataControl"): set slot colnames.
Methods
In addition to the methods inherited from
"VirtualDataControl", the following are available:
generate signature(control = "DataControl"): generate data.
show signature(object = "DataControl"): print the object on the R console.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
"VirtualDataControl", generate
Examples
dc <- DataControl(size = 10, distribution = rnorm,
    dots = list(mean = 0, sd = 2))
generate(dc)
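The requirements on the distribution slot (number of observations as first argument, return value coercible to a data.frame) can also be met by a user-written function. The following base R sketch defines such a function; rbivnorm and its argument rho are hypothetical names introduced for illustration and are not part of simFrame.

```r
# Hypothetical generator for correlated bivariate normal data.
# First argument: number of observations; returns a two-column matrix,
# which can be coerced to a data.frame.
rbivnorm <- function(n, rho = 0.5) {
  z1 <- rnorm(n)
  z2 <- rnorm(n)
  cbind(z1, rho * z1 + sqrt(1 - rho^2) * z2)
}

set.seed(3)
out <- as.data.frame(rbivnorm(10, rho = 0.8))
names(out) <- c("x", "y")

# With simFrame attached, such a function could presumably be supplied as
#   DataControl(size = 10, distribution = rbivnorm,
#       dots = list(rho = 0.8), colnames = c("x", "y"))
# (based on the slot descriptions above; not run here).
```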
Class "NAControl"
Description
Class for controlling the insertion of missing values in a simulation experiment.
Objects from the Class
Objects can be created by calls of the form new("NAControl", ...) or
NAControl(...).
Slots
target: Object of class "OptCharacter"; a character vector specifying the variables (columns) in which missing values should be inserted, or NULL to insert missing values in all variables (except the additional ones generated internally).
NArate: Object of class "NumericMatrix" giving the missing value rates, which may be selected individually for the target variables. In case of a vector, the same missing value rates are used for all target variables. In case of a matrix, on the other hand, the missing value rates to be used for each target variable are given by the respective column.
grouping: Object of class "character" specifying a grouping variable (column) to be used for setting whole groups to NA rather than individual values.
aux: Object of class "character" specifying auxiliary variables (columns) whose values are used as probability weights for selecting the values to be set to NA in the respective target variables. If only one variable (column) is specified, it is used for all target variables.
intoContamination: Object of class "logical" indicating whether missing values should also be inserted into contaminated observations. The default is to insert missing values only into non-contaminated observations.
Extends
Class "VirtualNAControl", directly.
Class "OptNAControl", by class "VirtualNAControl",
distance 2.
Accessor and mutator methods
In addition to the accessor and mutator methods for the slots inherited from
"VirtualNAControl", the following are available:
getGrouping signature(x = "NAControl"): get slot grouping.
setGrouping signature(x = "NAControl"): set slot grouping.
getAux signature(x = "NAControl"): get slot aux.
setAux signature(x = "NAControl"): set slot aux.
getIntoContamination signature(x = "NAControl"): get slot intoContamination.
setIntoContamination signature(x = "NAControl"): set slot intoContamination.
Methods
In addition to the methods inherited from
"VirtualNAControl", the following are available:
setNA signature(x = "data.frame", control = "NAControl"): set missing values.
show signature(object = "NAControl"): print the object on the R console.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Note
Since version 0.3, this control class allows an auxiliary variable with probability weights to be specified for each target variable.
The slot grouping was named group prior to version 0.2.
Renaming the slot was necessary since accessor and mutator functions were
introduced in this version and a function named getGroup already
exists.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
Examples
data(eusilcP)
eusilcP$age[eusilcP$age < 0] <- 0  # this actually occurs
sam <- draw(eusilcP[, c("id", "age", "eqIncome")], size = 20)
## missing completely at random
mcarc <- NAControl(target = "eqIncome", NArate = 0.2)
setNA(sam, mcarc)
## missing at random
marc <- NAControl(target = "eqIncome", NArate = 0.2, aux = "age")
setNA(sam, marc)
## missing not at random
mnarc <- NAControl(target = "eqIncome",
    NArate = 0.2, aux = "eqIncome")
setNA(sam, mnarc)
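The difference between the three mechanisms in the examples above lies only in the probability weights used to select the values that are set to NA. A base R sketch of this idea follows. It is not the package's implementation; the toy data, the rounding of the number of missing values and the names n_na, mcar, mar and mnar are assumptions.

```r
# Base R sketch of selecting values to be set to NA; not simFrame code.
set.seed(4)
sam <- data.frame(age = sample(18:80, 20, replace = TRUE),
                  eqIncome = rlnorm(20, meanlog = 10))
NArate <- 0.2
n_na <- round(NArate * nrow(sam))   # assumed rounding rule

# Missing completely at random: equal selection probabilities
mcar <- sam
mcar$eqIncome[sample(nrow(sam), n_na)] <- NA

# Missing at random: weights taken from another observed variable
mar <- sam
mar$eqIncome[sample(nrow(sam), n_na, prob = sam$age)] <- NA

# Missing not at random: weights taken from the target variable itself
mnar <- sam
mnar$eqIncome[sample(nrow(sam), n_na, prob = sam$eqIncome)] <- NA
```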
Class "NumericMatrix"
Description
Virtual class used internally for convenience.
Objects from the Class
A virtual Class: No objects may be created from it.
Methods
No methods defined with class "NumericMatrix" in the signature.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
Examples
showClass("NumericMatrix")
Class "OptBasicVector"
Description
Virtual class used internally for convenience.
Objects from the Class
A virtual Class: No objects may be created from it.
Methods
No methods defined with class "OptBasicVector" in the signature.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
Examples
showClass("OptBasicVector")
Class "OptCall"
Description
Virtual class used internally for convenience.
Objects from the Class
A virtual Class: No objects may be created from it.
Methods
No methods defined with class "OptCall" in the signature.
Author(s)
Andreas Alfons
Examples
showClass("OptCall")
Class "OptCharacter"
Description
Virtual class used internally for convenience.
Objects from the Class
A virtual Class: No objects may be created from it.
Methods
No methods defined with class "OptCharacter" in the signature.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
Examples
showClass("OptCharacter")
Class "OptContControl"
Description
Virtual class used internally for convenience.
Objects from the Class
A virtual Class: No objects may be created from it.
Methods
No methods defined with class "OptContControl" in the signature.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
Examples
showClass("OptContControl")
Class "OptDataControl"
Description
Virtual class used internally for convenience.
Objects from the Class
A virtual Class: No objects may be created from it.
Methods
No methods defined with class "OptDataControl" in the signature.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
Examples
showClass("OptDataControl")
Class "OptNAControl"
Description
Virtual class used internally for convenience.
Objects from the Class
A virtual Class: No objects may be created from it.
Methods
No methods defined with class "OptNAControl" in the signature.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
Examples
showClass("OptNAControl")
Class "OptNumeric"
Description
Virtual class used internally for convenience.
Objects from the Class
A virtual Class: No objects may be created from it.
Methods
No methods defined with class "OptNumeric" in the signature.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
Examples
showClass("OptNumeric")
Class "OptSampleControl"
Description
Virtual class used internally for convenience.
Objects from the Class
A virtual Class: No objects may be created from it.
Methods
No methods defined with class "OptSampleControl" in the signature.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
Examples
showClass("OptSampleControl")
Class "SampleControl"
Description
Class for controlling the setup of samples.
Objects from the Class
Objects can be created by calls of the form new("SampleControl", ...)
or SampleControl(...).
Slots
design: Object of class "BasicVector" specifying variables (columns) to be used for stratified sampling.
grouping: Object of class "BasicVector" specifying a grouping variable (column) to be used for sampling whole groups rather than individual observations.
collect: Object of class "logical"; if a grouping variable is specified and this is FALSE (which is the default value), groups are sampled directly. If a grouping variable is specified and this is TRUE, individuals are sampled in a first step. In a second step, all individuals that belong to the same group as any of the sampled individuals are collected and added to the sample. If no grouping variable is specified, this is ignored.
fun: Object of class "function" to be used for sampling (defaults to srs). It should return a vector containing the indices of the sampled items (observations or groups).
size: Object of class "OptNumeric"; an optional non-negative integer giving the number of items (observations or groups) to sample. In case of stratified sampling, a vector of non-negative integers, each giving the number of items to sample from the corresponding stratum, may be supplied.
prob: Object of class "OptBasicVector"; an optional numeric vector giving the probability weights, or a character string or logical vector specifying a variable (column) that contains the probability weights.
dots: Object of class "list" containing additional arguments to be passed to fun.
k: Object of class "numeric"; a single positive integer giving the number of samples to be set up.
Details
There are some restrictions on the argument names of the function
supplied to fun. If it needs population data as input,
the corresponding argument should be called x and should expect
a data.frame. If the sampling method only needs the population size
as input, the argument should be called N. Note that fun is
not expected to have both x and N as arguments, and that the
latter is much faster for stratified sampling or group sampling.
Furthermore, if the function has arguments for sample size and probability
weights, they should be called size and prob, respectively.
Note that a function with prob as its only argument is perfectly valid
(for probability proportional to size sampling). Further arguments of
fun may be supplied as a list via the slot dots.
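The naming conventions above can be illustrated with two user-written sampling functions. Both are hypothetical sketches and not part of simFrame: sysSample takes only the population size N and the sample size size, while poissonSample takes prob as its only argument and treats the values as inclusion probabilities.

```r
# Hypothetical systematic sampling: needs only the population size,
# so the arguments are named 'N' and 'size'. Assumes size <= N.
sysSample <- function(N, size) {
  step <- N / size
  start <- runif(1, 0, step)            # random start in the first interval
  idx <- floor(start + step * (seq_len(size) - 1)) + 1
  pmin(idx, N)                          # guard against rounding at the boundary
}

# Hypothetical Poisson sampling: 'prob' is the only argument and is
# interpreted as a vector of inclusion probabilities.
poissonSample <- function(prob) {
  which(runif(length(prob)) < prob)
}

set.seed(5)
idx_sys <- sysSample(N = 100, size = 10)
idx_pois <- poissonSample(c(0, 1, 0, 1))

# With simFrame attached, such functions could presumably be supplied as
#   SampleControl(fun = sysSample, size = 10)
# (based on the description of slot 'fun'; not run here).
```

Both functions return a vector of indices of the sampled items, as required for slot fun.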
Extends
Class "VirtualSampleControl", directly.
Class "OptSampleControl", by class "VirtualSampleControl", distance 2.
Accessor and mutator methods
In addition to the accessor and mutator methods for the slots inherited from
"VirtualSampleControl", the following are available:
getDesign signature(x = "SampleControl"): get slot design.
setDesign signature(x = "SampleControl"): set slot design.
getGrouping signature(x = "SampleControl"): get slot grouping.
setGrouping signature(x = "SampleControl"): set slot grouping.
getCollect signature(x = "SampleControl"): get slot collect.
setCollect signature(x = "SampleControl"): set slot collect.
getFun signature(x = "SampleControl"): get slot fun.
setFun signature(x = "SampleControl"): set slot fun.
getSize signature(x = "SampleControl"): get slot size.
setSize signature(x = "SampleControl"): set slot size.
getProb signature(x = "SampleControl"): get slot prob.
setProb signature(x = "SampleControl"): set slot prob.
getDots signature(x = "SampleControl"): get slot dots.
setDots signature(x = "SampleControl"): set slot dots.
Methods
In addition to the methods inherited from
"VirtualSampleControl", the following are available:
clusterSetup signature(cl = "ANY", x = "data.frame", control = "SampleControl"): set up multiple samples on a cluster.
setup signature(x = "data.frame", control = "SampleControl"): set up multiple samples.
show signature(object = "SampleControl"): print the object on the R console.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Note
The slots grouping and fun were named group and
method, respectively, prior to version 0.2. Renaming the slots was
necessary since accessor and mutator functions were introduced in this
version and functions named getGroup, getMethod and
setMethod already exist.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
"VirtualSampleControl",
"TwoStageControl", "SampleSetup",
setup, draw
Examples
data(eusilcP)
## simple random sampling
srsc <- SampleControl(size = 20)
draw(eusilcP[, c("id", "eqIncome")], srsc)
## group sampling
gsc <- SampleControl(grouping = "hid", size = 10)
draw(eusilcP[, c("hid", "id", "eqIncome")], gsc)
## stratified simple random sampling
ssrsc <- SampleControl(design = "region",
    size = c(2, 5, 5, 3, 4, 5, 3, 5, 2))
draw(eusilcP[, c("id", "region", "eqIncome")], ssrsc)
## stratified group sampling
sgsc <- SampleControl(design = "region", grouping = "hid",
    size = c(2, 5, 5, 3, 4, 5, 3, 5, 2))
draw(eusilcP[, c("hid", "id", "region", "eqIncome")], sgsc)
Class "SampleSetup"
Description
Class for set up samples.
Objects from the Class
Objects can be created by calls of the form new("SampleSetup", ...) or
SampleSetup(...).
However, objects are expected to be created by the function setup
or clusterSetup; these constructor functions are not supposed to
be called by the user.
Slots
indices: Object of class "list"; each list element contains the indices of the sampled observations.
prob: Object of class "numeric" giving the inclusion probabilities.
control: Object of class "VirtualSampleControl"; the control object used to set up the samples.
seed: Object of class "list" containing the seeds of the random number generator before and after setting up the samples, respectively (for replication purposes).
call: Object of class "SimCall"; the function call used to set up the samples, or NULL.
Accessor methods
getIndices signature(x = "SampleSetup"): get slot indices.
getProb signature(x = "SampleSetup"): get slot prob.
getControl signature(x = "SampleSetup"): get slot control.
getSeed signature(x = "SampleSetup"): get slot seed.
getCall signature(x = "SampleSetup"): get slot call.
Methods
clusterRunSimulation signature(cl = "ANY", x = "data.frame", setup = "SampleSetup", nrep = "missing", control = "SimControl"): run a simulation experiment on a cluster.
draw signature(x = "data.frame", setup = "SampleSetup"): draw a sample.
head signature(x = "SampleSetup"): returns the first parts of set up samples.
length signature(x = "SampleSetup"): get the number of set up samples.
runSimulation signature(x = "data.frame", setup = "SampleSetup", nrep = "missing", control = "SimControl"): run a simulation experiment.
show signature(object = "SampleSetup"): print set up samples on the R console.
summary signature(object = "SampleSetup"): produce a summary of set up samples.
tail signature(x = "SampleSetup"): returns the last parts of set up samples.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Note
There are no mutator methods available since the slots are not supposed to be changed by the user.
Furthermore, the slot seed was added in version 0.2, and the slot
control was added in version 0.3. Since the control object used to
set up the samples is now stored, the redundant slots design,
grouping, collect and fun were removed. This has been
done as preparation for additional control classes for sampling, which will
be introduced in future versions.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
"SampleControl", "TwoStageControl",
"VirtualSampleControl",
setup, draw
Examples
showClass("SampleSetup")
Class "SimControl"
Description
Class for controlling how simulation runs are performed.
Objects from the Class
Objects can be created by calls of the form new("SimControl", ...) or
SimControl(...).
Slots
contControl: Object of class "OptContControl"; a control object for contamination, or NULL.
NAControl: Object of class "OptNAControl"; a control object for inserting missing values, or NULL.
design: Object of class "character" specifying variables (columns) to be used for splitting the data into domains. The simulations, including contamination and the insertion of missing values (unless SAE=TRUE), are then performed on every domain.
fun: Object of class "function" to be applied in each simulation run.
dots: Object of class "list" containing additional arguments to be passed to fun.
SAE: Object of class "logical" indicating whether small area estimation will be used in the simulation experiment.
Details
There are some requirements for fun. It must return a numeric vector,
or a list with the two components values (a numeric vector) and
add (additional results of any class, e.g., statistical models).
Note that the latter is computationally slightly more expensive. A
data.frame is passed to fun in every simulation run. The
corresponding argument must be called x. If comparisons with the
original data need to be made, e.g., for evaluating the quality of imputation
methods, the function should have an argument called orig. If
different domains are used in the simulation, the indices of the current
domain can be passed to the function via an argument called domain.
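The requirements above can be sketched with three hypothetical functions (simMean, simModel and simImpute are illustrative names, not part of simFrame). They are tried on toy data outside of a simulation; the columns group and value are assumptions made for this sketch.

```r
## Hypothetical functions matching the requirements for slot 'fun'.

## returns a named numeric vector
simMean <- function(x) {
    c(mean = mean(x$value), median = median(x$value))
}

## returns a list with components 'values' (numeric vector) and
## 'add' (additional results of any class, here a fitted model)
simModel <- function(x) {
    fit <- lm(value ~ group, data = x)
    list(values = coef(fit), add = fit)
}

## has an argument 'orig' for comparisons with the original data,
## here the mean absolute error after mean imputation
simImpute <- function(x, orig) {
    imputed <- x$value
    imputed[is.na(imputed)] <- mean(imputed, na.rm = TRUE)
    c(MAE = mean(abs(imputed - orig$value)))
}

## toy data to try the functions outside of a simulation
set.seed(1)
orig <- data.frame(group = rep(1:2, each = 10), value = rnorm(20))
x <- orig
x$value[c(3, 8)] <- NA
simMean(orig)
simModel(orig)$values
simImpute(x, orig)
```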
For small area estimation, the following points have to be kept in mind. The
design for splitting the data must be supplied and SAE
must be set to TRUE. However, the data are not actually split into
the specified domains. Instead, the whole data set (sample) is passed to
fun. Also contamination and missing values are added to the whole
data (sample). Last, but not least, the function must have a domain
argument so that the current domain can be extracted from the whole data
(sample).
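A minimal sketch of such a function, assuming that domain holds the row indices of the current domain: simSAE, the toy data and the 50/50 composite weighting are hypothetical and only illustrate how the current domain can be extracted from the whole data.

```r
## Hypothetical function for small area estimation: 'x' is the whole
## data set (sample) and 'domain' holds the indices of the current domain.
simSAE <- function(x, domain) {
    direct <- mean(x$value[domain])   # uses the current domain only
    overall <- mean(x$value)          # uses the whole data
    c(direct = direct, composite = 0.5 * direct + 0.5 * overall)
}

## toy illustration outside of simFrame: loop over the domains by hand
x <- data.frame(area = rep(c("A", "B"), times = c(4, 6)),
                value = c(1, 2, 3, 4, 10, 11, 12, 13, 14, 15))
domains <- split(seq_len(nrow(x)), x$area)
sapply(domains, function(i) simSAE(x, domain = i))
```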
In every simulation run, fun is evaluated using try. Hence
no results are lost if computations fail in any of the simulation runs.
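The idea can be sketched in base R as follows. This is not the internal code of simFrame; fragile and runs are hypothetical names used to show that a failing run does not discard the results of the other runs.

```r
## Sketch of the idea only: wrapping each run in try() keeps the
## results of the runs that succeed.
fragile <- function(x) {
    if (anyNA(x)) stop("missing values not supported")
    c(mean = mean(x))
}
runs <- list(c(1, 2, 3), c(4, NA, 6), c(7, 8, 9))
results <- lapply(runs, function(x) try(fragile(x), silent = TRUE))
failed <- vapply(results, inherits, logical(1), what = "try-error")
failed                               # flags the run that failed
do.call(rbind, results[!failed])     # results of the remaining runs
```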
Accessor and mutator methods
getContControl signature(x = "SimControl"): get slot contControl.
setContControl signature(x = "SimControl"): set slot contControl.
getNAControl signature(x = "SimControl"): get slot NAControl.
setNAControl signature(x = "SimControl"): set slot NAControl.
getDesign signature(x = "SimControl"): get slot design.
setDesign signature(x = "SimControl"): set slot design.
getFun signature(x = "SimControl"): get slot fun.
setFun signature(x = "SimControl"): set slot fun.
getDots signature(x = "SimControl"): get slot dots.
setDots signature(x = "SimControl"): set slot dots.
getSAE signature(x = "SimControl"): get slot SAE.
setSAE signature(x = "SimControl"): set slot SAE.
Methods
clusterRunSimulation signature(cl = "ANY", x = "data.frame", setup = "missing", nrep = "numeric", control = "SimControl"): run a simulation experiment on a cluster.
clusterRunSimulation signature(cl = "ANY", x = "data.frame", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"): run a simulation experiment on a cluster.
clusterRunSimulation signature(cl = "ANY", x = "data.frame", setup = "SampleSetup", nrep = "missing", control = "SimControl"): run a simulation experiment on a cluster.
clusterRunSimulation signature(cl = "ANY", x = "VirtualDataControl", setup = "missing", nrep = "numeric", control = "SimControl"): run a simulation experiment on a cluster.
clusterRunSimulation signature(cl = "ANY", x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "numeric", control = "SimControl"): run a simulation experiment on a cluster.
head signature(x = "SimControl"): currently returns the object itself.
runSimulation signature(x = "data.frame", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "data.frame", setup = "SampleSetup", nrep = "missing", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "data.frame", setup = "missing", nrep = "numeric", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "data.frame", setup = "missing", nrep = "missing", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "VirtualDataControl", setup = "missing", nrep = "numeric", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "VirtualDataControl", setup = "missing", nrep = "missing", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "numeric", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"): run a simulation experiment.
show signature(object = "SimControl"): print the object on the R console.
summary signature(object = "SimControl"): currently returns the object itself.
tail signature(x = "SimControl"): currently returns the object itself.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
Examples
#### design-based simulation
set.seed(12345) # for reproducibility
data(eusilcP) # load data
## control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome", epsilon = 0.02,
    fun = function(x) x * 25)
## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, trim = 0.02))
}
## combine these to "SimControl" object and run simulation
ctrl <- SimControl(contControl = cc, fun = sim)
results <- runSimulation(eusilcP, sc, control = ctrl)
## explore results
head(results)
aggregate(results)
tv <- mean(eusilcP$eqIncome) # true population mean
plot(results, true = tv)
#### model-based simulation
set.seed(12345) # for reproducibility
## function for generating data
rgnorm <- function(n, means) {
    group <- sample(1:2, n, replace=TRUE)
    data.frame(group=group, value=rnorm(n) + means[group])
}
## control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
    dots = list(means = means))
cc <- DCARContControl(target = "value",
    epsilon = 0.02, dots = list(mean = 15))
## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$value),
        trimmed = mean(x$value, trim = 0.02),
        median = median(x$value))
}
## combine these to "SimControl" object and run simulation
ctrl <- SimControl(contControl = cc, design = "group", fun = sim)
results <- runSimulation(dc, nrep = 50, control = ctrl)
## explore results
head(results)
aggregate(results)
plot(results, true = means)
Class "SimResults"
Description
Class for simulation results.
Objects from the Class
Objects can be created by calls of the form new("SimResults", ...) or
SimResults(...).
However, objects are expected to be created by the function
runSimulation or clusterRunSimulation; these
constructor functions are not supposed to be called by the user.
Slots
values: Object of class "data.frame" containing the simulation results.
add: Object of class "list" containing additional simulation results, e.g., statistical models.
design: Object of class "character" giving the variables (columns) defining the domains used in the simulation experiment.
colnames: Object of class "character" giving the names of the columns of values that contain the actual simulation results.
epsilon: Object of class "numeric" containing the contamination levels used in the simulation experiment.
NArate: Object of class "NumericMatrix" containing the missing value rates used in the simulation experiment.
dataControl: Object of class "OptDataControl"; the control object used for data generation in model-based simulation, or NULL.
sampleControl: Object of class "OptSampleControl"; the control object used for sampling in design-based simulation, or NULL.
nrep: Object of class "numeric" giving the number of repetitions of the simulation experiment (for model-based simulation or simulation based on real data).
control: Object of class "SimControl"; the control object used for running the simulations.
seed: Object of class "list" containing the seeds of the random number generator before and after the simulation experiment, respectively (for replication of the results).
call: Object of class "SimCall"; the function call used to run the simulation experiment, or NULL.
Accessor methods
getValues signature(x = "SimResults"): get slot values.
getAdd signature(x = "SimResults"): get slot add.
getDesign signature(x = "SimResults"): get slot design.
getColnames signature(x = "SimResults"): get slot colnames.
getEpsilon signature(x = "SimResults"): get slot epsilon.
getNArate signature(x = "SimResults"): get slot NArate.
getDataControl signature(x = "SimResults"): get slot dataControl.
getSampleControl signature(x = "SimResults"): get slot sampleControl.
getNrep signature(x = "SimResults"): get slot nrep.
getControl signature(x = "SimResults"): get slot control.
getSeed signature(x = "SimResults"): get slot seed.
getCall signature(x = "SimResults"): get slot call.
Methods
aggregate signature(x = "SimResults"): aggregate simulation results.
head signature(x = "SimResults"): returns the first parts of simulation results.
plot signature(x = "SimResults", y = "missing"): selects a suitable graphical representation of the simulation results automatically.
show signature(object = "SimResults"): print simulation results on the R console.
simBwplot signature(x = "SimResults"): conditional box-and-whisker plot of simulation results.
simDensityplot signature(x = "SimResults"): conditional kernel density plot of simulation results.
simXyplot signature(x = "SimResults"): conditional x-y plot of simulation results.
summary signature(object = "SimResults"): produce a summary of simulation results.
tail signature(x = "SimResults"): returns the last parts of simulation results.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Note
There are no mutator methods available since the slots are not supposed to be changed by the user.
Furthermore, the slots dataControl, sampleControl, nrep
and control were added in version 0.3.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
runSimulation, simBwplot,
simDensityplot, simXyplot
Examples
showClass("SimResults")
Class "Strata"
Description
Class containing strata information for a data set.
Objects from the Class
Objects can be created by calls of the form new("Strata", ...) or
Strata(...).
However, objects are expected to be created by the function
stratify; these constructor functions are not supposed to be
called by the user.
Slots
values: Object of class "integer" giving the stratum number for each observation.
split: Object of class "list"; each list element contains the indices of the observations belonging to the corresponding stratum.
design: Object of class "character" giving the variables (columns) defining the strata.
nr: Object of class "integer" giving the stratum numbers.
legend: Object of class "data.frame" describing the strata.
size: Object of class "numeric" giving the stratum sizes.
call: Object of class "OptCall"; the function call used to stratify the data, or NULL.
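How the slots relate to each other can be sketched in base R. This is an analogy under stated assumptions, not the implementation of stratify; the toy data and all variable names below are hypothetical, and the ordering of the strata is simply the one produced by factor().

```r
## Base R sketch of the information held in the slots.
x <- data.frame(region = c("N", "S", "N", "E", "S", "N"),
                income = c(10, 20, 30, 40, 50, 60))
f <- factor(x$region)
values <- as.integer(f)                        # stratum number per observation
idx <- split(seq_len(nrow(x)), values)         # indices by stratum (cf. 'split')
nr <- seq_len(nlevels(f))                      # stratum numbers
legend <- data.frame(region = levels(f))       # description of the strata
size <- tabulate(values, nbins = nlevels(f))   # stratum sizes
values
idx
cbind(nr, legend, size)
```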
Accessor methods
getValues signature(x = "Strata"): get slot values.
getSplit signature(x = "Strata"): get slot split.
getDesign signature(x = "Strata"): get slot design.
getNr signature(x = "Strata"): get slot nr.
getLegend signature(x = "Strata"): get slot legend.
getSize signature(x = "Strata"): get slot size.
getCall signature(x = "Strata"): get slot call.
Methods
head signature(x = "Strata"): returns the first parts of strata information.
show signature(object = "Strata"): print strata information on the R console.
simApply signature(x = "data.frame", design = "Strata", fun = "function"): apply a function to subsets.
simSapply signature(x = "data.frame", design = "Strata", fun = "function"): apply a function to subsets.
summary signature(object = "Strata"): produce a summary of strata information.
tail signature(x = "Strata"): returns the last parts of strata information.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Note
There are no mutator methods available since the slots are not supposed to be changed by the user.
Author(s)
Andreas Alfons
See Also
Examples
showClass("Strata")
Class "SummarySampleSetup"
Description
Class containing a summary of set up samples.
Objects from the Class
Objects can be created by calls of the form
new("SummarySampleSetup", ...) or SummarySampleSetup(...).
However, objects are expected to be created by the summary method for
class "SampleSetup"; these constructor functions are not
supposed to be called by the user.
Slots
size: Object of class "numeric" giving the size of each of the set up samples.
Accessor methods
getSize signature(x = "SummarySampleSetup"): get slot size.
Methods
show signature(object = "SummarySampleSetup"): print a summary of set up samples on the R console.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Note
There are no mutator methods available since the slots are not supposed to be changed by the user.
Author(s)
Andreas Alfons
See Also
Examples
showClass("SummarySampleSetup")
Class "TwoStageControl"
Description
Class for controlling the setup of samples using a two-stage procedure.
Usage
TwoStageControl(..., fun1 = srs, fun2 = srs, size1 = NULL,
size2 = NULL, prob1 = NULL, prob2 = NULL,
dots1 = list(), dots2 = list())
Arguments
| ... | the slots for the new object (see below). |
| fun1 | the function to be used for sampling in the first stage (the first list component of slot fun). |
| fun2 | the function to be used for sampling in the second stage (the second list component of slot fun). |
| size1 | the number of PSUs to sample in the first stage (the first list component of slot size). |
| size2 | the number of items to sample in the second stage (the second list component of slot size). |
| prob1 | the probability weights for the first stage (the first list component of slot prob). |
| prob2 | the probability weights for the second stage (the second list component of slot prob). |
| dots1 | additional arguments to be passed to the function for sampling in the first stage (the first list component of slot dots). |
| dots2 | additional arguments to be passed to the function for sampling in the second stage (the second list component of slot dots). |
Objects from the Class
Objects can be created by calls of the form new("TwoStageControl", ...)
or via the constructor TwoStageControl.
Slots
design: Object of class "BasicVector" specifying variables (columns) to be used for stratified sampling in the first stage.
grouping: Object of class "BasicVector" specifying grouping variables (columns) to be used for sampling primary sampling units (PSUs) and secondary sampling units (SSUs), respectively.
fun: Object of class "list"; a list of length two containing the functions to be used for sampling in the first and second stage, respectively (defaults to srs for both stages). The functions should return a vector containing the indices of the sampled items.
size: Object of class "list"; a list of length two, where each component contains an optional non-negative integer giving the number of items to sample in the first and second stage, respectively. In case of stratified sampling in the first stage, a vector of non-negative integers, each giving the number of PSUs to sample from the corresponding stratum, may be supplied. For the second stage, a vector of non-negative integers giving the number of items to sample from each PSU may be used.
prob: Object of class "list"; a list of length two, where each component gives optional probability weights for the first and second stage, respectively. Each component may thereby be a numerical vector, or a character string or integer vector specifying a variable (column) that contains the probability weights.
dots: Object of class "list"; a list of length two, where each component is again a list containing additional arguments to be passed to the corresponding function for sampling in fun.
k: Object of class "numeric"; a single positive integer giving the number of samples to be set up.
Details
There are some restrictions on the argument names of the functions for
sampling in fun. If the sampling method needs population data as
input, the corresponding argument should be called x and should expect
a data.frame. If it only needs the population size as input, the
argument should be called N. Note that the function is not expected
to have both x and N as arguments, and that the latter is
typically much faster. Furthermore, if the function has arguments for sample
size and probability weights, they should be called size and
prob, respectively. Note that a function with prob as its only
argument is perfectly valid (for probability proportional to size sampling).
Further arguments may be supplied as a list via the slot dots.
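The two-stage idea and the naming conventions can be sketched in base R as follows. This is not the implementation used by simFrame; stage1, stage2 and the toy variables psu and id are hypothetical. The commented constructor call at the end is assumed usage and is not run.

```r
## Hypothetical two-stage sketch: stage 1 samples PSUs, stage 2 samples
## items within each sampled PSU. Both stage functions use the argument
## names 'N' and 'size' and return indices of the sampled items.
stage1 <- function(N, size) sample.int(N, size)
stage2 <- function(N, size) sample.int(N, min(N, size))

set.seed(1)
x <- data.frame(psu = rep(1:5, times = c(3, 4, 2, 5, 6)), id = 1:20)
psus <- unique(x$psu)
selected <- psus[stage1(N = length(psus), size = 2)]
indices <- unlist(lapply(selected, function(p) {
    rows <- which(x$psu == p)                  # items of the current PSU
    rows[stage2(N = length(rows), size = 3)]   # sample within the PSU
}))
x[indices, ]

## Assumed usage with simFrame (not run here):
## tsc <- TwoStageControl(grouping = c("psu", "id"),
##                        fun1 = stage1, fun2 = stage2,
##                        size1 = 2, size2 = 3)
```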
Extends
Class "VirtualSampleControl", directly.
Class "OptSampleControl", by class "VirtualSampleControl", distance 2.
Accessor and mutator methods
In addition to the accessor and mutator methods for the slots inherited from
"VirtualSampleControl", the following are available:
getDesign signature(x = "TwoStageControl"): get slot design.
setDesign signature(x = "TwoStageControl"): set slot design.
getGrouping signature(x = "TwoStageControl"): get slot grouping.
setGrouping signature(x = "TwoStageControl"): set slot grouping.
getCollect signature(x = "TwoStageControl"): get slot collect.
setCollect signature(x = "TwoStageControl"): set slot collect.
getFun signature(x = "TwoStageControl"): get slot fun.
setFun signature(x = "TwoStageControl"): set slot fun.
getSize signature(x = "TwoStageControl"): get slot size.
setSize signature(x = "TwoStageControl"): set slot size.
getProb signature(x = "TwoStageControl"): get slot prob.
setProb signature(x = "TwoStageControl"): set slot prob.
getDots signature(x = "TwoStageControl"): get slot dots.
setDots signature(x = "TwoStageControl"): set slot dots.
Methods
In addition to the methods inherited from
"VirtualSampleControl", the following are available:
clusterSetup signature(cl = "ANY", x = "data.frame", control = "TwoStageControl"): set up multiple samples on a cluster.
setup signature(x = "data.frame", control = "TwoStageControl"): set up multiple samples.
show signature(object = "TwoStageControl"): print the object on the R console.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
See Also
"VirtualSampleControl",
"SampleControl", "SampleSetup",
setup, draw
Examples
showClass("TwoStageControl")
Class "VirtualContControl"
Description
Virtual superclass for controlling contamination in a simulation experiment.
Objects from the Class
A virtual Class: No objects may be created from it.
Slots
target: Object of class "OptCharacter"; a character vector specifying the variables (columns) to be contaminated, or NULL to contaminate all variables (except the additional ones generated internally).
epsilon: Object of class "numeric" giving the contamination levels.
Extends
Class "OptContControl", directly.
Accessor and mutator methods
getTarget signature(x = "VirtualContControl"): get slot target.
setTarget signature(x = "VirtualContControl"): set slot target.
getEpsilon signature(x = "VirtualContControl"): get slot epsilon.
setEpsilon signature(x = "VirtualContControl"): set slot epsilon.
Methods
head signature(x = "VirtualContControl"): currently returns the object itself.
length signature(x = "VirtualContControl"): get the number of contamination levels to be used.
show signature(object = "VirtualContControl"): print the object on the R console.
summary signature(object = "VirtualContControl"): currently returns the object itself.
tail signature(x = "VirtualContControl"): currently returns the object itself.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
"DCARContControl", "DARContControl",
"ContControl", contaminate
Examples
showClass("VirtualContControl")
Class "VirtualDataControl"
Description
Virtual superclass for controlling model-based generation of data.
Objects from the Class
A virtual Class: No objects may be created from it.
Extends
Class "OptDataControl", directly.
Methods
clusterRunSimulation signature(cl = "ANY", x = "VirtualDataControl", setup = "missing", nrep = "numeric", control = "SimControl"): run a simulation experiment on a cluster.
clusterRunSimulation signature(cl = "ANY", x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "numeric", control = "SimControl"): run a simulation experiment on a cluster.
head signature(x = "VirtualDataControl"): currently returns the object itself.
runSimulation signature(x = "VirtualDataControl", setup = "missing", nrep = "numeric", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "VirtualDataControl", setup = "missing", nrep = "missing", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "numeric", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"): run a simulation experiment.
summary signature(object = "VirtualDataControl"): currently returns the object itself.
tail signature(x = "VirtualDataControl"): currently returns the object itself.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
Examples
showClass("VirtualDataControl")
Class "VirtualNAControl"
Description
Virtual superclass for controlling the insertion of missing values in a simulation experiment.
Objects from the Class
A virtual Class: No objects may be created from it.
Slots
target: Object of class "OptCharacter"; a character vector specifying the variables (columns) in which missing values should be inserted, or NULL to insert missing values in all variables (except the additional ones generated internally).
NArate: Object of class "NumericMatrix" giving the missing value rates, which may be selected individually for the target variables. In case of a vector, the same missing value rates are used for all target variables. In case of a matrix, on the other hand, the missing value rates to be used for each target variable are given by the respective column.
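The two forms of NArate can be sketched in base R as follows. The target names "x1" and "x2" and the rates are hypothetical; the last call only illustrates that a vector corresponds to a matrix with identical columns.

```r
## Sketch of the two forms of NArate.
target <- c("x1", "x2")

## vector: the same rates for all target variables
rateVec <- c(0.05, 0.1)
## matrix: one column of rates per target variable
rateMat <- matrix(c(0.05, 0.1,     # rates for x1
                    0.10, 0.2),    # rates for x2
                  ncol = 2, dimnames = list(NULL, target))

## number of missing value rates: length of the vector, rows of the matrix
length(rateVec)
nrow(rateMat)
## expanding the vector shows the equivalent matrix form
sapply(target, function(v) rateVec)
```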
Extends
Class "OptNAControl", directly.
Accessor and mutator methods
getTarget signature(x = "VirtualNAControl"): get slot target.
setTarget signature(x = "VirtualNAControl"): set slot target.
getNArate signature(x = "VirtualNAControl"): get slot NArate.
setNArate signature(x = "VirtualNAControl"): set slot NArate.
Methods
head signature(x = "VirtualNAControl"): currently returns the object itself.
length signature(x = "VirtualNAControl"): get the number of missing value rates to be used (the length in case of a vector or the number of rows in case of a matrix).
show signature(object = "VirtualNAControl"): print the object on the R console.
summary signature(object = "VirtualNAControl"): currently returns the object itself.
tail signature(x = "VirtualNAControl"): currently returns the object itself.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
Examples
showClass("VirtualNAControl")
Class "VirtualSampleControl"
Description
Virtual superclass for controlling the setup of samples.
Objects from the Class
A virtual Class: No objects may be created from it.
Slots
k: Object of class "numeric"; a single positive integer giving the number of samples to be set up.
Extends
Class "OptSampleControl", directly.
Accessor and mutator methods
getK signature(x = "VirtualSampleControl"): get slot k.
setK signature(x = "VirtualSampleControl"): set slot k.
Methods
clusterRunSimulation signature(cl = "ANY", x = "data.frame", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"): run a simulation experiment on a cluster.
clusterRunSimulation signature(cl = "ANY", x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "numeric", control = "SimControl"): run a simulation experiment on a cluster.
draw signature(x = "data.frame", setup = "VirtualSampleControl"): draw a sample.
head signature(x = "VirtualSampleControl"): currently returns the object itself.
length signature(x = "VirtualSampleControl"): get the number of samples to be set up.
runSimulation signature(x = "data.frame", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "numeric", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"): run a simulation experiment.
show signature(object = "VirtualSampleControl"): print the object on the R console.
summary signature(object = "VirtualSampleControl"): currently returns the object itself.
tail signature(x = "VirtualSampleControl"): currently returns the object itself.
UML class diagram
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame. Use
vignette("simFrame-intro") to view this vignette.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
"SampleControl", "TwoStageControl",
"SampleSetup", setup, draw
Examples
showClass("VirtualSampleControl")
Accessor and mutator functions for objects
Description
Get values of slots of objects via accessor functions and set values via mutator functions. If no mutator methods are available, the slots of the corresponding objects are not supposed to be changed by the user.
Usage
getAdd(x)
getAux(x)
setAux(x, aux)
getCall(x, ...)
getCollect(x)
setCollect(x, collect)
getColnames(x)
setColnames(x, colnames)
getContControl(x)
setContControl(x, contControl)
getControl(x)
getDataControl(x)
getDesign(x)
setDesign(x, design)
getDistribution(x)
setDistribution(x, distribution)
getDots(x, ...)
setDots(x, dots, ...)
## S4 method for signature 'TwoStageControl'
getDots(x, stage = NULL)
## S4 method for signature 'TwoStageControl'
setDots(x, dots, stage = NULL)
getEpsilon(x)
setEpsilon(x, epsilon)
getFun(x, ...)
setFun(x, fun, ...)
## S4 method for signature 'TwoStageControl'
getFun(x, stage = NULL)
## S4 method for signature 'TwoStageControl'
setFun(x, fun, stage = NULL)
getGrouping(x)
setGrouping(x, grouping)
getIndices(x)
getIntoContamination(x)
setIntoContamination(x, intoContamination)
getK(x)
setK(x, k)
getLegend(x)
getNAControl(x)
setNAControl(x, NAControl)
getNArate(x)
setNArate(x, NArate)
getNr(x)
getNrep(x)
getProb(x, ...)
setProb(x, prob, ...)
## S4 method for signature 'TwoStageControl'
getProb(x, stage = NULL)
## S4 method for signature 'TwoStageControl'
setProb(x, prob, stage = NULL)
getSAE(x)
setSAE(x, SAE)
getSampleControl(x)
getSeed(x)
getSize(x, ...)
setSize(x, size, ...)
## S4 method for signature 'TwoStageControl'
getSize(x, stage = NULL)
## S4 method for signature 'TwoStageControl'
setSize(x, size, stage = NULL)
getSplit(x)
getTarget(x)
setTarget(x, target)
getValues(x)
Arguments
x | an object. |
aux | a character string specifying an auxiliary variable (see |
collect | a logical indicating whether groups should be collected after sampling individuals or sampled directly (see |
colnames | a character vector specifying column names (see |
contControl | an object of class |
design | a character vector specifying columns to be used for stratification (see |
distribution | a function generating data (see |
dots | additional arguments to be passed to a function (see |
epsilon | a numeric vector giving contamination levels (see |
fun | a function (see |
grouping | a character string specifying a grouping variable (see |
intoContamination | a logical indicating whether missing values should also be inserted into contaminated observations (see |
k | a single positive integer giving the number of samples to be set up (see |
NAControl | an object of class |
NArate | a numeric vector or matrix giving missing value rates (see |
prob | a numeric vector giving probability weights (see |
SAE | a logical indicating whether small area estimation will be used in the simulation experiment (see |
size | a non-negative integer or a vector of non-negative integers (see |
stage | optional integer; for certain slots of |
target | a character vector specifying target columns (see |
... | only used to allow for the |
Value
For accessor functions, the corresponding slot of x is returned.
For mutator functions, the corresponding slot of x is replaced.
Methods for function getAdd
signature(x = "SimResults")
Methods for functions getAux and setAux
signature(x = "ContControl")
signature(x = "NAControl")
Methods for function getCall
signature(x = "SampleSetup")
signature(x = "SimResults")
signature(x = "Strata")
Methods for functions getCollect and setCollect
signature(x = "SampleControl")
Methods for function getColnames
signature(x = "DataControl")
signature(x = "SimResults")
Methods for function setColnames
signature(x = "DataControl")
Methods for functions getContControl and setContControl
signature(x = "SimControl")
Methods for function getControl
signature(x = "SampleSetup")
signature(x = "SimResults")
Methods for function getDataControl
signature(x = "SimResults")
Methods for function getDesign
signature(x = "SampleControl")
signature(x = "TwoStageControl")
signature(x = "SimControl")
signature(x = "SimResults")
signature(x = "Strata")
Methods for function setDesign
signature(x = "SampleControl")
signature(x = "TwoStageControl")
signature(x = "SimControl")
Methods for functions getDistribution and setDistribution
signature(x = "DataControl")
signature(x = "DCARContControl")
Methods for functions getDots and setDots
signature(x = "DataControl")
signature(x = "DARContControl")
signature(x = "DCARContControl")
signature(x = "SampleControl")
signature(x = "TwoStageControl")
signature(x = "SimControl")
Methods for function getEpsilon
signature(x = "SimResults")
signature(x = "VirtualContControl")
Methods for function setEpsilon
signature(x = "VirtualContControl")
Methods for functions getFun and setFun
signature(x = "DARContControl")
signature(x = "SampleControl")
signature(x = "TwoStageControl")
signature(x = "SimControl")
Methods for functions getGrouping and setGrouping
signature(x = "ContControl")
signature(x = "NAControl")
signature(x = "SampleControl")
signature(x = "TwoStageControl")
Methods for function getIndices
signature(x = "SampleSetup")
Methods for functions getIntoContamination and setIntoContamination
signature(x = "NAControl")
Methods for functions getK and setK
signature(x = "VirtualSampleControl")
Methods for function getLegend
signature(x = "Strata")
Methods for functions getNAControl and setNAControl
signature(x = "SimControl")
Methods for function getNArate
signature(x = "SimResults")
signature(x = "VirtualNAControl")
Methods for function setNArate
signature(x = "VirtualNAControl")
Methods for function getNr
signature(x = "Strata")
Methods for function getNrep
signature(x = "SimResults")
Methods for function getProb
signature(x = "SampleControl")
signature(x = "TwoStageControl")
signature(x = "SampleSetup")
Methods for function setProb
signature(x = "SampleControl")
signature(x = "TwoStageControl")
Methods for functions getSAE and setSAE
signature(x = "SimControl")
Methods for function getSampleControl
signature(x = "SimResults")
Methods for function getSeed
signature(x = "SampleSetup")
signature(x = "SimResults")
Methods for function getSize
signature(x = "DataControl")
signature(x = "SampleControl")
signature(x = "TwoStageControl")
signature(x = "Strata")
signature(x = "SummarySampleSetup")
Methods for function setSize
signature(x = "DataControl")
signature(x = "SampleControl")
signature(x = "TwoStageControl")
Methods for function getSplit
signature(x = "Strata")
Methods for functions getTarget and setTarget
signature(x = "VirtualContControl")
signature(x = "VirtualNAControl")
Methods for function getValues
signature(x = "SimResults")
signature(x = "Strata")
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
Examples
nc <- NAControl(NArate = 0.05)
getNArate(nc)
setNArate(nc, c(0.01, 0.03, 0.05, 0.07, 0.09))
getNArate(nc)
Method for aggregating simulation results
Description
Aggregate simulation results, i.e., split the data into subsets if applicable and compute summary statistics.
Usage
## S4 method for signature 'SimResults'
aggregate(x, select = NULL, FUN = mean, ...)
Arguments
x | the simulation results to be aggregated, i.e., an object of class |
select | a character vector specifying the columns to be aggregated. It must be a subset of the |
FUN | a scalar function to compute the summary statistics (defaults to |
... | additional arguments to be passed down to |
Value
If contamination or missing values have been inserted or the simulations have
been split into different domains, a data.frame is returned, otherwise
a vector.
Details
If contamination or missing values have been inserted or the simulations have
been split into different domains, aggregate is called
to compute the summary statistics for the respective subsets.
Otherwise, apply is called to compute the summary statistics
for each column specified by select.
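The two code paths described above can be sketched in base R. This is only an illustration on a made-up data frame, not the method's actual implementation; the column names `Epsilon`, `mean` and `median` and all values are hypothetical.

```r
# Hypothetical simulation results: two estimators over six runs,
# with the contamination level recorded for each run.
res <- data.frame(
  Epsilon = rep(c(0, 0.02), each = 3),
  mean    = c(1.0, 1.2, 0.8, 2.0, 2.2, 1.8),
  median  = c(1.0, 1.1, 0.9, 1.0, 1.2, 0.8)
)
select <- c("mean", "median")

# Subsets are present: summarize within each contamination level.
# The result is a data.frame with one row per subset.
by_eps <- aggregate(res[, select], by = res["Epsilon"], FUN = mean)

# No subsets: summarize each selected column.
# The result is a named numeric vector.
overall <- apply(res[, select], 2, mean)
```

Any scalar function, such as `sd` or `median`, could be supplied in place of `mean` in either call.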
Methods
x = "SimResults": aggregate simulation results.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
aggregate, apply,
"SimResults"
Examples
#### design-based simulation
set.seed(12345) # for reproducibility
data(eusilcP) # load data
## control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome", epsilon = 0.02,
    fun = function(x) x * 25)
## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, trim = 0.02))
}
## run simulation
results <- runSimulation(eusilcP,
    sc, contControl = cc, fun = sim)
## aggregate
aggregate(results) # means of results
aggregate(results, FUN = sd) # standard deviations of results
#### model-based simulation
set.seed(12345) # for reproducibility
## function for generating data
rgnorm <- function(n, means) {
    group <- sample(1:2, n, replace = TRUE)
    data.frame(group = group, value = rnorm(n) + means[group])
}
## control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
    dots = list(means = means))
cc <- DCARContControl(target = "value",
    epsilon = 0.02, dots = list(mean = 15))
## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$value),
      trimmed = mean(x$value, trim = 0.02),
      median = median(x$value))
}
## run simulation
results <- runSimulation(dc, nrep = 50,
    contControl = cc, design = "group", fun = sim)
## aggregate
aggregate(results) # means of results
aggregate(results, FUN = sd) # standard deviations of results
Run a simulation experiment on a cluster
Description
Generic function for running a simulation experiment on a cluster.
Usage
clusterRunSimulation(cl, x, setup, nrep, control,
contControl = NULL, NAControl = NULL,
design = character(), fun, ...,
SAE = FALSE)
Arguments
cl | a cluster as generated by |
x | a |
setup | an object of class |
nrep | a non-negative integer giving the number of repetitions of the simulation experiment (for model-based simulation, mixed simulation designs or simulation based on real data). |
control | a control object of class |
contControl | an object of a class inheriting from |
NAControl | an object of a class inheriting from |
design | a character vector specifying variables (columns) to be used for splitting the data into domains. The simulations, including contamination and the insertion of missing values (unless |
fun | a function to be applied in each simulation run. |
... | for |
SAE | a logical indicating whether small area estimation will be used in the simulation experiment. |
Details
Statistical simulation is embarrassingly parallel, hence computational
performance can be increased by parallel computing. Since version 0.5.0,
parallel computing in simFrame is implemented using the package
parallel, which is part of the R base distribution since version
2.14.0 and builds upon work done for the contributed packages
multicore and snow. Note that all objects and packages
required for the computations (including simFrame) need to be made
available on every worker process unless the worker processes are created by
forking (see makeCluster).
In order to prevent problems with random numbers and to ensure
reproducibility, random number streams should be used. With
parallel, random number streams can be created via the
function clusterSetRNGStream().
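As a minimal sketch of why random number streams help, the following base R code (package parallel only, no simFrame objects) seeds the workers twice with the same value and compares the draws. The helper name `draw_once`, the seed and the number of workers are arbitrary choices for this illustration.

```r
library(parallel)

# Draw three normal variates on each of two workers after seeding
# the workers' random number streams with the given seed.
draw_once <- function(seed) {
  cl <- makeCluster(2, type = "PSOCK")
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = seed)
  clusterEvalQ(cl, rnorm(3))
}

first  <- draw_once(12345)
second <- draw_once(12345)

# The same seed reproduces the draws on every worker ...
same_across_runs <- identical(first, second)
# ... while the two workers within one run use different streams.
same_across_workers <- identical(first[[1]], first[[2]])
```

Here `same_across_runs` should be `TRUE` and `same_across_workers` should be `FALSE`, which is the behavior a reproducible parallel simulation relies on.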
There are some requirements for slot fun of the control object
control. The function must return a numeric vector, or a list with
the two components values (a numeric vector) and add
(additional results of any class, e.g., statistical models). Note that the
latter is computationally slightly more expensive. A data.frame is
passed to fun in every simulation run. The corresponding argument
must be called x. If comparisons with the original data need to be
made, e.g., for evaluating the quality of imputation methods, the function
should have an argument called orig. If different domains are used
in the simulation, the indices of the current domain can be passed to the
function via an argument called domain.
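A function meeting these requirements might look as follows. This is a hedged sketch: the column name `value`, the naive mean imputation and the stand-alone call on made-up data are assumptions for illustration, and it is assumed that the rows of `x` and `orig` correspond to each other.

```r
# Hypothetical function for simulation runs. `x` is the data of the
# current run (possibly with missing values), `orig` the original data.
sim <- function(x, orig) {
  filled <- x$value
  filled[is.na(filled)] <- mean(filled, na.rm = TRUE)  # naive imputation
  fit <- lm(value ~ 1, data = data.frame(value = filled))
  list(
    # numeric results, returned in component `values`
    values = c(mean = mean(filled),
               rmse = sqrt(mean((filled - orig$value)^2))),
    # additional results of any class, returned in component `add`
    add = fit
  )
}

# Stand-alone call with made-up data, outside of any simulation.
orig <- data.frame(value = c(1, 2, 3, 4))
x <- orig
x$value[2] <- NA
out <- sim(x, orig)
```

If only the numeric results are needed, returning the plain vector instead of the list avoids the small overhead mentioned above.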
For small area estimation, the following points have to be kept in mind. The
slot design of control for splitting the data must be supplied
and the slot SAE must be set to TRUE. However, the data are
not actually split into the specified domains. Instead, the whole data set
(sample) is passed to fun. Also contamination and missing values are
added to the whole data (sample). Last, but not least, the function must
have a domain argument so that the current domain can be extracted
from the whole data (sample).
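The role of the domain argument can be illustrated with a small base R sketch. The columns `group` and `value`, the two estimates and the manual construction of `domain` are hypothetical; in an actual simulation, the framework supplies the indices.

```r
# Hypothetical function for small area estimation: the whole sample is
# passed as `x`, and `domain` holds the indices of the current domain.
sim <- function(x, domain) {
  overall <- mean(x$value)          # uses the whole sample
  direct  <- mean(x$value[domain])  # uses the current domain only
  c(direct = direct, overall = overall)
}

# Stand-alone call with made-up data.
x <- data.frame(group = c(1, 1, 2, 2), value = c(1, 3, 5, 7))
domain <- which(x$group == 2)
est <- sim(x, domain)
```

A real small area estimator would typically combine both sources of information, e.g., in a model fitted to the whole sample and evaluated for the current domain.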
In every simulation run, fun is evaluated using try. Hence
no results are lost if computations fail in any of the simulation runs.
Value
An object of class "SimResults".
Methods
cl = "ANY", x = "ANY", setup = "ANY", nrep = "ANY", control = "missing": convenience wrapper that allows the slots of control to be supplied as arguments.
cl = "ANY", x = "data.frame", setup = "missing", nrep = "numeric", control = "SimControl": run a simulation experiment based on real data with repetitions on a cluster.
cl = "ANY", x = "data.frame", setup = "SampleSetup", nrep = "missing", control = "SimControl": run a design-based simulation experiment with previously set up samples on a cluster.
cl = "ANY", x = "data.frame", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl": run a design-based simulation experiment on a cluster.
cl = "ANY", x = "VirtualDataControl", setup = "missing", nrep = "numeric", control = "SimControl": run a model-based simulation experiment with repetitions on a cluster.
cl = "ANY", x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "numeric", control = "SimControl": run a simulation experiment using a mixed simulation design with repetitions on a cluster.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
L'Ecuyer, P., Simard, R., Chen, E. and Kelton, W. (2002) An Object-Oriented Random-Number Package with Many Long Streams and Substreams. Operations Research, 50(6), 1073–1075.
Rossini, A., Tierney, L. and Li, N. (2007) Simple Parallel Statistical Computing in R. Journal of Computational and Graphical Statistics, 16(2), 399–420.
Tierney, L., Rossini, A. and Li, N. (2009) snow: A Parallel Computing
Framework for the R System. International Journal of Parallel
Programming, 37(1), 78–90.
See Also
makeCluster,
clusterSetRNGStream,
runSimulation, "SimControl",
"SimResults", simBwplot,
simDensityplot, simXyplot
Examples
## Not run:
## these examples require at least a dual core processor
## design-based simulation
data(eusilcP) # load data
# start cluster
cl <- makeCluster(2, type = "PSOCK")
# load package and data on workers
clusterEvalQ(cl, {
    library(simFrame)
    data(eusilcP)
})
# set up random number stream
clusterSetRNGStream(cl, iseed = "12345")
# control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome", epsilon = 0.02,
    fun = function(x) x * 25)
# function for simulation runs
sim <- function(x) {
    c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, trim = 0.02))
}
# export objects to workers
clusterExport(cl, c("sc", "cc", "sim"))
# run simulation on cluster
results <- clusterRunSimulation(cl, eusilcP,
    sc, contControl = cc, fun = sim)
# stop cluster
stopCluster(cl)
# explore results
head(results)
aggregate(results)
tv <- mean(eusilcP$eqIncome) # true population mean
plot(results, true = tv)
## model-based simulation
# start cluster
cl <- makeCluster(2, type = "PSOCK")
# load package on workers
clusterEvalQ(cl, library(simFrame))
# set up random number stream
clusterSetRNGStream(cl, iseed = "12345")
# function for generating data
rgnorm <- function(n, means) {
    group <- sample(1:2, n, replace = TRUE)
    data.frame(group = group, value = rnorm(n) + means[group])
}
# control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
    dots = list(means = means))
cc <- DCARContControl(target = "value",
    epsilon = 0.02, dots = list(mean = 15))
# function for simulation runs
sim <- function(x) {
    c(mean = mean(x$value),
      trimmed = mean(x$value, trim = 0.02),
      median = median(x$value))
}
# export objects to workers
clusterExport(cl, c("rgnorm", "means", "dc", "cc", "sim"))
# run simulation on cluster
results <- clusterRunSimulation(cl, dc, nrep = 100,
    contControl = cc, design = "group", fun = sim)
# stop cluster
stopCluster(cl)
# explore results
head(results)
aggregate(results)
plot(results, true = means)
## End(Not run)
Set up multiple samples on a cluster
Description
Generic function for setting up multiple samples on a cluster.
Usage
clusterSetup(cl, x, control, ...)
## S4 method for signature 'ANY,data.frame,SampleControl'
clusterSetup(cl, x, control)
Arguments
cl | a cluster as generated by |
x | the |
control | a control object inheriting from the virtual class |
... | if |
Details
A fundamental design principle of the framework in the case of design-based simulation studies is that the sampling procedure is separated from the simulation procedure. Two main advantages arise from setting up all samples in advance.
First, the repeated sampling reduces overall computation time dramatically in certain situations, since computer-intensive tasks like stratification need to be performed only once. This is particularly relevant for large population data. In close-to-reality simulation studies carried out in research projects in survey statistics, often up to 10000 samples are drawn from a population of millions of individuals with stratified sampling designs. For such large data sets, stratification takes a considerable amount of time and is a very memory-intensive task. If the samples are taken on-the-fly, i.e., in every simulation run one sample is drawn, the function to take the stratified sample would typically split the population into the different strata in each of the 10000 simulation runs. If all samples are drawn in advance, on the other hand, the population data need to be split only once and all 10000 samples can be taken from the respective strata together.
Second, the samples can be stored permanently, which simplifies the reproduction of simulation results and may help to maximize comparability of results obtained by different partners in a research project. In particular, this is useful for large population data, when complex sampling techniques may be very time-consuming. In research projects involving different partners, usually different groups investigate different kinds of estimators. If these groups use not only the same population data, but also the same previously set up samples, their results are highly comparable.
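The first advantage can be sketched in base R. The code below is not how simFrame sets up samples internally; it only shows the idea of splitting a population into strata once and then drawing all samples from the stored strata. The population, the stratum sizes and the helper `draw_one` are made up for this illustration.

```r
set.seed(1)  # arbitrary seed for the sketch

# Hypothetical population with a stratification variable.
pop <- data.frame(id = 1:1000,
                  region = sample(c("A", "B", "C"), 1000, replace = TRUE))
size <- c(A = 5, B = 3, C = 2)  # hypothetical sample sizes per stratum
k <- 4                          # number of samples to set up

# Split the population indices into strata once ...
strata <- split(seq_len(nrow(pop)), pop$region)

# ... then draw every sample from the stored strata.
draw_one <- function() {
  unlist(Map(function(idx, n) idx[sample.int(length(idx), n)],
             strata, size[names(strata)]),
         use.names = FALSE)
}
samples <- replicate(k, draw_one(), simplify = FALSE)
```

Storing `samples` (e.g., with `saveRDS()`) then gives every project partner the same indices, which corresponds to the second advantage.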
The computational performance of setting up multiple samples can be increased
by parallel computing. Since version 0.5.0, parallel computing in
simFrame is implemented using the package parallel, which is
part of the R base distribution since version 2.14.0 and builds upon work
done for the contributed packages multicore and snow. Note
that all objects and packages required for the computations (including
simFrame) need to be made available on every worker process unless the
worker processes are created by forking (see
makeCluster).
In order to prevent problems with random numbers and to ensure
reproducibility, random number streams should be used. With
parallel, random number streams can be created via the
function clusterSetRNGStream().
The control class "SampleControl" is highly flexible and allows
stratified sampling as well as sampling of whole groups rather than
individuals with a specified sampling method. Hence it is often sufficient
to implement the desired sampling method for the simple non-stratified case
to extend the existing framework. See "SampleControl"
for some restrictions on the argument names of such a function, which should
return a vector containing the indices of the sampled observations.
Nevertheless, for very complex sampling procedures, it is possible to define
a control class "MySampleControl" extending
"VirtualSampleControl", and the corresponding method
clusterSetup(cl, x, control) with signature 'ANY, data.frame,
MySampleControl'. In order to optimize computational performance, it is
necessary to efficiently set up multiple samples. Thereby the slot k
of "VirtualSampleControl" needs to be used to control the number of
samples, and the resulting object must be of class
"SampleSetup".
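For the simple non-stratified case, a user-defined sampling function could look like the following sketch of systematic sampling. The argument names `N` (population size) and `size` (sample size) are assumptions of this illustration; the restrictions that actually apply are documented under "SampleControl".

```r
# Systematic sampling: every (N / size)-th unit after a random start.
# Returns a vector with the indices of the sampled observations.
sys_sample <- function(N, size) {
  step  <- N / size
  start <- runif(1, min = 0, max = step)
  floor(start + step * (seq_len(size) - 1)) + 1
}

# Stand-alone call with an arbitrary seed.
set.seed(42)
idx <- sys_sample(N = 100, size = 10)
```

Because the function only handles the non-stratified case, stratification and group sampling would be left to the control object, as described above.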
Value
An object of class "SampleSetup".
Methods
cl = "ANY", x = "data.frame", control = "character": set up multiple samples on a cluster using a control class specified by the character string control. The slots of the control object may be supplied as additional arguments.
cl = "ANY", x = "data.frame", control = "missing": set up multiple samples on a cluster using a control object of class "SampleControl". Its slots may be supplied as additional arguments.
cl = "ANY", x = "data.frame", control = "SampleControl": set up multiple samples on a cluster as defined by the control object control.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
L'Ecuyer, P., Simard, R., Chen, E. and Kelton, W. (2002) An Object-Oriented Random-Number Package with Many Long Streams and Substreams. Operations Research, 50(6), 1073–1075.
Rossini, A., Tierney, L. and Li, N. (2007) Simple Parallel Statistical Computing in R. Journal of Computational and Graphical Statistics, 16(2), 399–420.
Tierney, L., Rossini, A. and Li, N. (2009) snow: A Parallel Computing
Framework for the R System. International Journal of Parallel
Programming, 37(1), 78–90.
See Also
makeCluster,
clusterSetRNGStream,
setup, draw,
"SampleControl", "TwoStageControl",
"VirtualSampleControl",
"SampleSetup"
Examples
## Not run:
# these examples require at least a dual core processor
# load data
data(eusilcP)
# start cluster
cl <- makeCluster(2, type = "PSOCK")
# load package and data on workers
clusterEvalQ(cl, {
library(simFrame)
data(eusilcP)
})
# set up random number stream
clusterSetRNGStream(cl, iseed = "12345")
# simple random sampling
srss <- clusterSetup(cl, eusilcP, size = 20, k = 4)
summary(srss)
draw(eusilcP[, c("id", "eqIncome")], srss, i = 1)
# group sampling
gss <- clusterSetup(cl, eusilcP, grouping = "hid", size = 10, k = 4)
summary(gss)
draw(eusilcP[, c("hid", "id", "eqIncome")], gss, i = 2)
# stratified simple random sampling
ssrss <- clusterSetup(cl, eusilcP, design = "region",
size = c(2, 5, 5, 3, 4, 5, 3, 5, 2), k = 4)
summary(ssrss)
draw(eusilcP[, c("id", "region", "eqIncome")], ssrss, i = 3)
# stratified group sampling
sgss <- clusterSetup(cl, eusilcP, design = "region",
grouping = "hid", size = c(2, 5, 5, 3, 4, 5, 3, 5, 2), k = 4)
summary(sgss)
draw(eusilcP[, c("hid", "id", "region", "eqIncome")], sgss, i = 4)
# stop cluster
stopCluster(cl)
## End(Not run)
Contaminate data
Description
Generic function for contaminating data.
Usage
contaminate(x, control, ...)
## S4 method for signature 'data.frame,ContControl'
contaminate(x, control, i)
Arguments
x | the data to be contaminated. |
control | a control object of a class inheriting from the virtual class |
i | an integer giving the element of the slot |
... | if |
Details
With the control classes implemented in simFrame, contamination is modeled as a two-step process. The first step is to select observations to be contaminated, the second is to model the distribution of the outliers.
In order to extend the framework by a user-defined control class
"MyContControl" (which must extend
"VirtualContControl"), a method
contaminate(x, control, i) with signature
'data.frame, MyContControl' needs to be implemented. In case the
contaminated observations need to be identified at a later stage of the
simulation, e.g., if conflicts with inserting missing values should be
avoided, a logical indicator variable ".contaminated" should be added
to the returned data set.
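The two-step process and the indicator variable can be sketched as a plain base R function. This is not a method for the contaminate generic and not the package's implementation; the function name `contaminate_sketch`, its arguments and the rounding of the number of outliers via `ceiling()` are assumptions made for illustration.

```r
# Hypothetical two-step contamination of one target column.
contaminate_sketch <- function(x, target, epsilon, fun) {
  n <- nrow(x)
  # Step 1: select the observations to be contaminated.
  ind <- sample.int(n, size = ceiling(epsilon * n))
  # Step 2: model the outliers, here as a function of the original values.
  x[ind, target] <- fun(x[ind, target])
  # Flag contaminated observations for later stages of a simulation.
  x$.contaminated <- seq_len(n) %in% ind
  x
}

# Stand-alone call with made-up data and an arbitrary seed.
set.seed(1)
dat <- data.frame(V1 = rnorm(10))
out <- contaminate_sketch(dat, "V1", epsilon = 0.2,
                          fun = function(v) v * 100)
```

A user-defined method for a class extending "VirtualContControl" would wrap logic of this kind, reading `target`, `epsilon` and similar settings from the control object.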
Value
A data.frame containing the contaminated data. In addition, the
column ".contaminated", which consists of logicals indicating the
contaminated observations, is added to the data.frame.
Methods
x = "data.frame", control = "character": contaminate data using a control class specified by the character string control. The slots of the control object may be supplied as additional arguments.
x = "data.frame", control = "ContControl": contaminate data as defined by the control object control.
x = "data.frame", control = "missing": contaminate data using a control object of class "ContControl". Its slots may be supplied as additional arguments.
Note
Since version 0.3, contaminate no longer checks whether the auxiliary
variable with probability weights is numeric and contains only finite positive
values (sample still throws an error in these cases). This check has
been removed to improve computational performance in simulation studies.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
Alfons, A., Templ, M. and Filzmoser, P. (2010) Contamination Models in the R Package simFrame for Statistical Simulation. In Aivazian, S., Filzmoser, P. and Kharin, Y. (editors) Computer Data Analysis and Modeling: Complex Stochastic Data and Systems, volume 2, 178–181. Minsk. ISBN 978-985-476-848-9.
Béguin, C. and Hulliger, B. (2008) The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology, 34(1), 91–103.
Hulliger, B. and Schoch, T. (2009) Robust Multivariate Imputation with Survey Data. 57th Session of the International Statistical Institute, Durban.
See Also
"DCARContControl", "DARContControl",
"ContControl", "VirtualContControl"
Examples
## distributed completely at random
data(eusilcP)
sam <- draw(eusilcP[, c("id", "eqIncome")], size = 20)
# using a control object
dcarc <- ContControl(target = "eqIncome", epsilon = 0.05,
dots = list(mean = 5e+05, sd = 10000), type = "DCAR")
contaminate(sam, dcarc)
# supply slots of control object as arguments
contaminate(sam, target = "eqIncome", epsilon = 0.05,
dots = list(mean = 5e+05, sd = 10000))
## distributed at random
foo <- generate(size = 10, distribution = rnorm,
dots = list(mean = 0, sd = 2))
# using a control object
darc <- DARContControl(target = "V1",
epsilon = 0.2, fun = function(x) x * 100)
contaminate(foo, darc)
# supply slots of control object as arguments
contaminate(foo, "DARContControl", target = "V1",
epsilon = 0.2, fun = function(x) x * 100)
Draw a sample
Description
Generic function for drawing a sample.
Usage
draw(x, setup, ...)
## S4 method for signature 'data.frame,SampleSetup'
draw(x, setup, i = 1)
## S4 method for signature 'data.frame,VirtualSampleControl'
draw(x, setup)
Arguments
x | the data to sample from. |
setup | an object of class |
i | an integer specifying which one of the previously set up samples should be drawn. |
... | if |
Value
A data.frame containing the sampled observations. In addition, the
column ".weight", which consists of the sample weights, is added to
the data.frame.
Methods
x = "data.frame", setup = "character": draw a sample using a control class specified by the character string setup. The slots of the control object may be supplied as additional arguments.
x = "data.frame", setup = "missing": draw a sample using a control object of class "SampleControl". Its slots may be supplied as additional arguments.
x = "data.frame", setup = "SampleSetup": draw a previously set up sample.
x = "data.frame", setup = "VirtualSampleControl": draw a sample using a control object inheriting from the virtual class "VirtualSampleControl".
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
setup, "SampleSetup",
"SampleControl", "TwoStageControl",
"VirtualSampleControl"
Examples
## load data
data(eusilcP)
## simple random sampling
draw(eusilcP[, c("id", "eqIncome")], size = 20)
## group sampling
draw(eusilcP[, c("hid", "id", "eqIncome")],
grouping = "hid", size = 10)
## stratified simple random sampling
draw(eusilcP[, c("id", "region", "eqIncome")],
design = "region", size = c(2, 5, 5, 3, 4, 5, 3, 5, 2))
## stratified group sampling
draw(eusilcP[, c("hid", "id", "region", "eqIncome")],
design = "region", grouping = "hid",
size = c(2, 5, 5, 3, 4, 5, 3, 5, 2))
Synthetic EU-SILC data
Description
This data set is synthetically generated from real Austrian EU-SILC (European Union Statistics on Income and Living Conditions) data.
Usage
data(eusilcP)
Format
A data.frame with 58 654 observations on the following 28 variables:
hid: integer; the household ID.
region: factor; the federal state in which the household is located (levels Burgenland, Carinthia, Lower Austria, Salzburg, Styria, Tyrol, Upper Austria, Vienna and Vorarlberg).
hsize: integer; the number of persons in the household.
eqsize: numeric; the equivalized household size according to the modified OECD scale.
eqIncome: numeric; a simplified version of the equivalized household income.
pid: integer; the personal ID.
id: the household ID combined with the personal ID. The first five digits represent the household ID, the last two digits the personal ID (both with leading zeros).
age: integer; the person's age.
gender: factor; the person's gender (levels male and female).
ecoStat: factor; the person's economic status (levels 1 = working full time, 2 = working part time, 3 = unemployed, 4 = pupil, student, further training or unpaid work experience or in compulsory military or community service, 5 = in retirement or early retirement or has given up business, 6 = permanently disabled or/and unfit to work or other inactive person, 7 = fulfilling domestic tasks and care responsibilities).
citizenship: factor; the person's citizenship (levels AT, EU and Other).
py010n: numeric; employee cash or near cash income (net).
py050n: numeric; cash benefits or losses from self-employment (net).
py090n: numeric; unemployment benefits (net).
py100n: numeric; old-age benefits (net).
py110n: numeric; survivor's benefits (net).
py120n: numeric; sickness benefits (net).
py130n: numeric; disability benefits (net).
py140n: numeric; education-related allowances (net).
hy040n: numeric; income from rental of a property or land (net).
hy050n: numeric; family/children related allowances (net).
hy070n: numeric; housing allowances (net).
hy080n: numeric; regular inter-household cash transfer received (net).
hy090n: numeric; interest, dividends, profit from capital investments in unincorporated business (net).
hy110n: numeric; income received by people aged under 16 (net).
hy130n: numeric; regular inter-household cash transfer paid (net).
hy145n: numeric; repayments/receipts for tax adjustment (net).
main: logical; indicates the main income holder (i.e., the person with the highest income) of each household.
Details
The data set is used as population data in some of the examples in package
simFrame. Note that it is included for illustrative purposes only. It
consists of 25 000 households, hence it does not represent the true population
sizes of Austria and its regions.
Only a few of the large number of variables in the original survey are included
in this example data set. Some variable names are different from the
standardized names used by the statistical agencies, as the latter are rather
cryptic codes. Furthermore, the variables hsize, eqsize,
eqIncome and age are not included in the standardized format of
EU-SILC data, but have been derived from other variables for convenience.
Moreover, some very sparse income components were not included in the
generation of this synthetic data set. Thus the equivalized household income is
computed from the available income components.
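As a rough illustration of how derived variables such as eqsize and eqIncome relate to the others, the sketch below applies the modified OECD scale (weight 1 for the first adult, 0.5 for each further household member aged 14 or over, 0.3 for each child under 14) and divides a household income by the result. The helper name eqsize_oecd, the toy ages and the income figure are hypothetical; this is not the procedure that was used to generate eusilcP.

```r
# Modified OECD scale: 1 for the first adult, 0.5 for every further
# household member aged 14 or over, 0.3 for every child under 14.
# Assumption: the household has at least one member aged 14 or over.
eqsize_oecd <- function(age) {
  n_older <- sum(age >= 14)
  n_child <- sum(age < 14)
  1 + 0.5 * (n_older - 1) + 0.3 * n_child
}

# hypothetical household: two adults and two children
ages <- c(40, 38, 10, 5)
hh_eqsize <- eqsize_oecd(ages)        # 1 + 0.5 + 2 * 0.3

# hypothetical total household income, equivalized
hh_income <- 42000
hh_eqincome <- hh_income / hh_eqsize
```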
Source
This is a synthetic data set based on Austrian EU-SILC data from 2006. The original sample was provided by Statistics Austria.
References
Eurostat (2004) Description of target variables: Cross-sectional and longitudinal. EU-SILC 065/04, Eurostat.
Examples
data(eusilcP)
summary(eusilcP)
strata <- stratify(eusilcP, c("region", "gender"))
summary(strata)
Generate data
Description
Generic function for generating data based on a (distribution) model.
Usage
generate(control, ...)
## S4 method for signature 'DataControl'
generate(control)
Arguments
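control |
a control object inheriting from the virtual class "VirtualDataControl", or a character string specifying such a control class (the default being "DataControl"). |
... |
if control is a character string or missing, the slots of the control object may be supplied as additional arguments. |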
Details
The control class "DataControl" is quite simple but general. For
user-defined data generation, it often suffices to implement a function and
use it as the distribution slot in the "DataControl" object.
See "DataControl" for some requirements for such a
function.
However, if more specialized data generation models are required, the
framework can be extended by defining a control class "MyDataControl"
extending "VirtualDataControl" and the corresponding
method generate(control) with signature 'MyDataControl'. If,
e.g., a specific distribution or mixture of distributions is frequently used
in simulation experiments, a distinct control class may be more convenient
for the user.
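A minimal sketch of the first route, modelled on the rgnorm pattern used in the examples of this manual: a user-defined generator that takes the number of observations as its first argument and returns a data.frame. The function name rmixnorm and its arguments means and prob are hypothetical, and the exact requirements on such a function are those documented for "DataControl", not the ones assumed here.

```r
# Hypothetical generator: mixture of normal distributions.
# The first argument is the number of observations; the result is a
# data.frame with the component label and the generated value.
rmixnorm <- function(n, means, prob) {
  comp <- sample(seq_along(means), n, replace = TRUE, prob = prob)
  data.frame(component = comp, value = rnorm(n, mean = means[comp]))
}

set.seed(1)
x <- rmixnorm(10, means = c(0, 5), prob = c(0.9, 0.1))

# With simFrame attached, the function could be supplied in the same way
# as rnorm or rgnorm in the Examples sections:
# dc <- DataControl(size = 10, distribution = rmixnorm,
#                   dots = list(means = c(0, 5), prob = c(0.9, 0.1)))
# generate(dc)
```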
Value
A data.frame.
Methods
control = "character"
generate data using a control class specified by the character string control. The slots of the control object may be supplied as additional arguments.
control = "missing"
generate data using a control object of class "DataControl". Its slots may be supplied as additional arguments.
control = "DataControl"
generate data as defined by the control object control.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
"DataControl", "VirtualDataControl"
Examples
# using a control object
dc <- DataControl(size = 10, distribution = rnorm,
dots = list(mean = 0, sd = 2))
generate(dc)
# supply slots of control object as arguments
generate(size = 10, distribution = rnorm,
dots = list(mean = 0, sd = 2))
Methods for returning the first parts of an object
Description
Return the first parts of an object.
Usage
## S4 method for signature 'SampleSetup'
head(x, k = 6, n = 6, ...)
## S4 method for signature 'SimControl'
head(x)
## S4 method for signature 'SimResults'
head(x, ...)
## S4 method for signature 'Strata'
head(x, ...)
## S4 method for signature 'VirtualContControl'
head(x)
## S4 method for signature 'VirtualDataControl'
head(x)
## S4 method for signature 'VirtualNAControl'
head(x)
## S4 method for signature 'VirtualSampleControl'
head(x)
Arguments
x |
an object. |
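k |
for objects of class "SampleSetup", the number of set up samples to be kept. |
n |
for objects of class "SampleSetup", the number of indices to be kept in each of the set up samples. |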
... |
additional arguments to be passed down to methods. |
Value
An object of the same class as x, but in general smaller. See the
“Methods” section below for details.
Methods
signature(x = "SampleSetup")
returns the first parts of set up samples. The first n indices of each of the first k set up samples are kept.
signature(x = "SimControl")
currently returns the object itself.
signature(x = "SimResults")
returns the first parts of simulation results. The method of head for the data.frame in slot values is thereby called.
signature(x = "Strata")
returns the first parts of strata information. The method of head for the vector in slot values is thereby called and the slots split and size are adapted accordingly.
signature(x = "VirtualContControl")
currently returns the object itself.
signature(x = "VirtualDataControl")
currently returns the object itself.
signature(x = "VirtualNAControl")
currently returns the object itself.
signature(x = "VirtualSampleControl")
currently returns the object itself.
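The effect of k and n for set up samples can be pictured with plain lists. The sketch below uses a list of index vectors as a stand-in for the samples stored in a "SampleSetup" object; the object samples and the sizes are made up for illustration, and the actual method operates on the S4 object rather than on a bare list.

```r
set.seed(1)
# stand-in for set up samples: 50 index vectors of length 1000
samples <- replicate(50, sample.int(25000, 1000), simplify = FALSE)

k <- 5    # keep the first k samples
n <- 10   # keep the first n indices of each kept sample
first <- lapply(head(samples, k), head, n)
```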
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
head, "SampleSetup",
"SimResults", "Strata"
Examples
## load data
data(eusilcP)
## class "SampleSetup"
# set up samples using group sampling
set <- setup(eusilcP, grouping = "hid", size = 1000, k = 50)
summary(set)
# get the first 10 indices of each of the first 5 samples
head(set, k = 5, n = 10)
## class "Strata"
# stratify the population by region
strata <- stratify(eusilcP, "region")
summary(strata)
# get strata information for the first 10 observations
head(strata, 10)
Inclusion probabilities
Description
Get the first-order inclusion probabilities from a vector of probability weights.
Usage
inclusionProb(prob, size)
Arguments
prob |
a numeric vector of non-negative probability weights. |
size |
a non-negative integer giving the sample size. |
Value
A numeric vector of the first-order inclusion probabilities.
Note
This is a faster C++ implementation of
inclusionprobabilities from package sampling.
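For orientation, the following base R sketch shows the usual way first-order inclusion probabilities are derived from size weights: they are proportional to the weights and scaled to sum to the sample size, and any unit whose value would reach 1 is selected with certainty while the remaining units are rescaled. The helper incl_prob_sketch is hypothetical and written under that assumption; it is not the package's C++ code.

```r
# Sketch: inclusion probabilities proportional to the weights w for a
# sample of size n, capping at 1 and rescaling the remaining units.
incl_prob_sketch <- function(w, n) {
  stopifnot(is.numeric(w), all(w >= 0), sum(w) > 0, n <= sum(w > 0))
  p <- n * w / sum(w)
  certain <- rep(FALSE, length(w))
  repeat {
    new <- !certain & p >= 1
    if (!any(new)) break
    certain <- certain | new
    p[certain] <- 1
    rest <- !certain
    if (sum(w[rest]) > 0) {
      # distribute the remaining sample size over the other units
      p[rest] <- (n - sum(certain)) * w[rest] / sum(w[rest])
    }
  }
  p
}

# one dominant unit is selected with certainty
p <- incl_prob_sketch(c(1, 1, 1, 10), 2)
```

In this toy call the last unit receives probability 1 and the other three share the remaining sample size equally, so the probabilities still sum to the sample size.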
Author(s)
Andreas Alfons
See Also
setup, "SampleSetup"
Examples
pweights <- sample(1:5, 25, replace = TRUE)
inclusionProb(pweights, 10)
Methods for getting the length of an object
Description
Get the length of an object.
Usage
## S4 method for signature 'SampleSetup'
length(x)
## S4 method for signature 'VirtualContControl'
length(x)
## S4 method for signature 'VirtualNAControl'
length(x)
## S4 method for signature 'VirtualSampleControl'
length(x)
Arguments
x |
an object. |
Value
An integer giving the length of the object. See the “Methods” section below for details.
Methods
signature(x = "SampleSetup")
get the number of set up samples.
signature(x = "VirtualContControl")
get the number of contamination levels to be used.
signature(x = "VirtualNAControl")
get the number of missing value rates to be used (the length in case of a vector in slot NArate or the number of rows in case of a matrix).
signature(x = "VirtualSampleControl")
get the number of samples to be set up.
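The vector-versus-matrix rule for missing value rates can be written down in one line of base R. The helper n_rates below is hypothetical and only mirrors the description above; it does not touch the NArate slot of an actual control object.

```r
# Sketch: number of missing value rate settings, given the rates as a
# vector (one setting per element) or as a matrix (one setting per row).
n_rates <- function(NArate) {
  if (is.matrix(NArate)) nrow(NArate) else length(NArate)
}

n_vec <- n_rates(c(0.1, 0.2, 0.3))
n_mat <- n_rates(matrix(c(0.1, 0.2, 0.05, 0.1), nrow = 2))
```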
Author(s)
Andreas Alfons
Examples
## load data
data(eusilcP)
## class "SampleSetup"
# set up samples using group sampling
set <- setup(eusilcP, grouping = "hid", size = 1000, k = 50)
summary(set)
length(set)
## class "ContControl"
cc <- ContControl(target = "eqIncome",
epsilon = c(0, 0.0025, 0.005, 0.0075, 0.01),
dots = list(mean = 5e+05, sd = 10000))
length(cc)
## class "NAControl"
nc <- NAControl(target = "eqIncome", NArate = c(0.1, 0.2, 0.3))
length(nc)
Plot simulation results
Description
Plot simulation results. A suitable plot function is selected automatically, depending on the structure of the results.
Usage
## S4 method for signature 'SimResults,missing'
plot(x, y , ...)
Arguments
x |
the simulation results. |
y |
not used. |
... |
further arguments to be passed to the selected plot function. |
Value
An object of class "trellis". The
update method can be used to update
components of the object and the print
method (usually called by default) will plot it on an appropriate plotting
device.
Details
The results of simulation experiments with at most one contamination level and at most one missing value rate are visualized by (conditional) box-and-whisker plots. For simulations involving different contamination levels or missing value rates, the average results are plotted against the contamination levels or missing value rates.
Methods
x = "SimResults", y = "missing"
plot simulation results.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
simBwplot, simDensityplot,
simXyplot, "SimResults"
Examples
#### design-based simulation
set.seed(12345) # for reproducibility
data(eusilcP) # load data
## control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome", epsilon = 0.02,
fun = function(x) x * 25)
## function for simulation runs
sim <- function(x) {
c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, 0.02))
}
## run simulation
results <- runSimulation(eusilcP,
sc, contControl = cc, fun = sim)
## plot results
tv <- mean(eusilcP$eqIncome) # true population mean
plot(results, true = tv)
#### model-based simulation
set.seed(12345) # for reproducibility
## function for generating data
rgnorm <- function(n, means) {
group <- sample(1:2, n, replace=TRUE)
data.frame(group=group, value=rnorm(n) + means[group])
}
## control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
dots = list(means = means))
cc <- DCARContControl(target = "value",
epsilon = 0.02, dots = list(mean = 15))
## function for simulation runs
sim <- function(x) {
c(mean = mean(x$value),
trimmed = mean(x$value, trim = 0.02),
median = median(x$value))
}
## run simulation
results <- runSimulation(dc, nrep = 50,
contControl = cc, design = "group", fun = sim)
## plot results
plot(results, true = means)
Run a simulation experiment
Description
Generic function for running a simulation experiment.
Usage
runSimulation(x, setup, nrep, control, contControl = NULL,
NAControl = NULL, design = character(), fun, ...,
SAE = FALSE)
runSim(...)
Arguments
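x |
a data.frame (for design-based simulation or simulation based on real data) or a control object for data generation inheriting from "VirtualDataControl" (for model-based simulation or mixed simulation designs). |
setup |
an object of class "SampleSetup" containing previously set up samples, or a control object for setting up samples inheriting from "VirtualSampleControl". |
nrep |
a non-negative integer giving the number of repetitions of the simulation experiment (for model-based simulation, mixed simulation designs or simulation based on real data). |
control |
a control object of class "SimControl". |
contControl |
an object of a class inheriting from "VirtualContControl". |
NAControl |
an object of a class inheriting from "VirtualNAControl". |
design |
a character vector specifying variables (columns) to be used for splitting the data into domains. The simulations, including contamination and the insertion of missing values (unless SAE is TRUE), are then performed on every domain. |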
fun |
a function to be applied in each simulation run. |
... |
for |
SAE |
a logical indicating whether small area estimation will be used in the simulation experiment. |
Details
For convenience, the slots of control may be supplied as arguments.
There are some requirements for slot fun of the control object
control. The function must return a numeric vector, or a list with
the two components values (a numeric vector) and add
(additional results of any class, e.g., statistical models). Note that the
latter is computationally slightly more expensive. A data.frame is
passed to fun in every simulation run. The corresponding argument
must be called x. If comparisons with the original data need to be
made, e.g., for evaluating the quality of imputation methods, the function
should have an argument called orig. If different domains are used
in the simulation, the indices of the current domain can be passed to the
function via an argument called domain.
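A minimal sketch of a function meeting these requirements, here for judging a simple mean imputation against the original data. It returns a list with the components values and add and takes arguments named x and orig. The function name sim_impute and the toy data are hypothetical, and the sketch assumes that orig holds the same rows as x before missing values were inserted.

```r
# Sketch of a simulation function that compares imputed values with the
# original data; returns a list with components 'values' and 'add'.
sim_impute <- function(x, orig) {
  miss <- is.na(x$value)
  imputed <- x$value
  imputed[miss] <- mean(x$value, na.rm = TRUE)   # simple mean imputation
  err <- imputed[miss] - orig$value[miss]
  list(values = c(rmse = sqrt(mean(err^2)), nmiss = sum(miss)),
       add = summary(imputed))
}

# direct call on toy data, outside of any simulation run
set.seed(1)
orig <- data.frame(value = rnorm(100))
x <- orig
x$value[sample.int(100, 20)] <- NA
res <- sim_impute(x, orig)
```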
For small area estimation, the following points have to be kept in mind. The
design for splitting the data must be supplied and SAE
must be set to TRUE. However, the data are not actually split into
the specified domains. Instead, the whole data set (sample) is passed to
fun. Also contamination and missing values are added to the whole
data (sample). Last, but not least, the function must have a domain
argument so that the current domain can be extracted from the whole data
(sample).
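The role of the domain argument can be sketched in base R as follows: the function always receives the whole data set and uses domain only to pick out the current area. The function sim_domain and the toy data are hypothetical; whether the framework passes integer or logical indices is not assumed here, since the subsetting below works with either.

```r
# Sketch: the whole data set is passed as x, and 'domain' holds the
# indices of the current domain.
sim_domain <- function(x, domain) {
  c(mean = mean(x$value[domain]))
}

set.seed(1)
x <- data.frame(group = rep(1:2, each = 50), value = rnorm(100))

# emulate the loop over domains by hand
domains <- split(seq_len(nrow(x)), x$group)
est <- sapply(domains, function(d) sim_domain(x, domain = d))
```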
In every simulation run, fun is evaluated using try. Hence
no results are lost if computations fail in any of the simulation runs.
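The idea behind evaluating fun with try can be illustrated with a hand-written loop. The sketch below is not the internal code of runSimulation; the function risky and the forced failure in the third run are made up to show that the remaining runs still deliver results.

```r
# Hypothetical simulation function that fails on missing values
risky <- function(x) {
  if (anyNA(x$value)) stop("missing values not supported")
  c(mean = mean(x$value))
}

set.seed(1)
runs <- lapply(1:5, function(i) {
  x <- data.frame(value = rnorm(20))
  if (i == 3) x$value[1] <- NA      # force a failure in run 3
  try(risky(x), silent = TRUE)      # an error becomes a "try-error" object
})

# keep the successful runs and combine their results
failed <- vapply(runs, inherits, logical(1), what = "try-error")
results <- do.call(rbind, runs[!failed])
```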
runSim is a wrapper for runSimulation.
Value
An object of class "SimResults".
Methods
x = "ANY", setup = "ANY", nrep = "ANY", control = "missing"
convenience wrapper that allows the slots of control to be supplied as arguments.
x = "data.frame", setup = "missing", nrep = "missing", control = "SimControl"
run a simulation experiment based on real data without repetitions (probably useless, but for completeness).
x = "data.frame", setup = "missing", nrep = "numeric", control = "SimControl"
run a simulation experiment based on real data with repetitions.
x = "data.frame", setup = "SampleSetup", nrep = "missing", control = "SimControl"
run a design-based simulation experiment with previously set up samples.
x = "data.frame", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"
run a design-based simulation experiment.
x = "VirtualDataControl", setup = "missing", nrep = "missing", control = "SimControl"
run a model-based simulation experiment without repetitions (probably useless, but for completeness).
x = "VirtualDataControl", setup = "missing", nrep = "numeric", control = "SimControl"
run a model-based simulation experiment with repetitions.
x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"
run a simulation experiment using a mixed simulation design without repetitions (probably useless, but for completeness).
x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "numeric", control = "SimControl"
run a simulation experiment using a mixed simulation design with repetitions.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
"SimControl", "SimResults",
simBwplot, simDensityplot, simXyplot
Examples
#### design-based simulation
set.seed(12345) # for reproducibility
data(eusilcP) # load data
## control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome", epsilon = 0.02,
fun = function(x) x * 25)
## function for simulation runs
sim <- function(x) {
c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, 0.02))
}
## run simulation and explore results
results <- runSimulation(eusilcP,
sc, contControl = cc, fun = sim)
head(results)
aggregate(results)
tv <- mean(eusilcP$eqIncome) # true population mean
plot(results, true = tv)
#### model-based simulation
set.seed(12345) # for reproducibility
## function for generating data
rgnorm <- function(n, means) {
group <- sample(1:2, n, replace=TRUE)
data.frame(group=group, value=rnorm(n) + means[group])
}
## control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
dots = list(means = means))
cc <- DCARContControl(target = "value",
epsilon = 0.02, dots = list(mean = 15))
## function for simulation runs
sim <- function(x) {
c(mean = mean(x$value),
trimmed = mean(x$value, trim = 0.02),
median = median(x$value))
}
## run simulation and explore results
results <- runSimulation(dc, nrep = 50,
contControl = cc, design = "group", fun = sim)
head(results)
aggregate(results)
plot(results, true = means)
Random sampling
Description
Functions for random sampling.
Usage
srs(N, size, replace = FALSE)
ups(N, size, prob, replace = FALSE)
brewer(prob, eps = 1e-06)
midzuno(prob, eps = 1e-06)
tille(prob, eps = 1e-06)
Arguments
N |
a non-negative integer giving the number of observations from which to sample. |
size |
a non-negative integer giving the number of observations to sample. |
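prob |
for ups, a numeric vector giving the probability weights; for brewer, midzuno and tille, a numeric vector of inclusion probabilities. |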
replace |
a logical indicating whether sampling should be performed with or without replacement. |
eps |
a numeric control value giving the desired accuracy. |
Details
srs and ups are wrappers for simple random sampling and
unequal probability sampling, respectively. Both functions make use of
sample.
brewer, midzuno and tille perform Brewer's, Midzuno's and
Tillé's method, respectively, for unequal probability sampling
without replacement and fixed sample size.
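As a rough picture of what such wrappers look like, the sketch below builds two thin functions on top of sample.int from base R. The names srs_sketch and ups_sketch are hypothetical, and the actual srs and ups may differ in argument checking and other details.

```r
# Sketch: simple random sampling of 'size' indices out of 1..N
srs_sketch <- function(N, size, replace = FALSE) {
  sample.int(N, size, replace = replace)
}

# Sketch: unequal probability sampling with probability weights 'prob'
ups_sketch <- function(N, size, prob, replace = FALSE) {
  sample.int(N, size, replace = replace, prob = prob)
}

set.seed(1)
s1 <- srs_sketch(10, 5)
s2 <- ups_sketch(5, 10, prob = 1:5, replace = TRUE)
```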
Value
An integer vector giving the indices of the sampled observations.
Note
brewer, midzuno and tille are faster C++ implementations
of UPbrewer, UPmidzuno and UPtille, respectively, from
package sampling.
Author(s)
Andreas Alfons
References
Brewer, K. (1975) A simple procedure for sampling πpswor. Australian Journal of Statistics, 17(3), 166–172.
Midzuno, H. (1952) On the sampling system with probability proportional to sum of size. Annals of the Institute of Statistical Mathematics, 3(2), 99–107.
Tillé, Y. (1996) An elimination procedure of unequal probability sampling without replacement. Biometrika, 83(1), 238–241.
Deville, J.-C. and Tillé, Y. (1998) Unequal probability sampling without replacement through a splitting method. Biometrika, 85(1), 89–101.
See Also
"SampleControl", "TwoStageControl",
setup, inclusionProb, sample
Examples
## simple random sampling
# without replacement
srs(10, 5)
# with replacement
srs(5, 10, replace = TRUE)
## unequal probability sampling
# without replacement
ups(10, 5, prob = 1:10)
# with replacement
ups(5, 10, prob = 1:5, replace = TRUE)
## Brewer, Midzuno and Tille sampling
# define inclusion probabilities
prob <- c(0.2,0.7,0.8,0.5,0.4,0.4)
# Brewer sampling
brewer(prob)
# Midzuno sampling
midzuno(prob)
# Tille sampling
tille(prob)
Set missing values
Description
Generic function for inserting missing values into data.
Usage
setNA(x, control, ...)
## S4 method for signature 'data.frame,NAControl'
setNA(x, control, i)
Arguments
x |
the data in which missing values should be inserted. |
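control |
a control object inheriting from the virtual class "VirtualNAControl", or a character string specifying such a control class (the default being "NAControl"). |
i |
an integer giving the element or row of the slot NArate of control to be used as the missing value rate(s). |
... |
if control is a character string or missing, the slots of the control object may be supplied as additional arguments. |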
Details
In order to extend the framework by a user-defined control class
"MyNAControl" (which must extend
"VirtualNAControl"), a method
setNA(x, control, i) with signature 'data.frame, MyNAControl'
needs to be implemented.
Value
A data.frame containing the data with missing values.
Methods
x = "data.frame", control = "character"
set missing values using a control class specified by the character string control. The slots of the control object may be supplied as additional arguments.
x = "data.frame", control = "missing"
set missing values using a control object of class "NAControl". Its slots may be supplied as additional arguments.
x = "data.frame", control = "NAControl"
set missing values as defined by the control object control.
Note
Since version 0.3, setNA no longer checks if auxiliary variable(s)
with probability weights are numeric and contain only finite positive values
(sample still throws an error in these cases). This has been
removed to improve computational performance in simulation studies.
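To make the role of the probability weights concrete, here is a base R sketch of inserting missing values either completely at random or with selection probabilities proportional to an auxiliary variable. The helper set_na_sketch is hypothetical, the rounding of the number of missing values is an assumption, and the sketch makes no attempt to reproduce the behaviour of setNA beyond the general idea.

```r
# Sketch: set a share 'rate' of the values in column 'target' to NA.
# Without 'aux' every observation is equally likely to be selected;
# with 'aux' the selection probabilities are proportional to that column.
set_na_sketch <- function(x, target, rate, aux = NULL) {
  n <- nrow(x)
  prob <- if (is.null(aux)) NULL else x[[aux]]
  idx <- sample.int(n, round(rate * n), prob = prob)  # assumed rounding
  x[idx, target] <- NA
  x
}

set.seed(1)
sam <- data.frame(age = sample(20:80, 20, replace = TRUE),
                  eqIncome = rlnorm(20, meanlog = 10))
mcar <- set_na_sketch(sam, "eqIncome", 0.2)               # completely at random
mar  <- set_na_sketch(sam, "eqIncome", 0.2, aux = "age")  # depends on age
```

Non-numeric or non-positive weights in the auxiliary column would make sample.int fail, which corresponds to the error mentioned in the note above.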
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
"NAControl", "VirtualNAControl"
Examples
data(eusilcP)
eusilcP$age[eusilcP$age < 0] <- 0 # this actually occurs
sam <- draw(eusilcP[, c("id", "age", "eqIncome")], size = 20)
## using control objects
# missing completely at random
mcarc <- NAControl(target = "eqIncome", NArate = 0.2)
setNA(sam, mcarc)
# missing at random
marc <- NAControl(target = "eqIncome", NArate = 0.2, aux = "age")
setNA(sam, marc)
# missing not at random
mnarc <- NAControl(target = "eqIncome",
NArate = 0.2, aux = "eqIncome")
setNA(sam, mnarc)
## supply slots of control object as arguments
# missing completely at random
setNA(sam, target = "eqIncome", NArate = 0.2)
# missing at random
setNA(sam, target = "eqIncome", NArate = 0.2, aux = "age")
# missing not at random
setNA(sam, target = "eqIncome", NArate = 0.2, aux = "eqIncome")
Set up multiple samples
Description
Generic function for setting up multiple samples.
Usage
setup(x, control, ...)
## S4 method for signature 'data.frame,SampleControl'
setup(x, control)
Arguments
x |
the data to sample from. |
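control |
a control object inheriting from the virtual class "VirtualSampleControl", or a character string specifying such a control class (the default being "SampleControl"). |
... |
if control is a character string or missing, the slots of the control object may be supplied as additional arguments. |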
Details
A fundamental design principle of the framework in the case of design-based simulation studies is that the sampling procedure is separated from the simulation procedure. Two main advantages arise from setting up all samples in advance.
First, setting up all samples in advance can reduce overall computation time dramatically, since computer-intensive tasks like stratification need to be performed only once. This is particularly relevant for large population data. In close-to-reality simulation studies carried out in research projects in survey statistics, often up to 10000 samples are drawn from a population of millions of individuals with stratified sampling designs. For such large data sets, stratification takes a considerable amount of time and is a very memory-intensive task. If the samples are taken on-the-fly, i.e., in every simulation run one sample is drawn, the function to take the stratified sample would typically split the population into the different strata in each of the 10000 simulation runs. If all samples are drawn in advance, on the other hand, the population data need to be split only once and all 10000 samples can be taken from the respective strata together.
Second, the samples can be stored permanently, which simplifies the reproduction of simulation results and may help to maximize comparability of results obtained by different partners in a research project. In particular, this is useful for large population data, when complex sampling techniques may be very time-consuming. In research projects involving different partners, usually different groups investigate different kinds of estimators. If these groups use not only the same population data, but also the same previously set up samples, their results are highly comparable.
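Both advantages can be illustrated with a small base R sketch: the population is split into strata once, all samples are then drawn from the precomputed strata, and the resulting list of index vectors is stored on disk. The toy population, the stratum sizes and the helper draw_one are hypothetical, and the code does not use the "SampleSetup" class.

```r
set.seed(12345)
# toy population with three strata
pop <- data.frame(id = 1:1000,
                  region = sample(c("A", "B", "C"), 1000, replace = TRUE))
sizes <- c(A = 5, B = 10, C = 5)   # sample size per stratum

# expensive step, performed only once: split row indices by stratum
strata <- split(seq_len(nrow(pop)), pop$region)

# draw one stratified sample from the precomputed strata
draw_one <- function() {
  unlist(Map(function(idx, n) idx[sample.int(length(idx), n)],
             strata[names(sizes)], sizes), use.names = FALSE)
}

# set up all samples in advance
k <- 100
samples <- replicate(k, draw_one(), simplify = FALSE)

# store the samples so that other runs or partners can reuse them
f <- tempfile(fileext = ".rds")
saveRDS(samples, f)
same <- identical(readRDS(f), samples)
```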
The control class "SampleControl" is highly flexible and allows
stratified sampling as well as sampling of whole groups rather than
individuals with a specified sampling method. Hence it is often sufficient
to implement the desired sampling method for the simple non-stratified case
to extend the existing framework. See "SampleControl"
for some restrictions on the argument names of such a function, which should
return a vector containing the indices of the sampled observations.
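A sampling method for the simple non-stratified case might look like the following sketch of systematic sampling, which returns a vector of indices as required above. The function name sys_sketch is hypothetical, and the argument names N and size merely mirror those of srs; whether they satisfy the restrictions documented for "SampleControl" is an assumption.

```r
# Sketch: systematic sampling of 'size' indices out of 1..N with a
# random start; returns an integer vector of sampled indices.
sys_sketch <- function(N, size) {
  step <- N / size
  start <- runif(1, min = 0, max = step)
  idx <- floor(start + step * (seq_len(size) - 1)) + 1
  as.integer(pmin(idx, N))   # guard against rounding at the upper end
}

set.seed(1)
s <- sys_sketch(100, 10)
```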
Nevertheless, for very complex sampling procedures, it is possible to define
a control class "MySampleControl" extending
"VirtualSampleControl", and the corresponding method
setup(x, control) with signature 'data.frame, MySampleControl'.
In order to optimize computational performance, it is necessary to
efficiently set up multiple samples. Thereby the slot k of
"VirtualSampleControl" needs to be used to control the number of
samples, and the resulting object must be of class
"SampleSetup".
Value
An object of class "SampleSetup".
Methods
x = "data.frame", control = "character"
set up multiple samples using a control class specified by the character string control. The slots of the control object may be supplied as additional arguments.
x = "data.frame", control = "missing"
set up multiple samples using a control object of class "SampleControl". Its slots may be supplied as additional arguments.
x = "data.frame", control = "SampleControl"
set up multiple samples as defined by the control object control.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
simSample, draw,
"SampleControl", "TwoStageControl",
"VirtualSampleControl",
"SampleSetup"
Examples
set.seed(12345) # for reproducibility
data(eusilcP) # load data
## simple random sampling
srss <- setup(eusilcP, size = 20, k = 4)
summary(srss)
draw(eusilcP[, c("id", "eqIncome")], srss, i = 1)
## group sampling
gss <- setup(eusilcP, grouping = "hid", size = 10, k = 4)
summary(gss)
draw(eusilcP[, c("hid", "id", "eqIncome")], gss, i = 2)
## stratified simple random sampling
ssrss <- setup(eusilcP, design = "region",
size = c(2, 5, 5, 3, 4, 5, 3, 5, 2), k = 4)
summary(ssrss)
draw(eusilcP[, c("id", "region", "eqIncome")], ssrss, i = 3)
## stratified group sampling
sgss <- setup(eusilcP, design = "region",
grouping = "hid", size = c(2, 5, 5, 3, 4, 5, 3, 5, 2), k = 4)
summary(sgss)
draw(eusilcP[, c("hid", "id", "region", "eqIncome")], sgss, i = 4)
Apply a function to subsets
Description
Generic functions for applying a function to subsets of a data set.
Usage
simApply(x, design, fun, ...)
simSapply(x, design, fun, ..., simplify = TRUE)
Arguments
x |
the data.frame to be subsetted. |
design |
a character, logical or numeric vector specifying the variables (columns) used for subsetting. |
fun |
a function to be applied to the subsets. |
simplify |
a logical indicating whether the results should be simplified to a vector or matrix (if possible). |
... |
additional arguments to be passed to |
Value
For simApply, a data.frame.
For simSapply, a list, vector or matrix (see sapply).
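For readers who want a feel for the two return types, the base R sketch below computes group-wise medians once as a named vector and once as a data.frame. It uses split, sapply and aggregate on hypothetical toy data and is only an analogy; the layout of the results returned by simApply and simSapply may differ.

```r
set.seed(1)
d <- data.frame(region = rep(c("A", "B"), each = 10),
                gender = rep(c("male", "female"), times = 10),
                eqIncome = rlnorm(20, meanlog = 10))

# vector of results, one element per subset
parts <- split(d, d[c("region", "gender")], drop = TRUE)
med_vec <- sapply(parts, function(x) median(x$eqIncome))

# data.frame of results, one row per subset
med_df <- aggregate(eqIncome ~ region + gender, data = d, FUN = median)
```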
Methods for function simApply
x = "data.frame", design = "BasicVector", fun = "function"
apply a function to subsets given by the variables (columns) in design.
x = "data.frame", design = "Strata", fun = "function"
apply a function to subsets given by design.
Methods for function simSapply
x = "data.frame", design = "BasicVector", fun = "function"
apply a function to subsets given by the variables (columns) in design.
x = "data.frame", design = "Strata", fun = "function"
apply a function to subsets given by design.
Author(s)
Andreas Alfons
Examples
data(eusilcP)
eusilcP <- eusilcP[, c("region", "gender", "eqIncome")]
## returns data.frame
simApply(eusilcP, c("region", "gender"),
function(x) median(x$eqIncome))
## returns vector
simSapply(eusilcP, c("region", "gender"),
function(x) median(x$eqIncome))
Box-and-whisker plots
Description
Generic function for producing box-and-whisker plots.
Usage
simBwplot(x, ...)
## S4 method for signature 'SimResults'
simBwplot(x, true = NULL, epsilon, NArate, select, ...)
Arguments
x |
the object to be plotted. For plotting simulation results, this
must be an object of class "SimResults". |
true |
a numeric vector giving the true values. If supplied, reference lines are drawn in the corresponding panels. |
epsilon |
a numeric vector specifying contamination levels. If supplied, the values corresponding to these contamination levels are extracted from the simulation results and plotted. |
NArate |
a numeric vector specifying missing value rates. If supplied, the values corresponding to these missing value rates are extracted from the simulation results and plotted. |
select |
a character vector specifying the columns to be plotted. It
must be a subset of the |
... |
additional arguments to be passed down to methods and eventually
to bwplot. |
Details
For simulation results with multiple contamination levels or missing value rates, conditional box-and-whisker plots are produced.
Value
An object of class "trellis". The
update method can be used to update
components of the object and the print
method (usually called by default) will plot it on an appropriate plotting
device.
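How a "trellis" object is updated and printed can be tried out with lattice alone. The sketch below builds a box-and-whisker plot from a hypothetical data.frame of estimates, adds a reference line in the panel function, and then changes the title and axis label with update; it does not use simBwplot, and the variable names are made up.

```r
library(lattice)

set.seed(1)
# hypothetical simulation results in long format
res <- data.frame(method = factor(rep(c("mean", "trimmed"), each = 50)),
                  value = c(rnorm(50, mean = 0.3), rnorm(50, mean = 0.05)))

# box-and-whisker plot with a reference line at the (assumed) true value 0
p <- bwplot(value ~ method, data = res,
            panel = function(...) {
              panel.bwplot(...)
              panel.abline(h = 0, col = "grey")
            })

# update components of the trellis object; nothing is drawn until printed
p2 <- update(p, main = "Simulation results", ylab = "Estimate")
# print(p2)   # draws the plot on the current device
```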
Methods
x = "SimResults"
produce box-and-whisker plots of simulation results.
Note
Functionality for producing conditional box-and-whisker plots was added in version 0.2. Prior to that, the function gave an error message if simulation results with multiple contamination levels or missing value rates were supplied.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
simDensityplot, simXyplot,
bwplot, "SimResults"
Examples
#### design-based simulation
set.seed(12345) # for reproducibility
data(eusilcP) # load data
## control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome", epsilon = 0.02,
fun = function(x) x * 25)
## function for simulation runs
sim <- function(x) {
c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, 0.02))
}
## run simulation
results <- runSimulation(eusilcP,
sc, contControl = cc, fun = sim)
## plot results
tv <- mean(eusilcP$eqIncome) # true population mean
simBwplot(results, true = tv)
#### model-based simulation
set.seed(12345) # for reproducibility
## function for generating data
rgnorm <- function(n, means) {
group <- sample(1:2, n, replace=TRUE)
data.frame(group=group, value=rnorm(n) + means[group])
}
## control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
dots = list(means = means))
cc <- DCARContControl(target = "value",
epsilon = 0.02, dots = list(mean = 15))
## function for simulation runs
sim <- function(x) {
c(mean = mean(x$value),
trimmed = mean(x$value, trim = 0.02),
median = median(x$value))
}
## run simulation
results <- runSimulation(dc, nrep = 50,
contControl = cc, design = "group", fun = sim)
## plot results
simBwplot(results, true = means)
Kernel density plots
Description
Generic function for producing kernel density plots.
Usage
simDensityplot(x, ...)
## S4 method for signature 'SimResults'
simDensityplot(x, true = NULL, epsilon, NArate, select, ...)
Arguments
x |
the object to be plotted. For plotting simulation results, this
must be an object of class "SimResults". |
true |
a numeric vector giving the true values. If supplied, reference lines are drawn in the corresponding panels. |
epsilon |
a numeric vector specifying contamination levels. If supplied, the values corresponding to these contamination levels are extracted from the simulation results and plotted. |
NArate |
a numeric vector specifying missing value rates. If supplied, the values corresponding to these missing value rates are extracted from the simulation results and plotted. |
select |
a character vector specifying the columns to be plotted. It
must be a subset of the |
... |
additional arguments to be passed down to methods and eventually
to densityplot. |
Details
For simulation results with multiple contamination levels or missing value rates, conditional kernel density plots are produced.
Value
An object of class "trellis". The
update method can be used to update
components of the object and the print
method (usually called by default) will plot it on an appropriate plotting
device.
Methods
x = "SimResults"
produce kernel density plots of simulation results.
Note
Functionality for producing conditional kernel density plots was added in version 0.2. Prior to that, the function gave an error message if simulation results with multiple contamination levels or missing value rates were supplied.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
simBwplot, simXyplot,
densityplot,
"SimResults"
Examples
#### design-based simulation
set.seed(12345) # for reproducibility
data(eusilcP) # load data
## control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome", epsilon = 0.02,
fun = function(x) x * 25)
## function for simulation runs
sim <- function(x) {
c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, 0.02))
}
## run simulation
results <- runSimulation(eusilcP,
sc, contControl = cc, fun = sim)
## plot results
tv <- mean(eusilcP$eqIncome) # true population mean
simDensityplot(results, true = tv)
#### model-based simulation
set.seed(12345) # for reproducibility
## function for generating data
rgnorm <- function(n, means) {
group <- sample(1:2, n, replace=TRUE)
data.frame(group=group, value=rnorm(n) + means[group])
}
## control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
dots = list(means = means))
cc <- DCARContControl(target = "value",
epsilon = 0.02, dots = list(mean = 15))
## function for simulation runs
sim <- function(x) {
c(mean = mean(x$value),
trimmed = mean(x$value, trim = 0.02),
median = median(x$value))
}
## run simulation
results <- runSimulation(dc, nrep = 50,
contControl = cc, design = "group", fun = sim)
## plot results
simDensityplot(results, true = means)
Set up multiple samples
Description
A convenience wrapper for setting up multiple samples using setup
with control class SampleControl.
Usage
simSample(x, design = character(), grouping = character(),
collect = FALSE, fun = srs, size = NULL,
prob = NULL, ..., k = 1)
Arguments
x |
the |
design |
a character, logical or numeric vector specifying variables (columns) to be used for stratified sampling. |
grouping |
a character string, single integer or logical vector specifying a grouping variable (column) to be used for sampling whole groups rather than individual observations. |
collect |
logical; if a grouping variable is specified and this is
|
fun |
a function to be used for sampling (defaults to
|
size |
an optional non-negative integer giving the number of items (observations or groups) to sample. For stratified sampling, a vector of non-negative integers, each giving the number of items to sample from the corresponding stratum. |
prob |
an optional numeric vector giving the probability weights, or a character string or logical vector specifying a variable (column) that contains the probability weights. |
... |
additional arguments to be passed to |
k |
a single positive integer giving the number of samples to be set up. |
Details
There are some restrictions on the argument names of the function
supplied to fun. If it needs population data as input,
the corresponding argument should be called x and should expect
a data.frame. If the sampling method only needs the population size
as input, the argument should be called N. Note that fun is
not expected to have both x and N as arguments, and that the
latter is much faster for stratified sampling or group sampling.
Furthermore, if the function has arguments for sample size and probability
weights, they should be called size and prob, respectively.
Note that a function with prob as its only argument is perfectly valid
(for probability proportional to size sampling). Further arguments of
fun may be passed directly via the ... argument.
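As a sketch of these naming conventions, the hypothetical function `sysSample` below (not part of simFrame) performs systematic sampling with a random start. It only needs the population size, so it uses the argument names `N` and `size`. That the function should return the indices of the sampled items, as `sample` does, is an assumption of this illustration.

```r
## Hypothetical sampling function (not part of simFrame): systematic
## sampling with a random start, using the argument names 'N' and 'size'
sysSample <- function(N, size) {
  step <- N / size                        # sampling interval
  start <- runif(1, min = 0, max = step)  # random start within the interval
  ## indices of the selected items, kept within 1..N
  pmin(floor(start + step * (seq_len(size) - 1)) + 1, N)
}

set.seed(1)
idx <- sysSample(N = 100, size = 10)
length(idx)  # 10 indices, one from each block of 10 items

## Assumed usage with the package (requires simFrame and data(eusilcP)):
## simSample(eusilcP, fun = sysSample, size = 20, k = 4)
```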
Value
An object of class "SampleSetup".
Author(s)
Andreas Alfons
See Also
setup, "SampleControl",
"SampleSetup"
Examples
data(eusilcP)
## simple random sampling
srss <- simSample(eusilcP, size = 20, k = 4)
summary(srss)
draw(eusilcP[, c("id", "eqIncome")], srss, i = 1)
## group sampling
gss <- simSample(eusilcP, grouping = "hid", size = 10, k = 4)
summary(gss)
draw(eusilcP[, c("hid", "id", "eqIncome")], gss, i = 2)
## stratified simple random sampling
ssrss <- simSample(eusilcP, design = "region",
size = c(2, 5, 5, 3, 4, 5, 3, 5, 2), k = 4)
summary(ssrss)
draw(eusilcP[, c("id", "region", "eqIncome")], ssrss, i = 3)
## stratified group sampling
sgss <- simSample(eusilcP, design = "region",
grouping = "hid", size = c(2, 5, 5, 3, 4, 5, 3, 5, 2), k = 4)
summary(sgss)
draw(eusilcP[, c("hid", "id", "region", "eqIncome")], sgss, i = 4)
X-Y plots
Description
Generic function for producing x-y plots. For simulation results, the average results are plotted against the corresponding contamination levels or missing value rates.
Usage
simXyplot(x, ...)
## S4 method for signature 'SimResults'
simXyplot(x, true = NULL, epsilon, NArate,
select, cond = c("Epsilon", "NArate"),
average = c("mean", "median"), ...)
Arguments
x |
the object to be plotted. For plotting simulation results, this
must be an object of class |
true |
a numeric vector giving the true values. If supplied, reference lines are drawn in the corresponding panels. |
epsilon |
a numeric vector specifying contamination levels. If supplied, the values corresponding to these contamination levels are extracted from the simulation results and plotted. |
NArate |
a numeric vector specifying missing value rates. If supplied, the values corresponding to these missing value rates are extracted from the simulation results and plotted. |
select |
a character vector specifying the columns to be plotted. It
must be a subset of the |
cond |
a character string; for simulation results with multiple
contamination levels and multiple missing value rates, this specifies
the column of the simulation results to be used for producing conditional
x-y plots. If |
average |
a character string specifying how the averages should be
computed. Possible values are |
... |
additional arguments to be passed down to methods and eventually
to |
Details
For simulation results with multiple contamination levels and multiple
missing value rates, conditional x-y plots are produced, as specified by
cond.
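The averaging step can be sketched in base R and lattice as follows. The data frame `res` and its columns `Epsilon`, `meanEst` and `trimmedEst` are hypothetical, and the code only illustrates the idea of plotting averages against contamination levels; it is not the implementation of simXyplot.

```r
library(lattice)

## Hypothetical simulation results: 50 runs for each contamination level
set.seed(1)
res <- data.frame(Epsilon = rep(c(0, 0.01, 0.02), each = 50))
res$meanEst <- 1 + 25 * res$Epsilon + rnorm(nrow(res), sd = 0.1)
res$trimmedEst <- 1 + rnorm(nrow(res), sd = 0.1)

## average each column per contamination level; use FUN = median
## to mimic average = "median"
avg <- aggregate(cbind(meanEst, trimmedEst) ~ Epsilon, data = res,
                 FUN = mean)

## plot the averages against the contamination levels
p <- xyplot(meanEst + trimmedEst ~ Epsilon, data = avg,
            type = "b", auto.key = TRUE)
class(p)  # an object of class "trellis", drawn when printed
```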
Value
An object of class "trellis". The
update method can be used to update
components of the object and the print
method (usually called by default) will plot it on an appropriate plotting
device.
Methods
- x = "SimResults"
produce x-y plots of simulation results.
Note
Functionality for producing conditional x-y plots (including the argument
cond) was added in version 0.2. Prior to that, the function gave an
error message if simulation results with multiple contamination levels and
multiple missing value rates were supplied.
The argument average, which specifies how the averages are computed,
was added in version 0.1.2. Prior to that, the mean was always used.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
simBwplot, simDensityplot,
xyplot, "SimResults"
Examples
#### design-based simulation
set.seed(12345) # for reproducibility
data(eusilcP) # load data
## control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome",
epsilon = seq(0, 0.05, by = 0.01),
fun = function(x) x * 25)
## function for simulation runs
sim <- function(x) {
c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, trim = 0.05))
}
## run simulation
results <- runSimulation(eusilcP,
sc, contControl = cc, fun = sim)
## plot results
tv <- mean(eusilcP$eqIncome) # true population mean
simXyplot(results, true = tv)
#### model-based simulation
set.seed(12345) # for reproducibility
## function for generating data
rgnorm <- function(n, means) {
group <- sample(1:2, n, replace=TRUE)
data.frame(group=group, value=rnorm(n) + means[group])
}
## control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
dots = list(means = means))
cc <- DCARContControl(target = "value",
epsilon = seq(0, 0.05, by = 0.01),
dots = list(mean = 15))
## function for simulation runs
sim <- function(x) {
c(mean = mean(x$value),
trimmed = mean(x$value, trim = 0.05),
median = median(x$value))
}
## run simulation
results <- runSimulation(dc, nrep = 50,
contControl = cc, design = "group", fun = sim)
## plot results
simXyplot(results, true = means)
Stratify data
Description
Generic function for stratifying data.
Usage
stratify(x, design)
Arguments
x |
the |
design |
a character, logical or numeric vector specifying the variables (columns) to be used for stratification. |
Value
An object of class "Strata".
Methods
- x = "data.frame", design = "BasicVector"
stratify data according to the variables (columns) given by design.
Author(s)
Andreas Alfons
See Also
"Strata"
Examples
data(eusilcP)
strata <- stratify(eusilcP, c("region", "gender"))
summary(strata)
Utility functions for stratifying data
Description
Generic utility functions for stratifying data. These are useful if not all the
information of class "Strata" is necessary.
Usage
getStrataLegend(x, design)
getStrataSplit(x, design, USE.NAMES = TRUE)
getStrataTable(x, design)
getStratumSizes(x, design, USE.NAMES = TRUE)
getStratumValues(x, design, split)
Arguments
x |
the |
design |
a character, logical or numeric vector specifying the variables (columns) to be used for stratification. |
USE.NAMES |
a logical indicating whether information about the strata
should be used as |
split |
an optional list in which each list element contains the indices
of the observations belonging to the corresponding stratum (as returned by
|
Value
For getStrataLegend, a data.frame describing the strata.
For getStrataSplit, a list in which each element contains the
indices of the observations belonging to the corresponding stratum.
For getStrataTable, a data.frame describing the strata
and containing the stratum sizes.
For getStratumSizes, a numeric vector of the stratum sizes.
For getStratumValues, a numeric vector giving the stratum number for
each observation.
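The kind of information these functions return can be approximated in base R, as in the sketch below. The toy data frame `dat` and the column name `Size` are assumptions for illustration, and the code is an analog of the utilities rather than their actual implementation.

```r
## Toy data (hypothetical); 'region' plays the role of the design variable
dat <- data.frame(region = c("A", "B", "A", "C", "B", "A"),
                  income = c(10, 20, 30, 40, 50, 60))

## analog of getStrataSplit(): indices of the observations per stratum
strataSplit <- split(seq_len(nrow(dat)), dat$region)

## analog of getStratumSizes(): number of observations per stratum
stratumSizes <- lengths(strataSplit)

## analog of getStratumValues(): stratum number for each observation
stratumValues <- as.integer(factor(dat$region))

## analog of getStrataTable(): one row per stratum with its size
strataTable <- data.frame(region = names(stratumSizes),
                          Size = as.integer(stratumSizes))
strataTable
```

Computing the split once and deriving the sizes from it mirrors the reason the split argument exists: the indices do not have to be recomputed for each utility.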
Methods for function getStrataLegend
- x = "data.frame", design = "BasicVector"
get a data.frame describing the strata, according to the variables specified by design.
Methods for function getStrataSplit
- x = "data.frame", design = "BasicVector"
get a list in which each element contains the indices of the observations belonging to the corresponding stratum, according to the variables specified by
design.
Methods for function getStrataTable
- x = "data.frame", design = "BasicVector"
get a data.frame describing the strata and containing the stratum sizes, according to the variables specified by design.
Methods for function getStratumSizes
- x = "list", design = "missing"
get the stratum sizes for a list in which each list element contains the indices of the observations belonging to the corresponding stratum (as returned by getStrataSplit).
- x = "data.frame", design = "BasicVector"
get the stratum sizes of a data set, according to the variables specified by
design.
Methods for function getStratumValues
- x = "data.frame", design = "BasicVector", split = "list"
get the stratum number for each observation, according to the variables specified by design. A previously computed list in which each list element contains the indices of the observations belonging to the corresponding stratum (as returned by getStrataSplit) speeds things up a bit.
- x = "data.frame", design = "BasicVector", split = "missing"
get the stratum number for each observation, according to the variables specified by
design.
Author(s)
Andreas Alfons
See Also
Examples
data(eusilcP)
## all data
getStrataLegend(eusilcP, c("region", "gender"))
getStrataTable(eusilcP, c("region", "gender"))
getStratumSizes(eusilcP, c("region", "gender"))
## small sample
sam <- draw(eusilcP, size = 25)
getStrataSplit(sam, "gender")
getStratumValues(sam, "gender")
Methods for producing a summary of an object
Description
Produce a summary of an object.
Usage
## S4 method for signature 'SampleSetup'
summary(object)
## S4 method for signature 'SimControl'
summary(object)
## S4 method for signature 'SimResults'
summary(object, ...)
## S4 method for signature 'Strata'
summary(object)
## S4 method for signature 'VirtualContControl'
summary(object)
## S4 method for signature 'VirtualDataControl'
summary(object)
## S4 method for signature 'VirtualNAControl'
summary(object)
## S4 method for signature 'VirtualSampleControl'
summary(object)
Arguments
object |
an object. |
... |
additional arguments to be passed down to methods. |
Value
The form of the resulting object depends on the class of the argument
object. See the “Methods” section below for details.
Methods
- signature(x = "SampleSetup")
returns an object of class SummarySampleSetup, which contains information on the size of each of the set up samples.
- signature(x = "SimControl")
currently returns the object itself.
- signature(x = "SimResults")
produces a summary of the simulation results by calling the method of summary for the data.frame in slot values.
- signature(x = "Strata")
returns a data.frame containing the size of each stratum.
- signature(x = "VirtualContControl")
currently returns the object itself.
- signature(x = "VirtualDataControl")
currently returns the object itself.
- signature(x = "VirtualNAControl")
currently returns the object itself.
- signature(x = "VirtualSampleControl")
currently returns the object itself.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
summary, "SampleSetup",
"SummarySampleSetup", "SimResults",
"Strata"
Examples
## load data
data(eusilcP)
## class "SampleSetup"
# set up samples using group sampling
set <- setup(eusilcP, grouping = "hid", size = 1000, k = 50)
summary(set)
## class "Strata"
# stratify the data by region
strata <- stratify(eusilcP, "region")
summary(strata)
Methods for returning the last parts of an object
Description
Return the last parts of an object.
Usage
## S4 method for signature 'SampleSetup'
tail(x, k = 6, n = 6, ...)
## S4 method for signature 'SimControl'
tail(x)
## S4 method for signature 'SimResults'
tail(x, ...)
## S4 method for signature 'Strata'
tail(x, ...)
## S4 method for signature 'VirtualContControl'
tail(x)
## S4 method for signature 'VirtualDataControl'
tail(x)
## S4 method for signature 'VirtualNAControl'
tail(x)
## S4 method for signature 'VirtualSampleControl'
tail(x)
Arguments
x |
an object. |
k |
for objects of class |
n |
for objects of class |
... |
additional arguments to be passed down to methods. |
Value
An object of the same class as x, but in general smaller. See the
“Methods” section below for details.
Methods
- signature(x = "SampleSetup")
returns the last parts of set up samples. The last n indices of each of the last k set up samples are kept.
- signature(x = "SimControl")
currently returns the object itself.
- signature(x = "SimResults")
returns the last parts of simulation results. The method of tail for the data.frame in slot values is thereby called.
- signature(x = "Strata")
returns the last parts of strata information. The method of tail for the vector in slot values is thereby called and the slots split and size are adapted accordingly.
- signature(x = "VirtualContControl")
currently returns the object itself.
- signature(x = "VirtualDataControl")
currently returns the object itself.
- signature(x = "VirtualNAControl")
currently returns the object itself.
- signature(x = "VirtualSampleControl")
currently returns the object itself.
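The behavior described for set up samples can be mimicked in base R on a plain list of index vectors. The list `indices` below is a hypothetical stand-in for the indices stored in a "SampleSetup" object, and the sketch is not the package's implementation.

```r
## Hypothetical stand-in for set up samples: a list with one integer
## vector of sampled indices per sample (8 samples of size 20)
set.seed(1)
indices <- replicate(8, sample.int(1000, 20), simplify = FALSE)

## keep the last n indices of each of the last k samples
k <- 5
n <- 10
lastParts <- lapply(tail(indices, k), tail, n = n)

length(lastParts)   # number of samples kept
lengths(lastParts)  # number of indices kept per sample
```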
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
See Also
tail, "SampleSetup",
"SimResults", "Strata"
Examples
## load data
data(eusilcP)
## class "SampleSetup"
# set up samples using group sampling
set <- setup(eusilcP, grouping = "hid", size = 1000, k = 50)
summary(set)
# get the last 10 indices of each of the last 5 samples
tail(set, k = 5, n = 10)
## class "Strata"
# stratify the data by region
strata <- stratify(eusilcP, "region")
summary(strata)
# get strata information for the last 10 observations
tail(strata, 10)