In meatPL() the case of cross-section data (i.e.,
all elements of order.by being equal) is handled
consistently even if aggregate = TRUE (reported by Christof
Schoetz).
Fix bread() method for survreg()
objects in case the latter already used a “robust” sandwich variance. In
that case $naive.var rather than $var has to
be used for the bread (reported by Daniel Klinenberg).
Improve warnings in vcovHC() for hat values
numerically equal to 1. (Suggested by Sanford Weisberg and John
Fox.)
Added jackknife estimator in all vcovBS() methods
(suggested by Joe Ritter). This is of particular practical interest in
linear regression models where the (clustered) jackknife and the
(clustered) HC3 (or CV3, without cluster adjustment) estimator coincide.
In nonlinear models (including non-Gaussian GLMs) the jackknife and the
HC3 estimator do not coincide but the jackknife might still be a useful
alternative when the HC3 cannot be computed.
Added a new convenience interface vcovJK() for the
jackknife covariance whose default method simply calls
vcovBS(..., type = "jackknife") (also suggested by Joe
Ritter, for more details see the previous item).
Added fractional-random-weight bootstrap, also known as Bayesian
bootstrap, in all vcovBS() methods (suggested by Noah
Greifer and Grant McDermott). This is an alternative to the classical xy
bootstrap which has the computational advantage that all observations
are always part of the bootstrap samples with positive weights drawn
from a Dirichlet distribution. As weights can become close to zero but
no observations are excluded completely, this can stabilize the
computation of models that are not well-defined on all subsets.
Support weights, offsets, and different fitting methods in
lm and glm objects in the respective
vcovBS() methods (reported by Noah Greifer).
More verbose error messages in bwAndrews() and
bwNeweyWest() when bandwidth cannot be computed, e.g., due
to singular regressor variables (suggested by Andrei V.
Kostyrka).
Fix bread() method for coxph() objects
in case the latter already used a “robust” sandwich variance. In that
case $naive.var rather than $var has to be
used for the bread (reported by Alec Todd).
Fix plm::plm(..., index = ...) calls which
incorrectly used indexes = ... (as in
plm.data(), reported by Kevin Tappe).
Added new argument aggregate = TRUE to
meatPL() which is thus inherited by vcovPL().
By default, this still yields the Driscoll & Kraay (1998) covariance
matrix. When setting aggregate = FALSE the cross-sectional
and cross-serial correlation is set to zero, yielding the “pure” panel
Newey-West covariance matrix.
Bug fix in vcovCL(..., type = "HC2") for
glm objects or lm objects with weights. The
code had erroneously assumed that the hat matrices were all symmetric
(as in the lm case without weights). This is corrected now.
(Detected and reported by Bixi Zhang.)
Issue a warning in vcovHC() for HC2/HC3/HC4/HC4m/HC5
if any of the hat values are numerically equal to 1. This leads to
numerically unstable covariances, in the most extreme case
NaN because the associated residuals are equal to 0 and
divided by 0. (Suggested by Ding Peng and John Fox.)
Speed improvement in vcovBS.lm(): For
"xy" bootstrap, .lm.fit() rather than
lm.fit() is used which is somewhat more efficient in some
situations (suggested by Grant McDermott). For "residual"
and wild bootstrap, the bootstrap by default still samples coefficients
via QR decomposition in each iteration (qrjoint = FALSE)
but may alternatively sample the dependent variable and then apply the
QR decomposition jointly only once (qrjoint = TRUE). If the
sample size (and the number of coefficients) is large, then
qrjoint = TRUE may be significantly faster while requiring
much more memory (proposed by Alexander Fischer).
Enable passing score matrix (as computed by
estfun()) directly to bwAndrews() and
bwNeweyWest(). If this is used, the score matrix should
either have a column (Intercept) or the
weights argument should be set appropriately to identify
the column pertaining to the intercept (if any).
The vignettes have been tweaked so that they still “run” without technical errors when suggested packages (listed in the VignetteDepends) are not available. This is achieved by defining replacement functions that do not fail but lead to partially non-sensical output. A warning is added in the vignettes if any of the replacements is used.
Extended the “Getting started” page with information on how to use sandwich in combination with the modelsummary package (Arel-Bundock) based on broom infrastructure (Robinson, Hayes, Couch). (Based on ideas from Grant McDermott.) https://sandwich.R-Forge.R-project.org/articles/sandwich.html
Catch NA observations in cluster and/or
order.by indexes for vcovCL(),
vcovBS(), vcovPL(), and vcovPC().
Such missing observations cannot be handled in the covariance extractor
functions but need to be addressed prior to fitting the model object,
either by omitting these observations or by imputing the missing values.
(Raised by Alexander Fischer on StackOverflow https://stackoverflow.com/questions/64849935/clustered-standard-errors-and-missing-values.)
In vcovHC() if there are estfun() rows
that are all zero and type = "const", then the working
residuals for lm and glm objects are obtained
via residuals() rather than estfun().
(Prompted by an issue raised by Alex Torgovitsky.)
Release of version 3.0-0 accompanying the publication of the
paper “Various Versatile Variances: An Object-Oriented Implementation of
Clustered Covariances in R.” together with Susanne Koell and Nathaniel
Graham in the Journal of Statistcal Software at https://doi.org/10.18637/jss.v095.i01. The paper is also
provided as a vignette in the package as
vignette("sandwich-CL", package = "sandwich").
Improved or clarified notation in Equations 6, 9, 21, and 22 (based on feedback from Bettina Gruen).
The documentation of the HC1 bias correction for clustered
covariances in
vignette("sandwich-CL", package = "sandwich") has been
corrected (Equation 15). While both the code in vcovCL()
and the corresponding documentation ?vcovCL always
correctly used (n-1)/(n-k), the vignette had incorrectly stated it as
n/(n-k). (Reported by Yves Croissant.)
The package is also accompanied by a pkgdown website
on R-Forge now:
https://sandwich.R-Forge.R-project.org/
This essentially uses the previous content of the package
(documentation, vignettes, NEWS) and just formatting was enhanced. But a
few new features were also added:
pkgdown page (but not
shipped in the package) providing an introduction to the package and
listing all variance-covariance functions provided with links to further
details.pkgdown page (but
also not shipped in the package) linking the Sweave-based
PDF vignettes so that they are easily accessible online.README with very brief overview for the
pkgdown title page.All kernel weights functions in kweights() are made
symmetric around zero now (suggested by Christoph Hanck). The quadratic
spectral kernal is approximated by exp(-c * x^2) rather
than 1 for very small x.
In case the Formula namespace is loaded, warnings
are suppressed now for processing formula specifications like
cluster = ~ id in expand.model.frame().
Otherwise warnings may occur with the | separator in
multi-part formulas with factors. (Reported by David
Hugh-Jones.)
The bread() method for mlm objects has
been improved to also handle weighted mlm objects.
(Suggested by James Pustejovsky.)
In various vcov*() functions assuring that the
variance-covariance matrix is positive-definite (via
fix = TRUE) erroneously dropped the dimnames. Now these are
properly preserved. (Reported by Joe Ritter.)
Added suppressWarnings(RNGversion("3.5.0")) in those
places where set.seed() was used to assure exactly
reproducible results from R 3.6.0 onwards.
Enhanced
vignette("sandwich-CL", package = "sandwich") by better
describing the background of clustered covariances and being more
precise in the mathematical notation. Documentation for the new features
(see below, e.g., the formula cluster specification and the
vcovBS() methods) has been added.
In vcovCL(), vcovPL(),
vcovPC(), and vcovBS(), the
cluster argument (and potentially also
order.by) can be specified by a formula - provided that
expand.model.frame(x, cluster) works for the model object
x.
The cluster and/or order.by are
processed accordingly if observations were dropped in the
NA processing of the model object x (provided
x$na.action is available).
New dedicated vcovBS() method for lm
objects that (a) provides many more bootstrapping techniques applicable
to linear models (e.g., residual-based or wild bootstrap), (b) computes
the bootstrap coefficients more efficiently with lm.fit()
or qr.coef() rather than update().
New dedicated vcovBS() method for glm
objects that uses "xy" bootstrap like the default method
but uses glm.fit() instead of update() and
hence is slightly faster.
All vcovBS() methods (default, glm, and lm)
facilitate parallel bootstrapping by changing the applyfun
from the default lapply(). By setting cores
either parallel::parLapply() (on Windows) or
parallel::mclapply() (otherwise) are used.
Default handling of missing parameter estimates in
vcovBS() changed from "everything" to
"pairwise.complete.obs" and allow modification of
cov(..., use = ...). This is relevant if not all parameters
can be re-estimated on the bootstrap samples, e.g., for dummy variables
of relatively rare events.
Fix of a bug in vcovHC.mlm() (reported by James
Pustejovsky). The off-diagonal values of the vcovHC() were
computed without preserving the sign of the underlying residuals. This
issue did not affect the diagonal because the underlying cross product
amounts to squaring all values - but it does matter for the
off-diagonal. Also, type = "const" was disabled in this
scenario and vcov(...) is simply used instead of
vcovHC(..., type = "const").
Bug fix in vcovCL()/meatCL() for
multi-way clustering (reported by Brian Tsay). If patterns of levels in
one clustering variable also occured in another clustering variable,
their interactions were sometimes not computed correctly.
In vcovCL() for multi-way clustering without cluster
adjustment, all cluster adjustment factors are omitted entirely. In
previous versions they were scaled with (Gmin - 1)/Gmin, where Gmin is
the minimal number of clusters across clustering dimensions.
meatHC() and meatHAC() now pass their
... argument to estfun(), just as
meatCL(), meatPL(), and meatPC()
do as well.
Various flavors of clustered sandwich estimators in
vcovCL(), panel sandwich estimators in
vcovPL(), and panel-corrected estimators a la Beck &
Katz in vcovPC(). The new
vignette("sandwich-CL", package = "sandwich") introduces
all functions and illustrates their use and properties.
The new function vcovBS() provides a basic
(clustered) bootstrap covariance matrix estimate, using case-based
resampling.
In meatHAC(), bwAndrews(), and
bwNeweyWest() it is now assured that the
estfun() is transformed to a plain matrix. Otherwise for
time series regression with irregular zoo data, the
bandwidth estimation might have failed.
In meatHC() it is now assured that the residuals are
zero in observations where all regressors and all estimating functions
are zero.
Now the default methods of vcovHC() and
vcovHAC() are also correctly registered as S3 methods in
the NAMESPACE.
Corrected errors in Equation 3 of
vignette("sandwich", package = "sandwich"). The equation
incorrectly listed the error terms “u” instead of the observations “y”
on the right-hand side (pointed out by Karl-Kuno Kunze).
sandwich(), vcovHC(), and
vcovHAC() did not work when models were fitted with
na.action = na.exclude because the estfun()
then (correctly) preserved the NAs. This is now avoided and
all functions handle the na.exclude case like the
na.omit case. (Thanks to John Fox for spotting the problem
and suggesting the solution.)estfun() methods for survreg and
coxph objects incorrectly returned the unweighted
estimating functions in case the objects were fitted with weights. Now
the weights are properly extracted and used.Depends/Imports: Package
zoo is only in Imports now.Added estfun() and bread() methods for
ordered response models from MASS::polr() and
ordinal::clm().
Added output of examples and vignettes as .Rout.save
for R CMD check.
Added convenience function lrvar() to compute the
long-run variance of the mean of a time series: a simple wrapper for
kernHAC() and NeweyWest() applied to
lm(x ~ 1).
lm/mlm/glm models with
aliased parameters were not handled correctly (leading to errors in
sandwich()/vcovHC() etc.), fixed now.
An improved error message is issued if prewhitening in
vcovHAC() cannot work due to collinearity in the estimating
functions.
bwNeweyWest() for mlm
objects that only have an intercept.vcovHC() and related functions.estfun() method for survreg
objects.estfun() methods for
hurdle/zeroinfl objects can now handle
multiple offset vectors (if any).vcovHC() method for mlm objects. This
simply combines the “meat” for each individual regression and combines
the result.bread() method for mlm objects.estfun() methods for
hurdle/zeroinfl objects.Added/enhanced bread() and estfun()
methods for rlm and mlogit objects (from
MASS and mlogit, respectively).
Made vcovHC() and vcovHAC() generic
with previous function definitions as default methods.
Updated vignettes (in particular using the more convenient
tobit() interface from the AER
package).
Added bread() and estfun() methods for
hurdle/zeroinfl objects as computed by
hurdle()/zeroinfl() in package
pscl.
Fixed bread() and estfun() methods for
negative binomial glm objects: Now
dispersion = 1 is used.
bread() method for lm objects now calls
summary.lm() explicitely (rather than the generic) so that
it also works with aov objects.Added new vcovOPG() function for computing the outer
product of gradients estimator (works for maximum likelihood
estfun() methods only).
Scaled estfun() and bread() method for
glm objects by dispersion estimate. Hence,
this corresponds to maximum likelihood and not deviance
methods.
bwAndrews() so that it can be easily used
in models for multivariate means.A paper based on the "sandwich-OOP" vignette was
accepted for publication in volume 16(9) of Journal of Statistical
Software at https://doi.org/10.18637/jss.v016.i09.
A NAMESPACE was added for the package.
The vignette "sandwich-OOP" has been revised,
extended and released as a technical report.
Several estfun() methods and some of the
meat*() functions have been enhanced and made more
consistent.
Thanks to Henric Nilsson and Giovanni Millo for feedback and testing.
estfun() methods now use directly the
model.matrix() method instead of the terms()
and model.frame() methods.sandwich is made object-oriented, so that various
types of sandwich estimators can be computed not only for
lm models, but also glm, survreg,
etc. To achieve object orientation various changes have been made: a
sandwich() function is provided which needs a
bread and a meat matrix. For the
bread, a generic bread() function is provided,
for the meat, there are meat(),
meatHC() and meatHAC(). All rely on the
existence of a estfun() method.
vcovHC() and vcovHAC() have been
restructured to use sandwich() together with
meatHC() and meatHAC(), respectively.
A new vignette("sandwich-OOP", package = "sandwich")
has been added, explaining the new object-orientation features.
Various methods to bread() and estfun()
have been added, particularly for survreg and
coxph.
Added CITATION file, see
citation("sandwich").
Small documentation improvements.
vignette("sandwich", package = "sandwich").Added bandwidth selection a la Newey & West (1994) in
bwNeweyWest(). NeweyWest() is a new
convenience function for vcovHAC() with
bwNeweyWest().
Added estfun() methods for rlm and
coxph.
Argument omega can also be a function in
vcovHC().
Added data sets from Greene (1993): Investment and
PublicSchools.
Improvements in vcovHC() and vcovHAC().
Argument order.by can now be a formula and
ar.method can be modified (rather than being hard-coded
"ols" which is still the default).
Thanks to Hiroyuki Kawakatsu for feedback and testing.
vcovHC(): The new default is now HC3
and support was added for HC4.First CRAN release of the sandwich package for
robust covariance matrix estimators. Provides
heteroscedasticity-consistent (HC) and hetereoscedasticity- and
autocorrelation-consistent (HAC) covariance matrix estimators. Based on
prior work by Thomas Lumley in his weave package.
Thanks to Christian Kleiber for support, feedback, and testing.