| Version: | 2.0.3 | 
| Title: | Regression Helper Functions | 
| Maintainer: | Max Gordon <max@gforge.se> | 
| Description: | Methods for manipulating regression models and for describing these in a style adapted for medical journals. Contains functions for generating an HTML table with crude and adjusted estimates, plotting hazard ratio, plotting model estimates and confidence intervals using forest plots, extending this to comparing multiple models in a single forest plots. In addition to the descriptive methods, there are functions for the robust covariance matrix provided by the 'sandwich' package, a function for adding non-linearities to a model, and a wrapper around the 'Epi' package's Lexis() functions for time-splitting a dataset when modeling non-proportional hazards in Cox regressions. | 
| License: | GPL (≥ 3) | 
| URL: | http://gforge.se | 
| BugReports: | https://github.com/gforge/Greg/issues | 
| Biarch: | yes | 
| Encoding: | UTF-8 | 
| Imports: | broom, Epi, dplyr, glue, graphics, grDevices, htmlTable (≥ 2.0.0), Hmisc, knitr, methods, nlme, purrr, rlang, rms, sandwich, stats, stringr, tibble, tidyr, tidyselect, utils | 
| Depends: | Gmisc (≥ 1.0.3), forestplot, R (≥ 4.1.0) | 
| Suggests: | boot, testthat, cmprsk, survival, ggplot2, parallel, rmarkdown, rmeta, tidyverse | 
| VignetteBuilder: | knitr | 
| RoxygenNote: | 7.3.2 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-08-19 06:31:38 UTC; max | 
| Author: | Max Gordon [aut, cre], Reinhard Seifert [aut] (Author of original plotHR) | 
| Repository: | CRAN | 
| Date/Publication: | 2025-08-19 07:40:02 UTC | 
Regression Helper Functions
Description
This package provides utilities that streamline working with regression models. Standard regression output offers a wealth of information, but for publication—especially in journals such as *NEJM* and *BMJ*—results often need additional simplification. The functions here help you get there.
Output functions
The package contains functions that automatically print crude (unadjusted) estimates alongside adjusted estimates, a presentation style expected in many medical papers.
Forest-plot wrappers let you visualise regression estimates—handy when you’re dealing with a long list of covariates. Utilities are also included for comparing models (e.g., patient subsets or different regression types).
Time splitter
Cox models sometimes violate proportional-hazards assumptions. When the
'tt()' approach becomes unwieldy on large data sets, you can time-split the
data and use start time as an interaction term. See
timeSplitter() and vignette("timeSplitter").
Other regression functions
The package extends linear regression by enabling robust covariance
matrices through the sandwich framework for
rms::ols(). Additional helpers target stats and
survival workflows, including coxph().
Important notice
Extensive tests guard against regressions, but always verify results in your own context. If you use the package with other model classes—and have tests to share—please let me know.
Author(s)
Max Gordon
See Also
Useful links:
- Report bugs at https://github.com/gforge/Greg/issues 
Add a nonlinear function to the model
Description
This function takes a model and adds a non-linear function if
the likelihood-ratio supports this (via the
anova(..., test = "chisq") test for stats
while for rms you need to use the rcs() spline
that is automatically evaluated for non-linearity).
Usage
addNonlinearity(
  model,
  variable,
  spline_fn,
  flex_param = 2:7,
  min_fn = AIC,
  sig_level = 0.05,
  verbal = FALSE,
  workers,
  ...
)
## S3 method for class 'negbin'
addNonlinearity(model, ...)
Arguments
| model | The model that is to be evaluated and adapted for non-linearity | 
| variable | The name of the parameter that is to be tested for non-linearity. Note that the variable should be included plain (i.e. as a linear variable) form in the model. | 
| spline_fn | Either a string or a function that is to be used for testing alternative non-linearity models | 
| flex_param | A  | 
| min_fn | This is the function that we want to minimized if the variable supports
the non-linearity assumption. E.g.  | 
| sig_level | The significance level for which the non-linearity is deemed as significant, defaults to 0.05. | 
| verbal | Set this to  | 
| workers | The function tries to run everything in parallel. Under some
circumstances you may want to restrict the number of parallel threads to less
than the default  | 
| ... | Passed onto internal  | 
Examples
library(Greg)
data("melanoma", package = "boot", envir = environment())
library(dplyr)
melanoma <- mutate(melanoma, 
                   status = factor(status,
                                   levels = 1:3,
                                   labels = c("Died from melanoma", 
                                              "Alive", 
                                              "Died from other causes")),
                   ulcer = factor(ulcer,
                                  levels = 0:1,
                                  labels = c("Absent", "Present")),
                   time = time/365.25, # All variables should be in the same time unit
                   sex = factor(sex,
                                levels = 0:1,
                                labels = c("Female", "Male")))
library(survival)
model <- coxph(Surv(time, status == "Died from melanoma") ~ sex + age,
  data = melanoma
)
nl_model <- addNonlinearity(model, "age",
  spline_fn = "pspline",
  verbal = TRUE,
  workers = FALSE
)
# Note that there is no support for nonlinearity in this case
Getting the bread for the vcovHC
Description
The original bread.lm uses the summary.lm function
it seems like a quick fix and I've therefore created
the original bread definition: $(X'X)^-1$
Usage
## S3 method for class 'ols'
bread(x, ...)
Arguments
| x | The  | 
| ... | arguments passed to methods. | 
Value
matrix The bread for the sandwich vcovHC function
Examples
# Generate some data
n <- 500
x1 <- runif(n) * 2
x2 <- runif(n)
y <- x1^3 + x2 + rnorm(n)
library(rms)
library(sandwich)
dd <- datadist(x1, x2, y)
org.op <- options(datadist = "dd")
# Main function
f <- ols(y ~ rcs(x1, 3) + x2)
# Check the bread
bread(f)
# Check the HC-matrix
vcovHC(f, type = "HC4m")
# Adjust the model so that it uses the HC4m variance
f_rob <- robcov_alt(f, type = "HC4m")
# Get the new HC4m-matrix
# - this function just returns the f_rob$var matrix
vcov(f_rob)
# Now check the confidence interval for the function
confint(f_rob)
options(org.op)
A function for gathering all the description options
Description
Since there are so many different description options
for the printCrudeAndAdjustedModel() function they
have been gathered into a list. This function is simply a
helper in order to generate a valid list.
Usage
caDescribeOpts(
  show_tot_perc = FALSE,
  numb_first = TRUE,
  continuous_fn = describeMean,
  prop_fn = describeFactors,
  factor_fn = describeFactors,
  digits = 1,
  colnames = c("Total", "Event")
)
Arguments
| show_tot_perc | Show percentages for the total column | 
| numb_first | Whether to show the number before the percentages | 
| continuous_fn | Stat function used for the descriptive statistics,
defaults to  | 
| prop_fn | Stat function used for the descriptive statistics,
defaults to  | 
| factor_fn | Stat function used for the descriptive statistics,
defaults to  | 
| digits | Number of digits to use in the descriptive columns. Defaults to the general digits if not specified. | 
| colnames | The names of the two descriptive columns. By default Total and Event. | 
Value
list Returns a list with all the options
A confint function for the ols
Description
This function checks that there is a df.residual
before running the qt(). If not found it then
defaults to the qnorm() function. Otherwise it is
a copy of the confint() function.
Usage
## S3 method for class 'ols'
confint(object, parm, level = 0.95, ...)
Arguments
| object | a fitted  | 
| parm | a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. | 
| level | the confidence level required. | 
| ... | additional argument(s) for methods. | 
Value
A matrix (or vector) with columns giving lower and upper confidence limits for each parameter. These will be labelled as (1-level)/2 and 1 - (1-level)/2 in
Examples
# Generate some data
n <- 500
x1 <- runif(n) * 2
x2 <- runif(n)
y <- x1^3 + x2 + rnorm(n)
library(rms)
library(sandwich)
dd <- datadist(x1, x2, y)
org.op <- options(datadist = "dd")
# Main function
f <- ols(y ~ rcs(x1, 3) + x2)
# Check the bread
bread(f)
# Check the HC-matrix
vcovHC(f, type = "HC4m")
# Adjust the model so that it uses the HC4m variance
f_rob <- robcov_alt(f, type = "HC4m")
# Get the new HC4m-matrix
# - this function just returns the f_rob$var matrix
vcov(f_rob)
# Now check the confidence interval for the function
confint(f_rob)
options(org.op)
The confint function adapted for vcovHC
Description
The confint.lm uses the t-distribution as the default
confidence interval estimator. When there is reason to believe that
the normal distribution is violated an alternative approach using
the vcovHC() may be more suitable.
Usage
confint_robust(
  object,
  parm,
  level = 0.95,
  HC_type = "HC3",
  t_distribution = FALSE,
  ...
)
Arguments
| object | The regression model object, either an ols or lm object | 
| parm | a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. | 
| level | the confidence level required. | 
| HC_type | See options for  | 
| t_distribution | A boolean for if the t-distribution should be used or not. Defaults to FALSE. According to Cribari-Nieto and Lima's study from 2009 this should not be the case. | 
| ... | Additional parameters that are passed on to
 | 
Value
matrix A matrix (or vector) with columns giving lower and
upper confidence limits for each parameter. These will be labelled as
(1-level)/2 and 1 - (1-level)/2 in 
References
F. Cribari-Neto and M. da G. A. Lima, "Heteroskedasticity-consistent interval estimators", Journal of Statistical Computation and Simulation, vol. 79, no. 6, pp. 787-803, 2009 (doi:10.1080/00949650801935327)
Examples
n <- 50
x <- runif(n)
y <- x + rnorm(n)
fit <- lm(y~x)
library("sandwich")
confint_robust(fit, HC_type = "HC4m")
Fix for the Extract Empirical Estimating Functions
Description
As missing data is handled a little different for the ols
than for the lm we need to change the 
estfun to work with the ols().
Usage
## S3 method for class 'ols'
estfun(x, ...)
Arguments
| x | A fitted  | 
| ... | arguments passed to methods. | 
Details
I have never worked with weights and this should probably be checked
as this just uses the original estfun.lm as a template.
Value
matrix A matrix containing the empirical estimating functions.
Examples
# Generate some data
n <- 500
x1 <- runif(n) * 2
x2 <- runif(n)
y <- x1^3 + x2 + rnorm(n)
library(rms)
library(sandwich)
dd <- datadist(x1, x2, y)
org.op <- options(datadist = "dd")
# Main function
f <- ols(y ~ rcs(x1, 3) + x2)
# Check the bread
bread(f)
# Check the HC-matrix
vcovHC(f, type = "HC4m")
# Adjust the model so that it uses the HC4m variance
f_rob <- robcov_alt(f, type = "HC4m")
# Get the new HC4m-matrix
# - this function just returns the f_rob$var matrix
vcov(f_rob)
# Now check the confidence interval for the function
confint(f_rob)
options(org.op)
Compares different scores in different regression objects.
Description
Creates a composite from different regression objects into one forestplot where you can choose the variables of interest to get an overview and easier comparison.
Usage
forestplotCombineRegrObj(
  regr.obj,
  variablesOfInterest.regexp = NULL,
  estimate.txt = NULL,
  add_first_as_ref = FALSE,
  ref_txt = "ref.",
  digits = 1,
  post_process_data = function(x) x,
  is.summary = NULL,
  xlab = NULL,
  zero = NULL,
  xlog = NULL,
  exp = xlog,
  ...
)
Arguments
| regr.obj | A list with all the fits that have variables that are to be identified through the regular expression | 
| variablesOfInterest.regexp | A regular expression identifying the variables that are of interest of comparing. For instance it can be "(score|index|measure)" that finds scores in different models that should be compared. | 
| estimate.txt | The text of the estimate, usually HR for hazard ratio, OR for odds ratio | 
| add_first_as_ref | If you want that the first variable should be reference for that group of variables. The ref is a variable with the estimate 1 or 0 depending if exp() and the confidence interval 0. | 
| ref_txt | Text instead of estimate number | 
| digits | Number of digits to use for the estimate output | 
| post_process_data | A function that takes the data frame just prior to calling 'forestplot' and allows you to manipulate it. Primarily used for changing the 'column_label' that has the names shown in the final plot. | 
| is.summary | A vector indicating by  | 
| xlab | x-axis label | 
| zero | Indicates what is zero effect. For survival/logistic fits the zero is 1 while in most other cases it's 0. | 
| xlog | If TRUE, x-axis tick marks are to follow a logarithmic scale, e.g. for
logistic regression (OR), survival estimates (HR), Poisson regression etc.
Note: This is an intentional break with the original  | 
| exp | Report in exponential form. Default true since the function was built for use with survival models. | 
| ... | Passed to  | 
See Also
Other forestplot wrappers: 
forestplotRegrObj()
Examples
org.par <- par("ask" = TRUE)
# simulated data to test
library(tidyverse)
set.seed(10)
cov <- tibble(ftime = rexp(200),
              fstatus = sample(0:1, 200, replace = TRUE),
              x1 = runif(200),
              x2 = runif(200),
              x3 = runif(200)) |> 
  # Add some column labels
  Gmisc::set_column_labels(x1 = "First variable",
                           x2 = "Second variable")
library(rms)
ddist <- datadist(cov)
options(datadist = "ddist")
fit1 <- cph(Surv(ftime, fstatus) ~ x1 + x2, data = cov)
fit2 <- cph(Surv(ftime, fstatus) ~ x1 + x3, data = cov)
list(`First model` = fit1, 
     `Second model` = fit2) |> 
  forestplotCombineRegrObj(variablesOfInterest.regexp = "(x2|x3)") |> 
  fp_set_style(lines = "steelblue",
               box = "darkblue")
# How to add expressions to the plot label
list(fit1, fit2) |> 
  forestplotCombineRegrObj(variablesOfInterest.regexp = "(x2|x3)",
                           reference.names = c("First model", "Second model"),
                           post_process_data = \(data) {
                             data$column_label[4] <- c(rlang::expr(expression(Fever >= 38.5)))
                             return(data)
                           })
par(org.par)
Forest plot for multiple models
Description
Plot different model fits with similar variables in order to compare the model's estimates and confidence intervals. Each model is represented by a separate line on top of eachother and are therefore ideal for comparing different models. This extra appealing when you have lots of variables included in the models.
Usage
forestplotRegrObj(
  regr.obj,
  postprocess_estimates.fn = function(x) x,
  rowname = "Variable",
  ci.txt = "CI",
  ci.glue = "{lower} to {higher}",
  digits = 1,
  get_box_size = fpBoxSize,
  ...
)
## Default S3 method:
forestplotRegrObj(
  regr.obj,
  postprocess_estimates.fn = function(x) x,
  rowname = "Variable",
  ci.txt = "CI",
  ci.glue = "{lower} to {higher}",
  digits = 1,
  get_box_size = fpBoxSize,
  ...
)
## S3 method for class 'coxph'
forestplotRegrObj(
  regr.obj,
  postprocess_estimates.fn = function(x) x,
  rowname = "Variable",
  ci.txt = "CI",
  ci.glue = "{lower} to {higher}",
  digits = 1,
  get_box_size = fpBoxSize,
  xlab = "Hazard Ratio",
  estimate.txt = "HR",
  xlog = TRUE,
  zero = 1,
  exp = TRUE,
  ...
)
## S3 method for class 'lrm'
forestplotRegrObj(
  regr.obj,
  postprocess_estimates.fn = function(x) x,
  rowname = "Variable",
  ci.txt = "CI",
  ci.glue = "{lower} to {higher}",
  digits = 1,
  get_box_size = fpBoxSize,
  xlab = "Odds ratio",
  estimate.txt = "HR",
  xlog = TRUE,
  zero = 1,
  exp = TRUE,
  ...
)
## S3 method for class 'lm'
forestplotRegrObj(
  regr.obj,
  postprocess_estimates.fn = function(x) x,
  rowname = "Variable",
  ci.txt = "CI",
  ci.glue = "{lower} to {higher}",
  digits = 1,
  get_box_size = fpBoxSize,
  xlab = "Effect",
  estimate.txt = "Coef",
  xlog = FALSE,
  zero = 0,
  exp = FALSE,
  ...
)
## S3 method for class 'glm'
forestplotRegrObj(
  regr.obj,
  postprocess_estimates.fn = function(x) x,
  rowname = "Variable",
  ci.txt = "CI",
  ci.glue = "{lower} to {higher}",
  digits = 1,
  get_box_size = fpBoxSize,
  xlab = NULL,
  xlog = NULL,
  zero = NULL,
  estimate.txt = NULL,
  exp = NULL,
  ...
)
## S3 method for class 'list'
forestplotRegrObj(
  regr.obj,
  postprocess_estimates.fn = function(x) x,
  rowname = "Variable",
  ci.txt = "CI",
  ci.glue = "{lower} to {higher}",
  digits = 1,
  get_box_size = fpBoxSize,
  xlab = NULL,
  xlog = NULL,
  zero = NULL,
  estimate.txt = NULL,
  exp = NULL,
  ...
)
fpBoxSize(p_values, variable_count, boxsize, significant = 0.05)
Arguments
| regr.obj | A regression model object. It should be of coxph, crr or glm class. Warning: The glm is not fully tested. | 
| postprocess_estimates.fn | A function that takes the regression outputs and returns the same data with modifications. The input columns are: * 'Rowname' * 'Coef' * 'Lower' * 'Upper' * 'Sort' | 
| rowname | The name of the variables | 
| ci.txt | The text above the confidence interval, defaults to '"CI"' | 
| ci.glue | The string used for [glue::glue()] the 'lower' and 'higher' confidence intervals together. | 
| digits | The number of digits to round presented values to | 
| get_box_size | A function for extracting the box sizes | 
| ... | Passed to  | 
| xlab | x-axis label | 
| estimate.txt | The text above the estimate, e.g. Est, HR | 
| xlog | If TRUE, x-axis tick marks are to follow a logarithmic scale, e.g. for
logistic regression (OR), survival estimates (HR), Poisson regression etc.
Note: This is an intentional break with the original  | 
| zero | Indicates what is zero effect. For survival/logistic fits the zero is 1 while in most other cases it's 0. | 
| exp | Report in exponential form. Default true since the function was built for use with survival models. | 
| p_values | The p-values that will work as the foundation for the box size | 
| variable_count | The number of variables | 
| boxsize | The default box size | 
| significant | Level of significance .05 | 
See Also
Other forestplot wrappers: 
forestplotCombineRegrObj()
Examples
org.par <- par("ask" = TRUE)
library(tidyverse)
# simulated data to test
set.seed(102)
cov <- tibble(ftime = rexp(200)) |> 
  mutate(x1 = runif(n()),
         x2 = runif(n()),
         x3 = runif(n()),
         fstatus1 = if_else(x1 * 1 + 
                              x2 * 0.2 + 
                              x3 * 0.5 + 
                              runif(n()) * 0.5 > 1, 
                            1, 0),
         fstatus2 = if_else(x1 * 0.2 + 
                              x2 * 0.5 + 
                              x3 * 0.1 + 
                              runif(n()) * 2 > 1, 
                            1, 0)) |> 
  # Add some column labels
  Gmisc::set_column_labels(x1 = "First variable",
                           x2 = "Second variable")
library(rms)
dd <- datadist(cov)
options(datadist = "dd")
fit1 <- cph(Surv(ftime, fstatus1 == 1) ~ x1 + x2 + x3, data = cov)
fit1 |> 
  forestplotRegrObj() |> 
  fp_set_zebra_style("#f0f0f0")
fit2 <- update(fit1, Surv(ftime, fstatus2 == 1) ~ .)
list("Frist model" = fit1, "Second model"  = fit2) |> 
  forestplotRegrObj(legend_args = fpLegend(title = "Type of regression"),
                    postprocess_estimates.fn = function(x) {
                      x |> 
                        filter(str_detect(column_term, "(x2|x3)"))
                    }) |> 
  fp_set_style(box = rep(c("darkblue", "darkred"), each = 3))
par(org.par)
This function helps with printing regression models
Description
This function is used for getting the adjusted and unadjusted values
for a regression model. It takes a full model and walks through each
variable, removes in the regression all variables except one then
reruns that variable to get the unadjusted value. This functions not
intended for direct use, it's better to use printCrudeAndAdjustedModel()
that utilizes this function.
Usage
getCrudeAndAdjustedModelData(
  model,
  level = 0.95,
  remove_interaction_vars = TRUE,
  remove_strata = FALSE,
  remove_cluster = FALSE,
  var_select,
  ...
)
## S3 method for class 'getCrudeAndAdjustedModelData'
x[i, j, ...]
Arguments
| model | The regression model | 
| level | The confidence interval level | 
| remove_interaction_vars | Removes the interaction terms as they in the raw state are difficult to understand | 
| remove_strata | Strata should most likely not be removed in the crude
version. If you want to force the removal of stratas you can specify the
 | 
| remove_cluster | Cluster information should most likely also retain
just as the  | 
| var_select | A vector with regular expressions for choosing what variables
to return (the same format as for the  | 
| ... | Not used | 
Details
This function saves a lot of time creating tables since it compiles a fully unadjusted list of all your used covariates.
If the model is an exponential poisson/logit/cox regression model then it automatically reports the exp() values instead of the original values
The function skips by default all spline variables since this becomes very complicated and there is no simple
\beta
to display. For the same reason it skips any interaction variables since it's probably better to display these as a contrast table.
Note that the rms regression has a separate function that uses the rms:::summaryrms function
that returns a matrix that is then pruned.
Value
Returns a matrix with the columns:
c("Crude", "2.5 %", "97.5 %", "Adjusted", "2.5 %", "97.5 %").
The row order is not changed from the original model. The percentages can vary depending
on the set level.
See Also
Other crudeAndAdjusted functions: 
printCrudeAndAdjustedModel()
Examples
# simulated data to use
set.seed(10)
ds <- data.frame(
  ftime = rexp(200),
  fstatus = sample(0:1, 200, replace = TRUE),
  x1 = runif(200),
  x2 = runif(200),
  x3 = runif(200),
  x4 = runif(200),
  x5 = runif(200)
)
library(rms)
dd <- datadist(ds)
options(datadist = "dd")
s <- Surv(ds$ftime, ds$fstatus == 1)
fit <- cph(s ~ x1 + x2 + x3, data = ds)
data_matrix <- getCrudeAndAdjustedModelData(fit)
print(data_matrix)
# If we have interaction then those variable are not
# reported
fit <- cph(s ~ x1 + x2 + x3 + x4 * x5, data = ds)
data_matrix <- getCrudeAndAdjustedModelData(fit)
print(data_matrix)
Get model data
Description
A helper function for forestplotCombineRegrObj(). Extracts
the data from the regression model fits and returns a list
with model data gathered by the function [broom::tidy()]
Usage
getModelData4Forestplot(
  regr.obj,
  exp = TRUE,
  variablesOfInterest.regexp = NULL,
  add_first_as_ref = FALSE
)
Arguments
| regr.obj | A list with all the fits that have variables that are to be identified through the regular expression | 
| exp | Report in exponential form. Default true since the function was built for use with survival models. | 
| variablesOfInterest.regexp | A regular expression identifying the variables that are of interest of comparing. For instance it can be "(score|index|measure)" that finds scores in different models that should be compared. | 
| add_first_as_ref | If you want that the first variable should be reference for that group of variables. The ref is a variable with the estimate 1 or 0 depending if exp() and the confidence interval 0. | 
Examples
org.par <- par("ask" = TRUE)
# simulated data to test
library(tidyverse)
set.seed(10)
cov <- tibble(ftime = rexp(200),
              fstatus = sample(0:1, 200, replace = TRUE),
              x1 = runif(200),
              x2 = runif(200),
              x3 = runif(200)) |> 
  # Add some column labels
  Gmisc::set_column_labels(x1 = "First variable",
                           x2 = "Second variable")
library(rms)
ddist <- datadist(cov)
options(datadist = "ddist")
fit1 <- cph(Surv(ftime, fstatus) ~ x1 + x2, data = cov)
fit2 <- cph(Surv(ftime, fstatus) ~ x1 + x3, data = cov)
list(`First model` = fit1, 
     `Second model` = fit2) |> 
  forestplotCombineRegrObj(variablesOfInterest.regexp = "(x2|x3)") |> 
  fp_set_style(lines = "steelblue",
               box = "darkblue")
# How to add expressions to the plot label
list(fit1, fit2) |> 
  forestplotCombineRegrObj(variablesOfInterest.regexp = "(x2|x3)",
                           reference.names = c("First model", "Second model"),
                           post_process_data = \(data) {
                             data$column_label[4] <- c(rlang::expr(expression(Fever >= 38.5)))
                             return(data)
                           })
par(org.par)
Get the hat matrix for the OLS
Description
The hat matrix comes from the residual definition:
\hat{\epsilon} = y-X\hat{\beta} = \{I_n-X(X'X)X'\}y = (I_n-H)y
where the H is called the hat matrix since
Hy = \hat{y}
. The hat
values are actually the diagonal elements of the matrix that sum up
to p (the rank of X, i.e. the number of parameters + 1). 
See ols.influence().
Usage
## S3 method for class 'ols'
hatvalues(model, ...)
Arguments
| model | The ols model fit | 
| ... | arguments passed to methods. | 
Value
vector
Examples
# Generate some data
n <- 500
x1 <- runif(n) * 2
x2 <- runif(n)
y <- x1^3 + x2 + rnorm(n)
library(rms)
library(sandwich)
dd <- datadist(x1, x2, y)
org.op <- options(datadist = "dd")
# Main function
f <- ols(y ~ rcs(x1, 3) + x2)
# Check the bread
bread(f)
# Check the HC-matrix
vcovHC(f, type = "HC4m")
# Adjust the model so that it uses the HC4m variance
f_rob <- robcov_alt(f, type = "HC4m")
# Get the new HC4m-matrix
# - this function just returns the f_rob$var matrix
vcov(f_rob)
# Now check the confidence interval for the function
confint(f_rob)
options(org.op)
Functions for checking regression type
Description
The isFitCoxPH A simple check if object inherits either "coxph" or "crr" class indicating that it is a survival function.
Usage
isFitCoxPH(fit)
isFitLogit(fit)
Arguments
| fit | Regression object | 
Value
boolean Returns TRUE if the object is of that type
otherwise it returns FALSE.
Examples
# simulated data to use
set.seed(10)
ds <- data.frame(
  ftime = rexp(200),
  fstatus = sample(0:1, 200, replace = TRUE),
  x1 = runif(200),
  x2 = runif(200),
  x3 = runif(200)
)
library(survival)
library(rms)
dd <- datadist(ds)
options(datadist = "dd")
s <- Surv(ds$ftime, ds$fstatus == 1)
fit <- cph(s ~ x1 + x2 + x3, data = ds)
if (isFitCoxPH(fit)) {
  print("Correct, the cph is of cox PH hazard type")
}
fit <- coxph(s ~ x1 + x2 + x3, data = ds)
if (isFitCoxPH(fit)) {
  print("Correct, the coxph is of cox PH hazard type")
}
library(cmprsk)
set.seed(10)
ftime <- rexp(200)
fstatus <- sample(0:2, 200, replace = TRUE)
cov <- matrix(runif(600), nrow = 200)
dimnames(cov)[[2]] <- c("x1", "x2", "x3")
fit <- crr(ftime, fstatus, cov)
if (isFitCoxPH(fit)) {
  print(paste(
    "Correct, the competing risk regression is",
    "considered a type of cox regression",
    "since it has a Hazard Ratio"
  ))
}
# ** Borrowed code from the lrm example **
# Fit a logistic model containing predictors age, blood.pressure, sex
# and cholesterol, with age fitted with a smooth 5-knot restricted cubic
# spline function and a different shape of the age relationship for males
# and females.
n <- 1000 # define sample size
set.seed(17) # so can reproduce the results
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol <- rnorm(n, 200, 25)
sex <- factor(sample(c("female", "male"), n, TRUE))
label(age) <- "Age" # label is in Hmisc
label(cholesterol) <- "Total Cholesterol"
label(blood.pressure) <- "Systolic Blood Pressure"
label(sex) <- "Sex"
units(cholesterol) <- "mg/dl" # uses units.default in Hmisc
units(blood.pressure) <- "mmHg"
# To use prop. odds model, avoid using a huge number of intercepts by
# grouping cholesterol into 40-tiles
# Specify population model for log odds that Y = 1
L <- .4 * (sex == "male") + .045 * (age - 50) +
  (log(cholesterol - 10) - 5.2) * (-2 * (sex == "female") + 2 * (sex == "male"))
# Simulate binary y to have Prob(y = 1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)
cholesterol[1:3] <- NA # 3 missings, at random
ddist <- datadist(age, blood.pressure, cholesterol, sex)
options(datadist = "ddist")
fit_lrm <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol, 4)),
  x = TRUE, y = TRUE
)
if (isFitLogit(fit_lrm) == TRUE) {
  print("Correct, the lrm is a logistic regression")
}
fit_lm <- lm(blood.pressure ~ sex)
if (isFitLogit(fit_lm) == FALSE) {
  print("Correct, the lm is not a logistic regression")
}
fit_glm_logit <- glm(y ~ blood.pressure + sex * (age + rcs(cholesterol, 4)),
  family = binomial()
)
if (isFitLogit(fit_glm_logit) == TRUE) {
  print("Correct, the glm with a family of binomial is a logistic regression")
}
fit_glm <- glm(blood.pressure ~ sex)
if (isFitLogit(fit_glm) == FALSE) {
  print("Correct, the glm without logit as a family is not a logistic regression")
}
A fix for the model.matrix
Description
The model.matrix.lm() that the ols() falls back upon
"forgets" the intercept value and behaves unreliable in
the vcovHC() functions. I've therefore created this sub-function
to generate the actual model.matrix() by just accessing the formula.
Usage
## S3 method for class 'ols'
model.matrix(object, ...)
Arguments
| object | A Model | 
| ... | Parameters passed on | 
Value
matrix
Plot a spline in a Cox regression model
Description
This function is a more specialized version of the termplot() function. It
creates a plot with the spline against hazard ratio. The plot can additianally have
indicator of variable density and have multiple lines.
Usage
plotHR(
  models,
  term = 1,
  se = TRUE,
  cntrst = ifelse(inherits(models, "rms") || inherits(models[[1]], "rms"), TRUE, FALSE),
  polygon_ci = TRUE,
  rug = "density",
  xlab = "",
  ylab = "Hazard Ratio",
  main = NULL,
  xlim = NULL,
  ylim = NULL,
  col.term = "#08519C",
  col.se = "#DEEBF7",
  col.dens = grey(0.9),
  lwd.term = 3,
  lty.term = 1,
  lwd.se = lwd.term,
  lty.se = lty.term,
  x.ticks = NULL,
  y.ticks = NULL,
  ylog = TRUE,
  cex = 1,
  y_axis_side = 2,
  plot.bty = "n",
  axes = TRUE,
  alpha = 0.05,
  ...
)
## S3 method for class 'plotHR'
print(x, ...)
## S3 method for class 'plotHR'
plot(x, y, ...)
Arguments
| models | A single model or a list() with several models | 
| term | The term of interest. Can be either the name or the number of the covariate in the model. | 
| se | Boolean if you want the confidence intervals or not | 
| cntrst | By contrasting values you can have the median as a reference point making it easier to compare hazard ratios. | 
| polygon_ci | If you want a polygon as indicator for your confidence interval. This can also be in the form of a vector if you have several models. Sometimes you only want one model to have a polygon and the rest to be dotted lines. This gives the reader an indication of which model is important. | 
| rug | The rug is the density of the population along the spline variable. Often this is displayed as a jitter with bars that are thicker & more common when there are more observations in that area or a smooth density plot that looks like a mountain. Use "density" for the mountain view and "ticks" for the jitter format. | 
| xlab | The label of the x-axis | 
| ylab | The label of the y-axis | 
| main | The main title of the plot | 
| xlim | A vector with 2 elements containing the upper & the lower bound of the x-axis | 
| ylim | A vector with 2 elements containing the upper & the lower bound of the y-axis | 
| col.term | The color of the estimate line. If multiple lines you can have different colors by giving a vector. | 
| col.se | The color of the confidence interval. If multiple lines you can have different colors by giving a vector. | 
| col.dens | The color of the density plot. Ignored if you're using jitter | 
| lwd.term | The width of the estimated line. If you have more than one model then provide the function with a vector if you want to have different lines for different width for each model. | 
| lty.term | The typeof the estimated line, see lty. If you have more than one model then provide the function with a vector if you want to have different line types for for each model. | 
| lwd.se | The line width of your confidence interval. This is ignored if you're using polygons for all the confidence intervals. | 
| lty.se | The line type of your confidence interval. This is ignored if you're using polygons for all the confidence intervals. | 
| x.ticks | The ticks for the x-axis if you desire other than the default. | 
| y.ticks | The ticks for the y-axis if you desire other than the default. | 
| ylog | Show a logarithmic y-axis. Not having a logarithmic axis might seem easier to understand but it's actually not really a good idea. The distance between HR 0.5 and 2.0 should be the same. This will only show on a logarithmic scale and therefore it is strongly recommended to use the logarithmic scale. | 
| cex | Increase if you want larger font size in the graph. | 
| y_axis_side | The side that the y axis is to be plotted, see axis() for details | 
| plot.bty | Type of box that you want. See the bty description in graphical parameters (par). If bty is one of "o" (the default), "l", "7", "c", "u", or "]" the resulting box resembles the corresponding upper case letter. A value of "n" suppresses the box. | 
| axes | A boolean that is used to identify if axes are to be plotted | 
| alpha | The alpha level for the confidence intervals | 
| ... | Any additional values that are to be sent to the plot() function | 
| x | Sent the 'plotHR' object to plot | 
| y | Ignored in plot | 
Value
The function does not return anything
Multiple models in one plot
The function allows for plotting multiple splines in one graph. Sometimes you might want to show more than one spline for the same variable. This allows you to create that comparison.
Examples of a situation where I've used multiple splines in one plot is when I want to look at a variables behavior in different time periods. This is another way of looking at the proportional hazards assumption. The Schoenfeld residuals can be a little tricky to look at when you have the splines.
Another example of when I've used this is when I've wanted to plot adjusted and unadjusted splines. This can very nicely demonstrate which of the variable span is mostly confounded. For instance - younger persons may exhibit a higher risk for a procedure but when you put in your covariates you find that the increased hazard changes back to the basic
Author(s)
Reinhard Seifert, Max Gordon
Examples
org_par <- par(xaxs = "i", ask = TRUE)
library(survival)
library(rms)
library(dplyr)
library(Gmisc)
# Get data for example
n <- 1000
set.seed(731)
ds <- tibble(age = round(50 + 12 * rnorm(n), 1),
             smoking = sample(c("Yes", "No"), n, rep = TRUE, prob = c(.2, .75)),
             sex = sample(c("Male", "Female"), n, rep = TRUE, prob = c(.6, .4))) |>
  # Build outcome
  mutate(h = .02 * exp(.02 * (age - 50) + .1 *
                         ((age - 50) / 10)^3 + .8 *
                         (sex == "Female") + 2 *
                         (smoking == "Yes")),
         cens = 15 * runif(n),
         dt = -log(runif(n)) / h,
         e = if_else(dt <= cens, 1, 0),
         dt = pmin(dt, cens),
         # Add missing data to smoking
         smoking = case_when(runif(n) < 0.05 ~ NA_character_,
                             TRUE ~ smoking)) |>
  set_column_labels(age = "Age",
                    dt = "Follow-up time") |>
  set_column_units(dt = "Year")
library(splines)
fit.coxph <- coxph(Surv(dt, e) ~ bs(age, 3) + sex + smoking, data = ds)
plotHR(fit.coxph, term = "age", plot.bty = "o", xlim = c(30, 70), xlab = "Age")
dd <- datadist(ds)
options(datadist = "dd")
fit.cph <- cph(Surv(dt, e) ~ rcs(age, 4) + sex + smoking, data = ds, x = TRUE, y = TRUE)
plotHR(fit.cph,
       term = 1,
       plot.bty = "L",
       xlim = c(30, 70),
       ylim = 2^c(-3, 3),
       xlab = "Age"
)
plotHR(fit.cph,
       term = "age",
       plot.bty = "l",
       xlim = c(30, 70),
       ylog = FALSE,
       rug = "ticks",
       xlab = "Age"
)
unadjusted_fit <- cph(Surv(dt, e) ~ rcs(age, 4), data = ds, x = TRUE, y = TRUE)
plotHR(list(fit.cph, unadjusted_fit),
       term = "age",
       xlab = "Age",
       polygon_ci = c(TRUE, FALSE),
       col.term = c("#08519C", "#77777799"),
       col.se = c("#DEEBF7BB", grey(0.6)),
       lty.term = c(1, 2),
       plot.bty = "l", xlim = c(30, 70)
)
par(org_par)
Add reference according to the model
Description
This is of course for factored variables and not in general.
Usage
prCaAddRefAndStat(
  model,
  var_order,
  add_references,
  add_references_pos,
  reference_zero_effect,
  values,
  ds,
  desc_column,
  desc_args,
  use_labels
)
Arguments
| model | A regression model fit, i.e. the returned object from your
regression function, or the output from  | 
| var_order | The output from the  | 
| add_references | True if it should use the data set to look for references, otherwise supply the function with a vector with names. Sometimes you want to indicate the reference row for each group. This needs to be just as many as the groups as the order identified. Use NA if you don't want to have a reference for that particular group. | 
| add_references_pos | The position where a reference should be added.
Sometimes you don't want the reference to be at the top, for instance
if you have age groups then you may have < 25, 25-39, 40-55, > 55 and
you have the reference to be 25-39 then you should set the reference
list for  | 
| reference_zero_effect | Used with references, tells if zero effect
is in exponential form, i.e.  | 
| values | The values that are to be outputted | 
| ds | The dataset | 
| desc_column | Add descriptive column to the crude and adjusted table | 
| desc_args | The description arguments that are to be used for the
the description columns. The options/arguments should be generated by the
 | 
| use_labels | If the rowname.fn function doesn't change the name then the label should be used instead of the name, that is if there is a label and it isn't a factor. | 
Value
list
See Also
Other printCrudeAndAdjusted functions: 
prCaAddReference(),
prCaAddUserReferences(),
prCaGetImputationCols(),
prCaGetRowname(),
prCaGetVnStats(),
prCaPrepareCrudeAndAdjusted(),
prCaReorder(),
prCaReorderReferenceDescribe(),
prCaSelectAndOrderVars(),
prCaSetRownames()
Adds a reference to value matrix
Description
Adds a reference to value matrix
Usage
prCaAddReference(
  vn,
  var_order,
  values,
  add_references_pos,
  reference_zero_effect,
  ds,
  use_labels
)
Arguments
| vn | Variable name | 
| var_order | The output from the  | 
| values | The value matrix | 
| add_references_pos | The position where a reference should be added.
Sometimes you don't want the reference to be at the top, for instance
if you have age groups then you may have < 25, 25-39, 40-55, > 55 and
you have the reference to be 25-39 then you should set the reference
list for  | 
| reference_zero_effect | Used with references, tells if zero effect
is in exponential form, i.e.  | 
| ds | The data set | 
| use_labels | If the rowname.fn function doesn't change the name then the label should be used instead of the name, that is if there is a label and it isn't a factor. | 
Value
matrix A matrix with rgroup and n.rgroup attributes
See Also
Other printCrudeAndAdjusted functions: 
prCaAddRefAndStat(),
prCaAddUserReferences(),
prCaGetImputationCols(),
prCaGetRowname(),
prCaGetVnStats(),
prCaPrepareCrudeAndAdjusted(),
prCaReorder(),
prCaReorderReferenceDescribe(),
prCaSelectAndOrderVars(),
prCaSetRownames()
Adds references
Description
Adds references
Usage
prCaAddUserReferences(
  reordered_groups,
  var_order,
  add_references,
  add_references_pos,
  reference_zero_effect
)
Arguments
| reordered_groups | The value matrix that needs refrences | 
| var_order | The output from the  | 
| add_references | True if it should use the data set to look for references, otherwise supply the function with a vector with names. Sometimes you want to indicate the reference row for each group. This needs to be just as many as the groups as the order identified. Use NA if you don't want to have a reference for that particular group. | 
| add_references_pos | The position where a reference should be added.
Sometimes you don't want the reference to be at the top, for instance
if you have age groups then you may have < 25, 25-39, 40-55, > 55 and
you have the reference to be 25-39 then you should set the reference
list for  | 
| reference_zero_effect | Used with references, tells if zero effect
is in exponential form, i.e.  | 
Value
matrix The reordered_groups with references and the
attribute "var_order" in order to keep track of no. of variables per row.
See Also
Other printCrudeAndAdjusted functions: 
prCaAddRefAndStat(),
prCaAddReference(),
prCaGetImputationCols(),
prCaGetRowname(),
prCaGetVnStats(),
prCaPrepareCrudeAndAdjusted(),
prCaReorder(),
prCaReorderReferenceDescribe(),
prCaSelectAndOrderVars(),
prCaSetRownames()
Get the confidence intervals
Description
These are functions that get the estimates and the confidence intervals. Due to package differences there are some local modifications.
Usage
prCaDefaultGetCoefAndCI(model, level, skip_intercept = FALSE)
prCaRmsGetCoefAndCI(model, level, vn, data)
Arguments
| model | The regression model | 
| level | The confidence interval level | 
| skip_intercept | If the model should remove the intercept from the returned values. | 
| vn | The variable names | 
| data | The data set | 
Value
matrix Returns a n x 3 matrix where the n equals the number
of variables.
The default
Gets the estimate and confidence interval using the confint
and  coef.
The rms
The rms-package does not have confint implemented and it is therefore a
better option to go through the summary function (rms:::summary.rms).
Infortunately skip intercept is not an option as the summary doesn't
include the intercept for the rms regression outputs
Function for retrieving the imputation arguments
Description
Function for retrieving the imputation arguments
Usage
prCaGetImputationCols(impute_args, output_mtrx, model, data)
Arguments
| impute_args | The imputation arguments from  | 
| output_mtrx | The reordered groups matrix (a nx4 matrix)
that have been prepared in for the  | 
| model | The imputation model. Currently only  | 
| data | The data that has been used for generating the model. | 
Value
matrix Returns a matrix with the requested columns
See Also
Other printCrudeAndAdjusted functions: 
prCaAddRefAndStat(),
prCaAddReference(),
prCaAddUserReferences(),
prCaGetRowname(),
prCaGetVnStats(),
prCaPrepareCrudeAndAdjusted(),
prCaReorder(),
prCaReorderReferenceDescribe(),
prCaSelectAndOrderVars(),
prCaSetRownames()
Gets the labelled rowname if it exists
Description
Looks for matches inside factors if rowname contains the name of the column.
Usage
prCaGetRowname(vn, use_labels, dataset)
Arguments
| vn | The variable name | 
| use_labels | If labels should be used | 
| dataset | The dataset | 
Value
string The rowname
See Also
Other printCrudeAndAdjusted functions: 
prCaAddRefAndStat(),
prCaAddReference(),
prCaAddUserReferences(),
prCaGetImputationCols(),
prCaGetVnStats(),
prCaPrepareCrudeAndAdjusted(),
prCaReorder(),
prCaReorderReferenceDescribe(),
prCaSelectAndOrderVars(),
prCaSetRownames()
Gets the variable stats
Description
Gets the variable stats
Usage
prCaGetVnStats(
  model,
  vn,
  outcome,
  ds,
  add_references,
  add_references_pos,
  desc_args
)
Arguments
| model | The model | 
| vn | The variable name | 
| outcome | The outcome vector | 
| ds | The dataset | 
| add_references | True if it should use the data set to look for references, otherwise supply the function with a vector with names. Sometimes you want to indicate the reference row for each group. This needs to be just as many as the groups as the order identified. Use NA if you don't want to have a reference for that particular group. | 
| add_references_pos | The position where a reference should be added.
Sometimes you don't want the reference to be at the top, for instance
if you have age groups then you may have < 25, 25-39, 40-55, > 55 and
you have the reference to be 25-39 then you should set the reference
list for  | 
| desc_args | The description arguments that are to be used for the
the description columns. The options/arguments should be generated by the
 | 
Value
matrix A matrix from getDescriptionStatsBy or
prGetStatistics
See Also
Other printCrudeAndAdjusted functions: 
prCaAddRefAndStat(),
prCaAddReference(),
prCaAddUserReferences(),
prCaGetImputationCols(),
prCaGetRowname(),
prCaPrepareCrudeAndAdjusted(),
prCaReorder(),
prCaReorderReferenceDescribe(),
prCaSelectAndOrderVars(),
prCaSetRownames()
Prettify the text
Description
Sets the number of digits, formats the confidence interval and changes the number of cols into 4 where the upper and lower CI meet in one string column
Usage
prCaPrepareCrudeAndAdjusted(x, ci_lim, digits, sprintf_ci_str)
Arguments
| x | The value matrix from getCrudeAndAdjusted | 
| ci_lim | The limits of the confidence interval | 
| digits | The number of decimal digits to use | 
| sprintf_ci_str | The  | 
Value
matrix A string matrix with the values formated
See Also
Other printCrudeAndAdjusted functions: 
prCaAddRefAndStat(),
prCaAddReference(),
prCaAddUserReferences(),
prCaGetImputationCols(),
prCaGetRowname(),
prCaGetVnStats(),
prCaReorder(),
prCaReorderReferenceDescribe(),
prCaSelectAndOrderVars(),
prCaSetRownames()
Reorder according to the requested variables
Description
Uses the prCaSelectAndOrderVars for finding the
orders according to the order argument.
Usage
prCaReorder(mtrx2reorder, var_order, order)
Arguments
| mtrx2reorder | The matrix to reorder | 
| var_order | The variables representing different rows
 | 
| order | A vector of strings used for  | 
Value
matrix Returns the mtrx2reorder rearranged with the
attribute "greps" for the greps from prCaSelectAndOrderVars
and the attribute "var_order" for the new var_order
See Also
Other printCrudeAndAdjusted functions: 
prCaAddRefAndStat(),
prCaAddReference(),
prCaAddUserReferences(),
prCaGetImputationCols(),
prCaGetRowname(),
prCaGetVnStats(),
prCaPrepareCrudeAndAdjusted(),
prCaReorderReferenceDescribe(),
prCaSelectAndOrderVars(),
prCaSetRownames()
Adds the ordering, references, and descriptions
Description
This is a wrapper function around some more basic functions that
printCrudeAndAdjustedModel() uses.
Usage
prCaReorderReferenceDescribe(
  x,
  model,
  order,
  var_order,
  add_references,
  add_references_pos,
  reference_zero_effect,
  ds,
  desc_column,
  desc_args,
  use_labels
)
Arguments
| x | The main value matrix from the  | 
| model | The model | 
| order | A vector A vector with regular expressions for each group. | 
| var_order | The output from the  | 
| add_references | True if it should use the data set to look for references, otherwise supply the function with a vector with names. Sometimes you want to indicate the reference row for each group. This needs to be just as many as the groups as the order identified. Use NA if you don't want to have a reference for that particular group. | 
| add_references_pos | The position where a reference should be added.
Sometimes you don't want the reference to be at the top, for instance
if you have age groups then you may have < 25, 25-39, 40-55, > 55 and
you have the reference to be 25-39 then you should set the reference
list for  | 
| reference_zero_effect | Used with references, tells if zero effect
is in exponential form, i.e.  | 
| ds | The dataset from the model | 
| desc_column | Add descriptive column to the crude and adjusted table | 
| desc_args | The description arguments that are to be used for the
the description columns. The options/arguments should be generated by the
 | 
| use_labels | If the rowname.fn function doesn't change the name then the label should be used instead of the name, that is if there is a label and it isn't a factor. | 
Value
The reordered groups as a matrix
See Also
Other printCrudeAndAdjusted functions: 
prCaAddRefAndStat(),
prCaAddReference(),
prCaAddUserReferences(),
prCaGetImputationCols(),
prCaGetRowname(),
prCaGetVnStats(),
prCaPrepareCrudeAndAdjusted(),
prCaReorder(),
prCaSelectAndOrderVars(),
prCaSetRownames()
Re-order variables
Description
Re-order variables
Usage
prCaSelectAndOrderVars(names, order, ok2skip = FALSE)
Arguments
| names | The names of the variables | 
| order | The order regular expression | 
| ok2skip | If you have the intercept then it should be ok for the function to skip that variable if it isn't found among the variable list | 
Value
vector A vector containing the greps
See Also
Other printCrudeAndAdjusted functions: 
prCaAddRefAndStat(),
prCaAddReference(),
prCaAddUserReferences(),
prCaGetImputationCols(),
prCaGetRowname(),
prCaGetVnStats(),
prCaPrepareCrudeAndAdjusted(),
prCaReorder(),
prCaReorderReferenceDescribe(),
prCaSetRownames()
Sets the rownames of the reordered_groups
Description
Sets the rownames of the reordered_groups
Usage
prCaSetRownames(reordered_groups, var_order, rowname.fn, use_labels, ds)
Arguments
| reordered_groups | The value matrix that needs refrences | 
| var_order | The output from the  | 
| rowname.fn | A rowname function for tailoring names | 
| use_labels | Whether to use labels or not | 
| ds | The model data set | 
Value
matrix Returns the reordered_groups
See Also
Other printCrudeAndAdjusted functions: 
prCaAddRefAndStat(),
prCaAddReference(),
prCaAddUserReferences(),
prCaGetImputationCols(),
prCaGetRowname(),
prCaGetVnStats(),
prCaPrepareCrudeAndAdjusted(),
prCaReorder(),
prCaReorderReferenceDescribe(),
prCaSelectAndOrderVars()
Removes the printCrudeAndAdjusted class from arguments
Description
Removes the printCrudeAndAdjusted class from arguments
Usage
prClearPCAclass(pca)
Value
list
A function for converting a useNA variable
Description
The variable is suppose to be directly compatible with table(..., useNA = useNA). It throughs an error if not compatible
Usage
prConvertShowMissing(useNA)
Arguments
| useNA | Boolean or "no", "ifany", "always" | 
Value
string
Runs an fastDoCall() within the environment of the model
Description
Sometimes the function can't find some of the variables that
were available when running the original variable. This function
uses the as.formula() together with
environment() in order to get the environment
that the original code used.
Usage
prEnvModelCall(model, what, ...)
Arguments
| model | The model used | 
| what | The function or non-empty character string used for
 | 
| ... | Additional arguments passed to the function | 
Get model outcome
Description
Uses the model to extract the outcome variable. Throws error if unable to find the outcome.
Usage
prExtractOutcomeFromModel(model, mf)
Arguments
| model | The fitted model | 
| mf | The dataset that the model is fitted to - if missing it
uses the  | 
Value
vector
Looks for unique rowname match without grep
Description
Since a rowname may contain characters reserved by regular expressions I've found it easier to deal with the rowname finding by just checking for matching strings at the beginning of the name while at the same time excluding names that have the same stem, i.e. DM and DM_COMP will cause an issue since DM will match both rows.
Usage
prFindRownameMatches(rnames, vn, vars)
Arguments
| rnames | A vector with the rownames that are looked for | 
| vn | The variable name that is of interest | 
| vars | A vector with all the names and the potentially competing names | 
Value
integer A vector containing the position of the matches
TODO: remove this function in favor of the more powerful prMapVariable2Name
Get model data.frame
Description
Returns the raw variables from the original data
frame using the get_all_vars()
but with the twist that it also performs any associated
subsetting based on the model's subset() argument.
Usage
prGetModelData(x, terms_only = FALSE, term.label)
Arguments
| x | The fitted model. | 
| terms_only | Only use the right side of the equation by selecting the terms | 
| term.label | Sometimes need to retrieve specific spline labels that are not among the 'labels(terms(x))' | 
Value
data.frame
Get the models variables
Description
This function extract the modelled variables. Any interaction terms are removed as those should already be represented by the individual terms.
Usage
prGetModelVariables(
  model,
  remove_splines = TRUE,
  remove_interaction_vars = FALSE,
  add_intercept = FALSE
)
Arguments
| model | A model fit | 
| remove_splines | If splines, etc. should be cleaned from the variables as these no longer are "pure" variables | 
| remove_interaction_vars | If interaction variables are
not interesting then these should be removed. Often in
the case of  | 
| add_intercept | Adds the intercept if it exists | 
Value
vector with names
Get statistics according to the type
Description
A simple function applied by the getDescriptionStatsBy()
for the total column. This function is also used by printCrudeAndAdjustedModel()
in case of a basic linear regression is asked for a raw stat column
Usage
prGetStatistics(
  x,
  show_perc = FALSE,
  html = TRUE,
  digits = 1,
  numbers_first = TRUE,
  useNA = "no",
  show_all_values = FALSE,
  continuous_fn = describeMean,
  factor_fn = describeFactors,
  prop_fn = factor_fn,
  percentage_sign = percentage_sign
)
Arguments
| x | The variable that we want the statistics for | 
| show_perc | If this is a factor/proportion variable then we might want to show the percentages | 
| html | If the output should be in html or LaTeX formatting | 
| digits | Number of decimal digits | 
| numbers_first | If number is to be prior to the percentage | 
| useNA | If missing should be included | 
| show_all_values | This is by default false as for instance if there is no missing and there is only one variable then it is most sane to only show one option as the other one will just be a complement to the first. For instance sex - if you know gender then automatically you know the distribution of the other sex as it's 100 % - other %. | 
| continuous_fn | A function for describing continuous variables
defaults to  | 
| factor_fn | A function for describing factors, defaults to
 | 
| prop_fn | A function for describing proportions, defaults to the factor function | 
| percentage_sign | If you want to suppress the percentage sign you can set this variable to FALSE. You can also choose something else that the default % if you so wish by setting this variable. | 
Value
A matrix or a vector depending on the settings
TODO: Use the Gmisc function instead of this copy
A function that tries to resolve what variable corresponds to what row
Description
As both the getCrudeAndAdjustedModelData() and the
printCrudeAndAdjustedModel() need to now exactly
what name from the coef()/summary.rms()
correspond to we for generalizeability this rather elaborate function.
Usage
prMapVariable2Name(var_names, available_names, data, force_match = TRUE)
Arguments
| var_names | The variable names that are saught after | 
| available_names | The names that are available to search through | 
| data | The data set that is saught after | 
| force_match | Whether all variables need to be identified or not.
E.g. you may only want to use some variables and already pruned the
 | 
Value
list Returns a list with each element has the corresponding
variable name and a subsequent list with the parameters no_rows
and location indiciting the number of rows corresponding to that
element and where those rows are located. For factors the list also contains
lvls and no_lvls.
Chooses the degrees of freedom for the non-linearity
Description
Looks for the model with the minimal min_fn within the flex_param span.
Usage
prNlChooseDf(
  model,
  flex_param,
  variable,
  spline_fn,
  min_fn,
  simplest_nonlinear,
  verbal,
  workers,
  libraries
)
Arguments
| model | The model that is to be evaluated and adapted for non-linearity | 
| flex_param | A  | 
| variable | The name of the parameter that is to be tested for non-linearity. Note that the variable should be included plain (i.e. as a linear variable) form in the model. | 
| spline_fn | Either a string or a function that is to be used for testing alternative non-linearity models | 
| min_fn | This is the function that we want to minimized if the variable supports
the non-linearity assumption. E.g.  | 
| simplest_nonlinear | The simplest non-linear form that the ANOVA has been tested against | 
| verbal | Set this to  | 
| workers | The function tries to run everything in parallel. Under some
circumstances you may want to restrict the number of parallel threads to less
than the default  | 
| libraries | If we use the parallel approach we need to make sure that the right libraries are available in the threads | 
Plots the confidence intervals
Description
Uses polygon() or
lines() to plot confidence
intervals.
Usage
prPhConfIntPlot(model_data, color, polygon, lwd, lty)
Arguments
| model_data | A data frame with 'xvalues', 'upper', and 'lower' columns. | 
| color | The color of the line/polygon | 
| polygon | Boolean indicating polygon or line | 
| lwd | Line width - see  | 
| lty | Line type - see  | 
Value
void The function performs the print
Plot a density on the datapoints
Description
Plot a density on the datapoints
Usage
prPhDensityPlot(xvalues, color)
Arguments
| xvalues | The xvalues that are used for the density | 
| color | The color of the density polygon | 
Value
void
Gets the non-linear function's estimate
Description
The function uses predict if not specified contrast in order to attain the estimate, upper, and lower confidence interval
Usage
prPhEstimate(model, term.label, ylog, cntrst, xlim, alpha, new_data)
Arguments
| model | The fit of the model to be plotted | 
| term.label | The name of the label | 
| ylog | If the outcome should be presented in the anti-log form, i.e.
 | 
| cntrst | A boolean that indicates if the  | 
| xlim | The xlim if provided | 
| new_data | If not provided the function looks for the most common values i.e. median for continuous and mode for factors. | 
Value
data.frame with the columns xvalues, fit, ucl, lcl
A function for retrieving new_data argument for predict
Description
A function for retrieving new_data argument for predict
Usage
prPhNewData(model, term.label, xlim)
Arguments
| model | |
| term.label | The label that is the one that  | 
| xlim | The x-limits for the plot if any | 
Value
data.frame
Plot a rug on the datapoints
Description
Plot a rug on the datapoints
Usage
prPhRugPlot(xvalues)
Arguments
| xvalues | The xvalues that are used for the density | 
Value
void
Prep for printing
Description
Since we have both the print() and the
knit_print() that we need to call it is
useful to have a common string preparation.
Note: Currently knit_print doesn't work as expected...
Usage
prPrintCAstring(x, ...)
Arguments
| x | The output object from the  | 
| ... | outputs from  | 
Output crude and adjusted model data
Description
Prints table for a fitted object. It prints by default a latex table but can
also be converted into a HTML table that should be more compatible with common
word processors. For details run vignette("printCrudeAndAdjustedModel")
Usage
printCrudeAndAdjustedModel(
  model,
  order,
  digits = 2,
  ci_lim = c(-Inf, Inf),
  sprintf_ci_str = getOption("sprintf_ci_str", "%s to %s"),
  add_references,
  add_references_pos,
  reference_zero_effect,
  groups,
  rowname.fn,
  use_labels = TRUE,
  desc_column = FALSE,
  desc_args = caDescribeOpts(digits = digits),
  impute_args,
  ...
)
## S3 method for class 'printCrudeAndAdjusted'
rbind(..., alt.names, deparse.level = 1)
## S3 method for class 'printCrudeAndAdjusted'
print(x, ...)
## S3 method for class 'printCrudeAndAdjusted'
htmlTable(x, ...)
## S3 method for class 'printCrudeAndAdjusted'
x[i, j, ...]
## S3 method for class 'printCrudeAndAdjusted'
cbind(..., alt.names, deparse.level = 1)
## S3 method for class 'printCrudeAndAdjusted'
knit_print(x, ...)
## S3 method for class 'printCrudeAndAdjusted'
latex(object, ...)
Arguments
| model | A regression model fit, i.e. the returned object from your
regression function, or the output from  | 
| order | A vector with regular expressions for each group, use if youe want to reorder the groups in another way than what you've used in your original function. You can also use this in order to skip certain variables from the output. | 
| digits | The number of digits to round to | 
| ci_lim | A limit vector number that specifies if any values should be
abbreviated above or below this value, for instance a value of 1000
would give a value of  | 
| sprintf_ci_str | A string according to  | 
| add_references | True if it should use the data set to look for references, otherwise supply the function with a vector with names. Sometimes you want to indicate the reference row for each group. This needs to be just as many as the groups as the order identified. Use NA if you don't want to have a reference for that particular group. | 
| add_references_pos | The position where a reference should be added.
Sometimes you don't want the reference to be at the top, for instance
if you have age groups then you may have < 25, 25-39, 40-55, > 55 and
you have the reference to be 25-39 then you should set the reference
list for  | 
| reference_zero_effect | Used with references, tells if zero effect
is in exponential form, i.e.  | 
| groups | If you wish to have other than the default  | 
| rowname.fn | A function that takes a row name and sees if it needs beautifying. The function has only one parameter the coefficients name and should return a string or expression. | 
| use_labels | If the rowname.fn function doesn't change the name then the label should be used instead of the name, that is if there is a label and it isn't a factor. | 
| desc_column | Add descriptive column to the crude and adjusted table | 
| desc_args | The description arguments that are to be used for the
the description columns. The options/arguments should be generated by the
 | 
| impute_args | A list with additional arguments if the provided input is
a imputed object. Currently the list options  | 
| ... | outputs from  | 
| alt.names | If you don't want to use named arguments for the  | 
| deparse.level | backward compatibility | 
| x | The output object from the  | 
| object | The output object from the printCrudeAndAdjustedModel function | 
Value
matrix Returns a matrix of class printCrudeAndAdjusted that
has a default print method associated with
Warning
If you call this function and you've changed any of the variables used in the original call, i.e. the premises are changed, this function will not remember the original values and the statistics will be faulty!
See Also
latex() for details.
Other crudeAndAdjusted functions: 
getCrudeAndAdjustedModelData()
Examples
# simulated data to use
set.seed(10)
ds <- data.frame(
  ftime = rexp(200),
  fstatus = sample(0:1, 200, replace = TRUE),
  Variable1 = runif(200),
  Variable2 = runif(200),
  Variable3 = runif(200),
  Variable4 = factor(sample(LETTERS[1:4], size = 200, replace = TRUE))
)
library(rms)
dd <- datadist(ds)
options(datadist = "dd")
fit <- cph(Surv(ftime, fstatus) ~ Variable1 + Variable3 + Variable2 + Variable4,
  data = ds, x = TRUE, y = TRUE
)
printCrudeAndAdjustedModel(fit, order = c("Variable[12]", "Variable3"))
printCrudeAndAdjustedModel(fit,
  order = c("Variable3", "Variable4"),
  add_references = TRUE,
  desc_column = TRUE
)
# Now to a missing example
n <- 500
ds <- data.frame(
  x1 = factor(sample(LETTERS[1:4], size = n, replace = TRUE)),
  x2 = rnorm(n, mean = 3, 2),
  x3 = factor(sample(letters[1:3], size = n, replace = TRUE))
)
ds$Missing_var1 <- factor(sample(letters[1:4], size = n, replace = TRUE))
ds$Missing_var2 <- factor(sample(letters[1:4], size = n, replace = TRUE))
ds$y <- rnorm(nrow(ds)) +
  (as.numeric(ds$x1) - 1) * 1 +
  (as.numeric(ds$Missing_var1) - 1) * 1 +
  (as.numeric(ds$Missing_var2) - 1) * .5
# Create a messy missing variable
non_random_missing <- sample(which(ds$Missing_var1 %in% c("b", "d")),
  size = 150, replace = FALSE
)
# Restrict the non-random number on the x2 variables
non_random_missing <- non_random_missing[non_random_missing %in%
  which(ds$x2 > mean(ds$x2) * 1.5) &
  non_random_missing %in%
    which(ds$x2 > mean(ds$y))]
ds$Missing_var1[non_random_missing] <- NA
# Simple missing variable
ds$Missing_var2[sample(1:nrow(ds), size = 50)] <- NA
# Setup the rms environment
ddist <- datadist(ds)
options(datadist = "ddist")
impute_formula <-
  as.formula(paste(
    "~",
    paste(colnames(ds),
      collapse = "+"
    )
  ))
imp_ds <- aregImpute(impute_formula, data = ds, n.impute = 10)
fmult <- fit.mult.impute(y ~ x1 + x2 + x3 +
  Missing_var1 + Missing_var2,
fitter = ols, xtrans = imp_ds, data = ds
)
printCrudeAndAdjustedModel(fmult,
  impute_args = list(
    variance.inflation = TRUE,
    coef_change = list(
      type = "diff",
      digits = 3
    )
  )
)
# Use some labels and style to prettify the output
# fro the mtcars dataset
data("mtcars")
label(mtcars$mpg) <- "Gas"
units(mtcars$mpg) <- "Miles/(US) gallon"
label(mtcars$wt) <- "Weight"
units(mtcars$wt) <- "10^3 kg" # not sure the unit is correct
mtcars$am <- factor(mtcars$am, levels = 0:1, labels = c("Automatic", "Manual"))
label(mtcars$am) <- "Transmission"
mtcars$gear <- factor(mtcars$gear)
label(mtcars$gear) <- "Gears"
# Make up some data for making it slightly more interesting
mtcars$col <- factor(sample(c("red", "black", "silver"), size = NROW(mtcars), replace = TRUE))
label(mtcars$col) <- "Car color"
require(splines)
fit_mtcar <- lm(mpg ~ wt + gear + col, data = mtcars)
printCrudeAndAdjustedModel(fit_mtcar,
  add_references = TRUE,
  ctable = TRUE,
  desc_column = TRUE,
  digits = 1,
  desc_args = caDescribeOpts(
    digits = 1,
    colnames = c("Avg.")
  )) |>
  htmlTable::addHtmlTableStyle(css.rgroup = "",
                               css.header = "font-weight: normal")
printCrudeAndAdjustedModel(fit_mtcar,
  add_references = TRUE,
  desc_column = TRUE,
  order = c("Interc", "gear")
)
# Alterntive print - just an example, doesn't make sense to skip reference
printCrudeAndAdjustedModel(fit_mtcar,
  order = c("col", "gear"),
  groups = c("Color", "Gears"),
  add_references = c("black", NA),
  ctable = TRUE
)
# Now we can also combine models into one table using rbind()
mpg_model <- printCrudeAndAdjustedModel(lm(mpg ~ wt + gear + col, data = mtcars),
  add_references = TRUE,
  ctable = TRUE,
  desc_column = TRUE,
  digits = 1,
  desc_args = caDescribeOpts(
    digits = 1,
    colnames = c("Avg.")
  )
)
wt_model <- printCrudeAndAdjustedModel(lm(wt ~ mpg + gear + col, data = mtcars),
  add_references = TRUE,
  ctable = TRUE,
  desc_column = TRUE,
  digits = 1,
  desc_args = caDescribeOpts(
    digits = 1,
    colnames = c("Avg.")
  )
)
library(htmlTable)
rbind(Miles = mpg_model, Weight = wt_model) |>
  addHtmlTableStyle(pos.caption = "bottom") |>
  htmlTable(caption = paste("Combining models together with a table spanner element",
                            "separating each model"))
Robust covariance matrix based upon the 'sandwich'-package
Description
This is an alternative to the 'rms'-package robust covariance
matrix that uses the 'sandwich' package vcovHC() function
instead of the 'rms'-built-in estimator. The advantage being that
many more estimation types are available.
Usage
robcov_alt(fit, type = "HC3", ...)
Arguments
| fit | The ols fit that | 
| type | a character string specifying the estimation type. See
 | 
| ... | You should specify type= followed by some of the alternative available
for the  | 
Value
model The fitted model with adjusted variance and df.residual set to NULL
Examples
# Generate some data
n <- 500
x1 <- runif(n) * 2
x2 <- runif(n)
y <- x1^3 + x2 + rnorm(n)
library(rms)
library(sandwich)
dd <- datadist(x1, x2, y)
org.op <- options(datadist = "dd")
# Main function
f <- ols(y ~ rcs(x1, 3) + x2)
# Check the bread
bread(f)
# Check the HC-matrix
vcovHC(f, type = "HC4m")
# Adjust the model so that it uses the HC4m variance
f_rob <- robcov_alt(f, type = "HC4m")
# Get the new HC4m-matrix
# - this function just returns the f_rob$var matrix
vcov(f_rob)
# Now check the confidence interval for the function
confint(f_rob)
options(org.op)
A simpler latex output of the latex.anova.rms
Description
The original problem is that the anova default function output is very detailed and cause a complaint in Sweave/knitr that \hbox is overfull. It basically changes capitalized TOTAL, TOTAL INTERACTION and TOTAL NONLINEAR INTERACTION into lower case letters. It also deletes the (Factor + Higher Order Factors).
Usage
simpleRmsAnova(
  anova_output,
  subregexps,
  digits = 4,
  pval_lim.sig = 10^-4,
  rowlabel = "",
  ...
)
## S3 method for class 'simpleRmsAnova'
print(x, html = TRUE, ...)
Arguments
| anova_output | An object from the  | 
| subregexps | A 2 column matrix with sub() regular expressions to search for and their substitutions. The regular expression should be located in column 1 and the substitution in column 2. | 
| digits | Number of digits in using the round | 
| pval_lim.sig | The threshold before setting "<", default is < 0.0001 | 
| rowlabel | The label of the rows | 
| ... | Passed on to latex() or htmlTable | 
| x | The output object from the SimpleRmsAnova function | 
| html | If HTML output through the htmlTable should be used instead of traditional latex() function | 
Value
void See the latex() function
Examples
# ** Borrowed code from the lrm example **
# Fit a logistic model containing predictors age, blood.pressure, sex
# and cholesterol, with age fitted with a smooth 5-knot restricted cubic
# spline function and a different shape of the age relationship for males
# and females.
library(rms)
n <- 1000 # define sample size
set.seed(17) # so can reproduce the results
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol <- rnorm(n, 200, 25)
sex <- factor(sample(c("female", "male"), n, TRUE))
label(age) <- "Age" # label is in Hmisc
label(cholesterol) <- "Total Cholesterol"
label(blood.pressure) <- "Systolic Blood Pressure"
label(sex) <- "Sex"
units(cholesterol) <- "mg/dl" # uses units.default in Hmisc
units(blood.pressure) <- "mmHg"
# To use prop. odds model, avoid using a huge number of intercepts by
# grouping cholesterol into 40-tiles
# Specify population model for log odds that Y = 1
L <- .4 * (sex == "male") + .045 * (age - 50) +
     (log(cholesterol - 10) - 5.2) * (-2 * (sex == "female") + 2 * (sex == "male"))
# Simulate binary y to have Prob(y = 1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)
cholesterol[1:3] <- NA # 3 missings, at random
ddist <- datadist(age, blood.pressure, cholesterol, sex)
options(datadist = "ddist")
fit_lrm <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol, 4)),
     x = TRUE, y = TRUE
)
a_out <- anova(fit_lrm,
     dec.F = 1,
     ss = FALSE
)
simpleRmsAnova(a_out,
     subregexps = rbind(
          c("age", "Age"),
          c("cholesterol", "Cholesterol"),
          c("sex", "Sex")
     ),
     caption = "Anova output for a logistic regression model"
)
Tidy a(n) rms model object
Description
Tidy summarizes information about the components of a model. A model component might be a single term in a regressions. Exactly what tidy considers to be a model component varies across models but is usually self-evident. If a model has several distinct types of components, you will need to specify which components to return.
Usage
## S3 method for class 'rms'
tidy(
  x,
  conf.int = FALSE,
  conf.level = 0.95,
  exponentiate = FALSE,
  ...,
  .add_print_p_and_stat_values = getOption("Greg.tidy_add_p_and_stat_values", default =
    FALSE)
)
Arguments
| x | An rms model, e.g. ['rms::cph()'], ['rms::lrm()'] | 
| conf.int | Logical indicating whether or not to include a confidence
interval in the tidied output. Defaults to  | 
| conf.level | The confidence level to use for the confidence interval
if  | 
| exponentiate | Logical indicating whether or not to exponentiate the
the coefficient estimates. This is typical for logistic and multinomial
regressions, but a bad idea if there is no log or logit link. Defaults
to  | 
| ... | Additional arguments. Not used. Needed to match generic
signature only. Cautionary note: Misspelled arguments will be
absorbed in  
 | 
| .add_print_p_and_stat_values | For estimating print values there is a workaround that relies on capturing output from the 'print(x)' and is not considered safe. | 
Details
This is a quick fix for addressing the lack of 'rms'-compatibility with the 'broom' package, see [broom issue 30](https://github.com/tidymodels/broom/issues/30).
Value
A tibble::tibble() with columns: - 'term' The name of the regression term. - 'factor' The factor if the term is a character/factor term. - 'column_term' The full name as in the original input data - 'estimate' The estimated value of the regression term. - 'conf.high' Upper bound on the confidence interval for the estimate.c - 'conf.low' Lower bound on the confidence interval for the estimate. - 'p.value' The two-sided p-value associated with the observed statistic. - 'statistic' The value of a statistic to use in a hypothesis that the regression term is non-zero. - 'std.error' The standard error of the regression term.
Examples
library(rms)
library(broom)
library(tidyverse)
set.seed(10)
cov <- tibble(x1 = runif(200)) |> 
  mutate(x_bool_fact = if_else(x1 > 0.5,
                               "Yes",
                               sample(c("Yes", "No"), size = n(), replace = TRUE)),
         x_multi_fact = sample(c("Strange", "Factor", "Names"), size = n(), replace = TRUE),
         ftime = rexp(n()),
         fstatus = sample(0:1, size = n(), replace = TRUE),
         x_good_predictor = fstatus * runif(n()))
ddist <- datadist(cov)
options(datadist = "ddist")
cph_fit <- cph(Surv(ftime, fstatus) ~ x1 + x_bool_fact + 
                 x_multi_fact + x_good_predictor, data = cov)
tidy(cph_fit)
A function for splitting a time according to time periods
Description
If we have a violation of the cox proprtional hazards assumption we need to
split an individual's followup time into several. See vignette("timeSplitter", package = "Greg")
for a detailed description.
Usage
timeSplitter(
  data,
  by,
  time_var,
  event_var,
  event_start_status,
  time_related_vars,
  time_offset
)
Arguments
| data | The dataset that you want to split according to the  | 
| by | The time period that you want to split the dataset by. The size of the variable
must be in proportion to the the  | 
| time_var | The name of the main time variable in the dataset. This variable must be a numeric variable. | 
| event_var | The event variable | 
| event_start_status | The start status of the event status, e.g. "Alive" | 
| time_related_vars | A dataset often contains other variabels that you want to update during the split, most commonly these are age or calendar time. | 
| time_offset | If you want to skip the initial years you can offset the entire dataset by setting this variable. See detailed description below. | 
Details
Important note: The time variables must have the same time unit. I.e. function can not dedu if all variables are in years or if one happens to be in days.
Value
data.frame with the split data. The starting time for each period
is named Start_time and the ending time is called Stop_time. Note
that the resulting event_var will now contain the time-splitted eventvar.
The time_offset - details
Both time_var and other variables will be adjusted by the time_offset,
e.g. if we the time scale is in years and we want to skip the first 4 years
we set the time_offset = 4. In the outputted dataset the smallest
time_var will be 0. Note: 0 will not be included as we
generally want to look at those that survived the start date, e.g. if a
patient dies on the 4-year mark we would not include him/her in our study.
Examples
test_data <- data.frame(
  id = 1:4,
  time = c(4, 3.5, 1, 5),
  event = c("alive", "censored", "dead", "dead"),
  age = c(62.2, 55.3, 73.7, 46.3),
  date = as.Date(
    c("2003-01-01", 
      "2010-04-01", 
      "2013-09-20",
      "2002-02-23")),
  stringsAsFactors = TRUE
)
timeSplitter(test_data, .5, 
             time_var = "time",
             time_related_vars = c("age", "date"),
             event_var = "event")