| Type: | Package |
| Title: | Estimating the Error Variance in a High-Dimensional Linear Model |
| Version: | 0.9.0 |
| Maintainer: | Guo Yu <gy63@cornell.edu> |
| Description: | Implementation of the two error variance estimation methods in high-dimensional linear models of Yu, Bien (2017) <doi:10.48550/arXiv.1712.02412>. |
| URL: | https://arxiv.org/abs/1712.02412 |
| BugReports: | https://github.com/hugogogo/natural/issues |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 6.0.1 |
| Imports: | Matrix, glmnet |
| Suggests: | knitr, rmarkdown |
| VignetteBuilder: | knitr |
| NeedsCompilation: | yes |
| Packaged: | 2018-01-16 01:32:01 UTC; hugo |
| Author: | Guo Yu [aut, cre] |
| Repository: | CRAN |
| Date/Publication: | 2018-01-16 10:35:43 UTC |
natural: Natural and Organic lasso estimates of error variance in high-dimensional linear models
Description
The package implements the two error variance estimation methods, the natural lasso and the organic lasso, introduced in Yu, Bien (2017) https://arxiv.org/abs/1712.02412.
Details
The main functions are nlasso_cv, olasso_cv, and olasso.
Get the two (theoretical) values of lambdas used in the organic lasso
Description
Get the two (theoretical) values of lambdas used in the organic lasso
Usage
getLam_olasso(x)
Arguments
x |
design matrix |
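The two values can be sketched in base R from the definitions given in the description of olasso: \lambda_1 = log p / n, and a Monte-Carlo estimate of ||X^T e||_\infty^2 / n^2 with e an n-dimensional standard normal vector. The helper name lam_olasso_sketch, the number of Monte-Carlo draws nrep, and the plain average over draws are assumptions made for illustration; the package's internal choices (including whether x is standardized first) may differ.

```r
# Hypothetical helper, not part of the package.
lam_olasso_sketch <- function(x, nrep = 1000) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  # First value: log(p) / n.
  lam_1 <- log(p) / n
  # Second value: average of ||x^T e||_inf^2 / n^2 over nrep standard normal draws.
  e <- matrix(rnorm(n * nrep), nrow = n, ncol = nrep)
  sup_norm <- apply(abs(crossprod(x, e)), 2, max)
  lam_2 <- mean(sup_norm^2) / n^2
  c(lam_1 = lam_1, lam_2 = lam_2)
}

set.seed(1)
x_demo <- matrix(rnorm(50 * 200), nrow = 50, ncol = 200)
lam_demo <- lam_olasso_sketch(x_demo)
lam_demo
```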
Get the two (theoretical) values of lambdas used in scaled lasso
Description
Get the two (theoretical) values of lambdas used in scaled lasso
Usage
getLam_slasso(n, p)
Arguments
n |
number of observations |
p |
number of features |
Generate sparse linear model and random samples
Description
Generate design matrix and response following linear models
y = X \beta + \epsilon, where
\epsilon ~ N(0, \sigma^2), and X ~ N(0, \Sigma).
Usage
make_sparse_model(n, p, alpha, rho, snr, nsim)
Arguments
n |
the sample size |
p |
the number of features |
alpha |
sparsity, i.e., |
rho |
pairwise correlation among features |
snr |
signal to noise ratio, defined as |
nsim |
the number of simulations |
Value
A list object containing:
x: The n by p design matrix
y: The n by nsim matrix of response vector, each column representing one replication of the simulation
beta: The true regression coefficient vector
sigma: The true error standard deviation
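The model y = X \beta + \epsilon can be simulated by hand in base R, which may help when reading the output above. This is only a sketch under stated assumptions: \Sigma is taken to be equicorrelated with off-diagonal entries rho, the number of nonzero coefficients k and the error standard deviation sigma are fixed directly rather than derived from alpha and snr, and all variable names are illustrative. It does not reproduce make_sparse_model.

```r
set.seed(1)
n <- 50
p <- 20
rho <- 0.6
k <- 5        # assumed number of nonzero coefficients (illustrative)
sigma <- 1    # assumed error standard deviation (illustrative)

# Equicorrelated covariance: 1 on the diagonal, rho elsewhere (an assumption).
Sigma <- matrix(rho, nrow = p, ncol = p)
diag(Sigma) <- 1

# Rows of x_sim are N(0, Sigma): chol() returns R with t(R) %*% R = Sigma.
x_sim <- matrix(rnorm(n * p), nrow = n, ncol = p) %*% chol(Sigma)

# Sparse coefficient vector and response.
beta_sim <- c(rep(1, k), rep(0, p - k))
y_sim <- drop(x_sim %*% beta_sim) + sigma * rnorm(n)
```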
Cross-validation for natural lasso
Description
Provide the natural lasso estimate (of the error standard deviation), using cross-validation to select the tuning parameter value. The output also includes the cross-validation results of the naive estimate and the degree-of-freedom adjusted estimate of the error standard deviation.
Usage
nlasso_cv(x, y, lambda = NULL, intercept = TRUE, nlam = 100,
flmin = 0.01, nfold = 5, foldid = NULL, thresh = 1e-08,
glmnet_output = NULL)
Arguments
x |
An |
y |
A response vector of size |
lambda |
A user specified list of tuning parameter. Default to be NULL, and the program will compute its own |
intercept |
Indicator of whether intercept should be fitted. Default to be |
nlam |
The number of |
flmin |
The ratio of the smallest and the largest values in |
nfold |
Number of folds in cross-validation. Default value is 5. If each fold gets too few observations, a warning is thrown and the minimal |
foldid |
A vector of length |
thresh |
Threshold value for underlying optimization algorithm to claim convergence. Default to be |
glmnet_output |
Should the estimate be computed using a user-specified output from |
Value
A list object containing:
n and p: The dimension of the problem.
lambda: The path of tuning parameters used.
beta: Estimate of the regression coefficients, in the original scale, corresponding to the tuning parameter selected by cross-validation.
a0: Estimate of the intercept.
mat_mse: The estimated prediction error on the test sets in cross-validation. A matrix of size nlam by nfold. If glmnet_output is not NULL, then mat_mse will be NULL.
cvm: The averaged estimated prediction error on the test sets over K folds.
cvse: The standard error of the estimated prediction error on the test sets over K folds.
ibest: The index in lambda that attains the minimal mean cross-validated error.
foldid: Fold assignment. A vector of length n.
nfold: The number of folds used in cross-validation.
sig_obj: Natural lasso estimate of the standard deviation of the error, with the optimal tuning parameter selected by cross-validation.
sig_obj_path: Natural lasso estimates of the standard deviation of the error. A vector of length nlam.
sig_naive: Naive estimate of the error standard deviation based on lasso regression, i.e., ||y - X \hat{\beta}||_2 / \sqrt n, selected by cross-validation.
sig_naive_path: Naive estimates of the standard deviation of the error based on lasso regression. A vector of length nlam.
sig_df: Degree-of-freedom adjusted estimate of the standard deviation of the error, selected by cross-validation. See Reid et al. (2016).
sig_df_path: Degree-of-freedom adjusted estimates of the standard deviation of the error. A vector of length nlam.
type: Whether the output is of a natural or an organic lasso.
Examples
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
nl_cv <- nlasso_cv(x = sim$x, y = sim$y[, 1])
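To make cross-validation results reproducible across calls, a fold assignment can be built once and passed through foldid. The sketch below assumes foldid takes integer fold labels between 1 and nfold, one per observation, as the returned foldid component suggests; the call to nlasso_cv is left as a comment so the snippet runs without the package loaded.

```r
set.seed(123)
n <- 50
nfold <- 5

# Balanced random assignment: each fold label appears n / nfold times.
foldid <- sample(rep(seq_len(nfold), length.out = n))
table(foldid)

# With the package loaded and `sim` simulated as in the example above:
# nl_cv <- nlasso_cv(x = sim$x, y = sim$y[, 1], foldid = foldid)
```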
Fit a linear model with natural lasso
Description
Calculate a solution path of the natural lasso estimate (of error standard deviation) with a list of tuning parameter values. In particular, this function solves the lasso problems and returns the lasso objective function values as estimates of the error variance:
\hat{\sigma}^2_{\lambda} = \min_{\beta} ||y - X \beta||_2^2 / n + 2 \lambda ||\beta||_1.
The output also includes a path of naive estimates and a path of degree of freedom adjusted estimates of the error standard deviation.
Usage
nlasso_path(x, y, lambda = NULL, nlam = 100, flmin = 0.01,
thresh = 1e-08, intercept = TRUE, glmnet_output = NULL)
Arguments
x |
An |
y |
A response vector of size |
lambda |
A user specified list of tuning parameter. Default to be NULL, and the program will compute its own |
nlam |
The number of |
flmin |
The ratio of the smallest and the largest values in |
thresh |
Threshold value for the underlying optimization algorithm to claim convergence. Default to be |
intercept |
Indicator of whether intercept should be fitted. Default to be |
glmnet_output |
Should the estimate be computed using a user-specified output from |
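For the lambda, nlam and flmin arguments above, one common way to build a default path is a log-spaced grid that starts at the smallest value zeroing out all lasso coefficients and ends at flmin times that value. The sketch below is an assumption about a typical construction, not a statement of what the package computes internally; lambda_path_sketch is a hypothetical helper. For the objective ||y - X \beta||_2^2 / n + 2 \lambda ||\beta||_1 with centered data and no intercept, the largest useful value is max_j |x_j^T y| / n.

```r
# Hypothetical helper, not part of the package.
lambda_path_sketch <- function(x, y, nlam = 100, flmin = 0.01) {
  n <- nrow(x)
  # Smallest lambda at which the lasso solution is identically zero.
  lam_max <- max(abs(crossprod(x, y))) / n
  # Log-spaced grid from lam_max down to flmin * lam_max.
  exp(seq(log(lam_max), log(flmin * lam_max), length.out = nlam))
}

set.seed(1)
x_demo <- scale(matrix(rnorm(50 * 20), 50, 20), center = TRUE, scale = FALSE)
y_demo <- rnorm(50)
y_demo <- y_demo - mean(y_demo)
lam <- lambda_path_sketch(x_demo, y_demo)
range(lam)
```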
Value
A list object containing:
n and p: The dimension of the problem.
lambda: The path of tuning parameters used.
beta: Matrix of estimates of the regression coefficients, in the original scale. The matrix is of size p by nlam. The j-th column represents the estimate of coefficient corresponding to the j-th tuning parameter in lambda.
a0: Estimate of the intercept. A vector of length nlam.
sig_obj_path: Natural lasso estimates of the error standard deviation. A vector of length nlam.
sig_naive_path: Naive estimates of the error standard deviation based on lasso regression, i.e., ||y - X \hat{\beta}||_2 / \sqrt n. A vector of length nlam.
sig_df_path: Degree-of-freedom adjusted estimates of the standard deviation of the error. A vector of length nlam. See Reid et al. (2016).
type: Whether the output is of a natural or an organic lasso.
Examples
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
nl_path <- nlasso_path(x = sim$x, y = sim$y[, 1])
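The relation between the three estimates can be seen at a single value of \lambda with a small base R sketch. It solves the lasso problem from the description with a plain coordinate descent (no intercept, centered data), then forms the natural lasso estimate as the square root of the objective value, the naive estimate ||y - X \hat{\beta}||_2 / \sqrt n, and a degree-of-freedom adjusted estimate taken here as ||y - X \hat{\beta}||_2 / \sqrt{n - s}, with s the number of nonzero coefficients. The helper names, the simulated data and the value of lam are illustrative assumptions; the package relies on glmnet rather than this solver.

```r
# Soft-thresholding operator.
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Hypothetical helper: coordinate descent for
#   min_beta ||y - x beta||_2^2 / n + 2 * lam * ||beta||_1   (no intercept).
lasso_cd_sketch <- function(x, y, lam, maxit = 1000, thresh = 1e-8) {
  n <- nrow(x)
  p <- ncol(x)
  beta <- rep(0, p)
  r <- y                      # current residual y - x %*% beta
  a <- colSums(x^2) / n       # per-coordinate curvature
  for (it in seq_len(maxit)) {
    delta <- 0
    for (j in seq_len(p)) {
      if (a[j] == 0) next
      # Correlation of column j with the partial residual, divided by n.
      z <- sum(x[, j] * r) / n + a[j] * beta[j]
      b_new <- soft(z, lam) / a[j]
      if (b_new != beta[j]) {
        r <- r - x[, j] * (b_new - beta[j])
        delta <- max(delta, abs(b_new - beta[j]))
        beta[j] <- b_new
      }
    }
    if (delta < thresh) break   # largest coordinate change below tolerance
  }
  beta
}

set.seed(1)
n <- 50
p <- 20
x <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
y <- drop(x[, 1:3] %*% c(2, -1, 1)) + rnorm(n)
y <- y - mean(y)

lam <- 0.2
beta_hat <- lasso_cd_sketch(x, y, lam)
rss <- sum((y - drop(x %*% beta_hat))^2)

sig_obj   <- sqrt(rss / n + 2 * lam * sum(abs(beta_hat)))  # natural lasso
sig_naive <- sqrt(rss / n)                                 # naive
sig_df    <- sqrt(rss / (n - sum(beta_hat != 0)))          # df-adjusted
c(sig_obj = sig_obj, sig_naive = sig_naive, sig_df = sig_df)
```

Because the penalty term is nonnegative, the natural lasso estimate is never smaller than the naive estimate computed from the same fit.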
Error standard deviation estimation using organic lasso
Description
Solve the organic lasso problem
\tilde{\sigma}^2_{\lambda} = \min_{\beta} ||y - X \beta||_2^2 / n + 2 \lambda ||\beta||_1^2
with two pre-specified values of tuning parameter:
\lambda_1 = log p / n, and \lambda_2, which is a Monte-Carlo estimate of ||X^T e||_\infty^2 / n^2, where e is n-dimensional standard normal.
Usage
olasso(x, y, intercept = TRUE, thresh = 1e-08)
Arguments
x |
An |
y |
A response vector of size |
intercept |
Indicator of whether intercept should be fitted. Default to be |
thresh |
Threshold value for underlying optimization algorithm to claim convergence. Default to be |
Value
A list object containing:
n and p: The dimension of the problem.
lam_1, lam_2: log(p) / n, and a Monte-Carlo estimate of ||X^T e||_\infty^2 / n^2, where e is n-dimensional standard normal.
a0_1, a0_2: Estimate of the intercept, corresponding to lam_1 and lam_2.
beta_1, beta_2: Organic lasso estimate of regression coefficients, corresponding to lam_1 and lam_2.
sig_obj_1, sig_obj_2: Organic lasso estimate of the error standard deviation, corresponding to lam_1 and lam_2.
Examples
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
ol <- olasso(x = sim$x, y = sim$y[, 1])
Cross-validation for organic lasso
Description
Provide the organic lasso estimate (of the error standard deviation), using cross-validation to select the tuning parameter value.
Usage
olasso_cv(x, y, lambda = NULL, intercept = TRUE, nlam = 100,
flmin = 0.01, nfold = 5, foldid = NULL, thresh = 1e-08)
Arguments
x |
An |
y |
A response vector of size |
lambda |
A user specified list of tuning parameter. Default to be NULL, and the program will compute its own |
intercept |
Indicator of whether intercept should be fitted. Default to be |
nlam |
The number of |
flmin |
The ratio of the smallest and the largest values in |
nfold |
Number of folds in cross-validation. Default value is 5. If each fold gets too few observations, a warning is thrown and the minimal |
foldid |
A vector of length |
thresh |
Threshold value for underlying optimization algorithm to claim convergence. Default to be |
Value
A list object containing:
n and p: The dimension of the problem.
lambda: The path of tuning parameter used.
beta: Estimate of the regression coefficients, in the original scale, corresponding to the tuning parameter selected by cross-validation.
a0: Estimate of intercept
mat_mse: The estimated prediction error on the test sets in cross-validation. A matrix of size nlam by nfold
cvm: The averaged estimated prediction error on the test sets over K folds.
cvse: The standard error of the estimated prediction error on the test sets over K folds.
ibest: The index in lambda that attains the minimal mean cross-validated error.
foldid: Fold assignment. A vector of length n.
nfold: The number of folds used in cross-validation.
sig_obj: Organic lasso estimate of the error standard deviation, selected by cross-validation.
sig_obj_path: Organic lasso estimates of the error standard deviation. A vector of length nlam.
type: whether the output is of a natural or an organic lasso.
Examples
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
ol_cv <- olasso_cv(x = sim$x, y = sim$y[, 1])
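The cross-validation summaries in the returned list relate to mat_mse in a simple way, which the following sketch illustrates. The matrix here is filled with simulated placeholder numbers, not real cross-validation output, and the standard error is taken as the row-wise standard deviation divided by the square root of the number of folds, which is the usual convention but an assumption about the package's exact formula.

```r
set.seed(1)
nlam <- 100
nfold <- 5

# Placeholder for mat_mse (nlam by nfold): simulated values for illustration only.
mat_mse <- matrix(rexp(nlam * nfold), nrow = nlam, ncol = nfold)

cvm   <- rowMeans(mat_mse)                      # mean test error per lambda
cvse  <- apply(mat_mse, 1, sd) / sqrt(nfold)    # standard error per lambda (assumed form)
ibest <- which.min(cvm)                         # index of the selected lambda
```

With an actual fit, the selected tuning parameter would then be lambda[ibest], and sig_obj would correspond to sig_obj_path[ibest].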
Fit a linear model with organic lasso
Description
Calculate a solution path of the organic lasso estimate (of error standard deviation) with a list of tuning parameter values. In particular, this function solves the squared-lasso problems and returns the objective function values as estimates of the error variance:
\tilde{\sigma}^2_{\lambda} = \min_{\beta} ||y - X \beta||_2^2 / n + 2 \lambda ||\beta||_1^2.
Usage
olasso_path(x, y, lambda = NULL, nlam = 100, flmin = 0.01,
thresh = 1e-08, intercept = TRUE)
Arguments
x |
An |
y |
A response vector of size |
lambda |
A user specified list of tuning parameter. Default to be NULL, and the program will compute its own |
nlam |
The number of |
flmin |
The ratio of the smallest and the largest values in |
thresh |
Threshold value for underlying optimization algorithm to claim convergence. Default to be |
intercept |
Indicator of whether intercept should be fitted. Default to be |
Details
The output also includes the naive and the degree-of-freedom adjusted estimates, in analogy to nlasso_path.
Value
A list object containing:
n and p: The dimension of the problem.
lambda: The path of tuning parameter used.
a0: Estimate of intercept. A vector of length nlam.
beta: Matrix of estimates of the regression coefficients, in the original scale. The matrix is of size p by nlam. The j-th column represents the estimate of coefficient corresponding to the j-th tuning parameter in lambda.
sig_obj_path: Organic lasso estimates of the error standard deviation. A vector of length nlam.
sig_naive: Naive estimate of the error standard deviation based on the squared-lasso regression. A vector of length nlam.
sig_df: Degree-of-freedom adjusted estimate of the error standard deviation, based on the squared-lasso regression. A vector of length nlam.
type: whether the output is of a natural or an organic lasso.
Examples
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
ol_path <- olasso_path(x = sim$x, y = sim$y[, 1])
Solve organic lasso problem with a single value of lambda. The lambda values are for slow rates, which could give less satisfying results.
Description
Solve organic lasso problem with a single value of lambda. The lambda values are for slow rates, which could give less satisfying results.
Usage
olasso_slow(x, y, thresh = 1e-08)
Arguments
x |
An |
y |
A response vector of size |
thresh |
Threshold value for underlying optimization algorithm to claim convergence. Default to be |
plot a natural.cv object
Description
This function is adapted from the ggb R package.
Usage
## S3 method for class 'natural.cv'
plot(x, ...)
Arguments
x |
an object of class |
... |
additional arguments (not used here; only for S3 generic/method consistency) |
plot a natural.path object
Description
This function is adapted from the ggb R package.
Usage
## S3 method for class 'natural.path'
plot(x, ...)
Arguments
x |
an object of class |
... |
additional arguments (not used here; only for S3 generic/method consistency) |
print a natural.path object
Description
This function is adapted from the ggb R package.
Usage
## S3 method for class 'natural.path'
print(x, ...)
Arguments
x |
an object of class |
... |
additional arguments (not used here; only for S3 generic/method consistency) |
Standardize the n -by- p design matrix X to have column means zero and ||X_j||_2^2 = n for all j
Description
Standardize the n -by- p design matrix X to have column means zero and ||X_j||_2^2 = n for all j
Usage
standardize(x, center = TRUE)
Arguments
x |
design matrix |
center |
should we set column means equal to zero |
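The stated normalization, column means zero and ||X_j||_2^2 = n for all j, can be written in a few lines of base R. The helper below is a hypothetical re-implementation for illustration (named standardize_sketch to avoid clashing with the package function); in particular, its behaviour when center = FALSE, where columns are only rescaled, is an assumption.

```r
# Hypothetical re-implementation for illustration, not the package function.
standardize_sketch <- function(x, center = TRUE) {
  x <- as.matrix(x)
  n <- nrow(x)
  if (center) {
    # Subtract each column's mean.
    x <- sweep(x, 2, colMeans(x), "-")
  }
  # Rescale each column so that its squared Euclidean norm equals n.
  col_scale <- sqrt(colSums(x^2) / n)
  sweep(x, 2, col_scale, "/")
}

set.seed(1)
x_raw <- matrix(rnorm(50 * 10, mean = 3, sd = 2), nrow = 50, ncol = 10)
x_std <- standardize_sketch(x_raw)
round(colMeans(x_std), 12)   # column means, zero up to rounding
colSums(x_std^2)             # squared column norms, each equal to n
```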