runPCA {scater} | R Documentation |
Perform a principal components analysis (PCA) on cells, based on the data in a SingleCellExperiment object.
runPCA(object, ncomponents = 2, method = c("prcomp", "irlba"), ntop = 500, exprs_values = "logcounts", feature_set = NULL, scale_features = TRUE, use_coldata = FALSE, selected_variables = NULL, detect_outliers = FALSE, rand_seed = NULL, ...)
object |
A SingleCellExperiment object. |
ncomponents |
Numeric scalar indicating the number of principal components to obtain. |
method |
String specifying how the PCA should be performed. |
ntop |
Numeric scalar specifying the number of most variable features to use for PCA. |
exprs_values |
Integer scalar or string indicating which assay of |
feature_set |
Character vector of row names, a logical vector or a numeric vector of indices indicating a set of features to use for PCA.
This will override any |
scale_features |
Logical scalar, should the expression values be standardised so that each feature has unit variance? |
use_coldata |
Logical scalar specifying whether the column data should be used instead of expression values to perform PCA. |
selected_variables |
List of strings or a character vector indicating which variables in |
detect_outliers |
Logical scalar, should outliers be detected based on PCA coordinates generated from column-level metadata? |
rand_seed |
Numeric scalar specifying the random seed when using |
... |
Additional arguments to pass to |
The function prcomp is used internally to do the PCA when method="prcomp".
Alternatively, the irlba package can be used, which performs a fast approximation of PCA through the prcomp_irlba function.
This is especially useful for large, sparse matrices.
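As a rough illustration of what the ntop, scale_features and ncomponents arguments describe, the sketch below selects the most variable features from a toy matrix and runs prcomp on the cells. It uses base R only; the matrix expr and the variables keep and coords are hypothetical names, and this is not scater's internal code.

```r
# Hypothetical toy data: 200 features (rows) by 30 cells (columns).
set.seed(1)
expr <- matrix(rnorm(200 * 30), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("cell", 1:30)))

ntop <- 50
ncomponents <- 2

# Rank features by their variance across cells and keep the ntop most variable.
feature_vars <- apply(expr, 1, var)
keep <- order(feature_vars, decreasing = TRUE)[seq_len(min(ntop, nrow(expr)))]

# prcomp expects observations in rows, so transpose to put cells in rows.
# scale. = TRUE standardises each feature to unit variance.
pca <- prcomp(t(expr[keep, , drop = FALSE]), center = TRUE, scale. = TRUE)

# Keep the first ncomponents coordinates for each cell.
coords <- pca$x[, seq_len(ncomponents), drop = FALSE]
dim(coords)
```

The result has one row per cell and one column per retained component, which is the shape of matrix that a reduced dimension entry would hold.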
If use_coldata=TRUE, PCA will be performed on column-level metadata.
The selected_variables argument defaults to a vector containing:
"pct_counts_top_100_features"
"total_features"
"pct_counts_feature_control"
"total_features_feature_control"
"log10_total_counts_endogenous"
"log10_total_counts_feature_control"
This can be useful for identifying outlier cells based on QC metrics, especially when combined with detect_outliers=TRUE.
If outlier identification is enabled, the outlier field of the output colData will contain the identified outliers.
A SingleCellExperiment object containing the first ncomponents principal coordinates for each cell.
If use_coldata=FALSE, this is stored in the "PCA" entry of the reducedDims slot.
Otherwise, it is stored in the "PCA_coldata" entry.
The proportion of variance explained by each PC is stored as a numeric vector in the "percentVar" attribute of the reduced dimension matrix.
Note that this will only be of length equal to ncomponents when method is not "prcomp".
This is because approximate PCA methods do not compute singular values for all components.
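To make the length difference concrete, the following base R sketch computes the proportion of variance from a full prcomp fit and attaches it to a truncated coordinate matrix. The toy matrix x and the variable names are hypothetical, and the calculation is an assumption about how such a vector can be derived, not a copy of scater's implementation.

```r
# Hypothetical toy data: 30 cells (rows) by 10 features (columns).
set.seed(2)
x <- matrix(rnorm(30 * 10), nrow = 30)
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# The variance of each PC is its squared standard deviation;
# dividing by the total gives the proportion of variance explained.
percent_var <- pca$sdev^2 / sum(pca$sdev^2)

# Keep only the first ncomponents coordinates, but attach the full vector.
ncomponents <- 2
coords <- pca$x[, seq_len(ncomponents), drop = FALSE]
attr(coords, "percentVar") <- percent_var

# One entry per computed PC, which is more than ncomponents here.
length(attr(coords, "percentVar"))
```

Because an exact decomposition yields all components, the attached vector is longer than ncomponents; a truncated method that only computes the first few singular values cannot report the rest.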
Aaron Lun, based on code by Davis McCarthy
## Set up an example SingleCellExperiment
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
    assays = list(counts = sc_example_counts),
    colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)

example_sce <- runPCA(example_sce)
reducedDimNames(example_sce)
head(reducedDim(example_sce))