getBestFeatures,matrixOrHDF5-method {clusterExperiment} | R Documentation |
Calls limma on input data to determine features most associated with found clusters (based on an F-statistic, pairwise comparisons, or following a tree that clusters the clusters).
## S4 method for signature 'matrixOrHDF5' getBestFeatures(x, cluster, contrastType = c("F", "Dendro", "Pairs", "OneAgainstAll"), dendro = NULL, pairMat = NULL, contrastAdj = c("All", "PerContrast", "AfterF"), isCount = FALSE, normalize.method = "none", ...) ## S4 method for signature 'ClusterExperiment' getBestFeatures(x, contrastType = c("F", "Dendro", "Pairs", "OneAgainstAll"), isCount = FALSE, ...)
x |
data for the test. Can be a numeric matrix or a
|
cluster |
A numeric vector with cluster assignments. “-1” indicates the sample was not assigned to a cluster. |
contrastType |
What type of test to do. ‘F’ gives the omnibus
F-statistic, ‘Dendro’ traverses the given dendrogram and does contrasts of
the samples in each side, ‘Pairs’ does pair-wise contrasts based on the
pairs given in pairMat (if pairMat=NULL, does all pairwise), and
‘OneAgainstAll’ compares each cluster to the average of all others. Passed
to |
dendro |
The dendrogram to traverse if contrastType="Dendro". Note that this should be the dendrogram of the clusters, not of the individual samples, either of class "dendrogram" or "phylo4" |
pairMat |
matrix giving the pairs of clusters for which to do pair-wise
contrasts (must match to elements of cl). If NULL, will do all pairwise of
the clusters in |
contrastAdj |
What type of FDR correction to do for contrasts tests (i.e. if contrastType='Dendro' or 'Pairs'). |
isCount |
logical as to whether input data is count data, in which case to perform voom correction to data. See details. |
normalize.method |
character value, passed to |
... |
options to pass to |
getBestFeatures returns the top ranked features corresponding to a
cluster assignment. It uses limma to fit the models, and limma's functions
topTable
or topTableF
to find the
best features. See the options of these functions to put better control on
what gets returned (e.g. only if significant, only if log-fc is above a
certain amount, etc.). In particular, set 'number=' to define how many
significant features to return (where number is per contrast for the
'Pairs' or 'Dendro' option)
When 'contrastType' argument implies that the best features should be found via contrasts (i.e. 'contrastType' is 'Pairs' or 'Dendro'), then then 'contrastAdj' determines the type of multiple testing correction to perform. 'PerContrast' does FDR correction for each set of contrasts, and does not guarantee control across all the different contrasts (so probably not the preferred method). 'All' calculates the corrected p-values based on FDR correction of all of the contrasts tested. 'AfterF' controls the FDR based on a hierarchical scheme that only tests the contrasts in those genes where the omnibus F statistic is significant. If the user selects 'AfterF', the user must also supply an option 'p.value' to have any effect, and then only those significant at that p.value level will be returned. Note that currently the correction for 'AfterF' is not guaranteed to control the FDR; improvements will be added in the future.
Note that the default option for topTable
is
to not filter based on adjusted p-values (p.value = 1
) and return
only the top 10 most significant (number = 10
) – these are options
the user can change (these arguments are passed via the ...
in
getBestFeatures
). In particular, it only makes sense to set
requireF = TRUE
if p.value
is meaningful (e.g. 0.1 or 0.05);
the default value of p.value = 1
will not result in any effect on
the adjusted p-value otherwise.
isCount
triggers whether the "voom" correction will be
performed in limma
. If the input data is a matrix is counts (or a
'ClusterExperiment' object with counts as the primary data before
transformation) this should be set to TRUE and they will be log-transformed
internally by voom for the differential expression analysis in a way that
accounts for the difference in the mean-variance relationships. Otherwise,
dat should be on the correct (log) scale for differential expression
analysis without a need a variance stabilization (e.g. microarray data).
Currently the default is set to FALSE, simply because the isCount has not
been heavily tested. If the But TRUE with x
being counts really
should be the default for RNA-Seq data. If the input data is a
'ClusterExperiment' object, setting 'isCount=TRUE' will cause the program
to ignore the internally stored transformation function and instead use
voom with log2(x+0.5). Alternatively, isCount=FALSE
for a
ClusterExperiment
object will cause the DE to be performed with
limma
after transforming the data with the stored transformation.
Although some writing about "voom" seem to suggest that it would be
appropriate for arbitrary transformations, the authors have cautioned
against using it for anything other than count data on mailing lists. For
this reason we are not implementing it for arbitrary transformations at
this time (e.g. log(FPKM+epsilon) transformations).
A data.frame
in the same format as
topTable
, except for the following additional or
changed columns:
Feature
This is the column called 'ProbeID' by
topTable
IndexInOriginal
Gives the index of the feature to the original
input dataset, x
Contrast
The contrast that the results corresponds to (if
applicable, depends on contrastType
argument)
ContrastName
The name of the contrast that the results
corresponds to. For dendrogram searches, this will be the node of the tree of
the dendrogram.
Ritchie, ME, Phipson, B, Wu, D, Hu, Y, Law, CW, Shi, W, and Smyth, GK (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43, e47. http://nar.oxfordjournals.org/content/43/7/e47
Law, CW, Chen, Y, Shi, W, and Smyth, GK (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29. http://genomebiology.com/2014/15/2/R29
Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, Volume 3, Article 3. http://www.statsci.org/smyth/pubs/ebayes.pdf
data(simData) #create a clustering, for 8 clusters (truth was 4) cl <- clusterSingle(simData, subsample=FALSE, sequential=FALSE, mainClusterArgs=list(clusterFunction="pam", clusterArgs=list(k=8))) #basic F test, return all, even if not significant: testF <- getBestFeatures(cl, contrastType="F", number=nrow(simData), isCount=FALSE) #Do all pairwise, only return significant, try different adjustments: pairsPerC <- getBestFeatures(cl, contrastType="Pairs", contrastAdj="PerContrast", p.value=0.05, isCount=FALSE) pairsAfterF <- getBestFeatures(cl, contrastType="Pairs", contrastAdj="AfterF", p.value=0.05, isCount=FALSE) pairsAll <- getBestFeatures(cl, contrastType="Pairs", contrastAdj="All", p.value=0.05, isCount=FALSE) #not useful for this silly example, but could look at overlap with Venn allGenes <- paste("Row", 1:nrow(simData),sep="") if(require(limma)){ vennC <- vennCounts(cbind(PerContrast= allGenes %in% pairsPerC$Feature, AllJoint=allGenes %in% pairsAll$Feature, FHier=allGenes %in% pairsAfterF$Feature)) vennDiagram(vennC, main="FDR Overlap") } #Do one cluster against all others oneAll <- getBestFeatures(cl, contrastType="OneAgainstAll", contrastAdj="All", p.value=0.05) #Do dendrogram testing hcl <- makeDendrogram(cl) allDendro <- getBestFeatures(hcl, contrastType="Dendro", contrastAdj=c("All"), number=ncol(simData), p.value=0.05) # do DE on counts using voom # compare results to if used simData instead (not on count scale). # Again, not relevant for this silly example, but basic principle useful testFVoom <- getBestFeatures(simCount, primaryCluster(cl), contrastType="F", number=nrow(simData), isCount=TRUE) plot(testF$P.Value[order(testF$Index)], testFVoom$P.Value[order(testFVoom$Index)],log="xy")