remp {REMP} | R Documentation |
remp is used to predict genome-wide methylation levels of locus-specific repetitive elements (RE).
Two major RE types in humans, Alu element (Alu) and LINE-1 (L1), are available.
remp(methyDat, REtype = c("Alu", "L1"), parcel = NULL, work.dir = tempdir(), win = 1000, method = c("rf", "svmLinear", "svmRadial", "naive"), autoTune = TRUE, param = NULL, seed = NULL, ncore = NULL, BPPARAM = NULL, verbose = FALSE)
methyDat |
A |
REtype |
Type of RE. Currently |
parcel |
An |
work.dir |
Path to the directory where the annotation data generated by |
win |
An integer specifying window size to confine the upstream and downstream flanking
region centered on the predicted CpG in RE for prediction. Default = |
method |
Name of model/approach for prediction. Currently |
autoTune |
Logical parameter. If |
param |
A number or a vector specifying the model tuning parameter(s) (not applicable for Random Forest).
For SVM, |
seed |
Random seed for Random Forest model for reproducible prediction results.
Default is |
ncore |
Number of cores to run parallel computation. By default, max number of cores available
in the machine will be utilized. If |
BPPARAM |
An optional |
verbose |
Logical parameter. Should the function be verbose? |
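To make the role of win concrete, the base R sketch below selects the profiled CpGs that fall inside a flanking window around one RE CpG. The helper name neighbors_in_window and all coordinates are hypothetical, and the sketch assumes win is the distance allowed on each side of the target CpG. It illustrates the idea only and is not how remp is implemented internally.

```r
# Hypothetical helper: return the profiled CpG positions lying within
# `win` bp upstream or downstream of a target RE CpG (same chromosome assumed).
neighbors_in_window <- function(target_pos, profiled_pos, win = 1000) {
  profiled_pos[abs(profiled_pos - target_pos) <= win]
}

# Made-up genomic coordinates, for illustration only
profiled_pos <- c(10200, 10950, 11800, 12400, 15000)
target_pos <- 11500

neighbors_in_window(target_pos, profiled_pos, win = 1000)  # 10950 11800 12400
neighbors_in_window(target_pos, profiled_pos, win = 500)   # 11800
```

In this toy case, shrinking the window from 1000 to 500 leaves a single neighboring CpG, which suggests why a narrower window can reduce the number of RE CpGs that have enough nearby data to be predicted.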
Before running remp, users should make sure the methylation data have gone through
proper quality control, background correction, and normalization procedures. Both beta values
and M values are allowed. Rows represent probes and columns represent samples. Please make
sure the row names specify the Illumina probe IDs (e.g., cg00000029). The default
win = 1000 is based on previous findings showing that neighboring CpGs are more likely
to be co-modified within 1000 bp. Users can specify a narrower window for a slight improvement in
prediction accuracy at the cost of fewer predicted RE. A window size greater than 1000 is not
recommended, as the machine learning models would not learn much useful information
for prediction and would instead introduce noise. The Random Forest model (method = "rf") is recommended
because it offers more accurate prediction and also enables the prediction reliability functionality.
Prediction reliability is estimated by the conditional standard deviation using Quantile Regression Forest.
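The input requirements described above can be checked before calling remp. The following base R sketch builds a toy beta-value matrix, runs a simple pre-flight check, and applies the standard beta/M conversions. The function names check_methy_matrix, beta_to_m and m_to_beta are hypothetical helpers, not part of REMP, and the data values are made up.

```r
# Toy beta-value matrix: rows are probes, columns are samples (made-up values)
beta <- matrix(
  c(0.10, 0.50, 0.90,
    0.20, 0.50, 0.80),
  nrow = 3,
  dimnames = list(c("cg00000029", "cg00000108", "cg00000109"),
                  c("sample1", "sample2"))
)

# Hypothetical pre-flight check (not part of REMP): requires a numeric matrix
# whose row names include at least one Illumina-style CpG probe ID.
check_methy_matrix <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  ids <- rownames(x)
  if (is.null(ids) || !any(grepl("^cg[0-9]+$", ids))) {
    stop("row names must be Illumina probe IDs such as cg00000029")
  }
  invisible(TRUE)
}

# Standard conversions between beta values and M values
beta_to_m <- function(b) log2(b / (1 - b))
m_to_beta <- function(m) 2^m / (2^m + 1)

check_methy_matrix(beta)
m <- beta_to_m(beta)
m["cg00000108", ]  # beta = 0.5 maps to M = 0
```

Because either scale is accepted, the conversion is only needed when the rest of an analysis expects the other scale.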
Please note that if parallel computing is allowed, parallel Random Forest
(powered by package ranger) will be used automatically. The performance of the
Random Forest model is often relatively insensitive to the choice of mtry.
Therefore, auto-tune is turned off when using Random Forest, and mtry is set to one third
of the total number of predictors. For SVM, if autoTune = TRUE, the preset tuning parameter
search grid can be accessed and modified using remp_options.
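The mtry rule above can be written out in a single line. The sketch below assumes the value is rounded down and never drops below 1. The exact rounding used by remp is not stated here, so default_mtry is a hypothetical helper and the predictor counts are made up.

```r
# Hypothetical helper: one third of the predictors, rounded down, at least 1
default_mtry <- function(n_predictors) {
  max(floor(n_predictors / 3), 1)
}

default_mtry(20)  # 6
default_mtry(2)   # 1
```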
A REMProduct object containing predicted RE methylation results.
See initREMP to prepare the necessary annotation database before running remp.
# Obtain example Illumina data (450k)
GM12878_450k <- getGM12878('450k')

# Make sure you have run 'initREMP'. See ?initREMP.

# Run prediction
remp.res <- remp(GM12878_450k, REtype = 'Alu', ncore = 1)
remp.res
details(remp.res)
rempB(remp.res) # Methylation data (beta value)

# Extract CpG location information (inherit from class 'RangedSummarizedExperiment')
rowRanges(remp.res)

# RE annotation information
rempAnnot(remp.res)

# Add gene annotation
remp.res <- decodeAnnot(remp.res, type = "symbol")
rempAnnot(remp.res)

# (Recommended) Trim off less reliable prediction
remp.res <- rempTrim(remp.res)

# (Recommended) Obtain RE-level methylation (aggregate by mean)
remp.res <- rempAggregate(remp.res)
rempB(remp.res) # Methylation data (beta value)

# Extract RE location information
rowRanges(remp.res)

# Density plot across predicted RE
plot(remp.res)