remp {REMP}    R Documentation

Repetitive element methylation prediction

Description

remp predicts genome-wide methylation levels of locus-specific repetitive elements (RE). Two major human RE types are available: Alu element (Alu) and LINE-1 (L1).

Usage

remp(methyDat, REtype = c("Alu", "L1"), parcel = NULL,
  work.dir = tempdir(), win = 1000, method = c("rf", "svmLinear",
  "svmRadial", "naive"), autoTune = TRUE, param = NULL, seed = NULL,
  ncore = NULL, BPPARAM = NULL, verbose = FALSE)

Arguments

methyDat

A RatioSet, GenomicRatioSet, DataFrame, data.table, data.frame, or matrix of Illumina BeadChip methylation data (450k or EPIC array). See Details.

REtype

Type of RE. Currently "Alu" and "L1" are supported.

parcel

An REMParcel object containing the data necessary to carry out the prediction. If NULL, the function will search work.dir for the .rds data file exported by initREMP (with export = TRUE) or saveParcel.

work.dir

Path to the directory where the annotation data generated by initREMP are saved. Used only when the argument parcel is missing. If not specified, the temporary directory tempdir() is used. If specified, the directory path must be the same as the one specified in initREMP or in saveParcel.

win

An integer specifying window size to confine the upstream and downstream flanking region centered on the predicted CpG in RE for prediction. Default = 1000. See Details.

method

Name of model/approach for prediction. Currently "rf" (Random Forest), "svmLinear" (SVM with linear kernel), "svmRadial" (SVM with radial basis function kernel), and "naive" (carrying over methylation values of the closest CpG site) are available. Default = "rf" (Random Forest). See Details.

autoTune

Logical parameter. If TRUE, 5-fold cross-validation repeated 3 times is performed to determine the best model parameter. If FALSE, the param option (see below) must be specified. Default = TRUE. Auto-tune is disabled when Random Forest is used. See Details.

param

A number or a vector specifying the model tuning parameter(s) (not applicable for Random Forest). For SVM, param represents 'Cost' (for linear kernel) or 'Sigma' and 'Cost' (for radial basis function kernel). This parameter is valid only when autoTune = FALSE.

seed

Random seed for Random Forest model for reproducible prediction results. Default is NULL, which generates a seed.

ncore

Number of cores to use for parallel computation. By default, the maximum number of cores available on the machine is used. If ncore = 1, no parallel computation is performed.

BPPARAM

An optional BiocParallelParam instance determining the parallel back-end to be used during evaluation. If not specified, the default back-end on the machine is used.

verbose

Logical parameter. Should the function be verbose?
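
To make the win and method = "naive" arguments more concrete, the following base R sketch picks the profiled CpGs that fall inside a flanking window around a target CpG and carries over the value of the closest one. It is an illustration only, not REMP's internal code: all positions and beta values are made up, and it assumes the window extends win bp on each side of the target CpG.

```r
# Hypothetical genomic positions (bp) of profiled array CpGs on one chromosome
# and their beta values; all numbers are made up for illustration.
array_pos  <- c(10100L, 10850L, 12400L, 15000L)
array_beta <- c(0.82, 0.75, 0.40, 0.10)

# Hypothetical position of a CpG inside an RE whose methylation we want.
target_pos <- 11000L
win <- 1000L

# Assumption: the window extends `win` bp on each side of the target CpG.
dist_bp   <- abs(array_pos - target_pos)
in_window <- dist_bp <= win

# "naive"-style carry-over: take the value of the closest profiled CpG,
# or NA when no profiled CpG lies inside the window.
naive_pred <- if (any(in_window)) {
  array_beta[in_window][which.min(dist_bp[in_window])]
} else {
  NA_real_
}

# Number of profiled CpGs available as neighbors for this target.
n_neighbors <- sum(in_window)
```

With a narrower win, fewer profiled CpGs qualify as neighbors, which is why fewer RE CpGs can be predicted.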

Details

Before running remp, make sure the methylation data have gone through proper quality control, background correction, and normalization procedures. Both beta values and M values are accepted. Rows represent probes and columns represent samples. Row names must specify the Illumina probe ID (e.g. cg00000029).

The default win = 1000 is based on previous findings showing that neighboring CpGs are more likely to be co-modified within 1000 bp. A narrower window size can slightly improve prediction accuracy at the cost of fewer predicted RE. A window size greater than 1000 is not recommended, because the machine learning models would learn little useful information for prediction while introducing noise.

The Random Forest model (method = "rf") is recommended because it offers more accurate prediction and also enables the prediction reliability functionality. Prediction reliability is estimated by the conditional standard deviation using Quantile Regression Forest. Note that if parallel computing is allowed, parallel Random Forest (powered by the package ranger) is used automatically. The performance of the Random Forest model is often relatively insensitive to the choice of mtry. Therefore, auto-tune is turned off when Random Forest is used, and mtry is set to one third of the total number of predictors. For SVM, if autoTune = TRUE, the preset tuning parameter search grid can be accessed and modified using remp_options.
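
The input layout described above can be sketched in base R as follows. This is a toy illustration, not part of REMP: the sample names and values are made up, the probe IDs other than cg00000029 are illustrative, and the row-name check is a simplified assumption about what a valid probe ID looks like. The beta-to-M conversion shown is the standard log2 logit transform.

```r
set.seed(1)  # reproducible toy values

# Toy beta-value matrix: rows are probes, columns are samples.
# Sample names and values are made up; probe IDs are illustrative.
probe_ids <- c("cg00000029", "cg00000108", "cg00000109")
beta <- matrix(runif(6, min = 0.05, max = 0.95), nrow = 3,
               dimnames = list(probe_ids, c("sample1", "sample2")))

# Basic sanity checks before prediction (simplified assumptions).
stopifnot(
  !is.null(rownames(beta)),                  # probe IDs must be row names
  all(grepl("^cg[0-9]+$", rownames(beta))),  # IDs look like cg########
  all(beta >= 0 & beta <= 1)                 # beta values lie in [0, 1]
)

# Standard beta-to-M conversion (logit on the log2 scale), and back.
m_value   <- log2(beta / (1 - beta))
beta_back <- 2^m_value / (1 + 2^m_value)
```

Either beta or m_value, in this probes-by-samples layout with probe IDs as row names, matches the matrix form of methyDat described in Arguments.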

Value

A REMProduct object containing predicted RE methylation results.

See Also

See initREMP to prepare necessary annotation database before running remp.

Examples

# Obtain example Illumina data (450k)
GM12878_450k <- getGM12878('450k')

# Make sure you have run 'initREMP'. See ?initREMP.

# Run prediction
remp.res <- remp(GM12878_450k, REtype = 'Alu', ncore = 1)
remp.res
details(remp.res)
rempB(remp.res) # Methylation data (beta value)

# Extract CpG location information (inherited from class 'RangedSummarizedExperiment')
rowRanges(remp.res)

# RE annotation information
rempAnnot(remp.res)

# Add gene annotation
remp.res <- decodeAnnot(remp.res, type = "symbol")
rempAnnot(remp.res)

# (Recommended) Trim off less reliable prediction
remp.res <- rempTrim(remp.res)

# (Recommended) Obtain RE-level methylation (aggregate by mean)
remp.res <- rempAggregate(remp.res)
rempB(remp.res) # Methylation data (beta value)

# Extract RE location information 
rowRanges(remp.res)

# Density plot across predicted RE
plot(remp.res)


[Package REMP version 1.4.1 Index]