TCGAanalyze_DMR {TCGAbiolinks}R Documentation

Differentially methylated regions Analysis

Description

This function will search for differentially methylated CpG sites, which are regarded as possible functional regions involved in gene transcriptional regulation.

In order to find these regions we use the beta-values (methylation values ranging from 0.0 to 1.0) to compare two groups.

Firstly, it calculates the difference between the mean methylation of each group for each probes. Secondly, it calculates the p-value using the wilcoxon test using the Benjamini-Hochberg adjustment method. The default parameters will require a minimum absolute beta values delta of 0.2 and a false discovery rate (FDR)-adjusted Wilcoxon rank-sum P-value of < 0.01 for the difference.

After these analysis, we save a volcano plot (x-axis:diff mean methylation, y-axis: significance) that will help the user identify the differentially methylated CpG sites and return the object with the calculus in the rowRanges.

If the calculus already exists in the object it will not recalculated. You should set overwrite parameter to TRUE to force it, or remove the collumns with the results from the object.

Usage

TCGAanalyze_DMR(data, groupCol = NULL, group1 = NULL, group2 = NULL,
  calculate.pvalues.probes = "all",
  plot.filename = "methylation_volcano.pdf",
  ylab = expression(paste(-Log[10], " (FDR corrected -P values)")),
  xlab = expression(paste("DNA Methylation difference (", beta,
  "-values)")), title = NULL, legend = "Legend", color = c("black",
  "red", "darkgreen"), label = NULL, xlim = NULL, ylim = NULL,
  p.cut = 0.01, probe.names = FALSE, diffmean.cut = 0.2,
  paired = FALSE, adj.method = "BH", overwrite = FALSE, cores = 1,
  save = TRUE, save.directory = ".", filename = NULL)

Arguments

data

SummarizedExperiment obtained from the TCGAPrepare

groupCol

Columns with the groups inside the SummarizedExperiment object. (This will be obtained by the function colData(data))

group1

In case our object has more than 2 groups, you should set the name of the group

group2

In case our object has more than 2 groups, you should set the name of the group

calculate.pvalues.probes

In order to get the probes faster the user can select to calculate the pvalues only for the probes with a difference in DNA methylation. The default is to calculate to all probes. Possible values: "all", "differential". Default "all"

plot.filename

Filename. Default: volcano.pdf, volcano.svg, volcano.png. If set to FALSE, there will be no plot.

ylab

y axis text

xlab

x axis text

title

main title. If not specified it will be "Volcano plot (group1 vs group2)

legend

Legend title

color

vector of colors to be used in graph

label

vector of labels to be used in the figure. Example: c("Not Significant","Hypermethylated in group1", "Hypomethylated in group1"))

xlim

x limits to cut image

ylim

y limits to cut image

p.cut

p values threshold. Default: 0.01

probe.names

is probe.names

diffmean.cut

diffmean threshold. Default: 0.2

paired

Wilcoxon paired parameter. Default: FALSE

adj.method

Adjusted method for the p-value calculation

overwrite

Overwrite the pvalues and diffmean values if already in the object for both groups? Default: FALSE

cores

Number of cores to be used in the non-parametric test Default = groupCol.group1.group2.rda

save

Save object with results? Default: TRUE

save.directory

Directory to save the files. Default: working directory

filename

Name of the file to save the object.

Value

Volcano plot saved and the given data with the results (diffmean.group1.group2,p.value.group1.group2, p.value.adj.group1.group2,status.group1.group2) in the rowRanges where group1 and group2 are the names of the groups

Examples

nrows <- 200; ncols <- 20
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GenomicRanges::GRanges(rep(c("chr1", "chr2"), c(50, 150)),
                   IRanges::IRanges(floor(runif(200, 1e5, 1e6)), width=100),
                    strand=sample(c("+", "-"), 200, TRUE),
                    feature_id=sprintf("ID%03d", 1:200))
colData <- S4Vectors::DataFrame(Treatment=rep(c("ChIP", "Input"), 5),
                    row.names=LETTERS[1:20],
                    group=rep(c("group1","group2"),c(10,10)))
data <- SummarizedExperiment::SummarizedExperiment(
         assays=S4Vectors::SimpleList(counts=counts),
         rowRanges=rowRanges,
         colData=colData)
SummarizedExperiment::colData(data)$group <- c(rep("group 1",ncol(data)/2),
                         rep("group 2",ncol(data)/2))
hypo.hyper <- TCGAanalyze_DMR(data, p.cut = 0.85,"group","group 1","group 2")
SummarizedExperiment::colData(data)$group2 <- c(rep("group_1",ncol(data)/2),
                         rep("group_2",ncol(data)/2))
hypo.hyper <- TCGAanalyze_DMR(data, p.cut = 0.85,"group2","group_1","group_2")

[Package TCGAbiolinks version 2.8.4 Index]