Differential methylation calling

Identifying Deferentially methylated regions (DMRs) can be done using within R or using external command line programs. Here we describe how methrix object can be used with both the approaches. For DMR calling in R, popular programs include DSS, DMRseq. Outside R, metilene offers a fast and simple alternative. Note that DNRseq and DSS can be resource hungry with run rimes often involving hours. Metilene however is ultrafast and lightweight with small memory footprint.

DSS

A very important step of a WGBS data analysis is the differential methylation calling. During this step, we would like to identify changes between groups or conditions first on the site level. Then, we will need to identify larger regions (DMR-s, differentially methylated regions) affected by methylation changes that are more likely to represent functional alterations. Methrix doesn’t have a DMR caller, therefore we will use DSS. At this stage, DSS is not yet able to work directly with methrix, therefore we transform our methrix object to bsseq for the sake of DMR calling.


if(!requireNamespace("DSS")) {
BiocManager::install("DSS")
}
library(DSS)

For more detailed description, options and recommendations on smoothing, please refer to the ´DSS´ vignette

bs_obj <- methrix::methrix2bsseq(meth)
bs_obj
#> An object of type 'BSseq' with
#>   28217448 methylation loci
#>   6 samples
#> has not been smoothed
#> All assays are in-memory
dmlTest = DSS::DMLtest(bs_obj, group1=rownames(sample_anno)[sample_anno$Day=="D6"], group2=rownames(sample_anno)[sample_anno$Day=="D14"], 
                       smoothing = TRUE)

Call DMRs and DMLs

dmls = DSS::callDML(dmlTest, p.threshold = 0.05)
dmrs = DSS::callDMR(dmlTest, p.threshold = 0.05)

DMRs:

head(dmrs)
#>          chr     start       end length nCG meanMethy1 meanMethy2 diff.Methy
#> 292302  chr7   1686521   1687244    724  36  0.9253184  0.4316565  0.4936619
#> 120729 chr16  73582841  73585140   2300  28  0.8963596  0.3711405  0.5252191
#> 91590  chr13 112225194 112225927    734  22  0.9684702  0.3701374  0.5983328
#> 122302 chr16  85482141  85483212   1072  41  0.7926689  0.4123773  0.3802916
#> 897     chr1  10732115  10733515   1401  37  0.9412203  0.5546301  0.3865901
#> 195240  chr3  18411127  18413011   1885  25  0.9463775  0.4042538  0.5421237
#>        areaStat
#> 292302 258.9261
#> 120729 224.1427
#> 91590  220.4902
#> 122302 201.0486
#> 897    196.7819
#> 195240 191.6146

DMLs:

head(dmls)
#>         chr      pos       mu1        mu2      diff    diff.se     stat
#> 115002 chr1  7164841 0.9478878 0.20845803 0.7394297 0.05289578 13.97899
#> 238604 chr1 17112454 0.9218719 0.30521631 0.6166556 0.06054654 10.18482
#> 278735 chr1 20577038 0.8532530 0.06043134 0.7928217 0.04640743 17.08394
#> 342292 chr1 25803237 0.8620008 0.26811728 0.5938835 0.05771677 10.28962
#> 484470 chr1 38802438 0.9382340 0.31429739 0.6239366 0.05476446 11.39309
#> 580547 chr1 48508231 0.8705919 0.26986039 0.6007315 0.05822061 10.31819
#>              phi1       phi2         pval          fdr postprob.overThreshold
#> 115002 0.04792590 0.05572107 2.094283e-44 1.277624e-38                      1
#> 238604 0.04677971 0.05479802 2.317896e-24 1.353868e-19                      1
#> 278735 0.04869763 0.07494907 1.954656e-65 5.365998e-59                      1
#> 342292 0.04616408 0.06169127 7.848714e-25 4.967516e-20                      1
#> 484470 0.05334149 0.06542750 4.526264e-30 6.061307e-25                      1
#> 580547 0.06240985 0.05203124 5.830977e-25 3.825611e-20                      1

DMR calling with dmrseq

A popular R package for DMR calling is dmrseq. For more detailed description of the possibilites, see the vignette.


if(!requireNamespace("dmrseq")) {
BiocManager::install("dmrseq")
}
library(dmrseq)

register(MulticoreParam(2))

testCovariate <- "Day"
regions <- dmrseq::dmrseq(bs=bs_obj[240001:260000,],
                  cutoff = 0.05,
                  testCovariate=testCovariate, verbose = T)

DMR calling with Metilene

If you prefer using Metilene which is significantly faster than R based tools, you can export the methrix object to Metilene input file formats.

dir.create(path = "./metilene/", showWarnings = FALSE, recursive = TRUE)
methrix::write_bedgraphs(m = meth, output_dir = "./metilene/", rm_NA = FALSE, metilene = TRUE,multiBed = "metline_ip", phenoCol = "Day", compress = FALSE)
#----------------------
#*Writing ./metilene//metline_ip.bedGraph.gz 
#----------------------  

Once the file is generated, you can run metilene which should take less than couple of minutes to finish.

$ ./metilene_v0.2-8/metilene -t 4 -a D14_WT -b D6_WT metilene/metline_ip.bedGraph > metilene/D14_vs_D6.tsv

#Sort output by q values
$ cat metilene/D14_vs_D6.tsv | sort -k 4 | head
# chr6  163235589   163236271   0.00010009  -0.464417   12  2.717e-12   1.7226e-11  0.499   0.96342
# chr16 72935841    72936574    0.00010268  -0.593495   13  9.7322e-13  1.7671e-11  0.31243 0.90592
# chr3  126891865   126892175   0.00010581  -0.548356   13  1.7963e-13  1.8211e-11  0.27462 0.82297
# chr2  13448747    13448946    0.00011202  0.142889    24  1.3844e-13  1.9279e-11  0.99314 0.85025
# chr1  18381792    18381953    0.00011518  -0.464353   10  5.7773e-11  1.9823e-11  0.45378 0.91813
# chr8  38587666    38587922    0.00011689  -0.475570   10  3.8831e-11  2.0118e-11  0.4333  0.90887
# chr1  44030377    44030580    0.00011877  -0.705450   10  3.1796e-11  2.0442e-11  0.11405 0.8195
# chr17 77158527    77158873    0.00012301  -0.488690   10  4.7383e-11  2.117e-11   0.45551 0.9442
# chr17 75588404    75588890    0.00013396  -0.523367   10  3.1796e-11  2.3054e-11  0.412   0.93537
# chr2  222317260   222317626   0.00013438  -0.598000   10  3.5142e-11  2.3128e-11  0.3712  0.9692

Visualizing DMRs and candidate regions

Visualizing a region

We can visualize certain regions and/or DMRs using the Gviz package. The plotting function is included in the best_practices_WGBS.Rmd file.

dmrs <- makeGRangesFromDataFrame(dmrs, keep.extra.columns = TRUE)
genome <- "hg19"
mart <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="feb2014.archive.ensembl.org", dataset="hsapiens_gene_ensembl")

i <- 5

#candidate_region <- GRanges(paste0(seqlevels(dmrs)[i], ":", start(dmrs)[i], "-", end(dmrs)[i]))
candidate_region <- dmrs[i]
region_plot(m = meth, region=candidate_region, mart=mart, genome=genome, groups=meth$Day, dmrs=dmrs)
#the region_plot function is available in the best_practices.Rmd file. 

Heatmap of DMRs


heatmap_data <- as.data.frame(methrix::get_region_summary(meth, makeGRangesFromDataFrame(dmrs)))
heatmap_data <- heatmap_data[complete.cases(heatmap_data),]
rownames(heatmap_data) <- paste0(heatmap_data$chr,":", heatmap_data$start, "-", heatmap_data$end)

Top 50 DMRs

pheatmap::pheatmap(mat = head(heatmap_data[,-(1:5)], 50), show_rownames = FALSE, annotation_col = as.data.frame(meth@colData), fontsize = 8)

Number and size of DMRs

if(!requireNamespace("plotly")) {
install.packages("plotly")
}
if(!requireNamespace("ggplot2")) {
install.packages("ggplot2")
}
if(!requireNamespace("scales")) {
install.packages("scales")
}
library(plotly)
library(ggplot2)
library(scales)

count <- c("Higher methylation in day 14" = length(dmrs[dmrs$diff.Methy<0,]),
           "Higher methylation in day 6" = length(dmrs[dmrs$diff.Methy>0,]))
length <- c("Higher methylation in day 14" = sum(width(dmrs[dmrs$diff.Methy<0,])), 
            "Higher methylation in day 6" = sum(width(dmrs[dmrs$diff.Methy<0,]))) 


data <- data.frame(Direction=c("Gain", "Loss"), Count=count, Length=length)

   g <- ggplot(data=data)+geom_col(aes(x=factor(1), y=Count, fill=Direction), position = "dodge")+theme_bw()+theme(axis.title.x = element_blank(), axis.text.x = element_blank())+scale_fill_brewer(palette = "Dark2")+ggtitle("Number of differentially methylated regions")+scale_y_continuous(labels = comma)
ggplotly(g)
   p <- ggplot(data=data)+geom_col(aes(x=factor(1), y=Length, fill=Direction), position = "dodge")+theme_bw()+theme(axis.title.x = element_blank(), axis.text.x = element_blank())+scale_fill_brewer(palette = "Dark2")+ggtitle("Length of differentially methylated regions")+scale_y_continuous(labels = comma)
ggplotly(p)

Region annotation

To describe and interpret the differentially methylated regions, we can use the ChIPseeker package. The location of the DMRs can already give us a hint the involved processes, but once we assign the regions to genes, a pathway enrichment analysis is also possible. For additional details, see the vignette of ChIPseeker:

## loading packages
if(!requireNamespace("ChIPseeker")) {
BiocManager::install("ChIPseeker")
}
if(!requireNamespace("TxDb.Hsapiens.UCSC.hg19.knownGene")) {
BiocManager::install("TxDb.Hsapiens.UCSC.hg19.knownGene")
}
if(!requireNamespace("ReactomePA")) {
BiocManager::install("ReactomePA")
}
if(!requireNamespace("clusterProfiler")) {
BiocManager::install("clusterProfiler")
}
library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
library(ReactomePA)
library(clusterProfiler)
peakAnnoList <- lapply(list("Higher in day 14"=dmrs[dmrs$diff.Methy<0,], "Higher in day 6" = dmrs[dmrs$diff.Methy>0,]), annotatePeak, TxDb=txdb, tssRegion=c(-3000, 3000), verbose=FALSE)

ChIPseeker::plotAnnoBar(peakAnnoList)

ChIPseeker::plotDistToTSS(peakAnnoList)