Bedgraph files generated by bisulfite-sequencing (BS) pipelines come in various flavors. A critical downstream step is aggregating these files into methylation and coverage matrices. methrix performs this aggregation and also provides many other useful downstream functions.
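Conceptually, aggregation means aligning each sample's methylated (M) and unmethylated (U) read counts on a shared set of reference CpGs and stacking them into CpG × sample matrices. The base-R sketch below illustrates the idea on toy data. The helper `aggregate_bedgraphs()` and the column names `chr`, `start`, `M`, and `U` are hypothetical, and this is not how methrix implements the step (methrix uses a data.table back-end).

```r
# Hypothetical helper (illustration only, not methrix's implementation):
# align per-sample M/U counts on reference CpGs and return CpG x sample matrices.
aggregate_bedgraphs <- function(samples, ref_cpgs) {
  # Key every reference CpG as "chr:start"
  ref_keys <- paste(ref_cpgs$chr, ref_cpgs$start, sep = ":")
  empty <- matrix(NA_real_, nrow = length(ref_keys), ncol = length(samples),
                  dimnames = list(ref_keys, names(samples)))
  M <- empty
  U <- empty
  for (s in names(samples)) {
    bdg <- samples[[s]]
    keys <- paste(bdg$chr, bdg$start, sep = ":")
    hit <- match(keys, ref_keys)   # NA for loci absent from the reference
    keep <- !is.na(hit)
    M[hit[keep], s] <- bdg$M[keep]
    U[hit[keep], s] <- bdg$U[keep]
  }
  cov <- M + U                     # coverage = methylated + unmethylated reads
  beta <- M / cov                  # NA where a sample has no call; NaN if cov == 0
  list(beta = beta, cov = cov)
}

# Toy reference CpGs and two samples (N1 contains a locus not in the reference)
ref <- data.frame(chr = "chr1", start = c(100L, 150L, 200L))
samples <- list(
  C1 = data.frame(chr = "chr1", start = c(100L, 200L), M = c(8, 1), U = c(2, 3)),
  N1 = data.frame(chr = "chr1", start = c(150L, 200L, 999L),
                  M = c(5, 0, 7), U = c(5, 4, 1))
)

res <- aggregate_bedgraphs(samples, ref)
res$beta
```

Calls at loci outside the reference (here `chr1:999`) are dropped, and reference CpGs without a call in a sample stay `NA`.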
For brief documentation, see the Bioconductor vignette.
An example of a complete analysis, from reading in the data to annotation and differential methylation calling, can be found in our best-practice pipeline.
Key features:
- data.table back-end
- SummarizedExperiment-based object with custom methods for CpG extraction, sub-setting, and filtering
- HDF5 support through HDF5Array and saveHDF5SummarizedExperiment
To install methrix from GitHub:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("CompEpigen/methrix")
Usage is simple: create a methrix object with read_bedgraphs(), then pass it to any downstream analysis.
The following example uses the example data shipped with the methrix package.
#Example bedgraph files
> bdg_files = list.files(path = system.file('extdata', package = 'methrix'), pattern = "\\.bedGraph\\.gz$", full.names = TRUE)
#hg19_cpgs: reference CpGs, e.g. from methrix::extract_CPGs(ref_genome = "BSgenome.Hsapiens.UCSC.hg19")
> meth = methrix::read_bedgraphs(files = bdg_files, ref_cpgs = hg19_cpgs, chr_idx = 1, start_idx = 2, M_idx = 3, U_idx = 4,
+                                stranded = TRUE, collapse_strands = TRUE)
----------------------------
-Preset: Custom
--Missing beta and coverage info. Estimating them from M and U values
-CpGs raw: 29,891,155 (total reference CpGs)
-CpGs retained: 28,217,448(reference CpGs from contigs of interest)
-CpGs stranded: 56,434,896(reference CpGs from both strands)
----------------------------
-Processing: C1.bedGraph.gz
--CpGs missing: 56,434,219 (from known reference CpGs)
-Processing: C2.bedGraph.gz
--CpGs missing: 56,434,207 (from known reference CpGs)
-Processing: N1.bedGraph.gz
--CpGs missing: 56,434,194 (from known reference CpGs)
-Processing: N2.bedGraph.gz
--CpGs missing: 56,434,195 (from known reference CpGs)
-Finished in: 00:02:00 elapsed (00:02:23 cpu)
> meth
An object of class methrix
n_CpGs: 28,217,448
n_samples: 4
is_h5: FALSE
Reference: hg19
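In the log above, "CpGs stranded" (56,434,896) is exactly twice the retained reference CpGs (28,217,448): with stranded = TRUE, each CpG is represented once per strand. The C of a CpG on the minus strand lies one base downstream of the plus-strand C, so collapse_strands = TRUE can merge both strands into a single locus, which is consistent with the final object reporting 28,217,448 CpGs. The sketch below illustrates this on toy data, assuming 1-based plus-strand coordinates and that collapsing sums the M and U counts; the helper `collapse_cpg_strands()` is hypothetical and may differ from methrix's internal logic.

```r
# Hypothetical helper (assumptions: 1-based coordinates, counts are summed):
# move minus-strand calls onto the plus-strand C position and merge them.
collapse_cpg_strands <- function(bdg) {
  # The minus-strand C of a CpG sits one base after the plus-strand C
  bdg$start <- ifelse(bdg$strand == "-", bdg$start - 1L, bdg$start)
  out <- aggregate(cbind(M, U) ~ chr + start, data = bdg, FUN = sum)
  out$cov <- out$M + out$U          # coverage after merging strands
  out$beta <- out$M / out$cov       # beta estimated from summed M and U
  out[order(out$chr, out$start), ]
}

# Two CpGs, each observed on both strands
stranded <- data.frame(
  chr = "chr1",
  start = c(100L, 101L, 200L, 201L),
  strand = c("+", "-", "+", "-"),
  M = c(6, 2, 0, 1),
  U = c(2, 0, 3, 0)
)

collapsed <- collapse_cpg_strands(stranded)
collapsed
```

Four stranded calls become two CpGs: the one at position 100 gets M = 8, U = 2 (beta 0.8), and the one at position 200 gets M = 1, U = 3 (beta 0.25).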
What can be done with a methrix object? The key functions are:
#Reading and writing
read_bedgraphs()    #Reads bedGraph files into a methrix object
write_bedgraphs()   #Writes bedGraphs from a methrix object
write_bigwigs()     #Writes bigWigs from a methrix object

#Operations
order_by_sd()       #Orders a methrix object by SD
region_filter()     #Filters matrices by region
mask_methrix()      #Masks lowly covered CpGs
coverage_filter()   #Filters a methrix object based on coverage
subset_methrix()    #Subsets a methrix object based on given conditions
remove_uncovered()  #Removes loci that are uncovered across all samples
remove_snps()       #Removes loci overlapping with possible SNPs

#Visualization and QC
methrix_report()    #Creates a detailed interactive HTML summary report from a methrix object
methrix_pca()       #Principal Component Analysis
plot_pca()          #Plots the result of PCA
plot_coverage()     #Plots coverage statistics
plot_density()      #Plots the density distribution of the beta values
plot_violin()       #Plots the distribution of the beta values on a violin plot
plot_stats()        #Plots descriptive statistics
get_stats()         #Estimates descriptive statistics of the object

#Other
methrix2bsseq()     #Converts a methrix object to a bsseq object
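Several of the operations listed above act directly on the beta and coverage matrices. As a rough illustration only, the base-R sketch below mimics three of them on plain matrices: masking beta values at lowly covered CpGs (in the spirit of mask_methrix()), keeping CpGs that are sufficiently covered in a minimum number of samples (in the spirit of coverage_filter()), and ordering CpGs by standard deviation (in the spirit of order_by_sd()). The helper names, thresholds, and exact semantics are assumptions; see the methrix documentation for the real functions and their arguments.

```r
# Hypothetical helpers on plain beta/coverage matrices (illustrative semantics).

# Set beta to NA wherever coverage is missing or below min_cov
mask_low_coverage <- function(beta, cov, min_cov) {
  beta[is.na(cov) | cov < min_cov] <- NA
  beta
}

# Keep CpGs with coverage >= min_cov in at least min_samples samples
keep_covered <- function(beta, cov, min_cov, min_samples) {
  n_ok <- rowSums(!is.na(cov) & cov >= min_cov)
  keep <- n_ok >= min_samples
  list(beta = beta[keep, , drop = FALSE], cov = cov[keep, , drop = FALSE])
}

# Order CpGs from most to least variable; rows with undefined SD go last
order_rows_by_sd <- function(beta) {
  sds <- apply(beta, 1, sd, na.rm = TRUE)
  beta[order(sds, decreasing = TRUE, na.last = TRUE), , drop = FALSE]
}

# Toy data: 3 CpGs x 3 samples
beta <- matrix(c(0.9, 0.1, 0.5,
                 0.8, 0.7, 0.5,
                 0.2, 0.6, 0.5),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("cg1", "cg2", "cg3"), c("C1", "N1", "N2")))
cov <- matrix(c(10, 12, 1,
                 2,  3, 2,
                15, 20, 9),
              nrow = 3, byrow = TRUE, dimnames = dimnames(beta))

masked <- mask_low_coverage(beta, cov, min_cov = 5)
filtered <- keep_covered(masked, cov, min_cov = 5, min_samples = 2)
ranked <- order_rows_by_sd(filtered$beta)
rownames(ranked)
```

Here cg2 is dropped because no sample reaches a coverage of 5, and cg1 ranks above cg3 because its remaining beta values (0.9 and 0.1) vary more.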