read_bedgraphs function
For memory-efficient reading, one can use an HDF5-based methrix object. Only one bedgraph file is held in memory at a time, and the resulting object is stored on disk rather than in memory.
Additional arguments to use HDF5:
h5=TRUE
vect=FALSE
meth <- methrix::read_bedgraphs(
  files = bed_files,
  ref_cpgs = hg19_cpgs,
  chr_idx = 1,
  start_idx = 2,
  M_idx = 5,
  U_idx = 6,
  stranded = FALSE,
  zero_based = TRUE,
  collapse_strands = FALSE,
  coldata = sample_anno,
  vect = FALSE,
  h5 = TRUE
)
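The "one file in memory at a time, result on disk" pattern can be illustrated with a small base R sketch. This is only an analogy under made-up assumptions: the toy CSV files, the `beta.bin` store, and its flat binary layout are hypothetical and are not how methrix or HDF5 actually store data.

```r
# Toy stand-ins for bedgraph files: one small CSV per sample (hypothetical data)
tmp <- tempfile("toy_cohort_")
dir.create(tmp)
n_sites <- 5
sample_files <- file.path(tmp, paste0("sample", 1:3, ".csv"))
for (i in seq_along(sample_files)) {
  toy <- data.frame(beta = (seq_len(n_sites) + i) / 10)
  write.csv(toy, sample_files[i], row.names = FALSE)
}

# Stream the samples into a single on-disk binary file, one sample at a time
store <- file.path(tmp, "beta.bin")
con <- file(store, open = "wb")
for (f in sample_files) {
  one_sample <- read.csv(f)$beta  # only this sample is held in memory
  writeBin(one_sample, con)       # appended as one block of 8-byte doubles
  rm(one_sample)
}
close(con)

# Read back only sample 2 without loading the other samples
con <- file(store, open = "rb")
seek(con, where = (2 - 1) * n_sites * 8)  # skip the first sample's bytes
sample2 <- readBin(con, what = "double", n = n_sites)
close(con)
```

Peak memory use here is bounded by the size of a single sample rather than the whole cohort, which is the property the HDF5-based reading mode is described as providing.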
All methrix functions work with HDF5-based objects as well; they are called in exactly the same way as for in-memory objects.
meth <- methrix::remove_uncovered(meth)
It is also possible to transform non-HDF5-based objects to HDF5-based ones and back.
# HDF5-based object -> in-memory object
m <- convert_HDF5_methrix(m = meth)
# in-memory object -> HDF5-based object
m2 <- convert_methrix(m = m)
An HDF5-based object cannot be saved and loaded with the standard save or saveRDS functions. methrix offers easy-to-use saving and loading tools, which are essentially wrappers around the saveHDF5SummarizedExperiment and loadHDF5SummarizedExperiment functions.
# Directory that will hold the saved HDF5-based object
target_dir <- file.path(getwd(), "temp")
save_HDF5_methrix(meth, dir = target_dir, replace = TRUE)
meth <- load_HDF5_methrix(dir = target_dir)
The primary goal of methrix is to let users handle whole-genome methylation data. Its functions are optimized for high speed and low memory use. Additional effort went into making methrix handle large numbers of samples (even more than 100) in the same efficient way. To that end, many functions implement the n_chunks argument, which splits a dataset into digestible chunks, and the n_cores argument, which parallelizes the processing of these chunks.
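The chunk-and-parallelize idea can be sketched in base R as follows. This is a minimal illustration, not methrix's actual implementation: `chunked_apply` is a hypothetical helper, and the toy matrix stands in for a methylation matrix with CpG sites in rows and samples in columns.

```r
library(parallel)

# Hypothetical helper (not part of methrix): apply `fun` to row chunks of a
# matrix and concatenate the per-chunk results in the original row order.
chunked_apply <- function(mat, fun, n_chunks = 1, n_cores = 1) {
  n <- nrow(mat)
  n_chunks <- min(n_chunks, n)
  # Assign each row to one of n_chunks contiguous, roughly equal chunks
  chunk_id <- ceiling(seq_len(n) * n_chunks / n)
  chunks <- split(seq_len(n), chunk_id)

  process_chunk <- function(rows) fun(mat[rows, , drop = FALSE])

  if (n_cores > 1 && .Platform$OS.type != "windows") {
    # Fork-based parallelism; not available on Windows
    res <- mclapply(chunks, process_chunk, mc.cores = n_cores)
  } else {
    res <- lapply(chunks, process_chunk)
  }
  unlist(res, use.names = FALSE)
}

set.seed(1)
beta <- matrix(runif(40), nrow = 10)  # 10 toy CpG sites x 4 samples
chunked_means <- chunked_apply(beta, rowMeans, n_chunks = 3, n_cores = 2)
```

Each worker only touches the rows of its own chunk, so the number of chunks controls how much data is processed at once and the number of cores controls how many chunks are processed simultaneously.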
Functions currently supporting the n_chunks and n_cores arguments:
coverage_filter
get_region_summary
remove_snps
mask_methrix (supports only the n_cores argument)
The multicore option is not available on Windows.
if (grepl("Windows", Sys.getenv("OS"))) {
  # Multicore processing is not available on Windows
  res <- get_region_summary(meth, regions = dmrs[1:5], n_chunks = 2, n_cores = 1)
} else {
  res <- get_region_summary(meth, regions = dmrs[1:5], n_chunks = 2, n_cores = 2)
}
res
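The platform check above could also be wrapped in a small helper, so that the same call works on every operating system. The sketch below is one possible way to do this; `pick_n_cores` is a hypothetical name and not a methrix function. It relies on base R's `.Platform$OS.type` instead of the OS environment variable.

```r
# Hypothetical helper (not part of methrix): choose a usable n_cores value.
pick_n_cores <- function(requested = 2) {
  # Fall back to a single core where multicore processing is unavailable
  if (.Platform$OS.type == "windows") {
    return(1L)
  }
  available <- parallel::detectCores()
  if (is.na(available)) {
    available <- 1L  # detectCores() can return NA on some systems
  }
  # Never request more cores than exist, and never fewer than one
  as.integer(max(1L, min(requested, available)))
}

n_cores <- pick_n_cores(2)
```

The result could then be passed on directly, for example as `n_cores = n_cores` in the get_region_summary call shown above.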