vignettes/x05_hdf5.Rmd
## read_beds function

For memory-efficient read-in, one can use an HDF5-based scMethrix object. Only a small number of bedgraph files are held in memory at the same time, and the resulting object is not stored in memory but on disk.
Additional arguments to use HDF5:

- `h5 = TRUE`
```{r}
meth <- read_beds(
  files = bed_files,
  ref_cpgs = mm19_cpgs,
  chr_idx = 1,
  start_idx = 2,
  strand_idx = 3,
  cov_idx = 4,
  M_idx = 5,
  stranded = FALSE,
  zero_based = TRUE,
  #collapse_strands = FALSE,
  colData = sample_anno,
  batch_size = 2,
  h5 = TRUE
)
```

Basic scMethrix operations work with HDF5-based objects as well. Functions relying on external packages (e.g. imputation and clustering) will require casting to an in-memory matrix before processing.
```{r}
meth <- scMethrix::remove_uncovered(meth)
```
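As noted above, functions that wrap external packages expect a plain in-memory matrix. The chunk below is a minimal sketch, not taken from the package documentation: it assumes the methylation scores sit in one of the object's assays as an HDF5-backed DelayedMatrix and realizes that assay in memory with `as.matrix()`; the assay index used here is an assumption, so check `assayNames(meth)` first.

```{r, eval=FALSE}
# Sketch only: realize an HDF5-backed assay as an ordinary matrix.
# The assay index (1) is an assumption; inspect assayNames(meth) first.
library(SummarizedExperiment)

scores <- assay(meth, 1)            # DelayedMatrix backed by the HDF5 file
scores_in_mem <- as.matrix(scores)  # pulls the data into memory
```

Realizing the full matrix gives up the memory savings of the on-disk backend, so for very large objects it is best done after filtering or subsetting.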
It is also possible to transform non-HDF5-based objects to HDF5-based ones and back.

```{r}
m <- convert_HDF5_scMethrix(meth)
m2 <- convert_scMethrix(m)
```

The primary goal of scMethrix is to allow users to handle whole-genome methylation data. The functions are optimized to keep speed high and memory requirements low. In addition, effort was taken to allow scMethrix to handle a large number of samples (even > 1000) in the same efficient way. Therefore, many functions implement the argument `batch_size` to split these datasets into digestible chunks and `n_threads` to parallelize the processing of these chunks. Functions currently supporting the `batch_size` and `n_threads` arguments:

- `read_beds`
- `get_region_summary`
The multicore option is platform independent.
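As an illustration, the `read_beds()` call from the start of this vignette could combine both arguments. This is a sketch only: the `n_threads` value is arbitrary and should be matched to the available cores, and the column-index arguments are omitted for brevity (they would be the same as above).

```{r, eval=FALSE}
# Sketch only: chunked, parallel read-in. Column-index arguments
# (chr_idx, start_idx, ...) are omitted here; use the same values as above.
meth <- read_beds(
  files = bed_files,
  ref_cpgs = mm19_cpgs,
  colData = sample_anno,
  h5 = TRUE,
  batch_size = 2,  # number of bedgraph files processed per chunk
  n_threads = 4    # parallel workers; illustrative value only
)
```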