Versatile BedGraph reader.

read_bedgraphs(
  files = NULL,
  pipeline = NULL,
  zero_based = TRUE,
  stranded = FALSE,
  collapse_strands = FALSE,
  ref_cpgs = NULL,
  ref_build = NULL,
  contigs = NULL,
  vect = FALSE,
  vect_batch_size = NULL,
  coldata = NULL,
  chr_idx = NULL,
  start_idx = NULL,
  end_idx = NULL,
  beta_idx = NULL,
  M_idx = NULL,
  U_idx = NULL,
  strand_idx = NULL,
  cov_idx = NULL,
  synced_coordinates = FALSE,
  n_threads = 1,
  h5 = FALSE,
  h5_dir = NULL,
  h5temp = NULL,
  verbose = TRUE
)

Arguments

files	bedgraph files.
pipeline	Default NULL. Currently supports "Bismark_cov", "MethylDackel", "MethylcTools", "BisSNP", "BSseeker2_CGmap". If the pipeline is not known, use the idx arguments for manual column assignments.
zero_based	Are bedgraph regions zero based? Default TRUE.
stranded	Default FALSE
collapse_strands	If TRUE, collapses CpGs on the Crick strand into the Watson strand. Default FALSE.
ref_cpgs	BSgenome object, or name of the installed BSgenome package, or an output from `extract_CPGs`. Example: BSgenome.Hsapiens.UCSC.hg19
ref_build	reference genome build for the bedgraphs. Default NULL. Used only as additional metadata; it does not affect the results in any way.
contigs	contigs to restrict genomic CpGs to. Default: all autosomes and allosomes, ignoring extra contigs.
vect	To use vectorized code. Default FALSE. Set to TRUE if you don't have a large number of BedGraph files.
vect_batch_size	Default NULL. Process samples in batches. Applicable only when vect = TRUE.
coldata	An optional DataFrame describing the samples. Row names, if present, become the column names of the matrix. If NULL, then a DataFrame will be created with basename of files used as the row names.
chr_idx	column index for chromosome in bedgraph files
start_idx	column index for start position in bedgraph files
end_idx	column index for end position in bedgraph files
beta_idx	column index for beta values in bedgraph files
M_idx	column index for read counts supporting Methylation in bedgraph files
U_idx	column index for read counts supporting Un-methylation in bedgraph files
strand_idx	column index for strand information in bedgraph files
cov_idx	column index for total-coverage in bedgraph files
synced_coordinates	Are the start and end coordinates of a stranded bedgraph synchronized between the + and - strands? Possible values: FALSE (default), TRUE if the start coordinates are the start coordinates of the C on the plus strand.
n_threads	number of threads to use. Default 1. Be careful: memory usage increases linearly with the number of threads. This option does not work on Windows.
h5	Should the coverage and methylation matrices be stored as 'HDF5Array'? Default FALSE.
h5_dir	directory to store H5 based object
h5temp	temporary directory to store hdf5
verbose	Be a little chatty? Default TRUE.
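
The idx arguments refer to columns whose quantities are related to one another, which helps when deciding which columns to assign manually. The base R sketch below illustrates those relationships on a made-up table; the column layout and the name bdg are hypothetical, and the snippet is not taken from the package internals. It assumes the usual definitions: coverage is the sum of methylated and unmethylated read counts, and beta is the methylated fraction.

```r
# Hypothetical toy table; this layout is an assumption, not a pipeline format.
bdg <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100L, 250L, 40L),   # zero-based start, BED style
  end   = c(101L, 251L, 41L),
  M     = c(8L, 0L, 3L),        # reads supporting methylation
  U     = c(2L, 5L, 9L),        # reads supporting un-methylation
  stringsAsFactors = FALSE
)

# Total coverage is the sum of methylated and unmethylated reads.
bdg$cov <- bdg$M + bdg$U

# Beta is the methylated fraction; sites without coverage get NA.
bdg$beta <- ifelse(bdg$cov > 0, bdg$M / bdg$cov, NA_real_)

# Conventional zero-based to one-based conversion of the start coordinate.
bdg$pos <- bdg$start + 1L

print(bdg)
```

In this toy case the M and U columns alone are enough to derive both beta and coverage, which is the combination used in the example at the end of this page.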

Value

An object of class methrix

Details

Reads BedGraph files and generates methylation and coverage matrices. Optionally, arrays can be serialized as on-disk HDF5 arrays.
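
To make the shape of the result concrete, here is a minimal base R sketch of assembling per-sample tables into CpG-by-sample matrices. It is an illustration under stated assumptions, not the package's implementation: the objects ref and samples and the helper align_to_ref are hypothetical, and filling sites absent from a sample with NA is a choice made for this sketch.

```r
# Hypothetical reference CpG positions (in practice these would come from ref_cpgs).
ref <- data.frame(chr = c("chr1", "chr1", "chr2"),
                  pos = c(101L, 251L, 41L),
                  stringsAsFactors = FALSE)
ref$key <- paste(ref$chr, ref$pos, sep = ":")

# Two made-up samples, each covering only a subset of the reference CpGs.
samples <- list(
  s1 = data.frame(chr = c("chr1", "chr2"), pos = c(101L, 41L),
                  M = c(8L, 3L), U = c(2L, 9L), stringsAsFactors = FALSE),
  s2 = data.frame(chr = c("chr1", "chr1"), pos = c(101L, 251L),
                  M = c(5L, 0L), U = c(5L, 4L), stringsAsFactors = FALSE)
)

# Hypothetical helper: compute a per-site value and reorder it to match the
# reference; CpGs missing from the sample become NA.
align_to_ref <- function(x, value) {
  idx <- match(ref$key, paste(x$chr, x$pos, sep = ":"))
  value(x)[idx]
}

# One column per sample, one row per reference CpG.
cov_mat  <- sapply(samples, align_to_ref, value = function(x) x$M + x$U)
beta_mat <- sapply(samples, align_to_ref, value = function(x) x$M / (x$M + x$U))
rownames(cov_mat) <- rownames(beta_mat) <- ref$key

print(cov_mat)
print(beta_mat)
```

Aligning every sample to a shared reference keeps rows comparable across samples, at the cost of storing NA for uncovered sites.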

Examples

if (FALSE) {
  # Locate the example BedGraph files shipped with methrix
  bdg_files = list.files(path = system.file('extdata', package = 'methrix'),
                         pattern = '\\.bedGraph\\.gz$', full.names = TRUE)
  # Extract reference CpGs from the hg19 BSgenome package
  hg19_cpgs = methrix::extract_CPGs(ref_genome = 'BSgenome.Hsapiens.UCSC.hg19')
  # Read the files, assigning columns manually via the idx arguments
  meth = methrix::read_bedgraphs(files = bdg_files, ref_cpgs = hg19_cpgs,
                                 chr_idx = 1, start_idx = 2, M_idx = 3, U_idx = 4,
                                 stranded = FALSE, zero_based = FALSE,
                                 collapse_strands = FALSE)
}