Versatile BedGraph reader.

read_bedgraphs(
  files = NULL,
  pipeline = NULL,
  zero_based = TRUE,
  stranded = FALSE,
  collapse_strands = FALSE,
  ref_cpgs = NULL,
  ref_build = NULL,
  contigs = NULL,
  vect = FALSE,
  vect_batch_size = NULL,
  coldata = NULL,
  chr_idx = NULL,
  start_idx = NULL,
  end_idx = NULL,
  beta_idx = NULL,
  M_idx = NULL,
  U_idx = NULL,
  strand_idx = NULL,
  cov_idx = NULL,
  synced_coordinates = FALSE,
  n_threads = 1,
  h5 = FALSE,
  h5_dir = NULL,
  h5temp = NULL,
  verbose = TRUE
)

Arguments

files

bedgraph files.

pipeline

Default NULL. Currently supports "Bismark_cov", "MethylDackel", "MethylcTools", "BisSNP" and "BSseeker2_CGmap". If the pipeline is not known, use the idx arguments to assign columns manually.

zero_based

Are bedgraph regions zero-based? Default TRUE.

stranded

Whether the bedgraph files are stranded. Default FALSE.

collapse_strands

If TRUE, collapses CpGs on the Crick strand into the Watson strand. Default FALSE.

ref_cpgs

A BSgenome object, the name of an installed BSgenome package, or the output of extract_CPGs. Example: BSgenome.Hsapiens.UCSC.hg19

ref_build

Reference genome build for the bedgraphs. Default NULL. Used only as additional detail; it does not affect the results in any way.

contigs

Contigs to restrict genomic CpGs to. Default: all autosomes and allosomes, ignoring extra contigs.

vect

Whether to use vectorized code. Default FALSE. Set to TRUE if you do not have a large number of BedGraph files.

vect_batch_size

Default NULL. Process samples in batches. Applicable only when vect = TRUE.

coldata

An optional DataFrame describing the samples. Row names, if present, become the column names of the matrix. If NULL, then a DataFrame will be created with basename of files used as the row names.

chr_idx

column index for chromosome in bedgraph files

start_idx

column index for start position in bedgraph files

end_idx

column index for end position in bedgraph files

beta_idx

column index for beta values in bedgraph files

M_idx

column index for read counts supporting Methylation in bedgraph files

U_idx

column index for read counts supporting Un-methylation in bedgraph files

strand_idx

column index for strand information in bedgraph files

cov_idx

column index for total-coverage in bedgraph files

synced_coordinates

Are the start and end coordinates of a stranded bedgraph synchronized between the + and - strands? Possible values: FALSE (default), or TRUE if the start coordinates are the start coordinates of the C on the plus strand.

n_threads

Number of threads to use. Default 1. Be careful: memory usage increases linearly with the number of threads. This option does not work on Windows.

h5

Should the coverage and methylation matrices be stored as 'HDF5Array'? Default FALSE.

h5_dir

directory to store H5 based object

h5temp

temporary directory to store hdf5

verbose

Be a little chatty? Default TRUE.
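
Several of the arguments above (stranded, collapse_strands, synced_coordinates) concern strand handling. The base R sketch below illustrates the general idea of collapsing per-strand CpG calls into one record per CpG. It is not the code methrix uses internally. The toy values and the data frame layout are invented for illustration, and it assumes unsynced coordinates, meaning the minus-strand record sits one base downstream of the plus-strand C.

```r
# Toy stranded calls: each CpG is reported once per strand (values are made up).
# Assumption: coordinates are NOT synced between strands, so the minus-strand
# record of a CpG starts one base after the plus-strand record.
stranded_calls <- data.frame(
  chr    = c("chr1", "chr1", "chr1", "chr1"),
  start  = c(100L, 101L, 250L, 251L),
  strand = c("+", "-", "+", "-"),
  M      = c(4L, 3L, 0L, 1L),   # reads supporting methylation
  U      = c(1L, 2L, 5L, 4L)    # reads supporting un-methylation
)

# Move minus-strand records onto the plus-strand C of the same CpG.
minus <- stranded_calls$strand == "-"
stranded_calls$start[minus] <- stranded_calls$start[minus] - 1L

# Sum the counts of records that now share the same position.
collapsed <- aggregate(cbind(M, U) ~ chr + start, data = stranded_calls, FUN = sum)
collapsed <- collapsed[order(collapsed$chr, collapsed$start), ]
print(collapsed)
```

With synced coordinates the shift step would be unnecessary, because both strand records of a CpG would already share the same start.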

Value

An object of class methrix

Details

Reads BedGraph files and generates methylation and coverage matrices. Optionally, the arrays can be serialized as on-disk HDF5 arrays.
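
To make the relationship between the count columns and the resulting values concrete, here is a minimal base R sketch. It assumes the usual convention that coverage is the sum of methylated (M) and unmethylated (U) read counts and that beta is the methylated fraction; how read_bedgraphs derives these internally is not stated here. The data frame and its values are hypothetical.

```r
# Toy per-CpG counts for one sample (values are made up for illustration).
calls <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100L, 250L, 40L),  # zero-based start positions
  M     = c(8L, 0L, 3L),       # reads supporting methylation
  U     = c(2L, 5L, 3L)        # reads supporting un-methylation
)

# Assumption: coverage is M + U and beta is the methylated fraction.
calls$cov  <- calls$M + calls$U
calls$beta <- calls$M / calls$cov

# Shift zero-based starts to one-based positions.
calls$pos <- calls$start + 1L

print(calls)
```

In this picture, a sample contributes one column of beta values to the methylation matrix and one column of cov values to the coverage matrix, with one row per CpG.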

Examples

if (FALSE) {
bdg_files = list.files(path = system.file('extdata', package = 'methrix'),
    pattern = '*\\.bedGraph\\.gz$', full.names = TRUE)
hg19_cpgs = methrix::extract_CPGs(ref_genome = 'BSgenome.Hsapiens.UCSC.hg19')
meth = methrix::read_bedgraphs(
    files = bdg_files, ref_cpgs = hg19_cpgs,
    chr_idx = 1, start_idx = 2, M_idx = 3, U_idx = 4,
    stranded = FALSE, zero_based = FALSE, collapse_strands = FALSE)
}
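
The idx arguments in the example above are plain column positions in the input file. As a rough illustration that does not require methrix, the sketch below writes a tiny tab-separated file with the same chr, start, M, U layout and picks the columns by position using base R. The file contents and the `idx` helper vector are hypothetical.

```r
# Write a tiny tab-separated file with the layout: chr, start, M, U.
# The file and its values are hypothetical.
toy_file <- tempfile(fileext = ".bedGraph")
writeLines(c("chr1\t101\t8\t2", "chr1\t251\t0\t5"), toy_file)

# Column positions, named like the corresponding read_bedgraphs() arguments.
idx <- c(chr_idx = 1, start_idx = 2, M_idx = 3, U_idx = 4)

# Read the file without a header and select columns by position.
raw <- read.table(toy_file, sep = "\t", header = FALSE, stringsAsFactors = FALSE)
picked <- raw[, idx]
names(picked) <- sub("_idx$", "", names(idx))
print(picked)
```

Inspecting the first few lines of a file this way can help to choose the right index values when no pipeline preset matches.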