read_bedgraphs.Rd
Versatile BedGraph reader.
read_bedgraphs(
  files = NULL,
  pipeline = NULL,
  zero_based = TRUE,
  stranded = FALSE,
  collapse_strands = FALSE,
  ref_cpgs = NULL,
  ref_build = NULL,
  contigs = NULL,
  vect = FALSE,
  vect_batch_size = NULL,
  coldata = NULL,
  chr_idx = NULL,
  start_idx = NULL,
  end_idx = NULL,
  beta_idx = NULL,
  M_idx = NULL,
  U_idx = NULL,
  strand_idx = NULL,
  cov_idx = NULL,
  synced_coordinates = FALSE,
  n_threads = 1,
  h5 = FALSE,
  h5_dir = NULL,
  h5temp = NULL,
  verbose = TRUE
)
files | bedgraph files. |
---|---|
pipeline | Default NULL. Currently supports "Bismark_cov", "MethylDackel", "MethylcTools", "BisSNP", and "BSseeker2_CGmap". If the pipeline is not known, use the idx arguments for manual column assignment. |
zero_based | Are bedgraph regions zero-based? Default TRUE. |
stranded | Default FALSE |
collapse_strands | If TRUE, collapses CpGs on the Crick strand into the Watson strand. Default FALSE. |
ref_cpgs | BSgenome object, or name of the installed BSgenome package, or an output from |
ref_build | Reference genome for the bedgraphs. Default NULL. Only used for additional details; it does not affect processing in any way. |
contigs | contigs to restrict genomic CpGs to. Default all autosomes and allosomes - ignoring extra contigs. |
vect | Whether to use vectorized code. Default FALSE. Set to TRUE if you do not have a large number of BedGraph files. |
vect_batch_size | Default NULL. Process samples in batches. Applicable only when vect = TRUE |
coldata | An optional DataFrame describing the samples. Row names, if present, become the column names of the matrix. If NULL, then a DataFrame will be created with basename of files used as the row names. |
chr_idx | column index for chromosome in bedgraph files |
start_idx | column index for start position in bedgraph files |
end_idx | column index for end position in bedgraph files |
beta_idx | column index for beta values in bedgraph files |
M_idx | column index for read counts supporting Methylation in bedgraph files |
U_idx | column index for read counts supporting Un-methylation in bedgraph files |
strand_idx | column index for strand information in bedgraph files |
cov_idx | column index for total-coverage in bedgraph files |
synced_coordinates | Are the start and end coordinates of a stranded bedgraph synchronized between the + and - strands? Possible values: FALSE (default), or TRUE if the start coordinates are the start coordinates of the C on the plus strand. |
n_threads | Number of threads to use. Default 1. Be careful: memory usage increases linearly with the number of threads. This option does not work on Windows. |
h5 | Should the coverage and methylation matrices be stored as 'HDF5Array'? Default FALSE. |
h5_dir | directory to store H5 based object |
h5temp | temporary directory to store hdf5 |
verbose | Be a little chatty? Default TRUE. |
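
The beta_idx, M_idx, U_idx and cov_idx arguments point at columns that are arithmetically related, which is why a subset of them is usually enough for manual column assignment. The base R sketch below only illustrates that relationship on a made-up table. It assumes beta is expressed on a 0 to 1 scale (some pipelines report a percentage instead), and it is not a description of how read_bedgraphs derives the values internally.

```r
# Toy bedgraph-like table; column names and values are made up for illustration.
bdg <- data.frame(
  chr   = c("chr1", "chr1", "chr1"),
  start = c(100L, 250L, 400L),
  M     = c(8L, 0L, 3L),  # read counts supporting methylation
  U     = c(2L, 5L, 3L)   # read counts supporting un-methylation
)

# Total coverage is the sum of methylated and unmethylated read counts.
bdg$cov <- bdg$M + bdg$U

# Beta is the methylated fraction of the covering reads (assumed 0-1 scale).
bdg$beta <- bdg$M / bdg$cov

# Going the other way: counts recovered from beta and total coverage.
M_from_beta <- round(bdg$beta * bdg$cov)
U_from_beta <- bdg$cov - M_from_beta

print(bdg)
```

In this toy table either the pair M and U, or the pair beta and cov, is sufficient to reconstruct the other two columns.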
An object of class methrix
Reads BedGraph files and generates methylation and coverage matrices. Optionally, the arrays can be serialized as on-disk HDF5 arrays.
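
As a rough picture of what "methylation and coverage matrices" means here, the base R sketch below aligns two made-up samples to a small set of reference CpG positions, giving one row per CpG and one column per sample. The names ref, samples, key and fill are hypothetical helpers for this illustration, and leaving uncovered CpGs as NA is an assumption of the sketch, not a statement about the package's internals.

```r
# Hypothetical reference CpG positions and two toy samples (made-up values).
ref <- data.frame(chr = "chr1", start = c(100L, 200L, 300L))
s1  <- data.frame(chr = "chr1", start = c(100L, 300L), M = c(8L, 1L), U = c(2L, 3L))
s2  <- data.frame(chr = "chr1", start = c(100L, 200L), M = c(5L, 0L), U = c(5L, 4L))
samples <- list(sample1 = s1, sample2 = s2)

# Position key used to match sample rows to reference rows.
key <- function(d) paste(d$chr, d$start, sep = ":")

# Place per-sample values on the reference grid; uncovered CpGs stay NA.
fill <- function(d, value) {
  out <- rep(NA_real_, nrow(ref))
  idx <- match(key(d), key(ref))
  out[idx[!is.na(idx)]] <- value[!is.na(idx)]
  out
}

# One column per sample, one row per reference CpG.
cov_mat  <- sapply(samples, function(d) fill(d, d$M + d$U))
beta_mat <- sapply(samples, function(d) fill(d, d$M / (d$M + d$U)))
rownames(cov_mat) <- rownames(beta_mat) <- key(ref)

print(cov_mat)
print(beta_mat)
```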
if (FALSE) {
bdg_files = list.files(path = system.file('extdata', package = 'methrix'),
    pattern = '*\\.bedGraph\\.gz$', full.names = TRUE)
hg19_cpgs = methrix::extract_CPGs(ref_genome = 'BSgenome.Hsapiens.UCSC.hg19')
meth = methrix::read_bedgraphs(
    files = bdg_files, ref_cpgs = hg19_cpgs,
    chr_idx = 1, start_idx = 2, M_idx = 3, U_idx = 4,
    stranded = FALSE, zero_based = FALSE, collapse_strands = FALSE)
}
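
The example above passes stranded = FALSE and collapse_strands = FALSE. For stranded input, the idea behind collapsing can be sketched in base R as follows. The sketch assumes 1-based, unsynchronized coordinates, where the C of a CpG on the minus strand sits one base to the right of the C on the plus strand. The table and the cpg_start column are made up for illustration, and this is not the package's own implementation.

```r
# Toy stranded CpG calls (1-based); all values are made up for illustration.
calls <- data.frame(
  chr    = c("chr1", "chr1", "chr1", "chr1"),
  start  = c(100L, 101L, 200L, 301L),
  strand = c("+", "-", "+", "-"),
  M      = c(4L, 3L, 1L, 2L),
  U      = c(1L, 2L, 4L, 0L)
)

# Assumption: minus-strand calls are shifted left by one base so that both
# strands of the same CpG share the plus-strand C position.
calls$cpg_start <- ifelse(calls$strand == "-", calls$start - 1L, calls$start)

# Sum the read counts of rows that now refer to the same CpG.
collapsed <- aggregate(cbind(M, U) ~ chr + cpg_start, data = calls, FUN = sum)
collapsed <- collapsed[order(collapsed$chr, collapsed$cpg_start), ]

# Recompute beta from the pooled counts.
collapsed$beta <- collapsed$M / (collapsed$M + collapsed$U)

print(collapsed)
```

Pooling the counts before recomputing beta weights each strand by its coverage, which differs from averaging the two per-strand beta values.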