Subsets a given list of CpGs by another list of CpGs

subset_ref_cpgs(ref_cpgs, gen_cpgs, verbose = TRUE)

Arguments

ref_cpgs

data.table; A reference set of CpG sites (e.g. hg19 or mm10) in bedgraph format

gen_cpgs

data.table; A subset of CpG sites. Usually obtained from read_index.

verbose

logical; whether to output messages

Value

Returns the subset of reference CpG sites, in bedgraph format

Details

Typically used to restrict the reference CpG sites to those present in the input files, which improves performance and reduces resource usage. Can also be used for quality control, to check whether an excessive number of CpG sites in the input are absent from the reference genome.
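The two uses described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the matching rule (a "chr:start" key) and the variable names ref_key, gen_key, kept, n_dropped and n_missing are assumptions made for illustration only.

```r
# Hypothetical sketch in base R; subset_ref_cpgs itself may work differently.
ref_cpgs <- data.frame(chr = "chr1", start = 1:5 * 2 - 1, end = 1:5 * 2)

# Observed sites: two are in the reference, one (start = 100) is not.
gen_cpgs <- data.frame(chr = "chr1", start = c(1, 5, 100), end = c(2, 6, 101))

# Assumed matching rule: a site is identified by chromosome and start position.
ref_key <- paste(ref_cpgs$chr, ref_cpgs$start, sep = ":")
gen_key <- paste(gen_cpgs$chr, gen_cpgs$start, sep = ":")

# Performance use: keep only the reference sites that were actually observed.
kept <- ref_cpgs[ref_key %in% gen_key, ]

# Quality-control use: count reference sites dropped and observed sites
# that have no counterpart in the reference.
n_dropped <- nrow(ref_cpgs) - nrow(kept)
n_missing <- sum(!gen_key %in% ref_key)
```

A large n_missing relative to nrow(gen_cpgs) would suggest that the input files and the reference do not describe the same genome build.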

Examples

ref_cpgs = data.frame(chr="chr1", start=(1:5*2-1), end=(1:5*2))
subset_ref_cpgs(ref_cpgs, ref_cpgs[1:3,])
#> Dropped 2/5 CpGs (40%) from the reference set
#> 0/3 subset CpGs (0%) were not present in the reference set
#>    chr start end
#> 1 chr1     1   2
#> 2 chr1     3   4
#> 3 chr1     5   6