Parse BED files for unique genomic coordinates

read_index(
  files,
  col_list,
  n_threads = 1,
  zero_based = FALSE,
  batch_size = 200,
  verbose = TRUE
)

Arguments

files	list of strings; file paths of the BED files
col_list	string; The column index object for the input BED files
n_threads	integer; number of threads to use. Default 1. Be careful: memory usage increases linearly with the number of threads. This option does not work on Windows.
zero_based	boolean; flag for whether the input data is zero-based or not. Default FALSE
batch_size	integer; maximum number of files to hold in memory at once. Default 200
verbose	boolean; flag to output messages or not.

Value

data.table containing all unique genomic coordinates

Details

Creates the set of unique genomic regions from the input BED files. Genomic coordinates from the BED files are read into a list of length batch_size + 1. Once the first batch_size positions are full, unique is run over the list and the running result is kept in position batch_size + 1. The result is also indexed on 'chr' and 'start' for later searching.
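
The batching scheme described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the helper name collect_unique_coords is hypothetical, the input is assumed to be a list of data.tables already read into memory (each with 'chr' and 'start' columns) rather than file paths, and keying with setkeyv is only one possible reading of "indexes".

```r
library(data.table)

# Hypothetical helper illustrating the batching scheme; not the package's code.
# `tables` is assumed to be a list of data.tables with columns chr and start.
collect_unique_coords <- function(tables, batch_size = 200) {
  # Positions 1..batch_size hold freshly read tables;
  # position batch_size + 1 holds the running unique coordinates.
  slots <- vector("list", batch_size + 1)
  filled <- 0
  for (tbl in tables) {
    filled <- filled + 1
    slots[[filled]] <- tbl[, .(chr, start)]
    if (filled == batch_size) {
      # List is full: collapse everything into the running result.
      slots[[batch_size + 1]] <- unique(rbindlist(slots))
      # Empty the batch positions (list(NULL) keeps the list length).
      slots[seq_len(batch_size)] <- list(NULL)
      filled <- 0
    }
  }
  # Fold any partially filled batch into the running result.
  result <- unique(rbindlist(slots))
  # Sort and key on chr and start for later searching.
  setkeyv(result, c("chr", "start"))
  result
}

# Toy input standing in for three parsed BED files.
beds <- list(
  data.table(chr = c("chr1", "chr1"), start = c(100L, 200L)),
  data.table(chr = c("chr1", "chr2"), start = c(200L, 50L)),
  data.table(chr = "chr1", start = 100L)
)
coords <- collect_unique_coords(beds, batch_size = 2)
```

Under these assumptions, at most batch_size tables plus the running result are held at once, which is why a smaller batch_size lowers peak memory at the cost of more calls to unique.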

Examples

if (FALSE) {
#Do Nothing
}