Parse BED files for unique genomic coordinates

read_index(
  files,
  col_list,
  n_threads = 1,
  zero_based = FALSE,
  batch_size = 200,
  verbose = TRUE
)

Arguments

files

list of strings; file paths of the BED files

col_list

string; The column index object for the input BED files

n_threads

integer; number of threads to use. Default 1. Be careful: memory usage increases linearly with the number of threads. This option does not work on Windows.

zero_based

boolean; flag for whether the input data is zero-based or not

batch_size

integer; maximum number of files to hold in memory at once. Default 200

verbose

boolean; flag to output messages or not.

Value

data.table containing all unique genomic coordinates

Details

Creates a list of unique genomic regions from the input BED files. A list of length batch_size + 1 is populated with the genomic coordinates from the BED files. When the list is full, unique is run over it and the running result is kept in position batch_size + 1. The result is also indexed on 'chr' and 'start' for later searching.
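
The batching described above can be illustrated with a minimal sketch. This is not the package's implementation: unique_coords_sketch is a hypothetical helper, and it assumes headerless, tab-separated BED files whose first two columns are the chromosome and the start position. It relies only on data.table's fread, rbindlist, unique and setkey.

```r
library(data.table)

# Hypothetical helper (an assumption for illustration, not read_index itself).
unique_coords_sketch <- function(files, batch_size = 2) {
  # Positions 1..batch_size hold freshly read files; position
  # batch_size + 1 holds the running set of unique coordinates.
  slots <- vector("list", batch_size + 1)
  filled <- 0
  for (f in files) {
    filled <- filled + 1
    # Assumed layout: column 1 = chromosome, column 2 = start
    slots[[filled]] <- fread(f, header = FALSE, select = 1:2,
                             col.names = c("chr", "start"))
    if (filled == batch_size) {
      # Batch is full: collapse everything into the running result
      # (rbindlist skips the NULL entries)
      slots[[batch_size + 1]] <- unique(rbindlist(slots))
      slots[seq_len(batch_size)] <- list(NULL)
      filled <- 0
    }
  }
  # Collapse whatever is left in a partially filled batch
  coords <- unique(rbindlist(slots))
  # Key on chr and start so later lookups can use binary search
  setkey(coords, chr, start)
  coords
}

# Usage with three small temporary BED files that share some coordinates
write_bed <- function(chr, start) {
  path <- tempfile(fileext = ".bed")
  fwrite(data.table(chr = chr, start = start, end = start + 1L),
         path, sep = "\t", col.names = FALSE)
  path
}
beds <- c(
  write_bed(c("chr1", "chr1"), c(10L, 20L)),
  write_bed(c("chr1", "chr2"), c(20L, 5L)),
  write_bed(c("chr1", "chr2"), c(10L, 7L))
)
coords <- unique_coords_sketch(beds, batch_size = 2)
print(coords)
```

In this sketch, peak memory is bounded by batch_size files plus the running result, which is the trade-off the batch_size argument controls: larger batches mean fewer calls to unique but more data held at once.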

Examples

if (FALSE) {} # Do nothing