Useful packages:

Interactive visualization:

plotly (provides ggplotly()) Glimma

Nice tables in R Markdown

DT

Enrichment analysis

clusterProfiler

Annotation of genomic regions

annotatr

Gene expression, sequencing

DESeq2 edgeR

Mutation data

maftools

Visualization of genomic regions

Gviz

Handling genomic data (e.g. bam files)

rtracklayer

Methylation data

Array

minfi RnBeads

Sequencing

methrix bsseq

Visualization, heatmaps

pheatmap ComplexHeatmap

Nice color palettes

ggsci scico

Functions in R

R is a function-oriented language: almost every object manipulation is done with functions. A data analysis can get by with functions from existing packages alone, but it is useful to write your own as well, mainly to avoid repeating code. As a rule of thumb, if you need to do the same thing more than twice, write a function.
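As a small illustration of this rule, the sketch below replaces copies of the same rescaling expression with a single function. The name rescale01 and the example vectors are made up for this illustration.

```r
# Hypothetical example data (not part of the course material)
a <- c(2, 4, 6)
b <- c(10, 20, 40)

# Repetitive version: the same expression is copied for every vector
a_scaled <- (a - min(a)) / (max(a) - min(a))
b_scaled <- (b - min(b)) / (max(b) - min(b))

# Function version: the logic is written once and reused
rescale01 <- function(v) {
  (v - min(v)) / (max(v) - min(v))
}

identical(rescale01(a), a_scaled)
## [1] TRUE
identical(rescale01(b), b_scaled)
## [1] TRUE
```

If the rescaling rule ever changes, only the body of rescale01 has to be edited.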

General syntax

f <- function(){
  cat("Hello world! \n")
}
f()
## Hello world!
  • name: needed to call the function later
  • arguments (optional)
  • expressions: what the function does
  • return value (optional)
f <- function(num1, num2){
  res <- num1+num2
  res
}
f(1,2)
## [1] 3

Return value

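  • By default, a function returns the value of the last evaluated expression.

  • The return value can also be stated explicitly with return(). In that case, evaluation stops there and the function exits, so any code after return() is never run.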

f <- function(num1, num2){
  res <- num1+num2
  return(res)
  cat("Hello world! \n")
}
f(1,2)
## [1] 3
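A common use of an explicit return() is to leave a function early when an input cannot be handled. The following is a minimal sketch of that pattern; safe_ratio is a hypothetical helper, not a function from any package.

```r
# safe_ratio is a hypothetical helper, used only to illustrate return()
safe_ratio <- function(num1, num2) {
  if (num2 == 0) {
    return(NA)      # explicit return: the function exits here
  }
  num1 / num2       # otherwise the last evaluated expression is returned
}
safe_ratio(6, 3)
## [1] 2
safe_ratio(6, 0)
## [1] NA
```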

Arguments

  • Arguments are named.
  • They can be matched by position, by name, or by a combination of both.
f <- function(num1, num2){
  print(num1)
  print(num2)
}
f(1,2)
## [1] 1
## [1] 2
f(num2=2, num1=1)
## [1] 1
## [1] 2
f(2, num1=1)
## [1] 1
## [1] 2
try(f(2))
## [1] 2
## Error in print(num2) : argument "num2" is missing, with no default

Arguments

  • the arguments can have default values:
f <- function(num1, num2=3){
  print(num1)
  #browser()
  print(num2)
}
f(1,2)
## [1] 1
## [1] 2
f(num2=2, num1=1)
## [1] 1
## [1] 2
f(2)
## [1] 2
## [1] 3
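Default values are stored in the function definition and can be looked up with formals() from base R. The sketch below assumes a made-up function, add_offset, to show a default being used, overridden, and inspected.

```r
# add_offset is a hypothetical function used only for illustration
add_offset <- function(num1, offset = 10) {
  num1 + offset
}
add_offset(1)               # offset falls back to its default
## [1] 11
add_offset(1, offset = 5)   # default overridden by name
## [1] 6
formals(add_offset)$offset  # the default stored in the definition
## [1] 10
```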

Environment

Functions run in their own environment, which cannot be seen from the outside: a variable defined inside a function is not available once the function has returned. In the other direction, a function can see variables outside its own environment, following fixed rules. R uses lexical scoping: a name that is not found inside the function is looked up in the environment where the function was defined, and then along the search path.

g <- function(x) {
  x*y
}
try(g(2))
## Error in g(2) : object 'y' not found
y <- 10
g(2)
## [1] 20
# but: a global x is not used to fill in a missing argument
x <- 1:10
try(mean())
## Error in mean.default() : argument "x" is missing, with no default
g <- function(x) {
  ab <- 12
  x*y
}
g(2)
## [1] 20
try(print(ab))
## Error in print(ab) : object 'ab' not found

Why?

ls()
## [1] "f" "g" "x" "y"
#ls(environment(mean))

search()
##  [1] ".GlobalEnv"        "package:stats"     "package:graphics" 
##  [4] "package:grDevices" "package:utils"     "package:datasets" 
##  [7] "package:methods"   "Autoloads"         "tools:callr"      
## [10] "package:base"
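Lexical scoping means that the place where a function is defined matters, not the place where it is called. The sketch below illustrates this with made-up names (scale_factor, times_factor, caller).

```r
# All names below are hypothetical and only serve the illustration
scale_factor <- 10

times_factor <- function(v) {
  v * scale_factor        # not a local name: looked up where times_factor was defined
}

caller <- function() {
  scale_factor <- 1000    # local to caller(), invisible to times_factor()
  times_factor(2)
}

caller()
## [1] 20
```

The result is 20 rather than 2000, because times_factor was defined in the global environment and finds the global scale_factor there.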

Troubleshooting

Because a function runs in its own environment, it can be difficult to see what goes wrong inside it. Useful functions:

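  • traceback(): prints the call stack of the last error
  • browser(): pauses execution at the point where it is called and opens an interactive prompt
  • debug(): makes a function run step by step each time it is called (undo with undebug())
  • options(error=recover): lets you pick a frame to inspect whenever an error occurs
  • options(error=NULL): restores the default error handling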
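Most of these tools are interactive, so the sketch below only switches the debugger flag on and off without stepping through a call. It also shows tryCatch() from base R, which is not in the list above, as a non-interactive way to capture an error message. check_positive is a hypothetical function made up for this illustration.

```r
# check_positive is a hypothetical function used only for illustration
check_positive <- function(v) {
  if (any(v <= 0)) stop("all values must be positive")
  sqrt(v)
}

debug(check_positive)       # the next call would open the step-through debugger
isdebugged(check_positive)
## [1] TRUE
undebug(check_positive)     # switch the debugger off again
isdebugged(check_positive)
## [1] FALSE

# Non-interactive alternative: capture the error and keep its message
res <- tryCatch(check_positive(c(4, -1)),
                error = function(e) conditionMessage(e))
res
## [1] "all values must be positive"
```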