R

Introduction

R is a free programming language for statistical computing and graphics. R code can be executed in the RStudio Integrated Development Environment (IDE) or from the command line interface (CLI).

Availability

Multiple installations of R are available on all our systems. You can use them by loading an R module: module load R loads the latest version, and module load R/<version> selects a specific version.

Several versions of the base R installation are available:

  • R/4.3.2-gfbf-2023a

  • R/4.3.3-gfbf-2023b

  • R/4.4.1-gfbf-2023b

Additionally, a couple of bundle modules are available that include many ready-to-use packages:

  • R-bundle-CRAN/2023.12-foss-2023a

  • R-bundle-Bioconductor/3.18-foss-2023a-R-4.3.2
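
After loading a module, it can be useful to confirm from inside R which version is running and whether a given package is already provided. The following is a minimal sketch; the package names are only examples and are not a statement of what the bundles contain.

```r
# Report the version and location of the R installation currently in use
cat(R.version.string, "\n")
cat("R home:", R.home(), "\n")

# Example package names (assumption: substitute the ones you care about)
pkgs <- c("stats", "Matrix", "MASS")

# requireNamespace() returns TRUE when the package can be loaded
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(available)
```

Any package reported as FALSE would need to be installed into your own library.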

RStudio is also available on Open OnDemand (OOD) for the TinkerCliffs and Owl clusters. Currently, only one version of R is available there:

  • R 4.3.2

Execution modes

Like most software, R code can be run in two modes:

  • Interactive mode

  • Batch mode

Running interactively (e.g. in RStudio or the CLI) works well for code development with small examples. Larger computations should be submitted as jobs, via a traditional job submission script.

R from the command line

To run R from the command line, first load an R module. In an interactive job on TinkerCliffs, this looks like:

[slima2@tinkercliffs2 ~]$ interact -A <your slurm account> -p normal_q
srun: job 2920622 queued and waiting for resources
srun: job 2920622 has been allocated resources
[slima2@tc008 ~]$ module load R/4.4.1-gfbf-2023b
[slima2@tc008 ~]$ R
...
>

Alternatively, you can run R code from the command line in batch mode. This generally requires two scripts:

  1. an R script containing the R code to run

  2. a shell script for submission to the job scheduler

An example R script called hello.R might print a message containing the host name to standard output:

# paste() joins the pieces into the single string that print() displays
print(paste("Rello, from", Sys.info()["nodename"]))

And an example bash shell script called hello_R.sh might request just a single core on a single node:

#!/bin/bash
#SBATCH -A <your slurm account>
#SBATCH -p normal_q
#SBATCH --nodes=1
#SBATCH --ntasks=1
module load R/4.4.1-gfbf-2023b
Rscript hello.R

With these files, we can submit the job from the login node:

[slima2@tinkercliffs2 ~]$ sbatch hello_R.sh
Submitted batch job 2920595

Once your job is finished, anything that was printed to standard output will be in a file called slurm-2920595.out.
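
For jobs that request more than one core, an R script can read the allocation from the environment instead of hard-coding it. The sketch below assumes the job was submitted with --cpus-per-task, in which case Slurm sets SLURM_CPUS_PER_TASK; it falls back to one core otherwise. The squaring task is only a placeholder for real work.

```r
library(parallel)

# Cores allocated by Slurm; defaults to 1 when the variable is unset or invalid
cores_env <- Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1")
n_cores <- suppressWarnings(as.integer(cores_env))
if (is.na(n_cores) || n_cores < 1L) n_cores <- 1L

host <- Sys.info()[["nodename"]]
message(sprintf("Running on %s with %d core(s)", host, n_cores))

# Placeholder workload: square the numbers 1..8 across the allocated cores
squares <- mclapply(1:8, function(i) i^2, mc.cores = n_cores)
print(unlist(squares))
```

mclapply() relies on process forking, so this approach is suited to Linux systems such as the clusters and to a single node.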

Installing packages

By default, the directories where R looks for packages are determined by the location of the R installation and by the environment variable R_LIBS_USER, which should hold the path to your personal package library. This variable is created and set for you when you launch RStudio from OOD. This helps keep your environment organized and ensures that the packages you load match the version of R you are using. For example:

> .libPaths()
[1] "/home/slima2/R/tinkercliffs-rome/4.3.2"                                       
[2] "/apps/easybuild/software/tinkercliffs-rome/R-bundle-CRAN/2023.12-foss-2023a"  
[3] "/apps/easybuild/software/tinkercliffs-rome/R/4.3.2-gfbf-2023a/lib64/R/library"

If your library paths look similar, packages will be installed into your home directory, because install.packages() uses the first entry of .libPaths() by default. To install a package, run:

> install.packages("package of interest")
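
When R is started from the command line rather than from OOD, the personal library may not exist yet. The following sketch defines a hypothetical helper, ensure_user_lib (not part of R or of the cluster setup), that creates the directory named by R_LIBS_USER if needed and puts it first on the library search path. It assumes R_LIBS_USER is set to a usable path.

```r
# Hypothetical helper: create the personal library if missing and make it
# the first entry of .libPaths(), so install.packages() installs there.
ensure_user_lib <- function(path = Sys.getenv("R_LIBS_USER")) {
  if (!nzchar(path)) stop("R_LIBS_USER is not set")
  path <- path.expand(path)
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
  }
  .libPaths(c(path, .libPaths()))
  invisible(path)
}

# Usage on the cluster:
# ensure_user_lib()
# install.packages("package of interest")
```

Calling the helper before install.packages() keeps packages for different R versions in separate directories, provided R_LIBS_USER includes the R version in its path as in the example above.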

If you are using R version 4.3.X, you may notice issues installing common packages (e.g. ggplot2) because the current CRAN versions of some dependencies are not compatible with R 4.3. The solution is to install older versions of these dependencies from archived source files. For example, two common dependencies are Matrix and MASS:

packageurl1 <- 'https://cran.r-project.org/src/contrib/Archive/Matrix/Matrix_1.6-5.tar.gz'
install.packages(packageurl1, repos=NULL, type="source")
packageurl2 <- 'https://cran.r-project.org/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz'
install.packages(packageurl2, repos=NULL, type="source")
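
If several dependencies need to be pinned, the same pattern can be generalized. The sketch below uses a hypothetical helper, cran_archive_url, that builds archive URLs in the form shown above; the package versions are the ones from the example, and the install step is left commented out so the URLs can be checked first.

```r
# Hypothetical helper: build the CRAN archive URL for one package version
cran_archive_url <- function(pkg, version) {
  sprintf("https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz",
          pkg, pkg, version)
}

# Versions to pin, taken from the example above
pinned <- c(Matrix = "1.6-5", MASS = "7.3-59")

# One URL per package, named by package
urls <- mapply(cran_archive_url, names(pinned), pinned)
print(urls)

# Uncomment to install each archived source package in turn:
# for (u in urls) install.packages(u, repos = NULL, type = "source")
```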

Legacy Availability via Containers

We are moving away from developing RStudio in containerized environments; however, our older implementations are still available in Open OnDemand >> “Legacy Apps” >> “RStudio – RStudio > 1.4.1717, R>4.1”. Several containers are available, each with an older version of R and a different set of packages:

  • ood-rstudio-basic

  • ood-rstudio-bio

  • ood-rstudio-geospatial

  • ood-rstudio-keras

  • ood-rstudio-qiime2

The Dockerfiles can be found on GitHub by searching for “ood-rstudio”, and the images can be found on DockerHub by searching for “rsettlag/ood-rstudio”. The easiest way to see which libraries are installed in a container is to start the RStudio app via Open OnDemand.