Continuous integration is an amazing tool when developing R packages. We push a change to the server, and a process is spawned that checks we haven’t done something silly. It protects us from ourselves! However, this process can become slow, as the CI process typically starts with a blank virtual machine (VM).
If you are using R, then the most popular CI pipeline is currently Travis CI, but there’s also Jenkins, GitHub Actions, GitLab CI, Circle CI and a few others. They all follow the same idea: start a VM, install your R package, then run a bunch of checks. One obvious bottleneck is the “install your R package” step, as any R package may have a large number of dependencies.
In a recent post, we showed the different ways of speeding up package installation (worth checking this out if you find package installation/updating slow). In this post, we’ll discuss leveraging some of those techniques for our CI pipeline.
RStudio Package Manager (RSPM)
RStudio Package Manager is perhaps the easiest way of speeding up your CI process. RSPM provides precompiled binaries for CRAN packages, which should ensure a faster install. To test this I made a simple package with no functions, but with a dependency on the tidyverse, i.e. Imports: tidyverse in the DESCRIPTION file. Then I started two Travis CI jobs. The first had a .travis.yml containing only
language: r
cache: packages
The total time for this Travis job was around 12 minutes.
The second job had the same two lines, plus an additional
before_install:
  - echo "options(repos = c(CRAN = 'https://packagemanager.rstudio.com/all/__linux__/xenial/latest'))" >> ~/.Rprofile.site
  - echo "options(HTTPUserAgent = paste0('R/', getRversion(), ' R (', paste(getRversion(), R.version['platform'], R.version['arch'], R.version['os']), ')'))" >> ~/.Rprofile.site
While this looks complicated, it is actually fairly simple. The first line adds the RStudio binary package repository to ~/.Rprofile.site. The second adds an HTTPUserAgent option to the same file, so that packages installed via Rscript also use the binary package versions. These few lines cut the Travis build time from around 12 minutes to under 4 minutes.
The above is an incredibly easy way to speed up your CI steps, and it works with other CI systems too. If you use GitHub Actions, then this has already been implemented.
A few things to note
- The above code is for Ubuntu 16.04 Xenial. If you are using 18.04 Bionic, replace xenial with bionic in the repository URL
- There are a few different OSs available for RSPM
- If you are interested in using the RSPM in your own organisation, give us a shout - we’re RStudio Partners.
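For CI systems other than Travis, the two before_install lines could be wrapped in a small shell helper that takes the Ubuntu codename as a parameter. This is only a sketch: the function names rspm_url and write_rspm_profile and their arguments are made up for illustration, while the URL pattern and the two options are the ones from the Travis example.

```shell
#!/bin/sh
# Sketch only: rspm_url and write_rspm_profile are hypothetical helper names.

# Build the RSPM binary repository URL for an Ubuntu codename (xenial, bionic).
rspm_url() {
  printf 'https://packagemanager.rstudio.com/all/__linux__/%s/latest\n' "$1"
}

# Append the repos and HTTPUserAgent options to the given R profile file.
write_rspm_profile() {
  codename="$1"
  profile="$2"
  url="$(rspm_url "$codename")"
  # Same two options as in the Travis before_install step.
  echo "options(repos = c(CRAN = '${url}'))" >> "$profile"
  echo "options(HTTPUserAgent = paste0('R/', getRversion(), ' R (', paste(getRversion(), R.version['platform'], R.version['arch'], R.version['os']), ')'))" >> "$profile"
}

# Example call (not run here):
# write_rspm_profile bionic "$HOME/.Rprofile.site"
```

Keeping the codename as an argument means a move from Xenial to Bionic is a one-word change in the CI configuration.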
There are three other possibilities for reducing your CI time.
The first is similar to the RStudio Package Manager in that it uses binary builds, but this time the Ubuntu versions provided by Michael Rutter. The general idea is to add a new Ubuntu package repository, then install packages via apt install r-cran-*. Details are available at CRAN. Also see Dirk Eddelbuettel’s recent blog post and YouTube video for even more details.
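As a rough illustration of the r-cran-* naming, the sketch below turns a list of R package names into an apt command. It assumes the binary packages are named r-cran- followed by the lower-cased R package name, and that the extra repository has already been added; the helpers apt_name and apt_install_cmd are hypothetical, and not every CRAN package is guaranteed to be available this way.

```shell
#!/bin/sh
# Sketch only: apt_name and apt_install_cmd are hypothetical helper names.

# Map an R package name to its assumed Ubuntu binary package name.
apt_name() {
  printf 'r-cran-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Print (rather than run) the apt command for a list of R packages.
apt_install_cmd() {
  cmd="sudo apt install -y"
  for pkg in "$@"; do
    cmd="$cmd $(apt_name "$pkg")"
  done
  printf '%s\n' "$cmd"
}

apt_install_cmd tidyverse Rcpp
```

Printing the command first makes it easy to check the package names in the CI log before actually installing anything.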
Alternatively, we could use the ccache trick, where we store compiled files to be used for the next build. This requires a little more work, but it has already been done by Patrick Schratz.
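A minimal sketch of the ccache idea is to prefix the compilers R uses with ccache in a Makevars file. This is not necessarily the setup referred to above: the helper name write_ccache_makevars is hypothetical, and the compiler names (gcc, g++, gfortran) are assumptions about the CI image.

```shell
#!/bin/sh
# Sketch only: write_ccache_makevars is a hypothetical helper name.
# Assumes ccache, gcc, g++ and gfortran are installed on the CI image.
write_ccache_makevars() {
  dir="$1"
  mkdir -p "$dir"
  # Prefix each compiler with ccache so compiled objects can be reused.
  cat > "$dir/Makevars" <<'EOF'
CC=ccache gcc
CXX=ccache g++
CXX11=ccache g++
CXX14=ccache g++
FC=ccache gfortran
F77=ccache gfortran
EOF
}

# Example call (not run here); R reads ~/.R/Makevars when compiling packages:
# write_ccache_makevars "$HOME/.R"
```

For this to pay off, ccache’s own cache directory also has to be preserved between builds, for example through the CI system’s caching mechanism.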
Parallel builds using the Ncpus argument of install.packages() typically don’t help on most CI systems, as the (free) VM will only have a single core.
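Whether a parallel install is worth trying can be checked by asking the CI machine how many cores it reports. The snippet below is a sketch; detect_cores is a hypothetical helper name, and the commented Rscript line only shows where the value might be used.

```shell
#!/bin/sh
# Sketch only: detect_cores is a hypothetical helper name.
detect_cores() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  else
    # Fallback for systems without nproc; report 1 if the query fails.
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
  fi
}

cores="$(detect_cores)"
echo "Cores available: ${cores}"

# The value could then be passed on, for example:
# Rscript -e "install.packages('tidyverse', Ncpus = ${cores})"
```

If the job reports a single core, setting Ncpus will make no difference to the install time.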
Jumping Rivers are full service, RStudio certified partners. Part of our role is to offer support for RStudio Pro products. If you use any RStudio Pro products, feel free to contact us (email@example.com). We may be able to offer free support.