Dr Colin Gillespie

Colin has been using R since 1999. He’s the author of a number of R packages and has published the book Efficient R Programming with O’Reilly.

Parquet vs the RDS Format

Author: Colin Gillespie

A benefit of using the {arrow} package with Parquet files is that it enables you to work with ridiculously large data sets from the comfort of an R session. In this post we explore the timescales associated with different methods of data storage.

Reading and Writing Data with {arrow}

Author: Colin Gillespie

Apache Arrow is a cross-language format for super fast in-memory data. It's designed for efficient analytic operations. In this post, we look at reading and writing data using Arrow and the advantages of the parquet file format.

Automating Dockerfile creation for Shiny apps

Authors: Jamie Owen & Colin Gillespie

Deploying Shiny applications can be frustrating: you have to make sure your production environment matches the local environment where you can see your application running. In this blog post we explore how we might start writing code to automate the creation of Dockerfiles, producing images that allow our local, running Shiny application to be deployed in a container.

New features in R 4.2.0

Author: Colin Gillespie

R version 4.2.0 is about to be released. This release includes an update to the native pipe, changes to logical operators and improvements to the help page. In this blog post, we take a look at (some of) these new features, highlighting (in our opinion) the most exciting changes.

Forgotten features of R 4.0.0

Author: Colin Gillespie

R 4.0 was released almost two years ago. However, the majority of R users didn't immediately adopt the new version due to obvious constraints when updating software. The consequence is that many of the new and useful features are forgotten about. This post highlights the features as we've moved to R 4.0.

Git: Moving from Master to Main

Author: Colin Gillespie

In 2020, GitHub took the correct decision to change the default branch from master to main. For single, independent repositories, this is relatively straightforward. But moving groups or organisations is more complex and requires planning.

Webinars: R in Production

Author: Colin Gillespie

Bridging the gap between data science and IT teams is much easier than you might expect! This two-part webinar will discuss why open source languages are suitable for enterprise data science, and how data scientists can work with the IT team to get their organisational buy-in.

Moving to Hugo

Author: Colin Gillespie

Moving your website to Hugo brings a lot of benefits, but there are also challenges. In this post, we'll discuss our top tips for making that move to Hugo as smooth as possible.

External Graphics with knitr

Author: Colin Gillespie

Adding images with {knitr} is straightforward; we simply use include_graphics(). However, it is easy to add an image that is too large, or has the wrong dimensions. This post tells you what to watch out for, and how to optimise your images for the web.

Selecting the correct image file type

Author: Colin Gillespie

When including graphics within a markdown document, it's crucial to use the correct file type when generating graphics. However, there isn't a one-size-fits-all answer; instead, we should choose what's most appropriate for the image.

Detecting Security Vulnerabilities in R Packages

Author: Colin Gillespie

One of our main roles at Jumping Rivers is to set up and provide ongoing maintenance for R, Python and RStudio infrastructure. This typically involves ensuring software is up-to-date and making sure everything is running smoothly. The {oysteR} package is an R interface to the OSS Index that allows users to scan their installed R packages.

Speeding up your Continuous Integration Builds

Author: Colin Gillespie

Continuous integration is an amazing tool when developing R packages. We push a change to the server, and a process is spawned that checks we haven’t done something silly. It protects us from ourselves! However, this process can become slow, as typically the CI process starts with a blank virtual machine (VM).

Saving R Graphics across OSs

Author: Colin Gillespie

R is known for its amazing graphics. Not only {ggplot2}, but also {plotly}, and the dozens of other packages in the graphics task view. There seems to be a graph for every scenario. However, once you’ve created your figure, how do you export it? This post compares standard methods for exporting R plots as PNGs/PDFs across different OSs.

Customising your Rprofile

Author: Colin Gillespie

Every time R starts, it runs through a couple of R scripts. One of these scripts is the .Rprofile. This allows users to customise their particular set-up. However, some care has to be taken, as if this script is broken, this can cause R to break. If this happens, just delete the script!

R Packages: Are we too trusting?

Author: Colin Gillespie

One of the great things about R is the myriad of packages. Packages are typically installed via CRAN, Bioconductor or GitHub. But how often do we think about what we are installing? Do we pay attention or just install when something looks neat? Do we think about security or just take it that everything is secure?

{benchmarkme}: new version

Author: Colin Gillespie

When discussing how to speed up slow R code, my first question is: what is your computer spec? It’s always surprised me that people wonder why analysing big data is slow, yet they are using a five-year-old cheap laptop. Spending a few thousand pounds would often make their problems disappear.

Hacking Bioconductor

Author: Colin Gillespie

Domain squatting or URL hijacking is a straightforward attack that requires little skill. An attacker registers a domain that is similar to the target domain and hopes that a user accidentally visits the site. For example, if the domain is example.com, then a typo-squatter would register similar domains such as

What R version do you really need for a package?

Author: Colin Gillespie

At Jumping Rivers we run a lot of R courses. Some of our most popular courses revolve around the tidyverse, in particular, our Introduction to the tidyverse and our more advanced mastering course. We even trained over 200 data scientists at the NHS - see our case study for more details.

R from the turn of the century

Author: Colin Gillespie

Last week I spent some time reminiscing about my PhD and looking through some old R code. This trip down memory lane led to some of my old R scripts that amazingly still run. My R scripts were fairly simple and just created a few graphs.

Styling {ggplot2} Graphics

Author: Colin Gillespie

In our previous post, we demonstrated that, contrary to popular opinion, it is possible to generate attractive-looking plots using just base graphics. We did confess, though, that it took a lot of time and effort. In this post, we repeat the same exercise.

Our Logo In R

Author: Colin Gillespie

Hi all, so given our logo here at Jumping Rivers is a set of lines designed to look like a Gaussian Process, we thought it would be a neat idea to recreate this image in R. To do so we’re going to need a couple of packages. We do the usual install.packages() dance (remember this step can be performed in parallel).

Styling Base R Graphics

Author: Colin Gillespie

Base R graphics get a bad press (although to be fair, they could have chosen their default values better). In general, they are viewed as a throwback to the dawn of the R era. I think that most people would agree that, in general, there are better graphics techniques in R (e.g. {ggplot2}).

Hosting RStudio Server on Azure

Author: Colin Gillespie

Can’t be bothered reading, tell me now. Host RStudio Server on an Azure instance. Configure the instance to access RStudio with a nice URL. Getting started: Azure is a cloud computing framework provided by Microsoft, the same idea as AWS by Amazon.

Speeding up package installation

Author: Colin Gillespie

Can’t Be Bothered Reading, Tell Me Now. A simple one-line tweak can significantly speed up package installation and updates. The Wonder Of CRAN: One of the best features of R is CRAN. When a package is submitted to CRAN, not only is it checked under three versions of R