Styling Base R Graphics


Publication-quality base R graphics

Base R graphics get a bad press (although, to be fair, their default values could have been chosen better). They are generally viewed as a throwback to the dawn of the R era, and I think most people would agree that there are better graphics tools in R (e.g. ggplot2). However, it is occasionally worthwhile making a plot using base R graphics: for example, if you are writing a publication and want to be sure the graphics will still be reproducible in five years.

In this post we’ll discuss methods for dramatically altering the look and feel of a base R plot. With a bit (OK, a lot) of effort, it is possible to change every aspect of the plot to your liking.

Typically I detest the iris data set. It’s perhaps the most overused data set in the entire R world. For this very reason, we’ll use it in this post to show what’s possible 😉

The standard base R scatter plot is

## Default scatter plot, coloured by species
plot(iris$Sepal.Length, iris$Sepal.Width, col = iris$Species)
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 21)

This gives a simple scatter plot with an associated legend, using the default colour scheme. The list of things wrong with this plot is fairly lengthy, and includes (but is not limited to)

  • Colours
  • Margins
  • Axis labels
  • Overlapping points
  • Wasted space

However, with base R graphics we can fix all of these faults!

Fixing the problem

What’s not clear in the scatter plot above is that some points lie on top of each other. So the first step is to wiggle the points slightly, adding a little random noise with the jitter() function, so that overlapping points become visible.

## Add a little random noise (similar in spirit to ggplot2's geom_jitter())
iris$Sepal.Length = jitter(iris$Sepal.Length)
iris$Sepal.Width = jitter(iris$Sepal.Width)
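Because jitter() draws random noise, the plot will look slightly different every time the code is run, which sits a little awkwardly with the reproducibility argument for base graphics. A minimal sketch, assuming you are happy to fix the random seed with set.seed() (the helper name jitter_sepals() and the seed value are illustrative, not from the original post):

```r
## Hypothetical helper: jitter Sepal.Length with a fixed seed
jitter_sepals = function(seed) {
  set.seed(seed)  # fix the random number stream so the noise is repeatable
  jitter(datasets::iris$Sepal.Length)
}

a = jitter_sepals(2018)
b = jitter_sepals(2018)
identical(a, b)  # TRUE: same seed, same noise

## With the default amount = NULL, jitter() adds noise of at most
## factor/5 times the smallest gap between distinct values. Sepal.Length is
## recorded to 0.1, so each point moves by less than about 0.02.
max(abs(a - datasets::iris$Sepal.Length))
```

Any fixed seed works; the point is simply that the same seed always gives the same jitter.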

Next we select nicer colours (I’ve taken this palette from the great I want hue website). The palette() function allows you to change, globally, the colour palette used by base R plots.

alpha = 150 # Alpha channel on a 0-255 scale: semi-transparent points
palette(c(rgb(200, 79, 178, alpha = alpha, maxColorValue = 255), 
          rgb(105, 147, 45, alpha = alpha, maxColorValue = 255),
          rgb(85, 130, 169, alpha = alpha, maxColorValue = 255)))
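Passing a factor such as iris$Species as a colour works because base graphics use its integer codes (1, 2, 3) to index the current palette. The sketch below shows that mapping, and the usual idiom of restoring the previous palette afterwards; the variable names are illustrative.

```r
## The three semi-transparent colours from the post
pal = c(rgb(200, 79, 178, alpha = 150, maxColorValue = 255),
        rgb(105, 147, 45, alpha = 150, maxColorValue = 255),
        rgb(85, 130, 169, alpha = 150, maxColorValue = 255))
pal[1]  # "#C84FB296": red, green, blue and alpha as two-digit hex

## Species is a factor; its integer codes pick the palette entry per point
species = datasets::iris$Species
point_cols = pal[as.integer(species)]
unique(point_cols)  # setosa, versicolor, virginica colours, in level order

## palette() invisibly returns the palette that was in effect,
## so it can be restored once the plot is finished
old_pal = palette(pal)
## ... plotting code using col = species or bg = species ...
palette(old_pal)
```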

Next we alter a few plot characteristics with the par() function

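par(mar = c(3, 3, 2, 1), # Margin sizes in lines: bottom, left, top, right
    mgp = c(2, 0.4, 0), # Margin lines for axis title, axis labels, axis line
    las = 1, # Always-horizontal axis labels (rotates y-axis text)
    tck = -.01, # Short tick marks pointing outwards
    xaxs = "i", yaxs = "i") # Remove the padding around the axis limits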
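par() invisibly returns the previous values of any settings it changes, so a common idiom is to save them and restore them when you are done. The sketch below (variable names are illustrative) also checks what xaxs = "i" actually does: with the default style "r", R extends the axis range by 4% at each end, whereas "i" uses the limits exactly. plot.new() and plot.window() set up the coordinate system without drawing anything.

```r
## par() returns the old values of the settings it changes (invisibly)
op = par(mar = c(3, 3, 2, 1), mgp = c(2, 0.4, 0), las = 1,
         tck = -0.01, xaxs = "i", yaxs = "i")

## With xaxs/yaxs = "i", the user coordinates are exactly xlim and ylim
plot.new()
plot.window(xlim = c(4, 8), ylim = c(2, 4.5))
usr_internal = par("usr")  # 4, 8, 2, 4.5

## The default style "r" pads each end by 4% of the range
par(xaxs = "r", yaxs = "r")
plot.new()
plot.window(xlim = c(4, 8), ylim = c(2, 4.5))
usr_regular = par("usr")   # 3.84, 8.16, 1.9, 4.6

## Restore everything changed by the first par() call
par(op)
```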

Then comes the plot() call itself, which is now a lot more complicated, with a number of extra arguments

plot(iris$Sepal.Length, iris$Sepal.Width, 
     bg = iris$Species, # Fill colour, taken from the palette by factor code
     pch = 21, # Shape: circles that can be filled
     xlab = "Sepal Length", ylab = "Sepal Width", # Labels
     axes = FALSE, # Don't plot the axes
     frame.plot = FALSE, # Remove the frame 
     xlim = c(4, 8), ylim = c(2, 4.5), # Limits
     panel.first = abline(h = seq(2, 4.5, 0.5), col = "grey80")) # Gridlines drawn beneath the points

then add in the x-axis labels with mtext()

at = pretty(iris$Sepal.Length)
mtext(side = 1, text = at, at = at, 
      col = "grey20", line = 1, cex = 0.9)

and the y-axis labels

at = pretty(iris$Sepal.Width)
mtext(side = 2, text = at, at = at, col = "grey20", line = 1, cex = 0.9)
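The post draws the axis labels with mtext(). An alternative, shown here only as a sketch (the helper draw_axis_labels() is hypothetical, not from the post), is axis() with tick = FALSE, which suppresses both the axis line and the tick marks and leaves only the labels; their distance from the plot is then governed by the second value of mgp rather than by mtext()'s line argument. In both versions, pretty() picks "round" positions whose range covers the data.

```r
## Hypothetical helper: label-only axis at pretty() positions
draw_axis_labels = function(side, values, ...) {
  at = pretty(values)  # round numbers spanning the data
  axis(side, at = at, labels = at, tick = FALSE,  # no axis line, no ticks
       col.axis = "grey20", cex.axis = 0.9, ...)
  invisible(at)
}

plot(datasets::iris$Sepal.Length, datasets::iris$Sepal.Width,
     axes = FALSE, xlab = "Sepal Length", ylab = "Sepal Width")
x_at = draw_axis_labels(1, datasets::iris$Sepal.Length)
y_at = draw_axis_labels(2, datasets::iris$Sepal.Width, las = 1)
x_at
y_at
```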

This just leaves the legend. Instead of using the legend() function, we’ll label each species directly, placing its name next to its points with the text() function

text(5, 4.2, "setosa", col = rgb(200, 79, 178, maxColorValue = 255))
text(5.3, 2.1, "versicolor", col = rgb(105, 147, 45, maxColorValue = 255))
text(7, 3.7, "virginica", col = rgb(85, 130, 169, maxColorValue = 255))
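The RGB values now appear twice: once with transparency in palette() and once without in text(). One way to keep them in sync, sketched here as an option rather than the post's method, is to define each colour once and derive the transparent versions with grDevices::adjustcolor(), whose alpha.f argument multiplies the alpha channel. The object names are illustrative.

```r
## Each species colour defined once, opaque (RGB values from the post)
species_cols = c(setosa     = rgb(200, 79, 178, maxColorValue = 255),
                 versicolor = rgb(105, 147, 45, maxColorValue = 255),
                 virginica  = rgb(85, 130, 169, maxColorValue = 255))

## adjustcolor() multiplies the alpha channel by alpha.f;
## 150/255 reproduces the alpha = 150 used for the points
point_cols = adjustcolor(species_cols, alpha.f = 150 / 255)
point_cols  # same RGB hex codes, each followed by "96" (alpha 150)

palette(point_cols)  # points: bg = iris$Species picks these by factor code

## Labels reuse the opaque versions, e.g.
## text(5, 4.2, "setosa", col = species_cols[["setosa"]])
```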

Finally, we add the plot title

title("The infamous IRIS data", adj = 1, 
      cex.main = 0.8, font.main = 2, col.main = "black")

Putting it all together gives a much better-looking plot.
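For convenience, here is one way to gather the steps above into a single function and write the result to a PDF file, as you might for a publication figure. This is a sketch rather than the post's exact script: the function name styled_iris_plot(), the fixed seed, the file name and the PDF size are assumptions, and the function restores the par() and palette() settings on exit so that later plots are unaffected.

```r
## Hypothetical wrapper around the steps in this post
styled_iris_plot = function(data = datasets::iris, seed = 1) {
  ## Assumption: fix the seed so the jittered points are repeatable
  set.seed(seed)
  data$Sepal.Length = jitter(data$Sepal.Length)
  data$Sepal.Width = jitter(data$Sepal.Width)

  ## Opaque colours for the labels, semi-transparent ones for the points
  cols = c(setosa     = rgb(200, 79, 178, maxColorValue = 255),
           versicolor = rgb(105, 147, 45, maxColorValue = 255),
           virginica  = rgb(85, 130, 169, maxColorValue = 255))
  old_pal = palette(adjustcolor(cols, alpha.f = 150 / 255))
  old_par = par(mar = c(3, 3, 2, 1), mgp = c(2, 0.4, 0), las = 1,
                tck = -0.01, xaxs = "i", yaxs = "i")
  ## Put the global settings back however the function exits
  on.exit({
    par(old_par)
    palette(old_pal)
  })

  plot(data$Sepal.Length, data$Sepal.Width,
       bg = data$Species, pch = 21,
       xlab = "Sepal Length", ylab = "Sepal Width",
       axes = FALSE, frame.plot = FALSE,
       xlim = c(4, 8), ylim = c(2, 4.5),
       panel.first = abline(h = seq(2, 4.5, 0.5), col = "grey80"))

  ## Axis labels only (no axis lines or ticks)
  at = pretty(data$Sepal.Length)
  mtext(side = 1, text = at, at = at, col = "grey20", line = 1, cex = 0.9)
  at = pretty(data$Sepal.Width)
  mtext(side = 2, text = at, at = at, col = "grey20", line = 1, cex = 0.9)

  ## Direct labels instead of a legend
  text(5, 4.2, "setosa", col = cols[["setosa"]])
  text(5.3, 2.1, "versicolor", col = cols[["versicolor"]])
  text(7, 3.7, "virginica", col = cols[["virginica"]])

  title("The infamous IRIS data", adj = 1,
        cex.main = 0.8, font.main = 2, col.main = "black")
  invisible(NULL)
}

## Example: write the figure to a PDF (file name and size are arbitrary)
out_file = file.path(tempdir(), "iris-base-r.pdf")
pdf(out_file, width = 5, height = 4)
styled_iris_plot()
invisible(dev.off())
```

Swapping pdf() for another base device such as png() should work the same way, since the function only draws on whatever device is currently open.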

Why not use ggplot2 (or something else)?

This seems like a lot of work to create a simple scatter plot. Why not use X, Y or ggplot2? We even have a course on ggplot2, so we’re not biased. The purpose of this article isn’t to get into a religious visualisation war on base R vs … However, if you want such a war, have a look at the blog posts by Flowing Data, Jeff Leek and David Robinson.

One point worth making is that, since we are only using base R functions, our plot will almost certainly remain reproducible in future versions of R. That is not something to dismiss lightly.


8 thoughts on “Styling Base R Graphics”

  1. Nice example of creative plotting. I like base R graphics because of its flexibility and the straightforward steps needed to get you where you want to go, especially for graphics that are not run-of-the-mill. I’ve found that it can do some things I’d spend much more time attempting in ggplot2. Recently, I decided to port a plot I’d made in base R to ggplot2, just to see how it might be done. I spent an inordinate amount of time typing/tweaking to get it just right; it took me longer than the original plot did.

    • Frankly/bluntly: neither cherry-picking examples nor basing time estimates or the definition of “complexity” on one’s “comfort level”/knowledge base are exemplary characteristics of sincere “data scientists”.

  2. Great post. The base graphic at the end looks awesome. ggplot2 users (I use it as well) will say you can do that in one line (aes(x=.,y=.,group=.))… Yeah, but the graph still looks terrible, just as bad as base in my opinion. In fact, ggplot2 graphs look really bad if you don’t spend a lot of time customizing. Ever seen a default histogram plot from ggplot2? The power of social marketing is strong. I guess that’s why Lattice faded away (I don’t think Bill Cleveland is on Twitter).

    • Thanks for the comments. I plan to do a similar post using ggplot2 in the next few days. If you fancy doing one on lattice, it would make a nice trio (I never got my head around lattice).

      • Thanks, but I don’t think I’m the man to do lattice justice either. Applied Predictive Modeling (Max Kuhn and Kjell Johnson) is full of beautiful graphs done in lattice (sadly, I could not find the source code online to reproduce these graphs). Max recently joined RStudio and switched (forced?) to ggplot2 in blog posts, where the graphs are not nearly as attractive as in the book. Perhaps that is because he was a master at lattice and doesn’t have the same experience with ggplot2 (just my guess).

  3. The GIANT difference is that one plotting system is based on a visualization grammar while the other is based on … well … “random acts of interface kindness”. One can make pretty darn good-looking Excel charts. That doesn’t mean one _should_. The one area where I’ll grant _some_ slack to base graphics is speed. `grid` introduces some overhead which can make it slower for more complex plots, but, unlike the random `plot()` inheritance foibles, ggplot has a foundation based on a consistent idiom. That oft gets overlooked in these comparisons. It’s not just about “pretty”.

    • So I tried to write this post avoiding ggplot2 wars (I failed, sorry). Anyway, I more or less agree with your points, but obviously not with everything 😉 I’m sure you’d agree that while plot() is faster, for most tasks the ease of use of ggplot2 outweighs this saving. Also, the comment about comparisons and pretty is unfair, since
        • I wasn’t comparing anything!
        • All other comparisons use pretty as a hammer over plot()
      For me the big consideration is reproducibility. For example, I can reproduce every plot from my 2002 thesis perfectly (admittedly they are ugly). I’m fairly confident that in 16 years’ time, I’ll still be able to run this code. To stress the point, I emailed code to reproduce the graphics from a paper I published in 2005. I think this is a fairly common occurrence for most academics.
      This morning I used your excellent R package (hrbrthemes) to produce a graph for a paper. It looks amazing and only used a few lines of consistent code. What is the likelihood of this code working in 16 or 32 years? Would it be worth me spending 15 minutes to redo your plot using base graphics, to avoid future pain?

  4. I prefer the first graphic, because I can distinguish versicolor and virginica better than in the second plot! Base R is not that bad; there are too many ggplot2 fanboys. 😉
