class: center, middle, inverse, title-slide

# Efficient R Programming
## Turning it up to 11
### Colin Gillespie (@csgillespie)
### jumpingrivers.com/t/2017strata/

---
background-image: url(graphic/doll.jpg)
background-size: cover

# About

.pull-right[
* Academic:
    * Senior Statistics lecturer, Newcastle University
* Consultant at [Jumping Rivers](https://www.jumpingrivers.com)
    * Data science training & consultancy
    * R, Stan, Scala
* [Efficient R programming](http://shop.oreilly.com/product/0636920047995.do), O'Reilly
]

---
background-image: url(graphic/black_logo.png)
background-size: cover

# Assumptions on your background

* You've used R
    * General data structures, basic functions

---
background-image: url(graphic/black_logo.png)
background-size: cover

# Benchmarking

* Benchmark competing solutions
    - Remember, we need multiple _samples_: timings vary from run to run
* The __microbenchmark__ package makes this easy

--

## Matrix vs data frame

```r
library("microbenchmark")
# mat is a numeric matrix; df holds the same data as a data frame
microbenchmark(times = 100, mat[1, ], df[1, ])
#> Unit: milliseconds
#>      expr   min    lq mean median    uq max neval
#>  mat[1, ] 0.003 0.004 0.08  0.006 0.006 7.1   100
#>   df[1, ] 0.800 0.800 1.00  0.800 0.900 8.3   100
```

---

## The difference: data frame & matrix

![](graphic/combined.svg)

---
class: center, middle, inverse
background-image: url(graphic/white_logo.png)
background-size: cover

# Start-up files: .Rprofile & .Renviron

---
background-image: url(graphic/monkey.jpg)
background-size: cover

# .Rprofile: read when R starts

1. A site-wide configuration file (`Rprofile.site`)
    - Not widely used
2. A project-specific `.Rprofile` in the current working directory
3. Otherwise, the `.Rprofile` in your home area

--

### Pro-tip

Make sure R's idea of your home area is sensible:

```r
Sys.getenv("HOME")
#> [1] "/home/ncsg3"
```

---
background-image: url(graphic/confused.jpg)
class: center

# What should you put in your .Rprofile?

---
background-image: url(graphic/confused.jpg)
class: center

# What should you put in your .Rprofile?

### R code

---

# Examples: Set a CRAN mirror

```r
local({
  r = getOption("repos")
  r["CRAN"] = "https://cran.rstudio.com/"
  options(repos = r)
})
```

--

* Do you want an internal repository?
    * Add the repository here!

--

```r
local({
  r = getOption("repos")
  r["CRAN"] = "https://cran.rstudio.com/"
  # Hypothetical in-house server: repos entries must be full URLs
  r["INTERNAL"] = "https://internal.server.com/"
  options(repos = r)
})
```

---

# Add useful functions

```r
# Tighter margins, shorter ticks and horizontal axis labels for base graphics;
# any extra par() arguments are passed through `...`
nice_par = function(mar = c(3, 3, 2, 1), mgp = c(2, 0.4, 0), tck = -0.01,
                    cex.axis = 0.9, las = 1, mfrow = c(1, 1), ...) {
  par(mar = mar, mgp = mgp, tck = tck, cex.axis = cex.axis,
      las = las, mfrow = mfrow, ...)
}
```

---

# Nicer base graphics

.pull-left[
```r
plot(1:10, 1:10)
```

![](index_files/figure-html/unnamed-chunk-6-1.png)
]

.pull-right[
```r
nice_par()
plot(1:10, 1:10)
```

![](index_files/figure-html/unnamed-chunk-7-1.png)
]

---
class: inverse, center, middle
background-image: url(graphic/white_logo.png)
background-size: cover

# What about the .Renviron file?

---

# Store system variables

- `R_LIBS=~/R/library`
    - `install.packages()` installs into the first library on `.libPaths()`, which will be this directory (it must already exist)

--

- `TMPDIR=/not/a/network/drive`
    - R creates temporary files while it runs (see `tempdir()`). On my work machine, the default location is a network drive.

--

- `R_COMPILE_PKGS=3`
    - Byte compiles packages when they are installed (we'll come back to this)

--

- `R_LIBS_SITE=/usr/lib/R/site-library:/usr/lib/R/library`
    - Explicitly state where to look for packages

--

- API keys
    - Keep them out of your scripts and read them with `Sys.getenv()`

---
background-image: url(graphic/blas.jpg)

# Basic Linear Algebra Subprograms (BLAS)

* Matrix operations are performed by the BLAS library
* By switching to a different BLAS library, it may be possible to speed up your R code
    - Easy on Linux
    - A bit tricky for Windows users

---
class: middle

![](graphic/blas-bench.jpg)

Source: [benchmarkme](https://cran.r-project.org/web/packages/benchmarkme/) package

---

# Byte compiling

```r
mean
```

```
## function (x, ...) 
## UseMethod("mean")
## <bytecode: 0x4b44628>
## <environment: namespace:base>
```

The `<bytecode>` line shows that `mean()` has been byte compiled.

---

# The mean function

```r
# A hand-written mean: loop over x, adding x[i] / n at each step
mean_r = function(x) {
  m = 0
  n = length(x)
  for (i in seq_len(n))
    m = m + x[i] / n
  m
}
mean_r(rnorm(10))
```

```
## [1] -0.3369
```

---
class: middle

![](graphic/byte-bench.png)

---
background-image: url(graphic/bite.jpg)
class: center

# How do you byte (compile)?

---

# Byte compiling

* Fewer than 4% of CRAN packages are byte compiled
    - Set by the `ByteCompile` field in the DESCRIPTION file
* Automatically byte compile packages with `R_COMPILE_PKGS=3`
    - May need Rtools for some packages (Windows)
* You can also byte compile individual packages, via the `--byte-compile` option of `R CMD INSTALL`

---
background-image: url(graphic/broken.jpg)

# Inconvenient data

--

> Data not big enough to boast about

--

> CSV files

---

# Storing tabular data

* `read.csv()`; also __data.table__ (`fread()`) & __readr__ (`read_csv()`)
* RDS (`readRDS()`, `saveRDS()`)
    - R binary format
* __feather__

---

![](graphic/file.png)

---
background-image: url(graphic/white_logo.png)
background-size: cover
class: center, middle, inverse

# What about C++?
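
---

# The R baselines to beat

Before reaching for C++, it helps to time the R versions. A minimal sketch, assuming the __microbenchmark__ package is installed (the timing step is skipped otherwise): it re-uses `mean_r()` from the earlier slide and byte compiles it with `compiler::cmpfun()`; `mean_r_cmp` is a hypothetical name.

```r
library("compiler")
# mean_r(): the loop-based mean from the earlier slide
mean_r = function(x) {
  m = 0
  n = length(x)
  for (i in seq_len(n))
    m = m + x[i] / n
  m
}
# Byte compile a single function (mean_r_cmp is a hypothetical name)
mean_r_cmp = cmpfun(mean_r)

set.seed(1)
x = runif(1e5)
# Print a timing table only if microbenchmark is installed
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(times = 100,
                                       mean_r(x), mean_r_cmp(x), mean(x)))
}
```

Since R 3.4.0 the just-in-time compiler byte compiles functions like `mean_r()` automatically after a call or two, so on a recent R the first two timings may be close.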

---
class: inverse
background-image: url(graphic/R_vs_C.png)
background-size: cover

---
background-image: url(graphic/white_logo.png)
background-size: cover
class: center, middle, inverse

# Rcpp: the best of both worlds

---

![](index_files/figure-html/unnamed-chunk-10-1.svg)

---

# Mean C++ function

```cpp
double mean_cpp(NumericVector x) {
  int i;
  int n = x.size();
  double mean = 0;

  for(i = 0; i < n; i++) {
    mean = mean + x[i] / n;
  }
  return mean;
}
```

---

# mean_cpp.cpp

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double mean_cpp(NumericVector x) {
  int i;
  int n = x.size();
  double mean = 0;

  for(i = 0; i < n; i++) {
    mean = mean + x[i] / n;
  }
  return mean;
}
```

---
class: middle, inverse
background-image: url(graphic/white_logo.png)
background-size: cover

```r
library("Rcpp")
# Compile the file and make mean_cpp() available in R
sourceCpp("src/mean_cpp.cpp")
```

---

![](index_files/figure-html/unnamed-chunk-14-1.svg)

---
background-image: url(graphic/black_logo.png)
background-size: cover

# Sugared versions

* Sweeten C++ code
* Write C++ that looks like R code!

```cpp
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector res_sugar(NumericVector x, NumericVector y) {
  return pow(x - y, 2);
}
```

--

* Sugared versions typically aren't faster than pure C++
* But there's not much difference

---
class: middle, inverse
background-image: url(graphic/white_logo.png)
background-size: cover

# Summary

* R's strength is the vast number of R packages available
* R's weakness is that it's easy to write very slow R code
* But we can avoid the obvious pitfalls

## More details

* [Efficient R programming](http://shop.oreilly.com/product/0636920047995.do), O'Reilly
* Meet the Expert today @ 2:05 pm
* Courses: [jumpingrivers.com](https://www.jumpingrivers.com)
* 10% discount on upcoming courses: mention `strata17`, valid until July 2017

---

## Picture credits

* https://unsplash.com/collections/593236/confused
* https://unsplash.com/search/blas?photo=9UZGlklg8JE
* https://burst.shopify.com/photos/hot-fresh-donut
* http://www.gratisography.com/