Training Course Details

R for Big Data

Course Level: Advanced

Dealing with big data sets in R can be painful. One small mistake, and a seemingly trivial calculation makes our computer grind to a halt. This training course is a one-day, intensive, practical introduction to dealing with big data. Unfortunately, there are no easy answers, so we’ll take you through the different strategies you might employ, clearly highlighting the positives and negatives of each. During the day, we’ll cover hardware, programming with Rcpp, out-of-memory data sets and sparklyr.

No Events Currently Scheduled

Sorry, there are no upcoming events for this course, but please get in touch if you would like to be kept informed when events are scheduled in the future.

Course Details

Course Outline

  • Hardware: a brief overview of CPUs and memory (RAM) sizes, and the benefits of switching to the cloud.
  • Rcpp: leveraging C++ for slow operations.
  • Tips and tricks for visualising big data sets.
  • The remainder of the course will consider three classes of data sets:
    • Large in-memory data sets: the dplyr package.
    • Out-of-memory data sets: ff and the bigmemory suite of packages.
    • Distributed datasets: using Spark and sparklyr.

Participants are encouraged to bring their own datasets and associated problems to this event.

Learning Outcomes

By the end of the day participants will be able to…

  • understand the different methods of dealing with big data
  • understand which method(s) apply to their problem
  • apply the chosen method(s) to their own data using Rcpp (R’s interface to C++), Spark and other tools

Prior Knowledge

It is expected that participants have previous R experience. In particular, they should be familiar with the topics in the introduction to R and introduction to programming with R courses.