R for Big Data
Dealing with big data sets in R can be painful. One small mistake, and a seemingly trivial calculation makes your computer grind to a halt. This one-day intensive training course is a practical introduction to dealing with big data. Unfortunately, there are no easy answers, so we’ll take you through the different strategies you might employ, clearly highlighting the positives and negatives of each. During the day, we’ll cover hardware, programming with Rcpp, out-of-memory datasets and sparklyr.
- Hardware: a brief overview of CPU, memory sizes and RAM. The benefit of switching to the cloud.
- Rcpp: leveraging C++ for slow operations.
- Tips and tricks for visualising big data sets.
- The remainder of the course will consider three classes of data sets:
  - Large in-memory data sets: the dplyr package.
  - Out-of-memory data sets: ff and the bigmemory suite of packages.
  - Distributed data sets: using Spark and sparklyr.
Participants are encouraged to bring their own datasets and associated problems to this event.
By the end of the day participants will be able to…
- understand the different methods of dealing with big data
- understand which method(s) apply to their problem
- apply the chosen method(s) to their own data using R’s interface to C++ (Rcpp), Spark and other tools