Introduction to R
This is a one-day intensive course on R.
This course will be a mixture of lectures and computer
practicals. The main focus will be to introduce fundamental R
concepts.
No prior programming knowledge of any kind is assumed. This
course is suitable for a wide range of applicants e.g.,
biologists, statisticians, engineers, students.
Course outline:
 Introduction to R: A brief overview of the background and features of the R statistical programming system.
 Entering Data: A description of how to import data into R and export data from it.
 Data types: A summary of R's data types.
 R environment: A description of the R environment including the R working directory, creating/using scripts, saving data and results.
 R Graphics: Creating, editing and storing graphics in R.
 Manipulating data in R: Describing how data can be manipulated in R using logical operators.
Course structure
This course will be structured as follows:
 8:30 - 9:00: Registration and coffee
 9:00 - 10:30: Lecture
 10:30 - 11:00: Coffee break
 11:00 - 12:00: Lecture
 12:00 - 1:00: Lunch (not provided)
 1:00 - 2:00: Practical 1
 2:00 - 2:40: Lecture
 2:40 - 3:00: Coffee break
 3:00 - 4:30: Practical 2
These times are intended to give a flavour of how the course
is run and are subject to change.
Comments from previous courses
 Clear explanations; combination of theory and practice is excellent.
 Good pace, good split of practical and lecture.
 Excellent introduction to R!
 Nice friendly environment.
 Almost one to one! Great teaching, good lectures.
Presenter
Dr Colin Gillespie, Statistics Lecturer in the School of
Mathematics & Statistics.
Introduction to statistical modelling in R
This is a one-day intensive course on modelling in R.
This course will be a mixture of lectures and computer
practicals.
Prior knowledge: it will be assumed that participants
are familiar with R, for example inputting data, basic
visualisation and data frames. Attending the Introduction to R
course will provide sufficient background. This course is
suitable for a wide range of applicants e.g., biologists,
statisticians, engineers, students.
Course outline:
 Basic hypothesis testing: examples include the one-sample
t-test, one-sample Wilcoxon signed-rank test, independent
two-sample t-test, Mann-Whitney test, two-sample t-test for paired
samples, Wilcoxon signed-rank test.
 ANOVA tables: 1-way and 2-way tables.
 Simple and multiple linear regression: including model diagnostics.
 Clustering: hierarchical clustering, k-means.
 Principal components analysis: plotting and scaling data.
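As a rough illustration of the topics in this outline, the sketch below runs each kind of analysis with functions from base R on the built-in mtcars and iris data sets. The choice of data sets, variables and number of clusters is an assumption made purely for illustration and is not taken from the course materials.

```r
# Two-sample t-test and Mann-Whitney test: fuel economy by transmission type
tt <- t.test(mpg ~ am, data = mtcars)
mw <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)

# One-way ANOVA table: fuel economy by number of cylinders
one_way <- aov(mpg ~ factor(cyl), data = mtcars)
summary(one_way)

# Multiple linear regression; plot(fit) would show the diagnostic plots
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)

# Hierarchical clustering and k-means on scaled measurements
scaled <- scale(iris[, 1:4])
tree <- hclust(dist(scaled))
set.seed(1)  # k-means starts from random centres
km <- kmeans(scaled, centers = 3)

# Principal components analysis with scaling
pca <- prcomp(iris[, 1:4], scale. = TRUE)
summary(pca)
```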
Course structure
This course will be structured as follows:
 8:30 - 9:00: Registration and coffee
 9:00 - 10:30: Lecture
 10:30 - 10:45: Coffee break
 10:45 - 12:15: Practical 1
 12:15 - 1:15: Lunch (not provided)
 1:15 - 2:40: Lecture
 2:40 - 3:00: Coffee break
 3:00 - 4:30: Practical 2
These times are intended to give a flavour of
how the course is run and are subject to
change.
Comments from previous courses
 The balance between lectures and practicals was good.
 Great help during the practicals.
 High quality lecture materials.
Presenter
Dr Colin Gillespie, Statistics Lecturer in the School of
Mathematics & Statistics.
Programming with R
This is a one-day intensive course on R. The course
will be a mixture of lectures and computer practicals. The main
focus of the course is R programming techniques, such as functions,
for loops and conditional expressions.
The course follows on from the Introduction to R course. It
is assumed that all students have attended this course (or have
equivalent skills). This course is suitable for a wide range of
applicants e.g., biologists, statisticians, engineers,
students.
Course outline
 Vector operations: details of R's vector operations.
 Conditionals: using "if" and "else" statements in R.
 Functions: what a function is, how functions are used,
and how we can construct our own functions.
 Looping in R: an introduction to the concept of looping in R, in particular "for" and "while" loops.
 The apply functions: apply, tapply and other
members of the apply family.
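To give a feel for these topics, here is a minimal sketch that combines a user-defined function, an if/else conditional, a for loop and two members of the apply family. The function classify and the small example vectors are hypothetical, invented for illustration rather than taken from the course notes.

```r
# A hypothetical function that uses if/else to label a single number
classify <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

values <- c(-2, 0, 3)

# Vector operation: arithmetic is applied element by element
doubled <- values * 2

# A for loop that fills a pre-allocated character vector
labels_loop <- character(length(values))
for (i in seq_along(values)) {
  labels_loop[i] <- classify(values[i])
}

# The same labels using sapply, a member of the apply family
labels_apply <- sapply(values, classify)

# tapply: the mean of a measurement within each group
heights <- c(150, 160, 170, 180)
groups <- c("a", "a", "b", "b")
group_means <- tapply(heights, groups, mean)
```

The loop and the sapply call produce the same labels; which one reads better is largely a matter of taste for a vector this small.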
Course structure
This course will be structured as follows:
 8:30 - 9:00: Registration and coffee
 9:00 - 10:30: Lecture
 10:30 - 11:00: Coffee break
 11:00 - 12:00: Practical 1
 12:00 - 1:00: Lunch (not provided)
 1:00 - 2:00: Lecture
 2:00 - 4:30: Practical 2 (with a coffee break)
These times are intended to give a flavour of how the course
is run and are subject to change.
Comments from previous courses
 We started from the beginning and achieved a lot by the
end. I'm not scared of R anymore. It was actually fun!
 You cover all the aspects that we need to learn to get
started.
Presenter
Dr Colin Gillespie, Statistics Lecturer in the School of Mathematics & Statistics.
Advanced programming
This is a two-day intensive course on R. The
course will be a mixture of lectures and computer practicals. The
main focus of the course is advanced R programming techniques,
such as S3/S4 objects, reference classes and function
closures.
The course follows on from the Programming with R course. It
is assumed that all students have attended this course (or have
equivalent skills). This course is suitable for a wide range of
applicants e.g., biologists, statisticians, engineers,
students.
Course outline:

Functions:
 Scoping rules (including lexical scope)
 The ... argument
 Functions as first class objects
 Function closures and mutable state
 Argument matching

Customising your workspace
 The .Rprofile and .Renviron files
 Dealing with errors
 Messages, warnings and errors
 Using try and tryCatch effectively

S3 classes:
 Introduction to object oriented programming
 Constructing S3 objects
 Drawbacks

S4 and reference classes:
 Creating and using S4 and reference classes
 Differences between S3 and S4
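Three of the ideas listed above can be sketched in a few lines of base R: a closure holding mutable state, a small S3 class with its own method, and error handling with tryCatch. All of the names used here (make_counter, new_temperature, describe, safe_log) are hypothetical and chosen for illustration only.

```r
# Closure: the returned function keeps access to `count` between calls
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # modify `count` in the enclosing environment
    count
  }
}
counter <- make_counter()
first <- counter()
second <- counter()

# S3: a constructor, a generic and a method for the new class
new_temperature <- function(value) {
  structure(list(value = value), class = "temperature")
}
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "an ordinary R object"
describe.temperature <- function(x, ...) {
  paste(x$value, "degrees")
}
described <- describe(new_temperature(21))

# tryCatch: turn warnings and errors into a missing value
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) NA_real_,
    error = function(e) NA_real_
  )
}
from_warning <- safe_log(-1)   # log(-1) raises a warning
from_error <- safe_log("a")    # log("a") raises an error
```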
Presenter
Dr Colin Gillespie, Statistics Lecturer in the School of
Mathematics & Statistics.
Advanced graphics
This is a one-day intensive course on
advanced graphics with R. The standard plotting
commands in R are known as the Base graphics. In
this course, we cover more advanced graphics
packages, in particular ggplot2. The ggplot2
package can create very advanced and informative
graphics.
A basic knowledge of R is assumed for this course. In
particular, attendees should be familiar with the topics covered
in the first course. This course will be a mixture of lectures
and computer practicals. The goal is to enable participants to
apply the techniques covered to their own data. This course is
suitable for a wide range of applicants e.g., biologists,
statisticians, engineers, students.
Course outline
 The grammar of graphics
 Mastering the grammar
 Groups, geoms, stats and layers
 Scales, axes and legends
 Facets
Course structure
This course will be structured as follows:
 8:30 - 9:00: Registration and coffee
 9:00 - 9:30: Lecture
 9:30 - 10:30: Practical 1
 10:30 - 11:00: Coffee break
 11:00 - 12:15: Lecture
 12:15 - 1:30: Lunch
 1:30 - 4:30: Practical 2 & Lecture
 2:45 - 3:15: Coffee break
Comments from previous courses
 Very clear lectures and handouts.
 Good overview of the main topics. Also gave advice on how to find out about other features that may be needed above the standards.
 The ability to ask more general questions about our data in the practical.
Presenter
Dr Colin Gillespie, Statistics Lecturer in the School of Mathematics & Statistics.
Efficient R programming
This is a one-day intensive course on
efficient R programming. This course will be a
mixture of lectures and computer practicals. This course is
aimed at anyone who uses R but wants tips and techniques for
speeding up their code.
Prior knowledge: it will be assumed that participants
are familiar with R, for example inputting data, basic
visualisation and data frames. Attending the Introduction to R
course will be sufficient. This course is suitable for a wide range of
applicants e.g., biologists, statisticians, engineers,
students.
Course outline:
 Why is your code slow? Code profiling: which part
of the code should you optimise?
 Efficient data structures: object growth and memory
allocation.
 Avoiding loops: accessing the underlying C code faster.
 Parallel computing: an introduction to multicore
computing.
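The points about object growth and avoiding loops can be illustrated with a small comparison. This is only a sketch: the three functions below are hypothetical examples that compute the same squares in different ways, and the actual timings will depend on the machine and the version of R.

```r
n <- 1e4

# Growing an object: every call to c() copies the whole vector
grow <- function(n) {
  x <- NULL
  for (i in seq_len(n)) x <- c(x, i^2)
  x
}

# Pre-allocating: memory is reserved once and filled in place
prealloc <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) x[i] <- i^2
  x
}

# Vectorised: the loop runs inside R's underlying C code
vectorised <- function(n) seq_len(n)^2

# All three give the same answer; only the speed differs
res_grow <- grow(n)
res_prealloc <- prealloc(n)
res_vec <- vectorised(n)

# Compare elapsed times
system.time(grow(n))
system.time(prealloc(n))
system.time(vectorised(n))
```

For finer-grained measurements, base R's Rprof() can show which lines of a longer script take the most time.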
Course structure
This course will be structured as follows:
 8:30 - 9:00: Registration and coffee
 9:00 - 9:45: Lecture
 9:45 - 10:30: Practical 1
 10:30 - 11:00: Coffee break
 11:00 - 12:00: Lecture
 12:00 - 1:00: Lunch
 1:00 - 2:00: Practical 2
 2:00 - 2:40: Lecture
 2:40 - 3:00: Coffee break
 3:00 - 4:30: Practical 3
These times are intended to give a flavour of how the course
is run and are subject to change.
Presenter
Dr Colin Gillespie, Statistics Lecturer in the School of
Mathematics & Statistics.
Building an R package
This is a one-day intensive course on
building an R package. This course will be a
mixture of lectures and computer practicals. The
main focus will be getting a working R package
ready for distribution. It is assumed that all
applicants have a basic knowledge of R.
Course outline:
Participants can bring their own code or they can use the
provided example code to write a fully functional R package.
 Why create an R package.
 What is in an R package.
 Writing documentation with roxygen.
 Creating packages with RStudio.
 Distributing your package.
Participants will need to bring their own laptop.
Course structure
This course will be structured as follows:
 9:00 - 9:30: Registration and coffee
 9:30 - 12:15: Lecture & practical session
 12:15 - 1:15: Lunch (not provided)
 1:15 - 2:40: Lecture & practical
 2:40 - 3:00: Coffee break
 3:00 - 4:30: Practical session
These times are intended to give a flavour of how the course
is run and are subject to change.
Presenter
Dr Colin Gillespie, Statistics Lecturer in the School of
Mathematics & Statistics.
Five day Bioconductor course
Course outline:
This is a five-day intensive course on R and Bioconductor.
The course will be a mixture
of lectures and computer practicals. The final day provides
participants with an opportunity to analyse their own data.
No prior programming knowledge of any kind is assumed.
Course structure:
This course will be structured as follows:
 Day 1: Introduction to R
 Standard R data types, base graphics, Manipulating data
 Day 2: Bioconductor input/output
 What is Bioconductor
 Installing packages
 Loading Affymetrix and Illumina data into R
 Data quality checks
 Day 3:
 Object oriented programming in R
 Microarray data analysis including Limma, RankProd
 Day 4: Clustering, ArrayExpress, GO stats
 Day 5: RNA-seq and analysis of participants' data
Presenters
Dr Colin Gillespie, Statistics Lecturer in the School of Mathematics & Statistics.
Dr Simon Cockell, Newcastle Bioinformatics Support Unit
Dr Matthew Bashton, Newcastle Bioinformatics Support Unit
Predictive analytics
Course outline:
This is a two-day intensive course on using the R programming
language for predictive analytics. This course will be a mixture of lectures and
computer practicals.
It will be assumed that participants are familiar with R, for
example inputting data, basic visualisation, basic data
structures and use of functions. Attending the Introduction to R
course will provide sufficient background.
Course structure:
This course will be structured as follows:
 Introduction to analytics: a general introduction to
analytics and some of the techniques that are in common
use.
 Simple regression problems: simple and multiple linear
regression and model diagnostics.
 Classification: KNN, clustering, logistic regression,
linear discriminant analysis and associated diagnostics.
 Model selection: various model selection procedures,
subset selection, shrinkage.
 Advanced regression techniques: polynomial regression,
splines, local regression, GAMs, trees and random
forests.
Presenters
Dr Colin Gillespie, Statistics Lecturer in the School of
Mathematics & Statistics.
Mr Jamie Owen, R trainer in the School of Mathematics
& Statistics.
Spatial data analysis with R
Course outline:
As spatial datasets get larger, more sophisticated software
needs to be harnessed for their analysis. R is now a widely used
open source software platform for working with spatial data
thanks to its powerful analysis and visualisation packages. The
first day of the course introduces the basics of how R can be used for spatial
data.
The second day demonstrates the many useful features that are
hidden away in package documentation. For example, get_map and getData,
from the ggmap and raster packages respectively, allow users to
download data from anywhere in the world into R directly.
Participants will be introduced to functionality in R that is
very difficult to achieve in other software, such as the
clustering of points into polygons and geographically weighted
regression. The focus is on the principles rather than the
specific methods, providing participants with the
understanding needed to apply R's powerful suite of
geographical tools to their own problems.
It is expected that participants have basic R experience, e.g.
from attending Course 1, Introduction to R. The course will be
hands-on and applied, with short introductory lectures on each of
the topics, followed by practical sessions loading and analysing
real spatial datasets.
Course structure:
This course will be structured as follows:
 Introducing R as a GIS
 The structure of spatial objects in R
 Loading and interrogating spatial data
 Visualising spatial datasets
 Acquiring external data with R
 Point pattern analysis and spatial interpolation
 Geographical models in R
 Webmaps
Survival analysis using R
Course outline:
This course is a practical introduction to some of the
everyday and more sophisticated tools used for the analysis of
survival data.
It is expected that participants have basic R experience.
The course will be hands-on and applied, with short introductory
lectures on each topic, followed by practical sessions working
with real survival datasets, available within R.
Course structure:
 Introduction to survival data: Kaplan-Meier curves and the log-rank test
 The Cox proportional hazards model: implementation, interpretation and limitations
 Residual analysis, model checking and model comparison
 Time-dependent covariates
 Parametric survival models using the Weibull distribution
 Advanced topics: frailty and joint modelling of
longitudinal and survival data
Participants are encouraged to bring their own datasets and
associated problems to the event.
R for Big Data
Course outline:
This course is a practical introduction to dealing with large
data sets in R. We'll cover hardware, programming with Rcpp,
out-of-memory datasets and SparkR.
It is expected that participants have previous R experience;
in particular, they should be familiar with the topics in the
Programming with R course.
Course structure:
 Hardware: a brief overview of CPU, memory sizes and RAM.
The benefit of switching to the cloud.
 Rcpp: leveraging C++ for slow
operations
 The remainder of the course will consider three classes of data sets:
 Large in-memory data sets: the dplyr package
 Out of memory: ff and the big memory suite of packages
 Distributed data sets: Spark
Participants are encouraged to bring their own datasets and
associated problems to the event.
Presenters
Dr Colin Gillespie, Statistics Lecturer in the School of
Mathematics & Statistics.
Dr Pete Philipson, Northumbria University.
Interactive graphics with Shiny
Shiny is an R package that allows you to create cutting-edge,
interactive web graphics. Regardless of your background, Shiny will
enable you to present your data in new and innovative ways. From the
Shiny documentation: "Shiny makes it incredibly easy to build interactive
web applications with R. Automatic 'reactive' binding between inputs and
outputs and extensive pre-built widgets make it possible to build
beautiful, responsive, and powerful applications with minimal effort."
This course is based on a workshop run by Garrett Grolemund (RStudio) and
Colin Gillespie (Newcastle University) at Strata 2015.
Prerequisite
Participants should have
completed the Automated reporting R course, which covers the fundamentals of markdown and knitr.
Course outline:
 Widgets: HTML widgets
 Introduction to Shiny and HTML: Introduction to the server
and ui files
 Building and deploying apps
 Reactive programming: Creating dynamic graphics.
Course structure
This course will consist of short lectures, followed by short
practical sessions. This course will be structured as follows:
 9:00 - 9:30: Registration and coffee
 9:30 - 10:45: Lecture and practical
 10:45 - 11:00: Coffee
 11:00 - 12:30: Lecture and practical
 12:30 - 1:30: Lunch
 1:30 - 2:30: Lecture and practical
 2:30 - 2:50: Coffee
 2:50 - 4:30: Lecture and practical
These times are intended to give a flavour of how the course
is run and are subject to change.
Introduction to Bayesian inference using RStan
This is an intensive, two-day course introducing the use of RStan for Bayesian computation.
The course will be a mixture of lectures and computer practicals.
The main focus will be on the specification of models using the Stan language
and on the practicalities of generating samples from the posterior distribution and diagnosing convergence.
Prerequisite
Participants should be familiar with basic probability and statistics,
including common distributions and regression. Basic R programming is also required, i.e. writing loops and functions.
We do not expect you to have experience with Bayesian inference or Stan, but some knowledge of the former will be helpful.
Course outline:
 Introduction to Bayesian inference: A brief overview of the main ideas behind Bayesian inference.
 Markov chain Monte Carlo methods: A brief overview of Markov chain Monte Carlo methods for Bayesian computation and Hamiltonian Monte Carlo.
 The Stan language: An outline of the main components of a Stan program.
 Using RStan: A guide to the use of the R interface to Stan.
 Examples: Including linear regression, Poisson regression and hierarchical models.
Course structure
This course will consist of short lectures, followed by short
practical sessions.
Presenter
Dr Sarah Heap, Statistics Lecturer in the School of
Mathematics & Statistics. Expert in Big Data, Bayesian Statistics and Time Series analysis.
Scala for statistical computing and data science
This course is aimed at statisticians and data scientists already familiar with a dynamic programming language (such as R, Python or Octave) who would like to learn how to use Scala.
Scala is a free, modern, powerful, strongly-typed, functional programming language, well-suited to statistical computing and data science applications.
In particular, it is fast and efficient, runs on the Java virtual machine (JVM), and is designed to easily exploit modern multicore and distributed computing architectures.
The course will begin with an introduction to the Scala language and basic concepts of functional programming (FP),
as well as essential Scala tools such as SBT for managing builds and library dependencies.
The course will continue with an overview of the Scala collections library, including parallel collections, and we will see how parallel collections enable trivial parallelisation of many statistical computing algorithms on multicore hardware. We will next survey the wider Scala library ecosystem, paying particular attention to Breeze, the Scala library for scientific computing and numerical linear algebra.
We will see how to exploit non-uniform random number generation and matrix computations in Breeze for statistical applications.
Both maximum-likelihood and simulation-based Bayesian statistical inference algorithms will be considered.
Much of the final day will be dedicated to understanding Apache Spark, the distributed Big Data analytics platform for Scala.
We will understand how Spark relates to the parallel collections we have already examined, and see how it can be used not only for the processing
of very large data sets, but also for the parallel and distributed analysis of large or otherwise computationally-intensive models.
As time permits, we will discuss more advanced FP concepts, such as typeclasses, higher-kinded types, monoids, functors, monads,
applicatives, streams and streaming data, and see how these enable the development of flexible, scalable, generic code in strongly-typed functional languages.
Prerequisite
The course assumes a basic familiarity with essential concepts in statistical computing, as well as some basic programming experience.
It is assumed that participants will be familiar with writing their own functions in a language such as R, including essential control structures such as "for-loops" and "if-statements".
The course is not suitable for people completely new to programming. However, no prior knowledge of Scala or functional programming is assumed.
All participants will be expected to bring their own (multicore) laptop and to have a recent version of Java pre-installed.
Other setup instructions will be provided in advance to registered participants.
Course structure
The course will be delivered through a combination of lectures,
live demos and hands-on practical sessions. For the practical sessions,
participants will be expected to actively engage with the material, run demos, follow examples,
and write code to solve simple problems.
Presenters
The course will be delivered by Prof Darren Wilkinson (Newcastle University, U.K.).
Prof Wilkinson is co-Director of Newcastle's EPSRC Centre for Doctoral Training in Cloud Computing for Big Data.
He is a well-known expert in computational Bayesian statistics and a leading proponent of the use of strongly-typed FP languages (such as Scala) for scalable statistical computing.
Automated reporting (first steps towards Shiny)
Do you want to create interactive documents? Do you want your reports to automatically
update when the data changes? Then this course is for you!
This course is based on a workshop run by Garrett Grolemund (RStudio) and
Colin Gillespie (Newcastle University) at Strata 2015.
Prerequisite
It is expected that participants are already familiar with R. In
particular, they should be familiar with basic data manipulations,
functions, if statements and for loops. These concepts are covered in the
Introduction to R and Programming with R courses.
Course outline:
 R Markdown: Creating documents using Markdown
 knitr: Running dynamic R code
 Widgets: HTML widgets
 Building and deploying apps: Via the web and Dropbox
 LaTeX: A brief introduction to LaTeX for additional styling
Course structure
This course will consist of short lectures, followed by short
practical sessions. This course will be structured as follows:
 9:00 - 9:30: Registration and coffee
 9:30 - 10:45: Lecture and practical
 10:45 - 11:00: Coffee
 11:00 - 12:30: Lecture and practical
 12:30 - 1:30: Lunch
 1:30 - 2:30: Lecture and practical
 2:30 - 2:50: Coffee
 2:50 - 4:30: Lecture and practical
These times are intended to give a flavour of how the course
is run and are subject to change.