Training Course Details

Mastering the Tidyverse (Data Carpentry)

Course Level: Foundation

The tidyverse is essential for any statistician or data scientist who deals with data on a day-to-day basis. By focusing on small key tasks, the tidyverse suite of packages removes the pain of data manipulation. The tidyverse allows you to:

  • Import data from databases and data sources with ease
  • Remove the pain of data cleaning
  • Understand your data by transforming, visualising and modelling it
  • Communicate your findings throughout your organisation securely and simply with apps, documents or plots
  • Make business decisions based on accurate data

This training course covers key aspects of the tidyverse, including dplyr, lubridate, tidyr and tibbles.

Edinburgh, UK | November 1, 2019

Venue Details: Edinburgh, CodeClan
Date: November 1, 2019
Time: 9.00 am - 5.00 pm
Duration: 1 day

Course Details

Course Structure

This course assumes familiarity with the concepts covered in the Introduction to R course. Topics that appear in both courses are covered in more depth here.

Introduction to the tidyverse

  • What is the tidyverse?

dplyr: the workhorse of the tidyverse

Before the first coffee break, we’ll tackle the dplyr package. This package forms the foundation of the tidyverse by providing a standardised data manipulation grammar.

  • What is dplyr?
  • The grammar of tidyverse functions
  • filter() and summarise(), with a review of Boolean logic for subsetting where needed
  • The pipe operator %>% and chaining functions into a workflow
  • Other useful dplyr functions such as group_by()
  • Joins for dealing with data split across multiple data frames
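
To give a flavour of this section, here is a minimal sketch of a dplyr workflow that chains filter(), a join, group_by() and summarise() with the pipe. The orders and customers tables, and their column names, are hypothetical data made up for illustration; they are not taken from the course materials.

```r
library(dplyr)

# Hypothetical example data: orders and a customer lookup table
orders <- tibble(
  customer_id = c(1, 1, 2, 3, 3, 3),
  amount      = c(20, 35, 15, 50, 5, 10)
)
customers <- tibble(
  customer_id = c(1, 2, 3),
  region      = c("North", "South", "North")
)

# Keep orders of at least 10, attach each customer's region,
# then total the amounts and count the orders per region
region_totals <- orders %>%
  filter(amount >= 10) %>%
  left_join(customers, by = "customer_id") %>%
  group_by(region) %>%
  summarise(total = sum(amount), n_orders = n())

print(region_totals)
```

Each step takes a data frame and returns a data frame, which is what lets the pipe read as a sequence of verbs from top to bottom.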

Tidy data

Your data should be tidy. An obvious statement, but what do we mean by tidy? This section explains what tidy data is and how to make it part of your workflow.

  • Tidy data
    • What is tidy data?
  • Using tidyr
    • spread() and gather() for reshaping data
    • separate() and unite() for splitting one column into several, or the reverse
    • Dealing with missing values
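
As a rough illustration of the tidyr functions listed above, the sketch below reshapes a small table with gather() and spread(), splits and recombines a column with separate() and unite(), and drops rows with missing values using drop_na(). The wide and codes tables are hypothetical, and drop_na() is just one of several ways to handle missing values.

```r
library(tibble)
library(tidyr)

# Hypothetical wide table: one column per year, with one missing value
wide <- tibble(
  site   = c("A", "B"),
  `2018` = c(10, 7),
  `2019` = c(12, NA)
)

# gather(): collect the year columns into key/value pairs (wide to long)
long <- gather(wide, key = "year", value = "count", -site)

# spread(): the reverse, one column per year again (long to wide)
wide_again <- spread(long, key = year, value = count)

# separate(): split one column into several; unite(): the reverse
codes  <- tibble(code = c("A-2018", "B-2019"))
parts  <- separate(codes, code, into = c("site", "year"), sep = "-")
joined <- unite(parts, "code", site, year, sep = "-")

# drop_na(): remove rows where count is missing
complete <- drop_na(long, count)
```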

Data Input/Output

In order to manipulate data, we need to be able to load data into R. We’ll cover the key packages and provide advice as required.

  • Data storage: practical advice for managing data
  • Tidyverse packages
    • readr and readxl for dealing with .csv and .xls/.xlsx files
    • Database connections
  • Non-tidyverse packages
    • Not all data sets can be loaded using tidyverse packages
    • foreign package for reading data from other statistical systems (SAS, SPSS, Minitab)
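
For a sense of what reading and writing flat files looks like, here is a small readr sketch that writes a data frame to a temporary .csv file and reads it back with explicit column types. The scores table is hypothetical, and the temporary path simply stands in for a real data file.

```r
library(readr)

# Hypothetical data written to a temporary file, standing in for a real .csv
path   <- tempfile(fileext = ".csv")
scores <- data.frame(name = c("Ann", "Bob"), score = c(91.5, 78))
write_csv(scores, path)

# Read it back, stating the column types rather than letting readr guess
loaded <- read_csv(
  path,
  col_types = cols(name = col_character(), score = col_double())
)

print(loaded)
```

Stating col_types up front is optional, but it makes the import fail loudly if the file does not have the shape you expect.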

Data manipulation

We’ll finish the day by looking at common difficulties that crop up in a data scientist’s daily work.

  • Dates/times with the lubridate package
  • Efficient string concatenation with glue
  • String manipulation with stringr
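
The three packages above can be sketched together in a few lines: lubridate parses and shifts a date, stringr cleans up a string, and glue builds a message from both. The values used here are illustrative only.

```r
library(lubridate)
library(glue)
library(stringr)

# lubridate: parse an ISO date string, then do date arithmetic
course_date  <- ymd("2019-11-01")
course_month <- month(course_date)
follow_up    <- course_date + days(30)

# stringr: trim surrounding whitespace and convert to upper case
venue <- str_to_upper(str_trim("  Edinburgh "))

# glue: interpolate R values straight into a string
message_text <- glue("Course at {venue} on {course_date}")

print(message_text)
```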

Materials