Text Manipulation with the Tidyverse

Course Level: Intermediate

Having trouble handling text data in R? If so, this course is for you! One of the main problems data scientists face when importing data into R is inconsistency within the raw data. For example, cells may have trailing whitespace, or names might not be in title case. We will cover the {stringr} package, which can be used to solve these problems. We will also explore how to insert R objects into strings using {glue} and how to mine text using {tidytext}.


Course Outline

  • {glue}: Inserting R objects into strings using glue()
  • {stringr}: Manipulating strings
  • Regular expressions with {stringr}: Matching complex patterns within strings
  • {tidytext}: A collection of methods for text mining

Learning Outcomes

By the end of the day, participants will be able to…

  • write dynamic code that inserts the values of R objects into strings
  • understand the challenges of working with strings, and their solutions
  • understand the types of problems regular expressions can help with
  • apply text mining techniques to extract information from textual data


Prior Knowledge

This course assumes basic familiarity with R and the {tidyverse}. Attending our Data Wrangling in the Tidyverse course will give you more than enough of the prerequisite knowledge for this course!

