Vetiver: Monitoring Models in Production

Author: Myles Mitchell

Published: October 31, 2024

tags: r, vetiver, machine-learning, production, mlops

This post is the third in our series of blogs on MLOps with vetiver:

  • Part 1: Vetiver: First steps in MLOps
  • Part 2: Vetiver: Model Deployment
  • Part 3: Vetiver: Monitoring Models in Production (this post)
  • Part 4: Vetiver: MLOps for Python

In Parts 1 and 2, we introduced the {vetiver} package and its use as a tool for streamlined MLOps. Using the {palmerpenguins} dataset as an example, we outlined the steps of training a model with {tidymodels} and then converting it into a {vetiver} model. We then demonstrated how to version our trained model and deploy it into production.

Getting your first model into production is great! But it’s really only the beginning, as you will now have to carefully monitor it over time to ensure that it continues to perform as expected on the latest data. Thankfully, {vetiver} comes with a suite of functions for this exact purpose!


Preparing the data

A crucial step in the monitoring process is the introduction of a time component. We will be tracking key scoring metrics over time as new data are collected, so our analysis will now depend on a time dimension even if our deployed model has no explicit time dependence.

To demonstrate the monitoring steps, we will be working with the World Health Organisation Life Expectancy data, which tracks the average life expectancy in various countries over a number of years. We start by loading the data:

download.file("https://www.kaggle.com/api/v1/datasets/download/kumarajarshi/life-expectancy-who",
              "archive.zip")
unzip("archive.zip")
life_expectancy = readr::read_csv("./Life Expectancy Data.csv")

We will attempt to predict the life expectancy using the percentage expenditure, total expenditure, population, body-mass-index (BMI) and schooling. Let’s select the columns of interest, tidy up the variable names and drop any missing values:

life_expectancy = life_expectancy |>
  janitor::clean_names(case = "snake",
                       abbreviations = c("BMI")) |>
  dplyr::select("year", "life_expectancy",
                "percentage_expenditure",
                "total_expenditure", "population",
                "bmi", "schooling") |>
  tidyr::drop_na()

life_expectancy
#> # A tibble: 2,111 × 7
#>     year life_expectancy percentage_expenditure total_expenditure population
#>    <dbl>           <dbl>                  <dbl>             <dbl>      <dbl>
#>  1  2015            65                    71.3               8.16   33736494
#>  2  2014            59.9                  73.5               8.18     327582
#>  3  2013            59.9                  73.2               8.13   31731688
#>  4  2012            59.5                  78.2               8.52    3696958
#>  5  2011            59.2                   7.10              7.87    2978599
#>  6  2010            58.8                  79.7               9.2     2883167
#>  7  2009            58.6                  56.8               9.42     284331
#>  8  2008            58.1                  25.9               8.33    2729431
#>  9  2007            57.5                  10.9               6.73   26616792
#> 10  2006            57.3                  17.2               7.43    2589345
#> # ℹ 2,101 more rows
#> # ℹ 2 more variables: bmi <dbl>, schooling <dbl>

The data contains a numeric year column which will come in handy for monitoring the model performance over time. However, the {vetiver} monitoring functions require this column to use <date> ("YYYY-MM-DD") formatting, and the data will need to be sorted in ascending date order:

life_expectancy = life_expectancy |>
  dplyr::mutate(
    year = lubridate::ymd(year, truncated = 2L)
  ) |>
  dplyr::arrange(year)

life_expectancy
#> # A tibble: 2,111 × 7
#>    year       life_expectancy percentage_expenditure total_expenditure
#>    <date>               <dbl>                  <dbl>             <dbl>
#>  1 2000-01-01            54.8                  10.4               8.2 
#>  2 2000-01-01            72.6                  91.7               6.26
#>  3 2000-01-01            71.3                 154.                3.49
#>  4 2000-01-01            45.3                  15.9               2.79
#>  5 2000-01-01            74.1                1349.                9.21
#>  6 2000-01-01            72                    32.8               6.25
#>  7 2000-01-01            79.5                 347.                8.8 
#>  8 2000-01-01            78.1                3557.                1.6 
#>  9 2000-01-01            66.6                  35.1               4.67
#> 10 2000-01-01            65.3                   3.70              2.33
#> # ℹ 2,101 more rows
#> # ℹ 3 more variables: population <dbl>, bmi <dbl>, schooling <dbl>

Finally, let’s imagine the year is currently 2002, so our historical training data should only cover the years 2000 to 2002:

historic_life_expectancy = life_expectancy |>
  dplyr::filter(year <= "2002-01-01")

Later in this post we will check how our model performs on more recent data to illustrate the effects of model drift.

Training our model

Before we start training our model, we should split the data into “train” and “test” sets:

library("tidymodels")

data_split = rsample::initial_split(
  historic_life_expectancy,
  prop = 0.7
)
train_data = rsample::training(data_split)
test_data = rsample::testing(data_split)

The test set makes up 30% of the original data and will be used to score the model on unseen data following training.

The code cell below handles the steps of setting up a trained model in {vetiver} and versioning it using {pins}. For a more detailed explanation of what this code is doing, we refer the reader back to Part 1.

We will again use a basic K-nearest-neighbour model, although this time we have set up the workflow as a regression model since we are predicting a continuous quantity. Note that this requires the {kknn} package to be installed.

# Train the model with {tidymodels}
model = recipe(
  life_expectancy ~ percentage_expenditure +
    total_expenditure + population + bmi + schooling,
  data = train_data
) |>
  workflow(nearest_neighbor(mode = "regression")) |>
  fit(train_data)

# Convert to a {vetiver} model
v_model = vetiver::vetiver_model(
  model,
  model_name = "k-nn",
  description = "life-expectancy"
)

# Store the model using {pins}
model_board = pins::board_temp(versioned = TRUE)
vetiver::vetiver_pin_write(model_board, v_model)

Here the model {pins} board is created using pins::board_temp(), which generates a temporary local folder.
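A temporary board is fine for a demo, but it disappears at the end of your R session. In production you would normally point at a shared, persistent board instead. As a minimal sketch (the folder path here is purely illustrative), a versioned folder board could be created with:

# A persistent alternative to board_temp(): a versioned folder board
# (the path is illustrative; a networked drive or a Posit Connect board
# would be more typical in production)
model_board = pins::board_folder("~/pins/life-expectancy",
                                 versioned = TRUE)
vetiver::vetiver_pin_write(model_board, v_model)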

At this point we should check how our model performs on the unseen test data. The mean absolute error (mae), root-mean-squared error (rmse) and R-squared (rsq) can be computed over a specified time period using vetiver::vetiver_compute_metrics():

metrics = augment(v_model, new_data = test_data) |>
  vetiver::vetiver_compute_metrics(
    date_var = year,
    period = "year",
    truth = life_expectancy,
    estimate = .pred
  )

metrics
#> # A tibble: 9 × 5
#>   .index        .n .metric .estimator .estimate
#>   <date>     <int> <chr>   <chr>          <dbl>
#> 1 2000-01-01    46 rmse    standard       4.06 
#> 2 2000-01-01    46 rsq     standard       0.836
#> 3 2000-01-01    46 mae     standard       3.05 
#> 4 2001-01-01    44 rmse    standard       4.61 
#> 5 2001-01-01    44 rsq     standard       0.844
#> 6 2001-01-01    44 mae     standard       3.43 
#> 7 2002-01-01    36 rmse    standard       4.14 
#> 8 2002-01-01    36 rsq     standard       0.853
#> 9 2002-01-01    36 mae     standard       3.04

The first line of code here sends new data (in this case the unseen test data) to our model and generates a .pred column containing the model predictions. This output is then piped to vetiver::vetiver_compute_metrics() which includes the following arguments:

  • date_var: the name of the date column to use for monitoring the model performance over time.
  • period: the period ("hour", "day", "week", etc) over which the scoring metrics should be computed. We are restricted by our data to using "year"; for more granular data it may be more sensible to monitor the model over shorter timescales.
  • truth: the actual values of the target variable (in our example this is the life_expectancy column of the test data).
  • estimate: the predictions of the target variable to compare the actual values against (in our example this is the .pred column computed in the previous step).

We will come back to these metrics later in this post, so for now let’s store them along with our model using {pins}:

pins::pin_write(model_board, metrics, "k-nn")

We will skip over the details of deploying our model since this is already covered in Part 2.
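That said, a minimal deployment sketch with {plumber} might look something like the following (the port number is arbitrary):

library(plumber)

# Serve the versioned {vetiver} model as a REST API (see Part 2 for details)
pr() |>
  vetiver::vetiver_api(v_model) |>
  pr_run(port = 8080)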

Monitoring our model

Over time we may notice our model starting to drift: its predictions gradually diverge from the truth as the data evolves. There are two common causes of this:

  • Data drift: the statistical distribution of an input variable changes.
  • Concept drift: the relationship between the target and an input variable changes.

Taking the example of life expectancy data:

  • A country’s expenditure is expected to vary over time due to changes in government policy and unexpected events like pandemics and economic crashes. This is data drift.
  • Advances in medicine may mean that life expectancy can improve even if BMI remains unchanged. This is concept drift.

Going back to our model which was trained using data from 2000 to 2002, let’s now check how it would perform on “future” data up to 2010:

# Generate "new" data from 2003 to 2010
new_life_expectancy = life_expectancy |>
  dplyr::filter(year > "2002-01-01" &
                  year <= "2010-01-01")

# Score the model performance on the new data
new_metrics = augment(v_model, new_data = new_life_expectancy) |>
  vetiver::vetiver_compute_metrics(
    date_var = year,
    period = "year",
    truth = life_expectancy,
    estimate = .pred
  )

new_metrics
#> # A tibble: 24 × 5
#>    .index        .n .metric .estimator .estimate
#>    <date>     <int> <chr>   <chr>          <dbl>
#>  1 2003-01-01   141 rmse    standard       5.21 
#>  2 2003-01-01   141 rsq     standard       0.760
#>  3 2003-01-01   141 mae     standard       3.64 
#>  4 2004-01-01   141 rmse    standard       5.14 
#>  5 2004-01-01   141 rsq     standard       0.761
#>  6 2004-01-01   141 mae     standard       3.60 
#>  7 2005-01-01   141 rmse    standard       5.83 
#>  8 2005-01-01   141 rsq     standard       0.684
#>  9 2005-01-01   141 mae     standard       4.19 
#> 10 2006-01-01   141 rmse    standard       6.23 
#> # ℹ 14 more rows

Let’s now store the new metrics in the model {pins} board (along with the original metrics):

vetiver::vetiver_pin_metrics(
  model_board,
  new_metrics,
  "k-nn"
)

We can now load both the original and new metrics then visualise these with vetiver::vetiver_plot_metrics():

# Load the metrics
monitoring_metrics = pins::pin_read(model_board, "k-nn")

# Plot the metrics
vetiver::vetiver_plot_metrics(monitoring_metrics) +
  scale_size(name = "Number of\nobservations", range = c(2, 4)) +
  theme_minimal()
Figure: a line plot showing the evolution of the mean absolute error, root-mean-squared error and R-squared of the trained life expectancy model between the years 2000 and 2010. Both error metrics increase over time, while R-squared decreases.

The size of the data points represents the number of observations used to compute the metrics at each period. Up to 2002 we are using the unseen test data to score our model; after this we are using the full available data set.

We observe the model error increasing over time, suggesting that the deployed model should be retrained on more recent data rather than left running on a model fitted back in 2002. For this particular data set it would be sensible to retrain and redeploy the model annually.
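As a rough sketch of what that annual retrain could look like (imagining it is now early 2011, and reusing the same recipe and model specification from earlier), we might refit on a recent window of data and pin the result as a new version:

# Refit the model on a recent window of data (window chosen for illustration)
latest_data = life_expectancy |>
  dplyr::filter(year > "2007-01-01" &
                  year <= "2010-01-01")

new_model = recipe(
  life_expectancy ~ percentage_expenditure +
    total_expenditure + population + bmi + schooling,
  data = latest_data
) |>
  workflow(nearest_neighbor(mode = "regression")) |>
  fit(latest_data)

# Pin the refit as a new version of the model
v_model_new = vetiver::vetiver_model(
  new_model,
  model_name = "k-nn",
  description = "life-expectancy"
)
vetiver::vetiver_pin_write(model_board, v_model_new)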

Summary

In this blog we have introduced the idea of monitoring models in production using the Vetiver framework. Using the life expectancy data from the World Health Organisation as an example, we have outlined how to track key model metrics over time and identify model drift.

As you start to retire your old models and replace these with new models trained on the latest data, make sure to keep ALL of your models (old and new) versioned and stored. That way you can retrieve any historical model and establish why it gave a particular prediction on a particular date.
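With {pins} this is straightforward: every version written to the board remains available and can be read back at any time. A minimal sketch, assuming the model has been pinned under the name "k-nn" as above (the version string is a placeholder; use one of the values returned by pin_versions()):

# List every stored version of the "k-nn" pin
pins::pin_versions(model_board, "k-nn")

# Read a specific historical version back
# (replace the version string with one listed above)
old_model = vetiver::vetiver_pin_read(model_board, "k-nn",
                                      version = "20241031T120000Z-1a2b3")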

The {vetiver} framework also includes an R Markdown template for creating a model monitoring dashboard. For more on this, check out the {vetiver} documentation.
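Assuming the {rmarkdown} package is installed, we believe the template can be drafted along these lines (check the {vetiver} documentation for the exact template name and requirements):

# Draft the monitoring dashboard template that ships with {vetiver}
# (template name as documented at the time of writing)
rmarkdown::draft("monitoring-dashboard.Rmd",
                 template = "vetiver_dashboard",
                 package = "vetiver")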

The next post in our Vetiver series will provide an outline of the Python framework. Stay tuned for that sometime in the new year!

