Vetiver: First steps in MLOps

Author: Colin Gillespie

Published: June 13, 2024

tags: r, vetiver, machine-learning, production, mlops

This is Part 1 of a series of blogs on {vetiver}. Future blogs will be linked here as they are released.

  • Part 1: Vetiver: First steps in MLOps (This post)
  • Part 2: Vetiver: Model Deployment
  • Part 3: Vetiver: Monitoring Models in Production
  • Part 4: Vetiver: MLOps for Python

Most R users are familiar with the classic workflow popularised by R for Data Science. Data scientists begin by importing and cleaning the data, then iteratively transform, model, and visualise it. Visualisation drives the modelling process, which in turn prompts new visualisations, and periodically, they summarise their work and report results.

Figure: the traditional data science workflow. Stages are import, tidy, then a transform, visualise, model loop, then communicate.

This workflow stems partly from classical statistical modelling, where we are interested in a limited number of models and in understanding the system behind the data. In contrast, machine learning prioritises prediction, which means many candidate models must be considered and regularly updated. Machine Learning Operations (MLOps) expands the modelling component of the traditional data science workflow, providing a framework to continuously build, deploy, and maintain machine learning models in production.

Figure: the machine learning cycle. Stages are import + tidy, model, version, deploy, and monitor, looping back round to import + tidy. Version, deploy, and monitor are all gathered under the {vetiver} logo.

Data: Importing and Tidying

The first step in deploying your model is automating data importation and tidying. Although this step is a standard part of the data science workflow, a few considerations are worth highlighting.

File formats: Consider moving from large CSV files to a more efficient format like Parquet, which reduces storage costs and simplifies the tidying step.
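As a sketch of what that migration might look like, using the {arrow} package (the file names here are hypothetical):

```r
library("arrow")
# One-off conversion from a hypothetical CSV to Parquet
penguins_raw = read.csv("penguins.csv")
write_parquet(penguins_raw, "penguins.parquet")

# Later imports are faster, smaller on disk, and preserve column types
penguins_raw = read_parquet("penguins.parquet")
```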

Moving to packages: As your analysis matures, consider creating an R package to encourage proper documentation, testing, and dependency management.

Tidying & cleaning: With your code in a package and tests in place, optimise bottlenecks to improve efficiency.

Versioning data: Ensure reproducibility by including timestamps in your database queries or otherwise ensuring you can retrieve the same dataset in the future.
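One lightweight approach is to parameterise your queries by a snapshot timestamp. A sketch, assuming a hypothetical SQLite database whose penguins table has a created_at column:

```r
library("DBI")
con = dbConnect(RSQLite::SQLite(), "penguins.db")  # hypothetical database
snapshot = "2024-06-01 00:00:00"
# Only rows that existed at the snapshot time, so the query is repeatable
penguins_v1 = dbGetQuery(
  con,
  "SELECT * FROM penguins WHERE created_at <= ?",
  params = list(snapshot)
)
dbDisconnect(con)
```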

Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, Jumping Rivers can help.

Modelling

This post isn’t focused on modelling frameworks, so for brevity we’ll use {tidymodels} and the {palmerpenguins} dataset.

library("palmerpenguins")
library("tidymodels")
# Remove missing values
penguins_data = tidyr::drop_na(penguins, flipper_length_mm)

We aim to predict penguin species using island, flipper_length_mm, and body_mass_g. A scatter plot indicates this should be feasible.

Figure: body mass (g) against flipper length (mm), with penguin species shown by colour and island by shape. There is a visible split between the Gentoo penguins and the others, with Gentoo being larger on both measures.

The scatter plot points to an obvious separation between Gentoo and the other species, but pulling apart Adelie and Chinstrap looks a little more tricky.

Modelling-wise, we’ll again keep things simple: a straightforward nearest-neighbour model, where we use the island, flipper length, and body mass to predict species type:

model = recipe(species ~ island + flipper_length_mm + body_mass_g, 
               data = penguins_data) |>
  workflow(nearest_neighbor(mode = "classification")) |> 
  fit(penguins_data) 

The model object can now be used to predict species. Reusing the same data as before, we have an accuracy of around 95%. Note this is accuracy on the training data, so it will be an optimistic estimate.

model_pred = predict(model, penguins_data)
mean(model_pred$.pred_class == as.character(penguins_data$species))
#> [1] 0.9474
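The same figure can be computed with {yardstick} (loaded as part of {tidymodels}), which also gives easy access to other classification metrics:

```r
# Combine the truth and predictions, then compute accuracy
results = dplyr::bind_cols(penguins_data, model_pred)
yardstick::accuracy(results, truth = species, estimate = .pred_class)
```

This should match the manual calculation above.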

Vetiver Model

Now that we have a model, we can start with MLOps and {vetiver}. First, collate all the necessary information to store, deploy, and version the model.

v_model = vetiver::vetiver_model(model, 
                           model_name = "k-nn", 
                           description = "blog-test")
v_model
#> 
#> ── k-nn ─ <bundled_workflow> model for deployment 
#> blog-test using 3 features

The v_model object is a list with six elements, including our description.

names(v_model)
#> [1] "model"       "model_name"  "description" "metadata"    "prototype"  
#> [6] "versioned"
v_model$description
#> [1] "blog-test"

The metadata contains various model-related components.

v_model$metadata
#> $user
#> list()
#> 
#> $version
#> NULL
#> 
#> $url
#> NULL
#> 
#> $required_pkgs
#> [1] "kknn"      "parsnip"   "recipes"   "workflows"

Storing your Model

To deploy a {vetiver} model object, we use a pin from the {pins} package. A pin is simply an R (or Python!) object that is stored for reuse at a later date. The most common use case of the {pins} package (at least for me) is caching data for a Shiny application or Quarto document.
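For example, caching an arbitrary object looks something like this (a temporary board is used for illustration; a real project would use a persistent board):

```r
library("pins")
board = board_temp()  # throwaway board, purely for illustration
pin_write(board, mtcars, name = "cars-data")
cached = pin_read(board, "cars-data")
```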

However, we can pin any R object, including a pre-built model. We pin objects to “boards”, which can live in many places, including Azure, Google Drive, or a simple S3 bucket. For this example, I’m using Posit Connect:

vetiver::vetiver_pin_write(board = pins::board_connect(), v_model)

To retrieve the object, use:

# Not something you would normally do with a {vetiver} model
pins::pin_read(pins::board_connect(), "colin/k-nn")
#> $model
#> bundled workflow object.
#> 
#> $prototype
#> # A tibble: 0 × 3
#> # ℹ 3 variables: island <fct>, flipper_length_mm <int>, body_mass_g <int>

Deploying as an API

The final step is to construct an API around your stored model. This is achieved using the {plumber} package. To deploy locally (i.e. on your own computer), we create a plumber instance and pass it the model using {vetiver}:

plumber::pr() |>
  vetiver::vetiver_api(v_model) |>
  plumber::pr_run()

This deploys the API locally. When you run the code, a browser window will likely open. If it doesn’t, simply navigate to http://127.0.0.1:7764/__docs__/ (the port number may differ on your machine).

If the API has successfully deployed, then

base_url = "http://127.0.0.1:7764/"
url = paste0(base_url, "ping")
r = httr::GET(url)
metadata = httr::content(r, as = "text", encoding = "UTF-8")
jsonlite::fromJSON(metadata)

should return

#$status
#[1] "online"
#
#$time
#[1] "2024-05-27 17:15:39"

The API also has endpoints metadata and pin-url, allowing you to programmatically query the model. The key endpoint for MLOps is predict. This endpoint allows you to pass new data to your model and predict the outcome:

url = paste0(base_url, "predict")
endpoint = vetiver::vetiver_endpoint(url)
pred_data = penguins_data |>
  dplyr::select("island", "flipper_length_mm", "body_mass_g") |>
  dplyr::slice_sample(n = 10)
predict(endpoint, pred_data)
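The other endpoints can be queried in the same way. For instance, assuming the API is still running locally, something like the following should return the model metadata we saw earlier:

```r
# Query the metadata endpoint of the locally running API
url = paste0(base_url, "metadata")
r = httr::GET(url)
jsonlite::fromJSON(httr::content(r, as = "text", encoding = "UTF-8"))
```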

Summary

This post introduced MLOps with {vetiver}: importing and tidying data, fitting a model, then versioning, storing, and serving it as an API. In the next post, we’ll discuss deploying models in production.

