Machine Learning Powered Naughty List: A Festive Jumping Rivers Story

Author: Amieroh Abrahams

Published: December 18, 2025

tags: r, ml, christmas

Introduction

Ho ho ho! 🎅 The holiday season is here, and at Jumping Rivers, we’re decking the halls with data, not just tinsel. While elves are busy checking their lists twice, we thought: why not bring a little machine learning magic to Christmas? After all, what’s more festive than combining predictive modeling with candy canes, cookies, and a sprinkle of office mischief?

This blog is your all-access pass to a code-powered journey where we find out who’s been naughty, who’s nice, and who’s just mischievously hovering in between.

We’ll walk you through the process step by step: gathering the team data, inventing the most festive features, training our ML model, and revealing the results with a cheeky, holiday twist. So grab a mug of cocoa, put on your favorite Christmas socks, and let’s dive into the Jumping Rivers ML-Powered Naughty List adventure!

Note: All data, labels, and results in this post are entirely fictional and randomly generated for festive fun.

Step 1: Data Collection and Team Introduction

Our first step was gathering our dataset. We used the Jumping Rivers team as the participants, assigning playful, holiday-themed features to reflect their potential ‘naughty’ traits.

Each participant is assigned four playful features that represent holiday mischief:

  • Ate too many cookies 🍪
  • Forgot to send Christmas cards 💌
  • Sang off-key during carols 🎶
  • Gift wrapping disasters 🎁

Every name on this list is now in the running for the ultimate festive title: Naughty, Nice, or Mildly Mischievous. Rumor has it that Santa’s Intern Elf already claimed the top spot for cookie mischief, while Rudolph keeps dashboards squeaky clean, and Frosty the Snow Analyst is maintaining a perfectly balanced winter score.

Step 2: Feature Engineering

For ML purposes, names were encoded numerically. An arbitrary numeric ID like this carries no real signal, so it would not be a meaningful feature in a real-world ML context, but it serves as a demonstration of preprocessing. The features for modeling include:

  • Name (encoded)
  • Ate too many cookies
  • Forgot to send Christmas cards
  • Sang off-key
  • Gift wrapping disasters
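
To see what this encoding actually does, here is a minimal sketch using three names from the team. The vector `example_names` is illustrative only. By default `factor()` sorts its levels alphabetically, so the integer codes reflect alphabetical order rather than anything about behaviour.

```r
# Illustrative subset of names (hypothetical vector, not the full team)
example_names = c("Theo Roe", "Colin Gillespie", "Russ Hyde")

# factor() orders its levels alphabetically;
# as.numeric() then returns each name's position among those levels
name_factor = factor(example_names)
name_encoded = as.numeric(name_factor)

levels(name_factor)
# "Colin Gillespie" "Russ Hyde" "Theo Roe"
name_encoded
# 3 1 2
```

The codes would change if a name were added or removed, which is another reason to treat this column as a demonstration rather than a genuine predictor.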

Step 3: Model Training

We chose a Random Forest classifier in R for its simplicity and interpretability. The model was trained on the dataset to predict the ‘naughty’ label based on the four behavioral features and the encoded name. Although the dataset is small and playful, this demonstrates a proper ML workflow: data collection, preprocessing, model training, and prediction.

library(tidyverse)
library(randomForest)
library(ggplot2)

The first thing we need to do is set up a vector containing the team members, along with some Christmas temp workers: Santa’s Intern Elf, Rudolph the Data Reindeer, and Frosty the Snow Analyst.

# Team members
team = c(
  "Esther Gillespie",
  "Colin Gillespie",
  "Sebastian Mellor",
  "Martin Smith",
  "Richard Brown",
  "Shane Halloran",
  "Mitchell Oliver",
  "Keith Newman",
  "Russ Hyde",
  "Gigi Kenneth",
  "Pedro Silva",
  "Carolyn Wilson",
  "Myles Mitchell",
  "Theo Roe",
  "Tim Brock",
  "Osheen MacOscar",
  "Emily Wales",
  "Amieroh Abrahams",
  "Deborah Washington",
  "Susan Smith",
  "Santa's Intern Elf",
  "Rudolph the Data Reindeer",
  "Frosty the Snow Analyst"
)

Now that we have the team members, we will randomly generate some values for the model features.

# Randomly generate playful 'naughty traits'
set.seed(51)
df = tibble(
  name = team,
  ate_too_many_cookies = sample(0:1, length(team), replace = TRUE),
  forgot_to_send_cards = sample(0:1, length(team), replace = TRUE),
  sang_off_key = sample(0:1, length(team), replace = TRUE),
  wrapping_disaster = sample(0:1, length(team), replace = TRUE),
  naughty = sample(0:1, length(team), replace = TRUE)
)


# Encode names as numeric
df$name_encoded = as.numeric(factor(df$name))
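
The call to `set.seed()` is what makes these "random" traits reproducible from one run to the next. As a small sketch (the helper `draw_traits` is our own illustration, not part of the original code), resetting the seed before sampling gives identical draws each time:

```r
# Hypothetical helper: draw n binary traits after fixing the seed
draw_traits = function(n, seed) {
  set.seed(seed)
  sample(0:1, n, replace = TRUE)
}

first_draw = draw_traits(23, seed = 51)
second_draw = draw_traits(23, seed = 51)

# Same seed, same draws
identical(first_draw, second_draw)

# Count how many 0s and 1s were drawn
table(first_draw)
```

Checking the counts with `table()` is a quick way to confirm that both classes are present before training a classifier on such a small sample.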

Next on the list is to set up a vector of features we want to use, and then train the model. We can then use the model to predict our fictitious naughtiness score for each team member! We can see Theo is at the top of the list, closely followed by Osheen.

features = c(
  "name_encoded",
  "ate_too_many_cookies",
  "forgot_to_send_cards",
  "sang_off_key",
  "wrapping_disaster"
)


# Train Random Forest
rf_model = randomForest(x = df[, features],
                        y = as.factor(df$naughty),
                        ntree = 100)


# Predict naughtiness
df$predicted_naughty = predict(rf_model, df[, features])
df$naughtiness_score = predict(rf_model, df[, features], 
                                type = "prob")[, 2]


# Create the Naughty List
naughty_list = df %>% 
  arrange(desc(naughtiness_score)) %>% 
  select(name, naughtiness_score, predicted_naughty)

print(naughty_list)
## # A tibble: 23 × 3
##    name               naughtiness_score predicted_naughty
##    <chr>                          <dbl> <fct>            
##  1 Theo Roe                        0.76 1                
##  2 Osheen MacOscar                 0.74 1                
##  3 Myles Mitchell                  0.72 1                
##  4 Esther Gillespie                0.68 1                
##  5 Deborah Washington              0.66 1                
##  6 Tim Brock                       0.59 1                
##  7 Amieroh Abrahams                0.55 1                
##  8 Santa's Intern Elf              0.48 0                
##  9 Carolyn Wilson                  0.38 0                
## 10 Susan Smith                     0.2  0                
## # ℹ 13 more rows
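
One caveat: the scores above come from predicting on the same rows the forest was trained on, so they tend to look more confident than they would on unseen data. Here is a minimal sketch of an alternative, assuming the {randomForest} package is installed. Calling `predict()` without new data returns out-of-bag predictions, where each row is scored only by trees that did not see it during training. The data frame `toy` and the vector `labels` are regenerated random data for illustration, not the `df` used in this post.

```r
library(randomForest)

# Illustrative random data (hypothetical; mirrors the structure of df)
set.seed(51)
n = 23
toy = data.frame(
  ate_too_many_cookies = sample(0:1, n, replace = TRUE),
  forgot_to_send_cards = sample(0:1, n, replace = TRUE),
  sang_off_key = sample(0:1, n, replace = TRUE),
  wrapping_disaster = sample(0:1, n, replace = TRUE)
)
labels = factor(sample(0:1, n, replace = TRUE), levels = c(0, 1))

rf = randomForest(x = toy, y = labels, ntree = 100)

# In-sample predictions: scored on the training rows themselves
in_sample = predict(rf, toy)

# Out-of-bag predictions: omit newdata entirely
oob = predict(rf)

# Compare the two accuracy figures
mean(in_sample == labels)
mean(oob == labels, na.rm = TRUE)
```

With randomly generated labels there is nothing real to learn, so the out-of-bag figure would be expected to sit near chance even when the in-sample figure looks better.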

The last thing to do is visualise our results with {ggplot2}:

# Fun bar plot
ggplot(naughty_list, 
       aes(x = reorder(name, naughtiness_score), 
           y = naughtiness_score, 
           fill = as.factor(predicted_naughty))) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = c("0" = "forestgreen",
                               "1" = "darkred"), 
                    labels = c("Nice", "Naughty")) +
  labs(title = "🎅 Jumping Rivers ML-powered Naughty List 🎄",
       x = "Team Member", 
       y = "Naughtiness Score", 
       fill = "Status",
       alt = "Jumping Rivers Naughty List") +
  theme_minimal(base_family = "outfit")

Figure: {ggplot2} column chart showing the Jumping Rivers Naughty List.

Step 4: Analysis and Notes

After generating predictions, we can interpret the Naughty List. The highest naughtiness scores indicate which participants are most mischievous according to our playful model.

Observations from this analysis include:

  • Cookie Enthusiasts: Participants flagged for eating too many cookies scored higher.
  • Gift Wrapping Chaos: Those whose presents looked like abstract art contributed to higher scores.
  • Musical Mishaps: Off-key carolers were highlighted as naughty.
  • Forgotten Cards: Small lapses in festive correspondence nudged some up the naughty rankings.
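
Observations like these can be checked against the model itself. A hedged sketch, assuming {randomForest} is installed and using regenerated illustrative data (`toy` and `labels`, not the `df` from this post): `importance()` reports the mean decrease in Gini impurity for each feature. Because the labels are random, any ranking it produces is noise and should not be read as evidence.

```r
library(randomForest)

# Illustrative random data (hypothetical; mirrors the structure of df)
set.seed(51)
n = 23
toy = data.frame(
  ate_too_many_cookies = sample(0:1, n, replace = TRUE),
  forgot_to_send_cards = sample(0:1, n, replace = TRUE),
  sang_off_key = sample(0:1, n, replace = TRUE),
  wrapping_disaster = sample(0:1, n, replace = TRUE)
)
labels = factor(sample(0:1, n, replace = TRUE), levels = c(0, 1))

rf = randomForest(x = toy, y = labels, ntree = 100)

# One row per feature, with a MeanDecreaseGini column
imp = importance(rf)

# Sort features from most to least important
imp[order(imp[, "MeanDecreaseGini"], decreasing = TRUE), , drop = FALSE]
```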

Special mentions:

  • Theo unsurprisingly tops the naughty list.
  • Santa’s Intern Elf performed well, staying mostly nice.
  • Shane had the lowest naughtiness score, and I’m sure Santa will be very nice to him this year!

This analysis provides both a technical demonstration of ML workflow and a fun story that engages readers during the festive season.

Step 5: Conclusion

This project demonstrates how machine learning can be used in creative ways outside of traditional business use cases. By combining playful features with a proper ML workflow, we created a light-hearted, festive story suitable for a blog, while also reinforcing good practices in data collection, preprocessing, modeling, and visualization.

Ultimately, the Jumping Rivers ML-Powered Naughty List is a celebration of data science, team culture, and holiday fun. Whether you’re naughty or nice, we hope this inspires creative applications of ML in festive contexts.

