R Package Quality: Package Popularity
This is blog two of five:
- R Package Quality: Validation and beyond!
- Package Popularity (this one)
In our previous post, we introduced the four components that make up a litmus package score: documentation, popularity, code quality, and maintenance. In this post, we’ll look at package popularity. Package popularity is an interesting, and sometimes controversial, measure. In our experience it often sparks strong (and usually negative) reactions. The idea is simple: if a package is widely used, bugs are more likely to be found and fixed, and if the maintainer steps away, there’s a higher chance someone else will take over. Of course, high usage doesn’t mean a package is risk-free. But popularity can provide helpful context. Consider this example:
- {pkgA}: Extremely popular and a dependency for many other packages.
- {pkgB}: Very few downloads and minimal usage.

In a situation like this, {pkgA} may offer more stability over time, simply because more people rely on it. It does not mean that {pkgA} is risk-free, only that the risk is lower than for {pkgB}.
All other things being equal, if you had sixty minutes to assess both packages, would you spend thirty minutes on each, or weight your time towards the less popular package?
It’s important to keep in mind that statistical packages tend to be less popular than “foundational” ones. Packages for tasks like data wrangling, date-times, and plotting are used by nearly everyone, regardless of the use case. In contrast, more specialised packages, for example, those designed to handle experimental designs with drop-outs, naturally have a smaller audience.
So a lower popularity doesn’t necessarily reflect lower quality or usefulness. It may just reflect a more niche purpose.
Score 1: Yearly Downloads
For packages on CRAN, we can obtain download statistics. Of course, the obvious question is, “what is a large number of downloads?” To answer this question, we obtained the download statistics of every package on CRAN, and used that data as the basis of our score.
More precisely, if a package is in the upper quartile for the number of package downloads (approximately 7,000 downloads per year), the package is scored 1. Otherwise, the empirical CDF is used to score.
Of course, you could choose a different time period, say a month, or look at a trend over time. However, our investigations suggest that having a variety of download-based scores adds very little new information, while increasing complexity.
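To make this concrete, here’s a minimal sketch of how a download score could be computed. It assumes the {cranlogs} package for retrieving download counts, and that `all_yearly` (yearly download counts for every CRAN package) has been gathered separately; the `score_downloads()` helper is illustrative, not the exact litmus implementation.

```r
library(cranlogs)

# Yearly downloads for a single package, e.g. {drat}
dl <- cran_downloads("drat",
                     from = as.character(Sys.Date() - 365),
                     to = as.character(Sys.Date() - 1))
yearly <- sum(dl$count)

# all_yearly: yearly download counts for every CRAN package (gathered separately)
score_downloads <- function(yearly, all_yearly) {
  # Packages in the upper quartile score 1; otherwise use the empirical CDF
  if (yearly >= quantile(all_yearly, 0.75)) {
    return(1)
  }
  ecdf(all_yearly)(yearly)
}
```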
Score 2: Reverse Dependencies
We also examine the number of reverse dependencies, that is, how many other packages rely on it. The reasoning is simple: if many packages depend on it, there’s a greater chance that bugs will be spotted and fixed. It also suggests that other developers have reviewed and trusted the package enough to build on top of it.
Similar to package downloads, we used all packages on CRAN as a basis for scoring. Packages in the top quartile for reverse dependencies receive a score of 1. All others are scored using the empirical cumulative distribution function (CDF). In practice, this ends up behaving like a near-binary score, since only a small number of packages have significant reverse dependencies.
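As a rough illustration, reverse dependencies can be counted with base R’s `tools::package_dependencies()`. The scoring helper below mirrors the downloads score and is a sketch rather than the exact litmus code; `all_counts` is assumed to hold reverse-dependency counts for every CRAN package.

```r
# CRAN package database (metadata for all packages)
db <- available.packages(repos = "https://cloud.r-project.org")

# Packages that depend on, import, or link to {tibble}
rev_deps <- tools::package_dependencies(
  "tibble", db = db, reverse = TRUE,
  which = c("Depends", "Imports", "LinkingTo")
)
n_rev <- length(rev_deps[["tibble"]])

# all_counts: reverse-dependency counts for every CRAN package
score_rev_deps <- function(n, all_counts) {
  # Top quartile scores 1; everything else falls back to the empirical CDF
  if (n >= quantile(all_counts, 0.75)) {
    return(1)
  }
  ecdf(all_counts)(n)
}
```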
Examples
We’ve selected five packages to illustrate these scores - the total litmus score is given in brackets:
- {drat} (0.94): A fantastic little package that simplifies creating local R repositories.
- {microbenchmark} (0.87): A useful utility package for (precisely) measuring function calls in R.
- {shinyjs} (0.90): Perform common, useful JavaScript operations in Shiny apps, created by Dean Attali.
- {tibble} (0.81): The cornerstone(?) of the tidyverse.
- {tsibble} (0.80): Tibbles for time series.
All five packages, as we would expect, have a high overall litmus score; we didn’t want to pick on more risky packages!
For package popularity, which makes up 15% of the total litmus score, all five selected packages score the maximum of 1 for both downloads and reverse dependencies.
Potentially, we could change the score to make it a more “continuous” measure. For example, the number of downloads for {tibble} is always greater than for {tsibble}, since the latter depends on the former. However, the purpose of assessing packages isn’t to produce a ranked list; it’s to identify packages that are potentially risky. So a more continuous measure isn’t that helpful.
Summary
We tend to think of package popularity as a way of crowdsourcing information about the package of interest. As we’ve mentioned, it’s only a signal, and as such it contributes only 15% of the overall litmus score.
