R Package Quality: Code Quality
This is part four of a five-part series of related posts on validating R packages. Other posts in the series are:
- Validation Guidelines
- Package Popularity
- Package Documentation
- Code Quality (this post)
- Maintenance
In this post, we’ll take a closer look at code quality and how we can use automated tools
to quickly get a feel for a package.
The obvious package check is R CMD check. Anyone who has created a package is familiar with constantly running R CMD check to ensure that their package is note-, warning- and error-free.
However, that's not the only tool we can draw on. Codebase size, security vulnerabilities and the number of exported functions all hint at the quality of a package.
When validating R packages, code quality contributes around 50% to the total score. Remember to check out our dashboard to get an overview.
Score 1: Passing R CMD check
The bedrock of all good R packages!
Packages are downloaded, installed, and the standard R CMD check is performed. The score starts at 1 (no errors or warnings); each error subtracts 1 and each warning subtracts 0.25, with a floor of 0. Essentially, a single error or four warnings is enough to return the lowest score of 0.
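To make the arithmetic concrete, here is a minimal sketch of the weighting in R. The function name and the clamping are our illustration, not the actual litmus implementation:

```r
# Illustrative sketch of the weighting described above: errors cost 1,
# warnings cost 0.25, and the result is clamped to the range [0, 1]
check_score <- function(n_errors, n_warnings) {
  max(0, 1 - (1 * n_errors + 0.25 * n_warnings))
}
check_score(0, 0) # 1: a clean check
check_score(0, 2) # 0.5: two warnings
check_score(1, 0) # 0: a single error hits the floor
```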
We are working on being more discerning about notes and warnings, but for now it's a relatively simple metric that highlights packages with potential issues.
Score 2: Codebase Size
This score is based on the size of the R codebase, as measured by the number of lines of R code. The general idea is that larger codebases are harder to maintain. Of course, the obvious question is "what counts as a large codebase?"
Instead of coming up with arbitrary numbers, we analysed all packages on CRAN (2025/03). If a package is in the lower quartile for codebase size, the package is scored 1. Otherwise, the empirical CDF is used.
For those who are interested, the largest R package on CRAN had 100,000+ lines of R code!
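As a sketch of how the quartile/ECDF rule could work in R: the code below assumes `cran_loc` is a numeric vector of lines-of-code counts for all CRAN packages, and that the score is the complement of the empirical CDF (our reading, consistent with larger codebases scoring lower):

```r
# Sketch of the lower-quartile + ECDF scoring rule. `cran_loc` is
# assumed to hold R lines-of-code counts for every CRAN package.
score_size <- function(pkg_loc, cran_loc) {
  if (pkg_loc <= quantile(cran_loc, 0.25)) {
    return(1) # lower quartile: automatic top score
  }
  # Larger codebases sit further up the empirical CDF, so we take
  # the complement to give big packages a low score
  1 - ecdf(cran_loc)(pkg_loc)
}
```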
Score 3: Security Vulnerabilities
If a package has a known security vulnerability, it receives a score of 0.
This uses the {oysteR} package to detect issues.
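You can run a similar audit yourself. The sketch below checks your installed packages against the OSS Index database; the column name is taken from our reading of the {oysteR} documentation at the time of writing:

```r
library(oysteR)
# Audit every installed package against the OSS Index database
audit <- audit_installed_r_pkgs()
# Keep only packages with at least one known vulnerability
audit[audit$no_of_vulnerabilities > 0, ]
```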
Score 4: Release
This is a binary score: if the package under assessment is the latest version, it's scored 1; otherwise, a 0 is returned. We did investigate a more sophisticated scoring system based on minor and major releases, but within the R community, semantic versioning isn't consistently followed, so we opted for a simpler rule.
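A minimal sketch of such a check, comparing the installed version of a package against the latest CRAN release:

```r
# Does the installed version match the latest CRAN release?
# Returns 1 for up to date, 0 otherwise.
is_latest <- function(pkg) {
  cran <- utils::available.packages()
  latest <- package_version(cran[pkg, "Version"])
  as.integer(utils::packageVersion(pkg) >= latest)
}
is_latest("drat")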
Score 5: Exported Namespace Size
Score a package based on the number of exported objects. Fewer exported objects mean a smaller risk surface, and bugs are potentially less likely. Similar to codebase size, the question is: what counts as large? Analysing all packages on CRAN gave us suitable cut-offs. If a package is in the lower quartile for the number of exports, it is scored 1. Otherwise, the empirical CDF is used.
Our analysis of CRAN suggests that most packages export relatively few objects. A modest package exporting 11 objects scores 0.5. Exporting around 26 objects reduces this to around 0.25.
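Counting exports is straightforward in base R; for example:

```r
# Number of objects a package exports from its namespace
n_exports <- length(getNamespaceExports("tsibble"))
n_exports
# The same lower-quartile / ECDF rule as the codebase sketch above
# can then be applied, using export counts across CRAN as reference
```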
Score 6: Unit Test Coverage
Score based on the fraction of lines of code covered by a unit test. For validation of packages in the pharmaceutical sector, we also provide additional unit tests (remediated code coverage) and investigate exported function test coverage.
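The {covr} package provides this measurement; a minimal example (the path is a placeholder for a package source directory):

```r
library(covr)
# Run the package's test suite and record which lines are executed
cov <- package_coverage(path = "path/to/package")
# Percentage of lines exercised by the tests
percent_coverage(cov)
```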
Score 7: Dependencies
Score based on the number of dependencies a package has: the more dependencies, the lower the score. 'Suggests', 'Enhances', base and recommended packages are not counted as dependencies when calculating this score.
This is a data-driven score, based on all packages on CRAN (2025/03). If a package is in the lower quartile for the number of dependencies, it is scored 1. Otherwise, the empirical CDF is used. In practice, this means that a package with around 5 dependencies scores 0.5, decreasing to 0 at around 20 dependencies.
Dependencies can be an emotive topic! As with all other scores, this metric isn't the "be all and end all"; instead, it's just an indication of package fragility.
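For reference, here is one way to reproduce a dependency count with base R tooling, dropping base and recommended packages as described above:

```r
# Hard dependencies (Depends, Imports, LinkingTo) of a package,
# excluding base and recommended packages
db <- utils::available.packages()
deps <- tools::package_dependencies(
  "tsibble",
  db = db,
  which = c("Depends", "Imports", "LinkingTo")
)[["tsibble"]]
standard <- rownames(utils::installed.packages(priority = c("base", "recommended")))
length(setdiff(deps, standard))
```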
Examples
For simplicity, we've removed the columns for vulnerabilities, R CMD check and release, as every package scored 1 on those metrics.
| Package | Dependencies | Exported Namespace | Test Coverage | Codebase Size |
|---|---|---|---|---|
| {drat} | 1.00 | 0.56 | 0.75 | 0.73 |
| {microbenchmark} | 1.00 | 1.00 | 0.56 | 0.84 |
| {shinyjs} | 0.82 | 0.13 | 0.03 | 0.66 |
| {tibble} | 0.36 | 0.12 | 0.82 | 0.17 |
| {tsibble} | 0.20 | 0.04 | 0.87 | 0.11 |
The scores above indicate that {tibble} and {tsibble} are relatively large, complex packages. These packages export many functions and have multiple dependencies. Reassuringly, they have high test coverage.
The {shinyjs} package has worryingly low test coverage. However, inspection of the code shows that there are many manual tests that aren't captured by this metric. This highlights a key point: automated checks aren't enough, especially in a validated setting. Part of litmus is having a qualified person assess the package.
