It has been 6 months since the launch of Diffify, our website for comparing package releases. We are delighted to announce that, in addition to CRAN’s 20,000 R packages, you can now track 1600 popular Python packages!

What’s included?

The current criteria for a Python package to be included in Diffify are:

- The package is listed in the top 2000 PyPI packages according to download statistics.
- The package has had version releases since 1st May 2020.
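The two inclusion criteria above can be sketched as a simple filter. This is only an illustration, not Diffify's actual implementation: the `PackageInfo` record, its field names, and the example package names are all hypothetical, and "since 1st May 2020" is assumed to mean on or after that date.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Thresholds taken from the criteria stated in the post.
MAX_DOWNLOAD_RANK = 2000
RELEASE_CUTOFF = date(2020, 5, 1)


@dataclass
class PackageInfo:
    """Hypothetical record for one PyPI package (not a real Diffify type)."""
    name: str
    download_rank: int          # 1 = most downloaded package on PyPI
    release_dates: list[date]   # dates of all known version releases


def is_eligible(pkg: PackageInfo) -> bool:
    """Return True if the package satisfies both stated criteria."""
    in_top_packages = pkg.download_rank <= MAX_DOWNLOAD_RANK
    # Assumption: a release on the cutoff date itself counts as "since".
    released_recently = any(d >= RELEASE_CUTOFF for d in pkg.release_dates)
    return in_top_packages and released_recently


# Usage with made-up packages:
candidates = [
    PackageInfo("pkg-a", 150, [date(2019, 1, 10), date(2021, 3, 1)]),
    PackageInfo("pkg-b", 2500, [date(2021, 3, 1)]),   # outside the top 2000
    PackageInfo("pkg-c", 10, [date(2019, 1, 10)]),    # no recent release
]
tracked = [p.name for p in candidates if is_eligible(p)]
print(tracked)  # ['pkg-a']
```

Both conditions must hold together, which is why a very popular package with no recent release (like the made-up `pkg-c`) would not be tracked under this reading.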
We’re now three months on from the initial release of Diffify, and what a few months it’s been! We thought now seemed like a good time to give you an overview of the big updates that Diffify has been through since its launch.

Recognition and user feedback

We are delighted to see that our app has been quickly adopted by the R community: R Weekly now displays links to Diffify for updated CRAN packages!
You know that sinking feeling that you get when you’re months into a big project and you log in one day and nothing works? Turns out something has updated, things you needed have been removed, and now you need to spend hours or even days figuring out what’s changed while your master’s deadline is getting closer and … ok, apparently this took me back to a very specific event. But I’m sure most of that sounds familiar to you if you’ve ever programmed something over a longer period of time.
Published: July 19, 2021
Many organisations have a robust infrastructure that allows their data science teams to provide fast and reliable insights. But many groups are just starting down this path. We, Jumping Rivers, have partnered with RStudio to launch a two-part webinar series which examines and explores the usage of R in production environments. The first webinar will discuss the big picture of using open source languages and tools in enterprise environments.
One great thing about using GitHub is the ability to view and contribute to others’ code. Even the code underlying many of our favourite packages is available for us to examine and play around with. Forking a repository is a great way to create an exact replica of someone else’s project in our own user space. We can then freely make changes to this copy without affecting the original project. If you end up especially proud of your changes, you can then submit a Pull Request to offer them up to the owner of the original repository.
Continuous integration is an amazing tool when developing R packages. We push a change to the server, and a process is spawned that checks we haven’t done something silly. It protects us from ourselves! However, this process can become slow, as the CI process typically starts with a blank virtual machine (VM). If you are using R, then the current most popular CI pipeline is Travis CI, but there’s also Jenkins, GitHub Actions, GitLab CI, Circle CI and a few others.
Faster package installation

Every few weeks or so, a tweet pops up asking about how to speed up package installation in R. Depending on the luck of Twitter, the author may get a few suggestions. The bigger picture is that package installation time is starting to become more of an issue for a number of reasons. For example, packages are getting larger and more complex (tidyverse and friends), so installation just takes longer.
When talking about languages to use in production in data science, R is usually not part of the conversation, and if it is, it’s referenced as a secondary language. One of the main reasons for this is that R is commonly associated with statistical analysis, while languages like Python and JavaScript are seen as more suitable for other tasks such as creating web applications or implementing machine learning models.
What is an Rprofile?

Every time R starts, it runs through a couple of R scripts. One of these scripts is the .Rprofile. This allows users to customise their particular set-up. However, some care has to be taken: if this script is broken, it can cause R to break. If this happens, just delete the script! Full details of how the .Rprofile works can be found in my book with Robin on Efficient R programming.
At Jumping Rivers we run a lot of R courses. Some of our most popular courses revolve around the tidyverse, in particular our Data Wrangling in the Tidyverse course and our more advanced purrr course. We even trained over 200 data scientists at the NHS - see our case study for more details. As you can imagine, when giving an on-site course, a reasonable question is what version of R is required for the course.