2020 Training Review

This blog post was written by our intern Osheen Macoscar. 2020 is a year most of us would like to leave behind. But not all change is bad, and many interesting developments, especially in education, happened due to the constraints imposed by COVID. Like many other training providers, we had to pivot to online learning, which brought with it challenges but also new opportunities. This review will hopefully offer some insight into what the year looked like for our trainers and training course attendees with some key facts and figures along the way.

Git: Moving from Master to Main

In June 2020, GitHub announced that it was moving the default branch name from master to the more neutral name, main. GitLab followed suit a few months later. Tobie Langel makes the salient point on why changing the name is a good thing: “So master is not only racist, it’s also a silly name in the first place.” The purpose of this post is to summarise some of the challenges we faced when moving from master to main, with the goal that if you decide to make the same change, you’ll hopefully avoid some of the issues.

Your first D3 visualisation with {r2d3} and Scooby-Doo

Get the code for this blog on GitHub. What is this tutorial and who is it for? This tutorial is aimed mainly at R users who want to learn a bit of D3, and specifically those who are interested in how you can incorporate D3 into your existing workflows in RStudio. It will gloss over a lot of the fundamentals of D3 and related topics (JavaScript, CSS, and HTML) to fast-forward the process of creating your first D3.

Understanding the Parquet file format

Apache Parquet is a popular columnar storage file format used by Hadoop systems, such as Pig, Spark, and Hive. The file format is language independent and has a binary representation. Parquet is used to efficiently store large data sets and has the extension .parquet. This blog post aims to explain how Parquet works and the tricks it uses to efficiently store data. Key features of Parquet are: it’s cross platform; it’s a recognised file format used by many systems; it stores data in a column layout; and it stores metadata. The latter two points allow for efficient storage and querying of data.
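As a rough illustration of why a column layout and stored metadata help with querying, here is a minimal sketch in plain Python. It is a toy model only, not the real Parquet format or any Parquet library; the sample records, the `metadata` dictionary, and the `mean_temp` helper are all made up for the example.

```python
# Toy illustration only: plain Python structures, not the real Parquet format.
# The records below are invented sample data.
rows = [
    {"id": 1, "city": "Leeds", "temp": 12.5},
    {"id": 2, "city": "Leeds", "temp": 13.0},
    {"id": 3, "city": "York", "temp": 11.0},
]

# Column layout: one list per field, so values of the same type sit together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Per-column metadata (min and max), similar in spirit to the statistics a
# columnar file can keep so that a reader can skip data it does not need.
metadata = {
    name: {"min": min(values), "max": max(values)}
    for name, values in columns.items()
}


def mean_temp(cols):
    """Average a single column without reading any of the others."""
    temps = cols["temp"]
    return sum(temps) / len(temps)


print(columns["temp"])
print(metadata["temp"])
print(mean_temp(columns))
```

In the row layout, computing the average temperature means visiting every record; in the column layout only the `temp` list is read, and the stored minimum and maximum can answer some questions without reading the values at all.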

Webinars: Practical Advice for R in Production

Many organisations have a robust infrastructure that allows their data science teams to provide fast and reliable insights. But many groups are just starting down this path. We, Jumping Rivers, have partnered with RStudio to launch a two-part webinar series which examines and explores the usage of R in production environments. The first webinar will discuss the big picture of using open source languages and tools in enterprise environments.

Cleaning up forked GitHub repositories with {gh}

One great thing about using GitHub is the ability to view and contribute to others’ code. Even the code underlying many of our favourite packages is available for us to examine and play around with. Forking a repository is a great way to create an exact replica of someone else’s project in our own user space. We can then freely make changes to this copy without affecting the original project. If you end up especially proud of your changes, you can then submit a Pull Request to offer them up to the owner of the original repository.

Job vacancies at Jumping Rivers!

In line with the continuous growth at Jumping Rivers, we are looking to expand our team of dedicated professionals. If you are enthusiastic and keen to develop your skills in cutting-edge data science or infrastructure, please read on! Who are we and what do we do? Jumping Rivers is an analytics company whose passion is data and machine learning. We help our clients move from data storage to data insights.

Jumping Rivers 2021 Online Training Schedule

Good news! In tandem with the loosening of lockdown restrictions, Jumping Rivers has released the updated 2021 public, online training course schedule. We are offering courses across multiple programming languages, including R, Python, Stan, Scala and git. In the past year, we have converted all of our courses to be online friendly and have received great feedback in relation to interactivity, course structure and overall attendee satisfaction. Some examples of feedback we have received can be seen below:

New features in R 4.1.0

R-4.1.0 is released! Rejoice! A new R release (v 4.1.0) is due on 18th May 2021. Typically most major R releases don’t contain that many new features, but this release does contain some interesting and important changes. This post summarises some of the notable changes introduced. More detail on the changes can be found at the R changelog. Declining support for 32-bit Windows: The 4.1.x series will be the last to support 32-bit Windows systems.

Tips & tricks when moving to Hugo

Over Christmas we moved our main site from Wordpress to Hugo & Netlify. The main benefits for us moving to Hugo were: Security. We were always getting emails about various Wordpress plugins. As our site was essentially static, this was an additional maintenance task. Site-speed. Although Wordpress has lots of clever plugins for optimising site-speed (which then leads to the situation above), Wordpress is just “big”. Raw cost. By this I mean web-site fees.