What's new in R 4.3.0?

Published: April 20, 2023

tags: r

Logic will get you from A to B. Imagination will take you everywhere. (Einstein)

R can already take you everywhere. With it we can learn about the minutest particles and the largest galaxies. So, to celebrate the release of R 4.3 (“Already Tomorrow”, on April 21st, 2023), let’s reverse Einstein’s quote and take you from A to B with logic.

Two modes of comparison

In R, almost all of your data will be stored as a vector. Even if your vector holds a single value, R still considers it to be a vector. This is unlike many other languages, and getting comfortable “thinking for the whole vector” pays off in several ways: your code will be more concise, and it may even run quicker than an iterative approach to the same problem.

1:10 # A vector of integers
##  [1]  1  2  3  4  5  6  7  8  9 10
is.vector(1:10)
## [1] TRUE
sum(1:10) # A vectorised computation
## [1] 55

integer(0) # An empty vector of integers
## integer(0)
1L # A single integer, stored as a vector
## [1] 1
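
As a rough illustration of the concise-versus-iterative point, the sketch below computes the same sum of squares twice: once with an explicit loop and once with a vectorised expression. The variable names are ours, chosen purely for illustration.

```r
values <- 1:10

# Iterative approach: accumulate the squares one element at a time
total_loop <- 0
for (v in values) {
  total_loop <- total_loop + v^2
}

# Vectorised approach: square the whole vector, then sum it
total_vec <- sum(values^2)

total_loop
## [1] 385
total_vec
## [1] 385
```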

But the conciseness that R’s vectorised operations provide may trip you up unexpectedly. A typical case is when you think you are working with a scalar (a length-1 vector) but you are actually working with an empty or multivalued vector.
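
A small sketch of how this can happen, using a made-up `scores` vector: the same subsetting expression can return one value, none, or several, and vectorised functions will accept all three without complaint.

```r
scores <- c(12, 7, 9)

# Subsetting can silently return zero, one, or many values
high <- scores[scores > 10] # one value
none <- scores[scores > 20] # no values: numeric(0)
many <- scores[scores > 5]  # three values

length(high)
## [1] 1
length(none)
## [1] 0
length(many)
## [1] 3

# Vectorised functions happily work on the empty vector
sum(none)
## [1] 0
```

Checking `length()` before treating a result as a scalar is one simple way to guard against this.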

The logical values in R (TRUE, FALSE) are a little bit special. A vector of logical values might be used to represent some quality in a dataset, for example, to select those rows of a dataset that are to be kept in dplyr::filter().

library("tidyverse")
head(diamonds)
## # A tibble: 6 × 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48

head(diamonds$cut == "Ideal") # A logical vector
## [1]  TRUE FALSE FALSE FALSE FALSE FALSE
filter(diamonds, cut == "Ideal") # Subsetting a data-frame using a logical vector
## # A tibble: 21,551 × 10
##    carat cut   color clarity depth table price     x     y     z
##    <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1  0.23 Ideal E     SI2      61.5    55   326  3.95  3.98  2.43
##  2  0.23 Ideal J     VS1      62.8    56   340  3.93  3.9   2.46
##  3  0.31 Ideal J     SI2      62.2    54   344  4.35  4.37  2.71
##  4  0.3  Ideal I     SI2      62      54   348  4.31  4.34  2.68
##  5  0.33 Ideal I     SI2      61.8    55   403  4.49  4.51  2.78
##  6  0.33 Ideal I     SI2      61.2    56   403  4.49  4.5   2.75
##  7  0.33 Ideal J     SI1      61.1    56   403  4.49  4.55  2.76
##  8  0.23 Ideal G     VS1      61.9    54   404  3.93  3.95  2.44
##  9  0.32 Ideal I     SI1      60.9    55   404  4.45  4.48  2.72
## 10  0.3  Ideal I     SI2      61      59   405  4.3   4.33  2.63
## # ℹ 21,541 more rows

head(diamonds$carat > 0.3)
## [1] FALSE FALSE FALSE FALSE  TRUE FALSE
filter(diamonds, carat > 0.3)
## # A tibble: 49,737 × 10
##    carat cut       color clarity depth table price     x     y     z
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
##  2  0.31 Ideal     J     SI2      62.2    54   344  4.35  4.37  2.71
##  3  0.32 Premium   E     I1       60.9    58   345  4.38  4.42  2.68
##  4  0.31 Very Good J     SI1      59.4    62   353  4.39  4.43  2.62
##  5  0.31 Very Good J     SI1      58.1    62   353  4.44  4.47  2.59
##  6  0.31 Good      H     SI1      64      54   402  4.29  4.31  2.75
##  7  0.33 Ideal     I     SI2      61.8    55   403  4.49  4.51  2.78
##  8  0.33 Ideal     I     SI2      61.2    56   403  4.49  4.5   2.75
##  9  0.33 Ideal     J     SI1      61.1    56   403  4.49  4.55  2.76
## 10  0.32 Good      H     SI2      63.1    56   403  4.34  4.37  2.75
## # ℹ 49,727 more rows

But there are places where it would make no sense (and could potentially be dangerous) to use a multivalued logical vector. We use if (...) {} and while (...) {} statements for flow control in R. The conditional expression in these statements (the ... in if (...) {}) should always evaluate to a logical scalar: either TRUE or FALSE.

When R 4.2.0 was released, stricter guarantees were placed on the length of these conditional expressions. We mentioned this in an earlier blog post. So in addition to getting an error when the conditional is empty, we now get an error when the conditional is too long:

# Comparison with an empty logical vector:
if (logical(0)) {
  print("I didn't expect to get here")
}
## Error in if (logical(0)) {: argument is of length zero

# Comparison with an over-sized logical vector:
numbers <- c(1, 3, 5, 6)

print(numbers %% 2 == 0) # Determine if even
## [1] FALSE FALSE FALSE  TRUE

if (numbers %% 2 == 0) {
  print("Should we ever be allowed to get here?")
}
## Error in if (numbers%%2 == 0) {: the condition has length > 1

Previously, R would use the first entry in a non-scalar conditional vector to decide whether to enter the if or while block.
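
One way to make such a condition valid again is to say explicitly how the vector should be reduced to a single TRUE or FALSE. The sketch below reuses the `numbers` vector from above with any() and all(); which of the two is appropriate depends on what the code is meant to check.

```r
numbers <- c(1, 3, 5, 6)
is_even <- numbers %% 2 == 0

# Enter the block if at least one value is even
if (any(is_even)) {
  print("At least one number is even")
}
## [1] "At least one number is even"

# Enter the block unless every value is even
if (!all(is_even)) {
  print("Not every number is even")
}
## [1] "Not every number is even"
```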

Strictly comparing

So, we have two main ways of using a logical vector, one of which now requires that the vector is a scalar.

Another place where it is really important to know the length of your vectors is when combining logical values.

R has a number of ways to do this, all building on the AND and OR operations in Boolean algebra:

all and any for combining the values in a single vector (are all of the values TRUE; are any of the values TRUE)
&, && (representing “AND”), |, and || (for “OR”) for combining two different vectors

is_april = TRUE
is_r_released = TRUE
is_already_tomorrow = FALSE

# Logical AND within a single vector
all(c(is_april, is_r_released, is_already_tomorrow))
## [1] FALSE

# Logical OR within a single vector
any(c(is_april, is_r_released, is_already_tomorrow))
## [1] TRUE

# Logical AND between vectors
is_april & is_r_released
## [1] TRUE
is_april && is_already_tomorrow
## [1] FALSE

# Logical OR between vectors
is_april | is_r_released
## [1] TRUE
is_april || is_already_tomorrow
## [1] TRUE

For scalars, there’s no difference between the single-character operators (&, |) and the two-character operators (&&, ||). So why have a pair of operators for each concept?

&& and || are intended for use solely with scalars; they return a single logical value.
& and | work with multivalued vectors; they return a vector whose length matches their input arguments.

Since they always return a scalar logical, you should use && and || in your if/while conditional expressions (when needed). If an & or | is used, you may end up with a non-scalar vector inside if (...) {} and R will throw an error.
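
A further reason to prefer && in conditionals, not covered above, is that it evaluates its right-hand side only when the left-hand side is TRUE. The sketch below uses a hypothetical helper, is_positive_number(), to show how that lets cheap checks guard a later comparison.

```r
# Hypothetical helper, for illustration only
is_positive_number <- function(x) {
  # && stops at the first FALSE, so `x > 0` is only evaluated
  # once we know that x is a single numeric value
  is.numeric(x) && length(x) == 1 && x > 0
}

is_positive_number(3)
## [1] TRUE
is_positive_number("three")
## [1] FALSE
is_positive_number(c(1, 2))
## [1] FALSE
```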

To illustrate the difference between the scalar operators and vectorised operators, here’s an example:

x = c(TRUE, TRUE, FALSE, FALSE)
y = c(TRUE, FALSE, TRUE, FALSE)

The vectorised operators apply AND/OR on matched pairs of elements:

x & y # c(x[1] && y[1], x[2] && y[2], ...)
## [1]  TRUE FALSE FALSE FALSE

x | y # c(x[1] || y[1], x[2] || y[2], ...)
## [1]  TRUE  TRUE  TRUE FALSE

In R 4.2.0, a warning is thrown when a non-scalar input is passed to the scalar-operators. But, a scalar logical is returned (here, the result of x[1] && y[1]). In earlier versions of R, no warning was printed.

# R 4.2
x && y
[1] TRUE
Warning messages:
1: In x && y : 'length(x) = 4 > 1' in coercion to 'logical(1)'
2: In x && y : 'length(x) = 4 > 1' in coercion to 'logical(1)'

This could lead to hidden bugs. For example, if you used this code in an if conditional, a warning would be printed when a non-scalar vector was used but the code would continue happily:

# R 4.2
if (x && y) {
  print("The world can't end today...")
}
[1] "The world can't end today..."
Warning messages:
1: In x && y : 'length(x) = 4 > 1' in coercion to 'logical(1)'
2: In x && y : 'length(x) = 4 > 1' in coercion to 'logical(1)'

In R 4.3.0, this warning has been elevated to an error and no value is returned:

# R 4.3
x && y
Error in x && y : 'length = 4' in coercion to 'logical(1)'

This stricter version of the scalar comparison operators will help catch those bugs where you didn’t realise a logical variable could contain more than one entry.
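
If existing code does hit this error, the fix depends on what the comparison was meant to express. The sketch below shows three possible rewrites for the x and y vectors used above; none of them is "the" replacement, and the right one depends on the intent.

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

# Are x and y both TRUE at every position?
all(x & y)
## [1] FALSE

# Are x and y both TRUE at any position?
any(x & y)
## [1] TRUE

# Keep the old "first element only" behaviour, but say so explicitly
x[1] && y[1]
## [1] TRUE
```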

To check whether the strict comparison operators will affect your existing code, before upgrading to R 4.3.0, you can set an environment variable before running it:

# In R:
Sys.setenv("_R_CHECK_LENGTH_1_LOGIC2_" = TRUE)

A more logical flow

Where else do we work with scalars in R? Many functions expect certain arguments to be scalars. For example, the seq() function complains with non-scalar arguments:

seq(from = 1:3, to = 4)
## Error in seq.default(from = 1:3, to = 4): 'from' must be of length 1

seq(from = 1, to = 4:5)
## Error in seq.default(from = 1, to = 4:5): 'to' must be of length 1
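
If several sequences really are wanted, one per starting value, a possible approach is to call seq() once for each value. The sketch below does this with Map(), which returns a list; it is only one of several ways to write this.

```r
# One sequence per starting value, returned as a list
sequences <- Map(seq, 1:3, 4)

lengths(sequences)
## [1] 4 3 2
sequences[[2]]
## [1] 2 3 4
```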

There are several other places where R will throw an error if we provide a value that is of the wrong size:

a_data_frame[[column_index]] # column_index must be a scalar
a_matrix[rows, cols] = value # length of value must match, or evenly divide, the number of replaced elements

There are other places where R will throw a warning, and try to gracefully handle values that are of an unexpected size:

# R's recycling rules are used to match the size of the vector input
c(1, 3, 5) * c(2, 3) # c(1 * 2, 3 * 3, 5 * 2)
## Warning in c(1, 3, 5) * c(2, 3): longer object length is not a multiple of
## shorter object length
## [1]  2  9 10

# The smaller vector was recycled to match the size of the larger
# c(1, 3, 5) * c(2, 3, 2)
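
For contrast, here is a short sketch of recycling when the lengths do divide evenly. In that case R recycles silently, with no warning. The variable names are ours.

```r
# Length 6 is a multiple of length 2, so recycling is silent
alternating <- 1:6 * c(1, 10)
alternating
## [1]  1 20  3 40  5 60

# A scalar is recycled against every element
doubled <- c(1, 3, 5) * 2
doubled
## [1]  2  6 10
```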

An interesting case is the : operator, which like seq(), can be used to create sequences of numbers.

3:5
## [1] 3 4 5

If we provide a non-scalar on either side of the operator, R will warn us:

# R 4.2
(1:2) : 5
[1] 1 2 3 4 5
Warning message:
In (1:2):5 : numerical expression has 2 elements: only the first used

# R 4.2
1 : (4:6)
[1] 1 2 3 4
Warning message:
In 1:(4:6) : numerical expression has 3 elements: only the first used

Now, because the output should be a single sequence, R has to pick a specific value for the start- and the end-point of that sequence from the arguments provided. It uses the first entry in each argument. So,

(1:2) : 5 is equivalent to 1:5; and
1 : (4:6) is equivalent to 1:4.

If your code is providing non-scalar arguments to :, there may be a bug in your code or the packages that it depends upon. R 4.3.0 has introduced a stricter setting, which will catch the use of non-scalar values when constructing sequences with the : operator.

Much like with the stricter logic comparisons described above, the R developers have introduced this as an optional setting. After setting the environment variable _R_CHECK_LENGTH_COLON_ to a true value, R will throw an error whenever an oversized argument is passed into a:b.

# R 4.3
# Without the check enabled:
(1:2) : 5
[1] 1 2 3 4 5
Warning message:
In (1:2):5 : numerical expression has 2 elements: only the first used

# With the strict check enabled:
Sys.setenv("_R_CHECK_LENGTH_COLON_" = TRUE)
(1:2) : 5
Error in (1:2):5 : numerical expression has length > 1
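
If code trips over this check and using only the first element really is the intent, one option is to make that explicit. The sketch below also includes a hypothetical helper, strict_colon(), that refuses anything but scalars regardless of the environment variable; it is an illustration, not part of R.

```r
from <- 1:2
to <- 5

# State the intent: use only the first element of each argument
from[1]:to[1]
## [1] 1 2 3 4 5

# Hypothetical helper that refuses anything but scalars
strict_colon <- function(a, b) {
  stopifnot(length(a) == 1, length(b) == 1)
  a:b
}

strict_colon(1, 5)
## [1] 1 2 3 4 5
```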

And finally: Extracting from a pipe

Have you started using the native pipe yet? In our blog post to celebrate the release of R 4.2.0, we showed this example:

mtcars |> lm(mpg ~ disp, data = _)
## 
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
## 
## Coefficients:
## (Intercept)         disp  
##    29.59985     -0.04122

Here the pipe |> passes the value on its left-hand side into the function on the right. By default that value will be used as the first argument to the right-hand function. But when an underscore is present, the piped-in value will replace that underscore. So the above is equivalent to:

lm(mpg ~ disp, data = mtcars)
## 
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
## 
## Coefficients:
## (Intercept)         disp  
##    29.59985     -0.04122

What if you want to extract values that are output by a pipeline? For example, you might want the coef entry from the linear model above. One way would be to store the results in a variable and extract the coef from that:

model = mtcars |> lm(mpg ~ disp, data = _)
model$coef
## (Intercept)        disp 
## 29.59985476 -0.04121512

Or you could wrap the pipeline in parentheses:

(
  mtcars |> lm(mpg ~ disp, data = _)
)$coef
## (Intercept)        disp 
## 29.59985476 -0.04121512

R 4.3.0 provides a much neater solution: the underscore _ can now be used at the head of an extraction call, such as _$coef, to pull a value out of the result of a pipeline:

mtcars |> lm(mpg ~ disp, data = _) |> _$coef
(Intercept)        disp
29.59985476 -0.04121512
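
The R 4.3.0 release notes describe this as an experimental feature and say that the underscore can also head a chain of extractions. The sketch below, which needs R 4.3.0 or later to parse, pulls out just the slope; it uses the full element name coefficients rather than relying on partial matching of $coef.

```r
# Requires R >= 4.3.0: `_` heads a chain of extractions ($ then [[)
slope <- mtcars |> lm(mpg ~ disp, data = _) |> _$coefficients[["disp"]]
slope
## [1] -0.04121512
```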

Trying the latest version out for yourself

To take away the pain of installing the latest development version of R, you can use Docker. The following commands pull and run an image containing the devel version of R:

docker pull rstudio/r-base:devel-jammy
docker run --rm -it rstudio/r-base:devel-jammy

See the r-docker project for more details.