Logic will get you from A to B. Imagination will take you everywhere. (Einstein)

R can already take you everywhere. With it we can learn about the minutest particles and the largest galaxies. So, to celebrate the release of R 4.3 (“Already Tomorrow”, on April 21st, 2023), let’s reverse Einstein’s quote and take you from A to B with logic.

### Two modes of comparison

In R, almost all of your data will be stored as a vector. Even if your vector holds a single value, R still considers it to be a vector. This is unlike many other languages, and getting comfortable “thinking for the whole vector” pays off in several ways: your code will be more concise, and it may even run quicker than an iterative approach to the same problem.

```
1:10 # A vector of integers
## [1] 1 2 3 4 5 6 7 8 9 10
is.vector(1:10)
## [1] TRUE
sum(1:10) # A vectorised computation
## [1] 55
integer(0) # An empty vector of integers
## integer(0)
1L # A single integer, stored as a vector
## [1] 1
```
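As a rough illustration of “thinking for the whole vector”, the sketch below computes a sum of squares twice: once with a loop and once with a single vectorised expression. The variable names are made up for this example.

```r
values <- c(2, 4, 6, 8)

# Iterative approach: update a running total one element at a time
total <- 0
for (value in values) {
  total <- total + value^2
}

# Vectorised approach: square every element, then sum the result
vectorised_total <- sum(values^2)

# Both approaches give the same answer
identical(total, vectorised_total)
## [1] TRUE
```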

But the conciseness that R’s vectorised operations provide may trip you
up unexpectedly. A typical case is when you *think* you are working with
a *scalar* (a length-1 vector) but you are actually working with an
empty or multivalued vector.

The `logical` values in R (`TRUE`, `FALSE`) are a little bit special. A
vector of logical values might be used to represent some quality in a
dataset, for example, to select those rows of a dataset that are to be
kept in `dplyr::filter()`.

```
library("tidyverse")
head(diamonds)
## # A tibble: 6 × 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
head(diamonds$cut == "Ideal") # A logical vector
## [1] TRUE FALSE FALSE FALSE FALSE FALSE
filter(diamonds, cut == "Ideal") # Subsetting a data-frame using a logical vector
## # A tibble: 21,551 × 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.23 Ideal J VS1 62.8 56 340 3.93 3.9 2.46
## 3 0.31 Ideal J SI2 62.2 54 344 4.35 4.37 2.71
## 4 0.3 Ideal I SI2 62 54 348 4.31 4.34 2.68
## 5 0.33 Ideal I SI2 61.8 55 403 4.49 4.51 2.78
## 6 0.33 Ideal I SI2 61.2 56 403 4.49 4.5 2.75
## 7 0.33 Ideal J SI1 61.1 56 403 4.49 4.55 2.76
## 8 0.23 Ideal G VS1 61.9 54 404 3.93 3.95 2.44
## 9 0.32 Ideal I SI1 60.9 55 404 4.45 4.48 2.72
## 10 0.3 Ideal I SI2 61 59 405 4.3 4.33 2.63
## # ℹ 21,541 more rows
head(diamonds$carat > 0.3)
## [1] FALSE FALSE FALSE FALSE TRUE FALSE
filter(diamonds, carat > 0.3)
## # A tibble: 49,737 × 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 2 0.31 Ideal J SI2 62.2 54 344 4.35 4.37 2.71
## 3 0.32 Premium E I1 60.9 58 345 4.38 4.42 2.68
## 4 0.31 Very Good J SI1 59.4 62 353 4.39 4.43 2.62
## 5 0.31 Very Good J SI1 58.1 62 353 4.44 4.47 2.59
## 6 0.31 Good H SI1 64 54 402 4.29 4.31 2.75
## 7 0.33 Ideal I SI2 61.8 55 403 4.49 4.51 2.78
## 8 0.33 Ideal I SI2 61.2 56 403 4.49 4.5 2.75
## 9 0.33 Ideal J SI1 61.1 56 403 4.49 4.55 2.76
## 10 0.32 Good H SI2 63.1 56 403 4.34 4.37 2.75
## # ℹ 49,727 more rows
```

But there are places where you use logical values, where it would make
no sense (and could potentially be dangerous) to use a multivalued
logical vector. We use `if (...) {}` and `while (...) {}` statements for
flow control in R. The conditional expression in these statements (the
`...` in `if (...) {}`) should always evaluate to a logical scalar:
either `TRUE` or `FALSE`.

When R 4.2.0 was released, stricter guarantees were placed on the length of these conditional expressions. We mentioned this in an earlier blog post. So in addition to getting an error when the conditional is empty, we now get an error when the conditional is too long:

```
# Comparison with an empty logical vector:
if (logical(0)) {
  print("I didn't expect to get here")
}
## Error in if (logical(0)) {: argument is of length zero
```

```
# Comparison with an over-sized logical vector:
numbers <- c(1, 3, 5, 6)
print(numbers %% 2 == 0) # Determine if even
## [1] FALSE FALSE FALSE TRUE
if (numbers %% 2 == 0) {
  print("Should we ever be allowed to get here?")
}
## Error in if (numbers%%2 == 0) {: the condition has length > 1
```

Previously, R would use the first entry in a non-scalar conditional
vector to decide whether to enter the `if` or `while` block.
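If the condition really is a multivalued vector, one option is to say explicitly how it should collapse to a scalar. The sketch below reuses the `numbers` example and collapses it with `any()` and `all()`; which of the two is appropriate depends on what your code actually means to test.

```r
numbers <- c(1, 3, 5, 6)
is_even <- numbers %% 2 == 0   # a logical vector of length 4

# Collapse the vector to a single TRUE/FALSE before branching
if (any(is_even)) {
  message("At least one number is even")
}

if (!all(is_even)) {
  message("Not every number is even")
}
```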


### Strictly comparing

So, we have two main ways of using a logical vector, one of which now requires that the vector is a scalar.

Another place where it is really important to know the length of your vectors is when combining logical values together.

R has a number of ways to combine logical values together that build on the AND and OR operations in Boolean algebra:

- `all` and `any` for combining the values in a single vector (are `all` of the values TRUE; are `any` of the values TRUE)
- `&`, `&&` (representing “AND”), `|`, and `||` (for “OR”) for combining two different vectors

```
is_april = TRUE
is_r_released = TRUE
is_already_tomorrow = FALSE
```

```
# Logical AND within a single vector
all(c(is_april, is_r_released, is_already_tomorrow))
## [1] FALSE
# Logical OR within a single vector
any(c(is_april, is_r_released, is_already_tomorrow))
## [1] TRUE
```

```
# Logical AND between vectors
is_april & is_r_released
## [1] TRUE
is_april && is_already_tomorrow
## [1] FALSE
# Logical OR between vectors
is_april | is_r_released
## [1] TRUE
is_april || is_already_tomorrow
## [1] TRUE
```

For scalars, there’s no difference between the single-character
operators (`&`, `|`) and the two-character operators (`&&`, `||`). So
why have a pair of operators for each concept?

- `&&` and `||` are intended for use *solely* with scalars; they return a single logical value.
- `&` and `|` work with multivalued vectors; they return a vector whose length matches their input arguments.

Since they always return a scalar logical, you *should* use `&&` and
`||` in your if/while conditional expressions (when needed). If an `&`
or `|` is used, you may end up with a non-scalar vector inside
`if (...) {}` and R will throw an error.
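The scalar operators also short-circuit: the right-hand side is only evaluated when the left-hand side has not already decided the result. That makes them handy for guarding a comparison with type and length checks first. The helper name `is_positive_scalar()` below is hypothetical and only serves this sketch.

```r
# Hypothetical helper: each check only runs if the previous ones passed
is_positive_scalar <- function(x) {
  is.numeric(x) && length(x) == 1 && !is.na(x) && x > 0
}

is_positive_scalar(3)           # TRUE
is_positive_scalar("three")     # FALSE: stops at is.numeric()
is_positive_scalar(c(1, 2))     # FALSE: stops at the length check
is_positive_scalar(numeric(0))  # FALSE: stops at the length check

if (is_positive_scalar(3)) {
  message("Safe to use as a scalar condition")
}
```

Because the length check comes before `x > 0`, the final comparison never sees a multivalued or empty vector.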

To illustrate the difference between the scalar operators and vectorised operators, here’s an example:

```
x = c(TRUE, TRUE, FALSE, FALSE)
y = c(TRUE, FALSE, TRUE, FALSE)
```

The vectorised operators apply AND/OR on matched pairs of elements:

```
x & y # c(x[1] && y[1], x[2] && y[2], ...)
## [1] TRUE FALSE FALSE FALSE
```

```
x | y # c(x[1] || y[1], x[2] || y[2], ...)
## [1] TRUE TRUE TRUE FALSE
```
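The vectorised operators are the ones to reach for when subsetting data. As a small sketch using the built-in `mtcars` dataset (the variable names are made up for this example), `&` combines two row-wise conditions into a single logical vector:

```r
# One TRUE/FALSE per row: four cylinders AND more than 30 miles per gallon
keep <- mtcars$cyl == 4 & mtcars$mpg > 30

# The combined vector has one entry per row of the data
length(keep) == nrow(mtcars)

# Use it to subset the data frame
efficient_cars <- mtcars[keep, ]
```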

In R 4.2.0, a warning is thrown when a non-scalar input is passed to the
scalar operators. But a scalar logical is still returned (here, the
result of `x[1] && y[1]`). In earlier versions of R, no warning was
printed.

```
# R 4.2
x && y
[1] TRUE
Warning messages:
1: In x && y : 'length(x) = 4 > 1' in coercion to 'logical(1)'
2: In x && y : 'length(x) = 4 > 1' in coercion to 'logical(1)'
```

This could lead to hidden bugs. For example, if you used this code in an
`if` conditional, a warning would be printed when a non-scalar vector
was used but the code would continue happily:

```
# R 4.2
if (x && y) {
  print("The world can't end today...")
}
[1] "The world can't end today..."
Warning messages:
1: In x && y : 'length(x) = 4 > 1' in coercion to 'logical(1)'
2: In x && y : 'length(x) = 4 > 1' in coercion to 'logical(1)'
```

In R 4.3.0, this warning has been elevated to an error and no value is returned:

```
# R 4.3
x && y
Error in x && y : 'length = 4' in coercion to 'logical(1)'
```

This stricter version of the scalar comparison operators will help catch those bugs where you didn’t realise a logical variable could contain more than one entry.
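If you would rather check a flag yourself before it reaches `if` or `&&`, base R offers `isTRUE()`, which is only `TRUE` for a single, non-missing `TRUE`. The sketch below contrasts it with an explicit check; `check_flag()` is a hypothetical helper, not a base R function.

```r
flag <- c(TRUE, FALSE)   # accidentally multivalued

isTRUE(flag)   # FALSE, because length(flag) is not 1
isTRUE(NA)     # FALSE, because the value is missing
isTRUE(TRUE)   # TRUE

# Hypothetical helper that fails loudly instead of quietly returning FALSE
check_flag <- function(flag) {
  stopifnot(is.logical(flag), length(flag) == 1, !is.na(flag))
  flag
}

check_flag(TRUE)
## [1] TRUE
```

Note that `isTRUE()` quietly treats an oversized vector as `FALSE`, so the explicit check is the better fit when an unexpected length should be reported as a bug.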

To check whether the strict comparison operators will affect your existing code, before upgrading to R 4.3.0, you can set an environment variable before running it:

```
# In R:
Sys.setenv("_R_CHECK_LENGTH_1_LOGIC2_" = TRUE)
```


### A more logical flow

Where else do we work with scalars in R? Many functions expect certain
arguments to be scalars. For example, the `seq()` function complains
with non-scalar arguments:

```
seq(from = 1:3, to = 4)
## Error in seq.default(from = 1:3, to = 4): 'from' must be of length 1
```

```
seq(from = 1, to = 4:5)
## Error in seq.default(from = 1, to = 4:5): 'to' must be of length 1
```

There are several other places where R will throw an error if we provide a value that is of the wrong size:

```
a_data_frame[[column_index]] # column_index must be a scalar
a_matrix[rows, cols] = value # value must match the size of the replaced element(s)
```
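As a small sketch of the matrix case (the object names are made up for this example), assigning a value of the right size or a scalar works, while a value whose length does not fit the replaced cells raises an error. The `tryCatch()` call is only there so the example keeps running after the failure.

```r
m <- matrix(0, nrow = 2, ncol = 3)

m[1, ] <- c(1, 2, 3)   # three cells, three values
m[2, ] <- 9            # a scalar is recycled across the row

# Two values cannot fill three cells, so this assignment fails
outcome <- tryCatch(
  {
    m[1, ] <- c(1, 2)
    "assigned"
  },
  error = function(e) "error"
)
outcome
```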

There are other places where R will throw a warning, and try to gracefully handle values that are of an unexpected size:

```
# R's recycling rules are used to match the size of the vector input
c(1, 3, 5) * c(2, 3) # c(1 * 2, 3 * 3, 5 * 2)
## Warning in c(1, 3, 5) * c(2, 3): longer object length is not a multiple of
## shorter object length
## [1] 2 9 10
# The smaller vector was recycled to match the size of the larger
# c(1, 3, 5) * c(2, 3, 2)
```
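Recycling only warns when the longer length is not a multiple of the shorter one. When it is a multiple, the shorter vector is recycled silently, which is convenient but can hide a mistake. The sketch below shows the silent case and a hypothetical helper, `same_length_product()`, that refuses to recycle anything other than a scalar.

```r
# Length 6 is a multiple of length 2, so there is no warning
silent <- 1:6 * c(1, 10)   # c(1 * 1, 2 * 10, 3 * 1, 4 * 10, 5 * 1, 6 * 10)
silent
## [1]  1 20  3 40  5 60

# Hypothetical helper: only allow equal lengths, or a scalar on one side
same_length_product <- function(a, b) {
  stopifnot(length(a) == length(b) || length(a) == 1 || length(b) == 1)
  a * b
}

same_length_product(c(1, 3, 5), 2)
## [1]  2  6 10
```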

An interesting case is the `:` operator, which, like `seq()`, can be
used to create sequences of numbers.

```
3:5
## [1] 3 4 5
```

If we provide a non-scalar on either side of the operator, R will warn us:

```
# R 4.2
(1:2) : 5
[1] 1 2 3 4 5
Warning message:
In (1:2):5 : numerical expression has 2 elements: only the first used
```

```
# R 4.2
1 : (4:6)
[1] 1 2 3 4
Warning message:
In 1:(4:6) : numerical expression has 3 elements: only the first used
```

Now, because the output should be a single sequence, R has to pick a specific value for the start- and the end-point of that sequence from the arguments provided. It uses the first entry in each argument. So, `(1:2) : 5` is equivalent to `1:5`; and `1 : (4:6)` is equivalent to `1:4`.

If your code is providing non-scalar arguments to `:`, there may be a
bug in your code or the packages that it depends upon. R 4.3.0 has
introduced a stricter setting, which will catch the use of non-scalar
values when constructing sequences with the `:` operator.

Much like with the stricter logic comparisons described above, the R
developers have introduced this as an optional setting. After setting
the environment variable `_R_CHECK_LENGTH_COLON_` to a true value, R
will throw an error whenever an oversized argument is passed into `a:b`.

```
# R 4.3
# Without the check enabled:
(1:2) : 5
[1] 1 2 3 4 5
Warning message:
In (1:2):5 : numerical expression has 2 elements: only the first used
# With the strict check enabled:
Sys.setenv("_R_CHECK_LENGTH_COLON_" = TRUE)
(1:2) : 5
Error in (1:2):5 : numerical expression has length > 1
```
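Where you cannot rely on the environment variable being set, you could check the endpoints yourself. The sketch below wraps `:` in a hypothetical helper, `strict_colon()`, that insists on scalar endpoints. It is an illustration only, not something provided by R.

```r
# Hypothetical helper: refuse anything other than length-1 endpoints
strict_colon <- function(from, to) {
  stopifnot(length(from) == 1, length(to) == 1)
  from:to
}

strict_colon(1, 5)
## [1] 1 2 3 4 5

# An oversized start-point is now an error rather than a warning
result <- try(strict_colon(1:2, 5), silent = TRUE)
inherits(result, "try-error")
## [1] TRUE
```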

### And finally: Extracting from a pipe

Have you started using the native pipe yet? In our blog post to celebrate the release of R 4.2.0, we showed this example:

```
mtcars |> lm(mpg ~ disp, data = _)
##
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
##
## Coefficients:
## (Intercept) disp
## 29.59985 -0.04122
```

Here the pipe `|>` passes the value on its left-hand side into the
function on the right. By default that value will be used as the first
argument to the right-hand function. But when an underscore is present,
the piped-in value will replace that underscore. So the above is
equivalent to:

```
lm(mpg ~ disp, data = mtcars)
##
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
##
## Coefficients:
## (Intercept) disp
## 29.59985 -0.04122
```

What if you want to extract values that are output by a pipeline? For
example, if you want the `coef` entry from the linear model above. One
way would be to store the results in a variable and extract the `coef`
from that:

```
model = mtcars |> lm(mpg ~ disp, data = _)
model$coef
## (Intercept) disp
## 29.59985476 -0.04121512
```

Or you could wrap the pipeline in parentheses:

```
(
  mtcars |> lm(mpg ~ disp, data = _)
)$coef
## (Intercept) disp
## 29.59985476 -0.04121512
```

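R 4.3.0 provides a neater solution, introduced as an experimental
feature: the underscore placeholder `_` can now start an extraction,
such as `_$coef`, on the right-hand side of the pipe. The value you
need can then be pulled straight out of the pipeline's result: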

```
mtcars |> lm(mpg ~ disp, data = _) |> _$coef
(Intercept) disp
29.59985476 -0.04121512
```
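The placeholder can also head a longer chain of extractions. As a sketch that assumes R 4.3.0 or later (on older versions this code will not parse), the slope alone can be pulled out by following `_$coefficients` with a `[[` extraction:

```r
# Requires R >= 4.3.0: `_` heads a chain of extractions
slope <- mtcars |> lm(mpg ~ disp, data = _) |> _$coefficients[["disp"]]
slope
## [1] -0.04121512
```

This uses the full element name `coefficients`; the shorter `$coef` in the examples above works through R's partial matching of list names.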

### Trying the latest version out for yourself

To take away the pain of installing the latest development version of R,
you can use Docker. To use the `devel` version of R, you can use the
following commands:

```
docker pull rstudio/r-base:devel-jammy
docker run --rm -it rstudio/r-base:devel-jammy
```

See the `r-docker` project for more details.

### See also

Do you have nostalgia for previous versions of R? If so, check out our previous blog posts: