R 4.4.0 (“Puppy Cup”) was released on the 24th April 2024 and it is a
beauty. In time-honoured tradition, here we summarise some of the
changes that caught our eyes. R 4.4.0 introduces some cool features (one
of which is experimental) and makes one of our favourite {rlang}
operators available in base R. There are a few things you might need to
be aware of regarding handling `NULL` and `complex` values.

The full changelog can be found at the r-release ‘NEWS’ page and if you want to keep up to date with developments in base R, have a look at the r-devel ‘NEWS’ page.


### A tail-recursive tale

Years ago, before I’d caused my first stack overflow, my Grandad used to tell me a daft tale:

```
It was on a dark and stormy night,
And the skipper of the yacht said to Antonio,
"Antonio, tell us a tale",
So Antonio started as follows...
It was on a dark and stormy night,
And the skipper of the yacht .... [ad infinitum]
```

The tale carried on in this way forever. Or at least it would until you were finally asleep.

At around the same age, I was toying with BASIC programming and could knock out classics such as

```
>10 PRINT "Ali stinks!"
>20 GOTO 10
```

Burn! Infinite burn!

Those were two example processes that demonstrate recursion. Antonio’s tale quotes itself recursively, and my older brother will be repeatedly mocked unless someone intervenes.

Recursion is an elegant approach to many programming problems - this usually takes the form of a function that can call itself. You would use it when you know how to get closer to a solution, but not necessarily how to get directly to that solution. And unlike the un-ending examples above, when we write recursive solutions to computational problems, we include a rule for stopping.

An example from mathematics would be finding zeros for a continuous function. The sine function provides a typical example.

We can see that when *x* = *π*, there is a zero for `sin(x)`, but the
computer doesn’t know that.

One recursive solution to finding the zeros of a function, `f()`, is the
bisection method, which iteratively narrows a range until it finds a
point where `f(x)` is close enough to zero. Here’s a quick
implementation of that algorithm. If you need to perform root-finding in
R, please don’t use the following function. `stats::uniroot()` is much
more robust…

```
bisect = function(f, interval, tolerance, iteration = 1, verbose = FALSE) {
  if (verbose) {
    msg = glue::glue(
      "Iteration {iteration}: Interval [{interval[1]}, {interval[2]}]"
    )
    message(msg)
  }
  # Evaluate 'f' at either end of the interval and return
  # any endpoint where f() is close enough to zero
  lhs = interval[1]; rhs = interval[2]
  f_left = f(lhs); f_right = f(rhs)
  if (abs(f_left) <= tolerance) {
    return(lhs)
  }
  if (abs(f_right) <= tolerance) {
    return(rhs)
  }
  stopifnot(sign(f_left) != sign(f_right))
  # Bisect the interval and rerun the algorithm
  # on the half-interval where y=0 is crossed
  midpoint = (lhs + rhs) / 2
  f_mid = f(midpoint)
  new_interval = if (sign(f_mid) == sign(f_left)) {
    c(midpoint, rhs)
  } else {
    c(lhs, midpoint)
  }
  bisect(f, new_interval, tolerance, iteration + 1, verbose)
}
```

We know that *π* is somewhere between 3 and 4, so we can find the zero
of `sin(x)` as follows:

```
bisect(sin, interval = c(3, 4), tolerance = 1e-4, verbose = TRUE)
#> Iteration 1: Interval [3, 4]
#> Iteration 2: Interval [3, 3.5]
#> Iteration 3: Interval [3, 3.25]
#> Iteration 4: Interval [3.125, 3.25]
#> Iteration 5: Interval [3.125, 3.1875]
#> Iteration 6: Interval [3.125, 3.15625]
#> Iteration 7: Interval [3.140625, 3.15625]
#> Iteration 8: Interval [3.140625, 3.1484375]
#> Iteration 9: Interval [3.140625, 3.14453125]
#> Iteration 10: Interval [3.140625, 3.142578125]
#> Iteration 11: Interval [3.140625, 3.1416015625]
#> [1] 3.141602
```

It takes 11 iterations to get to a point where `sin(x)` is within 10⁻⁴
of zero. If we tightened the tolerance, had a more complicated function,
or had a less precise starting range, it might take many more iterations
to approximate a zero.
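For comparison, here is a minimal sketch of the same search using `stats::uniroot()`, the base R root-finder recommended above. The tolerance of `1e-8` is just an illustrative choice.

```r
# Find the zero of sin(x) between 3 and 4 using base R's root finder
result = stats::uniroot(sin, interval = c(3, 4), tol = 1e-8)

# The estimated zero, which should be very close to pi
result$root

# The number of iterations that uniroot() needed
result$iter
```

Unlike our toy `bisect()`, `uniroot()` also reports an estimate of its precision (`result$estim.prec`) alongside the root.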

Importantly, this is a recursive algorithm - in the last statement of
the `bisect()` function body, we call `bisect()` again. The initial call
to `bisect()` (with `interval = c(3, 4)`) has to wait until the second
call to `bisect()` (`interval = c(3, 3.5)`) completes before it can
return (which in turn has to wait for the third call to return). So we
have to wait for 11 calls to `bisect()` to complete before we get our
result.

Those function calls get placed on a computational object named the
call stack. For each function call, this stores details about how the
function was called and where from. While waiting for the first call to
`bisect()` to complete, the call stack grows to include the details
about 11 calls to `bisect()`.
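To make the growth of the call stack a little more concrete, here is a small sketch that uses base R’s `sys.nframe()`, which reports how deep the current function call sits on the stack. The `stack_depth()` helper is hypothetical and exists only for this illustration.

```r
# Hypothetical helper: recurse `n` more times, then report how deep
# the call stack is at the bottom of the recursion
stack_depth = function(n) {
  if (n == 0) {
    return(sys.nframe())
  }
  stack_depth(n - 1)
}

# Each level of ordinary recursion adds one frame to the call stack
stack_depth(10) - stack_depth(0)
#> [1] 10
```

The difference is taken so that the result doesn’t depend on how deep the stack already was when the code was run.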

Imagine our algorithm didn’t just take 11 function calls to complete, but thousands, or millions. The call stack would get really full and this would lead to a “stack overflow” error.

We can demonstrate a stack-overflow in R quite easily:

```
blow_up = function(n, max_iter) {
  if (n >= max_iter) {
    return("Finished!")
  }
  blow_up(n + 1, max_iter)
}
```

The recursive function behaves nicely when we only use a small number of iterations:

```
blow_up(1, max_iter = 100)
#> [1] "Finished!"
```

But the call-stack gets too large and the function fails when we attempt to use too many iterations. Note that we get a warning about the size of the call-stack before we actually reach its limit, so the R process can continue after exploding the call-stack.

```
blow_up(1, max_iter = 1000000)
# Error: C stack usage 7969652 is too close to the limit
```

In R 4.4, we are getting (experimental) support for tail-call recursion. This allows us (in many situations) to write recursive functions that won’t explode the size of the call stack.

How can that work? In our `bisect()` example, we still need to make 11
calls to `bisect()` to get a result that is close enough to zero, and
those 11 calls will still need to be put on the call-stack.

Remember the first call to `bisect()`? It called `bisect()` as the very
last statement in its function body. So the value returned by the
second call to `bisect()` was returned to the user without modification
by the first call. So we could return the second call’s value directly
to the user, instead of returning it via the first `bisect()` call;
indeed, we could remove the first call to `bisect()` from the call stack
and put the second call in its place. This would prevent the call stack
from expanding with recursive calls.

The key to this (in R) is to use the new `Tailcall()` function. That
tells R “you can remove me from the call stack, and put this call on
instead”. Our final line in `bisect()` should look like this:

```
bisect = function(...) {
  # ... snip ...
  Tailcall(bisect, f, new_interval, tolerance, iteration + 1, verbose)
}
```

Note that you are passing the name of the recursively-called function
into `Tailcall()`, rather than a call to that function (`bisect` rather
than `bisect(...)`).

To illustrate that the stack no longer blows up when tail-call recursion
is used, let’s rewrite our `blow_up()` function:

```
# R 4.4.0
blow_up = function(n, max_iter) {
  if (n >= max_iter) {
    return("Finished!")
  }
  Tailcall(blow_up, n + 1, max_iter)
}
```

We can still successfully use a small number of iterations:

```
blow_up(1, 100)
#> [1] "Finished!"
```

But now, even a million iterations of the recursive function can be performed:

```
blow_up(1, 1000000)
#> [1] "Finished!"
```

Note that the tail-call optimisation only works here because the
recursive call was made as the very last step in the function body. If
your function needs to modify the value after the recursive call, you
may not be able to use `Tailcall()`.
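In some cases, a function that modifies the value after the recursive call can be restructured so that the recursive call does come last. One common technique (not specific to R) is to carry the partial result along in an extra “accumulator” argument. The sketch below is only an illustration: `sum_to()`, `sum_to_tail()` and the `acc` argument are hypothetical names, and the tail-recursive version needs R >= 4.4.0.

```r
# Not tail-recursive: the result of the recursive call is modified
# (we add `n` to it) before it is returned
sum_to = function(n) {
  if (n == 0) {
    return(0)
  }
  n + sum_to(n - 1)
}

# Tail-recursive rewrite: the running total is carried along in `acc`,
# so the recursive call is the very last step
sum_to_tail = function(n, acc = 0) {
  # Evaluate `acc` now, so that unevaluated (lazy) arguments don't pile up
  force(acc)
  if (n == 0) {
    return(acc)
  }
  Tailcall(sum_to_tail, n - 1, acc + n)
}

sum_to(100)
#> [1] 5050

# Tailcall() is only available from R 4.4.0 onwards
if (getRversion() >= "4.4.0") {
  sum_to_tail(100)
}
#> [1] 5050
```

The `force(acc)` call is there because R evaluates arguments lazily; without it, the additions would be deferred until the final step of the recursion.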

### Rejecting the NULL

Missing values are everywhere.

In a typical dataset you might have missing values encoded as `NA` (if
you’re lucky) and invalid numbers encoded as `NaN`, you might have
implicitly missing rows (for example, a specific date missing from a
time series) or factor levels that aren’t present in your table. You
might even have empty vectors, or data-frames with no rows, to contend
with. When writing functions and data-science workflows where the input
data may change over time, programming defensively and handling these
kinds of edge-cases means your code will throw up fewer surprises in the
long run. You don’t want a critical report to fail because a
mathematical function you wrote couldn’t handle a missing value.
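As a small sketch of what that defensive style can look like, the hypothetical `safe_mean()` helper below handles `NA`, `NaN` and empty input explicitly. The choice to return `NA` for empty input is an assumption made for this example; your own code might prefer to raise an error.

```r
# Hypothetical helper: a mean that copes with missing and empty input
safe_mean = function(x) {
  # is.na() is TRUE for both NA and NaN, so this drops both
  x = x[!is.na(x)]
  if (length(x) == 0) {
    # Nothing left to average, so return a missing value
    return(NA_real_)
  }
  mean(x)
}

safe_mean(c(1, NA, 3))
#> [1] 2
safe_mean(c(NaN, NA))
#> [1] NA
safe_mean(numeric(0))
#> [1] NA
```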

When programming defensively with R, there is another important form of missingness to be cautious of …

The `NULL` object.

`NULL` is an actual object. You can assign it to a variable, combine it
with other values, index into it, pass it into (and return it from) a
function. You can also test whether a value is `NULL`.

```
# Assignment
my_null = NULL
my_null
#> NULL
# Use in functions
my_null[1]
#> NULL
c(NULL, 123)
#> [1] 123
c(NULL, NULL)
#> NULL
toupper(NULL)
#> character(0)
# Testing NULL-ness
is.null(my_null)
#> [1] TRUE
is.null(1)
#> [1] FALSE
identical(my_null, NULL)
#> [1] TRUE
# Note that the equality operator shouldn't be used to
# test NULL-ness:
NULL == NULL
#> logical(0)
```

R functions that are solely called for their side-effects (`write.csv()`
or `message()`, for example) often return a `NULL` value. Other
functions may return `NULL` as a valid value - one intended for
subsequent use. For example, list-indexing (which is a function call,
under the surface) will return `NULL` if you attempt to access an
undefined value:

```
config = list(user = "Russ")
# When the index is present, the associated value is returned
config$user
#> [1] "Russ"
# But when the index is absent, a `NULL` is returned
config$url
#> NULL
```

Similarly, you can end up with a `NULL` output from an incomplete stack
of `if` / `else` clauses:

```
language = "Polish"
greeting = if (language == "English") {
  "Hello"
} else if (language == "Hawaiian") {
  "Aloha"
}
greeting
#> NULL
```

A common use for `NULL` is as a default argument in a function
signature. A `NULL` default is often used for parameters that aren’t
critical to function evaluation. For example, the function signature for
`matrix()` is as follows:

```
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
```

The `dimnames` parameter isn’t really needed to create a `matrix`, but
when a non-`NULL` value for `dimnames` is provided, the values are used
to label the row and column names of the created `matrix`.

```
matrix(1:4, nrow = 2)
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4
matrix(1:4, nrow = 2, dimnames = list(c("2023", "2024"), c("Jan", "Feb")))
#>      Jan Feb
#> 2023   1   3
#> 2024   2   4
```
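The same pattern is easy to use in your own functions. Here is a minimal sketch: `label_values()` and its `labels` argument are hypothetical, and are only meant to show an optional argument that defaults to `NULL`.

```r
# Hypothetical helper: `labels` is optional, so it defaults to NULL
label_values = function(x, labels = NULL) {
  if (!is.null(labels)) {
    # Only attach names when the caller actually supplied some
    stopifnot(length(labels) == length(x))
    names(x) = labels
  }
  x
}

label_values(1:3)
#> [1] 1 2 3
label_values(1:3, labels = c("a", "b", "c"))
#> a b c
#> 1 2 3
```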

R 4.4 introduces the `%||%` operator to help when handling variables
that are potentially `NULL`. When working with variables that could be
`NULL`, you might have written code like this:

```
# Remember there is no 'url' field in our `config` list
# Set a default value for the 'url' if one isn't defined in
# the config
my_url = if (is.null(config$url)) {
  "https://www.jumpingrivers.com/blog/"
} else {
  config$url
}
my_url
#> [1] "https://www.jumpingrivers.com/blog/"
```

Assuming `config` is a `list`:

- when the `url` entry is absent from `config` (or is itself `NULL`), then `config$url` will be `NULL` and the variable `my_url` will be set to the default value;
- but when the `url` entry is found within `config` (and isn’t `NULL`) then that value will be stored in `my_url`.

That code can now be rewritten as follows:

```
# R 4.4.0
my_url = config$url %||% "https://www.jumpingrivers.com/blog/"
my_url
#> [1] "https://www.jumpingrivers.com/blog/"
```

Note that the left-hand value must evaluate to `NULL` for the right-hand
side to be evaluated, and that empty vectors aren’t `NULL`:

```
# R 4.4.0
NULL %||% 1
#> [1] 1
c() %||% 1
#> [1] 1
numeric(0) %||% 1
#> numeric(0)
```

This operator has been available in the `{rlang}` package for eight
years and is implemented in exactly the same way. So if you have been
using `%||%` in your code already, the base-R version of this operator
should work without any problems, though you may want to wait until you
are certain all your users are using R >= 4.4 before switching from
{rlang} to the base-R version of `%||%`.
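If you need code that runs on both older and newer versions of R without depending on {rlang}, one option is to define a fallback yourself. The sketch below is an assumption about how you might do that, based on the behaviour described above (return the left-hand side unless it is `NULL`); it is not copied from the base R or {rlang} sources.

```r
# Define a fallback only when base R doesn't already provide `%||%`
# (that is, on R versions before 4.4.0)
if (!exists("%||%", envir = baseenv(), inherits = FALSE)) {
  `%||%` = function(x, y) {
    if (is.null(x)) y else x
  }
}

NULL %||% "default"
#> [1] "default"
"value" %||% "default"
#> [1] "value"
```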

### Any other business

A shorthand hexadecimal format (common in web-programming) for specifying RGB colours has been introduced. So, rather than writing the 6-digit hexcode for a colour “#112233”, you can use “#123”. This only works for those 6-digit hexcodes where the digits are repeated in pairs.
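The expansion rule can be sketched in a few lines of R. The `expand_short_hex()` helper below is hypothetical (it is not part of R); it only illustrates how each shorthand digit is doubled to give the 6-digit code.

```r
# Hypothetical helper: expand a 3-digit hexcode to its 6-digit form
expand_short_hex = function(hex) {
  # Expect a '#' followed by exactly three hexadecimal digits
  stopifnot(grepl("^#[0-9A-Fa-f]{3}$", hex))
  digits = strsplit(substring(hex, 2), "")[[1]]
  # Repeat each digit, then glue the pairs back together
  paste0("#", paste0(digits, digits, collapse = ""))
}

expand_short_hex("#123")
#> [1] "#112233"
```

In R 4.4, passing `"#123"` directly to a colour-handling function such as `col2rgb()` should give the same result as passing `"#112233"`.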

Parsing and formatting of complex numbers has been improved. For
example, `as.complex("1i")` now returns the complex number `0 + 1i`;
previously it returned `NA`.

There are a few other changes related to handling `NULL` that have been
introduced in R 4.4. The changes highlight that `NULL` is quite
different from an empty vector. Empty vectors contain nothing, whereas
`NULL` represents nothing. For example, whereas an empty numeric vector
is considered to be an atomic (unnestable) data structure, `NULL` is no
longer atomic. Also, `NCOL(NULL)` (the number of columns in a matrix
formed from `NULL`) is now 0, whereas it was formerly 1.
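These differences can be checked directly at the console. The outputs shown are those described above for R 4.4; earlier versions of R will give different answers for the last two calls.

```r
# An empty numeric vector is still a (zero-length) atomic vector
is.atomic(numeric(0))
#> [1] TRUE

# From R 4.4.0, NULL is no longer considered atomic
is.atomic(NULL)
#> [1] FALSE

# From R 4.4.0, a matrix formed from NULL has zero columns
NCOL(NULL)
#> [1] 0
```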

`sort_by()` is a new function for sorting objects based on values in a
separate object. This can be used to sort a `data.frame` based on its
columns (they should be specified as a formula):

```
mtcars |> sort_by(~ list(cyl, mpg)) |> head()
##                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Volvo 142E    21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Datsun 710    22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Merc 230      22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 240D     24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
```
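For readers who aren’t on R 4.4 yet, the sketch below shows the long-standing `order()` idiom, which should give the same row ordering as the `sort_by()` call above. The `sorted` variable name is just for illustration.

```r
# Sort by `cyl`, then by `mpg` within each value of `cyl`
sorted = mtcars[order(mtcars$cyl, mtcars$mpg), ]

head(rownames(sorted), 3)
#> [1] "Volvo 142E"    "Toyota Corona" "Datsun 710"
```

The formula interface of `sort_by()` saves you from repeating the name of the `data.frame` for each sorting column.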

### Try the latest version out for yourself

To take away the pain of installing the latest development version of R,
you can use Docker. To use the `devel` version of R, you can use the
following commands:

```
docker pull rstudio/r-base:devel-jammy
docker run --rm -it rstudio/r-base:devel-jammy
```

Once R 4.4 is the released version of R and the `r-docker` repository
has been updated, you should use the following command to test out R
4.4.

```
docker pull rstudio/r-base:4.4-jammy
docker run --rm -it rstudio/r-base:4.4-jammy
```

### See also

The R 4.x versions have introduced a wealth of interesting changes. These have been summarised in our earlier blog posts: