API as a package: Logging

Published: September 22, 2022

tags: r, python

This is part two of our three part series

Part 1: API as a package: Structure
Part 2: API as a package: Logging (this post)
Part 3: API as a package: Testing

Part 1 of this series laid out some ideas for how one might structure a {plumber} application as an R package, inspired by solutions such as {golem} and {leprechaun} for {shiny}. In this installment of the series we look at adding some functions to our package that will take care of logging as our application runs. If you haven’t already, we recommend reading the first installment of this series as the example package created for that post will form the basis of the starting point for this one.

Getting started

There are numerous packages for logging in the R programming language; one of our favourites is {logger} as it provides a host of useful functions out of the box, yet it remains easily customisable. If you are not familiar with {logger} we highly recommend reading the articles on the packagedown site for it.

Persistant logging

For a deployed application, it is crucial that log messages are persisted to disk. When publishing on ~~RStudio~~ Posit Connect here at Jumping Rivers we like our log messages to both display in the console (as they can be quickly viewed from the web interface) and be persisted to disk for future recovery. The {logger} package makes this trivial to do using the logger::appender_tee().

To set your logger to one that both displays in the console and persists to disk on a rotating log file basis we could do something like

library(logger)

file = "log"
log_appender(appender_tee(
  # set of 20 rotating log files, each with at most 2000
  # lines in
  file, max_lines = 2000L, max_files = 20L
))

By rotating log files, we prevent the log files consuming an increasing amount of storage at the expense of overwriting older logs.

Formatting

There are numerous file formats that could be used for logging, but JSON is a decent choice as it is fairly ubiquitous. The {logger} package provides a formatter function for this purpose already logger::formatter_json() which can be set as the default formatter. This will simply capture the data sent in a log request as a list and apply jsonlite::toJSON() to it.

The formatter is responsible for converting data into something that can be used for a log message. For organising this data and generating actual records we need a layout function. {logger} does provide a logger::layout_json() however for this application we found it to be not quite perfect. This was because we wanted to be able to log bits of data that were received in requests to the API. Using the provided layout function we found that this would stringify the JSON object received in a POST request, for example, and made it more difficult to do any processing of log files afterwards. For example

logger::log_formatter(logger::formatter_json)
logger::log_layout(logger::layout_json(fields = c("level", "msg")))
logger::log_appender(logger::appender_stdout)

x = "foo"
y = "bar"
z = 42
logger::log_info(x, y = y)
#> {"level":"INFO","msg":"{\"1\":\"foo\",\"y\":\"bar\"}"}
logger::log_info(request = "meaning of life", response = z)
#> {"level":"INFO","msg":"{\"request\":\"meaning of life\",\"response\":42}"}

To solve this we wrote a slightly modified version which allows arbitrary objects to be passed through to the logger. {logger} has some handy information on writing customisations for your logger objects.

layout_json_custom = function(fields = c("time")) {
  force(fields)
  # structure to match the logger documented requirements
  # for custom layout functions
  structure(
    function(
      level, msg, namespace = NA_character_,
      .logcall = sys.call(), .topcall = sys.call(-1),
      .topenv = parent.frame()
    ) {
      json = logger::get_logger_meta_variables(
        log_level = level, namespace = namespace,
        .logcall = .logcall, .topcall = .topcall,
        .topenv = .topenv
      )
      # take the message data passed in by the 
      # formatter and convert back to a list
      data = jsonlite::fromJSON(msg)
      sapply(
        msg, function(msg) {
          # reformat the output
          jsonlite::toJSON(c(json[fields], data), auto_unbox = TRUE)
        }
      )
  }, generator = deparse(match.call()))
}

Notice now how data in the log request is a top level field of the JSON, rather than stringified under msg.

logger::log_layout(layout_json_custom(fields = "level"))
logger::log_info(x, y = y)
#> {"level":"INFO","1":"foo","y":"bar"}
logger::log_info(request = "meaning of life", response = z)
#> {"level":"INFO","request":"meaning of life","response":42}

A setup function then

Following on from all of the above we could add a function to our API as a package.

#' Set up default logger
#'
#' Creates a rotating file log using json format, see
#' \link[logger]{appender_file} for details.
#'
#' @param dir directory path for logs
#' @export
setup_logger = function(dir = "./API_logs") {
  if (! dir.exists(dir)) dir.create(dir)
  f = normalizePath(path.expand(file.path(dir, "log")))
  logger::log_formatter(logger::formatter_json)
  logger::log_layout(layout_json_custom(
    fields = c(
      "time", "level", "ns", "ns_pkg_version", "r_version"
    )
  ))
  logger::log_appender(logger::appender_tee(
    f, max_lines = 2000L, max_files = 20L
  ))
}

We would then just stick a call to setup_logger() at the top of our entrypoint script for running the API.

Automatic logging with hooks

We can leverage {plumber} hooks to add some automatic logging at various points in the lifecycle of a request.

For example, we might be interested in logging all incoming requests and the data sent with it. As a point of note the data being received by your application for, say, a POST request, is a stream. This means that once it has been read once you need to set the stream back to the beginning to be able to read again, otherwise subsequent attempts to read the data (for example when {plumber} passes the request to the function handling a particular endpoint) will return nothing.

So we write a function to parse data in a request object

# extract request data from req environment
parse_req_data = function(req) {
  # if POST we will have content_length > 0
  if ((is.null(req$CONTENT_LENGTH)) || as.integer(req$CONTENT_LENGTH) == 0L) {
    return(NULL)
  }
  # rewind first as well, it seems plumber does not rewind the stream
  req$rook.input$rewind()
  data = rawToChar(req$rook.input$read())
  # rewind the stream before passing it on
  # as req is env (pass by reference)
  # we need to do this to ensure the stream is available for
  # internal plumber methods/functions.
  req$rook.input$rewind()
  data = jsonlite::fromJSON(data)
  data
}

and a function that will take the place of a hook for the pre-route stage of a request.

pre_route = function(req, res) {
  data = parse_req_data(req)
  logger::log_info(
    method = req$REQUEST_METHOD, path = req$PATH_INFO,
    origin = req$REMOTE_ADDR,
    data = data
  )
}

With all that in place we could create a simple function for adding the hooks to our plumber API

#' default hooks for API
#'
#' Adds a default set of hooks (currently only
#' pre route) to the API. These hooks will be used for
#' logging interactions with the running api.
#'
#' @param api a Plumber api object
#' @return a Plumber api object with added hooks
#' @export
add_default_hooks = function(api) {
  plumber::pr_hooks(
    api, list(
      preroute = pre_route
    )
  )
}

If you have been following the series, our entrypoint script would now look like

# updated for base R pipe
setup_logger()
get_internal_routes() |>
  create_routes() |>
  generate_api() |>
  add_default_hooks()

This version of the cookieCutter package can be found at our Github blog repository.

That’s it for this installment, there are other things you might choose to log automatically in a running application at different stages of the request lifecycle or indeed in other parts of your code base, but in the interest of keeping this to a manageable length, we will conclude here. The next installment in this series will look at some of the things we learned about testing {plumber} APIs and their functions within the context of {testthat} tests in an R package. Stay tuned!