Retrieval-Augmented Generation: Setting up a Knowledge Store in R

Author: Myles Mitchell

Published: January 8, 2026

tags: r, large-language-models, llm, artificial-intelligence, ai, retrieval-augmented-generation, rag, ellmer, ragnar

Happy New Year from the team at Jumping Rivers!

Now that we’re well into the second half of the 2020s, it’s a good time to reflect on the changes we have seen so far this decade. In the world of data science, nothing has dominated headlines quite like the rapid growth and uptake of generative artificial intelligence (GenAI).

Large language models (LLMs) such as ChatGPT, Claude and Gemini have incredible potential to streamline day-to-day tasks, whether that’s processing vast amounts of information, providing a human-like chat interface for customers or generating code. But they also come with notable risks if not harnessed responsibly.


Anyone who has interacted with these models is likely to have come across hallucination, where the model confidently presents false information as though it were factually correct. This can happen for a variety of reasons:

  • LLMs often have no access to real-time information: how would a model that was trained last year know today’s date?
  • The training data may be missing domain-specific information: can we really trust an off-the-shelf model to have a good understanding of pharmaceuticals and medicinal drugs?
  • The model may be over-eager to come across as intelligent, so it decides to provide a confident output rather than a more nuanced, honest answer.

Often we need to give the model access to additional contextual information before we can make it “production-ready”. We can achieve this using a retrieval-augmented generation (RAG) workflow. In this blog post we will explore the steps involved and set up an example RAG workflow using free and open source packages in R.

What is RAG?

In a typical interaction with an LLM we have:

  • A user prompt: the text that is submitted by the user.

  • A response: the text that is returned by the LLM.

  • (optional) A system prompt: additional instructions for how the LLM should respond (for example, "You respond in approximately 10 words or less").

In a RAG workflow we provide access to an external knowledge store which can include text-based documents and webpages. Additional contextual info is then retrieved from the knowledge store (hence “retrieval”) and added to the user prompt before it is sent. In doing so we can expect to receive a higher quality output.

How does it work?

Before going further, we must first introduce the concept of vectorisation.

Contrary to what you might believe, LLMs do not understand non-numerical text! They are mathematical models, meaning they can only ingest and output numerical vectors.

So how can a user interact with a model using plain English? The trick is that mappings exist which are able to convert between numerical vectors and text. These mappings are called “vector embeddings” and are used to convert the user prompt into a vector representation before it is passed to the LLM.

So, when setting up our RAG knowledge store, we have to store the information using a compatible vector representation. With this in mind, let’s introduce a typical RAG workflow:

  1. Content: we decide which documents to include in the knowledge store.
  2. Extraction: we extract the text from these documents in Markdown format.
  3. Chunking: the Markdown content is split into contextual “chunks” (for example, each section or subsection of a document might become a chunk).
  4. Vectorisation: the chunks are “vectorised” (i.e. we convert them into a numerical vector representation).
  5. Index: we create an index for our knowledge store which will be used to retrieve relevant chunks of information.
  6. Retrieval: we register the knowledge store with our model interface. Now, when a user submits a prompt, it will be combined with relevant chunks of information before it is ingested by the model.

At the retrieval step, a matching algorithm is typically used so that only highly relevant chunks are retrieved from the knowledge store. In this way, we are able to keep the size of the user prompts (and any incurred costs) to a minimum.
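To give a feel for the matching step, here is a toy sketch (not {ragnar}’s actual internals) of how cosine similarity might score two chunks against a query. The three-dimensional “embedding” vectors below are made up purely for illustration; real embeddings have hundreds or thousands of dimensions.

# Made-up "embeddings" for one query and two candidate chunks
query  = c(0.9, 0.1, 0.2)
chunk1 = c(0.8, 0.2, 0.1)  # points in a similar direction to the query
chunk2 = c(0.1, 0.9, 0.7)  # points elsewhere

# Cosine similarity: values near 1 indicate similar direction (high relevance)
cosine_similarity = function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

cosine_similarity(query, chunk1)  # high: chunk1 would be retrieved
cosine_similarity(query, chunk2)  # low: chunk2 would be left out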

Setting up a RAG workflow in R

We will make use of two packages which are available to install via the Comprehensive R Archive Network (CRAN). Both are actively maintained by Posit (formerly RStudio) and are free to install and use.


{ragnar}

The {ragnar} package provides functions for extracting information from both text-based documents and webpages, along with vector embeddings that are compatible with popular LLM providers including OpenAI and Google.

We will use {ragnar} to build our knowledge store.

{ellmer}

The {ellmer} package allows us to interact with a variety of LLM APIs from R. A complete list of supported model providers can be found in the package documentation.

Note that, while {ellmer} is free to install and use, you will still need to set up an API token with your preferred model provider before you can interact with any models. We will use the free Google Gemini tier for our example workflow. See the Gemini API documentation for instructions on creating an API key, and the {ellmer} documentation for authenticating with your API key from R.
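As a minimal sketch of that setup, you could supply the key via an environment variable. We are assuming here that {ellmer} (and, through it, {ragnar}) reads the GOOGLE_API_KEY variable; check the documentation for the package versions you are using.

# Set the key for the current R session only (the variable name is an
# assumption; confirm against the {ellmer} docs for your version)
Sys.setenv(GOOGLE_API_KEY = "your-api-key-here")

# For a persistent setup, add the line GOOGLE_API_KEY=your-api-key-here to
# your .Renviron file instead, for example via usethis::edit_r_environ()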

Example RAG workflow

We begin by loading the {ragnar} package.

library("ragnar")

The URL provided below links to the title page of the “Efficient R Programming” textbook, written by Robin Lovelace and our very own Colin Gillespie. We’re going to use a couple of chapters from the book to construct a RAG knowledge store.

url = "https://csgillespie.github.io/efficientR/"

Let’s use {ragnar} to read the contents of this page into a Markdown format.

md = read_as_markdown(url)

We could vectorise this information as it is, but first we should split it up into contextual chunks.

chunks = markdown_chunk(md)
chunks
#> # @document@origin: https://csgillespie.github.io/efficientR/
#> # A tibble:         2 × 4
#>   start   end context                                text                      
#> * <int> <int> <chr>                                  <chr>                     
#> 1     1  1572 ""                                     "# Efficient R programmin…
#> 2   597  2223 "# Welcome to Efficient R Programming" "## Authors\n\n[Colin Gil…

The chunks are stored in a tibble format, with one row per chunk. The text column stores the chunk text (in the interests of saving space we have only included the start of each chunk in the printed output above).

The title page has been split into two chunks and we can see that there is significant overlap (chunk 1 spans characters 1 to 1572 and chunk 2 spans characters 597 to 2223). Overlapping chunks are perfectly normal and provide added context about where each chunk sits relative to its neighbours.

Note that you can visually inspect the chunks by running ragnar_chunks_view(chunks).

It’s time to build our knowledge store with a vector embedding that is appropriate for Google Gemini models.

# Initialise a knowledge store with the Google Gemini embedding
store = ragnar_store_create(
  embed = embed_google_gemini()
)

# Insert the Markdown chunks
ragnar_store_insert(store, chunks)

The Markdown chunks are automatically converted into a vector representation at the insertion step. It is important to use the appropriate vector embedding when we create the store. A knowledge store created using an OpenAI embedding will not be compatible with Google Gemini models!
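For example, if you wanted a store aimed at OpenAI models instead, the same pattern applies with a different embedding function. This is just an illustrative sketch (and assumes you have OpenAI credentials configured); we stick with Gemini for the rest of this post.

# Hypothetical alternative: a store built with an OpenAI embedding
openai_store = ragnar_store_create(
  embed = embed_openai()
)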

Before we can retrieve information from our store, we must create a store index.

ragnar_store_build_index(store)

We can now test the retrieval capabilities of our knowledge store using the ragnar_retrieve() function. For example, to retrieve any chunks relevant to the text Who are the authors of “Efficient R Programming”? we can run:

relevant_knowledge = ragnar_retrieve(
  store,
  text = "Who are the authors of \"Efficient R Programming\"?"
)
relevant_knowledge
#> # A tibble: 1 × 9
#>   origin        doc_id chunk_id start   end cosine_distance bm25  context text 
#>   <chr>          <int> <list>   <int> <int> <list>          <lis> <chr>   <chr>
#> 1 https://csgi…      1 <int>        1  2223 <dbl [2]>       <dbl> ""      "# E…

Note that the backslashes in \"Efficient R Programming\" are escape characters, allowing literal double quotes to be included inside the character string.

Without going into too much detail, the cosine_distance and bm25 columns in the returned tibble provide information relating to the matching algorithm used to identify the chunks. The other columns relate to the location and content of the chunks.

From the output tibble we see that the full content of the title page (characters 1 to 2223) has been returned. This is because the original two chunks both contained information about the authors.

Let’s add a more technical chapter from the textbook to the knowledge store. The URL provided below links to Chapter 7 (“Efficient Optimisation”); we will extract its contents, insert them into the store and rebuild the index.

url = "https://csgillespie.github.io/efficientR/performance.html"

# Extract Markdown content and split into chunks
chunks = url |>
  read_as_markdown() |>
  markdown_chunk()

# Add the chunks to the knowledge store
ragnar_store_insert(store, chunks)

# Rebuild the store index
ragnar_store_build_index(store)

Now that our knowledge store includes content from both the title page and Chapter 7, let’s ask something more technical, like What are some good practices for parallel computing in R?.

relevant_knowledge = ragnar_retrieve(
  store,
  text = "What are some good practices for parallel computing in R?"
)
relevant_knowledge
#> # A tibble: 4 × 9
#>   origin        doc_id chunk_id start   end cosine_distance bm25  context text 
#>   <chr>          <int> <list>   <int> <int> <list>          <lis> <chr>   <chr>
#> 1 https://csgi…      1 <int>        1  2223 <dbl [2]>       <dbl> ""      "# E…
#> 2 https://csgi…      2 <int>        1  1536 <dbl [1]>       <dbl> ""      "# 7…
#> 3 https://csgi…      2 <int>    22541 23995 <dbl [1]>       <dbl> "# 7 E… "## …
#> 4 https://csgi…      2 <int>    23996 26449 <dbl [2]>       <dbl> "# 7 E… "The…

Four chunks have been returned:

  • One chunk from the title page of the textbook.

  • One chunk from the start of Chapter 7.

  • Two chunks from Section 7.5 (“Parallel Computing”).

It makes sense that we have chunks from Section 7.5, which appears to be highly relevant to the question. By including the title page and the start of Chapter 7, the LLM will also have access to useful metadata in case the user wants to find out where the model is getting its information from.

Now that we have built and tested our retrieval tool, it’s time to connect it up to a Gemini interface using {ellmer}. The code below will create a chat object allowing us to send user prompts to Gemini.

chat = ellmer::chat_google_gemini(
  system_prompt = "You answer in approximately 10 words or less."
)

A system prompt has been included here to ensure a succinct response from the model API.

We can register this chat interface with our retrieval tool.

ragnar_register_tool_retrieve(chat, store)

To check if our RAG workflow has been set up correctly, let’s chat with the model.

chat$chat("What are some good practices for parallel computing in R?")
#> Use the `parallel` package, ensure you stop clusters with `stopCluster()` (or 
#> `on.exit()`), and utilize `parLapply()`, `parApply()`, or `parSapply()`.

The output looks plausible. Just to make sure, let’s check where the model found out this information.

chat$chat("Where did you get that answer from?")
#> I retrieved the information from "Efficient R programming" by Colin Gillespie 
#> and Robin Lovelace.

Success! The LLM has identified the name of the textbook, and if we wanted to, we could even ask about the specific chapter. A user interacting with our model interface could now search online for this textbook to fact-check the responses.

In the example workflow above, we manually selected a couple of chapters from the textbook to include in our knowledge store. It’s worth noting that you can also use the ragnar_find_links(url) function to retrieve a list of links from a given webpage.

Doing so for the title page will provide the links to all chapters.

ragnar_find_links("https://csgillespie.github.io/efficientR/")
#>  [1] "https://csgillespie.github.io/efficientR/"                                  
#>  [2] "https://csgillespie.github.io/efficientR/building-the-book-from-source.html"
#>  [3] "https://csgillespie.github.io/efficientR/collaboration.html"                
#>  [4] "https://csgillespie.github.io/efficientR/data-carpentry.html"               
#>  [5] "https://csgillespie.github.io/efficientR/hardware.html"                     
#>  [6] "https://csgillespie.github.io/efficientR/index.html"                        
#>  [7] "https://csgillespie.github.io/efficientR/input-output.html"                 
#>  [8] "https://csgillespie.github.io/efficientR/introduction.html"                 
#>  [9] "https://csgillespie.github.io/efficientR/learning.html"                     
#> [10] "https://csgillespie.github.io/efficientR/performance.html"                  
#> [11] "https://csgillespie.github.io/efficientR/preface.html"                      
#> [12] "https://csgillespie.github.io/efficientR/programming.html"                  
#> [13] "https://csgillespie.github.io/efficientR/references.html"                   
#> [14] "https://csgillespie.github.io/efficientR/set-up.html"                       
#> [15] "https://csgillespie.github.io/efficientR/workflow.html"

You could then iterate through these links, extracting the contents from each webpage and inserting these into your RAG knowledge store. Just note, however, that including additional information in your store will likely increase the amount of text being sent to the model, which could raise costs. You should therefore think about what information is actually relevant for your LLM application.
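As a rough sketch, reusing only the functions from the workflow above, building a store covering every linked chapter might look like this:

# Collect the links from the title page
links = ragnar_find_links("https://csgillespie.github.io/efficientR/")

# Create a fresh knowledge store with the Gemini embedding
full_store = ragnar_store_create(embed = embed_google_gemini())

# Extract, chunk and insert the contents of each page
for (link in links) {
  chunks = link |>
    read_as_markdown() |>
    markdown_chunk()
  ragnar_store_insert(full_store, chunks)
}

# Build the index once, after all of the chunks have been inserted
ragnar_store_build_index(full_store)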

Summary

In summary, we have introduced the concept of retrieval-augmented generation for LLM-powered workflows and built an example workflow in R using open source packages.

Before finishing, we are excited to announce that our new course “LLM-Driven Applications with R & Python” has just been added to our training portfolio. You can search for it here.

If you’re interested in practical AI-driven workflows, we would love to see you at our upcoming AI In Production 2026 conference which is running from 4-5 June in Newcastle-Upon-Tyne. If you would like to present a talk or workshop, please submit your abstracts before the deadline on 23 January.

