Reciprocal Rank Fusion Aids Shreya


In the previous section, we saw that Shreya faced a problem. When she asked a simple question expecting a straightforward answer, she was overwhelmed with too much information and responses she didn't request. To recap, the question was:

What was the most common control systems topic asked in last 5 years?

The response she received included:

  • What was the most common control systems topic asked in last 5 years?

  • What are control systems and what are their uses?

  • What important topics does control systems include?

  • What was the most common topics asked in last 5 years? (From other subjects as well)

Now she has started looking for an algorithm to improve this, and she thinks she has found one.

Reciprocal Rank Fusion

In the previous section, we transformed the user's query and retrieved documents from the vector database using similarity search. Unlike before, where we dumped all the matched documents into the LLM, we will now rank the documents based on how often they appear across the transformed queries' results and how early they appear in each ranked list.

Here's the flowchart of the architecture. Let's go through this step by step:

Reciprocal Rank Fusion Flowchart

As you can see in the flowchart, everything before the retrieval step is exactly the same as in the previous section: we take the user's query as input and transform it into several other queries.

Retrieval Step

The retrieval step is similar to the previous section, with a few minor changes highlighted in the flowchart. When we pass a query to our similarity_search method, the function might return multiple chunks because different parts of the PDF may relate to the query. The different chunks found by the function are shown in various colors in the diagram, such as Red, Green, Yellow, and Blue. We will refer to them by their color in the following paragraphs.

In this technique, I have also included the user's original query in the similarity_search calls, which is generally a good idea.

Now, based on how often each chunk appears across the similarity_search results and how early it appears in them, each chunk is assigned a Reciprocal Rank Fusion score (rrf_score). Only the higher-ranked chunks are sent to the LLM, while the rest are ignored. The threshold rrf_score for including documents in the LLM call is determined by the developer according to the use case.
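To make the retrieval step concrete, here is a minimal sketch. The similarity_search function below is a stub standing in for your vector store's search method, and the chunk names mirror the colors in the flowchart; both are assumptions for illustration only.

```python
def similarity_search(query):
    # Hypothetical store: maps each query to an ordered list of chunk IDs,
    # most similar first. In a real system this would call your vector DB.
    fake_store = {
        "original query": ["Red", "Green", "Blue"],
        "transformed query 1": ["Green", "Red", "Yellow"],
        "transformed query 2": ["Green", "Blue"],
    }
    return fake_store.get(query, [])

# Include the user's original query alongside the transformed ones
queries = ["original query", "transformed query 1", "transformed query 2"]

# One ranked list of chunks per query; these lists feed the RRF step
rankings = [similarity_search(q) for q in queries]
print(rankings)
```

Each inner list is already ordered by similarity, which is exactly the rank information the fusion step needs.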

Ranking Algorithm

The method for calculating rrf_score can vary depending on the use case. Here, I will explain the most common approach that is generally used.

What’s our goal?

  1. Favor documents that appear more frequently.

  2. Favor documents that appear earlier (higher) in the lists.

Instead of just counting how many times a document appears or averaging ranks, this approach uses a formula that gives more weight to documents that appear early in any list.

The formula is:

$$\text{Score}(d) = \sum_{i=1}^{n} \frac{1}{k + \text{rank}_{i,d}}$$

  • rank_{i,d} is the rank of document d in list i (starting from 1).

  • k is a constant to make sure that scores aren't dominated by rank 1s (usually k=60 is used).

  • If a document doesn't appear in a list, it contributes zero.

Let’s take an example to better understand this approach:

Assume we have 3 query results:

Query 1: [A, B, C, D]
Query 2: [B, C, E]
Query 3: [C, A, F]

Let’s use k=60.

The k keeps the impact of one very early rank from being too extreme (so rank 1 isn’t 1.0 but more like 0.016).
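You can see the dampening effect directly by comparing a rank-1 contribution with and without the constant:

```python
k = 60

# Without the constant, a rank-1 appearance would contribute a full 1.0,
# letting a single list dominate the fused score
print(1 / (0 + 1))

# With k = 60, the same rank-1 appearance contributes only about 0.016,
# so a document needs to appear well across several lists to score high
print(round(1 / (k + 1), 3))
```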

We now calculate scores:

For document A:

  • Appears in Query 1 at rank 1 → 1 / (60 + 1) = 1/61

  • Appears in Query 3 at rank 2 → 1 / (60 + 2) = 1/62

  • Total: ~0.0164 + ~0.0161 ≈ 0.0325

For document C:

  • Appears in Query 1 at rank 3 → 1 / (60 + 3) = 1/63

  • Appears in Query 2 at rank 2 → 1 / (60 + 2) = 1/62

  • Appears in Query 3 at rank 1 → 1 / (60 + 1) = 1/61

  • Total: ~0.0159 + ~0.0161 + ~0.0164 ≈ 0.0484

Similarly, we can do this for all documents and sort them by total score.
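We can sanity-check the hand calculations above with a couple of lines of Python:

```python
k = 60

# Document A appears at rank 1 (Query 1) and rank 2 (Query 3)
score_A = 1 / (k + 1) + 1 / (k + 2)

# Document C appears at rank 3 (Query 1), rank 2 (Query 2), rank 1 (Query 3)
score_C = 1 / (k + 3) + 1 / (k + 2) + 1 / (k + 1)

print(round(score_A, 4))  # 0.0325
print(round(score_C, 4))  # 0.0484
```

As expected, C outscores A because it appears in all three lists and once at rank 1.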

Once all scores are computed, we keep only the best documents by filtering out those whose score falls below a chosen threshold.

Now that we have a good understanding of the ranking algorithm, let's see how we can implement this in code:

from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60, threshold=0.0):
    rrf_scores = defaultdict(float)

    # Iterate over each ranking list
    for ranking in rankings:
        # enumerate from 1 so `rank` matches the 1-based rank in the formula
        for rank, doc_id in enumerate(ranking, start=1):
            # RRF contribution for this appearance: 1 / (k + rank)
            rrf_scores[doc_id] += 1 / (k + rank)

    # Filter by threshold and sort by score in descending order
    filtered = [(doc_id, score) for doc_id, score in rrf_scores.items() if score >= threshold]
    return sorted(filtered, key=lambda x: x[1], reverse=True)

# Example input: 3 ranked lists of documents from different queries
rankings = [
    ["A", "B", "C", "D"],  # Query 1
    ["B", "C", "E"],        # Query 2
    ["C", "A", "F"]         # Query 3
]

# Apply RRF to combine rankings and filter based on threshold
result = reciprocal_rank_fusion(rankings, k=60, threshold=0.02) # Adjust threshold as needed

# Print the results
print("Ranked Documents based on Reciprocal Rank Fusion:")
for doc, score in result:
    print(f"Document: {doc}, RRF Score: {score:.4f}")

The above code can be used to filter out chunks based on their rrf_score.

Generation Step

After filtering the most relevant chunks, the process is the same as we've been doing since chapter 1. Simply insert those chunks into the LLM along with the user’s query, and the LLM will handle the rest. Then, return its response to the user.
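As a sketch of that generation step, here is one way the final prompt might be assembled before the LLM call. The build_prompt helper and the template wording are illustrative assumptions, not part of the original code; the actual LLM call is omitted.

```python
def build_prompt(chunks, user_query):
    # Join the top-ranked chunks into a single context block and
    # append the user's original question; the exact template is up to you
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}"
    )

# Chunks that survived the rrf_score threshold (placeholder text)
top_chunks = ["Chunk about PID controllers...", "Chunk about stability analysis..."]

prompt = build_prompt(
    top_chunks,
    "What was the most common control systems topic asked in last 5 years?",
)
print(prompt)
```

The resulting string is what you would pass to your LLM client, exactly as in the earlier chapters.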

Click here to get the full code…

Issue

After doing this, Shreya was over the moon because she thought her system was now super powerful and could answer any type of query. She played with it for a few minutes, but her smile vanished when she asked:

Trace how digital logic topics expanded in the last five years.

The system didn't respond as she expected. However, since she is brave and a great problem-solver, she wasn't disheartened and went back to the drawing board to improve the system further.

Let's see what she did in the next chapter.

In this article, we explore a solution to improve the retrieval and ranking of documents in response to user queries using the Reciprocal Rank Fusion (RRF) algorithm. Shreya's original problem was receiving overwhelming responses to her query about common control systems topics over the last five years. To address this, the RRF algorithm was introduced to rank documents based on their frequency and order of appearance across transformed queries. This process filters out less relevant documents before sending them to a language model for a concise response. Despite initially feeling triumphant with the implementation, Shreya encountered limitations when the system failed to adequately trace the expansion of digital logic topics over five years, prompting further refinement to enhance the system's capabilities.

RAGs

Part 3 of 5

In this series, we’ll walk through the practical and technical aspects of building a RAG pipeline, with code examples and real-world use cases. Our anchor example will be a project called TalkToPDF, a tool that lets you “chat” with your PDFs.
