Shreya Finds Excitement Again with Fan-Out Magic

In the last chapter, we saw how Shreya was discouraged by her system's response to one of her questions. She then returned to the drawing board to find ways to improve her system. Let's see what she discovered and whether it solved her issue.
Let me remind you, this was the question she asked:
Give me definitions, examples, plus tricky MCQs on LTI systems?
And the system responded with only definitions.
Parallel Query Retrieval / Fan-out Technique
In Parallel Query Retrieval, we create several different versions of the user's query, each focusing on a different aspect. Not clear yet? Don't worry, let's look at an example.
User’s Query:
How does garbage collection work in Python?
We'll input this query into an LLM and ask it to create multiple queries, each focusing on different aspects of the original question, such as:
What triggers garbage collection in Python?
What are the garbage collection algorithms in Python?
How do memory leaks relate to GC?
This technique is called Parallel Query Retrieval or Fan-out technique.
How will this help?
As we saw in Shreya’s case, when we directly passed her detailed query about LTI Systems to the LLM, it didn't respond well. Now, let's apply the transformation mentioned above to this question.
The actual query was:
Give me definitions, examples, plus tricky MCQs on LTI systems?
The transformed queries would be something like:
Give me definitions of LTI systems.
Give me examples of LTI systems.
Give me tricky MCQs on LTI systems.
Now, if I pass these three queries individually through the Retrieval and Generation steps learned in the previous chapter, don't you think Shreya will get a better response?
Query 1 will focus only on the definitions.
Query 2 will focus only on the examples.
Query 3 will focus only on the MCQs.
After compiling the LLM responses for all three queries, Shreya will achieve her objective.
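The idea of answering each focused sub-query separately and then compiling the results can be sketched in a few lines. Here, `answer_query` is a hypothetical stand-in for the Retrieval & Generation step from the previous chapter; it just echoes the query for illustration.

```python
def answer_query(query: str) -> str:
    # Hypothetical placeholder: in the real system this would retrieve
    # relevant chunks and generate an answer with an LLM.
    return f"[answer for: {query}]"

def fan_out_and_merge(sub_queries: list[str]) -> str:
    # Answer each focused sub-query independently, then compile
    # the pieces into one final response.
    parts = [answer_query(q) for q in sub_queries]
    return "\n\n".join(parts)

sub_queries = [
    "Give me definitions of LTI systems.",
    "Give me examples of LTI systems.",
    "Give me tricky MCQs on LTI systems.",
]
print(fan_out_and_merge(sub_queries))
```

Each sub-query gets its own focused pass, so no part of the original question is ignored.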
Understand the whole flow here:

How to Do It?
Implementing this is quite simple by following these steps:
Take the user's file input.
Perform the indexing process as described in the previous chapter.
Take the user's query.
Make an LLM call with an effective SYSTEM_PROMPT and ask it to rewrite the query into 3 (or n) queries, each focusing on a different aspect of the original query.
Follow the Retrieval & Generation steps from the last chapter for each of these queries.
Here’s a code snippet of Query Transformation:
import json

# `client` is assumed to be an OpenAI-compatible client configured for the
# Gemini API, and `retrieval_generation` is the Retrieval & Generation
# function from the previous chapter.

finalResponse = ""

def fan_out():
    global finalResponse
    finalResponse = ""  # Reset the final response for each new query

    user_query = input(">> ")

    FAN_OUT_SYSTEM_PROMPT = """
    You are a helpful assistant. You will be provided with a question and you need to generate 3 questions out of it, focusing on different aspects of it or related to it. The focus should be on what the user might be interested in but couldn't ask directly.

    Rules:
    - Follow the output JSON format.

    Example:
    User Query: How does garbage collection work in Python?
    Output: {{ "q1": "What triggers garbage collection in Python?", "q2": "Garbage collection algorithms in Python?", "q3": "How memory leaks relate to GC?" }}
    """

    response = client.chat.completions.create(
        model="gemini-2.0-flash",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": FAN_OUT_SYSTEM_PROMPT},
            {"role": "user", "content": user_query},
        ],
    )

    content = response.choices[0].message.content
    print("Fan out response:", content)

    # Parse the JSON response and extract the three sub-queries
    parsed_response = json.loads(content)
    questions = [parsed_response["q1"], parsed_response["q2"], parsed_response["q3"]]
    print("Questions:", questions)

    # Run Retrieval & Generation for each sub-query
    for question in questions:
        retrieval_generation(question)

    print("Final response:", finalResponse)
Get the full Code here…
This setup was working well for Shreya, and she was jumping on the sofa in excitement.
Issue
Did she face any issues again? Yes, her joy was short-lived, and soon she encountered another problem. She realized she was receiving a lot more content that she wasn't interested in and hadn't asked for. She was frustrated to see such long responses, even when she asked a simple question.
For example, now if she asks:
What was the most common control systems topic asked in last 5 years?
In response, she’s getting:
What was the most common control systems topic asked in last 5 years?
What are control systems and their usage?
What important topics does control systems include?
What was the most common topics asked in last 5 years? (From other subjects as well)
And she thought, this isn't a foolproof solution, and she needs to make more improvements. Let's see in the next chapter what idea she comes up with.
Shreya, facing unsatisfactory responses from her system, explores the Parallel Query Retrieval or Fan-out Technique to enhance the quality of information retrieval. This approach involves breaking down queries into multiple focused sub-queries, which individually target different aspects of the original question. For instance, a comprehensive question on LTI systems is divided into queries asking for definitions, examples, and tricky MCQs. This method initially proves effective, but eventually leads to excessive and irrelevant information. The narrative outlines Shreya's ongoing challenge to refine her system's response quality.



