Chain of Thoughts rescue Shreya


In the previous section, we saw that although Shreya's improvements worked well for simpler prompts, her system still struggled with complex tasks. On those, the LLM hallucinated and performed poorly.

Chain of Thoughts / Less Abstract Query Transformation Technique

As Shreya sat at her drawing board, pondering a solution, she remembered her mother's advice:

Break complex problems into smaller subproblems and solve them one by one.

Shreya had an idea and felt confident it might work. Here's her plan:

The question she had asked last night was:

Trace how digital logic topics expanded in the last five years.

What if the LLM takes her mother's advice seriously and uses this approach? For example, the question above can be broken down into these steps:

  • Identify syllabus changes per year

  • Summarize each trend

  • Stitch them into a timeline

After breaking the task into subproblems, she was fully confident that her system would be able to perform it perfectly.

In short, the main idea is to break the query into multiple, less abstract subqueries so the LLM can better understand its task.

Let’s look at another example from Google's white paper. It suggests breaking down the prompt

Think Machine Learning

into:

  • First, think about the machine.

  • Next, think about learning.

  • Finally, think about machine learning.

Let's explore her approach in detail using the flow diagram she created:

Here are the steps she wants her system to follow:

  1. Take the user's query as input.

  2. Give the query to the LLM and ask it to break it down into smaller subproblems or steps that can be solved easily.

  3. Perform the following steps sequentially.

  4. Take the first step given by the LLM and run the Retrieval and Generation stages just as in the previous sections. You can use Fan-out, Reciprocal Rank Fusion, or even a simple generation technique. Call the resulting generation G1.

  5. Take the second step given by the LLM, append G1 to it, and pass it through the same Retrieval and Generation stages as in step 4. Call the result G2.

  6. Follow the same pattern for all remaining steps: append G2 to Step 3, G3 to Step 4, and so on.

  7. Finally, pass the last generation, Gn, to the LLM along with the user's original query to produce the final generation.

  8. This response can then be directly provided to the user.
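The steps above can be sketched as a short loop. This is only an illustrative sketch: `retrieve` and `generate` are hypothetical stand-ins for the similarity-search and LLM-generation calls built in earlier sections, stubbed out here so the control flow is visible.

```python
# Illustrative sketch of the chained pipeline. `retrieve` and `generate`
# are hypothetical stubs standing in for the real similarity search and
# LLM calls from the earlier sections.

def retrieve(query):
    # Hypothetical: similarity search over the vector store.
    return f"<chunks for: {query}>"

def generate(query, context):
    # Hypothetical: LLM generation grounded in the retrieved chunks.
    return f"answer({query})"

def chain_of_thought_rag(user_query, steps):
    generation = ""  # no prior generation before the first step
    for step in steps:  # steps produced by the query-breaking LLM call
        # Append the previous generation (G1, G2, ...) to the next step
        enriched = f"{step}\n\nContext from previous step:\n{generation}"
        chunks = retrieve(enriched)
        generation = generate(enriched, chunks)
    # Final polish: pass Gn together with the user's original query
    return generate(user_query, generation)
```

With real retrieval and generation plugged in, `steps` would come from a query-breaking call like the one implemented later in this chapter.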

Why will it work?

Some of you might be wondering why this unusual approach would even work. Let's use Shreya's query as an example to understand it better.

The query was:

Trace how digital logic topics expanded in the last five years.

Since this query wasn't simple enough to just find some chunks from a database, analyze a few paragraphs, and return an answer, it required a lot of computation before reaching a conclusion.

Let’s assume that, during the query-breaking phase, the LLM broke the query down into these steps:

  1. Identify syllabus changes per year.

  2. Summarize each trend.

  3. Stitch them into a timeline.

Step-1 (Identify syllabus changes per year.)

Let's start with the first step: Identify syllabus changes per year. When this query is processed through the similarity_search and generation steps, don't you think that, with the accuracy Shreya's system has achieved so far, it will be able to answer this efficiently? Yes.

Step-2 (Summarize each trend.)

After successfully completing Step-1, the Generation has gathered all the data on how the syllabus has changed over the years. Now, if that data is provided along with this step’s query, which is Summarize each trend, don't you think the LLM will effectively summarize it and provide a clear response? Absolutely.

Step-3 (Stitch them into a timeline.)

After successfully completing Step-2, the Generation has collected all the data on the syllabus changes and how these trends have developed. With this information, the LLM can certainly create a timeline. Do you agree?

After finishing this step, we have fully contextual raw data, refined through several filtering passes. Now we just need to polish it according to the user’s original query, and that’s exactly what we do.

Pass the Generation of the final step along with the user’s original query to the LLM and return the response to the user.

How to do it?

Implementing this is quite simple if you've been following the series up to this point. Here is the code snippet for breaking the query into smaller steps:

import json


def generate_steps(client, user_query):
    """Break the user query into multiple smaller, sequential steps"""
    GENERATE_STEPS_SYSTEM_PROMPT = """
    You are a helpful assistant. You will be provided with a question and you need to break it into 3 simpler & sequential steps to solve the problem. What steps do you think would be best to solve the problem?

    Rules:
    - Follow the output JSON format.
    - The `content` in output JSON must be a list of steps.

    Example:
    User Query: How to handle file-uploads on server?
    Output: { "type": "steps", "content": ["Accept file from req.files. Take help of multer to do that.", "Upload file to the S3 bucket or any other db and take out public url", "Store that public url in actual database"] }
    """

    response = client.chat.completions.create(
        model="gemini-1.5-flash",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": GENERATE_STEPS_SYSTEM_PROMPT},
            {
                "role": "user",
                "content": user_query
            }
        ]
    )
    content = response.choices[0].message.content
    print("Query Breaker response:", content)

    # Parse the JSON response
    parsed_response = json.loads(content)

    # Extract the steps
    steps = parsed_response["content"]
    print("Generated steps:", steps)

    return steps

Get the full code here…

Issue

Shreya was on a roll. With her system now able to answer complex queries using fan-out retrieval and even create thoughtful summaries through chain-of-thought prompts, she felt almost unstoppable.

One evening, while revising Electromagnetics, she typed:

“Explain the difference between waveguide and coaxial cable in practical applications.”

To her surprise, the system returned partial matches or generic definitions—not the crisp, real-world comparison she expected.

This made her realize that the job isn't done yet! She'll be back in the next chapter with a possible solution.

Shreya encountered limitations in her language model system when faced with complex tasks. Inspired by her mother's advice, she developed a method to break down these tasks into smaller, manageable subproblems. Her approach involves using a less abstract query transformation technique to enhance the model's comprehension and performance. By iteratively processing each subproblem, Shreya's system aims to deliver a polished final response. Although she made significant progress, an issue with generating specific comparisons highlighted the ongoing challenge of refining the system's capabilities.

RAGs

Part 2 of 5

In this series, we’ll walk through the practical and technical aspects of building a RAG pipeline, with code examples and real-world use cases. Our anchor example will be a project called TalkToPDF, a tool that lets you “chat” with your PDFs.
