Understanding HyDE: A Guide to Hypothetical Document Embeddings

In the previous section, we saw that Shreya didn’t get the expected response when she asked:

Explain the difference between waveguide and coaxial cable in practical applications.

The system returned partial matches or generic definitions—not the crisp, real-world comparison she expected.

HyDE – Hypothetical Document Embeddings

Shreya realized that the issue wasn’t the retrieval model or the LLM. It was that her query was too real-world, and her dataset was full of exam-oriented phrasing.

This is where Hypothetical Document Embeddings (HyDE) came to the rescue.

Instead of searching the vector database with the raw user query, HyDE first asks the LLM to generate a “document”—a short, hypothetical paragraph that might resemble the ideal answer to the question. Then that generated paragraph is embedded and used for retrieval.

Steps

Here are the steps followed in this approach:

  1. Take the user's query as input.

  2. Provide it to an LLM and ask it to write a Document on the topic.

  3. Use this document to perform a similarity_search.

  4. Retrieve the chunks from the similarity_search in Step-3 and provide them to the LLM along with the user's original query.

  5. Return the response given by the LLM to the user.
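The five steps above can be sketched end-to-end in pure Python. This is a minimal, runnable illustration with stand-in functions (the function names and the word-overlap "retriever" are my own placeholders, not part of the article's stack); the real implementation with LangChain and Qdrant follows later.

```python
# Minimal sketch of the HyDE pipeline. The LLM and retriever here are
# deliberately simple stand-ins so the flow of the five steps is clear.

def llm_write_document(query: str) -> str:
    # Step 2 stand-in: an LLM call that drafts a hypothetical document.
    return f"Technical overview: {query}"

def similarity_search(document: str, corpus: list[str], k: int = 2) -> list[str]:
    # Step 3 stand-in: rank corpus chunks by word overlap with the document.
    doc_words = set(document.lower().split())
    scored = sorted(corpus, key=lambda c: -len(doc_words & set(c.lower().split())))
    return scored[:k]

def llm_answer(query: str, chunks: list[str]) -> str:
    # Steps 4-5 stand-in: answer the original query using retrieved context.
    return f"Answer to '{query}' using {len(chunks)} chunks."

def hyde_pipeline(user_query: str, corpus: list[str]) -> str:
    hypothetical_doc = llm_write_document(user_query)     # Step 2
    chunks = similarity_search(hypothetical_doc, corpus)  # Step 3
    return llm_answer(user_query, chunks)                 # Steps 4-5

corpus = [
    "Waveguide transmission exhibits low attenuation at microwave frequencies.",
    "Coaxial cable supports broadband signals with a solid dielectric.",
    "Node.js uses an event loop for asynchronous I/O.",
]
print(hyde_pipeline("waveguide vs coaxial cable", corpus))
```

Note that the final answer is generated from the user's original query, not from the hypothetical document; the generated document is used only for retrieval.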

Why will it work?

Before answering that, let's recall the actual issue from the previous section that prevented retrieval from working.

The user's query was phrased in casual, real-world language (with some broken English), while her documents were full of technical phrases and jargon. When the system applied similarity_search directly to the user's query, the matching chunks it returned were poor, which led to a lower-quality response from the LLM.

Now, instead of directly using the user's query for similarity_search, we ask an LLM to write a document on the topic. The document created by the LLM will include all the technical phrases and jargon used in the industry. So, when we perform similarity_search on this document, the matching documents will be much more accurate and will cover the topic thoroughly. This ultimately leads to a better response from the LLM.
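A toy example makes the intuition concrete. Using plain word overlap as a crude stand-in for embedding similarity (the actual system compares dense vectors, and the sentences below are made up for illustration), a jargon-rich hypothetical document shares far more vocabulary with a textbook chunk than the raw query does:

```python
# A textbook-style chunk from the indexed documents (made-up example).
chunk = set("waveguide offers low attenuation and high power handling at microwave frequencies".split())

# The user's casual, real-world query.
raw_query = set("why waveguide better than coax cable in real life".split())

# A jargon-rich hypothetical document an LLM might draft for that query.
hypothetical = set("waveguide provides low attenuation and high power handling at microwave frequencies unlike coaxial cable".split())

print(len(raw_query & chunk))     # overlap of raw query with the chunk
print(len(hypothetical & chunk))  # overlap of hypothetical doc with the chunk
```

The hypothetical document lands much closer to the chunk's vocabulary, which is exactly why embedding it (rather than the raw query) retrieves better matches.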

How to implement it?

If you have followed the series this far, implementing this should not be a big challenge. Still, here's the full code:

import os
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_qdrant import QdrantVectorStore
from openai import OpenAI

GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")

def load_and_split_documents(pdf_path):
    """Load PDF and split into chunks"""
    loader = PyPDFLoader(file_path=pdf_path)
    docs = loader.load()

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200
    )

    split_docs = text_splitter.split_documents(docs)

    print("Number of documents before splitting:", len(docs))
    print("Number of documents after splitting:", len(split_docs))

    return split_docs

def setup_vector_store(split_docs, embedder):
    """Initialize vector store with documents"""
    vector_store = QdrantVectorStore.from_documents(
        documents=split_docs,
        url="http://localhost:6333",
        collection_name="learning_langchain",
        embedding=embedder
    )
    return vector_store

def generate_document(client, user_query):
    """Break out the user query into multiple smaller steps"""
    GENERATE_DOCUMENT_SYSTEM_PROMPT = """
    You are a helpful assistant. You will be provided with a question and you need to write a proper document on the topics included in it. Use proper technical phrases and terms used in the related industry. 
    """

    response = client.chat.completions.create(
        model="gemini-1.5-flash",
        messages=[
            {"role": "system", "content": GENERATE_DOCUMENT_SYSTEM_PROMPT},
            {
                "role": "user",
                "content": user_query
            }
        ]
    )
    content = response.choices[0].message.content
    print("Generate Document response:", content)

    return content

def similarity_search(vector_store, query):
    """Perform similarity search for a given query"""
    relevant_chunks = vector_store.similarity_search(query=query)
    return relevant_chunks

def retrieval_generation(client, query, context_docs):
    """Generate an answer based on query and context"""
    # Format context from documents
    context = "\n\n".join([doc.page_content for doc in context_docs])
    print(context)

    GENERATION_SYSTEM_PROMPT = f"""
    You are a helpful assistant. You will be provided with a question and relevant context filtered according to user's query. 
    Your task is to provide a concise answer based on the context.

    Context: {context}
    """

    response = client.chat.completions.create(
        model="gemini-2.0-flash",
        messages=[
            {"role": "system", "content": GENERATION_SYSTEM_PROMPT},
            {"role": "user", "content": query}
        ]
    )
    return response.choices[0].message.content

def main():
    # Initialize components
    pdf_path = Path("./nodejs.pdf")
    split_docs = load_and_split_documents(pdf_path)

    embedder = GoogleGenerativeAIEmbeddings(
        model="models/text-embedding-004",
        google_api_key=GOOGLE_API_KEY,
    )

    vector_store = setup_vector_store(split_docs, embedder)

    # Create client for chatting
    client = OpenAI(
        api_key=GOOGLE_API_KEY,
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
    )

    # Main interaction loop
    while True:
        user_query = input(">> ")
        if user_query.lower() in ["exit", "quit", "q"]:
            break

        # Generate a hypothetical document for the query
        content = generate_document(client, user_query)

        # Retrieve chunks using the hypothetical document, then answer
        # the user's original query with that context (Steps 3-5)
        relevant_chunks = similarity_search(vector_store, content)
        print(f"{len(relevant_chunks)} relevant chunks found.")
        final_generation = retrieval_generation(client, user_query, relevant_chunks)
        print(f"Final Answer: {final_generation}")


if __name__ == "__main__":
    main()

In this article, I explore the use of Hypothetical Document Embeddings (HyDE) to improve document retrieval and information extraction from large datasets, especially when real-world queries differ significantly from the technical jargon in the dataset. By generating a hypothetical document that matches the technical tone of industry-standard language, HyDE improves the accuracy of similarity searches, leading to more relevant retrieval and better responses from language models. The article includes a detailed breakdown of the steps in this process and an implementation using Python, LangChain, Qdrant, and Google's Gemini models via the OpenAI-compatible API.

RAGs

Part 1 of 5

In this series, we’ll walk through the practical and technical aspects of building a RAG pipeline, with code examples and real-world use cases. Our anchor example will be a project called TalkToPDF, a tool that lets you “chat” with your PDFs.

Up next

Chain of Thoughts rescue Shreya

In the previous section, we saw that although Shreya made a good improvement in her system and it worked well for a few prompts, it still struggled with complex tasks. In such cases, the LLM was hallucinating and not performing well. Chain of Thought...