Hypothetical Document Embeddings (HyDE)

This document shows how to use Hypothetical Document Embeddings (HyDE); see this paper for details.

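At a high level, HyDE is an embedding technique that takes a query, generates a hypothetical answer to it, embeds that generated document, and uses the resulting embedding as the final representation of the query.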
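The flow described above can be sketched without any external library. This is only an illustration of the idea, not the LangChain implementation: `hyde_embed_query`, `fake_generate`, and `fake_embed` are hypothetical names, and the toy "embedding" (counts of a few letters) stands in for a real embedding model.

```python
from typing import Callable, List


def hyde_embed_query(
    query: str,
    generate: Callable[[str], str],
    embed: Callable[[str], List[float]],
) -> List[float]:
    """Embed a hypothetical answer to `query` instead of the query itself."""
    hypothetical_doc = generate(query)  # step 1: write a plausible answer
    return embed(hypothetical_doc)      # step 2: embed that answer


def fake_generate(query: str) -> str:
    # Hypothetical stand-in for an LLM call; always returns the same text.
    return "The Taj Mahal is located in Agra, India."


def fake_embed(text: str) -> List[float]:
    # Hypothetical stand-in for an embedding model: counts of 'a', 'g', 'r'.
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in "agr"]


vector = hyde_embed_query("Where is the Taj Mahal?", fake_generate, fake_embed)
```

The returned vector describes the generated answer, not the original question, which is the point of the technique: an answer-shaped text tends to sit closer to the stored passages than a short question does.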

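To use HyDE, we need to provide a base embedding model and an LLMChain that can generate those documents. By default, the HyDE class comes with some default prompts (see the paper for details), but we can also create our own.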

from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import LLMChain, HypotheticalDocumentEmbedder
from langchain.prompts import PromptTemplate
base_embeddings = OpenAIEmbeddings()
llm = OpenAI()
# Load with `web_search` prompt
embeddings = HypotheticalDocumentEmbedder.from_llm(llm, base_embeddings, "web_search")
# Now we can use it as any embedding class!
result = embeddings.embed_query("Where is the Taj Mahal?")

Multiple generations

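We can also generate multiple documents and then combine their embeddings. By default, the embeddings are combined by averaging. To do this, we change the LLM used to generate documents so that it returns multiple results.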

multi_llm = OpenAI(n=4, best_of=4)
embeddings = HypotheticalDocumentEmbedder.from_llm(multi_llm, base_embeddings, "web_search")
result = embeddings.embed_query("Where is the Taj Mahal?")
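To make the averaging step concrete, here is a minimal sketch of an element-wise mean over several document embeddings. The function below is written for illustration and is an assumption about the behavior described in the text, not a copy of the library's code; the input vectors are made-up values.

```python
from typing import List


def combine_embeddings(embeddings: List[List[float]]) -> List[float]:
    """Return the element-wise mean of equal-length vectors."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(vec) != dim for vec in embeddings):
        raise ValueError("all embeddings must have the same length")
    count = len(embeddings)
    return [sum(vec[i] for vec in embeddings) / count for i in range(dim)]


# Four made-up embeddings, one per generated document (mirrors n=4 above).
doc_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
combined = combine_embeddings(doc_vectors)
```

Averaging smooths out the quirks of any single generation, at the cost of extra LLM calls per query.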

Using a custom prompt

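Besides the preconfigured prompts, we can easily build our own prompt and use it in the LLMChain that generates the documents. This can be useful when we know the domain of the query, since we can condition the prompt to generate text that more closely resembles that domain.

In the example below, we condition it to generate text about a State of the Union address (because we will use it in the next example).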

prompt_template = """Please answer the user's question about the most recent state of the union address
Question: {question}
Answer:"""
prompt = PromptTemplate(input_variables=["question"], template=prompt_template)
llm_chain = LLMChain(llm=llm, prompt=prompt)
embeddings = HypotheticalDocumentEmbedder(llm_chain=llm_chain, base_embeddings=base_embeddings)
result = embeddings.embed_query("What did the president say about Ketanji Brown Jackson")
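To see roughly what text the LLM receives, the template can be filled in by hand. This sketch uses plain Python `str.format` and assumes the prompt's `{question}` placeholder is substituted in the same way; it does not call LangChain.

```python
prompt_template = """Please answer the user's question about the most recent state of the union address
Question: {question}
Answer:"""

# Substitute the placeholder manually to preview the final prompt text.
filled = prompt_template.format(
    question="What did the president say about Ketanji Brown Jackson"
)
print(filled)
```

The prompt ends with "Answer:", so the model's completion is the hypothetical document that then gets embedded.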

Using HyDE

Now that we have HyDE, we can use it like any other embedding class! Here it is used to find similar passages in the State of the Union example.

from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

with open("../../state_of_the_union.txt") as f:
    state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)
docsearch = Chroma.from_texts(texts, embeddings)

query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query)
Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.
print(docs[0].page_content)
In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. 

We cannot let this happen. 

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
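Conceptually, the similarity search ranks stored passage vectors by how close they are to the HyDE query vector. The brute-force sketch below uses cosine similarity on made-up two-dimensional vectors; it is an assumption for illustration only, and the metric and indexing that Chroma actually uses may differ. `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], passage_vecs: List[List[float]], k: int = 1) -> List[int]:
    """Return indices of the k passages most similar to the query vector."""
    ranked = sorted(
        range(len(passage_vecs)),
        key=lambda i: cosine_similarity(query_vec, passage_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Made-up passage embeddings and a made-up HyDE query embedding.
passages = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
query_vec = [0.5, 0.9]
best = top_k(query_vec, passages, k=2)
```

With HyDE, `query_vec` would be the embedding of the generated answer, so the ranking favors passages that read like that answer.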
Contributors: 刘强