Qdrant
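Qdrant (read: quadrant) is a vector similarity search engine. It provides a convenient API to store, search, and manage points, that is, vectors with an additional payload, and it is built with extended filtering support. This makes Qdrant useful for all sorts of neural-network or semantic-based matching, faceted search, and other applications.
This document shows how to use functionality related to the Qdrant vector database.
Qdrant has several modes of running, and the details differ slightly depending on the one you choose. The options include:
- Local mode, no server required
- On-premise server deployment
- Qdrant Cloud
See the installation instructions.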
!pip install qdrant-client
We want to use OpenAIEmbeddings, so we need to get an OpenAI API key.
import os
import getpass
os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
OpenAI API Key: ········
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Qdrant
from langchain.document_loaders import TextLoader
loader = TextLoader('../../../state_of_the_union.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
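Connecting to Qdrant from LangChain
Local mode
The Python client lets you run the same code in local mode, without running a Qdrant server. This is convenient for testing and debugging, or when you plan to store only a small number of vectors. The embeddings can be kept entirely in memory or persisted on disk.
In-memory
For some testing scenarios and quick experiments, you may prefer to keep all the data in memory only, so that it is lost when the client is destroyed, usually at the end of your script or notebook.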
qdrant = Qdrant.from_documents(
    docs, embeddings,
    location=":memory:",  # Local mode with in-memory storage only
    collection_name="my_documents",
)
On-disk storage
Local mode, without a Qdrant server, can also store your vectors on disk so that they persist between runs.
qdrant = Qdrant.from_documents(
    docs, embeddings,
    path="/tmp/local_qdrant",
    collection_name="my_documents",
)
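On-premise server deployment
Whether you launch Qdrant locally in a Docker container or deploy it on Kubernetes with the official Helm chart, the way you connect to the instance is identical. You need to provide a URL pointing to the service.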
url = "<---qdrant url here --->"
qdrant = Qdrant.from_documents(
    docs, embeddings,
    url=url, prefer_grpc=True,  # pass the URL as a keyword argument
    collection_name="my_documents",
)
Qdrant Cloud
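If you prefer not to manage the infrastructure yourself, you can set up a fully managed Qdrant cluster on Qdrant Cloud. A free-forever 1GB cluster is included for trying it out. The main difference when using the managed version of Qdrant is that you need to provide an API key, which keeps your deployment from being publicly accessible.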
url = "<---qdrant cloud cluster url here --->"
api_key = "<---api key here--->"
qdrant = Qdrant.from_documents(
    docs, embeddings,
    url=url, prefer_grpc=True, api_key=api_key,  # URL passed as a keyword argument
    collection_name="my_documents",
)
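Reusing the same collection
Both the Qdrant.from_texts and Qdrant.from_documents methods are a great way to start using Qdrant with LangChain, but they destroy the existing collection and create it from scratch! If you want to reuse an existing collection, you can create a Qdrant instance yourself and pass it a QdrantClient instance with the connection details.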
del qdrant
import qdrant_client
client = qdrant_client.QdrantClient(
    path="/tmp/local_qdrant", prefer_grpc=True
)
qdrant = Qdrant(
    client=client, collection_name="my_documents",
    embeddings=embeddings
)
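Similarity search
The simplest scenario for using the Qdrant vector store is to perform a similarity search. Under the hood, the query is encoded with the embedding_function and then used to find similar documents in the Qdrant collection.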
query = "What did the president say about Ketanji Brown Jackson"
found_docs = qdrant.similarity_search(query)
print(found_docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
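Similarity search with score
Sometimes we want to run a search and also obtain a relevance score that tells us how good a particular result is. With the cosine metric, the returned score is the cosine similarity between the query and the document, so a higher score means a closer match.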
query = "What did the president say about Ketanji Brown Jackson"
found_docs = qdrant.similarity_search_with_score(query)
document, score = found_docs[0]
print(document.page_content)
print(f"\nScore: {score}")
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
Score: 0.8153784913324512
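Cosine similarity and cosine distance are two views of the same quantity (distance = 1 - similarity), which makes a raw score easy to misread. The following is a plain-Python sketch with made-up 3-dimensional vectors, not Qdrant's implementation; the vector values and document names are assumptions chosen only to show how each number ranks results.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings; real embeddings have far more dimensions.
query_vec = [1.0, 0.0, 0.0]
doc_vecs = {
    "doc_a": [1.0, 0.0, 0.0],  # same direction as the query
    "doc_b": [1.0, 1.0, 0.0],  # 45 degrees away from the query
    "doc_c": [0.0, 1.0, 0.0],  # orthogonal to the query
}

scores = {name: cosine_similarity(query_vec, vec) for name, vec in doc_vecs.items()}

# Rank by similarity, best match first (equivalently: smallest distance first).
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, similarity in ranked:
    print(f"{name}: similarity={similarity:.3f}, distance={1 - similarity:.3f}")
```

Sorting by descending similarity and sorting by ascending distance give the same order, so what matters when reading a score is knowing which of the two the store reports.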
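Metadata filtering
Qdrant has an extensive filtering system with rich type support. Filters can also be used in LangChain by passing an additional parameter to both the similarity_search_with_score and similarity_search methods.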
from qdrant_client.http import models as rest
query = "What did the president say about Ketanji Brown Jackson"
found_docs = qdrant.similarity_search_with_score(query, filter=rest.Filter(...))
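The filter body is elided above. As a rough illustration of what an exact-match filter does conceptually, here is a plain-Python sketch that keeps only the payloads whose fields all match. It assumes the payload layout used in this document (a page_content key and a nested metadata key), so metadata fields are addressed with dotted keys such as metadata.source. The helper names get_nested and must_match, and the sample payloads, are hypothetical and not part of qdrant-client.

```python
def get_nested(payload, dotted_key):
    """Resolve a dotted key such as 'metadata.source' inside a nested dict."""
    value = payload
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # the key path does not exist in this payload
        value = value[part]
    return value

def must_match(payloads, conditions):
    """Keep payloads where every (dotted_key, expected_value) pair matches."""
    return [
        p for p in payloads
        if all(get_nested(p, key) == expected for key, expected in conditions.items())
    ]

# Hypothetical payloads in the page_content / metadata layout.
payloads = [
    {"page_content": "first chunk", "metadata": {"source": "a.txt", "page": 1}},
    {"page_content": "second chunk", "metadata": {"source": "b.txt", "page": 1}},
    {"page_content": "third chunk", "metadata": {"source": "a.txt", "page": 2}},
]

selected = must_match(payloads, {"metadata.source": "a.txt", "metadata.page": 2})
print([p["page_content"] for p in selected])
```

In a real query the vector store applies the filter for you, so only points that satisfy it are scored and returned.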
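Maximum marginal relevance search (MMR)
If you want to look up some similar documents but also want diverse results, MMR is the method to consider. Maximal marginal relevance optimizes for similarity to the query and for diversity among the selected documents.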
query = "What did the president say about Ketanji Brown Jackson"
found_docs = qdrant.max_marginal_relevance_search(query, k=2, fetch_k=10)
for i, doc in enumerate(found_docs):
    print(f"{i + 1}.", doc.page_content, "\n")
1. Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
2. We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together.
I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera.
They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun.
Officer Mora was 27 years old.
Officer Rivera was 22.
Both Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers.
I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves.
I’ve worked on these issues a long time.
I know what works: Investing in crime preventionand community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.
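The trade-off between relevance and diversity can be sketched as a greedy selection loop. This is a minimal plain-Python illustration of the general MMR idea, not LangChain's or Qdrant's code: the function names, the lambda_mult weighting parameter, and the toy 2-dimensional vectors are assumptions made for the example. The candidate list plays the role of the larger pool (fetch_k above) from which k results are picked.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query_vec, candidates, k=2, lambda_mult=0.5):
    """Greedily pick k candidate indices, trading relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for idx in remaining:
            relevance = cosine_similarity(query_vec, candidates[idx])
            # Redundancy: similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine_similarity(candidates[idx], candidates[s]) for s in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = idx, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected

query_vec = [1.0, 0.0]
candidates = [
    [1.0, 0.1],   # 0: very close to the query
    [1.0, 0.12],  # 1: near-duplicate of candidate 0
    [0.6, -0.8],  # 2: less relevant, but points in a different direction
]

picked = mmr(query_vec, candidates, k=2, lambda_mult=0.5)
print(picked)  # [0, 2]: the near-duplicate is skipped in favor of a different document
```

With lambda_mult=1.0 the redundancy term vanishes and the same call returns [0, 1], which is a plain similarity ranking.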
Qdrant as a retriever
Qdrant, like all the other vector stores, can act as a LangChain retriever, using cosine similarity.
retriever = qdrant.as_retriever()
retriever
VectorStoreRetriever(vectorstore=<langchain.vectorstores.qdrant.Qdrant object at 0x7fc4e5720a00>, search_type='similarity', search_kwargs={})
You can also specify MMR as the search strategy instead of similarity.
retriever = qdrant.as_retriever(search_type="mmr")
retriever
VectorStoreRetriever(vectorstore=<langchain.vectorstores.qdrant.Qdrant object at 0x7fc4e5720a00>, search_type='mmr', search_kwargs={})
query = "What did the president say about Ketanji Brown Jackson"
retriever.get_relevant_documents(query)[0]
Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'})
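Customizing Qdrant
Qdrant stores your vector embeddings along with an optional JSON-like payload. Payloads are optional, but since LangChain assumes the embeddings are generated from documents, the context data is kept so that you can retrieve the original text as well.
By default, your document is stored in the following payload structure: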
{
    "page_content": "Lorem ipsum dolor sit amet",
    "metadata": {
        "foo": "bar"
    }
}
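You can, however, choose different keys for the page content and the metadata. This is useful if you already have a collection that you would like to reuse. You can always change the names of the keys.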
Qdrant.from_documents(
    docs, embeddings,
    location=":memory:",
    collection_name="my_documents_2",
    content_payload_key="my_page_content_key",
    metadata_payload_key="my_meta",
)
<langchain.vectorstores.qdrant.Qdrant at 0x7fc4e2baa230>
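To make the effect of the two key options concrete, here is a small plain-Python sketch of how the stored payload changes shape. It is an illustration only: build_payload and read_payload are hypothetical helpers, not LangChain functions, and the sample text and metadata are the placeholder values used earlier.

```python
def build_payload(page_content, metadata,
                  content_key="page_content", metadata_key="metadata"):
    """Wrap a document's text and metadata under configurable payload keys."""
    return {content_key: page_content, metadata_key: metadata}

def read_payload(payload, content_key="page_content", metadata_key="metadata"):
    """Recover (text, metadata) from a payload; missing metadata becomes {}."""
    return payload[content_key], payload.get(metadata_key) or {}

# Default layout.
default_payload = build_payload("Lorem ipsum dolor sit amet", {"foo": "bar"})

# Layout with the custom key names used in the call above.
custom_payload = build_payload(
    "Lorem ipsum dolor sit amet", {"foo": "bar"},
    content_key="my_page_content_key", metadata_key="my_meta",
)

print(default_payload)
print(custom_payload)

# Reading back requires the same key names that were used when writing.
text, meta = read_payload(custom_payload, "my_page_content_key", "my_meta")
```

The practical consequence is that whoever reads the collection later has to use the same key names, which is why the options matter when reusing a collection created elsewhere.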