时间加权的向量存储器
该检索器结合了语义相似性和时间衰减来进行检索。
其评分算法为:
semantic_similarity + (1.0 - decay_rate) ** hours_passed
需要注意的是,hours_passed
指的是自检索器中的对象上次访问以来经过的小时数,而不是自创建以来经过的小时数。这意味着经常访问的对象保持“新鲜”。
import faiss
from datetime import datetime, timedelta
from langchain.docstore import InMemoryDocstore
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain.schema import Document
from langchain.vectorstores import FAISS
低衰减率
较低的 decay rate
(在这里,为了极端情况,我们将其设置接近0)意味着记忆将被“记住”更长时间。 decay rate
为0意味着记忆永远不会被遗忘,使得该检索器等效于向量查找。
# Define your embedding model
embeddings_model = OpenAIEmbeddings()
# Initialize the vectorstore as empty
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})
retriever = TimeWeightedVectorStoreRetriever(vectorstore=vectorstore, decay_rate=.0000000000000000000000001, k=1)
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents([Document(page_content="hello world", metadata={"last_accessed_at": yesterday})])
retriever.add_documents([Document(page_content="hello foo")])
['d7f85756-2371-4bdf-9140-052780a0f9b3']
# "Hello World" is returned first because it is most salient, and the decay rate is close to 0., meaning it's still recent enough
retriever.get_relevant_documents("hello world")
[Document(page_content='hello world', metadata={'last_accessed_at': datetime.datetime(2023, 5, 13, 21, 0, 27, 678341), 'created_at': datetime.datetime(2023, 5, 13, 21, 0, 27, 279596), 'buffer_idx': 0})]
高衰减率
当衰减率较高(例如,多个9)时,recency score
快速降至0!如果将其设为1,所有对象的 recency
都为0,这又使得它等效于向量查找。
# Define your embedding model
embeddings_model = OpenAIEmbeddings()
# Initialize the vectorstore as empty
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})
retriever = TimeWeightedVectorStoreRetriever(vectorstore=vectorstore, decay_rate=.999, k=1)
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents([Document(page_content="hello world", metadata={"last_accessed_at": yesterday})])
retriever.add_documents([Document(page_content="hello foo")])
['40011466-5bbe-4101-bfd1-e22e7f505de2']
# "Hello Foo" is returned first because "hello world" is mostly forgotten
retriever.get_relevant_documents("hello world")
[Document(page_content='hello foo', metadata={'last_accessed_at': datetime.datetime(2023, 4, 16, 22, 9, 2, 494798), 'created_at': datetime.datetime(2023, 4, 16, 22, 9, 2, 178722), 'buffer_idx': 1})]
虚拟时间
使用LangChain中的一些工具,您可以模拟时间组件。
from langchain.utils import mock_now
import datetime
# Notice the last access time is that date time
with mock_now(datetime.datetime(2011, 2, 3, 10, 11)):
print(retriever.get_relevant_documents("hello world"))
[Document(page_content='hello world', metadata={'last_accessed_at': MockDateTime(2011, 2, 3, 10, 11), 'created_at': datetime.datetime(2023, 5, 13, 21, 0, 27, 279596), 'buffer_idx': 0})]