Vector DB 文本生成
本文档介绍如何使用 LangChain 在矢量索引上进行文本生成。这对于我们希望生成能够利用大量自定义文本的文本非常有用,例如生成具有对先前编写的博客文章的理解的博客文章,或者可以参考产品文档的产品教程。
准备数据
首先,我们准备数据。在这个例子中,我们从 Github 获取了一个由 Markdown 文件组成的文档网站,并将它们拆分为足够小的文档。
from langchain.llms import OpenAI
from langchain.docstore.document import Document
import requests
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.prompts import PromptTemplate
import pathlib
import subprocess
import tempfile
def get_github_docs(repo_owner, repo_name):
with tempfile.TemporaryDirectory() as d:
subprocess.check_call(
f"git clone --depth 1 https://github.com/{repo_owner}/{repo_name}.git .",
cwd=d,
shell=True,
)
git_sha = (
subprocess.check_output("git rev-parse HEAD", shell=True, cwd=d)
.decode("utf-8")
.strip()
)
repo_path = pathlib.Path(d)
markdown_files = list(repo_path.glob("*/*.md")) + list(
repo_path.glob("*/*.mdx")
)
for markdown_file in markdown_files:
with open(markdown_file, "r") as f:
relative_path = markdown_file.relative_to(repo_path)
github_url = f"https://github.com/{repo_owner}/{repo_name}/blob/{git_sha}/{relative_path}"
yield Document(page_content=f.read(), metadata={"source": github_url})
sources = get_github_docs("yirenlu92", "deno-manual-forked")
source_chunks = []
splitter = CharacterTextSplitter(separator=" ", chunk_size=1024, chunk_overlap=0)
for source in sources:
for chunk in splitter.split_text(source.page_content):
source_chunks.append(Document(page_content=chunk, metadata=source.metadata))
Cloning into '.'...
设置 Vector DB
现在我们将文档内容分块后,让我们将所有这些信息放入一个矢量索引中,以便轻松检索。
search_index = Chroma.from_documents(source_chunks, OpenAIEmbeddings())
设置带有自定义提示的 LLM Chain
接下来,让我们设置一个简单的 LLM Chain,但为其提供一个用于博客文章生成的自定义提示。请注意,自定义提示是参数化的,并接受两个输入:context
,它将是从矢量搜索中获取的文档,以及topic
,它由用户提供。
from langchain.chains import LLMChain
prompt_template = """Use the context below to write a 400 word blog post about the topic below:
Context: {context}
Topic: {topic}
Blog post:"""
PROMPT = PromptTemplate(
template=prompt_template, input_variables=["context", "topic"]
)
llm = OpenAI(temperature=0)
chain = LLMChain(llm=llm, prompt=PROMPT)
生成文本
最后,我们编写一个函数来将输入应用于链条。该函数接受一个输入参数 topic
。我们在矢量索引中找到与该 topic
对应的文档,并将它们作为我们简单 LLM Chain 中的额外上下文。
def generate_blog_post(topic):
docs = search_index.similarity_search(topic, k=4)
inputs = [{"context": doc.page_content, "topic": topic} for doc in docs]
print(chain.apply(inputs))
generate_blog_post("environment variables")
[{'text': '\n\nEnvironment variables are a great way to store and access sensitive information in your Deno applications. Deno offers built-in support for environment variables with `Deno.env`, and you can also use a `.env` file to store and access environment variables.\n\nUsing `Deno.env` is simple. It has getter and setter methods, so you can easily set and retrieve environment variables. For example, you can set the `FIREBASE_API_KEY` and `FIREBASE_AUTH_DOMAIN` environment variables like this:\n\n```ts\nDeno.env.set("FIREBASE_API_KEY", "examplekey123");\nDeno.env.set("FIREBASE_AUTH_DOMAIN", "firebasedomain.com");\n\nconsole.log(Deno.env.get("FIREBASE_API_KEY")); // examplekey123\nconsole.log(Deno.env.get("FIREBASE_AUTH_DOMAIN")); // firebasedomain.com\n```\n\nYou can also store environment variables in a `.env` file. This is a great'}, {'text': '\n\nEnvironment variables are a powerful tool for managing configuration settings in a program. They allow us to set values that can be used by the program, without having to hard-code them into the code. This makes it easier to change settings without having to modify the code.\n\nIn Deno, environment variables can be set in a few different ways. The most common way is to use the `VAR=value` syntax. This will set the environment variable `VAR` to the value `value`. This can be used to set any number of environment variables before running a command. For example, if we wanted to set the environment variable `VAR` to `hello` before running a Deno command, we could do so like this:\n\n```\nVAR=hello deno run main.ts\n```\n\nThis will set the environment variable `VAR` to `hello` before running the command. We can then access this variable in our code using the `Deno.env.get()` function. For example, if we ran the following command:\n\n```\nVAR=hello && deno eval "console.log(\'Deno: \' + Deno.env.get(\'VAR'}, {'text': '\n\nEnvironment variables are a powerful tool for developers, allowing them to store and access data without having to hard-code it into their applications. In Deno, you can access environment variables using the `Deno.env.get()` function.\n\nFor example, if you wanted to access the `HOME` environment variable, you could do so like this:\n\n```js\n// env.js\nDeno.env.get("HOME");\n```\n\nWhen running this code, you\'ll need to grant the Deno process access to environment variables. This can be done by passing the `--allow-env` flag to the `deno run` command. You can also specify which environment variables you want to grant access to, like this:\n\n```shell\n# Allow access to only the HOME env var\ndeno run --allow-env=HOME env.js\n```\n\nIt\'s important to note that environment variables are case insensitive on Windows, so Deno also matches them case insensitively (on Windows only).\n\nAnother thing to be aware of when using environment variables is subprocess permissions. Subprocesses are powerful and can access system resources regardless of the permissions you granted to the Den'}, {'text': '\n\nEnvironment variables are an important part of any programming language, and Deno is no exception. Deno is a secure JavaScript and TypeScript runtime built on the V8 JavaScript engine, and it recently added support for environment variables. This feature was added in Deno version 1.6.0, and it is now available for use in Deno applications.\n\nEnvironment variables are used to store information that can be used by programs. They are typically used to store configuration information, such as the location of a database or the name of a user. In Deno, environment variables are stored in the `Deno.env` object. This object is similar to the `process.env` object in Node.js, and it allows you to access and set environment variables.\n\nThe `Deno.env` object is a read-only object, meaning that you cannot directly modify the environment variables. Instead, you must use the `Deno.env.set()` function to set environment variables. This function takes two arguments: the name of the environment variable and the value to set it to. For example, if you wanted to set the `FOO` environment variable to `bar`, you would use the following code:\n\n```'}]