Custom LLM Agent

This notebook walks through how to create a custom LLM agent.

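An LLM agent consists of four parts:

PromptTemplate: the prompt template that instructs the language model on what to do.
LLM: the language model that powers the agent.
stop sequence: instructs the LLM to stop generating as soon as this string is found.
OutputParser: determines how to parse the LLM output into an AgentAction or AgentFinish object.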

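The LLM agent is used in an AgentExecutor. The AgentExecutor can largely be thought of as a loop that:

Passes the user input and any previous steps to the agent (in this case, the LLM agent).
If the agent returns an AgentFinish, returns that directly to the user.
If the agent returns an AgentAction, uses it to call a tool and get an Observation.
Repeats, passing the AgentAction and Observation back to the agent until an AgentFinish is emitted.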
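The loop described above can be sketched in plain Python. This is only an illustration of the control flow, not the AgentExecutor implementation: `Action`, `Finish`, `run_loop`, `toy_plan`, and `toy_tools` are hypothetical stand-ins invented for this sketch, and the iteration cap is an assumption added so the loop always terminates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Action:  # hypothetical stand-in for AgentAction
    tool: str
    tool_input: str
    log: str = ""


@dataclass
class Finish:  # hypothetical stand-in for AgentFinish
    return_values: Dict[str, str]
    log: str = ""


Step = Tuple[Action, str]  # (action, observation)


def run_loop(
    plan: Callable[[str, List[Step]], Union[Action, Finish]],
    tools: Dict[str, Callable[[str], str]],
    user_input: str,
    max_iterations: int = 5,  # assumption: a cap so the sketch cannot loop forever
) -> str:
    steps: List[Step] = []
    for _ in range(max_iterations):
        # Pass the user input and all previous steps to the agent.
        decision = plan(user_input, steps)
        if isinstance(decision, Finish):
            # A finish decision goes straight back to the user.
            return decision.return_values["output"]
        # Otherwise call the chosen tool and record the observation.
        observation = tools[decision.tool](decision.tool_input)
        steps.append((decision, observation))
    raise RuntimeError("agent did not finish within max_iterations")


def toy_plan(user_input: str, steps: List[Step]) -> Union[Action, Finish]:
    # Toy agent: search once, then finish with the observation.
    if not steps:
        return Action(tool="Search", tool_input=user_input)
    return Finish(return_values={"output": f"Answer: {steps[-1][1]}"})


toy_tools = {"Search": lambda query: f"result for {query!r}"}
answer = run_loop(toy_plan, toy_tools, "population of Canada")
print(answer)
```

In the real setup, the role of `toy_plan` is played by the prompt, the LLM, and the output parser together.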

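AgentAction is a response that consists of action and action_input. action refers to which tool to use, and action_input refers to the input to that tool (in the code below these are the tool and tool_input fields). log can also be provided as additional context, which is useful for logging, tracing, and so on.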

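AgentFinish is a response that contains the final message to be sent back to the user. It should be used to end an agent run.

In this notebook we walk through how to create a custom LLM agent, step by step.

Set up environment

Do the necessary imports, etc.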

from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser
from langchain.prompts import StringPromptTemplate
from langchain import OpenAI, SerpAPIWrapper, LLMChain
from typing import List, Union
from langchain.schema import AgentAction, AgentFinish
import re

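Set up tools

Set up any tools the agent may want to use. These may need to be described in the prompt (so that the agent knows when to use them).

# Define which tools the agent can use to answer user queries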
search = SerpAPIWrapper()
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to answer questions about current events"
    )
]

Prompt Template

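This prompt template instructs the agent on what to do. Generally, the template should incorporate:

tools: which tools the agent has access to, and how and when to call them.
intermediate_steps: tuples of previous (AgentAction, Observation) pairs. These are generally not passed directly to the model; instead, the prompt template formats them in a specific way.
input: the generic user input.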

# Set up the base template
template = """Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin! Remember to speak as a pirate when giving your final answer. Use lots of "Arg"s

Question: {input}
{agent_scratchpad}"""

# Set up a prompt template
class CustomPromptTemplate(StringPromptTemplate):
    # The template to use
    template: str
    # The list of tools available
    tools: List[Tool]
    
    def format(self, **kwargs) -> str:
        # Get the intermediate steps (AgentAction, Observation tuples)
        # Format them in a particular way
        intermediate_steps = kwargs.pop("intermediate_steps")
        thoughts = ""
        for action, observation in intermediate_steps:
            thoughts += action.log
            thoughts += f"\nObservation: {observation}\nThought: "
        # Set the agent_scratchpad variable to that value
        kwargs["agent_scratchpad"] = thoughts
        # Create a tools variable from the list of tools provided
        kwargs["tools"] = "\n".join([f"{tool.name}: {tool.description}" for tool in self.tools])
        # Create a list of tool names for the tools provided
        kwargs["tool_names"] = ", ".join([tool.name for tool in self.tools])
        return self.template.format(**kwargs)

prompt = CustomPromptTemplate(
    template=template,
    tools=tools,
    # This omits the `agent_scratchpad`, `tools`, and `tool_names` variables because those are generated dynamically
    # This includes the `intermediate_steps` variable because that is needed
    input_variables=["input", "intermediate_steps"]
)
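To make the scratchpad formatting concrete, here is a small standalone sketch that reproduces the loop from `format` on one made-up step. `FakeAction` and `build_scratchpad` are hypothetical names used only for this illustration, and the observation text is invented.

```python
from collections import namedtuple

# Hypothetical stand-in exposing the same fields the template code reads.
FakeAction = namedtuple("FakeAction", ["tool", "tool_input", "log"])


def build_scratchpad(intermediate_steps):
    # Same logic as the loop in CustomPromptTemplate.format:
    # append each action's log, then its observation, then a new "Thought: " cue.
    thoughts = ""
    for action, observation in intermediate_steps:
        thoughts += action.log
        thoughts += f"\nObservation: {observation}\nThought: "
    return thoughts


steps = [
    (
        FakeAction(
            tool="Search",
            tool_input="Population of Canada in 2023",
            log=(
                "Thought: I need to search\n"
                "Action: Search\n"
                "Action Input: Population of Canada in 2023"
            ),
        ),
        "made-up observation text",
    )
]
scratchpad = build_scratchpad(steps)
print(scratchpad)
```

The scratchpad ends with a dangling "Thought: " so that the model continues reasoning from the latest observation.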

Output Parser

The output parser is responsible for parsing the LLM output into an AgentAction or an AgentFinish. This usually depends heavily on the prompt used.

This is where you can change the parsing to do retries, handle whitespace, etc.

class CustomOutputParser(AgentOutputParser):
    
    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        # Check if agent should finish
        if "Final Answer:" in llm_output:
            return AgentFinish(
                # Return values is generally always a dictionary with a single `output` key
                # It is not recommended to try anything else at the moment :)
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
                log=llm_output,
            )
        # Parse out the action and action input
        regex = r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
        match = re.search(regex, llm_output, re.DOTALL)
        if not match:
            raise ValueError(f"Could not parse LLM output: `{llm_output}`")
        action = match.group(1).strip()
        action_input = match.group(2)
        # Return the action and action input
        return AgentAction(tool=action, tool_input=action_input.strip(" ").strip('"'), log=llm_output)

output_parser = CustomOutputParser()
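The parsing logic can be tried out without any LangChain objects. The sketch below applies the same "Final Answer:" check and the same regular expression to two sample strings; `parse_text` and `ACTION_PATTERN` are hypothetical names for this illustration, and plain tuples stand in for AgentAction and AgentFinish.

```python
import re
from typing import Tuple

# Same pattern as in CustomOutputParser.parse.
ACTION_PATTERN = r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"


def parse_text(llm_output: str) -> Tuple[str, ...]:
    """Return ("finish", answer) or ("action", tool, tool_input)."""
    if "Final Answer:" in llm_output:
        return ("finish", llm_output.split("Final Answer:")[-1].strip())
    match = re.search(ACTION_PATTERN, llm_output, re.DOTALL)
    if not match:
        raise ValueError(f"Could not parse LLM output: `{llm_output}`")
    tool = match.group(1).strip()
    # Strip surrounding spaces and quotes from the tool input.
    tool_input = match.group(2).strip(" ").strip('"')
    return ("action", tool, tool_input)


action_result = parse_text(
    "Thought: I need to find out the population of Canada in 2023\n"
    "Action: Search\n"
    'Action Input: "Population of Canada in 2023"'
)
finish_result = parse_text(
    "Thought: I now know the final answer\n"
    "Final Answer: Arrr, there be 38,658,314 people livin' in Canada!"
)
print(action_result)
print(finish_result)
```

Output that contains neither a "Final Answer:" marker nor an Action / Action Input pair raises ValueError, which is the place to add retries or more lenient parsing.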

Set up LLM

Choose the LLM you want to use!

llm = OpenAI(temperature=0)

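Define the stop sequence

This is important because it tells the LLM when to stop generating.

This depends heavily on the prompt and model you are using. Generally, you want it to be whatever token you use in the prompt to denote the start of an Observation (otherwise, the LLM may hallucinate an observation for you). In this notebook that is "\nObservation:", which is passed as the stop argument when the agent is set up.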
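The effect of a stop sequence can be simulated on a string. In practice the model provider stops generation itself; the sketch below only mimics the result by cutting the text at the first stop string. `truncate_at_stop` is a hypothetical helper, and the raw text (including its invented observation) is made up for illustration.

```python
from typing import List


def truncate_at_stop(text: str, stop: List[str]) -> str:
    # Cut the text at the earliest occurrence of any stop string.
    cut = len(text)
    for token in stop:
        index = text.find(token)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# What a model might produce with no stop sequence: it invents its own observation.
raw = (
    "Thought: I need to search\n"
    "Action: Search\n"
    "Action Input: Population of Canada in 2023\n"
    "Observation: (invented by the model, not from a tool)\n"
    "Thought: I now know the final answer"
)
trimmed = truncate_at_stop(raw, ["\nObservation:"])
print(trimmed)
```

With the stop string in place, the text ends right after the Action Input line, so the observation comes from the real tool call rather than from the model.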

Set up the Agent

We can now combine everything to set up our agent.

# LLM chain consisting of the LLM and a prompt
llm_chain = LLMChain(llm=llm, prompt=prompt)

tool_names = [tool.name for tool in tools]
agent = LLMSingleActionAgent(
    llm_chain=llm_chain, 
    output_parser=output_parser,
    stop=["\nObservation:"], 
    allowed_tools=tool_names
)

Use the Agent

Now we can use it!

agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)

agent_executor.run("How many people live in canada as of 2023?")

Entering new AgentExecutor chain...
Thought: I need to find out the population of Canada in 2023
Action: Search
Action Input: Population of Canada in 2023

Observation:The current population of Canada is 38,658,314 as of Wednesday, April 12, 2023, based on Worldometer elaboration of the latest United Nations data. I now know the final answer
Final Answer: Arrr, there be 38,658,314 people livin' in Canada as of 2023!

Finished chain.

"Arrr, there be 38,658,314 people livin' in Canada as of 2023!"

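Add memory

If you want to add memory to the agent, you need to do two things:

Add a place for the chat history in the custom prompt.
Add a memory object to the agent executor.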

# Set up the base template
template_with_history = """Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin! Remember to speak as a pirate when giving your final answer. Use lots of "Arg"s

Previous conversation history:
{history}

New question: {input}
{agent_scratchpad}"""

prompt_with_history = CustomPromptTemplate(
    template=template_with_history,
    tools=tools,
    # This omits the `agent_scratchpad`, `tools`, and `tool_names` variables because those are generated dynamically
    # This includes the `intermediate_steps` variable because that is needed
    input_variables=["input", "intermediate_steps", "history"]
)

llm_chain = LLMChain(llm=llm, prompt=prompt_with_history)

tool_names = [tool.name for tool in tools]
agent = LLMSingleActionAgent(
    llm_chain=llm_chain, 
    output_parser=output_parser,
    stop=["\nObservation:"], 
    allowed_tools=tool_names
)

from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=2)

agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True, memory=memory)
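As a rough illustration of what a window of k=2 means, the sketch below keeps only the two most recent exchanges and renders them as text for the {history} slot. It is not the ConversationBufferWindowMemory implementation: `WindowHistory` is a hypothetical class, the "Human:" / "AI:" line format is an assumption, and the third exchange is invented to show the oldest one being dropped.

```python
from collections import deque


class WindowHistory:
    """Hypothetical sketch: remember only the last k (human, ai) exchanges."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen discards the oldest entry automatically.
        self.turns = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def as_text(self) -> str:
        # Assumed rendering for the {history} prompt variable.
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


history = WindowHistory(k=2)
history.save(
    "How many people live in canada as of 2023?",
    "Arrr, there be 38,658,314 people livin' in Canada as of 2023!",
)
history.save(
    "how about in mexico?",
    "Arrr, there be 132,679,922 people livin' in Mexico as of 2023!",
)
history.save("and in france?", "(hypothetical third answer)")
history_text = history.as_text()
print(history_text)
```

After the third exchange, the Canada question has fallen out of the window, so a follow-up that depends on it would no longer have that context.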

agent_executor.run("How many people live in canada as of 2023?")

Entering new AgentExecutor chain...
Thought: I need to find out the population of Canada in 2023
Action: Search
Action Input: Population of Canada in 2023

Observation:The current population of Canada is 38,658,314 as of Wednesday, April 12, 2023, based on Worldometer elaboration of the latest United Nations data. I now know the final answer
Final Answer: Arrr, there be 38,658,314 people livin' in Canada as of 2023!

Finished chain.

"Arrr, there be 38,658,314 people livin' in Canada as of 2023!"

agent_executor.run("how about in mexico?")

Entering new AgentExecutor chain...
Thought: I need to find out how many people live in Mexico.
Action: Search
Action Input: How many people live in Mexico as of 2023?

Observation:The current population of Mexico is 132,679,922 as of Tuesday, April 11, 2023, based on Worldometer elaboration of the latest United Nations data. Mexico 2020 ... I now know the final answer.
Final Answer: Arrr, there be 132,679,922 people livin' in Mexico as of 2023!

Finished chain.

"Arrr, there be 132,679,922 people livin' in Mexico as of 2023!"
