Azure Cognitive Services Toolkit
This toolkit is used to interact with the Azure Cognitive Services API to achieve some multimodal capabilities.
Currently, the toolkit contains four tools:
- AzureCogsImageAnalysisTool: used to extract captions, objects, tags, and text from images. (Note: this tool is not yet supported on macOS because it depends on the azure-ai-vision package, which is currently supported only on Windows and Linux.)
- AzureCogsFormRecognizerTool: used to extract text, tables, and key-value pairs from documents.
- AzureCogsSpeech2TextTool: used to transcribe speech to text.
- AzureCogsText2SpeechTool: used to synthesize text to speech.
First, you need to set up an Azure account and create a Cognitive Services resource. You can follow the instructions here to create one.
Then, you need to get the endpoint, key, and region of your resource and set them as environment variables. You can find them on the "Keys and Endpoint" page of your resource.
# !pip install --upgrade azure-ai-formrecognizer > /dev/null
# !pip install --upgrade azure-cognitiveservices-speech > /dev/null
# For Windows/Linux
# !pip install --upgrade azure-ai-vision > /dev/null
import os
os.environ["OPENAI_API_KEY"] = "sk-"
os.environ["AZURE_COGS_KEY"] = ""
os.environ["AZURE_COGS_ENDPOINT"] = ""
os.environ["AZURE_COGS_REGION"] = ""
Create the Toolkit
from langchain.agents.agent_toolkits import AzureCognitiveServicesToolkit
toolkit = AzureCognitiveServicesToolkit()
[tool.name for tool in toolkit.get_tools()]
['Azure Cognitive Services Image Analysis',
'Azure Cognitive Services Form Recognizer',
'Azure Cognitive Services Speech2Text',
'Azure Cognitive Services Text2Speech']
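Each tool can also be invoked directly, without an agent. As a minimal sketch, the following runs the image-analysis tool on an image URL (the sample URL is the same one used in the agent example below; any publicly reachable image URL should work):
# Build a name -> tool mapping and run the image-analysis tool directly
tools_by_name = {tool.name: tool for tool in toolkit.get_tools()}
image_tool = tools_by_name["Azure Cognitive Services Image Analysis"]

print(
    image_tool.run(
        "https://images.openai.com/blob/9ad5a2ab-041f-475f-ad6a-b51899c50182/ingredients.png"
    )
)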
Use within an Agent
from langchain import OpenAI
from langchain.agents import initialize_agent, AgentType
llm = OpenAI(temperature=0)
agent = initialize_agent(
    tools=toolkit.get_tools(),
    llm=llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
agent.run("What can I make with these ingredients?"
"https://images.openai.com/blob/9ad5a2ab-041f-475f-ad6a-b51899c50182/ingredients.png")
> Entering new AgentExecutor chain...
Action:
```
{
"action": "Azure Cognitive Services Image Analysis",
"action_input": "https://images.openai.com/blob/9ad5a2ab-041f-475f-ad6a-b51899c50182/ingredients.png"
}
```
Observation: Caption: a group of eggs and flour in bowls
Objects: Egg, Egg, Food
Tags: dairy, ingredient, indoor, thickening agent, food, mixing bowl, powder, flour, egg, bowl
Thought: I can use the objects and tags to suggest recipes
Action:
```
{
"action": "Final Answer",
"action_input": "You can make pancakes, omelettes, or quiches with these ingredients!"
}
```
> Finished chain.
'You can make pancakes, omelettes, or quiches with these ingredients!'
audio_file = agent.run("Tell me a joke and read it out for me.")
> Entering new AgentExecutor chain...
Action:
```
{
"action": "Azure Cognitive Services Text2Speech",
"action_input": "Why did the chicken cross the playground? To get to the other slide!"
}
```
Observation: /tmp/tmpa3uu_j6b.wav
Thought: I have the audio file of the joke
Action:
```
{
"action": "Final Answer",
"action_input": "/tmp/tmpa3uu_j6b.wav"
}
```
> Finished chain.
'/tmp/tmpa3uu_j6b.wav'
from IPython import display

# Play the synthesized audio inline in the notebook
audio = display.Audio(audio_file)
display.display(audio)
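To close the loop, the generated file can be fed back through the speech-to-text tool. This is a sketch, assuming the Speech2Text tool accepts a local audio file path such as the one returned by Text2Speech above:
# Transcribe the synthesized joke back to text
speech2text_tool = {
    tool.name: tool for tool in toolkit.get_tools()
}["Azure Cognitive Services Speech2Text"]
print(speech2text_tool.run(audio_file))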