Wikipedia

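Wikipedia is a free, multilingual online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and a wiki-based editing system called MediaWiki. Wikipedia is the largest and most-read reference work in history.

This document shows how to load wiki pages from wikipedia.org into the Document format that we use for downstream processing.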

Installation

First, you need to install the wikipedia Python package.

#!pip install wikipedia

Examples

WikipediaLoader takes the following arguments:

  • query: free text used to find documents in Wikipedia.
  • lang (optional): default="en". Use it to search a specific language edition of Wikipedia.
  • load_max_docs (optional): default=100. Use it to limit the number of downloaded documents. Downloading all 100 documents takes time, so use a small number for experiments. There is currently a hard limit of 300.
  • load_all_available_meta (optional): default=False. By default only the most important fields are downloaded: Published (the date the document was published or last updated), title, and Summary. If True, the other fields are downloaded as well.

from langchain.document_loaders import WikipediaLoader

# Search Wikipedia and download at most two matching pages.
docs = WikipediaLoader(query='HUNTER X HUNTER', load_max_docs=2).load()
len(docs)  # number of Documents that were loaded
docs[0].metadata  # metadata of the first Document
docs[0].page_content[:400]  # first 400 characters of the first Document's content
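
The four arguments listed above can be combined in a single call. The following is a minimal sketch, not part of the loader itself: build_loader_kwargs, load_wikipedia, and HARD_LIMIT are hypothetical names introduced here for illustration. The validation rules (non-empty query, at least one document) are assumptions, and the clamp to 300 simply mirrors the hard limit stated above rather than anything the library is known to do in this way.

```python
from typing import Any, Dict, List

# Assumption: mirrors the hard limit of 300 mentioned in the argument list.
HARD_LIMIT = 300


def build_loader_kwargs(
    query: str,
    lang: str = "en",
    load_max_docs: int = 100,
    load_all_available_meta: bool = False,
) -> Dict[str, Any]:
    """Check the four documented arguments and return them as keyword arguments."""
    if not query.strip():
        raise ValueError("query must be non-empty free text")
    if load_max_docs < 1:
        raise ValueError("load_max_docs must be at least 1")
    return {
        "query": query,
        "lang": lang,
        # Clamp the request so it never exceeds the documented hard limit.
        "load_max_docs": min(load_max_docs, HARD_LIMIT),
        "load_all_available_meta": load_all_available_meta,
    }


def load_wikipedia(query: str, **options: Any) -> List[Any]:
    """Hypothetical wrapper; needs langchain, wikipedia, and network access."""
    # Imported lazily so build_loader_kwargs works without langchain installed.
    from langchain.document_loaders import WikipediaLoader

    return WikipediaLoader(**build_loader_kwargs(query, **options)).load()
```

For example, load_wikipedia('HUNTER X HUNTER', lang='ja', load_max_docs=2, load_all_available_meta=True) would search the Japanese edition, download at most two pages, and request the full set of metadata fields. Keeping the argument check separate from the network call makes it easy to try different settings before downloading anything.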

Contributors: 刘强