| acreom | acreom is a dev-first knowledge base with tasks running on local mark... |
| AirbyteLoader | Airbyte is a data integration platform for ELT pipelines from APIs, d... |
| Airbyte CDK (Deprecated) | Note: AirbyteCDKLoader is deprecated. Please use AirbyteLoader instea... |
| Airbyte Gong (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... |
| Airbyte Hubspot (Deprecated) | Note: AirbyteHubspotLoader is deprecated. Please use AirbyteLoader in... |
| Airbyte JSON (Deprecated) | Note: AirbyteJSONLoader is deprecated. Please use AirbyteLoader inste... |
| Airbyte Salesforce (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... |
| Airbyte Shopify (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... |
| Airbyte Stripe (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... |
| Airbyte Typeform (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... |
| Airbyte Zendesk Support (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... |
| Airtable | Get your API key here. |
| Alibaba Cloud MaxCompute | Alibaba Cloud MaxCompute (previously known as ODPS) is a general purp... |
| Amazon Textract | Amazon Textract is a machine learning (ML) service that automatically... |
| Apify Dataset | Apify Dataset is a scalable append-only storage with sequential acces... |
| ArcGIS | This notebook demonstrates the use of the langchain_community.document... |
| ArxivLoader | arXiv is an open-access archive for 2 million scholarly articles in t... |
| AssemblyAI Audio Transcripts | The AssemblyAIAudioTranscriptLoader allows you to transcribe audio files ... |
| AstraDB | DataStax Astra DB is a serverless vector-capable database built on Ca... |
| Async Chromium | Chromium is one of the browsers supported by Playwright, a library us... |
| AsyncHtml | AsyncHtmlLoader loads raw HTML from a list of URLs concurrently. |
| Athena | Amazon Athena is a serverless, interactive analytics service built |
| AWS S3 Directory | Amazon Simple Storage Service (Amazon S3) is an object storage service |
| AWS S3 File | Amazon Simple Storage Service (Amazon S3) is an object storage servic... |
| AZLyrics | AZLyrics is a large, legal, every day growing collection of lyrics. |
| Azure AI Data | Azure AI Studio provides the capability to upload data assets to clou... |
| Azure Blob Storage Container | Azure Blob Storage is Microsoft's object storage solution for the clo... |
| Azure Blob Storage File | Azure Files offers fully managed file shares in the cloud that are ac... |
| Azure AI Document Intelligence | Azure AI Document Intelligence (formerly known as Azure Form Recogniz... |
| BibTeX | BibTeX is a file format and reference management system commonly used... |
| BiliBili | Bilibili is one of the most beloved long-form video sites in China. |
| Blackboard | Blackboard Learn (previously the Blackboard Learning Management Syste... |
| Blockchain | Overview |
| Brave Search | Brave Search is a search engine developed by Brave Software. |
| Browserbase | Browserbase is a developer platform to reliably run, manage, and moni... |
| Browserless | Browserless is a service that allows you to run headless Chrome insta... |
| Cassandra | Cassandra is a NoSQL, row-oriented, highly scalable and highly availa... |
| ChatGPT Data | ChatGPT is an artificial intelligence (AI) chatbot developed by OpenA... |
| College Confidential | College Confidential gives information on 3,800+ colleges and univers... |
| Concurrent Loader | Works just like the GenericLoader but concurrently for those who choo... |
| Confluence | Confluence is a wiki collaboration platform that saves and organizes ... |
| CoNLL-U | CoNLL-U is a revised version of the CoNLL-X format. Annotations are enc... |
| Copy Paste | This notebook covers how to load a document object from something you... |
| Couchbase | Couchbase is an award-winning distributed NoSQL cloud database that d... |
| CSV | A comma-separated values (CSV) file is a delimited text file that use... |
| Cube Semantic Layer | This notebook demonstrates the process of retrieving Cube's data mode... |
| Datadog Logs | Datadog is a monitoring and analytics platform for cloud-scale applic... |
| Dedoc | This sample demonstrates the use of Dedoc in combination with LangCha... |
| Diffbot | Diffbot is a suite of ML-based products that make it easy to structur... |
| Discord | Discord is a VoIP and instant messaging social platform. Users have t... |
| Docugami | This notebook covers how to load documents from Docugami. It provides... |
| Docusaurus | Docusaurus is a static-site generator which provides out-of-the-box d... |
| Dropbox | Dropbox is a file hosting service that brings everything-traditional ... |
| DuckDB | DuckDB is an in-process SQL OLAP database management system. |
| Email | This notebook shows how to load email (.eml) or Microsoft Outlook (.m... |
| EPub | EPUB is an e-book file format that uses the ".epub" file extension. T... |
| Etherscan | Etherscan is the leading blockchain explorer, search, API and analyt... |
| EverNote | EverNote is intended for archiving and creating notes in which photos... |
| Facebook Chat | Messenger is an American proprietary instant messaging app and platf... |
| Fauna | Fauna is a Document Database. |
| Figma | Figma is a collaborative web application for interface design. |
| FireCrawl | FireCrawl crawls and converts any website into LLM-ready data. It craw... |
| Geopandas | Geopandas is an open-source project to make working with geospatial d... |
| Git | Git is a distributed version control system that tracks changes in an... |
| GitBook | GitBook is a modern documentation platform where teams can document e... |
| GitHub | This notebook shows how you can load issues and pull requests (PRs) ... |
| Glue Catalog | The AWS Glue Data Catalog is a centralized metadata repository that a... |
| Google AlloyDB for PostgreSQL | AlloyDB is a fully managed relational database service that offers hi... |
| Google BigQuery | Google BigQuery is a serverless and cost-effective enterprise data wa... |
| Google Bigtable | Bigtable is a key-value and wide-column store, ideal for fast access ... |
| Google Cloud SQL for SQL server | Cloud SQL is a fully managed relational database service that offers ... |
| Google Cloud SQL for MySQL | Cloud SQL is a fully managed relational database service that offers ... |
| Google Cloud SQL for PostgreSQL | Cloud SQL for PostgreSQL is a fully-managed database service that hel... |
| Google Cloud Storage Directory | Google Cloud Storage is a managed service for storing unstructured da... |
| Google Cloud Storage File | Google Cloud Storage is a managed service for storing unstructured da... |
| Google Firestore in Datastore Mode | Firestore in Datastore Mode is a NoSQL document database built for au... |
| Google Drive | Google Drive is a file storage and synchronization service developed ... |
| Google El Carro for Oracle Workloads | Google El Carro Oracle Operator |
| Google Firestore (Native Mode) | Firestore is a serverless document-oriented database that scales to m... |
| Google Memorystore for Redis | Google Memorystore for Redis is a fully-managed service that is power... |
| Google Spanner | Spanner is a highly scalable database that combines unlimited scalabi... |
| Google Speech-to-Text Audio Transcripts | The GoogleSpeechToTextLoader allows you to transcribe audio files with th... |
| Grobid | GROBID is a machine learning library for extracting, parsing, and re-... |
| Gutenberg | Project Gutenberg is an online library of free eBooks. |
| Hacker News | Hacker News (sometimes abbreviated as HN) is a social news website fo... |
| Huawei OBS Directory | The following code demonstrates how to load objects from the Huawei O... |
| Huawei OBS File | The following code demonstrates how to load an object from the Huawei... |
| HuggingFace dataset | The Hugging Face Hub is home to over 5,000 datasets in more than 100 ... |
| iFixit | iFixit is the largest, open repair community on the web. The site con... |
| Images | This covers how to load images into a document format that we can use... |
| Image captions | By default, the loader utilizes the pre-trained Salesforce BLIP image... |
| IMSDb | IMSDb is the Internet Movie Script Database. |
| Iugu | Iugu is a Brazilian services and software as a service (SaaS) company... |
| Joplin | Joplin is an open-source note-taking app. Capture your thoughts and s... |
| Jupyter Notebook | Jupyter Notebook (formerly IPython Notebook) is a web-based interacti... |
| Kinetica | This notebook goes over how to load documents from Kinetica. |
| lakeFS | lakeFS provides scalable version control over the data lake, and uses... |
| LarkSuite (FeiShu) | LarkSuite is an enterprise collaboration platform developed by ByteDa... |
| LLM Sherpa | This notebook covers how to use LLM Sherpa to load files of many type... |
| Mastodon | Mastodon is a federated social media and social networking service. |
| MediaWiki Dump | MediaWiki XML Dumps contain the content of a wiki (wiki pages with al... |
| Merge Documents Loader | Merge the documents returned from a set of specified data loaders. |
| mhtml | MHTML is used both for emails and for archived webpages. MH... |
| Microsoft Excel | The UnstructuredExcelLoader is used to load Microsoft Excel files. Th... |
| Microsoft OneDrive | Microsoft OneDrive (formerly SkyDrive) is a file hosting service oper... |
| Microsoft OneNote | This notebook covers how to load documents from OneNote. |
| Microsoft PowerPoint | Microsoft PowerPoint is a presentation program by Microsoft. |
| Microsoft SharePoint | Microsoft SharePoint is a website-based collaboration system that use... |
| Microsoft Word | Microsoft Word is a word processor developed by Microsoft. |
| Near Blockchain | Overview |
| Modern Treasury | Modern Treasury simplifies complex payment operations. It is a unifie... |
| MongoDB | MongoDB is a NoSQL, document-oriented database that supports JSON-li... |
| News URL | This covers how to load HTML news articles from a list of URLs into a... |
| Notion DB 1/2 | Notion is a collaboration platform with modified Markdown support tha... |
| Notion DB 2/2 | Notion is a collaboration platform with modified Markdown support tha... |
| Nuclia | Nuclia automatically indexes your unstructured data from any internal... |
| Obsidian | Obsidian is a powerful and extensible knowledge base |
| Open Document Format (ODT) | The Open Document Format for Office Applications (ODF), also known as... |
| Open City Data | Socrata provides an API for city open data. |
| Oracle Autonomous Database | Oracle autonomous database is a cloud database that uses machine lear... |
| Oracle AI Vector Search: Document Processing | Oracle AI Vector Search is designed for Artificial Intelligence (AI) ... |
| Org-mode | An Org Mode document is a document editing, formatting, and organizing... |
| Pandas DataFrame | This notebook goes over how to load data from a pandas DataFrame. |
| Pebblo Safe DocumentLoader | Pebblo enables developers to safely load data and promote their Gen A... |
| Polars DataFrame | This notebook goes over how to load data from a polars DataFrame. |
| Psychic | This notebook covers how to load documents from Psychic. See here for... |
| PubMed | PubMed® by The National Center for Biotechnology Information, Nationa... |
| PyPDFLoader | This notebook provides a quick overview for getting started with PyPD... |
| PySpark | This notebook goes over how to load data from a PySpark DataFrame. |
| Quip | Quip is a collaborative productivity software suite for mobile and We... |
| ReadTheDocs Documentation | Read the Docs is an open-sourced free software documentation hosting ... |
| Recursive URL | The RecursiveUrlLoader lets you recursively scrape all child links fr... |
| Reddit | Reddit is an American social news aggregation, content rating, and di... |
| Roam | ROAM is a note-taking tool for networked thought, designed to create ... |
| Rockset | Rockset is a real-time analytics database which enables queries on ma... |
| rspace | This notebook shows how to use the RSpace document loader to import r... |
| RSS Feeds | This covers how to load HTML news articles from a list of RSS feed UR... |
| RST | A reStructured Text (RST) file is a file format for textual data used... |
| scrapfly | ScrapFly |
| ScrapingAnt | Overview |
| Sitemap | Extends from the WebBaseLoader, SitemapLoader loads a sitemap from a ... |
| Slack | Slack is an instant messaging program. |
| Snowflake | This notebook goes over how to load documents from Snowflake. |
| Source Code | This notebook covers how to load source code files using a special ap... |
| Spider | Spider is the fastest and most affordable crawler and scraper that re... |
| Spreedly | Spreedly is a service that allows you to securely store credit cards ... |
| Stripe | Stripe is an Irish-American financial services and software as a serv... |
| Subtitle | The SubRip file format is described on the Matroska multimedia contai... |
| SurrealDB | SurrealDB is an end-to-end cloud-native database designed for modern ... |
| Telegram | Telegram Messenger is a globally accessible freemium, cross-platform,... |
| Tencent COS Directory | Tencent Cloud Object Storage (COS) is a distributed |
| Tencent COS File | Tencent Cloud Object Storage (COS) is a distributed |
| TensorFlow Datasets | TensorFlow Datasets is a collection of datasets ready to use, with Te... |
| TiDB | TiDB Cloud is a comprehensive Database-as-a-Service (DBaaS) solution... |
| 2Markdown | 2markdown service transforms website content into structured markdown... |
| TOML | TOML is a file format for configuration files. It is intended to be e... |
| Trello | Trello is a web-based project management and collaboration tool that ... |
| TSV | A tab-separated values (TSV) file is a simple, text-based file format... |
| Twitter | Twitter is an online social media and social networking service. |
| Unstructured | This notebook covers how to use Unstructured document loader to load ... |
| Upstage | This notebook covers how to get started with UpstageLayoutAnalysisLoa... |
| URL | This example covers how to load HTML documents from a list of URLs in... |
| Vsdx | A visio file (with extension .vsdx) is associated with Microsoft Visi... |
| Weather | OpenWeatherMap is an open-source weather service provider |
| WebBaseLoader | This covers how to use WebBaseLoader to load all text from HTML webpa... |
| WhatsApp Chat | WhatsApp (also called WhatsApp Messenger) is a freeware, cross-platfo... |
| Wikipedia | Wikipedia is a multilingual free online encyclopedia written and main... |
| XML | The UnstructuredXMLLoader is used to load XML files. The loader works... |
| Xorbits Pandas DataFrame | This notebook goes over how to load data from a xorbits.pandas DataFr... |
| YouTube audio | Building chat or QA applications on YouTube videos is a topic of high... |
| YouTube transcripts | YouTube is an online video sharing and social media platform created ... |
| Yuque | Yuque is a professional cloud-based knowledge base for team collabora... |
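
Several entries above describe loaders that turn a source into documents (for example CSV) or that merge the documents returned from a set of loaders (Merge Documents Loader). The following is a minimal, standard-library-only sketch of that general pattern. It does not use the library's own classes: `Document`, `CsvTextLoader`, and `MergedLoader` are hypothetical stand-ins written for illustration, and the real loaders may differ in names, parameters, and behavior.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Document:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class CsvTextLoader:
    """Hypothetical loader that yields one Document per CSV row."""

    def __init__(self, text: str, source: str) -> None:
        self.text = text
        self.source = source

    def lazy_load(self) -> Iterator[Document]:
        # Parse the CSV text and render each row as "column: value" lines.
        reader = csv.DictReader(io.StringIO(self.text))
        for row_number, row in enumerate(reader):
            content = "\n".join(f"{key}: {value}" for key, value in row.items())
            yield Document(content, {"source": self.source, "row": row_number})

    def load(self) -> List[Document]:
        return list(self.lazy_load())


class MergedLoader:
    """Hypothetical loader that chains the documents of several loaders."""

    def __init__(self, loaders: Iterable[CsvTextLoader]) -> None:
        self.loaders = list(loaders)

    def lazy_load(self) -> Iterator[Document]:
        # Documents come out in loader order, then in row order.
        for loader in self.loaders:
            yield from loader.lazy_load()

    def load(self) -> List[Document]:
        return list(self.lazy_load())


if __name__ == "__main__":
    first = CsvTextLoader("name,team\nAda,core\nLin,docs\n", source="a.csv")
    second = CsvTextLoader("name,team\nSam,infra\n", source="b.csv")
    for doc in MergedLoader([first, second]).load():
        print(doc.metadata, repr(doc.page_content))
```

Keeping `lazy_load` as a generator means a merged loader never has to hold every source in memory at once; `load` is only a convenience wrapper that collects the results into a list.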