YouTube Scraping Guide for AI
This guide walks you through the workflow for collecting and filtering YouTube data for AI training using the following sources: youtube_search, youtube_video_trainability, youtube_metadata, youtube_download, and youtube_transcript.
Start by searching for videos related to your topic of interest.
For a quick search that returns up to 20 results:
For more comprehensive results (up to 700 results):
Refine your search with filters:
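The request bodies for these three cases might be built as in the sketch below. Only the source names youtube_search and youtube_search_max come from this guide. The `query` field name, the helper `build_search_payload`, and the `upload_date` filter key are assumptions made for illustration, so check them against the API reference before use.

```python
from typing import Any


def build_search_payload(query: str, comprehensive: bool = False, **filters: Any) -> dict[str, Any]:
    """Build a search request body (hypothetical helper, not part of any SDK)."""
    if not query.strip():
        raise ValueError("query must not be empty")
    # Source names come from the guide; "query" is an assumed field name.
    source = "youtube_search_max" if comprehensive else "youtube_search"
    # Filters are passed through unchanged; the core fields always win.
    return {**filters, "source": source, "query": query}


quick = build_search_payload("machine learning tutorial")
full = build_search_payload("machine learning tutorial", comprehensive=True)
# "upload_date" is a placeholder filter key, not a confirmed parameter.
filtered = build_search_payload("machine learning tutorial", upload_date="this_month")
```

Keeping payload construction in one function makes it easy to switch between the quick and the comprehensive source without touching the rest of the pipeline.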
After receiving search results, extract the video IDs for further processing. In the response from youtube_search or youtube_search_max, video IDs are directly available in the videoId field of each result item, as shown in this example response snippet:
Extract these video IDs into a list for use in subsequent API calls.
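One way to collect the IDs is sketched below. Because the exact nesting of the response is not shown here, the helper simply walks the whole structure and gathers every videoId value it finds. The function name `extract_video_ids` and the sample response shape are made up for this example.

```python
from typing import Any


def extract_video_ids(response: Any) -> list[str]:
    """Collect every "videoId" value in a nested response, in first-seen order."""
    found: list[str] = []
    seen: set[str] = set()

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            video_id = node.get("videoId")
            if isinstance(video_id, str) and video_id not in seen:
                seen.add(video_id)
                found.append(video_id)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(response)
    return found


# Made-up response shape, used only to exercise the helper.
sample = {"results": [{"content": [{"videoId": "abc123"}, {"videoId": "def456"}, {"videoId": "abc123"}]}]}
video_ids = extract_video_ids(sample)
```

Duplicates are dropped so that the same video is not checked or downloaded twice in later steps.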
Before downloading or using videos for AI training, check their eligibility:
The response will indicate if the video can be used for AI training purposes:
["all"] - Training permitted for all parties
["none"] - No training permitted for any party
["party1", "party2", ...] - Training permitted only for specific parties
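The three cases above can be turned into a simple yes/no check, as in this minimal sketch. The function name `is_training_permitted` is hypothetical, and treating an empty list as "not permitted" is a conservative assumption rather than documented behavior.

```python
from typing import Optional, Sequence


def is_training_permitted(parties: Sequence[str], party: Optional[str] = None) -> bool:
    """Interpret a trainability value such as ["all"], ["none"], or a list of party names."""
    values = list(parties)
    if values == ["all"]:
        return True
    # Assumption: an empty list is treated the same as ["none"].
    if not values or values == ["none"]:
        return False
    # A specific allow-list: permitted only if the caller's party is named.
    return party is not None and party in values
```

For example, `is_training_permitted(["party1", "party2"], party="party1")` is true, while the same list checked without a party name, or with a party that is not listed, is false.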
Collect additional information about the videos to further evaluate their quality and relevance:
The response will contain metadata like view counts, comments, ratings, and other metrics that can help you assess content quality.
The parse parameter must be set to true for the metadata source.
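A metadata request body that satisfies this requirement might look like the sketch below. The source name and the parse parameter come from this guide, while the `query` field carrying the video ID and the helper name `build_metadata_payload` are assumptions.

```python
import json
from typing import Any


def build_metadata_payload(video_id: str) -> dict[str, Any]:
    """Build a metadata request body with parsing enabled (hypothetical helper)."""
    # "query" as the field that carries the video ID is an assumption.
    return {"source": "youtube_metadata", "query": video_id, "parse": True}


# Python's True is serialized as JSON true.
print(json.dumps(build_metadata_payload("VIDEO_ID")))
# prints {"source": "youtube_metadata", "query": "VIDEO_ID", "parse": true}
```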
After identifying high-quality, trainable videos based on their eligibility and metadata, you can proceed with content retrieval. This can be done in two parallel steps:
Additional options for download:
Note:
Videos can be up to 3 hours in length
Default resolution is 720p (can be customized)
You can specify audio-only, video-only, or both
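The constraints in this note can be checked locally before a download is requested. The sketch below is only an illustration: `DownloadOptions`, its attribute names, and `within_length_limit` are local, hypothetical names and do not correspond to confirmed request parameters.

```python
from dataclasses import dataclass

MAX_DURATION_SECONDS = 3 * 60 * 60  # videos can be up to 3 hours long


@dataclass(frozen=True)
class DownloadOptions:
    """Local description of what to download (not actual API parameter names)."""
    resolution: str = "720p"  # default resolution named in the note
    include_audio: bool = True
    include_video: bool = True

    def __post_init__(self) -> None:
        if not (self.include_audio or self.include_video):
            raise ValueError("select audio, video, or both")


def within_length_limit(duration_seconds: int) -> bool:
    """Return True if a video of this length fits the 3-hour limit."""
    return 0 < duration_seconds <= MAX_DURATION_SECONDS
```

Filtering out over-length videos with the duration reported in the metadata avoids submitting download jobs that cannot succeed.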
Transcripts are not the same as closed captions (CC). Not all videos have transcripts available in all languages. If a transcript doesn't exist in your specified language, the API will return a 404 status code.
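Since a 404 here means "no transcript in this language" rather than a transient failure, one reasonable reaction is to try the next preferred language instead of retrying. The sketch below assumes a caller-supplied `fetch` callable that returns a status code and the transcript text; that callable, the function name, and the use of 200 for success are assumptions for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# fetch(video_id, language) -> (status_code, transcript_text_or_None); hypothetical callable.
Fetch = Callable[[str, str], Tuple[int, Optional[str]]]


def first_available_transcript(
    video_id: str, languages: Sequence[str], fetch: Fetch
) -> Optional[Tuple[str, str]]:
    """Return (language, transcript) for the first language that has one, else None."""
    for language in languages:
        status, text = fetch(video_id, language)
        if status == 404:
            continue  # no transcript in this language, try the next one
        if status == 200 and text is not None:
            return language, text
        raise RuntimeError(f"unexpected status {status} for {video_id} ({language})")
    return None
```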
If the metadata shows transcripts are available, you can retrieve them with:
For videos with manually created transcripts, specify:
On YouTube, click the "..." menu below the video, then look for "Show transcript" in the menu options. If this option is missing, the video doesn't have transcripts available. When present, you can click it to view available transcript languages.
For efficient processing of multiple videos, use batch endpoints:
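Before submitting a batch, a long list of video IDs usually has to be split into groups of manageable size. The helper below is a generic sketch; `chunked` is a hypothetical name, and the batch size used in the example is arbitrary, since the actual batch limit is not stated in this guide.

```python
from typing import Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Example with an arbitrary batch size of 2.
batches = list(chunked(["id1", "id2", "id3", "id4", "id5"], 2))
```

Each resulting group can then be sent as one batch request.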
Follow the discovery workflow from search → trainability → metadata → content to maximize efficiency
Narrow down search results before processing individual videos
Always verify trainability before using content for AI
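The ordering recommended above, with the cheapest checks first and content retrieval last, can be sketched as a small pipeline. All four callables are supplied by the caller and are hypothetical stand-ins for the trainability, metadata, and content requests described in this guide.

```python
from typing import Any, Callable, Dict, Iterable


def collect_content(
    video_ids: Iterable[str],
    is_trainable: Callable[[str], bool],
    get_metadata: Callable[[str], Dict[str, Any]],
    is_relevant: Callable[[Dict[str, Any]], bool],
    fetch_content: Callable[[str], Any],
) -> Dict[str, Any]:
    """Retrieve content only for videos that pass trainability and metadata checks."""
    collected: Dict[str, Any] = {}
    for video_id in video_ids:
        # Step 1: skip anything that may not be used for AI training.
        if not is_trainable(video_id):
            continue
        # Step 2: use metadata to drop low-quality or irrelevant videos.
        if not is_relevant(get_metadata(video_id)):
            continue
        # Step 3: only now pay for the download or transcript request.
        collected[video_id] = fetch_content(video_id)
    return collected
```

Because each stage filters the list further, the most expensive requests are made for the smallest possible set of videos.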
This source is only available via the asynchronous and feature.
The most efficient way to check transcript availability is by examining the video metadata, which includes these fields:
Check for failed requests and implement retries
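A common way to implement retries is exponential backoff, sketched below. The helper `with_retries` is a generic illustration, not part of any SDK, and the attempt count and delays are arbitrary defaults. In practice you would retry only on errors that are likely to be transient, not on a 404 for a missing transcript.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on any exception with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so that the delay can be replaced in tests or swapped for a jittered variant.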