As of 4 days ago, you could send a GET request to (or visit) https://video.google.com/timedtext?lang=en&v={youtubeVideoId} and receive an XML response containing the caption track of a given YouTube video. Does anyone know whether this support has been removed? As of tonight, it no longer returns the XML captions: the page is simply empty for every video, including numerous videos that worked 4 days ago. Thanks in advance.
Google Video no longer able to retrieve captions?
Asked by Dillon Duff
Answer
Captions in the default language (the only available one, or English, it seems):
To get the captions of a YouTube video, just use this Linux command (it uses curl and base64):
curl -s 'https://www.youtube.com/youtubei/v1/get_transcript?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8' -H 'Content-Type: application/json' --data-raw "{\"context\":{\"client\":{\"clientName\":\"WEB\",\"clientVersion\":\"2.9999099\"}},\"params\":\"$(printf '\n\x0bVIDEO_ID' | base64)\"}"
Replace VIDEO_ID with the ID of the video you are interested in.
Note: the key is not a YouTube Data API v3 key; it is the first public key (tested on several computers in different countries) that shows up if you run curl https://www.youtube.com/ | grep AIzaSy
Note: if you are interested in how I reverse-engineered this YouTube feature, say so in the comments and I will write a paragraph explaining it.
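The grep above only prints the matching lines of the page. To pull the key itself out of the page source, a small Python sketch could look like this. It assumes keys keep the shape of the one used here ("AIzaSy" followed by 33 more characters from [0-9A-Za-z_-], 39 in total); the function name and the sample snippet are made up for illustration.

```python
import re

# Assumption: keys look like the one above, "AIzaSy" plus 33 characters
# from [0-9A-Za-z_-], 39 characters in total.
KEY_PATTERN = re.compile(r"AIzaSy[0-9A-Za-z_-]{33}")


def find_public_keys(html: str) -> list[str]:
    """Return the distinct keys found in the page source, in order of appearance."""
    keys: list[str] = []
    for match in KEY_PATTERN.finditer(html):
        if match.group(0) not in keys:
            keys.append(match.group(0))
    return keys


# Made-up snippet standing in for the YouTube home page source
sample = 'var cfg = {"key": "AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8"};'
print(find_public_keys(sample))
```

Taking the first element of find_public_keys(requests.get("https://www.youtube.com/").text) would mirror the "first key" approach described in the note, although which key the page serves may change over time.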
Captions in a desired language, if available:
YouTube made things tricky here, maybe to lose you at this step, so follow along: the only thing we have to change is the params value. It is base64-encoded data that, in addition to some unprintable bytes, contains another base64 string, which itself also contains unprintable bytes.
- Get the language initials, like ru for Russian.
- Encode \n\x00\x12\x02LANGUAGE_INITIALS\x1a\x00 in base64, with for instance A=$(printf '\n\x00\x12\x02LANGUAGE_INITIALS\x1a\x00' | base64) (don't forget to change LANGUAGE_INITIALS to the language initials you want, ru for instance). The result for ru is CgASAnJ1GgA=
- URL-encode the result by replacing the = with %3D, with for instance B=$(printf %s $A | jq -sRr @uri). The result for ru is CgASAnJ1GgA%3D
- Only if using shell commands: replace the single % with two %, with for instance C=$(echo $B | sed 's/%/%%/'). The result for ru is CgASAnJ1GgA%%3D
- Encode \n\x0bVIDEO_ID\x12\x0e$C in base64 (don't forget to change VIDEO_ID to your video ID; $C is the result of the previous step), with for instance D=$(printf "\n\x0bVIDEO_ID\x12\x0e$C" | base64). The result for ru and lo0X2ZdElQ4 is CgtsbzBYMlpkRWxRNBIOQ2dBU0FuSjFHZ0ElM0Q=
- Use this params value in the request from the Captions in the default language section: curl -s 'https://www.youtube.com/youtubei/v1/get_transcript?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8' -H 'Content-Type: application/json' --data-raw "{\"context\":{\"client\":{\"clientName\":\"WEB\",\"clientVersion\":\"2.2021111\"}},\"params\":\"$D\"}"
Here is a one-line version (don't forget to set $VIDEO_ID and $LANGUAGE_INITIALS):
curl -s 'https://www.youtube.com/youtubei/v1/get_transcript?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8' -H 'Content-Type: application/json' --data-raw "{\"context\":{\"client\":{\"clientName\":\"WEB\",\"clientVersion\":\"2.2021111\"}},\"params\":\"`printf "\n\x0b$VIDEO_ID\x12\x0e\`printf "\n\x00\x12\x02$LANGUAGE_INITIALS\x1a\x00" | base64 -w 0 | jq -sRr @uri | sed 's/%/%%/g'\`" | base64`\"}"
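For reference, the same params construction can be written in Python, which avoids the shell quoting and the %% step. This is a sketch, not the author's code: the function name is hypothetical, and deriving the length bytes from the actual data is my assumption (the steps above hardcode \x0b, \x02 and \x0e, which fit 11-character video IDs and two-letter language codes).

```python
import base64
import urllib.parse
from typing import Optional


def build_params(video_id: str, language: Optional[str] = None) -> str:
    """Sketch of the get_transcript "params" value described above."""
    vid = video_id.encode("ascii")
    # Outer bytes: "\n", a length byte (0x0b for 11-character IDs), the ID.
    # Computing length bytes is an assumption; one byte only covers lengths < 128.
    outer = b"\n" + bytes([len(vid)]) + vid
    if language is not None:
        lang = language.encode("ascii")
        # Inner bytes: "\n\x00\x12", a length byte, the language, "\x1a\x00"
        inner = b"\n\x00\x12" + bytes([len(lang)]) + lang + b"\x1a\x00"
        inner_b64 = base64.b64encode(inner).decode("ascii")
        # URL-encode so "=" becomes "%3D"; the "%" -> "%%" step is shell-only
        inner_url = urllib.parse.quote(inner_b64, safe="")
        # Append "\x12", a length byte (0x0e for two-letter codes), the string
        outer += b"\x12" + bytes([len(inner_url)]) + inner_url.encode("ascii")
    return base64.b64encode(outer).decode("ascii")


print(build_params("lo0X2ZdElQ4", "ru"))
```

For ru and lo0X2ZdElQ4 this reproduces the value given in the steps above, CgtsbzBYMlpkRWxRNBIOQ2dBU0FuSjFHZ0ElM0Q=, and the returned string can be used as the params value in either curl command.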
Answer
I recommend that anyone using Python try the youtube_transcript_api module. I used to send GET requests to https://video.google.com/timedtext?lang=en&v={videoId}, but now the page is blank. Here is a code example. In addition, this method does not need an API key.
from youtube_transcript_api import YouTubeTranscriptApi
# Replace "videoId" with the ID of the video; languages lists the preferred caption languages
transcript = YouTubeTranscriptApi.get_transcript("videoId", languages=['en'])
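In the library versions this answer targets, get_transcript returns a list of dicts with text, start and duration (both in seconds), not SRT text. If you need an .srt file, a minimal formatter could look like the sketch below; to_srt is a hypothetical helper, and recent versions of the library also ship their own formatters (for example SRTFormatter in youtube_transcript_api.formatters), which may be preferable.

```python
def to_srt(entries: list[dict]) -> str:
    """Format [{"text": ..., "start": s, "duration": s}, ...] as SRT text."""

    def timestamp(seconds: float) -> str:
        # SRT timestamps are HH:MM:SS,mmm
        ms = round(seconds * 1000)
        hours, ms = divmod(ms, 3_600_000)
        minutes, ms = divmod(ms, 60_000)
        secs, ms = divmod(ms, 1000)
        return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

    blocks = []
    for index, entry in enumerate(entries, start=1):
        start = entry["start"]
        end = start + entry["duration"]
        blocks.append(f"{index}\n{timestamp(start)} --> {timestamp(end)}\n{entry['text']}\n")
    return "\n".join(blocks)


example = [
    {"text": "Hello", "start": 0.0, "duration": 1.5},
    {"text": "world", "start": 61.25, "duration": 2.0},
]
print(to_srt(example))
```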
Answer
The YouTube API change around captions caused me a lot of hassle, which I circumvented by using youtube-dl. The project won GitHub's legal support and is available again for download and cloning.
The software is available as a source or binary download for all major platforms; details are on the project's GitHub page.
Sample use is this simple:
youtube-dl --write-sub --sub-lang en --skip-download --sub-format vtt https://www.youtube.com/watch?v=E-lZ8lCG7WY
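The command above writes the English subtitles to a .vtt file instead of downloading the video. If you then need the caption text, a minimal WebVTT reader may be enough. This is a sketch with a hypothetical function name: it only handles the basics (header and STYLE/NOTE blocks are skipped because they have no "-->" timing line, and inline tags such as <c> are stripped).

```python
import re

# A cue timing line, e.g. "00:00:01.000 --> 00:00:04.000 align:start position:0%"
TIMING = re.compile(r"^(\S+)\s+-->\s+(\S+)")
# Inline WebVTT tags such as <c>...</c> or <00:00:01.500>
TAG = re.compile(r"<[^>]+>")


def vtt_cues(vtt_text: str) -> list[tuple[str, str, str]]:
    """Return (start, end, text) for each cue of a WebVTT document."""
    cues = []
    # The header and the cues are separated by blank lines
    text = vtt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n[ \t]*\n", text):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:  # an optional cue identifier may precede the timing line
                body = TAG.sub("", "\n".join(lines[i + 1:])).strip()
                cues.append((match.group(1), match.group(2), body))
                break
    return cues
```

Reading the downloaded file with open(path, encoding="utf-8").read() and passing the result to vtt_cues gives one tuple per cue.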
Answer
This is a working Python implementation of the curl answer provided by Benjamin Loison. Replace ZhT6BeHNmvo with your video ID.
import base64
import json

import requests

VIDEO_ID = "ZhT6BeHNmvo"  # replace with your video ID

# params is the base64 of "\n", "\v" (0x0b = 11, the length of a video ID) and the ID,
# the same bytes the curl command builds with printf '\n\x0bVIDEO_ID' | base64
params = base64.b64encode(f"\n\v{VIDEO_ID}".encode("utf-8")).decode("utf-8")

body = json.dumps(
    {
        "context": {"client": {"clientName": "WEB", "clientVersion": "2.9999099"}},
        "params": params,
    }
)

response = requests.post(
    "https://www.youtube.com/youtubei/v1/get_transcript?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8",
    headers={"Content-Type": "application/json"},
    data=body,
)
print(response.text)
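The response is JSON rather than the old XML, and its structure is undocumented. The sketch below walks the parsed JSON looking for cue objects; the key names it relies on (transcriptCueRenderer, cue.simpleText, startOffsetMs) are my assumption about what these responses contained at the time, so check them against response.json() before relying on it. Both function names are hypothetical.

```python
from typing import Any, Iterator


def find_all(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth, in parsed JSON."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_all(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_all(item, key)


def transcript_lines(data: Any) -> list[tuple[int, str]]:
    """(start_ms, text) pairs; the key names are ASSUMED, not documented."""
    lines = []
    for cue in find_all(data, "transcriptCueRenderer"):
        text = cue.get("cue", {}).get("simpleText", "")
        lines.append((int(cue.get("startOffsetMs", 0)), text))
    return lines
```

With the request above, transcript_lines(response.json()) would return the cues in document order, if the assumed keys match.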
Answer
The old API currently returns 404 for every request, and YouTube now uses a new version of this API:
https://www.youtube.com/api/timedtext?v={youtubeVideoId}&asr_langs=de%2Cen%2Ces%2Cfr%2Cid%2Cit%2Cja%2Cko%2Cnl%2Cpt%2Cru%2Ctr%2Cvi&caps=asr&exp=xftt%2Cxctw&xoaf=5&hl=en&ip=0.0.0.0&ipbits=0&expire=1637102374&sparams=ip%2Cipbits%2Cexpire%2Cv%2Casr_langs%2Ccaps%2Cexp%2Cxoaf&signature=0BEBD68A2638D8A18A5BC78E1851D28300247F93.7D5E6D26397D8E8A93F65CCA97260D090C870462&key=yt8&kind=asr&lang=en&fmt=json3
The main problem with this API is calculating the signature field of the request. Unfortunately, I couldn't find its algorithm. Maybe someone can reverse-engineer it from the YouTube player.
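Judging only by the parameter names, sparams appears to list the query parameters covered by signature, and expire looks like a Unix timestamp after which the URL stops working (for the URL above it decodes to 2021-11-16 22:39:34 UTC). That is an interpretation, not documented behavior. The sketch below only pulls those fields apart for inspection; it does not compute a signature, and the function name is hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def describe_timedtext_url(url: str) -> dict:
    """Split a signed timedtext URL into the fields discussed above."""
    query = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    return {
        "video_id": query.get("v"),
        "lang": query.get("lang"),
        # Interpretation: sparams seems to name the parameters the signature covers
        "signed_params": query.get("sparams", "").split(","),
        # Interpretation: expire looks like a Unix timestamp (seconds, UTC)
        "expires": datetime.fromtimestamp(int(query["expire"]), tz=timezone.utc),
        "signature": query.get("signature"),
    }
```

Passing the URL above yields signed_params of ip, ipbits, expire, v, asr_langs, caps, exp and xoaf, which suggests a captured URL cannot simply be reused for another video or after it expires.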