I have a TTML file that contains video captions. I want to go through all the time/caption pairs and write them to a JSON file. I tried https://www.npmjs.com/package/ttml?activeTab=readme, but it did not work. Any ideas? Thank you.
How to browse through TTML and get all the time/caption pairs into a JSON file
Asked by Lydia halls
Pierre-Anthony Lemieux
For folks who prefer Python, ttconv can split TTML/IMSC documents into a series of Intermediate Synchronic Documents (ISDs), each one corresponding to a period of time during which the content of the TTML/IMSC document is static.
import ttconv.imsc.reader
import ttconv.isd
import xml.etree.ElementTree as et
tt_doc = """<?xml version="1.0" encoding="UTF-8"?>
<tt xml:lang="fr" xmlns="http://www.w3.org/ns/ttml">
<body>
<div>
<p begin="1s" end="2s">Hello</p>
<p begin="3s" end="4s">Bonjour</p>
</div>
</body>
</tt>"""
m = ttconv.imsc.reader.to_model(et.ElementTree(et.fromstring(tt_doc)))
st = ttconv.isd.ISD.significant_times(m)
for t in st:
    isd = ttconv.isd.ISD.from_model(m, t)
    # convert ISD to JSON
ttconv also supports conversion from TTML/IMSC to SRT, which is a simple text-based format. All styling information is lost, however.
tt.py convert -i <input .ttml file> -o <output .srt file> --otype SRT --itype TTML
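If the document is as simple as the sample above, with every <p> carrying its own begin and end attributes, the pairs can also be collected with only the Python standard library. The following is a minimal sketch under that assumption and is not part of ttconv: the function name captions_to_json is hypothetical, time expressions are kept as raw strings rather than parsed, and timing inherited from parent elements, nested <span> timing, and <br/> line breaks are all ignored.

```python
import json
import xml.etree.ElementTree as et

# Namespace of TTML elements, in ElementTree's "{uri}tag" notation.
TTML_NS = "{http://www.w3.org/ns/ttml}"


def captions_to_json(ttml_text: str) -> str:
    """Collect begin/end/text of every <p> element and return a JSON string.

    Hypothetical helper: assumes each <p> has its own begin and end attributes.
    """
    root = et.fromstring(ttml_text)
    captions = []
    for p in root.iter(TTML_NS + "p"):
        captions.append(
            {
                "begin": p.get("begin"),  # raw time expression, e.g. "1s"
                "end": p.get("end"),
                # Concatenate all text inside <p>, including nested <span> text.
                "text": "".join(p.itertext()).strip(),
            }
        )
    return json.dumps(captions, ensure_ascii=False, indent=2)


tt_doc = """<tt xml:lang="fr" xmlns="http://www.w3.org/ns/ttml">
<body>
<div>
<p begin="1s" end="2s">Hello</p>
<p begin="3s" end="4s">Bonjour</p>
</div>
</body>
</tt>"""

print(captions_to_json(tt_doc))
```

For documents that rely on inherited timing, regions, or overlapping captions, the ISD-based approach is the safer route, since this sketch only reads attributes written directly on <p>.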
Try looking at https://github.com/sandflow/imscJS for code that extracts the Intermediate Synchronic Documents (ISDs); for example, the file isd.js may be relevant.
By the way, it's worth noting that the data model in TTML doesn't exactly match the idea of a mapping between pairs of times and individual captions. You may get duplications.
Each ISD is a snapshot between two moments on the timeline in which the presented content does not change.
This is an important distinction because in TTML it is possible to have the same "caption" appear at times that overlap with other captions appearing and disappearing, for example:
So the result in ISDs is:
As you can see, that first line appears in two ISDs. How you deal with this in your application is up to you, of course.
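The duplication can be illustrated with a small standard-library sketch. The caption data and the function name synchronic_intervals are hypothetical, and this only mimics the idea of ISDs on plain (begin, end, text) tuples with times in seconds; it is not how imscJS or ttconv compute them.

```python
import json

# Hypothetical captions as (begin, end, text), times in seconds.
# The first caption is still on screen when the second one appears.
captions = [
    (0, 4, "First line"),
    (2, 6, "Second line"),
]


def synchronic_intervals(captions):
    """Split the timeline at every begin/end time and list the captions
    that are active during each resulting interval."""
    # Every begin or end time is a point where the displayed content may change.
    times = sorted({t for begin, end, _ in captions for t in (begin, end)})
    intervals = []
    for start, stop in zip(times, times[1:]):
        # A caption is active in [start, stop) if it began at or before
        # start and has not yet ended at start.
        active = [text for begin, end, text in captions if begin <= start < end]
        if active:  # skip intervals where nothing is displayed
            intervals.append({"begin": start, "end": stop, "captions": active})
    return intervals


print(json.dumps(synchronic_intervals(captions), indent=2))
```

With this input, "First line" is listed both in the interval from 0 to 2 and in the interval from 2 to 4, so an application that wants one JSON entry per caption would need to merge consecutive intervals that repeat the same text.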