I have to highlight a word in video captions when the user taps on it. So, given the coordinates of a tap, the problem is to find the index of the word in the cue string.
Any ideas? Is it possible at all? I am talking about the newest HTML5 touch events and WebVTT cues (https://www.w3.org/TR/webvtt1/).
I don't think this is possible with natively rendered WebVTT cues. They are exposed to CSS only through the ::cue pseudo-element and are not part of the DOM, so you can't bind events to them. Styling through ::cue is also extremely limited.
However, you should be able to leverage TextTrack events to build something that works in a similar way. You can assign a custom function to the text track's oncuechange handler, and then use the track's activeCues to render your own captions into a div of your own. That custom div can be styled freely and can have whatever event listeners you want.
The idea is to grab the first text track from your video and read the text of the currently active cue every time a cue change occurs.
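As a minimal sketch of that approach (not a definitive implementation), something like the following should work. The function name attachCustomCaptions and the container element are hypothetical. Setting the track mode to "hidden" is my assumption about how you would suppress the native captions while still receiving cue changes.

```javascript
// Hypothetical helper: mirrors the active cue of the first text track
// into a caption container that you control (e.g. a <div> over the video).
function attachCustomCaptions(video, container) {
  const track = video.textTracks[0];

  // Assumption: "hidden" keeps cues active and cuechange firing,
  // but stops the browser from drawing its own captions.
  track.mode = 'hidden';

  track.oncuechange = () => {
    // activeCues may be empty between cues; only the first active cue is shown here.
    const cue = track.activeCues && track.activeCues[0];
    // cue.text is the raw cue payload, so WebVTT tags such as <v> or <b> are not stripped.
    container.textContent = cue ? cue.text : '';
  };

  return track;
}
```

In a page you would call it with real elements, for example attachCustomCaptions(document.querySelector('video'), document.getElementById('captions')), where the "captions" id is just a placeholder.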
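You will probably need to split the cue text into words and wrap each word in its own span, so you can attach events to it, add highlight classes, etc. Since each span knows its own position, the tapped span gives you the word index directly, with no need to work from tap coordinates. Then you can style and interact with each piece however you'd like.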
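A rough sketch of the per-word spans could look like this. The names splitCueWords, renderCueWords, the "highlight" class and the onWordTap callback are all hypothetical, and the splitting is a simple whitespace split that leaves punctuation attached to the word. It uses click listeners, which browsers also fire for taps on touch devices.

```javascript
// Hypothetical helper: whitespace-based split, punctuation stays attached to the word.
function splitCueWords(text) {
  return text.trim().split(/\s+/).filter(Boolean);
}

// Hypothetical helper: rebuilds the caption container with one <span> per word.
// onWordTap(index, word) is called with the index of the tapped word in the cue.
function renderCueWords(container, text, onWordTap) {
  container.textContent = ''; // drop the previous cue's spans

  splitCueWords(text).forEach((word, index) => {
    const span = document.createElement('span');
    span.textContent = word;
    span.dataset.wordIndex = String(index);

    span.addEventListener('click', () => {
      // Clear any earlier highlight, then mark the tapped word.
      container.querySelectorAll('.highlight')
        .forEach((el) => el.classList.remove('highlight'));
      span.classList.add('highlight');
      onWordTap(index, word);
    });

    container.appendChild(span);
    container.appendChild(document.createTextNode(' '));
  });
}
```

You would call renderCueWords from the cue change handler instead of assigning the cue text directly, and define a CSS rule for the "highlight" class to get the visual effect.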
https://developer.mozilla.org/en-US/docs/Web/API/TextTrack