Get PII time ranges (start/end times) from an audio file using transcription or other techniques


I have a use case where I want to:

  1. Locate all PII in any given audio file (done, using GPT or similar models)
  2. Transcribe the audio and mask all of that PII in the text output (done, using Whisper or similar models)
  3. Mask the PII portions of the original audio with beeps (remaining)
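For step 3, once the time ranges are known, the masking itself is simple. Here is a minimal sketch, assuming the audio is already decoded into a list of mono float samples in [-1, 1] and the PII ranges are given as (start, end) pairs in seconds. The function name `beep_spans` and its parameters are hypothetical, not from any library:

```python
import math

def beep_spans(samples, sample_rate, spans, freq_hz=1000.0, amplitude=0.3):
    """Return a copy of `samples` with each (start_s, end_s) span
    overwritten by a sine-wave beep. Hypothetical helper for illustration."""
    out = list(samples)
    n = len(out)
    for start_s, end_s in spans:
        # Convert seconds to sample indices, clamped to the valid range.
        lo = max(0, int(start_s * sample_rate))
        hi = min(n, int(math.ceil(end_s * sample_rate)))
        for i in range(lo, hi):
            # Replace the original sample entirely so no speech leaks through.
            out[i] = amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
    return out
```

The samples are overwritten rather than mixed with the tone, so the original speech in those ranges is not recoverable from the output.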

The problem is that the transcription model isn't returning the start and end time of each spoken word, so it is very difficult to map the PII found in the transcript back to its position in the audio.
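To make the missing piece concrete: if per-word timestamps were available (as far as I know, the open-source Whisper package has a `word_timestamps` option on `transcribe`, and forced-alignment tools are another route), the mapping could look like the sketch below. The input shape (a list of dicts with `word`, `start`, `end`) and the names `find_pii_spans` and `pad_s` are assumptions for illustration only:

```python
import re

def _norm(token):
    # Lowercase and drop punctuation/whitespace so " Smith," matches "smith".
    return re.sub(r"[^\w]", "", token.lower())

def find_pii_spans(words, pii_phrases, pad_s=0.0):
    """words: list of {"word": str, "start": float, "end": float} (assumed shape).
    pii_phrases: strings already flagged as PII in the transcript.
    Returns merged, sorted (start, end) spans in seconds."""
    tokens = [_norm(w["word"]) for w in words]
    spans = []
    for phrase in pii_phrases:
        target = [t for t in (_norm(p) for p in phrase.split()) if t]
        if not target:
            continue
        k = len(target)
        # Slide a window over the transcript and match the whole phrase.
        for i in range(len(tokens) - k + 1):
            if tokens[i:i + k] == target:
                spans.append((max(0.0, words[i]["start"] - pad_s),
                              words[i + k - 1]["end"] + pad_s))
    # Merge overlapping or touching spans so each region is beeped once.
    spans.sort()
    merged = []
    for s, e in spans:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

Exact token matching is a simplification; it will miss PII that the PII detector and the transcript spell differently, so some fuzzy matching would likely be needed in practice.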

Has anyone figured out a way to solve this? On-prem models or API-based services are both fine; I am just looking for some direction.
