Regular expression in Libre Office Writer to remove timings of transcription


I'm using OpenAI's Whisper to convert audio to text.

I get an output indexed by time entries of the form [01:28.000 --> 01:36.000].

I'd like to remove these in Libre Office Writer using the Find and Replace tool.

What is the syntax for writing this pattern in Libre Office, i.e. [ followed by something, followed by an arrow, followed by something, followed by ]?

Best answer, by the busybee:

There are several possible regular expressions; this is the simplest one:

^\[.+?\]

If you insist on matching the arrow, use:

^\[.+?-->.+?\]

What does it mean?

  • ^ matches the beginning of the line (in Writer, the beginning of a paragraph), assuming that the time stamp is there;
  • \[ matches the opening bracket (the backslash removes the special meaning of [);
  • .+? matches one or more characters, non-greedily, i.e. as few as possible;
  • --> matches the arrow;
  • \] matches the closing bracket.
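
To see why the non-greedy .+? matters, here is a small sketch in Python rather than Writer. Python's re module is a different regex engine from the one LibreOffice uses, but these particular constructs behave the same way in both. The sample line, including the "[laughs]" tag, is invented for illustration.

```python
import re

# Hypothetical transcript line; "[laughs]" is made up so that a second
# closing bracket appears later on the same line.
line = "[01:28.000 --> 01:36.000] Hello there [laughs] and welcome."

non_greedy = re.compile(r"^\[.+?\]")  # stops at the first closing bracket
greedy = re.compile(r"^\[.+\]")       # runs on to the last closing bracket

print(repr(non_greedy.sub("", line)))  # only the time stamp is removed
print(repr(greedy.sub("", line)))      # everything up to "[laughs]" is removed too
```

With the greedy version, part of the transcript text would be deleted along with the time stamp, which is why the patterns above use .+? instead of .+.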

Leave the Replace field empty to remove the time stamp, and make sure the "Regular expressions" option is ticked in the Find & Replace dialog.

If you also need to remove the space between the time stamp and the text, which I assume exists, add a space at the end of the pattern.
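
If you would rather clean the transcript before it ever reaches Writer, the same pattern can be applied with a script. The following is a minimal sketch in Python, not part of the Writer workflow; the function name strip_timestamps and the sample text are made up for illustration, and it assumes exactly one space after the closing bracket.

```python
import re

# Arrow pattern from above, followed by one space.
# re.MULTILINE makes ^ match at the start of every line,
# not only at the start of the whole string.
TIMESTAMP = re.compile(r"^\[.+?-->.+?\] ", re.MULTILINE)


def strip_timestamps(transcript: str) -> str:
    """Remove a leading '[start --> end] ' marker from every line."""
    return TIMESTAMP.sub("", transcript)


# Invented sample in the format described in the question.
sample = (
    "[00:00.000 --> 00:07.000] First sentence.\n"
    "[00:07.000 --> 00:12.500] Second sentence.\n"
)

cleaned = strip_timestamps(sample)
print(cleaned)
```

If your transcript has a different amount of whitespace after the bracket, adjust the end of the pattern accordingly.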

The documentation linked from Writer's help page is most helpful.