I can't figure out the exact reason why OpenCV's videoWriter.write() takes more time when it is called back-to-back, with no relatively costly blocking operation in between, than when a blocking call separates consecutive writes.
Take a look at the following two code blocks. The only difference between them is that the second one adds a small wait (time.sleep(0.01)) between consecutive frame writes.
Code 1: no inter-frame delay
```python
import cv2
import time

videoReader = cv2.VideoCapture("1080_1920.mp4")
videoWriter = cv2.VideoWriter('processed.mp4', cv2.VideoWriter_fourcc(*'H264'), 25.0, (1920, 1080))

totalWaitTime = 0
totalEncodingTime = 0
totalTimeOverall = 0

tBegin = time.time()
while True:
    ret, frame = videoReader.read()
    if not ret:
        break

    t3 = time.time()
    # time.sleep(0.01)  # disabled: no delay between consecutive writes
    t4 = time.time()
    totalWaitTime += (t4 - t3)

    t5 = time.time()
    videoWriter.write(frame)
    t6 = time.time()
    totalEncodingTime += (t6 - t5)

    print(f"tSlp: {t4-t3:.4f}, tEnc: {t6-t5:.4f}")
    print("-" * 30)
tEnd = time.time()
totalTimeOverall = tEnd - tBegin

# Release both handles so the output file is flushed and finalized
videoReader.release()
videoWriter.release()

print("=" * 50, '\n Summary : ')
print(f"Time taken : {totalTimeOverall:.4f} \t --%")
print(f"Of which slp : {totalWaitTime:.4f} \t {int(totalWaitTime*100/totalTimeOverall)}%")
print(f"Of which enc : {totalEncodingTime:.4f} \t {int(totalEncodingTime*100/totalTimeOverall)}%")
```
Output of Code 1
Note: the total encoding time is about 5.5 seconds.

Code 2: small inter-frame delay
```python
import cv2
import time

videoReader = cv2.VideoCapture("1080_1920.mp4")
videoWriter = cv2.VideoWriter('processed.mp4', cv2.VideoWriter_fourcc(*'H264'), 25.0, (1920, 1080))

totalWaitTime = 0
totalEncodingTime = 0
totalTimeOverall = 0

tBegin = time.time()
while True:
    ret, frame = videoReader.read()
    if not ret:
        break

    t3 = time.time()
    time.sleep(0.01)  # small delay between consecutive writes
    t4 = time.time()
    totalWaitTime += (t4 - t3)

    t5 = time.time()
    videoWriter.write(frame)
    t6 = time.time()
    totalEncodingTime += (t6 - t5)

    print(f"tSlp: {t4-t3:.4f}, tEnc: {t6-t5:.4f}")
    print("-" * 30)
tEnd = time.time()
totalTimeOverall = tEnd - tBegin

# Release both handles so the output file is flushed and finalized
videoReader.release()
videoWriter.release()

print("=" * 50, '\n Summary : ')
print(f"Time taken : {totalTimeOverall:.4f} \t --%")
print(f"Of which slp : {totalWaitTime:.4f} \t {int(totalWaitTime*100/totalTimeOverall)}%")
print(f"Of which enc : {totalEncodingTime:.4f} \t {int(totalEncodingTime*100/totalTimeOverall)}%")
```
Output of Code 2
Note: the total encoding time has now decreased to about 3.3 seconds.
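Since the two scripts differ only in the sleep line, one way to compare them more directly is to run a single loop with the delay as a parameter. This is an illustrative sketch, not the code that produced the numbers above: `benchmark`, `write` and `frames` are hypothetical names (in the real scripts `write` would be `videoWriter.write` and `frames` the decoded frames). It uses time.perf_counter() for wall-clock intervals and also records time.process_time(), which sums user and system CPU time of the current process (on common platforms this covers all of its threads). If CPU time during write() comes out noticeably larger than wall time, that would suggest several threads are busy while the call is in progress.

```python
import time
from typing import Callable, Iterable, Tuple


def benchmark(write: Callable[[object], None],
              frames: Iterable[object],
              delay: float = 0.0) -> Tuple[float, float, float]:
    """Return (total wall time, wall time inside write, process CPU time inside write).

    `write` and `frames` are hypothetical stand-ins for videoWriter.write and
    the decoded frames; `delay` reproduces the sleep from Code 2.
    """
    write_wall = 0.0
    write_cpu = 0.0
    t_begin = time.perf_counter()
    for frame in frames:
        if delay > 0:
            time.sleep(delay)  # the only difference between Code 1 and Code 2
        w0 = time.perf_counter()
        c0 = time.process_time()
        write(frame)
        write_wall += time.perf_counter() - w0
        # CPU time of the whole process (all threads) while write() ran
        write_cpu += time.process_time() - c0
    total = time.perf_counter() - t_begin
    return total, write_wall, write_cpu
```

For example, with the frames loaded into a list beforehand, `benchmark(videoWriter.write, frames, delay=0.01)` would mirror Code 2 and `delay=0.0` would mirror Code 1.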

My thoughts and what I have tested so far

Looking at the load snapshots attached below, the CPU spikes to 100% while Code 1 runs, but not while Code 2 runs. This may mean that the CPU is the bottleneck, but since videoWriter.write() is a synchronous (not asynchronous) call, shouldn't the total encoding time ideally have stayed the same?

I tried reading all frames into memory first, instead of reading them one at a time inside the loop, and then writing them sequentially. This did not change the results.

I also replaced the file output with a GStreamer pipeline that writes a UDP stream instead of a file on disk, to test the hypothesis that disk I/O might be the bottleneck. This experiment did not change the results either.
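One hypothesis worth testing (an assumption, not something the measurements above establish): the H264 encoder behind VideoWriter may do part of its work on background threads, so a "synchronous" write() only blocks when an internal buffer is full. The stdlib sketch below models that idea with a bounded queue.Queue and a worker thread. `simulate`, `encode_cost`, `capacity` and `workers` are hypothetical names, and time.sleep stands in for encoding work, so it models throughput only, not CPU contention.

```python
import queue
import threading
import time


def simulate(n_frames: int, delay: float, encode_cost: float = 0.02,
             workers: int = 1, capacity: int = 4) -> float:
    """Return the total seconds the producer spends blocked in put().

    Models a write() that hands each frame to background encoder threads via
    a bounded queue and blocks only when the queue is full (an assumed model,
    not a description of OpenCV internals).
    """
    q = queue.Queue(maxsize=capacity)

    def worker() -> None:
        while True:
            item = q.get()
            if item is None:  # sentinel: no more frames
                return
            time.sleep(encode_cost)  # stand-in for the encoding work

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    blocked = 0.0
    for i in range(n_frames):
        if delay > 0:
            time.sleep(delay)  # like the sleep in Code 2
        t0 = time.perf_counter()
        q.put(i)  # the "write": returns as soon as the frame is queued
        blocked += time.perf_counter() - t0

    for _ in threads:
        q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return blocked
```

With these parameters, `simulate(20, 0.0)` reports noticeably more blocked time than `simulate(20, 0.025)` even though the worker does the same amount of work in both runs: without a delay the producer keeps filling the queue and waits on the worker, while with the delay the worker catches up during the sleep.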
What I am looking for is the underlying reason why the total encoding time goes down when a small delay is added to the loop.

