I am reading a 10-minute audio file from S3, but I want to take only the first x seconds as a byte stream and run some speech analytics on it (I need to cut it because of API limitations of the downstream model).
Code:
import boto3

s3_bucket = 'llm-test-tmp'
key_prefix = "training-datasets/asr_notebook_data"
input_audio_file_name = 'audio.wav'
s3_client = boto3.client("s3")
# Download the object to a local file of the same name
s3_client.download_file(s3_bucket, input_audio_file_name, input_audio_file_name)
# Read the whole file into memory as bytes
with open(input_audio_file_name, "rb") as file:
    wav_file_read = file.read()
I want to cut this audio byte stream down to the first 30 seconds, 1 minute, or any arbitrary first x seconds. What is the easiest way to do this without much change to the data type?
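
A minimal sketch of one way this could be done while staying in bytes, assuming the file is uncompressed PCM WAV (the only kind the standard-library `wave` module reads reliably). The helper name `first_seconds` is hypothetical, not part of any library. It parses the header from the in-memory bytes, keeps only the frames covering the requested duration, and writes them back out as a new, valid WAV byte string:

```python
import io
import wave


def first_seconds(wav_bytes: bytes, seconds: float) -> bytes:
    """Return WAV bytes containing only the first `seconds` of audio.

    Assumption: `wav_bytes` is an uncompressed PCM WAV file.
    `first_seconds` is an illustrative helper, not a library function.
    """
    # Read the header and the leading frames straight from memory.
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        # Frames to keep, capped at the clip's total length.
        n_frames = min(int(seconds * src.getframerate()), src.getnframes())
        frames = src.readframes(n_frames)

    # Write a new WAV with the same channels, sample width, and sample rate.
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)    # frame count in the header is patched on write
        dst.writeframes(frames)
    return out.getvalue()
```

With the variable from the snippet above, `clip = first_seconds(wav_file_read, 30)` would give a `bytes` object of the same type as `wav_file_read`, holding the first 30 seconds. Slicing the raw bytes directly would leave the header's length fields wrong, which is why the sketch rewrites the header instead.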