I want to extract the audio stream of an AVI file as a WAV file. It works, but it is really slow (about 4-5x realtime, see the logs below), even though I only want to copy the stream.
Here is the type of stream I want to extract (ffprobe info):
Stream #0:1: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Going through AviSynth does it about 100 times faster, but I would prefer a pure FFmpeg solution. Why such a speed difference? It looks like FFmpeg is reading and processing through the whole file whereas AviSynth can just extract the data without reading it.
Example:
ffmpeg -i file.avi -vn -ac 2 -c:a copy audio.wav
or
ffmpeg -i file.avi -map 0:a -ac 2 -c:a copy audio.wav
Both work fine but take a long time.
Using an AviSynth script as input:
ffmpeg -i script.avs -map 0:a -ac 2 -c:a copy audio.wav
with script.avs containing just:
AviSource("file.avi")
does the same but almost instantaneously!
Any idea why AviSynth is so much faster, and whether there is a way to get the same speed in FFmpeg?
Edit: adding logs
Using FFmpeg directly:
E:\>ffmpeg -i "file.avi" -map 0:a -c:a copy -y -benchmark "output.wav"
ffmpeg version N-92936-ged3b64402e Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 8.2.1 (GCC) 20181201
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
libavutil 56. 25.100 / 56. 25.100
libavcodec 58. 43.100 / 58. 43.100
libavformat 58. 25.100 / 58. 25.100
libavdevice 58. 6.101 / 58. 6.101
libavfilter 7. 47.100 / 7. 47.100
libswscale 5. 4.100 / 5. 4.100
libswresample 3. 4.100 / 3. 4.100
libpostproc 55. 4.100 / 55. 4.100
[avi @ 0000018d3c38a680] non-interleaved AVI
Guessed Channel Layout for Input Stream #0.1 : stereo
Input #0, avi, from 'file.avi':
Duration: 00:18:37.49, start: 0.000000, bitrate: 534682 kb/s
Stream #0:0: Video: rawvideo, bgr24, 1280x720, 533183 kb/s, 24.11 fps, 24.11 tbr, 24.10 tbn, 24.10 tbc
Stream #0:1: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Output #0, wav, to 'output.wav':
Metadata:
ISFT : Lavf58.25.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
Stream #0:1 -> #0:0 (copy)
Press [q] to stop, [?] for help
size= 192445kB time=00:18:37.12 bitrate=1411.2kbits/s speed=4.77x
video:0kB audio:192445kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000040%
bench: utime=1.188s stime=50.766s rtime=234.254s
bench: maxrss=17468kB
Using AviSynth:
E:\>ffmpeg -i "soundout.avs" -map 0:a -c:a copy -y -benchmark "output.wav"
ffmpeg version N-92936-ged3b64402e Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 8.2.1 (GCC) 20181201
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
libavutil 56. 25.100 / 56. 25.100
libavcodec 58. 43.100 / 58. 43.100
libavformat 58. 25.100 / 58. 25.100
libavdevice 58. 6.101 / 58. 6.101
libavfilter 7. 47.100 / 7. 47.100
libswscale 5. 4.100 / 5. 4.100
libswresample 3. 4.100 / 3. 4.100
libpostproc 55. 4.100 / 55. 4.100
Guessed Channel Layout for Input Stream #0.1 : stereo
Input #0, avisynth, from 'soundout.avs':
Duration: 00:18:37.49, start: 0.000000, bitrate: N/A
Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 1280x720, 24.11 fps, 24.11 tbr, 24.10 tbn, 24.10 tbc
Stream #0:1: Audio: pcm_s16le, 44100 Hz, stereo, s16, 1411 kb/s
Output #0, wav, to 'output.wav':
Metadata:
ISFT : Lavf58.25.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
Stream #0:1 -> #0:0 (copy)
Press [q] to stop, [?] for help
size= 192445kB time=00:18:37.11 bitrate=1411.2kbits/s speed= 155x
video:0kB audio:192445kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000040%
bench: utime=0.234s stime=1.047s rtime=7.236s
bench: maxrss=23792kB
Edit: tests after "reencoding" AVI file:
Onto something...
Say my original file is f.avi. Here are ffprobe's results:
[avi @ 0x55a9c4b1e740] non-interleaved AVI
Input #0, avi, from 'f.avi':
Duration: 00:00:38.18, start: 0.000000, bitrate: 1104582 kb/s
Stream #0:0: Video: rawvideo, bgr24, 1632x1200, 1104265 kb/s, 23.47 fps, 23.47 tbr, 23.47 tbn, 23.47 tbc
Stream #0:1: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
Extracting audio takes a long time.
Now if I "reencode" the file in another AVI:
ffmpeg -i f.avi -c copy f2.avi
I can extract the audio from f2.avi in milliseconds!
FFprobe on f2.avi:
Input #0, avi, from 'f2.avi':
Metadata:
encoder : Lavf57.56.101
Duration: 00:00:38.18, start: 0.000000, bitrate: 1104456 kb/s
Stream #0:0: Video: rawvideo, bgr24, 1632x1200, 1104265 kb/s, 23.47 fps, 23.47 tbr, 23.47 tbn, 23.47 tbc
Stream #0:1: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
It's the same apart from the Metadata, which shouldn't make a difference. However, this comparison shows that the problem must be related to the original file being non-interleaved: the "non-interleaved AVI" warning appears for f.avi but not for f2.avi.
I would have assumed that audio would be easier to read and extract from a non-interleaved file, but maybe such files do not conform to the AVI standard, hence the extra work needed?
You answered your own question: it looks like you are bottlenecked on input bandwidth. ffmpeg reads the raw video only to throw it away, whereas AviSynth (which probably uses the AVI Splitter from DirectShow) reads only the audio data from disk. I don't see a way to make ffmpeg do the same.
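To put rough numbers on that, here is a small back-of-the-envelope calculation using only figures copied from the logs in the question. It is a sketch, not a measurement: it assumes that ffmpeg's "kb/s" means 1000 bits per second, that "kB" means 1024 bytes, and that the direct run really did read the entire file. The function name summarize is just for illustration.

```python
# Figures copied from the ffmpeg logs in the question.
duration_s = 18 * 60 + 37.49   # Duration: 00:18:37.49
total_kbps = 534682            # overall bitrate of file.avi, in kb/s
audio_kB = 192445              # size of output.wav as reported by ffmpeg
rtime_direct_s = 234.254       # bench rtime when reading file.avi directly
rtime_avs_s = 7.236            # bench rtime when going through AviSynth

# Assumption: "kb/s" = 1000 bits per second, "kB" = 1024 bytes.
file_bytes = total_kbps * 1000 / 8 * duration_s
audio_bytes = audio_kB * 1024


def summarize():
    """Return rough size and throughput estimates derived from the logs."""
    return {
        "file_GB": file_bytes / 1e9,
        "audio_MB": audio_bytes / 1e6,
        # Disk throughput needed if the whole file was read in the direct run.
        "read_MBps_if_whole_file": file_bytes / rtime_direct_s / 1e6,
        "audio_share_percent": 100 * audio_bytes / file_bytes,
        "wall_clock_ratio": rtime_direct_s / rtime_avs_s,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions the file is roughly 75 GB, of which the audio is only about 197 MB (well under 1%). Reading all of it in 234 seconds would correspond to about 320 MB/s, which is a plausible sequential disk read rate and fits the high stime in the first benchmark. The AviSynth run finishes about 32 times faster in wall-clock terms, consistent with it touching only the audio data.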