FFmpeg audio stream extraction on non-interleaved AVI - slow compared to AviSynth

I want to extract the audio stream of an AVI file as a WAV file. It works, but it is really slow (about 4-5x real-time speed) even though I only want to copy the stream.

Here is the type of stream I want to extract (ffprobe info):
Stream #0:1: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s

Going through AviSynth is far faster (roughly 30 times in the logs below), but I would prefer a pure FFmpeg solution. Why such a speed difference? It looks like FFmpeg reads and processes the whole file, whereas AviSynth seems to extract the audio without reading the video data.

Example:
ffmpeg -i file.avi -vn -ac 2 -c:a copy audio.wav
or
ffmpeg -i file.avi -map 0:a -ac 2 -c:a copy audio.wav
Both work fine, but they are slow.

Using an AviSynth script as input:
ffmpeg -i script.avs -map 0:a -ac 2 -c:a copy audio.wav
with script.avs containing just:
AviSource("file.avi")
does the same but almost instantaneously!

Any idea why AviSynth is so much faster, and whether there is a way to get the same speed in FFmpeg?

Edit: adding logs
Using FFmpeg directly:

E:\>ffmpeg -i "file.avi" -map 0:a -c:a copy -y -benchmark "output.wav"
ffmpeg version N-92936-ged3b64402e Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 8.2.1 (GCC) 20181201
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
  libavutil      56. 25.100 / 56. 25.100
  libavcodec     58. 43.100 / 58. 43.100
  libavformat    58. 25.100 / 58. 25.100
  libavdevice    58.  6.101 / 58.  6.101
  libavfilter     7. 47.100 /  7. 47.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
  libpostproc    55.  4.100 / 55.  4.100
[avi @ 0000018d3c38a680] non-interleaved AVI
Guessed Channel Layout for Input Stream #0.1 : stereo
Input #0, avi, from 'file.avi':
  Duration: 00:18:37.49, start: 0.000000, bitrate: 534682 kb/s
    Stream #0:0: Video: rawvideo, bgr24, 1280x720, 533183 kb/s, 24.11 fps, 24.11 tbr, 24.10 tbn, 24.10 tbc
    Stream #0:1: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Output #0, wav, to 'output.wav':
  Metadata:
    ISFT            : Lavf58.25.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
  Stream #0:1 -> #0:0 (copy)
Press [q] to stop, [?] for help
size=  192445kB time=00:18:37.12 bitrate=1411.2kbits/s speed=4.77x
video:0kB audio:192445kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000040%
bench: utime=1.188s stime=50.766s rtime=234.254s
bench: maxrss=17468kB

Using AviSynth:

E:\>ffmpeg -i "soundout.avs" -map 0:a -c:a copy -y -benchmark "output.wav"
ffmpeg version N-92936-ged3b64402e Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 8.2.1 (GCC) 20181201
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
  libavutil      56. 25.100 / 56. 25.100
  libavcodec     58. 43.100 / 58. 43.100
  libavformat    58. 25.100 / 58. 25.100
  libavdevice    58.  6.101 / 58.  6.101
  libavfilter     7. 47.100 /  7. 47.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
  libpostproc    55.  4.100 / 55.  4.100
Guessed Channel Layout for Input Stream #0.1 : stereo
Input #0, avisynth, from 'soundout.avs':
  Duration: 00:18:37.49, start: 0.000000, bitrate: N/A
    Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 1280x720, 24.11 fps, 24.11 tbr, 24.10 tbn, 24.10 tbc
    Stream #0:1: Audio: pcm_s16le, 44100 Hz, stereo, s16, 1411 kb/s
Output #0, wav, to 'output.wav':
  Metadata:
    ISFT            : Lavf58.25.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
  Stream #0:1 -> #0:0 (copy)
Press [q] to stop, [?] for help
size=  192445kB time=00:18:37.11 bitrate=1411.2kbits/s speed= 155x
video:0kB audio:192445kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000040%
bench: utime=0.234s stime=1.047s rtime=7.236s
bench: maxrss=23792kB
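
As a rough sanity check, the figures in the two logs above can be turned into read rates. The Python sketch below rests on two assumptions that the logs do not state: that the direct run had to read the whole file (its size is estimated here from the container bitrate and duration), and that ffmpeg's "kB" means 1024 bytes.

```python
def hms_to_seconds(hms: str) -> float:
    """Convert an ffmpeg-style HH:MM:SS.ss duration to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Values copied from the two logs.
duration_s = hms_to_seconds("00:18:37.49")
container_bitrate = 534682e3   # bits/s, from "bitrate: 534682 kb/s"
audio_bytes = 192445 * 1024    # from "size= 192445kB" (assumes kB = 1024 bytes)
direct_rtime = 234.254         # wall-clock seconds, direct AVI input
avisynth_rtime = 7.236         # wall-clock seconds, AviSynth input

# Assumption: the direct run reads every byte of the file.
file_bytes = container_bitrate / 8 * duration_s
direct_read_rate = file_bytes / direct_rtime

# Assumption: the AviSynth run reads little more than the audio itself.
avisynth_read_rate = audio_bytes / avisynth_rtime

print(f"estimated file size : {file_bytes / 1e9:.1f} GB")
print(f"direct read rate    : {direct_read_rate / 1e6:.0f} MB/s")
print(f"AviSynth read rate  : {avisynth_read_rate / 1e6:.0f} MB/s")
print(f"wall-clock ratio    : {direct_rtime / avisynth_rtime:.1f}x")
```

Under those assumptions the file is roughly 75 GB, the direct run corresponds to roughly 320 MB/s of sustained reading, and the AviSynth run to under 30 MB/s, with a wall-clock ratio of about 32.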

Edit: tests after remuxing the AVI file:
I may be onto something...
Say my original file is f.avi. Here is the ffprobe output:

[avi @ 0x55a9c4b1e740] non-interleaved AVI
Input #0, avi, from 'f.avi':
  Duration: 00:00:38.18, start: 0.000000, bitrate: 1104582 kb/s
    Stream #0:0: Video: rawvideo, bgr24, 1632x1200, 1104265 kb/s, 23.47 fps, 23.47 tbr, 23.47 tbn, 23.47 tbc
    Stream #0:1: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s

Extracting audio takes a long time.
Now if I remux the file into another AVI (stream copy, so nothing is actually re-encoded):

ffmpeg -i f.avi -c copy f2.avi

I can extract the audio from f2.avi in milliseconds!
FFprobe on f2.avi:

Input #0, avi, from 'f2.avi':
  Metadata:
    encoder         : Lavf57.56.101
  Duration: 00:00:38.18, start: 0.000000, bitrate: 1104456 kb/s
    Stream #0:0: Video: rawvideo, bgr24, 1632x1200, 1104265 kb/s, 23.47 fps, 23.47 tbr, 23.47 tbn, 23.47 tbc
    Stream #0:1: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s

The output is the same apart from the metadata, which shouldn't make a difference, and the missing "non-interleaved AVI" warning. With this comparison I see that the problem must be related to the original file being non-interleaved!
I would have assumed that audio is easier to read and extract from a non-interleaved file, but maybe such a file does not conform to the AVI standard, hence the extra work?
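
The workaround found in the test above (remux once, then extract from the remuxed copy) could be scripted roughly as follows. This is only a sketch in Python: the function name and the file names are placeholders, and the ffmpeg options are limited to the ones already used in the commands above, plus -y to overwrite existing output.

```python
import subprocess


def remux_then_extract_cmds(src, remuxed, wav):
    """Build the two ffmpeg command lines: stream-copy remux, then audio copy."""
    return [
        # Step 1: rewrite the container only, copying every stream as-is.
        ["ffmpeg", "-y", "-i", src, "-c", "copy", remuxed],
        # Step 2: copy the audio stream out of the remuxed file.
        ["ffmpeg", "-y", "-i", remuxed, "-map", "0:a", "-c:a", "copy", wav],
    ]


def run_all(cmds):
    """Run each command in order and stop at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder file names; requires ffmpeg on PATH.
    run_all(remux_then_extract_cmds("f.avi", "f2.avi", "audio.wav"))
```

Keep in mind that the remux step still copies all of the video into a second file of about the same size, so it is not a free replacement for the direct extraction.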

Answer by micha137:

You answered your question yourself: it looks like you are bottlenecked by input bandwidth. ffmpeg reads the raw video just to throw it away, while AviSynth only reads the audio data from disk (AviSource reads the file through the Video for Windows AVIFile interface or AviSynth's own OpenDML code, not through ffmpeg's AVI demuxer). I don't see a way to make ffmpeg do the same.
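
To illustrate what "only reads the audio data" can look like, here is a minimal Python sketch that walks the RIFF chunks of an AVI file, reads only the 8-byte header of each chunk, and seeks past everything that is not audio for the chosen stream. It is an illustration of the idea, not a description of how AviSynth or ffmpeg work internally. The function names are made up for this example, the chunk naming ("01wb" for audio on stream 1) follows the usual AVI convention, and the WAV parameters are assumed from the ffprobe output above (stereo, 16-bit, 44100 Hz). It ignores the index and assumes a well-formed file.

```python
import struct
import wave


def iter_audio_chunks(f, stream=1):
    """Yield audio payloads for `stream`, seeking past all other chunk data."""
    wanted = b"%02dwb" % stream  # e.g. b"01wb" for stream 1
    while True:
        header = f.read(8)
        if len(header) < 8:
            return  # end of file
        fourcc, size = struct.unpack("<4sI", header)
        if fourcc in (b"RIFF", b"LIST"):
            # Container chunk: skip its 4-byte type and walk its children.
            f.seek(4, 1)
            continue
        if fourcc == wanted:
            yield f.read(size)
            f.seek(size & 1, 1)  # chunks are padded to an even length
        else:
            # Video, index, junk, headers: skip the payload without reading it.
            f.seek(size + (size & 1), 1)


def extract_wav(avi_path, wav_path, stream=1,
                channels=2, sample_width=2, rate=44100):
    """Write the PCM audio of `stream` to a WAV file (parameters are assumed)."""
    with open(avi_path, "rb") as src, wave.open(wav_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width)
        dst.setframerate(rate)
        for payload in iter_audio_chunks(src, stream):
            dst.writeframes(payload)
```

With raw 1280x720 bgr24 video, each frame is about 2.7 MB, so a walk like this touches a few bytes per frame instead of the whole frame. How much faster that is in practice depends on the disk and on how the file is laid out.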