I know there are already a few questions like this here on SO, but they do not fully explain the formulas presented in the answers.
I'm writing a parser that should be able to process MPEG-1, 2, 2.5 Audio Layer I, II, III frame headers. The goal is to calculate the exact size of the frame, including header, CRC (if present) and any data or metadata of this frame (basically the number of bytes between the start of one header and the beginning of the next one).
One of the code snippets/formulas commonly seen on the internet to achieve this is (in no specific programming language):
padding = doesThisFramehavePadding ? 1 : 0;
coefficient = sampleCount / 8;
// makes sense to me. the slot size seems to be the smallest addressable space in an mp3 frame
// and is thus important for padding.
slotSize = mpegLayer == Layer1 ? 4 : 1;
// all fine here. bitRate / sampleRate yields bits per sample, multiplied by that weird
// coefficient from earlier probably gives us <total bytes> per <all samples in this frame>.
// then add padding times slotSize.
frameSizeInBytes = ((coefficient * bitRate / sampleRate) + padding) * slotSize;
I have multiple questions regarding the above code snippet:
1. What exactly does this "coefficient" represent? As it's just sampleCount / 8, it's probably just something used to convert the units from bits to bytes in the final calculation, right?
2. If my assumption from 1. is correct: if (coefficient * bitRate / sampleRate) already yields something in bytes, what would multiplying it by the slot size achieve for Audio Layer I specifically? Wouldn't this imply that the unit of (coefficient * bitRate / sampleRate) should have been "slots" earlier, not "bytes"? If so, then what does the coefficient do, i.e. why divide by 8, even for Audio Layer I frames? Is this even correct?
3. Questions 1. and 2. lead me to believe that the code snippet above may not even be correct. If so, what would the correct calculation for MPEG versions 1, 2, 2.5 and Layers I, II and III look like?
4. Does the above calculation still yield the correct result if the CRC protection bit is set in the frame header (i.e. a 16-bit CRC is appended to the header)?
5. Speaking of the header: are the 4 header bytes included in the resulting frameSizeInBytes, or does the result indicate the length of the frame data/body?
Basically all these sub-questions can be summarized to:
What is the formula to calculate the total and exact length of the current frame in bytes, including the header and things like the CRC, Xing and LAME metadata frames, and other eventualities?
I wrote that in Delphi/Pascal, and the function returns either 0 for a bad frame or the frame's exact size in bytes. It is based on multiple websites: the first two illustrate and explain an MPEG audio frame header with full precision, while the third has crucial additions like the formula(s):
If the function returns 0, you're most likely inside a metadata tag's area. The calculated frame size covers the payload (content) only and does not count the 4 bytes of header data. It's exactly the number of bytes to seek forward in the file to land in front of the next frame's header.
I wrote this to count frames exactly in MP3 files encoded with variable bitrates, where frame sizes can vary widely. And I was fed up with lazy overall calculations that only do guesswork.
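For readers who don't use Delphi, here is a minimal Python sketch of the commonly cited calculation. It is not the function described above: the names `frame_size`, `BITRATES` and `SAMPLE_RATES` are made up for this illustration, and free-format frames (bitrate index 0) are simply rejected. Note two differences from the snippet in the question. The per-frame constant is expressed in slots (12 for Layer I, 144 or 72 otherwise), so the rounding happens before multiplying by the slot size. And this sketch returns the whole frame length, header and optional CRC included, so subtract 4 to get the distance to seek after the header has been read.

```python
# Bitrate tables in kbit/s, keyed by (is_mpeg1, layer) and indexed by the
# 4-bit bitrate index. Index 0 (free format) and 15 (invalid) are rejected.
BITRATES = {
    (True, 1): (0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448),
    (True, 2): (0, 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384),
    (True, 3): (0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320),
    (False, 1): (0, 32, 48, 56, 64, 80, 96, 112, 128, 144, 160, 176, 192, 224, 256),
    (False, 2): (0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160),
    (False, 3): (0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160),
}

# Sample rates in Hz, keyed by the 2-bit version field of the header.
SAMPLE_RATES = {
    3: (44100, 48000, 32000),  # MPEG-1
    2: (22050, 24000, 16000),  # MPEG-2
    0: (11025, 12000, 8000),   # MPEG-2.5
}


def frame_size(header: bytes) -> int:
    """Return the total frame length in bytes (header included), or 0 if bad."""
    # The frame sync is 11 set bits: 0xFF followed by the top 3 bits of byte 1.
    if len(header) < 4 or header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        return 0

    version_bits = (header[1] >> 3) & 0x03  # 0 = 2.5, 1 = reserved, 2 = MPEG-2, 3 = MPEG-1
    layer_bits = (header[1] >> 1) & 0x03    # 0 = reserved, 1 = III, 2 = II, 3 = I
    bitrate_index = (header[2] >> 4) & 0x0F
    rate_index = (header[2] >> 2) & 0x03
    padding = (header[2] >> 1) & 0x01

    if version_bits == 1 or layer_bits == 0:
        return 0
    if bitrate_index in (0, 15) or rate_index == 3:
        return 0  # free format is not handled in this sketch

    layer = 4 - layer_bits
    is_mpeg1 = version_bits == 3
    bitrate = BITRATES[(is_mpeg1, layer)][bitrate_index] * 1000
    sample_rate = SAMPLE_RATES[version_bits][rate_index]

    # Samples per frame: 384 for Layer I, 1152 for Layer II and MPEG-1
    # Layer III, 576 for MPEG-2/2.5 Layer III.
    if layer == 1:
        samples = 384
    elif layer == 2 or is_mpeg1:
        samples = 1152
    else:
        samples = 576

    # A slot is 4 bytes in Layer I and 1 byte otherwise. Count whole slots
    # first (integer division), add the padding slot, then convert to bytes.
    slot_size = 4 if layer == 1 else 1
    slots = (samples // (8 * slot_size)) * bitrate // sample_rate + padding
    return slots * slot_size


# MPEG-1 Layer III, 128 kbit/s, 44.1 kHz, without and with padding.
print(frame_size(bytes([0xFF, 0xFB, 0x90, 0x00])))  # 417
print(frame_size(bytes([0xFF, 0xFB, 0x92, 0x00])))  # 418
```

The CRC needs no extra term here: when the protection bit signals a CRC, its 2 bytes sit right after the header and are already part of the computed length.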
The "special" VBR frames that don't contain audio but instead additional info can be fairly well detected, too. For this we need to know the "side info" of a frame:
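As a rough illustration of that check, the Python sketch below uses the commonly documented Layer III side info sizes (32 or 17 bytes for MPEG-1, 17 or 9 bytes for MPEG-2/2.5, with the smaller value for mono) and looks for the "Xing"/"Info" marker right after the side info, or "VBRI" at 32 bytes past the header. The function names are hypothetical, and the sketch assumes a Layer III frame without CRC, which is an assumption rather than something every file guarantees.

```python
from typing import Optional


def side_info_size(header: bytes) -> int:
    """Size in bytes of the Layer III side info that follows the header."""
    is_mpeg1 = ((header[1] >> 3) & 0x03) == 3
    is_mono = ((header[3] >> 6) & 0x03) == 3  # channel mode 3 = single channel
    if is_mpeg1:
        return 17 if is_mono else 32
    return 9 if is_mono else 17  # MPEG-2 and MPEG-2.5


def vbr_tag(frame: bytes) -> Optional[str]:
    """Return 'Xing', 'Info' or 'VBRI' if the frame carries such a tag."""
    if len(frame) < 4:
        return None
    # Xing/Info marker: directly after the 4 header bytes plus the side info.
    # Assumption: no CRC is present in this frame.
    offset = 4 + side_info_size(frame)
    marker = frame[offset:offset + 4]
    if marker in (b"Xing", b"Info"):
        return marker.decode("ascii")
    # VBRI marker: fixed position, 32 bytes after the 4-byte header.
    if frame[36:40] == b"VBRI":
        return "VBRI"
    return None


# MPEG-1 Layer III stereo frame whose body starts with a Xing tag.
demo = bytes([0xFF, 0xFB, 0x90, 0x00]) + bytes(32) + b"Xing" + bytes(100)
print(vbr_tag(demo))  # Xing
```

A frame for which `vbr_tag` returns a value would then be skipped when counting audio frames.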
You may also want to read
...which is also useful to know where the first audio frame is to be found (after tags at the start of the file) and when you've reached the last one (before tags at the end of the file).
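To give an idea of what skipping those tags can look like, here is a small Python sketch for the two most common cases: an ID3v2 tag at the start of the file (10-byte header with a 4-byte "syncsafe" size, plus an optional 10-byte footer) and an ID3v1 tag in the last 128 bytes. The function names are invented for this example, and other tag formats such as APE or Lyrics3 are not handled.

```python
def id3v2_size(data: bytes) -> int:
    """Total size of a leading ID3v2 tag in bytes, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    size_bytes = data[6:10]
    # Syncsafe integer: each byte carries 7 bits, the top bit must be clear.
    if any(b & 0x80 for b in size_bytes):
        return 0
    body = (size_bytes[0] << 21) | (size_bytes[1] << 14) \
        | (size_bytes[2] << 7) | size_bytes[3]
    # Flag bit 0x10 announces a 10-byte footer after the tag body.
    footer = 10 if data[5] & 0x10 else 0
    return 10 + body + footer


def has_id3v1(data: bytes) -> bool:
    """True if the last 128 bytes start with the ID3v1 marker 'TAG'."""
    return len(data) >= 128 and data[-128:-125] == b"TAG"


# ID3v2.4 header announcing a 257-byte body: the first frame header
# would then be searched from offset 267 onwards.
print(id3v2_size(b"ID3\x04\x00\x00\x00\x00\x02\x01"))  # 267
```

The first audio frame is then searched from `id3v2_size(data)` onwards, and the last 128 bytes are excluded when `has_id3v1(data)` is true.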