What is the difference between the mvhd box timescale and the mdhd box timescale in the ISOBMFF format?
I found the definitions in the official document.
Movie header box (mvhd) timescale:
timescale is an integer that specifies the time-scale for the entire presentation; this is the number of
time units that pass in one second. For example, a time coordinate system that measures time in
sixtieths of a second has a time scale of 60.
Media header box (mdhd) timescale:
timescale is an integer that specifies the number of time units that pass in one second for this media.
For example, a time coordinate system that measures time in sixtieths of a second has a time scale
of 60.
If the movie box timescale is 1000 and the video is 24 fps,
then the mdhd timescale of the video track is 24000.
Is that correct?
(My thought is that the video mdhd timescale is fps * mvhd timescale, and
the audio mdhd timescale is the sampling rate, e.g. 48000 Hz.)
I am also curious why, for video fragment files, some files have an mvhd timescale value of 30
while others have 90000.
[Picture: a file with mdhd timescale 30]
[Picture: a file with mdhd timescale 90000]
MVHD = global/movie timescale
For movie time: a frequency of (usually) 1000 ticks to represent 1 second of real-world clock time.
MDHD = media-specific timescale
For video time: this specified frequency shall represent 1 second of real-world clock time.
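To make the two timescales concrete, here is a minimal sketch of rescaling a track's media duration (counted in mdhd ticks) into movie ticks (counted in the mvhd timescale). The spec expresses durations such as the one in the track header (tkhd) in the movie timescale. The `rescale` helper and the 240-frame track are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def rescale(ticks: int, from_timescale: int, to_timescale: int) -> Fraction:
    """Convert a tick count from one timescale to another without rounding."""
    return Fraction(ticks * to_timescale, from_timescale)


# Hypothetical track: 240 video frames of 1000 ticks each, mdhd timescale 24000.
mdhd_timescale = 24000   # media (track-specific) clock
mvhd_timescale = 1000    # movie (global) clock
media_duration = 240 * 1000   # expressed in mdhd ticks

seconds = Fraction(media_duration, mdhd_timescale)
movie_duration = rescale(media_duration, mdhd_timescale, mvhd_timescale)

print(f"{seconds} s")                     # real-world length of the track
print(f"{movie_duration} movie ticks")    # same length on the movie clock
```

Using `Fraction` keeps the conversion exact, so you can see where rounding would occur before you store an integer in a box.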
video note:
This is connected to (and affected by) the FPS and sample duration in the STTS atom/box.
video example:
If the FPS is 24 and the sample duration is 1000, then in
mdhd we set: 24000 ticks per 1 second.
We are saying that 1 sample (frame) should last 1/24 of a second in real-clock time.
24 samples == 1 sec.
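The video example can be checked with a short sketch: the mdhd timescale is the frame rate multiplied by the STTS sample duration, and it must come out as a whole number of ticks per second. The helper name `mdhd_timescale_for` is hypothetical.

```python
from fractions import Fraction


def mdhd_timescale_for(fps: Fraction, sample_delta: int) -> int:
    """Timescale at which every frame lasts exactly `sample_delta` ticks."""
    timescale = fps * sample_delta
    if timescale.denominator != 1:
        raise ValueError("fps * sample_delta is not a whole number of ticks per second")
    return int(timescale)


# Values from the example above: 24 fps with an STTS sample duration of 1000.
timescale = mdhd_timescale_for(Fraction(24), 1000)
frame_seconds = Fraction(1000, timescale)   # length of one frame in seconds

print(timescale)           # ticks per second in mdhd
print(frame_seconds)       # 1/24 of a second per frame
print(24 * frame_seconds)  # 24 samples == 1 second
```

The same function accepts fractional rates too. For example, `Fraction(24000, 1001)` (23.976 fps) with a delta of 1001 also yields 24000.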
For audio time: this specified frequency shall represent 1 second of real-world clock time.
audio note:
This is usually the rate of PCM samples per second (in hertz) of the audio data.
audio example:
48 kHz is 48,000 PCM samples per second, so in
mdhd we set: 48000 ticks per 1 second.
In the above example, the total number of expected PCM audio samples for one second is 48000.
Imagine, for example, that we now divide those 48000 samples into 24 audio frames. How many PCM samples does each audio frame hold in this example?
It is 2000, because:
2000 samples x 24 frames = 48000 total samples.
In STTS, the sample duration is a count of audio samples in the frame, not a count of audio time per frame (this works because the mdhd timescale equals the sample rate, so one tick is one PCM sample).
At 24 audio frames per second, one audio frame holds 2000 samples, so you can calculate its duration as 2000 / 48000 s, which is about 41.667 milliseconds of audio time.
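The arithmetic of this audio example can be written out as a small sketch. The 24-frames-per-second framing is the hypothetical one used above. It is not a real codec frame size.

```python
from fractions import Fraction

sample_rate = 48000       # mdhd timescale of the audio track (ticks == PCM samples)
frames_per_second = 24    # hypothetical framing from the example above

samples_per_frame = sample_rate // frames_per_second    # STTS sample duration
frame_ms = Fraction(samples_per_frame * 1000, sample_rate)

print(samples_per_frame)          # PCM samples in each audio frame
print(round(float(frame_ms), 3))  # duration of one frame in milliseconds
```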
Inside MP4, an audio frame will usually be an AAC frame.
It holds its own fixed number of samples per frame.
For 44100 or 48000 Hz, an AAC frame holds 1024 samples (21.333 ms of sound/PCM data at 48 kHz, or about 23.22 ms at 44.1 kHz).
How many AAC frames (audio frames), each with 1024 PCM samples, are needed to play the expected 48000 PCM audio samples in 1 second?
The answer is 46.875 frames. An audio decoder reads 47 AAC frames, though, and the remaining 128 PCM audio samples from those 47 frames are carried over into the next second of sound.
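A minimal sketch of the AAC frame count, assuming 1024 PCM samples per AAC frame as stated above. The function names are hypothetical.

```python
import math
from fractions import Fraction

AAC_FRAME_SAMPLES = 1024  # PCM samples per AAC frame (assumed, as stated above)


def aac_frame_ms(sample_rate: int) -> float:
    """Duration of one AAC frame in milliseconds at the given sample rate."""
    return float(Fraction(AAC_FRAME_SAMPLES * 1000, sample_rate))


def frames_for_one_second(sample_rate: int) -> tuple[Fraction, int, int]:
    """Exact frame count, whole frames decoded, and samples spilling over."""
    exact = Fraction(sample_rate, AAC_FRAME_SAMPLES)     # may be fractional
    whole = math.ceil(exact)                             # frames actually read
    carry = whole * AAC_FRAME_SAMPLES - sample_rate      # into the next second
    return exact, whole, carry


exact, whole, carry = frames_for_one_second(48000)
print(float(exact), whole, carry)
print(round(aac_frame_ms(48000), 3), round(aac_frame_ms(44100), 3))
```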
(2) Regarding your side questions...
Your video must use a constant frame rate for that logic to work.
Your STTS must have only one entry, saying that all video frames have the same sample duration of 1000. Then you can set MDHD_timescale to 24000 and MVHD_timescale to 1000.
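The constant-frame-rate condition can be tested from the STTS entries themselves, modelled here as (sample_count, sample_delta) pairs. This is a sketch: the `constant_fps` helper is hypothetical. It also tolerates a writer that splits equal deltas across several entries.

```python
from fractions import Fraction
from typing import Optional

# STTS entries as (sample_count, sample_delta) pairs, deltas in mdhd ticks.
SttsEntries = list[tuple[int, int]]


def constant_fps(stts: SttsEntries, mdhd_timescale: int) -> Optional[Fraction]:
    """Return the frame rate if every sample has the same delta, else None."""
    deltas = {delta for count, delta in stts if count > 0}
    if len(deltas) != 1:
        return None  # variable frame rate, or an empty table
    (delta,) = deltas
    return Fraction(mdhd_timescale, delta)


print(constant_fps([(240, 1000)], 24000))               # single entry: CFR
print(constant_fps([(100, 1000), (140, 1001)], 24000))  # mixed deltas: VFR
```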
Both the audio and the video timescales in MDHD say how many "ticks" make up 1 second of media time. In STTS, you say what portion (ratio) of the MDHD timescale the current frame represents.
In video:
The MDHD timescale is 24000 because each video sample (frame) in STTS has a duration of 1000 ticks.
STTS tells us that 24 video frames are needed to match the MDHD value.
In audio:
The MDHD timescale is 48000 (the sample rate), and each audio frame in STTS holds 1024 ticks of PCM audio.
STTS tells us that 46.875 audio frames, so in practice 47 whole frames, are needed to match the MDHD value.
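To see these ratios as timestamps, here is a sketch that expands STTS entries into per-sample decode times. The `sample_times` helper and the single-entry tables are hypothetical. The deltas are in mdhd ticks, so dividing by the mdhd timescale gives seconds.

```python
from fractions import Fraction
from itertools import accumulate, chain, repeat


def sample_times(stts: list[tuple[int, int]], timescale: int) -> list[Fraction]:
    """Decode time (seconds) of each sample, plus the end time of the last one.

    `stts` holds (sample_count, sample_delta) pairs, deltas in mdhd ticks.
    """
    deltas = chain.from_iterable(repeat(delta, count) for count, delta in stts)
    return [Fraction(t, timescale) for t in accumulate(deltas, initial=0)]


video = sample_times([(24, 1000)], 24000)  # 24 frames of 1000 ticks
audio = sample_times([(47, 1024)], 48000)  # 47 AAC frames of 1024 ticks

print(video[-1])                    # end of frame 24, in seconds
print(round(float(audio[-1]), 5))   # end of frame 47, just past 1 second
print(audio[-1] * 48000 - 48000)    # overshoot in ticks (PCM samples)
```

The video track ends exactly on the 1-second mark. The audio track overshoots by the 128 samples that carry over into the next second.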
Those are ratios that are specific to whatever other numbers are used in MDHD and STTS entries.
90,000 is a good value for getting usable integers out of the many frame rates out there:
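A quick way to check the 90,000 claim is to compute the STTS sample duration a constant-frame-rate track would need at that timescale. This sketch assumes the usual exact fractions for NTSC-style rates (e.g. 30000/1001). The frame-rate list and helper name are illustrative choices.

```python
from fractions import Fraction

TIMESCALE = 90000  # candidate mdhd timescale

# Common frame rates as exact fractions; NTSC-style rates use the /1001 form.
FRAME_RATES = {
    "23.976": Fraction(24000, 1001),
    "24": Fraction(24),
    "25": Fraction(25),
    "29.97": Fraction(30000, 1001),
    "30": Fraction(30),
    "50": Fraction(50),
    "59.94": Fraction(60000, 1001),
    "60": Fraction(60),
}


def ticks_per_frame(timescale: int, fps: Fraction) -> Fraction:
    """STTS sample duration that a constant-frame-rate track would need."""
    return timescale / fps


results = {name: ticks_per_frame(TIMESCALE, fps) for name, fps in FRAME_RATES.items()}

for name, delta in results.items():
    note = "" if delta.denominator == 1 else "  <- not a whole number of ticks"
    print(f"{name:>6} fps: {float(delta):g} ticks per frame{note}")
```

Most of these rates divide evenly (3750 for 24 fps, 3003 for 29.97 fps, and so on). The exact 23.976 and 59.94 fractions give 3753.75 and 1501.5, so they are not covered. By comparison, a timescale of 30 gives whole deltas only for rates that divide 30, such as 30 fps itself with a delta of 1.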