Decode a proprietary H264 network video stream


I have incoming byte streams, probably encoded in H264, arriving from an RTSP camera through a websocket in my Spring Boot application.

I need to decode the incoming H264 streams to transmit the video to my frontend clients. I have tried using JavaCV/FFmpeg but nothing works.

Any help would be appreciated.

This is part of the hex dump received through the socket:

00000000: 01 00 00 00 04 48 32 36 34 00 00 00 24 38 65 34    .....H264...$8e4
00000010: 32 39 65 37 61 2D 32 66 34 66 2D 34 37 31 61 2D    29e7a-2f4f-471a-
00000020: 39 61 63 30 2D 66 66 62 38 64 64 37 63 37 64 37    9ac0-ffb8dd7c7d7
00000030: 32 00 00 00 D4 7B 22 49 73 49 6E 69 74 22 3A 66    2...T{"IsInit":f
00000040: 61 6C 73 65 2C 22 49 73 41 75 64 69 6F 22 3A 66    alse,"IsAudio":f
00000050: 61 6C 73 65 2C 22 54 6F 74 61 6C 53 65 63 6F 6E    alse,"TotalSecon
00000060: 64 73 22 3A 30 2E 30 36 2C 22 46 72 61 6D 65 54    ds":0.06,"FrameT
00000070: 69 6D 65 22 3A 22 32 30 32 33 2D 30 32 2D 32 33    ime":"2023-02-23
00000080: 54 30 34 3A 32 31 3A 35 33 2E 35 33 31 5A 22 2C    T04:21:53.531Z",
00000090: 22 53 65 71 75 65 6E 63 65 49 64 22 3A 31 2C 22    "SequenceId":1,"
000000a0: 42 61 73 65 44 65 63 6F 64 65 54 69 6D 65 22 3A    BaseDecodeTime":
000000b0: 32 36 35 38 37 2C 22 4D 65 64 69 61 54 69 6D 65    26587,"MediaTime
000000c0: 22 3A 32 36 35 38 37 2C 22 49 73 46 72 61 6D 65    ":26587,"IsFrame
000000d0: 48 69 64 64 65 6E 22 3A 66 61 6C 73 65 2C 22 49    Hidden":false,"I
000000e0: 73 4B 65 79 46 72 61 6D 65 22 3A 66 61 6C 73 65    sKeyFrame":false
000000f0: 2C 22 49 64 22 3A 34 34 35 2C 22 47 65 6E 65 72    ,"Id":445,"Gener
00000100: 61 74 69 6F 6E 22 3A 31 7D 00 00 3F 50 00 00 00    ation":1}..?P...
00000110: 68 6D 6F 6F 66 00 00 00 10 6D 66 68 64 00 00 00    hmoof....mfhd...
00000120: 00 00 00 01 BD 00 00 00 50 74 72 61 66 00 00 00    ....=...Ptraf...
00000130: 10 74 66 68 64 00 02 00 00 00 00 00 01 00 00 00    .tfhd...........
00000140: 14 74 66 64 74 01 00 00 00 00 00 00 00 00 00 67    .tfdt..........g
00000150: DB 00 00 00 24 74 72 75 6E 01 00 0F 01 00 00 00    [...$trun.......
00000160: 01 00 00 00 70 00 00 00 3C 00 00 3E E0 00 01 00    ....p...<..>`...
00000170: 00 00 00 00 00 00 00 3E E8 6D 64 61 74 00 00 3E    .......>hmdat..>
00000180: DC 41 E1 81 80 93 BE 16 2B 33 77 3D 4C B6 55 8B    \Aa...>.+3w=L6U.
00000190: D2 55 60 92 05 F7 F7 A4 97 54 4B 6C A6 68 48 84    RU`..ww$.TKl&hH.
000001a0: 68 FF D2 B6 6C 02 31 FC 24 01 78 EA BD 20 AD 15    h.R6l.1|$.xj=.-.
000001b0: F1 73 31 4B EB EF 18 1B 50 B3 13 F2 DC C6 4C E1    qs1Kko..P3.r\FLa
000001c0: 75 8B 94 52 6B C5 09 37 55 1E 45 66 6A 92 39 23    u..RkE.7U.Efj.9#
000001d0: C9 2D FD BB EC AD FD CF C4 30 75 FF 44 66 FA 85    I-};l-}OD0u.Dfz.
000001e0: D9 7C 18 72 AE 63 45 60 DD D7 65 44 84 49 95 8D    Y|.r.cE`]WeD.I..
000001f0: 2C 70 6C 57 8E E9 A9 EB B6 F6 78 BD D6 88 99 F6    ,plW.i)k6vx=V..v
00000200: FC 25 B1 0A FF DF CB 77 6A 67 37 24 A5 3D 8F A1    |%1.._Kwjg7$%=.!
00000210: 27 9B 4F 42 0E CD B8 87 6E C9 99 FC 6F 4C 53 4B    '.OB.M8.nI.|oLSK
00000220: 01 EA B6 AF 99 F8 22 C1 8F 1E C1 66 D6 8A 09 D6    .j6/.x"A..AfV..V
00000230: 99 79 91 F7 C1 2A 08 1F 81 CB 5E DD C3 CA 86 8F    .y.wA*...K^]CJ..
00000240: 57 BF 17 A2 64 6B 69 56 AE 19 1F 57 AD A6 D8 C2    W?."dkiV...W-&XB
00000250: 06 28 EB 46 D3 E4 85 51 3E E2 A5 40 50 50 85 7D    .(kFSd.Q>b%@PP.}
00000260: 72 6B 20 87 1A 6E 73 E1 B8 88 9E 20 23 48 6D FE    rk...nsa8...#Hm~
00000270: C2 0D 39 ED 24 B2 6D B5 9B 81 B6 BC F4 EE DE A2    B.9m$2m5..6<tn^"
00000280: CF A1 08 D0 D2 5B EE FA 0D DA FD 3B 79 C7 89 E5    O!.PR[nz.Z};yG.e
00000290: 4F 64 73 37 98 D6 2D 47 1D 8B A3 47 DD EA C9 8E    Ods7.V-G..#G]jI.
000002a0: 3E 8C 97 E2 42 15 FB 22 A6 83 A1 34 18 52 5E 35    >..bB.{"&.!4.R^5
000002b0: 2A A6 E2 71 D7 4F 96 0A EC AE 8D 39 27 B8 CF 61    *&bqWO..l..9'8Oa
000002c0: CC ED E9 AF 74 C3 95 D3 E3 96 32 20 E6 31 0B E4    Lmi/tC.Sc.2.f1.d
000002d0: DC F4 FF 41 37 36 E7 DB 87 AE B3 7D BF CA F8 05    \t.A76g[..3}?Jx.
000002e0: 72 2A 38 AB B8 8E 98 43 97 C8 5E 80 57 C6 E7 1E    r*8+8..C.H^.WFg.
000002f0: 86 75 CE CD CE BF CF 10 C9 8A C2 C9 6E 33 41 AC    .uNMN?O.I.BIn3A,
00000300: 91 AC A8 F3 1B E6 D5 0A 22 A1 2C 4C 68 19 51 4D    .,(s.fU."!,Lh.QM
00000310: 17 DA AE E1 D7 BC 0E 2D F8 14 61 E2 4F BA 26 A3    .Z.aW<.-x.abO:&#
00000320: 0A E4 A6 BE 08 EA 3C 28 E6 C5 6B CA 3A 86 D2 59    .d&>.j<(fEkJ:.RY
00000330: 34 C2 ED 91 72 5A EF 2C BE D7 38 A4 60 D7 F3 97    4Bm.rZo,>W8$`Ws.
00000340: BB E6 FD C2 D0 29 10 B5 A4 79 D8 3E 61 48 8A F9    ;f}BP).5$yX>aH.y
00000350: C6 D8 13 D0 FD DB D6 FA 24 7F CD 5A BF 06 57 49    FX.P}[Vz$.MZ?.WI
00000360: 51 EC ED B2 74 AB 92 1D 37 68 70 A2 A5 31 B5 5F    Qlm2t+..7hp"%15_
00000370: EA CF 9E 3E 6A B1 78 16 B7 94 D1 46 7B 63 C1 67    jO.>j1x.7.QF{cAg
00000380: D2 B0 08 44 64 1E 68 15 39 80 E3 DD EB C0 E1 71    R0.Dd.h.9.c]k@aq
00000390: E8 EE D0 4D DF 4F 41 E0 96 C5 34 AD BC D3 9E 88    hnPM_OA`.E4-<S..
000003a0: 0B 17 D8 7D 3A A8 3B 06 78 79 93 B7 30 92 C8 D8    ..X}:(;.xy.70.HX
000003b0: 5D 27 04 D7 00 9F E3 EA A3 C6 BD B9 05 21 5C 68    ]'.W..cj#F=9.!\h
000003c0: 45 DB 90 2A 05 38 79 D9 84 60 C7 F2 BB DE 1B 5A    E[.*.8yY.`Gr;^.Z
000003d0: 44 0B ED 67 34 DF 07 8B F5 04 27 9E 1A F0 04 CA    D.mg4_..u.'..p.J
000003e0: 86 B1 2C 0B 78 D0 58 86 81 62 D8 70 3D BA 9D 51    .1,.xPX..bXp=:.Q
000003f0: D8 2C 6C 6A 10 88 B9 F8 89 3D 6F 39 C2 52 49 CF    X,lj..9x.=o9BRIO
00000400: 9F C1 50 6A D4 9E A5 96 B2 0A 99 1D 6B BC 63 03    .APjT.%.2...k<c.
00000410: A4 8C 7E 1D BD DF 8B D8 97 EE 9A 59 78 63 FC 74    $.~.=_.X.n.Yxc|t
00000420: 3B 40 75 AF A7 1A B7 F0 56 A5 5F 3E 81 54 83 A0    ;@u/'.7pV%_>.T..
00000430: 7F FC AD 71 CE AF 54 8B 5D DC 27 34 20 A3 0A 73    .|-qN/T.]\'4.#.s
00000440: 76 A5 81 33 22 31 56 6B 1D 82 C4 32 FB 82 15 F6    v%.3"1Vk..D2{..v
00000450: 97 C8 47 29 3C 9E 59 9A C0 83 48 A0 55 CB C8 D6    .HG)<.Y.@.H.UKHV
00000460: 36 92 CC 54 A7 00 E3 28 9E 99 45 B2 E5 7E 88 A7    6.LT'.c(..E2e~.'
00000470: 28 4E CA 75 17 3C D3 B5 6C F5 FD AC 05 55 BF F7    (NJu.<S5lu},.U?w
00000480: 98 61 92 30 D8 0F 0E A5 DD 61 4D 80 27 5B A7 68    .a.0X..%]aM.'['h
00000490: E5 B9 C2 B8 EE 31 F6 63 29 37 C5 C9 11 39 90 8D    e9B8n1vc)7EI.9..
000004a0: D8 00 35 F4 7A 2D 79 D0 6A BB 9C 98 E4 41 CF 3F    X.5tz-yPj;..dAO?
000004b0: DE 9D 8B BF 04 69 1D BC 5C E7 E1 F2 49 01 8D F5    ^..?.i.<\garI..u
000004c0: 41 3E 3F FB AE 54 B2 D9 F2 A0 E8 0A F7 59 47 77    A>?{.T2Yr.h.wYGw
000004d0: 3C 19 C8 7B 81 9B 17 19 E9 81 A0 36 AD C6 62 71    <.H{....i..6-Fbq
000004e0: DB 68 72 8F 6A 37 45 D9 0E 6E DC 2C 5E 52 C2 75    [hr.j7EY.n\,^RBu
000004f0: 51 2F F9 CE 8A 10 12 E9 C8 68 A9 D6 A6 D7 5B 14    Q/yN...iHh)V&W[.
00000500: 11 51 42 FD BE B5 09 56 7F 19 C3 EB A7 A6 DF 6C    .QB}>5.V..Ck'&_l
00000510: 55 A3 11 DC EF 81 C3 CD DD 63 BF 38 F8 5A 4A 45    U#.\o.CM]c?8xZJE
00000520: 33 24 7B A4 55 B3 85 A6 87 75 3B 85 51 5C 03 B7    3${$U3.&.u;.Q\.7

UPDATE TO THE CODE

1st Packet find here 2nd Packet find here

I have updated the code, as suggested in one of the comments, to read only the MDAT box and retrieve the H264 stream from the incoming byte[] received through the socket. I now send only the MDAT box contents (everything from the first byte after the "mdat" name).

public Map.Entry<Boolean, List<Integer>> hasMdat(byte[] byteArray) {
    for (int i = 0; i < byteArray.length - 3; i++) {
        if (byteArray[i] == (byte) 109 &&
                byteArray[i + 1] == (byte) 100 &&
                byteArray[i + 2] == (byte) 97 &&
                byteArray[i + 3] == (byte) 116) {

            return Map.entry(true, Arrays.asList(i, i + 1, i + 2, i + 3));
        }
    }
    return Map.entry(false, List.of(0));
}

This is my code which handles the byte stream

initSocketConnection(new VideoStreamCallback() {
    @Override
    public void onVideoStreamReceived(byte[] bytes) {
        Map.Entry<Boolean, List<Integer>> b = hasMdat(bytes);
        if (b.getKey()) {
            byte[] b1 = Arrays.copyOfRange(bytes, b.getValue().get(3) + 1, bytes.length);
            // write b1 back to client using spring SSE
        }
    }
});

There are 2 best solutions below

13
VC.One On BEST ANSWER

"I have incoming byte streams probably encoded in H264 from a Genetec camera..."
"I need to decode the incoming H264 streams to transmit the video to my frontend..."

Note: If your "frontend" playback system can play MP4 then you already have a playable file as given. There is no need to extract H.264 bytes from within the MP4 bytes, convert bytes to AnnexB format, add Start-codes, skip audio frames, etc. Simply remove the starting Genetec header and the rest of the data is a playable MP4.

This answer below is for when the playback system accepts chunked MP4 data (H.264 contained in MP4). If the player expects actual raw H.264 frames then for extraction: see the Answer by Markus Schumann.

## Shortest version:

From each packet:

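  • At offset [49] read a skippable "Size" as a 32-bit integer (it is in Big-Endian format).
  • MP4 data begins at int mp4_data_pos = (49 + 4 + skipSize + 4);. The first 4 is the "Size" field itself; the second 4 is one more 32-bit integer after the skipped bytes (in the sample packets it holds the length of the MP4 data).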
  • Extract MP4 data parts only, and use them for testing as video playback.
  • In MP4 data: [moov] is metadata and then chunks are [moof+mdat] ... [moof+mdat].
  • Find moov as: 0x6D6F6F76, find moof as: 0x6D6F6F66, find mdat as: 0x6D646174.
  • Get a size of each atom by reading a 32-bit integer from the previous four bytes.
    (eg: int size_mdat = read_32_bits_from( pos_mdat - 4); since one array slot holds 8 bits).
  • First [moof + mdat] chunk after the metadata part must contain a keyframe for display.
  • Find keyframe through checking for a frame type 5 with:
    int frame_type = ( byteArray[ mdat_pos + 8 ] & 0x1F );
    (NB 1: This will only find a keyframe if it's the first frame in a chunk (eg: a [moof + mdat] package).)
    (NB 2: It's possible your Genetec device outputs a keyframe later (eg: every 25th frame) or else maybe it only emits a keyframe at device start-up. Check for such an issue if not getting a keyframe.)
  • If testing with an HTML5 video tag, then your codec setup is: avc1.640028.
  • If testing as file then save N-amount of "MP4 data" parts as one file (ie: file is the joined MP4 chunks).
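
A minimal Java sketch of the header-skipping steps, assuming the layout seen in the two sample packets: a 0x01 byte, three length-prefixed fields (codec name, GUID text, JSON Object), then one more 32-bit value before the MP4 bytes. The names GenetecPacketSketch, mp4Offset and buildPacket are illustrative only and are not part of any Genetec API; buildPacket exists purely to fabricate a packet for testing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class GenetecPacketSketch {

    /** Returns the offset of the first MP4 atom, or -1 if the header does not parse. */
    static int mp4Offset(byte[] packet) {
        ByteBuffer buf = ByteBuffer.wrap(packet); // ByteBuffer is big-endian by default
        if (buf.remaining() < 1 || buf.get() != 0x01) return -1;
        // Three length-prefixed fields: codec name, GUID text, JSON Object.
        for (int field = 0; field < 3; field++) {
            if (buf.remaining() < 4) return -1;
            int len = buf.getInt();
            if (len < 0 || len > buf.remaining()) return -1;
            buf.position(buf.position() + len);
        }
        // One more 32-bit value (assumed to be the MP4 payload length); read and ignored here.
        if (buf.remaining() < 4) return -1;
        buf.getInt();
        return buf.position();
    }

    /** Test helper (hypothetical): builds a packet with the assumed layout. */
    static byte[] buildPacket(String codec, String guid, String json, byte[] mp4) {
        byte[] c = codec.getBytes(StandardCharsets.US_ASCII);
        byte[] g = guid.getBytes(StandardCharsets.US_ASCII);
        byte[] j = json.getBytes(StandardCharsets.UTF_8);
        ByteBuffer out = ByteBuffer.allocate(1 + 4 + c.length + 4 + g.length + 4 + j.length + 4 + mp4.length);
        out.put((byte) 0x01);
        out.putInt(c.length).put(c);
        out.putInt(g.length).put(g);
        out.putInt(j.length).put(j);
        out.putInt(mp4.length).put(mp4);
        return out.array();
    }

    public static void main(String[] args) {
        byte[] mp4 = {0, 0, 0, 8, 'm', 'o', 'o', 'f'};
        byte[] pkt = buildPacket("H264", "8e429e7a-2f4f-471a-9ac0-ffb8dd7c7d72", "{\"IsInit\":false}", mp4);
        System.out.println("MP4 data begins at offset " + mp4Offset(pkt));
    }
}
```

Reading the lengths instead of hard-coding 49 keeps the sketch working if the codec name or GUID length ever differs, which the sample packets alone cannot rule out.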

## Short version:

[ to add later: Image of header sections to skip + with 32-bit integers highlighted ]

In each packet...

  • Skip ahead, past the first 49 bytes.
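  • At offset 49, read a 32-bit integer (ie: to update some "skipSize" variable).
  • Skip ahead (past the 4 bytes just read and past some Object bytes) by the new amount from skipSize variable.
  • Skip 4 more bytes (another 32-bit integer; in the sample packets it holds the length of the MP4 data).
  • MP4 data begins here (eg: mp4_begins_pos = (49 + 4 + skipSize + 4);).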

Some notes about MP4 data in packets:

  • MP4 data is up to end of packet, and may have a larger size than the current packet size.
  • MP4 data which is larger than the packet's own size will continue in the next packet.
  • in bytes: moov atom has required MP4 metadata, and moof atom is a playable MP4 chunk.
  • in bytes: an MP4 atom begins with a SIZE (32-bits or 4 bytes), followed by atom NAME.
  • in bytes: Copy all atoms (in order of appearance) by their SIZEs into a new bytes array.
    This array can be saved as a file or sent to a decoder for playback.
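
The SIZE-then-NAME layout of atoms can be walked with a short loop. This is a hedged sketch, not a full MP4 parser: it assumes plain 32-bit sizes (the special size values 0 and 1 are not handled) and only looks at top-level atoms. AtomWalkerSketch and listAtoms are made-up names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class AtomWalkerSketch {

    /** Lists "name:size" for every top-level atom whose 8-byte header starts inside the array. */
    static List<String> listAtoms(byte[] mp4, int start) {
        List<String> atoms = new ArrayList<>();
        int pos = start;
        while (pos + 8 <= mp4.length) {
            // SIZE: 32-bit big-endian, counts the 8 header bytes too.
            long size = ByteBuffer.wrap(mp4, pos, 4).getInt() & 0xFFFFFFFFL;
            // NAME: four ASCII characters right after the size.
            String name = new String(mp4, pos + 4, 4, StandardCharsets.US_ASCII);
            atoms.add(name + ":" + size);
            if (size < 8) break;                 // 0 and 1 are special cases, not handled here
            if (size > mp4.length - pos) break;  // atom continues in the next packet
            pos += (int) size;
        }
        return atoms;
    }

    public static void main(String[] args) {
        byte[] data = {0, 0, 0, 8, 'm', 'o', 'o', 'f', 0, 0, 0, 16, 'm', 'd', 'a', 't', 1, 2, 3, 4};
        System.out.println(listAtoms(data, 0));
    }
}
```

An atom whose declared size runs past the end of the array is still listed, which makes it easy to spot a chunk that needs the next packet before it is complete.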

(Option A) For online playback (using HTML5 video tag):
Send the array of "MP4 data" parts only to your front-end.
(meaning all packet bytes minus the first 49 bytes, minus the Object bytes (by its size), and minus the 32-bit size values on either side of the Object).

Web playback means choosing either:

  • Using MediaSource Extensions API to manually feed received chunks to the browser's MPEG decoder.

  • Using a server-side script as tag's src, where the script "pipes" the MP4 data back into the video tag.

  • Pre-saving the chunks on server as they arrive, then using HLS Live Playlist format to serve frontend clients.

From the STSD atom I can see you have: H.264 "High" profile @ level 4.0 (or avc1.640028).

type= video/mp4; codecs= "avc1.640028";
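
The codec string is "avc1." followed by three bytes written as hex: the profile, the constraint flags and the level. A small sketch of that arithmetic follows; the class and method names are illustrative, and the three input values are the ones quoted above (0x64 = High profile, 0x00, 0x28 = level 4.0).

```java
import java.util.Locale;

final class CodecStringSketch {

    /** Formats profile, constraint flags and level as an "avc1.PPCCLL" codec string. */
    static String avc1(int profileIdc, int constraintFlags, int levelIdc) {
        return String.format(Locale.ROOT, "avc1.%02X%02X%02X",
                profileIdc & 0xFF, constraintFlags & 0xFF, levelIdc & 0xFF);
    }

    /** Full MIME type string of the kind a browser media API expects. */
    static String mimeType(int profileIdc, int constraintFlags, int levelIdc) {
        return "video/mp4; codecs=\"" + avc1(profileIdc, constraintFlags, levelIdc) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(mimeType(0x64, 0x00, 0x28)); // video/mp4; codecs="avc1.640028"
    }
}
```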

(Option B) To save as a file for quick testing:
Concat (join) your required N-amount of "MP4 data" parts into one long array, then save that array as a file. Test the new MP4 file from your storage folder in a player like VLC Media Player.
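
One way to join the parts for such a test file, sketched under the assumption that the offset of the MP4 data inside each packet is already known. Mp4DumpSketch and its methods are hypothetical names, and the target path is whatever you choose.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class Mp4DumpSketch {
    private final ByteArrayOutputStream joined = new ByteArrayOutputStream();

    /** Appends everything in the packet from mp4Offset to the end. */
    void append(byte[] packet, int mp4Offset) {
        joined.write(packet, mp4Offset, packet.length - mp4Offset);
    }

    /** The joined MP4 bytes collected so far. */
    byte[] bytes() {
        return joined.toByteArray();
    }

    /** Writes the joined bytes to a file, e.g. for opening in VLC. */
    void save(Path target) throws IOException {
        Files.write(target, joined.toByteArray());
    }

    public static void main(String[] args) {
        Mp4DumpSketch dump = new Mp4DumpSketch();
        dump.append(new byte[] {9, 9, 1, 2}, 2); // pretend the first two bytes are header
        dump.append(new byte[] {9, 3}, 1);
        System.out.println(dump.bytes().length + " bytes joined");
    }
}
```

Start appending with the packet that carries the ftyp and moov atoms; a file that begins with a moof chunk has no metadata for the player to use.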

## Long version:

The format of Genetec's MP4 header is easy enough to understand:

(1) Each packet begins with some 49 bytes of skippable values.

  • (a) Each packet's first byte value is 0x01.

    • Confirm with an if (byteArray[0] == 1) { /* is OK packet */ }.
    • Else assume there is some packet corruption. Skip/ignore such packets.
  • (b) Followed by a 32-bit integer for size of data, and then data itself: is type String (eg: H264).

    • example packet-1 bytes: 00 00 00 04 (=4) then 48 32 36 34 (=H264)
  • (c) Followed by a 32-bit integer size for a possible GUID (36 text characters: 32 hex digits plus 4 hyphens, which is enough for 16 bytes).

    • example packet-1 bytes: 00 00 00 24 (=36) then XXXX-XX-XX-etc style hex values.

The data above always seems to be 49 bytes. Confirm by checking other packets (is there a pattern of 49 bytes that are: a string of "H264" followed by a sequence of "XXXX-XX-XX-etc" style values?).

  • Skip past these 49 bytes (of Genetec's custom header bytes).
  • Skip range is [byte 0] ... [byte 48], since these bytes are not needed for playback.

(2) After skipping, there is an Object (added by Genetec) of side metadata (eg: {"IsAudio":false} ).

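  • (a) From offset 49 onwards, read a 32-bit integer for the size (bytes length) of the Object.
    • example packet-1 bytes: 00 00 00 C8 == is size of 200 bytes
  • (b) Skip the Object bytes by using the found size.
    • These custom Object's bytes are not needed for playback.
  • (c) Skip the 32-bit integer that follows the Object (to land at example position: offset N).
    • example packet-1 bytes: 00 00 02 C4 (=708), which matches the combined size of the ftyp (28) and moov (680) atoms.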

(3) At offset N, the actual MP4 data begins.

Here is a starting example code to check the packet for MP4 data size to extract for playback.

Use the read_int32 function that I've provided to get a 32-bit integer from some Array position.

import java.util.Arrays;

public class Main 
{
    //# Vars for MP4 data
    public static int size_MP4_data = 0;
    public static int size_expected_MP4_data = 0;
    public static int size_received_MP4_data = 0;
    public static int offset_MP4_data = 0;
    
    public static boolean need_more_MP4_data = false;
     
    public static void main(String[] args) 
    {
        //# Example Array to represent a received input Packet
        //# Array contents are from the first 320 bytes of your first example packet
        //# See full bytes at: https://pastebin.com/embed_js/3Ca8ZDFk 
        int[] bytes_Packet =    {

                                    0x01, 0x00, 0x00, 0x00, 0x04, 0x48, 0x32, 0x36, 0x34, 0x00, 0x00, 0x00, 0x24, 0x39, 0x33, 0x65, 
                                    0x63, 0x35, 0x39, 0x31, 0x30, 0x2D, 0x65, 0x65, 0x35, 0x38, 0x2D, 0x34, 0x39, 0x37, 0x32, 0x2D, 
                                    0x61, 0x30, 0x66, 0x66, 0x2D, 0x32, 0x65, 0x62, 0x33, 0x61, 0x33, 0x61, 0x34, 0x32, 0x66, 0x35, 
                                    0x66, 0x00, 0x00, 0x00, 0xC8, 0x7B, 0x22, 0x49, 0x73, 0x49, 0x6E, 0x69, 0x74, 0x22, 0x3A, 0x74, 
                                    0x72, 0x75, 0x65, 0x2C, 0x22, 0x49, 0x73, 0x41, 0x75, 0x64, 0x69, 0x6F, 0x22, 0x3A, 0x66, 0x61, 
                                    0x6C, 0x73, 0x65, 0x2C, 0x22, 0x54, 0x6F, 0x74, 0x61, 0x6C, 0x53, 0x65, 0x63, 0x6F, 0x6E, 0x64, 
                                    0x73, 0x22, 0x3A, 0x30, 0x2E, 0x30, 0x2C, 0x22, 0x46, 0x72, 0x61, 0x6D, 0x65, 0x54, 0x69, 0x6D, 
                                    0x65, 0x22, 0x3A, 0x22, 0x32, 0x30, 0x32, 0x33, 0x2D, 0x30, 0x32, 0x2D, 0x32, 0x35, 0x54, 0x31, 
                                    0x36, 0x3A, 0x35, 0x30, 0x3A, 0x32, 0x37, 0x2E, 0x32, 0x36, 0x31, 0x5A, 0x22, 0x2C, 0x22, 0x53, 
                                    0x65, 0x71, 0x75, 0x65, 0x6E, 0x63, 0x65, 0x49, 0x64, 0x22, 0x3A, 0x31, 0x2C, 0x22, 0x42, 0x61, 
                                    0x73, 0x65, 0x44, 0x65, 0x63, 0x6F, 0x64, 0x65, 0x54, 0x69, 0x6D, 0x65, 0x22, 0x3A, 0x30, 0x2C, 
                                    0x22, 0x4D, 0x65, 0x64, 0x69, 0x61, 0x54, 0x69, 0x6D, 0x65, 0x22, 0x3A, 0x30, 0x2C, 0x22, 0x49, 
                                    0x73, 0x46, 0x72, 0x61, 0x6D, 0x65, 0x48, 0x69, 0x64, 0x64, 0x65, 0x6E, 0x22, 0x3A, 0x66, 0x61,
                                    0x6C, 0x73, 0x65, 0x2C, 0x22, 0x49, 0x73, 0x4B, 0x65, 0x79, 0x46, 0x72, 0x61, 0x6D, 0x65, 0x22, 
                                    0x3A, 0x66, 0x61, 0x6C, 0x73, 0x65, 0x2C, 0x22, 0x49, 0x64, 0x22, 0x3A, 0x30, 0x2C, 0x22, 0x47,
                                    0x65, 0x6E, 0x65, 0x72, 0x61, 0x74, 0x69, 0x6F, 0x6E, 0x22, 0x3A, 0x31, 0x7D, 0x00, 0x00, 0x02,
                                    0xC4, 0x00, 0x00, 0x00, 0x1C, 0x66, 0x74, 0x79, 0x70, 0x64, 0x61, 0x73, 0x68, 0x00, 0x00, 0x00,
                                    0x00, 0x69, 0x73, 0x6F, 0x6D, 0x64, 0x61, 0x73, 0x68, 0x6D, 0x70, 0x34, 0x31, 0x00, 0x00, 0x02, 
                                    0xA8, 0x6D, 0x6F, 0x6F, 0x76, 0x00, 0x00, 0x00, 0x78, 0x6D, 0x76, 0x68, 0x64, 0x01, 0x00, 0x00, 
                                    0x00, 0x00, 0x00, 0x00, 0x00, 0xE0, 0x1F, 0xEC, 0xD3, 0x00, 0x00, 0x00, 0x00, 0xE0, 0x1F, 0xEC
                                  
                                };

        //# Process the packet to extract MP4 data
        int[] data_MP4 = process_Packet( bytes_Packet ); //# returns a trimmed array
    }
    
    static int[] process_Packet( int[] input ) 
    {
        //# (optional) Confirm function code is running ....
        System.out.println( "## Checking received Packet byte values .... " );
        
        //# NOTE: 
        //# "size_header_Genetec" is the mentioned "skipSize" variable equivalent.
        //# it stores how much bytes length (size) to skip past to reach MP4 data.
        
        int temp_int = 0; //# temp number for counting
        int size_total_packet = input.length; //# using size of "input" packet given to this function
        int size_header_Genetec = 0;
        
        //# first check if this packet's MP4 data needs to be added to another previous packet's data to make a full (uncorrupt) chunk.
        if( need_more_MP4_data == true)
        {
            //## to fix later (if needed)
            //## solution: extract needed remainder and append to an existing array
        }
        
        ///////////////////////////////
        //### Phase 1: Find MP4 atoms
        ///////////////////////////////
        
        //# Account for starting "0x01" byte
        size_header_Genetec = 1;
        
        //# since the size is increased with a "+=" we can re-use
        //# the newly increased "size_header_Genetec" value.
        
        //# get next size (usually String of 4 letters: "H264")
        temp_int = read_int32( input, size_header_Genetec );
        size_header_Genetec += (temp_int + 4);
        
        //# get next size (usually a GUID of hex values)
        temp_int = read_int32( input, size_header_Genetec );
        size_header_Genetec += (temp_int + 4);
        
        //# get next size (usually an Object of metadata values)
        temp_int = read_int32( input, size_header_Genetec );
        size_header_Genetec += (temp_int + 4);
        
        //# skip next 4 bytes
        size_header_Genetec += 4;
       
        //# Update offset for later use ...
        offset_MP4_data = size_header_Genetec;
        
        //# final check before next steps
        System.out.println( "- MP4 data begins at offset: [" + offset_MP4_data + "] until end of this packet" );
        
        ////////////////////////////////
        //### Phase 2: Handle MP4 atoms
        ////////////////////////////////
        
        //# check atom NAME
        temp_int = read_int32( input, offset_MP4_data + 4 );
        
        if( 
            //# IF atom "name" is one of these (as expected from your packet MP4 structure)...
            ( temp_int == 0x66747970 )    //# is "ftyp"
            || ( temp_int == 0x6D6F6F76 ) //# is "moov"
            || ( temp_int == 0x6D6F6F66 ) //# is "moof"
            || ( temp_int == 0x6D646174 ) //# is "mdat"
            
        )
        {
            //# THEN check all MP4 atoms in this packet for total size of MP4 data
            //# do this by adding together all the atom sizes
            //# note: if size is bigger than packet, then store data for completion with next packet
            
            //# add sizes of MP4 atoms...
            
            temp_int = offset_MP4_data;
                
            while(true)
            {
                size_expected_MP4_data += read_int32( input, temp_int );
                temp_int = ( offset_MP4_data + size_expected_MP4_data );
                
                //# avoid reading past end of packet
                if( size_expected_MP4_data >= ( size_total_packet - offset_MP4_data ) )
                {
                    break;
                }
            }
        
        }
        
        /////////////////////////////////////////////////////////////////////////////
        //# confirm MP4 data positions are correct (double-check by a hex view of same bytes)...
        System.out.println( ">> MP4 data offset: " + offset_MP4_data );
        System.out.println( ">> MP4 data length is : " + size_expected_MP4_data );
        /////////////////////////////////////////////////////////////////////////////
        
        ///////////////////////////////
        //# Phase 3: Copy the MP4 data
        ///////////////////////////////
        
        int copyStartPos = offset_MP4_data;
        int copyEndPos = size_total_packet; //# end index of copyOfRange is exclusive, so this keeps the last byte
        
        //# prepare for when getting the next new packet
        
        if( size_expected_MP4_data > ( size_total_packet - offset_MP4_data ) ) { need_more_MP4_data = true; }
        
        if( need_more_MP4_data == false )
        {
            //# reset the count for this new MP4 chunk
            //size_received_MP4_data = 0;
            //size_expected_MP4_data = 0;
        }
        
        //# slice the Array to keep only the MP4 data parts
        return ( Arrays.copyOfRange( input , copyStartPos, copyEndPos) );
        
    }
    
    //# function expects:    
    //# "input" = array to search, 
    //# "pos" = start position of reading a 4-byte sequence 
    static int read_int32( int[] input, int pos ) 
    {
        
        //# integer to hold combined 4-byte values as one result/number
        int temp_int = 0;
        
        //# join the four byte values into "temp_int"
        temp_int = ( input[ pos+0 ] << 24 );
        temp_int |= ( input[ pos+1 ] << 16 );
        temp_int |= ( input[ pos+2 ] << 8 );
        temp_int |= ( input[ pos+3 ] << 0 );

        return ( temp_int ); 
        
    }
        
}

Options for stream playback:

(a) If your player expects MP4 data:

  • Your data is already playable in most video players
  • Send all data of each packet, after skipping past the first 49 bytes, the { ... } Object bytes, and the 32-bit size values on either side of the Object.

(b) If your player expects raw H.264 data (ie: It does not play the MP4 data, only H.264 data):

  • Then you will have to extract each H.264 video frame from inside the MP4 data.
  • Your H.264 data is in AVCC format (.mp4), which means each frame comes with a "size in bytes" value.
  • Most players will expect raw H.264 to be in AnnexB format (.h264).
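    Replace the four bytes of each frame's size with the four-byte start code 00 00 00 01.
    (More precisely: 00 00 00 01 is needed for SPS, PPS and the first unit of each picture; the shorter 00 00 01 is also allowed for other units, but using the four-byte form everywhere is valid and keeps the data length unchanged.)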
0
Markus Schumann On

Your hex dump looks like a partial fragmented MP4 prefixed with some JSON. H.264 typically uses inter-frame compression, so not every frame is a full frame; most carry only the differences between two frames. Therefore you can't start decoding H.264 at an arbitrary point. You need to look for an IDR (instantaneous decoding refresh) frame in your stream. IDR frames may only be transmitted every 10-100 frames. Now, looking at your 'mdat' hex dump:

00 00 3E E8 6D 64 61 74 00 00 3E DC 41

00 00 3E E8 : size of 'mdat'

6D 64 61 74 : 'mdat'

00 00 3E DC : size of the NAL unit (part of the H.264) stream

41 : indicates NAL unit type of '1' (the lower 5 bits hold the NAL unit type: 0x41 & 0x1F = 1)

NAL unit of type '1' is a coded slice of a non-IDR picture.

So your 'mdat' does not contain an IDR (or key) frame therefore it is not decodable.
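
The same check can be done in code for every NAL unit in an 'mdat' payload, not only the first one. This is a rough sketch which assumes each NAL unit is prefixed by a 4-byte big-endian length, as in the dump above; other length sizes are possible in MP4 and are not handled. NalScanSketch, nalUnitType and containsIdr are made-up names.

```java
import java.nio.ByteBuffer;

final class NalScanSketch {

    /** NAL unit type is the lower 5 bits of the first byte of the NAL unit. */
    static int nalUnitType(int headerByte) {
        return headerByte & 0x1F;
    }

    /** True if any length-prefixed NAL unit in the payload is an IDR slice (type 5). */
    static boolean containsIdr(byte[] mdatPayload) {
        ByteBuffer buf = ByteBuffer.wrap(mdatPayload); // big-endian by default
        while (buf.remaining() > 4) {
            int len = buf.getInt();
            if (len <= 0 || len > buf.remaining()) return false; // truncated or malformed
            if (nalUnitType(buf.get(buf.position())) == 5) return true;
            buf.position(buf.position() + len); // jump to the next length field
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(nalUnitType(0x41)); // 1 = non-IDR slice
        System.out.println(nalUnitType(0x65)); // 5 = IDR slice
    }
}
```

The payload passed to containsIdr starts at the first byte after the 'mdat' name, so for the dump above it would begin with 00 00 3E DC 41.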

If you look at your JSON - you'll get another datapoint indicating the lack of a keyframe ("IsKeyFrame":false).

{
"IsInit":false,
"IsAudio":false,
"TotalSeconds":0.06,
"FrameTime":"2023-02-23T04:21:53.531Z",
"SequenceId":1,
"BaseDecodeTime":26587,
"MediaTime":26587,
"IsFrameHidden":false,
"IsKeyFrame":false,
"Id":445,"Generation":1
}

So you could modify your code and start decoding once you receive an IDR or key frame. But there is potentially another issue. You may need certain metadata to prime the H.264 decoder, which you typically find in the 'moov' (and not the 'moof') part of the fragmented mp4.

The stream metadata is called Sequence Parameter Set (SPS) and Picture Parameter Set (PPS). It is legal to omit SPS and PPS in 'mdat' since it is usually factored out into the 'moov'.

Handing just the 'mdat' to a H.264 decoder may not work if SPS/PPS is missing.

In any case, even if your IDR frames are prefixed with SPS/PPS, you still have to remove the size field(s) from the 'mdat' and replace them with start codes (00 00 00 01).

Essentially you have to convert the mp4-style H.264 stream from 'mdat' into an 'AnnexB'-style stream before feeding it into a decoder:

Your 'mdat':

'mdat' = <size> data[0] | <size> data[1] | ... | <size> data[n] |

Replace each size field with a start code, so that the length-prefixed units packed inside the 'mdat' become individually delimited units.

Required decoder input:

00 00 00 01 data[0]
00 00 00 01 data[1]
...
00 00 00 01 data[n]
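
A minimal sketch of that conversion, assuming 4-byte big-endian size fields as seen in the dump (MP4 also permits 1- or 2-byte sizes, which this does not handle). AnnexBSketch and avccToAnnexB are illustrative names, not an existing library API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

final class AnnexBSketch {
    private static final byte[] START_CODE = {0, 0, 0, 1};

    /** Rewrites "size + data" units as "start code + data" units. */
    static byte[] avccToAnnexB(byte[] mdatPayload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(mdatPayload.length);
        ByteBuffer buf = ByteBuffer.wrap(mdatPayload); // big-endian by default
        while (buf.remaining() >= 4) {
            int len = buf.getInt();
            if (len <= 0 || len > buf.remaining()) break; // stop at a truncated unit
            out.write(START_CODE, 0, START_CODE.length);
            out.write(mdatPayload, buf.position(), len);
            buf.position(buf.position() + len);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] in = {0, 0, 0, 2, 0x41, 0x11, 0, 0, 0, 1, 0x65};
        System.out.println(avccToAnnexB(in).length + " bytes out"); // same length as the input
    }
}
```

Because a 4-byte size is swapped for a 4-byte start code, the output has the same length as the input whenever every unit is complete.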

If your 'mdat' does not contain SPS/PPS then your best bet is to wait for, or request, the 'moov' or init part of the fragmented mp4 and hand the complete fragmented mp4 to ffmpeg.

I am guessing that the 'moov' or init part of the fragmented MP4 will be prefixed with the JSON member ("IsInit":true).

A previous answer, "Decoding H264 Stream Always Returns MF_E_TRANSFORM_NEED_MORE_INPUT", links a self-contained example that parses 'mdat' and hands it to Media Foundation for decoding.