The tutorials I've read describe JPEG files as purely binary data. But when I open a JPEG file with Python, the content doesn't look as regular as it does in the tutorials. I expected to see something like: \xff\xd8\xff\xe0\x00\x10... But in fact it looks like: \xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\x08\x06\x06\x07\x06\x05\x08\x07\x07\x07\t\t Why are there characters like JFIF, C, \t\t and so on?
I hope to figure this out so that I can make small modifications to this JPEG file.
A valid JPEG file must begin with the Start of Image (SOI) marker 0xff, 0xd8 and must contain Huffman tables and quantisation tables as well as the compressed image data. There are several other optional things it can contain too - many JPEGs out of a camera will have a thumbnail embedded. A bare JPEG file doesn't need much header info but it absolutely has to begin with SOI.
In theory it should end with the End of Image (EOI) marker 0xff, 0xd9 too, but only the strictest decoders are fussy about that.
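A minimal sketch of checking for those two markers in Python. The function name `check_soi_eoi` is made up for illustration, and it only looks at the first and last two bytes, so it says nothing about whether the rest of the file is valid.

```python
SOI = b"\xff\xd8"  # Start of Image marker
EOI = b"\xff\xd9"  # End of Image marker


def check_soi_eoi(data):
    """Return (starts_with_soi, ends_with_eoi) for raw JPEG bytes."""
    return data.startswith(SOI), data.endswith(EOI)


# Tiny in-memory stand-ins, not real images.
print(check_soi_eoi(SOI + b"...compressed data..." + EOI))
print(check_soi_eoi(SOI + b"...truncated file"))

# With a real file (the path is a placeholder):
#     with open("photo.jpg", "rb") as f:
#         print(check_soi_eoi(f.read()))
```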
The second item, 0xff, 0xe0, is the APP0 marker, used for application-specific metadata that lets the program opening the file know which flavour of JPEG it is dealing with. In this case it is JFIF, the JPEG File Interchange Format.
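As for why the letters show up at all: Python prints any byte that happens to be printable ASCII as that character and only falls back to `\x..` escapes for the rest, so `JFIF`, `C` and `\t` are ordinary bytes (0x4a 0x46 0x49 0x46, 0x43, 0x09) and the file is still purely binary. The sketch below shows this with the first 20 bytes quoted in the question and unpacks the APP0 segment with `struct`. The helper name `parse_jfif_app0` and the dictionary keys are my own choices, and it assumes APP0 follows SOI directly, as it does here.

```python
import struct

# The first 20 bytes quoted in the question.
header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00"

# Same bytes, two spellings: repr() just prefers the printable one.
assert b"\x4a\x46\x49\x46" == b"JFIF"
assert b"\x43" == b"C" and b"\x09" == b"\t"

# View every byte as two hex digits instead (Python 3.8+ for the separator).
print(header.hex(" "))


def parse_jfif_app0(data):
    """Unpack the JFIF APP0 segment that directly follows SOI."""
    if data[:4] != b"\xff\xd8\xff\xe0":
        raise ValueError("expected SOI followed by APP0")
    # Big-endian: length, "JFIF\0", version major/minor, density units,
    # X density, Y density, thumbnail width, thumbnail height.
    fields = struct.unpack(">H5sBBBHHBB", data[4:20])
    length, ident, major, minor, units, xden, yden, tw, th = fields
    return {
        "length": length,  # counts the two length bytes themselves
        "identifier": ident,
        "version": (major, minor),
        "units": units,
        "x_density": xden,
        "y_density": yden,
        "thumbnail": (tw, th),
    }


print(parse_jfif_app0(header))
```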
A full list of the various JPEG markers is on Wikipedia.
The two most common flavours of JPEG file are [Exif](https://en.wikipedia.org/wiki/Exif) (0xff, 0xe1), from most modern cameras, and the older JFIF.
Some can also include comments. There have been past threads here on SO about creating the smallest possible valid JPEG image file - using esoteric and rarely seen arithmetic encoding options.
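Since a comment is just another segment (marker 0xff, 0xfe, then a two-byte big-endian length that includes itself, then the text), adding one is a fairly safe way to modify a file on a small scale. This is only a sketch: `add_comment` is a hypothetical helper, it restricts the text to ASCII for simplicity, and it places the comment after a leading APP0 segment if there is one, otherwise directly after SOI.

```python
def add_comment(data, text):
    """Return a copy of the JPEG bytes with a COM segment inserted."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    payload = text.encode("ascii")
    if len(payload) > 0xFFFF - 2:
        raise ValueError("comment too long for a single segment")
    # The length field counts its own two bytes plus the payload.
    segment = b"\xff\xfe" + (len(payload) + 2).to_bytes(2, "big") + payload

    pos = 2  # just past SOI
    if data[2:4] == b"\xff\xe0":
        # Keep the JFIF APP0 segment directly after SOI.
        pos = 4 + int.from_bytes(data[4:6], "big")
    return data[:pos] + segment + data[pos:]


# In-memory demo on the header from the question plus a fake tail.
original = (
    b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00"
    b"\xff\xd9"
)
modified = add_comment(original, "hello")
print(modified)
```

To apply it to a real file, read the bytes in `"rb"` mode, call the function, and write the result to a new file in `"wb"` mode so the original stays untouched.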
It is an interesting programming exercise to parse the markers and embedded strings in a JPEG file. I suggest trying one from a NASA or HST site as they sometimes have interesting spare thumbnails lurking in them.
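A rough starting point for that exercise, assuming a well-formed baseline file: walk from marker to marker using the two-byte length that follows most markers, and stop at Start of Scan (0xff, 0xda), because the entropy-coded data after it has no length field. The names `iter_segments` and `MARKER_NAMES` are mine, the name table is deliberately incomplete, and fill bytes and multi-scan files are not handled.

```python
# Markers that stand alone, with no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))

# A few common names only; see a full marker list for the rest.
MARKER_NAMES = {
    0xD8: "SOI", 0xD9: "EOI", 0xE0: "APP0", 0xE1: "APP1",
    0xDB: "DQT", 0xC0: "SOF0", 0xC4: "DHT", 0xDA: "SOS", 0xFE: "COM",
}


def iter_segments(data):
    """Yield (marker_byte, payload) for each segment up to SOS or EOI."""
    pos = 0
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        pos += 2
        if marker in STANDALONE:
            yield marker, b""
            if marker == 0xD9:
                return
            continue
        length = int.from_bytes(data[pos:pos + 2], "big")
        if length < 2:
            raise ValueError(f"bad segment length at offset {pos}")
        yield marker, data[pos + 2:pos + length]
        if marker == 0xDA:
            return  # compressed scan data follows; stop here
        pos += length


# Synthetic sample: the header from the question, an all-zero 8-bit
# quantisation table, and an empty SOS. Not a decodable image.
sample = (
    b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00"
    + b"\xff\xdb\x00\x43" + bytes(65)
    + b"\xff\xda\x00\x02"
)
for marker, payload in iter_segments(sample):
    name = MARKER_NAMES.get(marker, "?")
    print(f"ff {marker:02x} {name:5s} {len(payload):5d} bytes {payload[:8]!r}")
```

Run against a real file, the embedded strings show up in the payloads: APP0 starts with `JFIF`, APP1 usually starts with `Exif`, and COM segments hold plain text.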
If you want more detail about the JPEG internals then Miano's book "Compressed Image File Formats" isn't a bad introduction and much more accessible than the JPEG standards document.