I want to evaluate some EXIF data in awk, and I have this documentation:
https://web.archive.org/web/20131111073619/http://www.exif.org/Exif2-1.PDF
But I get stuck when evaluating an example EXIF block.
The documentation says:
IFD structure
Bytes 0-1 Tag
Bytes 2-3 Type
Bytes 4-7 Count
Bytes 8-11 Value Offset
Type
The following types are used in Exif:
1 = BYTE An 8-bit unsigned integer.
2 = ASCII An 8-bit byte containing one 7-bit ASCII code. The final byte is terminated with NULL.
3 = SHORT A 16-bit (2-byte) unsigned integer.
4 = LONG A 32-bit (4-byte) unsigned integer.
5 = RATIONAL Two LONGs. The first LONG is the numerator and the second LONG expresses the denominator.
7 = UNDEFINED An 8-bit byte that can take any value depending on the field definition.
9 = SLONG A 32-bit (4-byte) signed integer (2's complement notation).
10 = SRATIONAL Two SLONGs. The first SLONG is the numerator and the second SLONG is the denominator.
My EXIF example data is 14062 bytes long; it starts with:
45 78 69 66 00 00 49 49 2a 00 08 00 00 00 0b 00 0e 01 02 00 20 00 00 00 92 00 00 00 0f 01 02 00 05 00 00 00 b2 00 00 00 10 01 02 00 07 00 00 00 b8 00 00 00 12 01 03
45 78 69 66 00 00 -> Exif\x00\x00, the Exif header
then the TIFF header:
49 49 -> II, which means little endian
2a 00 -> 00 2a -> 42, the TIFF file marker
08 00 00 00 -> 8 -> offset of the first IFD relative to the TIFF header
So the first IFD is 12 bytes long and starts at offset 8:
0b 00 0e 01 02 00 20 00 00 00 92 00
If I evaluate this as an IFD structure I get:
0b 00 tag -> 00 0b
0e 01 field type -> 01 0e -> decimal 270 - but there are only field types from 1 to 10 ??
02 00 20 00 count -> 00 20 00 02 -> decimal 2097154 - my whole Exif data is only 14062 bytes, lol
00 00 92 00 offset next IFD -> 00 92 00 00 -> decimal 9568256 - bigger than the Exif data ... wtf
Please help me - something is wrong here. Where does the first IFD start?
I figured this out and am writing an answer myself.
1) How to get the Exif data from a JPEG
The whole JPEG is structured with markers. A marker starts with the byte 0xFF, followed by a second byte that identifies the marker. For example:
0xFFD8 ... Start of Image, 0xFFDA ... Start of Scan (a short segment followed by the compressed data), ...
or the APP markers from 0xFFE0 to 0xFFEF.
After the marker comes a segment of bytes, which starts with two bytes holding the segment size:
segment_size = byte 1 * 256 + byte 2, including the size bytes themselves.
Most markers are followed by a segment, but not all!
For example, 0xFFD8 (Start of Image) is not followed by a segment.
But that is the only such marker in the JPEG header (from 0xFFD8 up to the marker 0xFFDA).
The Exif data is stored in the APP1 segment, marker 0xFFE1, which normally directly follows the marker 0xFFD8 (Start of Image).
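As a rough sketch of the marker walk described above (my own illustration, not a full JPEG reader): it assumes the first bytes of the JPEG are already available as space-separated hex text on one line, and it uses a made-up 20-byte header as test data. The function name find_app1 is hypothetical. It skips from segment to segment using the size bytes until it hits the APP1 marker, then prints the zero-based offset and length of the APP1 payload.

```shell
# Hypothetical helper: print "<payload offset> <payload length>" of the first
# APP1 (0xFFE1) segment found in a space-separated hex dump of a JPEG header.
find_app1() {
    printf '%s\n' "$1" | awk '
    # Convert a hex string such as "ff" to a number (portable, no strtonum).
    function hex(s,    i, n) {
        s = tolower(s); n = 0
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
        return n
    }
    {
        n = split($0, B, " ")
        pos = 1                                        # 1-based index into B
        while (pos + 3 <= n && hex(B[pos]) == 255) {
            marker = hex(B[pos + 1])
            if (marker == 216) { pos += 2; continue }  # SOI: no segment follows
            if (marker == 218) break                   # SOS: compressed data follows
            size = hex(B[pos + 2]) * 256 + hex(B[pos + 3])
            if (marker == 225) {                       # APP1 found
                # zero-based payload offset, payload length without size bytes
                printf "%d %d\n", pos + 3, size - 2
                exit
            }
            pos += 2 + size                            # skip marker and segment
        }
    }'
}

# Made-up header: SOI, a 4-byte APP0 segment, an 8-byte APP1 segment, SOS.
find_app1 "ff d8 ff e0 00 04 4a 46 ff e1 00 08 45 78 69 66 00 00 ff da"
# prints: 12 6
```

The payload found this way starts with the 6-byte Exif header (45 78 69 66 00 00 in the made-up data).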
A gawk script can read the Exif data from the image by splitting the input at every \xFF byte, so that the complete image is not loaded at once.
The Exif header consists of 6 bytes,
45 78 69 66 00 00 -> Exif\x00\x00, and is followed by the TIFF header. The Exif properties are structured in IFDs (Image File Directories), which is a concept taken from TIFF files; that is why a TIFF header follows.
2) TIFF header and IFDs
The TIFF header is 8 bytes long.
Bytes 0-1: little endian “II” (0x4949) or big endian “MM” (0x4D4D)
----- the byte order of the following bytes depends on bytes 0-1 (little endian or big endian)
Bytes 2-3: 42 (decimal) ... a number which identifies TIFF
Bytes 4-7: offset, relative to the TIFF header (not the Exif header!), at which IFD 0 starts; here it is 8 (decimal).
The first IFD you can read is IFD 0, and with an offset of 8 it follows directly after the TIFF header!
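To make the header layout concrete, here is a minimal sketch that decodes the 8 TIFF header bytes from the example in the question. It assumes the bytes are given as space-separated hex text; the names tiff_header, hex and uint are my own helpers, not part of any library.

```shell
# Hypothetical helper: print "<byte order> <magic number> <IFD 0 offset>".
tiff_header() {
    printf '%s\n' "$1" | awk '
    function hex(s,    i, n) {
        s = tolower(s); n = 0
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
        return n
    }
    # Read an n-byte unsigned integer at zero-based offset o, honouring
    # the global byte order flag le (1 = little endian, 0 = big endian).
    function uint(o, n,    i, v) {
        v = 0
        for (i = 0; i < n; i++)
            v = v * 256 + hex(B[o + 1 + (le ? n - 1 - i : i)])
        return v
    }
    {
        split(tolower($0), B, " ")
        if (B[1] == "49" && B[2] == "49") le = 1          # "II"
        else if (B[1] == "4d" && B[2] == "4d") le = 0     # "MM"
        else { print "not a TIFF header"; exit }
        printf "%s %d %d\n", (le ? "II" : "MM"), uint(2, 2), uint(4, 4)
    }'
}

tiff_header "49 49 2a 00 08 00 00 00"
# prints: II 42 8
```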
In the Image File Directory, fields of bytes are described (their offset and how to read them out); these fields contain the values of the properties. The fields of bytes lie in the data segment directly following the IFD. Some of the property values are written directly into the IFD, if they are small enough to fit into the 4-byte value offset of a field descriptor.
IFD structure
The first two bytes of the IFD contain the number of fields.
Bytes 0-1: number of fields
----- followed by an array of 12-byte field descriptors, one per field
Bytes 02-13: first field descriptor
Bytes 14-25: second field descriptor
----- and so on
----- finishing with a 4-byte offset value for the next IFD
Bytes (n-3)-n: offset value for the next IFD; for IFD 0 this is the offset of IFD 1
----- if no IFD follows, the offset value 0 (decimal) is entered.
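The layout above boils down to a little arithmetic, sketched here on the bytes from the question (starting at the TIFF header). The function name ifd_layout and its output format are my own choice; it only computes where things are and does not read the descriptors.

```shell
# Hypothetical helper: given a hex dump starting at the TIFF header and the
# offset of an IFD, print the field count and the offsets of the first and
# last field descriptor and of the 4-byte next-IFD pointer.
ifd_layout() {
    printf '%s\n' "$1" | awk -v off="$2" '
    function hex(s,    i, n) {
        s = tolower(s); n = 0
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
        return n
    }
    function uint(o, n,    i, v) {
        v = 0
        for (i = 0; i < n; i++)
            v = v * 256 + hex(B[o + 1 + (le ? n - 1 - i : i)])
        return v
    }
    {
        split(tolower($0), B, " ")
        le = (B[1] == "49")                  # "II" means little endian
        count = uint(off, 2)                 # bytes 0-1 of the IFD
        first = off + 2                      # first 12-byte descriptor
        last = first + 12 * (count - 1)      # last 12-byte descriptor
        next_ptr = first + 12 * count        # 4-byte offset of the next IFD
        printf "fields=%d first=%d last=%d next_ifd_pointer=%d\n",
               count, first, last, next_ptr
    }'
}

ifd_layout "49 49 2a 00 08 00 00 00 0b 00 0e 01 02 00 20 00 00 00 92 00 00 00" 8
# prints: fields=11 first=10 last=130 next_ifd_pointer=142
```

This fits the example data: the next-IFD pointer occupies bytes 142-145, and the first field's value offset is 0x92 = 146, i.e. the data segment begins right after the IFD.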
IFD field descriptor
Bytes 00-01: Exif property tag
Bytes 02-03: Type (ASCII 8-bit, SHORT 16-bit, LONG 32-bit, ...)
Bytes 04-07: Count - this is not a byte count; it is the number of values of the given type (how many SHORTs, LONGs, ASCII bytes, ...)
Bytes 08-11: Value offset - either the value itself or an offset
--- if the value is short enough to fit into the 4 bytes (e.g. four ASCII chars), then the value is written directly into bytes 08-11; if the value is bigger, then bytes 08-11 contain the offset of the byte field in the data segment following the IFD.
If the value is smaller than 4 bytes, then it is written in from the left side. For example, a 3-byte value uses bytes 08-10, a 1-byte value uses byte 08, and a 2-byte value uses bytes 08-09.
A simple calculation: if count * (bytes per value of the type) <= 4, the value offset field contains the value itself; otherwise it contains an offset.
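Here is a small sketch of that rule applied to single 12-byte descriptors. The first call uses the first descriptor from the question's data; the second uses a made-up SHORT descriptor to show the inline case. The function name decode_field is hypothetical, and for inline values only the first value is printed.

```shell
# Hypothetical helper: decode one 12-byte field descriptor.
# Args: 12 hex bytes, byte order ("II" or "MM").
# Prints: tag, type, count, then "inline <first value>" or "offset <offset>".
decode_field() {
    printf '%s\n' "$1" | awk -v order="$2" '
    function hex(s,    i, n) {
        s = tolower(s); n = 0
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
        return n
    }
    function uint(o, n,    i, v) {
        v = 0
        for (i = 0; i < n; i++)
            v = v * 256 + hex(B[o + 1 + (le ? n - 1 - i : i)])
        return v
    }
    BEGIN {
        le = (order == "II")
        # bytes per value for each Exif field type
        size[1] = 1; size[2] = 1; size[3] = 2; size[4] = 4; size[5] = 8
        size[7] = 1; size[9] = 4; size[10] = 8
    }
    {
        split(tolower($0), B, " ")
        tag = uint(0, 2); type = uint(2, 2); count = uint(4, 4)
        if (!(type in size)) { print "unknown type " type; next }
        if (count * size[type] <= 4) {
            # value is stored inline, left-aligned in bytes 8-11
            printf "0x%04x %d %d inline %d\n", tag, type, count, uint(8, size[type])
        } else {
            # bytes 8-11 hold an offset relative to the TIFF header
            printf "0x%04x %d %d offset %d\n", tag, type, count, uint(8, 4)
        }
    }'
}

decode_field "0e 01 02 00 20 00 00 00 92 00 00 00" II
# prints: 0x010e 2 32 offset 146
decode_field "28 01 03 00 01 00 00 00 02 00 00 00" II
# prints: 0x0128 3 1 inline 2
```

So the bytes that looked like nonsense in the question decode cleanly once the 2-byte field count is skipped: tag 0x010e, type ASCII, 32 characters, stored at offset 146.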
Field Types
1 = BYTE An 8-bit unsigned integer
2 = ASCII An 8-bit byte containing one 7-bit ASCII code. The final byte is terminated with NULL
3 = SHORT A 16-bit (2-byte) unsigned integer
4 = LONG A 32-bit (4-byte) unsigned integer
5 = RATIONAL Two LONGs. The first LONG is the numerator and the second LONG expresses the denominator
7 = UNDEFINED An 8-bit byte that can take any value depending on the field definition
9 = SLONG A 32-bit (4-byte) signed integer (2's complement notation)
10 = SRATIONAL Two SLONGs. The first SLONG is the numerator and the second SLONG is the denominator
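The two 8-byte types are the least obvious ones to read out, so here is a hedged sketch for them. It assumes the 8 value bytes have already been located via the offset in the field descriptor; the byte strings below are made up, and decode_rational is a hypothetical name. Awk has no integer types, so the signed case is handled by subtracting 2^32 from values of 2^31 and above.

```shell
# Hypothetical helper: decode one RATIONAL or SRATIONAL value.
# Args: 8 hex bytes, byte order ("II" or "MM"), signed flag (0 or 1).
decode_rational() {
    printf '%s\n' "$1" | awk -v order="$2" -v signed="$3" '
    function hex(s,    i, n) {
        s = tolower(s); n = 0
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
        return n
    }
    function uint(o, n,    i, v) {
        v = 0
        for (i = 0; i < n; i++)
            v = v * 256 + hex(B[o + 1 + (le ? n - 1 - i : i)])
        return v
    }
    # Reinterpret an unsigned 32-bit value as signed (twos complement).
    function to_signed(v) { return (v >= 2147483648) ? v - 4294967296 : v }
    BEGIN { le = (order == "II") }
    {
        split(tolower($0), B, " ")
        num = uint(0, 4)                     # first LONG: numerator
        den = uint(4, 4)                     # second LONG: denominator
        if (signed) { num = to_signed(num); den = to_signed(den) }
        printf "%d/%d\n", num, den
    }'
}

decode_rational "48 00 00 00 01 00 00 00" II 0
# prints: 72/1
decode_rational "ff ff ff ff 03 00 00 00" II 1
# prints: -1/3
```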
How to find IFDs other than IFD 0 and IFD 1
There are a number of Exif property tags which are pointers to other IFDs.
The value of such a tag contains the offset of another IFD.
e.g. tag 0x8769 - Exif IFD, tag 0x8825 - GPS IFD, tag 0xA005 - InterOp IFD
Go to https://exiftool.org/TagNames/EXIF.html and search for the string "-->" in the table, which shows you all pointers.
So, to get all IFDs, you have to search through the field descriptors in IFD 0 for these property tags, and also through the other IFDs you are pointed to.
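The search for pointer tags can be sketched like this. The input is a made-up, minimal TIFF block (header plus an IFD with two fields), and the function name find_sub_ifds is hypothetical. Only the three pointer tags named above are looked up; a complete reader would call this again for every IFD it finds.

```shell
# Hypothetical helper: list the sub-IFD pointers found in one IFD.
# Args: hex dump starting at the TIFF header, offset of the IFD to scan.
# Prints one line per pointer: "<tag> <name> <offset of the sub-IFD>".
find_sub_ifds() {
    printf '%s\n' "$1" | awk -v off="$2" '
    function hex(s,    i, n) {
        s = tolower(s); n = 0
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
        return n
    }
    function uint(o, n,    i, v) {
        v = 0
        for (i = 0; i < n; i++)
            v = v * 256 + hex(B[o + 1 + (le ? n - 1 - i : i)])
        return v
    }
    BEGIN {
        name[34665] = "Exif"        # tag 0x8769
        name[34853] = "GPS"         # tag 0x8825
        name[40965] = "InterOp"     # tag 0xA005
    }
    {
        split(tolower($0), B, " ")
        le = (B[1] == "49")                   # "II" means little endian
        count = uint(off, 2)
        for (f = 0; f < count; f++) {
            d = off + 2 + 12 * f              # start of field descriptor f
            tag = uint(d, 2)
            if (tag in name)                  # LONG value = offset of the sub-IFD
                printf "0x%04x %s %d\n", tag, name[tag], uint(d + 8, 4)
        }
    }'
}

# Made-up data: TIFF header, 2 fields (Exif and GPS pointer), next IFD = 0.
find_sub_ifds "49 49 2a 00 08 00 00 00 02 00 69 87 04 00 01 00 00 00 26 00 00 00 25 88 04 00 01 00 00 00 40 00 00 00 00 00 00 00" 8
# prints:
# 0x8769 Exif 38
# 0x8825 GPS 64
```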
Finally, an awk example: it reads out IFD 0 and IFD 1 and searches for the GPS, Exif and InterOp IFD tags.
Look at the functions evaluate_EXIF, read_IFD and get_IFD_field_value.