The ZIP Archive File Format

Original Documentation

ZIP archives are created by the PkZIP/PkUnZIP pair of programs produced
by the PkWare company. Together with LHArc and ARJ, the PkZIP programs
offer some of the best compression.
The directory information is stored at the end of the archive. Each file
stored in the archive begins with the following local header, whose ID can be
used to identify a ZIP file as such:
OFFSET              Count TYPE   Description
0000h                   4 char   ID='PK',03,04
0004h                   1 word   Version needed to extract archive
0006h                   1 word   General purpose bit field (bit mapped)
                                    0 - file is encrypted
                                    1 - set: 8K sliding dictionary used,
                                        clear: 4K sliding dictionary used
                                    2 - set: 3 Shannon-Fano trees were used,
                                        clear: 2 Shannon-Fano trees were used
                                  3-4 - unused
                                 5-15 - used internally by ZIP
                                 Note:  Bits 1 and 2 are undefined if the
                                        compression method is other than
                                        type 6 (Imploding).
0008h                   1 word   Compression method (see table 0010)
000Ah                   1 dword  Original DOS file date/time (see table 0009)
000Eh                   1 dword  32-bit CRC of file (final value is bit-inverted)
0012h                   1 dword  Compressed file size
0016h                   1 dword  Uncompressed file size
001Ah                   1 word   Length of filename
001Ch                   1 word   Length of extra field
001Eh               "LEN" char   path/filename
001Eh+LEN           "XLN" char   extra field
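
As an illustration of the local header layout above, here is a minimal Python sketch that unpacks the fixed 30-byte part and then reads the filename and extra field that follow it. The function name `parse_local_header`, the dictionary keys, and the choice of the `cp437` codec for filenames are assumptions made for this example, not part of the format description.

```python
import struct

# Fixed part of the local file header (offsets 0000h-001Dh), little-endian:
# ID, version, flags, method, DOS date/time, CRC, sizes, name/extra lengths.
LOCAL_HEADER = struct.Struct("<4sHHHIIIIHH")

def parse_local_header(buf, offset=0):
    """Return the local header fields found at `offset` in `buf` as a dict."""
    (sig, version, flags, method, dos_datetime, crc32,
     comp_size, uncomp_size, name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, offset)
    if sig != b"PK\x03\x04":
        raise ValueError("no local file header at offset %d" % offset)
    name_start = offset + LOCAL_HEADER.size      # 001Eh
    extra_start = name_start + name_len          # 001Eh + LEN
    data_start = extra_start + extra_len         # file data follows the extra field
    return {
        "version_needed": version,
        "flags": flags,
        "method": method,
        "dos_datetime": dos_datetime,
        "crc32": crc32,
        "compressed_size": comp_size,
        "uncompressed_size": uncomp_size,
        # Assumption: filenames are decoded as code page 437.
        "filename": buf[name_start:extra_start].decode("cp437"),
        "extra": bytes(buf[extra_start:data_start]),
        "data_offset": data_start,
    }
```

With the whole archive read into `buf`, `parse_local_header(buf)` describes the first stored file, and the file data occupies `compressed_size` bytes starting at `data_offset`.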
After all the files, there comes the central directory structure.
(Table 0010)
PkZip compression types
0 - Stored / No compression
1 - Shrunk / LZW, 8K buffer, 9-13 bits with partial clearing
2 - Reduced-1 / Probabilistic compression, lower 7 bits
3 - Reduced-2 / Probabilistic compression, lower 6 bits
4 - Reduced-3 / Probabilistic compression, lower 5 bits
5 - Reduced-4 / Probabilistic compression, lower 4 bits
6 - Imploded / 2/3 Shannon-Fano trees, 4K/8K sliding dictionary
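
The compression method and the general purpose bit field are read together: bits 1 and 2 only carry meaning for method 6. A small sketch of that interpretation follows; the function name `describe_compression` and the result keys are hypothetical, chosen for illustration only.

```python
# Method numbers and names as listed in table 0010.
METHODS = {
    0: "Stored",
    1: "Shrunk",
    2: "Reduced-1",
    3: "Reduced-2",
    4: "Reduced-3",
    5: "Reduced-4",
    6: "Imploded",
}

def describe_compression(method, flags):
    """Interpret the method word and the general purpose bit field."""
    info = {
        "method": METHODS.get(method, "unknown (%d)" % method),
        "encrypted": bool(flags & 0x0001),            # bit 0
    }
    if method == 6:
        # Bits 1 and 2 are only defined for Imploding.
        info["dictionary_kb"] = 8 if flags & 0x0002 else 4   # bit 1
        info["trees"] = 3 if flags & 0x0004 else 2           # bit 2
    return info
```

For any method other than 6 the sketch leaves the dictionary and tree entries out rather than reporting undefined bits.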

--- Central directory structure
The CDS is at the end of the archive and contains additional information
about the files stored within the archive.
OFFSET              Count TYPE   Description
0000h                   4 char   ID='PK',01,02
0004h                   1 byte   Version made by
0005h                   1 byte   Host OS (see table 0011)
0006h                   1 byte   Minimum version needed to extract
0007h                   1 byte   Target OS
                                 see above "Host OS"
0008h                   1 word   General purpose bit flag
                                 see above "General purpose bit flag"
000Ah                   1 word   Compression method
                                 see above "Compression method"
000Ch                   1 dword  DOS date / time of file (see table 0009)
0010h                   1 dword  32-bit CRC of file
0014h                   1 dword  Compressed size of file
0018h                   1 dword  Uncompressed size of file
001Ch                   1 word   Length of filename
001Eh                   1 word   Length of extra field
0020h                   1 word   Length of file comment
0022h                   1 word   Number of the disk on which this file begins
0024h                   1 word   Internal file attributes (bit mapped)
                                    0 - set: file is apparently an ASCII (text)
                                        file, clear: apparently binary
                                 1-15 - unused
0026h                   1 dword  External file attributes (OS dependent)
002Ah                   1 dword  Relative offset of local header from the
                                 start of the first disk this file appears on
002Eh               "LEN" char   Filename / path; should not contain a drive
                                 or device letter, all slashes should be forward
                                 slashes '/'.
002Eh+LEN           "XLN" char   Extra field
002Eh+LEN+XLN       "CMT" char   File comment
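
The central directory entry can be unpacked in the same way as the local header. The sketch below follows the byte-by-byte layout of the table above (46 fixed bytes, then filename, extra field and comment); `parse_central_entry`, its dictionary keys, and the `cp437` decoding are assumptions made for this example.

```python
import struct

# Fixed part of a central directory entry (offsets 0000h-002Dh), little-endian.
CENTRAL_ENTRY = struct.Struct("<4sBBBBHHIIIIHHHHHII")

def parse_central_entry(buf, offset=0):
    """Return one central directory entry starting at `offset` as a dict."""
    (sig, made_by, host_os, need_ver, target_os, flags, method,
     dos_datetime, crc32, comp_size, uncomp_size,
     name_len, extra_len, comment_len, disk_no,
     internal_attr, external_attr, local_offset) = CENTRAL_ENTRY.unpack_from(buf, offset)
    if sig != b"PK\x01\x02":
        raise ValueError("no central directory entry at offset %d" % offset)
    name_start = offset + CENTRAL_ENTRY.size     # 002Eh
    extra_start = name_start + name_len          # 002Eh + LEN
    comment_start = extra_start + extra_len      # 002Eh + LEN + XLN
    next_offset = comment_start + comment_len    # where the next entry begins
    return {
        "host_os": host_os,
        "flags": flags,
        "method": method,
        "crc32": crc32,
        "compressed_size": comp_size,
        "uncompressed_size": uncomp_size,
        "disk_number": disk_no,
        "is_text": bool(internal_attr & 0x0001),   # internal attributes, bit 0
        "external_attr": external_attr,
        "local_header_offset": local_offset,
        # Assumption: names and comments are decoded as code page 437.
        "filename": buf[name_start:extra_start].decode("cp437"),
        "comment": buf[comment_start:next_offset].decode("cp437"),
        "next_offset": next_offset,
    }
```

Calling the function repeatedly with the previous result's `next_offset` walks the directory entry by entry.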

(Table 0011)
PkZip Host OS table
0 - MS-DOS and OS/2 (FAT)
1 - Amiga
2 - VMS
3 - *nix
4 - VM/CMS
5 - Atari ST
6 - OS/2 1.2 extended file sys
7 - Macintosh
8-255 - unused

--- End of central directory structure
The End of Central Directory Structure header has the following format:
OFFSET              Count TYPE   Description
0000h                   4 char   ID='PK',05,06
0004h                   1 word   Number of this disk
0006h                   1 word   Number of disk with start of central directory
0008h                   1 word   Total number of file/path entries on this disk
000Ah                   1 word   Total number of entries in central dir
000Ch                   1 dword  Size of central directory
0010h                   1 dword  Offset of start of central directory relative
                                 to starting disk number
0014h                   1 word   Archive comment length
0016h               "CML" char   Zip file comment
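
Because the record above sits at the very end of the archive and is followed only by a variable-length comment, a reader typically has to search backwards for its ID. The following is one possible sketch of that search for an archive held in memory; the name `find_end_of_central_dir`, the result keys, and the plausibility check on the comment length are assumptions of this example rather than requirements stated in the format.

```python
import struct

# End of central directory record (22 fixed bytes), little-endian.
EOCD = struct.Struct("<4sHHHHIIH")
EOCD_ID = b"PK\x05\x06"

def find_end_of_central_dir(data):
    """Search backwards for the end record; return its fields or None."""
    # The comment length is a 16-bit word, so the record starts within
    # the last 22 + 65535 bytes of the archive.
    start = max(0, len(data) - (EOCD.size + 0xFFFF))
    pos = data.rfind(EOCD_ID, start)
    while pos != -1:
        if pos + EOCD.size <= len(data):
            (_, disk_no, cd_disk, entries_here, entries_total,
             cd_size, cd_offset, comment_len) = EOCD.unpack_from(data, pos)
            comment_start = pos + EOCD.size
            # Assumed sanity check: the comment must fit inside the data.
            if comment_start + comment_len <= len(data):
                return {
                    "offset": pos,
                    "disk_number": disk_no,
                    "central_dir_disk": cd_disk,
                    "entries_on_disk": entries_here,
                    "total_entries": entries_total,
                    "central_dir_size": cd_size,
                    "central_dir_offset": cd_offset,
                    "comment": data[comment_start:comment_start + comment_len],
                }
        # Candidate rejected: look for an earlier occurrence of the ID.
        pos = data.rfind(EOCD_ID, start, pos)
    return None
```

For a single-disk archive, `central_dir_offset` is where the first central directory entry begins and `total_entries` says how many entries to read from there.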


This information is from and is used with permission.