These are the original files by Max Maischein. Used with permission.
This is really just for reference in case I screwed up when I reformatted and indexed them.
In the file format list, several short mnemonics are used to describe the structure of the stored data. Here I describe the structure of (and possible conversions between) some of these types. As some types have different sizes across platforms, the byte order and bit size are given for most types.

ASCIIZ
  A sequence of characters (->char), terminated with the special character with the value 0. Note that ASCIIZ strings, like most structures on Intel machines, should not be larger than 64Kb due to the ancient segmentation used.
BCD
  Binary coded decimal. A decimal number is converted into a hexadecimal number which has the same digits as the decimal number. (10d becomes 10h, 21d becomes 21h)
Bitmap
  If a value is declared as bitmapped, every bit in this value might have a different meaning. The bits are numbered from right to left; the least significant bit has the number 0. After the bit number, there are either two statements separated by a slash ("/"), which are the meanings if the bit is not set / set, or one single statement, which is the meaning of this bit if it is set.
Byte
  8 bit unsigned number. Smallest unit a record consists of. All offsets are in the unit bytes. (0-255)
Char
  Synonym for byte, most values are between 32 and 255. (#0-#255)
DWord
  32 bit signed number. Some of the formats may use a DWord which is a 32 bit unsigned number, but as files tend not to be greater than 2GB, this won't be my concern. To convert between Intel and Motorola format, you have to swap bytes #2 & #3 and bytes #1 & #4. (-2Gb-+2Gb)
Int
  Integer. Signed 16-bit number. (-32768-+32767)
LString
  A string which is preceded by its length. Also named "counted" string. Used by most Pascal implementations. Maximum length is 255 bytes, but it can contain any char.
Nybble
  The upper or lower four bits of a byte. A nybble is a single hex digit and can have values from 0 to 15. A signed nybble can have values from -8 to 7 with bit 3 being the sign bit.
Paragraph
  A multiple of 16. On the Intel chips with 64K segments, segment addresses had a resolution of one paragraph.
Word
  16 bit unsigned number. Note that byte order is important, whether you have a Motorola machine or an Intel one. Conversion between the two formats is simply by swapping byte #1 with byte #2. (0-65535)

How to identify different files

While searching for different file formats, I found the following programs helpful to gather information about different files. They all are DOS programs since I'm not familiar with other platforms (except Windows). Most of them should be available on SimTel CDs or via FTP at, except for my program TF, which is still in beta.

LIST.COM v9.0a by Vernon Buerg
  List is a file lister which supports both text and hex-view.
HIEW.EXE v4.18 by Sen
  Another file lister with built-in disassembler.
FILE.EXE v2.0 by Felix von Leitner
  File is a file identification program.
Q.COM v3.01 by SemWare
  QEdit is the editor I'm editing the list with.
TF.EXE v0.38 by me
  The program that started it all. A "simple" file identification program - no more, since it has grown too big by now. Still unreleased, since it is not really extensible yet.

The file formats list meta list ;)

The file format list uses a certain format to make it readable by programs which convert it into the WinHelp format or create program structures out of the lists. This format is very similar to the format used by Ralf Brown in his PC interrupt list but was extended by me to accommodate the specific needs of this list :

Each topic in the list is delimited by a line of 45 chars, in which the first 8 contain the char '-'. After these, there follows one character which contains the type of topic. The different topics are described in the list itself, the char '!' denotes an information topic - like the list of chars and their meaning. After the topic identifier, there follows another '-' char and then the topic name, not containing any '-' chars.
After the topic name, there may be some other descriptors, such as for Motorola byte ordering, guesswork marking or other purposes; see the main list for further information. The line is ended with at least one '-' char. Take the following prototype :

--------?-TEST------------------------------
OFFSET              Count TYPE   Description
EXTENSION:
OCCURENCES:
PROGRAMS:
REFERENCE:
SEE ALSO:
VALIDATION:

Sub-topics like different records are mostly delimited by three dashes ('-'). I suggest folding them up and making them available as a popup window. Tables have the following format : (see table 0000) for a table reference and (Table 0000) for the beginning of a table. The end of a table is undefined (yet).

A primer on file formats

Abbreviations

Throughout the list, many abbreviations are used, some in the reference section. Here some are explained :

c't
  The c't is a German computer magazine, which developed the Borland Pascal for OS/2 patch. They release source code in files called CTmmyy.*. Note that comments in the source code and the language in the issues tend to be German :-)
DDJxxyy (Doctor Dobb's Journal)
  The DDJ is a monthly publication by M&T/US which is intended for the professional programmer. The four digits after the name indicate the month/year of the issue referred to. Most of the source code published in the issue is available electronically on Compu$erve and other BBSes. The files have the name DDJyymm.
PDN Programmer's Distribution Net
  A network dedicated to the distribution of source code useful to programmers. Often linked with Fido-nodes.
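
The topic divider format described above can be picked apart with a regular expression. What follows is only a minimal sketch in Python and not part of the original list: the names DIVIDER and parse_divider are made up for illustration, the overall line length is deliberately not checked, and the descriptors after the topic name are returned as one raw string rather than being interpreted.

```python
import re

# Eight dashes, one topic type character, a dash, the dash-free topic name,
# optionally a dash plus descriptor letters, then at least one closing dash.
DIVIDER = re.compile(
    r"^-{8}(?P<kind>[^-])-(?P<name>[^-]+)(?:-(?P<flags>[^-]+))?-+$"
)


def parse_divider(line):
    """Return (kind, name, flags) for a topic divider line, else None."""
    match = DIVIDER.match(line.rstrip())
    if match is None:
        return None
    # A divider without descriptors yields an empty flags string.
    return match.group("kind"), match.group("name"), match.group("flags") or ""


if __name__ == "__main__":
    print(parse_divider("--------S-8SVX-MG---------------------------"))
    print(parse_divider("--------M-669-------------------------------"))
```

Any line that is not a divider, including the three-dash sub-topic delimiters, simply yields None, so a converter can feed every line of the list through the same function.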
Contributions to this list were made by :
  Ralf Brown (The .EXE file formats from the INTERRUPT List, general layout)
  David Dilworth ([email protected])
  Daniel Dissett ([email protected])
  Marcus Groeber ([email protected])
  Darrel Hankerson ([email protected])
  Carl Hauser ([email protected])
  Jouni Miettunen ([email protected])
  Jan Nicolai Langfeldt ([email protected])
  Mark Ouellet (Telix .FON structures)
  Greg Roelofs ([email protected])
  Robert Rothenburg Walking-Owl ([email protected])
  Jesus Villena (CONVERT.EXE, a digital sample conversion program)
  Christos Zoulas ([email protected])
  JAL / Nostalgia
  David McDuffee, (75530,[email protected])
Information gleaned from other programs :
  Formats for Word and WordPerfect (Selke's filetype)
File format list Release 3.00 Last change 02/04/96
This compilation is Copyright (c) 1994,2002 Max Maischein
--------!-CONTACT_INFO----------------------
If you notice any mistakes or omissions, please let me know! It is only with YOUR help that the list can continue to grow. Please send all changes to me rather than distributing a modified version of the list.
This file has been authored in the style of the INTERxxy.* file list by Ralf Brown, and uses almost the same format.
Please read the file FILEFMTS.1ST before asking me any questions. You may find that they have already been addressed.
Max Maischein
[email protected]
Corion on #coders@IRC
--------!-DISCLAIMER------------------------
DISCLAIMER: THIS MATERIAL IS PROVIDED "AS IS". I verify the information contained in this list to the best of my ability, but I cannot be held responsible for any problems caused by use or misuse of the information, especially for those file formats foreign to the PC, like AMIGA or SUN file formats. If information is marked "guesswork" or undocumented, you should check it carefully to make sure your program will not break with an unexpected value (and please let me know whether or not it works the same way).
Information marked with "???" is known to be incomplete or guesswork.
Some file formats were not released by their creators, others are regarded as proprietary, which means that if your programs deal with them, you might be looking for trouble. I don't care about this.
--------!-FLAGS-----------------------------
One or more letters may follow the file format ID; they have the following meanings:
  Cx - Charset used :
         7 - Unix 7-bit characters
         A - Amiga charset (if there is one)
         E - EBCDIC character format
         U - Unicode character set
         W - Windows char set
       Default is the 8-Bit IBM PC-II Charset. Note that Microsoft introduced codepages which might be relevant with other programs.
  G  - guesswork, incomplete, unreliable etc.
  M  - Motorola byte order
       Default is Intel byte order
  O  - obsolete, valid only for version noted below
  X  - Synonym topic. See topic named under see also.
--------!-CATEGORIES------------------------
The ninth column of the divider line preceding an entry usually contains a classification code for the application that uses those files.
The codes currently in use are:
  ! - User information ( not really a file format )
  A - Archives (ARC,LZH,ZIP,...)
  a - Animations (CEL, FLI, FLT,...)
  B - Binary files for compilers etc. (OBJ,TPU)
  H - Help file (HLP,NG)
  I - Images, bit maps (GIF,BMP,TIFF,...)
  D - Data support files (CPI,FON,...)
  E - Executable files (EXE,PIF)
  f - Generic file format. RIFF and IFF are generic file formats.
  F - Font files (TTF)
  G - General graphics file
  M - Module music file (MIDI,MOD,S3M,...)
  R - Resource data files (RES)
  S - Sound files (WAV,VOC,ZYX)
  T - Text files (DOC,TXT)
  W - Spreadsheet and related (WKS)
  X - Database files (DBF)
--------!-FIELDS----------------------------
After a format description, you will sometimes find other keywords. The meanings of these are :
EXTENSION:
  This is the default extension of files of the given type. On DOS systems, most files have a 3 letter extension. On Amiga systems, the files are prefixed with something. The DOS extensions are all uppercase, extensions for other systems are in lower case chars. On other systems which do not have the concept of extensions, such as the MAC, this is the file type.
OCCURENCES:
  Where you are likely to encounter those files. This specifies machines (like PC,AMIGA) or operating systems (like UNIX).
PROGRAMS:
  Programs which either create, use or convert files of this format. Some might be used for validation or conversion.
REFERENCE:
  A reference to a file or an article in a magazine which is mandatory or recommended for further understanding of the matter.
SEE ALSO:
  A cross reference to a topic which might be interesting as well.
VALIDATION:
  Methods to validate that the file you have is not corrupt. Normally this is a method to check the theoretical file size against the real filesize. Some file formats allow no reliable validation.
--------!-FORMAT----------------------------
The block oriented files are organized in some other fashion, since the order of blocks is at best marginally obligatory. Each block type starts with the block ID (eg. RIFFblock for a RIFF file) and in square brackets the character value of the ID field (eg. [WAVE] for RIFF WAVe sound files). The block itself is described in the format description, that means you will have to look under RIFF or FORM. In the record description, the header information is omitted ! If a record is described, the record ends when the next offset is given.
Bitmapped values have a description for each bit. The value left of the slash ("/") is for the bit not set (=0), the right sided value applies if the bit is set.
A note on the tables section. The tables were added as they were introduced into Ralf Brown's interrupt list - so not everything was pressed into a table. The tables (should) have unique numbers, but they sure are out of order !
--------!-MACHINES--------------------------
Machines that use Intel byte ordering
  PC
Machines that use Motorola byte ordering
  AMIGA, ATARI ST, MAC, SUN
--------M-669-------------------------------
The .669 format is a module format for digital music.
OFFSET              Count TYPE   Description
0000h                   1 word   ID=6669h
0002h                 108 byte   ASCII song message
006Eh                   1 byte   Number of saved samples (0-40h)
                                 ="NOS"
006Fh                   1 byte   Number of saved patterns (0-80h)
                                 ="NOP"
0070h                   1 byte   Loop order number
0071h                 128 byte   Order list
00F1h                 128 byte   Tempo list for patterns
0171h                 128 byte   Break location list for patterns
01F1h               "NOS" rec    Sample data
                                 The sample data is in the file for "NOS"
                       13 byte   ASCIIZ filename of instrument
                        1 dword  Length of instrument sample
                        1 dword  Offset of beginning of loop
                        1 dword  Offset of end of loop
01F1h+          "NOP"*600 rec    The note patterns
 "NOS"*19h                       Those patterns are repeated for each row,
                                 and the array of these is repeated 64
                                 times for each pattern.
                        3 byte   Note (see table 0000)
01F1h+                  ? byte   Sample data (unsigned)
 "NOS"*0x19+
 "NOP"*0x600
(Table 0000)
669 Note format
Each note looks like this :
  BYTE[0]:  BYTE[1]:  BYTE[2]:
  nnnnnnii  iiiivvvv  ccccdddd
  n : note value
  i : 6-bit instrument number
  v : 4-bit volume
  c : command data (Protracker format mapped) :
        0 = a
        1 = b
        2 = c
        3 = d
        4 = e
        5 = f
  d : command value (Protracker format)
Special values for byte 0 :
  0FEh : no note, only volume
  0FFh : no note or no command, if byte 2 = 0FFh
EXTENSION:669
OCCURENCES:PC
SEE ALSO:MOD
PROGRAMS:669 Mod Composer, DMP
VALIDATION:
--------S-8SVX-MG---------------------------
The 8SVX files are IFF files used for digital audio data. The format of the VHDR block is complete guesswork. These files use Motorola byte order. The 8SVX file format is fixed to 8-bit mono sample data - at least GoldWave does not support saving files in any other format than 8-bit mono.
FORMblock [VHDR]
This is the sample information block. The normal size is 20 bytes.
OFFSET              Count TYPE   Description
0000h                   1 dword  Sampling rate of digital data in Hz.
                                 This count seems not to be too accurate,
                                 at least GoldWave v2.0 creates different
                                 rates for Wave and 8SVX files.
0004h                   4 dword  Other data, unknown
FORMblock [BODY]
This block contains the raw sample data, maybe the usual IFF compression was used. The details of both the compression and the information about the IFF format are unknown to me.
EXTENSION:IFF
OCCURENCES:PC,Amiga
PROGRAMS:GoldWave
SEE ALSO:IFF,WAVE
VALIDATION:
--------S-AIFC-MG---------------------------
The AIFC files seem to be a variation of the AIFF files - but I don't know anything about them.
EXTENSION:IFF
SEE ALSO:AIFF
--------S-AIFF-MG---------------------------
The Audio Interchange File Format files are digital audio files stored in the IFF format; the samples are stored in signed PCM. The header block is [AIFF], different subblocks are :
[AUTH]
The author's information; optional
[COMM]
This record stores information about the sampled data :
OFFSET              Count TYPE   Description
0000h                   1 word   ??? number of channels ???
                                 ??? number of instrument samples ???
0002h                   1 dword  Sample length
0006h                   1 dword  lower frequency
000Ah                   1 dword  maximum frequency
000Eh                   1 dword  ???
[MARK]
[NAME]
The name of the instrument / sample
[SSND]
The stored sample data.
Further information wanted.
EXTENSION:AIF,IFF
--------E-AMIGA EXECUTABLE-MG---------------
All Amiga executables I've seen start with this signature. Of course the bytes are in Motorola byte order, as you would expect from a Motorola based machine. This info here is based completely on my guesswork, maybe somebody from the Amiga could help flesh out this part.
OFFSET              Count TYPE   Description
0000h                   1 dword  ID=03F3h
EXTENSION:EXE
OCCURENCES:AMIGA
SEE ALSO:
VALIDATION:
--------M-AMS-------------------------------
The AMS format is a multichannel module format created by the X-Tracker (not to be mistaken for the tracker of the same name by D-Lusion). The X-Tracker by Extreme PC is a multichannel tracker that features 32 digital channels, 64 MIDI channels, 255 samples, 64K patterns and positions.
The tracker is currently in beta status and not enough information is available yet.
OFFSET              Count TYPE   Description
EXTENSION:
OCCURENCES:
PROGRAMS:
REFERENCE:
SEE ALSO:MOD
VALIDATION:
--------A-ARC-------------------------------
The ARC files are archive files created by the SEA ARC program. The compression has been superseded by more recent compression programs. Similar archives can be created by the PAK and PkPAK programs. There have been many variations and enhancements to the ARC format, though not as many as to the TIFF format.
You may have to use some (paranoid) checks to ensure that you actually are processing an ARC file, since other archivers also adopted the idea of putting a 01Ah byte at offset 0, namely the Hyper archiver. To check if you have a Hyper-archive, check the next two bytes for "HP" or "ST" (or look below for "HYP"). The ZOO archiver also puts a 01Ah at the start of the file, see the ZOO entry below.
OFFSET              Count TYPE   Description
0000h                   1 byte   ID=1Ah
0001h                   1 byte   Compression method (see table 0001)
0002h                  13 char   File name
000Fh                   1 dword  Compressed file size
0013h                   1 dword  File date in MS-DOS format (see table 0009)
0017h                   1 word   16-bit CRC
0019h                   1 dword  Original file size
                                 ="SIZ"
(Table 0001)
ARC compression types
      0 - End of archive marker
      1 - unpacked (obsolete) - ARC 1.0 ?
      2 - unpacked - ARC 3.1
      3 - packed (RLE encoding)
      4 - squeezed (after packing)
      5 - crunched (obsolete) - ARC 4.0
      6 - crunched (after packing) (obsolete) - ARC 4.1
      7 - crunched (after packing, using faster hash algorithm) - ARC 4.6
      8 - crunched (after packing, using dynamic LZW variations) - ARC 5.0
      9 - Squashed c/o Phil Katz (no packing) (var. on crunching)
     10 - crushed (PAK only)
     11 - distilled (PAK only)
  12-19 - unknown (ARC 6.0 or later) - ARC 7.0 (?)
  20-29 - ?informational items? - ARC 6.0
  30-39 - ?control items? - ARC 6.0
    40+ - reserved
According to SEA's technical memo, the information and control items were added to ARC 6.0.
Information items use the same headers as archived files, although the original file size (and name?) can be ignored.
OFFSET              Count TYPE   Description
0000h                   2 byte   Length of header (includes "length" and "type"?)
0002h                   1 byte   (sub)type
0003h                   ? byte   data
Informational item types as used by ARC 6.0 :
Block type    Subtype   Description
   20                   archive info
                 0      archive description (ASCIIZ)
                 1      name of creator program (ASCIIZ)
                 2      name of modifier program (ASCIIZ)
   21                   file info
                 0      file description (ASCIIZ)
                 1      long name (if not MS-DOS "8.3" filename)
                 2      extended date-time info (reserved)
                 3      icon (reserved)
                 4      file attributes (ASCIIZ)
                        Attributes use an uppercase letter to signify the
                        following:
                          R read access
                          W write access
                          H hidden file
                          S system file
                          N network shareable
   22                   operating system info (reserved)
(Table 0009)
Format of the MS-DOS time stamp (32-bit)
The MS-DOS time stamp is limited to an even count of seconds, since the count for seconds is only 5 bits wide.
  31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
 |<---- year-1980 --->|<- month ->|<--- day ---->|
  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
 |<--- hour --->|<---- minute --->|<- second/2 ->|
EXTENSION:ARC,PAK
OCCURENCES:PC
PROGRAMS:SEA ARC,PAK,PkPAK
SEE ALSO:HYP,ZOO
VALIDATION:FileSize="SIZ"
--------A-ARJ-------------------------------
The ARJ program by Robert K. Jung is a "newcomer" which compares well to PKZip and LhArc in both compression and speed. An ARJ archive contains two types of header blocks, one archive main header at the head of the archive and local file headers before each archived file.
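
Both the ARC and the ARJ headers carry a file date in the MS-DOS time stamp format of table 0009. As a rough illustration of that bit layout, here is a minimal Python sketch; the function names decode_dos_timestamp and encode_dos_timestamp are made up for this example, the dword is assumed to have been read in Intel byte order already, and no range checking is done, so an all-zero stamp simply decodes to month 0 and day 0.

```python
def decode_dos_timestamp(stamp):
    """Split a 32-bit MS-DOS time stamp into its fields (layout of table 0009)."""
    year = ((stamp >> 25) & 0x7F) + 1980   # bits 31-25 hold year-1980
    month = (stamp >> 21) & 0x0F           # bits 24-21
    day = (stamp >> 16) & 0x1F             # bits 20-16
    hour = (stamp >> 11) & 0x1F            # bits 15-11
    minute = (stamp >> 5) & 0x3F           # bits 10-5
    second = (stamp & 0x1F) * 2            # bits 4-0 hold second/2
    return year, month, day, hour, minute, second


def encode_dos_timestamp(year, month, day, hour, minute, second):
    """Pack the fields back into a 32-bit stamp; odd seconds are rounded down."""
    return ((((year - 1980) & 0x7F) << 25)
            | ((month & 0x0F) << 21)
            | ((day & 0x1F) << 16)
            | ((hour & 0x1F) << 11)
            | ((minute & 0x3F) << 5)
            | ((second // 2) & 0x1F))


if __name__ == "__main__":
    stamp = encode_dos_timestamp(1996, 2, 4, 12, 34, 56)
    print(hex(stamp), decode_dos_timestamp(stamp))
```

Because only second/2 is stored, a time of 12:34:57 comes back as 12:34:56 after a round trip, which is the "even count of seconds" limitation mentioned with the table.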
OFFSET              Count TYPE   Description
0000h                   1 word   ID=0EA60h
0002h                   1 word   Basic header size (0 if end of archive)
0004h                   1 byte   Size of header including extra data
0005h                   1 byte   Archiver version number
0006h                   1 byte   Minimum version needed to extract
0007h                   1 byte   Host OS (see table 0002)
0008h                   1 byte   Internal flags, bitmapped :
                                   0 - no password / password
                                   1 - reserved
                                   2 - file continues on next disk
                                   3 - file start position field is available
                                   4 - path translation ( "\" to "/" )
0009h                   1 byte   Compression method :
                                   0 - stored
                                   1 - compressed most
                                   2 - compressed
                                   3 - compressed faster
                                   4 - compressed fastest
                                 Methods 1 to 3 use Lempel-Ziv 77 sliding
                                 window with static Huffman encoding,
                                 method 4 uses Lempel-Ziv 77 sliding window
                                 with pointer/length unary encoding.
000Ah                   1 byte   File type :
                                   0 - binary
                                   1 - 7-bit text
                                   2 - comment header
                                   3 - directory
                                   4 - volume label
000Bh                   1 byte   reserved
000Ch                   1 dword  Date/Time of original file in MS-DOS format
0010h                   1 dword  Compressed size of file
0014h                   1 dword  Original size of file
0018h                   1 dword  Original file's CRC-32
001Ch                   1 word   Filespec position in filename
001Eh                   1 word   File attributes
0020h                   1 word   Host data (currently not used)
?                       1 dword  Extended file starting position when used
                                 (see above)
                        ? char   ASCIIZ file name
                        ? char   Comment
????h                   1 dword  Basic header CRC-32
????h                   1 word   Size of first extended header (0 if none)
                                 ="SIZ"
????h+"SIZ"+2           1 dword  Extended header CRC-32
????h+"SIZ"+6           ? byte   Compressed file
(Table 0002)
ARJ HOST-OS types
  0 - MS-DOS
  1 - PRIMOS
  2 - UNIX
  3 - AMIGA
  4 - MAC-OS (System xx)
  5 - OS/2
  6 - APPLE GS
  7 - ATARI ST
  8 - NeXT
  9 - VAX VMS
EXTENSION:ARJ
OCCURENCES:PC
PROGRAMS:ARJ.EXE
REFERENCE:
SEE ALSO:
VALIDATION:
--------S-AU-MG-----------------------------
The AU files are digital audio files used by the Sun and NeXT workstations. Further information wanted.
OFFSET              Count TYPE   Description
0000h                   4 char   ID='.snd'
0004h                   1 dword  Offset of start of sample
0008h                   1 dword  Length of stored sample
000Ch                   1 dword  Sound encoding :
                                   1 - 8-bit ISDN u-law
                                   2 - 8-bit linear PCM (REF-PCM)
                                   3 - 16-bit linear PCM
                                   4 - 24-bit linear PCM
                                   5 - 32-bit linear PCM
                                   6 - 32-bit IEEE floating point
                                   7 - 64-bit IEEE floating point
                                  23 - 8-bit ISDN u-law compressed (G.721 ADPCM)
0010h                   1 dword  Sampling rate
0014h                   1 dword  Number of sample channels
EXTENSION:AU
OCCURENCES:SunOS
--------B-BGI-G-----------------------------
The BGI files are graphic drivers used by the Borland compilers to provide graphics output for different graphics cards. They are loaded dynamically. The exact format is not known to me ...
OFFSET              Count TYPE   Description
0000h                   4 char   ID='FBGD'
0004h                   1 dword  ID=08080808h
                                 used to backspace over ID if typing the file
0008h                   ? char   Driver ID string, terminated with #26
EXTENSION:BGI
OCCURENCES:PC
PROGRAMS:Borland Pascal, Borland C, Turbo Pascal
--------I-BMP-------------------------------
The BMP files are the way Windows stores bit mapped images. The BMP image data is bit packed, but every line must end on a dword boundary - if that's not the case, it must be padded with zeroes. BMP files are stored bottom-up, that means that the first scan line is the bottom line. The BMP format has four incarnations, two under Windows (new and old) and two under OS/2; all are described here.
OFFSET              Count TYPE   Description
0000h                   2 char   ID='BM' - BitMap
                                 OS/2 also supports the following IDs :
                                 ID='BA' - Bitmap Array
                                 ID='CI' - Color Icon
                                 ID='CP' - Color Pointer (mouse cursor)
                                 ID='IC' - Icon
                                 ID='PT' - Pointer (mouse cursor)
0002h                   1 dword  Filesize of whole file
0006h                   4 byte   reserved
000Ah                   1 dword  Offset of bitmap in file
                                 ="BOF"
000Eh                   1 dword  Length of BitMapInfoHeader
                                 The BitMapInfoHeader starts directly after
                                 this header.
                                   12 - OS/2 1.x format
                                   40 - Windows 3.x format
                                   64 - OS/2 2.x format
0012h                   1 dword  Horizontal width of bitmap in pixels
0016h                   1 dword  Vertical height of bitmap in pixels
001Ah                   1 word   Number of planes
001Ch                   1 word   Bits per pixel ( thus the number of colors )
                                 ="BPP"
001Eh                   1 dword  Compression type, see ALGRTHMS.txt for
                                 description of the different types
                                   0 - none
                                   1 - RLE 8-bit/Pixel
                                   2 - RLE 4-bit/Pixel
0022h                   1 dword  Size of picture in bytes
0026h                   1 dword  Horizontal resolution
002Ah                   1 dword  Vertical resolution
002Eh                   1 dword  Number of used colors
0032h                   1 dword  Number of important colors
0036h                   ? rec    Definition of N colors
                                 N=1 shl "BPP"
                        1 byte   Blue component
                        1 byte   Green component
                        1 byte   Red component
                        1 byte   Filler
"BOF"                   ? byte   Image data
EXTENSION:BMP,RLE,LGO
OCCURENCES:PC
PROGRAMS:Windows,Paintbrush
REFERENCE:DDJ0994
VALIDATION:
SEE ALSO:rDIB
--------I-CEG-------------------------------
The CEG (Continuous Edge Graphic) format is a raw picture format used by the Edsun cards with CEG-chips which provide some better look through anti-aliasing or something like that. The header before the data looks like this :
OFFSET              Count TYPE   Description
0000h                   1 word   Version number of the CEG-format
0002h                   9 char   ID='Edsun CEG'
000Bh                   1 byte   Number of pixels per byte
000Ch                   9 byte   Reserved
0015h                  80 char   ASCIIZ copyright notice for the image
0065h                   1 byte   CEG-revision number (1)
0066h                   1 byte   Used CEG-mode (0..15)
0067h                   1 word   Number of pixels per line
0069h                   1 word   Number of lines
006Ah                   1 byte   Old VGA-mode
006Bh                   1 byte   VGA Data flag :
                                   0 - VGA registers are invalid
                                   1 - VGA registers are valid
006Ch                  92 byte   VGA register data
00C2h                 256 rec    VGA palette entries
                        1 byte   Value for red
                        1 byte   Value for green
                        1 byte   Value for blue
03C2h                   1 word   Year of file creation
03C4h                   1 byte   Day of file creation
03C5h                   1 byte   Month of file creation
03C6h                   1 byte   Hour of file creation
03C7h                   1 byte   Minute of file creation
03C8h                   1 byte   Second of file creation
03C9h                  24 byte   Reserved for future use
EXTENSION:???
OCCURENCES:PC
PROGRAMS:???
--------a-CEL-------------------------------
CEL files contain one or more frames of image data used by the Autodesk Animator and Animator Pro animation packages. Both Animator Pro and the original Animator produce CEL files, but each uses a different file format.
--- Animator Pro CEL Files
An Animator Pro CEL file is identical to a FLC file in all respects. A CEL file should have a Celdata block in the file prefix block which describes the x,y placement of the CEL. If the Celdata placement block is not present, assume a placement of 0,0.
--- Original Animator CEL Files
The original Animator also produced CEL files. These were still-picture files, not the multi-frame files Animator Pro now uses. A CEL file from the original Animator is identical to a PIC file from the original Animator in all respects.
EXTENSION:CEL
OCCURENCES:PC
PROGRAMS:Autodesk Animator
SEE ALSO:FLIc,FLC,PIC
VALIDATION:
--------F-CHR-------------------------------
The CHR files are scalable fonts used by the Borland graphics interface (BGI) to display fonts in graphics mode.
OFFSET              Count TYPE   Description
0000h                   4 char   ID='PK',08,08
0004h                   4 char   ID='BGI '
0008h                   ? char   Font description, terminated with #26
0008h                   1 word   Headersize
 +????                           ="SIZ"
                        4 char   Internal font name
                        1 word   Font file size in bytes
                        1 byte   Font driver major version
                        1 byte   Font driver minor version
                        1 word   0100h
                    "SIZ" word   Zeroes to pad out the header
0080h                   1 char   Signature byte, '+' means stroke font
0081h                   1 word   Number of chars in font file
                                 ="NUM"
0083h                   1 byte   undefined
0084h                   1 byte   ASCII value of first char in file
0085h                   1 word   Offset to stroke definitions
0087h                   1 byte   Scan flag ??
0088h                   1 byte   Distance from origin to top of capital
0089h                   1 byte   Distance from origin to baseline
008Ah                   1 byte   Distance from origin to bottom descender
008Bh                   4 char   Four character name of font
0090h               "NUM" word   Offsets to character definitions
0090h+              "NUM" byte   Width table for the characters
 "NUM"*2
0090h+                           Start of character definitions
 "NUM"*3
The individual character definitions consist of a variable number of words describing the operations required to render a character. Each word consists of an (x,y) coordinate pair and a two-bit opcode, encoded as shown here:
  Byte 1     7   6   5   4   3   2   1   0     bit #
            op1  <seven bit signed X coord>
  Byte 2     7   6   5   4   3   2   1   0     bit #
            op2  <seven bit signed Y coord>
Opcodes
  op1=0 op2=0   End of character definition.
  op1=0 op2=1   Do scan
  op1=1 op2=0   Move the pointer to (x,y)
  op1=1 op2=1   Draw from current pointer to (x,y)
EXTENSION:CHR
OCCURENCES:PC
PROGRAMS:Borland Pascal, Borland C
REFERENCE:BGIKIT.ZIP
SEE ALSO:
VALIDATION:
--------M-CMF-G-----------------------------
The CMF files are music files used by the SoundBlaster sound card family. The Creative Labs Music Format might be proprietary, the info is guesswork.
OFFSET              Count TYPE   Description
0000h                   4 char   ID="CTMF"
*********
EXTENSION:CMF
OCCURENCES:PC
PROGRAMS:PLAYCMF.EXE
--------I-COL-------------------------------
A COL file stores the rgb values for entries in the color palette. Both Animator Pro and the original Animator produce COL files, but the formats are different. To process a COL file for input, check the file size. If it is exactly 768 bytes, the file is an original Animator COL file. If the file is any other size, it is an Animator Pro COL file - which makes identification almost impossible.
Animator Pro COL files have an 8-byte header :
OFFSET              Count TYPE   Description
0000h                   1 dword  File size, including this header
0004h                   1 word   ID=0B123h
0006h                   1 word   Version, currently 0
Following the file header are palette entries in rgbrgb... order.
Each of the r, g, and b components is a single byte in the range of 0-255. Generally, there will be data for 256 palette entries, but this cannot be assumed. The actual number of palette entries is ((size-8)/3); if size-8 is not an even multiple of three, the file is corrupted.
Original Animator COL Files
A COL file created by the original Animator is exactly 768 bytes long. There is no file header or other control information in the file.
EXTENSION:COL
OCCURENCES:PC
PROGRAMS:Autodesk Animator, Autodesk Animator Pro
SEE ALSO:FLIc,FLT
--------E-COM-------------------------------
The COM files are raw binary executables and are a leftover from the old CP/M machines with 64K RAM. A COM program can only have a size of less than one segment (64K), including code and static data, since no fixups for segment relocation or anything else are included. One method to check for a COM file is to check if the first byte in the file could be a valid jump or call opcode, but this is a very weak test since a COM file is not required to start with a jump or a call. In principle, a COM file is just loaded at offset 100h in the segment and then executed.
OFFSET              Count TYPE   Description
0000h                   1 byte   ID=0E9h
                                 ID=0EBh
                                 Those are not safe ways to determine whether
                                 a file is a COM file or not, but most COM
                                 files start with a jump.
Further information not available.
EXTENSION:COM
OCCURENCES:PC
SEE ALSO:EXE,MZ EXE,NE EXE
--------E-CORE-G----------------------------
The core images are dumps of the system core from different unix machines (as far as I gather). Info comes from a magic file - so this is only good for identification. What you would do with a core image on a foreign machine eludes me anyway. Maybe the information below is wrong and the 386 core dump also belongs to the word at 0174h...
OFFSET              Count TYPE   Description
0000h                   4 char   ID='core'
0174h                   1 word   Executable type 1
                                   015Dh - B370 executable
                                   5D01h - B370 executable
                                   0158h - B370 executable
                                   5801h - B370 executable
                                   015Fh - XA370 executable
                                   5F01h - XA370 executable
                                   015Ah - XA370 executable
0176h                   1 word   Executable type 2
                                   0176h - 386 executable
EXTENSION:???
OCCURENCES:Unix flavours
PROGRAMS:N/A
SEE ALSO:
--------D-CPI-G-----------------------------
The DOS CPI files are data files which are loaded by the country drivers of MS-DOS. The information comes from a magic file, which makes it good for identification only.
OFFSET              Count TYPE   Description
0000h                   9 char   ID=255,'FONT ',0
EXTENSION:CPI
OCCURENCES:PC
PROGRAMS:MS-DOS
--------X-CRD-G-----------------------------
The Windows 3.1 Cardfile.EXE is a (simple) addressbook application included with the Windows 3.1+ operating system by Microsoft.
OFFSET              Count TYPE   Description
EXTENSION:CRD
OCCURENCES:PC, ALPHA?
PROGRAMS:CARDFILE.EXE
--------X-DBase II-O------------------------
The DBase II file format. The dBASE II file header has a fixed size of 521 bytes.
OFFSET              Count TYPE   Description
0000h                   1 byte   dBASE version, 02h = dBASE II
0001h                   1 word   Number of data records in file
                                 ="NDR"
0003h                   1 byte   Month of last update
0004h                   1 byte   Day of last update
0005h                   1 byte   Year of last update
0006h                   1 word   Size of each data record
                                 ="DRS"
0008h                  32 rec    Field descriptors
                       11 char   ASCIIZ field name, 0Dh as first char
                                 indicates end of list.
                        1 char   Data type
                                   'C' - Char
                                   'N' - Numerical
                                   'L' - Logical
                        1 byte   Field length
                        1 word   Field data address ( set in RAM )
                        1 byte   Number of decimal places
0208h                   1 byte   If 0Dh, then all 32 field descriptors were
                                 used; otherwise 00h
EXTENSION:DBF
OCCURENCES:PC
PROGRAMS:DBase
SEE ALSO:DBASE III,XBase
VALIDATION:FileSize="NDR"*"DRS"+0208h
--------X-DBase III-------------------------
DBASE - File header structure (DBASE III)
OFFSET              Count TYPE   Description
0000h                   1 byte   dBASE version,
                                   03h = dBASE III w/o *.DBT
                                   83h = dBASE III w *.DBT
0001h                   1 byte   Month of last update
0002h                   1 byte   Day of last update
0003h                   1 byte   Year of last update
0004h                   1 dword  Number of data records in file
                                 ="NDR"
0008h                   1 word   Header size
                                 ="HSZ"
000Ah                   1 word   Data record size
                                 ="DRS"
000Ch                  20 byte   reserved
0020h                   ? rec    Field descriptors
                                 The list of field descriptors is terminated
                                 with a terminator byte 0Dh.
                       11 char   ASCIIZ field name
                        1 char   Data type
                                   'C' - Char
                                   'D' - Date
                                   'L' - Logical
                                   'M' - Memo
                                   'N' - Numerical
                        1 dword  Field data address ( set in RAM )
                        1 byte   Field length
                        1 byte   Number of decimal places
                       14 byte   reserved
EXTENSION:DBF
SEE ALSO:DBASE II,XBase
OCCURENCES:PC
PROGRAMS:DBase
VALIDATION:FileSize="NDR"*"DRS"+"HSZ"
--------X-DBASE IV--------------------------
**** Description missing ****
EXTENSION:DBF,DBT
OCCURENCES:PC
PROGRAMS:DBase 4.0, Clipper
REFERENCE:
SEE ALSO:DBASE II,DBASE III,XBase
VALIDATION:
--------M-DCM-------------------------------
The DCM module format was designed by Winfried Welti, and is based on a RIFF / IFF style format called WUFF - Welti's Universal File header Format. The header for WUFF files is built much like the RIFF header :
OFFSET              Count TYPE   Description
0000h                   4 char   ID="WUFF"
0004h                   4 char   Subformat ID, see below
0008h                   1 dword  File length including the WUFF header
000Ch                   1 word   File format version as BCD. Bits 15-12 are
                                 flags :
                                   12    - Archive file. If set, the data
                                           after the header contains only
                                           WUFF style files.
                                   13-15 - reserved.
000Eh             1 word   Length of subheader following this header.
The DCM format has a header ID of "DCMw" and a version word of 0100h. It
extends the header with the following values:
0010h             1 word   Song flags, bitmapped
                              0 - Samples present
                              1 - Songdata present
                              2 - Infotext present
                           3-15 - reserved
0012h             1 word   Number of instruments
After the header, there follow the included (WUFF) files; Allowed
fileformats for include are : MDCw (Patterns), EDIw (Instrument),
TXTw (Text); see below.

The MDC format is a module format which uses compiled pattern data. It has
the subformat ID "MDCw", the current version is 1.01, it extends the header
with the following fields :
OFFSET        Count TYPE   Description
0010h             1 word   Flags for the song (see table 0011)
0012h             1 word   Internal frequency for replay
0014h             1 dword  Length of packed data channels
0028h             1 byte   Number of used channels
24 : Chnls : Byte Used Channels (0..chnls-1)

(Table 0011) MDC song flags
    0 - Stereo enable
    1 - Free Frequency (can replay freq change in song ?)
  2-3 - Offset size :
        00 - Byte (mod offsets, multiply by 256)
        01 - Word (16 bit offsets)
        10 - DWord (32 bit offsets)
        11 - reserved
  4-5 - Panning range
        00 - GUS panning (4 bit, byte value)
        01 - 8 Bit panning
        10 - reserved
        11 - reserved
  6-7 - Instrument number range
        00 - Byte
        01 - Word
        10 - reserved
        11 - reserved
    8 - S3M compatibility bit (all ranges are like s3m : mod offsets,
        GUS panning, Ins Num Range : Byte)
    9 - Tuning control
        0 - Use Period values (word) (s3m)
        1 - Use Frequency values (Dword)
10-15 - reserved

After the header, there comes the packed data for the module. This format
consists of one control byte and depending on the value some other data
bytes. Values of control byte :
     0 - Next Frame
     1 - End of File
     2 - Order Num. follows [byte]
     3 - Loop to Ord Num (1 byte follows)
     4 - Frames to wait follows [byte]
     5 - New Replay freq follows [byte]
 6..31 - reserved
If the byte is greater than 31 then it has the following bitmapped format :
   0-4 - Channel Nr.
     5 - Key Byte follows
     6 - Period Value follows [word]
     7 - Volume Value follows [byte]
Key byte format, bitmapped :
     0 - Start Sample
     1 - Stop Sample
     2 - Instr Nr follows [byte/word/??]
     3 - Offset follows [byte/word/??]
     4 - Pan pos follows [byte/??]
   5-7 - reserved

The EDI format has an ID value of "EDIw" and a version of 0100h, and it
extends the header with the following information :
OFFSET        Count TYPE   Description
0010h             1 word   Sample flags, bitmapped
                            0-1 - Loop type
                                  00 - none
                                  01 - forward loop
                                  10 - bidirectional loop
                                  11 - reserved
                              2 - 32 bit values for sample length etc.,
                                  see below
                              3 - Sample is 16 bit
                              4 - Frequency is 32 bit.
                           5-15 - reserved
0012h             1 word   C2 frequency of sample
0014h             1 dword  Loop start, this may be a word, depending on
                           the sample flags.
0018h             1 dword  Loop end, see loop start
001Ch             1 dword  Sample length
The song text is plain ASCII.
EXTENSION:DCM
OCCURENCES:PC
SEE ALSO:S3M
--------M-DMF-------------------------------
The Digital Music Files are high quality MOD style files with up to
32 channels/1024 beats per track. The X-Tracker by the demo group D-Lusion
produces this format. In general, the format is well organised due to the
ID/Blocklength structure which makes downward compatibility to older version
files easy, but the Version 4 (current version) of the file format, produced
by X-Tracker 0.30β still requires some manual scanning for the next ID,
which I regard as not so nice. Version 5 of the format has the [SEQU] block
length fixed, but the [SMPD] block has the length 0.
The file consists of several blocks, each with a 4 char (dword) ID tag and a
length of the record data. The main file header looks as follows :
OFFSET        Count TYPE   Description
0000h             4 char   ID='DDMF'
0004h             1 byte   Version id. 4 -> XTracker 0.30β
0005h             8 char   Tracker name, e.g. 'XTRACKER', 'HACKTRAK' :-)
000Dh            30 char   Song name (ASCIIZ?)
002Bh            20 char   Name of composer (ASCIIZ?)
0049h             1 byte   Day of creation
004Ah             1 byte   Month of creation
004Bh             1 byte   Year of creation
The other headers have the standard skip record format, in this section
named DMFblock. The offsets start _after_ this header record :
OFFSET        Count TYPE   Description
0000h             4 char   Record tag (see below)
0004h             1 dword  Size of data belonging to this tag

DMFblock [INFO]
Contains some message in ASCII. Length of the message is the size of the
record.

DMFblock [CMSG]
Contains the message the composer wants to bring to us. After the ID record,
another fill byte precedes the real message !
OFFSET        Count TYPE   Description
0000h             1 byte   Junk byte
0001h             ? char   Composer message

DMFblock [SEQU]
Contains the information necessary for sequencing the different tracks.
OFFSET        Count TYPE   Description
0000h             1 word   Song loop start
0002h             1 word   Song loop end
0004h             ? word   Sequencer data

DMFblock [PATT]
This block contains the information about the different patterns and tracks.
0000h             1 word   Maximum pattern (=Songlength)
                           ="MPT"
0004h             1 byte   Number of channels of this song (<= 16)
0005h         "MPT" rec    Pattern data.
                  1 byte   Track entries. (<=32)
                           ="TET"
                           How many tracks this pattern has. XTracker
                           allows a different number of tracks for each
                           pattern.
                  1 byte   Beat information
                           High nibble : Ticks per beat
                           Low nibble : Beats per measure
                  1 word   Maximum number of ticks (<=512)
                  1 dword  Number of bytes to skip for the next pattern
                           information.
                  ? rec    Track data stream
                  1 byte   Global track effect
                  1 byte   Global track data (only if global effect >0 !!!)
              "TET" rec
                  1 byte   Information byte, bitmapped
                           For each bit set in the info byte, one or two
                           data byte(s) follow. This info byte need not
                           always be there, see below. For effects,
                           2 bytes follow.
                           0 - reserved
                           1 - Volume effect
                           2 - Note effect
                           3 - Instrument effect
                           4 - Volume set
                           5 - Note set
                           6 - Instrument set
                           7 - Counter to next information byte. Not set
                               means that the next info byte follows in
                               1 tick, unit is in ticks.
                           The maximum number of effects is 3 at a time,
                           the maximum size of a track information is
                           11 bytes (with info=0FEh).
                  ? rec    Effect bytes
                  1 byte   Effect number
                  1 byte   Effect data
                  ? byte   Set data
** Here follows the pattern data, but it's too late today **

DMFblock [INST]
This block contains the information about the instrument data. If this block
does not exist, then the instrument numbers in the patterns point directly
to the samples in the [SMPI] block.
OFFSET        Count TYPE   Description
0000h             1 byte   Number of instruments
0001h             ? rec    Instrument information block
                 30 char   The name of the instrument
                  1 byte   Instrument type, bitmapped
                           0 - Instrument type
                           1 - Instrument type
                               00 - Sample in [SMPI] block
                               01 - MIDI device
                               10 - FM instrument
                               11 - reserved
                           2 - valid attack envelope
                           3 - sustain on
                           4 - reserved
                           5 - reserved
                           6 - reserved
                           7 - reserved
                  1 byte   Range entries
                           ="REN"
                           Like the GF1 patterns, an instrument can consist
                           of several samples.
              "REN" rec    Range definition
                  1 byte   Sample to be played in this range
                  1 byte   Length of this range in half tone steps up
                  6 byte   Not yet defined 6-point envelope

DMFblock [SMPI]
This block contains the information about the samples stored in the file.
OFFSET        Count TYPE   Description
0000h             1 byte   Number of samples (<= 250)
                           ="NUM"
              "NUM" rec    Sample record
                  1 byte   Length of sample name
                  ? char   Name of the sample
                  1 dword  Length of sample in bytes
                  1 dword  Start of sample loop
                  1 dword  End of sample loop
                  1 word   Frequency used for C-3
                  1 byte   Volume for sample
                           0 - Don't change current volume
                           otherwise volume (linear scale)
                  1 byte   Sample type, bitmapped
                             0 - not looped/looped
                             1 - 8/16-bit sample
                                 (16-bit not supported with X-Tracker v0.30)
                           2,3 - Pack type :
                                 00 - unpacked, signed sample
                                 01 - pack type 0
                                 10 - pack type 1
                                 11 - pack type 2
                           4-6 - reserved, set to zero
                             7 - Sample stored in dmf/bib
                  1 word   reserved, set to zero
                  1 dword  crc32 of sample to identify samples in BIB.
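
The bitmapped sample type byte of the [SMPI] record can be split up with a few masks. The following Python sketch is only an illustration: the function name and the dictionary keys are made up here, and it assumes that a set bit selects the second alternative of each "a/b" pair (looped, 16-bit). The listing does not state this order explicitly for this entry, so check it against real files.

```python
def decode_dmf_sample_type(flags: int) -> dict:
    """Split the [SMPI] 'Sample type' byte into its bit fields."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("sample type must fit in one byte")
    pack_names = ("unpacked, signed sample", "pack type 0",
                  "pack type 1", "pack type 2")
    return {
        # Assumption: a set bit means the second alternative of the pair.
        "looped": bool(flags & 0x01),               # bit 0
        "bits": 16 if flags & 0x02 else 8,          # bit 1
        "pack_type": pack_names[(flags >> 2) & 3],  # bits 2,3
        "reserved": (flags >> 4) & 0x07,            # bits 4-6, should be zero
        # Bit 7 ("Sample stored in dmf/bib"): only the raw state is
        # reported, which of the two meanings it selects is not assumed.
        "bit7_set": bool(flags & 0x80),
    }


if __name__ == "__main__":
    print(decode_dmf_sample_type(0x0B))
```

Returning the reserved bits instead of rejecting non-zero values lets a caller decide how strict to be with files written by later tracker versions.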

DMFblock [SMPD]
This block contains the sample data (raw or packed, see [SMPI] block) in the
following format :
<SampleLength> <SampleData> <SampleLength> <SampleData> etc.
OFFSET        Count TYPE   Description
0000h             1 dword  Length of the following sample
                  ? byte   Sample data (might be packed)

DMFblock [ENDE]
This block serves as an end of file marker and can be used for validation.
Note that the four ID characters are _not_ followed by a length dword !
Each DMF file simply ends with the four characters 'ENDE'.
EXTENSION:DMF
OCCURENCES:PC
PROGRAMS:X-TRACKER,PLAY_DMF
SEE ALSO:
VALIDATION:
--------?-DMS-------------------------------
The DMS (Digital Music System??) are some other files I found on a mixed
system CD, so I include them in my listing. They are Amiga files, so here's
the call to the Amiga folks again.
OFFSET        Count TYPE   Description
0000h             4 char   ID="DMS!"
EXTENSION:DMS
OCCURENCES:Amiga
--------A-DWC-?-----------------------------
The DWC archives seem to be a relic from ancient computing times - I've
never seen any program that dealt with them or could create them. They are
included in this compilation anyway, for reasons I don't know. But if one of
you stumbles over such a file, this documentation might be helpful.
The DWC archives consist of single file entries with one archive trailer.
The archive entries seem to be at the start of the archive, but maybe they
are stored at the end of the archive, before the trailer.
Each file header has the following format :
OFFSET        Count TYPE   Description
0000h            13 char   Name of the original file in ASCIIZ.
000Dh             1 dword  Size of the original file
0011h             1 dword  MS-DOS date and time of the original file
0015h             1 dword  Size of the compressed file
0019h             1 dword  Offset of compressed data in archive file
001Dh             3 byte   reserved
0020h             1 byte   Method :
                           1 - crunched
                           2 - stored
The trailer at the end of each archive has the following format :
OFFSET        Count TYPE   Description
0000h             1 word   Length of trailer (=27)
0002h             1 word   Size of the directory entries (=34)??
0004h            16 byte   reserved
0014h             1 dword  Count of the directory entries
0018h             3 char   ID="DWC"
EXTENSION:DWC??
OCCURENCES:PC??
PROGRAMS:DWC.EXE??
--------S-EFE-------------------------------
The EFE files are instrument files for the Ensoniq sampler system.
Further information wanted.
EXTENSION:EFE
SEE ALSO:GKH,INS
--------E-EXE-X-----------------------------
Different types of executables have emerged on the Intel DOS related
platforms - but all contain at least a stub MZ Exe before their actual EXE
body...
SEE ALSO:MZ EXE,NE EXE
--------M-FAR-------------------------------
The Farandole composer is a 16 channel composer created by the group Digital
Infinity / Daniel Potter for digital music in module style. The Farandole
modules have the following format :
OFFSET        Count TYPE   Description
0000h             4 char   ID='FAR',254
0004h            40 char   Song name
002Ch             3 char   ID=13,10,26
                           This ID makes it possible to see the song name
                           by simply typing the .far file.
002Fh             1 word   Remaining header size
0031h             1 byte   Version number as BCD,
                           high nibble = major version
                           low nibble = minor version
0032h            16 byte   Channel on/off map
                           <> 0 means that channel is used
0042h             1 rec    Editing data. This data is not necessary for
                           playback, but is stored by the composer for
                           resume of edit.
                  1 byte   Current octave
                  1 byte   Current voice
                  1 byte   Current row
                  1 byte   Current pattern
                  1 byte   Current order
                  1 byte   Current sample
                  1 byte   Current volume
                  1 byte   Current top of screen display
                  1 byte   Current editing area
                           0=samples, 1=patterns, 2=orders
                  1 byte   Current tempo (default tempo)
004Ch            16 byte   Panning map for each channel, 0=left,15=right
005Ch             1 byte   Marked block start
005Dh             1 byte   Marked block end
005Eh             1 byte   Grid granularity
005Fh             1 byte   Edit mode
0060h             1 word   Song text length
                           ="STL"
0062h         "STL" char   Song text
0062h+"STL"     256 byte   Order bytes for pattern ordering
0162h+"STL"       1 byte   Number of stored patterns
0163h+"STL"       1 byte   Song length in patterns
0164h+"STL"       1 byte   Loop position. This is the restart position
                           if the end of the song is reached.
0165h+"STL"     256 word   Length of each pattern. The number of rows in
                           each pattern is ( this word-2 )/(16*4)
After this block, there might be additional data in the future (see
remaining header size, above), after that, the pattern data follows.
The pattern data :
OFFSET        Count TYPE   Description
0000h             1 byte   Length of pattern in rows
                           ="LIR"
0001h             1 byte   Tempo for this pattern - Unsupported, use not
                           recommended
0002h       4*"LIR" rec    Note data for each pattern in 4 channels
                  1 byte   Note value (Octave*12+Note)+1
                           0 means no note
                  1 byte   Sample number
                  1 byte   Volume byte. The volume is stored reversed,
                           the lower nibble is the major volume, the
                           upper nibble is the minor volume adjust.
                  1 byte   Effect byte. Upper nibble is effect, lower
                           nibble is data. (see table 0004)

(Table 0004) Note Effects in FAR-modules
   01 - Pitch adjust
   02 - Pitch adjust
   03 - Portamento to note
   04 - Retrigger note data times for one bar
   05 - Set vibrato depth
   06 - Vibrato
07-0C - ?Possibly undefined?
   0D - Fine tune tempo down 128/Tempo
   0E - Fine tune tempo up 128/Tempo
   0F - Tempo, notes per second = 32/Tempo

After the pattern data, the sample map follows.
This is an array of 64 bits (eight bytes), each set bit corresponds to a
sample record stored in the file, each zero bit means that the corresponding
record is not stored in the file.
OFFSET        Count TYPE   Description
0000h             8 byte   Sample flags, see above
After the sample flags, the samples themselves are stored in the FSM format,
except for the ("FSM",254) header. They follow header-data-header-data-etc.,
see the FSM entry for further information.
EXTENSION:FAR
OCCURENCES:PC
PROGRAMS:Farandole Composer
REFERENCE:
SEE ALSO:FSM
VALIDATION:
--------a-FLT-------------------------------
The FLC files are files created by the Autodesk Animator Pro and contain
animations. The FLC files are a superset of those created by the Autodesk
Animator (FLIc files). In some cases, new data fields or compression methods
were added. The FLC files use a hierarchical block oriented structure and
blocks are a combination of control information and data. The file consists
of one header followed by data blocks.

It is possible that new types of blocks not described in this document will
be added to animation files in the future. It is recommended that you
quietly ignore unknown block types you encounter during animation playback.
The size fields in the block headers make it easy to skip an entire
unrecognized block.

The FLC files consist of one 128-byte header block and one or more of the
following blocks :

The prefix block, if present, contains Animator Pro settings information,
CEL placement information, and other auxiliary data.

A frame block exists for each frame in the animation. In addition, a ring
frame follows all the animation frames. Each frame block contains color
palette information and/or pixel data.

The ring frame contains delta-compressed information to loop from the last
frame of the flic back to the first. It can be helpful to think of the ring
frame as a copy of the first frame, compressed in a different way.
All flic files will contain a ring frame, including a single-frame flic.

The FLC file header
OFFSET        Count TYPE   Description
0000h             1 dword  The size of the whole animation file, including
                           the size of this header.
0004h             1 word   ID=0AF12h
0006h             1 word   Number of frames in this animation, not
                           including the ring frame. FLC files have a
                           maximum length of 4000 frames.
0008h             1 word   Screen width in pixels
000Ah             1 word   Screen height in pixels
000Ch             1 word   Bits per pixel (always 8)
000Eh             1 word   Flags - bitmapped
                              0 - Ring frame not written / ring frame
                                  written
                              1 - Flic header not updated / updated
                           2-15 - reserved
0010h             1 dword  Delay between frames in milliseconds.
0014h             1 word   reserved
0016h             1 dword  MS-DOS date and time of file creation
                           (see table 0009)
001Ah             1 dword  Serial number of the Animator Pro program used
                           to create the file. If the file was created with
                           the FlicLib development kit, this value equals
                           0464c4942h ("FLIB").
001Eh             1 dword  MS-DOS date and time of last modification
                           (see table 0009)
0022h             1 dword  Serial number of program that made the last
                           modification. See Serial Number.
0026h             1 word   X-axis aspect ratio of the file
0028h             1 word   Y-axis aspect ratio of the file
                           (320x200 = 6:5)
002Ah            38 byte   reserved (0)
0050h             1 dword  Offset from begin of file to the first animation
                           frame block.
0054h             1 dword  Offset from begin of file to the second
                           animation frame block. This value is used when
                           looping the animation.
0058h            40 byte   reserved (0)

Each subblock in the animation file has an identical header structure,
which is formatted like this :
0000h             1 dword  The size of the whole block and all subordinate
                           blocks including the size of this header
0004h             1 word   Block ID, varies depending on the block type.
0006h             1 word   Number of subordinate blocks in this block.
0008h             8 byte   reserved(0)

Immediately after the header there may be an optional prefix block, which is
used to store additional data which is not directly involved in animation
playback. The prefix block has the usual header with an ID of 0F100h. The
prefix block should only be created by the Animator Pro programs and never
by any other software, it is to be ignored by other software.

The FLC frame blocks contain the information to convert the current frame
into the next frame; they have an ID of 0F1FAh. Directly after the frame
header, there are the subordinate data blocks - if the subblock count is 0
this means that the current frame is identical to the previous frame, only
the appropriate delay has to be made.
The data blocks have a different header format :
OFFSET        Count TYPE   Description
0000h             1 dword  Size of this block, including header size
0004h             1 word   Data type identifier :
                            4 - 256-level color palette info
                            7 - Word-oriented delta compression
                           11 - 64-level color palette info
                           12 - Byte-oriented delta compression
                           13 - Entire frame is color index 0
                           15 - Byte run length compression
                           16 - No compression
                           18 - Postage stamp sized image
0006h             ? byte   Color or pixel data

The following sections describe each of these data encoding methods in
detail.

--- Block Type 4 (FLI_COLOR256) - 256-Level Color
The data in this block is organized in packets. The first word following the
block header is a count of the number of packets in the block. Each packet
consists of a one-byte color index skip count, a one-byte color count and
three bytes of color information for each color defined.

At the start of the block, the color index is assumed to be zero. Before
processing any colors in a packet, the color index skip count is added to
the current color index. The number of colors defined in the packet is
retrieved. A zero in this byte indicates 256 colors follow. The three bytes
for each color define the red, green, and blue components of the color in
that order.
Each component can range from 0 (off) to 255 (full on). The data to change
colors 2,7,8, and 9 would appear as follows:

    2                      ; two packets
    2,1,r,g,b              ; skip 2, change 1
    4,3,r,g,b,r,g,b,r,g,b  ; skip 4, change 3

--- Block Type 11 (FLI_COLOR) - 64-Level Color
This block is identical to FLI_COLOR256 except that the values for the red,
green and blue components are in the range of 0-63 instead of 0-255, i.e. in
native VGA values which can be written to the VGA without modification.

--- Block Type 13 (FLI_BLACK) - No Data
This block has no data following the header. All pixels in the frame are set
to color index 0.

--- Block Type 16 (FLI_COPY) - No Compression
This block contains an uncompressed raw image of the frame, from upper left
to the lower right, storing each line sequentially. This type of block is
created when the preferred compression method (SS2 or BRUN) generates more
data than the uncompressed frame image; a relatively rare situation.

--- Block Type 15 (FLI_BRUN) - Byte Run Length Compression
This block contains the entire image in a compressed format. Usually this
block is used in the first frame of an animation, or within a postage stamp
image block.

The data is organized in lines. Each line contains packets of compressed
pixels. The first line is at the top of the animation, followed by
subsequent lines moving downward. The number of lines in this block is given
by the height of the animation.

The first byte of each line is a count of packets in the line. This value is
ignored, it is a holdover from the original Animator. It is possible to
generate more than 255 packets on a line. The width of the animation is now
used to drive the decoding of packets on a line; continue reading and
processing packets until width pixels have been processed, then proceed to
the next line.

Each packet consists of a type/size byte, followed by one or more pixels.
If the number is negative (the high bit of the packet type is set), the
absolute value is the count of pixels to be copied from the packet to the
animation image, otherwise the next byte contains a single pixel which is to
be replicated; the lower 7 bits are the number of times the pixel is to be
replicated.

--- Block Type 12 (FLI_LC) - Byte Aligned Delta Compression
This block contains the differences between the previous frame and this
frame. This compression method was used by the original Animator, but is not
created by Animator Pro. This type of block can appear in an Animator Pro
file, however, if the file was originally created by Animator, then some
(but not all) frames were modified using Animator Pro.

The first word following the block header contains the position of the first
line in the block. This is a count of lines (down from the top of the image)
which are unchanged from the prior frame. The second word contains the
number of lines in the block. The data for the lines follows these two
words.

Each line begins with two bytes. The first byte contains the starting x
position of the data on the line, and the second byte the number of packets
for the line. Unlike BRUN compression, the packet count is significant
(because this compression method is only used on 320x200 flics).

Each packet consists of a single byte column skip, followed by a packet
type/size byte, which has the reverse of the meaning it has in block
type 15.

--- Block Type 7 (FLI_SS2) - Word Aligned Delta Compression
This format contains the differences between consecutive frames. This is the
format most often used by Animator Pro for frames other than the first frame
of an animation. It is similar to the line coded delta (LC) compression, but
is word oriented instead of byte oriented. The data is organized into lines
and each line is organized into packets.

The first word in the data following the block header contains the number of
lines in the block.
Each line can begin with some optional words that are used to skip lines and
set the last byte in the line for animations with odd widths. These optional
words are followed by a count of the packets in the line. The line count
does not include skipped lines.

The high order two bits of the word are used to determine the contents of
the word :

  Bit 15  Bit 14  Meaning
    0       0     The word contains the packet count. The packets follow
                  this word. The packet count can be zero; this occurs when
                  only the last pixel on a line changes.
    1       0     The low order byte is to be stored in the last byte of
                  the current line. The packet count always follows this
                  word.
    1       1     The word contains a line skip count. The number of lines
                  skipped is given by the absolute value of the word. This
                  word can be followed by more skip counts, by a last byte
                  word, or by the packet count.

The packets in each line are similar to the packets for the line coded
block. The first byte of each packet is a column skip count. The second byte
is a packet type. If the packet type is positive, the packet type is a count
of words to be copied from the packet to the animation image. If the packet
type is negative, the packet contains one more word which is to be
replicated. The absolute value of the packet type gives the number of times
the word is to be replicated. The high and low order byte in the replicated
word do not necessarily have the same value.

--- Block Type 18 (FLI_PSTAMP) - Postage Stamp Image
This block type holds a postage stamp - a reduced-size image - of the frame.
It generally appears only in the first frame block within a flic file.

When creating a postage stamp, Animator Pro considers the ideal size to be
100x63 pixels. The actual size will vary as needed to maintain the same
aspect ratio as the original.

The pixels in a postage stamp image are mapped into a six-cube color space,
regardless of the color palette settings for the full frame image.
A six-cube color space is formed as follows:

    start at palette entry 0
    for red = 0 thru 5
      for green = 0 thru 5
        for blue = 0 thru 5
          palette_red   = (red   * 256)/6
          palette_green = (green * 256)/6
          palette_blue  = (blue  * 256)/6
          move to next palette entry
        end for blue
      end for green
    end for red

Any arbitrary rgb value (where each component is in the range of 0-255) can
be mapped into the six-cube space using the formula:

    ((6*red)/256)*36 + ((6*green)/256)*6 + ((6*blue)/256)

The full postage stamp block header is defined as follows:
OFFSET        Count TYPE   Description
0000h             1 dword  Size of this block, including header size
0004h             1 word   ID=18
0006h             1 word   Height of the postage stamp image
0008h             1 word   Width of the image
000Ah             1 word   Color translation type :
                           1 - six-cube color space

Immediately following this header is the postage stamp data. The data is
formatted as a block with standard size/type header. The type will be one
of:

    15  FPS_BRUN     Byte run length compression
    16  FPS_COPY     No compression
    18  FPS_XLAT256  Six-cube color xlate table

The FPS_BRUN and FPS_COPY types are identical to the FLI_BRUN and FLI_COPY
encoding methods described above.

The FPS_XLAT256 type indicates that the block contains a 256-byte color
translation table instead of pixel data. To process this type of postage
stamp, read the pixel data for the full-sized frame image, and translate its
pixels into six-cube space using a lookup in the 256-byte color translation
table. This type of postage stamp appears when the size of the animation
frames is smaller than the standard 100x63 postage stamp size.

************* TWE - Tween Data Files
A TWE file holds information about a tweening operation set up via the Tween
menus. The information includes the starting and ending shapes, and the
optional user-specified links between the shapes. Animator Pro creates tween
files.
A TWE file begins with an 8-byte header defined as follows:

Offset Length Name        Description
   0      2   magic       File format identifier. Always hex 1995.
   2      2   version     The file format version; always zero.
   4      4   tcount      The number of tween shapes in the file; always 2.
   8      8   reserved    Unused space; set to zeroes.
  16      4   linkcount   The number of link entries in the file.

Immediately following the file header are the link entries. If the linkcount
value is zero there are no links. Each link entry is a pair of 32-bit
integers. The first value in each pair is the index of the point in the
first shape, and the second value is the index of the point in the ending
shape. (IE, a link value of 2,7 says to link the second starting-shape point
to the seventh ending-shape point.)

Following the link entries is the data block that describes the starting
shape, then the data block that describes the ending shape. The format of
these blocks is identical to that of the polygon (PLY) file, including file
header data. In other words, they appear as if a pair of polygon files are
embedded in the tween file at this point.

********** OPT - Optics Menu Settings Files
An OPT file holds information about an optics operation set up via the
Optics menus. Both Animator Pro and the original Animator create OPT files.
The file format is the same for both.

An OPT file starts with a 4-byte header, as follows:

Offset Length Name        Description
   0      2   magic       File type identifier. Always hex 1A3F.
   2      2   count       Number of records in the file.

Following the file header are optics records of 50 bytes each. A record is
generated for each click on CONTINUE MOVE in the OPTICS menu. The move
records are formatted as follows:

Offset Length Name        Description
   0      4   link        In the file, this field is always zero. In
                          memory, it's a pointer to the next move record.
   4      6   spincenter  The x,y,z coordinates of the spin center point;
                          three 16-bit values.
  10      6   spinaxis    The x,y,z coordinates of the spin axis; three
                          16-bit values.
  16      6   spinturns   The x,y,z coordinates of the spin turns; three
                          16-bit values.
  22      4   spininter   Intermediate turns. Two 16-bit values. These are
                          values for a conjugation matrix that corresponds
                          to spin axis.
  26      6   sizecenter  The x,y,z coordinates of the size center point;
                          three 16-bit values.
  32      2   xmultiplier Determines (along with xdivisor) how to scale
                          along x dimension.
  34      2   xdivisor    Determines (along with xmultiplier) how to scale
                          along x dimension.
  36      2   ymultiplier Determines (along with ydivisor) how to scale
                          along y dimension.
  38      2   ydivisor    Determines (along with ymultiplier) how to scale
                          along y dimension.
  40      2   bothmult    Like xmultiplier, but applied to both dimensions.
  42      2   bothdiv     Like xdivisor, but applied to both dimensions.
  44      6   linearmove  The x,y,z offset for a linear move; three 16-bit
                          values.
EXTENSION:FLT
OCCURENCES:PC
PROGRAMS:Autodesk Animator Pro
REFERENCE:
SEE ALSO:FLIc
VALIDATION:
--------a-FLIc------------------------------
The Flic file format was one of the first graphic animation formats on the
PC. It was developed by <> and used by the Autodesk Animator. It provides
relatively fast animation in 320x200 resolution modes. The FLI files use
delta updating for faster animation.
The single block information and prefix blocks are missing for the FLI
files, see the FLT format for a discussion.
OFFSET        Count TYPE   Description
0000h             1 dword  Size of the FLIc file
0004h             1 word   ID=0AF11h
                           AF11h means the file is a FLI file.
0006h             1 word   Number of frames
0008h             1 word   Width of displayed animation
000Ah             1 word   Height of displayed animation
000Ch             1 word   Number of used colors ("Depth")
000Eh             1 word   Flags (=0003h)
0010h             1 dword  Frame speed in sec/1024 **
0014h             1 word   reserved
0016h             1 dword  Date/Time of creation in DOS format
                           (see table 0009)
001Ah             1 dword  Creator
001Eh             1 dword  Date/Time of last change in DOS format
                           (see table 0009)
0022h             1 dword  Serial number?
                           of changer
0026h             1 word   X-Aspect ratio of animation
0028h             1 word   Y-Aspect ratio of animation
002Ah            38 byte   reserved
0052h             1 dword  Offset of frame 1 in file
0056h             1 dword  Offset of frame 2 in file
005Ah            40 byte   reserved
EXTENSION:FLI,FLT
OCCURENCES:PC
REFERENCE:DDJ0693
PROGRAMS:Autodesk Animator
SEE ALSO:QuickTime,AVI,FLT
--------D-FON-?-----------------------------
The Telix .FON files are the telephone books Telix uses to store numbers in.
The format is for Telix 3.22
OFFSET        Count TYPE   Description
0000h             1 dword  ID=2E2B291Ah
0004h             1 word   Version info (=1)
0006h             1 word   Number of entries in directory (count from 1)
0008h             1 char   ?will be used for encryption?
                           Currently 0
0009h            55 byte   reserved
0040h             ? rec    Actual phonebook entry
                 25 char   Name (0 terminated)
                 17 char   Phone number (0 terminated)
                  1 byte   Baud rate (see table 0006)
                  1 byte   Parity type (see table 0007)
                  1 byte   Data bits (7 or 8)
                  1 byte   Stop bits (1 or 2)
                 12 char   Script file name
                  6 char   Date of last call in ASCII
                  1 word   Number of total calls
                  1 byte   Terminal type (see table 0008)
                  1 byte   Protocol
                  1 byte   Flags, bitmapped
                             0 - Local echo on / off
                             1 - add linefeeds on / off
                             2 - backspace is destructive on / off
                             3 - backspace sends DEL / sends BS
                             4 - strip high bits on / off
                           5-7 - reserved
                  1 word   unknown
                  1 byte   Dial prefix index
                 14 char   Password

(Table 0006) Baud rate tables for Telix
  0 = 300 baud
  1 = 1200 baud
  2 = 2400 baud
  3 = 4800 baud
  4 = 9600 baud
  5 = 19200 baud
  6 = 38400 baud
  7 = 57600 baud
  8 = 115200 baud

(Table 0007) Parity types for Telix
  0 = None
  1 = Even
  2 = Odd
  3 = Mark
  4 = Space

(Table 0008) Terminal types for Telix
  0 = TTY
  1 = ANSI-BBS
  2 = VT102
  3 = VT52
  4 = AVATAR
  5 = ANSI
EXTENSION:FON
OCCURENCES:PC
PROGRAMS:Telix v3.22
REFERENCE:
SEE ALSO:
VALIDATION:
--------M-FPT-------------------------------
The Farandole Pattern files are used by the Farandole Composer to store
single patterns in a file.
OFFSET Count TYPE Description 0000h 4 char ID='FPT',254 0004h 32 char ASCII pattern name 0024h 3 char ID=10,13,26 0027h 1 word Remaining size of file (size of pattern) 0029h 1 byte Break location (length of pattern) 002Ah 1 byte reserved 002Bh ? byte Pattern in raw format like in the .FAR file EXTENSION:FAR,FPT OCCURENCES:PC PROGRAMS:Fandarole Composer SEE ALSO:FAR,FSM VALIDATION: --------S-FSM------------------------------- The .FSM files are samples to be used for module style music with the Fandarole Composer. Currently only samples of up to 64K length are supported, altough the header reserves a dword for the sample size. OFFSET Count TYPE Description 0000h 4 char ID='FSM',254 0004h 32 char ASCII name of sample 0024h 3 char ID=10,13,26 0027h 1 dword Length of sample (<=64K) 0028h 1 byte Fine tune value for sample (currently unsupported) 0029h 1 byte Sample volume (currently unsupported) 002Ah 1 dword Start of sample loop 002Dh 1 dword End of sample loop. If the sample is not set to loop (see below) this should be set to the end of the sample. 0032h 1 byte Sample type, bitmapped 0 - 8-bit/16-bit sample 1-7 - reserved 0033h 1 byte Loop mode, ?bit mapped? 0-2 - reserved 3 - loop off/loop on 4-7 - reserved 0034h ? byte Sample data in signed format EXTENSION:FSM OCCURENCES:PC PROGRAMS:Fandarole Composer REFERENCE: SEE ALSO:FAR,USM VALIDATION: --------S-GF1 PATCH------------------------- The GF1 Patch files are multipart sound files for the Gravis Ultrasound sound card to emulate MIDI sounds in high quality. Each Patch can consist of many samples (for example, a string ensemble consists of Violin, Viola, Cello, Bass) which are played depending on the note to play. A patch can also contain a part to be played before the loop and a part to be played after the tone has been released. OFFSET Count TYPE Description 0000h 12 char ID='GF1PATCH110' 000Ch 10 char Manufacturer ID 0018h 60 char Description of the contained Instruments or copyright of manufacturer. 
0054h 1 byte Number of instruments in this patch 0055h 1 byte Number of voices for sample 0056h 1 byte Number of output channels (1=mono,2=stereo) 0057h 1 word Number of waveforms 0059h 1 word Master volume for all samples 005Bh 1 dword Size of the following data 0060h 36 byte reserved Following this header, the instruments with their headers follow. An instrument header contains the name and other data about one instrument contained within the patch. OFFSET Count TYPE Description 0000h 1 word Instrument number. ?Maybe the MIDI instrument number?. In the Gravis patches, this is 0, in other patches, I found random values. 0002h 16 char ASCII name of the instrument. 0012h 1 dword Size of the whole instrument in bytes. 0016h 1 byte Layers. Needed for whatever. 0017h 40 byte reserved About the patch, I don't know anything. Maybe somebody could enlighten me. Each patch record has the following format : OFFSET Count TYPE Description 0000h 7 char Wave file name 0007h 1 byte Fractions 0008h 1 dword Wave size. 
                     Size of the wave digital data
000Ch       1 dword  Start of wave loop
0010h       1 dword  End of wave loop
0012h       1 word   Sample rate of the wave
0014h       1 word   Minimum frequency to play the wave
0016h       1 word   Maximum frequency to play the wave
0018h       1 dword  Original sample rate of the wave data
001Ch       1 int    Fine tune value for the wave
001Eh       1 byte   Stereo balance, values unknown**
001Fh       6 byte   Filter envelope rate
0025h       6 byte   Filter envelope offset
002Bh       1 byte   Tremolo sweep
002Ch       1 byte   Tremolo rate
002Dh       1 byte   Tremolo depth
002Fh       1 byte   Vibrato sweep
0030h       1 byte   Vibrato rate
0031h       1 byte   Vibrato depth
0032h       1 byte   Wave data, bitmapped
                     0 - 8/16 bit wave data
                     1 - signed/unsigned data
                     2 - de/enable looping
                     3 - no/has bidirectional looping
                     4 - loop forward/backward
                     5 - Turn envelope sustaining off/on
                     6 - Dis/Enable filter envelope
                     7 - reserved
0033h       1 int    Frequency scale, whatever that means
0035h       1 word   Frequency scale factor
0037h      36 byte   Reserved

EXTENSION:PAT
OCCURENCES:PC
PROGRAMS:Patch Maker
SEE ALSO:VOC,WAVe

--------I-GIF-------------------------------
The Graphics Interchange Format (tm) was created by Compuserve Inc. as a standard for the storage and transmission of raster-based graphics information, i.e. images. A GIF file may contain several images, which are to be displayed overlapping and without any delay between the images. The image data itself is compressed using a LZW scheme. Please note that the LZW algorithm is patented by UniSys and that since Jan. 1995 royalties to Compuserve are due for every piece of software that implements GIF images.

The GIF file consists of a global GIF header, one or more image blocks and optionally some GIF extensions.

OFFSET  Count TYPE   Description
0000h       6 char   ID='GIF87a', ID='GIF89a'
                     This ID may be viewed as a version number
0006h       1 word   Image width
0008h       1 word   Image height
000Ah       1 byte   bit mapped
                     0-2 - bits per pixel - 1
                     3 - reserved
                     4-6 - bits of color resolution
                     7 - Global color map follows screen descriptor
000Bh       1 byte   Color index of screen background
000Ch       1 byte   reserved

The global color map immediately follows the screen descriptor. It has 2**BitsPerPixel entries, each holding the RGB color for one color index. A component value of 0 is no intensity, 255 is full intensity. Each entry is stored in the following format :

OFFSET  Count TYPE   Description
0000h       1 byte   Red component
0001h       1 byte   Green component
0002h       1 byte   Blue component

After the first picture, there may be more pictures attached in the file which overlay the first picture or parts of the first picture. The Image Descriptor defines the actual placement and extents of the following image within the space defined in the Screen Descriptor. Each Image Descriptor is introduced by an image separator character. The role of the Image Separator is simply to provide a synchronization character to introduce an Image Descriptor. The image separator is defined as ",", 02Ch. Any characters encountered between the end of a previous image and the image separator character are to be ignored.

The format of the Image descriptor looks like this :

OFFSET  Count TYPE   Description
0000h       1 char   Image separator ID=','
0001h       1 word   Left offset of image
0003h       1 word   Upper offset of image
0005h       1 word   Width of image
0007h       1 word   Height of image
0009h       1 byte   Palette description - bitmapped
                     0-2 - Number of bits per pixel-1
                     3-5 - reserved (0)
                     6 - Interlaced / sequential image
                     7 - local / global color map, ignore bits 0-2

To provide for some possibility of an extension of the GIF files, a special extension block introducer can be added after the GIF data block. The block has the following structure :

OFFSET  Count TYPE   Description
0000h       1 char   ID='!'
0001h       1 byte   Extension ID
0002h       ? rec
            1 word   Byte count
            ? byte   Extra data
????h       1 byte   Zero byte count - terminates extension block.

EXTENSION:GIF
OCCURENCES:PC
PROGRAMS:CSHOW.EXE
SEE ALSO:
VALIDATION:

--------A-GZIP------------------------------
The GNU ZIP program is an archive program developed by the GNU project, mostly for UNIX machines.

OFFSET  Count TYPE   Description
0000h       2 byte   ID=1Fh,8Bh (31,139)
0002h       1 byte   Method :
                     0-7 - reserved
                     8 - deflated
0003h       1 byte   File flags :
                     0 - ASCII-text
                     1 - Multi-part file
                     2 - Name present
                     3 - Comment present
                     4 - Encrypted
                     5-7 - reserved
0004h       1 dword  File date and time (see table 0009)
0008h       1 byte   Extra flags
0009h       1 byte   Target OS :
                     0 - DOS
                     1 - Amiga
                     2 - VMS
                     3 - Unix
                     4 - ????
                     5 - Atari
                     6 - OS/2
                     7 - MacOS
                     10 - TOPS-20
                     11 - Win/32

EXTENSION:ZIP
PROGRAMS:GNU gzip

--------S-GKH-------------------------------
The GKH files are disk images of the Ensoniq EPS sampler system. Further information is missing.

EXTENSION:GKH
SEE ALSO:EFE,INS

--------a-GRASPRT GL-G----------------------
The .GL animation files are graphic animations, some just .GIF files, others mini-movies, used mostly for x-rated adult animations. The format of the files is plain guesswork by me. The analyzed file did not include any animations but only .GIF files and two text files which seemed to be the animation script. There is no safe way of identifying a file as a GL animation, except perhaps for adding the subfile sizes and the header size and then checking whether this matches the file size.

OFFSET  Count TYPE   Description
0000h       1 word   Length of header, excluding this word
                     ="HLN"
0002h       ? rec    The directory entries for each file
            1 dword  Offset of the stored file
           12 char   DOS file name of the stored file
0002h+      1 dword  Length of the first stored file
"HLN"       ?
byte   The first file

The other files follow in similar manner, length->file->length->file

EXTENSION:GL
OCCURENCES:PC
PROGRAMS:GRASPRT

--------?-GRIB------------------------------
The GRIB weather product information files just might be some satellite images or something else. I have only seen this signature in a magic file and further information about the format is not known to me.

OFFSET  Count TYPE   Description
0000h       4 char   ID='GRIB'

EXTENSION:???
OCCURENCES:???
PROGRAMS:???

--------A-HA--------------------------------
HA files (not to be confused with HamarSoft's HAP files [3]) contain a small archive header with a word count of the number of files in the archive. The constituent files are stored sequentially, each with a header followed by the compressed data, as is the case with most archives.

The main file header is formatted as follows:

OFFSET  Count TYPE   Description
0000h       2 char   ID='HA'
0002h       1 word   Number of files in archive

Every compressed file has a header before it, like this :

OFFSET  Count TYPE   Description
0000h       1 byte   Version & compression type
0001h       1 dword  Compressed file size
0005h       1 dword  Original file size
0009h       1 dword  CCITT CRC-32 (same as ZModem/PkZIP)
000Dh       1 dword  File time-stamp (Unix format)
?           ? char   ASCIIZ pathname
?           ? char   ASCIIZ filename
????h       1 byte   Length of machine specific information
            ? byte   Machine specific information

Note that the path separator for pathnames is the 0FFh (255) character. The high nybble of the version & compression type field contains the version information (0=HA 0.98), the low nybble is the compression type :

(Table 0012)
HA compression types
  0  "CPY"      File is stored (no compression)
  1  "ASC"      Default compression method, using a sliding window
                dictionary with an arithmetic coder.
  2  "HSC"      Compression using a "finite context [sic] model and
                arithmetic coder"
  14 "DIR"      Directory entry
  15 "SPECIAL"  Used with HA 0.99B (?)

Machine specific information known:
            1 byte   Machine type (Host-OS)
                     1 = MS DOS
                     2 = Linux (Unix)
            ?
bytes  Information (currently only file-attribute info)

EXTENSION:HA
OCCURENCES:PC, Linux
PROGRAMS:HA
REFERENCE:

--------I-HSI1------------------------------
The HSI1 images are a JPEG derivative made by Handmade Software for their Image Alchemy package.

OFFSET  Count TYPE   Description
0000h       4 char   ID='HSI1'

EXTENSION:JPG
OCCURENCES:PC,SUN
PROGRAMS:Image Alchemy
REFERENCE:
SEE ALSO:JPEG
VALIDATION:

--------A-HYP-------------------------------
The Hyper archiver is a very fast compression program by P. Sawatzki and K.P. Nischke, which uses LZW compression techniques. It is not very widespread - in fact, I've yet to see a package distributed in this format.

OFFSET  Count TYPE   Description
0000h       1 byte   ID=1Ah
0001h       2 char   Compression method
                     "HP" - compressed
                     "ST" - stored
0003h       1 byte   Version file was compressed by in BCD
0004h       1 dword  Compressed file size
0008h       1 dword  Original file size
000Ch       1 dword  MS-DOS date and time of file (see table 0009)
0010h       1 dword  CRC-32 of file
0014h       1 byte   MS-DOS file attribute
0015h       1 byte   Length of filename
                     ="LEN"
0016h   "LEN" char   Filename

EXTENSION:HYP
OCCURENCES:PC
PROGRAMS:HYPER.EXE

--------f-IFF-M-----------------------------
The IFF format is comparable to the RIFF file format, but it uses Motorola byte ordering. After the FORM header, the different records follow. Each record has a header ID of 4 bytes, followed by the size of the data (in Motorola byte ordering). Each IFF record starts on an even byte boundary; that means if the record length is odd, you will have to skip one more byte to get to the next record.

OFFSET  Count TYPE   Description
0000h       4 char   ID='FORM'
0004h       1 dword  Size of the whole IFF block
0008h       4 char   Type of the IFF file

Each IFF record has the following format :

OFFSET  Count TYPE   Description
0000h       4 char   ID
0004h       1 dword  Blocksize
0008h       ? byte   Block data, depends on block type.
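
The record layout above can be walked with a few lines of code. The following Python sketch is only an illustration: the function name `parse_iff` is made up for this example, and it assumes that the size stored at offset 0004h does not count the 8 bytes of the 'FORM' ID and the size field itself, which the table does not state.

```python
import struct


def parse_iff(data: bytes):
    """Return (form_type, [(record_id, payload), ...]) for an IFF FORM file.

    Assumption: the FORM size field excludes the 8 header bytes (ID + size).
    """
    if len(data) < 12 or data[0:4] != b"FORM":
        raise ValueError("not an IFF FORM file")
    (form_size,) = struct.unpack(">I", data[4:8])  # Motorola (big endian) dword
    form_type = data[8:12]
    end = min(8 + form_size, len(data))

    records = []
    pos = 12
    while pos + 8 <= end:
        record_id = data[pos:pos + 4]
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])
        records.append((record_id, data[pos + 8:pos + 8 + size]))
        # Records start on even boundaries: skip one pad byte after odd sizes.
        pos += 8 + size + (size & 1)
    return form_type, records


if __name__ == "__main__":
    body = (b"ILBM"
            + b"BMHD" + struct.pack(">I", 3) + b"abc" + b"\x00"
            + b"BODY" + struct.pack(">I", 2) + b"xy")
    sample = b"FORM" + struct.pack(">I", len(body)) + body
    print(parse_iff(sample))
```

The pad byte is skipped but not returned, so each payload has exactly the length given in its Blocksize field.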
OCCURENCES:Amiga,PC
SEE ALSO:8SVX,LBM,RIFF

--------S-INS-------------------------------
The INS files are instrument files for the Ensoniq sampler system. Further information wanted.

EXTENSION:INS
SEE ALSO:EFE,GKH

--------I-JPEG-G----------------------------
The JPEG image standard is a standard for lossy (but efficient) image compression made by the ???? Group. The endianness of the JPEG files is unknown to me; there seem to exist both types of JPEG files. The JPEG files are block oriented, there is a header for each JPG block, but I was not able to find a list of all blocks - so you'll have to stick with what I gathered here ;)

Format of a JPEG block (all data is in Motorola byte order) :

OFFSET  Count TYPE   Description
0000h       1 word   Block ID
                     0FFD8h - JPEG signature block(4 chars="JFIF")
                     0FFC0h - JPEG color information
                     0FFC1h - JPEG color information
0002h       1 word   Block size in bytes, without ID word.

Format of JPEG color information (Motorola byte order) :

OFFSET  Count TYPE   Description
0000h       1 byte   1=Grayscale image
0001h       1 word   Height
0003h       1 word   Width

Another try for JPEG identification could be this one :

OFFSET  Count TYPE   Description
0000h       1 dword  ID=FFD9FFE0h
                     ID=FFD8FFE0h
                     Big endian JPEG file (Motorola)
                     ID=E0FFD8FFh
                     Little endian JPEG file (Intel)

EXTENSION:JPG
OCCURENCES:PC,Amiga,SUN
PROGRAMS:
REFERENCE:
SEE ALSO:HSI1
VALIDATION:

--------I-LBM-M-----------------------------
The LBM/ILBM format is used by Deluxe Paint to store bitmap images. It uses the IFF file format and Motorola byte order.

FORMblock [BMHD]
This block contains the information about the image.

OFFSET  Count TYPE   Description
0000h       1 word   The image width (x-axis)
0002h       1 word   The image height (y-axis)
0004h       1 dword  reserved
0008h       1 byte   Bits per pixel
0009h       1 byte   ??reserved??

FORMblock [BODY]
This block contains the (compressed) image data... ****

FORMblock [CRGN]
This block contains palette information for a range of palette entries.
OFFSET  Count TYPE   Description

FORMblock [TINY]
This block contains a small image used for previewing.

OFFSET  Count TYPE   Description

EXTENSION:IFF,LBM
OCCURENCES:AMIGA,PC
PROGRAMS:Deluxe Paint
REFERENCE:???
SEE ALSO:IFF

--------A-LBR-------------------------------
The LBR files consist of a directory and one or more "members". The directory contains from 4 to 256 entries and each entry describes one member. The first directory entry describes the directory itself. All space allocations are in terms of sectors, where a sector is 128 bytes long. Four directory entries fit in one sector, thus the number of directory entries is always evenly divisible by 4. Different types of LBR files exist; all versions are discussed here. The directory entry looks like this :

OFFSET  Count TYPE   Description
0000h       1 byte   File status :
                     0 - active
                     254 - deleted
                     255 - free
0001h      11 char   File name in FCB format (8/3, blank padded),
                     directory name is blanks for old LU,
                     ID='********DIR' for LUPC
000Ch       1 word   Offset to file data in sectors
000Eh       1 word   Length of stored data in sectors

For the LUPC program, the remaining 16 bytes are used like this :

OFFSET  Count TYPE   Description
0000h       8 char   ASCII date of creation (MM/DD/YY)
0008h       8 char   ASCII time of creation (HH:MM:SS)

For the LU86 program, the remaining 16 bytes are used like this :

OFFSET  Count TYPE   Description
0000h       1 word   CRC-16 or 0
0002h       1 word   Creation date in CP/M format
0004h       1 word   Creation time in DOS format
0006h       1 word   Date of last modification, CP/M format
0008h       1 word   Time of last modification, DOS format
000Ah       1 byte   Number of bytes in last sector
000Bh       5 byte   reserved (0)

EXTENSION:LBR
OCCURENCES:PC,CP/M
PROGRAMS:LU.COM, LUU.COM, LU86.COM
SEE ALSO:

--------A-LZH-------------------------------
The LHArc/LHA archiver is a multi platform archiver made by Haruyasu Yoshizaki, which has a relatively good compression. It uses more or less the same technology as the ZIP programs by Phil Katz.
There was a hack named "ICE", which had only the graphic characters displayed on decompression changed.

OFFSET  Count TYPE   Description
0000h       1 byte   Size of archived file header
0001h       1 byte   Checksum of remaining bytes
0002h       3 char   ID='-lh'
                     ID='-lz'
0005h       1 char   Compression methods used (see table 0005)
0006h       1 char   ID='-'
0007h       1 dword  Compressed size
000Bh       1 dword  Uncompressed size
000Fh       1 dword  Original file date/time (see table 0009)
0013h       1 word   File attribute
0015h       1 byte   Filename / path length in bytes
                     ="LEN"
0016h   "LEN" char   Filename / path
0016h       1 word   CRC-16 of original file
+"LEN"

(Table 0005)
LHArc compression types
  "0" - No compression
  "1" - LZW, 4K buffer, Huffman for upper 6 bits of position
  "2" - unknown
  "3" - unknown
  "4" - LZW, Arithmetic Encoding
  "5" - LZW, Arithmetic Encoding
  "s" - LHa 2.x archive?
  "\" - LHa 2.x archive?
  "d" - LHa 2.x archive?

EXTENSION:LZH,ICE
OCCURENCES:PC
PROGRAMS:LHArc.EXE, LHA.EXE

--------M-MIDI-M----------------------------
The MIDI file format is used to store MIDI song data on disk. The discussed version of the MIDI file spec is the approved MIDI Manufacturers' Association format version 0.06 (3/88). The contact address is listed in the addresses file. Version 1.0 is technically identical but the description has been rewritten. The description was made by Dave Oppenheim, most of the text was taken right out of his document.

MIDI files contain one or more MIDI streams, with time information for each event. Song, sequence, and track structures, tempo and time signature information, are all supported. Track names and other descriptive information may be stored with the MIDI data. This format supports multiple tracks and multiple sequences so that if the user of a program which supports multiple tracks intends to move a file to another one, this format can allow that to happen.

The MIDI files are block oriented files, currently only 2 block types are defined, header and track data.
In contrast to the IFF and RIFF formats, no global header is given, so that the validation must be done by adding the different block sizes. A MIDI file always starts with a header block, and is followed by one or more track blocks.

The format of the header block :

OFFSET  Count TYPE   Description
0000h       4 char   ID='MThd'
0004h       1 dword  Length of header data (=6)
0008h       1 word   Format specification
                     0 - one, single multi-channel track
                     1 - one or more simultaneous tracks
                     2 - one or more sequentially independent
                         single-track patterns
000Ah       1 word   Number of track blocks in the file
000Ch       1 int    Unit of delta-time values.
                     If negative :
                       Absolute of high byte :
                         Number of frames per second.
                       Low byte :
                         Resolution within one frame
                     If positive, division of a quarter-note.

The track data format :
The MTrk block type is where actual song data is stored. It is simply a stream of MIDI events (and non-MIDI events), preceded by delta-time values.

Some numbers in MTrk blocks are represented in a form called a variable-length quantity. These numbers are represented 7 bits per byte, most significant bits first. All bytes except the last have bit 7 set, and the last byte has bit 7 clear. If the number is between 0 and 127, it is thus represented exactly as one byte. Since this explanation might not be too clear, some examples :

Number (hex)   Representation (hex)
00000000       00
00000040       40
0000007F       7F
00000080       81 00
00002000       C0 00
00003FFF       FF 7F
001FFFFF       FF FF 7F
08000000       C0 80 80 00
0FFFFFFF       FF FF FF 7F

The largest number which is allowed is 0FFFFFFFh, so that the variable-length representation fits in 32 bits in a routine that writes variable-length numbers.

Each track block contains one or more MIDI events, each event consists of a delta-time and the number of the event. The delta-time is stored as a variable-length quantity and represents the time to delay before the following event.
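
The variable-length quantity described above can be read and written as in the following Python sketch. The function names `encode_vlq` and `decode_vlq` are invented for this illustration and are not part of the MIDI specification; the 0FFFFFFFh limit is taken from the text above.

```python
def encode_vlq(value: int) -> bytes:
    """Encode a number as a MIDI variable-length quantity."""
    if not 0 <= value <= 0x0FFFFFFF:
        raise ValueError("value does not fit into a variable-length quantity")
    groups = [value & 0x7F]              # last byte: bit 7 clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes: bit 7 set
        value >>= 7
    return bytes(reversed(groups))       # most significant group first


def decode_vlq(data: bytes, pos: int = 0):
    """Decode a variable-length quantity at data[pos].

    Returns (value, position of the first byte after the quantity).
    """
    value = 0
    for _ in range(4):                   # at most 4 bytes are allowed
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos
    raise ValueError("variable-length quantity longer than 4 bytes")


if __name__ == "__main__":
    for number in (0x00, 0x40, 0x7F, 0x80, 0x2000, 0x3FFF,
                   0x1FFFFF, 0x08000000, 0x0FFFFFFF):
        print(f"{number:08X}", encode_vlq(number).hex(" ").upper())
```

Returning the new position from `decode_vlq` makes it easy to read a delta-time and then continue with the event bytes that follow it.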
A delta-time of 0 means that the event occurs simultaneously with the previous event or occurs right at the start of a track. The delta-time unit is specified in the header block.

Format of track information block :

OFFSET  Count TYPE   Description
0000h       4 char   ID='MTrk'
0004h       1 dword  Length of track data
0008h       ? rec    <delta-time>, <event>

Three types of events are defined: MIDI event, system exclusive event and meta event. The first event in a file must specify status; delta-time itself is not an event. Meta events carry non-MIDI information.

The format of the meta event :

OFFSET  Count TYPE   Description
0000h       1 byte   ID=FFh
0001h       1 byte   Type (<=128)
0002h       ? ?      Length of the data, 0 if no data
                     stored as variable length quantity
            ? byte   Data

A few meta-events are defined. It is not required for every program to support every meta-event. Meta-events initially defined include:

FF 00 02 ssss   Sequence Number
This optional event, which must occur at the beginning of a track, before any nonzero delta-times, and before any transmittable MIDI events, specifies the number of a sequence.

FF 01 len text   Text Event
Any amount of text describing anything. It is a good idea to put a text event right at the beginning of a track, with the name of the track, a description of its intended orchestration, and any other information which the user wants to put there. Programs on a computer which does not support non-ASCII characters should ignore those characters with the hi-bit set.

Meta event types 01 through 0F are reserved for various types of text events, each of which meets the specification of text events (above) but is used for a different purpose:

FF 02 len text   Copyright Notice
Contains a copyright notice as printable ASCII text. The notice should contain the characters (C), the year of the copyright, and the owner of the copyright.
If several pieces of music are in the same MIDI file, all of the copyright notices should be placed together in this event so that it will be at the beginning of the file. This event should be the first event in the first track block, at time 0.

FF 03 len text   Sequence/Track Name
If in a format 0 track, or the first track in a format 1 file, the name of the sequence. Otherwise, the name of the track.

FF 04 len text   Instrument Name
A description of the type of instrumentation to be used in that track.

FF 05 len text   Lyric
A lyric to be sung. Generally, each syllable will be a separate lyric event which begins at the event's time.

FF 06 len text   Marker
Normally in a format 0 track, or the first track in a format 1 file. The name of that point in the sequence, such as a rehearsal letter or section name ("First Verse", etc.).

FF 07 len text   Cue Point
A description of something happening on a film or video screen or stage at that point in the musical score ("Car crashes into house", "curtain opens", "she slaps his face", etc.)

FF 2F 00   End of Track
This event is not optional. It is included so that an exact ending point may be specified for the track, so that it has an exact length, which is necessary for tracks which are looped or concatenated.

FF 51 03 tttttt   Set Tempo, in microseconds per MIDI quarter-note
This event indicates a tempo change. Another way of putting "microseconds per quarter-note" is "24ths of a microsecond per MIDI clock". Representing tempos as time per beat instead of beat per time allows absolutely exact long-term synchronization with a time-based sync protocol such as SMPTE time code or MIDI time code. The amount of accuracy provided by this tempo resolution allows a four-minute piece at 120 beats per minute to be accurate within 500 usec at the end of the piece.
Ideally, these events should only occur where MIDI clocks would be located. This convention is intended to guarantee, or at least increase the likelihood, of compatibility with other synchronization devices so that a time signature/tempo map stored in this format may easily be transferred to another device.

FF 54 05 hr mn se fr ff   SMPTE Offset
This event, if present, designates the SMPTE time at which the track block is supposed to start. It should be present at the beginning of the track, that is, before any nonzero delta-times, and before any transmittable MIDI events. The hour must be encoded with the SMPTE format, just as it is in MIDI Time Code. In a format 1 file, the SMPTE Offset must be stored with the tempo map, and has no meaning in any of the other tracks. The ff field contains fractional frames, in 100ths of a frame, even in SMPTE-based tracks which specify a different frame subdivision for delta-times.

FF 58 04 nn dd cc bb   Time Signature
The time signature is expressed as four numbers. nn and dd represent the numerator and denominator of the time signature as it would be notated. The denominator is a negative power of two: 2 represents a quarter-note, 3 represents an eighth-note, etc. The cc parameter expresses the number of MIDI clocks in a metronome click. The bb parameter expresses the number of notated 32nd-notes in a MIDI quarter-note (24 MIDI Clocks).

FF 59 02 sf mi   Key Signature
  sf = -7: 7 flats
  sf = -1: 1 flat
  sf = 0: key of C
  sf = 1: 1 sharp
  sf = 7: 7 sharps
  mi = 0: major key
  mi = 1: minor key

FF 7F len data   Sequencer-Specific Meta-Event
Special requirements for particular sequencers may use this event type: the first byte or bytes of data is a manufacturer ID. However, as this is an interchange format, growth of the spec proper is preferred to use of this event type.
This type of event may be used by a sequencer which elects to use this as its only file format; sequencers with their established feature-specific formats should probably stick to the standard features when using this format.

The system exclusive event is used as an escape to specify arbitrary bytes to be transmitted. The system exclusive event has two forms, to compensate for some manufacturer-specific modes: the F0h form transmits a leading F0h followed by the data, and the F7h form is used if no leading F0h is to be transmitted. Each system exclusive event must end with an F7h event.

The format of a system exclusive event :

OFFSET  Count TYPE   Description
0000h       1 byte   ID=F0h,ID=F7h
0001h       ? ?      Length as variable length qty.
            ? byte   bytes to be transmitted

EXTENSION:MID,MIDI
OCCURENCES:PC,MAC
PROGRAMS:Cubase
VALIDATION:

--------M-MOD-M-----------------------------
The Protracker composer is a composer for digital music. The MOD files are a quasi standard for digital music, all words are in Motorola byte order. The original MOD format allowed only 4 digital channels and 15 instruments, the specification became enlarged (maybe by Mahoney and Kaktus??) to 4 channels and 31 instruments. Check the file at offset 1080d for the signatures 'M.K.', '4CHN', '6CHN', '8CHN', 'FLT4', 'FLT8'. If you find any of them, the module uses 31 instruments. With rising sound quality on the PC and other platforms, the old MODule format has been replaced by numerous other formats. The 4/15 format has almost become extinct. Below, only the 4/31 format is described.

The digital sample data is signed (two's complement) as necessary for the Amiga, the sample data immediately follows the pattern data. Maybe this is not valid for some 8CHN files; one of the two I have uses Intel byte ordering and unsigned samples.

OFFSET  Count TYPE   Description
0000h      20 char   Song title, padded with spaces
0014h      31 rec    Sample description record
                     For original MOD files, the number of
                     instruments would be 15.
           22 char   Sample name, padded with zeroes to
                     full length.
            1 word   Sample length / 2.
                     Needs to be multiplied by 2 to get the actual
                     length. If the sample length is greater than
                     8000h, the sample is bigger than 64k.
            1 byte   Sample finetune. Only the lower nibble is
                     valid. Fine tune table :
                      0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
                      0 +1 +2 +3 +4 +5 +6 +7 -8 -7 -6 -5 -4 -3 -2 -1
            1 byte   Sample volume (0-40h)
            1 word   Sample loop start / 2
            1 word   Sample loop length / 2
950d        1 byte   Song length in patterns (0-80h)
951d        1 byte   Restart byte for song looping (Noisetracker?)
952d      128 byte   Pattern play sequences
1080d       4 char   ID='M.K.', ID='4CHN',ID='6CHN',ID='8CHN'
                     ID='FLT4',ID='FLT8'
                     If this position contains 'M.K.','8CHN',
                     '4CHN','6CHN','FLT4' or 'FLT8' the module
                     has 31 instruments.
1084d       ? rec    Patterns
                     Each pattern has 64 rows.
                     Depending on the number of channels, each row
                     has from 4 to 8 notes. The channel count is
                     determined by the ID. (see table 0005)
                     The number of patterns is the highest pattern
                     number stored in the pattern list.
                     Each note has four bytes. Four notes make up a
                     track in a four channel MOD file. Each track is
                     saved sequentially :
                      byte  0-3     4-7     8-11    12-15
                            Chn #1  Chn #2  Chn #3  Chn #4
                      byte  16-19   20-23   24-27   28-31
                            Chn #1  Chn #2  Chn #3  Chn #4
            1 word   Instrument / period
                     The instrument number is in bits 12-15, the
                     12-bit period in bits 0-11.
            1 byte   Upper nibble : Lower 4 bits of the instrument,
                     Lower nibble : Special effect command.
1 byte Special effects data (Table 0005) Protracker 16 note conversion table / MOD Period table +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ PT16 : I 1I 2I 3I 4I 5I 6I 7I 8I 9I 10I 11I 12I MOD : I 1712I 1616I 1524I 1440I 1356I 1280I 1208I 1140I 1076I 1016I 960I 906I Note : I C-0I C#0I D-0I D#0I E-0I F-0I F#0I G-0I G#0I A-0I A#0I B-0I +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ I 13I 14I 15I 16I 17I 18I 19I 20I 21I 22I 23I 24I I 856I 808I 762I 720I 678I 640I 604I 570I 538I 508I 480I 453I I C-1I C#1I D-1I D#1I E-1I F-1I F#1I G-1I G#1I A-1I A#1I B-1I +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ I 25I 26I 27I 28I 29I 30I 31I 32I 33I 34I 35I 36I I 428I 404I 381I 360I 339I 320I 302I 285I 269I 254I 240I 226I I C-2I C#2I D-2I D#2I E-2I F-2I F#2I G-2I G#2I A-2I A#2I B-2I +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ I 37I 38I 39I 40I 41I 42I 43I 44I 45I 46I 47I 48I I 214I 202I 190I 180I 170I 160I 151I 143I 135I 127I 120I 113I I C-3I C#3I D-3I D#3I E-3I F-3I F#3I G-3I G#3I A-3I A#3I B-3I +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ I 49I 50I 51I 52I 53I 54I 55I 56I 57I 58I 59I 60I I 107I 101I 95I 90I 85I 80I 75I 71I 67I 63I 60I 56I I C-4I C#4I D-4I D#4I E-4I F-4I F#4I G-4I G#4I A-4I A#4I B-4I +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ EXTENSION:MOD,module OCCURENCES:AMIGA,PC PROGRAMS:DMP,ModEdit VALIDATION:NONE --------A-MS COMPRESS 5.0-G----------------- Microsoft ships its files compressed with COMPRESS.EXE, for expansion the program EXPAND.EXE (how original ;) ) is used. The program EXPAND.EXE is available with every copy of MS-DOS 5.0+, the program COMPRESS.EXE is available with several development kits, I found it with Borland Pascal 7.0. The compression seems to be some kind of LZ-Compression, as the fully compatible? LZCopy command under Windows can decompress the same files. 
This compression feature seems to be available on all DOS-PCs.

OFFSET  Count TYPE   Description
0000h       4 char   ID='SZDD'
0004h       1 long   reserved, always 3327F088h ?
0008h       1 byte   reserved
0009h       1 char   Last char of filename if file was compressed
                     into "FILENAME.EX_".
000Ah       1 long   Original file size
000Eh       1 byte   reserved, varies...

EXTENSION:*.??_
OCCURENCES:PC
PROGRAMS:COMPRESS.EXE, EXPAND.EXE, LZEXPAND.DLL
REFERENCE:?Windows SDK?
SEE ALSO:MS COMPRESS 6.22+
VALIDATION:

--------A-MS COMPRESS 6.22+-G---------------
At least with the version 6.22 of MS-DOS, Microsoft changed their compression program to a new signature. The program no longer seems to be able to restore files to their original name if it is not given on the command line.

OFFSET  Count TYPE   Description
0000h       4 char   ID="KWAJ"
0004h       1 long   reserved, always 0D127F088h ?
0008h       1 long   reserved, always 00120003h ?
000Ch       1 word   reserved, always 01 ?

EXTENSION:*.??_
OCCURENCES:PC
PROGRAMS:COMPRESS.EXE, EXPAND.EXE, LZEXPAND.DLL
REFERENCE:?Windows SDK?
SEE ALSO:MS COMPRESS 5.0
VALIDATION:

--------I-MSK-------------------------------
The MSK files are mask files used by the Autodesk Animator and Animator Pro packages. Two types of MSK files exist. The Animator Pro version is simply a PIC file with the depth 1. A MSK file created by the original Animator is exactly 8000 bytes long. There is no file header or other control information in the file. It contains the image bit map, 1 bit per pixel, with the leftmost pixels packed into the high order bits of each byte. The size of the image is fixed at 320x200. The image is stored left-to-right, top-to-bottom.

EXTENSION:MSK
OCCURENCES:PC
PROGRAMS:Autodesk Animator
SEE ALSO:PIC,FLIc

--------M-MTM-------------------------------
The MTM format is generated by the Multi Track Module tracker by the demo group Renaissance. The tracker features up to 32 channel digital music.
Instead of saving whole patterns, the tracker saves only the distinct tracks plus the information about which tracks should be played together at which time, thus saving some pattern space.
OFFSET              Count TYPE   Description
0000h                   3 char   ID='MTM'
0003h                   1 byte   Version data
                                 upper nibble is major version number
                                 lower nibble is minor version number
0004h                  20 char   ASCIIZ song name
0018h                   1 word   Number of saved tracks.
                                 ="NOT"
001Ah                   1 byte   Highest pattern number saved
                                 ="NOP"
001Bh                   1 byte   Last order number to play(=Songlength-1)
001Ch                   1 word   Length of extra comment field in bytes
                                 ="XSZ"
001Eh                   1 byte   Number of samples
                                 ="NOS"
001Fh                   1 byte   Attribute byte (currently defined as 0)
0020h                   1 byte   Beats per track
0021h                   1 byte   Number of tracks
0022h                  32 byte   Pan positions of the voices (0=left, 15=right??)
0042h               "NOS" rec    Instrument data
                       22 char   Sample name
                        1 dword  Sample length in bytes
                        1 dword  Start of sample loop in bytes
                        1 dword  End of sample loop in bytes
                        1 byte   Fine tune value for sample
                        1 byte   Default volume for sample
                        1 byte   Attribute byte, bit mapped
                                   0 - 0=8 bit sample,1=16 bit sample
                                 1-7 - undefined (set to zero)
0042h+                128 byte   Pattern order data
 "NOS"*37
00C2h+              "NOT" rec    Track data
 "NOS"*37                        Each track is saved independently and has the
                                 size of exactly 192 bytes. Each track is
                                 arranged as 64 3-byte notes with the
                                 following format :
                     64*3 byte   BYTE 0   BYTE 1   BYTE 2
                                 ppppppii iiiieeee aaaaaaaa
                                 p = pitch value (0=no pitch stated)
                                 i = instrument number (0=no instrument number)
                                 e = effect number
                                 a = effect argument
                                 The effects are the standard Protracker
                                 effects.
00C2h+       ("NOP"+1)*32 word   Track sequencing data
 "NOS"*37+                       This is the list of which track is used
 "NOT"*192                       as which voice in each pattern. One track can
                                 be part of many patterns, the drums for
                                 example. Track 0 is never saved but is always
                                 considered as an empty track. That means that
                                 counting really starts at one.
                                 The data is organized in sets of 32 voices.
                                 The first word contains the information which
                                 track is used in pattern 0, voice 0.
                                 The next word is for pattern 0, voice 1, etc.;
                                 this is repeated for each pattern, 32 words
                                 for each saved pattern.
00C2h+              "XSZ" char   Extra comment field. This contains some
 "NOS"*37+                       message or something.
 "NOT"*192+
 ("NOP"+1)*32*2
00C2h+                  ? byte   Raw sample data(unsigned).
 "NOS"*37+
 "NOT"*192+
 ("NOP"+1)*32*2+
 "XSZ"
EXTENSION:MTM
SEE ALSO:MOD
OCCURENCES:PC
PROGRAMS:MMEDIT,DMP
VALIDATION:
--------M-MTS-------------------------------
The Master Tracker program by the French demo group Arkham is a tracker for AdLib, SB and speaker - the further limits of this tracker are unknown to me.
OFFSET              Count TYPE   Description
0000h                   6 char   ID="MTRAC "
0006h                  20 char   Song name, zero padded
EXTENSION:MST
OCCURENCES:PC
PROGRAMS:Master Tracker v1.0
SEE ALSO:MOD
--------E-MZ EXE----------------------------
The old EXE files are the EXE files executed directly by MS-DOS. They were a major improvement over the old 64K COM files, since EXE files can span multiple segments. An EXE file consists of three different parts, the header, the relocation table and the binary code. The header is expanded by a lot of programs to store their copyright information in the executable; some extensions are documented below. The format of the header is as follows :
OFFSET              Count TYPE   Description
0000h                   2 char   ID='MZ'
                                 ID='ZM'
0002h                   1 word   Number of bytes in last 512-byte page
                                 of executable
0004h                   1 word   Total number of 512-byte pages in executable
                                 (including the last page)
0006h                   1 word   Number of relocation entries
0008h                   1 word   Header size in paragraphs
000Ah                   1 word   Minimum paragraphs of memory allocated in
                                 addition to the code size
000Ch                   1 word   Maximum number of paragraphs allocated in
                                 addition to the code size
000Eh                   1 word   Initial SS relative to start of executable
0010h                   1 word   Initial SP
0012h                   1 word   Checksum (or 0) of executable
0014h                   1 dword  CS:IP relative to start of executable
                                 (entry point)
0018h                   1 word   Offset of relocation table;
                                 40h for new-(NE,LE,LX,W3,PE etc.)
                                 executable
001Ah                   1 word   Overlay number (0h = main program)
Following are the header expansions by some other programs like TLink, LZExe and other linkers, encryptors and compressors; all offsets are relative to the start of the whole header :
---new executable
OFFSET              Count TYPE   Description
001Ch                   4 byte   ????
0020h                   1 word   Behaviour bits ??
0022h                  26 byte   reserved (0)
003Ch                   1 dword  Offset of new executable header from start of
                                 file (or 0 if plain MZ executable)
---Borland TLINK
OFFSET              Count TYPE   Description
001Ch                   2 byte   ?? (apparently always 01h 00h)
001Eh                   1 byte   ID=0FBh
001Fh                   1 byte   TLink version, major in high nybble
0020h                   2 byte   ??
---old ARJ self-extracting archive
OFFSET              Count TYPE   Description
001Ch                   4 char   ID='RJSX' (older versions; the new signature
                                 is 'aRJsf' in the first 1000 bytes of the
                                 file)
---LZEXE compressed executable
OFFSET              Count TYPE   Description
001Ch                   2 char   ID='LZ'
001Eh                   2 char   Version number :
                                 '09' - LZExe 0.90
                                 '91' - LZExe 0.91
---PKLITE compressed executable
OFFSET              Count TYPE   Description
001Ch                   1 byte   Minor version number
001Dh                   1 byte   Bit mapped :
                                 0-3 - major version
                                   4 - Extra compression
                                   5 - Multi-segment file
001Eh                   6 char   ID='PKLITE'
---LHarc 1.x self-extracting archive
OFFSET              Count TYPE   Description
001Ch                   4 byte   unused???
0020h                   3 byte   Jump to start of extraction code
0023h                   2 byte   ???
0025h                  12 char   ID='LHarc's SFX '
---LHA 2.x self-extracting archive
OFFSET              Count TYPE   Description
001Ch                   8 byte   ???
0024h                  10 char   ID='LHa's SFX '  For version 2.10
                                 ID='LHA's SFX '  For version 2.13
---LH self-extracting archive
OFFSET              Count TYPE   Description
001Ch                   8 byte   ???
0024h                   8 byte   ID='LH's SFX '
---TopSpeed C 3.0 CRUNCH compressed file
OFFSET              Count TYPE   Description
001Ch                   1 dword  ID=018A0001h
0020h                   1 word   ID=1565h
---PKARC 3.5 self-extracting archive
OFFSET              Count TYPE   Description
001Ch                   1 dword  ID=00020001h
0020h                   1 word   ID=0700h
---BSA (Soviet archiver) self-extracting archive
OFFSET              Count TYPE   Description
001Ch                   1 word   ID=000Fh
001Eh                   1 byte   ID=A7h
---LARC self-extracting archive
OFFSET              Count TYPE   Description
001Ch                   4 byte   ???
0020h                  11 byte   ID='SFX by LARC '
After the header, there follow the relocation items, which are used to span multiple segments. The relocation items have the following format :
OFFSET              Count TYPE   Description
0000h                   1 word   Offset within segment
0002h                   1 word   Segment of relocation
To get the position of the relocation within the file, you have to compute the physical address from the segment:offset pair, which is done by multiplying the segment by 16 and adding the offset and then adding the offset of the binary start. Note that the raw binary code starts on a paragraph boundary within the executable file. All segments are relative to the start of the executable in memory, and this value must be added to every segment if relocation is done manually.
EXTENSION:EXE,OVR,OVL
OCCURENCES:PC
PROGRAMS:MS-DOS
REFERENCE:Ralf Brown's Interrupt List
SEE ALSO:COM,EXE,NE EXE
--------E-NE EXE----------------------------
The NE EXE files are the new EXE files used for Windows and OS/2 executables. They contain a small MZ EXE which prints "This program requires Microsoft Windows" or something similar, but some files contain both DOS and Windows versions of the executable. The position of the new EXE header can be found in the old EXE header - see the MZ EXE topic for further information. All offsets within this header are from the start of the header if not noted otherwise.
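As a rough illustration of how the new EXE header can be located through the old one, here is a minimal Python sketch. It relies only on the two MZ header fields listed in the MZ EXE topic (the word at 0018h and the dword at 003Ch). The function name find_new_exe_header is made up for this example, little-endian (Intel) byte order is assumed, and treating any relocation table offset of 40h or more as "new executable" is an assumption of this sketch rather than something the format guarantees.

```python
import struct


def find_new_exe_header(image: bytes):
    """Return (offset, two-byte ID) of the new-style header, or None for a plain MZ file."""
    if len(image) < 0x40 or image[0:2] not in (b"MZ", b"ZM"):
        return None  # too short for the extended header, or not an MZ executable
    # Word at 0018h: offset of the relocation table; 40h marks a new executable.
    # Assumption of this sketch: values above 40h are accepted as well.
    (reloc_offset,) = struct.unpack_from("<H", image, 0x18)
    if reloc_offset < 0x40:
        return None
    # Dword at 003Ch: offset of the new header from the start of the file (0 = plain MZ).
    (new_offset,) = struct.unpack_from("<I", image, 0x3C)
    if new_offset == 0 or new_offset + 2 > len(image):
        return None
    return new_offset, image[new_offset:new_offset + 2]


# Synthetic example: a 128-byte stub whose new header sits at offset 60h.
stub = bytearray(0x80)
stub[0:2] = b"MZ"
struct.pack_into("<H", stub, 0x18, 0x40)
struct.pack_into("<I", stub, 0x3C, 0x60)
stub[0x60:0x62] = b"NE"
print(find_new_exe_header(bytes(stub)))  # (96, b'NE')
```

The returned two-byte ID tells the caller which header follows ('NE' for the format described here); all NE header offsets in the table below are then relative to the returned offset.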
OFFSET              Count TYPE   Description
0000h                   2 char   ID='NE'
0002h                   1 byte   Linker major version
0003h                   1 byte   Linker minor version
0004h                   1 word   Offset of entry table (see below)
0006h                   1 word   Length of entry table in bytes
0008h                   1 dword  File load CRC (0 in Borland's TPW)
000Ch                   1 byte   Program flags, bitmapped :
                                 0-1 - DGroup type :
                                       0 - none
                                       1 - single shared
                                       2 - multiple
                                       3 - (null)
                                   2 - Global initialization
                                   3 - Protected mode only
                                   4 - 8086 instructions
                                   5 - 80286 instructions
                                   6 - 80386 instructions
                                   7 - 80x87 instructions
000Dh                   1 byte   Application flags, bitmapped
                                 0-2 - Application type
                                       1 - Full screen (not aware of
                                           Windows/P.M. API)
                                       2 - Compatible with Windows/P.M. API
                                       3 - Uses Windows/P.M. API
                                   3 - OS/2 family application
                                   4 - reserved?
                                   5 - Errors in image/executable
                                   6 - "non-conforming program" whatever
                                   7 - DLL or driver
                                       (SS:SP info invalid, CS:IP points at
                                       FAR init routine called with AX=module
                                       handle which returns AX=0000h on
                                       failure, AX nonzero on successful
                                       initialization)
000Eh                   1 word   Auto data segment index
0010h                   1 word   Initial local heap size
0012h                   1 word   Initial stack size
0014h                   1 dword  Entry point (CS:IP),
                                 CS is index into segment table
0018h                   1 dword  Initial stack pointer (SS:SP)
                                 SS is index into segment table
001Ch                   1 word   Segment count
001Eh                   1 word   Module reference count
0020h                   1 word   Size of nonresident names table in bytes
0022h                   1 word   Offset of segment table (see below)
0024h                   1 word   Offset of resource table
0026h                   1 word   Offset of resident names table
0028h                   1 word   Offset of module reference table
002Ah                   1 word   Offset of imported names table
                                 (array of counted strings, terminated with a
                                 string of length 00h)
002Ch                   1 dword  Offset from start of file to nonresident
                                 names table
0030h                   1 word   Count of moveable entry points listed in
                                 entry table
0032h                   1 word   File alignment size shift count
                                 0 is equivalent to 9 (default 512-byte pages)
0034h                   1 word   Number of resource table entries
0036h                   1 byte   Target operating system
                                 0 - unknown
                                 1 - OS/2
                                 2 - Windows
                                 3 - European MS-DOS 4.x
                                 4 -
                                     Windows 386
                                 5 - BOSS (Borland Operating System Services)
0037h                   1 byte   Other OS/2 EXE flags, bitmapped
                                 0 - Long filename support
                                 1 - 2.x protected mode
                                 2 - 2.x proportional fonts
                                 3 - Executable has gangload area
0038h                   1 word   Offset to return thunks or start of gangload
                                 area - whatever that means.
003Ah                   1 word   Offset to segment reference thunks or length
                                 of gangload area.
003Ch                   1 word   Minimum code swap area size
003Eh                   2 byte   Expected Windows version (minor version first)
EXTENSION:DLL,EXE,FOT
OCCURENCES:PC
PROGRAMS:
REFERENCE:Windows 3.1 SDK Programmer's Reference, Vol 4.
SEE ALSO:EXE,MZ EXE
--------H-NG-G------------------------------
Information about this format comes only from a magic file, thus is only good for file identification. I did not test it, since I don't have any NG files. The Norton Guides are a popup help program for the IBM PCs which provide instant help anywhere...
OFFSET              Count TYPE   Description
0000h                   2 char   ID='NG'
0002h                   1 dword  ID=0
EXTENSION:NG
OCCURENCES:PC
PROGRAMS:NG.EXE
SEE ALSO:TPH,HLP
--------B-OBJ-------------------------------
Most of the description was taken from the Microsoft Product Support Services Application Note SS0288. The .OBJ files are binary files used by compilers to link in precompiled code. They contain symbol and relocation information necessary to link the data and code contained in the files. The .OBJ files have no common header, which makes validation or identification guesswork at best. The .OBJ files consist of at least one record, each of the following type :
OFFSET              Count TYPE   Description
0000h                   1 byte   Record type (see below)
0001h                   1 word   Record length
                                 ="LEN"
0003h               "LEN" byte   Record data
0003h                   1 byte   Checksum or 0
 +"LEN"                          (that much for validation)
The maximum size of the entire record (unless otherwise noted for specific record types) is 1024 bytes. For LINK386, the format is determined by the least-significant bit of the Record Type field.
An odd Record Type indicates that certain numeric fields within the record contain 32-bit values; an even Record Type indicates that those fields contain 16-bit values. The affected fields are described with each record. Note that this principle does not govern the Use32/Use16 segment attribute (which is set in the ACBP byte of SEGDEF records); it simply specifies the size of certain numeric fields within the record. It is possible to use 16-bit OMF records to generate 32-bit segments, or vice versa.

LINK ignores the value of the checksum byte, but some other utilities may not. Microsoft's Quick languages write a 0 byte instead of computing a checksum.

The contents of each record are determined by the record type, but certain subfields appear frequently enough to be explained separately. The format of such fields is below.

Names :
A name string is encoded as an 8-bit unsigned count followed by a string of count characters. The character set is usually some ASCII subset. A null name is specified by a single byte of 0 (indicating a string of length 0).

Indexed References :
Certain items are ordered by occurrence and are referenced by index. The first occurrence of the item has index number 1. Index fields may contain 0 (indicating that they are not present) or values from 1 through 7FFF. The index number field in an object record can be either 1 or 2 bytes long. If the number is in the range 0-7FH, the high-order bit (bit 7) is 0 and the low-order bits contain the index number, so the field is only 1 byte long. If the index number is in the range 80-7FFFH, the field is 2 bytes long.

The Type Indexes :
Type Index fields occupy 1 or 2 bytes and occur in PUBDEF, LPUBDEF, COMDEF, LCOMDEF, EXTDEF, and LEXTDEF records. They are encoded as described above for indexed references, but the interpretation of the values stored is governed by whether the module has the "new" or "old" object module format.
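Names, index fields and type index fields all use the encodings described above. A minimal Python sketch of both decoders follows. The helper names read_name and read_index are made up for this example. The layout of the 2-byte index form (bit 7 of the first byte set, its low seven bits holding the high part, the second byte holding the low part) is not spelled out in this text; it is taken from the OMF specification and should be read as an assumption here.

```python
def read_name(record: bytes, pos: int):
    """Decode a counted name starting at pos; return (name, position after it)."""
    count = record[pos]                      # 8-bit unsigned count
    raw = record[pos + 1:pos + 1 + count]    # followed by 'count' characters
    return raw.decode("ascii", errors="replace"), pos + 1 + count


def read_index(record: bytes, pos: int):
    """Decode a 1- or 2-byte index field; return (index, position after it)."""
    first = record[pos]
    if (first & 0x80) == 0:
        return first, pos + 1                # 0-7FH: one byte, bit 7 clear
    # Assumption (from the OMF specification): bit 7 set means the low seven
    # bits are the high part and the following byte is the low part.
    return ((first & 0x7F) << 8) | record[pos + 1], pos + 2


# Example payload: the name "CODE", a 1-byte index, then a 2-byte index.
payload = b"\x04CODE" + b"\x05" + b"\x81\x02"
name, pos = read_name(payload, 0)
small, pos = read_index(payload, pos)
large, pos = read_index(payload, pos)
print(name, small, hex(large))
```

Both helpers return the position after the field, so a record body can be walked field by field without tracking lengths separately. An index of 0 still has to be interpreted by the caller as "not present".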
"Old" versions of the OMF (indicated by lack of a COMENT record with comment class A1), have Type Index fields that contain indexes into previously seen TYPDEF records. This format is no longer produced by Microsoft products and is ignored by LINK if it is present. See the section of this document on TYPDEF records for details on how this was used. "New" versions of the OMF (indicated by the presence of a COMENT record with comment class A1), have Type Index fields that contain proprietary CodeView information. For more information on CodeView, see Appendix 1. Ordered Collections : Certain records and record groups are ordered so that the records may be referred to with indexes (the format of indexes is described in the "Indexed References" section of this document). The same format is used whether an index refers to names, logical segments, or other items. The overall ordering is obtained from the order of the records within the file together with the ordering of repeated fields within these records. Such ordered collections are referenced by index, counting from 1 (index 0 indicates unknown or not specified). For example, there may be many LNAMES records within a module, and each of those records may contain many names. The names are indexed starting at 1 for the first name in the first LNAMES record encountered while reading the file, 2 for the second name in the first record, and so forth, with the highest index for the last name in the last LNAMES record encountered. The ordered collections are: Names Ordered by occurrence of LNAMES records and names within each. Referenced as a name index. Logical Ordered by occurrence of SEGDEF records in Segments file. Referenced as a segment index. Groups Ordered by occurrence of GRPDEF records in file. Referenced as a group index. External Ordered by occurrence of EXTDEF, COMDEF, Symbols LEXTDEF, and LCOMDEF records and symbols within each. Referenced as an external name index (in FIXUP subrecords). 
Numeric 2- and 4-Byte Fields :
Certain records, notably SEGDEF, PUBDEF, LPUBDEF, LINNUM, LEDATA, LIDATA, FIXUPP, and MODEND, contain size, offset, and displacement values that may be 32-bit quantities for Use32 segments. The encoding is as follows:
 - When the least-significant bit of the record type byte is set (that is, the record type is an odd number), the numeric fields are 4 bytes.
 - When the least-significant bit of the record type byte is clear, the fields occupy 2 bytes. The values are zero-extended when applied to Use32 segments.
NOTE: See the description of SEGDEF records in this document for an explanation of Use16/Use32 segments.

The general record ordering is not mandatory, but should be (for link speed) like this :

THEADR or LHEADR record :

Records Processed by LINK Pass 1 :
All records may occur in any order but must stand before the link pass separator, if it is present.
 COMENT records identifying object format and extensions
 COMENT records other than Link Pass Separator comment
 LNAMES or LLNAMES records providing ordered name list
 SEGDEF records providing ordered list of program segments
 GRPDEF records providing ordered list of logical segments
 TYPDEF records (obsolete)
 ALIAS records
 PUBDEF records locating and naming public symbols
 LPUBDEF records locating and naming private symbols
 COMDEF, LCOMDEF, EXTDEF, LEXTDEF, and CEXTDEF records

Link Pass Separator (Optional) :
COMENT class A2 record indicating that Pass 1 of the linker is complete. When this record is encountered, LINK stops reading the object file in Pass 1; no records after this comment are read in Pass 1. All the records listed above must come before this COMENT record. For greater linking speed, all LIDATA, LEDATA, FIXUPP, BAKPAT, INCDEF, and LINNUM records should come after the A2 COMENT record, but this is not required. In LINK, Pass 2 begins again at the start of the object module, so these records are processed in Pass 2 no matter where they are placed in the object module.
Records Ignored by LINK Pass 1 and Processed by LINK Pass 2 :
The following records may come before or after the Link Pass Separator:
 LIDATA, LEDATA, or COMDAT records followed by applicable FIXUPP records
 FIXUPP records containing only THREAD subrecords
 BAKPAT and NBKPAT FIXUPP records
 COMENT class A0, subrecord type 03 (INCDEF) records containing incremental compilation information for FIXUPP and LINNUM records
 LINNUM and LINSYM records providing line number and program code or data association

Terminator :
MODEND record indicating end of module with optional start address

Details of each record (form and content) follow below. Conflicts between various OMFs that overlap in their use of record types or fields are marked. Below is a combined list of record types defined by the Intel 8086 OMF specification and record types added after that specification was finished. Titles in square brackets ([]) indicate record types that have been implemented and that are described in this document. Titles not in square brackets indicate record types that have not been implemented and are followed by a paragraph of description from the Intel specification. For unimplemented record types, a subtle distinction is made between records that LINK ignores and those for which LINK generates an "illegal object format" error condition.

Records Currently Defined

6EH RHEADR R-Module Header Record
    This record serves to identify a module that has been processed (output) by LINK-86/LOCATE-86. It also specifies the module attributes and gives information on memory usage and need. This record type is ignored by Microsoft LINK.
70H REGINT Register Initialization Record
    This record provides information about the 8086 register/register-pairs: CS and IP, SS and SP, DS and ES. The purpose of this information is for a loader to set the necessary registers for initiation of execution. This record type is ignored by Microsoft LINK.
72H REDATA Relocatable Enumerated Data Record
    This record provides contiguous data from which a portion of an 8086 memory image may eventually be constructed. The data may be loaded directly by an 8086 loader, with perhaps some base fixups. The record may also be called a Load-Time Locatable (LTL) Enumerated Data Record. This record type is ignored by Microsoft LINK.
74H RIDATA Relocatable Iterated Data Record
    This record provides contiguous data from which a portion of an 8086 memory image may eventually be constructed. The data may be loaded directly by an 8086 loader, but data bytes within the record may require expansion. The record may also be called a Load-Time Locatable (LTL) Iterated Data Record. This record type is ignored by Microsoft LINK.
76H OVLDEF Overlay Definition Record
    This record provides the overlay's name, its location in the object file, and its attributes. A loader may use this record to locate the data records of the overlay in the object file. This record type is ignored by Microsoft LINK.
78H ENDREC End Record
    This record is used to denote the end of a set of records, such as a block or an overlay. This record type is ignored by Microsoft LINK.
7AH BLKDEF Block Definition Record
    This record provides information about blocks that were defined in the source program input to the translator that produced the module. A BLKDEF record will be generated for every procedure and for every block that contains variables. This information is used to aid debugging programs. This record type is ignored by Microsoft LINK.
7CH BLKEND Block End Record
    This record, together with the BLKDEF record, provides information about the scope of variables in the source program. Each BLKDEF record must be followed by a BLKEND record. The order of the BLKDEF, debug symbol records, and BLKEND records should reflect the order of declaration in the source module. This record type is ignored by Microsoft LINK.
7EH DEBSYM Debug Symbols Record
    This record provides information about all local symbols, including stack and based symbols. The purpose of this information is to aid debugging programs. This record type is ignored by Microsoft LINK.
[80H] [THEADR] [Translator Header Record]
[82H] [LHEADR] [Library Module Header Record]
84H PEDATA Physical Enumerated Data Record
    This record provides contiguous data, from which a portion of an 8086 memory image may be constructed. The data belongs to the "unnamed absolute segment" in that it has been assigned absolute 8086 memory addresses and has been divorced from all logical segment information. This record type is ignored by Microsoft LINK.
86H PIDATA Physical Iterated Data Record
    This record provides contiguous data, from which a portion of an 8086 memory image may be constructed. It allows initialization of data segments and provides a mechanism to reduce the size of object modules when there is repeated data to be used to initialize a memory image. The data belongs to the "unnamed absolute segment." This record type is ignored by Microsoft LINK.
[88H] [COMENT] [Comment Record]
[8AH/8BH] [MODEND] [Module End Record]
[8CH] [EXTDEF] [External Names Definition Record]
[8EH] [TYPDEF] [Type Definition Record]
[90H/91H] [PUBDEF] [Public Names Definition Record]
92H LOCSYM Local Symbols Record
    This record provides information about symbols that were used in the source program input to the translator that produced the module. This information is used to aid debugging programs. This record has a format identical to the PUBDEF record. This record type is ignored by Microsoft LINK.
[94H/95H] [LINNUM] [Line Numbers Record]
[96H] [LNAMES] [List of Names Record]
[98H/99H] [SEGDEF] [Segment Definition Record]
[9AH] [GRPDEF] [Group Definition Record]
[9CH/9DH] [FIXUPP] [Fixup Record]
9EH (none) Unnamed record
    This record number was the only even number not defined by the original Intel specification. Apparently it was never used.
    This record type is ignored by Microsoft LINK.
[A0H/A1H] [LEDATA] [Logical Enumerated Data Record]
[A2H/A3H] [LIDATA] [Logical Iterated Data Record]
A4H LIBHED Library Header Record
    This record is the first record in a library file. It immediately precedes the modules (if any) in the library. Following the modules are three more records in the following order: LIBNAM, LIBLOC, and LIBDIC. This record type is ignored by Microsoft LINK.
A6H LIBNAM Library Module Names Record
    This record lists the names of all the modules in the library. The names are listed in the same sequence as the modules appear in the library. This record type is ignored by Microsoft LINK.
A8H LIBLOC Library Module Locations Record
    This record provides the relative location, within the library file, of the first byte of the first record (either a THEADR or LHEADR or RHEADR record) of each module in the library. The order of the locations corresponds to the order of the modules in the library. This record type is ignored by Microsoft LINK.
AAH LIBDIC Library Dictionary Record
    This record gives all the names of public symbols within the library. The public names are separated into groups; all names in the nth group are defined in the nth module of the library. This record type is ignored by Microsoft LINK.
[B0H] [COMDEF] [Communal Names Definition Record]
[B2H/B3H] [BAKPAT] [Backpatch Record]
[B4H] [LEXTDEF] [Local External Names Definition Record]
[B6H/B7H] [LPUBDEF] [Local Public Names Definition Record]
[B8H] [LCOMDEF] [Local Communal Names Definition Record]
BAH/BBH COMFIX Communal Fixup Record
    Microsoft doesn't support this never-implemented IBM extension. This record type generates an error when it is encountered by Microsoft LINK.
BCH CEXTDEF COMDAT External Names Definition Record
C0H SELDEF Selector Definition Record
    Microsoft doesn't support this never-implemented IBM extension. This record type generates an error when it is encountered by Microsoft LINK.
[C2H/C3H] [COMDAT] [Initialized Communal Data Record]
[C4H/C5H] [LINSYM] [Symbol Line Numbers Record]
[C6H] [ALIAS] [Alias Definition Record]
[C8H/C9H] [NBKPAT] [Named Backpatch Record]
[CAH] [LLNAMES] [Local Logical Names Definition Record]
[F0H] [Library Header Record]
    Although this is not actually an OMF record type, the presence of a record with F0H as the first byte indicates that the module is a Microsoft library. The format of a library file is given in Appendix 2.
[F1H] [Library End Record]

80H THEADR--TRANSLATOR HEADER RECORD
The THEADR record contains the name of the object module. This name identifies an object module within an object library or in messages produced by the linker.
OFFSET              Count TYPE   Description
0000h                   1 byte   ID=80h
0001h                   1 byte   Record length
                                 ="LEN"
0002h               "LEN" char   Name
0002h                   1 byte   Checksum
 +"LEN"

82H LHEADR--LIBRARY MODULE HEADER RECORD
This record is very similar to the THEADR record. It is used to indicate the name of a module within a library file (which has an internal organization different from that of an object module). This record type was defined in the original Intel specification with the same format but with a different purpose, so its use for libraries should be considered a Microsoft extension.
OFFSET              Count TYPE   Description
0000h                   1 byte   ID=82h
0001h                   1 byte   Record length
                                 ="LEN"
0002h               "LEN" char   Name
0002h                   1 byte   Checksum
 +"LEN"
EXTENSION:OBJ,OBP,OBW,LIB
OCCURENCES:PC
PROGRAMS:MS Link, TLink, OBJDUMP
REFERENCE:****
--------H-OS/2 HELP-------------------------
The OS/2 help files are different from the WinHelp help files, since the WinHelp format is proprietary to Microsoft because of the patented LZ-packing they implemented.
OFFSET              Count TYPE   Description
0000h                   3 char   ID='HSP'
0003h                   1 byte   Flags :
                                   0 - INF style file
                                 1-3 - unknown
                                   4 - HLP style file
                                 Patching this file allows reading HLP files
                                 using the VIEW command, while HLP files seem
                                 to work with INF settings as well.
0005h                   1 word   Total size of header
0007h                   1 word   Unknown
????h                            other data
0047h                   ? char   ASCIIZ name of the HLP/INF file
EXTENSION:HLP,INF
OCCURENCES:OS/2
REFERENCE:INF02A.DOC
SEE ALSO:WinHelp HLP
--------X-PARADOX DATAFILES-?---------------
The data files for the Paradox database engine have the following format :
OFFSET              Count TYPE   Description
0000h                   1 byte   Number of bytes per record
0001h                  32 byte   ????
0021h                   1 byte   Number of fields per record
0022h                   1 byte   ?Password protected? / other flags ?
                                 - if password protected, 32 more bytes seem
                                 to be inserted.
0023h                  ?? byte   ?????
0058h                   ? rec
                        1 byte   Field type ?
                                 1 - character field
                                 5 - currency?
                                 6 - integer
                        1 byte   Field length
After that, my information becomes really blurry :-I
There seems to follow the name of the file, and some 0-filled areas, and after that the "first ASCII character after 0C0h" is said to be the start of the field names. Each field name is in ASCIIZ. The actual records start after the field names, either at the 4th byte after 00h 02h (the sequence ending the field names section) or after 00h 02h 00h 00h 00h.
EXTENSION:???
OCCURENCES:PC
PROGRAMS:Paradox engine
SEE ALSO:
--------I-PBM-G-----------------------------
The PBM files are image files, which were used at least by DMGraph, a utility to insert new graphics into a DOOM WAD file. The image dimensions seem to be stored in ASCII format delimited with CR/LF; after that follows the raw binary image data.
OFFSET              Count TYPE   Description
0000h                   1 char   ID='P'
0001h                   1 char   Bitmap type :
                                 '1' - PBM bitmap
                                 '2' - PGM greymap
                                 '3' - PPM pixmap
                                 '4' - PBM raw bitmap
                                 '5' - PGM raw greymap
                                 '6' - PPM raw pixmap
EXTENSION:PBM,PGM,PPM
OCCURENCES:PC
PROGRAMS:DMGraph.EXE
--------I-PCX-------------------------------
The PCX files are created by the programs of the ZSoft Paintbrush family and the FRIEZE package by the same manufacturer. A PCX file contains only one image, the data for this image and possibly palette information for this image.
The encoding scheme used for PCX encoding is a simple RLE mechanism, see ALGRTHMS.txt for further information. A PCX image is stored from the upper scan line to the lower scan line. The size of a decoded scan line is always an even number, thus one additional byte should always be allocated for the decoding buffer. The header has a fixed size of 128 bytes and looks like this :
OFFSET              Count TYPE   Description
0000h                   1 byte   Manufacturer.
                                 10=ZSoft
0001h                   1 byte   Version information
                                 0=PC Paintbrush v2.5
                                 2=PC Paintbrush v2.8 w palette information
                                 3=PC Paintbrush v2.8 w/o palette information
                                 4=PC Paintbrush/Windows
                                 5=PC Paintbrush v3.0+
0002h                   1 byte   Encoding scheme, 1 = RLE, none other known
0003h                   1 byte   Bits per pixel
0004h                   1 word   left margin of image
0006h                   1 word   upper margin of image
0008h                   1 word   right margin of image
000Ah                   1 word   lower margin of image
000Ch                   1 word   Horizontal DPI resolution
000Eh                   1 word   Vertical DPI resolution
0010h                  48 byte   Color palette setting for 16-color images
                                 16 RGB triplets
0040h                   1 byte   reserved
0041h                   1 byte   Number of color planes
                                 ="NCP"
0042h                   1 word   Number of bytes per scanline (always even,
                                 use instead of right margin-left margin).
                                 ="NBS"
0044h                   1 word   Palette information
                                 1=color/bw palette
                                 2=grayscale image
0046h                   1 word   Horizontal screen size
0048h                   1 word   Vertical screen size
004Ah                  54 byte   reserved, set to 0
The space needed to decode a single scan line is "NCP"*"NBS" bytes, the last byte may be a junk byte which is not displayed. After the image data, if the version number is 5 (or greater?) there possibly is a VGA color palette. The color ranges from 0 to 255, 0 is zero intensity, 255 is full intensity. The palette has the following format :
OFFSET              Count TYPE   Description
0000h                   1 byte   VGA palette ID (=0Ch)
0001h                 768 byte   RGB triplets with palette information
EXTENSION:PCX
OCCURENCES:PC
PROGRAMS:PC Paintbrush
SEE ALSO:
--------I-PIC-------------------------------
PIC files contain images in an uncompressed format.
Both the original Animator and Animator Pro from Autodesk produce PIC files. The file formats are different; Animator Pro produces a hierarchical block oriented file, while the original Animator file is a simpler fixed format. See PIC(Pro) for further information on the Animator Pro PIC format. The original Animator uses this format to store a single-frame picture image. This format description applies to both PIC and original Animator CEL files. The file begins with a 32 byte header, as follows:
OFFSET              Count TYPE   Description
0000h                   1 word   ID=9119h
0002h                   1 word   Width of image; PIC files always have a width
                                 of 320, CEL images may have any value.
0004h                   1 word   Height of image, 200 for a PIC, any value for
                                 a CEL file.
0006h                   1 word   X offset of image, always 0 for a PIC image,
                                 may be nonzero in a CEL image.
0008h                   1 word   Y offset of image. Zero for a PIC file.
000Ah                   1 byte   Bits per pixel (8)
000Bh                   1 byte   Compression flag, always zero
000Ch                   1 dword  Size of the image data in bytes
0010h                  16 byte   reserved(0)
Immediately following the header is the color map. It contains all 256 palette entries in rgb order. Each of the r, g, and b components is a single byte in the range of 0-63. Following the color palette is the image data, one byte per pixel, from left to right, top to bottom.
EXTENSION:PIC,CEL
OCCURENCES:PC
PROGRAMS:Autodesk Animator
SEE ALSO:CEL,FLIc,PIC(PRO)
--------I-PIC(PRO)--------------------------
This format description applies to both PIC and MSK files created with the Autodesk Animator Pro package. The file begins with a 64-byte header defined as follows:
OFFSET              Count TYPE   Description
0000h                   1 dword  The size of the whole file including the size
                                 of this header.
0004h                   1 word   ID=9500h
0006h                   1 word   Width of the image
0008h                   1 word   Height of the image
000Ah                   1 word   X offset of image
000Ch                   1 word   Y offset of image
000Eh                   1 dword  User ID, set to zero
0012h                   1 byte   Bits per pixel (8 for PIC, 1 for MSK)
0013h                  45 byte   reserved (0)

Following the file header are the data blocks for the image. Each data block within a PIC or MSK file is formatted as follows:
OFFSET              Count TYPE   Description
0000h                   1 dword  The size of the block, including this header.
0004h                   1 word   Data type ID :
                                  0 - Color palette info
                                  1 - Byte-per-pixel image data
                                  2 - Bit-per-pixel mask data
0006h                   ? byte   Data

The type values in the block headers indicate what type of graphics data the block contains.

In a PIC_CMAP block (color palette info), the first 2-byte word is a version code; currently this is set to zero. Following the version word are all 256 palette entries in rgb order. Each of the r, g, and b components is a single byte in the range of 0-255. This type of block appears in PIC files; there will generally be no color map block in a MSK file.

In a PIC_BYTEPIXELS block (byte-per-pixel image data), the image data appears immediately following the 6-byte block header. The data is stored as one byte per pixel, in left-to-right, top-to-bottom sequence.

In a PIC_BITPIXELS block (bit-per-pixel mask data), the bitmap data appears immediately following the 6-byte block header. The data is stored as bits packed into bytes such that the leftmost bits appear in the high-order positions of each byte. The bits are stored in left-to-right, top-to-bottom sequence. When the width of the bitmap is not a multiple of 8, there will be unused bits in the low order positions of the last byte on each line. The number of bytes per line is ((width+7)/8), rounded down. This type of block appears in MSK files.

EXTENSION:PIC,MSK
OCCURENCES:PC
PROGRAMS:Autodesk Animator Pro
REFERENCE:
SEE ALSO:PIC,FLT
--------E-PIF-------------------------------
The Program Information Files have stayed a long time with the PC.
They originated from IBM's Topview, were carried on by DoubleView and DesqView, and today they are used by Windows and Windows NT. The PIF files store additional information about executables that are foreign to the running multitasking system, such as resource usage, keyboard and mouse virtualization and hotkeys.
The original (Topview) PIF had a size of 171h bytes; after that come the various extensions for the different operating environments. The different extensions are discussed in their own sections.
OFFSET              Count TYPE   Description
0000h                   1 byte   reserved
0001h                   1 byte   Checksum
0002h                  30 char   Title for the window
0020h                   1 word   Maximum memory reserved for program
0022h                   1 word   Minimum memory reserved for program
0024h                  63 char   Path and filename of the program
0063h                   1 byte   0 - Do not close window on exit
                                 other - Close window on exit
0064h                   1 byte   Default drive (0=A: ??)
0065h                  64 char   Default startup directory
00A5h                  64 char   Parameters for program
00E5h                   1 byte   Initial screen mode, 0 equals mode 3 ?
00E6h                   1 byte   Text pages to reserve for program
00E7h                   1 byte   First interrupt used by program
00E8h                   1 byte   Last interrupt used by program
00E9h                   1 byte   Rows on screen
00EAh                   1 byte   Columns on screen
00EBh                   1 byte   X position of window
00ECh                   1 byte   Y position of window
00EDh                   1 word   System memory ?? whatever
00EFh                  64 char   ?? Shared program path
012Fh                  64 char   ?? Shared program data file
016Fh                   1 word   Program flags

EXTENSION:PIF,DVP
OCCURENCES:PC
PROGRAMS:Topview, DesqView, Windows
REFERENCE:see DDJ #202, July 1993, QuarterDeck SDK
SEE ALSO:Windows PIF, Windows NT PIF
VALIDATION:
--------I-PLY-------------------------------
The PoLYgon files created by the Autodesk Animator packages contain a set of points that describe a polygon.
OFFSET              Count TYPE   Description
0000h                   1 word   Number of points in the file
0002h                   1 dword  reserved (0)
0006h                   1 byte   Closed shape flag. If nonzero there is an
                                 implied connection between the last and the
                                 first point. If it is zero, the shape is open.
0007h                   1 byte   ID=99h

After the header, there follows the point data, organized in records like this :
OFFSET              Count TYPE   Description
0000h                   1 word   X coordinate
0002h                   1 word   Y coordinate
0004h                   1 word   Z coordinate, always zero

EXTENSION:PLY
OCCURENCES:PC
PROGRAMS:Autodesk Animator
--------I-PNG-M-----------------------------
"excerpted from the PNG (Portable Network Graphics) specification, tenth draft."

The PNG format (pronounced PiNG) was the replacement the Internet found, after the GIF format/CompuServe/LZW compression-patent stuff. PNG is a lossless image-compression format, which allows a large range of applications. The PNG format is in the public domain, the latest versions of the standard and related information can always be found at the PNG FTP archive site. The maintainers of the PNG specification can be contacted by e-mail at [email protected].

The PNG format uses Motorola byte order; scanlines always begin on byte boundaries. When pixels are less than 8 bits deep, if the scanline width is not evenly divisible by the number of pixels per byte then the low-order bits in the last byte of each scanline are wasted. The contents of the padding bits added to fill out the last byte of a scanline are unspecified.

An additional "filter" byte is added to the beginning of every scanline, as described in detail below. The filter byte is not considered part of the image data, but it is included in the data stream sent to the compression step.

PNG allows the image data to be filtered before it is compressed. The purpose of filtering is to improve the compressibility of the data. The filter step itself does not reduce the size of the data. All PNG filters are strictly lossless.

PNG defines several different filter algorithms, including "none" which indicates no filtering. The filter algorithm is specified for each scanline by a filter type byte which precedes the filtered scanline in the precompression data stream.
An intelligent encoder may switch filters from one scanline to the next. The method for choosing which filter to employ is up to the encoder.

A PNG image can be stored in interlaced order to allow progressive display. The purpose of this feature is to allow images to "fade in" when they are being displayed on-the-fly. Interlacing slightly expands the file size on average, but it gives the user a meaningful display much more rapidly. Note that decoders are required to be able to read interlaced images, whether or not they actually perform progressive display.

With interlace type 0, pixels are stored sequentially from left to right, and scanlines sequentially from top to bottom (no interlacing).

Interlace type 1, known as Adam7 after its author, Adam M. Costello, consists of seven distinct passes over the image. Each pass transmits a subset of the pixels in the image. The pass in which each pixel is transmitted is defined by replicating the following 8-by-8 pattern over the entire image, starting at the upper left corner:

   1 6 4 6 2 6 4 6
   7 7 7 7 7 7 7 7
   5 6 5 6 5 6 5 6
   7 7 7 7 7 7 7 7
   3 6 4 6 3 6 4 6
   7 7 7 7 7 7 7 7
   5 6 5 6 5 6 5 6
   7 7 7 7 7 7 7 7

Within each pass, the selected pixels are transmitted left to right within a scanline, and selected scanlines sequentially from top to bottom. For example, pass 2 contains pixels 4, 12, 20, etc. of scanlines 0, 8, 16, etc. (numbering from 0,0 at the upper left corner). The last pass contains the entirety of scanlines 1, 3, 5, etc.

The data within each pass is laid out as though it were a complete image of the appropriate dimensions. For example, if the complete image is 8x8 pixels, then pass 3 will contain a single scanline containing two pixels. When pixels are less than 8 bits deep, each such scanline is padded to fill an integral number of bytes (see Image layout). Filtering is done on this reduced image in the usual way, and a filter type byte is transmitted before each of its scanlines (see Filter Algorithms).
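The pass geometry described above can be sketched in Python. This is only an illustration: the table `ADAM7` and the function name `adam7_pass_sizes` are not part of the specification text; the start offsets and step sizes were read off the 8-by-8 pattern by hand.

```python
# (x_start, y_start, x_step, y_step) for passes 1..7,
# read off the 8-by-8 Adam7 pattern (assumed helper table, not from the spec text).
ADAM7 = [
    (0, 0, 8, 8),  # pass 1
    (4, 0, 8, 8),  # pass 2
    (0, 4, 4, 8),  # pass 3
    (2, 0, 4, 4),  # pass 4
    (0, 2, 2, 4),  # pass 5
    (1, 0, 2, 2),  # pass 6
    (0, 1, 1, 2),  # pass 7
]


def adam7_pass_sizes(width, height):
    """Return a list of (pass_width, pass_height), one entry per pass."""
    sizes = []
    for x0, y0, dx, dy in ADAM7:
        # Ceiling division: how many of x0, x0+dx, x0+2*dx, ... are < width.
        pw = max(0, (width - x0 + dx - 1) // dx)
        ph = max(0, (height - y0 + dy - 1) // dy)
        # A pass without columns or without rows carries no data at all.
        if pw == 0 or ph == 0:
            pw = ph = 0
        sizes.append((pw, ph))
    return sizes
```

For an 8x8 image the third entry is (2, 1), which matches the "single scanline containing two pixels" example; for a 1x1 image only the first pass is nonempty.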
Notice that the transmission order is defined so that all the scanlines transmitted in a pass will have the same number of pixels; this is necessary for proper application of some of the filters.

Caution: If the image contains fewer than five columns or fewer than five rows, some passes will be entirely empty. Encoder and decoder authors must be careful to handle this case correctly. In particular, filter bytes are only associated with nonempty scanlines; no filter bytes are present in an empty pass.

A PNG file consists of a PNG signature followed by a series of chunks. This chapter defines the signature and the basic properties of chunks. Individual chunk types are discussed in the next chapter.

PNG Header
OFFSET              Count TYPE   Description
0000h                   8 char   ID=89h,'PNG',13,10,26,10

Chunk layout
OFFSET              Count TYPE   Description
0000h                   1 dword  Number of data bytes after this header.
0004h                   4 char   Chunk type.
                                 A 4-byte chunk type code. For convenience in
                                 description and in examining PNG files, type
                                 codes are restricted to consist of uppercase
                                 and lowercase ASCII letters (A-Z, a-z).
                                 However, encoders and decoders should treat
                                 the codes as fixed binary values, not
                                 character strings. For example, it would not
                                 be correct to represent the type code IDAT by
                                 the EBCDIC equivalents of those letters.
????h                   ? byte   Data
????h                   1 dword  CRC calculated on the preceding bytes in that
                                 chunk, including the chunk type code and chunk
                                 data fields, but not including the length
                                 field. The CRC is always present, even for
                                 empty chunks such as IEND. The CRC algorithm
                                 is specified below.

Chunk naming conventions
========================

Chunk type codes are assigned in such a way that a decoder can determine some properties of a chunk even if it does not recognize the type code. These rules are intended to allow safe, flexible extension of the PNG format, by allowing a decoder to decide what to do when it encounters an unknown chunk.
The naming rules are not normally of interest when a decoder does recognize the chunk's type. Four bits of the type code, namely bit 5 (value 32) of each byte, are used to convey chunk properties. This choice means that a human can read off the assigned properties according to whether each letter of the type code is uppercase (bit 5 is 0) or lowercase (bit 5 is 1). However, decoders should test the properties of an unknown chunk by numerically testing the specified bits; testing whether a character is uppercase or lowercase is inefficient, and even incorrect if a locale-specific case definition is used. It is also worth noting that the property bits are an inherent part of the chunk name, and hence are fixed for any chunk type. Thus, TEXT and Text are completely unrelated chunk type codes. Decoders should recognize codes by simple four-byte literal comparison; it is incorrect to perform case conversion on type codes. The semantics of the property bits are: First Byte: 0 (uppercase) = critical, 1 (lowercase) = ancillary Chunks which are not strictly necessary in order to meaningfully display the contents of the file are known as "ancillary" chunks. Decoders encountering an unknown chunk in which the ancillary bit is 1 may safely ignore the chunk and proceed to display the image. The time chunk (tIME) is an example of an ancillary chunk. Chunks which are critical to the successful display of the file's contents are called "critical" chunks. Decoders encountering an unknown chunk in which the ancillary bit is 0 must indicate to the user that the image contains information they cannot safely interpret. The image header chunk (IHDR) is an example of a critical chunk. Second Byte: 0 (uppercase) = public, 1 (lowercase) = private If the chunk is public (part of this specification or a later edition of this specification), its second letter is uppercase. 
If your application requires proprietary chunks, and you have no interest in seeing the software of other vendors recognize them, use a lowercase second letter in the chunk name. Such names will never be assigned in the official specification. Note that there is no need for software to test this property bit; it simply ensures that private and public chunk names will not conflict. Third Byte: reserved, must be 0 (uppercase) always The significance of the case of the third letter of the chunk name is reserved for possible future expansion. At the present time all chunk names must have uppercase third letters. Fourth Byte: 0 (uppercase) = unsafe to copy, 1 (lowercase) = safe to copy This property bit is not of interest to pure decoders, but it is needed by PNG editors (programs that modify a PNG file). If a chunk's safe-to-copy bit is 1, the chunk may be copied to a modified PNG file whether or not the software recognizes the chunk type, and regardless of the extent of the file modifications. If a chunk's safe-to-copy bit is 0, it indicates that the chunk depends on the image data. If the program has made any changes to critical chunks, including addition, modification, deletion, or reordering of critical chunks, then unrecognized unsafe chunks must not be copied to the output PNG file. (Of course, if the program does recognize the chunk, it may choose to output an appropriately modified version.) A PNG editor is always allowed to copy all unrecognized chunks if it has only added, deleted, or modified ancillary chunks. This implies that it is not permissible to make ancillary chunks that depend on other ancillary chunks. PNG editors that do not recognize a critical chunk must report an error and refuse to process that PNG file at all. The safe/unsafe mechanism is intended for use with ancillary chunks. The safe-to-copy bit will always be 0 for critical chunks. 
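The four property bits described above can be tested numerically, as the text recommends, without any case conversion. The following Python sketch shows one way to do it; the function name `chunk_properties` and the dictionary keys are illustrative choices, not names from the specification.

```python
def chunk_properties(type_code: bytes) -> dict:
    """Decode the property bits (bit 5, value 32) of a 4-byte chunk type code."""
    if len(type_code) != 4:
        raise ValueError("chunk type code must be exactly 4 bytes")
    return {
        # First byte: 0 = critical, 1 = ancillary
        "ancillary": bool(type_code[0] & 0x20),
        # Second byte: 0 = public, 1 = private
        "private": bool(type_code[1] & 0x20),
        # Third byte: reserved, must currently be 0
        "reserved": bool(type_code[2] & 0x20),
        # Fourth byte: 0 = unsafe to copy, 1 = safe to copy
        "safe_to_copy": bool(type_code[3] & 0x20),
    }
```

With this helper, `chunk_properties(b"IHDR")` reports a critical, public chunk, while `chunk_properties(b"tIME")` reports an ancillary, public chunk that is unsafe to copy.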
For example, the hypothetical chunk type name "bLOb" has the property bits:

   bLOb  <-- 32 bit Chunk Name represented in ASCII form
   ||||
   |||'- Safe to copy bit is 1 (lower case letter; bit 5 of byte is 1)
   ||'-- Reserved bit is 0     (upper case letter; bit 5 of byte is 0)
   |'--- Private bit is 0      (upper case letter; bit 5 of byte is 0)
   '---- Ancillary bit is 1    (lower case letter; bit 5 of byte is 1)

Therefore, this name represents an ancillary, public, safe-to-copy chunk.

See Rationale: Chunk naming conventions.

CRC algorithm
=============

Chunk CRCs are calculated using standard CRC methods with pre and post conditioning. The CRC polynomial employed is as follows:

   x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1

The 32-bit CRC register is initialized to all 1's, and then the data from each byte is processed from the least significant bit (1) to the most significant bit (128). After all the data bytes are processed, the CRC register is inverted (its ones complement is taken). This value is transmitted (stored in the file) MSB first. For the purpose of separating into bytes and ordering, the least significant bit of the 32-bit CRC is defined to be the coefficient of the x^31 term.

Practical calculation of the CRC always employs a precalculated table to greatly accelerate the computation. See Appendix: Sample CRC Code.

4. Chunk Specifications
=======================

This chapter defines the standard types of PNG chunks.

Critical Chunks
===============

All implementations must understand and successfully render the standard critical chunks. A valid PNG image must contain an IHDR chunk, one or more IDAT chunks, and an IEND chunk.

IHDR Image Header

This chunk must appear FIRST. Its contents are:

   Width:            4 bytes
   Height:           4 bytes
   Bit depth:        1 byte
   Color type:       1 byte
   Compression type: 1 byte
   Filter type:      1 byte
   Interlace type:   1 byte

Width and height give the image dimensions in pixels. They are 4-byte integers. Zero is an invalid value.
The maximum for each is (2^31)-1 in order to accommodate languages which have difficulty with unsigned 4-byte values.

Bit depth is a single-byte integer giving the number of bits per pixel (for palette images) or per sample (for grayscale and truecolor images). Valid values are 1, 2, 4, 8, and 16, although not all values are allowed for all color types.

Color type is a single-byte integer that describes the interpretation of the image data. Color type values represent sums of the following values: 1 (palette used), 2 (color used), and 4 (full alpha used). Valid values are 0, 2, 3, 4, and 6.

Bit depth restrictions for each color type are imposed both to simplify implementations and to prohibit certain combinations that do not compress well in practice. Decoders must support all legal combinations of bit depth and color type. (Note that bit depths of 16 are easily supported on 8-bit display hardware by dropping the least significant byte.) The allowed combinations are:

   Color    Allowed        Interpretation
   Type     Bit Depths

   0        1,2,4,8,16     Each pixel value is a grayscale level.
   2        8,16           Each pixel value is an R,G,B series.
   3        1,2,4,8        Each pixel value is a palette index;
                           a PLTE chunk must appear.
   4        8,16           Each pixel value is a grayscale level,
                           followed by an alpha channel level.
   6        8,16           Each pixel value is an R,G,B series,
                           followed by an alpha channel level.

Compression type is a single-byte integer that indicates the method used to compress the image data. At present, only compression type 0 (deflate/inflate compression with a 32K sliding window) is defined. All standard PNG images must be compressed with this scheme. The compression type code is provided for possible future expansion or proprietary variants. Decoders must check this byte and report an error if it holds an unrecognized code. See Deflate/Inflate Compression for details.

Filter type is a single-byte integer that indicates the preprocessing method applied to the image data before compression.
At present, only filter type 0 (adaptive filtering with five basic filter types) is defined. As with the compression type code, decoders must check this byte and report an error if it holds an unrecognized code. See Filter Algorithms for details. Interlace type is a single-byte integer that indicates the transmission order of the pixel data. Two values are currently defined: 0 (no interlace) or 1 (Adam7 interlace). See Interlaced data order for details. PLTE Palette This chunk's contents are from 1 to 256 palette entries, each a three-byte series of the form: red: 1 byte (0 = black, 255 = red) green: 1 byte (0 = black, 255 = green) blue: 1 byte (0 = black, 255 = blue) The number of entries is determined from the chunk length. A chunk length not divisible by 3 is an error. This chunk must appear for color type 3, and may appear for color types 2 and 6. If this chunk does appear, it must precede the first IDAT chunk. There cannot be more than one PLTE chunk. For color type 3 (palette data), the PLTE chunk is required. The first entry in PLTE is referenced by pixel value 0, the second by pixel value 1, etc. The number of palette entries must not exceed the range that can be represented by the bit depth (for example, 2^4 = 16 for a bit depth of 4). It is permissible to have fewer entries than the bit depth would allow. In that case, any out-of-range pixel value found in the image data is an error. For color types 2 and 6 (truecolor), the PLTE chunk is optional. If present, it provides a recommended set of from 1 to 256 colors to which the truecolor image may be quantized if the viewer cannot display truecolor directly. If PLTE is not present, such a viewer must select colors on its own, but it is often preferable for this to be done once by the encoder. Note that the palette uses 8 bits (1 byte) per value regardless of the image bit depth specification. In particular, the palette is 8 bits deep even when it is a suggested quantization of a 16-bit truecolor image. 
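The IHDR field checks and the PLTE size rules described above can be sketched as follows. This is a minimal illustration, not a reference decoder: the names `ALLOWED_DEPTHS`, `parse_ihdr` and `check_plte` are made up for this example, and the functions expect only the chunk data bytes (without length, type code or CRC).

```python
import struct

# Allowed bit depths per color type, as listed in the IHDR description.
ALLOWED_DEPTHS = {
    0: (1, 2, 4, 8, 16),
    2: (8, 16),
    3: (1, 2, 4, 8),
    4: (8, 16),
    6: (8, 16),
}


def parse_ihdr(data: bytes) -> dict:
    """Unpack the 13 IHDR data bytes and apply the checks named in the text."""
    if len(data) != 13:
        raise ValueError("IHDR data must be exactly 13 bytes")
    # PNG uses Motorola (big-endian) byte order, hence the ">" prefix.
    width, height, depth, color, comp, filt, lace = struct.unpack(">IIBBBBB", data)
    if not (0 < width <= 2**31 - 1 and 0 < height <= 2**31 - 1):
        raise ValueError("width and height must be in 1 .. (2^31)-1")
    if depth not in ALLOWED_DEPTHS.get(color, ()):
        raise ValueError("illegal color type / bit depth combination")
    if comp != 0 or filt != 0 or lace not in (0, 1):
        raise ValueError("unrecognized compression, filter or interlace type")
    return {"width": width, "height": height, "bit_depth": depth,
            "color_type": color, "interlace": lace}


def check_plte(ihdr: dict, plte: bytes) -> int:
    """Return the number of palette entries, or raise ValueError."""
    if len(plte) % 3 != 0:
        raise ValueError("PLTE length is not divisible by 3")
    entries = len(plte) // 3
    if not 1 <= entries <= 256:
        raise ValueError("PLTE must hold 1 to 256 entries")
    # For palette images the entry count is limited by the bit depth.
    if ihdr["color_type"] == 3 and entries > (1 << ihdr["bit_depth"]):
        raise ValueError("more palette entries than the bit depth can index")
    return entries
```

A 4-bit palette image may therefore carry at most 16 entries, while a truecolor image may carry a suggested palette of up to 256 entries.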
IDAT Image Data This chunk contains the actual image data. To create this data, begin with image scanlines represented as described under Image layout; the layout and total size of this raw data are determinable from the IHDR fields. Then filter the image data according to the filtering method specified by the IHDR chunk. (Note that with filter method 0, the only one currently defined, this implies prepending a filter type byte to each scanline.) Finally, compress the filtered data using the compression method specified by the IHDR chunk. The IDAT chunk contains the output datastream of the compression algorithm. To read the image data, reverse this process. There may be multiple IDAT chunks; if so, they must appear consecutively with no other intervening chunks. The compressed datastream is then the concatenation of the contents of all the IDAT chunks. The encoder may divide the compressed data stream into IDAT chunks as it wishes. (Multiple IDAT chunks are allowed so that encoders can work in a fixed amount of memory; typically the chunk size will correspond to the encoder's buffer size.) It is important to emphasize that IDAT chunk boundaries have no semantic significance and can appear at any point in the compressed datastream. A PNG file in which each IDAT chunk contains only one data byte is legal, though remarkably wasteful of space. (For that matter, zero-length IDAT chunks are legal, though even more wasteful.) See Filter Algorithms and Deflate/Inflate Compression for details. IEND Image Trailer This chunk must appear LAST. It marks the end of the PNG data stream. The chunk's data field is empty. Ancillary Chunks ================ All ancillary chunks are optional, in the sense that encoders need not write them and decoders may ignore them. However, encoders are encouraged to write the standard ancillary chunks when the information is available, and decoders are encouraged to interpret these chunks when appropriate and feasible. 
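As a sketch of the chunk layout and the IDAT rules given earlier, the following Python code walks the chunks of a PNG file held in memory, verifies each CRC and concatenates the IDAT contents. The function names `iter_chunks` and `collect_idat` are illustrative. The code relies on `zlib.crc32` from the Python standard library, which implements a CRC-32 with the same polynomial and the same pre and post conditioning as described under "CRC algorithm".

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # 89h,'PNG',13,10,26,10


def iter_chunks(data: bytes):
    """Yield (type_code, chunk_data) for every chunk up to and including IEND."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    pos = 8
    while pos < len(data):
        if pos + 8 > len(data):
            raise ValueError("truncated chunk header")
        # Length and type code; the length counts the data bytes only.
        length, type_code = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length
        if end + 4 > len(data):
            raise ValueError("truncated chunk")
        body = data[pos + 8:end]
        (stored_crc,) = struct.unpack(">I", data[end:end + 4])
        # The CRC covers the type code and the data, but not the length field.
        if zlib.crc32(type_code + body) & 0xFFFFFFFF != stored_crc:
            raise ValueError("CRC mismatch in chunk %r" % type_code)
        yield type_code, body
        pos = end + 4
        if type_code == b"IEND":
            break


def collect_idat(data: bytes) -> bytes:
    """Concatenate the contents of all IDAT chunks into one compressed stream."""
    return b"".join(body for type_code, body in iter_chunks(data)
                    if type_code == b"IDAT")
```

Because IDAT boundaries carry no meaning, the concatenated result is what has to be handed to the decompressor, not the individual chunks.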
The standard ancillary chunks are listed in alphabetical order. This is not necessarily the order in which they would appear in a file.

bKGD Background Color

This chunk specifies a default background color against which the image may be presented. Note that viewers are not bound to honor this chunk; a viewer may choose to use a different background color.

For color type 3 (palette), the bKGD chunk contains:

   palette index: 1 byte

The value is the palette index of the color to be used as background.

For color types 0 and 4 (grayscale, with or without alpha), bKGD contains:

   gray: 2 bytes, range 0 .. (2^bitdepth) - 1

(For consistency, 2 bytes are used regardless of the image bit depth.) The value is the gray level to be used as background.

For color types 2 and 6 (RGB, with or without alpha), bKGD contains:

   red:   2 bytes, range 0 .. (2^bitdepth) - 1
   green: 2 bytes, range 0 .. (2^bitdepth) - 1
   blue:  2 bytes, range 0 .. (2^bitdepth) - 1

(For consistency, 2 bytes per sample are used regardless of the image bit depth.) This is the RGB color to be used as background.

When present, the bKGD chunk must precede the first IDAT chunk, and must follow the PLTE chunk, if any.

See Recommendations for Decoders: Background color.

cHRM Primary Chromaticities and White Point

Applications that need precise specification of colors in a PNG file may use this chunk to specify the chromaticities of the red, green, and blue primaries used in the image, and the referenced white point. These values are based on the 1931 CIE (Commission Internationale de l'Eclairage, the International Commission on Illumination) XYZ color space. Only the chromaticities (x and y) are specified. The chunk layout is:

   White Point x: 4 bytes
   White Point y: 4 bytes
   Red x:         4 bytes
   Red y:         4 bytes
   Green x:       4 bytes
   Green y:       4 bytes
   Blue x:        4 bytes
   Blue y:        4 bytes

Each value is encoded as a 4-byte unsigned integer, representing the x or y value times 100000.

If the cHRM chunk appears, it must precede the first IDAT chunk, and it must also precede the PLTE chunk if present.
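The fixed-point encoding of the cHRM values can be illustrated with a short Python sketch. The function name `parse_chrm` and the field names are assumptions made for this example; the input is the 32 data bytes of the chunk.

```python
import struct

CHRM_FIELDS = ("white_x", "white_y", "red_x", "red_y",
               "green_x", "green_y", "blue_x", "blue_y")


def parse_chrm(data: bytes) -> dict:
    """Decode the eight 4-byte unsigned integers of a cHRM chunk."""
    if len(data) != 32:
        raise ValueError("cHRM data must be exactly 32 bytes")
    # Big-endian unsigned integers, each holding the x or y value times 100000.
    raw = struct.unpack(">8I", data)
    return {name: value / 100000 for name, value in zip(CHRM_FIELDS, raw)}
```

A stored white point x of 31270, for instance, decodes to 0.3127 (an example value chosen for illustration, not one prescribed by the text).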
gAMA Gamma Correction The gamma correction chunk specifies the gamma of the camera (or simulated camera) that produced the image, and thus the gamma of the image with respect to the original scene. Note that this is not the same as the gamma of the display device that will reproduce the image correctly. The chunk's contents are: Image gamma value: 4 bytes A value of 100000 represents a gamma of 1.0, a value of 45000 a gamma of 0.45, and so on (divide by 100000.0). Values around 1.0 and around 0.45 are common in practice. If the encoder does not know the gamma value, it should not write a gamma chunk; the absence of a gamma chunk indicates the gamma is unknown. If the gAMA chunk appears, it must precede the first IDAT chunk, and it must also precede the PLTE chunk if present. See Gamma correction, Recommendations for Encoders: Encoder gamma handling, and Recommendations for Decoders: Decoder gamma handling. hIST Image Histogram The histogram chunk gives the approximate usage frequency of each color in the color palette. A histogram chunk may appear only when a palette chunk appears. If a viewer is unable to provide all the colors listed in the palette, the histogram may help it decide how to choose a subset of the colors for display. This chunk's contents are a series of 2-byte (16 bit) unsigned integers. There must be exactly one entry for each entry in the PLTE chunk. Each entry is proportional to the fraction of pixels in the image that have that palette index; the exact scale factor is chosen by the encoder. Histogram entries are approximate, with the exception that a zero entry specifies that the corresponding palette entry is not used at all in the image. It is required that a histogram entry be nonzero if there are any pixels of that color. When the palette is a suggested quantization of a truecolor image, the histogram is necessarily approximate, since a decoder may map pixels to palette entries differently than the encoder did. 
In this situation, zero entries should not appear. The hIST chunk, if it appears, must follow the PLTE chunk, and must precede the first IDAT chunk. See Rationale: Palette histograms, and Recommendations for Decoders: Palette histogram usage. pHYs Physical Pixel Dimensions This chunk specifies the intended resolution for display of the image. The chunk's contents are: 4 bytes: pixels per unit, X axis (unsigned integer) 4 bytes: pixels per unit, Y axis (unsigned integer) 1 byte: unit specifier The following values are legal for the unit specifier: 0: unit is unknown (pHYs defines pixel aspect ratio only) 1: unit is the meter Conversion note: one inch is equal to exactly 0.0254 meters. If this ancillary chunk is not present, pixels are assumed to be square, and the physical size of each pixel is unknown. If present, this chunk must precede the first IDAT chunk. See Recommendations for Decoders: Pixel dimensions. sBIT Significant Bits To simplify decoders, PNG specifies that only certain bit depth values be used, and further specifies that pixel values must be scaled to the full range of possible values at that bit depth. However, the sBIT chunk is provided in order to store the original number of significant bits, since this information may be of use to some decoders. We recommend that an encoder emit an sBIT chunk if it has converted the data from a different bit depth. For color type 0 (grayscale), the sBIT chunk contains a single byte, indicating the number of bits which were significant in the source data. For color type 2 (RGB truecolor), the sBIT chunk contains three bytes, indicating the number of bits which were significant in the source data for the red, green, and blue channels, respectively. For color type 3 (palette color), the sBIT chunk contains three bytes, indicating the number of bits which were significant in the source data for the red, green, and blue components of the palette entries, respectively. 
For color type 4 (grayscale with alpha channel), the sBIT chunk contains two bytes, indicating the number of bits which were significant in the source grayscale data and the source alpha channel data, respectively. For color type 6 (RGB truecolor with alpha channel), the sBIT chunk contains four bytes, indicating the number of bits which were significant in the source data for the red, green, blue and alpha channels, respectively. Note that sBIT does not have any implications for the interpretation of the stored image: the bit depth indicated by IHDR is the correct depth. sBIT is only an indication of the history of the image. However, an sBIT chunk showing a bit depth less than the IHDR bit depth does mean that not all possible color values occur in the image; this fact may be of use to some decoders. If the sBIT chunk appears, it must precede the first IDAT chunk, and it must also precede the PLTE chunk if present. tEXt Textual Data Any textual information that the encoder wishes to record with the image is stored in tEXt chunks. Each tEXt chunk contains a keyword and a text string, in the format: Keyword: n bytes (character string) Null separator: 1 byte Text: n bytes (character string) The keyword and text string are separated by a zero byte (null character). Neither the keyword nor the text string may contain a null character. Note that the text string is not null-terminated (the length of the chunk is sufficient information to locate the ending). The keyword must be at least one character and less than 80 characters long. The text string may be of any length from zero bytes up to the maximum permissible chunk size. Any number of tEXt chunks may appear, and more than one with the same keyword is permissible. The keyword indicates the type of information represented by the text string. 
The following keywords are predefined and should be used where appropriate: Title Short (one line) title or caption for image Author Name of image's creator Copyright Copyright notice Description Description of image (possibly long) Software Software used to create the image Disclaimer Legal disclaimer Warning Warning of nature of content Source Device used to create the image Comment Miscellaneous comment; conversion from GIF comment Other keywords, containing any sequence of printable characters in the character set, may be invented for other purposes. Keywords of general interest may be registered with the maintainers of the PNG specification. Keywords must be spelled exactly as registered, so that decoders may use simple literal comparisons when looking for particular keywords. In particular, keywords are considered case-sensitive. Both keyword and text are interpreted according to the ISO 8859-1 (Latin-1) character set. Newlines in the text string should be represented by a single linefeed character (decimal 10); use of other ASCII control characters is discouraged. See Recommendations for Encoders: Text chunk processing and Recommendations for Decoders: Text chunk processing. tIME Image Last-Modification Time This chunk gives the time of the last image modification (not the time of initial image creation). The chunk contents are: 2 bytes: Year (complete; for example, 1995, not 95) 1 byte: Month (1-12) 1 byte: Day (1-31) 1 byte: Hour (0-23) 1 byte: Minute (0-59) 1 byte: Second (0-60) (yes, 60, for leap seconds; not 61, a common error) Universal Time (UTC, also called GMT) should be specified rather than local time. tRNS Transparency Transparency is an alternative to the full alpha channel. Although transparency is not as elegant as the full alpha channel, it requires less storage space and is sufficient for many common cases. 
For color type 3 (palette), this chunk's contents are a series of alpha channel bytes, corresponding to entries in the PLTE chunk: Alpha for palette index 0: 1 byte Alpha for palette index 1: 1 byte etc. Each entry indicates that pixels of that palette index should be treated as having the specified alpha value. Alpha values have the same interpretation as in an 8-bit full alpha channel: 0 is fully transparent, 255 is fully opaque, regardless of image bit depth. The tRNS chunk may contain fewer alpha channel bytes than there are palette entries. In this case, the alpha channel value for all remaining palette entries is assumed to be 255. In the common case where only palette index 0 need be made transparent, only a one-byte tRNS chunk is needed. The tRNS chunk may not contain more bytes than there are palette entries. For color type 0 (grayscale), the tRNS chunk contains a single gray level value, stored in the format gray: 2 bytes, range 0 .. (2^bitdepth) - 1 (For consistency, 2 bytes are used regardless of the image bit depth.) Pixels of the specified gray level are to be treated as transparent (equivalent to alpha value 0); all other pixels are to be treated as fully opaque (alpha value (2^bitdepth)-1). For color type 2 (RGB), the tRNS chunk contains a single RGB color value, stored in the format red: 2 bytes, range 0 .. (2^bitdepth) - 1 green: 2 bytes, range 0 .. (2^bitdepth) - 1 blue: 2 bytes, range 0 .. (2^bitdepth) - 1 (For consistency, 2 bytes per sample are used regardless of the image bit depth.) Pixels of the specified color value are to be treated as transparent (equivalent to alpha value 0); all other pixels are to be treated as fully opaque (alpha value (2^bitdepth)-1). tRNS is prohibited for color types 4 and 6, since a full alpha channel is already present in those cases. Note: when dealing with 16-bit grayscale or RGB data, it is important to compare both bytes of the sample values to determine whether a pixel is transparent. 
Although decoders may drop the low-order byte of the samples for
display, this must not occur until after the data has been tested for
transparency. For example, if the grayscale level 0x0001 is specified
to be transparent, it would be incorrect to compare only the high-order
byte and decide that 0x0002 is also transparent.

When present, the tRNS chunk must precede the first IDAT chunk, and
must follow the PLTE chunk, if any.

zTXt Compressed Textual Data

A zTXt chunk contains textual data, just as tEXt does; however, zTXt
takes advantage of compression.

A zTXt chunk begins with an uncompressed Latin-1 keyword followed by a
null (0) character, just as in the tEXt chunk. The next byte after the
null contains a compression type byte, for which the only presently
legitimate value is zero (deflate/inflate compression). The
compression-type byte is followed by a compressed data stream which
makes up the remainder of the chunk. Decompression of this data stream
yields Latin-1 text which is equivalent to the text stored in a tEXt
chunk.

Any number of zTXt and tEXt chunks may appear in the same file. See the
preceding definition of the tEXt chunk for the predefined keywords and
the exact format of the text.

See Deflate/Inflate Compression, Recommendations for Encoders: Text
chunk processing, and Recommendations for Decoders: Text chunk
processing.

Summary of Standard Chunks
==========================

This table summarizes some properties of the standard chunk types.

Critical chunks (must appear in this order, except PLTE is optional):

        Name  Multiple  Ordering constraints
                OK?

        IHDR    No      Must be first
        PLTE    No      Before IDAT
        IDAT    Yes     Multiple IDATs must be consecutive
        IEND    No      Must be last

Ancillary chunks (need not appear in this order):

        Name  Multiple  Ordering constraints
                OK?

        cHRM    No      Before PLTE and IDAT
        gAMA    No      Before PLTE and IDAT
        sBIT    No      Before PLTE and IDAT
        bKGD    No      After PLTE; before IDAT
        hIST    No      After PLTE; before IDAT
        tRNS    No      After PLTE; before IDAT
        pHYs    No      Before IDAT
        tIME    No      None
        tEXt    Yes     None
        zTXt    Yes     None

Standard keywords for tEXt and zTXt chunks:

   Title            Short (one line) title or caption for image
   Author           Name of image's creator
   Copyright        Copyright notice
   Description      Description of image (possibly long)
   Software         Software used to create the image
   Disclaimer       Legal disclaimer
   Warning          Warning of nature of content
   Source           Device used to create the image
   Comment          Miscellaneous comment; conversion from GIF comment

Additional Chunk Types
======================

Additional public PNG chunk types are defined in the document "PNG
Special-Purpose Public Chunks", available by FTP from or via WWW from

5. Deflate/Inflate Compression
==============================

PNG compression type 0 (the only compression method presently defined
for PNG) specifies deflate/inflate compression with a 32K window.
Deflate compression is an LZ77 derivative used in zip, gzip, pkzip and
related programs. Extensive research has been done supporting its
patent-free status. Portable C implementations are freely available.

Documentation and C code for deflate are available from the Info-Zip
archives at

Deflate-compressed datastreams within PNG are stored in the "zlib"
format, which has the structure:

   Compression method/flags code: 1 byte
   Additional flags/check bits:   1 byte
   Compressed data blocks:        n bytes
   Checksum:                      4 bytes

Further details on this format may be found in the zlib specification.
At this writing, the zlib specification is at draft 3.1, and is
available from

For PNG compression type 0, the zlib compression method/flags code must
specify method code 8 ("deflate" compression) and an LZ77 window size of
not more than 32K. The checksum stored at the end of the zlib
datastream is calculated on the uncompressed data represented by the
datastream.
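
The four-part zlib layout above can be checked with a short sketch.
This is only an illustration under stated assumptions: the bit layout
of the two header bytes (method in the low nybble, window size in the
high nybble, check bits making the 16-bit value a multiple of 31) and
the use of Adler-32 for the trailing checksum come from the zlib
specification (RFC 1950), not from the text above, and the function
name inflate_png_zlib is made up for this example.

```python
import zlib


def inflate_png_zlib(stream: bytes) -> bytes:
    """Validate a zlib wrapper as PNG requires it and return the inflated data."""
    cmf, flg = stream[0], stream[1]
    if cmf & 0x0F != 8:
        raise ValueError("compression method is not 8 (deflate)")
    # High nybble holds log2(window size) - 8, so 7 means a 32K window.
    if (cmf >> 4) > 7:
        raise ValueError("LZ77 window larger than 32K")
    if (cmf * 256 + flg) % 31 != 0:
        raise ValueError("header check bits are inconsistent")
    if flg & 0x20:
        raise ValueError("preset dictionary flag set (assumed unused here)")

    # Negative wbits selects raw deflate, so the wrapper is handled by hand.
    inflater = zlib.decompressobj(-15)
    data = inflater.decompress(stream[2:]) + inflater.flush()

    # Whatever follows the deflate blocks is the 4-byte big-endian checksum,
    # computed over the uncompressed data.
    trailer = inflater.unused_data[:4]
    if len(trailer) != 4 or int.from_bytes(trailer, "big") != zlib.adler32(data):
        raise ValueError("checksum does not match the uncompressed data")
    return data
```

In practice zlib.decompress(stream) performs the same checks in one
call; the manual version is shown only to make the structure visible.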
Note that the zlib checksum algorithm is not the same as the CRC
calculation used for PNG chunk checksums. Verifying the chunk CRCs
provides adequate confidence that the PNG file has been transmitted
undamaged. The zlib checksum is useful mainly as a crosscheck that the
deflate and inflate algorithms are implemented correctly.

In a PNG file, the concatenation of the contents of all the IDAT chunks
makes up a zlib datastream as specified above. This datastream
decompresses to filtered image data as described elsewhere in this
document.

It is important to emphasize that the boundaries between IDAT chunks
are arbitrary and may fall anywhere in the zlib datastream. There is
not necessarily any correlation between IDAT chunk boundaries and
deflate block boundaries or any other feature of the zlib data. For
example, it is entirely possible for the terminating zlib checksum to
be split across IDAT chunks.

PNG also uses zlib datastreams in zTXt chunks. In a zTXt chunk, the
remainder of the chunk following the compression type code byte is a
zlib datastream as specified above. This datastream decompresses to the
user-readable text described by the chunk's keyword. Unlike the image
data, such datastreams are not split across chunks; each zTXt chunk
contains an independent zlib datastream.

6. Filter Algorithms
====================

This chapter describes the pixel filtering algorithms which may be
applied in advance of compression. The purpose of these filters is to
prepare the image data for optimum compression.

PNG defines five basic filtering algorithms, which are given numeric
codes as follows:

   Code    Name
   0       None
   1       Sub
   2       Up
   3       Average
   4       Paeth

The encoder may choose which algorithm to apply on a
scanline-by-scanline basis. In the image data sent to the compression
step, each scanline is preceded by a filter type byte containing the
numeric code of the filter algorithm used for that scanline.

Filtering algorithms are applied to bytes, not to pixels, regardless of
the bit depth or color type of the image. The filtering algorithms work
on the byte sequence formed by a scanline that has been represented as
described under Image layout.

When the image is interlaced, each pass of the interlace pattern is
treated as an independent image for filtering purposes. The filters
work on the byte sequences formed by the pixels actually transmitted
during a pass, and the "previous scanline" is the one previously
transmitted in the same pass, not the one adjacent in the complete
image. Note that the subimage transmitted in any one pass is always
rectangular, but is of smaller width and/or height than the complete
image. Filtering is not applied when this subimage is empty.

For all filters, the bytes "to the left of" the first pixel in a
scanline must be treated as being zero. For filters that refer to the
prior scanline, the entire prior scanline must be treated as being
zeroes for the first scanline of an image (or of a pass of an
interlaced image).

To reverse the effect of a filter, the decoder must use the decoded
values of the prior pixel on the same line, the pixel immediately above
the current pixel on the prior line, and the pixel just to the left of
the pixel above. This implies that at least one scanline's worth of
image data must be stored by the decoder at all times. Even though some
filter types do not refer to the prior scanline, the decoder must
always store each scanline as it is decoded, since the next scanline
might use a filter that refers to it.

PNG imposes no restriction on which filter types may be applied to an
image. However, the filters are not equally effective on all types of
data. See Recommendations for Encoders: Filter selection.

Filter type 0: None
===================

With the None filter, the scanline is transmitted unmodified; it is
only necessary to insert a filter type byte before the data.

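The framing described so far (one filter type byte in front of every
scanline) can be sketched as below. This is a minimal illustration,
assuming a non-interlaced image and assuming the caller already knows
the number of bytes per scanline; the name split_scanlines is made up
for this example.

```python
def split_scanlines(filtered: bytes, bytes_per_line: int):
    """Yield (filter_type, scanline_bytes) pairs from decompressed image data."""
    stride = bytes_per_line + 1  # one filter type byte precedes each scanline
    if len(filtered) % stride != 0:
        raise ValueError("data length is not a whole number of scanlines")
    for start in range(0, len(filtered), stride):
        filter_type = filtered[start]
        if filter_type > 4:
            raise ValueError("unknown filter type %d" % filter_type)
        yield filter_type, filtered[start + 1:start + stride]
```

A scanline with filter type 0 (None) can be used as it is; the other
types need the matching reverse step before the bytes are pixel data.
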
Filter type 1: Sub
==================

The Sub filter transmits the difference between each byte and the value
of the corresponding byte of the prior pixel.

To compute the Sub filter, apply the following formula to each byte of
each scanline:

   Sub(x) = Raw(x) - Raw(x-bpp)

where x ranges from zero to the number of bytes representing that
scanline minus one, Raw(x) refers to the raw data byte at that byte
position in the scanline, and bpp is defined as the number of bytes per
complete pixel, rounding up to one. For example, for color type 2 with
a bit depth of 16, bpp is equal to 6 (three channels, two bytes per
channel); for color type 0 with a bit depth of 2, bpp is equal to 1
(rounding up); for color type 4 with a bit depth of 16, bpp is equal to
4 (two-byte grayscale value, plus two-byte alpha channel).

Note this computation is done for each byte, regardless of bit depth.
In a 16-bit image, MSBs are differenced from the preceding MSB and LSBs
are differenced from the preceding LSB, because of the way that bpp is
defined.

Unsigned arithmetic modulo 256 is used, so that both the inputs and
outputs fit into bytes. The sequence of Sub values is transmitted as
the filtered scanline.

For all x < 0, assume Raw(x) = 0.

To reverse the effect of the Sub filter after decompression, output the
following value:

   Sub(x) + Raw(x-bpp)

(computed mod 256), where Raw refers to the bytes already decoded.

Filter type 2: Up
=================

The Up filter is just like the Sub filter except that the pixel
immediately above the current pixel, rather than just to its left, is
used as the predictor.

To compute the Up filter, apply the following formula to each byte of
each scanline:

   Up(x) = Raw(x) - Prior(x)

where x ranges from zero to the number of bytes representing that
scanline minus one, Raw(x) refers to the raw data byte at that byte
position in the scanline, and Prior(x) refers to the unfiltered bytes
of the prior scanline.

Note this is done for each byte, regardless of bit depth.
Unsigned arithmetic modulo 256 is used, so that both the inputs and
outputs fit into bytes. The sequence of Up values is transmitted as the
filtered scanline.

On the first scanline of an image (or of a pass of an interlaced
image), assume Prior(x) = 0 for all x.

To reverse the effect of the Up filter after decompression, output the
following value:

   Up(x) + Prior(x)

(computed mod 256), where Prior refers to the decoded bytes of the
prior scanline.

Filter type 3: Average
======================

The Average filter uses the average of the two neighboring pixels (left
and above) to predict the value of a pixel.

To compute the Average filter, apply the following formula to each byte
of each scanline:

   Average(x) = Raw(x) - floor((Raw(x-bpp)+Prior(x))/2)

where x ranges from zero to the number of bytes representing that
scanline minus one, Raw(x) refers to the raw data byte at that byte
position in the scanline, Prior(x) refers to the unfiltered bytes of
the prior scanline, and bpp is defined as for the Sub filter.

Note this is done for each byte, regardless of bit depth. The sequence
of Average values is transmitted as the filtered scanline.

The subtraction of the predicted value from the raw byte must be done
modulo 256, so that both the inputs and outputs fit into bytes.
However, the sum Raw(x-bpp)+Prior(x) must be formed without overflow
(using at least nine-bit arithmetic). floor() indicates that the result
of the division is rounded to the next lower integer if fractional; in
other words, it is an integer division or right shift operation.

For all x < 0, assume Raw(x) = 0. On the first scanline of an image (or
of a pass of an interlaced image), assume Prior(x) = 0 for all x.

To reverse the effect of the Average filter after decompression, output
the following value:

   Average(x) + floor((Raw(x-bpp)+Prior(x))/2)

where the result is computed mod 256, but the prediction is calculated
in the same way as for encoding.
Raw refers to the bytes already decoded, and Prior refers to the
decoded bytes of the prior scanline.

Filter type 4: Paeth
====================

The Paeth filter computes a simple linear function of the three
neighboring pixels (left, above, upper left), then chooses as predictor
the neighboring pixel closest to the computed value. This technique is
taken from Alan W. Paeth's article "Image File Compression Made Easy"
in Graphics Gems II, James Arvo, editor, Academic Press, 1991.

To compute the Paeth filter, apply the following formula to each byte
of each scanline:

   Paeth(x) = Raw(x) - PaethPredictor(Raw(x-bpp),Prior(x),Prior(x-bpp))

where x ranges from zero to the number of bytes representing that
scanline minus one, Raw(x) refers to the raw data byte at that byte
position in the scanline, Prior(x) refers to the unfiltered bytes of
the prior scanline, and bpp is defined as for the Sub filter.

Note this is done for each byte, regardless of bit depth. Unsigned
arithmetic modulo 256 is used, so that both the inputs and outputs fit
into bytes. The sequence of Paeth values is transmitted as the filtered
scanline.

The PaethPredictor function is defined by the following pseudocode:

   function PaethPredictor (a, b, c)
   begin
        ; a = left, b = above, c = upper left
        p := a + b - c        ; initial estimate
        pa := abs(p - a)      ; distances to a, b, c
        pb := abs(p - b)
        pc := abs(p - c)
        ; return nearest of a,b,c,
        ; breaking ties in order a,b,c.
        if pa <= pb AND pa <= pc
        begin
             return a
        end
        if pb <= pc
        begin
             return b
        end
        return c
   end

The calculations within the PaethPredictor function must be performed
exactly, without overflow. Arithmetic modulo 256 is to be used only for
the final step of subtracting the function result from the target pixel
value.

Note that the order in which ties are broken is fixed and must not be
altered. The tie break order is: pixel to the left, pixel above, pixel
to the upper left. (This order differs from that given in Paeth's
article.)

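The pseudocode above translates almost line for line into Python. The
sketch below is an illustration only: the function names are made up,
scanlines are assumed to be bytes objects of equal length, and the
inverse simply adds the same predictor back using bytes that have
already been decoded. Python integers do not overflow, so the predictor
is computed exactly and only the final step is reduced mod 256.

```python
def paeth_predictor(a: int, b: int, c: int) -> int:
    """a = left, b = above, c = upper left; ties are broken in order a, b, c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def paeth_filter(raw: bytes, prior: bytes, bpp: int) -> bytes:
    """Apply the Paeth filter to one scanline; prior is the unfiltered line above."""
    out = bytearray(len(raw))
    for x in range(len(raw)):
        left = raw[x - bpp] if x >= bpp else 0          # Raw(x-bpp), 0 for x < bpp
        upper_left = prior[x - bpp] if x >= bpp else 0  # Prior(x-bpp)
        out[x] = (raw[x] - paeth_predictor(left, prior[x], upper_left)) & 0xFF
    return bytes(out)


def paeth_unfilter(filtered: bytes, prior: bytes, bpp: int) -> bytes:
    """Reverse paeth_filter; prior is the already decoded line above."""
    raw = bytearray(len(filtered))
    for x in range(len(filtered)):
        left = raw[x - bpp] if x >= bpp else 0
        upper_left = prior[x - bpp] if x >= bpp else 0
        raw[x] = (filtered[x] + paeth_predictor(left, prior[x], upper_left)) & 0xFF
    return bytes(raw)
```

For the first scanline of an image, pass a prior line of all zeroes,
for example bytes(len(raw)).
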
For all x < 0, assume Raw(x) = 0 and Prior(x) = 0. On the first
scanline of an image (or of a pass of an interlaced image), assume
Prior(x) = 0 for all x.

To reverse the effect of the Paeth filter after decompression, output
the following value:

   Paeth(x) + PaethPredictor(Raw(x-bpp),Prior(x),Prior(x-bpp))

(computed mod 256), where Raw and Prior refer to bytes already decoded.
Exactly the same PaethPredictor function is used by both encoder and
decoder.

For more information, check out the above ftp sites.

EXTENSION:PNG
OCCURENCES:PC,UNIX,AMIGA
PROGRAMS:????
REFERENCE:The PNG Specification
--------M-PTM-------------------------------
Poly Tracker is a Scream Tracker 3 like tracker written by Lone Ranger
of AcmE. This is a description of version 2.03 of the PTM format.
Earlier formats are no longer used or supported by the current version
of Poly Tracker (it still says "version 1.0β", but there have been
about a dozen different versions, including some customized test
versions). The samples are stored using delta-compression.

OFFSET              Count TYPE   Description
0000h                  28 char   Songname in ASCIIZ format, 0 padded
001Ch                   1 char   ID=#26
001Dh                   1 word   File type version, currently 0203h
001Fh                   1 byte   reserved (0)
0020h                   1 word   Number of orders
                                 ="ORD"
0022h                   1 word   Number of instruments
                                 ="INS"
0024h                   1 word   Number of patterns
                                 ="PAT"
0026h                   1 word   Number of voices used
                                 ="CHN"
0028h                   1 word   File flags (always 0 ??)
002Ah                   1 word   reserved (0)
002Ch                   4 char   ID='PTMF'
0030h                  16 byte   reserved (0)
0040h                  32 byte   Pan settings for each channel :
                                 0 = left, 7 = middle, 15 = right
0060h                 256 byte   Order list, valid entries are 0.."ORD"
0160h                 128 word   (Pattern offsets) div 16

The instrument data follows immediately after the header.
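
The header table can be turned into a fixed-size record reader. The
following is a sketch under assumptions, not a reference loader: Intel
(little-endian) byte order is assumed because the format occurs on the
PC, the names PTM_HEADER and parse_ptm_header are made up, and pattern
offsets are multiplied by 16 because the table stores them "div 16".

```python
import struct

# Field order follows the table: name, ID, version, reserved, ORD, INS, PAT,
# CHN, flags, reserved, 'PTMF', reserved, pans, order list, pattern offsets.
# "<" (Intel byte order) is an assumption.
PTM_HEADER = struct.Struct("<28sBHBHHHHHH4s16s32s256s128H")


def parse_ptm_header(data: bytes) -> dict:
    """Decode the 0260h-byte PTM file header into a dictionary."""
    if len(data) < PTM_HEADER.size:
        raise ValueError("file too short for a PTM header")
    (name, id_byte, version, _reserved1,
     n_orders, n_instruments, n_patterns, n_channels,
     flags, _reserved2, magic, _reserved3,
     pans, orders, *pattern_paragraphs) = PTM_HEADER.unpack_from(data)
    if id_byte != 26 or magic != b"PTMF":
        raise ValueError("not a PTM file")
    return {
        "title": name.split(b"\0", 1)[0].decode("ascii", "replace"),
        "version": version,
        "instruments": n_instruments,
        "channels": n_channels,
        "flags": flags,
        "pans": list(pans[:n_channels]),
        "orders": list(orders[:n_orders]),
        # Stored as (offset div 16), so multiply to get byte offsets.
        "pattern_offsets": [p * 16 for p in pattern_paragraphs[:n_patterns]],
    }
```

The struct size works out to 0260h bytes, which matches the end of the
last table entry (0160h plus 128 words).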
--- PTM instrument format
There are 0.."INS" instruments in the file, each in the following
format :

OFFSET              Count TYPE   Description
0000h                   1 byte   Sample type (bit mapped)
                                 0,1 : 0 - no sample (instrument info only)
                                       1 - normal sample (FileOfs / Length
                                           fields are valid)
                                       2 - OPL2 / OPL3 instrument (not used)
                                       3 - MIDI instrument (not used)
                                   2 - sample loop (0 = no loop, 1 = loop)
                                   3 - loop type (0 = unidirectional,
                                       1 = bidirectional)
                                   4 - sample resolution (0 = 8 bits,
                                       1 = 16 bits)
0001h                  12 char   Name of external sample file
000Dh                   1 byte   Default volume for sample
000Eh                   1 word   C4 speed
0010h                   1 word   reserved (0)
0012h                   1 dword  absolute? offset of sample data
0016h                   1 dword  Size of sample in bytes
001Ah                   1 dword  Start of loop
001Eh                   1 dword  End of loop
0022h                  14 byte   reserved (0)
0030h                  28 char   ASCIIZ name of sample
004Ch                   4 char   ID='PTMS'

EXTENSION:PTM
OCCURENCES:PC
--------M-PS16------------------------------
The Protracker Studio 16 Modules are yet another digital music format.
The Protracker modules can have up to 255 different patterns and a
length of up to 255 patterns with 31 instruments. The samples can only
have a size of up to 64K, and a maximum of 16 tracks is supported.
The header of each MOD file looks like this :

OFFSET              Count TYPE   Description
0000h                   5 char   Header string
                                 ID='PS16',254
0005h                  75 char   Song name, ending with ^Z so that typing
                                 the file will result in
                                 PS16 <SongName>
0050h                   1 byte   File type :
                                 0 - Module with patterns and samples
                                 1 - Song with patterns but without samples
0051h                   1 dword  Offset of comment field from start of file.
                                 Zero if no comment is stored.
0055h                   1 byte   Format version byte (0)
0056h                   1 byte   Number of patterns in the file
                                 ="PAT"
0057h                   1 dword  Total size of all patterns in bytes, stored
                                 for quick disk reads.
005Bh                   1 byte   Songlength, number of sequences.
005Ch                 128 byte   Sequencing information for file
00CCh                  31 rec    Sample information
                        1 byte   Sample flags, bitmapped :
                                 0 - synthesized / digital
                                 1 - Waveform / FM (only if bit 0 is set)
                                 2 - 16 bit / 8 bit
                        1 byte   Default volume for the sample (0-64)
                        1 byte   Sample fine tuning (signed nybble)
                        1 dword  Sample length - Protracker only supports
                                 samples with a size less than 64K.
                        1 dword  Sample loop start
                        1 dword  Sample loop length
                        1 word   Default playback frequency for C2
                                 Can be used to fine tune a sample.
00CCh+              "PAT" rec    The pattern information
 31*17                           The tracks are stored sequentially after each
                                 other, first all rows of track 1, then all
                                 rows of track 2 and so on.
                        1 word   Pattern size+3, rounded up to a paragraph
                                 boundary.
                        1 byte   Number of rows in pattern
                                 ="ROW"
                    "ROW" rec
                        2 byte   Note information, bitmapped :
                                   0-5 - Note (see table 0005)
                                     6 - Bit 4 of instrument
                                     7 - Compression bit
                                         If this bit is not set, there is
                                         another byte following the note
                                         record specifying the row where the
                                         next event takes place - if it is
                                         set, the next note follows
                                         immediately. A track is terminated
                                         by a 0FFh byte.
                                  8-11 - Effect bits
                                 12-15 - Bits 0-3 of instrument
                        1 byte   Effect data
00CCh+                  ? byte   Sample data in delta format
 31*17+                          See algorthm.txt for details.
 ????

The comment block contains information about the sample names as well
as some comments to the module. It is formatted like this :

OFFSET              Count TYPE   Description
0000h                   4 char   ID='INST'
                        1 byte   Instrument name length
                                 ="LEN"
                        1 byte   Sample name count
                                 ="CNT"
              "LEN"*"CNT" char   Sample names
0000h+                  4 char   ID='TEXT'
"LEN"*"CNT"+4           1 word   Length of following text

EXTENSION:MOD
OCCURENCES:PC
PROGRAMS:Protracker
REFERENCE:
SEE ALSO:DMF,MOD,S3M,STM
VALIDATION:
--------I-QFX-------------------------------
QFX files are yet another graphic file format, used to store received
fax images. The .QFX file format is proprietary to Smith Micro
Software, Inc. and is used by the Quick Link II fax software. The QFX
file header is exactly 1536 bytes long.
The fax pages themselves are stored in byte aligned, bit reversed T4
format terminated with 6 EOL's. See CCITT Recommendation T.4 for full
documentation on this coding scheme.

OFFSET              Count TYPE   Description
0000h                   8 char   ID='QLIIFAX',0
0008h                   1 word   Number of pages in the QFX file
000Ah                   1 word   Number of scan lines on last page
000Ch                   1 dword  Number of scan lines for all pages
0010h                   1 word   Horizontal scaling
                                 1 - High res (200x200),
                                 2 - Normal res (200x100)
0012h                   1 word   Vertical scaling (always = 1).
0014h                  12 byte   reserved
0020h                 375 dword  Offsets of the single pages in the document.
                                 Page 1 always starts at offset 1536.
                                 The last non-zero dword points to the end of
                                 the last page, the first zero dword marks the
                                 end of the pages.
0600h                   ? byte   Start of fax page images

EXTENSION:QFX
OCCURENCES:PC
PROGRAMS:Quick Link II
--------S-RAW-------------------------------
The RAW files are raw signed PCM sound files. PCM means Pulse Code
Modulation - which can be played through most sound devices without
further manipulation. There is no header whatsoever. The properties
include 8/16-bit samples in INTEL order, stereo or mono format. No
identification is possible.

EXTENSION:RAW
SEE ALSO:SND
--------I-RDIB------------------------------
The RDIB files are Device Independent Bitmaps used by Windows. They are
RIFF format files. The blocks are unknown to me.

SEE ALSO:RIFF
--------f-RIFF------------------------------
The RIFF (Resource interchange file format) format was created by
Microsoft and is used by many applications like Windows, Corel Draw
etc. It is block structured: each block has a header ID and a size, so
that even a program that works with an old version of the file format
can skip the unknown parts of the file and work on the known parts of
the file. All RIFF blocks begin on a word boundary so it might be
necessary to skip an additional byte. In the present specification,
only one RIFF block per file is allowed, and only the RIFF and LIST
blocks may contain subblocks.
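
The "skip what you do not know" idea can be sketched as a small block
walker. This is an illustration under assumptions: it relies on the
record layout given in this entry (a 12-byte RIFF header, then blocks
made of a 4-character ID and a dword size), assumes Intel byte order,
and treats the word-boundary rule as one pad byte after odd-sized
blocks. The name iter_riff_blocks is made up for this example.

```python
import struct


def iter_riff_blocks(data: bytes):
    """Yield (block_id, payload) for each top-level block inside a RIFF file."""
    magic, size, _form = struct.unpack_from("<4sI4s", data, 0)
    if magic != b"RIFF":
        raise ValueError("not a RIFF file")
    end = min(len(data), size + 8)   # the size field excludes ID and size itself
    pos = 12                         # first block follows the 12-byte header
    while pos + 8 <= end:
        block_id, block_size = struct.unpack_from("<4sI", data, pos)
        yield block_id, data[pos + 8:pos + 8 + block_size]
        # Next block: current offset + size + 8, plus one pad byte if size is odd.
        pos += 8 + block_size + (block_size & 1)
```

A reader that only understands some block IDs can loop over the result
and ignore every ID it does not recognize.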
The order of blocks in a RIFF file is not mandatory, so you should
always scan the whole file for the block ID you seek. Throughout this
file, the RIFF block IDs are given in square brackets []. Each ID is
always 4 characters long (one dword).

OFFSET              Count TYPE   Description
0000h                   4 char   ID='RIFF'
                                 Each RIFF format file has a header with the
                                 signature and the size of the following
                                 blocks.
0004h                   1 dword  Block size. This size is the size of the
                                 block controlled by the RIFF header.
                                 Normally this equals the file size.
                                 ="BSZ"
0008h                   4 char   Format name. This is the format name of the
                                 RIFF file.

After this RIFF header comes the first RIFF record. Each RIFF record
has the following format :

OFFSET              Count TYPE   Description
0000h                   4 char   Signature. This is the description of what
                                 is contained in this block.
0004h                   1 dword  Block size. This is the size of the
                                 following data block. To get the offset of
                                 the next RIFF block record, you have to add
                                 this value + 8 to the offset of the current
                                 record.

---RiffBLOCK [LIST]
This block contains a string list, again in the RIFF subblock format.
This list is used for messages and/or copyright messages. All strings
in the LIST block share the same format, each block contains one ASCIIZ
string - the most common LIST block is the [INFO] block, which can
contain the following subblocks :

[INAM]  The name of the data stored in the file
[ICRD]  Creation date of the file

SEE ALSO:BMP,rDIB,IFF,WAVe,RIFX
OCCURENCES:PC
PROGRAMS:Windows,Corel Draw
REFERENCE:DDJ0994
VALIDATION:FileSize="BSZ"+8
--------f-RIFX-M----------------------------
The RIFX file format is identical to the RIFF file format except that
all values are in Motorola byte order.

OFFSET              Count TYPE   Description
0000h                   4 char   ID='RIFX'
0004h                   1 dword  Block size. This size is the size of the
                                 block controlled by the RIFX header.
                                 ="BSZ"
0008h                   4 char   Format name.

REFERENCE:DDJ0994
SEE ALSO:RIFF
--------S-S3I-------------------------------
This is the Digiplayer/ST3.0 digital sample file format.
The sample files include information about the loop of the instrument.
The AdLib instruments have another format listed below.

OFFSET              Count TYPE   Description
0000h                   1 byte   ID=01h
0001h                  12 char   DOS filename
000Dh                   1 byte   reserved (0)
000Eh                   1 word   Paragraph offset of the raw sample data from
                                 beginning of file.
0010h                   1 dword  Sample length in bytes
0014h                   1 dword  Start of sample loop
0018h                   1 dword  End of sample loop
001Ch                   1 byte   Playback volume of sample
001Dh                   1 byte   ??? "DSK" what ever that means
001Eh                   1 byte   Pack type
                                 0 - unpacked
                                 1 - DP30ADPCM 1
001Fh                   1 byte   Flags (bitmapped)
                                 0 - loop on/off
                                 1 - stereo sample (length bytes for left
                                     channel, then another length bytes for
                                     right channel!)
                                 2 - 16-Bit samples (in Intel byte order)
0020h                   1 dword  C2 frequency
0024h                   1 dword  reserved
0028h                   1 word   reserved
002Ah                   1 word   ID=512
002Ch                   1 dword  ?? Date of last modification ??
                                 (see table 0009)
0030h                  28 char   ASCIIZ Sample name
004Ch                   4 char   ID='SCRS'
0050h                   ? byte   Raw sample data

Here follows the AdLib instrument format for which I don't know the
extension (yet) :

OFFSET              Count TYPE   Description
0000h                   1 byte   Instrument type
                                 2 - melodic instrument
                                 3 - bass drum
                                 4 - snare drum
                                 5 - tom tom
                                 6 - cymbal
                                 7 - hihat
0001h                  12 char   DOS file name
000Dh                   3 byte   reserved
0010h                   1 byte   Modulator description (bitmapped)
                                 0-3 - frequency multiplier
                                   4 - scale envelope
                                   5 - sustain
                                   6 - pitch vibrato
                                   7 - volume vibrato
0011h                   1 byte   Carrier description (same as modulator)
0012h                   1 byte   Modulator miscellaneous (bitmapped)
                                 0-5 - 63-volume
                                   6 - MSB of levelscale
                                   7 - LSB of levelscale
0013h                   1 byte   Carrier description (same as modulator)
0014h                   1 byte   Modulator attack / decay byte (bitmapped)
                                 0-3 - Decay
                                 4-7 - Attack
0015h                   1 byte   Carrier description (same as modulator)
0016h                   1 byte   Modulator sustain / release byte (bitmapped)
                                 0-3 - Release count
                                 4-7 - 15-Sustain
0017h                   1 byte   Carrier description (same as modulator)
0018h                   1 byte   Modulator wave select
0019h                   1 byte   Carrier wave select
001Ah                   1 byte   Modulator feedback byte (bitmapped)
                                   0 - additive synthesis on/off
                                 1-7 - modulation feedback
001Bh                   1 byte   reserved
001Ch                   1 byte   Instrument playback volume
001Dh                   1 byte   ??? "DSK"
001Eh                   1 word   reserved
0020h                   1 dword  C2 frequency
0024h                  12 byte   reserved
0030h                  28 char   ASCIIZ Instrument name
004Ch                   4 char   ID='SCRI'

EXTENSION:S3I,SMP
OCCURENCES:PC
PROGRAMS:ScreamTracker 3.0
SEE ALSO:MTM,S3M,STM
--------M-S3M-------------------------------
The ScreamTracker composer and the ScreamTracker Music Interface Kit
(STMIK) were written by the demo group Future Crew for their
demonstrations and then released. S3M files are the files of version 3
of the ScreamTracker.

OFFSET              Count TYPE   Description
0000h                  28 char   Song name, ASCII, 0 padded
001Ch                   1 byte   ID=1Ah
001Dh                   1 byte   Filetype :
                                 16=Module
                                 17=Song
                                 ? What is this supposed to mean ?
001Eh                   1 word   Reserved
0020h                   1 word   Number of orders in song
                                 ="ORD"
0022h                   1 word   Number of instruments in song
                                 ="INS"
0024h                   1 word   Number of patterns in song
                                 ="PAT"
0026h                   1 word   Song flags, bitmapped
                                 0 - ScreamTracker 2.0 type vibrato
                                 1 - ScreamTracker 2.0 type tempo
                                 2 - Amiga type slides
                                 3 - Zero volume optimizations
                                 4 - Amiga limits
                                 5 - enable filters / sfx
0028h                   1 word   Tracker version
002Ah                   1 word   File format version
                                 1=Original format
                                 2=Original format, unsigned samples
002Ch                   4 char   ID='SCRM'
0030h                   1 byte   Maximum volume
0031h                   1 byte   Initial speed
0032h                   1 byte   Initial tempo
0033h                   1 byte   Master multiplier
                                 What's this ????
0034h                  12 byte   reserved
0040h                  32 byte   Channel balance settings
                                 0=left
                                 127=right
                                 +128=disabled
                                 255=unused
0060h               "ORD" byte   Ordering sequence of the patterns
0060h               "INS" word   Offset of the instruments in paragraphs from
 +"ORD"                          begin of header (for binary offset, multiply
                                 with 16)
0060h               "PAT" word   Offset of the pattern data from begin of
 +"ORD"                          header in paragraphs
 +"INS"*2

EXTENSION:S3M
OCCURENCES:PC
PROGRAMS:ScreamTracker 3.0
SEE ALSO:S3I,STM,S2M
--------S-SND-------------------------------
The SND files are raw unsigned PCM sound files.
PCM means Pulse Code Modulation - which can be played through most
sound devices without further manipulation. There is no header
whatsoever. The properties include 8/16-bit samples in INTEL order,
stereo or mono format. No identification is possible.

EXTENSION:SND
SEE ALSO:RAW
--------A-SQZ-------------------------------
The SQZ files are yet another archive format. The SQZ archives consist
of one archive header and several file headers. The archive header has
the following format :

OFFSET              Count TYPE   Description
0000h                   5 char   ID='HLSQZ'
0005h                   1 char   Version in ASCII
                                 ID='1'
0006h                   1 byte   OS byte,
                                 0 - PC-DOS / MS-DOS
                                 1 - OS/2
                                 2 - MVS
                                 3 - HPFS (OS/2)
                                 4 - Amiga
                                 5 - Macintosh
                                 6 - Unix
0007h                   1 byte   Misc. flags, bitmapped :
                                   0 - Intel byte order / Motorola byte order
                                   1 - Filetime in ?? / File time in DOS format
                                   2 - No security envelope / security envelope
                                 3-7 - reserved

After the header and each block, there is one byte denoting the
type/size of the next block :

OFFSET              Count TYPE   Description
0000h                   1 byte   Block/size specifier :
                                    0 - End of archive
                                    1 - Comment
                                    2 - Password
                                    3 - Security envelope
                                 4-18 - reserved
                                  >18 - normal file header, byte value is
                                        size of header

The normal file header then has the following format :

OFFSET              Count TYPE   Description
0000h                   1 byte   Checksum of header
0001h                   1 byte   Flags, bitmapped :
                                 0-3 : Compression method :
                                       0 -
                                       1 -
                                       2 -
                                       3 -
                                       4 -
                                   4 - Security envelope should follow
                                 5-7 - reserved
0002h                   1 dword  Compressed size of file
0006h                   1 dword  Original file size
000Ah                   1 dword  Date and time of file (see table 0009)
000Eh                   1 byte   File attributes
000Fh                   1 dword  CRC-32 of file
0013h                   ? char   Filename (see above for length)

The comment block :

OFFSET              Count TYPE   Description
0000h                   1 word   Size of uncompressed comment
0002h                   1 word   Size of compressed comment data
                                 ="LEN"
0004h                   1 byte   Flags, bitmapped, see above
0005h                   1 dword  CRC-32
0009h               "LEN" byte   Compressed comment data

The password block :

OFFSET              Count TYPE   Description
0000h                   1 word   Size of password block (=4)
0002h                   1 dword  CRC-32 of password

Other blocks :

OFFSET              Count TYPE   Description
0000h                   1 word   Size of this block
                                 ="LEN"
0002h               "LEN" byte   Block data

EXTENSION:SQZ
OCCURENCES:PC
PROGRAMS:??
REFERENCE:
SEE ALSO:
--------S-SDK-------------------------------
The SDK files are disk images from disks used by the Roland
S-550/S-50/S-330 sampler devices. Further information wanted.

EXTENSION:SDK
--------S-SDS-------------------------------
The SDS files are MIDI Sample Dump Standard files and are used to
transfer samples between MIDI devices. Further information wanted.

EXTENSION:SDS
SEE ALSO:MID,SDX
--------S-SDX-------------------------------
The SDX files are, like the SDS files, sample dump files used for
transfer of data between MIDI devices.

EXTENSION:SDX
SEE ALSO:MID,SDS
--------S-SMP-------------------------------
The SMP files are digital sample files used by Samplevision software.
Further information wanted.

EXTENSION:SMP
--------M-STM-------------------------------
The ScreamTracker 1.0 format was the module format used by the
ScreamTracker before version 2.0.

OFFSET              Count TYPE   Description
0000h                  20 char   ASCIIZ song name
0014h                   8 char   Tracker name
001Ch                   1 byte   ID=1Ah
001Dh                   1 byte   File type
                                 1 - song (contains no samples)
                                 2 - module (contains samples)
001Eh                   1 byte   Major version number
001Fh                   1 byte   Minor version number
0020h                   1 byte   Playback tempo
0021h                   1 byte   Number of patterns
                                 ="PAT"
0022h                   1 byte   Global playback volume
0023h                  13 byte   reserved
0030h                  31 rec    Instrument data
                       12 char   ASCIIZ instrument name
                        1 byte   ID=0
                        1 byte   Instrument disk
                        1 word   reserved
                        1 word   Sample length in bytes
                        1 word   Sample loop start
                        1 word   Sample loop end
                        1 byte   Sample playback volume
                        1 byte   reserved
                        1 word   C3 frequency in Hz
                        1 dword  reserved
                        1 word   length in paragraphs
                                 (only for modules, in songs: reserved)
03D0h                  64 byte   Pattern orders
0410h          4*64*"PAT" rec    Pattern data. Each pattern consists of 64
                                 rows, each 4 channels. The channels are
                                 stored from left to right, row by row.
                        1 byte   Note byte :
                                 251 - last 3 bytes not stored, all bytes 0
                                 252 - last 3 bytes not stored, note -0-,
                                       whatever that means.
                                 253 - last 3 bytes not stored, note ...
                                 254 - undefined (reserved for run-time)
                                 255 - undefined (reserved for run-time)
                                 otherwise bit mapped :
                                 0-3 : note (c=0,c#=1...)
                                 4-7 : octave
                        1 byte   Only valid if above byte < 251, bit mapped
                                 0-2 : lower bits of note volume
                                 3-7 : instrument number
                        1 byte   bit mapped
                                 0-3 : Effect command in ProTracker format
                                       seems to be overlapped by volume
                                       bits...
                                 4-6 : upper bits of volume
                        1 byte   command data in ProTracker format
0410h+                  ? byte   Raw sample data padded to 16 byte boundaries.
 4*64*4*"PAT"

EXTENSION:STM
OCCURENCES:PC
PROGRAMS:ScreamTracker 1.0
REFERENCE:
SEE ALSO:S3M,MOD
--------A-TAR-G-----------------------------
The Unix Tape ARchives mostly have the extension TAR. The info about
this comes from a magic file, thus useful only for identification.
--------A-TAR-------------------------------
The Unix TAR program is an archiver program which stores files in a
single archive without compression.
OFFSET              Count TYPE   Description
@section The Standard Format

A @dfn{tar tape} or file contains a series of records. Each record contains @code{RECORDSIZE} bytes. Although this format may be thought of as being on magnetic tape, other media are often used.

Each file archived is represented by a header record which describes the file, followed by zero or more records which give the contents of the file. At the end of the archive file there may be a record filled with binary zeros as an end-of-file marker. A reasonable system should write a record of zeros at the end, but must not assume that such a record exists when reading an archive.

The records may be @dfn{blocked} for physical I/O operations. Each block of @var{N} records (where @var{N} is set by the @samp{-b} option to @code{tar}) is written with a single @code{write()} operation. On magnetic tapes, the result of such a write is a single tape record. When writing an archive, the last block of records should be written at the full size, with records after the zero record containing all zeroes. When reading an archive, a reasonable system should properly handle an archive whose last block is shorter than the rest, or which contains garbage records after a zero record.

The header record is defined in C as follows:

@example
/*
 * Standard Archive Format - Standard TAR - USTAR
 */
#define RECORDSIZE  512
#define NAMSIZ      100
#define TUNMLEN     32
#define TGNMLEN     32

union record @{
    char charptr[RECORDSIZE];
    struct header @{
        char name[NAMSIZ];
        char mode[8];
        char uid[8];
        char gid[8];
        char size[12];
        char mtime[12];
        char chksum[8];
        char linkflag;
        char linkname[NAMSIZ];
        char magic[8];
        char uname[TUNMLEN];
        char gname[TGNMLEN];
        char devmajor[8];
        char devminor[8];
    @} header;
@};

/* The checksum field is filled with this while the checksum is computed. */
#define CHKBLANKS   "        "   /* 8 blanks, no null */

/* The magic field is filled with this if uname and gname are valid. */
#define TMAGIC      "ustar  "    /* 7 chars and a null */

/* The magic field is filled with this if this is a GNU format dump entry */
#define GNUMAGIC    "GNUtar "    /* 7 chars and a null */

/* The linkflag defines the type of file */
#define LF_OLDNORMAL '\0'   /* Normal disk file, Unix compatible */
#define LF_NORMAL    '0'    /* Normal disk file */
#define LF_LINK      '1'    /* Link to previously dumped file */
#define LF_SYMLINK   '2'    /* Symbolic link */
#define LF_CHR       '3'    /* Character special file */
#define LF_BLK       '4'    /* Block special file */
#define LF_DIR       '5'    /* Directory */
#define LF_FIFO      '6'    /* FIFO special file */
#define LF_CONTIG    '7'    /* Contiguous file */

/* Further link types may be defined later. */

/* Bits used in the mode field - values in octal */
#define TSUID    04000   /* Set UID on execution */
#define TSGID    02000   /* Set GID on execution */
#define TSVTX    01000   /* Save text (sticky bit) */

/* File permissions */
#define TUREAD   00400   /* read by owner */
#define TUWRITE  00200   /* write by owner */
#define TUEXEC   00100   /* execute/search by owner */
#define TGREAD   00040   /* read by group */
#define TGWRITE  00020   /* write by group */
#define TGEXEC   00010   /* execute/search by group */
#define TOREAD   00004   /* read by other */
#define TOWRITE  00002   /* write by other */
#define TOEXEC   00001   /* execute/search by other */
@end example

All characters in header records are represented by using 8-bit characters in the local variant of ASCII. Each field within the structure is contiguous; that is, there is no padding used within the structure. Each character on the archive medium is stored contiguously.

Bytes representing the contents of files (after the header record of each file) are not translated in any way and are not constrained to represent characters in any character set. The @code{tar} format does not distinguish text files from binary files, and no translation of file contents is performed.
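As a rough illustration of the header layout above, the following Python sketch slices one 512-byte header record into its raw fields and names the mode bits that are set. The field widths and bit values are copied from the C definitions; the helper names `field_offsets`, `split_header` and `mode_flags` are hypothetical and not part of any tar implementation.

```python
RECORDSIZE = 512

# (field name, width in bytes), in the order of "struct header" above.
HEADER_FIELDS = [
    ("name", 100), ("mode", 8), ("uid", 8), ("gid", 8),
    ("size", 12), ("mtime", 12), ("chksum", 8), ("linkflag", 1),
    ("linkname", 100), ("magic", 8), ("uname", 32), ("gname", 32),
    ("devmajor", 8), ("devminor", 8),
]

# Mode bit names and octal values, copied from the #define list above.
MODE_BITS = [
    ("TSUID", 0o4000), ("TSGID", 0o2000), ("TSVTX", 0o1000),
    ("TUREAD", 0o400), ("TUWRITE", 0o200), ("TUEXEC", 0o100),
    ("TGREAD", 0o040), ("TGWRITE", 0o020), ("TGEXEC", 0o010),
    ("TOREAD", 0o004), ("TOWRITE", 0o002), ("TOEXEC", 0o001),
]


def field_offsets():
    """Return the byte offset of every header field (no padding is used)."""
    offsets = {}
    pos = 0
    for name, width in HEADER_FIELDS:
        offsets[name] = pos
        pos += width
    return offsets


def split_header(record):
    """Slice one header record into raw byte fields, keyed by field name."""
    if len(record) != RECORDSIZE:
        raise ValueError("a header record is exactly RECORDSIZE bytes")
    offsets = field_offsets()
    return {name: record[offsets[name]:offsets[name] + width]
            for name, width in HEADER_FIELDS}


def mode_flags(mode):
    """List the names of the mode bits set in an integer mode value."""
    return [name for name, bit in MODE_BITS if mode & bit]
```

With this layout, `field_offsets()["magic"]` comes out as 257, and the named fields cover 345 of the 512 bytes in a record.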
The @code{name}, @code{linkname}, @code{magic}, @code{uname}, and @code{gname} are null-terminated character strings. All other fields are zero-filled octal numbers in ASCII. Each numeric field of width @var{w} contains @var{w}@minus{} 2 digits, a space, and a null, except @code{size}, and @code{mtime}, which do not contain the trailing null.

The @code{name} field is the pathname of the file, with directory names (if any) preceding the file name, separated by slashes.

The @code{mode} field provides nine bits specifying file permissions and three bits to specify the Set UID, Set GID, and Save Text (``sticky'') modes. Values for these bits are defined above. When special permissions are required to create a file with a given mode, and the user restoring files from the archive does not hold such permissions, the mode bit(s) specifying those special permissions are ignored. Modes which are not supported by the operating system restoring files from the archive will be ignored. Unsupported modes should be faked up when creating or updating an archive; e.g. the group permission could be copied from the @code{other} permission.

The @code{uid} and @code{gid} fields are the numeric user and group ID of the file owners, respectively. If the operating system does not support numeric user or group IDs, these fields should be ignored.

The @code{size} field is the size of the file in bytes; linked files are archived with this field specified as zero. @xref{Extraction Options}; in particular the @samp{-G} option.@refill

The @code{mtime} field is the modification time of the file at the time it was archived. It is the ASCII representation of the octal value of the last time the file was modified, represented as an integer number of seconds since January 1, 1970, 00:00 Coordinated Universal Time.

The @code{chksum} field is the ASCII representation of the octal value of the simple sum of all bytes in the header record.
Each 8-bit byte in the header is added to an unsigned integer, initialized to zero, the precision of which shall be no less than seventeen bits. When calculating the checksum, the @code{chksum} field is treated as if it were all blanks. The @code{typeflag} field specifies the type of file archived. If a particular implementation does not recognize or permit the specified type, the file will be extracted as if it were a regular file. As this action occurs, @code{tar} issues a warning to the standard error. @table @code @item LF_NORMAL @itemx LF_OLDNORMAL These represent a regular file. In order to be compatible with older versions of @code{tar}, a @code{typeflag} value of @code{LF_OLDNORMAL} should be silently recognized as a regular file. New archives should be created using @code{LF_NORMAL}. Also, for backward compatibility, @code{tar} treats a regular file whose name ends with a slash as a directory. @item LF_LINK This represents a file linked to another file, of any type, previously archived. Such files are identified in Unix by each file having the same device and inode number. The linked-to name is specified in the @code{linkname} field with a trailing null. @item LF_SYMLINK This represents a symbolic link to another file. The linked-to name is specified in the @code{linkname} field with a trailing null. @item LF_CHR @itemx LF_BLK These represent character special files and block special files respectively. In this case the @code{devmajor} and @code{devminor} fields will contain the major and minor device numbers respectively. Operating systems may map the device specifications to their own local specification, or may ignore the entry. @item LF_DIR This specifies a directory or sub-directory. The directory name in the @code{name} field should end with a slash. 
On systems where disk allocation is performed on a directory basis the @code{size} field will contain the maximum number of bytes (which may be rounded to the nearest disk block allocation unit) which the directory may hold. A @code{size} field of zero indicates no such limiting. Systems which do not support limiting in this manner should ignore the @code{size} field. @item LF_FIFO This specifies a FIFO special file. Note that the archiving of a FIFO file archives the existence of this file and not its contents. @item LF_CONTIG This specifies a contiguous file, which is the same as a normal file except that, in operating systems which support it, all its space is allocated contiguously on the disk. Operating systems which do not allow contiguous allocation should silently treat this type as a normal file. @item 'A' @dots{} @itemx 'Z' These are reserved for custom implementations. Some of these are used in the GNU modified format, as described below. @end table Other values are reserved for specification in future revisions of the P1003 standard, and should not be used by any @code{tar} program. The @code{magic} field indicates that this archive was output in the P1003 archive format. If this field contains @code{TMAGIC}, the @code{uname} and @code{gname} fields will contain the ASCII representation of the owner and group of the file respectively. If found, the user and group ID represented by these names will be used rather than the values within the @code{uid} and @code{gid} fields. @section GNU Extensions to the Archive Format The GNU format uses additional file types to describe new types of files in an archive. These are listed below. @table @code @item LF_DUMPDIR @itemx 'D' This represents a directory and a list of files created by the @samp{-G} option. The @code{size} field gives the total size of the associated list of files. 
Each filename is preceded by either a @code{'Y'} (the file should be in this archive) or an @code{'N'} (the file is a directory, or is not stored in the archive). Each filename is terminated by a null. There is an additional null after the last filename.

@item LF_MULTIVOL
@itemx 'M'
This represents a file continued from another volume of a multi-volume archive created with the @samp{-M} option. The original type of the file is not given here. The @code{size} field gives the maximum size of this piece of the file (assuming the volume does not end before the file is written out). The @code{offset} field gives the offset from the beginning of the file where this part of the file begins. Thus @code{size} plus @code{offset} should equal the original size of the file.

@item LF_VOLHDR
@itemx 'V'
This file type is used to mark the volume header that was given with the @samp{-V} option when the archive was created. The @code{name} field contains the @code{name} given after the @samp{-V} option. The @code{size} field is zero. Only the first file in each volume of an archive should have this type.
@end table

EXTENSION:
OCCURENCES:
PROGRAMS:
REFERENCE:
SEE ALSO:
VALIDATION:
OFFSET              Count TYPE   Description
0000h                 257 byte   Other header info ?
0101h                   6 char   ID='ustar',0
EXTENSION:TAR
OCCURENCES:PC, Unix
PROGRAMS:TAR
--------G-TDDD------------------------------
This format is used by the Imagine rendering package. The names of the blocks are unknown to me.

OFFSET              Count TYPE   Description
EXTENSION:IFF
OCCURENCES:Amiga,PC
PROGRAMS:Imagine package
REFERENCE:DDJ0794
SEE ALSO:IFF
--------I-TIFF------------------------------
The TIFF file format was designed jointly by Aldus and Microsoft with leading scanner vendors to facilitate incorporating scanned images into publishing. The described TIFF specification is TIFF 5.0. A TIFF file consists of several different blocks which define the palette data or the LZW-compressed body among other things.
TIFF files can be in Motorola _or_ Intel byte order, depending on the first word. If it is 'II', the byte order is Intel order; if it is 'MM', the byte order is Motorola order. Each TIFF file begins with an image file header which points to one or more image file directories, which contain the image data and image information.

The format of the image header :
OFFSET              Count TYPE   Description
0000h                   2 char   ID='II', ID='MM'
                                 This is the identification, 'II' stands for
                                 Intel byte order, 'MM' for Motorola byte
                                 order. The following data must be
                                 interpreted accordingly !
0002h                   1 word   TIFF "version number". This version number
                                 never changed and the value (42) was chosen
                                 for its deep philosophical value.
                                 In fact, if the version number ever changes,
                                 this means that radical changes to the TIFF
                                 format have been made, and a TIFF reader
                                 should give up immediately. You can consider
                                 this word to be a part of the header ID.
0004h                   1 dword  Offset of first image directory in file,
                                 from start of file. The first image
                                 directory must begin on an even byte
                                 boundary. The image directory may follow the
                                 image data it describes. The image directory
                                 is described below.

An organization may wish to store information that is meaningful to only that organization in a TIFF file. Tags numbered 32768 or higher are reserved for that purpose. Upon request, the administrator will allocate and register a block of private tags for an organization. Private enumerated values can be accommodated in a similar fashion.

The format of the image file directory (IFD) :
All entries are sorted in ascending order by the tag field.
OFFSET              Count TYPE   Description
0000h                   1 word   Number of entries
                                 ="NUM"
0002h               "NUM" rec    Field descriptor
                        1 word   Field tag, see below
                        1 word   Field type
                                 1 - byte
                                 2 - ASCII string, counted in length.
                                     Most often an ASCIIZ string, the
                                     trailing zero is counted with the
                                     data length.
                                 3 - word
                                 4 - dword / uword
                                 5 - rational (2 dwords, numerator and
                                     denominator)
                        1 dword  Length of the field in units of the data
                                 type. A single 16-bit word has the length 1.
                        1 dword  Data offset of the field.
                                 The data starts on a word boundary, thus
                                 the dword should be even. The data for the
                                 field may be anywhere in the file, even
                                 after the image data. If the data size is
                                 less than or equal to 4 bytes (determined
                                 by the field type and length), then this
                                 field holds not an offset but the data
                                 itself, to save space. If the data size is
                                 less than 4 bytes, the data is stored
                                 left-justified within the 4 bytes of the
                                 offset field.
0002h+"NUM"*12          1 dword  Offset of next IFD in file, 0 if none follow

If a certain field in the IFD does not exist, you have to presume the default values. The different fields are :

--- BitsPerSample
Tag = 258 (102)
Type = word
N = SamplesPerPixel
Default = 1
Number of bits per sample. Note that this tag allows a different number of bits per sample for each sample corresponding to a pixel. For example, RGB color data could use a different number of bits per sample for each of the three color planes.

--- ColorMap
Tag = 320 (140)
Type = word
N = 3 * (2**BitsPerSample)
No default. ColorMap must be included in all palette color images.
This tag defines a Red-Green-Blue color map for palette color images. The palette color pixel value is used to index into all 3 subcurves. The subcurves are stored sequentially. The Red entries come first, followed by the Green entries, followed by the Blue entries. The width of each entry is 16 bits, as implied by the type of word. 0 represents the minimum intensity, and 65535 represents the maximum intensity.

--- ColorResponseCurves
Tag = 301 (12D)
Type = word
N = 3 * (2**BitsPerSample)
Default: curves based on the NTSC recommended gamma of 2.2.
This tag defines three color response curves, one each for Red, Green and Blue color information.
The Red entries come first, followed by the Green entries, followed by the Blue entries. The length of each subcurve is 2**BitsPerSample, using the BitsPerSample value corresponding to the respective primary. The width of each entry is 16 bits, as implied by the type of word. The purpose of the color response curves is to refine the content of RGB color images. --- Compression Tag = 259 (103) Type = word N = 1 Default = 1. 1 = No compression, but pack data into bytes as tightly as possible, with no unused bits except at the end of a row. The bytes are stored as an array of bytes, for BitsPerSample <= 8, word if BitsPerSample > 8 and <= 16, and dword if BitsPerSample > 16 and <= 32. The byte ordering of data >8 bits must be consistent with that specified in the TIFF file header (bytes 0 and 1). Rows are required to begin on byte boundaries. 2 = CCITT Group 3 1-Dimensional Modified Huffman run length encoding. See ALGRTHMS.txt BitsPerSample must be 1, since this type of compression is defined only for bilevel images (like FAX images...) 3 = Facsimile-compatible CCITT Group 3, exactly as specified in "Standardization of Group 3 facsimile apparatus for document transmission," Recommendation T.4, Volume VII, Fascicle VII.3, Terminal Equipment and Protocols for Telematic Services, The International Telegraph and Telephone Consultative Committee (CCITT), Geneva, 1985, pages 16 through 31. Each strip must begin on a byte boundary. (But recall that an image can be a single strip.) Rows that are not the first row of a strip are not required to begin on a byte boundary. The data is stored as bytes, not words - byte-reversal is not allowed. See the Group3Options field for Group 3 options such as 1D vs 2D coding. 
4 = Facsimile-compatible CCITT Group 4, exactly as specified in "Facsimile Coding Schemes and Coding Control Functions for Group 4 Facsimile Apparatus," Recommendation T.6, Volume VII, Fascicle VII.3, Terminal Equipment and Protocols for Telematic Services, The International Telegraph and Telephone Consultative Committee (CCITT), Geneva, 1985, pages 40 through 48. Each strip must begin on a byte boundary. Rows that are not the first row of a strip are not required to begin on a byte boundary. The data is stored as bytes, not words. See the Group4Options field for Group 4 options. 5 = LZW Compression, for grayscale, mapped color, and full color images. See ALGRTHMS.txt 32773 = PackBits compression, a simple byte oriented run length scheme for 1-bit images. See Appendix C. Data compression only applies to raster image data, as pointed to by StripOffsets. --- GrayResponseCurve Tag = 291 (123) Type = word N = 2**BitsPerSample The purpose of the gray response curve and the gray units is to provide more exact photometric interpretation information for gray scale image data, in terms of optical density. --- GrayResponseUnit Tag = 290 (122) Type = word N = 1 For historical reasons, the default is 2. However, for greater accuracy, 3 is recommended. 1 = Number represents tenths of a unit. 2 = Number represents hundredths of a unit. 3 = Number represents thousandths of a unit. 4 = Number represents ten-thousandths of a unit. 5 = Number represents hundred-thousandths of a unit. --- ImageLength Tag = 257 (101) Type = word or dword N = 1 No default. The image's length (height) in pixels (Y:vertical). The number of rows (sometimes described as "scan lines") in the image. --- ImageWidth Tag = 256 (100) Type = word or dword N = 1 No default. The image's width, in pixels (X:horizontal). The number of columns in the image. --- NewSubfileType Tag = 254 (FE) Type = dword N = 1 Default is 0. A general indication of the kind of data that is contained in this subfile. 
This field is made up of a set of 32 flag bits. Unused bits are expected to be 0. Bit 0 is the low-order bit. Currently defined values for the bitmap are:
0 - Image is a reduced-resolution version of another TIFF image in this file
1 - Image is a single page of a multi-page image
2 - Image is a transparency mask for another image in this file

--- PhotometricInterpretation
Tag = 262 (106)
Type = word
N = 1
No default.
0 = For bilevel and grayscale images: 0 is imaged as white. 2**BitsPerSample-1 is imaged as black. If GrayResponseCurve exists, it overrides the PhotometricInterpretation value.
1 = For bilevel and grayscale images: 0 is imaged as black. 2**BitsPerSample-1 is imaged as white. If GrayResponseCurve exists, it overrides the PhotometricInterpretation value.
2 = RGB. In the RGB model, a color is described as a combination of the three primary colors of light (red, green, and blue) in particular concentrations. For each of the three samples, 0 represents minimum intensity, and 2**BitsPerSample - 1 represents maximum intensity. For PlanarConfiguration = 1, the samples are stored in the indicated order: first Red, then Green, then Blue. For PlanarConfiguration = 2, the StripOffsets for the sample planes are stored in the indicated order: first the Red sample plane StripOffsets, then the Green plane StripOffsets, then the Blue plane StripOffsets.
3 = "Palette color." In this mode, a color is described with a single sample. The sample is used as an index into ColorMap. The sample is used to index into each of the red, green and blue curve tables to retrieve an RGB triplet defining an actual color. When this PhotometricInterpretation value is used, the color response curves must also be supplied. SamplesPerPixel must be 1.
4 = Transparency Mask. This means that the image is used to define an irregularly shaped region of another image in the same TIFF file. SamplesPerPixel and BitsPerSample must be 1. PackBits compression is recommended.
The 1-bits define the interior of the region; the 0-bits define the exterior of the region. The Transparency Mask must have the same ImageLength and ImageWidth as the main image. PlanarConfiguration Tag = 284 (11C) Type = word N = 1 Default is 1. 1 = The sample values for each pixel are stored contiguously, so that there is a single image plane. See PhotometricInterpretation to determine the order of the samples within the pixel data. So, for RGB data, the data is stored RGBRGBRGB...and so on. 2 = The samples are stored in separate "sample planes." The values in StripOffsets and StripByteCounts are then arranged as a 2-dimensional array, with SamplesPerPixel rows and StripsPerImage columns. (All of the columns for row 0 are stored first, followed by the columns of row 1, and so on.) PhotometricInterpretation describes the type of data that is stored in each sample plane. For example, RGB data is stored with the Red samples in one sample plane, the Green in another, and the Blue in another. If SamplesPerPixel is 1, PlanarConfiguration is irrelevant, and should not be included. Predictor Tag = 317 (13D) Type = word N = 1 Default is 1. To be used when Compression=5 (LZW). 1 = No prediction scheme used before coding. 2 = Horizontal differencing. See Appendix I. ResolutionUnit Tag = 296 (128) Type = word N = 1 Default is 2. To be used with XResolution and YResolution. 1 = No absolute unit of measurement. Used for images that may have a non-square aspect ratio, but no meaningful absolute dimensions. The drawback of ResolutionUnit=1 is that different applications will import the image at different sizes. Even if the decision is quite arbitrary, it might be better to use dots per inch or dots per centimeter, and pick XResolution and YResolution such that the aspect ratio is correct and the maximum dimension of the image is about four inches (the "four" is quite arbitrary.) 2 = Inch. 3 = Centimeter. 
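As a rough illustration of the two PlanarConfiguration layouts described above, the following Python sketch converts between the contiguous form (RGBRGBRGB...) and separate sample planes. It assumes 8 bits per sample and uncompressed data held in a single buffer, and it ignores strips entirely; the function names are hypothetical.

```python
def chunky_to_planar(data, samples_per_pixel):
    """Split PlanarConfiguration=1 bytes (RGBRGB...) into one plane per sample.

    Assumes every sample is exactly one byte (BitsPerSample = 8).
    """
    if samples_per_pixel < 1:
        raise ValueError("samples_per_pixel must be at least 1")
    if len(data) % samples_per_pixel:
        raise ValueError("data length is not a whole number of pixels")
    # Plane i takes every samples_per_pixel-th byte, starting at byte i.
    return [data[i::samples_per_pixel] for i in range(samples_per_pixel)]


def planar_to_chunky(planes):
    """Interleave equally sized sample planes back into contiguous pixels."""
    if not planes:
        return b""
    size = len(planes[0])
    if any(len(plane) != size for plane in planes):
        raise ValueError("all sample planes must have the same length")
    out = bytearray()
    for pixel in zip(*planes):  # one sample from each plane per pixel
        out.extend(pixel)
    return bytes(out)
```

For two RGB pixels stored as the bytes 1 2 3 4 5 6, `chunky_to_planar(data, 3)` gives a Red plane of 1 4, a Green plane of 2 5 and a Blue plane of 3 6, and `planar_to_chunky` reverses the split.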
RowsPerStrip
Tag = 278 (116)
Type = word or dword
N = 1
Default is 2**32 - 1, which is effectively infinity. That is, the entire image is one strip. A strip size of about 8K is recommended.
The number of rows per strip. The image data is organized into strips for fast access to individual rows when the data is compressed - though this field is valid even if the data is not compressed.

--- SamplesPerPixel
Tag = 277 (115)
Type = word
N = 1
Default = 1.
The number of samples per pixel. SamplesPerPixel is 1 for bilevel, grayscale, and palette color images. SamplesPerPixel is 3 for RGB images.

--- StripByteCounts
Tag = 279 (117)
Type = word or dword
N = StripsPerImage for PlanarConfiguration equal to 1.
  = SamplesPerPixel * StripsPerImage for PlanarConfiguration equal to 2
No default.
For each strip, the number of bytes in that strip. The existence of this field greatly simplifies the chore of buffering compressed data, if the strip size is reasonable.

--- StripOffsets
Tag = 273 (111)
Type = word or dword
N = StripsPerImage for PlanarConfiguration equal to 1.
  = SamplesPerPixel * StripsPerImage for PlanarConfiguration equal to 2
No default.
For each strip, the byte offset of that strip. The offset is specified with respect to the beginning of the TIFF file. Note that this implies that each strip has a location independent of the locations of other strips. This feature may be useful for editing applications. This field is the only way for a reader to find the image data, and hence must exist.

--- XResolution
Tag = 282 (11A)
Type = RATIONAL
N = 1
No default.
The number of pixels per ResolutionUnit in the X direction, i.e., in the ImageWidth direction.

--- YResolution
Tag = 283 (11B)
Type = RATIONAL
N = 1
No default.
The number of pixels per ResolutionUnit in the Y direction, i.e., in the ImageLength direction.

--- Artist
Tag = 315 (13B)
Type = ASCII
Person who created the image. Copyright notice.

--- DateTime
Tag = 306 (132)
Type = ASCII
N = 20
Date and time of image creation.
Uses the format "YYYY:MM:DD HH:MM:SS", with hours on a 24-hour clock, and one space character between the date and the time. The length of the string, including the null, is 20 bytes. --- HostComputer Tag = 316 (13C) Type = ASCII "ENIAC", or whatever. --- ImageDescription Tag = 270 (10E) Type = ASCII For example, a user may wish to attach a comment such as "1988 company picnic" to an image. --- Make Tag = 271 (10F) Type = ASCII Manufacturer of the scanner, video digitizer, or whatever. --- Model Tag = 272 (110) Type = ASCII The model name/number of the scanner, video digitizer, or whatever. This tag is intended for user information only so format is arbitrary. --- Software Tag = 305 (131) Type = ASCII Name and release number of the software package that created the image. User information only. --- Group3Options Tag = 292 (124) Type = dword N = 1 Those options are for fax-images stored in TIFF format. This field is made up of a set of 32 flag bits. Unused bits are expected to be 0. It is probably not safe to try to read the file if any bit of this field is set that you don't know the meaning of. Bit map : 0 - 2-dimensional coding used. 1 - Image is uncompressed 2 - Fill bits have been added before EOL codes, so that EOL always ends on a byte boundary. --- Group4Options Tag = 293 (125) Type = dword N = 1 This field is made up of a set of 32 flag bits and is used for the images with fax group 4 compression. Unused bits are expected to be 0. It is probably not safe to try to read the file if any bit of this field is set that you don't know the meaning of. Gray scale and color coding schemes are under study, and will be added when finalized. For 2-D coding, each strip is encoded as if it were a separate image. In particular, each strip begins on a byte boundary; and the coding for the first row of a strip is encoded independently of the previous row, using horizontal codes, as if the previous row is entirely white. 
Each strip ends with the 24-bit end-of-facsimile block (EOFB). Bit map : 0 - reserved (unused) 1 - uncompressed mode is used 2-31 - reserved --- DocumentName Tag = 269 (10D) Type = ASCII The name of the document from which this image was scanned. --- PageName Tag = 285 (11D) Type = ASCII The name of the page from which this image was scanned. --- PageNumber Tag = 297 (129) Type = word N = 2 This tag is used to specify page numbers of a multiple page (e.g. facsimile) document. Two word values are specified. The first value is the page number; the second value is the total number of pages in the document. Note that pages need not appear in numerical order. The first page is 0 (zero). --- XPosition Tag = 286 (11E) Type = RATIONAL The X offset of the left side of the image, with respect to the left side of the page, in ResolutionUnits. --- YPosition Tag = 287 (11F) Type = RATIONAL The Y offset of the top of the image, with respect to the top of the page, in ResolutionUnits. In the TIFF coordinate scheme, the positive Y direction is down, so that YPosition is always positive. --- White Point Tag = 318 (13E) Type = RATIONAL N = 2 Default is the SMPTE white point, D65: x = 0.313, y = 0.329. The white point of the image. Note that this value is described using the 1931 CIE xyY chromaticity diagram and only the chromaticity is specified. The luminance component is arbitrary and not specified. This can correspond to the white point of a monitor that the image was painted on, the filter set/light source combination of a scanner, or to the white point of the illumination model of a rendering package. The ordering is x, y. --- PrimaryChromaticities Tag = 319 (13F) Type = RATIONAL N = 6 Default is the SMPTE primary color chromaticities: Red: x = 0.635 y = 0.340 Green: x = 0.305 y = 0.595 Blue: x = 0.155 y = 0.070 The primary color chromaticities. 
Note that these values are described using the 1931 CIE xyY chromaticity diagram and only the chromaticities are specified.For paint images, these represent the chromaticities of the monitor and for scanned images they are derived from the filter set/light source combination of a scanner. The ordering is red x, red y, green x, green y, blue x, blue y. --- SubfileType Tag = 255 (FF) Type = word N = 1 A general indication of the kind of data that is contained in this subfile. Currently defined values are: 1 = full resolution image data - ImageWidth, ImageLength, and StripOffsets are required fields 2 = reduced resolution image data - ImageWidth, ImageLength, and StripOffsets are required fields. It is further assumed that a reduced resolution image is a reduced version of the entire extent of the corresponding full resolution data. 3 = single page of a multi-page image (see the PageNumber tag description). Continued use of this field is not recommended. Writers should instead use the new and more general NewSubfileType field. --- Orientation Tag = 274 (112) Type = word N = 1 Default is 1. 1 = The 0th row represents the visual top of the image, and the 0th column represents the visual left hand side. 2 = The 0th row represents the visual top of the image, and the 0th column represents the visual right hand side. 3 = The 0th row represents the visual bottom of the image, and the 0th column represents the visual right hand side. 4 = The 0th row represents the visual bottom of the image, and the 0th column represents the visual left hand side. 5 = The 0th row represents the visual left hand side of the image, and the 0th column represents the visual top. 6 = The 0th row represents the visual right hand side of the image, and the 0th column represents the visual top. 7 = The 0th row represents the visual right hand side of the image, and the 0th column represents the visual bottom. 
8 = The 0th row represents the visual left hand side of the image, and the 0th column represents the visual bottom. It is extremely costly for most readers to perform image rotation "on the fly", i.e., when importing and printing; and users of most desktop publishing applications do not expect a file imported by the application to be altered permanently in any way. Threshholding Tag = 263 (107) Type = word N = 1 1 = a bilevel "line art" scan. BitsPerSample must be 1. 2 = a "dithered" scan, usually of continuous tone data such as photographs. BitsPerSample must be 1. 3 = Error Diffused. ColorImageType Tag = 318 (13E) Type = word N = 1 Default is 1. Gives TIFF color image readers a better idea of what kind of color image it is. There will be borderline cases. 1 = Continuous tone, natural image. 2 = Synthetic image, using a greatly restricted range of colors. Such images are produced by most color paint programs. See ColorList for a list of colors used in this image. ColorList Tag = 319 (13F) Type = BYTE or word N = the number of colors that are used in this image, times SamplesPerPixel A list of colors that are used in this image. Use of this field is only practical for images containing a greatly restricted (usually less than or equal to 256) range of colors. ColorImageType should be 2. See ColorImageType. The list is organized as an array of RGB triplets, with no pad. The RGB triplets are not guaranteed to be in any particular order. Note that the red, green, and blue components can either be a BYTE or a word in length. BYTE should be sufficient for most applications. EXTENSION:TIF,TIFF OCCURENCES:PC,MAC,UNIX PROGRAMS:Aldus Pagemaker, Paintbrush REFERENCE: SEE ALSO: VALIDATION: --------I-TARGA----------------------------- The Targa-File format is an image file format used by a wide variety of both scanners and imaging software, and exists in many incarnations. 
The information has been taken from Appendix C of the Truevision Technical Guide. Requests for further information could be directed to:

    AT&T
    Electronic Photography and Imaging Center
    2002 Wellesley Ave.
    Indianapolis, IN 42619

The lack of completeness is due to the fact that the Targa recognizes over half a dozen image file formats, some of which are more widely used than others.

OFFSET   Count TYPE   Description
0000h        1 byte   Length of image identification field (below)
0001h        1 byte   Color map type :
                      0 - no color map
                      1 - 256 entry palette
0002h        1 byte   Image type :
                       0 - no image data included
                       1 - Uncompressed, color-mapped image
                       2 - Uncompressed, RGB image
                       3 - Uncompressed, black and white image
                       9 - RLE encoded color-mapped image
                      10 - RLE encoded RGB image
                      11 - Compressed, black and white image
                      32 - Compressed color-mapped data, using Huffman, Delta, and runlength encoding.
                      33 - Compressed color-mapped data, using Huffman, Delta, and RLE. 4-pass quadtree-type process.
0003h        1 word   Index of first color map entry
0005h        1 word   Count of color map entries
0007h        1 byte   Number of bits per color map entry
0008h        1 word   X coordinate of the lower left corner of the image.
000Ah        1 word   Y coordinate of the lower left corner of the image.
000Ch        1 word   Width of the image in pixels
000Eh        1 word   Height of the image in pixels
0010h        1 byte   Bits per pixel
0011h        1 byte   Flags (bitmapped):
                      0-3 : Number of attribute bits
                        4 : reserved
                        5 : Screen origin in upper left corner
                      6-7 : Data storage interleave
                            00 - no interleave
                            01 - even/odd interleave
                            10 - four way interleave
                            11 - reserved
                      The byte should be set to 0. Don't know why.
0012h        ? char   Image identification string, usually not there, when the length (see above) is 0.
????h        ? byte   Color map data
                      Depending on the number of bits per color map entry, the entries here have a different size.
                      4 bytes : 1 byte for blue
                                1 byte for green
                                1 byte for red
                                1 byte for attribute
                      3 bytes : 1 byte for blue
                                1 byte for green
                                1 byte for red
                      2 bytes : Bitmapped as a word in Intel byte order as follows :
                                ARRRRRGG GGGBBBBB
????h        ? byte   Image data
                      For images of type 9 (using RLE), the image data is divided into packets, the first byte being the indicator for repetition or copy. If bit 7 of the first byte is set, then repeat the next byte ((first byte AND 07Fh)+1) times, otherwise copy (first byte+1) pixels from the data stream. RLE packets may cross scan lines !
EXTENSION:TGA
OCCURENCES:PC
SEE ALSO:

--------S-TXW-------------------------------
The TXW files are disk images used by the Yamaha TX-16W. Further information wanted.

EXTENSION:TXW

--------S-UWF-G-----------------------------
The UWF files are sample files used by the UltraTracker. Further information wanted.

OFFSET   Count TYPE   Description
0000h       32 char   ASCIIZ sample name
0020h        1 char   ID=1Ah
0021h        1 char   ID=10h
0022h        5 char   ID='MUWFB'
0027h        1 char   ID=0
0028h        6 char   Length of sample as ASCII long integer
002Eh        1 word   Length of sample ?????
EXTENSION:UWF
SEE ALSO:ULT

--------M-ULT-------------------------------
The ULT files are modules used by the UltraTracker. UltraTracker is a module editor for the Gravis UltraSound soundcard. The version of the file format used now is 6.
OFFSET              Count TYPE   Description
0000h                  11 char   ID="MAS_UTrack_V"
000Ch                   4 char   Version number in 4-digit ASCII :
                                 1 - ULT version 1.0
                                 2 - ULT version 2.0
                                 3 - ULT version 2.1
                                 4 - ULT version 2.2
000Fh                  32 char   Song title
002Fh                   1 byte   Number of song text lines
                                 ="NTL"
0030h            "NTL"*32 char   Song text
0030h+"NTL"*32          1 byte   Number of samples
                                 ="NOS"
0031h+"NTL"*32      "NOS" rec    Sample structure
                       32 byte   Sample name
                       12 byte   DOS file name of sample
                        1 dword  Sample loop start
                        1 dword  Sample loop end
                        1 dword  Size start
                        1 dword  Size end
                        1 byte   Sample volume (linear)
                        1 byte   Bidirectional loop
                                  0 - No looping, forward playback, 8bit sample
                                  4 - No Looping, forward playback, 16bit sample
                                  8 - Loop Sample, forward playback, 8bit sample
                                 12 - Loop Sample, forward playback, 16bit sample
                                 24 - Loop Sample, reverse playback, 8bit sample
                                 28 - Loop Sample, reverse playback, 16bit sample
                        1 word   Fine tune setting
                        1 word   C2-Frequency
0031h+"NTL"*32
     +"NOS"*64        256 byte   Pattern orders
0131h+"NTL"*32
     +"NOS"*64          1 byte   ="NOT" Number of tracks -1
0132h+"NTL"*32
     +"NOS"*64          1 byte   ="NOT" Number of patterns -1
0133h+"NTL"*32
     +"NOS"*64      "NOT" byte   Pan-position table (0-left, F-right)

After the header there comes the event data.

EXTENSION:ULT
SEE ALSO:UWF

--------S-WAVE------------------------------
The Windows .WAV files are RIFF format files. Some programs expect the fmt block right behind the RIFF header itself, so your programs should write out this block as the first block in the RIFF file.

The subblocks for the wave files are

RiffBLOCK [data]
This block contains the raw sample data. The necessary information for playback is contained in the [fmt ] block.

RiffBLOCK [fmt ]
This block contains the data necessary for playback of the sound files. Note the blank after fmt !

OFFSET   Count TYPE   Description
0000h        1 word   Format tag
                      1 = PCM (raw sample data)
                      2 etc. for ADPCM, a-Law, u-Law ...
0002h        1 word   Channels (1=mono,2=stereo,...)
0004h        1 dword  Sampling rate
0008h        1 dword  Average bytes per second (=sampling rate*block alignment)
000Ch        1 word   Block alignment (=channels*bits per sample/8)
000Eh        1 word   Bits per sample (8/12/16-bit samples)

RiffBLOCK [loop]
This block is for looped samples. Very few programs support this block, but if your program changes the wave file, it should preserve any unknown blocks.

OFFSET   Count TYPE   Description
0000h        1 dword  Start of sample loop
0004h        1 dword  End of sample loop
EXTENSION:WAV
SEE ALSO:RIFF,VOC
OCCURENCES:PC
PROGRAMS:Windows,GUSWAV,WAV2VOC
VALIDATION:NONE

--------E-Windows PIF-----------------------
Windows also uses the PIF files for better performance under the DOS box. The Windows extension of the original PIF format starts at offset 0171h.

OFFSET   Count TYPE   Description
********* not yet implemented ;-)
EXTENSION:
OCCURENCES:
PROGRAMS:
REFERENCE:DDJ #202
SEE ALSO:PIF, WINDOWS NT PIF
VALIDATION:

--------W-WKS-------------------------------
The WKS files are worksheets/spreadsheets used by the Lotus 1-2-3 and Lotus Symphony packages. More information has yet to be found, since this information originates from a magic file.

OFFSET   Count TYPE   Description
0000h        5 byte   ID=0,0,2,0,4
0005h        1 byte   WKS type :
                      4 - Lotus 1-2-3 v1.A WKS
                      5 - Symphony 1.0 WKS
                      other - ?WK1 file? (Lotus 2.01+, Symphony 1.1+)
EXTENSION:WKS
OCCURENCES:PC
PROGRAMS:Lotus 1-2-3,Lotus Symphony
SEE ALSO:WKS

--------T-WORD-G----------------------------
The Microsoft Word programs store their documents in files. The info comes from a magic file and my own (not working) sources, so it is very unreliable except for identification.

OFFSET   Count TYPE   Description
0000h        1 dword  ID=31BE00
0002h        1 byte   Document type :
                      0 - MS Word text
                      1 - MS Text building block
                      2 - Printer description file(maybe wrong topic)
0003h        1 byte   ID=00
0004h        1 word   ID=AB00h
                      ToolID, different for the different versions ?
0006h        6 word   reserved(0)
0008h        1 dword  Textbytes???
                      Whatever
000Ch        1 word   Paragraph information
000Eh        1 word   Foot note table
0010h        1 word   Section property
0012h        1 word   Section table
0014h        1 word   Page table
0016h       64 char   Style sheet path
0056h        1 word   Windows Write page count
                      Can be used to identify Windows Write files, because it is 0 for MS Word and nonzero for Windows Write documents.
0058h        8 char   Printer name
                      Used under MS Word / WinWord only
0060h        1 word   MS Word page count
0062h        8 byte   Document properties
006Ah        1 byte   Word version this file was made by
006Bh        1 bool   Autosave flag
006Ch        1 word   Word 5 page table
006Eh        1 word   Mac bkmk (whatever)
0070h        1 word   ?Offset of file name for autosave?
0072h        1 word   Running head table
0074h        1 word   Code page used making this document
EXTENSION:DOC
OCCURENCES:PC
PROGRAMS:MS Word,Windows Write, WinWord
SEE ALSO:
VALIDATION:

--------T-WORDPERFECT FILES-----------------
The WordPerfect files all have a common header, even though I don't know anything else about them.

OFFSET   Count TYPE   Description
0000h        4 char   ID=255,"WPC"
0004h        4 byte   unknown
0008h        1 byte   ID=1
0009h        1 byte   Filetype (see table 0003)

(Table 0003)
File types of WordPerfect files
01h - macro file
02h - WordPerfect help file
03h - keyboard definition file
0Ah - document file
0Bh - dictionary file
0Ch - thesaurus file
0Dh - block
0Eh - rectangular block
0Fh - column block
10h - printer resource file (PRS)
11h - setup file
12h - prefix information file
13h - printer resource file (ALL)
14h - display resource file (DRS)
15h - overlay file (WP.FIL)
16h - graphics file (WPG)
17h - hyphenation code module
18h - hyphenation data module
19h - macro resource file (MRS)
1Ah - graphics driver (WPD)
1Bh - hyphenation lex module

EXTENSION:various
OCCURENCES:PC

--------W-WQ1-------------------------------
The Quattro Pro spreadsheet files are similar to the WKS spreadsheet files, and their header is somewhat similar too. Info again from a magic file, which makes only identification possible.
OFFSET   Count TYPE   Description
0000h        1 dword  ID=00000200h
0004h        1 char   ID='Q'
EXTENSION:WQ1
OCCURENCES:PC
PROGRAMS:Borland Quattro Pro
REFERENCE:
SEE ALSO:WKS
VALIDATION:

--------M-XM--------------------------------
The .XM files (Extended Module) are multichannel MOD files created by Triton's FastTracker ][. They feature up to 32 channels and different effects. FT 2 is a shareware program. After the initial .XM header follows the pattern data, after the patterns follow the instruments.

OFFSET   Count TYPE   Description
0000h       17 char   ID="Extended module: "
0011h       20 char   Module name, padded with zeroes
0025h        1 char   ID=01Ah
0026h       20 char   Tracker name
003Ah        1 word   Tracker revision number, hi-byte is major version
003Ch        1 dword  Header size
0040h        1 word   Song length in patterns
0042h        1 word   Restart position
0044h        1 word   Number of channels
0046h        1 word   Number of patterns (< 256)
                      ="PAT"
0048h        1 word   Number of instruments (<128)
004Ah        1 word   Flags :
                      0 - Linear frequency table / Amiga freq. table
004Ch        1 word   Default tempo
004Eh        1 word   Default BPM
0050h      256 byte   Pattern order table

--- Pattern header
The patterns are stored as ordinary MOD patterns, except that each note is stored as 5 bytes:

?            1 (byte) Note (0-71, 0 = C-0)
+1           1 (byte) Instrument (0-128)
+2           1 (byte) Volume column byte (see below)
+3           1 (byte) Effect type
+4           1 (byte) Effect parameter

A simple packing scheme is also adopted, so that the patterns do not become TOO large: Since the MSB in the note value is never used, it is used for the compression. If the bit is set, then the other bits are interpreted as follows:

bit 0 set: Note byte follows
    1 set: Instrument byte follows
    2 set: Volume column byte follows
    3 set: Effect byte follows
    4 set: Effect data byte follows

OFFSET   Count TYPE   Description
0000h        1 dword  Length of pattern block/header ??
0004h        1 byte   Pattern pack type
0005h        1 word   Number of rows in pattern (1..256)
0007h        1 word   Size of pattern data
                      ="PSZ"
         "PSZ" byte   Pattern data

--- Instrument header
Each instrument has one or more sample headers following it.

OFFSET   Count TYPE   Description
0000h        1 dword  Instrument block/header size
0004h       22 char   ASCII Instrument name, 0 padded ?
001Ah        1 byte   Instrument type (always 0)
001Bh        1 word   Number of samples in instrument
001Dh        1 dword  Sample header size
0021h       96 byte   Sample numbers for all notes
0081h       48 byte   Points of volume envelope
00C1h       48 byte   Points of panning envelope
0101h        1 byte   Number of volume points
0102h        1 byte   Number of panning points
0103h        1 byte   Volume sustain point
0104h        1 byte   Volume loop start point
0105h        1 byte   Volume loop end point
0106h        1 byte   Panning sustain point
0107h        1 byte   Panning loop start point
0108h        1 byte   Panning loop end point
0109h        1 byte   Volume type, bitmapped
                      0 - Volume on
                      1 - Sustain on
                      2 - Loop on
010Ah        1 byte   Panning type, bitmapped
                      0 - Panning on
                      1 - Sustain on
                      2 - Loop on
010Bh        1 byte   Vibrato type
010Ch        1 byte   Vibrato sweep
010Dh        1 byte   Vibrato depth
010Eh        1 byte   Vibrato rate
010Fh        1 word   Volume fadeout
0111h        1 word   Reserved

--- Sample headers
OFFSET   Count TYPE   Description
0000h        1 dword  Sample length
                      ="LEN"
0004h        1 dword  Sample loop start
0008h        1 dword  Sample loop length
000Ch        1 byte   Volume
000Dh        1 byte   Finetune for sample (-128..+127)
                      +-127 is one half tone
000Eh        1 byte   Sample type, bitmapped
                      0,1 : Loop type :
                            0 - no loop
                            1 - forward loop
                            2 - ping-pong loop
                            3 - reserved
                      4?: sample is 16-bit
000Fh        1 byte   Sample pan
0010h        1 byte   Relative note number (signed byte)
                      (-96..+95), 0 -> C-4 sounds as C-4
0011h        1 byte   Reserved
0012h       22 char   ASCII name of sample, 0 padded
0013h    "LEN" byte   Sample data. The sample data is stored as delta compressed data like the ProTracker.
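
As a rough illustration of the delta storage mentioned for the sample data, the following Python sketch rebuilds sample values by keeping a running sum of the stored deltas. The function name decode_delta, the wrap-around arithmetic and the little-endian handling of 16-bit samples are assumptions made for this example; the table above only states that the data is delta compressed.

```python
import struct


def decode_delta(raw: bytes, sixteen_bit: bool = False) -> list[int]:
    """Rebuild signed sample values from delta-stored sample data.

    Assumption: every stored value is the difference to the previous
    sample, starting from 0, with wrap-around at the sample width.
    """
    width, mask, sign = (2, 0xFFFF, 0x8000) if sixteen_bit else (1, 0xFF, 0x80)
    count = len(raw) // width
    # 16-bit deltas are read as little-endian words (assumed, Intel order).
    fmt = "<%d%s" % (count, "H" if sixteen_bit else "B")
    deltas = struct.unpack(fmt, raw[: count * width])

    samples = []
    value = 0
    for delta in deltas:
        value = (value + delta) & mask          # running sum with wrap-around
        # Re-interpret the unsigned running value as a signed sample.
        samples.append(value - (mask + 1) if value & sign else value)
    return samples
```

For example, the 8-bit deltas 01h, 01h, 01h, FFh decode to the samples 1, 2, 3, 2 under these assumptions.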
EXTENSION:XM,MOD
OCCURENCES:
PROGRAMS:
REFERENCE:
SEE ALSO:MOD,S3M
VALIDATION:

--------A-ZIP-------------------------------
The ZIP archives are created by the PkZIP/PkUnZIP combo produced by the PkWare company. Together with LHArc and ARJ, the PkZIP programs offer the best compression. The directory information is stored at the end of the archive. Each local file in the archive begins with the following header, which can also be used to identify a ZIP file as such :

OFFSET   Count TYPE   Description
0000h        4 char   ID='PK',03,04
0004h        1 word   Version needed to extract archive
0006h        1 word   General purpose bit field (bit mapped)
                         0 - file is encrypted
                         1 - 8K/4K sliding dictionary used
                         2 - 3/2 Shannon-Fano trees were used
                       3-4 - unused
                      5-15 - used internally by ZIP
                      Note: Bits 1 and 2 are undefined if the compression method is other than type 6 (Imploding).
0008h        1 word   Compression method (see table 0010)
000Ah        1 dword  Original DOS file date/time (see table 0009)
000Eh        1 dword  32-bit CRC of file (inverse??)
0012h        1 dword  Compressed file size
0016h        1 dword  Uncompressed file size
001Ah        1 word   Length of filename
                      ="LEN"
001Ch        1 word   Length of extra field
                      ="XLN"
001Eh    "LEN" char   path/filename
001Eh    "XLN" char   extra field
+"LEN"

After all the files, there comes the central directory structure.

(Table 0010)
PkZip compression types
0 - Stored / No compression
1 - Shrunk / LZW, 8K buffer, 9-13 bits with partial clearing
2 - Reduced-1 / Probabilistic compression, lower 7 bits
3 - Reduced-2 / Probabilistic compression, lower 6 bits
4 - Reduced-3 / Probabilistic compression, lower 5 bits
5 - Reduced-4 / Probabilistic compression, lower 4 bits
6 - Imploded / 2/3 Shannon-Fano trees, 4K/8K sliding dictionary

--- Central directory structure
The CDS is at the end of the archive and contains additional information about the files stored within the archive.
OFFSET   Count TYPE   Description
0000h        4 char   ID='PK',01,02
0004h        1 byte   Version made by
0005h        1 byte   Host OS (see table 0011)
0006h        1 byte   Minimum version needed to extract
0007h        1 byte   Target OS
                      see above "Host OS"
0008h        1 word   General purpose bit flag
                      see above "General purpose bit flag"
000Ah        1 word   Compression method
                      see above "Compression method"
000Ch        1 dword  DOS date / time of file
0010h        1 dword  32-bit CRC of file (see table 0009)
0014h        1 dword  Compressed size of file
0018h        1 dword  Uncompressed size of file
001Ch        1 word   Length of filename
                      ="LEN"
001Eh        1 word   Length of extra field
                      ="XLN"
0020h        1 word   Length of file comment
                      ="CMT"
0022h        1 word   Disk number ??
0024h        1 word   Internal file attributes (bit mapped)
                         0 - file is apparently an ASCII/binary file
                      1-15 - unused
0026h        1 dword  External file attributes (OS dependent)
002Ah        1 dword  Relative offset of local header from the start of the first disk this file appears on
002Eh    "LEN" char   Filename / path; should not contain a drive or device letter, all slashes should be forward slashes '/'.
002Eh+   "XLN" char   Extra field
+"LEN"
002Eh    "CMT" char   File comment
+"LEN"
+"XLN"

(Table 0011)
PkZip Host OS table
    0 - MS-DOS and OS/2 (FAT)
    1 - Amiga
    2 - VMS
    3 - *nix
    4 - VM/CMS
    5 - Atari ST
    6 - OS/2 1.2 extended file sys
    7 - Macintosh
8-255 - unused

--- End of central directory structure
The End of Central Directory Structure header has the following format :

OFFSET   Count TYPE   Description
0000h        4 char   ID='PK',05,06
0004h        1 word   Number of this disk
0006h        1 word   Number of disk with start of central directory
0008h        1 word   Total number of file/path entries on this disk
000Ah        1 word   Total number of entries in central dir
000Ch        1 dword  Size of central directory
0010h        1 dword  Offset of start of central directory relative to starting disk number
0014h        1 word   Archive comment length
                      ="CML"
0016h    "CML" char   Zip file comment
EXTENSION:ZIP
OCCURENCES:PC,Amiga,ST
PROGRAMS:PkZIP,WinZIP
REFERENCE:Technote.APP

--------A-ZOO-------------------------------
The ZOO archive program by Rahul Dhesi is a file compression program now superseded in both compression and speed by most other compression programs. The archive header looks like this :

OFFSET   Count TYPE   Description
0000h       20 char   Archive header text, ^Z terminated, null padded
0014h        1 dword  ID=0FDC4A7DCh
0018h        1 dword  Offset of first file in archive
001Ch        1 dword  Offset of ????
0020h        1 byte   Version archive was made by
0021h        1 byte   Minimum version needed to extract

Each stored file has its own header, which looks like this :

OFFSET   Count TYPE   Description
0000h        1 dword  ID=0FDC4A7DCh
0004h        1 byte   Type of directory entry
0005h        1 byte   Compression method :
                      0 - stored
                      1 - Crunched : LZW, 4K buffer, var len (9-13 bits)
0006h        1 dword  Offset of next directory entry
000Ah        1 dword  Offset of next header
000Dh        1 word   Original date / time of file (see table 0009)
0012h        1 word   CRC-16 of file
0014h        1 dword  Uncompressed size of file
0018h        1 dword  Compressed size of file
001Ch        1 byte   Version this file was compressed by
001Dh        1 byte   Minimum version needed to extract
001Eh        1 byte   Deleted flag
                      0 - file in archive
                      1 - file is considered deleted
001Fh        1 dword  Offset of comment field, 0 if none
0023h        1 word   Length of comment field
0025h        ? char   ASCIIZ path / filename
EXTENSION:ZOO
OCCURENCES:PC
PROGRAMS:ZOO.EXE
REFERENCE:
VALIDATION:

--------S-ZyXEL-----------------------------
The ZyXEL Modems are capable of digitizing speech, the ZFAX software and answering machine software like VoiceConnect store the sampled data in those files. The Modems are capable of compressing the data down to 19.2k CPS (ADPCM) and 9.6k CPS (CELP), the algorithms for the compression may be found in the ZyxelVoc package by N. Igl, but as the firmware on the modems changes, so might the compression algorithm. Playback on the modem is always possible.

OFFSET   Count TYPE   Description
0000h        5 char   ID='ZyXEL'
0005h        1 byte   02h, ??? format tag
0006h        4 byte   reserved
000Ah        1 word   Compression scheme
                      0 - CELP
                      1 - 2 bit ADPCM
                      2 - 3 bit ADPCM
000Ch        4 byte   reserved
0010h        ? ????   Raw Data
                      The voice data is just the data received from U1496 Modem/Fax.
EXTENSION:ZVD,ZYX
OCCURENCES:PC
PROGRAMS:Voice Connect,ZFAX
REFERENCE:ZYXELVOC.*
VALIDATION:NONE

--------!-ALGORITHMS------------------------
Some algorithms used for encoding images etc...
--- TIFF PackBits algorithm

Abstract
This document describes a simple compression scheme for bilevel scanned and paint type files.

Motivation
The TIFF specification defines a number of compression schemes. Compression type 1 is really no compression, other than basic pixel packing. Compression type 2, based on CCITT 1D compression, is powerful, but not trivial to implement. Compression type 5 is typically very effective for most bilevel images, as well as many deeper images such as palette color and grayscale images, but is also not trivial to implement. PackBits is a simple but often effective alternative.

Description
Several good schemes were already in use in various settings. We somewhat arbitrarily picked the Macintosh PackBits scheme. It is byte oriented, so there is no problem with word alignment. And it has a good worst case behavior (at most 1 extra byte for every 128 input bytes). For Macintosh users, there are toolbox utilities PackBits and UnPackBits that will do the work for you, but it is easy to implement your own routines.

A pseudo code fragment to unpack might look like this (n is read as a signed byte):

    Loop until you get the number of unpacked bytes you are expecting:
        Read the next source byte into n.
        If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
        Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
        Else if n is -128, noop.
    Endloop

In the inverse routine, it's best to encode a 2-byte repeat run as a replicate run except when preceded and followed by a literal run, in which case it's best to merge the three into one literal run. Always encode 3-byte repeats as replicate runs.

So that's the algorithm. Here are some other rules:

o Each row must be packed separately. Do not compress across row boundaries.
o The number of uncompressed bytes per row is defined to be (ImageWidth + 7) / 8. If the uncompressed bitmap is required to have an even number of bytes per row, decompress into word-aligned buffers.
o If a run is larger than 128 bytes, simply encode the remainder of the run as one or more additional replicate runs.

When PackBits data is uncompressed, the result should be interpreted as per compression type 1 (no compression).

--- TIFF LZW Compression

Abstract
This document describes an adaptive compression scheme for raster images.

Reference
Terry A. Welch, "A Technique for High Performance Data Compression", IEEE Computer, vol. 17 no. 6 (June 1984). Describes the basic Lempel-Ziv & Welch (LZW) algorithm. The author's goal in the article is to describe a hardware-based compressor that could be built into a disk controller or database engine, and used on all types of data. There is no specific discussion of raster images. We intend to give sufficient information in this Appendix so that the article is not required reading.

Requirements
A compression scheme with the following characteristics should work well in a desktop publishing environment:

o Must work well for images of any bit depth, including images deeper than 8 bits per sample.
o Must be effective: an average compression ratio of at least 2:1. And it must have a reasonable worst-case behavior, in case something really strange is thrown at it.
o Should not depend on small variations between pixels. Palette color images tend to contain abrupt changes in index values, due to common patterning and dithering techniques. These abrupt changes do tend to be repetitive, however, and the scheme should make use of this fact.
o For images generated by paint programs, the scheme should not depend on a particular pattern width. 8x8 pixel patterns are common now, but we should not assume that this situation will not change.
o Must be fast. It should not take more than 5 seconds to decompress a 100K byte grayscale image on a 68020- or 386-based computer. Compression can be slower, but probably not by more than a factor of 2 or 3.
o The level of implementation complexity must be reasonable.
  We would like something that can be implemented in no more than a couple of weeks by a competent software engineer with some experience in image processing. The compiled code for compression and decompression combined should be no more than about 10K.
o Does not require floating point software or hardware.

The following sections describe an algorithm based on the "LZW" (Lempel-Ziv & Welch) technique that meets the above requirements. In addition to meeting our requirements, LZW has the following characteristics:

o LZW is fully reversible. All information is preserved. But if noise or information is removed from an image, perhaps by smoothing or zeroing some low-order bitplanes, LZW compresses images to a smaller size. Thus, 5-bit, 6-bit, or 7-bit data masquerading as 8-bit data compresses better than true 8-bit data. Smooth images also compress better than noisy images, and simple images compress better than complex images.
o On a 68020- or 386-based computer, LZW software can be written to compress at between 30K and 80K bytes per second, depending on image characteristics. LZW decompression speeds are typically about 50K bytes per second.
o LZW works well on bilevel images, too. It always beats PackBits, and generally ties CCITT 1D (Modified Huffman) compression, on our test images. Tying CCITT 1D is impressive in that LZW seems to be considerably faster than CCITT 1D, at least in our implementation.
o Our implementation is written in C, and compiles to about 2K bytes of object code each for the compressor and decompressor.
o One of the nice things about LZW is that it is used quite widely in other applications such as archival programs, and is therefore more of a known quantity.

The Algorithm
Each strip is compressed independently. We strongly recommend that RowsPerStrip be chosen such that each strip contains about 8K bytes before compression.
We want to keep the strips small enough so that the compressed and uncompressed versions of the strip can be kept entirely in memory even on small machines, but large enough to maintain nearly optimal compression ratios.

The LZW algorithm is based on a translation table, or string table, that maps strings of input characters into codes. The TIFF implementation uses variable-length codes, with a maximum code length of 12 bits. This string table is different for every strip, and, remarkably, does not need to be kept around for the decompressor. The trick is to make the decompressor automatically build the same table as is built when compressing the data. We use a C-like pseudocode to describe the coding scheme:

    InitializeStringTable();
    WriteCode(ClearCode);
    Omega = the empty string;
    for each character in the strip {
        K = GetNextCharacter();
        if Omega+K is in the string table {
            Omega = Omega+K;  /* string concatenation */
        } else {
            WriteCode (CodeFromString(Omega));
            AddTableEntry(Omega+K);
            Omega = K;
        }
    } /* end of for loop */
    WriteCode (CodeFromString(Omega));
    WriteCode (EndOfInformation);

That's it. The scheme is simple, although it is fairly challenging to implement efficiently. But we need a few explanations before we go on to decompression.

The "characters" that make up the LZW strings are bytes containing TIFF uncompressed (Compression=1) image data, in our implementation. For example, if BitsPerSample is 4, each 8-bit LZW character will contain two 4-bit pixels. If BitsPerSample is 16, each 16-bit pixel will span two 8-bit LZW characters.

(It is also possible to implement a version of LZW where the LZW character depth equals BitsPerSample, as was described by Draft 2 of Revision 5.0. But there is a major problem with this approach. If BitsPerSample is greater than 11, we can not use 12-bit-maximum codes, so that the resulting LZW table is unacceptably large.
Fortunately, due to the adaptive nature of LZW, we do not pay a significant compression ratio penalty for combining several pixels into one byte before compressing. For example, our 4-bit sample images compressed about 3 percent worse, and our 1-bit images compressed about 5 percent better. And it is easier to write an LZW compressor that always uses the same character depth than it is to write one which can handle varying depths.)

We can now describe some of the routine and variable references in our pseudocode:

InitializeStringTable() initializes the string table to contain all possible single-character strings. There are 256 of them, numbered 0 through 255, since our characters are bytes.

WriteCode() writes a code to the output stream. The first code written is a Clear code, which is defined to be code #256.

Omega is our "prefix string."

GetNextCharacter() retrieves the next character value from the input stream. This will be a number between 0 and 255, since our characters are bytes.

The "+" signs indicate string concatenation.

AddTableEntry() adds a table entry. (InitializeStringTable() has already put 256 entries in our table. Each entry consists of a single-character string, and its associated code value, which is, in our application, identical to the character itself. That is, the 0th entry in our table consists of the string <0>, with corresponding code value of <0>, the 1st entry in the table consists of the string <1>, with corresponding code value of <1>, ..., and the 255th entry in our table consists of the string <255>, with corresponding code value of <255>.)

So the first entry that we add to our string table will be at position 256, right? Well, not quite, since we will reserve code #256 for a special "Clear" code, and code #257 for a special "EndOfInformation" code that we will write out at the end of the strip. So the first multiple-character entry added to the string table will be at position 258.

Let's try an example.
Suppose we have input data that looks like:

    Pixel 0: <7>
    Pixel 1: <7>
    Pixel 2: <7>
    Pixel 3: <8>
    Pixel 4: <8>
    Pixel 5: <7>
    Pixel 6: <7>
    Pixel 7: <6>
    Pixel 8: <6>

First, we read Pixel 0 into K. OmegaK is then simply <7>, since Omega is the empty string at this point. Is the string <7> already in the string table? Of course, since all single character strings were put in the table by InitializeStringTable(). So set Omega equal to <7>, and go to the top of the loop.

Read Pixel 1 into K. Does OmegaK (<7><7>) exist in the string table? No, so we get to do some real work. We write the code associated with Omega to output (write <7> to output), and add OmegaK (<7><7>) to the table as entry 258. Store K (<7>) into Omega. Note that although we have added the string consisting of Pixel 0 and Pixel 1 to the table, we "re-use" Pixel 1 as the beginning of the next string.

Back at the top of the loop. We read Pixel 2 into K. Does OmegaK (<7><7>) exist in the string table? Yes, the entry we just added, entry 258, contains exactly <7><7>. So we just add K onto the end of Omega, so that Omega is now <7><7>.

Back at the top of the loop. We read Pixel 3 into K. Does OmegaK (<7><7><8>) exist in the string table? No, so write the code associated with Omega (<258>) to output, and add OmegaK to the table as entry 259. Store K (<8>) into Omega.

Back at the top of the loop. We read Pixel 4 into K. Does OmegaK (<8><8>) exist in the string table? No, so write the code associated with Omega (<8>) to output, and add OmegaK to the table as entry 260. Store K (<8>) into Omega.

Continuing, we get the following results:

    After reading:   We write to output:   And add table entry:
    Pixel 0
    Pixel 1          <7>                   258: <7><7>
    Pixel 2
    Pixel 3          <258>                 259: <7><7><8>
    Pixel 4          <8>                   260: <8><8>
    Pixel 5          <8>                   261: <8><7>
    Pixel 6
    Pixel 7          <258>                 262: <7><7><6>
    Pixel 8          <6>                   263: <6><6>

WriteCode() also requires some explanation. The output code stream, <7><258><8><8><258><6>...
in our example, should be written using as few bits as possible. When we are just starting out, we can use 9-bit codes, since our new string table entries are greater than 255 but less than 512. But when we add table entry 512, we must switch to 10-bit codes. Likewise, we switch to 11-bit codes at 1024, and 12-bit codes at 2048. We will somewhat arbitrarily limit ourselves to 12-bit codes, so that our table can have at most 4096 entries. If we push it any farther, tables tend to get too large.

What happens if we run out of room in our string table? This is where the afore-mentioned Clear code comes in. As soon as we use entry 4094, we write out a (12-bit) Clear code. (If we wait any longer to write the Clear code, the decompressor might try to interpret the Clear code as a 13-bit code.) At this point, the compressor re-initializes the string table and starts writing out 9-bit codes again.

Note that whenever you write a code and add a table entry, Omega is not left empty. It contains exactly one character. Be careful not to lose it when you write an end-of-table Clear code. You can either write it out as a 12-bit code before writing the Clear code, in which case you will want to do it right after adding table entry 4093, or after the Clear code as a 9-bit code. Decompression gives the same result in either case.

To make things a little simpler for the decompressor, we will require that each strip begins with a Clear code, and ends with an EndOfInformation code.

Every LZW-compressed strip must begin on a byte boundary. It need not begin on a word boundary. LZW compression codes are stored into bytes in high-to-low-order fashion, i.e., FillOrder is assumed to be 1. The compressed codes are written as bytes, not words, so that the compressed data will be identical regardless of whether it is an "II" or "MM" file.

Note that the LZW string table is a continuously updated history of the strings that have been encountered in the data.
It thus reflects the characteristics of the data, providing a high degree of adaptability.

LZW Decoding

The procedure for decompression is a little more complicated, but still not too bad:

    while ((Code = GetNextCode()) != EoiCode) {
        if (Code == ClearCode) {
            InitializeTable();
            Code = GetNextCode();
            if (Code == EoiCode)
                break;
            WriteString(StringFromCode(Code));
            OldCode = Code;
        }  /* end of ClearCode case */
        else {
            if (IsInTable(Code)) {
                WriteString(StringFromCode(Code));
                AddStringToTable(StringFromCode(OldCode) +
                    FirstChar(StringFromCode(Code)));
                OldCode = Code;
            } else {
                OutString = StringFromCode(OldCode) +
                    FirstChar(StringFromCode(OldCode));
                WriteString(OutString);
                AddStringToTable(OutString);
                OldCode = Code;
            }
        }  /* end of not-ClearCode case */
    }  /* end of while loop */

The function GetNextCode() retrieves the next code from the LZW-coded data. It must keep track of bit boundaries. It knows that the first code that it gets will be a 9-bit code. We add a table entry each time we get a code, so GetNextCode() must switch over to 10-bit codes as soon as string #511 is stored into the table.

The function StringFromCode() gets the string associated with a particular code from the string table.

The function AddStringToTable() adds a string to the string table. The "+" sign joining the two parts of the argument to AddStringToTable() indicates string concatenation.

WriteString() adds a string to the output stream.

When SamplesPerPixel Is Greater Than 1

We have so far described the compression scheme as if SamplesPerPixel were always 1, as will be the case with palette color and grayscale images. But what do we do with RGB image data?

Tests on our sample images indicate that the LZW compression ratio is nearly identical regardless of whether PlanarConfiguration=1 or PlanarConfiguration=2, for RGB images. So use whichever configuration you prefer, and simply compress the bytes in the strip.
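Returning briefly to the decoding loop given under "LZW Decoding": it may be easier to follow as compilable code. The following is a minimal sketch in C, not the TIFF authors' implementation. The names lzw_decode, put_string and the three table arrays are hypothetical. Each table string is stored as a prefix code plus a final character, which is one common layout among several. One detail is an assumption and is marked in the code: the sketch widens the code size once entry #510 has been stored (the wording of the later TIFF 6.0 specification), one entry earlier than the #511 stated in this text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CLEAR_CODE = 256, EOI_CODE = 257, FIRST_FREE = 258, TABLE_SIZE = 4096 };

/* Hypothetical table layout: string(code) = string(prefix_of[code]) + last_char[code]. */
static uint16_t prefix_of[TABLE_SIZE];
static uint8_t  last_char[TABLE_SIZE];
static uint16_t length_of[TABLE_SIZE];

/* Copy the string for `code` into dst, filling it from the last character back. */
static void put_string(unsigned code, uint8_t *dst)
{
    for (unsigned n = length_of[code]; n > 0; code = prefix_of[code])
        dst[--n] = last_char[code];
}

/* Decode one strip. Returns the number of bytes written, or -1 if the input
   is truncated or malformed, or if `out` is too small. Not reentrant. */
long lzw_decode(const uint8_t *in, size_t in_len, uint8_t *out, size_t out_cap)
{
    size_t bitpos = 0, n = 0;
    unsigned width = 9, next = FIRST_FREE;
    int old = -1;                          /* -1: no code seen since Clear */

    assert(in != NULL && out != NULL);
    for (unsigned c = 0; c < 256; c++) {   /* all single-character strings */
        prefix_of[c] = 0;
        last_char[c] = (uint8_t)c;
        length_of[c] = 1;
    }
    for (;;) {
        /* GetNextCode(): take `width` bits, most significant bit first. */
        if (bitpos + width > in_len * 8)
            return -1;
        unsigned code = 0;
        for (unsigned b = 0; b < width; b++, bitpos++)
            code = (code << 1) | ((in[bitpos >> 3] >> (7 - (bitpos & 7))) & 1u);

        if (code == EOI_CODE)
            return (long)n;
        if (code == CLEAR_CODE) {          /* InitializeTable() */
            width = 9; next = FIRST_FREE; old = -1;
            continue;
        }
        if (old < 0) {                     /* first code after Clear: a literal */
            if (code > 255 || n >= out_cap)
                return -1;
            out[n++] = (uint8_t)code;
            old = (int)code;
            continue;
        }
        if (code > next || next >= TABLE_SIZE)
            return -1;
        /* If the code is not in the table yet, the output is
           string(old) + first character of string(old). */
        unsigned src = (code < next) ? code : (unsigned)old;
        size_t len = (size_t)length_of[src] + (code == next ? 1u : 0u);
        if (n + len > out_cap)
            return -1;
        put_string(src, out + n);
        if (code == next)
            out[n + len - 1] = out[n];
        /* New entry: string(old) + first character of what was just written. */
        prefix_of[next] = (uint16_t)old;
        last_char[next] = out[n];
        length_of[next] = (uint16_t)(length_of[old] + 1);
        next++;
        /* Assumption: widen after entries #510, #1022 and #2046 are stored. */
        if (next + 1 == (1u << width) && width < 12)
            width++;
        old = (int)code;
        n += len;
    }
}
```

Feeding it the eleven bytes 80 01 E0 40 80 44 08 0C 06 80 80 (hex), which pack the 9-bit codes <256><7><258><8><8><258><6><6><257> from the compression example, should reproduce the nine input pixels.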
It is worth cautioning that compression ratios on our test RGB images were disappointingly low: somewhere between 1.1 to 1 and 1.5 to 1, depending on the image. Vendors are urged to do what they can to remove as much noise from their images as possible. Preliminary tests indicate that significantly better compression ratios are possible with less noisy images. Even something as simple as zeroing out one or two least-significant bitplanes may be quite effective, with little or no perceptible image degradation.

Implementation

The exact structure of the string table and the method used to determine if a string is already in the table are probably the most significant design decisions in the implementation of an LZW compressor and decompressor. Hashing has been suggested as a useful technique for the compressor. We have chosen a tree based approach, with good results. The decompressor is actually more straightforward, as well as faster, since no search is involved - strings can be accessed directly by code value.

Performance

Many people do not realize that the performance of any compression scheme depends greatly on the type of data to which it is applied. A scheme that works well on one data set may do poorly on the next.

But since we do not want to burden the world with too many compression schemes, an adaptive scheme such as LZW that performs quite well on a wide range of images is very desirable. LZW may not always give optimal compression ratios, but its adaptive nature and relative simplicity seem to make it a good choice.

Experiments thus far indicate that we can expect compression ratios of between 1.5 and 3.0 to 1 from LZW, with no loss of data, on continuous tone grayscale scanned images. If we zero the least significant one or two bitplanes of 8-bit data, higher ratios can be achieved. These bitplanes often consist chiefly of noise, in which case little or no loss in image quality will be perceived.
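The bitplane zeroing mentioned above can be sketched as follows. This is only an illustration in C, assuming one 8-bit sample per byte; the function name mask_low_bits and its parameters are hypothetical and do not come from the TIFF text. Unlike LZW itself, this step discards information and cannot be undone by the reader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero the `nbits` least significant bits of every 8-bit sample, in place.
   nbits = 1 or 2 corresponds to zeroing one or two bitplanes. */
void mask_low_bits(uint8_t *samples, size_t count, unsigned nbits)
{
    assert(nbits <= 8);
    uint8_t mask = (uint8_t)(0xFFu << nbits);   /* e.g. nbits = 2 gives 0xFC */

    for (size_t i = 0; i < count; i++)
        samples[i] &= mask;
}
```

The masking would be applied to the strip before it is handed to the compressor, so that neighbouring samples which differ only in their noisy low bits become identical and form longer repeated strings.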
Palette color images created in a paint program generally compress much better than continuous tone scanned images, since paint images tend to be more repetitive. It is not unusual to achieve compression ratios of 10 to 1 or better when using LZW on palette color paint images.

By way of comparison, PackBits, used in TIFF for black and white bilevel images, does not do well on color paint images, much less continuous tone grayscale and color images. A ratio of 1.2 to 1 seemed to be about average for 4-bit images, and 8-bit images are worse.

It has been suggested that the CCITT 1D scheme could be used for continuous tone images, by compressing each bitplane separately. No doubt some compression could be achieved, but it seems unlikely that a scheme based on a fixed table that is optimized for short black runs separated by longer white runs would be a very good choice on any of the bitplanes. It would do quite well on the high-order bitplanes (but so would a simpler scheme like PackBits), and would do quite poorly on the low-order bitplanes. We believe that the compression ratios would generally not be very impressive, and the process would in addition be quite slow. Splitting the pixels into bitplanes and putting them back together is somewhat expensive, and the coding is also fairly slow when implemented in software.

Another approach that has been suggested uses a 2D differencing step followed by coding the differences using a fixed table of variable-length codes. This type of scheme works quite well on many 8-bit grayscale images, and is probably simpler to implement than LZW. But it has a number of disadvantages when used on a wide variety of images. First, it is not adaptive. This makes a big difference when compressing data such as 8-bit images that have been "sharpened" using one of the standard techniques. Such images tend to get larger instead of smaller when compressed.
Another disadvantage of these schemes is that they do not do well with a wide range of bit depths. The built-in code table has to be optimized for a particular bit depth in order to be effective.

Finally, we should mention "lossy" compression schemes. Extensive research has been done in the area of lossy, or non-information-preserving image compression. These techniques generally yield much higher compression ratios than can be achieved by fully-reversible, information-preserving image compression techniques such as PackBits and LZW. Some disadvantages: many of the lossy techniques are so computationally expensive that hardware assists are required. Others are so complicated that most microcomputer software vendors could not afford either the expense of implementation or the increase in application object code size. Yet others sacrifice enough image quality to make them unsuitable for publishing use.

In spite of these difficulties, we believe that there will one day be a standardized lossy compression scheme for full color images that will be usable for publishing applications on microcomputers. An International Standards Organization group, ISO/IEC/JTC1/SC2/WG8, in cooperation with CCITT Study Group VIII, is hard at work on a scheme that might be appropriate. We expect that a future revision of TIFF will incorporate this scheme once it is finalized, if it turns out to satisfy the needs of desktop publishers and others in the microcomputer community. This will augment, not replace, LZW as an approved TIFF compression scheme. LZW will very likely remain the scheme of choice for Palette color images, and perhaps 4-bit grayscale images, and may well overtake CCITT 1D and PackBits for bilevel images.

Future LZW Extensions

Some images compress better using LZW coding if they are first subjected to a process wherein each pixel value is replaced by the difference between the pixel and the preceding pixel.
Performing this differencing in two dimensions helps some images even more. However, many images do not compress better with this extra preprocessing, and for a significant number of images, the compression ratio is actually worse. We are therefore not making differencing an integral part of the TIFF LZW compression scheme.

However, it is possible that a "prediction" stage like differencing may exist which is effective over a broad range of images. If such a scheme is found, it may be incorporated in the next major TIFF revision. If so, a new value will be defined for the new "Predictor" TIFF tag. Therefore, all TIFF readers that read LZW files must pay attention to the Predictor tag. If it is 1, which is the default case, LZW decompression may proceed safely. If it is not 1, and the reader does not recognize the specified prediction scheme, the reader should give up.

Acknowledgements

The original LZW reference has already been given. The use of ClearCode as a technique to handle overflow was borrowed from the compression scheme used by the Graphics Interchange Format (GIF), a small-color-paint-image-file format used by CompuServe that also is an adaptation of the LZW technique. Joff Morgan and Eric Robinson of Aldus were each instrumental in their own way in getting LZW off the ground.

The TIFF predictor algorithm

The idea is to make use of the fact that many continuous tone images rarely vary much in pixel value from one pixel to the next. In such images, if we replace the pixel values by differences between consecutive pixels, many of the differences should be 0, plus or minus 1, and so on. This reduces the apparent information content, and thus allows LZW to encode the data more compactly.
Assuming 8-bit grayscale pixels for the moment, a basic C implementation might look something like this:

    unsigned char image[NROWS][NCOLS];   /* NROWS, NCOLS: image dimensions */
    int row, col;

    /* take horizontal differences: */
    for (row = 0; row < NROWS; row++)
        for (col = NCOLS - 1; col >= 1; col--)
            image[row][col] -= image[row][col-1];

If we don't have 8-bit samples, we need to work a little harder, so that we can make better use of the architecture of most CPUs. Suppose we have 4-bit samples, packed two to a byte, in normal TIFF uncompressed (i.e., Compression=1) fashion. In order to find differences, we want to first expand each 4-bit sample into an 8-bit byte, so that we have one sample per byte, low-order justified. We then perform the above horizontal differencing. Once the differencing has been completed, we then repack the 4-bit differences two to a byte, in normal TIFF uncompressed fashion.

If the samples are greater than 8 bits deep, expanding the samples into 16-bit words instead of 8-bit bytes seems like the best way to perform the subtraction on most computers.

Note that we have not lost any data up to this point, nor will we lose any data later on. It might at first seem that our differencing might turn 8-bit samples into 9-bit differences, 4-bit samples into 5-bit differences, and so on. But it turns out that we can completely ignore the "overflow" bits caused by subtracting a larger number from a smaller number and still reverse the process without error. Normal two's complement arithmetic does just what we want. Try an example by hand if you need more convincing.

Up to this point we have implicitly assumed that we are compressing bilevel or grayscale images. An additional consideration arises in the case of color images.

If PlanarConfiguration is 2, there is no problem. Differencing proceeds the same way as it would for grayscale data.

If PlanarConfiguration is 1, however, things get a little trickier.
If we didn't do anything special, we would be subtracting red sample values from green sample values, green sample values from blue sample values, and blue sample values from red sample values, which would not give the LZW coding stage much redundancy to work with. So we will do our horizontal differences with an offset of SamplesPerPixel (3, in the RGB case). In other words, we will subtract red from red, green from green, and blue from blue. The LZW coding stage is identical to the SamplesPerPixel=1 case. We require that BitsPerSample be the same for all 3 samples.

Results and guidelines

LZW without differencing works well for 1-bit images, 4-bit grayscale images, and synthetic color images. But natural 24-bit color images and some 8-bit grayscale images do much better with differencing. For example, our 24-bit natural test images hardly compressed at all using "plain" LZW: the average compression ratio was 1.04 to 1. The average compression ratio with horizontal differencing was 1.40 to 1. (A compression ratio of 1.40 to 1 means that if the uncompressed image is 1.40MB in size, the compressed version is 1MB in size.)

Although the combination of LZW coding with horizontal differencing does not result in any loss of data, it may be worthwhile in some situations to give up some information by removing as much noise as possible from the image data before doing the differencing, especially with 8-bit samples. The simplest way to get rid of noise is to mask off one or two low-order bits of each 8-bit sample. On our 24-bit test images, LZW with horizontal differencing yielded an average compression ratio of 1.4 to 1. When the low-order bit was masked from each sample, the compression ratio climbed to 1.8 to 1; the compression ratio was 2.4 to 1 when masking two bits, and 3.4 to 1 when masking three bits. Of course, the more you mask, the more you risk losing useful information along with the noise.
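As a sketch of the differencing with an offset of SamplesPerPixel described above, together with the reversal a reader would perform after LZW decoding, consider the following C fragment. It is an illustration under stated assumptions rather than code from the TIFF text: the names hdiff_encode and hdiff_decode are hypothetical, samples are assumed to be 8 bits wide, and each call handles a single row. The stride parameter would be SamplesPerPixel for PlanarConfiguration=1, and 1 for grayscale data or PlanarConfiguration=2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Forward step: replace each sample by its difference from the sample
   `stride` positions earlier. Works from the end of the row backwards so
   that each subtraction still sees the original neighbour. */
void hdiff_encode(uint8_t *row, size_t nsamples, size_t stride)
{
    assert(stride >= 1);
    for (size_t i = nsamples; i-- > stride; )
        row[i] = (uint8_t)(row[i] - row[i - stride]);
}

/* Inverse step: a running sum from the start of the row. The 8-bit
   wraparound cancels the "overflow" ignored by the forward step. */
void hdiff_decode(uint8_t *row, size_t nsamples, size_t stride)
{
    assert(stride >= 1);
    for (size_t i = stride; i < nsamples; i++)
        row[i] = (uint8_t)(row[i] + row[i - stride]);
}
```

For a three-pixel RGB row such as (10,200,30) (12,198,30) (9,201,31), the forward step with a stride of 3 leaves the first pixel untouched and turns the rest into small values near 0 or 255, and the inverse step restores the row exactly.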
We encourage you to experiment to find the best compromise for your device. For some applications it may be useful to let the user make the final decision.

Interestingly, most of our RGB images compressed slightly better using PlanarConfiguration=1. One might think that compressing the red, green, and blue difference planes separately (PlanarConfiguration=2) might give better compression results than mixing the differences together before compressing (PlanarConfiguration=1), but this does not appear to be the case.

Incidentally, we tried taking both horizontal and vertical differences, but the extra complexity of two-dimensional differencing did not appear to pay off for most of our test images. About one third of the images compressed slightly better with two-dimensional differencing, about one third compressed slightly worse, and the rest were about the same.

--- BMP RLE_8 compression

The BMP can be compressed in two modes, absolute mode and RLE mode. Both modes can occur anywhere in a single bitmap.

The RLE mode is a simple RLE mechanism: the first byte contains the count, the second byte the pixel to be replicated. If the count byte is 0, the second byte is a special code, such as EOL or delta. In absolute mode, the second byte contains the number of bytes to be copied literally. Each absolute run must be word-aligned, which means you may have to add an additional padding byte that is not included in the count. After an absolute run, RLE compression continues.

    Second byte   Meaning
    0             End of line
    1             End of bitmap
    2             Delta. The next two bytes are the horizontal and
                  vertical offsets from the current position to the
                  next pixel.
    3-255         Switch to absolute mode

--- BMP RLE_4 compression

RLE_4 compression knows the two modes of the RLE_8 compression, absolute and RLE.
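Since RLE_4 reuses both modes, it may help to see the RLE_8 scheme from the previous section as code first. The following is a minimal sketch in C, not taken from any BMP reference implementation. The name rle8_decode and its parameters are hypothetical. It writes rows in the order in which they appear in the stream (it does not flip the image), treats runs that cross the right edge as errors, and leaves pixels skipped by a delta at 0; all three are simplifying assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode an RLE_8 stream into a width*height buffer of palette indices.
   Returns 0 when "end of bitmap" is reached, -1 on truncated or bad input. */
int rle8_decode(const uint8_t *in, size_t in_len,
                uint8_t *pix, int width, int height)
{
    size_t i = 0;
    int x = 0, y = 0;

    assert(width > 0 && height > 0);
    memset(pix, 0, (size_t)width * (size_t)height);

    while (i + 1 < in_len) {
        uint8_t count = in[i++];
        uint8_t value = in[i++];

        if (count > 0) {                    /* RLE mode: repeat one pixel */
            if (y >= height || x + count > width)
                return -1;
            memset(pix + (size_t)y * width + x, value, count);
            x += count;
        } else if (value == 0) {            /* end of line */
            x = 0;
            y++;
        } else if (value == 1) {            /* end of bitmap */
            return 0;
        } else if (value == 2) {            /* delta: move the position */
            if (i + 1 >= in_len)
                return -1;
            x += in[i++];
            y += in[i++];
        } else {                            /* absolute mode: `value` literals */
            if (i + value > in_len || y >= height || x + value > width)
                return -1;
            memcpy(pix + (size_t)y * width + x, in + i, value);
            x += value;
            i += (size_t)value + (value & 1u);   /* skip the pad byte, if any */
        }
    }
    return -1;                              /* stream ended without code 0,1 */
}
```

For example, the stream 02 07, 00 03 01 02 03 00, 00 00, 00 02 01 00, 04 09, 00 01 (hex) decodes into a 5x2 image whose first row is 7 7 1 2 3 and whose second row is 0 9 9 9 9.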
In the RLE mode, the first byte contains the count of pixels to draw; the second byte contains in its two nibbles the indices of the pixel colors: the higher 4 bits are the left pixel, the lower 4 bits are the right pixel. Note that this makes it possible to encode two-color runs with RLE_4.

--- Protracker sample compression / decompression

    Get the number of sample bytes to process.
    Call this SamplesLeft.
    Set Delta counter to 0.
    DO
        Get a byte from the buffer.
        Store the byte in Temp.
        Subtract the Delta counter from the byte.
        Store it in the buffer.
        Move the Temp byte into the Delta Counter
        Decrement SamplesLeft.
    WHILE(SamplesLeft <> 0)

The technique for conversion back to the raw data is:

    Get the number of sample bytes to process.
    Call this SamplesLeft.
    Set Delta counter to 0.
    DO
        Get a byte from the buffer.
        Add onto the byte the Delta Counter.
        Store the byte in Delta Counter.
        Store the byte in the buffer.
        Decrement SamplesLeft.
    WHILE(SamplesLeft <> 0)

--------!-ADDRESSES-------------------------
Useful addresses

    International Midi Association
    5316 West 57th Street
    Los Angeles, CA 90056
    xx1-213-649-6434
    xx1-213-215-3380 fax

--------!-HISTORY---------------------------
History is kept within this file for convenience whilst editing ...
Date format is european/german, just for my convenience.

    Date      Who  What
    14.03.95  MM   Introduced tables
                   Last table number=0012
    05.06.95  MM   + PTM format
    25.07.95  MM   + PIF format
                   + Paradox format description
    11.08.95  MM   + MS Compress variants
    18.11.95  MM   + ARC enhancements, caveats
                   + HA files
    22.11.95  MM   + Parts of the .CRD files
    01.02.96  MM   + PNG structure
    02.02.96  MM   + More on JPEG
                   + TARGA entry created