Why did we write this book? The short answer is that graphics file formats are immortal. Like it or not, data files from the dawn of the computer age are still with us, and they're going to be around for a long time to come. Even when the way we think about data itself changes (as it inevitably will), hundreds of millions of files will still be out there in backup storage. We'll always need a way to read, understand, and display them.
About This Book and the CD-ROM
About the Online Product
Who Is the Book For?
How to Use the Book
Conventions Used in This Book
Terminology of Computer Graphics
About the File Format Specifications
About the Examples
About the Images
About the Contributed Software
Request for Comments
Computer technology evolves rapidly. Hardware, particularly on the desktop, turns over every year or so. Software can become obsolete overnight with the release of a new version. The one thing that remains static is data, which for our purposes means information stored in data files on disk or tape. In this book we're interested in one specific type of data--that used for the interchange and reconstruction of graphics images.
Graphics data files are structured according to specific format conventions, which are (or should be) recorded in format specification documents written and maintained by the creator of the format. Not all formats are documented, however, and some documents are so sparse, poorly written, or out of date that they are essentially useless. Moreover, some format specifications are very difficult to obtain: the creator of the format might have moved; the format might have been sold to another organization; or the organization that owns the format might not actively support or distribute it. These facts make it difficult for someone who needs to find out about the specifics of a particular graphics file format to locate and understand the file format specification. We wrote this book because we saw a need for a centralized source of information, independent of the commercial marketplace, where anyone could obtain the information needed to read graphics files.
When we set out to write this book, we asked the obvious questions: How would we implement an existing format? What resources would we need? Ideally, we would like to have on hand a good book on the subject, and perhaps some working code. Barring that, we'd make do with the original format specification and some advice. Barring that, we'd scrape by with the format specification alone. This book provides as much of this as possible; the format specification is here in most cases, as is some code--even some advice, which, because it's coming from a book, you're free to take or leave as you choose.
To give you some idea about what was on our minds during the planning of this book, we'd like to mention some issues that frequently come up for programmers who need to learn about and implement file formats. In the course of writing this book, both of us (as consultants and veteran users of networked news and bulletin board systems) talked with and observed literally hundreds of other programmers. The following is a sampling of questions frequently asked on the subject of graphics file formats, and comments on how we have addressed them in this book:
"How can I get a copy of the written specification for format XYZ?"
Rarely does a day go by without a request for a written format specification--TIFF, GIF, FaceSaver, PNG, QRT, and many, many more. Unfortunately, there is no single source for even the most common format specifications. A number of format archives are available online, but they contain only what the maintainer has the time and resources to assemble. Each of the books previously on the market has offered a limited subset of the specifications out there.
"I'm trying to implement specification XYZ. I'm having trouble with ABC."
Programmers almost always believe that only the specification document is needed in order to implement a file format. Sadly, if you read a few format specifications, you'll soon discover that there is no law requiring that documentation be written clearly. Specifications, like all technical documents, are written by people with varying degrees of literacy, knowledge, and understanding of the subject in question. Consequently, they range from clearly written and very helpful to unorganized and confusing. Some documents, in fact, are nearly useless. The programmer is eventually forced to become conversant with the oral tradition.
Even if the specification document is well done, written between the lines is a complex set of assumptions about the data. How complicated can a format be? After hours of fiddling with color triples, index map offsets, page tables, multiple header versions, byte-order problems, and just plain bad design, you may well find yourself begging for help while the clock counting your online dollars ticks on. Another goal of this book is to provide a second opinion for use by programmers who find themselves confused by the contents of the documents.
"What does Z mean?"
In this case, Z is basic technical graphics information. Everything a programmer needs to know to read, write, encode, and decode a format is in the specification document, right? Unfortunately, writers of format specifications often use vocabulary foreign to most programmers. For instance, the format might have been created in support of an application that used terminology from the profession of the target users. The meaning of a term might have changed since the time that the format was written, years ago. You might also find that different format specifications have different names for the same thing (e.g., color table, color map, color palette, look-up table, color index table, and so on). In this book, we provide basic guidance whenever possible.
"What is an X.Y file?"
If you scan the computer graphics section of any online service, bulletin board system, or news feed, you will find numerous general questions from users about graphics files, the pros and cons of each format, and sources of image files. Surprisingly, there is no single source of information on the origin, use, and description of most of the graphics file formats available today. Some of this information, particularly on the more common formats (e.g., TIFF, GIF, PCX), is scattered through books and magazine articles published over the last ten years. Other information on the less common formats is available only from other programmers, or (in some extreme cases) from the inventor of the format. Another goal of this book is to include historical and contextual information, including discussions of the strengths and weaknesses of each format.
"Is there a newer version of the XYZ specification than version 1.0?"
Occasionally, this question comes from someone who, specification in hand, just finished writing a format reader only to have it fail when processing sample files that are known to be good. The hapless programmer no doubt found a copy of the format specification, but not, of course, the latest revision. Another of our goals is to provide access to the latest format revisions in this book and keep this information up to date.
"How can I convert an ABC file to an XYZ file?"
Programmers and graphic designers alike are often stumped by this question. They've received a file from a colleague, an author, or a client, and they need to read it, print it, or incorporate it in a document. They need to convert it to something their platform, application, or production environment knows how to deal with. If this is your problem, you'll find this book helpful in a number of ways. In the first place, it will give you the information you need to identify what this file is and what its conversion problems are. We'll give you specific suggestions on how to go about converting the file. Most importantly, we've included a number of software packages that will convert most graphics files from one format to another. Whether you are operating in a Windows, MS-DOS, OS/2, Macintosh, or UNIX environment, you should be able to find a helpful tool.
We'd like to make it easier for you to understand and implement the graphics file formats mentioned in this book. Where does information on the hundreds of graphics file formats in use today come from? Basically, from four sources:
What we've tried to do is to collect these four elements together in one place. Of course, not all were available for every format, and sometimes we weren't allowed to include the original specification document. Nevertheless, we've pulled together all the information available. Taken together, the information provided in this book and in the materials on the CD-ROM should allow you to understand and implement most of the formats. In this second edition--more about this later--we also provide links on the CD-ROM to the O'Reilly GFF Web Center on the World Wide Web, where we're able to provide up-to-date information and additional resources as they become available.
Our primary goal in writing this book is to establish a central repository of graphics file format specifications. Because the collected specification documents (not to mention the sample images and associated code and software packages!) total in the hundreds of megabytes, the best way to put them in your hands is on a CD-ROM. What this means is that the CD-ROM is an integral part of the book, if only for the fact that all this information could never be crammed between two covers.
We've written an article describing each graphics file format; this article condenses and summarizes the information we've been able to collect. In some cases this information is extensive; in other cases it's not much. This is the name of the game, unfortunately. When we do have adequate information, we've concentrated on conveying some understanding of the formats, which in many cases means going through them in some detail. Remember, though, that sometimes the specification document does a better job than we could ever do of explaining the nitty-gritty details of the format.
On the CD-ROM, you'll find the original format specifications (when available and when the vendors gave us permission to include them). If we know how to get the specifications, but couldn't enlist the aid of the vendors, we tell you where to go to find them yourself. Also on the CD-ROM is sample code that reads and writes a variety of file formats, and a number of widely used third-party utilities for file manipulation and conversion. Finally, we've included sample images for many formats. If you have Internet access, you'll be able to get updates and new resources at our Web site.