
File Format Registries


2008-01-27

An interesting article, "File format typing and format registries" (found via Gary McGath), got me thinking about my goals for FileFormat.Info and where it fits among these larger, funded efforts.

FileFormat.Info is a side project for me: I only work on it in my spare time, and even if I could put in more time, I am more of a programmer than a writer or researcher (witness this blog). Most of the raw content comes from elsewhere.

While I am disappointed that the article does not mention FileFormat.Info, I am not surprised. FileFormat.Info is probably not at the point where it could be considered a true format repository.

So what is the point of FileFormat.Info? These repositories are funded and actively developed. Is there any value in continuing? Well, I do not see the point of trying to compete directly, since I am already behind and am not doing anything to catch up.

However, there are some areas that these repositories are not addressing:

  • "Not invented here" syndrome - they all want to research and write their own content. Usually this is a good idea, since you can control the quality and be sure you have the intellectual property rights to it. However, sometimes even partial or semi-correct information can be valuable. And hopefully the rights will become available eventually. I am not shy about accepting other people's content (and people are usually willing to share as long as I acknowledge that they are the authors).
  • Not enough information - they are good at cataloging the information, but they generally don't store source code, sample files, or mirrors of the official specifications.
  • Non-format information for digital preservation - File formats are only one piece of the puzzle needed to preserve digital information. Things like storage media and file systems are also necessary.
  • Existing sources - they are ignoring the existing sources of format information that are in actual use: the MIME content-type registry and the Unix file/magic system.
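To make the file/magic idea concrete: it identifies a file by checking for a known byte signature (a "magic number") at a fixed offset, and the match can be reported as a MIME content-type, which is what `file --mime-type` prints. The Python below is only a minimal sketch of that lookup under simplified assumptions: the hypothetical `MAGIC_TABLE` holds a handful of well-known signatures, and it only does exact byte comparisons at fixed offsets, whereas the real magic database supports many more kinds of tests.

```python
# Hypothetical, tiny magic table in the spirit of file(1)'s magic database.
# Each entry is (offset, signature bytes, MIME content-type).
MAGIC_TABLE = [
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
    (0, b"%PDF-", "application/pdf"),
    (0, b"PK\x03\x04", "application/zip"),
    (0, b"\xff\xd8\xff", "image/jpeg"),
]


def identify(data: bytes) -> str:
    """Return the MIME type of the first matching signature.

    Falls back to the generic application/octet-stream when nothing matches.
    """
    for offset, signature, mime in MAGIC_TABLE:
        # Compare the bytes at the given offset against the signature.
        if data[offset:offset + len(signature)] == signature:
            return mime
    return "application/octet-stream"


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n"))  # application/pdf
    print(identify(b"hello"))       # application/octet-stream
```

Pairing the signature with a MIME type in one table shows how the two existing sources complement each other: the magic data says how to recognize a format, and the content-type registry gives it a standard name.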

Of course, Wikipedia does much of this, but it is so big and so unfocused that a well-organized site dedicated to a specific area is still better. And Wikipedia's all-human process is not good at automatically republishing content from other sources.

So hopefully FileFormat.Info still has a reason to exist!

Tags: competitor ffi identification preservation

File Formats: (none)