Strings in CMU.TXT

Data stream formats in the Andrew User Interface System 
Wilfred J. Hansen 
Andrew Consortium 
Carnegie Mellon University 
(The Andrew Toolkit (ATK) is the architecture and tools for building
application in the Andrew User Interface System.) 
In order to support the inclusion of arbitrary objects in multi-media
editors, the Andrew Toolkit requires data objects to conform to a set of
conventions for their file representation.  A data object must write its
data enclosed in a begin/end marker pair.  The marker must include a tag
denoting the type of the object being written and a unique identifier,
used for referencing the data object by other data objects.  If a data
object includes other data objects, they must be properly nested.  The
begin/end markers make it possible to find the data associated with an
object without actually parsing the data. 
For example, a text with an embedded picture has the format: 
\begindata{text,1} 
<text data> 
\begindata{picture,2} 
<picture data> 
\enddata{picture,2} 
\view {pictureview,2} 
<more text data> 
\enddata{text,1} 
In order to transport files across most networks, data streams use only
printable 7-bit ASCII characters, including tab, space and new-line, and
keep line lengths below 80 characters. 
____________________________________ 
Text format 
Text data streams in the Andrew User Interface System follow the general 
principles for Andrew Toolkit data streams.  The overall structure 
of a text data stream is 
A. \begindata line 
B. \textdsversion line 
C. \template line 
D. definitions of additional styles 
E. the text body itself 
F. styled text 
G. embedded objects in text body 
H. \enddata line 
Subsequent sections of this document describe each of these components. 
As usual in ATK, the appropriate way to read or write the data stream is 
to call upon the corresponding Read or Write method from the AUIS 
distribution.  Only in this way is your code likely to continue to work in 
the face of changes to the data stream definition.  Moreover, there are 
a number of special features--mostly outdated data streams--that are 
implemented in the code, but not described here. 
A. \begindata line 
Standard ATK begindata line having the form 
\begindata{text,99999} 
where 99999 is some identifying number unique within this data stream. 
B. \textdsversion line 
This line always has the form 
\textdsversion{12} 
There exist files written with earlier data stream versions having values 
other than 12. 
C. \template line 
If the file utilizes a style template, there will be a line of this form: 
\template{default} 
where 'default' is whatever template name is used.  This template name is 
the prefix of a filename.  The name is appended with the suffix ".tpl" and 
sought in the directories named in the user's atktemplatepath preference 
value.  If there is none, the default directory is $ANDREWDIR/lib/tpls. 
'default' is the most usual template name.  Every installation of AUIS 
is expected to have $ANDREWDIR/lib/tpls/default.tpl and its styles are 
not defined further in the document. 
D. definitions of additional styles 
A document may define and use styles that are not in the template.  Each 
such definition is two or more lines: 
\define{internalstylename 
menuname 
attribute 
. . . 
attribute} 
The internalstylename is lower case and may have digits, but no spaces. 
There may be no menuname, in which case there is an empty line;  if there 
a menuname line, it is of the form 
menu:[Menu card name,Style name] 
If there are no attributes, the closing '}' appears at the end of the 
menuname line.  Each attribute line is of the form 
attr:[attributename basis units value] 
where the first three are strings and the fourth is an integer, possibly 
signed.  The specific values allowed are beyond the scope of this document; 
they do correspond closely to values in style.H. 
E. the text body itself 
Text is represented by itself.  n consecutive newlines in the text are 
represented by n+1 newlines in the data stream.  Single newlines are used 
to break the stream into lines of less than 80 bytes;  these are ignored 
when the file is read.  Earlier data stream versions required a sapce before 
a newline if there was to be a space in the text;  version 12 invents a 
space before the newline if one is not there.  The latter is prevented by 
ending the line with a single backslash (\).  If a sentence ends a line and 
has more than one space after its punctuation, the additional spaces must 
appear at the start of the next line.  The characters backslash, left 
brace, and right brace are always preceded in the text with a backslash. 
There is a convention for representing non-ASCII ISO-8859 characters, but 
I don't know what it is offhand. 
F. styled text 
If text in the body is to be displayed in a style, e.g. italic, the text is 
preceded with 
\internalstylename{ 
and followed by a closing curly brace.  The internal style name is 
one of the names defined either in the template or in a \define line. 
G. embedded objects in text body 
When an object is embedded in a text body, two items appear: the data 
stream for the object and a \view line.  The \begindata for the object is 
always at the beginning of a line.  (The previous line is terminated with 
backslash if there is to be no space before the object.)  The \enddata 
line for the object always ends with a newline (which is not treated as a 
space).  
The \view line has the form: 
\view{rasterview,8888,777,0,0} 
In future data stream versions, other items may appear before the '}'; each 
such item is preceded by a comma.  The first item in the list is the textual 
name of the view object to be used to display the dataobject.  The second 
item is the identifing integer that also appears in the \begindata for the 
object.  The third value is ignored.  The fourth and fifth items are 
usually zero;  however, if non-zero the specify the desired width and height 
to display the object. 
H. \enddata line 
Has the form 
\enddata{text,99999} 
that is, it is the same as the \begindata line, but has 'end' instead 
of 'begin'. 
____________________________________ 
Format of ATK raster images 
The raster data object writes a standard ATK data stream beginning with 
a \begindata line and ending with a \enddata line.  Between these comes 
a header and possibly an image body. 
The first line of the header looks like this: 
2 0 65536 65536 0 0 484 603 
Where the values are these: 
RasterVersion:  '2'  
This specification describes the second version of this encoding. 
Options: '0' 
This field may specify changes to the image before displaying it: 
raster_INVERT(1<<0)/* exchange black and white */ 
raster_FLIP(1<<1)/* exch top and bottom */ 
raster_FLOP(1<<2)/* exch left and right */ 
raster_ROTATE(1<<3)/* rotate 90 clockwise */ 
xScale, yScale:  '65536 65536' 
These scale factors affect the size at which the image is printed. 
The value raster_UNITSCALE (136535) will print the image at 
approximately the size on the screen.  The default scale of 
65536 is approximately half the screen size.  (It is not 
exactly half screen size in an effort to simplify scaling on 
300-dots-per-inch printers.) 
x, y, width, height:    '0 0 484 603' 
It is possible for a raster object to display a portion of an 
image.  These fields select this portion by specifying the 
index of the upper left pixel and the width and height of the 
image in pixels. 
In all instances so far, x and y are both zero and the width 
and height specify the entire raster. 
The second header line specifies the actual raster in one of three forms; 
but only the first of these forms is actually used. 
First form: 
bits 10156544 484 603 
RasterType: 'bits' 
This form. 
RasterId: '10156544' 
An identifier so other raster objects can refer to this one. 
Usually this is the same identifier as in the \begindata line. 
Width, Height:  '484 603' 
Describes the width of each row and the number of rows. 
This many rows follow one subsequent lines. 
Second form:   refer 10135624 
The current data object does not have the bits, but refers to the 
bits as stored in another data object (which should appear earlier 
in the same data stream.)   'refer' identifies this form and 
the integer is the identifying number. 
Third form:   file 10235498 filename path 
The raster is not in the current data object, but is in a file. 
'file' identifies this form.  The id number '10235498' allows 
this raster data to be refered to by a 'refer' form.  The filename 
is the full pathname of the file.  Path is the element of a 
"rasterpath" list against which the filename was resolved. 
(This is not fully implemented.  The idea is to acheive a 
measure of recovery in case the file is moved.) 
In the first form ('bits'), the header is followed by lines specifying the 
image.  There is at least one line per raster row, though some rows may take 
more lines.  The bits of a row are encoded in blocks of eight; a multiple 
of 8 bits are specified, though trailing bits will be ignored after reading 
the row.  Following the last bits for a row are a space, a vertical bar (|), 
and a newline.  Basically, white space is to be ignored, so the bytes of the 
encoding are broken into blocks of 13 or 14 bytes separated with tabs. 
The bits of the row are run-length encoded by bytes.  That is, a sequence 
of identical bytes will be represented in only a few bytes rather than 
at full length.  Hexadecimal is a subset of this encoding with a one bit 
representing black and zero for white.  Here is the 
interpretation of each range of byte values: 
control characters and space: 
Ignored. 
@ [ ] ^ _ ` } ~ 0x7F and all characters with high bit set: 
These are errors, but at present they are ignored. 
{ \: 
Illegal end of line.  Treat as end of row. 
Legal end of row.  If there have not been enough 
codes for the entire width, pad with white bits. 
0x21 ... 0x2F  (punctuation characters) 
The next two bytes specify a hex value.  This value is 
to be repeated in the row the number of times given 
by c-0x1F, where c is the input code.  (That is, 0x21 means 
to repeat the byte two times, 0x22 three times, and so on.) 
0x30 ... 0x3F  (digit or punctuation) 
This is a hex digit and encodes one byte of the row 
with the value c-0x30. 
A ... F  a ... f 
These are hex digits with values 0xA ... 0xF. 
g ... z 
Multiple white bytes.  c-'f' bytes of white are generated into row 
G ... Z 
Multiple black bytes.  c- 'F' bytes of black are generated into row 
\begindata{text,538375988} 
\textdsversion{12} 
\template{default} 
\define{global 
\define{up15 
menu:[Justify,Up15] 
attr:[Script PreviousScriptMovement Point -15]} 
This is text in the document.  \italic{This is 
italic.}  These two lines are one paragraph. 
This paragraph is preceded by two newlines, but it will be 
displayed with only one blank line between it and the previous one. 
 When two space are required between words, the second 
must appear at the beginning of a line.  When a newli\ 
ne is not to be replaced with a space, it must be preceded 
with backslash. 
\begindata{bp,9233088} 
\enddata{bp,9233088} 
\view{bpv,9233088,38,0,0} 
This second page has a raster on it. 
\begindata{raster,10156544} 
2 0 68266 68266 0 0 484 603 
bits 10156544 484 603 
zzzg | 
zzzg | 
7fZZHfeKfeOc0g | 
. . . 
zzzg | 
\enddata{raster, 10156544} 
\view{rasterview,10156544,31,0,0} 
\enddata{text,538375988} 
-----------------------------
The only immediate comment I would add is that, if you come across a
file which purports to be an AUIS raster file, it may be either of two
things: 
A) A raster datastream, as defined in Fred's document under "Format of
ATK raster images." The first line of this file would be 
\begindata{raster,99999} 
(with an arbitrary ID integer in place of the 999999). This would be
followed by the header and possibly the image body, and then the final
line would be 
\enddata{raster,99999} 
(with the same integer.) 
B) A text datastream (or some other kind of datastream) containing a
raster as an embedded object and no other data. This is not the
preferred way to store a raster image, but it tends to happen every now
and then. 
In this case, the raster datastream will occur, as described above,
somewhere within the larger datastream. It is legal to read in lines and
throw them away until you find a line that begins 
\begindata{raster, 
(The backslash will always be the first character on the line.) You then
read in the datastream until the \enddata line occurs, and ignore the
rest of the file. (You can compare the ID numbers as a consistency
check).