Document: FTS-0501
Version:  003
Date:     1996-06-03
Title:    Mail packet format
Current Author: Paul Edwards
Current Addresses: 3:711/934@fidonet
Authors: Paul Edwards 3:711/934@fidonet
         Leonard Erickson 1:105/51@fidonet
Copyright: This document is released to the public domain


NOTE
----

If you intend to follow this proposal you must implement all of
it.  If you have any questions as to how certain sentences are
to be interpreted, do not hesitate to contact the author of this
document, whose address is at the top of this document.


BODY
----

This file documents the format of files that are received by
a mailer as described by the FTSCD specs.

Terms:

byte - a binary number from 0 to 255 stored in 8 bits.

integer - a 2-byte numeric data type, stored in little-endian format,
i.e. low byte then high byte.  The values specified are to be treated
as unsigned integers, so there are no complications as to whether
negative numbers are represented as 2's complement, sign-magnitude
etc.  These are straight binary values.

bytes - a sequence of bytes
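
As a minimal sketch of the "integer" type defined above, the
following Python fragment decodes and encodes a 2-byte little-endian
unsigned value.  The function names are illustrative only and are not
part of this document.

```python
import struct

def read_integer(data, offset=0):
    """Decode the 2-byte unsigned little-endian integer at offset."""
    (value,) = struct.unpack_from("<H", data, offset)
    return value

def write_integer(value):
    """Encode a value from 0 to 65535 as low byte, then high byte."""
    return struct.pack("<H", value)
```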


A file will start off with a packet header, which is 58 bytes 
long.  The format of the packet header is:


Offset
Dec  Hex  Length   Data Type   Contents
 0    0    2       integer     originating node
 2    2    2       integer     destination node
 4    4    2       integer     4-digit year of packet creation date
 6    6    2       integer     month component of packet creation date (0-11)
 8    8    2       integer     day component of packet creation date (1-31)
10    A    2       integer     hour (0-23)
12    C    2       integer     minutes (0-59)
14    E    2       integer     seconds (0-59)
16   10    2       integer     baud (useless)
18   12    2       integer     constant value of 2, the packet type
20   14    2       integer     originating net
22   16    2       integer     destination net
24   18    1       byte        product code (optional)
25   19    1       byte        serial number (optional)
26   1A    8       bytes       password (optional)
34   22    2       integer     originating zone
36   24    2       integer     destination zone

Then comes one of the following...

Type 2...

38   26   20       bytes       filler (must be set to NULs if *generating*,
                               could be anything on *reading*)

Extended Type 2+ (optional)...

40   28    2       integer     inverted capability word
42   2A    1       byte        high byte of product code
43   2B    1       byte        minor version of product
44   2C    2       integer     capability word
46   2E    2       integer     originating zone (optional)
48   30    2       integer     destination zone (optional)
50   32    2       integer     originating point (optional)
52   34    2       integer     destination point (optional)
54   36    4       bytes       product-specific field
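
The first 38 bytes of the table above are the same for both
variants, so they can be unpacked in one step.  The following Python
sketch does this and leaves the last 20 bytes uninterpreted.  The
function name and the keys of the returned dict are assumptions made
for illustration; they are not defined by this document.

```python
import struct

# First 38 bytes: twelve integers, two single bytes, the 8-byte
# password and the two zone integers, little-endian, no padding.
COMMON_FIELDS = struct.Struct("<12H2B8s2H")
HEADER_LENGTH = 58

def parse_packet_header(data):
    """Return the fields common to Type 2 and Type 2+ as a dict."""
    if len(data) < HEADER_LENGTH:
        raise ValueError("packet header is shorter than 58 bytes")
    (orig_node, dest_node, year, month, day, hour, minute, second,
     _baud, packet_type, orig_net, dest_net, product_code, serial,
     password, orig_zone, dest_zone) = COMMON_FIELDS.unpack_from(data)
    if packet_type != 2:
        raise ValueError("packet type is not 2")
    return {
        "orig": (orig_zone, orig_net, orig_node),
        "dest": (dest_zone, dest_net, dest_node),
        # The month is stored as 0-11, not 1-12.
        "created": (year, month, day, hour, minute, second),
        "product_code": product_code,
        "serial": serial,
        # Unused password bytes are NUL-filled when generated.
        "password": password.rstrip(b"\x00"),
    }
```

Bytes 38 to 57 are deliberately ignored here, since a reader should
not expect them to hold any particular value.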


Notes:

Any unused bytes in the password field must be NUL-filled.  The
password is normally completely NUL, and is there mainly for use
by non-mailprocessors.

Any optional fields must be set to NULs when generating (if not
used), but should not be expected to be NULs when reading.

The defined range of zone, net, node and point is 0-32767.  In
addition, net and node may have the value 0xFFFF, which represents
"-1".

The 1-byte product code is used as follows:
0x00 should be used by new programs.  For historical reasons,
most existing programs use codes from 0x00-0xFE.  0xFF is 
strictly reserved.

In a Type-2+ header, the serial number is used to store the
product high-level version number.
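
The address ranges given in the notes above can be checked with a
few lines of Python.  This is only a sketch; the function name and
the use of strings to name the components are assumptions.

```python
def is_valid_component(name, value):
    """Check a zone, net, node or point value against its range."""
    if 0 <= value <= 32767:
        return True
    # Only net and node may carry 0xFFFF, which represents "-1".
    return name in ("net", "node") and value == 0xFFFF
```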


After the packet header comes 0 or more messages, followed by two
NUL characters, which signify the end of the packet.  A message
will never start with two NUL characters.

The messages are stored as follows:

First there is a 14-byte message header, then there are some
variable-length NUL-terminated strings, then there is a text
block, which is also NUL-terminated.
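
The layout described so far is enough to split a packet into its raw
messages.  The Python sketch below assumes that the variable-length
strings are the four fields described later in this document (date,
"To name", "From name" and "Subject"), followed by the text block.
The function name is hypothetical.

```python
def split_messages(packet):
    """Yield the raw bytes of each message in a packet."""
    offset = 58                      # skip the packet header
    while packet[offset:offset + 2] != b"\x00\x00":
        start = offset
        offset += 14                 # fixed message header
        # date, to, from, subject and text block, each NUL-terminated
        for _ in range(5):
            end = packet.find(b"\x00", offset)
            if end == -1:
                raise ValueError("message is not NUL-terminated")
            offset = end + 1
        yield packet[start:offset]
```

A truncated packet that lacks the two closing NULs ends with a
ValueError rather than looping forever.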

In general, messages look like this:

1. message header
2. area line (if echomail)
3. optional ctrl-a control lines
4. user text, optionally interspersed with intra-text control lines
5. optional ctrl-a control lines
6. tear line (if echomail)
7. origin line (if echomail)
8. seen-by lines (if echomail)
9. optional ctrl-a control lines

The format of the fixed message header is as follows:

Offset
Dec  Hex  Length   Data Type   Contents
 0    0   2        integer     constant value of 2, the message type
 2    2   2        integer     originating node
 4    4   2        integer     destination node
 6    6   2        integer     originating net
 8    8   2        integer     destination net
10    A   2        integer     attributes
12    C   2        integer     cost (useless)
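
The fixed message header is seven integers, so it can be unpacked in
a single call.  This Python sketch uses assumed names for the
function and for the keys of the result.

```python
import struct

MESSAGE_HEADER = struct.Struct("<7H")   # 7 integers, 14 bytes

def parse_message_header(data, offset=0):
    """Return the fixed message header found at offset as a dict."""
    (message_type, orig_node, dest_node, orig_net, dest_net,
     attributes, cost) = MESSAGE_HEADER.unpack_from(data, offset)
    if message_type != 2:
        raise ValueError("message type is not 2")
    return {
        "orig": (orig_net, orig_node),
        "dest": (dest_net, dest_node),
        "attributes": attributes,
        "cost": cost,
    }
```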

The attributes are a set of 16 bits, which have the following
meaning when set:

0x0001 - message is private
0x0002 - message to be crashed directly to destination
0x0004 - message has been read by recipient
0x0008 - message has been sent
0x0010 - subject specifies a file that accompanies message
0x0020 - message is in-transit
0x0040 - message destination is unknown (orphaned)
0x0080 - delete message after sending it
0x0100 - message was written locally
0x0200 - message is to be placed on hold to be picked up
0x0400 - officially unused, but has several unofficial uses
0x0800 - subject specifies name of file to file-request
0x1000 - return receipt requested
0x2000 - message is a return receipt
0x4000 - an audit trail is requested
0x8000 - subject specifies a file update request
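
One way to work with the attribute word is as a set of named flags.
The Python sketch below uses enum.IntFlag; the flag names are
invented for illustration, and only the bit values come from the
list above.

```python
from enum import IntFlag

class Attribute(IntFlag):
    PRIVATE = 0x0001
    CRASH = 0x0002
    READ = 0x0004
    SENT = 0x0008
    FILE_ATTACHED = 0x0010
    IN_TRANSIT = 0x0020
    ORPHAN = 0x0040
    DELETE_WHEN_SENT = 0x0080
    LOCAL = 0x0100
    HOLD = 0x0200
    UNUSED_0400 = 0x0400
    FILE_REQUEST = 0x0800
    RECEIPT_REQUESTED = 0x1000
    IS_RECEIPT = 0x2000
    AUDIT_REQUESTED = 0x4000
    FILE_UPDATE_REQUEST = 0x8000

def describe(attributes):
    """Return the names of all bits set in the attribute word."""
    return [flag.name for flag in Attribute if attributes & flag]
```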

Then comes a date string, which is always 20 bytes in length,
including the NUL terminator.  It is suggested you treat this
as a variable length string, for historical reasons.

The date string can take one of two formats, shown here by example:

1. "01 Oct 95  20:31:59"
2. "Sun  1 Oct 95 20:31"

It is suggested that new programs always generate the first format,
but accept either format.

Note:

1. The day in the first format has values from "01" to "31" with a
leading zero.
2. The day in the second format has values from " 1" to "31", with
no leading zero.
3. In both formats, the hours, minutes, seconds and year are zero
filled.
4. The range of days of the week for the second format is "Mon",
"Tue", "Wed", "Thu", "Fri", "Sat", "Sun".  No language translations
are allowed.
5. The range of month for both formats is "Jan", "Feb", "Mar", "Apr",
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec".  No language
translations are allowed.
6. All the characters in the date are the corresponding hex values
from ISO 646 (same as ASCII).
7. Only 19 bytes are shown in the above formats; the NUL terminator
makes it 20 bytes total.
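
A reader that accepts both formats might look like the Python sketch
below.  It returns the 2-digit year exactly as stored and leaves the
choice of century to the caller.  Returning 0 for the seconds of the
second format is an assumption, as is the function name.

```python
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# 1. "01 Oct 95  20:31:59"    2. "Sun  1 Oct 95 20:31"
FORMAT_1 = re.compile(
    r"(\d\d) ([A-Z][a-z]{2}) (\d\d)  (\d\d):(\d\d):(\d\d)")
FORMAT_2 = re.compile(
    r"([A-Z][a-z]{2}) ([ \d]\d) ([A-Z][a-z]{2}) (\d\d) (\d\d):(\d\d)")

def parse_date(field):
    """Return (year, month, day, hour, minute, second) from a field."""
    text = field.split(b"\x00", 1)[0].decode("ascii")
    match = FORMAT_1.fullmatch(text)
    if match:
        day, month, year, hour, minute, second = match.groups()
    else:
        match = FORMAT_2.fullmatch(text)
        if match is None or match.group(1) not in DAYS:
            raise ValueError("unrecognised date string")
        _, day, month, year, hour, minute = match.groups()
        second = "00"    # assumption: format 2 carries no seconds
    if month not in MONTHS:
        raise ValueError("unrecognised month name")
    return (int(year), MONTHS.index(month) + 1, int(day),
            int(hour), int(minute), int(second))
```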

Following the date comes the "To name", which is a maximum of 36
characters including a terminating NUL.  There is no restriction
on the content of this field, although it is suggested that
implementations not put any ISO-646 defined control characters in
this field, or the following 2 fields.

After the "To name" field comes the "From name" field, which follows
the same rules as the "To name".

The next field is the "Subject" field, which follows the same rules
as stated above, except that it is a maximum of 72 characters, not
36.
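
The four NUL-terminated fields can be read one after another, each
with its own maximum length.  The Python sketch below treats the
date as a variable length string, as suggested above.  Both function
names are hypothetical.

```python
def read_string(data, offset, max_length):
    """Read a NUL-terminated field of at most max_length bytes.

    max_length includes the NUL.  Returns the field without its NUL
    and the offset of the byte that follows the NUL.
    """
    end = data.find(b"\x00", offset, offset + max_length)
    if end == -1:
        raise ValueError("field is not NUL-terminated in time")
    return data[offset:end], end + 1

def read_message_strings(data, offset):
    """Read date, "To name", "From name" and "Subject" in turn."""
    date, offset = read_string(data, offset, 20)
    to_name, offset = read_string(data, offset, 36)
    from_name, offset = read_string(data, offset, 36)
    subject, offset = read_string(data, offset, 72)
    return date, to_name, from_name, subject, offset
```

The returned offset points at the first byte of the text block.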

After the "Subject" comes a text block, which has a mixture of
control information and message text in it.  A NUL character
terminates the text block, thus NULs may not be embedded in the
text block.

A lot of control information is present in this text block, via
control lines.  Control lines generally have the following
format:

^AKEYWORD:data<newline>

Where ^A is x'01', KEYWORD is one of any number of keywords, some
defined in this document, some not, and the data is a
keyword-dependent field, which may be empty.  The data may or may
not contain a leading space.  It is highly recommended that a
leading space be inserted if data follows the keyword, and that no
space be inserted otherwise.  <newline> can be either <CR> or
<CR><LF>.  The use of <CR><LF> for <newline> is an obsolescent
feature that may be removed in a future version of this document.
For x'01' to introduce a control line, it must be the very first
character in the text block, or else follow a <newline>.  The
KEYWORD is always composed of printable ISO-646 characters.
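
The rules above can be sketched in Python as follows.  The text
block is split on <CR> or <CR><LF>, and a line that begins with
x'01' is split into keyword and data.  Ending the keyword at the
first ":" or space is an assumption made here so that keywords
written without a ":" are also handled.  The function names are
hypothetical.

```python
def split_lines(text):
    """Split a text block (without its final NUL) into lines."""
    lines = text.replace(b"\r\n", b"\r").split(b"\r")
    if lines and lines[-1] == b"":
        lines.pop()              # nothing follows the last <newline>
    return lines

def parse_control_line(line):
    """Return (keyword, data) for a ^A line, or None otherwise."""
    if not line.startswith(b"\x01"):
        return None
    body = line[1:]
    for index, char in enumerate(body):
        if char in b": ":
            # Drop the optional leading space in front of the data.
            return body[:index], body[index + 1:].lstrip(b" ")
    return body, b""
```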

The following control lines have meanings as follows:


o INTL <3d destination address> <3d originating address>

This control line is an exception in that it does not have a ":"
after the keyword.  INTL is used to send inter-zone messages.  It
may also be used in intra-zone messages, where it is not required
to be present.


o FMPT <point number>

This control line is an exception in that it does not have a ":"
after the keyword.  It is used to specify the point number of
the originating node.  It is only to be used for messages that
are from points.


o TOPT <point number>

This control line is an exception in that it does not have a ":"
after the keyword.  It is used to specify the point number of
the destination node.
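
Taken together, INTL, FMPT and TOPT give full 4D addresses for both
ends of a message.  The Python sketch below assumes the control
lines have already been collected into a dict of keyword to data,
that INTL is present, and that a missing FMPT or TOPT means point 0.
These assumptions and the function names are not part of this
document.

```python
def parse_3d(text):
    """Turn "zone:net/node" into a (zone, net, node) tuple."""
    zone, rest = text.split(":")
    net, node = rest.split("/")
    return int(zone), int(net), int(node)

def resolve_addresses(controls):
    """Return (originating, destination) as 4D tuples."""
    # INTL lists the destination address first, then the origin.
    dest_text, orig_text = controls["INTL"].split()
    orig_point = int(controls.get("FMPT", "0"))
    dest_point = int(controls.get("TOPT", "0"))
    return (parse_3d(orig_text) + (orig_point,),
            parse_3d(dest_text) + (dest_point,))
```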


o MSGID: <address> <number>

The MSGID is an optional control line which is used to uniquely
identify a message.  The <address> field is the originating address
of the sender, not necessarily an address in the current network.
If the address contains spaces, then it should be enclosed in
double-quotes.  The <number> is a sequence of exactly 8 hex digits.
A node that generates MSGIDs guarantees that no two messages leaving
that node will contain a <number> that is the same as the <number> on
another message that was sent from this node in the last 3 years.
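
A MSGID line can be written and read back as in the Python sketch
below.  Writing the hex digits in lower case is an assumption, and
keeping <number> unique for 3 years is left to the caller.  The
function names are hypothetical.

```python
import re

MSGID_DATA = re.compile(r'("[^"]*"|\S+) ([0-9A-Fa-f]{8})')

def format_msgid(address, number):
    """Build a complete MSGID control line ending in <CR>."""
    if not 0 <= number <= 0xFFFFFFFF:
        raise ValueError("number does not fit in 8 hex digits")
    if " " in address:
        address = '"' + address + '"'
    return "\x01MSGID: %s %08x\r" % (address, number)

def parse_msgid(data):
    """Return (address, number) from MSGID data, or None."""
    match = MSGID_DATA.fullmatch(data)
    if match is None:
        return None
    return match.group(1).strip('"'), int(match.group(2), 16)
```

The same parser applies to the data of a REPLY line, since it is
copied from a MSGID.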


o REPLY: <address> <number>

The REPLY is an optional control line which is used to uniquely
define which message is being replied to.  The <address> and
<number> are both copied from the original message's MSGID
control line.


o AREA:<data>

If the very first line of the text block starts with
"AREA:", the message is an echomail message, in the
conference with name <data>.  On receipt of an echomail message,
a node will normally export this message to 0 or more destinations,
by sending multiple "netmail" messages to these destinations, with
the same "AREA" information.  This control line is an exception in
that it does not start with a ctrl-A.
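
Telling echomail from netmail therefore needs only the first line of
the text block.  A minimal Python sketch, with a hypothetical
function name:

```python
def echomail_area(text):
    """Return the conference name, or None if this is not echomail."""
    first_line = text.split(b"\r", 1)[0]
    if first_line.startswith(b"AREA:"):
        return first_line[len(b"AREA:"):]
    return None
```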


o --- <data>

This control line, known as the "tear line", appears in echomail
messages.  <data> is normally the name of the program which wrote the
echomail control lines.  A tear line may be empty, in which case it
is just "---".  It is unknown whether this line is required in
echomail or not.  It is suggested that new programs always generate a
tear line, but accept a message with or without one.  This control
line is an exception in that it does not start with a ctrl-A.


o  * Origin: <data> (<address>)
 
This control line, known as the "origin line", appears in echomail
messages.  The address should be the address of the originating
system in the current network.  It should be a 4D address, although
new programs should accept 2D and 3D addresses also.  This control
line is required for all echomail messages, although it is suggested
that new programs allow a message to be processed that doesn't have
one, or doesn't have an address in it.  This control line is an
exception in that it does not start with a ctrl-A.  It is
recommended that <data> contain the system name, but there is no
requirement for this.
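
The tear line and origin line can be recognised as in the Python
sketch below.  Taking the last parenthesised group as the address is
an assumption, made so that <data> may itself contain parentheses.
The function names are hypothetical.

```python
import re

ORIGIN = re.compile(r" \* Origin: (.*)\(([^()]*)\)\s*")

def is_tear_line(line):
    """True for an empty tear line or one followed by data."""
    return line == "---" or line.startswith("--- ")

def parse_origin(line):
    """Return (data, address) from an origin line, or None."""
    match = ORIGIN.fullmatch(line)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()
```

The address is returned as text, since 2D, 3D and 4D forms should
all be accepted.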


o SEEN-BY: <data>

The SEEN-BY control lines are used in echomail to tell which systems
within the current zone this particular message has been sent to.
They are an exception in that they do not start with a ctrl-A.  An
inter-zone message may have the SEEN-BYs stripped before crossing the
boundary, except for both the sender's own address and the
recipient's address.  The inter-zone recipient may then strip all
SEEN-BYs except for its own address, in case the sender failed to do
so; this way it also picks up the one extra address.  The SEEN-BYs
should be sorted in numerical order.  The SEEN-BYs should always
start off with a net/node pair, although new programs should accept
data starting with a node, so long as it is not the very first line.
An example set of SEEN-BY lines would be:

o SEEN-BY: 711/444 555 666 712/111
o SEEN-BY: 712/222 333

When generating SEEN-BY lines (at least for new applications), each
line, from the first S to the last number before the newline
inclusive, should not exceed 69 characters in length.  New
implementations should accept at least 79 characters, although it is
suggested that the SEEN-BY block be treated as a single buffer,
avoiding any line length limitations.  The SEEN-BYs must be in a
single block.  Applications must be able to handle a minimum of 450
addresses in total.
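
The abbreviated net/node notation and the 69-character limit can be
sketched in Python as follows.  The parser carries the net across
line boundaries, so it treats the block as one buffer.  The writer
sorts the addresses and starts every line with a full net/node pair.
Function and parameter names are assumptions.

```python
def parse_seen_by(lines):
    """Expand SEEN-BY lines into a list of (net, node) pairs."""
    pairs = []
    net = None
    for line in lines:
        for token in line[len("SEEN-BY:"):].split():
            if "/" in token:
                net_text, token = token.split("/")
                net = int(net_text)
            if net is None:
                raise ValueError("first entry has no net")
            pairs.append((net, int(token)))
    return pairs

def format_seen_by(pairs, keyword="SEEN-BY:", limit=69):
    """Build sorted lines of at most limit characters each."""
    lines = []
    current = None
    last_net = None
    for net, node in sorted(set(pairs)):
        full = " %d/%d" % (net, node)
        item = " %d" % node if net == last_net else full
        if current is not None and len(current) + len(item) <= limit:
            current += item
        else:
            if current is not None:
                lines.append(current)
            current = keyword + full   # new line: full net/node pair
        last_net = net
    if current is not None:
        lines.append(current)
    return lines
```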


o PATH: <data>

The PATH lines are used in echomail to tell which net/nodes a message
travelled through to arrive at your node.  The addresses are in the
same format as in the SEEN-BY lines.  However, the addresses are not
stripped at a zone boundary.  The length of the PATH line follows the
same rules as for the SEEN-BY, with 'P' substituted for 'S'.