Format of CTW HTML files
* Notation used in this document:
- ccc means the three-digit representation of the first byte of the IP
number.
- nnn and xxx are taken literally as strings of three n's or x's.
- For the moment this document describes only toplevel nodes.
(Within the downlinks section, the first byte is fixed while the
second byte varies. Within the sidelinks section, the first byte varies.)
- IP prefix means the first byte or first two bytes, in form nnn or nnn.nnn
respectively.
* Overall organization and toplevel tags:
- The file must be a valid HTML-format file, acceptable to all the
major Web browsers including lynx. In particular this means it must
not depend on tables or frames or javascript or images etc.
- There must be exactly one each of these two strings:
DownLinks (ccc.nnn)
SideLinks (not ccc.xxx)
And they must appear in the sequence shown.
These strings are case-sensitive and must appear literally, exactly
as shown, with exactly one space (not a tab) wherever a space is shown
in the visible text. These strings are referred to below
as the downtag and the sidetag.
- Between the downtag and the sidetag there must be exactly one
, case-insensitive, in that sequence.
The portion of the file between these two tags constitutes the
downlinks section.
- After the sidetag there must be at least one , the first of
which must be preceded by exactly one , case insensitive.
The portion of the file between these two tags constitutes
the sidelinks section. (Any later items are ignored.)
- Except for the above-mentioned tags, which are used to locate these
two sections, and the general requirement to be acceptable to lynx
etc., the content of the file outside these two sections is
unrestricted.
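As an illustration of the downtag/sidetag rule above, here is a minimal sketch in Python. The function name locate_tags is hypothetical, and the sketch assumes that ccc is the first byte written in zero-padded decimal (e.g. 012); neither is stated by this document.

```python
def locate_tags(html: str, first_byte: int) -> tuple[int, int]:
    """Return the offsets of the downtag and sidetag, or raise ValueError."""
    ccc = f"{first_byte:03d}"  # assumption: zero-padded decimal
    downtag = f"DownLinks ({ccc}.nnn)"
    sidetag = f"SideLinks (not {ccc}.xxx)"
    # Each string must occur exactly once; the match is case-sensitive.
    for tag in (downtag, sidetag):
        found = html.count(tag)
        if found != 1:
            raise ValueError(f"expected exactly one {tag!r}, found {found}")
    down_at, side_at = html.index(downtag), html.index(sidetag)
    if down_at > side_at:
        raise ValueError("the downtag must appear before the sidetag")
    return down_at, side_at
```

A caller would then search for the two sections in the text after each returned offset.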
* Format of each section (downlinks and sidelinks):
- Within each section, <LI> and </LI> tags (case insensitive) must
exactly alternate, with <LI> first and </LI> last. The text after
each <LI> and before the following </LI> describes a single IP address
block or group of consecutive blocks which share the same CTW info.
- The text outside these <LI> </LI> pairs is unrestricted, except that
it should be rather terse to make these sections easy to read.
- The blocks described by these <LI> </LI> pairs must be in consecutive
sequence by whichever byte is varying.
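The alternation rule above can be sketched as follows. This is only an illustration: it assumes the list-item delimiters are <LI> and </LI> written without attributes or inner whitespace, and split_list_items is a hypothetical name.

```python
import re

# Assumed delimiters: <LI> and </LI>, case-insensitive, no attributes.
LI_TAG = re.compile(r"<(/?)li>", re.IGNORECASE)

def split_list_items(section: str) -> list[str]:
    """Return the text of each list item, enforcing strict alternation."""
    items = []
    open_end = None  # end offset of the currently open <LI>, if any
    for m in LI_TAG.finditer(section):
        if m.group(1) == "/":  # closing tag
            if open_end is None:
                raise ValueError("</LI> without a preceding <LI>")
            items.append(section[open_end:m.start()])
            open_end = None
        else:  # opening tag
            if open_end is not None:
                raise ValueError("<LI> while the previous item is still open")
            open_end = m.end()
    if open_end is not None:
        raise ValueError("the last tag in the section must be </LI>")
    return items
```

Text between a closing tag and the next opening tag is simply skipped, matching the rule that such text is not significant.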
* Format of text (List Item) delimited by a single <LI> </LI> pair:
- There are two kinds of list items: Those containing anchored IP prefixes
and those containing plaintext IP prefixes. In the former case,
all anchors within the list item must be anchored IP prefixes.
In the latter case, there may be other anchors, but they must appear
after all plaintext IP prefixes, and they may not contain IP prefixes
themselves.
* Format of list item containing plaintext IP prefix(es):
- Plaintext IP prefixes must be immediately preceded by the <LI> or
by whitespace.
- Plaintext IP prefixes must be immediately followed by whitespace, or by
three periods then whitespace. Three periods indicate the prefix is valid
but there is no CTW data available. Immediate whitespace without periods
indicates the prefix is not valid (forgery or reserved block etc.).
- If anchors appear after all the plaintext IP prefixes, they point to
search engines where CTW data might be found.
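A rough sketch of how the plaintext-prefix rules above might be checked. It assumes each byte is written as exactly three digits, that the input is the item text starting right after the opening list-item tag, and the status names "no-data" and "invalid" are labels invented for this example.

```python
import re

# A prefix is one or two three-digit bytes (assumption: always three digits).
# It must start the item or follow whitespace, and be followed by whitespace
# or by three periods and then whitespace.
PLAINTEXT_PREFIX = re.compile(
    r"(?:^|(?<=\s))(\d{3}(?:\.\d{3})?)(\.\.\.)?(?=\s)"
)

def classify_plaintext_prefixes(item_text: str) -> list[tuple[str, str]]:
    """Return (prefix, status) pairs in order of appearance."""
    return [
        # Three periods: valid prefix, no CTW data. Otherwise: not valid.
        (m.group(1), "no-data" if m.group(2) else "invalid")
        for m in PLAINTEXT_PREFIX.finditer(item_text)
    ]
```

For example, the item text "000 001... 002... (try a search engine)" yields one invalid prefix followed by two valid prefixes without CTW data.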
* Format of list item containing anchored IP prefix(es):
- There can be any number of anchors in the list item. Each is of the
format described next:
- Each anchor starts with <A HREF=" followed by the URL.
- The URL must be either mailto:user@host or http://host/dir...filename
- The URL may not contain any quote marks.
- There may be no text of any kind between " and >
- After "> follows the visible button label text, then </A> (case insensitive)
- The visible button label must be plain text, no HTML of any kind.
- There may be no spaces or other text between < and /a, or between /a and >
- Before the first <A, between each </A> and the next <A, and after the
last </A>, there may be other text or HTML which is not
critical, but please try to minimize it to make the WebPage easy to read.
- The keyword UpStream (case sensitive) may appear before the first anchor
within the list item, or between the anchors, but not after the last
anchor in the list item. This keyword may appear at most once within
each list item, and may not appear inside an anchor.
- If UpStream occurs, then all anchors (if any) before it refer to direct
complaint addresses, whereas all anchors after it refer to upstream
complaint addresses.
- If no UpStream occurs, then all anchors refer to direct complaint addresses.
- Preferred format is / UpStream: if it occurs between anchors, and
UpStream only: if it occurs before the first anchor; however, this
style preference isn't enforced by the parser.
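The anchor and UpStream rules above might be applied along these lines. This is a sketch under assumptions: the anchor syntax is taken to be <A HREF="URL">label</A> with no extra attributes, and split_direct_upstream is a hypothetical helper, not the actual parser.

```python
import re

# Assumed anchor syntax: <A HREF="URL">label</A>, tag names case-insensitive,
# no quote marks in the URL and no HTML inside the label.
ANCHOR = re.compile(r'<a href="([^"]*)">([^<]*)</a>', re.IGNORECASE)

def split_direct_upstream(item_text: str):
    """Return (direct, upstream) lists of (url, label) pairs for one item."""
    if item_text.count("UpStream") > 1:
        raise ValueError("UpStream may appear at most once per list item")
    anchors = list(ANCHOR.finditer(item_text))
    cut = item_text.find("UpStream")  # case-sensitive search
    if cut == -1:
        # No keyword: every anchor is a direct complaint address.
        return [(m.group(1), m.group(2)) for m in anchors], []
    if any(m.start() < cut < m.end() for m in anchors):
        raise ValueError("UpStream may not appear inside an anchor")
    direct = [(m.group(1), m.group(2)) for m in anchors if m.end() <= cut]
    upstream = [(m.group(1), m.group(2)) for m in anchors if m.start() > cut]
    if not upstream:
        raise ValueError("UpStream may not appear after the last anchor")
    return direct, upstream
```

The style preference (/ UpStream: versus UpStream only:) is deliberately not checked, since the parser does not enforce it either.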
* Format of visible text labels on anchor buttons:
- Three formats are allowed:
(1) An IP prefix followed by / followed by number of bits (8 times
the number of bytes specified in the prefix)
(2) An IP prefix followed by ...
(3) An equal sign all by itself
- A list item re-directing the user to a lower-level CTW node must
consist of exactly one anchor of format (2), and may not have any
UpStream tag anywhere within it.
- A list item giving complaint addresses directly may consist of any
positive number of anchors of format (1) optionally followed by any
number of anchors of format (3). If more than one anchor of format (1)
occurs, then:
- All but the last byte of the prefix must be the same for all anchors.
- The last byte of consecutive anchor labels must appear in
consecutive numerical sequence with no gaps.
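The three label formats, and the consecutive-run rule for format (1), could be checked roughly as below. The function names are hypothetical, and the sketch assumes every byte in a label is written as exactly three digits.

```python
import re

# prefix, then either /bits (format 1) or ... (format 2)
LABEL = re.compile(r"^(\d{3}(?:\.\d{3})*)(?:/(\d+)|(\.\.\.))$")

def label_format(label: str) -> int:
    """Return 1, 2 or 3 for a valid label; raise ValueError otherwise."""
    if label == "=":
        return 3
    m = LABEL.match(label)
    if not m:
        raise ValueError(f"unrecognised label {label!r}")
    prefix, bits, dots = m.groups()
    if dots:
        return 2
    n_bytes = prefix.count(".") + 1
    if int(bits) != 8 * n_bytes:
        raise ValueError(f"{label!r}: expected /{8 * n_bytes}")
    return 1

def check_consecutive(labels: list[str]) -> None:
    """Check a run of format (1) labels within one list item."""
    prefixes = [label.split("/")[0].split(".") for label in labels]
    if len({tuple(p[:-1]) for p in prefixes}) > 1:
        raise ValueError("all but the last byte must be identical")
    lasts = [int(p[-1]) for p in prefixes]
    if any(b - a != 1 for a, b in zip(lasts, lasts[1:])):
        raise ValueError("last bytes must be consecutive with no gaps")
```

For instance, 012.034/16 is format (1), 012.034... is format (2), and 012.034/8 is rejected because a two-byte prefix requires 16 bits.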
* Coordination between visible labels and URLs in anchors within a list item:
- A list item referring the user to another CTW node must have, in
the only anchor, a URL of the form: http://host/dir...filename
- In a list item giving complaint addresses directly, each anchor
must have a URL of the form: mailto:user@host or Furthermore:
- Within the first set of anchors, each of the format (1),
the user@host must be the same.
- Within the second set of anchors, each of format (3),
the user@host must all be different, and each must be
different from the user@host shared by those of format (1).
- If you have time/energy to complain only to one address, you should
complain to the single address common to all the (1) format anchors.
But that address and all the addresses listed in the (3) format anchors
are appropriate for complaining about spam from any block indicated by the
prefixes listed in any (1) format anchors within the list item.
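The address rules for a list item giving complaint addresses can be sketched like this. It is a simplification: URLs are compared as plain strings rather than by extracting user@host, and check_complaint_addresses is a hypothetical name. The input is the list of (url, label) pairs of one list item, in order.

```python
def check_complaint_addresses(anchors: list[tuple[str, str]]):
    """Return (shared_url, extra_urls) or raise ValueError."""
    labels = [label for _, label in anchors]
    n_primary = sum(1 for label in labels if label != "=")
    if n_primary == 0:
        raise ValueError("at least one format (1) anchor is required")
    # All format (1) anchors must come before any format (3) anchor.
    if any(label == "=" for label in labels[:n_primary]):
        raise ValueError("format (3) anchors must follow all format (1) anchors")
    primary = {url for url, _ in anchors[:n_primary]}
    if len(primary) != 1:
        raise ValueError("all format (1) anchors must share one address")
    extra = [url for url, _ in anchors[n_primary:]]
    if len(set(extra)) != len(extra) or primary & set(extra):
        raise ValueError("each format (3) address must be distinct")
    return primary.pop(), extra
```

The first element of the result is the single address to use when complaining to only one place.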
* Coordination between UpStream labels and anchors within a list item:
- If there are only upstream complaint addresses, no direct addresses,
the UpStream label (preferably UpStream only:) must appear before
the first anchor within the list item.
- If there are only direct complaint addresses, no upstream addresses,
there must be no UpStream label anywhere within the list item.
- If there are both direct and upstream complaint addresses, the UpStream
label (preferably / UpStream) must appear somewhere after the last
of the (1) format anchors, before or amidst the (3) format anchors.
The semantics are that the address common to all the (1) format anchors,
and the address within each (3) format anchor preceding the UpStream
tag, are direct, while the addresses within the (3) format anchors after
the UpStream tag are upstream.
- If the user wants to complain to just one direct address and one upstream
address, the user should click on the button showing the IP address
prefix of the spam source, then click on the first button after
the UpStream label within that same list item.
* Global coordination of visible button labels across an entire section:
- The first IP prefix (of format (1) or (2) anchors) within any list item
(after the first within the section) must have the variable-byte part
greater than the last IP prefix of the previous list item.
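A small sketch of the section-wide ordering rule. It assumes the prefixes of each list item have already been extracted as strings such as 012.034, in order of appearance, and that the variable byte is the last byte of each prefix; check_section_order is a hypothetical name.

```python
def variable_byte(prefix: str) -> int:
    """The last byte of a prefix such as '012.034' is the one that varies."""
    return int(prefix.split(".")[-1])

def check_section_order(items: list[list[str]]) -> None:
    """items holds, per list item, its prefixes in order of appearance."""
    prev_last = None
    for n, prefixes in enumerate(items):
        if not prefixes:
            continue  # an item without prefixes does not affect ordering
        first = variable_byte(prefixes[0])
        if prev_last is not None and first <= prev_last:
            raise ValueError(
                f"list item {n}: {first} is not greater than {prev_last}"
            )
        prev_last = variable_byte(prefixes[-1])
```

Gaps between list items are allowed by this check; only a repeat or a step backwards is rejected.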
* Specific requirements of DownLinks section:
- Every (1) or (2) format visible button label contains the entire
IP prefix associated with the node (a single byte for toplevel nodes,
two bytes for second-level nodes), followed by one extra byte which
varies from anchor to anchor.
* Specific requirements of SideLinks section:
- Every (1) or (2) format visible button label contains the IP prefix
associated with the node but with the last byte removed (hence for
toplevel nodes nothing remains here, while for second-level nodes
only the first byte remains), followed by one extra byte which
varies from anchor to anchor.
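The difference between the two sections comes down to which part of the label prefix is fixed. A minimal sketch, with hypothetical function names and assuming three-digit bytes:

```python
def fixed_part(node_prefix: str, section: str) -> str:
    """Return the part of a label prefix that is fixed within a section."""
    parts = node_prefix.split(".")
    if section == "DownLinks":
        kept = parts        # the whole node prefix is fixed
    elif section == "SideLinks":
        kept = parts[:-1]   # the node prefix minus its last byte
    else:
        raise ValueError(f"unknown section {section!r}")
    return ".".join(kept)

def label_belongs(label_prefix: str, node_prefix: str, section: str) -> bool:
    """True if label_prefix is the fixed part plus one extra varying byte."""
    head, _, last = label_prefix.rpartition(".")
    return (head == fixed_part(node_prefix, section)
            and len(last) == 3 and last.isdigit())
```

For the toplevel node 012, DownLinks labels look like 012.nnn and SideLinks labels look like nnn; for the second-level node 012.034, they look like 012.034.nnn and 012.nnn respectively.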
Addenda: Need to fold these requirements in somewhere above:
- Any list item containing unassigned prefixes or no-info-available
prefixes... must contain no other text, except whitespace, from the
start of the list item to the first open-parenthesis,
and the variable byte of the prefixes must be in consecutive
increasing sequence.
- All the prefixes within a single section, whether in visible labels
of anchors, or in toplevel text for unassigned or no-info-available,
must be in ascending numerical sequence.
- Note: no-info-available merely means there's no CTW node for that
prefix, but typically there's a search engine available, which is
mentioned in the parenthetical remark. If there's no search engine
(except ARIN which doesn't list abuse nor admin addresses), typically
that prefix will simply be omitted from the section.