My accomplishments 2004.October:
- Oct.01: Wrote add-on to recursive-descent parser for Received: lines,
whereby the parse-spec that succeeded is then used to explore the various
parts of the original string (the Received: line) and copy out each substring
that is wanted and discard the rest, returning an association list matching
wanted field-names with their string-values.
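As a rough illustration of that idea, here is a minimal Python sketch. It is not the original Lisp code: a simplified left-to-right matcher with no alternation or backtracking stands in for the full recursive-descent parser, and the TOKENS patterns, the WANTED set, and the match_spec function are hypothetical names invented for this example.

```python
import re

# Hypothetical token patterns. The names echo keywords seen in parse specs,
# but these regular expressions are guesses made for illustration only.
TOKENS = {
    "W": r"\s+",                              # a run of whitespace
    "HOST": r"[A-Za-z0-9.-]+",                # host name or dotted number
    "IPNUM4": r"\d{1,3}(?:\.\d{1,3}){3}",     # dotted-quad IP number
}

# Fields whose matched substring is copied out; everything else is discarded.
WANTED = {"IPNUM4"}


def match_spec(spec, text):
    """Match spec items left to right against text.

    Returns a list of (field-name, substring) pairs for the wanted fields,
    or None if the spec does not match the whole string.
    """
    pos = 0
    alist = []
    for item in spec:
        if isinstance(item, tuple):           # ("EXACT", "literal text")
            literal = item[1]
            if not text.startswith(literal, pos):
                return None
            pos += len(literal)
        else:                                 # a named token such as "W"
            m = re.compile(TOKENS[item]).match(text, pos)
            if m is None:
                return None
            if item in WANTED:
                alist.append((item, m.group()))
            pos = m.end()
    return alist if pos == len(text) else None


SPEC = [("EXACT", "from"), "W", "HOST", "W",
        ("EXACT", "("), "IPNUM4", ("EXACT", ")")]
result = match_spec(SPEC, "from 66.94.237.39 (66.94.237.39)")
```

The returned list of pairs plays the role of the association list: only substrings matched by wanted fields survive.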
- Oct.02 morning: Wrote a further add-on which checks the IP number for
correctness and converts it to four numeric bytes, and which combines the
individually parsed parts of the date&time&timezone into a single number:
the count of seconds since the beginning of the year 1900, Universal Time.
Now that all three parts of this utility are working, here's an example of
what it does:
- The RFC822 header line as it appeared in incoming e-mail:
Received: from 66.94.237.39 (HELO n5a.bulk.scd.yahoo.com)
(66.94.237.39) by mta402.mail.scd.yahoo.com with SMTP;
Thu, 19 Aug 2004 16:42:11 -0700
- The parse specification which succeeded in parsing it:
((:EXACT "from") :W :EMAIL1 :W (:EXACT "(HELO") :W :JW (:EXACT ")") :W
(:EXACT "(") :IPNUM4 (:EXACT ")") (:|01| (:W)) (:EXACT "by") :W :EMAIL1
:W+WITH3? (:EXACT ";") :W :DATIME*)
- The association list matching wanted field names against values:
((:IPNUM4 . "66.94.237.39") (:DAYOFMONTH . "19") (:MONTHNAME . "Aug")
(:YEAR . "2004") (:HRS . "16") (:MIN . "42") (:SEC . "11") (:TZSIGN . "-")
(:MILHM . "0700"))
- The final result showing IP number as bytes, and time as seconds
since 1900.Jan.01 00:00:00 UT:
((66 94 237 39) 3301947731)
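The conversion in the Oct.02 morning entry can be checked with a short Python sketch. This is an illustration, not the original code: the function names ip_to_bytes and seconds_since_1900 are hypothetical, and the field names are taken from the association list in the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH_1900 = datetime(1900, 1, 1, tzinfo=timezone.utc)
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def ip_to_bytes(text):
    """Validate a dotted-quad IP number and return its four byte values."""
    parts = text.split(".")
    if len(parts) != 4 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError("not a dotted quad: %r" % text)
    octets = [int(p) for p in parts]
    if any(o > 255 for o in octets):
        raise ValueError("byte out of range: %r" % text)
    return octets


def seconds_since_1900(fields):
    """Combine parsed date, time, and timezone fields into seconds since
    1900.Jan.01 00:00:00 UT."""
    offset = timedelta(hours=int(fields["MILHM"][:2]),
                       minutes=int(fields["MILHM"][2:]))
    if fields["TZSIGN"] == "-":
        offset = -offset
    local = datetime(int(fields["YEAR"]),
                     MONTHS.index(fields["MONTHNAME"]) + 1,
                     int(fields["DAYOFMONTH"]),
                     int(fields["HRS"]), int(fields["MIN"]),
                     int(fields["SEC"]),
                     tzinfo=timezone(offset))
    return int((local - EPOCH_1900).total_seconds())


# The association list from the example, as a dictionary.
FIELDS = {"IPNUM4": "66.94.237.39", "DAYOFMONTH": "19", "MONTHNAME": "Aug",
          "YEAR": "2004", "HRS": "16", "MIN": "42", "SEC": "11",
          "TZSIGN": "-", "MILHM": "0700"}
result = [ip_to_bytes(FIELDS["IPNUM4"]), seconds_since_1900(FIELDS)]
# result == [[66, 94, 237, 39], 3301947731], the same values as the example
```

The -0700 offset moves 16:42:11 local time to 23:42:11 UT on the same day, which is where the figure 3301947731 comes from.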
- Oct.02 afternoon: Wrote preliminary-demo CGI application to generate
Unicode output in UTF-8 representation, specifically some sample text in
Spanish, Russian, and Hindi.
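A CGI program of this kind only needs to declare the charset in its header and then emit UTF-8 bytes. The following Python sketch shows the shape of it; the sample phrases, the SAMPLES table, and the function names are assumptions made for illustration, not the text or code of the actual demo.

```python
import sys

# Placeholder sample phrases; the real demo's text is not reproduced here.
SAMPLES = [
    ("Spanish", "¿Dónde está la biblioteca?"),
    ("Russian", "Привет, мир"),
    ("Hindi", "नमस्ते दुनिया"),
]


def build_response():
    """Return the complete CGI response (header plus body) as bytes."""
    header = "Content-Type: text/plain; charset=utf-8\r\n\r\n"
    body = "".join("%s: %s\n" % (lang, text) for lang, text in SAMPLES)
    # The header is pure ASCII; the body is encoded as UTF-8.
    return header.encode("ascii") + body.encode("utf-8")


def main():
    # Write raw bytes so the platform's default text encoding is bypassed.
    sys.stdout.buffer.write(build_response())
```

Writing bytes directly, rather than printing text, keeps the output in UTF-8 no matter what encoding the server environment assumes.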
- Oct.03: Building on the Received: line parser and its add-ons,
and my database of CTW (Complain To Whom) info, wrote software to trace
Yahoo! Mail Received lines backwards to determine whether the e-mail came
from outside Yahoo, in which case it gives the IP number of the injection
point, or from inside Yahoo, in which case it gives the IP number of the
origin.
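The backward trace can be sketched in Python as follows. This is a guess at the logic rather than the original code: find_source is a hypothetical name, the list of "inside" networks is a parameter the caller must supply, and the addresses in the usage lines are reserved documentation ranges, not real Yahoo blocks.

```python
import ipaddress


def find_source(client_ips, inside_networks):
    """Trace Received-line client IPs, newest hop first.

    Returns ("outside", ip) for the first client IP that is not in any
    inside network (the injection point), or ("inside", ip) with the
    oldest client IP if every hop stayed inside.
    """
    nets = [ipaddress.ip_network(n) for n in inside_networks]
    origin = None
    for text in client_ips:
        ip = ipaddress.ip_address(text)
        if not any(ip in net for net in nets):
            return ("outside", text)   # older lines are not examined
        origin = text
    return ("inside", origin)


# Usage with placeholder networks and addresses (documentation ranges).
INSIDE = ["192.0.2.0/24"]
injected = find_source(["192.0.2.10", "192.0.2.20", "198.51.100.7",
                        "203.0.113.5"], INSIDE)
local = find_source(["192.0.2.10", "192.0.2.20"], INSIDE)
```

Stopping at the first outside address is deliberate in this sketch: Received lines older than the injection point were written by machines outside the provider and can be forged.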
- Oct.07-09: Wrote utility to convert the notations that more and vi on
Unix use to display non-ASCII bytes back into the actual bytes, to be used
for collecting samples of UTF-8. Wrote utilities to convert between
Unicode integers and UTF-8 sequences. Wrote utility to convert from my
private accented-character notation, used in English-to-Spanish flashcards,
to Unicode integers. Converted my English-to-Spanish flashcard-drill
program to output in UTF-8, converting my private notation to UTF-8.
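For reference, the conversion between Unicode integers and UTF-8 sequences looks like this when written out by hand in Python. The function names are hypothetical, and this sketch makes one simplifying assumption: the decoder checks continuation bytes but does not reject overlong encodings.

```python
def encode_utf8(cp):
    """Convert one Unicode integer (code point) to its UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %r" % cp)
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def decode_utf8(data):
    """Convert UTF-8 bytes to a list of Unicode integers."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            n, cp = 0, b
        elif 0xC0 <= b < 0xE0:
            n, cp = 1, b & 0x1F
        elif 0xE0 <= b < 0xF0:
            n, cp = 2, b & 0x0F
        elif 0xF0 <= b < 0xF8:
            n, cp = 3, b & 0x07
        else:
            raise ValueError("bad lead byte at offset %d" % i)
        tail = data[i + 1:i + 1 + n]
        if len(tail) != n or any((t & 0xC0) != 0x80 for t in tail):
            raise ValueError("bad continuation at offset %d" % i)
        for t in tail:
            cp = (cp << 6) | (t & 0x3F)
        out.append(cp)
        i += 1 + n
    return out
```

For example, the Spanish letter ñ is the integer 241 (hex F1), which encodes as the two bytes C3 B1.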
- Oct.10-12: Put together the tools I've developed recently for
dealing with spam that has arrived on Yahoo! Mail. The full program now:
- manages an interactive Yahoo! Mail session, using the local-link trick
for reliably downloading Web pages in any desired sequence;
- keeps track of the current account, folder, 200-message section of the
folder, and message position within the folder, so I can pick up where I
left off;
- downloads the next spam message WebPage and parses it to find the header
lines and body of the message;
- uses the new Received: line parser to reliably obtain the SMTP-client
IP number and date&time&timezone;
- studies that info to find the injection point (last-relay) IP number and
detect whether it's local on Yahoo or injected from some other ISP;
- fetches my archived CTW (Complain To Whom) info about the IP address
block, using my archived traceroute info to look upstream when there's no
direct CTW data;
- formats the cover letter and sends the spam complaint;
- keeps track of CTW addresses that no longer work;
- if a spam complaint is successfully sent to at least one CTW address,
immediately appends a log entry to a journal file, so that if the program
is aborted for any reason I won't lose that info.
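The journal step at the end of the Oct.10-12 entry is the part most easily shown in code. Here is a minimal Python sketch of appending an entry so that it survives an abort; the function name append_journal and the idea of one text line per entry are assumptions, since the actual journal format is not described.

```python
import os


def append_journal(path, entry):
    """Append one entry as a line and force it out to disk immediately."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(entry.rstrip("\n") + "\n")
        f.flush()               # push Python's buffer to the OS
        os.fsync(f.fileno())    # ask the OS to write it to the disk
```

Opening in append mode for each entry, instead of holding the file open, means an aborted run can lose at most the entry being written at that moment.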
- Oct.14: Wrote software to scan the last-relay info collected from
the 1590 spam messages (18.3 megabytes) that I haven't yet deleted from my
Unix shell account, together with the last-relay info I collected Sep.01-09
from 7360 msgs (47.6 megabytes) on my Yahoo! Mail accounts. It looks up the
spam-complaint address for each, classifies each by ISP, and counts how
many spam came from each particular ISP, to yield a list of all the ISPs
that sent me more than 100 different spam each in that particular collection.
That list is: Galaxyvisions Yipes Internap Comcast Savvis Verio
Exodus BlueRockDove ProcessRequest ServePath ServerPronto
IndustryTel Sprint NameIntel connect.com.au libero.it
telesp.net.br Genuity ChinaNet.
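The counting step amounts to a tally with a threshold. A minimal Python sketch, assuming each message has already been classified to an ISP name; the function name heavy_senders and the sample data are invented for illustration and are not the real counts.

```python
from collections import Counter


def heavy_senders(isp_per_message, threshold=100):
    """Return (isp, count) pairs for ISPs with more than threshold spam,
    largest count first."""
    counts = Counter(isp_per_message)
    return [(isp, n) for isp, n in counts.most_common() if n > threshold]


# Made-up sample: one ISP name per spam message.
sample = ["IspA"] * 250 + ["IspB"] * 101 + ["IspC"] * 100 + ["IspD"] * 3
result = heavy_senders(sample)
```

Note that the test is strictly "more than 100", so an ISP with exactly 100 spam does not make the list.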
- Oct.15: Created a Yahoo! Group for each of those ISPs that sent me 100+
spam, where I plan to archive all the spam I got from each of them. The
Groups are named following a standard pattern, for example
SpamFromSavvis or SpamFromTelespNetBr. Each is customized to accept e-mail
from anyone without needing a Yahoo account, and has archives viewable by
anyone at any time.
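The naming pattern can be inferred from the two examples given: drop the punctuation and capitalize each piece. A small Python sketch of that inference follows; group_name is a hypothetical helper, and the rule is a guess that is only known to agree with those two examples.

```python
import re


def group_name(isp):
    """Build a group name such as SpamFromTelespNetBr from an ISP label."""
    parts = re.split(r"[^A-Za-z0-9]+", isp)
    # Capitalize the first letter of each piece, leaving the rest as-is.
    return "SpamFrom" + "".join(p[:1].upper() + p[1:] for p in parts if p)
```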
- Oct.16: Wrote parser for message listing of Yahoo! Groups. Using that,
wrote utility to download the message listing for any public-viewable Yahoo!
Group, such as the SpamFrom... ones I created yesterday or the other similar
ones I made a few months ago, and check for any new messages since the last
time I checked it. Wrote software that, given the spam-complaint
address, checks whether I have a special Yahoo! Group for its spam, and
otherwise defaults to SpamCopies, which I made months ago. Put the various pieces of
my Oct.10-12 program together in a new way to forward Yahoo! Mail spam
to the appropriate Yahoo! Group instead of to the spam-complaint address,
and then check the message listing for that group to make sure the
spam really got posted there before proceeding to the next spam in the batch.
Ran it on the first 107 spam in one batch. See for example all of those
which came from Savvis, and all of those which came from misc. ISPs, none
of which individually reached a count of 100 in this sample. Update:
Complete set of links to these Yahoo! Groups.
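Two small pieces of the Oct.16 work lend themselves to a sketch in Python: choosing the group for a spam-complaint address, and finding messages that are new since the last check. The function names, the table of special groups, and the complaint addresses are all hypothetical; only the default group name SpamCopies and the group name SpamFromSavvis come from the entries above. The sketch also assumes message numbers in a listing increase over time.

```python
DEFAULT_GROUP = "SpamCopies"


def group_for(complaint_address, special_groups):
    """Return the special group for this complaint address, else the
    default group."""
    return special_groups.get(complaint_address.lower(), DEFAULT_GROUP)


def new_messages(listing_ids, last_seen_id):
    """Return the message numbers in a listing above the last one seen."""
    return sorted(m for m in listing_ids if m > last_seen_id)


# Placeholder table: complaint address -> group name.
SPECIAL = {"abuse@savvis.example": "SpamFromSavvis"}
chosen = group_for("Abuse@Savvis.example", SPECIAL)
fallback = group_for("abuse@other.example", SPECIAL)
fresh = new_messages([101, 102, 103, 104], last_seen_id=102)
```

Checking that a forwarded spam really got posted then reduces to confirming that new_messages returns something after the forward and before moving on to the next spam.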