My accomplishments 2004.August:
- Aug.11: Thought of a wonderful new hack that lets me efficiently use lynx
to do all the cookie management for a Yahoo! Mail spider (automated browser),
so I don't have to "re-invent the wheel" by writing cookie handling myself
from scratch. Briefly mentioned the idea in this article the same evening;
see the very last paragraph.
- Aug.12: Wrote two semi-parsers, one for the YM WebPage showing a single
message and one for the YM WebPage showing the TOC for up to 200 messages,
as tools to be used by my new YM spider.
- Aug.14: Wrote two more semi-parsers, one for the YM toplevel WebPage shown
just after logging in and one for the YM WebPage listing statistics of all
the folders, again as tools to be used by my new YM spider.
- Aug.14-16: Finally implemented the actual hack for downloading
individual YM WebPages and parsing them locally to produce links for
downloading new WebPages, using all those various semi-parsers I'd written
in previous days. This article describes this new hack for spiders with
cookies.
- Aug.24-27: Upgraded my hellos.html to go beyond just "Hello World!" to
include some dynamic content that static methods such as HTML can't
accomplish.
- Late August sometime: Wrote a script for re-testing all the
spam-complaint addresses I've collected in recent years, discovered about one
out of six are no longer valid, deleted all that obsolete data from my CTW
(Complain To Whom) database. Decided that made my database rather incomplete,
so next I wrote an info-collector (gleaner) for WHOIS records,
specifically to collect the many different ways the WHOIS-record authors
express the concept of complaining about spam or abuse at such-and-such
e-mail address. Used it to try to collect CTW data for all the IP numbers
that sent me spam in recent years for which I no longer had any CTW data
(because of the purge done just before). In this article I described the
algorithm, including a detailed list of all the different ways the
WHOIS-record authors worded spam or abuse to flag the complaint address
that my new algorithm now collects.
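
To give a feel for the WHOIS gleaner idea from late August, here is a minimal sketch in Python. It is not the script I actually wrote: the function name glean_complaint_addresses, the keyword list (abuse, spam, complain), and the sample record are all assumptions made up for illustration, and the real list of phrasings in the article is much longer. It simply keeps the e-mail addresses found on WHOIS lines that mention spam or abuse.

```python
import re

# Hypothetical keyword list: the detailed list in the article is not reproduced here.
ABUSE_HINTS = re.compile(r"abuse|spam|complain", re.IGNORECASE)

# Deliberately simple e-mail pattern, good enough for a sketch.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def glean_complaint_addresses(whois_text):
    """Return e-mail addresses found on lines that mention spam or abuse."""
    found = []
    for line in whois_text.splitlines():
        if not ABUSE_HINTS.search(line):
            continue  # line says nothing about spam or abuse
        for addr in EMAIL.findall(line):
            addr = addr.lower()
            if addr not in found:  # keep first-seen order, drop duplicates
                found.append(addr)
    return found


# Made-up WHOIS record, for illustration only.
sample = """\
OrgName: Example Net
OrgAbuseEmail: abuse@example.net
OrgTechEmail: noc@example.net
remarks: Please send spam reports to Spam-Desk@example.net
"""

if __name__ == "__main__":
    print(glean_complaint_addresses(sample))
```

The technical contact (noc@example.net) is skipped because its line says nothing about spam or abuse, which is the whole point of keying on the wording rather than grabbing every address in the record.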