Plans for a distributed IPnum-to-CTW (Complain To Whom) database

When you discover that spam (specifically UCE/UBE i.e. Unsolicited Commercial/Bulk E-mail) has intruded into your e-mailbox/inbox, you want to complain to some administrator at the place it came from, but how do you find out that place and that administrator? If you see the message with terse headers, everything you see can be forged by the spammer, so you have no idea where to complain. But if you see the message with *full* headers, including all the Received: lines, then the only stuff you can trust is what was added by your own ISP, namely all internal routing if any after it was already received by your ISP, and the one line that shows how it initially entered your ISP. See here for an example of decoding the header like that:
**NotWritten**

So now you know how to find that one Received: line documenting how the spam arrived at your ISP from some other place on the InterNet. Here are two samples of that one line, one for spam that intruded into my Yahoo Mail account, and one that intruded into my ISP account:

Received: from 203.90.87.75 (EHLO bg.mx2.e-tapaal.com) (203.90.87.75)
        by mta555.mail.yahoo.com with SMTP; 21 Apr 2002 15:22:43 -0700 (PDT)

Received: from workfromhomenewsletter.com ([63.230.24.146])
        by mail.netmagic.net (8.11.6/8.11.6) with SMTP id g37JUXx09596
        for ; Sun, 7 Apr 2002 12:30:40 -0700

The IP number in parentheses (203.90.87.75 in the first sample, 63.230.24.146 in the second) is the *only* part of that line that can be trusted to tell you where the spam came from, either where it directly came from, or where the last relay was before it arrived at your own ISP. Anything to the left of that number in that same Received line shows what the spammer claimed as host name, which may or may not be true, or reverse-DNS information, which even if true today may change from day to day to help the spammer evade being caught. Anything to the right of that number is just local information about your own host, time of day, etc., all of which you can trust but none of which gives you any useful info about the spammer's location.

Notice how there are two different formats, where the IP number is within round-parentheses (format used by Yahoo) or within square-brackets within round-parentheses (format used by NetMagic). Your own ISP will use one of these two formats consistently, so you just need to look at your headers once to see what format your ISP uses for incoming e-mail from outside sources, then you know better what to look for the next time you get spam.
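
As a rough illustration of picking the IP number out of either format, here is a minimal Python sketch. The function name `extract_sender_ip` and the regular expression are my own assumptions, not part of any existing tool, and the sketch only handles the two formats shown above; an ISP that formats its Received lines differently would need a different pattern.

```python
import ipaddress
import re

# Matches a dotted quad wrapped either as "(1.2.3.4)" (Yahoo style)
# or as "([1.2.3.4])" (NetMagic style). Hypothetical pattern for this sketch.
_IP_IN_PARENS = re.compile(r"\(\[?(\d{1,3}(?:\.\d{1,3}){3})\]?\)")


def extract_sender_ip(received_line):
    """Return the parenthesized IP number from a Received: line, or None."""
    for candidate in _IP_IN_PARENS.findall(received_line):
        try:
            # Reject things that look like a dotted quad but are not
            # valid IPv4 addresses (for example a byte above 255).
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        return candidate
    return None


yahoo_line = (
    "Received: from 203.90.87.75 (EHLO bg.mx2.e-tapaal.com) (203.90.87.75)\n"
    "        by mta555.mail.yahoo.com with SMTP; 21 Apr 2002 15:22:43 -0700 (PDT)"
)
netmagic_line = (
    "Received: from workfromhomenewsletter.com ([63.230.24.146])\n"
    "        by mail.netmagic.net (8.11.6/8.11.6) with SMTP id g37JUXx09596"
)
print(extract_sender_ip(yahoo_line))
print(extract_sender_ip(netmagic_line))
```

Note that the sketch deliberately ignores the bare number after "from" in the Yahoo sample, because that position holds whatever the sending host claimed, and it skips "(8.11.6/8.11.6)" because that is a software version, not an IP number.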

So far you've gotten spam, decided to complain about it, looked in the header to find the key Received line added by your own ISP upon receipt of the e-mail, and copied down the IP number in that key Received line, so you know the IP number of the host which trespassed in your own ISP's SMTP server to litter your inbox with the spam, in these two examples 203.90.87.75 and 63.230.24.146 respectively. So next you need to find out, for those given IP numbers, where to complain. As the old pirate would say "Aye, there's the rub!!"

At present there is no database or lookup engine anywhere on the net that will reliably map from those IP numbers to a complaint address at the ISP that owns that IP number. SpamCop has a lookup engine that tries to provide that information, but most of the addresses it gives out haven't been checked by anyone, and many of them don't work at all, so if you try to send e-mail there your e-mail bounces back undelivered. Perhaps the worst case is when the host named in that complaint address does exist, so your own ISP's mailer daemon tries to connect to an SMTP server there, but in fact there's no SMTP server on that host, so your own mail system tries for five days to connect there, never succeeds, sends you a "transient non-fatal errors" message after a few hours or two days, then sends a final non-delivery notice after the full five days. So all that time you may have thought you'd successfully complained, but in fact nobody has heard you, and now you have to try again to find some valid complaint address for that particular IP number where you are sure the spam came from. Meanwhile the spammer has had five days to spam without the spamming host's admin getting any complaints.
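
One way to catch that worst case before waiting five days is to probe the host yourself. The following is only a minimal sketch under several assumptions: the function name `smtp_port_answers` is hypothetical, it checks only the host literally named in the address (real mail delivery normally follows the domain's MX records, which the Python standard library cannot look up), and some networks block outbound connections to port 25, in which case the probe fails even for a good address. A successful connection also says nothing about whether anyone reads the mailbox.

```python
import socket


def smtp_port_answers(host, port=25, timeout=10.0):
    """Return True if a TCP connection to host:port can be opened.

    This only shows that something is listening there; it does not
    prove that the complaint address itself is valid.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

A host that fails this probe repeatedly is a good hint that the complaint address needs more research before you rely on it.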

To alleviate this problem of bogus (non-working) complaint addresses given by SpamCop and other sources, I've tried to collect complaint addresses that actually work, so whenever I get a new spam from an address block I've already researched my new complaint will go immediately to the fully working complaint address I found before. But there are many hundreds of IP address blocks that have spammed, and it's just too much work for one person to track down all that information and make sure it's correct. Hence my new project: the Distributed CTW database, whereby each person takes responsibility for maintaining the mappings from IP number to complaint address for a range of numbers, typically all numbers whose high-order byte is the same as the volunteer's own ISP's IP number. For example, my own ISP's IP number starts with 198, so I plan to personally maintain the data for IP numbers that start with 198, in my CTW-198 toplevel node. Notice how that node has DownLinks that tell where to complain about various 198.nnn/16 blocks, and in one case there's a DownLink pointing to a sample second-level node which I also plan to maintain. In this case the second-level node isn't very useful because it has entries for only one ISP, but it illustrates how second-level nodes fit into the overall system. (Note, for such cases where only a very few different IP address blocks within a single /16 block have spammed, it seems a bit of a waste to dedicate an entire file to the second-level node, as I did there, so perhaps collecting several micro-nodes into a single file, with an abbreviated format for each, just the DownLinks section for each, would be a better approach. I'm planning to set up a demo of that format sometime soon.) My CTW-198 toplevel node also has SideLinks which list other /8 blocks for which toplevel nodes currently exist, namely just one that I set up myself to provide an example of how all the toplevel nodes have SideLinks pointing at each other.
Each toplevel node's SideLinks section also has direct CTW addresses for all /8 blocks that have single complaint addresses for the whole block hence don't need any toplevel CTW node to break down their address space into /16 blocks.
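
To make the node structure concrete, here is a minimal Python sketch of how a lookup could walk from a toplevel node through its DownLinks to a second-level node. Everything in it is an assumption for illustration: the real nodes are HTML files rather than dictionaries, the name `complain_to_whom` is hypothetical, and every block number and complaint address below is a made-up placeholder, not real CTW data.

```python
# Hypothetical in-memory stand-ins for CTW nodes. All entries are placeholders.
TOPLEVEL_NODES = {
    198: {  # DownLinks of an imagined CTW-198 node, keyed by second byte
        51: "abuse@isp-a.example",  # one complaint address for the whole /16
        18: {  # DownLink to a second-level node, keyed by third byte
            0: "abuse@isp-b.example",
        },
    },
}

# SideLinks entries for /8 blocks that have a single complaint address
# for the whole block, so no toplevel node is needed (placeholder data).
WHOLE_BLOCK_ADDRESSES = {
    10: "abuse@isp-c.example",
}


def complain_to_whom(ip):
    """Return the complaint address for an IP number, or None if unknown."""
    first, second, third, _fourth = (int(part) for part in ip.split("."))
    if first in WHOLE_BLOCK_ADDRESSES:
        return WHOLE_BLOCK_ADDRESSES[first]
    node = TOPLEVEL_NODES.get(first)
    if node is None:
        return None  # nobody maintains a toplevel node for this /8 yet
    entry = node.get(second)
    if isinstance(entry, dict):
        # The /16 is split up, so follow the DownLink to the second-level node.
        return entry.get(third)
    return entry  # either a single address for the /16, or None


print(complain_to_whom("198.51.100.7"))
print(complain_to_whom("198.18.0.9"))
print(complain_to_whom("203.90.87.75"))
```

A result of None corresponds to the case where no volunteer has researched that block yet, which is exactly when the maintainer of that first byte would need to be asked.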

What am I asking you to do? You volunteer as follows: You send me e-mail wherein you tell me which toplevel node you'd like to maintain, i.e. you tell me the first byte of your IP number (or if that first byte is already taken, another first byte where you have inside contacts) which will be the first byte of the IP numbers for which you maintain the CTW (Complain To Whom) data. After I create a first-draft of the HTML file for your node, you copy it to your own Web space, correct any mistakes I made that you can spot quickly, and tell me the URL where you have it online. I'll verify it's accessible from here and that the format hasn't been messed up, and if it's OK then I'll link to it from my CTW-198 node so that others can find your node via mine. After that you just maintain it: add new sources of spam within your first-byte range, replace CTW addresses that were good but have stopped working (you gotta track down a new working address, upstream if necessary), and accept e-mail from me and from other volunteers asking about any new sources of spam within your range that you don't have listed yet.

What services do I plan to provide to volunteers?

When you first volunteer, I'll tell you whether anyone else has already volunteered for that particular first byte, and if not I'll run a program to generate the DownLinks section for your particular first-byte, then make a copy of my own node and replace the DownLinks section with what I generated specially for you, and edit the rest of the file to be a suitable first-draft. I'll put that first-draft on my WebSpace for you to copy, just as I did several months ago for somebody else whom I thought might volunteer, and just as I did very recently for my first actual volunteer (other than myself). I'll e-mail you when this first-draft of your node is ready for you to copy.
After you tell me that you've installed your toplevel CTW node, I'll look at it to make sure you didn't screw it up badly, then I'll link to it from my own toplevel node, and maybe also add a link from here as a "news" item.
If you have serious questions about where I got some of the complaint addresses that were in the draft version of your CTW node, I'll let you look at the full record from which it was generated. For now, I'll manually call the functions that do the query, but eventually I might set up a WebServer application that does it for you automatically. Update 2002.Jul: I've created a WebServer search engine specifically for the CTW-194 volunteer, so now it'll be easy to configure it to serve additional volunteers.
If you ever get spam from some IP number that starts with 198 but isn't listed in my CTW-198 toplevel node, either because it's not in any full /16 block with single complaint address (it's broken into smaller blocks), or because I simply haven't updated my toplevel node since that site started spamming, I'll research the IP number on my own time and either update my node (if a whole /16 covers it uniformly, or if I have a micro-node to cover each /24 within that particular /16) or e-mail you the info.
I might set up a WebServer application where you can directly query my database for any 198.nnn.nnn.nnn that you're interested in, and get both the best-info and full-record.
As a special service to anyone who does a really good job maintaining his/her toplevel CTW node, I may someday set up additional WebServer applications you can use: In addition to the ability to search your own first-byte (to aid you maintaining your own toplevel node), and my own 198 first-byte (to give you the latest info ahead of what anyone else gets by looking at my static WebPage toplevel node), I'll let you query my database for *any* IP number. Also I might set up a search engine whereby you can paste in any IP number and have my software automatically traverse the linked CTW nodes to tell you what information other people have. But I can't make either of these services generally available, because they are running under my own UID, a personal account on a Unix ISP, not on my own machine.

Any questions?