proposal - new job-ad filtering service

It's been more than a dozen years since I've been able to find even one job on the various job boards that I fully or even mostly qualify for. For example, here is a record of some recent attempts finding jobs using various job-ad boards listed on a hand-out from Project Hired.

As you can see, some sites are completely non-functional for me, while other job sites list only jobs for which I lack many of the primary requirements.
My experience is that if I get very specific with my search, such as looking for jobs programming in Java, I get repeats of the very same requirements such as JBoss and WebSphere over and over and it's a waste of my time to look at hundreds of such ads. It would be really nice to be able to filter out all the jobs that require these repeated skills and spend my time looking at just the remainder, if there are any. (I haven't found even one in recent years.)
If I leave my search request very general, such as looking just for software programming without mention of any particular language, I mostly get jobs in languages where I have no significant skill levels, and in addition there are a hodge podge of miscellaneous requirements for experience in areas I don't have and often don't even have any idea what they are talking about. It would be really nice if I could collect a list of all these skills I don't have and see only what's left, gradually tuning my filter to eliminate more and more jobs I don't qualify for, to see if there are any jobs in any area whatsoever where I might qualify.

Unfortunately I haven't been able to find any job-search site which supports such filtering. I'm thinking of building my own Web site which harvests job ads from other sources and tags them according to these required skills and experience, and allows users to automatically filter job ads to eliminate jobs they obviously don't qualify for, and to tune the filter by just setting checkboxes next to the skills they don't have and submitting the form to update their profile of non-skills and search again.

Obviously it wouldn't save any of my personal time to build this Web service all by myself, manually tagging all the job ads in the world by looking at every last one of them, and then filtering so I don't have to look at what I already spent my time looking at and tagging. It requires lots of volunteers to share the task of tagging the job ads, whereupon each can benefit from the tagging done by others. The more volunteers there are, the more effective the service would be for each member of the "cooperative".

But even if there are volunteers willing to spend time tagging some ads in order to benefit from everyone else's tagging, I wouldn't be able to set up this system in a user-friendly way in the first place without some other people to give me feedback on my proposed design and user interface and to test my early implementations. And I wouldn't be able to attract tagging volunteers without some brainstorm volunteers to help me pass by word of mouth that my site exists and is ready to use. And if I can find some others to write some of the code to implement the Web service, then I can probably make a lot more progress than working all by myself.

Accordingly I'm soliciting volunteers at this time for several areas:

Discussing the general design of the service, and how best to implement the service to require all users to volunteer their share of tagging proportional to usage, without scaring away potential users because of too much up-front labor required to join the cooperative.
Detailed discussions of services and user interfaces, leading to first prototype implementations.
When it reaches this stage, help with implementation.
Promise to volunteer some tagging as soon as the system is implemented enough for tagging to even be possible.
Promise to give me useful feedback as to functionality and user interface to help me improve the first-draft system toward better functionality and user interface in later upgrades.
Promise to solicit others to join the "cooperative" as soon as the system is working well enough to support the extra users without any horrible bugs that cause work to be lost.

At the moment my idea for protecting my system from abuse (by distributed denial-of-service floodbots, berzerker tagbots, and real live freeloaders) would be something like this:

To get permission to establish a new account, anyone must first pass a simple Turing test. I'd implement it in PHP because that's the most efficient server-side scripting language available. That way floodbots would overload the TCP/IP service, and cause the admin to block access from their IP blocks, before the load my server-side Turing test puts on the system would get me in trouble. I have a crude prototype of such a Turing test implemented in pure HTML, only because it was faster for me to do the first prototype in HTML than to teach myself the PHP libraries necessary to do it in PHP. See here:
tinyurl.com/uh3t > Contact me
The PHP version would have the user type in the word instead of selecting from options, thus making it resistant to a spider exploring all branches to harvest mailto links. Still the pure HTML prototype hasn't yet been harvested by a spider to result in any spammer's bot sending e-mail to the address given two levels deep in the Turing test, possibly because the large number of wrong answers linking to "random" Web sites on strange topics occupies so much of a spider's time that it never manages to find my mailto link. Still, I would like to simplify the question to make it less of a deterrent to potential new users.
Update 2009.Feb.02: The first draft of the PHP Turing test is up&running.
Update 2009.Feb.04: It's now improved to parameterize on the first octet of the IP number of the HTTP client, accessible via tinyurl.com/filjob
Next, to further discriminate between serious new users and freeloaders, I'd have some more elaborate Turing tests, probably implemented in Common Lisp where I find the coding easier. These Turing tests would also collect personal information, such as the user's favorite security questions, in addition to name and desired password. Some of this information would be used in case of lost password, and some would be used to detect the same person trying to establish more than one account, although with the requirement to perform service before getting a like amount of service back, there's really not much of an incentive to make multiple accounts, so I might not actually implement any serious protection against multiple accounts by the same user.
Finally a new account would be set up for that new user, and the new user would be invited to tag a few job ads, some already tagged as quality control and some not previously tagged as productive work to help the "cooperative". If the quality-control ads are tagged correctly, then the new tagging would be accepted and the new user would finally be able to conduct his/her first job search using our system.

After a new account has been set up and the first job-ad search has been conducted, using the usual types of positive filtering for location and general type of work, the user would get back the first result set, showing just a few jobs together with clear info about skills required. Each required skill would have several associated radio buttons, to show no significant skill whatsoever, or much less skill, or approximately the required skill (or more). The default would be a radio button that indicates refusal to yet provide that info about this particular skill. Or I might have a varying number of radio buttons depending on the number of years or level of skill required. For example, if the required skill is 5+ years experience, then the radio buttons might be None( ) JustALittle( ) AFewMonths( ) 1-2Years( ) 3-4Years( ) 5+Years( ) and NoAnswer(X). If the required skill is proficiency, then the radio buttons might be NeverHeardOfIt( ) FoundOnGoogleButNoMore( ) FamiliarWithItAlready( ) SomeExperienceCanLearnMore( ) CompetentButNotProficient( ) Proficient/Expert( ) and NoAnswer(X). At both top and bottom of the list of jobs would be a SUBMIT button to update the profile per the radio-button selections made. Depending on how much surplus tagging work the user has done compared to services received, after the profile has been updated the user might see a request to provide more tagging services, or an offer to do an updated search right away, or a statement that YOU HAVE REACHED THE END OF THE INTERNET, THERE IS NOTHING MORE TO SEE. (That last message would actually mean that among all jobs already tagged, every last one can be rejected because this user doesn't qualify, so there's no point doing another search until some more jobs that require different skills get tagged. The user then has the choice of tagging more jobs right away to build up extra credit toward future services, or waiting for somebody else to tag some new job ads and using the credit already accrued at that later time.)

Preliminary ideas for user-interface and automatic tagging of job ads: First, at the time the job ad is automatically harvested from another job-ad board, the HTML is parsed automatically to locate major sections, and the format is compared with already-tagged job ads to find samples for that particular kind of format such as might be used by a particular recruiter or employer. If possible, the section listing required job skills/experience is automatically located. Otherwise, a volunteer must edit the job ad to place square brackets around that particular section, leaving the rest of the ad unchanged.

Next the individual requirements within that section must be located and converted to standard options of number of years of experience or degree of expertise. Locating the specific skills could be done by keyword combination match, except for new skills never before seen, which would need to be bracketed by volunteers. Options for experience or expertise could likewise be automatically done by keywords, else manually converted by volunteers to build a thesaurus of synonyms such as "proficient" = "expert". In the foreseeable future, only required skills/experience will be considered. Preferred (but not required) skills will be totally ignored during the filtering. It's up to the user later, upon seeing a job ad such that the user fully qualifies per required skills/experience, to make a judgement whether the preferred skills are a reasonably good match.
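
A minimal sketch of this conversion step, assuming one requirement per line of text. The skill list, the synonym table, and the function name parse_requirement are all hypothetical placeholders for illustration, not part of any existing implementation:

```python
import re

# Hypothetical thesaurus built up by volunteers: ad wording -> standard level.
LEVEL_SYNONYMS = {
    "proficient": "expert",
    "expert": "expert",
    "familiar": "familiar",
    "familiarity": "familiar",
}

# Hypothetical list of skills that volunteers have already bracketed.
KNOWN_SKILLS = ["Java", "JBoss", "WebSphere"]

# Matches wording such as "5+ years" or "3 years".
YEARS_RE = re.compile(r"(\d+)\s*\+?\s*years?", re.IGNORECASE)

def parse_requirement(line):
    """Return (skill, years, level) for one requirement line.

    Each element is None when it could not be found automatically,
    which is the case a volunteer would have to handle by hand.
    """
    skill = next(
        (s for s in KNOWN_SKILLS
         if re.search(r"\b" + re.escape(s) + r"\b", line, re.IGNORECASE)),
        None,
    )
    match = YEARS_RE.search(line)
    years = int(match.group(1)) if match else None
    level = None
    for word in re.findall(r"[a-z]+", line.lower()):
        if word in LEVEL_SYNONYMS:
            level = LEVEL_SYNONYMS[word]
            break
    return skill, years, level
```

A line such as "Knowledge of COBOL" would come back with no skill found, since COBOL is not yet in the hypothetical list; that is the signal that a volunteer needs to bracket a never-before-seen skill.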

Likewise the info about the location of the job must be found (by automatic format comparison, or manual bracketing), then converted to standard region names using a thesaurus of location names or some online geographic database giving latitude and longitude of named cities and other regions. For example, Liverpool might be converted to London, and Sunnyvale might be converted to San Jose or even to San Francisco, and of course Brooklyn and Manhattan would be converted to NYC, as a first approximation for the actual location of the job. After the user finds a job for which he/she is fully qualified, in his/her local metropolitan area, that user can then make a judgement whether the commute to work would be a little bit too far or not, possibly depending on the pay rate and location near public transit or free parking.
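
The thesaurus lookup could be sketched roughly as follows. The table holds only the mappings mentioned above, and the name normalize_location and the handling of a trailing ", CA" style suffix are assumptions made for illustration:

```python
# Hypothetical thesaurus of place names -> standard region names.
REGION_THESAURUS = {
    "sunnyvale": "San Jose",
    "brooklyn": "NYC",
    "manhattan": "NYC",
}

def normalize_location(raw):
    """Map a raw location string from a job ad to a standard region name.

    Only the part before the first comma is used, so "Sunnyvale, CA"
    is looked up as "sunnyvale". Returns None for unknown places, so
    that a volunteer (or a geographic database) can fill the gap.
    """
    key = raw.split(",")[0].strip().lower()
    return REGION_THESAURUS.get(key)
```

Returning None instead of guessing keeps unknown places visible, so the thesaurus can grow as volunteers resolve them.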

For the time being, I don't intend to keep track of expiration of job ads and purge ads for jobs which are no longer available. This is because for somebody who hasn't seen a job they qualify for in months or years, the information that you just missed a job you qualify for, the first you've seen in months or years, is actually useful information. It means that company might soon have another job you also qualify for, and it might be useful to establish a contact inside that company even before a new job opening is announced. It also shows that this filtering service actually works and you just have to use the service more diligently every day so you won't miss the next job that turns up that you qualify for. If and when I get enough volunteers that most new job ads are promptly tagged, so that job openings become available through our system almost as soon as we harvest the raw job ad from other job boards, which is almost as soon as the job gets formally announced, so that tagging job ads isn't the biggest bottleneck, and some users complain they would like to have no-longer-available jobs specially flagged and possibly filtered out, and we have funding or volunteer labor to write the code to keep track of job-ad expiration dates, then it would be worth doing at that time.

If and when my job-ad filtering service is popular enough that employers/recruiters want to submit some job ads directly to us instead of to Monster or DICE, and so we eagerly install that option, we'll gladly also provide a way for submitters to later mark their individual job ads as expired, and also to re-activate an expired job ad later if circumstances warrant.

Update 2009.Feb.05: I've tentatively decided to use both "weak" keywords gleaned by automatic scanning of the text of job ads, and "strong" keyword tags specifically provided by an expert/volunteer, for filtering job ads. The way I expect it to work is this: suppose you specified a particular job skill or experience that you do *not* have. If that tag is listed as a "strong" requirement, then you won't be shown that job ad in any case; if that tag is only a "weak" keyword, then that job ad will be shown but as low priority; and if the keyword doesn't appear at all, then the priority will be normal (possibly lowered or excluded by *other* keywords or tags). With this plan, at the very start, when very few job ads have received strong tagging, the user (jobseeker) will still be shown first the ads that don't even mention skills/experience that user doesn't have, so the service should be marginally useful right from the start. The user would then be asked to mark the tags in whatever job ads are shown to that person, for multiple purposes: setting up tags in this one job ad for the benefit of others later, tuning the automatic keyword gleaner to recognize more keywords/phrases that refer to specific job skills/experience, and updating this user's profile to show which of the newly-created tags the user has or doesn't have. Note that for this application "weak" keywords are actually used *only* when the particular keyword has already been tagged at least once as a "strong" tag, because otherwise it wouldn't appear in anybody's profile as an explicit non-skill to use for negative-filtering.
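
The three-way rule can be sketched in a few lines. This is only an illustration under assumed data shapes (plain sets of tag names); the function name classify_ad and the result labels are hypothetical:

```python
def classify_ad(strong_tags, weak_keywords, non_skills, known_strong_tags):
    """Decide how one job ad is treated for one user.

    strong_tags:       tags a volunteer marked as required in this ad
    weak_keywords:     keywords gleaned automatically from this ad's text
    non_skills:        skills the user has said they do NOT have
    known_strong_tags: every keyword used as a strong tag in at least one ad

    Returns "excluded", "low", or "normal".
    """
    # A strong requirement the user lacks hides the ad entirely.
    if strong_tags & non_skills:
        return "excluded"
    # Weak keywords count only if they have been a strong tag somewhere.
    usable_weak = weak_keywords & known_strong_tags
    if usable_weak & non_skills:
        return "low"
    return "normal"
```

For example, an ad that merely mentions JBoss in its text, shown to a user who lacks JBoss, would come back as "low" rather than being hidden.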

My current idea for the first use case for a new user would be that the user asks to see "all" job ads, or all job ads satisfying some keyword. The server collects statistics of all strong tags in the selected set of job ads, and shows them in descending sequence by occurrence as strong tags, and also shows the ratio between weak occurrences in these job ads and occurrence in ordinary language, with each group of equal strong-tag count sorted per the weak occurrence ratio. Alongside each keyword or tag there would be radio buttons to say "I have this skill" or "I don't have this skill", and for those keywords that have zero count as strong keys there'd be an additional option of "Not worth using" to indicate this is a "fluff" word that doesn't represent any type of skill or experience. Any have/notHave items would be added to the profile for purpose of filtering job ads. Any notWorthUsing items among those with zero strong-tag count would be added to the profile for purpose of *not* listing among weak keywords during later scans.
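
A rough sketch of how the server might tally and sort this report. The data layout (each ad as a dict with "strong" and "weak" sets), the ordinary_freq table of ordinary-language frequencies, and the choice to sort the ratio in descending order within each group are all assumptions for illustration:

```python
from collections import Counter

def rank_terms(ads, ordinary_freq):
    """Return (term, strong_count, weak_ratio) rows for the report.

    ads:           list of dicts, each with "strong" and "weak" sets of terms
    ordinary_freq: hypothetical mapping of term -> fraction of ordinary
                   text in which the term occurs
    Rows are sorted by strong-tag count (descending), then by weak
    occurrence ratio (descending), then alphabetically.
    """
    if not ads:
        return []
    strong = Counter(t for ad in ads for t in ad["strong"])
    weak = Counter(k for ad in ads for k in ad["weak"])
    rows = []
    for term in set(strong) | set(weak):
        base = ordinary_freq.get(term, 0.0)
        if weak[term] == 0:
            ratio = 0.0
        elif base > 0:
            ratio = (weak[term] / len(ads)) / base
        else:
            ratio = float("inf")  # never seen in ordinary language
        rows.append((term, strong[term], ratio))
    rows.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return rows
```

A term with a high ratio turns up far more often in job ads than in everyday text, which suggests a skill name; a ratio near 1 suggests a "fluff" word.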

After that first use case, later use cases would use the individual user's profile for filtering the "all" set or the all-with-keyword subset. Among the not-eliminated job ads, strong tags and weak keyword occurrence ratios would be again tallied and sorted to find the most commonly occurring tags and keywords, but tags already in the user's profile and weak keywords already flagged as "Not worth using" would be excluded from the report. Radio buttons would otherwise be the same as above. Thus with each pass through the use case, the user would be guided to specify more and more have/notHave skills, until at least one fully-tagged job ad is found for which the user satisfies all required-skill/experience tags, or *all* fully-tagged job ads have been filtered out but there's at least one not-yet-tagged job ad satisfying the search criteria.

After the basic tagging&filtering system is well established, I would like to increase the specificity of strong tags to say not just yes/no but the number of years of experience. I'd use -2 to indicate the user doesn't even know what the term means, -1 to indicate that the user has looked up the term and read the definition of the term but has no formal training, 0 to indicate that the user has completed an online tutorial such as the courses offered by ManPower or has read a book but has no "hands on" experience, and values greater than zero to indicate some amount of "hands on" experience. "Lab" assignments during course work would count as "experience", but only in terms of how much time the lab assignments would have taken if the person already were skilled in that kind of task, not counting the time it took to figure out how to do that kind of task. Thus a person who took a three-month course might acquire 10 minutes effective time per two-hour lab assignment for a total of about 1.5 hours, i.e. the equivalent of about 0.0007 years full-time (40 hr/wk) work experience. Users would update their profiles to show their experience per this calculation, and job ads would be updated to have amount-of-experience-required per the same kind of calculation. For job ads that just say something vague, a guess would be made as to what equivalent work experience to use per the above idea.
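
The lab-assignment arithmetic can be checked with a short sketch. It assumes nine lab assignments (the count that gives 1.5 hours at 10 effective minutes each) and a full-time year of 40 hr/wk for 52 weeks; the function name is hypothetical:

```python
# Assumption: a full-time work year is 40 hours/week for 52 weeks.
HOURS_PER_WORK_YEAR = 40 * 52  # 2080 hours

def lab_experience_years(num_labs, effective_minutes_per_lab):
    """Convert lab assignments into equivalent full-time years of experience.

    Only the "effective" time counts: how long each lab would take a
    person already skilled at the task.
    """
    effective_hours = num_labs * effective_minutes_per_lab / 60
    return effective_hours / HOURS_PER_WORK_YEAR

# Nine labs at 10 effective minutes each is 1.5 hours of experience.
years = lab_experience_years(num_labs=9, effective_minutes_per_lab=10)
print(round(years, 4))  # 0.0007
```

Under these assumptions, 1.5 effective hours divided by 2080 hours per year comes to roughly 0.0007 years.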
