3 May 2003 barryp   » (Journeyer)

Was just googling for some info about Python and threads, and got burned by something that's been bugging me about search engines...

You get an awful lot of false hits because of words in the navigational "fluff" that surrounds the real content on most webpages.

For example, just about every page in every Python mailing list contains a "next in thread" link, referring to the mailing list threads. So "thread" is a horrendous word to try and search for :(

Many mailing lists show the subject lines of the next or previous messages, and lots of pages have navigational links where a word here and a word there might match what you're looking for but are completely unrelated to anything useful on the page.

Would be nice if there were a standard way to tag within a page what the "meat" and/or "fluff" is, so search engines can focus on or ignore parts of a page.

If an outfit like Google defined something like this (dangling the incentive of a somewhat better PageRank in front of you), and mailing-list web archive software such as Mailman were updated to use it, it could really help out in web searches.
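
To make the idea a bit more concrete, here's a rough Python sketch of what an indexer might do if pages marked their fluff. Everything about the markup is made up for illustration: there is no such standard, the class name "search-fluff" is purely hypothetical, and so are the names MeatExtractor and indexable_words. It also assumes well-nested markup, which real pages often aren't.

```python
from html.parser import HTMLParser

# Hypothetical marker: no search engine defines this; the name is invented here.
FLUFF_CLASS = "search-fluff"

# Elements with no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class MeatExtractor(HTMLParser):
    """Collects words from a page, skipping anything inside a fluff element."""

    def __init__(self):
        super().__init__()
        self.fluff_depth = 0  # > 0 while we are inside a fluff-tagged element
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Entering a fluff element, or nesting deeper inside one.
        if self.fluff_depth or FLUFF_CLASS in classes:
            self.fluff_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.fluff_depth:
            self.fluff_depth -= 1

    def handle_data(self, data):
        # Only text outside fluff regions is kept for indexing.
        if not self.fluff_depth:
            self.words.extend(data.lower().split())


def indexable_words(html):
    """Return the lowercased words an indexer would keep for this page."""
    parser = MeatExtractor()
    parser.feed(html)
    parser.close()
    return parser.words


page = (
    '<div class="search-fluff"><a href="next.html">next in thread</a></div>'
    "<p>Python locks and the GIL</p>"
)
print(indexable_words(page))  # ['python', 'locks', 'and', 'the', 'gil']
```

With the "next in thread" link tagged as fluff, the word "thread" never makes it into the index for that page, which is the whole point.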
