One of the many wonderful things about the Web is that its hypertext structure not only lets us navigate it and invoke external resources (scripts, graphics, etc.), but also lets us measure relevance and authority. Google's killer insight was, of course, exactly this: use links as votes for the relevance of a given document, and apply the idea recursively, so that the more authoritative the document, the more powerful its outbound links.
But there is a fundamental problem here. The REL="NOFOLLOW" attribute was introduced to stop spammers manipulating this structure by autogenerating great numbers of links, but it is only a partial solution. After all, the fact that somebody considers a document unreliable, irrelevant, spammy, or just... repellent is useful information; yet there is no way of capturing it. Proposals around the "Semantic Web" have included links that go backwards as well as forwards; I for one have never been able to understand this, and it sounds far too much like INTERCAL's COME FROM... statement. (You thought GOTO was considered harmful; COME FROM... is the exact opposite.)
What I propose is that we introduce a negative hyperlink: a kind of informational veto. I've blogged about the StupidFilter before, which attempts to gather enough stupidity from the Web to characterise it, then use Bayesian filtering to get rid of it, as we do with spam. But I suspect that is a fundamentally limited, and also illiberal, approach; StupidFilter is indexing things like YouTube comment threads, which seems to guarantee that what it actually filters will be inarticulacy, or to put it another way, non-anglophones, the poor, the young, and the members of subcultures of all kinds. The really dangerous stupidity walks at noon and wears a suit, and its nonsense is floated in newspaper headlines and nicely formatted PowerPoint decks. StupidFilter would never filter Dick Cheney.
But a folksonomic approach to nonsense detection would not be bound to any one kind of stupidity or dishonesty, just as PageRank isn't restricted to any one subject. Anyone could antilink any document for any reason, across subjects, languages and cultures. Antilinks would also be easy to capture programmatically, just like any other HTML element; in Python, finding them with BeautifulSoup is a matter of changing the attribute value passed to a single find_all() call. Even without changes to today's Web browsers, a simple user script could flash a warning when one was encountered, or provide a read-out of the balance between positive and negative links to a page.
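To make this concrete, here is a minimal sketch of counting that balance using only Python's standard-library html.parser; the rel="bullshit" convention is of course the proposal here, not an existing standard, and the sample page is invented for illustration:

```python
from html.parser import HTMLParser


class AntilinkCounter(HTMLParser):
    """Tallies ordinary links against hypothetical rel="bullshit" antilinks."""

    def __init__(self):
        super().__init__()
        self.positive = 0  # ordinary links: votes for
        self.negative = 0  # antilinks: votes against

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # rel is space-separated; treat a "bullshit" token as an antilink
        rel = dict(attrs).get("rel", "") or ""
        if "bullshit" in rel.lower().split():
            self.negative += 1
        else:
            self.positive += 1


# A made-up page containing one link and one antilink
page = """
<p><a href="http://example.org/good">a vote for</a>
<a rel="bullshit" href="http://example.org/nonsense">a vote against</a></p>
"""

counter = AntilinkCounter()
counter.feed(page)
print(counter.positive, counter.negative)  # 1 1
```

With BeautifulSoup installed, the negative side of the tally would be roughly `soup.find_all("a", rel="bullshit")` — the one-liner alluded to above — but the point is that any HTML parser can see antilinks for free.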
Consider this post at qwghlm.co.uk: Chris is quite right to mock the Metropolitan Police's efforts to encourage the public to report "unusual" things. After all, there is no countervailing force; if you collect enough noise, statistically speaking, you will eventually find a pattern. What you need is the refiner's fire. Why is there no Debunk a Terror Alert hotline?
I am quite serious about this. Implementation could be as simple as a REL="BULLSHIT" attribute. Now, how does one go about making a submission to the W3C?