Tuesday, November 11, 2003

More statistical weirds

I notice that Yankeeblog is having a similar experience to ours: random googlers turning up on the site after searching for something - well - random.

"As someone who finds internet search engines fascinating, I'm always curious to see what searches bring people to this blog (with the exception of the person looking for "Tom Matthews Ultimate Frisbee," I'm sorry to report that few googlers seem to have landed at an appropriate place).

At any rate, it turns out that Yankeeblog is the third site returned by MSN when you search for "I want Pakistani sexy clips". Either there's some part of this blog that I don't know about, or someone recently came in for a big disappointment.

What I'd love to see is a site where you could enter a URL and have it return search queries that produce the former. While it would be practically impossible to do a comprehensive reverse search engine, Google could pretty easily put together one based on the searches it conducts every day -- anyone know of anything like this?"

In fact, looking at those search results, the mystery Pakistani porn freak must have been devastated. The results are completely - but completely - irrelevant, much as this blog must be to the people looking for folk tales whom Google seems to channel here. It's a good reminder of how the system can break down - search engines work by following a simple set of rules many, many times over, without any higher or human intervention. It also shows how such a restricted set of reactions can give rise to a self-organising order, as in natural selection, capitalism, or ant colonies. The problem is that although the success rate averages out nicely over time, an average still leaves room for a certain number of extreme results. Years ago, all search engines were a bit like that - you knew you'd find something, but probably not what you were looking for. Google revolutionised that by using the number of sites linking to a document as a gauge of its value, drawing on the self-organising nature of the web, which is why results like this don't come up so often.
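
For the curious, here is a rough sketch of that link-counting idea. It is only an illustration, not Google's actual algorithm, which weights links rather than simply counting them. The function name and the link data are invented for the example, and I've assumed that several links from the same site should count only once.

```python
from collections import defaultdict


def rank_by_inbound_links(links):
    """Rank documents by how many distinct sites link to them.

    `links` is an iterable of (source_site, target_document) pairs.
    Returns a list of (document, inbound_count) pairs, most-linked first.
    """
    linking_sites = defaultdict(set)
    for source, target in links:
        if source != target:  # a site linking to itself proves nothing
            linking_sites[target].add(source)
    # Sort by count (descending), then by name so ties come out in a stable order.
    return sorted(
        ((doc, len(sites)) for doc, sites in linking_sites.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Hypothetical link data, made up purely for illustration.
example_links = [
    ("site-a", "folk-tales"),
    ("site-b", "folk-tales"),
    ("site-c", "folk-tales"),
    ("site-a", "this-blog"),
    ("site-a", "this-blog"),  # repeat link from the same site counts once
]

ranking = rank_by_inbound_links(example_links)
for document, count in ranking:
    print(document, count)
```

Nobody has to judge the pages by hand here: the ordering falls out of other people's decisions to link, which is the self-organising part. It also hints at why the approach invites manipulation, since anyone who can manufacture links can push a page up the list.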

What's the next jump? The problem with search, as with so many things on the net these days, is manipulation - think of spam, spoof log-on sites and spyware. Google especially has become a huge determining factor for a lot of sites - get a high-ranking result and the traffic will pour in. At the same time, search results are becoming increasingly commercial. The danger is that, under pressure from manipulators on one hand and businesses paying for higher rankings on the other, the search system becomes untrustworthy. And the web without search is not a serious proposition. The challenge for the search engine bods is to find a way of guaranteeing the integrity of their results.
