Sunday, January 18, 2009

give IT yahoos United States (dollars)

So, there's this rumour-surrounded gadget that GIYUS wants people to install on their computers as part of the War on Terror. Obviously, I wondered exactly how it worked; did it analyse the Web sites you visit semantically, so as to target its talking points precisely? Did it use some sort of social recommendation mechanism? Also, I was wondering if there was any way of characterising the network traffic it generated and estimating how many people are using it.

So I did the obvious thing and I actually downloaded it. It's packaged as a Firefox extension (.xpi); extensions consist of JavaScript files for the application logic and XUL (XML User Interface Language) for the look'n'feel, all wrapped up in a ZIP archive. If you don't have the source of one, all you need to do is pass it through an archive tool and extract all files, and then you can read them in a text editor.
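If you'd rather not reach for an archive tool, the same unpacking can be sketched in a few lines of Python. This is only an illustration: the function name, the list of file suffixes and the handling of nested .jar files (which extensions of this era often used for their chrome) are my assumptions, not anything specific to this extension.

```python
import io
import zipfile

# Assumed set of text-like files worth reading; adjust to taste.
SOURCE_SUFFIXES = (".js", ".xul", ".rdf", ".manifest")

def read_sources(archive_file, prefix=""):
    """Return {path: text} for the readable source files in an .xpi.

    archive_file may be a filename or a file-like object. Nested .jar
    files are ZIP archives too, so they are opened recursively.
    """
    sources = {}
    with zipfile.ZipFile(archive_file) as archive:
        for name in archive.namelist():
            lower = name.lower()
            if lower.endswith(".jar"):
                nested = io.BytesIO(archive.read(name))
                sources.update(read_sources(nested, prefix + name + "!/"))
            elif lower.endswith(SOURCE_SUFFIXES):
                text = archive.read(name).decode("utf-8", errors="replace")
                sources[prefix + name] = text
    return sources
```

Calling `read_sources("some-extension.xpi")` (a hypothetical filename) gives you a dictionary you can grep through for URLs and function names.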

And actually, it's kind of disappointing; no folksonomy, no textual analysis, not even crude keyword matching. It just grabs an RSS feed from, passing in the string "GIYUS", presumably to ensure it gets the right one, checks if any items in it aren't already cached, and if so, fires a graphical alert containing the message. It's basically an e-mail list gussied up in Web2.0 finery, with the feature that it's marginally less trivial to forward the content to nonsubscribers. It doesn't even appear to spy on your browsing history.
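To show how little is going on, here is roughly that logic as a Python sketch. It is not the extension's code (that is JavaScript); the function name is made up, and I'm assuming items are identified by their guid, falling back to the link, which is a common RSS convention rather than something I verified in the extension.

```python
import xml.etree.ElementTree as ET

def new_items(rss_xml, seen_ids):
    """Return (title, link) pairs for feed items not seen before.

    seen_ids plays the role of the cache: it is a set of item ids
    and is updated in place as new items turn up.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        # Assumption: guid identifies an item, with the link as fallback.
        item_id = item.findtext("guid", default="").strip() or link
        if item_id and item_id not in seen_ids:
            seen_ids.add(item_id)
            fresh.append((title, link))
    return fresh
```

Fetch the feed text on a timer, pass it to `new_items` along with the cache, and pop up an alert for whatever comes back. That's the whole trick.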

Of course, there could be some server-side magic involved. You can usually get a rough idea of location from an IP address, and a rough idea is probably best in terms of hit-rate (you've a much better chance of getting your geotargeting right for "North London" than "Archway"). And you can draw some conclusions from browser credentials - OS, screen, browser type and version etc. For example, perhaps you'd want to serve the red-meat "civilian deaths are all a fake" stuff to MSIE5/6 users in the US heartland and the Decent Left stuff to Mac users in North London. So I considered actually installing the extension; but then I realised I didn't actually want a simulated Melanie Phillips on my sofa any more than I wanted the real thing. However, it's possible to view the feed on the Web anyway, so I checked.

But they may not even be doing that; I'm on a weird niche ISP, with a Linux machine, in North London, and the feed I see at is deeply generic.

Surely, though, it's possible to do better than this? I envisage a sort of Web force multiplier, that would analyse the texts you read as you browse and compute some kind of digest hash, and do the same for every link you send anyone else, stashing the hash of each link in a remote server. As you browse, it compares the hash of the current page with the ones in the DB, and returns a list of possibly appropriate arguments - the strength of this being that they could be data, poetry, code, pictures, video, or indeed anything. We could incorporate some sort of social element, too, to keep a check on quality.
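For the sake of argument, here's a minimal Python sketch of that lookup. An ordinary digest would only ever match identical pages, so I'm assuming a similarity-preserving fingerprint in the style of SimHash; that's my choice of technique, and every name and threshold here is hypothetical. The remote server is stood in for by a plain dictionary.

```python
import hashlib
import re

BITS = 64  # fingerprint width; fixed because we take 8 bytes per word hash

def simhash(text):
    """SimHash-style fingerprint: similar word sets differ in few bits."""
    weights = [0] * BITS
    for word in re.findall(r"[a-z']+", text.lower()):
        digest = hashlib.sha256(word.encode("utf-8")).digest()[:8]
        value = int.from_bytes(digest, "big")
        for i in range(BITS):
            weights[i] += 1 if (value >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if weights[i] > 0)

def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")

def closest(page_text, stored, max_distance=20):
    """Return stored links whose fingerprint is near the page's.

    stored maps fingerprint -> link; max_distance is an arbitrary,
    untuned cut-off.
    """
    target = simhash(page_text)
    hits = [(hamming(target, fp), link) for fp, link in stored.items()]
    return [link for dist, link in sorted(hits) if dist <= max_distance]
```

You'd populate `stored` with `simhash(text_of_link)` for each link you send someone, then call `closest` on whatever page you happen to be reading.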

Who here knows about corpus analysis? Most of the academic papers my casual search found gave me that "dog listening to music" feeling. What I need is something like a rather bad crypto hash function - one where two texts with different content would produce non-randomly different hashes. Obviously we'd filter the text with a list of stop words like search engines do, so as to strip out the tehs and ands. We could, for example, use (say) the distribution of words in Wikipedia as a common baseline, and measure how the distribution of significant words in the target texts differs from it.
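As a starting point for anyone who wants to play, the baseline comparison might look something like this in Python. It's a sketch under loud assumptions: the stop-word list is a toy, the baseline is whatever word-frequency table you hand it (a real one built from Wikipedia is left as an exercise), and scoring by log ratio is just one plausible measure, not something I've tested on real text.

```python
import math
import re
from collections import Counter

# Toy stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "is", "it", "that"}

def word_freqs(text):
    """Relative frequency of each non-stop word in the text."""
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOP_WORDS]
    total = len(words)
    if not total:
        return {}
    return {w: c / total for w, c in Counter(words).items()}

def distinctive_words(text, baseline_freqs, floor=1e-6, top=5):
    """Rank words by how over-represented they are against the baseline.

    Words missing from the baseline get the (assumed) floor frequency,
    so unseen words score as highly distinctive.
    """
    scores = {
        w: math.log(f / baseline_freqs.get(w, floor))
        for w, f in word_freqs(text).items()
    }
    return sorted(scores, key=lambda w: (-scores[w], w))[:top]
```

The top few words for a page are then a crude signature of what it's about, and two pages with overlapping signatures are candidates for the same argument.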

1 comment:

NotRichard said...

Shame, I could have had some real fun with this if it were cheeky enough to monitor your browsing habits..

Of course "cheeky" isn't the right word, given what someone browsing from to the above site could be giving away to those charming folk in Israel's dark actor community. But then we're verging towards conspiracy theories, eh?

I mean, it's not like Akamai was created by an 'ex' Mossad guy or anything (who went on to be part of something, umm, much bigger)
