Sunday, April 17, 2011

lobby: update

I'm beginning to make some progress with the lobbying project. Last week I got it spitting out data; in mid-week, I optimised the process of loading the meetings from the ScraperWiki API into NetworkX. Hint: the object_hook keyword argument in Python's json.load() function is really useful!
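A minimal sketch of the object_hook trick, assuming each meeting record carries `minister`, `lobby` and `date` fields. Those field names, the `Meeting` tuple and the sample records are made up for illustration; the real ScraperWiki output will differ.

```python
import io
import json
from collections import namedtuple

# Hypothetical record shape: the real field names are not given in this post.
Meeting = namedtuple("Meeting", ["minister", "lobby", "date"])


def as_meeting(obj):
    """object_hook: json calls this once for every decoded JSON object."""
    if all(key in obj for key in ("minister", "lobby", "date")):
        return Meeting(obj["minister"], obj["lobby"], obj["date"])
    return obj  # leave any other object as a plain dict


# Stand-in for the API response body (invented sample data).
raw = io.StringIO(
    '[{"minister": "Minister X", "lobby": "Example Bank", "date": "2010-06"},'
    ' {"minister": "Minister X", "lobby": "Example Union", "date": "2010-07"}]'
)

meetings = json.load(raw, object_hook=as_meeting)

# (u, v, attribute dict) triples, the shape NetworkX's add_edges_from accepts.
edges = [(m.minister, m.lobby, {"date": m.date}) for m in meetings]
```

The point of the hook is that the records arrive already shaped for the graph, so there is no second pass over a list of raw dicts.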

This weekend it's producing information about lobbies, ministers, and government departments. I've got implementations nearly ready for a couple more dimensions of data - providing each actor's network degree by month, and trying to measure the extent to which ministers act as gatekeepers or flak-catchers. The first of those involves reimplementing a bit of NetworkX - you can't ask for node properties excluding certain edges by attribute, or at least you can't do so without creating a new subgraph, which seems ugly. The second, at the moment, counts the edges of a node if they have a higher weight than that of the node itself and expresses the sum of those edges' weights as a percentage of the total meetings that minister had. That doesn't take any account of time, yet.
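Roughly, the two measures could be sketched like this. It assumes a MultiGraph with one edge per meeting, a `month` and a `weight` attribute on every edge, and a `weight` attribute on each minister node. All of those names and the sample figures are illustrative assumptions, not the project's actual schema.

```python
import networkx as nx


def degree_in_month(graph, node, month):
    """Count a node's edges for one month by filtering on the edge
    attribute directly, without building a subgraph."""
    return sum(
        1
        for _, _, data in graph.edges(node, data=True)
        if data.get("month") == month
    )


def gatekeeper_share(graph, minister):
    """Sum the weights of edges heavier than the minister's own weight,
    as a percentage of the minister's total number of meetings."""
    own_weight = graph.nodes[minister]["weight"]
    edge_data = [data for _, _, data in graph.edges(minister, data=True)]
    if not edge_data:
        return 0.0
    heavy = sum(d["weight"] for d in edge_data if d["weight"] > own_weight)
    return 100.0 * heavy / len(edge_data)


# Invented sample data: one edge per meeting.
G = nx.MultiGraph()
G.add_node("Minister X", weight=0.5)
G.add_edge("Minister X", "Bank A", month="2010-06", weight=0.75)
G.add_edge("Minister X", "Bank A", month="2010-07", weight=0.75)
G.add_edge("Minister X", "Union B", month="2010-06", weight=0.25)
G.add_edge("Minister X", "School C", month="2010-07", weight=0.5)

june_degree = degree_in_month(G, "Minister X", "2010-06")
share = gatekeeper_share(G, "Minister X")
```

Like the description above, this takes no account of time in the second measure; passing a month filter through to it would be one way to add that.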

I'm thinking of using Google App Engine to deploy it, running the data generator as a cron job and using the bulk uploader utility to slurp the results.

As a taster, the biggest single private interest lobbying Government is Barclays Bank, followed by Shell, the World Bank, the London Stock Exchange, BP, RBS, BAE, Standard Chartered, Lloyds, and Ratan Tata. This may not be that surprising. Neither is it very surprising, if somehow comforting in an old-fashioned way, that the two biggest lobbies of all are the Confederation of British Industry and the TUC, which is achieving about two-thirds the lobbying effort of the CBI and about twice that of Barclays. I was surprised to find that lobby 26 is Facebook, above Tesco, Microsoft, or UNISON. (Google is far, far down the list.) The highest placed individual trade union is the CWU at 24, between HSBC and the Electoral Commission. The littlest lobby is a nursery school in Leeds that got herded into a Big Society meeting with Nick Hurd MP.

I'm not so sure about using this model to assess the ministers, as we're using a priori weightings on them. But the decision to lobby a given minister must contain some information about the lobbyist's perceptions of their power and influence. Britain's most lobbied minister is Chris Grayling MP, Minister of State for Employment, who achieves a weighted degree of 4.2, not far off twice the prime minister. David Willetts, Vince Cable, Nick Clegg, and Francis Maude are the next four before the prime minister. They range between 2.8 and 2.6 with the PM on 2.3. Britain's least influential minister appears to be Baroness Warsi, minister without portfolio, on a score of 0.057.

BIS is the most lobbied department on 12.42, followed by the Department for Work and Pensions on 9.65, the Treasury on 7.065, the Cabinet Office on 5.62, and the DCLG on 3.825. Delight to the econophysicists (are they still around?): the distributions seem to show a nice power-law relationship! Which tells us what precisely? Well... not much, except that it's a social network and they usually have them!
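For anyone who wants to eyeball the power-law claim on their own numbers, here is a rough sketch: a least-squares slope of log(count) against log(degree). The function name and the toy degree list are invented, and a straight line on log-log axes is only a crude check, not a proper statistical test for a power law.

```python
import math
from collections import Counter


def loglog_slope(degrees):
    """Least-squares slope of log(frequency) against log(degree).

    Needs at least two distinct positive degree values.
    """
    counts = Counter(d for d in degrees if d > 0)
    xs = [math.log(k) for k in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data in which the frequency falls off as degree ** -2.
toy_degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
slope = loglog_slope(toy_degrees)
```

A strongly negative, roughly constant slope is what "a nice power-law relationship" looks like in this crude form.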

There were 2,073 nodes, either ministers or lobbies, in the graph at the last data upload. 2,848 interactions between them were analysed.

Does anyone have any ideas for other metrics that might be interesting?


Lord Blagger said...

What's the thinking behind taking it off scraperwiki?

Alex said...

Because Scraperwiki is for scraping.

Anonymous said...

Rather than using an offline cron job and uploading, why not use the new pipeline library to massage all the data on App Engine?

Have you published the source for this somewhere?

gawp said...

You can generate a set of weighted values by lobbyist for ministries they lobby. This would allow clustering of lobbyists by activity; banks should cluster together, for example.

Principal Component Analysis would probably be the best fit here, as you would be able to cluster by direction in lobbyspace versus position, which normalizes for intensity of lobbying. It will also show what percentage of lobbying effort is explained by the first 2 PCA vectors; this might lead to some nice clustering right there. If, say, 50% of lobbying is on the first vector, investigation of that vector will tell you a lot about the standard distribution of lobbying efforts. Analysis by sector will tell you something too.

And similar analysis on the ministries might work too; a weighted vector for each ministry of who is lobbying them. What ministries are most similarly lobbied? PCA of this would show what spectrum of ministries is lobbied and by what proportion.
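A minimal sketch of the lobbyist-by-ministry version, assuming NumPy and a small made-up matrix of meeting counts (rows are lobbyists, columns are ministries). Scaling each row to unit length is one possible reading of "direction versus position"; the function name and all the figures are illustrative.

```python
import numpy as np


def pca_by_direction(matrix):
    """PCA on unit-length rows, so only the mix of ministries matters,
    not how intensely each lobbyist lobbies."""
    x = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.where(norms == 0, 1.0, norms)  # avoid dividing by zero
    x = x - x.mean(axis=0)                    # centre each column
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    variance = s ** 2
    explained = variance / variance.sum()     # share per component
    scores = u * s                            # lobbyists in PCA space
    return explained, scores


# Invented counts: two "banks" and two "unions" across three ministries.
counts = [
    [8, 1, 0],   # bank, light lobbying
    [16, 2, 0],  # bank, same mix, twice the intensity
    [0, 1, 6],   # union
    [1, 2, 12],  # union
]
explained, scores = pca_by_direction(counts)
```

The two banks land on the same point despite the difference in volume, which is the clustering-by-direction effect; transposing the matrix gives the ministry-side analysis.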

Much of this is probably predictable, but it will give a nice visualization and anomalies are often informative.

Do you have a data set handy? I'd love to have a go at it...

Alex said...

Anon: AppEngine doesn't have this lib: NetworkX.

Alex said...

Gawp: PCA is a cool approach. NetworkX's clustering algorithms might provide something similar. As far as visualisation goes, I've been looking at strategies - I like the idea of a radial-graph look, and the CAIDA Skitter graphs are an inspiration, but that does mean reimplementing it for an undirected graph.

You can get the data out of ScraperWiki, but I need to backport some data cleaning from the project into the scraper.

A said...

The categorical nature (dividing lobbyists by industrial sector) of the data makes Correspondence Analysis particularly applicable. As well there are a bunch of machine learning methods that would work nicely. There's a standard set of bioinformatics data exploration tools I'd like to try throwing the data at to see what happens.

As for visualization, there are some pretty pictures that fall out of PCA and correspondence analysis.

My problem with large network graphs is they rapidly turn into "ridiculograms", uninformatively dense meshes. This is common with protein-protein interaction graphs where there is a lot of noise in the data.

Will download the data, please post when it's tidied; long weekend coming up to play with it!
