Sunday, July 06, 2008

The ViktorFeed: Documentation

Here is the presentation I delivered at OpenTech 2008:



I'd publish the text, but I didn't prepare one :-)

Anyway, the ViktorFeed grew out of some basic Python scripts I've been using for some time to collect data on certain aircraft movements through Sharjah and Dubai airports. Both airports publish all their movements on the Web, but neither provides anything like an RSS feed, which is why I began scripting: to save myself checking them by hand. (You can read about this phase in the Political Pathetic Python posts on this blog.)

The current version works as follows. The web pages involved are loaded and a BeautifulSoup instance is created for each one. If a page fails to load and an IOError is raised, that page is skipped and a default message is added instead. Data is extracted using BeautifulSoup's find method inside list comprehensions, and each flight is represented by a tuple of values in a list. For each flight, the tuple is unpacked and each item in it is assigned to a standard variable. If the airline name is found in a whitelist, the flight is discarded. Otherwise, various standard items (for example, the name of the airport the flight arrived at or departed from) are added, the time is processed to give both a readable time and a time in seconds since the epoch, and a database is queried for the geographical locations of the source and destination.
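The per-flight stage might look something like the Python 3 sketch below (the original scripts predate Python 3). It leaves out the BeautifulSoup scraping and starts from an already-extracted tuple. The tuple layout, the time format, the whitelist entries, the `places` table schema, the approximate Bermuda Triangle coordinates and all the helper names are hypothetical; it also folds in the default-location fallback that the post goes on to describe.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical whitelist: airlines whose flights are not of interest.
WHITELIST = {"Example Airways", "Sample Air"}

# Hypothetical format of the time string scraped from the airport pages.
TIME_FORMAT = "%d/%m/%Y %H:%M"

# Hypothetical default location, roughly in the Bermuda Triangle.
DEFAULT_LOCATION = (25.0, -71.0)


def make_places_db(rows):
    """Build an in-memory stand-in for the locations database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE places (name TEXT PRIMARY KEY, lat REAL, lon REAL)")
    db.executemany("INSERT INTO places VALUES (?, ?, ?)", rows)
    return db


def locate(db, place, messages):
    """Return (lat, lon) for a place name, or the default if it is missing or unknown."""
    row = None
    if place:
        row = db.execute(
            "SELECT lat, lon FROM places WHERE name = ?", (place,)
        ).fetchone()
    if row is None:
        messages.append("No location for %r; using default." % (place,))
        return DEFAULT_LOCATION
    return tuple(row)


def process_flight(flight, airport, db, messages):
    """Turn one scraped flight tuple into a dict, or return None if whitelisted."""
    # Unpack the tuple into standard variables (field order is assumed).
    airline, flight_no, source, destination, when = flight
    if airline in WHITELIST:
        return None  # discard flights by whitelisted airlines
    # Parse the scraped time, treating it as UTC for simplicity (an assumption).
    dt = datetime.strptime(when, TIME_FORMAT).replace(tzinfo=timezone.utc)
    return {
        "airport": airport,
        "airline": airline,
        "flight": flight_no,
        "source": source,
        "destination": destination,
        "time": dt.strftime("%a, %d %b %Y %H:%M UTC"),  # readable time
        "seconds": int(dt.timestamp()),  # seconds since the epoch
        "source_loc": locate(db, source, messages),
        "dest_loc": locate(db, destination, messages),
    }
```

For example, `process_flight(("Charter Cargo", "XX123", "Sharjah", "Nowhere", "06/07/2008 14:30"), "Sharjah", db, messages)` returns a dict whose destination falls back to the default location and leaves one message in `messages`.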

If a location is not given or not found, a default value is used and a message is added. The default location is in the Bermuda Triangle, thanks to Soizick. The values are then reassembled as a dictionary and appended to a list. When all pages have been processed, each entry in this list is decorated with its time in seconds since the epoch, and the list is sorted into reverse chronological order. The list is then undecorated, and the individual flights are used to build a Simple GeoRSS file through Python string formatting; the result is encoded as Unicode and written out to disk.
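The decorate-sort-undecorate step can be sketched in a few lines. The `seconds` key is a hypothetical name for the epoch time stored in each flight dictionary.

```python
def newest_first(flights):
    """Sort flight dicts into reverse chronological order (decorate-sort-undecorate)."""
    # Decorate: pair each flight with its epoch time. The list index breaks
    # ties, so two dicts are never compared directly.
    decorated = [(f["seconds"], i, f) for i, f in enumerate(flights)]
    # Sort: the largest (most recent) time comes first.
    decorated.sort(reverse=True)
    # Undecorate: strip the sort keys off again.
    return [f for _, _, f in decorated]
```

Since Python 2.4, `sorted(flights, key=lambda f: f["seconds"], reverse=True)` achieves the same ordering for distinct times in a single call, without building the intermediate list by hand.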

Each item in the file consists of the time and date group in the title field; the source, destination, airline, and flight number in the description field; a GeoRSS line tag with the source and destination geocodes; and the current time and date in the pubDate field. This data can be visualised in Google Maps very simply. The test version was served from my laptop, using the SimpleHTTPRequestHandler and ForkingTCPServer classes and port forwarding.
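A minimal sketch of building such items with plain string formatting follows. The templates, the dictionary keys and the channel fields are assumptions; the Simple GeoRSS `georss:line` element takes space-separated latitude/longitude pairs, here source then destination. Writing the file as UTF-8 is also an assumption about the encoding step.

```python
from email.utils import formatdate
from xml.sax.saxutils import escape

# Hypothetical templates for a minimal RSS 2.0 feed carrying Simple GeoRSS lines.
FEED_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
<channel>
<title>%(title)s</title>
<link>%(link)s</link>
<description>%(description)s</description>
%(items)s</channel>
</rss>
"""

ITEM_TEMPLATE = """<item>
<title>%(title)s</title>
<description>%(description)s</description>
<georss:line>%(line)s</georss:line>
<pubDate>%(pubdate)s</pubDate>
</item>
"""


def render_item(flight, pubdate):
    """Format one flight dict (hypothetical keys) as an RSS item."""
    (lat1, lon1), (lat2, lon2) = flight["source_loc"], flight["dest_loc"]
    description = "%s to %s, %s %s" % (
        flight["source"], flight["destination"], flight["airline"], flight["flight"])
    return ITEM_TEMPLATE % {
        "title": escape(flight["time"]),
        "description": escape(description),
        # Simple GeoRSS line: "lat lon" pairs, source first, then destination.
        "line": "%s %s %s %s" % (lat1, lon1, lat2, lon2),
        "pubdate": pubdate,
    }


def build_feed(flights, title, link, description):
    """Return the whole feed as a str; pubDate is the current time and date."""
    pubdate = formatdate(usegmt=True)  # RFC 822 style date, e.g. "... GMT"
    items = "".join(render_item(f, pubdate) for f in flights)
    return FEED_TEMPLATE % {
        "title": escape(title),
        "link": escape(link),
        "description": escape(description),
        "items": items,
    }


def write_feed(feed, path):
    """Encode the feed as UTF-8 and write it out to disk."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(feed)
```

After `write_feed(build_feed(flights, "ViktorFeed", "http://example.org/", "Flights"), "feed.xml")`, running `python3 -m http.server 8000` in the same directory serves the file locally with the same SimpleHTTPRequestHandler class.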

Things to do: get it going on a permanent Web presence, refactor the code into a slightly less ugly mess, keep all the flights in a database, make it possible to query past movements.
