Friday, February 16, 2007

Topology-aware P2P

A lot of ISP people are concerned about the volume of peer-to-peer traffic on their networks, especially bulky stuff like video. To be more specific, they are usually concerned about the volume of P2P traffic their users draw from outside their networks since, after all, that's the traffic they pay extra upstream transit for, or the traffic that pisses off their upstream peers.

One well-known solution for delivering popular content across the Internet is a so-called content delivery network (CDN), in which the CDN operator places big servers in ISPs' data centres and fills them up with stuff. The local DNS server is then altered to point downstream users at the CDN server rather than the original source of the content. The stuff is therefore downloaded once over the wide-area network and served many times within the local network. (The best-known CDN is Akamai.)
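As a toy model of the DNS redirection step described above, the sketch below answers queries for CDN-served hostnames with the address of the local cache, but only for clients inside the ISP's own address space; everything else resolves to the original source. The hostnames, addresses, prefix and the `resolve` function are all hypothetical (documentation address ranges), and this is an illustration of the idea, not of how any real CDN or DNS server is built.

```python
import ipaddress

# Hypothetical: hostnames whose content the CDN box in our data centre holds.
CDN_HOSTNAMES = {"video.example.com"}
# Hypothetical address of the CDN server sitting in the ISP's data centre.
LOCAL_CACHE_IP = "192.0.2.10"
# Hypothetical address space of the ISP's own (downstream) users.
ISP_PREFIXES = [ipaddress.ip_network("198.51.100.0/24")]
# What a normal lookup would return: the original source of each hostname.
ORIGIN_RECORDS = {
    "video.example.com": "203.0.113.50",
    "www.example.org": "203.0.113.80",
}


def resolve(hostname: str, client_ip: str) -> str:
    """Return the address a client should be sent to for this hostname."""
    client = ipaddress.ip_address(client_ip)
    is_local_user = any(client in net for net in ISP_PREFIXES)
    if hostname in CDN_HOSTNAMES and is_local_user:
        # Local users get the cached copy: fetched once over the WAN, served many times.
        return LOCAL_CACHE_IP
    # Everyone else, and every non-CDN hostname, goes to the original source.
    return ORIGIN_RECORDS[hostname]


print(resolve("video.example.com", "198.51.100.23"))  # 192.0.2.10
print(resolve("www.example.org", "198.51.100.23"))    # 203.0.113.80
```

The important property is that the client never needs to know the cache exists; the redirection happens entirely in name resolution, which is exactly what P2P clients lack.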

You could, theoretically, set up a box with lots of peer-to-peer clients running on it to seed the local network, but there is no guarantee your users would go there for their videos. This is because most (if not all) P2P clients are unaware of the network topology.

Why? After all, it benefits everybody if the client looks for content close to it first. Except in a few corner cases, a nearby source is going to be faster and suffer less packet loss, it costs the ISP less, and it costs their upstream provider less. It's also likely to be more resilient. And it shouldn't be that difficult to implement.

The first thing a P2P client has to do is find peers and ask them what they can supply. By extension, it also has to declare what it can supply itself. At the very least this has to include an IP address, port number, and filename, but to do the job properly there should also be some metadata and a user identifier. This service discovery function is usually one of the most difficult problems. However you do it, though, you'll have to open a connection across the Internet to your peers to get this data. So why not use the opportunity to measure the round-trip latency, hop count, and packet loss? Then, when the content information and this traceroute-like data are collated, rank each group of peers offering the same stuff by proximity, and make the client prefer the closest source.
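To make the ranking step concrete, here is a minimal Python sketch. It assumes each discovery reply has already been parsed into a record carrying the fields listed above, and that hop count and packet loss were measured by some other means (probing for those needs raw sockets or an external traceroute, which is out of scope here). Only the latency probe is implemented, by timing the TCP handshake the client has to make anyway. The `PeerOffer` class, the function names and the "fewest hops, then lowest RTT, then lowest loss" ordering are all hypothetical choices, not part of any existing P2P client.

```python
import socket
import time
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PeerOffer:
    """One peer's answer to "what can you supply?" plus our measurements of it."""
    ip: str
    port: int
    filename: str
    user_id: str = ""
    metadata: dict = field(default_factory=dict)
    rtt_ms: float = float("inf")  # measured round-trip time in milliseconds
    hops: int = 255               # hop count (assumed measured elsewhere)
    loss: float = 0.0             # fraction of probes lost, 0.0 to 1.0 (assumed)


def measure_connect_rtt(ip: str, port: int, timeout: float = 2.0) -> float:
    """Time a TCP handshake to the peer as a rough RTT estimate, in ms."""
    start = time.perf_counter()
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            pass
    except OSError:
        return float("inf")  # unreachable peers will sort last
    return (time.perf_counter() - start) * 1000.0


def proximity_key(offer: PeerOffer) -> tuple:
    # Hypothetical ordering: fewest hops first, then lowest RTT, then lowest loss.
    return (offer.hops, offer.rtt_ms, offer.loss)


def rank_by_proximity(offers: list) -> dict:
    """Group offers by filename and sort each group nearest-first."""
    groups = defaultdict(list)
    for offer in offers:
        groups[offer.filename].append(offer)
    return {name: sorted(group, key=proximity_key) for name, group in groups.items()}


# Usage with made-up measurements (documentation address ranges):
offers = [
    PeerOffer("198.51.100.7", 6881, "video.avi", rtt_ms=180.0, hops=17),
    PeerOffer("192.0.2.20", 6881, "video.avi", rtt_ms=4.0, hops=3),
    PeerOffer("203.0.113.9", 6881, "video.avi", rtt_ms=4.0, hops=3, loss=0.2),
]
ranked = rank_by_proximity(offers)
print([o.ip for o in ranked["video.avi"]])
# ['192.0.2.20', '203.0.113.9', '198.51.100.7']
```

A lexicographic ordering keeps the rule easy to reason about; a real client might prefer a weighted score, or fall back to distant peers when the nearby ones turn out to be slow.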

There are some security implications. A lot of people try to hide their network layout from the world in order to make hackers' lives more difficult, and in a topology-aware world that would become an efficiency-reducing technology.
