Friday, April 01, 2011

Scaling and scoping the NYT paywall

Amusingly for a comment on scalability, I couldn't post this on D^2's thread because Blogger was in a state. Anyway, it's well into the category of "comments that really ought to be posts" so here goes. So various people are wondering how the New York Times managed to spend $50m on setting up their paywall. D^2 reckons that they're overstating, for basically cynical reasons. I think it's more fundamental than that.

The complexity of the rules makes it sound like a telco billing system more than anything else - all about rating and charging lots and lots of events in close to real time against a hugely complicated rate card. You'd be amazed how many software companies are sustained by this issue. It's expensive. The NYT is counting pages served to members (easy) and nonmembers (hard), differentiating between referral sources, and counting different pages differently. Further, it's got to do it quickly. Latency from the US West Coast (their worst-case scenario) to nytimes.com is currently about 80 milliseconds. User-interface research suggests that people perceive a response as instant if it arrives within about 100ms - web surfing is a fairly latency-tolerant application, but when you consider that the server itself takes some time to fetch the page, and that the data rate in the last mile restricts how quickly it can be served, there's a very limited budget of time for the paywall to do its stuff without annoying the hell out of everyone.
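To make the "rating" idea concrete, here is a minimal sketch of what metering a page view against a rate card might look like. Everything in it is an assumption for illustration: the page weights, the referrer multipliers, the free quota and the names (`PAGE_WEIGHTS`, `REFERRER_MULTIPLIER`, `Meter`) are invented, not the NYT's actual rules.

```python
from dataclasses import dataclass, field

# Hypothetical rate card: all of these numbers are invented for illustration.
PAGE_WEIGHTS = {"article": 1.0, "slideshow": 0.5, "section_front": 0.0}
REFERRER_MULTIPLIER = {"direct": 1.0, "search": 0.0, "social": 0.0}
FREE_QUOTA = 20.0  # assumed allowance per visitor per billing period


@dataclass
class Meter:
    """In-memory usage counter keyed by visitor id (a real one would persist)."""
    used: dict = field(default_factory=dict)

    def rate(self, visitor_id: str, is_member: bool,
             page_type: str, referrer: str) -> bool:
        """Return True if the page should be served, charging the meter."""
        if is_member:
            return True  # the easy case: members are never metered
        # Unknown page types and referrers are charged at full price.
        cost = (PAGE_WEIGHTS.get(page_type, 1.0)
                * REFERRER_MULTIPLIER.get(referrer, 1.0))
        spent = self.used.get(visitor_id, 0.0)
        if spent + cost > FREE_QUOTA:
            return False  # over quota: show the paywall instead
        self.used[visitor_id] = spent + cost
        return True


meter = Meter()
print(meter.rate("visitor-1", False, "article", "direct"))
```

Even this toy version has to look up and update per-visitor state on every page view, which is where the time budget starts to bite.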

Although the numbers of transactions won't be as savage as a telco's, doing real-time rating for the whole NYT website is going to be a significant scalability challenge. Alexa reckons 1.45% of global Web users hit nytimes.com, for example. For comparison, Salesforce.com is 0.4% and that's already a huge engineering challenge (because it's much more complicated behind the scenes). There are apparently 1.6bn "Internet users" - I don't know how that's defined - so 1.45% of them is about 23.2m visitors a day. Spread evenly over the 86,400 seconds in a day, at one hit each, that implies the system must scale to an average of 268 transactions/second (or about 86,400 times the daily reach of my blog!)
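The back-of-envelope sum behind that figure, as a short sketch. The reach and user numbers are the ones quoted above; the one-hit-per-user-per-day default is an assumption, and real visitors will usually fetch more than one page.

```python
# Figures quoted in the post.
INTERNET_USERS = 1.6e9            # "Internet users", however that is defined
NYT_REACH = 0.0145                # Alexa: 1.45% of global Web users
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400


def mean_tps(users: float, reach: float,
             hits_per_user_per_day: float = 1.0) -> float:
    """Average transactions/second if traffic were spread evenly over a day.

    hits_per_user_per_day=1.0 is an assumption, not a measured value.
    """
    return users * reach * hits_per_user_per_day / SECONDS_PER_DAY


print(f"{mean_tps(INTERNET_USERS, NYT_REACH):.1f} transactions/second on average")
```

Raising `hits_per_user_per_day` scales the answer linearly, so the 268 figure is best read as a floor.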

A lot of those will be search engines, Internet wildlife, etc, but you still have to tell them to fuck off and therefore it's part of your scale & scope calculations. That's about a tenth of HSBC's online payments processing in 2007, IIRC, or a twentieth of a typical GSM Home Location Register. (The usual rule of thumb for those is 5 kilotransactions/second.) But - and it's the original big but - you need to provision for the peak. Peak usage, not average usage, determines scale and cost. Even if your traffic distribution were weirdly well-behaved and followed a normal distribution, you'd encounter an over-95th-percentile event one day in every 20. And network traffic doesn't; it's usually more, ahem, leptokurtic. So we've got to multiply that 268 by their peak/mean ratio.
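A rough sketch of that multiplication. The post doesn't give the NYT's peak/mean ratio, so the hourly profile below is made up purely to show the mechanics, and `headroom` is a hypothetical safety factor.

```python
from statistics import mean

MEAN_TPS = 268.0  # the average estimated earlier in the post


def peak_to_mean(samples: list[float]) -> float:
    """Ratio of the busiest sample to the average sample."""
    return max(samples) / mean(samples)


def provisioned_tps(mean_tps: float, ratio: float, headroom: float = 1.0) -> float:
    """Capacity to provision: mean load scaled up to peak, times any headroom."""
    return mean_tps * ratio * headroom


# Invented hourly load profile (relative units), NOT real NYT traffic.
hourly_load = [2, 1, 1, 1, 2, 4, 8, 12, 14, 13, 12, 13,
               14, 12, 11, 10, 10, 11, 12, 10, 8, 6, 4, 3]

ratio = peak_to_mean(hourly_load)
print(f"peak/mean = {ratio:.2f}, provision for {provisioned_tps(MEAN_TPS, ratio):.0f} TPS")
```

Hourly averages flatter the picture, of course: with fat-tailed traffic, the ratio measured over seconds on a big news day will be a good deal worse than anything a smooth daily curve suggests.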

And it's a single point of failure, so it has to be robust (or at least fail to a default-open state, and not do so too often). I for one can't wait for the High Scalability article on it.
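For what "fail to a default-open state" might mean in practice, here is a minimal sketch: give the paywall check a deadline, and if it blows the deadline or falls over, serve the page anyway. The function name, the 20ms timeout and the thread-pool approach are all assumptions for illustration, not a description of what the NYT built.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# Shared worker pool for paywall checks; the size is an arbitrary choice.
_POOL = ThreadPoolExecutor(max_workers=8)


def allow_with_fail_open(check: Callable[[Any], bool], request: Any,
                         timeout_s: float = 0.02) -> bool:
    """Run the paywall check, but serve the page if it is slow or broken.

    timeout_s is an assumed latency budget (20ms), not a measured one.
    """
    future = _POOL.submit(check, request)
    try:
        return bool(future.result(timeout=timeout_s))
    except Exception:
        # Timeout or any error in the check: fail open and serve the page.
        return True


print(allow_with_fail_open(lambda req: False, "GET /article", timeout_s=1.0))
```

The "not too often" part is the catch: every fail-open is a free page, so you'd want to count them and alert when the rate climbs.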

So it's basically similar in scalability, complexity, and availability to a decent-sized MVNO's billing infrastructure, and you'd be delighted to get away with change from £20m for that.

2 comments:

Anonymous said...

Holy crap dude. You did more over engineering of this than was necessary. Not nearly as complex as you make it. Not even close. Its cookies and counting page views for certain urls that is it.

Henrik Holst said...

Wow 268 TPS for $50M, I wonder what they would charge to write the kind of software that I write on a daily basis (handles about 1M TPS)...
