Scalability a Growing Problem in Web 2.0

There is a great I.B.M. commercial that was popular a year or two ago. In it, a bunch of programmers and designers are eagerly surrounding a computer monitor that reports on a web application they just released. The team is happy, jovial, like all teams are during a much-anticipated launch. They get even happier when the number of users starts to climb. Up, up it goes, and they cheer for joy. Then the numbers keep climbing, steeply, and the group realizes that their system won’t scale. Their mood turns from sweet to sour in an instant. Their best-case scenario wasn’t actually that: it was their worst-case scenario.

This is an extreme case of what happens when an application benefits from network effects. If an application is useful, then at some point its network of users will grow crazily fast, reaching what many folks like to call a “tipping point,” after the book by Malcolm Gladwell. This is a crucial stage for any Web 2.0 application, because it is at that moment that an application can make its money back in an instant. If it can’t scale, it can’t survive the tipping point.

In his What is Web 2.0 piece, Tim O’Reilly puts it provocatively: “it’s no accident that Google’s system administration, networking, and load balancing techniques are perhaps even more closely guarded secrets than their search algorithms.” Indeed, you can find Brin and Page’s original search algorithm any day on your friendly internet: The Anatomy of a Search Engine. But what can you find out about how they manage their system as a whole? Not much.

Just yesterday I got a surprise about scalability that some of you Bokardo readers may even have noticed yourselves. My site was loading page elements fine, but it didn’t seem to complete a page request until a minute or two had passed. It wasn’t just the XHTML pages, either; it was the RSS feed as well. So most people trying to access my RSS feed got a timeout instead.

It turns out that a WordPress plugin was to blame. The culprit was Shortstat, a plugin I downloaded and activated months ago that provides a good glimpse into how many folks are visiting my site at a given time. It records URLs, referrers, and IP addresses, and it attempts to look up the location of each IP address via a public API on another web site, HostIP.info, every time a request is made. Let me repeat that: each and every time a request is made to my server, this plugin makes another request to a different server to look up the visitor’s IP address and get a corresponding location back.
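To make the failure mode concrete, here is a minimal sketch in Python (the actual plugin is PHP, and the endpoint, function names, and behavior below are illustrative assumptions, not Shortstat’s real code). The uncached version is what a per-request remote lookup amounts to: every page view on my server blocks on a round trip to someone else’s server. A tiny cache removes most of those round trips, and a short timeout with a graceful fallback at least bounds the damage when the remote service goes down.

```python
# Minimal sketch of the anti-pattern, assuming a HostIP.info-style
# lookup endpoint. None of this is Shortstat's actual code.
import urllib.request

_location_cache = {}  # ip -> location string, kept for the life of the process


def lookup_location_uncached(ip):
    """What the plugin effectively does: one outbound HTTP request
    to another server for every request my own server receives."""
    url = f"http://api.hostip.info/get_html.php?ip={ip}"  # illustrative endpoint
    with urllib.request.urlopen(url, timeout=2) as resp:  # timeout bounds the hang
        return resp.read().decode("utf-8", errors="replace")


def lookup_location_cached(ip):
    """A friendlier version: hit the remote API at most once per IP
    and serve repeat visitors from the local cache."""
    if ip not in _location_cache:
        try:
            _location_cache[ip] = lookup_location_uncached(ip)
        except OSError:  # covers network errors and timeouts
            _location_cache[ip] = "unknown"  # degrade gracefully if the API is down
    return _location_cache[ip]
```

Without the timeout and the cache, a slow or dead HostIP.info stalls every single page request, which is exactly the minute-long hang my readers were seeing.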

Talk about scalability issues. If I’m sending about 10 requests per minute to this site, what about everyone else? What about those folks who also installed the plugin and who have more traffic than I do? Combine us all together, across the world, and you’ve got instant scalability issues. Needless to say, two days ago the service collapsed, and so a bunch of other sites had the same problem I did. Here’s the WordPress support page regarding this issue: http://wordpress.org/support/topic/47348.
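Here’s a quick back-of-envelope sketch (the installation count is pure guesswork on my part, just to show how the multiplication works):

```python
# All numbers are illustrative assumptions, not measurements.
per_site_rpm = 10          # roughly what my site alone sends per minute
sites_with_plugin = 1000   # hypothetical number of blogs running Shortstat
total_rps = per_site_rpm * sites_with_plugin / 60
print(f"~{total_rps:.0f} requests/second aimed at one small free API")  # ~167
```

Even a modest guess like that puts HostIP.info on the receiving end of more traffic than many free services are built to handle.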

In addition to my little problem, Nat Torkington points us to the ongoing issues with the Google Maps backend, most assuredly brought about by the massive use and scaling of the system. Over the past year, as Google Maps was first hacked to life by folks like Paul Rademacher with Housingmaps and then officially brought to life with its own API, Google has used Navteq as the maps provider. As Nat points out, that recently changed on or around October 4th, when Google switched to another map provider called “TeleAtlas”. Now their own maps use Navteq, while the maps retrieved through their API use TeleAtlas.

While this is most likely a money issue, it was probably brought on by the amazing scale at which the Google API was, and is, being used. Navteq realized that their maps were being used at a much grander scale than they expected, and they wanted to be paid accordingly. In other words, their revenue didn’t scale with the usage of the maps. As Nat says, “The biggest threat to a data business is free access to the data.”

In Web 2.0 these issues are only going to become more pervasive. So if you offer up an API, make sure you know what you’re asking for.

Update: Robert Cringely writes on the same topic, calling it an “energy crisis”.

Published: October 21st, 2005