In the beginning, we had the World Wide Wait. We had browsers and Web servers, and not much else in between. While life on the Web was architecturally simple, there wasn't too much that we could do about improving the latency of our Web connections. When we wanted to view a particularly complex Web page, or one with lots of graphics, or one on a Web server in a far-flung corner of the world, we waited.
But the days of waiting are coming to a close, thanks to an array of caching technologies that deliver data faster to the impatient browsing public. And as corporate Web sites have evolved from a single, simple Web server into more complex architectures, the caching universe has expanded as well. In this article, the second in a series on new Web technologies for IT managers and intranets, we'll examine which of these approaches makes the most sense for different situations, and how these technologies fit into expanding Web applications infrastructures. We'll also explain how to improve the browsing experience for corporate users as well as for Web site visitors. The first article covered Web management tools and services. The next will examine the latest e-commerce payment processing products.
Caching is a relatively simple concept: You move content closer to its ultimate destination, avoiding the delays caused by network latency, congested servers, and overloaded links and routers. The best-performing cache would be the hard disk in your own desktop: If it were large enough to store the entire Web (or at least the Web sites that you might visit), you would never have to wait for a page to load again. However, that isn't really practical, because we would all need huge hard disks on our own desktops to hold terabytes of Web content. The better practice, then, is to locate the cache on the big hard disks of servers on local area networks (LANs), and share this wealth among multiple users.
What you need to know
But there are lots of subtle issues behind this relatively simple concept. For instance, what happens if the page changes between the time you cache some content and the time you go to view it? Are there ways to store frequently used page elements that don't change often, such as logos, headers, and other graphics, while updating more dynamic content on the fly? And in addition to providing a better browsing experience for your end users, are you trying to save money by not having to buy additional outbound bandwidth from your corporation to the public Internet?
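To make the stale-content question concrete, here is a minimal Python sketch of one common approach: give each cached object a lifetime and serve it from the cache only while that lifetime has not expired. The names (`CachedObject`, `is_fresh`) and the lifetimes are our own illustrative assumptions, not taken from any particular caching product.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    body: bytes
    stored_at: float  # time (in seconds) when the copy was cached
    max_age: float    # seconds the copy may be served without rechecking


def is_fresh(entry: CachedObject, now: float) -> bool:
    """Return True while the cached copy is still within its lifetime."""
    return (now - entry.stored_at) < entry.max_age


# Hypothetical lifetimes: a logo rarely changes, headlines change constantly.
logo = CachedObject(b"<logo bytes>", stored_at=0.0, max_age=86400.0)
headlines = CachedObject(b"<headline text>", stored_at=0.0, max_age=60.0)

one_hour_later = 3600.0
print(is_fresh(logo, one_hour_later))       # True: serve from the cache
print(is_fresh(headlines, one_hour_later))  # False: fetch a new copy
```

A long lifetime for static elements and a short one for dynamic elements is only one possible policy; a real cache might instead ask the origin server whether its copy has changed.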
Add to these issues the fact that many corporate Web sites have evolved from simple, single-server sites to more complex ones covering multiple sites and spanning several continents. To make matters worse, these complex sites serve a wide array of data types, such as streaming media, Shockwave, and other animations, and run applications far more sophisticated than simply serving up Web pages, such as secure credit card processing and dynamic database updates. Applications such as streaming media and e-commerce storefronts have different and more complex caching requirements than simple file services. This is because these applications have different sizes of data streams (think of a megabyte-sized audio file next to several simple text labels) and pages with both static and dynamic elements pulled from multiple sources, including the Web, file servers, and databases.
The best caches need to understand how to implement the concept of content freshness, and keep track of when and how Web content changes. They also need to examine how pages are constructed, and refresh only the objects that change frequently. Caches have to understand how to find the location of your content and where your users are connecting to your servers from around the world. Some caches also act as proxy servers to control or conserve outbound bandwidth, enabling multiple users to share a single Internet connection effectively. For example, let's say everyone in your workgroup connects to the New York Times Web site every morning to read the day's headlines. Rather than sending the same series of pages multiple times across the Internet from the Times' Web site to your company's desktops, you receive one copy of these pages and share it among your workgroup.
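The workgroup example can be sketched in a few lines of Python. This is a toy model under our own assumptions: `SharedCache`, `fake_origin`, and the example URL are hypothetical, the origin fetch is simulated rather than a real network request, and freshness checking is left out to keep the sketch short.

```python
class SharedCache:
    """Toy shared cache: one copy per URL, served to every user behind it."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable taking a URL, returning bytes
        self._store = {}
        self.origin_fetches = 0          # trips made across the Internet

    def get(self, url):
        if url not in self._store:
            # First request for this URL: go to the origin server once.
            self._store[url] = self._fetch(url)
            self.origin_fetches += 1
        # Every later request is answered from the local copy.
        return self._store[url]


def fake_origin(url):
    # Stand-in for a real HTTP request to the remote Web server.
    return f"content of {url}".encode()


cache = SharedCache(fake_origin)
for user in range(25):  # 25 colleagues read the same front page
    cache.get("http://news.example/front-page")

print(cache.origin_fetches)  # 1
```

Twenty-five readers generate a single trip to the origin server; the other 24 requests never leave the LAN.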
There are four basic kinds of caching products and services. The newest entrants to this arena are caching services that are sold by application service providers. The other three kinds are products: caching software that typically runs on UNIX; prepackaged caching servers that combine caching software running on either UNIX or Novell Inc.'s NetWare with general-purpose server hardware; and specialty appliances designed just for caching.
The first and newest caching category provides content distribution services that handle much more than caching. These service providers address Web server reliability, peak load demand, and the geographic dispersion of the Web itself. Their services mirror the trend toward more complex and distributed Web sites, where content may be located on different continents and on servers maintained by separate divisions. And just as your own Web sites might be more distributed, your visitors might also be coming from many remote corners of the world.
To handle both of these trends, vendors have come up with the concept of replicating your content and distributing it around the globe, closer to where potential site visitors are located. These service providers are building large, replicated data centers and high-speed network connections to the general Internet. The idea here is to ensure that the browsing public avoids congested network routes and peering points. The providers also include tools to manage how content is distributed and how load is balanced during peak traffic times.
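As a rough illustration of how a visitor might be directed to replicated content, the Python sketch below picks the replica with the lowest measured latency among those that are not overloaded. It is our own simplification, not a description of how any named provider works; the `Replica` fields, the site names, the numbers, and the 90% load threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    latency_ms: float  # round-trip time measured from the visitor
    load: float        # fraction of capacity in use, 0.0 to 1.0


def pick_replica(replicas, max_load=0.9):
    """Choose the closest replica that still has spare capacity."""
    available = [r for r in replicas if r.load < max_load]
    # If every site is busy, fall back to considering all of them.
    candidates = available or replicas
    return min(candidates, key=lambda r: r.latency_ms)


# Hypothetical measurements for a visitor in Europe.
replicas = [
    Replica("new-york", latency_ms=180.0, load=0.40),
    Replica("london", latency_ms=25.0, load=0.95),
    Replica("frankfurt", latency_ms=40.0, load=0.55),
]

print(pick_replica(replicas).name)  # frankfurt
```

London is closest but above the load threshold, so the visitor is sent to Frankfurt instead. This is the combination of proximity and load balancing described above, reduced to its simplest form.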
As you can imagine, this is a very expensive undertaking. However, the cost savings in terms of overseas bandwidth charges can be quite large as well, particularly for Internet Service Providers (ISPs) outside of North America that are servicing American content abroad. Some offshore ISPs estimate that between 30% and 50% of their overall traffic has North American destinations. The major players in this area include Digital Island Inc./Sandpiper Networks Inc., Akamai Technologies Inc., and Edgix Corp.
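A back-of-the-envelope calculation shows why those percentages matter to an offshore ISP. The Python sketch below uses the 30% and 50% shares quoted above; the monthly traffic volume and the cache hit ratio are hypothetical figures chosen only for illustration.

```python
def overseas_traffic_saved(total_gb, na_share, hit_ratio):
    """Traffic (in GB) that no longer has to cross the overseas link.

    total_gb  -- all traffic the ISP carries in the period (assumed figure)
    na_share  -- fraction of that traffic bound for North American sites
    hit_ratio -- fraction of those requests answered from a local cache
                 (assumed figure)
    """
    return total_gb * na_share * hit_ratio


for share in (0.30, 0.50):
    saved = overseas_traffic_saved(total_gb=1000, na_share=share, hit_ratio=0.40)
    print(f"{share:.0%} North American share: about {saved:.0f} GB saved")
```

Under these assumptions, an ISP carrying 1,000 GB a month would keep between roughly 120 GB and 200 GB off its most expensive links. The real savings depend on how cacheable the content is.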
A variation on this theme is from SkyCache Inc. The company makes use of Inktomi Corp.'s server software but delivers content via a series of earth-orbiting satellites, bypassing the congested terrestrial Internet in the process. Users of this service need to install special rooftop dishes to communicate with its network.
Working with these service providers means making some changes to your existing ISP relationships. In some cases, such as with Akamai and Sandpiper, users will have to co-locate their Web servers at the providers' data centers and add management software and tools to manage these servers remotely. In the case of SkyCache, users might need to upgrade routers and other network infrastructure devices to work with its network.