The cause is an obscure HTTP status code that Google interprets poorly but that Yahoo and some other search engines handle correctly. Knowing about the trick at least gives you some hope of understanding what's going on if it happens to you.
I described on Nov. 1 how duplicate copies of your content can result in the copies ranking higher in Google than your own original material. And on Nov. 8, I showed how you can prevent duplicates from hurting your site's rankings.
Today, I'll finish this series on search-engine problems with a look at the complete hijacking of your Web site traffic.
The ability of one site to hijack another site's Google traffic arises because of two different but related HTTP status codes:
• A 301 code is a permanent redirect ("Moved Permanently"). A Webmaster configures the Web server to send this code instead of a page whose content has moved to a different location. The 301 code is like a Webmaster saying to search engines, "Page A is now at Page B." The code has legitimate uses, such as when a site gets a new domain name or simply renames its directory structure. Search engines usually give the new page the same weight that the old page had. This means companies can shift content around without losing the rankings they previously earned in search-engine listings.
• A 302 code is a temporary redirect (more precisely referred to as a "Found," or "found elsewhere," code). The 302 code essentially says, "Page A is now at Page B, but the content will move back, so keep the link to Page A and give it the same weight as Page B." This code has legitimate uses, too. For example, a company might balance its server load by temporarily moving some high-demand content to a beefier server.
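To make the distinction concrete, here is a minimal sketch of a Web server that answers certain requests with a 301 or 302 instead of a page. It uses Python's standard `http.server` module; the paths, the example.com URLs, and the names `REDIRECTS`, `redirect_for`, `RedirectHandler`, and `serve` are all hypothetical. Note that the redirect travels in the HTTP response's status line and `Location` header, not in the page's HTML.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical redirect table: old path -> (HTTP status code, new location).
REDIRECTS = {
    "/old-page.html": (301, "https://www.example.com/new-page.html"),  # moved for good
    "/report.html": (302, "https://mirror.example.com/report.html"),   # moved for now
}


def redirect_for(path):
    """Return (status, location) if this path should be redirected, else None."""
    return REDIRECTS.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_for(self.path)
        if target is None:
            # No redirect configured: serve the content normally.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(b"Original content lives here.\n")
            return
        status, location = target
        # The redirect is carried entirely in the HTTP response headers.
        self.send_response(status)  # 301 -> "Moved Permanently", 302 -> "Found"
        self.send_header("Location", location)
        self.end_headers()


def serve(port=8000):
    """Run the sketch locally; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

After calling `serve()`, requesting `http://127.0.0.1:8000/old-page.html` with `curl -i` should show a `301 Moved Permanently` status line plus the `Location` header, while `/report.html` returns `302 Found`. Browsers follow either kind silently, which is why visitors rarely notice which one they passed through.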
The problem is that Google sometimes puts the Page A link in its results instead of Page B, even when Page A is an attacker's site that is using a 302 code to steal Page B's traffic. As amazing as it may seem, Google can shift your legitimate traffic to an imposter, who can profit from any visitors who don't immediately notice any difference.
Some Search Engines Solve The Problem
The problem is not entirely new, but it's become a serious concern lately as Webmasters have started using 302 codes more widely. Google's own "AdSense" listing was even hijacked by an unrelated site, which had been using 302 codes innocently as a way to make its links editable at a later date.
Complaints grew to the point that Danny Sullivan, the editor of Search Engine Watch, held an "Indexing Summit" to discuss it publicly with search-engine officials in August.
According to Eric Baldeschweiler, Yahoo's director of software development, the Yahoo search engine solved the problem many months ago. Yahoo uses the following rules to settle things:
• All 301 permanent redirects are fine. Yahoo links to Page B. You can safely move content around within your site or to an entirely new domain with no problem. There's only one exception -- Yahoo shows the URL of a site's home page even if it immediately redirects to a "welcome" page.
• All 302 temporary redirects within a domain are fine. Yahoo links to Page A. You can temporarily redirect traffic from your own Page A to your own Page B without losing the search-engine ranking of Page A.
• All 302 redirects from one domain to another are considered permanent. Page A derives no benefit. Yahoo links to Page B. This eliminates any site's ability to steal traffic from an unrelated site.
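Yahoo's three rules are simple enough to express as a small decision function. The sketch below is an illustration under stated assumptions, not Yahoo's actual code: the name `url_to_list` is hypothetical, "same domain" is approximated as an exact hostname match (so `www.example.com` and `example.com` count as different), and the home-page exception is assumed to apply only when the welcome page is on the same site.

```python
from urllib.parse import urlsplit


def _host(url):
    # Assumption: "same domain" means a case-insensitive exact hostname match.
    return (urlsplit(url).hostname or "").lower()


def _is_home_page(url):
    return urlsplit(url).path in ("", "/")


def url_to_list(page_a, page_b, status):
    """Pick the URL a search engine lists when page_a redirects to page_b.

    Hypothetical helper mirroring the three rules described above.
    """
    same_domain = _host(page_a) == _host(page_b)
    if status == 301:
        # Exception (assumed same-site only): a home page that redirects
        # to a "welcome" page keeps its own URL in the listing.
        if same_domain and _is_home_page(page_a):
            return page_a
        # Otherwise a permanent move always lists the destination.
        return page_b
    if status == 302:
        # Temporary move inside one domain: Page A keeps its listing.
        # Cross-domain "temporary" move: treated as permanent, Page B is listed.
        return page_a if same_domain else page_b
    raise ValueError(f"not a redirect status handled here: {status}")
```

Feeding it the hijack case described earlier, `url_to_list("http://attacker.example/go?id=1", "http://victim.example/page.html", 302)` returns the victim's URL, so the attacker's Page A gains nothing from the redirect.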
Sullivan confirms, "In fact, Yahoo has gone to that kind of solution." However, he adds, "The result that came down from Google" -- which also participated in the summit -- "was, 'We think we've solved that problem.' " Sullivan believes MSN Search might also suffer from the 302 problem to some extent, but MSN did not take part in the summit.
Google Publishes A Tech Support Portal
If you think your Web site's traffic is flowing to some other party, Google has a form you can fill out to report it. Matt Cutts, a Google search quality engineer, links to the form in his blog and says complaints submitted there "will get the same level of investigation" they would if he were notified personally. You'll have to try it to find out whether that's so.
Whether or not your company is affected today, you need to be knowledgeable about potential redirection problems in case you eventually have to do something about them (even if that mostly boils down to complaining your head off).
Sullivan has shown several examples of how the problem affects Google and MSN Search in his own blog entries.
Claus Schmidt, an Internet consultant who's been tracking the problem for ages, has an extensive technical explanation. (This includes a note that Yahoo's solution is compatible with the original RFC that defines the 301 and 302 codes.)
Yahoo's solution seems eminently reasonable and workable to me. Rather than experimenting with complex rules to analyze URL hijacking, Google and other search engines should simply adopt the rational 301/302 policy shown above.
This problem shouldn't exist and need not exist. Finding out that your site has suddenly lost most of its traffic because of an HTTP trick is a lousy way to start the day.
Update: English Version of IceSword
I wrote on June 14, 2005, about IceSword, an antihacker utility designed to defeat "rootkits" that infect Windows. At that time, IceSword was available only in a Chinese-language version.
An English-language version of the program is now available for download from the following Web page:
Xfocus.net is the home of a Chinese group of security researchers. The group's download page, as a result, is written entirely in Chinese. Non-Chinese speakers, however, can easily download IceSword_en1.12.rar (a compressed file) by clicking the blue characters in angle brackets shown at the bottom of the page.
I'd like to thank one of my readers who goes by the name of Illukka for his help researching this topic. He'll receive a gift certificate for a book, CD, or DVD of his choice for sending me a tip I printed.