The fact that people who steal your work often outrank you is a problem called "Googlewashing," as I wrote last week. The problem is getting so bad that your Web page may slip out of Google's Top 10 for particular search terms even if your friends innocently quote an excerpt of your work and legitimately link to you as the source.
Today, I'll explain how you can keep people from copying your writing wholesale -- and how to keep search engines from penalizing you when your work is legally excerpted elsewhere.
Do As I Say, Not As I Link
I reported last week that some specialists in the field of "search-engine optimization" have found ways to steal your content and then artificially raise their rankings to PageRank 10, Google's highest score. I'll describe in my next column how to combat that trick. In today's piece, I focus solely on ordinary search-engine results, where no fakery is involved in the scoring.
Here's an overview of the situation your Web site may be suffering from:
You publish articles or blogs. To attract traffic, you may invest considerable time and money into developing fresh and interesting content. Unfortunately, almost any Web page is easy for ethically challenged people to copy, posting it as if it were their own. If your content is distributed via RSS (Really Simple Syndication), the process is trivial to automate.
People comment on your work. Apart from outright theft, it's very common for other sites to excerpt a paragraph or two from your work. Short excerpts like this are generally considered "fair use" under U.S. copyright law, and many other countries allow them under similar exceptions. It's perfectly acceptable, assuming the commenters include a link to your original article.
You link back or "trackback" to the commenters. Some Web sites are programmed to automatically include a fragment of any comments on the original article that may be found at other Web sites. These cross references are called "trackbacks" and usually include links to those sites. Unfortunately, by linking to other sites' excerpts of your work, you may be contributing extra "points" to those sites in search engines, pushing their rankings above your original article's.
Trackbacks are admirable as courtesies. But you don't necessarily need to help those sites rank above yours in searches on your particular topic.
To Follow Or To Nofollow, That Is The Question
In an interview, Andy Edmonds, a relevance data analyst for MSN Search, suggested that companies posting trackbacks and other related-comment links use a feature called nofollow.
To use this trick, you add the attribute rel="nofollow" when linking to sites that comment on your content. Search engines largely consider such links to be "unapproved" by your site. These links, therefore, don't lend additional weight to the pages you're linking to. Google, Yahoo, MSN Search, and other indexes started recognizing the nofollow attribute in January 2005.
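As a rough illustration, here is a minimal Python sketch of what such a link looks like when a site generates it. The function name trackback_link and the example URL are my own placeholders, not part of any blogging tool; the only thing taken from the discussion above is the rel="nofollow" attribute itself.

```python
from html import escape


def trackback_link(url: str, text: str, nofollow: bool = True) -> str:
    """Render an HTML link, marked rel="nofollow" unless told otherwise.

    Hypothetical helper for illustration only.
    """
    rel = ' rel="nofollow"' if nofollow else ""
    # Escape the URL and the link text so they are safe to place in HTML.
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(text)}</a>'


print(trackback_link("http://example.com/their-post", "Commentary on this article"))
# <a href="http://example.com/their-post" rel="nofollow">Commentary on this article</a>
```

Passing nofollow=False produces an ordinary link, which is one way a site could keep the attribute off for individual links it does want to endorse.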
Adding nofollow to every link in a site's comments section has become fairly easy to do without the need for you to hand-code it. Methods to add the attribute to comments automatically (or omit it) are already available for most major blogging tools, such as Movable Type, Blogger, and WordPress.
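For sites that don't run one of those tools, the same idea can be sketched in a few lines of Python using only the standard library. This is not how Movable Type, Blogger, or WordPress implement it; the class and function names are hypothetical, and the sketch assumes the comment is a small, simple HTML fragment (HTML comments, script blocks, and the like are not preserved).

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emit an HTML fragment, adding rel="nofollow" to every <a> tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _emit(self, tag, attrs, self_closing=False):
        attrs = list(attrs)
        if tag == "a":
            attrs = self._with_nofollow(attrs)
        parts = [tag]
        for name, value in attrs:
            # Valueless attributes (e.g. "disabled") are kept as bare names.
            parts.append(name if value is None
                         else f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + (" />" if self_closing else ">"))

    @staticmethod
    def _with_nofollow(attrs):
        merged, seen_rel = [], False
        for name, value in attrs:
            if name == "rel":
                seen_rel = True
                tokens = (value or "").split()
                # Keep existing rel values; add nofollow only if it is missing.
                if "nofollow" not in [t.lower() for t in tokens]:
                    tokens.append("nofollow")
                value = " ".join(tokens)
            merged.append((name, value))
        if not seen_rel:
            merged.append(("rel", "nofollow"))
        return merged

    def handle_starttag(self, tag, attrs):
        self._emit(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # The parser hands over unescaped text, so escape it again on output.
        self.out.append(escape(data, quote=False))


def add_nofollow(fragment: str) -> str:
    rewriter = NofollowRewriter()
    rewriter.feed(fragment)
    rewriter.close()
    return "".join(rewriter.out)


print(add_nofollow('<p>See <a href="http://example.com/">this reply</a>.</p>'))
# <p>See <a href="http://example.com/" rel="nofollow">this reply</a>.</p>
```

A link that already carries a rel value keeps it: rel="external" becomes rel="external nofollow" rather than being overwritten.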
Using this feature on your site's trackbacks (as well as your comment area) is a logical extension. To be sure, there are heartfelt opinions against using nofollow, as expressed at sites such as IO Error. The strongest criticism of the attribute is that it hasn't stopped comment spam, the problem that nofollow supposedly was invented to combat.
But nofollow does seem to me to be a useful way to help search engines recognize content that wasn't originated by your site. It's also a feature you can turn on or off for individual links, with some extra work, if you wish.
I Never Meta Tag I Didn't Like
Ideally, search engines would automatically recognize a new article posted by you as the "original article." Your work would then rank higher than Web pages that were merely references to your original article.
MSN's Edmonds suggests a way Web sites can help search engines determine which page was the first to carry some particular content. This is to use a "meta tag" containing the date an article was published. If several Web pages contained links to each other, the page with the earliest date would be considered by search engines to be the "original content." It should, therefore, rank higher on a given search that also matches the follow-up pages.
One effort to popularize tags of this kind is the Date element of the Dublin Core, a metadata standard maintained by the Dublin Core Metadata Initiative. Unfortunately, only a very few meta tags are noticed by most search engines, and date isn't one of them, according to Search Engine Watch.
Although date meta tags aren't widely supported today, search engines might recognize them someday, so it wouldn't hurt to use them. "That's a community practice that would really help with this problem," Edmonds says.
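Here is a minimal Python sketch of the idea, under some loud assumptions. The first function writes a publication date using the common convention for embedding Dublin Core in HTML (a schema.DC link plus a DC.date meta tag). The second function, likely_original, is purely my illustration of Edmonds' suggestion that the earliest-dated page be treated as the original; it does not describe how any search engine actually works. The URLs and dates are invented.

```python
from datetime import date
from html import escape
from typing import Dict


def dublin_core_date_tags(published: date) -> str:
    """Return HTML <head> tags declaring a page's publication date."""
    schema = '<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">'
    # ISO 8601 (YYYY-MM-DD) is the usual way to write Dublin Core dates.
    meta = f'<meta name="DC.date" content="{escape(published.isoformat(), quote=True)}">'
    return schema + "\n" + meta


def likely_original(pages: Dict[str, date]) -> str:
    """Pick the page with the earliest declared date (hypothetical rule)."""
    return min(pages, key=pages.get)


print(dublin_core_date_tags(date(2005, 6, 14)))

# Invented example: an article and two pages that excerpt and link to it.
pages = {
    "http://example.com/original-article": date(2005, 6, 14),
    "http://example.org/blog/my-comments": date(2005, 6, 16),
    "http://example.net/links": date(2005, 6, 20),
}
print(likely_original(pages))
```

Of course, a date that a page declares about itself is only as trustworthy as the site that wrote it, which is one reason this works best as a community practice rather than as proof of authorship.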
Stopping Outright Rip-Offs Of Your Content
Perhaps your biggest problem isn't legitimate excerpts outranking your stuff but dishonest people copying your material verbatim and posting it as their own. There's a big incentive for this these days. Now that text ads from Google and other services pay Web sites for each individual click, many promoters try to build as many pages as possible -- using whatever articles they find -- merely to attract visitors who might click advertising links.
One way to find sites that are blatantly copying from you is to subscribe to a service such as Copyscape's Copysentry. For $9.95 to $19.95 per month (or free for small, manual searches), this service reports to you every week or every day on Web pages containing sentences that match any of 10 or 20 pages you specify. Specifying additional pages costs $0.25 to $1.00 per month.
If you find copycat sites, it may be useless to complain to the offender directly -- if you can even find a way to contact him or her. But you might get results by complaining to the copyist's ad network or Web hosting provider.
Copysentry isn't a panacea. Although it was created by some of the developers of the highly regarded Google Alert service, it doesn't seem to find every instance of duplicate content on the Web. In fact, some reviewers, such as David Mattison of The Ten Thousand Year Blog, report that using plain old Google reveals more duplication of your content than using Copysentry does.
If you decide to use Google or any other search engine to look for unauthorized duplication of your content, follow a few simple rules:
Wait 30 days, since Google and many other search engines update their master index files only once a month. (Some sites are indexed far more often than this, but your site and the copycat sites you're looking for may not be.)
Look for copies of your last paragraph because many sites legitimately reprint the first one or two paragraphs of your content in the course of linking to your site.
Make up little-used phrases at the end of your articles to help you zoom in on copycats. Common words and phrases will appear on so many Web pages that copies of your particular work will be hard to find.
Remember, search services such as Google and Copysentry are no substitute for serious digital rights management, if DRM technology (which is beyond the scope of this column) is really the level of protection your content needs.
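The search rules above can be turned into a small helper. This is a minimal Python sketch, not a finished tool: the function names are hypothetical, it assumes paragraphs are separated by blank lines, and it only builds a search address for you to open in a browser rather than querying Google automatically. It relies on two ordinary search features, a phrase in quotation marks and the -site: operator, to look for your closing words on sites other than your own.

```python
import re
from urllib.parse import urlencode


def last_paragraph(article: str) -> str:
    """Return the final non-empty paragraph (paragraphs split on blank lines)."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", article) if p.strip()]
    return paragraphs[-1] if paragraphs else ""


def copycat_query(article: str, own_site: str, max_words: int = 10) -> str:
    """Build an exact-phrase query from the last paragraph, excluding own_site."""
    words = last_paragraph(article).split()
    # Drop double quotes so they don't break the quoted-phrase syntax.
    phrase = " ".join(words[:max_words]).replace('"', "")
    return f'"{phrase}" -site:{own_site}'


def search_url(query: str) -> str:
    """Turn a query into a Google search address to open by hand."""
    return "https://www.google.com/search?" + urlencode({"q": query})


article = (
    "Many sites quote this opening paragraph when they link to me.\n\n"
    "People who copy entire articles are being impermissibly impertinent."
)
query = copycat_query(article, "example.com", max_words=6)
print(query)
print(search_url(query))
```

The more unusual the closing phrase, the fewer unrelated pages will match, which is the point of making up little-used phrases at the end of your articles.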
Unauthorized copying will never completely go away, but you should at least be able to glean some satisfaction by catching the worst offenders. In my opinion, people who copy entire articles without permission are being impermissibly impertinent. (Copy that, suckas.)