It crawls the Web without malice, seeking out every possible bit of content.
Its name is Googlebot, and sometimes it gets to see things on the Web that
the rest of us don’t.
Unless of course you pretend to be Googlebot.
Superficially spoofing Googlebot, Google’s Web crawler, is not a difficult
thing to do and was recently the subject of a very popular post on the
Digg site. Since at least September of 2006, however, Google has made efforts
to help webmasters protect themselves against spoofed Googlebots. That
hasn’t stopped people from trying to be Googlebot, if the popularity of
the Digg post is any indication.
Just like any user approaching a Web site, Googlebot identifies itself. At
the top level, it does so through something called the “user agent.” In most
cases, the user agent reported to the visited Web site identifies the
browser used to view the site. When Googlebot visits a site, the user agent
is Googlebot. So to appear as Googlebot to a site, all you need to
do is identify yourself, by way of the user agent, as Googlebot.
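To make that concrete, here is a minimal Python sketch of a request whose User-Agent header claims to be Googlebot. The URL is a placeholder, the helper name build_request is hypothetical, and the user agent string is an assumption based on the commonly published Googlebot string, which may vary. The request is only constructed, never sent.

```python
import urllib.request

# Assumption: the commonly published Googlebot user agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build (but do not send) an HTTP request with a chosen User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL, used only for illustration.
request = build_request("http://www.example.com/")
print(request.get_header("User-agent"))
```

Because the header is just text supplied by the client, the visited site has no way to tell from the user agent alone whether the visitor is really Google’s crawler.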
Doing so on Mozilla Firefox is a simple matter of using the User Agent
Switcher extension, which allows Firefox users to be any user agent they
choose. An even easier approach is to take advantage of BeTheBot, which enables users to see Web sites as Googlebot sees them.
Though using the Googlebot user agent may trick some sites into thinking
you’re actually Google’s all-seeing Web crawler, it could also end up
getting your IP address banned from sites if you get caught.
Since at least September, Google has provided webmasters with a
definitive way to verify Googlebot. In addition to the user agent, there is at
least one other key identifier for Googlebot: its IP address.
“Any interested Web site owner can tell if a visitor is the real Googlebot,”
a Google spokesperson told internetnews.com. “Anyone can set their
browser to pretend to be any ‘user agent’ that they want. How a particular
Web server decides to handle a particular browser or user agent is a choice
that the Web site owner makes.”
Google recommends that webmasters use DNS to verify, on a case-by-case
basis, the identity of any visitor whose user agent claims to be “googlebot.”
A reverse DNS lookup on the visitor’s IP address should confirm that the
suspect crawler is in the googlebot.com domain.
Google also recommends that webmasters then do a forward DNS lookup on the
returned hostname to confirm that it resolves back to the same IP address.
That prevents a potential spoofer from simply setting up their own reverse
DNS record that points to the googlebot.com domain space. Google posted
details on its blog in September.
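The two-step DNS check can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not Google’s own code: the function name is_real_googlebot is hypothetical, and the lookups are passed in as parameters (defaulting to the standard library’s socket functions) so the logic can be exercised without network access.

```python
import socket


def is_real_googlebot(ip,
                      reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Return True only if ip passes both the reverse and forward DNS checks."""
    # Step 1: reverse DNS lookup, IP address -> hostname.
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no reverse record, so the claim cannot be verified

    # The hostname must sit inside the googlebot.com domain.
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(".googlebot.com"):
        return False

    # Step 2: forward DNS lookup, hostname -> IP addresses.
    # A spoofer can control their own reverse record, but not
    # the forward records of the googlebot.com domain.
    try:
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    return ip in addresses
```

A site would typically run this only for visitors whose user agent claims to be Googlebot, for example `is_real_googlebot(visitor_ip)`, and treat a False result as a spoofed crawler.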
“Some people do different things for different user agents (e.g.
Googlebot),” Google’s spokesperson said. “If we believe deceptive or
malicious action is happening, Google can take action on that.”
This article was first published on InternetNews.com.