Monday, June 17, 2024

The Trouble with Real-Time Search


Everybody’s talking about real-time search: the kind of search where you can find things posted seconds ago, rather than hours or days ago. (People are even talking about Google talking about real-time search. The company has lately been hinting at such an offering.)

The current leader in real-time search is Twitter — but Twitter isn’t the only real-time search game in town. OneRio, Scoopler and Yauba are just a few of the alternatives that have popped up recently. (Sites like Twitterfall, Tweetmeme and others are also called real-time search engines, but these are just Twitter front ends — all results are tweets posted by Twitter users.)

Last week, I addressed comparisons being made between Twitter Search and Google’s new “Recent results” feature. In a nutshell, they differ because Twitter needs to index only its own site, whereas Google needs to index tens of billions of sites (including …). Comparing Twitter to Google is like comparing your company mail room with the U.S. Postal Service. That the mail room can route internal mail quickly doesn’t mean the USPS can.

There are expectations in some circles that Google will soon embrace real-time search by offering everything it normally indexes (i.e., everything), but with Twitter-style, posted-five-seconds-ago freshness. I’m here to dash those expectations. It’s not going to happen. And to the extent that it does happen, it will be a bad thing.

Why Regular Google Searches Work

Google is great, and everybody uses it. Google search is so useful because Google indexes almost everything, but you don’t have to look at everything. Results are ranked from most relevant to least, based on a wide range of secret-sauce criteria. Type in a search term and you get good results on the first page, never mind that there may be 50 million matching pages. The answer you were looking for is probably right there on that first page.

Of course, real-time search can’t rank by relevance. If it did, it wouldn’t be real-time search, because real-time results are ordered by when they were posted, not by how well they match.
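The difference comes down to the sort key. Here is a minimal Python sketch, assuming each result carries a posting time and a relevance score. The score is a made-up number standing in for Google’s secret criteria, and the URLs and timestamps are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Result:
    url: str
    posted_at: datetime
    relevance: float  # hypothetical score in [0, 1]; real ranking signals are secret


def rank_by_relevance(results):
    # Classic web search: best match first, regardless of age.
    return sorted(results, key=lambda r: r.relevance, reverse=True)


def rank_by_recency(results):
    # Real-time search: newest first, regardless of how good the match is.
    return sorted(results, key=lambda r: r.posted_at, reverse=True)


now = datetime(2000, 1, 1, 12, 0, 0)  # arbitrary fixed "current time"
results = [
    Result("old-but-great", now - timedelta(days=30), 0.95),
    Result("just-posted", now - timedelta(seconds=5), 0.40),
    Result("an-hour-ago", now - timedelta(hours=1), 0.70),
]

print([r.url for r in rank_by_relevance(results)])  # ['old-but-great', 'an-hour-ago', 'just-posted']
print([r.url for r in rank_by_recency(results)])    # ['just-posted', 'an-hour-ago', 'old-but-great']
```

The same three results come back in opposite orders. A search engine has to pick one of these orderings as its primary sort, which is why "real-time" and "most relevant" pull against each other.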

Why Twitter Search Works

Twitter’s real-time search engine works great because it’s not like the larger Internet.

First, Twitter posts are by definition very small: 140 characters or fewer. Google must index much larger pages.

Second, Twitter is a service that you have to sign up for. Twitter is constantly shutting down the accounts of users who abuse it for spam and other bad purposes. Google has no control over most of the sites it indexes, or their users.

And third, everything posted on Twitter is posted in a common, predictable format. The Twitter search engine doesn’t have to figure out what type of content it’s indexing, or take instructions and pointers from a wide range of protocols and standards.

And these factors hint at how Google will achieve its real-time search — by not being like regular Google search.
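Here is a toy Python illustration of why those three properties make Twitter-style search cheap. When every item is a short post with the same fields, indexing it takes a single pass that appends the post to each of its terms’ lists. This is a generic inverted index, not Twitter’s actual implementation. The `Post` type, the tokenizer and the assumption that posts arrive in chronological order are all illustrative.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

MAX_LEN = 140  # Twitter's per-post limit at the time of writing


@dataclass(frozen=True)
class Post:
    post_id: int
    user: str
    text: str


class TinyRealtimeIndex:
    """Toy inverted index: every post has the same shape, so indexing is one cheap step."""

    def __init__(self):
        # term -> post IDs in insertion order (assumed to be chronological order)
        self.postings = defaultdict(list)
        self.posts = {}

    def add(self, post):
        if len(post.text) > MAX_LEN:
            raise ValueError("post exceeds the length limit")
        self.posts[post.post_id] = post
        # One pass over a short, plain-text body; no HTML, feeds or protocols to parse.
        for term in set(re.findall(r"\w+", post.text.lower())):
            self.postings[term].append(post.post_id)

    def search(self, term):
        # Newest first: walk the postings list backwards.
        ids = self.postings.get(term.lower(), [])
        return [self.posts[i] for i in reversed(ids)]


idx = TinyRealtimeIndex()
idx.add(Post(1, "alice", "Big earthquake in L.A.!"))
idx.add(Post(2, "bob", "Lunch time"))
idx.add(Post(3, "carol", "Earthquake felt downtown"))
print([p.post_id for p in idx.search("earthquake")])  # [3, 1]
```

A general web crawler gets none of these shortcuts. Pages vary wildly in size and format, and their owners aren’t accountable to the indexer.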

How Google’s Real-Time Search Will Work

Google will probably develop a real-time search engine mainly for subscriber-based services, such as Facebook, Flickr and YouTube. It will also probably throw in its own News feeds, which are already near real-time.

Hopefully, you’ll be able to choose types of content. For example, you should be able to choose “social networks,” and narrow that to “posts by my own friends only.”
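That kind of narrowing is just a filter applied before results are shown. Below is a hypothetical sketch. The `source_type` labels and the friends list are invented for illustration and don’t reflect any announced Google feature.

```python
from dataclasses import dataclass


@dataclass
class Item:
    source_type: str  # hypothetical label, e.g. "social_network", "news", "video"
    author: str
    text: str


def filter_results(items, source_type=None, friends=None):
    """Keep items of the requested type and, optionally, only those by friends."""
    return [
        item
        for item in items
        if (source_type is None or item.source_type == source_type)
        and (friends is None or item.author in friends)
    ]


items = [
    Item("social_network", "alice", "At the beach"),
    Item("social_network", "mallory", "Buy followers now"),
    Item("news", "wire", "Quake update"),
]
print([i.author for i in filter_results(items, "social_network", {"alice", "bob"})])  # ['alice']
```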

Google will not, cannot and should not try to offer real-time search for everything on the Internet. One reason is spam. You already can’t stand the spam that floods your own e-mail inbox. You definitely don’t want every piece of spam posted anywhere showing up in your search results.

With real-time search, spammers can monitor existing sites to see what people are writing about or searching for. Let’s say there’s a giant earthquake in L.A. Everybody wants to jump on their real-time search engine to get updates. It’s trivially easy for spammers to start bombing the Internet with spam loaded with the “earthquake” keyword.

This is already happening, even on Twitter. The more people use real-time search, the more time and energy spammers will devote to exploiting it.
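One naive defense would be to flag posts in which a single trending keyword dominates the text. The sketch below is a hypothetical heuristic, not how Google or Twitter actually filters spam, and its 0.3 threshold is arbitrary. It also shows why spammers have the upper hand: padding a message with a few ordinary words pushes it under any fixed threshold.

```python
import re


def keyword_density(text, keyword):
    """Fraction of the words in `text` that are exactly `keyword` (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def looks_like_keyword_spam(text, keyword, threshold=0.3):
    # Hypothetical rule: flag a post if the trending term makes up at least
    # `threshold` of its words. The threshold is an arbitrary illustration.
    return keyword_density(text, keyword) >= threshold


print(looks_like_keyword_spam("earthquake earthquake earthquake cheap pills earthquake", "earthquake"))  # True
print(looks_like_keyword_spam("Felt a small earthquake near downtown just now", "earthquake"))  # False
```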

The downside of all real-time search is that any search returns a lot of repeated and irrelevant results. This is bad enough on Twitter, but on the larger Internet it would be unbearable, especially when combined with all the spam.
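Some of the repetition can be collapsed by normalizing text before comparing it. Here is a rough sketch, assuming duplicates differ only in case, punctuation, links and a leading "RT @user:" retweet marker. Real near-duplicate detection is considerably more involved, and this does nothing about results that are irrelevant rather than repeated.

```python
import re


def normalize(text):
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)     # drop links, which often differ between copies
    text = re.sub(r"^rt\s+@\w+:?\s*", "", text)  # drop a leading retweet marker
    return " ".join(re.findall(r"\w+", text))    # ignore punctuation and extra spaces


def dedupe(results):
    """Keep the first result for each normalized text, preserving order."""
    seen = set()
    unique = []
    for text in results:
        key = normalize(text)
        if key and key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


results = [
    "Quake hits L.A. http://a.example/1",
    "RT @bob: Quake hits L.A. http://b.example/2",
    "quake hits l.a.!",
    "Freeway closed after quake",
]
print(dedupe(results))  # ['Quake hits L.A. http://a.example/1', 'Freeway closed after quake']
```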

Google will have real-time search, and it will probably be very useful. But it will get there by not doing what people currently expect, which is to offer real-time search for the whole Internet. Instead, it will offer real-time search for a subset of the Internet, which brings me to one more real-time search problem.

By picking and choosing which major sites to index for its real-time search engine, Google will ignore the millions of small sites out there. If you want your dinky little company to be discoverable, you’re probably going to have to turn to social networks or other sites to promote it.

So expect Google to embrace real-time search. But don’t expect that search to be anywhere near as useful as good old-fashioned regular Google searches.
