In pursuit of intelligent search engines

Search engine intelligence is almost as unpredictable as the weather--murky one minute, sunny skies the next.

Everybody complains about weather forecasts, just as they do about intranet search engines. Then the storm passes, and the hunt for the perfect search engine is on again.

The complaints may be many, but they all focus on the quality of the search. Verity Search '97 user Diep Truong, NetworkMCI Library Manager at MCI in Washington, D.C., just wants to "type in the word 'benefits' and get human resources information, not a page on the benefits of ISDN."

Englewood, Colo.-based U.S. West technical staff member Steve Collins wants only the right amount of information from a search of his company's intranet using its Verity search engine instead of always having to deal with either an avalanche or a drought. "You either get way too much information back," he says, "or you get too little. In either case, you don't get what you want."

The Hyperbolic Tree from InXight Software is best suited for browsing and interacting with large hierarchies. Users can navigate among hundreds of thousands of Web pages at a time. Smooth, animated transitions help users maintain a sense of orientation.

Even systems integrators like Montreal-based Yamatech Connectivity Solutions acknowledge that the products they work with leave room for improvement.

Jack Kincler, managing director at Yamatech, which builds search systems using Digital Equipment's AltaVista Search Intranet eXtension, says that there is "not enough intelligence in search engines right now. They just go and bring back whatever they find, and many times you get redundant results with, for example, the same document listed more than once."

But a break in this gloomy weather pattern is coming, and soon. Today's first-generation search engines are rapidly evolving, and much improved technology is just over the horizon. The next few years promise big changes in what search engines look like and what they can do.

Smarter searching

Search engine vendors are moving rapidly to make searching easier. First-generation search engines generally provide a relevancy ranking by counting the number of times the search word occurs in a document and listing the one where it appears most often on top. At MCI, Truong says employees would like the option to move a document to the top of the results list if the word appears in the title or in the first 50 characters. Most search engine vendors will be providing this sort of flexibility in the near future.
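The ranking behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's actual algorithm: the function names, the sample documents, and the boost value of 100 are all assumptions made for the example. It counts occurrences of the search word, then optionally lifts a document whose title or first 50 characters contain the word.

```python
import re

def count_term(term, text):
    """Count whole-word, case-insensitive occurrences of term in text."""
    return len(re.findall(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE))

def score(term, doc, boost=100):
    """First-generation relevance: the raw term count in the body.

    The boost (a hypothetical value) lifts documents whose title or
    first 50 characters of body text contain the term.
    """
    base = count_term(term, doc["body"])
    in_title = count_term(term, doc["title"]) > 0
    in_lead = count_term(term, doc["body"][:50]) > 0
    return base + (boost if in_title or in_lead else 0)

def rank(term, docs, boost=100):
    """Return documents sorted from most to least relevant."""
    return sorted(docs, key=lambda d: score(term, d, boost), reverse=True)

# Made-up sample documents for illustration.
docs = [
    {"title": "ISDN rollout notes",
     "body": "This memo covers line costs and installation schedules first, "
             "then the benefits of ISDN: speed benefits, cost benefits, "
             "and reliability benefits."},
    {"title": "Employee benefits overview",
     "body": "Human resources information on health coverage and retirement plans."},
]

print([d["title"] for d in rank("benefits", docs, boost=0)])  # counting only
print([d["title"] for d in rank("benefits", docs)])           # with title boost
```

With counting alone, the ISDN memo wins because it repeats the word four times. With the boost switched on, the human resources page moves to the top because the word appears in its title.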

Several search engines now make the list of results more manageable by organizing the search results into coherent categories. Inference in Novato, Calif., which makes its search engine, InferenceFind, available on the Web for Internet searches, provides this clustering and removes redundant results as well.
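One simple way to remove redundant results is to fingerprint each document's text and keep only the first hit for each fingerprint. The sketch below assumes exact duplicates that differ only in case or spacing; how InferenceFind actually detects redundancy is not described here, and the URLs and function names are invented for the example.

```python
import hashlib

def fingerprint(text):
    """Hash the text after collapsing whitespace and case."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def dedupe(results):
    """Keep the first hit for each distinct body, preserving order."""
    seen = set()
    unique = []
    for url, body in results:
        key = fingerprint(body)
        if key not in seen:
            seen.add(key)
            unique.append((url, body))
    return unique

# Hypothetical result list: the second entry mirrors the first.
results = [
    ("http://intranet.example/hr/benefits.html", "Employee benefits guide"),
    ("http://mirror.example/hr/benefits.html", "Employee  Benefits guide\n"),
    ("http://intranet.example/isdn.html", "Benefits of ISDN"),
]

print([url for url, _ in dedupe(results)])
```

A hash catches only identical text. Near-duplicates, such as the same memo with a different footer, would need a fuzzier comparison.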

And since most searches gather too much information initially, search engines such as Fulcrum's Knowledge Network and Infoseek's Ultraseek are making it easier to narrow down queries, so a second try finds the right information. These engines allow you to use the initial results as the basis for further searching--without having to start the second search from scratch--by choosing a document or text in a document and asking the search engine to "find more like this."
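A "find more like this" feature can be approximated by comparing word counts. The sketch below uses cosine similarity over bag-of-words vectors, a common textbook technique that is assumed here for illustration; the vendors named above may work quite differently. All function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def more_like_this(seed, candidates):
    """Rank candidate texts by similarity to the seed text."""
    seed_vec = vectorize(seed)
    return sorted(candidates,
                  key=lambda c: cosine(seed_vec, vectorize(c)),
                  reverse=True)

# The user picks a result (the seed) and asks for similar documents.
seed = "software patents filed in Japan"
candidates = [
    "quarterly cafeteria menu",
    "patents on software filed by firms in Japan",
    "hardware patents in Korea",
]

for text in more_like_this(seed, candidates):
    print(text)
```

The second search reuses the chosen document as the query, so the user never has to retype or rethink the search terms.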

Natural language searching: an example. Search engines can accept queries typed in plain English, instead of requiring Boolean operators like "and," "or," and "not." For the query "Tell me about technology patents in the Far East," you might get back information on software patents in Japan or hardware patents in Korea.

The biggest boon to searching, however, is probably the emergence of natural language searching. "The first thing our users want," says Ron Maturo, intranet manager at Merck Research Laboratories in Rahway, N.J., "is to be able to type in normal English to do a search. They don't want to have to use Boolean terms." Merck Research uses Infoseek's Ultraseek search engine.

A growing number of vendors now offer the choice of using Boolean operators like "and," "or," and "not" in a query, or simply typing in a question in the form of a sentence. Autonomy Knowledge Server, Fulcrum Knowledge Network, Infoseek's Ultraseek Server, Verity's IntelliServ, and other search engines are rapidly acquiring the intelligence to automatically convert natural-language expressions into Boolean queries.

If you were to type the question, "Tell me about technology patents in the Far East," into a first-generation search engine, says Bruce Milne, Fulcrum's manager of industry and new media marketing, you might get millions of documents containing one or more of the words in that phrase.

A search engine that understands natural language, Milne says, would analyze the query and toss out "tell me about" and "in the" as irrelevant words. It understands that "Far East" is a concept that includes countries such as Japan, Korea, and Singapore, rather than interpreting it in a strictly Boolean sense and finding documents that contain things like "far away" or "East of Eden." (See diagram, "Natural language searching: an example.")

Likewise, "technology patents" is a concept that might include things like patents on software. Instead of returning a huge number of unusable results, the search would bring back documents that might include such details as software patents in Japan.
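Milne's example can be turned into a rough sketch of how a natural-language question might become a Boolean query. The stop-phrase list and the concept table below are tiny, hand-built assumptions for illustration. A real engine would rely on a much larger lexicon and real linguistic analysis, and nothing here reflects Fulcrum's actual implementation.

```python
import re

# Hypothetical stop phrases and concept table, for illustration only.
STOP_PHRASES = ["tell me about", "in the"]
CONCEPTS = {
    "technology patents": ["software patents", "hardware patents"],
    "far east": ["Japan", "Korea", "Singapore"],
}

def to_boolean(question):
    """Convert a plain-English question into a Boolean query string."""
    text = question.lower().strip(" ?.!")
    # Toss out phrases that carry no search meaning.
    for phrase in STOP_PHRASES:
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)
    groups = []
    # Expand each recognized concept into an OR group of related terms.
    for concept, members in CONCEPTS.items():
        if concept in text:
            terms = [concept] + members
            groups.append("(" + " OR ".join('"%s"' % t for t in terms) + ")")
            text = text.replace(concept, " ")
    # Any leftover words become plain AND terms.
    groups.extend(text.split())
    return " AND ".join(groups)

print(to_boolean("Tell me about technology patents in the Far East"))
```

For the sample question this prints `("technology patents" OR "software patents" OR "hardware patents") AND ("far east" OR "Japan" OR "Korea" OR "Singapore")`. Because "Far East" is matched as a whole phrase, documents that merely mention "far away" or "East of Eden" are not pulled in.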

Easier administration

The next generation of search engines promises to make life easier for administrators as well as for users. Search engines are becoming more powerful, so indexing happens faster, which is important as the amount of information on corporate intranets grows by leaps and bounds. And within the coming year, distributed indexing should be widely available--an important feature for Merck Research, says Maturo, since Merck's intranet has information on 67 servers scattered around the world.

Web masters at MCI are looking forward to automatic categorization, Truong says, which should also be widely available in the next year or so. Besides its Verity search engine and the search capabilities built into Netscape's Compass product (also based on a Verity engine), MCI provides its users with a list of 500 sites on the intranet organized into categories such as corporate data, market research, and company services.

Building this index, notes Truong, involves a lot of manual work, since MCI employees must laboriously go through each site and put it in the proper category. But search engines are now starting to take over this process by making use of a lexicon, or thesaurus, of related words to assign a document to the proper category. Without a lexicon, this could be tricky, since a document that should go into the category "computers" might use the terms "CPU," "processor," and "RAM," but not actually contain the term "computer."

Autonomy, Fulcrum, and Semio now provide these lexicons, and lexicons of terms relevant to specific industries will be available soon. To make the indexing more accurate, these lexicons will also allow you to add your own company-specific terms to the list.
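Lexicon-based categorization can be illustrated with a very small sketch: score each category by how often its related terms occur in the document, and assign the document to the highest-scoring category. The lexicon entries, the function name, and the product code "ZX9000" are all made up for this example; commercial lexicons are far larger and handle multi-word terms, which this sketch does not.

```python
import re
from collections import Counter

# Hypothetical lexicon: category -> related single-word terms.
LEXICON = {
    "computers": {"computer", "cpu", "processor", "ram"},
    "human resources": {"benefits", "payroll", "vacation"},
}

def categorize(text, lexicon):
    """Return the category whose terms occur most often, or None."""
    words = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    scores = {cat: sum(words[t] for t in terms)
              for cat, terms in lexicon.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# The document never uses the word "computer" itself.
doc = "The new CPU pairs a faster processor core with 64 MB of RAM."
print(categorize(doc, LEXICON))

# Adding a company-specific term (a made-up product code) to the lexicon.
LEXICON["computers"].add("zx9000")
print(categorize("Order status for the ZX9000", LEXICON))
```

Both documents land in "computers" even though neither contains that word, which is the problem the lexicon is meant to solve.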

Push technology gets real

Like much in the computer industry, the early promise of push technology was lost in the hype. Earlier approaches tended to be blunt instruments, delivering lots of undifferentiated information to the desktop--and using lots of bandwidth to do it.

Now, search engine vendors and companies doing Web channel broadcasting are forming alliances like the one recently announced between Verity and BackWeb Technologies. The result, says analyst Hadley Reynolds of Boston-based Delphi Group, will be the ability to "tune" search engines to receive the information that you really need from a variety of sources. The search engine, Reynolds says, will deliver not only the day's headlines, "but it will also provide continuous indexing of your favorite newsgroup and update you any time your competitor's Web page changes."

While it may not be essential for every employee, Reynolds notes that for people in the financial industry or the media--or for analysts like himself--"a search engine that can deliver 10 or 12 absolutely critical hits a day is going to be like your right arm."

Mindshare leaders

In a survey of more than 300 IT professionals familiar with retrieval technology, the largest shares of first-place votes for vendor leadership in the search engine category went to Verity and Fulcrum Technologies. Excalibur's RetrievalWare and Digital's AltaVista search engines also attracted attention.

Verity: 25%
Fulcrum: 19%
Excalibur: 13%
AltaVista: 7%
Other: 36%

Source: Delphi Group

You'll also be able to tune the search engine to present the information in different ways. For some subjects, you might want the whole document. Other topics might only warrant a summary, or just the URL. Or you may want to request the document with the search terms highlighted, so you can see why the search engine found it.

This ability to tailor a search engine's output will include the option of searching your own desktop, says David Yockelson, an analyst at the Meta Group in Waltham, Mass. Increasingly, he says, search engine vendors are following the "Microsoft model," where the intranet and Web become an extension of your desktop. Search engines will spider through your computer along with everything else and add your documents to a personal index for your searches.

Beyond HTML

On the Internet, HTML is the common denominator, so first-generation search engines only had to handle one format. Inside organizations, however, data comes in a myriad of different types: HTML, ASCII text files, Adobe PDF files, word processing documents written in Microsoft Word and Corel's WordPerfect formats, flat files, and Microsoft Exchange, Lotus Notes, Oracle, Sybase, DB/2, and other databases. Most intranet search engines are rapidly adding the capability to handle data in many of these formats, led by vendors such as AltaVista and Fulcrum, whose products already handle more than 200 file formats.

To achieve this capability, however, some search engines rely on gateways, which can mean that the process is visible to the end user. "We'd like to see a much more seamless integration," says U.S. West's Steve Collins. "Ideally, the user shouldn't have to know whether you're serving up database information or text documents."

Autonomy Knowledge Management Suite: A visual map of search results

There's also a growing amount of image, sound, and video content finding its way onto corporate intranets. Although there's not a huge demand currently for the ability to search this multimedia content, says Meta Group's Yockelson, it may soon become an essential tool for people in the entertainment, news, and advertising industries, just to name a few.

Most search engines can already catalog the meta-information about an image or a video file, such as a title, a description, or the date it was created. But Vienna, Va.-based Excalibur Technologies' RetrievalWare search engine goes further, with a feature the company calls adaptive pattern matching. This allows you to say, "find me another picture like this one," says Yockelson. "It can recognize that there is a hat in a picture, and then find other pictures with hats in them."

Changes in the industry

While few people are willing to talk publicly about impending deals, industry insiders agree that consolidation in the search engine industry is a given. The next six to 12 months will almost certainly see at least one, and probably several, acquisitions or mergers.

New intranet search engines meet the challenges

Rapidly evolving technology now finds ways to:

Provide search results that make it easy to find the right document.
Tailor searches to individual users--without overwhelming them with information from push technology and intelligent agents.
Take the burden of categorizing documents off Web administrators.
Scale up to meet the challenge of managing large numbers of documents located on servers distributed throughout the enterprise.
Handle data in a multitude of formats, including images and sound.

The beginnings of this shakeout can already be seen: A year ago the Delphi Group was tracking 60 companies in the search engine business. Now, says analyst Hadley Reynolds, that number is down to about 45.

One of the major players, Fulcrum, was recently acquired by document management company PC DOCS International in Burlington, Mass. This acquisition portends a trend, as the world of search engines merges into the larger field of knowledge management.

Another trend users will welcome is falling prices. Straight-text search engines are "becoming a commodity item," says Mike Lynch, Autonomy CEO. Vendors--like Microsoft, which supplies its Index Server search engine for free with the Windows NT operating system--are helping to drive prices down. Other vendors that make their search engines available on the Internet can afford to subsidize intranet pricing with advertising on the Web.

What the future holds

Generally available in the next 12 months

Distributed data management: Indexing data on multiple servers
Knowledge maps and other tools for visualizing search results
Search engines that categorize documents automatically, creating taxonomies without laborious manual indexing
Personalized searching, tailored to the needs of individual users
Able to access most commonly used file formats

Available next year... and beyond

Delivery of customized information that's highly relevant to your particular needs through "push" technology and indexing your own desktop
Search technology that merges with knowledge management--the entire corporate knowledge base becomes accessible

Following the traditional client/server model, prices for most search engines are based on the number of users or servers. Some vendors, like Infoseek, however, are experimenting with pricing based on the number of documents indexed.

A picture is worth a thousand words

Who says results from a search have to come back as a list of documents? The Delphi Group's Reynolds contends that as search engines get better at pinpointing the right content, the focus will shift to finding ways to make it easier for users to deal with the information. "Instead of presenting users with mountains of documents," he says, "search engines will provide visual representations of the content of those documents." (See diagram, "Visual map of search results.")

Inxight Software, in Palo Alto, Calif., has begun to move in this direction with a "Hyperbolic Tree," which allows users to explore a map of a Web site or group of documents by using a mouse to drag one item to the center of the page. As that item approaches the center, it becomes, literally, the center of attention, and links to associated documents come into view.

Autonomy takes a slightly different approach by providing a visual map of the results of a search. This map could include not only news articles, data from corporate archives, and e-mail, but people as well, because--as CEO Lynch says--much of the really valuable information in the corporate world "is between people's ears." Autonomy builds up a profile of the interests and expertise of individuals in the company, based on their past searches, which then becomes part of the general corporate body of knowledge.

As visual mapping and other advanced technologies become widely available over the next year or so, the storm clouds hovering over intranet searching should begin to fade. Changes in the industry will make searching your company's intranet a whole lot easier.

Dan Orzech is a Philadelphia, Penn.-based writer who specializes in technology. His work has appeared in The Los Angeles Times, The Philadelphia Inquirer, and many computer industry publications.

In the business of search engines:

Product: AltaVista Search Intranet eXtension 97
Company: Digital Equipment

Product: askSam Web Publisher
Company: askSam Systems

Product: Autonomy Knowledge Server
Company: Autonomy

Product: BRS/Search
Company: Dataware Technologies

Product: DB/Text Intranet Spider
Company: Inmagic

Product: Excalibur RetrievalWare
Company: Excalibur Technologies

Product: Folio siteDirector
Company: Folio Products

Product: Fulcrum Knowledge Network
Company: Fulcrum Technologies

Product: Lycos Site Spider
Company: Lycos

Product: InferenceFind
Company: Inference

Product: InQuery
Company: Sovereign Hill Software

Product: Intelligent Miner for Text
Company: IBM

Product: IntelliServ
Company: Verity

Product: Verity Search '97
Company: Verity

Product: Inxight
Company: Inxight Software

Product: ISYS:web
Company: ISYS/Odyssey Development

Product: Microsoft Index Server
Company: Microsoft

Product: Netscape Compass Server
Company: Netscape Communications

Product: Open Text
Company: Open Text

Product: Phantom
Company: Maxum Development

Product: SemioMap
Company: Semio

Product: Ultraseek Server
Company: Infoseek

Product: WebWorks Search
Company: Quadralay
