It’s 4 o’clock on a Friday afternoon. You’re in a meeting with your biggest client, and she has just hinted that she would like to be wined and dined tonight at a trendy uptown restaurant. How do you find a table for two on short notice, without making dozens of telephone calls to snooty and unsympathetic maître d’s?
Tom Infantino and his colleagues at Foodline.com Inc. believe they have a solution: A system that uses computer telephony integration (CTI) to enable anyone to reserve a table at a participating restaurant by calling Foodline and responding to automated voice prompts. First, you tell the speech recognition system that you want a table for two at a fancy French restaurant in New York City at 7:00 p.m. The Foodline.com interactive voice response (IVR) system presents you with a list of restaurants and gives you the option to listen to reviews. Your call is then transferred to the restaurant of your choice, allowing you to place your reservation with a live human being.
With this telephone system and a companion Web site (www.foodline.com), the Boston-based startup is attempting to revolutionize the way people dine out. Other companies, both small and large, likewise want to capitalize on recent advances in speech recognition, computer telephony, and customer service. Whether a corporation is trying to shave a few seconds off each of the millions of calls to its customer service center or to let employees dial colleagues simply by saying their names, those advances may finally make speech recognition practical to implement.
Why are corporations beginning to take telephone-based automatic speech recognition (ASR) systems seriously? Speech recognition, the automatic conversion of a stream of speech into discrete words that a computer can store and process, requires elaborate real-time processing of large volumes of audio data. That is why faster hardware matters: with PC clock speeds doubling every year or two, “the hardware platforms have really made speech blossom,” says Judith Markowitz, president of J. Markowitz Consultants, an Evanston, Ill., firm specializing in speech recognition. Also, “the algorithms [for processing speech] have become much better, and accuracy has improved dramatically over recent years,” Markowitz says.
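To put “large volumes of audio data in real-time” in concrete terms, here is a back-of-the-envelope sketch. It assumes standard telephone-quality audio (8,000 samples per second at 8 bits per sample, the G.711 rate) and a 10-millisecond analysis frame chosen for illustration; the figures describe raw input, not measurements from any system in this article.

```python
# Back-of-the-envelope sketch of the raw audio a telephone recognizer must keep up with.
# Assumptions (for illustration only): G.711-style telephone audio at
# 8,000 samples per second and 8 bits per sample, analyzed in 10 ms frames.

SAMPLE_RATE_HZ = 8_000
BITS_PER_SAMPLE = 8
FRAME_MS = 10


def bits_per_second() -> int:
    """Raw bit rate of one telephone channel."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def bytes_per_second() -> int:
    """Raw bytes per second arriving on one channel."""
    return bits_per_second() // 8


def samples_per_frame() -> int:
    """Samples the recognizer must analyze in each frame."""
    return SAMPLE_RATE_HZ * FRAME_MS // 1000


def frames_per_second() -> int:
    """Frames that must be processed every second to stay in real time."""
    return 1000 // FRAME_MS


def bytes_per_second_for(channels: int) -> int:
    """Raw bytes per second for a number of simultaneous calls."""
    return channels * bytes_per_second()


# One T1 line carries 24 voice channels.
print(bits_per_second())           # 64000
print(samples_per_frame())         # 80
print(bytes_per_second_for(24))    # 192000
```

In other words, a server answering a full T1 line must ingest roughly 192,000 bytes of audio every second and keep pace with 100 frames per second on each of the 24 calls.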
With the technology improving rapidly, sales of telephone-based ASR systems are projected to top $1 billion in 2003, according to Voice Information Associates Inc. of Lexington, Mass. A number of big-name vendors are getting into the speech game. Joshua Walker, an analyst with Forrester Research Inc. of Cambridge, Mass., says the key players include Lernout & Hauspie Speech Products N.V., Nuance Communications Inc., and SpeechWorks Inc. Other important vendors are Dragon Systems Inc., IBM Corp., Periphonics Corp., and Philips Electronics N.V.
But the real business driver behind widespread adoption of enterprise speech recognition systems is improving customer service. By sidestepping the slow and awkward telephone keypad, speech recognition systems allow companies to give their clients a better experience. In addition, speech systems can help companies save money by reducing the need for human telephone operators.
The potential benefits of speech recognition don’t stop there. “Speech recognition lets companies take a pool of data from the Web to the telephone,” says Walker. And that’s what Tom Infantino at Foodline set out to do.
Back to the groaning board
Not that the implementation of the telephone restaurant reservations system was a cakewalk. In 1998, “we first worked with Lernout & Hauspie to implement a speech recognition telephone system in Boston,” says Infantino, Foodline’s CTO and COO. Lernout & Hauspie of Burlington, Mass., helped Foodline install and configure its ASR1500 automatic speech recognition software on two Dell Computer Corp. dual-Pentium 400MHz servers.
But the voice interface confused and frustrated users, and Foodline’s performance fell short. Users often didn’t understand how they were expected to respond to the voice prompts, Infantino says. “We went back to Lernout & Hauspie, but they couldn’t help us” with the user interface problems, he says.
Foodline went back to the drawing board in 1999. In May of that year, the company partnered with a new speech vendor, SpeechWorks of Boston, to build a speech-driven restaurant reservations system from scratch using SpeechWorks 5.0 speech recognition software. “SpeechWorks bowled us over with how they could bring the technology to fruition,” Infantino says. According to Infantino, SpeechWorks proved much more adept than Lernout & Hauspie at fine-tuning the wording and enunciation of the voice prompts. The vendor’s solutions services department helped Foodline plan the all-important call flow, which was designed with Artisoft Inc.’s Visual Voice Pro 5.0, a development platform for telephony applications. The call flow is the script that leads the caller down a hierarchy of voice prompts to elicit the information required to query Foodline’s restaurant information database.
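A call flow of this kind can be pictured as an ordered script of prompts, each filling one piece of information before the database query runs. The Python sketch below is hypothetical throughout: the prompt wording, the slot names (`party_size`, `cuisine`, `city`, `time`), the `recognize` callback standing in for the speech recognizer, and the in-memory restaurant list are illustrations, not Foodline’s actual design or any SpeechWorks or Visual Voice API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical call flow: each step pairs a slot name with the voice prompt that fills it.
CALL_FLOW = [
    ("party_size", "How many people will be dining?"),
    ("cuisine", "What kind of food would you like?"),
    ("city", "In what city?"),
    ("time", "At what time?"),
]


@dataclass
class Restaurant:
    name: str
    cuisine: str
    city: str
    open_times: List[str]


def run_call_flow(recognize: Callable[[str], str]) -> Dict[str, str]:
    """Play each prompt in order; `recognize` returns the caller's answer as text."""
    answers: Dict[str, str] = {}
    for slot, prompt in CALL_FLOW:
        answers[slot] = recognize(prompt)
    return answers


def query_restaurants(answers: Dict[str, str], db: List[Restaurant]) -> List[str]:
    """Names of restaurants matching the requested cuisine, city, and time."""
    return [
        r.name
        for r in db
        if r.cuisine == answers["cuisine"]
        and r.city == answers["city"]
        and answers["time"] in r.open_times
    ]


# Usage with invented restaurants and a scripted caller in place of live speech.
db = [
    Restaurant("Chez Example", "french", "new york", ["19:00", "20:00"]),
    Restaurant("Trattoria Example", "italian", "new york", ["19:00"]),
]
scripted = iter(["2", "french", "new york", "19:00"])
answers = run_call_flow(lambda prompt: next(scripted))
print(query_restaurants(answers, db))  # ['Chez Example']
```

The party size is collected but not used in the query here; in the flow the article describes, the caller is transferred to the chosen restaurant to confirm the reservation with a live person.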
Foodline’s new telephone system, which was rolled out in New York in November 1999, is hosted on three custom-built Windows NT servers leased from Automated Financial Systems Inc. of New York.
Restaurants connect to the servers via Foodline’s ISP, Digex Inc. of Beltsville, Md. With the speech recognition solution and hosting services outsourced, Infantino and his colleagues can devote most of their efforts to their core business challenges: selling restaurants and diners on Foodline and revving up revenue streams.
The company’s business model incorporates multiple sources of revenue, Infantino says. First, Foodline charges restaurants a fee of $100 per month to lease the Foodline Reservation Solution, which is designed to replace the paper-based systems that many restaurants still use. Second, restaurants pay Foodline $1 per diner for each reservation made with the system. Third, Foodline plans to sell audio advertisements for the telephone system and presently markets display ads on the companion Web site.
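The first two revenue streams reduce to simple per-restaurant arithmetic. This sketch uses only the two fees quoted above ($100 per month plus $1 per diner); the diner count in the example is a made-up figure, and advertising revenue is left out because no rates are given.

```python
MONTHLY_LEASE_USD = 100   # lease fee for the Foodline Reservation Solution, per month
PER_DINER_FEE_USD = 1     # fee per diner on each reservation made with the system


def monthly_fees(diners_booked: int) -> int:
    """Fees one restaurant pays Foodline in a month (advertising excluded)."""
    if diners_booked < 0:
        raise ValueError("diners_booked must be non-negative")
    return MONTHLY_LEASE_USD + PER_DINER_FEE_USD * diners_booked


# Hypothetical month: 300 diners booked through Foodline at one restaurant.
print(monthly_fees(300))  # 400
```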
Still, Infantino is aware of the fiscal challenges his startup faces. “The biggest issue we have with the telephone [system] is the cost of hosting the calls. It’s hard to pay for that with the existing revenue streams.”
Building customer loyalty
American Airlines Inc. has been making an investment of a different sort in speech recognition. American’s business challenge is to maintain and build customer loyalty in an age when air travel is a commodity, and service often takes a back seat to profit margins.
So American, a subsidiary of Fort Worth, Texas-based AMR Corp., decided to boost the customer service experience by incorporating speech recognition into several of its key systems, including the call center that services its top tier of frequent flyers, AAdvantage Executive Platinum members.
American’s best customers were annoyed when they were required to enter a long and cumbersome frequent flyer number at the beginning of a call to the customer service desk. So in November 1998 the airline introduced a simple improvement designed to keep Executive Platinum members a bit happier: When they call, they can now speak their frequent flyer numbers naturally rather than straining to enter the alphanumeric sequence on the telephone keypad.
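Why is keypad entry of an alphanumeric number so awkward? On a standard telephone keypad, each digit key also carries three or four letters, so a letter entered with a single press cannot be told apart from its neighbors on the same key. The sketch below uses the common letter layout (2 = ABC through 9 = WXYZ); the sample identifiers are invented and do not reflect the real AAdvantage number format.

```python
# Standard letter groups printed on telephone keypads (2 = ABC ... 9 = WXYZ).
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_KEY = {letter: key for key, letters in KEYPAD.items() for letter in letters}


def keypad_digits(identifier: str) -> str:
    """Keys pressed for an alphanumeric identifier, one press per character."""
    return "".join(
        ch if ch.isdigit() else LETTER_TO_KEY[ch] for ch in identifier.upper()
    )


# Two different (invented) identifiers collapse to the same key sequence,
# so an IVR cannot tell them apart without extra keystrokes per letter.
print(keypad_digits("AB12345"))  # 2212345
print(keypad_digits("CA12345"))  # 2212345
```

Speaking the sequence sidesteps the problem: the recognizer hears “A” and “C” as different sounds even though they share a key.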
Such a simple idea, but the implementation was not trivial. “Working with multiple vendors is a challenge,” says Carline Smith, the Fort Worth, Texas-based manager of reservations planning for American. Smith says that one of the key issues was, “If there was a problem, whose problem was it?” But working with several vendors over eight months, American put the system into production in November 1998.
The key vendor in American’s project was Sabre Inc., also of Fort Worth, which acted as systems integrator. Since American’s parent AMR also owns 82% of Sabre, the partnership was close. With the new speech recognition telephony system, “the caller calls [an 800 number] and the call is routed to the IVR system,” explains Kevin Smilie, senior manager for product distribution at Sabre. “The caller is then asked to speak their AAdvantage number.” The speech recognition software converts the spoken alphanumeric sequence into text, which pops up on the screen of a customer service agent together with the customer’s frequent flyer account information; that information has been cached to reduce the wait.
If the speech recognition system misunderstands the caller, the system prompts the caller to pronounce the AAdvantage number again. If the system and the caller reach an impasse, the call is forwarded to a customer service agent.
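The retry-then-escalate behavior described above can be sketched as a small loop. Every specific here is an assumption for illustration: the two-attempt limit, the confidence threshold, a `recognize` function that returns text plus a confidence score, and an in-memory account cache. The article does not say how American’s system decides it has misheard a caller.

```python
from typing import Callable, Dict, Optional, Tuple

MAX_ATTEMPTS = 2        # assumed number of tries before giving up on the number
MIN_CONFIDENCE = 0.7    # assumed recognizer confidence threshold


def capture_account_number(
    recognize: Callable[[str], Tuple[str, float]],
    account_cache: Dict[str, dict],
) -> Tuple[str, Optional[dict]]:
    """Ask for the number, re-prompt on low confidence or an unknown number,
    and fall back to a plain transfer if caller and system reach an impasse.

    Either way the call reaches an agent: ("screen_pop", account) means the
    account is shown on the agent's screen; ("transfer_without_number", None)
    means the agent must collect the number by voice.
    """
    prompt = "Please say your AAdvantage number."
    for _ in range(MAX_ATTEMPTS):
        text, confidence = recognize(prompt)
        account = account_cache.get(text)  # pre-fetched data keeps the screen pop fast
        if confidence >= MIN_CONFIDENCE and account is not None:
            return "screen_pop", account
        prompt = "Sorry, I didn't get that. Please say your AAdvantage number again."
    return "transfer_without_number", None


# Usage with an invented member: the first reply is misheard, the second succeeds.
cache = {"AB12345": {"name": "Example Member", "tier": "Executive Platinum"}}
replies = iter([("AB12346", 0.9), ("AB12345", 0.95)])
print(capture_account_number(lambda p: next(replies), cache))
```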
American and Sabre created the call flow with the PeriProducer development platform from Periphonics Corp. of Bohemia, N.Y.; the interactive voice system runs on Periphonics’ VPS/is software. For speech recognition, the American ASR system uses Recognizer software from Nuance of Menlo Park, Calif. The whole system runs on SPARC servers from Sun Microsystems Inc. of Palo Alto, Calif.
“These vendors have a process of getting [speech recognition] systems into pilot and production,” says Smilie. “They’ve done a good job.” Carline Smith of American adds: “Nuance goes into pilot and analyzes how the customer responds to the application and changes prompts that don’t work.”
But the development environment has a ways to go, according to Smilie. “Deployments are still infantile, and the development tools are still lacking,” he says. Nevertheless, Smilie is bullish on speech recognition telephony. “This is the way people will interact with call centers in the future,” he says.
Smith agrees, having implemented another speech recognition project in December 1998 that lets anyone dial American Airlines and get flight information by answering a series of voice prompts. “Improved service is one of the ways that airlines can differentiate themselves,” she concludes.
Improving the telephone experience
Chuck Forgue of Unity Healthcare also wanted to improve the telephone experience, in his case by making it easier for people to reach their colleagues. Forgue, director of communications services in the IT department at Unity, took a tall order from his boss at the St. Louis division of the Sisters of Mercy Health System. Following the combination of six hospitals plus other healthcare operations under the Unity Healthcare brand, Forgue was given one week in April 1999 to implement a better way for the organization’s 4,500 employees to contact colleagues at the company’s new headquarters.
With multiple telephone systems coming together, the list of employee names was stable, but their telephone numbers were not. Management wanted to avoid delays in connecting calls. They also wanted to spare human operators the frustration of paging through a telephone directory. The company envisioned a system that would let an employee call a colleague simply by speaking his or her name into the telephone and leave the rest to the computer telephony system.
IBM Corp. proposed that Unity install IBM’s ViaVoice Directory Dialer software to solve the problem. The Armonk, N.Y.-based computer giant flew in engineers to work with Forgue on the implementation. “We put this in down and dirty, real quick,” using a cannibalized IBM Netfinity 5500 server, Forgue says. The system was up and running within a couple of days.
“We’re taking about 8,000 calls a month [with the system], and it looks like we’re saving about 80% of a full-time employee,” Forgue says. “But the reason we did it was to facilitate communication and to make sure people had access to headquarters.” Forgue’s assistant administers the ViaVoice system, stepping in to update the telephone directory when employees join or leave the company and to correct ViaVoice when the system has difficulty matching the sound of an employee’s name with its written form.
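The arrangement described above, stable names but shifting numbers, suggests a directory keyed by name rather than by extension. The sketch below is hypothetical and is not how ViaVoice Directory Dialer works internally: recognition is assumed to have already produced text, the names and extensions are invented, and `difflib.get_close_matches` from the Python standard library stands in, crudely, for acoustic matching. The alias table mimics the administrator’s job of correcting names the system has trouble matching.

```python
import difflib
from typing import Dict, Optional


class NameDirectory:
    """Hypothetical spoken-name directory: names are stable keys, numbers change."""

    def __init__(self) -> None:
        self.extensions: Dict[str, str] = {}  # canonical name -> current number
        self.aliases: Dict[str, str] = {}     # misheard or alternate form -> canonical name

    def add_or_update(self, name: str, extension: str) -> None:
        """Called when an employee joins or a number changes."""
        self.extensions[name.lower()] = extension

    def remove(self, name: str) -> None:
        """Called when an employee leaves; also drops aliases pointing at them."""
        key = name.lower()
        self.extensions.pop(key, None)
        self.aliases = {a: n for a, n in self.aliases.items() if n != key}

    def add_alias(self, heard_as: str, name: str) -> None:
        """Administrator fix for a name the recognizer keeps getting wrong."""
        self.aliases[heard_as.lower()] = name.lower()

    def lookup(self, recognized_text: str) -> Optional[str]:
        """Number to transfer to, or None to route the caller to an operator."""
        key = recognized_text.lower()
        key = self.aliases.get(key, key)
        if key in self.extensions:
            return self.extensions[key]
        close = difflib.get_close_matches(key, list(self.extensions), n=1, cutoff=0.8)
        return self.extensions[close[0]] if close else None


# Usage with invented employees.
directory = NameDirectory()
directory.add_or_update("Pat Example", "4101")
directory.add_or_update("Lee Sample", "4202")
directory.add_or_update("Pat Example", "5150")   # number changed after the merger
directory.add_alias("pad exampel", "Pat Example")
print(directory.lookup("pad exampel"))  # 5150
print(directory.lookup("Lee Sampel"))   # 4202 (close spelling match)
print(directory.lookup("Nobody Here"))  # None
```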
“Our next step is to roll this out to include the management at the hospitals,” Forgue says. “Then we want to widen it to include all our locations.” Now there’s a project that will have people talking.
John Rossheim is a freelance editor and writer specializing in privacy issues and speech recognition technology. He lives in Providence, R.I., and can be reached at email@example.com.
What if a telephone-based Automatic Speech Recognition (ASR) customer-service application could be developed as a standards-based extension of an existing e-commerce Web site? That’s the goal of the emerging Voice eXtensible Markup Language (VoiceXML or VXML).
Founded by AT&T Corp., Lucent Technologies Inc., and Motorola Inc. in March 1999, the VXML Forum (www.vxmlforum.org) seeks to provide a platform that will enable Web developers to code voice prompts, call flows, and other key elements of ASR systems for telephony as intuitively as they write HTML. If the open standard works as promised, IT shops will be able to save development time and money on ASR systems since VXML-based and HTML-based applications can share critical databases with relative ease.
Before the VXML Forum was announced, the three founders and new key partner IBM Corp. had competed to develop their own markup languages to voice-enable Web applications. Motorola, for one, announced its VoxML voice programming language in September 1998 and trumpeted deployments at Web sites including Weather.com (The Weather Channel), CBS MarketWatch.com, and Biztravel.com. Web-based telephony applications might, for example, let consumers call an 800 number and speak the name of a city to listen to the local weather forecast, or get stock quotes by speaking the company name without having to know ticker symbols and enter them via telephone keypad.
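To give a flavor of the markup-based approach, the sketch below builds a tiny voice dialog for the weather example using element names from the later W3C VoiceXML 2.0 Recommendation (`vxml`, `form`, `field`, `prompt`, `grammar`, `filled`, `submit`). The August 1999 preliminary spec and Motorola’s VoxML differed in detail, and the grammar file name and forecast URL are placeholders. The document is generated with Python’s standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET


def weather_dialog(submit_url: str) -> str:
    """Build a one-question VoiceXML 2.0-style dialog: ask for a city, then submit it."""
    vxml = ET.Element("vxml", version="2.0", xmlns="http://www.w3.org/2001/vxml")
    form = ET.SubElement(vxml, "form", id="weather")
    field = ET.SubElement(form, "field", name="city")
    ET.SubElement(field, "prompt").text = "For which city would you like the forecast?"
    # Placeholder grammar file listing the city names the recognizer should accept.
    ET.SubElement(field, "grammar", src="cities.grxml", type="application/srgs+xml")
    filled = ET.SubElement(field, "filled")
    # Send the recognized city to the same back end a Web page might call.
    ET.SubElement(filled, "submit", next=submit_url, namelist="city")
    return ET.tostring(vxml, encoding="unicode")


print(weather_dialog("http://example.com/forecast"))
```

The appeal for Web shops is visible even in this toy: the dialog posts a `city` parameter to an ordinary URL, so the voice front end and an HTML form could share one server-side forecast lookup.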
Since the four key players joined forces under the VXML flag, more than 60 vendors have followed suit. Leading speech vendors supporting VXML include Dragon Systems Inc., Lernout & Hauspie Speech Products N.V., Nuance Communications Inc., Philips Electronics North America Corp., and SpeechWorks Inc. Computer and networking heavyweights 3Com Corp., Hewlett-Packard Co., Novell Inc., and Sun Microsystems Inc. have also given the standard a nod.
The VXML Forum released a preliminary spec for the standard in August 1999. Proponents would like to see VXML become the ultimate thin client for ubiquitous Internet access. Time will tell whether the Forum can keep its own act together, stave off any potential competing standards, and demonstrate that ASR telephony applications based on VXML can generate profits. –J.R.