Aiming to build momentum on its speech technology efforts for the .NET
platform, Microsoft Wednesday used the annual SpeechTEK
International Exposition and Educational Conference as a platform to unwrap
the beta 2 release of the .NET Speech Software Development Kit (SDK), a
technical preview of the .NET Speech platform, and a new Joint Development
Program (JDP).
The SDK is a developer tool based on the Speech Application Language Tags
(SALT) specification, which defines a set of lightweight tags as extensions
to common Web-based programming languages, allowing developers to add
speech functionality to existing Web applications. The SDK is designed to
integrate with the Visual Studio .NET development environment, and will
allow developers to write combined speech and visual Web applications in a
single code base.
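To make the "lightweight tags as extensions to Web languages" idea concrete, here is a minimal, hypothetical page fragment in the style of the SALT 1.0 draft: the spec's top-level elements such as prompt and listen sit alongside ordinary HTML, and a bind rule copies the recognized value into an existing form field. The grammar file name, field names, and element IDs below are invented for illustration.

```html
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
<body>
  <!-- An ordinary Web form; the visual and speech UI share this field -->
  <form id="travel">
    <input name="city" type="text" />
  </form>

  <!-- Speak a question to the caller or user -->
  <salt:prompt id="askCity">Which city are you flying to?</salt:prompt>

  <!-- Listen for an answer against a grammar, then bind the
       recognized city name into the HTML form field above -->
  <salt:listen id="getCity">
    <salt:grammar src="cities.grxml" />
    <salt:bind targetelement="city" value="//city" />
  </salt:listen>
</body>
</html>
```

In practice the listen element would be started from script or an event handler; the point of the sketch is simply that speech capability is layered onto an existing page rather than requiring a separate voice application.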
Microsoft unveiled
the first beta of the .NET Speech SDK in May. The beta 2 release adds
enhanced tools for grammar and prompt creation, editing, and debugging of
telephony and multimodal applications. The new features include
W3C standards-compatible formats for grammar authoring, a prebuilt library
of reusable speech telephony and application controls, and grammar
libraries.
To add to the momentum, Microsoft showed off a technical preview of the
.NET Speech platform — slated for release in the middle of 2003. The
platform is the core speech recognition engine in Microsoft’s strategy.
Designed to support both telephones and multimodal-enabled devices like
PCs, PDAs and Tablet PCs, the platform contains the SALT interpreter
software, SALT-enabled ASP.NET controls, a SALT-based voice browser and a
text-to-speech engine provided by strategic partner SpeechWorks.
Finally, building on the speech-centric strategic relationships it has
formed with SpeechWorks, Intervoice, and long-standing partner Intel, Microsoft unveiled the JDP, which aims to bring
together enterprise customers and partners looking to build and deploy
applications from the .NET Speech SDK on the .NET Speech platform.
JDP participants will receive access to the .NET Speech platform technical
preview, as well as testing in real-world production environments.
Microsoft believes the value proposition of speech technology is clear: it
stands to reduce costs associated with call center agents. A typical
customer service call costs $5 to $10 to support, while an automated voice
recognition system can lower that to 10 cents to 30 cents per call.
Additionally, voice recognition technology can be used to give employees
access to critical information while on the move.
Earlier this year, market research firm the Kelsey Group projected that
worldwide spending on voice recognition would reach $41 billion by 2005.
But Microsoft is by no means alone in the space. It is likely to face stiff
competition from IBM, a pioneer in the voice recognition
space. In April, IBM announced it had assigned
about 100 speech researchers from IBM Research to an eight-year project
dubbed the Super Human Speech Recognition Initiative, intended to
revolutionize voice technologies.
Currently IBM offers solutions based on VoiceXML and Java, and has helped
develop a new specification, X+V (a combination of XHTML and VoiceXML)
for multimodal access. For instance, it crafted a system for investment
management firm T. Rowe Price, which allows customers to access and manage
their accounts through natural conversations by utilizing IBM WebSphere
Voice Server with Natural Language Understanding.
Smaller, specialized players, like Mountain View, Calif.-based start-up
TuVox, are also in the space. TuVox, founded by two Apple Computer alumni,
uses a combination of artificial intelligence and VoiceXML to help firms
automate their technical support call centers. It has already automated
the after-hours technical support lines for both Handspring and
Activision.
But while the ball is already rolling in the voice recognition space, IBM
says there are still significant hurdles to overcome; those hurdles are
what spurred it to create the Super Human Speech Recognition Initiative.
Noise, punctuation and grammar, and accents all continue to pose problems
for speech recognition.