Microsoft on Wednesday used the annual SpeechTEK International Exposition and Educational Conference as a platform to unwrap the beta 2 release of the .NET Speech Software Development Kit (SDK), a technical preview of the .NET Speech platform, and a new Joint Development Program (JDP).
The SDK is a developer tool based on the Speech Application Language Tags (SALT) specification, which defines a set of lightweight tags as extensions to common Web-based programming languages, allowing developers to add speech functionality to existing Web applications. The SDK is designed to integrate with the Visual Studio .NET development environment, and will allow developers to write combined speech and visual Web applications in a single code base.
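To illustrate the approach the article describes, the sketch below shows how SALT-style tags might sit alongside ordinary HTML in a Web page. The element names (`prompt`, `listen`, `grammar`, `bind`) follow the published SALT specification, but this fragment is an illustrative sketch, not code taken from Microsoft's SDK; the grammar file and field names are hypothetical.

```
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
<body>
  <!-- An ordinary visual form field -->
  <input name="txtCity" type="text" />

  <!-- Speech added via SALT tags: play a prompt, then listen -->
  <salt:prompt id="askCity">Which city are you flying to?</salt:prompt>
  <salt:listen id="getCity">
    <salt:grammar src="cities.grxml" />        <!-- hypothetical grammar file -->
    <salt:bind targetelement="txtCity" value="//city" />
  </salt:listen>
</body>
</html>
```

Because the speech tags wrap around existing markup rather than replacing it, the same page can serve both visual and voice interaction, which is the "single code base" point made above.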
Microsoft unveiled the first beta of the .NET Speech SDK in May. The beta 2 release adds enhancements to grammar and prompt creation, editing tools, and debugging tools for telephony and multimodal applications. The new features include W3C standards-compatible formats for grammar authoring, a prebuilt library of reusable speech telephony and application controls, and grammar libraries.
To add to the momentum, Microsoft showed off a technical preview of the .NET Speech platform -- slated for release in the middle of 2003. The platform is the core speech recognition engine in Microsoft's strategy. Designed to support both telephones and multimodal-enabled devices like PCs, PDAs and Tablet PCs, the platform contains the SALT interpreter software, SALT-enabled ASP.NET controls, a SALT-based voice browser and a text-to-speech engine provided by strategic partner SpeechWorks.
Finally, building on the speech-centric strategic relationships it has formed with SpeechWorks, Intervoice, and long-standing partner Intel, Microsoft unveiled the JDP, which aims to bring together enterprise customers and partners looking to build and deploy applications from the .NET Speech SDK on the .NET Speech platform.
JDP participants will receive access to the .NET Speech platform technical preview, as well as the opportunity to test it in real-world production environments.
Microsoft believes the value proposition of speech technology is clear: it stands to reduce costs associated with call center agents. A typical customer service call costs $5 to $10 to support, while an automated voice recognition system can lower that to 10 cents to 30 cents per call. Additionally, voice recognition technology can be used to give employees access to critical information while on the move.
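The savings implied by those per-call figures can be worked through with a quick calculation. The call volume here is an assumed figure for illustration only; the per-call costs are the ones cited above.

```python
# Per-call costs cited in the article, in dollars.
AGENT_COST_LOW, AGENT_COST_HIGH = 5.00, 10.00   # live call-center agent
AUTO_COST_LOW, AUTO_COST_HIGH = 0.10, 0.30      # automated voice recognition

calls_per_year = 1_000_000  # hypothetical call volume, for illustration

# Best case: cheapest automation replaces the most expensive agent calls.
max_savings = (AGENT_COST_HIGH - AUTO_COST_LOW) * calls_per_year
# Worst case: priciest automation replaces the cheapest agent calls.
min_savings = (AGENT_COST_LOW - AUTO_COST_HIGH) * calls_per_year

print(f"Annual savings range: ${min_savings:,.0f} to ${max_savings:,.0f}")
```

Even at the conservative end, automation would shave millions of dollars a year off a call center of that size, which is the economic argument Microsoft is making.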
Earlier this year, market research firm the Kelsey Group projected worldwide spending on voice recognition will reach $41 billion by 2005.
But Microsoft is by no means alone in the space. It is likely to face stiff competition from IBM, a pioneer in the voice recognition space. In April, IBM announced it had assigned about 100 speech researchers from IBM Research to an eight-year project dubbed the Super Human Speech Recognition Initiative, intended to revolutionize voice technologies.
Currently IBM offers solutions based on VoiceXML and Java, and has helped develop a new specification, X+V (a combination of XHTML and VoiceXML) for multimodal access. For instance, it crafted a system for investment management firm T. Rowe Price, which allows customers to access and manage their accounts through natural conversations by utilizing IBM WebSphere Voice Server with Natural Language Understanding.
Smaller, specialized players, like Mountain View, Calif.-based start-up TuVox, are also in the space. TuVox, founded by two Apple Computer alumni, uses a combination of artificial intelligence and VoiceXML to help firms automate their technical support call centers. It has already automated the after-hours technical support lines for both Handspring and Activision.
But while the ball is already rolling in the voice recognition space, IBM says there are still significant hurdles to overcome, and those hurdles are what spurred it to create the Super Human Speech Recognition Initiative.
Noise, punctuation and grammar, and accents all continue to pose problems for speech recognition.