NEW YORK — World Wide Web creator Sir Tim Berners-Lee is challenging developers to do more with voice recognition systems and spur a sector that is ripe for innovation.
In town to deliver a keynote address at the SpeechTek conference,
Berners-Lee said end users demand a seamless experience when dealing
with voice-activated telephone systems. He also warned that frustration with voice recognition limitations could hurt the industry at a time when existing standards can help developers get past them.
Berners-Lee, who serves as a director of the World Wide Web
Consortium (W3C), said voice technology firms must find ways to interpret mumbles and mangled phrases accurately, and even to supply context for voice transactions.
“I’m a user. I call 1-800 numbers to get my washing machine fixed. It’s important to me that it works properly,” Berners-Lee said, recounting his own frustrations with a voice-activated system that did not recognize the word “yes.”
“Generally, I’m impressed with what voice technology could do but
when it can’t understand that I’m shouting ‘yes!’ into the telephone,
there are limitations. I eventually learned to say ‘yup’ and got my
appointment.”
He also suggested development work be centered around understanding
the context of certain voice commands, especially when using voice
technology to handle customer service queries.
Berners-Lee described voice recognition technology as a tough sector because of the inherent differences between natural languages and computer languages. “The natural language is soft, fuzzy and evolving
but computer languages are hard and clearly defined. Speech
technologies are trying to bridge the gap to help computers to figure
out what people are saying and that’s not an easy thing,” he said.
“Computer recognition has to be just as good as a human brain,” he
said, arguing that the sophisticated use of voice technology will be
driven by standards coming from the W3C.
He called on developers in the audience to get involved in the W3C’s
work on specifications within its Voice Browser and Multimodal Interaction activities.
Berners-Lee also highlighted work on the W3C Speech Interface
Framework, under which the Speech Synthesis Markup Language (SSML) 1.0
was recently published as a W3C Recommendation.
Practical use of SSML allows VoiceXML-based services to be accessed
via text phones for people with speaking or hearing impairments. It is
also aimed at helping software developers build applications for such
gadgets as mobile phones and personal digital assistants (PDAs). It
joins existing standards such as the W3C Recommendations VoiceXML 2.0
and Speech Recognition Grammar Specification (SRGS).
Berners-Lee said developers could expect W3C recommendations for
InkML and EMMA (Extensible MultiModal Annotation), both of which deal with speech
and ink recognition technologies.
He also said voice technologies could be used to drive enterprise
adoption of the Semantic Web, which treats the World Wide Web as one giant database that links human-readable documents and machine-readable data in a way useful to both people and machines.
Berners-Lee, one of the driving
forces behind the idea of giving data more meaning through the use
of metadata, said the difficulty of connecting voice
commands to existing back-end databases could stunt growth in the voice
technology space.
This is where the Semantic Web comes in, he argued, pointing out that
voice recognition technology will benefit when applications start
communicating with each other in a straightforward way.
“Talk to any CIO and they’ll tell you what the problem is. It’s the
stovepipe where one application handles one area of business and another
application does something else. And these applications aren’t talking
to each other. The problem of getting through the stovepipe is huge.”