Conversational AI is closer to the artificial intelligence we have long been promised but have yet to see in finished products. The closest thing we have today is digital assistants like the Amazon Echo, which is mostly just a good speech-to-text engine sitting on top of a search engine. A conversational AI, by contrast, promises not only an AI that can hold a conversation with you, but one that can display non-verbal cues, like facial expressions, as well.
At NVIDIA’s GTC this week, CEO Jensen Huang showcased the latest iteration of Jarvis, the company’s conversational AI engine, and it could transform how our technology interacts with us. Done right, this kind of AI interaction at scale could vastly improve automated sales close rates, increase customer satisfaction with automated help systems, and even allow robots to become customer-facing interfaces at physical venues. That last point is essential because, right now, with COVID-19, customer-facing employees may be the ones at greatest risk.
Even more interesting is that this technology is on the path to replacing physical talking heads. You could, in theory, use it so that a long-dead CEO could still give keynote speeches at company events. Or, recalling Ronald McDonald and Jack from the Jack In The Box commercials, you could create a virtual spokesperson that could scale to talk to millions, if not billions, of customers.
The Importance Of Non-Verbal Cues
When we communicate in person, we don’t just speak. We use facial expressions to emphasize and contextualize what we are saying. The same sentence, “you look good,” can be a compliment or a sarcastic critique depending on intonation and expression.
One of the things that makes interfacing with a computer less efficient than interfacing with a person is that computers are emotionally barren: they currently cannot use the full set of tools a human uses for complex expression and communication.
So an AI that can make fuller use of physical expressions should, if the technology is applied effectively, communicate better and build a bond with the human interfacing with it. As we move to more verbal interfaces, a trend the pandemic is now starting to drive, we’ll need those interfaces to be far more capable than they are today, and giving them the ability to converse and emote would go a long way toward getting that done.
The Importance Of Scale
I once had a conversation with a very frustrated Steve Ballmer at Microsoft. Steve indicated that, given the massive number of customers the firm had, it was difficult for him to take direction from them, let alone keep them all straight. He argued that he couldn’t be customer-driven because there were so many customers with such diverse needs that he couldn’t translate those needs into actions, even though he agreed Microsoft needed to be more customer-focused.
What an AI can do that a human can’t is scale. We humans are limited in the number of people we can collect data from, in our ability to retain that data unaltered, and in our ability to turn that data into information that genuinely reflects our market.
And when backed with in-depth customer data, an AI can better understand what is likely to trigger someone to buy and how best to deal with them if they are upset, and, drawing on social media, it may even know a great deal about what the customer is dealing with personally.
AIs don’t get mad, don’t pull pranks, don’t have substance abuse problems, don’t make inappropriate comments (unless they are improperly trained), and don’t get tired. That scale would allow a conversational AI to interact with every customer a company has, whether just to keep those customers informed or to help them through a problem.
For instance, this week I got a notice that UPS had delivered a package meant for me to someplace on the other side of the county. I tried to reach someone at UPS four times: I got cut off, got busy signals, and was finally told the UPS line was disconnected when I attempted to transfer from their scripted bot to a real person.
Now, Jarvis isn’t itself a conversational AI; it is part of a toolset that allows you to create one. As it would likely be implemented, it would be layered on top of something like NVIDIA’s Merlin application framework for deep learning recommender systems, which was also announced this week. You would also use NeMo, an open-source toolkit for building conversational AI models; Megatron-BERT, which improves reading comprehension, enhancing response accuracy; TensorRT 7.1, which speeds up AI inference; and Flowtron, a state-of-the-art speech synthesis model that allows the system to talk and emote convincingly.
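To make the layering concrete, here is a minimal sketch of how those pieces could compose into one pipeline: speech recognition feeds language understanding, which feeds a recommender that picks both the words and the non-verbal cue, which speech synthesis would then voice. All class and method names below are illustrative placeholders, not the actual Jarvis, NeMo, or Merlin APIs.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    text: str
    expression: str  # non-verbal cue the avatar would render


class ConversationalPipeline:
    """Toy illustration of the layered stack described above."""

    def transcribe(self, audio: bytes) -> str:
        # A real system would run a speech-to-text model here
        # (e.g. one built with NeMo). We just pretend the audio
        # bytes are already text.
        return audio.decode("utf-8")

    def understand(self, text: str) -> str:
        # A reading-comprehension model (the article cites
        # Megatron-BERT) would map the utterance to an intent.
        return "complaint" if "lost" in text or "late" in text else "question"

    def recommend(self, intent: str) -> Reply:
        # A recommender layer (the article cites Merlin) would pick
        # the best response and the matching facial expression.
        if intent == "complaint":
            return Reply("I'm sorry. Let me track that package.", "concerned")
        return Reply("Happy to help. What do you need?", "friendly")

    def respond(self, audio: bytes) -> Reply:
        # Speech synthesis (the article cites Flowtron) would voice
        # the returned text with the chosen emotion.
        return self.recommend(self.understand(self.transcribe(audio)))


pipeline = ConversationalPipeline()
reply = pipeline.respond(b"my package is lost")
print(reply.text, "/", reply.expression)
```

The point of the sketch is the composition, not the components: each layer can be swapped for a stronger model without changing the pipeline’s shape, which is exactly what makes a toolkit approach like Jarvis attractive.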
The combination of these technologies, plus future development efforts, should result in a conversational AI that revolutionizes how we interact with computers en masse. The result should be a massive increase in the use of AIs in customer-facing roles and a strengthened ability to safely provide what seems like in-person support at a time when many of us can’t leave the house.
It is potentially one of the bigger game changers announced at NVIDIA’s GTC this year.
Right now, the future of computing is evolving toward an interactive speech interface. With Jarvis and Merlin, NVIDIA showcased a unique and powerful solution this week that could result in a digital assistant that is more like its namesake and less like a verbal front end to web search. This technology is also one path to digital immortality: such an AI could learn to look and act like you over time and, once you are gone, continue to interface with your loved ones.
But with the COVID-19 event, the need to replace humans who interface with lots of people during a workday with a system that can’t get sick has never been greater. Jarvis and Merlin, along with the other technologies noted, could bridge that gap and take us to these new speech-based interfaces far more quickly.