Today’s voice assistants can already tackle basic administrative
chores, such as transcribing calls or scheduling meetings, and even
some higher‑level tasks, such as monitoring phone calls to identify
high‑potential sales leads. Reaching even this basic level of accuracy
and ability has taken decades of research. In part that’s because
computers have historically struggled to parse human speech, which is
freeform, creative, and full of idiosyncrasies.
Progress in recent years has come from machine learning, which
involves feeding machines enormous amounts of speech data and teaching
them to recognize patterns on their own. In 2017, Google CEO Sundar
Pichai announced that the company’s voice recognition technology had
reached 95% accuracy—a 20% improvement since 2013.
Andrew Ng, the former chief scientist at Chinese tech giant Baidu,
has predicted that voice assistants will become ubiquitous in the
workplace once they reach 99% accuracy. That
last mile will be challenging. Today’s voice assistants often struggle
to identify names from unfamiliar ethnic groups, or even pop
songs with “foreign” titles.
Currently, you can string together at most two commands on Google
Home (for example, “Play Spotify and set volume to 10”). Google’s AI
still fails at traffic updates and other combined commands. And
computers still don’t speak entirely naturally: you probably won’t
mistake Alexa for your friend or coworker.
“The technology is moving very fast,” says Joshua Montgomery, CEO of
Mycroft, a startup that is creating an
open‑source equivalent to Amazon’s Alexa. That’s because of massive
investments in smart speakers, improved voice functions for phones and
cars, more advanced chatbots, and so on. Mycroft has raised about $3
million in venture capital and another $800,000 in preorders on Kickstarter and Indiegogo to get its
voice assistant off the ground.
At the other end of the market, Amazon and Microsoft have formed an
intriguing alliance aimed at the workplace. Alexa and Cortana
(Microsoft’s digital assistant) already share reciprocal features;
each can be used to interact with the other platform. Both can perform
basic tasks like setting meetings, managing appointments, and sending
emails. And both work with Office 365, Microsoft’s suite of
productivity apps.
Integrations are still fairly basic, but it’s easy to imagine a
future where Cortana could, for instance, tap into the automated
“Insights” functions of Excel so users can take a quick hit of data
analysis without opening a spreadsheet. Other advances will likely
come from overseas. Last year, Chinese web giant Baidu announced
DuerOS, a proprietary conversational platform with more than
100 partner brands, including HTC and Nvidia.
“We’re seeing a virtuous cycle where the technology is accelerating
because so many people are working on it,” says Buzzanga. “Five years
from now, will we still have today’s Microsoft and Google applications
with voice bolted on? I don’t know, but it’s not what I’m looking for.
I think it will be something more radical.”