Over the past year or so, voice interaction has gone from a novelty to something that many of us have come to rely on in our daily lives. From asking our phones to perform a search to telling our Amazon Echo or Google Home to play our favourite album, we are increasingly using our voices to get data or perform an action that previously would have required touching or typing.

Ian Massingham, technical and developer evangelism lead, Amazon Web Services

So far, this has largely been confined to leisure or personal life management, with our work lives remaining relatively untouched. However, that is beginning to change, with voice technology starting to find its way into offices and other workplaces. And as the underlying technologies advance, their use in enterprise is only going to increase.

“I really think that there's a use case for this in many, many enterprises, many, many customer service organisations, either in the physical context, or in the digital context,” says Ian Massingham, technical and developer evangelism lead at Amazon Web Services (AWS).

“I think you'll see that these applications become pretty ubiquitous over time.”

The technologies driving the voice interaction revolution

Rudimentary voice interaction has been available for years, but its recent rise has been driven by significant advances in a host of the underlying technologies.

Machine learning, which Massingham describes as “the process of training a model with a dataset to perform predictions or to create predictive analytics capabilities”, has allowed advances, for example, in the conversion of audio to text. However, while clearly a vital part of voice interaction, conversion is only one small part of a far more complex puzzle.

Arguably far more significant to the voice interaction boom are the vast advances in understanding human language, so that instead of barking set commands that a machine can understand, we can speak naturally and trust that the technology will be able to work out what we mean.

“I think the biggest step forward that we've seen has been in NLU: natural language understanding,” says Massingham. “There's quite a big NLU challenge when you consider something like the Echo, or even chatbots that might be deployed inside the enterprise.

“It's a wide problem; there are a lot of different questions which can be asked, a lot of different intents that interactors – users of the service – may have, and it can be challenging for individual organisations to build systems that are capable of resolving those intents.”

The primary issue is that in normal human language, there are a multitude of ways in which the same request can be made.

“Let's just say I have a travel booking bot that is associated with booking flights,” he says. “How many different ways do you think there are for me to say that I want to fly from Manchester to Edinburgh? How many different ways might I enrich that request with data which is pertinent to fulfilling the request?

“You know, 'I want to fly from Manchester to Edinburgh on Thursday with FlyBe' or 'I want to fly to Edinburgh from Manchester'. There are so many different permutations.”

Resolving this issue has been achieved not with one technology, but many, each of which has been developed and built upon.

These include keyphrase extraction, where the pertinent words and phrases within a sentence are identified, intent resolution, which involves determining what a person wants to achieve from their request, and entity recognition, which involves determining what category a word or phrase belongs to, be it a company name, airport, product or something entirely different.
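To make intent resolution and entity recognition concrete, here is a deliberately tiny sketch built around the flight-booking example above. It is an illustration only: the word lists, the `BookFlight` intent name and the `parse_utterance` function are all hypothetical, and it relies on hand-written rules rather than the trained models that production services use.

```python
import re

# Hypothetical, hand-written word lists; a real system would learn these from data.
CITIES = {"manchester", "edinburgh"}
AIRLINES = {"flybe", "ba"}
DAYS = {"monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday", "tomorrow"}


def parse_utterance(text):
    """Return the resolved intent and any recognised entities for one request."""
    tokens = re.findall(r"[a-z]+", text.lower())
    result = {"intent": None, "origin": None, "destination": None,
              "airline": None, "day": None}

    # Intent resolution: any mention of flying maps to a single intent.
    if any(tok in {"fly", "flight", "flights"} for tok in tokens):
        result["intent"] = "BookFlight"

    # Entity recognition: decide which category each token belongs to.
    # The preceding word tells us whether a city is the origin or destination.
    for prev, tok in zip([""] + tokens, tokens):
        if tok in CITIES:
            if prev == "from":
                result["origin"] = tok
            elif prev == "to":
                result["destination"] = tok
        elif tok in AIRLINES:
            result["airline"] = tok
        elif tok in DAYS:
            result["day"] = tok
    return result


first = parse_utterance(
    "I want to fly from Manchester to Edinburgh on Thursday with FlyBe")
second = parse_utterance("I want to fly to Edinburgh from Manchester")
```

Both phrasings resolve to the same intent, origin and destination, even though the cities appear in a different order. The first also carries the optional day and airline. Covering every possible permutation with rules like these quickly becomes unmanageable, which is why learned language understanding matters.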

Such technologies are invaluable tools for a host of applications, which has allowed AWS to offer them as the basis of numerous products, including Amazon Lex, “an engine and automatic speech recognition and natural language understanding engine that you can use to build chatbots” and Amazon Comprehend, which “allows you to send text to the service and receive assessments of key phrases”.

Together with numerous other technologies, they are producing products that businesses can use not only to improve the customer experience, but also to enhance their own internal operations.

Voice interaction in the workplace: from conferences to HR 

While Amazon’s flagship voice product, the Amazon Echo, has largely been confined to the home environment, it and its personal assistant, Alexa, are now coming to the world of business.

“At our large user conference last year, AWS re:Invent, we announced something called Alexa for Business. Now this is only available in the US right now, and we anticipate that we will extend that to other regions in due course,” says Massingham. “Alexa for Business enables enterprises or other large-scale users of technology to deploy the Amazon Echo within their environment.”

This is significant because it allows large numbers of Amazon Echos to be rapidly set up and deployed to a specific environment, rather than taking each unit through the five- to ten-minute provisioning process one after the other, which makes the technology practical to use at scale.

“It also supports automation of common office infrastructure,” adds Massingham. “I'm talking here about conference rooms, video conferencing systems and the device itself will also act as a conference phone, so you could put an Amazon Echo into a conference room rather than a speaker phone and use that to join conference calls to drive the video conferencing system which is present within the room. There are quite a lot of features there.”

This creates the opportunity for businesses to use the technology in ways that Massingham predicts will significantly improve the way office workers perform basic tasks.

“I think it will have a significant impact on the average office worker,” he says. “If you've got a conference room and you walk into it and you just say 'start my video conference' and then the voice interface looks back into your Exchange calendar, figures out what's in your diary next, grabs the attributes that are required to initiate the call and then all of a sudden you're in the call on the video screen in the room, that's a much better user experience than fiddling around with pin codes.

“So it's all on the simple accessibility of the user experience for the end user. I don't really think the end user will know or care who is providing the technology, what they will see is dramatic uptick in the quality of the interfaces, the simplicity of the interfaces and the accessibility of the interfaces that they interact with as part of their work and or as part of their everyday lives.”

Such natural language technology is also enabling the automation of internal processes, although at this stage through text. One place where this is already happening is the development centres of US insurer Liberty Mutual in Dublin, Ireland, and Belfast, UK.

“They have chatbot technology integrated with their employee information system, so you can go on to the intranet and ask questions of an intelligent agent via text, things like 'how do I make a change to my pension contributions' or 'how much leave do I have left this year',” explains Massingham.

“Their HR self-service systems have been exposed through their internal users via a chat interface, so you don't have to know where to find the information, you just ask natural language questions via text and the chatbot will answer where it can, or HR service centre employees will answer where the chatbot can't.”

Such technology has immense potential to make workplace interactions easier, but when combined with voice, it also has the potential to improve access for employees with disabilities.

“It makes these systems much more accessible to individuals that might historically have been excluded or marginalised from their use. I'm talking about people that have physical disabilities, maybe they're visually impaired, or those can't get around, maybe they don't have the skills to use a keyboard,” he says. “Well they can now, today, access information systems and applications that might have been off-limits to them previously.

“We get a lot of interest from charities and other organisations that are working with individuals that have traditionally been less able to access information systems and applications because of physical impairment or accessibility issues that they have.”

Beyond the office: the wide world of voice interaction for business

Offices aren’t the only area where voice has significant potential for business. In hospitality, it is already being used to add to the guest experience.

“Say you're a hotel like the Wynn, for example, in Las Vegas. They have Amazon Echo devices in every one of their guest suites, and you can literally walk into a room and say 'Alexa, open the curtains' or 'Alexa, turn the lights' and it's used for environment automation there,” says Massingham, adding that the Alexa for Business service has dramatically improved deployment in these settings.

There are also many workplace settings where voice interaction is of great benefit, particularly in roles where stopping to put down items and check a computer delays processes.

“It's also being used in retail use cases in the US, with one customer who's using it for back office automation in the retail environment,” he explains.

“Managers of retail stores can use voice to check stock levels, to reorder products, to understand how they're performing against KPIs and targets that they may have assigned to them, and they can do all of that with voice without having to log into a machine or even necessarily walk over to the device.

“They can have it in a stock room or back office in the retail environment and interact very naturally with the device with their information systems using voice.”

Then there are contact centres, one of the earliest adopters of customer-facing enterprise AI, which already use the technology to automate parts of the customer interaction process, many through Amazon’s own contact centre product, Amazon Connect.

Here, a voice interface is able to perform tasks for a host of companies, with Massingham giving the case of a power provider handling customer enquiries during a weather-related blackout.

“[The platform is able] to receive calls from customers, to ask customers what they need help with, to understand when they want to report an outage or want to know when their water is going to be turned back on,” he explains.

“To integrate that with data sources within the enterprise that might hold information about service restoration, or be able to record outages and put them into a workflow queue for engineers to deal with then to generate speech back to customer, which says: yes we understand that you currently have no power, we'll dispatch an engineer out to you. You should expect your supply to be back on within the next six hours, eight hours, whatever that number is that comes back from the information system at the backend.

“It can be used to automate customer service workflows, it can be used to provide service which is more immediate, and actually it can be used to allow the human beings that are working in these contact centre environments to focus on the more complex tasks, which do genuinely demand human intelligence and can't be fulfilled through an automated agent like this.”
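The flow Massingham describes (understand the request, consult a back-end system, queue work for engineers, then speak a reply) can be sketched in a few lines. This is a toy under stated assumptions, not how Amazon Connect is implemented: the intent names, the `RESTORATION_HOURS` lookup table, the `engineer_queue` and the `handle_enquiry` function are all hypothetical stand-ins for a real language understanding service and real enterprise systems.

```python
from collections import deque

# Hypothetical stand-ins for enterprise systems.
RESTORATION_HOURS = {"M1": 6, "EH1": 8}  # area code -> estimated hours to restore
engineer_queue = deque()                 # outages waiting for an engineer


def handle_enquiry(intent, area):
    """Return the sentence a text-to-speech engine would read back to the caller."""
    if intent == "ReportOutage":
        # Record the outage in the workflow queue for engineers.
        engineer_queue.append(area)
        hours = RESTORATION_HOURS.get(area)
        if hours is None:
            return "We have logged your outage and will dispatch an engineer to you."
        return ("We understand that you currently have no power. "
                "We'll dispatch an engineer out to you. "
                f"You should expect your supply to be back on within the next {hours} hours.")
    if intent == "RestorationTime":
        hours = RESTORATION_HOURS.get(area)
        if hours is not None:
            return f"Your supply should be back on within the next {hours} hours."
    # Anything the automated agent cannot resolve is handed to a person.
    return "Let me put you through to one of our agents."


reply = handle_enquiry("ReportOutage", "M1")
```

The final fallback line reflects the division of labour in the quote above: routine enquiries are answered automatically, and everything else is passed to a human agent.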

Driving the voice boom with accessible machine learning

Of course, in many of these situations the voice interaction system has to be plugged into existing company databases or systems, and while there are many enterprise-focused tools offered by AWS and competing companies to resolve this, in some unusual cases a company may need to build its own solution.

Previously this would have been immensely challenging: machine learning and artificial intelligence technologies are notoriously difficult in comparison to many other programming tasks, particularly for developers who have not previously tackled them.

However, AWS has a product for this too, in the form of Amazon SageMaker.

“We announced this last year, and this is a complete environment for software developers and researchers that want to build and train machine learning models whilst abstracting away a lot of the complexity and heavy lifting,” explains Massingham.

“Those really simplify the process for developers that want to get hold of the tools for machine learning without knowing that much about how the tools are built.”

As a result, this type of technology is now considerably more accessible than it previously has been, paving the way for wider adoption and use.

“At AWS we often say there's never been a better time to build apps; we seem to say that continuously. But I do actually believe that: I think there's never been a better time to embark on a project that involves AI and machine learning than there is today,” he says.

“There's a lot of rich functionality out there at the platform level as well as at the abstracted services level that just lets developers get started much more quickly, and see their results much more quickly and iterate much more quickly, and therefore build better products, build more effective products.”

Voice interaction: the ubiquitous technology of tomorrow

Given the potential of voice interaction for businesses, how widespread does Massingham anticipate the technology being within a decade?

“I think it's very difficult to predict what will happen,” he says. “I think you'll find that voice interfaces become ubiquitous, but I also think you'll find that there is a minority of the population that never really get comfortable with voice. That's my own expectation.”

However, despite some members of the population remaining resistant, he anticipates natural language interfaces becoming the standard for many interactions.

“I think that chat-based interfaces, the use of freeform text as an interface, I think that will be ubiquitous,” he says. “Everybody uses messaging applications and it's very simple to integrate those applications with intelligent agents so that you can interact with brands or services using natural language. So as a proportion of overall interaction there will be a lot less structured interaction with applications and data sources.

“In my view a lot more usage of 'I want to book a flight from Manchester to Edinburgh tomorrow at three'. 'Ok, do you want to fly with BA or FlyBe?' 'I want to fly with BA'. 'The cost is £150. Do you want me to book it?' 'Yes'.

“You've never actually interacted with a human being there, it's natural language interfaces to information systems. I think that will become very, very prevalent.”
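The exchange in that example follows a pattern often called slot filling: the agent keeps asking until it has every detail it needs, then confirms. A minimal sketch of that loop's core decision is below. The slot names, prompts and the `next_prompt` function are hypothetical, and the price lookup is passed in as a plain function because the article does not say where the figure comes from.

```python
# Hypothetical slots the booking agent needs before it can quote a price.
REQUIRED_SLOTS = ["origin", "destination", "time", "airline"]

PROMPTS = {
    "origin": "Where do you want to fly from?",
    "destination": "Where do you want to fly to?",
    "time": "When do you want to fly?",
    "airline": "Do you want to fly with BA or FlyBe?",
}


def next_prompt(slots, quote_price):
    """Ask for the first missing detail, or confirm once everything is known."""
    for name in REQUIRED_SLOTS:
        if not slots.get(name):
            return PROMPTS[name]
    price = quote_price(slots)
    return f"The cost is £{price}. Do you want me to book it?"


# 'I want to book a flight from Manchester to Edinburgh tomorrow at three'
slots = {"origin": "Manchester", "destination": "Edinburgh",
         "time": "tomorrow at three"}
question = next_prompt(slots, lambda s: 150)

# 'I want to fly with BA'
slots["airline"] = "BA"
confirmation = next_prompt(slots, lambda s: 150)
```

In this sketch the first call asks which airline to use, because that is the only missing detail, and the second call moves on to the price confirmation.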

