Seeing as Siri is active on more than half a billion devices, you have likely spoken this phrase today. Or perhaps you regularly use other voice assistants, such as Google Home or Alexa, that have flooded today’s consumer market. Each one of these devices is constantly being enhanced to better emulate a real, conversing human.
But how is it possible for technological systems to communicate with the human-like charisma they aim to have? Continue reading to find out how Siri, Alexa, and other voice assistants work!
Natural Language Processing (NLP)
Voice assistants are powered by a branch of artificial intelligence (AI) called Natural Language Processing. ‘Natural Language’ is a fancy term that refers to languages humans use with diverse sets of vocabulary, words with multiple meanings, unique accents, and wordplay. NLP helps voice assistants understand and respond to the complexities of our languages.
There are two key parts to NLP:
- Natural Language Understanding
- Natural Language Generation
Natural Language Understanding (NLU)
NLU is the process of comprehending what the user said, and what their intent was. It’s easier for a computer to understand a language when it’s broken up into pieces, and the following components help the computer understand each of the pieces in a unique way:
- Tokenization – breaks up the sentence into individual words
- Part-of-speech tagging – identifies words by their part of speech
- Lemmatization/Stemming – reduces words to their ‘base format’: lemmatization returns the dictionary form, while stemming simply strips word endings
- Semantic labeling – allows the algorithm to understand the role of a word in a sentence (e.g. as an agent, goal, result, etc.)
- Phrase structure rules – allow the computer to understand grammar
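To make the first three components concrete, here is a minimal sketch in plain Python. It is a toy, not how Siri or Alexa actually work: the `TOY_POS` lexicon, the function names, and the suffix list are all assumptions made up for illustration, and real systems rely on trained statistical models rather than hand-written lookups.

```python
import re

# Hypothetical, hand-written part-of-speech lexicon (illustration only).
TOY_POS = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "cats": "NOUN", "dog": "NOUN", "dogs": "NOUN",
    "is": "VERB", "are": "VERB", "chasing": "VERB", "chased": "VERB",
}

def tokenize(sentence):
    """Tokenization: split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def tag(tokens):
    """Part-of-speech tagging: look each token up; unknown words get 'UNK'."""
    return [(tok, TOY_POS.get(tok, "UNK")) for tok in tokens]

def stem(token):
    """Stemming: naive suffix stripping (not a real stemming algorithm)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

tokens = tokenize("The cats are chasing the dog.")
print(tokens)
print(tag(tokens))
print([stem(t) for t in tokens])
```

Note that the naive stemmer turns "chasing" into "chas", which is not a dictionary word. That is the practical difference between stemming and lemmatization: a lemmatizer would return "chase".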
However, to make a fully functioning voice assistant, the system has to be able to respond. That’s where the next component comes in…
Natural Language Generation (NLG)
NLG is the process of formulating an intelligent and conversational response. To do this, the system draws on structured data.
Structured data is information an algorithm must retrieve from other sources to answer a user’s query. For instance, if a user asks how long it’ll take to get to a location, it’s possible that the data to answer the question will be pulled from Google Maps.
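The simplest way to picture this step is a template that gets filled in with retrieved data. The sketch below assumes a hypothetical `lookup_travel_time` function standing in for a real mapping service, and the route numbers are invented. Production assistants use far more sophisticated generation than a fixed template.

```python
def lookup_travel_time(destination):
    """Stand-in for a call to an external mapping service (made-up data)."""
    fake_routes = {"airport": {"minutes": 25, "distance_km": 18.0}}
    return fake_routes.get(destination)

def generate_response(destination):
    """Turn a structured record into a conversational sentence."""
    route = lookup_travel_time(destination)
    if route is None:
        # No structured data available, so fall back to an apology.
        return f"Sorry, I couldn't find a route to the {destination}."
    return (
        f"It will take about {route['minutes']} minutes "
        f"to drive {route['distance_km']:.0f} km to the {destination}."
    )

print(generate_response("airport"))
# prints: It will take about 25 minutes to drive 18 km to the airport.
```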
Recurrent Neural Networks (RNNs) are one type of neural network commonly used for NLG. The output of this kind of network feeds back into itself, so the network works through a sentence one small step at a time while carrying forward information from earlier steps. This gives the device a “good memory”, which is essential for carrying on a conversation.
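The feedback loop can be sketched with a single recurrent unit. This is a deliberately tiny illustration: the weights `w_x`, `w_h`, and `b` are arbitrary values chosen for the example, whereas a real network has many units and learns its weights from data.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: the new state depends on the input AND the old state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed inputs one at a time, passing the hidden state back in each step."""
    h = 0.0  # the "memory" starts out empty
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states stay above zero:
# the network still "remembers" what it saw at the first step.
print(run_rnn([1.0, 0.0, 0.0]))
```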
After a voice assistant has generated an intelligent response, the response is delivered to the user through a process called speech synthesis. The system takes the response generated by the NLG algorithm and breaks it down into its phonetic components. Then, the voice assistant plays the response to the user through a speaker, adding human-like voice quality with pitch changes.
Even though the first voice assistants were released nearly a decade ago, some hurdles in NLP technology still haven’t been overcome. There are two main challenges in the NLU process:
- Lexical ambiguity
- Syntactic ambiguity
Lexical ambiguity is when a word has multiple meanings. There are tons of words in the English language with more than one definition, so it can sometimes be difficult for the NLU system to interpret the user’s intent.
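One classic way to tackle lexical ambiguity is to compare the words around the ambiguous word with clue words for each possible meaning. The sketch below is a heavily simplified take on that idea. The `SENSES` table and its clue words are invented for this example, and nothing here is claimed to be what any particular voice assistant does.

```python
# Hypothetical sense inventory: each meaning comes with a few clue words.
SENSES = {
    "bank": {
        "financial institution": {"money", "account", "deposit", "loan"},
        "river edge": {"river", "water", "fishing", "shore"},
    }
}

def disambiguate(word, sentence):
    """Pick the sense whose clue words overlap most with the sentence."""
    context = set(sentence.lower().split())
    senses = SENSES[word]
    # On a tie, max() keeps the first sense listed in the table.
    return max(senses, key=lambda sense: len(senses[sense] & context))

print(disambiguate("bank", "i need to deposit money at the bank"))
# prints: financial institution
print(disambiguate("bank", "we went fishing by the river bank"))
# prints: river edge
```

When the sentence contains no clue words at all, the function simply falls back to the first listed sense, which shows why short voice commands are especially hard to interpret.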
Syntactic ambiguity is when there’s more than one way to interpret a sentence. Take this sentence as an example: “Brian heard the cat with one ear.” This sentence can mean either that the cat has one ear, or that Brian heard the cat using one ear. Sentences like these can baffle the system, which may result in a confusing response.
- Voice assistants are powered by a branch of artificial intelligence called Natural Language Processing.
- NLP helps computers understand human languages by combining computer science and linguistics.
- There are two key parts to NLP – Natural Language Understanding and Natural Language Generation.
- NLU is the process of comprehending what the user said and what their intent was.
- NLG is the process of formulating an intelligent and conversational response.
- Speech synthesis is the process that a voice assistant uses to respond to the user.
- The two main challenges in the NLU process are lexical ambiguity and syntactic ambiguity.
As NLP technology improves, voice assistants are becoming better at understanding and interpreting their human users.