Natural language understanding, also known as NLU, is a term that refers to how computers understand language spoken and written by people. Yes, that’s almost tautological, but it’s worth stating, because while the architecture of NLU is complex, and the results can be magical, the underlying goal of NLU is very clear.
For example, there are an estimated 320 billion emails sent every day. That is a lot of natural language created and consumed, and if computers can better understand it, it can help the people who are interacting with those emails. NLU can determine whether an email is spam, if an email is high priority, or if there are other, related, emails to share with the recipient. All of these efforts help people get the most out of email.
Of course, there’s also the ever-present question of what the difference is between natural language understanding and natural language processing, or NLP. The answer, again, is in the name. Natural language processing is about processing natural language, or taking text and transforming it into pieces that are easier for computers to use. Some common NLP tasks are removing stop words, segmenting words, or splitting compound words. NLP can also identify parts of speech, or important entities within text. Natural language understanding builds on that processing to work out what the text actually means.
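To make two of those NLP tasks concrete, here is a minimal sketch of word segmentation and stop-word removal in plain Python. The function names and the tiny stop-word list are assumptions made for illustration; production systems use language-specific tokenizers and much longer lists.

```python
import re

# A tiny, hand-picked stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "in", "is", "it", "of", "on", "the", "to", "was"}


def segment_words(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


tokens = segment_words("The man was wearing a hat on his head.")
content_words = remove_stop_words(tokens)
print(tokens)
print(content_words)
```

Notice that nothing here involves meaning: the code only reshapes the text into pieces that later steps can work with.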
Getting back to the uses of natural language understanding, email is only a small percentage of what NLU can do. Anything you can think of where you could benefit from understanding what natural language is communicating is likely a domain for NLU.
Natural language understanding is complicated, and seems like magic, because natural language is complicated. Language packs a lot of information in a small amount of space. A clear example of this is the sentence “the trophy would not fit in the brown suitcase because it was too big.” You probably understood immediately what was too big, but this is really difficult for a computer.
We can’t simply write a program that checks for the phrase “was too big” and concludes that the phrase refers to the first item. First, because the phrase might instead be “was too large” or “was too heavy” or “is too big.” Second, because there are formulations where that “rule” falls flat, such as “the brown suitcase would not fit the trophy because it was too big.” There are phrasings that might even be confusing to people, such as “I didn’t bring the trophy in the brown suitcase because it was too big.” Was the trophy too big for the suitcase, or was the suitcase too big to bring?
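The brittleness of that kind of rule is easy to see in code. The sketch below is a deliberately naive, hypothetical rule written for this article (`naive_referent` and `KNOWN_ITEMS` are made-up names, not part of any library): if the sentence contains “was too big,” it guesses that “it” is whichever known item is mentioned first.

```python
from typing import Optional

# Hypothetical vocabulary of items the rule knows about.
KNOWN_ITEMS = ("trophy", "suitcase")


def naive_referent(sentence: str) -> Optional[str]:
    """Guess what 'it' refers to using one hard-coded rule."""
    text = sentence.lower()
    if "was too big" not in text:
        return None  # the rule only fires on this exact wording
    positions = {item: text.find(item) for item in KNOWN_ITEMS if item in text}
    if not positions:
        return None
    # Pick the item that appears earliest in the sentence.
    return min(positions, key=positions.get)


original = naive_referent(
    "The trophy would not fit in the brown suitcase because it was too big."
)
reworded = naive_referent(
    "The trophy would not fit in the brown suitcase because it was too large."
)
reordered = naive_referent(
    "The brown suitcase would not fit the trophy because it was too big."
)
print(original, reworded, reordered)
```

The rule gets the original sentence right, returns nothing at all when “big” becomes “large,” and picks the suitcase in the reordered sentence, where a person would say the trophy was too big.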
It’s for this reason that NLU relies heavily on machine learning. Machine learning, or ML, can take large amounts of text and learn patterns over time. This is explained by what’s called the distributional hypothesis, which says that you can learn a lot about a word “by the company it keeps.” Take the word “hat.” An ML model might see phrases like, “the man was wearing a hat on his head” or “I put on a hat to keep the sun out of my eyes.” If the model sees phrases like these enough, it starts to pick up on some patterns. Throw it, then, the phrase, “I put on a baseball cap to keep out the sun” and it can sense that just maybe there is a similarity between “hat” and “baseball cap.” Add in the phrase “the man wore a baseball cap on his head” and the similarity is seen to be even stronger.
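As a rough illustration of the distributional hypothesis, the sketch below counts the words that appear alongside a target word and compares those counts with cosine similarity. It is a toy under stated assumptions, not how production models are trained: the sentences are the ones from the paragraph above plus one made-up unrelated sentence, “baseball cap” is pre-joined into the single token `baseball_cap`, and all function names are hypothetical.

```python
import math
from collections import Counter


def context_vector(word: str, sentences: list[str]) -> Counter:
    """Count every other word that shares a sentence with `word`."""
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if word in tokens:
            counts.update(token for token in tokens if token != word)
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(word_a: str, word_b: str, sentences: list[str]) -> float:
    return cosine(context_vector(word_a, sentences),
                  context_vector(word_b, sentences))


hat_sentences = [
    "the man was wearing a hat on his head",
    "i put on a hat to keep the sun out of my eyes",
]
cap_first = "i put on a baseball_cap to keep out the sun"
cap_second = "the man wore a baseball_cap on his head"
unrelated = "the dog chased a ball across the park"  # made-up contrast sentence

one_example = similarity("hat", "baseball_cap", hat_sentences + [cap_first])
two_examples = similarity("hat", "baseball_cap",
                          hat_sentences + [cap_first, cap_second])
baseline = similarity("hat", "ball", hat_sentences + [unrelated])
print(one_example, two_examples, baseline)
```

With these sentences, “hat” scores closer to “baseball_cap” than to “ball,” and the score rises once the second baseball cap sentence is added. Real models learn from vastly more text and use far richer representations than raw counts.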
As you can imagine, these ML models require a lot of data. OpenAI’s GPT-2 model has 1.5 billion parameters, and its successor, GPT-3, has 175 billion; training models of that size takes a very large amount of text. That text is often crawled from publicly available pages on the web, and the resulting model is then fine-tuned on a specific dataset. This fine-tuning allows the model to better understand a given domain. For example, fine-tuning may help the model to better understand medical data.
Improvements in computing and machine learning have increased the power and capabilities of NLU over the past decade. We can expect NLU to become even more powerful, and more integrated into software, over the next few years.
For more information on the applications of Natural Language Understanding, and to learn how you can leverage Algolia’s search and discovery APIs across your site or app, please contact our team of experts.
Dustin Coates
Product and GTM Manager