A Brief Introduction to NLP with Phoebe Liu
Have you ever interacted with a chatbot? Or requested something from a virtual assistant like Siri, Alexa, or your car’s infotainment system? What about translating something online? Most of us have interacted with these types of artificial intelligence (AI) before and never stopped to contemplate the ease with which we could communicate our needs and receive an appropriate response. But pause for a moment to reflect on the complexities of human language: isn’t it a wonder that machines can communicate with us at all? It’s all thanks to natural language processing.
But what is natural language processing (NLP)? Natural language processing is the technology used to teach computers how to understand human language and generate appropriate responses in a human-like manner. With NLP, machines learn to read, decipher, and interpret written and spoken human language, as well as create narratives that describe, summarize, or explain input (structured data) in a human-like manner. NLP is the driving force behind many AI solutions you interact with regularly, and it enables comprehension between humans and machines. Today, NLP is becoming increasingly popular thanks to tremendous improvements in data access and increases in computational power.
Why Natural Language Processing is Difficult
NLP can be challenging. But why is natural language processing difficult? A computer’s native language, at its base level, is simply a collection of millions of ones and zeros, a binary assortment of yeses and nos. Computers don’t think contextually like humans; they think logically. When you speak to an AI-powered computer, that machine must somehow understand and interpret what was said, calculate an appropriate response, and convert that response to human (or natural) language, all in a matter of milliseconds. It’s hard to imagine the level of processing power required for this feat, and computers are doing this all the time.
The intricacies of natural language shouldn’t be understated, either. Humans express themselves in an infinite number of ways. There are hundreds of languages and dialects, and each has its own syntax rules and slang, which may vary depending on whether the language is written or spoken. Individuals also write and speak differently from one another. Some may talk with a lisp, for instance, or write with abbreviations. For a computer to understand all of these variations, it must have been trained on data that contains them. Another challenge is that the training corpus should come from the same domain as the intended application. For example, conversations collected in a medical environment differ from those in customer support, which makes data collection all the more challenging: it is hard, but necessary, to gather data from the right domain.
These factors all contribute to the difficulty of implementing NLP. You must have access to large amounts of natural language data so that a computer is prepared for a vast range of interactions, and you need the computational power to service those interactions and bridge the gap between ones and zeros and natural language. It’s little wonder that NLP has only recently become a prominent part of machine learning.
NLP Techniques
NLP breaks language down into shorter segments to understand the relationships between those segments and how they connect to create meaning. The two main components of language it analyzes are syntax (the arrangement of words in a sentence such that they make grammatical sense) and semantics (the meaning conveyed by the text). Within each category are core NLP techniques:
Syntactic Analysis
These are a few standard methods machines use to analyze syntax:
- Segmentation: Breaking text down into smaller units, such as sentences or words.
- Lemmatization: Reducing a word to its dictionary base form (its lemma) so that inflected forms of the same word are grouped together.
- Part-of-speech tagging: Identifying the part of speech for each word.
- Stemming: Removing prefixes and suffixes from a word to obtain its root.
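The four methods above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not how production NLP libraries work: the `LEMMAS` and `POS_TAGS` lookup tables, the `SUFFIXES` list, and all function names are hypothetical and invented for this example, whereas real systems learn this information from large corpora.

```python
import re

# Hypothetical toy lexicons (assumptions for illustration only).
LEMMAS = {"ran": "run", "running": "run", "mice": "mouse", "were": "be"}
POS_TAGS = {"the": "DET", "cats": "NOUN", "mice": "NOUN",
            "ran": "VERB", "quickly": "ADV"}
SUFFIXES = ("ing", "ly", "ed", "s")


def segment(text):
    """Split text into sentences, then each sentence into lowercase words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [re.findall(r"[a-z']+", s.lower()) for s in sentences]


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Look up the dictionary base form; fall back to the word itself."""
    return LEMMAS.get(word, word)


def tag(word):
    """Look up a part-of-speech tag; unknown words get 'UNK'."""
    return POS_TAGS.get(word, "UNK")


if __name__ == "__main__":
    for sentence in segment("The cats ran quickly. Mice were running!"):
        for word in sentence:
            print(word, stem(word), lemmatize(word), tag(word))
```

The sketch also shows how stemming and lemmatization differ: the crude suffix stripper turns "running" into "runn", while the lemma lookup maps it to "run".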
Semantic Analysis
The following are two popular methods machines use to analyze meaning:
- Named entity recognition: Identifying entities in text that belong to preset groups (such as people and places) and categorizing them.
- Word sense disambiguation: Determining which meaning of a word is intended based on its context.
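As a rough illustration of these two methods, the sketch below uses a lookup list of known names for entity recognition and a simple word-overlap heuristic, in the spirit of the Lesk algorithm, for disambiguation. The `GAZETTEER` and `SENSES` tables and both function names are hypothetical and invented for this example; real systems typically rely on trained statistical models instead of hand-written lists.

```python
import re

# Hypothetical lookup tables (assumptions for illustration only).
GAZETTEER = {
    "PERSON": {"phoebe liu"},
    "PLACE": {"paris", "london"},
}
SENSES = {
    "bank": {
        "financial institution": {"money", "deposit", "loan", "account"},
        "river edge": {"river", "water", "shore", "fishing"},
    }
}


def recognize_entities(text):
    """Return (entity, label) pairs in the order they appear in the text."""
    lowered = text.lower()
    found = []
    for label, names in GAZETTEER.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, lowered):
                span = text[match.start():match.end()]
                found.append((match.start(), span, label))
    return [(span, label) for _, span, label in sorted(found)]


def disambiguate(word, sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    senses = SENSES[word]
    return max(senses, key=lambda sense: len(senses[sense] & context))


if __name__ == "__main__":
    print(recognize_entities("Phoebe Liu flew from Paris to London."))
    print(disambiguate("bank", "She went to the bank to deposit money"))
    print(disambiguate("bank", "We went fishing on the river bank"))
```

The overlap heuristic only works when the sentence happens to contain one of the listed cue words, which is one reason practical systems need large amounts of in-domain training data.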
What Can Natural Language Processing Do?
NLP has many use cases. It helps scale language-related tasks by enabling machines to carry out repetitive tasks that would otherwise be done by humans. A variety of industries use NLP, including:
- Social media analytics: NLP can track sentiment about brands, products, or specific topics and help determine how customers make choices. It can also help flag potential fake news, for example by detecting political bias.
- Text-to-speech applications: Text-to-speech apps present information in more ways for greater inclusivity and create richer interactive experiences in call centers, video games, and language education.
- Personal assistants and chatbots: NLP enables AI to communicate with people for routine questions and transactions, freeing humans for higher-level, strategic efforts.
- Search queries: Especially useful in eCommerce, NLP helps identify key search terms to drive more relevant search results.
- Language translation: NLP is used to translate across a full range of languages and dialects.
- Information extraction: NLP quickly distills critical information from large volumes of text, for instance from patient records in healthcare.