Lesson 9: Introduction to Natural Language Processing
What is Natural Language Processing?
Natural Language Processing (NLP) is an interdisciplinary subfield of artificial intelligence, computer science, and linguistics that enables machines to analyze, understand, interpret, and generate human language, whether spoken or written. In this lesson, we will focus mostly on written language, usually represented as text in documents, customer support tickets, product reviews, email communications, social media feeds, insurance claims, medical reports, court records, and more.
NLP Tasks
The applications of NLP are everywhere today. Sometimes when my 6-year-old daughter wants to sleep in the evening, she will take the Amazon Alexa to her room and say, “Alexa, play K-LOVE” to listen to some praise music as she falls asleep. Virtual assistants like Siri and Alexa use NLP. When you speak to Alexa, it receives the voice request, converts it to text, processes the text and assigns meaning to it (natural language understanding, or NLU), then generates a text response (natural language generation, or NLG), which is finally converted back into a voice response. Hence, speech-to-text, NLU, NLG, and text-to-speech all occur inside Alexa. NLU and NLG are two broad categories of NLP tasks.
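To make that pipeline concrete, here is a minimal sketch of the four stages in Python. The function names and the hard-coded behavior are illustrative placeholders for this lesson, not Alexa’s actual APIs.

```python
# Minimal sketch of a voice-assistant pipeline:
# speech-to-text -> NLU -> NLG -> text-to-speech.
# Names and behavior are illustrative placeholders, not Alexa's real APIs.

def speech_to_text(audio: bytes) -> str:
    # A speech recognition model would run here; we pretend it decoded the request.
    return "play k-love"

def understand(text: str) -> dict:
    # NLU: assign meaning to the text, e.g. an intent plus its slots.
    if text.startswith("play "):
        return {"intent": "PlayMusic", "station": text[len("play "):]}
    return {"intent": "Unknown"}

def generate_response(meaning: dict) -> str:
    # NLG: produce a text reply for the resolved intent.
    if meaning["intent"] == "PlayMusic":
        return f"Playing {meaning['station']}."
    return "Sorry, I didn't catch that."

def text_to_speech(reply: str) -> bytes:
    # A speech synthesis model would run here; we simply encode the text.
    return reply.encode("utf-8")

audio_request = b"<raw audio>"
print(text_to_speech(generate_response(understand(speech_to_text(audio_request)))))
# b'Playing k-love.'
```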
NLP tasks and use cases include machine translation, speech recognition, sentiment analysis, text classification, named entity recognition, text summarization, question answering, and chatbots or virtual assistants.
Brief History of NLP
Rule Based Approaches
The field of NLP began in the late 1940s, after the Second World War, when there was a need for a machine that could automatically translate one language into another. Hence, the first NLP task was an experimental machine translation project, a collaboration between Georgetown University and IBM, which started in the early 1950s. The Georgetown-IBM experiment implemented a programmed rule-based machine translation system on an IBM 701 (IBM’s first commercial scientific computer) to translate Russian into English. The public demonstration of the Russian-English machine translation system in New York in January 1954 sparked a great deal of public interest, to the point where the front page of the New York Times reported that:
“A public demonstration of what is believed to be the first successful use of a machine to translate meaningful texts from one language to another took place here yesterday afternoon. This may be the culmination of centuries of search by scholars for ‘a mechanical translator.’”
In 1966, Joseph Weizenbaum, a professor at the Massachusetts Institute of Technology (MIT) Artificial Intelligence Laboratory, released a chatbot called ELIZA that mimicked a Rogerian psychotherapist. ELIZA simulated a therapy-style conversation, responding to users mostly by asking open-ended questions and giving generic responses to recognized words and phrases. It carried this out with a pattern-matching and substitution methodology, programmed with complicated if-then logic and significant linguistic expertise. Rule-based systems like this are complex to build and depend on expert domain knowledge.
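As a rough illustration of that pattern-matching-and-substitution idea, an ELIZA-style rule can be expressed as a regular expression plus a response template. This is a toy sketch for the lesson, not Weizenbaum’s original script, which used a much richer set of keywords, rankings, and pronoun swaps.

```python
import re

# Toy ELIZA-style rules: (pattern, response template).
RULES = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r"i feel (.*)", "Tell me more about feeling {0}."),
]

def respond(utterance: str) -> str:
    """Return the first matching rule's response, substituting the captured text."""
    for pattern, template in RULES:
        match = re.match(pattern, utterance.strip(), re.IGNORECASE)
        if match:
            return template.format(*match.groups())
    return "Please go on."  # generic fallback response

print(respond("I need a break"))        # Why do you need a break?
print(respond("I feel anxious today"))  # Tell me more about feeling anxious today.
print(respond("The weather is nice"))   # Please go on.
```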
Statistical and Machine Learning Approaches
In the 1980s and 1990s there was a paradigm shift from rule-based approaches to statistical, probabilistic, and machine learning approaches for NLP tasks, driven by increases in data and computational power. By the early 1990s, IBM researchers had introduced statistical machine translation, theorizing that if they looked at enough text, they could find patterns in translations. Statistical machine translation draws on concepts from information theory.
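One of those information-theoretic concepts is the noisy-channel model, stated here as general background rather than as a detail of the IBM system: to translate a foreign sentence f, the system searches for the English sentence e that is most probable given f,

$$\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)$$

where P(f | e) is a translation model and P(e) is a language model, both estimated from large collections of text.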
The creation of the World Wide Web in 1990 made it possible to accumulate huge amounts of text data on which machine learning algorithms could be trained for NLP tasks. Around the same time, recurrent neural networks were first being trained on sequence data.
In 1997, the LSTM (long short-term memory) architecture for recurrent neural networks was developed by Hochreiter and Schmidhuber.
In 2006, Google launched Google Translate, a commercially successful machine translation system that initially used statistical machine translation.
In 2013, Google researchers introduced word2vec, an influential word embedding model widely used for encoding words as vector representations in NLP tasks.
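Here is a minimal sketch of what that looks like in practice, using the open-source gensim library’s Word2Vec implementation on a tiny toy corpus. The corpus and parameter values are only for illustration; real embeddings are trained on millions of sentences.

```python
# Train toy word embeddings with gensim's Word2Vec (pip install gensim).
from gensim.models import Word2Vec

corpus = [
    ["the", "customer", "loved", "the", "product"],
    ["the", "customer", "returned", "the", "product"],
    ["support", "resolved", "the", "ticket", "quickly"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, epochs=100)

vector = model.wv["product"]                            # a 50-dimensional vector for "product"
neighbors = model.wv.most_similar("customer", topn=2)   # words with the closest vectors
print(vector.shape, neighbors)
```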
Large Language Models
In 2014, the attention mechanism was introduced for neural machine translation. In 2017, Google Brain researchers introduced the transformer architecture, a model built entirely on attention, in a paper titled “Attention Is All You Need”.
In 2018, OpenAI released GPT-1, its first Generative Pre-trained Transformer model, and Google released BERT, a large transformer model widely used to create contextual word embeddings or representations.
In 2019, OpenAI released GPT-2.
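For example, here is a minimal sketch of pulling contextual token representations out of a pretrained BERT model with the Hugging Face transformers library; the model name and example sentence are arbitrary choices for illustration.

```python
# Extract contextual word representations with a pretrained BERT model
# (pip install transformers torch).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The insurance claim was approved quickly.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per token, conditioned on the whole sentence.
token_embeddings = outputs.last_hidden_state
print(token_embeddings.shape)  # e.g. torch.Size([1, 9, 768])
```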
In 2020, OpenAI released GPT-3.
In 2022, OpenAI released ChatGPT, which became popular overnight.