Lesson 9: Introduction to Natural Language Processing
What is Natural Language Processing?
Natural Language Processing (NLP) is an interdisciplinary subfield of artificial intelligence, computer science, and linguistics that enables machines to analyze, understand, interpret, and generate human language, whether spoken or written. In this lesson, we will focus mostly on written language, usually represented as text in documents, customer support tickets, product reviews, email communications, social media feeds, insurance claims, medical reports, court records, and more.
NLP Tasks
The applications of NLP are everywhere today. Sometimes when my 6-year-old daughter wants to sleep in the evening, she will take the Amazon Alexa to her room and say, “Alexa, play K-LOVE” to listen to some praise music as she falls asleep. Virtual assistants like Siri and Alexa use NLP. When you speak to Alexa, it receives the voice request, converts it to text, processes the text and assigns meaning to it (natural language understanding, or NLU), then generates a text response (natural language generation, or NLG), which is finally converted back into a voice response. Hence, speech-to-text, NLU, NLG, and text-to-speech all occur inside Alexa. NLU and NLG are two broad categories of NLP tasks.
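To make that pipeline concrete, here is a minimal sketch of the four stages in Python. The function names and the hard-coded behavior are illustrative placeholders for this lesson, not Alexa’s actual APIs.

```python
# Minimal sketch of a voice-assistant pipeline:
# speech-to-text -> NLU -> NLG -> text-to-speech.
# Names and behavior are illustrative placeholders, not Alexa's real APIs.

def speech_to_text(audio: bytes) -> str:
    # A speech recognition model would run here; we pretend it decoded the request.
    return "play k-love"

def understand(text: str) -> dict:
    # NLU: assign meaning to the text, e.g. an intent plus its slots.
    if text.startswith("play "):
        return {"intent": "PlayMusic", "station": text[len("play "):]}
    return {"intent": "Unknown"}

def generate_response(meaning: dict) -> str:
    # NLG: produce a text reply for the resolved intent.
    if meaning["intent"] == "PlayMusic":
        return f"Playing {meaning['station']}."
    return "Sorry, I didn't catch that."

def text_to_speech(reply: str) -> bytes:
    # A speech synthesis model would run here; we simply encode the text.
    return reply.encode("utf-8")

audio_request = b"<raw audio>"
print(text_to_speech(generate_response(understand(speech_to_text(audio_request)))))
# b'Playing k-love.'
```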
NLP tasks and use cases include machine translation, speech recognition, sentiment analysis, text classification, named entity recognition, text summarization, question answering, and chatbots or virtual assistants.
Brief History of NLP
Rule Based Approaches
The field of NLP began in the late 1940s, after the Second World War, when there was a need for a machine that could automatically translate one language into another. Hence, the first NLP task was an experimental machine translation project, a collaboration between Georgetown University and IBM, which started in the early 1950s. The Georgetown-IBM experiment implemented a programmed rule-based machine translation system on an IBM 701 (IBM’s first commercial scientific computer) to translate Russian into English. The public demonstration of the Russian-English machine translation system in New York in January 1954 sparked a great deal of public interest, to the point where the front page of the New York Times reported that:
“A public demonstration of what is believed to be the first successful use of a machine to translate meaningful texts from one language to another took place here yesterday afternoon. This may be the culmination of centuries of search by scholars for ‘a mechanical translator.’”
In 1966, Joseph Weizenbaum, a professor at the Massachusetts Institute of Technology (MIT) Artificial Intelligence Laboratory, released a chatbot called ELIZA that mimicked a Rogerian psychotherapist. ELIZA simulated a therapy-style conversation, responding to users mostly by asking open-ended questions and giving generic responses to recognized words and phrases. It carried this out with a pattern-matching and substitution methodology, programmed with complicated if-then logic and significant linguistic expertise. Rule-based systems like this are complex to build and depend on expert domain knowledge.
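As a rough illustration of that pattern-matching-and-substitution idea, an ELIZA-style rule can be expressed as a regular expression plus a response template. This is a toy sketch for the lesson, not Weizenbaum’s original script, which used a much richer set of keywords, rankings, and pronoun swaps.

```python
import re

# Toy ELIZA-style rules: (pattern, response template).
RULES = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r"i feel (.*)", "Tell me more about feeling {0}."),
]

def respond(utterance: str) -> str:
    """Return the first matching rule's response, substituting the captured text."""
    for pattern, template in RULES:
        match = re.match(pattern, utterance.strip(), re.IGNORECASE)
        if match:
            return template.format(*match.groups())
    return "Please go on."  # generic fallback response

print(respond("I need a break"))        # Why do you need a break?
print(respond("I feel anxious today"))  # Tell me more about feeling anxious today.
print(respond("The weather is nice"))   # Please go on.
```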
Statistical and Machine Learning Approaches
In the 1980s and 1990s there was a paradigm shift from rule-based approaches to statistical, probabilistic, and machine learning approaches for NLP tasks, driven by increases in data and computational power. By the early 1990s, IBM researchers had introduced statistical machine translation, theorizing that if they looked at enough text, they could find patterns in translations. Statistical machine translation draws on concepts from information theory.
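One of those information-theoretic concepts is the noisy-channel model, stated here as general background rather than as a detail of the IBM system: to translate a foreign sentence f, the system searches for the English sentence e that is most probable given f,

$$\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)$$

where P(f | e) is a translation model and P(e) is a language model, both estimated from large collections of text.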
The creation of the World Wide Web in 1990 made it possible to accumulate huge amounts of text data on which machine learning algorithms could be trained for NLP tasks. Around the same time, recurrent neural networks were first being trained on sequence data.
In 1997, the LSTM (long short-term memory) architecture for recurrent neural networks was developed by Hochreiter and Schmidhuber.
In 2006, Google launched Google Translate, a commercially successful machine translation system that initially used statistical machine translation.
In 2013, Google researchers introduced word2vec, an influential word embedding model widely used for encoding words as vector representations in NLP tasks.
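Here is a minimal sketch of what that looks like in practice, using the open-source gensim library’s Word2Vec implementation on a tiny toy corpus. The corpus and parameter values are only for illustration; real embeddings are trained on millions of sentences.

```python
# Train toy word embeddings with gensim's Word2Vec (pip install gensim).
from gensim.models import Word2Vec

corpus = [
    ["the", "customer", "loved", "the", "product"],
    ["the", "customer", "returned", "the", "product"],
    ["support", "resolved", "the", "ticket", "quickly"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, epochs=100)

vector = model.wv["product"]                            # a 50-dimensional vector for "product"
neighbors = model.wv.most_similar("customer", topn=2)   # words with the closest vectors
print(vector.shape, neighbors)
```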
Large Language Models
In 2014, the attention mechanism was introduced for neural machine translation. In 2017, Google Brain researchers introduced the transformer architecture, a model built entirely on attention, in a paper titled “Attention Is All You Need”.
In 2018, OpenAI released GPT-1, its first Generative Pre-trained Transformer model, and Google released BERT, a large transformer model widely used to create contextual word embeddings or representations.
In 2019, OpenAI released GPT-2.
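For example, here is a minimal sketch of pulling contextual token representations out of a pretrained BERT model with the Hugging Face transformers library; the model name and example sentence are arbitrary choices for illustration.

```python
# Extract contextual word representations with a pretrained BERT model
# (pip install transformers torch).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The insurance claim was approved quickly.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per token, conditioned on the whole sentence.
token_embeddings = outputs.last_hidden_state
print(token_embeddings.shape)  # e.g. torch.Size([1, 9, 768])
```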
In 2020, OpenAI released GPT-3.
In 2022, OpenAI released ChatGPT, which became popular overnight.