Lesson 1: Introduction to Machine Learning

My Six-Year-Old Daughter Learning Machine Learning

Debbie

I explained the concept of machine learning (ML) to my six-year-old daughter (in the picture above), and she grasped it in less than two minutes! I gave her pairs of numbers: \((1, 2)\), \((2, 4)\), and \((3, 6)\). For the fourth pair, I only provided the first number \((4, \_)\) and asked her to guess the second number. She correctly said 8. Let’s take a look at the standard definition of machine learning, then use this scenario of how my daughter is learning machine learning to present the concept in plain language.

What is Machine Learning?

The term machine learning was coined in 1959 by Arthur Samuel, an American pioneer in the field of computer science and artificial intelligence. A widely cited formal definition was given later, in 1997, by the computer scientist Tom Mitchell:

Definition

“A computer program is said to learn from experience \(E\) with respect to some class of tasks \(T\) and performance measure \(P\), if its performance at tasks in \(T\), as measured by \(P\), improves with experience \(E\)”.

This statement implies that machine learning is “a computer’s ability to learn without being explicitly programmed”. To someone who is still trying to understand the concept of machine learning, this definition might be a bit challenging to grasp.

Machine Learning in Plain Language

Let’s go back to my daughter’s scenario. How did she figure out that the answer is 8? She studied the relationship between the first and second numbers and discovered that if she doubles the first number, she obtains the second number. Although this is a naive example where the given data has a perfect relationship with no noise, it conveys the idea of how machine learning works.
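The same reasoning can be sketched in a few lines of Python. This is only an illustration, assuming the relationship has the form "second number = multiplier × first number"; the function name `learn_multiplier` and the least-squares formula are choices made for this sketch, not part of the lesson.

```python
# Training pairs from the example: (first number, second number).
pairs = [(1, 2), (2, 4), (3, 6)]

def learn_multiplier(data):
    """Estimate m in y = m * x by least squares (a line through the origin)."""
    numerator = sum(x * y for x, y in data)
    denominator = sum(x * x for x, _ in data)
    return numerator / denominator

multiplier = learn_multiplier(pairs)  # the learned relationship
prediction = multiplier * 4           # guess the missing number in (4, _)
print(multiplier, prediction)         # 2.0 8.0
```

Because the data has no noise, the learned multiplier is exactly 2 and the prediction for 4 is 8, matching my daughter's answer.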

My daughter learned the relationship in the data, then used that relationship to make predictions. What if she were given noisy or "imperfect" data, such as \((1, 2)\), \((2, 5)\), and \((3, 5)\)? Real-world data is usually noisy and rarely follows a perfect pattern.

In this case, the relationship in the data can be estimated. For example, if we estimate the relationship such that the second number is obtained by doubling the first number, our predictions of the second values may not be perfect. Since the model is an estimate of the relationship in the data, the predictions may be correct, incorrect, close to the actual values, or far from the actual values. The gap between the actual value and the predicted value is called the prediction error.
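As a small illustration, the prediction error for each noisy pair can be computed under the "double the first number" estimate. The error is taken here as actual minus predicted, and the names `noisy_pairs` and `predict` are only for this sketch.

```python
# Noisy pairs from the text: (first number, actual second number).
noisy_pairs = [(1, 2), (2, 5), (3, 5)]

def predict(x):
    """Estimated relationship: the second number is double the first."""
    return 2 * x

# Prediction error for each pair: actual value minus predicted value.
errors = [y - predict(x) for x, y in noisy_pairs]
print(errors)  # [0, 1, -1]
```

The first prediction is exact, the second is one too low, and the third is one too high.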

What if the estimated relationship for this data, \((1, 2)\), \((2, 5)\), and \((3, 5)\), were that the second number is obtained by multiplying the first number by four, that is, \(\hat{y} = 4x\)? The predictions would be 4, 8, and 12, which are much further from the actual values 2, 5, and 5. Any multiplier could be tried in this way, so there are infinitely many models, or estimated relationships, that could be used to make predictions.

Given that there are several models that could be used to represent the relationship in the data, which model is the “best” for predicting the outcomes with new data? The “best” model is the one that minimizes the prediction error (the difference between the prediction and the actual value). Generally, when training a model, the objective is to obtain a model that minimizes the prediction error.
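One way to make "minimizes the prediction error" concrete is to score a few candidate models and keep the one with the smallest score. The sketch below assumes the mean squared error as the score (the lesson does not name a specific measure) and an arbitrary, hypothetical list of candidate multipliers.

```python
# Noisy pairs from the text: (first number, actual second number).
noisy_pairs = [(1, 2), (2, 5), (3, 5)]

def mean_squared_error(multiplier, data):
    """Average squared gap between actual values and multiplier * x."""
    return sum((y - multiplier * x) ** 2 for x, y in data) / len(data)

# Hypothetical candidate models of the form y_hat = m * x.
candidates = [1, 2, 3, 4]
scores = {m: mean_squared_error(m, noisy_pairs) for m in candidates}

# The "best" candidate is the one with the smallest error.
best = min(scores, key=scores.get)
for m, score in scores.items():
    print(f"y_hat = {m}x  ->  MSE = {score:.2f}")
print("best multiplier:", best)
```

Among these four candidates, doubling gives the smallest error and multiplying by four gives the largest. A training algorithm does the same kind of search, but over all possible values rather than a short list.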

Machine learning is analogous to providing my daughter with some data and later asking her to guess a number given new data. Machine learning involves:

  • Providing training data to an algorithm to learn or estimate the relationships (patterns) in the data, with the objective of minimizing the prediction error or maximizing model accuracy.
  • Using the learned relationships (called a model) to make predictions based on new input data.

Therefore, to keep the definition of machine learning intuitive and simple, I define machine learning as follows:

Definition

Machine learning is an approach to learning relationships or patterns within a dataset using algorithms, and applying the learned relationships, known as a model, to make predictions on new, unseen data.

Machine Learning Models

The pattern learned by the algorithm is called a model. A model captures the relationship in the data and serves as a representation of reality. The goal of machine learning is to train a model that can be used for inference. A good machine learning model captures the underlying patterns in the data and is simple enough to generalize well to new, unseen data, ensuring strong performance both on training and production (real-world) data.

In machine learning, models are trained or learned using algorithms. An algorithm is a sequence of steps for solving a problem. Some models are mathematical in nature and others are not. In the examples above, we can formulate a mathematical model for the problem as \(\hat{y} = 2x\): the estimate \(\hat{y}\) of the second number \(y\) is obtained by doubling the first number \(x\).

The model \(\hat{y} = 2x\) assumes a linear relationship between the variables. However, in some cases, the data may exhibit non-linear relationships, making a non-linear model more appropriate. It’s important to understand that \(\hat{y} = 2x\) represents an estimated relationship rather than the “true” relationship. The true relationship can be mathematically expressed as \(y = 2x + \text{error}\), or equivalently, \(y = \hat{y} + \text{error}\), where the error is defined as \(\text{error} = y - \hat{y}\). This error term captures the difference between the observed values and the predicted values, acknowledging that the model may not perfectly fit the data.
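To illustrate why a non-linear model can be more appropriate, the sketch below fits two one-parameter models to data that follows a curve. The data is hypothetical (it is not from the lesson), and the helper names and the use of least squares and mean squared error are assumptions made for this example.

```python
# Hypothetical data that follows y = x**2 exactly (not from the lesson).
curved_pairs = [(1, 1), (2, 4), (3, 9)]

def fit_coefficient(data, feature):
    """Least-squares coefficient c for the model y_hat = c * feature(x)."""
    numerator = sum(feature(x) * y for x, y in data)
    denominator = sum(feature(x) ** 2 for x, _ in data)
    return numerator / denominator

def mean_squared_error(data, coefficient, feature):
    """Average squared error of y_hat = coefficient * feature(x)."""
    return sum((y - coefficient * feature(x)) ** 2 for x, y in data) / len(data)

def linear(x):
    return x

def squared(x):
    return x * x

# Fit each model, then record its coefficient and its error.
results = {}
for name, feature in [("linear", linear), ("quadratic", squared)]:
    c = fit_coefficient(curved_pairs, feature)
    results[name] = (c, mean_squared_error(curved_pairs, c, feature))
    print(name, results[name])
```

On this data the quadratic model has zero error, while the best linear model through the origin still misses every point.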

Machine Learning Tasks

There are different types of models in machine learning:

Regression models predict continuous outcomes given some input data. These models are trained with both input (\(x\)) and output (\(y\)) data, and thus are supervised. Examples include linear regression and random forest regressor.

Classification models predict categorical outcomes given some input data. These models are trained with both input (\(x\)) and output (\(y\)) data, and thus are supervised. Examples include logistic regression and random forest classifier.
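A minimal sketch of a supervised classifier is shown below. It uses a nearest-centroid rule rather than logistic regression or a random forest, simply because it fits in a few lines; the data, labels, and function names are all hypothetical.

```python
# Hypothetical labelled data: (input x, category y). Not from the lesson.
training = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]

def fit_centroids(data):
    """Learn one centroid (the mean input value) per category."""
    groups = {}
    for x, label in data:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}

def classify(x, centroids):
    """Predict the category whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

centroids = fit_centroids(training)      # training uses both x and y
print(classify(3.0, centroids))          # small
print(classify(7.0, centroids))          # large
```

The model here is just the pair of centroids, and the prediction is a category rather than a number.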

Clustering models group records into clusters based on the similarity of the records. These models are trained with input data only, without any output data, and hence are unsupervised.
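As a rough sketch of clustering, the code below runs a simple one-dimensional k-means with two clusters. K-means is one common clustering algorithm, chosen here only for illustration; the data, starting centers, and function name are hypothetical.

```python
# Hypothetical unlabelled inputs (no output values). Not from the lesson.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

def k_means_1d(data, centers, iterations=10):
    """Alternate between assigning points to the nearest center
    and moving each center to the mean of its assigned points."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Start from the smallest and largest values as initial centers.
centers, clusters = k_means_1d(values, [values[0], values[-1]])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

No labels are supplied at any point; the two groups emerge only from how close the values are to one another.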

Practical ML Applications

Machine learning is valuable to almost all industries and fields, such as healthcare, finance, marketing, natural language processing, computer vision, and more.

Machine learning has a wide range of applications, including:

  • Identifying patients at risk for certain diseases,
  • Detecting credit card fraud and assessing credit risk,
  • Predicting stock prices,
  • Customer segmentation,
  • Predicting the time it will take for a device to fail (predictive maintenance use case),
  • Predicting customer attrition,
  • Recommending products based on user characteristics or preferences,
  • Energy management, such as forecasting electricity consumption to optimize operations,
  • Object detection,
  • Sentiment analysis,
  • Analysis of customer reviews,
  • Speech recognition in virtual assistants such as Siri and Alexa,
  • Topic modeling to capture emerging themes in text,
  • Text generation, and many more.

Summary

In this lesson, machine learning is introduced through a simple example where a six-year-old learns to predict a missing value in a set of numbers based on a pattern. This illustrates how ML works. Machine learning is an approach that involves learning relationships or patterns within a dataset using an algorithm. The learned relationship, known as a model, is then applied to new (unseen) data to make predictions. The formal definition of machine learning describes it as a process in which a computer program improves its performance at a task with experience.

The lesson then explains how machine learning models are developed using algorithms that learn patterns from data. The goal is to create a model that generalizes well and makes accurate predictions, whether the relationships are linear or non-linear. The concept of prediction error is also discussed: predictions are rarely perfect, so a good model is one that keeps the gap between predicted and actual values small. Finally, the lesson covers different types of ML tasks (regression, classification, and clustering) and highlights the wide range of ML applications across industries and fields such as healthcare, finance, and natural language processing.