In this lesson, we will focus on the application of Deep Neural Networks (DNNs) to traditional regression and classification problems, along with their use in image classification. We will use Keras, the high-level API of TensorFlow, to explore how DNNs can be applied to practical problems such as predicting house prices based on various features, forecasting customer churn in a business context, and performing grayscale image classification. By understanding these applications, you’ll gain insight into how DNNs can be leveraged to address real-world challenges across different domains.
Building Artificial Neural Networks in Keras
Keras is an interface for solving machine learning problems using deep learning. It is a high-level API of TensorFlow written in Python. TensorFlow is an end-to-end open-source machine learning platform.
Sample code for building a Keras model
import keras

model = keras.Sequential()
model.add(keras.Input(shape=(2,)))                      # input layer with two features
model.add(keras.layers.Dense(3, activation='relu'))     # first hidden layer
model.add(keras.layers.Dense(1, activation='sigmoid'))  # output layer
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5)                   # fit the model
model.evaluate(X_test, y_test)                          # evaluate the model
model.predict(X_test)                                   # use the model for prediction
The following steps are used to build a model in Keras.
Initializing the model: The model is initialized using the keras.Sequential() method. This method allows you to build the model layer by layer. A sequential model is simple to use when you are building a feed-forward neural network.
Adding the input layer: The input layer is added with keras.Input(shape=(2,)), indicating the model expects input data with two features. Explicitly specifying the input layer is optional: omitting it delays weight initialization until training, while specifying it initializes weights as layers are added.
Adding the first hidden layer: The first hidden layer is added using keras.layers.Dense(3, activation='relu'). It has 3 neurons and uses the ReLU activation function. You can add more hidden layers with different activation functions like ReLU, sigmoid, or tanh to increase the model’s capacity for learning.
Adding the output layer: The output layer is added with keras.layers.Dense(1, activation='sigmoid'), having one node, suitable for binary classification. The sigmoid activation outputs a value between 0 and 1, representing the probability of the input belonging to class 1.
Compiling the model: The model is compiled with the model.compile() method. This step specifies the optimizer (Adam), the loss function (binary_crossentropy for binary classification tasks), and the metrics to track (accuracy). Compiling configures the model for training.
Fitting the model: The model is trained using the fit() method, with parameters such as batch_size (the number of instances processed per gradient update) and epochs (the number of times the entire dataset is passed through the network). Fitting the model adjusts the weights based on the training data (X_train and y_train) over the specified number of epochs.
Evaluating the model: The model is evaluated using the .evaluate() method, which calculates the performance on a test dataset.
Making predictions with the model: Predictions are made with the .predict() method, generating output for new input data. For binary classification, this method outputs predicted probabilities for the test data, which can be further processed (e.g., converted to binary labels using a threshold) to make final predictions, as shown in the sketch below.
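As a minimal sketch (assuming a trained binary classifier named model and test features X_test, as in the sample code above), the predicted probabilities can be converted to class labels with a 0.5 threshold:

Code

import numpy as np

# Predicted probabilities from the sigmoid output, shape (n_samples, 1)
y_prob = model.predict(X_test)

# Apply a 0.5 threshold to obtain hard 0/1 class labels
y_pred = (y_prob > 0.5).astype(int)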
Deep Neural Network for Predicting House Prices
In the following section, we will explore how to use Keras to build a deep neural network for a regression task: house price prediction.
Dataset
Let’s generate some data for house price prediction and split it for model training and evaluation.
Code
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Generate random data for predicting house prices
np.random.seed(42)

# Simulate some features related to house prices
num_samples = 10000
square_feet = np.random.uniform(500, 5000, num_samples)  # Size of the house in square feet
num_bedrooms = np.random.randint(1, 6, num_samples)      # Number of bedrooms
num_bathrooms = np.random.randint(1, 4, num_samples)     # Number of bathrooms
age_of_house = np.random.randint(0, 100, num_samples)    # Age of the house in years

# Simulate house prices (target variable)
house_price = (square_feet * 150) + (num_bedrooms * 50) + (num_bathrooms * 200) - (age_of_house * 100) + np.random.normal(0, 100, num_samples)

# Create a DataFrame
data = pd.DataFrame({
    'square_feet': square_feet,
    'num_bedrooms': num_bedrooms,
    'num_bathrooms': num_bathrooms,
    'age_of_house': age_of_house,
    'house_price': house_price
})

X = data[['square_feet', 'num_bedrooms', 'num_bathrooms', 'age_of_house']]  # Features
y = data['house_price']  # Target variable

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the data
data.head().style.format({'square_feet': '{:.2f}', 'house_price': '{:.2f}'})
   square_feet  num_bedrooms  num_bathrooms  age_of_house  house_price
0      2185.43             4              3            57    323026.52
1      4778.21             2              3             4    716980.79
2      3793.97             5              1            32    566312.73
3      3193.96             3              2            42    475287.50
4      1202.08             3              3            18    179391.17
Model Building with Keras
Code
import keras

np.random.seed(42)

model = keras.Sequential()
model.add(keras.Input(shape=(len(X_train.columns),)))
model.add(keras.layers.Dense(20, activation='relu'))
model.add(keras.layers.Dense(20, activation='relu'))
model.add(keras.layers.Dense(1))  # Output layer (linear activation is the default)
model.compile(optimizer='adam', loss='mse', metrics=[keras.metrics.RootMeanSquaredError()])
model.fit(X_train, y_train, epochs=10, verbose=0)
<keras.src.callbacks.history.History object at 0x7f9792511510>
y_pred = model.predict(X_test, verbose=0)
print("First few predictions", "\n", y_pred[0:5])
First few predictions
[[300136.47]
[589240.9 ]
[492194.1 ]
[104718.5 ]
[461271.7 ]]
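The fitted regression model can also be scored on the held-out test set. A minimal sketch, reusing X_test and y_test from the split above:

Code

# Evaluate on the test set: returns the MSE loss and the RMSE metric
loss, rmse = model.evaluate(X_test, y_test, verbose=0)
print(f"Test MSE: {loss:.2f}, Test RMSE: {rmse:.2f}")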
Analysis of Training History
The training history can be inspected to determine a suitable number of epochs, which is useful for early stopping.
Code
import matplotlib.pyplot as plt

history = model.fit(X, y, epochs=100, verbose=0, validation_split=0.3)

plt.title("Training History")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.plot(history.history["val_loss"], color="green")
plt.plot(history.history["loss"], color="red");
The graph shows that the loss stops improving after roughly 20 epochs, so setting the number of epochs to 20 would be a reasonable choice. Alternatively, Keras can stop training automatically, as in the sketch below.
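Rather than reading the epoch count off the plot, training can be stopped automatically with Keras's EarlyStopping callback. A minimal sketch under the same setup (the patience value of 5 is an illustrative choice):

Code

from keras.callbacks import EarlyStopping

# Stop training when the validation loss has not improved for 5 consecutive
# epochs, and restore the weights from the best epoch seen so far
early_stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

history = model.fit(X, y, epochs=100, verbose=0, validation_split=0.3, callbacks=[early_stop])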
Deep Neural Network for Predicting Customer Churn
In the following section, we will explore how to use Keras to build a deep neural network for a classification task: predicting customer churn.
Let’s generate some data for customer churn prediction and split it for model training and evaluation.
Code
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Generate random data for predicting customer churn
np.random.seed(42)

# Simulate some features related to customer churn
num_samples = 10000
age = np.random.randint(18, 70, num_samples)            # Age of the customer
income = np.random.uniform(30000, 150000, num_samples)  # Income of the customer
num_products = np.random.randint(1, 5, num_samples)     # Number of products the customer uses
has_complaints = np.random.randint(0, 2, num_samples)   # Whether the customer had any complaints (0 or 1)

# Simulate churn (target variable: 1 for churn, 0 for no churn)
churn = np.random.choice([0, 1], size=num_samples)

# Create a DataFrame
data = pd.DataFrame({
    'age': age,
    'income': income,
    'num_products': num_products,
    'has_complaints': has_complaints,
    'churn': churn
})

X = data[['age', 'income', 'num_products', 'has_complaints']]  # Features
y = data['churn']  # Target variable

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the data
data.head().style.format({'age': '{:.0f}', 'income': '{:.2f}'})
   age     income  num_products  has_complaints  churn
0   56  113031.28             4               1      1
1   69   48702.18             2               0      0
2   46   59213.11             3               0      0
3   32  131063.25             1               1      1
4   60   55116.73             1               0      1
Model Building with Keras
Code
import keras
import tensorflow as tf

tf.random.set_seed(1234)

model = keras.Sequential()
model.add(keras.Input(shape=(len(X_train.columns),)))   # Input layer with the features
model.add(keras.layers.Dense(15, activation='relu'))    # First hidden layer
model.add(keras.layers.Dense(15, activation='relu'))    # Second hidden layer
model.add(keras.layers.Dense(1, activation='sigmoid'))  # Output layer (sigmoid for binary classification)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, verbose=0)
<keras.src.callbacks.history.History object at 0x7f9730679490>
y_pred = model.predict(X_test, verbose=0)
print("First few predictions (probabilities for churn):", "\n", y_pred[0:5])
First few predictions (probabilities for churn):
[[0.9999958 ]
[0.9980307 ]
[0.99970955]
[1. ]
[0.99555945]]
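To judge overall performance, the churn model can likewise be evaluated on the held-out test set. A minimal sketch, reusing X_test and y_test from the split above:

Code

# Evaluate on the test set: returns the binary cross-entropy loss and the accuracy
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test loss: {loss:.4f}, Test accuracy: {accuracy:.4f}")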
Deep Neural Network for Image Classification
Deep neural networks can be used for image classification and object recognition tasks in computer vision.
A digital image consists of a grid of rows and columns, where each cell in the grid is called a pixel. A pixel is the smallest unit of the image and represents a single point in the picture. Each pixel has a color value, which defines its appearance.
For grayscale images, the pixel value is a single number that represents the brightness of the pixel. This value corresponds to a shade of gray, where lower values indicate darker shades, and higher values represent lighter shades.
For colored images, each pixel is defined by three color components: red, green, and blue (RGB). Each of these color components is stored as a separate grayscale image, known as a color plane. Together, these three color planes combine to form the full color representation of each pixel in the image.
Pixel values range from 0 to 255, where 0 represents the darkest value (black) and 255 represents the brightest value (white) in the case of grayscale images. To apply algorithms that learn from images, the images must be represented as numerical data, typically in the form of matrices or tensors. These numerical representations allow machine learning models to process and extract patterns from the images.
Matrix and tensor representation of images: Grayscale images
A dataset of grayscale images is represented as a 3D tensor (equivalent to a 3D NumPy array) with three dimensions:
The first dimension represents the number of samples or images in the dataset.
The second dimension represents the height (number of rows) of the image.
The third dimension represents the width (number of columns) of the image.
Thus, the shape of a grayscale image dataset is represented as (samples, height, width), where samples refers to the number of images, height is the vertical size of each image, and width is the horizontal size of each image.
Let’s randomly generate 2 grayscale images with height=5 and width=5.
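A minimal sketch using NumPy (the pixel values are arbitrary random integers in the 0–255 range):

Code

import numpy as np

np.random.seed(42)

# Two 5x5 grayscale images: shape (samples, height, width) = (2, 5, 5)
images = np.random.randint(0, 256, size=(2, 5, 5))
print(images.shape)  # (2, 5, 5)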
The 3D image dataset with shape (samples, height, width) can be flattened into a 2D dataset by reshaping each image into a vector. We can reshape the dataset using images.reshape(samples, height * width).
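Continuing the sketch above, the two 5x5 images flatten into a 2D array with one 25-element row per image:

Code

samples, height, width = images.shape

# Reshape each 5x5 image into a 25-element vector: shape (2, 25)
flat_images = images.reshape(samples, height * width)
print(flat_images.shape)  # (2, 25)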
An example of a 3D image dataset is MNIST, which contains 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images. The MNIST dataset can be loaded using the load_data() function from the keras.datasets module as follows:
Code
from tensorflow.keras.datasets import mnist

# Load MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Display shapes of the dataset
print(f"Training data shape: {X_train.shape}\nTest data shape: {X_test.shape}")
Training data shape: (60000, 28, 28)
Test data shape: (10000, 28, 28)
Let’s view the first image in the training set.
Code
plt.imshow(X_train[0]);
4D Image Datasets (Color Images):
A dataset of color images is represented as a 4D tensor with the shape (samples, height, width, color_depth). Each image has a height and width, with three channels representing the red, green, and blue components. In other words, each color image consists of three grayscale color planes.
An example of a 4D image dataset is CIFAR-10, which consists of 50,000 32x32 color training images and 10,000 test images, labeled with 10 categories: 0 = airplane; 1 = automobile; 2 = bird; 3 = cat; 4 = deer; 5 = dog; 6 = frog; 7 = horse; 8 = ship; 9 = truck. The CIFAR-10 dataset can be loaded using the .load_data() method from the keras.datasets module.
Code
import ssl

# To avoid SSL errors
ssl._create_default_https_context = ssl._create_unverified_context

# Load CIFAR-10 dataset
from keras.datasets import cifar10
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

print(f"Training data shape: {X_train.shape}\nTest data shape: {X_test.shape}")
Training data shape: (50000, 32, 32, 3)
Test data shape: (10000, 32, 32, 3)
Let’s view the 7th image (index 6).
Code
plt.imshow(X_train[6]);
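To see the three color planes explicitly, the channels of a single CIFAR-10 image can be separated by indexing the last axis. A minimal sketch:

Code

img = X_train[6]  # a single color image with shape (32, 32, 3)

# Each color plane is a 32x32 grayscale image
red_plane = img[:, :, 0]
green_plane = img[:, :, 1]
blue_plane = img[:, :, 2]
print(red_plane.shape)  # (32, 32)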
How to Build a Deep Neural Network with a Grayscale Image Dataset
We will use the MNIST grayscale image dataset from Keras. The images in the MNIST dataset are handwritten digits from 0 to 9.
Image Data Preprocessing
Check the distribution of the output data.
Normalize the data to a smaller range. Since pixel values range from 0 to 255, divide each value by 255 to scale the data to a range of 0 to 1.
Scaling the data helps the algorithm run faster and produces better results. If you scale the training data, ensure the test data is also scaled to maintain similar distributions for better generalization.
Let’s load and plot the training and test datasets.
Code
import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist

# Load MNIST dataset
(X_train_mn, y_train_mn), (X_test_mn, y_test_mn) = mnist.load_data()

# Generate histograms for training and test data
train_hist = np.histogram(y_train_mn, bins=range(11))[0]
test_hist = np.histogram(y_test_mn, bins=range(11))[0]

# Create subplots to display the histograms
fig, ax = plt.subplots(2)
ax[0].set_xticks(range(10))
ax[1].set_xticks(range(10))

# Plot the histograms
ax[0].bar(range(10), train_hist)
ax[0].set_title("Histogram of training output data")
ax[1].bar(range(10), test_hist)
ax[1].set_title("Histogram of test output data")

# Adjust layout for better spacing
fig.tight_layout()

# Show the plot
plt.show()
Code
print(f"Minimum value in the training data: {np.min(X_train_mn)}", "\n",f"Maximum value in the test data: {np.max(X_test_mn)}")
Minimum value in the training data: 0
Maximum value in the test data: 255
Let’s scale the input training and test datasets.
Code
# Scale the data by dividing by 255
X_train_mn = X_train_mn / 255
X_test_mn = X_test_mn / 255

# Minimum and maximum value of scaled pixel values
print(f"Minimum value in the scaled training data: {np.min(X_train_mn)}\n"
      f"Maximum value in the scaled test data: {np.max(X_test_mn)}")
Minimum value in the scaled training data: 0.0
Maximum value in the scaled test data: 1.0
Let’s transform the output data to categorical type.
Code
import tensorflow as tf

# Transform integer labels into one-hot encodings
# for example: 3 to [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_train_mn = tf.keras.utils.to_categorical(y_train_mn)
y_test_mn = tf.keras.utils.to_categorical(y_test_mn)

# Minimum and maximum value in the categorical data
print(f"Minimum value in the categorical training data: {np.min(y_train_mn)}\n"
      f"Maximum value in the categorical test data: {np.max(y_test_mn)}")
Minimum value in the categorical training data: 0.0
Maximum value in the categorical test data: 1.0
Shape of training and test data
Code
# Display shapes of the dataset
print(f"Shape of input training data: {X_train_mn.shape}\n"
      f"Shape of input test data: {X_test_mn.shape}")
Shape of input training data: (60000, 28, 28)
Shape of input test data: (10000, 28, 28)
Flatten the 3D grayscale images to 2D
Code
# Unpack the 3D data to 2D by reshaping
X_train_mn = X_train_mn.reshape(60000, 28*28)
X_test_mn = X_test_mn.reshape(10000, 28*28)

# Shape of 2D input data
print(f"Shape of 2D input training data: {X_train_mn.shape}\n"
      f"Shape of 2D input test data: {X_test_mn.shape}")
Shape of 2D input training data: (60000, 784)
Shape of 2D input test data: (10000, 784)
The shape of the output datasets
Code
# Shape of the one-hot encoded output data
print(f"Shape of the output training data: {y_train_mn.shape}\n"
      f"Shape of the output test data: {y_test_mn.shape}")
Shape of the output training data: (60000, 10)
Shape of the output test data: (10000, 10)
Build a Keras Model for Image Classification
Code
n_features = X_train_mn.shape[1]

# Make results reproducible by setting a random seed
tf.random.set_seed(42)

# Initializing the model
model = keras.Sequential()

# Input layer
model.add(keras.Input(shape=(n_features,)))

# Adding a Dense layer with 512 units and ReLU activation
model.add(keras.layers.Dense(512, activation="relu"))

# Adding the output layer with softmax activation for multiclass classification
# The number of units should match the number of classes (10 for MNIST)
model.add(keras.layers.Dense(10, activation="softmax"))

# Compile the model: specify the optimizer, loss function, and evaluation metrics
# Use the categorical_crossentropy loss function for multiclass classification
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=["accuracy"])

# Train the model with the training data
model.fit(X_train_mn, y_train_mn, epochs=5, batch_size=128, verbose=0);

model.summary()
# Evaluate the model on the test set
loss, accuracy = model.evaluate(X_test_mn, y_test_mn, verbose=0)

# Print the evaluation results
print(f"Test loss: {loss}")
Test loss: 0.07665430009365082
Code
print(f"Test accuracy: {accuracy}")
Test accuracy: 0.9771999716758728
Using the Model for Prediction
Code
# Make predictions on the test set
y_pred = model.predict(X_test_mn, verbose=0)

# Print the first few predictions
print("First few predictions (probabilities):\n", y_pred[:5])
# To get the predicted class labels (i.e., the index of the highest probability):
y_pred_labels = np.argmax(y_pred, axis=1)

# Print the first few predicted labels
print("First few predicted labels:\n", y_pred_labels[:5])
First few predicted labels:
[7 2 1 0 4]
Code
# To get the integer (index) label for the actual test output
y_labels = np.argmax(y_test_mn, axis=1)

# Print the first few actual output test labels
print("First few actual output test labels:\n", y_labels[:5])
First few actual output test labels:
[7 2 1 0 4]
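As a quick consistency check, the predicted labels can be compared with the actual labels to recompute the test accuracy by hand. A minimal sketch:

Code

# Fraction of test images whose predicted label matches the actual label
manual_accuracy = np.mean(y_pred_labels == y_labels)
print(f"Manually computed test accuracy: {manual_accuracy:.4f}")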
Summary
This lesson focuses on applying Deep Neural Networks to practical regression and classification tasks using the Keras and TensorFlow frameworks. It provides hands-on examples, starting with the creation of DNN models for predicting house prices and customer churn. In these cases, the lesson guides you through model building using Keras, from initializing models to adding input and hidden layers, compiling the model, training it, and evaluating its performance. The process is demonstrated with practical datasets, where Keras is used for both regression (house price prediction) and binary classification (customer churn prediction).
Additionally, the lesson delves into using DNNs for image classification, explaining how to work with image datasets like MNIST and CIFAR-10. It discusses how images are represented as numerical data (tensors), and it details how grayscale and color images are handled differently in deep learning models. The lesson also covers preprocessing techniques such as normalizing pixel values and reshaping image data into 2D arrays. Finally, it walks through building and evaluating a model for classifying handwritten digits in the MNIST dataset, providing step-by-step instructions on data preprocessing, model building, training, and evaluation.