Lesson 7: Pre-trained Models and Transfer Learning


Instead of building a CNN from scratch, we can use a pre-trained network to predict new image samples. Pre-trained deep neural networks are useful for predicting samples that belong to the same classes the network was originally trained on.

Pre-trained models save computational time and resources as they are trained on vast amounts of data. These models can be fine-tuned for specific tasks, allowing them to generalize well to new data.

Examples of pre-trained models in Keras include:

  • Xception
  • VGG16
  • VGG19
  • ResNet50
  • ResNet101
  • And more

For a full list of pre-trained models, refer to the official Keras documentation: https://keras.io/api/applications/.
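Each of these architectures can be instantiated directly from keras.applications. A minimal sketch (assuming the ImageNet weights can be downloaded on first use):

Code
import keras

# Any of the listed models can be created the same way; the ImageNet
# weights are downloaded automatically the first time they are requested
xception_model = keras.applications.Xception(weights="imagenet")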


Steps for Using a Pre-trained Model

A pretrained model can be used without tuning as follows:

  1. Load the data
  2. Preprocess the data
  3. Initialize the pre-trained model
  4. Make predictions with the model
  5. Find the object predicted with the highest probability

The ResNet50 Pre-trained Model

Let’s use the ResNet50 pre-trained model to make predictions. ResNet-50 is a convolutional neural network that is 50 layers deep and trained on over a million images across 1000 categories, including objects like keyboards, mice, pencils, and various animals.

The data used for ResNet-50 has a shape of (224, 224, 3), so we need to ensure that the image sample we want to predict is resized to the same shape.

Loading the Image to be Predicted

Let’s load and display the image that will be predicted with the pre-trained model.

Code
import matplotlib.pyplot as plt
from keras.preprocessing.image import load_img, img_to_array

# Load the image
orange = load_img("orange.png", color_mode="rgb", target_size=(224, 224))

# Convert to array
orange_array = img_to_array(orange)

# Display the image
plt.imshow(orange_array.astype('uint8')); 
plt.axis('off');  # Hide the axes
(-0.5, 223.5, 223.5, -0.5)

Prepare the Image to be Predicted

The ResNet model expects 4-dimensional input of shape (batch_size, height, width, channels), so we need to make sure the image to be predicted is 4D. Let’s first check the dimensions of the image.

Code
import keras
from keras.preprocessing.image import load_img, img_to_array

orange_arr = keras.preprocessing.image.img_to_array(orange)
print("Image Shape", orange_arr.shape)
Image Shape (224, 224, 3)

Let’s reshape the image from 3D to 4D

Code
orange_arr = orange_arr.reshape(1, 224, 224, 3)
orange_arr.shape
(1, 224, 224, 3)
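Equivalently, the batch dimension can be added with np.expand_dims instead of reshape; the sketch below uses a separate variable (orange_4d) purely for illustration.

Code
import numpy as np

# Equivalent to the reshape above: add a batch dimension at axis 0,
# turning (224, 224, 3) into (1, 224, 224, 3)
orange_4d = np.expand_dims(img_to_array(orange), axis=0)
print(orange_4d.shape)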

Next, let’s apply ResNet50’s preprocessing function to the image array.

Code
import keras

orange_image = keras.applications.resnet50.preprocess_input(orange_arr)

Initialize the ResNet50 Pretrained Model

Code
# initialize the pretrained model
resnet_model = keras.applications.ResNet50()
#resnet_model.summary() 

Use the ResNet50 Pretrained Model for Prediction

Code
import tensorflow as tf

# Disable TensorFlow logging
tf.get_logger().setLevel('ERROR')

# make a prediction
y_pred = resnet_model.predict(orange_image, verbose=0);

# print the top 2 probabilities with the corresponding predicted images
results = keras.applications.resnet50.decode_predictions(y_pred, top=2)
print(results)
[[('n07747607', 'orange', 0.94502777), ('n07749582', 'lemon', 0.04975761)]]

The results show the top two predicted labels with their probabilities. Orange has the highest probability, so we can classify the image as an orange.
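Because decode_predictions returns a nested list of (class_id, label, probability) tuples sorted by probability, the top prediction can also be extracted programmatically, for example:

Code
# results[0] holds the predictions for the first (and only) image in the batch;
# the first tuple is the most probable class
top_class_id, top_label, top_prob = results[0][0]
print(f"Predicted label: {top_label} (probability {top_prob:.3f})")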

Building a Model with Images in a Directory

Let’s first build a model from scratch using image data from a directory. Then, we will compare this approach with fine-tuning a pre-trained model instead of building a model entirely from scratch.

Preparing and Loading the Image Data

If the training and test sets are stored in a directory on your computer, we can load the data and use it for model training or transfer learning, as shown in this section. You can create a training_set folder and a test_set folder inside a “data” folder. The data folder sits at the first level, inside the project directory.

Training and Test Folders

Assuming the data is binary and consists of cars and flowers, you should create a car folder and a flower folder containing the respective car and flower images for both the training and test sets. For example, the diagram below shows the car and flower folders inside the training_set.

Training Set Folder
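In plain text, the overall layout described above looks like this:

data/
├── training_set/
│   ├── car/
│   └── flower/
└── test_set/
    ├── car/
    └── flower/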

The diagram below shows the training flower images inside the flower folder.

Training Set Images

Read the training and test image datasets

Code
import os
base_dir = "../data"
train_dir = os.path.join(base_dir, "training_set")
test_dir = os.path.join(base_dir, "test_set")
# directory with training car images
train_car_dir = os.path.join(train_dir, "car")
# directory with test car images
test_car_dir = os.path.join(test_dir, "car")
# directory with training flower images
train_flower_dir = os.path.join(train_dir, "flower")
# directory with test flower images
test_flower_dir = os.path.join(test_dir, "flower")

print("Does the path exist? ", os.path.exists(train_flower_dir))
Does the path exist?  True
Code
# Create data generators for training and testing
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,             # Normalize pixel values
    rotation_range=40,          # Random rotation
    width_shift_range=0.2,      # Random horizontal shift
    height_shift_range=0.2,     # Random vertical shift
    shear_range=0.2,            # Random shear
    zoom_range=0.2,             # Random zoom
    horizontal_flip=True        # Random horizontal flip
)

test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)  # Only rescale for test data

Image Data Preprocessing and Batching

Next, we re-create the data generators with rescaling only and prepare the image data for training by applying the following steps:

  • Rescale pixel values to a range between 0 and 1.
  • Resize images to 64x64 pixels.
  • Load images in batches of 20 from the specified directory.

This data is then ready to be fed into a machine learning model for training.
Code
# Create data generators for rescaling
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

# Load training images in batches of 20 with resizing to 64x64 pixels
train_set = train_datagen.flow_from_directory(
    train_dir,
    target_size=(64, 64),
    batch_size=20,
    class_mode='binary'
)
Found 2000 images belonging to 2 classes.
Code
# Load test images in batches of 20 with resizing to 64x64 pixels
test_set = test_datagen.flow_from_directory(
    test_dir,
    target_size=(64, 64),
    batch_size=20,
    class_mode='binary'
)
Found 2000 images belonging to 2 classes.
Code
# Print both test and train image shapes in a single print statement 
print(f"Test image shape: {test_set.image_shape}\nTrain image shape: {train_set.image_shape}")
Test image shape: (64, 64, 3)
Train image shape: (64, 64, 3)
Note
  • After an image is processed by ImageDataGenerator using its flow_from_directory() method, train_set.image_shape will show the shape of one image in the dataset, not the shape of a whole batch.
  • ImageDataGenerator is lazy: flow_from_directory() only indexes the image files in the directory.
  • Batches of images are loaded into memory on demand as the model requests them during training, and each batch in turn is fed to the model.
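To see this in action, one batch can be drawn from the generator manually; with the settings above, each batch should contain 20 images of shape 64×64×3.

Code
# Pull a single batch from the training generator and inspect its shape
batch_images, batch_labels = next(train_set)
print("Batch images shape:", batch_images.shape)  # expected: (20, 64, 64, 3)
print("Batch labels shape:", batch_labels.shape)  # expected: (20,)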

Model Building

After preparing the image data and converting it into a suitable numerical tensor format, we can now initialize a deep learning model, define its architecture, compile it, and fit it as follows.

Code
import tensorflow as tf
import keras
from keras import layers

import warnings

# Suppress all warnings globally
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
tf.random.set_seed(1234)

# Initialize the Sequential model
model = keras.Sequential()

# Add the Input layer to specify the input shape
model.add(layers.Input(shape=(64, 64, 3)))  # Set input shape to 64x64 images with 3 color channels (RGB)

# Add the first convolutional layer with 28 filters and 3x3 kernel
model.add(layers.Conv2D(28, (3, 3), activation='relu', padding="same"))

# Add the first MaxPooling layer
model.add(layers.MaxPooling2D((2, 2)))

# Add the second convolutional layer with 64 filters and 3x3 kernel
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

# Add the second MaxPooling layer
model.add(layers.MaxPooling2D((2, 2)))

# Flatten the output to connect to fully connected layers
model.add(layers.Flatten())

# Dropout layer to drop 50% of the neurons during training to prevent overfitting
model.add(layers.Dropout(0.5))

# Add a Dense fully connected layer with 512 units
model.add(layers.Dense(512, activation='relu'))

# Add the output layer with a sigmoid activation for binary classification
model.add(layers.Dense(1, activation='sigmoid'))

# Compile the model with Adam optimizer, binary cross-entropy loss, and accuracy metric
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Fit the model to the training data (train_set) with validation from the test data (test_set)
model.fit(
    train_set,
    steps_per_epoch=100,  # steps * batch_size = 2000
    epochs=5,
    validation_data=test_set,
    validation_steps=100,
    shuffle=False, 
    verbose=0
)
<keras.src.callbacks.history.History object at 0x7fddb4f39d10>
Code
model.summary();
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d (Conv2D)                 │ (None, 64, 64, 28)     │           784 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d (MaxPooling2D)    │ (None, 32, 32, 28)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_1 (Conv2D)               │ (None, 30, 30, 64)     │        16,192 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_1 (MaxPooling2D)  │ (None, 15, 15, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ flatten (Flatten)               │ (None, 14400)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 14400)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 512)            │     7,373,312 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 1)              │           513 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 22,172,405 (84.58 MB)
 Trainable params: 7,390,801 (28.19 MB)
 Non-trainable params: 0 (0.00 B)
 Optimizer params: 14,781,604 (56.39 MB)
Note
  • Epoch: One full pass through the entire dataset. The number of epochs determines how many times the algorithm sees the dataset.
  • Batch Size: Defines how many examples are used to update parameters at a time.
  • Steps per Epoch: The number of iterations (or batches) required to process the entire dataset, calculated as the size of the dataset divided by the batch size.
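For example, with the 2,000 training images found above and a batch size of 20, one epoch requires 2000 / 20 = 100 steps, which is why steps_per_epoch=100 was used in the fit() call.

Code
import math

n_images = 2000    # training images found by flow_from_directory
batch_size = 20
steps_per_epoch = math.ceil(n_images / batch_size)
print(steps_per_epoch)  # 100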

Evaluate the model

Code
# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(test_set, steps=100, verbose=0)

# Print the evaluation results in a single line with newlines
print(f"Test Loss: {test_loss}\nTest Accuracy: {test_accuracy}")
Test Loss: 0.42355963587760925
Test Accuracy: 0.8125

Use the model for prediction

Code
# Access the class labels (flower and car)
class_labels = train_set.class_indices
print("Class Labels:", class_labels)
Class Labels: {'car': 0, 'flower': 1}
Code
sample_images, sample_labels = next(test_set)  # Get a batch of images and labels from the test_set

# Predict the label for the second sample
y_pred = model.predict(sample_images[1:2], verbose=0)
print("True Label: ", sample_labels[1])
True Label:  0.0
Code
print(y_pred)
[[0.17771624]]
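The sigmoid output is the estimated probability of class 1 (flower, according to the class_indices mapping above). Since roughly 0.18 is below 0.5, the sample is classified as a car, which matches its true label of 0. A small sketch using the class_labels dictionary defined earlier makes this mapping explicit:

Code
# Map the sigmoid probability back to a class name using a 0.5 threshold
index_to_class = {v: k for k, v in class_labels.items()}  # {0: 'car', 1: 'flower'}
predicted_index = int(y_pred[0][0] > 0.5)
print("Predicted Label:", index_to_class[predicted_index])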

Transfer Learning

Transfer learning involves using a pre-trained model and adapting it to a new dataset. A CNN model consists of two parts: the convolutional base (which captures generic features) and the classifier (ANN). We can retain the convolutional base from a pre-trained model and replace the classifier to suit the new task. This approach allows us to freeze the convolutional layers while modifying the classifier to predict specific categories, such as adapting an animal classifier to predict only cats and dogs.
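In Keras, one common way to do this is to load only the convolutional base (by passing include_top=False) and stack a new classifier on top of it. The sketch below illustrates that idiom; in the rest of this lesson we instead load the full VGG16 model and copy over every layer except its original output layer.

Code
import keras
from keras import layers

# Load only the convolutional base of VGG16 (no fully connected classifier)
conv_base = keras.applications.VGG16(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)

# Freeze the base so its weights are not updated during fine-tuning
conv_base.trainable = False

# Stack a new binary classifier (car vs. flower) on top of the frozen base
new_model = keras.Sequential([
    conv_base,
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),
])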

How to Fine-Tune a Pre-trained Model

Instead of training a model from scratch, we can leverage a pre-trained model to initialize the model with learned features. This speeds up convergence by reusing previously learned representations. We freeze the earlier layers to preserve the general features, while fine-tuning the later layers (e.g., the output layer) to adapt the model to the new task.

In this case, we will fine-tune a pre-trained VGG16 model to classify objects as either a car or a flower, modifying it from its original 1000 categories to just two.

Steps for Fine-tuning the VGG16 Model

  1. Initializing a Pre-trained Model
    We will begin by loading the VGG16 model with pre-trained weights. Its original 1000-class output layer will later be replaced with a custom output layer.

  2. Initializing the Sequential Model
    A Sequential model will be used to stack layers in the desired order.

  3. Adding Layers of the Pre-trained Model
    We will add all layers of the pre-trained VGG16 model to the Sequential model except for the last layer, which is the classifier layer (since we will be modifying it).

  4. Freezing the Initial Layers
    The initial layers of the pre-trained model will be frozen, meaning their weights will not be updated during training. This allows us to retain the feature extraction capabilities learned from the large dataset the model was initially trained on.

  5. Adding an Output Layer
    A new output layer will be added to the frozen layers, adjusted to predict only two categories (car and flower).

  6. Compiling the Network
    The model will be compiled with an appropriate optimizer, loss function, and evaluation metrics.

  7. Fitting the Model with Additional Data
    The model will be trained (fine-tuned) with additional data, which could be augmented to improve generalization.

  8. Using the Model to Make Predictions
    Finally, we will use the trained model to make predictions on new examples, classifying them as either a car or a flower.

Code
# initialize the pre-trained vgg16 model
import tensorflow as tf
import keras

tf.random.set_seed(1234)
vgg16_model = keras.applications.VGG16()
vgg16_model.summary() 
Model: "vgg16"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer_2 (InputLayer)      │ (None, 224, 224, 3)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block1_conv1 (Conv2D)           │ (None, 224, 224, 64)   │         1,792 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block1_conv2 (Conv2D)           │ (None, 224, 224, 64)   │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block1_pool (MaxPooling2D)      │ (None, 112, 112, 64)   │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block2_conv1 (Conv2D)           │ (None, 112, 112, 128)  │        73,856 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block2_conv2 (Conv2D)           │ (None, 112, 112, 128)  │       147,584 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block2_pool (MaxPooling2D)      │ (None, 56, 56, 128)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block3_conv1 (Conv2D)           │ (None, 56, 56, 256)    │       295,168 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block3_conv2 (Conv2D)           │ (None, 56, 56, 256)    │       590,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block3_conv3 (Conv2D)           │ (None, 56, 56, 256)    │       590,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block3_pool (MaxPooling2D)      │ (None, 28, 28, 256)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block4_conv1 (Conv2D)           │ (None, 28, 28, 512)    │     1,180,160 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block4_conv2 (Conv2D)           │ (None, 28, 28, 512)    │     2,359,808 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block4_conv3 (Conv2D)           │ (None, 28, 28, 512)    │     2,359,808 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block4_pool (MaxPooling2D)      │ (None, 14, 14, 512)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block5_conv1 (Conv2D)           │ (None, 14, 14, 512)    │     2,359,808 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block5_conv2 (Conv2D)           │ (None, 14, 14, 512)    │     2,359,808 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block5_conv3 (Conv2D)           │ (None, 14, 14, 512)    │     2,359,808 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block5_pool (MaxPooling2D)      │ (None, 7, 7, 512)      │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ flatten (Flatten)               │ (None, 25088)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ fc1 (Dense)                     │ (None, 4096)           │   102,764,544 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ fc2 (Dense)                     │ (None, 4096)           │    16,781,312 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ predictions (Dense)             │ (None, 1000)           │     4,097,000 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 138,357,544 (527.79 MB)
 Trainable params: 138,357,544 (527.79 MB)
 Non-trainable params: 0 (0.00 B)

We aim to transfer all layers of the pre-trained model, except for its output layer, to a new Sequential model that will become our classifier.

Code
# Initialize the classifier model
model = tf.keras.Sequential()

# Add all pre-trained layers to the Sequential model,
# excluding the pre-trained output (prediction) layer
for layer in vgg16_model.layers[:-1]:
    model.add(layer)

model.summary() 
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ block1_conv1 (Conv2D)           │ (None, 224, 224, 64)   │         1,792 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block1_conv2 (Conv2D)           │ (None, 224, 224, 64)   │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block1_pool (MaxPooling2D)      │ (None, 112, 112, 64)   │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block2_conv1 (Conv2D)           │ (None, 112, 112, 128)  │        73,856 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block2_conv2 (Conv2D)           │ (None, 112, 112, 128)  │       147,584 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block2_pool (MaxPooling2D)      │ (None, 56, 56, 128)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block3_conv1 (Conv2D)           │ (None, 56, 56, 256)    │       295,168 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block3_conv2 (Conv2D)           │ (None, 56, 56, 256)    │       590,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block3_conv3 (Conv2D)           │ (None, 56, 56, 256)    │       590,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block3_pool (MaxPooling2D)      │ (None, 28, 28, 256)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block4_conv1 (Conv2D)           │ (None, 28, 28, 512)    │     1,180,160 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block4_conv2 (Conv2D)           │ (None, 28, 28, 512)    │     2,359,808 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block4_conv3 (Conv2D)           │ (None, 28, 28, 512)    │     2,359,808 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block4_pool (MaxPooling2D)      │ (None, 14, 14, 512)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block5_conv1 (Conv2D)           │ (None, 14, 14, 512)    │     2,359,808 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block5_conv2 (Conv2D)           │ (None, 14, 14, 512)    │     2,359,808 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block5_conv3 (Conv2D)           │ (None, 14, 14, 512)    │     2,359,808 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block5_pool (MaxPooling2D)      │ (None, 7, 7, 512)      │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ flatten (Flatten)               │ (None, 25088)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ fc1 (Dense)                     │ (None, 4096)           │   102,764,544 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ fc2 (Dense)                     │ (None, 4096)           │    16,781,312 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 134,260,544 (512.16 MB)
 Trainable params: 134,260,544 (512.16 MB)
 Non-trainable params: 0 (0.00 B)

You will notice from the summary of the classifier model that the output layer, named predictions (Dense) in the pre-trained model summary, is not included in this sequential model with the added layers.

Now, let’s freeze the transferred layers of the new sequential model and add the output layer with a sigmoid activation function.

Code
# Freeze the layers transferred to the new sequential model
for layer in model.layers:
    layer.trainable = False

# Add the output layer with a sigmoid activation
model.add(layers.Dense(1, activation="sigmoid"))

# Optionally, display the model summary
model.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ block1_conv1 (Conv2D)           │ (None, 224, 224, 64)   │         1,792 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block1_conv2 (Conv2D)           │ (None, 224, 224, 64)   │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block1_pool (MaxPooling2D)      │ (None, 112, 112, 64)   │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block2_conv1 (Conv2D)           │ (None, 112, 112, 128)  │        73,856 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block2_conv2 (Conv2D)           │ (None, 112, 112, 128)  │       147,584 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block2_pool (MaxPooling2D)      │ (None, 56, 56, 128)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block3_conv1 (Conv2D)           │ (None, 56, 56, 256)    │       295,168 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block3_conv2 (Conv2D)           │ (None, 56, 56, 256)    │       590,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block3_conv3 (Conv2D)           │ (None, 56, 56, 256)    │       590,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block3_pool (MaxPooling2D)      │ (None, 28, 28, 256)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block4_conv1 (Conv2D)           │ (None, 28, 28, 512)    │     1,180,160 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block4_conv2 (Conv2D)           │ (None, 28, 28, 512)    │     2,359,808 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block4_conv3 (Conv2D)           │ (None, 28, 28, 512)    │     2,359,808 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block4_pool (MaxPooling2D)      │ (None, 14, 14, 512)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block5_conv1 (Conv2D)           │ (None, 14, 14, 512)    │     2,359,808 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block5_conv2 (Conv2D)           │ (None, 14, 14, 512)    │     2,359,808 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block5_conv3 (Conv2D)           │ (None, 14, 14, 512)    │     2,359,808 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ block5_pool (MaxPooling2D)      │ (None, 7, 7, 512)      │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ flatten (Flatten)               │ (None, 25088)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ fc1 (Dense)                     │ (None, 4096)           │   102,764,544 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ fc2 (Dense)                     │ (None, 4096)           │    16,781,312 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 1)              │         4,097 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 134,264,641 (512.18 MB)
 Trainable params: 4,097 (16.00 KB)
 Non-trainable params: 134,260,544 (512.16 MB)

You will notice that the new output (prediction) layer has been added and that only its 4,097 parameters are trainable; the transferred layers remain frozen.

Prepare the Data and Use it for Fine-tuning

Ensure the image shape matches the input shape expected by the pre-trained model, and re-scale the image data.

Code
# Create data generators for rescaling
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

# Load training images in batches of 20 with resizing to 224x224 pixels
train_set = train_datagen.flow_from_directory(
    train_dir,
    target_size=(224, 224),
    batch_size=20,
    class_mode='binary'
)
Found 2000 images belonging to 2 classes.
Code
# Load test images in batches of 20 with resizing to 224x224 pixels
test_set = test_datagen.flow_from_directory(
    test_dir,
    target_size=(224, 224),
    batch_size=20,
    class_mode='binary'
)
Found 2000 images belonging to 2 classes.
Code
# Print both test and train image shapes in a single print statement 
print(f"Test image shape: {test_set.image_shape}\nTrain image shape: {train_set.image_shape}")
Test image shape: (224, 224, 3)
Train image shape: (224, 224, 3)

Fine-tune the Pre-trained Model

Code
# Compile the model
model.compile(
    optimizer='rmsprop',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Fit the model with the car-flower image data
model.fit(
    train_set,
    steps_per_epoch=20,  
    epochs=1,
    validation_data=test_set,
    validation_steps=20,
    shuffle=False,
    verbose=0
)
<keras.src.callbacks.history.History object at 0x7fddafffb750>
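If desired, the fine-tuned model can be evaluated on the test generator in the same way as the model built from scratch; a sketch (the exact numbers will depend on the run):

Code
# Evaluate the fine-tuned model on the test set
test_loss, test_accuracy = model.evaluate(test_set, steps=20, verbose=0)
print(f"Test Loss: {test_loss}\nTest Accuracy: {test_accuracy}")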

Load an Image to be Predicted

Code
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt  # needed to display the image

# Load the new image
new_image = tf.keras.preprocessing.image.load_img(
    "car.png",
    color_mode="rgb",
    target_size=(224, 224)
)

# Convert the image to a numpy array
new_image_arr = tf.keras.preprocessing.image.img_to_array(new_image)

# Expand dimensions to match the model input
new_image_array = np.expand_dims(new_image_arr, axis=0)

# Display the image
plt.imshow(new_image_array[0].astype('uint8'))  # Use new_image_array[0] to get the image itself
plt.axis('off')  # Hide the axes
(-0.5, 223.5, 223.5, -0.5)
Code
plt.show()

Use the Fine-tuned Model for Prediction

Code
## inspect the classes in the dataset
classes = train_set.class_indices
print("Classes in the dataset: ", classes)
Classes in the dataset:  {'car': 0, 'flower': 1}
Code
# make a prediction
y_pred = model.predict(new_image_array, verbose=0)[0][0]
print("Estimated Probability: ", y_pred)
Estimated Probability:  0.00031952973
Code
# classify the image based on the estimated probability
if y_pred > 0.5:
    print("Predicted Label: ", "Flower")
else:
    print("Predicted Label: ", "Car")
Predicted Label:  Car

Use the Pre-trained Model for Prediction (No Fine-tuning)

Code
import tensorflow as tf
import keras

# initialize the pretrained model
VGG16_model = keras.applications.VGG16()

# Disable TensorFlow logging
tf.get_logger().setLevel('ERROR')

# make a prediction
y_pred = VGG16_model.predict(new_image_array, verbose=0)

# print the top 5 probabilities with the corresponding predicted images
results = keras.applications.vgg16.decode_predictions(y_pred, top=5);
for i in results:
    print(i)
[('n03924679', 'photocopier', 0.48023644), ('n04590129', 'window_shade', 0.101187), ('n04554684', 'washer', 0.07467646), ('n04239074', 'sliding_door', 0.062271174), ('n04005630', 'prison', 0.033811044)]

We can see that the pre-trained model classifies the image as a photocopier because it is a general multi-class model with 1,000 labels or categories. However, fine-tuning the pre-trained model enables it to make more accurate predictions. The fine-tuned model correctly identified the image as a car.

Summary

This lesson explains how to use pre-trained models to perform tasks like image classification without building models from scratch. Pre-trained models, such as ResNet50 and VGG16, are beneficial because they are already trained on large datasets, saving time and computational resources. These models can be used directly or fine-tuned to adapt to specific tasks. The process involves loading data, preprocessing it, and using a pre-trained model to predict images, such as classifying objects like cars or flowers.

The lesson covers detailed steps for using a pre-trained model like ResNet50 to classify images, including image preprocessing, reshaping images to match the model’s input format, and making predictions. It also introduces transfer learning, where pre-trained models are adjusted for new tasks by adding custom output layers and freezing the convolutional base. Through an example using the VGG16 model, students learn how to replace the output layer for binary classification tasks like distinguishing between cars and flowers. By leveraging pre-trained networks, users can save resources and build efficient models for specific applications with minimal training. Fine-tuning allows users to freeze the initial layers of a pre-trained model, modify the output layer, and adapt the model to new data.