Learning Deep Learning Week 2
Week 13 of documenting my AI/ML learning journey (Jan 5 - Jan 11)
What was discussed last week…
I learned how Keras’ Functional API worked, and how it can be more flexible than its Sequential API in certain cases.
And in Keras’ Functional API, I learned how to make common types of layers, including Dense, Dropout, and even custom layers.
Hello!
This week, I found what I think is the first error I’ve encountered in the course! If you are familiar with Python and how it deals with OOP (which you should be), then you may be able to tell what is wrong with the example code used in the course. I discuss the mechanics of this code later, in Tuesday’s entry.
And in Saturday’s entry, you’ll see why I made the thumbnail of this week’s post that way.

Can you see what’s wrong?
Tuesday, January 7th
Phew!
Today was a long day with school and extracurriculars (clubs), so I didn’t have much time for learning DL today.
Today, though, I learned how to implement data augmentation through Keras’ ImageDataGenerator class. For a refresher, data augmentation is when training data is artificially made more “difficult” for the model to predict correctly through random:
Rotations
Scaling
Flipping/Reflections
Translations
and Noise
Little did I know, the ImageDataGenerator class is also commonly used to simply load and preprocess data—it doesn’t always need to augment training data—I would figure that out later (Friday)…
To import Keras’ ImageDataGenerator class, it’s quite simple:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
And to generate different augmented versions of an image, you’ll have to make an object instance of the ImageDataGenerator class (also called a “datagen”).
Think of these different datagens as different “filters” or “flavors” of augmenting an image—for those who know CSS, making datagens is similar to making styles!
datagen = ImageDataGenerator(
    # The image can be rotated up to +/- 40 degrees
    rotation_range=40,
    # The image can be translated horizontally by up to 20% of its width
    width_shift_range=0.2,
    # The image can be translated vertically by up to 20% of its height
    height_shift_range=0.2,
    # A shear transformation of up to 0.2 radians (~11.5 degrees) can be applied to the image
    shear_range=0.2,
    # The image can be zoomed in or out by up to 20%
    zoom_range=0.2,
    # The image may be randomly flipped horizontally
    horizontal_flip=True,
    # Missing pixels (created by the other transformations) are filled with the value of the nearest pixel in the image
    fill_mode='nearest'
)
Applying a datagen to a set of training images is pretty easy to do, but keep in mind that it expects a batch dimension, which the np.expand_dims() method (or stacking the images into a single NumPy array) takes care of. Also, to generate images using the datagen, the .flow() method is used.
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import img_to_array, array_to_img

# Load and preprocess the dataset
# x_train (not shown) contains training images from the CIFAR-10 dataset; this specific image shows a car
images = [x_train[5]]

# Make all of the training images into arrays, then stack them into one array,
# which adds the batch dimension (equivalent to np.expand_dims for a single image)
training_images = []
for image in images:
    training_images.append(img_to_array(image))
training_images = np.array(training_images)

# Generate and visualize four augmented images
i = 0
for batch in datagen.flow(training_images, batch_size=1):
    plt.figure(i)
    plt.imshow(array_to_img(batch[0]))
    plt.title(f'Augmented Image {i + 1}')
    i += 1
    if i % 4 == 0:
        break
plt.show()
Here’s what an example output of this code would look like:

Remember, the augmentation transformations are all randomized.
And it doesn’t stop there! You can even make your own data augmentation function using methods and tools from other libraries (in this case, from numpy):
# Define a custom data augmentation function
# (a noise standard deviation of 0.1 assumes pixel values scaled to [0, 1])
def add_random_noise(image):
    noise = np.random.normal(0, 0.1, image.shape)
    return image + noise

# Create an instance of ImageDataGenerator with the custom augmentation
datagen = ImageDataGenerator(preprocessing_function=add_random_noise)
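Here’s a quick sketch of how this noise datagen could be used, reusing x_train and the plotting imports from earlier, and assuming the image is rescaled to [0, 1] so a noise standard deviation of 0.1 is actually visible:
# Rescale the image to [0, 1] and add a batch dimension
x = np.expand_dims(img_to_array(x_train[5]) / 255.0, axis=0)

# The preprocessing_function is applied to each image as it comes out of the generator
for batch in datagen.flow(x, batch_size=1):
    # Clip so the added noise stays within a valid pixel range before displaying
    plt.imshow(np.clip(batch[0], 0.0, 1.0))
    plt.show()
    break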
And to those who read the intro, here’s why the code sample was wrong (according to ChatGPT 4o):

I think this is incorrect—lmk what you guys think in the comments!
Thursday, January 9th
Today, I learned how transfer learning works in Keras: it means “reusing” another successful model’s parameters and settings (i.e. a pre-trained model) as the starting point for a new model built for the specific context of your problem. This essentially allows the knowledge from a previous model to be applied to a similar problem, which is especially useful because it:
Reduces training time
Improves performance
Works with smaller datasets
Requires less computational power (no need to get that new, expensive GPU!)
Makes efficient use of learned features (feature extraction)
And many other benefits!
Note that these pre-trained models are trained on large (and I mean LARGE) datasets of general, everyday objects, like ImageNet. Thus, a pre-trained model’s primary use is to serve as a tool for feature extraction (determining which features or characteristics of an image differentiate the classes most effectively), and it requires further fine-tuning in order to be trained towards its specific problem’s context. So, the relationship between transfer learning and pre-trained models can be described in a nice little formula:
Transfer Learning = Pre-Trained Model + Fine-Tuning
So, how do we access these awesome pre-trained models?
Loading a PT Model, part 1
from tensorflow.keras.applications import [pt_model_name]
It’s that simple! The applications module of Keras includes a collection of pre-trained deep learning models for image classification; some notable examples include:
VGG16: A general CNN architecture with 16 layers
EfficientNetV2B0: Has a very flexible input shape of (None, None, 3) (basically any image)
InceptionV3: Has well-designed convolution modules that generate discriminative features while reducing parameters
And to actually load the pre-trained model, the function to do so is intuitive:
base_model = VGG16(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)
But even after you import and load the pre-trained model, its layers can still be “re-trained”, so the layers have to be “frozen” in order for the pre-trained weights to be retained while training your model. This is useful when the model is being used as a feature extractor.
for layer in base_model.layers:
    layer.trainable = False
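A quick way to sanity-check the freeze (just a sketch using Keras’ built-in summary):
# After freezing, the summary should report most parameters as "Non-trainable params"
base_model.summary()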
Friday, January 10th
Loading a PT Model, part 2
Finally, this is how you would create your own model from the loaded pre-trained model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

model = Sequential([
    base_model,
    # Flattens the base model's output to a 1D array
    Flatten(),
    Dense(256, activation='relu'),
    # The num_classes variable would be the number of classes you have
    Dense(num_classes, activation='sigmoid')
])
Note: A sigmoid activation for the last layer would be better for binary classification, while a softmax activation for the last layer would be better for multiclass classification.
And after that, compile the model:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Note: binary_crossentropy is also better for binary classification tasks.
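For comparison, here’s a minimal sketch of the multiclass setup (multiclass_model is just a hypothetical name; everything else is reused from the model above):
multiclass_model = Sequential([
    base_model,
    Flatten(),
    Dense(256, activation='relu'),
    # Softmax turns the outputs into a probability distribution over the classes
    Dense(num_classes, activation='softmax')
])
# Use 'sparse_categorical_crossentropy' for integer labels,
# or 'categorical_crossentropy' for one-hot encoded labels
multiclass_model.compile(optimizer='adam',
                         loss='sparse_categorical_crossentropy',
                         metrics=['accuracy'])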
Also, about the ImageDataGenerator class: I discovered that it makes a good training data loader, and the example code I saw used it to load training data (I explain the method’s parameters in the code sample because Keras doesn’t have .flow_from_directory() in its documentation for some reason 😭):
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    # Path to the training data
    'training_data',
    # Resizes images to 224x224
    target_size=(224, 224),
    # Number of images being processed at once (in a batch)
    batch_size=32,
    # Mode of the labels (binary/multiclass)
    class_mode='binary',
    # Shuffling or not
    shuffle=True,
    # Randomization seed (for the shuffling)
    seed=42
)
.flow_from_directory()
.flow_from_directory() is similar to the .flow() method, but it allows data to be retrieved directly from a directory instead of requiring a NumPy array; here’s what a directory you would use .flow_from_directory() on would look like:
| --- data/
| | --- train/
| | | --- stop_sign/ [50 images]
| | | --- no_stop_sign/ [50 images]
| | --- test/
| | | --- predict/ [25 images]
This directory structure is important because .flow_from_directory() automatically assigns class labels based on the subdirectory names (i.e. stop_sign and no_stop_sign in this case), essentially “inferring” the labels of the training images. It also performs “on-the-fly” augmentation and loading, processing one batch at a time; this makes it memory-efficient, but in turn slows the augmentation and preprocessing down.
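As a quick check (a sketch reusing the train_generator defined above), the inferred label mapping can be printed directly:
# The class labels inferred from the subdirectory names, mapped to integer indices
print(train_generator.class_indices)
# e.g. {'no_stop_sign': 0, 'stop_sign': 1}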
After all this, train_generator can be treated just like a training_data variable when passing it in as a parameter to a .fit() or a .predict() method.
But back to the model! Training the model just uses the .fit() method, but notice how there isn’t any validation data in this model; the validation_data parameter isn’t included in the call, and it turns out that validation data is optional!
model.fit(train_generator, epochs=10)
Fine-Tuning after Loading the PT Model
But what about fine-tuning? Right…
To do that, just unfreeze some of the weights in the last layers of the model, like so:
# This unfreezes the last 4 layers, hence the [-4:]
for layer in base_model.layers[-4:]:
    layer.trainable = True
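After unfreezing, the model needs to be re-compiled for the change to take effect before continuing training. Here’s a minimal sketch (the small learning rate is my own assumption, a common choice so fine-tuning only nudges the pre-trained weights):
from tensorflow.keras.optimizers import Adam

# Re-compile so the newly trainable layers are picked up,
# using a small learning rate to avoid wrecking the pre-trained weights
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Continue training for a few more epochs on the same generator
model.fit(train_generator, epochs=5)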
A Clarification about Model instances (objects)
All models require four main methods for training and testing:
.compile() is the first method that’s used on a model object. Despite the name, it doesn’t “translate” your code into binary; it configures the model for training by setting the optimizer, the loss, and the metrics. The most notable parameter is the optimizer, which can be, for example, the rmsprop or the adam optimizer.
Model.compile(
    optimizer="rmsprop",
    loss=None,
    loss_weights=None,
    # There's a lot of metrics to choose from, usually a list is used for this parameter
    metrics=None,
    weighted_metrics=None,
    run_eagerly=False,
    steps_per_execution=1,
    jit_compile="auto",
    auto_scale_loss=True,
)
.fit() is the second method, and it is used for (re)training the model using several different parameters (not all are required, ofc):
Model.fit(
    # Input data
    x=None,
    # Target data
    y=None,
    batch_size=None,
    epochs=1,
    # 0 = silent, 1 = progress bar, and 2 = one line per epoch; "auto" = 1
    verbose="auto",
    callbacks=None,
    validation_split=0.0,
    validation_data=None,
    shuffle=True,
    class_weight=None,
    sample_weight=None,
    initial_epoch=0,
    steps_per_epoch=None,
    validation_steps=None,
    validation_batch_size=None,
    validation_freq=1,
)
.evaluate() would be the next logical step in training a model, for it evaluates the model’s accuracy and other (loss) metrics using the validation (pre-test) data, and then later, the test data.
Model.evaluate(
    # Input data
    x=None,
    # Target data (the labels of the validation or test set, depending on what stage of development the model is in)
    y=None,
    batch_size=None,
    verbose="auto",
    sample_weight=None,
    steps=None,
    callbacks=None,
    return_dict=False,
    **kwargs
)
.predict() is the “beta-testing” of the model, where all you’re gonna get is predictions from the model, given the input data x.
Model.predict(
    x,
    batch_size=None,
    verbose="auto",
    steps=None,
    callbacks=None
)
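As a quick usage sketch (x_test here is a hypothetical array of test images, and model is the trained model from earlier):
# Get the raw predictions (probabilities) for the test images
predictions = model.predict(x_test)

# For a softmax (multiclass) output, the predicted class is the index with the highest probability
predicted_classes = np.argmax(predictions, axis=1)

# For a single sigmoid (binary) output, threshold the probability instead:
# predicted_classes = (predictions > 0.5).astype(int)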
Saturday, January 11th
I was thinking this morning about an analogy to describe the usefulness of pre-trained models and how they conduct feature extraction, so I came up with this one:
A “pre-trained model” is like a seasoned photographer who is really good at finding the best places to take pictures, incorporating various elements of photography like subframing and the rule of thirds (I studied a bit of photography over the summer). When they are given a certain area, the “input data”, to take pictures in (e.g. a park, wedding venue, street), they exemplify “feature extraction” by pointing out the best spots to take pictures.
“Fine-tuning”, then, is when the client tells the photographer their specific style or needs in the photos, and then the photographer adjusts their suggestions (in where to take pictures within the area) accordingly.
Training Data? Target Data? Test Data? Isn’t that Validation Data?
Yes, all the types of data can be very confusing…
But here, I created a guide (as an image) to understanding all the types of data:

Validation Data and Test Data are both held-out sets (each pairing Input Data with Target Data), while Training Data is the Input Data and Target Data the model actually learns from.
The only difference between “validation” data and “test” data is that the “test” data is like a “final exam”, the final trial of a model’s training, while “validation” data is like the “quizzes” a model takes along the way to assess its performance and improve on it.
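To tie this back to Keras, here’s a minimal sketch of where each kind of data shows up (all of the variable names are placeholders):
# Training: the model learns from the training inputs and targets,
# while the validation set acts as the "quizzes" after each epoch
model.fit(x_train, y_train, epochs=10, validation_data=(x_val, y_val))

# Testing: the held-out test set is the "final exam", used only once training is done
test_loss, test_accuracy = model.evaluate(x_test, y_test)

# Prediction: brand-new inputs with no targets at all
predictions = model.predict(x_new)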
Lessons Learned
I learned how to utilize Keras’ ImageDataGenerator class to load, preprocess, and occasionally augment images to train a model on, via “datagens” (object instances of ImageDataGenerator).
I also reviewed the roles and differences between Training, Input, Target, Validation, and Test Data (although nowadays a lot of AI and ML developers interchange some of these terms).
Resources
Course I followed:
Keras Documentation: Simple and informative!