Applying Computer Vision Week 2

Week 11 of documenting my AI/ML learning journey (Dec 22 - Dec 28)

What was discussed last week…

  • What CUDA means and its use in AI pipelines

  • Analyzing a CV neural network model built to classify images (images with a stop sign vs. images without one)

Monday, December 23rd (Christmas Eve Eve)

Using the same model that I was working with in IBM’s CV Studio from last week, I tested the model with some sample test images using the following code:

# Context for this snippet is in my past post, "Applying Computer Vision Week 1";
# `model` and the `imshow_` helper come from the CV Studio notebook
from PIL import Image
import torch
from torchvision import transforms

imageNames = ['stop_1.jpeg', 'stop_2.jpeg', 'not_stop_1.jpeg']
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

for imageName in imageNames:
    image = Image.open(imageName)
    x = transform(image)            # resize, then convert to a tensor
    z = model(x.unsqueeze_(0))      # add a batch dimension, then run the model
    _, yhat = torch.max(z.data, 1)  # index of the highest-scoring class
    prediction = "Stop"
    if yhat == 1:
        prediction = "Not Stop"
    imshow_(transform(image), imageName + ": Prediction = " + prediction)

transforms.Compose()

This function takes a list of transformations and applies them sequentially to an image, kind of like how pipes work in a command-line terminal, where the output of one step becomes the input for the next. In this case, transforms.Compose() does the resizing first (i.e. transforms.Resize()) and then converts the image (which is in PIL/Pillow format) into a tensor (i.e. transforms.ToTensor()).
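To make the piping idea concrete, here’s a tiny standalone sketch (it reuses the stop_1.jpeg filename from the snippet above; it’s just an illustration, not part of my notebook):

from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # step 1: resize the PIL image
    transforms.ToTensor(),          # step 2: convert the PIL image into a tensor with values in [0, 1]
])

img = Image.open("stop_1.jpeg")
x = preprocess(img)                 # same as ToTensor()(Resize((224, 224))(img))
print(x.shape)                      # torch.Size([3, 224, 224])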

.unsqueeze_()

This adds a new dimension to a tensor “in place”, which means the tensor is modified directly. For example, in this context, the image tensors are structured like this:

(channels, height, width), e.g. (3, 224, 224)

But this function makes it:

(batch_size, channels, height, width), e.g. (1, 3, 224, 224)
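Here’s a quick toy example of the in-place vs. regular version (just an illustration, not from my project):

import torch

x = torch.rand(3, 224, 224)   # (channels, height, width)
x.unsqueeze_(0)               # in place: x itself gains a batch dimension at position 0
print(x.shape)                # torch.Size([1, 3, 224, 224])

y = torch.rand(3, 224, 224)
y2 = y.unsqueeze(0)           # no underscore: returns a new tensor, y is unchanged
print(y.shape, y2.shape)      # torch.Size([3, 224, 224]) torch.Size([1, 3, 224, 224])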

As a result of all of these functions working together, the model correctly classified all three of the test images I showed it (stop sign or no stop sign)! :)

The model is pretty smart, huh?

Tuesday, December 24th (Christmas Eve)

Today didn’t involve much code, just refining the model by adding tricky training data so it becomes “smarter”.

In IBM’s CV Studio, there are about five main steps in making a model:

  1. Upload (the images)

  2. Annotate (the images for preprocessing if needed)

  3. Train model

  4. Use model ← This is where I am right now

  5. Showcase

After running the model on some tricky pictures, it turned out that it still misclassified some images:

oh. shoot.

So, as part of my course requirements (and my perfectionism), I grabbed a few more images from the internet, with and without stop signs, for additional model training, hoping the model would then classify the image (named trick_test) correctly.

Some of the images that I added (the second additional image was taken from Reddit lmao)

And sure enough, the model was getting smarter, with the added data:

Yippee! (I’m gonna ignore the fact that my first additional training image was the one that was classified incorrectly aka this one)

Now, in the course, I have to review others’ submissions based on their CV model (that’s how the project is graded):

yikes…

Also, I was only able to get the model to classify the tricky image correctly because I actually added that misclassified image into the training data…😳

Technically I wasn’t “cheating”; I was just using the concept of uncertainty sampling, a type of active learning where:

1. The model makes predictions on new, unseen data.

2. It identifies samples that it misclassified or is least confident about.

3. These misclassified or uncertain samples are then added to the training dataset with their correct labels.

4. The model is retrained on this updated dataset.

This process, therefore, helps the model learn from its mistakes and improve its performance on challenging cases.

And that’s what I did!
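For anyone curious, here’s a rough sketch of what an uncertainty-sampling loop could look like in PyTorch. This is hypothetical: model, unlabeled_images, label_by_hand, and training_set are placeholder names, not my actual CV Studio code (CV Studio handles retraining for me through its UI).

import torch
import torch.nn.functional as F

# Hypothetical sketch of steps 1-2: find the samples the model is least sure about
def select_uncertain_samples(model, unlabeled_images, threshold=0.6):
    uncertain = []
    model.eval()
    with torch.no_grad():
        for x in unlabeled_images:              # each x is a (1, 3, 224, 224) tensor
            probs = F.softmax(model(x), dim=1)  # class probabilities
            confidence, _ = torch.max(probs, dim=1)
            if confidence.item() < threshold:   # low confidence = worth labeling
                uncertain.append(x)
    return uncertain

# Steps 3-4 (placeholders): label the hard cases correctly, add them to the
# training set, and retrain
# hard_cases = select_uncertain_samples(model, unlabeled_images)
# training_set.extend((x, label_by_hand(x)) for x in hard_cases)
# retrain(model, training_set)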

Lessons Learned

I learned more about some tensor/image-preprocessing functions, namely .unsqueeze_() and transforms.Compose().

I gained some experience in looking for training data online, specifically high-quality images that help improve a model’s performance.

Resources

Course I followed:

AI model + search engine I used to generate a few of my explanations: perplexity.ai