Applying Computer Vision Week 1
Week 10 of documenting my AI/ML learning journey (Dec 15 - Dec 21)
I FINALLY GOT OFF MY BAN THIS WEEK!
Meaning more content, and AI learning! Hooray!
Today, I’m trying something different: I’m going to start splitting my work between my personal laptop and my dad’s old desktop to compare how training, optimizing, etc. perform on each machine when training my AI models, since I’m planning on building applications with my knowledge of AI branches like CV (computer vision), regression models, and possibly NLP (Natural Language Processing) later on.
Additionally, this week is final exam week at my high school, so I may not have as much time to learn AI and ML as much as I would like…
Thus, the content may vary more in terms of quality and quantity—just thought I’d let you guys know!
Thursday, December 19th
Today is the first day of my new chapter in my AI journey, but I’m still working on CV; I’m just applying it—using IBM’s CV Studio!
The IBM course that I’ve been following led me through the process of making an image classification app in CV Studio using transfer learning (because of my sh*t computer😭) that can identify whether an image includes a stop sign or not, and the process was very easy because of how simple and effective CV Studio’s interface is.
Me importing images into IBM’s CV Studio
Also, IBM’s course provided the code (in a Jupyter Notebook) and the training images (i.e. “stop” and “not stop”). And since I like to understand things in depth, I went into the Jupyter Notebook and tried to understand some of the functions and concepts being used, including code snippets like:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("the device type is", device)
This snippet of code defines our device as the first visible CUDA device if CUDA is available, and otherwise falls back to the CPU. What is CUDA? CUDA is a parallel computing platform and programming model developed by NVIDIA that allows PyTorch to use the power of GPUs for faster computation, and CUDA devices are CUDA-enabled/compatible GPUs. We also have a function, torch.device:
torch.device(<device_specification>)
torch.device is a function from PyTorch’s core library, and the object it returns represents the device on which tensors and models will be allocated and computations will be performed; common device specifications include the CPU (“cpu”) and the GPU (“cuda”). But the snippet above uses dynamic assignment: it checks whether my computer has a CUDA-compatible GPU (which it doesn’t, it has an integrated Intel Iris Xe😭); if it does, it assigns the device specification “cuda:[gpu_id]”, and if it doesn’t, it assigns “cpu”.
The “gpu_id” will normally be “0”, unless you have multiple CUDA-compatible GPUs, in which case torch.device lets you specify which GPU you want to use (e.g. device = torch.device("cuda:1") if you want to use the second GPU).
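Here’s a small sketch of how that device object actually gets used (my own toy example, not from the notebook): once device is defined, you move the model and each batch of tensors onto it with .to(device):
import torch
import torch.nn as nn

# Pick the first CUDA GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# A tiny stand-in model and input batch (both made up for illustration).
model = nn.Linear(10, 2).to(device)  # moves the model's parameters to the device
x = torch.randn(4, 10).to(device)    # moves the input tensor to the same device

output = model(x)                    # the computation happens on that device
print(output.device)                 # e.g. "cpu" or "cuda:0"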
Another piece of code further down the notebook contained the bulk of the processing and the “AI stuff”: the training of the model.
def train_model(model, train_loader, validation_loader, criterion, optimizer, n_epochs, print_=True):
    loss_list = []
    accuracy_list = []
    correct = 0
    #global:val_set
    n_test = len(val_set)
    accuracy_best = 0
    best_model_wts = copy.deepcopy(model.state_dict())
    # Loop through epochs
    print("The first epoch should take several minutes")
    for epoch in tqdm(range(n_epochs)):
        loss_sublist = []
        # Loop through the data in loader
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            model.train()
            z = model(x)
            loss = criterion(z, y)
            loss_sublist.append(loss.data.item())
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        print("epoch {} done".format(epoch))
        scheduler.step()
        loss_list.append(np.mean(loss_sublist))
        correct = 0
        for x_test, y_test in validation_loader:
            x_test, y_test = x_test.to(device), y_test.to(device)
            model.eval()
            z = model(x_test)
            _, yhat = torch.max(z.data, 1)
            correct += (yhat == y_test).sum().item()
        accuracy = correct / n_test
        accuracy_list.append(accuracy)
        if accuracy > accuracy_best:
            accuracy_best = accuracy
            best_model_wts = copy.deepcopy(model.state_dict())
        if print_:
            print('learning rate', optimizer.param_groups[0]['lr'])
            print("The validation Cost for epoch " + str(epoch + 1) + ": " + str(np.mean(loss_sublist)))
            print("The validation accuracy for epoch " + str(epoch + 1) + ": " + str(accuracy))
    model.load_state_dict(best_model_wts)
    return accuracy_list, loss_list, model
…I’ll explain this part later.
Meanwhile, I was attempting to deploy the app through CV Studio…
The app deployment wasn’t successful :(
Saturday, December 21st
So…that giant piece of code from Thursday! I’ll try to break it down.
Part 1: Variable Definition
loss_list = []
accuracy_list = []
correct = 0
#global:val_set
n_test = len(val_set)
accuracy_best=0
best_model_wts = copy.deepcopy(model.state_dict())
In the context of this file, loss_list and accuracy_list store the loss and accuracy values for each epoch. correct keeps track of correctly predicted samples. n_test is the number of samples in the validation set. accuracy_best stores the best validation accuracy observed. best_model_wts stores the model weights corresponding to the best validation accuracy.
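As a quick illustration of what model.state_dict() actually holds (a toy model of my own, not the course’s), it’s just a dictionary mapping parameter names to weight tensors, which is why deep-copying it gives you a full snapshot of the model:
import copy
import torch.nn as nn

model = nn.Linear(3, 2)          # toy model
state = model.state_dict()
print(list(state.keys()))        # ['weight', 'bias']
print(state['weight'].shape)     # torch.Size([2, 3])

snapshot = copy.deepcopy(state)  # an independent copy of the current weights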
Part 2: The Loss Sublist
for epoch in tqdm(range(n_epochs)):
    loss_sublist = []
    # Loop through the data in loader
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        model.train()
        z = model(x)
        loss = criterion(z, y)
        loss_sublist.append(loss.data.item())
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print("epoch {} done".format(epoch))
    scheduler.step()
    loss_list.append(np.mean(loss_sublist))
    correct = 0
From the structure of this code, we see that this part sits inside a for loop that runs once for each of the n_epochs epochs. A list, loss_sublist, is created, and another for loop iterates over the PyTorch DataLoader object train_loader, which provides batches of data x (input data/features) and y (target labels); each batch is moved to the specified device, i.e. in my case “cpu” because of my lack of a CUDA-compatible GPU (see the first code snippet for context).
Then, the model.train() line sets the model to “training mode”, where layers like dropout and batch normalization toggle on/off or behave differently during training. After that, model(x) computes the predictions z, which are used in the next line to compute the loss, and the .append() call adds the loss value to the list loss_sublist to keep track of all the losses.
Finally, loss.backward() computes the gradients of the loss (i.e. backpropagation), optimizer.step() updates the model parameters using those gradients and the optimizer (e.g. SGD, Adam), and optimizer.zero_grad() resets the gradients to zero for the next batch. Then the learning rate updates using scheduler.step() (in this program it decreases from 0.01 to 0.001 over the epochs), the average loss for the epoch is added (appended) to loss_list, and the correct count correct is reset to 0 for the validation phase. What’s the validation phase? Look below at the next snippet of code!
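As a hedged illustration of that learning rate decay (the exact scheduler and settings in the notebook are my assumptions here), a StepLR scheduler that drops the rate from 0.01 toward 0.001 could look like this:
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Multiply the learning rate by 0.1 every 3 epochs: 0.01 -> 0.001
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

for epoch in range(5):
    # ...the training loop for this epoch would go here...
    scheduler.step()  # update the learning rate once per epoch
    print(epoch, optimizer.param_groups[0]['lr'])  # shows the rate dropping from 0.01 to 0.001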
What is a DataLoader?
“DataLoader” is a class in PyTorch that can perform many functions that prove useful when training a neural network in batches: for example, it can automatically split input data into batches, among other things, and it’s a very versatile and powerful tool that can make the training process much more modular and simpler.
Learn more in my past post where I explain it further: https://the-hidden-layer.beehiiv.com/p/learning-computer-vision-week-4
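For a quick refresher, here’s a minimal sketch (toy data of my own, not the course’s images) of a DataLoader splitting a dataset into batches:
import torch
from torch.utils.data import TensorDataset, DataLoader

# A toy dataset: 10 samples with 4 features each, plus 10 labels (0 or 1).
features = torch.randn(10, 4)
labels = torch.randint(0, 2, (10,))
dataset = TensorDataset(features, labels)

# The DataLoader shuffles the data and serves it in batches of 4.
loader = DataLoader(dataset, batch_size=4, shuffle=True)

for x, y in loader:
    print(x.shape, y.shape)  # e.g. torch.Size([4, 4]) torch.Size([4])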
Part 3: The Validation Holder
for epoch in tqdm(range(n_epochs)):
    ...
    for x_test, y_test in validation_loader:
        x_test, y_test = x_test.to(device), y_test.to(device)
        model.eval()
        z = model(x_test)
        _, yhat = torch.max(z.data, 1)
        correct += (yhat == y_test).sum().item()
    accuracy = correct / n_test
    accuracy_list.append(accuracy)
This is similar to the x, y = x.to(device), y.to(device) line in Part 2, but this time the batches x_test and y_test coming from the DataLoader validation_loader are moved to the device, i.e. “cpu”.
The model is then set into “evaluation mode” (another “mode”, like “training mode” from Part 2) using model.eval(), and then model(x_test) obtains the predictions z, which the program uses to find the class with the highest predicted score (logit) using the torch.max() function along dimension 1 (the class dimension).
Then, the variable correct accumulates the number of correct predictions by comparing the predicted values yhat to the true values y_test in an equality condition; True is equivalent to 1, so every correct prediction contributes to the count correct through the additional functions .sum() and .item(). And finally, the accuracy on the validation set is computed and appended/added to the accuracy_list.
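Here’s a tiny sketch (made-up predictions) of how that comparison counts the correct predictions:
import torch

yhat = torch.tensor([1, 0, 1, 1])    # predicted classes (hypothetical)
y_test = torch.tensor([1, 0, 0, 1])  # true classes (hypothetical)

matches = (yhat == y_test)           # tensor([True, True, False, True])
correct = matches.sum().item()       # each True counts as 1, so correct == 3
accuracy = correct / len(y_test)     # 3 / 4 = 0.75
print(correct, accuracy)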
What is the underscore (“_”) doing there? (and torch.max() function explanation)
The reason the underscore is there is that we want to keep the indices of the max values along the chosen dimension, but not the values themselves; the underscore is the usual Python convention for a return value we’re discarding.
The function torch.max() isn’t as simple as a regular Python max() function. Even though torch.max() does in fact output the max value in a vector (list/array), here the function needs another piece of information, a parameter that specifies a dimension, in order to do its job. This is because torch.max() works on tensors, not lists/arrays, which means the function needs to know which dimension to look along to determine the max value(s); basically, it needs to know whether it is looking for the largest value in each row, or each column, or each “depth vector”, etc.
The syntax looks a bit like this:
values, indices = torch.max(tensor_name, dimension_to_focus_on)
In this code snippet, the dimension that the function focused on was dimension 1: the class scores for each sample in the batch. Dimension 1 is usually where the classes are.
With that said, the values that the function returns also come with their indices (as seen in the code snippet), and the indices correspond to the dimension the function was looking along; this means that with a dimension of 1, the function returns the max value for each row along with the index of that value within its row (and with a dimension of 0, the max for each column along with its index within that column).
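Here’s a concrete toy example (the numbers are made up) of torch.max() along each dimension:
import torch

# 2 samples x 3 classes of hypothetical scores (logits)
scores = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.4]])

values, indices = torch.max(scores, 1)    # max along dim 1 (per row / per sample)
print(values)    # tensor([2.0000, 1.5000]) -> highest score in each row
print(indices)   # tensor([1, 0])           -> index of that score = predicted class

values0, indices0 = torch.max(scores, 0)  # max along dim 0 (per column / per class)
print(values0)   # tensor([1.5000, 2.0000, 0.4000])
print(indices0)  # tensor([1, 0, 1])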
Part 4: Summarizing
    if accuracy > accuracy_best:
        accuracy_best = accuracy
        best_model_wts = copy.deepcopy(model.state_dict())
    if print_:
        print('learning rate', optimizer.param_groups[0]['lr'])
        print("The validation Cost for epoch " + str(epoch + 1) + ": " + str(np.mean(loss_sublist)))
        print("The validation accuracy for epoch " + str(epoch + 1) + ": " + str(accuracy))
model.load_state_dict(best_model_wts)
return accuracy_list, loss_list, model
We see a classic “high score”-like updating system in the first “if” statement, where the variable accuracy_best essentially acts like a “high score”, but for the highest validation accuracy.
Then, the variable best_model_wts (best model weights) saves a copy/“snapshot” of the model’s parameters and buffers, which acts like a manual checkpoint, saving the “highest accuracy model’s parameters”. Finally, once all the epochs are done, model.load_state_dict(best_model_wts) loads that best snapshot back into the model before the function returns it.
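And here’s that “high score” checkpoint pattern in isolation (toy accuracies of my own; the real values come from the validation loop):
import copy
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in model
accuracy_best = 0
best_model_wts = copy.deepcopy(model.state_dict())

# Pretend these are the validation accuracies measured after each epoch.
for accuracy in [0.60, 0.75, 0.72, 0.81, 0.79]:
    # ...training for this epoch would happen here, changing the weights...
    if accuracy > accuracy_best:
        accuracy_best = accuracy                            # new "high score"
        best_model_wts = copy.deepcopy(model.state_dict())  # snapshot the weights

# Restore the snapshot that scored best before using the model.
model.load_state_dict(best_model_wts)
print(accuracy_best)  # 0.81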
Lessons Learned
I learned about CUDA and its role in functions like torch.device().
Also, I reviewed DataLoaders in PyTorch and how useful and universal they can be!
torch.max() bears many resemblances to Python’s default max() function, with some differences tailored to how tensors work in PyTorch (check out my past posts for more info about what tensors are).
Resources
Course I followed: