Learning Computer Vision Week 7

Week 7 of documenting my AI/ML learning journey (Oct 20 - Oct 26)

What was discussed last week…

  • The genius behind Faster R-CNNs, which apply CNNs to CV in an innovative way

I actually want to make a comprehensive CV model soon, so this week I plan to clean up some CV concepts so that I can finally, hopefully, build my own model (possibly with some images that I have taken).

About function explanation notation…

In function headings:

  • Datatypes of parameters will be in subscript.

  • Text in regular notation represents a parameter name, which the actual value will replace.

  • Text in italics with an equal sign is text that will actually need to be in the function, with the value placed after it.

  • Gray text represents the default value for a given parameter (making that parameter optional), which means that parameters in regular text are required.

Example (functions will be in a heading format):

# All are valid uses of this (hypothetical) function
sum = math.add(3, 6, 3.2)
sum = math.add(2, 3, 5.5, make_copy=True)
sum = math.add(10, 4, make_copy=True)
sum = math.add(4, 7)

.add(int x, int y, float speed, make_copy=bool False)

In this case, the x and y parameters are required and will have their values replace them when in use; the make_copy parameter is set to False by default unless told otherwise; and the speed parameter is completely optional, so the function can run both with and without it.

Disclaimer

Most of the code samples in this newsletter have been taken from IBM’s course (listed in the Resources section), and thus aren’t original snippets by me.

Tuesday, October 22nd (Part One: Pre-Trained CV Model Setup)

Today I stumbled across some code that was supposedly building an object detection CNN using TensorFlow, and it looked cryptic to me even though it was all in Python; it was the sheer number of different libraries and modules being used all at once that confused me. I would later realize that the code shown to me was a guide to training a DL Neural Network model using transfer learning, which is essentially finding a pre-trained model that has been trained on a relatively large dataset related to your task (e.g. image classification), and tuning it until it works best for your situation. The pre-trained model that I was using is called MobileNet V1. For example, the following code prepares the label map for the training data by assigning each label a number.

os.makedirs(DATA_PATH, exist_ok=True)
with open(LABEL_MAP_PATH, 'w') as f:
    for idx, label in enumerate(labels):
        f.write('item {\n')
        f.write("\tname: '{}'\n".format(label))
        f.write('\tid: {}\n'.format(idx + 1))
        f.write('}\n')
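
For instance, if labels were ['cat', 'dog'], the generated label map file would contain:

item {
    name: 'cat'
    id: 1
}
item {
    name: 'dog'
    id: 2
}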

os.makedirs(string path, bool exist_ok=False)

The .makedirs() function from Python’s os library makes directories (including any missing intermediate ones), with possible parameters:

path (required)

A string or bytes object representing the path of the directory (folder) that the function will attempt to create.

exist_ok (optional)

The default value of this parameter is False; if left False, the program will raise an OSError if the target directory/folder already exists, whereas it won’t do so if the value for this parameter is True.
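
Here’s a minimal sketch of how these parameters play out (the directory name is hypothetical):

import os

# Creates 'data/images', including the intermediate 'data' folder if needed
os.makedirs('data/images', exist_ok=True)

# Running it again is safe, because exist_ok=True suppresses the OSError
os.makedirs('data/images', exist_ok=True)

# This call would raise an OSError, since the directory now exists
# os.makedirs('data/images')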

open(string path, string mode='r')

The open() function enables the program to read, write, and create files (and more) from within a running Python program.

path (required)

Same parameter as seen before.

mode (optional)

This parameter is a string whose value depends on what you want the program to do to the file. The most common modes for this function are “r” for reading a file (the default), “w” for writing a file, “a” for appending to a file, and “x” for creating a file. It’s worth noting that “r” raises an error if the file does not exist, while “x” raises an error if the file already exists.
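
Here’s a quick sketch of the common modes in action (the filename is hypothetical):

# 'w' creates the file if missing, or overwrites it if it exists
with open('notes.txt', 'w') as f:
    f.write('first line\n')

# 'a' appends to the end without erasing existing content
with open('notes.txt', 'a') as f:
    f.write('second line\n')

# 'r' raises a FileNotFoundError if the file doesn't exist
with open('notes.txt', 'r') as f:
    print(f.read())

# 'x' would raise a FileExistsError here, since notes.txt already exists
# open('notes.txt', 'x')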

Wednesday, October 23rd (Part Two: Pre-Trained Model Importing)

I continued to analyze the same object detection CNN program from yesterday, and figured out what some more of its functions do. The following code retrieves and extracts the pre-trained model from the internet, using the tarfile module/library.

import os
import tarfile
import urllib.request

# CHECKPOINT_PATH and CONFIG_PATH are assumed to be defined earlier in the program
MODEL_TYPE = 'ssd_mobilenet_v1_quantized_300x300_coco14_sync_2018_07_18'
CONFIG_TYPE = 'ssd_mobilenet_v1_quantized_300x300_coco14_sync'
download_base = 'http://download.tensorflow.org/models/object_detection/'
model = MODEL_TYPE + '.tar.gz'
tmp = '/resources/checkpoint.tar.gz'

if not (os.path.exists(CHECKPOINT_PATH)):
    opener = urllib.request.URLopener()
    opener.retrieve(download_base + model, tmp)

    # Extract all the `model.ckpt` files.
    with tarfile.open(tmp) as tar:
        for member in tar.getmembers():
            member.name = os.path.basename(member.name)
            if 'model.ckpt' in member.name:
                tar.extract(member, path=CHECKPOINT_PATH)
            if 'pipeline.config' in member.name:
                tar.extract(member, path=CONFIG_PATH)

    os.remove(tmp)

This is the ugliest but most important part, where the program imports a pre-trained model from TensorFlow’s Model Zoo by using the os and urllib libraries. In this code, the formula for the pre-trained models’ retrieval URLs is as follows:

 download_base + (MODEL_TYPE or CONFIG_TYPE) + '.tar.gz'
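
So, plugging in the values from the code, the model’s retrieval URL works out to:

 http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_quantized_300x300_coco14_sync_2018_07_18.tar.gz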

urllib.request.URLopener()

Kudos if you figured out that this was a class! This class from the urllib.request module creates object instances that can use methods like .open() and .retrieve() (which we’ll see next) to handle URL requests over the internet.

opener.retrieve(string url, filename=string None)

This function from the urllib.request library downloads a file from a URL and saves it to a specified location.

url (required)

This parameter is the URL to retrieve.

filename (optional)

This parameter specifies where to save the downloaded file; if omitted, the file is saved to a temporary location.
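
As a minimal sketch (the URL and filename here are hypothetical), the class and function work together like this:

import urllib.request

# Create the opener object and download the archive to a local path
opener = urllib.request.URLopener()
opener.retrieve('http://example.com/model.tar.gz', '/tmp/model.tar.gz')

It’s worth noting that URLopener is deprecated in modern Python; the standalone urllib.request.urlretrieve(url, filename) function does the same job in one call.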

.getmembers()

Coming from the tarfile module, the .getmembers() function returns a list of TarInfo objects, one for each file/“member” in the opened .tar.gz file that the function is attached to; each object’s .name attribute holds the member’s full path inside the archive.

os.path.basename(string path)

This function is pretty useful, for it returns the final component (the file or folder name) of the path parameter that was passed into the function.

.extract(string member, path=".")

The .extract() function does the actual job of extracting a specific member (file or directory) from a file in a tarball format (e.g. .tar, .tar.gz).

member (required)

The path (in string form) or TarInfo object that represents the file that you want to extract.

path (optional)

The destination directory where the extracted file or directory will be saved; if unspecified, the function saves the extracted file to the current working directory.
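
Putting .getmembers(), os.path.basename(), and .extract() together, here’s a minimal sketch (the archive contents here are hypothetical):

import os
import tarfile

with tarfile.open('checkpoint.tar.gz') as tar:
    for member in tar.getmembers():
        # member.name holds the member's full path inside the archive
        print(member.name)                    # e.g. 'ckpt/model.ckpt.index'
        print(os.path.basename(member.name))  # e.g. 'model.ckpt.index'
    # Extract a single member (by its full name) into the 'configs' folder
    tar.extract('ckpt/pipeline.config', path='configs')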

Wait, so why is a with…as statement used instead of a variable declaration?

You may have noticed the interesting notation in the line,

with tarfile.open(tmp) as tar:

where tar is initialized, but not in the usual declaration format (i.e. tar = tarfile.open(tmp)); why is that?

When it comes to opening files in Python, it’s usually best practice to use a with…as statement, because when the program exits the scope of the with…as statement, the file automatically and safely closes itself, without requiring some sort of .close() call. This structure also makes the code more modular, since the code that handles the tar can be easily spotted (it will be indented further than the surrounding lines).
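
To make the difference concrete, here’s the manual equivalent of the with…as version (same behavior, more room for mistakes):

import tarfile

# Manual version: you must remember to close the file yourself,
# even if an error happens partway through
tar = tarfile.open(tmp)
try:
    members = tar.getmembers()
finally:
    tar.close()

# with...as version: the file closes itself when the block exits
with tarfile.open(tmp) as tar:
    members = tar.getmembers()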

Thursday, October 24th (Break: Figuring Out the Pipeline)

For those who don’t know, a pipeline is basically a structured workflow that covers the whole process of making, training, and tuning a model. Ok, so like I said, the model that I’m experimenting with (to clarify, the code excerpts of it are in this newsletter) uses the MobileNet V1 pre-trained model, but it also uses the MobileNet V1 configuration (same name, different purpose: it’s stupid naming, I know) and the SSD MobileNet architecture.

The Anatomy of a Neural Network (non-exhaustive):

After further research into the role that pre-trained models play in building a neural network, I found that among all the different components that make up one, there are three main components that make up the majority of a Neural Network:

  1. Architecture (Notable Examples: ResNet, AlexNet, VGGNet)

    A Neural Network’s architecture outlines the structure of its layers and neurons: how many neurons there will be in each layer, what kind of neuron each one behaves as, which neurons’ outputs feed into which other neurons, and more. Thus, the architecture of a Neural Network behaves like a recipe, where different ingredients (input data) are called for, and each one goes through a cooking or prepping method such as mixing, boiling, frying, or dicing (the neuron type); the resulting mixtures and forms of the ingredients are then sent on to more cooking methods (the output data of one neuron becoming the input data of another) until the final, resulting food (the final output) is made. Some ingredients undergo the same cooking method, while others are the only ingredient that undergoes a given method: for example, when making a pizza, a pepperoni sausage has to be cut up to make toppings, but it’s the only ingredient that will be cut up. Similarly, some types of neurons/layers (like flattening layers) may appear only once or otherwise very rarely within a neural network’s architecture.

  2. Configuration (Notable Examples: YOLO, MobileNet)

    The configuration of a neural network refers to the setup for training the neural network. This includes aspects such as weight formulas, learning rates, optimization algorithms, batch size, number of epochs, dropout rates, input image sizes, and the number of output classes. That is to say, the configuration mostly takes care of the “mathy stuff” that is required by the neural network in order to behave and learn properly.

  3. Pre-Trained Model (commonly used, but optional; Notable Examples: YOLO, MobileNet, ResNet)

    A pre-trained model is akin to a default build or skin in a video game; it’s a well-known and/or trusted example of a partially- or fully-created configuration for a specific architecture, and thus pre-trained models are only compatible with the architecture that they were built and optimized for. Pre-trained models, while optional, are very useful when you have a crappy computer (like me) that can’t train model parameters and hyperparameters such as weights, biases, and the learning rate. This is because refined, accurate models demand a lot of computing power (e.g. a powerful GPU or TPU) in order to be trained. (See the sketch after this list for what this looks like in practice.)
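
To tie all three components together, here’s a minimal transfer learning sketch in Keras. To be clear, this isn’t the Model Zoo code from above: it pulls MobileNet through tf.keras.applications instead, and the 5-class head is a hypothetical example. But it shows the core idea: take a pre-trained model, freeze its learned weights, and train a small new head on top.

import tensorflow as tf

# Load MobileNet pre-trained on ImageNet, minus its classification head
base = tf.keras.applications.MobileNet(
    weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the pre-trained weights so only the new layers get trained
base.trainable = False

# Stack a small new head on top for our own (hypothetical) 5 classes
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation='softmax'),
])

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])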

Lessons Learned

  • Transfer Learning is interesting in action!

  • I feel like the os and urllib libraries are going to be a staple in importing architectures and pre-trained models.

  • And speaking of architectures and pre-trained models, I learned the difference between the two, as well as how configurations differ from both.

Resources

Course I followed: