Learning Computer Vision Week 9
Week 9 of documenting my AI/ML learning journey (Nov 3 - Nov 9)
What was discussed last week…
- The workings of TensorFlow's `.config` files
- How protobuf files work
- Pipelines in ML model training
Hello everyone, sorry about not posting for the past couple of weeks; I've been on vacation in South Korea with my family. It was really cool seeing how much more futuristic Seoul felt (not really, it's just a much cleaner city than American cities). Hopefully, society can keep progressing with the help of more research in AI!
About function explanation notation…
In function headings:
- Datatypes of parameters appear in subscript.
- Text in regular notation represents parameter names, which the actual values replace when the function is called.
- Text in italics with an equals sign must literally appear in the function call, with the value placed after it.
- Gray text represents a parameter's default value; such parameters are optional.
- Parameters in regular (non-gray) text are required.
Example (functions will be in a heading format):
# All are valid uses of this (hypothetical) function
sum = math.add(3, 6, 3.2)
sum = math.add(2, 3, 5.5, make_copy=True)
sum = math.add(10, 4, make_copy=True)
sum = math.add(4, 7)
.add(int x, int y, make_copy=bool False, float speed)
In this case, the x and y parameters are required, and their values replace them in the call. The make_copy parameter is set to False by default unless told otherwise, and the speed parameter is completely optional: the function can run both with and without it.
Disclaimer
Most of the code samples in this newsletter are taken from IBM's course (listed in the Resources section), so they aren't original snippets of mine.
Monday, November 4th (Part 6: Training the Model)
Today, I looked at the same IBM lab for making an object detection CNN model with the help of a pre-trained model (if you want to catch up on the previous parts, see my past weeks’ posts), but now, I can finally analyze the part where the model is being trained.
from datetime import datetime
import os
import re

epochs = 40
start_datetime = datetime.now()

# Train the model (in Jupyter, $name expands a Python variable into the shell command)
!python -m object_detection.model_main \
  --pipeline_config_path=$DATA_PATH/pipeline.config \
  --num_train_steps=$epochs \
  --num_eval_steps=100

# Find the highest-numbered checkpoint saved during training
regex = re.compile(r"model\.ckpt-([0-9]+)\.index")
numbers = [int(regex.search(f).group(1)) for f in os.listdir(OUTPUT_PATH) if regex.search(f)]
TRAINED_CHECKPOINT_PREFIX = os.path.join(OUTPUT_PATH, 'model.ckpt-{}'.format(max(numbers)))

# Export the trained model as an inference graph
!python3 -m object_detection.export_inference_graph \
  --pipeline_config_path=$DATA_PATH/pipeline.config \
  --trained_checkpoint_prefix=$TRAINED_CHECKPOINT_PREFIX \
  --output_directory=$EXPORTED_PATH

end_datetime = datetime.now()
The “datetime” lines
start_datetime = datetime.now()
end_datetime = datetime.now()
Their main purpose is just to record the start and end time of when the model is being trained, so that you can calculate and see how long it takes to train and export the model.
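For example, subtracting the two timestamps gives a `datetime.timedelta`, which prints as a human-readable duration (this is just a minimal sketch of how the lab's two lines would be used; the comment stands in for the actual training code):

```python
from datetime import datetime

start_datetime = datetime.now()
# ... training and exporting would happen here ...
end_datetime = datetime.now()

# Subtracting two datetimes yields a timedelta (the elapsed time)
elapsed = end_datetime - start_datetime
print(f"Training and export took {elapsed}")
```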
The TensorFlow modules
model_main module section
First, we have the `model_main` module, one of the simpler modules that TensorFlow has to offer:
!python -m object_detection.model_main \
--pipeline_config_path=$DATA_PATH/pipeline.config \
--num_train_steps=$epochs \
--num_eval_steps=100
OK, so there's a lot to unpack here. First of all, I learned that some TensorFlow modules are built to be executed as shell commands (bash, PowerShell, etc.), and this is one of them. But somehow, these modules are being run from a Python file!? Well, this code snippet was from an IBM lab that used Jupyter Notebook, and in Jupyter notebooks, an exclamation mark `!` at the beginning of a line lets me execute shell commands directly from a notebook cell.
With that out of the way, the `model_main` module acts similarly to how a native function would, being accessible with a single command. Its primary job is to train a model using the configuration file it was given, along with additional tasks such as saving checkpoints during training, evaluating the model, and logging evaluation metrics.
# Syntax
python -m object_detection.model_main \
--pipeline_config_path=<path_to_pipeline_config> \
--model_dir=<path_to_model_directory> \
--num_train_steps=<number_of_training_steps> \
--num_eval_steps=<number_of_evaluation_steps> \
--alsologtostderr
This is also where the `.config` file we spent so much time on last week comes into play: in the first option, we see the `.config` file's path (`$DATA_PATH/pipeline.config`) being assigned to the `--pipeline_config_path` option, which gives the module the model settings it needs in order to set up the training process.
I can’t read command-line (CLI) notation!
Don't worry, I'll explain some of the important aspects of CLI in the context of the `model_main` module snippet.
python -m object_detection.model_main \
The first word in a command line typically specifies the program or command that the system will use to execute the command. Any options or arguments that follow adjust or control how that command runs. A dash followed by a character (such as `-m` in `python -m`) represents an option (sometimes called a flag or parameter) that modifies the command's behavior slightly.
For example, in the command above, the `python` command tells the system to use Python, and the `-m` option tells Python to run a module rather than a script or file directly. In this case, `object_detection.model_main` is the specific module that Python will execute.
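To see `-m` in action outside of TensorFlow, the same mechanism works with `json.tool`, a module from Python's standard library that pretty-prints JSON (I'm using it here purely as a familiar example, not something from the lab):

```shell
# Run the standard-library module json.tool via `python -m`;
# it reads JSON from stdin and pretty-prints it
echo '{"week": 9}' | python3 -m json.tool
```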
--pipeline_config_path=$DATA_PATH/pipeline.config \
A `--` also denotes an option or argument, but is used for longer, more descriptive names; the command above sets the `pipeline_config_path` option to `$DATA_PATH/pipeline.config`.
Additionally, you may have noticed the backslashes `\` at the end of every line within a module invocation (e.g. `model_main`). In shells and command-line interfaces, that backslash tells the system that the command (the module invocation in this case) continues on the next line, which is why every line of the `model_main` command except the last one ends with a backslash.
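As a quick sanity check in any shell, a trailing backslash lets one logical command span several lines:

```shell
# One command split across three lines; the shell joins them
# back into a single `echo` before running it
echo \
  "hello" \
  "from the shell"
```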
export_inference_graph module section
Second is the `export_inference_graph` module. It's not as straightforward as the `model_main` module, but its primary function is to convert the trained model into a deployable format: a process called model exporting or graph exporting. Think of a trained model as a completed 3D print; it has to go through post-processing, which for the model includes steps like "freezing" model parameters and placeholders, and removing data that's only needed for training and backpropagation, like dropout layers. Overall, this optimizes the model for its specific use case, and that's what the section with the `export_inference_graph` module does.
# Syntax
python -m object_detection.export_inference_graph \
--pipeline_config_path=<path_to_pipeline_config> \
--trained_checkpoint_prefix=<path_to_checkpoint> \
--output_directory=<output_directory>
The regex section
This is the snippet section that I’m talking about:
regex = re.compile(r"model\.ckpt-([0-9]+)\.index")
numbers = [int(regex.search(f).group(1)) for f in os.listdir(OUTPUT_PATH) if regex.search(f)]
TRAINED_CHECKPOINT_PREFIX = os.path.join(OUTPUT_PATH, 'model.ckpt-{}'.format(max(numbers)))
A quick rundown about regexes for those who don't know what they are: "regex" stands for "regular expression", and it's a powerful tool in programming when you want to do things like edit complex strings (e.g. file paths) or extract data from a complex format such as an email; these types of tasks require complex pattern searching. A regex comes in the form of a sequence of characters (like a string) that defines a search pattern.
In this case, the data we're trying to extract from the files in the `OUTPUT_PATH` directory is the checkpoint number in each filename, which we save to the `numbers` list through a list comprehension. How do I know this?
This is the regex in the snippet:
# In Python, a regex is usually written as a raw string: an "r" prefix
# followed by the pattern in quotes, so backslashes aren't treated as escapes.
r"model\.ckpt-([0-9]+)\.index"
# An example of a filename that would fit this regex format:
"model.ckpt-101.index"
- `model\.ckpt-`: This part of the pattern looks for text that starts with `"model.ckpt-"`. The backslash (`\`) before the dot escapes it, so it's treated as a literal dot rather than a wildcard character (wildcards are placeholders that can match any character).
- `([0-9]+)`: This part matches one or more digits (any digit from 0 to 9) and captures them as a group. The parentheses around `[0-9]+` create a capturing group, which allows you to extract the matched digits.
- `\.index`: This part looks for the literal text `".index"` at the end of the string.
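To check that reading of the pattern, I can run it against the example filename from earlier (the filename itself is made up):

```python
import re

regex = re.compile(r"model\.ckpt-([0-9]+)\.index")

match = regex.search("model.ckpt-101.index")
print(match.group(0))  # the whole match: "model.ckpt-101.index"
print(match.group(1))  # just the capturing group: "101"

# A filename that doesn't fit the pattern returns None
print(regex.search("checkpoint"))  # None
```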
Putting the regex into context,
numbers = [int(regex.search(f).group(1)) for f in os.listdir(OUTPUT_PATH) if regex.search(f)]
The list comprehension for the `numbers` list basically says: "for each file `f` in the directory `OUTPUT_PATH`, check whether the filename `f` matches the `regex` pattern; if it does, take the digits captured in the first capturing group, `([0-9]+)`, and save them into the list as an integer."
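Here's that comprehension run against a hypothetical directory listing (the filenames are invented for illustration; `os.listdir(OUTPUT_PATH)` would return something similar after training):

```python
import re

regex = re.compile(r"model\.ckpt-([0-9]+)\.index")

# A made-up stand-in for os.listdir(OUTPUT_PATH)
filenames = ["checkpoint", "model.ckpt-0.index", "model.ckpt-0.meta",
             "model.ckpt-40.index", "graph.pbtxt"]

# Only .index files match; the captured digits become integers
numbers = [int(regex.search(f).group(1)) for f in filenames if regex.search(f)]
print(numbers)       # [0, 40]
print(max(numbers))  # 40 -- the latest checkpoint
```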
Lessons Learned
- TensorFlow modules are not all built the same; one can be very different from another in terms of purpose and syntax.
- With that said, some TensorFlow modules use command-line (CLI) syntax!
- I also had to dive deeper into CLI concepts, such as the difference between when `-` and `--` are used to denote an option.
- I also learned the concept and the syntax of a regex.
Resources
Course I followed: