Learning Computer Vision Week 8
Week 8 of documenting my AI/ML learning journey (Oct 27 - Nov 2)
What was discussed last week…
The application of transfer learning (ft. the os and urllib libraries)
Architectures, configurations, and pre-trained models: working together, but they’re all different.
Disclaimer
Most of the code samples in this newsletter are taken from IBM’s course (listed in the Resources section), and thus aren’t original snippets by me.
Monday, October 28th (Part Five: Building the Pipeline)
If you don’t understand what “part five” in the header means, check out my previous post: that’s where I started a mini-series on figuring out how to apply a transfer learning model.
# Context (taken from a previous kernel; the original code is in a Jupyter Notebook)
# config_util and label_map_util come from the TensorFlow Object Detection API;
# LABEL_MAP_PATH, TRAIN_RECORD_PATH, VAL_RECORD_PATH, CHECKPOINT_PATH, and DATA_PATH
# were defined in earlier cells.
import os
from object_detection.utils import config_util, label_map_util

CONFIG_TYPE = 'ssd_mobilenet_v1_quantized_300x300_coco14_sync'

# Pipeline code
pipeline_skeleton = 'content/models/research/object_detection/samples/configs/' + CONFIG_TYPE + '.config'
configs = config_util.get_configs_from_pipeline_file(pipeline_skeleton)

label_map = label_map_util.get_label_map_dict(LABEL_MAP_PATH)
num_classes = len(label_map.keys())

meta_arch = configs["model"].WhichOneof("model")

# Override some of the pre-trained model's settings with values for our own dataset
override_dict = {
    'model.{}.num_classes'.format(meta_arch): num_classes,
    'train_config.batch_size': 6,
    'train_input_path': TRAIN_RECORD_PATH,
    'eval_input_path': VAL_RECORD_PATH,
    'train_config.fine_tune_checkpoint': os.path.join(CHECKPOINT_PATH, 'model.ckpt'),
    'label_map_path': LABEL_MAP_PATH
}

configs = config_util.merge_external_params_with_configs(configs, kwargs_dict=override_dict)
pipeline_config = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_config, DATA_PATH)
This code…is creating a pipeline: a structured process that takes everything from data preprocessing all the way to deployment, kinda like a recipe! While I was researching how this code works (with ChatGPT), one thing I’d like to clarify is that even though this code imports a pre-trained model’s configuration, it still changes some of the settings that the pre-trained model describes. I’ll explain why the code does this tomorrow.
Tuesday, October 29th (Part 5.2: Explaining the Pipeline)
So the code that I exhibited yesterday (it’s from IBM’s lab, btw) is responsible for building a model pipeline. To put it into perspective, IBM’s code was written in a Jupyter Notebook, meaning that training a model would be a challenge, for the notebook doesn’t have the most robust processing power (and I was experimenting with the code on a laptop, with no fancy GPUs or anything 😭). Thus, adjustments needed to be made to the configuration of the pre-trained model.
But first, we need the code to download a .config file: a centralized model format for all your ML needs! Even TensorFlow heavily uses it!
pipeline_skeleton = 'content/models/research/object_detection/samples/configs/' + CONFIG_TYPE + '.config'
configs = config_util.get_configs_from_pipeline_file(pipeline_skeleton)
No, but really, a .config file, written in either protobuf (aka Protocol Buffer) or YAML format, contains everything that guides the behavior of a model, which really comes in handy when modularizing machine learning models and making them simpler and easier to modify and manage. But in order to access the contents of the .config file, we have to convert it into a readable format (i.e., a dictionary) through the .get_configs_from_pipeline_file() function, as seen above (the second line of the snippet).
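If you’re curious what that dictionary actually holds, you can just poke at it. A minimal sketch, assuming the configs variable from the snippet above (the exact key names can vary a bit between versions of the Object Detection API):
# Peek at the dictionary returned by get_configs_from_pipeline_file();
# typical keys include 'model', 'train_config', 'eval_config', and the input reader configs.
print(list(configs.keys()))
print(type(configs['model']))  # each value is still a protobuf message under the hood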
Parts of a .config protobuf file (JSON/dictionary version)
Model Configurations: Architectures (saved in the model key)
Defines the architecture of a model, along with additional model-specific parameters such as the number of layers and aspect ratios.
meta_arch = configs["model"].WhichOneof("model")
The .WhichOneof() function (unique to protobuf messages!) returns which field is currently set in the “oneof” group inside configs["model"], i.e. which model architecture is selected.
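To make that concrete, here’s a small sketch of what that call returns for our SSD-based config and how it plugs into Monday’s override dictionary (the 'ssd' value is an assumption based on the model we’re using):
# WhichOneof reports which architecture field is set inside the "model" oneof group
meta_arch = configs["model"].WhichOneof("model")    # e.g. 'ssd' for our SSD MobileNet config
# That string gets spliced into the override key so it targets the right sub-message
print('model.{}.num_classes'.format(meta_arch))     # -> 'model.ssd.num_classes'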
Training Configurations: Hyperparameters (saved in the train_config key)
Sets up the hyperparameters for tuning, such as batch size, optimizer(s), and regularization settings, along with checkpoint information for transfer learning.
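Since configs['train_config'] is itself a protobuf message, you can also read or tweak these hyperparameters directly as attributes; a minimal sketch (the override_dict from Monday just does this in bulk):
# Peek at (or directly change) hyperparameters inside the train_config message
print(configs['train_config'].batch_size)
configs['train_config'].batch_size = 6   # same effect as the 'train_config.batch_size' override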
Training Data Configurations: Training Data + Label Map (saved in the train_input_reader key)
Specifies the location and format of the training data, along with a dictionary of numeric IDs with their corresponding classes: the label map.
label_map = label_map_util.get_label_map_dict(LABEL_MAP_PATH)
num_classes = len(label_map.keys())
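For context, the label map itself lives in a small .pbtxt file, and get_label_map_dict() turns it into a plain Python dictionary mapping class names to numeric IDs. A rough sketch with made-up classes:
# A label map file looks roughly like this (made-up classes for illustration):
#   item { id: 1  name: 'helmet' }
#   item { id: 2  name: 'no_helmet' }
label_map = label_map_util.get_label_map_dict(LABEL_MAP_PATH)
print(label_map)                        # e.g. {'helmet': 1, 'no_helmet': 2}
num_classes = len(label_map.keys())     # -> 2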
Evaluating Configurations: Test Data (saved in the eval_config key)
Specifies the location and format of the evaluation data.
Evaluating Data Configurations: Test Data + Label Map (saved in the eval_input_reader key)
Similar to the “Training Data Configurations”, just with validation data instead of training data.
By the way, think of checkpoints like hitting “pause” on a model during training and “checking” all of the model’s current parameters to see how it’s doing; it’s a snapshot of what the model looks like at the moment the checkpoint is taken. Checkpoints can also serve as a “safety net” in case your computer crashes or something unfortunate happens, because with checkpoints you can resume training from whatever point the model was last “saved” at.
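Here’s a tiny, stand-alone sketch of that “pause and snapshot” idea using TensorFlow’s core checkpoint utilities (just the general concept, not the exact mechanism the Object Detection API uses internally):
import tensorflow as tf

# A toy "model": a couple of variables, enough to show the idea
step = tf.Variable(0)
weights = tf.Variable(tf.zeros([3]))

# A checkpoint is a snapshot of the current values of these variables
ckpt = tf.train.Checkpoint(step=step, weights=weights)
manager = tf.train.CheckpointManager(ckpt, '/tmp/demo_ckpts', max_to_keep=3)

weights.assign([1.0, 2.0, 3.0])           # pretend some training happened
step.assign_add(1)
manager.save()                            # "pause" and save the snapshot

weights.assign([0.0, 0.0, 0.0])           # pretend a crash wiped the values
ckpt.restore(manager.latest_checkpoint)   # resume from the snapshot
print(weights.numpy())                    # back to [1. 2. 3.]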
The Folly of the Term “Configuration”
Wait…what’s up with all the “configurations”?
Traditionally, “configuration” referred to the aspects of a model that help crunch the numbers, that do the “mathy stuff,” so to speak. Elements such as the learning rate, the weights and biases, and the activation functions all fall under the configuration. But now, in today’s tEcHoNoLoGiCaLlY aDvAnCeD world, things such as .config files consolidate every aspect of a model, like the architecture and the configuration. Concepts like “hyperparameter optimization”, the automatic adjusting of a model’s hyperparameters (hyperparameters are parameters that influence a model’s output but aren’t themselves optimized during normal training), are being integrated into libraries and frameworks such as TensorFlow and PyTorch, to the point where this kind of automation and consolidation has become the norm.
And now, everything from training parameters to hyperparameters to data paths to architectural settings to evaluation metrics falls under the term “configuration”.
Instead of referring to just the “weights and biases and formulas” of a model, “configuration” has become everything. When someone brings up “configuration” in ML, you can only tell what they’re talking about through context clues.
Wednesday, October 30th (Part 5.3: Wait what is protobuf?)
What’s a “oneof”?
“oneof” is a feature of protobuf (Protocol Buffer) files that works like a mutually exclusive option group: you can define multiple possible fields (protobuf’s version of variables) within a single “oneof” group, but only one field in the group can hold a value at any given time.
The more I tried to understand what was really going on in the code, the more I realized that I had to learn what protobuf is: a Serialization Framework and a Schema Language.
AHHHH!!!!! WTF ARE THOSE!
Well, AI and ML are all about data, data, data. Data is what makes the programming world go round, and things need to be put in place to make transferring data, like our pre-trained model in the form of a .config file, simple and easy (or as simple as it can be…), and that’s what Serialization Frameworks and Schema Languages are here for!
A serialization framework is what “packages” and simplifies data (since dtypes like objects and structures are supposedly “complex” for a computer) from its in-memory form into a simpler form that can be saved in a file or shared with others on different (computer) systems. It’s…kinda like zipping files? Famous serialization frameworks include JSON, XML, YAML, and, yes, Google’s protobuf.
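If that still sounds abstract, here’s about the smallest serialization example possible, using Python’s built-in json module (JSON being one of those frameworks):
import json

# An in-memory Python object (a dict)...
config = {"batch_size": 6, "feature_extractor": "mobilenet_v2"}

# ...serialized into a plain string that can be written to a file or sent anywhere
packed = json.dumps(config)
print(packed)                   # '{"batch_size": 6, "feature_extractor": "mobilenet_v2"}'

# ...and deserialized back into an equivalent in-memory object on the other side
unpacked = json.loads(packed)
print(unpacked["batch_size"])   # 6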
Meanwhile, a schema language defines the “rules” for how data is structured in its corresponding serialization framework. One extremely simple example of a “schema” is CSV, where commas separate the values within a row and newlines separate the rows; the comma notation isn’t technically a schema language, though, due to CSV’s simplicity. Like I said, serialization frameworks have their own designated schema languages, named accordingly: JSON Schema, XML Schema, and protobuf (the schema language is built right into protobuf, because Google has to be *so fancy* and make protobuf simpler than other serialization frameworks).
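And here’s a toy, hand-rolled illustration of what a schema adds on top: rules the data has to follow before anyone will accept it (real schema languages like JSON Schema and protobuf are far richer than this hypothetical check):
# A hand-rolled "schema": each key must exist and hold the expected type
schema = {"num_classes": int, "feature_extractor": str}

def follows_schema(record, schema):
    return all(isinstance(record.get(key), expected) for key, expected in schema.items())

print(follows_schema({"num_classes": 80, "feature_extractor": "mobilenet_v2"}, schema))  # True
print(follows_schema({"num_classes": "eighty"}, schema))                                 # False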
And it turns out, the anatomy of the .config file I walked through the day before yesterday is actually the JSON version of the file (hence the reason it has keys): the original version, in the protobuf serialization framework, looks quite different from its converted JSON version but technically still contains the same data; I’ll show how Protocol Buffer and JSON differ tomorrow.
Halloween (Part 5.4: protobuf File Format Explanation)

A quick explanation of protobuf file notation.
Note: This is an example of a SIMPLIFIED protobuf file for building a pipeline for a CV model, and while the structure is most likely the same, this ISN’T the exact protobuf file that the IBM lab used (the one shown in the previous days).
syntax = "proto3";

message PipelineConfig {
  Model model = 1;
  TrainConfig train_config = 2;
  EvalConfig eval_config = 3;
  InputReader train_input_reader = 4;
  InputReader eval_input_reader = 5;
}

message Model {
  oneof architecture { // "oneof" group to select one architecture type
    Ssd ssd = 1;
    FasterRcnn faster_rcnn = 2;
  }
}

message Ssd {
  int32 num_classes = 1;
  string feature_extractor = 2; // e.g., "mobilenet_v2"
}

message FasterRcnn {
  int32 num_classes = 1;
  string feature_extractor = 2; // e.g., "resnet50"
}

message TrainConfig {
  int32 batch_size = 1;
  float learning_rate = 2;
  string fine_tune_checkpoint = 3;
}

message EvalConfig {
  int32 num_examples = 1;
  string metrics_set = 2;
  int32 eval_interval = 3;
}

message InputReader {
  string file_pattern = 1;
  bool shuffle = 2;
  int32 buffer_size = 3;
}
Corresponding JSON file if the ssd architecture was selected:
{
  "model": {
    "ssd": {
      "num_classes": 80,
      "feature_extractor": "mobilenet_v2"
    }
  },
  "train_config": {
    "batch_size": 32,
    "learning_rate": 0.01,
    "fine_tune_checkpoint": "/path/to/checkpoint"
  },
  "eval_config": {
    "num_examples": 1000,
    "metrics_set": "coco_detection_metrics",
    "eval_interval": 500
  },
  "train_input_reader": {
    "file_pattern": "/path/to/train_dataset/*.tfrecord",
    "shuffle": true,
    "buffer_size": 2048
  },
  "eval_input_reader": {
    "file_pattern": "/path/to/eval_dataset/*.tfrecord",
    "shuffle": false,
    "buffer_size": 1024
  }
}
Corresponding JSON file if the faster_rcnn architecture (with the resnet50 feature extractor) was selected:
{
  "model": {
    "faster_rcnn": {
      "num_classes": 80,
      "feature_extractor": "resnet50"
    }
  },
  "train_config": {
    "batch_size": 32,
    "learning_rate": 0.01,
    "fine_tune_checkpoint": "/path/to/checkpoint"
  },
  "eval_config": {
    "num_examples": 1000,
    "metrics_set": "coco_detection_metrics",
    "eval_interval": 500
  },
  "train_input_reader": {
    "file_pattern": "/path/to/train_dataset/*.tfrecord",
    "shuffle": true,
    "buffer_size": 2048
  },
  "eval_input_reader": {
    "file_pattern": "/path/to/eval_dataset/*.tfrecord",
    "shuffle": false,
    "buffer_size": 1024
  }
}
Here are the equivalents from the protobuf file to JSON:
Protocol Buffer | JSON |
---|---|
message PipelineConfig | N/A (explanation below the table) |
message Model | “model” key |
The specific model message (in this case, Ssd or FasterRcnn) | “ssd” or “faster_rcnn” key, nested within the “model” key (depends on which model is selected) |
message TrainConfig | “train_config” key |
message EvalConfig | “eval_config” key |
message InputReader | “train_input_reader” and “eval_input_reader” keys |
message PipelineConfig’s purpose
The first message, message PipelineConfig, is there to be a top-level container for the configuration file, making the pipeline more modular and acting as a sort of “main menu” for the whole file. More specifically, message PipelineConfig organizes and brings together all the different components needed for setting up and running the machine learning pipeline by containing a field for each component of the pipeline; notice how the field names are identical to their counterparts in the JSON-converted version, and how the declaration keywords (the first word in a field declaration) match the names of the messages that come later in the protobuf file.
message PipelineConfig {
  Model model = 1; // declaration keyword = "Model", field name = "model"
  TrainConfig train_config = 2;
  EvalConfig eval_config = 3;
  InputReader train_input_reader = 4;
  InputReader eval_input_reader = 5;
}
Why can’t I find PipelineConfig in the JSON file?
The reason the JSON conversion doesn’t visibly have a PipelineConfig key is that the JSON represents the contents of the PipelineConfig message itself, where PipelineConfig’s purpose is to act as a container; it’s kinda like having a directory (aka folder) containing another directory that has the actual data that’s needed.
{ // <-- This opening bracket represents the beginning of the PipelineConfig object
  "model": {
    "ssd": {
      "num_classes": 80,
      "feature_extractor": "mobilenet_v2"
    }
  },
  // rest of the JSON data...
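To tie the protobuf and JSON views together, here’s a hedged sketch of what working with these messages looks like in Python, assuming the simplified .proto above were compiled with protoc into a (hypothetical) module called pipeline_pb2:
# Hypothetical: pipeline_pb2 would be generated by
#   protoc --python_out=. pipeline.proto
import pipeline_pb2

cfg = pipeline_pb2.PipelineConfig()
cfg.model.ssd.num_classes = 80                # setting a field inside the "architecture" oneof
cfg.train_config.batch_size = 32

print(cfg.model.WhichOneof("architecture"))   # -> 'ssd'

# Setting the other oneof field automatically clears the first one
cfg.model.faster_rcnn.num_classes = 80
print(cfg.model.WhichOneof("architecture"))   # -> 'faster_rcnn'

data = cfg.SerializeToString()                # serialize: message -> compact bytes
restored = pipeline_pb2.PipelineConfig()
restored.ParseFromString(data)                # deserialize: bytes -> message again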
Lessons Learned
TensorFlow heavily uses .config files, which can be written in either protobuf or YAML format; in this case, it was protobuf.
Created by Google, protobuf (aka Protocol Buffer) is a serialization framework and schema language that handles the storage and “packing” of data so it can be sent and opened (deserialized) safely and easily.
The anatomy of a protobuf and its corresponding JSON file (cool!)
Ending Note
Sorry, guys, but this will be the last post from me for a while. For those who don’t know, I am only a high school student, which means that I live under my parents’ roof, and my parents have taken away access to my computer as part of a ban, meaning that I won’t be able to post or learn about AI during my ban. I expect to be back by January; hopefully this unexpected hiatus won’t last too long.
Resources
Course I followed: