Learning Computer Vision Week 2

Week 2 of documenting my AI/ML learning journey (Sept 15 - Sept 21)

What was discussed last week…

  • Image and Pixel Transformations

  • The Basics of Linear Algebra: vectors, matrices, and eigenvectors

  • SVD (still figuring it out)

Didn’t read last week’s post? Click here!

At my high school, I'm in a STEM pathway with courses geared toward different STEM fields, and understandably, there are engineering courses. In one of those engineering classes, I get access to a makerspace, and the project I was working on at the time of this post was to make a laser engraving file to engrave into a piece of wood; it was practice for new laser cutting software that we had just received licenses for (LightBurn, for those who are curious).

Midway into the project, I wanted to bring a design I found on the internet into the LightBurn file, which meant I had to use an online SVG (vector image) converter to import the image. And while the converter was working, I wondered: does it use edge detection?

Some SVG converters do use Sobel filters or similar edge detection techniques when converting raster images to vector graphics. Edge detection, including methods like the Sobel operator, is often a key step in identifying shapes and lines in an image for vectorization. However, not all SVG converters use these advanced techniques. The specific approach varies depending on the tool and desired output quality, ranging from sophisticated edge detection and path tracing to simpler methods or even direct raster image embedding.

-"Response to user query regarding SVG converters and Sobel filters." OpenAI, 2024, https://www.perplexity.ai/

Now, edge detection was a concept that I had skimmed before (in the same course), but I hadn't looked into it fully, so this week, I'll be investigating it.

Also, last week was a doozy, considering that I picked up linear algebra in just a few days, so I'll also spend this next week cleaning up my understanding of LA, and specifically the mechanics of SVD.

Monday, September 16

Today, I spent most of my programming time visualizing a Sobel operator and a sharpening operator on some small made-up images (on paper).

Tuesday, September 17

Downloaded the Pillow and OpenCV libraries using pip today!

Welp, looks like I've already done it actually

Now that I've learned the concepts of correcting images and conducting segmentation through filtering, sharpening, and edge detection, I experimented with some images in a Jupyter notebook using the Pillow and OpenCV libraries; both can get the job done, but in different ways.

Types of common and useful filters (not exhaustive):

1a. Mean Filters (Level Filters, Smoothing Filters)

These are the simplest filters when it comes to image preprocessing: the kernel averages all the pixel values it covers and uses that average as the new value for its center pixel.

No matter which library is being used, a kernel is required.

What is a Kernel (in convolution)?

Also known as a filter, the convolution kernel is a small matrix used for feature extraction. For each pixel in an image, the inner product of the local window centered on that pixel and the convolution kernel is calculated, and that value becomes the new value for the pixel. Every pixel in the image is traversed and the same inner-product operation is performed, producing a new feature map. Different convolution kernels achieve different functions.
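To make that inner product concrete, here's a tiny NumPy sketch (the 3x3 patch and kernel values are made up purely for illustration) that computes the new value of a single pixel:

import numpy as np

patch = np.array([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]])       # the local 3x3 window centered on one pixel
kernel = np.ones((3, 3)) / 9           # a 3x3 mean (averaging) kernel
new_value = np.sum(patch * kernel)     # inner product: multiply elementwise, then sum
print(new_value)                       # 50.0, the average of the window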

Pillow library

# Pillow's ImageFilter.Kernel only accepts 3x3 or 5x5 kernels, so a 5x5 mean kernel is used here
kernel = np.ones((5,5))/25
kernel_filter = ImageFilter.Kernel((5,5), kernel.flatten())
image_filtered = noisy_image.filter(kernel_filter)
plot_image(image_filtered, noisy_image, title_1="Filtered image", title_2="Image Plus Noise")

OpenCV library

kernel = np.ones((6,6))/36
image_filtered = cv2.filter2D(src=noisy_image, ddepth=-1, kernel=kernel)
plot_image(image_filtered, noisy_image, title_1="Filtered image", title_2="Image Plus Noise")

But, of course, there is a trade-off in kernel size: the larger the kernel, the more noise it can smooth out, but the more it blurs the image.

1b. Gaussian Blur

Gaussian Blur bases its blurring on the Gaussian distribution (the one the 68-95-99.7 rule comes from): the kernel gives the most "weight" to the center pixel and exponentially less "weight" to the surrounding pixels, falling off the way a Gaussian distribution does.

Pillow library

image_filtered = noisy_image.filter(ImageFilter.GaussianBlur(4))
plot_image(image_filtered, noisy_image, title_1="Filtered image", title_2="Image Plus Noise")

The parameter in .GaussianBlur() (i.e. 4) is the kernel radius (yes, that means the kernel is always going to be square).

OpenCV library

image_filtered_23 = cv2.GaussianBlur(noisy_image, (5,5), sigmaX=4, sigmaY=4)
plot_image(image_filtered_23, noisy_image, title_1="Filtered image", title_2="Image Plus Noise")

OpenCV's .GaussianBlur() has quite a few more parameters; here are descriptions for the ones shown (respectively):

src: The input image

ksize: The Gaussian kernel size

sigmaX: The kernel’s standard deviation in the X direction

sigmaY: sigmaX but in the y-direction; if sigmaY is set to 0, it is taken to be equal to sigmaX

Even though you can do and customize a lot more with OpenCV, it’s more complex than Pillow.

When I was trying to display different images in my Jupyter notebook, I had to read into the matplotlib (library) documentation in order to successfully do so; it came in pretty handy! (If you’re curious which website I used, look in the Resources section)

1c. Median Filtering

Median Filtering is similar to Mean Filtering, but instead of taking the mean (average) of all the pixels in the kernel, it takes the median. This is especially good at removing speckly "salt-and-pepper" noise while keeping edges more definite, though it can sometimes warp fine details.
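A quick sketch of what that might look like in both libraries (reusing the noisy_image and plot_image setup from above):

# Pillow: MedianFilter takes the window size (odd numbers, e.g. 5 for a 5x5 window)
image_filtered = noisy_image.filter(ImageFilter.MedianFilter(size=5))
plot_image(image_filtered, noisy_image, title_1="Filtered image", title_2="Image Plus Noise")

# OpenCV: medianBlur also takes an odd kernel size
image_filtered = cv2.medianBlur(noisy_image, 5)
plot_image(image_filtered, noisy_image, title_1="Filtered image", title_2="Image Plus Noise")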

2. Edges

Pillow library

Edge Enhancing (optional but recommended):

img_gray = img_gray.filter(ImageFilter.EDGE_ENHANCE)
img_gray = img_gray.filter(ImageFilter.FIND_EDGES)

Note that this can also be a shortcut for image sharpening, but I'll go over a more comprehensive method down below.

OpenCV library

x-direction:

ddepth = cv2.CV_16S

grad_x = cv2.Sobel(src=img_gray, ddepth=ddepth, dx=1, dy=0, ksize=3)

y-direction:

grad_y = cv2.Sobel(src=img_gray, ddepth=ddepth, dx=0, dy=1, ksize=3)

Combining:

abs_grad_x = cv2.convertScaleAbs(grad_x)
abs_grad_y = cv2.convertScaleAbs(grad_y)
grad = cv2.addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0)
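One thing worth noting: cv2.addWeighted() here just averages the absolute x- and y-gradients, which is a fast approximation of the true gradient magnitude. If I wanted the exact magnitude instead, a sketch like this (reusing grad_x and grad_y from above) would do it:

grad_x_f = grad_x.astype(np.float64)
grad_y_f = grad_y.astype(np.float64)
magnitude = np.sqrt(grad_x_f**2 + grad_y_f**2)                 # true gradient magnitude per pixel
grad = cv2.convertScaleAbs(magnitude / magnitude.max() * 255)  # rescale to 0-255 for display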

3. Sharpening

Pillow library

sharpen_kernel = np.array([[-1,-1,-1],
                           [-1, 9,-1],
                           [-1,-1,-1]])
kernel_filter = ImageFilter.Kernel((3,3), sharpen_kernel.flatten())
sharpened = image.filter(kernel_filter)
plot_image(sharpened, image, title_1="Sharpened image", title_2="Image")

OpenCV library

sharpen_kernel = np.array([[-1,-1,-1],
                           [-1, 9,-1],
                           [-1,-1,-1]])
sharpened = cv2.filter2D(image, -1, sharpen_kernel)
plot_image(sharpened, image, title_1="Sharpened image", title_2="Image")

Notice how both libraries require a sharpening kernel to work! This sharpening kernel/filter can come in many variations, but the main gist is that the center value is a large positive number, the surrounding values are negative, and the entries typically sum to 1 so the overall brightness of the image is preserved.
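For example, a milder variation that only penalizes the four directly adjacent neighbors (a common alternative, not one from the course lab) still follows that pattern:

milder_kernel = np.array([[ 0,-1, 0],
                          [-1, 5,-1],
                          [ 0,-1, 0]])          # center 5, four -1 neighbors, entries sum to 1
sharpened_mild = cv2.filter2D(image, -1, milder_kernel)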

Wednesday, September 18

Took a quiz in my Computer Vision course, and I passed second try 🤷

I also reviewed KNN (K-Nearest Neighbors), but in the context of computer vision and, more specifically, image classification, which is what it sounds like: classifying images into categories, e.g. "cat", "dog", "fish".

When computers try to identify images, remember that they can only see them as a bunch of intensity values, which poses a lot of problems for image classification. KNN is an algorithm that attempts to bridge that gap.

Basically, when looking at a set of images, the computer "plots" each one as a point in a space (like a coordinate plane), where the space has one dimension for each characteristic or attribute; for example, in a dataset where images only vary in their average red, green, and blue values, they would be plotted in a 3D space. Representing an image with a few summary features like that, instead of every raw pixel, is the ML concept of dimensionality reduction, or in other words, "data visualization in a low-dimensional space". Then, once the model has been "trained" on the "training" images it's given, it is handed an image it has never seen before and is tasked with classifying it using that training data. This unknown image is called the "test" data. The computer calculates the distance between the test image and all the other images in the training data and keeps the "k" nearest images, or "neighbors". Finally, it assigns the test image whatever class the majority of those k nearest neighbors belong to.
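A minimal sketch of that whole process in NumPy (the tiny made-up dataset, features, and k value are just for illustration):

import numpy as np

# each row is one "training image" described by 3 features (avg R, G, B), with its class label
train_features = np.array([[200, 60, 50], [210, 70, 40], [40, 60, 200], [50, 50, 190]], dtype=float)
train_labels = np.array(["cat", "cat", "fish", "fish"])

test_image = np.array([45, 55, 195], dtype=float)   # the unseen image to classify
k = 3

distances = np.linalg.norm(train_features - test_image, axis=1)   # distance to every training image
nearest = np.argsort(distances)[:k]                               # indices of the k closest neighbors

# majority vote among the k nearest neighbors
labels, counts = np.unique(train_labels[nearest], return_counts=True)
print(labels[np.argmax(counts)])                                  # "fish"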

KNN sounds cool in theory, but in practice it makes for a relatively inefficient classification algorithm.

Thursday, September 19

I spent most of today reviewing mainly machine learning/AI theory and starting up my IBM Cloud account (that came with the course, hint hint); I reviewed a new concept today, Linear Classification.

A Linear Classifier is basically the concept of inserting an input's data into a linear function called the Decision Function. Its output places the input on a Decision Plane, and the Decision Boundary splits that plane into Decision Regions; whichever region the input's point lands in determines the class it belongs to. In the context of computer vision, the input is an image: the classifier treats the image's three color channels as vectors, runs them through the decision function, and plots the image on the decision plane.

An example decision function/boundary
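A tiny sketch of what a decision function looks like in code (the weights, bias, and "flattened image" are made-up numbers, just to show the shape of the computation):

import numpy as np

x = np.array([0.2, 0.7, 0.1, 0.9])        # a "flattened image" with 4 pixel values
W = np.array([[ 0.5, -0.3,  0.8,  0.1],   # one row of weights per class
              [-0.2,  0.6,  0.4, -0.5]])
b = np.array([0.1, -0.1])                 # one bias per class

scores = W @ x + b                        # the decision function: a score for each class
predicted_class = np.argmax(scores)       # the highest score decides the class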

Friday, September 20

After looking over the concepts of Gradient Descent (a concept borrowed from calculus where an ML model nudges its parameters in the direction that decreases the error) and its role in Logistic Regression (used for categorical problems, as opposed to Linear Regression), I looked at some sample code (a lab) that IBM made that puts those concepts into action.
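Before getting to the lab itself, the core gradient descent update fits in a few lines; here's a minimal sketch of the idea (with a made-up one-parameter "model" and loss, not taken from the IBM lab):

import torch

w = torch.tensor(0.0, requires_grad=True)    # a single model parameter
learning_rate = 0.1

for _ in range(100):
    loss = (w - 3.0) ** 2                    # made-up loss, smallest when w = 3
    loss.backward()                          # compute the gradient of the loss with respect to w
    with torch.no_grad():
        w -= learning_rate * w.grad          # step in the direction that decreases the error
        w.grad.zero_()

print(w)                                     # ends up very close to 3.0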

I was really overwhelmed at first glance.

What in the world…

Note:

If I didn't say this before: make sure you have your basic Python skills and concepts down before going further into your AI/ML journey; a lot of libraries and frameworks (as of Fall 2024) are built on Python.

So to save some trouble, I’ll highlight some of the functions (only functions from PyTorch) that were important in making a Logistic Regression Model, which is a very simple type of Neural Network.

torch.arange() and torch.view() (PyTorch)

torch.arange(-1, 1, 0.1).view(-1, 1)

The .arange(-1, 1, 0.1) creates a 1D tensor with values from -1 to 1 (exclusive) in steps of 0.1, producing values like [-1.0, -0.9, -0.8, ..., 0.9]. The .view(-1, 1) then reshapes that tensor, where the first value determines the number of rows and the second the number of columns; the "-1" tells PyTorch to infer that dimension from the number of elements in the tensor, effectively reshaping it into a single column with each element on its own row.
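A quick check of what that does to the shape:

import torch

x = torch.arange(-1, 1, 0.1)
print(x.shape)                # torch.Size([20]) -- a flat 1D tensor
print(x.view(-1, 1).shape)    # torch.Size([20, 1]) -- 20 rows, 1 column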

What is a Tensor?

Tensors can have a lot of meanings, depending on which scientific context you’re talking about (e.g. physics, autonomy), but since this is a documentation of my machine learning/AI journey, I’m going to explain what tensors are in the context of machine learning; more specifically, in PyTorch.

Tensors are a generalization of different "structures" (or "mathematical objects" in a more algebraic sense): scalars/values are 0D tensors, vectors/arrays are 1D tensors, matrices/2D arrays are, understandably, 2D tensors, and tensors can store data in even higher dimensions (3D, 4D, etc.). With the right operations, a tensor can be reduced to fewer dimensions, like converting a color image (3D, because it has height, width, and color channels) to a grayscale image (2D, because it no longer needs separate color intensities).
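A small sketch of those dimensions in PyTorch (the 3-channel "image" here is just random made-up data):

import torch

scalar = torch.tensor(3.14)                  # 0D tensor
vector = torch.tensor([1.0, 2.0, 3.0])       # 1D tensor
matrix = torch.ones(2, 3)                    # 2D tensor

color_image = torch.rand(3, 64, 64)          # 3D tensor: channels x height x width
gray_image = color_image.mean(dim=0)         # average over the channel dimension -> 2D tensor
print(color_image.shape, gray_image.shape)   # torch.Size([3, 64, 64]) torch.Size([64, 64])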

nn.Linear() (PyTorch)

self.linear = nn.Linear(n_inputs, 1)

This code creates what's called a Linear Layer: it takes an input with n_inputs values, "plugs" each of them into a linear equation (multiplying each by a learned weight), and adds the results together (plus a bias) into a single output value; in other words, it maps n_inputs numbers to 1 weighted value.
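For example (the sizes here are just made up for illustration):

import torch
import torch.nn as nn

linear = nn.Linear(3, 1)             # maps 3 input values to 1 output value
x = torch.tensor([[1.0, 2.0, 3.0]])  # a batch containing one 3-value input
print(linear(x).shape)               # torch.Size([1, 1]) -- one weighted output value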

torch.sigmoid() with the linear layer (PyTorch)

yhat = torch.sigmoid(self.linear(x))

The torch.sigmoid() puts its argument through a sigmoid function, which is one kind of activation function, and outputs the result. Activation functions introduce nonlinearity into the model; this is important because nonlinearity is crucial in machine learning models (like neural networks), since it's what enables them to identify complex relationships and patterns. Additionally, activation functions like the sigmoid are differentiable (their rate of change can be calculated at any point), which is a requirement for Gradient Descent to find the model parameters that would make the model more accurate.

A snapshot of a sigmoid function, where the left point of the red line represents the input, and the right (purple) point represents the output.
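Putting the pieces together, a minimal sketch of a logistic regression model in PyTorch might look like this (my own condensed version, not the exact IBM lab code):

import torch
import torch.nn as nn

class LogisticRegression(nn.Module):
    def __init__(self, n_inputs):
        super().__init__()
        self.linear = nn.Linear(n_inputs, 1)   # the linear layer: n_inputs values -> 1 weighted value

    def forward(self, x):
        yhat = torch.sigmoid(self.linear(x))   # squash the weighted value into a 0-1 "probability"
        return yhat

model = LogisticRegression(n_inputs=1)
x = torch.arange(-1, 1, 0.1).view(-1, 1)       # the same reshaped input tensor from earlier
print(model(x).shape)                          # torch.Size([20, 1]) -- one prediction per input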

!! OOP Alert !!

Machine Learning and the Encapsulation Principle

If you noticed, in the last function that I explained, the function itself doesn't do the work; it actually creates a "thing" that can carry out the work. This "thing" is called an object. This style is an attribute of Object-Oriented Programming and an example of the encapsulation principle; machine learning libraries and frameworks (e.g. PyTorch) tend to follow this principle to stay organized and encourage modularity, so expect to see this style of object-making a lot in AI/ML code.

Lessons Learned

  • I should be grateful for computers and their libraries and frameworks for doing the heavy math-lifting of Sobel operations and linear algebra

  • There are a variety of filters that help with image preprocessing, and many of them are reminiscent of tools in photography software (e.g. Adobe Lightroom)

  • KNN is a good concept in theory, but in practice it kinda sucks, so different algorithms based on KNN have been made to improve upon it.

  • Linear Classifiers use Decision Functions to classify things.

  • There's a reason why so many functions from AI/ML libraries hand you an object "middleman": that's because of the encapsulation principle in OOP.

Sources/Resources

The course I followed:

Libraries (specific pages):