Learning Computer Vision Week 1

Week 1 of documenting my AI/ML learning journey (Sept 8 - Sept 14)

Things I’ve already learned

  • How CNNs work

  • How RNNs and LSTMs work

  • Programmed a CNN

  • Learned the basics of multiple AI/ML Python libraries like pandas, Keras, and numpy

I was bummed to find out that, despite all of our advances in cameras and photography, computers can't inherently "tell" whether or not an object is in an image.

But then again, that's why we have things like CAPTCHAs, which exploit these weaknesses of computers. Now that computer vision is advancing, though, I'm not sure CAPTCHAs will stick around much longer…

Man I hate these things…

This week, I dove into one of IBM's computer vision (CV) courses using Python and other libraries associated with the field (like OpenCV and Pillow; I'll explain what these libraries do later).

What is Computer Vision?

Computer vision is a field of artificial intelligence (AI) that uses machine learning and neural networks to teach computers and systems to derive meaningful information from digital images, videos and other visual inputs—and to make recommendations or take actions when they see defects or issues.  

Wednesday, September 11

Today, I focused on how images can “transform” in the mathematical sense.

Generally, in computer science, when you give input images to a computer, it registers them as a grid of pixels on a screen.

In computer vision, each pixel can be represented as a vector—a set of numbers. To transform images (e.g., translate, stretch, reflect, rotate), each pixel vector is multiplied by a matrix, resulting in a new vector representing the pixel’s new position.

And the program does it for every single pixel.
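To make that concrete, here's a tiny NumPy sketch of the idea (the matrix and pixel coordinates are made-up values just for illustration):

```python
import numpy as np

# An illustrative 2x2 transformation matrix: stretches x by a factor of 2
M = np.array([[2.0, 0.0],
              [0.0, 1.0]])

# A pixel's position represented as a vector
pixel = np.array([3.0, 5.0])

# Multiplying the matrix by the vector gives the pixel's new position
new_pixel = M @ pixel
print(new_pixel)  # [6. 5.]
```

A real program would do this matrix-vector multiplication for every pixel position in the image.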

Now, if terms like "vector" and "matrix" confuse you, don't worry: they come from linear algebra, a challenging field of mathematics. To understand the math behind these transformations, I got introduced to linear transformations, a core concept in linear algebra.

Note that I am only a sophomore in high school as of the writing of this newsletter .-.

The transformation matrix for one of the transformations: rotations.
Rotations require a knowledge of the unit circle (trig) to understand.
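The rotation matrix uses sine and cosine from the unit circle. Here's a quick NumPy sketch of it in action (my own illustrative example):

```python
import numpy as np

theta = np.pi / 2  # 90 degrees counterclockwise, in radians

# The standard 2D rotation matrix: [[cos, -sin], [sin, cos]]
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Rotating the point (1, 0) by 90 degrees CCW should land on (0, 1)
rotated = R @ np.array([1.0, 0.0])
print(np.round(rotated, 6))  # [0. 1.]
```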

Thankfully, my family has access to ChatGPT’s “Plus” plan, so I grinded linear algebra in a new chat like there was no tomorrow.

I should get a Guinness World record for “learning linear algebra, fastest time” lmao

Friday, September 13

After learning about linear transformations, I started applying them in code. Using a Jupyter notebook, I made some examples where I transformed images.

Transformations

1. Resizing/Scaling

Pillow library:

new_image = image.resize((new_width, new_height))

OpenCV library:

For scale factors:

new_image = cv2.resize(image, None, fx=scale_factor_horizontal, fy=scale_factor_vertical, interpolation=cv2.INTER_NEAREST)

For a specified number of rows and columns:

new_image = cv2.resize(image, (100, 200), interpolation=cv2.INTER_CUBIC)

The "interpolation" parameter can differ: INTER_NEAREST copies the value of the single nearest pixel, while INTER_CUBIC blends the values of several neighboring pixels.
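To build intuition for what INTER_NEAREST does, here's a hand-rolled nearest-neighbor resize of my own (a toy sketch, not OpenCV's actual implementation):

```python
import numpy as np

def resize_nearest(img, new_h, new_w):
    """Toy nearest-neighbor resize: each output pixel copies the closest input pixel."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h  # which input row each output row copies
    cols = np.arange(new_w) * w // new_w  # which input column each output column copies
    return img[rows[:, None], cols]

# A tiny 2x2 "image": upscaling it to 4x4 duplicates each pixel into a 2x2 block
img = np.array([[0, 255],
                [255, 0]], dtype=np.uint8)
big = resize_nearest(img, 4, 4)
print(big.shape)  # (4, 4)
```

INTER_CUBIC, by contrast, would produce in-between values at the boundaries instead of hard copies.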

2. Translating

Pillow library:

# Pillow has no translate() method; translation is done with an affine transform.
# The tuple maps output coordinates back to input coordinates, so the shifts
# are negated: this moves the image right by 100 and down by 50.
translated_img = img.transform(img.size, Image.AFFINE, (1, 0, -100, 0, 1, -50))

OpenCV library:

M = np.float32([[1, 0, translation_x], [0, 1, translation_y]])  # the 2x3 translation matrix
new_image = cv2.warpAffine(image, M, (cols + translation_x, rows + translation_y))

3. Rotating

Pillow library:

new_image = image.rotate(theta) # theta is the angle in degrees (CCW)

OpenCV library:

M = cv2.getRotationMatrix2D(center=(cols // 2 - 1, rows // 2 - 1), angle=theta, scale=1)
new_image = cv2.warpAffine(image, M, (cols, rows))

The “cols // 2 - 1, rows // 2 - 1” represents the image’s center.

Tip for programming (in general)

Note that there are multiple ways to do certain functions; feel free to research further into functions like these if you wish!

I also learned about Array Operations.

We can perform array operations on an image; using Python broadcasting, we can add a constant to each pixel's intensity value.
Before doing that, we must first convert the PIL image to a numpy array.

-IBM’s Skills Network
# image here is a numpy array (converted from the PIL image)
# Adding a constant to every pixel (via broadcasting)
new_image = image + 20
# Multiplying every pixel by a constant
new_image = image * 10

# Making noise (a random array the same shape as the image)
Noise = np.random.normal(0, 20, (height, width, 3)).astype(np.uint8)
# Adding the noise to the image
new_image = image + Noise
# Multiplying the image by the noise
new_image = image * Noise
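To see the broadcasting (and one gotcha) concretely, here's a tiny self-contained example; the 2x2 "image" is made up:

```python
import numpy as np

# A tiny "image": 2x2 pixels, 3 channels, all intensity 100
image = np.full((2, 2, 3), 100, dtype=np.uint8)

# Broadcasting: the scalar 20 is added to every element of the array
brighter = image + 20
print(brighter[0, 0])  # [120 120 120]

# Gotcha: uint8 silently wraps around past 255
overflow = image + 200  # 100 + 200 = 300, which wraps to 44
print(overflow[0, 0])  # [44 44 44]
```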

And I learned another concept, Singular Value Decomposition (yet another concept from linear algebra, *yikes*!), through this code (explanation is below the code):

U, s, V = np.linalg.svd(im_gray, full_matrices=True)

So I went to ChatGPT again for an explanation.

Thanks ChatGPT

So supposedly, SVD is a useful tool in image processing, e.g., image compression, dimensionality reduction, noise filtering, etc.

Code to show the “number of components” comparison (the more components, the better the image quality):

# np.linalg.svd returns the singular values s as a 1-D array,
# so we first rebuild the diagonal matrix S from it
S = np.zeros(im_gray.shape)
k = min(im_gray.shape)
S[:k, :k] = np.diag(s)

for n_component in [1, 10, 100, 200, 500]:
    S_new = S[:, :n_component]
    V_new = V[:n_component, :]
    A = U.dot(S_new.dot(V_new))
    plt.imshow(A, cmap='gray')
    plt.title("Number of Components: " + str(n_component))
    plt.show()
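The loop above runs on im_gray from the course notebook; as a self-contained sanity check, here's the same low-rank reconstruction idea on a small random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 5))  # a small stand-in for a grayscale image

# Full SVD: A = U @ S @ V, where S is rebuilt from the 1-D singular values s
U, s, V = np.linalg.svd(A, full_matrices=True)
S = np.zeros((6, 5))
S[:5, :5] = np.diag(s)

# Keeping only the top k components gives a rank-k approximation of A
k = 2
A_k = U[:, :k] @ S[:k, :k] @ V[:k, :]

# With all components kept, the reconstruction is exact
A_full = U @ S @ V
print(np.allclose(A_full, A))  # True
```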

Saturday, September 14

Today, I finally decoded the math behind SVD, through a really great video that I watched.

And I also learned more linear algebra concepts, like eigenvalues and eigenvectors, because of their roles in the formula for SVD (ChatGPT had mentioned them in its earlier explanation, but I overlooked them yesterday):

ChatGPT teaching me once again

Practice matrices/problems that I tried to find the eigenvectors for (emphasis on the "tried").
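For anyone curious, here's a tiny NumPy check of what eigenvalues and eigenvectors actually mean (the matrix is my own made-up example):

```python
import numpy as np

# A simple 2x2 matrix (illustrative values)
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

# np.linalg.eig returns the eigenvalues and eigenvectors (as columns)
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)  # [2. 3.]

# The defining property: A @ v = lambda * v for each eigenpair
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)
```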

Lessons Learned

  1. Linear algebra isn't really a "cryptic" subject; it's more like a ton of simple concepts put together that provide a different perspective on everyday operations and data structures (and you have to multiply numbers a million times before calculating an eigenvalue).

  2. Linear Algebra really is crucial when learning about the math theory behind image processing in Computer Vision.

  3. SVD is the "culmination" of many linear algebra concepts, so it's probably a good idea to look at the most important LA concepts first before attempting to learn how SVD works.

  4. The Pillow and OpenCV libraries can both be used for image preprocessing in computer vision, but they accomplish tasks in different ways. For example, OpenCV works directly with NumPy arrays, while Pillow has its own "PIL Image" datatype.

Sources/Resources

The course I followed:

Learn Linear Algebra (Khan Academy)

Extra SVD/Linear Algebra videos: