@nicholaschiasson
Last active February 8, 2017
COMP4102A: Assignment 1

Nicholas Chiasson - 100891716


  1. Homogeneous coordinates allow all common transformations, including rotation, scale, translation, and projection, to be represented using matrix multiplication alone. This reduces the work of finding a composite transformation matrix, or a transformed viewspace point, to a single sequence of matrix multiplications.
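
    A minimal sketch of this idea in Python with NumPy (the point and translation values are illustrative): a translation, which is not a linear map in Cartesian coordinates, becomes a single matrix multiplication in homogeneous coordinates.

```python
import numpy as np

# Translation by (+5, -2) expressed as a 3x3 homogeneous matrix.
T = np.array([[1, 0, 5],
              [0, 1, -2],
              [0, 0, 1]], dtype=float)

p = np.array([3, 4, 1], dtype=float)  # the 2D point (3, 4) in homogeneous form

q = T @ p
print(q[:2] / q[2])  # divide by the last coordinate -> [8. 2.]
```

The same pattern extends to chains of transforms: multiplying the matrices once gives a single matrix that applies the whole sequence.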

  2. Pinhole cameras have an infinite depth of field due to the fact that the rays of light entering the hole are not refracted by a lens and the aperture is very small, therefore the entering rays of light are organized and will not overlap on the image plane to cause blurring.

  3. A camera with a lens does not have an infinite depth of field because the lens refracts entering light such that an object out of focus would reflect rays of light into the lens to be refracted so they do not all meet at (or around) the same point on the image plane, thus creating a blur.

  4. Like the matrix multiplications that represent them, three-dimensional rotations are not commutative. Each successive rotation is performed about axes that earlier rotations have already moved, so the same angles applied in a different order produce a different result. As a small example, a 180 degree rotation about the Y axis followed by a 45 degree upward rotation about the X axis leaves the object facing backward and tilted up by 45 degrees. In contrast, a 45 degree upward rotation about the X axis followed by a 180 degree rotation about the Y axis leaves the object facing backward and tilted down by 45 degrees. After the rotation about the X axis, the Y axis has itself been rotated by 45 degrees, which is why the two orderings differ.
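
    This can be checked numerically. The sketch below (NumPy only, with the same illustrative angles as above) builds the two rotation matrices and confirms that composing them in opposite orders gives different results.

```python
import numpy as np

def rot_x(deg):
    # Rotation about the X axis by the given angle in degrees.
    a = np.radians(deg)
    return np.array([[1, 0, 0],
                     [0, np.cos(a), -np.sin(a)],
                     [0, np.sin(a),  np.cos(a)]])

def rot_y(deg):
    # Rotation about the Y axis by the given angle in degrees.
    a = np.radians(deg)
    return np.array([[ np.cos(a), 0, np.sin(a)],
                     [0, 1, 0],
                     [-np.sin(a), 0, np.cos(a)]])

A = rot_x(45) @ rot_y(180)  # rotate about Y first, then X
B = rot_y(180) @ rot_x(45)  # rotate about X first, then Y
print(np.allclose(A, B))    # -> False: the two orderings differ
```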

  5. A smaller focal length results in a larger field of view, whereas a greater focal length results in a smaller field of view. The greater the focal length, the weaker the perspective effects in the image.
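
    A small sketch of this relationship, using the pinhole field-of-view formula fov = 2 * arctan(w / (2f)); the 36 mm sensor width and the focal lengths are illustrative assumptions.

```python
import math

def fov_deg(sensor_width, focal_length):
    # Horizontal field of view of a pinhole camera, in degrees.
    return math.degrees(2 * math.atan(sensor_width / (2 * focal_length)))

print(round(fov_deg(36, 18), 1))   # short focal length -> wide view (90.0)
print(round(fov_deg(36, 200), 1))  # long focal length -> narrow view
```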

  6. Filling out the matrix multiplication we find the following:

    u = f * X

    v = f * Y

    w = Z

    Knowing that x = u / w and y = v / w, we can derive the following projection equations:

    x = u / w
    x = f * X / Z
    
    y = v / w
    y = f * Y / Z
    

    Now our equation for x is of the form x = some function(f, X, Z) and likewise, our equation for y is of the form y = some function(f, Y, Z).

    Any point on a line from the origin through point P can be represented by the following set of equations known as the parametric form of the equation of a line:

    x' = X_0 + t * X = t * X
    y' = Y_0 + t * Y = t * Y
    z' = Z_0 + t * Z = t * Z
    

    where t is an arbitrary value and X_0, Y_0, and Z_0 are the coordinates of the origin (0, 0, 0).

    In our projection equations from above, we can substitute the values for X, Y, and Z with x' / t, y' / t, and z' / t respectively. Since the t values cancel, the resulting equations hold for any point on the line from the origin through point P:

    x = f * (x' / t) / (z' / t)
    x = f * x' / z'
    // t values cancel out
    
    y = f * (y' / t) / (z' / t)
    y = f * y' / z'
    // t values cancel out
    

    And thus we can conclude that any point (x', y', z') on a line from the origin through point P(X, Y, Z) projects to the same point on the image.
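
    A quick numeric check of this conclusion (f and P are arbitrary illustrative values): every point t * (X, Y, Z) on the ray projects to the same image point (f * X / Z, f * Y / Z).

```python
import numpy as np

f = 2.0
P = np.array([3.0, 4.0, 5.0])  # a point P = (X, Y, Z) in front of the camera

# Scale P by several values of t and project each scaled point.
for t in (0.5, 1.0, 7.0):
    x_, y_, z_ = t * P
    print(f * x_ / z_, f * y_ / z_)  # same (x, y) = (1.2, 1.6) for every t
```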

  7. All 3D rotations can be represented with as few as 3 parameters, in other words 3 degrees of freedom.

    All 2D rotations can be represented with as few as 1 parameter, in other words 1 degree of freedom.

    This implies that the entries of a rotation matrix are not independent (i.e., its rows and columns are orthonormal); otherwise there would be more degrees of freedom, since 2D rotations are represented by 2-by-2 matrices (4 entries) and 3D rotations by 3-by-3 matrices (9 entries).

  8. From the equation Zz = f*f we can derive the thin lens equation, which shows that Zz = f*f is valid whenever the thin lens equation holds.

    Zz = f*f
    (Z^ - f)(z^ - f) = f*f
    (Z^z^ - fZ^ - fz^ + f*f) / f = f
    Z^z^ / f - Z^ - z^ + f = f
    Z^z^ / f = f + Z^ + z^ - f
    Z^z^ / f = Z^ + z^
    1 / f = (Z^ + z^) / Z^z^
    1 / f = (Z^ / Z^z^) + (z^ / Z^z^)
    1 / f = (1 / z^) + (1 / Z^)
    

    Now all we need to do is prove the thin lens equation.

    [Figure: thin lens model, showing object SP, lens center O, right focal point F_r, and image sp]

    Above is a graphical representation of the thin lens model.

    We can find similar triangles in the figure connecting points QOF_r and spF_r.

    We can reason that sp / QO = sF_r / OF_r.

    Further, since sF_r = z and OF_r = f, we can instead say:

    sp / QO = z / f = (z^ / f) - 1

    where z^ = z + f. Also let Z^ = Z + f.

    Next, we can find two more similar triangles in the figure connecting points SPO and spO.

    Like before, we can reason that sp / SP = sO / SO = z^ / Z^.

    It is visually clear from the figure that SP = QO, so we can rewrite this as:

    sp / QO = z^ / Z^

    Now, comparing the following two equations:

    sp / QO = (z^ / f) - 1

    sp / QO = z^ / Z^

    we see that z^ / Z^ = (z^ / f) - 1.

    Dividing both sides by z^ reveals the following:

    z^ / (Z^z^) = z^ / (fz^) - 1 / z^

    1 / Z^ = (1 / f) - (1 / z^)

    which rearranges to the thin lens equation:

    (1 / Z^) + (1 / z^) = 1 / f

    This proves the thin lens equation.

    As Z^ approaches plus infinity, z^ approaches f; and as Z^ approaches f, z^ approaches plus infinity.
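
    A quick numeric check of the equivalence (f = 10 and Z = 25 are illustrative values):

```python
# Pick f and Z, compute z from the Newtonian form Zz = f*f, and verify
# that Z^ = Z + f and z^ = z + f satisfy the thin lens equation.
f = 10.0
Z = 25.0
z = f * f / Z   # Zz = f^2  ->  z = 4

Z_hat = Z + f   # 35
z_hat = z + f   # 14
print(abs(1 / Z_hat + 1 / z_hat - 1 / f) < 1e-12)  # -> True
```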

  9. Focal length f = 500, pixel size s_x = s_y = 1, and principal point at (o_x, o_y) = (320, 240).

    a. Below are the simplified matrix multiplication steps toward finding the projection matrix to project a world frame point onto the image plane.

    ```
    | -f/s_x     0    o_x | | 1 0 0 -170 |
    |    0    -f/s_y  o_y | | 0 1 0  -95 |
    |    0       0     1  | | 0 0 1  -70 |
    ```
    
    ```
    | -500     0   320 | | 1 0 0 -170 |
    |   0    -500  240 | | 0 1 0  -95 |
    |   0      0    1  | | 0 0 1  -70 |
    ```
    
    ```
    | -500    0    320    85000-22400 |
    |   0   -500   240    47500-16800 |
    |   0     0     1         -70     |
    ```
    
    ```
    | -500    0    320    62600 |
    |   0   -500   240    30700 |
    |   0     0     1      -70  |
    ```
    

    b. The following matrix multiplication will tell us the pixel coordinates of the world point X_w = (350, 200, 150):

    ```
    | -500    0    320    62600 | | 350 |
    |   0   -500   240    30700 | | 200 |
    |   0     0     1      -70  | | 150 |
                                  |  1  |
    ```
    
    ```
    | -64400 |
    | -33300 |
    |   80   |
    ```
    
    Thus the pixel coordinates of X_w are:
    ```
    (u, v) = (-64400 / 80, -33300 / 80)
    (u, v) = (-805, -416.25)
    ```
    
  10. Done in code; see the Python script at the end of this document.

  11. O(N^2) time complexity per output pixel when convolving directly with an N x N square 2D mask, but only O(N) per output pixel when the kernel is separable, since the 2D convolution can then be performed as two 1D passes.

  12. TRUE or FALSE

    a. FALSE

    b. FALSE

    c. FALSE

    d. FALSE

  13. Having the full integral image makes it very easy to compute the sum of all pixels in the desired rectangle.

    Firstly, we already have the sum of the pixels from the top left of the integral image all the way to the bottom right of our rectangle, I(i+H, j+W), but this is too much information. We need to subtract the sum from the area above and to the left of the rectangle. Using the following equation, this can be accomplished:

    I(i+H, j+W) - I(i, j+W) - I(i+H, j)
    

    The problem now is that we have subtracted too much. We have double subtracted the small window above and to the left diagonally above the rectangle because both sums I(i, j+W) and I(i+H, j) include this area of pixels. To fix this, we use the following, final equation:

    I(i+H, j+W) - I(i, j+W) - I(i+H, j) + I(i, j)
    

    Because this approach only requires 4 sums given by the precomputed integral image and no iteration at all, the running time is not dependent on the size of the rectangle.
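
    A minimal sketch of the four-lookup box sum described above, using a zero-padded integral image built with NumPy (the image contents and rectangle are illustrative):

```python
import numpy as np

img = np.arange(36).reshape(6, 6)

# Zero-padded integral image: I[r, c] is the sum of all pixels above
# and to the left of (r, c) in img.
I = np.zeros((7, 7), dtype=np.int64)
I[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def box_sum(i, j, H, W):
    # Sum of the H x W rectangle whose top-left pixel is (i, j):
    # four lookups, independent of the rectangle size.
    return I[i + H, j + W] - I[i, j + W] - I[i + H, j] + I[i, j]

print(box_sum(1, 2, 3, 2))      # constant-time lookup...
print(img[1:4, 2:4].sum())      # ...matches the direct O(HW) sum
```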

# Question 10: project the world point from question 9 using OpenCV.
import cv2
import numpy as np

# Intrinsic matrix K, identity rotation R, and translation T from question 9.
K = np.array([[-500, 0, 320], [0, -500, 240], [0, 0, 1]], np.float64)
R = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], np.float64)
T = np.array([-170, -95, -70], np.float64)
X_w = np.array([[350, 200, 150]], np.float64)
print("K matrix:")
print(K)
print("R matrix:")
print(R)
print("T vector:")
print(T)
print("World point:")
print(X_w)
print("Image point:")
print(cv2.projectPoints(X_w, R, T, K, None)[0][0])