Attempt at explaining the kernel trick in preparation for 6.867 Midterm

Problem: Transforming X into φ(X) space can be expensive, and it is usually used as an intermediate result inside of a dot product like <φ(x[i]), φ(x[j])>.

Trick to save computation time: Conditional on having a φ where we know how to compute <φ(x[i]), φ(x[j])> through a shortcut, we can use the shortcut instead of explicitly calling φ and storing the long intermediate result. The savings stem from (a) saving calls to φ, and (b) making the dot product operate on shorter vectors.

Example

φ(x) = (x[1]**2, sqrt(2)*x[1]* x[2], x[2]**2)

<φ(x),φ(z)> = sum((x[1]**2)(z[1]**2), 2x[1]x[2]z[1]z[2], (x[2]**2)(z[2]**2))

<φ(x),φ(z)> = (x1z1 + x2z2)**2 = <x,z>**2

def shortcut(x, z):
    return (x.dot(z)) **2   

def old_way(x, y):
   return φ(x).dot(φ(z))

In old_way, each call to φ costs 4 multiplications, and the dot product of the intermediate results costs 3 multiplications and 2 additions. So the cost is 11 multiplications and 2 additions.

In shortcut, we need 2 multiplications and 1 addition to compute the dot product. And then another multiplication to square the scalar result, for a total of 3 multiplications and 1 addition.

So, by using shortcut instead of old_way we save 8 multiplications and 1 addition per call.

sshleifer/kernel_trick.md

Example