@byelipk
byelipk / time-complexity.md
Created June 5, 2017 14:23
Common time complexity operations

O(1)

This denotes constant time. Running a statement like if (true) {...} takes constant time. Other examples are looking up a value by key in an object or hash table, or by index in an array.
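As a quick sketch (the data here is made up for illustration), key and index lookups take the same amount of work no matter how many items the collection holds:

```python
# O(1) lookups: the cost does not grow with the size of the collection.
prices = {"apple": 1.25, "banana": 0.50}   # hash table lookup by key
print(prices["banana"])                     # -> 0.5

items = [10, 20, 30]                        # array lookup by index
print(items[2])                             # -> 30
```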

O(log n)

This denotes logarithmic time. Algorithms that halve the problem size at each step, such as binary search, run in O(log n) time. Many divide-and-conquer algorithms also carry a logarithmic factor in their complexity.
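Binary search is the classic O(log n) example: a minimal sketch, discarding half of a sorted list on every comparison.

```python
# O(log n): the search interval halves each iteration, so a sorted list
# of n items needs at most about log2(n) comparisons.
def binary_search(sorted_list, target):
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid          # found: return the index
        elif sorted_list[mid] < target:
            lo = mid + 1        # discard the lower half
        else:
            hi = mid - 1        # discard the upper half
    return -1                   # target not present

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # -> 3
```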

O(n)

@byelipk
byelipk / timer.rb
Created June 3, 2017 13:57
Simple timer app in Ruby
def time_diff(start_time, end_time)
  seconds_diff = (start_time - end_time).to_i.abs
  hours = seconds_diff / 3600
  seconds_diff -= hours * 3600
  minutes = seconds_diff / 60
  seconds_diff -= minutes * 60
  seconds = seconds_diff
  format("%02d:%02d:%02d", hours, minutes, seconds)
end
@byelipk
byelipk / download_m3u8.rb
Created May 30, 2017 17:18
A utility written in Ruby to download video from M3U8 files. Requires ffmpeg and the Clipboard gem.
# A utility written in Ruby to download video from M3U8 files.
#
# From Wikipedia:
#
# An M3U file is a plain text file that specifies the locations of one or more
# media files. The file is saved with the "m3u" filename extension if the text
# is encoded in the local system's default non-Unicode encoding (e.g., a
# Windows codepage), or with the "m3u8" extension if the text is UTF-8 encoded.
require 'optparse'
@byelipk
byelipk / take_n.py
Created April 1, 2017 16:15
Return index position of first N items that match a test condition.
from collections import Counter
def take_n(data, n, test_condition=lambda x: True):
"""
Return index position of first N items that match a test condition.
Parameters
==========
:data: An enumerable, such as a list.
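The preview above is cut off mid-docstring. A minimal completion might look like this; the loop body is my own sketch of the described behavior, not the author's original implementation:

```python
def take_n(data, n, test_condition=lambda x: True):
    """Return index positions of the first n items matching test_condition."""
    matches = []
    for index, item in enumerate(data):
        if test_condition(item):
            matches.append(index)
        if len(matches) == n:
            break
    return matches

print(take_n([1, 4, 2, 8, 6], 2, lambda x: x % 2 == 0))  # -> [1, 2]
```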
@byelipk
byelipk / fetch_batch.py
Created March 22, 2017 15:23
Helpful for mini-batch optimizers
import numpy as np
def fetch_batch(X, y, epoch, n_batches, batch_index, batch_size):
"""
A generic function that returns the next batch of data to train on.
Parameters
==========
:X: The training examples
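The preview is truncated here. A plausible completion, assuming the common pattern of sampling a random batch with a seed derived from the epoch and batch index (the body is my guess, not the author's code):

```python
import numpy as np

def fetch_batch(X, y, epoch, n_batches, batch_index, batch_size):
    """Return a random mini-batch of (X, y) rows.

    Seeding with epoch * n_batches + batch_index makes each batch
    reproducible while still varying across batches and epochs.
    """
    rng = np.random.RandomState(epoch * n_batches + batch_index)
    indices = rng.randint(len(X), size=batch_size)
    return X[indices], y[indices]

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_batch, y_batch = fetch_batch(X, y, epoch=0, n_batches=5,
                               batch_index=0, batch_size=4)
print(X_batch.shape)  # -> (4, 2)
```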

Exercises

Is it okay to initialize all the weights to the same value as long as that value is selected randomly using He initialization?

No. All weights should be initialized to different random values. If the weights are symmetrical, meaning they share the same value, every neuron in a layer computes the same output and receives the same gradient update, so backpropagation can never differentiate them and will struggle to converge to a good solution.

Think of it this way: if all the weights are the same, it's like having just one neuron per layer, but much slower.

The technique we use to break this symmetry is to sample weights randomly.
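The symmetry problem can be seen numerically. This is an illustrative sketch (my own toy network, not from the text): with every hidden weight set to the same value, one backward pass gives every hidden unit an identical gradient, so the units stay interchangeable forever.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0])          # one training example, 2 features
W1 = np.full((2, 3), 0.1)          # symmetric init: all weights equal
W2 = np.full((3, 1), 0.1)
y = 1.0

h = sigmoid(x @ W1)                # hidden activations: all identical
y_hat = float(sigmoid(h @ W2))

# Backprop for squared error 0.5 * (y_hat - y)**2
d_out = (y_hat - y) * y_hat * (1 - y_hat)
dW1 = np.outer(x, (d_out * W2.ravel()) * h * (1 - h))

print(h)      # three identical activations
print(dW1)    # every column (hidden unit) gets the same gradient
```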

Exercises

1. Draw an ANN using the original artificial neurons that compute the XOR operation.

TODO: Upload photo of XOR network

2. Why is it generally preferable to use a Logistic Regression classifier rather than a classical Perceptron (i.e. a single layer of Linear Threshold Units trained using the Perceptron training algorithm)? How can you tweak a Perceptron to make it equivalent to a Logistic Regression classifier?

A classical Perceptron will only converge if the data is linearly separable, and it cannot output class probabilities. A Logistic Regression classifier converges even when the data is not linearly separable, and it outputs class probabilities. To make a Perceptron equivalent to a Logistic Regression classifier, replace its step activation function with the logistic (sigmoid) function and train it with gradient descent.
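A quick sketch of that tweak (the weights and input here are made-up numbers): the only change is swapping the hard step activation for the logistic function.

```python
import numpy as np

def step(z):
    return np.where(z >= 0, 1, 0)        # Perceptron: hard 0/1 decision

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # Logistic Regression: probability

w = np.array([2.0, -1.0])
b = -0.5
x = np.array([0.3, 0.2])
z = w @ x + b                            # z = 2*0.3 - 1*0.2 - 0.5 = -0.1

print(step(z))      # -> 0      (class label only)
print(sigmoid(z))   # ~ 0.475   (probability of the positive class)
```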

@byelipk
byelipk / 9-ml-exercises.md
Last active April 22, 2022 05:18
Machine learning questions and answers

Exercises

1. What are the main benefits of creating a computation graph rather than directly executing the computations? What are the main drawbacks?

Deep Learning frameworks that generate computation graphs, like TensorFlow, have several things going for them.

For starters, computation graphs will compute the gradients automatically. This saves you from having to do lots of tedious calculus by hand.
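A toy illustration of why a graph gives you gradients for free (all names here are my own, not TensorFlow's API): each node records how it was built, so the chain rule can be applied backwards mechanically.

```python
# Minimal reverse-mode autodiff over a tiny computation graph.
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # (parent_var, local_gradient) pairs
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def backward(self, seed=1.0):
        self.grad += seed                      # accumulate via chain rule
        for parent, local in self.parents:
            parent.backward(seed * local)

x = Var(3.0)
y = Var(4.0)
z = x * y + x        # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # -> 5.0 3.0
```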

Another huge plus is that they are optimized to run on your computer's GPU. Without that support, you would need to learn CUDA or OpenCL and write lots of C++ by hand, which is not an easy thing to do.

@byelipk
byelipk / 2-ml-exercises.md
Last active October 15, 2024 20:25
Machine learning questions and answers

Exercises

1. What Linear Regression training algorithm can you use if you have a training set with millions of features?

You could use Batch Gradient Descent, Stochastic Gradient Descent, or Mini-batch Gradient Descent. SGD and Mini-batch GD would work best because neither needs to load the entire dataset into memory to take one step of gradient descent. Batch GD would also be fine, with the caveat that you have enough memory to load all the data.
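A minimal sketch of mini-batch gradient descent for linear regression (the data is synthetic and the hyperparameters are arbitrary): only `batch_size` rows are touched per step, which is why memory use stays small even on huge datasets.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 200, 3
X = rng.normal(size=(m, n))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + rng.normal(scale=0.01, size=m)

theta = np.zeros(n)
lr, batch_size = 0.1, 20
for epoch in range(50):
    perm = rng.permutation(m)          # shuffle each epoch
    for start in range(0, m, batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # MSE gradient computed on the mini-batch only
        gradient = (2 / len(idx)) * Xb.T @ (Xb @ theta - yb)
        theta -= lr * gradient

print(np.round(theta, 2))  # close to [1.0, -2.0, 0.5]
```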

The Normal Equation method would not be a good choice because it is computationally inefficient. The main cost comes from inverting an (n x n) matrix, which takes roughly O(n^2.4) to O(n^3) time depending on the implementation.

2. Suppose the features in your training set have very different scales: what algorithms might suffer from this, and how? What can you do about it?