
lec 5 CNN kernel

  • the kernel is a certain pattern we are looking for in the input (slide)
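A rough NumPy sketch of this idea (not from the lecture): a hand-made 3x3 vertical-edge kernel, applied by plain valid cross-correlation, responds most strongly where the input contains that pattern. All names and values here are illustrative.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    # Valid cross-correlation: slide the kernel over the image, no padding/stride.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Vertical-edge pattern: bright on the left, dark on the right.
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])
image = np.zeros((5, 5))
image[:, :2] = 1.0                      # left half of the image is bright
print(cross_correlate2d(image, kernel))  # large responses where the edge sits
```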

activation functions

  • ReLU
      • Does not saturate (in the + region)
      • Very computationally efficient
      • Converges much faster than sigmoid/tanh in practice (e.g. 6x)
      • Actually more biologically plausible than sigmoid
      • Output is not zero-centered
      • An annoyance: the gradient is 0 when x < 0 (units can "die"), and undefined at x = 0
  • Leaky ReLU
      • Does not saturate
      • Computationally efficient
      • Converges much faster than sigmoid/tanh in practice (e.g. 6x)
      • Will not "die"
  • Exponential Linear Units (ELU)
      • All benefits of ReLU
      • Closer to zero-mean outputs
      • Negative saturation regime (compared with Leaky ReLU) adds some robustness to noise
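A minimal NumPy sketch of the three activations compared above; the 0.01 leaky slope and ELU alpha = 1.0 are common defaults assumed here for illustration, not values given in the lecture.

```python
import numpy as np

def relu(x):
    # Zero gradient for x < 0, which is what can make a unit "die".
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # Small negative slope keeps a nonzero gradient for x < 0.
    return np.where(x > 0, x, slope * x)

def elu(x, alpha=1.0):
    # Smoothly saturates toward -alpha for very negative inputs.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```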

lec 3 loss function

  • multi-class SVM loss: L_i = sum_{j != y_i} max(0, s_j - s_{y_i} + 1), where s_j is the predicted score for class j and s_{y_i} is the score of the true class (lec 3, slide 14)
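A small NumPy sketch of this loss for a single example, assuming `scores` is the score vector s and `y` is the index of the true class, with the margin fixed at 1 as in the formula above.

```python
import numpy as np

def svm_loss(scores, y, margin=1.0):
    # Hinge loss over all wrong classes; the true class contributes nothing.
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0
    return margins.sum()

# e.g. svm_loss(np.array([3.2, 5.1, -1.7]), y=0) ≈ 2.9
```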

lec 3 GloVe

J = 1/2 sum_{i,j} f(P_ij) (u_i · v_j - log P_ij)^2

  • magnitude of u_i · v_j: eventually captures the log co-occurrence count
  • skip-gram: captures co-occurrences one window at a time
  • GloVe: captures the counts from the overall statistics of how often these words co-occur [45:00]
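A rough NumPy sketch of this objective, assuming a co-occurrence matrix `P`, word vectors `U`, and context vectors `V` (names invented here); the weighting function `f` uses the common x_max = 100 and 0.75 exponent, which are assumptions, not values from these notes.

```python
import numpy as np

def glove_weight(x, x_max=100.0, power=0.75):
    # f(x): downweights rare pairs, caps at 1 for frequent ones.
    return np.where(x < x_max, (x / x_max) ** power, 1.0)

def glove_loss(U, V, P):
    # J = 1/2 * sum_ij f(P_ij) * (u_i . v_j - log P_ij)^2, over nonzero P_ij only.
    i, j = np.nonzero(P)
    dots = np.einsum('id,id->i', U[i], V[j])   # u_i . v_j for each nonzero pair
    err = dots - np.log(P[i, j])
    return 0.5 * np.sum(glove_weight(P[i, j]) * err ** 2)
```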

lec 4 Softmax, intro NN

d/df [-log softmax(f)_y] = ŷ - t = δ, where ŷ = softmax(f) and t is the one-hot target (slide 25)
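A minimal NumPy sketch of this identity, with illustrative names: the gradient of -log softmax(f)_y with respect to the scores f is the predicted distribution minus the one-hot target.

```python
import numpy as np

def softmax(f):
    e = np.exp(f - f.max())   # shift for numerical stability
    return e / e.sum()

def grad_neg_log_softmax(f, y):
    # Gradient of the cross-entropy loss w.r.t. the scores f.
    t = np.zeros_like(f)
    t[y] = 1.0
    return softmax(f) - t     # delta = y_hat - t
```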
