Quoc V. Le - Google
parallel neural networks at Google scale
- machine learning traditionally requires domain knowledge from human experts
- we want to move beyond hiring domain experts; it would be better if machines created the features rather than human experts
deep learning:
- great performance on many problems
- works well with a lot of data
- requires less domain knowledge
applying a non-linearity (like a sigmoid) at successive layers to build complex neural networks
The network can "learn" many complex functions, largely independent of domain knowledge.
pixels -> edge detectors -> face detectors
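A minimal sketch of that stacking idea, assuming sigmoid layers in NumPy (the layer sizes and initialization are illustrative, not from the talk):

```python
import numpy as np

def sigmoid(x):
    """Elementwise sigmoid nonlinearity."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, layers):
    """Apply each layer's linear transform followed by a sigmoid.
    `layers` is a list of (W, b) weight/bias pairs."""
    h = x
    for W, b in layers:
        h = sigmoid(W @ h + b)
    return h

# Toy 3-layer network: 784 "pixels" -> 256 -> 64 -> 10 units.
rng = np.random.default_rng(0)
dims = [784, 256, 64, 10]
layers = [(rng.normal(scale=0.01, size=(m, n)), np.zeros(m))
          for n, m in zip(dims[:-1], dims[1:])]
print(forward(rng.normal(size=784), layers).shape)  # (10,)
```

Each layer composes a nonlinearity on top of the previous one, which is what lets the stack represent progressions like pixels -> edges -> faces.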
- deep learning models are trained on many machines (10K or more)
- forward pass to compute the activations and the loss, backward pass to compute the gradient
- model parameters are partitioned across machines (model parallelism)
- a single partitioned model can use up to 1000 cores
- "1000 cores is still really small" so they partition the data and apply the functions to separate nodes and then send answers back to a "parameter server"
- the problem with this model: the server needs to wait for all answers to compute. so they relax the constraint and allow for asynch computation
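The talk gives no code, so here is a minimal sketch of that asynchronous scheme, assuming a toy least-squares model; `ParameterServer`, `worker`, and all names and sizes below are illustrative, not Google's actual system:

```python
import threading
import numpy as np

class ParameterServer:
    """Holds the shared parameters; workers fetch and update them
    asynchronously, with no barrier waiting for every worker."""

    def __init__(self, dim):
        self.params = np.zeros(dim)
        self.lock = threading.Lock()

    def fetch(self):
        with self.lock:
            return self.params.copy()

    def apply_gradient(self, grad, lr=0.05):
        with self.lock:
            self.params -= lr * grad

def worker(server, data_shard):
    """One replica: compute gradients on its own data shard using a
    possibly stale copy of the parameters, then push them back."""
    for x, y in data_shard:
        w = server.fetch()              # may be slightly stale
        grad = 2 * (w @ x - y) * x      # toy least-squares gradient
        server.apply_gradient(grad)     # no synchronization barrier

# Toy setup: linear regression data split across 4 workers.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
data = [(x, true_w @ x) for x in rng.normal(size=(400, 3))]
server = ParameterServer(dim=3)
shards = [data[i::4] for i in range(4)]
threads = [threading.Thread(target=worker, args=(server, s)) for s in shards]
for t in threads: t.start()
for t in threads: t.join()
print(server.params)  # approaches true_w despite stale reads
```

Each worker reads possibly stale parameters and pushes its gradient without waiting for the others, which is exactly the relaxed constraint described above.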
voice search, photo search, and text understanding
Voice search: your speech is sent to a deep neural network that
- extracts speech frames from the audio
- classifies the phoneme in each frame
- then puts the phonemes together to recognize your speech
All of this is done with the parallelized networks described above; see the sketch below.
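A hedged sketch of the frame-by-frame classification step; a single linear layer stands in for the deep network here, and the phoneme set, sizes, and weights are made up:

```python
import numpy as np

def softmax(z):
    """Turn scores into a probability distribution over phonemes."""
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_frames(frames, W, b, phonemes):
    """Classify each speech frame, then string the per-frame phoneme
    labels together; real systems decode with a language model on top."""
    return [phonemes[int(softmax(W @ f + b).argmax())] for f in frames]

# Toy example: 5 frames of 40-dimensional acoustic features, 3 phonemes.
rng = np.random.default_rng(0)
phonemes = ["k", "ae", "t"]
W, b = rng.normal(size=(3, 40)), np.zeros(3)
frames = rng.normal(size=(5, 40))
print(classify_frames(frames, W, b, phonemes))
```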
Text understanding: useful but very difficult
- programmatically understanding the meaning of words in context (complete with metaphors and idioms)
- you can map each word to a point in a ~100-dimensional vector space (a word embedding)
- translation can be done geometrically: words with the same meaning sit at similar positions in each language's space, so matching up the spaces matches up the words (see the sketch below)
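A rough sketch of that geometric idea, assuming a linear map aligns the two embedding spaces; all words, dimensions, and matrices below are toy stand-ins, not real embeddings:

```python
import numpy as np

# Toy embeddings: each word is a point in a low-dimensional space.
# The talk's models use ~100 dimensions; 4 here for brevity.
rng = np.random.default_rng(0)
en = {w: rng.normal(size=4) for w in ["one", "two", "three", "four", "five"]}
# Pretend the Spanish space is a linearly transformed copy of the English one.
R = rng.normal(size=(4, 4))
es_words = ["uno", "dos", "tres", "cuatro", "cinco"]
es = {sw: R @ en[ew] for sw, ew in zip(es_words, en)}

# Fit a linear map that sends English vectors onto their Spanish
# counterparts, using a small seed dictionary (least squares).
X = np.stack([en[w] for w in en])        # English vectors
Z = np.stack([es[w] for w in es_words])  # aligned Spanish vectors
M, *_ = np.linalg.lstsq(X, Z, rcond=None)

# Translate by mapping a word across and taking its nearest neighbor.
query = en["three"] @ M
print(min(es, key=lambda w: np.linalg.norm(es[w] - query)))  # "tres"
```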