"""
The MIT License (MIT)

Copyright (c) 2015 Alec Radford

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"""
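import numpy as np
import theano
import theano.tensor as T


def floatX(x):
    # Assumed helper (not defined in the original snippet): cast to Theano's float type.
    return np.asarray(x, dtype=theano.config.floatX)


def Adam(cost, params, lr=0.0002, b1=0.1, b2=0.001, e=1e-8):
    # b1 and b2 weight the new gradient, so the running averages decay by (1 - b1) and (1 - b2).
    updates = []
    grads = T.grad(cost, params)
    i = theano.shared(floatX(0.))  # step counter, shared across all parameters
    i_t = i + 1.
    # Bias-correction terms for the first and second moment estimates.
    fix1 = 1. - (1. - b1)**i_t
    fix2 = 1. - (1. - b2)**i_t
    lr_t = lr * (T.sqrt(fix2) / fix1)
    for p, g in zip(params, grads):
        # Moment accumulators, created once per parameter and initialized to zero.
        m = theano.shared(p.get_value() * 0.)
        v = theano.shared(p.get_value() * 0.)
        m_t = (b1 * g) + ((1. - b1) * m)
        v_t = (b2 * T.sqr(g)) + ((1. - b2) * v)
        g_t = m_t / (T.sqrt(v_t) + e)
        p_t = p - (lr_t * g_t)
        updates.append((m, m_t))
        updates.append((v, v_t))
        updates.append((p, p_t))
    updates.append((i, i_t))
    return updates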
If you run the provided code and get the message "Incompatible broadcastable dimensions.", you may need to modify the theano.shared(p.get_value() ... ) calls by adding the broadcastable=p.broadcastable option. The accumulators will then be broadcastable in the same way as the original variables.
One more proposed change:
m = theano.shared(np.zeros(p.get_value().shape).astype(dtype=theano.config.floatX))
v = theano.shared(np.zeros(p.get_value().shape).astype(dtype=theano.config.floatX))
The code above doesn't handle scalar parameters correctly - the p.get_value() * 0. will create a float64, even if p.get_value() returns a float32.
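To illustrate the dtype point, here is a minimal NumPy-only sketch. The helper name `zeros_like_param` is hypothetical and not part of the gist. It builds the zero accumulator from the parameter's shape and dtype explicitly, so the result does not depend on how the installed NumPy version promotes scalar arithmetic such as `value * 0.`.

```python
import numpy as np


def zeros_like_param(value):
    """Return zeros with the same shape and dtype as `value` (hypothetical helper)."""
    value = np.asarray(value)
    return np.zeros(value.shape, dtype=value.dtype)


scalar_param = np.float32(0.5)                     # a 0-d (scalar) parameter
matrix_param = np.ones((2, 3), dtype=np.float32)   # an ordinary weight matrix

m_scalar = zeros_like_param(scalar_param)
m_matrix = zeros_like_param(matrix_param)

print(m_scalar.dtype, m_scalar.shape)
print(m_matrix.dtype, m_matrix.shape)
```

In the gist itself, the equivalent would presumably be to pass `p.get_value()` through such a helper before wrapping it in `theano.shared`.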
That's right, @bspeice. However, I'm confused by the values of b1 and b2: they are set to 0.9 and 0.999 respectively in the original paper.
Yes, I think that is a mistake.
No, it is correct. Look at the update here again.
The code uses 1 - beta1 where the paper uses beta1, and beta1 where the paper uses 1 - beta1.
So b1 = 0.1 corresponds to 1 - 0.1 = 0.9, exactly what the paper says, and b2 = 0.001 corresponds to 1 - 0.001 = 0.999, which is again exactly what the paper says. The code's b1 plays the role of the paper's 1 - beta1, and similarly for b2, hence the confusion.
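The equivalence of the two conventions can be checked numerically. The sketch below is NumPy-only and uses two hypothetical function names, `adam_step_gist` and `adam_step_paper`. Both follow the update form used in the gist; the second merely writes the decay rates the way the paper does (beta1 = 0.9, beta2 = 0.999). It is an illustration under those assumptions, not a reference implementation.

```python
import numpy as np


def adam_step_gist(p, g, m, v, t, lr=0.0002, b1=0.1, b2=0.001, e=1e-8):
    """One step in the gist's convention: b1 and b2 weight the new gradient."""
    fix1 = 1. - (1. - b1) ** t
    fix2 = 1. - (1. - b2) ** t
    lr_t = lr * np.sqrt(fix2) / fix1
    m = b1 * g + (1. - b1) * m
    v = b2 * g ** 2 + (1. - b2) * v
    return p - lr_t * m / (np.sqrt(v) + e), m, v


def adam_step_paper(p, g, m, v, t, lr=0.0002, beta1=0.9, beta2=0.999, e=1e-8):
    """One step in the paper's convention: beta1 and beta2 weight the running averages."""
    lr_t = lr * np.sqrt(1. - beta2 ** t) / (1. - beta1 ** t)
    m = beta1 * m + (1. - beta1) * g
    v = beta2 * v + (1. - beta2) * g ** 2
    return p - lr_t * m / (np.sqrt(v) + e), m, v


# Minimize f(p) = p**2 (gradient 2p) for a few steps with both variants.
p_gist = p_paper = 1.0
m_g = v_g = m_p = v_p = 0.0
for t in range(1, 6):
    p_gist, m_g, v_g = adam_step_gist(p_gist, 2. * p_gist, m_g, v_g, t)
    p_paper, m_p, v_p = adam_step_paper(p_paper, 2. * p_paper, m_p, v_p, t)

print(np.isclose(p_gist, p_paper))
```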
I have a question: is this function built into TensorFlow, or is it a separate function?
To your question, @stablum: this is how Theano constructs the computation graph. The Adam() function should be called only once to define the updates in the computational graph, so m and v get initialized to 0 only once.
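A rough analogy in plain NumPy may make the "build once, run many times" pattern clearer. In Theano, the list returned by Adam() is typically passed to theano.function through its updates argument, and the shared variables are then updated on every call of the compiled function. The sketch below imitates that with a closure. The name `build_adam` and the dictionary-based state are hypothetical stand-ins for Theano's shared variables, not anything from the gist.

```python
import numpy as np


def build_adam(param, lr=0.0002, b1=0.1, b2=0.001, e=1e-8):
    # Allocated exactly once, when the optimizer is built (like theano.shared in Adam()).
    state = {"i": 0.0, "m": np.zeros_like(param), "v": np.zeros_like(param), "p": param.copy()}

    def step(grad_fn):
        # Runs on every training step and reuses the state created above.
        g = grad_fn(state["p"])
        state["i"] += 1.
        fix1 = 1. - (1. - b1) ** state["i"]
        fix2 = 1. - (1. - b2) ** state["i"]
        lr_t = lr * np.sqrt(fix2) / fix1
        state["m"] = b1 * g + (1. - b1) * state["m"]
        state["v"] = b2 * g ** 2 + (1. - b2) * state["v"]
        state["p"] = state["p"] - lr_t * state["m"] / (np.sqrt(state["v"]) + e)
        return state["p"]

    return step, state


step, state = build_adam(np.array([1.0, -2.0]))   # built once
for _ in range(3):                                # called many times
    step(lambda p: 2. * p)                        # gradient of sum(p**2)

print(state["i"], state["m"])
```

Calling `build_adam` again inside the loop would reset `m`, `v`, and the step counter on every iteration, which is the mistake the comment above warns against.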