I used TensorFlow to implement the neural network and achieved a DEVTEST accuracy of 0.825224.
Details:
- optimizer: Adam
- initial learning rate: 0.5
- learning rate decayed by a factor of 0.5 every 256 epochs
- maximum number of iterations: 1024
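For concreteness, a minimal sketch of this setup, assuming tf.keras and one parameter update per epoch (full-batch training, so steps == epochs); the original code's exact API and batching are assumptions on my part:

```python
import tensorflow as tf

# Minimal sketch of the optimizer setup above, assuming tf.keras and one
# update per epoch (full-batch training), so optimizer steps == epochs.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.5,  # initial learning rate
    decay_steps=256,            # decay every 256 steps
    decay_rate=0.5,             # multiply the learning rate by 0.5 each time
    staircase=True,             # discrete, step-wise decay
)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
# Training then runs for at most 1024 iterations with this optimizer.
```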
I chose to test different context window sizes w.
w | DEVTEST accuracy |
---|---|
2 | 0.794463 |
1 | 0.825224 |
0 | 0.841024 |
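For reference, I read a window size of w as including the w tokens on each side of the target word (so w = 0 means the word alone). A minimal sketch of building such windows; the helper name and padding token are hypothetical:

```python
def build_windows(tokens, w, pad="<PAD>"):
    """For each position i, return the tokens in [i - w, i + w],
    padded at the sentence boundaries. Hypothetical helper."""
    padded = [pad] * w + list(tokens) + [pad] * w
    return [padded[i:i + 2 * w + 1] for i in range(len(tokens))]

# Example: w = 1 yields a three-token window around each word.
print(build_windows(["#nba", "game", "tonight"], w=1))
# [['<PAD>', '#nba', 'game'], ['#nba', 'game', 'tonight'], ['game', 'tonight', '<PAD>']]
```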
For w = 2, I got a DEVTEST accuracy of 0.600112 with the same values for the other training hyper-parameters. One problem seems to be that the sparsity of the hidden layer reaches 0.99 very early, which prevents gradients from flowing across the ReLU. So I halved the initial learning rate, which saved some units from being zeroed (0.97 sparsity) and improved the DEVTEST accuracy to 0.794463, still not as good as w = 1.
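Here sparsity means the fraction of hidden units whose ReLU output is exactly zero; a sketch of how it can be measured (the activations tensor is a placeholder):

```python
import tensorflow as tf

def relu_sparsity(activations):
    """Fraction of activations that are exactly zero after the ReLU.
    A value near 1.0 means most units are dead and pass no gradient."""
    zeros = tf.cast(tf.equal(activations, 0.0), tf.float32)
    return tf.reduce_mean(zeros)

# Placeholder hidden-layer output: three of four units are zeroed.
h = tf.nn.relu(tf.constant([[-1.0, 0.5], [-2.0, -3.0]]))
print(float(relu_sparsity(h)))  # 0.75
```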
For w = 0, I got a DEVTEST accuracy of 0.841024, which is better than w = 1. This result is somewhat surprising to me, but it can be explained by the fact that many words have one dominant POS, especially the special symbols in tweets such as hashtags, mentions, and emoji. So considering context might act as a distractor.
I then explored the effect of regularization.
For L2 regularization, I tried lambda values of 0.1, 0.01, and 0.001. Lambda = 0.001 produced the highest DEV accuracy, leading to a DEVTEST accuracy of 0.845078, which is better than the original performance.
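A minimal sketch of attaching the L2 penalty with lambda = 0.001 in tf.keras; the layer sizes and output dimension are placeholders, not the original architecture:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # L2 penalty (lambda = 0.001) on the hidden layer's weights.
    tf.keras.layers.Dense(
        128, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(0.001)),
    tf.keras.layers.Dense(25, activation="softmax"),  # one unit per POS tag (size assumed)
])
```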
For dropout regularization, I tried dropout rates of 0.2, 0.5, and 0.9. The rate 0.2 produced the highest DEV accuracy, leading to a DEVTEST accuracy of 0.729866, which is much worse than not using dropout.
I also tried dropout on both the input and the hidden layers. With a dropout rate of 0.2, I got a better DEV accuracy and a DEVTEST accuracy of 0.842142, which is better than not using dropout.
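A sketch of this better-performing configuration: dropout with rate 0.2 on both the input and the hidden layer (dropping the first Dropout layer recovers the hidden-only variant above; sizes are again placeholders):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dropout(0.2),                     # dropout on the input
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),                     # dropout on the hidden layer
    tf.keras.layers.Dense(25, activation="softmax"),  # one unit per POS tag (size assumed)
])
# Dropout is active only during training, e.g. model(x, training=True) or model.fit(...).
```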