TTIC 31190 NLP: problem set 3

Falcon Dai

problem 1

I used TensorFlow to implement the neural network and achieved an accuracy of 0.825224 on the DEVTEST set.

Details (a code sketch of this setup follows the list):

  • used the Adam optimizer
  • initial learning rate = 0.5
  • halved the learning rate every 256 epochs
  • max iterations = 1024
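
For concreteness, here is a minimal sketch of this training setup. The original gist was written against the TensorFlow of early 2016; this sketch uses the current tf.keras API instead, and all layer sizes (vocabulary, embedding, hidden, tag set) are placeholder assumptions, not values from the problem set.

```python
import tensorflow as tf

# All sizes below are hypothetical placeholders.
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, N_TAGS = 10000, 50, 100, 25

# Start at 0.5 and halve every 256 steps (treating one step as
# one epoch here, which is an assumption about the training loop).
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.5,
    decay_steps=256,
    decay_rate=0.5,
    staircase=True,
)

model = tf.keras.Sequential([
    # input: the word ids in the context window
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(HIDDEN_DIM, activation="relu"),
    tf.keras.layers.Dense(N_TAGS, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=schedule),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```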

[Figures: learning rate over epochs; train accuracy over epochs; sparsity of the hidden layer over epochs; histogram of hidden layer weight gradients; histogram of output layer weight gradients]

problem 2

I chose to test different context window sizes w.

| w | test accuracy |
|---|---------------|
| 2 | 0.794463 |
| 1 | 0.825224 |
| 0 | 0.841024 |

For w = 2, I got a test accuracy of 0.600112 with the same values for the other training hyper-parameters. One problem seems to be that the sparsity of the hidden layer reaches 0.99 very early, which prevents gradients from flowing through the ReLU units (a sketch of the sparsity measure is below). So I decided to halve the initial learning rate, which did save some units from being zeroed out (0.97 sparsity) and improved the test accuracy to 0.794463, still not as good as w = 1.
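
The sparsity figure here can be read as the fraction of hidden units whose ReLU output is exactly zero. A minimal sketch of that measurement; the function name is my own, since the gist does not show how it computed this number:

```python
import numpy as np

def relu_sparsity(hidden_activations):
    """Fraction of post-ReLU hidden activations that are exactly
    zero, averaged over a batch; 0.99 means almost all units are
    dead and pass no gradient back through the layer."""
    return float(np.mean(hidden_activations == 0.0))
```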

For w = 0, I got a test accuracy of 0.841024, which is better than w = 1. This result is somewhat surprising to me, but it can be explained by the fact that many words have one very common POS tag, especially the special symbols in tweets such as hashtags, mentions, and emoji. So considering context might act as a distractor.
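
For reference, a minimal sketch of extracting a size-w context window around position i, with padding at the sentence boundaries; the function name and padding token are my own assumptions, since the gist does not include its feature-extraction code:

```python
def window_features(tokens, i, w, pad="<PAD>"):
    """Return the 2w + 1 tokens centered on position i, padding
    past the sentence boundaries; w = 0 yields just the target
    word itself."""
    padded = [pad] * w + list(tokens) + [pad] * w
    return padded[i : i + 2 * w + 1]

# e.g. window_features(["#nlp", "is", "fun"], 0, 1)
#      -> ["<PAD>", "#nlp", "is"]
```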

problem 3

I chose to explore the effect of regularization.

For L2 regularization, I tried lambda values of 0.1, 0.01, and 0.001. Lambda = 0.001 produced the highest DEV accuracy, leading to a test accuracy of 0.845078, which is better than the original, unregularized performance.
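
In tf.keras terms, this amounts to attaching an L2 penalty to the weight matrices; a minimal sketch with the best lambda found above (layer sizes are placeholders, not values from the problem set):

```python
import tensorflow as tf

l2 = tf.keras.regularizers.l2(0.001)  # lambda chosen on DEV

hidden = tf.keras.layers.Dense(100, activation="relu",
                               kernel_regularizer=l2)
output = tf.keras.layers.Dense(25, activation="softmax",
                               kernel_regularizer=l2)
```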

For dropout regularization, I tried dropout rates of 0.2, 0.5, and 0.9. A rate of 0.2 produced the highest DEV accuracy, leading to a DEVTEST accuracy of 0.729866, which is much worse than not using dropout.

I also tried dropout on both the input and the hidden layers. With a dropout rate of 0.2, I got a better DEV accuracy and a DEVTEST accuracy of 0.842142, which is better than not using dropout (see the sketch below).
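
A minimal sketch of this variant, again in current tf.keras terms with placeholder layer sizes; dropout at rate 0.2 is applied both to the input features and to the hidden layer:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 50),  # placeholder sizes
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.2),          # input-side dropout
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dropout(0.2),          # hidden-layer dropout
    tf.keras.layers.Dense(25, activation="softmax"),
])
```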
