Resources : http://bit.ly/SeqModelsResources
A gunshot.
"This is it", she thought
She took a deep breath and prayed.
9 seconds later...
she was World Sprint Champion.
Can't decide sentiment without considering all sentences.
Can't classify activity with just one frame of a video.
Can't generate/classify a second of music without considering the previous seconds.
Traditional neural networks can’t remember.
Recurrent Neural Networks can.
Conventional Neural Networks
```python
import torch.nn as nn
class ConventionalNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.in2out = nn.Linear(input_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input):
        # Each input is processed independently -- nothing is remembered
        output = self.in2out(input)
        output = self.softmax(output)
        return output
```
Recurrent Neural Networks
```python
import torch.nn as nn
class RNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.in2hid = nn.Linear(input_size, hidden_size)
        self.hid2hid = nn.Linear(hidden_size, hidden_size)
        self.tanh = nn.Tanh()

    def forward(self, input, hidden):
        # The new hidden state mixes the current input with the previous hidden state
        hidden = self.tanh(self.in2hid(input) + self.hid2hid(hidden))
        return hidden
```
Hidden state is representative of entire sequence
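To see why, here is a minimal usage sketch (the `RNNCell` name and sizes above are illustrative, and `sequence` is assumed to be a list of `[1 x 27]` input tensors): the same cell is applied at every timestep, threading the hidden state through, so the final hidden state summarizes the whole sequence.

```python
import torch

rnn = RNNCell(input_size=27, hidden_size=64, output_size=27)
hidden = torch.zeros(1, 64)        # start with an empty memory
for x in sequence:                 # one [1 x 27] tensor per timestep
    hidden = rnn(x, hidden)        # hidden now reflects everything seen so far
```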
Types of Sequential Models
1 : One-to-One
Vanilla Neural Network
*Image Classification*
2 : One-to-Many
Sequence output
*Image Captioning*
3 : Many-to-One
Sequence input
*Sentiment Analysis*
4a : Many-to-Many
Sequence input and sequence output
*Encoder Decoder. Translation.*
4b : Many-to-Many
Synced sequence input and output
*Wakeword detection.*
Long- and Short-Term Dependencies
I grew up in Chennai... I speak X.
Where X is the word we are trying to predict.
speak --> X
- X must be a language.
- "speak" is close to X, hence a short-term dependency.

Chennai --> X
- X must be relevant to Chennai.
- "Chennai" is far from X, hence a long-term dependency.
- In theory, RNNs are capable of handling “long-term dependencies.”
- In practice, RNNs don’t seem to be able to learn them.
- We need to decide what information to keep or remove at every timestep.
- So we need gates.
- If the gate is a function:
  - Discard information if the gate value is 0.
  - Allow information through if the gate value is 1.
  - Gating is applied as a pointwise multiplication operation.
- To decide if X should be allowed through the gate: gate(X) = W*X + B
- But the gate value needs to be between 0 and 1...
- So apply the sigmoid function to the gate value: gate(X) = sigmoid(W*X + B)
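Concretely, gating is just multiplying each value by a number between 0 and 1. A tiny sketch (the tensor values are made up for illustration):

```python
import torch

x = torch.tensor([0.5, -1.2, 3.0])
gate = torch.tensor([0.0, 1.0, 0.9])  # gate values in [0, 1], e.g. from a sigmoid
gated = gate * x                      # pointwise multiplication
# tensor([ 0.0000, -1.2000,  2.7000]): first value discarded, the rest let through
```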
An LSTM cell works with three quantities:
- Input
- Hidden state
- Cell state
Forget Gate
- Decides what information to get rid of.
- Input: the hidden state and the current input.
- 1: keep the data; 0: discard it completely.
- For example, to determine the gender of a pronoun:
  - We need to remember the gender of the last subject.
  - So if we come across a new subject, forget the old gender.
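In the standard LSTM formulation (using the usual notation, where W_f and b_f are the forget gate's weights and bias), this gate combines the previous hidden state and the current input:

f_t = sigmoid(W_f * [h_{t-1}, x_t] + b_f)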
Input Gate
- Decides what new information to store in the cell state.
- Input: the hidden state and the current input.
- 1: the input is important; 0: the input doesn't matter.
- The layer has 2 parts:
  - A sigmoid layer (the "input gate layer"), which decides which values we'll update.
  - A tanh layer, which creates a vector of new candidate values, C̃_t.
- For example, to determine the gender of a pronoun:
  - We need to remember the gender of the last subject.
  - So if we come across a new subject, store the new subject's gender.
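In the same standard notation, the two parts of this layer are:

i_t = sigmoid(W_i * [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C * [h_{t-1}, x_t] + b_C)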
Cell State Update and Output
We now know:
- How much to update (i_t)
- How much to forget (f_t)

Next step? Update the cell state, then determine the output using the hidden and cell state.
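In the standard formulation, the update and output steps are:

C_t = f_t * C_{t-1} + i_t * C̃_t
o_t = sigmoid(W_o * [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)

Putting the gates together, here is a minimal sketch of a full LSTM cell in the same style as the RNNCell above (class and variable names are illustrative, not from the slides):

```python
import torch
import torch.nn as nn

class LSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Every gate looks at the previous hidden state and the current input
        self.forget_gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.input_gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)
        self.output_gate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, input, hidden, cell):
        combined = torch.cat((input, hidden), dim=1)
        f_t = torch.sigmoid(self.forget_gate(combined))  # how much to forget
        i_t = torch.sigmoid(self.input_gate(combined))   # how much to update
        c_tilde = torch.tanh(self.candidate(combined))   # new candidate values
        cell = f_t * cell + i_t * c_tilde                # update the cell state
        o_t = torch.sigmoid(self.output_gate(combined))  # what to output
        hidden = o_t * torch.tanh(cell)                  # new hidden state
        return hidden, cell
```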
Character-Level LSTM
- Encoding characters
  - 27 possible chars (A-Z + \n)
  - Each character --> a [27 x 1] vector
- Input
  - A sequence of characters
  - Input at each timestep --> a [27 x 1] char-vector
- Output
  - A single character at each timestep
  - The character that will follow the sequence of input chars
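A minimal sketch of the encoding, assuming the [27 x 1] vectors are one-hot (the exact scheme isn't stated here, but one-hot is the usual choice):

```python
import numpy as np

CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ\n"  # the 27 possible characters

def encode(char):
    """Encode one character as a [27 x 1] one-hot vector."""
    vec = np.zeros((len(CHARS), 1))
    vec[CHARS.index(char)] = 1.0
    return vec

# A name becomes one [27 x 1] vector per timestep
sequence = [encode(c) for c in "BRAD"]
```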
Colab Notebook : bit.ly/ColabCharLSTM
- Character-Level LSTM (Keras) to generate names
- Bonus : Using the LSTM to join names
- Brad + Angelina = Brangelina
- Char + Lizard = Charizard
- Britain + Exit = Brexit