from keras.layers import Input
from keras.models import Model


def extract_decoder_model(model):
    """
    Extract the decoder from the original model.

    Inputs:
    ------
    model: keras model object

    Returns:
    -------
    A Keras model object with the following inputs and outputs:

    Inputs of Keras Model That Is Returned:
    1: the embedding index for the last predicted word or the <Start> indicator
    2: the last hidden state, or in the case of the first word the hidden state
       from the encoder

    Outputs of Keras Model That Is Returned:
    1. Prediction (class probabilities) for the next word
    2. The hidden state of the decoder, to be fed back into the decoder at the
       next time step

    Implementation Notes:
    ----------------------
    We must extract the relevant layers and reconstruct part of the computation
    graph to allow for different inputs, because we are not going to use teacher
    forcing at inference time.
    """
    # The latent dimension is the same throughout the architecture, so we grab
    # the latent dimension of the embedding, which matches what the decoder outputs.
    latent_dim = model.get_layer('Decoder-Word-Embedding').output_shape[-1]

    # Reconstruct the input into the decoder
    decoder_inputs = model.get_layer('Decoder-Input').input
    dec_emb = model.get_layer('Decoder-Word-Embedding')(decoder_inputs)
    dec_bn = model.get_layer('Decoder-Batchnorm-1')(dec_emb)

    # Instead of setting the initial state from the encoder and forgetting about it,
    # at inference time we are not doing teacher forcing, so we need a feedback loop
    # from predictions back into the GRU. We define this input layer for the state
    # so we can add that capability.
    gru_inference_state_input = Input(shape=(latent_dim,), name='hidden_state_input')

    # We fetch the trained decoder GRU from the model so that its weights are reused.
    # If you inspect the decoder GRU that we created for training, it takes two
    # tensors as input:
    #   (1) the embedding layer output used for teacher forcing, which at inference
    #       time will be the last step's prediction, and _start_ on the first
    #       time step.
    #   (2) the state, which we initialize with the encoder on the first time step,
    #       then grab the state after each prediction and feed it back in again.
    gru_out, gru_state_out = model.get_layer('Decoder-GRU')([dec_bn,
                                                             gru_inference_state_input])

    # Reconstruct dense layers
    dec_bn2 = model.get_layer('Decoder-Batchnorm-2')(gru_out)
    dense_out = model.get_layer('Final-Output-Dense')(dec_bn2)
    decoder_model = Model([decoder_inputs, gru_inference_state_input],
                          [dense_out, gru_state_out])
    return decoder_model
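Below is a minimal sketch, not part of the original gist, of how the extracted decoder could drive a greedy decoding loop at inference time. The names `encoder_model`, `raw_input_seq`, `start_idx`, `end_idx`, and `max_len` are assumptions made for illustration, and the encoder is assumed to return a hidden state of shape (1, latent_dim); the key point is that the state returned by `decoder_model` is fed back in on the next step, along with the predicted word index.

import numpy as np


def greedy_decode(encoder_model, decoder_model, raw_input_seq,
                  start_idx, end_idx, max_len=20):
    # Encode the source sequence once to get the initial decoder state.
    # (encoder_model is a hypothetical companion model, not defined above.)
    state = encoder_model.predict(raw_input_seq)

    # Begin decoding with the <Start> indicator.
    current_word_idx = np.array([[start_idx]])
    decoded_indices = []

    for _ in range(max_len):
        # The decoder returns class probabilities for the next word and its
        # updated hidden state, in that order.
        preds, state = decoder_model.predict([current_word_idx, state])

        # Greedily pick the most probable next word.
        next_idx = int(np.argmax(np.squeeze(preds)))
        if next_idx == end_idx:
            break
        decoded_indices.append(next_idx)

        # Feedback loop: the prediction becomes the next input token.
        current_word_idx = np.array([[next_idx]])

    return decoded_indices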