fchollet/314085fffa200de9c3da
```python
'''Functional Keras is a more functional replacement for the Graph API.
'''

###################
# 2 LSTM branches #
###################
a = Input(input_shape=(10, 32))  # output is a TF/TH placeholder, augmented with Keras attributes
b = Input(input_shape=(10, 32))
encoded_a = LSTM(32)(a)  # output is a TF/TH tensor
encoded_b = LSTM(32)(b)
merged = merge([encoded_a, encoded_b], mode='concat')
decoded = RepeatVector(10)(merged)
decoded = LSTM(32, return_sequences=True)(decoded)

# this is a fully-featured Keras model, with all the goodies that come with those.
# this is made possible by the Keras topology information stored in the tensors.
model = Model(input=[a, b], output=[decoded])
model.compile(optimizer=Adam(), loss='mse')
model.fit([x1, x2], y)

################
# Shared layer #
################
shared_lstm = LSTM(32)
a = Input(input_shape=(10, 32))
b = Input(input_shape=(10, 32))
encoded_a = shared_lstm(a)
encoded_b = shared_lstm(b)
merged = merge([encoded_a, encoded_b], mode='concat')
decoded = RepeatVector(10)(merged)
decoded = LSTM(32, return_sequences=True)(decoded)

##############################
# Insertion of arbitrary ops #
##############################
# NOTE: cannot do a = tf.sigmoid(a), because although 'a' is a valid tf tensor,
# it is 'augmented' with data that allows Keras to keep track of previous operations
# (thus making it possible to train a model)...
a = Input(input_shape=(10, 32))
a = Lambda(tf.sigmoid)(a)
model = Model(input=[a, b], output=[decoded])
model.compile(optimizer=Adam(), loss='mse')
model.fit([x1, x2], y)
```
> cannot do `a = tf.sigmoid(a)`

Though `tf.sigmoid` won't work, we could make `K.sigmoid` work, right? `K.sigmoid` could just copy the said attributes of the input to the output tensor. You can also do `a = Activation('sigmoid')(a)`, but a Lambda layer for every activation doesn't sound right.

Also, what about operators? `c = Dense(10)(a + b)` and `a = 0.2 * a`?
One option would be to write an abstract class `K.Tensor` that wraps both Theano and TensorFlow tensors, which would take care of all the attribute business.
> One option would be to write an abstract class `K.Tensor` that wraps both Theano and TensorFlow tensors, which would take care of all the attribute business.

That seems like a good idea, I'll think about it a bit more.
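To make the `K.Tensor` idea concrete, here is a minimal plain-Python sketch (hypothetical, not real Keras code): a wrapper class whose operator overloads carry the Keras bookkeeping attributes from input to output. The names `KTensor` and `_keras_history` are illustrative stand-ins for whatever attributes Keras actually attaches.

```python
# Hypothetical sketch: a wrapper that preserves topology metadata across
# arbitrary operations via operator overloading. The underlying backend
# tensor is represented by a plain float in `.value` for simplicity.

class KTensor:
    def __init__(self, value, history=None, mask=None):
        self.value = value                      # the underlying TH/TF tensor (a float here)
        self._keras_history = history or []     # stand-in for Keras topology metadata
        self._mask = mask

    def _wrap(self, value, op_name):
        # Copy bookkeeping from input to output, recording the op.
        return KTensor(value, self._keras_history + [op_name], self._mask)

    def __add__(self, other):
        other_val = other.value if isinstance(other, KTensor) else other
        return self._wrap(self.value + other_val, 'add')

    def __rmul__(self, scalar):
        return self._wrap(scalar * self.value, 'mul')

a = KTensor(1.0, history=['input'])
b = 0.2 * (a + 3.0)       # both ops preserve the topology metadata
print(b.value)             # 0.8
print(b._keras_history)    # ['input', 'add', 'mul']
```

This is exactly the behavior that would make `c = Dense(10)(a + b)` and `a = 0.2 * a` work: the intermediate results would still know where they came from.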
> How would something like this translate to the new functional Graph API?

Models would be callable, so you could do:
```python
conv1 = Sequential()  # first convnet
conv1.add(...)

conv2 = Sequential()  # second convnet, different architecture
conv2.add(...)

a = Input(input_shape=(3, 32, 32))
b = Input(input_shape=(1, 64, 64))

conv1a = conv1(a)
conv2b = conv2(b)
merged = merge([conv1a, conv2b], mode='concat', axis=1)
```
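The key property, stripped of all Keras machinery, is just function composition: a minimal plain-Python sketch (illustrative, not the real `Sequential` class) where layers are callables, models apply them in order, and 'concat' is list concatenation.

```python
# Sketch: if a model is callable on tensors, combining sub-models is
# just function application. "Tensors" are modeled as lists of numbers.

class Sequential:
    def __init__(self):
        self.layers = []

    def add(self, layer):
        self.layers.append(layer)

    def __call__(self, x):
        # Applying the model = applying each layer in order.
        for layer in self.layers:
            x = layer(x)
        return x

def scale(k):
    # Stand-in for a real layer: multiply every element by k.
    return lambda xs: [k * v for v in xs]

conv1 = Sequential()
conv1.add(scale(2))

conv2 = Sequential()
conv2.add(scale(3))

a = [1.0, 2.0]
b = [1.0]
merged = conv1(a) + conv2(b)  # 'concat' modeled as list concatenation
print(merged)                  # [2.0, 4.0, 3.0]
```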
@farizrahman4u: upon reflection, that is not feasible, because we need to be able to construct a layer-level computation graph in Keras land. When using `Lambda`, the TF/TH ops are associated with the Lambda layer. When using arbitrary ops outside of a Lambda, we would have no layer for these ops to be attached to, and the entire concept of "model" (as a network of layers) would break down.

The way to solve this would be to associate a "layer" with every op in the Keras backend, but I don't want to go that way. Let's keep it simple.
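Why ops need a layer to attach to can be shown with a toy graph walker (a hypothetical sketch, not Keras internals): each layer call records a node on its output tensor, and the model discovers its layers by walking those nodes backwards.

```python
# Sketch: a tensor remembers the (layer, inputs) node that produced it;
# a bare backend op would produce a tensor with no node, severing the chain.

class Tensor:
    def __init__(self, node=None):
        self.node = node  # (layer_name, input_tensors), or None for raw tensors

class Lambda:
    def __init__(self, name):
        self.name = name

    def __call__(self, t):
        # The op is attached to this layer; the output records the link.
        return Tensor(node=(self.name, [t]))

def walk(t):
    # Traverse the layer graph backwards from an output tensor.
    if t.node is None:
        return []
    name, inputs = t.node
    layers = []
    for inp in inputs:
        layers.extend(walk(inp))
    return layers + [name]

a = Tensor()
b = Lambda('sigmoid')(a)
c = Lambda('dense')(b)
print(walk(c))  # ['sigmoid', 'dense']
# If b had instead come from a bare op (node=None), walk(c) could not
# see past it, and the "network of layers" view would break down.
```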
The inputs `a` and `b` in the first example get used twice: once to go into the two `LSTM`s, and then again when specifying the inputs in `Model`.
```python
a = Input(input_shape=(10, 32))
b = Input(input_shape=(10, 32))
encoded_a = LSTM(32)(a)
encoded_b = LSTM(32)(b)
# skip some code here
model = Model(input=[a, b], output=[decoded])
```
This looks odd to me.
Would it be sensible to have the `Model` optionally figure out what the inputs are based on where they get used in the graph? In this case it seems like it would be straightforward, but maybe that is not generally the case.
I was caught a little off-guard by

```python
encoded_a = shared_lstm(a)
encoded_b = shared_lstm(b)
```

I expected the definition of a layer to happen in one place (i.e. the meaning of the first line doesn't change based on the second line in this example). In this API, could I replace those two lines with something like

```python
encoded_a, encoded_b = shared_lstm([a, b])
```
> I expected the definition of a layer to happen in one place (i.e. the meaning of the first line doesn't change based on the second line in this example).

Which is definitely the case. The definition of the layer happens at `shared_lstm = LSTM(32)`. From then on it acts as a function (of course it should only be called on tensors with compatible shapes).
> Would it be sensible to have the Model optionally figure out what the inputs are based on where they get used in the graph?

It is possible, but I believe it is better practice to have the user explicitly state the inputs in the model definition, since it allows us to raise an error message if there is a discrepancy between the user's view of the world and the actual graph. The user will need to know what inputs are required anyway when passing data to the model, so we might as well catch issues as early as possible.
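The early-error behavior described here can be sketched in a few lines of plain Python (hypothetical classes, not Keras internals): the model walks the graph back from its output, collects the input tensors actually reachable, and compares them against what the user declared.

```python
# Sketch: validating declared inputs against the inputs reachable in the graph.

class Tensor:
    def __init__(self, inputs=None, is_input=False):
        self.inputs = inputs or []   # tensors this tensor was computed from
        self.is_input = is_input

def reachable_inputs(t, seen=None):
    # Collect every input tensor reachable backwards from t.
    seen = set() if seen is None else seen
    if t.is_input:
        seen.add(t)
    for inp in t.inputs:
        reachable_inputs(inp, seen)
    return seen

class Model:
    def __init__(self, inputs, output):
        actual = reachable_inputs(output)
        if set(inputs) != actual:
            raise ValueError('declared inputs do not match the graph')
        self.inputs, self.output = inputs, output

a = Tensor(is_input=True)
b = Tensor(is_input=True)
merged = Tensor(inputs=[a, b])

Model([a, b], merged)        # OK: declaration matches the graph
try:
    Model([a], merged)       # b is used by the graph but not declared
except ValueError as e:
    print(e)
```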
I have a few doubts regarding the new API:

- Masking: how are the masks of sequences passed around among different layers?
- The Graph model can be used as a queryable data structure when copying weights from one model to another model of a different config. Will this be possible in the new API?
I still recommend the `K.Tensor` approach because:

- You can write arbitrary TH/TF expressions without loss of topology information. No confusion between `Lambda` and `LambdaMerge` and all that stuff.
- The mask can be made an attribute of the `Tensor`. This answers my first question. This way all mask stuff would be completely hidden from the user. Otherwise `__call__` would have to return a tuple, `(sequence, mask)`, and the user would have to reroute it to the `mask` arg of the next layer:

  ```python
  (y, mask) = LSTM(10, return_sequences=True)(x, mask)
  (z, mask) = LSTM(5)(y, mask)
  ```

  which is not very interesting.
- It is not that complicated: a class with a lot of operator overloads. All this is a lot of work anyway :)

One complication is that there would be a lot of dummy "op layers".
> Masking: how are the masks of sequences passed around among different layers?

Like they were before. Some layers can generate a mask based on their input tensor and the previous mask. The mask is then propagated forward. If a layer that does not support masking receives a non-None mask, it raises an error.
Importantly, the new approach is more general than the previous one, so it will be possible for a multi-input layer to handle masking.
How it works in practice:

```python
a = Input(shape)
# This creates a node in the graph linking a to b.
# The mask generated by Masking is stored inside the node.
b = Masking()(a)
# The LSTM retrieves the node that b came from, and reads the mask from there.
c = LSTM(32)(b)
```
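The node mechanism can be sketched in plain Python (hypothetical classes such as `Node`, `Masking`, and `LSTMLike` are illustrative stand-ins, not Keras internals): the mask lives on the node, not on the user-visible tensor, so the user never routes it by hand.

```python
# Sketch of node-based mask propagation: each layer call creates a node
# that records the mask it produced; the next layer reads it from there.

class Node:
    def __init__(self, mask):
        self.mask = mask

class Tensor:
    def __init__(self, node=None):
        self.node = node  # the node that produced this tensor, if any

class Masking:
    def __call__(self, t):
        # Generate a mask and store it in the node linking input to output.
        return Tensor(node=Node(mask=[1, 1, 0]))

class LSTMLike:
    def __call__(self, t):
        # Retrieve the mask from the node the input tensor came from.
        mask = t.node.mask if t.node else None
        print('mask seen by LSTM:', mask)
        # A non-sequence output carries no mask forward.
        return Tensor(node=Node(mask=None))

a = Tensor()
b = Masking()(a)
c = LSTMLike()(b)
```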
> The Graph model can be used as a queryable data structure when copying weights from one model to another model of a different config. Will this be possible in the new API?

Yes, this is an important feature. You will still be able to iterate over the layers in a graph and query a layer by name.
```python
a = Input(shape)
b = Dense(32, name='my_dense')(a)
c = Dense(32, name='output')(b)
model = Model(a, c)

# List of all layers in order of horizontal graph traversal.
# For a sequential model it's just the ordered list of layers,
# starting with the input layer.
model.layers

first_dense_instance = model.get_layer(name='my_dense')
first_dense_instance = model.get_layer(index=1)  # index 0 is the input layer
```
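The querying behavior reduces to an ordered layer list with lookup by name or index. A plain-Python sketch (illustrative `ModelLike` class, not the real implementation):

```python
# Sketch: a model as a queryable data structure over its layers.

class Layer:
    def __init__(self, name):
        self.name = name

class ModelLike:
    def __init__(self, layers):
        self.layers = layers  # ordered by horizontal graph traversal

    def get_layer(self, name=None, index=None):
        if index is not None:
            return self.layers[index]
        for layer in self.layers:
            if layer.name == name:
                return layer
        raise ValueError('no such layer: %s' % name)

m = ModelLike([Layer('input'), Layer('my_dense'), Layer('output')])
print(m.get_layer(name='my_dense').name)  # my_dense
print(m.get_layer(index=0).name)          # input
```

Copying weights between models of different configs then becomes: iterate one model's layers, look up the same-named layer in the other, and transfer.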
Very interesting new API: less verbose, more readable, and it avoids a lot of `input=X, name=X`. I like it!

What about accessing an intermediary layer, like in a Siamese network?
@fchollet: Very nice! Although the current `Graph` API is verbose, I liked that I could easily combine multiple `Sequential` models as inputs. How would something like this translate to the new functional `Graph` API?