@fchollet
Last active May 23, 2019 11:14
'''Functional Keras is a more functional replacement for the Graph API.
'''
###################
# 2 LSTM branches #
###################
a = Input(input_shape=(10, 32)) # output is a TF/TH placeholder, augmented with Keras attributes
b = Input(input_shape=(10, 32))
encoded_a = LSTM(32)(a) # output is a TF/TH tensor
encoded_b = LSTM(32)(b)
merged = merge([encoded_a, encoded_b], mode='concat')
decoded = RepeatVector(10)(merged)
decoded = LSTM(32, return_sequences=True)(decoded)
# this is a fully-featured Keras model, with all the goodies that come with those.
# this is made possible by Keras topology information stored in the tensors.
model = Model(input=[a, b], output=[decoded])
model.compile(optimizer=Adam(), loss='mse')
model.fit([x1, x2], y)
################
# Shared layer #
################
shared_lstm = LSTM(32)
a = Input(input_shape=(10, 32))
b = Input(input_shape=(10, 32))
encoded_a = shared_lstm(a)
encoded_b = shared_lstm(b)
merged = merge([encoded_a, encoded_b], mode='concat')
decoded = RepeatVector(10)(merged)
decoded = LSTM(32, return_sequences=True)(decoded)
##############################
# Insertion of arbitrary ops #
##############################
# NOTE: cannot do a = tf.sigmoid(a): although 'a' is a valid TF tensor,
# it is 'augmented' with data that allows Keras to keep track of previous
# operations (thus making it possible to train a model), and a raw TF op
# would discard that data...
a = Input(input_shape=(10, 32))
a = Lambda(tf.sigmoid)(a)
model = Model(input=[a, b], output=[decoder])
model.compile(optimizer=Adam(), loss='mse')
model.fit([x1, x2], y)
@EderSantana

whoa I liked that.
decoded = RepeatVector(10)([encoded_a, encoded_b], merge_mode='concat')

Merge will be built into all layers? How will regular RepeatVector(10) remain backward compatible?
This is similar to Torch's nngraph, right? They allow any layer to be plugged in like that, and they stay compatible with regular nn.


What is merged in line 29?


Here is one thing: right now, we are using __call__ to simplify the API of get_output. But we could reserve __call__ for that new API and move the code written in the old one into a new method, something called apply or the like, which temporarily replaces the input and calculates the output.

New __call__ can be a wrapper of set_previous with an optional Merge.

@EderSantana

Also, before I was against an Input layer, but now I understand how important it is and how it makes things easier.

@EderSantana

Since Merge layers don't have any parameters and we are doing this new API using the regular set_previous, I don't see why it wouldn't be backward compatible. We would just have to go back to our code and see where we use __call__ as get_output.

Another option, which is not as elegant, is to add a new method, something like add or connect, to every layer:

a = Input(input_shape=(10, 32))  # output is a TF/TH placeholder, augmented with Keras attributes
b = Input(input_shape=(10, 32))
encoded_a = LSTM(32).add(a)  # output is a TF/TH tensor
encoded_b = LSTM(32).add(b)
decoded = RepeatVector(10).add([encoded_a, encoded_b], merge_mode='concat')
decoded = LSTM(32, return_sequences=True).add(decoded)

But we shouldn't change the way set_previous is right now to avoid breaking changes if possible.

@fchollet
Author

Here is one thing: right now, we are using __call__ to simplify the API of get_output. But we could reserve __call__ for that new API and move the code written in the old one into a new method, something called apply or the like, which temporarily replaces the input and calculates the output.

The way I see it, __call__ would still behave in the old way: calling a layer on an input x will return x as processed by the layer.

However I plan on moving the actual layer logic to __call__, with get_output deferring to __call__.

Merge will be built into all layers? How will regular RepeatVector(10) remain backward compatible?

It's a possibility, as is already the case in Graph. It wouldn't affect RepeatVector in the old usage. Another possibility is to introduce a merge function:

merged = merge([encoded_a, encoded_b], merge_mode='concat')  # output is a tensor, not a layer instance (different from `Merge`)
decoded = RepeatVector(10)(merged)

What is merged in line 29?

A typo; fixed it.

@jfsantos

Looks a lot like Torch nngraph, which I particularly like. I have been working with Graph on Keras quite a bit recently and I consider the current API a bit too verbose, so something like this would be great.

Regarding the merge question, I think built-in merge is a plus in terms of reducing verbosity. Same thing for using __call__() instead of .add()/.connect().

@EderSantana

OK, let us mock up how Layer would look then. Do you already have an idea?

class Layer(object):
    def __init__(self, *args, **kwargs):
        ...
    def __call__(self, inputs):
        # make sure inputs is a list
        # call merge, if "not merge_mode or singleton" just pass inputs[0], else return new layer with merged values as output
        # set that merge as previous with set_previous?
        # Return "get_output" or another Layer?

Before, __call__ would return a tensor (either Theano- or TensorFlow-based) and Model would actually take care of get_input and get_output.

Let us investigate each case:

  • Return get_output
    This would make graph design easier. Also, the API becomes a graph annotator similar to Blocks or nngraph. Note that we would have no access to the layer properties: in a case such as decoded = RepeatVector(10)([encoded_a, encoded_b], merge_mode='concat'), the RepeatVector is gone and decoded is a tensor. This is probably the most minimalist and clean option. We could, for example, just compile K.function([a, b], [decoded]).
  • Return Layer
    To avoid losing too much control, __call__ could return a Layer or a Container with the elements added so far. If it returns a Layer or a Container, we would have to keep get_output to actually get the tensor elements.
    Also, returning a Layer or Container object gives us more flexibility if we try to support non-symbolic backends in the future.

Unless I'm missing something, I'm more inclined to the second case where we return a Layer.

What do you guys think?

@fchollet
Author

Before, __call__ would return a tensor (either Theano- or TensorFlow-based) and Model would actually take care of get_input and get_output.

__call__ will still return a tensor. It will be, however, a tensor possessing some extra attributes making it possible to reconstruct the Keras model it just went through (independently of the backend the tensor type belongs to).

get_output will simply retrieve the input tensor, pass it to __call__, and return the result.
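
A rough sketch of that deferral (method names from the then-current Layer API; the exact wiring is illustrative, not the actual implementation):

def get_output(self, train=False):
    x = self.get_input(train)  # retrieve the input tensor from the previous layer
    return self(x)             # all the layer logic now lives in __call__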

@AvantiShri

Thank you for this @fchollet! As another feather in the "there is only one API that you can converge to" cap, a colleague of mine who uses https://github.com/Lasagne tells me that they have a similar interface.

@lukedeo

lukedeo commented Mar 13, 2016

I think this is great @fchollet! My one concern (not sure if you addressed this elsewhere, I only looked quickly) is that I do like the verbose named-argument training/prediction of the Graph API, but I think it should be optional. Could something like

a = Input(input_shape=(10, 32), name='input_1')
b = Input(input_shape=(10, 32), name='input_2')
...
model = Model(input=[a, b], output={'y': decoded})
model.compile(optimizer=Adam(), loss='mse')
model.fit({'input_1': x1, 'input_2': x2}, {'y': y})

be possible? Not sure how feasible/desirable this is...

@fchollet
Author

@lukedeo: I was thinking of unifying the Graph, Sequential and Model APIs, so that you could do:

model.compile(optimizer=Adam(), loss='mse')
model.compile(optimizer=Adam(), loss={'y': 'mse'})

model.fit({'input_1': x1, 'input_2': x2}, y)
model.fit({'input_1': x1, 'input_2': x2}, [y])
model.fit({'input_1': x1, 'input_2': x2}, {'y': y})
model.fit([x1, x2], y)
model.fit([x1, x2], [y])

It implies that inputs and outputs are ordered (by order of their definition in the code). Thoughts?

To keep backward compatibility, it should also be possible to do:

model.fit({'input_1': x1, 'input_2': x2, 'y': y}, None)

@jocicmarko

@fchollet: Very nice! Although the current Graph API is verbose, I liked that I could easily combine multiple Sequential models as inputs, e.g.:

conv1 = Sequential()  # first convnet
conv1.add(...)

conv2 = Sequential()  # second convnet, different architecture
conv2.add(...)

model = Graph()
model.add_input(name='conv1_input', input_shape=(3, 32, 32))
model.add_input(name='conv2_input', input_shape=(1, 64, 64))
model.add_node(conv1, name='conv1', input='conv1_input')
model.add_node(conv2, name='conv2', input='conv2_input')
# merge concat here

How would something like this translate to the new functional Graph API?

@farizrahman4u

cannot do a = tf.sigmoid(a)

Though tf.sigmoid won't work, we could make K.sigmoid work, right? K.sigmoid could just copy the said attributes from the input to the output tensor. You can also do a = Activation('sigmoid')(a). A Lambda layer for every activation doesn't sound right.

Also, what about operators? c = Dense(10)(a + b) and a = 0.2 * a?
One option would be to write an abstract class K.Tensor that wraps both theano and tensorflow tensors, which would take care of all the attribute business.
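
For concreteness, a minimal sketch of such a wrapper (class and attribute names are hypothetical):

class Tensor(object):
    """Hypothetical backend-agnostic wrapper carrying Keras metadata."""
    def __init__(self, value, history=None):
        self.value = value      # the underlying TH/TF tensor
        self.history = history  # topology info Keras needs to rebuild the model

    def __add__(self, other):
        other_value = other.value if isinstance(other, Tensor) else other
        # the result keeps the topology info, so c = Dense(10)(a + b) stays trainable
        return Tensor(self.value + other_value, history=self.history)

    def __rmul__(self, scalar):
        # supports expressions like a = 0.2 * a
        return Tensor(scalar * self.value, history=self.history)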

@fchollet
Author

One option would be to write an abstract class K.Tensor that wraps both theano and tensorflow tensors, which would take care of all the attribute business.

That seems like a good idea, I'll think about it a bit more.

How would something like this translate to the new functional Graph API?

Models would be callable, so you could do:

conv1 = Sequential()  # first convnet
conv1.add(...)

conv2 = Sequential()  # second convnet, different architecture
conv2.add(...)

a = Input(input_shape=(3, 32, 32))
b = Input(input_shape=(1, 64, 64))
conv1a = conv1(a)
conv2b = conv2(b)
merged = merge([conv1a, conv2b], mode='concat', axis=1)
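
Continuing the sketch, the merged tensor could then be treated like any other tensor in the proposed API (the classification head below is illustrative):

out = Dense(10, activation='softmax')(merged)
model = Model(input=[a, b], output=[out])
model.compile(optimizer=Adam(), loss='categorical_crossentropy')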

@fchollet
Author

@farizrahman4u: upon reflection, that is not feasible, because we need to be able to construct a layer-level computation graph in Keras land. When using Lambda, the TF/TH ops are associated with the Lambda layer. When using arbitrary ops outside of Lambda, we would have no layer for these ops to be attached to, and the entire concept of "model" (as a network of layers) would break down.

The way to solve this would be to associate "layers" with every op in the Keras backend, but I don't want to go that way. Let's keep it simple.

@sergeyf

sergeyf commented Mar 16, 2016

The inputs a and b in the first example get used twice: once going into the two LSTMs, and then again when specifying the inputs in Model.

a = Input(input_shape=(10, 32))  
b = Input(input_shape=(10, 32))
encoded_a = LSTM(32)(a) 
encoded_b = LSTM(32)(b)
# skip some code here
model = Model(input=[a, b], output=[decoded])

This looks odd to me.

Would it be sensible to have the Model optionally figure out what the inputs are based on where they get used in the graph? In this case, it seems like it would be straightforward, but maybe that is not generally the case.

@dansbecker

I was caught a little off-guard by

encoded_a = shared_lstm(a)
encoded_b = shared_lstm(b)

I expected the definition of a layer to happen in one place (i.e. the meaning of the first line doesn't change based on the second line in this example).

In this API, could I replace those two lines with something like

encoded_a, encoded_b = shared_lstm([a,b])

@fchollet
Author

I expected the definition of a layer to happen in one place (i.e. the meaning of the first line doesn't change based on the second line in this example).

Which is definitely the case. The definition of the layer happens at shared_lstm = LSTM(32). From then on it acts as a function (of course it should only be called on tensors with compatible shapes).
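
For contrast, a sketch in the same proposed API: two separate instances share nothing.

shared = LSTM(32)        # one layer instance: one set of weights
encoded_a = shared(a)    # both calls...
encoded_b = shared(b)    # ...reuse the same weights

encoded_c = LSTM(32)(a)  # a new instance: an independent set of weights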

@fchollet
Author

Would it be sensible to have the Model optionally figure out what the inputs are based on where they get used in the graph?

It is possible, but I believe it is better practice to have the user explicitly state the inputs in the model definition, since it allows us to raise an error message if there is a discrepancy between the user's view of the world and the actual graph. The user will need to know what inputs are required anyway when passing data to the model, so we might as well catch issues as early as possible.

@farizrahman4u

@fchollet

I have a few doubts regarding the new API:

  • Masking: how are the masks of sequences passed around between layers?
  • The Graph model can be used as a queryable data structure when copying weights from one model to another model with a different config. Will this be possible in the new API?

I still recommend the K.Tensor approach because:

  • You can write arbitrary TH/TF expressions without losing topology information. No confusion between Lambda and LambdaMerge and all that stuff.
  • The mask can be made an attribute of the Tensor, which answers my first question. This way all the mask handling would be completely hidden from the user. Otherwise __call__ would have to return a tuple, (sequence, mask), and the user would have to reroute it to the mask arg of the next layer:
(y, mask) = LSTM(10, return_sequences=True)(x, mask)
(z, mask) = LSTM(5)(y, mask)

which is not very interesting.

  • It is not that complicated: a class with a lot of operator overloads. All this is a lot of work anyway. :)

One complication is that there would be a lot of dummy "op layers".
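
For concreteness, the mask-as-attribute idea might look like this (the attribute handling is hypothetical):

# hypothetical: the mask rides along inside the wrapped tensor
y = LSTM(10, return_sequences=True)(x)  # sets the mask on y internally
z = LSTM(5)(y)                          # reads the mask from y; no user plumbing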

@fchollet
Author

Masking : How are the masks of sequences passed around among different layers?

Like they were before. Some layers can generate a mask based on their input tensor and the previous mask. The mask is then propagated forward. If a layer that does not support masking receives a non-None mask, it raises an error.

Importantly, the new approach is more general than the previous one, so it will be possible for a multi-input layer to handle masking.

How it works in practice:

a = Input(shape)

# This creates a node in a graph linking a to b.
# the mask generated by Masking is stored inside the node.
b = Masking()(a)

# the lstm retrieves the node that b came from, and reads the mask from there
c = LSTM(32)(b)

The Graph model can be used as a query-able data structure when copying weights from a model to another model of different config. Will this be possible in the new api?

Yes. This is an important feature. You will still be able to iterate over the layers in a graph and query a layer by name.

a = Input(shape)
b = Dense(32, name='my_dense')(a)
c = Dense(32, name='output')(b)

model = Model(a, c)

# list of all layers in order of horizontal graph traversal.
# So for a sequential model it's just the ordered list of layers, starting with the input layer
model.layers

first_dense_instance = model.get_layer(name='my_dense')
first_dense_instance = model.get_layer(index=0)
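
That also covers the weight-copying use case; a sketch using the existing get_weights()/set_weights() methods, assuming the two models share layer names:

# copy weights between models of different config, matching layers by name
target_names = [l.name for l in target_model.layers]
for layer in source_model.layers:
    if layer.name in target_names:
        target_model.get_layer(name=layer.name).set_weights(layer.get_weights())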

@GregorySenay

Very interesting: the new API is less verbose, more readable, and avoids a lot of input=X, name=X boilerplate. I like it!
What about accessing an intermediate layer, like in a Siamese network?
