@LysandreJik
Created December 16, 2019 22:34
Training GPT-2 LM Head model in Keras
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel
import tensorflow as tf

# Load the pretrained DistilGPT-2 LM head model and its tokenizer
model = TFGPT2LMHeadModel.from_pretrained("distilgpt2")
tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
text = """
A SQUAT grey building of only thirty-four stories. Over the main entrance the
words, CENTRAL LONDON HATCHERY AND CONDITIONING CENTRE,
and, in a shield, the World State’s motto, COMMUNITY, IDENTITY, STABILITY.
The enormous room on the ground floor faced towards the north. Cold for
all the summer beyond the panes, for all the tropical heat of the room itself,
a harsh thin light glared through the windows, hungrily seeking some draped
lay figure, some pallid shape of academic goose-flesh, but finding only the glass
and nickel and bleakly shining porcelain of a laboratory. Wintriness responded
to wintriness. The overalls of the workers were white, their hands gloved with
a pale corpse-coloured rubber. The light was frozen, dead, a ghost. Only from
the yellow barrels of the microscopes did it borrow a certain rich and living
substance, lying along the polished tubes like butter, streak after luscious streak
in long recession down the work tables.
“And this,” said the Director opening the door, “is the Fertilizing Room.”
Bent over their instruments, three hundred Fertilizers were plunged, as the Director of Hatcheries and Conditioning entered the room, in the scarcely breathing silence, the absent-minded, soliloquizing hum or whistle, of absorbed
concentration. A troop of newly arrived students, very young, pink and callow,
followed nervously, rather abjectly, at the Director’s heels. Each of them carried
a notebook, in which, whenever the great man spoke, he desperately scribbled.
Straight from the horse’s mouth. It was a rare privilege. The D. H. C. for Central
London always made a point of personally conducting his new students round
the various departments.
“Just to give you a general idea,” he would explain to them. For of course some
sort of general idea they must have, if they were to do their work intelligently, though as little of one, if they were to be good and happy members of society, as
possible. For particulars, as every one knows, make for virtue and happiness;
generalities are intellectually necessary evils. Not philosophers but fretsawyers
""" * 100
# Tokenize the full corpus into a flat list of token ids
tokenized_text = tokenizer.encode(text)

# Truncate the token list into contiguous blocks of block_size tokens
examples = []
block_size = 100
for i in range(0, len(tokenized_text) - block_size + 1, block_size):
    examples.append(tokenized_text[i:i + block_size])

# For language modelling, the labels are the inputs shifted one token to the left
inputs, labels = [], []
for ex in examples:
    inputs.append(ex[:-1])
    labels.append(ex[1:])

dataset = tf.data.Dataset.from_tensor_slices((inputs, labels))

BATCH_SIZE = 16
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

# Standard fine-tuning setup: Adam with a small learning rate and gradient clipping
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')

# The model returns several outputs (the LM logits plus the past key/values);
# only the first output gets a loss, the remaining outputs are assigned None
model.compile(optimizer=optimizer, loss=[loss, *[None] * model.config.n_layer], metrics=[metric])
model.fit(dataset, epochs=3)
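Once fit finishes, the fine-tuned model can be sanity-checked by sampling from it. The following is only a sketch and is not part of the gist above; it assumes a transformers version whose TF models expose generate() (the prompt, max_length, do_sample and top_k values are arbitrary choices):

# Hypothetical quick check of the fine-tuned model (not from the original gist)
prompt = "The enormous room on the ground floor"
input_ids = tokenizer.encode(prompt, return_tensors="tf")
output_ids = model.generate(input_ids, max_length=50, do_sample=True, top_k=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))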
@jayendra13

Thanks for your response, I will switch to a custom training loop instead of fit.
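For reference, a custom training loop along those lines might look like the sketch below. It reuses the dataset, loss and optimizer defined in the gist; the train_step function and its use of GradientTape are an assumption, not code from this thread:

# Minimal custom training loop sketch (assumes model, dataset, loss, optimizer from above)
@tf.function
def train_step(batch_inputs, batch_labels):
    with tf.GradientTape() as tape:
        outputs = model(batch_inputs, training=True)
        logits = outputs[0]                     # first output holds the LM logits
        step_loss = loss(batch_labels, logits)  # SparseCategoricalCrossentropy from above
    grads = tape.gradient(step_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return step_loss

for epoch in range(3):
    for batch_inputs, batch_labels in dataset:
        step_loss = train_step(batch_inputs, batch_labels)
    print("epoch", epoch, "last batch loss", float(step_loss))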

@alexol91

Change BATCH_SIZE = 16 to BATCH_SIZE = 12.

@jayendra13 commented Oct 30, 2020

So is the batch size of 12 hardcoded inside this pretrained model? A different batch size also doesn't work with a model created from the default config.
