Taylan Bilal (taylanbil)

taylanbil / gist:dfe0010c906f9dfb3dc582f3bbb2933b
Last active May 5, 2020 20:32
Diff to get Myle's branch in a better state
diff --git a/fairseq/data/data_utils.py b/fairseq/data/data_utils.py
index ab82ea45..646545bc 100644
--- a/fairseq/data/data_utils.py
+++ b/fairseq/data/data_utils.py
@@ -199,7 +199,7 @@ def filter_by_size(indices, dataset, max_positions, raise_exception=False):
def batch_by_size(
indices, num_tokens_fn, max_tokens=None, max_sentences=None,
- required_batch_size_multiple=1,
+ required_batch_size_multiple=1, tpu=False
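The `tpu=False` flag above points at the core TPU constraint: XLA compiles a fresh graph for every distinct input shape, so batch sizes need to be snapped to a small fixed set rather than varying freely. A minimal sketch of that rounding, with hypothetical helper names (not fairseq's actual `batch_by_size` implementation):

```python
# Illustrative only: bucket arbitrary batch sizes onto a fixed grid so the
# XLA compiler sees few distinct shapes and recompiles rarely.

def round_up(n, multiple):
    """Round n up to the nearest multiple."""
    return ((n + multiple - 1) // multiple) * multiple

def bucket_batch_sizes(batch_sizes, multiple=8):
    """Map each batch size onto the fixed grid {multiple, 2*multiple, ...}."""
    return [round_up(b, multiple) for b in batch_sizes]

print(bucket_batch_sizes([3, 8, 13], multiple=8))  # [8, 8, 16]
```

With a multiple of 8, any batch between 9 and 16 sentences compiles against the same size-16 graph, trading some padding waste for far fewer compilations.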
taylanbil / tpu-vs-robertatpu.diff
Created December 6, 2019 18:25
Fairseq changes on top of the tpu branch to make RoBERTa work well with TPUs
diff --git a/fairseq/criterions/masked_lm.py b/fairseq/criterions/masked_lm.py
index eb2fcf3..df27187 100644
--- a/fairseq/criterions/masked_lm.py
+++ b/fairseq/criterions/masked_lm.py
@@ -29,9 +29,14 @@ class MaskedLmLoss(FairseqCriterion):
2) the sample size, which is used as the denominator for the gradient
3) logging outputs to display while training
"""
+ # FIXME: proving reduce is always True
+ assert reduce, 'OMG NON REDUCE'
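The invariant behind that `assert reduce` guard can be shown with a toy criterion. The sketch below is not fairseq's `MaskedLmLoss`; it only illustrates why always reducing matters on TPU: a scalar loss keeps the compiled graph's output shape constant, whereas a per-token loss tensor changes shape with the batch.

```python
# Toy sketch (hypothetical names): enforce that the loss is always reduced
# to a single scalar, mirroring the `assert reduce` guard in the diff above.

def masked_lm_loss(per_token_losses, reduce=True):
    assert reduce, "non-reduced (per-token) loss not supported"
    return sum(per_token_losses)

print(masked_lm_loss([0.5, 1.5, 2.0]))  # 4.0
```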
taylanbil / upstreammaster_a0f7599-vs-tpu.diff
Created December 6, 2019 18:07
Fairseq changes to make Transformer + translation task work well with TPUs
diff --git a/fairseq/checkpoint_utils.py b/fairseq/checkpoint_utils.py
index 10de955..a6a6187 100644
--- a/fairseq/checkpoint_utils.py
+++ b/fairseq/checkpoint_utils.py
@@ -17,6 +17,8 @@ from torch.serialization import default_restore_location
from fairseq.models import FairseqEncoder, FairseqDecoder
+import torch_xla.core.xla_model as xm
+
taylanbil / gist:6fc69ce632aabfd03eeb3b4903d51ddf
Created November 26, 2019 00:19
RoBERTa metrics report, 2019-11-25
Metric: CompileTime
TotalSamples: 4
Accumulator: 02m10s739ms169.854us
ValueRate: 762ms277.314us / second
Rate: 0.0235018 / second
Percentiles: 1%=25s766ms505.593us; 5%=25s766ms505.593us; 10%=25s766ms505.593us; 20%=25s766ms505.593us; 50%=39s737ms929.180us; 80%=41s729ms347.105us; 90%=41s729ms347.105us; 95%=41s729ms347.105us; 99%=41s729ms347.105us
Metric: ExecuteTime
TotalSamples: 139
Accumulator: 07m06s439ms701.634us
ValueRate: 831ms927.457us / second
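Reports like this are easier to compare across runs once parsed. Below is a hypothetical parser (not part of torch_xla) that pulls the headline numbers out of one metric block; the field names follow the printed format above:

```python
import re

# Illustrative helper: extract Metric name, TotalSamples, and Rate from one
# block of a torch_xla-style metrics report. Anchored to line starts so
# "Rate:" does not accidentally match inside "ValueRate:".
def parse_metric(block):
    name = re.search(r"^Metric: (\w+)", block, re.M).group(1)
    samples = int(re.search(r"^TotalSamples: (\d+)", block, re.M).group(1))
    rate = float(re.search(r"^Rate: ([\d.]+)", block, re.M).group(1))
    return {"name": name, "samples": samples, "rate": rate}

report = """Metric: CompileTime
TotalSamples: 4
Rate: 0.0235018 / second"""
print(parse_metric(report))  # {'name': 'CompileTime', 'samples': 4, 'rate': 0.0235018}
```

A low CompileTime sample count (4 here) relative to ExecuteTime samples (139) is the healthy pattern: the graph compiled a handful of times and was then reused.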
taylanbil / tpuchanges.20191118.diff
Created November 18, 2019 19:46
Fairseq changes so it works well with TPUs
diff --git a/fairseq/checkpoint_utils.py b/fairseq/checkpoint_utils.py
index 10de955..aa8160f 100644
--- a/fairseq/checkpoint_utils.py
+++ b/fairseq/checkpoint_utils.py
@@ -17,6 +17,8 @@ from torch.serialization import default_restore_location
from fairseq.models import FairseqEncoder, FairseqDecoder
+import torch_xla.core.xla_model as xm
+
taylanbil / gist:f7e5d631a4a92811d16fbebfcc675349
Created October 3, 2019 18:46
2019-09-11 / fairseq transformer metrics report wmt18, minor change to loss function branch, end of epoch 5
Epoch 5 end 23:40:43
Metric: CompileTime
TotalSamples: 103
Counter: 12h10m19s368ms61.206us
ValueRate: 01s026ms336.810us / second
Rate: 0.00590595 / second
Percentiles: 1%=014ms574.635us; 5%=051ms968.017us; 10%=252ms244.650us; 20%=538ms284.546us; 50%=31s090ms150.944us; 80%=06m51s432ms266.183us; 90%=06m26s349ms452.444us; 95%=08m46s288ms682.970us; 99%=01h03m02s697ms309.382us
Metric: ExecuteTime
TotalSamples: 60976
Counter: 01d20h12m35s140ms552.464us
taylanbil / gist:ced8c407f415eee08f85bac156bcce26
Last active October 3, 2019 18:49
2019-10-03 / fairseq transformer metrics report wmt18, master branch, end of epoch 5
Epoch 5 end 07:32:37
Metric: CompileTime
TotalSamples: 98
Accumulator: 11h12m28s413ms977.979us
ValueRate: 790ms91.790us / second
Rate: 0.00511136 / second
Percentiles: 1%=012ms455.058us; 5%=036ms797.399us; 10%=309ms487.393us; 20%=01s073ms779.463us; 50%=02m34s048ms533.549us; 80%=06m43s037ms491.406us; 90%=06m59s355ms429.817us; 95%=06m24s546ms665.777us; 99%=21m42s139ms386.139us
Metric: ExecuteTime
TotalSamples: 60976
Accumulator: 02d36h02m12s990ms414.578us
taylanbil / gist:bbfec9307a2f4c35833d70976fd96bf8
Created September 27, 2019 21:55
[fseq][transformer] warmed-up run
Epoch 1 begin 21:38:10
training/ 21:39:31, device xla:1, step 1, Rate=19.64, GlobalRate=19.64
training/ 21:39:31, device xla:2, step 1, Rate=19.52, GlobalRate=19.52
training/ 21:39:31, device xla:5, step 1, Rate=19.37, GlobalRate=19.37
training/ 21:39:31, device xla:8, step 1, Rate=38.76, GlobalRate=38.76
training/ 21:39:31, device xla:4, step 1, Rate=19.31, GlobalRate=19.31
training/ 21:39:31, device xla:6, step 1, Rate=38.53, GlobalRate=38.53
training/ 21:39:31, device xla:7, step 1, Rate=76.98, GlobalRate=76.98
training/ 21:39:31, device xla:3, step 1, Rate=38.31, GlobalRate=38.31
training/ 21:39:51, device xla:8, step 2, Rate=45.96, GlobalRate=46.01
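These `training/` lines are easiest to compare against the fresh-run gist below after parsing out the per-device rates. A hypothetical parser (field names taken from the log format above, not from any fairseq utility):

```python
import re

# Illustrative parser for the "training/" progress lines: pulls out the
# device, step number, instantaneous Rate, and GlobalRate.
LINE = re.compile(r"device (xla:\d+), step (\d+), Rate=([\d.]+), GlobalRate=([\d.]+)")

def parse_line(line):
    dev, step, rate, grate = LINE.search(line).groups()
    return dev, int(step), float(rate), float(grate)

line = "training/ 21:39:31, device xla:1, step 1, Rate=19.64, GlobalRate=19.64"
print(parse_line(line))  # ('xla:1', 1, 19.64, 19.64)
```

Parsed this way, the warmed-up run's step-1 rates (roughly 19 to 77) stand out against the fresh run's (around 1 to 2), since the warmed-up run reuses cached compilations.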
taylanbil / gist:64aaa74c59745fce84aa217057f421d8
Created September 27, 2019 21:53
[fairseq][transformer] Fresh run w/ 3 large shapes.
Epoch 1 begin 19:52:45
training/ 19:55:52, device xla:5, step 1, Rate=2.19, GlobalRate=2.19
training/ 19:55:52, device xla:4, step 1, Rate=2.19, GlobalRate=2.19
training/ 19:56:05, device xla:2, step 1, Rate=1.97, GlobalRate=1.97
training/ 19:56:05, device xla:1, step 1, Rate=1.97, GlobalRate=1.97
training/ 19:59:14, device xla:8, step 1, Rate=1.60, GlobalRate=1.60
training/ 19:59:14, device xla:3, step 1, Rate=1.60, GlobalRate=1.60
training/ 19:59:16, device xla:6, step 1, Rate=1.60, GlobalRate=1.60
training/ 20:12:01, device xla:7, step 1, Rate=0.94, GlobalRate=0.94
training/ 20:17:46, device xla:5, step 2, Rate=1.11, GlobalRate=0.54
taylanbil / gist:8ff73e6b8cb26e550c8d47be1a844f48
Created September 18, 2019 03:30
error while dumping graphs
| WARNING: 240829 samples have invalid sizes and will be skipped, max_positions=(64, 64), first few sample ids=[1422704, 2718830, 2897878, 3673048, 2016896, 2200333, 3886976, 2097242, 3124502, 2871279]
Epoch 1 begin 00:17:55
training/ 00:19:08, device xla:1, step 1, Rate=132.04, GlobalRate=132.04, loss=15.8125, nll_loss=15.8750
training/ 00:20:21, device xla:1, step 2, Rate=54.94, GlobalRate=6.89, loss=15.8125, nll_loss=15.8125
training/ 00:25:46, device xla:1, step 3, Rate=22.92, GlobalRate=2.56, loss=16.0000, nll_loss=16.0000
training/ 00:40:56, device xla:1, step 4, Rate=9.34, GlobalRate=0.98, loss=15.9375, nll_loss=15.9375
training/ 01:58:50, device xla:1, step 5, Rate=3.77, GlobalRate=0.26, loss=15.7500, nll_loss=15.8125
2019-09-18 03:13:04.411218: E tensorflow/compiler/xla/xla_client/tf_logging.cc:11] Check failed: session_work.first->session()->Run( session_work.second.feed_inputs, session_work.second.outputs_handles, &outputs) == ::tensorflow::Status::OK() (Unavailable: From /job:tpu_worker/replica:0/
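The rate collapse in this log (132 down to roughly 3.8 over five steps) is the signature of per-step recompilation: each new batch shape forces XLA to compile a new graph before executing it. A quick illustrative check for that pattern, using hypothetical helper names:

```python
# Illustrative only: flag a throughput collapse like the one in the log
# above, where Rate falls from 132.04 at step 1 to 3.77 by step 5.

def rate_collapse(rates, threshold=10.0):
    """True if throughput fell by more than `threshold`x since the first step."""
    return rates[0] / rates[-1] > threshold

print(rate_collapse([132.04, 54.94, 22.92, 9.34, 3.77]))  # True
```

A run with stable shapes would hold a roughly constant Rate instead, as in the warmed-up gist above.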