Gist: taylanbil/7241f441dad7aa3dfb2cdf287a72cf76
2020-07-20 19:26:21 | INFO | train_inner | epoch 001: 100 / 648283 loss=14.211, ppl=18970, wps=0, ups=0, wpb=143, bsz=8, num_updates=100, lr=1e-06, gnorm=8.634, train_wall=11, wall=130
RAWLOSS @ 200 tensor(1351.1304, device='xla:1')
2020-07-20 19:26:46 | INFO | train_inner | epoch 001: 200 / 648283 loss=12.824, ppl=7251.87, wps=6, ups=0.04, wpb=152, bsz=8, num_updates=200, lr=2e-06, gnorm=6.779, train_wall=10, wall=156
RAWLOSS @ 300 tensor(2289.3340, device='xla:1')
2020-07-20 19:27:11 | INFO | train_inner | epoch 001: 300 / 648283 loss=12.463, ppl=5647.65, wps=10.5, ups=0.04, wpb=265, bsz=8, num_updates=300, lr=3e-06, gnorm=4.417, train_wall=10, wall=181
RAWLOSS @ 400 tensor(1715.6587, device='xla:1')
2020-07-20 19:27:38 | INFO | train_inner | epoch 001: 400 / 648283 loss=11.9, ppl=3821.35, wps=7.8, ups=0.04, wpb=208, bsz=8, num_updates=400, lr=4e-06, gnorm=4.5, train_wall=11, wall=208
RAWLOSS @ 500 tensor(2465.9453, device='xla:1')
2020-07-20 19:28:03 | INFO | train_inner | epoch 001: 500 / 648283 loss=11.938, ppl=3924.46, wps=11.8, ups=0.04, wpb=298, bsz=8, num_updates=500, lr=5e-06, gnorm=3.786, train_wall=10, wall=233
RAWLOSS @ 600 tensor(2054.0742, device='xla:1')
2020-07-20 19:28:28 | INFO | train_inner | epoch 001: 600 / 648283 loss=11.621, ppl=3150.11, wps=10.1, ups=0.04, wpb=255, bsz=8, num_updates=600, lr=6e-06, gnorm=4.013, train_wall=10, wall=258
RAWLOSS @ 700 tensor(1702.3367, device='xla:1')
2020-07-20 19:28:54 | INFO | train_inner | epoch 001: 700 / 648283 loss=11.113, ppl=2214.71, wps=8.7, ups=0.04, wpb=221, bsz=8, num_updates=700, lr=7e-06, gnorm=3.932, train_wall=10, wall=284
RAWLOSS @ 800 tensor(1390.9158, device='xla:1')
2020-07-20 19:29:19 | INFO | train_inner | epoch 001: 800 / 648283 loss=10.451, ppl=1400.17, wps=7.6, ups=0.04, wpb=192, bsz=8, num_updates=800, lr=8e-06, gnorm=4.029, train_wall=10, wall=309
RAWLOSS @ 900 tensor(1542.2075, device='xla:1')
2020-07-20 19:29:44 | INFO | train_inner | epoch 001: 900 / 648283 loss=10.907, ppl=1919.54, wps=8.1, ups=0.04, wpb=204, bsz=8, num_updates=900, lr=9e-06, gnorm=3.623, train_wall=10, wall=334
RAWLOSS @ 1000 tensor(1371.6144, device='xla:1')
2020-07-20 19:30:10 | INFO | train_inner | epoch 001: 1000 / 648283 loss=10.639, ppl=1594.43, wps=7.4, ups=0.04, wpb=186, bsz=8, num_updates=1000, lr=1e-05, gnorm=3.659, train_wall=10, wall=359
RAWLOSS @ 1100 tensor(1693.7521, device='xla:1')
2020-07-20 19:30:35 | INFO | train_inner | epoch 001: 1100 / 648283 loss=10.443, ppl=1391.67, wps=9.3, ups=0.04, wpb=234, bsz=8, num_updates=1100, lr=1.1e-05, gnorm=3.67, train_wall=10, wall=385
RAWLOSS @ 1200 tensor(1228.0624, device='xla:1')
2020-07-20 19:31:00 | INFO | train_inner | epoch 001: 1200 / 648283 loss=9.898, ppl=954.02, wps=7.1, ups=0.04, wpb=179, bsz=8, num_updates=1200, lr=1.2e-05, gnorm=4.351, train_wall=10, wall=410
RAWLOSS @ 1300 tensor(1328.1467, device='xla:1')
2020-07-20 19:31:27 | INFO | train_inner | epoch 001: 1300 / 648283 loss=11.205, ppl=2361.23, wps=6.4, ups=0.04, wpb=171, bsz=8, num_updates=1300, lr=1.3e-05, gnorm=4.224, train_wall=11, wall=437
RAWLOSS @ 1400 tensor(1641.4343, device='xla:1')
2020-07-20 19:31:52 | INFO | train_inner | epoch 001: 1400 / 648283 loss=10.478, ppl=1426.51, wps=8.9, ups=0.04, wpb=226, bsz=8, num_updates=1400, lr=1.4e-05, gnorm=4.648, train_wall=10, wall=462
RAWLOSS @ 1500 tensor(1422.3253, device='xla:1')
2020-07-20 19:32:18 | INFO | train_inner | epoch 001: 1500 / 648283 loss=10.158, ppl=1142.78, wps=8, ups=0.04, wpb=202, bsz=8, num_updates=1500, lr=1.5e-05, gnorm=4.321, train_wall=10, wall=487
RAWLOSS @ 1600 tensor(1923.9608, device='xla:1')
2020-07-20 19:32:43 | INFO | train_inner | epoch 001: 1600 / 648283 loss=10.554, ppl=1503.33, wps=10.4, ups=0.04, wpb=263, bsz=8, num_updates=1600, lr=1.6e-05, gnorm=3.868, train_wall=10, wall=513
RAWLOSS @ 1700 tensor(1688.1075, device='xla:1')
2020-07-20 19:33:08 | INFO | train_inner | epoch 001: 1700 / 648283 loss=9.9, ppl=955.49, wps=9.8, ups=0.04, wpb=246, bsz=8, num_updates=1700, lr=1.7e-05, gnorm=3.629, train_wall=10, wall=538
RAWLOSS @ 1800 tensor(1570.7827, device='xla:1')
2020-07-20 19:33:33 | INFO | train_inner | epoch 001: 1800 / 648283 loss=10.491, ppl=1439.63, wps=8.6, ups=0.04, wpb=216, bsz=8, num_updates=1800, lr=1.8e-05, gnorm=4.323, train_wall=10, wall=563
RAWLOSS @ 1900 tensor(1828.8079, device='xla:1')
2020-07-20 19:33:59 | INFO | train_inner | epoch 001: 1900 / 648283 loss=10.187, ppl=1165.65, wps=10.3, ups=0.04, wpb=259, bsz=8, num_updates=1900, lr=1.9e-05, gnorm=3.833, train_wall=10, wall=588
RAWLOSS @ 2000 tensor(1782.3601, device='xla:1')
2020-07-20 19:34:24 | INFO | train_inner | epoch 001: 2000 / 648283 loss=10.164, ppl=1147, wps=10, ups=0.04, wpb=253, bsz=8, num_updates=2000, lr=2e-05, gnorm=3.724, train_wall=10, wall=614
RAWLOSS @ 2100 tensor(2344.3152, device='xla:1')
2020-07-20 19:34:49 | INFO | train_inner | epoch 001: 2100 / 648283 loss=10.187, ppl=1165.83, wps=13.2, ups=0.04, wpb=332, bsz=8, num_updates=2100, lr=2.1e-05, gnorm=3.385, train_wall=10, wall=639
RAWLOSS @ 2200 tensor(1709.8677, device='xla:1')
2020-07-20 19:35:14 | INFO | train_inner | epoch 001: 2200 / 648283 loss=10.365, ppl=1318.59, wps=9.4, ups=0.04, wpb=238, bsz=8, num_updates=2200, lr=2.2e-05, gnorm=3.906, train_wall=10, wall=664
RAWLOSS @ 2300 tensor(1072.0686, device='xla:1')
2020-07-20 19:35:41 | INFO | train_inner | epoch 001: 2300 / 648283 loss=9.915, ppl=965.1, wps=6, ups=0.04, wpb=156, bsz=8, num_updates=2300, lr=2.3e-05, gnorm=4.58, train_wall=11, wall=690
RAWLOSS @ 2400 tensor(1801.0403, device='xla:1')
2020-07-20 19:36:06 | INFO | train_inner | epoch 001: 2400 / 648283 loss=10.352, ppl=1306.96, wps=9.9, ups=0.04, wpb=251, bsz=8, num_updates=2400, lr=2.4e-05, gnorm=3.878, train_wall=10, wall=716
RAWLOSS @ 2500 tensor(1392.2241, device='xla:1')
2020-07-20 19:36:31 | INFO | train_inner | epoch 001: 2500 / 648283 loss=9.299, ppl=629.85, wps=8.6, ups=0.04, wpb=216, bsz=8, num_updates=2500, lr=2.5e-05, gnorm=4.098, train_wall=10, wall=741
RAWLOSS @ 2600 tensor(1319.8892, device='xla:1')
2020-07-20 19:36:56 | INFO | train_inner | epoch 001: 2600 / 648283 loss=9.918, ppl=967.22, wps=7.6, ups=0.04, wpb=192, bsz=8, num_updates=2600, lr=2.6e-05, gnorm=4.08, train_wall=10, wall=766
RAWLOSS @ 2700 tensor(1346.2131, device='xla:1')
2020-07-20 19:37:21 | INFO | train_inner | epoch 001: 2700 / 648283 loss=9.428, ppl=688.84, wps=8.2, ups=0.04, wpb=206, bsz=8, num_updates=2700, lr=2.7e-05, gnorm=3.947, train_wall=10, wall=791
RAWLOSS @ 2800 tensor(1433.2075, device='xla:1')
2020-07-20 19:37:47 | INFO | train_inner | epoch 001: 2800 / 648283 loss=10.658, ppl=1615.93, wps=7.7, ups=0.04, wpb=194, bsz=8, num_updates=2800, lr=2.8e-05, gnorm=4.144, train_wall=10, wall=816
RAWLOSS @ 2900 tensor(1693.4897, device='xla:1')
2020-07-20 19:38:12 | INFO | train_inner | epoch 001: 2900 / 648283 loss=10.623, ppl=1576.56, wps=9.2, ups=0.04, wpb=230, bsz=8, num_updates=2900, lr=2.9e-05, gnorm=3.753, train_wall=10, wall=841
RAWLOSS @ 3000 tensor(867.6174, device='xla:1')
2020-07-20 19:38:37 | INFO | train_inner | epoch 001: 3000 / 648283 loss=9.856, ppl=926.7, wps=5, ups=0.04, wpb=127, bsz=8, num_updates=3000, lr=3e-05, gnorm=5.171, train_wall=10, wall=867
RAWLOSS @ 3100 tensor(1705.9694, device='xla:1')
2020-07-20 19:39:02 | INFO | train_inner | epoch 001: 3100 / 648283 loss=9.924, ppl=971.57, wps=9.9, ups=0.04, wpb=248, bsz=8, num_updates=3100, lr=3.1e-05, gnorm=3.709, train_wall=10, wall=892
RAWLOSS @ 3200 tensor(1676.2924, device='xla:1')
2020-07-20 19:39:28 | INFO | train_inner | epoch 001: 3200 / 648283 loss=9.521, ppl=734.78, wps=9.7, ups=0.04, wpb=254, bsz=8, num_updates=3200, lr=3.2e-05, gnorm=3.369, train_wall=11, wall=918
RAWLOSS @ 3300 tensor(1001.3185, device='xla:1')
2020-07-20 19:39:53 | INFO | train_inner | epoch 001: 3300 / 648283 loss=9.567, ppl=758.43, wps=6, ups=0.04, wpb=151, bsz=8, num_updates=3300, lr=3.3e-05, gnorm=4.485, train_wall=10, wall=943
RAWLOSS @ 3400 tensor(1667.1111, device='xla:1')
2020-07-20 19:40:19 | INFO | train_inner | epoch 001: 3400 / 648283 loss=10.932, ppl=1954.28, wps=8.7, ups=0.04, wpb=220, bsz=8, num_updates=3400, lr=3.4e-05, gnorm=4.179, train_wall=10, wall=968
RAWLOSS @ 3500 tensor(1426.0515, device='xla:1')
2020-07-20 19:40:44 | INFO | train_inner | epoch 001: 3500 / 648283 loss=9.437, ppl=693.34, wps=8.7, ups=0.04, wpb=218, bsz=8, num_updates=3500, lr=3.5e-05, gnorm=4.225, train_wall=10, wall=994
RAWLOSS @ 3600 tensor(1634.5762, device='xla:1')
2020-07-20 19:41:09 | INFO | train_inner | epoch 001: 3600 / 648283 loss=8.933, ppl=488.61, wps=10.5, ups=0.04, wpb=264, bsz=8, num_updates=3600, lr=3.6e-05, gnorm=3.123, train_wall=10, wall=1019
RAWLOSS @ 3700 tensor(1552.2892, device='xla:1')
2020-07-20 19:41:34 | INFO | train_inner | epoch 001: 3700 / 648283 loss=8.782, ppl=440.28, wps=10.1, ups=0.04, wpb=255, bsz=8, num_updates=3700, lr=3.7e-05, gnorm=3.398, train_wall=10, wall=1044
RAWLOSS @ 3800 tensor(1543.8973, device='xla:1')
2020-07-20 19:41:59 | INFO | train_inner | epoch 001: 3800 / 648283 loss=9.899, ppl=955.05, wps=9, ups=0.04, wpb=225, bsz=8, num_updates=3800, lr=3.8e-05, gnorm=3.949, train_wall=10, wall=1069
RAWLOSS @ 3900 tensor(1488.0979, device='xla:1')
2020-07-20 19:42:25 | INFO | train_inner | epoch 001: 3900 / 648283 loss=8.763, ppl=434.36, wps=9.7, ups=0.04, wpb=245, bsz=8, num_updates=3900, lr=3.9e-05, gnorm=3.304, train_wall=10, wall=1094
RAWLOSS @ 4000 tensor(1680.7052, device='xla:1')
2020-07-20 19:42:50 | INFO | train_inner | epoch 001: 4000 / 648283 loss=9.22, ppl=596.16, wps=10.4, ups=0.04, wpb=263, bsz=8, num_updates=4000, lr=4e-05, gnorm=3.169, train_wall=10, wall=1120
RAWLOSS @ 4100 tensor(1941.4369, device='xla:1')
2020-07-20 19:43:15 | INFO | train_inner | epoch 001: 4100 / 648283 loss=9.692, ppl=826.97, wps=11.4, ups=0.04, wpb=289, bsz=8, num_updates=4100, lr=4.1e-05, gnorm=3.085, train_wall=10, wall=1145
RAWLOSS @ 4200 tensor(1739.0806, device='xla:1')
2020-07-20 19:43:41 | INFO | train_inner | epoch 001: 4200 / 648283 loss=10.411, ppl=1361.17, wps=9.2, ups=0.04, wpb=241, bsz=8, num_updates=4200, lr=4.2e-05, gnorm=4.04, train_wall=11, wall=1171
RAWLOSS @ 4300 tensor(1847.4691, device='xla:1')
2020-07-20 19:44:07 | INFO | train_inner | epoch 001: 4300 / 648283 loss=10.577, ppl=1527.25, wps=9.9, ups=0.04, wpb=252, bsz=8, num_updates=4300, lr=4.3e-05, gnorm=4.633, train_wall=10, wall=1197
RAWLOSS @ 4400 tensor(2160.4172, device='xla:1')
2020-07-20 19:44:32 | INFO | train_inner | epoch 001: 4400 / 648283 loss=10.054, ppl=1063.25, wps=12.2, ups=0.04, wpb=310, bsz=8, num_updates=4400, lr=4.4e-05, gnorm=2.967, train_wall=10, wall=1222
RAWLOSS @ 4500 tensor(1743.6643, device='xla:1')
2020-07-20 19:44:58 | INFO | train_inner | epoch 001: 4500 / 648283 loss=9.713, ppl=839.07, wps=10.2, ups=0.04, wpb=259, bsz=8, num_updates=4500, lr=4.5e-05, gnorm=3.415, train_wall=10, wall=1248
RAWLOSS @ 4600 tensor(1013.4943, device='xla:1')
2020-07-20 19:45:23 | INFO | train_inner | epoch 001: 4600 / 648283 loss=8.308, ppl=316.87, wps=6.9, ups=0.04, wpb=176, bsz=8, num_updates=4600, lr=4.6e-05, gnorm=3.738, train_wall=10, wall=1273
RAWLOSS @ 4700 tensor(1215.8892, device='xla:1')
2020-07-20 19:45:49 | INFO | train_inner | epoch 001: 4700 / 648283 loss=8.274, ppl=309.61, wps=8.3, ups=0.04, wpb=212, bsz=8, num_updates=4700, lr=4.7e-05, gnorm=3.482, train_wall=10, wall=1298
RAWLOSS @ 4800 tensor(1687.0696, device='xla:1')
2020-07-20 19:46:14 | INFO | train_inner | epoch 001: 4800 / 648283 loss=8.883, ppl=472.1, wps=10.8, ups=0.04, wpb=274, bsz=8, num_updates=4800, lr=4.8e-05, gnorm=3.139, train_wall=10, wall=1324
RAWLOSS @ 4900 tensor(1267.5745, device='xla:1')
2020-07-20 19:46:40 | INFO | train_inner | epoch 001: 4900 / 648283 loss=9.283, ppl=622.9, wps=7.5, ups=0.04, wpb=197, bsz=8, num_updates=4900, lr=4.9e-05, gnorm=3.492, train_wall=11, wall=1350
RAWLOSS @ 5000 tensor(1933.2743, device='xla:1')
2020-07-20 19:47:05 | INFO | train_inner | epoch 001: 5000 / 648283 loss=10.033, ppl=1047.57, wps=11, ups=0.04, wpb=278, bsz=8, num_updates=5000, lr=5e-05, gnorm=3.293, train_wall=10, wall=1375
RAWLOSS @ 5100 tensor(1389.2589, device='xla:1')
RAWLOSS @ 5200 tensor(1521.3794, device='xla:1')
RAWLOSS @ 5300 tensor(985.2541, device='xla:1')
RAWLOSS @ 5400 tensor(1239.3900, device='xla:1')
RAWLOSS @ 5500 tensor(1070.6313, device='xla:1')
RAWLOSS @ 5600 tensor(2136.5396, device='xla:1')
RAWLOSS @ 5700 tensor(1710.4930, device='xla:1')
RAWLOSS @ 5800 tensor(1422.4799, device='xla:1')
RAWLOSS @ 5900 tensor(1369.4492, device='xla:1')
RAWLOSS @ 6000 tensor(1244.3560, device='xla:1')
RAWLOSS @ 6100 tensor(2512.4678, device='xla:1')
RAWLOSS @ 6200 tensor(1890.8009, device='xla:1')
RAWLOSS @ 6300 tensor(1129.4983, device='xla:1')
RAWLOSS @ 6400 tensor(961.3102, device='xla:1')
RAWLOSS @ 6500 tensor(1826.2983, device='xla:1')
RAWLOSS @ 6600 tensor(669.0168, device='xla:1')
RAWLOSS @ 6700 tensor(1369.3162, device='xla:1')
RAWLOSS @ 6800 tensor(2058.0190, device='xla:1')
RAWLOSS @ 6900 tensor(1416.5652, device='xla:1')
RAWLOSS @ 7000 tensor(1297.1597, device='xla:1')
RAWLOSS @ 7100 tensor(1532.3556, device='xla:1')
RAWLOSS @ 7200 tensor(1194.9662, device='xla:1')
RAWLOSS @ 7300 tensor(1631.6942, device='xla:1')
RAWLOSS @ 7400 tensor(1123.8306, device='xla:1')
RAWLOSS @ 7500 tensor(1551.5527, device='xla:1')
RAWLOSS @ 7600 tensor(1701.3202, device='xla:1')
RAWLOSS @ 7700 tensor(1108.4896, device='xla:1')
RAWLOSS @ 7800 tensor(1451.0496, device='xla:1')
RAWLOSS @ 7900 tensor(1845.2185, device='xla:1')
RAWLOSS @ 8000 tensor(1752.1614, device='xla:1')
RAWLOSS @ 8100 tensor(1533.9989, device='xla:1')
RAWLOSS @ 8200 tensor(1526.4739, device='xla:1')
RAWLOSS @ 8300 tensor(2234.0649, device='xla:1')
RAWLOSS @ 8400 tensor(1127.3406, device='xla:1')
RAWLOSS @ 8500 tensor(1681.1058, device='xla:1')
RAWLOSS @ 8600 tensor(1827.9142, device='xla:1')
RAWLOSS @ 8700 tensor(1644.2002, device='xla:1')
RAWLOSS @ 8800 tensor(1690.3883, device='xla:1')
RAWLOSS @ 8900 tensor(1959.0852, device='xla:1')
RAWLOSS @ 9000 tensor(1217.4624, device='xla:1')
RAWLOSS @ 9100 tensor(1144.5541, device='xla:1')
RAWLOSS @ 9200 tensor(1397.6129, device='xla:1')
RAWLOSS @ 9300 tensor(1250.4963, device='xla:1')
RAWLOSS @ 9400 tensor(1405.1880, device='xla:1')
RAWLOSS @ 9500 tensor(938.4954, device='xla:1')
RAWLOSS @ 9600 tensor(1413.3824, device='xla:1')
RAWLOSS @ 9700 tensor(1633.5691, device='xla:1')
RAWLOSS @ 9800 tensor(2035.8236, device='xla:1')
RAWLOSS @ 9900 tensor(1467.5835, device='xla:1')
RAWLOSS @ 10000 tensor(1565.9857, device='xla:1')
RAWLOSS @ 10100 tensor(1274.9399, device='xla:1')
RAWLOSS @ 10200 tensor(1482.6823, device='xla:1')
RAWLOSS @ 10300 tensor(1025.6306, device='xla:1')
RAWLOSS @ 10400 tensor(1639.2822, device='xla:1')
RAWLOSS @ 10500 tensor(1739.4558, device='xla:1')
RAWLOSS @ 10600 tensor(1478.3591, device='xla:1')
RAWLOSS @ 10700 tensor(1441.7552, device='xla:1')
RAWLOSS @ 10800 tensor(1178.0557, device='xla:1')
RAWLOSS @ 10900 tensor(1758.1354, device='xla:1')
RAWLOSS @ 11000 tensor(2277.9707, device='xla:1')
RAWLOSS @ 11100 tensor(1958.7487, device='xla:1')
RAWLOSS @ 11200 tensor(1572.9771, device='xla:1')
RAWLOSS @ 11300 tensor(1671.0051, device='xla:1')
RAWLOSS @ 11400 tensor(1405.1251, device='xla:1')
RAWLOSS @ 11500 tensor(2001.3026, device='xla:1')
2020-07-20 19:54:05 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 9.119 | ppl 555.92 | wps 3697.2 | wpb 236.3 | bsz 8 | num_updates 5000