Gist: taylanbil/bcdbe27ea22f4667015a2e135b74852a
RAWLOSS @ 100 tensor(2970.8003, device='xla:0')
RAWLOSS @ 100 tensor(1757.3165, device='xla:0')
RAWLOSS @ 100 tensor(1953.1198, device='xla:0')
RAWLOSS @ 100 tensor(3101.5469, device='xla:0')
RAWLOSS @ 100 tensor(3934.1355, device='xla:0')
RAWLOSS @ 100 tensor(2359.2961, device='xla:0')
RAWLOSS @ 100 tensor(2797.4104, device='xla:1')
RAWLOSS @ 100 tensor(2044.7153, device='xla:0')
2020-07-20 20:33:53 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-20 20:33:53 | INFO | train_inner | epoch 001: 100 / 81036 loss=13.799, ppl=14254.9, wps=0, ups=0, wpb=2187, bsz=64, num_updates=100, lr=1e-06, gnorm=3.694, train_wall=14, wall=95
2020-07-20 20:33:53 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
RAWLOSS @ 200 tensor(2427.6775, device='xla:0')
RAWLOSS @ 200 tensor(1442.3955, device='xla:0')
RAWLOSS @ 200 tensor(1955.6888, device='xla:1')
RAWLOSS @ 200 tensor(1564.5947, device='xla:0')
RAWLOSS @ 200 tensor(2397.6733, device='xla:0')
RAWLOSS @ 200 tensor(2411.3735, device='xla:0')
RAWLOSS @ 200 tensor(1628.1062, device='xla:0')
RAWLOSS @ 200 tensor(1908.8589, device='xla:0')
2020-07-20 20:34:27 | INFO | train_inner | epoch 001: 200 / 81036 loss=12.783, ppl=7048.47, wps=52.4, ups=0.03, wpb=1776, bsz=64, num_updates=200, lr=2e-06, gnorm=2.183, train_wall=11, wall=129
2020-07-20 20:34:27 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
RAWLOSS @ 300 tensor(1671.6956, device='xla:1')
RAWLOSS @ 300 tensor(2032.6129, device='xla:0')
RAWLOSS @ 300 tensor(2331.8958, device='xla:0')
RAWLOSS @ 300 tensor(1358.9934, device='xla:0')
RAWLOSS @ 300 tensor(1498.9780, device='xla:0')
RAWLOSS @ 300 tensor(2165.1125, device='xla:0')
RAWLOSS @ 300 tensor(1357.4266, device='xla:0')
RAWLOSS @ 300 tensor(2041.4008, device='xla:0')
2020-07-20 20:35:02 | INFO | train_inner | epoch 001: 300 / 81036 loss=12.357, ppl=5246.05, wps=49, ups=0.03, wpb=1688, bsz=64, num_updates=300, lr=3e-06, gnorm=1.994, train_wall=11, wall=164
2020-07-20 20:35:02 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
RAWLOSS @ 400 tensor(1330.5079, device='xla:0')
RAWLOSS @ 400 tensor(2962.9514, device='xla:0')
RAWLOSS @ 400 tensor(2582.0928, device='xla:0')
RAWLOSS @ 400 tensor(1621.3994, device='xla:0')
RAWLOSS @ 400 tensor(2639.0959, device='xla:0')
RAWLOSS @ 400 tensor(2012.1141, device='xla:0')
RAWLOSS @ 400 tensor(1408.4937, device='xla:1')
RAWLOSS @ 400 tensor(2422.6812, device='xla:0')
2020-07-20 20:35:36 | INFO | train_inner | epoch 001: 400 / 81036 loss=11.961, ppl=3986.59, wps=60.2, ups=0.03, wpb=2048, bsz=64, num_updates=400, lr=4e-06, gnorm=1.717, train_wall=12, wall=198
RAWLOSS @ 500 tensor(1421.8625, device='xla:0')
RAWLOSS @ 500 tensor(2101.3054, device='xla:0')
RAWLOSS @ 500 tensor(1190.1812, device='xla:0')
RAWLOSS @ 500 tensor(1715.4381, device='xla:0')
RAWLOSS @ 500 tensor(1854.4866, device='xla:0')
RAWLOSS @ 500 tensor(1990.5693, device='xla:0')
RAWLOSS @ 500 tensor(2170.2637, device='xla:0')
RAWLOSS @ 500 tensor(1837.7633, device='xla:1')
2020-07-20 20:36:10 | INFO | train_inner | epoch 001: 500 / 81036 loss=11.647, ppl=3208.03, wps=50.9, ups=0.03, wpb=1769, bsz=64, num_updates=500, lr=5e-06, gnorm=1.603, train_wall=12, wall=232
RAWLOSS @ 600 tensor(1572.3944, device='xla:0')
RAWLOSS @ 600 tensor(1736.2310, device='xla:0')
RAWLOSS @ 600 tensor(2109.1536, device='xla:0')
RAWLOSS @ 600 tensor(1393.9427, device='xla:1')
RAWLOSS @ 600 tensor(1931.5094, device='xla:0')
RAWLOSS @ 600 tensor(1741.2776, device='xla:0')
RAWLOSS @ 600 tensor(1575.5127, device='xla:0')
RAWLOSS @ 600 tensor(1897.7202, device='xla:0')
2020-07-20 20:36:45 | INFO | train_inner | epoch 001: 600 / 81036 loss=11.338, ppl=2589.15, wps=51.8, ups=0.03, wpb=1776, bsz=64, num_updates=600, lr=6e-06, gnorm=1.797, train_wall=12, wall=267
RAWLOSS @ 700 tensor(1785.8511, device='xla:0')
RAWLOSS @ 700 tensor(2110.7805, device='xla:0')
RAWLOSS @ 700 tensor(2514.1729, device='xla:0')
RAWLOSS @ 700 tensor(1954.3976, device='xla:1')
RAWLOSS @ 700 tensor(1459.2474, device='xla:0')
RAWLOSS @ 700 tensor(1131.9515, device='xla:0')
RAWLOSS @ 700 tensor(2078.4407, device='xla:0')
RAWLOSS @ 700 tensor(1843.8748, device='xla:0')
2020-07-20 20:37:19 | INFO | train_inner | epoch 001: 700 / 81036 loss=11.002, ppl=2051.23, wps=57.6, ups=0.03, wpb=1951, bsz=64, num_updates=700, lr=7e-06, gnorm=1.622, train_wall=11, wall=301
RAWLOSS @ 800 tensor(2222.1287, device='xla:0')
RAWLOSS @ 800 tensor(1078.9088, device='xla:0')
RAWLOSS @ 800 tensor(2100.1682, device='xla:1')
RAWLOSS @ 800 tensor(2315.2129, device='xla:0')
RAWLOSS @ 800 tensor(1601.1167, device='xla:0')
RAWLOSS @ 800 tensor(1832.7350, device='xla:0')
RAWLOSS @ 800 tensor(1353.9714, device='xla:0')
RAWLOSS @ 800 tensor(1422.4886, device='xla:0')
2020-07-20 20:37:52 | INFO | train_inner | epoch 001: 800 / 81036 loss=10.569, ppl=1519.29, wps=56.1, ups=0.03, wpb=1901, bsz=64, num_updates=800, lr=8e-06, gnorm=1.794, train_wall=12, wall=335
RAWLOSS @ 900 tensor(2001.3239, device='xla:0')
RAWLOSS @ 900 tensor(1435.3341, device='xla:0')
RAWLOSS @ 900 tensor(2326.0073, device='xla:0')
RAWLOSS @ 900 tensor(1358.6759, device='xla:0')
RAWLOSS @ 900 tensor(1670.4648, device='xla:0')
RAWLOSS @ 900 tensor(1583.6451, device='xla:0')
RAWLOSS @ 900 tensor(1365.2646, device='xla:0')
RAWLOSS @ 900 tensor(1394.8363, device='xla:1')
2020-07-20 20:38:27 | INFO | train_inner | epoch 001: 900 / 81036 loss=10.569, ppl=1519.32, wps=52.2, ups=0.03, wpb=1793, bsz=64, num_updates=900, lr=9e-06, gnorm=1.902, train_wall=12, wall=369
RAWLOSS @ 1000 tensor(1818.5525, device='xla:0')
RAWLOSS @ 1000 tensor(1990.1970, device='xla:0')
RAWLOSS @ 1000 tensor(1839.7052, device='xla:0')
RAWLOSS @ 1000 tensor(1071.9586, device='xla:1')
RAWLOSS @ 1000 tensor(1533.0587, device='xla:0')
RAWLOSS @ 1000 tensor(1177.7563, device='xla:0')
RAWLOSS @ 1000 tensor(1397.5411, device='xla:0')
RAWLOSS @ 1000 tensor(1775.6034, device='xla:0')
2020-07-20 20:39:01 | INFO | train_inner | epoch 001: 1000 / 81036 loss=10.262, ppl=1227.92, wps=52, ups=0.03, wpb=1772, bsz=64, num_updates=1000, lr=1e-05, gnorm=1.954, train_wall=12, wall=403
RAWLOSS @ 1100 tensor(1649.9690, device='xla:0')
RAWLOSS @ 1100 tensor(1729.4590, device='xla:0')
RAWLOSS @ 1100 tensor(1744.8107, device='xla:0')
RAWLOSS @ 1100 tensor(1423.1836, device='xla:0')
RAWLOSS @ 1100 tensor(1465.9176, device='xla:0')
RAWLOSS @ 1100 tensor(1746.0730, device='xla:0')
RAWLOSS @ 1100 tensor(1902.0587, device='xla:1')
RAWLOSS @ 1100 tensor(1854.7543, device='xla:0')
RAWLOSS @ 1200 tensor(1120.9475, device='xla:0')
RAWLOSS @ 1200 tensor(2651.9900, device='xla:0')
RAWLOSS @ 1200 tensor(1851.4359, device='xla:0')
RAWLOSS @ 1200 tensor(1922.0718, device='xla:0')
RAWLOSS @ 1200 tensor(787.5982, device='xla:0')
RAWLOSS @ 1200 tensor(1496.5520, device='xla:1')
RAWLOSS @ 1200 tensor(2067.7168, device='xla:0')
RAWLOSS @ 1200 tensor(1216.9918, device='xla:0')
RAWLOSS @ 1300 tensor(1950.9186, device='xla:0')
RAWLOSS @ 1300 tensor(1934.1382, device='xla:0')
RAWLOSS @ 1300 tensor(1986.0408, device='xla:0')
RAWLOSS @ 1300 tensor(1266.9921, device='xla:0')
RAWLOSS @ 1300 tensor(1555.1661, device='xla:0')
RAWLOSS @ 1300 tensor(1688.6029, device='xla:0')
RAWLOSS @ 1300 tensor(3904.7627, device='xla:1')
RAWLOSS @ 1300 tensor(1987.5448, device='xla:0')
RAWLOSS @ 1400 tensor(2353.2351, device='xla:0')
RAWLOSS @ 1400 tensor(1423.8197, device='xla:0')
RAWLOSS @ 1400 tensor(1797.4116, device='xla:0')
RAWLOSS @ 1400 tensor(1382.4547, device='xla:0')
RAWLOSS @ 1400 tensor(1763.2184, device='xla:1')
RAWLOSS @ 1400 tensor(1751.9918, device='xla:0')
RAWLOSS @ 1400 tensor(1663.2499, device='xla:0')
RAWLOSS @ 1400 tensor(2144.2837, device='xla:0')
RAWLOSS @ 1500 tensor(1556.6667, device='xla:0')
RAWLOSS @ 1500 tensor(1770.3358, device='xla:1')
RAWLOSS @ 1500 tensor(1429.5319, device='xla:0')
RAWLOSS @ 1500 tensor(1411.8625, device='xla:0')
RAWLOSS @ 1500 tensor(1295.8887, device='xla:0')
RAWLOSS @ 1500 tensor(1495.4001, device='xla:0')
RAWLOSS @ 1500 tensor(1589.7419, device='xla:0')
RAWLOSS @ 1500 tensor(1606.0579, device='xla:0')
RAWLOSS @ 1600 tensor(1043.6141, device='xla:0')
RAWLOSS @ 1600 tensor(1415.6194, device='xla:0')
RAWLOSS @ 1600 tensor(2111.9822, device='xla:1')
RAWLOSS @ 1600 tensor(1848.3516, device='xla:0')
RAWLOSS @ 1600 tensor(1574.2125, device='xla:0')
RAWLOSS @ 1600 tensor(2576.0010, device='xla:0')
RAWLOSS @ 1600 tensor(1435.9971, device='xla:0')
RAWLOSS @ 1600 tensor(2283.5654, device='xla:0')
RAWLOSS @ 1700 tensor(1595.3772, device='xla:0')
RAWLOSS @ 1700 tensor(1805.8171, device='xla:1')
RAWLOSS @ 1700 tensor(1852.5198, device='xla:0')
RAWLOSS @ 1700 tensor(1582.0444, device='xla:0')
RAWLOSS @ 1700 tensor(2197.1626, device='xla:0')
RAWLOSS @ 1700 tensor(1510.3783, device='xla:0')
RAWLOSS @ 1700 tensor(1680.5950, device='xla:0')
RAWLOSS @ 1700 tensor(1389.6589, device='xla:0')
RAWLOSS @ 1800 tensor(1386.3527, device='xla:0')
RAWLOSS @ 1800 tensor(1477.1746, device='xla:0')
RAWLOSS @ 1800 tensor(1316.0349, device='xla:0')
RAWLOSS @ 1800 tensor(1313.9607, device='xla:1')
RAWLOSS @ 1800 tensor(1534.0713, device='xla:0')
RAWLOSS @ 1800 tensor(2442.3997, device='xla:0')
RAWLOSS @ 1800 tensor(2022.6328, device='xla:0')
RAWLOSS @ 1800 tensor(1580.3710, device='xla:0')
2020-07-20 20:40:39 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 10.362 | ppl 1316 | wps 16156.2 | wpb 1889.5 | bsz 64 | num_updates 1000
2020-07-20 20:41:22 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints/checkpoint_1_1000.pt (epoch 1 @ 1000 updates, score 10.362) (writing took 42.88991188723594 seconds)
2020-07-20 20:41:22 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
RAWLOSS @ 1900 tensor(1409.9108, device='xla:0')
RAWLOSS @ 1900 tensor(2181.8125, device='xla:1')
RAWLOSS @ 1900 tensor(976.1409, device='xla:0')
RAWLOSS @ 1900 tensor(1605.6019, device='xla:0')
RAWLOSS @ 1900 tensor(950.1663, device='xla:0')
RAWLOSS @ 1900 tensor(1500.7196, device='xla:0')
RAWLOSS @ 1900 tensor(1447.4484, device='xla:0')
RAWLOSS @ 1900 tensor(1580.1951, device='xla:0')
2020-07-20 20:41:56 | INFO | train_inner | epoch 001: 1100 / 81036 loss=10.11, ppl=1105.21, wps=11.1, ups=0.01, wpb=1935, bsz=64, num_updates=1100, lr=1.1e-05, gnorm=1.91, train_wall=12, wall=578
RAWLOSS @ 2000 tensor(1711.6940, device='xla:0')
RAWLOSS @ 2000 tensor(1366.3888, device='xla:0')
RAWLOSS @ 2000 tensor(1673.8571, device='xla:0')
RAWLOSS @ 2000 tensor(1178.6492, device='xla:0')
RAWLOSS @ 2000 tensor(1122.1761, device='xla:1')
RAWLOSS @ 2000 tensor(1364.6173, device='xla:0')
RAWLOSS @ 2000 tensor(1503.6803, device='xla:0')
RAWLOSS @ 2000 tensor(1768.3527, device='xla:0')
2020-07-20 20:42:30 | INFO | train_inner | epoch 001: 1200 / 81036 loss=10.348, ppl=1303.44, wps=53.1, ups=0.03, wpb=1817, bsz=64, num_updates=1200, lr=1.2e-05, gnorm=1.701, train_wall=12, wall=612
RAWLOSS @ 2100 tensor(1645.3506, device='xla:0')
RAWLOSS @ 2100 tensor(1504.4690, device='xla:0')
RAWLOSS @ 2100 tensor(1770.7802, device='xla:0')
RAWLOSS @ 2100 tensor(1804.2430, device='xla:0')
RAWLOSS @ 2100 tensor(1650.1572, device='xla:0')
RAWLOSS @ 2100 tensor(1712.3865, device='xla:1')
RAWLOSS @ 2100 tensor(1658.8093, device='xla:0')
RAWLOSS @ 2100 tensor(2189.2732, device='xla:0')
2020-07-20 20:43:04 | INFO | train_inner | epoch 001: 1300 / 81036 loss=10.385, ppl=1337.29, wps=51.3, ups=0.03, wpb=1757, bsz=64, num_updates=1300, lr=1.3e-05, gnorm=1.967, train_wall=12, wall=646
RAWLOSS @ 2200 tensor(741.3139, device='xla:1')
RAWLOSS @ 2200 tensor(1397.9999, device='xla:0')
RAWLOSS @ 2200 tensor(1306.1820, device='xla:0')
RAWLOSS @ 2200 tensor(1816.5071, device='xla:0')
RAWLOSS @ 2200 tensor(1712.6990, device='xla:0')
RAWLOSS @ 2200 tensor(1625.7528, device='xla:0')
RAWLOSS @ 2200 tensor(1509.3036, device='xla:0')
RAWLOSS @ 2200 tensor(1460.7605, device='xla:0')
2020-07-20 20:43:39 | INFO | train_inner | epoch 001: 1400 / 81036 loss=9.436, ppl=692.77, wps=53.9, ups=0.03, wpb=1846, bsz=64, num_updates=1400, lr=1.4e-05, gnorm=2.271, train_wall=11, wall=681
RAWLOSS @ 2300 tensor(1256.1327, device='xla:0')
RAWLOSS @ 2300 tensor(1919.0973, device='xla:0')
RAWLOSS @ 2300 tensor(1033.0536, device='xla:0')
RAWLOSS @ 2300 tensor(1770.2001, device='xla:0')
RAWLOSS @ 2300 tensor(1356.5675, device='xla:0')
RAWLOSS @ 2300 tensor(1281.6326, device='xla:0')
RAWLOSS @ 2300 tensor(1896.7076, device='xla:1')
RAWLOSS @ 2300 tensor(2044.0779, device='xla:0')
2020-07-20 20:44:14 | INFO | train_inner | epoch 001: 1500 / 81036 loss=9.944, ppl=985.28, wps=56.6, ups=0.03, wpb=2018, bsz=64, num_updates=1500, lr=1.5e-05, gnorm=2.032, train_wall=12, wall=716
RAWLOSS @ 2400 tensor(1075.7241, device='xla:0')
RAWLOSS @ 2400 tensor(1548.3057, device='xla:0')
RAWLOSS @ 2400 tensor(890.2056, device='xla:0')
RAWLOSS @ 2400 tensor(2199.2112, device='xla:1')
RAWLOSS @ 2400 tensor(1585.5078, device='xla:0')
RAWLOSS @ 2400 tensor(1897.6166, device='xla:0')
RAWLOSS @ 2400 tensor(2349.6912, device='xla:0')
RAWLOSS @ 2400 tensor(1568.6990, device='xla:0')
2020-07-20 20:44:49 | INFO | train_inner | epoch 001: 1600 / 81036 loss=9.346, ppl=650.73, wps=57.3, ups=0.03, wpb=2016, bsz=64, num_updates=1600, lr=1.6e-05, gnorm=2.237, train_wall=13, wall=752
RAWLOSS @ 2500 tensor(962.3625, device='xla:0')
RAWLOSS @ 2500 tensor(1415.0819, device='xla:1')
RAWLOSS @ 2500 tensor(1357.3680, device='xla:0')
RAWLOSS @ 2500 tensor(1532.6869, device='xla:0')
RAWLOSS @ 2500 tensor(1903.3213, device='xla:0')
RAWLOSS @ 2500 tensor(1041.7490, device='xla:0')
RAWLOSS @ 2500 tensor(1649.4910, device='xla:0')
RAWLOSS @ 2500 tensor(1362.5601, device='xla:0')
2020-07-20 20:45:23 | INFO | train_inner | epoch 001: 1700 / 81036 loss=9.59, ppl=770.76, wps=52.8, ups=0.03, wpb=1793, bsz=64, num_updates=1700, lr=1.7e-05, gnorm=2.214, train_wall=12, wall=786
RAWLOSS @ 2600 tensor(1433.6476, device='xla:0')
RAWLOSS @ 2600 tensor(1117.7612, device='xla:0')
RAWLOSS @ 2600 tensor(861.6875, device='xla:0')
RAWLOSS @ 2600 tensor(1551.6271, device='xla:0')
RAWLOSS @ 2600 tensor(1738.4299, device='xla:1')
RAWLOSS @ 2600 tensor(2004.2317, device='xla:0')
RAWLOSS @ 2600 tensor(1511.3901, device='xla:0')
RAWLOSS @ 2600 tensor(1881.3784, device='xla:0')
2020-07-20 20:45:57 | INFO | train_inner | epoch 001: 1800 / 81036 loss=9.22, ppl=596.38, wps=61.4, ups=0.03, wpb=2089, bsz=64, num_updates=1800, lr=1.8e-05, gnorm=2.869, train_wall=12, wall=820
RAWLOSS @ 2700 tensor(1764.5472, device='xla:0')
RAWLOSS @ 2700 tensor(1887.2849, device='xla:0')
RAWLOSS @ 2700 tensor(1853.2770, device='xla:0')
RAWLOSS @ 2700 tensor(1354.0685, device='xla:1')
RAWLOSS @ 2700 tensor(1929.4672, device='xla:0')
RAWLOSS @ 2700 tensor(968.1652, device='xla:0')
RAWLOSS @ 2700 tensor(1323.0848, device='xla:0')
RAWLOSS @ 2700 tensor(1847.4006, device='xla:0')
2020-07-20 20:46:32 | INFO | train_inner | epoch 001: 1900 / 81036 loss=9.691, ppl=826.7, wps=60.1, ups=0.03, wpb=2059, bsz=64, num_updates=1900, lr=1.9e-05, gnorm=2.261, train_wall=11, wall=854
RAWLOSS @ 2800 tensor(1179.4827, device='xla:0')
RAWLOSS @ 2800 tensor(1810.1344, device='xla:0')
RAWLOSS @ 2800 tensor(1247.4254, device='xla:0')
RAWLOSS @ 2800 tensor(1511.3344, device='xla:0')
RAWLOSS @ 2800 tensor(1387.0867, device='xla:0')
RAWLOSS @ 2800 tensor(1970.2271, device='xla:0')
RAWLOSS @ 2800 tensor(1260.4536, device='xla:0')
RAWLOSS @ 2800 tensor(1428.8004, device='xla:1')
2020-07-20 20:47:06 | INFO | train_inner | epoch 001: 2000 / 81036 loss=8.914, ppl=482.34, wps=57.1, ups=0.03, wpb=1949, bsz=64, num_updates=2000, lr=2e-05, gnorm=1.946, train_wall=12, wall=888
RAWLOSS @ 2900 tensor(2130.4285, device='xla:0')
RAWLOSS @ 2900 tensor(1291.1876, device='xla:0')
RAWLOSS @ 2900 tensor(1186.7540, device='xla:0')
RAWLOSS @ 2900 tensor(881.2741, device='xla:0')
RAWLOSS @ 2900 tensor(1175.3256, device='xla:0')
RAWLOSS @ 2900 tensor(1722.8356, device='xla:0')
RAWLOSS @ 2900 tensor(683.1208, device='xla:1')
RAWLOSS @ 2900 tensor(1670.6589, device='xla:0')
RAWLOSS @ 3000 tensor(2093.8652, device='xla:0')
RAWLOSS @ 3000 tensor(1373.8485, device='xla:0')
RAWLOSS @ 3000 tensor(1307.9662, device='xla:0')
RAWLOSS @ 3000 tensor(1185.1688, device='xla:0')
RAWLOSS @ 3000 tensor(1613.1737, device='xla:0')
RAWLOSS @ 3000 tensor(1396.5134, device='xla:1')
RAWLOSS @ 3000 tensor(1617.2462, device='xla:0')
RAWLOSS @ 3000 tensor(1601.1096, device='xla:0')
RAWLOSS @ 3100 tensor(1415.2422, device='xla:0')
RAWLOSS @ 3100 tensor(1636.2690, device='xla:0')
RAWLOSS @ 3100 tensor(1290.3104, device='xla:1')
RAWLOSS @ 3100 tensor(1212.2844, device='xla:0')
RAWLOSS @ 3100 tensor(1369.2926, device='xla:0')
RAWLOSS @ 3100 tensor(1614.9175, device='xla:0')
RAWLOSS @ 3100 tensor(1404.9669, device='xla:0')
RAWLOSS @ 3100 tensor(1441.8110, device='xla:0')
RAWLOSS @ 3200 tensor(1324.9653, device='xla:0')
RAWLOSS @ 3200 tensor(837.5519, device='xla:0')
RAWLOSS @ 3200 tensor(1499.7574, device='xla:0')
RAWLOSS @ 3200 tensor(1189.5745, device='xla:0')
RAWLOSS @ 3200 tensor(1908.1287, device='xla:0')
RAWLOSS @ 3200 tensor(1531.9708, device='xla:0')
RAWLOSS @ 3200 tensor(1667.5598, device='xla:1')
RAWLOSS @ 3200 tensor(1652.9093, device='xla:0')
RAWLOSS @ 3300 tensor(1473.4531, device='xla:0')
RAWLOSS @ 3300 tensor(955.8904, device='xla:0')
RAWLOSS @ 3300 tensor(1278.3154, device='xla:0')
RAWLOSS @ 3300 tensor(1414.3846, device='xla:0')
RAWLOSS @ 3300 tensor(1417.2324, device='xla:1')
RAWLOSS @ 3300 tensor(1852.7939, device='xla:0')
RAWLOSS @ 3300 tensor(824.7117, device='xla:0')
RAWLOSS @ 3300 tensor(1249.7942, device='xla:0')
RAWLOSS @ 3400 tensor(1740.2345, device='xla:0')
RAWLOSS @ 3400 tensor(1808.3588, device='xla:1')
RAWLOSS @ 3400 tensor(1330.5613, device='xla:0')
RAWLOSS @ 3400 tensor(1269.6254, device='xla:0')
RAWLOSS @ 3400 tensor(1708.9253, device='xla:0')
RAWLOSS @ 3400 tensor(1880.5201, device='xla:0')
RAWLOSS @ 3400 tensor(1607.5392, device='xla:0')
RAWLOSS @ 3400 tensor(1432.4714, device='xla:0')
RAWLOSS @ 3500 tensor(1701.1743, device='xla:0')
RAWLOSS @ 3500 tensor(1393.0560, device='xla:1')
RAWLOSS @ 3500 tensor(1385.7749, device='xla:0')
RAWLOSS @ 3500 tensor(1365.4130, device='xla:0')
RAWLOSS @ 3500 tensor(1775.6072, device='xla:0')
RAWLOSS @ 3500 tensor(1931.9413, device='xla:0')
RAWLOSS @ 3500 tensor(1142.4243, device='xla:0')
RAWLOSS @ 3500 tensor(1310.2847, device='xla:0')
RAWLOSS @ 3600 tensor(1148.5848, device='xla:0')
RAWLOSS @ 3600 tensor(1528.9443, device='xla:1')
RAWLOSS @ 3600 tensor(1755.8218, device='xla:0')
RAWLOSS @ 3600 tensor(1317.0939, device='xla:0')
RAWLOSS @ 3600 tensor(1250.5432, device='xla:0')
RAWLOSS @ 3600 tensor(1637.4978, device='xla:0')
RAWLOSS @ 3600 tensor(1392.2312, device='xla:0')
RAWLOSS @ 3600 tensor(1144.7692, device='xla:0')
2020-07-20 20:48:27 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 9.308 | ppl 633.82 | wps 19526.8 | wpb 1889.5 | bsz 64 | num_updates 2000 | best_loss 9.308
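As a sanity check on the log lines above: the `loss` and `ppl` columns fairseq prints are related by ppl = 2**loss, since fairseq reports loss in base 2. A quick check against two lines from this log:

```python
import math

# fairseq logs loss in base 2, so perplexity is 2 ** loss.
# Verify against two lines from the log above:
assert math.isclose(2 ** 13.799, 14254.9, rel_tol=1e-3)  # train_inner @ update 100
assert math.isclose(2 ** 10.362, 1316, rel_tol=1e-3)     # valid @ update 1000
print("ppl columns match 2**loss")
```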
@kkissmart commented:

I pulled the fairseq code from the master branch and ran:

python train.py $HOME/pytorch-tutorial-data/wmt18_en_de_bpej32k \
  --arch=transformer_vaswani_wmt_en_de_big -s en -t de \
  --criterion cross_entropy --encoder-normalize-before --decoder-normalize-before \
  --task translation --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' \
  --lr-scheduler polynomial_decay --lr 1e-04 --min-lr -1 \
  --warmup-updates 10000 --total-num-update 500000 \
  --dropout 0.0 --attention-dropout 0.0 --weight-decay 0.0 \
  --max-tokens 2052 --seed 2 --log-format simple --log-interval 100 \
  --max-source-positions 1026 --max-target-positions 1026 \
  --save-interval-updates 5000 --skip-invalid-size-inputs-valid-test \
  --num-batch-buckets 1 --save-dir dummy_trans --tpu \
  --distributed-world-size 8 --bf16

Here are my logs:
2020-07-22 04:52:57 | INFO | fairseq_cli.train | model transformer_vaswani_wmt_en_de_big, criterion CrossEntropyCriterion
2020-07-22 04:52:57 | INFO | fairseq_cli.train | num. model params: 285915136 (num. trained: 285915136)
2020-07-22 04:53:03 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-07-22 04:53:03 | INFO | fairseq_cli.train | max tokens per GPU = 2052 and max sentences per GPU = None
2020-07-22 04:53:03 | INFO | fairseq.trainer | no existing checkpoint found dummy_trans/checkpoint_last.pt
2020-07-22 04:53:03 | INFO | fairseq.trainer | loading train data for epoch 1
2020-07-22 04:53:05 | INFO | fairseq.data.data_utils | loaded 5186259 examples from: /home/yinhanliu/pytorch-tutorial-data/wmt18_en_de_bpej32k/train.en-de.en
2020-07-22 04:53:07 | INFO | fairseq.data.data_utils | loaded 5186259 examples from: /home/yinhanliu/pytorch-tutorial-data/wmt18_en_de_bpej32k/train.en-de.de
2020-07-22 04:53:07 | INFO | fairseq.tasks.translation | /home/yinhanliu/pytorch-tutorial-data/wmt18_en_de_bpej32k train en-de 5186259 examples
2020-07-22 04:53:07 | INFO | fairseq.data.language_pair_dataset | bucketing source lengths: [251]
2020-07-22 04:53:07 | INFO | fairseq.data.language_pair_dataset | bucketing target lengths: [251]
2020-07-22 04:54:08 | INFO | fairseq_cli.train | begin training epoch 1
2020-07-22 04:55:09 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 04:55:37 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 04:56:09 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 04:56:40 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 04:56:40 | INFO | train_inner | epoch 001: 100 / 81036 loss=14.464, ppl=22603.3, wps=0, ups=0, wpb=2187, bsz=64, num_updates=100, lr=1e-06, gnorm=7.729, train_wall=98, wall=216
2020-07-22 04:56:40 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 04:57:13 | INFO | train_inner | epoch 001: 200 / 81036 loss=13.142, ppl=9036.69, wps=52.8, ups=0.03, wpb=1776, bsz=64, num_updates=200, lr=2e-06, gnorm=5.436, train_wall=16, wall=250
2020-07-22 04:57:13 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 04:57:47 | INFO | train_inner | epoch 001: 300 / 81036 loss=12.628, ppl=6330.36, wps=49.9, ups=0.03, wpb=1688, bsz=64, num_updates=300, lr=3e-06, gnorm=4.804, train_wall=16, wall=284
2020-07-22 04:57:47 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 04:58:21 | INFO | train_inner | epoch 001: 400 / 81036 loss=12.268, ppl=4932.25, wps=60.8, ups=0.03, wpb=2048, bsz=64, num_updates=400, lr=4e-06, gnorm=3.261, train_wall=16, wall=317
2020-07-22 04:58:54 | INFO | train_inner | epoch 001: 500 / 81036 loss=12.049, ppl=4238.2, wps=52.9, ups=0.03, wpb=1769, bsz=64, num_updates=500, lr=5e-06, gnorm=4.143, train_wall=16, wall=351
2020-07-22 04:59:28 | INFO | train_inner | epoch 001: 600 / 81036 loss=11.772, ppl=3498.23, wps=53, ups=0.03, wpb=1776, bsz=64, num_updates=600, lr=6e-06, gnorm=3.532, train_wall=16, wall=384
2020-07-22 05:00:01 | INFO | train_inner | epoch 001: 700 / 81036 loss=11.453, ppl=2804.32, wps=58, ups=0.03, wpb=1951, bsz=64, num_updates=700, lr=7e-06, gnorm=3.642, train_wall=16, wall=418
2020-07-22 05:00:35 | INFO | train_inner | epoch 001: 800 / 81036 loss=11.068, ppl=2146.65, wps=56.8, ups=0.03, wpb=1901, bsz=64, num_updates=800, lr=8e-06, gnorm=4.044, train_wall=16, wall=452

But I am able to reproduce your numbers without bf16!
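That bf16-vs-fp32 gap is plausible on its own: bfloat16 keeps only 7 mantissa bits, so small per-step contributions that fp32 retains can be dropped entirely. Below is a minimal illustration (not fairseq code); `to_bf16` is a hypothetical helper that emulates bf16 by truncating a float32, whereas real bf16 hardware typically rounds to nearest.

```python
import struct

def to_bf16(x: float) -> float:
    """Emulate bfloat16 by zeroing a float32's low 16 mantissa bits.

    Hypothetical helper for illustration only; real bf16 hardware
    usually rounds to nearest rather than truncating.
    """
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

# A small update that fp32/fp64 keeps but bf16 drops entirely:
print(to_bf16(1.001))  # 1.0 -- the +0.001 is below bf16's ~2**-8 resolution near 1.0
print(1.001)

# Accumulating many such updates diverges between the two precisions:
# the bf16 sum stalls once its ulp exceeds the increment being added.
s_bf16, s_fp = 0.0, 0.0
for _ in range(1000):
    s_bf16 = to_bf16(s_bf16 + to_bf16(0.001))
    s_fp += 0.001
print(s_bf16, s_fp)
```

The same effect applies to gradient accumulation and running loss sums, which is one reason a bf16 run and an fp32 run of the same recipe need not produce matching loss curves.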
