Created July 20, 2020 20:51
RAWLOSS @ 100 tensor(2970.8003, device='xla:0')
RAWLOSS @ 100 tensor(1757.3165, device='xla:0')
RAWLOSS @ 100 tensor(1953.1198, device='xla:0')
RAWLOSS @ 100 tensor(3101.5469, device='xla:0')
RAWLOSS @ 100 tensor(3934.1355, device='xla:0')
RAWLOSS @ 100 tensor(2359.2961, device='xla:0')
RAWLOSS @ 100 tensor(2797.4104, device='xla:1')
RAWLOSS @ 100 tensor(2044.7153, device='xla:0')
2020-07-20 20:33:53 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-20 20:33:53 | INFO | train_inner | epoch 001: 100 / 81036 loss=13.799, ppl=14254.9, wps=0, ups=0, wpb=2187, bsz=64, num_updates=100, lr=1e-06, gnorm=3.694, train_wall=14, wall=95
2020-07-20 20:33:53 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
RAWLOSS @ 200 tensor(2427.6775, device='xla:0')
RAWLOSS @ 200 tensor(1442.3955, device='xla:0')
RAWLOSS @ 200 tensor(1955.6888, device='xla:1')
RAWLOSS @ 200 tensor(1564.5947, device='xla:0')
RAWLOSS @ 200 tensor(2397.6733, device='xla:0')
RAWLOSS @ 200 tensor(2411.3735, device='xla:0')
RAWLOSS @ 200 tensor(1628.1062, device='xla:0')
RAWLOSS @ 200 tensor(1908.8589, device='xla:0')
2020-07-20 20:34:27 | INFO | train_inner | epoch 001: 200 / 81036 loss=12.783, ppl=7048.47, wps=52.4, ups=0.03, wpb=1776, bsz=64, num_updates=200, lr=2e-06, gnorm=2.183, train_wall=11, wall=129
2020-07-20 20:34:27 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
RAWLOSS @ 300 tensor(1671.6956, device='xla:1')
RAWLOSS @ 300 tensor(2032.6129, device='xla:0')
RAWLOSS @ 300 tensor(2331.8958, device='xla:0')
RAWLOSS @ 300 tensor(1358.9934, device='xla:0')
RAWLOSS @ 300 tensor(1498.9780, device='xla:0')
RAWLOSS @ 300 tensor(2165.1125, device='xla:0')
RAWLOSS @ 300 tensor(1357.4266, device='xla:0')
RAWLOSS @ 300 tensor(2041.4008, device='xla:0')
2020-07-20 20:35:02 | INFO | train_inner | epoch 001: 300 / 81036 loss=12.357, ppl=5246.05, wps=49, ups=0.03, wpb=1688, bsz=64, num_updates=300, lr=3e-06, gnorm=1.994, train_wall=11, wall=164
2020-07-20 20:35:02 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
RAWLOSS @ 400 tensor(1330.5079, device='xla:0')
RAWLOSS @ 400 tensor(2962.9514, device='xla:0')
RAWLOSS @ 400 tensor(2582.0928, device='xla:0')
RAWLOSS @ 400 tensor(1621.3994, device='xla:0')
RAWLOSS @ 400 tensor(2639.0959, device='xla:0')
RAWLOSS @ 400 tensor(2012.1141, device='xla:0')
RAWLOSS @ 400 tensor(1408.4937, device='xla:1')
RAWLOSS @ 400 tensor(2422.6812, device='xla:0')
2020-07-20 20:35:36 | INFO | train_inner | epoch 001: 400 / 81036 loss=11.961, ppl=3986.59, wps=60.2, ups=0.03, wpb=2048, bsz=64, num_updates=400, lr=4e-06, gnorm=1.717, train_wall=12, wall=198
RAWLOSS @ 500 tensor(1421.8625, device='xla:0')
RAWLOSS @ 500 tensor(2101.3054, device='xla:0')
RAWLOSS @ 500 tensor(1190.1812, device='xla:0')
RAWLOSS @ 500 tensor(1715.4381, device='xla:0')
RAWLOSS @ 500 tensor(1854.4866, device='xla:0')
RAWLOSS @ 500 tensor(1990.5693, device='xla:0')
RAWLOSS @ 500 tensor(2170.2637, device='xla:0')
RAWLOSS @ 500 tensor(1837.7633, device='xla:1')
2020-07-20 20:36:10 | INFO | train_inner | epoch 001: 500 / 81036 loss=11.647, ppl=3208.03, wps=50.9, ups=0.03, wpb=1769, bsz=64, num_updates=500, lr=5e-06, gnorm=1.603, train_wall=12, wall=232
RAWLOSS @ 600 tensor(1572.3944, device='xla:0')
RAWLOSS @ 600 tensor(1736.2310, device='xla:0')
RAWLOSS @ 600 tensor(2109.1536, device='xla:0')
RAWLOSS @ 600 tensor(1393.9427, device='xla:1')
RAWLOSS @ 600 tensor(1931.5094, device='xla:0')
RAWLOSS @ 600 tensor(1741.2776, device='xla:0')
RAWLOSS @ 600 tensor(1575.5127, device='xla:0')
RAWLOSS @ 600 tensor(1897.7202, device='xla:0')
2020-07-20 20:36:45 | INFO | train_inner | epoch 001: 600 / 81036 loss=11.338, ppl=2589.15, wps=51.8, ups=0.03, wpb=1776, bsz=64, num_updates=600, lr=6e-06, gnorm=1.797, train_wall=12, wall=267
RAWLOSS @ 700 tensor(1785.8511, device='xla:0')
RAWLOSS @ 700 tensor(2110.7805, device='xla:0')
RAWLOSS @ 700 tensor(2514.1729, device='xla:0')
RAWLOSS @ 700 tensor(1954.3976, device='xla:1')
RAWLOSS @ 700 tensor(1459.2474, device='xla:0')
RAWLOSS @ 700 tensor(1131.9515, device='xla:0')
RAWLOSS @ 700 tensor(2078.4407, device='xla:0')
RAWLOSS @ 700 tensor(1843.8748, device='xla:0')
2020-07-20 20:37:19 | INFO | train_inner | epoch 001: 700 / 81036 loss=11.002, ppl=2051.23, wps=57.6, ups=0.03, wpb=1951, bsz=64, num_updates=700, lr=7e-06, gnorm=1.622, train_wall=11, wall=301
RAWLOSS @ 800 tensor(2222.1287, device='xla:0')
RAWLOSS @ 800 tensor(1078.9088, device='xla:0')
RAWLOSS @ 800 tensor(2100.1682, device='xla:1')
RAWLOSS @ 800 tensor(2315.2129, device='xla:0')
RAWLOSS @ 800 tensor(1601.1167, device='xla:0')
RAWLOSS @ 800 tensor(1832.7350, device='xla:0')
RAWLOSS @ 800 tensor(1353.9714, device='xla:0')
RAWLOSS @ 800 tensor(1422.4886, device='xla:0')
2020-07-20 20:37:52 | INFO | train_inner | epoch 001: 800 / 81036 loss=10.569, ppl=1519.29, wps=56.1, ups=0.03, wpb=1901, bsz=64, num_updates=800, lr=8e-06, gnorm=1.794, train_wall=12, wall=335
RAWLOSS @ 900 tensor(2001.3239, device='xla:0')
RAWLOSS @ 900 tensor(1435.3341, device='xla:0')
RAWLOSS @ 900 tensor(2326.0073, device='xla:0')
RAWLOSS @ 900 tensor(1358.6759, device='xla:0')
RAWLOSS @ 900 tensor(1670.4648, device='xla:0')
RAWLOSS @ 900 tensor(1583.6451, device='xla:0')
RAWLOSS @ 900 tensor(1365.2646, device='xla:0')
RAWLOSS @ 900 tensor(1394.8363, device='xla:1')
2020-07-20 20:38:27 | INFO | train_inner | epoch 001: 900 / 81036 loss=10.569, ppl=1519.32, wps=52.2, ups=0.03, wpb=1793, bsz=64, num_updates=900, lr=9e-06, gnorm=1.902, train_wall=12, wall=369
RAWLOSS @ 1000 tensor(1818.5525, device='xla:0')
RAWLOSS @ 1000 tensor(1990.1970, device='xla:0')
RAWLOSS @ 1000 tensor(1839.7052, device='xla:0')
RAWLOSS @ 1000 tensor(1071.9586, device='xla:1')
RAWLOSS @ 1000 tensor(1533.0587, device='xla:0')
RAWLOSS @ 1000 tensor(1177.7563, device='xla:0')
RAWLOSS @ 1000 tensor(1397.5411, device='xla:0')
RAWLOSS @ 1000 tensor(1775.6034, device='xla:0')
2020-07-20 20:39:01 | INFO | train_inner | epoch 001: 1000 / 81036 loss=10.262, ppl=1227.92, wps=52, ups=0.03, wpb=1772, bsz=64, num_updates=1000, lr=1e-05, gnorm=1.954, train_wall=12, wall=403
RAWLOSS @ 1100 tensor(1649.9690, device='xla:0')
RAWLOSS @ 1100 tensor(1729.4590, device='xla:0')
RAWLOSS @ 1100 tensor(1744.8107, device='xla:0')
RAWLOSS @ 1100 tensor(1423.1836, device='xla:0')
RAWLOSS @ 1100 tensor(1465.9176, device='xla:0')
RAWLOSS @ 1100 tensor(1746.0730, device='xla:0')
RAWLOSS @ 1100 tensor(1902.0587, device='xla:1')
RAWLOSS @ 1100 tensor(1854.7543, device='xla:0')
RAWLOSS @ 1200 tensor(1120.9475, device='xla:0')
RAWLOSS @ 1200 tensor(2651.9900, device='xla:0')
RAWLOSS @ 1200 tensor(1851.4359, device='xla:0')
RAWLOSS @ 1200 tensor(1922.0718, device='xla:0')
RAWLOSS @ 1200 tensor(787.5982, device='xla:0')
RAWLOSS @ 1200 tensor(1496.5520, device='xla:1')
RAWLOSS @ 1200 tensor(2067.7168, device='xla:0')
RAWLOSS @ 1200 tensor(1216.9918, device='xla:0')
RAWLOSS @ 1300 tensor(1950.9186, device='xla:0')
RAWLOSS @ 1300 tensor(1934.1382, device='xla:0')
RAWLOSS @ 1300 tensor(1986.0408, device='xla:0')
RAWLOSS @ 1300 tensor(1266.9921, device='xla:0')
RAWLOSS @ 1300 tensor(1555.1661, device='xla:0')
RAWLOSS @ 1300 tensor(1688.6029, device='xla:0')
RAWLOSS @ 1300 tensor(3904.7627, device='xla:1')
RAWLOSS @ 1300 tensor(1987.5448, device='xla:0')
RAWLOSS @ 1400 tensor(2353.2351, device='xla:0')
RAWLOSS @ 1400 tensor(1423.8197, device='xla:0')
RAWLOSS @ 1400 tensor(1797.4116, device='xla:0')
RAWLOSS @ 1400 tensor(1382.4547, device='xla:0')
RAWLOSS @ 1400 tensor(1763.2184, device='xla:1')
RAWLOSS @ 1400 tensor(1751.9918, device='xla:0')
RAWLOSS @ 1400 tensor(1663.2499, device='xla:0')
RAWLOSS @ 1400 tensor(2144.2837, device='xla:0')
RAWLOSS @ 1500 tensor(1556.6667, device='xla:0')
RAWLOSS @ 1500 tensor(1770.3358, device='xla:1')
RAWLOSS @ 1500 tensor(1429.5319, device='xla:0')
RAWLOSS @ 1500 tensor(1411.8625, device='xla:0')
RAWLOSS @ 1500 tensor(1295.8887, device='xla:0')
RAWLOSS @ 1500 tensor(1495.4001, device='xla:0')
RAWLOSS @ 1500 tensor(1589.7419, device='xla:0')
RAWLOSS @ 1500 tensor(1606.0579, device='xla:0')
RAWLOSS @ 1600 tensor(1043.6141, device='xla:0')
RAWLOSS @ 1600 tensor(1415.6194, device='xla:0')
RAWLOSS @ 1600 tensor(2111.9822, device='xla:1')
RAWLOSS @ 1600 tensor(1848.3516, device='xla:0')
RAWLOSS @ 1600 tensor(1574.2125, device='xla:0')
RAWLOSS @ 1600 tensor(2576.0010, device='xla:0')
RAWLOSS @ 1600 tensor(1435.9971, device='xla:0')
RAWLOSS @ 1600 tensor(2283.5654, device='xla:0')
RAWLOSS @ 1700 tensor(1595.3772, device='xla:0')
RAWLOSS @ 1700 tensor(1805.8171, device='xla:1')
RAWLOSS @ 1700 tensor(1852.5198, device='xla:0')
RAWLOSS @ 1700 tensor(1582.0444, device='xla:0')
RAWLOSS @ 1700 tensor(2197.1626, device='xla:0')
RAWLOSS @ 1700 tensor(1510.3783, device='xla:0')
RAWLOSS @ 1700 tensor(1680.5950, device='xla:0')
RAWLOSS @ 1700 tensor(1389.6589, device='xla:0')
RAWLOSS @ 1800 tensor(1386.3527, device='xla:0')
RAWLOSS @ 1800 tensor(1477.1746, device='xla:0')
RAWLOSS @ 1800 tensor(1316.0349, device='xla:0')
RAWLOSS @ 1800 tensor(1313.9607, device='xla:1')
RAWLOSS @ 1800 tensor(1534.0713, device='xla:0')
RAWLOSS @ 1800 tensor(2442.3997, device='xla:0')
RAWLOSS @ 1800 tensor(2022.6328, device='xla:0')
RAWLOSS @ 1800 tensor(1580.3710, device='xla:0')
2020-07-20 20:40:39 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 10.362 | ppl 1316 | wps 16156.2 | wpb 1889.5 | bsz 64 | num_updates 1000
2020-07-20 20:41:22 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints/checkpoint_1_1000.pt (epoch 1 @ 1000 updates, score 10.362) (writing took 42.88991188723594 seconds)
2020-07-20 20:41:22 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
RAWLOSS @ 1900 tensor(1409.9108, device='xla:0')
RAWLOSS @ 1900 tensor(2181.8125, device='xla:1')
RAWLOSS @ 1900 tensor(976.1409, device='xla:0')
RAWLOSS @ 1900 tensor(1605.6019, device='xla:0')
RAWLOSS @ 1900 tensor(950.1663, device='xla:0')
RAWLOSS @ 1900 tensor(1500.7196, device='xla:0')
RAWLOSS @ 1900 tensor(1447.4484, device='xla:0')
RAWLOSS @ 1900 tensor(1580.1951, device='xla:0')
2020-07-20 20:41:56 | INFO | train_inner | epoch 001: 1100 / 81036 loss=10.11, ppl=1105.21, wps=11.1, ups=0.01, wpb=1935, bsz=64, num_updates=1100, lr=1.1e-05, gnorm=1.91, train_wall=12, wall=578
RAWLOSS @ 2000 tensor(1711.6940, device='xla:0')
RAWLOSS @ 2000 tensor(1366.3888, device='xla:0')
RAWLOSS @ 2000 tensor(1673.8571, device='xla:0')
RAWLOSS @ 2000 tensor(1178.6492, device='xla:0')
RAWLOSS @ 2000 tensor(1122.1761, device='xla:1')
RAWLOSS @ 2000 tensor(1364.6173, device='xla:0')
RAWLOSS @ 2000 tensor(1503.6803, device='xla:0')
RAWLOSS @ 2000 tensor(1768.3527, device='xla:0')
2020-07-20 20:42:30 | INFO | train_inner | epoch 001: 1200 / 81036 loss=10.348, ppl=1303.44, wps=53.1, ups=0.03, wpb=1817, bsz=64, num_updates=1200, lr=1.2e-05, gnorm=1.701, train_wall=12, wall=612
RAWLOSS @ 2100 tensor(1645.3506, device='xla:0')
RAWLOSS @ 2100 tensor(1504.4690, device='xla:0')
RAWLOSS @ 2100 tensor(1770.7802, device='xla:0')
RAWLOSS @ 2100 tensor(1804.2430, device='xla:0')
RAWLOSS @ 2100 tensor(1650.1572, device='xla:0')
RAWLOSS @ 2100 tensor(1712.3865, device='xla:1')
RAWLOSS @ 2100 tensor(1658.8093, device='xla:0')
RAWLOSS @ 2100 tensor(2189.2732, device='xla:0')
2020-07-20 20:43:04 | INFO | train_inner | epoch 001: 1300 / 81036 loss=10.385, ppl=1337.29, wps=51.3, ups=0.03, wpb=1757, bsz=64, num_updates=1300, lr=1.3e-05, gnorm=1.967, train_wall=12, wall=646
RAWLOSS @ 2200 tensor(741.3139, device='xla:1')
RAWLOSS @ 2200 tensor(1397.9999, device='xla:0')
RAWLOSS @ 2200 tensor(1306.1820, device='xla:0')
RAWLOSS @ 2200 tensor(1816.5071, device='xla:0')
RAWLOSS @ 2200 tensor(1712.6990, device='xla:0')
RAWLOSS @ 2200 tensor(1625.7528, device='xla:0')
RAWLOSS @ 2200 tensor(1509.3036, device='xla:0')
RAWLOSS @ 2200 tensor(1460.7605, device='xla:0')
2020-07-20 20:43:39 | INFO | train_inner | epoch 001: 1400 / 81036 loss=9.436, ppl=692.77, wps=53.9, ups=0.03, wpb=1846, bsz=64, num_updates=1400, lr=1.4e-05, gnorm=2.271, train_wall=11, wall=681
RAWLOSS @ 2300 tensor(1256.1327, device='xla:0')
RAWLOSS @ 2300 tensor(1919.0973, device='xla:0')
RAWLOSS @ 2300 tensor(1033.0536, device='xla:0')
RAWLOSS @ 2300 tensor(1770.2001, device='xla:0')
RAWLOSS @ 2300 tensor(1356.5675, device='xla:0')
RAWLOSS @ 2300 tensor(1281.6326, device='xla:0')
RAWLOSS @ 2300 tensor(1896.7076, device='xla:1')
RAWLOSS @ 2300 tensor(2044.0779, device='xla:0')
2020-07-20 20:44:14 | INFO | train_inner | epoch 001: 1500 / 81036 loss=9.944, ppl=985.28, wps=56.6, ups=0.03, wpb=2018, bsz=64, num_updates=1500, lr=1.5e-05, gnorm=2.032, train_wall=12, wall=716
RAWLOSS @ 2400 tensor(1075.7241, device='xla:0')
RAWLOSS @ 2400 tensor(1548.3057, device='xla:0')
RAWLOSS @ 2400 tensor(890.2056, device='xla:0')
RAWLOSS @ 2400 tensor(2199.2112, device='xla:1')
RAWLOSS @ 2400 tensor(1585.5078, device='xla:0')
RAWLOSS @ 2400 tensor(1897.6166, device='xla:0')
RAWLOSS @ 2400 tensor(2349.6912, device='xla:0')
RAWLOSS @ 2400 tensor(1568.6990, device='xla:0')
2020-07-20 20:44:49 | INFO | train_inner | epoch 001: 1600 / 81036 loss=9.346, ppl=650.73, wps=57.3, ups=0.03, wpb=2016, bsz=64, num_updates=1600, lr=1.6e-05, gnorm=2.237, train_wall=13, wall=752
RAWLOSS @ 2500 tensor(962.3625, device='xla:0')
RAWLOSS @ 2500 tensor(1415.0819, device='xla:1')
RAWLOSS @ 2500 tensor(1357.3680, device='xla:0')
RAWLOSS @ 2500 tensor(1532.6869, device='xla:0')
RAWLOSS @ 2500 tensor(1903.3213, device='xla:0')
RAWLOSS @ 2500 tensor(1041.7490, device='xla:0')
RAWLOSS @ 2500 tensor(1649.4910, device='xla:0')
RAWLOSS @ 2500 tensor(1362.5601, device='xla:0')
2020-07-20 20:45:23 | INFO | train_inner | epoch 001: 1700 / 81036 loss=9.59, ppl=770.76, wps=52.8, ups=0.03, wpb=1793, bsz=64, num_updates=1700, lr=1.7e-05, gnorm=2.214, train_wall=12, wall=786
RAWLOSS @ 2600 tensor(1433.6476, device='xla:0')
RAWLOSS @ 2600 tensor(1117.7612, device='xla:0')
RAWLOSS @ 2600 tensor(861.6875, device='xla:0')
RAWLOSS @ 2600 tensor(1551.6271, device='xla:0')
RAWLOSS @ 2600 tensor(1738.4299, device='xla:1')
RAWLOSS @ 2600 tensor(2004.2317, device='xla:0')
RAWLOSS @ 2600 tensor(1511.3901, device='xla:0')
RAWLOSS @ 2600 tensor(1881.3784, device='xla:0')
2020-07-20 20:45:57 | INFO | train_inner | epoch 001: 1800 / 81036 loss=9.22, ppl=596.38, wps=61.4, ups=0.03, wpb=2089, bsz=64, num_updates=1800, lr=1.8e-05, gnorm=2.869, train_wall=12, wall=820
RAWLOSS @ 2700 tensor(1764.5472, device='xla:0')
RAWLOSS @ 2700 tensor(1887.2849, device='xla:0')
RAWLOSS @ 2700 tensor(1853.2770, device='xla:0')
RAWLOSS @ 2700 tensor(1354.0685, device='xla:1')
RAWLOSS @ 2700 tensor(1929.4672, device='xla:0')
RAWLOSS @ 2700 tensor(968.1652, device='xla:0')
RAWLOSS @ 2700 tensor(1323.0848, device='xla:0')
RAWLOSS @ 2700 tensor(1847.4006, device='xla:0')
2020-07-20 20:46:32 | INFO | train_inner | epoch 001: 1900 / 81036 loss=9.691, ppl=826.7, wps=60.1, ups=0.03, wpb=2059, bsz=64, num_updates=1900, lr=1.9e-05, gnorm=2.261, train_wall=11, wall=854
RAWLOSS @ 2800 tensor(1179.4827, device='xla:0')
RAWLOSS @ 2800 tensor(1810.1344, device='xla:0')
RAWLOSS @ 2800 tensor(1247.4254, device='xla:0')
RAWLOSS @ 2800 tensor(1511.3344, device='xla:0')
RAWLOSS @ 2800 tensor(1387.0867, device='xla:0')
RAWLOSS @ 2800 tensor(1970.2271, device='xla:0')
RAWLOSS @ 2800 tensor(1260.4536, device='xla:0')
RAWLOSS @ 2800 tensor(1428.8004, device='xla:1')
2020-07-20 20:47:06 | INFO | train_inner | epoch 001: 2000 / 81036 loss=8.914, ppl=482.34, wps=57.1, ups=0.03, wpb=1949, bsz=64, num_updates=2000, lr=2e-05, gnorm=1.946, train_wall=12, wall=888
RAWLOSS @ 2900 tensor(2130.4285, device='xla:0')
RAWLOSS @ 2900 tensor(1291.1876, device='xla:0')
RAWLOSS @ 2900 tensor(1186.7540, device='xla:0')
RAWLOSS @ 2900 tensor(881.2741, device='xla:0')
RAWLOSS @ 2900 tensor(1175.3256, device='xla:0')
RAWLOSS @ 2900 tensor(1722.8356, device='xla:0')
RAWLOSS @ 2900 tensor(683.1208, device='xla:1')
RAWLOSS @ 2900 tensor(1670.6589, device='xla:0')
RAWLOSS @ 3000 tensor(2093.8652, device='xla:0')
RAWLOSS @ 3000 tensor(1373.8485, device='xla:0')
RAWLOSS @ 3000 tensor(1307.9662, device='xla:0')
RAWLOSS @ 3000 tensor(1185.1688, device='xla:0')
RAWLOSS @ 3000 tensor(1613.1737, device='xla:0')
RAWLOSS @ 3000 tensor(1396.5134, device='xla:1')
RAWLOSS @ 3000 tensor(1617.2462, device='xla:0')
RAWLOSS @ 3000 tensor(1601.1096, device='xla:0')
RAWLOSS @ 3100 tensor(1415.2422, device='xla:0')
RAWLOSS @ 3100 tensor(1636.2690, device='xla:0')
RAWLOSS @ 3100 tensor(1290.3104, device='xla:1')
RAWLOSS @ 3100 tensor(1212.2844, device='xla:0')
RAWLOSS @ 3100 tensor(1369.2926, device='xla:0')
RAWLOSS @ 3100 tensor(1614.9175, device='xla:0')
RAWLOSS @ 3100 tensor(1404.9669, device='xla:0')
RAWLOSS @ 3100 tensor(1441.8110, device='xla:0')
RAWLOSS @ 3200 tensor(1324.9653, device='xla:0')
RAWLOSS @ 3200 tensor(837.5519, device='xla:0')
RAWLOSS @ 3200 tensor(1499.7574, device='xla:0')
RAWLOSS @ 3200 tensor(1189.5745, device='xla:0')
RAWLOSS @ 3200 tensor(1908.1287, device='xla:0')
RAWLOSS @ 3200 tensor(1531.9708, device='xla:0')
RAWLOSS @ 3200 tensor(1667.5598, device='xla:1')
RAWLOSS @ 3200 tensor(1652.9093, device='xla:0')
RAWLOSS @ 3300 tensor(1473.4531, device='xla:0')
RAWLOSS @ 3300 tensor(955.8904, device='xla:0')
RAWLOSS @ 3300 tensor(1278.3154, device='xla:0')
RAWLOSS @ 3300 tensor(1414.3846, device='xla:0')
RAWLOSS @ 3300 tensor(1417.2324, device='xla:1')
RAWLOSS @ 3300 tensor(1852.7939, device='xla:0')
RAWLOSS @ 3300 tensor(824.7117, device='xla:0')
RAWLOSS @ 3300 tensor(1249.7942, device='xla:0')
RAWLOSS @ 3400 tensor(1740.2345, device='xla:0')
RAWLOSS @ 3400 tensor(1808.3588, device='xla:1')
RAWLOSS @ 3400 tensor(1330.5613, device='xla:0')
RAWLOSS @ 3400 tensor(1269.6254, device='xla:0')
RAWLOSS @ 3400 tensor(1708.9253, device='xla:0')
RAWLOSS @ 3400 tensor(1880.5201, device='xla:0')
RAWLOSS @ 3400 tensor(1607.5392, device='xla:0')
RAWLOSS @ 3400 tensor(1432.4714, device='xla:0')
RAWLOSS @ 3500 tensor(1701.1743, device='xla:0')
RAWLOSS @ 3500 tensor(1393.0560, device='xla:1')
RAWLOSS @ 3500 tensor(1385.7749, device='xla:0')
RAWLOSS @ 3500 tensor(1365.4130, device='xla:0')
RAWLOSS @ 3500 tensor(1775.6072, device='xla:0')
RAWLOSS @ 3500 tensor(1931.9413, device='xla:0')
RAWLOSS @ 3500 tensor(1142.4243, device='xla:0')
RAWLOSS @ 3500 tensor(1310.2847, device='xla:0')
RAWLOSS @ 3600 tensor(1148.5848, device='xla:0')
RAWLOSS @ 3600 tensor(1528.9443, device='xla:1')
RAWLOSS @ 3600 tensor(1755.8218, device='xla:0')
RAWLOSS @ 3600 tensor(1317.0939, device='xla:0')
RAWLOSS @ 3600 tensor(1250.5432, device='xla:0')
RAWLOSS @ 3600 tensor(1637.4978, device='xla:0')
RAWLOSS @ 3600 tensor(1392.2312, device='xla:0')
RAWLOSS @ 3600 tensor(1144.7692, device='xla:0')
2020-07-20 20:48:27 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 9.308 | ppl 633.82 | wps 19526.8 | wpb 1889.5 | bsz 64 | num_updates 2000 | best_loss 9.308
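A note on reading the numbers above: fairseq's train_inner lines report loss in bits per token, and ppl = 2 ** loss. The RAWLOSS lines are extra debug prints, one per XLA device per logged step; they appear to be per-device sums of token negative log-likelihoods in nats, since dividing their total by wpb * ln(2) reproduces the logged loss. Below is a minimal sanity check in Python using the update-100 values above; the interpretation of RAWLOSS is an assumption, not something stated in this gist.

import math

# RAWLOSS values printed at "@ 100" above, one per XLA device (assumed to be
# per-device sums of token negative log-likelihoods in nats).
rawloss_at_100 = [2970.8003, 1757.3165, 1953.1198, 3101.5469,
                  3934.1355, 2359.2961, 2797.4104, 2044.7153]
wpb = 2187  # words per batch reported by train_inner at update 100

# fairseq reports loss in bits per token and perplexity as 2 ** loss.
loss_bits = sum(rawloss_at_100) / (wpb * math.log(2))
print(f"loss ~ {loss_bits:.3f}")       # ~ 13.799, matching loss=13.799 in the log
print(f"ppl  ~ {2 ** loss_bits:.0f}")  # ~ 14256, matching ppl=14254.9 up to rounding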
I pulled the fairseq code from the master branch and ran:
python train.py $HOME/pytorch-tutorial-data/wmt18_en_de_bpej32k --arch=transformer_vaswani_wmt_en_de_big -s en -t de --criterion cross_entropy --encoder-normalize-before --decoder-normalize-before --task translation --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' --lr-scheduler polynomial_decay --lr 1e-04 --min-lr -1 --warmup-updates 10000 --total-num-update 500000 --dropout 0.0 --attention-dropout 0.0 --weight-decay 0.0 --max-tokens 2052 --seed 2 --log-format simple --log-interval 100 --max-source-positions 1026 --max-target-positions 1026 --save-interval-updates 5000 --skip-invalid-size-inputs-valid-test --num-batch-buckets 1 --save-dir dummy_trans --tpu --distributed-world-size 8 --bf 16
Here are my logs:
2020-07-22 04:52:57 | INFO | fairseq_cli.train | model transformer_vaswani_wmt_en_de_big, criterion CrossEntropyCriterion
2020-07-22 04:52:57 | INFO | fairseq_cli.train | num. model params: 285915136 (num. trained: 285915136)
2020-07-22 04:53:03 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-07-22 04:53:03 | INFO | fairseq_cli.train | max tokens per GPU = 2052 and max sentences per GPU = None
2020-07-22 04:53:03 | INFO | fairseq.trainer | no existing checkpoint found dummy_trans/checkpoint_last.pt
2020-07-22 04:53:03 | INFO | fairseq.trainer | loading train data for epoch 1
2020-07-22 04:53:05 | INFO | fairseq.data.data_utils | loaded 5186259 examples from: /home/yinhanliu/pytorch-tutorial-data/wmt18_en_de_bpej32k/train.en-de.en
2020-07-22 04:53:07 | INFO | fairseq.data.data_utils | loaded 5186259 examples from: /home/yinhanliu/pytorch-tutorial-data/wmt18_en_de_bpej32k/train.en-de.de
2020-07-22 04:53:07 | INFO | fairseq.tasks.translation | /home/yinhanliu/pytorch-tutorial-data/wmt18_en_de_bpej32k train en-de 5186259 examples
2020-07-22 04:53:07 | INFO | fairseq.data.language_pair_dataset | bucketing source lengths: [251]
2020-07-22 04:53:07 | INFO | fairseq.data.language_pair_dataset | bucketing target lengths: [251]
2020-07-22 04:54:08 | INFO | fairseq_cli.train | begin training epoch 1
2020-07-22 04:55:09 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 04:55:37 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 04:56:09 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 04:56:40 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 04:56:40 | INFO | train_inner | epoch 001: 100 / 81036 loss=14.464, ppl=22603.3, wps=0, ups=0, wpb=2187, bsz=64, num_updates=100, lr=1e-06, gnorm=7.729, train_wall=98, wall=216
2020-07-22 04:56:40 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 04:57:13 | INFO | train_inner | epoch 001: 200 / 81036 loss=13.142, ppl=9036.69, wps=52.8, ups=0.03, wpb=1776, bsz=64, num_updates=200, lr=2e-06, gnorm=5.436, train_wall=16, wall=250
2020-07-22 04:57:13 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 04:57:47 | INFO | train_inner | epoch 001: 300 / 81036 loss=12.628, ppl=6330.36, wps=49.9, ups=0.03, wpb=1688, bsz=64, num_updates=300, lr=3e-06, gnorm=4.804, train_wall=16, wall=284
2020-07-22 04:57:47 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 04:58:21 | INFO | train_inner | epoch 001: 400 / 81036 loss=12.268, ppl=4932.25, wps=60.8, ups=0.03, wpb=2048, bsz=64, num_updates=400, lr=4e-06, gnorm=3.261, train_wall=16, wall=317
2020-07-22 04:58:54 | INFO | train_inner | epoch 001: 500 / 81036 loss=12.049, ppl=4238.2, wps=52.9, ups=0.03, wpb=1769, bsz=64, num_updates=500, lr=5e-06, gnorm=4.143, train_wall=16, wall=351
2020-07-22 04:59:28 | INFO | train_inner | epoch 001: 600 / 81036 loss=11.772, ppl=3498.23, wps=53, ups=0.03, wpb=1776, bsz=64, num_updates=600, lr=6e-06, gnorm=3.532, train_wall=16, wall=384
2020-07-22 05:00:01 | INFO | train_inner | epoch 001: 700 / 81036 loss=11.453, ppl=2804.32, wps=58, ups=0.03, wpb=1951, bsz=64, num_updates=700, lr=7e-06, gnorm=3.642, train_wall=16, wall=418
2020-07-22 05:00:35 | INFO | train_inner | epoch 001: 800 / 81036 loss=11.068, ppl=2146.65, wps=56.8, ups=0.03, wpb=1901, bsz=64, num_updates=800, lr=8e-06, gnorm=4.044, train_wall=16, wall=452
But note that I am able to reproduce your numbers without bf16.
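For reference, a quick side-by-side of the train_inner losses from the two logs at the common logging points, updates 100 through 800. The values are copied verbatim from the logs above; treating the run logged in this comment as the bf16 one is my reading of the command line and the remark above, not something stated explicitly.

# Comparison sketch; all numbers are copied from the two logs in this gist.
updates      = [100, 200, 300, 400, 500, 600, 700, 800]
gist_loss    = [13.799, 12.783, 12.357, 11.961, 11.647, 11.338, 11.002, 10.569]
comment_loss = [14.464, 13.142, 12.628, 12.268, 12.049, 11.772, 11.453, 11.068]

for u, a, b in zip(updates, gist_loss, comment_loss):
    print(f"update {u:4d}: gist {a:6.3f}  this run {b:6.3f}  diff {b - a:+.3f}")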