Created July 20, 2020 20:51
RAWLOSS @ 100 tensor(1759.2184, device='cuda:7') | |
RAWLOSS @ 100 tensor(2361.1760, device='cuda:4') | |
RAWLOSS @ 100 tensor(3937.2319, device='cuda:2') | |
RAWLOSS @ 100 tensor(2799.5732, device='cuda:0') | |
RAWLOSS @ 100 tensor(1954.5380, device='cuda:5') | |
RAWLOSS @ 100 tensor(2972.5251, device='cuda:6') | |
RAWLOSS @ 100 tensor(2046.4896, device='cuda:1') | |
RAWLOSS @ 100 tensor(3103.4412, device='cuda:3') | |
2020-07-20 20:33:19 | INFO | train_inner | epoch 001: 100 / 81036 loss=14.994, ppl=32621.7, wps=11385.2, ups=5.98, wpb=1903.2, bsz=64, num_updates=100, lr=1e-06, gnorm=6.203, loss_scale=128, train_wall=18, wall=64 | |
RAWLOSS @ 200 tensor(2411.5215, device='cuda:7') | |
RAWLOSS @ 200 tensor(2397.7852, device='cuda:4') | |
RAWLOSS @ 200 tensor(1564.6809, device='cuda:1') | |
RAWLOSS @ 200 tensor(1628.1787, device='cuda:5') | |
RAWLOSS @ 200 tensor(1442.4983, device='cuda:6') | |
RAWLOSS @ 200 tensor(1909.0181, device='cuda:3') | |
RAWLOSS @ 200 tensor(1955.7889, device='cuda:0') | |
RAWLOSS @ 200 tensor(2427.7900, device='cuda:2') | |
2020-07-20 20:33:35 | INFO | train_inner | epoch 001: 200 / 81036 loss=13.256, ppl=9782.21, wps=11376.5, ups=5.98, wpb=1903.3, bsz=64, num_updates=200, lr=2e-06, gnorm=2.649, loss_scale=128, train_wall=17, wall=81 | |
RAWLOSS @ 300 tensor(1499.0195, device='cuda:4') | |
RAWLOSS @ 300 tensor(2165.2576, device='cuda:7') | |
RAWLOSS @ 300 tensor(2041.5122, device='cuda:1') | |
RAWLOSS @ 300 tensor(2331.9780, device='cuda:3') | |
RAWLOSS @ 300 tensor(1359.0815, device='cuda:5') | |
RAWLOSS @ 300 tensor(1357.4939, device='cuda:6') | |
RAWLOSS @ 300 tensor(1671.8204, device='cuda:0') | |
RAWLOSS @ 300 tensor(2032.6752, device='cuda:2') | |
2020-07-20 20:33:52 | INFO | train_inner | epoch 001: 300 / 81036 loss=12.642, ppl=6391.1, wps=11351.1, ups=5.98, wpb=1898.2, bsz=64, num_updates=300, lr=3e-06, gnorm=1.998, loss_scale=128, train_wall=17, wall=98 | |
RAWLOSS @ 400 tensor(1621.4983, device='cuda:4') | |
RAWLOSS @ 400 tensor(1330.5886, device='cuda:3') | |
RAWLOSS @ 400 tensor(2422.7766, device='cuda:5') | |
RAWLOSS @ 400 tensor(2012.2313, device='cuda:7') | |
RAWLOSS @ 400 tensor(2639.2793, device='cuda:1') | |
RAWLOSS @ 400 tensor(2962.9961, device='cuda:2') | |
RAWLOSS @ 400 tensor(2582.1494, device='cuda:6') | |
RAWLOSS @ 400 tensor(1408.4751, device='cuda:0') | |
2020-07-20 20:34:09 | INFO | train_inner | epoch 001: 400 / 81036 loss=12.232, ppl=4808.94, wps=11366.3, ups=5.98, wpb=1901, bsz=64, num_updates=400, lr=4e-06, gnorm=1.782, loss_scale=128, train_wall=17, wall=114 | |
RAWLOSS @ 500 tensor(2101.3281, device='cuda:7') | |
RAWLOSS @ 500 tensor(1990.6239, device='cuda:4') | |
RAWLOSS @ 500 tensor(2170.3997, device='cuda:5') | |
RAWLOSS @ 500 tensor(1190.2463, device='cuda:3') | |
RAWLOSS @ 500 tensor(1854.5607, device='cuda:2') | |
RAWLOSS @ 500 tensor(1421.9266, device='cuda:1') | |
RAWLOSS @ 500 tensor(1715.4827, device='cuda:6') | |
RAWLOSS @ 500 tensor(1837.8611, device='cuda:0') | |
2020-07-20 20:34:25 | INFO | train_inner | epoch 001: 500 / 81036 loss=11.863, ppl=3724.61, wps=11346.4, ups=5.98, wpb=1898.6, bsz=64, num_updates=500, lr=5e-06, gnorm=1.677, loss_scale=128, train_wall=17, wall=131 | |
RAWLOSS @ 600 tensor(1741.2328, device='cuda:5') | |
RAWLOSS @ 600 tensor(1931.7069, device='cuda:4') | |
RAWLOSS @ 600 tensor(2109.1321, device='cuda:7') | |
RAWLOSS @ 600 tensor(1572.4634, device='cuda:1') | |
RAWLOSS @ 600 tensor(1736.2273, device='cuda:2') | |
RAWLOSS @ 600 tensor(1897.7574, device='cuda:6') | |
RAWLOSS @ 600 tensor(1575.5797, device='cuda:3') | |
RAWLOSS @ 600 tensor(1393.9969, device='cuda:0') | |
2020-07-20 20:34:42 | INFO | train_inner | epoch 001: 600 / 81036 loss=11.472, ppl=2840.62, wps=11289.3, ups=5.98, wpb=1887.6, bsz=64, num_updates=600, lr=6e-06, gnorm=1.597, loss_scale=128, train_wall=17, wall=148 | |
RAWLOSS @ 700 tensor(2078.4817, device='cuda:7') | |
RAWLOSS @ 700 tensor(1459.3020, device='cuda:3') | |
RAWLOSS @ 700 tensor(1954.3855, device='cuda:0') | |
RAWLOSS @ 700 tensor(1131.9280, device='cuda:6') | |
RAWLOSS @ 700 tensor(1785.8784, device='cuda:5') | |
RAWLOSS @ 700 tensor(2514.1399, device='cuda:4') | |
RAWLOSS @ 700 tensor(2110.8450, device='cuda:2') | |
RAWLOSS @ 700 tensor(1843.9750, device='cuda:1') | |
2020-07-20 20:34:59 | INFO | train_inner | epoch 001: 700 / 81036 loss=11.123, ppl=2230.18, wps=11321.8, ups=5.99, wpb=1889.5, bsz=64, num_updates=700, lr=7e-06, gnorm=1.591, loss_scale=128, train_wall=17, wall=164 | |
RAWLOSS @ 800 tensor(2222.2434, device='cuda:7') | |
RAWLOSS @ 800 tensor(2315.1426, device='cuda:4') | |
RAWLOSS @ 800 tensor(1422.5798, device='cuda:2') | |
RAWLOSS @ 800 tensor(1832.8357, device='cuda:5') | |
RAWLOSS @ 800 tensor(2100.2158, device='cuda:0') | |
RAWLOSS @ 800 tensor(1353.9969, device='cuda:1') | |
RAWLOSS @ 800 tensor(1078.9521, device='cuda:6') | |
RAWLOSS @ 800 tensor(1601.1827, device='cuda:3') | |
2020-07-20 20:35:16 | INFO | train_inner | epoch 001: 800 / 81036 loss=10.822, ppl=1810.86, wps=11509.7, ups=5.99, wpb=1921, bsz=64, num_updates=800, lr=8e-06, gnorm=1.661, loss_scale=128, train_wall=17, wall=181 | |
RAWLOSS @ 900 tensor(1358.7069, device='cuda:5') | |
RAWLOSS @ 900 tensor(2001.2806, device='cuda:2') | |
RAWLOSS @ 900 tensor(2326.0085, device='cuda:4') | |
RAWLOSS @ 900 tensor(1394.9055, device='cuda:0') | |
RAWLOSS @ 900 tensor(1583.6710, device='cuda:7') | |
RAWLOSS @ 900 tensor(1365.3242, device='cuda:6') | |
RAWLOSS @ 900 tensor(1670.3481, device='cuda:1') | |
RAWLOSS @ 900 tensor(1435.4464, device='cuda:3') | |
2020-07-20 20:35:32 | INFO | train_inner | epoch 001: 900 / 81036 loss=10.631, ppl=1585.66, wps=11266.5, ups=5.99, wpb=1880.3, bsz=64, num_updates=900, lr=9e-06, gnorm=1.715, loss_scale=128, train_wall=17, wall=198 | |
RAWLOSS @ 1000 tensor(1818.5881, device='cuda:7') | |
RAWLOSS @ 1000 tensor(1177.7251, device='cuda:2') | |
RAWLOSS @ 1000 tensor(1533.1217, device='cuda:4') | |
RAWLOSS @ 1000 tensor(1397.4106, device='cuda:3') | |
RAWLOSS @ 1000 tensor(1990.2612, device='cuda:6') | |
RAWLOSS @ 1000 tensor(1071.9924, device='cuda:0') | |
RAWLOSS @ 1000 tensor(1775.7512, device='cuda:5') | |
RAWLOSS @ 1000 tensor(1839.6976, device='cuda:1') | |
2020-07-20 20:35:49 | INFO | train_inner | epoch 001: 1000 / 81036 loss=10.425, ppl=1374.46, wps=11259.9, ups=5.98, wpb=1882.3, bsz=64, num_updates=1000, lr=1e-05, gnorm=1.791, loss_scale=128, train_wall=17, wall=215 | |
RAWLOSS @ 1100 tensor(1746.0033, device='cuda:2') | |
RAWLOSS @ 1100 tensor(1465.9130, device='cuda:5') | |
RAWLOSS @ 1100 tensor(1744.7491, device='cuda:6') | |
RAWLOSS @ 1100 tensor(1423.1323, device='cuda:3') | |
RAWLOSS @ 1100 tensor(1854.6252, device='cuda:1') | |
RAWLOSS @ 1100 tensor(1649.8682, device='cuda:7') | |
RAWLOSS @ 1100 tensor(1902.0649, device='cuda:0') | |
RAWLOSS @ 1100 tensor(1729.4275, device='cuda:4') | |
RAWLOSS @ 1200 tensor(1851.4042, device='cuda:1') | |
RAWLOSS @ 1200 tensor(2067.7593, device='cuda:5') | |
RAWLOSS @ 1200 tensor(1922.0482, device='cuda:6') | |
RAWLOSS @ 1200 tensor(1496.4819, device='cuda:0') | |
RAWLOSS @ 1200 tensor(2652.0574, device='cuda:2') | |
RAWLOSS @ 1200 tensor(1120.8887, device='cuda:3') | |
RAWLOSS @ 1200 tensor(787.6282, device='cuda:7') | |
RAWLOSS @ 1200 tensor(1216.9780, device='cuda:4') | |
RAWLOSS @ 1300 tensor(3905.0183, device='cuda:0') | |
RAWLOSS @ 1300 tensor(1987.4576, device='cuda:1') | |
RAWLOSS @ 1300 tensor(1555.0515, device='cuda:5') | |
RAWLOSS @ 1300 tensor(1986.0359, device='cuda:3') | |
RAWLOSS @ 1300 tensor(1688.5969, device='cuda:2') | |
RAWLOSS @ 1300 tensor(1266.9993, device='cuda:7') | |
RAWLOSS @ 1300 tensor(1950.9441, device='cuda:4') | |
RAWLOSS @ 1300 tensor(1934.2272, device='cuda:6') | |
RAWLOSS @ 1400 tensor(1763.2106, device='cuda:0') | |
RAWLOSS @ 1400 tensor(1382.3062, device='cuda:1') | |
RAWLOSS @ 1400 tensor(1663.3346, device='cuda:3') | |
RAWLOSS @ 1400 tensor(2144.2966, device='cuda:2') | |
RAWLOSS @ 1400 tensor(1423.8508, device='cuda:6') | |
RAWLOSS @ 1400 tensor(2353.2927, device='cuda:5') | |
RAWLOSS @ 1400 tensor(1797.4767, device='cuda:4') | |
RAWLOSS @ 1400 tensor(1752.0325, device='cuda:7') | |
RAWLOSS @ 1500 tensor(1589.7679, device='cuda:3') | |
RAWLOSS @ 1500 tensor(1606.0082, device='cuda:2') | |
RAWLOSS @ 1500 tensor(1295.9928, device='cuda:6') | |
RAWLOSS @ 1500 tensor(1770.3326, device='cuda:0') | |
RAWLOSS @ 1500 tensor(1411.9240, device='cuda:5') | |
RAWLOSS @ 1500 tensor(1495.4089, device='cuda:1') | |
RAWLOSS @ 1500 tensor(1556.7166, device='cuda:4') | |
RAWLOSS @ 1500 tensor(1429.4736, device='cuda:7') | |
RAWLOSS @ 1600 tensor(1415.7618, device='cuda:2') | |
RAWLOSS @ 1600 tensor(1043.6442, device='cuda:5') | |
RAWLOSS @ 1600 tensor(2576.0803, device='cuda:6') | |
RAWLOSS @ 1600 tensor(2111.9097, device='cuda:0') | |
RAWLOSS @ 1600 tensor(1848.3643, device='cuda:3') | |
RAWLOSS @ 1600 tensor(2283.4233, device='cuda:7') | |
RAWLOSS @ 1600 tensor(1435.9967, device='cuda:4') | |
RAWLOSS @ 1600 tensor(1574.2908, device='cuda:1') | |
RAWLOSS @ 1700 tensor(1510.4642, device='cuda:6') | |
RAWLOSS @ 1700 tensor(1805.7920, device='cuda:0') | |
RAWLOSS @ 1700 tensor(1852.4142, device='cuda:2') | |
RAWLOSS @ 1700 tensor(2197.0974, device='cuda:3') | |
RAWLOSS @ 1700 tensor(1595.3687, device='cuda:5') | |
RAWLOSS @ 1700 tensor(1582.0337, device='cuda:4') | |
RAWLOSS @ 1700 tensor(1389.5233, device='cuda:1') | |
RAWLOSS @ 1700 tensor(1680.5702, device='cuda:7') | |
RAWLOSS @ 1800 tensor(1313.8553, device='cuda:0') | |
RAWLOSS @ 1800 tensor(1477.1389, device='cuda:6') | |
RAWLOSS @ 1800 tensor(1534.0729, device='cuda:3') | |
RAWLOSS @ 1800 tensor(2022.6931, device='cuda:5') | |
RAWLOSS @ 1800 tensor(1315.9595, device='cuda:2') | |
RAWLOSS @ 1800 tensor(2442.5000, device='cuda:1') | |
RAWLOSS @ 1800 tensor(1580.4133, device='cuda:7') | |
RAWLOSS @ 1800 tensor(1386.3921, device='cuda:4') | |
2020-07-20 20:36:23 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 10.362 | ppl 1316 | wps 47581.5 | wpb 1889.5 | bsz 64 | num_updates 1000 | |
2020-07-20 20:36:36 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints/checkpoint_1_1000.pt (epoch 1 @ 1000 updates, score 10.362) (writing took 12.745748163026292 seconds) | |
RAWLOSS @ 1900 tensor(1580.1207, device='cuda:4') | |
RAWLOSS @ 1900 tensor(1409.8403, device='cuda:3') | |
RAWLOSS @ 1900 tensor(1605.6798, device='cuda:5') | |
RAWLOSS @ 1900 tensor(1447.5181, device='cuda:6') | |
RAWLOSS @ 1900 tensor(1500.7025, device='cuda:7') | |
RAWLOSS @ 1900 tensor(950.2123, device='cuda:2') | |
RAWLOSS @ 1900 tensor(2181.9126, device='cuda:0') | |
RAWLOSS @ 1900 tensor(976.0250, device='cuda:1') | |
2020-07-20 20:36:52 | INFO | train_inner | epoch 001: 1100 / 81036 loss=10.315, ppl=1274.05, wps=3011.9, ups=1.58, wpb=1909.1, bsz=64, num_updates=1100, lr=1.1e-05, gnorm=1.936, loss_scale=128, train_wall=17, wall=278 | |
RAWLOSS @ 2000 tensor(1178.6830, device='cuda:5') | |
RAWLOSS @ 2000 tensor(1364.5532, device='cuda:4') | |
RAWLOSS @ 2000 tensor(1768.2903, device='cuda:1') | |
RAWLOSS @ 2000 tensor(1122.2102, device='cuda:0') | |
RAWLOSS @ 2000 tensor(1503.8456, device='cuda:2') | |
RAWLOSS @ 2000 tensor(1366.4176, device='cuda:6') | |
RAWLOSS @ 2000 tensor(1673.8319, device='cuda:7') | |
RAWLOSS @ 2000 tensor(1711.5432, device='cuda:3') | |
2020-07-20 20:37:09 | INFO | train_inner | epoch 001: 1200 / 81036 loss=10.137, ppl=1126.16, wps=11146.6, ups=5.98, wpb=1864.6, bsz=64, num_updates=1200, lr=1.2e-05, gnorm=1.862, loss_scale=128, train_wall=17, wall=295 | |
RAWLOSS @ 2100 tensor(1650.2241, device='cuda:4') | |
RAWLOSS @ 2100 tensor(1770.8967, device='cuda:1') | |
RAWLOSS @ 2100 tensor(1712.5360, device='cuda:0') | |
RAWLOSS @ 2100 tensor(1658.7856, device='cuda:5') | |
RAWLOSS @ 2100 tensor(1504.3652, device='cuda:2') | |
RAWLOSS @ 2100 tensor(1645.2926, device='cuda:3') | |
RAWLOSS @ 2100 tensor(1804.2935, device='cuda:7') | |
RAWLOSS @ 2100 tensor(2189.3533, device='cuda:6') | |
2020-07-20 20:37:26 | INFO | train_inner | epoch 001: 1300 / 81036 loss=10.05, ppl=1060.33, wps=11226.8, ups=5.97, wpb=1881.1, bsz=64, num_updates=1300, lr=1.3e-05, gnorm=1.949, loss_scale=128, train_wall=17, wall=311 | |
RAWLOSS @ 2200 tensor(1816.4875, device='cuda:5') | |
RAWLOSS @ 2200 tensor(1509.4227, device='cuda:4') | |
RAWLOSS @ 2200 tensor(741.3441, device='cuda:0') | |
RAWLOSS @ 2200 tensor(1460.7803, device='cuda:1') | |
RAWLOSS @ 2200 tensor(1397.9740, device='cuda:7') | |
RAWLOSS @ 2200 tensor(1625.7877, device='cuda:2') | |
RAWLOSS @ 2200 tensor(1306.2498, device='cuda:3') | |
RAWLOSS @ 2200 tensor(1712.5863, device='cuda:6') | |
2020-07-20 20:37:43 | INFO | train_inner | epoch 001: 1400 / 81036 loss=9.963, ppl=998.15, wps=11440.5, ups=5.98, wpb=1913.4, bsz=64, num_updates=1400, lr=1.4e-05, gnorm=2.015, loss_scale=128, train_wall=17, wall=328 | |
RAWLOSS @ 2300 tensor(2044.0695, device='cuda:7') | |
RAWLOSS @ 2300 tensor(1896.6611, device='cuda:0') | |
RAWLOSS @ 2300 tensor(1919.1024, device='cuda:4') | |
RAWLOSS @ 2300 tensor(1033.1302, device='cuda:1') | |
RAWLOSS @ 2300 tensor(1281.7118, device='cuda:6') | |
RAWLOSS @ 2300 tensor(1770.1635, device='cuda:2') | |
RAWLOSS @ 2300 tensor(1356.5090, device='cuda:3') | |
RAWLOSS @ 2300 tensor(1256.1304, device='cuda:5') | |
2020-07-20 20:37:59 | INFO | train_inner | epoch 001: 1500 / 81036 loss=9.861, ppl=929.67, wps=11270.3, ups=5.97, wpb=1887, bsz=64, num_updates=1500, lr=1.5e-05, gnorm=2.064, loss_scale=128, train_wall=17, wall=345 | |
RAWLOSS @ 2400 tensor(2349.6655, device='cuda:7') | |
RAWLOSS @ 2400 tensor(890.1895, device='cuda:2') | |
RAWLOSS @ 2400 tensor(1897.6302, device='cuda:1') | |
RAWLOSS @ 2400 tensor(1075.7246, device='cuda:5') | |
RAWLOSS @ 2400 tensor(1585.5381, device='cuda:3') | |
RAWLOSS @ 2400 tensor(1568.6381, device='cuda:6') | |
RAWLOSS @ 2400 tensor(2199.2390, device='cuda:0') | |
RAWLOSS @ 2400 tensor(1548.0806, device='cuda:4') | |
2020-07-20 20:38:16 | INFO | train_inner | epoch 001: 1600 / 81036 loss=9.776, ppl=876.55, wps=11203.2, ups=5.95, wpb=1883.9, bsz=64, num_updates=1600, lr=1.6e-05, gnorm=2.145, loss_scale=128, train_wall=17, wall=362 | |
RAWLOSS @ 2500 tensor(1362.4688, device='cuda:3') | |
RAWLOSS @ 2500 tensor(962.4784, device='cuda:1') | |
RAWLOSS @ 2500 tensor(1649.5826, device='cuda:5') | |
RAWLOSS @ 2500 tensor(1532.8005, device='cuda:6') | |
RAWLOSS @ 2500 tensor(1415.2178, device='cuda:0') | |
RAWLOSS @ 2500 tensor(1903.2310, device='cuda:7') | |
RAWLOSS @ 2500 tensor(1041.7327, device='cuda:2') | |
RAWLOSS @ 2500 tensor(1357.4067, device='cuda:4') | |
2020-07-20 20:38:33 | INFO | train_inner | epoch 001: 1700 / 81036 loss=9.664, ppl=811.43, wps=11249.4, ups=6.01, wpb=1872.5, bsz=64, num_updates=1700, lr=1.7e-05, gnorm=2.207, loss_scale=128, train_wall=16, wall=378 | |
RAWLOSS @ 2600 tensor(1551.7859, device='cuda:1') | |
RAWLOSS @ 2600 tensor(1433.5659, device='cuda:3') | |
RAWLOSS @ 2600 tensor(2004.1361, device='cuda:5') | |
RAWLOSS @ 2600 tensor(1738.2325, device='cuda:0') | |
RAWLOSS @ 2600 tensor(1881.1885, device='cuda:4') | |
RAWLOSS @ 2600 tensor(1511.2999, device='cuda:6') | |
RAWLOSS @ 2600 tensor(861.6576, device='cuda:2') | |
RAWLOSS @ 2600 tensor(1117.5820, device='cuda:7') | |
2020-07-20 20:38:49 | INFO | train_inner | epoch 001: 1800 / 81036 loss=9.567, ppl=758.49, wps=11347.2, ups=5.99, wpb=1893.9, bsz=64, num_updates=1800, lr=1.8e-05, gnorm=2.122, loss_scale=128, train_wall=17, wall=395 | |
RAWLOSS @ 2700 tensor(1853.3322, device='cuda:1') | |
RAWLOSS @ 2700 tensor(1929.5304, device='cuda:3') | |
RAWLOSS @ 2700 tensor(1847.4696, device='cuda:5') | |
RAWLOSS @ 2700 tensor(1887.1577, device='cuda:6') | |
RAWLOSS @ 2700 tensor(1354.1272, device='cuda:0') | |
RAWLOSS @ 2700 tensor(968.2283, device='cuda:7') | |
RAWLOSS @ 2700 tensor(1764.6019, device='cuda:4') | |
RAWLOSS @ 2700 tensor(1323.1575, device='cuda:2') | |
2020-07-20 20:39:06 | INFO | train_inner | epoch 001: 1900 / 81036 loss=9.467, ppl=707.6, wps=11510, ups=6.01, wpb=1916.2, bsz=64, num_updates=1900, lr=1.9e-05, gnorm=2.308, loss_scale=128, train_wall=16, wall=412 | |
RAWLOSS @ 2800 tensor(1247.3906, device='cuda:5') | |
RAWLOSS @ 2800 tensor(1179.4261, device='cuda:1') | |
RAWLOSS @ 2800 tensor(1428.7751, device='cuda:0') | |
RAWLOSS @ 2800 tensor(1387.0508, device='cuda:6') | |
RAWLOSS @ 2800 tensor(1970.2676, device='cuda:2') | |
RAWLOSS @ 2800 tensor(1810.1782, device='cuda:7') | |
RAWLOSS @ 2800 tensor(1260.3325, device='cuda:4') | |
RAWLOSS @ 2800 tensor(1511.3676, device='cuda:3') | |
2020-07-20 20:39:23 | INFO | train_inner | epoch 001: 2000 / 81036 loss=9.34, ppl=647.9, wps=11348.1, ups=6.01, wpb=1888.8, bsz=64, num_updates=2000, lr=2e-05, gnorm=2.282, loss_scale=128, train_wall=16, wall=428 | |
RAWLOSS @ 2900 tensor(1722.6688, device='cuda:5') | |
RAWLOSS @ 2900 tensor(1175.4904, device='cuda:1') | |
RAWLOSS @ 2900 tensor(2130.2017, device='cuda:3') | |
RAWLOSS @ 2900 tensor(881.3087, device='cuda:6') | |
RAWLOSS @ 2900 tensor(1670.6604, device='cuda:7') | |
RAWLOSS @ 2900 tensor(1291.2537, device='cuda:4') | |
RAWLOSS @ 2900 tensor(683.1188, device='cuda:0') | |
RAWLOSS @ 2900 tensor(1186.7698, device='cuda:2') | |
RAWLOSS @ 3000 tensor(1373.8418, device='cuda:5') | |
RAWLOSS @ 3000 tensor(2093.9600, device='cuda:1') | |
RAWLOSS @ 3000 tensor(1185.1798, device='cuda:4') | |
RAWLOSS @ 3000 tensor(1307.9381, device='cuda:3') | |
RAWLOSS @ 3000 tensor(1613.1339, device='cuda:6') | |
RAWLOSS @ 3000 tensor(1601.0812, device='cuda:2') | |
RAWLOSS @ 3000 tensor(1617.2799, device='cuda:7') | |
RAWLOSS @ 3000 tensor(1396.6063, device='cuda:0') | |
RAWLOSS @ 3100 tensor(1404.9858, device='cuda:1') | |
RAWLOSS @ 3100 tensor(1441.7434, device='cuda:5') | |
RAWLOSS @ 3100 tensor(1369.3285, device='cuda:4') | |
RAWLOSS @ 3100 tensor(1614.8783, device='cuda:6') | |
RAWLOSS @ 3100 tensor(1212.1155, device='cuda:3') | |
RAWLOSS @ 3100 tensor(1636.4840, device='cuda:2') | |
RAWLOSS @ 3100 tensor(1415.3347, device='cuda:7') | |
RAWLOSS @ 3100 tensor(1290.0740, device='cuda:0') | |
RAWLOSS @ 3200 tensor(1189.4517, device='cuda:1') | |
RAWLOSS @ 3200 tensor(1499.7421, device='cuda:5') | |
RAWLOSS @ 3200 tensor(837.5268, device='cuda:3') | |
RAWLOSS @ 3200 tensor(1652.7386, device='cuda:2') | |
RAWLOSS @ 3200 tensor(1908.1565, device='cuda:6') | |
RAWLOSS @ 3200 tensor(1324.9556, device='cuda:4') | |
RAWLOSS @ 3200 tensor(1531.9482, device='cuda:7') | |
RAWLOSS @ 3200 tensor(1667.3102, device='cuda:0') | |
RAWLOSS @ 3300 tensor(955.9008, device='cuda:1') | |
RAWLOSS @ 3300 tensor(1278.3064, device='cuda:5') | |
RAWLOSS @ 3300 tensor(1249.6583, device='cuda:4') | |
RAWLOSS @ 3300 tensor(824.5538, device='cuda:2') | |
RAWLOSS @ 3300 tensor(1414.2592, device='cuda:3') | |
RAWLOSS @ 3300 tensor(1852.9918, device='cuda:6') | |
RAWLOSS @ 3300 tensor(1473.3435, device='cuda:7') | |
RAWLOSS @ 3300 tensor(1417.2756, device='cuda:0') | |
RAWLOSS @ 3400 tensor(1432.7677, device='cuda:5') | |
RAWLOSS @ 3400 tensor(1330.4747, device='cuda:1') | |
RAWLOSS @ 3400 tensor(1269.7737, device='cuda:4') | |
RAWLOSS @ 3400 tensor(1708.8308, device='cuda:2') | |
RAWLOSS @ 3400 tensor(1607.6464, device='cuda:3') | |
RAWLOSS @ 3400 tensor(1740.2078, device='cuda:6') | |
RAWLOSS @ 3400 tensor(1880.3925, device='cuda:7') | |
RAWLOSS @ 3400 tensor(1808.4706, device='cuda:0') | |
RAWLOSS @ 3500 tensor(1775.8573, device='cuda:1') | |
RAWLOSS @ 3500 tensor(1932.1007, device='cuda:5') | |
RAWLOSS @ 3500 tensor(1310.2703, device='cuda:2') | |
RAWLOSS @ 3500 tensor(1365.3604, device='cuda:4') | |
RAWLOSS @ 3500 tensor(1142.5303, device='cuda:3') | |
RAWLOSS @ 3500 tensor(1385.8608, device='cuda:6') | |
RAWLOSS @ 3500 tensor(1701.1370, device='cuda:7') | |
RAWLOSS @ 3500 tensor(1393.0564, device='cuda:0') | |
RAWLOSS @ 3600 tensor(1637.2332, device='cuda:3') | |
RAWLOSS @ 3600 tensor(1755.6888, device='cuda:5') | |
RAWLOSS @ 3600 tensor(1144.6475, device='cuda:1') | |
RAWLOSS @ 3600 tensor(1148.7426, device='cuda:7') | |
RAWLOSS @ 3600 tensor(1528.7803, device='cuda:0') | |
RAWLOSS @ 3600 tensor(1392.0120, device='cuda:2') | |
RAWLOSS @ 3600 tensor(1250.3655, device='cuda:6') | |
RAWLOSS @ 3600 tensor(1317.1091, device='cuda:4') | |
2020-07-20 20:39:56 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 9.308 | ppl 633.84 | wps 47400.8 | wpb 1889.5 | bsz 64 | num_updates 2000 | best_loss 9.308 |