RAWLOSS @ 100 tensor(1759.2184, device='cuda:7')
RAWLOSS @ 100 tensor(2361.1760, device='cuda:4')
RAWLOSS @ 100 tensor(3937.2319, device='cuda:2')
RAWLOSS @ 100 tensor(2799.5732, device='cuda:0')
RAWLOSS @ 100 tensor(1954.5380, device='cuda:5')
RAWLOSS @ 100 tensor(2972.5251, device='cuda:6')
RAWLOSS @ 100 tensor(2046.4896, device='cuda:1')
RAWLOSS @ 100 tensor(3103.4412, device='cuda:3')
2020-07-20 20:33:19 | INFO | train_inner | epoch 001: 100 / 81036 loss=14.994, ppl=32621.7, wps=11385.2, ups=5.98, wpb=1903.2, bsz=64, num_updates=100, lr=1e-06, gnorm=6.203, loss_scale=128, train_wall=18, wall=64
RAWLOSS @ 200 tensor(2411.5215, device='cuda:7')
RAWLOSS @ 200 tensor(2397.7852, device='cuda:4')
RAWLOSS @ 200 tensor(1564.6809, device='cuda:1')
RAWLOSS @ 200 tensor(1628.1787, device='cuda:5')
RAWLOSS @ 200 tensor(1442.4983, device='cuda:6')
RAWLOSS @ 200 tensor(1909.0181, device='cuda:3')
RAWLOSS @ 200 tensor(1955.7889, device='cuda:0')
RAWLOSS @ 200 tensor(2427.7900, device='cuda:2')
2020-07-20 20:33:35 | INFO | train_inner | epoch 001: 200 / 81036 loss=13.256, ppl=9782.21, wps=11376.5, ups=5.98, wpb=1903.3, bsz=64, num_updates=200, lr=2e-06, gnorm=2.649, loss_scale=128, train_wall=17, wall=81
RAWLOSS @ 300 tensor(1499.0195, device='cuda:4')
RAWLOSS @ 300 tensor(2165.2576, device='cuda:7')
RAWLOSS @ 300 tensor(2041.5122, device='cuda:1')
RAWLOSS @ 300 tensor(2331.9780, device='cuda:3')
RAWLOSS @ 300 tensor(1359.0815, device='cuda:5')
RAWLOSS @ 300 tensor(1357.4939, device='cuda:6')
RAWLOSS @ 300 tensor(1671.8204, device='cuda:0')
RAWLOSS @ 300 tensor(2032.6752, device='cuda:2')
2020-07-20 20:33:52 | INFO | train_inner | epoch 001: 300 / 81036 loss=12.642, ppl=6391.1, wps=11351.1, ups=5.98, wpb=1898.2, bsz=64, num_updates=300, lr=3e-06, gnorm=1.998, loss_scale=128, train_wall=17, wall=98
RAWLOSS @ 400 tensor(1621.4983, device='cuda:4')
RAWLOSS @ 400 tensor(1330.5886, device='cuda:3')
RAWLOSS @ 400 tensor(2422.7766, device='cuda:5')
RAWLOSS @ 400 tensor(2012.2313, device='cuda:7')
RAWLOSS @ 400 tensor(2639.2793, device='cuda:1')
RAWLOSS @ 400 tensor(2962.9961, device='cuda:2')
RAWLOSS @ 400 tensor(2582.1494, device='cuda:6')
RAWLOSS @ 400 tensor(1408.4751, device='cuda:0')
2020-07-20 20:34:09 | INFO | train_inner | epoch 001: 400 / 81036 loss=12.232, ppl=4808.94, wps=11366.3, ups=5.98, wpb=1901, bsz=64, num_updates=400, lr=4e-06, gnorm=1.782, loss_scale=128, train_wall=17, wall=114
RAWLOSS @ 500 tensor(2101.3281, device='cuda:7')
RAWLOSS @ 500 tensor(1990.6239, device='cuda:4')
RAWLOSS @ 500 tensor(2170.3997, device='cuda:5')
RAWLOSS @ 500 tensor(1190.2463, device='cuda:3')
RAWLOSS @ 500 tensor(1854.5607, device='cuda:2')
RAWLOSS @ 500 tensor(1421.9266, device='cuda:1')
RAWLOSS @ 500 tensor(1715.4827, device='cuda:6')
RAWLOSS @ 500 tensor(1837.8611, device='cuda:0')
2020-07-20 20:34:25 | INFO | train_inner | epoch 001: 500 / 81036 loss=11.863, ppl=3724.61, wps=11346.4, ups=5.98, wpb=1898.6, bsz=64, num_updates=500, lr=5e-06, gnorm=1.677, loss_scale=128, train_wall=17, wall=131
RAWLOSS @ 600 tensor(1741.2328, device='cuda:5')
RAWLOSS @ 600 tensor(1931.7069, device='cuda:4')
RAWLOSS @ 600 tensor(2109.1321, device='cuda:7')
RAWLOSS @ 600 tensor(1572.4634, device='cuda:1')
RAWLOSS @ 600 tensor(1736.2273, device='cuda:2')
RAWLOSS @ 600 tensor(1897.7574, device='cuda:6')
RAWLOSS @ 600 tensor(1575.5797, device='cuda:3')
RAWLOSS @ 600 tensor(1393.9969, device='cuda:0')
2020-07-20 20:34:42 | INFO | train_inner | epoch 001: 600 / 81036 loss=11.472, ppl=2840.62, wps=11289.3, ups=5.98, wpb=1887.6, bsz=64, num_updates=600, lr=6e-06, gnorm=1.597, loss_scale=128, train_wall=17, wall=148
RAWLOSS @ 700 tensor(2078.4817, device='cuda:7')
RAWLOSS @ 700 tensor(1459.3020, device='cuda:3')
RAWLOSS @ 700 tensor(1954.3855, device='cuda:0')
RAWLOSS @ 700 tensor(1131.9280, device='cuda:6')
RAWLOSS @ 700 tensor(1785.8784, device='cuda:5')
RAWLOSS @ 700 tensor(2514.1399, device='cuda:4')
RAWLOSS @ 700 tensor(2110.8450, device='cuda:2')
RAWLOSS @ 700 tensor(1843.9750, device='cuda:1')
2020-07-20 20:34:59 | INFO | train_inner | epoch 001: 700 / 81036 loss=11.123, ppl=2230.18, wps=11321.8, ups=5.99, wpb=1889.5, bsz=64, num_updates=700, lr=7e-06, gnorm=1.591, loss_scale=128, train_wall=17, wall=164
RAWLOSS @ 800 tensor(2222.2434, device='cuda:7')
RAWLOSS @ 800 tensor(2315.1426, device='cuda:4')
RAWLOSS @ 800 tensor(1422.5798, device='cuda:2')
RAWLOSS @ 800 tensor(1832.8357, device='cuda:5')
RAWLOSS @ 800 tensor(2100.2158, device='cuda:0')
RAWLOSS @ 800 tensor(1353.9969, device='cuda:1')
RAWLOSS @ 800 tensor(1078.9521, device='cuda:6')
RAWLOSS @ 800 tensor(1601.1827, device='cuda:3')
2020-07-20 20:35:16 | INFO | train_inner | epoch 001: 800 / 81036 loss=10.822, ppl=1810.86, wps=11509.7, ups=5.99, wpb=1921, bsz=64, num_updates=800, lr=8e-06, gnorm=1.661, loss_scale=128, train_wall=17, wall=181
RAWLOSS @ 900 tensor(1358.7069, device='cuda:5')
RAWLOSS @ 900 tensor(2001.2806, device='cuda:2')
RAWLOSS @ 900 tensor(2326.0085, device='cuda:4')
RAWLOSS @ 900 tensor(1394.9055, device='cuda:0')
RAWLOSS @ 900 tensor(1583.6710, device='cuda:7')
RAWLOSS @ 900 tensor(1365.3242, device='cuda:6')
RAWLOSS @ 900 tensor(1670.3481, device='cuda:1')
RAWLOSS @ 900 tensor(1435.4464, device='cuda:3')
2020-07-20 20:35:32 | INFO | train_inner | epoch 001: 900 / 81036 loss=10.631, ppl=1585.66, wps=11266.5, ups=5.99, wpb=1880.3, bsz=64, num_updates=900, lr=9e-06, gnorm=1.715, loss_scale=128, train_wall=17, wall=198
RAWLOSS @ 1000 tensor(1818.5881, device='cuda:7')
RAWLOSS @ 1000 tensor(1177.7251, device='cuda:2')
RAWLOSS @ 1000 tensor(1533.1217, device='cuda:4')
RAWLOSS @ 1000 tensor(1397.4106, device='cuda:3')
RAWLOSS @ 1000 tensor(1990.2612, device='cuda:6')
RAWLOSS @ 1000 tensor(1071.9924, device='cuda:0')
RAWLOSS @ 1000 tensor(1775.7512, device='cuda:5')
RAWLOSS @ 1000 tensor(1839.6976, device='cuda:1')
2020-07-20 20:35:49 | INFO | train_inner | epoch 001: 1000 / 81036 loss=10.425, ppl=1374.46, wps=11259.9, ups=5.98, wpb=1882.3, bsz=64, num_updates=1000, lr=1e-05, gnorm=1.791, loss_scale=128, train_wall=17, wall=215
RAWLOSS @ 1100 tensor(1746.0033, device='cuda:2')
RAWLOSS @ 1100 tensor(1465.9130, device='cuda:5')
RAWLOSS @ 1100 tensor(1744.7491, device='cuda:6')
RAWLOSS @ 1100 tensor(1423.1323, device='cuda:3')
RAWLOSS @ 1100 tensor(1854.6252, device='cuda:1')
RAWLOSS @ 1100 tensor(1649.8682, device='cuda:7')
RAWLOSS @ 1100 tensor(1902.0649, device='cuda:0')
RAWLOSS @ 1100 tensor(1729.4275, device='cuda:4')
RAWLOSS @ 1200 tensor(1851.4042, device='cuda:1')
RAWLOSS @ 1200 tensor(2067.7593, device='cuda:5')
RAWLOSS @ 1200 tensor(1922.0482, device='cuda:6')
RAWLOSS @ 1200 tensor(1496.4819, device='cuda:0')
RAWLOSS @ 1200 tensor(2652.0574, device='cuda:2')
RAWLOSS @ 1200 tensor(1120.8887, device='cuda:3')
RAWLOSS @ 1200 tensor(787.6282, device='cuda:7')
RAWLOSS @ 1200 tensor(1216.9780, device='cuda:4')
RAWLOSS @ 1300 tensor(3905.0183, device='cuda:0')
RAWLOSS @ 1300 tensor(1987.4576, device='cuda:1')
RAWLOSS @ 1300 tensor(1555.0515, device='cuda:5')
RAWLOSS @ 1300 tensor(1986.0359, device='cuda:3')
RAWLOSS @ 1300 tensor(1688.5969, device='cuda:2')
RAWLOSS @ 1300 tensor(1266.9993, device='cuda:7')
RAWLOSS @ 1300 tensor(1950.9441, device='cuda:4')
RAWLOSS @ 1300 tensor(1934.2272, device='cuda:6')
RAWLOSS @ 1400 tensor(1763.2106, device='cuda:0')
RAWLOSS @ 1400 tensor(1382.3062, device='cuda:1')
RAWLOSS @ 1400 tensor(1663.3346, device='cuda:3')
RAWLOSS @ 1400 tensor(2144.2966, device='cuda:2')
RAWLOSS @ 1400 tensor(1423.8508, device='cuda:6')
RAWLOSS @ 1400 tensor(2353.2927, device='cuda:5')
RAWLOSS @ 1400 tensor(1797.4767, device='cuda:4')
RAWLOSS @ 1400 tensor(1752.0325, device='cuda:7')
RAWLOSS @ 1500 tensor(1589.7679, device='cuda:3')
RAWLOSS @ 1500 tensor(1606.0082, device='cuda:2')
RAWLOSS @ 1500 tensor(1295.9928, device='cuda:6')
RAWLOSS @ 1500 tensor(1770.3326, device='cuda:0')
RAWLOSS @ 1500 tensor(1411.9240, device='cuda:5')
RAWLOSS @ 1500 tensor(1495.4089, device='cuda:1')
RAWLOSS @ 1500 tensor(1556.7166, device='cuda:4')
RAWLOSS @ 1500 tensor(1429.4736, device='cuda:7')
RAWLOSS @ 1600 tensor(1415.7618, device='cuda:2')
RAWLOSS @ 1600 tensor(1043.6442, device='cuda:5')
RAWLOSS @ 1600 tensor(2576.0803, device='cuda:6')
RAWLOSS @ 1600 tensor(2111.9097, device='cuda:0')
RAWLOSS @ 1600 tensor(1848.3643, device='cuda:3')
RAWLOSS @ 1600 tensor(2283.4233, device='cuda:7')
RAWLOSS @ 1600 tensor(1435.9967, device='cuda:4')
RAWLOSS @ 1600 tensor(1574.2908, device='cuda:1')
RAWLOSS @ 1700 tensor(1510.4642, device='cuda:6')
RAWLOSS @ 1700 tensor(1805.7920, device='cuda:0')
RAWLOSS @ 1700 tensor(1852.4142, device='cuda:2')
RAWLOSS @ 1700 tensor(2197.0974, device='cuda:3')
RAWLOSS @ 1700 tensor(1595.3687, device='cuda:5')
RAWLOSS @ 1700 tensor(1582.0337, device='cuda:4')
RAWLOSS @ 1700 tensor(1389.5233, device='cuda:1')
RAWLOSS @ 1700 tensor(1680.5702, device='cuda:7')
RAWLOSS @ 1800 tensor(1313.8553, device='cuda:0')
RAWLOSS @ 1800 tensor(1477.1389, device='cuda:6')
RAWLOSS @ 1800 tensor(1534.0729, device='cuda:3')
RAWLOSS @ 1800 tensor(2022.6931, device='cuda:5')
RAWLOSS @ 1800 tensor(1315.9595, device='cuda:2')
RAWLOSS @ 1800 tensor(2442.5000, device='cuda:1')
RAWLOSS @ 1800 tensor(1580.4133, device='cuda:7')
RAWLOSS @ 1800 tensor(1386.3921, device='cuda:4')
2020-07-20 20:36:23 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 10.362 | ppl 1316 | wps 47581.5 | wpb 1889.5 | bsz 64 | num_updates 1000
2020-07-20 20:36:36 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints/checkpoint_1_1000.pt (epoch 1 @ 1000 updates, score 10.362) (writing took 12.745748163026292 seconds)
RAWLOSS @ 1900 tensor(1580.1207, device='cuda:4')
RAWLOSS @ 1900 tensor(1409.8403, device='cuda:3')
RAWLOSS @ 1900 tensor(1605.6798, device='cuda:5')
RAWLOSS @ 1900 tensor(1447.5181, device='cuda:6')
RAWLOSS @ 1900 tensor(1500.7025, device='cuda:7')
RAWLOSS @ 1900 tensor(950.2123, device='cuda:2')
RAWLOSS @ 1900 tensor(2181.9126, device='cuda:0')
RAWLOSS @ 1900 tensor(976.0250, device='cuda:1')
2020-07-20 20:36:52 | INFO | train_inner | epoch 001: 1100 / 81036 loss=10.315, ppl=1274.05, wps=3011.9, ups=1.58, wpb=1909.1, bsz=64, num_updates=1100, lr=1.1e-05, gnorm=1.936, loss_scale=128, train_wall=17, wall=278
RAWLOSS @ 2000 tensor(1178.6830, device='cuda:5')
RAWLOSS @ 2000 tensor(1364.5532, device='cuda:4')
RAWLOSS @ 2000 tensor(1768.2903, device='cuda:1')
RAWLOSS @ 2000 tensor(1122.2102, device='cuda:0')
RAWLOSS @ 2000 tensor(1503.8456, device='cuda:2')
RAWLOSS @ 2000 tensor(1366.4176, device='cuda:6')
RAWLOSS @ 2000 tensor(1673.8319, device='cuda:7')
RAWLOSS @ 2000 tensor(1711.5432, device='cuda:3')
2020-07-20 20:37:09 | INFO | train_inner | epoch 001: 1200 / 81036 loss=10.137, ppl=1126.16, wps=11146.6, ups=5.98, wpb=1864.6, bsz=64, num_updates=1200, lr=1.2e-05, gnorm=1.862, loss_scale=128, train_wall=17, wall=295
RAWLOSS @ 2100 tensor(1650.2241, device='cuda:4')
RAWLOSS @ 2100 tensor(1770.8967, device='cuda:1')
RAWLOSS @ 2100 tensor(1712.5360, device='cuda:0')
RAWLOSS @ 2100 tensor(1658.7856, device='cuda:5')
RAWLOSS @ 2100 tensor(1504.3652, device='cuda:2')
RAWLOSS @ 2100 tensor(1645.2926, device='cuda:3')
RAWLOSS @ 2100 tensor(1804.2935, device='cuda:7')
RAWLOSS @ 2100 tensor(2189.3533, device='cuda:6')
2020-07-20 20:37:26 | INFO | train_inner | epoch 001: 1300 / 81036 loss=10.05, ppl=1060.33, wps=11226.8, ups=5.97, wpb=1881.1, bsz=64, num_updates=1300, lr=1.3e-05, gnorm=1.949, loss_scale=128, train_wall=17, wall=311
RAWLOSS @ 2200 tensor(1816.4875, device='cuda:5')
RAWLOSS @ 2200 tensor(1509.4227, device='cuda:4')
RAWLOSS @ 2200 tensor(741.3441, device='cuda:0')
RAWLOSS @ 2200 tensor(1460.7803, device='cuda:1')
RAWLOSS @ 2200 tensor(1397.9740, device='cuda:7')
RAWLOSS @ 2200 tensor(1625.7877, device='cuda:2')
RAWLOSS @ 2200 tensor(1306.2498, device='cuda:3')
RAWLOSS @ 2200 tensor(1712.5863, device='cuda:6')
2020-07-20 20:37:43 | INFO | train_inner | epoch 001: 1400 / 81036 loss=9.963, ppl=998.15, wps=11440.5, ups=5.98, wpb=1913.4, bsz=64, num_updates=1400, lr=1.4e-05, gnorm=2.015, loss_scale=128, train_wall=17, wall=328
RAWLOSS @ 2300 tensor(2044.0695, device='cuda:7')
RAWLOSS @ 2300 tensor(1896.6611, device='cuda:0')
RAWLOSS @ 2300 tensor(1919.1024, device='cuda:4')
RAWLOSS @ 2300 tensor(1033.1302, device='cuda:1')
RAWLOSS @ 2300 tensor(1281.7118, device='cuda:6')
RAWLOSS @ 2300 tensor(1770.1635, device='cuda:2')
RAWLOSS @ 2300 tensor(1356.5090, device='cuda:3')
RAWLOSS @ 2300 tensor(1256.1304, device='cuda:5')
2020-07-20 20:37:59 | INFO | train_inner | epoch 001: 1500 / 81036 loss=9.861, ppl=929.67, wps=11270.3, ups=5.97, wpb=1887, bsz=64, num_updates=1500, lr=1.5e-05, gnorm=2.064, loss_scale=128, train_wall=17, wall=345
RAWLOSS @ 2400 tensor(2349.6655, device='cuda:7')
RAWLOSS @ 2400 tensor(890.1895, device='cuda:2')
RAWLOSS @ 2400 tensor(1897.6302, device='cuda:1')
RAWLOSS @ 2400 tensor(1075.7246, device='cuda:5')
RAWLOSS @ 2400 tensor(1585.5381, device='cuda:3')
RAWLOSS @ 2400 tensor(1568.6381, device='cuda:6')
RAWLOSS @ 2400 tensor(2199.2390, device='cuda:0')
RAWLOSS @ 2400 tensor(1548.0806, device='cuda:4')
2020-07-20 20:38:16 | INFO | train_inner | epoch 001: 1600 / 81036 loss=9.776, ppl=876.55, wps=11203.2, ups=5.95, wpb=1883.9, bsz=64, num_updates=1600, lr=1.6e-05, gnorm=2.145, loss_scale=128, train_wall=17, wall=362
RAWLOSS @ 2500 tensor(1362.4688, device='cuda:3')
RAWLOSS @ 2500 tensor(962.4784, device='cuda:1')
RAWLOSS @ 2500 tensor(1649.5826, device='cuda:5')
RAWLOSS @ 2500 tensor(1532.8005, device='cuda:6')
RAWLOSS @ 2500 tensor(1415.2178, device='cuda:0')
RAWLOSS @ 2500 tensor(1903.2310, device='cuda:7')
RAWLOSS @ 2500 tensor(1041.7327, device='cuda:2')
RAWLOSS @ 2500 tensor(1357.4067, device='cuda:4')
2020-07-20 20:38:33 | INFO | train_inner | epoch 001: 1700 / 81036 loss=9.664, ppl=811.43, wps=11249.4, ups=6.01, wpb=1872.5, bsz=64, num_updates=1700, lr=1.7e-05, gnorm=2.207, loss_scale=128, train_wall=16, wall=378
RAWLOSS @ 2600 tensor(1551.7859, device='cuda:1')
RAWLOSS @ 2600 tensor(1433.5659, device='cuda:3')
RAWLOSS @ 2600 tensor(2004.1361, device='cuda:5')
RAWLOSS @ 2600 tensor(1738.2325, device='cuda:0')
RAWLOSS @ 2600 tensor(1881.1885, device='cuda:4')
RAWLOSS @ 2600 tensor(1511.2999, device='cuda:6')
RAWLOSS @ 2600 tensor(861.6576, device='cuda:2')
RAWLOSS @ 2600 tensor(1117.5820, device='cuda:7')
2020-07-20 20:38:49 | INFO | train_inner | epoch 001: 1800 / 81036 loss=9.567, ppl=758.49, wps=11347.2, ups=5.99, wpb=1893.9, bsz=64, num_updates=1800, lr=1.8e-05, gnorm=2.122, loss_scale=128, train_wall=17, wall=395
RAWLOSS @ 2700 tensor(1853.3322, device='cuda:1')
RAWLOSS @ 2700 tensor(1929.5304, device='cuda:3')
RAWLOSS @ 2700 tensor(1847.4696, device='cuda:5')
RAWLOSS @ 2700 tensor(1887.1577, device='cuda:6')
RAWLOSS @ 2700 tensor(1354.1272, device='cuda:0')
RAWLOSS @ 2700 tensor(968.2283, device='cuda:7')
RAWLOSS @ 2700 tensor(1764.6019, device='cuda:4')
RAWLOSS @ 2700 tensor(1323.1575, device='cuda:2')
2020-07-20 20:39:06 | INFO | train_inner | epoch 001: 1900 / 81036 loss=9.467, ppl=707.6, wps=11510, ups=6.01, wpb=1916.2, bsz=64, num_updates=1900, lr=1.9e-05, gnorm=2.308, loss_scale=128, train_wall=16, wall=412
RAWLOSS @ 2800 tensor(1247.3906, device='cuda:5')
RAWLOSS @ 2800 tensor(1179.4261, device='cuda:1')
RAWLOSS @ 2800 tensor(1428.7751, device='cuda:0')
RAWLOSS @ 2800 tensor(1387.0508, device='cuda:6')
RAWLOSS @ 2800 tensor(1970.2676, device='cuda:2')
RAWLOSS @ 2800 tensor(1810.1782, device='cuda:7')
RAWLOSS @ 2800 tensor(1260.3325, device='cuda:4')
RAWLOSS @ 2800 tensor(1511.3676, device='cuda:3')
2020-07-20 20:39:23 | INFO | train_inner | epoch 001: 2000 / 81036 loss=9.34, ppl=647.9, wps=11348.1, ups=6.01, wpb=1888.8, bsz=64, num_updates=2000, lr=2e-05, gnorm=2.282, loss_scale=128, train_wall=16, wall=428
RAWLOSS @ 2900 tensor(1722.6688, device='cuda:5')
RAWLOSS @ 2900 tensor(1175.4904, device='cuda:1')
RAWLOSS @ 2900 tensor(2130.2017, device='cuda:3')
RAWLOSS @ 2900 tensor(881.3087, device='cuda:6')
RAWLOSS @ 2900 tensor(1670.6604, device='cuda:7')
RAWLOSS @ 2900 tensor(1291.2537, device='cuda:4')
RAWLOSS @ 2900 tensor(683.1188, device='cuda:0')
RAWLOSS @ 2900 tensor(1186.7698, device='cuda:2')
RAWLOSS @ 3000 tensor(1373.8418, device='cuda:5')
RAWLOSS @ 3000 tensor(2093.9600, device='cuda:1')
RAWLOSS @ 3000 tensor(1185.1798, device='cuda:4')
RAWLOSS @ 3000 tensor(1307.9381, device='cuda:3')
RAWLOSS @ 3000 tensor(1613.1339, device='cuda:6')
RAWLOSS @ 3000 tensor(1601.0812, device='cuda:2')
RAWLOSS @ 3000 tensor(1617.2799, device='cuda:7')
RAWLOSS @ 3000 tensor(1396.6063, device='cuda:0')
RAWLOSS @ 3100 tensor(1404.9858, device='cuda:1')
RAWLOSS @ 3100 tensor(1441.7434, device='cuda:5')
RAWLOSS @ 3100 tensor(1369.3285, device='cuda:4')
RAWLOSS @ 3100 tensor(1614.8783, device='cuda:6')
RAWLOSS @ 3100 tensor(1212.1155, device='cuda:3')
RAWLOSS @ 3100 tensor(1636.4840, device='cuda:2')
RAWLOSS @ 3100 tensor(1415.3347, device='cuda:7')
RAWLOSS @ 3100 tensor(1290.0740, device='cuda:0')
RAWLOSS @ 3200 tensor(1189.4517, device='cuda:1')
RAWLOSS @ 3200 tensor(1499.7421, device='cuda:5')
RAWLOSS @ 3200 tensor(837.5268, device='cuda:3')
RAWLOSS @ 3200 tensor(1652.7386, device='cuda:2')
RAWLOSS @ 3200 tensor(1908.1565, device='cuda:6')
RAWLOSS @ 3200 tensor(1324.9556, device='cuda:4')
RAWLOSS @ 3200 tensor(1531.9482, device='cuda:7')
RAWLOSS @ 3200 tensor(1667.3102, device='cuda:0')
RAWLOSS @ 3300 tensor(955.9008, device='cuda:1')
RAWLOSS @ 3300 tensor(1278.3064, device='cuda:5')
RAWLOSS @ 3300 tensor(1249.6583, device='cuda:4')
RAWLOSS @ 3300 tensor(824.5538, device='cuda:2')
RAWLOSS @ 3300 tensor(1414.2592, device='cuda:3')
RAWLOSS @ 3300 tensor(1852.9918, device='cuda:6')
RAWLOSS @ 3300 tensor(1473.3435, device='cuda:7')
RAWLOSS @ 3300 tensor(1417.2756, device='cuda:0')
RAWLOSS @ 3400 tensor(1432.7677, device='cuda:5')
RAWLOSS @ 3400 tensor(1330.4747, device='cuda:1')
RAWLOSS @ 3400 tensor(1269.7737, device='cuda:4')
RAWLOSS @ 3400 tensor(1708.8308, device='cuda:2')
RAWLOSS @ 3400 tensor(1607.6464, device='cuda:3')
RAWLOSS @ 3400 tensor(1740.2078, device='cuda:6')
RAWLOSS @ 3400 tensor(1880.3925, device='cuda:7')
RAWLOSS @ 3400 tensor(1808.4706, device='cuda:0')
RAWLOSS @ 3500 tensor(1775.8573, device='cuda:1')
RAWLOSS @ 3500 tensor(1932.1007, device='cuda:5')
RAWLOSS @ 3500 tensor(1310.2703, device='cuda:2')
RAWLOSS @ 3500 tensor(1365.3604, device='cuda:4')
RAWLOSS @ 3500 tensor(1142.5303, device='cuda:3')
RAWLOSS @ 3500 tensor(1385.8608, device='cuda:6')
RAWLOSS @ 3500 tensor(1701.1370, device='cuda:7')
RAWLOSS @ 3500 tensor(1393.0564, device='cuda:0')
RAWLOSS @ 3600 tensor(1637.2332, device='cuda:3')
RAWLOSS @ 3600 tensor(1755.6888, device='cuda:5')
RAWLOSS @ 3600 tensor(1144.6475, device='cuda:1')
RAWLOSS @ 3600 tensor(1148.7426, device='cuda:7')
RAWLOSS @ 3600 tensor(1528.7803, device='cuda:0')
RAWLOSS @ 3600 tensor(1392.0120, device='cuda:2')
RAWLOSS @ 3600 tensor(1250.3655, device='cuda:6')
RAWLOSS @ 3600 tensor(1317.1091, device='cuda:4')
2020-07-20 20:39:56 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 9.308 | ppl 633.84 | wps 47400.8 | wpb 1889.5 | bsz 64 | num_updates 2000 | best_loss 9.308