@mrdrozdov
Created July 12, 2017 22:27
YellowFin Sanity Check
# test_load_save=False
python main.py --emsize 30 --nhid 30 --dropout 0.5 --epochs 40 --tied --opt_method=YF --logdir=../logs --log-interval 1
using YF
| epoch 1 | 1/ 1327 batches | lr 20.00 | ms/batch 575.36 | loss 18.41 | ppl 99009707.08
| epoch 1 | 2/ 1327 batches | lr 20.00 | ms/batch 290.29 | loss 9.18 | ppl 9712.73
| epoch 1 | 3/ 1327 batches | lr 20.00 | ms/batch 289.18 | loss 9.16 | ppl 9555.24
| epoch 1 | 4/ 1327 batches | lr 20.00 | ms/batch 283.89 | loss 9.15 | ppl 9380.19
| epoch 1 | 5/ 1327 batches | lr 20.00 | ms/batch 284.20 | loss 9.13 | ppl 9218.35
| epoch 1 | 6/ 1327 batches | lr 20.00 | ms/batch 285.12 | loss 9.11 | ppl 9029.46
| epoch 1 | 7/ 1327 batches | lr 20.00 | ms/batch 291.18 | loss 9.09 | ppl 8834.21
| epoch 1 | 8/ 1327 batches | lr 20.00 | ms/batch 286.67 | loss 9.07 | ppl 8685.38
| epoch 1 | 9/ 1327 batches | lr 20.00 | ms/batch 279.19 | loss 9.03 | ppl 8330.73
| epoch 1 | 10/ 1327 batches | lr 20.00 | ms/batch 296.91 | loss 9.01 | ppl 8211.29
| epoch 1 | 11/ 1327 batches | lr 20.00 | ms/batch 301.63 | loss 8.96 | ppl 7808.55
| epoch 1 | 12/ 1327 batches | lr 20.00 | ms/batch 295.98 | loss 8.92 | ppl 7447.74
| epoch 1 | 13/ 1327 batches | lr 20.00 | ms/batch 366.19 | loss 8.86 | ppl 7077.75
| epoch 1 | 14/ 1327 batches | lr 20.00 | ms/batch 309.45 | loss 8.84 | ppl 6875.61
| epoch 1 | 15/ 1327 batches | lr 20.00 | ms/batch 309.26 | loss 8.81 | ppl 6690.71
| epoch 1 | 16/ 1327 batches | lr 20.00 | ms/batch 293.60 | loss 8.70 | ppl 6030.88
| epoch 1 | 17/ 1327 batches | lr 20.00 | ms/batch 350.34 | loss 8.67 | ppl 5814.34
| epoch 1 | 18/ 1327 batches | lr 20.00 | ms/batch 375.11 | loss 8.45 | ppl 4694.37
| epoch 1 | 19/ 1327 batches | lr 20.00 | ms/batch 405.12 | loss 8.37 | ppl 4314.87
| epoch 1 | 20/ 1327 batches | lr 20.00 | ms/batch 293.13 | loss 8.19 | ppl 3616.07
| epoch 1 | 21/ 1327 batches | lr 20.00 | ms/batch 355.02 | loss 8.19 | ppl 3606.05
| epoch 1 | 22/ 1327 batches | lr 20.00 | ms/batch 312.23 | loss 8.13 | ppl 3386.94
| epoch 1 | 23/ 1327 batches | lr 20.00 | ms/batch 310.66 | loss 7.90 | ppl 2694.39
| epoch 1 | 24/ 1327 batches | lr 20.00 | ms/batch 305.29 | loss 7.77 | ppl 2371.22
| epoch 1 | 25/ 1327 batches | lr 20.00 | ms/batch 324.08 | loss 7.63 | ppl 2063.99
| epoch 1 | 26/ 1327 batches | lr 20.00 | ms/batch 318.18 | loss 7.65 | ppl 2110.36
| epoch 1 | 27/ 1327 batches | lr 20.00 | ms/batch 328.10 | loss 7.50 | ppl 1813.92
| epoch 1 | 28/ 1327 batches | lr 20.00 | ms/batch 339.00 | loss 7.67 | ppl 2144.47
| epoch 1 | 29/ 1327 batches | lr 20.00 | ms/batch 325.19 | loss 7.51 | ppl 1831.30
| epoch 1 | 30/ 1327 batches | lr 20.00 | ms/batch 291.31 | loss 7.41 | ppl 1651.22
| epoch 1 | 31/ 1327 batches | lr 20.00 | ms/batch 347.63 | loss 7.50 | ppl 1806.91
| epoch 1 | 32/ 1327 batches | lr 20.00 | ms/batch 336.38 | loss 7.52 | ppl 1851.49
| epoch 1 | 33/ 1327 batches | lr 20.00 | ms/batch 319.41 | loss 7.43 | ppl 1679.79
| epoch 1 | 34/ 1327 batches | lr 20.00 | ms/batch 315.77 | loss 7.49 | ppl 1794.24
| epoch 1 | 35/ 1327 batches | lr 20.00 | ms/batch 305.14 | loss 7.50 | ppl 1802.84
| epoch 1 | 36/ 1327 batches | lr 20.00 | ms/batch 282.60 | loss 7.41 | ppl 1654.92
| epoch 1 | 37/ 1327 batches | lr 20.00 | ms/batch 292.92 | loss 7.23 | ppl 1384.92
| epoch 1 | 38/ 1327 batches | lr 20.00 | ms/batch 287.21 | loss 7.52 | ppl 1836.03
| epoch 1 | 39/ 1327 batches | lr 20.00 | ms/batch 287.35 | loss 7.31 | ppl 1502.53
| epoch 1 | 40/ 1327 batches | lr 20.00 | ms/batch 299.57 | loss 7.37 | ppl 1595.39
| epoch 1 | 41/ 1327 batches | lr 20.00 | ms/batch 300.11 | loss 7.31 | ppl 1487.75
| epoch 1 | 42/ 1327 batches | lr 20.00 | ms/batch 292.93 | loss 7.07 | ppl 1174.24
| epoch 1 | 43/ 1327 batches | lr 20.00 | ms/batch 293.56 | loss 7.27 | ppl 1443.24
| epoch 1 | 44/ 1327 batches | lr 20.00 | ms/batch 288.28 | loss 7.29 | ppl 1462.21
| epoch 1 | 45/ 1327 batches | lr 20.00 | ms/batch 295.02 | loss 7.16 | ppl 1288.47
| epoch 1 | 46/ 1327 batches | lr 20.00 | ms/batch 297.78 | loss 7.15 | ppl 1268.90
| epoch 1 | 47/ 1327 batches | lr 20.00 | ms/batch 292.76 | loss 7.10 | ppl 1206.12
| epoch 1 | 48/ 1327 batches | lr 20.00 | ms/batch 287.46 | loss 7.19 | ppl 1326.62
| epoch 1 | 49/ 1327 batches | lr 20.00 | ms/batch 288.45 | loss 7.20 | ppl 1345.43
| epoch 1 | 50/ 1327 batches | lr 20.00 | ms/batch 287.69 | loss 7.12 | ppl 1237.82
| epoch 1 | 51/ 1327 batches | lr 20.00 | ms/batch 292.82 | loss 7.17 | ppl 1302.12
| epoch 1 | 52/ 1327 batches | lr 20.00 | ms/batch 302.54 | loss 7.09 | ppl 1202.17
| epoch 1 | 53/ 1327 batches | lr 20.00 | ms/batch 295.60 | loss 7.05 | ppl 1148.25
| epoch 1 | 54/ 1327 batches | lr 20.00 | ms/batch 300.51 | loss 7.26 | ppl 1417.68
| epoch 1 | 55/ 1327 batches | lr 20.00 | ms/batch 291.84 | loss 7.19 | ppl 1326.38
| epoch 1 | 56/ 1327 batches | lr 20.00 | ms/batch 287.17 | loss 7.16 | ppl 1289.11
| epoch 1 | 57/ 1327 batches | lr 20.00 | ms/batch 291.70 | loss 6.98 | ppl 1074.15
| epoch 1 | 58/ 1327 batches | lr 20.00 | ms/batch 287.47 | loss 7.10 | ppl 1208.53
| epoch 1 | 59/ 1327 batches | lr 20.00 | ms/batch 286.23 | loss 7.08 | ppl 1191.17
| epoch 1 | 60/ 1327 batches | lr 20.00 | ms/batch 288.38 | loss 7.12 | ppl 1236.40
| epoch 1 | 61/ 1327 batches | lr 20.00 | ms/batch 288.13 | loss 7.01 | ppl 1108.72
| epoch 1 | 62/ 1327 batches | lr 20.00 | ms/batch 288.30 | loss 6.89 | ppl 978.45
| epoch 1 | 63/ 1327 batches | lr 20.00 | ms/batch 285.88 | loss 7.01 | ppl 1112.83
| epoch 1 | 64/ 1327 batches | lr 20.00 | ms/batch 288.63 | loss 7.08 | ppl 1191.95
| epoch 1 | 65/ 1327 batches | lr 20.00 | ms/batch 286.54 | loss 6.73 | ppl 840.33
| epoch 1 | 66/ 1327 batches | lr 20.00 | ms/batch 288.32 | loss 6.82 | ppl 916.58
| epoch 1 | 67/ 1327 batches | lr 20.00 | ms/batch 284.62 | loss 6.97 | ppl 1062.09
| epoch 1 | 68/ 1327 batches | lr 20.00 | ms/batch 289.87 | loss 7.06 | ppl 1166.66
| epoch 1 | 69/ 1327 batches | lr 20.00 | ms/batch 286.48 | loss 7.01 | ppl 1108.07
| epoch 1 | 70/ 1327 batches | lr 20.00 | ms/batch 300.43 | loss 6.99 | ppl 1088.71
| epoch 1 | 71/ 1327 batches | lr 20.00 | ms/batch 285.63 | loss 7.14 | ppl 1266.55
| epoch 1 | 72/ 1327 batches | lr 20.00 | ms/batch 290.07 | loss 7.15 | ppl 1279.94
| epoch 1 | 73/ 1327 batches | lr 20.00 | ms/batch 294.98 | loss 6.95 | ppl 1045.51
| epoch 1 | 74/ 1327 batches | lr 20.00 | ms/batch 297.10 | loss 6.96 | ppl 1056.39
| epoch 1 | 75/ 1327 batches | lr 20.00 | ms/batch 297.83 | loss 7.14 | ppl 1267.06
| epoch 1 | 76/ 1327 batches | lr 20.00 | ms/batch 299.09 | loss 6.93 | ppl 1020.15
| epoch 1 | 77/ 1327 batches | lr 20.00 | ms/batch 292.40 | loss 6.94 | ppl 1027.99
| epoch 1 | 78/ 1327 batches | lr 20.00 | ms/batch 320.14 | loss 7.01 | ppl 1103.85
| epoch 1 | 79/ 1327 batches | lr 20.00 | ms/batch 315.60 | loss 6.95 | ppl 1042.96
| epoch 1 | 80/ 1327 batches | lr 20.00 | ms/batch 303.31 | loss 6.85 | ppl 944.30
| epoch 1 | 81/ 1327 batches | lr 20.00 | ms/batch 289.75 | loss 6.90 | ppl 990.23
| epoch 1 | 82/ 1327 batches | lr 20.00 | ms/batch 289.67 | loss 6.75 | ppl 857.66
| epoch 1 | 83/ 1327 batches | lr 20.00 | ms/batch 292.73 | loss 6.96 | ppl 1050.99
| epoch 1 | 84/ 1327 batches | lr 20.00 | ms/batch 295.31 | loss 7.02 | ppl 1119.68
| epoch 1 | 85/ 1327 batches | lr 20.00 | ms/batch 302.74 | loss 6.92 | ppl 1009.62
| epoch 1 | 86/ 1327 batches | lr 20.00 | ms/batch 298.69 | loss 6.71 | ppl 822.79
| epoch 1 | 87/ 1327 batches | lr 20.00 | ms/batch 292.40 | loss 6.98 | ppl 1073.88
| epoch 1 | 88/ 1327 batches | lr 20.00 | ms/batch 291.11 | loss 6.93 | ppl 1026.40
| epoch 1 | 89/ 1327 batches | lr 20.00 | ms/batch 286.89 | loss 6.99 | ppl 1082.17
| epoch 1 | 90/ 1327 batches | lr 20.00 | ms/batch 293.64 | loss 6.96 | ppl 1056.87
| epoch 1 | 91/ 1327 batches | lr 20.00 | ms/batch 292.68 | loss 6.92 | ppl 1010.41
| epoch 1 | 92/ 1327 batches | lr 20.00 | ms/batch 286.69 | loss 6.78 | ppl 880.36
| epoch 1 | 93/ 1327 batches | lr 20.00 | ms/batch 288.30 | loss 6.86 | ppl 950.37
| epoch 1 | 94/ 1327 batches | lr 20.00 | ms/batch 287.01 | loss 6.84 | ppl 931.56
| epoch 1 | 95/ 1327 batches | lr 20.00 | ms/batch 289.05 | loss 6.89 | ppl 977.51
| epoch 1 | 96/ 1327 batches | lr 20.00 | ms/batch 291.15 | loss 6.70 | ppl 812.95
| epoch 1 | 97/ 1327 batches | lr 20.00 | ms/batch 288.41 | loss 6.88 | ppl 975.57
| epoch 1 | 98/ 1327 batches | lr 20.00 | ms/batch 287.37 | loss 7.09 | ppl 1202.25
| epoch 1 | 99/ 1327 batches | lr 20.00 | ms/batch 290.92 | loss 6.82 | ppl 917.02
| epoch 1 | 100/ 1327 batches | lr 20.00 | ms/batch 286.74 | loss 6.74 | ppl 847.86
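A quick way to confirm the log is internally consistent: the `ppl` column looks like `exp(loss)` up to the two-decimal rounding of the printed loss. The sketch below checks that relation on a single log line. The regex, the helper name `ppl_matches_loss`, and the 1% tolerance are assumptions made for illustration and are not taken from `main.py`.

```python
import math
import re

# Assumed layout of the per-batch lines shown above: "... | loss X | ppl Y".
LINE_RE = re.compile(r"loss\s+([\d.]+)\s+\|\s+ppl\s+([\d.]+)")


def ppl_matches_loss(line, rel_tol=1e-2):
    """Return True if ppl ~= exp(loss) on this line, None if the line has no loss/ppl.

    The loss is printed with two decimals, so exp(loss) can be off by
    roughly 0.5%; a 1% relative tolerance absorbs that rounding.
    """
    match = LINE_RE.search(line)
    if match is None:
        return None
    loss, ppl = float(match.group(1)), float(match.group(2))
    return math.isclose(math.exp(loss), ppl, rel_tol=rel_tol)


if __name__ == "__main__":
    sample = "| epoch 1 | 2/ 1327 batches | lr 20.00 | ms/batch 290.29 | loss 9.18 | ppl 9712.73"
    print(ppl_matches_loss(sample))
```

Lines that are not per-batch records (the command line, `using YF`) return `None` and can simply be skipped.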
# test_load_save=True
python main.py --emsize 30 --nhid 30 --dropout 0.5 --epochs 40 --tied --opt_method=YF --logdir=../logs --log-interval 1 --test-load-save
using YF
| epoch 1 | 1/ 1327 batches | lr 20.00 | ms/batch 594.00 | loss 18.41 | ppl 99009707.08
| epoch 1 | 2/ 1327 batches | lr 20.00 | ms/batch 295.36 | loss 9.18 | ppl 9712.73
| epoch 1 | 3/ 1327 batches | lr 20.00 | ms/batch 297.41 | loss 9.16 | ppl 9555.24
| epoch 1 | 4/ 1327 batches | lr 20.00 | ms/batch 294.90 | loss 9.15 | ppl 9380.19
| epoch 1 | 5/ 1327 batches | lr 20.00 | ms/batch 292.77 | loss 9.13 | ppl 9218.35
| epoch 1 | 6/ 1327 batches | lr 20.00 | ms/batch 296.21 | loss 9.11 | ppl 9029.46
| epoch 1 | 7/ 1327 batches | lr 20.00 | ms/batch 305.16 | loss 9.09 | ppl 8834.21
| epoch 1 | 8/ 1327 batches | lr 20.00 | ms/batch 300.87 | loss 9.07 | ppl 8685.38
| epoch 1 | 9/ 1327 batches | lr 20.00 | ms/batch 299.93 | loss 9.03 | ppl 8330.73
| epoch 1 | 10/ 1327 batches | lr 20.00 | ms/batch 298.74 | loss 9.01 | ppl 8211.29
| epoch 1 | 11/ 1327 batches | lr 20.00 | ms/batch 303.62 | loss 8.96 | ppl 7808.55
| epoch 1 | 12/ 1327 batches | lr 20.00 | ms/batch 296.76 | loss 8.92 | ppl 7447.74
| epoch 1 | 13/ 1327 batches | lr 20.00 | ms/batch 299.31 | loss 8.86 | ppl 7077.75
| epoch 1 | 14/ 1327 batches | lr 20.00 | ms/batch 298.60 | loss 8.84 | ppl 6875.61
| epoch 1 | 15/ 1327 batches | lr 20.00 | ms/batch 296.27 | loss 8.81 | ppl 6690.71
| epoch 1 | 16/ 1327 batches | lr 20.00 | ms/batch 296.60 | loss 8.70 | ppl 6030.88
| epoch 1 | 17/ 1327 batches | lr 20.00 | ms/batch 299.60 | loss 8.67 | ppl 5814.34
| epoch 1 | 18/ 1327 batches | lr 20.00 | ms/batch 303.28 | loss 8.45 | ppl 4694.37
| epoch 1 | 19/ 1327 batches | lr 20.00 | ms/batch 308.79 | loss 8.37 | ppl 4314.87
| epoch 1 | 20/ 1327 batches | lr 20.00 | ms/batch 299.36 | loss 8.19 | ppl 3616.07
| epoch 1 | 21/ 1327 batches | lr 20.00 | ms/batch 292.12 | loss 8.19 | ppl 3606.05
| epoch 1 | 22/ 1327 batches | lr 20.00 | ms/batch 291.01 | loss 8.13 | ppl 3386.94
| epoch 1 | 23/ 1327 batches | lr 20.00 | ms/batch 288.78 | loss 7.90 | ppl 2694.39
| epoch 1 | 24/ 1327 batches | lr 20.00 | ms/batch 287.31 | loss 7.77 | ppl 2371.22
| epoch 1 | 25/ 1327 batches | lr 20.00 | ms/batch 294.66 | loss 7.63 | ppl 2063.99
| epoch 1 | 26/ 1327 batches | lr 20.00 | ms/batch 291.24 | loss 7.65 | ppl 2110.36
| epoch 1 | 27/ 1327 batches | lr 20.00 | ms/batch 294.19 | loss 7.50 | ppl 1813.92
| epoch 1 | 28/ 1327 batches | lr 20.00 | ms/batch 295.10 | loss 7.67 | ppl 2144.47
| epoch 1 | 29/ 1327 batches | lr 20.00 | ms/batch 301.83 | loss 7.51 | ppl 1831.30
| epoch 1 | 30/ 1327 batches | lr 20.00 | ms/batch 339.28 | loss 7.41 | ppl 1651.22
| epoch 1 | 31/ 1327 batches | lr 20.00 | ms/batch 344.04 | loss 7.50 | ppl 1806.91
| epoch 1 | 32/ 1327 batches | lr 20.00 | ms/batch 298.47 | loss 7.52 | ppl 1851.49
| epoch 1 | 33/ 1327 batches | lr 20.00 | ms/batch 298.52 | loss 7.43 | ppl 1679.79
| epoch 1 | 34/ 1327 batches | lr 20.00 | ms/batch 297.91 | loss 7.49 | ppl 1794.24
| epoch 1 | 35/ 1327 batches | lr 20.00 | ms/batch 305.00 | loss 7.50 | ppl 1802.84
| epoch 1 | 36/ 1327 batches | lr 20.00 | ms/batch 292.06 | loss 7.41 | ppl 1654.92
| epoch 1 | 37/ 1327 batches | lr 20.00 | ms/batch 300.67 | loss 7.23 | ppl 1384.92
| epoch 1 | 38/ 1327 batches | lr 20.00 | ms/batch 335.58 | loss 7.52 | ppl 1836.03
| epoch 1 | 39/ 1327 batches | lr 20.00 | ms/batch 349.72 | loss 7.31 | ppl 1502.53
| epoch 1 | 40/ 1327 batches | lr 20.00 | ms/batch 339.67 | loss 7.37 | ppl 1595.39
| epoch 1 | 41/ 1327 batches | lr 20.00 | ms/batch 304.81 | loss 7.31 | ppl 1487.75
| epoch 1 | 42/ 1327 batches | lr 20.00 | ms/batch 299.14 | loss 7.07 | ppl 1174.24
| epoch 1 | 43/ 1327 batches | lr 20.00 | ms/batch 312.36 | loss 7.27 | ppl 1443.24
| epoch 1 | 44/ 1327 batches | lr 20.00 | ms/batch 322.90 | loss 7.29 | ppl 1462.21
| epoch 1 | 45/ 1327 batches | lr 20.00 | ms/batch 312.02 | loss 7.16 | ppl 1288.47
| epoch 1 | 46/ 1327 batches | lr 20.00 | ms/batch 322.67 | loss 7.15 | ppl 1268.90
| epoch 1 | 47/ 1327 batches | lr 20.00 | ms/batch 320.65 | loss 7.10 | ppl 1206.12
| epoch 1 | 48/ 1327 batches | lr 20.00 | ms/batch 330.47 | loss 7.19 | ppl 1326.62
| epoch 1 | 49/ 1327 batches | lr 20.00 | ms/batch 323.54 | loss 7.20 | ppl 1345.43
| epoch 1 | 50/ 1327 batches | lr 20.00 | ms/batch 328.53 | loss 7.12 | ppl 1237.82
| epoch 1 | 51/ 1327 batches | lr 20.00 | ms/batch 342.80 | loss 7.17 | ppl 1302.12
| epoch 1 | 52/ 1327 batches | lr 20.00 | ms/batch 330.60 | loss 7.09 | ppl 1202.17
| epoch 1 | 53/ 1327 batches | lr 20.00 | ms/batch 320.55 | loss 7.05 | ppl 1148.25
| epoch 1 | 54/ 1327 batches | lr 20.00 | ms/batch 349.01 | loss 7.26 | ppl 1417.68
| epoch 1 | 55/ 1327 batches | lr 20.00 | ms/batch 325.64 | loss 7.19 | ppl 1326.38
| epoch 1 | 56/ 1327 batches | lr 20.00 | ms/batch 313.87 | loss 7.16 | ppl 1289.11
| epoch 1 | 57/ 1327 batches | lr 20.00 | ms/batch 331.67 | loss 6.98 | ppl 1074.15
| epoch 1 | 58/ 1327 batches | lr 20.00 | ms/batch 329.33 | loss 7.10 | ppl 1208.53
| epoch 1 | 59/ 1327 batches | lr 20.00 | ms/batch 337.48 | loss 7.08 | ppl 1191.17
| epoch 1 | 60/ 1327 batches | lr 20.00 | ms/batch 343.75 | loss 7.12 | ppl 1236.40
| epoch 1 | 61/ 1327 batches | lr 20.00 | ms/batch 342.28 | loss 7.01 | ppl 1108.72
| epoch 1 | 62/ 1327 batches | lr 20.00 | ms/batch 320.60 | loss 6.89 | ppl 978.45
| epoch 1 | 63/ 1327 batches | lr 20.00 | ms/batch 316.88 | loss 7.01 | ppl 1112.83
| epoch 1 | 64/ 1327 batches | lr 20.00 | ms/batch 315.85 | loss 7.08 | ppl 1191.95
| epoch 1 | 65/ 1327 batches | lr 20.00 | ms/batch 322.99 | loss 6.73 | ppl 840.33
| epoch 1 | 66/ 1327 batches | lr 20.00 | ms/batch 324.03 | loss 6.82 | ppl 916.58
| epoch 1 | 67/ 1327 batches | lr 20.00 | ms/batch 335.64 | loss 6.97 | ppl 1062.09
| epoch 1 | 68/ 1327 batches | lr 20.00 | ms/batch 331.33 | loss 7.06 | ppl 1166.66
| epoch 1 | 69/ 1327 batches | lr 20.00 | ms/batch 329.21 | loss 7.01 | ppl 1108.07
| epoch 1 | 70/ 1327 batches | lr 20.00 | ms/batch 334.38 | loss 6.99 | ppl 1088.71
| epoch 1 | 71/ 1327 batches | lr 20.00 | ms/batch 327.60 | loss 7.14 | ppl 1266.55
| epoch 1 | 72/ 1327 batches | lr 20.00 | ms/batch 314.63 | loss 7.15 | ppl 1279.94
| epoch 1 | 73/ 1327 batches | lr 20.00 | ms/batch 339.22 | loss 6.95 | ppl 1045.51
| epoch 1 | 74/ 1327 batches | lr 20.00 | ms/batch 328.03 | loss 6.96 | ppl 1056.39
| epoch 1 | 75/ 1327 batches | lr 20.00 | ms/batch 333.54 | loss 7.14 | ppl 1267.06
| epoch 1 | 76/ 1327 batches | lr 20.00 | ms/batch 319.62 | loss 6.93 | ppl 1020.15
| epoch 1 | 77/ 1327 batches | lr 20.00 | ms/batch 312.65 | loss 6.94 | ppl 1027.99
| epoch 1 | 78/ 1327 batches | lr 20.00 | ms/batch 318.81 | loss 7.01 | ppl 1103.85
| epoch 1 | 79/ 1327 batches | lr 20.00 | ms/batch 330.12 | loss 6.95 | ppl 1042.96
| epoch 1 | 80/ 1327 batches | lr 20.00 | ms/batch 334.54 | loss 6.85 | ppl 944.30
| epoch 1 | 81/ 1327 batches | lr 20.00 | ms/batch 314.00 | loss 6.90 | ppl 990.23
| epoch 1 | 82/ 1327 batches | lr 20.00 | ms/batch 308.79 | loss 6.75 | ppl 857.66
| epoch 1 | 83/ 1327 batches | lr 20.00 | ms/batch 310.70 | loss 6.96 | ppl 1050.99
| epoch 1 | 84/ 1327 batches | lr 20.00 | ms/batch 315.65 | loss 7.02 | ppl 1119.68
| epoch 1 | 85/ 1327 batches | lr 20.00 | ms/batch 322.63 | loss 6.92 | ppl 1009.62
| epoch 1 | 86/ 1327 batches | lr 20.00 | ms/batch 322.94 | loss 6.71 | ppl 822.79
| epoch 1 | 87/ 1327 batches | lr 20.00 | ms/batch 322.30 | loss 6.98 | ppl 1073.88
| epoch 1 | 88/ 1327 batches | lr 20.00 | ms/batch 307.48 | loss 6.93 | ppl 1026.40
| epoch 1 | 89/ 1327 batches | lr 20.00 | ms/batch 321.74 | loss 6.99 | ppl 1082.17
| epoch 1 | 90/ 1327 batches | lr 20.00 | ms/batch 318.80 | loss 6.96 | ppl 1056.87
| epoch 1 | 91/ 1327 batches | lr 20.00 | ms/batch 311.21 | loss 6.92 | ppl 1010.41
| epoch 1 | 92/ 1327 batches | lr 20.00 | ms/batch 307.94 | loss 6.78 | ppl 880.36
| epoch 1 | 93/ 1327 batches | lr 20.00 | ms/batch 316.78 | loss 6.86 | ppl 950.37
| epoch 1 | 94/ 1327 batches | lr 20.00 | ms/batch 310.51 | loss 6.84 | ppl 931.56
| epoch 1 | 95/ 1327 batches | lr 20.00 | ms/batch 322.99 | loss 6.89 | ppl 977.51
| epoch 1 | 96/ 1327 batches | lr 20.00 | ms/batch 332.68 | loss 6.70 | ppl 812.95
| epoch 1 | 97/ 1327 batches | lr 20.00 | ms/batch 316.60 | loss 6.88 | ppl 975.57
| epoch 1 | 98/ 1327 batches | lr 20.00 | ms/batch 307.05 | loss 7.09 | ppl 1202.25
| epoch 1 | 99/ 1327 batches | lr 20.00 | ms/batch 330.19 | loss 6.82 | ppl 917.02
| epoch 1 | 100/ 1327 batches | lr 20.00 | ms/batch 313.72 | loss 6.74 | ppl 847.86
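The point of the sanity check is that the `loss` and `ppl` columns of the two runs above appear identical batch for batch, while `ms/batch` differs because it is wall-clock timing. Rather than comparing by eye, the comparison can be sketched as a small script that parses both logs and reports any batch whose loss or perplexity differs. The regex and the helper names `parse_log` and `diff_logs` are hypothetical and assume the line layout shown in this gist.

```python
import re

# Assumed layout: "| epoch E | B/ N batches | ... | loss X | ppl Y".
BATCH_RE = re.compile(
    r"\|\s*epoch\s+(\d+)\s*\|\s*(\d+)/\s*(\d+)\s+batches\s*\|"
    r".*?loss\s+([\d.]+)\s*\|\s*ppl\s+([\d.]+)"
)


def parse_log(text):
    """Map (epoch, batch) -> (loss, ppl) as printed strings; other lines are skipped."""
    records = {}
    for line in text.splitlines():
        match = BATCH_RE.search(line)
        if match:
            epoch, batch, _total, loss, ppl = match.groups()
            records[(int(epoch), int(batch))] = (loss, ppl)
    return records


def diff_logs(text_a, text_b):
    """Return the sorted (epoch, batch) keys whose loss/ppl differ or are missing in one log.

    ms/batch and lr are never captured, so timing noise is ignored.
    """
    a, b = parse_log(text_a), parse_log(text_b)
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


if __name__ == "__main__":
    run_a = "| epoch 1 | 2/ 1327 batches | lr 20.00 | ms/batch 290.29 | loss 9.18 | ppl 9712.73"
    run_b = "| epoch 1 | 2/ 1327 batches | lr 20.00 | ms/batch 295.36 | loss 9.18 | ppl 9712.73"
    print(diff_logs(run_a, run_b))
```

An empty list means the two logs agree on every logged batch. Comparing the printed strings instead of floats keeps the check exact at the precision the log actually shows.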