20240218_reinforcement learning_pong training log 1200e

20240217_pong_REINFORCE.ipynb
👉 training log for reference
1200 episodes on T4 GPU, Wall time: 2h 12min 12s

Episode: 20, score: -14.500000
[-14. -15. -16. -13. -14. -16. -16. -12.]
Episode: 40, score: -14.500000
[-15. -16. -16. -13. -13. -14. -13. -16.]
Episode: 60, score: -13.750000
[-16. -16. -11. -14. -15. -12. -15. -11.]
Episode: 80, score: -14.500000
[-16. -14. -16. -13. -16. -14. -17. -10.]
Episode: 100, score: -13.750000
[-11. -13. -14. -12. -17. -15. -15. -13.]
Episode: 120, score: -13.500000
[-16. -15. -16. -11. -10. -16. -13. -11.]
Episode: 140, score: -14.000000
[-12. -13. -15. -15. -14. -14. -16. -13.]
Episode: 160, score: -14.125000
[-11. -13. -16. -15. -16. -12. -14. -16.]
Episode: 180, score: -14.250000
[-16. -15. -16. -13. -16. -13. -14. -11.]
Episode: 200, score: -13.250000
[-14. -14. -11. -16. -16. -14. -12.  -9.]
Episode: 220, score: -14.375000
[-15. -16. -13. -10. -15. -15. -15. -16.]
Episode: 240, score: -15.500000
[-13. -16. -15. -14. -17. -17. -16. -16.]
Episode: 260, score: -13.750000
[-13. -12. -14. -16. -13. -14. -15. -13.]
Episode: 280, score: -14.500000
[-13. -16. -16. -16. -12. -13. -13. -17.]
Episode: 300, score: -15.250000
[-16. -16. -16. -15. -12. -15. -17. -15.]
Episode: 320, score: -13.750000
[-13. -16. -15. -16. -14. -10. -13. -13.]
Episode: 340, score: -14.000000
[-16. -11. -13. -14. -16. -17. -14. -11.]
Episode: 360, score: -14.250000
[-16. -10. -13. -15. -16. -16. -17. -11.]
Episode: 380, score: -13.125000
[-15. -15. -13. -16.  -8. -13. -12. -13.]
Episode: 400, score: -11.375000
[-10. -13. -11. -11.  -8. -15.  -8. -15.]
Episode: 420, score: -13.750000
[-10. -14. -13. -15. -13. -16. -16. -13.]
Episode: 440, score: -11.125000
[-12. -15. -11.  -8. -16.  -8.  -9. -10.]
Episode: 460, score: -12.125000
[-11. -14. -13. -11. -10. -15. -13. -10.]
Episode: 480, score: -12.625000
[ -7. -14. -15. -15. -14. -12. -11. -13.]
Episode: 500, score: -10.000000
[ -8. -13. -10. -13.  -9. -10.  -7. -10.]
Episode: 520, score: -10.625000
[ -9. -13.  -7. -12. -12.  -9. -12. -11.]
Episode: 540, score: -12.625000
[-14. -15. -14. -10. -11. -13. -15.  -9.]
Episode: 560, score: -10.625000
[ -5. -13. -13.  -8.  -8. -11. -16. -11.]
Episode: 580, score: -11.000000
[-10. -14. -10.  -9. -13. -15.  -5. -12.]
Episode: 600, score: -10.625000
[-13.  -9.  -9. -12. -11. -13. -11.  -7.]
Episode: 620, score: -9.250000
[ -9. -10.  -9. -12.  -4. -11.  -8. -11.]
Episode: 640, score: -7.000000
[ -6.  -4. -12.  -7.  -8.  -5.  -3. -11.]
Episode: 660, score: -7.000000
[ -6.  -7. -10. -10.  -4.  -4. -10.  -5.]
Episode: 680, score: -5.125000
[-7. -4. -9. -1. -5. -6. -2. -7.]
Episode: 700, score: -4.875000
[-5. -5. -2. -7. -5. -8. -4. -3.]
Episode: 720, score: -2.000000
[-1. -1. -1. -1. -2. -3. -6. -1.]
Episode: 740, score: -2.250000
[-3. -3. -2.  0. -7. -1. -1. -1.]
Episode: 760, score: -1.625000
[-4. -1. -6. -2.  2.  2. -2. -2.]
Episode: 780, score: -3.250000
[-1. -3.  0. -3. -4. -8. -5. -2.]
Episode: 800, score: -1.500000
[-1. -3.  0. -2.  2. -2. -3. -3.]
Episode: 820, score: -1.125000
[-1. -1. -4.  0. -1.  0. -1. -1.]
Episode: 840, score: -0.375000
[ 0.  1.  0. -2. -1.  0. -1.  0.]
Episode: 860, score: -0.750000
[ 0. -1. -4.  0. -1.  0.  0.  0.]
Episode: 880, score: -0.500000
[ 1. -2. -2.  1. -1.  0.  0. -1.]
Episode: 900, score: -0.125000
[ 0.  0. -1.  1.  1.  0. -2.  0.]
Episode: 920, score: 1.000000
[1. 1. 1. 1. 1. 1. 1. 1.]
Episode: 940, score: 0.875000
[1. 0. 1. 1. 0. 1. 0. 3.]
Episode: 960, score: 0.500000
[ 1. -1.  1.  1.  1. -1.  1.  1.]
Episode: 980, score: 1.000000
[1. 1. 1. 1. 1. 1. 1. 1.]
Episode: 1000, score: 1.000000
[1. 1. 1. 1. 1. 1. 1. 1.]
Episode: 1020, score: 0.375000
[-4.  1.  1.  1.  1.  1.  1.  1.]
Episode: 1040, score: 0.500000
[ 1.  1.  3.  1. -2.  1. -2.  1.]
Episode: 1060, score: 1.000000
[1. 1. 1. 1. 1. 1. 1. 1.]
Episode: 1080, score: 1.000000
[1. 1. 1. 1. 1. 1. 1. 1.]
Episode: 1100, score: 1.000000
[1. 1. 1. 1. 1. 1. 1. 1.]
Episode: 1120, score: 1.000000
[1. 1. 1. 1. 1. 1. 1. 1.]
Episode: 1140, score: 0.750000
[ 1.  1.  1.  1. -1.  1.  1.  1.]
Episode: 1160, score: 1.375000
[ 1.  3.  3. -2.  3.  0.  0.  3.]
Episode: 1180, score: 2.375000
[ 2.  4.  4. -1.  0.  3.  3.  4.]
Episode: 1200, score: 1.875000
[ 1.  2.  3. -5.  4.  4.  3.  3.]
CPU times: user 51min 14s, sys: 3min 45s, total: 55min
Wall time: 2h 12min 12s
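
Each `score` in the log appears to be the mean of the eight values printed on the line below it (for example, the episode 20 values sum to -116, and -116 / 8 = -14.5). The following is a minimal sketch for parsing the log and checking that relationship. It assumes the eight values are per-environment scores from parallel environments, which the log itself does not state. The names `LOG`, `ENTRY`, and `parse_log` are hypothetical and do not come from the notebook.

```python
import re

# Two entries copied from the log above; paste the full log here to check every entry.
LOG = """\
Episode: 20, score: -14.500000
[-14. -15. -16. -13. -14. -16. -16. -12.]
Episode: 1200, score: 1.875000
[ 1.  2.  3. -5.  4.  4.  3.  3.]
"""

# One entry = an "Episode: N, score: S" line followed by a bracketed row of values.
ENTRY = re.compile(r"Episode: (\d+), score: (-?\d+\.\d+)\n\[([^\]]*)\]")


def parse_log(text):
    """Return a list of (episode, reported_score, values) tuples."""
    entries = []
    for episode, score, values in ENTRY.findall(text):
        # Assumption: each bracketed value is one environment's score.
        per_env = [float(v) for v in values.split()]
        entries.append((int(episode), float(score), per_env))
    return entries


entries = parse_log(LOG)
for episode, reported, per_env in entries:
    mean = sum(per_env) / len(per_env)
    print(f"episode {episode}: reported {reported:+.3f}, "
          f"mean of {len(per_env)} values {mean:+.3f}")
```

The same parsed tuples could be used to plot the mean score against the episode number to see the learning curve.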