20240218_reinforcement learning_pong training log for reference

20240217_pong_REINFORCE.ipynb
👉 training log for reference
800 episodes on T4 GPU, Wall time: 1h 17min 44s

Episode: 20, score: -14.000000
[-15. -17. -15. -14. -13. -13. -16.  -9.]
Episode: 40, score: -14.000000
[-15. -13. -16. -16. -16. -10. -10. -16.]
Episode: 60, score: -13.375000
[-16. -16.  -7. -13. -15. -11. -16. -13.]
Episode: 80, score: -14.125000
[-16. -13. -11. -16. -16. -10. -15. -16.]
Episode: 100, score: -14.875000
[-15. -12. -15. -16. -16. -16. -16. -13.]
Episode: 120, score: -15.125000
[-14. -16. -16. -13. -17. -13. -16. -16.]
Episode: 140, score: -14.000000
[-11. -15. -13. -16. -16. -15. -10. -16.]
Episode: 160, score: -14.500000
[-12. -12. -15. -16. -16. -15. -14. -16.]
Episode: 180, score: -14.125000
[-15. -13. -13. -14. -16. -13. -15. -14.]
Episode: 200, score: -15.500000
[-16. -13. -16. -15. -16. -16. -16. -16.]
Episode: 220, score: -12.875000
[-12. -16. -10. -16. -12. -14. -10. -13.]
Episode: 240, score: -13.125000
[-15. -12. -11. -10. -16. -16. -11. -14.]
Episode: 260, score: -14.375000
[-14. -14. -16. -16. -15. -15. -13. -12.]
Episode: 280, score: -14.375000
[-14. -12. -16. -15. -15. -16. -16. -11.]
Episode: 300, score: -12.875000
[-14. -11. -10.  -9. -16. -16. -15. -12.]
Episode: 320, score: -13.250000
[-15. -12. -11. -11. -15. -12. -14. -16.]
Episode: 340, score: -14.250000
[-16. -16. -11. -16. -15. -14. -13. -13.]
Episode: 360, score: -14.500000
[-16.  -9. -16. -16. -16. -14. -16. -13.]
Episode: 380, score: -11.750000
[-11. -16. -16. -14. -16.  -9.  -7.  -5.]
Episode: 400, score: -15.000000
[-16. -14. -16. -15. -16. -17. -12. -14.]
Episode: 420, score: -14.000000
[-11. -14. -12. -16. -16. -13. -16. -14.]
Episode: 440, score: -14.125000
[-16. -13. -16.  -9. -14. -14. -15. -16.]
Episode: 460, score: -13.375000
[-16. -14.  -9. -15.  -9. -14. -15. -15.]
Episode: 480, score: -14.500000
[-13. -13. -16. -15. -16. -15. -12. -16.]
Episode: 500, score: -13.250000
[-14.  -9. -15. -15. -16.  -9. -16. -12.]
Episode: 520, score: -13.750000
[-15.  -9. -14. -14. -16. -15. -14. -13.]
Episode: 540, score: -13.000000
[-13. -13. -14. -14. -13. -14.  -9. -14.]
Episode: 560, score: -11.375000
[-12. -12.  -8.  -9. -13. -11. -16. -10.]
Episode: 580, score: -12.500000
[-11. -10. -15. -14. -13. -15. -10. -12.]
Episode: 600, score: -11.375000
[-15.  -8. -13. -13. -13.  -9.  -7. -13.]
Episode: 620, score: -12.500000
[-14. -10. -13. -10. -16. -13. -11. -13.]
Episode: 640, score: -11.250000
[-11.  -8. -15. -13. -11. -10. -13.  -9.]
Episode: 660, score: -11.875000
[ -6. -12. -13. -12.  -8. -12. -16. -16.]
Episode: 680, score: -11.375000
[-12. -14. -12.  -7. -12. -12. -14.  -8.]
Episode: 700, score: -10.000000
[-13. -10.  -9. -14.  -7.  -6. -13.  -8.]
Episode: 720, score: -9.125000
[ -7.  -8. -12.  -9. -12. -10.  -4. -11.]
Episode: 740, score: -8.875000
[ -6.  -5. -14. -11.  -7. -11.  -7. -10.]
Episode: 760, score: -7.000000
[ -6.  -8.  -5. -11. -10.  -8.  -6.  -2.]
Episode: 780, score: -7.500000
[-10.  -6.  -5.  -4. -10.  -8. -11.  -6.]
Episode: 800, score: -5.000000
[-8. -2. -2. -5. -8. -4. -4. -7.]
CPU times: user 29min 59s, sys: 2min 10s, total: 32min 10s
Wall time: 1h 17min 44s
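
In each log entry above, the reported score equals the arithmetic mean of the eight bracketed values on the following line (for example, the values at episode 20 sum to -112, giving -14.0). A minimal sketch for parsing the log and checking that relationship is shown below. It assumes the bracketed values are per-environment episode totals from eight parallel environments, which the log itself does not state. The names `LOG` and `parse_log` are hypothetical and do not come from the notebook.

```python
import re

# Two entries copied verbatim from the log; paste the full log here to check every entry.
LOG = """\
Episode: 20, score: -14.000000
[-15. -17. -15. -14. -13. -13. -16.  -9.]
Episode: 800, score: -5.000000
[-8. -2. -2. -5. -8. -4. -4. -7.]
"""

HEADER = re.compile(r"Episode:\s*(\d+),\s*score:\s*(-?\d+(?:\.\d+)?)")


def parse_log(text):
    """Return a list of (episode, reported_score, values) tuples.

    Assumption: every "Episode: ..." line is immediately followed by one
    bracketed line of whitespace-separated numbers.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    entries = []
    i = 0
    while i < len(lines) - 1:
        match = HEADER.match(lines[i])
        if match and lines[i + 1].startswith("["):
            values = [float(v) for v in lines[i + 1].strip("[]").split()]
            entries.append((int(match.group(1)), float(match.group(2)), values))
            i += 2
        else:
            i += 1
    return entries


if __name__ == "__main__":
    for episode, reported, values in parse_log(LOG):
        mean = sum(values) / len(values)
        print(f"episode {episode}: reported {reported:.3f}, recomputed {mean:.3f}")
```

The parsed tuples can also be passed to a plotting library to visualize the learning curve, which stays near -14 for the first 500 episodes and climbs to -5 by episode 800.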