20240217_pong_REINFORCE.ipynb
👉 Training log for reference
800 episodes on a T4 GPU; wall time: 1h 17min 44s
Episode: 20, score: -14.000000
[-15. -17. -15. -14. -13. -13. -16. -9.]
Episode: 40, score: -14.000000
[-15. -13. -16. -16. -16. -10. -10. -16.]
Episode: 60, score: -13.375000
[-16. -16. -7. -13. -15. -11. -16. -13.]
Episode: 80, score: -14.125000
[-16. -13. -11. -16. -16. -10. -15. -16.]
Episode: 100, score: -14.875000
[-15. -12. -15. -16. -16. -16. -16. -13.]
Episode: 120, score: -15.125000
[-14. -16. -16. -13. -17. -13. -16. -16.]
Episode: 140, score: -14.000000
[-11. -15. -13. -16. -16. -15. -10. -16.]
Episode: 160, score: -14.500000
[-12. -12. -15. -16. -16. -15. -14. -16.]
Episode: 180, score: -14.125000
[-15. -13. -13. -14. -16. -13. -15. -14.]
Episode: 200, score: -15.500000
[-16. -13. -16. -15. -16. -16. -16. -16.]
Episode: 220, score: -12.875000
[-12. -16. -10. -16. -12. -14. -10. -13.]
Episode: 240, score: -13.125000
[-15. -12. -11. -10. -16. -16. -11. -14.]
Episode: 260, score: -14.375000
[-14. -14. -16. -16. -15. -15. -13. -12.]
Episode: 280, score: -14.375000
[-14. -12. -16. -15. -15. -16. -16. -11.]
Episode: 300, score: -12.875000
[-14. -11. -10. -9. -16. -16. -15. -12.]
Episode: 320, score: -13.250000
[-15. -12. -11. -11. -15. -12. -14. -16.]
Episode: 340, score: -14.250000
[-16. -16. -11. -16. -15. -14. -13. -13.]
Episode: 360, score: -14.500000
[-16. -9. -16. -16. -16. -14. -16. -13.]
Episode: 380, score: -11.750000
[-11. -16. -16. -14. -16. -9. -7. -5.]
Episode: 400, score: -15.000000
[-16. -14. -16. -15. -16. -17. -12. -14.]
Episode: 420, score: -14.000000
[-11. -14. -12. -16. -16. -13. -16. -14.]
Episode: 440, score: -14.125000
[-16. -13. -16. -9. -14. -14. -15. -16.]
Episode: 460, score: -13.375000
[-16. -14. -9. -15. -9. -14. -15. -15.]
Episode: 480, score: -14.500000
[-13. -13. -16. -15. -16. -15. -12. -16.]
Episode: 500, score: -13.250000
[-14. -9. -15. -15. -16. -9. -16. -12.]
Episode: 520, score: -13.750000
[-15. -9. -14. -14. -16. -15. -14. -13.]
Episode: 540, score: -13.000000
[-13. -13. -14. -14. -13. -14. -9. -14.]
Episode: 560, score: -11.375000
[-12. -12. -8. -9. -13. -11. -16. -10.]
Episode: 580, score: -12.500000
[-11. -10. -15. -14. -13. -15. -10. -12.]
Episode: 600, score: -11.375000
[-15. -8. -13. -13. -13. -9. -7. -13.]
Episode: 620, score: -12.500000
[-14. -10. -13. -10. -16. -13. -11. -13.]
Episode: 640, score: -11.250000
[-11. -8. -15. -13. -11. -10. -13. -9.]
Episode: 660, score: -11.875000
[ -6. -12. -13. -12. -8. -12. -16. -16.]
Episode: 680, score: -11.375000
[-12. -14. -12. -7. -12. -12. -14. -8.]
Episode: 700, score: -10.000000
[-13. -10. -9. -14. -7. -6. -13. -8.]
Episode: 720, score: -9.125000
[ -7. -8. -12. -9. -12. -10. -4. -11.]
Episode: 740, score: -8.875000
[ -6. -5. -14. -11. -7. -11. -7. -10.]
Episode: 760, score: -7.000000
[ -6. -8. -5. -11. -10. -8. -6. -2.]
Episode: 780, score: -7.500000
[-10. -6. -5. -4. -10. -8. -11. -6.]
Episode: 800, score: -5.000000
[-8. -2. -2. -5. -8. -4. -4. -7.]
CPU times: user 29min 59s, sys: 2min 10s, total: 32min 10s
Wall time: 1h 17min 44s
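Each `Episode:` line appears to report the mean of the eight per-environment scores printed beneath it. For example, at episode 20 the eight values sum to -112, and -112 / 8 = -14.0. Below is a minimal stdlib sketch that parses the log and checks this. It assumes every header line is immediately followed by its bracketed array line; `parse_log` and `LOG` are hypothetical names for illustration, not part of the original notebook, and the excerpt is copied verbatim from the first and last entries above.

```python
import re
from statistics import mean

# Excerpt copied verbatim from the log above (first and last reported entries).
LOG = """\
Episode: 20, score: -14.000000
[-15. -17. -15. -14. -13. -13. -16. -9.]
Episode: 800, score: -5.000000
[-8. -2. -2. -5. -8. -4. -4. -7.]
"""

HEADER = re.compile(r"Episode: (\d+), score: (-?\d+\.\d+)")


def parse_log(text):
    """Yield (episode, reported_score, per_env_scores) triples.

    Hypothetical helper: assumes each 'Episode:' line is immediately followed
    by a NumPy-style array line with one score per parallel environment.
    """
    lines = [ln.strip() for ln in text.strip().splitlines()]
    for header, scores in zip(lines[0::2], lines[1::2]):
        m = HEADER.fullmatch(header)
        if m is None:
            raise ValueError(f"unexpected header line: {header!r}")
        # "[-15. -17. ...]" -> [-15.0, -17.0, ...]; float() accepts "-15."
        per_env = [float(tok) for tok in scores.strip("[]").split()]
        yield int(m.group(1)), float(m.group(2)), per_env


for episode, reported, per_env in parse_log(LOG):
    # Compare the reported score with the arithmetic mean over environments.
    print(episode, reported, mean(per_env), len(per_env))
```

Running it on the excerpt prints the reported score next to the recomputed mean for episodes 20 and 800; the two columns agree in both cases.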