Note that the Roboschool reward scales are different from MuJoCo's. All results are obtained from 4 sessions with distinct random seeds.
`mean_returns_ma` is the moving average of returns over 100 checkpoints, averaged across the sessions.
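As a rough illustration of the metric described above, here is a minimal sketch, not the library's actual implementation. The function name, the nested-list input layout (one list of checkpoint returns per session), and the handling of the first few checkpoints with a shorter window are all assumptions made for this example.

```python
def mean_returns_ma(session_returns, window=100):
    """Trailing moving average of the session-averaged returns.

    session_returns: one list of checkpoint returns per session
    (hypothetical layout; all sessions must have the same length).
    window: number of checkpoints in the moving average.
    """
    num_sessions = len(session_returns)
    # Average the returns across sessions at each checkpoint.
    mean_returns = [sum(vals) / num_sessions for vals in zip(*session_returns)]

    ma = []
    running = 0.0
    for i, ret in enumerate(mean_returns):
        running += ret
        if i >= window:
            # Drop the checkpoint that just left the window.
            running -= mean_returns[i - window]
        # Early checkpoints use a shorter window (assumption).
        ma.append(running / min(i + 1, window))
    return ma


# Toy data: 2 sessions, 4 checkpoints, window of 2.
toy = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
print(mean_returns_ma(toy, window=2))
```

Because both steps are plain averages, averaging across sessions first and then smoothing gives the same numbers as smoothing each session and then averaging.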
Env. \ SAC | mean_returns_ma | graph |
---|---|---|
RoboschoolAnt | 2451.55 | |
RoboschoolHalfCheetah | 2004.27 | |
RoboschoolHopper | 2090.52 | |
RoboschoolWalker2d | 1711.92 | |
Trial graph | Moving average |