While the principles of reinforcement learning are easy to understand, making an implementation actually work is not. These are the major difficulties one may face while implementing an RL-based algorithm for doing something useful (other than playing an Atari game).
- Lack of a framework that could make the implementation easy: the recently introduced "Dopamine" is still at a nascent stage. It does not allow creation of customized environments; basically, it is just for replicating benchmarks based on board-based games.
- Lack of implementation information: the internet is scattered with tutorials and code demonstrating how to construct an RL agent that plays a "Pole" game. They just demonstrate the agent type, but do not necessarily describe the nitty-gritty of how the theory maps to the implementation.
- So many independent tunables: once implemented, it is very hard to tell whether the implementation is working correctly.
So, here are some guidelines one can follow to be a little more confident about the implementation:
- Choose a library/framework and stick to it. My advice is TensorFlow + Dopamine + Gym. Reading the code and debugging things when they do not work will help the developer in the long term.
- Take a simple, well-understood environment and test on it first.
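As a sketch of such a sanity check, here is a hypothetical one-dimensional "corridor" toy environment (pure Python, no Gym dependency, and not from any library) with a random-agent baseline; a learning agent should beat the random baseline and approach the optimal return:

```python
import random

class Corridor:
    """Toy environment: agent starts at position 0 and must reach `goal`.
    Actions: 0 = step left, 1 = step right. Reward: -1 per step, 0 on reaching the goal."""
    def __init__(self, goal=5, max_steps=50):
        self.goal, self.max_steps = goal, max_steps

    def reset(self):
        self.pos, self.steps = 0, 0
        return self.pos

    def step(self, action):
        self.pos += 1 if action == 1 else -1
        self.steps += 1
        done = self.pos == self.goal or self.steps >= self.max_steps
        reward = 0.0 if self.pos == self.goal else -1.0
        return self.pos, reward, done

def run_episode(env, policy):
    """Run one episode with `policy` (a function obs -> action), return total reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total

env = Corridor()
random.seed(0)
# Random baseline, averaged over 100 episodes.
random_return = sum(run_episode(env, lambda o: random.randint(0, 1)) for _ in range(100)) / 100
# Optimal policy: always step right; reaches the goal in 5 steps for a return of -4.
optimal_return = run_episode(env, lambda o: 1)
```

The same structure (known optimal return, cheap episodes) is what makes CartPole-style environments useful as a first test: you can see at a glance whether the agent is improving over the random baseline.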
- Compute the KL divergence of the policy before and after each update; a spike indicates instability. Use good visualization tools and a logging mechanism.
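A minimal sketch of that check for a discrete action space, using only the standard library (the threshold value here is an illustrative assumption, not a recommendation):

```python
import math

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) between two discrete action distributions given as probability lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def check_update(policy_before, policy_after, threshold=0.01):
    """Flag a suspiciously large policy shift after one gradient update."""
    kl = kl_divergence(policy_before, policy_after)
    if kl > threshold:
        print(f"warning: KL spike {kl:.4f} exceeds threshold {threshold}")
    return kl

small_shift = check_update([0.5, 0.5], [0.52, 0.48])  # gentle update: tiny KL
big_shift   = check_update([0.5, 0.5], [0.95, 0.05])  # drastic update: large KL
```

Logging this KL value every update (e.g. to TensorBoard) gives a curve where instability shows up as spikes long before the reward curve reveals it.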
- Work with small networks to avoid overfitting.
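For a rough sense of scale, a quick comparison of parameter counts for a small versus a large fully connected policy network (the layer sizes are illustrative, not from the source):

```python
def mlp_param_count(layer_sizes):
    """Total weights + biases of a fully connected net with the given layer widths."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical sizes: 4-dim observation, 2 actions (CartPole-like).
small = mlp_param_count([4, 32, 2])        # a couple of hundred parameters
large = mlp_param_count([4, 512, 512, 2])  # over a quarter million parameters
```

With the tiny observation spaces of simple control tasks, the small network is usually more than enough capacity, and it trains faster and overfits less.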
- Automate the testing of agents with different learning rates, discount factors, and policy-entropy coefficients.
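One way to sketch such an automated sweep with the standard library; `train_and_evaluate` is a hypothetical stand-in for your actual training loop, not a real API:

```python
from itertools import product

learning_rates   = [1e-4, 3e-4, 1e-3]
discount_factors = [0.95, 0.99]
entropy_coeffs   = [0.0, 0.01]

def train_and_evaluate(lr, gamma, ent_coef):
    """Hypothetical placeholder: train an agent and return its mean episode return.
    A dummy score is used here so the sketch runs; replace with a real training run."""
    return -abs(lr - 3e-4) * 1e4 - abs(gamma - 0.99) - ent_coef

results = {
    (lr, gamma, ent): train_and_evaluate(lr, gamma, ent)
    for lr, gamma, ent in product(learning_rates, discount_factors, entropy_coeffs)
}
best = max(results, key=results.get)
print("best hyperparameters (lr, gamma, entropy coeff):", best)
```

Even a small grid like this (12 runs) catches implementations that only work for one lucky hyperparameter setting, which is a common failure mode in RL.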