This is a Deep Reinforcement Learning solution to the Acrobot-v1 environment in OpenAI's Gym. The code uses TensorFlow to model the value function of a Reinforcement Learning agent. I've run it with TensorFlow 1.0 on Python 3.5 under Windows 7.
The algorithm is a Double Deep Q-Network (Double DQN) with Prioritized Experience Replay (PER), using the proportional prioritization variant. Most hyperparameters were chosen by hand based on several experiments. The learning rate, the prioritization exponent alpha, and the initial importance sampling exponent beta0 are the exceptions: they were obtained via Bayesian optimization with Scikit-Optimize.
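To make the roles of alpha and beta concrete, here is a minimal NumPy sketch of the quantities involved, not the code from this repository. It follows the usual textbook formulation of proportional PER and of the Double DQN target. The function names, the array layout, and the choice to normalize the weights by the maximum within the sampled batch are assumptions made for illustration.

```python
import numpy as np


def per_probabilities(priorities, alpha):
    """Proportional PER: P(i) = p_i^alpha / sum_k p_k^alpha.

    alpha = 0 gives uniform sampling; larger alpha favors high-priority samples.
    """
    scaled = np.asarray(priorities, dtype=np.float64) ** alpha
    return scaled / scaled.sum()


def importance_weights(probs, indices, beta):
    """Importance sampling weights w_i = (N * P(i))^(-beta).

    Assumption: weights are normalized by the largest weight in the batch,
    so that they only ever scale updates down.
    """
    n = len(probs)
    weights = (n * probs[indices]) ** (-beta)
    return weights / weights.max()


def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma):
    """Double DQN: the online network picks the next action,
    the target network evaluates it.

    q_next_online and q_next_target have shape (batch, n_actions).
    dones is 1.0 for terminal transitions and 0.0 otherwise.
    """
    best_actions = np.argmax(q_next_online, axis=1)
    next_values = q_next_target[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * next_values


# Usage: draw a small batch from a toy buffer of four transitions.
rng = np.random.default_rng(0)
probs = per_probabilities([0.1, 0.5, 2.0, 1.0], alpha=0.6)
batch = rng.choice(len(probs), size=2, p=probs)
weights = importance_weights(probs, batch, beta=0.4)
```

In a training loop the weights would multiply the per-sample loss, and beta would typically be annealed from beta0 towards 1 over the course of training.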
The hyperparameters are: