{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Blackjack Agent.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyOHTYUMBbusnS9kCTT3cQkI",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/kylekyle/ba1d0d716b644e83495e95d68418167a/blackjack-agent.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AJqfZjOaEX5r",
"colab_type": "text"
},
"source": [
"# Neural-network-based Blackjack Agent \n",
"\n",
"This notebook is intended to be execute in Google Colab and **does not support a GPU or TPU** runtime. "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "D3g9NwSPDNz-",
"colab_type": "text"
},
"source": [
"Install ipydeps."
]
},
{
"cell_type": "code",
"metadata": {
"id": "VABszUWq24uo",
"colab_type": "code",
"colab": {}
},
"source": [
"!pip install ipydeps"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "GC5pBZZFDTuR",
"colab_type": "text"
},
"source": [
"Use `ipydeps` to install all other dependencies. `keras-rl` require Tensorflow at version 1.13.1. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "k98VKADW1lSt",
"colab_type": "code",
"colab": {}
},
"source": [
"import ipydeps\n",
"ipydeps.pip([\"tensorflow==1.13.1\", \"keras-rl\", \"gym\"])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "c7vggKwjDgUE",
"colab_type": "text"
},
"source": [
"Import everything needed to run the `Blackjack-v0` environment in Open AI Gym. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "9nxXY5eFCzkW",
"colab_type": "code",
"colab": {}
},
"source": [
"import numpy as np\n",
"import gym\n",
"\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense, Activation, Flatten\n",
"from keras.optimizers import Adam\n",
"\n",
"from rl.agents.dqn import DQNAgent\n",
"from rl.policy import BoltzmannQPolicy\n",
"from rl.memory import SequentialMemory"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "4N39vmz6DqZ7",
"colab_type": "text"
},
"source": [
"Instantiate the environment and extract the number of actions.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "3s4fTJyD2jhL",
"colab_type": "code",
"colab": {}
},
"source": [
"env = gym.make('Blackjack-v0')\n",
"np.random.seed(123)\n",
"env.seed(123)\n",
"nb_actions = env.action_space.n"
],
"execution_count": 0,
"outputs": []
},
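{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each observation is a 3-tuple: the player's current sum, the dealer's showing card, and whether the player holds a usable ace. A minimal sketch to inspect one (the exact values drawn are random):"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"obs = env.reset()\n",
"# e.g. (14, 10, False): player sum, dealer's showing card, usable ace\n",
"print(obs)"
],
"execution_count": 0,
"outputs": []
},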
{
"cell_type": "markdown",
"metadata": {
"id": "iD7badLhDwev",
"colab_type": "text"
},
"source": [
"Build a simple model to capture q-values. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "59o_J31YC4LU",
"colab_type": "code",
"colab": {}
},
"source": [
"\n",
"model = Sequential()\n",
"model.add(Flatten(input_shape=(1,3)))\n",
"model.add(Dense(16))\n",
"model.add(Activation('relu'))\n",
"model.add(Dense(16))\n",
"model.add(Activation('relu'))\n",
"model.add(Dense(16))\n",
"model.add(Activation('relu'))\n",
"model.add(Dense(nb_actions))\n",
"model.add(Activation('linear'))\n",
"\n",
"model.summary()"
],
"execution_count": 0,
"outputs": []
},
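{
"cell_type": "markdown",
"metadata": {},
"source": [
"For `Blackjack-v0` there are two actions (stick and hit), so `nb_actions` is 2 and the stack above has 64 + 272 + 272 + 34 = 642 trainable parameters, which `model.summary()` reports."
]
},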
{
"cell_type": "markdown",
"metadata": {
"id": "gGef9P22D20g",
"colab_type": "text"
},
"source": [
"Configure and compile the agent. Use can use any built-in Keras optimizer and metrics."
]
},
{
"cell_type": "code",
"metadata": {
"id": "sdWfKijlC4Uo",
"colab_type": "code",
"colab": {}
},
"source": [
"memory = SequentialMemory(limit=50000, window_length=1)\n",
"policy = BoltzmannQPolicy()\n",
"dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=10,\n",
" target_model_update=1e-2, policy=policy)\n",
"dqn.compile(Adam(lr=1e-3), metrics=['mae'])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "I6T1SbjPEAGO",
"colab_type": "text"
},
"source": [
"Train the model."
]
},
{
"cell_type": "code",
"metadata": {
"id": "7K_z56vPC4Sk",
"colab_type": "code",
"colab": {}
},
"source": [
"dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "aVT0z0wREDyi",
"colab_type": "text"
},
"source": [
"Save the trained model."
]
},
{
"cell_type": "code",
"metadata": {
"id": "cXqBLG33C4Q2",
"colab_type": "code",
"colab": {}
},
"source": [
"dqn.save_weights('dqn_blackjack_v0_weights.h5f', overwrite=True)"
],
"execution_count": 0,
"outputs": []
},
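{
"cell_type": "markdown",
"metadata": {},
"source": [
"To restore the agent later, build the same model and agent, then load the weights back in. A minimal sketch, assuming the file name used above:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"dqn.load_weights('dqn_blackjack_v0_weights.h5f')"
],
"execution_count": 0,
"outputs": []
},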
{
"cell_type": "markdown",
"metadata": {
"id": "WnQrnwd_EM0K",
"colab_type": "text"
},
"source": [
"Evaluate the algorithm for 5 episodes."
]
},
{
"cell_type": "code",
"metadata": {
"id": "MCVGvS3XC4N1",
"colab_type": "code",
"colab": {}
},
"source": [
"dqn.test(env, nb_episodes=5, visualize=False)"
],
"execution_count": 0,
"outputs": []
}
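,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sanity check, the trained network can also be queried directly for the greedy action on a given hand. A sketch; the hand below (player sum 14, dealer shows 10, no usable ace) is just an illustration:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"obs = np.array([14, 10, 0]).reshape(1, 1, 3)\n",
"q_values = model.predict(obs)\n",
"# Blackjack-v0 actions: 0 = stick, 1 = hit\n",
"print('hit' if np.argmax(q_values) == 1 else 'stick')"
],
"execution_count": 0,
"outputs": []
}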
]
}