{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Blackjack Agent.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyOHTYUMBbusnS9kCTT3cQkI",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/kylekyle/ba1d0d716b644e83495e95d68418167a/blackjack-agent.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AJqfZjOaEX5r",
"colab_type": "text"
},
"source": [
"# Neural-network-based Blackjack Agent \n",
"\n",
"This notebook is intended to be execute in Google Colab and **does not support a GPU or TPU** runtime. "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "D3g9NwSPDNz-",
"colab_type": "text"
},
"source": [
"Install ipydeps."
]
},
{
"cell_type": "code",
"metadata": {
"id": "VABszUWq24uo",
"colab_type": "code",
"colab": {}
},
"source": [
"!pip install ipydeps"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "GC5pBZZFDTuR",
"colab_type": "text"
},
"source": [
"Use `ipydeps` to install all other dependencies. `keras-rl` require Tensorflow at version 1.13.1. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "k98VKADW1lSt",
"colab_type": "code",
"colab": {}
},
"source": [
"import ipydeps\n",
"ipydeps.pip([\"tensorflow==1.13.1\", \"keras-rl\", \"gym\"])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "c7vggKwjDgUE",
"colab_type": "text"
},
"source": [
"Import everything needed to run the `Blackjack-v0` environment in Open AI Gym. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "9nxXY5eFCzkW",
"colab_type": "code",
"colab": {}
},
"source": [
"import numpy as np\n",
"import gym\n",
"\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense, Activation, Flatten\n",
"from keras.optimizers import Adam\n",
"\n",
"from rl.agents.dqn import DQNAgent\n",
"from rl.policy import BoltzmannQPolicy\n",
"from rl.memory import SequentialMemory"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "4N39vmz6DqZ7",
"colab_type": "text"
},
"source": [
"Instantiate the environment and extract the number of actions.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "3s4fTJyD2jhL",
"colab_type": "code",
"colab": {}
},
"source": [
"env = gym.make('Blackjack-v0')\n",
"np.random.seed(123)\n",
"env.seed(123)\n",
"nb_actions = env.action_space.n"
],
"execution_count": 0,
"outputs": []
},
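{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each observation is a 3-tuple: the player's current sum, the dealer's showing card, and whether the player holds a usable ace. A minimal sketch to inspect one (the exact values drawn are random):"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"obs = env.reset()\n",
"# e.g. (14, 10, False): player sum, dealer's showing card, usable ace\n",
"print(obs)"
],
"execution_count": 0,
"outputs": []
},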
{
"cell_type": "markdown",
"metadata": {
"id": "iD7badLhDwev",
"colab_type": "text"
},
"source": [
"Build a simple model to capture q-values. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "59o_J31YC4LU",
"colab_type": "code",
"colab": {}
},
"source": [
"\n",
"model = Sequential()\n",
"model.add(Flatten(input_shape=(1,3)))\n",
"model.add(Dense(16))\n",
"model.add(Activation('relu'))\n",
"model.add(Dense(16))\n",
"model.add(Activation('relu'))\n",
"model.add(Dense(16))\n",
"model.add(Activation('relu'))\n",
"model.add(Dense(nb_actions))\n",
"model.add(Activation('linear'))\n",
"\n",
"model.summary()"
],
"execution_count": 0,
"outputs": []
},
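{
"cell_type": "markdown",
"metadata": {},
"source": [
"For `Blackjack-v0` there are two actions (stick and hit), so `nb_actions` is 2 and the stack above has 64 + 272 + 272 + 34 = 642 trainable parameters, which `model.summary()` reports."
]
},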
{
"cell_type": "markdown",
"metadata": {
"id": "gGef9P22D20g",
"colab_type": "text"
},
"source": [
"Configure and compile the agent. Use can use any built-in Keras optimizer and metrics."
]
},
{
"cell_type": "code",
"metadata": {
"id": "sdWfKijlC4Uo",
"colab_type": "code",
"colab": {}
},
"source": [
"memory = SequentialMemory(limit=50000, window_length=1)\n",
"policy = BoltzmannQPolicy()\n",
"dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=10,\n",
" target_model_update=1e-2, policy=policy)\n",
"dqn.compile(Adam(lr=1e-3), metrics=['mae'])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "I6T1SbjPEAGO",
"colab_type": "text"
},
"source": [
"Train the model."
]
},
{
"cell_type": "code",
"metadata": {
"id": "7K_z56vPC4Sk",
"colab_type": "code",
"colab": {}
},
"source": [
"dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "aVT0z0wREDyi",
"colab_type": "text"
},
"source": [
"Save the trained model."
]
},
{
"cell_type": "code",
"metadata": {
"id": "cXqBLG33C4Q2",
"colab_type": "code",
"colab": {}
},
"source": [
"dqn.save_weights('dqn_blackjack_v0_weights.h5f', overwrite=True)"
],
"execution_count": 0,
"outputs": []
},
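{
"cell_type": "markdown",
"metadata": {},
"source": [
"To restore the agent later, build the same model and agent, then load the weights back in. A minimal sketch, assuming the file name used above:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"dqn.load_weights('dqn_blackjack_v0_weights.h5f')"
],
"execution_count": 0,
"outputs": []
},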
{
"cell_type": "markdown",
"metadata": {
"id": "WnQrnwd_EM0K",
"colab_type": "text"
},
"source": [
"Evaluate the algorithm for 5 episodes."
]
},
{
"cell_type": "code",
"metadata": {
"id": "MCVGvS3XC4N1",
"colab_type": "code",
"colab": {}
},
"source": [
"dqn.test(env, nb_episodes=5, visualize=False)"
],
"execution_count": 0,
"outputs": []
}
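,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sanity check, the trained network can also be queried directly for the greedy action on a given hand. A sketch; the hand below (player sum 14, dealer shows 10, no usable ace) is just an illustration:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"obs = np.array([14, 10, 0]).reshape(1, 1, 3)\n",
"q_values = model.predict(obs)\n",
"# Blackjack-v0 actions: 0 = stick, 1 = hit\n",
"print('hit' if np.argmax(q_values) == 1 else 'stick')"
],
"execution_count": 0,
"outputs": []
}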
]
}