{"nbformat":4,"nbformat_minor":0,"metadata":{"anaconda-cloud":{},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.6.5"},"colab":{"name":"Unty ML-Agents - Python API.ipynb","provenance":[],"collapsed_sections":[]}},"cells":[{"cell_type":"markdown","metadata":{"id":"apRZIjDW0ogH","colab_type":"text"},"source":["# Unity ML-Agents Toolkit\n","## Environment Basics\n","This notebook contains a walkthrough of the basic functions of the Python API for the Unity ML-Agents toolkit. For instructions on building a Unity environment, see [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Getting-Started-with-Balance-Ball.md)."]},{"cell_type":"markdown","metadata":{"id":"zqTcpjq80ogK","colab_type":"text"},"source":["### 1. Set environment parameters\n","\n","Be sure to set `env_name` to the name of the Unity environment file you want to launch. Ensure that the environment build is in `../envs`.\n","\n","Alternatively, if you don't know the path, what can you do instead? (Hint: What happens if the variable `env_name` has no value?)"]},{"cell_type":"code","metadata":{"id":"1ATsLb040ogL","colab_type":"code","colab":{}},"source":["env_name = \"../envs/3DBall\" # Name of the Unity environment binary to launch\n","train_mode = True # Whether to run the environment in training or inference mode"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"zdrrfHQZ0ogP","colab_type":"text"},"source":["### 2. Load dependencies\n","\n","The following loads the necessary dependencies and checks the Python version (at runtime). ML-Agents Toolkit (v0.3 onwards) requires Python 3."]},{"cell_type":"code","metadata":{"id":"t9lx3nsl0ogQ","colab_type":"code","colab":{}},"source":["import matplotlib.pyplot as plt\n","import numpy as np\n","import sys\n","\n","from mlagents_envs.environment import UnityEnvironment\n","from mlagents_envs.side_channel.engine_configuration_channel import EngineConfig, EngineConfigurationChannel\n","\n","%matplotlib inline\n","\n","print(\"Python version:\")\n","print(sys.version)\n","\n","# check Python version\n","if (sys.version_info[0] < 3):\n"," raise Exception(\"ERROR: ML-Agents Toolkit (v0.3 onwards) requires Python 3\")"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"tFQ6lzg00ogT","colab_type":"text"},"source":["### 3. Start the environment\n","`UnityEnvironment` launches and begins communication with the environment when instantiated.\n","\n","Environments contain _brains_ which are responsible for deciding the actions of their associated _agents_. 
### 2. Load dependencies

The following cell loads the necessary dependencies and checks the Python version at runtime. The ML-Agents Toolkit (v0.3 onwards) requires Python 3.

```python
import matplotlib.pyplot as plt
import numpy as np
import sys

from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfig, EngineConfigurationChannel

%matplotlib inline

print("Python version:")
print(sys.version)

# Check the Python version
if sys.version_info[0] < 3:
    raise Exception("ERROR: ML-Agents Toolkit (v0.3 onwards) requires Python 3")
```

### 3. Start the environment
`UnityEnvironment` launches and begins communication with the environment when instantiated.

Environments contain one or more _agent groups_ (called _brains_ in earlier versions of the toolkit), which are responsible for deciding the actions of their associated _agents_. Here we take the first group available and set it as the default group we will control from Python.

Read the code and note down the methods used in the next cells as well as their usage.

```python
engine_configuration_channel = EngineConfigurationChannel()
env = UnityEnvironment(base_port=5004, file_name=env_name, side_channels=[engine_configuration_channel])

# Reset the environment
env.reset()

# Set the default group to work with
group_name = env.get_agent_groups()[0]
group_spec = env.get_agent_group_spec(group_name)

# Set the time scale of the engine
engine_configuration_channel.set_configuration_parameters(time_scale=3.0)
```

### 4. Examine the observation and state spaces
We can reset the environment to obtain an initial set of observations and states for all the agents within it. In ML-Agents, _states_ refer to a vector of variables describing relevant aspects of the environment for an agent, while _observations_ refer to a set of relevant pixel-wise visuals for an agent.

```python
# Get the state of the agents
step_result = env.get_step_result(group_name)

# Examine the number of observations per agent
print("Number of observations : ", len(group_spec.observation_shapes))

# Examine the first observation for all agents
print("Agent state looks like: \n{}".format(step_result.obs[0]))

# Examine the first observation for the first agent
print("Agent state looks like: \n{}".format(step_result.obs[0][0]))

# Is there a visual observation?
vis_obs = any(len(shape) == 3 for shape in group_spec.observation_shapes)
print("Is there a visual observation?", vis_obs)

# Examine the visual observations
if vis_obs:
    vis_obs_index = next(i for i, v in enumerate(group_spec.observation_shapes) if len(v) == 3)
    print("Agent visual observation looks like:")
    obs = step_result.obs[vis_obs_index]
    plt.imshow(obs[0, :, :, :])
```

### 5. Take random actions in the environment
Once we have reset the environment, we can step it forward and provide actions to all of the agents within it. Here we simply choose random actions based on whether the default group's action space is continuous or discrete.

Once this cell is executed, 10 messages will be printed detailing how much reward was accumulated in each of the 10 episodes. The Unity environment will then pause, waiting for further signals telling it what to do next, so it is expected that you will not see any animation while running this cell.

Task:

1/ Edit the code to also print out how long each episode takes (a sketch follows the code cell below).

2/ Instead of editing the `.yaml` file as in the previous activity, write code that sets the model's hyperparameters.

Bonus: The cell prints out the total reward for each episode. When you worked through the pre-class work, you saw similar information printed to the command line. That information should be saved in a `.csv` file in a folder named `summaries`. Look for this file and plot the reward over the course of training to visually check whether the chosen step size is appropriate (a sketch appears at the end of this notebook).

```python
for episode in range(10):
    env.reset()
    step_result = env.get_step_result(group_name)
    done = False
    episode_rewards = 0
    while not done:
        if group_spec.is_action_continuous():
            action = np.random.randn(step_result.n_agents(), group_spec.action_size)

        if group_spec.is_action_discrete():
            branch_size = group_spec.discrete_action_branches
            action = np.column_stack([
                np.random.randint(0, branch_size[i], size=(step_result.n_agents()))
                for i in range(len(branch_size))
            ])
        env.set_actions(group_name, action)
        env.step()
        step_result = env.get_step_result(group_name)
        episode_rewards += step_result.reward[0]
        done = step_result.done[0]
    print("Total reward this episode: {}".format(episode_rewards))
```
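A minimal sketch for Task 1 above, assuming `env`, `group_name`, and `group_spec` from Section 3 are still available (run it before closing the environment): the same random-action loop with per-episode wall-clock timing added using the standard-library `time` module.

```python
import time

# Sketch for Task 1: time each episode with time.perf_counter().
for episode in range(10):
    start = time.perf_counter()  # wall-clock time at the start of the episode
    env.reset()
    step_result = env.get_step_result(group_name)
    done = False
    episode_rewards = 0
    while not done:
        if group_spec.is_action_continuous():
            action = np.random.randn(step_result.n_agents(), group_spec.action_size)
        if group_spec.is_action_discrete():
            branch_size = group_spec.discrete_action_branches
            action = np.column_stack([
                np.random.randint(0, branch_size[i], size=(step_result.n_agents()))
                for i in range(len(branch_size))
            ])
        env.set_actions(group_name, action)
        env.step()
        step_result = env.get_step_result(group_name)
        episode_rewards += step_result.reward[0]
        done = step_result.done[0]
    elapsed = time.perf_counter() - start
    print("Episode {}: total reward {:.3f}, duration {:.2f} s".format(episode, episode_rewards, elapsed))
```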
### 6. Close the environment when finished
When we are finished using an environment, we can close it with the call below.

```python
env.close()
```
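A hedged sketch for the Bonus task above. The exact file name and column layout of the CSV written under `summaries/` depend on your ML-Agents version and run ID, so the path and column names below (`Steps`, `Mean Reward`) are placeholders: print the header of your actual file first and substitute the real names.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sketch for the Bonus task: plot training reward from the summaries CSV.
# NOTE: csv_path and the column names are assumptions -- inspect the file that
# your own mlagents-learn run produced and adjust accordingly.
csv_path = "summaries/3DBall_training.csv"  # hypothetical file name
df = pd.read_csv(csv_path)
print(df.columns.tolist())  # check the real column names before plotting

step_col, reward_col = "Steps", "Mean Reward"  # assumed column names
plt.plot(df[step_col], df[reward_col])
plt.xlabel("Training steps")
plt.ylabel("Mean episode reward")
plt.title("Reward over the course of training")
plt.show()
```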