naoto yoshida ugo-nama-kun

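How to use markers in mujoco_py

This article is the December 22 entry of the Reinforcement Learning Advent Calendar 2021.

Nice to meet you! I'm Yoshida from the University of Tokyo.

At the university, I am interested in building artificial intelligence that has a body and develops autonomously, and that is what I research.

This article is less about reinforcement learning itself and more a set of tips for building reinforcement learning environments with mujoco-py. It is aimed at readers who have already built reinforcement learning environments, or who are running (or planning to run) physics simulations with MuJoCo through mujoco-py.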

	import tensorflow as tf  # checked @ tensorflow==2.6.0

	gpu_id = 0  # assumption: index of the GPU to make visible

	available_gpus = tf.config.experimental.list_physical_devices('GPU')
	print("Num GPUs Available: ", len(available_gpus))
	if available_gpus:
	    try:
	        tf.config.experimental.set_visible_devices(available_gpus[gpu_id], "GPU")
	        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
	        print(len(available_gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
	    except RuntimeError as e:
	        # Visible devices must be set before GPUs have been initialized
	        print(e)

	from dm_control import suite
	from collections import deque

	# When using the DeepMind Control Suite
	env_dmc = suite.load("cartpole", "balance")

	# Images obtained from dm_control have shape (84, 84, 3). The same holds for mujoco_py
	im = env_dmc.physics.render(camera_id=0, height=84, width=84)

	# frame stack [(84, 84, 3), (84, 84, 3), (84, 84, 3)]
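The frame-stack comment above stops short of showing code, so here is a minimal sketch of one common way to do it with `collections.deque`. The helper names `make_frame_stack` and `push_frame`, the stack depth of 3, and the choice to concatenate along the channel axis are all assumptions for illustration, not something the original snippet specifies. Dummy arrays stand in for rendered images so that the example runs without a simulator.

```python
from collections import deque

import numpy as np

NUM_FRAMES = 3  # assumed stack depth, matching the three frames in the comment above


def make_frame_stack(first_frame: np.ndarray, num_frames: int = NUM_FRAMES) -> deque:
    """Fill a fixed-length deque with copies of the first frame (hypothetical helper)."""
    return deque([first_frame] * num_frames, maxlen=num_frames)


def push_frame(frames: deque, new_frame: np.ndarray) -> np.ndarray:
    """Append the newest frame (the oldest is dropped) and return the stacked observation."""
    frames.append(new_frame)
    # Concatenate along the channel axis: 3 x (84, 84, 3) -> (84, 84, 9)
    return np.concatenate(list(frames), axis=-1)


# Dummy frames stand in for rendered images here.
frames = make_frame_stack(np.zeros((84, 84, 3), dtype=np.uint8))
obs = push_frame(frames, np.ones((84, 84, 3), dtype=np.uint8))
print(obs.shape)  # (84, 84, 9)
```

Because the deque is created with `maxlen`, appending a new frame discards the oldest one automatically, so the stacked observation always keeps the same shape.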

	import torch
	import numpy as np
	import matplotlib.pyplot as plt

	# Assuming some environment `env` that renders an image...
	im = env.render(mode="rgb_array",
	                height=84,
	                width=84,
	                camera_id=0) / 255.
	im = torch.tensor(im.astype(np.float32))

	from numpy import std, mean, sqrt

	# Correct if the population S.D. is expected to be equal for the two groups.
	def cohen_d(x, y):
	    nx = len(x)
	    ny = len(y)
	    dof = nx + ny - 2
	    return (mean(x) - mean(y)) / sqrt(((nx-1)*std(x, ddof=1) ** 2 + (ny-1)*std(y, ddof=1) ** 2) / dof)
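As a quick sanity check, the pooled-standard-deviation form of Cohen's d can be tried on a tiny hand-computable case. The sketch below restates the function so that it runs on its own, writing the pooled variance with `np.var` instead of squaring `std`. The sample values are made up for illustration.

```python
import numpy as np


def cohen_d(x, y):
    """Cohen's d with a pooled standard deviation (assumes equal population S.D.)."""
    nx, ny = len(x), len(y)
    dof = nx + ny - 2
    # Pooled variance: each sample variance weighted by its degrees of freedom
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / dof
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)


# Made-up samples: both have sample variance 4, and the means differ by 1.
x = [2.0, 4.0, 6.0]
y = [1.0, 3.0, 5.0]
d = cohen_d(x, y)
print(d)  # 0.5
```

Here the pooled standard deviation is 2 and the mean difference is 1, so the effect size comes out to 0.5.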

	# From: https://stackoverflow.com/questions/14313510/how-to-calculate-rolling-moving-average-using-python-numpy-scipy

	def moving_average(x: np.ndarray, w: int) -> np.ndarray:
	    return np.convolve(x, np.ones(w), 'valid') / w
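A short usage sketch may help show what the `'valid'` mode does to the output length. The function is repeated here so the example runs on its own, and the `returns` array is made-up data standing in for something like per-episode returns.

```python
import numpy as np


def moving_average(x: np.ndarray, w: int) -> np.ndarray:
    # Convolution-based moving average, repeated here so the example is self-contained.
    return np.convolve(x, np.ones(w), "valid") / w


returns = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up values for illustration
smoothed = moving_average(returns, w=3)
print(smoothed)  # [2. 3. 4.]
```

With `'valid'`, only windows that fit entirely inside the input are kept, so an input of length N yields N - w + 1 values. When plotting the smoothed curve against the raw one, the x-axis needs to be shortened accordingly.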

	import matplotlib.pyplot as plt
	import numpy as np

	N = 100

	x = np.linspace(0, 6*np.pi, N)

	y = np.sin(x)

	upper = y + 0.2

	# From : https://stackoverflow.com/questions/10003143/how-to-slice-a-deque

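	import collections
	import itertools

	class sliceable_deque(collections.deque):
	    def __getitem__(self, index):
	        try:
	            # Integer indices are handled by the normal deque lookup
	            return collections.deque.__getitem__(self, index)
	        except TypeError:
	            # Slices raise TypeError in deque, so fall back to islice
	            return type(self)(itertools.islice(self, index.start,
	                                               index.stop, index.step))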
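A minimal usage sketch of the sliceable deque follows, with the class repeated so that the example runs on its own. Note that `itertools.islice` does not accept negative values, so slices such as `d[-3:]` are not supported by this approach.

```python
import collections
import itertools


class sliceable_deque(collections.deque):
    def __getitem__(self, index):
        try:
            # Integer indices are handled by the normal deque lookup
            return collections.deque.__getitem__(self, index)
        except TypeError:
            # Slices raise TypeError in deque, so fall back to islice
            return type(self)(itertools.islice(self, index.start, index.stop, index.step))


d = sliceable_deque(range(10))
print(d[3])          # 3
print(list(d[2:5]))  # [2, 3, 4]
print(list(d[::3]))  # [0, 3, 6, 9]
```

The slice result is a new `sliceable_deque` built from the selected items. Any `maxlen` set on the original is not carried over, which may matter if the deque is used as a fixed-size buffer.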