👉 check the drlnd_py310 env setup notes
👉 check the p1 env setup notes
👉 course curriculum
👉 Colab notebooks
Windows 11, VSCode, Miniconda, PowerShell
👉 copy from the env where CUDA and PyTorch have been installed
🟢 conda create --name drlnd_p2 --clone drlnd
(Python 3.6)
(base) PS D:\github\udacity-deep-reinforcement-learning\python> conda create --name drlnd_p2 --clone drlnd
Source: D:\Users\*\miniconda3\envs\drlnd
Destination: D:\Users\*\miniconda3\envs\drlnd_p2
Packages: 159
Files: 13970
- or check how to install CUDA + PyTorch on Windows 11
conda install cuda --channel "nvidia/label/cuda-12.1.0"
- or go to https://pytorch.org/, and select the right version to install
❌ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
🟢 conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install torchmeta
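- a quick sanity check that CUDA is visible to PyTorch (a sketch; run it inside the activated env):
import torch
print(torch.__version__)
print(torch.cuda.is_available())      ## should print True
print(torch.cuda.get_device_name(0))  ## your GPU model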
🟢 Follow these steps to install mujoco-py on Windows
- get mjpro150 win64
- get mjkey.txt
🟢 PowerShell $env:PATH += ";C:\Users\*\.mujoco\mjpro150\bin"
PowerShell $env:path -split ";" to display the PATH entries
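- optionally confirm from Python that the mjpro150 bin dir made it onto PATH (a sketch; the directory name is the one assumed above):
import os
print(any('mjpro150' in p for p in os.environ['PATH'].split(os.pathsep)))  ## should print True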
🟢 download mujoco-py-1.50.1.68.tar.gz
from https://pypi.org/project/mujoco-py/1.50.1.68/#files
pip install "cython<3"
pip install mujoco-py-1.50.1.68.tar.gz
python D:\github\udacity-deep-reinforcement-learning\python\mujoco-py\examples\body_interaction.py
- you might need pip install lockfile and some other packages; install them according to the error messages.
- a worse case is that your Python version is too high (maybe >=3.9?); then you might need to install mujoco_py manually.
- now you should be able to see this.
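- a smaller smoke test than body_interaction.py (a sketch, assuming mjpro150 and mjkey.txt are set up as above):
import mujoco_py

xml = """
<mujoco>
  <worldbody>
    <body name="box" pos="0 0 1">
      <joint type="free"/>
      <geom type="box" size=".1 .1 .1"/>
    </body>
  </worldbody>
</mujoco>
"""
model = mujoco_py.load_model_from_xml(xml)
sim = mujoco_py.MjSim(model)
print(sim.data.qpos)      ## initial pose of the free body
for _ in range(100):
    sim.step()
print(sim.data.qpos)      ## the body should have fallen under gravity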
👉 install gym atari and license
https://stackoverflow.com/a/69602242
pip install -U gym
pip install -U gym[atari,accept-rom-license]
pip install bleach==1.5.0
pip install --upgrade numpy
pip install --upgrade tensorboard
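- quick check that the Atari ROMs are available (a sketch; the env id and the reset() return format vary across gym versions):
import gym
env = gym.make('PongNoFrameskip-v4')   ## any Atari id will do
print(env.action_space, env.observation_space)
env.reset()
env.close()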
👉 install OpenAI Baselines
pip install --upgrade pip setuptools wheel
pip install opencv-python==4.5.5.64
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
- for Python 3.11, you can pip install opencv-python instead; I got opencv-python-4.9.0.80 successfully installed.
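- a minimal import check for the editable baselines install:
import baselines
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
print(baselines.__file__)  ## should point into the cloned repo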
👉 install the rest of the packages for the deeprl folder.
pip install -r .\deeprl_files\requirements.txt
- requirements.txt
# torch
# torchvision
# torchmeta
# gym==0.15.7
# tensorflow==1.15.0
# opencv-python==4.0.0.21
atari-py
scikit-image==0.14.2
tqdm
pandas
pathlib
seaborn
# roboschool==1.0.34
dm-control2gym
tensorflow-io
- for Python 3.11, loosen the version requirement to just scikit-image; I got scikit-image-0.22.0 installed.
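- a rough import check for some of the packages above (note the module names differ from the pip names: scikit-image -> skimage, atari-py -> atari_py):
for mod in ['atari_py', 'skimage', 'tqdm', 'pandas', 'seaborn']:
    __import__(mod)
    print(mod, 'ok')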
👉 test the env setup
- run notebooks
python -m ipykernel install --user --name=drlnd_p2
jupyter notebook D:\github\udacity-deep-reinforcement-learning\p2_continuous-control\Continuous_Control.ipynb
jupyter notebook D:\github\udacity-deep-reinforcement-learning\p2_continuous-control\Crawler.ipynb
🟢 python -m deeprl.component.envs
if __name__ == '__main__':
    import time
    import numpy as np  ## already imported at the top of envs.py
    ## num_envs=5 will only create 3 envs and cause an error at
    ## "results = _flatten_list(results)"
    ## in "baselines\baselines\common\vec_env\subproc_vec_env.py"
    task = Task('Hopper-v2', num_envs=3, single_process=False)
    state = task.reset()
    ## This might be helpful for custom env debugging
    # env_dict = gym.envs.registration.registry.env_specs.copy()
    # for item in env_dict.items():
    #     print(item)
    start_time = time.time()
    while True:
        action = np.random.rand(task.action_space.shape[0])
        next_state, reward, done, _ = task.step(action)
        print(done)
        if time.time() - start_time > 10:  ## run for about 10s
            break
    task.close()
🟢 run examples:
D:\github\udacity-deep-reinforcement-learning\python\deeprl_files\examples.py
if __name__ == '__main__':
    mkdir('log')
    mkdir('tf_log')
    set_one_thread()
    random_seed()
    # -1 is CPU, a non-negative integer is the index of a GPU
    # select_device(-1)
    select_device(0)  ## GPU
    game = 'Reacher-v2'
    # a2c_continuous(game=game)
    # ppo_continuous(game=game)
    ddpg_continuous(game=game)
- you should be able to see something like this in the video.
https://github.com/ShangtongZhang/DeepRL
https://github.com/ChalamPVS/Unity-Reacher
🟢 copied python files from repo @ShangtongZhang/DeepRL to repo @Nov05/udacity-deep-reinforcement-learning under the './python' folder.
DeepRL/template_jobs.py
    ddpg_continuous(game='Reacher-v2', run=0, env=env,
                    remark=ddpg_continuous.__name__)
DeepRL/examples.py
    def ddpg_continuous(**kwargs):
        config.task_fn = lambda: Task(config.game, env=env)
        run_steps(DDPGAgent(config))
deep_rl/utils/config.py
    class Config:
        def __init__(self):
            self.task_fn = None
DeepRL/deep_rl/utils/misc.py
    def run_steps(agent):
        config = agent.config
        agent.step()
deep_rl/agent/DDPG_agent.py
    class DDPGAgent(BaseAgent):
        self.task = config.task_fn()
        def step(self):
deep_rl/component/envs.py
    def make_env(env_id, seed, rank, episode_life=True):
    class Task:
        def __init__(self,
                     name,
                     num_envs=1,
                     env=env,
    if __name__ == '__main__':
        task = Task('Hopper-v2', 5, single_process=False)
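- the call chain above in miniature, with hypothetical stand-ins (not the real deep_rl classes), just to show who calls whom:
class Task:                                  ## stands in for deep_rl's Task
    def __init__(self, name):
        self.name = name

class Config:                                ## stands in for deep_rl's Config
    def __init__(self):
        self.task_fn = None

class DDPGAgent:                             ## stands in for deep_rl's DDPGAgent
    def __init__(self, config):
        self.config = config
        self.task = config.task_fn()         ## the env is built here, lazily
    def step(self):
        print('stepping in', self.task.name)

def run_steps(agent):                        ## stands in for deep_rl's run_steps
    for _ in range(3):                       ## the real loop runs until max_steps
        agent.step()

config = Config()
config.task_fn = lambda: Task('Reacher-v2')
run_steps(DDPGAgent(config))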
🟢 bug 2 debugging: Python multiprocessing on Windows.
❌ solution:
pip install multiprocess
Successfully installed multiprocess-0.70.16
Solution: I have no idea why this fixes the problem, but switching from multiprocessing.Queue to queue.Queue and multiprocessing.Process to threading.Thread did it. (Stack Overflow, "Multiprocessing vs Threading in Python")
Incorrect Argument Order for Calls to _winapi.DuplicateHandle() in multiprocessing.reduction.DupHandle
@bastiaan If you have a module a that defines a function def g(): def f(): ... how can you import f from a? You cannot. On Windows, due to fundamental limitations of Windows, that is what Python will do when executing the callback passed to p.map! The function is "pickled" by name, then sent to the subprocess. The subprocess hence receives only the name of the function and the module where it is defined and imports it, and then performs the real call and sends back the result. But for this to work the callback has to be importable. – Bakuriu, Mar 2, 2017 at 18:08
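- a toy sketch of the swap that worked (hypothetical code, not the upstream baselines worker); a thread can receive a closure directly, while a Windows subprocess only gets the callable's name and must re-import it:
import queue, threading

def worker(task_q, result_q):
    ## thread-based stand-in for a subprocess env worker
    while True:
        fn = task_q.get()
        if fn is None:                 ## sentinel: shut down
            break
        result_q.put(fn())

task_q, result_q = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(task_q, result_q))
t.start()
task_q.put(lambda: 1 + 1)              ## a lambda is fine here; pickling it for
print(result_q.get())                  ## a subprocess would fail on Windows
task_q.put(None)
t.join()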
python -m experiments.deeprl_multiprocessing
EOFError: Ran out of input
means the pickle file isn't right (it probably doesn't exist).
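- the symptom is easy to reproduce: unpickling an empty stream raises exactly this error:
import pickle, io
try:
    pickle.load(io.BytesIO(b''))       ## stands in for a missing/empty pickle file
except EOFError as e:
    print(e)                           ## "Ran out of input"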