Chenjia Bai Baichenjia

Embodied AI, Reinforcement Learning, LLMs

Baichenjia / trans_file.py

Created August 31, 2020 01:48

Transfer log files from the server

	import glob
	import os
	from gym import envs
	import shutil

	# collect all valid env names
	envnames = []
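	for env_spec in envs.registry.all():
	    name = env_spec.id
	    # keep only the non-RAM, non-Deterministic "-v4" variants
	    if 'ram' not in name and '-v4' in name and 'Deterministic' not in name:
	        envnames.append(name)  # assumed continuation: the preview is cut off after the `if`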

Baichenjia / grammarly.py

Created May 17, 2020 12:30

Grammarly Code

Baichenjia / example1.py

Last active October 25, 2021 11:16

Bootstrapped Q-function visualization

	import numpy as np
	import matplotlib.pyplot as plt
	import tensorflow as tf
	config = tf.ConfigProto()
	config.gpu_options.visible_device_list = '0'
	config.gpu_options.allow_growth = True
	tf.enable_eager_execution(config=config)

	PRIOR_SCALE = 2.

Baichenjia / gist:fc5609335f515743a60373b9705d4c74

Created August 2, 2017 07:12

Cross-Entropy Method

	# Paper: Szita I, Lörincz A. Learning Tetris using the noisy cross-entropy method. Neural Computation, 2006, 18(12): 2936-2941.
	# code: https://gym.openai.com/evaluations/eval_HIz0KjtWSvW06yKKPiaF5A

	# Notes
	# 1. Linear function approximation: the number of parameters to learn is the feature dimension of the state s, plus 1 for the bias.
	# 2. The Cross-Entropy Method is an evolutionary algorithm that searches for the optimal parameters iteratively.
	#    First, "batch_size" parameter vectors are sampled from a normal distribution with the current \mu and \sigma (initially their starting values).
	#    Each vector is then scored by an evaluation function, and the top "n_elite" vectors, ranked by score, are picked.
	#    The n_elite vectors are used to estimate the new \mu and \sigma.
	# 3. Repeat step 2 to keep updating \mu and \sigma.
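
The sampling, ranking, and refitting loop described in the notes can be sketched as follows. This is a minimal illustration, not the gist's actual code: the function name `cross_entropy_method`, its default arguments, and the toy `evaluate` function (a stand-in for an episode return) are all assumptions. It implements the plain method, without the added noise term from the cited paper.

```python
import numpy as np


def cross_entropy_method(evaluate, dim, batch_size=50, n_elite=10,
                         n_iter=30, init_sigma=1.0, seed=0):
    """Search for a parameter vector with a high evaluate(theta) score."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)                  # initial mean (assumed to be zero)
    sigma = np.full(dim, init_sigma)    # initial per-dimension std (assumed)
    for _ in range(n_iter):
        # Sample batch_size candidate vectors from N(mu, sigma^2).
        thetas = mu + sigma * rng.standard_normal((batch_size, dim))
        # Score every candidate with the evaluation function.
        scores = np.array([evaluate(theta) for theta in thetas])
        # Keep the n_elite highest-scoring candidates.
        elite = thetas[np.argsort(scores)[-n_elite:]]
        # Re-estimate mu and sigma from the elite set.
        mu = elite.mean(axis=0)
        sigma = elite.std(axis=0)
    return mu, sigma


# Hypothetical toy objective: the score is highest at `target`.
# dim = 5 corresponds to a 4-dimensional state plus 1 bias (note 1).
target = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
mu, sigma = cross_entropy_method(lambda th: -np.sum((th - target) ** 2), dim=5)
```

In a reinforcement-learning setting, `evaluate` would run one or more episodes with the linear policy defined by `theta` and return the total reward. Because `sigma` is refit from the elite set alone, it can shrink quickly; comparing the returned `mu` against `target` shows how far the search got before it collapsed.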