Vaibhav Kumar (@vaibkumr)
import torch
# Name the dimensions of a tensor at construction time
batch = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'H', 'W'))
print(batch.shape)  # torch.Size([64, 3, 100, 100])
# Permute dimensions by name instead of by positional index
batch = batch.align_to('N', 'H', 'W', 'C')
print(batch.shape)  # torch.Size([64, 100, 100, 3])
import torch
batch = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'H', 'W'))
print(batch.names)  # ('N', 'C', 'H', 'W')
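Names also propagate through most operations, so reductions can be written against dimension names rather than indices. A minimal sketch of this (named tensors are a prototype PyTorch feature, so the exact API may shift between releases):

import torch
imgs = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'H', 'W'))
# Reduce over the channel dimension by name rather than by index
flat = imgs.sum('C')
print(flat.names)  # ('N', 'H', 'W')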
{"help":{
"help":"display the list of commands and their functions",
"info" : "Fetch personal information",
"clear" : "Clear screen",
"all" : "Print all information",
"contact" : "Fetch contact details",
"projects" : "Fetch personal information",
"technical_strengths" : "Print technical strengths ",
"publications" : "Print publications",
"any other command" : "command detail"
import gym
import numpy as np
import time
"""
Q-learning off-policy learning python implementation.
This is a python implementation of the Q-learning algorithm from Sutton and
Barto's book on RL. The only difference between SARSA and Q-learning is that
SARSA takes the next action based on the current policy, while Q-learning
takes the action with the maximum utility of the next state.
"""
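The off-policy target bootstraps from the greedy action regardless of what the agent actually does next. A minimal sketch of a single update, where alpha and gamma are illustrative names for the learning rate and discount factor:

import numpy as np

def q_update(Q, s, a, r, s2, alpha=0.1, gamma=0.99):
    # Q is a |S| x |A| table; the target uses the best action in s2,
    # not the action the behavior policy will actually take
    target = r + gamma * np.max(Q[s2, :])
    Q[s, a] += alpha * (target - Q[s, a])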
import gym
import numpy as np
import time
"""
SARSA on-policy learning python implementation.
This is a python implementation of the SARSA algorithm from Sutton and Barto's
book on RL. It's called SARSA because of the quintuple it learns from:
(state, action, reward, state, action). The only difference between SARSA and
Q-learning is that SARSA takes the next action based on the current policy,
while Q-learning takes the action with the maximum utility of the next state.
"""
def epsilon_greedy(Q, epsilon, n_actions, s, train=False):
    """
    @param Q         Q values, state x action -> value
    @param epsilon   probability of taking a random (exploratory) action
    @param n_actions number of available actions
    @param s         current state
    @param train     if true then no random actions selected
    """
    if train or np.random.rand() >= epsilon:
        action = np.argmax(Q[s, :])               # exploit: greedy action
    else:
        action = np.random.randint(0, n_actions)  # explore: random action
    return action
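Putting the pieces together, a minimal on-policy training loop sketch built on the epsilon_greedy above; FrozenLake-v0 and the alpha/gamma/epsilon values are illustrative choices, and the loop assumes the pre-0.26 gym reset/step API that this 2019-era code targets:

env = gym.make('FrozenLake-v0')
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1
for episode in range(1000):
    s = env.reset()
    a = epsilon_greedy(Q, epsilon, env.action_space.n, s)
    done = False
    while not done:
        s2, r, done, _ = env.step(a)
        a2 = epsilon_greedy(Q, epsilon, env.action_space.n, s2)
        # On-policy target: bootstraps from the action a2 the policy will take
        Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])
        s, a = s2, a2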
@vaibkumr
vaibkumr / backward.py
Created January 7, 2019 15:12
backward function
import torch
# Creating the graph
x = torch.tensor(1.0, requires_grad=True)
z = x ** 3
z.backward()   # Computes the gradient dz/dx
print(x.grad)  # tensor(3.), since dz/dx = 3*x**2 = 3 at x = 1
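The same call handles expressions of several leaf tensors at once; a small follow-up sketch (the variable names and values are illustrative):

import torch
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = x ** 2 * y  # dz/dx = 2*x*y, dz/dy = x**2
z.backward()    # Fills .grad for every leaf that requires grad
print(x.grad)   # tensor(12.)
print(y.grad)   # tensor(4.)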
@vaibkumr
vaibkumr / tracking.py
Last active January 6, 2019 12:34
Check if tracking is enabled
import torch
# Creating the graph
x = torch.tensor(1.0, requires_grad=True)
# Check if tracking is enabled
print(x.requires_grad)  # True
y = x * 2
print(y.requires_grad)  # True
with torch.no_grad():
    # Inside no_grad, results of operations are not tracked
    y = x * 2
    print(y.requires_grad)  # False
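detach() offers a per-tensor alternative to the no_grad context; a minimal sketch:

import torch
x = torch.tensor(1.0, requires_grad=True)
y = (x * 2).detach()    # y shares data with x*2 but is cut out of the graph
print(y.requires_grad)  # False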