Vaibhav Kumar (@vaibkumr)
import torch
# Name the dimensions of a tensor at construction time
batch = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'H', 'W'))
print(batch.shape)  # torch.Size([64, 3, 100, 100])
# Permute dimensions by name instead of by positional index
batch = batch.align_to('N', 'H', 'W', 'C')
print(batch.shape)  # torch.Size([64, 100, 100, 3])
import torch
batch = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'H', 'W'))
print(batch.names)  # ('N', 'C', 'H', 'W')
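Names also propagate through most operations, so reductions can be written against dimension names rather than indices. A minimal sketch of this (named tensors are a prototype PyTorch feature, so the exact API may shift between releases):

import torch
imgs = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'H', 'W'))
# Reduce over the channel dimension by name rather than by index
flat = imgs.sum('C')
print(flat.names)  # ('N', 'H', 'W')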
{"help":{
"help":"display the list of commands and their functions",
"info" : "Fetch personal information",
"clear" : "Clear screen",
"all" : "Print all information",
"contact" : "Fetch contact details",
"projects" : "Fetch personal information",
"technical_strengths" : "Print technical strengths ",
"publications" : "Print publications",
"any other command" : "command detail"
import gym
import numpy as np
import time
"""
Q-learning off-policy learning python implementation.
This is a python implementation of the Q-learning algorithm from Sutton and
Barto's book on RL. The only difference between SARSA and Q-learning is that
SARSA takes the next action based on the current policy, while Q-learning
takes the action with the maximum utility of the next state.
"""
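The off-policy target bootstraps from the greedy action regardless of what the agent actually does next. A minimal sketch of a single update, where alpha and gamma are illustrative names for the learning rate and discount factor:

import numpy as np

def q_update(Q, s, a, r, s2, alpha=0.1, gamma=0.99):
    # Q is a |S| x |A| table; the target uses the best action in s2,
    # not the action the behavior policy will actually take
    target = r + gamma * np.max(Q[s2, :])
    Q[s, a] += alpha * (target - Q[s, a])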
import gym
import numpy as np
import time
"""
SARSA on-policy learning python implementation.
This is a python implementation of the SARSA algorithm from Sutton and Barto's
book on RL. It's called SARSA because of the quintuple it learns from:
(state, action, reward, state, action). The only difference between SARSA and
Q-learning is that SARSA takes the next action based on the current policy,
while Q-learning takes the action with the maximum utility of the next state.
"""
def epsilon_greedy(Q, epsilon, n_actions, s, train=False):
    """
    @param Q         Q values, state x action -> value
    @param epsilon   probability of taking a random (exploratory) action
    @param n_actions number of available actions
    @param s         current state
    @param train     if true then no random actions selected
    """
    if train or np.random.rand() >= epsilon:
        action = np.argmax(Q[s, :])               # exploit: greedy action
    else:
        action = np.random.randint(0, n_actions)  # explore: random action
    return action
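Putting the pieces together, a minimal on-policy training loop sketch built on the epsilon_greedy above; FrozenLake-v0 and the alpha/gamma/epsilon values are illustrative choices, and the loop assumes the pre-0.26 gym reset/step API that this 2019-era code targets:

env = gym.make('FrozenLake-v0')
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1
for episode in range(1000):
    s = env.reset()
    a = epsilon_greedy(Q, epsilon, env.action_space.n, s)
    done = False
    while not done:
        s2, r, done, _ = env.step(a)
        a2 = epsilon_greedy(Q, epsilon, env.action_space.n, s2)
        # On-policy target: bootstraps from the action a2 the policy will take
        Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])
        s, a = s2, a2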
@vaibkumr
vaibkumr / backward.py
Created January 7, 2019 15:12
backward function
import torch
# Creating the graph
x = torch.tensor(1.0, requires_grad=True)
z = x ** 3
z.backward()   # Computes the gradient dz/dx
print(x.grad)  # tensor(3.), since dz/dx = 3*x**2 = 3 at x = 1
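The same call handles expressions of several leaf tensors at once; a small follow-up sketch (the variable names and values are illustrative):

import torch
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = x ** 2 * y  # dz/dx = 2*x*y, dz/dy = x**2
z.backward()    # Fills .grad for every leaf that requires grad
print(x.grad)   # tensor(12.)
print(y.grad)   # tensor(4.)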
@vaibkumr
vaibkumr / tracking.py
Last active January 6, 2019 12:34
Check if tracking is enabled
import torch
# Creating the graph
x = torch.tensor(1.0, requires_grad=True)
# Check if tracking is enabled
print(x.requires_grad)  # True
y = x * 2
print(y.requires_grad)  # True
with torch.no_grad():
    # Inside no_grad, results of operations are not tracked
    y = x * 2
    print(y.requires_grad)  # False
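detach() offers a per-tensor alternative to the no_grad context; a minimal sketch:

import torch
x = torch.tensor(1.0, requires_grad=True)
y = (x * 2).detach()    # y shares data with x*2 but is cut out of the graph
print(y.requires_grad)  # False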