@wanchaol
wanchaol / MultiColumnLabelEncoder.py
Created March 7, 2016 00:51
Transform categorical features to numerical features
## Credit to: http://stackoverflow.com/questions/24458645/label-encoding-across-multiple-columns-in-scikit-learn
import pandas as pd
from sklearn.preprocessing import LabelEncoder

class MultiColumnLabelEncoder:
    def __init__(self, columns=None):
        self.columns = columns  # column names to encode; None means all columns

    def fit(self, X, y=None):
        return self  # stateless; the encoding happens in transform
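The preview cuts off before the transform step. To make the behavior concrete, here is a minimal pure-Python sketch of what label-encoding a single column does (`label_encode_column` is a hypothetical helper, not part of the gist; sklearn's LabelEncoder likewise assigns integers to classes in sorted order):

```python
def label_encode_column(values):
    # Assign each distinct value an integer, in sorted order
    # (mirroring sklearn.preprocessing.LabelEncoder's class ordering).
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]

print(label_encode_column(["paris", "tokyo", "paris", "amsterdam"]))
# [1, 2, 1, 0]
```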
@wanchaol
wanchaol / tmux.conf
Created September 6, 2016 05:56 — forked from rajanand02/tmux.conf
Tmux configurations with status bar theme
# set prefix to control-f
set -g prefix C-f
#unbind system defined prefix
unbind C-b
# helps in faster key repetition
set -sg escape-time 0
# start session number from 1 rather than 0
set -g base-index 1
@wanchaol
wanchaol / jekyll.py
Created September 21, 2016 21:37 — forked from cscorley/jekyll.py
IPython to Jekyll Markdown
try:
    from urllib.parse import quote  # Py 3
except ImportError:
    from urllib2 import quote  # Py 2

import os
import sys

BLOG_DIR = os.environ['BLOG_DIR']
# BLOG_DIR = '/Users/cscorley/git/cscorley.github.io/'
@wanchaol
wanchaol / gist:c3f052ca773862903be056567de74a32
Created November 28, 2016 05:52 — forked from zackdever/gist:8701478
arc diff off another diff
taken directly from https://sites.google.com/a/khanacademy.org/forge/for-developers/code-review-policy/using-phabricator
Advanced topic: Dependent Phabricator reviews
Say you have an upstream called master, a feature branch F1, and a second change that depends on F1 (call it F2).
git checkout master
git checkout -b F1
# work work
git commit -a
arc diff
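The preview stops after the F1 review is created. The dependent F2 review continues along the same lines (a sketch of the workflow the linked Khan Academy doc describes; `arc diff F1` makes arc compute the diff against F1 instead of master, so the review contains only F2's changes):

```shell
git checkout -b F2   # branch off F1, which is still checked out
# work work
git commit -a
arc diff F1          # diff against F1 rather than master
```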
NO_CUDA=1 DEBUG=1 CC=clang CXX=clang++ python setup.py clean install:
running clean
running install
running build_deps
+ USE_CUDA=0
+ USE_ROCM=0
+ USE_NNPACK=0
+ USE_MKLDNN=0
+ USE_GLOO_IBVERBS=0

A Tour of PyTorch Internals (Part I)

The fundamental unit in PyTorch is the Tensor. This post serves as an overview of how Tensors are implemented in PyTorch, so that the user can interact with them from the Python shell. In particular, we want to answer four main questions:

  1. How does PyTorch extend the Python interpreter to define a Tensor type that can be manipulated from Python code?
  2. How does PyTorch wrap the C libraries that actually define the Tensor's properties and methods?
  3. How does PyTorch cwrap work to generate code for Tensor methods?
  4. How does PyTorch's build system take all of these components to compile and generate a workable application?

Extending the Python Interpreter

PyTorch defines a new package torch. In this post we will consider the ._C module. This module is known as an "extension module" - a Python module written in C. Such modules allow us to define new built-in object types (e.g. the Tensor) and to call C/C++ functions.
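As a quick illustration of what an extension module looks like from the Python side (using only the standard library here, not torch._C itself), CPython identifies compiled extension modules by platform-specific shared-library filename suffixes:

```python
import importlib.machinery

# Compiled extension modules (like torch._C) are shared libraries; CPython
# recognizes them by these filename suffixes (e.g. '.so' on Linux).
suffixes = importlib.machinery.EXTENSION_SUFFIXES
print(suffixes)
```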

import torch
x = torch.randn(3, 3, requires_grad=True)
print(x)
min = float('nan')
max = 0.0
y = torch.clamp(x, min, max)
print('y', y)
y.sum().backward()
@wanchaol
wanchaol / cheat_sheet.txt
Created July 23, 2018 19:05
GDB cheat sheet
GDB commands by function - simple guide
---------------------------------------
More important commands have a (*) by them.
Startup
% gdb -help print startup help, show switches
*% gdb object normal debug
*% gdb object core core debug (must specify core file)
% gdb object pid attach to running process
% gdb use file command to load object
In [1]: import torch
In [2]: input = torch.tensor([[0.2, -0.2, 0.07]], requires_grad=True)
...: target = torch.tensor([[0, 0, 1]])
...: outputs = torch.nn.functional.multilabel_margin_loss(input, target)
...:
...:
In [3]: outputs
Out[3]: tensor(1.0033, grad_fn=<MultilabelMarginLossBackward>)
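The value above can be checked by hand. A pure-Python sketch of the computation (my reading of the multilabel margin loss: target entries are read until the first -1; for each listed target index j and each non-target class i, add max(0, 1 - (x[j] - x[i])), then divide by the number of classes):

```python
def multilabel_margin_loss_1d(x, y):
    # Collect target indices; reading stops at the first -1 terminator.
    labels = []
    for t in y:
        if t == -1:
            break
        labels.append(t)
    target_set = set(labels)
    loss = 0.0
    for j in labels:                 # each listed target index
        for i in range(len(x)):      # every non-target class
            if i not in target_set:
                loss += max(0.0, 1.0 - (x[j] - x[i]))
    return loss / len(x)

print(round(multilabel_margin_loss_1d([0.2, -0.2, 0.07], [0, 0, 1]), 4))
# 1.0033, matching Out[3] above
```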

Torch NN library support in JIT Script

The torch nn library is the first library we want to support in JIT Script. It consists of modules including activation, conv, rnn, loss, etc. These modules inherit from torch.nn.Module; each wraps things up in its own class (argument preprocessing, etc.) and then calls into nn.functional to do the real work. nn.functional is also exposed to users directly.

Proposed plan:

  1. identify the differences between the nn functional ops and the corresponding ATen ops; a PR is already out: pytorch/pytorch#10409
  2. maintain a separate copy of nn.functional that carries script annotations, applying workarounds where needed so that every function generates IR
  3. open registration for nn.functional ops (and possibly nn modules), so that we can inline the graph directly in C++
  4. script nn.modules so that each submodule runs in script mode.