Mehdi Cherti mehdidc

tmux cheatsheet

As configured in my dotfiles.

start new:

tmux

start new with session name:

Cython

Cython has two major benefits:

Making Python code faster, particularly for things that can't be done in SciPy/NumPy
Wrapping/interfacing with C/C++ code

Cython gains most of its benefit from statically typing arguments. However, static typing is not required; in fact, regular Python code is valid Cython (but don't expect much of a speedup). By incrementally adding more type information, the code can become several times faster. This gist covers only very basic usage of Cython.
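As a minimal sketch of the "regular Python is valid Cython" point: the function below is plain Python that would also compile unchanged as a `.pyx` file. The name `fib` and the typed variant shown in the comments are illustrative assumptions, not taken from the gist.

```python
# Plain Python: this exact source is also valid Cython and compiles as-is,
# though without type information the speedup is small.
def fib(n):
    """Return the n-th Fibonacci number, with fib(0) == 0."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# A typed Cython variant (for a .pyx file, not runnable as Python) declares
# C types so that the loop compiles down to plain C arithmetic:
#
#     def fib(int n):
#         cdef int i
#         cdef long a = 0, b = 1
#         for i in range(n):
#             a, b = b, a + b
#         return a

if __name__ == "__main__":
    print(fib(10))  # prints 55
```

The typed version trades Python's arbitrary-precision integers for fixed-width C integers, so it is only appropriate when the values are known to fit.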

What / Why

A deploy key is an SSH key attached to your repo that grants a client read-only access (or read/write access, if you want).

As the name says, its primary use is in the deploy process, where only read access is needed. This keeps the repo safe from attack if the server is compromised.

How to

Generate an SSH key

(Internal Training Material)

Usually the first step in performance optimization is profiling, for example to identify the performance hotspots of a workload. This gist covers the basics of performance profiling on PyTorch. You will learn:

How to find the bottleneck operator?
How to trace source file of a particular operator?
How do I identify threading issues? (oversubscription)
How do I tell whether a specific operator is running efficiently?

This tutorial uses one of my recent projects, pssp-transformer, as an example to guide you through PyTorch CPU performance optimization. The focus is on Part 1 & Part 2.

	# docker build -t ubuntu1604py36 .
	FROM ubuntu:16.04

	RUN apt-get update
	RUN apt-get install -y software-properties-common vim
	RUN add-apt-repository ppa:jonathonf/python-3.6
	RUN apt-get update

	RUN apt-get install -y build-essential python3.6 python3.6-dev python3-pip python3.6-venv
	RUN apt-get install -y git

	"""
	Example TensorFlow script for finetuning a VGG model on your own data.
	Uses tf.contrib.data module which is in release v1.2
	Based on PyTorch example from Justin Johnson
	(https://gist.github.com/jcjohnson/6e41e8512c17eae5da50aebef3378a4c)

	Required packages: tensorflow (v1.2)
	Download the weights trained on ImageNet for VGG:
	```
	wget http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz

	"""
	A bare-bones example of optimizing a black-box function (f) using
	Natural Evolution Strategies (NES), where the parameter distribution is a
	Gaussian of fixed standard deviation.
	"""

	import numpy as np
	np.random.seed(0)

	# the function we want to optimize
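The preview above is cut off before the algorithm itself. As a rough sketch of the idea the docstring describes (a Gaussian search distribution with fixed standard deviation whose mean is moved toward better-scoring samples), the following uses only the standard library rather than NumPy. The objective `f`, its `target`, and every hyperparameter value here are assumptions chosen for illustration, not the gist's own.

```python
import random


def f(w, target=(0.5, 0.1, -0.3)):
    """Hypothetical black-box reward: negative squared distance to target."""
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, target))


def nes(f, w, npop=50, sigma=0.1, alpha=0.001, iterations=300, seed=0):
    """Maximize f with a fixed-sigma Gaussian search distribution."""
    rng = random.Random(seed)
    w = list(w)
    dim = len(w)
    for _ in range(iterations):
        # Sample npop perturbation directions from a standard normal.
        noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(npop)]
        # Evaluate the black-box function at each perturbed parameter vector.
        rewards = [f([wi + sigma * ni for wi, ni in zip(w, n)]) for n in noise]
        # Standardize rewards so the step size does not depend on their scale.
        mean = sum(rewards) / npop
        std = (sum((r - mean) ** 2 for r in rewards) / npop) ** 0.5
        if std == 0.0:
            continue  # all samples scored the same; no direction to follow
        advantages = [(r - mean) / std for r in rewards]
        # Move the mean along the reward-weighted sum of the perturbations.
        for j in range(dim):
            grad = sum(noise[i][j] * advantages[i] for i in range(npop))
            w[j] += alpha * grad / (npop * sigma)
    return w


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0]
    print(f(start), f(nes(f, start)))
```

Only evaluations of `f` are used, never its gradient, which is what makes the approach applicable to black-box functions.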

	#!/bin/bash
	# This file sets the environment variable CUDA_VISIBLE_DEVICES to the MPI local rank to enable multi-GPU usage of this benchmark. Note that this disables any GPU distribution handling by the batch scheduler.
	# Background: Most/some batch schedulers set CUDA_VISIBLE_DEVICES to all available GPUs on a node. In that case, the Arbor benchmark would only use the first entry in the list, probably GPU#0. This script changes that.
	# -Andreas Herten, Nov 2018

	_verbose=1

	localrank=$CUDA_VISIBLE_DEVICES

	if [[ -n "$OMPI_COMM_WORLD_NODE_RANK" ]]; then
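The script preview ends here. The mapping described in its header comments (one GPU per process, selected by the process's rank on its node) can be sketched in Python as follows. `OMPI_COMM_WORLD_NODE_RANK` is the variable the script tests; the two additional variables and the helper names `local_rank` and `pin_gpu` are assumptions for illustration only.

```python
import os

# Per-node rank variables, checked in order. OMPI_COMM_WORLD_NODE_RANK is the
# one the script tests; the other two (Open MPI local rank, Slurm local task
# ID) are assumptions added for illustration.
RANK_VARIABLES = (
    "OMPI_COMM_WORLD_NODE_RANK",
    "OMPI_COMM_WORLD_LOCAL_RANK",
    "SLURM_LOCALID",
)


def local_rank(env):
    """Return the first per-node rank found in env, or None."""
    for name in RANK_VARIABLES:
        value = env.get(name, "")
        if value.isdigit():
            return int(value)
    return None


def pin_gpu(env):
    """Return a copy of env with CUDA_VISIBLE_DEVICES set to the local rank.

    If no rank variable is present, the environment is returned unchanged.
    """
    pinned = dict(env)
    rank = local_rank(env)
    if rank is not None:
        pinned["CUDA_VISIBLE_DEVICES"] = str(rank)
    return pinned


if __name__ == "__main__":
    print(pin_gpu(dict(os.environ)).get("CUDA_VISIBLE_DEVICES"))
```

As with the shell version, the variable has to be set before the process initializes CUDA, which is why such logic usually lives in a wrapper that launches the real program.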

	#!/usr/bin/env bash
	set -e

	cd

	case "$OSTYPE" in
	darwin*) DOWNLOAD=https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh ;;
	linux*) DOWNLOAD=https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh ;;
	*) echo "unknown: $OSTYPE" >&2; exit 1 ;;
	esac

	# use ImageMagick convert
	# The order is important: -density is a read setting for input.pdf, so it must come before it; operators such as -rotate come after the input and affect output.pdf.
	convert -density 90 input.pdf -rotate 0.5 -attenuate 0.2 +noise Multiplicative -colorspace Gray output.pdf