Seyyed Hossein Hasanpour Coderx7

General guidelines for CPU performance on PyTorch

This file serves as a BKM (best known method) for getting better CPU performance with PyTorch, mostly focusing on inference or deployment. Chinese version available here.

1. Use mkldnn layout

Layout refers to how data is organized in a tensor. The PyTorch default layout is NCHW. For optimization purposes, the MKL-DNN library (recently renamed DNNL) may choose a different layout, sometimes referred to as an internal layout or primitive layout. This is a normal technique for acceleration libraries: it is common knowledge that NHWC runs faster than NCHW for convolution, and changing the default NCHW to NHWC is called a reorder. MKL-DNN may choose different internal layouts based on the input pattern and the algorithm selected, e.g. nChw16c, which reorders a 4-dim tensor into a 5-dim one by splitting dimension C into blocks of 16 for vectorization (an AVX512 register holds 16 x 32-bit values).
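To make the layouts above concrete, here is a minimal pure-Python sketch that computes where a logical element (n, c, h, w) lands in memory under NCHW, NHWC, and a blocked nChw16c-style layout. The function names are hypothetical and are not part of MKL-DNN or PyTorch. The sketch also assumes C is a multiple of 16; how the real library handles other channel counts is not covered here.

```python
# Illustration only: element offsets under three memory layouts.
# These helpers are hypothetical; they are not MKL-DNN or PyTorch APIs.

BLOCK = 512 // 32  # a 512-bit AVX512 register holds 16 x 32-bit values


def offset_nchw(n, c, h, w, C, H, W):
    """Offset in the default PyTorch layout: W varies fastest, then H, then C."""
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, C, H, W):
    """Offset in NHWC: all channels of one pixel are contiguous."""
    return ((n * H + h) * W + w) * C + c


def offset_nchw16c(n, c, h, w, C, H, W, block=BLOCK):
    """Offset in a blocked layout: C is split into C // block outer blocks,
    and the `block` channels inside each block are stored innermost.
    Assumption: C is a multiple of `block`."""
    outer, inner = divmod(c, block)
    return (((n * (C // block) + outer) * H + h) * W + w) * block + inner


if __name__ == "__main__":
    C, H, W = 32, 4, 4
    # Distance in memory between channel 0 and channel 1 of the same pixel.
    for name, fn in [("NCHW", offset_nchw),
                     ("NHWC", offset_nhwc),
                     ("nChw16c", offset_nchw16c)]:
        stride = fn(0, 1, 0, 0, C, H, W) - fn(0, 0, 0, 0, C, H, W)
        print(name, "channel stride:", stride)
```

In NCHW, neighbouring channels of one pixel sit H*W elements apart, while in the blocked layout the 16 channels of a block are adjacent, so one vector load can fetch them together. A reorder is then just a copy from one offset function to the other.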

By default on CPU, conv2d will ru

	package main

	import (
	"github.com/kardianos/service"
	"log"
	"flag"
	)

	type Service struct {}

	#include <Python.h>
	#include <stdio.h>
	/*
	* gcc embpython.c -I/usr/include/python2.7 -lpython
	**/
	void loadModule()
	{
	/* run objects with low-level calls */
	char *arg1="sir", *arg2="robin", *cstr;
	printf("Load Module err!\n");

	/* Example of embedding Python in another program */
	// to compile run:
	// gcc -o test $(python-config --cflags) test.c $(python-config --ldflags) && ./test

	#include<stdio.h>
	#include "Python.h"

	void initxyzzy(void); /* Forward */

	int main(int argc, char **argv)

	workbench.main.js:3313 [Extension Host] debugger listening on port 36301
	workbench.main.js:238 [Extension Host] [vscode-icons] v8.7.0 activated!
	workbench.main.js:238 [Extension Host] (node:12808) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
	t.log @ workbench.main.js:238
	workbench.main.js:238 [Extension Host] Info Python Extension: 2019-06-07 22:02:44: Display locator refreshing progress, Class name = p, completed in 1ms, , Return Value: undefined
	workbench.main.js:238 [Extension Host] Info Python Extension: 2019-06-07 22:02:44: Notify locators are locating, Class name = p, completed in 1ms, , Return Value: undefined
	workbench.main.js:238 [Extension Host] Info Python Extension: 2019-06-07 22:02:44: Checking whether locactors have completed locating, Class name = p, completed in 0ms, , Return Value: false
	workbench.main.js:238 [Extension Host] Info Python Extension: 2019-06-07 22

	workbench.main.js:3313 [Extension Host] debugger listening on port 41068
	workbench.main.js:3311 Extension Host
	workbench.main.js:3311 Debugger listening on ws://127.0.0.1:41068/e4a995f0-d848-45d7-836c-3acdce098a58
	For help, see: https://nodejs.org/en/docs/inspector

	workbench.main.js:238 [Extension Host] (node:22448) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
	t.log @ workbench.main.js:238
	workbench.main.js:238 [Extension Host] [vscode-icons] v8.7.0 activated!
	workbench.main.js:238 [Extension Host] Info Python Extension: 2019-06-07 21:52:01: Display locator refreshing progress, Class name = p, completed in 1ms, , Return Value: undefined
	workbench.main.js:238 [Extension Host] Info Python Extension: 2019-06-07 21:52:01: Notify locators are locating, Class name = p, completed in 2ms, , Return Value: undefined

	workbench.main.js:3272 [Extension Host] debugger listening on port 16367
	workbench.main.js:3270 Extension Host
	workbench.main.js:3270 Debugger listening on ws://127.0.0.1:16367/d86fbd32-94fb-45fe-a384-4880d4073811
	For help, see: https://nodejs.org/en/docs/inspector

	workbench.main.js:238 [Extension Host] [vscode-icons] v8.7.0 activated!
	workbench.main.js:238 [Extension Host] (node:8044) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
	t.log @ workbench.main.js:238
	workbench.main.js:238 [Extension Host] Info Python Extension: 2019-06-04 01:29:01: Display locator refreshing progress, Class name = p, completed in 1ms, , Return Value: undefined
	workbench.main.js:238 [Extension Host] Info Python Extension: 2019-06-04 01:29:01: Notify locators are locating, Class name = p, completed in 2ms, , Return Value: undefined

	=> creating model 'simplenetv1_imagenet_3p'
	=> Model : simplenetv1_imagenet_3p(
	(features): Sequential(
	(0): Conv2d(3, 64, kernel_size=[3, 3], stride=(2, 2), padding=(1, 1))
	(1): BatchNorm2d(64, eps=1e-05, momentum=0.05, affine=True)
	(2): ReLU(inplace)
	(3): Conv2d(64, 128, kernel_size=[3, 3], stride=(2, 2), padding=(1, 1))
	(4): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True)
	(5): ReLU(inplace)
	(6): Conv2d(128, 128, kernel_size=[3, 3], stride=(1, 1), padding=(1, 1))

	# https://github.com/pytorch/vision/blob/master/torchvision/models/__init__.py
	import argparse
	import datetime
	import os
	import pdb
	import random
	import shutil
	import sys
	import time

	import torch
	import torch.nn as nn
	import torch.nn.parallel