- Try more architectures
- Basic architectures are sometimes better
- Try other forms of ensembling besides cross-validation (CV) averaging
- Blend model predictions with linear regression
- Rely more on shakeup predictions
- Make sure copied code is correct
- Avoid extensively tuning hyperparameters
- Pay more attention to correlations between folds
- Optimizing thresholds can lead to "brittle" models
- Different random initializations between folds might help diversity
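The "blend with linear regression" tip can be sketched as follows. This is a minimal illustration using scikit-learn; the predictions and labels are placeholder data, and in practice you would fit the blender on out-of-fold predictions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Out-of-fold predictions from two hypothetical base models on a validation set.
preds_model_a = np.array([0.2, 0.8, 0.4, 0.9])
preds_model_b = np.array([0.3, 0.7, 0.5, 0.8])
y_true = np.array([0.0, 1.0, 0.0, 1.0])

# Stack the base-model predictions as features and fit a linear blender;
# the learned coefficients act as blending weights.
X = np.column_stack([preds_model_a, preds_model_b])
blender = LinearRegression().fit(X, y_true)
blended = blender.predict(X)
```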
```python
import os
import numpy as np
import torch
import torchvision
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
```
- Means and stddevs of activations should stay close to 0 and 1 to prevent gradients from exploding or vanishing
- With weights drawn from a standard normal, the activations of a layer have a stddev close to sqrt(num_input_channels)
- So, to bring the stddevs back to 1, multiply the random weights by 1 / sqrt(c_in)
- This works well without activation functions, but results in vanishing or exploding gradients when used with a tanh or sigmoid activation
- Bias weights should be initialized to 0
- Initializations can be drawn from either a uniform distribution or a normal distribution
- Use Xavier initialization for sigmoid and softmax activations
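The scaling argument above can be checked numerically. A minimal sketch; the layer sizes are arbitrary:

```python
import math
import torch

torch.manual_seed(0)

c_in = 512
x = torch.randn(1000, c_in)  # activations with mean ~0, stddev ~1

# Naive init: standard-normal weights blow the output stddev up to ~sqrt(c_in).
w = torch.randn(c_in, 256)
print((x @ w).std())  # roughly sqrt(512) ~ 22.6

# Scaled init: dividing by sqrt(c_in) brings the stddev back to ~1.
w_scaled = torch.randn(c_in, 256) / math.sqrt(c_in)
print((x @ w_scaled).std())  # roughly 1
```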
| version: "2" | |
| networks: | |
| gitea: | |
| external: false | |
| services: | |
| server: | |
| image: gitea/gitea:latest | |
| environment: |
```python
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Cyclical momentum is provided by CyclicLR with cycle_momentum=True;
# base_lr and max_lr here are illustrative values.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.001, max_lr=0.1, cycle_momentum=True)
data_loader = torch.utils.data.DataLoader(...)

for epoch in range(10):
    for batch in data_loader:
        train_batch(...)
        scheduler.step()  # step once per batch, after the optimizer update
```
```python
class Encoder(nn.Module):
    def __init__(self, in_ch, out_ch, r):
        super(Encoder, self).__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.se = SqueezeAndExcitation(out_ch, r)

    def forward(self, x):
        x = F.relu(self.conv(x), inplace=True)
        x = self.se(x)
        return x
```
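The `SqueezeAndExcitation` module used above is not shown. A minimal sketch of the standard squeeze-and-excitation block with reduction ratio `r` might look like this (the layer layout is an assumption, following the common formulation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SqueezeAndExcitation(nn.Module):
    """Channel attention: squeeze (global pool), then excite (bottleneck MLP)."""
    def __init__(self, ch, r):
        super().__init__()
        self.fc1 = nn.Linear(ch, ch // r)
        self.fc2 = nn.Linear(ch // r, ch)

    def forward(self, x):
        # Squeeze: global average pool over spatial dims -> (N, C)
        s = x.mean(dim=(2, 3))
        # Excite: bottleneck MLP producing per-channel weights in (0, 1)
        s = torch.sigmoid(self.fc2(F.relu(self.fc1(s))))
        # Rescale each input channel by its learned weight
        return x * s[:, :, None, None]
```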
```python
# Send a notification to your phone directly with IFTTT (https://ifttt.com/),
# e.g. when a training run ends or at the end of an epoch.
notify({'value1': 'Notification title', 'value2': 'Notification body'}, key=[IFTTT_KEY])

# Automatically set random seeds for Python, numpy, and PyTorch so results can be reproduced.
seed_environment(42)

# Print how much GPU memory is currently allocated.
gpu_usage(device, digits=4)
# GPU Usage: 6.5 GB
```