Hamaad Shah hamaadshah

:octocat:
View GitHub Profile
@thomwolf
thomwolf / prepare_packed_sequence.py
Created October 3, 2017 10:55
Prepare a PyTorch PackedSequence for a batch of sequences
# input_seqs is a batch of input sequences as a numpy array of integers (word indices in vocabulary) padded with zeros
input_seqs = Variable(torch.from_numpy(input_seqs.astype('int64')).long())
# First: order the batch by decreasing sequence length
input_lengths = torch.LongTensor([torch.max(input_seqs[i, :].data.nonzero()) + 1 for i in range(input_seqs.size()[0])])
input_lengths, perm_idx = input_lengths.sort(0, descending=True)
input_seqs = input_seqs[perm_idx][:, :input_lengths.max()]
# Then pack the sequences
packed_input = pack_padded_sequence(input_seqs, input_lengths.cpu().numpy(), batch_first=True)
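The gist stops once the batch is packed. Below is a hedged, self-contained sketch (toy data; the embedding and LSTM sizes are illustrative, not from the gist) of the usual downstream steps: embed the sorted padded indices, pack, run an LSTM, unpack with pad_packed_sequence, then undo the length-sort using the inverse of perm_idx.
# Hedged sketch -- toy data and module sizes are illustrative, not part of the gist.
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

input_seqs = Variable(torch.LongTensor([[4, 2, 7], [3, 5, 0]]))  # already sorted by length
input_lengths = torch.LongTensor([3, 2])
perm_idx = torch.LongTensor([1, 0])  # hypothetical permutation returned by the sort step above

embedding = nn.Embedding(10, 8, padding_idx=0)
lstm = nn.LSTM(8, 16, batch_first=True)

packed = pack_padded_sequence(embedding(input_seqs), input_lengths.numpy(), batch_first=True)
packed_out, _ = lstm(packed)
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

_, unperm_idx = perm_idx.sort(0)  # inverse permutation: restores the original batch order
out = out[unperm_idx]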
@markito
markito / gist:a8ffcdde8cf8ebb0e69ead2363902f06
Created September 15, 2017 14:14
Spark GC/memory settings
spark.storage.memoryFraction: the ratio of how much executor memory is held for cached (storage) data vs how much is left for transformations.
Suggested settings (still need to verify the GC logs after applying them):
-XX:+UseG1GC -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy
-XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark
-Xms88g -Xmx88g -XX:InitiatingHeapOccupancyPercent=35
-XX:ConcGCThreads=15 -XX:+AlwaysPreTouch
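A hedged sketch of one way these settings are often wired up (not part of the gist; the values are illustrative, not recommendations). Heap size goes through spark.executor.memory, since -Xms/-Xmx cannot be set via extraJavaOptions, and the GC flags go through spark.executor.extraJavaOptions.
# Hypothetical sketch: passing the storage fraction and GC flags above via SparkConf.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("gc-tuning-sketch")
        .set("spark.executor.memory", "88g")        # stands in for -Xms88g / -Xmx88g
        .set("spark.storage.memoryFraction", "0.4")  # illustrative ratio
        .set("spark.executor.extraJavaOptions",
             "-XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps "
             "-XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=15 "
             "-XX:+AlwaysPreTouch"))
sc = SparkContext(conf=conf)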
@naotokui
naotokui / GAN-and-trainable.py
Last active October 14, 2021 19:46
How model.trainable = False works in Keras (GAN model)
# coding: utf8
## based on this article: http://qiita.com/mokemokechicken/items/937a82cfdc31e9a6ca12
import numpy as np
from keras.models import Sequential
from keras.engine.topology import Input, Container
from keras.engine.training import Model
from keras.layers.core import Dense
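The excerpt stops at the imports. The pattern the gist studies is typically set up as in the minimal sketch below (hypothetical layer sizes, assuming the Keras 2 API; this is not the gist's own code): trainable is only read at compile time, so the discriminator keeps training normally on its own while staying frozen inside the combined model.
# Minimal sketch of the usual GAN freezing pattern -- sizes are hypothetical.
from keras.models import Sequential, Model
from keras.layers import Input, Dense

discriminator = Sequential([Dense(16, activation='relu', input_dim=2),
                            Dense(1, activation='sigmoid')])
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

generator = Sequential([Dense(16, activation='relu', input_dim=8),
                        Dense(2)])

# trainable is read at compile time: freeze before compiling the combined model,
# while the already-compiled discriminator still updates when trained directly.
discriminator.trainable = False
z = Input(shape=(8,))
combined = Model(z, discriminator(generator(z)))
combined.compile(optimizer='adam', loss='binary_crossentropy')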
@bartolsthoorn
bartolsthoorn / multilabel_example.py
Created April 29, 2017 12:13
Simple multi-label classification example with PyTorch and MultiLabelSoftMarginLoss (https://en.wikipedia.org/wiki/Multi-label_classification)
import torch
import torch.nn as nn
import numpy as np
import torch.optim as optim
from torch.autograd import Variable
# (1, 0) => target labels 0+2
# (0, 1) => target labels 1
# (1, 1) => target labels 3
train = []
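The excerpt ends as the training data is being built. Below is a hedged sketch of how MultiLabelSoftMarginLoss is then typically applied (reusing the imports above; the toy model and the multi-hot targets follow the label mapping in the comments and are not the gist's actual code).
# Hedged sketch: a tiny linear model trained with MultiLabelSoftMarginLoss.
model = nn.Linear(2, 4)                       # hypothetical: 2 input features -> 4 labels
criterion = nn.MultiLabelSoftMarginLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

x = Variable(torch.FloatTensor([[1, 0], [0, 1], [1, 1]]))
# multi-hot targets: e.g. the first sample carries labels 0 and 2
y = Variable(torch.FloatTensor([[1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]))

optimizer.zero_grad()
loss = criterion(model(x), y)                 # expects logits (N, C) and 0/1 targets (N, C)
loss.backward()
optimizer.step()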
@vlandham
vlandham / part1.md
Last active March 21, 2024 12:57
Feature Branches and Pull Requests : Walkthrough

Here's a little walkthrough of how Yannick and I are using feature branches and pull requests to develop new features and add them to the project. Below are the steps I take when working on a new feature. Hopefully this, along with watching the process on GitHub, will serve as a starting point for getting everyone to use a similar workflow.

Questions, comments, and suggestions for improvements welcome!

Start with the latest on master

When starting a new feature, I make sure to start with the latest and greatest codebase:

git checkout master
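The excerpt is truncated here. A typical way to finish this step (not necessarily the commands in the rest of the gist) is to pull the latest master before branching:

git pull origin master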