Shashank Gupta shashankg7

*2vec papers

Curriculum Learning - When training machine learning models, start with easier subtasks and gradually increase the difficulty level of the tasks.
The motivation comes from the observation that humans and animals seem to learn better when trained with a curriculum-like strategy.
Link to the paper.
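
The idea of starting with easier subtasks and gradually increasing difficulty can be sketched as follows. This is a minimal illustration, not the method from the paper: the helper name `curriculum_stages`, the `difficulty` scoring function, and the choice of growing the training pool in equal steps are all assumptions made for the example.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(
    examples: Sequence[T],
    difficulty: Callable[[T], float],
    n_stages: int,
) -> Iterator[List[T]]:
    """Yield training pools that grow from the easiest examples to the full set.

    `difficulty` is a hypothetical user-supplied score: lower means easier.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    # Easiest examples first.
    ordered = sorted(examples, key=difficulty)
    for stage in range(1, n_stages + 1):
        # Ceiling division, so the final stage always covers every example.
        cutoff = -(-len(ordered) * stage // n_stages)
        yield ordered[:cutoff]


# Toy usage: treat longer strings as "harder" examples.
words = ["a", "abcd", "ab", "abcde", "abc"]
stages = list(curriculum_stages(words, difficulty=len, n_stages=2))
```

In a real training loop, each yielded pool would be used for some number of epochs before moving on to the next, harder stage. How difficulty is scored is task-specific and is left open here.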

duplicates = multiple editions

	'''This script goes along the blog post
	"Building powerful image classification models using very little data"
	from blog.keras.io.
	It uses data that can be downloaded at:
	https://www.kaggle.com/c/dogs-vs-cats/data
	In our setup, we:
	- created a data/ folder
	- created train/ and validation/ subfolders inside data/
	- created cats/ and dogs/ subfolders inside train/ and validation/
	- put the cat pictures index 0-999 in data/train/cats

	""" Trains an agent with (stochastic) Policy Gradients on Pong. Uses OpenAI Gym. """
	import numpy as np
	import pickle
	import gym

	# hyperparameters
	H = 200 # number of hidden layer neurons
	batch_size = 10 # every how many episodes to do a param update?
	learning_rate = 1e-4
	gamma = 0.99 # discount factor for reward
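
The role of the discount factor `gamma` can be illustrated with a short sketch. The preview above does not show how the script applies it, so this is the general discounted-return recursion rather than the script's own code; the function name `discount_rewards` is an assumption.

```python
from typing import List, Sequence


def discount_rewards(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return, for each time step, the discounted sum of rewards from that step on."""
    discounted = [0.0] * len(rewards)
    running_total = 0.0
    # Walk backwards so each step reuses the total already computed for the next step:
    # G_t = r_t + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        running_total = rewards[t] + gamma * running_total
        discounted[t] = running_total
    return discounted


# Toy usage: a single reward at the end of a three-step episode.
returns = discount_rewards([0.0, 0.0, 1.0], gamma=0.5)
```

With `gamma` close to 1, early actions still receive most of the credit for a late reward; with a smaller `gamma`, credit falls off quickly with distance from the reward.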

	#Get the data here http://grouplens.org/datasets/movielens/
	movielens = sc.textFile("../in/ml-100k/u.data")

	movielens.first() #u'196\t242\t3\t881250949'
	movielens.count() #100000

	#Clean up the data by splitting it
	#The MovieLens README says the fields are tab-separated,
	#in the order: user, product, rating, timestamp
	clean_data = movielens.map(lambda x:x.split('\t'))
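
The split leaves every field as a string. A minimal sketch of turning one tab-separated line into typed fields is shown below in plain Python, without Spark. The `Rating` tuple and the `parse_rating` helper are hypothetical names introduced for illustration, and the choice of types (integers for IDs and timestamp, float for the rating) is an assumption.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    """One record, in the field order given by the comments above."""
    user: int
    product: int
    rating: float
    timestamp: int


def parse_rating(line: str) -> Rating:
    """Split a tab-separated line and convert each field to a numeric type."""
    user, product, rating, timestamp = line.rstrip("\n").split("\t")
    return Rating(int(user), int(product), float(rating), int(timestamp))


# Toy usage with the first record shown in the snippet above.
first = parse_rating("196\t242\t3\t881250949")
```

A function like this could presumably be passed to `movielens.map(...)` in place of the lambda, although the rest of the script is not shown here.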

	from keras.models import Graph
	from keras.layers import containers
	from keras.layers.core import Dense, Dropout, Activation, Reshape, Flatten
	from keras.layers.embeddings import Embedding
	from keras.layers.convolutional import Convolution2D, MaxPooling2D

	def ngram_cnn(n_vocab, max_length, embedding_size, ngram_filters=[2, 3, 4, 5], n_feature_maps=100, dropout=0.5, n_hidden=15):
	"""A single-layer convolutional network using different n-gram filters.

	Parameters

	"""
	Code to parse sklearn classification_report
	"""
	##
	import sys
	import collections
	##
	def parse_classification_report(clfreport):
	"""
	Parse a sklearn classification report into a dict keyed by class name

	"""
	Implementations of:

	Probabilistic Matrix Factorization (PMF) [1],
	Bayesian PMF (BPMF) [2],
	Modified BPMF (mBPMF)

	using `pymc3`. mBPMF is, to my knowledge, my own creation. It is an attempt
	to circumvent the limitations of `pymc3` with regard to the Wishart distribution:

	#!/usr/bin/env python
	# -*- coding: utf-8 -*-

	'''

	This script just shows the basic workflow to compute a TF-IDF similarity matrix with Gensim


	OUTPUT :