Alonso alonsopg

NLTK API to Stanford NLP Tools compiled on 2015-12-09

Stanford NER

With NLTK version 3.1 and the Stanford NER tool 2015-12-09, it is possible to hack StanfordNERTagger._stanford_jar so that it includes the other .jar files that the new tagger needs.
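A minimal sketch of that hack follows. It assumes that NLTK 3.1's private `_stanford_jar` attribute holds the path of the main jar and that the extra jars sit in the same directory tree as that jar (as in the unzipped 2015-12-09 download). The helper names `jars_under` and `widen_classpath` are hypothetical, and the jars are joined with `os.pathsep`, so the classpath uses `;` on Windows and `:` elsewhere.

```python
import os


def jars_under(directory):
    """Recursively collect every .jar file below `directory`, sorted for a stable classpath."""
    found = []
    for root, _dirs, files in os.walk(directory):
        found.extend(os.path.join(root, name) for name in files if name.endswith(".jar"))
    return sorted(found)


def widen_classpath(tagger):
    """Replace tagger._stanford_jar (a single jar) with a classpath of every jar next to it.

    Relies on the private _stanford_jar attribute of NLTK 3.1's Stanford wrappers,
    which may change between NLTK versions.
    """
    stanford_dir = os.path.dirname(tagger._stanford_jar)
    tagger._stanford_jar = os.pathsep.join(jars_under(stanford_dir))
    return tagger


# Usage (needs Java, NLTK 3.1 and the Stanford NER download; model name assumed):
# from nltk.tag import StanfordNERTagger
# st = widen_classpath(StanfordNERTagger("english.all.3class.distsim.crf.ser.gz"))
# st.tag("Rami Eid is studying at Stony Brook University in NY".split())
```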

First, set up the environment variables as instructed at https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software
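NLTK's Stanford wrappers look for the jar on the `CLASSPATH` environment variable and for the model files under `STANFORD_MODELS`. If you would rather set them from Python (before the tagger is created) than in the shell or in the Windows system settings, here is a minimal sketch. The helper `set_stanford_env` is hypothetical, and the folder layout it assumes (`stanford-ner.jar` plus a `classifiers/` sub-folder inside the unzipped download) should be checked against your copy.

```python
import os


def set_stanford_env(ner_dir):
    """Point NLTK at an unzipped Stanford NER folder via environment variables.

    Assumes `ner_dir` contains stanford-ner.jar and a classifiers/ sub-folder.
    Existing CLASSPATH entries are kept; the NER jar is prepended.
    """
    jar = os.path.join(ner_dir, "stanford-ner.jar")
    old = os.environ.get("CLASSPATH")
    os.environ["CLASSPATH"] = jar if not old else os.pathsep.join([jar, old])
    os.environ["STANFORD_MODELS"] = os.path.join(ner_dir, "classifiers")


# Hypothetical location; adjust it to wherever you unzipped the download.
# set_stanford_env(os.path.expanduser("~/stanford-ner-2015-12-09"))
```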

Getting Stanford NLP and MaltParser to work in NLTK for Windows Users

Firstly, I strongly think that if you're working with NLP/ML/AI-related tools, getting things to work on Linux and Mac OS is much easier and saves you quite a lot of time.

Disclaimer: I am not affiliated with Continuum (conda), Git, Java, Windows OS, Stanford NLP or the MaltParser group. The steps presented below are, IMHO, how I would set up a Windows computer if I owned one.

Please please please understand the solution; don't just copy and paste!!! We're not monkeys typing Shakespeare ;P

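	import matplotlib.pyplot as plt
	import numpy as np

	def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion matrix', cmap=plt.cm.Blues):
	    """Plot confusion matrix `cm` (rows = true class, columns = predicted) with `classes` as tick labels."""
	    if normalize:
	        # Divide each row by its total so every cell is a fraction of that true class
	        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

	    plt.imshow(cm, interpolation='nearest', cmap=cmap)
	    plt.title(title)
	    plt.colorbar()
	    tick_marks = np.arange(len(classes))
	    plt.xticks(tick_marks, classes, rotation=45)
	    plt.yticks(tick_marks, classes)
	    plt.ylabel('True label')
	    plt.xlabel('Predicted label')
	    plt.tight_layout()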
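To see what the `cm` argument looks like and what `normalize=True` does to it, here is a small sketch with made-up labels. `confusion_counts` is a hypothetical helper that uses the same layout as scikit-learn's `sklearn.metrics.confusion_matrix` (rows are true labels, columns are predicted labels); the last line repeats the row normalization used in `plot_confusion_matrix`.

```python
import numpy as np


def confusion_counts(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples whose true class is i and predicted class is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


y_true = [0, 0, 1, 1, 1, 2]  # made-up example labels
y_pred = [0, 1, 1, 1, 2, 2]
cm = confusion_counts(y_true, y_pred, 3)

# Row normalization: each row now sums to 1 (fraction of each true class)
cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
```

Here `cm` is `[[1, 1, 0], [0, 2, 1], [0, 0, 1]]`, and the first row of `cm_norm` is `[0.5, 0.5, 0.0]`.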

	# Import requests (to download the page)
	import requests

	# Import BeautifulSoup (to parse what we download)
	from bs4 import BeautifulSoup

	# Import time (to add a delay between the times the scrape runs)
	import time

	# Import smtplib (to allow us to email)
	import smtplib
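The imports above suggest a download, parse, wait, repeat and notify loop. One plausible way to wire them together is sketched below; it is not the original script. The task (watching a page for a phrase), the function names `page_mentions`, `build_alert` and `watch`, the SMTP host `localhost` and the 60-second delay are all assumptions.

```python
import smtplib
import time
from email.message import EmailMessage

import requests
from bs4 import BeautifulSoup


def page_mentions(html, phrase):
    """True if the visible text of `html` contains `phrase` (case-insensitive)."""
    text = BeautifulSoup(html, "html.parser").get_text(" ")
    text = " ".join(text.split())  # collapse runs of whitespace left by tags
    return phrase.lower() in text.lower()


def build_alert(sender, recipient, url, phrase):
    """Compose (but do not send) a plain-text notification email."""
    msg = EmailMessage()
    msg["Subject"] = f"'{phrase}' found"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The page {url} now mentions '{phrase}'.")
    return msg


def watch(url, phrase, sender, recipient, smtp_host="localhost",
          delay_seconds=60, max_checks=10):
    """Download `url` up to `max_checks` times; email once `phrase` shows up."""
    for _ in range(max_checks):
        response = requests.get(url, timeout=10)  # download the page
        response.raise_for_status()
        if page_mentions(response.text, phrase):  # parse what we downloaded
            with smtplib.SMTP(smtp_host) as server:  # email the alert
                server.send_message(build_alert(sender, recipient, url, phrase))
            return True
        time.sleep(delay_seconds)  # delay between scrapes
    return False
```

A call such as `watch("https://example.com/tickets", "available", "me@example.com", "you@example.com")` uses placeholder addresses. Most real SMTP servers will also need `server.starttls()` and `server.login(...)` before `send_message`.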

	from textract import process
	# Decode textract's bytes output explicitly, e.g. process(path).decode("utf-8"); Python 3 has no sys.setdefaultencoding.
	# For Spanish (spa), download it from here: https://github.com/tesseract-ocr/langdata/tree/master/spa and put it in
	# the corresponding folder
	def transform_files(input_directory, output_directory):
	    import codecs, glob, os
	    from collections import OrderedDict
	    all_texts = OrderedDict()
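The `transform_files` snippet stops right after creating `all_texts`. Below is a hedged sketch of how such a function could continue; it is not the author's code. It assumes that every file in `input_directory` should be converted, that each result goes to its own UTF-8 `.txt` file, and that the extractor is passed in as `extract` (for example `textract.process`) so the sketch can run without textract or tesseract installed. The name `transform_files_sketch` is hypothetical.

```python
import glob
import os
from collections import OrderedDict


def transform_files_sketch(input_directory, output_directory, extract, pattern="*"):
    """Extract text from every file matching `pattern` and save it as UTF-8 .txt files.

    `extract` is any callable path -> bytes (textract.process returns bytes).
    Returns an OrderedDict {source path: decoded text} in sorted path order.
    """
    os.makedirs(output_directory, exist_ok=True)
    all_texts = OrderedDict()
    for path in sorted(glob.glob(os.path.join(input_directory, pattern))):
        if not os.path.isfile(path):
            continue  # skip sub-directories
        text = extract(path).decode("utf-8")
        all_texts[path] = text
        base = os.path.splitext(os.path.basename(path))[0]
        with open(os.path.join(output_directory, base + ".txt"), "w", encoding="utf-8") as out:
            out.write(text)
    return all_texts
```

With textract installed, something like `transform_files_sketch("scans", "texts", lambda p: textract.process(p, language="spa"))` should OCR images with the Spanish tesseract data mentioned in the comment above; as far as I know, textract passes `language` on to tesseract only for image files.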

	To install:

	```
	user@MacBook-Pro-de-User-2:~$ brew install mysql
	```

	To start it:
	```
	user@MacBook-Pro-de-User-2:~$ mysql.server start
	```
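To confirm from Python that the server started with `mysql.server start` is accepting TCP connections, a minimal sketch is to try connecting to MySQL's default port, 3306. The helper name `port_is_open` is hypothetical; a server configured for another port, or for local-socket access only, would need different arguments.

```python
import socket


def port_is_open(host="127.0.0.1", port=3306, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# port_is_open()  # True once the MySQL server is listening on localhost:3306
```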

	### Boosting

	Many algorithms, such as AdaBoost, LogitBoost, BrownBoost and XGBoost, use a technique called boosting to reduce bias and variance in supervised learning problems. [Schapire, 1990](http://www.cs.princeton.edu/~schapire/papers/strengthofweak.pdf) proposes a procedure for turning an estimator with low predictive power into one with high predictive power: weak estimators are built iteratively, each with respect to a distribution over the training examples, and added to a final classifier that ends up with high predictive power. In other words, the weak estimators are built sequentially, and the goal is to reduce the bias of the combined estimator; that is, we combine several weak models to produce a more powerful one.

	[image taken from Peter Prettenhofer]

	Boosting is also an ensemble technique, consisting of a mixture of experts. Unlike models such as RF (random forests), boosting learns these ensembles sequentially.
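To make the sequential re-weighting concrete, here is a minimal sketch of discrete AdaBoost (the first algorithm in the list above, not the original 1990 construction) with one-feature threshold "stumps" as the weak learners. It assumes labels in {-1, +1} and uses NumPy; the function names are hypothetical, and the code is written for clarity, not speed.

```python
import numpy as np


def stump_predict(X, j, thr, polarity):
    """Weak learner: predict `polarity` where feature j <= thr, else -polarity."""
    return np.where(X[:, j] <= thr, polarity, -polarity)


def fit_stump(X, y, w):
    """Pick the stump with the lowest weighted error under sample weights w."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                err = w[stump_predict(X, j, thr, polarity) != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost: each round fits a stump to the current re-weighted data."""
    w = np.full(len(y), 1.0 / len(y))  # start from the uniform distribution
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, polarity = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid division by zero and log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this weak learner
        pred = stump_predict(X, j, thr, polarity)
        w = w * np.exp(-alpha * y * pred)  # up-weight the points this stump got wrong
        w = w / w.sum()
        ensemble.append((alpha, j, thr, polarity))
    return ensemble


def adaboost_predict(ensemble, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(X, j, thr, p) for alpha, j, thr, p in ensemble)
    return np.where(score >= 0, 1, -1)
```

On the toy data `X = [[1], [2], [3]]`, `y = [1, -1, 1]`, no single stump classifies all three points correctly, but after three rounds the weighted vote of the stumps does: each round concentrates weight on the points the previous stumps got wrong.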

	import pandas as pd

	# List unique values in a DataFrame column
	pd.unique(df['column_name'])

	# Convert a Series to numeric; values that cannot be parsed become NaN (drop them with dropna() if needed)
	df['col'] = pd.to_numeric(df['col'], errors='coerce')

	# Grab DataFrame rows where a column has certain values
	value_list = ['value1', 'value2', 'value3']
	df = df[df['column_name'].isin(value_list)]
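The three idioms above can be tried on a small, self-contained DataFrame. The column names and values below are made up for illustration.

```python
import pandas as pd

# Toy data (hypothetical columns and values)
df = pd.DataFrame({
    "city": ["Lima", "Quito", "Lima", "Bogotá"],
    "price": ["10", "12.5", "n/a", "7"],
})

# 1. Unique values, in order of first appearance
unique_cities = df["city"].unique()

# 2. Strings -> numbers; "n/a" cannot be parsed and becomes NaN
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# 3. Keep only the rows whose city is in an allow-list
wanted = ["Lima", "Quito"]
subset = df[df["city"].isin(wanted)]
```

After running it, `unique_cities` holds `Lima`, `Quito` and `Bogotá`, one price is NaN, and `subset` keeps three of the four rows.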