José P. González-Brenes josepablog

Machine Learning | EdTech | Data Science

@CheggEng
San Francisco, CA
www.josepablogonzalez.com

josepablog / DictVectorizer_Pandas.py

Last active May 30, 2018 11:04

DictVectorizer does not accept a Pandas DataFrame out of the box. Converting the DataFrame to a list of row dictionaries first is an efficient way to extract your categorical features.

	from sklearn.feature_extraction import DictVectorizer
	import pandas as pd

	df = pd.DataFrame({"user_name": ["a", "b", "c"]})
	fe_lm = DictVectorizer()
	design_lm = fe_lm.fit_transform(df.to_dict(orient="records"))

	# Note: this is MUCH faster (60 times) than transposing the DataFrame and converting it into a dictionary,
	# the approach described at http://fastml.com/converting-categorical-data-into-numbers-with-pandas-and-scikit-learn/
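
To make the result concrete, here is a minimal sketch of what the vectorizer produces and how the two conversions compare. The `age` column and the variable names `records_fast` and `records_slow` are illustrative assumptions, not part of the original gist; the transposing form is written as one plausible reading of the slower approach. No timing is measured here.

```python
from sklearn.feature_extraction import DictVectorizer
import pandas as pd

# Hypothetical data: one categorical column and one numeric column.
df = pd.DataFrame({"user_name": ["a", "b", "a"], "age": [10, 20, 30]})

# Approach from the gist: one dictionary per row.
records_fast = df.to_dict(orient="records")

# Assumed form of the slower approach: transpose, then convert to a dictionary.
records_slow = list(df.T.to_dict().values())

fe_lm = DictVectorizer()
design_lm = fe_lm.fit_transform(records_fast)  # SciPy sparse matrix by default

# String values are one-hot encoded as "column=value"; numbers pass through unchanged.
feature_names = fe_lm.feature_names_
dense = design_lm.toarray()

print(feature_names)
print(dense)
```

Both conversions yield the same list of row dictionaries, so the choice between them only affects speed, not the resulting design matrix.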

josepablog / to_redshift.py

Last active August 10, 2021 02:22 — forked from TomAugspurger/to_redshift.py

	import gzip
	from functools import wraps
	import boto3
	from sqlalchemy import MetaData
	from pandas import DataFrame
	from pandas.io.sql import SQLTable, pandasSQL_builder
	import psycopg2
	import codecs
	import cStringIO  # Python 2 only; Python 3 provides io.StringIO instead
	from io import BytesIO