Hugh Brown hughdbrown

Here is how I copied data from one S3 bucket to another:

aws s3 sync s3://bitly-challenges/hdb_sanitized s3://hughdbrown/data-capstone

Adapted from Stack Overflow.

Resume clustering

Description

I have a resume, but does it say what I want it to say? Specifically, do machine learning algorithms cluster my resume with the job title I would like them to?

Data source

LinkedIn data/resumes for various job titles: developer, devops, data scientist, full stack, etc.

Method

Create a database of resumes: developer, devops, data scientist, full stack, etc.
Train a K-means model
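
The two steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: the toy resumes, the `vectorize` and `kmeans` helpers, and the use of plain word counts as features are all assumptions. A real version would more likely use TF-IDF features and a library implementation of K-means.

```python
import math
import random
from collections import Counter

def vectorize(texts):
    """Turn each text into a unit-length bag-of-words vector over a shared vocabulary."""
    vocab = sorted({word for text in texts for word in text.lower().split()})
    vectors = []
    for text in texts:
        counts = Counter(text.lower().split())
        vec = [float(counts[word]) for word in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy "resumes"; real input would be the collected resume text.
resumes = [
    "python machine learning statistics models",
    "statistics machine learning python pandas",
    "docker kubernetes terraform deployment",
    "kubernetes docker deployment monitoring",
]
labels = kmeans(vectorize(resumes), k=2)
```

The question from the description then becomes: which cluster does my own resume land in, and which job titles dominate that cluster?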

Homeaway data

Description

Homeaway has data on vacation rentals. The data is not nearly as worked over as Airbnb data. Possibly there is something interesting in there to discover.

Data source

Homeaway API access. The main problem with the project is that the Homeaway API is pretty opaque: I can't figure out how to get a data dump. Also, the API requires registration and advance permission.

Job recommender

Description

Job sites often give candidates job listings that are far off topic. Most often the job title is not applicable to the candidate; less often, the location does not match the candidate's location.

Question

Can we build a better system for users by applying a recommender system to existing public listings?

Data source

glassdoor.com API
indeed.com web scraping
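
One way to picture the idea is a content-based recommender that ranks listings by how closely their text matches a candidate's profile. The sketch below is only an assumption about what "recommender system" could mean here; the listings, the profile, and the `recommend` helper are hypothetical, and real input would come from the data sources above.

```python
import math
from collections import Counter

def term_counts(text):
    """Bag-of-words counts for a piece of text."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(profile, listings, top_n=2):
    """Return the top_n listing titles, most similar to the profile first."""
    profile_vec = term_counts(profile)
    scored = [(cosine_similarity(profile_vec, term_counts(text)), title)
              for title, text in listings.items()]
    scored.sort(reverse=True)
    return [title for score, title in scored[:top_n]]

# Hypothetical listings: title -> description text (including location).
listings = {
    "Data Scientist": "python statistics machine learning new york",
    "DevOps Engineer": "docker kubernetes aws deployment seattle",
    "Line Cook": "kitchen grill prep new york",
}
profile = "python machine learning statistics new york"
top = recommend(profile, listings)  # ['Data Scientist', 'Line Cook']
```

The second result shows the failure mode from the description: the location matches but the title does not, so a real system would need to weight title and location separately.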

UN global warming voting blocs

Description

I was listening to NPR today and heard that within the UN, there are about a dozen different blocs that vote together on global warming issues:

Switzerland alone
Developed countries
European group
"77 countries plus China" ... which is actually 134 countries
Various island nations most affected

Wikipedia data

Description

I like Wikipedia. There must be some sort of project I could do with this data.

Data source

Wikipedia. There are publicly accessible dumps of Wikipedia data.
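
The dumps are MediaWiki XML exports, so a first step could be streaming page titles and text out of one. The sketch below is a minimal illustration under stated assumptions: `iter_pages` and `local_name` are hypothetical helpers, and `SAMPLE` is a tiny hand-written stand-in for a real dump, which has many more fields per page.

```python
import io
import xml.etree.ElementTree as ET

def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def iter_pages(stream):
    """Yield (title, text) pairs from a MediaWiki XML export, one page at a time."""
    title, text = None, None
    for event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the finished page so large dumps use less memory

# Hand-written stand-in for a dump file (an assumption, not real Wikipedia data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Alpha</title>
    <revision><text>First article body.</text></revision>
  </page>
  <page>
    <title>Beta</title>
    <revision><text>Second article body.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

A compressed dump file could be streamed the same way by passing the file object returned by `bz2.open(path, "rb")`, where `path` is wherever the dump was saved.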

How to put your project online

This is a short description of the infrastructure you need to set up to get your project reachable on the web.

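	import scipy.stats as scs

	def a_b_test(new_views, new_clicks, old_views, old_clicks, size=10000):
		"""Estimate the probability that the new site's click-through rate beats the old site's."""
		# Beta posterior under a uniform prior: a = clicks + 1, b = non-clicks + 1.
		new_site = scs.beta(a=new_clicks + 1, b=new_views - new_clicks + 1).rvs(size=size)
		old_site = scs.beta(a=old_clicks + 1, b=old_views - old_clicks + 1).rvs(size=size)
		# Fraction of paired samples in which the new site wins.
		return (new_site > old_site).mean()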
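
For environments without SciPy, the same kind of comparison can be sketched with the standard library's `random.betavariate`. This is a hedged illustration rather than a replacement: `a_b_test_stdlib`, the `seed` parameter, and the traffic numbers are assumptions, and it takes the posterior for each site to be Beta(clicks + 1, non-clicks + 1), which corresponds to a uniform prior.

```python
import random

def a_b_test_stdlib(new_views, new_clicks, old_views, old_clicks, size=10000, seed=0):
    """Estimate P(new click-through rate > old) by sampling Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(size):
        # Assumed posterior under a uniform prior: Beta(clicks + 1, non-clicks + 1).
        new_rate = rng.betavariate(new_clicks + 1, new_views - new_clicks + 1)
        old_rate = rng.betavariate(old_clicks + 1, old_views - old_clicks + 1)
        wins += new_rate > old_rate
    return wins / size

# Hypothetical traffic numbers, for illustration only.
clear_win = a_b_test_stdlib(1000, 100, 1000, 50)  # new site converts twice as often
toss_up = a_b_test_stdlib(1000, 50, 1000, 50)     # identical observed rates
```

With these inputs, `clear_win` should come out very close to 1 and `toss_up` close to 0.5, which is a quick sanity check on the sampling logic.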

Resume clustering

Description

Data source

Method

Homeaway data

Description

Data source

Job recommender

Description

Question

Data source

UN global warming voting blocs

Description

Wikipedia data

Description

Data source

How to put your project online

Github hosting

AWS hosting

Heroku hosting

Chronic kidney disease

Description

Data source

Bit.ly data

Description

Data source

Display style