Hugh Brown hughdbrown

@hughdbrown
hughdbrown / ds_a_b_test.py
Last active October 2, 2015 15:39
Data science: a-b-test
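import scipy.stats as scs


def a_b_test(new_views, new_clicks, old_views, old_clicks, size=10000):
    """Estimate P(new click-through rate > old) from Beta posterior samples."""
    # Uniform Beta(1, 1) prior: a = clicks + 1, b = non-clicks + 1.
    # Assumes views counts every impression, including the ones that clicked.
    new_site = scs.beta(a=new_clicks + 1, b=new_views - new_clicks + 1).rvs(size=size)
    old_site = scs.beta(a=old_clicks + 1, b=old_views - old_clicks + 1).rvs(size=size)
    return (new_site > old_site).mean()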
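
As a rough cross-check, the same Monte Carlo comparison can be sketched with only the standard library. The function name `prob_new_beats_old`, the `seed` argument, and the view/click counts below are made up for illustration. The sketch assumes that views counts every impression (clicked or not), so the second Beta parameter is views minus clicks plus one.

```python
import random


def prob_new_beats_old(new_views, new_clicks, old_views, old_clicks,
                       size=10000, seed=0):
    """Estimate P(new click-through rate > old click-through rate).

    Each rate gets a Beta(clicks + 1, views - clicks + 1) posterior,
    i.e. a uniform prior updated with the observed clicks and non-clicks.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    wins = 0
    for _ in range(size):
        new_rate = rng.betavariate(new_clicks + 1, new_views - new_clicks + 1)
        old_rate = rng.betavariate(old_clicks + 1, old_views - old_clicks + 1)
        wins += new_rate > old_rate  # True counts as 1
    return wins / size


# Hypothetical traffic: 12% click-through on the new site, 6% on the old one.
p_better = prob_new_beats_old(1000, 120, 1000, 60)
# Identical data on both sides should land near one half.
p_same = prob_new_beats_old(1000, 60, 1000, 60)
print(p_better, p_same)
```

A value near 1 says the new site very likely has the higher click-through rate; a value near 0.5 says the data cannot tell the two apart.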
@hughdbrown
hughdbrown / aws-copy-s3-to-s3.md
Last active September 4, 2015 17:10
Copy s3 to s3

Here is how I copied data from one S3 bucket to another:

aws s3 sync s3://bitly-challenges/hdb_sanitized s3://hughdbrown/data-capstone

Adapted from stackoverflow

@hughdbrown
hughdbrown / data-resume-clustering.md
Last active April 6, 2018 20:29
Resume clustering

Resume clustering

Description

I have a resume, but does it say what I want it to say? Specifically, do machine learning algorithms cluster my resume with the job title I would like them to?

Data source

  • LinkedIn data/resumes for various job titles: developer, devops, data scientist, full stack, etc.

Method

  1. Create a database of resumes: developer, devops, data scientist, full stack, etc.
  2. Train a K-means model
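
The first two steps could be sketched roughly as below. Everything here is an assumption made for illustration: the toy resume texts, the bag-of-words features, and the deterministic farthest-first seeding (random or k-means++ seeding is more usual). A real version would more likely lean on a library such as scikit-learn.

```python
import math
from collections import Counter


def vectorize(texts):
    """Turn each text into an L2-normalised bag-of-words vector."""
    counts = [Counter(text.lower().split()) for text in texts]
    vocab = sorted(set().union(*counts))
    vectors = []
    for count in counts:
        vec = [float(count[word]) for word in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's algorithm; returns one cluster label per vector."""
    # Farthest-first seeding keeps the toy example deterministic.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical resumes; the last one stands in for "my resume".
titles = ["developer", "developer", "devops", "devops", "mine"]
resumes = [
    "python django rest api backend developer",
    "java spring backend api developer",
    "docker kubernetes terraform ci devops",
    "ansible docker jenkins ci devops",
    "python flask api backend developer",
]
labels = kmeans(vectorize(resumes), k=2)
my_cluster = labels[-1]
neighbours = {t for t, lab in zip(titles[:-1], labels[:-1]) if lab == my_cluster}
print(neighbours)
```

The job titles that share a cluster with the last resume are the answer to the question in the description: they are what the model "thinks" the resume is.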
@hughdbrown
hughdbrown / data-homeaway.md
Created August 31, 2015 23:04
Homeaway data

Homeaway data

Description

Homeaway has data on vacation rentals. The data is not nearly as worked over as Airbnb data, so there may be something interesting in there to discover.

Data source

  • Homeaway API access

The main problem with the project is that the Homeaway API is pretty opaque: I can't figure out how to get a data dump. Also, the API requires registration and advance permission.
@hughdbrown
hughdbrown / data-job-recommender.md
Created August 31, 2015 22:59
Job recommender that bootstraps from list of job postings

Job recommender

Description

Too often, job sites show candidates job listings that are far off topic. Often the job title does not fit the candidate and, less often, the location does not match the candidate's location.

Question

Can we build a better system for users by applying a recommender system to existing public listings?

Data source

  • glassdoor.com API
  • indeed.com web scraping
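
One simple way to approach the question is content-based matching: score each posting by how closely its text resembles the candidate's profile. The sketch below is only an illustration of that idea. The postings, the profile, and the helper names `cosine` and `recommend` are all hypothetical, and nothing here calls the glassdoor.com API or scrapes indeed.com.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def recommend(profile, postings, top_n=2):
    """Return up to top_n posting titles, best match first.

    Postings with no words in common with the profile are dropped.
    """
    profile_vec = Counter(profile.lower().split())
    scored = [(cosine(profile_vec, Counter(text.lower().split())), title)
              for title, text in postings.items()]
    scored.sort(reverse=True)
    return [title for score, title in scored[:top_n] if score > 0]


# Hypothetical postings, keyed by title; location words are part of the text
# so that a matching location raises the score.
postings = {
    "Data Scientist, New York": "python statistics machine learning new york",
    "DevOps Engineer, Austin": "docker kubernetes aws austin",
    "Backend Developer, New York": "python django api new york",
}
profile = "python machine learning statistics new york"
recommendations = recommend(profile, postings)
print(recommendations)
```

Treating the location as ordinary text is a crude choice; a fuller system would probably filter on location separately and learn from which listings candidates actually open.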
@hughdbrown
hughdbrown / data-UN-voting-blocs.md
Created August 31, 2015 17:31
UN global warming voting blocs

UN global warming voting blocs

Description

I was listening to NPR today and heard that within the UN, there are about a dozen different blocs that vote together on global warming issues:

  • Switzerland alone
  • Developed countries
  • European group
  • "77 countries plus China" ... which is actually 134 countries
  • Various island nations most affected
@hughdbrown
hughdbrown / data-wikipedia.md
Created August 31, 2015 17:26
Data project in wikipedia

Wikipedia data

Description

I like Wikipedia. There must be some sort of project I could do with this data.

Data source

  • Wikipedia: there are accessible dumps of Wikipedia data.
@hughdbrown
hughdbrown / how-to-put-your-project-online.md
Last active August 31, 2015 16:06
Getting your project online

How to put your project online

This is a short description of the infrastructure you need to set up to get your project reachable on the web.

Github hosting

AWS hosting

Heroku hosting

@hughdbrown
hughdbrown / data-chronic-kidney-disease.md
Last active August 31, 2015 17:35
Chronic kidney disease predictor

Chronic kidney disease

Description

Data source

@hughdbrown
hughdbrown / data-bitly.md
Last active September 1, 2015 10:42
Analysis of bit.ly data

Bit.ly data

Description

How the Germanwings crash/suicide news story spreads over bit.ly links.

Data source

  • bit.ly
  • twitter
