@mortehu
mortehu / google-cloud-storage-list.py
Created April 16, 2015 15:18
Minimal Google Cloud Storage API Python example
#!/usr/bin/env python
"""Sample Google Cloud Storage API client.
Based on <https://cloud.google.com/storage/docs/json_api/v1/json-api-python-samples>,
but removed parts that are not relevant to the Cloud Storage API.
Assumes the use of a service account, whose secrets are stored in
$HOME/google-api-secrets.json"""
@inkhorn
inkhorn / cellphone analysis.R
Created April 6, 2015 20:47
Cell Phone Analysis
library(jsonlite)
cp = fromJSON(txt = "Cell Phone Data.txt", simplifyDataFrame = TRUE)
num.atts = c(4,9,11,12,13,14,15,16,18,22)
cp[,num.atts] = sapply(cp[,num.atts], function (x) as.numeric(x))
cp$aspect.ratio = cp$att_pixels_y / cp$att_pixels_x
cp$isSmartPhone = ifelse(grepl("smart|iphone|blackberry", cp$name, ignore.case=TRUE) | cp$att_screen_size >= 4, "Yes", "No")
@steadyfish
steadyfish / dplyr_functions_programmatic_use.R
Last active October 12, 2020 09:27
using dplyr functions programmatically
# using dplyr functions in non-interactive mode
# examples
library(plyr)
library(dplyr)
d1 = data_frame(x = seq(1,20),y = rep(1:10,2),z = rep(1:5,4))
head(d1)
#### single table verbs ####
@hadley
hadley / ds-training.md
Created March 13, 2015 18:49
My advice on what you need to do to become a data scientist...

If you were to give recommendations to your "little brother/sister" on things that they need to do to become a data scientist, what would those things be?

I think the "Data Science Venn Diagram" (http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram) is a great place to start. You need three things to be a good data scientist:

  • Statistical knowledge
  • Programming/hacking skills
  • Domain expertise

Statistical knowledge

@johnynek
johnynek / AliceInAggregatorLand.scala
Last active January 24, 2024 19:38
A REPL Example of using Aggregators in scala
/**
* To get started:
* git clone https://github.com/twitter/algebird
* cd algebird
* ./sbt algebird-core/console
*/
/**
* Let's get some data. Here is Alice in Wonderland, line by line
*/
@abresler
abresler / tufte
Last active July 4, 2023 18:56
Recreating Edward Tufte's New York City Weather Visualization
library(dplyr)
library(tidyr)
library(magrittr)
library(ggplot2)
"http://academic.udayton.edu/kissock/http/Weather/gsod95-current/NYNEWYOR.txt" %>%
read.table() %>% data.frame %>% tbl_df -> data
names(data) <- c("month", "day", "year", "temp")
data %>%
group_by(year, month) %>%
@bsweger
bsweger / useful_pandas_snippets.md
Last active August 10, 2025 13:33
Useful Pandas Snippets

Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)
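
The snippet itself is cut off in this preview, so what follows is only a minimal sketch of the conversion described above, assuming `pd.to_numeric` is the intended call. The Series names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical example data: numbers stored as strings.
prices = pd.Series(["1", "2", "3"])
numeric_prices = pd.to_numeric(prices)  # default is errors="raise"

# A non-numeric value makes the default conversion raise ValueError.
dirty = pd.Series(["1", "two", "3"])
try:
    pd.to_numeric(dirty)
    conversion_failed = False
except ValueError:
    conversion_failed = True

# errors="coerce" turns unparseable values into NaN instead of raising.
coerced = pd.to_numeric(dirty, errors="coerce")
```

Passing `errors="coerce"` is one way to get past the error when some bad values are expected; the resulting NaN entries can then be inspected or dropped.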

@miki725
miki725 / .bash_prompt.sh
Last active July 14, 2025 18:23
Custom bash prompt which displays: (virtualenv) user:/path (git-branch)
#!/bin/bash
#
# DESCRIPTION:
#
# Set the bash prompt according to:
# * the active virtualenv
# * the branch of the current git/mercurial repository
# * the return value of the previous command
# * the fact you just came from Windows and are used to having newlines in
# your prompts.
@stucchio
stucchio / bayesian_ab_test.py
Last active April 2, 2023 03:17
Bayesian A/B test code
from matplotlib import use
from pylab import *
from scipy.stats import beta, norm, uniform
from random import random
from numpy import *
import numpy as np
import os
# Input data
recipes = readLines('recipes combined.tsv')
# Once I read it into R, I have to replace the tab (\t) characters
# with spaces so that the text is more acceptable to the tm package
recipes.new = apply(as.matrix(recipes), 1, function (x) gsub('\t',' ', x))
recipes.corpus = Corpus(VectorSource(recipes.new))
recipes.dtm = DocumentTermMatrix(recipes.corpus)