Skip to content

Instantly share code, notes, and snippets.

View gfelitti's full-sized avatar

Guilherme Felitti gfelitti

  • São Paulo
View GitHub Profile
@seanjtaylor
seanjtaylor / gist:568141f04a16d518be24
Created February 11, 2015 01:46
Reshaping a Pandas dataframe into a sparse matrix
import pandas as pd
import scipy.sparse as sps
df = pd.DataFrame({'tag1': ['sean', 'udi', 'bogdan'], 'tag2': ['sean', 'udi', 'udi'], 'freq': [1,2,3]})
# tag1 -> rows, tag2 -> columns
df.set_index(['tag1', 'tag2'], inplace=True)
mat = sps.coo_matrix((df.freq, (df.index.labels[0], df.index.labels[1])))
print(mat.todense())
#!/usr/bin/python
import urllib2
from bs4 import BeautifulSoup
# Abre a pagina principal do IBGE onde contem os links para os estados.
html = urllib2.urlopen('http://cidades.ibge.gov.br/xtras/home.php').read()
# Pega o conteudo da pagina em HTML e joga para o BeautifulSoup mapear as tags
soup = BeautifulSoup(html)
@bsweger
bsweger / useful_pandas_snippets.md
Last active December 1, 2025 15:53
Useful Pandas Snippets

Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)

@gotoplanb
gotoplanb / journalism-case-studies.md
Created January 8, 2014 16:10
Case studies useful for differing journalism job titles.
@AlexandreAbraham
AlexandreAbraham / unsupervised_alt.py
Last active July 31, 2024 03:42
These are two implementations of the silhouette score. They are compatible with the scikit learn implementation but offers different drawbacks in term of complexity and memory usage. The slow version needs no memory but is painfully slow and should, I think, not be used. The second one is based on a block strategy: distance between samples and c…
""" Unsupervised evaluation metrics. """
# License: BSD Style.
from itertools import combinations
import numpy as np
from sklearn.utils import check_random_state
from sklearn.metrics.pairwise import distance_metrics
from sklearn.metrics.pairwise import pairwise_distances
@yanofsky
yanofsky / LICENSE
Last active August 30, 2025 02:53
A script to download all of a user's tweets into a csv
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
@imjared
imjared / scraping-with-casperjs.js
Created March 20, 2013 00:33
A CasperJS script that crawled a list of links on then scraped the relevant content on each page and output it to a nicely formatted XML file. Sure beats database dumps/SQL manipulation, in my opinion.
/*jshint strict:false*/
/*global CasperError console phantom require*/
/**
* grab links and push them into xml
*/
var casper = require("casper").create({
});
@iamatypeofwalrus
iamatypeofwalrus / roll_ipython_in_aws.md
Last active February 21, 2025 18:39
Create an iPython HTML Notebook on Amazon's AWS Free Tier from scratch.

What

Roll your own iPython Notebook server with Amazon Web Services (EC2) using their Free Tier.

What are we using? What do you need?

  • An active AWS account. First time sign-ups are eligible for the free tier for a year
  • One Micro Tier EC2 Instance
  • With AWS we will use the stock Ubuntu Server AMI and customize it.
  • Anaconda for Python.
  • Coffee/Beer/Time
@cosmocatalano
cosmocatalano / instagram_scrape.php
Last active February 8, 2025 07:43
Quick-and-dirty Instagram web scrape, just in case you don't think you should have to make your users log in to deliver them public photos.
<?php
//returns a big old hunk of JSON from a non-private IG account page.
function scrape_insta($username) {
$insta_source = file_get_contents('http://instagram.com/'.$username);
$shards = explode('window._sharedData = ', $insta_source);
$insta_json = explode(';</script>', $shards[1]);
$insta_array = json_decode($insta_json[0], TRUE);
return $insta_array;
}
@alotaiba
alotaiba / google_speech2text.md
Created February 3, 2012 13:20
Google Speech To Text API

Google Speech To Text API

Base URL: https://www.google.com/speech-api/v1/recognize
It accepts POST requests with voice file encoded in FLAC format, and query parameters for control.

Query Parameters

client
The client's name you're connecting from. For spoofing purposes, let's use chromium

lang
Speech language, for example, ar-QA for Qatari Arabic, or en-US for U.S. English