
ameyavilankar / preprocess.py
Last active January 25, 2023 10:19
Removing punctuation and stop words with nltk
import string
import nltk
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords
import re
def preprocess(sentence):
    sentence = sentence.lower()
    tokenizer = RegexpTokenizer(r'\w+')
    tokens = tokenizer.tokenize(sentence)
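The preview stops after tokenization. Going by the gist's title, the remaining step is dropping stop words. A minimal completed sketch follows, assuming a small hypothetical stop-word set in place of nltk's `stopwords.words('english')` (which needs the corpus fetched first with `nltk.download('stopwords')`), and using `re.findall(r"\w+", ...)`, which should yield the same tokens as `RegexpTokenizer(r'\w+')`. This is not necessarily the author's version.

```python
import re

# Hypothetical stand-in for nltk's stopwords.words('english'),
# so the sketch runs without downloading any corpora.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def preprocess(sentence):
    """Lowercase, keep runs of word characters (drops punctuation), remove stop words."""
    sentence = sentence.lower()
    # Same tokens as RegexpTokenizer(r'\w+').tokenize(sentence)
    tokens = re.findall(r"\w+", sentence)
    filtered = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(filtered)

print(preprocess("The cat, in the hat, is back!"))  # cat hat back
```

Converting `STOP_WORDS` to a `set` matters in practice: `stopwords.words('english')` returns a list, and membership tests against a list are linear per token.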
WickyNilliams / index.html
Last active March 26, 2025 13:05
parseTable.js - convert HTML table to array of objects. MIT licensed (https://opensource.org/licenses/MIT)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>parseTable</title>
</head>
<body>
<table>
<thead>
<tr>
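parseTable.js itself is browser JavaScript. For consistency with the other sketches here, the following is a rough Python analogue using the standard library's `html.parser`, not a port of the author's code. It assumes the row inside `<thead>` holds the column names and that each body row has one cell per column; `TableParser` and `parse_table` are hypothetical names.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect header cells from <thead> and body rows from everything else."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.rows = []
        self._in_head = False
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "thead":
            self._in_head = True
        elif tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "thead":
            self._in_head = False
        elif tag in ("th", "td") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._in_head:
                self.headers = self._row
            else:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

def parse_table(html):
    """Return one dict per body row, keyed by the header cell texts."""
    parser = TableParser()
    parser.feed(html)
    return [dict(zip(parser.headers, row)) for row in parser.rows]
```

Cell values come back as strings; any numeric conversion is left to the caller, much as the DOM's `textContent` would give strings in JavaScript.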
tboggs / dirichlet_plots.png
Last active May 26, 2025 18:01
A script to generate contour plots of Dirichlet distributions
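One way to draw such contours for three components is to evaluate the Dirichlet density at many points of the 2-simplex (the triangle x1 + x2 + x3 = 1) and contour the resulting values. The sketch below covers only the evaluation step, using the standard density Γ(Σα) / ΠΓ(αᵢ) · Π xᵢ^(αᵢ − 1) computed in log space. The function names and the simple barycentric grid are hypothetical, and the plotting step (for example with Matplotlib) is left out; the gist's own approach may differ.

```python
import math

def dirichlet_pdf(x, alpha):
    """Density of Dir(alpha) at x, which must lie strictly inside the simplex."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(xi <= 0 for xi in x) or abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must have positive components summing to 1")
    # log of the normalizing constant Gamma(sum(alpha)) / prod(Gamma(alpha_i))
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # log of prod(x_i ** (alpha_i - 1))
    log_kernel = sum((a - 1.0) * math.log(xi) for xi, a in zip(x, alpha))
    return math.exp(log_norm + log_kernel)

def simplex_grid(n):
    """Interior barycentric points (i/n, j/n, k/n) with i + j + k = n and all > 0."""
    return [(i / n, j / n, (n - i - j) / n)
            for i in range(1, n)
            for j in range(1, n - i)]

# Density values that a contour routine could then draw over the triangle
values = [dirichlet_pdf(p, (2.0, 5.0, 15.0)) for p in simplex_grid(50)]
```

Working in log space avoids overflow in the gamma functions for large α; the boundary of the simplex is excluded because the density can be infinite there when some αᵢ < 1.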
thigm85 / lda_vs_pca.R
Last active March 16, 2021 11:31
Visualize the difference between PCA and LDA on the iris dataset.
require(MASS)
require(ggplot2)
require(scales)
require(gridExtra)
pca <- prcomp(iris[,-5],
              center = TRUE,
              scale. = TRUE)
prop.pca = pca$sdev^2/sum(pca$sdev^2)
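The `prop.pca` line is the proportion of variance explained by each principal component. R's `prcomp` gets `sdev` from the singular values of the centered (and, with `scale. = TRUE`, unit-variance) data divided by sqrt(n − 1). A NumPy sketch of the same computation follows; the function name is hypothetical, and only the PCA half of the gist (not LDA) is shown.

```python
import numpy as np

def pca_variance_proportions(X):
    """Proportion of variance per principal component, like prcomp(center=TRUE, scale.=TRUE).

    Assumes no column is constant (scaling would divide by zero).
    """
    X = np.asarray(X, dtype=float)
    # Center each column and scale by its sample standard deviation (ddof=1, as R's sd()).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    s = np.linalg.svd(Z, compute_uv=False)   # singular values, descending
    sdev = s / np.sqrt(X.shape[0] - 1)       # matches prcomp's pca$sdev
    return sdev**2 / np.sum(sdev**2)         # matches prop.pca
```

Because the data are scaled, the `sdev**2` values are the eigenvalues of the correlation matrix, so they sum to the number of variables before normalization.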
mango314 / README.md
Last active December 7, 2018 01:58
a map of all 2166 Census Tracts of New York City in Python Matplotlib

Census Tracts of New York City

Here at PyData NYC, I attended a tutorial on how to use NumPy and IPython notebooks. In a previous gist, I drew all the zip codes of the Bronx in d3.js.

This would be great for reproducing infographics like Educational Attainment in New York City -- Brooklyn, which looks a bit like a jigsaw puzzle.

Where to Obtain the Data

jdmonaco / t_welch.py
Last active February 2, 2021 10:00
Welch's t-test for two samples, not assuming equal sample size or variance. Requires Python, NumPy, and SciPy.
from collections import namedtuple
import numpy as np
import scipy.stats as st
TtestResults = namedtuple("Ttest", "T p")
def t_welch(x, y, tails=2):
"""
Welch's t-test for two unequal-size samples, not assuming equal variances
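The core of Welch's test is the statistic t = (x̄ − ȳ) / sqrt(s²ₓ/nₓ + s²ᵧ/nᵧ) together with the Welch-Satterthwaite degrees of freedom. A standard-library sketch of those two quantities follows; `welch_t` and `WelchStat` are hypothetical names, not the gist's `t_welch`/`TtestResults`.

```python
import math
import statistics
from collections import namedtuple

# Hypothetical result type holding the statistic and its degrees of freedom.
WelchStat = namedtuple("WelchStat", "t df")

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each sample needs at least two observations")
    vx = statistics.variance(x) / nx  # squared standard error of mean(x), sample variance (n - 1)
    vy = statistics.variance(y) / ny  # squared standard error of mean(y)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return WelchStat(t, df)

result = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

With SciPy available, a two-tailed p-value is `2 * scipy.stats.t.sf(abs(result.t), result.df)`, and `scipy.stats.ttest_ind(x, y, equal_var=False)` runs the complete Welch test in one call.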
brentp / linear_model.py
Created April 10, 2013 15:57
Calculate t-statistics and p-values for the coefficients of a linear model in Python, using the scikit-learn framework.
from sklearn import linear_model
from scipy import stats
import numpy as np
class LinearRegression(linear_model.LinearRegression):
"""
LinearRegression class after sklearn's, but calculate t-statistics
and p-values for model coefficients (betas).
Additional attributes available after .fit()
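The statistics the docstring promises follow from the usual OLS formulas: with design matrix A (an intercept column plus the predictors, p columns in total), β̂ = (AᵀA)⁻¹Aᵀy, σ̂² = RSS / (n − p), SE(β̂ⱼ) = sqrt(σ̂² [(AᵀA)⁻¹]ⱼⱼ) and tⱼ = β̂ⱼ / SE(β̂ⱼ). The sketch below computes them in plain NumPy rather than by subclassing scikit-learn as the gist does; `ols_t_stats` is a hypothetical name.

```python
import numpy as np

def ols_t_stats(X, y):
    """Return (beta, standard errors, t-statistics, residual dof) for OLS with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])         # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - A.shape[1]                          # n minus number of estimated coefficients
    sigma2 = resid @ resid / dof                  # unbiased residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)         # covariance matrix of beta
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se, dof
```

Two-sided p-values then follow from the t distribution, e.g. `2 * scipy.stats.t.sf(np.abs(t), dof)` if SciPy is installed.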
1st / tests_for_toptal_on_codility.py
Last active May 19, 2022 19:58
My answers to the tests on http://codility.com that I passed for http://toptal.com. I used Python to solve the problems.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Tests that I passed on codility.com for Toptal
#
# Task #1
def binary_gap(N):
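The name `binary_gap` presumably refers to Codility's BinaryGap task: the length of the longest run of zeros that is bounded by ones on both sides in the binary representation of a positive integer N. A minimal sketch of one approach follows (not necessarily the author's solution): trailing zeros are stripped because no 1 closes them, after which every zero run between the 1s is a valid gap.

```python
def binary_gap(N):
    """Longest run of zeros enclosed by ones in the binary form of N (0 if none)."""
    # bin(9) == '0b1001'; drop the prefix, then drop trailing zeros with no closing 1
    bits = bin(N)[2:].rstrip("0")
    # Splitting on '1' leaves exactly the enclosed zero runs (plus empty strings)
    return max((len(run) for run in bits.split("1")), default=0)
```

For example, 529 is `1000010001` in binary, whose gaps have lengths 4 and 3, so the answer is 4; 32 is `100000`, which has no closed gap, so the answer is 0.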
iamatypeofwalrus / roll_ipython_in_aws.md
Last active February 21, 2025 18:39
Create an iPython HTML Notebook on Amazon's AWS Free Tier from scratch.

What

Roll your own iPython Notebook server with Amazon Web Services (EC2) using their Free Tier.

What are we using? What do you need?

  • An active AWS account. First-time sign-ups are eligible for the free tier for a year.
  • One Micro Tier EC2 Instance
  • With AWS we will use the stock Ubuntu Server AMI and customize it.
  • Anaconda for Python.
  • Coffee/Beer/Time
yusugomori / LogisticRegression.py
Last active March 30, 2021 00:03
multiclass Logistic Regression
#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
Logistic Regression
References :
- Jason Rennie: Logistic Regression,
http://qwone.com/~jason/writing/lr.pdf
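Multiclass logistic regression is commonly formulated as softmax regression: class probabilities are softmax(xW + b), and the weights are fitted by gradient descent on the mean cross-entropy, whose gradient with respect to the logits is (P − Y) / n. The NumPy sketch below follows that formulation with an optional L2 penalty; the function names, learning rate and epoch count are hypothetical choices, not taken from the gist.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_regression(X, Y, lr=0.1, epochs=500, l2=0.0):
    """Batch gradient descent on mean cross-entropy (+ l2/2 * ||W||^2).

    X: (n, d) inputs; Y: (n, k) one-hot labels. Returns W (d, k) and b (k,).
    """
    n, d = X.shape
    k = Y.shape[1]
    W = np.zeros((d, k))
    b = np.zeros(k)
    for _ in range(epochs):
        P = softmax(X @ W + b)
        G = (Y - P) / n                  # negative gradient w.r.t. the logits
        W += lr * (X.T @ G - l2 * W)     # descent step on weights, with L2 shrinkage
        b += lr * G.sum(axis=0)          # bias is not regularized
    return W, b

def predict(X, W, b):
    """Index of the most probable class for each row of X."""
    return softmax(X @ W + b).argmax(axis=1)
```

Usage: build one-hot targets with `np.eye(k)[labels]`, train, then call `predict` on new rows.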