Skip to content

Instantly share code, notes, and snippets.

@a-paxton
a-paxton / text-cleaning+word2vec-gensim.py
Created September 11, 2015 23:31
Cleaning Text Data and Creating 'word2vec' Model with Gensim
# preliminaries
from pymongo import MongoClient
from nltk.corpus import stopwords
from string import ascii_lowercase
import pandas as pd
import gensim, os, re, pymongo, itertools, nltk, snowballstemmer
# set the location where we'll save our model
savefolder = '/data'
@karpathy
karpathy / min-char-rnn.py
Last active May 12, 2025 17:28
Minimal character-level language model with a Vanilla Recurrent Neural Network, in Python/numpy
"""
Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy)
BSD License
"""
import numpy as np
# data I/O
data = open('input.txt', 'r').read() # should be simple plain text file
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
@stulacy
stulacy / MAUCpy.py
Last active November 29, 2020 11:23
"""
MAUCpy
~~~~~~
Contains two equations from Hand and Till's 2001 paper on a multi-class
approach to the AUC. The a_value() function is the probabilistic approximation
of the AUC found in equation 3, while MAUC() is the pairwise averaging of this
value for each of the classes. This is equation 7 in their paper.
"""
@jkbradley
jkbradley / LDA_SparkDocs
Created March 24, 2015 23:56
LDA Example: Modeling topics in the Spark documentation
/*
This example uses Scala. Please see the MLlib documentation for a Java example.
Try running this code in the Spark shell. It may produce different topics each time (since LDA includes some randomization), but it should give topics similar to those listed above.
This example is paired with a blog post on LDA in Spark: http://databricks.com/blog
Spark: http://spark.apache.org/
*/
import scala.collection.mutable
@tokestermw
tokestermw / visualizing_topic_models.py
Last active September 7, 2021 16:57
visualization topic models in four different ways
import json
import urlparse
from itertools import chain
flatten = chain.from_iterable
from nltk import word_tokenize
from gensim.corpora import Dictionary
from gensim.models.ldamodel import LdaModel
from gensim.models.tfidfmodel import TfidfModel
@shanebutler
shanebutler / sql.export.randomForest.R
Last active September 14, 2020 14:04
Deploy your RandomForest models in SQL! This tool enables in-database scoring of Random Forest models built using R. To use it, you simply call the function with the Random Forest model, output filename, SQL input data table and the name of the unique key on that table. For example:sql.export.rf(rf.mdl, file="model_output.SQL", input.table="sour…
# sql.export.rf(): save a randomForest model as SQL
# v0.04
# Copyright (c) 2013-2014 Shane Butler <shane dot butler at gmail dot com>
#
# sql.export.rf is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 2 of the License, or
# (at your option) any later version.
#
# sql.export.rf is distributed in the hope that it will be useful, but
@erogol
erogol / logistic_ensemble.py
Last active February 8, 2018 20:28
logistic regression ensembles with feature selection. It requires sklearn python lib
def linear_model_ensemble(X, y, X_test, fold_num, fold_num_sec, grid_search_range, oobe=True, x_val=True ):
'''
X - Train set
y - Train set labels with. Labels are 1 for pos instances and -1 for neg instances
fold_num1 - Fold size for the first step X-validation to set the hyper-params
and feature selectors
@shanebutler
shanebutler / sql.export.gbm.R
Last active July 4, 2023 08:10
Deploy your GBM models in SQL! This tool enables in-database scoring of GBM models built using R. To use it, you simply call the function with the GBM model, output filename, SQL input data table and the name of the unique key on that table. For example:sql.export.gbm(gbm1, file="model_output.SQL", input.table="source_table", id="id") Please let…
# sql.export.gbm(): save a GBM model as SQL
# v0.11
# Copyright (c) 2013-2014 Shane Butler <shane dot butler at gmail dot com>
#
# sql.export.gbm is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 2 of the License, or
# (at your option) any later version.
#
# sql.export.gbm is distributed in the hope that it will be useful, but
@iamatypeofwalrus
iamatypeofwalrus / roll_ipython_in_aws.md
Last active February 21, 2025 18:39
Create an iPython HTML Notebook on Amazon's AWS Free Tier from scratch.

What

Roll your own iPython Notebook server with Amazon Web Services (EC2) using their Free Tier.

What are we using? What do you need?

  • An active AWS account. First time sign-ups are eligible for the free tier for a year
  • One Micro Tier EC2 Instance
  • With AWS we will use the stock Ubuntu Server AMI and customize it.
  • Anaconda for Python.
  • Coffee/Beer/Time
@justinvw
justinvw / es_simple_autocomplete_example_config.sh
Last active January 26, 2021 17:15
Simple ElasticSearch autocomplete example configuration. The 'autocomplete' functionality is accomplished by lowercasing, character folding and n-gram tokenization of a specific indexed field (in this case "city").
# Delete the possibly existing autocomplete test index
curl -X DELETE localhost:9200/autocomplete_test
# Put the config of the autocomplete index
curl -X PUT localhost:9200/autocomplete_test -d '
{
"settings" : {
"index" : {
"analysis" : {
"analyzer" : {