mutedial gregdl

  • tokyo
@drjwbaker
drjwbaker / 2015-04-17_IOR.py
Last active August 29, 2015 14:19
Python script to query a tsv file using strings in a txt file and make a new tsv with the lines that contain matches
#I have a list of strings in a text file (`mylist.txt`). I want to search for these strings in a tsv file (`somestuff.tsv`) and make a new file that contains only the lines in which the strings appear. Some strings in the text file will not appear in the tsv file.
#see https://gist.github.com/MartinPaulEve/c0610fa89da4df4d546a
#!/usr/bin/env python
output = []
# use a "with" block to automatically close I/O streams
with open('mylist.txt') as word_list:
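The preview above is cut off, so what follows is only a minimal sketch of the approach the description outlines, not the gist's own code. It assumes one search string per line in `mylist.txt` and plain substring matching; the helper names `filter_lines` and `filter_tsv` are hypothetical.

```python
def filter_lines(terms, lines):
    """Return the lines that contain at least one of the search terms."""
    # Drop empty terms so a blank line in the list does not match everything.
    terms = [t for t in terms if t]
    return [line for line in lines if any(t in line for t in terms)]


def filter_tsv(list_path, tsv_path, out_path):
    """Copy the lines of tsv_path that contain any string from list_path to out_path."""
    with open(list_path, encoding="utf-8") as word_list:
        terms = [line.strip() for line in word_list]
    # Both files are closed automatically when the "with" block ends.
    with open(tsv_path, encoding="utf-8") as tsv, \
            open(out_path, "w", encoding="utf-8") as out:
        out.writelines(filter_lines(terms, tsv))
```

Calling `filter_tsv("mylist.txt", "somestuff.tsv", "matches.tsv")` would write the matching rows to `matches.tsv` (an assumed output name). Strings that never occur in the tsv file simply produce no rows.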
@soodoku
soodoku / text_classifier.R
Last active December 15, 2016 17:44
Basic Text Classifier
"
Basic Text Classifier
- Takes a csv with a text column, and column of labels
- Splits into train and test
- Preprocesses text using tm/bag-of-words, 1/2-order Markov
- Uses SVM and Lasso
@author: Gaurav Sood
"
anonymous
anonymous / findsimilar.py
Created June 9, 2015 17:16
from whoosh.index import open_dir
from whoosh.index import create_in
from whoosh.fields import *
from whoosh.qparser import QueryParser
import glob
import os
# USER SET PARAMETERS ############
@darinwilson
darinwilson / ambient1
Created August 14, 2015 19:46
Ambient experiment using Sonic Pi
# Ambient experiment for Sonic Pi (http://sonic-pi.net/)
#
# The piece consists of three long loops, each of which plays one of
# two randomly selected pitches. Each note has different attack,
# release and sleep values, so that they move in and out of phase
# with each other. This can play for quite a while without
# repeating itself :)
live_loop :note1 do
  use_synth :hollow
@jabranham
jabranham / evernote-to-org-mode.org
Last active October 24, 2023 13:52
Evernote to org-mode

Export from evernote

You’ll have to open the Evernote application on either Mac or Windows (there is no Linux client), right-click the notebook you want to export, and select “Export.” Choose the option to export to HTML, either as one page or as several pages, depending on your preference; I went with one HTML page per note.

Clean filenames

@quadrismegistus
quadrismegistus / gensim_word2vec_procrustes_align.py
Last active November 16, 2023 01:57
Code for aligning two gensim word2vec models using Procrustes matrix alignment. Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton. [NOTE: This code is DEPRECATED for latest versions of gensim. Please see instead this updated version of the code <https://gist.github.com/zhicongchen/9e23…
def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
    """Procrustes align two gensim word2vec models (to allow for comparison between same word across models).
    Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton.
    (With help from William. Thank you!)
    First, intersect the vocabularies (see `intersection_align_gensim` documentation).
    Then do the alignment on the other_embed model.
    Replace the other_embed model's syn0 and syn0norm numpy matrices with the aligned version.
    Return other_embed.
@quadrismegistus
quadrismegistus / gensim_word2vec_make_semantic_network.py
Last active June 7, 2020 15:14
Code to make a network out of the shortest N cosine-distances (or, equivalently, the strongest N associations) between a set of words in a gensim word2vec model.
"""
Code to make a network out of the shortest N cosine-distances (or, equivalently, the strongest N associations)
between a set of words in a gensim word2vec model.
To use:
Set the filenames for the word2vec model.
Set `my_words` to be a list of your own choosing.
Set `num_top_dists` to be a number or a factor of the length of `my_words`.
Choose between the two methods below to produce distances, and comment-out the other one.
"""
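The idea in the docstring above can be sketched in plain Python without gensim. This is an illustration under stated assumptions, not the gist's code: `vectors` is a hypothetical dict mapping each word to a list of floats (standing in for the word2vec model), and `cosine_distance` and `top_edges` are hypothetical helper names. `my_words` and `num_top_dists` are the names the docstring uses.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def top_edges(vectors, my_words, num_top_dists):
    """Return the num_top_dists closest word pairs as (word1, word2, distance)."""
    # Score every unordered pair of words once.
    pairs = [
        (w1, w2, cosine_distance(vectors[w1], vectors[w2]))
        for w1, w2 in combinations(my_words, 2)
    ]
    # Shortest distance first, i.e. strongest association first.
    pairs.sort(key=lambda edge: edge[2])
    return pairs[:num_top_dists]
```

Each returned tuple can be treated as a weighted edge between two word nodes. Scoring every pair is quadratic in the length of `my_words`, which is fine for a hand-picked word list but slow for a full vocabulary.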
@aparrish
aparrish / word_counts_with_counter.ipynb
Last active October 23, 2020 10:33
Quick word counts with Counter. Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/
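Since the notebook itself does not render here, the following is only a minimal sketch of counting words with `collections.Counter`, assuming simple lowercasing and whitespace tokenization. The function name `word_counts` and the sample sentence are illustrative and not taken from the notebook.

```python
from collections import Counter


def word_counts(text):
    """Count whitespace-separated, lowercased words in text."""
    return Counter(text.lower().split())


counts = word_counts("the cat and the hat and the bat")
# most_common(n) lists the n most frequent words with their counts.
top_two = counts.most_common(2)
```

A `Counter` returns 0 for words it has never seen, so lookups such as `counts["dog"]` do not raise `KeyError`.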
@aparrish
aparrish / data-as-documents.ipynb
Last active October 23, 2020 10:32
"Documents as data" notebook. (Python 2.7) Click the "Raw" button to download! Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/
@aparrish
aparrish / understanding-word-vectors.ipynb
Last active May 8, 2025 14:50
Understanding word vectors: A tutorial for "Reading and Writing Electronic Text," a class I teach at ITP. (Python 2.7) Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/