mutedial gregdl

  • tokyo
@onyxfish
onyxfish / example1.py
Created March 5, 2010 16:51
Basic example of using NLTK for named entity extraction.
import nltk
with open('sample.txt', 'r') as f:
    sample = f.read()
sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
# nltk.batch_ne_chunk was renamed to nltk.ne_chunk_sents in NLTK 3
chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
@mhermans
mhermans / neo4R_example.R
Created August 29, 2011 09:50
Neo4j-Cypher-R
# Requirements
#sudo apt-get install libcurl4-gnutls-dev # for RCurl on linux
#install.packages('RCurl')
#install.packages('RJSONIO')
library('RCurl')
library('RJSONIO')
query <- function(querystring) {
  h = basicTextGatherer()
@jasonbaldridge
jasonbaldridge / topics_gibbs_sg_example.R
Created May 29, 2012 18:44
Gibbs sampler for topic models for artificial data in Steyvers and Griffiths 2007.
## An implementation of Gibbs sampling for topic models for the
## example in section 4 of Steyvers and Griffiths (2007):
## http://cocosci.berkeley.edu/tom/papers/SteyversGriffiths.pdf
##
## Author: Jason Baldridge
# Functions to parse the input data
words.to.indices = data.frame(row.names=c("r","s","b","m","l"),1:5)
mysplit = function(x) { strsplit(x,"")[[1]] }
word.vector = function(x) { words.to.indices[mysplit(x),] }
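To show how parsing helpers like the ones above can feed a sampler, here is a minimal Python sketch of collapsed Gibbs sampling for a two-topic model over the same five letter-coded words. It is not a port of the R gist: the toy documents, the hyperparameters `alpha` and `beta`, the iteration count, and every function name are assumptions chosen for illustration.

```python
import random

# Letter codes as in the R snippet: river, stream, bank, money, loan.
# Python indices are 0-based, unlike the 1-based indices in the R version.
WORDS = "rsbml"
WORD_TO_INDEX = {w: i for i, w in enumerate(WORDS)}


def word_vector(doc):
    """Turn a string such as 'mmlb' into a list of word indices."""
    return [WORD_TO_INDEX[ch] for ch in doc]


def gibbs_lda(docs, num_topics=2, alpha=1.0, beta=1.0, iterations=200, seed=0):
    """Collapsed Gibbs sampler; returns assignments and the two count tables."""
    rng = random.Random(seed)
    vocab_size = len(WORDS)
    doc_topic = [[0] * num_topics for _ in docs]               # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
                    * (doc_topic[d][k] + alpha)
                    for k in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return assignments, doc_topic, topic_word


# Made-up toy documents, not the data from the paper.
docs = [word_vector(d) for d in ["mmllbb", "rrssbb", "mlbmlb", "rsbrsb"]]
assignments, doc_topic, topic_word = gibbs_lda(docs)
```

The per-document denominator is left out of the weights because it is the same for every topic and cancels when sampling.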
from sussex_nltk import untag_sequence, extract_by_pos
all_tags = r".+"
all_nouns = r"N+"
all_verbs = r"V+"
all_adjectives = r"J+"
example_tagged_words = [('The', 'DT'), ('little', 'JJ'), ('badgers', 'NNS'), ('ate', 'VBP'), ('some', 'DT'), ('jam', 'NN')]
#Decide on some patterns to match
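The tag patterns above are regular expressions matched against the start of each POS tag. A minimal sketch of that idea using only the standard library follows; `filter_by_pos` is a hypothetical helper written for this example and is not the `sussex_nltk` API, whose signatures are not shown here.

```python
import re


def filter_by_pos(tagged_words, tag_pattern):
    """Hypothetical helper: keep words whose POS tag matches the pattern.

    re.match anchors at the start of the tag, so r"N+" accepts NN, NNS, ...
    """
    tag_re = re.compile(tag_pattern)
    return [word for word, tag in tagged_words if tag_re.match(tag)]


example_tagged_words = [('The', 'DT'), ('little', 'JJ'), ('badgers', 'NNS'),
                        ('ate', 'VBP'), ('some', 'DT'), ('jam', 'NN')]

nouns = filter_by_pos(example_tagged_words, r"N+")       # ['badgers', 'jam']
verbs = filter_by_pos(example_tagged_words, r"V+")       # ['ate']
adjectives = filter_by_pos(example_tagged_words, r"J+")  # ['little']
everything = filter_by_pos(example_tagged_words, r".+")  # all six words
```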
@gupul2k
gupul2k / pos_tagging.py
Created November 2, 2012 13:32
NER and POS Tagging with NLTK and Python
#Script tags POS and NER[Named Entity Recognition] for a supplied text file.
#Date: Nov 2 2012
#Author: Hota Sobhan
import nltk
f = open(r'C:\Python27\Test_File.txt')  # raw string keeps the backslashes literal
data = f.readlines()
#Parse the text file for NER with POS Tagging
@benmarwick
benmarwick / R2MALLET.r
Last active April 12, 2021 10:27
R code to operate MALLET entirely from within R. Set variables, send commands to Windows' command console and get MALLET's result back into R for further analysis.
# Set working directory
dir <- "C:\\" # adjust to suit
setwd(dir)
# configure variables and filenames for MALLET
## here using MALLET's built-in example data and
## variables from http://programminghistorian.org/lessons/topic-modeling-and-mallet
# folder containing txt files for MALLET to work on
importdir <- "C:\\mallet-2.0.7\\sample-data\\web\\en"
# coding=UTF-8
import nltk
from nltk.corpus import brown
# This is a fast and simple noun phrase extractor (based on NLTK)
# Feel free to use it, just keep a link back to this post
# http://thetokenizer.com/2013/05/09/efficient-way-to-extract-the-main-topics-of-a-sentence/
# Created by Shlomi Babluki
# May, 2013
@benmarwick
benmarwick / citation-analysis-sketch.R
Last active February 1, 2025 15:00
sketch of citation analysis
# sources:
# http://www.jgoodwin.net/?p=1223
# http://orgtheory.wordpress.com/2012/05/16/the-fragile-network-of-econ-soc-readings/
# http://nealcaren.web.unc.edu/a-sociology-citation-network/
# http://kieranhealy.org/blog/archives/2014/11/15/top-ten-by-decade/
# http://www.jgoodwin.net/lit-cites.png
###########################################################################
# This first section scrapes content from the Web of Science webpage. It takes
@brianckeegan
brianckeegan / backbone_extractor.py
Last active August 2, 2023 19:26
Given a networkx graph containing weighted edges and a threshold parameter alpha, this code will return another networkx graph with the "backbone" of the graph containing a subset of weighted edges that fall above the threshold following the method in Serrano et al. 2008.
# Serrano, Boguna, Vespignani backbone extractor
# from http://www.pnas.org/content/106/16/6483.abstract
# Thanks to Michael Conover and Qian Zhang at Indiana with help on earlier versions
# Thanks to Clay Davis for pointing out an error
import networkx as nx
import numpy as np
def extract_backbone(g, weight='weight', alpha=.05):
    backbone_graph = nx.Graph()
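The preview above stops before the filtering logic, so here is a minimal sketch of the disparity filter from that paper, written against a plain edge list so it runs without networkx. It is not the gist's implementation: `disparity_backbone` and the toy edge list are assumptions, and skipping degree-1 nodes is one possible convention among several.

```python
from collections import defaultdict


def disparity_backbone(edges, alpha=0.05):
    """Keep undirected edges (u, v, weight) that are significant for an endpoint.

    For a node of degree k and strength s, an edge of weight w has
    p = w / s and significance (1 - p) ** (k - 1); the edge is kept when
    that value is below alpha for at least one of its endpoints.
    """
    strength = defaultdict(float)  # sum of incident edge weights
    degree = defaultdict(int)      # number of incident edges
    for u, v, w in edges:
        for node in (u, v):
            strength[node] += w
            degree[node] += 1

    backbone = []
    for u, v, w in edges:
        for node in (u, v):
            k = degree[node]
            if k < 2:
                continue  # assumption: degree-1 nodes cannot certify an edge
            p = w / strength[node]
            if (1 - p) ** (k - 1) < alpha:
                backbone.append((u, v, w))
                break  # one significant endpoint is enough
    return backbone


# Toy star graph: one heavy edge and four light ones around a hub.
edges = [("hub", "a", 100), ("hub", "b", 1), ("hub", "c", 1),
         ("hub", "d", 1), ("hub", "e", 1)]
backbone = disparity_backbone(edges)  # only the heavy hub-a edge survives
```

Lowering `alpha` keeps fewer edges; raising it toward 1 keeps nearly all of them.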
@jennybc
jennybc / 2014-10-12_stop-working-directory-insanity.md
Last active February 19, 2025 22:20
Stop the working directory insanity

There are packages for this now!

2017-08-03: Since I wrote this in 2014, the universe, specifically Kirill Müller (https://github.com/krlmlr), has provided better solutions to this problem. I now recommend that you use one of these two packages:

  • rprojroot: This is the main package with functions to help you express paths in a way that will "just work" when developing interactively in an RStudio Project and when you render your file.
  • here: A lightweight wrapper around rprojroot that anticipates the most likely scenario: you want to write paths relative to the top-level directory, defined as an RStudio project or Git repo. TRY THIS FIRST.

I love these packages so much I wrote an ode to here.

I use these packages now instead of what I describe below. I'll leave this gist up for historical interest. 😆
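For readers outside R, the core idea behind these packages (find the project's top-level directory by looking for a marker, then build paths from there) can be sketched in Python. This is an illustration only, not how rprojroot or here are implemented: the function names and the marker list (`.git`, `.here`, any `*.Rproj` file) are assumptions.

```python
from pathlib import Path

# Assumed markers for a project's top-level directory.
ROOT_MARKERS = (".git", ".here")


def find_project_root(start=None, markers=ROOT_MARKERS):
    """Walk upward from `start` (default: cwd) to the first directory with a marker."""
    current = Path(start or Path.cwd()).resolve()
    for directory in (current, *current.parents):
        has_marker = any((directory / m).exists() for m in markers)
        has_rproj = any(directory.glob("*.Rproj"))
        if has_marker or has_rproj:
            return directory
    raise FileNotFoundError(f"no project root found above {current}")


def project_path(*parts, start=None):
    """Build a path relative to the project root, wherever the caller lives."""
    return find_project_root(start).joinpath(*parts)
```

With this, `project_path("data", "raw.csv")` resolves to the same file whether the calling script sits at the top level or several folders down, which removes the need to change the working directory.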