
Expand The Edinburgh Twitter FSD Corpus

The Python scripts attached here take care of the following tedious work, and should help one quickly get started with some real work on the corpus:

Respect the Twitter API rate limits and throttle API hits.
Don't hit the API for tweet IDs that have already been expanded, so you can resume tweet expansion after stopping midway.
Parse the API response and dump it into the correct column in the sqlite3 database.
Gracefully handle exceptions while acquiring tweets from the API.
Wrap version 1.1 of the Twitter API.
Start from a specified tweet ID, assuming the input file is sorted in increasing order of tweet ID.
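
The behaviour listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached scripts themselves: the function name `expand_tweets`, the `tweets` table layout, the `fetch` callable (a stand-in for whatever wraps the Twitter API call), and the fixed `min_interval` throttle are all hypothetical.

```python
import sqlite3
import time


def expand_tweets(conn, tweet_ids, fetch, start_id=0,
                  min_interval=0.0, sleep=time.sleep):
    """Expand tweet IDs into a sqlite3 table, skipping IDs already stored.

    `fetch` is a hypothetical callable taking a tweet ID and returning a
    dict with a "text" key; it stands in for the real API wrapper.
    """
    # Assumed schema: one row per tweet ID, with either text or an error.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, text TEXT, error TEXT)"
    )
    for tweet_id in tweet_ids:
        if tweet_id < start_id:
            continue  # honour the requested starting tweet ID
        seen = conn.execute(
            "SELECT 1 FROM tweets WHERE id = ?", (tweet_id,)
        ).fetchone()
        if seen:
            continue  # already expanded, so a resumed run makes no API hit
        try:
            payload = fetch(tweet_id)
            row = (tweet_id, payload.get("text"), None)
        except Exception as exc:
            # Record the failure and move on instead of aborting the run.
            row = (tweet_id, None, str(exc))
        conn.execute("INSERT INTO tweets VALUES (?, ?, ?)", row)
        conn.commit()  # commit per tweet so stopping midway loses nothing
        sleep(min_interval)  # crude fixed-delay throttle between API hits
    return conn
```

Storing failed IDs alongside successful ones means a resumed run does not retry them; whether that is desirable depends on why the fetch failed, so treat it as one possible design choice. A real throttle would also derive its delay from the API's documented rate limits rather than a fixed interval.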


	import os
	import sys
	from getpass import getpass

	import gdata.docs.service
	import gdata.spreadsheet.service


	'''
	get user information from the command line argument and

	#!/usr/bin/env python
	# -*- coding: utf-8 -*-
	#
	# pegasos.py
	#
	# Copyright 2013 nipun batra <[email protected]>
	#
	# This program is free software; you can redistribute it and/or modify
	# it under the terms of the GNU General Public License as published by
	# the Free Software Foundation; either version 2 of the License, or

	import nltk

	text = """The Buddha, the Godhead, resides quite as comfortably in the circuits of a digital
	computer or the gears of a cycle transmission as he does at the top of a mountain
	or in the petals of a flower. To think otherwise is to demean the Buddha...which is
	to demean oneself."""

	# Used when tokenizing words
	sentence_re = r'''(?x) # set flag to allow verbose regexps
	([A-Z])(\.[A-Z])+\.? # abbreviations, e.g. U.S.A.