Chris Callison-Burch (callison-burch)

University of Pennsylvania
Philadelphia, PA
cis.upenn.edu/~ccb
https://orcid.org/0000-0001-8196-1943

callison-burch / word-alignment-input.csv

Created February 18, 2010 16:39

	fileName,segNum,source,target,sureAlignments,possAlignments,sourceHighlights,targetHighlights
	dev.ur-en.alignments.0,0,سپین : قدیم ترین دانت کی دریافت,Spain : the Discovery of the Oldest Tooth,0-0 4-7 6-3 2-6 5-2 5-5 5-4 1-1
	dev.ur-en.alignments.0,1,اٹا پیورکا فاؤنڈیشن کے ماہرین کا کہنا ہے کہ یہ دانت مغربی یورپ کے قدیم ترین انسان کی باقیات کو ظاہر کرتا ہے .,The experts of Atapuerca Foundation say that this tooth represents the remains of the oldest man of Western Europe .,13-16 3-0 4-1 11-17 18-11 2-4 0-3 12-18 0-9 9-7 23-19 17-12 16-15 5-2 8-6 10-8 14-14 1-3 3-2 6-5
	dev.ur-en.alignments.0,2,دراصل پری مولر یعنی دانتوں کے ساتھ کی داڑھ,In fact premolar meaning the molar with the canine,8-5 0-1 8-8 7-7 6-6 3-3 2-2
	dev.ur-en.alignments.0,3,اس داڑھ کی دریافت سپین کے صوبے برگوس کے شمالی علاقے اٹاپیورکا میں ہوئی ہے جس کی مزید تفصیلات تحقیقی ادارہ مکمل جانچ کے بعد سائنسی جریدے میں شائع کرے گا .,This molar has been discovered from the northern area of Spanish province Burgos , Atapuerca . The foundation will publi
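The `sureAlignments` column holds space-separated index pairs such as `0-0 4-7 6-3`. Judging from the first data row, each pair appears to be `sourceIndex-targetIndex` over whitespace-separated tokens, but the listing does not state this, so treat it as an assumption. The helper name `parse_alignments` is hypothetical and not part of the gist. A minimal Python 3 sketch under those assumptions:

```python
def parse_alignments(field):
    """Turn '0-0 4-7' into [(0, 0), (4, 7)].

    Assumes each pair is 'sourceIndex-targetIndex' (not confirmed by the gist).
    """
    pairs = []
    for item in field.split():
        src, tgt = item.split('-')
        pairs.append((int(src), int(tgt)))
    return pairs


# First data row of the CSV preview, tokenized on whitespace.
source = 'سپین : قدیم ترین دانت کی دریافت'.split()
target = 'Spain : the Discovery of the Oldest Tooth'.split()

links = parse_alignments('0-0 4-7 6-3 2-6 5-2 5-5 5-4 1-1')

# Look up the word pair behind each alignment link.
aligned_words = [(source[s], target[t]) for s, t in links]
```

Under this reading, the link `4-7` pairs the fifth Urdu token with `Tooth`, which is consistent with the source-first interpretation.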

callison-burch / word-alignment.js

Created February 18, 2010 15:52

	<!--
	This word alignment interface was written by Chris Callison-Burch.
	It's free and open source. If you publish a paper using data that you collected
	with it, please give me a shout out in the acknowledgements. You can cite my
	EMNLP-2009 paper "Fast, Cheap, and Creative: Evaluating Translation Quality
	Using Amazon's Mechanical Turk" or my ACL-2004 paper "Statistical Machine
	Translation with Word- and Sentence-Aligned Parallel Corpora".
	June 22, 2009
	//-->
	<h3> Urdu-English Word alignment</h3>

callison-burch / gist:293896

Created February 3, 2010 19:01

	<h1>Highlight clicked word</h1>
	<style>
	<!--
	span.white { background-color: #FFFFFF; }
	span.highlight { background-color: yellow; }

	A:link {text-decoration: none; color: black;}
	A:visited {text-decoration: none; color: black;}
	A:active {text-decoration: none; color: black;}
	A:hover {text-decoration: none; color: black;}

callison-burch / gist:293884

Created February 3, 2010 18:49

	<script language="Javascript" src="http://gd.geobytes.com/gd?after=-1&variables=GeobytesCountry,GeobytesCity,GeobytesRegion,GeobytesIpAddress"></script>
	<script language="Javascript">
	<!--
	function getUserInfo() {
	    // navigator.userLanguage is the older Internet Explorer fallback for navigator.language
	    var userDisplayLanguage = navigator.language ? navigator.language : navigator.userLanguage;
	    var browserInfo = navigator.userAgent;
	    var country = sGeobytesCountry;
	    var city = sGeobytesCity;
	    var region = sGeobytesRegion;

callison-burch / generate_csv_for_mturk_translation.py

Created January 27, 2010 18:46

	from BeautifulSoup import BeautifulSoup
	from urllib import quote_plus, unquote_plus
	from wt_articles.splitting import determine_splitter
	from pyango_view import str2img
	import wikipydia

	lang = 'ur'
	category = u'\u0632\u0645\u0631\u06c1:\u0645\u0646\u062a\u062e\u0628_\u0645\u0642\u0627\u0644\u06d2'
	sentence_filename = '/Users/ccb/Desktop/urdu_sentences'
	csv_filename = '/Users/ccb/Desktop/wikipedia_article_to_translate-2.csv'

callison-burch / gist:279534

Created January 17, 2010 19:38

	urdu_lines = open('/Users/ccb/Desktop/urdu_file').read().decode('utf8').split('\n')

	urdu_lines
	[u'\u0632\u0645\u0631\u06c1:\u0645\u0646\u062a\u062e\u0628_\u0645\u0642\u0627\u0644\u06d2', u'\u0632\u0645\u0631\u06c1:\u0627\u0645\u06cc\u062f\u0648\u0627\u0631_\u0628\u0631\u0627\u06d3_\u0645\u0646\u062a\u062e\u0628_\u0645\u0642\u0627\u0644\u06c1']
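The snippet above is Python 2, where `read()` returns bytes that must be decoded explicitly. A rough Python 3 equivalent passes the encoding to `open` instead. The helper name `read_lines` and the temporary demo file are illustrative assumptions; the gist itself reads `/Users/ccb/Desktop/urdu_file`.

```python
import os
import tempfile


def read_lines(path):
    """Read a UTF-8 text file and split it on newlines, as the gist does."""
    with open(path, encoding='utf8') as handle:
        return handle.read().split('\n')


# Demo data: the first category title from the output above, plus a dummy line.
category = '\u0632\u0645\u0631\u06c1:\u0645\u0646\u062a\u062e\u0628_\u0645\u0642\u0627\u0644\u06d2'

# Write a throwaway file so the sketch runs without the author's Desktop path.
with tempfile.NamedTemporaryFile('w', encoding='utf8', suffix='.txt',
                                 delete=False) as tmp:
    tmp.write(category + '\n' + 'second line')
    demo_path = tmp.name

urdu_lines = read_lines(demo_path)
os.remove(demo_path)
```

In Python 3 the resulting list holds ordinary `str` objects, so no `u''` prefix or `.decode('utf8')` call is needed.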

callison-burch / gist:275868

Created January 13, 2010 02:27

	def query_category_members(category, language='en', limit=100):
	    """
	    action=query, list=categorymembers
	    Returns all the members of a category up to the specified limit
	    """
	    url = api_url % (language)
	    query_args = {
	        'action': 'query',
	        'list': 'categorymembers',
	        'cmtitle': category,
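
The preview above is cut off before the request is built. As a rough illustration of how such a query could be assembled in Python 3, the sketch below constructs the URL only and makes no network call. The `API_URL` template, the function name `build_category_members_url`, and the `cmlimit` and `format` arguments are assumptions: the gist's own `api_url` value and its remaining arguments are not visible in the listing.

```python
from urllib.parse import urlencode

# Assumption: the gist's api_url is not shown; this template is a guess.
API_URL = 'https://%s.wikipedia.org/w/api.php'


def build_category_members_url(category, language='en', limit=100):
    """Build a MediaWiki categorymembers query URL without sending it."""
    query_args = {
        'action': 'query',
        'list': 'categorymembers',
        'cmtitle': category,
        'cmlimit': limit,    # assumed: caps the number of members returned
        'format': 'json',    # assumed: asks the API for JSON output
    }
    # urlencode percent-escapes the values, including non-ASCII category titles.
    return API_URL % language + '?' + urlencode(query_args)


url = build_category_members_url('Category:Physics', limit=10)
```

Keeping URL construction separate from the HTTP request makes the query arguments easy to inspect before anything is sent.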