Yoshiki Schmitz yoshikischmitz

Scripts for preparing a full-text-search SQLite database from an English-Chinese parallel corpus

These scripts were written to prepare the FTS database for use in Otacon507's Chinese Example sentences plugin: https://github.com/otacon507/zh-examples

The database importer will hopefully be rewritten in Python and integrated into the above repo so end users can easily create their own databases. I originally used Dr. Xian Qian's 2MParallelCorpus (https://github.com/qxred/2MParallelCorpus). Please note that although this data is generally of high quality, it does contain some non-English/non-Chinese sentences (an estimated 3000+, detected using langid), 900+ HTML fragments, and an unknown number of nonsense sentence fragments (probably not many).
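For anyone building their own database, the HTML fragments are the easiest of these problems to screen out before import. The following is a minimal sketch, not the method used for the counts above: the regular expressions and the names `looks_like_html` and `drop_html_pairs` are assumptions made for illustration, and the check is a heuristic that can misfire on text such as "AT&T;".

```python
import re

# Heuristic patterns (assumptions, not taken from the original scripts).
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")        # e.g. <b>, </div>, <a href="...">
HTML_ENTITY = re.compile(r"&(?:[A-Za-z]+|#\d+);")  # e.g. &nbsp; or &#160;


def looks_like_html(text):
    """Return True if the text contains something resembling an HTML tag or entity."""
    return bool(HTML_TAG.search(text) or HTML_ENTITY.search(text))


def drop_html_pairs(pairs):
    """Keep only (english, chinese) pairs where neither side looks like HTML."""
    return [
        (en, zh)
        for en, zh in pairs
        if not looks_like_html(en) and not looks_like_html(zh)
    ]
```

Language filtering could be added in the same way with langid, which is a third-party package and is left out here to keep the sketch dependency-free.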

Regarding the nonsense sentence fragments, we partially avoided this issue in the plugin by sorting results by the distance between each sentence's length and the average sentence length of the result set. This is accomplished in two queries:

// pseudocode:
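A minimal Python sketch of the two-query idea, under several assumptions: the table name `sentences`, the columns `english` and `chinese`, the function name `closest_to_average`, and the choice to measure the English side are all hypothetical. It also uses a plain table with `LIKE` so it runs on any SQLite build; against an FTS table the `WHERE` clause would use `MATCH` instead. Putting the sentences closest to the average first is likewise an assumption about the intended ordering.

```python
import sqlite3


def closest_to_average(conn, term, limit=5):
    """Return matching (english, chinese) rows, most typical length first."""
    pattern = f"%{term}%"

    # Query 1: average sentence length over the result set.
    (avg_len,) = conn.execute(
        "SELECT AVG(LENGTH(english)) FROM sentences WHERE english LIKE ?",
        (pattern,),
    ).fetchone()
    if avg_len is None:  # AVG over zero rows is NULL, so nothing matched
        return []

    # Query 2: the same match, ordered by distance from that average.
    # The second sort key only makes ties deterministic.
    return conn.execute(
        "SELECT english, chinese FROM sentences WHERE english LIKE ? "
        "ORDER BY ABS(LENGTH(english) - ?), english LIMIT ?",
        (pattern, avg_len, limit),
    ).fetchall()


if __name__ == "__main__":
    # Tiny illustrative data set, not taken from the corpus.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sentences (english TEXT, chinese TEXT)")
    conn.executemany(
        "INSERT INTO sentences VALUES (?, ?)",
        [("Tea.", "茶。"), ("I like tea.", "我喜欢茶。")],
    )
    for row in closest_to_average(conn, "tea"):
        print(row)
```

Very short fragments and very long run-on lines both sit far from the average length, so they sink to the bottom of the list rather than being removed outright.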

	/*
	Implementation of ISynchronizeInvoke for Unity3D game engine.
	Can be used to invoke anything on main Unity thread.
	ISynchronizeInvoke is used extensively in .NET Forms; it is elegant and quite useful in Unity as well.
	I implemented it so I can use it with System.IO.FileSystemWatcher.SynchronizingObject.

	help from: http://www.codeproject.com/Articles/12082/A-DelegateQueue-Class
	example usage: https://gist.github.com/aeroson/90bf21be3fdc4829e631

	license: WTFPL (http://www.wtfpl.net/)
	*/

	#include <stdio.h>

	int multiply_like_an_egyptian(int first, int second) {
		int n;
		int o;
		if (first < second) {
			n = second; o = first;
		} else {
			n = first; o = second;
		}

	applicators = {3 => "fizz", 5 => "buzz"}
	fizzbuzz =
	  (1..100).map do |n|
	    fb = applicators.
	      select{|a| n % a == 0}.
	      values.join
	    [n.to_s, fb].max # "1" > "" and "fizz" > "100000" since "1" < "a"
	  end
	puts fizzbuzz

	class CircularBuffer
	  BUFFER_SIZE = 10

	  def initialize
	    @buffer_array = Array.new(BUFFER_SIZE)
	    @count = 0
	    @out_counter = 0
	    @in_counter = 0
	  end

	# This program is free software: you can redistribute it and/or modify
	# it under the terms of the GNU General Public License as published by
	# the Free Software Foundation, either version 3 of the License, or
	# (at your option) any later version.
	#
	# This program is distributed in the hope that it will be useful,
	# but WITHOUT ANY WARRANTY; without even the implied warranty of
	# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
	# GNU General Public License for more details.
	#

	#!/bin/sh
	#
	# Prevents commits to master.
	# Put in .git/hooks/pre-commit
	# Bypass with --no-verify flag on commit

	if [ "$(git rev-parse --abbrev-ref HEAD)" = "master" ];
	then
		echo "You cannot commit to master"
		exit 1
	fi

	user=>
	(map (fn [x]
	(println x '= (clojure.string/capitalize x)
	(= x (clojure.string/capitalize x)))
	) "EeeeFFfE")

	(E = E false
	e = E false
	nil e = E false
	nil e = E false

	# This extends the Array class with a special method called each_[insert integer here].
	# For example, each_1 invokes the block for every single element in an array, each_2 invokes the block for
	# each pair, each_3 for each triple, and so on. This is a purely educational example of using
	# metaprogramming to replicate functionality using patterns rather than hard-coding a new method for each
	# new case you need to account for.
	class Array

	  def method_missing(meth, *args, &block)
	    if meth.to_s =~ /^each_(\d+)$/
	      run_each_by_num($1.to_i, &block)