Luis Rei lrei

endolith / Has weird right-to-left characters.txt

Last active October 26, 2024 13:55

Unicode kaomoji smileys emoticons emoji

	ּ_בּ
	בּ_בּ
	טּ_טּ
	כּ‗כּ
	לּ_לּ
	מּ_מּ
	סּ_סּ
	תּ_תּ
	٩(×̯×)۶
	٩(̾●̮̮̃̾•̃̾)۶

kohlmeier / compressed_features.py

Last active March 2, 2024 18:08

Example of computing compressed features. NOTE: If you want to create such features consistently across processes, you will need to persist the random components. Easy enough, but I've written the code for that, too, here: https://github.com/Khan/analytics/blob/master/map_reduce/py/random_features.py

	import collections
	import numpy as np


	class CompressedFeatures:

	    def __init__(self, num_features=50):
	        self.random_components = collections.defaultdict(
	            self._generate_component)
	        self.num_features = num_features
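
The preview above stops before `_generate_component` is defined, so the following is only a minimal sketch of how such a class might work, not the gist author's code. It assumes each feature name gets its own random Gaussian vector and that the compressed representation is the value-weighted sum of those vectors. The class name `CompressedFeaturesSketch`, the `seed` parameter, and the `compress` method are hypothetical.

```python
import collections

import numpy as np


class CompressedFeaturesSketch:
    """Hypothetical sketch: map sparse named features to a fixed-size dense vector."""

    def __init__(self, num_features=50, seed=0):
        self.num_features = num_features
        # Assumption: a seeded generator; the original may draw components differently.
        self._rng = np.random.default_rng(seed)
        # One random vector per feature name, created lazily on first lookup.
        self.random_components = collections.defaultdict(
            self._generate_component)

    def _generate_component(self):
        # Draw a standard-normal vector of length num_features.
        return self._rng.standard_normal(self.num_features)

    def compress(self, features):
        """Return the weighted sum of random components for a {name: value} dict."""
        compressed = np.zeros(self.num_features)
        for name, value in features.items():
            compressed += value * self.random_components[name]
        return compressed


if __name__ == "__main__":
    cf = CompressedFeaturesSketch(num_features=8, seed=1)
    print(cf.compress({"clicked_hint": 1.0, "time_taken": 2.5}))
```

Note that a fixed seed only reproduces the same components when feature names are first seen in the same order, which is why the description recommends persisting the random components if several processes must agree.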

bittlingmayer / ft_wiki_preproc.py

Last active March 4, 2019 22:56

fastText pre-trained vectors preprocessing [moved to ftio.wiki.preproc - pip install ftio / https://github.com/SignalN/ftio]

	# See https://github.com/facebookresearch/fastText/blob/master/get-wikimedia.sh
	#
	# From https://github.com/facebookresearch/fastText/issues/161:
	#
	# We now have a script called 'get-wikimedia.sh', that you can use to download and
	# process a recent wikipedia dump of any language. This script applies the preprocessing
	# we used to create the published word vectors.
	#
	# The parameters we used to build the word vectors are the default skip-gram settings,
	# except with a dimensionality of 300 as indicated on the top of the list of word