# compute average precision@k over all users
def precision_at_k(predictions, k):
    '''
    Return the average precision@k over all users.
    args:
        predictions: np.array of user-item predictions
        k: int, number of top-ranked items to evaluate
    returns:
        precision: float, average precision@k over all users
    '''
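
The stub above only fixes the signature. Below is a minimal sketch of one way to fill it in, assuming predictions is a (n_users, n_items) array of predicted scores and that a binary ground-truth matrix relevance of the same shape is available; the relevance argument is an assumption, not part of the original signature:

import numpy as np

def precision_at_k_sketch(predictions, relevance, k):
    # rank items per user by predicted score, highest first, and keep the top k
    top_k = np.argsort(-predictions, axis=1)[:, :k]
    # look up whether each recommended item is actually relevant (0/1)
    hits = np.take_along_axis(relevance, top_k, axis=1)
    # precision@k per user, then averaged over all users
    return hits.mean(axis=1).mean()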

import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel

class CustomBERTModel(BertPreTrainedModel):
    def __init__(self, config, num_class):
        super(CustomBERTModel, self).__init__(config)
        self.bert = BertModel(config)
        # classification head on top of BERT's pooled output
        self.linear = nn.Linear(config.hidden_size, num_class)
        self.init_weights()

model = CustomBERTModel.from_pretrained('bert-base-uncased', num_class=10)
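
The constructor is where the original snippet stops. A possible forward pass for such a class, written as a self-contained sketch; the argument names and the use of outputs[1] (the pooled [CLS] representation returned by transformers' BertModel) are assumptions, not part of the original gist:

import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel

class CustomBERTModelSketch(BertPreTrainedModel):
    def __init__(self, config, num_class):
        super().__init__(config)
        self.bert = BertModel(config)
        self.linear = nn.Linear(config.hidden_size, num_class)
        self.init_weights()

    def forward(self, input_ids, attention_mask=None):
        # run BERT; outputs[1] is the pooled [CLS] vector of shape (batch, hidden_size)
        outputs = self.bert(input_ids, attention_mask=attention_mask)
        # map the pooled vector to num_class logits
        return self.linear(outputs[1])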

from transformers import BertModel
model = BertModel.from_pretrained('bert-base-uncased')
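
For context, a short usage sketch of what the loaded model is typically used for; the example sentence is reused from the tokenizer snippet further down, and the tensor handling is an illustration rather than part of the original gist:

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# encode a sentence and run a forward pass without tracking gradients
inputs = tokenizer('Learn Hugging Face Transformers & BERT with PyTorch in 5 Minutes',
                   return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

# outputs[0] is the last hidden state with shape (batch, seq_len, hidden_size)
print(outputs[0].shape)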

all_doc_tokens = ['[SEP]']
orig_to_tok_index = []
for (i, word) in enumerate(words):
    # record the index of the first sub-token produced for this word
    orig_to_tok_index.append(len(all_doc_tokens))
    sub_tokens = tokenizer.tokenize(word)
    all_doc_tokens.extend(sub_tokens)
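
To make the word-to-sub-token mapping concrete, a small hedged example of running the loop above on a short word list; the word list and the tokenizer setup are illustrative, not from the original snippet:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
words = ['Transformers', 'are', 'powerful']

all_doc_tokens = ['[SEP]']
orig_to_tok_index = []
for (i, word) in enumerate(words):
    orig_to_tok_index.append(len(all_doc_tokens))
    sub_tokens = tokenizer.tokenize(word)
    all_doc_tokens.extend(sub_tokens)

print(orig_to_tok_index)   # one starting index per original word
print(all_doc_tokens)      # WordPiece may split a word into several sub-tokens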

# map tokens to their vocabulary ids
input_ids = tokenizer.convert_tokens_to_ids(tokens)

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
tokens = tokenizer.tokenize('Learn Hugging Face Transformers & BERT with PyTorch in 5 Minutes')
# add the special classification and separator tokens BERT expects
tokens = ['[CLS]'] + tokens + ['[SEP]']
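
As a follow-up, the tokenizer can also do the tokenization, special-token insertion, and id lookup in one call; a short hedged usage example with the same sentence as above:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# encode() tokenizes, adds [CLS]/[SEP], and maps to vocabulary ids in one step
ids = tokenizer.encode('Learn Hugging Face Transformers & BERT with PyTorch in 5 Minutes',
                       add_special_tokens=True)
print(ids)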

# infer the topic distribution of the second document in the corpus
lda[common_corpus[1]]
'''
output (truncated):
[(0, 0.014287902),
 (1, 0.014287437),
 (2, 0.014287902),
 (3, 0.014285716),
 (4, 0.014285716),
 (5, 0.014285714),
 ...]
'''
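
To read off the dominant topic from such a distribution, one small sketch, assuming the lda model and common_corpus built in the training snippet below:

# lda[bow] returns a list of (topic_id, probability) pairs
topic_dist = lda[common_corpus[1]]
dominant_topic, prob = max(topic_dist, key=lambda pair: pair[1])
print(dominant_topic, prob)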

from gensim.test.utils import common_texts
from gensim.corpora.dictionary import Dictionary
from gensim.models import LdaModel

# Create a corpus from a list of texts
common_dictionary = Dictionary(common_texts)
common_corpus = [common_dictionary.doc2bow(text) for text in common_texts]

# Train the model on the corpus.
lda = LdaModel(common_corpus, num_topics=10)
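
A short follow-up for inspecting what the model learned; neither the id2word argument nor the print_topics call is in the original snippet, but passing the dictionary makes the printed topics show words instead of bare term ids:

# retrain with id2word so topics are printed with readable words
lda = LdaModel(common_corpus, id2word=common_dictionary, num_topics=10)

# show the top words for each learned topic
for topic_id, topic in lda.print_topics(num_topics=10, num_words=5):
    print(topic_id, topic)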