Gist by @bryanyang0528, created July 16, 2014 11:59
N-gram v2.0 part3
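def longTermPriority(path, maxTermLength, minFreq):
    # cutSentence and ngram are assumed to come from the earlier parts
    # of this series; they are not defined in this snippet.
    longTerms = []      # long terms found so far
    longTermsFreq = []  # (long term, frequency) pairs
    for i in range(maxTermLength, 1, -1):  # term length, from longest down to 2
        # Re-cut the file, splitting it on every long term found so far
        text_list = cutSentence(path, longTerms)
        # Count i-grams and keep those occurring at least minFreq times
        words_freq = ngram(text_list, i, minFreq)
        for word_freq in words_freq:
            # Add the term to longTerms so the next pass cuts the file on it
            longTerms.append(word_freq[0])
            # Also store the term with its count in a separate list; keeping
            # two lists avoids extra loops to pull the terms out of the pairs
            longTermsFreq.append(word_freq)
    return longTermsFreq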
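Because cutSentence and ngram are not shown here, below is a minimal self-contained sketch of the same longest-first idea. It assumes the input is a plain-text string rather than a file path, that segments are split on whitespace and common ASCII/CJK punctuation, and that ngram counts character n-grams. The helpers cut_sentence and count_ngrams are hypothetical stand-ins, not the author's implementations.

```python
import re
from collections import Counter


def cut_sentence(text, long_terms):
    """Hypothetical stand-in for cutSentence: split text on whitespace and
    common punctuation (assumed delimiters), then on every long term found."""
    segments = re.split(r"[\s,.!?;:，。！？；：、]+", text)
    # Cut on longer terms first so a long term is not broken by a shorter one.
    for term in sorted(long_terms, key=len, reverse=True):
        segments = [piece for seg in segments for piece in seg.split(term)]
    return [seg for seg in segments if seg]  # drop empty leftovers


def count_ngrams(segments, n, min_freq):
    """Hypothetical stand-in for ngram: count character n-grams of length n
    inside each segment and keep (term, count) pairs with count >= min_freq."""
    counts = Counter(
        seg[i:i + n] for seg in segments for i in range(len(seg) - n + 1)
    )
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]


def long_term_priority(text, max_term_length, min_freq):
    """Longest-first extraction: each pass re-cuts the text on the terms
    found so far, so shorter terms inside them are not counted again."""
    long_terms = []       # terms used to cut the text on the next pass
    long_terms_freq = []  # (term, frequency) pairs returned to the caller
    for n in range(max_term_length, 1, -1):  # term length, longest first
        segments = cut_sentence(text, long_terms)
        for term, freq in count_ngrams(segments, n, min_freq):
            long_terms.append(term)
            long_terms_freq.append((term, freq))
    return long_terms_freq


if __name__ == "__main__":
    print(long_term_priority("機器學習，機器學習，學習，學習", 4, 2))
    # prints [('機器學習', 2), ('學習', 2)]
```

Here "機器學習" is found on the 4-character pass; once the text is cut on it, "學習" is counted only in its 2 standalone occurrences instead of all 4, which is what giving long terms priority is meant to achieve.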