Created
August 9, 2019 13:36
Use the Counter module to find the five most common words and how often each occurs
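# Right-click the Windows Start button and choose Run.
# Open cmd (Command Prompt), start python, then enter: import this
# Save the text it prints to xxx.txt
import re  # needed for re.findall below
from collections import Counter  # keep imports at the top of the file to avoid errors

# Read the whole file in one call and convert every letter to lowercase.
with open('xxx.txt', 'rt') as fin:
    s = fin.read().lower()

# Find runs of word characters (letters, digits, underscore) and apostrophes.
words = re.findall(r"[\w']+", s)
c = Counter(words)  # build a Counter that tallies how often each word occurs
print(c.most_common(5))  # the five most common words with their counts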
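
The same counting steps can also be tried without creating xxx.txt first. The following is a minimal sketch, not part of the original script: the helper name `top_words`, its parameter `n`, and the sample sentence are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def top_words(text, n=5):
    """Return the n most common words in text as (word, count) pairs.

    Hypothetical helper: it applies the same lowercase + findall + Counter
    steps as the script, but to a string instead of a file.
    """
    # Lowercase first so that "The" and "the" are counted as the same word.
    words = re.findall(r"[\w']+", text.lower())
    return Counter(words).most_common(n)


# Illustrative sample text (an assumption, not taken from the original file).
sample = "The cat and the hat and the bat"
print(top_words(sample, 2))  # [('the', 3), ('and', 2)]
```

Wrapping the steps in a function makes it easy to check the regular expression on short strings before running it on the full text.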