@bryanyang0528
Last active August 29, 2015 14:02
TEXT MINING
Revised <- read.csv(file="D:/Data/Revised/revised data2013.csv", na.strings = "NA", header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, encoding="ANSI", stringsAsFactors = FALSE)
## Load the data file
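## Encoding note (an assumption added here, not part of the original script): read.csv's
## `encoding` argument only marks strings as "latin1", "UTF-8" or "bytes" and never
## re-encodes the file, so encoding="ANSI" simply leaves the text in the session's native
## encoding. If the file's code page differs from the R locale, `fileEncoding` re-encodes
## it on read; "CP950" (Traditional Chinese Windows) below is a hypothetical guess.
#Revised <- read.csv(file="D:/Data/Revised/revised data2013.csv", na.strings = "NA",
#                    fileEncoding = "CP950", stringsAsFactors = FALSE)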
library(tm)
library(tmcn)
library(rJava)
library(Rwordseg)
## Packages required for text mining
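# Count how often each token occurs; the text is not split down to individual characters ("字")
Detail <- Revised[[16]]
R_corpus <- Corpus(VectorSource(Detail))
## A corpus is tm's special object format: the text to analyse (column 16, one document per row) has to be read into it.
## VectorSource replaces the original DataframeSource(Detail, encoding="ANSI"), because newer tm versions dropped that encoding argument and expect doc_id/text columns in DataframeSource.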
#R_corpus <- tm_map(R_corpus, segmentCN)
#R_Corpus1 <- Corpus(VectorSource(R_corpus))
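## The two commented-out lines above would use Rwordseg's segmentCN to split the Chinese text into words, but this analysis only summarizes the contents of the text file, so word segmentation is not needed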
tdm <- TermDocumentMatrix(R_corpus)
## At this point we have a term-document matrix of the sentences broken into terms
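## Optional variant (an assumption about the data, not part of the original analysis):
## by default tm drops terms shorter than 3 characters (control option wordLengths,
## default c(3, Inf)). Chinese terms are often only 1-2 characters long, so a sketch
## that keeps them widens the range:
#tdm <- TermDocumentMatrix(R_corpus, control = list(wordLengths = c(1, Inf)))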
#inspect(tdm[1:10,1:2])
library(wordcloud)
# Load the word cloud package
m1 <- as.matrix(tdm)
v <- sort(rowSums(m1),decreasing = TRUE)
# Compute each term's total frequency, sorted in decreasing order
d <- data.frame(word = names(v), freq = v)
# Convert to a data frame for the wordcloud package
wordcloud(d$word, d$freq, min.freq = 10, random.order = FALSE, ordered.colors = FALSE,
          colors = rainbow(nrow(m1)))
write.csv(d, "d.csv", row.names = FALSE)
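## Optional variant (an assumption, not part of the original): write the frequency table
## with an explicit encoding so the Chinese terms are stored in a known encoding
## regardless of the machine's locale.
#write.csv(d, "d.csv", row.names = FALSE, fileEncoding = "UTF-8")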