from pyspark import SparkContext
sc = SparkContext("local", "test_app")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
...
## Set the Java path
JAVA_HOME=/.../jdk1.8.0_05/HOME
CLASSPATH=.:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
## Set the Spark path
export SCALA_HOME=/.../scala
export SPARK_HOME=/.../spark
export PATH=$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
> tdm = as.matrix(TermDocumentMatrix(d.corpus))
> tdm
               Docs
Terms           1 2
  昨天天氣很好1 1 0
  昨天天氣很好2 0 1
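The shape of the matrix above can be reproduced with a small plain-Python sketch. This is only an illustration, not how the `tm` package is implemented: the helper name `term_document_matrix` is hypothetical, and it assumes tokens are split on whitespace only, which is why each unsegmented sentence ends up counted as a single term.

```python
# Illustrative sketch only: mimics the layout of the R output in plain Python.
docs = ["今天天氣很好1", "今天天氣很好2"]

# Same substitution as the gsub("今天", "昨天", x) call in the R session.
docs = [d.replace("今天", "昨天") for d in docs]


def term_document_matrix(documents):
    """Return (terms, matrix): one row per term, one column per document.

    Assumption: tokens are split on whitespace only, so a Chinese sentence
    without spaces is treated as one term.
    """
    tokenized = [d.split() for d in documents]
    terms = sorted({t for tokens in tokenized for t in tokens})
    matrix = [[tokens.count(term) for tokens in tokenized] for term in terms]
    return terms, matrix


terms, matrix = term_document_matrix(docs)
for term, row in zip(terms, matrix):
    print(term, row)
```

Getting per-word counts would require segmenting the Chinese text into words before counting, which this sketch does not attempt.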
> d.corpus <- Corpus(VectorSource(text))
> d.corpus <- tm_map(d.corpus, content_transformer(function(x) gsub("今天","昨天",x)))
> inspect(d.corpus)
<<VCorpus (documents: 2, metadata (corpus/indexed): 0/0)>>
[[1]]
<<PlainTextDocument (metadata: 7)>>
昨天天氣很好1
[[2]]
> str(d.corpus)
List of 2
Error in UseMethod("meta", x) :
  no applicable method for 'meta' applied to an object of class "character"
tdm = as.matrix(TermDocumentMatrix(d.corpus))
# Error in UseMethod("meta", x) :
#   no applicable method for 'meta' applied to an object of class "character"
# In addition: Warning message:
# In mclapply(unname(content(x)), termFreq, control) :
#   all scheduled cores encountered errors in user code
d.corpus <- tm_map(d.corpus, function(x) gsub("今天","昨天",x))
inspect(d.corpus)
<<VCorpus (documents: 2, metadata (corpus/indexed): 0/0)>>
[[1]]
[1] 昨天天氣很好1
[[2]]
[1] 昨天天氣很好2
> str(d.corpus)
List of 2
 $ 1:List of 2
  ..$ content: chr "今天天氣很好1"
  ..$ meta   :List of 7
  .. ..$ author       : chr(0)
  .. ..$ datetimestamp: POSIXlt[1:1], format: "2014-09-19 16:12:28"
  .. ..$ description  : chr(0)
  .. ..$ heading      : chr(0)
  .. ..$ id           : chr "1"
text <- c("今天天氣很好1","今天天氣很好2")
d.corpus <- Corpus(VectorSource(text))
inspect(d.corpus)
# <<VCorpus (documents: 2, metadata (corpus/indexed): 0/0)>>
# [[1]]
# <<PlainTextDocument (metadata: 7)>>
# 今天天氣很好1