Bryan Yang bryanyang0528

from pyspark import SparkContext
sc = SparkContext("local","test_app")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
...
## Set the Java path
JAVA_HOME=/.../jdk1.8.0_05/HOME
CLASSPATH=.:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
## Set the Spark path
export SCALA_HOME=/.../scala
export SPARK_HOME=/.../spark
export PATH=$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
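A minimal sketch for checking that the exports above took effect, assuming a POSIX shell. The helper `path_contains` is a hypothetical name introduced here for illustration and is not part of the original gist; it only tests whether a directory appears as an entry in a colon-separated path string.

```shell
#!/bin/sh
# Hypothetical helper (not from the original gist): succeeds when directory $1
# is one of the colon-separated entries of $2 (defaults to the current $PATH).
path_contains() {
  case ":${2-$PATH}:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Report on each variable the configuration above is expected to export.
for name in JAVA_HOME SCALA_HOME SPARK_HOME; do
  eval "value=\${$name-}"
  if [ -z "$value" ]; then
    echo "$name is not set"
  elif path_contains "$value/bin"; then
    echo "\$$name/bin is on PATH"
  else
    echo "\$$name/bin is missing from PATH"
  fi
done
```

Running this in a new terminal after editing the profile file shows whether the settings were actually picked up by the shell.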
bryanyang0528 / gist:43f8a1cc5bb91bc7badb
Created September 19, 2014 16:37
corpus correct-2
> tdm = as.matrix(TermDocumentMatrix(d.corpus))
> tdm
               Docs
Terms           1 2
  昨天天氣很好1 1 0
  昨天天氣很好2 0 1
> d.corpus <- Corpus(VectorSource(text))
> d.corpus <- tm_map(d.corpus, content_transformer(function(x) gsub("今天","昨天",x)))
> inspect(d.corpus)
<<VCorpus (documents: 2, metadata (corpus/indexed): 0/0)>>
[[1]]
<<PlainTextDocument (metadata: 7)>>
昨天天氣很好1
[[2]]
> str(d.corpus)
List of 2
Error in UseMethod("meta", x) :
no applicable method for 'meta' applied to an object of class "character"
tdm = as.matrix(TermDocumentMatrix(d.corpus))
'''
Error in UseMethod("meta", x) :
no applicable method for 'meta' applied to an object of class "character"
In addition: Warning message:
In mclapply(unname(content(x)), termFreq, control) :
all scheduled cores encountered errors in user code
'''
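# Without content_transformer(), gsub() returns plain character vectors,
# so the documents lose their PlainTextDocument class.
d.corpus <- tm_map(d.corpus, function(x) gsub("今天","昨天",x))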
inspect(d.corpus)
<<VCorpus (documents: 2, metadata (corpus/indexed): 0/0)>>
[[1]]
[1] 昨天天氣很好1
[[2]]
[1] 昨天天氣很好2
bryanyang0528 / gist:4669314eeb4c4739a8f6
Created September 19, 2014 16:17
corpus structure
> str(d.corpus)
List of 2
 $ 1:List of 2
  ..$ content: chr "今天天氣很好1"
  ..$ meta   :List of 7
  .. ..$ author       : chr(0)
  .. ..$ datetimestamp: POSIXlt[1:1], format: "2014-09-19 16:12:28"
  .. ..$ description  : chr(0)
  .. ..$ heading      : chr(0)
  .. ..$ id           : chr "1"
text <- c("今天天氣很好1","今天天氣很好2")
d.corpus <- Corpus(VectorSource(text))
inspect(d.corpus)
'''
<<VCorpus (documents: 2, metadata (corpus/indexed): 0/0)>>
[[1]]
<<PlainTextDocument (metadata: 7)>>
今天天氣很好1