@bryanyang0528
Created September 19, 2014 16:36
corpus correct
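> library(tm)
> # Assumed input, inferred from the output below; the original definition of `text` is not shown
> text <- c("今天天氣很好1", "今天天氣很好2")
> d.corpus <- Corpus(VectorSource(text))
> # Replace "今天" (today) with "昨天" (yesterday) in every document
> d.corpus <- tm_map(d.corpus, content_transformer(function(x) gsub("今天","昨天",x)))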
> inspect(d.corpus)
<<VCorpus (documents: 2, metadata (corpus/indexed): 0/0)>>
[[1]]
<<PlainTextDocument (metadata: 7)>>
昨天天氣很好1
[[2]]
<<PlainTextDocument (metadata: 7)>>
昨天天氣很好2
> str(d.corpus)
List of 2
 $ 1:List of 2
  ..$ content: chr "昨天天氣很好1"
  ..$ meta   :List of 7
  .. ..$ author       : chr(0)
  .. ..$ datetimestamp: POSIXlt[1:1], format: "2014-09-19 16:35:11"
  .. ..$ description  : chr(0)
  .. ..$ heading      : chr(0)
  .. ..$ id           : chr "1"
  .. ..$ language     : chr "en"
  .. ..$ origin       : chr(0)
  .. ..- attr(*, "class")= chr "TextDocumentMeta"
  ..- attr(*, "class")= chr [1:2] "PlainTextDocument" "TextDocument"
 $ 2:List of 2
  ..$ content: chr "昨天天氣很好2"
  ..$ meta   :List of 7
  .. ..$ author       : chr(0)
  .. ..$ datetimestamp: POSIXlt[1:1], format: "2014-09-19 16:35:11"
  .. ..$ description  : chr(0)
  .. ..$ heading      : chr(0)
  .. ..$ id           : chr "2"
  .. ..$ language     : chr "en"
  .. ..$ origin       : chr(0)
  .. ..- attr(*, "class")= chr "TextDocumentMeta"
  ..- attr(*, "class")= chr [1:2] "PlainTextDocument" "TextDocument"
 - attr(*, "class")= chr [1:2] "VCorpus" "Corpus"