# Truncate a character vector of words to at most `max` words and rejoin them into one string
maxwords <- function(x, max = 10) {
  lwords <- min(length(x), max)
  # seq_len() gives an empty index when lwords is 0 (1:0 would index c(1, 0))
  paste0(x[seq_len(lwords)], collapse = " ")
}
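# Apply maxwords to every tweet; `words` is assumed to be a list of word vectors,
# e.g. words <- strsplit(tdft$text, " ")
tdft$text <- vapply(words, maxwords, character(1))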
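A minimal, self-contained sketch of the truncation step, assuming each tweet is split into words on single spaces with `strsplit()`. The `tweets` vector and the `max = 3` limit are made up for illustration and are not taken from the original data.

```r
# Truncate a character vector of words to at most `max` words and rejoin them
maxwords <- function(x, max = 10) {
  lwords <- min(length(x), max)
  # seq_len(0) is empty, so an empty tweet yields "" instead of "NA"
  paste0(x[seq_len(lwords)], collapse = " ")
}

# Hypothetical example tweets (illustrative only)
tweets <- c("one two three four five", "short", "")

# Split each tweet into a vector of words, assuming single-space separation
words <- strsplit(tweets, " ")

# vapply() guarantees exactly one character string per tweet
truncated <- vapply(words, maxwords, character(1), max = 3)
print(truncated)
```

Using `vapply()` instead of `sapply()` makes the result type explicit, so a tweet that unexpectedly produced something other than a single string would raise an error rather than silently turn the result into a list.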
# Remove sensitive text
# Inspect which encodings the tweets are currently marked with
summary(factor(Encoding(tdft$text)))
Encoding(tdft$text) <- "UTF-8"
# Drop any bytes that are not valid UTF-8
tdft$text <- iconv(tdft$text, "UTF-8", "UTF-8", sub = '')
tdft$text <- gsub('@\\S+', '@', tdft$text)       # replace @-mentions with a bare '@'
tdft$text <- gsub('http\\S+', 'http', tdft$text) # replace hyperlinks with 'http'
head(tdft$text)
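The two `gsub()` substitutions can be checked on a couple of made-up strings before running them on the full data. This sketch reuses the same patterns as above; the input strings are hypothetical.

```r
# Hypothetical tweets (illustrative only)
texts <- c("Thanks @someuser for the link http://example.com/abc",
           "No mentions or links here")

# Replace each @-mention (an '@' followed by non-space characters) with '@'
texts <- gsub('@\\S+', '@', texts)
# Replace each hyperlink (from 'http' up to the next space) with 'http'
texts <- gsub('http\\S+', 'http', texts)
print(texts)
```

Because `http\\S+` matches any token starting with `http`, secure `https` links collapse to `http` as well, and so would an ordinary word that happens to begin with those four letters.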
# Take a random selection of the data
set.seed(2014)
tdft <- tdft[sample(nrow(tdft), size = 1000), ]
# Select only the variables of interest
# install.packages("dplyr")  # run once if dplyr is not yet installed
library(dplyr) # library for data manipulation
tdft <- select(tdft, lat, lon, created, text,
               language, n_followers, n_tweets, user_location)
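A base-R sketch of the same two steps on a small made-up data frame, in case dplyr is not available. The `toy` data and the helper `take_sample()` are hypothetical; `min()` is an added guard, because `sample()` without replacement fails when asked for more rows than exist.

```r
# Hypothetical stand-in for `tdft`, reusing a few of its column names
toy <- data.frame(
  lat   = c(51.5, 53.4, 52.5, 55.9, 50.8),
  lon   = c(-0.1, -2.2, -1.9, -3.2, -1.1),
  text  = c("a", "b", "c", "d", "e"),
  extra = 1:5,
  stringsAsFactors = FALSE
)

# Hypothetical helper: draw a reproducible random subset of rows,
# capping the sample size at the number of rows available
take_sample <- function(df, size, seed = 2014) {
  set.seed(seed)
  df[sample(nrow(df), size = min(size, nrow(df))), , drop = FALSE]
}

s1 <- take_sample(toy, 3)

# Base-R equivalent of dplyr::select(): keep only the named columns
s1 <- s1[, c("lat", "lon", "text")]
print(s1)
```

Setting the seed inside the helper means the same call always returns the same rows, which is what makes the subset reproducible between runs.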
# Loop over each item in the list z (z is assumed to be defined earlier)
for (item in z) {
  # process item here
}
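In R the same per-item loop is often written with `vapply()`, which returns one result per list element. A minimal sketch with a made-up list `z` and `nchar()` standing in as a hypothetical per-item operation:

```r
# Hypothetical list to iterate over
z <- list("alpha", "be", "gamma")

# Explicit loop: preallocate the result, then fill it element by element
lengths_loop <- integer(length(z))
for (i in seq_along(z)) {
  lengths_loop[i] <- nchar(z[[i]])
}

# Functional equivalent: vapply() applies nchar() to every element of z
lengths_apply <- vapply(z, nchar, integer(1))

print(lengths_loop)
print(lengths_apply)
```

Preallocating `lengths_loop` avoids growing a vector inside the loop, and `seq_along(z)` handles an empty list safely, which `1:length(z)` would not.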