Robin Lovelace Robinlovelace

Robinlovelace / maxwords.R
Created October 27, 2014 15:26
Reduce max. number of words in a text string, keeping only first
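# Function to reduce the max. number of words in a string
# x is a character vector of words (one element per word)
maxwords <- function(x, max = 10){
  # head() keeps at most `max` words and returns "" for empty input
  paste(head(x, max), collapse = " ")
}
# Apply maxwords to the data
# Assumes `words` is a list of word vectors, one per tweet (not defined in this snippet)
tdft$text <- sapply(words, maxwords)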
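The call above depends on an object named `words` that this snippet never creates. A minimal sketch of one way it could be built, assuming it is a list of word vectors obtained by splitting each string on whitespace; the `tweets` vector and the local copy of `maxwords` are illustrative stand-ins, not the original data:

```r
# Illustrative re-definition so that this sketch runs on its own
maxwords <- function(x, max = 10) {
  paste(head(x, max), collapse = " ")
}

# Hypothetical toy input standing in for tdft$text
tweets <- c("one two three four five", "short tweet")

# Assumption: words are separated by one or more spaces
words <- strsplit(tweets, " +")

# vapply() guarantees one character string per tweet
shortened <- vapply(words, maxwords, character(1), max = 3)
print(shortened)
```

Strings that are already shorter than the limit pass through unchanged, so the function can be applied to the whole column without checking lengths first.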
Robinlovelace / remove-words.R
Created October 27, 2014 15:21
Removal of sensitive words
# Remove sensitive text
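summary(factor(Encoding(tdft$text))) # count the declared encodings
Encoding(tdft$text) <- "UTF-8" # declare all strings as UTF-8
tdft$text <- iconv(tdft$text, "UTF-8", "UTF-8", sub = '') # drop bytes that are not valid UTF-8
tdft$text <- gsub('@\\S+', '@', tdft$text) # replace each @mention with a bare '@'
tdft$text <- gsub('http\\S+', 'http', tdft$text) # replace each hyperlink with a bare 'http'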
head(tdft$text)
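To see what the two `gsub()` calls do, here is a small sketch on made-up strings; `sample_text` and its contents are hypothetical and are not taken from the Tour de France data:

```r
# Hypothetical example strings
sample_text <- c("@alice great stage today http://example.com/abc",
                 "No mentions here")

# Replace everything from '@' up to the next whitespace with a bare '@'
cleaned <- gsub("@\\S+", "@", sample_text)

# Replace everything from 'http' up to the next whitespace with a bare 'http'
cleaned <- gsub("http\\S+", "http", cleaned)

print(cleaned)
```

The markers `@` and `http` stay in the text, so it is still possible to count how many mentions or links a tweet contained after the identifying parts are gone.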
Robinlovelace / shrink-data.R
Last active August 29, 2015 14:08
Shrinking the Tour de France data
# Take a random selection of the data
set.seed(2014)
tdft <- tdft[sample(nrow(tdft), size = 1000), ]
# Select only the variables of interest
# install.packages("dplyr") # run once if dplyr is not installed
library(dplyr) # library for data manipulation
tdft <- select(tdft, lat, lon, created, text,
  language, n_followers, n_tweets, user_location)
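The same two steps, random row sampling followed by column selection, can also be sketched in base R without dplyr. The data frame `toy` and its columns below are invented for illustration, and base indexing is swapped in for `select()`:

```r
set.seed(2014)

# Hypothetical toy data standing in for tdft
toy <- data.frame(
  lat   = runif(20),
  lon   = runif(20),
  text  = paste("tweet", 1:20),
  extra = 1:20,
  stringsAsFactors = FALSE
)

# Keep a random sample of 5 rows (sampling without replacement)
toy_small <- toy[sample(nrow(toy), size = 5), ]

# Keep only the columns of interest, by name
toy_small <- toy_small[, c("lat", "lon", "text")]

str(toy_small)
```

Note that `sample()` raises an error if `size` exceeds the number of rows, so the sample size must not be larger than the data.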