@cmdcolin
Last active January 31, 2016 21:11
Tumblr community analysis
#!/usr/bin/env Rscript
# library() stops with an error if a package is missing; require() only warns
library(jsonlite)
library(reshape2)
api_key='your_api_key'
# fetch (blog_name, reblogged_root_name) pairs for a blog's `total` most recent posts, 20 per request
get_reblogs<-function(blogname,apikey,total=100) {
res=lapply(seq(0,total-20,by=20), function(offset) {
apiresults <- fromJSON(sprintf('https://api.tumblr.com/v2/blog/%s.tumblr.com/posts?api_key=%s&reblog_info=true&offset=%d&limit=%d',
blogname,apikey,offset,min(20,total-offset)))
if('reblogged_root_name' %in% colnames(apiresults$response$posts)) {
apiresults$response$posts[,c('blog_name','reblogged_root_name')]
} else {
data.frame(blog_name=NA, reblogged_root_name=NA)
}
})
do.call(rbind, res)
}
reblogs = get_reblogs('glitchgifs',api_key)
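# crawl one level out: fetch posts from every root blog glitchgifs reblogged from (skipping NA roots)
roots = unique(reblogs$reblogged_root_name)
for(blog in roots[!is.na(roots)]) {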
reblogs = rbind(reblogs, get_reblogs(blog, api_key))
}
# keep only reblogs whose root blog is itself part of the crawled glitchgifs network
dataframe=reblogs[reblogs$reblogged_root_name %in% unique(reblogs$blog_name),]
# filter self reblogs (which() also drops rows where either name is missing)
dataframe=dataframe[which(dataframe$blog_name != dataframe$reblogged_root_name),]
# write raw edge list (duplicate edges are kept, one row per reblog)
write.csv(dataframe, file="reblogs.csv",quote=F)
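# cast to matrix of reblog counts (rows: root blogs, columns: reblogging blogs)
m=acast(dataframe,reblogged_root_name~blog_name,fun.aggregate=length,value.var='blog_name')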
# write adjacency matrix to file
write.csv(m, file="reblog_matrix.csv", quote=F)
# write adjacency list using remelted matrix to get edge weights
write.csv(melt(m), file="weights.csv", quote=F)
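
The last three steps (drop self reblogs, count edges into an adjacency matrix, melt back to a weighted edge list) can be tried without an API key. The following is a minimal sketch on a made-up edge list; it swaps in base R's `table()` and `as.data.frame()` for `acast()` and `melt()` so that it runs without reshape2, and the names `edges` and `weights`, along with the toy blog names, are illustrative assumptions rather than part of the script.

```r
# Hypothetical toy edge list: one row per reblog.
# blog_name reblogged a post that originated at reblogged_root_name.
edges <- data.frame(
  blog_name           = c("a", "a", "b", "c", "c"),
  reblogged_root_name = c("b", "b", "c", "a", "c"),
  stringsAsFactors = FALSE
)

# Drop self reblogs (the root blog is the reblogging blog itself).
edges <- edges[edges$blog_name != edges$reblogged_root_name, ]

# Count reblogs per (root, reblogger) pair.
# Rows are root blogs, columns are reblogging blogs.
m <- table(root = edges$reblogged_root_name, reblogger = edges$blog_name)

# Turn the matrix back into an edge list with a weight column,
# keeping only pairs that actually occur.
weights <- as.data.frame(m, responseName = "weight", stringsAsFactors = FALSE)
weights <- weights[weights$weight > 0, ]
print(weights)
```

Unlike `melt(m)` in the script, this sketch filters out zero-weight pairs, which keeps the edge list free of non-edges if it is later imported into a graph tool.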
@cmdcolin

Example of the "glitchgifs" community after plotting in Cytoscape

[Image: output101]

@cmdcolin

Example of the "glitchtheory" community after plotting in Cytoscape

[Image: output301]
