Nassim Haddad nassimhaddad

@nassimhaddad
nassimhaddad / branch_workflow.md
Last active December 11, 2015 20:39
Git: commands needed to create a new repository in RStudio and then upload it to Bitbucket
nassimhaddad / non-ascii.R
Created January 26, 2013 18:13
remove non-ascii characters
# remove non-ascii characters
df$text <- gsub("[^\x20-\x7E]", "", df$text)
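As a quick check of what this pattern keeps, here is a minimal sketch on a made-up character vector (`x` is a hypothetical example, not part of the original gist). The class `[^\x20-\x7E]` matches everything outside printable ASCII, so it strips tabs and newlines as well as accented characters.

```r
# hypothetical sample data, for illustration only
x <- c("caf\u00e9", "na\u00efve text", "tab\there", "plain")

# same pattern as above: drop every character outside 0x20-0x7E
cleaned <- gsub("[^\x20-\x7E]", "", x)
print(cleaned)
```

If tabs or newlines should survive, the character class would need to be widened accordingly.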
nassimhaddad / read.xls.R
Last active December 11, 2015 18:28
read excel sheets
#' best package to read excel files is gdata
#' which works with both .xls and .xlsx
#' windows: follow instructions here:
#' http://cran.r-project.org/web/packages/gdata/INSTALL
library(gdata)
xlsx_file <- "myfile.xls"
sheet1 <- read.xls(xlsx_file,
                   sheet = "Sheet1",
                   stringsAsFactors = FALSE)
nassimhaddad / levenshtein.R
Created January 26, 2013 09:55
String matching, distance between two strings. Works particularly well to detect retweets or tweet variations.
### string matching
### metric to find the similarity between two strings
### some context in:
### http://en.wikipedia.org/wiki/String_metric
### testing levenshtein metric
library(RecordLinkage)
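The preview stops after loading the package, so the following is only a sketch of how a Levenshtein-based similarity could be tested. It deliberately swaps in base R's `adist()` instead of RecordLinkage so that it runs without extra packages. The helper `lev_sim` and the sample tweets are hypothetical names introduced for illustration.

```r
# assumption: base R's adist() stands in for the RecordLinkage functions
lev_sim <- function(a, b) {
  d <- drop(adist(a, b))            # Levenshtein edit distance
  1 - d / max(nchar(a), nchar(b))   # scale to [0, 1]; 1 means identical
}

# hypothetical tweet and retweet
tweet   <- "Big news: R 3.0 is out!"
retweet <- "RT @user: Big news: R 3.0 is out!"

lev_sim("kitten", "sitting")  # distance 3 over length 7
lev_sim(tweet, retweet)       # a retweet prefix costs only a few edits
```

A similarity threshold for flagging retweets would have to be tuned on real data.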
nassimhaddad / import_json.R
Last active December 11, 2015 17:38
import json file
# install.packages("rjson")
library("rjson")
json_file <- "json_file.json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))
# optional: convert the list of records into a single data frame
library(plyr)
json_data <- lapply(json_data, as.data.frame)
json_data <- do.call(rbind.fill, json_data)
nassimhaddad / digest.R
Created January 19, 2013 08:49
digest package: create a hash from any R object
library(digest)
test <- c("hobe", "jmjj", 1)
digest(test, algo = "md5")
digest(test, algo = "sha1")
digest(test, algo = "crc32") # not collision proof
digest(test, algo = "sha256")
digest(test, algo = "sha512")
nassimhaddad / hist_compare.R
Created January 16, 2013 12:06
compare histograms by plotting their density functions in the same chart
plot(density(data1))
lines(density(data2), col = "blue")
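A self-contained sketch of the same idea, using simulated data as a stand-in for `data1` and `data2` (the values below are assumptions for illustration). Because `plot()` sizes the axes from the first curve only, the second density can be clipped; computing both densities first and passing joint `xlim`/`ylim` ranges avoids that.

```r
set.seed(1)
# simulated stand-ins for the two samples
data1 <- rnorm(200)
data2 <- rnorm(200, mean = 1, sd = 0.5)

d1 <- density(data1)
d2 <- density(data2)

# size the axes so that both curves fit
plot(d1,
     xlim = range(d1$x, d2$x),
     ylim = range(d1$y, d2$y),
     main = "Density comparison")
lines(d2, col = "blue")
legend("topright", legend = c("data1", "data2"),
       col = c("black", "blue"), lty = 1)
```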
nassimhaddad / get_word_count.R
Created January 16, 2013 08:26
function that counts the number of words (= delimited by " ") in a string.
get_word_count <- function(string) {
  length(unlist(strsplit(as.character(string), " ")))
}
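Splitting on a single `" "` counts an empty token for every extra space, so `"a  b"` is reported as three words. If that is not wanted, one possible variant (the name `get_word_count_ws` is hypothetical) trims the string and splits on runs of whitespace:

```r
# variant: treats any run of whitespace as one delimiter
get_word_count_ws <- function(string) {
  words <- unlist(strsplit(trimws(as.character(string)), "[[:space:]]+"))
  sum(nzchar(words))  # ignore any empty tokens that remain
}

get_word_count_ws("  hello   big world ")  # 3
get_word_count_ws("")                      # 0
```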
nassimhaddad / read_from_clipboard.R
Last active December 11, 2015 04:29
read data from clipboard, works with excel
# windows
x <- read.delim(file("clipboard", "r"),
                header = TRUE,
                stringsAsFactors = FALSE)
# mac
data <- read.table(pipe("pbpaste"), sep = "\t", header = TRUE)
# read from and write to clipboard with Kmisc (windows + OS X):
library(Kmisc)
df <- data.frame(f = 1:4, g = letters[1:4])
df$g <- factor(df$g, levels = letters[4:1])