@chainsawriot
chainsawriot / freqtable
Created February 21, 2013 13:36
Frequency table of Chinese tokens extracted from evchk, zhyue wikipedia and press releases of HKGOV. Released under CC-BY 3.0 unported.
tokens freq
的 366981
一 232555
年 229119
人 190091
有 189779
會 160113
日 144340
月 135667
港 132770
chainsawriot / Example of do.call rbind
Last active January 1, 2016 14:19
Combining list of data.frames into a single data.frame.
# combining list of data frames
irisBroken <- list(iris[1:10,], iris[11:30,])
do.call(rbind, irisBroken)
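A quick way to check the result, as a minimal sketch: recombine the two pieces and compare them with the rows they came from. The name `combined` is introduced here for illustration only.

```r
# Split iris into two pieces, recombine them, and compare with the source rows.
irisBroken <- list(iris[1:10, ], iris[11:30, ])

# do.call() passes each list element as a separate argument,
# so this is equivalent to rbind(irisBroken[[1]], irisBroken[[2]]).
combined <- do.call(rbind, irisBroken)

nrow(combined)                                              # rows after combining
all(combined$Sepal.Length == iris$Sepal.Length[1:30])       # values kept in order
all(combined$Species == iris$Species[1:30])                 # factor column kept too
```

The same call works for a list of any length, which is why it is preferred over chaining `rbind()` by hand.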
chainsawriot / arraycolrenames
Created January 2, 2014 17:16
Array rename
testarray <- array(1:6, c(2,3))
dimnames(testarray) <- list(c("a", "b"), c("d", "e","f"))
testarray
dimnames(testarray) <- list(c("H", "K"), c("R", "U","G"))
chainsawriot / gist:8330931
Last active January 2, 2016 16:29
Reading data with multiple comment characters. The comment characters should not be special regex characters.
### assuming test.tab has this structure
# hello
# ! This is comment
# # This is also a comment
# 1
# 2
# 3
read.table(text=sub(paste0("[!#]", ".*"), "", readLines("test.tab")), header=TRUE)
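To try this without creating test.tab by hand, here is a self-contained sketch that writes the same six lines to a temporary file first. The names `tab_path`, `cleaned` and `dat` are illustrative and not part of the original gist.

```r
# Write a small file with the structure assumed above (a temporary path stands in for test.tab).
tab_path <- tempfile(fileext = ".tab")
writeLines(c("hello", "! This is comment", "# This is also a comment", "1", "2", "3"),
           tab_path)

# Delete everything from the first "!" or "#" to the end of each line.
# Comment-only lines become empty strings, and read.table() skips blank lines by default.
cleaned <- sub("[!#].*", "", readLines(tab_path))
dat <- read.table(text = cleaned, header = TRUE)

dat$hello          # the three data values under the "hello" header
unlink(tab_path)   # remove the temporary file
```

Stripping the comments before parsing is needed because the `comment.char` argument of `read.table()` accepts only a single character.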
iris[c(3,2,1),] # indexing the data frame with an "indices vector" picks rows 3, 2 and 1, i.e. the first three rows in reverse order
order(iris$Sepal.Length) # generates an "indices vector": the row positions ranked by Sepal.Length, smallest first
# therefore
iris[order(iris$Sepal.Length, decreasing=TRUE),]
# returns the rows sorted by Sepal.Length, largest first. You still need to specify the columns you want after the comma (leave it empty to keep them all).
# try these also
chainsawriot / analysis.R
Last active August 29, 2015 14:22
rainstorm
library(dplyr)
library(magrittr)
# 95% CI of the mean, assuming normally distributed data
read.table("rainstorm.fwf") %>% mutate(hr = V6 + (V7/60)) %>% group_by(V1) %>%
  summarise(meanhr = mean(hr), sehr = sqrt(var(hr)/length(hr)), # standard error: var() already divides by n - 1, so divide by n here
            lowerCI = meanhr - (1.96*sehr), upperCI = meanhr + (1.96*sehr))
# Median with the 2.5th and 97.5th percentiles. Where mean > median, the data are positively skewed and the mean overestimates the central tendency, so use the median for robustness.
read.table("rainstorm.fwf") %>% mutate(hr = V6 + (V7/60)) %>% group_by(V1) %>%
  summarise(medianhr = median(hr), lowerqhr = quantile(hr, probs = 0.025), upperqhr = quantile(hr, probs = 0.975))
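For reference, the two summaries can be computed on a single vector in base R. This is a sketch on simulated data, since rainstorm.fwf is not reproduced here. The vector `hr` and its exponential distribution are assumptions made purely for illustration.

```r
set.seed(1)
hr <- rexp(50, rate = 1 / 3)   # simulated, positively skewed durations (made-up data)

n  <- length(hr)
se <- sd(hr) / sqrt(n)                        # standard error of the mean
normal_ci <- mean(hr) + c(-1.96, 1.96) * se   # 95% interval under a normal assumption

# Robust alternative: the median with the 2.5th and 97.5th percentiles of the data
pct <- quantile(hr, probs = c(0.025, 0.5, 0.975))

normal_ci
pct
```

Note that the percentile range describes the spread of the observations themselves, whereas the normal interval describes uncertainty about the mean, so the two are not directly comparable in width.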
# github.com/chainsawriot
import os, sys
from selenium import webdriver
import selenium.webdriver.support.ui as ui
from selenium.webdriver.common.keys import Keys
# your facebook username and password
USERNAME = ""
chainsawriot / rdebugger.R
Last active January 13, 2016 08:40
how to use the R debugger
## prereq:
## 1) How to define a function
## Severity level
## Warning: Does not stop execution
## Error: Stops execution
## example of a warning
log(-1)
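The difference between the two levels can be seen by catching each condition with `tryCatch()`. This is a small sketch. The variable names `w`, `res` and `e` are illustrative, and `log("a")` is chosen here only as a convenient call that fails.

```r
# A warning is signalled, but the call still completes and returns a value.
w <- tryCatch(log(-1), warning = function(cond) conditionMessage(cond))
w                                   # text of the warning

res <- suppressWarnings(log(-1))
is.nan(res)                         # log(-1) still evaluates, to NaN

# An error aborts the call; no value is returned unless the error is caught.
e <- tryCatch(log("a"), error = function(cond) conditionMessage(cond))
e                                   # text of the error
```

Setting `options(warn = 2)` turns warnings into errors, which can help when a warning needs to be traced with the debugger.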
gendf <- function() {
  a <- data.frame(x = rnorm(200, 1, 1), y = rnorm(200, 3, 1))
  b <- data.frame(x = rnorm(200, 9, 1), y = rnorm(200, 10, 1))
  z <- c(rep(1, 200), rep(2, 200))
  cbind(rbind(a, b), z)[sample(1:400),]
}
df1 <- gendf()
df2 <- gendf()
var click_score_pos = function(rev_id, pos) {
    var reviews = document.querySelector("div[role='presentation'].x-grid3-body").querySelectorAll('tr');
    var tds = reviews[rev_id].querySelectorAll('td'); // declared with var so it does not leak as a global
    for (var key of tds.keys()) {
        // className is '' when the attribute is absent, whereas getAttribute('class') returns null
        if (tds[key].className.includes('wrong')) {
            var exId = key;
        }
    }
    tds[exId + pos].click();
    // sleep(2000)