//http://code2flow.com
A question (Q) is raised;
Name of the function;
Input type and output type;
Test case;
Constraints / time / space requirements;
if (Q in [Leetcode, CareerCup] or similar)
{
Recall the answer;
dapangmao / blog.md
Last active January 14, 2016 20:07
A short summary of getting over the GFW while back in China

####When gods fight, mortals suffer

After being back in China for a while, I found the blocking of websites severe. The staples, Google, Facebook, Wikipedia and so on, are nowhere to be found, which is quite depressing. Even the technical site GitHub is too slow to log into. This time the GFW seems to have drawn inspiration from its usual DNS pollution: it hijacked the Baidu Analytics JavaScript and thereby hijacked the traffic of overseas visitors to Chinese websites. A browser visiting any site that embeds Baidu Analytics would request two GitHub Pages (1 and 2) every two seconds. GitHub is impressive, though: under a distributed denial-of-service attack of this scale it merely slowed down and never went down.

Then Google and Mozilla revoked CNNIC's SSL certificates. That is also troublesome. Say you are overseas and go to 12306 to buy a train ticket: in Chrome or Firefox you will see the certificate warning.

####A concrete circumvention plan

Only then did I realize that a way over the wall is a necessity. Conveniently, I had DigitalOcean's cheapest VPS at hand.
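The preview cuts off before the actual steps. As a minimal sketch of what a single cheap VPS already enables (my assumption, not necessarily the setup the post goes on to describe), SSH alone can serve as a SOCKS5 proxy:

```bash
# A minimal sketch, not the post's actual recipe: tunnel traffic through
# the DigitalOcean VPS over SSH. your-vps-ip is a placeholder.
# -D 1080: open a dynamic SOCKS5 proxy on local port 1080
# -C: compress; -N: run no remote command; -f: background after auth
ssh -D 1080 -C -N -f root@your-vps-ip
# Then point the browser's SOCKS5 proxy at localhost:1080.
```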

dapangmao / solution.md
Last active August 29, 2015 14:17
maintain a median

###On a single machine

import heapq

class find_median(object):
    def __init__(self):
        self.first_half = []   # max heap of the smaller half (values stored negated for heapq)
        self.second_half = []  # min heap of the larger half; may hold one extra element
        self.N = 0             # count of numbers seen so far
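The preview truncates after `__init__`. A minimal sketch of how the two-heap scheme usually continues, completing the class above (my completion; it relies on the negated-value trick that makes heapq behave as a max heap):

```python
    def add(self, x):
        # Route x through the max heap, then rebalance so second_half
        # holds either the same number of elements or one more.
        heapq.heappush(self.first_half, -x)
        heapq.heappush(self.second_half, -heapq.heappop(self.first_half))
        if len(self.second_half) > len(self.first_half) + 1:
            heapq.heappush(self.first_half, -heapq.heappop(self.second_half))
        self.N += 1

    def median(self):
        if self.N % 2:  # odd count: the extra element sits atop second_half
            return self.second_half[0]
        return (self.second_half[0] - self.first_half[0]) / 2.0
```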
dapangmao / gist:84618a65ac5f921db76a
Created March 18, 2015 15:00
Two ways to transform RDD to DataFrame in Spark
1. Let Spark infer the schema from the RDD:

sqlCtx.inferSchema(rdd1)

2. Attach the schema explicitly through Row objects:
from pyspark.sql import Row
import os
current_path = os.getcwd()
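The preview stops here. A sketch of how the Row-based variant typically finishes (the file name and column names below are my assumptions for illustration):

```python
from pyspark.sql import Row

rdd1 = sc.textFile('people.txt')  # hypothetical file: "alice 30" per line
rows = rdd1.map(lambda line: line.split()) \
           .map(lambda p: Row(name=p[0], age=int(p[1])))
# inferSchema reads the schema off the Row objects; in Spark >= 1.3 the
# equivalent call is sqlCtx.createDataFrame(rows).
df = sqlCtx.inferSchema(rows)
df.registerTempTable('people')
```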
dapangmao / blog.md
Last active April 5, 2016 15:57
Spark example

###Transform RDD to DataFrame in Spark

from pyspark.sql import Row
import os

rdd = sc.textFile('C:/Users/chao.huang.ctr/spark-playground//class.txt')
def transform(x):
    args = x.split()
    funcs = [str, str, int, float, float]
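The preview truncates inside `transform`. A sketch of the likely continuation (my completion; the five column names are a guess based on the classic SAS `class` data set, which matches the `str, str, int, float, float` converters):

```python
from pyspark.sql import Row

def transform(x):
    args = x.split()
    funcs = [str, str, int, float, float]
    # Apply each converter to its field.
    vals = [f(a) for f, a in zip(funcs, args)]
    # Column names are assumptions: name, sex, age, height, weight.
    return Row(name=vals[0], sex=vals[1], age=vals[2],
               height=vals[3], weight=vals[4])

df = sqlCtx.createDataFrame(rdd.map(transform))  # Spark 1.3 API
df.printSchema()
```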
dapangmao / blog.md
Last active August 29, 2015 14:17
Deploy a minimal Spark cluster

###Why a minimal cluster

  1. Testing

  2. Prototyping

###Requirements

I need a cluster that lives for a short time and handles ad-hoc data-analysis requests, or more specifically, runs Spark. I want it created quickly so that it can load data into memory, and I don't want to keep the cluster running perpetually. Therefore, a public cloud may be the best fit for my needs.

  1. Intranet speed
dapangmao / s.bash
Last active September 1, 2020 16:28
How to set up a spark cluster on digitalocean
# Connect to the VPN (OpenVPN profiles use the .ovpn extension)
sudo openvpn --config *.ovpn
apt-get update
apt-get install vim
# Download Spark, then unpack it (wget output cannot be piped straight
# into `tar zxf`; download first and extract from the file)
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.3.0-bin-hadoop2.4.tgz
tar zxf spark-1.3.0-bin-hadoop2.4.tgz
# Stage the tarball on HDFS so the worker nodes can fetch it
hadoop fs -mkdir /spark
hadoop fs -put spark-1.3.0-bin-hadoop2.4.tgz /spark
hadoop fs -du -h /spark
# Create the Spark config from its template
cp spark-env.sh.template spark-env.sh
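The script stops at the template copy. A sketch of the usual next steps for a Spark 1.3 standalone cluster (my assumption, not part of the gist; MASTER_IP is a placeholder and the commands run from the Spark home directory):

```bash
# Tell the workers where the master lives (placeholder address).
echo "export SPARK_MASTER_IP=MASTER_IP" >> conf/spark-env.sh
# Start the standalone master, then the workers listed in conf/slaves.
./sbin/start-master.sh
./sbin/start-slaves.sh
# Sanity check: launch a shell against the new master.
./bin/pyspark --master spark://MASTER_IP:7077
```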
class TreeNode:
    def __init__(self, x):
        self.val = x
        self.left = None
        self.right = None

input = [1, 2, 3, 4, '#', 5, 6, 7, 8, 9]

def map_node(x):
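The preview ends at the `map_node` signature. A minimal guess at the continuation: `'#'` marks a missing node, so `map_node` turns a value into a `TreeNode` or `None`, and the flat list is consumed in level order:

```python
def map_node(x):
    # '#' marks a missing node in the level-order list (my reading).
    return None if x == '#' else TreeNode(x)

# Build the tree from the flat level-order list.
nodes = [map_node(x) for x in input]
root = nodes[0]
queue, i = [root], 1
while queue and i < len(nodes):
    parent = queue.pop(0)
    for side in ('left', 'right'):
        if i < len(nodes):
            setattr(parent, side, nodes[i])
            if nodes[i] is not None:
                queue.append(nodes[i])
            i += 1
```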