@dapangmao
dapangmao / s.bash
Last active September 1, 2020 16:28
How to set up a spark cluster on digitalocean
sudo openvpn --config *.ovpn
apt-get update
apt-get install vim
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.3.0-bin-hadoop2.4.tgz
tar zxf spark-1.3.0-bin-hadoop2.4.tgz
hadoop fs -mkdir /spark
hadoop fs -put spark-1.3.0-bin-hadoop2.4.tgz /spark
hadoop fs -du -h /spark
cp spark-env.sh.template spark-env.sh
@dapangmao
dapangmao / blog.md
Last active August 29, 2015 14:17
Deploy a minimal Spark cluster

### Why a minimal cluster

  1. Testing

  2. Prototyping

### Requirements

I need a cluster that lives for a short time and handles ad-hoc data analysis requests, or more specifically, running Spark. I want it to spin up quickly so I can load data into memory, and I don't want to keep the cluster running perpetually. Therefore, a public cloud may be the best fit for my needs.

  1. Intranet speed
@dapangmao
dapangmao / blog.md
Last active April 5, 2016 15:57
Spark example

### Transform RDD to DataFrame in Spark

from pyspark.sql import Row
import os

rdd = sc.textFile('C:/Users/chao.huang.ctr/spark-playground//class.txt')
def transform(x):
    args = x.split()
    funcs = [str, str, int, float, float]
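
The preview cuts off inside `transform`. A minimal sketch of how it might finish and produce a DataFrame, assuming the five converters correspond to the classic class-dataset columns (the field names below are my guess, as are the shell-provided `sc` and `sqlCtx`):

```python
from pyspark.sql import Row

def transform(x):
    args = x.split()
    funcs = [str, str, int, float, float]
    # assumed field names -- the gist preview does not show them
    names = ['name', 'sex', 'age', 'height', 'weight']
    return Row(**{n: f(a) for n, f, a in zip(names, funcs, args)})

# Spark 1.3+ API: build the DataFrame from the RDD of Rows
df = sqlCtx.createDataFrame(rdd.map(transform))
df.printSchema()
```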
@dapangmao
dapangmao / gist:84618a65ac5f921db76a
Created March 18, 2015 15:00
Two ways to transform RDD to DataFrame in Spark
1. Infer the schema directly from the RDD
sqlCtx.inferSchema(rdd1)
2. Build the schema explicitly with Row objects
from pyspark.sql import Row
import os
current_path = os.getcwd()
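
A minimal side-by-side sketch of the two approaches, assuming a Spark 1.x shell with `sc` and `sqlCtx` defined (the sample data is made up):

```python
from pyspark.sql import Row

rows = sc.parallelize([('a', 1), ('b', 2)]).map(
    lambda t: Row(name=t[0], value=t[1]))

# Way 1: the pre-1.3 API -- infer the schema from an RDD of Rows
df1 = sqlCtx.inferSchema(rows)

# Way 2: the Spark 1.3 API -- create the DataFrame directly
df2 = sqlCtx.createDataFrame(rows)
```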
@dapangmao
dapangmao / solution.md
Last active August 29, 2015 14:17
maintain a median

### On a single machine

import heapq
class find_median(object):
    def __init__(self):
        self.first_half = []   # max heap, stored as negated values
        self.second_half = []  # min heap; holds the extra element when the count is odd
        self.N = 0
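
The preview stops at the constructor. A sketch of how the two-heap scheme usually continues; the method names `insert` and `median` are my additions, not the gist's:

```python
import heapq

class find_median(object):
    def __init__(self):
        self.first_half = []   # max heap, stored as negated values
        self.second_half = []  # min heap; holds the extra element when N is odd
        self.N = 0

    def insert(self, x):
        # route every value through the max heap, pushing its largest right
        heapq.heappush(self.first_half, -x)
        heapq.heappush(self.second_half, -heapq.heappop(self.first_half))
        # rebalance: the min heap may lead by at most one element
        if len(self.second_half) > len(self.first_half) + 1:
            heapq.heappush(self.first_half, -heapq.heappop(self.second_half))
        self.N += 1

    def median(self):
        if self.N % 2:
            return self.second_half[0]
        return (self.second_half[0] - self.first_half[0]) / 2.0
```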
@dapangmao
dapangmao / blog.md
Last active January 14, 2016 20:07
Notes on circumventing the Great Firewall while back in China

#### When gods fight, mortals suffer

After being back in China for a while, I found the blocking severe. The staples Google, Facebook, and Wikipedia are nowhere to be reached, which is dispiriting, and even a technical site like GitHub is too slow to log into. This time the GFW seems to have taken its usual DNS-poisoning playbook a step further: it hijacked the Baidu Analytics JavaScript and, with it, overseas traffic to domestic websites. Any browser visiting a site that embeds Baidu Analytics ends up requesting two GitHub Pages (1 and 2) every two seconds. GitHub proved remarkably resilient: under a distributed denial-of-service attack of this scale it merely slowed down and never went down.

Then Google and Mozilla distrusted CNNIC's SSL certificates. That is also a nuisance: someone overseas buying a train ticket on 12306 with Chrome or Firefox will be greeted by a certificate warning.

#### The actual circumvention setup

Only then did I realize that a way over the wall is a necessity. Conveniently, I had DigitalOcean's cheapest VPS at hand.

//http://code2flow.com
A Q is raised;
Name of the function;
Input type and output type;
Test case;
Constraints / time / space requirements;
if (Q in [Leetcode, CareerCup] or similar)
{
Recall the answer;
class Solution:
    def combinationSum(self, candidates, target):
        self.res = []
        self.dfs(sorted(candidates), [], target)
        return self.res
    def dfs(self, candidates, current, target):
        if target == 0:
            self.res.append(current)
            return
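
The preview ends inside `dfs`. A hedged completion following the standard Combination Sum pattern (sorted candidates, each number reusable); the recursive branch below is my reconstruction, not the gist's:

```python
class Solution:
    def combinationSum(self, candidates, target):
        self.res = []
        self.dfs(sorted(candidates), [], target)
        return self.res

    def dfs(self, candidates, current, target):
        if target == 0:
            self.res.append(current)
            return
        for i, c in enumerate(candidates):
            if c > target:
                break  # candidates are sorted, so later ones cannot fit
            # pass candidates[i:] so c itself may be reused
            self.dfs(candidates[i:], current + [c], target - c)
```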
@dapangmao
dapangmao / graph.md
Last active August 29, 2015 14:20
Graph
  - Undirected graph
"""
Given a 2d grid map of '1's (land) and '0's (water), count the number of islands. An island is surrounded by water and is formed by connecting adjacent lands horizontally or vertically. You may assume all four edges of the grid are all surrounded by water.

Example 1:

11110
11010
11000
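
The sample grid is truncated by the preview. For reference, a compact flood-fill counter (my sketch, not necessarily the gist's own solution):

```python
def num_islands(grid):
    """Count islands by flood-filling every unvisited '1' cell."""
    if not grid:
        return 0
    rows, cols = len(grid), len(grid[0])
    seen = set()

    def sink(r, c):
        if 0 <= r < rows and 0 <= c < cols and grid[r][c] == '1' and (r, c) not in seen:
            seen.add((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                sink(r + dr, c + dc)

    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1' and (r, c) not in seen:
                count += 1
                sink(r, c)
    return count

print(num_islands(["11110", "11010", "11000", "00000"]))  # -> 1
```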
@dapangmao
dapangmao / pattern.md
Last active August 29, 2015 14:21
Design pattern

From here

  1. Decorator
from functools import wraps

def makebold(fn):
    @wraps(fn)
    def wrapped():
        return "<b>" + fn() + "</b>"
    return wrapped