//http://code2flow.com
A question (Q) is raised;
Name of the function;
Input type and output type;
Test case;
Constraints / time / space requirements;
if (Q in [Leetcode, CareerCup] or similar)
{
Recall the answer;
dapangmao / blog.md
Last active January 14, 2016 20:07
A short summary of getting over the GFW while back in China

####When gods fight, mortals suffer

After being back in China for a while, I found the blocking of websites severe. The staples, Google, Facebook, Wikipedia and so on, are nowhere to be found, which is quite depressing. Even the technical site GitHub is too slow to log into. This time the GFW seems to have drawn inspiration from its usual DNS pollution: it hijacked the Baidu Analytics JavaScript and thereby hijacked the traffic of overseas visitors to Chinese websites. A browser visiting any site that embeds Baidu Analytics would request two GitHub Pages (1 and 2) every two seconds. GitHub is impressive, though: under a distributed denial-of-service attack of this scale it merely slowed down and never went down.

Then Google and Mozilla revoked CNNIC's SSL certificates. That is also troublesome. Say you are overseas and go to 12306 to buy a train ticket: in Chrome or Firefox you will see the certificate warning.

####A concrete circumvention plan

Only then did I realize that a way over the wall is a necessity. Conveniently, I had DigitalOcean's cheapest VPS at hand.
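The preview cuts off before the actual steps. As a minimal sketch of what a single cheap VPS already enables (my assumption, not necessarily the setup the post goes on to describe), SSH alone can serve as a SOCKS5 proxy:

```bash
# A minimal sketch, not the post's actual recipe: tunnel traffic through
# the DigitalOcean VPS over SSH. your-vps-ip is a placeholder.
# -D 1080: open a dynamic SOCKS5 proxy on local port 1080
# -C: compress; -N: run no remote command; -f: background after auth
ssh -D 1080 -C -N -f root@your-vps-ip
# Then point the browser's SOCKS5 proxy at localhost:1080.
```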

dapangmao / solution.md
Last active August 29, 2015 14:17
maintain a median

###On a single machine

import heapq

class find_median(object):
    def __init__(self):
        self.first_half = []   # max heap of the smaller half (values stored negated for heapq)
        self.second_half = []  # min heap of the larger half; may hold one extra element
        self.N = 0             # count of numbers seen so far
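The preview truncates after `__init__`. A minimal sketch of how the two-heap scheme usually continues, completing the class above (my completion; it relies on the negated-value trick that makes heapq behave as a max heap):

```python
    def add(self, x):
        # Route x through the max heap, then rebalance so second_half
        # holds either the same number of elements or one more.
        heapq.heappush(self.first_half, -x)
        heapq.heappush(self.second_half, -heapq.heappop(self.first_half))
        if len(self.second_half) > len(self.first_half) + 1:
            heapq.heappush(self.first_half, -heapq.heappop(self.second_half))
        self.N += 1

    def median(self):
        if self.N % 2:  # odd count: the extra element sits atop second_half
            return self.second_half[0]
        return (self.second_half[0] - self.first_half[0]) / 2.0
```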
dapangmao / gist:84618a65ac5f921db76a
Created March 18, 2015 15:00
Two ways to transform RDD to DataFrame in Spark
1. Let Spark infer the schema from the RDD:

sqlCtx.inferSchema(rdd1)

2. Attach the schema explicitly through Row objects:
from pyspark.sql import Row
import os
current_path = os.getcwd()
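The preview stops here. A sketch of how the Row-based variant typically finishes (the file name and column names below are my assumptions for illustration):

```python
from pyspark.sql import Row

rdd1 = sc.textFile('people.txt')  # hypothetical file: "alice 30" per line
rows = rdd1.map(lambda line: line.split()) \
           .map(lambda p: Row(name=p[0], age=int(p[1])))
# inferSchema reads the schema off the Row objects; in Spark >= 1.3 the
# equivalent call is sqlCtx.createDataFrame(rows).
df = sqlCtx.inferSchema(rows)
df.registerTempTable('people')
```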
dapangmao / blog.md
Last active April 5, 2016 15:57
Spark example

###Transform RDD to DataFrame in Spark

from pyspark.sql import Row
import os

rdd = sc.textFile('C:/Users/chao.huang.ctr/spark-playground//class.txt')
def transform(x):
    args = x.split()
    funcs = [str, str, int, float, float]
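The preview truncates inside `transform`. A sketch of the likely continuation (my completion; the five column names are a guess based on the classic SAS `class` data set, which matches the `str, str, int, float, float` converters):

```python
from pyspark.sql import Row

def transform(x):
    args = x.split()
    funcs = [str, str, int, float, float]
    # Apply each converter to its field.
    vals = [f(a) for f, a in zip(funcs, args)]
    # Column names are assumptions: name, sex, age, height, weight.
    return Row(name=vals[0], sex=vals[1], age=vals[2],
               height=vals[3], weight=vals[4])

df = sqlCtx.createDataFrame(rdd.map(transform))  # Spark 1.3 API
df.printSchema()
```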
dapangmao / blog.md
Last active August 29, 2015 14:17
Deploy a minimal Spark cluster

###Why a minimal cluster

  1. Testing

  2. Prototyping

###Requirements

I need a cluster that lives for a short time and handles ad-hoc data-analysis requests, or more specifically, runs Spark. I want it created quickly so that it can load data into memory, and I don't want to keep the cluster running perpetually. Therefore, a public cloud may be the best fit for my needs.

  1. Intranet speed
dapangmao / s.bash
Last active September 1, 2020 16:28
How to set up a spark cluster on digitalocean
# Connect to the VPN (OpenVPN profiles use the .ovpn extension)
sudo openvpn --config *.ovpn
apt-get update
apt-get install vim
# Download Spark, then unpack it (wget output cannot be piped straight
# into `tar zxf`; download first and extract from the file)
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.3.0-bin-hadoop2.4.tgz
tar zxf spark-1.3.0-bin-hadoop2.4.tgz
# Stage the tarball on HDFS so the worker nodes can fetch it
hadoop fs -mkdir /spark
hadoop fs -put spark-1.3.0-bin-hadoop2.4.tgz /spark
hadoop fs -du -h /spark
# Create the Spark config from its template
cp spark-env.sh.template spark-env.sh
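The script stops at the template copy. A sketch of the usual next steps for a Spark 1.3 standalone cluster (my assumption, not part of the gist; MASTER_IP is a placeholder and the commands run from the Spark home directory):

```bash
# Tell the workers where the master lives (placeholder address).
echo "export SPARK_MASTER_IP=MASTER_IP" >> conf/spark-env.sh
# Start the standalone master, then the workers listed in conf/slaves.
./sbin/start-master.sh
./sbin/start-slaves.sh
# Sanity check: launch a shell against the new master.
./bin/pyspark --master spark://MASTER_IP:7077
```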
class TreeNode:
    def __init__(self, x):
        self.val = x
        self.left = None
        self.right = None

input = [1, 2, 3, 4, '#', 5, 6, 7, 8, 9]

def map_node(x):
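The preview ends at the `map_node` signature. A minimal guess at the continuation: `'#'` marks a missing node, so `map_node` turns a value into a `TreeNode` or `None`, and the flat list is consumed in level order:

```python
def map_node(x):
    # '#' marks a missing node in the level-order list (my reading).
    return None if x == '#' else TreeNode(x)

# Build the tree from the flat level-order list.
nodes = [map_node(x) for x in input]
root = nodes[0]
queue, i = [root], 1
while queue and i < len(nodes):
    parent = queue.pop(0)
    for side in ('left', 'right'):
        if i < len(nodes):
            setattr(parent, side, nodes[i])
            if nodes[i] is not None:
                queue.append(nodes[i])
            i += 1
```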