from twisted.web import client
from twisted.internet import reactor, defer
from lxml import etree
from StringIO import StringIO
import re
import os
save_dir = "~/.video/tube8/"
base_url = "http://www.tube8.com"
download_re = re.compile(r"so\.addVariable\('videoUrl',\s*'([\w.:/_]*)'\);", re.M)
@toddlipcon
toddlipcon / er
Created September 21, 2011 07:21
it was a bright cold day in April , and the clocks were striking thirteen . Winston Smith , his chin nuzzled into his breast in an effort to escape the vile wind , slipped quickly through the glass doors of Victory Mansions , though not quickly enough to prevent a swirl of gritty dust from entering along with him . intr - o zi senina si friguroasa de aprilie , pe cind ceasurile bateau ora treisprezece , Winston Smith , cu barbia infundata in piept pentru a scapa de vintul care - l lua pe sus , se strecura iute prin usile de sticla ale Blocului Victoria , desi nu destul de repede pentru a impiedica un virtej de praf si nisip sa patrunda o data cu el .
the hallway smelt of boiled cabbage and old rag mats . holul blocului mirosea a varza calita si a presuri vechi .
at one end of it a coloured poster , too large for indoor display , had been tacked to the wall . It depicted simply an enormous face , more than a metre wide : the face of a man of about forty - five , with a heavy black moustache and ruggedly ha
@tokoroten
tokoroten / tiny_web_crawler.py
Created July 28, 2012 07:10
tiny_web_crawler
#coding:utf-8
import urllib
import BeautifulSoup
import urlparse
import time
def main():
    urlList = open("seed.txt","r").read().splitlines()
    allowDomainList = set(open("allowDomain.txt","r").read().splitlines())
@piscisaureus
piscisaureus / pr.md
Created August 13, 2012 16:12
Check out GitHub pull requests locally

Locate the section for your GitHub remote in the `.git/config` file. It looks like this:

[remote "origin"]
	fetch = +refs/heads/*:refs/remotes/origin/*
	url = git@github.com:joyent/node.git

Now add the line `fetch = +refs/pull/*/head:refs/remotes/origin/pr/*` to this section. Change the GitHub URL to match your project's URL. It ends up looking like this:
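The same edit can be sketched in Python as plain text manipulation. This is only an illustration under stated assumptions: the helper name `add_pr_refspec` and the `SAMPLE` text are hypothetical, and the code places the new line at the end of the remote's section, which is one reasonable choice rather than a requirement. Printing the result shows the section with both `fetch` lines.

```python
PR_REFSPEC = "+refs/pull/*/head:refs/remotes/origin/pr/*"

# Hypothetical sample config; only the [remote "origin"] section comes from the text above.
SAMPLE = (
    '[core]\n'
    '\tbare = false\n'
    '[remote "origin"]\n'
    '\tfetch = +refs/heads/*:refs/remotes/origin/*\n'
    '\turl = git@github.com:joyent/node.git\n'
    '[branch "master"]\n'
    '\tremote = origin\n'
)


def add_pr_refspec(config_text, remote="origin"):
    """Return config_text with the pull-request fetch line added to the remote's section."""
    if PR_REFSPEC in config_text:
        return config_text  # already present, nothing to do

    header = '[remote "%s"]' % remote
    lines = config_text.splitlines()
    starts = [i for i, line in enumerate(lines) if line.strip() == header]
    if not starts:
        raise ValueError("no section %s in config" % header)
    start = starts[0]

    # Walk to the end of the section: the next "[...]" header or end of file.
    end = start + 1
    while end < len(lines) and not lines[end].lstrip().startswith("["):
        end += 1
    # Back up over trailing blank lines so the new line stays inside the section.
    while end > start + 1 and not lines[end - 1].strip():
        end -= 1

    lines.insert(end, "\tfetch = " + PR_REFSPEC)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(add_pr_refspec(SAMPLE))
```

Working on the text rather than on the file keeps the sketch side-effect free; writing the result back to `.git/config` is left to the caller.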

@tk0miya
tk0miya / index.txt
Last active January 29, 2019 23:03
sphinxcontrib_wikitable.py
Usage
======
.. wiki-table::
   :header-rows: 1
   :widths: 2 3 5

   |id|header1|header2|
   |1|hello|world|
   |2|foo|:strong:`bar`|
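As a rough illustration of the row syntax used above, pipe-delimited lines like these can be split into cells with a few lines of Python. This is a sketch only and not the extension's actual implementation; the function names `parse_wiki_row` and `parse_wiki_table` are hypothetical, and the `header_rows` argument merely mirrors the `:header-rows:` option.

```python
def parse_wiki_row(line):
    """Split one '|a|b|c|' line into a list of stripped cell strings."""
    text = line.strip()
    if len(text) < 2 or not (text.startswith("|") and text.endswith("|")):
        raise ValueError("not a wiki-table row: %r" % line)
    # Drop the outer pipes, then split on the inner ones.
    return [cell.strip() for cell in text[1:-1].split("|")]


def parse_wiki_table(lines, header_rows=0):
    """Return (header, body) lists of rows, skipping blank lines."""
    rows = [parse_wiki_row(line) for line in lines if line.strip()]
    return rows[:header_rows], rows[header_rows:]


if __name__ == "__main__":
    table = [
        "|id|header1|header2|",
        "|1|hello|world|",
        "|2|foo|:strong:`bar`|",
    ]
    header, body = parse_wiki_table(table, header_rows=1)
    print(header)
    print(body)
```

Cell contents such as ``:strong:`bar``` are kept as raw text here; interpreting inline roles would be the job of the reST parser.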

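GenericUDFHiveLogo released!

This article was written for April Fools' Day 2013.

Background

On January 21, 2013, at Hadoop Conference Japan 2013 Winter, Hive Logo Lovers made a stunning debut, holding up what was probably the world's first Hive T-shirt. That adorable T-shirt is now said to be a favorite of top engineers at Cloudera and TreasureData.

About two and a half months have passed since then. Hive Logo Lovers has been working to extend the Hive Logo's exposure time by even a single second.

It goes like this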

cd $HIVE_HOME

# Apply the patch and build
wget https://issues.apache.org/jira/secure/attachment/12577210/HIVE-4299.patch
patch -p0 < HIVE-4299.patch
ant clean package

# Write the test contents (a positive test using a .q file)
@mopemope
mopemope / crawler.py
Last active October 19, 2017 04:53
2ch crawler prototype
# -*- coding: utf-8 -*-
import requests
from pyquery import PyQuery as pq
import parser
import re
import datastore
url_re = re.compile(r".*/(\d+)/.*", re.M)
import re
from saying.exceptions import ApplicationException
from itertools import chain
class DSLSyntaxError(ApplicationException):
    pass
def format(handler, text, message):
    var = {
        u'sender': lambda: resolve_handle(message.Sender.Handle),
@tokoroten
tokoroten / amidakuji.py
Created June 13, 2013 15:07
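A test of whether Amidakuji (ladder lottery) is a fair mechanism. With 10 vertical lines, the lottery did not become fair until roughly 150 horizontal lines were drawn. With fewer than that, picking the position directly above the prize gives a higher chance of winning.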
#coding:utf-8
import random
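def amida_shuffle(line_num, exchange_num):
    # Start with each vertical line in its own position.
    lines = range(line_num)
    # Each horizontal bar swaps two adjacent vertical lines.
    for i in xrange(exchange_num):
        p = random.randrange(line_num - 1)
        lines[p], lines[p+1] = lines[p+1], lines[p]
    return lines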
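The fairness check described above could be sketched as a small Monte Carlo simulation. This is an illustration under assumptions, not the gist's own measurement code: `win_rates`, `prize_pos`, `trials` and `seed` are hypothetical names, and `amida_shuffle` is re-declared here in a form that runs on Python 3 so the block is self-contained. For each trial it records which starting column ends up at the prize position and reports the share of wins per starting column.

```python
import random


def amida_shuffle(line_num, exchange_num, rng=random):
    """Apply exchange_num random adjacent swaps to the columns 0..line_num-1."""
    lines = list(range(line_num))
    for _ in range(exchange_num):
        p = rng.randrange(line_num - 1)
        lines[p], lines[p + 1] = lines[p + 1], lines[p]
    return lines


def win_rates(line_num, exchange_num, prize_pos, trials, seed=None):
    """Estimate, per starting column, the probability of ending at prize_pos."""
    rng = random.Random(seed)
    wins = [0] * line_num
    for _ in range(trials):
        result = amida_shuffle(line_num, exchange_num, rng)
        # result[k] is the starting column that ended up at position k.
        wins[result[prize_pos]] += 1
    return [w / float(trials) for w in wins]


if __name__ == "__main__":
    # A fair lottery with 10 columns would give every column a rate near 0.1.
    for exchange_num in (10, 50, 150):
        rates = win_rates(10, exchange_num, prize_pos=0, trials=10000)
        print(exchange_num, ["%.3f" % r for r in rates])
```

Comparing the printed rates against the uniform value `1.0 / line_num` gives a rough sense of how many horizontal lines are needed before the starting column stops mattering.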