"""Beautiful Soup
Elixir and Tonic
"The Screen-Scraper's Friend"
http://www.crummy.com/software/BeautifulSoup/
Beautiful Soup parses a (possibly invalid) XML or HTML document into a
tree representation. It provides methods and Pythonic idioms that make
it easy to navigate, search, and modify the tree.
A well-formed XML/HTML document yields a well-formed data
#-*-coding:utf-8-*-
from BeautifulSoup import BeautifulSoup
import requests
from PIL import Image
from StringIO import StringIO

# Fetch the page and stop early on anything other than HTTP 200.
r = requests.get('http://www.xiaomi.com')
assert r.status_code == 200

# Parse the HTML and collect every <img> tag.
soup = BeautifulSoup(r.text)
urls = soup.findAll('img')
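The preview ends once the `img` tags are collected. A likely next step is to pull each tag's `src` and resolve it against the page URL. The sketch below does that with the standard library's `html.parser`, so it needs neither Beautiful Soup nor requests. It swaps in a different parser and is not the gist's own code. The names `ImgSrcCollector` and `img_urls` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect absolute URLs from the src attribute of every <img> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attrs is a list of (name, value) pairs.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths such as "/img/logo.png" against the page URL.
                self.urls.append(urljoin(self.base_url, src))


def img_urls(html, base_url):
    """Return the resolved src URLs of all <img> tags in `html`, in document order."""
    parser = ImgSrcCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Self-closing tags such as `<img src="x.png"/>` are covered too, because `HTMLParser.handle_startendtag` calls `handle_starttag` by default.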
import md5

class HashRing(object):

    def __init__(self, nodes=None, replicas=3):
        """Manages a hash ring.

        `nodes` is a list of objects that have a proper __str__ representation.
        `replicas` indicates how many virtual points should be used per node;
        replicas are required to improve the distribution.
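The preview cuts off inside the docstring. Below is a minimal Python 3 sketch of the same idea. It assumes `hashlib.md5` in place of the Python 2 `md5` module, places `replicas` virtual points per node on a sorted ring, and maps a key to the first point clockwise from the key's hash. The `"node:i"` key format for virtual points and the method names `add_node`, `remove_node`, and `get_node` are assumptions, not the gist's code.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with `replicas` virtual points per node (sketch)."""

    def __init__(self, nodes=None, replicas=3):
        self.replicas = replicas
        self._keys = []   # sorted hash positions on the ring
        self._ring = {}   # hash position -> node
        for node in nodes or []:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # Interpret the MD5 digest as a big integer: a position on the ring.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Assumed virtual-point naming: "<str(node)>:<replica index>".
        for i in range(self.replicas):
            position = self._hash(f"{node}:{i}")
            self._ring[position] = node
            bisect.insort(self._keys, position)

    def remove_node(self, node):
        for i in range(self.replicas):
            position = self._hash(f"{node}:{i}")
            del self._ring[position]
            self._keys.remove(position)

    def get_node(self, key):
        """Return the node owning `key`, or None if the ring is empty."""
        if not self._keys:
            return None
        # First position strictly after the key's hash, wrapping around to index 0.
        index = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[self._keys[index]]
```

A key's owner depends only on the nearest ring point after the key's hash. Removing a node therefore reassigns only the keys that node owned, and every other key stays where it was. This stability is the point of consistent hashing.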
from __future__ import with_statement
import os
from django.core import management
# We have to rename this to avoid clashes with fabric.api.settings.
import ohbooklist.conf.local.settings as django_settings
management.setup_environ(django_settings)
from fabric.api import *
# -*- coding: utf-8 -*-
from struct import *

class BinaryStream:
    def __init__(self, base_stream):
        self.base_stream = base_stream
        self.offset = 0

    def readBytes(self, length):
        string, = unpack_from(str(length) + 's', self.base_stream[self.offset:])
# Copyright (c) 2010 Pedro Matiello <[email protected]>
# Juarez Bochi <[email protected]>
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following
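I recently started learning Hadoop. I had expected the course to focus mainly on managing Hadoop clusters. Instead, to help us understand Hadoop's MapReduce model, the instructor began by having us implement Google's PageRank algorithm ourselves. I first planned to use Java, since I will soon need to deploy Java code on a Hadoop cluster for data analysis. However, I have not written a single line of Java since graduating, so I simply could not do it. PageRank is essentially iterated matrix and vector operations, which in Java would certainly mean two-dimensional arrays, and I was never good at those in school. After some thought I decided to implement it in Python instead. I taught myself some Python earlier this year, and I knew of a third-party module called python-graph that makes graph-theory programming much easier. I set up the Python environment on a Linode VPS and installed the required modules as follows: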
[root@chenjunlu ~]# yum install graphviz*
[root@chenjunlu ~]# yum install vsftpd
[root@chenjunlu ~]# wget http://python-graph.googlecode.com/files/python-graph-core-1.8.2.tar.gz
[root@chenjunlu ~]# tar -zxvf python-graph-core-1.8.2.tar.gz
[root@chenjunlu ~]# cd python-graph-core-1.8.2
[root@chenjunlu ~]# python setup.py install
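PageRank is an iterated matrix-vector product. However, the same power iteration can be written in plain Python over an adjacency dictionary, with no two-dimensional array at all. The sketch below assumes the commonly used damping factor of 0.85 and assumes that pages with no out-links spread their rank evenly over all pages. It does not use python-graph, and the function name and parameters are illustrative.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank (sketch).

    `links` maps each page to the list of pages it links to. Pages that only
    appear as link targets are included automatically.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        # Each page splits its damped rank equally among its out-links.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

Each iteration preserves a total rank of 1: (1 - d) is teleportation, d·dangling is redistributed dangling rank, and d times the remaining rank flows along links. The values can therefore be read directly as a probability distribution over pages.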
# PageRank algorithm
# By Peter Bengtsson
# http://www.peterbe.com/
# [email protected]
#
# Requires the numarray module
# http://www.stsci.edu/resources/software_hardware/numarray
from numarray import *
import numarray.linear_algebra as la
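numarray has since been superseded by NumPy. Because the snippet imports a linear-algebra module, one plausible reading is that it computes PageRank as an eigenvector. The sketch below takes that approach with NumPy. It assumes a 0/1 adjacency matrix in which `adjacency[i][j] == 1` means page i links to page j, builds the Google matrix, and takes the eigenvector whose eigenvalue is 1. It is not Peter Bengtsson's code, and `pagerank_eig` is a hypothetical name.

```python
import numpy as np


def pagerank_eig(adjacency, damping=0.85):
    """PageRank as the principal eigenvector of the Google matrix (sketch)."""
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    out_degree = A.sum(axis=1)
    # Row-normalize links; dangling pages (no out-links) are treated as linking to every page.
    P = np.where(out_degree[:, None] > 0, A / np.maximum(out_degree, 1)[:, None], 1.0 / n)
    M = P.T  # column-stochastic: M[j, i] = probability of moving from page i to page j
    G = damping * M + (1.0 - damping) / n
    eigenvalues, eigenvectors = np.linalg.eig(G)
    # For a positive column-stochastic matrix the dominant eigenvalue is exactly 1.
    principal = eigenvectors[:, np.argmax(eigenvalues.real)].real
    # Normalizing to sum 1 also fixes the arbitrary sign returned by eig.
    return principal / principal.sum()
```

A dense eigendecomposition costs O(n³) time. The iterative approach scales to far larger link graphs, which is why PageRank is normally computed by power iteration.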
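I. The text file comparison command: diff
1> What the diff command does
In Linux, the diff command compares two text files line by line and lists where they differ. It systematically examines the given files and shows every line that differs between them; the files do not need to be sorted first.
2> Syntax
diff [options] file1 file2
The command tells the user which lines of file1 and file2 must be changed to make the two files identical. If "-" is given as file1 or file2, standard input is used. If file1 or file2 is a directory, diff compares against the file of the same name in that directory.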
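For comparison, Python's standard `difflib` module performs a similar line-by-line comparison and can produce output in the unified format of `diff -u`. The following is a sketch, not the `diff` tool itself. The helper names `read_lines` and `unified` are hypothetical, and `read_lines` follows the convention above that `-` stands for standard input.

```python
import difflib
import sys


def read_lines(path):
    """Read a file as a list of lines without newlines; "-" means standard input."""
    if path == "-":
        return sys.stdin.read().splitlines()
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


def unified(a_lines, b_lines, a_name="file1", b_name="file2"):
    """Return a unified diff of two line lists, similar to `diff -u`, as a list of lines."""
    # lineterm="" because the input lines carry no trailing newlines.
    return list(difflib.unified_diff(a_lines, b_lines,
                                     fromfile=a_name, tofile=b_name, lineterm=""))
```

For example, `print("\n".join(unified(read_lines("a.txt"), read_lines("b.txt"), "a.txt", "b.txt")))` prints a unified diff. Unlike `diff -u`, its header lines have no timestamps. Identical inputs produce an empty list, much as `diff` prints nothing for identical files.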
var check = require('validator').check,
    sanitize = require('validator').sanitize;

//Validate
check('[email protected]').len(6, 64).isEmail(); //Methods are chainable
check('abc').isInt(); //Throws 'Invalid integer'
check('abc', 'Please enter a number').isInt(); //Throws 'Please enter a number'
check('abcdefghijklmnopzrtsuvqxyz').is(/^[a-z]+$/);
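The chaining works because each validator method returns the object it was called on, and a failed check raises an error whose message can be overridden per call. The sketch below illustrates that pattern in Python. It is hypothetical and is not the validator package's code. Only the "Invalid integer" message comes from the snippet; the other default messages are placeholders, and the email regex is a deliberately simplified assumption.

```python
import re


class ValidationError(ValueError):
    pass


class Check:
    """Hypothetical chainable validator modeled on the snippet's check() calls."""

    def __init__(self, value, message=None):
        self.value = value
        self.message = message  # caller-supplied message overrides the default

    def _fail(self, default):
        raise ValidationError(self.message or default)

    def length(self, min_len, max_len=None):
        size = len(self.value)
        if size < min_len or (max_len is not None and size > max_len):
            self._fail("Invalid length")  # placeholder message
        return self  # returning self is what makes the calls chainable

    def is_int(self):
        if not re.fullmatch(r"[-+]?\d+", self.value):
            self._fail("Invalid integer")
        return self

    def is_email(self):
        # Simplified pattern (something@something.tld), not a full RFC 5322 check.
        if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", self.value):
            self._fail("Invalid email")  # placeholder message
        return self

    def matches(self, pattern):
        if not re.search(pattern, self.value):
            self._fail("Invalid characters")  # placeholder message
        return self


def check(value, message=None):
    return Check(value, message)
```

Usage mirrors the JavaScript: `check("user@example.com").length(6, 64).is_email()` passes silently, while `check("abc", "Please enter a number").is_int()` raises `ValidationError("Please enter a number")`.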