Daniel Waardal waawal

Simple Website Crawler

The following gist is an extract of the article Building a simple crawler. It allows crawling from a URL and for a given number of bounce.

Basic Usage

from crawler import Crawler
crawler = Crawler()
crawler.crawl('http://techcrunch.com/')

displays the urls

List of ECMAScript 6 shims

Implemented on top of ECMAScript 5

Provided as distinct CJS modules, installable via npm

ECMAScript 5 Built-in Objects extensions

Individual modules of es5-ext package. See ES6 features for usage information.

Array

A while ago I needed a deepcopy function in python. I found out however that for my usecase I could better build my own. I want to share it so that others might benefit as well.

If the data you're copying is simple data, deepcopy might be overkill. With simple I mean if your data is representable as Json. Let me illustrate with code:

I've used [json-generator](http://www.json-generator.com/ to get some sample json data.)

def deepCopyList(inp):
    for vl in inp:
        if isinstance(vl, list):

Description

This simple script will take a picture of a whiteboard and use parts of the ImageMagick library with sane defaults to clean it up tremendously.

The script is here:

#!/bin/bash
convert "$1" -morphology Convolve DoG:15,100,0 -negate -normalize -blur 0x1 -channel RBG -level 60%,91%,0.1 "$2"

	daemon off;
	# Heroku dynos have at least 4 cores.
	worker_processes <%= ENV['NGINX_WORKERS'] \|\| 4 %>;

	events {
	use epoll;
	accept_mutex on;
	worker_connections 1024;
	}

	<RoutingRules>
	<RoutingRule>
	<Condition>
	<HttpErrorCodeReturnedEquals>404</HttpErrorCodeReturnedEquals>
	</Condition>
	<Redirect>
	<HostName>danilop.net</HostName>
	<ReplaceKeyWith/>
	</Redirect>
	</RoutingRule>

	""" Inspired by http://flask.pocoo.org/snippets/40/ """

	app = Flask(__name__)

	@app.url_defaults
	def hashed_url_for_static_file(endpoint, values):
	if 'static' == endpoint or endpoint.endswith('.static'):
	filename = values.get('filename')
	if filename:
	if '.' in endpoint: # has higher priority

	from selenium import webdriver
	from selenium.webdriver.common.keys import Keys
	import unittest as ut
	import time

	class NewPaymentTest(ut.TestCase):

	def setUp(self):
	self.browser = webdriver.Firefox()
	self.browser.implicitly_wait(3)

	curl -XPOST localhost:9200/boosts -d '
	{
	"settings": {
	"number_of_shards": 1,
	"number_of_replicas": 0
	},
	"mappings": {
	"doc": {
	"properties": {
	"text": {

	"""
	As Hubic web services are deprecated, this is a small script to request
	access and refresh token. It starts a flask server at http://localhost:5000/, the
	users fill the hubic authentication form with its navigator and obtain the
	credentials returned by the application.

	You need requests-oauthlib and flask:
	pip install flask
	pip install requests-oauthlib