The following gist is an extract of the article Building a simple crawler. It allows crawling from a URL and for a given number of bounce.
from crawler import Crawler
crawler = Crawler()
crawler.crawl('http://techcrunch.com/')
daemon off; | |
# Heroku dynos have at least 4 cores. | |
worker_processes <%= ENV['NGINX_WORKERS'] || 4 %>; | |
events { | |
use epoll; | |
accept_mutex on; | |
worker_connections 1024; | |
} |
<RoutingRules> | |
<RoutingRule> | |
<Condition> | |
<HttpErrorCodeReturnedEquals>404</HttpErrorCodeReturnedEquals> | |
</Condition> | |
<Redirect> | |
<HostName>danilop.net</HostName> | |
<ReplaceKeyWith/> | |
</Redirect> | |
</RoutingRule> |
""" Inspired by http://flask.pocoo.org/snippets/40/ """ | |
app = Flask(__name__) | |
@app.url_defaults | |
def hashed_url_for_static_file(endpoint, values): | |
if 'static' == endpoint or endpoint.endswith('.static'): | |
filename = values.get('filename') | |
if filename: | |
if '.' in endpoint: # has higher priority |
from selenium import webdriver | |
from selenium.webdriver.common.keys import Keys | |
import unittest as ut | |
import time | |
class NewPaymentTest(ut.TestCase): | |
def setUp(self): | |
self.browser = webdriver.Firefox() | |
self.browser.implicitly_wait(3) |
The following gist is an extract of the article Building a simple crawler. It allows crawling from a URL and for a given number of bounce.
from crawler import Crawler
crawler = Crawler()
crawler.crawl('http://techcrunch.com/')
Individual modules of es5-ext package. See ES6 features for usage information.
A while ago I needed a deepcopy function in python. I found out however that for my usecase I could better build my own. I want to share it so that others might benefit as well.
If the data you're copying is simple data, deepcopy might be overkill. With simple I mean if your data is representable as Json. Let me illustrate with code:
I've used [json-generator](http://www.json-generator.com/ to get some sample json data.)
def deepCopyList(inp):
for vl in inp:
if isinstance(vl, list):
curl -XPOST localhost:9200/boosts -d ' | |
{ | |
"settings": { | |
"number_of_shards": 1, | |
"number_of_replicas": 0 | |
}, | |
"mappings": { | |
"doc": { | |
"properties": { | |
"text": { |
""" | |
As Hubic web services are deprecated, this is a small script to request | |
access and refresh token. It starts a flask server at http://localhost:5000/, the | |
users fill the hubic authentication form with its navigator and obtain the | |
credentials returned by the application. | |
You need requests-oauthlib and flask: | |
pip install flask | |
pip install requests-oauthlib |
This simple script will take a picture of a whiteboard and use parts of the ImageMagick library with sane defaults to clean it up tremendously.
The script is here:
#!/bin/bash
convert "$1" -morphology Convolve DoG:15,100,0 -negate -normalize -blur 0x1 -channel RBG -level 60%,91%,0.1 "$2"