Skip to content

Instantly share code, notes, and snippets.

View uolter's full-sized avatar

Walter Traspadini uolter

View GitHub Profile
#! /bin/sh
### BEGIN INIT INFO
# Provides: elasticsearch
# Required-Start: $all
# Required-Stop: $all
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Starts elasticsearch
# Description: Starts elasticsearch using start-stop-daemon
### END INIT INFO

Simple Website Crawler

The following gist is an extract of the article Building a simple crawler. It allows crawling from a URL and for a given number of bounce.

Basic Usage

from crawler import Crawler
crawler = Crawler()
crawler.crawl('http://techcrunch.com/')

displays the urls