cheekybastard

Django request flow

-------------------------------                         ------------------ Django --------------------
| Browser: GET /udo/contact/2 |    === wsgi/fcgi ===>   | 1. Asks OS for DJANGO_SETTINGS_MODULE      |
-------------------------------                         | 2. Build Request (from wsgi/fcgi callback) |
                                                        | 3. Get settings.ROOT_URLCONF module        |
                                                        | 4. Resolve URL/view from request.path      | # url(r'^udo/contact/(?P<id>\w+)', view, name='url-identifier')
                                                        | 5. Apply request middlewares               | # settings.MIDDLEWARE_CLASSES
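
Step 4 of the flow above can be illustrated without Django itself. What follows is a minimal sketch, assuming a hypothetical `resolve` helper and a hypothetical view name `contact_view` (neither comes from the gist or from Django's API). It matches the request path against the pattern shown in the diagram using Python's `re` module. Django matches URL patterns against the path without its leading slash, so the sketch strips it first.

```python
import re

# Pattern copied from the diagram; the view name and helper are hypothetical.
URLPATTERNS = [
    (re.compile(r'^udo/contact/(?P<id>\w+)'), 'contact_view'),
]


def resolve(path):
    """Return (view_name, kwargs) for the first matching pattern, or None."""
    candidate = path.lstrip('/')  # patterns are written without a leading slash
    for pattern, view_name in URLPATTERNS:
        match = pattern.search(candidate)
        if match:
            # Named groups become the keyword arguments passed to the view.
            return view_name, match.groupdict()
    return None


print(resolve('/udo/contact/2'))  # ('contact_view', {'id': '2'})
```

The named group `id` is what ends up as a keyword argument of the view; an unmatched path simply yields `None` here, where Django would raise a 404.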
Use extractDocs.py to parse and index the StackOverflow posts.xml file into an existing index.
Usage: extractDocs.py [options] file
Options:
  -h, --help            show this help message and exit
  -s SERVER, --server=SERVER
                        ElasticSearch Server
  -i INDEX, --index=INDEX
                        Index name to use
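
The help text above has the shape that `optparse` generates. As a rough sketch only (the gist's actual source is not shown in this preview), options like these could be declared as follows. The function names `build_parser` and `parse_args`, and the sample values `localhost:9200` and `stackoverflow`, are assumptions made for illustration.

```python
import optparse


def build_parser():
    # optparse adds -h/--help automatically and prefixes the usage with "Usage: ".
    parser = optparse.OptionParser(usage='%prog [options] file')
    parser.add_option('-s', '--server', dest='server',
                      help='ElasticSearch Server')
    parser.add_option('-i', '--index', dest='index',
                      help='Index name to use')
    return parser


def parse_args(argv):
    """Return (server, index, filename) parsed from an argument list."""
    parser = build_parser()
    options, args = parser.parse_args(argv)
    if len(args) != 1:
        parser.error('exactly one input file is required')
    return options.server, options.index, args[0]


if __name__ == '__main__':
    # Hypothetical sample values, not taken from the gist.
    print(parse_args(['-s', 'localhost:9200', '--index=stackoverflow', 'posts.xml']))
```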
/*jshint globalstrict:true */
/*global angular:true */
'use strict';
angular.module('demo', [
    'demo.controllers',
    'demo.directives',
    'elasticjs.service'
]);
# coding=UTF-8
from __future__ import division
import re
# This is a naive text summarization algorithm
# Created by Shlomi Babluki
# April, 2013
class SummaryTool(object):
# coding=UTF-8
import nltk
from nltk.corpus import brown
# This is a fast and simple noun phrase extractor (based on NLTK)
# Feel free to use it, just keep a link back to this post
# http://thetokenizer.com/2013/05/09/efficient-way-to-extract-the-main-topics-of-a-sentence/
# Created by Shlomi Babluki
# May, 2013
#! /usr/bin/env python
# See http://preshing.com/20130115/view-your-filesystem-history-using-python
import optparse
import os
import fnmatch
import time
# Parse options
parser = optparse.OptionParser(usage='Usage: %prog [options] path [path2 ...]')
parser.add_option('-g', action='store', type='long', dest='secs', default=10,
# Rewritten code from /r2/r2/lib/db/_sorts.pyx
# http://amix.dk/blog/post/19588 # How Reddit ranking algorithms work
# http://amix.dk/blog/post/19574 # How Hacker News ranking algorithm works
# http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
from math import sqrt
def _confidence(ups, downs):
    n = ups + downs
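
The preview of `_confidence` is cut off after its first line, so its body is not reproduced here. For orientation, this is a minimal sketch of the lower bound of the Wilson score interval, the technique described in the Evan Miller article linked above. The function name `wilson_lower_bound`, the `z` parameter and its default of 1.96 (roughly a 95% two-sided interval) are assumptions for illustration, not the gist's own code.

```python
from math import sqrt


def wilson_lower_bound(ups, downs, z=1.96):
    """Lower bound of the Wilson score interval for the upvote fraction."""
    n = ups + downs
    if n == 0:
        return 0.0  # no votes: nothing to be confident about
    phat = float(ups) / n  # observed fraction of positive votes
    centre = phat + z * z / (2 * n)
    margin = z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


# More votes at the same ratio give a higher (more confident) score.
print(wilson_lower_bound(1, 0) < wilson_lower_bound(100, 0))  # True
```

Sorting by this lower bound, not by the raw ratio, keeps an item with a single upvote from outranking one with many upvotes and a few downvotes.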
/* =========================================================================
CurveZMQ - authentication and confidentiality for 0MQ
-------------------------------------------------------------------------
Copyright (c) 1991-2013 iMatix Corporation <www.imatix.com>
Copyright other contributors as noted in the AUTHORS file.
This is free software; you can redistribute it and/or modify it under
the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 3 of the License, or (at
from scrapy import log
from scrapy.item import Item
from scrapy.http import Request
from scrapy.contrib.spiders import XMLFeedSpider
def NextURL():
    """
    Generate a list of URLs to crawl. You can query a database or come up with some other means
    Note that if you generate URLs to crawl from a scraped URL then you're better off using a
<?xml version="1.0" encoding="UTF-8" ?>
<Data>
<Series>
<id>83462</id>
<Actors>|Nathan Fillion|Stana Katic|Molly C. Quinn|Jon Huertas|Seamus Dever|Tamala Jones|Susan Sullivan|Ruben Santiago-Hudson|Monet Mazur|</Actors>
<Airs_DayOfWeek>Monday</Airs_DayOfWeek>
<Airs_Time>10:00 PM</Airs_Time>
<ContentRating>TV-PG</ContentRating>
<FirstAired>2009-03-09</FirstAired>
<Genre>|Drama|</Genre>