R Max Espinoza (rmax)

--- a/scrapy/contrib/downloadermiddleware/httpcompression.py
+++ b/scrapy/contrib/downloadermiddleware/httpcompression.py
@@ -33,7 +33,11 @@ class HttpCompressionMiddleware(object):
     def _decode(self, body, encoding):
         if encoding == 'gzip' or encoding == 'x-gzip':
-            body = gunzip(body)
+            try:
+                body = gunzip(body)
+            except IOError:
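The patch above stops `gunzip` errors from propagating, but what happens inside the `except IOError:` branch is cut off in this preview. A minimal standalone sketch of the same guard using Python 3's `gzip` module follows; `safe_gunzip` is a hypothetical name, and returning the body undecoded on failure is an assumption, not necessarily what the patch does.

```python
import gzip
import zlib


def safe_gunzip(body: bytes) -> bytes:
    """Decompress a gzip body, falling back to the raw bytes on failure.

    Hypothetical helper: the fallback behaviour is an assumption of this sketch.
    """
    try:
        return gzip.decompress(body)
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (bad magic/header) and the legacy
        # IOError alias; EOFError signals a truncated stream; zlib.error
        # signals corrupt deflate data.
        return body
```

Usage: `safe_gunzip(gzip.compress(b"hello"))` gives back `b"hello"`, while a body that is not gzip at all is returned unchanged instead of raising.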
rmax / avg.orig.py
Created December 30, 2011 00:34
non-pythonic vs pythonic code
from __future__ import division
def c_avrg(the_dict, exclude):
    """Calculate the average excluding the given element."""
    i = 0
    total = 0
    for e in the_dict:
        if e != exclude:
            i += 1
            total += the_dict[e]
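The preview cuts off before `c_avrg` returns; presumably it divides `total` by `i`. A more Pythonic equivalent, sketched for Python 3 (where `/` is already true division, so the `__future__` import is unnecessary), filters the values with a comprehension and lets `sum`/`len` do the counting. `avg_excluding` is a hypothetical name; like the loop version, it raises `ZeroDivisionError` if no values remain.

```python
def avg_excluding(the_dict, exclude):
    """Average of the dict's values, skipping the entry whose key is `exclude`."""
    values = [v for k, v in the_dict.items() if k != exclude]
    # Raises ZeroDivisionError when nothing is left to average.
    return sum(values) / len(values)
```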
rmax / example usage
Created December 2, 2011 03:40
script to download mp3 files from contenidos.comteco.com.bo
bash$ python mp3box.py http://contenidos.comteco.com.bo/component/content/article/15-mp3-box/6434-top-40-usa.html
Downloading adele-rolling_in_the_deep.mp3 to /home/rolando/adele-rolling_in_the_deep.mp3
Downloading blake_shelton-honey_bee.mp3 to /home/rolando/blake_shelton-honey_bee.mp3
Downloading bruno_mars-grenade.mp3 to /home/rolando/bruno_mars-grenade.mp3
Downloading bruno_mars-just_the_way_you_are.mp3 to /home/rolando/bruno_mars-just_the_way_you_are.mp3
...
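Only the script's output is shown here. Judging from it, each `.mp3` link keeps its file name and is saved into a local directory (here `/home/rolando`). A rough Python 3 sketch of that step, assuming the links have already been scraped from the page; `plan_downloads` and `download` are hypothetical helpers, not code from the gist.

```python
import os
import shutil
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen


def plan_downloads(links, dest_dir):
    """Return (url, local_path) pairs for every link whose path ends in .mp3."""
    plan = []
    for url in links:
        path = urlsplit(url).path
        if path.lower().endswith(".mp3"):
            # Reuse the last path segment (URL-decoded) as the local file name.
            name = unquote(path.rsplit("/", 1)[-1])
            plan.append((url, os.path.join(dest_dir, name)))
    return plan


def download(plan):
    """Fetch each planned URL, printing progress in the same style as above."""
    for url, local_path in plan:
        print("Downloading %s to %s" % (os.path.basename(local_path), local_path))
        with urlopen(url) as response, open(local_path, "wb") as out:
            shutil.copyfileobj(response, out)
```

Splitting planning from downloading keeps the file-naming logic testable without network access.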
rmax / gist:1350162
Created November 9, 2011 02:41
bash prompt
# http://i.imgur.com/Tg068.png
PS1='\[\033[01;34m\]`if [ \$? = 0 ]; then echo \[\e[0\;32m\]\(^_^\); else echo \[\e[0\;31m\]\(0_0\); fi` ~ ${SECONDS}s\n\[\e[1;34m\][\t \u@\h:\w]\n\[\033[1;35m\]$>\[\033[00m\] '
rmax / gist:1250036
Created September 29, 2011 05:20
aggregate tweets in 15-minute slots
import sys
from disco import func
from disco.core import Job
def mapper((id, tweet), params):
    import rfc822
    from datetime import datetime, timedelta
    from time import mktime
    utc_dt = datetime.fromtimestamp(mktime(rfc822.parsedate(tweet['created_at'])))
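The preview stops right after the mapper parses `created_at`. The slotting step itself can be sketched in plain Python 3 without Disco: floor each timestamp to the start of its 15-minute window and count tweets per window. This assumes Twitter's classic `created_at` layout (e.g. `Wed Sep 28 10:01:00 +0000 2011`); `slot_start` and `count_by_slot` are hypothetical names, and in the Disco job the counting would presumably be split between mapper and reducer rather than done in one loop.

```python
from collections import Counter
from datetime import datetime

# Assumed Twitter created_at layout, e.g. "Wed Sep 28 10:01:00 +0000 2011".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"
SLOT_MINUTES = 15


def slot_start(dt):
    """Floor a datetime to the beginning of its 15-minute slot."""
    return dt.replace(minute=dt.minute - dt.minute % SLOT_MINUTES,
                      second=0, microsecond=0)


def count_by_slot(created_at_values):
    """Count tweets per 15-minute slot, keyed by the slot's start time."""
    counts = Counter()
    for created_at in created_at_values:
        dt = datetime.strptime(created_at, CREATED_AT_FORMAT)
        counts[slot_start(dt)] += 1
    return counts
```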
rmax / merge_dicts.py
Created September 21, 2011 21:05
using itertools's chain and groupby to merge a list of dictionaries
def merge_dicts(dict_list):
    """Merge all values from dict list into a single dict
    >>> d1 = {'a': 1, 'b': 2}
    >>> d2 = {'a': 2, 'b': 3}
    >>> merge_dicts([d1, d2])
    {'a': [1, 2], 'b': [2, 3]}
    """
    kviter = chain.from_iterable(d.iteritems() for d in dict_list)
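The preview ends right after flattening the key/value pairs. One way to finish it with `groupby`, written for Python 3 (`items()` instead of `iteritems()`), is sketched below. Because `groupby` only merges adjacent items, the pairs are sorted by key first, which assumes the keys are mutually comparable. This is a sketch, not the gist's remaining lines.

```python
from itertools import chain, groupby
from operator import itemgetter


def merge_dicts(dict_list):
    """Merge all values from dict list into a single dict

    >>> d1 = {'a': 1, 'b': 2}
    >>> d2 = {'a': 2, 'b': 3}
    >>> merge_dicts([d1, d2])
    {'a': [1, 2], 'b': [2, 3]}
    """
    kviter = chain.from_iterable(d.items() for d in dict_list)
    # groupby() only groups consecutive equal keys, so sort by key first.
    # sorted() is stable, so each key's values keep their input order.
    ordered = sorted(kviter, key=itemgetter(0))
    return {key: [value for _, value in pairs]
            for key, pairs in groupby(ordered, key=itemgetter(0))}
```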
rmax / dupefilter.py
Created August 28, 2011 23:12
A Redis-based request dupefilter for Scrapy
import redis
from scrapy.dupefilter import BaseDupeFilter
from scrapy.utils.request import request_fingerprint
class RedisDupeFilter(BaseDupeFilter):
    def __init__(self, host, port):
        self.redis = redis.Redis(host, port)
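Only the constructor survives in this preview. The core idea of a Redis-backed dupefilter can be sketched as: store each request fingerprint in a Redis set and treat `SADD` returning 0 (member already present) as "seen". The sketch below is framework-free and hedged: `SetDupeFilter`, `fingerprint` and `InMemorySetStore` are hypothetical names, the fingerprint is a simplified stand-in for Scrapy's `request_fingerprint` (no URL canonicalization), and the in-memory store only mimics the return value of redis-py's `sadd` so the logic can run without a server.

```python
import hashlib


def fingerprint(method, url, body=b""):
    """Simplified request fingerprint: SHA-1 over method, URL and body."""
    h = hashlib.sha1()
    h.update(method.upper().encode("utf-8") + b"\0")
    h.update(url.encode("utf-8") + b"\0")
    h.update(body)
    return h.hexdigest()


class InMemorySetStore:
    """Stand-in for a redis.Redis client; sadd() returns how many members were new."""

    def __init__(self):
        self._sets = {}

    def sadd(self, name, *values):
        members = self._sets.setdefault(name, set())
        before = len(members)
        members.update(values)
        return len(members) - before


class SetDupeFilter:
    """Dupefilter logic over any client exposing sadd(name, *values)."""

    def __init__(self, store, key="dupefilter:fingerprints"):
        self.store = store
        self.key = key

    def request_seen(self, method, url, body=b""):
        # SADD returns 1 if the fingerprint was added, 0 if it was already there.
        return self.store.sadd(self.key, fingerprint(method, url, body)) == 0
```

With a real server, passing `redis.Redis(host, port)` as the store should behave the same way, since redis-py's `sadd` also returns the number of members added; using a set keeps the check-and-insert a single atomic round trip.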
rmax / bencode.py
Created August 5, 2011 02:14
BitTorrent format encoder/decoder that I found somewhere on the internet
# The contents of this file are subject to the Python Software Foundation
# License Version 2.3 (the License). You may not copy or use this file, in
# either source code or executable form, except in compliance with the License.
# You may obtain a copy of the License at http://www.python.org/license.
#
# Software distributed under the License is distributed on an AS IS basis,
# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
# for the specific language governing rights and limitations under the
# License.
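The listing shows only the license header of this file. The bencode format itself is small: integers as `i<n>e`, byte strings as `<length>:<bytes>`, lists as `l...e`, and dictionaries as `d...e` with keys in sorted order. A compact Python 3 encoder/decoder follows as an independent illustration, not the code from the gist; `bencode`, `bdecode` and `_decode` are names chosen here.

```python
def bencode(obj):
    """Encode ints, str/bytes, lists and dicts into bencoded bytes."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode("utf-8")
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(item) for item in obj) + b"e"
    if isinstance(obj, dict):
        # Keys are byte strings, emitted in sorted (raw byte) order.
        pairs = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in obj.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError("cannot bencode %r" % type(obj))


def _decode(data, i):
    """Decode one value starting at index i; return (value, next_index)."""
    lead = data[i:i + 1]
    if lead == b"i":
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead in (b"l", b"d"):
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            value, i = _decode(data, i)
            items.append(value)
        if lead == b"l":
            return items, i + 1
        # Dict entries alternate key, value.
        return dict(zip(items[::2], items[1::2])), i + 1
    if lead.isdigit():
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        if start + length > len(data):
            raise ValueError("truncated string at index %d" % i)
        return data[start:start + length], start + length
    raise ValueError("invalid bencode at index %d" % i)


def bdecode(data):
    """Decode a complete bencoded byte string."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after index %d" % end)
    return value
```

Decoded strings come back as `bytes`, since bencode carries no text encoding of its own.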
rmax / parse_urls.py
Created June 2, 2011 14:35
using scrapy without scrapy
"""
python parse_urls.py http://somesite/foo/ ".pdf\$"
"""
import sys
import urllib2
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.http import HtmlResponse
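The preview stops at the imports; `SgmlLinkExtractor` and `urllib2` belong to old Scrapy releases and Python 2. A rough standard-library equivalent of the "scrapy without scrapy" idea is sketched below, assuming the script fetches the page, collects `<a href>` links, resolves them against the page URL and keeps those matching the regex given on the command line (e.g. `".pdf$"`). `LinkCollector` and `extract_links` are hypothetical names.

```python
import re
import sys
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url, pattern):
    """Return absolute, de-duplicated links whose URL matches `pattern`."""
    collector = LinkCollector()
    collector.feed(html)
    regex = re.compile(pattern)
    links = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)
        if regex.search(url) and url not in links:
            links.append(url)
    return links


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python parse_urls.py URL REGEX
    page_url, pattern = sys.argv[1], sys.argv[2]
    with urlopen(page_url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    for link in extract_links(html, page_url, pattern):
        print(link)
```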
rmax / datauri.py
Created March 29, 2011 19:46
single script to convert an image into a data URI
#!/usr/bin/env python
"""
Simple script to convert an image into a data URI.
More info http://en.wikipedia.org/wiki/Data_URI_scheme
"""
import base64
import mimetypes
import sys
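The preview ends at the imports, which suggest the approach: guess the MIME type from the file name with `mimetypes`, base64-encode the bytes, and prefix `data:<type>;base64,`. A minimal Python 3 sketch along those lines; `to_data_uri` and the `application/octet-stream` fallback for unknown extensions are assumptions, not the gist's code.

```python
import base64
import mimetypes
import sys


def to_data_uri(data, filename):
    """Build a base64 data URI for `data`, typed by `filename`'s extension."""
    mime, _ = mimetypes.guess_type(filename)
    # Fallback for unknown extensions (an assumption of this sketch).
    mime = mime or "application/octet-stream"
    return "data:%s;base64,%s" % (mime, base64.b64encode(data).decode("ascii"))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python datauri.py image.png
    with open(sys.argv[1], "rb") as f:
        print(to_data_uri(f.read(), sys.argv[1]))
```

Taking the bytes and file name separately keeps the encoding logic independent of file I/O.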