Mike Sukmanowsky msukmanowsky

@msukmanowsky
msukmanowsky / pyspark_cassandra.py
Last active August 29, 2015 14:08
Work in progress ideas for a PySpark binding to the DataStax Cassandra-Spark Connector.
from pyspark.context import SparkContext
from pyspark.serializers import BatchedSerializer, PickleSerializer
from pyspark.rdd import RDD
from py4j.java_gateway import java_import
class CassandraSparkContext(SparkContext):
    def _do_init(self, *args, **kwargs):
@msukmanowsky
msukmanowsky / CassandraConverters.scala
Last active August 29, 2015 14:08
Custom version of the CassandraConverters.scala found in Spark at spark/examples/src/main/scala/org/apache/spark/examples/pythonconverters/CassandraConverters.scala. Provides better (though not perfect) serialization of keys and values for CqlOutputFormat.
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
@msukmanowsky
msukmanowsky / storm_version.py
Last active August 29, 2015 14:07
Parse Apache Storm versions in Python and compare them easily. You could probably import something from https://github.com/pypa/pip/blob/19e29fc2e8e57a671e584726655bbb42c6e15eee/pip/_vendor/distlib/version.py instead and it would work just fine, but I haven't tested that.
import re
class InvalidVersionException(Exception): pass
class StormVersion(object):
    VERSION_RE = re.compile(r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
                            r"(?P<older_patch>\.\d+)?(?P<other>.*)")
    RC_RE = re.compile(r"-rc(?P<release_candidate>\d+)", re.IGNORECASE)
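The preview ends before any comparison logic is shown. As a rough sketch of how the two patterns above could support comparisons, the snippet below turns a version string into a sortable tuple. The `parse_storm_version` helper and its tuple layout are assumptions for illustration, not the gist's actual `StormVersion` API.

```python
import re

VERSION_RE = re.compile(r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
                        r"(?P<older_patch>\.\d+)?(?P<other>.*)")
RC_RE = re.compile(r"-rc(?P<release_candidate>\d+)", re.IGNORECASE)


def parse_storm_version(text):
    """Hypothetical helper: turn a version string into a sortable tuple."""
    match = VERSION_RE.match(text)
    if match is None:
        raise ValueError("invalid Storm version: %r" % (text,))
    older_patch = match.group("older_patch")
    rc = RC_RE.search(match.group("other"))
    return (
        int(match.group("major")),
        int(match.group("minor")),
        int(match.group("patch")),
        # Strip the leading dot from a fourth component such as ".1".
        int(older_patch[1:]) if older_patch else 0,
        # A final release has no "-rcN" suffix and must sort after every candidate.
        int(rc.group("release_candidate")) if rc else float("inf"),
    )
```

Because the result is a plain tuple, the usual comparison operators work directly, for example `parse_storm_version("0.9.0-rc2") < parse_storm_version("0.9.0")`.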
@msukmanowsky
msukmanowsky / custom_code_bolt.py
Last active August 29, 2015 14:05
A custom code execution bolt, not yet tested.
import logging
from streamparse.bolt import Bolt
log = logging.getLogger("custom_code_bolt")
class CustomCodeBolt(Bolt):
<!DOCTYPE html>
<html lang="en">
<head>
<title>TODO</title>
<!-- CSS -->
<link rel="stylesheet" href="http://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css">
<link rel="stylesheet" href="http://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap-theme.min.css">
<style>
.done {
text-decoration: line-through;

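Python 2.7's `urlparse.unquote` has a bug when given a percent-encoded `unicode` string: each escaped byte is decoded as its own Latin-1 code point instead of the byte sequence being decoded as UTF-8. For example: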

>>> import urlparse
>>> url = u"http%3A%2F%2F%C5%A1%C4%BC%C5%AB%C4%8D.org%2F"
>>> print "{!r}".format(urlparse.unquote(url))
u'http://\xc5\xa1\xc4\xbc\xc5\xab\xc4\x8d.org/'
>>> print urlparse.unquote(url)
http://šļū�.org/
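The same effect can be reproduced on Python 3, where `urllib.parse.unquote` accepts an explicit `encoding` argument. The sketch below is only an illustration of the two decodings, not the gist's monkey patch: decoding the escaped bytes as Latin-1 gives the mangled string shown above, while the default UTF-8 decoding gives the intended host name.

```python
from urllib.parse import unquote

url = "http%3A%2F%2F%C5%A1%C4%BC%C5%AB%C4%8D.org%2F"

# Each escaped byte becomes its own code point, as in the Python 2.7 output above.
mangled = unquote(url, encoding="latin-1")

# The escaped bytes are decoded together as UTF-8 (the Python 3 default).
decoded = unquote(url)

# ascii() keeps the output printable regardless of the terminal's encoding.
print(ascii(mangled))
print(ascii(decoded))
```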
@msukmanowsky
msukmanowsky / url_monkey.py
Created June 23, 2014 20:59
Monkey patches needed to fix a bug in how Unicode percent-encoded strings are handled in Python's unquote function.
import urlparse
import urllib
import urllib2
def patch_unquote():
    urllib.unquote = unquote
    urllib2.unquote = unquote
    urlparse.unquote = unquote
{
"metadata": {
"name": "",
"signature": "sha256:d0e242ec0ee3bf0798a38aba54eda99ab710de1890ddab0f0b6fd91939170314"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
@msukmanowsky
msukmanowsky / cbloomfilter.pyx
Last active August 29, 2015 13:59
An implementation of a Bloom filter in Cython. It still has a memory leak to debug.
# ported from https://github.com/jvirkki/libbloom
from cpython cimport bool
from libc.stdlib cimport malloc, calloc, free
from libc.string cimport memset
from libc.stdio cimport printf
from libc.math cimport log, ceil
from cpython.mem cimport PyMem_Malloc, PyMem_Free
DEF LN2_SQUARED = 0.480453013918201 # ln(2)^2
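For context, ln(2)^2 is the denominator in the standard Bloom filter sizing formula: for n entries and a target false-positive rate p, the optimal bit count is m = -n ln(p) / ln(2)^2, with k = ceil(ln(2) * m / n) hash functions. The pure-Python sketch below shows only that arithmetic. The `bloom_parameters` name is hypothetical, and none of this is taken from the gist's Cython code.

```python
import math

LN2 = math.log(2)
LN2_SQUARED = LN2 ** 2  # approximately 0.480453013918201


def bloom_parameters(entries, error):
    """Return (bits, hashes) for `entries` items at false-positive rate `error`."""
    if entries < 1 or not 0.0 < error < 1.0:
        raise ValueError("entries must be >= 1 and 0 < error < 1")
    # Optimal number of bits per stored entry for the requested error rate.
    bits_per_entry = -math.log(error) / LN2_SQUARED
    bits = int(math.ceil(entries * bits_per_entry))
    # Optimal number of hash functions for that bits-per-entry ratio.
    hashes = int(math.ceil(LN2 * bits_per_entry))
    return bits, hashes
```

With 1,000 entries and a 1% error rate this works out to roughly 9.6 bits per entry and 7 hash functions.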
@msukmanowsky
msukmanowsky / save_dict_list.py
Created March 12, 2014 14:42
Handy little decorator to cache a list of dictionaries returned from a long-running operation such as a web query.
from functools import wraps
import csv
import os.path
def save_dict_list(filename, **kwargs):
    """Decorator to take the results of a function call (assumed to be a
    ``list`` of ``dicts``), cache them in a local file via
    csv.DictWriter, and deserialize them with csv.DictReader."""
    def decorator(f):
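The preview stops at this point. As a minimal sketch of the idea in the docstring, written for Python 3 and not taken from the gist, a CSV-backed cache could look like the following. The `cache_dict_list` name is hypothetical, and the sketch assumes every dict shares the keys of the first one.

```python
import csv
import os.path
from functools import wraps


def cache_dict_list(filename):
    """Hypothetical decorator: cache a list of dicts in a CSV file."""
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            if os.path.exists(filename):
                # Cache hit: read the rows back (all values come back as str).
                with open(filename, newline="") as fp:
                    return [dict(row) for row in csv.DictReader(fp)]
            rows = f(*args, **kwargs)
            if rows:
                # Cache miss: write a header plus one CSV row per dict.
                with open(filename, "w", newline="") as fp:
                    writer = csv.DictWriter(fp, fieldnames=list(rows[0]))
                    writer.writeheader()
                    writer.writerows(rows)
            return rows
        return wrapper
    return decorator
```

One design consequence worth noting: CSV stores everything as text, so values read back from the cache are strings even if the wrapped function returned integers or floats.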