Skip to content

Instantly share code, notes, and snippets.

@drorata
drorata / gist:6d6be93ca74edffe0760
Created July 16, 2015 07:26
Word count in a file stored on S3 using Spark (python version)
from pyspark import SparkContext
sc = SparkContext(appName = "simple app")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "yourAccessKeyId")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "yourSecretAccessKey")
text_file = sc.textFile("s3n://bucketName/filename.tar.gz")
counts = text_file.flatMap(lambda line: line.split(" ")) \
@ololobus
ololobus / Spark+ipython_on_MacOS.md
Last active March 18, 2026 03:40
Apache Spark installation + ipython/jupyter notebook integration guide for macOS

Apache Spark installation + ipython/jupyter notebook integration guide for macOS

Tested with Apache Spark 2.1.0, Python 2.7.13 and Java 1.8.0_112

For older versions of Spark and ipython, please, see also previous version of text.

Install Java Development Kit

@artjomb
artjomb / infiniteScroll_1.py
Last active June 12, 2023 00:00
infinite scroll of stackstatus with python in phantomjs
import selenium
import time
from selenium import webdriver
browser = webdriver.PhantomJS("phantomjs")
browser.get("https://twitter.com/StackStatus")
print browser.title
pause = 3
@adrianorsouza
adrianorsouza / sublime-command-line.md
Last active September 26, 2023 16:26
launch sublime text from the command line

Launch Sublime Text from the command line on OSX

Sublime Text includes a command line tool, subl, to work with files on the command line. This can be used to open files and projects in Sublime Text, as well working as an EDITOR for unix tools, such as git and subversion.

Requirements

  • Sublime text 2 or 3 installed in your system within Applications folder

Setup

@bsweger
bsweger / useful_pandas_snippets.md
Last active March 4, 2026 19:59
Useful Pandas Snippets

Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)

@julionc
julionc / 00.howto_install_phantomjs.md
Last active February 1, 2026 03:05
How to install PhantomJS on Debian/Ubuntu

How to install PhantomJS on Ubuntu

Version: 1.9.8

Platform: x86_64

First, install or update to the latest system software.

sudo apt-get update
sudo apt-get install build-essential chrpath libssl-dev libxft-dev
@nodesocket
nodesocket / bootstrap.flatten.css
Last active October 20, 2025 11:48
Below are simple styles to "flatten" bootstrap. I didn't go through every control and widget that bootstrap offers, only what was required for https://commando.io, so your milage may vary.
/* Flatten das boostrap */
.well, .navbar-inner, .popover, .btn, .tooltip, input, select, textarea, pre, .progress, .modal, .add-on, .alert, .table-bordered, .nav>.active>a, .dropdown-menu, .tooltip-inner, .badge, .label, .img-polaroid {
-moz-box-shadow: none !important;
-webkit-box-shadow: none !important;
box-shadow: none !important;
-webkit-border-radius: 0px !important;
-moz-border-radius: 0px !important;
border-radius: 0px !important;
border-collapse: collapse !important;
background-image: none !important;
@christianroman
christianroman / test.py
Created May 30, 2013 16:02
Bypass Captcha using 10 lines of code with Python, OpenCV & Tesseract OCR engine
import cv2.cv as cv
import tesseract
gray = cv.LoadImage('captcha.jpeg', cv.CV_LOAD_IMAGE_GRAYSCALE)
cv.Threshold(gray, gray, 231, 255, cv.CV_THRESH_BINARY)
api = tesseract.TessBaseAPI()
api.Init(".","eng",tesseract.OEM_DEFAULT)
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
api.SetPageSegMode(tesseract.PSM_SINGLE_WORD)
tesseract.SetCvImage(gray,api)
print api.GetUTF8Text()
@mgronhol
mgronhol / trie.py
Created May 13, 2012 16:10
Trie in Python
#!/usr/bin/env python
class Node( object ):
def __init__( self, end_node = False ):
self.end_node = end_node
self.prefix_count = 0
self.children = {}
@ChrisWills
ChrisWills / .screenrc-main-example
Created November 3, 2011 17:50
A nice default screenrc
# GNU Screen - main configuration file
# All other .screenrc files will source this file to inherit settings.
# Author: Christian Wills - [email protected]
# Allow bold colors - necessary for some reason
attrcolor b ".I"
# Tell screen how to set colors. AB = background, AF=foreground
termcapinfo xterm 'Co#256:AB=\E[48;5;%dm:AF=\E[38;5;%dm'