A personal diary of DataFrame munging over the years.
Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)
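A minimal sketch of that conversion, assuming a Series s that was read in as strings; both forms below raise a ValueError if any value can't be parsed as a number (pd.to_numeric defaults to errors='raise').

import pandas as pd

s = pd.Series(['1', '2', '3.5'])
s_num = s.astype(float)    # raises ValueError on non-numeric values
s_num = pd.to_numeric(s)   # same behaviour; errors='raise' is the default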
# Remove spaces and line endings from test.txt and write the result to minify1.txt,
# stopping after the first line that contains 'str'.
with open('test.txt') as src, open('minify1.txt', 'w') as dst:
    for line in src:
        dst.write(line.rstrip('\r\n').replace(' ', ''))
        if 'str' in line:
            break
require 'uri'

def get_proxy_url
  # Doesn't support different proxies for different protocols at present.
  host_proxy = ENV['http_proxy'] || ENV['HTTP_PROXY'] || ENV['https_proxy'] || ENV['HTTPS_PROXY']
  if host_proxy
    uri = URI(host_proxy)
    if ['localhost', '127.0.0.1'].include? uri.host
      # 10.0.2.2 is the default Vagrant gateway and should connect to the host OS.
      # Confirm this by running 'netstat -r' in the guest.
      host_proxy = host_proxy.sub(uri.host, '10.0.2.2')
    end
  end
  host_proxy
end
# When you're sure of the format, it's much quicker to explicitly convert your dates than use `parse_dates`.
# Makes sense; was just surprised by the time difference.
import pandas as pd
from datetime import datetime

to_datetime = lambda d: datetime.strptime(d, '%m/%d/%Y %H:%M')

%time trips = pd.read_csv('data/divvy/Divvy_Trips_2013.csv', parse_dates=['starttime', 'stoptime'])
# CPU times: user 1min 29s, sys: 331 ms, total: 1min 29s
# Wall time: 1min 30s
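For comparison, a minimal sketch of the explicit-conversion path the note recommends, reusing the to_datetime helper defined above and the same column names; the timing output for this variant isn't reproduced here.

# Read without date parsing, then convert the two columns explicitly.
trips = pd.read_csv('data/divvy/Divvy_Trips_2013.csv')
trips['starttime'] = trips['starttime'].map(to_datetime)
trips['stoptime'] = trips['stoptime'].map(to_datetime)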
#!/usr/bin/env python
"""
Serialize/unserialize a class with a pandas data structure attribute using msgpack.
"""
import msgpack
import numpy as np
import pandas as pd
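The rest of that snippet isn't shown above, so here is a minimal sketch of one way to do it; the Record class and the encode/decode hooks are illustrative names, not necessarily the original approach.

class Record:
    """Toy container with a DataFrame attribute."""
    def __init__(self, name, frame):
        self.name = name
        self.frame = frame

def encode(obj):
    # Called by msgpack for types it can't pack natively:
    # flatten the DataFrame into plain Python containers.
    if isinstance(obj, pd.DataFrame):
        return {'__dataframe__': True,
                'columns': list(obj.columns),
                'data': obj.values.tolist()}
    raise TypeError('cannot serialize %r' % obj)

def decode(obj):
    # Called for every unpacked map: rebuild DataFrames, pass everything else through.
    if obj.get('__dataframe__'):
        return pd.DataFrame(obj['data'], columns=obj['columns'])
    return obj

rec = Record('demo', pd.DataFrame({'a': np.arange(3), 'b': [0.1, 0.2, 0.3]}))
packed = msgpack.packb({'name': rec.name, 'frame': rec.frame}, default=encode)
restored = msgpack.unpackb(packed, object_hook=decode, raw=False)
rec2 = Record(restored['name'], restored['frame'])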
A short note on how to run Postgres/PostGIS on Windows without installing it:
- Download the PG binaries (ZIP) and unpack them:
  http://www.enterprisedb.com/products-services-training/pgbindownload
- Save the batch file from the attachment, or as described at
  http://www.postgresonline.com/journal/archives/172-Starting-PostgreSQL-in-windows-without-install.html,
  in the top-level directory (the directory that contains /bin).
- On the FIRST run of run.bat, the marked line must be executed; after that, comment it out again.
import pandas as pd

# Render DataFrames in the notebook as HTML tables with Bootstrap's 'table table-striped' classes.
pd.DataFrame._repr_html_ = lambda self: self.to_html(classes='table table-striped')
Source: http://jakeaustwick.me/python-web-scraping-resource/

Python web scraping resource
Jake Austwick, 09 Mar 2014 on requests | python | lxml | scrape | proxies | web crawler | download images

If you need to extract data from a web page, then the chances are you looked for their API. Unfortunately this isn't always available and you sometimes have to fall back to web scraping.
In this article I'm going to cover a lot of the things that apply to all web scraping projects and how to overcome some common gotchas.
Please Note: This is a work in progress. I am adding more things as I come across them. Got a suggestion? Drop me an email - [email protected]
@echo off
SET st3Path=C:\Program Files\Sublime Text 3\sublime_text.exe

rem add it for all file types
@reg add "HKEY_CLASSES_ROOT\*\shell\Open with Sublime Text 3" /t REG_SZ /v "" /d "Open with Sublime Text 3" /f
@reg add "HKEY_CLASSES_ROOT\*\shell\Open with Sublime Text 3" /t REG_EXPAND_SZ /v "Icon" /d "%st3Path%,0" /f
@reg add "HKEY_CLASSES_ROOT\*\shell\Open with Sublime Text 3\command" /t REG_SZ /v "" /d "%st3Path% \"%%1\"" /f

rem add it for folders (Icon and command entries mirror the file-type ones above)
@reg add "HKEY_CLASSES_ROOT\Folder\shell\Open with Sublime Text 3" /t REG_SZ /v "" /d "Open with Sublime Text 3" /f
@reg add "HKEY_CLASSES_ROOT\Folder\shell\Open with Sublime Text 3" /t REG_EXPAND_SZ /v "Icon" /d "%st3Path%,0" /f
@reg add "HKEY_CLASSES_ROOT\Folder\shell\Open with Sublime Text 3\command" /t REG_SZ /v "" /d "%st3Path% \"%%1\"" /f