A personal diary of DataFrame munging over the years.
Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)
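A minimal sketch of that conversion, assuming a Series s that was read in as strings; both forms below raise a ValueError if any value can't be parsed as a number (pd.to_numeric defaults to errors='raise').

import pandas as pd

s = pd.Series(['1', '2', '3.5'])
s_num = s.astype(float)    # raises ValueError on non-numeric values
s_num = pd.to_numeric(s)   # same behaviour; errors='raise' is the default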
# Remove spaces and line endings from test.txt and write the result to minify1.txt,
# stopping after the first line that contains 'str'.
with open('test.txt') as src, open('minify1.txt', 'w') as dst:
    for line in src:
        dst.write(line.rstrip('\r\n').replace(' ', ''))
        if 'str' in line:
            break
require 'uri'

def get_proxy_url
  # Doesn't support different proxies for different protocols at present.
  host_proxy = ENV['http_proxy'] || ENV['HTTP_PROXY'] || ENV['https_proxy'] || ENV['HTTPS_PROXY']
  if host_proxy
    uri = URI(host_proxy)
    if ['localhost', '127.0.0.1'].include? uri.host
      # 10.0.2.2 is the default Vagrant gateway and should connect to the host OS.
      # Confirm this by running 'netstat -r' in the guest.
      host_proxy = host_proxy.sub(uri.host, '10.0.2.2')
    end
  end
  host_proxy
end
# When you're sure of the format, it's much quicker to explicitly convert your dates than use `parse_dates`.
# Makes sense; was just surprised by the time difference.
import pandas as pd
from datetime import datetime

to_datetime = lambda d: datetime.strptime(d, '%m/%d/%Y %H:%M')

%time trips = pd.read_csv('data/divvy/Divvy_Trips_2013.csv', parse_dates=['starttime', 'stoptime'])
# CPU times: user 1min 29s, sys: 331 ms, total: 1min 29s
# Wall time: 1min 30s
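For comparison, a minimal sketch of the explicit-conversion path the note recommends, reusing the to_datetime helper defined above and the same column names; the timing output for this variant isn't reproduced here.

# Read without date parsing, then convert the two columns explicitly.
trips = pd.read_csv('data/divvy/Divvy_Trips_2013.csv')
trips['starttime'] = trips['starttime'].map(to_datetime)
trips['stoptime'] = trips['stoptime'].map(to_datetime)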
#!/usr/bin/env python
"""
Serialize/unserialize a class with a pandas data structure attribute using msgpack.
"""
import msgpack
import numpy as np
import pandas as pd
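The rest of that snippet isn't shown above, so here is a minimal sketch of one way to do it; the Record class and the encode/decode hooks are illustrative names, not necessarily the original approach.

class Record:
    """Toy container with a DataFrame attribute."""
    def __init__(self, name, frame):
        self.name = name
        self.frame = frame

def encode(obj):
    # Called by msgpack for types it can't pack natively:
    # flatten the DataFrame into plain Python containers.
    if isinstance(obj, pd.DataFrame):
        return {'__dataframe__': True,
                'columns': list(obj.columns),
                'data': obj.values.tolist()}
    raise TypeError('cannot serialize %r' % obj)

def decode(obj):
    # Called for every unpacked map: rebuild DataFrames, pass everything else through.
    if obj.get('__dataframe__'):
        return pd.DataFrame(obj['data'], columns=obj['columns'])
    return obj

rec = Record('demo', pd.DataFrame({'a': np.arange(3), 'b': [0.1, 0.2, 0.3]}))
packed = msgpack.packb({'name': rec.name, 'frame': rec.frame}, default=encode)
restored = msgpack.unpackb(packed, object_hook=decode, raw=False)
rec2 = Record(restored['name'], restored['frame'])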
A short note on how to run Postgres/PostGIS on Windows without installing it:
- Download the PG binaries (ZIP) and unpack them:
  http://www.enterprisedb.com/products-services-training/pgbindownload
- Save the batch file from the attachment, or as described at
  http://www.postgresonline.com/journal/archives/172-Starting-PostgreSQL-in-windows-without-install.html,
  in the top-level directory (the directory that contains /bin).
- On the FIRST run of run.bat, the marked line must be executed; after that, comment it out again.
import pandas as pd

# Render DataFrames in the notebook as HTML tables with Bootstrap's 'table table-striped' classes.
pd.DataFrame._repr_html_ = lambda self: self.to_html(classes='table table-striped')
Source: http://jakeaustwick.me/python-web-scraping-resource/

Python web scraping resource
Jake Austwick, 09 Mar 2014 on requests | python | lxml | scrape | proxies | web crawler | download images

If you need to extract data from a web page, then the chances are you looked for their API. Unfortunately this isn't always available and you sometimes have to fall back to web scraping.
In this article I'm going to cover a lot of the things that apply to all web scraping projects and how to overcome some common gotchas.
Please Note: This is a work in progress. I am adding more things as I come across them. Got a suggestion? Drop me an email - [email protected]
@echo off
SET st3Path=C:\Program Files\Sublime Text 3\sublime_text.exe

rem add it for all file types
@reg add "HKEY_CLASSES_ROOT\*\shell\Open with Sublime Text 3" /t REG_SZ /v "" /d "Open with Sublime Text 3" /f
@reg add "HKEY_CLASSES_ROOT\*\shell\Open with Sublime Text 3" /t REG_EXPAND_SZ /v "Icon" /d "%st3Path%,0" /f
@reg add "HKEY_CLASSES_ROOT\*\shell\Open with Sublime Text 3\command" /t REG_SZ /v "" /d "%st3Path% \"%%1\"" /f

rem add it for folders (Icon and command entries mirror the file-type ones above)
@reg add "HKEY_CLASSES_ROOT\Folder\shell\Open with Sublime Text 3" /t REG_SZ /v "" /d "Open with Sublime Text 3" /f
@reg add "HKEY_CLASSES_ROOT\Folder\shell\Open with Sublime Text 3" /t REG_EXPAND_SZ /v "Icon" /d "%st3Path%,0" /f
@reg add "HKEY_CLASSES_ROOT\Folder\shell\Open with Sublime Text 3\command" /t REG_SZ /v "" /d "%st3Path% \"%%1\"" /f