[TOC]

Read

CSV

```python
import pandas as pd

# read_csv also accepts a URL; replace this placeholder with the address of an actual CSV file
data = pd.read_csv("http://www.google.de")
df = pd.read_csv('../data/example.csv', header=None)  # no header row: the first line is read as data, columns are labelled 0, 1, 2, ...
df = pd.read_csv('../data/example.csv', na_values=['.'])  # treat "." as a missing value

# treat "." and "NA" as missing values in the Last Name column and "." as missing in the Pre-Test Score column
sentinels = {'Last Name': ['.', 'NA'], 'Pre-Test Score': ['.']}
df = pd.read_csv('../data/example.csv', na_values=sentinels)

df = pd.read_csv('../data/example.csv', na_values=sentinels, skiprows=3)  # also skip the first 3 lines of the file
df = pd.read_csv('../data/example.csv', thousands=',')  # treat "," inside numbers as a thousands separator, e.g. "1,000" -> 1000
```
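A self-contained check of the per-column `na_values` and `thousands` options. It uses a small hypothetical in-memory CSV (wrapped in `io.StringIO`) instead of `../data/example.csv`, so the column names and values below are assumptions for illustration only:

```python
import io

import pandas as pd

# Hypothetical sample data held in memory, so the example runs without any file on disk
raw = """First Name,Last Name,Pre-Test Score,Income
Jason,Miller,4,"1,000"
Molly,.,.,"25,000"
Tina,NA,31,"3,500"
"""

# Per-column missing-value markers
sentinels = {'Last Name': ['.', 'NA'], 'Pre-Test Score': ['.']}

# The quoted "1,000"-style fields are parsed as integers because of thousands=','
df = pd.read_csv(io.StringIO(raw), na_values=sentinels, thousands=',')
print(df)
```

The "." in Pre-Test Score becomes `NaN`, so that column is read as floats; Income ends up as plain integers.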

Parsing dates

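```python
in_file = 'monthly_data.csv'  # hypothetical path: a CSV whose 'Month' column holds four-digit years such as 2015
# pd.datetime was removed in pandas 2.0 and date_parser is deprecated there; pandas >= 2.0 takes the format string directly
data = pd.read_csv(in_file, parse_dates=['Month'], index_col='Month', date_format='%Y')
```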
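The same parse can also be done after reading, which behaves the same on pandas versions before and after 2.0: read the column as text, convert it with `pd.to_datetime` and an explicit format, then make it the index. The data below is hypothetical:

```python
import io

import pandas as pd

# Hypothetical data: the 'Month' column holds four-digit years, matching the '%Y' format
raw = """Month,Sales
2015,120
2016,135
2017,150
"""

# Keep 'Month' as text so to_datetime sees strings like "2015"
data = pd.read_csv(io.StringIO(raw), dtype={'Month': str})
data['Month'] = pd.to_datetime(data['Month'], format='%Y')  # "2015" -> 2015-01-01
data = data.set_index('Month')
print(data)
```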

Excel

Import the Excel file and call it xls_file (reading the legacy .xls format requires the xlrd package)

```python
xls_file = pd.ExcelFile('../data/example.xls')
```

Load the file's Sheet1 as a DataFrame

```python
df = xls_file.parse('Sheet1')
```

SAVE

CSV

```python
df.to_csv("../submission.csv", index=False)  # index=False: don't write the row index as an extra first column
```
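Called without a path, `to_csv` returns the CSV text as a string, which makes the effect of `index=False` easy to see. A small sketch with hypothetical data:

```python
import io

import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'score': [0.5, 0.75]})  # hypothetical example data

# With no path argument, to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()                # first column holds the row index 0, 1
without_index = df.to_csv(index=False)  # only the data columns
print(with_index)
print(without_index)

# Reading the index=False output back reproduces the original frame
round_trip = pd.read_csv(io.StringIO(without_index))
pd.testing.assert_frame_equal(round_trip, df)
```

Without `index=False`, reading the file back would produce an extra unnamed column holding the old index.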
