Fabian Schreiber fabsta

[TOC]

writing file

using pickle

try:
  f = open(pickle_file, 'wb')
  save = {
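The snippet above is cut off in this listing, so the contents of `save` are not known. As a minimal sketch of the same pattern, assuming a hypothetical payload and a temporary file path (neither is from the original), writing and reading back with `pickle` could look like this:

```python
import os
import pickle
import tempfile

# Hypothetical payload: the original dict contents are not shown in the listing.
save = {"train_dataset": [1, 2, 3], "train_labels": [0, 1, 0]}

# Hypothetical target path in the system temp directory.
pickle_file = os.path.join(tempfile.gettempdir(), "example.pickle")

try:
    # The context manager closes the file even if dump() fails.
    with open(pickle_file, "wb") as f:
        pickle.dump(save, f, pickle.HIGHEST_PROTOCOL)
except OSError as e:
    print("Unable to save data to", pickle_file, ":", e)
    raise

# Read the object back to confirm the round trip.
with open(pickle_file, "rb") as f:
    restored = pickle.load(f)
```

Using `with` instead of a bare `open` avoids having to close the file by hand in a `finally` clause.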

DATA TYPES

determine the type of an object

type(2)         # returns 'int'
type(2.0)       # returns 'float'
type('two')     # returns 'str'
type(True)      # returns 'bool'
type(None)      # returns 'NoneType'
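When the goal is to check a type rather than display it, `isinstance` is usually the more convenient tool. A short sketch (the variable names are illustrative only):

```python
value = 2.0

is_float = isinstance(value, float)           # True
is_number = isinstance(value, (int, float))   # True: a tuple checks several types at once
bool_is_int = isinstance(True, int)           # True: bool is a subclass of int
exact_match = type(True) is int               # False: type() compares the exact class
```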

IMPORTS

'generic import' of math module

import math
math.sqrt(25)

import a function
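The listing is truncated here. Assuming the entry continues the `math` example above, importing a single function would presumably look like this:

```python
# Import only sqrt from the math module, so it can be called without the "math." prefix.
from math import sqrt

root = sqrt(25)  # 5.0
```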

[TOC]

read csv

train <- read.csv("../train.csv", stringsAsFactors = FALSE, row.names = 1)

with row indices

[TOC]

duplicates

table(complete.cases(df)) # counts rows with no missing values (TRUE) vs. rows with at least one NA (FALSE)

Imputing

impute missing values with linear regression

[TOC]

New dataframe

First Column as data frame

df[, 1, drop = FALSE]  # drop = FALSE keeps the first column as a data frame

mydata$newvar <- oldvar

[TOC]

Selecting

Cells

Y = data['TV'] # column
Y = data.TV
df.loc['Arizona'].iloc[2]  # Select the third cell in the row named Arizona
[TOC]
# numerical value
age_mean = df['Age'].mean()
df['Age'] = df['Age'].fillna(age_mean)
# categorical value
mode_embarked = df['Embarked'].mode()[0]  # most frequent value
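The imputation steps above assume an existing `df`. As a self-contained sketch on a small hypothetical frame (the values are invented for illustration), filling a numerical column with its mean and a categorical column with its most frequent value might look like this:

```python
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "Age": [22.0, None, 38.0, 30.0],
    "Embarked": ["S", "C", None, "S"],
})

# Numerical column: replace missing values with the column mean.
age_mean = df["Age"].mean()          # mean() skips missing values
df["Age"] = df["Age"].fillna(age_mean)

# Categorical column: replace missing values with the most frequent value.
# Series.mode() returns a Series (there can be ties), so take the first entry.
mode_embarked = df["Embarked"].mode()[0]
df["Embarked"] = df["Embarked"].fillna(mode_embarked)
```

This sketch uses pandas' own `Series.mode()`, which works directly on string columns.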
fabsta / 1. data import export (python data science).md
Last active November 21, 2017 18:05
#DataScience #InputOutput

[TOC]

Read

csv

import pandas as pd
### single
Y = data['TV'] # column
Y = data.TV
df[:2] # first two rows
df.loc['Maricopa'] # view a row
df.loc[:, 'coverage'] # view a column
df.loc['Yuma', 'coverage'] # view the value based on a row and column
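The selections above assume an existing `df` with named rows. Note that the `.ix` indexer was removed in pandas 1.0; `.loc` (label-based) and `.iloc` (position-based) cover the same cases. A runnable sketch on a hypothetical frame (the data values are invented for illustration):

```python
import pandas as pd

# Hypothetical frame indexed by county name.
df = pd.DataFrame(
    {
        "year": [2012, 2013, 2014],
        "reports": [4, 24, 31],
        "coverage": [25, 94, 57],
    },
    index=["Maricopa", "Yuma", "Arizona"],
)

first_two = df[:2]                        # first two rows (positional slice)
row = df.loc["Maricopa"]                  # one row, by label
column = df.loc[:, "coverage"]            # one column, by label
value = df.loc["Yuma", "coverage"]        # single value, by row and column label
third_cell = df.loc["Arizona"].iloc[2]    # row by label, then cell by position
```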