Last active
December 10, 2015 10:18
marinados/572a5b3dba41548193a9
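Pandas, NumPy, SciPy - libraries for vectorized numerical calculations (NumPy/SciPy on arrays, pandas on labeled tables)
Folium - geographic data (interactive maps)
IPython - interactive work environment
List ~ Array (methods: append, insert; remove an item with the del statement: del lst[i])
Dictionary (dict) ~ Hash
type(x) ~ .class
or ~ || (| is element-wise OR on NumPy/pandas boolean arrays)
and ~ && (& is element-wise AND on NumPy/pandas boolean arrays)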
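A small sketch of the Python side of the comparisons above. The variable names are made up for illustration. On NumPy arrays, `&` and `|` combine boolean arrays element by element, and each comparison needs its own parentheses.

```python
import numpy as np

# list ~ Ruby Array
nums = [3, 1, 4]
nums.append(1)        # [3, 1, 4, 1]
nums.insert(0, 9)     # [9, 3, 1, 4, 1]
del nums[1]           # [9, 1, 4, 1]  (del is a statement, not a method)

# dict ~ Ruby Hash
ages = {"ann": 31, "bob": 27}
ages["cid"] = 45

# type(x) ~ Ruby x.class
kind = type(ages)     # <class 'dict'>

# Plain booleans: and / or ~ Ruby && / ||
ok = (len(nums) > 2) and ("ann" in ages)

# Boolean arrays: & and | work element-wise (parenthesize each comparison)
arr = np.array([-2, 0, 5, 8])
mask = (arr > 0) & (arr < 6)   # [False, False, True, False]
```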
syntax -->
def method_name(argument):
    # body (indented block, no closing "end")
    pass
=======
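NUMPY
import numpy as np
ndarray - n-dimensional array (~ matrix)
np.array(list) - creates an ndarray from a list
np.arange(15) -> [0, 1, ..., 14] (stop value excluded)
array.dtype - type of the elements (int64, float64 etc.)
array.astype(np.float64) - converted copy with a new dtype
array[5:9] - slice, elements 5 to 8 (a view, not a copy)
array[0][1] - row 0, column 1 of a 2D array (also array[0, 1])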
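A minimal sketch of the creation, dtype and indexing calls listed above. The array names are illustrative. It also checks that a basic slice shares memory with the original array.

```python
import numpy as np

# np.array builds an ndarray from a (nested) Python list
arr = np.array([1, 2, 3])
kind = arr.dtype                   # an integer dtype; exact width depends on the platform

# astype returns a converted copy; arr itself is unchanged
as_float = arr.astype(np.float64)

# np.arange(15) -> 0, 1, ..., 14 (the stop value is excluded)
seq = np.arange(15)

# Slicing excludes the end index and returns a view into seq
part = seq[5:9]                    # [5, 6, 7, 8]

# 2D indexing: grid[0][1] and grid[0, 1] pick the same element
grid = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
same = grid[0][1] == grid[0, 1]
```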
1. vector comparison
ndarray = np.array([-10, 1, 7, -8])
ndarray > 0 --> [False, True, True, False]
2. vector comparison as condition (boolean indexing)
ndarray[ndarray > 0] --> [1, 7]
3. operations on each array element
array = np.arange(1, 10, 1) --> [1, 2, ..., 9]
np.sqrt(array) --> [sqrt(1), sqrt(2), etc.]
4. element-wise max and min of two arrays
np.maximum(array1, array2) --> [max of array1[i] and array2[i] for each i]
np.minimum(array1, array2) --> element-wise min
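The four steps above, sketched end to end with small made-up arrays:

```python
import numpy as np

values = np.array([-10, 1, 7, -8])

# 1. A comparison is applied to every element -> boolean array
positive = values > 0          # [False, True, True, False]

# 2. A boolean array used as an index keeps only the True positions
kept = values[positive]        # [1, 7]

# 3. Universal functions (ufuncs) act element by element
nums = np.arange(1, 10, 1)     # 1, 2, ..., 9
roots = np.sqrt(nums)          # [1.0, 1.414..., ..., 3.0]

# 4. Element-wise max / min of two same-shaped arrays
a = np.array([1, 5, 3])
b = np.array([4, 2, 6])
hi = np.maximum(a, b)          # [4, 5, 6]
lo = np.minimum(a, b)          # [1, 2, 3]
```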
==========
PANDAS
import pandas as pd
1. Series - one dimension, elements may be of several types
ser = pd.Series(elements)
ser.index --> [0, 1, 2 etc.] by default (labels, not necessarily integers)
ser * 2 --> every element multiplied by 2
ser[ser > 0] --> boolean filtering, keeps the index labels
ser.values --> underlying NumPy array
When created from a dictionary, the dictionary's keys become the index labels
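A short sketch of the Series operations above. The data is made up for illustration.

```python
import pandas as pd

ser = pd.Series([4, -7, 5, 3])     # default index labels 0, 1, 2, 3
doubled = ser * 2                  # vectorized: every element multiplied
positive = ser[ser > 0]            # boolean filter keeps labels 0, 2, 3
raw = ser.values                   # underlying NumPy array

mixed = pd.Series([1, "a", 2.5])   # mixed types -> dtype object

# dict keys become the index labels
prices = pd.Series({"apple": 1.2, "pear": 0.8})
pear = prices["pear"]
```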
2. DataFrame - 2D table, a set of Series (columns) sharing one index
pd.DataFrame(dictionary) --> table with the keys as column headers, index = [0, 1, 2 etc.]
pd.DataFrame(dictionary, columns=[column list], index=[label list]) --> choose the column order and the index labels
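A minimal sketch of building a DataFrame from a dictionary. The column names and values are made up. The last line shows that putting Series side by side with `pd.concat(..., axis=1)` gives the same table.

```python
import pandas as pd

# Made-up example data: dict keys become column headers
data = {"name": ["ann", "bob", "cid"], "score": [10, 7, 12]}

df = pd.DataFrame(data)                                    # default index 0, 1, 2
df2 = pd.DataFrame(data, columns=["score", "name"], index=["x", "y", "z"])

col = df["score"]                                          # each column is a Series
side_by_side = pd.concat([df["name"], df["score"]], axis=1)  # Series -> DataFrame
```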
ACCESS ROW / COLUMN
Columns accessible with ['name'] or .name
datafr.loc[2] --> selection of the row labelled 2 (datafr.iloc[2] selects by position; the old datafr.ix[2] was removed in pandas 1.0)
datafr.reindex([list of new index labels]) (or fill_value=0 / something else)
If a new label has no row in the original --> NaN (not a number)
datafr.fillna(0) --> replaces NaN with 0
datafr.drop(2, axis=0) (either index label or column name)
axis=0 - rows
axis=1 - columns
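A sketch of row/column access, reindexing and dropping on a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

col_a = df["a"]               # same as df.a
row_2 = df.loc[2]             # row with index label 2
row_pos = df.iloc[2]          # third row by position

# Label 3 does not exist in df -> its row is filled with NaN
bigger = df.reindex([0, 1, 2, 3])
bigger_zero = df.reindex([0, 1, 2, 3], fill_value=0)
filled = bigger.fillna(0)

no_row = df.drop(2, axis=0)   # drop the row labelled 2
no_col = df.drop("b", axis=1) # drop column "b"
```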
ADDING TWO DATAFRAMES
- datafr1 + datafr2 --> aligned on index and columns; a value is only computed where the label exists in both tables, otherwise NaN
- datafr1.add(datafr2, fill_value=0) --> a value missing from one table is treated as 0, so all data is kept
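A small sketch contrasting `+` with `.add(..., fill_value=0)` on two made-up DataFrames whose indexes only partly overlap:

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])
df2 = pd.DataFrame({"x": [10, 20]}, index=["b", "c"])

plain = df1 + df2                    # a: NaN, b: 12, c: NaN
padded = df1.add(df2, fill_value=0)  # a: 1, b: 12, c: 20
```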
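SORTING
.sort_index(axis=0) --> by index labels (axis=1: by column names)
.sort_values('col_name') --> ascending by default (ascending=False for descending); the old .sort('col_name') no longer exists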
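A minimal sorting sketch on a made-up DataFrame with an unsorted index:

```python
import pandas as pd

df = pd.DataFrame({"score": [7, 12, 10]}, index=[2, 0, 1])

by_label = df.sort_index(axis=0)                         # index 0, 1, 2
by_score = df.sort_values("score")                       # 7, 10, 12
by_score_desc = df.sort_values("score", ascending=False) # 12, 10, 7
```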
.describe() --> classic stats: count, mean, std, min, quartiles, max
.mean() --> mean of every column (.mean(axis=1) for every row)
.sum() --> sum of every column
.dropna() --> drops all rows with at least 1 NaN value
.dropna(axis=1, how='all') --> drops only the columns whose values are all NaN
.fillna(0) --> returns a copy (inplace=True modifies the original)
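A sketch of the summary and missing-value methods above, on a made-up DataFrame. Column "c" is entirely NaN, which shows how `how='all'` differs from the default `dropna()`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, 6.0],
    "c": [np.nan, np.nan, np.nan],
})

stats = df.describe()           # count, mean, std, min, 25%, 50%, 75%, max per column
col_means = df.mean()           # per column; NaN values are skipped
row_means = df.mean(axis=1)     # per row
col_sums = df.sum()             # an all-NaN column sums to 0.0

no_nan_rows = df.dropna()                     # every row has a NaN in "c" -> empty
no_empty_cols = df.dropna(axis=1, how="all")  # drops only column "c"
filled = df.fillna(0)           # new DataFrame; df itself still holds NaN
```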