@mpjdem
Created August 7, 2019 10:47
Basic operations in Python datatable
import datatable as dt
from datatable import f, by, mean

# Reading a CSV (fread accepts a URL as well as a local path)
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
tbl = dt.fread(url)

# Filtering rows: keep every species except setosa
tbl = tbl[f.species != "setosa", :]

# Selecting columns
tbl = tbl[:, (f.species, f.sepal_length)]

# Adding a computed column (by reference): cbind() appends it to tbl in place
tbl.cbind(tbl[:, {"sepal_length_sq": f.sepal_length ** 2}])

# Aggregating tables: mean of the squared sepal length per species
agg_tbl = tbl[:, {"avg_sq_length": mean(f.sepal_length_sq)}, by(f.species)]

# Outputting the result (and conversion to pandas, which must be installed)
agg_tbl.to_pandas()
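
To check what the pipeline above computes without installing datatable, the same four steps (filter, select, add a squared column, average per group) can be sketched in plain Python. This is only an illustration: the rows are made-up sample values, not the real iris data, and the helper name `avg_sq_length_by_species` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows, NOT taken from the real iris.csv.
rows = [
    {"species": "setosa", "sepal_length": 5.0, "sepal_width": 3.5},
    {"species": "versicolor", "sepal_length": 6.0, "sepal_width": 3.0},
    {"species": "versicolor", "sepal_length": 5.0, "sepal_width": 2.5},
    {"species": "virginica", "sepal_length": 7.0, "sepal_width": 3.0},
]


def avg_sq_length_by_species(rows):
    """Mirror the datatable steps using lists and dicts."""
    # Filtering rows: drop setosa
    kept = [r for r in rows if r["species"] != "setosa"]
    # Selecting columns: keep only species and sepal_length
    selected = [
        {"species": r["species"], "sepal_length": r["sepal_length"]}
        for r in kept
    ]
    # Adding a computed column
    for r in selected:
        r["sepal_length_sq"] = r["sepal_length"] ** 2
    # Aggregating: mean of the squared length per species
    groups = defaultdict(list)
    for r in selected:
        groups[r["species"]].append(r["sepal_length_sq"])
    return {species: mean(values) for species, values in groups.items()}


result = avg_sq_length_by_species(rows)
print(result)
```

With these sample rows, versicolor averages the squares 36.0 and 25.0, and virginica has a single squared value of 49.0; setosa is absent from the result because it is filtered out before grouping.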
mpjdem commented Aug 7, 2019

As discussed here
