Filtering and Aggregation with Bquery
GitHub Gist waylonflinn/fee3920534f2088754b7, created September 30, 2015 16:38
import bcolz
import bquery

data_path = '/some/place/with/bcolz/data'
# open the existing on-disk bcolz table as a bquery ctable
c_table = bquery.ctable(rootdir=data_path)

## Filter
# criterion: keep rows whose 'feature' column equals this value,
# written as a byte-string literal for the comparison
string_feature = 'some_string_feature'
criterion = "feature == b'{0}'".format(string_feature)
# evaluate the criterion (with numexpr) to get one boolean per row
boolarr = c_table.eval(criterion)

## Aggregate
# columns whose unique values define the groups
group_columns = ['group_column_name']
# each entry: [input_column_name, operation, output_column_name]
aggregation_operations = [['number_column_name', 'mean', 'mean_column']]
# pass the boolean array so only rows matching the criterion are aggregated
mean_repin_count = c_table.groupby(group_columns,
                                   aggregation_operations, bool_arr=boolarr)
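
To make the combined effect of the two steps concrete, here is a minimal pure-Python sketch of the semantics only: build a per-row boolean mask, then compute a per-group mean over the rows where the mask is True. It is not how bquery implements this, and it does not use bquery or bcolz at all. The helper names `filter_mask` and `grouped_mean` and the sample column data are hypothetical, chosen only to mirror the column names used in the script above.

```python
from collections import defaultdict


def filter_mask(column, value):
    # Hypothetical helper: one boolean per row, True where the column equals value.
    # Plays the role of c_table.eval("feature == b'...'") in the script above.
    return [item == value for item in column]


def grouped_mean(group_column, value_column, mask):
    # Hypothetical helper: mean of value_column for each distinct key in
    # group_column, counting only rows whose mask entry is True
    # (the role of bool_arr in c_table.groupby).
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, value, keep in zip(group_column, value_column, mask):
        if keep:
            sums[key] += value
            counts[key] += 1
    # In this sketch, groups with no matching rows simply do not appear.
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical column-oriented sample data (one list per column).
table = {
    'feature': [b'some_string_feature', b'other',
                b'some_string_feature', b'some_string_feature'],
    'group_column_name': ['a', 'a', 'b', 'a'],
    'number_column_name': [1.0, 100.0, 4.0, 3.0],
}

mask = filter_mask(table['feature'], b'some_string_feature')
result = grouped_mean(table['group_column_name'],
                      table['number_column_name'], mask)
print(result)  # {'a': 2.0, 'b': 4.0}
```

The row with value 100.0 is excluded by the mask, so group `a` averages only 1.0 and 3.0. Keeping the data column-oriented mirrors how bcolz stores tables, with one array per column rather than one record per row.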