@vanessasoutoc
Created February 14, 2017 12:59
$ python rosie.py run
2017-02-14 10:04:27 Creating the CSV file
2017-02-14 10:04:27 Reading the XML file
2017-02-14 10:04:28 Writing record #4,949 to the CSV
2017-02-14 10:04:28 Done!
2017-02-14 10:04:28 Creating the CSV file
2017-02-14 10:04:28 Reading the XML file
2017-02-14 10:05:38 Writing record #343,681 to the CSV
2017-02-14 10:05:38 Done!
2017-02-14 10:05:38 Creating the CSV file
2017-02-14 10:05:38 Reading the XML file
2017-02-14 10:13:42 Writing record #2,404,938 to the CSV
2017-02-14 10:13:42 Done!
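The first stage converts each XML dump to CSV one record at a time. With the standard library, a memory-friendly way to do this is to stream the XML with `xml.etree.ElementTree.iterparse` and clear each element after its row is written. This is a minimal sketch, not rosie's actual converter. The `record` tag and the `cnpj`/`net_value` field names are hypothetical placeholders.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_source, csv_target, record_tag, fields):
    """Stream each <record_tag> element of xml_source into csv_target.

    Returns the number of rows written. record_tag and fields are
    caller-supplied; nothing here reflects rosie's real XML schema.
    """
    writer = csv.DictWriter(csv_target, fieldnames=fields)
    writer.writeheader()
    count = 0
    # 'end' events fire once an element and all its children are parsed
    for _, elem in ET.iterparse(xml_source, events=('end',)):
        if elem.tag != record_tag:
            continue
        # findtext returns None for a missing child; write '' instead
        writer.writerow({f: elem.findtext(f) or '' for f in fields})
        count += 1
        elem.clear()  # drop the record's children to keep memory low


    return count


# Usage with hypothetical tag and field names
sample = io.BytesIO(
    b"<root>"
    b"<record><cnpj>123</cnpj><net_value>10.5</net_value></record>"
    b"<record><cnpj>456</cnpj></record>"
    b"</root>"
)
out = io.StringIO()
rows = xml_to_csv(sample, out, 'record', ['cnpj', 'net_value'])
```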
Merging all datasets…
Loading current-year.xz…
Loading last-year.xz…
Loading previous-years.xz…
Dropping rows without document_value or reimbursement_number…
Grouping dataset by applicant_id, document_id and year…
Gathering all reimbursement numbers together…
Summing all net values together…
Summing all reimbursement values together…
Generating the new dataset…
Casting changes to a new DataFrame…
Writing it to file…
Done.
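The merge stage above drops incomplete rows, groups by applicant_id, document_id and year, gathers the reimbursement numbers, and sums the values. Below is a pandas sketch of those steps. It assumes hypothetical `net_value` and `reimbursement_value` column names, since the log only mentions "net values" and "reimbursement values". It also assumes the gathered reimbursement numbers are joined into one comma-separated string.

```python
import pandas as pd

# Grouping keys named in the log
KEYS = ['applicant_id', 'document_id', 'year']


def aggregate_reimbursements(df):
    # Drop rows missing either column the log says is required
    df = df.dropna(subset=['document_value', 'reimbursement_number'])
    grouped = df.groupby(KEYS, as_index=False)
    return grouped.agg({
        # Gather every reimbursement number of the group into one string
        'reimbursement_number': lambda s: ', '.join(s.astype(str)),
        # Hypothetical monetary column names, summed per group
        'net_value': 'sum',
        'reimbursement_value': 'sum',
    })
```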
rosie.py:23: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.
  rosie.main(target_directory)
rosie.py:23: DtypeWarning: Columns (21,22,101,102,103,104,105,106,107,108,109,110,111,112,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222) have mixed types. Specify dtype option on import or set low_memory=False.
  rosie.main(target_directory)
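The DtypeWarning appears because `read_csv`, with its default `low_memory=True`, infers column types chunk by chunk, and the listed columns received different types in different chunks. Declaring the type up front removes the guessing. `low_memory=False` also silences the warning, but it parses the whole file in one pass, which costs more memory. The minimal sketch below uses a hypothetical `cnpj` column in place of the numbered columns in the warning.

```python
import io

import pandas as pd

# Tiny stand-in for one of the CSV files; 'cnpj' is a hypothetical column
CSV_TEXT = "cnpj,net_value\n00.000.000/0001-91,10.5\n12345678000195,3\n"

# Preferred: tell pandas the column type so no per-chunk inference happens
typed = pd.read_csv(io.StringIO(CSV_TEXT), dtype={'cnpj': str})

# Alternative: parse in one pass (silences the warning, uses more memory)
whole = pd.read_csv(io.StringIO(CSV_TEXT), low_memory=False)
```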
/home/nelltech/Projetos/rosie/rosie/dataset.py:52: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  dataset['cnpj'] = dataset['cnpj'].str.replace(r'\D', '')
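The SettingWithCopyWarning at dataset.py:52 usually means that `dataset` was itself produced by slicing another DataFrame, so pandas cannot tell whether the assignment should reach the original. Taking an explicit `.copy()` before assigning makes the intent unambiguous. The sketch below works under that assumption, and `clean_cnpj` is a hypothetical helper, not a rosie function. Passing `regex=True` keeps `\D` treated as a pattern in pandas 2.0 and later, where `str.replace` defaults to literal matching.

```python
import pandas as pd


def clean_cnpj(dataset):
    # Work on an explicit copy so the assignment below is unambiguous,
    # even when `dataset` is a filtered slice of a larger DataFrame
    dataset = dataset.copy()
    # Keep only the digits of each CNPJ (removes dots, slashes, dashes)
    dataset['cnpj'] = dataset['cnpj'].str.replace(r'\D', '', regex=True)
    return dataset
```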
Traceback (most recent call last):
  File "rosie.py", line 36, in <module>
    command()
  File "rosie.py", line 23, in run
    rosie.main(target_directory)
  File "/home/nelltech/Projetos/rosie/rosie/__init__.py", line 65, in main
    Rosie(dataset, target_directory).run_classifiers()
  File "/home/nelltech/Projetos/rosie/rosie/__init__.py", line 30, in run_classifiers
    self.predict(model, irregularity)
  File "/home/nelltech/Projetos/rosie/rosie/__init__.py", line 56, in predict
    y = model.predict(self.dataset)
  File "/home/nelltech/Projetos/rosie/rosie/traveled_speeds_classifier.py", line 37, in predict
    _X = pd.merge(X, _X, how='left', left_on=self.AGG_KEYS, right_on=self.AGG_KEYS)
  File "/home/nelltech/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 62, in merge
    return op.get_result()
  File "/home/nelltech/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 564, in get_result
    concat_axis=0, copy=self.copy)
  File "/home/nelltech/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 4825, in concatenate_block_managers
    placement=placement) for placement, join_units in concat_plan]
  File "/home/nelltech/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 4825, in <listcomp>
    placement=placement) for placement, join_units in concat_plan]
  File "/home/nelltech/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 4922, in concatenate_join_units
    for ju in join_units]
  File "/home/nelltech/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 4922, in <listcomp>
    for ju in join_units]
  File "/home/nelltech/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 5222, in get_reindexed_values
    fill_value=fill_value)
  File "/home/nelltech/anaconda3/lib/python3.5/site-packages/pandas/core/algorithms.py", line 1100, in take_nd
    out = np.empty(out_shape, dtype=dtype)
MemoryError
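The MemoryError is raised inside `pd.merge(X, _X, how='left', ...)` in traveled_speeds_classifier.py. A left merge builds a new frame that holds a copy of every column of `X`, and `X` may be very wide: one of the DtypeWarnings above lists columns up to 222. Peak memory can therefore approach twice the size of `X`. One way to lower the peak is to merge only the key columns and then attach the new columns to `X`. This assumes `_X` has at most one row per `AGG_KEYS` combination, as a groupby result would. `attach_aggregates` is a hypothetical helper, and it modifies `X` in place.

```python
import pandas as pd


def attach_aggregates(X, agg, keys):
    """Add agg's non-key columns to X (in place) via a narrow left merge.

    Hypothetical helper. Assumes agg has at most one row per key
    combination, so the merge keeps exactly len(X) rows in X's order.
    """
    new_cols = [c for c in agg.columns if c not in keys]
    # Merge only the key columns: len(X) rows, but very few columns;
    # validate raises MergeError if agg has duplicate keys
    lookup = X[keys].merge(agg, how='left', on=keys, validate='many_to_one')
    for col in new_cols:
        # .values skips index alignment; row order already matches X
        X[col] = lookup[col].values
    return X
```

In the classifier, this would take the place of the wide merge as something like `attach_aggregates(X, _X, self.AGG_KEYS)`. If memory is still short, reading only the columns the classifiers need (the `usecols` argument of `read_csv`) shrinks `X` itself.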