

@jtemporal
Created January 30, 2017 16:02
memory-error-8GB-machine
root@rosie-staging-8gb:~/rosie# docker run --rm -v /tmp/serenata-data:/tmp/serenata-data rosie
2017-01-30 14:57:15 Creating the CSV file
2017-01-30 14:57:15 Reading the XML file
2017-01-30 14:57:17 Writing record #3,200 to the CSV
2017-01-30 14:57:17 Done!
2017-01-30 14:57:17 Creating the CSV file
2017-01-30 14:57:17 Reading the XML file
2017-01-30 14:59:22 Writing record #342,225 to the CSV
2017-01-30 14:59:22 Done!
2017-01-30 14:59:22 Creating the CSV file
2017-01-30 14:59:22 Reading the XML file
2017-01-30 15:13:26 Writing record #2,404,847 to the CSV
/rosie/dataset.py:52: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  dataset['cnpj'] = dataset['cnpj'].str.replace(r'\D', '')
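The SettingWithCopyWarning above is raised at `/rosie/dataset.py:52`, which suggests `dataset` is a filtered slice of a larger DataFrame at that point. A minimal sketch of the usual fix, assuming that is the case: take an explicit `.copy()` before assigning, so the write targets an independent frame. The helper name `clean_cnpj` and the sample values are hypothetical. Note also that since pandas 2.0 `Series.str.replace` no longer treats the pattern as a regular expression by default, so the same line needs `regex=True` to keep stripping non-digits.

```python
import pandas as pd


def clean_cnpj(dataset: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `dataset` with every non-digit removed from 'cnpj'.

    Hypothetical helper: `dataset` is assumed to be a filtered slice of a
    larger DataFrame, which is what makes plain assignment warn.
    """
    # An explicit copy makes the assignment below unambiguous: it writes to
    # a new DataFrame, never to a possible view of the parent.
    dataset = dataset.copy()
    # regex=True is required on pandas >= 2.0, where the default became False.
    dataset['cnpj'] = dataset['cnpj'].str.replace(r'\D', '', regex=True)
    return dataset


# Usage with made-up data: filtering first produces the kind of slice
# that triggers the warning in the log.
raw = pd.DataFrame({'cnpj': ['12.345.678/0001-90', '98.765.432/0001-10'],
                    'value': [10, 20]})
subset = raw[raw['value'] > 0]
cleaned = clean_cnpj(subset)
print(cleaned['cnpj'].tolist())  # prints ['12345678000190', '98765432000110']
```

The parent frame `raw` is left untouched, which is exactly the ambiguity the warning is about.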
2017-01-30 15:13:26 Writing record #2,404,938 to the CSV
2017-01-30 15:13:26 Done!
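Each of the three steps above follows the same pattern: create a CSV, read an XML file, write N records. The log does not show how Rosie does this, so the following is only a sketch, assuming one XML element per record whose child elements hold the fields. It uses `xml.etree.ElementTree.iterparse` to stream records and clears each one after writing it, so the full set of parsed records never has to sit in memory at once, which matters on an 8 GB machine. The tag name `record`, the function names and the sample XML are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
from datetime import datetime


def log(message):
    # Same "YYYY-MM-DD HH:MM:SS message" shape as the log lines above.
    print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), message)


def xml_to_csv(xml_source, csv_target, record_tag='record'):
    """Stream `record_tag` elements from `xml_source` into `csv_target`.

    Hypothetical helper: each child element of a record becomes one CSV
    column, and the header is taken from the first record.
    """
    log('Creating the CSV file')
    writer = None
    count = 0
    log('Reading the XML file')
    for _event, elem in ET.iterparse(xml_source, events=('end',)):
        if elem.tag != record_tag:
            continue
        row = {child.tag: child.text or '' for child in elem}
        if writer is None:
            # Fields missing from later records are written as empty strings;
            # fields not seen in the first record are dropped.
            writer = csv.DictWriter(csv_target, fieldnames=list(row),
                                    extrasaction='ignore')
            writer.writeheader()
        writer.writerow(row)
        count += 1
        # Release the record's children so the parsed tree does not keep
        # every record in memory.
        elem.clear()
    log('Writing record #{:,} to the CSV'.format(count))
    log('Done!')
    return count


# Usage with made-up XML.
sample = io.BytesIO(b'<root>'
                    b'<record><id>1</id><cnpj>12.345</cnpj></record>'
                    b'<record><id>2</id><cnpj>67.890</cnpj></record>'
                    b'</root>')
out = io.StringIO()
total = xml_to_csv(sample, out)
```

In real use `xml_source` would be a file path and `csv_target` a file opened with `newline=''`, as the `csv` module recommends.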
Merging all datasets…
Loading current-year.xz…
Loading last-year.xz…
Loading previous-years.xz…
Dropping rows without document_value or reimbursement_number…
Grouping dataset by applicant_id, document_id and year…
Gathering all reimbursement numbers together…
Summing all net values together…
Summing all reimbursement values together…
Generating the new dataset…
Casting changes to a new DataFrame…
Writing it to file…
Done.
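The merge step above is described only by its progress messages. A sketch of what those messages describe, written for a current pandas (named aggregation needs 0.25 or later, newer than the version in this log) and assuming the datasets are already loaded and concatenated into one DataFrame; loading the `.xz` files is left out. `applicant_id`, `document_id`, `year`, `document_value` and `reimbursement_number` come from the log. `net_value` and `reimbursement_value` are assumed names for the two summed columns, and joining the reimbursement numbers into a comma-separated string is one possible way to "gather" them, not necessarily Rosie's.

```python
import pandas as pd

# Grouping keys taken from the log line "Grouping dataset by applicant_id,
# document_id and year".
KEYS = ['applicant_id', 'document_id', 'year']


def group_reimbursements(data: pd.DataFrame) -> pd.DataFrame:
    """Collapse rows that share applicant_id, document_id and year.

    `net_value` and `reimbursement_value` are assumed column names.
    """
    # "Dropping rows without document_value or reimbursement_number"
    data = data.dropna(subset=['document_value', 'reimbursement_number'])
    summary = data.groupby(KEYS).agg(
        # "Gathering all reimbursement numbers together"
        reimbursement_numbers=('reimbursement_number',
                               lambda numbers: ', '.join(map(str, numbers))),
        # "Summing all net values together"
        net_value=('net_value', 'sum'),
        # "Summing all reimbursement values together"
        reimbursement_value=('reimbursement_value', 'sum'),
    )
    # "Generating the new dataset": turn the group keys back into columns.
    return summary.reset_index()


# Usage with made-up data: the third row has no document_value and is dropped.
sample = pd.DataFrame({
    'applicant_id': [1, 1, 2],
    'document_id': [10, 10, 20],
    'year': [2016, 2016, 2016],
    'document_value': [100.0, 100.0, None],
    'reimbursement_number': [501, 502, 503],
    'net_value': [60.0, 40.0, 30.0],
    'reimbursement_value': [0.0, 5.0, 0.0],
})
summary = group_reimbursements(sample)
print(summary)
```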
Traceback (most recent call last):
  File "rosie.py", line 36, in <module>
    command()
  File "rosie.py", line 23, in run
    rosie.main(target_directory)
  File "/rosie/__init__.py", line 65, in main
    Rosie(dataset, target_directory).run_classifiers()
  File "/rosie/__init__.py", line 25, in __init__
    self.irregularities = self.dataset[self.DATASET_KEYS].copy()
  File "/usr/local/lib/python3.5/site-packages/pandas/core/frame.py", line 2053, in __getitem__
    return self._getitem_array(key)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/frame.py", line 2098, in _getitem_array
    return self.take(indexer, axis=1, convert=True)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/generic.py", line 1666, in take
    self._consolidate_inplace()
  File "/usr/local/lib/python3.5/site-packages/pandas/core/generic.py", line 2801, in _consolidate_inplace
    self._protect_consolidate(f)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/generic.py", line 2790, in _protect_consolidate
    result = f()
  File "/usr/local/lib/python3.5/site-packages/pandas/core/generic.py", line 2799, in f
    self._data = self._data.consolidate()
  File "/usr/local/lib/python3.5/site-packages/pandas/core/internals.py", line 3526, in consolidate
    bm._consolidate_inplace()
  File "/usr/local/lib/python3.5/site-packages/pandas/core/internals.py", line 3531, in _consolidate_inplace
    self.blocks = tuple(_consolidate(self.blocks))
  File "/usr/local/lib/python3.5/site-packages/pandas/core/internals.py", line 4523, in _consolidate
    _can_consolidate=_can_consolidate)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/internals.py", line 4546, in _merge_blocks
    new_values = new_values[argsort]
MemoryError
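The traceback ends inside pandas' block consolidation: `self.dataset[self.DATASET_KEYS]` calls `take`, which first consolidates the whole frame, and `_merge_blocks` builds new arrays for each group of same-dtype columns (the failing line `new_values = new_values[argsort]` is one such allocation). One plausible reading is that this transient copy pushed the 8 GB machine over its limit. Two common mitigations, sketched here under that assumption: read only the columns that are needed with `read_csv(usecols=...)`, and shrink what is kept by turning repetitive text columns into categoricals and downcasting integers. The helper names, the 0.5 threshold and the sample data are hypothetical; this is not Rosie's code.

```python
import io

import pandas as pd


def load_columns(source, columns):
    """Read only `columns` from a CSV source instead of selecting them later.

    `usecols` stops pandas from ever materialising the unused columns, so any
    later column selection and block consolidation work on less data.
    """
    return pd.read_csv(source, usecols=columns)


def shrink(frame: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Lower the memory footprint of `frame` in place and return it.

    Heuristic (an assumption, not Rosie's code): text columns whose share of
    distinct values is below `max_unique_ratio` become categoricals, and
    integer columns are downcast to the smallest type that holds their values.
    Float columns are left alone, since float32 could round monetary values.
    """
    for name in frame.columns:
        col = frame[name]
        if pd.api.types.is_object_dtype(col) or pd.api.types.is_string_dtype(col):
            if len(col) and col.nunique(dropna=False) / len(col) < max_unique_ratio:
                frame[name] = col.astype('category')
        elif pd.api.types.is_integer_dtype(col):
            frame[name] = pd.to_numeric(col, downcast='integer')
    return frame


# Usage with made-up data (column names are hypothetical).
csv_text = 'state,year,unused\n' + 'SP,2016,x\nRJ,2017,y\n' * 500
frame = load_columns(io.StringIO(csv_text), ['state', 'year'])
before = frame.memory_usage(deep=True).sum()
shrink(frame)
after = frame.memory_usage(deep=True).sum()
print(before, after)
```

Converting column by column in place avoids making a second full copy of the frame while shrinking it.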