Created January 30, 2017 16:02
jtemporal/b770941023a516dcfc6c7aea8085d2f7
memory-error-8GB-machine
root@rosie-staging-8gb:~/rosie# docker run --rm -v /tmp/serenata-data:/tmp/serenata-data rosie
2017-01-30 14:57:15 Creating the CSV file
2017-01-30 14:57:15 Reading the XML file
2017-01-30 14:57:17 Writing record #3,200 to the CSV
2017-01-30 14:57:17 Done!
2017-01-30 14:57:17 Creating the CSV file
2017-01-30 14:57:17 Reading the XML file
2017-01-30 14:59:22 Writing record #342,225 to the CSV
2017-01-30 14:59:22 Done!
2017-01-30 14:59:22 Creating the CSV file
2017-01-30 14:59:22 Reading the XML file
2017-01-30 15:13:26 Writing record #2,404,847 to the CSV
/rosie/dataset.py:52: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  dataset['cnpj'] = dataset['cnpj'].str.replace(r'\D', '')
2017-01-30 15:13:26 Writing record #2,404,938 to the CSV
2017-01-30 15:13:26 Done!
Merging all datasets…
Loading current-year.xz…
Loading last-year.xz…
Loading previous-years.xz…
Dropping rows without document_value or reimbursement_number…
Grouping dataset by applicant_id, document_id and year…
Gathering all reimbursement numbers together…
Summing all net values together…
Summing all reimbursement values together…
Generating the new dataset…
Casting changes to a new DataFrame…
Writing it to file…
Done.
Traceback (most recent call last):
  File "rosie.py", line 36, in <module>
    command()
  File "rosie.py", line 23, in run
    rosie.main(target_directory)
  File "/rosie/__init__.py", line 65, in main
    Rosie(dataset, target_directory).run_classifiers()
  File "/rosie/__init__.py", line 25, in __init__
    self.irregularities = self.dataset[self.DATASET_KEYS].copy()
  File "/usr/local/lib/python3.5/site-packages/pandas/core/frame.py", line 2053, in __getitem__
    return self._getitem_array(key)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/frame.py", line 2098, in _getitem_array
    return self.take(indexer, axis=1, convert=True)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/generic.py", line 1666, in take
    self._consolidate_inplace()
  File "/usr/local/lib/python3.5/site-packages/pandas/core/generic.py", line 2801, in _consolidate_inplace
    self._protect_consolidate(f)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/generic.py", line 2790, in _protect_consolidate
    result = f()
  File "/usr/local/lib/python3.5/site-packages/pandas/core/generic.py", line 2799, in f
    self._data = self._data.consolidate()
  File "/usr/local/lib/python3.5/site-packages/pandas/core/internals.py", line 3526, in consolidate
    bm._consolidate_inplace()
  File "/usr/local/lib/python3.5/site-packages/pandas/core/internals.py", line 3531, in _consolidate_inplace
    self.blocks = tuple(_consolidate(self.blocks))
  File "/usr/local/lib/python3.5/site-packages/pandas/core/internals.py", line 4523, in _consolidate
    _can_consolidate=_can_consolidate)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/internals.py", line 4546, in _merge_blocks
    new_values = new_values[argsort]
MemoryError
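The traceback shows the crash inside `Rosie.__init__`, at `self.dataset[self.DATASET_KEYS].copy()`. Selecting columns made pandas consolidate the DataFrame's internal blocks (`_consolidate_inplace` → `_merge_blocks`), which allocates new arrays holding every column of the same dtype; with the whole merged dataset already loaded, that temporary copy appears to be what exhausted the 8 GB machine. One general way to lower peak memory is to load only the columns that are needed, with compact dtypes. This is a sketch under stated assumptions, not Rosie's actual loading code: the inline CSV and the names `NEEDED` and `DTYPES` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for one of the reimbursement files; the real files
# have many more columns and millions of rows.
CSV = """applicant_id,document_id,year,congressperson_name,document_value,net_value
1,100,2016,ALICE,10.50,10.50
1,101,2016,ALICE,20.00,18.00
2,102,2017,BOB,5.25,5.25
"""

# Assumed subset of columns the later steps need (hypothetical names).
NEEDED = ['applicant_id', 'document_id', 'year', 'net_value']

# Smaller integer types where the value range allows it; money stays
# float64 so no precision is lost.
DTYPES = {
    'applicant_id': 'int32',
    'document_id': 'int64',
    'year': 'int16',
    'net_value': 'float64',
}

# Loading everything with default dtypes (int64 / float64 / object).
full = pd.read_csv(io.StringIO(CSV))

# Loading only what is needed, with compact dtypes, straight from the
# parser, so the wide frame never exists in memory.
slim = pd.read_csv(io.StringIO(CSV), usecols=NEEDED, dtype=DTYPES)

full_bytes = int(full.memory_usage(deep=True).sum())
slim_bytes = int(slim.memory_usage(deep=True).sum())
print(f'full: {full_bytes} bytes, slim: {slim_bytes} bytes')
```

With the real files, the same `usecols` and `dtype` arguments can be passed to `pd.read_csv` on the `.xz` paths, since pandas infers xz compression from the file extension by default.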
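Earlier in the log, `/rosie/dataset.py:52` raises a `SettingWithCopyWarning`: pandas cannot tell whether `dataset` is an independent frame or a view of a larger one, so the assignment to `dataset['cnpj']` might not land where intended. A minimal sketch, assuming `dataset` was produced by filtering a bigger DataFrame (the data and the names `raw` and `subset` are hypothetical), is to take an explicit `.copy()` of the filtered rows before assigning. The warning's own suggestion, `.loc[row_indexer, col_indexer] = value` on the parent frame, is the alternative when the change should be written back to it.

```python
import pandas as pd

# Hypothetical data standing in for the larger frame that `dataset`
# may have been sliced from.
raw = pd.DataFrame({
    'cnpj': ['11.222.333/0001-81', '44.555.666/0001-99', None],
    'document_value': [10.0, 20.0, 5.0],
})

# Filtering and then assigning into the result is the pattern that can
# trigger SettingWithCopyWarning. Calling .copy() makes `subset` an
# independent DataFrame, so the assignment below is unambiguous.
subset = raw[raw['cnpj'].notnull()].copy()

# Keep only the digits of each CNPJ. regex=True is explicit because
# pandas 2.0 changed the default of str.replace to regex=False, under
# which r'\D' would be matched literally.
subset['cnpj'] = subset['cnpj'].str.replace(r'\D', '', regex=True)

print(subset['cnpj'].tolist())
```

Because `subset` is a copy, `raw` keeps its original, punctuated CNPJ values.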