Created
December 30, 2023 12:26
Notebook for photo-package for RAM-friendly DB-to-parquet conversion
The row_group_size argument is not being used if batched=True. Yet somehow I end up with the same row_group_size. Perhaps it's picked up in the schema passed into pyarrow.parquet.ParquetWriter(). Perhaps it reflects some default in terms of batch sizes.
Amendment: It must be the latter, because row_group_size seems completely unused with batched=True.
@blucap The above may be of interest. Suggestions welcomed.
Is it necessary to specify Python 3.11 (not 3.12) for now?
See here.
It would be good to add keep and drop arguments to pg_to_pq(). These wouldn't be snippets of SAS code, but I think it would be good to support a regular expression or a list of strings for each. I guess anything would be built on Ibis "selectors".
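To make the keep/drop idea concrete, here is a minimal sketch in plain Python of how a regular expression or a list of strings might be resolved against a table's column names. The helper name select_columns, the rule that a bare string is treated as a regex, and the order of operations (keep first, then drop) are all assumptions for illustration. None of this is part of pg_to_pq(), and a real version would presumably go through Ibis selectors instead.

```python
import re
from typing import Callable, Iterable, List, Optional, Union

# A spec is either a regular expression (str) or an iterable of exact names.
ColumnSpec = Union[str, Iterable[str]]


def _matcher(spec: ColumnSpec) -> Callable[[str], bool]:
    """Build a predicate that says whether a column name matches the spec."""
    if isinstance(spec, str):
        # Assumption: a bare string is a regex, matched anywhere in the name.
        pattern = re.compile(spec)
        return lambda name: pattern.search(name) is not None
    names = set(spec)
    return lambda name: name in names


def select_columns(
    columns: Iterable[str],
    keep: Optional[ColumnSpec] = None,
    drop: Optional[ColumnSpec] = None,
) -> List[str]:
    """Hypothetical helper: apply keep, then drop, preserving column order."""
    selected = list(columns)
    if keep is not None:
        is_kept = _matcher(keep)
        selected = [c for c in selected if is_kept(c)]
    if drop is not None:
        is_dropped = _matcher(drop)
        selected = [c for c in selected if not is_dropped(c)]
    return selected


# Made-up column names, purely for illustration.
cols = ["gvkey", "datadate", "at", "lt", "sale_fn", "at_fn"]
print(select_columns(cols, keep=["gvkey", "datadate", "at"]))
print(select_columns(cols, drop=r"_fn$"))
```

Resolving the spec to an explicit list of names up front would mean the result could be handed to whatever does the actual column selection, without that layer needing to know about regexes.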