Created December 30, 2023 12:26
Notebook for photo-package for RAM-friendly DB-to-parquet conversion
@blucap The above may be of interest. Suggestions welcomed.
Is it necessary to specify Python 3.11 (not 3.12) for now?
See here.
The `row_group_size` argument is not being used if `batched=True`. Yet somehow I end up with the same `row_group_size`. Perhaps it's picked up in the schema passed into `pyarrow.parquet.ParquetWriter()`. Perhaps it reflects some default in terms of batch sizes.

Amendment: It must be the latter, because `row_group_size` seems completely unused with `batched=True`.