@iangow
Created December 30, 2023 12:26
Notebook for proto-package for RAM-friendly DB-to-parquet conversion

iangow commented Dec 30, 2023

It would be good to add keep and drop arguments to pg_to_pq(). These wouldn't take snippets of SAS code; instead, each could accept either a regular expression or a list of strings. I guess any implementation would be built on Ibis "selectors".
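As a rough sketch of the semantics described above (not the actual pg_to_pq() implementation, and using plain Python rather than Ibis selectors), the column filtering might look like the following. The helper names `select_columns` and `_matches` are hypothetical, and treating a string as a regular expression and any other iterable as a list of exact names is an assumption.

```python
import re


def _matches(name, spec):
    """Return True if column `name` is selected by `spec`.

    Assumption: a str is treated as a regular expression,
    anything else as a collection of exact column names.
    """
    if isinstance(spec, str):
        return re.search(spec, name) is not None
    return name in spec


def select_columns(columns, keep=None, drop=None):
    """Hypothetical helper: filter `columns`, preserving their order."""
    selected = list(columns)
    if drop is not None:
        # Remove every column matched by `drop`.
        selected = [c for c in selected if not _matches(c, drop)]
    if keep is not None:
        # Retain only the columns matched by `keep`.
        selected = [c for c in selected if _matches(c, keep)]
    return selected
```

For example, `select_columns(cols, keep="^(gvkey|datadate)$")` would retain only those two columns, while `select_columns(cols, drop=["at", "lt"])` would remove the two named columns. The resulting list could then be passed to whatever column-selection mechanism the query layer provides.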

iangow commented Dec 30, 2023

The row_group_size argument is not being used if batched=True. Yet somehow I end up with the same row_group_size. Perhaps it's picked up in the schema passed into pyarrow.parquet.ParquetWriter(). Perhaps it reflects some default in terms of batch sizes.

Amendment: It must be the latter, because row_group_size seems completely unused with batched=True.

iangow commented Dec 30, 2023

@blucap The above may be of interest. Suggestions welcomed.

iangow commented Dec 30, 2023

Is it necessary to specify Python 3.11 (not 3.12) for now?

iangow commented Dec 30, 2023

See here.
