@ianmcook
Created April 2, 2025 16:48
Create sample data and write it to two files, one in the Arrow IPC file format and one in the Arrow IPC stream format
import pandas as pd
import pyarrow as pa
# conventional extensions: .arrow for the IPC file format, .arrows for the IPC stream format
file_path = 'fruit.arrow'
stream_path = 'fruit.arrows'
df = pd.DataFrame(data={'fruit': ['apple', 'apple', 'apple', 'orange', 'orange', 'orange'],
                        'variety': ['gala', 'honeycrisp', 'fuji', 'navel', 'valencia', 'cara cara'],
                        'weight': [134.2, 158.6, None, 142.1, 96.7, None]})
table = pa.Table.from_pandas(df, preserve_index=False)
# drop the schema metadata added by from_pandas (here, only the pandas-specific b'pandas' entry)
table = table.replace_schema_metadata(None)
# write file in Arrow IPC file format
with pa.ipc.new_file(file_path, table.schema) as writer:
    writer.write_table(table)
# write file in Arrow IPC stream format
with pa.ipc.new_stream(stream_path, table.schema) as writer:
    writer.write_table(table)