@simicd
Last active June 24, 2020 16:02
import pyarrow as pa

# Assumes `df` and `df_nonan` (the same data without NaN values) are
# pandas DataFrames created earlier.

# Write to csv
df.to_csv("penguin-dataset.csv")

# Write to parquet (pandas needs pyarrow or fastparquet installed for this)
df.to_parquet("penguin-dataset.parquet")

# Write to Arrow
# Convert from pandas to Arrow
table = pa.Table.from_pandas(df)
# Write out to file in the Arrow IPC file format
with pa.OSFile("penguin-dataset.arrow", "wb") as sink:
    with pa.RecordBatchFileWriter(sink, table.schema) as writer:
        writer.write_table(table)

# Convert from no-NaN pandas to Arrow
table_nonan = pa.Table.from_pandas(df_nonan)
# Write out to file in the Arrow IPC file format
with pa.OSFile("penguin-dataset-nonan.arrow", "wb") as sink:
    with pa.RecordBatchFileWriter(sink, table_nonan.schema) as writer:
        writer.write_table(table_nonan)