Xun(Perry) Liu CodeBear801
CodeBear801 / csv_to_parquet.py
Created June 27, 2019 15:43
convert csv into parquet
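import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Input and output paths; the input is read as tab-separated despite the .csv extension.
csv_file = 'id2ids.csv'
parquet_file = 'id2ids.parquet'
# Number of rows per chunk; lower this if a chunk does not fit in memory.
chunksize = 10_000_000
# With chunksize set, read_csv returns an iterator of DataFrames instead of loading the whole file.
csv_stream = pd.read_csv(csv_file, sep='\t', chunksize=chunksize, low_memory=False)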