@LouisAmon
Created January 15, 2017 11:55
Read Avro file from Pandas
import pandas
import fastavro


def avro_df(filepath, mode="rb"):
    # Open file stream. Avro container files are binary, so the second
    # argument to open() is the file mode (e.g. "rb"), not a text encoding.
    with open(filepath, mode) as fp:
        # Configure Avro reader
        reader = fastavro.reader(fp)
        # Load records in memory (must happen while the file is still open)
        records = [r for r in reader]
    # Populate pandas.DataFrame with records
    df = pandas.DataFrame.from_records(records)
    # Return created DataFrame
    return df
@nat-n
nat-n commented Nov 13, 2021

It looks like you can save a line of code and avoid temporarily duplicating the data in memory by passing the reader iterable directly to from_records rather than loading it into a list first.