@LouisAmon
Created January 15, 2017 11:55
Read Avro file from Pandas
import pandas
import fastavro


def avro_df(filepath, mode='rb'):
    # Open file stream; Avro is a binary format, so the file mode must be binary
    with open(filepath, mode) as fp:
        # Configure Avro reader
        reader = fastavro.reader(fp)
        # Load records in memory
        records = [r for r in reader]
        # Populate pandas.DataFrame with records
        df = pandas.DataFrame.from_records(records)
    # Return created DataFrame
    return df

WASDi commented Apr 28, 2021

I had an error 'utf8' codec can't decode byte 0x83. This was solved by using open(filepath, 'rb'), where the b means to read the file in binary format. As already mentioned, this argument is the "mode", not "encoding".
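To illustrate the point about the second argument of open() being the mode, here is a minimal sketch using only the standard library. The payload is a stand-in (the Avro magic header followed by a 0x83 byte), not a real Avro file, and the variable names are made up for the example. It shows text mode failing to decode and binary mode returning the bytes as written.

```python
import os
import tempfile

# Stand-in bytes, not a real Avro file: the Avro magic header followed by
# a byte (0x83) that cannot start a UTF-8 sequence.
payload = b"Obj\x01\x83"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # Text mode tries to decode the bytes to str and fails on binary content.
    try:
        with open(path, "r", encoding="utf-8") as fp:
            fp.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

    # Binary mode returns the raw bytes untouched.
    with open(path, "rb") as fp:
        raw = fp.read()
finally:
    os.remove(path)

print(text_mode_failed, raw == payload)
```

Since fastavro.reader expects a binary file object, 'rb' is the mode to pass when opening an Avro file.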


nat-n commented Nov 13, 2021

It looks like you can save a line of code and avoid temporarily duplicating the data in memory by passing the reader iterable directly to from_records rather than loading it into a list first.
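A rough sketch of that suggestion follows. To keep it runnable without an Avro file, a hypothetical generator (fake_reader, not part of fastavro) stands in for fastavro.reader(fp); both yield one dict per record. pandas may still collect the rows internally while building the frame, so the saving is the extra list in your own code rather than true streaming.

```python
import pandas


def fake_reader():
    """Hypothetical stand-in for fastavro.reader(fp): yields one dict per record."""
    yield {"id": 1, "name": "alpha"}
    yield {"id": 2, "name": "beta"}
    yield {"id": 3, "name": "gamma"}


# Pass the iterator straight to from_records instead of building a list first.
df = pandas.DataFrame.from_records(fake_reader())
print(df.shape)
```

In the gist's function this would amount to replacing the two lines that build records and df with a single from_records(reader) call, made while the file is still open.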
