Read Avro file from Pandas
Gist by LouisAmon (300b4a906a6d25a7fb5d2c4d174d242e), created January 15, 2017 11:55
import pandas
import fastavro


def avro_df(filepath):
    # Open file stream in binary mode ('rb'): Avro is a binary format,
    # and the second argument to open() is the mode, not an encoding
    with open(filepath, 'rb') as fp:
        # Configure Avro reader
        reader = fastavro.reader(fp)
        # Load records in memory
        records = [r for r in reader]
        # Populate pandas.DataFrame with records
        df = pandas.DataFrame.from_records(records)
        # Return created DataFrame
        return df
It looks like you can save a line of code by passing the reader iterable directly to from_records rather than loading it into a list first. (pandas still collects an iterator into a list internally, so this saves a line more than it saves memory.)
I had an error: 'utf8' codec can't decode byte 0x83. This was solved by using open(filepath, 'rb'), where the b means to read the file in binary mode. As already mentioned, this argument is the "mode", not the "encoding".