@trungquy
Created December 21, 2018 03:08
Pyspark - Utilities
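# source: https://stackoverflow.com/questions/37471346/automatically-and-elegantly-flatten-dataframe-in-spark-sql
from pyspark.sql.types import StructType, ArrayType


def flatten(schema, prefix=None):
    """Return the dotted paths of all leaf fields in a (possibly nested) schema."""
    fields = []
    for field in schema.fields:
        name = prefix + '.' + field.name if prefix else field.name
        dtype = field.dataType
        # For arrays, inspect the element type so arrays of structs are expanded too.
        if isinstance(dtype, ArrayType):
            dtype = dtype.elementType
        if isinstance(dtype, StructType):
            # Recurse into the nested struct, carrying the dotted prefix along.
            fields += flatten(dtype, prefix=name)
        else:
            fields.append(name)
    return fields


# `df` is assumed to be an existing DataFrame.
df.select(flatten(df.schema)).show()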