@xtrmstep
Created January 10, 2023 17:06
Flatten Spark Dataframe
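# source: https://stackoverflow.com/a/50156142/2833774
import pyspark.sql.types as pst
from pyspark.sql.functions import col


def flatten(schema, prefix=None):
    """Return aliased leaf columns for a (possibly nested) StructType schema."""
    fields = []
    for field in schema.fields:
        # Dotted path to the field; the alias replaces "." with "__".
        name = prefix + '.' + field.name if prefix else field.name
        alias_name = name.replace(".", "__")
        dtype = field.dataType
        # Look through arrays so that arrays of structs are descended into too.
        if isinstance(dtype, pst.ArrayType):
            dtype = dtype.elementType
        if isinstance(dtype, pst.StructType):
            fields += flatten(dtype, prefix=name)
        else:
            fields.append(col(name).alias(alias_name))
    return fields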