@zoltanctoth
Last active July 15, 2023 13:23
Writing an UDF for withColumn in PySpark
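from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# UDF that labels an age as "adult" (18 or over) or "child"
maturity_udf = udf(lambda age: "adult" if age >= 18 else "child", StringType())
df = spark.createDataFrame([{'name': 'Alice', 'age': 1}])
# withColumn returns a new DataFrame, so keep the result
df = df.withColumn("maturity", maturity_udf(df.age))
df.show()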
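A note on null values: Spark passes a null column value to a Python UDF as None, and in Python 3 the comparison `None >= 18` raises a TypeError. The following is a minimal sketch of a null-safe variant, not part of the original gist. The function name `maturity` is hypothetical, and the logic is kept as plain Python so it can be checked without a Spark session.

```python
from typing import Optional


def maturity(age: Optional[int]) -> Optional[str]:
    """Label an age as "adult" or "child"; pass missing ages through as None."""
    if age is None:
        # Handle nulls first: comparing None with an int raises TypeError.
        return None
    return "adult" if age >= 18 else "child"


# With PySpark available, the function could be wrapped like the lambda above
# (assumption: same imports as the gist):
#   maturity_udf = udf(maturity, StringType())
```

Returning None from the wrapped function should produce a null in the new column rather than a failed task.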

ghost commented Jan 7, 2019

Thanks ! :)

@swetaravi

I have a question. If my data frame has date columns in the format 'Mmm dd,yyyy', can I use this UDF?

# 1 Change date fields

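from datetime import datetime
import pyspark.sql.functions as fn
from pyspark.sql.types import DateType

review_date_udf = fn.udf(
    lambda x: datetime.strptime(x, ' %b %d, %Y'), DateType()
)

reviews_df = reviews_df.withColumn("dates", review_date_udf(reviews_df['dates']))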

But when I try to view the data frame, it throws an error: Caused by: java.net.SocketTimeoutException: Accept timed out. Any ideas on how to solve this?
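One way to narrow down a problem like this is to test the parsing logic on its own, outside Spark. The sketch below is an illustration, not a confirmed fix for the timeout. It assumes the values look like 'Jan 05,2019', possibly with stray spaces, and that the month abbreviations are English. The name `parse_review_date` is hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def parse_review_date(text: Optional[str]) -> Optional[date]:
    """Parse strings such as 'Jan 05,2019' or ' Jan 05, 2019' into a date."""
    if text is None:
        # Null column values reach a Python UDF as None.
        return None
    # Normalise spacing so "Jan 05,2019" and " Jan 05, 2019" look the same.
    cleaned = " ".join(text.replace(",", ", ").split())
    try:
        # %b matches the abbreviated month name of the current locale.
        return datetime.strptime(cleaned, "%b %d, %Y").date()
    except ValueError:
        # Unparseable input becomes a null instead of failing the whole job.
        return None


# With PySpark available, it could be wrapped the same way as the lambda above:
#   review_date_udf = fn.udf(parse_review_date, DateType())
```

If the function behaves as expected on a few sample strings collected from the column, the parsing itself is less likely to be the source of the error.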

@datbui

datbui commented Feb 3, 2020

Thanks !

@vinothkumar-dev

TypeError: a bytes-like object is required, not 'NoneType'

I am getting this error while trying 'mrandrewandrade' input.
How can I resolve this error?

Thanks.

@abdu95

abdu95 commented Sep 2, 2021

It was nice to come across my teacher's code even after graduation. Thank you!

@Weiyu-Luo

I encountered this problem too. Have you solved it? Thank you!
