@scott-hsieh
Created April 10, 2021 02:02
How to manage Python dependencies in PySpark
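import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.getOrCreate()  # reuses the shell's session if one exists

# Vectorized UDF: receives a batch of column values as a pandas Series.
@pandas_udf('double')
def pandas_plus_one(v: pd.Series) -> pd.Series:
    return v + 1

spark.range(10).select(pandas_plus_one("id")).show()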
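
The snippet above only runs where pandas and PyArrow are installed, since pandas UDFs depend on both. As a minimal sketch (not part of the original gist), the check below reports which of those modules the current interpreter cannot import. The names `REQUIRED_MODULES` and `missing_modules` are hypothetical helpers introduced here for illustration.

```python
import importlib.util

# Assumed dependency list for the pandas UDF example (illustrative only).
REQUIRED_MODULES = ("pandas", "pyarrow")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing Python dependencies:", ", ".join(missing))
    else:
        print("All required Python dependencies are importable.")
```

Run as a script, this only inspects the interpreter it is launched with, typically the driver. Executors may use a different Python environment, so the same check would have to run inside a Spark task to say anything about them.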