@dipanjanS
Created April 6, 2019 22:43
from pyspark.sql.functions import col
from pyspark.sql.functions import sum as spark_sum

def count_null(col_name):
    """Build a column expression that counts nulls in the given column."""
    # isNull() yields a boolean; casting to integer maps True -> 1 and
    # False -> 0, so summing gives the null count for that column.
    return spark_sum(col(col_name).isNull().cast('integer')).alias(col_name)

# Build up a list of column expressions, one per column.
exprs = [count_null(col_name) for col_name in logs_df.columns]

# Run the aggregation. The *exprs unpacks the list of expressions into
# variable function arguments.
logs_df.agg(*exprs).show()