Turning row values into columns (pivot)
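from pyspark.sql import functions as F

# Long format: one output row per (auth_method, auth_result) combination.
(ntfLog.groupby("auth_method", "auth_result")
.agg(F.count("*").alias("cnt"))
.sort("auth_method", "auth_result")
.show(20, False)
)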
Result:
+------------------+-----------+------+
|auth_method       |auth_result|cnt   |
+------------------+-----------+------+
|FACE_RECOGNITION  |false      |41528 |
|FACE_RECOGNITION  |true       |154838|
|NCIIC             |true       |35420 |
|QUICKPAY_SIGN     |false      |15382 |
|QUICKPAY_SIGN     |true       |156307|
|SHORT_PAY_PASSWORD|false      |28698 |
|SHORT_PAY_PASSWORD|true       |157004|
+------------------+-----------+------+
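# Wide format: one row per auth_method, one column per auth_result value.
# Listing the pivot values explicitly spares Spark an extra pass to discover them.
(ntfLog.groupby("auth_method")
.pivot("auth_result", ['false', 'true'])
.agg(F.count("*"))
.sort("auth_method")
.show(20, False)
)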
Result:
+------------------+-----+------+
|auth_method       |false|true  |
+------------------+-----+------+
|FACE_RECOGNITION  |41528|154838|
|NCIIC             |null |35420 |
|QUICKPAY_SIGN     |15382|156307|
|SHORT_PAY_PASSWORD|28698|157004|
+------------------+-----+------+
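The null for NCIIC under "false" appears because the long-format result has no (NCIIC, false) row, so the pivot has nothing to put in that cell. The reshaping can be sketched in plain Python, without Spark, using the counts from the first table. This is only an illustration of the semantics, not how Spark implements pivot; `long_rows` and `pivot_counts` are hypothetical names introduced here.

```python
from collections import defaultdict

# Long-format counts copied from the first result table.
long_rows = [
    ("FACE_RECOGNITION", "false", 41528),
    ("FACE_RECOGNITION", "true", 154838),
    ("NCIIC", "true", 35420),
    ("QUICKPAY_SIGN", "false", 15382),
    ("QUICKPAY_SIGN", "true", 156307),
    ("SHORT_PAY_PASSWORD", "false", 28698),
    ("SHORT_PAY_PASSWORD", "true", 157004),
]


def pivot_counts(rows, values):
    """Reshape (key, pivot_value, count) rows into one entry per key,
    with one slot per pivot value. Missing combinations stay None."""
    wide = defaultdict(lambda: {v: None for v in values})
    for key, pivot_value, count in rows:
        if pivot_value in values:  # values outside the list are ignored
            wide[key][pivot_value] = count
    return dict(sorted(wide.items()))


wide = pivot_counts(long_rows, ["false", "true"])
for method, cols in wide.items():
    print(method, cols["false"], cols["true"])
```

If zeros are preferred over nulls in the Spark result, calling `.na.fill(0)` on the pivoted DataFrame before `.show(...)` is one way to get them.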
Reference: https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html