rainsunny

Spark DataFrame `pivot` functions

Turning row values into columns

from pyspark.sql import functions as F
(ntfLog.groupby("auth_method", "auth_result").agg(F.count("*").alias("cnt"))
    .sort("auth_method", "auth_result").show(20, False))
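
To make the reshaping concrete, here is a minimal plain-Python sketch (not Spark code) of what pivoting grouped counts like these does. The sample rows and the helper name `pivot_counts` are made up for illustration; in PySpark the equivalent would be along the lines of `ntfLog.groupBy("auth_method").pivot("auth_result").count()`.

```python
from collections import defaultdict

# Hypothetical grouped counts: (auth_method, auth_result, cnt)
rows = [
    ("password", "fail", 3),
    ("password", "success", 12),
    ("token", "success", 7),
]

def pivot_counts(rows):
    """Turn each distinct auth_result value into its own column."""
    results = sorted({result for _, result, _ in rows})
    table = defaultdict(dict)
    for method, result, cnt in rows:
        table[method][result] = cnt
    # Missing combinations become None, mirroring the nulls a Spark pivot leaves.
    return [
        {"auth_method": method, **{r: table[method].get(r) for r in results}}
        for method in sorted(table)
    ]

pivoted = pivot_counts(rows)
for row in pivoted:
    print(row)
```

Each `auth_method` ends up on a single row, with one column per `auth_result` value.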

curl http://www.centos.org

Suppose you need to apply the same function to multiple columns in one DataFrame. A straightforward way is:

val newDF = oldDF.withColumn("colA", func("colA")).withColumn("colB", func("colB")).withColumn("colC", func("colC"))

If you want to save some typing, you can try this:

import spark.implicits._
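
One common way to avoid repeating `withColumn` is to fold over the list of column names. The following is a minimal Python sketch of that idea under stated assumptions, not necessarily the code this post goes on to show. `apply_to_columns` and `TinyFrame` are hypothetical names: `TinyFrame` is a stand-in that only mimics `withColumn` so the example runs without a Spark session, and `func` is assumed to take a column name and return the new column.

```python
from functools import reduce

def apply_to_columns(df, columns, func):
    """Fold over `columns`, replacing each one with func(column_name)."""
    return reduce(lambda acc, name: acc.withColumn(name, func(name)), columns, df)

class TinyFrame:
    """Hypothetical stand-in for a DataFrame; it only supports withColumn."""
    def __init__(self, columns=None):
        self.columns = dict(columns or {})

    def withColumn(self, name, value):
        # Like Spark, return a new frame instead of mutating this one.
        return TinyFrame({**self.columns, name: value})

old_df = TinyFrame({"colA": "a", "colB": "b", "colC": "c", "colD": "d"})
new_df = apply_to_columns(
    old_df, ["colA", "colB", "colC"], lambda name: f"func({name})"
)
print(new_df.columns)
```

With a real PySpark DataFrame, `func` could be something like `lambda name: F.upper(F.col(name))`.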

A UDF can return only a single column at a time. There are two different ways to overcome this limitation:

The most general solution is a StructType, but you can consider ArrayType or MapType as well.

import org.apache.spark.sql.functions.udf

val df = Seq(
  (1L, 3.0, "a"), (2L, -1.0, "b"), (3L, 0.0, "c")

	# Copyright (c) 2017 Cary Kempston

	# Permission is hereby granted, free of charge, to any person obtaining a copy
	# of this software and associated documentation files (the "Software"), to deal
	# in the Software without restriction, including without limitation the rights
	# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
	# copies of the Software, and to permit persons to whom the Software is
	# furnished to do so, subject to the following conditions:

	# The above copyright notice and this permission notice shall be included in all

	# Used to flatten json object while using pandas
	from pandas.io.json import json_normalize

	def flatten_json(y):
	    out = {}

	    def __flatten(x, name=''):
	        if type(x) is dict:
	            for a in x:
	                __flatten(x[a], name + a + '_')
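
The excerpt above stops inside the `dict` branch. A minimal complete sketch of such a flattener might look like the following. The handling of lists (indexing elements by position) and of leaf values (storing them under the accumulated key with the trailing underscore removed) is an assumption, not necessarily what the original code does.

```python
def flatten_json(y):
    """Flatten nested dicts/lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            # Assumption: list elements are keyed by their position.
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing underscore from the key.
            out[name[:-1]] = x

    flatten(y)
    return out

nested = {"a": 1, "b": {"c": 2, "d": [3, {"e": 4}]}}
flat = flatten_json(nested)
print(flat)
```

A list of such flat dicts can then be handed to pandas to build a table with one column per flattened key.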

	%matplotlib inline

	buckets = [-87.0, -15, 0, 30, 120]
	rdd_histogram_data = ml_bucketized_features\
	    .select("ArrDelay")\
	    .rdd\
	    .flatMap(lambda x: x)\
	    .histogram(buckets)

	create_hist(rdd_histogram_data)
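
When `histogram` is given an explicit list of boundaries, it returns those boundaries together with one count per bucket. Every bucket is closed on the left and open on the right, except the last one, which also includes its upper bound. The plain-Python sketch below mimics that counting rule so the bucket semantics are easy to check by hand. `bucket_counts` and the sample delays are hypothetical, and values outside the overall range are assumed to be skipped.

```python
import bisect

def bucket_counts(values, buckets):
    """Count values into [b0, b1), [b1, b2), ..., [b_{n-1}, b_n]."""
    counts = [0] * (len(buckets) - 1)
    lowest, highest = buckets[0], buckets[-1]
    for v in values:
        # Assumption: missing or out-of-range values are ignored.
        if v is None or v < lowest or v > highest:
            continue
        idx = bisect.bisect_right(buckets, v) - 1
        if idx == len(counts):
            # v equals the top boundary, which belongs to the last bucket.
            idx -= 1
        counts[idx] += 1
    return counts

buckets = [-87.0, -15, 0, 30, 120]
delays = [-100, -87, -20, -15, 0, 10, 29.9, 30, 120, 121]  # made-up sample
counts = bucket_counts(delays, buckets)
print(counts)
```

Here `-100` and `121` fall outside the range and are not counted, while `120` lands in the last bucket.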

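	# Bonus: a helper that uses Spark to pull data back to the driver for plotting
	def toArr(df, col, dtype=np.int32):
	    """
	    Collect one column of a DataFrame to the driver and convert it to a numpy.ndarray.

	    df: the source DataFrame
	    col: name of the column to collect
	    dtype: data type of the column
	    return: the column's data, as an np.ndarray
	    """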