Vinod KC (vinodkc), Databricks, Mountain View

Spark on Docker - HDP3 YARN

  1. Kerberize the cluster

  2. Enable cgroups for YARN and restart the affected services

To enable cgroups on an Ambari cluster, select YARN > Configs on the Ambari dashboard, then enable CPU Isolation under CPU. Click Save, then restart all cluster components that require a restart.
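Behind the scenes, the Ambari CPU Isolation toggle maps onto the standard YARN NodeManager cgroup properties, roughly like the fragment below (values are illustrative; Ambari manages the exact set and values for your cluster):

```properties
# Use the Linux container executor with the cgroups resource handler
yarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
yarn.nodemanager.linux-container-executor.resources-handler.class=org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler

# Where the yarn cgroup hierarchy lives and whether YARN should mount it
yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/yarn
yarn.nodemanager.linux-container-executor.cgroups.mount=true
yarn.nodemanager.linux-container-executor.cgroups.mount-path=/sys/fs/cgroup

# Optional CPU capping (illustrative values)
yarn.nodemanager.resource.percentage-physical-cpu-limit=80
yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage=false
```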

I hit a mount failure on /sys/fs/cgroup/cpu/yarn. Solution: run the fix-up command on all NodeManager hosts (the exact command was not captured in this note).

Log in to the LLAP host node.

A) Test with Spark-shell

Step 1: download the HWC information-collection script.

cd /tmp
wget https://raw.githubusercontent.com/dbompart/hive_warehouse_connector/master/hwc_info_collect.sh
chmod +x hwc_info_collect.sh
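With the configuration values reported by the script, a spark-shell session can then be smoke-tested against HWC. A minimal sketch (the jar path, JDBC URL, and metastore URI are assumptions based on typical HDP 3.x layouts; substitute your own):

```scala
// Launch spark-shell with the HWC assembly jar (path is an assumption):
// spark-shell --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly.jar \
//   --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://<llap-host>:10500/" \
//   --conf spark.datasource.hive.warehouse.metastoreUri="thrift://<metastore-host>:9083"

import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the existing SparkSession.
val hive = HiveWarehouseSession.session(spark).build()

// Simple smoke tests: list databases and run a query through HiveServer2/LLAP.
hive.showDatabases().show()
hive.executeQuery("SELECT current_database()").show()
```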

Spark Structured Streaming HWC integration

1) Setup Kafka topic

cd /usr/hdp/current/kafka-broker/bin/

./kafka-topics.sh --create --zookeeper c420-node2.coelab.cloudera.com:2181 --replication-factor 2 --partitions 3 --topic ss_input
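The ss_input topic created above can then be consumed from Structured Streaming and written into a Hive ACID table through the HWC streaming sink. A sketch under stated assumptions: the broker port (6667, the HDP default), metastore URI, and the target database/table are all placeholders, and the sink format name is taken from the HDP 3.x HWC streaming examples:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ss-hwc-demo").getOrCreate()

// Read the ss_input topic created above (bootstrap server is an assumption).
val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "c420-node2.coelab.cloudera.com:6667")
  .option("subscribe", "ss_input")
  .load()
  .selectExpr("CAST(value AS STRING) AS value")

// Write each micro-batch into a Hive transactional table via the HWC streaming sink.
val query = input.writeStream
  .format("com.hortonworks.spark.sql.hive.llap.streaming.HiveStreamingDataSource")
  .option("database", "default")                                           // assumption
  .option("table", "ss_output")                                            // assumption
  .option("metastoreUri", "thrift://c420-node2.coelab.cloudera.com:9083")  // assumption
  .option("checkpointLocation", "/tmp/ss_hwc_checkpoint")
  .start()
```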

Spark Listener Demo

This demonstrates Spark job, stage, and task listeners.

1) Start spark-shell

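Inside the spark-shell session, the three listener hooks can be sketched with the public SparkListener API as follows (the class and counter names are my own; the println messages are illustrative):

```scala
import org.apache.spark.scheduler._

// A listener that reports job, stage, and task lifecycle events.
class DemoListener extends SparkListener {
  var tasksEnded = 0

  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    println(s"Job ${jobStart.jobId} started with ${jobStart.stageInfos.size} stage(s)")

  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit =
    println(s"Stage ${stageCompleted.stageInfo.stageId} completed " +
      s"(${stageCompleted.stageInfo.numTasks} task(s))")

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    tasksEnded += 1
    println(s"Task ended in stage ${taskEnd.stageId}; tasks seen so far: $tasksEnded")
  }
}

val listener = new DemoListener
spark.sparkContext.addSparkListener(listener)

// Trigger a small job with 4 partitions so the listener fires.
spark.range(0, 100, 1, 4).count()
```

Listener events are delivered asynchronously on the listener bus, so the printed messages may arrive slightly after `count()` returns.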


How to use a Hive built-in UDF in Spark SQL

./spark-shell --jars /usr/hdp/current/hive-server2/lib/hive-exec.jar

import org.apache.spark.sql.functions.col
(1 to 10).toDF("col1").withColumn("col2", col("col1")).createOrReplaceTempView("table1")
spark.sql("CREATE TEMPORARY FUNCTION genericUDFAbsFromHive AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAbs'")
spark.sql("SELECT genericUDFAbsFromHive(col1 - 2000) AS absCol1, col2 FROM table1").show(false)