Vinod KC (vinodkc), Databricks, Mountain View

Spark on Docker - HDP3 YARN

  1. Kerberize the cluster

  2. Enable cgroups for YARN and restart the affected services

To enable cgroups on an Ambari cluster, select YARN > Configs on the Ambari dashboard, then enable CPU Isolation under CPU. Click Save, then restart all cluster components that require a restart.
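Behind the scenes, the Ambari CPU Isolation toggle maps onto the standard YARN NodeManager cgroup properties, roughly like the fragment below (values are illustrative; Ambari manages the exact set and values for your cluster):

```properties
# Use the Linux container executor with the cgroups resource handler
yarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
yarn.nodemanager.linux-container-executor.resources-handler.class=org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler

# Where the yarn cgroup hierarchy lives and whether YARN should mount it
yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/yarn
yarn.nodemanager.linux-container-executor.cgroups.mount=true
yarn.nodemanager.linux-container-executor.cgroups.mount-path=/sys/fs/cgroup

# Optional CPU capping (illustrative values)
yarn.nodemanager.resource.percentage-physical-cpu-limit=80
yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage=false
```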

I hit a mount failure on /sys/fs/cgroup/cpu/yarn. Solution: run the fix-up command on all NodeManager hosts (the exact command was not captured in this note).

Log in to the LLAP host node.

A) Test with Spark-shell

Step 1: download the HWC information-collection script.

cd /tmp
wget https://raw.githubusercontent.com/dbompart/hive_warehouse_connector/master/hwc_info_collect.sh
chmod +x hwc_info_collect.sh
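With the configuration values reported by the script, a spark-shell session can then be smoke-tested against HWC. A minimal sketch (the jar path, JDBC URL, and metastore URI are assumptions based on typical HDP 3.x layouts; substitute your own):

```scala
// Launch spark-shell with the HWC assembly jar (path is an assumption):
// spark-shell --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly.jar \
//   --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://<llap-host>:10500/" \
//   --conf spark.datasource.hive.warehouse.metastoreUri="thrift://<metastore-host>:9083"

import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the existing SparkSession.
val hive = HiveWarehouseSession.session(spark).build()

// Simple smoke tests: list databases and run a query through HiveServer2/LLAP.
hive.showDatabases().show()
hive.executeQuery("SELECT current_database()").show()
```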

Spark Structured Streaming HWC integration

1) Setup Kafka topic

cd /usr/hdp/current/kafka-broker/bin/

./kafka-topics.sh --create --zookeeper c420-node2.coelab.cloudera.com:2181 --replication-factor 2 --partitions 3 --topic ss_input
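The ss_input topic created above can then be consumed from Structured Streaming and written into a Hive ACID table through the HWC streaming sink. A sketch under stated assumptions: the broker port (6667, the HDP default), metastore URI, and the target database/table are all placeholders, and the sink format name is taken from the HDP 3.x HWC streaming examples:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ss-hwc-demo").getOrCreate()

// Read the ss_input topic created above (bootstrap server is an assumption).
val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "c420-node2.coelab.cloudera.com:6667")
  .option("subscribe", "ss_input")
  .load()
  .selectExpr("CAST(value AS STRING) AS value")

// Write each micro-batch into a Hive transactional table via the HWC streaming sink.
val query = input.writeStream
  .format("com.hortonworks.spark.sql.hive.llap.streaming.HiveStreamingDataSource")
  .option("database", "default")                                           // assumption
  .option("table", "ss_output")                                            // assumption
  .option("metastoreUri", "thrift://c420-node2.coelab.cloudera.com:9083")  // assumption
  .option("checkpointLocation", "/tmp/ss_hwc_checkpoint")
  .start()
```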

Spark Listener Demo

This demonstrates Spark job, stage, and task listeners.

1) Start spark-shell

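Inside the spark-shell session, the three listener hooks can be sketched with the public SparkListener API as follows (the class and counter names are my own; the println messages are illustrative):

```scala
import org.apache.spark.scheduler._

// A listener that reports job, stage, and task lifecycle events.
class DemoListener extends SparkListener {
  var tasksEnded = 0

  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    println(s"Job ${jobStart.jobId} started with ${jobStart.stageInfos.size} stage(s)")

  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit =
    println(s"Stage ${stageCompleted.stageInfo.stageId} completed " +
      s"(${stageCompleted.stageInfo.numTasks} task(s))")

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    tasksEnded += 1
    println(s"Task ended in stage ${taskEnd.stageId}; tasks seen so far: $tasksEnded")
  }
}

val listener = new DemoListener
spark.sparkContext.addSparkListener(listener)

// Trigger a small job with 4 partitions so the listener fires.
spark.range(0, 100, 1, 4).count()
```

Listener events are delivered asynchronously on the listener bus, so the printed messages may arrive slightly after `count()` returns.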


How to use a Hive built-in UDF in Spark SQL

./spark-shell --jars /usr/hdp/current/hive-server2/lib/hive-exec.jar

import org.apache.spark.sql.functions.col
(1 to 10).toDF("col1").withColumn("col2", col("col1")).createOrReplaceTempView("table1")
spark.sql("CREATE TEMPORARY FUNCTION genericUDFAbsFromHive AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAbs'")
spark.sql("SELECT genericUDFAbsFromHive(col1 - 2000) AS absCol1, col2 FROM table1").show(false)