thanooj kalathuru (thanoojgithub)
thanoojgithub / How to find java.home
Last active August 9, 2020 07:06
How to find java.home
For Linux and macOS, let's use grep:
java -XshowSettings:properties -version 2>&1 > /dev/null | grep 'java.home'
And for Windows, let's use findstr:
java -XshowSettings:properties -version 2>&1 | findstr "java.home"
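The same lookup can be done without grep/findstr. As a sketch, a small (hypothetical) Python helper can parse the properties text that `java -XshowSettings:properties -version` writes to stderr; the sample output below is illustrative, not captured from a real JVM:

```python
# Hypothetical helper: pull the "java.home" entry out of the
# key = value lines printed by `java -XshowSettings:properties -version`.
def find_java_home(settings_output):
    for line in settings_output.splitlines():
        key, sep, value = line.strip().partition(" = ")
        if sep and key == "java.home":
            return value
    return None

# Illustrative sample of the settings output (paths are made up):
sample = """
    java.class.version = 55.0
    java.home = /usr/lib/jvm/java-11-openjdk-amd64
    java.io.tmpdir = /tmp
"""
print(find_java_home(sample))  # /usr/lib/jvm/java-11-openjdk-amd64
```

To run it against a live JVM, the `sample` string would be replaced with the stderr of a `subprocess.run(["java", "-XshowSettings:properties", "-version"], ...)` call.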
----------------------------------------------------------------------------------------------------
thanoojgithub / Spring Boot notes
Created September 19, 2020 11:41
Spring Boot notes
Spring Boot is an open-source Java-based framework used to develop stand-alone, production-grade Spring applications that you can just run.
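A minimal sketch of what "just run" means in practice: Spring Boot ships opinionated defaults, and an `application.properties` file only overrides the ones you care about. The values below are hypothetical placeholders, not from the original notes:

```properties
# Hypothetical overrides -- every key is a standard Spring Boot property,
# but the values here are illustrative only.
spring.application.name=demo-app
server.port=8081
logging.level.root=INFO
```

Everything not listed keeps Spring Boot's default (for example, the embedded Tomcat server would otherwise listen on port 8080).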
thanoojgithub / MySQL_Notes.sql
Last active April 22, 2022 09:47
MySQL Notes
# MySQL installation in WSL2 ubuntu
# How to access mysql with default password in Ubuntu 20.04 ::
--------------------------------------------------------
sudo apt update
sudo apt upgrade
sudo apt install mysql-server
sudo apt install mysql-client
mysql --version
sudo usermod -d /var/lib/mysql/ mysql
thanoojgithub / SparkSQLOne
Created November 4, 2020 05:58
Spark SQL notes
start-dfs.sh
start-yarn.sh
jps
sudo mkdir /tmp/spark-events
sudo chown -R hduser:hadoop /tmp/spark-events
hduser@thanoojubuntu-Inspiron-3521: start-master.sh
hduser@thanoojubuntu-Inspiron-3521: start-slave.sh spark://thanoojubuntu-Inspiron-3521:7077
starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-2.4.6-bin-hadoop2.7/logs/spark-hduser-org.apache.spark.deploy.worker.Worker-1-thanoojubuntu-Inspiron-3521.out
hduser@thanoojubuntu-Inspiron-3521:/tmp$ spark-shell --master spark://thanoojubuntu-Inspiron-3521:7077
thanoojgithub / gist:528cbcc042998068add7261dfec93124
Created November 20, 2020 17:07
How to run HiveServer2 (Hive 2.3.3) on ubuntu 20.04
Pre-requisites:
1. Hadoop and Hive are installed and well configured
2. the hive CLI works as expected
Then, we can try running HiveServer2:
hduser@thanoojubuntu-Inspiron-3521:~/softwares/apache-hive-2.3.3-bin/conf$ hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10000 --hiveconf hive.root.logger=INFO,console
2020-11-20 21:25:32: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hduser/softwares/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
thanoojgithub / AutoGenClassUtil.java
Created November 24, 2020 18:45
Basic Auto Gen Class Util
package com.autogenclass;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
thanoojgithub / pysparkstreamingusingnc.py
Created November 27, 2020 07:49
pyspark streaming using netcat as a socket text source
hduser@thanoojubuntu-Inspiron-3521:~$ nc -lk 9999
helllo word hello python hello spark hello pyspark hellow streaming pyspark
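The actual PySpark job is not shown in this preview, but a socket-text-stream word count boils down to splitting each batch into words and reducing by key. A plain-Python sketch of that per-batch logic, run over the sample line above (typos preserved verbatim):

```python
from collections import Counter

# The sample line typed into `nc -lk 9999` above.
batch = "helllo word hello python hello spark hello pyspark hellow streaming pyspark"

# What flatMap(split) -> map(word, 1) -> reduceByKey(add) amounts to per batch:
counts = Counter(batch.split())
print(counts["hello"])    # 3
print(counts["pyspark"])  # 2
```

In the real job, the `batch` string arrives via `ssc.socketTextStream("localhost", 9999)` and the counting is distributed, but the arithmetic per batch is the same.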
thanoojgithub / KafkaSampleProducer.java
Last active November 30, 2020 16:23
Kafka SampleProducer in java
package com.kafkaconnectone;
import java.util.Map.Entry;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
thanoojgithub / WhySparkSQLOverHiveQL.txt
Created December 4, 2020 18:04
Why Spark SQL Over Hive QL
By default, Hive uses the MR execution engine, but it can be set to Tez or even the Spark engine (in-memory computation).
But,
Hive has the SQL-like HiveQL (HQL), which is most useful when you are a SQL developer
even though we have UDFs, there is no extra backyard area to do core/complex business logic
whereas Spark has Spark SQL, and we can move from DF to RDD and RDD to DF to perform core/complex business logic
Hive has no resume capability
Hive cannot drop encrypted databases
thanoojgithub / SparkWithHiveUsingPython.py
Created December 12, 2020 17:37
spark with hive using python
import subprocess
from pyspark.sql import functions as f
from operator import add
from pyspark.sql import Row, SparkSession
from pyspark.sql.types import StructField, StringType, StructType
def sparkwithhiveone():
    sparkwithhive = getsparkwithhive()
    try:
        assert (sparkwithhive.conf.get("spark.sql.catalogImplementation") == "hive")