thanooj kalathuru (thanoojgithub)
thanoojgithub / How to find java.home
Last active August 9, 2020 07:06
How to find java.home
For Linux and macOS, let's use grep:
java -XshowSettings:properties -version 2>&1 > /dev/null | grep 'java.home'
And for Windows, let's use findstr:
java -XshowSettings:properties -version 2>&1 | findstr "java.home"
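The same lookup can be done without grep/findstr. As a sketch, a small (hypothetical) Python helper can parse the properties text that `java -XshowSettings:properties -version` writes to stderr; the sample output below is illustrative, not captured from a real JVM:

```python
# Hypothetical helper: pull the "java.home" entry out of the
# key = value lines printed by `java -XshowSettings:properties -version`.
def find_java_home(settings_output):
    for line in settings_output.splitlines():
        key, sep, value = line.strip().partition(" = ")
        if sep and key == "java.home":
            return value
    return None

# Illustrative sample of the settings output (paths are made up):
sample = """
    java.class.version = 55.0
    java.home = /usr/lib/jvm/java-11-openjdk-amd64
    java.io.tmpdir = /tmp
"""
print(find_java_home(sample))  # /usr/lib/jvm/java-11-openjdk-amd64
```

To run it against a live JVM, the `sample` string would be replaced with the stderr of a `subprocess.run(["java", "-XshowSettings:properties", "-version"], ...)` call.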
----------------------------------------------------------------------------------------------------
thanoojgithub / Spring Boot notes
Created September 19, 2020 11:41
Spring Boot notes
Spring Boot is an open-source Java-based framework used to develop stand-alone, production-grade Spring applications that you can just run.
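A minimal sketch of what "just run" means in practice: Spring Boot ships opinionated defaults, and an `application.properties` file only overrides the ones you care about. The values below are hypothetical placeholders, not from the original notes:

```properties
# Hypothetical overrides -- every key is a standard Spring Boot property,
# but the values here are illustrative only.
spring.application.name=demo-app
server.port=8081
logging.level.root=INFO
```

Everything not listed keeps Spring Boot's default (for example, the embedded Tomcat server would otherwise listen on port 8080).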
thanoojgithub / MySQL_Notes.sql
Last active April 22, 2022 09:47
MySQL Notes
# MySQL installation in WSL2 ubuntu
# How to access mysql with default password in Ubuntu 20.04 ::
--------------------------------------------------------
sudo apt update
sudo apt upgrade
sudo apt install mysql-server
sudo apt install mysql-client
mysql --version
sudo usermod -d /var/lib/mysql/ mysql
thanoojgithub / SparkSQLOne
Created November 4, 2020 05:58
Spark SQL notes
start-dfs.sh
start-yarn.sh
jps
sudo mkdir /tmp/spark-events
sudo chown -R hduser:hadoop /tmp/spark-events
hduser@thanoojubuntu-Inspiron-3521: start-master.sh
hduser@thanoojubuntu-Inspiron-3521: start-slave.sh spark://thanoojubuntu-Inspiron-3521:7077
starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-2.4.6-bin-hadoop2.7/logs/spark-hduser-org.apache.spark.deploy.worker.Worker-1-thanoojubuntu-Inspiron-3521.out
hduser@thanoojubuntu-Inspiron-3521:/tmp$ spark-shell --master spark://thanoojubuntu-Inspiron-3521:7077
thanoojgithub / gist:528cbcc042998068add7261dfec93124
Created November 20, 2020 17:07
How to run HiveServer2 (Hive 2.3.3) on ubuntu 20.04
Pre-requisites:
1. Hadoop and Hive are installed and well configured
2. the hive CLI works as expected
Then, we can try running HiveServer2:
hduser@thanoojubuntu-Inspiron-3521:~/softwares/apache-hive-2.3.3-bin/conf$ hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10000 --hiveconf hive.root.logger=INFO,console
2020-11-20 21:25:32: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hduser/softwares/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
thanoojgithub / AutoGenClassUtil.java
Created November 24, 2020 18:45
Basic Auto Gen Class Util
package com.autogenclass;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
thanoojgithub / pysparkstreamingusingnc.py
Created November 27, 2020 07:49
pyspark streaming using netcat as a socket text source
hduser@thanoojubuntu-Inspiron-3521:~$ nc -lk 9999
helllo word hello python hello spark hello pyspark hellow streaming pyspark
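The actual PySpark job is not shown in this preview, but a socket-text-stream word count boils down to splitting each batch into words and reducing by key. A plain-Python sketch of that per-batch logic, run over the sample line above (typos preserved verbatim):

```python
from collections import Counter

# The sample line typed into `nc -lk 9999` above.
batch = "helllo word hello python hello spark hello pyspark hellow streaming pyspark"

# What flatMap(split) -> map(word, 1) -> reduceByKey(add) amounts to per batch:
counts = Counter(batch.split())
print(counts["hello"])    # 3
print(counts["pyspark"])  # 2
```

In the real job, the `batch` string arrives via `ssc.socketTextStream("localhost", 9999)` and the counting is distributed, but the arithmetic per batch is the same.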
thanoojgithub / KafkaSampleProducer.java
Last active November 30, 2020 16:23
Kafka SampleProducer in java
package com.kafkaconnectone;
import java.util.Map.Entry;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
thanoojgithub / WhySparkSQLOverHiveQL.txt
Created December 4, 2020 18:04
Why Spark SQL Over Hive QL
By default, Hive uses the MR execution engine, but it can be set to Tez or even the Spark engine (in-memory computation).
But,
Hive has the SQL-like HiveQL (HQL), which is most useful when you are a SQL developer
even though we have UDFs, there is no extra backyard area to do core/complex business logic
whereas Spark has Spark SQL, and we can move from DF to RDD and RDD to DF to perform core/complex business logic
Hive has no resume capability
Hive cannot drop encrypted databases
thanoojgithub / SparkWithHiveUsingPython.py
Created December 12, 2020 17:37
spark with hive using python
import subprocess
from pyspark.sql import functions as f
from operator import add
from pyspark.sql import Row, SparkSession
from pyspark.sql.types import StructField, StringType, StructType
def sparkwithhiveone():
    sparkwithhive = getsparkwithhive()
    try:
        assert (sparkwithhive.conf.get("spark.sql.catalogImplementation") == "hive")