thanooj kalathuru (thanoojgithub)
@thanoojgithub
thanoojgithub / pysparkstreamingusingnc.py
Created November 27, 2020 07:49
pyspark streaming using netcat as a socket text source
hduser@thanoojubuntu-Inspiron-3521:~$ nc -lk 9999
helllo word hello python hello spark hello pyspark hellow streaming pyspark
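The preview above shows only the netcat side of the gist. The per-batch word count such a streaming job typically computes can be sketched in plain Python; this mirrors the usual flatMap/map/reduceByKey chain (in actual PySpark the input would come from something like `ssc.socketTextStream("localhost", 9999)` — names and structure here are illustrative, not the gist's code):

```python
from collections import Counter

def word_counts(batch_lines):
    """Count words across one micro-batch of text lines,
    mirroring the classic flatMap -> map -> reduceByKey word count."""
    words = (w for line in batch_lines for w in line.split())
    return Counter(words)

# One simulated micro-batch, as typed into `nc -lk 9999` above.
batch = ["helllo word hello python hello spark hello pyspark"]
counts = word_counts(batch)
```

In PySpark the same logic would be applied to every batch of the DStream rather than to a Python list.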
thanoojgithub / AutoGenClassUtil.java
Created November 24, 2020 18:45
Basic Auto Gen Class Util
package com.autogenclass;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
thanoojgithub / gist:528cbcc042998068add7261dfec93124
Created November 20, 2020 17:07
How to run HiveServer2 (Hive 2.3.3) on Ubuntu 20.04
Pre-requisites:
1. Hadoop and Hive are installed and well configured
2. The hive CLI works as expected
Then we can try running HiveServer2:
hduser@thanoojubuntu-Inspiron-3521:~/softwares/apache-hive-2.3.3-bin/conf$ hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10000 --hiveconf hive.root.logger=INFO,console
2020-11-20 21:25:32: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hduser/softwares/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
thanoojgithub / SparkSQLOne
Created November 4, 2020 05:58
Spark SQL notes
start-dfs.sh
start-yarn.sh
jps
sudo mkdir /tmp/spark-events
sudo chown -R hduser:hadoop /tmp/spark-events
hduser@thanoojubuntu-Inspiron-3521: start-master.sh
hduser@thanoojubuntu-Inspiron-3521: start-slave.sh spark://thanoojubuntu-Inspiron-3521:7077
starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-2.4.6-bin-hadoop2.7/logs/spark-hduser-org.apache.spark.deploy.worker.Worker-1-thanoojubuntu-Inspiron-3521.out
hduser@thanoojubuntu-Inspiron-3521:/tmp$ spark-shell --master spark://thanoojubuntu-Inspiron-3521:7077
thanoojgithub / MySQL_Notes.sql
Last active April 22, 2022 09:47
MySQL Notes
# MySQL installation in WSL2 ubuntu
# How to access mysql with the default password in Ubuntu 20.04:
--------------------------------------------------------
sudo apt update
sudo apt upgrade
sudo apt install mysql-server
sudo apt install mysql-client
mysql --version
sudo usermod -d /var/lib/mysql/ mysql
thanoojgithub / Spring Boot notes
Created September 19, 2020 11:41
Spring Boot notes
Spring Boot is an open-source, Java-based framework used to develop stand-alone, production-grade Spring applications that you can just run.
thanoojgithub / How to find java.home
Last active August 9, 2020 07:06
How to find java.home
For Linux and macOS, let's use grep:
java -XshowSettings:properties -version 2>&1 > /dev/null | grep 'java.home'
And for Windows, let's use findstr:
java -XshowSettings:properties -version 2>&1 | findstr "java.home"
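The grep/findstr filtering step can also be done portably in a few lines of Python. This sketch only parses the `-XshowSettings:properties` output format; a hard-coded sample string stands in for actually invoking `java`, and the path in it is illustrative:

```python
def find_java_home(settings_output):
    """Extract the java.home value from `java -XshowSettings:properties` output."""
    for line in settings_output.splitlines():
        line = line.strip()
        if line.startswith("java.home"):
            # Lines look like: "java.home = /usr/lib/jvm/java-11-openjdk-amd64"
            return line.split("=", 1)[1].strip()
    return None

sample = """
    java.class.version = 55.0
    java.home = /usr/lib/jvm/java-11-openjdk-amd64
    java.io.tmpdir = /tmp
"""
home = find_java_home(sample)
```

To use it for real, feed it the stderr of `java -XshowSettings:properties -version` (e.g. via `subprocess.run(..., capture_output=True)`).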
----------------------------------------------------------------------------------------------------
thanoojgithub / docker-mysql-standalone
Last active June 20, 2020 07:29
docker-mysql connecting using mysql workbench
PS C:\Users\thanooj> docker pull mysql
PS C:\Users\thanooj> docker images
REPOSITORY                       TAG      IMAGE ID       CREATED        SIZE
springio/gs-spring-boot-docker   latest   c8778cb72ef5   5 days ago     527MB
openjdk                          8        b190ad78b520   10 days ago    510MB
mysql                            latest   be0dbf01a0f3   11 days ago    541MB
hello-world                      latest   bf756fb1ae65   5 months ago   13.3kB
PS C:\Users\thanooj>
PS C:\Users\thanooj> docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed. Internally, Spark SQL uses this extra information to perform extra optimizations. There are several ways to interact with Spark SQL including SQL and the Dataset API. When computing a result the same execution engine is used, independent of which API/language you are using to express the computation. This unification means that developers can easily switch back and forth between different APIs based on which provides the most natural way to express a given transformation.
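As an analogy for that unification (using the Python stdlib, not Spark itself): the same query expressed as SQL and as a programmatic API call over the same data should yield the same answer. Here sqlite3 plays the SQL side and a comprehension plays the "Dataset-style" side; the data and names are made up for illustration:

```python
import sqlite3

rows = [("alice", 34), ("bob", 36), ("carol", 30)]

# SQL path: declare the query as text and let the engine run it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", rows)
sql_result = sorted(conn.execute(
    "SELECT name FROM people WHERE age > 32").fetchall())

# API path: the same filter and projection expressed programmatically.
api_result = sorted((name,) for name, age in rows if age > 32)

assert sql_result == api_result  # same logical query, same answer
```

In Spark the two paths would be `spark.sql("SELECT name FROM people WHERE age > 32")` and roughly `df.filter(df.age > 32).select("name")`, and, as the paragraph above notes, both compile to the same execution engine.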
thanoojgithub / HadoopHiveSparkHBase
Last active February 11, 2020 07:42
Hadoop Hive Spark configuration on Ubuntu 16.04
sudo apt-get install ssh
sudo apt-get install rsync
sudo apt install openssh-client
sudo apt install openssh-server
ssh localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys