thanooj kalathuru thanoojgithub

@thanoojgithub
thanoojgithub / org.apache.hadoop.hive.serde2
Created March 12, 2017 12:38
Apache Hadoop Hive Serde2 Notes
Notes:
-------------
external table:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{2})(.{10})(.{30})(.{10})(.{10}).*" )
LOCATION '${hiveconf:path}';
location:maprfs:/externalpath
inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
serializationLib:org.apache.hadoop.hive.serde2.RegexSerDe,
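The regex in the SERDEPROPERTIES above slices each fixed-width record into five capture groups of 2, 10, 30, 10 and 10 characters. A minimal Python sketch (sample record and field values are hypothetical) to check the pattern outside Hive:

```python
import re

# Same fixed-width pattern as in the SERDEPROPERTIES above:
# five capture groups of 2, 10, 30, 10 and 10 characters.
pattern = re.compile(r"(.{2})(.{10})(.{30})(.{10})(.{10}).*")

# Hypothetical 62-character fixed-width record (widths 2/10/30/10/10).
line = ("01"
        + "JOHN".ljust(10)
        + "SOFTWARE ENGINEER".ljust(30)
        + "2017-03-12".ljust(10)
        + "50000.00".ljust(10))

m = pattern.match(line)
fields = [g.strip() for g in m.groups()]
print(fields)  # ['01', 'JOHN', 'SOFTWARE ENGINEER', '2017-03-12', '50000.00']
```

Records shorter than 62 characters will not match, which is how RegexSerDe ends up emitting NULL columns for malformed rows.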
thanoojgithub / MySQL.sql
Last active March 18, 2017 11:05
MySQL installation and setup in Ubuntu 16.04
1. To see hostname and fully qualified domain name (FQDN), use:
thanooj@thanooj-VirtualBox:~$ hostname
thanooj-VirtualBox
thanooj@thanooj-VirtualBox:~$ hostname -f
thanooj-VirtualBox
2. Update your system:
thanooj@thanooj-VirtualBox:~$ sudo apt-get update
3. Install MySQL
thanoojgithub / NhighestvalueinHive.sql
Created March 20, 2017 16:22
Nth highest value in Hive
hive> CREATE TABLE emp_sal(id INT, salary DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as textfile;
OK
Time taken: 0.261 seconds
hive> LOAD DATA INPATH 'maprfs:/home/thanooj/emp_sal.txt' INTO TABLE emp_sal;
Loading data to table thanooj.emp_sal
Table thanooj.emp_sal stats: [numFiles=1, numRows=0, totalSize=139, rawDataSize=0]
OK
Time taken: 0.504 seconds
hive> select * from emp_sal;
OK
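The preview stops before the query itself; in Hive the Nth highest value is typically found with DENSE_RANK() over a descending sort. The underlying logic can be sketched in Python (sample rows are hypothetical, mirroring the emp_sal table):

```python
def nth_highest(salaries, n):
    """Return the nth highest distinct salary (1-based), or None if there
    are fewer than n distinct values -- mirrors DENSE_RANK semantics,
    where ties share a rank."""
    distinct = sorted(set(salaries), reverse=True)
    return distinct[n - 1] if n <= len(distinct) else None

# Hypothetical sample data for emp_sal (id, salary).
rows = [(1, 1000.0), (2, 3000.0), (3, 2000.0), (4, 3000.0), (5, 1500.0)]
print(nth_highest([s for _, s in rows], 2))  # 2000.0 (3000.0 ties count once)
```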
thanoojgithub / WordCountInHive.sql
Last active March 20, 2017 17:03
Word Count in Hive
$ ls -ltr
total 4
-rw-r--r-- 1 thanooj users 3221 Mar 20 05:36 words_count.txt
$ vi words_count.txt
$ cat words_count.txt
Apache Sqoop is a tool designed for efficiently transferring data between structured, semi-structured and unstructured data sources.
Relational databases are examples of structured data sources with well defined schema for the data they store.
Cassandra, HBase are examples of semi-structured data sources.
HDFS is an example of unstructured data source that Sqoop can support.
With Sqoop, you can import data from a relational database system or a mainframe into HDFS
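The preview cuts off before the word-count query; in Hive it is usually written by splitting each line on whitespace and exploding the tokens, then grouping by word. The same logic can be sketched in Python with a Counter (sample text is a hypothetical stand-in for words_count.txt):

```python
from collections import Counter

# Word count mirroring the usual Hive approach: split each line on
# whitespace, explode the tokens, group by word and count.
text = """Apache Sqoop is a tool designed for efficiently transferring data
Relational databases are examples of structured data sources."""

counts = Counter(text.split())
print(counts["data"])  # 2
```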
thanoojgithub / Deck.java
Created March 29, 2017 17:46
Java Collection - Standard 52-card deck use case
package com;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
public class Deck {
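The Java preview cuts off at the class declaration. The 52-card deck use case (build rank-by-suit combinations, shuffle, deal) can be sketched in Python; names here are illustrative, not the gist's own:

```python
import random

# Minimal 52-card deck sketch mirroring the Java use case above:
# build rank x suit combinations, shuffle, and deal a hand.
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["Hearts", "Diamonds", "Clubs", "Spades"]

deck = [f"{rank} of {suit}" for suit in SUITS for rank in RANKS]
random.shuffle(deck)   # in-place Fisher-Yates shuffle
hand, deck = deck[:5], deck[5:]   # deal a five-card hand
print(len(hand), len(deck))  # 5 47
```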
thanoojgithub / HowToDeleteFilesListedInATextFile.sh
Created April 1, 2017 14:13
How to delete files listed in a text file
pwd
current_pwd=$(pwd)
cd /home/thanooj/work
pwd
ls -ltr
# read one filename per line; unlike `for f in $(cat ...)`, this
# handles names containing spaces
while IFS= read -r f; do
  rm -- "$f"
done < /home/thanooj/files/symlink_localized_file_list.txt
ls -ltr
cd "$current_pwd"
thanoojgithub / HiveInstallConfig.sh
Last active April 16, 2017 18:28
Apache Hive Install and Configuration
Apache Hive - Installation and Configuration
UBUNTU 14.04 LTS
JAVA - Oracle JDK 8
HADOOP 2.7.3
HIVE 2.1.1
MySQL 5.5 server
1.
https://hive.apache.org/downloads.html
thanoojgithub / HadoopHiveSparkHBase
Last active February 11, 2020 07:42
Hadoop Hive Spark configuration on Ubuntu 16.04
sudo apt-get install ssh
sudo apt-get install rsync
sudo apt install openssh-client
sudo apt install openssh-server
ssh localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed. Internally, Spark SQL uses this extra information to perform extra optimizations. There are several ways to interact with Spark SQL including SQL and the Dataset API. When computing a result the same execution engine is used, independent of which API/language you are using to express the computation. This unification means that developers can easily switch back and forth between different APIs based on which provides the most natural way to express a given transformation.
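That unification can be seen in a short sketch, assuming a PySpark installation (requires a Spark runtime, so it is not runnable standalone); the same filter is expressed once in SQL and once through the DataFrame API, and both compile to the same logical plan:

```python
# Hedged sketch, assumes pyspark is installed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("demo").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])
df.createOrReplaceTempView("t")

via_sql = spark.sql("SELECT id FROM t WHERE id > 1")   # SQL interface
via_api = df.select("id").where(df.id > 1)             # DataFrame API
assert via_sql.collect() == via_api.collect()          # same result, same engine
```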
thanoojgithub / docker-mysql-standalone
Last active June 20, 2020 07:29
docker-mysql connecting using mysql workbench
PS C:\Users\thanooj> docker pull mysql
PS C:\Users\thanooj> docker images
REPOSITORY                       TAG       IMAGE ID       CREATED        SIZE
springio/gs-spring-boot-docker   latest    c8778cb72ef5   5 days ago     527MB
openjdk                          8         b190ad78b520   10 days ago    510MB
mysql                            latest    be0dbf01a0f3   11 days ago    541MB
hello-world                      latest    bf756fb1ae65   5 months ago   13.3kB
PS C:\Users\thanooj>
PS C:\Users\thanooj> docker container ls -a
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
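The listing stops before the container is started. A minimal sketch of running the pulled mysql image so MySQL Workbench on the host can connect to 127.0.0.1:3306 (the container name and root password below are illustrative, not from the gist):

```shell
# Run a standalone MySQL container, publish port 3306 to the host,
# and verify it is up; Workbench then connects to 127.0.0.1:3306 as root.
docker run --name some-mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -p 3306:3306 -d mysql
docker container ls
```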