thanooj kalathuru thanoojgithub

@thanoojgithub
thanoojgithub / HiveInstallConfig.sh
Last active April 16, 2017 18:28
Apache Hive Install and Configuration
Apache Hive - Installation and Configuration
UBUNTU 14.04 LTS
JAVA - Oracle JDK 8
HADOOP 2.7.3
HIVE 2.1.1
MySQL 5.5 server
1. https://hive.apache.org/downloads.html
@thanoojgithub
thanoojgithub / HowToDeleteFilesListedInATextFile.sh
Created April 1, 2017 14:13
How to delete files listed in a text file
pwd
current_pwd=$(pwd)
cd /home/thanooj/work
pwd
ls -ltr
for f in $(cat /home/thanooj/files/symlink_localized_file_list.txt); do
rm "$f"
done
ls -ltr
cd "$current_pwd"
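Note that the `for f in $(cat …)` loop above word-splits each line, so file names containing spaces would break it. A minimal safer sketch, using hypothetical temp paths in place of the original list file and work directory:

```shell
# Safer deletion loop: 'while IFS= read -r' preserves spaces in names.
# The temp file and directory below are stand-ins for the original
# symlink_localized_file_list.txt and /home/thanooj/work.
list=$(mktemp)
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b c.txt"
printf '%s\n' "$workdir/a.txt" "$workdir/b c.txt" > "$list"

while IFS= read -r f; do
  rm -- "$f"      # '--' guards against names that begin with '-'
done < "$list"

ls -A "$workdir"  # prints nothing: both files are gone
```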
@thanoojgithub
thanoojgithub / Deck.java
Created March 29, 2017 17:46
Java Collection - Standard 52-card deck use case
package com;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
public class Deck {
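The Java source is cut off in the preview; as a quick illustration of the same use case, building and shuffling a 52-card deck can be sketched in shell (the card names and file name here are my own, not from the gist):

```shell
# Build the 52-card deck (13 ranks x 4 suits), then shuffle and deal
# five cards -- the shell analogue of Collections.shuffle on a List.
for rank in 2 3 4 5 6 7 8 9 10 J Q K A; do
  for suit in Hearts Diamonds Clubs Spades; do
    echo "$rank of $suit"
  done
done > deck.txt
wc -l < deck.txt     # 52
shuf deck.txt | head -5
```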
@thanoojgithub
thanoojgithub / WordCountInHive.sql
Last active March 20, 2017 17:03
Word Count in Hive
$ ls -ltr
total 4
-rw-r--r-- 1 thanooj users 3221 Mar 20 05:36 words_count.txt
$ vi words_count.txt
$ cat words_count.txt
Apache Sqoop is a tool designed for efficiently transferring data between structured, semi-structured and unstructured data sources.
Relational databases are examples of structured data sources with a well-defined schema for the data they store.
Cassandra, Hbase are examples of semi-structured data sources.
HDFS is an example of an unstructured data source that Sqoop can support.
With Sqoop, you can import data from a relational database system or a mainframe into HDFS
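The Hive query itself is not shown in this preview (the usual approach is to `split` each line into words, `explode`, then `GROUP BY`). The count can be sanity-checked in plain shell against the input file; the two-line sample below is hypothetical:

```shell
# Word count in shell: one word per line, then group and count --
# mirrors Hive's split/explode followed by GROUP BY word.
printf 'Apache Sqoop is a tool\nApache Hive is a tool\n' > words_count.txt
tr -s ' ' '\n' < words_count.txt | sort | uniq -c | sort -rn
```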
@thanoojgithub
thanoojgithub / NhighestvalueinHive.sql
Created March 20, 2017 16:22
Nth highest value in Hive
hive> CREATE TABLE emp_sal(id INT, salary DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as textfile;
OK
Time taken: 0.261 seconds
hive> LOAD DATA INPATH 'maprfs:/home/thanooj/emp_sal.txt' INTO TABLE emp_sal;
Loading data to table thanooj.emp_sal
Table thanooj.emp_sal stats: [numFiles=1, numRows=0, totalSize=139, rawDataSize=0]
OK
Time taken: 0.504 seconds
hive> select * from emp_sal;
OK
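The SELECT for the Nth-highest salary is cut off above (in Hive it is typically written with DENSE_RANK() or a LIMIT/subquery). The ranking logic can be sketched outside Hive on the same comma-delimited id,salary layout; the sample salaries are hypothetical:

```shell
# Nth-highest (here N=2) distinct salary from id,salary rows:
# sort descending by salary, drop duplicate salaries, take row N.
printf '1,1000\n2,3000\n3,2000\n4,3000\n' > emp_sal.txt
N=2
sort -t, -k2,2 -rn emp_sal.txt \
  | awk -F, '!seen[$2]++' \
  | awk -F, -v n="$N" 'NR == n { print $2 }'   # prints 2000
```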
@thanoojgithub
thanoojgithub / MySQL.sql
Last active March 18, 2017 11:05
MySQL installation and setup in Ubuntu 16.04
1. To see hostname and fully qualified domain name (FQDN), use:
thanooj@thanooj-VirtualBox:~$ hostname
thanooj-VirtualBox
thanooj@thanooj-VirtualBox:~$ hostname -f
thanooj-VirtualBox
2. Update your system:
thanooj@thanooj-VirtualBox:~$ sudo apt-get update
3. Install MySQL
@thanoojgithub
thanoojgithub / org.apache.hadoop.hive.serde2
Created March 12, 2017 12:38
Apache Hadoop Hive Serde2 Notes
Notes:
-------------
external table:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{2})(.{10})(.{30})(.{10})(.{10}).*" )
LOCATION '${hiveconf:path}';
location:maprfs:/externalpath
inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
serializationLib:org.apache.hadoop.hive.serde2.RegexSerDe,
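The effect of that `input.regex` can be checked outside Hive: each `(.{n})` group captures one fixed-width column. A quick sed demonstration on a made-up 2/10/30/10/10-character record (the field contents are my own, not from the gist):

```shell
# Split a fixed-width record into 2/10/30/10/10-character columns,
# exactly as the RegexSerDe pattern above does.
printf '%-2s%-10s%-30s%-10s%-10s\n' '01' 'John' 'Engineering' '2017-03-12' 'IN' \
  | sed -E 's/(.{2})(.{10})(.{30})(.{10})(.{10}).*/\1|\2|\3|\4|\5/'
```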
@thanoojgithub
thanoojgithub / spark_notes.sh
Created February 12, 2017 20:30
apache spark with scala
Apache Spark is a general-purpose computation/execution engine.
It is built on RDDs, which are resilient: lost partitions are recomputed from lineage (using the underlying HDFS for recovery, in its own way).
Transformations produce a new RDD; immutability keeps the data consistent.
Evaluation is lazy until an action is called.
Benefits:
Fault recovery using lineage
Optimized for in-memory computation: computations are placed optimally using a directed acyclic graph (DAG)
Easy programming: apply transformations on RDDs, then call actions
Rich library support: MLlib (machine learning), GraphX, DataFrames, including batch and streaming
@thanoojgithub
thanoojgithub / HadoopConfigureFilesChanges
Created February 11, 2017 14:36
Hadoop configure files changes
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export SCALA_HOME=/home/thanooj/Scala/scala-2.12.1
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin
export HADOOP_HOME=/home/thanooj/bigdata/hadoop-2.7.3
export HADOOP_MAPRED_HOME=/home/thanooj/bigdata/hadoop-2.7.3
export HADOOP_COMMON_HOME=/home/thanooj/bigdata/hadoop-2.7.3
export HADOOP_HDFS_HOME=/home/thanooj/bigdata/hadoop-2.7.3
export YARN_HOME=/home/thanooj/bigdata/hadoop-2.7.3
export HADOOP_CONF_DIR=/home/thanooj/bigdata/hadoop-2.7.3/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=/home/thanooj/bigdata/hadoop-2.7.3/lib/native
1. Install JAVA
2.
thanooj@thanooj-Inspiron-3521:~$ sudo addgroup hadoop
Adding group `hadoop' (GID 1001) ...
Done.
thanooj@thanooj-Inspiron-3521:~$ sudo adduser --ingroup hadoop hadoopuser
Adding user `hadoopuser' ...
Adding new user `hadoopuser' (1001) with group `hadoop' ...