thanooj kalathuru thanoojgithub

@thanoojgithub
thanoojgithub / HiveInstallConfig.sh
Last active April 16, 2017 18:28
Apache Hive Install and Configuration
Apache Hive - Installation and Configuration
UBUNTU 14.04 LTS
JAVA - Oracle JDK 8
HADOOP 2.7.3
HIVE 2.1.1
MySQL 5.5 server
1. https://hive.apache.org/downloads.html
@thanoojgithub
thanoojgithub / HowToDeleteFilesListedInATextFile.sh
Created April 1, 2017 14:13
How to delete files listed in a text file
pwd
current_pwd=$(pwd)
cd /home/thanooj/work
pwd
ls -ltr
for f in $(cat /home/thanooj/files/symlink_localized_file_list.txt); do
rm "$f"
done
ls -ltr
cd "$current_pwd"
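Note that the `for f in $(cat …)` loop above word-splits each line, so file names containing spaces would break it. A minimal safer sketch, using hypothetical temp paths in place of the original list file and work directory:

```shell
# Safer deletion loop: 'while IFS= read -r' preserves spaces in names.
# The temp file and directory below are stand-ins for the original
# symlink_localized_file_list.txt and /home/thanooj/work.
list=$(mktemp)
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b c.txt"
printf '%s\n' "$workdir/a.txt" "$workdir/b c.txt" > "$list"

while IFS= read -r f; do
  rm -- "$f"      # '--' guards against names that begin with '-'
done < "$list"

ls -A "$workdir"  # prints nothing: both files are gone
```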
@thanoojgithub
thanoojgithub / Deck.java
Created March 29, 2017 17:46
Java Collection - Standard 52-card deck use case
package com;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
public class Deck {
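The Java source is cut off in the preview; as a quick illustration of the same use case, building and shuffling a 52-card deck can be sketched in shell (the card names and file name here are my own, not from the gist):

```shell
# Build the 52-card deck (13 ranks x 4 suits), then shuffle and deal
# five cards -- the shell analogue of Collections.shuffle on a List.
for rank in 2 3 4 5 6 7 8 9 10 J Q K A; do
  for suit in Hearts Diamonds Clubs Spades; do
    echo "$rank of $suit"
  done
done > deck.txt
wc -l < deck.txt     # 52
shuf deck.txt | head -5
```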
@thanoojgithub
thanoojgithub / WordCountInHive.sql
Last active March 20, 2017 17:03
Word Count in Hive
$ ls -ltr
total 4
-rw-r--r-- 1 thanooj users 3221 Mar 20 05:36 words_count.txt
$ vi words_count.txt
$ cat words_count.txt
Apache Sqoop is a tool designed for efficiently transferring data between structured, semi-structured and unstructured data sources.
Relational databases are examples of structured data sources with a well-defined schema for the data they store.
Cassandra, Hbase are examples of semi-structured data sources.
HDFS is an example of an unstructured data source that Sqoop can support.
With Sqoop, you can import data from a relational database system or a mainframe into HDFS
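The Hive query itself is not shown in this preview (the usual approach is to `split` each line into words, `explode`, then `GROUP BY`). The count can be sanity-checked in plain shell against the input file; the two-line sample below is hypothetical:

```shell
# Word count in shell: one word per line, then group and count --
# mirrors Hive's split/explode followed by GROUP BY word.
printf 'Apache Sqoop is a tool\nApache Hive is a tool\n' > words_count.txt
tr -s ' ' '\n' < words_count.txt | sort | uniq -c | sort -rn
```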
@thanoojgithub
thanoojgithub / NhighestvalueinHive.sql
Created March 20, 2017 16:22
Nth highest value in Hive
hive> CREATE TABLE emp_sal(id INT, salary DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as textfile;
OK
Time taken: 0.261 seconds
hive> LOAD DATA INPATH 'maprfs:/home/thanooj/emp_sal.txt' INTO TABLE emp_sal;
Loading data to table thanooj.emp_sal
Table thanooj.emp_sal stats: [numFiles=1, numRows=0, totalSize=139, rawDataSize=0]
OK
Time taken: 0.504 seconds
hive> select * from emp_sal;
OK
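The SELECT for the Nth-highest salary is cut off above (in Hive it is typically written with DENSE_RANK() or a LIMIT/subquery). The ranking logic can be sketched outside Hive on the same comma-delimited id,salary layout; the sample salaries are hypothetical:

```shell
# Nth-highest (here N=2) distinct salary from id,salary rows:
# sort descending by salary, drop duplicate salaries, take row N.
printf '1,1000\n2,3000\n3,2000\n4,3000\n' > emp_sal.txt
N=2
sort -t, -k2,2 -rn emp_sal.txt \
  | awk -F, '!seen[$2]++' \
  | awk -F, -v n="$N" 'NR == n { print $2 }'   # prints 2000
```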
@thanoojgithub
thanoojgithub / MySQL.sql
Last active March 18, 2017 11:05
MySQL installation and setup in Ubuntu 16.04
1. To see hostname and fully qualified domain name (FQDN), use:
thanooj@thanooj-VirtualBox:~$ hostname
thanooj-VirtualBox
thanooj@thanooj-VirtualBox:~$ hostname -f
thanooj-VirtualBox
2. Update your system:
thanooj@thanooj-VirtualBox:~$ sudo apt-get update
3. Install MySQL
@thanoojgithub
thanoojgithub / org.apache.hadoop.hive.serde2
Created March 12, 2017 12:38
Apache Hadoop Hive Serde2 Notes
Notes:
-------------
external table:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{2})(.{10})(.{30})(.{10})(.{10}).*" )
LOCATION '${hiveconf:path}';
location:maprfs:/externalpath
inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
serializationLib:org.apache.hadoop.hive.serde2.RegexSerDe,
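The effect of that `input.regex` can be checked outside Hive: each `(.{n})` group captures one fixed-width column. A quick sed demonstration on a made-up 2/10/30/10/10-character record (the field contents are my own, not from the gist):

```shell
# Split a fixed-width record into 2/10/30/10/10-character columns,
# exactly as the RegexSerDe pattern above does.
printf '%-2s%-10s%-30s%-10s%-10s\n' '01' 'John' 'Engineering' '2017-03-12' 'IN' \
  | sed -E 's/(.{2})(.{10})(.{30})(.{10})(.{10}).*/\1|\2|\3|\4|\5/'
```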
@thanoojgithub
thanoojgithub / spark_notes.sh
Created February 12, 2017 20:30
apache spark with scala
Apache Spark is a general-purpose computation/execution engine.
It is built on RDDs, which are resilient: lost partitions are recomputed from lineage (using the underlying HDFS for recovery, in its own way).
Transformations produce a new RDD; immutability keeps the data consistent.
Evaluation is lazy until an action is called.
Benefits:
Fault recovery using lineage
Optimized for in-memory computation: computations are placed optimally using a directed acyclic graph (DAG)
Easy programming: apply transformations on RDDs, then call actions
Rich library support: MLlib (machine learning), GraphX, DataFrames, including batch and streaming
@thanoojgithub
thanoojgithub / HadoopConfigureFilesChanges
Created February 11, 2017 14:36
Hadoop configure files changes
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export SCALA_HOME=/home/thanooj/Scala/scala-2.12.1
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin
export HADOOP_HOME=/home/thanooj/bigdata/hadoop-2.7.3
export HADOOP_MAPRED_HOME=/home/thanooj/bigdata/hadoop-2.7.3
export HADOOP_COMMON_HOME=/home/thanooj/bigdata/hadoop-2.7.3
export HADOOP_HDFS_HOME=/home/thanooj/bigdata/hadoop-2.7.3
export YARN_HOME=/home/thanooj/bigdata/hadoop-2.7.3
export HADOOP_CONF_DIR=/home/thanooj/bigdata/hadoop-2.7.3/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=/home/thanooj/bigdata/hadoop-2.7.3/lib/native
1. Install JAVA
2.
thanooj@thanooj-Inspiron-3521:~$ sudo addgroup hadoop
Adding group `hadoop' (GID 1001) ...
Done.
thanooj@thanooj-Inspiron-3521:~$ sudo adduser --ingroup hadoop hadoopuser
Adding user `hadoopuser' ...
Adding new user `hadoopuser' (1001) with group `hadoop' ...