thanooj kalathuru thanoojgithub

@thanoojgithub
thanoojgithub / org.apache.hadoop.hive.serde2
Created March 12, 2017 12:38
Apache Hadoop Hive Serde2 Notes
Notes:
-------------
external table:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{2})(.{10})(.{30})(.{10})(.{10}).*" )
LOCATION '${hiveconf:path}';
location:maprfs:/externalpath
inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
serializationLib:org.apache.hadoop.hive.serde2.RegexSerDe,
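The regex in the SERDEPROPERTIES above slices each fixed-width record into five capture groups of 2, 10, 30, 10 and 10 characters. A minimal Python sketch (sample record and field values are hypothetical) to check the pattern outside Hive:

```python
import re

# Same fixed-width pattern as in the SERDEPROPERTIES above:
# five capture groups of 2, 10, 30, 10 and 10 characters.
pattern = re.compile(r"(.{2})(.{10})(.{30})(.{10})(.{10}).*")

# Hypothetical 62-character fixed-width record (widths 2/10/30/10/10).
line = ("01"
        + "JOHN".ljust(10)
        + "SOFTWARE ENGINEER".ljust(30)
        + "2017-03-12".ljust(10)
        + "50000.00".ljust(10))

m = pattern.match(line)
fields = [g.strip() for g in m.groups()]
print(fields)  # ['01', 'JOHN', 'SOFTWARE ENGINEER', '2017-03-12', '50000.00']
```

Records shorter than 62 characters will not match, which is how RegexSerDe ends up emitting NULL columns for malformed rows.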
thanoojgithub / MySQL.sql
Last active March 18, 2017 11:05
MySQL installation and setup in Ubuntu 16.04
1. To see hostname and fully qualified domain name (FQDN), use:
thanooj@thanooj-VirtualBox:~$ hostname
thanooj-VirtualBox
thanooj@thanooj-VirtualBox:~$ hostname -f
thanooj-VirtualBox
2. Update your system:
thanooj@thanooj-VirtualBox:~$ sudo apt-get update
3. Install MySQL
thanoojgithub / NhighestvalueinHive.sql
Created March 20, 2017 16:22
Nth highest value in Hive
hive> CREATE TABLE emp_sal(id INT, salary DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as textfile;
OK
Time taken: 0.261 seconds
hive> LOAD DATA INPATH 'maprfs:/home/thanooj/emp_sal.txt' INTO TABLE emp_sal;
Loading data to table thanooj.emp_sal
Table thanooj.emp_sal stats: [numFiles=1, numRows=0, totalSize=139, rawDataSize=0]
OK
Time taken: 0.504 seconds
hive> select * from emp_sal;
OK
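The preview stops before the query itself; in Hive the Nth highest value is typically found with DENSE_RANK() over a descending sort. The underlying logic can be sketched in Python (sample rows are hypothetical, mirroring the emp_sal table):

```python
def nth_highest(salaries, n):
    """Return the nth highest distinct salary (1-based), or None if there
    are fewer than n distinct values -- mirrors DENSE_RANK semantics,
    where ties share a rank."""
    distinct = sorted(set(salaries), reverse=True)
    return distinct[n - 1] if n <= len(distinct) else None

# Hypothetical sample data for emp_sal (id, salary).
rows = [(1, 1000.0), (2, 3000.0), (3, 2000.0), (4, 3000.0), (5, 1500.0)]
print(nth_highest([s for _, s in rows], 2))  # 2000.0 (3000.0 ties count once)
```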
thanoojgithub / WordCountInHive.sql
Last active March 20, 2017 17:03
Word Count in Hive
$ ls -ltr
total 4
-rw-r--r-- 1 thanooj users 3221 Mar 20 05:36 words_count.txt
$ vi words_count.txt
$ cat words_count.txt
Apache Sqoop is a tool designed for efficiently transferring data between structured, semi-structured and unstructured data sources.
Relational databases are examples of structured data sources with well defined schema for the data they store.
Cassandra, HBase are examples of semi-structured data sources.
HDFS is an example of unstructured data source that Sqoop can support.
With Sqoop, you can import data from a relational database system or a mainframe into HDFS
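The preview cuts off before the word-count query; in Hive it is usually written by splitting each line on whitespace and exploding the tokens, then grouping by word. The same logic can be sketched in Python with a Counter (sample text is a hypothetical stand-in for words_count.txt):

```python
from collections import Counter

# Word count mirroring the usual Hive approach: split each line on
# whitespace, explode the tokens, group by word and count.
text = """Apache Sqoop is a tool designed for efficiently transferring data
Relational databases are examples of structured data sources."""

counts = Counter(text.split())
print(counts["data"])  # 2
```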
thanoojgithub / Deck.java
Created March 29, 2017 17:46
Java Collection - Standard 52-card deck use case
package com;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
public class Deck {
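The Java preview cuts off at the class declaration. The 52-card deck use case (build rank-by-suit combinations, shuffle, deal) can be sketched in Python; names here are illustrative, not the gist's own:

```python
import random

# Minimal 52-card deck sketch mirroring the Java use case above:
# build rank x suit combinations, shuffle, and deal a hand.
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["Hearts", "Diamonds", "Clubs", "Spades"]

deck = [f"{rank} of {suit}" for suit in SUITS for rank in RANKS]
random.shuffle(deck)   # in-place Fisher-Yates shuffle
hand, deck = deck[:5], deck[5:]   # deal a five-card hand
print(len(hand), len(deck))  # 5 47
```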
thanoojgithub / HowToDeleteFilesListedInATextFile.sh
Created April 1, 2017 14:13
How to delete files listed in a text file
pwd
current_pwd=$(pwd)
cd /home/thanooj/work
pwd
ls -ltr
# read one filename per line; unlike `for f in $(cat ...)`, this
# handles names containing spaces
while IFS= read -r f; do
  rm -- "$f"
done < /home/thanooj/files/symlink_localized_file_list.txt
ls -ltr
cd "$current_pwd"
thanoojgithub / HiveInstallConfig.sh
Last active April 16, 2017 18:28
Apache Hive Install and Configuration
Apache Hive - Installation and Configuration
UBUNTU 14.04 LTS
JAVA - Oracle JDK 8
HADOOP 2.7.3
HIVE 2.1.1
MySQL 5.5 server
1.
https://hive.apache.org/downloads.html
thanoojgithub / HadoopHiveSparkHBase
Last active February 11, 2020 07:42
Hadoop Hive Spark configuration on Ubuntu 16.04
sudo apt-get install ssh
sudo apt-get install rsync
sudo apt install openssh-client
sudo apt install openssh-server
ssh localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed. Internally, Spark SQL uses this extra information to perform extra optimizations. There are several ways to interact with Spark SQL including SQL and the Dataset API. When computing a result the same execution engine is used, independent of which API/language you are using to express the computation. This unification means that developers can easily switch back and forth between different APIs based on which provides the most natural way to express a given transformation.
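That unification can be seen in a short sketch, assuming a PySpark installation (requires a Spark runtime, so it is not runnable standalone); the same filter is expressed once in SQL and once through the DataFrame API, and both compile to the same logical plan:

```python
# Hedged sketch, assumes pyspark is installed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("demo").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])
df.createOrReplaceTempView("t")

via_sql = spark.sql("SELECT id FROM t WHERE id > 1")   # SQL interface
via_api = df.select("id").where(df.id > 1)             # DataFrame API
assert via_sql.collect() == via_api.collect()          # same result, same engine
```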
thanoojgithub / docker-mysql-standalone
Last active June 20, 2020 07:29
docker-mysql connecting using mysql workbench
PS C:\Users\thanooj> docker pull mysql
PS C:\Users\thanooj> docker images
REPOSITORY                       TAG       IMAGE ID       CREATED        SIZE
springio/gs-spring-boot-docker   latest    c8778cb72ef5   5 days ago     527MB
openjdk                          8         b190ad78b520   10 days ago    510MB
mysql                            latest    be0dbf01a0f3   11 days ago    541MB
hello-world                      latest    bf756fb1ae65   5 months ago   13.3kB
PS C:\Users\thanooj>
PS C:\Users\thanooj> docker container ls -a
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
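The listing stops before the container is started. A minimal sketch of running the pulled mysql image so MySQL Workbench on the host can connect to 127.0.0.1:3306 (the container name and root password below are illustrative, not from the gist):

```shell
# Run a standalone MySQL container, publish port 3306 to the host,
# and verify it is up; Workbench then connects to 127.0.0.1:3306 as root.
docker run --name some-mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -p 3306:3306 -d mysql
docker container ls
```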