Apache Hive - Installation and Configuration
UBUNTU 14.04 LTS
JAVA - Oracle JDK 8
HADOOP 2.7.3
HIVE 2.1.1
MySQL 5.5 server
1.
https://hive.apache.org/downloads.html
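pwd
# Remember the starting directory so we can return to it afterwards.
current_pwd=$(pwd)
# Stop if the work directory is missing, so rm never runs in the wrong place.
cd /home/thanooj/work || exit 1
pwd
ls -ltr
# Remove each file named in the list, one path per line (safe for names with spaces).
while IFS= read -r f; do
  rm -- "$f"
done < /home/thanooj/files/symlink_localized_file_list.txt
ls -ltr
cd "$current_pwd"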
package com;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
public class Deck {
$ ls -ltr
total 4
-rw-r--r-- 1 thanooj users 3221 Mar 20 05:36 words_count.txt
$ vi words_count.txt
$ cat words_count.txt
Apache Sqoop is a tool designed for efficiently transferring data between structured, semi-structured and unstructured data sources.
Relational databases are examples of structured data sources with well defined schema for the data they store.
Cassandra, Hbase are examples of semi-structured data sources.
HDFS is an example of unstructured data source that Sqoop can support.
With Sqoop, you can import data from a relational database system or a mainframe into HDFS
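The file name words_count.txt suggests the text is input for a word count. As a quick local check before running anything on the cluster, the count can be sketched with standard tools. The function name word_count and the two sample lines are assumptions made for this illustration, not part of the original session.

```shell
#!/bin/sh
# Hypothetical helper: print "count word" pairs for a text file,
# most frequent first, ties in alphabetical order.
word_count() {
  tr -cs '[:alnum:]' '\n' < "$1" |   # turn every run of non-alphanumerics into one newline
    tr '[:upper:]' '[:lower:]' |     # fold case so "Sqoop" and "sqoop" are the same word
    grep -v '^$' |                   # drop the empty line a leading separator can produce
    sort | uniq -c | sort -k1,1nr -k2,2
}

# Made-up stand-in for words_count.txt, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'Sqoop can import data' 'Sqoop can export data too' > "$sample"
word_count "$sample"
rm -f "$sample"
```

For the sample lines, "can", "data" and "sqoop" are each counted twice and the remaining words once. Note that splitting on non-alphanumerics breaks "semi-structured" into two words; adjust the first tr set if hyphenated words should stay whole.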
hive> CREATE TABLE emp_sal(id INT, salary DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as textfile;
OK
Time taken: 0.261 seconds
hive> LOAD DATA INPATH 'maprfs:/home/thanooj/emp_sal.txt' INTO TABLE emp_sal;
Loading data to table thanooj.emp_sal
Table thanooj.emp_sal stats: [numFiles=1, numRows=0, totalSize=139, rawDataSize=0]
OK
Time taken: 0.504 seconds
hive> select * from emp_sal;
OK
1. To see hostname and fully qualified domain name (FQDN), use:
thanooj@thanooj-VirtualBox:~$ hostname
thanooj-VirtualBox
thanooj@thanooj-VirtualBox:~$ hostname -f
thanooj-VirtualBox
2. Update your system:
thanooj@thanooj-VirtualBox:~$ sudo apt-get update
3. Install MySQL
Notes:
-------------
external table:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{2})(.{10})(.{30})(.{10})(.{10}).*" )
LOCATION '${hiveconf:path}';
location:maprfs:/externalpath
inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
serializationLib:org.apache.hadoop.hive.serde2.RegexSerDe,
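The input.regex above cuts each line into five fixed-width columns of 2, 10, 30, 10 and 10 characters and ignores the rest of the line. To preview how a record would be split before pointing the external table at real data, the same pattern can be tried with sed. This is only a sketch: the function name split_fixed_width and the sample record are made up here, and it assumes sed's extended regex treats this simple pattern the same way Hive's Java regex does.

```shell
#!/bin/sh
# Hypothetical helper: apply the fixed-width pattern from the SerDe
# properties to stdin and print the five captured groups separated by '|'.
split_fixed_width() {
  sed -E 's/^(.{2})(.{10})(.{30})(.{10})(.{10}).*$/\1|\2|\3|\4|\5/'
}

# Made-up record padded to 2 + 10 + 30 + 10 + 10 = 62 characters.
printf '%-2s%-10s%-30s%-10s%-10s\n' 'IN' 'thanooj' 'Hyderabad' '2017-03-20' '1000.50' |
  split_fixed_width
```

Each captured group keeps its padding spaces, which is also what the table columns would contain. A line shorter than 62 characters does not match and is printed unchanged by sed; in Hive, a row that does not match the regex is expected to come back as NULL columns, so short lines are worth checking for.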
Apache Spark is a general-purpose computation/execution engine.
It uses RDDs, which are resilient: lost data is rebuilt from lineage, with the underlying storage (such as HDFS) used for recovery.
Transformations produce a new RDD from an existing one; RDDs are immutable, which keeps them consistent.
Evaluation is lazy: nothing runs until an action is called.
Benefits:
Fault recovery using lineage
Optimized for in-memory computation - work is placed optimally using a directed acyclic graph (DAG)
Easy programming - apply transformations to RDDs and trigger them by calling actions
Rich library support - MLlib (machine learning), GraphX, DataFrames, covering both batch and streaming
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export SCALA_HOME=/home/thanooj/Scala/scala-2.12.1
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin
export HADOOP_HOME=/home/thanooj/bigdata/hadoop-2.7.3
export HADOOP_MAPRED_HOME=/home/thanooj/bigdata/hadoop-2.7.3
export HADOOP_COMMON_HOME=/home/thanooj/bigdata/hadoop-2.7.3
export HADOOP_HDFS_HOME=/home/thanooj/bigdata/hadoop-2.7.3
export YARN_HOME=/home/thanooj/bigdata/hadoop-2.7.3
export HADOOP_CONF_DIR=/home/thanooj/bigdata/hadoop-2.7.3/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=/home/thanooj/bigdata/hadoop-2.7.3/lib/native
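A mistyped path in exports like these tends to show up only later as a confusing startup error, so it can help to check that each variable points at a real directory. The sketch below is one possible way to do that; check_env_dirs is a hypothetical helper name, not something shipped with Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: for every variable name given, report it if the
# variable is unset/empty or does not point at an existing directory.
# Returns non-zero when at least one variable fails the check.
check_env_dirs() {
  status=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ] || [ ! -d "$value" ]; then
      echo "missing: $name=${value:-<unset>}"
      status=1
    fi
  done
  return "$status"
}

check_env_dirs JAVA_HOME SCALA_HOME HADOOP_HOME HADOOP_CONF_DIR HADOOP_COMMON_LIB_NATIVE_DIR ||
  echo "fix the exports before starting Hadoop"
```

Run it in the same shell after sourcing the exports. Only pass variable names you control, since the helper uses eval to read them.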
1. Install JAVA
2.
thanooj@thanooj-Inspiron-3521:~$ sudo addgroup hadoop
Adding group `hadoop' (GID 1001) ...
Done.
thanooj@thanooj-Inspiron-3521:~$ sudo adduser --ingroup hadoop hadoopuser
Adding user `hadoopuser' ...
Adding new user `hadoopuser' (1001) with group `hadoop' ...