env   # print all environment variables
*********to work as root*************
su -
*********ifconfig equivalents*************
ip address show   or   ip a s   or   ip a s eth0   (one interface only)
*********formatted file name*************
cp a.txt a_$(date +%F).txt   # copies a.txt to a_YYYY-MM-DD.txt (%F is the ISO date)
-- run MapReduce jobs in local mode
hive> set mapreduce.framework.name=local;
-- show the current database name in the hive prompt
set hive.cli.print.current.db=true;
-- table metadata, including statistics
DESCRIBE EXTENDED husn_small;
-- compute table statistics
Analyze table husn_small compute statistics;
create table snpn(sn String, pn String);
-- LOAD DATA INPATH moves the HDFS files into the table; without OVERWRITE the data is appended
LOAD DATA INPATH 'hdfs://127200813master.eap.g4ihos.itcs.hpecorp.net:8020/user/centos7/test_data/snpn' INTO TABLE snpn;
Scala examples
map:
val l = List(1, 2, 3, 4, 5)
l.map(x => x + 3)   or   l.map(_ + 3)   // List(4, 5, 6, 7, 8)
pass a function as a parameter to map:
def f(x: Int): Option[Int] = if (x > 3) Some(x) else None
l.map(x => f(x))   or   l.map(f(_))   or   l.map(f)   // List(None, None, None, Some(4), Some(5))
flatMap example:
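A minimal flatMap sketch (not the original example), written as a Scala script for the REPL or a scala-cli `.sc` file. It defines `f` to return `Option[Int]`, so `map` keeps one `Option` per element while `flatMap` drops the `None`s and unwraps the `Some`s. The names `mapped`, `flatMapped` and `expanded` are illustrative only.

```scala
// Script-style sketch (Scala REPL or scala-cli .sc file).
val l = List(1, 2, 3, 4, 5)

// f keeps values greater than 3 (Some) and discards the rest (None).
def f(x: Int): Option[Int] = if (x > 3) Some(x) else None

// map produces exactly one output per input, so the Options stay nested.
val mapped: List[Option[Int]] = l.map(f)

// flatMap flattens each result: None adds nothing, Some(v) adds v.
val flatMapped: List[Int] = l.flatMap(f)

// With a function that returns a List, flatMap concatenates the lists.
val expanded: List[Int] = l.flatMap(x => List(x, x * 10))

println(mapped)      // List(None, None, None, Some(4), Some(5))
println(flatMapped)  // List(4, 5)
println(expanded)    // List(1, 10, 2, 20, 3, 30, 4, 40, 5, 50)
```

In short, `flatMap` is `map` followed by `flatten`, which is why it works with any function returning an `Option` or a collection.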
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>nosql</groupId>
    <artifactId>gettingstarted</artifactId>
    <version>0.0.1-SNAPSHOT</version>
  </parent>
  <groupId>com</groupId>
  <artifactId>hbase</artifactId>
  <version>0.0.1-hbase-SNAPSHOT</version>
Source --> channel --> sink
Define the source, channel and sink of each agent in the agent's .conf file.
Source types: e.g. exec, which runs a shell command such as tail and turns each output line into an event.
# --name must match the agent name used in the .conf file
flume-ng agent --conf conf --conf-file /usr/hdp/2.5.0.0-1245/flume/conf/flume-hdfs-sink.conf --name agent1
flume-ng agent --conf conf --conf-file /usr/hdp/2.5.0.0-1245/flume/conf/flume-hdfs-sink_file.conf --name agent2
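To make the source --> channel --> sink decoupling concrete, here is a toy, single-threaded model in Scala. It is not Flume's API: `Channel`, `runSource` and `runSink` are hypothetical names, the channel is assumed to be a bounded `java.util.concurrent.ArrayBlockingQueue`, and the sink collects events in memory instead of writing to HDFS.

```scala
import java.util.concurrent.ArrayBlockingQueue
import scala.collection.mutable.ListBuffer

// Toy model of a Flume agent's source -> channel -> sink flow (hypothetical names, not Flume's API).
final class Channel(capacity: Int) {
  private val buffer = new ArrayBlockingQueue[String](capacity)
  def offer(event: String): Boolean = buffer.offer(event) // false when the channel is full
  def poll(): Option[String] = Option(buffer.poll())      // None when the channel is empty
}

// Source: hands events to the channel until it is full (an exec source running tail
// would produce one event per output line). Returns the events not yet accepted.
def runSource(events: List[String], channel: Channel): List[String] =
  events.dropWhile(e => channel.offer(e))

// Sink: takes up to batchSize events from the channel and "delivers" them
// (a real HDFS sink would write them to files in HDFS). Returns the batch size.
def runSink(channel: Channel, batchSize: Int, out: ListBuffer[String]): Int = {
  val batch = Iterator.continually(channel.poll())
    .take(batchSize)
    .takeWhile(_.isDefined)
    .collect { case Some(e) => e }
    .toList
  out ++= batch
  batch.size
}

val channel = new Channel(3) // capacity 3 is an arbitrary choice for the demo
val delivered = ListBuffer.empty[String]

val pending = runSource(List("e1", "e2", "e3", "e4", "e5"), channel) // channel is full after e3
runSink(channel, 2, delivered)                                       // delivers e1, e2
val stillPending = runSource(pending, channel)                       // e4 and e5 now fit
runSink(channel, 10, delivered)                                      // delivers e3, e4, e5

println(pending)          // List(e4, e5)
println(delivered.toList) // List(e1, e2, e3, e4, e5)
```

The channel absorbs the difference in pace between source and sink: while it is full the source has to hold events back, and they flow again once the sink has drained a batch.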
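Handling many small files
1. MapReduce: use CombineFileInputFormat to pack many small files into each input split.
   Hive: rewrite (copy) the data through fewer reducers so the output has fewer, larger files.
2. Set the input split size relative to the block size; the split size determines the number of mappers (a bigger split means fewer mappers).
   Each mapper runs in its own JVM, so fewer mappers means fewer JVMs created and destroyed.
   Smaller splits give more mappers (more parallelism, more JVM overhead); bigger splits give fewer mappers (less overhead, less parallelism).
3. Allocate a suitable number of reducers.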
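A rough Scala sketch of why many small files mean many mappers. `splitSize` mirrors the rule in Hadoop's `FileInputFormat.computeSplitSize`, `max(minSize, min(maxSize, blockSize))`; the other two functions are simplified, hypothetical estimates (they ignore Hadoop's 10% split slop and CombineFileInputFormat's node/rack grouping), and the workload of 1000 files of 1 MB is made up for illustration.

```scala
// Sizes are in bytes.
// Same rule as Hadoop's FileInputFormat.computeSplitSize.
def splitSize(blockSize: Long, minSize: Long, maxSize: Long): Long =
  math.max(minSize, math.min(maxSize, blockSize))

// Without combining: one split per split-sized chunk of each file, at least one per file
// (simplified; Hadoop also lets the last split of a file be up to 10% larger).
def mappersPerFile(fileSizes: Seq[Long], split: Long): Long =
  fileSizes.map(size => math.max(1L, (size + split - 1) / split)).sum

// Idealized lower bound when small files are packed into shared splits,
// as CombineFileInputFormat does (hypothetical estimate, ignores locality).
def combinedMappers(fileSizes: Seq[Long], split: Long): Long =
  math.max(1L, (fileSizes.sum + split - 1) / split)

val MB = 1024L * 1024L
val smallFiles = Seq.fill(1000)(1 * MB)            // hypothetical: 1000 files of 1 MB
val split = splitSize(128 * MB, 1L, Long.MaxValue) // 128 MB block, default-like bounds

println(split / MB)                         // 128
println(mappersPerFile(smallFiles, split))  // 1000: one mapper (and JVM) per tiny file
println(combinedMappers(smallFiles, split)) // 8
```

Raising the minimum split size above the block size (for example `splitSize(128 * MB, 256 * MB, Long.MaxValue)`) gives bigger splits and therefore fewer mappers for large files, but it does not help the one-split-per-file case; that is what combining addresses.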
ACID - Atomicity, Consistency, Isolation, Durability
Atomicity: all or nothing (multiple DML statements succeed or fail as one unit).
Consistency: a transaction either creates a new, valid state of the data or, on failure, leaves it in its previous state (commit/rollback).
Isolation: changes made by a transaction in progress (e.g. uncommitted inserts) are not visible to other transactions.
Durability: committed data survives failures and restarts and can be recovered.
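Atomicity can be sketched with a toy in-memory store in Scala. This is not a real database: `Store`, `debit` and `credit` are hypothetical names. The operations of a transaction are applied to a private copy of the state, and the copy replaces the committed state only if every operation succeeds.

```scala
import scala.util.Try

// Toy in-memory illustration of atomicity (all or nothing); hypothetical, not a real database.
final class Store(initial: Map[String, Int]) {
  private var committed: Map[String, Int] = initial

  def snapshot: Map[String, Int] = committed

  // Applies every operation to a copy of the state. The copy replaces the committed
  // state only if all operations succeed (commit); if any one throws, the copy is
  // discarded and the committed state is untouched (rollback).
  def transaction(ops: Seq[Map[String, Int] => Map[String, Int]]): Try[Unit] =
    Try(ops.foldLeft(committed)((state, op) => op(state))).map(newState => committed = newState)
}

// A debit fails (throws IllegalArgumentException) if the balance would go negative.
def debit(account: String, amount: Int): Map[String, Int] => Map[String, Int] =
  state => {
    require(state(account) >= amount, s"insufficient funds in $account")
    state.updated(account, state(account) - amount)
  }

def credit(account: String, amount: Int): Map[String, Int] => Map[String, Int] =
  state => state.updated(account, state(account) + amount)

val store = new Store(Map("alice" -> 100, "bob" -> 0))

// Both steps succeed, so both are committed together.
val ok = store.transaction(Seq(debit("alice", 40), credit("bob", 40)))
// The second debit fails, so the first debit is rolled back as well.
val failed = store.transaction(Seq(debit("alice", 10), debit("alice", 500), credit("bob", 510)))

println(ok.isSuccess)     // true
println(failed.isFailure) // true
println(store.snapshot)   // Map(alice -> 60, bob -> 40)
```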
CAP theorem: Consistency, Availability, Partition tolerance
Consistency: every read receives the most recent write or an error.
Availability: every request receives a non-error response, without a guarantee that it contains the most recent write.
Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes.
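A toy Scala model of the choice these definitions describe: during a partition, a replica that favours consistency returns an error, while one that favours availability returns a possibly stale value. `Mode`, `CP`, `AP` and `Replica` are hypothetical names, and replication is reduced to a single assignment.

```scala
// Toy model of the CAP trade-off during a network partition (hypothetical, not a real datastore).
sealed trait Mode
case object CP extends Mode // favour consistency: answer with an error rather than stale data
case object AP extends Mode // favour availability: always answer, possibly with stale data

final class Replica(mode: Mode) {
  private var value: Int = 0
  var partitioned: Boolean = false // true while this replica cannot reach the others

  // Updates from the rest of the cluster only arrive while there is no partition.
  def replicate(v: Int): Unit = if (!partitioned) value = v

  // Left is an error response, Right a successful (possibly stale) read.
  def read(): Either[String, Int] = mode match {
    case CP if partitioned => Left("error: cannot guarantee the most recent write")
    case _                 => Right(value)
  }
}

val cp = new Replica(CP)
val ap = new Replica(AP)
val replicas = Seq(cp, ap)

replicas.foreach(r => r.replicate(1))       // both replicas see the write of 1
replicas.foreach(r => r.partitioned = true) // the network splits
replicas.foreach(r => r.replicate(2))       // the write of 2 cannot reach them

println(cp.read()) // Left(error: cannot guarantee the most recent write)
println(ap.read()) // Right(1), a stale value
```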
# log in to a cluster node
sudo ssh 127200813data00.eap.g4ihos.itcs.hpecorp.net
cd /usr/hdp/2.5.0.0-1245/kafka/bin/
# create a topic named test (1 partition, replication factor 1)
./kafka-topics.sh --create --zookeeper 127200813master.eap.g4ihos.itcs.hpecorp.net:2181,127200813data02.eap.g4ihos.itcs.hpecorp.net:2181,127200813data01.eap.g4ihos.itcs.hpecorp.net:2181,127200813data00.eap.g4ihos.itcs.hpecorp.net:2181 --replication-factor 1 --partitions 1 --topic test
# list all topics
./kafka-topics.sh --list --zookeeper 127200813master.eap.g4ihos.itcs.hpecorp.net:2181,127200813data02.eap.g4ihos.itcs.hpecorp.net:2181,127200813data01.eap.g4ihos.itcs.hpecorp.net:2181,127200813data00.eap.g4ihos.itcs.hpecorp.net:2181
# start a console producer: each line typed is sent to topic test
./kafka-console-producer.sh --broker-list 127200813data00.eap.g4ihos.itcs.hpecorp.net:9092 --topic test
If you own the source code, make methods final so they are never accidentally overridden.
Arrays and collections should never be null; return an empty one instead.
Avoid state (like HTTP, which is stateless): stateless code is easier to parallelize and distribute.
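The three guidelines above, sketched in Scala (the class `UserDirectory` and its data are hypothetical): `final` methods, an empty list instead of `null`, and methods whose results depend only on immutable data.

```scala
// Sketch of the guidelines; UserDirectory and its data are hypothetical.
class UserDirectory(groups: Map[String, List[String]]) {

  // final: a subclass cannot override this method by accident.
  // A missing group yields an empty list (Nil), never null.
  final def membersOf(group: String): List[String] =
    groups.getOrElse(group, Nil)

  // No mutable state: the result depends only on the argument and the
  // immutable constructor data, so calls can safely run in parallel.
  final def countMembers(groupNames: Seq[String]): Int =
    groupNames.map(g => membersOf(g).size).sum
}

val dir = new UserDirectory(Map("hadoop" -> List("etl_user", "hdfs")))
println(dir.membersOf("analysts"))                   // List()
println(dir.countMembers(Seq("hadoop", "analysts"))) // 2
```

Because `membersOf` never returns `null`, callers such as `countMembers` can chain on its result without null checks.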
Event handling:
https://www.youtube.com/watch?v=ZUe1Xz7DAcY#t=17.905495
add a user to the hadoop group (run as root)
useradd -g hadoop etl_user
show the user's uid and groups
id etl_user
log in as a different user
sudo su - etl_user
hdfs is the HDFS superuser in distributions such as Hortonworks/Cloudera, so create the user's HDFS home directory as hdfs:
sudo su - hdfs
hadoop fs -mkdir /user/etl_user
hadoop fs -chown etl_user:supergroup /user/etl_user