Goals:
- Install a 4-node cluster running HDP 2.4.2 using Ambari 2.2.2.0 (including Zeppelin and HDB), via Ambari bootstrap with blueprints or via the Ambari install wizard
- Configure HAWQ for Zeppelin
- Configure Zeppelin for HAWQ
- Run HAWQ queries via Zeppelin
Notes:
- HDB managed via Ambari is only supported from Ambari 2.2.2.0 onwards. Do not attempt this with older versions of Ambari
- Bring up 4 VMs imaged with RHEL/CentOS 6.x (e.g. node1-4 in this case)
- On non-Ambari nodes (nodes 2-4 in this case), install ambari-agents and point them to the Ambari node (e.g. node1 in this case)
export ambari_server=node1
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh
- On Ambari node (e.g. node1), install ambari-server
export install_ambari_server=true
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh
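- Optionally, confirm the Ambari server came up once the bootstrap finishes (if it did not, ambari-server start will bring it up):
ambari-server status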
- Install Zeppelin service definition
yum install -y git
git clone https://github.com/hortonworks-gallery/ambari-zeppelin-service.git /var/lib/ambari-server/resources/stacks/HDP/2.4/services/ZEPPELIN
sed -i.bak '/dependencies for all/a \ "ZEPPELIN_MASTER-START": ["NAMENODE-START", "DATANODE-START"],' /var/lib/ambari-server/resources/stacks/HDP/2.4/role_command_order.json
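- Optionally, verify the sed edit landed in the role command order file:
grep ZEPPELIN_MASTER-START /var/lib/ambari-server/resources/stacks/HDP/2.4/role_command_order.json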
- Install Pivotal service definition and repo per HDB doc
- Create staging dir:
mkdir /staging
chmod a+rx /staging
- Copy hdb-2.0.0.0-22126.tar.gz and hdb-ambari-plugin-2.0.0-448.tar.gz to /staging
- Setup HDB repo and Ambari service definition:
tar -xvzf /staging/hdb-2.0.0.0-*.tar.gz -C /staging/
tar -xvzf /staging/hdb-ambari-plugin-2.0.0-*.tar.gz -C /staging/
yum install -y httpd
service httpd start
cd /staging/hdb-2.0.0.0*
./setup_repo.sh
cd /staging/hdb-ambari-plugin*
./setup_repo.sh
yum install -y hdb-ambari-plugin
- At this point you should see a local repo up at http://node1/HDB/
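- A quick command line check that the repo is being served (the /HDB path is what setup_repo.sh configured above):
curl -sS http://node1/HDB/ | head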
- Restart Ambari so it now recognizes the Zeppelin, HAWQ and PXF services:
service ambari-server restart
service ambari-agent restart
- Confirm the 4 agents were registered and the agent service is up:
curl -u admin:admin -H X-Requested-By:ambari http://localhost:8080/api/v1/hosts
service ambari-agent status
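- If you'd rather not eyeball the JSON, a rough count of registered hosts (this assumes the default pretty-printed response, one host_name per line):
curl -u admin:admin -H X-Requested-By:ambari http://localhost:8080/api/v1/hosts | grep -c host_name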
- Deploy a cluster running the latest HDP, including Zeppelin, HAWQ and PXF. You can either:
- Option 1: login to Ambari UI and use Install Wizard. In this case:
- You will need to set the 'HAWQ System User Password' to any value you like
- Make sure to manually adjust the HDFS settings mentioned in the HDB doc
- Make sure the port specified in 'HAWQ master port' (by default, 5432) is not in use on the host where you will install the HAWQ master (see the quick check after this list)
- If installing on a single node, or in any other scenario where the HAWQ master needs to be installed on a node where a Postgres instance already exists (e.g. installing the HAWQ master on the same host as Ambari), you will need to change the master port from the default value (5432)
- On a single node setup, the 'HAWQ standby master' will not be installed
- Refer to the HDB doc for full details
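- For the port check above, a quick way to see whether 5432 is already taken on the intended HAWQ master host (just a sketch; adjust the port if you changed it from the default):
netstat -tlnp | grep :5432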
- OR
- Option 2: generate/deploy a customized blueprint using ambari-bootstrap that takes care of the HDFS configurations as below:
yum install -y python-argparse
cd
git clone https://github.com/seanorama/ambari-bootstrap.git
#decide which services to deploy and set the number of nodes in the cluster
export ambari_services="HDFS MAPREDUCE2 YARN ZOOKEEPER HIVE ZEPPELIN SPARK HAWQ PXF"
export host_count=4
cd ./ambari-bootstrap/deploy/
#add HDFS config customizations for HAWQ and any others you may want
cat << EOF > configuration-custom.json
{
"configurations" : {
"hdfs-site": {
"dfs.allow.truncate": "true",
"dfs.block.access.token.enable": "false",
"dfs.block.local-path-access.user": "gpadmin",
"dfs.client.read.shortcircuit": "true",
"dfs.client.socket-timeout": "300000000",
"dfs.client.use.legacy.blockreader.local": "false",
"dfs.datanode.handler.count": "60",
"dfs.datanode.socket.write.timeout": "7200000",
"dfs.namenode.handler.count": "600",
"dfs.support.append": "true"
},
"hawq-env":{
"hawq_password":"gpadmin"
},
"core-site": {
"ipc.client.connection.maxidletime": "3600000",
"ipc.client.connect.timeout": "300000",
"ipc.server.listen.queue.size": "3300"
}
}
}
EOF
#optional - if you want to review the BP before deploying it
#export deploy=false
#./deploy-recommended-cluster.bash
#more temp*/blueprint.json
#generate BP including customizations and start cluster deployment
export deploy=true
./deploy-recommended-cluster.bash
- This will kick off the HDP cluster install, including Zeppelin, HAWQ and PXF. You can monitor it via Ambari at http://node1:8080
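- If you prefer the command line, you can also poll the Ambari REST API for install progress (a sketch; replace <cluster_name> with whatever name ambari-bootstrap generated, visible from the first call):
curl -u admin:admin -H X-Requested-By:ambari http://localhost:8080/api/v1/clusters
curl -u admin:admin -H X-Requested-By:ambari "http://localhost:8080/api/v1/clusters/<cluster_name>/requests?fields=Requests/progress_percent,Requests/request_status"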
- On the HAWQ master node:
- SSH in
- Connect to HAWQ
- Create a new DB
- Add a user for Zeppelin
- Grant the Zeppelin user access to the DB
su - gpadmin
source /usr/local/hawq/greenplum_path.sh
export PGPORT=5432
psql -d postgres
create database contoso;
CREATE USER zeppelin WITH PASSWORD 'zeppelin';
GRANT ALL PRIVILEGES ON DATABASE contoso to zeppelin;
\q
- Note: you only need to set PGPORT if the HAWQ master was not installed on the default port (5432). If you specified a different port, set PGPORT accordingly.
- On the HAWQ master node, run the below to add the IP of the Zeppelin node to HAWQ's pg_hba.conf. This allows Zeppelin to access HAWQ from a different node
- Make sure to replace 172.17.0.2 below with the IP of the host running Zeppelin
echo "host all all 172.17.0.2/32 trust" >> /data/hawq/master/pg_hba.conf
- Restart HAWQ via Ambari
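- Optionally, verify connectivity from the Zeppelin node before touching the interpreter settings (this assumes a psql client is available there and that node3 is your HAWQ master host, matching the interpreter URL below):
psql -h node3 -p 5432 -U zeppelin -d contoso -c "select version();"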
- Open the Zeppelin interpreter settings, scroll down to the psql section, and make the below changes so the zeppelin user connects to the contoso DB:
- postgresql.url = jdbc:postgresql://node3:5432/contoso
- postgresql.user = zeppelin
- postgresql.password = zeppelin
- Create a new note in Zeppelin with the below cells to create/populate a test table and calculate the average of a subset:
%psql.sql
create table tt (i int);
insert into tt select generate_series(1,1000000);
%psql.sql
select avg(i) from tt where i>5000;
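- As a sanity check, the second cell should return 502500.5: the query averages the integers 5001 through 1000000, and the mean of that range is (5001 + 1000000) / 2 = 502500.5.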