Goals:
- Install a 4-node cluster running HDP 2.4.2 using Ambari 2.2.2.0 (including Zeppelin and HDB), via Ambari bootstrap with blueprints or via the Ambari install wizard
- Configure HAWQ for Zeppelin
- Configure Zeppelin for HAWQ
- Run HAWQ queries via Zeppelin
Notes:
- HDB managed via Ambari is only supported from Ambari 2.2.2.0 onwards. Do not attempt this with older versions of Ambari
- Bring up 4 VMs imaged with RHEL/CentOS 6.x (e.g. node1-4 in this case)
- On non-Ambari nodes (nodes 2-4 in this case), install ambari-agents and point them to the Ambari node (e.g. node1 in this case)
export ambari_server=node1
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh
- On Ambari node (e.g. node1), install ambari-server
export install_ambari_server=true
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh
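- Optionally, confirm the Ambari server came up once the bootstrap finishes (if it did not, ambari-server start will bring it up):
ambari-server status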
- Install Zeppelin service definition
yum install -y git
git clone https://github.com/hortonworks-gallery/ambari-zeppelin-service.git /var/lib/ambari-server/resources/stacks/HDP/2.4/services/ZEPPELIN
sed -i.bak '/dependencies for all/a \ "ZEPPELIN_MASTER-START": ["NAMENODE-START", "DATANODE-START"],' /var/lib/ambari-server/resources/stacks/HDP/2.4/role_command_order.json
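- Optionally, verify the sed edit landed in the role command order file:
grep ZEPPELIN_MASTER-START /var/lib/ambari-server/resources/stacks/HDP/2.4/role_command_order.json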
- Install Pivotal service definition and repo per HDB doc
- Create staging dir:
mkdir /staging
chmod a+rx /staging
- Copy hdb-2.0.0.0-22126.tar.gz and hdb-ambari-plugin-2.0.0-448.tar.gz to /staging
- Setup HDB repo and Ambari service definition:
tar -xvzf /staging/hdb-2.0.0.0-*.tar.gz -C /staging/
tar -xvzf /staging/hdb-ambari-plugin-2.0.0-*.tar.gz -C /staging/
yum install -y httpd
service httpd start
cd /staging/hdb-2.0.0.0*
./setup_repo.sh
cd /staging/hdb-ambari-plugin*
./setup_repo.sh
yum install -y hdb-ambari-plugin
- At this point you should see a local repo up at http://node1/HDB/
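- A quick command line check that the repo is being served (the /HDB path is what setup_repo.sh configured above):
curl -sS http://node1/HDB/ | head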
- Restart Ambari so it now recognizes the Zeppelin, HAWQ and PXF services:
service ambari-server restart
service ambari-agent restart
- Confirm the 4 agents were registered and the agent service is up:
curl -u admin:admin -H X-Requested-By:ambari http://localhost:8080/api/v1/hosts
service ambari-agent status
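- If you'd rather not eyeball the JSON, a rough count of registered hosts (this assumes the default pretty-printed response, one host_name per line):
curl -u admin:admin -H X-Requested-By:ambari http://localhost:8080/api/v1/hosts | grep -c host_name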
- Deploy a cluster running the latest HDP, including Zeppelin, HAWQ and PXF. You can either:
- Option 1: login to Ambari UI and use Install Wizard. In this case:
- You will need to set the 'HAWQ System User Password' to any value you like
- Make sure to manually adjust the HDFS settings mentioned in the HDB doc
- Make sure the port specified in 'HAWQ master port' (by default, 5432) is not in use on the host where you will install the HAWQ master (see the quick check after this list)
- If installing on a single node, or in any other scenario where the HAWQ master needs to be installed on a node where a Postgres instance already exists (e.g. installing the HAWQ master on the same host as Ambari), you will need to change the master port from the default value (5432)
- On a single node setup, the 'HAWQ standby master' will not be installed
- Refer to the HDB doc for full details
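- For the port check above, a quick way to see whether 5432 is already taken on the intended HAWQ master host (just a sketch; adjust the port if you changed it from the default):
netstat -tlnp | grep :5432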
- OR
- Option 2: generate/deploy a customized blueprint using ambari-bootstrap that takes care of the HDFS configurations as below:
yum install -y python-argparse
cd
git clone https://github.com/seanorama/ambari-bootstrap.git
#decide which services to deploy and set the number of nodes in the cluster
export ambari_services="HDFS MAPREDUCE2 YARN ZOOKEEPER HIVE ZEPPELIN SPARK HAWQ PXF"
export host_count=4
cd ./ambari-bootstrap/deploy/
#add HDFS config customizations for HAWQ and any others you may want
cat << EOF > configuration-custom.json
{
"configurations" : {
"hdfs-site": {
"dfs.allow.truncate": "true",
"dfs.block.access.token.enable": "false",
"dfs.block.local-path-access.user": "gpadmin",
"dfs.client.read.shortcircuit": "true",
"dfs.client.socket-timeout": "300000000",
"dfs.client.use.legacy.blockreader.local": "false",
"dfs.datanode.handler.count": "60",
"dfs.datanode.socket.write.timeout": "7200000",
"dfs.namenode.handler.count": "600",
"dfs.support.append": "true"
},
"hawq-env":{
"hawq_password":"gpadmin"
},
"core-site": {
"ipc.client.connection.maxidletime": "3600000",
"ipc.client.connect.timeout": "300000",
"ipc.server.listen.queue.size": "3300"
}
}
}
EOF
#optional - if you want to review the BP before deploying it
#export deploy=false
#./deploy-recommended-cluster.bash
#more temp*/blueprint.json
#generate BP including customizations and start cluster deployment
export deploy=true
./deploy-recommended-cluster.bash
- This will kick off the HDP cluster install, including Zeppelin, HAWQ and PXF. You can monitor it via Ambari at http://node1:8080
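- If you prefer the command line, you can also poll the Ambari REST API for install progress (a sketch; replace <cluster_name> with whatever name ambari-bootstrap generated, visible from the first call):
curl -u admin:admin -H X-Requested-By:ambari http://localhost:8080/api/v1/clusters
curl -u admin:admin -H X-Requested-By:ambari "http://localhost:8080/api/v1/clusters/<cluster_name>/requests?fields=Requests/progress_percent,Requests/request_status"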
- On the HAWQ master node:
- SSH in
- Connect to HAWQ
- Create a new DB
- Add a user for Zeppelin
- Grant the Zeppelin user access to the DB
su - gpadmin
source /usr/local/hawq/greenplum_path.sh
export PGPORT=5432
psql -d postgres
create database contoso;
CREATE USER zeppelin WITH PASSWORD 'zeppelin';
GRANT ALL PRIVILEGES ON DATABASE contoso to zeppelin;
\q
- Note: you only need to set PGPORT if the HAWQ master was not installed on the default port (5432). If you specified a different port, set PGPORT accordingly.
- On the HAWQ master node, run the below to add the IP of the Zeppelin node to HAWQ's pg_hba.conf. This allows Zeppelin to access HAWQ from a different node
- Make sure to replace 172.17.0.2 below with the IP of the host running Zeppelin
echo "host all all 172.17.0.2/32 trust" >> /data/hawq/master/pg_hba.conf
- Restart HAWQ via Ambari
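- Optionally, verify connectivity from the Zeppelin node before touching the interpreter settings (this assumes a psql client is available there and that node3 is your HAWQ master host, matching the interpreter URL below):
psql -h node3 -p 5432 -U zeppelin -d contoso -c "select version();"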
- Open the Zeppelin interpreter settings, scroll down to the psql section, and make the below changes so the zeppelin user connects to the contoso DB:
- postgresql.url = jdbc:postgresql://node3:5432/contoso
- postgresql.user = zeppelin
- postgresql.password = zeppelin
- Create a new note in Zeppelin with the below cells to create/populate a test table and calculate the average of a subset:
%psql.sql
create table tt (i int);
insert into tt select generate_series(1,1000000);
%psql.sql
select avg(i) from tt where i>5000;
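- As a sanity check, the second cell should return 502500.5: the query averages the integers 5001 through 1000000, and the mean of that range is (5001 + 1000000) / 2 = 502500.5.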