This installation of Cloudera is for development only; do not use it in production.
Download the Cloudera Docker image from https://www.cloudera.com/downloads/quickstart_vms/5-13.html
See also: https://www.cloudera.com/documentation/enterprise/5-13-x/topics/quickstart_docker_container.html
Uncompress the file:
tar -xf cloudera-quickstart-vm-5.13.0-0-beta-docker.tar.gz
Import the image into Docker:
docker import cloudera-quickstart-vm-5.13.0-0-beta-docker.tar
Tag the image with a friendlier name, replacing IMAGE_HASH with the image ID printed by docker import:
docker image tag IMAGE_HASH cloudera-5-13
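As a possible shortcut, docker import accepts an optional repository name after the file, so the import and tag steps can be combined. The sketch below wraps that in a helper called import_quickstart; the function name and its defaults are assumptions made for illustration and are not part of the original instructions.

```shell
# Hypothetical helper: import the tarball and name the image in one step.
# `docker import FILE REPOSITORY` tags the new image as REPOSITORY:latest.
import_quickstart() {
  tarball="${1:-cloudera-quickstart-vm-5.13.0-0-beta-docker.tar}"
  image="${2:-cloudera-5-13}"
  docker import "$tarball" "$image"
}

# Usage: import_quickstart
# Then confirm the image exists with: docker image ls cloudera-5-13
```

With this approach there is no need to copy the image hash by hand.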
Run the container:
docker run --name quickstart.cloudera \
--hostname=quickstart.cloudera \
-d --privileged=true \
-t -i cloudera-5-13 \
/usr/bin/docker-quickstart
(The hostname should not be changed; otherwise some services do not start correctly.)
Attach to the container (to run commands inside it):
docker attach quickstart.cloudera
To detach the tty without exiting the shell, use the escape sequence Ctrl+p followed by Ctrl+q.
IMPORTANT: On Windows you should also publish one or more ports in the docker run command by adding options such as -p 8888:8888 -p 7180:7180 -p 8181:80 (host_port:container_port). See below for available ports.
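To make the port options concrete, here is a minimal sketch that rebuilds the docker run command above with a -p flag for each mapping passed in. The helper name run_quickstart is hypothetical; the image name, hostname and entry point are the ones used earlier in this guide.

```shell
# Hypothetical helper: start the container with the given ports published.
# Each argument is a HOST_PORT:CONTAINER_PORT pair, e.g. 8181:80.
run_quickstart() {
  n=$#
  # Append "-p MAPPING" for every argument, then drop the original arguments
  # so that "$@" holds only the -p flags.
  for mapping in "$@"; do
    set -- "$@" -p "$mapping"
  done
  shift "$n"
  docker run --name quickstart.cloudera \
    --hostname=quickstart.cloudera \
    -d --privileged=true \
    "$@" \
    -t -i cloudera-5-13 \
    /usr/bin/docker-quickstart
}

# Usage: run_quickstart 8888:8888 7180:7180 8181:80
```

Called without arguments it behaves like the plain docker run command shown earlier.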
NOTE: It takes several minutes for all services to start and begin responding on their HTTP ports.
If you see clock sync problems, try running /etc/init.d/ntpd start manually to sync the time.
You can edit /usr/bin/docker-quickstart to disable the automatic start of services you don't use, which reduces memory consumption.
Consider adding the hostname quickstart.cloudera to /etc/hosts, pointing to your container IP, for friendlier usage. Get the container IP using:
docker inspect quickstart.cloudera | grep "\"IPAddress\""
IMPORTANT: On Windows, use your host IP instead of the container IP.
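The grep above prints the raw JSON line, which still has to be trimmed by hand. As an alternative sketch for a Linux host, docker inspect can print just the address through a Go template, and the result can be turned into a ready-made /etc/hosts line. The helper name quickstart_hosts_entry is hypothetical.

```shell
# Hypothetical helper: print an /etc/hosts line for the running container.
quickstart_hosts_entry() {
  name="${1:-quickstart.cloudera}"
  # The template walks every attached network; with the default bridge
  # network there is only one address.
  ip="$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$name")"
  if [ -z "$ip" ]; then
    echo "no IP found for $name (is the container running?)" >&2
    return 1
  fi
  printf '%s %s\n' "$ip" "$name"
}

# Usage: quickstart_hosts_entry | sudo tee -a /etc/hosts
```

The container IP can change when the container is recreated, so the entry may need refreshing.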
Then just browse to the requested service, for example http://quickstart.cloudera:7180/ to connect to Cloudera Manager.
Available services/ports:
- 8888 Hue
- 7180 Cloudera Manager
- 80 Tutorial
- 8983 Solr
- 8088 Hadoop MapReduce UI
- 11000 Oozie
- 9092 Kafka
- 2181 Zookeeper
- etc...
In this way you can connect to any service without exposing any ports.
Credentials used inside the Cloudera quick start:
- username: cloudera
- password: cloudera
For unknown reasons, the ntpd service sometimes does not start and you will see clock offset problems. To fix this, run /etc/init.d/ntpd stop and then /etc/init.d/ntpd start inside the container.
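If you would rather not attach to the container for this, the same two commands can be sent from the host with docker exec. This is only a sketch; the helper name restart_ntpd is hypothetical.

```shell
# Hypothetical helper: stop and start ntpd inside the container from the host.
restart_ntpd() {
  container="${1:-quickstart.cloudera}"
  docker exec "$container" sh -c '/etc/init.d/ntpd stop; /etc/init.d/ntpd start'
}

# Usage: restart_ntpd
```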
For a graceful shutdown, use:
docker stop --time=60 quickstart.cloudera
Often, after starting the container, you need to manually start all Cloudera services using:
curl -X POST -u "admin:admin" -i http://localhost:7180/api/v18/clusters/Cloudera%20QuickStart/commands/start
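Because the services take several minutes to come up, the start call above can fail if it is sent too early. One way to handle that, sketched here under the assumption that Cloudera Manager is reachable on localhost:7180, is to poll until the port answers and only then send the request. The helper name wait_for_http and its default retry values are hypothetical.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
wait_for_http() {
  url="$1"
  attempts="${2:-60}"   # number of tries
  delay="${3:-10}"      # seconds between tries
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: silent, -o /dev/null: discard the body, --max-time: do not hang.
    # Any HTTP response (even a redirect or 401) counts as "answering".
    if curl -s -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $url" >&2
  return 1
}

# Usage: wait until Cloudera Manager answers, then start the cluster.
# wait_for_http http://localhost:7180/ && \
#   curl -X POST -u "admin:admin" -i \
#     http://localhost:7180/api/v18/clusters/Cloudera%20QuickStart/commands/start
```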
By default the Cloudera container uses Java 7 (1.7). Here are some guides for upgrading to Java 8:
https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_jdk_installation.html
https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_cm_upgrading_to_jdk8.html
- Download the latest Java 1.8 SDK (8u171 at the time of writing). Cloudera suggests using a specific revision, but this does not seem to be necessary.
- Extract it to /usr/java/jdk1.8.0_171:
  tar -xzf jdk-8u171-linux-x64.tar.gz
  mv jdk1.8.0_171 /usr/java/
- Add or modify the following line in /etc/profile, /etc/default/cloudera-scm-server and /etc/default/bigtop-utils:
  export JAVA_HOME=/usr/java/jdk1.8.0_171
- Change the symlink:
  rm /usr/bin/java
  ln -s /usr/java/jdk1.8.0_171/jre/bin/java /usr/bin/java
- Change Java for the Cloudera host:
  - Open the Cloudera Manager Admin Console.
  - In the main navigation bar, click the Hosts tab and optionally click a specific host link.
  - Click the Configuration tab.
  - Select Category > Advanced.
  - Set the Java Home Directory property to the custom location.
  - Click Save Changes.
- Restart the Docker container.
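The JAVA_HOME step in the list above touches three files, so it may be convenient to script it. The following is a minimal sketch, not a tested part of the original procedure: the helper name set_java_home is hypothetical, and it assumes that any existing setting is written as a line starting with "export JAVA_HOME=".

```shell
# Hypothetical helper: replace existing "export JAVA_HOME=" lines in FILE,
# or append one if the file has none.
set_java_home() {
  file="$1"
  java_home="${2:-/usr/java/jdk1.8.0_171}"
  if grep -q '^export JAVA_HOME=' "$file" 2>/dev/null; then
    # Rewrite through a temp file so the sketch does not rely on "sed -i".
    tmp="$(mktemp)"
    sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$file" > "$tmp" \
      && cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf 'export JAVA_HOME=%s\n' "$java_home" >> "$file"
  fi
}

# Usage (as root inside the container):
# for f in /etc/profile /etc/default/cloudera-scm-server /etc/default/bigtop-utils; do
#   set_java_home "$f"
# done
```

Running it twice leaves a single line in place, so it is safe to repeat after a JDK update.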
CDH 5.11 is required to install Spark 2.1
- Open Cloudera Manager
- Click the Parcels icon (the present-shaped icon)
- Click Configuration and add the following repository: http://archive.cloudera.com/cdh5/parcels/5.11/ (CDH 5.11 is required by Spark 2.1)
- Click Check for New Parcels
- Select CDH (5.11)
- Click Download, Distribute, Activate
- Redeploy client configuration using Cloudera Manager
- Restart the cluster/docker (check on CM if all is started correctly)
First downgrade to CDH 5.11 (see above).
- Open Cloudera Manager
- Click the Parcels icon (the present-shaped icon)
- Click Configuration and add the following repository: http://archive.cloudera.com/spark2/parcels/2.1.0.cloudera1/ (see also https://www.cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html for more info)
- Click Check for New Parcels
- Select Spark2 and click Download, Distribute, Activate
- Redeploy Client Configuration using Cloudera Manager
- Restart the cluster/docker
- Download the CSD (Custom Service Descriptor) jar for CDS from https://www.cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html#versions
- Put the jar inside /opt/cloudera/csd
- Restart the Cloudera Manager Server:
  service cloudera-scm-server restart
- Log into the Cloudera Manager Admin Console and restart the Cloudera Management Service
- Add the Spark 2 service to your cluster.
  - In step #1, select a dependency option:
    - HDFS, YARN, ZooKeeper: Choose this option if you do not need access to a Hive service.
    - HDFS, Hive, YARN, ZooKeeper: Hive is an optional dependency for the Spark service. If you have a Hive service and want to access Hive tables from your Spark applications, choose this option to include Hive as a dependency and have the Hive client configurations always available to Spark applications.
  - In step #2, when customizing the role assignments for CDS Powered By Apache Spark, add a gateway role to every host.
  - Note that the History Server port is 18089 instead of the usual 18088.
  - Complete the steps to add the Spark 2 service.
IMPORTANT: Use spark2-shell and spark2-submit to work with the newly installed Spark 2.
See also:
- https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html
- https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_parcels.html#concept_vwq_421_yk__section_cnx_b3y_bm
- https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_migrating_packages_to_parcels.html#xd_583c10bfdbd326ba--6eed2fb8-14349d04bee--772e__section_xcy_vzv_k4
- https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_parcels.html#concept_vwq_421_yk
Cloudera Kafka 3.1 installs Kafka 1.0.1.
See https://www.cloudera.com/documentation/kafka/1-3-x/topics/kafka_installing.html
- Download Kafka Parcel
- Add the service from Cloudera Manager
- Using Cloudera Manager-> Kafka -> Configurations
- Increase Broker Heap Size
- Change default replication factor from 3 to 1 using
- From a terminal inside /home/cloudera, run sudo ./kerberos (the first time I received an error, but it succeeded on the second run).
- Run the CM Kerberos wizard using the information output by the script (Administration->Security->Enable Kerberos).
- Select "Manage krb5.conf through Cloudera Manager"
Then, it will prompt you for the following details (accept defaults if not specified here):
KDC Type: MIT KDC
KDC Server Host: quickstart.cloudera
Kerberos Security Realm: CLOUDERA
Later, it will prompt you for KDC account manager credentials:
Username: cloudera-scm/admin (@ CLOUDERA)
Password: cloudera
Finally select "Yes, I am ready to restart the cluster now."
To connect a client host to the Cloudera Kerberos server, install the client with apt-get install krb5-user and follow the prompts, entering the required info (ensure that you can resolve the server name, quickstart.cloudera in this case).
It is usually recommended to copy /etc/krb5.conf from the server to all clients.
See also:
- https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_sg_intro_kerb.html
- http://csetutorials.com/setup-kerberos-ubuntu.html
The goal is to log in to the hbase shell using the hbase principal and, from there, grant access to another principal, in this case cloudera-scm/admin.
We can use the keytab file already created internally by Cloudera inside /var/run/cloudera-scm-agent/process/, with something similar to:
kinit -k -t /var/run/cloudera-scm-agent/process/180-hbase-MASTER/hbase.keytab hbase/quickstart.cloudera@CLOUDERA
Run ls /var/run/cloudera-scm-agent/process to find your exact folder name.
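Since the numeric prefix of the folder (180 in the example above) differs between installations, the lookup can be scripted. The sketch below assumes that the process folders are named NUMBER-hbase-MASTER and that the highest number is the most recent one; both the helper name latest_hbase_master_dir and that ordering assumption are mine, so check the result against the ls output.

```shell
# Hypothetical helper: print the hbase MASTER process folder with the
# highest numeric prefix (assumed here to be the most recent one).
latest_hbase_master_dir() {
  base="${1:-/var/run/cloudera-scm-agent/process}"
  best=""
  best_n=-1
  for dir in "$base"/*-hbase-MASTER; do
    [ -d "$dir" ] || continue
    name="${dir##*/}"   # e.g. 180-hbase-MASTER
    n="${name%%-*}"     # e.g. 180
    # Skip folders whose prefix is not purely numeric.
    case "$n" in ''|*[!0-9]*) continue ;; esac
    if [ "$n" -gt "$best_n" ]; then
      best_n="$n"
      best="$dir"
    fi
  done
  # Fails (non-zero status) when no matching folder was found.
  [ -n "$best" ] && printf '%s\n' "$best"
}

# Usage:
# kinit -k -t "$(latest_hbase_master_dir)/hbase.keytab" hbase/quickstart.cloudera@CLOUDERA
```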
From now on you are authenticated as hbase.
Connect to hbase shell:
hbase shell
Grant admin user access to HBase:
Important:
If you are using Kerberos principal names when setting ACLs for users, Hadoop uses only the first part (short) of the Kerberos principal when converting it to the username. Hence, for the principal ann/[email protected], HBase ACLs should only be set for user ann.
grant 'cloudera-scm', 'RWCA'
exit
Test by authenticating again as 'cloudera-scm/admin@CLOUDERA':
kinit cloudera-scm/admin@CLOUDERA