- Kerberize the cluster
- Enable CGroups in YARN and restart
To enable cgroups on an Ambari cluster, select YARN > Configs on the Ambari dashboard, then enable CPU Isolation under CPU. Click Save, then restart all cluster components that require a restart.
I got a mount failure error for /sys/fs/cgroup/cpu/yarn. Solution: run the commands below on all NodeManager hosts:
sudo mkdir /sys/fs/cgroup/cpu/yarn
sudo chown -R yarn:yarn /sys/fs/cgroup/cpu/yarn
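For reference, enabling CPU Isolation corresponds roughly to these yarn-site properties (a sketch of what Ambari manages for you; exact values can differ by stack version):
yarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
yarn.nodemanager.linux-container-executor.resources-handler.class=org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler
yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/yarn
yarn.nodemanager.linux-container-executor.cgroups.mount=true
yarn.nodemanager.linux-container-executor.cgroups.mount-path=/sys/fs/cgroup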
- Install Docker on all NodeManager hosts:
yum install docker        # run as root (or prefix with sudo)
systemctl start docker
systemctl status docker   # confirm the daemon is running
systemctl enable docker   # start Docker on boot
sudo systemctl edit --full docker.service
This brings up the whole unit configuration for editing. Replace the systemd string with cgroupfs (typically the --exec-opt native.cgroupdriver=systemd option on the ExecStart line). Save the changes, then reload systemd and restart the Docker daemon:
sudo systemctl daemon-reload
sudo systemctl restart docker.service
(Otherwise, submitting a Spark application fails with: /usr/bin/docker-current: Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice". Ref: https://issues.apache.org/jira/browse/YARN-9660)
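To verify the driver change took effect (the exact output wording varies by Docker version):
docker info | grep -i "cgroup driver"    # should now report: Cgroup Driver: cgroupfs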
docker ps                       # works as root
sudo groupadd docker            # create the docker group if it does not already exist
sudo usermod -aG docker spark   # let the spark user talk to the Docker daemon
su spark
docker ps                       # verify the spark user can now run docker commands
- Enable Docker in Ambari: YARN -> Advanced container-executor, and set:
docker.allowed.ro-mounts=/sys/fs/cgroup,/etc/passwd,/etc/krb5.conf,{{nm_local_dirs}},{{docker_allowed_ro_mounts}}
min_user_id=50
yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user=nobody
- Whitelist antaladam: in Ambari, YARN -> Docker Trusted Registries, add antaladam so the value reads local,centos,hortonworks,antaladam
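For reference, the rendered container-executor.cfg on the NodeManagers should end up with a [docker] section roughly like this (a sketch; Ambari fills in the actual values from the settings above):
[docker]
  module.enabled=true
  docker.binary=/usr/bin/docker
  docker.allowed.networks=bridge,host,none
  docker.allowed.ro-mounts=/sys/fs/cgroup,/etc/passwd,/etc/krb5.conf,...
  docker.trusted.registries=local,centos,hortonworks,antaladam
  docker.privileged-containers.enabled=false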
Note: antaladam/python2:v1 is the Docker image with Python and numpy.
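To confirm the image really provides numpy before wiring it into YARN (this assumes the image is pullable from Docker Hub and that its Python binary is on PATH as python):
docker pull antaladam/python2:v1
docker run --rm antaladam/python2:v1 python -c "import numpy; print(numpy.__version__)"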
- kinit as the spark user and run:
pyspark --master yarn \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=antaladam/python2:v1 \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/krb5.conf:/etc/krb5.conf:ro,/etc/passwd:/etc/passwd:ro
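Note: with --master yarn in client mode, only the executors run in Docker; the pyspark driver stays on the host. That is why the inside(0) call below fails with an ImportError: numpy exists only in the executor image.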
def inside(p):
    import numpy as np
    return np.cos(np.pi * p / 2) > 0.5

inside(0)  # verify that numpy is not available on the driver, only in the Docker image on the executors, so this raises ImportError
num_samples = 100000
# runs inside(), which uses numpy, in Docker containers on the executors
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
print(count)
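For integer p, cos(pi * p / 2) cycles through 1, 0, -1, 0, so the filter keeps only multiples of 4 and the printed count should be 25000.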
- First, ensure that the Livy interpreter runs fine without Docker containers (i.e., with default YARN containers).
- Then add the configurations below in the Livy interpreter settings to run the AM and executors in Docker containers:
livy.spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker
livy.spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=antaladam/python2:v1
livy.spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/krb5.conf:/etc/krb5.conf:ro,/etc/passwd:/etc/passwd:ro
livy.spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker
livy.spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=antaladam/python2:v1
livy.spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/krb5.conf:/etc/krb5.conf:ro,/etc/passwd:/etc/passwd:ro
- Add a new Zeppelin notebook:
%livy2.pyspark
def inside(p):
    import numpy as np
    return np.cos(np.pi * p / 2) > 0.5

inside(0)  # test the method
num_samples = 100000
# runs inside(), which uses numpy, in Docker containers on the executors
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
print(count)
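While the paragraph above is running, you can check on a NodeManager host that the executors really are Dockerized:
docker ps --format '{{.Image}}'    # should list antaladam/python2:v1 for the running executor containers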
Ref:
https://docs.cloudera.com/runtime/7.2.0/yarn-troubleshooting/topics/yarn-troubleshooting-docker.html
https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/DockerContainers.html
TODO: change yarn.nodemanager.linux-container-executor.group to the custom group, save, and restart.
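For example (hypothetical value; substitute whatever custom group you actually create):
yarn.nodemanager.linux-container-executor.group=docker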