tags |
---|
Kubernetes, Kubernetes-dashboard, docker, nvidia driver T4 |
Kubernetes Installation with Docker Installation, NVIDIA T4 Driver and Kubernetes Dashboard Installation
Hardware System: SCB 1921B-AA1 OS: Ubuntu 18.04 LTS, kernel 5.4.0-42 GPU: NVIDIA GEFORCE GTX 1050
Install the prerequisites:
$ sudo apt-get update
$ sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
Add Docker's GPG key:
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88
Make sure the output contains this fingerprint:
9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88
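The fingerprint comparison above can also be scripted rather than checked by eye. The following is a minimal sketch, not part of the original procedure: the helper names `squash` and `fpr_matches` are hypothetical, and it assumes the key listing is piped in on stdin.

```shell
# Fingerprint copied from the step above.
EXPECTED_FPR="9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88"

# squash: remove all whitespace from stdin and upper-case it,
# so differences in spacing or case do not affect the comparison.
squash() {
  tr -d '[:space:]' | tr '[:lower:]' '[:upper:]'
}

# fpr_matches (hypothetical helper): read key-listing text on stdin and
# succeed only if it contains the expected fingerprint.
fpr_matches() {
  fpr_want="$(printf '%s' "$EXPECTED_FPR" | squash)"
  fpr_got="$(squash)"
  case "$fpr_got" in
    *"$fpr_want"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (not run here):
#   sudo apt-key fingerprint 0EBFCD88 | fpr_matches && echo "fingerprint OK"
```

Stripping whitespace before comparing means the check does not depend on how the tool groups the hex digits.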
Add repository:
$ sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
$ sudo apt-get update
Docker installation:
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
Check the Docker Version:
$ docker version
Download the driver installer (.run file) from the NVIDIA website:
Note: Select your NVIDIA GPU model and OS before downloading: https://www.nvidia.com/Download/index.aspx?lang=en-us
Before installing, blacklist the nouveau driver.
Create a file:
$ sudo vim /etc/modprobe.d/blacklist-nouveau.conf
Add the following lines to blacklist-nouveau.conf:
blacklist nouveau
options nouveau modeset=0
Save the file and exit.
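If you prefer not to edit the file interactively, the same two lines can be written from a script. This is a rough sketch under that assumption; `write_nouveau_blacklist` is a hypothetical helper name, and it writes to whatever path you pass it.

```shell
# write_nouveau_blacklist FILE (hypothetical helper): write the two
# blacklist lines from the step above into FILE, overwriting it.
write_nouveau_blacklist() {
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$1"
}

# Usage (not run here; the final copy needs root because the target is under /etc):
#   tmp="$(mktemp)" && write_nouveau_blacklist "$tmp"
#   sudo install -m 644 "$tmp" /etc/modprobe.d/blacklist-nouveau.conf
```

Writing to a temporary file first keeps the function itself free of `sudo`, so it can be tried safely on any path.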
Finally, rebuild the initramfs and reboot:
$ sudo update-initramfs -u
$ sudo reboot
After restarting, use the following command to confirm that nouveau is no longer loaded:
$ lsmod | grep nouveau
If nothing is printed, the nouveau kernel driver is disabled and NVIDIA's official driver can now be installed.
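For use in a script, the check above can be expressed as an exit status instead of printed output. A small sketch, assuming `lsmod`-style text on stdin; `nouveau_loaded` is a hypothetical helper name.

```shell
# nouveau_loaded (hypothetical helper): read `lsmod`-style output on
# stdin; exit 0 if a module named exactly "nouveau" is listed, else 1.
nouveau_loaded() {
  awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Usage (not run here):
#   if lsmod | nouveau_loaded; then
#     echo "nouveau is still loaded"
#   else
#     echo "nouveau is disabled"
#   fi
```

Matching on the first column only looks at module names, so a line that merely mentions nouveau in its "Used by" column does not count on its own.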
Make the installer executable (the file name depends on the version you downloaded):
$ chmod +x NVIDIA-Linux-x86_64-460.32.03.run
Install gcc and make:
$ sudo apt-get install gcc make
Install the NVIDIA driver (the installer must run as root):
$ sudo ./NVIDIA-Linux-x86_64-460.32.03.run
The installer shows some warnings; choose to continue each time and finish the installation.
Then reboot:
$ sudo reboot
After rebooting, run nvidia-smi to check that the driver is working:
$ nvidia-smi
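To check more than "it prints something", you could compare the version that `nvidia-smi` reports against the version in the installer's file name. This is only a sketch: `runfile_version` is a hypothetical helper, and it assumes the file follows the `NVIDIA-Linux-x86_64-<version>.run` naming seen above.

```shell
# runfile_version FILE (hypothetical helper): print the version embedded
# in an installer name of the form NVIDIA-Linux-x86_64-<version>.run.
runfile_version() {
  rv_name="${1##*/}"              # drop any leading directory
  rv_name="${rv_name%.run}"       # drop the .run suffix
  printf '%s\n' "${rv_name##*-}"  # keep what follows the last '-'
}

# Usage (not run here; requires a working driver):
#   want="$(runfile_version ./NVIDIA-Linux-x86_64-460.32.03.run)"
#   have="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
#   [ "$want" = "$have" ] && echo "driver $have is loaded"
```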
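Turn off swap (by default the kubelet will not start while swap is enabled):
$ sudo su
$ swapoff -a
**Optional step: to keep swap disabled after a reboot, run $ nano /etc/fstab, comment out the swap line by adding # in front of it, like this: #UUID=45fc9fe6-6500-4bca-864e-1effad4764b3, and save.**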
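The optional fstab edit can also be done non-interactively. The sketch below is one possible way, not the method the original steps prescribe: `comment_out_swap` is a hypothetical helper that prefixes `#` to every uncommented line containing a whitespace-delimited `swap` field, keeping a `.bak` copy of the file.

```shell
# comment_out_swap FILE (hypothetical helper): comment out each line
# that does not already start with '#' and has " swap " as a field.
# A backup of the original is left in FILE.bak.
comment_out_swap() {
  sed -i.bak 's/^\([^#].*[[:space:]]swap[[:space:]].*\)$/#\1/' "$1"
}

# Usage (not run here; run as root, e.g. inside the `sudo su` shell above):
#   comment_out_swap /etc/fstab
#   swapon --show    # prints nothing when no swap is active
```

Try it on a copy of `/etc/fstab` first, since a mistake in that file can prevent the system from booting cleanly.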
Install the prerequisites:
$ sudo apt-get update
$ sudo apt-get install -y apt-transport-https ca-certificates curl
$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
Add the Kubernetes repository to the apt sources list:
$ cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
Update the package index:
$ sudo apt-get update
Install kubeadm, kubectl and kubelet:
$ apt-get install -y kubelet kubeadm kubectl
Check the versions:
$ docker -v
Docker version 17.03.2-ce, build f5ec1e2
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:46:06Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:43:08Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
$ kubelet --version
Kubernetes v1.12.1
Set up the driver environment in the kubelet config file:
$ gedit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Start the cluster:
$ kubeadm init --pod-network-cidr=10.244.0.0/16
*This will take 3-4 minutes.*
Run the following commands as a non-root user:
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
cluster information:
$ kubectl cluster-info
Kubernetes master is running at https://10.132.0.2:6443
KubeDNS is running at https://10.132.0.2:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
$ kubectl get no -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kube-master-1 NotReady master 4m26s v1.12.1 10.132.0.2 <none> Ubuntu 16.04.5 LTS 4.15.0-1021-gcp docker://17.3.2
$ kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-576cbf47c7-lw7jv 0/1 Pending 0 4m55s
kube-system pod/coredns-576cbf47c7-ncx8w 0/1 Pending 0 4m55s
kube-system pod/etcd-kube-master-1 1/1 Running 0 4m23s
kube-system pod/kube-apiserver-kube-master-1 1/1 Running 0 3m59s
kube-system pod/kube-controller-manager-kube-master-1 1/1 Running 0 4m17s
kube-system pod/kube-proxy-bwrwh 1/1 Running 0 4m55s
kube-system pod/kube-scheduler-kube-master-1 1/1 Running 0 4m10s
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5m15s
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 5m9s
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-proxy 1 1 1 1 1 <none> 5m8s
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 2 2 2 0 5m9s
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-576cbf47c7 2 2 0 4m56s
Install a CNI plugin (I prefer Weave):
$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
.
.
.
Confirm with this command:
$ kubectl get no -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kube-master-1 Ready master 9m15s v1.12.1 10.132.0.2 <none> Ubuntu 16.04.5 LTS 4.15.0-1021-gcp docker://17.3.2
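The NotReady to Ready transition shown above can take a little while after the CNI is applied. As a rough sketch, a script could list the nodes that are not yet Ready by parsing the node listing; `not_ready_nodes` is a hypothetical helper that reads `kubectl get nodes --no-headers` style text on stdin.

```shell
# not_ready_nodes (hypothetical helper): read node listing lines
# (NAME STATUS ...) on stdin and print the names of nodes whose STATUS
# is not exactly "Ready". Note that a cordoned node
# ("Ready,SchedulingDisabled") is reported as well.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage (not run here):
#   kubectl get nodes --no-headers | not_ready_nodes
# Or simply block until every node is Ready:
#   kubectl wait --for=condition=Ready node --all --timeout=300s
```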
Deploy the Dashboard:
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml
Now create the admin-user for accessing the dashboard:
$ kubectl apply -f https://gist.githubusercontent.com/chukaofili/9e94d966e73566eba5abdca7ccb067e6/raw/0f17cd37d2932fb4c3a2e7f4434d08bc64432090/k8s-dashboard-admin-user.yaml
Copy the key and use it to log in.
Sign in and see the GUI.
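The steps above do not say where the key comes from or which URL to open, so the following is only a sketch under stated assumptions. It assumes the dashboard runs in the `kubernetes-dashboard` namespace (which the v2.0.0 recommended.yaml creates), that it is reached through `kubectl proxy` on its default port 8001, and that the admin-user secret lives in that same namespace; the last point depends on the gist and is not confirmed here. `dashboard_url` and `extract_token` are hypothetical helper names.

```shell
# dashboard_url [PORT] (hypothetical helper): print the Dashboard URL as
# served through `kubectl proxy` (PORT defaults to 8001).
dashboard_url() {
  printf 'http://localhost:%s/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/\n' "${1:-8001}"
}

# extract_token (hypothetical helper): read `kubectl describe secret`
# output on stdin and print the value of its "token:" field.
extract_token() {
  awk '$1 == "token:" { print $2 }'
}

# Usage (not run here; the namespace of the admin-user secret is an assumption):
#   kubectl proxy &
#   dashboard_url
#   secret="$(kubectl -n kubernetes-dashboard get secret | awk '/admin-user/ { print $1 }')"
#   kubectl -n kubernetes-dashboard describe secret "$secret" | extract_token
```

If the last command prints nothing, look for the admin-user secret in other namespaces, for example with `kubectl get secret --all-namespaces`.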
## Sources:
1. https://clay-atlas.com/blog/2020/02/11/linux-chinese-note-nvidia-driver-nouveau-kernel/
2. https://askubuntu.com/questions/841876/how-to-disable-nouveau-kernel-driver
3. https://docs.docker.com/engine/install/ubuntu/
4. https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
5. https://stackoverflow.com/questions/52720380/kubernetes-api-server-is-not-starting-on-a-single-kubeadm-cluster