Switch to the instavote namespace

kubectl config set-context --current --namespace=instavote
helm uninstall -n dev instavote 
kubectl delete deploy vote redis db result worker  -n instavote 
kubectl delete svc vote redis db result -n instavote 
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vote
  namespace: instavote
spec:
  ingressClassName: nginx
  rules:
  - host: vote.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: vote
            port:
              number: 80   # assumed; match the vote service's port
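
To resolve vote.example.com locally, a typical step (a sketch; assumes the ingress is reachable on 127.0.0.1, e.g. via a kind port mapping) is adding a hosts entry:

echo "127.0.0.1 vote.example.com" | sudo tee -a /etc/hosts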

Event Driven Auto Scaling with KEDA

Configure Prometheus

Install Prometheus and Grafana with Helm

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
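
Then update the repo and install the kube-prometheus-stack chart, which bundles Prometheus and Grafana (the release name prometheus and the monitoring namespace below are assumptions):

helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace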

Create a namespace

kubectl create namespace instavote 
kubectl config set-context --current --namespace=instavote
git clone https://github.com/schoolofdevops/instavote-kustomize.git
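
The kustomize config can then be applied; the path below is an assumption about the repo layout, so adjust it to the actual overlay directory:

kubectl apply -k instavote-kustomize/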
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vote
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vote
  template:
    metadata:
      labels:
        app: vote
    spec:
      containers:
      - name: vote
        image: schoolofdevops/vote:v1   # image assumed; use your vote image
        ports:
        - containerPort: 80
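
Given the section title, a minimal KEDA ScaledObject wired to Prometheus could look like the sketch below; the Prometheus address, query, and thresholds are assumptions, not values from the original gist:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: vote
  namespace: instavote
spec:
  scaleTargetRef:
    name: vote
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      # serverAddress and query are assumptions; point these at your Prometheus.
      serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
      query: sum(rate(nginx_ingress_controller_requests{exported_service="vote"}[2m]))
      threshold: "100"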

rag/build_index.py

# rag/build_index.py
import argparse, json
from pathlib import Path
from typing import Dict, Any, Iterable, Tuple, List

# ---------- common snippet renderers ----------

What’s happening

That log line:

Overriding ... dispatch key: AutocastCPU ... new kernel: ... ipex-cpu ...
INFO ... Automatically detected platform cpu.

means IPEX’s autocast kernels replaced the default ones. With --dtype=float16 on CPU, PyTorch/ipex either upcasts or hits slow/non-vectorized code paths and can “hang” at model load/compile.
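
A common workaround on CPU is to avoid float16 entirely (a sketch; the model name is a placeholder):

# bfloat16 avoids the slow float16 autocast paths on CPU; use --dtype float32 if bfloat16 is unsupported.
vllm serve facebook/opt-125m --dtype bfloat16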

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
Dockerfile to build a vLLM image with CPU on Mac
FROM openeuler/vllm-cpu:0.9.1-oe2403lts

# Patch cpu_worker.py to handle zero NUMA nodes
RUN sed -i 's/cpu_count_per_numa = cpu_count \/\/ numa_size/cpu_count_per_numa = cpu_count \/\/ numa_size if numa_size > 0 else cpu_count/g' \
    /workspace/vllm/vllm/worker/cpu_worker.py

ENV VLLM_TARGET_DEVICE=cpu \
    VLLM_CPU_KVCACHE_SPACE=1 \
    OMP_NUM_THREADS=2 \
    OPENBLAS_NUM_THREADS=1
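
Build and run it (a sketch; the tag is a placeholder, and the image's default entrypoint and port 8000 are assumptions):

docker build -t vllm-cpu-mac .
docker run --rm -p 8000:8000 vllm-cpu-mac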
  1. Local registry for KIND

We’ll run a registry container named kind-registry on port 5001 and attach it to the kind network so nodes can pull via kind-registry:5001/....

scripts/start_local_registry.sh

#!/usr/bin/env bash
set -euo pipefail
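
A minimal sketch of the rest of the script, following the description above; the registry:2 image and the idempotency checks are assumptions:

reg_name='kind-registry'
reg_port='5001'

# Start the registry container if it isn't already running (registry:2 image assumed).
if [ "$(docker inspect -f '{{.State.Running}}' "${reg_name}" 2>/dev/null || true)" != 'true' ]; then
  docker run -d --restart=always -p "127.0.0.1:${reg_port}:5000" --name "${reg_name}" registry:2
fi

# Attach the registry to the kind network so cluster nodes can pull via kind-registry:5001.
docker network connect kind "${reg_name}" 2>/dev/null || true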