To verify that Raw Mode works on a cluster which already has ServiceMesh/Serverless installed. The test object is a very simple mnist-8 OCI model with sample data as input.
This step only sets up a cluster which already has such resources in place; it is not needed at all in order to use Raw Mode.
- install all 3 dependent operators: ServiceMesh v2/v3, Serverless 1.y and Authorino 0.16.z
- install ODH/RHOAI
- create the default DSCI if it is not done by the Operator yet, especially in the ODH case
- create DSC with
  .spec.components.kserve.defaultDeploymentMode: RawDeployment
  It is fine to keep .spec.components.kserve.serving.managementState: Managed,
  since we want to verify that existing Serverless resources won't cause conflicts.
Result: it should work at the installation level -- the Operator will not complain about such a config.
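The DSC settings described above can be sketched as the following fragment (field paths as in the step above; the metadata name and omitted fields are assumptions, not taken from a real cluster):

```yaml
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc # assumed name; other components omitted
spec:
  components:
    kserve:
      managementState: Managed
      defaultDeploymentMode: RawDeployment
      serving:
        managementState: Managed # keep Serverless Managed to verify no conflict with Raw Mode
```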
Here we take the project wen-raw as an example:
- oc create namespace wen-raw
- oc apply -f permission.yaml
>cat permission.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: wen-raw-sa
  namespace: wen-raw
secrets: []
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: wen-raw-role
  namespace: wen-raw
rules:
- apiGroups:
  - 'serving.kserve.io'
  resources:
  - inferenceservices
  resourceNames:
  - mnist # specify the isvc name
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: wen-raw-rb-view
  namespace: wen-raw
subjects:
- kind: ServiceAccount
  name: wen-raw-sa
  namespace: wen-raw # namespace is required for ServiceAccount subjects
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: wen-raw-role
---
apiVersion: v1
kind: Secret
metadata:
  name: wen-raw-sa-token
  namespace: wen-raw
  annotations:
    kubernetes.io/service-account.name: wen-raw-sa # match the ServiceAccount created above
type: kubernetes.io/service-account-token
- create the SR CR; the mnist example uses a simple SR from a ServingRuntime Template generated by KServe
- oc apply -f sr.yaml
>cat sr.yaml
---
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: ovms-runtime
  namespace: wen-raw
spec:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: "8888"
  containers:
  - args:
    - --model_name={{.Name}}
    - --port=8001
    - --rest_port=8888
    - --model_path=/mnt/models
    - --file_system_poll_wait_seconds=0
    - --grpc_bind_address=0.0.0.0
    - --rest_bind_address=0.0.0.0
    - --target_device=AUTO
    - --metrics_enable
    image: quay.io/modh/openvino_model_server@sha256:6c7795279f9075bebfcd9aecbb4a4ce4177eec41fb3f3e1f1079ce6309b7ae45
    name: kserve-container
    ports:
    - containerPort: 8888
      protocol: TCP
  multiModel: false
  protocolVersions:
  - v2
  - grpc-v2
  supportedModelFormats:
  - autoSelect: true
    name: openvino_ir
    version: opset13
  - name: onnx
    version: "1"
  - autoSelect: true
    name: tensorflow
    version: "1"
  - autoSelect: true
    name: tensorflow
    version: "2"
  - autoSelect: true
    name: paddle
    version: "2"
  - autoSelect: true
    name: pytorch
    version: "2"
- create ISVC CR mnist
  - Enable: external route
  - Require: authentication
- oc apply -f isvc.yaml
>cat isvc.yaml
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: mnist
  namespace: wen-raw
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment # can be skipped if DSC has Raw as the default mode
  labels:
    networking.kserve.io/visibility: 'exposed' # to enable the external route
    security.opendatahub.io/enable-auth: 'true' # sets Redirect as termination in the route
spec:
  predictor:
    maxReplicas: 1 # for simple test purposes only
    minReplicas: 1
    serviceAccountName: wen-raw-sa # match the SA created above
    model:
      storageUri: oci://quay.io/wenzhou/model:mnist-8
      modelFormat:
        name: onnx
        version: '1'
      name: ''
      resources:
        limits:
          cpu: '2'
          memory: 8Gi
        requests:
          cpu: '1'
          memory: 4Gi
      runtime: ovms-runtime # match the SR CR created above
export model=mnist # name of the model, which is the same as the isvc created above
export host=$(oc get route mnist -n wen-raw -o jsonpath='{.status.ingress[0].host}')
export token=$(oc get secret wen-raw-sa-token -n wen-raw -o jsonpath='{.data.token}' | base64 -d) # .data.token is base64-encoded; decode it before use
>curl -iLk https://${host}/v2/models/${model}/infer -d @wen-raw-onnx.json -H "Authorization: Bearer ${token}" # hit https directly; a POST followed through the route's http->https redirect would be resent as GET
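Note that `.data` values in a Secret are base64-encoded, so the token pulled via jsonpath must be decoded before it is sent as a Bearer token. A minimal sketch of the decode step, using a hypothetical stand-in value rather than a real service-account token:

```shell
# "c2FtcGxlLXRva2Vu" is a hypothetical stand-in for the real .data.token value.
encoded="c2FtcGxlLXRva2Vu"
# Decode exactly as in the export above; printf avoids adding a trailing newline.
token=$(printf '%s' "$encoded" | base64 -d)
echo "$token" # prints: sample-token
```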
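The request body (wen-raw-onnx.json, not shown here) follows the KServe v2 / Open Inference Protocol REST format. A hedged sketch of its shape only; the input tensor name and dimensions below are assumptions for the mnist-8 ONNX model, not taken from the actual file, and the pixel data is elided:

```json
{
  "inputs": [
    {
      "name": "Input3",
      "shape": [1, 1, 28, 28],
      "datatype": "FP32",
      "data": [0.0, "... 784 float pixel values ..."]
    }
  ]
}
```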