@israel-hdez
Created May 14, 2024 20:47
ODH/KServe blocked internal hostnames reproducer
# For this reproducer, CRC (aka OpenShift Local) was used.
# Note: This script is neither prepared nor tested to be run end-to-end at the CLI.
# Instead, the commands were copied & pasted one by one. That is the recommended way to try it.
#
# 1. Install ODH 2.11.
# 1.1 Install dependencies - OSSM operator
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicemeshoperator
  namespace: openshift-operators
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicemeshoperator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
# 1.2 Install dependencies - Authorino operator
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: authorino-operator
  namespace: openshift-operators
spec:
  channel: stable
  installPlanApproval: Automatic
  name: authorino-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
EOF
# 1.3 Install dependencies - Serverless operator
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-serverless
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: serverless-operators
  namespace: openshift-serverless
spec: {}
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: serverless-operator
  namespace: openshift-serverless
spec:
  channel: stable
  name: serverless-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
---
EOF
# 1.4 Install ODH operator
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: opendatahub-operator
  namespace: openshift-operators
spec:
  channel: fast
  installPlanApproval: Automatic
  name: opendatahub-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
  startingCSV: opendatahub-operator.v2.11.1
EOF
# 1.5 Initialize ODH platform and wait for it to be reconciled
oc apply -f https://raw.githubusercontent.com/opendatahub-io/opendatahub-operator/v2.11.1/config/samples/dscinitialization_v1_dscinitialization.yaml
oc wait --for=jsonpath='{.status.phase}'=Ready dsci --all --timeout=300s
# 1.6 Install KServe
cat <<EOF | oc apply -f -
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Removed
    dashboard:
      managementState: Removed
    datasciencepipelines:
      managementState: Removed
    kserve:
      managementState: Managed
      serving:
        managementState: Managed
    modelmeshserving:
      managementState: Removed
    kueue:
      managementState: Removed
    trainingoperator:
      managementState: Removed
    ray:
      managementState: Removed
    workbenches:
      managementState: Removed
    trustyai:
      managementState: Removed
    modelregistry:
      managementState: Removed
EOF
oc wait --for=jsonpath='{.status.phase}'=Ready dsc --all --timeout=300s
# 2. Deploy a sample model
# 2.1 Create a namespace to deploy the sample model
oc new-project kserve-model
# 2.2 Create the ServingRuntime
curl -s https://raw.githubusercontent.com/opendatahub-io/kserve/master/config/runtimes/kserve-sklearnserver.yaml | \
sed 's/ClusterServingRuntime/ServingRuntime/' | \
sed "s|kserve-sklearnserver:replace|docker.io/kserve/sklearnserver:latest|" | \
oc apply -n kserve-model -f -
# 2.3 Create the InferenceService (i.e. deploy the model)
cat <<EOF | oc apply -n kserve-model -f -
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-v2-iris"
  annotations:
    serving.knative.openshift.io/enablePassthrough: "true"
    sidecar.istio.io/inject: "true"
    sidecar.istio.io/rewriteAppHTTPProbers: "true"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      protocolVersion: v2
      runtime: kserve-sklearnserver
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
# 3. Deploy testing workloads
# 3.1 Deploy a workload that does _not_ belong to the mesh
oc run curl-outside-mesh -n kserve-model --image=quay.io/curl/curl:latest --command -- sleep infinity
# 3.2 Deploy a workload that belongs to the mesh
oc run curl-inside-mesh -n kserve-model --image=quay.io/curl/curl:latest --labels='sidecar.istio.io/inject=true' --command -- sleep infinity
# 4. Test requests to the deployed model pod
# 4.1 Get the public URL of the model (ISVC) pod and do a simple request from outside the cluster -- works
MODEL_ENDPOINT=$(kubectl get inferenceservice sklearn-v2-iris -o jsonpath='{.status.url}')
curl -k "$MODEL_ENDPOINT"
# 4.2 Get the public URL of the model (KSVC) pod and do a simple request from outside the cluster -- works
KSVC_ENDPOINT=$(kubectl get ksvc sklearn-v2-iris-predictor -o jsonpath='{.status.url}')
curl -k "$KSVC_ENDPOINT"
# 4.3 Do a request to the KServe name from the workload that is inside the mesh -- it does not work
oc exec curl-inside-mesh -- curl -s http://sklearn-v2-iris.kserve-model.svc.cluster.local
# 4.4 Do a request to the KServe name from the workload that is _not_ in the mesh -- it does not work (using HTTPS won't work either)
oc exec curl-outside-mesh -- curl -s http://sklearn-v2-iris.kserve-model.svc.cluster.local
# 4.5 Try requests to the Knative names
oc exec curl-inside-mesh -- curl -s http://sklearn-v2-iris-predictor.kserve-model.svc.cluster.local # Works
oc exec curl-outside-mesh -- curl -s http://sklearn-v2-iris-predictor.kserve-model.svc.cluster.local # Doesn't work
# 5. Re-try with plain text HTTP on the Gateway
# 5.1 Patch Knative Local Gateway to use plain text HTTP
oc patch gateway --type=merge -n knative-serving knative-local-gateway --patch-file <(echo "
spec:
  servers:
    - hosts:
        - '*.svc.cluster.local'
      port:
        name: http
        number: 8081
        protocol: HTTP
")
# 5.2 Re-try requests
curl -k $MODEL_ENDPOINT # Public hostname (KServe) -- does not work
oc exec curl-inside-mesh -- curl -s http://sklearn-v2-iris.kserve-model.svc.cluster.local # KServe name - in mesh - does not work
oc exec curl-outside-mesh -- curl -s http://sklearn-v2-iris.kserve-model.svc.cluster.local # KServe name - outside mesh - does not work
curl -k $KSVC_ENDPOINT # Public hostname (Knative) -- works
oc exec curl-inside-mesh -- curl -s http://sklearn-v2-iris-predictor.kserve-model.svc.cluster.local # Knative name - in mesh - works
oc exec curl-outside-mesh -- curl -s http://sklearn-v2-iris-predictor.kserve-model.svc.cluster.local # Knative name - outside mesh - works
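The "works" / "does not work" results in steps 4 and 5 were noted by hand. A minimal sketch of how the same matrix could be collected automatically follows. The `probe` helper is hypothetical (it is not part of the original script), and it assumes `curl` is given `-f` so that HTTP error responses produce a non-zero exit code; plain `curl -s`, as used above, exits 0 even on a 4xx/5xx response.

```shell
# probe LABEL CMD...: hypothetical helper, not part of the original reproducer.
# Runs CMD with its output discarded and reports success or failure
# based only on the exit code of CMD.
probe() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf '%s: works\n' "$label"
  else
    printf '%s: does not work\n' "$label"
  fi
}

# Possible usage against the cluster (commented out so the block runs anywhere):
# probe "KServe name, in mesh" \
#   oc exec curl-inside-mesh -- curl -sf --max-time 10 http://sklearn-v2-iris.kserve-model.svc.cluster.local
# probe "Knative name, outside mesh" \
#   oc exec curl-outside-mesh -- curl -sf --max-time 10 http://sklearn-v2-iris-predictor.kserve-model.svc.cluster.local
```

Running the same set of `probe` calls before and after patching the gateway would give two directly comparable lists.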
@ReToCode

ReToCode commented May 15, 2024

Thanks for this, here are my comments:

  1. https://gist.github.com/israel-hdez/c5f3354582cc3210bb2f984e0e10826a#file-reproducer-script-sh-L164 -> this will work once you add the mesh VirtualService for the KServe URL:
cat <<EOF | oc apply -n kserve-model -f - 
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  annotations:
    networking.knative.dev/ingress.class: istio.ingress.networking.knative.dev
    serving.knative.openshift.io/enablePassthrough: "true"
  name: sklearn-v2-iris-mesh
  namespace: kserve-model
spec:
  gateways:
  - mesh
  hosts:
  - sklearn-v2-iris.kserve-model.svc.cluster.local
  - sklearn-v2-iris-kserve-model.apps.sno.codemint.ch
  http:
  - headers:
      request:
        set:
          Host: sklearn-v2-iris-predictor.kserve-model.svc.cluster.local
    match:
    - authority:
        regex: ^sklearn-v2-iris\.kserve-model(\.svc(\.cluster\.local)?)?(?::\d{1,5})?$
      gateways:
      - mesh
    - authority:
        regex: ^sklearn-v2-iris-kserve-model\.apps\.sno\.codemint\.ch(?::\d{1,5})?$
      gateways:
      - mesh
    route:
    - destination:
        host: knative-local-gateway.istio-system.svc.cluster.local
        port:
          number: 80
      weight: 100
EOF
oc exec curl-inside-mesh -- curl -s http://sklearn-v2-iris.kserve-model.svc.cluster.local
{"status":"alive"}% 
  2. The problem with https://gist.github.com/israel-hdez/c5f3354582cc3210bb2f984e0e10826a#file-reproducer-script-sh-L188 is that, when setting the SMCP to mtls: true, you get this:
oc get destinationrules -n istio-system default -o yaml 

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
...
spec:
  host: '*.cluster.local'
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

This forces any traffic from an Istio Envoy proxy to *.cluster.local to use mTLS. Because the KServe ingress loops from the "public" gateway to the "local" gateway, Istio uses mTLS for that hop. But the local gateway now speaks plain HTTP, so the request fails with the TLS error you see. We opened issues about this in the past (https://issues.redhat.com/browse/OSSM-4194, https://issues.redhat.com/browse/OSSM-1716, https://issues.redhat.com/browse/OSSM-1047). One option is to disable mTLS in the SMCP and use PeerAuthentication to achieve mTLS in the mesh (see the JIRAs); another is the following DestinationRule:

cat <<EOF | oc apply -f - 
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: local-gateway-disable-mtls
  namespace: istio-system
spec:
  host: 'knative-local-gateway.istio-system.svc.cluster.local'
  trafficPolicy:
    tls:
      mode: DISABLE
EOF

curl -k $MODEL_ENDPOINT
{"status":"alive"}% 
oc exec curl-inside-mesh -- curl -s http://sklearn-v2-iris.kserve-model.svc.cluster.local
{"status":"alive"}% 
oc exec curl-outside-mesh -- curl -s http://sklearn-v2-iris.kserve-model.svc.cluster.local
{"status":"alive"}% 
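To see which Host values the first `authority` rule of the VirtualService in item 1 would accept, the pattern can be tried locally. This is only a sketch: Istio evaluates the regex in RE2 syntax, while `grep -E` does not understand `(?:` or `\d`, so `AUTHORITY_RE` below is a hand-translated ERE that is assumed (not verified against Istio) to be equivalent. The `matches_authority` helper is hypothetical.

```shell
# ERE translation of the RE2 pattern from the VirtualService (assumed equivalent):
#   ^sklearn-v2-iris\.kserve-model(\.svc(\.cluster\.local)?)?(?::\d{1,5})?$
AUTHORITY_RE='^sklearn-v2-iris\.kserve-model(\.svc(\.cluster\.local)?)?(:[0-9]{1,5})?$'

# matches_authority HOST: succeeds if HOST would match the pattern above.
matches_authority() {
  printf '%s\n' "$1" | grep -Eq "$AUTHORITY_RE"
}

# Try the short, .svc and fully qualified forms, plus the Knative predictor name.
for host in \
  sklearn-v2-iris.kserve-model \
  sklearn-v2-iris.kserve-model.svc \
  sklearn-v2-iris.kserve-model.svc.cluster.local:80 \
  sklearn-v2-iris-predictor.kserve-model.svc.cluster.local
do
  if matches_authority "$host"; then
    echo "$host: matched"
  else
    echo "$host: not matched"
  fi
done
```

The predictor name is intentionally not matched: it is the value the rule writes into the Host header before routing to the local gateway.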

@israel-hdez
Author

So, about (1): we are aware of that, and we will fix it somehow in the ODH stack.

About (2): then we just need the DestinationRule, right?
That looks like progress, but then we are limited to plain HTTP. I'm not sure that's acceptable, even if we are dealing with a "loop".

@ReToCode

Yes, this solution only fixes the HTTP case. But it will work with HTTPS as well once that feature comes along.
