Skip to content

Instantly share code, notes, and snippets.

@srampal
Last active May 21, 2026 22:07
Show Gist options
  • Select an option

  • Save srampal/8f577e8d999da25b33ac0a821bdfc9be to your computer and use it in GitHub Desktop.

Select an option

Save srampal/8f577e8d999da25b33ac0a821bdfc9be to your computer and use it in GitHub Desktop.

Red Hat Summit 2026 AI Gateways + MaaS

Demo script for session at Red Hat Summit 2026

Slide deck is here

Demo scripts and commands

Prerequisites

  1. MaaS is deployed. Payload processing is deployed

  2. Apply MaaS CRs from local files.

cd <your-path>/models-as-a-service

kubectl create namespace llm --dry-run=client -o yaml | kubectl apply -f -
kustomize build docs/samples/maas-system | kubectl apply -f -
  1. Grab the MaaS host.
MHOST="https://maas.$(kubectl get ingresses.config.openshift.io cluster \
 -o jsonpath='{.spec.domain}')"

Manual Test - Internal models

List all accessible models

  • Expected: 200 with the free model
curl -sSk ${MHOST}/maas-api/v1/models \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(oc whoami -t)" | jq -r .

Mint API key

  • Expected: sk-oai-xxxxxxxxxx
API_KEY=$(curl -sSk \
  -H "Authorization: Bearer $(oc whoami -t)" \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{
    "name": "my-permanent-key",
    "description": "Production API key for my application",
    "subscription": "simulator-subscription"
  }' \
  "${MHOST}/maas-api/v1/api-keys" | jq -r .key)
echo $API_KEY

Look up subscription

  • Expected: 200 with the free subscription
curl -X POST "${MHOST}/maas-api/internal/v1/subscriptions/select" \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "username": "<your-username>",
    "groups": ["system:authenticated"],
    "requestedSubscription": "simulator-subscription"
  }'

Test auth

  • Case 1: free model, no auth provided - expected 401
  • Case 2: free model, free API key provided - expected 200
  • Case 3: premium model, free API key provided - expected 403
curl -sSk -o /dev/null -w "%{http_code}\n" \
  "${MHOST}/llm/facebook-opt-125m-simulated/v1/chat/completions" \
  -H "Content-Type: application/json" -d '{
    "model":"facebook/opt-125m",
    "messages":[{"role":"user","content":"Hi"}],
    "max_tokens":5}'
curl -sSk -o /dev/null -w "%{http_code}\n" \
  "${MHOST}/llm/facebook-opt-125m-simulated/v1/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" -d '{
    "model":"facebook/opt-125m",
    "messages":[{"role":"user","content":"Hi"}],
    "max_tokens":5}'
curl -sSk -o /dev/null -w "%{http_code}\n" \
  "${MHOST}/llm/premium-simulated-simulated-premium/v1/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" -d '{
    "model":"facebook/opt-125m",
    "messages":[{"role":"user","content":"Hi"}],
    "max_tokens":5}'

Test token rate limit

  • Expected: a few 200s followed by 429s
for i in {1..16}; do curl -sSk -o /dev/null -w "%{http_code}\n" \
  "${MHOST}/llm/facebook-opt-125m-simulated/v1/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" -d '{
    "model":"facebook/opt-125m",
    "messages":[{"role":"user","content":"Hi"}],
    "max_tokens":50}'; done

Manual Test - External models

Prerequisites

  1. API keys created and available for all external model providers of interest (openai, anthropic, aws bedrock etc)
  2. Payload processing module deployed (handled by RHOAI 3.4 installer)

Create secret with the OpenAI key

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: openai-api-key
  namespace: llm
  labels:
    inference.networking.k8s.io/bbr-managed: "true"
type: Opaque
stringData:
  api-key: "sk-proj-..."
EOF

Create “ExternalModel” and “MaasModelRef” CRs

cat <<EOF | kubectl apply -f -
apiVersion: maas.opendatahub.io/v1alpha1
kind: ExternalModel
metadata:
  name: gpt-4o
  namespace: llm
spec:
  provider: openai
  targetModel: gpt-4o
  endpoint: api.openai.com
  credentialRef:
    name: openai-api-key
---
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSModelRef
metadata:
  name: gpt-4o
  namespace: llm
spec:
  modelRef:
    kind: ExternalModel
    name: gpt-4o
EOF

Confirm the external model is ready

kubectl get maasmodelref -n llm

The output should show the 3 models in ready state .. somewhat like the following

% kubectl get maasmodelref -n llm

*NAME                                  PHASE   ENDPOINT                                                                                      HTTPROUTE                                          GATEWAY                AGE*

*facebook-opt-125m-simulated           Ready   https://maas.apps.ocp.zw8s9.sandbox2989.opentlc.com/llm/facebook-opt-125m-simulated           facebook-opt-125m-simulated-kserve-route           maas-default-gateway   2d22h*

*gpt-4o                                Ready   https://maas.apps.ocp.zw8s9.sandbox2989.opentlc.com/llm/gpt-4o                                gpt-4o                                             maas-default-gateway   17h*

*premium-simulated-simulated-premium   Ready   https://maas.apps.ocp.zw8s9.sandbox2989.opentlc.com/llm/premium-simulated-simulated-premium   premium-simulated-simulated-premium-kserve-route   maas-default-gateway   2d22h*

Configure Access via MaaSAuthPolicy & Token rate limits via MaaSSubscription

kubectl apply -f - <<'EOF'
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSAuthPolicy
metadata:
  name: gpt-4o-access
  namespace: models-as-a-service
spec:
  modelRefs:
  - name: gpt-4o
    namespace: llm
  subjects:
    groups:
    - name: "system:authenticated"
---
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSSubscription
metadata:
  name: gpt-4o-subscription
  namespace: models-as-a-service
spec:
  owner:
    groups:
    - name: "system:authenticated"
  modelRefs:
  - name: gpt-4o
    namespace: llm
    tokenRateLimits:
    - limit: 100000
      window: "1h"
EOF

Mint the MaaS API key

GW_HOST=$(kubectl get gateway maas-default-gateway -n openshift-ingress \
-o jsonpath='{.spec.listeners[0].hostname}')
TOKEN=$(oc whoami -t)

AKEY2=$(curl -sS -X POST "https://${GW_HOST}/maas-api/v1/api-keys" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name":"external-model-key","subscription":"gpt-4o-subscription"}' | jq -r '.key')

echo "MaaS API key: ${AKEY2:0:20}..."

Note in the above example the maas api key was derived for a specific gpt-40 subscription but in general would be derived for a subscription that includes many models and offerings.

Access the OpenAI API first directly, then via MaaS

curl -sk "https://api.openai.com/v1/chat/completions" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello, give me an interesting fact about LLMs"}],"max_tokens":100}'

# Through MaaS gateway

curl -sS "https://${GW_HOST}/llm/gpt-4o/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AKEY2" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello, give me an interesting fact about LLMs"}],"max_tokens":100}'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment