KServe Models Format

KServe supports a wide variety of model formats across both predictive and generative AI inference.

Predictive AI Model Formats

spaCy models are not natively supported by KServe (as of 2025). However, you can serve spaCy models easily using KServe's Python model server: the kserve SDK lets you deploy custom Python models (such as spaCy NLP pipelines, Hugging Face models, or anything else written in Python). A complete spaCy example is shown in section 7 below.

🧠 1. TensorFlow: TensorFlow SavedModel format (kserve-tensorflow-serving)

Formats:

  • SavedModel (TensorFlow format)
  • .pb (frozen graph, less common now)

Example storage layout:

model/
  └── 1/
      ├── saved_model.pb
      └── variables/
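
For reference, a SavedModel in this layout can be produced directly from Python. A minimal sketch, using a tiny hypothetical tf.Module as a stand-in for your trained model:

import tensorflow as tf

# Tiny stand-in module; replace with your trained tf.keras.Model or tf.Module.
class Doubler(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec(shape=[None, 4], dtype=tf.float32)])
    def __call__(self, x):
        return x * 2.0

# Export into a versioned directory ("1/"), matching the layout above.
tf.saved_model.save(Doubler(), "model/1")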

🔥 2. TorchServe: PyTorch ScriptModule and TorchScript models

Formats:

  • .mar (model archive), created with the torch-model-archiver CLI (see the export sketch below)

Example:

model-store/
  └── mymodel.mar
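
The .mar archive wraps a serialized model. A minimal sketch of exporting a TorchScript file first, using a tiny stand-in module as an assumption:

import torch

# Stand-in module; replace with your trained torch.nn.Module.
model = torch.nn.Linear(4, 2)

# Convert to TorchScript and save; this .pt file is what
# torch-model-archiver packages into mymodel.mar.
scripted = torch.jit.script(model)
scripted.save("mymodel.pt")

The archive itself is then built with torch-model-archiver (typically with --model-name, --serialized-file, --handler, and --export-path arguments) and placed in the model-store/ directory.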

🧩 3. ONNX Runtime: Open Neural Network Exchange format

Formats:

  • .onnx

Works for models exported from PyTorch, TensorFlow, scikit-learn, etc.
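
For example, a PyTorch module can be exported to .onnx with torch.onnx.export. A minimal sketch, with a hypothetical model and input shape:

import torch

# Hypothetical model and dummy input; replace with your own.
model = torch.nn.Linear(4, 2)
dummy_input = torch.randn(1, 4)

# Trace the model and write an ONNX graph that ONNX Runtime can load.
torch.onnx.export(model, dummy_input, "model.onnx")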

📊 4. SKLearn Server: Pickled models (.pkl, .pickle) and Joblib (.joblib) files

Formats:

  • Pickled model file (.pkl, .joblib)

Serialize the model with:

import joblib

# Assumes `model` is a fitted scikit-learn estimator
joblib.dump(model, "model.joblib")

🌲 5. XGBoost Server: Saved models (.bst, .json, .ubj)

Formats:

  • Binary model file (.bst)
  • JSON (.json from XGBoost >= 1.0)

Example:

model/
  └── model.bst
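
Saving either format from Python is a one-liner. A minimal sketch, with a tiny stand-in training run:

import numpy as np
import xgboost as xgb

# Tiny stand-in dataset and model; replace with your own training code.
dtrain = xgb.DMatrix(np.random.rand(20, 4), label=np.random.randint(0, 2, 20))
booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=2)

# JSON format (XGBoost >= 1.0); use "model.bst" for the legacy binary format.
booster.save_model("model.json")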

🧬 6. LightGBM Server: Saved LightGBM models (.bst)

Formats:

  • Text or binary (.txt, .bin)

Example:

model/
  └── model.txt
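
A trained LightGBM booster is written out with save_model. A minimal sketch, with a tiny stand-in dataset:

import lightgbm as lgb
import numpy as np

# Tiny stand-in dataset and model; replace with your own training code.
train_data = lgb.Dataset(np.random.rand(20, 4), label=np.random.randint(0, 2, 20))
booster = lgb.train({"objective": "binary"}, train_data, num_boost_round=2)

# Writes the model in LightGBM's text format.
booster.save_model("model.txt")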

📦 7. Custom Model Servers

If you need a different format, you can:

  • Implement a custom KServe inference service (Python, gRPC, or HTTP).
  • Package your model however you like; just expose the predict() or /infer API.
  • Add custom Python inference logic with the kserve.Model base class.

Here's a minimal example with external dependencies:

from kserve import Model, ModelServer
import spacy

class SpacyModel(Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.name = name
        self.nlp = None

    def load(self):
        self.nlp = spacy.load("/mnt/models/my_spacy_model")
        self.ready = True

    def predict(self, request):
        # Instances may be plain strings or {"text": "..."} objects,
        # matching the request example below.
        instances = request["instances"]
        texts = [i["text"] if isinstance(i, dict) else i for i in instances]
        predictions = [
            {"entities": [{"text": e.text, "label": e.label_} for e in doc.ents]}
            for doc in self.nlp.pipe(texts)
        ]
        return {"predictions": predictions}

if __name__ == "__main__":
    model = SpacyModel("spacy-ner")
    ModelServer().start(models=[model])

Example InferenceService YAML:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: spacy-ner
spec:
  predictor:
    model:
      modelFormat:
        name: python
      runtime: kserve-python
      storageUri: "s3://mybucket/models/spacy-ner"
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: 1
          memory: 2Gi

Your storageUri can point to any of the following; illustrative URI forms are sketched after the list:

  • MinIO / S3 bucket
  • PVC
  • GCS / Azure Blob, etc.
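
Illustrative forms these URIs commonly take (the bucket and PVC names here are hypothetical):

storageUri: "s3://mybucket/models/spacy-ner"    # S3 / MinIO
storageUri: "pvc://my-model-pvc/spacy-ner"      # PersistentVolumeClaim
storageUri: "gs://my-bucket/models/spacy-ner"   # Google Cloud Storage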

Directory Structure (as seen by KServe):

s3://mybucket/models/spacy-ner/

spacy-ner/
├── model.py
├── requirements.txt
├── meta.json
├── tokenizer/
├── vocab/
├── ner/
└── ...

Example requirements.txt:

spacy==3.7.2
numpy>=1.24.0

Then KServe will automatically install these dependencies into the runtime container before your model loads.

Let's assume:

  • Your MinIO bucket is called models.
  • Your spaCy model folder is spacy-ner.
  • You want to serve this as s3://models/spacy-ner.

At runtime:

  • KServe mounts everything from s3://models/spacy-ner to /mnt/models
  • The Python runtime automatically imports and executes model.py
  • Your SpacyModel class is instantiated and loaded
  • The /v1/models/spacy-ner:predict endpoint becomes active

Once deployed and the service is ready (kubectl get inferenceservices), send a request like:

curl -v \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"instances": [{"text": "Apple is looking at buying U.K. startup for $1 billion"}]}' \
  http://<your-ingress-host>/v1/models/spacy-ner:predict

Response:

{
  "predictions": [
    {
      "entities": [
        {"text": "Apple", "label": "ORG"},
        {"text": "U.K.", "label": "GPE"},
        {"text": "$1 billion", "label": "MONEY"}
      ]
    }
  ]
}

Alternative (for dev/testing)

If you want to test locally (without Kubernetes), you can simply run:

python model.py
curl -X POST -H "Content-Type: application/json" \
  -d '{"instances": [{"text": "Apple is looking at buying U.K. startup"}]}' \
  http://localhost:8080/v1/models/spacy-ner:predict  

🧱 8. Custom Docker Image (Simple Python Model with Dependencies)

If you need:

  • Custom dependencies (e.g., private packages)
  • Offline environments (no PyPI access)
  • Large ML frameworks (like torch, transformers, etc.)

You can build your own image extending KServe's official Python runtime.

Example Dockerfile:

# Match the tag to your KServe version
FROM kserve/pythonserver:v0.11.2

# Switch to root to install packages
USER root

# Install dependencies
RUN pip install --no-cache-dir \
    spacy==3.7.2 \
    numpy==1.26.4 \
    pandas==2.2.2

# (Optional) Preload spaCy model into the image
RUN python -m spacy download en_core_web_md

# Switch back to non-root user for security
USER 1000

Push it to your registry (e.g., DockerHub, ECR, GCR):

docker build -t myrepo/spacy-ner:latest .
docker push myrepo/spacy-ner:latest

Then reference it in your YAML:

spec:
  predictor:
    containers:
      - image: myrepo/spacy-ner:latest
        name: kserve-container
        args: ["--model_name", "spacy-ner"]
        env:
          - name: STORAGE_URI
            value: "s3://models/spacy-ner"

🧱 9. Custom Docker Image (Custom Inference Logic, model.py Inside the Image)

Sometimes you don't want to store model.py or dependencies in S3 at all; you want everything baked into the container.

Dockerfile

FROM kserve/pythonserver:v0.11.2

USER root

RUN pip install --no-cache-dir spacy==3.7.2
RUN python -m spacy download en_core_web_md

# Copy your inference script into the container
COPY model.py /app/model.py

USER 1000

ENTRYPOINT ["python", "/app/model.py"]

InferenceService YAML

spec:
  predictor:
    containers:
      - image: myrepo/spacy-ner:latest
        name: kserve-container
        env:
          - name: STORAGE_URI
            value: "s3://models/spacy-ner"   # or empty if not needed  

Generative AI Model Formats

For generative AI workloads, KServe supports:

  • Hugging Face Models: Both saved models and models directly from Hugging Face Hub using model IDs
  • Large Language Models (LLMs): Through specialized runtimes with OpenAI-compatible APIs (see the request sketch below)
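
As an illustration of what "OpenAI-compatible" means in practice, a deployed LLM runtime can typically be queried like this sketch; the host, route, and model name are assumptions, and the exact path depends on the runtime and KServe version:

import json
import urllib.request

# Hypothetical ingress host, route, and model name; adjust to your deployment.
url = "http://<your-ingress-host>/openai/v1/completions"
payload = {"model": "my-llm", "prompt": "Hello, KServe!", "max_tokens": 32}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))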
