KServe supports a wide variety of model formats across both predictive and generative AI inference.
spaCy models are not natively supported by KServe (as of 2025). However, you can serve spaCy models easily using KServe's Python model server: the kserve SDK provides a way to deploy custom Python models (such as spaCy NLP pipelines, Hugging Face models, or anything else written in Python).
TensorFlow formats:
- SavedModel (TensorFlow format)
- .pb (frozen graph, less common now)
Example storage layout:
model/
└── 1/
    ├── saved_model.pb
    └── variables/
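To produce this layout, export the trained model with the SavedModel API. A minimal sketch, assuming a toy Keras model (exact export APIs vary slightly across TF/Keras versions):

# A minimal sketch of producing the SavedModel layout above.
# The model architecture is illustrative.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.export("model/1")  # writes saved_model.pb and variables/ under model/1/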
TorchServe (PyTorch) formats:
- .mar (model archive), created with torch-model-archiver
Example:
model-store/
└── mymodel.mar
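The .mar itself is produced by the torch-model-archiver CLI from a serialized model file plus a handler. A rough sketch of the workflow, with illustrative file names and handler choice:

# A rough sketch: serialize the weights, then package them with torch-model-archiver.
# The model definition, file names, and handler are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
torch.save(model.state_dict(), "model.pt")

# Then, in a shell (torch-model-archiver ships alongside TorchServe):
#   torch-model-archiver --model-name mymodel --version 1.0 \
#       --model-file model.py --serialized-file model.pt \
#       --handler base_handler --export-path model-store/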
ONNX formats:
- .onnx
Works for models exported from PyTorch, TensorFlow, scikit-learn, etc.
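For example, exporting a small PyTorch model to ONNX might look like this (the model architecture, file name, and opset version are illustrative):

# A minimal sketch of exporting a PyTorch model to ONNX.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=17)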
Scikit-learn formats:
- Pickled model file (.pkl, .joblib)
Model serialization using:
import joblib
joblib.dump(model, "model.joblib")
XGBoost formats:
- Binary model file (.bst)
- JSON (.json, from XGBoost >= 1.0)
Example:
model/
└── model.bst
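A minimal sketch of producing either file with the XGBoost API (the training data and parameters are illustrative):

# Save an XGBoost model in both supported formats.
import os
import numpy as np
import xgboost as xgb

os.makedirs("model", exist_ok=True)
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)
booster = xgb.train(
    {"objective": "binary:logistic"},
    xgb.DMatrix(X, label=y),
    num_boost_round=10,
)
booster.save_model("model/model.bst")   # legacy binary format
booster.save_model("model/model.json")  # JSON format (XGBoost >= 1.0)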
LightGBM formats:
- Text or binary (.txt, .bin)
Example:
model/
└── model.txt
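A minimal sketch of producing the text file with the LightGBM API (the training data and parameters are illustrative):

# Save a LightGBM model as a text file.
import os
import numpy as np
import lightgbm as lgb

os.makedirs("model", exist_ok=True)
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)
booster = lgb.train(
    {"objective": "binary", "verbose": -1},
    lgb.Dataset(X, label=y),
    num_boost_round=10,
)
booster.save_model("model/model.txt")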
If you need a different format, you can:
- Implement a custom KServe inference service (Python, gRPC, or HTTP).
- Package your model however you like; just expose the predict() or /infer API.
- Add custom Python inference logic with the kserve.Model base class.
Here's a minimal example with external dependencies:
from kserve import Model, ModelServer
import spacy


class SpacyModel(Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.name = name
        self.nlp = None

    def load(self):
        # KServe mounts the contents of storageUri at /mnt/models
        self.nlp = spacy.load("/mnt/models")
        self.ready = True

    def predict(self, request, headers=None):
        texts = [instance["text"] for instance in request["instances"]]
        # doc.ents for NER; use doc.cats, doc.vector, etc. for other pipelines
        predictions = [
            {"entities": [{"text": ent.text, "label": ent.label_}
                          for ent in self.nlp(text).ents]}
            for text in texts
        ]
        return {"predictions": predictions}


if __name__ == "__main__":
    model = SpacyModel("spacy-ner")
    model.load()
    ModelServer().start(models=[model])
Example InferenceService YAML:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: spacy-ner
spec:
  predictor:
    model:
      modelFormat:
        name: python
      runtime: kserve-python
      storageUri: "s3://mybucket/models/spacy-ner"
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: 1
          memory: 2Gi
Your storageUri can point to:
- MinIO / S3 bucket
- PVC
- GCS / Azure Blob, etc.
Directory Structure (as seen by KServe):
s3://mybucket/models/spacy-ner/
├── model.py
├── requirements.txt
├── meta.json
├── tokenizer/
├── vocab/
├── ner/
└── ...
Example requirements.txt:
spacy==3.7.2
numpy>=1.24.0
Then KServe will automatically install these dependencies into the runtime container before your model loads.
Let's assume:
- Your MinIO bucket is called models.
- Your spaCy model folder is spacy-ner.
- You want to serve it as s3://models/spacy-ner.
At runtime:
- KServe mounts everything from s3://models/spacy-ner to /mnt/models
- The Python runtime automatically imports and executes model.py
- Your SpacyModel class is instantiated and loaded
- The /v1/models/spacy-ner:predict endpoint becomes active
Once deployed and the service is ready (kubectl get inferenceservices), send a request like:
curl -v \
-X POST \
-H "Content-Type: application/json" \
-d '{"instances": [{"text": "Apple is looking at buying U.K. startup for $1 billion"}]}' \
http://<your-ingress-host>/v1/models/spacy-ner:predict
Response:
{
  "predictions": [
    {
      "entities": [
        {"text": "Apple", "label": "ORG"},
        {"text": "U.K.", "label": "GPE"},
        {"text": "$1 billion", "label": "MONEY"}
      ]
    }
  ]
}
If you want to test locally (without Kubernetes), you can simply run:
python model.py
curl -X POST -H "Content-Type: application/json" \
-d '{"instances": [{"text": "Apple is looking at buying U.K. startup"}]}' \
http://localhost:8080/v1/models/spacy-ner:predict
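If you prefer Python over curl, the same request can be sent with the requests library (host and port assume the local server started above):

# A sketch of calling the local prediction endpoint from Python.
import requests

payload = {"instances": [{"text": "Apple is looking at buying U.K. startup"}]}
resp = requests.post(
    "http://localhost:8080/v1/models/spacy-ner:predict",
    json=payload,
    timeout=10,
)
print(resp.json())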
If you need:
- Custom dependencies (e.g., private packages)
- Offline environments (no PyPI access)
- Large ML frameworks (like torch, transformers, etc.)
You can build your own image extending KServe's official Python runtime.
Example Dockerfile:
# Match your KServe version
FROM kserve/pythonserver:v0.11.2
# Switch to root to install packages
USER root
# Install dependencies
RUN pip install --no-cache-dir \
spacy==3.7.2 \
numpy==1.26.4 \
pandas==2.2.2
# (Optional) Preload spaCy model into the image
RUN python -m spacy download en_core_web_md
# Switch back to non-root user for security
USER 1000
Push it to your registry (e.g., DockerHub, ECR, GCR):
docker build -t myrepo/spacy-ner:latest .
docker push myrepo/spacy-ner:latest
Then reference it in your YAML:
spec:
  predictor:
    containers:
      - image: myrepo/spacy-ner:latest
        name: kserve-container
        args: ["--model_name", "spacy-ner"]
        env:
          - name: STORAGE_URI
            value: "s3://models/spacy-ner"
Sometimes you don't want to store model.py or its dependencies in S3 at all; instead, you want everything baked into the container.
Dockerfile
FROM kserve/pythonserver:v0.11.2
USER root
RUN pip install --no-cache-dir spacy==3.7.2
RUN python -m spacy download en_core_web_md
# Copy your inference script into the container
COPY model.py /app/model.py
USER 1000
ENTRYPOINT ["python", "/app/model.py"]
InferenceService YAML
spec:
  predictor:
    containers:
      - image: myrepo/spacy-ner:latest
        name: kserve-container
        env:
          - name: STORAGE_URI
            value: "s3://models/spacy-ner"  # or empty if not needed
For generative AI workloads, KServe supports:
- Hugging Face Models: Both saved models and models directly from Hugging Face Hub using model IDs
- Large Language Models (LLMs): Through specialized runtimes with OpenAI-compatible APIs
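Because the LLM runtimes expose OpenAI-compatible endpoints, a deployed model can be called with any OpenAI client. A minimal sketch, assuming a hypothetical ingress host, endpoint path, and model name:

# Calling a KServe LLM runtime through its OpenAI-compatible API.
# The base_url, api_key, and model name below are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://<your-ingress-host>/openai/v1",  # hypothetical endpoint path
    api_key="not-needed",  # many self-hosted deployments ignore the key
)

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # hypothetical deployed model name
    messages=[{"role": "user", "content": "Summarize what KServe does."}],
)
print(response.choices[0].message.content)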