1. Introduction
As AI models grow more complex and require frequent updates, the infrastructure supporting them must evolve to keep pace. Storing model files in cloud object storage can introduce delays during deployment, particularly when large models must be fetched over the network. KServe's ModelMesh Serving addresses these challenges with a scalable model serving framework that dynamically loads and unloads models as needed. Backing it with Kubernetes Persistent Volumes reduces model loading latency further, because models are read from storage mounted directly into the serving pods rather than pulled from remote storage.
2. Step-by-Step Guide
This section provides a detailed guide on configuring KServe ModelMesh Serving with Kubernetes Persistent Volumes. Follow these steps to set up an efficient AI model serving environment.
2.1. Prerequisites
Before starting, ensure you have the following:
- A Kubernetes cluster with admin privileges (or Minikube) with at least 4 CPUs and 8 GB of memory
- kubectl and kustomize (v4.0.0+) installed
- A "Quickstart" installation of ModelMesh Serving
2.2. Create a Persistent Volume Claim (PVC)
To begin, create a Persistent Volume Claim (PVC) in your Kubernetes cluster to allocate storage that ModelMesh can use for model files.
Apply this configuration using kubectl:
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "my-models-pvc"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
EOF
Verify that the PVC is created and bound to a persistent volume:
kubectl get pvc
# Output example:
# NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
# my-models-pvc   Bound    pvc-783726ab-9fd3-47f3-8c7d-bf7822d6d7f8   15Gi       RWX            retain-file-gold   2m
Note that the bound capacity can exceed the 1Gi request when the claim binds to a larger pre-provisioned volume.
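If the PVC stays in Pending state, the cluster may not have a storage class that supports the ReadWriteMany access mode. These two commands help diagnose binding problems:
kubectl get storageclass
kubectl describe pvc my-models-pvc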
2.3. Create a Pod to Access the PVC
Next, create a pod that mounts the PVC as a volume; you will use it to copy model files onto the persistent storage.
Apply this configuration:
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: Pod
metadata:
  name: "pvc-access"
spec:
  containers:
    - name: main
      image: ubuntu
      command: ["/bin/sh", "-ec", "sleep 10000"]
      volumeMounts:
        - name: "my-pvc"
          mountPath: "/mnt/models"
  volumes:
    - name: "my-pvc"
      persistentVolumeClaim:
        claimName: "my-models-pvc"
EOF
Confirm that the pod is running:
kubectl get pods | grep -E "pvc|STATUS"
# Output example:
# NAME         READY   STATUS    RESTARTS   AGE
# pvc-access   1/1     Running   0          2m12s
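In a script, you can wait for the pod to become ready instead of polling by hand:
# Block until the pod reports the Ready condition, or fail after two minutes
kubectl wait --for=condition=Ready pod/pvc-access --timeout=120s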
2.4. Store the Model on the Persistent Volume
Download the MNIST model:
curl -sOL https://github.com/kserve/modelmesh-minio-examples/raw/main/sklearn/mnist-svm.joblib
Copy the model to the pod:
kubectl cp mnist-svm.joblib pvc-access:/mnt/models/
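If your model is packaged as a directory rather than a single file (as some model formats are), kubectl cp copies it recursively in the same way; the directory name here is hypothetical:
# my-model-dir is a placeholder for a directory-based model
kubectl cp ./my-model-dir pvc-access:/mnt/models/my-model-dir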
Verify the model upload:
kubectl exec -it pvc-access -- ls -alr /mnt/models/
# Expected output:
# total 356
# -rw-r--r-- 1 501 staff 344917 Sep 17 09:20 mnist-svm.joblib
# drwxr-xr-x 3 nobody 4294967294 4096 Sep 17 09:20 ..
# drwxr-xr-x 2 nobody 4294967294 4096 Sep 17 09:20 .
2.5. Configure ModelMesh Serving
Create a ConfigMap named model-serving-config that sets allowAnyPVC: true; this allows ModelMesh Serving to mount any PVC in the namespace when loading models.
Apply the configuration:
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-serving-config
data:
  config.yaml: |
    allowAnyPVC: true
EOF
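The ModelMesh controller is designed to pick up changes to this ConfigMap dynamically, so no restart should be needed. To confirm the setting was stored, read the value back (the backslash escapes the dot inside the key name):
kubectl get configmap model-serving-config -o jsonpath='{.data.config\.yaml}'
# Expected output: allowAnyPVC: true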
2.6. Deploy Inference Service
Deploy an InferenceService for the MNIST model, pointing its storage at the PVC and at the model file path within it.
Apply the service:
kubectl apply -f - <<EOF
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-mnist
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storage:
        parameters:
          type: pvc
          name: my-models-pvc
        path: mnist-svm.joblib
EOF
Check the service status:
kubectl get isvc
# Output example:
# NAME            URL                                               READY   PREV   LATEST   AGE
# sklearn-mnist   grpc://modelmesh-serving.modelmesh-serving:8033   True                    35s
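If READY does not become True, describe the InferenceService to see model loading events and any storage errors:
kubectl describe isvc sklearn-mnist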
2.7. Run an Inference Request
Set up port-forwarding to the modelmesh-serving service (run this in the namespace where ModelMesh Serving is installed):
kubectl port-forward --address 0.0.0.0 service/modelmesh-serving 8008 &
Send an inference request:
MODEL_NAME="sklearn-mnist"
curl -s -X POST -H "Content-Type: application/json" "http://localhost:8008/v2/models/${MODEL_NAME}/infer" -d '{"inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "data": [0.0, 0.0, 3.0, 10.0, 15.0, 16.0, 2.0, 0.0, 0.0, 2.0, 14.0, 16.0, 11.0, 15.0, 7.0, 0.0, 0.0, 7.0, 16.0, 3.0, 5.0, 15.0, 4.0, 0.0, 0.0, 4.0, 14.0, 10.0, 12.0, 13.0, 0.0, 0.0, 0.0, 0.0, 3.0, 13.0, 15.0, 12.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.0, 15.0, 15.0, 5.0, 0.0, 0.0, 0.0, 0.0, 15.0, 15.0, 15.0, 6.0, 0.0, 0.0, 0.0, 0.0, 10.0, 12.0, 11.0, 0.0, 0.0]}]}'
The response should look similar to the following; the data field holds the digit the model predicted for the 8x8 input image:
{
  "model_name": "sklearn-mnist__isvc-2d5cba6382",
  "outputs": [
    { "name": "predict", "datatype": "INT64", "shape": [1], "data": [7] }
  ]
}
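When you are done, you can stop the port-forward and remove the helper pod; the PVC and the model stored on it remain intact:
# Stop the background port-forward (adjust the job number if you have other background jobs)
kill %1
# Delete the pod used to copy the model; the PVC keeps its contents
kubectl delete pod pvc-access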
3. Conclusion
By configuring KServe ModelMesh Serving to utilize Kubernetes Persistent Volumes, you can significantly improve the efficiency and performance of your AI model deployments. This approach not only reduces the latency associated with fetching models from remote storage but also enhances the overall scalability of your AI infrastructure.