This document details the key changes and improvements introduced in WVA v0.5.0, released as part of PR #549.
Problem Solved: Previously, the saturation engine could repeatedly trigger scale-up for the same variant before previous scale-up operations completed, leading to over-provisioning.
Solution: WVA now tracks pending replicas (pods that exist but are not yet ready) per variant and blocks scale-up for variants with pending replicas.
Technical Details:
- Added
PendingReplicasfield toVariantReplicaStateinterface PendingReplicas = CurrentReplicas - ReadyReplicas- Scale-up variant selection skips variants with
PendingReplicas > 0 - Per-variant tracking allows other variants to scale up if eligible
Timeline Impact:
Before (without protection):
T+0s: Saturation detected → Scale variant-1: 2 → 3
T+30s: Pod not ready yet, saturation persists → Scale: 3 → 4
T+60s: Pods still starting → Scale: 4 → 5
T+90s: All ready, but over-provisioned by 2-3 replicas
After (with protection):
T+0s: Saturation detected → Scale variant-1: 2 → 3 (PendingReplicas=1)
T+30s: variant-1 skipped (has pending), variant-2 scaled if cheaper
T+90s: variant-1 ready (PendingReplicas=0), eligible again
Benefits:
- ✅ Prevents excessive scale-up during model loading (2-7 minutes)
- ✅ Reduces infrastructure costs
- ✅ Maintains cost-optimized scaling across variants
Documentation:
Enhancement: WVA now supports Prometheus configuration through environment variables, providing more flexible deployment options.
Configuration Methods:
-
Environment Variables (New, Recommended):
env: - name: PROMETHEUS_BASE_URL value: "https://prometheus-k8s.monitoring.svc:9091" - name: PROMETHEUS_TLS_INSECURE_SKIP_VERIFY value: "false" - name: PROMETHEUS_CA_CERT_PATH value: "/etc/prometheus-certs/ca.crt"
-
ConfigMap (Existing, Fallback):
data: PROMETHEUS_BASE_URL: "https://prometheus-k8s.monitoring.svc:9091" PROMETHEUS_TLS_INSECURE_SKIP_VERIFY: "false"
Configuration Priority:
- Environment variables (checked first)
- ConfigMap values (fallback)
- Error if neither provides
PROMETHEUS_BASE_URL
Available Environment Variables:
PROMETHEUS_BASE_URL- Prometheus server URL (required)PROMETHEUS_TLS_INSECURE_SKIP_VERIFY- Skip TLS verification (dev/test only)PROMETHEUS_CA_CERT_PATH- CA certificate pathPROMETHEUS_CLIENT_CERT_PATH- Client certificate for mTLSPROMETHEUS_CLIENT_KEY_PATH- Client private key for mTLSPROMETHEUS_SERVER_NAME- Expected server name in TLS certificatePROMETHEUS_BEARER_TOKEN- Bearer token authentication
Benefits:
- ✅ Easier configuration in containerized environments
- ✅ Better secret management (use Kubernetes Secrets via env)
- ✅ Simpler Helm chart customization
- ✅ Backward compatible (ConfigMap still supported)
Documentation:
Security Enhancement: Added automatic parameter escaping and validation to prevent PromQL injection attacks.
Protection Mechanisms:
-
Parameter Escaping:
- All query parameters automatically escaped before use in PromQL
- Backslashes:
\→\\ - Double quotes:
"→\"
-
Namespace Validation:
- Namespace values validated before PromQL construction
- Prevents malicious label matchers
Example Attack Prevention:
// Malicious input attempt
namespace := `prod",malicious="attack`
// WVA automatically escapes
escapedNamespace := EscapePromQLValue(namespace)
// Result: `prod\",malicious=\"attack`
// Safe query
query := fmt.Sprintf(`vllm_kv_cache_usage{namespace="%s"}`, escapedNamespace)
// Prometheus treats as literal string, injection blockedWhy This Matters:
- Prevents unauthorized access to metrics from other namespaces
- Blocks label injection attacks
- Ensures multi-tenant isolation
Implementation:
internal/collector/v2/query_template.go:EscapePromQLValue()functioninternal/collector/v2/prometheus_source.go: Automatic escaping inexecuteQuery()
Documentation:
getVariantKey() Function:
- Added namespace-safe variant identification
- Format:
namespace/variantName - Prevents collisions when multiple namespaces have deployments with same name
Location: internal/engines/saturation/engine.go
Prometheus Query Results:
- Added safe type assertions for Prometheus query results
- Prevents runtime panics from unexpected metric types
- Better error handling and logging
Enhancements:
- Tuned load generation parameters for consistent ~2-3 replica scale-up
- Added per-model token configuration for sustained saturation testing
- Improved test stability and reliability
Impact:
- More predictable E2E test outcomes
- Better validation of saturation-based scaling
- Reduced flaky test failures
Impact: VariantAutoscaling resources created before v0.5.0 that do not have scaleTargetRef must be updated before upgrading.
What Changed:
- The
scaleTargetReffield is now required in the VariantAutoscaling CRD (enforced via kubebuilder validation). - The controller skips processing VAs without
scaleTargetRef.name(seeinternal/utils/variant.go).
Impact on Scale-to-Zero:
- Critical: v0.4.1 VAs without
scaleTargetRefwill not scale to zero properly, even if:- HPAScaleToZero feature gate is enabled
- HPA has
minReplicas: 0configured - Scale-to-zero is enabled in WVA configuration
- This occurs because the HPA cannot properly reference the target deployment without
scaleTargetRef.
Migration Required:
-
Before upgrading to v0.5.0, update all existing VariantAutoscaling resources:
apiVersion: llmd.ai/v1alpha1 kind: VariantAutoscaling metadata: name: <your-va-name> namespace: <your-namespace> spec: scaleTargetRef: kind: Deployment name: <your-deployment-name> # Required: must match your deployment modelID: <your-model-id>
-
After CRD update, VAs without
scaleTargetRefwill fail validation and cannot be created or updated. -
Verify your VAs have
scaleTargetRefbefore upgrading:kubectl get va -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.scaleTargetRef.name}{"\n"}{end}' | grep -v "^\s*$"
Why This Change:
- Provides explicit, unambiguous target deployment reference (follows HPA pattern)
- Enables proper scale-to-zero functionality with HPA
- Prevents controller from skipping VAs due to missing target reference
-
Pending Replicas Tracking:
- Automatically enabled, no configuration required
- May observe different scaling behavior during pod startup periods
- This is expected and prevents over-provisioning
-
Prometheus Configuration:
- Existing ConfigMap configuration continues to work
- Consider migrating to environment variables for better secret management
- See Prometheus Integration docs
-
Security:
- PromQL injection prevention is automatic
- No action required, but validates multi-tenant deployment security
All changes include comprehensive unit tests:
internal/saturation/analyzer_test.go- Pending replica handlinginternal/config/prometheus_test.go- Environment variable configinternal/collector/v2/query_template.go- Query escaping
E2E tests updated:
test/e2e-saturation-based/- Enhanced load generationtest/e2e-openshift/- Cross-platform validation
- Andrew Anderson (@clubanderson)
- PR #549: llm-d#549
- Commit: 14e2bd88 - fix: pending-aware scaling and E2E test stability improvements
For detailed implementation, see: