A lightweight service for detecting prompt injection attacks and jailbreak attempts in AI agent interactions. PromptGuard uses the Llama Prompt Guard 2 model to classify user input as "safe" or "unsafe" before processing.
PromptGuard is the first layer of defense in a defense-in-depth security strategy for AI agents. It specifically targets:
- Prompt injection attacks: Attempts to manipulate the AI system through crafted inputs
- Jailbreak attempts: Techniques to bypass safety restrictions
- Malicious user input: Content designed to exploit the AI system
The service provides an OpenAI-compatible API endpoint (/v1/chat/completions) that:
- Extracts the user message from the conversation
- Runs inference using the Llama Prompt Guard 2 model
- Returns a classification: "safe" or "unsafe"
- Provides confidence scores for monitoring
The service is designed to be lightweight (an 86M-parameter model) and can run on CPU, making it suitable for production deployments.
```bash
# Build the PromptGuard service image
make build-promptguard-image

# Push to registry
make push-promptguard-image
```

PromptGuard requires the following environment variables:

```bash
export PROMPTGUARD_ENABLED=true         # Required Makefile variable to enable PromptGuard
export HF_TOKEN=your-huggingface-token  # Required for gated models
```

Optional environment variables:
- `PROMPTGUARD_MODEL_ID`: Model identifier (defaults to `meta-llama/Llama-Prompt-Guard-2-86M`)
Configure PromptGuard in `helm/values.yaml`:

```yaml
promptGuard:
  enabled: true
  replicas: 1
  modelId: "meta-llama/Llama-Prompt-Guard-2-86M"
  huggingfaceToken: ""  # Required for gated models
  resources:
    limits:
      cpu: "2"
      memory: 2Gi
    requests:
      cpu: "1"
      memory: 1Gi
```
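With these values in place, redeploying the chart picks up the PromptGuard service. A minimal sketch, assuming the chart lives in `helm/` and the release is named `self-service-agent` (adjust both for your deployment):

```bash
# Deploy or upgrade with PromptGuard enabled; pass the HF token for gated models
helm upgrade --install self-service-agent ./helm \
  --namespace <namespace> \
  --set promptGuard.enabled=true \
  --set promptGuard.huggingfaceToken="$HF_TOKEN"
```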
PromptGuard is automatically integrated when configured in agent YAML files:

```yaml
# agent-service/config/agents/your-agent.yaml
input_shields:
  - "meta-llama/Llama-Prompt-Guard-2-86M"  # First layer: attack detection
  - "meta-llama/Llama-Guard-3-8B"          # Second layer: content safety
```

The agent service will route shield requests to the PromptGuard service at:
http://self-service-agent-promptguard.<namespace>.svc.cluster.local:8000/v1
- `GET /health`: Health check endpoint
- `GET /v1/models`: List available models (OpenAI-compatible)
- `POST /v1/chat/completions`: Classify user input as safe/unsafe
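To verify a running deployment, the health and model-listing endpoints can be probed from inside the cluster (a sketch; substitute your namespace):

```bash
# Liveness check
curl -s http://self-service-agent-promptguard.<namespace>.svc.cluster.local:8000/health

# List the models the service exposes (OpenAI-compatible)
curl -s http://self-service-agent-promptguard.<namespace>.svc.cluster.local:8000/v1/models
```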