DevOps GenAI Platform is an AI infrastructure engineering lab for platform teams that want more than a chatbot.
It combines:
- metadata-aware RAG retrieval
- multi-provider inference routing
- Kubernetes-native deployment patterns
- CI/CD failure diagnosis
- structured fix suggestion generation
- guarded PR automation through a Python executor
- token, latency, and cost observability
This repo is designed to demonstrate how an AI system can behave like a platform capability: ingesting operational evidence, reasoning over the right context, and assisting with safe remediation workflows.
Most GenAI demos answer questions.
This platform does that, and it also:
- ingests incident logs and knowledge-base docs
- retrieves only context relevant to a specific repo/workflow/run
- generates CI/CD-focused diagnosis with evidence
- proposes machine-readable fixes with verification steps, patch text, and workflow actions
- routes inference across OpenAI, Ollama, and Mock providers
- measures cost, traffic, failures, and latency
- can open a PR from a suggested fix while preserving human review and no-merge safety
It is a practical blueprint for AI-assisted DevOps and platform operations.
```
Logs / Runbooks / KB Docs
            ↓
        Ingest API
            ↓
  Vector Store + Metadata
            ↓
        RAG Service
            ↓
     Inference Router
            ↓
  OpenAI / Ollama / Mock
            ↓
Diagnosis / Fix Suggestions / PR Workflow
            ↓
Prometheus / Grafana / Cost Tracking
```
- `rag-service`
  - ingest docs and logs
  - retrieve by metadata filters
  - answer questions with citations
  - generate structured fix suggestions
  - expose usage and cost data
- `inference-router`
  - route by `model_hint`
  - retry transient failures
  - fall back to alternate providers
  - emit operational metrics
- Python executor
  - discover failed GitHub runs
  - fetch and clean failed logs
  - ingest incident data
  - inspect target repo state
  - call `/suggest-fix`
  - validate and re-evaluate with runtime evidence
  - create a reviewable PR when gates pass
- Kubernetes + dashboards
  - deployable services with persistent vector storage
  - health checks, autoscaling, config separation
  - dashboards for platform visibility
- document and log ingestion with different chunking strategies
- retrieval filters on `repo`, `pipeline`, `environment`, `status`, `workflow`, `service_name`, `run_id`
- source browsing and inspection APIs
- hybrid context support for incident logs plus KB docs
- specialized CI/CD prompt mode in `/ask`
- clear separation of symptom, root cause, and next checks
- concise operator-friendly output
- support for common delivery and runtime failure patterns
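To make the metadata-scoped retrieval idea concrete, here is a minimal sketch of filtering chunks by the fields listed above. The `filter_chunks` helper and the chunk shape are hypothetical illustrations, not the actual rag-service code.

```python
# Hypothetical sketch of metadata-scoped retrieval filtering.
# Chunks carry the same metadata fields the service filters on.

def filter_chunks(chunks, **filters):
    """Keep only chunks whose metadata matches every given filter."""
    return [
        c for c in chunks
        if all(c.get("metadata", {}).get(k) == v for k, v in filters.items())
    ]

chunks = [
    {"text": "npm ERR! missing artifact",
     "metadata": {"repo": "payments-api", "status": "failed"}},
    {"text": "deploy ok",
     "metadata": {"repo": "billing-api", "status": "passed"}},
]

# Scope retrieval to one repo's failed runs only.
scoped = filter_chunks(chunks, repo="payments-api", status="failed")
```

Scoping like this is what keeps a question about one failing run from pulling in context from unrelated repos or healthy pipelines.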
`/suggest-fix` produces typed remediation objects containing:

- `diagnosis`
- `fix_type`
- `target_file`
- `target_changes`
- `why_this_fix`
- `evidence_used`
- `assumptions`
- `verification_steps`
- `alternatives_considered`
- `patch_text`
- `workflow`
- `safe_to_auto_apply`
- `confidence`
- `requires_review`
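For illustration, a remediation object with those fields might look like the following sketch. Only the field names come from the schema above; every value here is invented.

```python
# Hypothetical example of a /suggest-fix remediation object.
# Field names match the schema listed above; values are illustrative only.
fix = {
    "diagnosis": "Deploy step fails because the build artifact path does not exist.",
    "fix_type": "workflow_change",
    "target_file": ".github/workflows/deploy.yml",
    "target_changes": ["upload artifact in build job before deploy job downloads it"],
    "why_this_fix": "Logs show the artifact is missing before the deploy step runs.",
    "evidence_used": ["failed run logs", "workflow definition"],
    "assumptions": ["build job is expected to produce the artifact"],
    "verification_steps": ["re-run the workflow", "confirm artifact appears in run summary"],
    "alternatives_considered": ["pin the upload-artifact action version"],
    "patch_text": "--- a/.github/workflows/deploy.yml\n+++ b/.github/workflows/deploy.yml\n",
    "workflow": "deploy.yml",
    "safe_to_auto_apply": False,
    "confidence": 0.72,
    "requires_review": True,
}
```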
- evidence-first prompting
- anti-hallucination and anti-assumption rules
- PR-only policy
- no automatic merge
- checkout and repo-inspection requirements for high-confidence automation
- server-side confidence gating based on runtime evidence
- OpenAI for cloud inference
- Ollama for local/alternate model serving
- Mock provider for fallback/testing
- retry/backoff and fallback logic in the router
- GitHub failed-run discovery
- failed-log download and cleanup
- auto-ingestion into RAG
- repo auto-clone or repo reuse
- validation command derivation from suggested fix steps
- patch normalization and remediation fallback
- transient artifact cleanup before commit
- guarded PR creation
- structured JSON logs
- inference request/failure metrics
- token accounting
- request cost estimation
- Grafana dashboards for traffic, cost, latency, and failures
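The token accounting and cost estimation above can be sketched as follows. The per-1M-token prices in this example are assumptions for illustration; real pricing varies by model and over time.

```python
# Sketch of request cost estimation from token counts.
# Prices per 1M tokens are assumed example rates, not real pricing.
PRICES_PER_1M = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model, input_tokens, output_tokens):
    p = PRICES_PER_1M.get(model)
    if p is None:
        return 0.0  # unknown model: report zero rather than guess
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = estimate_cost("gpt-4o-mini", input_tokens=2_000, output_tokens=500)
```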
- `POST /ingest`, `POST /ingest-log`
- embeddings with `text-embedding-3-small`
- Chroma-backed vector retrieval
- metadata-aware filtering
- source and ingested-chunk inspection endpoints
- `POST /ask`
  - general question-answer mode
  - CI/CD-specific analysis mode
- `POST /suggest-fix`
  - hybrid retrieval from logs + KB
  - structured fix schema and parsing
  - runtime context injection from executor
  - post-generation safety and confidence enforcement
- `POST /v1/generate`
  - provider abstraction
  - retries and timeouts
  - fallback support
  - request tracing support
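The retry-and-fallback behavior can be sketched like this. The provider callables and error handling are simplified assumptions, not the router's actual code (real providers would be HTTP calls with timeouts).

```python
import time

# Simplified sketch of the router's retry + fallback behavior.

def generate_with_fallback(providers, prompt, retries=2, backoff_s=0.0):
    """Try each provider in order; retry transient failures before falling back."""
    last_err = None
    for provider in providers:
        for attempt in range(retries + 1):
            try:
                return provider(prompt)
            except Exception as err:  # treat any error as transient here
                last_err = err
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"all providers failed: {last_err}")

def flaky_openai(prompt):
    raise TimeoutError("upstream timeout")

def mock_provider(prompt):
    return {"provider": "mock", "text": f"echo: {prompt}"}

# Primary keeps timing out, so the request lands on the mock fallback.
result = generate_with_fallback([flaky_openai, mock_provider], "hello")
```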
- pure Python executor in tools/suggest_fix_executor.py
- failed GitHub run lookup
- log cleanup and ingestion loop
- repo checkout/reset/clean flow
- patch application with normalization
- PR creation with strict guardrails
- Prometheus metrics
- Grafana dashboards
- golden evaluation harness for answer and fix flows
devops-genai/
├── README.md
├── ROADMAP.md
├── requests.http
├── dashboard/
├── deploy/k8s/
├── docs/
├── eval/
├── images/
├── services/
│ ├── inference-router/
│ └── rag-service/
└── tools/
├── analyze_github_failure.sh
└── suggest_fix_executor.py
- `services/rag-service/` – ingestion, retrieval, prompting, `/ask`, `/suggest-fix`
- `services/inference-router/` – routing, retries, fallback, metrics
- `tools/` – shell and Python automation flows
- `deploy/k8s/` – Kubernetes manifests
- `dashboard/` – Grafana JSON and demo screenshots
- `eval/` – regression harness
- `docs/` – architecture notes and design rationale
- Accept raw text via `/ingest` or `/ingest-log`.
- Chunk according to content type.
- Generate embeddings.
- Store vectors and metadata.
- Make context searchable by incident scope.
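The content-type-aware chunking step above might look roughly like this sketch; the strategies and size limits are illustrative assumptions, not the service's actual parameters.

```python
# Illustrative content-type-aware chunking (sizes and strategies assumed).

def chunk_text(text, content_type="docs", max_chars=200):
    if content_type == "logs":
        # Logs: pack whole lines into chunks, preserving order.
        chunks, current = [], ""
        for line in text.splitlines():
            if current and len(current) + len(line) + 1 > max_chars:
                chunks.append(current)
                current = ""
            current = f"{current}\n{line}" if current else line
        if current:
            chunks.append(current)
        return chunks
    # Docs: split on paragraph boundaries.
    return [p.strip() for p in text.split("\n\n") if p.strip()]

log_chunks = chunk_text("step 1 ok\nstep 2 failed\nexit code 1", "logs", max_chars=20)
```

Keeping log lines intact matters because stack traces and error codes lose meaning when split mid-line.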
- Embed the question.
- Retrieve top relevant chunks.
- Assemble context with citations.
- Send prompt through inference-router.
- Return answer, retrieved chunks, usage, and cost.
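The context-assembly-with-citations step can be sketched as follows; the citation format is an assumption, not the service's actual prompt template.

```python
# Illustrative context assembly with numbered citations.

def build_context(chunks):
    parts = []
    for i, c in enumerate(chunks, start=1):
        src = c.get("metadata", {}).get("source", "unknown")
        parts.append(f"[{i}] (source: {src})\n{c['text']}")
    return "\n\n".join(parts)

context = build_context([
    {"text": "artifact not found", "metadata": {"source": "github-actions"}},
    {"text": "deploy requires build artifact", "metadata": {"source": "kb-playbook"}},
])
```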
- Retrieve incident logs and optional KB chunks.
- Build structured remediation prompt.
- Generate typed fix suggestions.
- Parse diagnosis and fix objects.
- Apply runtime-aware confidence policy.
- Discover the latest failed GitHub run.
- Download and clean failed logs.
- Ingest logs into the platform.
- Ensure clean repo checkout.
- Call `/suggest-fix`.
- Run safe validation commands from suggested steps.
- Re-evaluate with runtime evidence.
- Create a PR when policy gates allow it.
- `GET /healthz`
- `POST /chat`
- `POST /ingest`
- `POST /ingest-log`
- `POST /ask`
- `POST /suggest-fix`
- `GET /sources`
- `GET /ingested`
- `GET /costs`
- `DELETE /reset?confirm=true`
- `DELETE /delete_source?source=...`
- `GET /healthz`
- `GET /metrics`
- `POST /v1/generate`
| Provider | Status | Notes |
|---|---|---|
| OpenAI | ✅ Supported | Default cloud inference path |
| Ollama | ✅ Supported | Local or alternate inference path |
| Mock | ✅ Supported | Lightweight fallback/testing path |
- `gpt*` → OpenAI
- `llama*`, `phi*`, `mistral*`, `qwen*`, `gemma*` → Ollama
- `mock*` → Mock
- otherwise → default provider
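The prefix rules above can be expressed as a small routing function. This is an illustrative sketch mirroring the listed rules; the actual router implementation may differ.

```python
# Sketch of model_hint prefix routing, mirroring the rules above.
OLLAMA_PREFIXES = ("llama", "phi", "mistral", "qwen", "gemma")

def route(model_hint, default="openai"):
    hint = (model_hint or "").lower()
    if hint.startswith("gpt"):
        return "openai"
    if hint.startswith(OLLAMA_PREFIXES):
        return "ollama"
    if hint.startswith("mock"):
        return "mock"
    return default  # no prefix match: fall through to the default provider

provider = route("llama3.2:1b")
```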
```bash
python -m venv .venv
source .venv/Scripts/activate
```

Create `.env` in the repo root:

```
OPENAI_API_KEY=your_key_here
OPENAI_MODEL=gpt-4o-mini
OPENAI_MODEL_DEFAULT=gpt-4o-mini
OPENAI_EMBED_MODEL=text-embedding-3-small
CHROMA_DIR=./chroma_db
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_DEFAULT_MODEL=llama3.2:1b
ROUTER_DEFAULT_PROVIDER=openai
ROUTER_ENABLE_FALLBACK=true
ROUTER_FALLBACK_PROVIDER=mock
INFERENCE_ROUTER_URL=http://localhost:8001
INFERENCE_TIMEOUT_S=30
```

Install dependencies:

```bash
pip install -r services/inference-router/requirements.txt
pip install -r services/rag-service/requirements.txt
```

Terminal A:

```bash
cd services/inference-router
uvicorn app.main:app --host 0.0.0.0 --port 8001 --reload
```

Terminal B:

```bash
cd services/rag-service
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

Use `requests.http` for quick endpoint testing.
```bash
curl -X POST http://localhost:8000/ingest-log \
  -H "Content-Type: application/json" \
  -d '{
    "source": "github-actions",
    "text": "<paste your failed pipeline logs here>",
    "repo": "payments-api",
    "pipeline": "deploy",
    "environment": "dev",
    "status": "failed",
    "workflow": "deploy.yml",
    "service_name": "payments-api"
  }'
```

```bash
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Why did this deployment fail and what should I check first?",
    "top_k": 5,
    "source": "github-actions",
    "repo": "payments-api",
    "pipeline": "deploy",
    "environment": "dev",
    "status": "failed",
    "analysis_mode": "cicd",
    "model_hint": "gpt-4o-mini"
  }'
```

```bash
curl -X POST http://localhost:8000/suggest-fix \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Based on this failure, what are the actionable fixes?",
    "top_k": 5,
    "min_relevance": 2.0,
    "content_type": "logs",
    "source": "github-actions",
    "repo": "payments-api",
    "pipeline": "deploy",
    "environment": "dev",
    "status": "failed",
    "use_kb": true,
    "kb_source": "kb-playbook"
  }'
```

The executor connects diagnosis with safe action.
It can:
- discover the latest failed GitHub Actions run
- fetch and clean failed logs
- ingest logs into the RAG system
- auto-clone or reuse a target repo checkout
- inspect repo state and send runtime evidence to the agent
- render rich remediation suggestions
- normalize malformed diffs
- fall back to safe remediation commands when needed
- remove temporary artifacts before commit
- open PRs without any merge automation
```bash
python tools/suggest_fix_executor.py \
  --agent-url http://localhost:8000 \
  --question "CI failed in deploy workflow with missing artifact" \
  --repo-path . \
  --source github-actions \
  --content-type logs \
  --use-kb
```

```bash
python tools/suggest_fix_executor.py \
  --agent-url http://localhost:18000 \
  --github-repo GauJosh/cicd-demo \
  --github-workflow failing-ci \
  --ingest-logs \
  --use-kb
```
--use-kbpython tools/suggest_fix_executor.py \
--agent-url http://localhost:18000 \
--github-repo GauJosh/cicd-demo \
--github-workflow failing-ci \
--ingest-logs \
--use-kbPR creation is automatic when safe-to-apply gates pass. Use --no-create-pr to run analysis-only mode.
- PR only
- no auto-merge
- requires strong runtime evidence
- requires high-confidence suggestion output
- requires meaningful repo changes
- cleans transient files before commit
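Taken together, the gates above amount to a server-side policy check before any PR is opened. A minimal sketch, assuming hypothetical field names and a hypothetical confidence threshold (neither is the real policy):

```python
# Minimal sketch of the safe-to-apply gating policy.
# Threshold and field names are assumptions for illustration.

def pr_allowed(suggestion, runtime_evidence_ok, repo_has_changes,
               min_confidence=0.8):
    """Gate PR creation only; merging is never automated."""
    return (
        runtime_evidence_ok                           # strong runtime evidence
        and suggestion.get("confidence", 0.0) >= min_confidence
        and suggestion.get("requires_review", True)   # human review stays on
        and repo_has_changes                          # meaningful repo changes
    )

allowed = pr_allowed(
    {"confidence": 0.9, "requires_review": True},
    runtime_evidence_ok=True,
    repo_has_changes=True,
)
```

The point of gating server-side is that a confident-sounding model output alone is never enough; the runtime evidence check has to agree.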
Build images:

```bash
docker build -t rag-service:local services/rag-service
docker build -t inference-router:local services/inference-router
```

Run example:

```bash
docker run --rm -p 8001:8000 --env-file .env inference-router:local
docker run --rm -p 8000:8000 --env-file .env -e INFERENCE_ROUTER_URL=http://host.docker.internal:8001 rag-service:local
```

Apply manifests:
```bash
kubectl apply -f deploy/k8s/namespace.yaml
kubectl apply -f deploy/k8s/configmap.yaml
kubectl apply -f deploy/k8s/secret.yaml
kubectl apply -f deploy/k8s/pvc.yaml
kubectl apply -f deploy/k8s/router-deployment.yaml
kubectl apply -f deploy/k8s/router-service.yaml
kubectl apply -f deploy/k8s/rag-deployment.yaml
kubectl apply -f deploy/k8s/rag-service.yaml
kubectl apply -f deploy/k8s/hpa.yaml
```

- namespace: `devops-genai`
- persistent vector storage via PVC-backed Chroma
- HPA for both services
- set a valid `OPENAI_API_KEY` in `deploy/k8s/secret.yaml`
- `inference_requests_total{provider,purpose,model}`
- `inference_failures_total{provider,purpose,failure_stage}`
- `inference_latency_seconds{provider,purpose,model}`
- `inference_input_tokens_total{provider,purpose,model}`
- `inference_output_tokens_total{provider,purpose,model}`
- dashboard/grafana-dashboard-v1.json
- dashboard/grafana-dashboard-v2.json
- dashboard/grafana-dashboard-v3.json
This project supports a log-to-remediation loop that combines:
- log-first retrieval
- metadata-scoped context selection
- evidence-backed diagnosis
- structured fixes
- guarded automation
Run golden tests:

```bash
python eval/run_eval.py
```

Optional custom file:

```bash
python eval/run_eval.py eval/golden.json
```

Common issues:

- `OPENAI_API_KEY not set` → provide a key in `.env` or `deploy/k8s/secret.yaml`
- `/ask` returns `Insufficient context` → ingest docs/logs first
- weak CI/CD answers → ensure incident metadata matches the failing run
- router 502s → inspect inference-router logs for retry/fallback failures
- no autoscaling → verify `metrics-server` exists in the cluster
- noisy PRs → executor now removes transient artifacts before commit
- malformed patch text → executor normalizes diff hunks and can fall back to safe remediation commands
The current system already demonstrates a strong local and Kubernetes-native AI operations platform.
The next evolution is straightforward:
- expose the agent behind a secure DNS endpoint
- trigger executor from a separate workflow/repo
- move vector persistence to `pgvector`
- support cross-repo operational automation
- strengthen tenant isolation and enterprise access patterns
That turns this from a powerful demo into a compelling foundation for AI-assisted platform operations at scale.






