An intelligent, agentic customer service system powered by LangChain v1.0+ and LangGraph.
Current Status: Phase 6 Complete ✅ - MVP PRODUCTION READY
This is a complete, portfolio-ready multi-agent customer service system featuring:
- Multi-Provider LLMs: AWS Bedrock (Nova Lite) for routing + OpenAI (GPT-4o-mini) for generation
- Real-Time Streaming: Server-Sent Events (SSE) with user toggle
- Advanced RAG/CAG: Pure RAG, Pure CAG, and Hybrid strategies
- 4 Specialized Agents: Technical Support, Billing, Compliance, and General Information
- Production Quality: 145 tests passing (91% coverage)
A supervisor agent intelligently routes queries to specialized workers while maintaining conversation memory across routing.
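The supervisor's decision can be pictured as a simple classifier over domains. The sketch below is an illustrative keyword-based stand-in for the LLM-backed routing described in this README; the keyword lists and the `"direct"` fallback label are assumptions, not the project's actual prompt or code:

```python
# Illustrative stand-in for the LLM-backed supervisor: pick one of the four
# worker domains for a query, or handle it directly. The keyword lists are
# assumptions for demonstration only.
DOMAIN_KEYWORDS = {
    "technical": ["error", "bug", "crash", "install", "slow"],
    "billing": ["invoice", "charge", "payment", "pricing", "subscription"],
    "compliance": ["gdpr", "ccpa", "privacy", "terms", "retention"],
    "general": ["company", "mission", "features", "getting started"],
}

def route(query: str) -> str:
    """Return the worker domain for a query, or 'direct' for small talk."""
    q = query.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(k in q for k in keywords):
            return domain
    return "direct"  # the supervisor answers simple queries itself

print(route("Getting Error 500 when logging in"))  # technical
print(route("What are your pricing plans?"))       # billing
print(route("Hello! How are you?"))                # direct
```

In the real system this decision is made by the Nova Lite supervisor model; the point of the sketch is only the shape of the routing contract (query in, worker name or direct-answer out).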
```bash
# 1. Clone the repository
git clone <repository-url>
cd Agentic_Customer_Project1

# 2. Set up the backend (FastAPI + LangChain)
cd backend
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
uvicorn main:app --reload

# 3. In a new terminal, set up the frontend (Next.js + TypeScript)
cd frontend
pnpm install
cp .env.example .env.local
# Edit .env.local if needed (default: http://localhost:8000)
pnpm dev
```

Access the application:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Supervisor Agent: AWS Bedrock Nova Lite ($0.06/1M tokens) for cost-effective routing
- Worker Agents: OpenAI GPT-4o-mini ($0.15/1M tokens) for high-quality responses
- Automatic Fallback: Gracefully falls back to OpenAI if AWS unavailable
- 11% Cost Savings: Optimized model selection for each task
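The savings figure is consistent with a blended-price calculation across the two providers. In the sketch below, the 18% routing-token share is an assumed workload split chosen to illustrate how a figure of roughly 11% falls out, not a measured number:

```python
# Blended cost of the two-provider setup vs. sending everything to
# GPT-4o-mini. The 18% routing / 82% generation token split is an
# assumption for illustration.
NOVA_LITE = 0.06    # $ per 1M tokens (supervisor routing)
GPT_4O_MINI = 0.15  # $ per 1M tokens (worker generation)

routing_share = 0.18
blended = routing_share * NOVA_LITE + (1 - routing_share) * GPT_4O_MINI
savings = 1 - blended / GPT_4O_MINI

print(f"blended: ${blended:.4f}/1M tokens, savings: {savings:.0%}")
```

The general rule: savings scale linearly with the share of tokens handled by the cheaper model (here, savings = routing_share × (1 − 0.06/0.15) = routing_share × 60%).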
- Server-Sent Events (SSE): Token-by-token streaming for immediate user feedback
- Toggle Mode: Switch between streaming (real-time) and standard (single response)
- Smooth UX: No flicker, graceful error recovery, visual indicators
- Production Ready: Full error handling and session continuity
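Under the hood, SSE is just newline-delimited text frames over a long-lived HTTP response. A minimal, framework-free sketch of the framing (the `[DONE]` end-of-stream sentinel is an assumption, not necessarily what this project's endpoint emits):

```python
# Minimal sketch of token-by-token SSE framing. A real server (e.g. FastAPI's
# StreamingResponse with media_type="text/event-stream") would send these
# frames over HTTP; here we only build them.
def sse_frames(tokens):
    """Wrap each token in an SSE 'data:' frame."""
    for token in tokens:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"  # assumed end-of-stream sentinel

frames = list(sse_frames(["Hello", " world"]))
print("".join(frames))
```

The browser-side `EventSource` (or a manual `fetch` reader, as needed for POST bodies) splits the stream on blank lines and hands each `data:` payload to the UI, which is what enables the token-by-token display.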
- Pure RAG (Technical & General): Dynamic document retrieval from ChromaDB
- Hybrid RAG/CAG (Billing): First query retrieves, subsequent queries use cache
- Pure CAG (Compliance): Pre-loaded context for instant, consistent responses
- 8 Document Repository: 2 documents per domain (technical, billing, compliance, general)
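The three strategies above map one-to-one onto the agents. As a sketch, the choice can be expressed as a lookup table (illustrative only; the real system wires these strategies into the LangGraph workers):

```python
# Knowledge strategy per worker agent, mirroring the list above.
STRATEGY = {
    "technical": "pure_rag",
    "general": "pure_rag",
    "billing": "hybrid_rag_cag",
    "compliance": "pure_cag",
}

def needs_vector_store(agent: str) -> bool:
    """Pure CAG never touches ChromaDB; RAG and hybrid strategies do."""
    return STRATEGY[agent] != "pure_cag"

print(sorted(a for a in STRATEGY if needs_vector_store(a)))
```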
Technical Support (Pure RAG)
- Errors, bugs, crashes, and software malfunctions
- Installation, configuration, and setup issues
- Performance problems and diagnostics
- Step-by-step troubleshooting from knowledge base
Billing Support (Hybrid RAG/CAG)
- Payment methods and processing
- Invoice inquiries and unexpected charges
- Subscription management (upgrade, downgrade, cancel)
- Cached pricing information after first query
Compliance (Pure CAG)
- Terms of Service and policy questions
- Privacy policy and data collection practices
- GDPR, CCPA, and data protection regulations
- Instant responses from pre-loaded documents
General Information (Pure RAG)
- Company background and mission
- Service offerings and features
- Getting started guides and onboarding
- Dynamic retrieval from general knowledge base
- Domain-specific query analysis and routing
- Conversation context maintained across routing
- Session persistence across page refreshes
- Clear conversation to start fresh
- Detailed logging (`ROUTING` and `DIRECT` indicators)
- Backend: FastAPI with `/chat` and `/chat/stream` endpoints
- Frontend: Next.js 16 with TypeScript and Tailwind CSS
- Real-time Updates: Token-by-token streaming display
- User Controls: Streaming toggle, clear conversation, error handling
- Type Safety: Full TypeScript + Pydantic validation
- 145 Automated Tests: 129 unit + 16 integration tests
- 91% Code Coverage: All worker agents thoroughly tested
- Comprehensive Docs: Setup guides, architecture, API docs
- Error Handling: Graceful fallbacks and user-friendly messages
- LangSmith Support: Full tracing and debugging
- AWS Setup Guide: Complete 409-line setup documentation
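The `{message, session_id}` contract is validated with Pydantic on the backend. Below is a stdlib stand-in that sketches the same checks with a dataclass; the field names come from the payload shown elsewhere in this README, but the validation rules themselves are assumptions for illustration:

```python
from dataclasses import dataclass

# Stdlib stand-in for the backend's Pydantic request model; the validated
# shape is the same {message, session_id} payload the frontend sends.
@dataclass
class ChatRequest:
    message: str
    session_id: str

    def __post_init__(self):
        if not self.message.strip():
            raise ValueError("message must be non-empty")
        if not self.session_id:
            raise ValueError("session_id is required")

req = ChatRequest(message="Getting Error 500", session_id="abc-123")
print(req.session_id)
```

In the actual stack, Pydantic performs this validation automatically at the FastAPI endpoint boundary and returns a 422 response for invalid payloads.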
- Start the application (see Quick Start above)
- Open http://localhost:3000
- Test streaming: Enable streaming toggle (lightning bolt icon)
- Test technical query: "Getting Error 500 when logging in"
- Watch response stream token-by-token
- Check logs for the `ROUTING` indicator (routed to Technical Support)
- Test billing query: "What are your pricing plans?"
- First query retrieves from vector store (RAG)
- Second query uses cached policies (CAG)
- Test compliance query: "What's your data retention policy?"
- Instant response from pre-loaded compliance docs
- Test memory: Follow up with "Can you explain more?"
- Context maintained across routing
- Overview
- Architecture
- Prerequisites
- Monorepo Structure
- Setup Instructions
- Development Workflow
- Testing
- Troubleshooting
- Documentation
- Contributing
- License
This project implements a production-ready, intelligent customer service AI system powered by LangChain v1.0+, AWS Bedrock, and OpenAI.
MVP Complete - All 6 Phases Finished:
A sophisticated multi-agent system featuring:
- Multi-Provider LLMs: AWS Bedrock Nova Lite for routing, OpenAI GPT-4o-mini for generation
- Real-Time Streaming: Server-Sent Events (SSE) with user toggle between streaming/standard modes
- Advanced RAG/CAG: Pure RAG, Pure CAG, and Hybrid strategies for optimal knowledge retrieval
- 4 Specialized Agents: Technical Support, Billing, Compliance, and General Information
- Stateful Memory: Conversation context maintained across routing with InMemorySaver
- Intelligent Routing: Domain-specific query analysis and agent selection
- Modern Full-Stack: FastAPI backend + Next.js frontend with TypeScript
- Production Quality: 145 tests (91% coverage), comprehensive error handling
Key Technologies:
| Component | Technology | Purpose |
|---|---|---|
| Backend | FastAPI + Python 3.11+ | REST API and agent orchestration |
| AI Framework | LangChain v1.0+ & LangGraph | Multi-agent system and workflows |
| LLM Providers | AWS Bedrock + OpenAI | Multi-provider strategy for cost optimization |
| Vector Store | ChromaDB | Document retrieval and semantic search |
| Frontend | Next.js 16 + TypeScript | Modern, responsive web interface |
| Styling | Tailwind CSS v4 | Beautiful, utility-first design |
| Package Manager | pnpm | Fast, efficient dependency management |
| Testing | pytest + TypeScript | 145 automated tests, 91% coverage |
Phase 6 Complete - Production-Ready Multi-Agent System:
```
┌──────────────────────────────────────────────────────────┐
│             Frontend (Next.js + TypeScript)              │
│  ┌────────────────────────────────────────────────────┐  │
│  │      Chat Interface (with Streaming Toggle)        │  │
│  │  • Real-time SSE streaming or standard responses   │  │
│  │  • Message history with session persistence        │  │
│  │  • User controls (clear, toggle streaming)         │  │
│  └────────────────────────┬───────────────────────────┘  │
└───────────────────────────┼──────────────────────────────┘
                            │  POST /chat or /chat/stream
                            │  {message, session_id}
                            ▼
┌──────────────────────────────────────────────────────────┐
│            Backend (FastAPI + LangChain v1.0+)           │
│  ┌──────────────────┐   ┌──────────────────────────────┐ │
│  │ /chat (standard) │   │ /chat/stream (SSE streaming) │ │
│  └────────┬─────────┘   └──────────────┬───────────────┘ │
│           └──────────────┬─────────────┘                 │
└──────────────────────────┼───────────────────────────────┘
                           ▼
┌──────────────────────────────────────────────────────────┐
│             Supervisor Agent (AWS Nova Lite)             │
│  • Analyzes query domain (technical/billing/etc.)        │
│  • Routes to appropriate worker agent                    │
│  • Fallback to OpenAI GPT-4o-mini if AWS unavailable     │
│  • Memory: InMemorySaver (cross-routing context)         │
└──────┬─────────────┬──────────────┬───────────────┬──────┘
       │             │              │               │
       ▼             ▼              ▼               ▼
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌──────────────┐
│ Technical  │ │  Billing   │ │ Compliance │ │   General    │
│  Support   │ │  Support   │ │            │ │ Information  │
│            │ │            │ │            │ │              │
│  Pure RAG  │ │   Hybrid   │ │  Pure CAG  │ │   Pure RAG   │
│GPT-4o-mini │ │  RAG/CAG   │ │GPT-4o-mini │ │ GPT-4o-mini  │
└──────┬─────┘ └──────┬─────┘ └──────┬─────┘ └──────┬───────┘
       │              │              │              │
       └──────────────┴──────┬───────┴──────────────┘
                             ▼
┌──────────────────────────────────────────────────────────┐
│                RAG/CAG Knowledge System                  │
│  ┌──────────────┐  ┌────────────┐  ┌─────────────────┐   │
│  │   ChromaDB   │  │   Cache    │  │   Pre-loaded    │   │
│  │    Vector    │  │  Session   │  │   Compliance    │   │
│  │    Store     │  │  Billing   │  │   Documents     │   │
│  │ (Technical,  │  │  Policies  │  │   (ToS, PP)     │   │
│  │   General)   │  │            │  │                 │   │
│  └──────────────┘  └────────────┘  └─────────────────┘   │
└──────────────────────────────────────────────────────────┘
```
Key Components:
- Multi-Provider LLMs: AWS Nova Lite ($0.06/1M) for supervisor, OpenAI GPT-4o-mini ($0.15/1M) for workers
- Streaming Support: SSE for real-time responses, standard mode for single-response
- 4 Worker Agents: Technical (Pure RAG), Billing (Hybrid), Compliance (Pure CAG), General (Pure RAG)
- Knowledge Strategies:
- Pure RAG: Dynamic retrieval from ChromaDB
- Hybrid RAG/CAG: First query retrieves, subsequent use cache
- Pure CAG: Pre-loaded static documents
- Session Memory: InMemorySaver maintains context across routing
- Automatic Fallback: Graceful degradation to OpenAI if AWS unavailable
- Type Safety: Full TypeScript + Pydantic validation
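The billing agent's hybrid behaviour (retrieve once per session, then serve from cache) can be sketched as below. The retriever here is a hypothetical stand-in for the ChromaDB-backed retrieval, and the class name is invented for illustration:

```python
# Sketch of the hybrid RAG/CAG pattern used for billing: the first query in
# a session retrieves from the vector store (RAG); later queries in the same
# session reuse the cached documents (CAG).
class HybridBillingKnowledge:
    def __init__(self, retriever):
        self.retriever = retriever  # e.g. a ChromaDB-backed callable
        self.cache = {}             # session_id -> cached documents

    def get_context(self, session_id: str, query: str):
        if session_id not in self.cache:            # first query: retrieve
            self.cache[session_id] = self.retriever(query)
        return self.cache[session_id]               # later queries: cached

calls = []
def fake_retriever(query):
    calls.append(query)                 # count vector-store hits
    return ["pricing_policy.md"]

kb = HybridBillingKnowledge(fake_retriever)
kb.get_context("s1", "What are your plans?")
kb.get_context("s1", "And the enterprise tier?")
print(len(calls))  # 1 — the vector store was hit only once
```

The trade-off this pattern encodes: the first billing answer pays retrieval latency, and every follow-up in the session responds from cache, at the cost of the cache going stale if the underlying documents change mid-session.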
For detailed architecture documentation, see:
- ARCHITECTURE.md - Complete system design and patterns
- FLOWCHARTS.md - Visual process flows and diagrams
- PHASED_DEVELOPMENT_GUIDE.md - Development roadmap
- PHASE5_RAG_CAG_GUIDE.md - RAG/CAG implementation details
- PHASE6_COMPLETION_SUMMARY.md - Final MVP features
Before setting up this project, ensure you have the following installed:
- Python 3.11 or higher (Python 3.13 recommended)

  ```bash
  python3 --version  # Should be 3.11+
  ```

- pip (Python package manager, usually comes with Python)
- virtualenv or venv (for isolated Python environments)
- Node.js v20 or higher

  ```bash
  node --version  # Should be v20+
  ```

- pnpm v9 or higher (recommended package manager)

  ```bash
  # Install pnpm if needed
  npm install -g pnpm
  pnpm --version  # Should be v9+
  ```
- Docker & Docker Compose - For containerized deployment
- Git - For version control (should already be installed)
- Visual Studio Code - Recommended IDE with extensions:
- Python
- ESLint
- Tailwind CSS IntelliSense
- Prettier
You'll need an OpenAI API key to run the agents:
- OpenAI API Key: Get from https://platform.openai.com/api-keys
- (Optional) LangSmith API Key: For debugging and tracing - https://smith.langchain.com/
- (Optional) AWS Credentials: If using AWS Bedrock models
This is a monorepo containing both backend and frontend in a single repository:
```
Agentic_Customer_Project1/
├── backend/                          # Python FastAPI backend
│   ├── agents/                       # Agent modules (Phase 2-3)
│   │   ├── simple_agent.py           # Phase 2: Simple agent (reference)
│   │   ├── supervisor_agent.py       # Phase 3: Supervisor ✅
│   │   └── workers/                  # Phase 3: Specialized workers ✅
│   │       ├── billing_support.py    # Billing worker ✅
│   │       ├── compliance.py         # Compliance worker ✅
│   │       ├── general_info.py       # General info worker ✅
│   │       └── technical_support.py  # Technical worker ✅
│   ├── data/                         # Data and documents
│   │   └── docs/                     # Document repositories (Phase 5+)
│   │       ├── technical/            # Technical documentation
│   │       ├── billing/              # Billing documents
│   │       └── compliance/           # Compliance documents
│   ├── tests/                        # Backend tests (54 tests ✅)
│   │   ├── test_main.py              # API + routing integration tests
│   │   ├── test_agent.py             # Phase 2 agent tests
│   │   ├── test_supervisor.py        # Supervisor unit tests ✅
│   │   └── test_technical_worker.py  # Worker unit tests ✅
│   ├── utils/                        # Utility functions
│   ├── main.py                       # FastAPI app with supervisor routing ✅
│   ├── test_routing_logs.sh          # Routing test script ✅
│   ├── requirements.txt              # Python dependencies
│   ├── .env.example                  # Environment variables template
│   ├── Dockerfile                    # Backend container config
│   └── README.md                     # Backend documentation (Phase 3 ✅)
│
├── frontend/                         # Next.js TypeScript frontend
│   ├── app/                          # Next.js App Router pages
│   ├── components/                   # React components
│   ├── lib/                          # Frontend utilities
│   ├── public/                       # Static assets
│   ├── package.json                  # Frontend dependencies
│   ├── tsconfig.json                 # TypeScript configuration
│   ├── .env.example                  # Environment variables template
│   └── README.md                     # Frontend documentation
│
├── tasks/                            # Project management
│   ├── 0001-prd-project-setup.md               # Phase 1 PRD
│   ├── tasks-0001-prd-project-setup.md         # Phase 1 tasks
│   ├── 0002-prd-simple-agent.md                # Phase 2 PRD
│   ├── tasks-0002-prd-simple-agent.md          # Phase 2 tasks
│   ├── 0003-prd-multi-agent-supervisor.md      # Phase 3 PRD ✅
│   └── tasks-0003-prd-multi-agent-supervisor.md  # Phase 3 tasks ✅
│
├── .github/                          # GitHub workflows and templates
│   └── workflows/                    # CI/CD pipelines
│
├── PHASE3_MULTI_AGENT_DEMO_GUIDE.md  # Phase 3 demo guide ✅
├── docker-compose.yml                # Docker orchestration
├── ARCHITECTURE.md                   # System architecture docs
├── FLOWCHARTS.md                     # Process flow diagrams
├── PHASED_DEVELOPMENT_GUIDE.md       # Development roadmap
├── CONTRIBUTING.md                   # Contribution guidelines
└── README.md                         # This file (Phase 3 ✅)
```
Key Points:
- Backend and Frontend are completely independent and can be developed separately
- Each has its own dependencies, environment variables, and documentation
- They communicate via REST API (backend exposes endpoints, frontend consumes them)
- Both can be run independently or together using Docker Compose
1. Navigate to the backend directory:

   ```bash
   cd backend
   ```

2. Create and activate a virtual environment:

   ```bash
   # Create virtual environment
   python3 -m venv venv

   # Activate it
   # On macOS/Linux:
   source venv/bin/activate
   # On Windows:
   venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install --upgrade pip
   pip install -r requirements.txt
   ```

4. Configure environment variables:

   ```bash
   cp .env.example .env
   # Edit .env and add your API keys
   nano .env  # or use your preferred editor
   ```

   Required variables:

   ```
   OPENAI_API_KEY=your_openai_api_key_here
   ENVIRONMENT=development
   LOG_LEVEL=INFO
   ```

5. Run the backend:

   ```bash
   uvicorn main:app --reload
   # Or simply:
   python main.py
   ```

6. Verify it's working:
   - Open http://localhost:8000/health
   - Open http://localhost:8000/docs (interactive API documentation)

For detailed backend documentation, see backend/README.md
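A small startup sanity check for the required environment variables can look like the sketch below. This is illustrative; the project may validate settings differently, and the `missing_vars` helper is a name invented here:

```python
import os

REQUIRED = ["OPENAI_API_KEY", "ENVIRONMENT", "LOG_LEVEL"]

def missing_vars(env=os.environ, required=REQUIRED):
    """Return the names of required settings that are unset or empty."""
    return [name for name in required if not env.get(name)]

# Demo against a plain dict instead of the real environment:
print(missing_vars({"OPENAI_API_KEY": "sk-...", "LOG_LEVEL": "INFO"}))
# ['ENVIRONMENT']
```

Running such a check at application startup turns a confusing mid-request failure (e.g. an authentication error from OpenAI) into an immediate, readable error message.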
1. Navigate to the frontend directory:

   ```bash
   cd frontend
   ```

2. Install dependencies:

   ```bash
   pnpm install
   # Or if you prefer npm:
   npm install
   ```

3. Configure environment variables:

   ```bash
   cp .env.example .env.local
   # Edit if needed (default backend URL is http://localhost:8000)
   ```

4. Run the development server:

   ```bash
   pnpm dev
   # Or with npm:
   npm run dev
   ```

5. Verify it's working:
   - Open http://localhost:3000
   - You should see the "Customer Service AI" welcome page

For detailed frontend documentation, see frontend/README.md
To run both backend and frontend using Docker:
```bash
# From the project root
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down
```

Note: Docker Compose configuration will be added in a future task.
1. Start the backend (Terminal 1):

   ```bash
   cd backend
   source venv/bin/activate  # Activate venv
   uvicorn main:app --reload
   ```

2. Start the frontend (Terminal 2):

   ```bash
   cd frontend
   pnpm dev
   ```

3. Start coding!
   - Backend changes auto-reload with the `--reload` flag
   - Frontend changes auto-reload with Fast Refresh
Backend (Python):

```bash
cd backend

# Lint and format
ruff check .
ruff format .

# Run tests
pytest
```

Frontend (TypeScript):

```bash
cd frontend

# Lint
pnpm lint

# Type check
pnpm build  # This runs the TypeScript compiler
```

This project follows GitHub Flow with feature branches:
1. Create a feature branch from `main`:

   ```bash
   git checkout main
   git pull origin main
   git checkout -b feat/your-feature-name
   ```

2. Make changes and commit using Conventional Commits:

   ```bash
   git add .
   git commit -m "feat: add user authentication" \
     -m "- Added login endpoint" \
     -m "- Added JWT token generation"
   ```

3. Push and merge:

   ```bash
   git push -u origin feat/your-feature-name
   git checkout main
   git merge --no-ff feat/your-feature-name
   git push origin main
   ```

For detailed contribution guidelines, see CONTRIBUTING.md
To maintain code quality and prevent accidental changes to the main branch, it's recommended to enable branch protection rules on GitHub.
1. Navigate to Repository Settings:
   - Go to your repository on GitHub
   - Click Settings (requires admin access)
   - Click Branches in the left sidebar
   - Click Add branch protection rule

2. Configure the branch name pattern:
   - Set Branch name pattern to: `main`

3. Enable required status checks:
   - ✅ Require status checks to pass before merging
   - ✅ Require branches to be up to date before merging
   - Select required checks:
     - `Backend (Python) - Ruff`
     - `Frontend (TypeScript) - ESLint`
     - `Backend Tests (pytest)` (optional but recommended)
     - `Frontend TypeScript Check` (optional but recommended)

4. Enable pull request requirements (optional but recommended):
   - ✅ Require a pull request before merging
   - ✅ Require approvals: set to 1 or more reviewers
   - ✅ Dismiss stale pull request approvals when new commits are pushed

5. Additional recommended settings:
   - ✅ Require conversation resolution before merging
   - ✅ Do not allow bypassing the above settings (keeps even admins accountable)
   - ✅ Restrict who can push to matching branches (optional for team environments)

6. Click "Create" to save the protection rules
- ❌ Direct pushes to `main` without review (if PR required)
- ❌ Merging code that fails linting checks
- ❌ Merging code that fails tests
- ❌ Merging code with unresolved review comments
- ❌ Accidentally force-pushing to `main`
If you're working alone and find PR requirements too restrictive:
- Enable only the required status checks (linting and tests)
- Skip the "Require pull request" option
- You can still push directly to `main`, but linting/tests must pass
After enabling, try to:
- Push directly to `main` - should be blocked if PR required
- Create a PR with failing tests - should show checks failing
- Fix the issues and push again - Checks should pass and allow merge
Backend Tests (54 passing, 64% coverage):
```bash
cd backend
source venv/bin/activate

# Run all tests (unit only, fast)
pytest

# Run with integration tests (mocked, no tokens used)
pytest --run-integration

# Run with coverage report
pytest --cov=. --cov-report=html

# Run specific test suites
pytest tests/test_main.py -v              # API + routing integration (37 tests)
pytest tests/test_supervisor.py -v        # Supervisor unit tests (15 tests)
pytest tests/test_technical_worker.py -v  # Worker unit tests (19 tests)
pytest tests/test_agent.py -v             # Phase 2 agent tests (10 tests)

# View coverage report
open htmlcov/index.html
```

Test Breakdown:
- Unit Tests (44 tests): Fast, mocked, no API calls
- 15 supervisor tests
- 19 technical worker tests
- 10 Phase 2 agent tests (reference)
- Integration Tests (10 tests): Full endpoint routing tests (mocked supervisor)
- Technical query routing
- General query handling
- Context maintenance across routing
- Error handling scenarios
Frontend Linting & Type Checks:
```bash
cd frontend

# Run ESLint
pnpm lint

# TypeScript type checking
pnpm tsc --noEmit
```

Run All Tests (CI-style):

```bash
# From project root
./scripts/test-all.sh

# Or use Make commands
make test  # Run all tests
make lint  # Run all linters
```

Multi-Agent Routing Testing (Phase 3):
1. Start the application:

   ```bash
   # Terminal 1: Backend
   cd backend && source venv/bin/activate && uvicorn main:app --reload

   # Terminal 2: Frontend
   cd frontend && pnpm dev
   ```

2. Test technical query routing:
   - Open http://localhost:3000
   - Type: "Getting Error 500 when logging in"
   - Expected: Technical troubleshooting response
   - Check logs: should see `ROUTING: Query routed to worker agent`

3. Test general query direct handling:
   - Type: "Hello! How are you?"
   - Expected: Friendly greeting response
   - Check logs: should see `DIRECT: Supervisor handled query directly`

4. Test conversation memory across routing:
   - Type: "I'm having an installation problem"
   - Type: "What did I just say?"
   - Expected: AI remembers the installation problem
   - Verify: Context maintained across routing

5. Test session persistence:
   - Refresh the page (F5)
   - Type: "Do you remember my issue?"
   - Verify: AI still remembers (session persisted)

6. Test clear conversation:
   - Click the "Clear Conversation" button
   - Type: "What was my problem?"
   - Verify: AI doesn't remember (new session)

Test Routing with Script:

```bash
cd backend
chmod +x test_routing_logs.sh
./test_routing_logs.sh
# Watch logs for the ROUTING and DIRECT indicators
```

For comprehensive manual testing scenarios, see MANUAL_TESTING.md
```bash
# Make sure virtual environment is activated
source backend/venv/bin/activate
pip install -r backend/requirements.txt
```

```bash
# Check your .env file has the key set
cat backend/.env | grep OPENAI_API_KEY

# Make sure it's not quoted and has no spaces
OPENAI_API_KEY=sk-...
```

```bash
# Kill process using port 8000
lsof -ti:8000 | xargs kill -9

# Or use a different port
uvicorn main:app --reload --port 8001
```

```bash
# Clear cache and reinstall
cd frontend
rm -rf .next node_modules
pnpm install
```

```bash
# Use a different port
pnpm dev -- -p 3001
```

```bash
# Install Python certificates (macOS)
/Applications/Python\ 3.*/Install\ Certificates.command
```

Backend:
```bash
# Set LOG_LEVEL=DEBUG in .env
LOG_LEVEL=DEBUG

# Or run with debug logging
uvicorn main:app --reload --log-level debug
```

Frontend:

```bash
# Next.js shows detailed errors in development mode by default
pnpm dev
```

Add to backend/.env:

```
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=customer-service-ai
```

View execution traces at: https://smith.langchain.com/
This project includes comprehensive documentation:
| Document | Description |
|---|---|
| README.md | This file - Project overview, quick start, and Phase 2 features |
| MANUAL_TESTING.md | NEW - Step-by-step manual testing guide with 10 test cases |
| backend/README.md | UPDATED - Backend setup, /chat API docs, LangSmith tracing |
| frontend/README.md | Frontend setup, component guide, and styling documentation |
| ARCHITECTURE.md | Complete system architecture, design patterns, and technical decisions |
| FLOWCHARTS.md | Visual process flows, sequence diagrams, and system interactions |
| PHASED_DEVELOPMENT_GUIDE.md | Development roadmap with phases, milestones, and implementation details |
| CONTRIBUTING.md | Contribution guidelines, Git workflow, and coding standards |
| DEVELOPMENT.md | NEW - Developer setup guide and best practices |
| CI_VERIFICATION.md | NEW - Local vs CI test command mapping |
| Makefile | NEW - Convenient make commands for common tasks |
| agentic-customer-specs.md | Original project specifications and requirements |
| tasks/ | PRDs and task lists for feature development |
We follow a structured development process:
- PRDs (Product Requirements Documents): Define what we're building
- Task Lists: Break down PRDs into actionable tasks
- Feature Branches: One branch per sub-task
- Conventional Commits: Clear, semantic commit messages
- Testing: All features must include tests
- Documentation: Update relevant docs with changes
For complete contribution guidelines, see CONTRIBUTING.md
This project is part of the ASU VibeCoding curriculum, demonstrating:
- Modern full-stack development
- AI/ML integration with LangChain v1.0+
- Multi-agent system design
- REST API development
- TypeScript and type safety
- Responsive web design
- DevOps practices (Docker, CI/CD)
This project is part of the ASU VibeCoding curriculum.
- Backend API Docs: http://localhost:8000/docs (when running)
- LangChain Documentation: https://docs.langchain.com/
- LangGraph Guide: https://docs.langchain.com/oss/python/langgraph
- FastAPI Documentation: https://fastapi.tiangolo.com/
- Next.js Documentation: https://nextjs.org/docs
- Tailwind CSS: https://tailwindcss.com/docs
For questions or issues:
- Check this README and relevant documentation
- Review the specific component README (backend or frontend)
- Enable debug logging and LangSmith tracing
- Check GitHub issues for known problems
- Review test files for usage examples
Phase 1-4: Foundation ✅

- ✅ FastAPI backend + Next.js frontend infrastructure
- ✅ Simple agent foundation with LangChain v1.0+
- ✅ Multi-agent supervisor architecture
- ✅ 4 specialized worker agents (Technical, Billing, Compliance, General)

Phase 5: RAG/CAG Integration ✅

- ✅ Pure RAG for Technical & General (ChromaDB vector retrieval)
- ✅ Hybrid RAG/CAG for Billing (first query retrieves, then caches)
- ✅ Pure CAG for Compliance (pre-loaded static documents)
- ✅ 8 sample documents across 4 domains
- ✅ Document indexing pipeline (`index_documents.py`)
Phase 6: Multi-Provider LLMs & Streaming ✅

- ✅ AWS Bedrock Nova Lite for supervisor routing ($0.06/1M tokens)
- ✅ OpenAI GPT-4o-mini for worker generation ($0.15/1M tokens)
- ✅ Real-time SSE streaming with token-by-token display
- ✅ User toggle between streaming/standard modes
- ✅ 11% cost savings vs single-provider strategy
Multi-Provider LLM Strategy
- AWS Nova Lite for routing decisions (60% cheaper)
- OpenAI GPT-4o-mini for response generation
- Automatic fallback mechanism
Real-Time Streaming
- Server-Sent Events (SSE) implementation
- Token-by-token response display
- User-controlled streaming toggle
Advanced Knowledge System
- 3 RAG/CAG strategies optimized per domain
- ChromaDB vector store with 8 documents
- Session-based caching for billing queries
4 Specialized Agents
- Technical Support (Pure RAG)
- Billing Support (Hybrid RAG/CAG)
- Compliance (Pure CAG)
- General Information (Pure RAG)
Production Quality
- 145 automated tests (91% coverage)
- Comprehensive error handling
- Full TypeScript + Pydantic validation
- LangSmith tracing support
- ✅ Phase 1: Project Setup & Infrastructure
- ✅ Phase 2: Simple Agent Foundation (20/20 tasks)
- ✅ Phase 3: Multi-Agent Supervisor (13/13 tasks)
- ✅ Phase 4: Additional Workers (11/11 tasks)
- ✅ Phase 5: RAG/CAG Integration (10/10 tasks)
- ✅ Phase 6: Multi-Provider LLMs & Streaming (3/3 tasks)
- ✅ GitHub repository with complete source code
- ✅ Comprehensive README and setup instructions
- Next: Record a 5-10 minute YouTube demo video
```
User → Frontend (Streaming Toggle) → Backend API (/chat or /chat/stream)
                                          │
                          Supervisor (AWS Nova Lite + fallback)
                                          │
        ┌────────────┬────────────┬───────┴────┬────────────┐
        │ Technical  │  Billing   │ Compliance │  General   │
        │ (Pure RAG) │  (Hybrid)  │ (Pure CAG) │ (Pure RAG) │
        └────────────┴────────────┴───────┬────┴────────────┘
                                          │
                        ChromaDB / Cache / Pre-loaded Docs
```
Version: 1.0.0 (MVP Complete)
Last Updated: December 9, 2025
Status: Phase 6 Complete ✅ - PRODUCTION READY MVP
LangChain Version: 1.0+
All Requirements Met: Backend, Frontend, RAG/CAG, Multi-Provider LLMs, Streaming
Built with ❤️ using Vibe Coding Strategy
ASU VibeCoding Project - Advanced Customer Service AI