RAG-Enterprise-Search

Enterprise knowledge retrieval pipeline across 2TB+ of structured and unstructured data sources, built with LangChain, FAISS, ChromaDB, and AWS SageMaker.



Overview

Production-grade Retrieval-Augmented Generation (RAG) pipeline designed for enterprise-scale knowledge retrieval. This system processes and indexes 2TB+ of structured and unstructured documents, providing context-aware search and AI-powered answers grounded in organizational knowledge.

Built and deployed at Verticiti and Reallytics.ai for clients in healthcare, finance, and enterprise operations.

Architecture

```
┌────────────────────────────────────────────────────────┐
│                   Document Ingestion                   │
│       PDF, DOCX, HTML, CSV, JSON, Databases, APIs      │
└────────────────────────────────────────────────────────┘
                            │
┌────────────────────────────────────────────────────────┐
│                  Processing Pipeline                   │
│  ┌────────┐   ┌─────────┐   ┌─────────┐   ┌──────────┐ │
│  │ Parser │ → │ Chunker │ → │ Cleaner │ → │ Embedder │ │
│  └────────┘   └─────────┘   └─────────┘   └──────────┘ │
└────────────────────────────────────────────────────────┘
                            │
           ┌────────────────┼────────────────┐
           │                │                │
  ┌────────────────┐  ┌───────────┐  ┌──────────────┐
  │     FAISS      │  │ ChromaDB  │  │  PG-Vector   │
  │ (Dense Index)  │  │ (Hybrid)  │  │ (Structured) │
  └────────────────┘  └───────────┘  └──────────────┘
           │                │                │
           └────────────────┼────────────────┘
                            │
┌────────────────────────────────────────────────────────┐
│                 Retrieval Orchestrator                 │
│   - Multi-index fusion                                 │
│   - Re-ranking (Cross-Encoder)                         │
│   - Context window optimization                        │
└────────────────────────────────────────────────────────┘
                            │
┌────────────────────────────────────────────────────────┐
│                  LLM Generation Layer                  │
│   - GPT-4 / Claude / LLaMA                             │
│   - Grounded responses with citations                  │
│   - Hallucination detection                            │
└────────────────────────────────────────────────────────┘
                            │
┌────────────────────────────────────────────────────────┐
│                  FastAPI Serving Layer                 │
│   - REST API endpoints                                 │
│   - Streaming responses                                │
│   - Authentication & rate limiting                     │
└────────────────────────────────────────────────────────┘
```
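Since the production source is private, the pipeline stages can only be illustrated. A minimal sketch of the Chunker stage: fixed-size character chunks with overlap, so text spanning a boundary appears intact in at least one chunk. The sizes here are illustrative, not the production settings.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks (illustrative only)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # how far the window advances each iteration
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # final chunk reached the end of the text
    return chunks
```

A production chunker would typically split on sentence or section boundaries (e.g. a recursive character splitter) rather than raw character offsets.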

Key Features

  • Multi-Source Ingestion: Parse PDF, DOCX, HTML, CSV, JSON, and database content with intelligent chunking strategies
  • Hybrid Search: Combine dense vector search (FAISS) with sparse retrieval for optimal recall
  • Cross-Encoder Re-Ranking: Two-stage retrieval with bi-encoder + cross-encoder for precision
  • Multi-Vector Store: FAISS for speed, ChromaDB for hybrid search, PG-Vector for structured data
  • Context Window Optimization: Intelligent truncation and relevance-prioritized context assembly
  • Grounded Responses: LLM outputs include source citations and confidence scores
  • Hallucination Detection: Post-generation validation against retrieved context
  • Scalable Deployment: AWS SageMaker endpoints with auto-scaling and A/B testing
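The multi-index fusion step can be illustrated with reciprocal rank fusion (RRF), a standard way to merge ranked lists from heterogeneous indexes (e.g. a FAISS dense ranking and a sparse ranking) without requiring comparable scores. This is a sketch of the idea, not the production orchestrator; `k=60` is the constant from the original RRF paper.

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked doc-id lists: each list contributes 1/(k + rank) per doc."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.__getitem__, reverse=True)

# Example: a doc ranked highly by both indexes outranks a doc that
# appears in only one list.
fused = rrf_fuse([["d1", "d2", "d3"], ["d3", "d1", "d4"]])
```

In a two-stage setup, the fused candidate list would then be re-scored by a cross-encoder before context assembly.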

Tech Stack

| Category | Technologies |
| --- | --- |
| Framework | LangChain, Python |
| Vector Stores | FAISS, ChromaDB, PG-Vector, Pinecone |
| Embeddings | OpenAI Ada-002, HuggingFace sentence-transformers |
| LLMs | GPT-4, Claude, LLaMA, Mistral |
| Re-Ranking | Cross-Encoder (ms-marco), Cohere Rerank |
| Cloud | AWS SageMaker, Lambda, S3 |
| Data | PostgreSQL, Redis (caching), Apache Airflow |
| API | FastAPI, WebSockets (streaming) |
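Context window optimization can be sketched as greedy assembly of the highest-ranked chunks under a token budget. The whitespace token count below is a stand-in for the model tokenizer, and the budget is illustrative.

```python
def build_context(ranked_chunks: list[str], max_tokens: int = 3000) -> str:
    """Fill the prompt with top-ranked chunks until the token budget is spent."""
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:  # assumed sorted by relevance, best first
        n = len(chunk.split())   # crude proxy; production uses the tokenizer
        if used + n > max_tokens:
            continue             # skip oversized chunk, smaller ones may still fit
        selected.append(chunk)
        used += n
    return "\n\n".join(selected)
```

Because iteration follows relevance order, a chunk is only dropped when it alone would overflow the remaining budget, which preserves the relevance-prioritized assembly described above.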

Performance

| Metric | Value |
| --- | --- |
| Document corpus size | 2TB+ |
| Indexing throughput | 10K docs/hour |
| Query latency (p95) | < 500ms |
| Retrieval recall@10 | 92% |
| Answer accuracy (RAGAS) | 87% |
| API cost reduction vs hosted models | 40% |
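As a final illustration, the hallucination-detection pass validates a generated answer against the retrieved context. Here is a toy lexical-overlap version; real pipelines use NLI models or RAGAS-style faithfulness scoring, and the 0.6 threshold is an arbitrary placeholder.

```python
import re

def _terms(s: str) -> set[str]:
    """Lowercased alphanumeric tokens, used as a crude content-word set."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def grounding_score(answer: str, context: str) -> float:
    """Fraction of answer terms that also appear in the retrieved context."""
    answer_terms = _terms(answer)
    if not answer_terms:
        return 1.0  # empty answer cannot contradict the context
    return len(answer_terms & _terms(context)) / len(answer_terms)

def is_grounded(answer: str, context: str, threshold: float = 0.6) -> bool:
    """Flag answers whose vocabulary is mostly absent from the context."""
    return grounding_score(answer, context) >= threshold
```

A lexical check like this catches only blatant drift; entailment-based checks are needed for answers that reuse context words while inverting their meaning.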

Source Code: The production source code for this project is maintained in a private repository due to proprietary and client confidentiality requirements. This repository documents the architecture, design decisions, and technical approach. For code-level discussions or collaboration inquiries, feel free to reach out.

Author

Rehan Malik - CTO @ Reallytics.ai

