A distributed system that processes multimodal inputs (images and PDFs) using asynchronous task queuing, ML classification, and vector search.
- FastAPI — REST API server
- Redis — Async task queue
- Scikit-learn — Document classification
- Pinecone — Vector search database
- Matplotlib — Real-time dashboard
- PyMuPDF + Pillow — PDF & Image processing
- Upload PDF or image documents via the API
- Automatic ML-based categorization (technology, medical, legal, finance)
- Vector embeddings stored in Pinecone for similarity search
- Live performance dashboard with real-time graphs
- Fully async pipeline — uploads are queued in Redis and handled by a separate worker, so the API never blocks on processing
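The queue handoff behind the async pipeline can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `enqueue_document` and `next_job`, the queue name `documents`, and the job fields are all hypothetical. It assumes `client` behaves like a redis-py client (for example `redis.Redis()`), using only its `lpush` and `brpop` commands.

```python
import json
import uuid


def enqueue_document(client, path, queue="documents"):
    """Push a job onto a Redis list and return its id without waiting for processing.

    `queue` and the job fields are assumptions made for this sketch.
    """
    job_id = uuid.uuid4().hex
    client.lpush(queue, json.dumps({"id": job_id, "path": path}))
    return job_id


def next_job(client, queue="documents", timeout=5):
    """Wait up to `timeout` seconds for a job; return it as a dict, or None."""
    item = client.brpop(queue, timeout=timeout)
    if item is None:
        return None  # nothing arrived before the timeout
    _key, raw = item  # brpop returns a (list name, value) pair
    return json.loads(raw)
```

In this arrangement the API side would call `enqueue_document` and respond immediately, while the worker process loops on `next_job`. Pairing `lpush` with `brpop` gives first-in, first-out ordering.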
pip install -r requirements.txt
PINECONE_API_KEY=your_key_here
PINECONE_INDEX=nexus-index
brew services start redis
uvicorn main:app --reload
python worker.py
python dashboard.py
- GET / — Health check
- POST /upload — Upload a document
- GET /results — View all processed results
- GET /status — Check queue status
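The read-only endpoints above can be exercised from Python with the standard library alone. This sketch assumes the server is running at uvicorn's default address, `http://127.0.0.1:8000`, and that the endpoints return JSON; the response shapes are not documented here, so none are shown. `build_get` and `fetch_json` are hypothetical helper names. `POST /upload` is left out because the expected form field name is not specified.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def build_get(path, base_url=BASE_URL):
    """Build a GET request for one of the endpoints, e.g. "/status"."""
    return Request(base_url.rstrip("/") + path, method="GET")


def fetch_json(path, base_url=BASE_URL, timeout=5):
    """Send the request and decode the body, assuming the endpoint returns JSON."""
    with urlopen(build_get(path, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server started, `fetch_json("/")` would hit the health check, `fetch_json("/status")` the queue status, and `fetch_json("/results")` the processed results.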
- main.py — FastAPI server
- worker.py — Document processor
- classifier.py — ML classifier
- vector_store.py — Pinecone integration
- dashboard.py — Matplotlib dashboard