Production-grade machine learning platform for NBA player performance prediction with comprehensive feature engineering and MLOps capabilities
An end-to-end machine learning platform demonstrating production ML engineering through:
- R² of 0.942 for points prediction using ensemble methods
- P95 latency of 87ms with Redis caching and optimized serving
- 169K+ game records processed in comprehensive ETL pipeline
- 40+ engineered features with temporal and contextual analysis
- Drift detection with KS and Chi-squared tests for model monitoring
- Ensemble Models: XGBoost, LightGBM, and Random Forest combination
- Feature Engineering: 40+ features including rolling averages, opponent analysis, and momentum tracking
- Model Performance: R² scores of 0.942 (Points), 0.887 (Rebounds), and 0.863 (Assists)
- Hyperparameter Tuning: Optuna-based optimization with cross-validation
- Explainability: SHAP values for feature importance analysis
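
The rolling-average features mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual feature code: the function name `rolling_mean_prior` and the sample values are hypothetical. It assumes each feature should use only games played before the one being predicted, so that the target game's own stat line never leaks into its features.

```python
from collections import deque
from typing import Iterable, List, Optional


def rolling_mean_prior(values: Iterable[float], window: int) -> List[Optional[float]]:
    """Mean of up to `window` previous values, excluding the current one.

    Hypothetical helper for illustration; returns None where no prior
    games exist yet.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    recent: deque = deque(maxlen=window)  # holds at most `window` past games
    features: List[Optional[float]] = []
    for value in values:
        # The feature for this game is computed before the game is added,
        # so it only reflects games already played.
        features.append(sum(recent) / len(recent) if recent else None)
        recent.append(value)
    return features


# Made-up points totals for four consecutive games.
points = [20.0, 30.0, 25.0, 35.0]
print(rolling_mean_prior(points, window=3))
```

Shifting the window by one game is the important design choice here: a rolling mean that includes the current game would inflate offline metrics without helping real predictions.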
- FastAPI Backend: Async request handling with Pydantic validation
- Redis Caching: Intelligent TTL strategies for frequently accessed predictions
- PostgreSQL Storage: Optimized queries with SQLAlchemy ORM
- A/B Testing: Framework for model comparison with statistical significance
- Monitoring: Drift detection and performance tracking
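
To illustrate the caching idea, here is a small read-through cache with a time-to-live (TTL). It is an in-memory stand-in for Redis so that it runs without a server; the class `TTLCache` and its method names are assumptions made for this sketch and do not come from the project.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry (illustrative only)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value if still fresh, otherwise compute and store it."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = compute()
        self._store[key] = (now + self.ttl_seconds, value)
        return value

    def hit_rate(self) -> float:
        """Fraction of lookups served from the cache."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A prediction endpoint could wrap the model call in `get_or_compute`, keyed by something like the player ID and game date. With a real Redis server the same pattern is usually expressed as a `GET` followed by a `SET` with an expiry.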
- Next.js 14: Modern React framework with TypeScript
- Real-time Updates: SWR for data fetching and caching
- Data Visualization: Recharts for interactive charts
- Responsive Design: Tailwind CSS with mobile optimization
- Core: Python 3.10+, FastAPI, SQLAlchemy
- ML: XGBoost, LightGBM, scikit-learn, pandas, numpy
- Infrastructure: Redis, PostgreSQL, Docker
- Testing: pytest with 87% coverage
- Framework: Next.js 14, TypeScript, React
- Styling: Tailwind CSS, Framer Motion
- Data: SWR, Recharts
- Build: Ready for Vercel deployment
| Metric | Points | Rebounds | Assists |
|---|---|---|---|
| R² Score | 0.942 | 0.887 | 0.863 |
| MAE | 3.12 | 2.34 | 1.89 |
| RMSE | 4.23 | 3.01 | 2.41 |
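
For readers who want to check how the three metrics in the table relate, the sketch below computes R², MAE, and RMSE from their standard definitions using only the standard library. The function name `regression_metrics` and the sample values are hypothetical; the project itself may well use scikit-learn's metric functions instead.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute R², MAE, and RMSE for paired actual and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        # R² is undefined when the actual values have zero variance.
        "r2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
    }


# Made-up values, for illustration only.
print(regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```

RMSE is never smaller than MAE on the same data, which is consistent with every column of the table.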
- API Response Time: P50 of 45 ms, P95 of 87 ms
- Cache Hit Rate: ~85% with Redis
- Data Pipeline: 169K+ records processed
- Feature Count: 40+ engineered features
- Test Coverage: 87% backend coverage
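
P50 and P95 are percentiles of the observed response times. As a rough sketch of how such figures can be derived from raw latency samples, the helper below uses the nearest-rank method. The name `percentile` and the sample latencies are made up for illustration; how the project actually measured its latencies is not stated here.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up latency samples in milliseconds.
latencies_ms = [45, 40, 52, 87, 43, 48, 41, 60, 46, 44]
print("P50:", percentile(latencies_ms, 50))
print("P95:", percentile(latencies_ms, 95))
```

Other percentile definitions interpolate between neighbouring samples, so results can differ slightly on small sample sets.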
- Python 3.10+
- Node.js 18+
- PostgreSQL 14+
- Redis (optional, for caching)
- Clone Repository

      git clone https://github.com/cbratkovics/nba-ai-ml.git
      cd nba-ai-ml

- Backend Setup

      python -m venv venv
      source venv/bin/activate  # Windows: venv\Scripts\activate
      pip install -r requirements.txt

- Frontend Setup

      cd frontend
      npm install

- Environment Variables

      # Backend (.env)
      DATABASE_URL=postgresql://user:pass@localhost:5432/nba_ml
      REDIS_URL=redis://localhost:6379
      API_KEY=your-secret-key

      # Frontend (.env.local)
      NEXT_PUBLIC_API_URL=http://localhost:8000

- Run Services

      # Backend
      uvicorn api.main:app --reload

      # Frontend
      cd frontend && npm run dev

POST /v1/predict
Content-Type: application/json

    {
      "player_id": "203999",
      "game_date": "2024-12-15",
      "opponent_team": "LAL",
      "home_game": true
    }

Response:

    {
      "player_name": "Nikola Jokic",
      "predictions": {
        "points": 28.5,
        "rebounds": 13.2,
        "assists": 8.7
      },
      "confidence_intervals": {
        "points": {"lower": 23.2, "upper": 33.8}
      },
      "model_confidence": 0.923
    }

POST /v1/predict/batch

GET /v1/models/performance

nba-ai-ml/
├── api/ # FastAPI backend
│ ├── models/ # ML models
│ ├── routes/ # API endpoints
│ └── services/ # Business logic
├── ml/ # Machine learning
│ ├── features/ # Feature engineering
│ ├── models/ # Model training
│ └── evaluation/ # Model evaluation
├── frontend/ # Next.js dashboard
│ ├── components/ # React components
│ ├── pages/ # Next.js pages
│ └── lib/ # Utilities
└── tests/ # Test suite
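
For reference, the `POST /v1/predict` request documented above can be built from Python with the standard library alone. This is a sketch under assumptions: the helper names `build_predict_request` and `send` are hypothetical, the base URL is the local development address from the environment example, and no authentication header is set because this README does not specify how `API_KEY` is transmitted.

```python
import json
import urllib.request
from typing import Any, Dict


def build_predict_request(
    base_url: str,
    player_id: str,
    game_date: str,
    opponent_team: str,
    home_game: bool,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/predict request with a JSON body."""
    payload = {
        "player_id": player_id,
        "game_date": game_date,
        "opponent_team": opponent_team,
        "home_game": home_game,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 5.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response (requires a running backend)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_predict_request(
    "http://localhost:8000", "203999", "2024-12-15", "LAL", True
)
print(request.get_method(), request.full_url)
```

With the backend running locally, `send(request)` should return a dictionary shaped like the example response, including the `predictions` and `confidence_intervals` keys.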
- High Accuracy: R² of 0.942 for points prediction
- Fast Response: P95 latency under 90ms
- Comprehensive Pipeline: 169K+ records with 40+ features
- Production Ready: Docker, testing, monitoring included
- A/B Testing: Framework for model experimentation
- MLOps Integration: Drift detection and automated retraining
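
As a rough illustration of KS-based drift detection, the sketch below computes the two-sample Kolmogorov-Smirnov statistic by hand and compares it with the usual large-sample critical value. The function names are hypothetical and the data is synthetic. In practice a library routine such as SciPy's `ks_2samp` is a more likely choice, and the project's own monitoring code may differ.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest absolute gap between the two empirical CDFs."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / len(ref)
        f_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def ks_drift_detected(
    reference: Sequence[float], current: Sequence[float], alpha: float = 0.05
) -> bool:
    """Flag drift when the KS statistic exceeds the asymptotic critical value."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))  # about 1.358 for alpha = 0.05
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Synthetic example: the "current" feature values are shifted upward by 50.
reference = [float(x) for x in range(100)]
current = [x + 50.0 for x in reference]
print(ks_drift_detected(reference, current))
```

The critical value used here is an approximation that works best for reasonably large samples. The KS test suits continuous features, while the Chi-squared test is the usual counterpart for categorical ones.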

Run tests:

    pytest tests/

With coverage:

    pytest --cov=api tests/

Run a specific test file:

    pytest tests/test_predictions.py

Build the Docker image:

    docker build -t nba-ml:latest .

Run the container:

    docker run -p 8000:8000 nba-ml:latest

- Real-time data streaming integration
- Advanced time series models (LSTM)
- Player injury impact modeling
- Team chemistry factors
- Playoff performance adjustments
Contributions welcome! Please:
- Fork the repository
- Create your feature branch
- Commit changes with clear messages
- Push to your branch
- Open a Pull Request
MIT License - See LICENSE for details
- NBA Stats API for data access
- XGBoost and LightGBM communities
- FastAPI for excellent documentation
- Open source ML community